
IRT, CAT and Machine Learning Summer School

When: Jul 23, 2018 10:30 AM to Jul 27, 2018 05:30 PM
Where: Cambridge, UK

IRT, CAT and Machine Learning in R and Concerto

This 5-day Summer School combines two of our most popular courses (“Item Response Theory (IRT) and Computer Adaptive Testing (CAT) in R and Concerto” and “Machine Learning in R for Social Scientists”) into a programme tailored for researchers in the social and health sciences. It is suitable for researchers taking their first steps in R programming, psychometrics, and machine learning, and also for those who already have a solid understanding and wish to advance their skills.

The course is taught by Dr Chris Gibbons (Harvard Medical School/The Psychometrics Centre) and Mr Aiden Loe (The Psychometrics Centre). Aiden and Chris have delivered variants of these courses here in Cambridge and also, by invitation, at universities, hospitals, private firms, and government institutions in the United Kingdom, Sweden, Australia, Malaysia, Canada, and the United States. 

For those who are new to R, some additional materials will be provided before the workshop so that you can become familiar with R syntax and the working environment. There is no obligation to complete these materials, but doing so will quicken the pace of learning during the workshop. Those with experience in these domains will also find challenging content and can develop their knowledge under the supervision of experienced University of Cambridge Psychometrics Centre staff.

Location
Cambridge Judge Business School
Trumpington Street
Cambridge CB2 1AG

Dates and Time
July 23rd to 27th 2018
5-day Summer School

Fee
5 days (Mon-Fri)
Business £1000 (+ 20% VAT)
Academic £800 (+ 20% VAT)
Students £600 (+ 20% VAT)

4 days (Tue-Fri)
Business £900 (+ 20% VAT)
Academic £700 (+ 20% VAT)
Students £500 (+ 20% VAT)

Click on above links
to pay by Credit Card

Tutors
Dr Chris Gibbons, Aiden Loe

Prior experience with R is not necessary. Participants should bring their own laptops with the latest versions of R (https://www.r-project.org/) and RStudio (http://www.rstudio.com/ide/download/) installed.

Programme

Day 1: Introduction to R

Day 1 will provide delegates with a solid introduction to R programming, getting everyone up to speed and allowing us to assess the varying levels of prior expertise among the group. Homework materials will be available beforehand so that new users can become familiar with R before the course. No prior knowledge of R is assumed for those attending Day 1.

In the morning, we will cover the framework of R and show delegates how they can load different types of datasets into R. Delegates will also learn how to use various R functions to shape the dataset for statistical analyses.

In the afternoon, delegates will conduct simple statistical analyses (t-test, correlation, linear regression) using R. Additional practice exercises will be provided, and delegates will have time to work on them in the afternoon. We will end the day by teaching basic data visualisation techniques using the popular ggplot2 R package.

Day 2: Item Response Theory (IRT) and the Concerto testing platform

Day 2 will focus on teaching the fundamentals of the Concerto testing platform and introducing Item Response Theory (IRT). The IRT models will then be applied to datasets. 

In the morning, delegates will develop and publish a fixed-length personality assessment using both flowchart logic and R programming within the Concerto platform.

In the afternoon, delegates will learn about Item Response Theory and gain first-hand experience of applying IRT and Differential Item Functioning (DIF) analyses to questionnaire data. IRT can be used to improve questionnaire accuracy and sensitivity, while DIF analyses are used to compare data quality and item performance across different groups.

Day 3: Computerized Adaptive Testing (CAT) Development

Day 3 will focus on developing your own CAT. In the morning, delegates will learn about CAT theories and the associated R functions. They will learn how to simulate different CAT algorithms to evaluate which is best suited to different datasets and testing scenarios.

In the afternoon, they will use their knowledge of IRT to build a CAT in Concerto. As part of this development process, we will guide them through IRT scoring in R and how to interpret the test outputs. By the end of the day, they will learn to visualise and present instant feedback to test-takers through the Concerto platform in a clear and interpretable format.

Delegates will consolidate the theoretical and practical knowledge attained during the day and explore the latest developments in online psychometrics. We expect delegates to independently run an IRT analysis and develop a CAT from a new dataset, with minimal supervision. The purpose of this exercise is to give delegates confidence in their newly acquired skills and ensure that these continue to develop beyond the programme.

Day 4: Data mining and sentiment text analysis

On Day 4, delegates will learn to use APIs and web scraping methods (e.g. twitteR) to mine different forms of data from the Internet. In the morning, they will obtain and manipulate unstructured text data and public domain data (e.g. from Wikipedia or the BBC) using popular data science R packages (e.g. xml2, jsonlite).

In the afternoon, they will learn advanced data wrangling techniques to shape the dataset into a structured and more interpretable form using different R packages (e.g. data.table, tidyverse). They will then analyse this dataset to visualise it and derive insights from it (e.g. through sentiment analysis). We will also discuss the legality of scraping online content for commercial and academic use.

If there is sufficient time, we will introduce delegates to Shiny apps. For more information about Shiny, please see https://shiny.rstudio.com.

Day 5: Machine Learning in practice

Day 5 will focus on the implementation of machine learning algorithms. The morning will focus on machine-learning concepts. Delegates will learn how predictive algorithms work, the difference between supervised and unsupervised methods, feature selection and extraction, the assessment of algorithm performance, and more. This introduction to the theory behind the statistics and practice of data science will also help to deepen delegates’ understanding of the underlying mechanisms at work, ahead of implementing their algorithms in practice.

The afternoon will be spent implementing machine learning algorithms in the Concerto platform, for example for text classification and sentiment analysis. Delegates will write and run their own scripts to see different algorithms in action on pre-cleaned datasets that we provide.

We will also leave time to discuss what’s next for Concerto and psychometrics, such as continuous item deployment, automatic item generation, crowdsourcing, mobile systems integration, Big Data assessment and other trends. This will hopefully inspire ideas for further collaboration.

Timetable

Day 1 – Introduction to R and RStudio
10.30 - 11.30 R and RStudio installation, Introduction to R
11.30 - 13.00 Data analysis and R basics, Practical (Part 1)
13.00 - 14.00 Lunch
14.00 - 15.30 Practical (Part 2)
16.00 - 17.30 Practical examples and exercises, Questions

Day 2 – Item Response Theory (IRT)
09.30 - 11.00 Concerto testing platform (Theory and Practical)
11.15 - 12.45 IRT (Theory), Different IRT response scales (dichotomous and polytomous) (Theory)
12.45 - 14.00 Lunch
14.00 - 15.30 IRT modelling of the CESD depression scale (Theory & Practical)
16.00 - 17.30 IRT diagnostics of the CESD depression scale (Theory & Practical), Practical examples and exercises, Revisions and Questions

Day 3 – Computer Adaptive Testing (CAT) with Concerto

09.30 - 11.00 Computer Adaptive Testing (CAT Theory and Practical)
11.15 - 12.45 Firestar simulation of item bank performance (Practical), Programming a polytomous adaptive test using the catR package (Practical)
12.45 - 14.00 Lunch
14.00 - 15.30 Introduction to adaptive testing in Concerto (Theory and Practical)
16.00 - 17.30 Revision & Exercise

Day 4: Introduction to Data science
09.30 - 11.00 Web scraping techniques (Part 1 - Mining online content)
11.15 - 12.45 Web scraping techniques (Part 2 - Cleaning and organising data)
12.45 - 14.00 Lunch
14.00 - 15.30 Data ‘wrangling’ using the tidyverse
16.00 - 17.30 Gaining insights into datasets. Exercises and Q&A.

Day 5: Practical implementation of Machine Learning 

  • Types of machine learning
  • High-dimensional spaces
  • Commonly used techniques including generalised linear models, support vector machines and neural networks
  • Training machine learning algorithms
  • Side-by-side comparison of model performance
  • Feature selection and extraction
  • Text mining challenges including polysemy, adverbs and contronyms
  • Assessing algorithm performance using accuracy, sensitivity, and specificity
  • Ensemble learning and n-fold cross-validation
  • Web implementation of machine learning using Concerto
  • Certificates and Q&A

Selected feedback from alumni of our courses: 

  • "The course was incredibly well-pitched... it was the best course I have ever attended in terms of quality of teaching!"
  • "Chris and Aiden were engaging speakers who clearly were very familiar with both the theoretical underpinnings of IRT and CAT, as well as the Concerto platform. As instructors, they managed the diversity in skills and experience in the classroom well."
  • "A really great atmosphere and great teaching that fostered a collaborative learning environment."
  • "I liked the 'spirit' of the workshop which was intense, motivating and very hands-on - thank you so much for a brilliant course!"
  • "An excellent three days; it was incredibly informative, well-paced and well organised."