Recent Research Publications

Kosinski, M., Stillwell, D.J., Graepel, T. (2013) Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences (PNAS).
Abstract
We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psychodemographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait “Openness,” prediction accuracy is close to the test–retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy.

Bonneville-Roussy, A., Vallerand, R. J., & Bouffard, T. (2013). The roles of autonomy support and harmonious and obsessive passions in educational persistence. Learning and Individual Differences. (in press) 

Schwartz, H. A., Eichstaedt, J.C., Kern, M.L., Dziurzynski, L., Ramones, S.M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D.J., Seligman, M.E.P., Ungar, L.H. (2013) Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLOS ONE.

Kosinski, M., Stillwell, D.J., Kohli, P., Bachrach, Y., Graepel, T. (2012) Personality and Website Choice. Web Science (Evanston, IL)
Abstract

We find that preference for websites, like preference for objects in the offline world, is influenced by personality. We combine personality profiles and website choices of more than 160,000 users and investigate whether different websites attract audiences with different personalities. Using two independent sources of website choices, we show that website audiences often have distinct personality profiles, that there is a psychologically meaningful relationship between personality and preferences related to websites and website categories, and that results are stable across independent data sources. Our findings are useful for researchers interested in website content personalization, text search, search result optimization and online marketing. (view paper)

Kosinski, M., Bachrach, Y., Kasneci, G., Van Gael, J., Graepel, T. (2012) Crowd IQ: Measuring the Intelligence of Crowdsourcing Platforms. Web Science (Evanston, IL)
Abstract
We measure crowdsourcing performance based on a standard IQ questionnaire, and examine Amazon’s Mechanical Turk (AMT) performance under different conditions. These include variations of the payment amount offered, the way incorrect responses affect workers’ reputations, threshold reputation scores of participating AMT workers, and the number of workers per task. We show that crowds composed of workers of high reputation achieve higher performance than low reputation crowds, and the effect of the amount of payment is non-monotone—both paying too much and too little affects performance. Furthermore, higher performance is achieved when the task is designed such that incorrect responses can decrease workers’ reputation scores. Using majority vote to aggregate multiple responses to the same task can significantly improve performance, which can be further boosted by dynamically allocating workers to tasks in order to break ties. (view paper)
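
As an illustration only (not the authors' code), the majority-vote aggregation with tie-breaking described in this abstract could be sketched as follows. The callback `ask_extra_worker` and the `max_extra` limit are hypothetical names standing in for "allocating one more worker to the task"; the paper's actual allocation scheme is not specified here.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(responses: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the most frequent response, or None if the top spot is tied."""
    if not responses:
        return None
    counts = Counter(responses).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # the two most frequent answers have equal support
    return counts[0][0]


def vote_with_tie_breaking(
    responses: Sequence[Hashable],
    ask_extra_worker: Callable[[], Hashable],
    max_extra: int = 3,
) -> Optional[Hashable]:
    """Majority vote that requests additional responses while the vote is tied.

    ask_extra_worker is a hypothetical callback that returns one new response.
    At most max_extra additional responses are requested.
    """
    pool = list(responses)
    winner = majority_vote(pool)
    extra = 0
    while winner is None and extra < max_extra:
        pool.append(ask_extra_worker())
        winner = majority_vote(pool)
        extra += 1
    return winner
```

For example, `vote_with_tie_breaking(["A", "B"], lambda: "B")` starts from a tie, requests one more response, and then settles on `"B"`.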

Bachrach, Y., Kosinski, M., Graepel, T., Kohli, P., Stillwell, D.J. (2012) Personality and Patterns of Facebook Usage. Web Science (Evanston, IL)
Abstract
We show how users’ activity on Facebook relates to their personality, as measured by the standard Five Factor Model. Our dataset consists of the personality profiles and Facebook profile data of 180,000 users. We examine correlations between users’ personality and the properties of their Facebook profiles such as the size and density of their friendship network, number of uploaded photos, number of events attended, number of group memberships, and number of times the user has been tagged in photos. Our results show significant relationships between personality traits and various features of Facebook profiles. We then show how multivariate regression allows prediction of the personality traits of an individual user given their Facebook profile. The best accuracy of such predictions is achieved for Extraversion and Neuroticism; the lowest accuracy is obtained for Agreeableness, with Openness and Conscientiousness lying in the middle. (view paper)

Wang, N., Kosinski, M., Stillwell, D.J. & Rust, J. (2012) Can well-being be measured using Facebook status updates? Validation of Facebook’s Gross National Happiness Index. Social Indicators Research.
Abstract
Facebook's Gross National Happiness (FGNH) indexes the positive and negative words used in the millions of status updates submitted daily by Facebook users. FGNH has face validity: it shows a weekly cycle and increases on national holidays. Also, happier individuals use more positive words and fewer negative words in their status updates (Kramer, 2010). We examined the validity of FGNH in measuring mood and well-being by comparing it with scores on Diener's Satisfaction with Life Scale (SWLS), administered to an average of 34 Facebook users every day for a year, then aggregated by day, week, month, quarter and half year. FGNH and SWLS were not significantly correlated, with a negative correlation coefficient. Also, aggregated SWLS scores showed a positive relationship with numbers of negative words in status updates. We conclude that FGNH is a valid measure for neither mood nor well-being; however, it may play a role in mood regulation. This challenges the assumption that linguistic analysis of internet messages is related to underlying psychological states. (view paper)

Bachrach, Y., Graepel, T., Kasneci, G., Kosinski, M., Van Gael, J. (2012) Crowd IQ - Aggregating Opinions to Boost Performance. Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems (Valencia, June).
Abstract
We show how the quality of decisions based on the aggregated opinions of the crowd can be conveniently studied using a sample of individual responses to a standard IQ questionnaire. We aggregated the responses to the IQ questionnaire using simple majority voting and a machine learning approach based on a probabilistic graphical model. The score for the aggregated questionnaire, Crowd IQ, serves as a quality measure of decisions based on aggregating opinions, which also allows quantifying individual and crowd performance on the same scale. We show that Crowd IQ grows quickly with the size of the crowd but saturates, and that for small homogeneous crowds the Crowd IQ significantly exceeds the IQ of even their most intelligent member. We investigate alternative ways of aggregating the responses and the impact of the aggregation method on the resulting Crowd IQ. We also discuss Contextual IQ, a method of quantifying the individual participant’s contribution to the Crowd IQ based on the Shapley value from cooperative game theory. (view paper)
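
To make the Shapley-value idea behind Contextual IQ concrete, here is a minimal sketch that computes exact Shapley values by enumerating all orderings of a small crowd. It is an assumption-laden illustration, not the paper's method: the value of a coalition is taken to be the number of questions it answers correctly under a plurality vote, ties are scored as incorrect, and the function names are hypothetical. Enumerating orderings is only practical for a handful of participants.

```python
from collections import Counter
from itertools import permutations
from math import factorial
from typing import List, Sequence


def crowd_score(answers: Sequence[Sequence[str]], key: Sequence[str],
                members: Sequence[int]) -> int:
    """Questions answered correctly by plurality vote among `members`.

    Assumption: a tied vote on a question counts as incorrect.
    """
    score = 0
    for q, correct in enumerate(key):
        votes = Counter(answers[m][q] for m in members).most_common()
        if len(votes) > 1 and votes[0][1] == votes[1][1]:
            continue  # tie, scored as incorrect
        if votes[0][0] == correct:
            score += 1
    return score


def shapley_values(answers: Sequence[Sequence[str]],
                   key: Sequence[str]) -> List[float]:
    """Exact Shapley value of each participant's contribution to the score."""
    n = len(answers)
    totals = [0.0] * n
    for order in permutations(range(n)):
        coalition: List[int] = []
        previous = 0
        for member in order:
            coalition.append(member)
            current = crowd_score(answers, key, coalition)
            totals[member] += current - previous  # marginal contribution
            previous = current
    return [t / factorial(n) for t in totals]
```

By construction the values sum to the score of the full crowd, so each participant's share can be read on the same scale as the crowd's total.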

Stillwell, D.J. & Tunney, R.J. (2012) Effects of measurement methods on the relationship between smoking and delay reward discounting. Addiction.
Abstract
Aims: Delay reward discounting (DRD) measures the degree to which a person prefers smaller rewards soon or larger rewards later. People who smoke have been shown to have higher DRD. There are several ways of measuring DRD and the method used might influence the association between smoking and DRD. The key differences are the order that the items are presented in, the delays used, and the magnitude of the delayed amount.
Setting: An international online study running from September 2010 to June 2011.
Participants: N = 9454; 38% male, mean age = 23.1.
Design and Measurements: Users completed a multi-item DRD task. They were randomly presented the immediate rewards in an ascending, descending, or randomized order. The delays were between 1 week and 5 years. The delayed amounts were $1000 for all delays, and $100 for 1 month. Users also self-reported their smoking status.
Findings: A hyperbolic DRD function fit better than an exponential function. There were differences in the derived DRD function based on methodology used. Items presented in a randomized order, longer delays and smaller rewards showed steeper discounting. However, these did not interact with smoking status, as for all methodologies used daily smokers showed the steepest discounting, followed by non-daily smokers, then non-smokers.
Conclusions: Smokers discount more steeply irrespective of which method is used. However, the methods of assessing DRD influence the parameters, which means that parameters estimated with different methods cannot be compared.
(view paper)
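
For readers unfamiliar with the two discounting functions compared in the Findings, the sketch below uses their standard textbook forms: hyperbolic V = A / (1 + kD) and exponential V = A * exp(-kD), where A is the delayed amount, D the delay, and k the discount rate. The grid-search fit and the use of days as the delay unit are assumptions made for illustration; the study's actual fitting procedure is not described in the abstract.

```python
import math
from typing import Callable, Sequence, Tuple


def hyperbolic_value(amount: float, k: float, delay: float) -> float:
    """Present value under hyperbolic discounting: A / (1 + k * D)."""
    return amount / (1.0 + k * delay)


def exponential_value(amount: float, k: float, delay: float) -> float:
    """Present value under exponential discounting: A * exp(-k * D)."""
    return amount * math.exp(-k * delay)


def best_fit(
    model: Callable[[float, float, float], float],
    amount: float,
    delays: Sequence[float],
    observed: Sequence[float],
    k_grid: Sequence[float],
) -> Tuple[float, float]:
    """Return (sum of squared errors, k) for the best k on a simple grid."""
    best_sse = math.inf
    best_k = k_grid[0]
    for k in k_grid:
        sse = sum((model(amount, k, d) - v) ** 2
                  for d, v in zip(delays, observed))
        if sse < best_sse:
            best_sse, best_k = sse, k
    return best_sse, best_k
```

Given a participant's indifference points (the immediate amounts judged equal to the delayed reward at each delay), calling `best_fit` once per model and comparing the two error values indicates which function describes that participant better.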

Quercia, D., Lambiotte, R., Kosinski, M., Stillwell, D. & Crowcroft, J. (2011) The Personality of Popular Facebook Users. ACM CSCW 2012.
Abstract
We study the relationship between Facebook popularity (number of contacts) and personality traits on a large number of subjects. We test to what extent two prevalent viewpoints hold. That is, popular users (those with many social contacts) are the ones whose personality traits either predict many offline (real world) friends or predict a propensity to maintain superficial relationships. We find that the predictor for number of friends in the real world (Extraversion) is also a predictor for number of Facebook contacts. We then test whether people who have many social contacts on Facebook are the ones who are able to adapt themselves to new forms of communication, present themselves in likable ways, and have a propensity to maintain superficial relationships. We show that there is no statistical evidence to support such a conjecture. (view paper)

Quercia, D., Kosinski, M., Stillwell, D. & Crowcroft, J. (2011) Our Twitter profiles, our selves: Predicting personality with Twitter. IEEE SocialCom.
Abstract
Psychological personality has been shown to affect a variety of aspects: preferences for interaction styles in the digital world and for music genres, for example. Consequently, the design of personalized user interfaces and music recommender systems might benefit from understanding the relationship between personality and use of social media. Since there has not been a study between personality and use of Twitter at large, we set out to analyze the relationship between personality and different types of Twitter users, including popular users and influentials. For 335 users, we gather personality data, analyze it, and find that both popular users and influentials are extroverts and emotionally stable (low in the trait of Neuroticism). Interestingly, we also find that popular users are ‘imaginative’ (high in Openness), while influentials tend to be ‘organized’ (high in Conscientiousness). We then show a way of accurately predicting a user’s personality simply based on three counts publicly available on profiles: following, followers, and listed counts. Knowing these three quantities about an active user, one can predict the user’s five personality traits with a root-mean-squared error below 0.88 on a [1; 5] scale. Based on these promising results, we argue that being able to predict user personality goes well beyond our initial goal of informing the design of new personalized applications as it, for example, expands current studies on privacy in social media. (view paper)
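
The error measure quoted in this abstract, root-mean-squared error (RMSE), is the square root of the mean squared difference between predicted and actual scores. A minimal sketch of the standard definition follows; the function name is illustrative and the code is not taken from the paper.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-squared error between two equally long score lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

On a 1 to 5 trait scale, an RMSE of 0.88 means predictions are typically off by a little under one scale point.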

Rust, J. (2012) Psychometrics. In Research Methods in Psychology, (Eds: G. M. Breakwell, J. A. Smith & D. B. Wright) 141-162.

Golombok, S., Rust, J., Zervoulis, K., Golding, J. & Hines, M. (2012) Continuity in sex-typed behavior from Preschool to Adolescence: A Longitudinal population study of boys and girls aged 3-12 years. Archives of Sexual Behavior 41(3), 591-597.

Rust, J. & Golombok, S. (2008) Modern Psychometrics: The science of psychological assessment (3rd Edition). Routledge, London. (also in Chinese, Ren Min University Press, Beijing, 2011)

Golombok, S., Rust, J., Zervoulis, K., Croudace, T., Golding, J. & Hines, M. (2008) Developmental trajectories of sex-typed behaviour in boys and girls: A longitudinal population study of children aged 2.5 to 8 years. Child Development. 79(5) 1583-159.