The myPersonality dataset of over 6,000,000 personality profiles and associated anonymised Facebook data, collected by David Stillwell of The Psychometrics Centre, continues to be a superlative resource for researchers all over the world. In a paper published in October 2013, colleagues both here and at the University of Pennsylvania analysed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 of these volunteers, which enabled them to examine how the use of words online varied with personality and with demographics such as gender and age. Striking variations in language were found between groups. The open-vocabulary technique used also found connections not captured by traditional closed-vocabulary word-category analyses.
For more information see Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, Shah A, Stillwell D, Kosinski M and Seligman M (2013) Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLoS ONE 8(9): e73791. doi:10.1371/journal.pone.007
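The contrast between closed-vocabulary and open-vocabulary analysis can be illustrated with a small sketch. It is not the method used in the paper: it uses a plain Pearson correlation on single words only, and the messages, ages, lexicon and function names are all hypothetical, invented purely for illustration. The point it tries to show is that a closed-vocabulary analysis can only score the categories fixed in advance, whereas an open-vocabulary analysis scores every word that actually occurs in the data.

```python
from collections import Counter
import math

# Hypothetical toy data: (message text, age of author). Not from myPersonality.
MESSAGES = [
    ("homework tomorrow ugh school", 15),
    ("school was boring today", 16),
    ("bored at school again", 14),
    ("long day at work meeting", 27),
    ("work drinks after the office", 28),
    ("new job at the office", 25),
]

# Hypothetical closed-vocabulary lexicon: categories are fixed in advance.
LEXICON = {"education": {"school", "homework"}, "occupation": {"work", "job"}}


def relative_frequencies(text):
    """Return each word's share of the words in one message."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: c / len(words) for w, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def closed_vocabulary(messages, lexicon):
    """Correlate the usage of each predefined category with age."""
    ages = [age for _, age in messages]
    freqs = [relative_frequencies(text) for text, _ in messages]
    return {
        cat: pearson([sum(f.get(w, 0.0) for w in words) for f in freqs], ages)
        for cat, words in lexicon.items()
    }


def open_vocabulary(messages):
    """Correlate every word that occurs in the data with age."""
    ages = [age for _, age in messages]
    freqs = [relative_frequencies(text) for text, _ in messages]
    vocab = sorted({w for f in freqs for w in f})
    return {w: pearson([f.get(w, 0.0) for f in freqs], ages) for w in vocab}


closed = closed_vocabulary(MESSAGES, LEXICON)
opened = open_vocabulary(MESSAGES)

# Words most positively correlated with age in the toy data.
top_older = sorted(opened, key=opened.get, reverse=True)[:3]
print(closed)
print(top_older)
```

In this toy data the word "office" does not appear in the hypothetical lexicon, so only the open-vocabulary pass assigns it a score. That is the kind of connection a predefined word-category list would miss.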
As might be expected, large differences were found between the words used at different ages:
Teenagers (13 to 18 year olds)
Young Adults (23 to 29 year olds)
But the most striking differences were those found between men and women in their use of words. In the figure below, words are clustered according to groupings rather than presented individually (the correlation in the top right-hand corner of each group represents its predictive power). The differences are dramatic and arguably an affront to gender equality. But, sadly, the data speaks for itself.
When it comes to personality, the word clouds provide considerable evidence for the accuracy of personality inventories in terms of their ability to predict individual differences. For example:
And (but not for the faint-hearted):