A Cheap, Naturalistic, Large-Scale Research Method Designed To Assess And Interpret Our Social Media Linguistic Interactions

The most important news for me this year came in June, with the publication of “Automatic Personality Assessment Through Social Media Language” in the Journal of Personality and Social Psychology. For those working at the intersection of psychology and technology, the results of this study confirmed what many of us had been anticipating: the validation of a cheap, naturalistic, large-scale research method designed to assess and interpret the linguistic interactions that millions of us engage in online, every single day.

With a sample of over 66,000 active social media users, the researchers used an open-vocabulary approach, which draws on the words and phrases people actually write rather than on a predefined dictionary, to build a model that predicts each of the "Big Five" personality traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism.
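To make "open-vocabulary" concrete: rather than checking text against a fixed word list, such approaches typically begin by counting every word and short phrase a person writes. The Python sketch below shows only that counting step, under simple assumptions (a naive lowercase tokenizer, phrases of up to two words). The function name `ngram_frequencies` is hypothetical, and the study's full pipeline, including how it turned these counts into trait predictions, is not reproduced here.

```python
import re
from collections import Counter


def ngram_frequencies(posts, max_n=2):
    """Relative frequency of every 1..max_n-word phrase across a user's posts.

    Each n-gram's count is divided by the total number of n-grams of the
    same length, so unigram and bigram frequencies are normalized separately.
    """
    counts = Counter()
    totals = Counter()  # how many n-grams of each length were seen
    for post in posts:
        # Naive tokenizer (an assumption): lowercase runs of letters/apostrophes.
        tokens = re.findall(r"[a-z']+", post.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
                totals[n] += 1
    # Tokens contain no spaces, so the word count of a key recovers its n.
    return {gram: c / totals[len(gram.split())] for gram, c in counts.items()}


if __name__ == "__main__":
    freqs = ngram_frequencies(["I love my friends", "love it"])
    print(round(freqs["love"], 3))  # 2 of 6 unigrams
    print(freqs["i love"])          # 1 of 4 bigrams
```

In a predictive model, each user's frequencies would typically become one row of features, and a regression model would then be fitted against personality scores obtained from questionnaires; that modelling step is deliberately left out of this sketch.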

This methodology produced more accurate language-based predictions of personality than any previous study, demonstrating not only that it offers a robust alternative to existing approaches, but also that this kind of research can now be carried out at unprecedented scale and with unprecedented accuracy.

In and of themselves, general insights into a population's personalities may not seem particularly consequential. We might know, for example, that individuals who score highly for extraversion tend to use more positive emotion words (such as amazing, great, happy), whereas those who score highly for neuroticism use first-person singular pronouns (such as I, me, mine) more frequently. Both are interesting observations, but it is not until we combine many such data points at scale that a more profound picture emerges.
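Observations like these rest on a simple quantity: the share of a person's words that fall into a given category. The sketch below computes that rate for a single text. The two word lists are hypothetical, built only from the examples in the paragraph above, and are far smaller than any real lexicon; they are not the features used in the study.

```python
import re

# Hypothetical, illustrative word lists built from the examples in the text;
# they are NOT the lexica or features used in the study.
CATEGORIES = {
    "positive_emotion": {"amazing", "great", "happy"},
    "first_person_singular": {"i", "me", "mine"},
}


def category_rates(text, categories=CATEGORIES):
    """Share of a text's tokens that fall into each word category."""
    # Same naive tokenizer assumption: lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {name: 0.0 for name in categories}
    return {
        name: sum(token in words for token in tokens) / len(tokens)
        for name, words in categories.items()
    }


if __name__ == "__main__":
    rates = category_rates("I had an amazing, great day")
    # 6 tokens: 2 positive-emotion words, 1 first-person singular pronoun
    print({name: round(rate, 3) for name, rate in rates.items()})
```

A single user's rates say very little on their own; it is only by comparing such rates across many thousands of users, alongside their personality scores, that correlations like the ones described become visible.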

Given how easily a unique profile can be built for a user from little more than a few cookies and an IP address, we are now in a position to combine these traits and compile overall personality profiles for millions of users, which can then be stored in psychometric databases. In fact, several companies have already begun this task, with commercial applications in mind.

Given that certain personality dispositions are associated with a whole range of predictable life outcomes (for instance, a greater propensity for risk-taking among people who score highly for extraversion), it is entirely conceivable that such data could be used to shape the quality of our lives in concrete ways, for better and for worse.

This, for me, is where the importance of the research kicks in. 

On the positive side, if we can design programs that predict our personality from publicly available data (our written interactions across social media channels), this may give us a way to learn more about our motivations, our behaviors, and ourselves. From a commercial perspective, it may also lead to smarter advertising and to applications that adapt to serve our needs better.

On the negative side, however, outside the realm of academic research, such data mining practices do not yet require consent, and could therefore be used by any entity with the necessary capabilities to profile and categorize people (whether as citizens, customers, or potential employees) without their knowledge and beyond their control. Such information could then be used to decide whether to grant certain people access to particular services (such as lines of credit or medical insurance), career paths, and even citizenship.

Given the predictive potential of such a system and the limited attention it has received in the wider media, it is vital that this news enters public discourse, so that we are all better equipped to understand how the information we share online may reveal potentially intimate aspects of ourselves. Only then can we make an informed choice about how (or whether) to engage online, and about the impact this may have on our future lives.