Algorithm deduces drunk tweets from geolocation and behavioural data
Wed 16 Mar 2016
Researchers have devised a method of identifying people who post on Twitter while they are drunk at home, using both the geolocation tags that tweets can provide and by sleuthing other factors that indicate that the tweeter is inebriated.
The paper Inferring Fine-grained Details on User Activities and Home Location from Social Media: Detecting Drinking-While-Tweeting Patterns in Communities [PDF], led by Nabil Hossain at the University of Rochester, outlines the initial difficulty the team had in distinguishing tweeters who were merely discussing alcohol from those who were under the influence of it at the time of tweeting. The researchers set workers from Amazon Mechanical Turk (a service sold as ‘Artificial Artificial Intelligence’) to the task of analysing alcohol-related tweets from two regions: New York City and the environs of the City of Rochester (Monroe County) in the west of the state.
The dataset came in at over 11,000 geolocated tweets. However, a user’s GPS coordinates are no proof that they are home, so additional analysis was performed on users’ feeds, wherein the workers sought out phrases that mentioned or indicated ‘home’, or other language likely to indicate that the tweeter was not ‘out’ in the accepted sense. Three workers were asked whether or not the user should be considered ‘at home’, and only unanimous agreement between them would lead to the ‘home’ assignation.
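The unanimity rule described above can be illustrated with a minimal Python sketch. The function name, the tweet identifiers and the votes are all hypothetical, invented here for illustration; this is not the researchers’ code.

```python
from typing import Iterable


def unanimous_home_label(votes: Iterable[bool]) -> bool:
    """Return True only if every worker judged the tweeter to be at home.

    An empty set of votes is treated as 'not home' (an assumption made
    for this sketch, not something the article specifies).
    """
    votes = list(votes)
    return bool(votes) and all(votes)


# Hypothetical annotations: tweet id -> answers from three workers.
annotations = {
    "tweet_a": [True, True, True],   # all three agree: labelled 'home'
    "tweet_b": [True, False, True],  # one dissent: not labelled 'home'
}

labels = {tweet_id: unanimous_home_label(v) for tweet_id, v in annotations.items()}
print(labels)
```

Requiring full agreement trades coverage for precision: a single dissenting worker is enough to withhold the ‘home’ label.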
Pattern analysis on user tweets over a period of time was also performed, to help associate geo-tag information with repetitive behaviour as exhibited in the users’ Twitter timelines. Other factors included the location from which a user most often tweeted and how frequently they tweeted from it, and the resulting data helped Hossain’s team to determine the users’ home locations to within 100 metres with 80% accuracy.
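To give a feel for how tweet frequency at a location might point to a home, the following Python sketch snaps each geotagged tweet to a coarse grid and picks the busiest cell. It is a deliberately simplified heuristic, not the method used in the paper; the grid size, function name and sample coordinates are assumptions made for illustration.

```python
from collections import Counter

# Illustrative grid size: 0.001 degrees of latitude is roughly 110 m.
# Longitude cells are narrower away from the equator; this sketch ignores that.
CELL_DEG = 0.001


def guess_home_cell(points):
    """Return the most frequently visited grid cell, or None if there are no points.

    points: iterable of (latitude, longitude) pairs in decimal degrees.
    The cell is returned as a pair of integer grid indices.
    """
    cells = Counter(
        (round(lat / CELL_DEG), round(lon / CELL_DEG)) for lat, lon in points
    )
    if not cells:
        return None
    cell, _count = cells.most_common(1)[0]
    return cell


# Hypothetical tweets: two from nearly the same spot, one from elsewhere.
sample_points = [
    (43.1566, -77.6088),
    (43.1567, -77.6089),
    (40.7128, -74.0060),
]
print(guess_home_cell(sample_points))
```

A frequency count of this kind is only a starting point; as the article notes, the team combined location data with other behavioural signals from users’ timelines.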
Results indicated that the tweeting drinkers in Monroe County were likelier to be out of their houses (if not their gourds) than the denizens of New York City. The researchers suggest this may be down to NYC’s high availability of alcohol compared to a more rural environment. In Monroe County, people are likelier to be drinking more than one kilometre away from home.
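The ‘more than one kilometre from home’ comparison boils down to a distance check between a tweet’s coordinates and an inferred home location. A minimal Python sketch using the standard haversine great-circle formula follows; the function names, the default threshold parameter and the coordinates are illustrative assumptions, and the paper may compute distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_far_from_home(tweet, home, threshold_km=1.0):
    """True if the tweet location lies more than threshold_km from home.

    tweet, home: (latitude, longitude) pairs in decimal degrees.
    """
    return haversine_km(*tweet, *home) > threshold_km


# Hypothetical home and tweet locations, about 1.5 km apart.
home = (43.1566, -77.6088)
tweet = (43.1700, -77.6088)
print(is_far_from_home(tweet, home))
```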
The report concludes:
‘Models that permit the fine-grained study of alcohol consumption in social media can reveal important real-time information about users and the influences they have on each other. We can begin to evaluate the merits of these data for public health research. Such analyses can teach us who is and isn’t referencing alcohol on Twitter, and in what settings, to evaluate the degree of self-reporting biases, and also help to create a tool for improving a community’s health, given social networks can become a resource to spread positive health behaviour.’