Sadilek et al.
Twitter isn't as much a micro-blogging site anymore as it is a data pool. It can tell us what's happening during an emergency, how people are feeling about national news events, and even the difference between a geek and a nerd. Mined for the right keywords, it can also tell us where people are getting sick.
As a postdoc at the University of Rochester, Adam Sadilek, now a data scientist at Google, built nEmesis, a machine learning system that tracks where people tweet about food poisoning.
The system flagged relevant stomach- and food-related updates from a pool of 3.8 million tweets posted in New York City between January and April. Human eyes (recruited through Amazon's Mechanical Turk) then determined the 6,000 that seemed most indicative of food poisoning so that nEmesis could learn what data to look for. Tweets that contained phrases like "throw up," "pepto-bismol" or "my tummy hurts" were flagged as being related to foodborne illness (and to the tweeter being a total whiner).
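For the curious, the keyword-flagging step can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the three phrases are the ones quoted above, while the function name `flag_tweets` and the sample tweets are made up for the example. The real system went further, learning from the human-labeled tweets rather than relying on a fixed phrase list.

```python
import re

# Only these three phrases come from the article; a real list would be longer.
SYMPTOM_PHRASES = ["throw up", "pepto-bismol", "my tummy hurts"]

# One case-insensitive pattern that matches any of the phrases.
PATTERN = re.compile(
    "|".join(re.escape(phrase) for phrase in SYMPTOM_PHRASES),
    re.IGNORECASE,
)

def flag_tweets(tweets):
    """Return the tweets that mention at least one symptom phrase."""
    return [tweet for tweet in tweets if PATTERN.search(tweet)]

# Made-up sample tweets, for illustration only.
sample = [
    "Great brunch downtown today!",
    "Ugh, my tummy hurts after that burrito",
    "Need Pepto-Bismol ASAP",
]
flagged = flag_tweets(sample)
```

A filter like this only narrows the pool. Deciding which flagged tweets actually describe food poisoning is the part that needed human labelers and a trained model.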
The health scores nEmesis assigned to restaurants based on the number of tweeters who fell ill after visiting came close to the scores food inspectors had submitted to the city's health department. The program can then color-code restaurants based on the likelihood of getting food poisoning from eating there.
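As a rough sketch of how such scoring and color-coding could work, the Python below counts how many visiting tweeters later reported feeling ill and turns that share into a color. Everything here is an assumption made for illustration: the article does not give nEmesis's actual scoring formula, and the function names, the thresholds and the restaurant names are invented.

```python
from collections import Counter

def sick_rates(visits):
    """Share of visits followed by a sick tweet, per restaurant.

    `visits` is an iterable of (restaurant, fell_ill) pairs,
    one pair per tweeter visit.
    """
    total = Counter()
    sick = Counter()
    for restaurant, fell_ill in visits:
        total[restaurant] += 1
        if fell_ill:
            sick[restaurant] += 1
    return {name: sick[name] / total[name] for name in total}

def color(rate, yellow_at=0.1, red_at=0.3):
    """Map a sick rate to a color; the thresholds are arbitrary placeholders."""
    if rate >= red_at:
        return "red"
    if rate >= yellow_at:
        return "yellow"
    return "green"

# Invented visits, for illustration only.
visits = [
    ("Cafe A", False), ("Cafe A", False), ("Cafe A", True),
    ("Diner B", False), ("Diner B", False),
]
rates = sick_rates(visits)
colors = {name: color(rate) for name, rate in rates.items()}
```

With this toy data, one of Cafe A's three visitors fell ill, so it lands in the red band, while Diner B stays green.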
Social media over-sharers will no doubt feel vindicated. There is a reason to tweet about your every cough and tummy grumble. It's providing data to science!
nEmesis will be presented at the Conference on Human Computation & Crowdsourcing in November.