Twitter Discussions of Nonmedical Prescription Drug Use Correlate with Federal Survey Data

Nicholas Genes*, Mount Sinai School of Medicine, New York, United States
Michael Chary*, Icahn School of Medicine at Mount Sinai, New York, United States
Alex F. Manini, Icahn School of Medicine at Mount Sinai, New York, United States

Track: Research
Presentation Topic: Public (e-)health, population health technologies, surveillance
Presentation Type: Oral presentation
Submission Type: Single Presentation

Building: Sheraton Maui Resort
Room: C - Napili
Date: 2014-11-14 11:50 AM – 12:35 PM
Last modified: 2014-09-04



Background: The drugs people use, and the ways people learn about them, have changed in recent years. Methods for surveying drug use and educating the public about health risks must change accordingly. Analyzing social media offers a fast, inexpensive way to study the epidemiology of drug use and to identify points of intervention.

Objective: To (1) use unsupervised clustering to automatically categorize Twitter messages about prescription opioid use as discussions of medical or nonmedical use, and (2) determine, by state, whether Twitter message content and metadata about nonmedical prescription opioid use correlate with prior US federal survey-based estimates.

Methods: Twitter’s streaming API was queried for 60 days for geocoded tweets mentioning any prescription opioid currently sold in the US. Before linguistic analysis, the text was converted to lowercase, non-ASCII characters and stopwords were removed, and words were lemmatized. Messages were clustered according to semantic distance, a novel measure of textual similarity based on the average WordNet path distances between the recognizable words of each message. To control for geographic variation in overall tweet volume, we normalized the number of medically related tweets from each region by the total number of tweets emitted from that region. We compared the results with reference data from the 2010-2011 NSDUH (National Survey on Drug Use and Health) State Estimates of Substance Use and Mental Disorders.
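The preprocessing and clustering steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not define semantic distance precisely, so the sketch swaps in a symmetric best-match average: each recognizable word is paired with its nearest word in the other message by shortest hypernym-path length, and the two per-message means are averaged. It also assumes single-linkage clustering with a distance threshold. STOPWORDS, LEMMAS, and HYPERNYMS are hypothetical toy stand-ins for a real stopword list, a lemmatizer, and WordNet, and every function name is illustrative.

```python
import re
from collections import deque

# Hypothetical toy resources standing in for a real stopword list,
# a lemmatizer, and the WordNet noun hierarchy (child -> hypernym).
STOPWORDS = {"a", "an", "the", "i", "my", "me", "for", "to", "and",
             "of", "is", "it", "after", "some", "just"}
LEMMAS = {"pills": "pill", "took": "take", "taking": "take", "getting": "get"}
HYPERNYMS = {
    "oxycodone": "opioid", "hydrocodone": "opioid", "opioid": "drug",
    "pill": "drug", "drug": "substance", "substance": "entity",
    "pain": "symptom", "symptom": "state", "high": "state", "state": "entity",
    "surgery": "procedure", "procedure": "activity", "party": "activity",
    "activity": "entity",
}

# Undirected adjacency list, so a path length counts is-a links.
GRAPH = {}
for child, parent in HYPERNYMS.items():
    GRAPH.setdefault(child, set()).add(parent)
    GRAPH.setdefault(parent, set()).add(child)


def preprocess(text):
    """Lowercase, drop non-ASCII characters and stopwords, then lemmatize."""
    text = text.lower().encode("ascii", "ignore").decode("ascii")
    tokens = re.findall(r"[a-z]+", text)
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


def path_length(a, b):
    """Fewest taxonomy edges between two words, found by breadth-first search."""
    if a == b:
        return 0
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in GRAPH[node]:
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")


def semantic_distance(tokens_a, tokens_b):
    """Symmetric best-match average of path lengths over recognizable words."""
    words_a = [w for w in tokens_a if w in GRAPH]
    words_b = [w for w in tokens_b if w in GRAPH]
    if not words_a or not words_b:
        return float("inf")  # no recognizable words to compare
    a_to_b = sum(min(path_length(a, b) for b in words_b) for a in words_a) / len(words_a)
    b_to_a = sum(min(path_length(b, a) for a in words_a) for b in words_b) / len(words_b)
    return (a_to_b + b_to_a) / 2


def cluster(messages, threshold):
    """Single-linkage clustering: link any two messages within the threshold."""
    parent = list(range(len(messages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if semantic_distance(messages[i], messages[j]) <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


tweets = ["Took oxycodone for the pain after surgery",
          "hydrocodone pills for pain after my surgery",
          "party tonight, getting high"]
tokens = [preprocess(t) for t in tweets]
print(cluster(tokens, threshold=2.0))  # [[0, 1], [2]]
```

With the toy taxonomy, the two post-surgical pain tweets fall into one cluster and the party tweet into another. Replacing the toy tables with WordNet (for example, through NLTK's WordNet interface) would change the distances but not the structure of the sketch.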

Results: A total of 100,000 tweets mentioning prescription opioids were obtained and, using semantic distance, automatically separated into distinct clusters corresponding to medical use, nonmedical use, and mentions unrelated to use. At the state level, tweets about nonmedical use correlated with federal estimates (r = 0.60).
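The state-level comparison can be sketched as normalizing per-state tweet counts and then computing a correlation coefficient. This is a minimal sketch, assuming that r is Pearson's product-moment correlation (the abstract reports r without naming the coefficient) and that tweet counts and survey estimates are keyed by state; the function names are hypothetical.

```python
import math


def normalized_rates(topic_counts, total_counts):
    """Share of each state's tweets that fall in the topic cluster."""
    return {state: topic_counts.get(state, 0) / total
            for state, total in total_counts.items() if total > 0}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def state_level_correlation(topic_counts, total_counts, survey_estimates):
    """Correlate normalized tweet rates with survey estimates over shared states."""
    rates = normalized_rates(topic_counts, total_counts)
    states = sorted(set(rates) & set(survey_estimates))
    return pearson_r([rates[s] for s in states],
                     [survey_estimates[s] for s in states])
```

States missing from either source are dropped before correlating, which keeps the two vectors aligned. Dividing by each state's total tweet volume plays the role of the normalization step described in the Methods.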

Conclusions: This is the first demonstration that tweets (that is, extremely short pieces of text) about medical and nonmedical drug use can be automatically categorized based on linguistic features. Our estimates of nonmedical prescription opioid use correlate with federal estimates at the state level. Discussions of these and other drugs of abuse can be analyzed in real time and at finer geographic resolution, at a fraction of the cost of federal surveys, to help guide public health outreach, education, and policy solutions.

Medicine 2.0® is a registered trademark of JMIR Publications Inc.
This work is licensed under a Creative Commons Attribution 3.0 License.