Applying Sparse Machine Learning Methods to Twitter: Analysis of the Change in Pap Smear Guidelines
Abstract
Background
Social networking websites have fundamentally changed the way in which we communicate; two-thirds of adults on the Internet now use online social networking sites like Twitter. Although we know from previous work that individuals discuss health issues in online social media, it is often difficult to synthesize the vast amounts of textual data available.
Objective
We sought to use machine learning to summarize real-world Twitter discussions about cervical cancer, with the goals of 1) providing insight into individuals’ opinions and decision-making processes for cancer screening, and 2) determining whether these tools could detect changes in cancer discussion topics over time.
Methods
We applied methodologies from statistics, known as sparse machine learning algorithms, to U.S. Twitter messages about cervical cancer. We focused our searches on the first half of 2012, when there was a major change in the U.S. Preventive Services Task Force (USPSTF) guidelines for Pap smear screening: a recommendation for screening every three years rather than yearly for those at low risk. We analyzed all tweets containing “cervical cancer,” “pap smear,” and “pap test” in the first six months of the year, splitting the search into two time periods before and after the March USPSTF announcement: 1) January 1-March 13, 2012, and 2) March 14-June 30, 2012. The machine learning results were short lists of terms that identified messages more representative of one time period than the other. We also retrieved full Twitter messages to provide context for these terms.
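The abstract does not say which sparse algorithm was used, so the following is only a minimal sketch of the general idea, assuming an L1-penalized logistic regression fit by proximal gradient descent as a stand-in. The tweets are hypothetical examples invented for illustration (not study data), and every function name and parameter value (`l1`, `lr`, `steps`) is an assumption. The sketch filters messages by the three search phrases, labels each by time period, and keeps only the terms whose weights the L1 penalty leaves nonzero.

```python
import math
import re
from datetime import date

FOLLOW_UP_START = date(2012, 3, 14)  # first day of the follow-up period
KEYWORDS = ("cervical cancer", "pap smear", "pap test")

# Hypothetical tweets, invented for illustration only.
TWEETS = [
    (date(2012, 1, 10), "Study shows pap smear screening saves lives"),
    (date(2012, 2, 2), "Get more information on cervical cancer today"),
    (date(2012, 2, 20), "New study shows cervical cancer rates falling"),
    (date(2012, 3, 1), "Ask your doctor for information about the pap test"),
    (date(2012, 3, 15), "New guidelines say you need a pap smear every 3 years"),
    (date(2012, 4, 1), "Great weather for a run today"),
    (date(2012, 4, 2), "New guidelines for the pap test are out"),
    (date(2012, 5, 11), "Cervical cancer screening guidelines changed this year"),
    (date(2012, 6, 20), "You don't need a yearly pap smear under the new guidelines"),
]

def matches_search(text):
    """True if the message contains any of the search phrases."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)

def label_period(day):
    """0 = baseline (before March 14, 2012), 1 = follow-up."""
    return 0 if day < FOLLOW_UP_START else 1

def tokenize(text):
    """Set of lowercase words and two-word phrases in the message."""
    words = re.findall(r"[a-z']+", text.lower())
    return set(words) | {a + " " + b for a, b in zip(words, words[1:])}

def soft_threshold(value, cutoff):
    """Shrink value toward zero by cutoff; small values become exactly 0."""
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0

def fit_sparse_logistic(docs, labels, l1=0.05, lr=0.1, steps=1000):
    """L1-penalized logistic regression on binary term-presence features."""
    vocab = sorted(set().union(*docs))
    index = {term: j for j, term in enumerate(vocab)}
    rows = [[index[term] for term in doc] for doc in docs]
    weights = [0.0] * len(vocab)
    bias = 0.0  # the intercept is not penalized
    n = len(docs)
    for _ in range(steps):
        grad = [0.0] * len(vocab)
        grad_bias = 0.0
        for row, y in zip(rows, labels):
            z = bias + sum(weights[j] for j in row)
            error = 1.0 / (1.0 + math.exp(-z)) - y
            grad_bias += error
            for j in row:
                grad[j] += error
        bias -= lr * grad_bias / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        weights = [soft_threshold(w - lr * g / n, lr * l1)
                   for w, g in zip(weights, grad)]
    return {term: w for term, w in zip(vocab, weights) if w != 0.0}

kept = [(day, text) for day, text in TWEETS if matches_search(text)]
docs = [tokenize(text) for _, text in kept]
labels = [label_period(day) for day, _ in kept]
selected = fit_sparse_logistic(docs, labels)

# Negative weights point to the baseline period, positive to follow-up.
for term, weight in sorted(selected.items(), key=lambda item: item[1]):
    print(f"{weight:+.3f}  {term}")
```

In this sketch, terms with positive weights are more representative of the follow-up period and terms with negative weights of the baseline period. Raising `l1` shortens the list of surviving terms.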
Results
There were 2,549 messages about cervical cancer in the baseline period and 4,673 in the follow-up period. Terms in the baseline period included “information,” “study shows,” and “having sex,” while those in the follow-up period included “need,” “don’t need,” and “new guidelines.” Although these terms indicated that a substantial number of Twitter discussions in the follow-up period concerned the new screening guidelines, this was even more evident when examining full Twitter messages. In the follow-up period, many tweets referenced the new recommended screening intervals, such as, “New Pap guidelines! YAY! Under Age 21: no pap smear, no [human papillomavirus] HPV testing. Age 21-29 Pap every 3 years.”
Conclusions
We demonstrated that machine learning tools can be applied to cervical cancer prevention and screening discussions on Twitter. This method allowed us to show that there is substantial publicly available dialogue about cervical cancer screening on social media sites. Moreover, we were able to detect shifts in public discussions about the change in cervical cancer screening guidelines. However, examining full messages gave additional context to the discussions, suggesting that mixed methods approaches (i.e., combining machine learning with qualitative analyses) might be most fruitful in future work.

This work is licensed under a Creative Commons Attribution 3.0 License.