Overcoming data scarcity of Twitter: using tweets as bootstrap with application to autism-related topic content analysis

Adham Beykikhoshk, Oggie Arandelovic, Dinh Phung, Svetha Venkatesh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.

Original language: English
Title of host publication: Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015
Editors: J Pei, F Silvestri, J Tang
Publisher: ACM
Pages: 1354-1361
Number of pages: 8
ISBN (Print): 9781450338547
Publication status: Published - 25 Aug 2015
Event: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015 - Paris, France
Duration: 25 Aug 2015 to 28 Aug 2015

Conference

Conference: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015
Country/Territory: France
City: Paris
Period: 25/08/15 to 28/08/15
