Hierarchical Dirichlet process for tracking complex topical structure evolution and its application to autism research literature

Adham Beykikhoshk*, Oggie Arandelovic, Svetha Venkatesh, Dinh Phung

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption, which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with autism spectrum disorder (ASD), an increasingly important research subject of significant social and healthcare relevance. In addition to the collected ASD literature corpus, which we have made freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching of this large and rapidly growing corpus of literature, as well as for knowledge discovery from it.
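
The third component, the temporal similarity graph, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the edge threshold, or which epochs are linked, so everything below is an assumption made for illustration: topics are toy word distributions standing in for the output of the epoch-wise hierarchical Dirichlet process step (which is omitted), similarity is the Bhattacharyya coefficient, edges link only consecutive epochs, and the threshold of 0.5 is arbitrary. The function names are hypothetical and are not taken from the paper.

```python
from math import sqrt


def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two word distributions (dict: word -> prob).

    Assumed similarity measure; the abstract does not name one.
    """
    return sum(sqrt(p[w] * q[w]) for w in p.keys() & q.keys())


def build_similarity_graph(epochs, threshold=0.5):
    """Link topics in consecutive epochs whose similarity meets the threshold.

    epochs: list of epochs, each a list of topic word distributions.
    Returns edges as ((epoch, topic), (epoch + 1, topic), similarity).
    """
    edges = []
    for t in range(len(epochs) - 1):
        for i, p in enumerate(epochs[t]):
            for j, q in enumerate(epochs[t + 1]):
                sim = bhattacharyya(p, q)
                if sim >= threshold:
                    edges.append(((t, i), (t + 1, j), sim))
    return edges


def classify_events(epochs, edges):
    """Label each topic node by its in-degree and out-degree in the graph."""
    out_deg, in_deg = {}, {}
    for src, dst, _ in edges:
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1

    last = len(epochs) - 1
    events = {}
    for t, topics in enumerate(epochs):
        for i in range(len(topics)):
            node = (t, i)
            labels = []
            if t > 0 and in_deg.get(node, 0) == 0:
                labels.append("emergence")      # no predecessor
            if t < last and out_deg.get(node, 0) == 0:
                labels.append("disappearance")  # no successor
            if out_deg.get(node, 0) > 1:
                labels.append("splitting")      # several successors
            if in_deg.get(node, 0) > 1:
                labels.append("merging")        # several predecessors
            events[node] = labels
    return events


# Toy topics for two epochs (invented data, not from the ASD corpus).
epochs = [
    [{"gene": 0.5, "mutation": 0.5}, {"therapy": 1.0}],
    [{"gene": 1.0}, {"mutation": 1.0}, {"vaccine": 1.0}],
]
edges = build_similarity_graph(epochs)
events = classify_events(epochs, edges)
for node, labels in sorted(events.items()):
    print(node, labels or ["evolution or unchanged"])
```

In this toy run the first topic of epoch 0 links to two topics in epoch 1 and is labelled as splitting, the second has no successor and is labelled as disappearing, and the third topic of epoch 1 has no predecessor and is labelled as emerging. Reading events off node degrees is one simple design choice; a one-to-one link is treated here as plain evolution.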

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer-Verlag
Pages: 550-562
Number of pages: 13
Volume: 9077
ISBN (Print): 9783319180373
Publication status: Published - 2015
Event: 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2015 - Ho Chi Minh City, Vietnam
Duration: 19 May 2015 to 22 May 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9077
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2015
Country/Territory: Vietnam
City: Ho Chi Minh City
Period: 19/05/15 to 22/05/15
