The imagination of crowds: Conversational AAC language modeling using crowdsourcing and large data sources

Keith Vertanen, Per Ola Kristensson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Augmented and alternative communication (AAC) devices enable users with certain communication disabilities to participate in everyday conversations. Such devices often rely on statistical language models to improve text entry by offering word predictions. These predictions can be improved if the language model is trained on data that closely reflects the style of the users’ intended communications. Unfortunately, there is no large dataset consisting of genuine AAC messages. In this paper we demonstrate how we can crowdsource the creation of a large set of fictional AAC messages. We show that these messages model conversational AAC better than the currently used datasets based on telephone conversations or newswire text. We leverage our crowdsourced messages to intelligently select sentences from much larger sets of Twitter, blog and Usenet data. Compared to a model trained only on telephone transcripts, our best performing model reduced perplexity on three test sets of AAC-like communications by 60–82% relative. This translated to a potential keystroke savings in a predictive keyboard interface of 5–11%.
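The two metrics quoted in the abstract, relative perplexity reduction and keystroke savings, can be illustrated with a minimal sketch. The function names and all numbers below are hypothetical and chosen only to make the arithmetic visible; they are not taken from the paper, and the keystroke-savings formula is the commonly used definition, assumed here rather than confirmed by this record.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-word natural-log probabilities (hypothetical helper)."""
    return math.exp(-sum(log_probs) / len(log_probs))

def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline

def keystroke_savings(keys_with_prediction, keys_without_prediction):
    """Percentage of keystrokes saved by using word predictions (assumed definition)."""
    return 100.0 * (1.0 - keys_with_prediction / keys_without_prediction)

# Illustrative values only: a baseline model assigning probability 0.01 to
# every word, and an improved model assigning 0.04 to every word.
baseline_ppl = perplexity([math.log(0.01)] * 4)
improved_ppl = perplexity([math.log(0.04)] * 4)

print(f"baseline perplexity: {baseline_ppl:.1f}")
print(f"improved perplexity: {improved_ppl:.1f}")
print(f"relative reduction:  {relative_reduction(baseline_ppl, improved_ppl):.1f}%")
print(f"keystroke savings:   {keystroke_savings(90, 100):.1f}%")
```

A "relative" reduction is measured against the baseline model's perplexity, so a drop from 100 to 25 in this toy setup is a 75% relative reduction.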
Original language: English
Title of host publication: Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP 2011)
Publisher: Association for Computational Linguistics
Pages: 700-711
Number of pages: 12
ISBN (Print): 978-193728411-4
Publication status: Published - 2011
Event: 2011 Conference on Empirical Methods in Natural Language Processing - Edinburgh, United Kingdom
Duration: 27 Jul 2011 to 29 Jul 2011

Conference

Conference: 2011 Conference on Empirical Methods in Natural Language Processing
Abbreviated title: EMNLP 2011
Country/Territory: United Kingdom
City: Edinburgh
Period: 27/07/11 to 29/07/11
