frances: cloud-based historical text mining with deep learning and parallel processing

Lilin Yu*, Ash Charlton, Wilfrid Askins, Melissa Terras, Rosa Filgueira*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

frances is an advanced cloud-based digital platform for text mining that combines information extraction, knowledge graphs, natural language processing (NLP), deep learning, and parallel processing. It is designed specifically to unlock the full potential of digitised historical text collections, such as those held by the National Library of Scotland, offering cloud-based access and extended support for complex NLP analyses and data visualizations. frances supports real-time, recurrent text-mining operations and provides robust capabilities for temporal analysis, with automatic visualizations for easy inspection of results. In this paper, we present the motivation behind the development of frances, emphasizing its innovative design and novel implementation aspects. We also outline future development directions. Additionally, we evaluate the platform through two comprehensive case studies, one in history and one in publishing history. Feedback from participants in these studies demonstrates that frances accelerates their work and facilitates rapid testing and dissemination of ideas.
Original language: English
Title of host publication: Proceedings
Subtitle of host publication: 2023 IEEE 19th international conference on e-science (e-science)
Editors: George Angelos Papadopoulos, Rosa Filgueira, Rafael Ferreira Da Silva
Place of Publication: Piscataway, NJ
Publisher: IEEE
Chapter: 10254798
Number of pages: 10
ISBN (Electronic): 9798350322231
ISBN (Print): 9798350322248
Publication status: Published, 25 Sept 2023
Event: 19th IEEE International Conference on eScience, Limassol, Cyprus
Duration: 9 Oct 2023 to 13 Oct 2023
Conference number: 19
https://www.escience-conference.org/2023/

Publication series

Name: IEEE international conference on e-science
ISSN (Print): 2325-372X
ISSN (Electronic): 2325-3703

Conference

Conference: 19th IEEE International Conference on eScience
Abbreviated title: eScience
Country/Territory: Cyprus
City: Limassol
Period: 9/10/23 to 13/10/23

Keywords

  • Digitised historical collections
  • Information extraction
  • Apache Spark
  • Parallel processing
  • Text mining
  • Cloud-based platform
  • Knowledge graphs
  • Natural language processing
