frances: a deep learning NLP and text mining web tool to unlock historical digital collections: a case study on the Encyclopaedia Britannica

Rosa Filgueira

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

This work presents frances, an integrated text mining tool that combines information extraction, knowledge graphs, NLP, deep learning, parallel processing and Semantic Web techniques to unlock the full value of historical digital textual collections. It offers researchers new capabilities to apply powerful analysis methods without being distracted by the details of the underlying technology and middleware. To demonstrate these capabilities, we use the first eight editions of the Encyclopaedia Britannica, offered by the National Library of Scotland (NLS), as an example digital collection to mine and analyse. We have developed novel parallel heuristics to extract terms, alongside their metadata, from the original collection, which provides a mix of unstructured and semi-structured input data, and we have populated a new knowledge graph with this information. Using the information stored in the knowledge graph, our Natural Language Processing models enable frances to perform advanced analyses that go significantly beyond simple search. Furthermore, frances allows users to create and run complex text mining analyses at scale. Our results show that the novel computational techniques developed within frances provide a vehicle for researchers to formalize and connect findings and insights derived from the analysis of large-scale digital corpora such as the Encyclopaedia Britannica.
Original language: English
Title of host publication: 2022 IEEE 18th International Conference on e-Science (e-Science)
Publisher: IEEE
Pages: 246-255
Number of pages: 10
ISBN (Electronic): 9781665461245
ISBN (Print): 9781665461252
Publication status: Published, 14 Dec 2022
Event: 18th IEEE International eScience Conference (eScience 2022), https://www.escience-conference.org/2022/, Salt Lake City, United States
Duration: 10 Oct 2022 to 14 Oct 2022
Conference number: 18

Publication series

Name: IEEE international conference on e-science and grid computing

Conference

Conference: 18th IEEE International eScience Conference (eScience 2022)
Abbreviated title: eScience 2022
Country/Territory: United States
City: Salt Lake City
Period: 10/10/22 to 14/10/22

Keywords

  • Information extraction
  • Knowledge graph
  • Transfer learning
  • Natural language processing
  • Text mining
  • Web tools
  • Semantic web
  • Parallel computing
  • Digital tools
  • Digital textual collections
  • Deep learning
  • Metadata
  • Knowledge engineering
  • Information retrieval
