GazetteersScotland-KG: A Knowledge Graph for representing the Gazetteers of Scotland (1803-1901)

Dataset

Description

This Knowledge Graph represents the information of the "Gazetteers of Scotland" (1803-1901) collection in RDF (Turtle, .ttl format). The collection comprises twenty volumes of the most popular descriptive gazetteers of Scotland in the 19th century. Principal places in Scotland, including towns, counties, castles, glens, antiquities and parishes, are listed alphabetically, and each entry includes detailed historical and geographical information about the place. The raw dataset is provided by the National Library of Scotland (NLS) at this link. Like other NLS data collections, it is originally provided using two XML schemas: METS for descriptive, structural, technical and administrative metadata (title, author, publisher, etc.), and ALTO for encoding the OCR text of each page.

In this work, we extracted the information from the METS and ALTO XML files using the defoe tool, for which we developed a new information extraction query, and created a new Knowledge Graph called GazetteersScotland-KG. The GazetteersScotland-KG uses the NLS Ontology to represent the extracted information. Furthermore, during the information extraction phase, we employed several techniques to mitigate two common OCR errors: the long-S and line-break hyphenation.
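The two OCR errors mentioned above can be illustrated with a minimal Python sketch. This is not the defoe implementation, whose exact techniques are not described here; the function names are hypothetical, and the sketch assumes that the long-S survives OCR as the character "ſ" (U+017F) and that a hyphen at the end of a line always marks a split word. In practice the long-S is often misread as "f", which needs dictionary-based correction that this sketch does not attempt.

```python
import re

LONG_S = "\u017f"  # the long-s glyph; assumed to be preserved by the OCR


def fix_long_s(text: str) -> str:
    """Replace every long-s character with a modern 's'."""
    return text.replace(LONG_S, "s")


def fix_line_break_hyphenation(text: str) -> str:
    """Rejoin words split by a hyphen at the end of a line.

    Assumption: a hyphen directly before a line break marks a split word.
    Genuine compounds broken at their hyphen (e.g. 'well-' / 'known')
    are therefore also merged, which is a known limitation of this sketch.
    """
    return re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)


def clean_page(text: str) -> str:
    """Apply both fixes to the OCR text of one page (hypothetical helper)."""
    return fix_line_break_hyphenation(fix_long_s(text))
```

For example, `clean_page("the pari\u017fh of Aber-\ndeen")` yields `"the parish of Aberdeen"`.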

The GazetteersScotland-KG contains 354,998 RDF triples. It has information from 12 series and 20 volumes. Each series can have several volumes, and has an editor, publisher, MMSID, shelf locator, publication year, etc. A volume has several pages, each with its text. The data model of the GazetteersScotland-KG can be found here.
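The series, volume and page hierarchy described above can be sketched as plain Python data classes. The class and field names are hypothetical and chosen only to mirror the description; they are not the class or property names of the NLS Ontology, and the field types are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Page:
    number: int  # position of the page within its volume (assumed field)
    text: str    # OCR text of the page


@dataclass
class Volume:
    title: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Series:
    title: str
    editor: str
    publisher: str
    mmsid: str
    shelf_locator: str
    publication_year: int
    volumes: List[Volume] = field(default_factory=list)

    def page_count(self) -> int:
        """Total number of pages across all volumes of this series."""
        return sum(len(volume.pages) for volume in self.volumes)
```

A series is then built by nesting these objects, for example a `Series` holding one `Volume` that holds its `Page` objects. In the Knowledge Graph itself, the same relationships are expressed as RDF triples rather than nested objects.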


 
Date made available: 2022
Publisher: Zenodo
