ChapbooksScotland-KG: A Knowledge Graph for representing the "Chapbooks Printed In Scotland" (1671 - 1893)

Dataset

Description

This Knowledge Graph represents the "Chapbooks Printed In Scotland" collection (years: 1671 - 1893) in RDF (ttl format). The dataset comprises more than 3,000 chapbooks printed in Scotland from the 17th to the 19th century. They form part of the Lauriston Castle Collection, which was bequeathed to the National Library of Scotland (NLS) in 1926. The collection includes some 500 chapbook volumes containing around 5,500 individual items, more than half of which were printed in Scotland. The raw dataset is provided by the NLS at this link. Like other NLS data collections, it is originally provided using two XML schemas: METS for descriptive, structural, technical and administrative metadata (title, author, publisher, etc.) and ALTO for encoding the OCR text of a page.

In this work, we extracted the information from the METS and ALTO XML files using the defoe tool, for which we developed a new information extraction query, and created a new Knowledge Graph called ChapbooksScotland-KG. ChapbooksScotland-KG uses the NLS Ontology to represent the extracted information. During the information extraction phase, we also applied several techniques to mitigate two common OCR errors: the long-S and line-break hyphenation.
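As an illustration of the two OCR errors mentioned above, the following is a minimal Python sketch of one possible clean-up. It is not the defoe implementation: the function names (fix_long_s, fix_hyphenation, clean_ocr) are hypothetical, and it assumes the long-S survives OCR as the character U+017F ("ſ") and that a hyphenated word is split by a hyphen immediately followed by a newline.

```python
import re

# Assumption: the long-S appears in the OCR text as U+017F ("ſ").
LONG_S = "\u017f"

# A word character, a hyphen, optional spaces, a newline, optional spaces,
# then another word character: the shape of a word broken across two lines.
_HYPHEN_BREAK = re.compile(r"(\w)-[ \t]*\n[ \t]*(\w)")


def fix_long_s(text: str) -> str:
    """Replace every long-S character with a modern 's'."""
    return text.replace(LONG_S, "s")


def fix_hyphenation(text: str) -> str:
    """Join words that were hyphenated across a line break."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


def clean_ocr(text: str) -> str:
    """Apply both mitigations, long-S first, then hyphenation."""
    return fix_hyphenation(fix_long_s(text))


if __name__ == "__main__":
    print(clean_ocr("the bleſ-\nsed houſe"))
```

This sketch has known limits: it cannot recover a long-S that the OCR engine misread as "f", and it will also join genuine compound words (such as "well-known") when they happen to break at the hyphen. Handling those cases would likely need a dictionary lookup.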

ChapbooksScotland-KG contains 352,270 RDF triples, with information from 2,728 series and 3,080 volumes. Each Series can have several Volumes, Supplements, and references to Books; it also has an Editor and a Publisher, which can be a Person or an Organization. A Volume has several Pages, each containing text. The data model of ChapbooksScotland-KG can be found here.
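The relationships described above can be pictured with a small Python sketch. This is only an illustration of the described structure, not the NLS Ontology itself: all class and field names are hypothetical, and it assumes that both the Editor and the Publisher may be either a Person or an Organization.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Person:
    name: str


@dataclass
class Organization:
    name: str


# Assumption: editors and publishers can each be a person or an organization.
Agent = Union[Person, Organization]


@dataclass
class Page:
    number: int
    text: str  # OCR text of the page


@dataclass
class Volume:
    title: str
    pages: List[Page] = field(default_factory=list)


@dataclass
class Supplement:
    title: str


@dataclass
class Series:
    title: str
    volumes: List[Volume] = field(default_factory=list)
    supplements: List[Supplement] = field(default_factory=list)
    book_references: List[str] = field(default_factory=list)
    editor: Optional[Agent] = None
    publisher: Optional[Agent] = None
```

For example, a series with a single one-page volume and an organization as publisher (all values are placeholders, not records from the dataset) could be built as Series(title="Example series", volumes=[Volume(title="Example volume", pages=[Page(number=1, text="...")])], publisher=Organization(name="Example printer")).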
Date made available: 2022
Publisher: Zenodo
