Automatic methods for coding historical occupation descriptions to standard classifications

Graham Njal Cameron Kirby, Jamie Carson, Fraser Dunlop, Chris Dibben, Alan Dearle, Lee Williamson, Eilidh Garrett, Alice Reid

Research output: Chapter in Book/Report/Conference proceeding (Chapter, peer-reviewed)

Abstract

The increasing availability of digitised registration records presents a significant opportunity for research in many fields including those of human geography, genealogy and medicine. Re-examining original records allows researchers to study relationships between factors such as occupation, cause of death, illness, and geographic region. This can be facilitated by coding these factors to standard classifications. This chapter describes work to develop a method for automatically coding the occupations from 29 million Scottish birth, death and marriage records, containing around 50 million occupation descriptions, to standard classifications. A range of approaches using text processing and supervised machine learning is evaluated, achieving classification performance of 75% micro-precision/recall, 61% macro-precision and 66% macro-recall on a smaller test set. Further development that may be needed for classification of the full data set is discussed.
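The abstract reports micro- and macro-averaged figures. The sketch below illustrates how these averages are conventionally computed for a single-label classifier. It is not the authors' evaluation code. The function name `micro_macro`, the occupation codes "A", "B" and "C", and the choice to score a class with a zero denominator as 0 are all assumptions made for illustration.

```python
from collections import Counter


def micro_macro(true_codes, predicted_codes):
    """Return (micro_p, micro_r, macro_p, macro_r) for single-label data.

    Illustrative only: each record has exactly one true code and one
    predicted code. Inputs must be non-empty and of equal length.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(true_codes, predicted_codes):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed

    classes = sorted(set(true_codes) | set(predicted_codes))

    def ratio(num, den):
        # Assumption: a class with no predictions (or no instances) scores 0.
        return num / den if den else 0.0

    # Micro-averaging pools the counts over all classes before dividing.
    total_tp = sum(tp.values())
    micro_p = ratio(total_tp, total_tp + sum(fp.values()))
    micro_r = ratio(total_tp, total_tp + sum(fn.values()))

    # Macro-averaging computes the score per class, then takes the plain mean,
    # so rare classes weigh as much as common ones.
    macro_p = sum(ratio(tp[c], tp[c] + fp[c]) for c in classes) / len(classes)
    macro_r = sum(ratio(tp[c], tp[c] + fn[c]) for c in classes) / len(classes)
    return micro_p, micro_r, macro_p, macro_r


# Hypothetical gold-standard and predicted codes for eight descriptions.
true_codes = ["A", "A", "A", "A", "B", "B", "C", "C"]
predicted_codes = ["A", "A", "A", "B", "B", "A", "C", "A"]
micro_p, micro_r, macro_p, macro_r = micro_macro(true_codes, predicted_codes)
print(micro_p, micro_r, macro_p, macro_r)
```

When every description receives exactly one code, each error counts once as a false positive and once as a false negative, so micro-precision and micro-recall coincide. This is consistent with the abstract quoting them as a single figure.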
Original language: English
Title of host publication: Population Reconstruction
Editors: Gerrit Bloothooft, Peter Christen, Kees Mandemakers, Marijn Schraagen
Publisher: Springer
Pages: 43-60
Number of pages: 18
ISBN (Electronic): 978-3-319-19884-2
ISBN (Print): 978-3-319-19883-5
Publication status: Published - Aug 2015

Keywords

  • Population reconstruction
  • Occupations
  • Automatic classification
