Abstract
The increasing availability of digitised registration records presents a significant opportunity for research in many fields including those of human geography, genealogy and medicine. Re-examining original records allows researchers to study relationships between factors such as occupation, cause of death, illness, and geographic region. This can be facilitated by coding these factors to standard classifications. This paper describes work to develop a method for automatically coding the occupations from 29 million Scottish birth, death and marriage records, containing around 50 million occupation descriptions, to standard classifications. A range of approaches using text processing and supervised machine learning is evaluated, achieving accuracy of 92.3 ± 0.2% on a smaller test set. The paper speculates on further development that may be needed for classification of the full data set.
Original language | English |
---|---|
Publication status | Accepted/In press - 2014 |
Event | Workshop on Population Reconstruction - International Institute of Social History, Amsterdam, Netherlands; Duration: 19 Feb 2014 → 21 Feb 2014 |
Workshop
Workshop | Workshop on Population Reconstruction |
---|---|
Country/Territory | Netherlands |
City | Amsterdam |
Period | 19/02/14 → 21/02/14 |
Projects
Related to 'Automatic methods for coding historical occupation descriptions to standard classifications'.
- Digitising Scotland
  Kirby, G. N. C. (PI)
  Economic & Social Research Council
  1/09/12 → 31/10/14
  Project: Standard
- Digitising Scotland
  Dibben, C. J. L. (PI), Feng, Z. (CoI) & Williamson, L. (CoI)
  Economic & Social Research Council
  1/08/12 → 30/10/14
  Project: Standard
- Extension to Longitudinal Studies Centre: Extension for the Longitudinal Studies Centre - Scotland from 2012 to 2017
  Findlay, A. M. (PI), Dibben, C. J. L. (CoI) & Feng, Z. (CoI)
  Economic & Social Research Council
  1/08/12 → 31/07/17
  Project: Standard