An approach to population linkage using graph databases

Alan Dearle*, Graham Njal Cameron Kirby, Ozgur Akgun

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

Abstract

We report on a database project which is in the process of linking 29 million vital event records encompassing the entire population of Scotland from 1856 until 1973. Since these records contain no common identifiers, the challenge is to form a pedigree by performing probabilistic linkage over the records. We describe the linkage methodology used to create links between records, for example identifying the birth and marriage records of a single person, and discuss the database technologies employed in the project. A graph database (Neo4j) is used to store both the original vital event records and the links made between them. A metric index is used to find potential links efficiently. Finally, we demonstrate how linkage can be improved by augmenting links based on record distance thresholds with local graph analysis.
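
As a rough illustration of the threshold-based linkage step described in the abstract, the following minimal sketch compares records field by field with an edit distance and proposes a link whenever the summed distance falls within a threshold. It is not the authors' implementation: the field names (`forename`, `surname`), the choice of Levenshtein distance, the function names, and the threshold value are all assumptions made for illustration. The exhaustive pairwise scan stands in for the metric index, which the abstract says is used to find potential links efficiently.

```python
from typing import Dict, List, Tuple

# Hypothetical fields to compare; the fields used in the project are not stated in the abstract.
FIELDS = ("forename", "surname")


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (a true metric)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def record_distance(r1: Dict[str, str], r2: Dict[str, str]) -> int:
    """Sum of per-field edit distances; a sum of metrics is itself a metric."""
    return sum(levenshtein(r1[f], r2[f]) for f in FIELDS)


def candidate_links(births: List[Dict[str, str]],
                    marriages: List[Dict[str, str]],
                    threshold: int) -> List[Tuple[str, str, int]]:
    """Return (birth id, marriage id, distance) for every pair within the threshold.

    This brute-force scan is only for clarity; a metric index would answer the
    same range query without comparing every pair.
    """
    links = []
    for b in births:
        for m in marriages:
            d = record_distance(b, m)
            if d <= threshold:
                links.append((b["id"], m["id"], d))
    return links
```

Because the combined distance obeys the triangle inequality, the same range query could in principle be served by a metric index instead of the nested loop. Each returned pair could then be stored as an edge between two record nodes in a graph database.
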
Original language: English
Pages: 291-302
Number of pages: 12
Publication status: Published - 5 Jul 2023
Event: 31st Italian Symposium on Advanced Database Systems - Galzignano Terme, Italy
Duration: 2 Jul 2023 – 5 Jul 2023
https://sebd2023.dei.unipd.it

Conference

Conference: 31st Italian Symposium on Advanced Database Systems
Abbreviated title: SEBD 2023
Country/Territory: Italy
Period: 2/07/23 – 5/07/23

Keywords

  • Metric indexing
  • Metric search
  • Data linkage
  • Graph databases
  • Similarity search
