Abstract
Record linkage is the process of identifying records that refer to the same real-world entities in situations where entity identifiers are unavailable. Records are linked on the basis of similarity between common attributes, with every pair being classified as a link or non-link depending on their similarity. Linkage is usually performed in a three-step process: first, groups of similar candidate records are identified using indexing, then pairs within the same group are compared in more detail, and finally classified. Even state-of-the-art indexing techniques, such as locality sensitive hashing, have potential drawbacks. They may fail to group together some true matching records with high similarity, or they may group records with low similarity, leading to high computational overhead. We propose using metric space indexing (MSI) to perform complete linkage, resulting in a parameter-free process combining indexing, comparison and classification into a single step delivering complete and efficient record linkage. An evaluation on real-world data from several domains shows that linkage using MSI can yield better quality than current indexing techniques, with similar execution cost, without the need for domain knowledge or trial and error to configure the process.
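The idea of folding indexing, comparison and classification into a single similarity search can be illustrated with a small sketch. The abstract does not say which metric index or distance function the authors use, so everything below is an assumption made for illustration: records are plain strings, the distance is Levenshtein edit distance, the index is a simple pivot table, and the names `PivotIndex`, `link`, `threshold` and `n_pivots` are hypothetical. Because edit distance is a true metric, the triangle inequality lets the index discard candidates without ever missing a pair that lies within the threshold, which is the sense in which such a search is complete.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings; a true metric."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


class PivotIndex:
    """Hypothetical pivot-table index: stores each record's distance to a few pivots."""

    def __init__(self, records, pivots, dist=levenshtein):
        self.records = list(records)
        self.pivots = list(pivots)
        self.dist = dist
        self.table = [[dist(r, p) for p in self.pivots] for r in self.records]

    def range_query(self, query, threshold):
        """Return every indexed record within `threshold` of `query`."""
        q_to_p = [self.dist(query, p) for p in self.pivots]
        hits = []
        for rec, row in zip(self.records, self.table):
            # Triangle inequality: |d(q, p) - d(r, p)| <= d(q, r).
            # If the lower bound already exceeds the threshold, skip safely.
            if any(abs(qp - rp) > threshold for qp, rp in zip(q_to_p, row)):
                continue
            if self.dist(query, rec) <= threshold:
                hits.append(rec)
        return hits


def link(left, right, threshold, n_pivots=2):
    """Link each record in `left` to all records in `right` within `threshold`."""
    index = PivotIndex(right, right[:n_pivots])
    return [(l, r) for l in left for r in index.range_query(l, threshold)]


left = ["jon smith", "mary jones"]
right = ["john smith", "marie jones", "peter brown"]
links = link(left, right, threshold=2)
```

In this sketch a pair is classified as a link exactly when the range query returns it, so there is no separate blocking step to configure. The pivots only affect how many full distance computations are needed, not which pairs are found.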
Original language | English |
---|---|
Title of host publication | Advances in Knowledge Discovery and Data Mining |
Subtitle of host publication | 22nd Pacific-Asia Conference, PAKDD 2018, Melbourne, VIC, Australia, June 3-6, 2018, Proceedings, Part III |
Editors | Dinh Phung, Vincent S. Tseng, Geoff Webb, Bao Ho, Mohadeseh Ganji, Lida Rashidi |
Place of Publication | Cham |
Publisher | Springer |
Pages | 89-101 |
Number of pages | 13 |
ISBN (Electronic) | 9783319930404 |
ISBN (Print) | 9783319930398 |
DOIs | |
Publication status | Published - 2018 |
Event | 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018), Melbourne, Australia. Duration: 3 Jun 2018 → 6 Jun 2018. Conference number: 22. http://prada-research.net/pakdd18/ |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Publisher | Springer |
Volume | 10939 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2018) |
---|---|
Abbreviated title | PAKDD 2018 |
Country/Territory | Australia |
City | Melbourne |
Period | 3/06/18 → 6/06/18 |
Internet address | http://prada-research.net/pakdd18/ |
Keywords
- Entity resolution
- Data matching
- Similarity search
- Blocking
Projects
3 Finished
- Digitising Scotland: Digitising Scotland
  Kirby, G. N. C. (PI)
  Economic & Social Research Council
  31/10/14 → 31/10/20
  Project: Standard
- Administrative Data Research Centres: ESRC - Admin Data Service - Scottish Consortium
  Kirby, G. N. C. (PI)
  1/11/13 → 31/10/18
  Project: Standard
- SFC SMART Tourism: Content beyond the edge of internet
  Dearle, A. (PI)
  1/12/12 → 30/06/14
  Project: Standard
Profiles
- Ozgur Akgun
  - School of Computer Science - Senior Lecturer, Director of Impact
  - Centre for Interdisciplinary Research in Computational Algebra
  Person: Academic
Datasets
- Using metric space indexing for complete and efficient record linkage (dataset)
  Akgun, O. (Creator), Dearle, A. (Creator), Kirby, G. N. C. (Creator) & Christen, P. (Creator), GitHub, 26 Feb 2018
  https://github.com/digitisingscotland/pakdd2018-metric-linkage
  Dataset