Abstract
We define BitPart (Bitwise representations of binary Partitions), a novel exact search mechanism intended for use in high-dimensional spaces. In outline, a fixed set of reference objects is used to define a large set of regions within the original space, and each data item is characterised according to its containment within these regions. In contrast with other mechanisms, only a subset of this information is selected, according to the query, before a search within the re-cast space is performed. Partial data representations are accessed only if they are known to be potentially useful in calculating the exact query solution.
Our mechanism requires Ω(N log N) space to evaluate a query, where N is the cardinality of the data, and therefore does not scale as well as previously defined mechanisms with low-dimensional data. However, it has recently been shown that, for a nearest neighbour search in high dimensions, a sequential scan of the data is essentially unavoidable. This result has been suspected for a long time, and has been referred to as the curse of dimensionality in this context.
In the light of this result, the compromise achieved by this work is to make the best possible use of the available fast memory, and to offer great potential for parallel query evaluation. To our knowledge, it gives the best compromise currently known for performing exact search over data whose dimensionality is too high to allow the useful application of metric indexing, yet is still sufficiently low to give at least some traction from the metric and supermetric properties.
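The outline given in the abstract (reference objects define regions, each data item is recorded as inside or outside each region, and only the regions that are informative for a given query are read) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes ball-shaped regions only, Euclidean distance, a threshold (range) query, and Python integers used as bitsets; the names `build_bitmaps` and `range_search` are hypothetical.

```python
import math


def build_bitmaps(data, pivots, radii):
    """Build one bitmap per ball region (assumed region shape).

    Bit i of a bitmap is set when data[i] lies inside the ball
    centred on the reference object with the given radius.
    """
    bitmaps = []
    for p, r in zip(pivots, radii):
        bits = 0
        for i, x in enumerate(data):
            if math.dist(x, p) <= r:
                bits |= 1 << i
        bitmaps.append(bits)
    return bitmaps


def range_search(q, t, data, pivots, radii, bitmaps):
    """Return indices of all items within distance t of q (exact)."""
    all_ones = (1 << len(data)) - 1
    candidates = all_ones
    for p, r, bits in zip(pivots, radii, bitmaps):
        d = math.dist(q, p)
        if d + t <= r:
            # Triangle inequality: every solution lies inside this ball.
            candidates &= bits
        elif d - t > r:
            # Triangle inequality: every solution lies outside this ball.
            candidates &= ~bits & all_ones
        # Otherwise the query ball straddles the boundary, so this
        # bitmap carries no usable information and is never read.

    # Verify the surviving candidates against the original data.
    results = []
    i = 0
    while candidates:
        if candidates & 1 and math.dist(q, data[i]) <= t:
            results.append(i)
        candidates >>= 1
        i += 1
    return results
```

The final verification step keeps the search exact: the bitwise phase can only discard items that the triangle inequality proves are not solutions, so any false positives it leaves behind are removed by the distance check.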
Original language | English |
---|---|
Article number | 101493 |
Number of pages | 14 |
Journal | Information Systems |
Volume | 95 |
Early online date | 4 Feb 2020 |
DOIs | |
Publication status | Published - Jan 2021 |
Keywords
- Similarity search
- Metric space
- Metric indexing
- Metric search
- Four-point property
Fingerprint
Dive into the research topics of 'BitPart: exact metric search in high(er) dimensions'. Together they form a unique fingerprint.
Datasets
- BitPart
Dearle, A. (Creator) & Connor, R. (Creator), GitHub, 2020
https://github.com/aldearle/BitPart
Dataset