Abstract
Similarity search using metric indexing techniques is largely a solved problem in low-dimensional spaces. However techniques based only on the triangle inequality property start to fail as dimensionality increases.
Since proper metric spaces allow a finite projection of any three objects into a 2D Euclidean space, the notion of angle can be validly applied among any three (but no more) objects. High dimensionality is known to have interesting effects on angles in vector spaces, but to our knowledge this has not been studied in more general metric spaces. Here, we consider the use of angles among objects in combination with distances.
As dimensionality becomes higher, we show that the variance in sampled angles reduces. Furthermore, sampled angles also become correlated with inter-object distances, giving different distributions between query solutions and non-solutions. We show the theoretical underpinnings of this observation in unbounded high-dimensional Euclidean spaces, and then examine how the pure property is reflected in some real-world high dimensional spaces. Our experiments on both generated and real world datasets demonstrate that these observations can have an important impact on the tractability of search as dimensionality increases.
Since proper metric spaces allow a finite projection of any three objects into a 2D Euclidean space, the notion of angle can be validly applied among any three (but no more) objects. High dimensionality is known to have interesting effects on angles in vector spaces, but to our knowledge this has not been studied in more general metric spaces. Here, we consider the use of angles among objects in combination with distances.
As dimensionality becomes higher, we show that the variance in sampled angles reduces. Furthermore, sampled angles also become correlated with inter-object distances, giving different distributions between query solutions and non-solutions. We show the theoretical underpinnings of this observation in unbounded high-dimensional Euclidean spaces, and then examine how the pure property is reflected in some real-world high dimensional spaces. Our experiments on both generated and real world datasets demonstrate that these observations can have an important impact on the tractability of search as dimensionality increases.
Original language | English |
---|---|
Title of host publication | Similarity Search and Applications |
Subtitle of host publication | 13th International Conference, SISAP 2020, Copenhagen, Denmark, September 30–October 2, 2020, Proceedings |
Editors | Shin'ichi Satoh, Lucia Vadicamo, Arthur Zimek, Fabio Carrara, Ilaria Bartolini, Martin Aumüller, Björn Þór Jónsson, Rasmus Pagh |
Place of Publication | Cham |
Publisher | Springer |
Pages | 233-247 |
ISBN (Electronic) | 9783030609368 |
ISBN (Print) | 9783030609351 |
DOIs | |
Publication status | Published - 2020 |
Event | 13th International Conference on Similarity Search and Applications, SISAP 2020 - Virtual Duration: 30 Sept 2020 → 2 Oct 2020 http://www.sisap.org/2020/ |
Publication series
Name | Lecture Notes in Computer Science (Information Systems and Applications, incl. Internet/Web, and HCI) |
---|---|
Publisher | Springer |
Volume | 12440 |
ISSN (Print) | 0302-9743 |
Conference
Conference | 13th International Conference on Similarity Search and Applications, SISAP 2020 |
---|---|
Abbreviated title | SISAP 2020 |
Period | 30/09/20 → 2/10/20 |
Internet address |
Keywords
- Metric search
- High dimensional space
Fingerprint
Dive into the research topics of 'Sampled angles in high-dimensional spaces'. Together they form a unique fingerprint.Profiles
Datasets
-
SISAP2020_angles
Dearle, A. (Creator) & Connor, R. (Creator), GitHub, 2020
https://github.com/aldearle/SISAP2020_angles
Dataset