TY - GEN
T1 - High-dimensional simplexes for supermetric search
AU - Connor, Richard
AU - Vadicamo, Lucia
AU - Rabitti, Fausto
PY - 2017
Y1 - 2017
N2 - In a metric space, triangle inequality implies that, for any three objects, a triangle with edge lengths corresponding to their pairwise distances can be formed. The n-point property is a generalisation of this where, for any (n+1) objects in the space, there exists an n-dimensional simplex whose edge lengths correspond to the distances among the objects. In general, metric spaces do not have this property; however in 1953, Blumenthal showed that any semi-metric space which is isometrically embeddable in a Hilbert space also has the n-point property. We have previously called such spaces supermetric spaces, and have shown that many metric spaces are also supermetric, including Euclidean, Cosine, Jensen-Shannon and Triangular spaces of any dimension. Here we show how such simplexes can be constructed from only their edge lengths, and we show how the geometry of the simplexes can be used to determine lower and upper bounds on unknown distances within the original space. By increasing the number of dimensions, these bounds converge to the true distance. Finally we show that for any Hilbert-embeddable space, it is possible to construct Euclidean spaces of arbitrary dimensions, from which these lower and upper bounds of the original space can be determined. These spaces may be much cheaper to query than the original. For similarity search, the engineering tradeoffs are good: we show significant reductions in data size and metric cost with little loss of accuracy, leading to a significant overall improvement in exact search performance.
AB - In a metric space, triangle inequality implies that, for any three objects, a triangle with edge lengths corresponding to their pairwise distances can be formed. The n-point property is a generalisation of this where, for any (n+1) objects in the space, there exists an n-dimensional simplex whose edge lengths correspond to the distances among the objects. In general, metric spaces do not have this property; however in 1953, Blumenthal showed that any semi-metric space which is isometrically embeddable in a Hilbert space also has the n-point property. We have previously called such spaces supermetric spaces, and have shown that many metric spaces are also supermetric, including Euclidean, Cosine, Jensen-Shannon and Triangular spaces of any dimension. Here we show how such simplexes can be constructed from only their edge lengths, and we show how the geometry of the simplexes can be used to determine lower and upper bounds on unknown distances within the original space. By increasing the number of dimensions, these bounds converge to the true distance. Finally we show that for any Hilbert-embeddable space, it is possible to construct Euclidean spaces of arbitrary dimensions, from which these lower and upper bounds of the original space can be determined. These spaces may be much cheaper to query than the original. For similarity search, the engineering tradeoffs are good: we show significant reductions in data size and metric cost with little loss of accuracy, leading to a significant overall improvement in exact search performance.
KW - Dimensionality reduction
KW - Metric embedding
KW - Metric search
KW - Supermetric space
U2 - 10.1007/978-3-319-68474-1_7
DO - 10.1007/978-3-319-68474-1_7
M3 - Conference contribution
AN - SCOPUS:85031297997
SN - 9783319684734
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 96
EP - 109
BT - Similarity Search and Applications - 10th International Conference, SISAP 2017, Proceedings
A2 - Borutta, Felix
A2 - Kroger, Peer
A2 - Seidl, Thomas
A2 - Beecks, Christian
PB - Springer-Verlag
T2 - 10th International Conference on Similarity Search and Applications, SISAP 2017
Y2 - 4 October 2017 through 6 October 2017
ER -