TY - JOUR
T1 - Re-ranking via local embeddings
T2 - a use case with permutation-based indexing and the nSimplex projection
AU - Vadicamo, Lucia
AU - Gennaro, Claudio
AU - Falchi, Fabrizio
AU - Chávez, Edgar
AU - Connor, Richard
AU - Amato, Giuseppe
N1 - The work was supported by Smart News - Social sensing for breaking news project (co-funded by the Tuscany region, CUP CIPE D58C15000270008), VISECH ARCO-CNR project (co-funded by the Tuscany region , CUP B56J17001330004), and AI4EU project (funded by the EC, H2020 - Contract n. 825619).
PY - 2021/1
Y1 - 2021/1
N2 - Approximate Nearest Neighbor (ANN) search is a prevalent paradigm for searching intrinsically high dimensional objects in large-scale data sets. Recently, the permutation-based approach for ANN has attracted a lot of interest due to its versatility in being used in the more general class of metric spaces. In this approach, the entire database is ranked by a permutation distance to the query. Typically, permutations allow the efficient selection of a candidate set of results, but typically to achieve high recall or precision this set has to be reviewed using the original metric and data. This can lead to a sizeable percentage of the database being recalled, along with many expensive distance calculations.To reduce the number of metric computations and the number of database elements accessed, we propose here a re-ranking based on a local embedding using the nSimplex projection. The nSimplex projection produces Euclidean vectors from objects in metric spaces which possess the n-point property. The mapping is obtained from the distances to a set of reference objects, and the original metric can be lower bounded and upper bounded by the Euclidean distance of objects sharing the same set of references.Our approach is particularly advantageous for extensive databases or expensive metric function. We reuse the distances computed in the permutations in the first stage, and hence the memory footprint of the index is not increased.An extensive experimental evaluation of our approach is presented, demonstrating excellent results even on a set of hundreds of millions of objects.
AB - Approximate Nearest Neighbor (ANN) search is a prevalent paradigm for searching intrinsically high dimensional objects in large-scale data sets. Recently, the permutation-based approach for ANN has attracted a lot of interest due to its versatility in being used in the more general class of metric spaces. In this approach, the entire database is ranked by a permutation distance to the query. Typically, permutations allow the efficient selection of a candidate set of results, but typically to achieve high recall or precision this set has to be reviewed using the original metric and data. This can lead to a sizeable percentage of the database being recalled, along with many expensive distance calculations.To reduce the number of metric computations and the number of database elements accessed, we propose here a re-ranking based on a local embedding using the nSimplex projection. The nSimplex projection produces Euclidean vectors from objects in metric spaces which possess the n-point property. The mapping is obtained from the distances to a set of reference objects, and the original metric can be lower bounded and upper bounded by the Euclidean distance of objects sharing the same set of references.Our approach is particularly advantageous for extensive databases or expensive metric function. We reuse the distances computed in the permutations in the first stage, and hence the memory footprint of the index is not increased.An extensive experimental evaluation of our approach is presented, demonstrating excellent results even on a set of hundreds of millions of objects.
KW - Distance bounds
KW - Metric local embeddings
KW - Metric search
KW - n-point property
KW - nSimplex projection
KW - Permutation-based indexing
UR - https://www.scopus.com/pages/publications/85079425963
U2 - 10.1016/j.is.2020.101506
DO - 10.1016/j.is.2020.101506
M3 - Article
AN - SCOPUS:85079425963
SN - 0306-4379
VL - 95
JO - Information Systems
JF - Information Systems
M1 - 101506
ER -