28560 articles – 22061 references  [version française]

inria-00071910, version 1

Approximate k-Nearest-Neighbor Searches: A New Algorithm with Probabilistic Control of the Precision

Sid-Ahmed Berrani 1, Laurent Amsaleg () a2, Patrick Gros () a2

N° RR-4675 (2002)

Abstract: This paper describes a new approach for performing efficient approximate k-nearest-neighbor searches in high-dimensional databases. This approach allows a fine and intuitive control over the precision of the search by setting at run time the maximum probability for a vector that would be in the exact answer set to be missing in the approximate set of answers. One of the contribution of this paper is an off-line process computing controlled approximations shrinking each cluster within which feature vectors are enclosed. Those approximations are values for (approximate) radiuses of clusters, and they are computed for all the levels of precision defined beforehand. Therefore, to answer a query, the search process simply considers the appropriate approximations corresponding to the desired level of precision. This may cause the actual nearest-neighbors of the query point to be ignored. Our method, however, probabilistically bounds the chances for this to happen. This paper also presents a performance study of the implementa- tion. It shows, for example, that our method is 6.72 times faster than the sequential scan when it handles more then 5 10^6 24-dimensions vectors, even when the probability of missing one of the true nearest-neighbors is below 0.01.

  • a –  CNRS
  • 1:  Thomson Multimedia R&D France
  • Thomson
  • 2:  TEXMEX (INRIA - IRISA)
  • CNRS : UMR6074 – INRIA – Institut National des Sciences Appliquées (INSA) - Rennes – Université de Rennes 1
  • Domain : Computer Science/Other
  • Keywords : APPROXIMATE NEAREST NEIGHBOR SEARCHE / MULTIDIMENSIONAL INDEXING / CONTENT-BASED RETRIEVAL
  • Internal note : RR-4675
 
  • inria-00071910, version 1
  • oai:hal.inria.fr:inria-00071910
  • From: 
  • Submitted on: Tuesday, 23 May 2006 19:13:45
  • Updated on: Monday, 8 January 2007 14:23:19