High-dimensional approximate nearest neighbor: k -d Generalized Randomized Forests

Abstract : We propose a new data-structure, the generalized randomized k-d forest, or k-d GeRaF, for approximate nearest neighbor searching in high dimensions. In particular, we introduce new ran-domization techniques to specify a set of independently constructed trees where search is performed simultaneously, hence increasing accuracy. We omit backtracking, and we optimize distance computations , thus accelerating queries. We release public domain software GeRaF and we compare it to existing implementations of state-of-the-art methods including BBD-trees, Locality Sensitive Hashing, randomized k-d forests, and product quantization. Experimental results indicate that our method would be the method of choice in dimensions around 1,000, and probably up to 10,000, and pointsets of cardinality up to a few hundred thousands or even one million; this range of inputs is encountered in many critical applications today. For instance, we handle a real dataset of 10 6 images represented in 960 dimensions with a query time of less than 1sec on average and 90% responses being true nearest neighbors.
Document type :
Preprints, Working Papers, ...
Complete list of metadatas

Cited literature [17 references]  Display  Hide  Download

https://hal.inria.fr/hal-02370318
Contributor : Yannis Avrithis <>
Submitted on : Tuesday, November 19, 2019 - 1:40:37 PM
Last modification on : Thursday, November 21, 2019 - 1:21:05 AM

File

1603.09596.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02370318, version 1
  • ARXIV : 1603.09596

Citation

Yannis Avrithis, Ioannis Z. Emiris, Giorgos Samaras. High-dimensional approximate nearest neighbor: k -d Generalized Randomized Forests. 2016. ⟨hal-02370318⟩

Share

Metrics

Record views

26

Files downloads

27