NV-tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections

Herwig Lejsek 1, 2 Friðrik-Heidar Ásmundsson 1 Björn Þór Jónsson 2 Laurent Amsaleg 3
3 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. Most of the approaches published thus far have, however, only been tested on rather small collections. When large collections have been considered, high-performance environments have been used, in particular systems with a large main memory. Accessing data on disk has largely been avoided because disk operations are considered to be too slow. It has been shown, however, that using large amounts of memory is generally not an economic choice. Therefore, we propose the NV-tree, which is a very efficient disk-based data structure that can give good approximate answers to nearest neighbor queries with a single disk operation, even for very large collections of high-dimensional data. Using a single NV-tree, the returned results have high recall but contain a number of false positives. By combining two or three NV-trees, most of those false positives can be avoided while retaining the high recall. Finally, we compare the NV-tree to locality sensitive hashing, a popular method for ??-distance search. We show that they return results of similar quality, but the NV-tree uses many fewer disk reads
Type de document :
Article dans une revue
IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2009, 31 (5), pp.869-883. 〈http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4531743〉. 〈10.1109/TPAMI.2008.130〉
Liste complète des métadonnées

https://hal.inria.fr/hal-00794359
Contributeur : Patrick Gros <>
Soumis le : lundi 25 février 2013 - 16:21:57
Dernière modification le : jeudi 11 janvier 2018 - 06:20:10

Identifiants

Collections

Citation

Herwig Lejsek, Friðrik-Heidar Ásmundsson, Björn Þór Jónsson, Laurent Amsaleg. NV-tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections. IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2009, 31 (5), pp.869-883. 〈http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4531743〉. 〈10.1109/TPAMI.2008.130〉. 〈hal-00794359〉

Partager

Métriques

Consultations de la notice

205