S. Agarwal, N. Snavely, I. Simon, S. Seitz, and R. Szeliski, Building Rome in a Day, 2009.

A. Angeli, D. Filliat, S. Doncieux, and J. Meyer, Fast and Incremental Method for Loop-Closure Detection Using Bags of Visual Words, IEEE Transactions on Robotics, vol.24, issue.5, 2008.
DOI : 10.1109/TRO.2008.2004514
URL : https://hal.archives-ouvertes.fr/hal-00652598

S. Bhat, M. Berger, G. Simon, and F. Sur, Transitive closure based visual words for point matching in video sequence, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00486749

Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995.

D. Comaniciu and P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.24, issue.5, 2002.
DOI : 10.1109/34.1000236
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.3832

C. Engels, F. Fraundorfer, and D. Nistér, Integration of tracked and recognized features for locally and globally robust structure from motion, VISAPP International Workshop on Robotic Perception, 2008.

B. Georgescu, I. Shimshoni, and P. Meer, Mean shift based clustering in high dimensions: a texture classification example, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238382

I. Gordon and D. G. Lowe, Scene modeling, recognition and tracking with invariant image features, ISMAR, vol.1, issue.2, 2004.

B. K. P. Horn, Closed-form solution of absolute orientation using unit quaternions, J. Opt. Soc. Am. A, 1987.

E. Hsiao, A. Collet, and M. Hebert, Making specific features less discriminative to improve point-based 3D object recognition, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010.
DOI : 10.1109/CVPR.2010.5539981
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.187.7990

H. Jégou, M. Douze, and C. Schmid, On the burstiness of visual elements, 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
DOI : 10.1109/CVPR.2009.5206609

V. Lepetit and P. Fua, Monocular model-based 3D tracking of rigid objects: A survey, Foundations and Trends in Computer Graphics and Vision, pp.1-89, 2005.

V. Lepetit, F. Moreno-Noguer, and P. Fua, EPnP: An Accurate O(n) Solution to the PnP Problem, International Journal of Computer Vision, vol.81, issue.2, 2009.
DOI : 10.1007/s11263-008-0152-6
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.184.1090

V. Lepetit, J. Pilet, and P. Fua, Point matching as a classification problem for fast and robust object pose estimation, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004.
DOI : 10.1109/CVPR.2004.1315170
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.117.8934

D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the Seventh IEEE International Conference on Computer Vision, pp.1150-1157, 1999.
DOI : 10.1109/ICCV.1999.790410
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.121.4065

J. Morel and G. Yu, ASIFT: A New Framework for Fully Affine Invariant Image Comparison, SIAM Journal on Imaging Sciences, vol.2, issue.2, 2009.
DOI : 10.1137/080732730
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.155.1721

D. M. Mount and S. Arya, ANN: A library for approximate nearest neighbor searching

D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), vol.2, 2006.
DOI : 10.1109/CVPR.2006.264

F. Schaffalitzky and A. Zisserman, Automated location matching in movies, Computer Vision and Image Understanding, vol.92, issue.2-3, 2003.
DOI : 10.1016/j.cviu.2003.06.008

J. Sivic and A. Zisserman, Video Google: a text retrieval approach to object matching in videos, Proceedings Ninth IEEE International Conference on Computer Vision, 2003.
DOI : 10.1109/ICCV.2003.1238663

A. Vedaldi and B. Fulkerson, VLFeat, Proceedings of the International Conference on Multimedia (MM '10), 2010.
DOI : 10.1145/1873951.1874249

J. Xiao, J. Chen, D. Yeung, and L. Quan, Structuring Visual Words in 3D for Arbitrary-View Object Localization, 2008.
DOI : 10.1007/978-3-540-88690-7_54