Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce

Ge Song 1 Justine Rochas 2 Fabrice Huet 2 Frédéric Magoulès 1
2 SCALE - Safe Composition of Autonomous applications with Large-SCALE Execution environment
CRISAM - Inria Sophia Antipolis - Méditerranée, COMRED - COMmunications, Réseaux, systèmes Embarqués et Distribués
Abstract: Given a point p and a set of points S, the kNN operation finds the k closest points to p in S. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. However, as the volume and the dimension of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model, because it is suitable for large-scale data processing and can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each one has particular constraints and properties, and there is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. First, we show that all kNN implementations go through a common workflow, which we use as a basis for classification. Second, we describe precisely the different techniques published so far. Finally, we provide a set of objective criteria that can be used to make informed decisions.
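To make the definition in the abstract concrete, here is a minimal single-machine sketch of the kNN operation, not one of the MapReduce solutions the paper surveys. It assumes Euclidean distance and points represented as tuples of numbers; the abstract specifies neither. The names `knn` and `knn_join` are hypothetical, and `knn_join` reflects the usual reading of a kNN join (apply the operation to every point of a second set R), which is an assumption here.

```python
import heapq
import math


def knn(p, S, k):
    """Return the k points of S closest to p.

    Assumption: distance is Euclidean (math.dist); the abstract does not
    fix a particular metric.
    """
    return heapq.nsmallest(k, S, key=lambda s: math.dist(p, s))


def knn_join(R, S, k):
    """Brute-force kNN join: map each point r of R to its k nearest points in S.

    Assumption: this is the usual definition of a kNN join; it costs
    O(|R| * |S|) distance computations, which is what motivates
    distributed approaches on large data.
    """
    return {r: knn(r, S, k) for r in R}


if __name__ == "__main__":
    S = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.0)]
    print(knn((0.0, 0.0), S, 2))
    print(knn_join([(0.0, 0.0), (5.0, 5.0)], S, 1))
```

The quadratic cost of the brute-force join is the reason the volume of data matters: every point of R is compared with every point of S.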
Document type:
Conference paper
23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, Mar 2015, Turku, Finland. 2015

Cited literature: 30 references

https://hal.inria.fr/hal-01097337
Contributor: Justine Rochas
Submitted on: Monday, January 12, 2015 - 13:34:08
Last modified on: Thursday, December 8, 2016 - 10:31:43
Document(s) archived on: Thursday, September 10, 2015 - 23:40:52

File

bare_conf.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01097337, version 1


Citation

Ge Song, Justine Rochas, Fabrice Huet, Frédéric Magoulès. Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, Mar 2015, Turku, Finland. 2015. 〈hal-01097337〉
