Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce

Ge Song; Justine Rochas; Fabrice Huet; Frédéric Magoulès

Conference paper, Year: 2015

Ge Song

Role: Author
PersonId: 962581

Ecole Centrale Paris

Mathématiques Appliquées aux Systèmes - EA 4037

Justine Rochas

Role: Author
PersonId: 949561

Safe Composition of Autonomous applications with Large-SCALE Execution environment

Fabrice Huet

Role: Author
PersonId: 1352
IdHAL: fabrice-huet
IdRef: 076390829

Safe Composition of Autonomous applications with Large-SCALE Execution environment

Frédéric Magoulès

Role: Author
PersonId: 171049
IdHAL: magoulesf
ORCID: 0000-0002-1198-7539
IdRef: 089428048

Mathématiques Appliquées aux Systèmes - EA 4037

Abstract

Given a point p and a set of points S, the kNN operation finds the k points of S closest to p. It is a computationally intensive task with a wide range of applications, such as knowledge discovery and data mining. However, as the volume and dimensionality of data increase, only distributed approaches can perform such a costly operation in a reasonable time. Recent works have focused on implementing efficient solutions using the MapReduce programming model, because it is suited to large-scale data processing and can easily be executed in a distributed environment. Although these works provide different solutions to the same problem, each has its own constraints and properties, and there is no readily available comparison to help users choose the one most appropriate for their needs. This is the problem we address in this work. First, we show that all kNN implementations go through a common workflow, which we use as the basis for a classification. Second, we describe precisely the different techniques published so far. Finally, we provide a set of objective criteria that can be used to make informed decisions.
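The kNN operation defined above, and the kNN join named in the title (computing, for every point of a set R, its k nearest neighbors in a set S), can be sketched in Python as follows. This is a minimal single-machine sketch assuming Euclidean distance; the function names and the round-robin split of S into blocks are hypothetical illustrations, not any of the techniques surveyed in the paper. The partitioned variant only mimics a map/reduce split: each block of S yields local candidates, and a final merge keeps the k best. The merge is correct because each of the global k nearest neighbors is necessarily among the k nearest neighbors of its own block.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(p: Point, s: Sequence[Point], k: int) -> List[Point]:
    """Return the k points of s closest to p (Euclidean distance), nearest first."""
    return heapq.nsmallest(k, s, key=lambda q: math.dist(p, q))


def knn_join(r: Sequence[Point], s: Sequence[Point], k: int) -> Dict[Point, List[Point]]:
    """Brute-force kNN join: map every point of r to its k nearest neighbors in s."""
    return {p: knn(p, s, k) for p in r}


def partitioned_knn_join(r: Sequence[Point], s: Sequence[Point], k: int,
                         n_parts: int) -> Dict[Point, List[Point]]:
    """Hypothetical partition-and-merge sketch: split s into n_parts blocks,
    compute local kNN candidates per block ("map"), then keep the k best
    candidates for each query point ("reduce")."""
    blocks = [s[i::n_parts] for i in range(n_parts)]  # round-robin split (assumption)
    candidates: Dict[Point, List[Point]] = {p: [] for p in r}
    for block in blocks:  # in a real deployment, each block could go to a separate worker
        for p in r:
            candidates[p].extend(knn(p, block, k))
    # Every global nearest neighbor is also a local nearest neighbor in its block,
    # so selecting the k best among all local candidates gives the exact answer.
    return {p: knn(p, cands, k) for p, cands in candidates.items()}
```

For example, with `R = [(0.0, 0.0), (5.0, 5.0)]` and `S = [(1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 5.5), (10.0, 10.0)]`, `knn_join(R, S, 2)` maps `(0.0, 0.0)` to `[(1.0, 0.0), (0.0, 2.0)]` and `(5.0, 5.0)` to `[(6.0, 5.5), (4.0, 4.0)]`, and `partitioned_knn_join(R, S, 2, 2)` returns the same result. The brute-force join costs on the order of |R| × |S| distance computations, which is the cost that motivates distributing the work.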

Keywords

Hadoop, MapReduce, kNN, Join, Data Partition

Domains

Distributed, Parallel, and Cluster Computing [cs.DC]; Data Structures and Algorithms [cs.DS]; Computation and Language [cs.CL]

Main file

bare_conf.pdf (558.11 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01097337

Submitted on: Monday, January 12, 2015, 13:34:08

Last modified on: Monday, February 26, 2024, 11:22:13

Long-term archiving on: Thursday, September 10, 2015, 23:40:52

Dates and versions

hal-01097337, version 1 (2015-01-12)

Identifiers

HAL Id: hal-01097337, version 1

Cite

Ge Song, Justine Rochas, Fabrice Huet, Frédéric Magoulès. Solutions for Processing K Nearest Neighbor Joins for Massive Data on MapReduce. 23rd Euromicro International Conference on Parallel, Distributed and Network-based Processing, Mar 2015, Turku, Finland. ⟨hal-01097337⟩

Collections

CNRS INRIA I3S MAS GRID5000 CENTRALESUPELEC INRIA2 MICS UNIV-PARIS-SACLAY UNIV-COTEDAZUR SILECS GS-ENGINEERING GS-COMPUTER-SCIENCE

423 views

1818 downloads
