An Efficient Data Indexing Approach on Hadoop Using Java Persistence API
Conference paper, year: 2010


Abstract

Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project that implements the MapReduce framework in Java, has attracted significant interest for distributed data mining. To address Hadoop's problems with globalization, random writes, and duration, a data indexing approach on Hadoop that uses the Java Persistence API (JPA) is developed through an implementation of a KD-tree algorithm on Hadoop. An improved intersection algorithm for distributed data indexing on Hadoop is also proposed; it runs in O(M + log N) time and is suited to cases that require multiple intersections. We compare the data indexing algorithm on an open dataset and a synthetic dataset in a modest cloud environment. The results show that the algorithms are feasible for large-scale data mining.
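The abstract does not include the authors' KD-tree code. As a rough, non-authoritative illustration of the data structure the approach builds on, here is a minimal in-memory 2-D KD-tree with an axis-aligned range query in plain Java. The class name `KdTree` and its methods are hypothetical; the sketch has no Hadoop or JPA dependencies and is neither the paper's distributed implementation nor its improved intersection algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration: a minimal in-memory 2-D KD-tree (not the paper's distributed version). */
public class KdTree {
    /** Each node stores one point and splits space on axis (depth % 2). */
    private static final class Node {
        final double[] point;
        final int axis;
        Node left, right;
        Node(double[] point, int axis) { this.point = point; this.axis = axis; }
    }

    private final Node root;

    public KdTree(List<double[]> points) {
        // Copy so that sorting during construction never mutates the caller's list.
        this.root = build(new ArrayList<>(points), 0);
    }

    /** Builds a balanced tree by splitting on the median along the current axis. */
    private static Node build(List<double[]> pts, int depth) {
        if (pts.isEmpty()) return null;
        int axis = depth % 2;
        pts.sort(Comparator.comparingDouble(p -> p[axis]));
        int mid = pts.size() / 2;
        Node node = new Node(pts.get(mid), axis);
        // Left subtree: coordinates <= median; right subtree: coordinates >= median.
        node.left = build(new ArrayList<>(pts.subList(0, mid)), depth + 1);
        node.right = build(new ArrayList<>(pts.subList(mid + 1, pts.size())), depth + 1);
        return node;
    }

    /** Returns every point p with lo[d] <= p[d] <= hi[d] for d = 0, 1. */
    public List<double[]> rangeQuery(double[] lo, double[] hi) {
        List<double[]> out = new ArrayList<>();
        search(root, lo, hi, out);
        return out;
    }

    private static void search(Node n, double[] lo, double[] hi, List<double[]> out) {
        if (n == null) return;
        double[] p = n.point;
        if (p[0] >= lo[0] && p[0] <= hi[0] && p[1] >= lo[1] && p[1] <= hi[1]) out.add(p);
        // Descend only into subtrees whose half-space can overlap the query box.
        if (lo[n.axis] <= p[n.axis]) search(n.left, lo, hi, out);
        if (hi[n.axis] >= p[n.axis]) search(n.right, lo, hi, out);
    }

    public static void main(String[] args) {
        List<double[]> pts = Arrays.asList(
            new double[]{2, 3}, new double[]{5, 4}, new double[]{9, 6},
            new double[]{4, 7}, new double[]{8, 1}, new double[]{7, 2});
        KdTree tree = new KdTree(pts);
        // Query the box x in [3, 8], y in [1, 5].
        for (double[] p : tree.rangeQuery(new double[]{3, 1}, new double[]{8, 5})) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Running `main` prints the three points inside the query box, (7, 2), (5, 4), and (8, 1), in traversal order. Pruning subtrees whose half-space cannot intersect the box is what lets a KD-tree avoid scanning the whole data set; how the paper persists and distributes such a tree through JPA on Hadoop is not shown here.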
Main file
An_Efficient_Data_Indexing_Approach_on_Hadoop_using_Java_Persistence_API.pdf (484.95 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01055056, version 1 (11-08-2014)

License

Attribution

Identifiers

Cite

Yang Lai, Shi Zhongzhi. An Efficient Data Indexing Approach on Hadoop Using Java Persistence API. 6th IFIP TC 12 International Conference on Intelligent Information Processing (IIP), Oct 2010, Manchester, United Kingdom. pp. 213-224, ⟨10.1007/978-3-642-16327-2_27⟩. ⟨hal-01055056⟩