An Efficient Data Indexing Approach on Hadoop Using Java Persistence API

Abstract: Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project that implements the MapReduce framework in Java, has attracted significant interest in distributed data mining. To resolve the problems of globalization, random writes, and duration in Hadoop, a data indexing approach on Hadoop using the Java Persistence API (JPA) is elaborated through an implementation of a KD-tree algorithm on Hadoop. An improved intersection algorithm for distributed data indexing on Hadoop is also proposed; it runs in O(M + log N) time and is suited to scenarios involving multiple intersections. We compare the data indexing algorithms on an open dataset and a synthetic dataset in a modest cloud environment. The results show that the algorithms are feasible for large-scale data mining.
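The abstract refers to implementing a KD-tree algorithm on Hadoop. For readers unfamiliar with the structure, the following is a minimal single-machine KD-tree in Java with an axis-aligned range query. It is a textbook construction offered only as a point of reference, not the authors' distributed, JPA-backed implementation, and it does not reproduce their O(M + log N) intersection algorithm; the class and method names (`KdTree`, `rangeSearch`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical minimal in-memory KD-tree over k-dimensional points (illustrative sketch only). */
public class KdTree {
    private static final class Node {
        final double[] point;
        final Node left, right;

        Node(double[] point, Node left, Node right) {
            this.point = point;
            this.left = left;
            this.right = right;
        }
    }

    private final int k;
    private final Node root;

    public KdTree(List<double[]> points, int k) {
        this.k = k;
        this.root = build(new ArrayList<>(points), 0);
    }

    // Split on axis (depth % k) at the median; left subtree holds values <= split, right holds values >= split.
    private Node build(List<double[]> pts, int depth) {
        if (pts.isEmpty()) return null;
        int axis = depth % k;
        pts.sort(Comparator.comparingDouble(p -> p[axis]));
        int mid = pts.size() / 2;
        return new Node(pts.get(mid),
                build(new ArrayList<>(pts.subList(0, mid)), depth + 1),
                build(new ArrayList<>(pts.subList(mid + 1, pts.size())), depth + 1));
    }

    /** Returns every indexed point p with lo[d] <= p[d] <= hi[d] for all dimensions d. */
    public List<double[]> rangeSearch(double[] lo, double[] hi) {
        List<double[]> out = new ArrayList<>();
        range(root, 0, lo, hi, out);
        return out;
    }

    private void range(Node n, int depth, double[] lo, double[] hi, List<double[]> out) {
        if (n == null) return;
        boolean inside = true;
        for (int d = 0; d < k; d++) {
            if (n.point[d] < lo[d] || n.point[d] > hi[d]) {
                inside = false;
                break;
            }
        }
        if (inside) out.add(n.point);
        int axis = depth % k;
        // Prune: descend only into subtrees whose half-space can overlap the query box.
        if (lo[axis] <= n.point[axis]) range(n.left, depth + 1, lo, hi, out);
        if (hi[axis] >= n.point[axis]) range(n.right, depth + 1, lo, hi, out);
    }

    public static void main(String[] args) {
        List<double[]> pts = Arrays.asList(
                new double[]{2, 3}, new double[]{5, 4}, new double[]{9, 6},
                new double[]{4, 7}, new double[]{8, 1}, new double[]{7, 2});
        KdTree tree = new KdTree(pts, 2);
        // Prints [7.0, 2.0], [5.0, 4.0], [8.0, 1.0], one per line.
        for (double[] p : tree.rangeSearch(new double[]{3, 0}, new double[]{8, 5})) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Pruning by the splitting axis at each level is what lets a KD-tree answer range queries without scanning every point; how the authors partition and persist such a tree across Hadoop nodes is described in the paper itself.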
Document type:
Conference paper
Zhongzhi Shi; Sunil Vadera; Agnar Aamodt; David Leake. 6th IFIP TC 12 International Conference on Intelligent Information Processing (IIP), Oct 2010, Manchester, United Kingdom. Springer, IFIP Advances in Information and Communication Technology, AICT-340, pp.213-224, 2010, Intelligent Information Processing V. 〈10.1007/978-3-642-16327-2_27〉

Cited literature: 13 references

https://hal.inria.fr/hal-01055056
Contributor: Hal Ifip
Submitted: Monday, 11 August 2014, 13:10:02
Last modified: Friday, 3 November 2017, 22:24:06
Document(s) archived: Wednesday, 26 November 2014, 21:55:58

File

An_Efficient_Data_Indexing_App...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Yang Lai, Shi Zhongzhi. An Efficient Data Indexing Approach on Hadoop Using Java Persistence API. Zhongzhi Shi; Sunil Vadera; Agnar Aamodt; David Leake. 6th IFIP TC 12 International Conference on Intelligent Information Processing (IIP), Oct 2010, Manchester, United Kingdom. Springer, IFIP Advances in Information and Communication Technology, AICT-340, pp.213-224, 2010, Intelligent Information Processing V. 〈10.1007/978-3-642-16327-2_27〉. 〈hal-01055056〉

Metrics

Record views: 249
File downloads: 397