CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and MapReduce

Abstract: As the Internet grows rapidly, huge amounts of text need to be processed in a short time, which calls for fast, scalable text-processing methods. This paper proposes a method for pairwise text similarity on massive data sets, using the Cosine Similarity metric and the tf-idf (Term Frequency-Inverse Document Frequency) normalization method. The approach centers on the MapReduce paradigm, a model for processing large data sets in parallel with a distributed algorithm on computer clusters. Applying the MapReduce model at each step of the proposed method improves text-processing speed and scalability relative to traditional methods. The CSMR (Cosine Similarity with MapReduce) method is currently at the implementation stage. Precise, analytical conclusions about its efficiency will be drawn once all project phases have been completed and reviewed.
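The pipeline named in the abstract (term counting, tf-idf weighting, pairwise cosine similarity) can be illustrated with a minimal single-machine sketch. This is not the authors' CSMR implementation: the function names are hypothetical, tokenization is assumed to be lowercase whitespace splitting, idf is assumed to be log(N / df), and the map and reduce phases are simulated in memory rather than run on a cluster.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def map_term_counts(doc_id, text):
    """Map step (hypothetical): emit ((term, doc_id), count) per distinct term."""
    for term, count in Counter(text.lower().split()).items():
        yield (term, doc_id), count


def tfidf_vectors(docs):
    """Reduce the mapped counts into one sparse tf-idf vector per document."""
    counts = defaultdict(dict)   # doc_id -> {term: raw count}
    doc_freq = Counter()         # term -> number of documents containing it
    for doc_id, text in docs.items():
        for (term, _), count in map_term_counts(doc_id, text):
            counts[doc_id][term] = count
            doc_freq[term] += 1
    n_docs = len(docs)
    vectors = {}
    for doc_id, term_counts in counts.items():
        total = sum(term_counts.values())
        # Assumed weighting: tf = count / document length, idf = log(N / df).
        vectors[doc_id] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(docs):
    """Final step: one similarity score per unordered pair of documents."""
    vectors = tfidf_vectors(docs)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


docs = {
    "d1": "map reduce cluster",
    "d2": "map reduce text",
    "d3": "cosine similarity text",
}
sims = pairwise_similarity(docs)
```

In this toy corpus, `d1` and `d3` share no terms, so their similarity is zero, while `d1` and `d2` share two terms and score highest. In a distributed setting each function would correspond to a separate job whose intermediate key-value pairs are shuffled between nodes. The all-pairs loop here is quadratic in the number of documents.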
Document type:
Conference paper
Lazaros Iliadis; Ilias Maglogiannis; Harris Papadopoulos; Spyros Sioutas; Christos Makris. 10th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2014, Rhodes, Greece. Springer, IFIP Advances in Information and Communication Technology, AICT-437, pp.211-220, 2014, Artificial Intelligence Applications and Innovations. 〈10.1007/978-3-662-44722-2_23〉

Cited literature [19 references]

https://hal.inria.fr/hal-01391048
Contributor: Hal Ifip
Submitted on: Wednesday, November 2, 2016 - 17:17:19
Last modified on: Friday, December 1, 2017 - 01:16:37
Document(s) archived on: Friday, February 3, 2017 - 14:53:54

File

978-3-662-44722-2_23_Chapter.p...
Files produced by the author(s)

License

Distributed under a Creative Commons Attribution 4.0 International License

Citation

Giannakouris-Salalidis Victor, Plerou Antonia, Sioutas Spyros. CSMR: A Scalable Algorithm for Text Clustering with Cosine Similarity and MapReduce. Lazaros Iliadis; Ilias Maglogiannis; Harris Papadopoulos; Spyros Sioutas; Christos Makris. 10th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI), Sep 2014, Rhodes, Greece. Springer, IFIP Advances in Information and Communication Technology, AICT-437, pp.211-220, 2014, Artificial Intelligence Applications and Innovations. 〈10.1007/978-3-662-44722-2_23〉. 〈hal-01391048〉

Metrics

Record views

92

File downloads

249