X. Wu, X. Zhu, G. Wu, and W. Ding, Data Mining with Big Data, pp.97-107, 2014.

R. Taylor, An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics, BMC Bioinformatics, vol.11, issue.Suppl 12, 2010.
DOI : 10.1186/1471-2105-11-S12-S1

A. Matsunaga, M. Tsugawa, and J. Fortes, CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications, 2008 IEEE Fourth International Conference on eScience, p.62, 2008.
DOI : 10.1109/eScience.2008.62

A. Huang, Similarity Measures for Text Document Clustering, 2008.

S. Tata and J. Patel, Estimating the Selectivity of tf-idf based Cosine Similarity Predicates, pp.7-12, 2007.

T. Elsayed, J. Lin, and D. Oard, Pairwise document similarity in large collections with MapReduce, Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies Short Papers, HLT '08, pp.265-268, 2008.
DOI : 10.3115/1557690.1557767

L. Bin and G. Yuan, Improvement of TF-IDF Algorithm Based on Hadoop Framework, Proceedings of the 2nd International Conference on Computer Application and System Modeling, 2012.
DOI : 10.2991/iccasm.2012.98

J. Bank and B. Cole, Calculating the Jaccard Similarity Coefficient with Map Reduce for Entity Pairs in Wikipedia, 2008.

J. Wan, W. Yu, and X. Xu, Design and Implement of Distributed Document Clustering Based on MapReduce, pp.278-280, 2009.

P. Zhou, J. Lei, and W. Ye, Large-Scale Data Sets Clustering Based on MapReduce and Hadoop, pp.5956-5963, 2011.

R. Mihalcea, C. Corley, and C. Strapparava, Corpus-based and Knowledge-based Measures of Text Semantic Similarity, 2005.

P. Turney and P. Pantel, From Frequency to Meaning: Vector Space Models of Semantics, pp.141-188, 2010.

V. V. Raghavan and S. Wong, A critical analysis of vector space model for information retrieval, Journal of the American Society for Information Science, vol.37, issue.5, pp.279-287, 1986.
DOI : 10.1002/(SICI)1097-4571(198609)37:5<279::AID-ASI1>3.0.CO;2-Q

K. Kalaivendhan and P. Sumathi, An Efficient Clustering Method To Find Similarity Between The Documents, pp.2532-2535, 2014.

K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The Hadoop Distributed File System, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 2010.
DOI : 10.1109/MSST.2010.5496972

X. Lin, Z. Meng, C. Xu, and M. Wang, A Practical Performance Model for Hadoop MapReduce, 2012 IEEE International Conference on Cluster Computing Workshops, 2012.
DOI : 10.1109/ClusterW.2012.24

J. Ekanayake, S. Pallickara, and G. Fox, MapReduce for Data Intensive Scientific Analyses, 2008 IEEE Fourth International Conference on eScience, p.59, 2008.
DOI : 10.1109/eScience.2008.59

J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, pp.1-13.

R. Lämmel, Google's MapReduce programming model - Revisited, Science of Computer Programming, vol.70, issue.1, 2008.
DOI : 10.1016/j.scico.2007.07.001