Journal of Data Mining and Digital Humanities http://jdmdh.episciences.org ISSN 2416-5999, an open-access journal

Another perspective is the study of the efficiency and scalability of statistico-semantic methods. We are currently working on implementing methods such as word2vec (Mikolov et al., 2013) and on improving the memory efficiency of the algorithm of Mousselly-Sergieh et al. (2013), based on stream processing methods.

J. P. Bao, J. Y. Shen, H. Y. Liu, and X. D. Liu, A fast document copy detection model, Soft Computing - A Fusion of Foundations, Methodologies and Applications, vol.10, pp.1-41, 2006.
DOI : 10.1007/s00500-005-0463-2

M. Büchler, A. Geßner, T. Eckart, and G. Heyer, Unsupervised Detection and Visualisation of Textual Reuse on Ancient Greek Texts, Journal of the Chicago Colloquium on Digital Humanities and Computer Science, vol.1, pp.2-3, 2010.

M. Büchler, G. Crane, M. Moritz, and A. Babeu, Increasing Recall for Text Re-use in Historical Documents to Support Research in the Humanities, Lecture Notes in Computer Science, vol.7489, pp.95-100, 2012.
DOI : 10.1007/978-3-642-33290-6_11

M. Büchler, G. Crane, M. Mueller, P. Burns, G. Heyer et al., One Step Closer To Paraphrase Detection On Historical Texts: About The Quality of Text Re-use Techniques and the Ability to Learn Paradigmatic Relations, Journal of the Chicago Colloquium on Digital Humanities and Computer Science, 2011.

B. Coulie, La lemmatisation des textes grecs et byzantins : une approche particulière de la langue et des auteurs, Byzantion : revue internationale des études byzantines, pp.35-54, 1996.

D. P. Lyras, K. N. Sgarbas, and N. D. Fakotakis, Applying Similarity Measures for Automatic Lemmatization: A Case Study for Modern Greek and English, International Journal on Artificial Intelligence Tools, vol.18, issue.05, p.1043, 1142.
DOI : 10.1075/jgl.4.09ral

A. Ernst-Gerlach and G. Crane, Identifying quotations in reference works and primary materials, Research and Advanced Technology for Digital Libraries, pp.78-87, 2008.
DOI : 10.1007/978-3-540-87599-4_9

F. Alvi, E.-S. M. El-Alfy, W. G. Al-Khatib, and R. E. Abdel-Aal, Analysis and extraction of sentence-level paraphrase sub-corpus in CS education, Proceedings of the 2012 ACM SIGITE Conference, pp.49-54, 2012.

J. Cordeiro, G. Dias, and P. Brazdil, New Functions for Unsupervised Asymmetrical Paraphrase Detection, Journal of Software, vol.2, issue.4, pp.12-23, 2007.
DOI : 10.4304/jsw.2.4.12-23

J. Lee, A computational model of text reuse in ancient literary texts, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pp.472-479, 2007.

J. Leskovec, L. Backstrom, and J. Kleinberg, Meme-tracking and the dynamics of the news cycle, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '09, pp.497-506, 2009.
DOI : 10.1145/1557019.1557077

R. Lukashenko, V. Graudina, and J. Grundspenkis, Computer-based plagiarism detection methods and tools, Proceedings of the 2007 international conference on Computer systems and technologies, CompSysTech '07, pp.1-6, 2007.
DOI : 10.1145/1330598.1330642

T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient Estimation of Word Representations in Vector Space. Computation and Language, 2013.

T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, Distributed Representations of Words and Phrases and their Compositionality, NIPS, pp.3111-3119, 2013.

H. Mousselly-Sergieh, E. Egyed-Zsigmond, G. Gianini, M. Döller, H. Kosch et al., Tag Similarity in Folksonomies, pp.319-334, 2013.

R. Nawab, M. Stevenson, and P. Clough, Detecting Text Reuse with Modified and Weighted N-grams, Proceedings of the ACM First Joint Conference on Lexical and Computational Semantics, pp.54-58, 2012.