Conference papers

Comparing Sanskrit Texts for Critical Editions: the sequences move problem

Abstract : A critical edition takes into account various versions of the same text in order to show the differences between two distinct versions, in terms of words that are missing, changed, omitted or displaced. Traditionally, Sanskrit is written without spaces between words, and the word order can be changed without altering the meaning of a sentence. This paper describes the characteristics that make Sanskrit text comparison a specific problem. It presents two methods for comparing Sanskrit texts, which can be used to develop a computer-assisted critical edition. The first method uses the longest common subsequence (L.C.S.), while the second uses the global alignment algorithm. Comparing them, we see that the second method provides better results, but that neither method can detect when a word or a sentence fragment has been moved. We then present a method based on N-grams that can detect such a move when the fragment is not too far from its original location. We show how the method behaves on several examples and consider possible future developments.
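The limitation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `lcs_diff` and `find_moves`, the `window` parameter, and the placeholder tokens are all assumptions made for illustration, and the texts are assumed to be already split into words. An L.C.S. comparison reports a displaced fragment as one deletion plus one insertion; a simple N-gram check over those leftovers can then pair them up as a move, provided the two positions are close enough.

```python
def lcs_table(a, b):
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    t = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def lcs_diff(a, b):
    """Return (deleted, inserted): (position, word) pairs left out of the LCS."""
    t = lcs_table(a, b)
    i = j = 0
    deleted, inserted = [], []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            deleted.append((i, a[i]))
            i += 1
        else:
            inserted.append((j, b[j]))
            j += 1
    deleted.extend((k, a[k]) for k in range(i, len(a)))
    inserted.extend((k, b[k]) for k in range(j, len(b)))
    return deleted, inserted


def find_moves(a, b, n=2, window=10):
    """Pair deleted and inserted n-grams that are identical and at most
    `window` positions apart (hypothetical heuristic, for illustration only)."""
    deleted, inserted = lcs_diff(a, b)
    del_pos = {i for i, _ in deleted}
    ins_pos = {j for j, _ in inserted}
    moves = []
    for p in range(len(a) - n + 1):
        if not all(k in del_pos for k in range(p, p + n)):
            continue
        gram = a[p:p + n]
        lo, hi = max(0, p - window), min(len(b) - n, p + window)
        for q in range(lo, hi + 1):
            if b[q:q + n] == gram and all(k in ins_pos for k in range(q, q + n)):
                moves.append((tuple(gram), p, q))
                break
    return moves


# Placeholder tokens standing in for segmented words of two versions.
version_a = ["w1", "w2", "w3", "w4", "w5", "w6"]
version_b = ["w3", "w4", "w5", "w6", "w1", "w2"]

print(lcs_diff(version_a, version_b))            # deletion + insertion only
print(find_moves(version_a, version_b))          # the pair reported as a move
print(find_moves(version_a, version_b, window=2))  # too far: nothing found
```

Restricting the search to a window mirrors the abstract's remark that a move is detected only when the fragment stays near its original location; the window size used here is arbitrary.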
Document type :
Conference papers

https://hal.inria.fr/hal-00796131
Contributor : Greyc Référent
Submitted on : Friday, July 11, 2014 - 11:53:24 AM
Last modification on : Saturday, June 25, 2022 - 9:47:03 AM
Long-term archiving on : Saturday, October 11, 2014 - 10:35:31 AM

File

ACTI-KEMMAR-2012-1.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-00796131, version 1

Citation

Nicolas Béchet, Marc Le Pouliquen, Marc Csernel. Comparing Sanskrit Texts for Critical Editions: the sequences move problem. 13th International Conference on Intelligent Text Processing and Computational Linguistics, Indian Institute of Technology Delhi, 2012, New Delhi, India. ⟨hal-00796131⟩
