Conference papers

Techniques of Czech Language Lossless Text Compression

Abstract : Linguistic and grammatical properties extracted by text analysis can be used to achieve a better compression ratio in lossless compression of natural-language text. This work deals with the use of word order, word categories, and grammatical rules in sentences and sentence units in the Czech language. It exploits special grammatical properties of Czech that differ from those of, for example, English. Further, an algorithm is designed that searches for similarities in the analyzed sentence structures and processes them into the final compressed file. The sentence units are analyzed with a special tool that allows parsing on multiple levels.
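
As an illustration only, and not the authors' algorithm, the general idea of separating sentence structure from the words that fill it can be sketched in Python. The toy lexicon, the category labels, and the function names `tag_pattern`, `encode`, and `decode` are all assumptions made for this sketch; a real system would obtain word categories from a Czech parser rather than a hand-written table.

```python
# Hypothetical toy lexicon (word -> word category); illustrative only.
LEXICON = {
    "pes": "NOUN", "psa": "NOUN", "kočku": "NOUN", "muž": "NOUN",
    "vidí": "VERB", "honí": "VERB",
    "malý": "ADJ", "velký": "ADJ",
}


def tag_pattern(sentence):
    """Return the tuple of word categories for a list of words."""
    return tuple(LEXICON.get(word, "UNK") for word in sentence)


def encode(sentences):
    """Split sentences into a pattern table, pattern ids, and word streams.

    Sentences that share the same category sequence reuse one pattern id,
    and words are appended to one stream per category.
    """
    patterns = []   # distinct category sequences, in order of first use
    ids = []        # one pattern index per sentence
    streams = {}    # category -> words in order of appearance
    for sentence in sentences:
        pattern = tag_pattern(sentence)
        if pattern not in patterns:
            patterns.append(pattern)
        ids.append(patterns.index(pattern))
        for word, category in zip(sentence, pattern):
            streams.setdefault(category, []).append(word)
    return patterns, ids, streams


def decode(patterns, ids, streams):
    """Rebuild the original sentences exactly (the split is lossless)."""
    cursors = {category: 0 for category in streams}
    sentences = []
    for pattern_id in ids:
        sentence = []
        for category in patterns[pattern_id]:
            sentence.append(streams[category][cursors[category]])
            cursors[category] += 1
        sentences.append(sentence)
    return sentences


if __name__ == "__main__":
    text = [
        ["malý", "pes", "honí", "kočku"],
        ["velký", "muž", "vidí", "psa"],
        ["pes", "vidí", "kočku"],
    ]
    patterns, ids, streams = encode(text)
    print(ids)
    print(decode(patterns, ids, streams) == text)
```

In this sketch the first two sentences share one structure, so only its index is repeated. The pattern ids and per-category word streams would then be passed to a general-purpose entropy coder, a step that is omitted here.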

Cited literature [15 references]
Contributor : Hal Ifip
Submitted on : Friday, November 17, 2017 - 3:45:51 PM
Last modification on : Saturday, June 1, 2019 - 11:34:02 AM
Long-term archiving on : Sunday, February 18, 2018 - 2:36:28 PM




Distributed under a Creative Commons Attribution 4.0 International License




Jiří Ševčík, Jiří Dvorský. Techniques of Czech Language Lossless Text Compression. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.265-276, ⟨10.1007/978-3-319-45378-1_24⟩. ⟨hal-01637512⟩


