Conference papers

Techniques of Czech Language Lossless Text Compression

Abstract : Lossless compression of natural-language text can achieve a better compression ratio by exploiting linguistic and grammatical properties extracted through analysis of the text. This work deals with the use of word order, word categories, and grammatical rules in sentences and sentence units in the Czech language. It exploits special grammatical properties of Czech that differ from those of, for example, English. Further, an algorithm is designed that searches for similarities in the analyzed sentence structures and processes them into the final compressed file. The sentence units are analyzed with a special tool that allows parsing on multiple levels.

Cited literature [15 references]

https://hal.inria.fr/hal-01637512
Contributor : Hal Ifip
Submitted on : Friday, November 17, 2017 - 3:45:51 PM
Last modification on : Saturday, June 1, 2019 - 11:34:02 AM
Long-term archiving on : Sunday, February 18, 2018 - 2:36:28 PM

File

419526_1_En_24_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Jiří Ševčík, Jiří Dvorský. Techniques of Czech Language Lossless Text Compression. 15th IFIP International Conference on Computer Information Systems and Industrial Management (CISIM), Sep 2016, Vilnius, Lithuania. pp.265-276, ⟨10.1007/978-3-319-45378-1_24⟩. ⟨hal-01637512⟩
