Conference papers

SAPKOS: Experimental Czech Multi-label Document Classification and Analysis System

Abstract: This paper presents SAPKOS, an experimental multi-label document classification and analysis system. The system, which integrates state-of-the-art machine learning and natural language processing approaches, is intended for use by the Czech News Agency (ČTK). Its main purpose is to save human resources in annotating newspaper articles with topics. Another important function is the automatic comparison of ČTK production with popular Czech media. The results of this analysis will be used to adapt ČTK production to better match today’s market requirements. To the best of our knowledge, no other automatic Czech document classification system exists. It is also worth mentioning that the system accuracy is very high, owing to a unique system architecture that integrates a maximum-entropy-based classification engine with a novel confidence measure method.
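
To make the architecture named in the abstract more concrete, here is a minimal sketch of how a maximum entropy model's topic posteriors could be turned into a multi-label decision by a confidence threshold. The abstract does not describe the actual confidence measure used in SAPKOS, so the thresholding rule, the function names `softmax` and `select_topics`, the topic names, and the threshold value are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Turn per-topic scores into posterior probabilities.

    A maximum entropy classifier has this form: each score is a
    weighted sum of document features for one topic (assumed given here).
    """
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {topic: math.exp(s - top) for topic, s in scores.items()}
    total = sum(exps.values())
    return {topic: e / total for topic, e in exps.items()}


def select_topics(posteriors, threshold=0.2):
    """Hypothetical confidence rule: keep every topic whose posterior
    reaches the threshold, and always keep at least the best topic."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [topic for topic, p in ranked if p >= threshold]
    return chosen or [ranked[0][0]]


# Illustrative scores for one article (made-up values, not from the paper).
article_scores = {"sport": 2.0, "politics": 1.8, "culture": -1.0}
topics = select_topics(softmax(article_scores), threshold=0.2)
```

With these made-up scores, the two topics with comparable posteriors are both assigned while the low-probability one is dropped, which is the basic idea of deriving several labels from a single probabilistic classifier.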

Cited literature [34 references]

https://hal.inria.fr/hal-01385368
Contributor : Hal Ifip
Submitted on : Friday, October 21, 2016 - 11:43:36 AM
Last modification on : Thursday, March 5, 2020 - 5:41:08 PM

File

978-3-319-23868-5_24_Chapter.p...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Ladislav Lenc, Pavel Král. SAPKOS: Experimental Czech Multi-label Document Classification and Analysis System. 11th IFIP International Conference on Artificial Intelligence Applications and Innovations (AIAI 2015), Sep 2015, Bayonne, France. pp.337-350, ⟨10.1007/978-3-319-23868-5_24⟩. ⟨hal-01385368⟩
