
Text Simplification of Patent Documents

Abstract : This paper presents an automatic text simplification system for patent documents. The simplification system is embedded in the broader context of an information retrieval system that extracts knowledge related to IDM (Inventive Design Method) from patent documents. Extracting elements of the IDM ontology from patents involves training a machine-learning model. However, the model's accuracy degrades when the input text is too long, hence the need to simplify the texts to improve machine-learning performance. Previous studies on automatic text simplification have relied on hand-written rules or statistical approaches, but few have addressed the simplification of patent documents. Patent documents are distinctive in their lengthy sentences and multiword-expression terminology, both of which often hinder accurate parsing. Therefore, in this research, we present our method to automatically simplify the texts of patent documents and scientific papers by analyzing their syntactic and lexical patterns.
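The abstract mentions rule-based simplification but does not detail the rules used. The following is a purely illustrative sketch, not the authors' method. It shows how two hand-written rules could shorten an overly long patent-style sentence: first split at semicolons, which often separate claim elements, and then split before clause markers such as "wherein". The marker list, the 25-word threshold, and the function name `split_long_sentence` are hypothetical choices made for this example.

```python
import re

# Hypothetical clause markers (assumed for illustration only) that often
# introduce subordinate material in patent claims.
SPLIT_MARKERS = ("wherein", "whereby", "such that")


def split_long_sentence(sentence: str, max_words: int = 25) -> list[str]:
    """Split a sentence into shorter segments if it exceeds max_words.

    Assumed rules: (1) split at semicolons; (2) split before each clause
    marker, dropping a preceding comma. Short sentences are returned as-is.
    """
    if len(sentence.split()) <= max_words:
        return [sentence.strip()]

    # Rule 1: semicolons separate claim elements.
    parts = [p.strip() for p in sentence.split(";") if p.strip()]

    # Rule 2: an optional comma plus whitespace, followed by a whole-word
    # clause marker (the marker itself is kept via the lookahead).
    pattern = r",?\s+(?=\b(?:" + "|".join(SPLIT_MARKERS) + r")\b)"
    segments: list[str] = []
    for part in parts:
        pieces = re.split(pattern, part, flags=re.IGNORECASE)
        segments.extend(p.strip() for p in pieces if p.strip())
    return segments


if __name__ == "__main__":
    claim = (
        "A device comprising a housing; a sensor mounted in the housing, "
        "wherein the sensor detects a temperature of a fluid flowing through "
        "a channel formed in the housing, whereby overheating of the fluid "
        "is prevented."
    )
    for segment in split_long_sentence(claim):
        print(segment)
```

Real systems in this area typically rely on a syntactic parser rather than surface patterns. Surface rules like these are cheap, but they can mis-split when a marker appears inside a multiword term.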
Document type :
Conference papers

Cited literature : 24 references
Contributor : Hal Ifip
Submitted on : Thursday, September 5, 2019 - 3:15:55 PM
Last modification on : Wednesday, December 1, 2021 - 3:32:12 PM
Long-term archiving on : Thursday, February 6, 2020 - 10:19:18 AM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License



Jeongwoo Kang, Achille Souili, Denis Cavallucci. Text Simplification of Patent Documents. 18th TRIZ Future Conference (TFC), Oct 2018, Strasbourg, France. pp.225-237, ⟨10.1007/978-3-030-02456-7_19⟩. ⟨hal-02279758⟩


