Text Simplification of Patent Documents

Abstract : This paper presents an automatic text simplification system for patent documents. The system is embedded in the broader context of an information retrieval system that extracts IDM-related knowledge from patent documents. Extracting elements of the IDM ontology from patents involves training a machine-learning model. However, the accuracy of the model is compromised when the input text is too long, hence the need to simplify the texts before learning. Previous studies have addressed automatic text simplification using hand-written rules or statistical approaches, but few have dealt with patent documents. Patent documents are characterized by lengthy sentences and multiword terminology, which often hinder accurate parsing. In this research, we therefore present a method to automatically simplify the texts of patent documents and scientific papers by analyzing their syntactic and lexical patterns.
Document type :
Conference papers

Cited literature : 24 references

https://hal.inria.fr/hal-02279758
Contributor : Hal Ifip
Submitted on : Thursday, September 5, 2019 - 3:15:55 PM
Last modification on : Friday, September 6, 2019 - 11:48:22 AM
Long-term archiving on : Thursday, February 6, 2020 - 10:19:18 AM

File

474537_1_En_19_Chapter.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

HAL Id : hal-02279758
DOI : 10.1007/978-3-030-02456-7_19

Citation

Jeongwoo Kang, Achille Souili, Denis Cavallucci. Text Simplification of Patent Documents. 18th TRIZ Future Conference (TFC), Oct 2018, Strasbourg, France. pp.225-237, ⟨10.1007/978-3-030-02456-7_19⟩. ⟨hal-02279758⟩

Metrics

Record views : 122
File downloads : 53