Conference papers

Building a Knowledge Based Summarization System for Text Data Mining

Abstract : This paper provides details on building a knowledge based automatic summarization system for mining text data. The system mines text data in documents and webpages to create abstractive summaries by generalizing new concepts, deriving main topics, and creating new sentences. It makes use of the domain knowledge provided by the Cyc development platform, which consists of the world’s largest knowledge base and one of the most powerful inference engines. The system extracts syntactic structures and semantic features by employing natural language processing techniques together with the Cyc knowledge base and reasoning engine. It creates a summary of the given documents in three stages: knowledge acquisition, knowledge discovery, and knowledge representation for human readers. Knowledge acquisition derives the syntactic structure of each sentence in the documents and maps the words and their syntactic relationships into the Cyc knowledge base. Knowledge discovery abstracts novel concepts and derives the main topics of the documents by exploring the ontology of the mapped concepts and by clustering the concepts. Knowledge representation creates new English sentences to summarize the documents. The system has been implemented and integrated with the Cyc knowledge based system. The implementation encodes a process consisting of seven stages: syntactic analysis, mapping words to Cyc, concept propagation, concept weights and relations accumulation, topic derivation, subject identification, and new sentence generation. The implementation has been tested on various documents and webpages. The test performance data suggest that such a system could benefit from running on parallel and distributed computing platforms. The test results showed that the system is capable of creating new sentences that include abstracted concepts not explicitly mentioned in the original documents and that contain information synthesized from different parts of the documents to compose a summary.
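The concept propagation, weight accumulation, and topic derivation stages named in the abstract can be illustrated with a small sketch. This is not the authors' implementation and does not use Cyc: the toy ontology, the `decay` factor, and the function names `propagate_weights` and `derive_topics` are all assumptions made for illustration. The idea shown is only that words mapped to concepts pass part of their weight up to more general concepts, so that a generalized concept absent from the text can still rank as a topic.

```python
from collections import Counter

# Hypothetical toy ontology (child -> parent). It stands in for the Cyc
# knowledge base, which is not accessed in this sketch.
TOY_ONTOLOGY = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "mammal",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def ancestors(concept, ontology):
    """Yield the ancestors of a concept, nearest first."""
    seen = set()
    current = ontology.get(concept)
    while current is not None and current not in seen:
        seen.add(current)
        yield current
        current = ontology.get(current)


def propagate_weights(words, ontology, decay=0.5):
    """Accumulate a weight per concept, passing a decayed share to ancestors.

    The decay factor is an assumption of this sketch, not a value from the paper.
    """
    known = set(ontology) | set(ontology.values())
    weights = Counter()
    for word in words:
        if word not in known:
            continue  # word could not be mapped to a concept
        weights[word] += 1.0
        contribution = 1.0
        for parent in ancestors(word, ontology):
            contribution *= decay
            weights[parent] += contribution
    return weights


def derive_topics(weights, top_n=2):
    """Return the top_n concepts by weight (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:top_n]]


if __name__ == "__main__":
    mapped_words = ["dog", "wolf", "cat", "dog"]
    concept_weights = propagate_weights(mapped_words, TOY_ONTOLOGY)
    print(derive_topics(concept_weights))
```

In this toy run, "canine" never occurs in the input words yet receives weight from both "dog" and "wolf", which mirrors the abstract's point about abstracted concepts not explicitly mentioned in the original documents.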

https://hal.inria.fr/hal-02060037
Contributor : Hal Ifip
Submitted on : Thursday, March 7, 2019 - 10:36:09 AM
Last modification on : Friday, March 8, 2019 - 1:23:52 AM
Long-term archiving on : Sunday, June 9, 2019 - 10:31:10 AM

File

 Restricted access
To satisfy the distribution rights of the publisher, the document is embargoed until : 2021-01-01

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Citation

Andrey Timofeyev, Ben Choi. Building a Knowledge Based Summarization System for Text Data Mining. 2nd International Cross-Domain Conference for Machine Learning and Knowledge Extraction (CD-MAKE), Aug 2018, Hamburg, Germany. pp.118-133, ⟨10.1007/978-3-319-99740-7_8⟩. ⟨hal-02060037⟩