
A General Approach to Extracting Full Names and Abbreviations for Chinese Entities from the Web

Abstract : Identifying full names and abbreviations for entities is a challenging problem in many applications, e.g., question answering and information retrieval. In this paper, we propose a general method for extracting full names and abbreviations from Chinese Web corpora. For a given entity, we construct forward and backward query items, submit them to a search engine (e.g., Google), and use the search results to extract full names and abbreviations for the entity. To verify the results, filtering and marking methods are applied to rank the candidates. Experiments show that our method achieves a precision of 84.7% for abbreviations and 77.0% for full names.
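
The query-and-extract step described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the cue word 简称 ("abbreviated as"), the function names `build_queries` and `extract_after`, and the frequency-based ranking are all hypothetical, since the abstract does not give the actual query patterns or the filtering and marking rules. The snippets are passed in as plain strings, so no search engine API is assumed.

```python
import re
from collections import Counter

# Hypothetical cue word ("abbreviated as"); the real patterns are not in the abstract.
ABBR_CUE = "简称"


def build_queries(entity, cue=ABBR_CUE):
    """Build quoted phrase queries with the cue after (forward) or before (backward) the entity."""
    return {
        "forward": f'"{entity}{cue}"',
        "backward": f'"{cue}{entity}"',
    }


def extract_after(prefix, snippets, max_len=6):
    """Count runs of Chinese characters that directly follow `prefix` in the snippets.

    Frequency counting stands in for the paper's filtering and marking step,
    which the abstract does not specify.
    """
    pattern = re.compile(
        re.escape(prefix) + r"([\u4e00-\u9fff]{1," + str(max_len) + r"})"
    )
    counts = Counter()
    for snippet in snippets:
        for match in pattern.finditer(snippet):
            counts[match.group(1)] += 1
    # Most frequent candidates first.
    return counts.most_common()


if __name__ == "__main__":
    entity = "北京大学"
    print(build_queries(entity))
    # Made-up snippets standing in for search results.
    snippets = ["北京大学简称北大。", "北京大学简称北大,位于北京。"]
    print(extract_after(entity + ABBR_CUE, snippets))
```

Because the capture stops at the first non-Chinese character, punctuation ends a candidate; running text without punctuation would produce over-long candidates, which is one reason a separate verification step is needed.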
Document type : Conference papers

Cited literature : 11 references

https://hal.inria.fr/hal-01060363
Contributor : Hal Ifip
Submitted on : Tuesday, November 21, 2017 - 4:40:38 PM
Last modification on : Thursday, March 5, 2020 - 5:43:14 PM

File

JiangCYLW10.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

HAL Id : hal-01060363
DOI : 10.1007/978-3-642-16327-2_33

Citation

Guang Jiang, Cao Cungen, Sui Yuefei, Han Lu, Shi Wang. A General Approach to Extracting Full Names and Abbreviations for Chinese Entities from the Web. 6th IFIP TC 12 International Conference on Intelligent Information Processing (IIP), Oct 2010, Manchester, United Kingdom. pp.271-280, ⟨10.1007/978-3-642-16327-2_33⟩. ⟨hal-01060363⟩
