Conference papers

Experiments with citation mining and key-term extraction for Prior Art Search

Abstract: This technical note presents the system built for the IP track of CLEF 2010. It is based on PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the modular search infrastructure initially developed for CLEF IP 2009. We largely reused the system from the previous CLEF IP, at a smaller scale, and improved three main components:
• A new citation mining tool based on Conditional Random Fields (CRF).
• A key-term extraction module, developed for technical and scientific documents and adapted to patent document structures, that uses a wide range of metrics and features together with a bagged decision tree.
• An improved version of our multi-domain terminological database, GRISP.
We used the Okapi BM25 and Indri retrieval models for the Prior Art task, and a KNN model for automatically classifying patents into IPC subclasses. In both tasks, specific final re-ranking techniques were applied, including multiple regression models based on SVM. Although the Prior Art task was more challenging and we used fewer retrieval models, we obtained results similar to last year's. However, we performed very poorly on the classification task, and we conclude that an instance-based KNN algorithm is not competitive with standard classifiers trained beforehand on large-scale data.
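The conclusion above contrasts an instance-based KNN classifier with classifiers trained in advance. An instance-based classifier builds no model. It labels each new patent by voting over the IPC subclasses of the most similar stored documents. The sketch below is a minimal illustration under stated assumptions, not the PATATRAS implementation. It uses raw term-frequency vectors, cosine similarity, and similarity-weighted voting. The function names `cosine` and `knn_classify`, the choice of `k`, and the toy corpus are all hypothetical. The paper does not state which similarity measure or voting scheme was used.

```python
import math
from collections import Counter, defaultdict


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_classify(query, labelled_docs, k=3, top_n=1):
    """Hypothetical instance-based KNN: no training step. The query is
    compared with every stored (text, labels) pair at classification time,
    and the labels of the k nearest documents are voted on."""
    q = Counter(query.lower().split())
    # Score every stored document and keep the k most similar ones.
    neighbours = sorted(
        ((cosine(q, Counter(text.lower().split())), labels)
         for text, labels in labelled_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Each neighbour votes for its labels, weighted by its similarity.
    votes = defaultdict(float)
    for sim, labels in neighbours:
        for label in labels:
            votes[label] += sim
    return sorted(votes, key=votes.get, reverse=True)[:top_n]


# Toy, hypothetical corpus labelled with IPC subclasses.
corpus = [
    ("pharmaceutical composition comprising an active compound", ["A61K"]),
    ("tablet composition with controlled release of active compound", ["A61K"]),
    ("network packet routing protocol for data transmission", ["H04L"]),
    ("data processing method for packet transmission in a network", ["H04L", "G06F"]),
]

print(knn_classify("composition with active compound for oral tablet", corpus, k=3))
# prints ['A61K']
```

Every query is scored against the whole stored collection, so all the work happens at classification time. The predicted labels also depend entirely on which neighbours happen to be retrieved. A classifier trained beforehand on large data can instead learn which terms discriminate between subclasses. This is one plausible reading of the contrast drawn in the abstract, not an analysis taken from the paper.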
Document type: Conference papers

Cited literature: 10 references

Contributor: Laurent Romary
Submitted on: Tuesday, August 17, 2010 - 6:29:42 PM
Last modification on: Friday, February 4, 2022 - 3:19:08 AM
Long-term archiving on: Thursday, November 18, 2010 - 3:15:36 AM


Files produced by the author(s)


  • HAL Id: inria-00510267, version 1



Patrice Lopez, Laurent Romary. Experiments with citation mining and key-term extraction for Prior Art Search. CLEF 2010 - Conference on Multilingual and Multimodal Information Access Evaluation, Sep 2010, Padua, Italy. ⟨inria-00510267⟩


