Experiments with citation mining and key-term extraction for Prior Art Search

Abstract : This technical note presents the system built for the IP track of CLEF 2010, based on PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the modular search infrastructure initially developed for CLEF IP 2009. We largely reused the system from the previous CLEF IP, at a somewhat smaller scale, and improved three main components:

  • A new citation mining tool based on Conditional Random Fields (CRF).
  • A key-term extraction module developed for technical and scientific documents and adapted to patent document structures, using a wide range of metrics and features and a bagged decision tree.
  • An improved version of our multi-domain terminological database, GRISP.

We used the Okapi BM25 and Indri retrieval models for the prior art task and a KNN model for the automatic classification task under the IPC subclasses. In both tasks, specific final re-ranking techniques were used, including multiple regression models based on SVM. Although the Prior Art task was more challenging and we used fewer retrieval models, we obtained results similar to last year's. We performed very poorly, however, at the classification task, and we consider that an instance-based KNN algorithm is not competitive with standard classifiers that rely on prior large-scale training.
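For readers unfamiliar with Okapi BM25, one of the two retrieval models named in the abstract, the following is a minimal Python sketch of the scoring function. It is not the PATATRAS implementation: the function name `bm25_scores`, the toy documents, the parameter values `k1=1.2` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per tokenized document in `docs`.

    `query` and each document are lists of tokens. The defaults for
    `k1` and `b` are common textbook values, not values from the paper.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed here for simplicity).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection standing in for patent documents.
docs = [
    ["patent", "citation", "mining"],
    ["key", "term", "extraction"],
    ["patent", "search", "prior", "art", "patent"],
]
scores = bm25_scores(["patent", "citation"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

In this toy collection the first document ranks highest because it matches both query terms, including the rarer one, while the second document matches neither and scores zero. A real prior art system would build the query from the patent text (for example from extracted key terms) rather than from a hand-written token list.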
Document type :
Conference papers

https://hal.inria.fr/inria-00510267
Contributor : Laurent Romary
Submitted on : Tuesday, August 17, 2010 - 6:29:42 PM
Last modification on : Friday, March 22, 2019 - 2:22:12 PM
Long-term archiving on : Thursday, November 18, 2010 - 3:15:36 AM

Files

technote.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00510267, version 1

Citation

Patrice Lopez, Laurent Romary. Experiments with citation mining and key-term extraction for Prior Art Search. CLEF 2010 - Conference on Multilingual and Multimodal Information Access Evaluation, Sep 2010, Padua, Italy. ⟨inria-00510267⟩
