Conference paper, Year: 2010

Experiments with citation mining and key-term extraction for Prior Art Search

Abstract

This technical note presents the system built for the IP track of CLEF 2010, based on PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the modular search infrastructure initially built for CLEF IP 2009. We largely reused the system from the previous CLEF IP, but at a relatively smaller scale and with improvements to three main components:

  • A new citation mining tool based on Conditional Random Fields (CRF).
  • A key-term extraction module developed for technical and scientific documents and adapted to patent document structures, using a vast range of metrics, features and a bagged decision tree.
  • An improved version of our multi-domain terminological database, GRISP.

We used the Okapi BM25 and Indri retrieval models for the prior art task and a KNN model for the automatic classification task under the IPC subclasses. In both tasks, specific final re-ranking techniques were used, including multiple regression models based on SVM. Although the Prior Art task was more challenging and we used a more limited number of retrieval models, we maintained results similar to last year's. We performed, however, miserably at the classification task, and we consider that an instance-based KNN algorithm is not competitive with standard classifiers based on preliminary large-scale training.
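For readers unfamiliar with Okapi BM25, one of the retrieval models named in the abstract, the following is a minimal sketch of the standard scoring formula. It is not the authors' implementation: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the toy documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    k1 and b are common textbook defaults (an assumption, not values
    taken from the paper).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF variant that never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection, already tokenized.
docs = [
    ["citation", "mining", "patent"],
    ["key", "term", "extraction"],
    ["patent", "patent", "search", "prior", "art"],
]
print(bm25_scores(["patent", "search"], docs))
```

In this toy collection, the third document matches both query terms and should rank first, while the second matches none and scores zero.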
Main file
technote.pdf (173.13 KB)
Origin: Files produced by the author(s)

Dates and versions

inria-00510267, version 1 (17-08-2010)

Identifiers

  • HAL Id: inria-00510267, version 1

Cite

Patrice Lopez, Laurent Romary. Experiments with citation mining and key-term extraction for Prior Art Search. CLEF 2010 - Conference on Multilingual and Multimodal Information Access Evaluation, Sep 2010, Padua, Italy. ⟨inria-00510267⟩

Collections

INRIA INRIA2
431 views
776 downloads