Experiments with citation mining and key-term extraction for Prior Art Search

Abstract: This technical note presents the system built for the IP track of CLEF 2010, based on PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS), the modular search infrastructure initially developed for CLEF IP 2009. We largely reused the system from the previous CLEF IP, but at a smaller scale and with improvements to three main components:

  • A new citation mining tool based on Conditional Random Fields (CRF).
  • A key-term extraction module developed for technical and scientific documents and adapted to patent document structures, using a wide range of metrics and features and a bagged decision tree.
  • An improved version of our multi-domain terminological database, GRISP.

We used the Okapi BM25 and Indri retrieval models for the prior art task and a KNN model for the automatic classification task into IPC subclasses. In both tasks, specific final re-ranking techniques were used, including multiple regression models based on SVM. Although the Prior Art task was more challenging and we used fewer retrieval models, we obtained results similar to last year's. We performed, however, miserably at the classification task, and we consider that an instance-based KNN algorithm is not competitive with standard classifiers that rely on prior large-scale training.
Document type:
Conference paper
CLEF 2010 - Conference on Multilingual and Multimodal Information Access Evaluation, Sep 2010, Padua, Italy. 2010

https://hal.inria.fr/inria-00510267
Contributor: Laurent Romary
Submitted on: Tuesday, 17 August 2010 - 18:29:42
Last modified on: Friday, 3 November 2017 - 08:24:01
Document(s) archived on: Thursday, 18 November 2010 - 03:15:36

Files

technote.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : inria-00510267, version 1

Citation

Patrice Lopez, Laurent Romary. Experiments with citation mining and key-term extraction for Prior Art Search. CLEF 2010 - Conference on Multilingual and Multimodal Information Access Evaluation, Sep 2010, Padua, Italy. 2010. 〈inria-00510267〉
