Entity ranking in Wikipedia

Anne-Marie Vercoustre; James A. Thom; Jovan Pehcevski

Report (Research Report), Year: 2007


Anne-Marie Vercoustre

Role: Author
PersonId: 830030

INRIA Rocquencourt

Usage-centered design, analysis and improvement of information systems

James A. Thom

Role: Author

Computer Science and Information Technology

Jovan Pehcevski

Role: Author
PersonId: 830040

Usage-centered design, analysis and improvement of information systems

Abstract

The traditional entity extraction problem is to extract named entities from plain text using natural language processing techniques and intensive training on large document collections. Examples of named entities include organisations, people, locations, and dates. Named entities are the subject of many research activities; we are interested in entity ranking in the field of information retrieval. In this paper, we describe our approach to identifying and ranking entities from the INEX Wikipedia document collection. We first introduce a number of Wikipedia features that are useful for entity identification and ranking. We then describe the principles and the architecture of our entity ranking system. The paper also introduces our methodology for evaluating the effectiveness of entity ranking, together with preliminary results showing that the use of Wikipedia categories and link structure, combined with entity examples, can significantly improve retrieval effectiveness.

Keywords

Entity Ranking; Test collection; XML Retrieval; Wikipedia

Domains

Information Retrieval [cs.IR]

Main file

RapportInria.pdf (235.09 KB)

Origin: Files produced by the author(s)

Contributor: Anne-Marie Vercoustre

https://inria.hal.science/inria-00172511

Submitted on: Monday 17 September 2007, 14:10:12

Last modified on: Wednesday 15 March 2023, 08:58:08

Long-term archiving on: Thursday 8 April 2010, 21:56:34

Dates and versions

inria-00172511, version 1 (17-09-2007)

inria-00172511, version 2 (18-09-2007)

Identifiers

HAL Id: inria-00172511, version 1

Cite

Anne-Marie Vercoustre, James A. Thom, Jovan Pehcevski. Entity ranking in Wikipedia. [Research Report] RR-6294, 2007, pp.8. ⟨inria-00172511v1⟩


Collections

INRIA-RRRT

