Conference paper. Year: 2018

Indiscriminateness in representation spaces of terms and documents

Abstract

Examining the properties of representation spaces for documents or words in Information Retrieval (IR) – typically ℝ^n with n large – yields valuable insights that help the retrieval process. Recently, several authors have studied the real dimensionality of datasets, called intrinsic dimensionality, in specific parts of these spaces [14]. They have shown that this dimensionality is chiefly tied to the notion of indiscriminateness among neighbors of a query point in the vector space. In this paper, we propose to revisit this notion in the specific case of IR. More precisely, we show how to estimate indiscriminateness from IR similarities in order to use it in the representation spaces used for documents and words [18, 7]. We show that indiscriminateness may be used to characterize difficult queries; moreover, we show that this notion, applied to word embeddings, can help choose terms for query expansion.
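The abstract does not state which estimator the paper uses. Purely as an illustration of how intrinsic dimensionality relates to indiscriminateness among the neighbors of a query point, the sketch below implements the maximum-likelihood (Hill-type) estimator of local intrinsic dimensionality that is common in that literature. The function name `lid_mle` is hypothetical, and the input is assumed to be a list of distances from the query to its k nearest neighbors; how an IR similarity score would be turned into such a distance is left open here and is not taken from the paper.

```python
import math


def lid_mle(distances):
    """Hill-type maximum-likelihood estimate of local intrinsic dimensionality.

    `distances` holds the distances from a query point to its k nearest
    neighbors. This is an illustrative sketch, not the paper's estimator.
    """
    # Zero distances (exact duplicates of the query) carry no scale information.
    r = sorted(d for d in distances if d > 0)
    if len(r) < 2:
        raise ValueError("need at least two positive distances")
    r_max = r[-1]
    # Mean log-ratio of each neighbor distance to the farthest one (always <= 0).
    mean_log = sum(math.log(d / r_max) for d in r) / len(r)
    if mean_log == 0:
        # All neighbors are equidistant: they cannot be told apart by distance.
        return math.inf
    return -1.0 / mean_log


# Neighbors at nearly equal distances give a high estimate (indiscriminate),
# well-spread neighbors give a low one.
tight = lid_mle([0.90, 0.95, 1.00])
spread = lid_mle([0.10, 0.50, 1.00])
print(tight > spread)
```

Under this estimator, a query whose neighbors all lie at almost the same distance receives a large value, which matches the intuition that such neighbors are hard to rank against one another.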
Main file
Claveau_ECIR2018.pdf (489.3 KB)
Origin: files produced by the author(s)

Dates and versions

hal-01859568, version 1 (22-08-2018)

Cite

Vincent Claveau. Indiscriminateness in representation spaces of terms and documents. ECIR 2018 - 40th European Conference in Information Retrieval, Mar 2018, Grenoble, France. pp.251-262, ⟨10.1007/978-3-319-76941-7_19⟩. ⟨hal-01859568⟩