Conference paper, Year: 2012

Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora

Nelly Barbot
Olivier Boëffard
  • Role: Author
  • PersonId: 883118
Arnaud Delhay

Abstract

Set covering algorithms are efficient tools for optimally reducing a linguistic corpus. The optimality of such a process is directly related to the descriptive features of the sentences of a reference corpus. This article experimentally examines the behaviour of three algorithms: a greedy approach and a Lagrangian relaxation-based one giving importance to rare events, and a third one considering the Kullback-Leibler divergence between a reference distribution and the ongoing distribution of events. The analysis of the content of the reduced corpora shows that the first two approaches remain the most effective for compressing a corpus while guaranteeing a minimal content. The variant that minimises the Kullback-Leibler divergence guarantees, as expected, a distribution of events close to a reference distribution; however, the price for this solution is a much larger corpus. In the proposed experiments, we have also evaluated a mixed approach that adds a random complement to the smallest coverings.
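
To make the two ideas in the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation: the function names (`greedy_cover`, `count_units`, `kl_divergence`), the toy corpus, and the smoothing constant are hypothetical, and the greedy step shown is the plain textbook rule (pick the sentence covering the most still-uncovered units) without the rare-event weighting or the Lagrangian relaxation mentioned above.

```python
import math
from collections import Counter


def greedy_cover(sentences):
    """Textbook greedy set covering (illustrative only).

    sentences: dict mapping a sentence id to an iterable of linguistic
    units (for example phoneme or diphone labels).
    Returns the ids of the selected sentences, in selection order.
    """
    unit_sets = {sid: set(units) for sid, units in sentences.items()}
    uncovered = set()
    for units in unit_sets.values():
        uncovered |= units
    selected = []
    while uncovered:
        # Sorting the ids makes tie-breaking deterministic.
        best = max(sorted(unit_sets),
                   key=lambda sid: len(unit_sets[sid] & uncovered))
        selected.append(best)
        uncovered -= unit_sets[best]
    return selected


def count_units(sentences, ids):
    """Count unit occurrences over the chosen sentence ids."""
    counts = Counter()
    for sid in ids:
        counts.update(sentences[sid])
    return counts


def kl_divergence(reference_counts, subset_counts, eps=1e-9):
    """D(P || Q), with P the reference distribution and Q the subset one.

    eps is an assumed smoothing constant so that units missing from the
    subset do not produce a division by zero.
    """
    total_ref = sum(reference_counts.values())
    total_sub = sum(subset_counts.values()) + eps * len(reference_counts)
    divergence = 0.0
    for unit, ref_count in reference_counts.items():
        p = ref_count / total_ref
        q = (subset_counts.get(unit, 0) + eps) / total_sub
        if p > 0:
            divergence += p * math.log(p / q)
    return divergence


# Hypothetical toy corpus: each sentence is a list of unit labels.
corpus = {
    "s1": ["a", "b", "c"],
    "s2": ["a", "b"],
    "s3": ["c", "d"],
    "s4": ["d"],
}
cover = greedy_cover(corpus)
reference = count_units(corpus, corpus.keys())
reduced = count_units(corpus, cover)
divergence = kl_divergence(reference, reduced)
```

In this sketch the covering is chosen only for size, and the divergence is computed afterwards to see how far the reduced corpus drifts from the reference distribution; a divergence-driven strategy would instead use that quantity as the selection criterion.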

Domains

Sound [cs.SD]
File not deposited

Dates and versions

hal-00784377, version 1 (04-02-2013)

Identifiers

  • HAL Id: hal-00784377, version 1

Cite

Nelly Barbot, Olivier Boëffard, Arnaud Delhay. Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora. International Conference on Language Resources and Evaluation (LREC'12), May 2012, Istanbul, Turkey. ⟨hal-00784377⟩