Mining monolingual and bilingual corpora

Abstract: In this paper, we describe two new methods for mining monolingual and bilingual text corpora that rely heavily on association rules and triggers. The association-rule-based method is first applied to query expansion. Experiments conducted on French newspapers and on a set of scientific documents show that the proposed approach outperforms the baseline model. The second method focuses on machine translation and is motivated by the results obtained with triggers in statistical language modeling. To build a translation table, association rules and triggers are generalized to mine bilingual corpora. To this end, we propose the concepts of inter-lingual association rules and inter-lingual triggers, respectively. Both methods have been integrated into a real statistical machine translation system. The experiments carried out highlight the practical feasibility of the introduced approaches in the context of machine translation and show that inter-lingual triggers achieve better results than those obtained using the third IBM model.
Document type:
Journal article
Intelligent Data Analysis, IOS Press, 2010, 14 (6), pp.663-682
Cited literature [33 references]

https://hal.inria.fr/inria-00545493
Contributor: David Langlois
Submitted on: Tuesday, November 21, 2017 - 15:03:13
Last modified on: Thursday, January 11, 2018 - 06:19:57

Identifiers

  • HAL Id: inria-00545493, version 1

Citation

Chiraz Latiri, Kamel Smaïli, Caroline Lavecchia, David Langlois. Mining monolingual and bilingual corpora. Intelligent Data Analysis, IOS Press, 2010, 14 (6), pp.663-682. 〈inria-00545493〉
