Building and Modelling Multilingual Subjective Corpora

Motaz Saad 1 David Langlois 1 Kamel Smaïli 1
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract: Building multilingual opinionated models requires multilingual corpora annotated with opinion labels. Unfortunately, such corpora are rare. In this work, we treat opinions as either subjective or objective. We introduce an annotation method that can be reliably transferred across topic domains and across languages. The method starts by building a classifier that labels sentences as subjective or objective, using English training data from the "movie reviews" domain. The annotation is transferred to another language by classifying the English sentences of a parallel corpus and assigning the same labels to the aligned sentences in the other language. We also shed light on the link between opinion mining and statistical language modelling, and on how such corpora are useful for domain-specific language modelling. We show that the distinction between subjective and objective sentences tends to be stable across domains and languages. Our experiments show that language models trained on an objective (respectively subjective) corpus yield better perplexities on an objective (respectively subjective) test set.
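The annotation-transfer step described in the abstract can be illustrated with a minimal sketch. It assumes a sentence-aligned parallel corpus given as (English, other-language) pairs. The names `project_labels`, `toy_classifier`, and `OPINION_CUES` are hypothetical, and the keyword-based classifier is only a stand-in for the classifier trained on movie reviews, whose details are not given in this record.

```python
from typing import Callable, Iterable, List, Tuple

Label = str  # "subjective" or "objective"


def project_labels(
    parallel_pairs: Iterable[Tuple[str, str]],
    classify_english: Callable[[str], Label],
) -> List[Tuple[str, str, Label]]:
    """Classify each English sentence and copy its label to the aligned sentence."""
    annotated = []
    for english, other in parallel_pairs:
        label = classify_english(english)
        # The aligned sentence inherits the label of its English counterpart.
        annotated.append((english, other, label))
    return annotated


# Hypothetical stand-in classifier: a keyword lookup, NOT the trained
# classifier from the paper. Any callable mapping a sentence to a label works.
OPINION_CUES = {"love", "hate", "wonderful", "terrible", "boring"}


def toy_classifier(sentence: str) -> Label:
    tokens = {tok.strip(".,!?").lower() for tok in sentence.split()}
    return "subjective" if tokens & OPINION_CUES else "objective"


# Illustrative sentence pairs (invented for this sketch).
pairs = [
    ("The film was wonderful.", "Le film était merveilleux."),
    ("The meeting starts at noon.", "La réunion commence à midi."),
]
result = project_labels(pairs, toy_classifier)
for english, other, label in result:
    print(label, "|", english, "|", other)
```

Passing the classifier as a callable keeps the projection step independent of how the English classifier is built, so a trained model could replace the toy one without changing the transfer logic.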
Document type:
Conference paper
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), May 2014, Reykjavik, Iceland. European Language Resources Association (ELRA), 2014

https://hal.inria.fr/hal-00995755
Contributor: Motaz Saad
Submitted on: Monday, 20 November 2017 - 09:59:20
Last modified on: Monday, 24 September 2018 - 09:04:03
Document(s) archived on: Wednesday, 21 February 2018 - 12:34:35

File

LREC2014Smaili.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00995755, version 1

Citation

Motaz Saad, David Langlois, Kamel Smaïli. Building and Modelling Multilingual Subjective Corpora. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), May 2014, Reykjavik, Iceland. European Language Resources Association (ELRA), 2014. 〈hal-00995755〉
