
Mining Documents and Sentiments in Cross-lingual Context

Motaz Saad 1
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : The aim of this thesis is to study sentiments in comparable documents. First, we collect English, French and Arabic comparable corpora from Wikipedia and Euronews, and we align each corpus at the document level. We further gather English-Arabic news documents from local and foreign news agencies. The English documents are collected from the BBC website and the Arabic documents are collected from the Al Jazeera website. Second, we present a cross-lingual document similarity measure to automatically retrieve and align comparable documents. Then, we propose a cross-lingual sentiment annotation method to label source and target documents with sentiments. Finally, we use statistical measures to compare the agreement of sentiments between the source and target documents of each comparable pair. The methods presented in this thesis are language-independent and can be applied to any language pair.
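The abstract does not name the statistical measures used to compare sentiment agreement. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement statistic, over the sentiment labels of aligned source and target documents. The function name `cohen_kappa` and the label values are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def cohen_kappa(source_labels, target_labels):
    """Chance-corrected agreement between two aligned label sequences.

    Assumption: source_labels[i] and target_labels[i] are the sentiment
    labels of the i-th comparable document pair.
    """
    if not source_labels or len(source_labels) != len(target_labels):
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(source_labels)

    # Observed agreement: fraction of pairs carrying the same label.
    observed = sum(s == t for s, t in zip(source_labels, target_labels)) / n

    # Expected agreement under independence, from the marginal label counts.
    source_counts = Counter(source_labels)
    target_counts = Counter(target_labels)
    expected = sum(
        source_counts[label] * target_counts[label] for label in source_counts
    ) / (n * n)

    if expected == 1.0:
        # Both sides use a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four aligned document pairs.
source = ["pos", "pos", "neg", "neg"]
target = ["pos", "neg", "pos", "neg"]
kappa = cohen_kappa(source, target)
```

A value near 1 would indicate that source and target documents mostly carry the same sentiment, while a value near 0 would indicate agreement no better than chance.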
Document type :
Theses

Cited literature [46 references]

https://hal.inria.fr/tel-01751251
Contributor : Motaz Saad
Submitted on : Sunday, February 15, 2015 - 5:17:42 PM
Last modification on : Tuesday, December 18, 2018 - 4:38:02 PM
Long-term archiving on : Thursday, May 28, 2015 - 3:10:49 PM

Identifiers

  • HAL Id : tel-01751251, version 2

Citation

Motaz Saad. Mining Documents and Sentiments in Cross-lingual Context. Document and Text Processing. Université de Lorraine, 2015. English. ⟨NNT : 2015LORR0003⟩. ⟨tel-01751251v2⟩
