
Mining Documents and Sentiments in Cross-lingual Context

Motaz Saad 1 
1 SMarT - Statistical Machine Translation and Speech Modelization and Text
LORIA - NLPKD - Department of Natural Language Processing & Knowledge Discovery
Abstract : The aim of this thesis is to study sentiments in comparable documents. First, we collect English, French and Arabic comparable corpora from Wikipedia and Euronews, and we align each corpus at the document level. We further gather English-Arabic news documents from local and foreign news agencies. The English documents are collected from the BBC website and the Arabic documents are collected from the Al-Jazeera website. Second, we present a cross-lingual document similarity measure to automatically retrieve and align comparable documents. Then, we propose a cross-lingual sentiment annotation method to label source and target documents with sentiments. Finally, we use statistical measures to assess the agreement of sentiments between the source and target documents of each comparable pair. The methods presented in this thesis are language independent and can be applied to any language pair.

Cited literature [46 references]
Contributor : Motaz Saad
Submitted on : Sunday, February 15, 2015 - 5:17:42 PM
Last modification on : Saturday, October 16, 2021 - 11:26:09 AM
Long-term archiving on : Thursday, May 28, 2015 - 3:10:49 PM


  • HAL Id : tel-01751251, version 2


Motaz Saad. Mining Documents and Sentiments in Cross-lingual Context. Document and Text Processing. Université de Lorraine, 2015. English. ⟨NNT : 2015LORR0003⟩. ⟨tel-01751251v2⟩


