Data Driven Lemmatization and Parsing of Italian

Abstract: This paper presents preliminary results on data-driven lemmatisation for Italian. Based on a joint lemmatisation and part-of-speech tagging model, our system relies on an architecture that has already proved successful for French. Besides an intrinsic evaluation of this task, we measure its usefulness and adequacy by feeding our system's output to a parser. This approach achieves state-of-the-art parsing accuracy on unlabeled text without any gold information supplied (83.70% F1 score in a 10-fold cross-validation setting) and without requiring any prior knowledge of the language. This shows that our methodology is well suited to wide-coverage parsing of Italian.
Document type:
Conference paper
Bernardo Magnini and Francesco Cutugno and Mauro Falcone and Emanuele Pianta. EVALITA 2011 - Evaluation of NLP and Speech Tools for Italian, Jan 2012, Rome, Italy. Springer, 7689, pp.249-256, 2012, Lecture Notes in Computer Science. 〈10.1007/978-3-642-35828-9_27〉

https://hal.inria.fr/hal-00778153
Contributor: Djamé Seddah
Submitted on: Friday, 18 January 2013 - 18:16:55
Last modified on: Saturday, 9 June 2018 - 10:30:05

Citation

Djamé Seddah, Joseph Le Roux, Benoît Sagot. Data Driven Lemmatization and Parsing of Italian. Bernardo Magnini and Francesco Cutugno and Mauro Falcone and Emanuele Pianta. EVALITA 2011 - Evaluation of NLP and Speech Tools for Italian, Jan 2012, Rome, Italy. Springer, 7689, pp.249-256, 2012, Lecture Notes in Computer Science. 〈10.1007/978-3-642-35828-9_27〉. 〈hal-00778153〉
