
Domain adaptation for sequence labeling using hidden Markov models

Edouard Grave 1,2, Guillaume Obozinski 3, Francis Bach 1,2

2 SIERRA - Statistical Machine Learning and Parsimony
DI-ENS - Département d'informatique de l'École normale supérieure, Inria Paris-Rocquencourt, CNRS - Centre National de la Recherche Scientifique: UMR 8548
Abstract: Most natural language processing systems based on machine learning are not robust to domain shift. For example, a state-of-the-art syntactic dependency parser trained on Wall Street Journal sentences has an absolute drop in performance of more than ten points when tested on textual data from the Web. An efficient solution to make these methods more robust to domain shift is to first learn a word representation using large amounts of unlabeled data from both domains, and then use this representation as features in a supervised learning algorithm. In this paper, we propose to use hidden Markov models to learn word representations for part-of-speech tagging. In particular, we study the influence of using data from the source, the target or both domains to learn the representation and the different ways to represent words using an HMM.
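One standard way to turn a trained HMM into token-level word representations, as the abstract describes, is to use the posterior distribution over hidden states computed by the forward-backward algorithm; each token's posterior vector then serves as a feature vector for a supervised tagger. The sketch below is illustrative only: the HMM parameters (`pi`, `A`, `B`) are made up, and the paper itself compares several representation variants not shown here.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Posterior state probabilities p(z_t | x_1..x_T) for a
    discrete-emission HMM via the forward-backward algorithm.
    pi: (K,) initial distribution, A: (K, K) transition matrix,
    B: (K, V) emission probabilities, obs: list of word ids."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    # Forward pass, with per-step scaling for numerical stability.
    alpha[0] = pi * B[:, obs[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    # Backward pass (scaling constants cancel in the final ratio).
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    # Posterior over states at each position: one feature row per token.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma

# Toy model: 2 hidden states, vocabulary of 3 word types (made up).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
features = forward_backward(pi, A, B, [0, 2, 1, 2])
# Each row of `features` is a K-dimensional representation of one token,
# usable as input features to a supervised POS tagger.
```

In practice the HMM would be trained with EM on large unlabeled corpora from the source and/or target domain; the per-token posterior rows (or discretizations of them, such as the most likely state) are then concatenated with standard features in the labeled-data tagger.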

https://hal.inria.fr/hal-00918371
Contributor: Edouard Grave
Submitted on: Friday, December 13, 2013 - 2:11:27 PM
Last modification on: Friday, July 23, 2021 - 2:35:25 PM
Long-term archiving on: Tuesday, March 18, 2014 - 12:35:35 PM

Files

da.pdf (file produced by the authors)

Identifiers

  • HAL Id: hal-00918371, version 1
  • arXiv: 1312.4092

Citation

Edouard Grave, Guillaume Obozinski, Francis Bach. Domain adaptation for sequence labeling using hidden Markov models. New Directions in Transfer and Multi-Task: Learning Across Domains and Tasks (NIPS Workshop), Dec 2013, Lake Tahoe, United States. ⟨hal-00918371⟩
