Conference paper, 2013

Structured Penalties for Log-linear Language Models

Abstract

Language models can be formalized as log-linear regression models where the input features represent previously observed contexts up to a certain length m. The complexity of existing algorithms for learning the parameters by maximum likelihood scales linearly in nd, where n is the length of the training corpus and d is the number of observed features. We present a model whose learning complexity grows only logarithmically in d, making it possible to efficiently leverage longer contexts. We account for the sequential structure of natural language using tree-structured penalized objectives to avoid overfitting and achieve better generalization.
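To make the setup concrete, below is a minimal Python sketch of such a log-linear language model, where the features of a position are the suffixes of its preceding context up to length m and the parameters are fit by gradient descent on the penalized negative log-likelihood. This is not the authors' implementation: for brevity it substitutes a plain squared-l2 penalty for the tree-structured penalties studied in the paper, and all names (suffixes, fit, M, LAMBDA) are illustrative.

```python
# Minimal sketch of a log-linear language model with suffix-of-context
# features, trained by penalized maximum likelihood. Assumption: a plain
# squared-l2 penalty stands in for the paper's tree-structured penalties.
import math
from collections import defaultdict

M = 3         # maximum context length m (illustrative value)
LAMBDA = 0.1  # regularization strength (illustrative value)

def suffixes(context, m=M):
    """All suffixes of the last m symbols of the context, incl. the empty one."""
    context = context[-m:]
    return [tuple(context[i:]) for i in range(len(context) + 1)]

def fit(corpus, vocab, epochs=20, lr=0.5):
    """Gradient descent on (1/n) * NLL + (LAMBDA/2) * ||w||^2."""
    w = defaultdict(float)  # one weight per (suffix, next-symbol) feature
    for _ in range(epochs):
        grad = defaultdict(float)
        for t in range(len(corpus)):
            ctx, y = corpus[:t], corpus[t]
            feats = suffixes(ctx)
            # Softmax over the vocabulary given the active suffix features.
            scores = {v: sum(w[(s, v)] for s in feats) for v in vocab}
            z = sum(math.exp(sc) for sc in scores.values())
            for v in vocab:
                p = math.exp(scores[v]) / z
                for s in feats:
                    # d NLL / d w[(s,v)] = p(v|ctx) - 1{v == y}
                    grad[(s, v)] += p - (1.0 if v == y else 0.0)
        for k in set(w) | set(grad):
            w[k] -= lr * (grad[k] / len(corpus) + LAMBDA * w[k])
    return w

corpus = list("abracadabra")
vocab = sorted(set(corpus))
w = fit(corpus, vocab)

# Predictive distribution after the context "abr" (should favor 'a').
feats = suffixes(list("abr"))
scores = {v: sum(w[(s, v)] for s in feats) for v in vocab}
z = sum(math.exp(sc) for sc in scores.values())
print({v: round(math.exp(scores[v]) / z, 3) for v in vocab})
```

Note that the number of suffix features d grows with the context length m, which is precisely why naive O(nd) training becomes expensive for long contexts and why the paper exploits the suffix-tree structure of these features.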
Main file

anil_emnlp.pdf (195.54 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-00904820, version 1 (15-11-2013)

Identifiers

  • HAL Id: hal-00904820, version 1

Cite

Anil Nelakanti, Cédric Archambeau, Julien Mairal, Francis Bach, Guillaume Bouchard. Structured Penalties for Log-linear Language Models. EMNLP - Empirical Methods in Natural Language Processing, Oct 2013, Seattle, United States. pp. 233-243. ⟨hal-00904820⟩