A bottom-up efficient algorithm learning substitutable languages from positive examples

François Coste; Gaelle Garet; Jacques Nicolas

Conference paper, Year: 2014

François Coste

Role: Author
PersonId : 9592
IdHAL : francois-coste
ORCID : 0000-0001-9134-6557
IdRef : 133160203

Dynamics, Logics and Inference for biological Systems and Sequences

Gaelle Garet

Role: Author
PersonId : 913472

Dynamics, Logics and Inference for biological Systems and Sequences

Jacques Nicolas

Role: Author
PersonId : 5225
IdHAL : jacques-nicolas
IdRef : 116276142

Dynamics, Logics and Inference for biological Systems and Sequences

Abstract

Based on Harris’s substitutability criterion, the recent definitions of classes of substitutable languages have led to interesting polynomial learnability results for expressive formal languages. These classes are also promising for practical applications: in natural language analysis, because the definitions have strong linguistic support, but also in biology for modeling protein families, as suggested in our previous study introducing the class of local substitutable languages. But turning these theoretical advances into practice requires truly practical algorithms. We present here an efficient learning algorithm, motivated by the intelligibility and parsing efficiency of the result, which directly reduces the positive sample into a small, non-redundant canonical grammar of the target substitutable language. Thanks to this new algorithm, we have been able to extend our experimentation to a complete protein dataset, confirming that it is possible to learn grammars on proteins with high specificity and good sensitivity by a generalization based on local substitutability.
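To make the substitutability criterion concrete, the following is a minimal sketch of the weak substitutability relation used in the substitutable-language literature (Clark and Eyraud): two non-empty substrings u and v of a positive sample are weakly substitutable when some context (l, r) makes both l·u·r and l·v·r strings of the sample, and the transitive closure of this relation partitions the substrings into classes. This is not the bottom-up algorithm of the paper, which is not described in this record; the function names are hypothetical, and the naive enumeration of every substring is for illustration only.

```python
from collections import defaultdict


def contexts_of_substrings(sample):
    """Map each non-empty substring u of the sample to its contexts (l, r),
    i.e. the pairs such that l + u + r is a string of the sample."""
    contexts = defaultdict(set)
    for w in sample:
        for i in range(len(w)):
            for j in range(i + 1, len(w) + 1):
                contexts[w[i:j]].add((w[:i], w[j:]))
    return contexts


def substitutability_classes(sample):
    """Partition the non-empty substrings of the sample into the classes of the
    transitive closure of weak substitutability (u ~ v iff they share a context)."""
    contexts = contexts_of_substrings(sample)
    parent = {u: u for u in contexts}  # union-find forest over substrings

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    # Invert the map: for each context, the substrings observed in it.
    by_context = defaultdict(list)
    for u, ctxs in contexts.items():
        for ctx in ctxs:
            by_context[ctx].append(u)

    # All substrings sharing one context are weakly substitutable: merge them.
    for members in by_context.values():
        root = find(members[0])
        for v in members[1:]:
            rv = find(v)
            if rv != root:
                parent[rv] = root

    classes = defaultdict(set)
    for u in contexts:
        classes[find(u)].add(u)
    return [frozenset(c) for c in classes.values()]


if __name__ == "__main__":
    for cls in substitutability_classes(["ab", "aabb"]):
        print(sorted(cls))
```

On the sample {ab, aabb}, the sketch yields five classes: {ab, aabb} (shared empty context), {a, aab} (shared context ("", "b")), {b, abb} (shared context ("a", "")), {aa} and {bb}. Merging "ab" with "aabb" is the kind of generalization step from which a grammar for a^n b^n can be built.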

Keywords

(local) substitutable languages; learning algorithm; canonical grammar; proteins

Subject areas

Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]; Bioinformatics, Systems Biology [q-bio.QM]

https://inria.hal.science/hal-01080249

Submitted on: Tuesday, 4 November 2014, 17:23:12

Last modified on: Friday, 24 March 2023, 14:52:59

Dates and versions

hal-01080249 , version 1 (04-11-2014)

Identifiers

HAL Id : hal-01080249 , version 1

Cite

François Coste, Gaelle Garet, Jacques Nicolas. A bottom-up efficient algorithm learning substitutable languages from positive examples. ICGI (International Conference on Grammatical Inference), Sep 2014, Kyoto, Japan. pp. 49–63. ⟨hal-01080249⟩

Collections

INSTITUT-TELECOM EC-PARIS UNIV-RENNES1 CNRS INRIA INSA-RENNES IRISA IRISA-D7 INRIA2 UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES UR1-MATH-NUM
