Fast parser for biological sequences and a new algorithm for the inference of substitutable languages

Mikaïl Demirdelen 1
1 Dyliss - Dynamics, Logics and Inference for biological Systems and Sequences
Inria Rennes – Bretagne Atlantique, IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
Abstract: Grammatical inference, or grammar induction, studies how to automatically learn the implicit rules behind sequential data. The field is useful in domains that routinely manipulate sequences, such as natural language processing and bioinformatics. The tool used to describe such data is the formal grammar. Some classes of grammars are more expressive than others and are therefore harder to learn. Several approaches have been developed to infer these expressive grammars; one of them is to assume that the target language is substitutable. The goal of my internship is to find methods that improve the results of expressive grammar inference using these substitutable languages. The improvements target practical applications, in particular biological sequences. In this report, I first describe the state-of-the-art algorithm that can learn an expressive class of substitutable languages and how it is currently implemented. I then explain how I improved the current parser to make it usable on real cases. Finally, I present my contributions to improving the learning capability of the state-of-the-art algorithm by adapting it to a new class of substitutable languages.

https://hal.inria.fr/hal-01406352
Contributor: François Coste
Submitted on: Thursday, December 1, 2016 - 09:59:57
Last modified on: Wednesday, May 16, 2018 - 11:23:35
Document(s) archived on: Monday, March 20, 2017 - 16:21:41

Identifiers

  • HAL Id: hal-01406352, version 1

Citation

Mikaïl Demirdelen. Fast parser for biological sequences and a new algorithm for the inference of substitutable languages. Machine Learning [cs.LG]. 2016. 〈hal-01406352〉
