Biological Sequence Modeling with Convolutional Kernel Networks

Dexiong Chen; Laurent Jacob; Julien Mairal

doi:10.1093/bioinformatics/btz094

Journal article, Bioinformatics, Year: 2019


Dexiong Chen

Role: Author

Learning models from massive data

Laurent Jacob

Role: Author
PersonId: 21877
IdHAL: laurent-jacob
ORCID: 0000-0002-7826-2719
IdRef: 176737952

High-dimensional statistics for genomics

Julien Mairal

Role: Author
PersonId: 1034832
ORCID: 0000-0001-6991-2110
IdRef: 152125256

Learning models from massive data

Abstract

The growing number of annotated biological sequences makes it possible to learn genotype-phenotype relationships from data with increasing accuracy. When large quantities of labeled samples are available for training, convolutional neural networks can predict the phenotype of unannotated sequences with good accuracy. Unfortunately, their performance on medium- or small-scale datasets is mixed, which calls for new data-efficient approaches. In this paper, we introduce a hybrid approach between convolutional neural networks and kernel methods to model biological sequences. Our method retains the ability of convolutional neural networks to learn data representations adapted to a specific task, while the kernel point of view yields algorithms that perform significantly better when the amount of training data is small. We illustrate these advantages for transcription factor binding prediction and protein homology detection, and we demonstrate that our model is also simple to interpret, which is crucial for discovering predictive motifs in sequences. The source code is freely available at https://gitlab.inria.fr/dchen/CKN-seq.

Domains

Bioinformatics [q-bio.QM]; Machine Learning [stat.ML]; Machine Learning [cs.LG]; Cryptography and Security [cs.CR]

Main file

biorxiv.pdf (767.71 KB)

Origin: Files produced by the author(s)

Contributor: Julien Mairal

https://inria.hal.science/hal-01632912

Submitted on: Tuesday, January 29, 2019, 16:05:53

Last modified on: Friday, April 26, 2024, 13:47:50

Long-term archiving on: Tuesday, April 30, 2019, 17:11:08

Dates and versions

hal-01632912, version 1 (10 November 2017)

hal-01632912, version 2 (8 October 2018)

hal-01632912, version 3 (29 January 2019)

Identifiers

HAL Id: hal-01632912, version 3
DOI: 10.1093/bioinformatics/btz094

Cite

Dexiong Chen, Laurent Jacob, Julien Mairal. Biological Sequence Modeling with Convolutional Kernel Networks. Bioinformatics, 2019, 35 (18), pp. 3294-3302. ⟨10.1093/bioinformatics/btz094⟩. ⟨hal-01632912v3⟩

Collections

UNIV-RENNES1 UGA CNRS INRIA UNIV-LYON1 IRISA INSMI LJK LJK_GI BIOENVIS INRIA2 LJK-GI-THOTH UR1-MATH-STIC UR1-UFR-ISTIC UNIV-RENNES LBBE UDL ANR UR1-MATH-NUM
