Biological Sequence Modeling with Convolutional Kernel Networks

Dexiong Chen 1, Laurent Jacob 2, Julien Mairal 1
1 Thoth - Learning models from massive data
Inria Grenoble - Rhône-Alpes, LJK - Laboratoire Jean Kuntzmann
2 High-dimensional statistics for genomics
PEGASE - PEGASE Department [LBBE]
Abstract: The growing number of annotated biological sequences available makes it possible to learn genotype-phenotype relationships from data with increasingly high accuracy. When large quantities of labeled samples are available for training a model, convolutional neural networks can be used to predict the phenotype of unannotated sequences with good accuracy. Unfortunately, their performance on medium- or small-scale datasets is mixed, which calls for new data-efficient approaches. In this paper, we introduce a hybrid approach between convolutional neural networks and kernel methods to model biological sequences. Our method retains the ability of convolutional neural networks to learn data representations that are adapted to a specific task, while the kernel point of view yields algorithms that perform significantly better when the amount of training data is small. We illustrate these advantages for transcription factor binding prediction and protein homology detection, and we demonstrate that our model is also simple to interpret, which is crucial for discovering predictive motifs in sequences. The source code is freely available at https://gitlab.inria.fr/dchen/CKN-seq.

https://hal.inria.fr/hal-01632912
Contributor: Julien Mairal
Submitted on: Tuesday, January 29, 2019 - 4:05:53 PM
Last modification on: Monday, September 30, 2019 - 9:57:48 AM
Long-term archiving on: Tuesday, April 30, 2019 - 5:11:08 PM

File

biorxiv.pdf
Files produced by the author(s)

Citation

Dexiong Chen, Laurent Jacob, Julien Mairal. Biological Sequence Modeling with Convolutional Kernel Networks. Bioinformatics, Oxford University Press (OUP), 2019, 35 (18), pp.3294-3302. ⟨10.1093/bioinformatics/btz094⟩. ⟨hal-01632912v3⟩
