On subset seeds for protein alignment

Abstract: We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is how to design efficient seed alphabets from which seeds with optimal sensitivity/selectivity trade-offs can be constructed. We propose several design methods and use them to construct several alphabets. We then perform a comparative analysis of seeds built over those alphabets and compare them with the standard BLASTP seeding method [2], [3], as well as with the family of vector seeds proposed in [4]. Although the formalism of subset seeds is less expressive (but less costly to implement) than the cumulative principle used in BLASTP and vector seeds, our seeds perform comparably to or even better than BLASTP on Bernoulli models of proteins compatible with the common BLOSUM62 matrix. Finally, we perform a large-scale benchmarking of our seeds against several major databases of protein alignments. Here again, our seeds perform comparably to or better than BLASTP.
Document type:
Journal article
IEEE/ACM Transactions on Computational Biology and Bioinformatics, Institute of Electrical and Electronics Engineers, 2009, 6 (3), pp.483-494. 〈10.1109/TCBB.2009.4〉

https://hal.inria.fr/inria-00354773
Contributor: Laurent Noé
Submitted on: Wednesday, January 21, 2009 - 00:39:53
Last modified on: Thursday, January 11, 2018 - 06:22:13
Document(s) archived on: Tuesday, June 8, 2010 - 18:57:11

Files

TCBBrevision.pdf
Files produced by the author(s)

Citation

Mikhail Roytberg, Anna Gambin, Laurent Noé, Slawomir Lasota, Eugenia Furletova, et al. On subset seeds for protein alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, Institute of Electrical and Electronics Engineers, 2009, 6 (3), pp.483-494. 〈10.1109/TCBB.2009.4〉. 〈inria-00354773〉
