Coding Region Prediction Based on a Universal DNA Sequence Representation Method

Abstract : Graphical representation of DNA sequences provides a simple and intuitive way of viewing, anchoring, and comparing gene structures, so a simple, non-degenerate method is attractive to both biologists and computational biologists. In this study, a universal graphical representation method for DNA sequences, based on S.S.-T. Yau's method, is presented. The method uses a trigonometric function to represent the four nucleotides A, G, C, and T. Some interesting characteristics of the universal representation are introduced. We apply frequency analysis to DNA sequences encoded with this representation, demonstrating possible applications in coding region prediction and sequence analysis. Based on the statistical results of this frequency analysis, a simple coding region predictor and an optimized one are presented. An experiment on the widely accepted ROSETTA data set demonstrates that the performance of the optimized predictor is comparable to that of other popular methods.
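To make the idea of frequency analysis on a trigonometric encoding concrete, the sketch below maps each nucleotide to the sine of an angle and measures spectral power at frequency 1/3 in sliding windows, since period-3 periodicity is the signal commonly associated with protein-coding regions. It is only an illustration under stated assumptions: the angles in ANGLES, the use of the sine, the window and step sizes, and the function names encode, period3_power, and sliding_scores are all hypothetical and are not taken from the paper, whose actual representation and predictors may differ.

```python
import cmath
import math

# Hypothetical angles (radians) per nucleotide; illustrative only,
# not the values used in the paper.
ANGLES = {
    "A": -math.pi / 3,
    "G": -math.pi / 6,
    "C": math.pi / 6,
    "T": math.pi / 3,
}


def encode(seq):
    """Map each nucleotide to sin(angle), giving one real value per base."""
    return [math.sin(ANGLES[base]) for base in seq.upper()]


def period3_power(signal):
    """Normalized squared magnitude of the DFT coefficient at frequency 1/3."""
    n = len(signal)
    mean = sum(signal) / n  # subtract the mean so a constant offset does not leak in
    coeff = sum(
        (x - mean) * cmath.exp(-2j * math.pi * k / 3)
        for k, x in enumerate(signal)
    )
    return abs(coeff) ** 2 / n


def sliding_scores(seq, window=351, step=3):
    """Return (start, period-3 power) pairs for each window along the sequence."""
    signal = encode(seq)
    return [
        (start, period3_power(signal[start:start + window]))
        for start in range(0, len(signal) - window + 1, step)
    ]
```

Under these assumptions, windows whose score exceeds some threshold would be flagged as candidate coding regions; choosing that threshold, and handling ambiguous bases such as N, is left out of the sketch.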
Document type :
Journal articles

https://hal.inria.fr/inria-00347594
Contributor : Dominique Lavenier
Submitted on : Tuesday, December 16, 2008 - 11:42:50 AM
Last modification on : Thursday, August 1, 2019 - 2:12:40 PM

Citation

Dominique Lavenier, Xianyang Jiang, Stephen Yau. Coding Region Prediction Based on a Universal DNA Sequence Representation Method. Journal of Computational Biology, Mary Ann Liebert, 2008, 15 (10), pp.1237-1256. ⟨10.1089/cmb.2008.0041⟩. ⟨inria-00347594⟩
