IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences
Abstract
Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes, and they have been linked with numerous possible functions. Many international consortia provide a comprehensive description of common genetic variation, which makes alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets.
Results: We present IUPACpal, an exact tool for the efficient identification of inverted repeats in IUPAC-encoded DNA sequences that also allows for mismatches and gaps in the inverted repeats.
Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many inverted repeats that EMBOSS does not, and that it does so orders of magnitude faster.
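To make the definition above concrete, the following Python sketch searches an IUPAC-encoded string for inverted repeats by brute force, extending two arms outward from every possible centre and gap. It is not IUPACpal's algorithm, only a naive illustration of the problem it solves. It assumes that two IUPAC symbols pair when some base represented by one is the Watson-Crick complement of some base represented by the other (a non-empty intersection of base sets), that mismatches are counted per arm position, and that an arm is trimmed back to its last matching pair. The names `pairs`, `reverse_complement`, `find_inverted_repeats` and the returned tuple layout are hypothetical.

```python
# Naive illustration (hypothetical helper names; not IUPACpal's algorithm).

# Set of bases each IUPAC nucleotide code stands for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}

# Complement of each code, e.g. R (A/G) <-> Y (C/T), B (C/G/T) <-> V (A/C/G).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an IUPAC-encoded sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pairs(x: str, y: str) -> bool:
    """Assumed rule: x pairs with y if complement(x) and y share a base."""
    return bool(IUPAC[x.translate(COMPLEMENT)] & IUPAC[y])


def find_inverted_repeats(seq: str, min_len: int, max_gap: int, max_mismatches: int):
    """Return (left_start, left_end, right_start, right_end, mismatches) tuples.

    For every left-arm end i and gap length, both arms are extended outward
    until the sequence ends or the mismatch budget is exhausted. Overlapping,
    non-maximal hits are not filtered out.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n):                       # i = last position of the left arm
        for gap in range(max_gap + 1):
            j = i + gap + 1                  # j = first position of the right arm
            if j >= n:
                break
            k = mm = 0                       # positions extended, mismatches used
            best_k = best_mm = 0             # longest extension ending on a match
            while i - k >= 0 and j + k < n:
                if pairs(seq[i - k], seq[j + k]):
                    k += 1
                    best_k, best_mm = k, mm
                elif mm < max_mismatches:
                    mm += 1
                    k += 1
                else:
                    break
            if best_k >= min_len:
                hits.append((i - best_k + 1, i, j, j + best_k - 1, best_mm))
    return hits


if __name__ == "__main__":
    # Arm "AACCG", gap "TTT", then its reverse complement with G replaced by R (A/G).
    print(find_inverted_repeats("AACCGTTTCRGTT", min_len=5, max_gap=3, max_mismatches=0))
```

Scanning every centre and gap and then extending costs time proportional to the sequence length, the maximum gap and the arm length, which is acceptable for illustration but is exactly the cost an efficient tool aims to avoid.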
Domains
Life Sciences [q-bio]
Origin: Publisher files authorised for deposit in an open archive