G. Bejerano and G. Yona, Modeling protein families using probabilistic suffix trees, Proc. 3rd Ann. Conf. Computational Molecular Biology (RECOMB), pp.15-24, 1999.

G. Bennett, Probability inequalities for the sum of independent random variables, Journal of the American Statistical Association, vol.57, pp.33-45, 1962.

K. P. Burnham and D. R. Anderson, Model Selection and Inference: A Practical Information-Theoretic Approach, 1998.

P. G. Ferreira and P. J. Azevedo, Chapter VI: Deterministic motif mining in protein databases, Successes and New Directions in Data Mining, 2007.

B. Grant, A. Rodrigues, K. ElSawy, J. McCammon, and L. Caves, Bio3d: An R package for the comparative analysis of protein structures, Bioinformatics, vol.22, pp.2695-2696, 2006.

D. Hawkins, Identification of Outliers, 1980.

C. M. Hurvich and C. L. Tsai, Regression and time series model selection in small samples, Biometrika, vol.76, issue.2, pp.297-307, 1989.

E. M. Knorr and R. T. Ng, Algorithms for mining distance-based outliers in large datasets, Proc. 24th Int. Conf. Very Large Data Bases, VLDB, pp.392-403, 1998.

D. Ron, Y. Singer, and N. Tishby, The power of amnesia: Learning probabilistic automata with variable memory length, Machine Learning, vol.25, issue.2-3, pp.117-149, 1996.

G. Schwarz, Estimating the dimension of a model, Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.

C. Shannon, A mathematical theory of communication, Bell System Technical Journal, vol.27, pp.379-423, 1948.

N. Sugiura, Further analysis of the data by Akaike's information criterion and the finite corrections, Communications in Statistics: Theory and Methods, vol.7, pp.13-26, 1978.

P. Sun, S. Chawla, and B. Arunasalam, Mining for outliers in sequential databases, Proc. 6th SIAM Int. Conf. Data Mining, pp.94-105, 2006.

R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, 2006.