H. Akaike, Statistical predictor identification, Annals of the Institute of Statistical Mathematics, vol.22, issue.1, pp.203-217, 1970.
DOI : 10.1007/BF02506337

M. Anthony and P. L. Bartlett, Neural Network Learning: Theoretical Foundations, 1999.
DOI : 10.1017/CBO9780511624216

C. Borgelt and R. Kruse, Graphical Models: Methods for Data Analysis and Mining, 2002.

C. G. Broyden, The Convergence of a Class of Double-rank Minimization Algorithms, IMA Journal of Applied Mathematics, vol.6, issue.3, pp.222-231, 1970.
DOI : 10.1093/imamat/6.3.222

R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM Journal on Scientific Computing, vol.16, issue.5, pp.1190-1208, 1995.

J. Cheng, D. Bell, and W. Liu, An algorithm for Bayesian network construction from data, 6th International Workshop on Artificial Intelligence and Statistics, pp.83-90, 1997.

J. Cheng, D. Bell, and W. Liu, Learning belief networks from data, Proceedings of the Sixth International Conference on Information and Knowledge Management (CIKM '97), pp.325-331, 1997.
DOI : 10.1145/266714.266920

J. Cheng, R. Greiner, J. Kelly, D. Bell, and W. Liu, Learning Bayesian networks from data: An information-theory based approach, Artificial Intelligence, vol.137, issue.1-2, pp.43-90, 2002.
DOI : 10.1016/S0004-3702(02)00191-1

D. M. Chickering, Transformational Characterization of Equivalent Bayesian Network Structures, Proceedings of the 11th Annual Conference on Uncertainty in Artificial Intelligence (UAI-95), pp.87-98, 1995.

D. M. Chickering, Learning equivalence classes of Bayesian-network structures, Journal of Machine Learning Research, vol.2, pp.445-498, 2002.

D. M. Chickering, Optimal structure identification with greedy search, Journal of Machine Learning Research, vol.3, pp.507-554, 2002.

D. M. Chickering and D. Heckerman, Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables, Machine Learning, vol.29, issue.2/3, pp.181-212, 1997.
DOI : 10.1023/A:1007469629108

G. F. Cooper and E. Herskovits, A Bayesian method for the induction of probabilistic networks from data, Machine Learning, vol.9, issue.4, pp.309-347, 1992.
DOI : 10.1007/BF00994110

R. Fletcher, A new approach to variable metric algorithms, The Computer Journal, vol.13, issue.3, pp.317-322, 1970.
DOI : 10.1093/comjnl/13.3.317

D. Goldfarb, A family of variable-metric methods derived by variational means, Mathematics of Computation, vol.24, issue.109, pp.23-26, 1970.
DOI : 10.1090/S0025-5718-1970-0258249-6

H. Guo, E. Horvitz, W. Hsu, and E. Santos, A Survey of Algorithms for Real-Time Bayesian Network Inference, 2002.

D. Heckerman, D. Geiger, and D. M. Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, 1994.

A. N. Kolmogorov and V. M. Tikhomirov, ε-entropy and ε-capacity of sets in functional spaces, American Mathematical Society Translations, Series 2, vol.17, pp.277-364, 1961.

F. R. Kschischang, B. Frey, and H. Loeliger, Factor graphs and the sum-product algorithm, IEEE Transactions on Information Theory, vol.47, issue.2, pp.498-519, 2001.
DOI : 10.1109/18.910572

S. L. Lauritzen and D. J. Spiegelhalter, Local computations with probabilities on graphical structures and their applications to expert systems, Journal of the Royal Statistical Society, Series B, vol.50, issue.2, pp.157-224, 1988.

J. C. Meza, OPT++: An Object-Oriented Class Library for Nonlinear Optimization, 1994.
DOI : 10.2172/10136172

J. Rissanen, Modeling by shortest data description, Automatica, vol.14, issue.5, pp.465-471, 1978.
DOI : 10.1016/0005-1098(78)90005-5

C. P. Robert, The Bayesian Choice: A Decision-Theoretic Motivation, 1994.
DOI : 10.1007/978-1-4757-4314-2

G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, vol.6, issue.2, pp.461-464, 1978.

D. F. Shanno, Conditioning of quasi-Newton methods for function minimization, Mathematics of Computation, vol.24, issue.111, pp.647-656, 1970.
DOI : 10.1090/S0025-5718-1970-0274029-X

A. Srinivasan, Low-discrepancy Sets for High-Dimensional Rectangles: A Survey, Bulletin of the European Association for Theoretical Computer Science, vol.70, pp.67-76, 2000.

V. Vapnik, The Nature of Statistical Learning Theory, 1995.

V. Vapnik and A. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, Theory of Probability and Its Applications, vol.16, issue.2, pp.264-280, 1971.

M. Vidyasagar, A Theory of Learning and Generalization, 1997.

P. Wocjan, D. Janzing, and T. Beth, Required sample size for learning sparse Bayesian networks with many variables, LANL e-print cs, 2002.

C. Zhu, R. H. Byrd, P. Lu, and J. Nocedal, L-BFGS-B: a limited memory FORTRAN code for solving bound constrained optimization problems, 1994.