J. Aczél and Z. Daróczy, On Measures of Information and Their Characterizations, Mathematics in Science and Engineering, vol.115, 1975.

A. Agarwal and H. Daumé III, A geometric view of conjugate priors, Machine Learning, vol.81, issue.1, 2010.
DOI : 10.1007/s10994-010-5203-x

H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control, vol.19, issue.6, pp.716-723, 1974.
DOI : 10.1109/TAC.1974.1100705

S. M. Ali and S. D. Silvey, A general class of coefficients of divergence of one distribution from another, Journal of the Royal Statistical Society, Series B (Methodological), vol.28, issue.1, pp.131-142, 1966.

Y. Altun and A. Smola, Unifying Divergence Minimization and Statistical Inference Via Convex Duality, Proceedings of the 19th Annual Conference on Learning Theory (COLT'06), pp.139-153, 2006.
DOI : 10.1007/11776420_13

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.94.5263

S. Amari, Differential-Geometrical Methods in Statistics, Lecture Notes In Statistics, vol.28, 1985.
DOI : 10.1007/978-1-4612-5056-2

S. Amari, Information geometry on hierarchy of probability distributions, IEEE Transactions on Information Theory, vol.47, issue.5, pp.1701-1711, 2001.
DOI : 10.1109/18.930911

S. Amari, Integration of Stochastic Models by Minimizing α-Divergence, Neural Computation, vol.19, issue.10, pp.2780-2796, 2007.
DOI : 10.1162/08997660460734047

URL : https://hal.archives-ouvertes.fr/hal-01407788

S. Amari, α-Divergence Is Unique, Belonging to Both f-Divergence and Bregman Divergence Classes, IEEE Transactions on Information Theory, vol.55, issue.11, pp.4925-4931, 2009.
DOI : 10.1109/TIT.2009.2030485

S. Amari, Information Geometry and Its Applications: Convex Function and Dually Flat Manifold, Emerging Trends in Visual Computing - LIX Colloquium, pp.75-102, 2008.
DOI : 10.1162/08997660460734047

S. Amari, Information geometry derived from divergence functions, Proceedings of the 3rd International Symposium on Information Geometry and its Applications, 2010.

S. Amari and H. Nagaoka, Methods of Information Geometry, volume 191 of Translations of Mathematical Monographs, 2000.

E. Arikan, An inequality on guessing and its application to sequential decoding, IEEE Transactions on Information Theory, vol.42, issue.1, pp.99-105, 1996.
DOI : 10.1109/18.481781

S. Arimoto, Information-theoretical considerations on estimation problems, Information and Control, vol.19, issue.3, pp.181-194, 1971.
DOI : 10.1016/S0019-9958(71)90065-9

S. Arimoto, Information measures and capacity of order α for discrete memoryless channels, Topics in Information Theory - 2nd Colloquium, pp.41-52, 1975.

V. Arsigny, P. Fillard, X. Pennec, and N. Ayache, Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite Matrices, SIAM Journal on Matrix Analysis and Applications, vol.29, issue.1, pp.328-347, 2007.
DOI : 10.1137/050637996

URL : https://hal.archives-ouvertes.fr/inria-00616031

K. A. Arwini and C. T. Dodson, Information Geometry - Near Randomness and Near Independence, Lecture Notes in Mathematics, vol.1953, 2008.

S. Aviyente, L. A. Brakel, R. K. Kushwaha, M. Snodgrass, H. Shevrin et al., Characterization of Event Related Potentials Using Information Theoretic Distance Measures, IEEE Transactions on Biomedical Engineering, vol.51, issue.5, pp.737-743, 2004.
DOI : 10.1109/TBME.2004.824133

A. Banerjee, I. Dhillon, J. Ghosh, and S. Merugu, An information theoretic analysis of maximum likelihood mixture estimation for exponential families, Proceedings of the Twenty-First International Conference on Machine Learning (ICML '04), 2004.
DOI : 10.1145/1015330.1015431

A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha, A generalized maximum entropy approach to Bregman co-clustering and matrix approximation, Journal of Machine Learning Research, vol.8, pp.1919-1986, 2007.
DOI : 10.1145/1014052.1014111

A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh, Clustering with Bregman Divergences, Journal of Machine Learning Research, vol.6, pp.1705-1749, 2005.
DOI : 10.1137/1.9781611972740.22

O. E. Barndorff-Nielsen, D. R. Cox, and N. Reid, The Role of Differential Geometry in Statistical Theory, International Statistical Review / Revue Internationale de Statistique, vol.54, issue.1, pp.83-96, 1986.
DOI : 10.2307/1403260

M. Basseville, Information: entropies, divergences et moyennes, 1996.
URL : https://hal.archives-ouvertes.fr/inria-00490399

M. Basseville, Information criteria for residual generation and fault detection and isolation, Automatica, vol.33, issue.5, pp.783-803, 1997.
DOI : 10.1016/S0005-1098(97)00004-6

URL : https://hal.archives-ouvertes.fr/inria-00073800

M. Basseville and J. Cardoso, On entropies, divergences, and mean values, Proceedings of 1995 IEEE International Symposium on Information Theory, p.330, 1995.
DOI : 10.1109/ISIT.1995.550317

A. Basu, I. R. Harris, N. Hjort, and M. C. Jones, Robust and efficient estimation by minimising a density power divergence, Biometrika, vol.85, issue.3, pp.549-559, 1998.
DOI : 10.1093/biomet/85.3.549

A. Basu and B. G. Lindsay, Minimum disparity estimation for continuous models: Efficiency, distributions and robustness, Annals of the Institute of Statistical Mathematics, vol.46, issue.4, pp.683-705, 1994.
DOI : 10.1007/BF00773476

A. Basu and B. G. Lindsay, The iteratively reweighted estimating equation in minimum distance problems, Computational Statistics & Data Analysis, vol.45, issue.2, pp.105-124, 2004.
DOI : 10.1016/S0167-9473(02)00326-2

A. Basu, H. Shioya, and C. Park, Statistical Inference: The Minimum Distance Approach, 2011.
DOI : 10.1201/9781315374062-9

H. H. Bauschke, Duality for Bregman projections onto translated cones and affine subspaces, Journal of Approximation Theory, vol.121, issue.1, pp.1-12, 2003.
DOI : 10.1016/S0021-9045(02)00040-0

J. Bercher, On some entropy functionals derived from Rényi information divergence, Information Sciences, vol.178, issue.12, pp.2489-2506, 2008.
DOI : 10.1016/j.ins.2008.02.003

A. Bhattacharyya, On a measure of divergence between two statistical populations defined by their probability distributions, Bulletin of the Calcutta Mathematical Society, vol.35, pp.99-109, 1943.

L. Birgé, A New Lower Bound for Multiple Hypothesis Testing, IEEE Transactions on Information Theory, vol.51, issue.4, pp.1611-1615, 2005.
DOI : 10.1109/TIT.2005.844101

R. E. Blahut, Hypothesis testing and information theory, IEEE Transactions on Information Theory, vol.20, issue.4, pp.405-417, 1974.
DOI : 10.1109/TIT.1974.1055254

R. E. Blahut, Principles and Practice of Information Theory. Series in Electrical and Computer Engineering, 1987.

J. Boets, K. De Cock, and B. De Moor, A Mutual Information Based Distance for Multivariate Gaussian Processes, Modeling, pp.15-33, 2007.
DOI : 10.1007/978-3-540-73570-0_3

L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, USSR Computational Mathematics and Mathematical Physics, vol.7, issue.3, pp.200-217, 1967.
DOI : 10.1016/0041-5553(67)90040-7

M. Broniatowski and A. Keziou, Minimization of φ-divergences on sets of signed measures, Studia Scientiarum Mathematicarum Hungarica, vol.43, issue.4, pp.403-442, 2006.
DOI : 10.1556/SScMath.43.2006.4.2

URL : https://hal.archives-ouvertes.fr/hal-00467649

M. Broniatowski and A. Keziou, Parametric estimation and tests through divergences and the duality technique, Journal of Multivariate Analysis, vol.100, issue.1, pp.16-36, 2009.
DOI : 10.1016/j.jmva.2008.03.011

M. Broniatowski and A. Keziou, On generalized empirical likelihood methods, 2010.

M. Broniatowski and I. Vajda, Several applications of divergence criteria in continuous families, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00430179

J. Burbea and C. R. Rao, Entropy differential metric, distance and divergence measures in probability spaces: A unified approach, Journal of Multivariate Analysis, vol.12, issue.4, pp.575-596, 1982.
DOI : 10.1016/0047-259X(82)90065-3

J. Burbea and C. R. Rao, On the convexity of higher order Jensen differences based on entropy functions (Corresp.), IEEE Transactions on Information Theory, vol.28, issue.6, pp.961-963, 1982.
DOI : 10.1109/TIT.1982.1056573

J. Burbea and C. R. Rao, On the convexity of some divergence measures based on entropy functions, IEEE Transactions on Information Theory, vol.28, issue.3, pp.489-495, 1982.
DOI : 10.1109/TIT.1982.1056497

J. Burg, D. Luenberger, and D. Wenger, Estimation of structured covariance matrices, Proceedings of the IEEE, pp.963-974, 1982.
DOI : 10.1109/PROC.1982.12427

C. I. Byrnes, T. T. Georgiou, and A. Lindquist, A generalized entropy criterion for Nevanlinna-Pick interpolation with degree constraint, IEEE Transactions on Automatic Control, vol.45, issue.6, pp.822-839, 2001.
DOI : 10.1109/9.928584

M. A. Carreira-Perpiñán and G. E. Hinton, On contrastive divergence learning, Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics (AISTATS'05), pp.59-66, 2005.

L. Cayton, Fast nearest neighbor retrieval for Bregman divergences, Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp.112-119, 2008.
DOI : 10.1145/1390156.1390171

L. Cayton, Efficient Bregman range search, Advances in Neural Information Processing Systems, pp.243-251, 2009.

H. Chernoff, A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations, The Annals of Mathematical Statistics, vol.23, issue.4, pp.493-507, 1952.
DOI : 10.1214/aoms/1177729330

A. Cichocki and S. Amari, Families of Alpha-, Beta- and Gamma-Divergences: Flexible and Robust Measures of Similarities, Entropy, vol.12, issue.6, pp.1532-1568, 2010.
DOI : 10.3390/e12061532

A. Cichocki, R. Zdunek, and S. Amari, Csiszár's Divergences for Non-negative Matrix Factorization: Family of New Algorithms, Proceedings of the 6th International Conference on Independent Component Analysis and Blind Source Separation (ICA'06), pp.32-39, 2006.
DOI : 10.1007/11679363_5

A. Cichocki, R. Zdunek, and S. Amari, Nonnegative matrix and tensor factorization, IEEE Signal Processing Magazine, vol.25, issue.1, pp.142-145, 2008.
DOI : 10.1002/9780470747278

A. Cichocki, R. Zdunek, A. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, 2009.
DOI : 10.1002/9780470747278

M. Collins, R. E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances, Machine Learning, vol.48, pp.253-285, 2002.

T. M. Cover and J. A. Thomas, Elements of Information Theory, 1991.

T. M. Cover and J. A. Thomas, Elements of Information Theory, Second Edition, 2006.

I. Csiszár, Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten, Magyar Tud. Akad. Mat. Kutató Int. Közl., vol.8, pp.85-108, 1963.

I. Csiszár, Information-type measures of difference of probability distributions and indirect observation, Studia Scientiarum Mathematicarum Hungarica, vol.2, pp.299-318, 1967.

I. Csiszár, On topological properties of f-divergence, Studia Scientiarum Mathematicarum Hungarica, vol.2, pp.329-339, 1967.

I. Csiszár, I-Divergence Geometry of Probability Distributions and Minimization Problems, The Annals of Probability, vol.3, issue.1, pp.146-158, 1975.
DOI : 10.1214/aop/1176996454

I. Csiszár, Information measures: a critical survey, Transactions of the 7th Conference on Information Theory, Statistical Decision Functions, Random Processes, pp.73-86, 1974.

I. Csiszár, Why Least Squares and Maximum Entropy? An Axiomatic Approach to Inference for Linear Inverse Problems, The Annals of Statistics, vol.19, issue.4, pp.2032-2066, 1991.
DOI : 10.1214/aos/1176348385

I. Csiszár, Generalized cutoff rates and Rényi's information measures, IEEE Transactions on Information Theory, vol.41, issue.1, pp.26-34, 1995.
DOI : 10.1109/18.370121

I. Csiszár, Axiomatic Characterizations of Information Measures, Entropy, vol.10, issue.3, pp.261-273, 2008.
DOI : 10.3390/e10030261

I. Csiszár and F. Matúš, On minimization of multivariate entropy functionals, 2009 IEEE Information Theory Workshop on Networking and Information Theory, pp.96-100, 2009.
DOI : 10.1109/ITWNIT.2009.5158549

M. Das Gupta and T. S. Huang, Bregman distance to L1 regularized logistic regression, 2010.

S. Della Pietra, V. Della Pietra, and J. Lafferty, Duality and auxiliary functions for Bregman distances, 2002.

A. Dembo, Information inequalities and concentration of measure, The Annals of Probability, vol.25, issue.2, pp.927-939, 1997.
DOI : 10.1214/aop/1024404424

A. Dembo, T. M. Cover, and J. A. Thomas, Information theoretic inequalities, IEEE Transactions on Information Theory, vol.37, issue.6, pp.1501-1518, 1991.
DOI : 10.1109/18.104312

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, volume 31 of Stochastic Modelling and Applied Probability, 1996.

I. S. Dhillon, S. Mallela, and R. Kumar, A divisive information-theoretic feature clustering algorithm for text classification, Journal of Machine Learning Research, vol.3, pp.1265-1287, 2003.

I. S. Dhillon and S. Sra, Generalized nonnegative matrix approximations with Bregman divergences, Advances in Neural Information Processing Systems, pp.283-290, 2005.

I. S. Dhillon and J. A. Tropp, Matrix Nearness Problems with Bregman Divergences, SIAM Journal on Matrix Analysis and Applications, vol.29, issue.4, pp.1120-1146, 2007.
DOI : 10.1137/060649021

URL : http://authors.library.caltech.edu/9428/1/DHIsiamjmaa07.pdf

D. Donoho and V. Stodden, When does non-negative matrix factorization give a correct decomposition into parts?, Advances in Neural Information Processing Systems, 2003.

M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, II, Communications on Pure and Applied Mathematics, vol.28, issue.2, pp.279-301, 1975.
DOI : 10.1002/cpa.3160280206

I. L. Dryden, A. Koloydenko, D. Zhou, and B. Li, Non-Euclidean statistical analysis of covariance matrices and diffusion tensors, 2010.

S. Eguchi and S. Kato, Entropy and Divergence Associated with Power Function and the Statistical Application, Entropy, vol.12, issue.2, pp.262-274, 2010.
DOI : 10.3390/e12020262

D. M. Endres and J. E. Schindelin, A new metric for probability distributions, IEEE Transactions on Information Theory, vol.49, issue.7, pp.1858-1860, 2003.
DOI : 10.1109/TIT.2003.813506

M. D. Esteban, A general class of entropy statistics, Applications of Mathematics, vol.42, issue.3, pp.161-169, 1997.
DOI : 10.1023/A:1022447020419

A. Ferrante, M. Pavon, and F. Ramponi, Hellinger Versus Kullback-Leibler Multivariable Spectrum Approximation, IEEE Transactions on Automatic Control, vol.53, issue.4, pp.954-967, 2008.
DOI : 10.1109/TAC.2008.920238

D. Ferrari and Y. Yang, Maximum Lq-likelihood estimation, The Annals of Statistics, vol.38, issue.2, pp.753-783, 2010.
DOI : 10.1214/09-AOS687

L. Finesso and P. Spreij, Nonnegative matrix factorization and I-divergence alternating minimization, Linear Algebra and its Applications, vol.416, issue.2-3, pp.270-287, 2006.
DOI : 10.1016/j.laa.2005.11.012

A. Fischer, Quantization and clustering with Bregman divergences, Journal of Multivariate Analysis, vol.101, issue.9, pp.2207-2221, 2010.
DOI : 10.1016/j.jmva.2010.05.008

URL : http://doi.org/10.1016/j.jmva.2010.05.008

B. A. Frigyik, S. Srivastava, and M. R. Gupta, Functional Bregman Divergence and Bayesian Estimation of Distributions, IEEE Transactions on Information Theory, vol.54, issue.11, pp.5130-5139, 2008.
DOI : 10.1109/TIT.2008.929943

Y. Fujimoto and N. Murata, A modified EM algorithm for mixture models based on Bregman divergence, Annals of the Institute of Statistical Mathematics, vol.59, issue.1, pp.3-25, 2007.
DOI : 10.1007/s10463-006-0097-x

C. Févotte, N. Bertin, and J.-L. Durrieu, Nonnegative Matrix Factorization with the Itakura-Saito Divergence: With Application to Music Analysis, Neural Computation, vol.21, issue.3, pp.793-830, 2009.
DOI : 10.1016/j.sigpro.2007.01.024

C. Févotte and J. Idier, Algorithms for nonnegative matrix factorization with the beta-divergence, 2010.

T. T. Georgiou, Relative entropy and the multivariable multidimensional moment problem, IEEE Transactions on Information Theory, vol.52, issue.3, pp.1052-1066, 2006.
DOI : 10.1109/TIT.2005.864422

T. T. Georgiou, Distances and Riemannian Metrics for Spectral Density Functions, IEEE Transactions on Signal Processing, vol.55, issue.8, pp.3995-4003, 2007.
DOI : 10.1109/TSP.2007.896119

T. T. Georgiou, J. Karlsson, and M. S. Takyar, Metrics for Power Spectra: An Axiomatic Approach, IEEE Transactions on Signal Processing, vol.57, issue.3, pp.859-867, 2009.
DOI : 10.1109/TSP.2008.2010009

T. T. Georgiou and A. Lindquist, Kullback-Leibler approximation of spectral density functions, IEEE Transactions on Information Theory, vol.49, issue.11, pp.2910-2917, 2003.
DOI : 10.1109/TIT.2003.819324

T. T. Georgiou and A. Lindquist, A Convex Optimization Approach to ARMA Modeling, IEEE Transactions on Automatic Control, vol.53, issue.5, pp.1108-1119, 2008.
DOI : 10.1109/TAC.2008.923684

P. Gibilisco, E. Riccomagno, M. P. Rogantin, and H. P. Wynn, Algebraic and Geometric Methods in Statistics, 2010.

G. L. Gilardoni, On Pinsker's and Vajda's Type Inequalities for Csiszár's f-Divergences, IEEE Transactions on Information Theory, vol.56, issue.11, pp.5377-5386, 2010.
DOI : 10.1109/TIT.2010.2068710

A. H. Gray and J. D. Markel, Distance measures for speech processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.24, issue.5, pp.380-391, 1976.
DOI : 10.1109/TASSP.1976.1162849

R. M. Gray, Entropy and Information Theory, 1990.

R. M. Gray, Entropy and Information Theory, Second Edition, 2010.

R. M. Gray, A. Buzo, A. H. Gray, and Y. Matsuyama, Distortion measures for speech processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.28, issue.4, pp.367-376, 1980.
DOI : 10.1109/TASSP.1980.1163421

P. D. Grünwald and A. P. Dawid, Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory, Annals of Statistics, vol.32, issue.4, pp.1367-1433, 2004.

A. Guntuboyina, Lower bounds for the minimax risk using f-divergences and applications, 2010.

L. Györfi and T. Nemetz, f-dissimilarity: A generalization of the affinity of several distributions, Annals of the Institute of Statistical Mathematics, vol.26, issue.1, pp.105-113, 1978.
DOI : 10.1007/BF02480206

P. Harremoës and I. Vajda, Joint range of f-divergences, 2010 IEEE International Symposium on Information Theory, pp.1345-1349, 2010.
DOI : 10.1109/ISIT.2010.5513445

P. Harremoës and I. Vajda, On Bahadur efficiency of power divergence statistics, 2010.

P. Harremoës and I. Vajda, On pairs of f-divergences and their joint range, 2010.

P. Harremoës and C. Vignat, Rényi Entropies of Projections, 2006 IEEE International Symposium on Information Theory, pp.1827-1830, 2006.
DOI : 10.1109/ISIT.2006.261750

J. Havrda and F. Charvát, Quantification method of classification processes: concept of structural α-entropy, Kybernetika, vol.3, pp.30-35, 1967.

Y. He, A. B. Hamza, and H. Krim, A generalized divergence measure for robust image registration, IEEE Transactions on Signal Processing, vol.51, issue.5, pp.1211-1220, 2003.

A. O. Hero, B. Ma, O. Michel, and J. Gorman, Alpha-divergence for classification, indexing and retrieval, 2001.

G. E. Hinton, Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation, vol.14, issue.8, pp.1771-1800, 2002.
DOI : 10.1162/089976600300015385

A. Hyvärinen, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research, vol.6, pp.695-709, 2005.

A. Hyvärinen, Some extensions of score matching, Computational Statistics & Data Analysis, vol.51, issue.5, pp.2499-2512, 2007.
DOI : 10.1016/j.csda.2006.09.003

W. James and C. Stein, Estimation with Quadratic Loss, Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, pp.361-379, 1961.
DOI : 10.1007/978-1-4612-0919-5_30

O. Johnson and A. Barron, Fisher information inequalities and the central limit theorem, Probability Theory and Related Fields, pp.391-409, 2004.

R. Johnson, Axiomatic characterization of the directed divergences and their linear combinations, IEEE Transactions on Information Theory, vol.25, issue.6, pp.709-716, 1979.
DOI : 10.1109/TIT.1979.1056113

L. K. Jones and C. L. Byrne, General entropy criteria for inverse problems, with applications to data compression, pattern classification, and cluster analysis, IEEE Transactions on Information Theory, vol.36, issue.1, pp.23-30, 1990.
DOI : 10.1109/18.50370

M. C. Jones, N. Hjort, I. R. Harris, and A. Basu, A comparison of related density-based minimum divergence estimators, Biometrika, vol.88, issue.3, pp.865-873, 2001.
DOI : 10.1093/biomet/88.3.865

A. Kagan and T. Yu, Some inequalities related to the Stam inequality, Applications of Mathematics, vol.53, issue.3, pp.195-205, 2008.
DOI : 10.1007/s10492-008-0004-2

T. Kanamori and A. Ohara, A Bregman extension of quasi-Newton updates I: an information geometrical framework, Optimization Methods and Software, vol.39, issue.1, 2010.
DOI : 10.1007/s10107-007-0137-1

T. Kanamori and A. Ohara, A Bregman extension of quasi-Newton updates II: Convergence and robustness properties, 2010.

A. Karagrigoriou and T. Papaioannou, On Measures of Information and Divergence and Model Selection Criteria, Statistical Models and Methods for Biomedical and Technical Systems, Statistics for Industry and Technology, pp.503-518, 2008.
DOI : 10.1007/978-0-8176-4619-6_35

J. Karlsson, T. T. Georgiou, and A. G. Lindquist, The Inverse Problem of Analytic Interpolation With Degree Constraint and Weight Selection for Control Synthesis, IEEE Transactions on Automatic Control, vol.55, issue.2, pp.405-418, 2010.
DOI : 10.1109/TAC.2009.2037280

D. Kazakos, On resolution and exponential discrimination between Gaussian stationary vector processes and dynamic models, IEEE Transactions on Automatic Control, vol.25, issue.2, pp.294-296, 1980.
DOI : 10.1109/TAC.1980.1102275

D. Kazakos, Spectral distance measures between continuous-time vector Gaussian processes (Corresp.), IEEE Transactions on Information Theory, vol.28, issue.4, pp.679-684, 1982.
DOI : 10.1109/TIT.1982.1056521

D. Kazakos and P. Papantoni-kazakos, Spectral distance measures between Gaussian processes, IEEE Transactions on Automatic Control, vol.25, issue.5, pp.950-959, 1980.
DOI : 10.1109/TAC.1980.1102475

D. Kazakos and P. Papantoni-kazakos, Detection and Estimation, 1990.

M. Kim and S. Lee, Estimation of a tail index based on minimum density power divergence, Journal of Multivariate Analysis, vol.99, issue.10, pp.2453-2471, 2008.
DOI : 10.1016/j.jmva.2008.02.031

J. Kivinen and M. K. Warmuth, Boosting as entropy projection, Proceedings of the Twelfth Annual Conference on Computational Learning Theory (COLT '99), pp.134-144, 1999.
DOI : 10.1145/307400.307424

J. Kivinen, M. K. Warmuth, and B. Hassibi, The p-norm generalization of the LMS algorithm for adaptive filtering, IEEE Transactions on Signal Processing, vol.54, issue.5, pp.1782-1793, 2006.
DOI : 10.1109/TSP.2006.872551

L. Knockaert, A class of statistical and spectral distance measures based on Bose-Einstein statistics, IEEE Transactions on Signal Processing, vol.41, issue.11, pp.3171-3174, 1993.
DOI : 10.1109/78.257248

L. Knockaert, Statistical thermodynamics and natural f-divergences, 1994.

L. Knockaert, On scale and concentration invariance in entropies, Information Sciences, vol.152, pp.139-144, 2003.
DOI : 10.1016/S0020-0255(03)00058-6

R. Kompass, A Generalized Divergence Measure for Nonnegative Matrix Factorization, Neural Computation, vol.19, issue.3, pp.780-791, 2007.
DOI : 10.1162/089976602320264033

B. Kulis, M. A. Sustik, and I. S. Dhillon, Low-rank kernel learning with Bregman matrix divergences, Journal of Machine Learning Research, vol.10, pp.341-376, 2009.

S. Kullback, J. C. Keegel, and J. H. Kullback, Topics in Statistical Information Theory, volume 42 of Lecture Notes in Statistics, 1987.

J. D. Lafferty, Statistical learning algorithms based on Bregman distances, Proceedings of the 1997 Canadian Workshop on Information Theory, pp.77-80, 1997.

J. D. Lafferty, Additive models, boosting, and inference for generalized divergences, Proceedings of the Twelfth Annual Conference on Computational Learning Theory (COLT '99), pp.125-133, 1999.
DOI : 10.1145/307400.307422

G. Le Besnerais, J.-F. Bercher, and G. Demoment, A new look at entropy for solving linear inverse problems, IEEE Transactions on Information Theory, vol.45, issue.5, pp.1565-1578, 1999.

G. Lebanon and J. Lafferty, Boosting and maximum likelihood for exponential models, Advances in Neural Information Processing Systems, 2001.

N. Leonenko and O. Seleznjev, Statistical inference for the ε-entropy and the quadratic Rényi entropy, Journal of Multivariate Analysis, vol.101, issue.9, pp.1981-1994, 2010.
DOI : 10.1016/j.jmva.2010.05.009

B. Levy and R. Nikoukhah, Robust Least-Squares Estimation With a Relative Entropy Constraint, IEEE Transactions on Information Theory, vol.50, issue.1, pp.89-104, 2004.
DOI : 10.1109/TIT.2003.821992

K. Li, W. Zhou, and S. Yu, Effective metric for detecting distributed denial-of-service attacks based on information divergence, IET Communications, vol.3, issue.12, pp.1851-1860, 2009.
DOI : 10.1049/iet-com.2008.0586

F. Liese and I. Vajda, Convex Statistical Distances, 1987.

F. Liese and I. Vajda, On Divergences and Informations in Statistics and Information Theory, IEEE Transactions on Information Theory, vol.52, issue.10, pp.4394-4412, 2006.
DOI : 10.1109/TIT.2006.881731

J. Lin, Divergence measures based on the Shannon entropy, IEEE Transactions on Information Theory, vol.37, issue.1, pp.145-151, 1991.
DOI : 10.1109/18.61115

B. G. Lindsay, Efficiency Versus Robustness: The Case for Minimum Hellinger Distance and Related Methods, The Annals of Statistics, vol.22, issue.2, pp.1081-1114, 1994.
DOI : 10.1214/aos/1176325512

E. Lutwak, D. Yang, and G. Zhang, Cramér-Rao and Moment-Entropy Inequalities for Rényi Entropy and Generalized Fisher Information, IEEE Transactions on Information Theory, vol.51, issue.2, pp.473-478, 2005.
DOI : 10.1109/TIT.2004.840871

S. Ma, D. Goldfarb, and L. Chen, Fixed point and Bregman iterative methods for matrix rank minimization, Mathematical Programming, Series A, 2010.
DOI : 10.1007/s10107-009-0306-5

D. J. C. MacKay, Information Theory, Inference & Learning Algorithms, 2003.

P. Maji, f-Information Measures for Efficient Selection of Discriminative Genes From Microarray Data, IEEE Transactions on Biomedical Engineering, vol.56, issue.4, pp.1063-1069, 2009.
DOI : 10.1109/TBME.2008.2004502

P. Maji and S. K. Pal, Feature Selection Using f-Information Measures in Fuzzy Approximation Spaces, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.6, pp.854-867, 2010.
DOI : 10.1109/TKDE.2009.124

M. Markatou, A. Basu, and B. G. Lindsay, Weighted Likelihood Equations with Bootstrap Root Search, Journal of the American Statistical Association, vol.93, issue.442, pp.740-750, 1998.
DOI : 10.1080/01621459.1998.10473726

A. Mathai and P. Rathie, Basic Concepts in Information Theory and Statistics, 1975.

Y. Matsuyama, Non-logarithmic information measures, α-weighted EM algorithms and speedup of learning, Proceedings of the 1998 IEEE International Symposium on Information Theory, p.385, 1998.
DOI : 10.1109/ISIT.1998.708990

Y. Matsuyama, The α-EM algorithm and its applications, Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), pp.592-595, 2000.
DOI : 10.1109/ICASSP.2000.862051

Y. Matsuyama, The α-EM algorithm: surrogate likelihood maximization using α-logarithmic information measures, IEEE Transactions on Information Theory, vol.49, issue.3, pp.692-706, 2003.
DOI : 10.1109/TIT.2002.808105

Y. Matsuyama, N. Katsumata, and S. Imahara, Convex divergence as a surrogate function for independence: The f-divergence, Proceedings of the 3rd International Conference on Independent Component Analysis and Blind Signal Separation, pp.31-36, 2001.

K. Mattheou, S. Lee, and A. Karagrigoriou, A model selection criterion based on the BHHJ measure of divergence, Journal of Statistical Planning and Inference, vol.139, issue.2, pp.228-235, 2009.
DOI : 10.1016/j.jspi.2008.04.022

F. Matúš, Divergence From Factorizable Distributions and Matroid Representations by Partitions, IEEE Transactions on Information Theory, vol.55, issue.12, pp.5375-5381, 2009.
DOI : 10.1109/TIT.2009.2032806

K. Matusita, Discrimination and the affinity of distributions, Discriminant Analysis and Applications, pp.213-223, 1973.
DOI : 10.1016/B978-0-12-154050-0.50018-6

N. Merhav, Data Processing Theorems and the Second Law of Thermodynamics, IEEE Transactions on Information Theory, vol.57, issue.8, 2010.
DOI : 10.1109/TIT.2011.2159052

M. Minami and S. Eguchi, Robust blind source separation by beta divergence, Neural Computation, vol.14, issue.8, pp.1859-1886, 2002.

T. Minka, Divergence measures and message passing, 2005.

A. Mnih and G. Hinton, Learning nonlinear constraints with contrastive backpropagation, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., pp.1302-1307, 2005.
DOI : 10.1109/IJCNN.2005.1556042

M. N. Mollah, M. Minami, and S. Eguchi, Exploring Latent Structure of Mixture ICA Models by the Minimum β-Divergence Method, Neural Computation, vol.18, issue.1, pp.166-190, 2006.
DOI : 10.1162/089976602760128045

T. Morimoto, Markov Processes and the H-Theorem, Journal of the Physical Society of Japan, vol.18, issue.3, pp.328-331, 1963.
DOI : 10.1143/JPSJ.18.328

URL : https://hal.archives-ouvertes.fr/hal-00658784

N. Murata, T. Takenouchi, T. Kanamori, and S. Eguchi, Information Geometry of U-Boost and Bregman Divergence, Neural Computation, vol.16, issue.7, pp.1437-1481, 2004.
DOI : 10.1162/089976604322860695

G. Nason, Robust projection indices, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.63, issue.3, pp.551-567, 2001.
DOI : 10.1111/1467-9868.00298

P. Nath, On a coding theorem connected with Rényi's entropy, Information and Control, vol.29, issue.3, pp.234-242, 1975.
DOI : 10.1016/S0019-9958(75)90404-0

X. Nguyen, M. J. Wainwright, and M. I. Jordan, On surrogate loss functions and f-divergences, The Annals of Statistics, vol.37, issue.2, pp.876-904, 2009.
DOI : 10.1214/08-AOS595

X. Nguyen, M. J. Wainwright, and M. I. Jordan, Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization, IEEE Transactions on Information Theory, vol.56, issue.11, pp.5847-5861, 2010.
DOI : 10.1109/TIT.2010.2068870

F. Nielsen and S. Boltz, The Burbea-Rao and Bhattacharyya centroids, 2010.

F. Nielsen and R. Nock, Sided and Symmetrized Bregman Centroids, IEEE Transactions on Information Theory, vol.55, issue.6, pp.2882-2904, 2009.
DOI : 10.1109/TIT.2009.2018176

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.172.2671

F. Nielsen, P. Piro, and M. Barlaud, Bregman vantage point trees for efficient nearest neighbor queries, 2009 IEEE International Conference on Multimedia and Expo, pp.878-881, 2009.
DOI : 10.1109/ICME.2009.5202635

URL : https://hal.archives-ouvertes.fr/hal-00481723

T. Nishimura and F. Komaki, The Information Geometric Structure of Generalized Empirical Likelihood Estimators, Communications in Statistics - Theory and Methods, vol.37, issue.12, pp.1867-1879, 2008.
DOI : 10.1162/08997660460734047

R. Nock and F. Nielsen, Bregman Divergences and Surrogates for Learning, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, issue.11, pp.2048-2059, 2009.
DOI : 10.1109/TPAMI.2008.225

L. Pardo, Statistical Inference Based on Divergence Measures. Chapman & Hall/CRC Monographs on Statistics & Applied Probability, 2006.

L. Pardo, M. Salicrú, M. L. Menéndez, and D. Morales, Divergence measures based on entropy functions and statistical inference, Sankhyā: The Indian Journal of Statistics, Series B, vol.57, issue.3, pp.315-337, 1995.

M. Pardo and I. Vajda, On asymptotic properties of information-theoretic divergences, IEEE Transactions on Information Theory, vol.49, issue.7, pp.1860-1867, 2003.
DOI : 10.1109/TIT.2003.813509

R. K. Patra, A. Mandal, and A. Basu, Minimum Hellinger distance estimation with inlier modification, Sankhyā: The Indian Journal of Statistics, Series B, vol.70, issue.2, pp.310-323, 2008.

M. Pavon and A. Ferrante, On the Georgiou-Lindquist Approach to Constrained Kullback-Leibler Approximation of Spectral Densities, IEEE Transactions on Automatic Control, vol.51, issue.4, pp.639-644, 2006.
DOI : 10.1109/TAC.2006.872755

B. Pelletier, Informative barycentres in statistics, Annals of the Institute of Statistical Mathematics, vol.11, issue.3, pp.767-780, 2005.
DOI : 10.1007/BF02915437

B. Pelletier, Inference in φ-families of distributions, Statistics - A Journal of Theoretical and Applied Statistics, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00460631

A. Perez, “Barycenter” of a Set of Probability Measures and its Application in Statistical Decision, Proceedings of the 6th Prague Symposium on Computational Statistics, COMPSTAT'84, pp.154-159, 1984.
DOI : 10.1007/978-3-642-51883-6_21

D. Petz, Monotone metrics on matrix spaces, Linear Algebra and its Applications, vol.244, pp.81-96, 1996.
DOI : 10.1016/0024-3795(94)00211-8

D. Petz and R. Temesi, Means of Positive Numbers and Matrices, SIAM Journal on Matrix Analysis and Applications, vol.27, issue.3, pp.712-720, 2005.
DOI : 10.1137/050621906

D. Pham, F. Vrins, and M. Verleysen, On the Risk of Using Rényi's Entropy for Blind Source Separation, IEEE Transactions on Signal Processing, vol.56, issue.10, pp.4611-4620, 2008.
DOI : 10.1109/TSP.2008.928109

J. P. Pluim, J. B. Maintz, and M. A. Viergever, f-Information Measures in Medical Image Registration, IEEE Transactions on Medical Imaging, vol.23, issue.12, pp.1508-1516, 2004.
DOI : 10.1109/TMI.2004.836872

Y. Qiao and N. Minematsu, A Study on Invariance of f-Divergence and Its Application to Speech Recognition, IEEE Transactions on Signal Processing, vol.58, issue.7, pp.3884-3890, 2010.
DOI : 10.1109/TSP.2010.2047340

F. Ramponi, A. Ferrante, and M. Pavon, A Globally Convergent Matricial Algorithm for Multivariate Spectral Estimation, IEEE Transactions on Automatic Control, vol.54, issue.10, pp.2376-2388, 2009.
DOI : 10.1109/TAC.2009.2028977

C. R. Rao, Information and the Accuracy Attainable in the Estimation of Statistical Parameters, Bulletin of the Calcutta Mathematical Society, vol.37, pp.81-91, 1945.
DOI : 10.1007/978-1-4612-0919-5_16

C. R. Rao, Diversity and dissimilarity coefficients: A unified approach, Theoretical Population Biology, vol.21, issue.1, pp.24-43, 1982.
DOI : 10.1016/0040-5809(82)90004-1

C. R. Rao, Diversity: its measurement, decomposition, apportionment and analysis, Sankhyā: The Indian Journal of Statistics, Series A, pp.1-22, 1982.

C. R. Rao, Rao's Axiomatization of Diversity Measures, Encyclopedia of Statistical Sciences, pp.614-617, 1986.
DOI : 10.1002/9781118445112.stat01781

C. R. Rao, S. Amari, O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen et al., Differential metrics in probability spaces, Differential Geometry in Statistical Inference, 1987.

C. R. Rao and T. Nayak, Cross entropy, dissimilarity measures, and characterizations of quadratic entropy, IEEE Transactions on Information Theory, vol.31, issue.5, pp.589-593, 1985.
DOI : 10.1109/TIT.1985.1057082

J. Rauh, Finding the Maximizers of the Information Divergence From an Exponential Family, IEEE Transactions on Information Theory, vol.57, issue.6, 2009.
DOI : 10.1109/TIT.2011.2136230

P. Ravikumar, A. Agarwal, and M. J. Wainwright, Message-passing for graph-structured linear programs, Journal of Machine Learning Research, vol.11, pp.1043-1080, 2010.
DOI : 10.1145/1390156.1390257

T. Read and N. Cressie, Goodness-of-Fit Statistics for Discrete Multivariate Data, Springer Series in Statistics, 1988.
DOI : 10.1007/978-1-4612-4578-0

M. D. Reid and R. C. Williamson, Information, divergence and risk for binary experiments, 2009.

M. D. Reid and R. C. Williamson, Composite binary losses, Journal of Machine Learning Research, vol.11, pp.2387-2422, 2010.

A. Rényi, On measures of information and entropy, Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, pp.547-561, 1961.

A. Rényi, On some basic problems of statistics from the point of view of information theory, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp.531-543, 1967.

W. Sander, Measures of Information, Handbook of Measure Theory, pp.1523-1565, 2002.
DOI : 10.1016/B978-044450263-6/50038-5

R. Santos-Rodriguez, D. Garcia-Garcia, and J. Cid-Sueiro, Cost-Sensitive Classification Based on Bregman Divergences for Medical Diagnosis, 2009 International Conference on Machine Learning and Applications, pp.551-556, 2009.
DOI : 10.1109/ICMLA.2009.82

M. P. Schützenberger, Contribution aux Applications Statistiques de la Théorie de l'Information, Thèse d'État, 1953.

F. C. Schweppe, On the Bhattacharyya distance and the divergence between Gaussian processes, Information and Control, vol.11, issue.4, pp.373-395, 1967.
DOI : 10.1016/S0019-9958(67)90610-9

F. C. Schweppe, State space evaluation of the Bhattacharyya distance between two Gaussian processes, Information and Control, vol.11, issue.3, pp.352-372, 1967.
DOI : 10.1016/S0019-9958(67)90609-2

J. E. Shore and R. M. Gray, Minimum Cross-Entropy Pattern Classification and Cluster Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.4, issue.1, pp.11-17, 1982.
DOI : 10.1109/TPAMI.1982.4767189

S. Si, D. Tao, and B. Geng, Bregman Divergence-Based Regularization for Transfer Subspace Learning, IEEE Transactions on Knowledge and Data Engineering, vol.22, issue.7, pp.929-942, 2010.
DOI : 10.1109/TKDE.2009.126

R. Sibson, Information radius, Probability Theory and Related Fields, pp.149-160, 1969.

B. K. Sriperumbudur, A. Gretton, K. Fukumizu, G. R. Lanckriet, and B. Schölkopf, On integral probability metrics, φ-divergences and binary classification, 2009.

S. Srivastava, M. R. Gupta, and B. A. Frigyik, Bayesian quadratic discriminant analysis, Journal of Machine Learning Research, vol.8, pp.1277-1305, 2007.

A. A. Stoorvogel and J. H. Van-schuppen, Approximation problems with the divergence criterion for Gaussian variables and Gaussian processes, Systems & Control Letters, vol.35, issue.4, pp.207-218, 1998.
DOI : 10.1016/S0167-6911(98)00053-X

W. Stummer and I. Vajda, On Bregman Distances and Divergences of Probability Measures, IEEE Transactions on Information Theory, vol.58, issue.3, 2009.
DOI : 10.1109/TIT.2011.2178139

W. Stummer and I. Vajda, On divergences of finite measures and their applicability in statistics and information theory, Statistics, vol.43, issue.2, pp.169-187, 2010.
DOI : 10.1214/aop/1176996402

I. Sutskever and T. Tieleman, On the convergence properties of contrastive divergence, Proceedings of the 13th International Workshop on Artificial Intelligence and Statistics (AISTATS'10), pp.789-795, 2010.

I. J. Taneja, On generalized information measures and their applications, Advances in Electronics and Electron Physics, pp.327-413, 1989.

I. J. Taneja, On Generalized Information Measures and Their Applications, 2001.
DOI : 10.1016/S0065-2539(08)60580-6

B. Taskar, S. Lacoste-julien, and M. I. Jordan, Structured prediction, dual extragradient and Bregman projections, Journal of Machine Learning Research, vol.7, pp.1627-1653, 2006.

M. Teboulle, A unified continuous optimization framework for center-based clustering methods, Journal of Machine Learning Research, vol.8, pp.65-102, 2007.

M. Teboulle, P. Berkhin, I. S. Dhillon, Y. Guan, and J. Kogan, Clustering with Entropy-Like k-Means Algorithms, Grouping Multidimensional Data - Recent Advances in Clustering, pp.127-160, 2006.
DOI : 10.1007/3-540-28349-8_5

A. Toma and M. Broniatowski, Dual divergence estimators and tests: Robustness results, Journal of Multivariate Analysis, vol.102, issue.1, 2009.
DOI : 10.1016/j.jmva.2010.07.010

URL : https://hal.archives-ouvertes.fr/hal-00441124

F. Topsøe, Some inequalities for information divergence and related measures of discrimination, IEEE Transactions on Information Theory, vol.46, issue.4, pp.1602-1609, 2000.
DOI : 10.1109/18.850703

E. Torgersen, Comparison of Statistical Experiments, volume 36 of Encyclopedia of Mathematics and Its Applications, 1991.

K. Tsuda, G. Rätsch, and M. K. Warmuth, Matrix exponentiated gradient updates for on-line learning and Bregman projection, Journal of Machine Learning Research, vol.6, pp.995-1018, 2005.

M. Tsukada and H. Suyari, Tsallis differential entropy and divergences derived from the generalized Shannon-Khinchin axioms, Proceedings of the IEEE International Symposium on Information Theory (ISIT'09), pp.149-153, 2009.

I. Vajda, χα-divergence and generalized Fisher's information, Transactions of the 6th Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pp.873-886, 1971.

I. Vajda, Theory of Statistical Inference and Information, volume 11 of Series B: Mathematical and Statistical Methods, 1989.

I. Vajda, Modifications of divergence criteria for applications in continuous families, Research Report 2230, Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, 2008.

I. Vajda, On metric divergences of probability measures, Kybernetika, vol.45, issue.6, pp.885-900, 2009.

B. Vemuri, M. Liu, S. Amari, and F. Nielsen, Total Bregman Divergence and Its Applications to DTI Analysis, IEEE Transactions on Medical Imaging, vol.30, issue.2, 2010.
DOI : 10.1109/TMI.2010.2086464

C. Vignat, A. O. Hero, and J. A. Costa, A Geometric Characterization of Maximum Rényi Entropy Distributions, 2006 IEEE International Symposium on Information Theory, pp.1822-1826, 2006.
DOI : 10.1109/ISIT.2006.261749

F. Vrins, D. Pham, and M. Verleysen, Is the General Form of Rényi's Entropy a Contrast for Source Separation?, Proceedings of the 7th International Conference on Independent Component Analysis and Blind Source Separation (ICA'07), pp.129-136, 2007.
DOI : 10.1007/978-3-540-74494-8_17

S. Wang and D. Schuurmans, Learning Continuous Latent Variable Models with Bregman Divergences, Proceedings of the 14th International Conference on Algorithmic Learning Theory (ALT'03), pp.190-204, 2003.
DOI : 10.1007/978-3-540-39624-6_16

L. Wu, R. Jin, S. C. Hoi, J. Zhu, and N. Yu, Learning Bregman distance functions and its application for semi-supervised clustering, Advances in Neural Information Processing Systems, pp.2089-2097, 2009.

R. W. Yeung, A First Course in Information Theory, Information Technology: Transmission, Processing and Storage, 2002.

R. W. Yeung, Information Theory and Network Coding, Information Technology: Transmission, Processing and Storage, 2008.

W. Yin, S. Osher, D. Goldfarb, and J. Darbon, Bregman Iterative Algorithms for ℓ1-Minimization with Applications to Compressed Sensing, SIAM Journal on Imaging Sciences, vol.1, issue.1, pp.143-168, 2008.
DOI : 10.1137/070703983

S. Yu and P. G. Mehta, The Kullback-Leibler rate pseudo-metric for comparing dynamical systems, IEEE Transactions on Automatic Control, vol.55, issue.7, pp.1585-1598, 2010.

R. G. Zaripov, New Measures and Methods in Information Theory. A. N. Tupolev State Technical University Press, 2005.

J. Zhang, Divergence Function, Duality, and Convex Analysis, Neural Computation, vol.16, issue.1, pp.159-195, 2004.
DOI : 10.1007/BF02309013

J. Ziv and M. Zakai, On functionals satisfying a data-processing theorem, IEEE Transactions on Information Theory, vol.19, issue.3, pp.275-283, 1973.
DOI : 10.1109/TIT.1973.1055015