, Estimation of the Warfarin dose with clinical and pharmacogenetic data, New England Journal of Medicine, vol.360, issue.8, pp.753-764, 2009.
Taming the monster: A fast and simple algorithm for contextual bandits, International Conference on Machine Learning (ICML), 2014. ,
Understanding the impact of entropy on policy optimization, International Conference on Machine Learning (ICML), 2019. ,
Drug dosage in laboratory animals: a handbook, 1966. ,
Optimization over continuous and multi-dimensional decisions with observational data, Advances in Neural Information Processing Systems (NeurIPS), 2018. ,
Counterfactual reasoning and learning systems: The example of computational advertising, Journal of Machine Learning Research, vol.14, issue.1, pp.3207-3260, 2013. ,
Semi-parametric efficient policy learning with continuous actions, Advances in Neural Information Processing Systems (NeurIPS), 2019. ,
Doubly robust policy evaluation and learning, International Conference on Machine Learning (ICML), 2011. ,
, Orthogonal statistical learning, 2019.
A generalized proximal point algorithm for certain non-convex minimization problems, International Journal of Systems Science, vol.12, issue.8, pp.989-1000, 1981. ,
The propensity score with continuous treatments. Applied Bayesian modeling and causal inference from incomplete-data perspectives, vol.226164, pp.73-84, 2004. ,
A generalization of sampling without replacement from a finite universe, Journal of the American Statistical Association, vol.47, issue.260, pp.663-685, 1952. ,
Doubly robust off-policy value evaluation for reinforcement learning, International Conference on Machine Learning (ICML), 2016. ,
Deep learning with logged bandit feedback, International Conference on Learning Representations (ICLR), 2018. ,
Approximately optimal approximate reinforcement learning, International Conference on Machine Learning (ICML), 2002. ,
Policy evaluation and optimization with continuous treatments, International Conference on Artificial Intelligence and Statistics (AISTATS), 2018. ,
The epoch-greedy algorithm for multi-armed bandits with side information, Advances in Neural Information Processing Systems (NIPS), 2008. ,
Large-scale validation of counterfactual learning methods: A test-bed, 2016. ,
An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, Proceedings of the Workshop on On-line Trading of Exploration and Exploitation 2, 2012. ,
On the limited memory bfgs method for large scale optimization, Mathematical programming, vol.45, pp.503-528, 1989. ,
Empirical bernstein bounds and sample variance penalization, Conference on Learning Theory (COLT, 2009. ,
Monte Carlo theory, methods and examples, 2013. ,
Catalyst for gradient-based nonconvex optimization, International Conference on Artificial Intelligence and Statistics (AISTATS), 2018. ,
URL : https://hal.archives-ouvertes.fr/hal-01773296
Semiparametric efficiency in multivariate regression models with missing data, Journal of the American Statistical Association, vol.90, issue.429, pp.122-129, 1995. ,
Monotone operators and the proximal point algorithm, SIAM journal on control and optimization, vol.14, issue.5, pp.877-898, 1976. ,
, Proximal policy optimization algorithms, 2017.
Cab: Continuous adaptive blending for policy evaluation and learning, International Conference on Machine Learning, pp.6005-6014, 2019. ,
Policy gradient methods for reinforcement learning with function approximation, Advances in Neural Information Processing Systems (NIPS), 2000. ,
Counterfactual risk minimization: Learning from logged bandit feedback, International Conference on Machine Learning (ICML), 2015. ,
The self-normalized estimator for counterfactual learning, Advances in Neural Information Processing Systems (NIPS), 2015. ,
Optimal and adaptive off-policy evaluation in contextual bandits, International Conference on Machine Learning (ICML), 2017. ,
Using the nyström method to speed up kernel machines, Adv. Neural Information Processing Systems (NIPS), 2001. ,
Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine learning, vol.8, issue.3-4, pp.229-256, 1992. ,