Abstract : Large scale production grids are an important case for autonomic computing. They follow a mutualization paradigm: decision-making (human or automatic) is distributed and largely independent, and, at the same time, it must implement the highlevel goals of the grid management. This paper deals with the scheduling problem with two partially conflicting goals: fairshare and Quality of Service (QoS). Fair sharing is a wellknown issue motivated by return on investment for participating institutions. Differentiated QoS has emerged as an important and unexpected requirement in the current usage of production grids. In the framework of the EGEE grid (one of the largest existing grids), applications from diverse scientific communities require a pseudo-interactive response time. More generally, seamless integration of the grid power into everyday use calls for unplanned and interactive access to grid resources, which defines reactive grids. The major result of this paper is that the combination of utility functions and reinforcement learning (RL) provides a general and efficient method for dynamically allocating grid resources in order to satisfy both end users with differentiated requirements and participating institutions. Combining RL methods and utility functions for resource allocation was pioneered by Tesauro and Vengerov. While the application contexts are different, the resource allocation issues are very similar. The main difference in our work is that we consider a multi-criteria optimization problem that includes a fair-share objective. A first contribution of our work is the definition of a set of variables describing states and actions that allows us to formulate the grid scheduling problem as a continuous action-state space reinforcement learning problem. To capture the immediate goals of end users and the long-term objectives of administrators, we propose automatically derived utility functions. Finally, our experimental results on a synthetic workload and a real EGEE trace show that RL clearly outperforms the classical schedulers, so it is a realistic alternative to empirical scheduler design.