Simulation-based search of combinatorial games

Lukasz Lew 1 Rémi Coulom 2
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : Monte-Carlo Tree Search is a very successful game playing algorithm. Unfortunately it su ers from the horizon e ect: some important tactical sequences may be delayed beyond the depth of the search tree, causing evaluation errors. Temporal-di erence search with a function approximation is a method that was proposed to overcome these weaknesses, by adaptively changing the simulation policy outside the tree. In this paper we present an experimental evidence demonstrating that a temporal di erence search may fail to nd an optimal policy, even in very simple game positions. Classical temporal di erence algorithms try to evaluate a local situation with a numerical value, but, as it appears, a single number is not enough to model the dynamics of a partial two-player game state. As a solution we propose to replace numerical values by approximate thermographs. With this richer representation of partial states, reinforcement-learning algorithms converge and accurately represent dynamics of states, allowing to nd an optimal policy.
Type de document :
Communication dans un congrès
ICML 2010 : Workshop on Machine Learning and Games, Jun 2010, Haifa, Israel. 2010, 〈〉
Liste complète des métadonnées
Contributeur : Ist Rennes <>
Soumis le : jeudi 3 mai 2012 - 14:00:03
Dernière modification le : jeudi 11 janvier 2018 - 06:22:13


  • HAL Id : hal-00694030, version 1



Lukasz Lew, Rémi Coulom. Simulation-based search of combinatorial games. ICML 2010 : Workshop on Machine Learning and Games, Jun 2010, Haifa, Israel. 2010, 〈〉. 〈hal-00694030〉



Consultations de la notice