Simulation-based search of combinatorial games

Lukasz Lew 1 Rémi Coulom 2
2 SEQUEL - Sequential Learning
LIFL - Laboratoire d'Informatique Fondamentale de Lille, Inria Lille - Nord Europe, LAGIS - Laboratoire d'Automatique, Génie Informatique et Signal
Abstract : Monte-Carlo Tree Search is a very successful game playing algorithm. Unfortunately it su ers from the horizon e ect: some important tactical sequences may be delayed beyond the depth of the search tree, causing evaluation errors. Temporal-di erence search with a function approximation is a method that was proposed to overcome these weaknesses, by adaptively changing the simulation policy outside the tree. In this paper we present an experimental evidence demonstrating that a temporal di erence search may fail to nd an optimal policy, even in very simple game positions. Classical temporal di erence algorithms try to evaluate a local situation with a numerical value, but, as it appears, a single number is not enough to model the dynamics of a partial two-player game state. As a solution we propose to replace numerical values by approximate thermographs. With this richer representation of partial states, reinforcement-learning algorithms converge and accurately represent dynamics of states, allowing to nd an optimal policy.
Document type :
Conference papers
Complete list of metadatas

https://hal.inria.fr/hal-00694030
Contributor : Ist Rennes <>
Submitted on : Thursday, May 3, 2012 - 2:00:03 PM
Last modification on : Thursday, February 21, 2019 - 10:52:49 AM

Identifiers

  • HAL Id : hal-00694030, version 1

Collections

Citation

Lukasz Lew, Rémi Coulom. Simulation-based search of combinatorial games. ICML 2010 : Workshop on Machine Learning and Games, Jun 2010, Haifa, Israel. ⟨hal-00694030⟩

Share

Metrics

Record views

231