Combining Online and Offline Knowledge in UCT

Sylvain Gelly 1, David Silver 2
1 TANC - Algorithmic number theory for cryptology
LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau], Inria Saclay - Ile de France, Polytechnique - X, CNRS - Centre National de la Recherche Scientifique : UMR7161
Abstract: The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
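To make the third approach more concrete, here is a minimal Python sketch of UCT-style action selection in which prior knowledge seeds the statistics of a tree node. It is an illustration under stated assumptions, not the authors' implementation: the `Node` class, its method names, and the idea of encoding a prior as an initial value backed by `prior_count` pretend visits are hypothetical choices made for this example, and the selection rule is the standard UCB1 formula rather than anything specified in the abstract.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Per-state statistics in a search tree (hypothetical structure)."""
    counts: dict = field(default_factory=dict)  # action -> visit count n(s, a)
    values: dict = field(default_factory=dict)  # action -> mean value Q(s, a)

    def add_action(self, action, prior_value=0.0, prior_count=0):
        # Assumption: prior knowledge enters as an initial value estimate
        # that counts as `prior_count` earlier simulations.
        self.values[action] = prior_value
        self.counts[action] = prior_count

    def select(self, c=1.0):
        """Return the action with the highest UCB1 score."""
        total = sum(self.counts.values())

        def score(action):
            n = self.counts[action]
            if n == 0:
                return math.inf  # unvisited actions are tried first
            bonus = c * math.sqrt(math.log(total) / n)
            return self.values[action] + bonus

        return max(self.counts, key=score)

    def update(self, action, reward):
        """Fold one simulation result into the running mean for `action`."""
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n
```

With this layout, a node whose actions are seeded from an offline value function starts the search biased toward moves that function rates highly, and real simulation results gradually outweigh the prior as visit counts grow.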
Document type:
Conference paper
International Conference on Machine Learning, Jun 2007, Corvallis, United States. 2007

Cited literature [13 references]

https://hal.inria.fr/inria-00164003
Contributor: Sylvain Gelly
Submitted on: Thursday, July 19, 2007 - 13:51:04
Last modified on: Thursday, January 11, 2018 - 06:22:14
Document(s) archived on: Thursday, April 8, 2010 - 23:37:50

File

GellySilverICML2007.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id: inria-00164003, version 1

Citation

Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference on Machine Learning, Jun 2007, Corvallis, United States. 2007. 〈inria-00164003〉

