Conference paper, Year: 2007

Combining Online and Offline Knowledge in UCT

Sylvain Gelly
  • Role: Author
  • PersonId: 836546

Abstract

The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
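
The third approach in the abstract (an offline value function used as prior knowledge in the UCT search tree) can be illustrated with a minimal sketch. It assumes the standard UCB1 selection rule and assumes the prior is expressed as a value plus an equivalent number of simulations. The names `ActionStats`, `init_with_prior`, `update`, `select_action` and the `exploration` constant are hypothetical and chosen for illustration; they are not taken from MoGo or from the paper's code.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Running statistics for one action at one tree node."""
    value: float = 0.0   # mean outcome observed so far
    visits: float = 0.0  # number of real or equivalent simulations


def init_with_prior(prior_value: float, equivalent_visits: float) -> ActionStats:
    # Assumption: the offline estimate seeds the node as if it had already
    # been simulated `equivalent_visits` times with mean `prior_value`.
    return ActionStats(value=prior_value, visits=equivalent_visits)


def update(stats: ActionStats, outcome: float) -> None:
    # Incremental mean update after one simulation through this action.
    stats.visits += 1
    stats.value += (outcome - stats.value) / stats.visits


def select_action(actions: dict, exploration: float = 1.0):
    # UCB1: pick the action maximising mean value plus an exploration bonus.
    total = sum(s.visits for s in actions.values())
    log_total = math.log(max(total, 1.0))  # guard against log of values below 1

    def score(item):
        _, s = item
        if s.visits == 0:
            return math.inf  # unvisited actions are tried first
        return s.value + exploration * math.sqrt(log_total / s.visits)

    return max(actions.items(), key=score)[0]
```

In this sketch a larger `equivalent_visits` makes the prior harder to override: online simulations have to accumulate before they outweigh the offline estimate.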
Main file
GellySilverICML2007.pdf (173.72 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

inria-00164003, version 1 (19-07-2007)

Identifiers

  • HAL Id: inria-00164003, version 1

Cite

Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference on Machine Learning, Jun 2007, Corvallis, United States. ⟨inria-00164003⟩
