Conference Papers, Year: 2007

Combining Online and Offline Knowledge in UCT

Sylvain Gelly, David Silver

Abstract

The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
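The abstract outlines three ways offline knowledge can enter UCT. The minimal Python sketch below illustrates where each could hook into a generic UCT node: an offline policy as the Monte-Carlo default policy, a rapid RAVE-style estimate blended with the UCT value, and prior initialization of node statistics. All names and constants (offline_policy, rave_sum, beta, k, prior_count) are illustrative assumptions, not the paper's implementation.

# Hedged sketch of the three combination points described in the abstract.
# Not the MoGo code; a generic UCT node with assumed parameter names.
import math
import random


class Node:
    def __init__(self, prior_value=0.0, prior_count=0):
        # Third approach: seed the node with offline knowledge, treated as if
        # 'prior_count' simulations had already returned 'prior_value'.
        self.visits = prior_count
        self.value_sum = prior_value * prior_count
        # Second approach: a rapid all-moves-as-first style estimate
        # maintained alongside the regular UCT statistics.
        self.rave_visits = 0
        self.rave_sum = 0.0

    def uct_value(self, parent_visits, c=1.0, k=1000.0):
        # Slow but unbiased UCT estimate.
        q = self.value_sum / self.visits if self.visits else 0.0
        # Fast but biased rapid estimate.
        q_rave = self.rave_sum / self.rave_visits if self.rave_visits else 0.0
        # Blend the two; beta decays toward 0 as real visits accumulate
        # (k is an assumed tuning constant controlling the decay rate).
        beta = math.sqrt(k / (3.0 * self.visits + k)) if self.visits else 1.0
        blended = beta * q_rave + (1.0 - beta) * q
        # Standard UCT exploration bonus.
        explore = c * math.sqrt(math.log(parent_visits + 1) / (self.visits + 1))
        return blended + explore


def default_policy(state, legal_moves, offline_policy=None):
    # First approach: if an offline-learned policy is available, use it to
    # choose simulation moves instead of sampling uniformly at random.
    if offline_policy is not None:
        return offline_policy(state, legal_moves)
    return random.choice(legal_moves)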
Main file: GellySilverICML2007.pdf (173.72 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

inria-00164003, version 1 (19-07-2007)

Identifiers

  • HAL Id: inria-00164003, version 1

Cite

Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference on Machine Learning, Jun 2007, Corvallis, United States. ⟨inria-00164003⟩
