Conference papers

Combining Online and Offline Knowledge in UCT

Sylvain Gelly 1 David Silver 2
1 TANC - Algorithmic number theory for cryptology
Inria Saclay - Ile de France, LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau]
Abstract : The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
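The third approach in the abstract (an offline value function used as prior knowledge in the search tree) can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the standard UCB1 selection rule commonly used in UCT, and it assumes the prior is injected by initialising a node's mean value and visit count as if some number of simulations had already been seen. The names `ActionStats`, `init_with_prior`, `update`, `ucb1_select`, the `equivalent_visits` parameter and the exploration constant `c` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Running statistics for one action in a search-tree node."""
    q: float = 0.0  # mean value estimate
    n: int = 0      # number of (real or equivalent) simulations


def init_with_prior(prior_value: float, equivalent_visits: int) -> ActionStats:
    # Assumption: the offline value acts as if it had already been
    # observed `equivalent_visits` times.
    return ActionStats(q=prior_value, n=equivalent_visits)


def update(stats: ActionStats, outcome: float) -> None:
    # Incremental mean update after one simulation outcome.
    stats.n += 1
    stats.q += (outcome - stats.q) / stats.n


def ucb1_select(children: dict, c: float = 1.0):
    # Standard UCB1: mean value plus an exploration bonus that shrinks
    # as an action accumulates visits. Unvisited actions are tried first.
    total = sum(s.n for s in children.values())

    def score(s: ActionStats) -> float:
        if s.n == 0:
            return math.inf
        return s.q + c * math.sqrt(math.log(total) / s.n)

    return max(children, key=lambda a: score(children[a]))
```

With equal equivalent visit counts, the action with the higher prior value is selected first; as real simulation outcomes arrive through `update`, they gradually outweigh the prior.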

Contributor : Sylvain Gelly
Submitted on : Thursday, July 19, 2007 - 1:51:04 PM
Last modification on : Wednesday, March 27, 2019 - 4:41:29 PM
Long-term archiving on : Thursday, April 8, 2010 - 11:37:50 PM


Publisher files allowed on an open archive


  • HAL Id : inria-00164003, version 1



Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference on Machine Learning, Jun 2007, Corvallis, United States. ⟨inria-00164003⟩


