Conference papers

Combining Online and Offline Knowledge in UCT

Sylvain Gelly 1 David Silver 2
1 TANC - Algorithmic number theory for cryptology
Inria Saclay - Ile de France, LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau]
Abstract : The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
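The third approach in the abstract (offline knowledge used as a prior in the search tree) can be illustrated with a minimal sketch. It is not the implementation from the paper or from MoGo. It assumes a standard UCB1-style selection rule and seeds each action's statistics with a prior value and an equivalent visit count. The names `Node`, `prior_value`, `prior_count`, and the exploration constant `c` are hypothetical and chosen only for illustration.

```python
import math


class Node:
    """Per-state action statistics for a UCT-style search tree (illustrative)."""

    def __init__(self, actions, prior_value=None, prior_count=0):
        # prior_value is a hypothetical callable mapping an action to a value
        # in [0, 1]; it stands in for an offline-learned value function.
        # prior_count is the number of simulated visits the prior is worth.
        self.counts = {}
        self.values = {}
        for a in actions:
            if prior_value is not None and prior_count > 0:
                self.counts[a] = prior_count
                self.values[a] = prior_value(a)
            else:
                self.counts[a] = 0
                self.values[a] = 0.0

    def select(self, c=1.0):
        """Pick the action with the highest UCB1 score."""
        total = sum(self.counts.values())
        best, best_score = None, -math.inf
        for a, n in self.counts.items():
            if n == 0:
                return a  # try every untried action once before using UCB1
            score = self.values[a] + c * math.sqrt(math.log(total) / n)
            if score > best_score:
                best, best_score = a, score
        return best

    def update(self, action, reward):
        """Fold one simulation outcome into the running mean for an action."""
        self.counts[action] += 1
        n = self.counts[action]
        self.values[action] += (reward - self.values[action]) / n
```

With a prior in place, early selections follow the offline estimate, and each simulation result gradually outweighs it as the visit counts grow. A larger `prior_count` makes the search trust the offline estimate for longer.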

https://hal.inria.fr/inria-00164003
Contributor : Sylvain Gelly
Submitted on : Thursday, July 19, 2007 - 1:51:04 PM
Last modification on : Friday, February 4, 2022 - 3:09:07 AM
Long-term archiving on : Thursday, April 8, 2010 - 11:37:50 PM

File

GellySilverICML2007.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : inria-00164003, version 1

Citation

Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference of Machine Learning, Jun 2007, Corvallis, United States. ⟨inria-00164003⟩
