Conference Papers, Year: 2007

Combining Online and Offline Knowledge in UCT

Sylvain Gelly, David Silver

Abstract

The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
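The abstract outlines three ways offline knowledge can enter UCT. The minimal Python sketch below illustrates where each could hook into a generic UCT node: an offline policy as the Monte-Carlo default policy, a rapid RAVE-style estimate blended with the UCT value, and prior initialization of node statistics. All names and constants (offline_policy, rave_sum, beta, k, prior_count) are illustrative assumptions, not the paper's implementation.

# Hedged sketch of the three combination points described in the abstract.
# Not the MoGo code; a generic UCT node with assumed parameter names.
import math
import random


class Node:
    def __init__(self, prior_value=0.0, prior_count=0):
        # Third approach: seed the node with offline knowledge, treated as if
        # 'prior_count' simulations had already returned 'prior_value'.
        self.visits = prior_count
        self.value_sum = prior_value * prior_count
        # Second approach: a rapid all-moves-as-first style estimate
        # maintained alongside the regular UCT statistics.
        self.rave_visits = 0
        self.rave_sum = 0.0

    def uct_value(self, parent_visits, c=1.0, k=1000.0):
        # Slow but unbiased UCT estimate.
        q = self.value_sum / self.visits if self.visits else 0.0
        # Fast but biased rapid estimate.
        q_rave = self.rave_sum / self.rave_visits if self.rave_visits else 0.0
        # Blend the two; beta decays toward 0 as real visits accumulate
        # (k is an assumed tuning constant controlling the decay rate).
        beta = math.sqrt(k / (3.0 * self.visits + k)) if self.visits else 1.0
        blended = beta * q_rave + (1.0 - beta) * q
        # Standard UCT exploration bonus.
        explore = c * math.sqrt(math.log(parent_visits + 1) / (self.visits + 1))
        return blended + explore


def default_policy(state, legal_moves, offline_policy=None):
    # First approach: if an offline-learned policy is available, use it to
    # choose simulation moves instead of sampling uniformly at random.
    if offline_policy is not None:
        return offline_policy(state, legal_moves)
    return random.choice(legal_moves)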
Main file: GellySilverICML2007.pdf (173.72 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

inria-00164003, version 1 (19-07-2007)

Identifiers

  • HAL Id: inria-00164003, version 1

Cite

Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference on Machine Learning, Jun 2007, Corvallis, United States. ⟨inria-00164003⟩
