Combining Online and Offline Knowledge in UCT

Sylvain Gelly; David Silver

Conference paper, Year: 2007

Sylvain Gelly

Role: Author
PersonId: 836546

Algorithmic number theory for cryptology

David Silver

Role: Author

University of Alberta

Abstract

The UCT algorithm learns a value function online using sample-based search. The TD(λ) algorithm can learn a value function offline for the on-policy distribution. We consider three approaches for combining offline and online value functions in the UCT algorithm. First, the offline value function is used as a default policy during Monte-Carlo simulation. Second, the UCT value function is combined with a rapid online estimate of action values. Third, the offline value function is used as prior knowledge in the UCT search tree. We evaluate these algorithms in 9 × 9 Go against GnuGo 3.7.10. The first algorithm performs better than UCT with a random simulation policy, but surprisingly, worse than UCT with a weaker, handcrafted simulation policy. The second algorithm outperforms UCT altogether. The third algorithm outperforms UCT with handcrafted prior knowledge. We combine these algorithms in MoGo, the world's strongest 9 × 9 Go program. Each technique significantly improves MoGo's playing strength.
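The three combinations described in the abstract can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the class and function names, the ε-greedy simulation rule, the decaying weight β = sqrt(k / (3n + k)) with k = 1000, and the treatment of the prior as `n_prior` virtual simulations are all illustrative assumptions, not details stated in this record.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ActionStats:
    """Hypothetical per-action statistics stored at a UCT tree node."""
    q: float = 0.0       # Monte-Carlo (UCT) value estimate
    n: int = 0           # simulations that took this action from this node
    q_rave: float = 0.0  # rapid online estimate (all-moves-as-first style)
    n_rave: int = 0      # samples contributing to q_rave


def default_policy(actions, q_offline, epsilon=0.1, rng=random):
    """Approach 1 (sketch): epsilon-greedy simulation policy over an offline value.

    q_offline is an assumed callable mapping an action to its offline value.
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_offline)


def combined_value(stats, n_state, k=1000.0):
    """Approach 2 (sketch): blend the rapid estimate with the UCT estimate.

    The schedule beta = sqrt(k / (3 * n_state + k)) is an assumption: it starts
    at 1 (trust the rapid estimate) and decays toward 0 as the node is visited.
    """
    beta = math.sqrt(k / (3 * n_state + k))
    return beta * stats.q_rave + (1.0 - beta) * stats.q


def init_with_prior(stats, q_prior, n_prior):
    """Approach 3 (sketch): seed a new node with the offline value.

    The prior counts as n_prior virtual simulations that returned q_prior.
    """
    stats.q = q_prior
    stats.n = n_prior


def update(stats, reward):
    """Incremental mean update of the UCT estimate after one simulation."""
    stats.n += 1
    stats.q += (reward - stats.q) / stats.n


def update_rave(stats, reward):
    """Incremental mean update of the rapid estimate."""
    stats.n_rave += 1
    stats.q_rave += (reward - stats.q_rave) / stats.n_rave
```

Because the prior enters as `n_prior` virtual samples, a larger `n_prior` makes the offline knowledge harder for early simulation results to override: with `n_prior = 10`, the first real simulation moves the estimate by only 1/11 of its error.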

Domains

Artificial Intelligence [cs.AI], Computer Science and Game Theory [cs.GT], Machine Learning [cs.LG]

Main file

GellySilverICML2007.pdf (173.72 KB)

Origin: Publisher files allowed on an open archive

https://inria.hal.science/inria-00164003

Submitted on: Thursday, 19 July 2007, 13:51:04

Last modified on: Friday, 24 March 2023, 14:52:49

Long-term archiving on: Thursday, 8 April 2010, 23:37:50

Dates and versions

inria-00164003, version 1 (19-07-2007)

Identifiers

HAL Id: inria-00164003, version 1

Cite

Sylvain Gelly, David Silver. Combining Online and Offline Knowledge in UCT. International Conference on Machine Learning, Jun 2007, Corvallis, United States. ⟨inria-00164003⟩

Collections

X CNRS INRIA LIX X-LIX X-DEP-INFO PARISTECH INRIA2
