Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules[…] Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
So not only does it teach itself how to play, it also handily beats the previous version, which started from expert human knowledge. What isn’t mentioned in the abstract is that it also needed almost an order of magnitude less processing power to do so.
Funnily enough, when I read about this, the first thing that came to mind was this slightly salty post from Facebook’s head of AI research, Yann LeCun:
Congrats to the DeepMind AlphaGo team for this Grand Slam.
Now, can you do it purely through reinforcement learning, without pre-training the convolutional net on recorded games between humans?
Yes, it seems that they can.