Garry Kasparov's article "Chess, a Drosophila of reasoning" is fascinating. In it he describes how AlphaZero plays chess in a way that no longer follows the usual "stupid machine logic". It recalls Lee Sedol's comment on move 37 in the second game against AlphaGo: "...I thought surely AlphaGo is creative".
Let's stick to chess. Claude Shannon described how a computer should play the game as early as 1950, and the two central elements are still the same today. On the one hand the program must know the permitted moves; on the other hand it must evaluate the game position, so that it can judge which move to choose when searching for the next one. This is how chess programs have worked so far.
To master the complexity of the computation, numerous optimizations were also devised. One of them is the use of libraries of played games whose outcome is known. Kasparov describes the result of this approach with the words: "Much as airplanes don't flap their wings like birds, machines don't generate chess moves like humans do."
AlphaGo initially learned the same way, mechanistically and based on historical knowledge. Later it was further refined by letting it play anonymously against people online, which was not very efficient. Its successor AlphaZero therefore learned only by playing against itself. According to Demis Hassabis, CEO of DeepMind, it taught itself chess "in a few hours" using massive computing power. Two aspects stand out in the result of the learning process: the program's risk taking and its style of play.
Until now, chess programs had been built around Shannon's minimax algorithm and, as is usual in the chess world, around material as a value function. The pieces (adjusted for their mobility and position) have values that can be added up: 9 for the queen, 5 for the rook, etc.
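To make the classical approach concrete, here is a minimal sketch in Python. It assumes the conventional piece values (pawn 1, knight and bishop 3, rook 5, queen 9) and reads the piece-placement part of a FEN string. The function names, the tiny hand-made game tree and its leaf scores are hypothetical and only serve as illustration; a real engine adds move generation, pruning and positional terms.

```python
# Conventional piece values (pawn = 1); the king is not counted.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}


def material_balance(placement: str) -> int:
    """White material minus Black material for a FEN piece-placement field."""
    score = 0
    for ch in placement:
        value = PIECE_VALUES.get(ch.lower())
        if value is None:
            continue  # digits, '/' and kings contribute nothing
        score += value if ch.isupper() else -value
    return score


def minimax(node, depth, maximizing, children, evaluate):
    """Plain minimax; 'children' and 'evaluate' are supplied by the caller."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    values = [minimax(k, depth - 1, not maximizing, children, evaluate)
              for k in kids]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply tree: after move "b" one line wins a queen (+9),
# but the opponent can choose the reply that costs us a rook instead (-5).
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
LEAF_SCORES = {"a1": 3, "a2": -1, "b1": 9, "b2": -5}


def tree_children(node):
    return TREE.get(node, [])


def tree_evaluate(node):
    return LEAF_SCORES.get(node, 0)


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"
NO_BLACK_QUEEN = "rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR"

if __name__ == "__main__":
    print(material_balance(START))           # balanced starting position
    print(material_balance(NO_BLACK_QUEEN))  # White is a queen up
    print(minimax("root", 2, True, tree_children, tree_evaluate))
```

In the toy tree the search prefers move "a": the tempting queen win behind "b" is discarded because the opponent is assumed to pick the reply that is worst for us. That is the "thinking straight ahead" this kind of program relies on.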
AlphaZero works with a Monte Carlo tree search and evaluates the whole board holistically, based on what it has learned from its own games. It tries things out and thus develops a playing style in which the machine does not react to the opponent, but actively shapes and drives the game. Individual moves seem clumsy to a human observer, who is used to weighing material value, and only many moves later does that observer understand their tactical significance. It is a move away from deterministic calculation ('thinking straight ahead') and towards intuition. This behaviour is also reflected in the fact that AlphaZero evaluates fewer game positions than chess programs like Stockfish. Or as Kasparov says: "AlphaZero works smarter not harder".
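The basic Monte Carlo idea, judging a move by the outcomes of many sampled games instead of by added-up material, can be sketched on a toy game. The example below is an assumption-laden illustration and not AlphaZero's actual method: it uses purely random playouts on single-pile Nim (take 1 to 3 stones, whoever takes the last stone wins), whereas AlphaZero combines its tree search with a learned neural network. All function names are hypothetical.

```python
import random


def legal_moves(stones):
    """In this toy Nim variant a player may take 1, 2 or 3 stones."""
    return [n for n in (1, 2, 3) if n <= stones]


def rollout(stones, my_turn, rng):
    """Play random moves to the end; True if 'we' take the last stone."""
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return my_turn  # whoever just moved took the last stone
        my_turn = not my_turn


def win_rate(stones, move, playouts, rng):
    """Estimate how often 'move' wins when the rest is played at random."""
    remaining = stones - move
    if remaining == 0:
        return 1.0  # we took the last stone ourselves
    # After our move it is the opponent's turn, hence my_turn=False.
    wins = sum(rollout(remaining, False, rng) for _ in range(playouts))
    return wins / playouts


def best_move(stones, playouts=2000, seed=0):
    """Pick the move with the highest sampled win rate."""
    rng = random.Random(seed)
    return max(legal_moves(stones),
               key=lambda m: win_rate(stones, m, playouts, rng))


if __name__ == "__main__":
    print(best_move(3))  # taking all three stones wins immediately
```

No piece values appear anywhere: the only signal is who won the sampled games. Random playouts are a weak guide in larger positions, which is where a learned evaluation comes in.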
During training (against itself) it is also noticeable that AlphaZero prefers short thinking times (a typical thinking time is 40 ms). This shows an amazing parallel to the development of chess. Chess at competition level has been "brainwashed" and is now accompanied by much more complex analysis. In the past, however, grandmasters used to train by playing blitz chess. The more mistakes they saw, the more they thought they could learn. Apparently AlphaZero has found this out and has optimized its training accordingly, developing a wilder, more creative and more distinctive style than its predecessors. The learning process of AlphaZero for chess took about 9 hours, and the program played about 100 games per second against itself. AlphaZero therefore plays in a riskier and less predictable way than AlphaGo. This makes me think of the current world chess champion Magnus Carlsen, who is particularly strong in time-limited games. Both love moves that have never been played before.
The story continues in the book "Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI".
The example of AlphaZero and chess shows impressively how the choice of the algorithmic solution approach influences the style of the solution. At least in chess the time has come when machines and their developers develop their own style and no longer rely exclusively on computing power and pragmatism. Fascinating.