In Nim, there's a limited set of optimal moves for any given board configuration. If you don't play one of them, you essentially cede control to your opponent, who can go on to win as long as they play nothing but optimal moves. And again, the optimal moves can be identified by evaluating a mathematical parity function.
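For the standard form of Nim, that parity function is usually taken to be the nim-sum: the bitwise XOR of all the row sizes. A position with a nim-sum of zero is lost for the player to move, and an optimal move is one that hands the opponent a zero nim-sum. A minimal sketch, assuming ordinary Nim rules (the example board `[1, 2, 3, 4, 5]` is illustrative, not a configuration from the paper):

```python
from functools import reduce
from operator import xor

def nim_sum(rows):
    """Bitwise XOR of all row sizes. Zero means the player to move
    loses against optimal play."""
    return reduce(xor, rows, 0)

def optimal_moves(rows):
    """Every (row index, new row size) that leaves the opponent
    with a zero nim-sum, i.e., a losing position."""
    s = nim_sum(rows)
    return [(i, n ^ s) for i, n in enumerate(rows) if n ^ s < n]

board = [1, 2, 3, 4, 5]
print(nim_sum(board))        # nonzero, so the player to move can win
print(optimal_moves(board))  # the handful of moves that preserve the win
```

Note that an evaluator has to get this parity exactly right: every row size feeds into the XOR, so changing a single object anywhere on the board can flip a winning position into a losing one.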
So, there are reasons to think that the training process that worked for chess might not be effective for Nim. The surprise is just how bad it actually was. Zhou and Riis found that for a Nim board with five rows, the AI got good fairly quickly and was still improving after 500 training iterations. Adding just one more row, however, caused the rate of improvement to slow dramatically. And for a seven-row board, gains in performance had essentially stopped by the time the AI had played itself 500 times.
To better illustrate the problem, the researchers swapped out the subsystem that suggested potential moves for one that operated randomly. On a seven-row Nim board, the performance of the trained and randomized versions was indistinguishable over 500 training games. Essentially, once the board got large enough, the system was incapable of learning from observing game outcomes. The initial state of the seven-row configuration has three potential moves that are all consistent with an ultimate win. But when the trained move evaluator of their system was asked to check all potential moves, it rated every single one as roughly equal.
The researchers conclude that Nim requires players to learn the parity function to play effectively. And the training procedure that works so well for chess and Go is incapable of doing so.
Not just Nim
One way to view this conclusion is that Nim (and by extension, all impartial games) is simply weird. But Zhou and Riis also found some signs that similar problems might crop up in chess-playing AIs trained this way. They identified several "wrong" chess moves, ones that missed a mating attack or threw away an endgame, that were initially rated highly by the AI's board evaluator. It was only because the software explored a number of additional branches several moves into the future that it was able to avoid these gaffes.