Day thirty-eight
The negamax saga continues. Will there ever be a time when doing anything with negamax doesn’t cause some kind of problem? I doubt it. It seems so simple, and yet always ends up being fickle. Today was devoted to optimization. I had already implemented alpha beta pruning which, on a 4x4 board, meant that instead of taking 24+ hours to choose a move, it did it in just over a minute. I considered this progress, since when I wanted to play my computer I could. I should point out that I did not actually let my negamax run for 24 hours, I just assumed it would take around that time, although it would probably take longer.
One of the stories for this week, though, was to get that to less than a second. I had hoped that my immutable board and no longer needing to clear the board would magically shave off at least a minute. Unsurprisingly, it didn’t, so I was back to looking for negamax optimizations. First I came across negamax with alpha beta and transposition tables.
Transposition tables use memoization. The negamax algorithm saves board states into a HashMap, with the board as the key and the score as the value. Whenever it has a board state to deal with, it will check the table to see whether it has already encountered that state, and if it has, simply returns the already recorded score and therefore does not have to recurse through the rest of the possible boards. If it is yet to encounter that particular board, then it saves that board and the score into the HashMap. I found some pseudocode and tried to implement it. It seemed to work in that it did not break anything. It definitely didn’t work in that it added 2 minutes to my first move time on a 4x4 board. Not cool.
I played with the code for a while, tried to figure out what was going wrong, and eventually came to the conclusion that I couldn’t understand if I had got the implementation really wrong or if it was the added cost of having to search and save to a hashmap that was adding to the time.
At the time, I decided it was just too complicated for what I was trying to do, and I knew there was a simpler solution, so I gave up. After a while, though, all of the things I had done wrong dawned on me, and it’s a little embarrasing how obvious they are. The problem was this: my board is now immutable, and whenever a mark is placed a new board is returned. In my implementation I was checking that the HashMap key was equal to the current board, which it never would have been, so I was adding cost to the algorithm with absolutley no gains.
In the meantime I had come up with a way to make my algorithm faster, using Late Move Reductions. So my Perfect Player no longer calculated every single option for every free space. Instead, it checks for a possible loss or win in the next few moves, and acts accordingly. I don’t know why, but this felt a bit hacky to me. The whole point of minimax and negamax for me was that the computer player knew every single possibility that could happen, and that was why it was perfect. Now, it’s still unbeatable, and it’s fast, but it felt a little like cheating.
As per usual for a Thursday night though, I’ve got more to do than I have time for. I really want to see whether I can get Transposition tables working, because I’ve never used memoization before and I’m really enjoying working with algorithms at the moment, but I’ve got the Data Structures and Algorithms weekly assignments to do, and I’ve got to figure out how to work with and test the play framework. I’m really looking forward to both of these, so I’m going to postpone my work on the tables.
I did do a little more work on it though. First, I overrode the equals method for my Board, so that if one board has the same state as another, they should be considered equal. This is something I haven’t really had to do before, although now that I have done it and seen how simple it is, I realise how useful it can be. In my Rock, Paper, Scissors it would have been easy to override the compare to method which would elimitate the need for any rules logic.
I also created a TranspositionTable class so that if I were to check if my table contained a board key, the correct equals() method would be called, and I could easily find the corresponding value to a matched key. I created a TTEntry class so that every value assosciated with a board key in my table holds its score, its depth, the move and a flag. I really enjoyed creating my own elements. Usually I assume the language will have the features I need, probably because I started with ruby and it is such a helpful language. Java isn’t, and it took me a while to realise that if it didn’t have a method I wanted, or didn’t behave as I’d hoped, I can just create something that does.
Then I rewrote the pseudocode and wondered what would happen. I think I understand what is meant to happen, so when all my tests failed, I wasn’t surprised, even though I couldn’t figure out why. This is negamax, after all. I’m leaving it for now, hoping that if I can come back to it with fresh eyes I will figure out what I’ve done wrong, but even if I don’t get it working, I feel like I’ve learnt a lot and really enjoyed just playing with Java.