Wednesday, July 25, 2012

Let's Make Us A Game II-3: Scalability and Optimization

In my last update, I lied -- before we get into the graphical aspects of our simple Tic-Tac-Toe Bastard game, I decided that I want to revisit some basic gameplay issues we've encountered along the way.

One limitation I noted last week is that playing tic-tac-toe on a 3 x 3 grid is not necessarily very challenging or unpredictable; it's not very hard to play the computer to a draw, even on its highest difficulty setting, unless we get distracted by our own planning and miss the computer's drive to victory.  So, in an effort to reach a point where the computer's thoroughness might exceed the player's ability to keep track of all the details, I'd like to expand the model beyond the 3 x 3 standard, establishing a variable-driven approach that allows 4 x 4 or 5 x 5 play... basically any N x N matrix that memory will allow.

This wasn't hard to do -- the toughest part was writing a function to populate the POSSIBLE_WINS matrix dynamically, for any square matrix, instead of relying on the hardcoded 3 x 3 set I defined by hand in previous versions.  This was just a matter of getting my loop indices organized correctly.  With a few other hardcoded 3 and 9 values replaced with proper variable references, our existing code now functions for any matrix size we care to specify.

But there's a predictable cost --  with a larger matrix, the AI routine is slowing down significantly, because it has so many more possible sequences of play to consider.  A little benchmarking produces these results on my three-year-old laptop, for the computer's first move after the player has moved (later moves become faster as there are fewer new possibilities to consider with a more populated matrix):

Matrix Size    Difficulty    Time (s)       Moves Considered

3 x 3             9                1.058           59,704
4 x 4             5                8.577           396,075
4 x 4             6                86.974         3,999,675
5 x 5             4                7.881           267,744
5 x 5             5                158.515       5,368,224

I got tired of waiting for even a 4 x 4 grid on difficulty 9, and this is enough data to provide food for thought.

What this suggests it that the brute-force, full-recursion approach is acceptable for a standard Tic-Tac-Toe matrix, but that the possibilities also become too much for the computer to handle efficiently as the matrix gets larger.  We could probably speed this up by coding the AI routine in C, the language in which Lua itself is implemented.  But that's just pushing the problem down a level -- it will still get bogged down with suitably large matrices and look-ahead limits.  What we need to do is further optimize the approach.

First, let's consider whether deeper lookahead is even really buying us any improvement in the computer's play.  Does it really make better choices given a difficulty of 9, or would it make approximately the same choice with less detailed consideration?  To test this, I'll start with a 3 x 3 matrix and see which move the computer makes, ignoring the actual scoring mechanism underneath.  I will always pick square 1 as the human player, and fix things so that I always go first.

3 x 3 - difficulty 1 - CPU picks square 3 after .001 seconds, and lets me win easily by not noticing my diagonal in progress
3 x 3 - difficulty 2 - CPU picks square 3 after .002 seconds, blocks my diagonal but allows me to win
3 x 3 - difficulty 3 - CPU picks square 4 after .006 seconds, blocks diagonal but still allows me to win
3 x 3 - difficulty 4 - CPU picks square 2 after .056 seconds, blocks diagonal but still allows me to win
3 x 3 - difficulty 5 - CPU picks square 5 after 0.216 seconds, and fights me to a draw
3 x 3 - difficulty 6 - CPU picks square 5 after 0.452 seconds, and fights me to a draw
3 x 3 - difficulty 7 - CPU picks square 5 after 0.775 seconds, and fights me to a draw
3 x 3 - difficulty 8 - CPU picks square 5 after 0.997 seconds, and fights me to a draw
3 x 3 - difficulty 9 - CPU picks square 5 after 0.968 seconds, and fights me to a draw



This indicates that beyond lookahead 5 with a 3 x 3 matrix, there may not be a practical improvement in the computer's first-move play; it just spends more time coming to the same conclusion.  (It also indicates that, as we suspected, in a 9-element matrix with one square filled in, levels 8 and 9 do very much the same thing because there's no 9th lookahead possible.)  I see similar results with a 4 x 4 matrix and a 5 x 5 matrix -- the computer doesn't necessarily benefit from further consideration of its options, and in fact it starts making pretty good choices as soon as I get to difficulty/lookahead 2, where its decision-making is at least non-random.  (Just for fun, I examined a 2 x 2 matrix and discovered that whoever goes first always wins, so that's not an interesting configuration for our purposes.)

What does this mean?  It suggests that a little bit of look-ahead may be enough to present a credible approach to AI play, and that Tic-Tac-Toe may not be complex enough to merit a full, deep analysis.  It also suggests, to my surprise, that with a larger matrix there may be less value to looking ahead more deeply -- more of the available possibilities present the same likelihood of winning, so we may be able to save some time by recognizing this and tailoring our maximum lookahead to match our empirical observations.

A second issue to consider is suggested by the sheer numbers we're looking at -- playing with a 3 x 3 matrix on difficulty 9 considers 59,704 moves.  But if we think about this, there are really only 3 to the 8th power possible grid configurations remaining after the player has made the first move: 3 possible values for each square -- empty, player, or CPU -- and 8 remaining empty squares, for a total of 6,561 possible setups.  And there are impossible combinations in that set as well, because the players take turns -- we would never see a board where the player has marked 7 squares and the CPU has marked only 2.  So considering every possible sequence of play may be causing our algorithm to do a lot of redundant scoring of the same grid layout, arrived at by different sequences of moves.

Trading memory for speed and vice-versa is a classic optimization technique; in this case, we should see whether retaining scores for board configurations already examined allows us to avoid re-scoring the same state more than once.  Adding a global scoredGrids table allows us to do this; we'll translate each grid definition into a base-3 decimal number; if we already have an entry in the scoredGrids table for this index, we know we've already examined it and needn't rerun the recursive scoring operation, we can just return the score we came up with the first time we examined that particular configuration.  Lua's table operation really helps us out here because it is very efficient -- we don't have to define empty table slots for moves we won't actually be considering, so we aren't wasting memory that way (though this approach will consume a fair amount of memory.)

With this memory retention in place, let's see how this affects our benchmark results.  There will probably be some overhead from the additional table handling, but we should see an overall drop in the number of moves considered... 

Whoops.  There was a filing problem with my initial implementation -- the score matrix was storing results for both player and CPU activity, so we were getting distortion in the AI and it wasn't playing very well.  We want to use the results specific to CPU moves, so I added some casing logic to distinguish CPU from player moves.  With everything working, let's see how these numbers compare to our original benchmarks.... hmmmm.  These numbers look a lot better than they did earlier, except something else is wrong with the AI.  I'm getting bad play from the CPU on some matrix sizes and some skill levels.  There's still a bug of some sort afoot!

Here's the problem -- the stored score for a given configuration is not necessarily valid later in the game, because of the variability due to the lookahead setting.  The initial approximate scores may be fine for making a first move, but they misrepresent the likelihood of victory later in the game, misleading the CPU's play as it continues to respond to outdated information.  If the lookahead is set to a level that does a full recursion during the first exchange, then this isn't a problem, because the initial calculations cover all future possibilities; that's why my 3 x 3 play at difficulty 9 still worked as expected.  But in many other cases, our code is now making bad assumptions later in the game based on its initial ideas about how best to proceed; it's not really responding properly to the evolving game.

That's easy to fix, though -- our real goal for the reuse is to apply within a given move; while this error did appear to save us some time later in the game, reusing those old results from earlier analysis is a bad idea.  So we'll just clear out the stored scores each time we start a new move. And now we see that we're doing less evaluation -- about half as much -- but still playing just as well.
 
Matrix Size    Difficulty    Time (s)       Moves Considered


3 x 3             9                0.924           32,387
4 x 4             5                4.662           65,745
4 x 4             6                43.283         496,175

5 x 5             4                4.799           76,4765 x 5             5                76.604         692,784

Still, some of these timings are uncomfortably long for a player to wait.  So we'll also check the behavior at our theoretical minimum lookahead values -- play still looks good, and the performance is looking pretty solid too.  We'll take advantage of this to keep our engine running efficiently -- nobody should object to these wait times:

3 x 3             5                0.114           2445
4 x 4             2                0.011           225
5 x 5             2                0.034           576

So the only remaining mystery is, why is 3 x 3 play not very successful with a lookahead of 2?   I think the answer is that in a 3 x 3 matrix, the player who goes first has a certain advantage, especially if the CPU is not looking all the way ahead to the endgame.  It's similar to the 2 x 2 case -- victory can come suddenly, especially if the CPU is blind to the possibility of more than one victory condition turning up at the same moment.  In the larger matrices, using a short look-ahead to be agile about blocking a player's impending victory is sufficient to play to a draw, or to win if the player gets distracted.  I think we can conclude that the 3 x 3 game board is actually more interesting from a strategic standpoint, which may be why it has endured as the Tic-Tac-Toe standard despite many misguided efforts -- like mine -- to "improve" the gameplay.

Now, I think we're finally ready to move on with a graphical representation of our game!

The latest source code is below the fold:




-- set up constants

EMPTY_ID = 0;
MY_ID = 1;
CPU_ID = 2;

GRID_DIMENSION = 3;

GRID_SIZE = GRID_DIMENSION * GRID_DIMENSION;

-- and useful semi-variables

aiDebug = false;
aiDebug2 = false;
perfDebug = true;
scaleDebug = false;

PLAYERS = { MY_ID, CPU_ID };

MAX_SCORE = 16;

MAX_DIFFICULTY = 9;

DIFFICULTY = 1;

-- set up structures

gGrid = {};

for i = 1, GRID_SIZE do
   table.insert(gGrid, EMPTY_ID);
end;

totalEvaluated = 0;

-- utility functions

function labelPlayer(id, cell)

   if id == EMPTY_ID then
      return tonumber(cell);
   elseif id == MY_ID then
      return "P";
   elseif id == CPU_ID then
      return "C";
   end;

end;

function labelPlayerChar(id)

   if id == EMPTY_ID then
      return ".";
   elseif id == MY_ID then
      return "P";
   elseif id == CPU_ID then
      return "C";
   end;

end;

-- Convert grid contents into a base N unique decimal index for organizing scored grid configurations

function indexGridString(grid)

   local index = 0;

   for i = 1, GRID_SIZE do
      index = index * 3 + grid[i];
   end;

   return index;

end;

function gridToString(grid)

   local display = "";

   for i = 1, GRID_SIZE do
      display = string.format("%s%s", display, labelPlayerChar(grid[i]));
   end;

   return display;

end;


function doPrintGrid(grid)

    print " ";
    for i = 1, GRID_DIMENSION do
       local thisLine = "";
       for j = 1, GRID_DIMENSION do
          local cellIndex = (i -1 ) * GRID_DIMENSION + j;
          thisLine = string.format("%s%2s | ", thisLine, labelPlayer(grid[cellIndex], cellIndex));
       end;
       thisLine = thisLine.sub(thisLine, 1, -3);
       print(thisLine);
       if (i < GRID_DIMENSION) then print("----------------------------"); end;
    end;
    print " ";

end;

function doPrintGridRaw(grid)

    print("GRID:");
    print(grid[1], "|", grid[2], "|", grid[3]);
    print("----------------------------------");
    print(grid[4], "|", grid[5], "|", grid[6]);
    print("----------------------------------");
    print(grid[7], "|", grid[8], "|", grid[9]);
    print " ";

end;

function identifyPlayer(id)

   if id == EMPTY_ID then
      return "NOBODY";
   elseif id == MY_ID then
      return "PLAYER";
   elseif id == CPU_ID then
      return "COMPUTER";
   end;

end;

-- game logic functions

function checkVictory(grid, id)

   local won = false;

   for i in pairs(POSSIBLE_WINS) do
      local thisWon = true;     
      for j = 1, GRID_DIMENSION do
         if (grid[POSSIBLE_WINS[i][j]] ~= id) then
            thisWon = false;
         end;
      end;
      if thisWon then won = true; end;
   end;

   return won;

end;

function gridFull(grid)

   local full = true;

   for i = 1, GRID_SIZE do

       if (grid[i] == EMPTY_ID) then full = false; end;

   end;

   return full;

end;

function gridEmpty(grid)

   local empty = true;

   for i = 1, GRID_SIZE do

       if (grid[i] ~= EMPTY_ID) then empty = false; end;

   end;

   return empty;

end;


-- Build the set of possible wins -- columns, rows, diagonals

function mapWins()

   local POSSIBLE_WINS = {};

   for i = 1, GRID_DIMENSION do
      local rowSet = {};
      for j = 1, GRID_DIMENSION do
         table.insert(rowSet, (i - 1) * GRID_DIMENSION + j);
      end;
      table.insert(POSSIBLE_WINS, rowSet);
   end;

   for i = 1, GRID_DIMENSION do
      local colSet = {};
      for j = 1, GRID_DIMENSION do
         table.insert(colSet, i + (j - 1) * GRID_DIMENSION);
      end;
      table.insert(POSSIBLE_WINS, colSet);
   end;

   local diagSet = {};
   for j = 1, GRID_DIMENSION do
      table.insert(diagSet, j + (j - 1) * GRID_DIMENSION);
   end;
   table.insert(POSSIBLE_WINS, diagSet);

   local diagSet = {};
   for j = 1, GRID_DIMENSION do
      table.insert(diagSet, j + (3 - j) * GRID_DIMENSION);
   end;
   table.insert(POSSIBLE_WINS, diagSet);

   return POSSIBLE_WINS;

end;

-- GUI functions

function getPlayerInput(grid, id)

   local playerMove;

   repeat
      io.write(string.format("Make your move [1-%i]: ", GRID_SIZE));
      io.flush();
      playerMove=tonumber(io.read());
      if (grid[playerMove] ~= EMPTY_ID) then
         print "That square is already occupied";
      end;
   until grid[playerMove] == EMPTY_ID;

   print("Player marks square ", playerMove);

   return playerMove;

end;

-- AI functions

scoredGridsCPU = {};  -- will be used to retain board configurations already examined
scoredCountCPU = 0;
scoredGridsPlayer = {};
scoredCountPlayer = 0;

-- Determine the id of the current player's opponent

function other_player(id)
   for k = 1, table.getn(PLAYERS) do
      if PLAYERS[k] ~= id then
         return PLAYERS[k];
      end;
   end;

end;

-- Score a move's likelihood of winning given the current state of the grid

function scoreMove(grid, id, move, currentLevel, maxLevel)

   local score = 1;
   local newGrid = {};

   local indentString = string.rep(" ", currentLevel); -- for readability of debug output

   if (aiDebug) then print(string.format("%sNew scoreMove call - %i of %i, player %i move %i...",

indentString, currentLevel, maxLevel, id, move)); end;

   for i = 1, GRID_SIZE do
      newGrid[i] = grid[i];
   end;

   newGrid[move] = id; -- mark this move as a hypothetical next state

   -- Prepare to retain this evaluation for reuse later

   indexGrid = indexGridString(newGrid);
   if (aiDebug2) then
      print("indexGrid = ", indexGrid);
      print("scoredGridsCPU[] = ", scoredGridsCPU[indexGrid]);
      print(gridToString(newGrid));
   end;

   if (id == CPU_ID) then
      if scoredGridsCPU[indexGrid] ~= nil then
         return scoredGridsCPU[indexGrid];
      end;
   else
      if scoredGridsPlayer[indexGrid] ~= nil then
         return scoredGridsPlayer[indexGrid];
      end;
   end;     

   totalEvaluated = totalEvaluated + 1;

   if checkVictory(newGrid, id) then

      -- This is the (or a) winning move, so score it at the maximum

      score = MAX_SCORE;

      if (aiDebug) then print(string.format("%sFound a winning move for player %i, move %i",         

                     indentString, id, move)); end;
  
   elseif not gridFull(newGrid) and (currentLevel < maxLevel) then

      -- Assess possible next moves made after this move, based on likelihood of victory
      -- Want to come up a single score in the range 1 to MAX_SCORE that leads toward a win

      if (aiDebug) then print(string.format("%sEvaluating other player's move...", indentString));

end;

      local thisScore = 8;
      local movesEvaluated = 0;

      if (aiDebug) then doPrintGrid(newGrid); end;

      for j = 1, GRID_SIZE do

          if (newGrid[j] == EMPTY_ID) then

              thisScore = thisScore - 0.5 * scoreMove(newGrid, other_player(id), j, currentLevel + 1,

maxLevel);
              movesEvaluated = movesEvaluated + 1;

           end;

      end;     

      if (aiDebug) then print(string.format("%sthisScore, movesEvaluated: %f %i",                    

          indentString, thisScore, movesEvaluated)); end;

      score = math.random(1, MAX_SCORE);
      score = thisScore / movesEvaluated;

      if (aiDebug) then print(string.format("%sEvaluated move %i for player %i at %i",               

                       indentString, move, id, score)); end;

   end;

   if (id == CPU_ID) then
      scoredGridsCPU[indexGrid] = score;
      scoredCountCPU = scoredCountCPU + 1;
      if (aiDebug2) then print("CPU score = ", score); end;
   else
      scoredGridsPlayer[indexGrid] = score;
      scoredCountPlayer = scoredCountPlayer + 1;
      if (aiDebug2) then print("Player score = ", score); end;
   end;

   return score;

end;

-- Simplest case:  Pick a blank square at random

function getCPUInputEasy(grid, id)

   local cpuMove;

   repeat
      cpuMove = math.random(1, GRID_SIZE);
   until grid[cpuMove] == EMPTY_ID;

   return cpuMove;

end;

-- AI Case: Score possible moves and pick one of the best

function getCPUInput(grid, id)

   local cpuMove;
   local aiScore = { };
   local bestMove;
   local maxScore = -16;

   scoredGridsCPU = {};  -- will be used to retain board configurations already examined
   scoredCountCPU = 0; 
   scoredGridsPlayer = {};
   scoredCountPlayer = 0;

   local startTime = os.clock();

   totalEvaluated = 0;

   print ("CPU is thinking...");

   if gridEmpty(grid) then -- No need to consider moves, just pick at random

      bestMove = getCPUInputEasy(grid, id);

   else

   -- Consider and score each possible move
   for i = 1, GRID_SIZE do
      local score = 0;
      if grid[i] == EMPTY_ID then -- possible move worth evaluating
         score = scoreMove(grid, id, i, 1, DIFFICULTY);
      end;
      aiScore[i] = score;
      if score > maxScore then maxScore = score; end;     
   end;

   if (aiDebug) then doPrintGridRaw(aiScore); end;

   -- Decide which move to make, choosing at random from best moves by score
   bestMoves = {};
   local j = 0;
   for i = 1, GRID_SIZE do
      if (aiScore[i] == maxScore) and (grid[i] == EMPTY_ID) then
         j = j + 1;
         bestMoves[j] = i;
      end;
   end;

   if (table.getn(bestMoves) > 0) then
      bestMove = bestMoves[math.random(1, table.getn(bestMoves))];
   else
      bestMove = getCPUInputEasy(grid);
   end;

   if (perfDebug) then
      print ("CPU took ", os.clock() - startTime);
      print("CPU evaluated ", totalEvaluated, " moves");
      print("scoredCountCPU = ", scoredCountCPU);
      print("scoredCountPlayer = ", scoredCountPlayer);
   end;

   end;

   print("CPU marks square ", bestMove);

   return bestMove;

end;

function getMove(grid, id)

   local square = nil;

   if id == MY_ID then
      square = getPlayerInput(grid, id);
   else
      square = getCPUInput(grid, id);
   end;

   return square;

end;

-- main code

local gameOver = false;
local winner = EMPTY_ID;

math.randomseed(os.clock());

POSSIBLE_WINS = mapWins();

print(" ");
print("TIC-TAC-TOE BASTARD");
print(" ");

DIFFICULTY = nil;

repeat
   io.write("Select Difficulty 1-9 [1 = Easy, 9 = Hard]: ");
   io.flush();
   DIFFICULTY=tonumber(io.read());
until DIFFICULTY ~= nil;

print(" ");
print("COIN FLIP...");
coinFlip = math.random(1, 10);
coinFlip = math.random(1, 10); -- do it again, first time always seems to yield 1

if coinFlip < 5 then
   print("HEADS! Computer goes first");
   PLAYERS[1] = CPU_ID;
   PLAYERS[2] = MY_ID;
else
   print("TAILS! Player goes first");
   PLAYERS = {MY_ID, CPU_ID};
end;

doPrintGrid(gGrid);

while not gameOver do

   for i in pairs(PLAYERS) do

      print(string.format("%s is up!", identifyPlayer(PLAYERS[i])));

      gGrid[getMove(gGrid, PLAYERS[i])] = PLAYERS[i];

      if checkVictory(gGrid, PLAYERS[i]) then
         winner = PLAYERS[i];
         gameOver = true;
       end;

       if not gameOver and gridFull(gGrid) then
          winner = EMPTY_ID;
          gameOver = true;
       end;

       doPrintGrid(gGrid);

       if gameOver then break; end;

    end;
end;

if (winner ~= EMPTY_ID) then
    print (string.format("GAME OVER! %s WINS!", identifyPlayer(winner)));
else
    print ("IT'S A DRAW!");
end;



No comments:

Post a Comment