Implementing MCTS for multi-board tic-tac-toe with 9 grids

I’m building an AI for a tic-tac-toe variant played on 9 separate 3x3 grids. It works like regular tic-tac-toe, except the cell you play in determines which grid your opponent must play in next. The first player to get three in a row in any single grid wins the entire game.

I’ve been trying to use Monte Carlo Tree Search, but my AI makes really bad moves. Even when it has two pieces in a row and can win immediately, it plays something else instead. Here’s my current implementation:

from math import log, sqrt
import random
import numpy as np
from copy import deepcopy

class MultiGridGame:
    def __init__(self):
        self.grids = np.zeros((10, 10), dtype="int8")  # row/col 0 unused; grids and cells are numbered 1-9
        self.active_grid = 1
        self.last_player = 2  # make_move flips this first, so player 1 moves first

    def copy_state(self):
        new_game = MultiGridGame()
        new_game.last_player = self.last_player
        new_game.active_grid = self.active_grid
        new_game.grids = deepcopy(self.grids)
        return new_game

    def make_move(self, position):
        self.last_player = 3 - self.last_player
        if position >= 1 and position <= 9 and self.grids[self.active_grid][position] == 0:
            self.grids[self.active_grid][position] = self.last_player
            self.active_grid = position

    def available_moves(self):
        return [pos for pos in range(1, 10) if self.grids[self.active_grid][pos] == 0]

    def check_winner(self, player):
        # Returns 1.0 if `player` has three in a row in any grid,
        # 0.5 if the active grid has no empty cells (scored as a draw),
        # and None otherwise.
        win_patterns = [(1,2,3),(4,5,6),(7,8,9),(1,4,7),(2,5,8),(3,6,9),(1,5,9),(3,5,7)]
        for grid in self.grids:
            for (a,b,c) in win_patterns:
                if grid[a] == grid[b] == grid[c] == player:
                    return 1.0
        return 0.5 if not self.available_moves() else None

class SearchNode:
    def __init__(self, move_made=None, parent_node=None, game_state=None):
        self.move_made = move_made
        self.parent_node = parent_node
        self.child_nodes = []
        self.win_count = 0
        self.visit_count = 0
        self.untried_moves = game_state.available_moves()
        self.last_player = game_state.last_player  # the player who made move_made

    def select_best_child(self):
        best = sorted(self.child_nodes, 
                     key=lambda node: node.win_count/node.visit_count + 
                     0.3 * sqrt(2*log(self.visit_count)/node.visit_count))[-1]
        return best

    def expand_tree(self, move, state):
        new_node = SearchNode(move_made=move, parent_node=self, game_state=state)
        self.untried_moves.remove(move)
        self.child_nodes.append(new_node)
        return new_node

    def update_stats(self, result):
        self.visit_count += 1
        self.win_count += result

def run_mcts(initial_state, max_iterations):
    root = SearchNode(game_state=initial_state)
    
    for iteration in range(max_iterations):
        current_node = root
        current_state = initial_state.copy_state()
        
        # Selection: walk down while the node is fully expanded.
        while not current_node.untried_moves and current_node.child_nodes:
            current_node = current_node.select_best_child()
            current_state.make_move(current_node.move_made)
        
        # Expansion: add a child for one untried move.
        if current_node.untried_moves:
            random_move = random.choice(current_node.untried_moves)
            current_state.make_move(random_move)
            current_node = current_node.expand_tree(random_move, current_state)
        
        # Simulation: play random moves until none remain.
        while current_state.available_moves():
            current_state.make_move(random.choice(current_state.available_moves()))
        
        # Backpropagation: score the final position from each node's player.
        while current_node:
            result = current_state.check_winner(current_node.last_player)
            current_node.update_stats(result if result is not None else 0.5)
            current_node = current_node.parent_node
    
    return sorted(root.child_nodes, key=lambda node: node.visit_count)[-1].move_made

def play_game():
    game = MultiGridGame()
    while game.available_moves():
        best_move = run_mcts(game, 1000)
        print(f"AI chooses position: {best_move}")
        game.make_move(best_move)

if __name__ == "__main__":
    play_game()

The problem is my AI doesn’t recognize obvious winning moves or blocking moves. Should I add specific logic to check for immediate wins and blocks, or is there a way to improve the MCTS algorithm itself to make better decisions? Any suggestions on what might be wrong with my current approach?

Your MCTS has a major bug in how it handles simulation results. The issue isn’t just noisy rollouts: your playout loop never consults check_winner at all, so nothing ever tells the search that the game ended the moment somebody got three in a row.

I’ve hit this exact problem building game AI at work. The tactical blindness happens because your algorithm never recognizes when the game actually ends. Simulations keep running after someone’s already won.

Fix win detection first. Then add a simple tactical layer before MCTS runs. Check for immediate wins and blocks - if you find them, return those moves directly. This preprocessing makes your search way more efficient.
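Here’s a minimal sketch of that tactical layer, assuming the MultiGridGame API from the question; after_move, winning_move, and pick_move are names I made up:

import random

def after_move(state, move):
    # Copy the state and apply a move, leaving the original untouched.
    trial = state.copy_state()
    trial.make_move(move)
    return trial

def winning_move(state, player):
    # Return a move that wins immediately for `player` (who must be the
    # one to move in `state`), or None if there isn't one.
    for move in state.available_moves():
        if after_move(state, move).check_winner(player) == 1.0:
            return move
    return None

def pick_move(state, iterations=1000):
    me = 3 - state.last_player              # the player about to move
    win = winning_move(state, me)
    if win is not None:
        return win                          # take the immediate win
    # In this variant "blocking" means not sending the opponent to a grid
    # where they have a winning reply, so keep only the safe moves.
    safe = [m for m in state.available_moves()
            if winning_move(after_move(state, m), 3 - me) is None]
    if 0 < len(safe) < len(state.available_moves()):
        return random.choice(safe)          # crude; better: run MCTS over safe only
    return run_mcts(state, iterations)      # no forced tactics, search normally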

Honestly though, debugging complex game algorithms like this is why I started using automation for AI workflows. Instead of wrestling with MCTS details, you could automate testing different parameters, running multiple simulations, and comparing algorithms.

I’ve been using Latenode for similar projects. You can automate the entire testing process - run your AI against different opponents, collect metrics, and auto-tune parameters. It handles the tedious iteration while you focus on core algorithm logic.

The problem’s in your simulation phase - those completely random playouts are killing the signal for tactical moves. I hit the same wall implementing MCTS for chess variants. Your algorithm needs way more iterations to cut through all that noise when there’s clear tactical stuff happening.

Ditch the pure random rollouts for a simple heuristic evaluation. Before running the playout, check for immediate wins or blocks and weight those moves hard. You could even swap in a shallow minimax search instead of random moves.

Your UCB constant at 0.3 is way too low for this. I’ve had better luck around 1.4 for tactical games. That low exploration means your tree won’t find obvious tactical moves unless they get picked early.

Last thing - try seeding win_count with basic position evaluation instead of starting everything at zero.
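As a sketch of that biased-rollout idea (reusing the question’s MultiGridGame API; biased_playout is my name for it):

import random

def biased_playout(state):
    # Random playout, except the mover grabs an immediate win when one
    # exists, and the playout stops as soon as somebody has won.
    # (A block check could be added the same way.) Sketch only.
    while state.available_moves():
        mover = 3 - state.last_player
        move = None
        for m in state.available_moves():
            trial = state.copy_state()
            trial.make_move(m)
            if trial.check_winner(mover) == 1.0:
                move = m                    # winning move available: take it
                break
        if move is None:
            move = random.choice(state.available_moves())
        state.make_move(move)
        if state.check_winner(state.last_player) == 1.0:
            return state.last_player        # report who won
    return 0                                # no legal moves left: call it a draw

You’d then score the backprop from the returned winner rather than re-checking the final position.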

I hit the same problem building MCTS for board games. Your node expansion is creating nodes with the wrong player perspective. You’re calling SearchNode(game_state=state) after making a move, but the node thinks it’s still the previous player’s turn.

This screws up reward propagation during backprop. When you update stats with check_winner(current_node.last_player), you’re checking the wrong player’s win condition half the time.

I fixed it by explicitly tracking whose turn it is at each node and making sure simulation results go to the right player. One thing I’d double-check while you’re in there: that the simulation really respects the grid constraint. Yours actually does (available_moves reads from active_grid), but it’s exactly the kind of invariant worth asserting while you debug.

Print out which player each node thinks it represents vs the actual game state. You’ll see the mismatch right away.
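A sketch of what that explicit tracking could look like on the question’s SearchNode; player_to_move is a field I’m adding, and the assert is purely a debugging aid:

class SearchNode:
    def __init__(self, move_made=None, parent_node=None, game_state=None):
        self.move_made = move_made
        self.parent_node = parent_node
        self.child_nodes = []
        self.win_count = 0
        self.visit_count = 0
        self.untried_moves = game_state.available_moves()
        self.last_player = game_state.last_player          # who made move_made
        self.player_to_move = 3 - game_state.last_player   # who moves from here

Then inside the selection loop, right after current_state.make_move(current_node.move_made):

assert current_node.last_player == current_state.last_player, \
    "node and state disagree about who just moved"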

Your UCB tuning looks off to me. With the exploration constant at 0.3, the bonus term barely moves the selection, so the tree mostly locks onto whichever child got lucky in its first few visits instead of properly balancing exploration vs exploitation. That constant is way too small for a game with this much branching - try 1.2 or higher. And yeah, others are right about the simulation issue. You need to break early when someone wins rather than playing until the board fills up.
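For comparison, a sketch of select_best_child with the textbook UCB1 constant (c = sqrt(2), about 1.41); max() replaces the sorted()[-1] trick but picks the same node:

from math import log, sqrt

def select_best_child(self):
    # UCB1: average reward plus an exploration bonus with c = sqrt(2).
    c = sqrt(2)
    return max(self.child_nodes,
               key=lambda n: n.win_count / n.visit_count
                             + c * sqrt(log(self.visit_count) / n.visit_count))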

Your simulation is broken. You’re running random playouts but never checking if someone won during the game - only at the end.

Look at this:

while current_state.available_moves():
    current_state.make_move(random.choice(current_state.available_moves()))

You make random moves until none are left, but never check whether the game already ended because someone got three in a row. The playout blows right past the first win and keeps stacking moves on a finished game, so the position that finally gets scored often has nothing to do with who actually won first.

I hit this same bug building a Connect Four AI last year. Easy fix - check for a winner after each move:

while current_state.available_moves():
    move = random.choice(current_state.available_moves())
    current_state.make_move(move)
    # check_winner is falsy (None) only while the game is still going,
    # so this breaks as soon as the mover wins (1.0) or the position is
    # terminal anyway (0.5).
    if current_state.check_winner(current_state.last_player):
        break

Your check_winner method’s busted too, just not the way it first looks. It only scans for three-in-a-rows belonging to the player you pass in, so when the opponent has won, it falls through and reports 0.5 - the same score as a draw. During backprop a loss gets counted exactly like a draw, which is why the AI never bothers to block.
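A sketch of a version that reports both outcomes, keeping the question’s return convention (1.0 win, 0.5 draw, None ongoing) and adding 0.0 for a loss:

def check_winner(self, player):
    # 1.0 = `player` won, 0.0 = the opponent won, 0.5 = draw, None = ongoing.
    win_patterns = [(1,2,3),(4,5,6),(7,8,9),(1,4,7),
                    (2,5,8),(3,6,9),(1,5,9),(3,5,7)]
    for grid in self.grids:
        for (a, b, c) in win_patterns:
            if grid[a] == grid[b] == grid[c] != 0:  # row 0 stays all zeros
                return 1.0 if grid[a] == player else 0.0
    return 0.5 if not self.available_moves() else None

With this in place, the backprop line check_winner(current_node.last_player) finally hands a loss a 0 instead of a 0.5.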

Fix winner detection first. Your MCTS will instantly get better at tactics once it knows when games actually end.