Creating binary observation vectors in OpenAI Gym environments

I’m working on a custom Gym environment and need help setting up the spaces correctly.

For my project, I want to build:

  1. Action setup: A discrete action system where I can choose from multiple options (let’s say actions 1 through n). I think the Discrete space works fine for this part.

  2. State tracking: This is where I’m stuck. I need an observation space that can represent all possible combinations of actions that have been executed. Basically, I want a binary vector where each position shows whether a specific action was already performed (1) or not yet taken (0).

For example, if I have 3 possible actions, my observation might look like [1, 0, 1] meaning actions 1 and 3 were completed but action 2 is still pending.
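One detail to settle before building the environment: Gym's Discrete(n) produces actions 0 through n-1, not 1 through n. The simplest approach is to use the action value directly as the index into the binary vector and only add 1 when you want a human-readable label. This is a minimal sketch assuming 3 actions. It imports gym and falls back to gymnasium (Gym's maintained successor, which has the same spaces) if gym isn't installed.

```python
import numpy as np

try:
    import gym
except ImportError:  # assumption: Gymnasium exposes the same spaces API
    import gymnasium as gym

N_ACTIONS = 3

action_space = gym.spaces.Discrete(N_ACTIONS)          # valid actions: 0, 1, 2
observation_space = gym.spaces.MultiBinary(N_ACTIONS)  # vectors like [1, 0, 1]

# The example from the question: "actions 1 and 3 done, action 2 pending"
obs = np.array([1, 0, 1], dtype=np.int8)

done_indices = np.flatnonzero(obs).tolist()   # 0-based indices used by Discrete
done_labels = [i + 1 for i in done_indices]   # 1-based labels, as in the question

print(action_space.contains(0), action_space.contains(3))  # True False
print(observation_space.contains(obs))                     # True
print(done_indices, done_labels)                           # [0, 2] [1, 3]
```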

What’s the proper way to implement this kind of binary observation space in Gym? I’ve been looking through the documentation but haven’t found clear examples for this specific use case.

Any guidance would be appreciated!

Use gym.spaces.MultiBinary(n), where n is your number of actions. It defines a space of binary vectors in which each element is 0 or 1, so in your environment's __init__ you'd set self.observation_space = gym.spaces.MultiBinary(3) for 3 actions. The space handles validation (contains) and sampling of vectors like [0, 1, 1] or [1, 0, 0]. I used this in a resource collection environment to track gathered items. You'll need to maintain the state vector yourself: in step(), flip the index for the executed action to 1. Also return numpy arrays rather than Python lists. MultiBinary's own samples are int8 arrays, so np.array([1, 0, 1], dtype=np.int8) keeps your observations consistent with the space.
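Here is a minimal sketch of the environment described in that answer. It assumes the newer Gym API (0.26+ / Gymnasium), where reset() returns (obs, info) and step() returns (obs, reward, terminated, truncated, info). The class name ActionTrackingEnv and the reward values are placeholders, not anything Gym prescribes.

```python
import numpy as np

try:
    import gym
except ImportError:  # assumption: Gymnasium exposes the same Env/spaces API
    import gymnasium as gym


class ActionTrackingEnv(gym.Env):
    """Hypothetical env: the observation records which actions have run."""

    def __init__(self, n_actions=3):
        self.n_actions = n_actions
        self.action_space = gym.spaces.Discrete(n_actions)
        self.observation_space = gym.spaces.MultiBinary(n_actions)
        # MultiBinary uses int8, so keep the internal state in the same dtype
        self.state = np.zeros(n_actions, dtype=np.int8)

    def reset(self, seed=None, options=None):
        # No randomness here, so seed/options are accepted but unused
        self.state = np.zeros(self.n_actions, dtype=np.int8)
        return self.state.copy(), {}

    def step(self, action):
        if not self.action_space.contains(action):
            raise ValueError(f"invalid action {action!r}")
        # Placeholder reward: 1 for a new action, 0 for repeating a finished one
        reward = 1.0 if self.state[action] == 0 else 0.0
        self.state[action] = 1  # flip this action's bit to "done"
        terminated = bool(self.state.all())
        # Return a copy so callers can't mutate the env's internal state
        return self.state.copy(), reward, terminated, False, {}


env = ActionTrackingEnv(n_actions=3)
obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(0)
obs, reward, terminated, truncated, info = env.step(2)
print(obs, reward, terminated)  # [1 0 1] 1.0 False
```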

Yes, MultiBinary is a solid choice, but there are a few more things to keep in mind. Make sure the observation vector is initialized in reset() and that state transitions are handled correctly in step(). In my experience with task scheduling, the tricky part was deciding whether an action should be permanently disabled once executed or stay available. If completed actions shouldn't be re-executed, add logic in step() that checks the current state before applying an action. Alternatively, you could use gym.spaces.Box with dtype=np.int8, low=0 and high=1, which behaves much like MultiBinary for 0/1 vectors. Note that MultiBinary also accepts a shape (for example MultiBinary([2, 3])) in current Gym versions, so multi-dimensional binary observations don't by themselves require Box.
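This is a quick sketch comparing the two options for a 2×3 grid of flags; the shape is purely illustrative. Both spaces accept the same int8 0/1 array and reject values outside {0, 1}. As above, the import falls back to gymnasium if gym isn't installed.

```python
import numpy as np

try:
    import gym
except ImportError:  # assumption: Gymnasium exposes the same spaces API
    import gymnasium as gym

SHAPE = (2, 3)  # illustrative shape, e.g. 2 groups of 3 actions

multi_binary = gym.spaces.MultiBinary(list(SHAPE))
box = gym.spaces.Box(low=0, high=1, shape=SHAPE, dtype=np.int8)

obs = np.array([[1, 0, 1],
                [0, 0, 1]], dtype=np.int8)

print(multi_binary.contains(obs), box.contains(obs))  # True True
print(multi_binary.shape, box.shape)                  # (2, 3) (2, 3)

# A value outside {0, 1} is rejected by both spaces
bad = np.array([[2, 0, 1],
                [0, 0, 1]], dtype=np.int8)
print(multi_binary.contains(bad), box.contains(bad))  # False False
```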

Heads up: make sure your reset() method sets the binary vector back to all zeros. I screwed up by not resetting properly and got weird carryover between episodes. Also think about whether you want the episode to end when all actions are finished or keep running.
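Both points from that reply fit into one small sketch. The CompletionTracker class and its end_when_complete flag are hypothetical names. In a real environment, the same logic would live in reset() and step().

```python
import numpy as np


class CompletionTracker:
    """Hypothetical helper holding the binary 'done' vector for one episode."""

    def __init__(self, n_actions, end_when_complete=True):
        self.n_actions = n_actions
        self.end_when_complete = end_when_complete
        self.done = np.zeros(n_actions, dtype=np.int8)

    def reset(self):
        # Allocate a fresh all-zeros vector so nothing carries over between episodes
        self.done = np.zeros(self.n_actions, dtype=np.int8)
        return self.done.copy()

    def mark(self, action):
        """Record an executed action and return (obs, terminated)."""
        self.done[action] = 1
        all_done = bool(self.done.all())
        # Either end the episode once everything is done, or keep it
        # running and rely on a step limit to end it instead
        terminated = all_done and self.end_when_complete
        return self.done.copy(), terminated


tracker = CompletionTracker(2)
tracker.reset()
tracker.mark(0)
obs, terminated = tracker.mark(1)
print(obs, terminated)   # [1 1] True
print(tracker.reset())   # [0 0]
```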

I ran into shape validation issues when I first used binary observation vectors. MultiBinary works great, but watch your numpy array shapes, especially if you're planning to expand later. I also mixed dtypes between the code that generated observations and what the space expected, which caused silent training failures that were a pain to debug. Keep your dtypes consistent across the entire environment. Memory tip: if you have lots of possible actions, use uint8 or bool arrays instead of default 64-bit ints; that's 1 byte per element instead of 8. I'd also recommend writing a quick helper method that converts the binary vector back to a list of completed actions, since it makes debugging much easier.
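Here is a sketch of the debugging helper suggested there, along with a fail-fast dtype/shape check and a quick look at the memory difference. The function names completed_actions and check_obs are hypothetical. The byte counts follow from NumPy's itemsize: 8 bytes for int64, 1 byte for uint8 and bool.

```python
import numpy as np


def completed_actions(obs):
    """Return the 0-based indices of actions marked 1 (or True) in obs."""
    return np.flatnonzero(obs).tolist()


def check_obs(obs, dtype, shape):
    """Raise immediately if an observation drifts from the expected dtype/shape."""
    if obs.dtype != np.dtype(dtype):
        raise TypeError(f"observation dtype {obs.dtype} != expected {np.dtype(dtype)}")
    if obs.shape != tuple(shape):
        raise ValueError(f"observation shape {obs.shape} != expected {tuple(shape)}")
    return obs


obs = check_obs(np.array([1, 0, 1], dtype=np.uint8), np.uint8, (3,))
print(completed_actions(obs))  # [0, 2]

# Memory for 10,000 action flags in different dtypes
n = 10_000
as_int64 = np.zeros(n, dtype=np.int64)
as_uint8 = np.zeros(n, dtype=np.uint8)
as_bool = np.zeros(n, dtype=bool)
print(as_int64.nbytes, as_uint8.nbytes, as_bool.nbytes)  # 80000 10000 10000
```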

Quick heads up on the reward structure with this setup. I built something similar for a multi-objective environment, and my agent got totally confused about which actions to prioritize. MultiBinary works great, as everyone said, but your reward function should take the current observation state into account. I had to adjust rewards based on which actions were already completed so the agent wouldn't keep attempting impossible sequences. Also check that each action is actually valid for the current binary state, especially if some actions require others to be completed first. With dependencies, this check becomes critical.
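This sketch shows state-aware validity and reward. The PREREQUISITES table, the reward values, and the choice to penalize invalid actions (rather than end the episode) are all assumptions for illustration. The mask marks an action as valid only if it hasn't been done yet and all of its prerequisites are already set to 1.

```python
import numpy as np

# Hypothetical dependency table: action -> actions that must be completed first
PREREQUISITES = {0: [], 1: [0], 2: [0, 1]}


def valid_action_mask(state):
    """1 where an action is still pending and all its prerequisites are done."""
    mask = np.zeros(len(state), dtype=np.int8)
    for action, prereqs in PREREQUISITES.items():
        pending = state[action] == 0
        ready = all(state[p] == 1 for p in prereqs)
        mask[action] = int(pending and ready)
    return mask


def reward_for(state, action):
    """Placeholder reward that depends on the current binary state."""
    if not valid_action_mask(state)[action]:
        return -1.0  # repeated action or missing prerequisite
    return 1.0


state = np.array([1, 0, 0], dtype=np.int8)  # only action 0 is done
print(valid_action_mask(state))  # [0 1 0]
print(reward_for(state, 2))      # -1.0 (action 1 isn't done yet)
print(reward_for(state, 1))      # 1.0
```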