Dynamic action space in OpenAI Gym environment based on current state

I’m building a custom environment with OpenAI Gym and training a DQN agent on it. My challenge is that the set of available actions changes depending on the environment’s current state.

Let me give you an example. My environment has 46 possible actions in total, but in certain states only 7 of them are actually valid. I can’t figure out how to implement this restriction properly.
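To make this concrete, here’s a stripped-down sketch of the kind of environment I mean (the numbers, the `_valid_actions` helper, and the `action_mask` key in `info` are just placeholders I made up for illustration):

```python
import numpy as np
import gym
from gym import spaces

class MaskedActionsEnv(gym.Env):
    """Toy environment where the set of valid actions depends on the state."""

    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(46)  # 46 actions in total
        self.observation_space = spaces.Box(low=0, high=1, shape=(10,), dtype=np.float32)
        self.state = np.zeros(10, dtype=np.float32)

    def _valid_actions(self):
        # Placeholder: in my real env this depends on self.state.
        # E.g. in one state only 7 of the 46 actions are legal.
        return [0, 3, 7, 12, 20, 33, 41]

    def step(self, action):
        assert action in self._valid_actions(), "agent chose an invalid action"
        # ... transition logic would go here ...
        reward, done = 0.0, False
        # Expose the mask so the agent can see which actions are legal.
        mask = np.zeros(46, dtype=bool)
        mask[self._valid_actions()] = True
        return self.state, reward, done, {"action_mask": mask}

    def reset(self):
        self.state = np.zeros(10, dtype=np.float32)
        return self.state
```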

I found some discussions about this topic but they didn’t solve my issue. The official Gym docs don’t seem to have clear guidance on handling dynamic action spaces either.

I’m confused about how the DQN agent actually selects actions in this scenario. Does it just pick randomly from all 46 actions? How does it know which ones are valid for the current state?

Has anyone dealt with this kind of state-dependent action space before? What approach did you use to make it work?

I had a similar issue! What worked for me was masking the Q-values of invalid actions: if an action is not valid in the current state, set its Q-value to a very large negative number (or `-inf`) before taking the argmax. That way the agent can never select an invalid action, and it learns to stick to valid moves pretty quickly.
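Roughly something like this (just a sketch, assuming your env exposes a boolean `action_mask` like in your example above, and that `q_net` is a PyTorch Q-network mapping an observation batch to 46 Q-values):

```python
import numpy as np
import torch

def select_action(q_net, obs, action_mask, epsilon=0.1):
    """Epsilon-greedy action selection restricted to valid actions."""
    valid = np.flatnonzero(action_mask)
    if np.random.rand() < epsilon:
        # Explore: sample uniformly among the valid actions only.
        return int(np.random.choice(valid))
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)).squeeze(0)
    # Mask out invalid actions so argmax can never pick them.
    masked_q = q_values.clone()
    masked_q[~torch.as_tensor(action_mask, dtype=torch.bool)] = -float("inf")
    return int(torch.argmax(masked_q))
```

One more thing: apply the same mask to the next state’s Q-values when you compute the TD target (take the max over valid next actions only), otherwise the bootstrap can still propagate value through actions the agent could never take.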