I’m building a custom environment with OpenAI Gym and training a DQN agent on it. My challenge is that the set of valid actions depends on which state the environment is in.
For example, my environment has 46 possible actions in total, but in certain states only 7 of them are actually valid. I can’t figure out how to implement this properly.
I found some discussions about this topic but they didn’t solve my issue. The official Gym docs don’t seem to have clear guidance on handling dynamic action spaces either.
I’m also confused about how the DQN agent actually selects actions in this scenario. Does it just pick from all 46 actions? How would it know which ones are valid in the current state?
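For reference, one approach I’ve come across is “invalid action masking”: have the environment report which actions are valid (e.g. via `info` or a helper method — that part is up to you), then mask the invalid actions out during epsilon-greedy selection. Here’s a rough sketch of what I think that would look like, assuming the Q-values come out of the network as a plain array and the environment can produce a list of valid action indices:

```python
import numpy as np

def select_action(q_values, valid_actions, epsilon=0.1, rng=None):
    """Epsilon-greedy action selection restricted to valid actions.

    q_values:      array of shape (n_actions,) from the DQN.
    valid_actions: indices of the actions that are valid in the current state.
    """
    rng = rng or np.random.default_rng()
    valid_actions = np.asarray(valid_actions)

    if rng.random() < epsilon:
        # Explore: sample uniformly among the valid actions only.
        return int(rng.choice(valid_actions))

    # Exploit: set invalid actions to -inf so argmax can never pick them.
    masked = np.full(len(q_values), -np.inf)
    masked[valid_actions] = q_values[valid_actions]
    return int(np.argmax(masked))
```

So with 46 total actions and 7 valid ones, the agent would only ever explore or exploit within those 7. I’m not sure whether this is the standard way to do it, or whether the mask should also be applied to the target Q-values during training.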
Has anyone dealt with this kind of state-dependent action space before? What approach did you use to make it work?