Converting OpenAI gym visual observations to RAM format for Atari games

I’m working with Atari environments in OpenAI gym and noticed they come in two versions. Take Pong for example - there’s Pong-v0 which gives visual frames and Pong-ram-v0 which returns RAM state as observations.

The RAM version outputs an array with 128 elements, while the regular version gives me a 210x160 pixel image. I want to train my agent using the RAM observations since they’re more compact, but then visualize the results using the image-based environment.

My main question: Is there a way to convert the visual observation (210x160 image array) into the equivalent RAM observation format (128-element array)?

Here’s what I’m trying to achieve:

import gym

env_visual = gym.make('Pong-v0')
env_ram = gym.make('Pong-ram-v0')

# Get visual observation
visual_obs = env_visual.reset()
print(visual_obs.shape)  # (210, 160, 3)

# Get RAM observation  
ram_obs = env_ram.reset()
print(ram_obs.shape)  # (128,)

# How to convert visual_obs to ram_obs format?
converted_obs = convert_visual_to_ram(visual_obs)

I want to train my model on the RAM data but show the gameplay using the visual environment. Any ideas on how to bridge these two observation formats?

Honestly, I think you've got this backwards. The RAM data is Atari's internal state - the visuals are generated from it, not the other way around. Why not just train on the visual frames directly? Try frame stacking or some preprocessing instead.
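To make the preprocessing suggestion concrete, here's a rough sketch: grayscale, downsample, and stack the last few frames so the agent can see motion. `preprocess` and `FrameStack` are illustrative names I made up, not anything in gym:

```python
from collections import deque
import numpy as np

def preprocess(frame):
    """Crude grayscale + 2x downsample of a 210x160x3 Atari frame."""
    gray = frame.mean(axis=2).astype(np.uint8)  # average RGB channels
    return gray[::2, ::2]                       # subsample to 105x80

class FrameStack:
    """Keep the last k preprocessed frames as one observation."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # Fill the stack with copies of the first frame
        f = preprocess(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(f)
        return np.stack(self.frames)

    def step(self, frame):
        self.frames.append(preprocess(frame))
        return np.stack(self.frames)
```

Feed the raw frames from `env.reset()` / `env.step()` through this and the agent sees a (4, 105, 80) uint8 array instead of full-resolution RGB, which is a fraction of the raw data and is basically the standard DQN-style setup.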

Nope, that conversion won't work. The 128 bytes of RAM hold the game's full internal state - object positions, velocities, timers, counters, everything the system needs to render a frame. You can't reliably recover that from pixels, because some of the state (timers, RNG values, off-screen variables) never appears on screen, so many different RAM configurations map to nearly identical visuals.

Here's what you should do instead: run both environments at the same time. Train your agent on the RAM version, but keep a separate visual environment for when you want to see what's happening. Seed both identically, feed them the same action sequence, and they'll stay in sync. You get compact training data plus visual feedback whenever you need it.