Deep RL agent struggling to master Lunar Lander game

ZoeStar42 · May 13, 2025, 8:56am

Hey everyone. I’m working on a deep reinforcement learning project using Keras to train an agent on the Lunar Lander environment from OpenAI Gym, but I’m running into a problem: my model just won’t converge no matter what I do.

I’ve set up a neural network with two dense layers and I’m using the Adam optimizer. The agent explores randomly at first then uses an epsilon-greedy strategy to choose actions. I’m storing experiences in a replay memory and training the network every 10 episodes.
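
For anyone following along, the replay memory and epsilon-greedy parts of a setup like this might look roughly like the sketch below. It is not the poster's actual code: the names `ReplayMemory`, `epsilon_greedy`, and `decay_epsilon`, along with the decay rate and floor, are assumptions chosen for illustration, and the Q-values are plain lists rather than Keras model outputs.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up correlated steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decay_epsilon(epsilon, rate=0.995, floor=0.01):
    """Multiplicative decay with a lower bound (hypothetical defaults)."""
    return max(floor, epsilon * rate)
```

In a real agent, `q_values` would come from the network's prediction for the current state, and `sample` would feed the training batch.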

The weird thing is, for the first 5000 episodes or so, the average reward goes up and the policy seems to be improving. But then it suddenly tanks and performance gets even worse than at the start.

I’ve tried tweaking the learning rate, discount factor, epsilon decay, and other hyperparameters but nothing seems to help. The model always ends up diverging eventually.

Has anyone encountered this issue before or have ideas on how to stabilize the training? I’m trying to follow the approach from the DeepMind DQN paper but clearly something isn’t working. Any suggestions would be really appreciated!

Jack81 · May 23, 2025, 2:55am

I encountered similar issues when tackling Lunar Lander. One effective strategy I found was implementing gradient clipping to prevent exploding gradients. Additionally, consider adjusting your reward structure. I achieved better results by providing smaller intermediate rewards for maintaining stability and fuel efficiency, rather than relying solely on the final landing reward. This approach helped guide the agent towards more consistent behavior throughout the episode. Lastly, experimenting with different network architectures, such as wider or deeper dense layers, might improve your model’s performance. Since the Lunar Lander state is a small 8-value vector rather than an image, convolutional layers are unlikely to help here.
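
To make the gradient clipping suggestion concrete, here is a minimal sketch of clipping by global norm, written in plain Python so the arithmetic is visible. The function name `clip_by_global_norm` and the list-of-lists gradient layout are assumptions for illustration. In Keras itself you would normally not write this by hand, since the optimizers accept `clipnorm` and `clipvalue` arguments (for example `Adam(clipnorm=1.0)`), though the exact norm those options use may differ from this sketch.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient vectors so their joint L2 norm is at most max_norm.

    grads is a list of lists of floats (one inner list per parameter tensor).
    Returns the clipped gradients and the norm measured before clipping.
    """
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        # Already within bounds: return an unchanged copy.
        return [list(vec) for vec in grads], total
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads], total


# Example: a gradient with norm 5 is scaled down to norm 1.
clipped, norm_before = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
```

Scaling all gradients by the same factor keeps the update direction and only limits its size, which is why it tends to be gentler than clipping each value independently.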

avamtz · May 19, 2025, 4:17pm

hey, i’ve seen this before. try adding a target network that updates less frequently than your main network. it helps stabilize training. also, increase your replay buffer size and maybe use prioritized experience replay. those tweaks helped me when i was stuck on lunar lander. good luck!
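
A rough sketch of the target network idea, assuming a hard update every fixed number of training steps. The names `td_target` and `TargetSync`, the `period` parameter, and the list-based weights are made up for illustration. With Keras models the copy step would typically be `target_model.set_weights(model.get_weights())`.

```python
def td_target(reward, next_q_target, done, gamma=0.99):
    """Bellman target built from the target network's Q-values."""
    if done:
        # Terminal transition: no bootstrapping from the next state.
        return reward
    return reward + gamma * max(next_q_target)


class TargetSync:
    """Copies online weights into the target weights every `period` steps."""

    def __init__(self, online_weights, period):
        self.period = period
        self.steps = 0
        # Start with the target as an independent copy of the online weights.
        self.target_weights = [list(w) for w in online_weights]

    def step(self, online_weights):
        """Call once per training step; returns True when a sync happened."""
        self.steps += 1
        if self.steps % self.period == 0:
            self.target_weights = [list(w) for w in online_weights]
            return True
        return False
```

The point is that the targets stay fixed between syncs, so the network is not chasing a value that moves with every gradient step.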