I’m encountering an error while trying to run a policy gradient reinforcement learning script using TensorFlow and OpenAI Gym. The error arises during the training process when the environment attempts to render.
Here’s the initial setup code I have:
import tensorflow as tf
import gym
import numpy as np
import matplotlib.pyplot as plt
import os
env = gym.make('CartPole-v1')
This is my neural network implementation:
class PolicyNetwork:
    def __init__(self, action_count, observation_size):
        init = tf.contrib.layers.xavier_initializer()

        # Policy: two hidden layers, then one logit per action
        self.state_input = tf.placeholder(dtype=tf.float32, shape=[None, observation_size])
        layer1 = tf.layers.dense(self.state_input, 16, activation=tf.nn.relu, kernel_initializer=init)
        layer2 = tf.layers.dense(layer1, 16, activation=tf.nn.relu, kernel_initializer=init)
        logits = tf.layers.dense(layer2, action_count, activation=None)
        self.action_probs = tf.nn.softmax(logits)
        self.predicted_action = tf.argmax(self.action_probs, axis=1)

        # Loss: cross-entropy of the taken actions, weighted by the processed rewards
        self.reward_input = tf.placeholder(shape=[None, ], dtype=tf.float32)
        self.action_input = tf.placeholder(shape=[None, ], dtype=tf.int32)
        action_onehot = tf.one_hot(self.action_input, action_count)
        cross_ent = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=action_onehot)
        self.policy_loss = tf.reduce_mean(cross_ent * self.reward_input)

        # Gradients are computed here and fed back in through placeholders
        self.grads = tf.gradients(self.policy_loss, tf.trainable_variables())
        self.grad_placeholders = []
        for idx, var in enumerate(tf.trainable_variables()):
            grad_ph = tf.placeholder(tf.float32)
            self.grad_placeholders.append(grad_ph)

        opt = tf.train.AdamOptimizer(learning_rate=1e-2)
        self.apply_grads = opt.apply_gradients(zip(self.grad_placeholders, tf.trainable_variables()))
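As a sanity check on the loss above: with one-hot labels, softmax cross-entropy reduces to the negative log-probability of the action that was taken, so weighting it by the processed reward gives the usual policy gradient surrogate. Here is a minimal pure-Python sketch of that identity (no TensorFlow involved; the helper names softmax and cross_entropy and the logits are made up for illustration):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, action, action_count):
    """Cross-entropy between softmax(logits) and a one-hot label for `action`."""
    probs = softmax(logits)
    onehot = [1.0 if i == action else 0.0 for i in range(action_count)]
    return -sum(label * math.log(p) for label, p in zip(onehot, probs))

logits = [1.0, 2.0]  # made-up logits for CartPole's two actions
action = 1           # the action that was sampled
loss = cross_entropy(logits, action, action_count=2)
neg_log_prob = -math.log(softmax(logits)[action])
print(loss, neg_log_prob)  # the two values agree
```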
I also created a reward processing function:
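gamma = 0.95

def process_rewards(reward_list):
    # Use a float array: with integer rewards, zeros_like would give an
    # integer array and the in-place normalization below would fail
    processed = np.zeros_like(reward_list, dtype=np.float64)
    running_total = 0
    # Walk backwards so each step accumulates the discounted future reward
    for idx in reversed(range(len(reward_list))):
        running_total = running_total * gamma + reward_list[idx]
        processed[idx] = running_total
    # Normalize to zero mean and unit standard deviation
    processed -= np.mean(processed)
    processed /= np.std(processed)
    return processed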
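To make clear what this function is meant to compute, here is a small worked example of the same discounting and normalization in plain Python, so the arithmetic can be checked by hand without NumPy. The names discounted_returns and normalize are only for this illustration and are not part of my script:

```python
import math

def discounted_returns(rewards, gamma):
    """Return the discounted sum of future rewards for every time step."""
    returns = [0.0] * len(rewards)
    running_total = 0.0
    for idx in reversed(range(len(rewards))):
        running_total = running_total * gamma + rewards[idx]
        returns[idx] = running_total
    return returns

def normalize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

returns = discounted_returns([1, 1, 1], gamma=0.95)
print(returns)             # approximately [2.8525, 1.95, 1.0]
print(normalize(returns))  # zero mean, unit standard deviation
```

The last step is worth 1, the middle step 1 + 0.95 * 1 = 1.95, and the first step 1 + 0.95 * 1.95 = 2.8525.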
The issue arises in my training loop when I call env.render(). The complete error message is AttributeError: 'NoneType' object has no attribute 'set_current'. Gym renders through pyglet, so this suggests that the pyglet window has no valid OpenGL context at the moment render is called.
I’m using Windows with TensorFlow 1.4.0, OpenAI Gym 0.10.5, and Python 3.6.5. Has anyone else faced this rendering issue? How might I resolve the context error, or would it be better to skip rendering calls during training?
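For reference, this is the kind of workaround I have in mind for skipping render calls: a minimal sketch that renders only every few episodes and turns rendering off after the first failure so training can continue. It does not fix the underlying context problem. GuardedRenderer and BrokenEnv are hypothetical names I made up here; BrokenEnv only stands in for the real Gym environment so the snippet runs on its own.

```python
class GuardedRenderer:
    """Hypothetical wrapper: render every `every` episodes, stop after a failure."""

    def __init__(self, env, every=50):
        self.env = env
        self.every = every
        self.enabled = True

    def maybe_render(self, episode):
        """Call env.render() only on selected episodes; return True if drawn."""
        if not self.enabled or episode % self.every != 0:
            return False
        try:
            self.env.render()
            return True
        except AttributeError as err:
            # Same exception type as the pyglet failure; disable and keep training
            print("render failed, disabling rendering:", err)
            self.enabled = False
            return False


class BrokenEnv:
    """Stand-in for a Gym env whose render() fails the way mine does."""

    def __init__(self):
        self.render_calls = 0

    def render(self):
        self.render_calls += 1
        raise AttributeError("'NoneType' object has no attribute 'set_current'")


env = BrokenEnv()
renderer = GuardedRenderer(env, every=2)
drawn = [renderer.maybe_render(episode) for episode in range(6)]
print(drawn, env.render_calls)  # render is attempted once, then skipped
```

In the real loop I would call renderer.maybe_render(episode) in place of env.render().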