I’m working on implementing a SAC algorithm from the ground up and want to train it on the BipedalWalker environment. For my neural network to work better, I need both actions and observations to be normalized in the range [0, 1].
I successfully found the RescaleAction wrapper for handling actions, but I’m struggling with the NormalizeObservation wrapper. Can I apply it during environment initialization like this?
import gymnasium as gym
from gymnasium.wrappers import RescaleAction, NormalizeObservation

original_env = gym.make("BipedalWalker-v3", render_mode='rgb_array')
wrapped_env = RescaleAction(original_env, min_action=0, max_action=1)
final_env = NormalizeObservation(wrapped_env)
Will this setup automatically normalize all subsequent observations returned by the environment? I’m confused about when exactly the normalization gets applied and whether this approach is correct for my use case.
Your approach is mostly right: once the environment is wrapped, NormalizeObservation automatically transforms every observation returned by reset() and step(). The catch is that this wrapper does standardization (zero mean, unit variance), not [0, 1] scaling; it keeps a running estimate of the mean and standard deviation of the observations it has seen. If you specifically need the [0, 1] range, you'll want a custom wrapper or extra preprocessing on top of the normalization, for example a sigmoid squash or a min-max scaler wrapper. For SAC I've found plain standardization works fine. One thing to keep in mind: the normalization statistics keep updating during training, which is usually good for learning but means the transformation itself changes over time.
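If it helps, here is a minimal sketch of that "extra preprocessing" idea, assuming the gymnasium package; SigmoidObservation is just an illustrative name, not part of the library:

import numpy as np
import gymnasium as gym
from gymnasium.wrappers import RescaleAction, NormalizeObservation

class SigmoidObservation(gym.ObservationWrapper):
    # Squashes the (roughly zero-mean, unit-variance) output of
    # NormalizeObservation into (0, 1) with an element-wise sigmoid.
    def __init__(self, env):
        super().__init__(env)
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=env.observation_space.shape, dtype=np.float32
        )

    def observation(self, obs):
        return (1.0 / (1.0 + np.exp(-obs))).astype(np.float32)

env = gym.make("BipedalWalker-v3", render_mode='rgb_array')
env = RescaleAction(env, min_action=0, max_action=1)
env = NormalizeObservation(env)   # running z-score normalization
env = SigmoidObservation(env)     # extra step so values land in (0, 1)

The sigmoid is monotonic, so it keeps the ordering of the standardized values while forcing them into (0, 1); whether that distortion matters for your SAC critic is something you'd have to test.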
Your wrapper chain will work, but a heads-up about NormalizeObservation: it doesn't actually normalize to [0, 1]. It does z-score normalization, centering observations around zero with unit variance, so you'll see values roughly in the [-3, 3] range for approximately normal observations. If you want a true [0, 1] range, you'll need a custom wrapper that tracks per-dimension min/max values and applies (obs - min) / (max - min) scaling. That said, z-score normalization often works better for RL, since a few extreme observations don't compress the rest of the range the way they do with min-max scaling. One thing to watch out for: NormalizeObservation uses running statistics, so the normalization keeps changing as it collects more data, which can make early training unstable before the statistics settle. If you run into issues, try a warm-up period of random steps or initialize the wrapper with pre-computed statistics.
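Here's a rough sketch of that kind of min/max wrapper, again assuming gymnasium; MinMaxObservation is a made-up name and the running min/max update is one possible design, not something the library provides:

import numpy as np
import gymnasium as gym
from gymnasium.wrappers import RescaleAction

class MinMaxObservation(gym.ObservationWrapper):
    # Scales each observation dimension to [0, 1] using the running
    # min/max seen so far; like the running-statistics wrapper, the
    # transformation keeps shifting until those estimates settle.
    def __init__(self, env, eps=1e-8):
        super().__init__(env)
        shape = env.observation_space.shape
        self.obs_min = np.full(shape, np.inf, dtype=np.float64)
        self.obs_max = np.full(shape, -np.inf, dtype=np.float64)
        self.eps = eps  # avoids division by zero before a range is known
        self.observation_space = gym.spaces.Box(0.0, 1.0, shape=shape, dtype=np.float32)

    def observation(self, obs):
        self.obs_min = np.minimum(self.obs_min, obs)
        self.obs_max = np.maximum(self.obs_max, obs)
        scaled = (obs - self.obs_min) / (self.obs_max - self.obs_min + self.eps)
        return scaled.astype(np.float32)

env = gym.make("BipedalWalker-v3", render_mode='rgb_array')
env = RescaleAction(env, min_action=0, max_action=1)
env = MinMaxObservation(env)

obs, _ = env.reset(seed=0)
for _ in range(1000):  # warm-up with random actions so the min/max estimates settle
    obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, _ = env.reset()

For the pre-computed-statistics route, check how whichever wrapper you end up using stores its statistics, so you can overwrite them after a warm-up phase instead of re-estimating from scratch every run.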