BipedalWalkerHardcore-v3 deep-reinforcement-learning reinforcement-learning stable-baselines3

Hyperparameters used to create the A2C model:

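```python
import gymnasium as gym
from stable_baselines3 import A2C

env = gym.make("BipedalWalkerHardcore-v3")  # requires the Box2D extra: gymnasium[box2d]
model = A2C(
    policy="MlpPolicy",   # fully connected actor-critic network
    env=env,
    n_steps=256,          # environment steps collected per update (per env copy)
    learning_rate=0.001,
    gamma=0.99,           # discount factor
    verbose=1,
)
```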
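`n_steps` and `gamma` work together in A2C. Each update uses a rollout of `n_steps` transitions, and the critic is trained toward discounted returns that are bootstrapped from the value estimate at the end of the rollout. The following is a simplified, pure-Python sketch of that target computation. `bootstrapped_returns` is a hypothetical helper for illustration and is not part of Stable-Baselines3. SB3 computes these targets in its own rollout buffer, which also handles details such as time-limit truncation.

```python
from typing import List


def bootstrapped_returns(
    rewards: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
) -> List[float]:
    """Discounted returns for one rollout (e.g. len(rewards) == n_steps == 256).

    Hypothetical helper, not an SB3 API.
    dones[t] is True if the episode terminated at step t, so the return does
    not bootstrap past that step. last_value is the critic's estimate V(s_T)
    of the state that follows the final step of the rollout.
    """
    returns = [0.0] * len(rewards)
    running = last_value  # bootstrap from the critic at the rollout boundary
    # Walk backwards: G_t = r_t + gamma * G_{t+1}, reset at episode ends.
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # terminal step: nothing to bootstrap from
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Three-step rollout with no terminal step, bootstrapped from V(s_T) = 10.
    print(bootstrapped_returns([1.0, 1.0, 1.0], [False, False, False], 10.0, gamma=0.5))
    # prints [3.0, 4.0, 6.0]
```

With `gamma=0.99`, a reward k steps ahead is weighted by 0.99**k (about 0.37 at k = 100). The effective horizon is therefore roughly 1 / (1 - gamma) = 100 steps, which is shorter than one 256-step rollout.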