SAC Agent playing BipedalWalker-v3
This is a trained model of a SAC agent playing BipedalWalker-v3 using the stable-baselines3 library.
Usage (with Stable-Baselines3)
TODO: Add your code
from stable_baselines3 import ...
from huggingface_sb3 import load_from_hub
...
Well, he does OK but still gets stuck on the rocks. Here are my hyperparameters, not that they did me much good 😂:
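# Assumes a recent Stable-Baselines3 release built on Gymnasium;
# older releases use `import gym` instead.
import gymnasium as gym
from stable_baselines3 import SAC


def linear_schedule(initial_value, final_value=0.00001):
    """Linearly anneal the learning rate from initial_value to final_value."""
    def func(progress_remaining):
        """Progress will decrease from 1 (beginning) to 0 (end)"""
        return final_value + (initial_value - final_value) * progress_remaining
    return func


# Assumption: a single, unwrapped environment; the original env setup is not shown.
env = gym.make("BipedalWalker-v3")

initial_learning_rate = 7.3e-4
model = SAC(
    policy='MlpPolicy',
    env=env,
    learning_rate=linear_schedule(initial_learning_rate),  # decays to 1e-5 by the end of training
    buffer_size=1000000,
    batch_size=256,
    ent_coef=0.005,  # fixed entropy coefficient (not 'auto')
    gamma=0.99,
    tau=0.01,
    train_freq=1,
    gradient_steps=1,
    learning_starts=10000,
    policy_kwargs=dict(net_arch=[400, 300]),
    verbose=1
)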
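To get a feel for what the linear schedule does to the learning rate, here is a small standalone sketch that needs no RL libraries. It re-declares the schedule and evaluates it at a few points in training. The `total_timesteps` budget and the `lr_at_step` helper are made up for illustration (the training length is not recorded here), and the conversion from step count to `progress_remaining` is my understanding of how Stable-Baselines3 drives a callable learning rate, so treat it as an approximation rather than the library's exact code.

```python
def linear_schedule(initial_value, final_value=0.00001):
    """Return a function mapping progress_remaining (1 -> 0) to a learning rate."""
    def func(progress_remaining):
        return final_value + (initial_value - final_value) * progress_remaining
    return func


schedule = linear_schedule(7.3e-4)

# Hypothetical training budget, chosen only for illustration.
total_timesteps = 1_000_000


def lr_at_step(step):
    """Learning rate after `step` environment steps (illustrative helper)."""
    progress_remaining = 1.0 - step / total_timesteps
    return schedule(progress_remaining)


# Print the learning rate at the start, at each quarter, and at the end.
for step in (0, 250_000, 500_000, 750_000, 1_000_000):
    print(f"step {step:>9,}: lr = {lr_at_step(step):.6f}")
```

The rate starts at 7.3e-4 and falls in a straight line to the 1e-5 floor, so updates late in training are far smaller than those at the start.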
These are pretty well tuned, but SAC still explores too much here, and the agent is unable to exploit the actions it needs to complete the course. I suspect TD3 will be more successful, so I plan to turn back to that.