
# PPO Agent playing SnowballTarget
This is a trained model of a PPO agent playing SnowballTarget, trained with the Unity ML-Agents Library.
## Usage (with ML-Agents)
Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/
## Watch the Agent play
You can watch the agent play directly in your browser:
- Go to https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget
- Find the model_id: `Francesco-A/ppo-SnowballTarget-v1`
- Select the `.nn` or `.onnx` file
- Click on **Watch the agent play**
## Training hyperparameters
```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 55000
    max_steps: 250000
    time_horizon: 64
    threaded: true
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
```
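The configuration above leaves a few derived quantities implicit. As a minimal sketch, assuming the documented ML-Agents behaviour that `learning_rate_schedule: linear` decays the learning rate linearly to 0 at `max_steps`, and that each PPO update makes `num_epoch` passes over the buffer in minibatches of `batch_size`, the Python below computes the learning rate at a given step and the number of gradient steps taken per full buffer. The helpers `linear_lr` and `updates_per_buffer` are hypothetical names introduced here, not part of ML-Agents.

```python
# Values copied from the training configuration of this model.
LEARNING_RATE = 0.0003
MAX_STEPS = 250_000
BUFFER_SIZE = 2048
BATCH_SIZE = 128
NUM_EPOCH = 3


def linear_lr(step: int, initial: float = LEARNING_RATE, max_steps: int = MAX_STEPS) -> float:
    """Learning rate under a linear schedule that reaches 0 at max_steps.

    Assumption: this follows the documented behaviour of
    `learning_rate_schedule: linear`; ML-Agents' internal code may stop at a
    tiny minimum value rather than exactly 0.
    """
    progress = min(step, max_steps) / max_steps  # fraction of training completed
    return initial * (1.0 - progress)


def updates_per_buffer(
    buffer_size: int = BUFFER_SIZE,
    batch_size: int = BATCH_SIZE,
    num_epoch: int = NUM_EPOCH,
) -> int:
    """Approximate number of minibatch gradient steps per full buffer."""
    minibatches = buffer_size // batch_size  # minibatches in one pass over the buffer
    return minibatches * num_epoch  # repeated for every epoch


if __name__ == "__main__":
    for step in (0, 125_000, 250_000):
        print(f"step {step:>7}: lr = {linear_lr(step):.6f}")
    print("gradient steps per full buffer:", updates_per_buffer())
```

With these settings the learning rate halves to 0.00015 by step 125,000, and each buffer of 2,048 experiences is split into 16 minibatches over 3 epochs, giving 48 gradient steps per update.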
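## Training details
Mean and standard deviation of the reward as logged by ML-Agents every 10,000 steps (`summary_freq`):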
| Step | Time Elapsed | Mean Reward | Std of Reward | Status |
|---|---|---|---|---|
| 10000 | 29.079 s | 3.636 | 1.746 | Training |
| 20000 | 55.042 s | 7.164 | 2.661 | Training |
| 30000 | 77.884 s | 9.818 | 2.534 | Training |
| 40000 | 103.229 s | 11.509 | 2.263 | Training |
| 50000 | 127.046 s | 14.659 | 2.495 | Training |
| 60000 | 150.811 s | 15.655 | 2.414 | Training |
| 70000 | 174.292 s | 16.955 | 2.540 | Training |
| 80000 | 198.938 s | 18.091 | 2.481 | Training |
| 90000 | 221.915 s | 19.182 | 3.143 | Training |
| 100000 | 246.203 s | 21.182 | 2.724 | Training |
| 110000 | 271.024 s | 22.463 | 2.250 | Training |
| 120000 | 292.551 s | 24.044 | 2.190 | Training |
| 130000 | 317.539 s | 24.291 | 2.103 | Training |
| 140000 | 340.057 s | 24.455 | 4.423 | Training |
| 150000 | 366.645 s | 25.236 | 2.358 | Training |
| 160000 | 390.192 s | 25.000 | 1.895 | Training |
| 170000 | 414.326 s | 25.273 | 2.482 | Training |
| 180000 | 438.103 s | 25.750 | 1.798 | Training |
| 190000 | 462.837 s | 25.673 | 1.888 | Training |
| 200000 | 485.258 s | 25.295 | 2.380 | Training |
| 210000 | 509.542 s | 25.855 | 2.066 | Training |
| 220000 | 535.202 s | 26.111 | 1.931 | Training |
| 230000 | 556.965 s | 25.644 | 2.252 | Training |
| 240000 | 582.135 s | 26.018 | 2.673 | Training |
| 250000 | 604.248 s | 26.091 | 1.917 | Training |
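The table can also be summarized numerically. The sketch below copies a subset of its rows verbatim (every 50,000 steps) and computes the average training throughput and the change in mean reward between consecutive rows. `ROWS`, `throughput` and `reward_gains` are hypothetical names introduced for illustration only.

```python
# Subset of rows copied from the training-details table:
# (step, time elapsed in seconds, mean reward).
ROWS = [
    (50_000, 127.046, 14.659),
    (100_000, 246.203, 21.182),
    (150_000, 366.645, 25.236),
    (200_000, 485.258, 25.295),
    (250_000, 604.248, 26.091),
]


def throughput(rows):
    """Average steps per second, as counted by ML-Agents, up to the last row."""
    step, seconds, _ = rows[-1]
    return step / seconds


def reward_gains(rows):
    """Change in mean reward between consecutive rows, keyed by the later step."""
    return [
        (cur[0], round(cur[2] - prev[2], 3))
        for prev, cur in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    print(f"average throughput: {throughput(ROWS):.1f} steps/s")
    for step, gain in reward_gains(ROWS):
        print(f"up to step {step:>7}: mean reward changed by {gain:+.3f}")
```

Training ran at roughly 414 steps per second. Mean reward rises by about 6.5 between 50,000 and 100,000 steps, but by less than 1 per 50,000 steps after 150,000, consistent with the plateau around 25 to 26 visible in the table.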