PPO Agent playing SnowballTarget

This is a trained model of a PPO agent playing SnowballTarget, trained with the Unity ML-Agents library.

Usage (with ML-Agents)

Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

Watch the Agent play

You can watch the agent play directly in your browser:

  1. Go to https://huggingface.co/spaces/ThomasSimonini/ML-Agents-SnowballTarget
  2. Under "Step 1", enter the model_id: Francesco-A/ppo-SnowballTarget-v1
  3. Under "Step 2", select the .nn/.onnx file
  4. Click "Watch the agent play"
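
To check which files Step 2 will offer, the repository can also be inspected from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and network access is available; `pick_model_files` and `list_model_files` are hypothetical helper names introduced here, not part of ML-Agents.

```python
from typing import Iterable, List

REPO_ID = "Francesco-A/ppo-SnowballTarget-v1"  # model_id from the steps above
MODEL_SUFFIXES = (".onnx", ".nn")  # file types the Space asks for in Step 2


def pick_model_files(filenames: Iterable[str]) -> List[str]:
    """Return only the filenames that look like exported policy files."""
    return sorted(f for f in filenames if f.lower().endswith(MODEL_SUFFIXES))


def list_model_files(repo_id: str = REPO_ID) -> List[str]:
    """List .onnx/.nn files in a Hub repository (needs network access)."""
    # Imported lazily so the pure helper above works without the package.
    from huggingface_hub import list_repo_files

    return pick_model_files(list_repo_files(repo_id))


if __name__ == "__main__":
    # Made-up filenames for illustration, so no network call is needed.
    print(pick_model_files(["README.md", "Example.onnx", "config.json"]))
```

Calling `list_model_files()` should return the candidate policy files for this model, but the exact filenames depend on what was pushed to the repository.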

Training hyperparameters

behaviors:
  SnowballTarget:
    trainer_type: ppo
    summary_freq: 10000
    keep_checkpoints: 10
    checkpoint_interval: 55000
    max_steps: 250000
    time_horizon: 64
    threaded: true
    hyperparameters:
      learning_rate: 0.0003
      learning_rate_schedule: linear
      batch_size: 128
      buffer_size: 2048
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 256
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
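
A few quantities follow directly from this configuration. The sketch below works them out in plain Python; it is a back-of-the-envelope reading, not ML-Agents code. It assumes each epoch sweeps the buffer once in `batch_size` chunks, that a checkpoint is written every `checkpoint_interval` steps, and that the `linear` schedule decays the learning rate towards 0 at `max_steps`. The function `linear_lr` is a hypothetical helper.

```python
# Values copied from the configuration above.
max_steps = 250_000
checkpoint_interval = 55_000
keep_checkpoints = 10
batch_size = 128
buffer_size = 2_048
num_epoch = 3
learning_rate = 3e-4
gamma = 0.99

# Minibatches per pass over the buffer, and gradient steps per policy update
# (assumption: every epoch covers the whole buffer once).
minibatches_per_epoch = buffer_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * num_epoch

# Interval checkpoints that fit in the run (assumption: one per
# checkpoint_interval steps, not counting any final export).
interval_checkpoints = max_steps // checkpoint_interval


def linear_lr(step: int) -> float:
    """Assumed linear decay from learning_rate at step 0 towards 0 at max_steps."""
    fraction_left = max(0.0, 1.0 - step / max_steps)
    return learning_rate * fraction_left


# Rough number of future steps that still matter under this discount factor.
effective_horizon = 1.0 / (1.0 - gamma)

if __name__ == "__main__":
    print("minibatches per epoch:", minibatches_per_epoch)
    print("gradient steps per update:", gradient_steps_per_update)
    print("interval checkpoints:", interval_checkpoints, "of", keep_checkpoints, "kept")
    print("learning rate at the halfway point:", linear_lr(max_steps // 2))
    print("effective horizon (steps):", round(effective_horizon))
```

Under these assumptions, the number of interval checkpoints stays below `keep_checkpoints`, so none of them would be pruned during the run.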

Training details

Step    Time Elapsed (s)  Mean Reward  Std of Reward  Status
10000   29.079            3.636        1.746          Training
20000   55.042            7.164        2.661          Training
30000   77.884            9.818        2.534          Training
40000   103.229           11.509       2.263          Training
50000   127.046           14.659       2.495          Training
60000   150.811           15.655       2.414          Training
70000   174.292           16.955       2.540          Training
80000   198.938           18.091       2.481          Training
90000   221.915           19.182       3.143          Training
100000  246.203           21.182       2.724          Training
110000  271.024           22.463       2.250          Training
120000  292.551           24.044       2.190          Training
130000  317.539           24.291       2.103          Training
140000  340.057           24.455       4.423          Training
150000  366.645           25.236       2.358          Training
160000  390.192           25.000       1.895          Training
170000  414.326           25.273       2.482          Training
180000  438.103           25.750       1.798          Training
190000  462.837           25.673       1.888          Training
200000  485.258           25.295       2.380          Training
210000  509.542           25.855       2.066          Training
220000  535.202           26.111       1.931          Training
230000  556.965           25.644       2.252          Training
240000  582.135           26.018       2.673          Training
250000  604.248           26.091       1.917          Training
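
The log can be summarized with a few lines of Python. This is a small sketch that uses six rows copied from the table above; `steps_per_second` and `reward_gains` are hypothetical helper names, and the subset of rows is chosen only to keep the example short.

```python
from typing import List, Tuple

Row = Tuple[int, float, float]  # (step, seconds elapsed, mean reward)

# Rows copied from the training log above, one every 50,000 steps
# (plus the first logged row).
rows: List[Row] = [
    (10_000, 29.079, 3.636),
    (50_000, 127.046, 14.659),
    (100_000, 246.203, 21.182),
    (150_000, 366.645, 25.236),
    (200_000, 485.258, 25.295),
    (250_000, 604.248, 26.091),
]


def steps_per_second(log: List[Row]) -> float:
    """Average throughput over the whole run, from the last logged row."""
    last_step, last_time, _ = log[-1]
    return last_step / last_time


def reward_gains(log: List[Row]) -> List[float]:
    """Change in mean reward between consecutive sampled rows."""
    return [round(b[2] - a[2], 3) for a, b in zip(log, log[1:])]


if __name__ == "__main__":
    print(f"throughput: {steps_per_second(rows):.1f} steps/s")
    print("reward gains between sampled rows:", reward_gains(rows))
```

The gains shrink sharply after about 150,000 steps, which matches the table: the mean reward levels off at roughly 25 to 26 for the rest of the run.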