Tags: deep-reinforcement-learning, reinforcement-learning, sample-factory

An APPO model trained on the atari_pong environment.

This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory. Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
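If Sample-Factory is not already installed, it can typically be installed from PyPI; the Atari extra shown below is a sketch based on the standard Sample-Factory packaging and pulls in the Atari environment dependencies:

pip install sample-factory[atari]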

Downloading the model

After installing Sample-Factory, download the model with:

python -m sample_factory.huggingface.load_from_hub -r MattStammers/appo-atari_pong

Using the model

To run the model after download, use the enjoy script corresponding to this environment:

python -m sf_examples.atari.enjoy_atari --algo=APPO --env=atari_pong --train_dir=./train_dir --experiment=appo-atari_pong

You can also upload models to the Hugging Face Hub using the same script with the --push_to_hub flag. See https://www.samplefactory.dev/10-huggingface/huggingface/ for more details.
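For example, a push for this experiment might look like the following (a sketch only: <your_hf_username> is a placeholder, and the flag names follow the Sample-Factory Hugging Face integration docs linked above):

python -m sf_examples.atari.enjoy_atari --algo=APPO --env=atari_pong --train_dir=./train_dir --experiment=appo-atari_pong --push_to_hub --hf_repository=<your_hf_username>/appo-atari_pong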

Training with this model

To continue training with this model, use the train script corresponding to this environment:

python -m sf_examples.atari.train_atari --algo=APPO --env=atari_pong --train_dir=./train_dir --experiment=appo-atari_pong --restart_behavior=resume --train_for_env_steps=10000000000

Note that you may have to adjust --train_for_env_steps to a suitably high number, as the experiment will resume from the step count at which it previously concluded. For example, if the original run finished at 10 million environment steps, --train_for_env_steps must be set above 10 million for any further training to take place.

SOTA Performance

The Pong agents trained for 10 million timesteps are all fairly similar in performance, but none reach the level of my 'perfect' Pong agent trained using the baselines PPO implementation: https://huggingface.co/MattStammers/ppo-PongNoFrameskip-v4-final

That agent was trained for 50 million timesteps. Performance at 10 million timesteps is otherwise similar, and this agent would probably reach the same level with another 40 million timesteps of gameplay.

I consider the Atari Pong environment solved, although it remains a useful first-pass benchmark for algorithms across the Atari environments.
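As a quick first-pass benchmark, one way to estimate an agent's average episode reward is to run the enjoy script headlessly over a fixed number of episodes (a sketch; --no_render and --max_num_episodes are standard Sample-Factory evaluation flags, but check the documentation for your version):

python -m sf_examples.atari.enjoy_atari --algo=APPO --env=atari_pong --train_dir=./train_dir --experiment=appo-atari_pong --no_render --max_num_episodes=10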