
QRDQN Agent playing SpaceInvadersNoFrameskip-v4

This is a trained model of a QRDQN agent playing SpaceInvadersNoFrameskip-v4 using the stable-baselines3 library and the RL Zoo.

The RL Zoo is a training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
SB3: https://github.com/DLR-RM/stable-baselines3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Install the RL Zoo (with SB3 and SB3-Contrib):

pip install rl_zoo3
# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -orga MattStammers -f logs/
# Watch the downloaded agent play
python -m rl_zoo3.enjoy --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/

If you installed the RL Zoo via pip (pip install rl_zoo3), the load_from_hub and enjoy commands above can be run from any directory.
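For use outside the RL Zoo command line, the checkpoint can also be loaded from Python. The following is a minimal sketch, not part of the original card: it assumes the huggingface_sb3 and sb3_contrib packages are installed, and it assumes the repository and file follow the `<orga>/<algo>-<env>` naming that the Zoo's hub scripts use. The helper names hub_names and load_agent are hypothetical.

```python
def hub_names(algo: str, env_id: str, orga: str) -> tuple[str, str]:
    """Return (repo_id, filename) under the ASSUMED '<orga>/<algo>-<env>' naming."""
    model_name = f"{algo}-{env_id.replace('/', '-')}"
    return f"{orga}/{model_name}", f"{model_name}.zip"


def load_agent(algo: str = "qrdqn",
               env_id: str = "SpaceInvadersNoFrameskip-v4",
               orga: str = "MattStammers"):
    """Download the checkpoint from the Hub and load it as a QRDQN model."""
    # Imported lazily so the naming helper works without the RL packages.
    from huggingface_sb3 import load_from_hub
    from sb3_contrib import QRDQN

    repo_id, filename = hub_names(algo, env_id, orga)
    # load_from_hub returns the local path of the downloaded file.
    checkpoint_path = load_from_hub(repo_id=repo_id, filename=filename)
    return QRDQN.load(checkpoint_path)


if __name__ == "__main__":
    print(hub_names("qrdqn", "SpaceInvadersNoFrameskip-v4", "MattStammers"))
```

To get sensible predictions from the loaded model, the evaluation environment would need the same preprocessing as in training (the AtariWrapper and a frame stack of 4 listed under Hyperparameters).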

Training (with the RL Zoo)

# Train the agent and save it into the logs/ folder
python -m rl_zoo3.train --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga MattStammers

Hyperparameters

OrderedDict([('batch_size', 64),
             ('env_wrapper',
              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
             ('exploration_fraction', 0.025),
             ('frame_stack', 4),
             ('n_timesteps', 50000000.0),
             ('normalize', False),
             ('optimize_memory_usage', False),
             ('policy', 'CnnPolicy')])
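
To make the exploration_fraction value concrete: with n_timesteps of 50,000,000, a fraction of 0.025 means epsilon is annealed over roughly the first 1,250,000 steps and then held constant. The sketch below illustrates a linear decay of that shape. The start and end values (1.0 and 0.01) are not listed above; they are assumed here to be the library defaults, and the function name linear_epsilon is hypothetical.

```python
def linear_epsilon(step: int,
                   n_timesteps: int = 50_000_000,
                   exploration_fraction: float = 0.025,
                   initial_eps: float = 1.0,           # assumed, not in the card
                   final_eps: float = 0.01) -> float:  # assumed, not in the card
    """Epsilon after `step` environment steps under a linear decay."""
    progress = step / n_timesteps
    if progress >= exploration_fraction:
        # Decay phase is over: epsilon stays at its final value.
        return final_eps
    return initial_eps + progress * (final_eps - initial_eps) / exploration_fraction


# Number of steps spent decaying epsilon.
decay_steps = round(50_000_000 * 0.025)
print("decay steps:", decay_steps)

for step in (0, 625_000, 1_250_000, 50_000_000):
    print(step, round(linear_epsilon(step), 4))
```

Under these assumptions the agent explores heavily for only 2.5% of training and acts almost greedily for the remaining 97.5%.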

Environment Arguments

{'render_mode': 'rgb_array'}

Another 24 hours of training has made all the difference, and the agent is now ranked number 3. This is not because it scores more points but because it has become more consistent.
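
One way consistency alone can move an agent up a ranking is if the ranking penalizes variance. The sketch below assumes a score of mean return minus standard deviation; the card does not state which metric the ranking uses, so this is an assumption. The episode returns are made-up illustrative numbers, not results from this model, and lower_bound_score is a hypothetical helper.

```python
from statistics import mean, pstdev


def lower_bound_score(episode_returns):
    """Mean return minus population standard deviation (ASSUMED ranking metric)."""
    return mean(episode_returns) - pstdev(episode_returns)


# Made-up returns for illustration only: both agents average the same score.
erratic_agent = [400, 2600, 900, 2100]
steady_agent = [1400, 1600, 1450, 1550]

for name, returns in (("erratic", erratic_agent), ("steady", steady_agent)):
    print(name, mean(returns), round(lower_bound_score(returns), 1))
```

Both hypothetical agents average the same return, yet the steadier one scores higher under this metric.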