
QRDQN Agent playing SpaceInvadersNoFrameskip-v4

This is a trained model of a QRDQN agent playing SpaceInvadersNoFrameskip-v4 using the stable-baselines3 library and the RL Zoo.

The RL Zoo is a training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.

Usage (with SB3 RL Zoo)

RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo
SB3: https://github.com/DLR-RM/stable-baselines3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Install the RL Zoo (with SB3 and SB3-Contrib):

pip install rl_zoo3
# Download model and save it into the logs/ folder
python -m rl_zoo3.load_from_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -orga MattStammers -f logs/
python -m rl_zoo3.enjoy --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/

If you installed the RL Zoo3 via pip (pip install rl_zoo3), you can run the commands below from anywhere:

python -m rl_zoo3.load_from_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -orga MattStammers -f logs/
python -m rl_zoo3.enjoy --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/

Training (with the RL Zoo)

python -m rl_zoo3.train --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/
# Upload the model and generate video (when possible)
python -m rl_zoo3.push_to_hub --algo qrdqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga MattStammers

Hyperparameters

OrderedDict([('batch_size', 64),
             ('env_wrapper',
              ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
             ('exploration_fraction', 0.025),
             ('frame_stack', 4),
             ('n_timesteps', 10000000.0),
             ('normalize', False),
             ('optimize_memory_usage', False),
             ('policy', 'CnnPolicy')])
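
For orientation, here is a minimal sketch of how these hyperparameters map onto the QRDQN constructor in sb3-contrib. The env_wrapper and frame_stack entries are applied when the zoo builds the environment; the snippet reproduces that with make_atari_env and VecFrameStack as an illustration, not the zoo's exact code:

from sb3_contrib import QRDQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# env_wrapper (AtariWrapper) is applied inside make_atari_env;
# frame_stack=4 corresponds to VecFrameStack (normalize=False means no VecNormalize)
env = make_atari_env("SpaceInvadersNoFrameskip-v4", n_envs=1)
env = VecFrameStack(env, n_stack=4)

model = QRDQN(
    "CnnPolicy",
    env,
    batch_size=64,
    exploration_fraction=0.025,
    optimize_memory_usage=False,
    verbose=1,
)
model.learn(total_timesteps=10_000_000)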

Environment Arguments

{'render_mode': 'rgb_array'}
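
The render_mode argument is passed through to the environment so that frames can be rendered for video recording. If you want to run the downloaded checkpoint without the zoo's enjoy script, a minimal sketch with sb3-contrib follows (the checkpoint path is an assumption based on where load_from_hub typically saves files):

from sb3_contrib import QRDQN
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack

# render_mode="rgb_array" matches the environment arguments above
env = make_atari_env(
    "SpaceInvadersNoFrameskip-v4",
    n_envs=1,
    env_kwargs={"render_mode": "rgb_array"},
)
env = VecFrameStack(env, n_stack=4)

# Path assumed: where rl_zoo3.load_from_hub saved the checkpoint
model = QRDQN.load("logs/qrdqn/SpaceInvadersNoFrameskip-v4_1/SpaceInvadersNoFrameskip-v4.zip")

obs = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, rewards, dones, infos = env.step(action)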

Agent Synopsis

QR-DQN is remarkably effective at Space Invaders compared with other models and clearly outperforms standard DQN, largely because it learns a full distribution of returns via quantile regression rather than a single expected value. It takes fairly extensive training to reach this level, however: this model was trained for 10 million timesteps with the above hyperparameters, so it is expensive to train on Colab.
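
For intuition, the core of QR-DQN is the quantile Huber loss from Dabney et al. (2018). A minimal PyTorch sketch follows; shapes and names are illustrative and not the sb3-contrib internals:

import torch
import torch.nn.functional as F

def quantile_huber_loss(current_quantiles, target_quantiles, kappa=1.0):
    # current_quantiles, target_quantiles: (batch, n_quantiles)
    n_quantiles = current_quantiles.shape[1]
    # Quantile midpoints: tau_hat_i = (2i + 1) / (2N)
    taus = (torch.arange(n_quantiles, device=current_quantiles.device) + 0.5) / n_quantiles
    # Pairwise TD errors: td[b, i, j] = target_j - current_i
    td = target_quantiles.unsqueeze(1) - current_quantiles.unsqueeze(2)
    huber = F.huber_loss(
        current_quantiles.unsqueeze(2).expand_as(td),
        target_quantiles.unsqueeze(1).expand_as(td),
        reduction="none",
        delta=kappa,
    )
    # Asymmetric weight |tau_i - 1{td < 0}| makes each head regress its own quantile
    weight = torch.abs(taus.view(1, -1, 1) - (td.detach() < 0).float())
    # Mean over target quantiles, sum over predicted quantiles, mean over batch
    return (weight * huber).mean(dim=2).sum(dim=1).mean()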

If you record your own video, you can see the player's bullets, which makes it much easier to see the strategy that makes the agent effective.

Unlike the prior DQN and PPO agents I have trained on this game, the QR-DQN agent is more careful not to destroy its own defence barriers, which is a key component of its success.