# RotoBART

## Running the script

### Script arguments
Available model config arguments from the script (see the sketch after this list for how they map onto the model config):

- `encoder_layers`
- `encoder_ffn_dim`
- `decoder_layers`
- `decoder_ffn_dim`
- `d_model`
- `vocab_size`
- `max_position_embeddings`
- `encoder_layerdrop`
- `decoder_layerdrop`
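These flags correspond to the usual BART-style configuration fields. Purely as an illustration of what they control (the script builds its own RotoBART config; `transformers.BartConfig` is used here only as a stand-in, and the values shown are assumptions rather than project defaults):

```python
# Illustrative only: RotoBART's own config class may differ; values are assumptions.
from transformers import BartConfig

config = BartConfig(
    encoder_layers=6,              # --encoder_layers
    encoder_ffn_dim=4096,          # --encoder_ffn_dim
    decoder_layers=6,              # --decoder_layers
    decoder_ffn_dim=4096,          # --decoder_ffn_dim
    d_model=1024,                  # --d_model
    vocab_size=50265,              # --vocab_size
    max_position_embeddings=2048,  # --max_position_embeddings
    encoder_layerdrop=0.0,         # --encoder_layerdrop
    decoder_layerdrop=0.0,         # --decoder_layerdrop
)
```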
Training arguments:

- `testing`: only uses 1 batch, for testing the script
- `adafactor`: enables Adafactor; omitting the flag reverts to Adam (see the optimizer sketch after this list)
- `grad_accum`: number of gradient accumulation steps to use (default: 4)
- `use_bf16`: convert the model to bf16
- `colab_tpu`: set this when running on a Colab TPU
- `use_wandb`: log with Weights & Biases (via TensorBoard)
- `save_strategy`: whether to save model checkpoints based on steps or epochs
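For context, here is a minimal sketch of how the `adafactor` and `grad_accum` flags are commonly wired up in a Flax/optax training loop; the variable names are placeholders and this is not the script's actual code:

```python
# Sketch only, under assumed names: shows the optimizer switch and gradient accumulation.
import optax

learning_rate = 1e-4    # --learning_rate
grad_accum = 4          # --grad_accum (default)
use_adafactor = True    # set by --adafactor; otherwise fall back to Adam

base_optimizer = (
    optax.adafactor(learning_rate=learning_rate)
    if use_adafactor
    else optax.adam(learning_rate=learning_rate)
)

# Gradient accumulation: parameters are only updated every `grad_accum` steps.
optimizer = optax.MultiSteps(base_optimizer, every_k_schedule=grad_accum)
```

The first example command below (`--testing`, 2 layers, a 1000-step schedule) is a quick smoke test; the alternative command after it is a fuller 6-layer run.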
```bash
python rotobart/run_dnlm_flax.py \
    --output_dir rotobart_output \
    --overwrite_output_dir \
    --dataset_path rotobart/pile.py \
    --model_name_or_path rotobart \
    --tokenizer_name ./rotobart/vocab-2/the_pile.model \
    --shuffle_buffer_size 1000 \
    --do_train --do_eval \
    --max_seq_length 1024 \
    --encoder_layers 2 \
    --decoder_layers 2 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --logging_steps 8 \
    --num_train_steps 1000 \
    --eval_steps 1000 \
    --save_steps 1000 \
    --save_strategy steps \
    --num_eval_samples 100 \
    --warmup_steps 30 \
    --learning_rate 1e-4 \
    --use_wandb \
    --testing \
    --use_bf16 \
    --adafactor
```
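A note on effective batch size: the global batch per optimizer step is `per_device_train_batch_size × grad_accum × num_devices`. With the values above (batch size 2 and the default `grad_accum` of 4) on, say, an 8-core TPU, that works out to 2 × 4 × 8 = 64 sequences per update; adjust for your own device count.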
Alternatively, a larger run (6 encoder/decoder layers, 2048-token sequences), launched from inside the `rotobart` directory:

```bash
python3 run_dnlm_flax.py \
    --output_dir rotobart_output \
    --overwrite_output_dir \
    --dataset_path pile.py \
    --model_name_or_path rotobart \
    --tokenizer_name vocab-2/the_pile.model \
    --shuffle_buffer_size 1000 \
    --do_train --do_eval \
    --max_position_embeddings 2048 \
    --max_seq_length 2048 \
    --encoder_layers 6 \
    --decoder_layers 6 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --logging_steps 100 \
    --num_train_steps 50000 \
    --eval_steps 2500 \
    --save_steps 2500 \
    --save_strategy steps \
    --num_eval_samples 5000 \
    --warmup_steps 5000 \
    --learning_rate 1e-4 \
    --use_wandb \
    --use_bf16 \
    --adafactor
```