RotoBART

Running the script

Script arguments

Available model config arguments from the script:

encoder_layers: number of encoder layers
encoder_ffn_dim: inner (feed-forward) dimension of each encoder layer
decoder_layers: number of decoder layers
decoder_ffn_dim: inner (feed-forward) dimension of each decoder layer
d_model: hidden dimension of the model
vocab_size: size of the tokenizer vocabulary
max_position_embeddings: maximum sequence length supported by the position embeddings
encoder_layerdrop: LayerDrop probability for the encoder
decoder_layerdrop: LayerDrop probability for the decoder
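
These flags mirror the standard Hugging Face BART configuration fields of the same names. As a rough, illustrative sketch (not the script's actual code, and the values below are hypothetical), they map onto a config roughly as follows:

from transformers import BartConfig

# Hypothetical values shown only to illustrate what each flag controls.
config = BartConfig(
    encoder_layers=6,
    encoder_ffn_dim=4096,
    decoder_layers=6,
    decoder_ffn_dim=4096,
    d_model=1024,
    vocab_size=50265,
    max_position_embeddings=2048,
    encoder_layerdrop=0.0,
    decoder_layerdrop=0.0,
)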

Training Arguments:

testing: run with only a single batch, for quickly testing the script

adafactor: enables the Adafactor optimizer; omitting the flag falls back to Adam (see the optimizer sketch after this list)

grad_accum: number of gradient accumulation steps to use (default: 4)

use_bf16: cast the model to bfloat16

colab_tpu: set this when running on a Colab TPU

use_wandb: log metrics with Weights & Biases (via TensorBoard)

save_strategy: whether model checkpoints are saved by steps or by epoch
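
For orientation, here is a minimal sketch of the optimizer switch and the bf16 cast, assuming an Optax-based Flax training loop; this is not the script's actual code, and make_optimizer / to_bf16 are hypothetical helpers:

import jax
import jax.numpy as jnp
import optax

def make_optimizer(use_adafactor: bool, learning_rate: float = 1e-4):
    # --adafactor selects Adafactor; without the flag, training falls back to Adam(W).
    if use_adafactor:
        return optax.adafactor(learning_rate=learning_rate)
    return optax.adamw(learning_rate=learning_rate)

def to_bf16(params):
    # --use_bf16: cast the model parameters to bfloat16.
    return jax.tree_util.tree_map(lambda p: p.astype(jnp.bfloat16), params)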

python rotobart/run_dnlm_flax.py \
  --output_dir rotobart_output \
  --overwrite_output_dir \
  --dataset_path rotobart/pile.py \
  --model_name_or_path rotobart \
  --tokenizer_name ./rotobart/vocab-2/the_pile.model \
  --shuffle_buffer_size 1000 \
  --do_train --do_eval \
  --max_seq_length 1024 \
  --encoder_layers 2 \
  --decoder_layers 2 \
  --per_device_train_batch_size 2 \
  --per_device_eval_batch_size 2 \
  --logging_steps 8 \
  --num_train_steps 1000 \
  --eval_steps 1000 \
  --save_steps 1000 \
  --save_strategy steps \
  --num_eval_samples 100 \
  --warmup_steps 30 \
  --learning_rate 1e-4 \
  --use_wandb \
  --testing \
  --use_bf16 \
  --adafactor

Alternative invocation for a larger run (6 encoder/decoder layers, 2048-token sequences, 50,000 training steps):

python3 run_dnlm_flax.py \
  --output_dir rotobart_output \
  --overwrite_output_dir \
  --dataset_path pile.py \
  --model_name_or_path rotobart \
  --tokenizer_name vocab-2/the_pile.model \
  --shuffle_buffer_size 1000 \
  --do_train --do_eval \
  --max_position_embeddings 2048 \
  --max_seq_length 2048 \
  --encoder_layers 6 \
  --decoder_layers 6 \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --logging_steps 100 \
  --num_train_steps 50000 \
  --eval_steps 2500 \
  --save_steps 2500 \
  --save_strategy steps \
  --num_eval_samples 5000 \
  --warmup_steps 5000 \
  --learning_rate 1e-4 \
  --use_wandb \
  --use_bf16 \
  --adafactor