flan-t5-xl-instructiongen

This model is a fine-tuned version of google/flan-t5-xl on the pszemraj/fleece2instructions dataset. It achieves the following results on the evaluation set:

Loss: 0.8314
Rouge1: 65.3297
Rouge2: 48.8475
Rougel: 63.4183
Rougelsum: 63.5458
Gen Len: 13.7474

Model description

More information needed

Intended uses & limitations

Generate (or recover) an instruction from arbitrary text. The model assumes the text corresponds to an instruction alone, with no accompanying input.
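As a rough illustration of the intended use, the sketch below loads the checkpoint with the standard `transformers` seq2seq classes and generates an instruction for a piece of text. The Hub id `pszemraj/flan-t5-xl-instructiongen` is an assumption inferred from the model and dataset names (this card does not state it), and `load_model`, `recover_instruction`, and the `max_new_tokens=32` cap are illustrative choices rather than settings documented here.

```python
# Assumed Hub id: inferred from the card title and dataset owner, not stated in the card.
MODEL_ID = "pszemraj/flan-t5-xl-instructiongen"


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and seq2seq model (hypothetical helper)."""
    # Imported lazily so recover_instruction can be tried with stand-in objects.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def recover_instruction(text, tokenizer, model, max_new_tokens=32):
    """Generate an instruction that could plausibly have produced `text`."""
    # Tokenize the raw text; inputs longer than the tokenizer limit are truncated.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # max_new_tokens=32 is an arbitrary cap chosen for this sketch.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
```

Calling `tokenizer, model = load_model()` downloads the full XL checkpoint, so expect a multi-gigabyte download and a correspondingly large memory footprint. After that, `recover_instruction(some_text, tokenizer, model)` returns the generated instruction as a string.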

Training and evaluation data

Refer to the pszemraj/fleece2instructions dataset.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 6e-05
train_batch_size: 4
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 2.0
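To make the listed values concrete, here is a small sketch of how they relate. It assumes the usual Hugging Face Trainer conventions: the total batch size is the per-step batch size times the gradient accumulation steps times the number of processes, and the cosine schedule uses a linear warmup followed by a half-cosine decay to zero. The process count of 1 is inferred from 64 / (4 × 16) rather than stated in this card, and 724 is the final step reported in the training results table. The function names are hypothetical.

```python
import math

BASE_LR = 6e-05       # learning_rate
WARMUP_RATIO = 0.03   # lr_scheduler_warmup_ratio
TOTAL_STEPS = 724     # final optimizer step in the training results table


def effective_batch_size(train_batch_size, grad_accum_steps, num_processes=1):
    """Examples consumed per optimizer step (num_processes=1 is an assumption)."""
    return train_batch_size * grad_accum_steps * num_processes


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(4, 16))  # matches total_train_batch_size: 64
print(lr_at(362))                   # learning rate around the end of epoch 1
```

Under these assumptions the warmup covers the first 22 optimizer steps, after which the rate decays smoothly toward zero at step 724.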

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
0.9615	1.0	362	0.8353	63.9163	47.0456	61.9554	62.0549	13.3737
0.809	2.0	724	0.8251	64.5398	47.9107	62.5928	62.7278	13.4763