long-t5-tglobal-base-16384-booksci-summary: v1

An experiment investigating transfer learning capabilities by fine-tuning models on different datasets starting from the booksum checkpoint.

Model Details

This model is a fine-tuned version of pszemraj/long-t5-tglobal-base-16384-book-summary on the pszemraj/scientific_lay_summarisation-elife-norm dataset for two epochs.

Usage

It's recommended to use this model with beam search decoding. If you prefer, the textsum utility package abstracts most of the setup away:

pip install -U textsum

from textsum.summarize import Summarizer

model_name = "pszemraj/long-t5-tglobal-base-16384-booksci-summary-v1"
summarizer = Summarizer(model_name) # GPU auto-detected
text = "put the text you don't want to read here"
summary = summarizer.summarize_string(text)
print(summary)
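
For readers who would rather call transformers directly, the beam search recommendation above could look roughly like the sketch below. The helper names and the specific values (num_beams=4, no_repeat_ngram_size=3, max_new_tokens=512) are illustrative assumptions, not settings taken from this card. The 16384-token input limit is inferred from the model name.

```python
MODEL_NAME = "pszemraj/long-t5-tglobal-base-16384-booksci-summary-v1"


def build_generation_kwargs(num_beams=4, max_new_tokens=512):
    """Return beam search settings for generate(); all values are assumptions."""
    return {
        "num_beams": num_beams,            # more than 1 beam enables beam search
        "do_sample": False,                # deterministic decoding, no sampling
        "early_stopping": True,            # stop once enough finished beams exist
        "no_repeat_ngram_size": 3,         # assumed value; blocks repeated trigrams
        "max_new_tokens": max_new_tokens,  # assumed cap on summary length
    }


def summarize(text, model_name=MODEL_NAME, **overrides):
    """Summarize one string with beam search (hypothetical helper)."""
    # Imported lazily so the settings helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # Truncate to the input length suggested by the model name (assumption).
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)
    output_ids = model.generate(**inputs, **build_generation_kwargs(**overrides))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling summarize(text) downloads the checkpoint on first use. Passing num_beams=8, for example, trades speed for a wider search.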

Intended uses & limitations

This is an initial experiment.
Domain generalization abilities are unknown at the time of writing.

Training procedure

Note: this model was trained at a lower learning rate and not until "absolute convergence", with the intention of retaining some of the properties learned during the initial fine-tuning on booksum.

Results

The model achieves the following results on the evaluation set (final evaluation, step 134):

Loss: 2.3994
Rouge1: 34.2428
Rouge2: 4.3644
Rougel: 12.5332
Rougelsum: 30.6965
Gen Len: 294.0249

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 4
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 16
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 2.0
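
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup in plain Python. It approximates the shape such a schedule usually has and is not the author's training code. The total of 134 optimizer steps is an assumption taken from the last row of the training results table, and lr_at is a hypothetical helper name.

```python
import math

BASE_LR = 3e-05       # learning_rate from the list above
WARMUP_RATIO = 0.05   # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 134     # assumption: last step logged in the training results table


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Linear warmup followed by a single half-cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 (end of warmup) to 1 (final step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Print the learning rate at the start, end of warmup, midpoint, and final step.
for step in (0, 7, 67, 134):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the warmup covers only the first handful of steps, so nearly all of the two epochs run on the decaying part of the curve.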

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
2.7492	0.99	67	2.4272	34.6436	4.4536	12.4985	30.916	300.7635
2.6689	1.97	134	2.3994	34.2428	4.3644	12.5332	30.6965	294.0249
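
As a rough sanity check on the table, the logged epoch and step values can be combined with total_train_batch_size to estimate how many training examples make up one epoch. This is back-of-the-envelope arithmetic, assuming each optimizer step consumes 64 examples. The epoch values are rounded to two decimals, so the estimates are approximate.

```python
TOTAL_TRAIN_BATCH_SIZE = 64  # total_train_batch_size from the hyperparameter list

# (epoch, step) pairs copied from the training results table
checkpoints = [(0.99, 67), (1.97, 134)]


def estimate_examples_per_epoch(epoch, step, batch_size=TOTAL_TRAIN_BATCH_SIZE):
    """Approximate examples per epoch from one logged (epoch, step) pair."""
    steps_per_epoch = step / epoch
    return steps_per_epoch * batch_size


estimates = [estimate_examples_per_epoch(e, s) for e, s in checkpoints]
for (epoch, step), estimate in zip(checkpoints, estimates):
    print(f"epoch {epoch}, step {step}: ~{estimate:.0f} examples per epoch")
```

Both rows point to a training split of a few thousand examples.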