# flan-t5-large-clang8-e8-b16
This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on an unknown dataset (the model name suggests the cLang-8 grammatical error correction corpus, 8 epochs, batch size 16). It achieves the following results on the evaluation set, corresponding to the step-300000 checkpoint with the lowest validation loss in the results table below:

- Loss: 0.3695
- Rouge1: 81.5302
- Rouge2: 75.0006
- RougeL: 80.8048
- RougeLsum: 80.8804
- Gen Len: 16.4010
## Model description

More information needed
## Intended uses & limitations

More information needed
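
In the meantime, a minimal inference sketch is given below. The repo id is a placeholder, and the bare-sentence input format is an assumption: the model name suggests sentence-level grammatical error correction (cLang-8), but the card does not state the prompt format used in training.

```python
# Hedged usage sketch: the repo id and the prompt format are assumptions,
# not confirmed by this card.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "flan-t5-large-clang8-e8-b16"  # placeholder: substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Assumption: input is an ungrammatical sentence, output is its correction.
text = "She no went to the market yesterday."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```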
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
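
As a point of reference, this is a minimal sketch of how the listed hyperparameters could be expressed with the standard `Seq2SeqTrainingArguments` interface. The `output_dir` value and the 50,000-step evaluation cadence (inferred from the results table below) are assumptions, not taken from the card.

```python
# Hedged sketch: output_dir and the evaluation cadence are assumptions; the
# remaining values (lr, batch sizes, seed, Adafactor, linear schedule, 8 epochs)
# come from the hyperparameter list above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-large-clang8-e8-b16",  # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="steps",  # assumption: results below appear every 50k steps
    eval_steps=50_000,
    predict_with_generate=True,   # required to compute ROUGE / Gen Len during eval
)
```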
### Training results
| Training Loss | Epoch | Step    | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:-------:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.2445        | 0.34  | 50000   | 0.4619          | 75.7068 | 68.1664 | 74.9827 | 75.0547   | 15.3599 |
| 0.204         | 0.68  | 100000  | 0.4205          | 78.5266 | 71.5654 | 77.9103 | 77.9927   | 15.7896 |
| 0.1874        | 1.02  | 150000  | 0.3911          | 79.4458 | 72.6087 | 78.9344 | 78.9609   | 16.0050 |
| 0.1632        | 1.36  | 200000  | 0.3829          | 80.8458 | 74.0334 | 80.2669 | 80.294    | 16.1926 |
| 0.1613        | 1.7   | 250000  | 0.4153          | 77.1978 | 70.3072 | 76.6089 | 76.6786   | 15.5472 |
| 0.1563        | 2.04  | 300000  | 0.3695          | 81.5302 | 75.0006 | 80.8048 | 80.8804   | 16.4010 |
| 0.1363        | 2.38  | 350000  | 0.3996          | 78.4903 | 71.72   | 77.8836 | 77.9325   | 15.8473 |
| 0.1364        | 2.72  | 400000  | 0.3952          | 78.9584 | 72.1529 | 78.2906 | 78.3357   | 15.8277 |
| 0.1296        | 3.06  | 450000  | 0.3843          | 78.845  | 71.8678 | 78.1533 | 78.2124   | 15.8288 |
| 0.1116        | 3.4   | 500000  | 0.3839          | 80.9622 | 74.4656 | 80.4038 | 80.4719   | 16.1280 |
| 0.1119        | 3.74  | 550000  | 0.3734          | 80.4889 | 73.7373 | 79.7885 | 79.8515   | 16.1164 |
| 0.1042        | 4.08  | 600000  | 0.3857          | 80.6982 | 73.9657 | 80.1064 | 80.1566   | 16.1577 |
| 0.088         | 4.42  | 650000  | 0.3929          | 79.4607 | 72.7156 | 78.8256 | 78.8513   | 15.8441 |
| 0.0879        | 4.76  | 700000  | 0.3817          | 80.4776 | 73.8007 | 79.796  | 79.8602   | 16.1504 |
| 0.0803        | 5.1   | 750000  | 0.4107          | 80.5342 | 73.8022 | 79.8444 | 79.9184   | 16.1771 |
| 0.0654        | 5.44  | 800000  | 0.4003          | 79.7355 | 73.18   | 79.161  | 79.1934   | 15.9977 |
| 0.0658        | 5.78  | 850000  | 0.4130          | 78.9623 | 72.3575 | 78.2787 | 78.3253   | 15.7458 |
| 0.0573        | 6.13  | 900000  | 0.4490          | 79.4727 | 72.8786 | 78.7881 | 78.8366   | 15.9856 |
| 0.0451        | 6.47  | 950000  | 0.4379          | 80.0433 | 73.4325 | 79.4249 | 79.4754   | 15.9811 |
| 0.0447        | 6.81  | 1000000 | 0.4500          | 80.3675 | 73.7878 | 79.75   | 79.8028   | 16.0735 |
| 0.0381        | 7.15  | 1050000 | 0.5095          | 79.3452 | 72.6912 | 78.7528 | 78.7793   | 15.8832 |
| 0.0299        | 7.49  | 1100000 | 0.4950          | 79.956  | 73.2692 | 79.3025 | 79.3577   | 16.0187 |
| 0.0291        | 7.83  | 1150000 | 0.4987          | 79.4864 | 72.8335 | 78.9    | 78.9366   | 15.9037 |
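
The ROUGE columns are on a 0-100 scale, as produced by the standard Trainer metric scripts; Gen Len is, in those scripts, the mean token length of the generated predictions. A minimal sketch of reproducing that scaling with the `evaluate` library follows (the prediction/reference strings are placeholders; requires `pip install evaluate rouge_score`):

```python
# Sketch: reproduce the 0-100 ROUGE scaling used in the table above.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["She didn't go to the market yesterday."]  # placeholder model outputs
references = ["She did not go to the market yesterday."]  # placeholder gold targets

result = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({k: round(v * 100, 4) for k, v in result.items()})
# keys: rouge1, rouge2, rougeL, rougeLsum
```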
### Framework versions
- Transformers 4.27.4
- Pytorch 1.11.0a0+b6df043
- Datasets 2.11.0
- Tokenizers 0.13.2