# Llama-2-7b-chat-hf-romanian
This model is a fine-tuned version of Llama-2-7b-chat-hf on the _ dataset.
## Training procedure
The model was fine-tuned with LoRA in 4-bit precision (quantization handled by bitsandbytes), using a rank of 64 for the low-rank matrices and targeting all linear layers.
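This corresponds roughly to the QLoRA-style setup sketched below. It is a minimal illustration rather than the original training script: the NF4 quantization flags and the `lora_alpha` / `lora_dropout` values are assumptions, and `target_modules` simply enumerates the linear projection layers of the Llama architecture by name.

```python
# Minimal sketch of the 4-bit LoRA (QLoRA-style) setup described above.
# Quantization flags, lora_alpha and lora_dropout are assumptions, not
# values taken from this card.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization via bitsandbytes (NF4 compute settings are assumed).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA with rank 64 applied to all linear layers of the Llama blocks.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,      # assumed; not stated in the card
    lora_dropout=0.1,   # assumed; not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

model = get_peft_model(model, lora_config)
```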
### Training hyperparameters
The following hyperparameters were used during training (the sketch after this list shows how they map onto `transformers.TrainingArguments`):
- num_train_epochs=3
- per_device_train_batch_size=1
- gradient_accumulation_steps=2
- optim="paged_adamw_32bit"
- save_steps=0
- logging_steps=10
- learning_rate=2e-4
- weight_decay=0.001
- fp16=False
- bf16=False
- max_grad_norm=0.3
- max_steps=-1
- warmup_ratio=0.03
- group_by_length=True
- lr_scheduler_type="cosine"
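These values map one-to-one onto `transformers.TrainingArguments`, as shown in the sketch below. The `output_dir` and the assumption that a TRL `SFTTrainer` (or an equivalent `Trainer`) consumed these arguments are not taken from the original script.

```python
# Minimal sketch mapping the hyperparameters above onto
# transformers.TrainingArguments; output_dir is an assumption,
# everything else mirrors the list in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="results",  # assumed; not stated in the card
    num_train_epochs=3,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    save_steps=0,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="cosine",
)
```

In a typical QLoRA run, these arguments would then be passed, together with the PEFT-wrapped model, the tokenizer, and the dataset, to the trainer that performs the fine-tuning.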
### Training results
Training Loss | Learning Rate | Epoch |
---|---|---|
2.9963 | 9.0909090e-05 | 0.04 |
1.3888 | 0.00019993735 | 0.12 |
0.4808 | 0.00019923349 | 0.20 |
0.4582 | 0.00019775300 | 0.29 |
0.2332 | 0.00019410305 | 0.41 |
0.4112 | 0.00019074472 | 0.49 |
0.2399 | 0.00018438534 | 0.61 |
0.3119 | 0.00017931288 | 0.70 |
0.2503 | 0.00017361969 | 0.78 |
0.3141 | 0.00016401474 | 0.90 |
0.2772 | 0.00015328346 | 1.02 |
0.2380 | 0.00014559606 | 1.10 |
0.1548 | 0.00013755180 | 1.19 |
0.2151 | 0.00012495469 | 1.31 |
0.1473 | 0.00011629864 | 1.39 |
0.2067 | 0.00010309684 | 1.51 |
0.1223 | 9.4250959e-05 | 1.60 |
0.1683 | 8.5450063e-05 | 1.68 |
0.1239 | 7.2483527e-05 | 1.80 |
0.1528 | 6.4094366e-05 | 1.88 |
0.1233 | 5.2057449e-05 | 2.00 |
0.1031 | 4.4488973e-05 | 2.09 |
0.0833 | 3.3968443e-05 | 2.21 |
0.1133 | 2.7589595e-05 | 2.29 |
0.1005 | 1.9098300e-05 | 2.41 |
0.1298 | 1.4220031e-05 | 2.49 |
0.0860 | 8.1718995e-06 | 2.62 |
0.0849 | 5.0320121e-06 | 2.70 |
0.0899 | 1.7218739e-06 | 2.82 |
0.1144 | 4.7342963e-07 | 2.90 |
0.0796 | 3.9157071e-09 | 2.99 |
### Framework versions
- transformers: 4.34.0
- accelerate: 0.23.0
- peft: 0.5.0
- sentencepiece: 0.1.99
- bitsandbytes: 0.41.1
- torch: 2.0.1+cu118
- datasets: 2.14.5
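As a quick environment sanity check, the installed versions can be compared against the pinned ones above. The helper below is only an illustration and uses the standard-library `importlib.metadata`:

```python
# Print the installed versions of the packages listed in this card so they
# can be compared against the pinned versions above.
from importlib.metadata import version, PackageNotFoundError

packages = [
    "transformers", "accelerate", "peft",
    "sentencepiece", "bitsandbytes", "torch", "datasets",
]

for name in packages:
    try:
        print(f"{name}: {version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```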