# qwen_w1w2attnproj_tiny_textbooks_r16
This model is a fine-tuned version of [Qwen/Qwen-14B](https://huggingface.co/Qwen/Qwen-14B) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.4371
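
Assuming the reported loss is the usual mean per-token cross-entropy in nats, this corresponds to a perplexity of exp(2.4371) ≈ 11.44.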
## Model description
More information needed
## Intended uses & limitations
More information needed
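
Pending details from the authors, the model name hints at a rank-16 LoRA fine-tune targeting Qwen-14B's w1/w2 MLP projections and attention projections. A minimal inference sketch, assuming the repository hosts weights loadable through `transformers` (the repo id below is a placeholder):

```python
# Minimal inference sketch. Assumes the repository hosts weights loadable
# with transformers; "your-username/qwen_w1w2attnproj_tiny_textbooks_r16"
# is a placeholder repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/qwen_w1w2attnproj_tiny_textbooks_r16"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # shard the 14B parameters across available GPUs
    trust_remote_code=True,  # Qwen-14B ships custom modeling code
)

prompt = "The water cycle begins when"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```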
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.01 (a fractional value, so presumably a warmup ratio of 1% of total steps rather than a step count)
- num_epochs: 1
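
These values map onto `transformers` `TrainingArguments` roughly as follows. This is a hedged sketch only, since the actual training script, dataset pipeline, and any PEFT/LoRA configuration are not recorded here:

```python
# A hedged reconstruction of the listed hyperparameters as TrainingArguments.
# Only the values shown above are mirrored; everything else is left at
# library defaults.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_w1w2attnproj_tiny_textbooks_r16",
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # effective train batch size of 8
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,              # assumption: the logged 0.01 "warmup steps" is a ratio
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```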
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 2.3844 | 0.01 | 63 | 2.4861 |
| 2.409 | 0.01 | 126 | 2.4854 |
| 2.4198 | 0.02 | 189 | 2.4866 |
| 2.421 | 0.02 | 252 | 2.4875 |
| 2.4453 | 0.03 | 315 | 2.4575 |
| 2.441 | 0.03 | 378 | 2.4554 |
| 2.4694 | 0.04 | 441 | 2.4557 |
| 2.4546 | 0.04 | 504 | 2.4526 |
| 2.4488 | 0.05 | 567 | 2.4523 |
| 2.4392 | 0.05 | 630 | 2.4511 |
| 2.4684 | 0.06 | 693 | 2.4512 |
| 2.4409 | 0.06 | 756 | 2.4497 |
| 2.454 | 0.07 | 819 | 2.4497 |
| 2.4985 | 0.07 | 882 | 2.4494 |
| 2.4399 | 0.08 | 945 | 2.4489 |
| 2.432 | 0.08 | 1008 | 2.4487 |
| 2.4786 | 0.09 | 1071 | 2.4479 |
| 2.4552 | 0.09 | 1134 | 2.4491 |
| 2.4235 | 0.1 | 1197 | 2.4479 |
| 2.453 | 0.1 | 1260 | 2.4470 |
| 2.4243 | 0.11 | 1323 | 2.4494 |
| 2.4527 | 0.11 | 1386 | 2.4466 |
| 2.4585 | 0.12 | 1449 | 2.4463 |
| 2.4444 | 0.12 | 1512 | 2.4456 |
| 2.4511 | 0.13 | 1575 | 2.4456 |
| 2.4472 | 0.13 | 1638 | 2.4443 |
| 2.383 | 0.14 | 1701 | 2.4457 |
| 2.4525 | 0.14 | 1764 | 2.4454 |
| 2.4131 | 0.15 | 1827 | 2.4464 |
| 2.4107 | 0.15 | 1890 | 2.4431 |
| 2.4582 | 0.16 | 1953 | 2.4432 |
| 2.4864 | 0.16 | 2016 | 2.4438 |
| 2.3838 | 0.17 | 2079 | 2.4437 |
| 2.4234 | 0.17 | 2142 | 2.4442 |
| 2.4264 | 0.18 | 2205 | 2.4432 |
| 2.4545 | 0.18 | 2268 | 2.4445 |
| 2.4139 | 0.19 | 2331 | 2.4428 |
| 2.4687 | 0.19 | 2394 | 2.4437 |
| 2.4442 | 0.2 | 2457 | 2.4442 |
| 2.434 | 0.2 | 2520 | 2.4430 |
| 2.4308 | 0.21 | 2583 | 2.4426 |
| 2.471 | 0.21 | 2646 | 2.4431 |
| 2.4646 | 0.22 | 2709 | 2.4423 |
| 2.4552 | 0.22 | 2772 | 2.4427 |
| 2.4185 | 0.23 | 2835 | 2.4423 |
| 2.4542 | 0.23 | 2898 | 2.4418 |
| 2.4011 | 0.24 | 2961 | 2.4413 |
| 2.4482 | 0.24 | 3024 | 2.4413 |
| 2.4525 | 0.25 | 3087 | 2.4413 |
| 2.408 | 0.25 | 3150 | 2.4434 |
| 2.4191 | 0.26 | 3213 | 2.4418 |
| 2.4033 | 0.26 | 3276 | 2.4429 |
| 2.3819 | 0.27 | 3339 | 2.4407 |
| 2.4762 | 0.27 | 3402 | 2.4420 |
| 2.4616 | 0.28 | 3465 | 2.4409 |
| 2.4629 | 0.28 | 3528 | 2.4409 |
| 2.4333 | 0.29 | 3591 | 2.4421 |
| 2.385 | 0.29 | 3654 | 2.4410 |
| 2.44 | 0.3 | 3717 | 2.4416 |
| 2.4133 | 0.3 | 3780 | 2.4417 |
| 2.4344 | 0.31 | 3843 | 2.4417 |
| 2.452 | 0.31 | 3906 | 2.4407 |
| 2.438 | 0.32 | 3969 | 2.4419 |
| 2.4558 | 0.32 | 4032 | 2.4429 |
| 2.4647 | 0.33 | 4095 | 2.4410 |
| 2.4441 | 0.33 | 4158 | 2.4411 |
| 2.4563 | 0.34 | 4221 | 2.4429 |
| 2.4394 | 0.34 | 4284 | 2.4418 |
| 2.4319 | 0.35 | 4347 | 2.4407 |
| 2.3877 | 0.35 | 4410 | 2.4420 |
| 2.4482 | 0.36 | 4473 | 2.4417 |
| 2.4157 | 0.36 | 4536 | 2.4406 |
| 2.4566 | 0.37 | 4599 | 2.4416 |
| 2.4521 | 0.37 | 4662 | 2.4409 |
| 2.4647 | 0.38 | 4725 | 2.4411 |
| 2.4338 | 0.38 | 4788 | 2.4405 |
| 2.5055 | 0.39 | 4851 | 2.4423 |
| 2.4696 | 0.39 | 4914 | 2.4417 |
| 2.4031 | 0.4 | 4977 | 2.4411 |
| 2.4554 | 0.4 | 5040 | 2.4414 |
| 2.4009 | 0.41 | 5103 | 2.4405 |
| 2.4632 | 0.41 | 5166 | 2.4408 |
| 2.4518 | 0.42 | 5229 | 2.4402 |
| 2.5038 | 0.42 | 5292 | 2.4403 |
| 2.4748 | 0.43 | 5355 | 2.4401 |
| 2.4026 | 0.43 | 5418 | 2.4407 |
| 2.4788 | 0.44 | 5481 | 2.4401 |
| 2.4105 | 0.44 | 5544 | 2.4401 |
| 2.4177 | 0.45 | 5607 | 2.4407 |
| 2.4788 | 0.45 | 5670 | 2.4400 |
| 2.4809 | 0.46 | 5733 | 2.4401 |
| 2.4344 | 0.46 | 5796 | 2.4399 |
| 2.4275 | 0.47 | 5859 | 2.4392 |
| 2.435 | 0.47 | 5922 | 2.4392 |
| 2.3988 | 0.48 | 5985 | 2.4388 |
| 2.4626 | 0.48 | 6048 | 2.4386 |
| 2.4623 | 0.49 | 6111 | 2.4391 |
| 2.4529 | 0.49 | 6174 | 2.4389 |
| 2.4769 | 0.5 | 6237 | 2.4378 |
| 2.4646 | 0.5 | 6300 | 2.4389 |
| 2.4157 | 0.51 | 6363 | 2.4389 |
| 2.4269 | 0.51 | 6426 | 2.4387 |
| 2.4655 | 0.52 | 6489 | 2.4383 |
| 2.4145 | 0.52 | 6552 | 2.4381 |
| 2.4205 | 0.53 | 6615 | 2.4377 |
| 2.4176 | 0.53 | 6678 | 2.4389 |
| 2.4225 | 0.54 | 6741 | 2.4381 |
| 2.47 | 0.54 | 6804 | 2.4386 |
| 2.4348 | 0.55 | 6867 | 2.4382 |
| 2.4374 | 0.55 | 6930 | 2.4388 |
| 2.399 | 0.56 | 6993 | 2.4388 |
| 2.3887 | 0.56 | 7056 | 2.4383 |
| 2.4227 | 0.57 | 7119 | 2.4376 |
| 2.444 | 0.57 | 7182 | 2.4375 |
| 2.4646 | 0.58 | 7245 | 2.4370 |
| 2.4007 | 0.58 | 7308 | 2.4373 |
| 2.4717 | 0.59 | 7371 | 2.4377 |
| 2.4804 | 0.59 | 7434 | 2.4373 |
| 2.4753 | 0.6 | 7497 | 2.4375 |
| 2.4125 | 0.6 | 7560 | 2.4378 |
| 2.45 | 0.61 | 7623 | 2.4377 |
| 2.4177 | 0.61 | 7686 | 2.4372 |
| 2.4355 | 0.62 | 7749 | 2.4376 |
| 2.4414 | 0.62 | 7812 | 2.4380 |
| 2.4072 | 0.63 | 7875 | 2.4386 |
| 2.4581 | 0.64 | 7938 | 2.4378 |
| 2.4161 | 0.64 | 8001 | 2.4373 |
| 2.4596 | 0.65 | 8064 | 2.4375 |
| 2.3987 | 0.65 | 8127 | 2.4373 |
| 2.4152 | 0.66 | 8190 | 2.4372 |
| 2.4381 | 0.66 | 8253 | 2.4381 |
| 2.4604 | 0.67 | 8316 | 2.4378 |
| 2.4212 | 0.67 | 8379 | 2.4376 |
| 2.421 | 0.68 | 8442 | 2.4368 |
| 2.4346 | 0.68 | 8505 | 2.4370 |
| 2.4284 | 0.69 | 8568 | 2.4371 |
| 2.4525 | 0.69 | 8631 | 2.4371 |
| 2.4434 | 0.7 | 8694 | 2.4380 |
| 2.3519 | 0.7 | 8757 | 2.4374 |
| 2.4297 | 0.71 | 8820 | 2.4376 |
| 2.3884 | 0.71 | 8883 | 2.4378 |
| 2.4528 | 0.72 | 8946 | 2.4374 |
| 2.4313 | 0.72 | 9009 | 2.4372 |
| 2.4309 | 0.73 | 9072 | 2.4374 |
| 2.4178 | 0.73 | 9135 | 2.4373 |
| 2.4132 | 0.74 | 9198 | 2.4371 |
| 2.4062 | 0.74 | 9261 | 2.4372 |
| 2.4276 | 0.75 | 9324 | 2.4373 |
| 2.4452 | 0.75 | 9387 | 2.4370 |
| 2.4256 | 0.76 | 9450 | 2.4374 |
| 2.4125 | 0.76 | 9513 | 2.4377 |
| 2.4055 | 0.77 | 9576 | 2.4377 |
| 2.4655 | 0.77 | 9639 | 2.4379 |
| 2.4328 | 0.78 | 9702 | 2.4376 |
| 2.4632 | 0.78 | 9765 | 2.4372 |
| 2.4374 | 0.79 | 9828 | 2.4375 |
| 2.4247 | 0.79 | 9891 | 2.4378 |
| 2.417 | 0.8 | 9954 | 2.4375 |
| 2.4711 | 0.8 | 10017 | 2.4373 |
| 2.4041 | 0.81 | 10080 | 2.4376 |
| 2.4048 | 0.81 | 10143 | 2.4376 |
| 2.42 | 0.82 | 10206 | 2.4373 |
| 2.3942 | 0.82 | 10269 | 2.4376 |
| 2.4307 | 0.83 | 10332 | 2.4379 |
| 2.435 | 0.83 | 10395 | 2.4377 |
| 2.4647 | 0.84 | 10458 | 2.4377 |
| 2.4622 | 0.84 | 10521 | 2.4373 |
| 2.4204 | 0.85 | 10584 | 2.4373 |
| 2.4195 | 0.85 | 10647 | 2.4373 |
| 2.3934 | 0.86 | 10710 | 2.4373 |
| 2.4511 | 0.86 | 10773 | 2.4374 |
| 2.4431 | 0.87 | 10836 | 2.4372 |
| 2.4013 | 0.87 | 10899 | 2.4370 |
| 2.444 | 0.88 | 10962 | 2.4372 |
| 2.4355 | 0.88 | 11025 | 2.4371 |
| 2.4502 | 0.89 | 11088 | 2.4372 |
| 2.4418 | 0.89 | 11151 | 2.4369 |
| 2.4307 | 0.9 | 11214 | 2.4371 |
| 2.4342 | 0.9 | 11277 | 2.4372 |
| 2.4753 | 0.91 | 11340 | 2.4369 |
| 2.4044 | 0.91 | 11403 | 2.4372 |
| 2.4289 | 0.92 | 11466 | 2.4371 |
| 2.468 | 0.92 | 11529 | 2.4371 |
| 2.4546 | 0.93 | 11592 | 2.4372 |
| 2.4484 | 0.93 | 11655 | 2.4370 |
| 2.4566 | 0.94 | 11718 | 2.4372 |
| 2.4079 | 0.94 | 11781 | 2.4370 |
| 2.4264 | 0.95 | 11844 | 2.4371 |
| 2.4729 | 0.95 | 11907 | 2.4371 |
| 2.3787 | 0.96 | 11970 | 2.4372 |
| 2.3911 | 0.96 | 12033 | 2.4370 |
| 2.4216 | 0.97 | 12096 | 2.4371 |
| 2.4465 | 0.97 | 12159 | 2.4371 |
| 2.3939 | 0.98 | 12222 | 2.4370 |
| 2.419 | 0.98 | 12285 | 2.4371 |
| 2.4358 | 0.99 | 12348 | 2.4370 |
| 2.47 | 0.99 | 12411 | 2.4371 |
| 2.423 | 1.0 | 12474 | 2.4371 |
### Framework versions
- Transformers 4.35.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.5.2
- Tokenizers 0.14.0