llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_top_2_1024_r_64_alpha_16

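This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged on an unknown dataset. It achieves the following results on the evaluation set. These values match the step-185 (epoch 1.02) row of the training results, which is the row with the lowest validation loss: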

Loss: 0.6538
Rewards/chosen: 0.1408
Rewards/rejected: -0.0291
Rewards/accuracies: 0.6248
Rewards/margins: 0.1699
Logps/rejected: -199.6676
Logps/chosen: -203.9681
Logits/rejected: 0.8159
Logits/chosen: 0.8393
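The reward metrics above use the names that TRL's `DPOTrainer` logs, although the card does not say which library produced them. The following is a minimal sketch of how such per-batch metrics are derived from summed sequence log-probabilities under the policy and a frozen reference model. It assumes the standard sigmoid DPO loss and a hypothetical `beta = 0.1`, since the card does not report beta. It is an illustration, not the training code used for this model.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class DPOMetrics:
    loss: float              # mean DPO loss over the batch
    rewards_chosen: float    # mean implicit reward of chosen answers
    rewards_rejected: float  # mean implicit reward of rejected answers
    rewards_accuracy: float  # fraction of pairs where chosen reward > rejected reward
    rewards_margin: float    # mean (chosen reward - rejected reward)


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # assumption: beta is not reported in this card
) -> DPOMetrics:
    """Each input holds one summed sequence log-prob per preference pair."""
    n = len(policy_chosen)
    # Implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))
    r_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    r_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(r_chosen, r_rejected)]
    # Sigmoid DPO loss per pair: -log sigmoid(reward margin)
    losses = [-log_sigmoid(m) for m in margins]
    return DPOMetrics(
        loss=sum(losses) / n,
        rewards_chosen=sum(r_chosen) / n,
        rewards_rejected=sum(r_rejected) / n,
        rewards_accuracy=sum(c > r for c, r in zip(r_chosen, r_rejected)) / n,
        rewards_margin=sum(margins) / n,
    )
```

If the policy and reference log-probs are identical, every margin is zero and the loss is ln 2 (about 0.693). That is where training starts when the policy is initialized as a copy of the reference. The validation losses in this card all lie between 0.65 and 0.71, close to that value. The reported margin is a mean of per-pair differences, so Rewards/margins equals Rewards/chosen minus Rewards/rejected: 0.1408 - (-0.0291) = 0.1699.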

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 2
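The `cosine` scheduler with `lr_scheduler_warmup_ratio: 0.03` raises the learning rate linearly during warmup, then decays it along a half cosine to zero. The sketch below mirrors the formula in `transformers.get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. It also follows the Trainer's rule of rounding the warmup ratio up to whole optimizer steps. The total step count is left as an input because the card does not state it. The sketch also shows that `total_train_batch_size` is `train_batch_size * gradient_accumulation_steps` (32 * 4 = 128). This implies a single device, but that is an inference, not something the card states.

```python
import math

PEAK_LR = 2e-4          # learning_rate
TRAIN_BATCH_SIZE = 32   # per-device train_batch_size
GRAD_ACCUM_STEPS = 4    # gradient_accumulation_steps
WARMUP_RATIO = 0.03     # lr_scheduler_warmup_ratio


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    # Number of examples that contribute to one optimizer step
    return per_device * accum_steps * num_devices


def warmup_steps(total_steps: int, ratio: float = WARMUP_RATIO) -> int:
    # Warmup ratio rounded up to a whole number of optimizer steps
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

As an example, take a hypothetical run of 360 optimizer steps. `warmup_steps(360)` returns 11, `cosine_lr(11, 360)` returns the 2e-4 peak, and `cosine_lr(360, 360)` returns 0.0.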

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6828	0.2	37	0.6867	-0.3470	-0.4719	0.5792	0.1249	-201.8816	-206.4072	0.7977	0.8213
0.6666	0.41	74	0.6731	-0.1233	-0.2593	0.5855	0.1361	-200.8187	-205.2885	0.8159	0.8381
0.6713	0.61	111	0.6645	0.0492	-0.1110	0.6019	0.1602	-200.0772	-204.4260	0.8299	0.8526
0.6749	0.82	148	0.6593	0.2291	0.0917	0.5912	0.1374	-199.0636	-203.5266	0.8189	0.8414
0.6688	1.02	185	0.6538	0.1408	-0.0291	0.6248	0.1699	-199.6676	-203.9681	0.8159	0.8393
0.3721	1.23	222	0.6911	-0.3548	-0.6171	0.6007	0.2623	-202.6077	-206.4462	0.8193	0.8406
0.2845	1.43	259	0.6989	-0.3528	-0.5968	0.5984	0.2441	-202.5062	-206.4359	0.7886	0.8059
0.2646	1.64	296	0.6991	-0.4016	-0.6359	0.5880	0.2343	-202.7015	-206.6800	0.7696	0.7875
0.2263	1.84	333	0.7063	-0.4773	-0.7137	0.5925	0.2365	-203.0908	-207.0584	0.7653	0.7833
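In the table, validation loss reaches its minimum at step 185 (epoch 1.02) and rises through the second epoch. Over the same period, training loss falls from about 0.67 to 0.23, which suggests overfitting to the training preference pairs. The small sketch below locates the minimum and computes the gap between the two losses. Its values are transcribed from the Step, Training Loss and Validation Loss columns above; no other data is used.

```python
# (step, training_loss, validation_loss), transcribed from the training results table
ROWS = [
    (37, 0.6828, 0.6867),
    (74, 0.6666, 0.6731),
    (111, 0.6713, 0.6645),
    (148, 0.6749, 0.6593),
    (185, 0.6688, 0.6538),
    (222, 0.3721, 0.6911),
    (259, 0.2845, 0.6989),
    (296, 0.2646, 0.6991),
    (333, 0.2263, 0.7063),
]


def best_step(rows):
    # Step of the row with the lowest validation loss
    return min(rows, key=lambda row: row[2])[0]


def generalization_gap(rows):
    # Validation loss minus training loss at each logged step
    return {step: round(val - train, 4) for step, train, val in rows}
```

`best_step(ROWS)` returns 185. The gap grows from about -0.015 at step 185 to 0.48 at step 333.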

Framework versions

Transformers 4.32.1
PyTorch 2.0.1+cu118
Datasets 2.14.4
Tokenizers 0.13.3
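Before trying to reproduce these results, it can help to check an environment against the versions listed above. The following is a minimal stdlib sketch. It assumes the PyPI distribution names `transformers`, `torch`, `datasets` and `tokenizers`, and it compares version strings exactly.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; PyPI distribution names are assumed
EXPECTED = {
    "transformers": "4.32.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
}


def version_mismatches(expected, lookup=version):
    """Return {package: installed_version_or_None} for every package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package not installed
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed}")
```

Because the comparison is exact, a CPU-only `torch` 2.0.1 build (reported as `2.0.1`) is flagged as a mismatch.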
