llama-7b-SFT-qlora-eli5_DPO_ds_RM_contrast_1024_r_64_alpha_16

This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_ds_eli5_1024_r_64_alpha_16_merged on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.6204
Rewards/chosen: -0.0937
Rewards/rejected: -0.3490
Rewards/accuracies: 0.6641
Rewards/margins: 0.2553
Logps/rejected: -205.9560
Logps/chosen: -211.6011
Logits/rejected: 1.1663
Logits/chosen: 1.1890
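The Rewards/* and Logps/* names match the metrics logged by TRL's DPOTrainer. That suggests, but the card does not state, that training used the standard sigmoid DPO objective. Under that assumption:

- each reward is beta times the log-probability ratio between the policy and the frozen reference model;
- the margin is the chosen reward minus the rejected reward;
- accuracy is the fraction of pairs with a positive margin;
- the per-pair loss is -log sigmoid(margin).

A minimal plain-Python sketch follows. beta=0.1 and the log-probabilities in the usage line are hypothetical.

```python
import math

def neg_log_sigmoid(x):
    # -log(sigmoid(x)) = log(1 + exp(-x)), split by sign to avoid overflow
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def dpo_pair_metrics(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair quantities of the sigmoid DPO loss.

    Assumption: beta=0.1 is illustrative only; the card does not report it.
    """
    # Implicit rewards: beta * (log pi_policy(y|x) - log pi_ref(y|x))
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    return {
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 when the chosen response out-scores the rejected one
        "rewards/accuracies": 1.0 if reward_chosen > reward_rejected else 0.0,
        "loss": neg_log_sigmoid(margin),
    }

# Hypothetical log-probabilities: margin is roughly 0.09 - (-0.35) = 0.44
print(dpo_pair_metrics(-211.6, -205.9, -212.5, -202.4))

# Consistency check against the evaluation summary above
print(round(-0.0937 - (-0.3490), 4))  # 0.2553, matching Rewards/margins
```

Rewards are measured relative to the reference model. This is why Rewards/chosen can exceed Rewards/rejected even though Logps/chosen (-211.6011) is lower than Logps/rejected (-205.9560). Plugging the mean margin 0.2553 into the loss gives about 0.574, which is below the reported 0.6204. That gap is expected: the reported loss averages a convex function over pairs rather than evaluating it at the average margin.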

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 1
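The listed values fit together as follows. The effective batch size is the per-device batch times the gradient accumulation steps times the number of devices, so 32 × 4 = 128 implies a single device (an inference, not stated). The learning rate warms up linearly and then follows a half-cosine decay to zero.

The sketch below mirrors what we understand transformers' `get_cosine_schedule_with_warmup` (default `num_cycles=0.5`) and the ceil-rounded `warmup_ratio` in `TrainingArguments` to do. It is not the training code itself. `TOTAL_STEPS = 182` is a hypothetical value, roughly inferred from the Step/Epoch columns below; the exact step count is not reported.

```python
import math

LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 32   # per device
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.03
NUM_DEVICES = 1         # assumption: 32 * 4 = 128 implies one device

def effective_batch_size(per_device, accum, devices):
    # Number of examples contributing to one optimizer update
    return per_device * accum * devices

def warmup_steps(total_steps, ratio):
    # Warmup length derived from the ratio, rounded up
    return math.ceil(ratio * total_steps)

def cosine_lr(step, total_steps, base_lr=LEARNING_RATE, ratio=WARMUP_RATIO):
    """Linear warmup from 0, then half-cosine decay from base_lr to 0."""
    warm = warmup_steps(total_steps, ratio)
    if step < warm:
        return base_lr * step / max(1, warm)
    progress = (step - warm) / max(1, total_steps - warm)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

TOTAL_STEPS = 182  # hypothetical, see lead-in
print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))
for step in (0, 6, 94, 182):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

With 182 steps, warmup lasts ceil(0.03 × 182) = 6 steps. The rate peaks at 2e-4 at step 6, falls to 1e-4 halfway through the decay, and reaches 0 at the final step.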

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6908	0.1	19	0.6536	-0.2975	-0.4466	0.6060	0.1491	-206.9322	-213.6386	1.1767	1.1980
0.6613	0.21	38	0.6391	-0.1759	-0.3858	0.6172	0.2099	-206.3239	-212.4229	1.1695	1.1930
0.6667	0.31	57	0.6297	-0.0287	-0.2656	0.6440	0.2369	-205.1224	-210.9511	1.1612	1.1863
0.6532	0.42	76	0.6271	-0.0915	-0.3376	0.6172	0.2461	-205.8420	-211.5791	1.1395	1.1612
0.6546	0.52	95	0.6235	-0.0575	-0.2906	0.6362	0.2331	-205.3723	-211.2390	1.1551	1.1781
0.6528	0.62	114	0.6231	-0.0939	-0.3382	0.6562	0.2443	-205.8482	-211.6033	1.1702	1.1932
0.646	0.73	133	0.6204	-0.1525	-0.4204	0.6518	0.2678	-206.6696	-212.1891	1.1664	1.1886
0.6524	0.83	152	0.6208	-0.1083	-0.3660	0.6607	0.2577	-206.1257	-211.7465	1.1548	1.1765
0.6335	0.94	171	0.6204	-0.0937	-0.3490	0.6641	0.2553	-205.9560	-211.6011	1.1663	1.1890
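Two properties of the table can be checked directly. First, Rewards/margins should equal Rewards/chosen minus Rewards/rejected up to the 4-decimal rounding (step 133 differs by 0.0001). Second, the lowest validation loss, 0.6204, is reached at both step 133 and the final step 171, and the step-171 row matches the evaluation summary at the top of the card. A small sketch using values copied from the table:

```python
# (step, validation_loss, rewards_chosen, rewards_rejected, rewards_margins)
ROWS = [
    (19, 0.6536, -0.2975, -0.4466, 0.1491),
    (38, 0.6391, -0.1759, -0.3858, 0.2099),
    (57, 0.6297, -0.0287, -0.2656, 0.2369),
    (76, 0.6271, -0.0915, -0.3376, 0.2461),
    (95, 0.6235, -0.0575, -0.2906, 0.2331),
    (114, 0.6231, -0.0939, -0.3382, 0.2443),
    (133, 0.6204, -0.1525, -0.4204, 0.2678),
    (152, 0.6208, -0.1083, -0.3660, 0.2577),
    (171, 0.6204, -0.0937, -0.3490, 0.2553),
]

def margins_consistent(rows, tol=1e-3):
    # Logged values are rounded to 4 decimals, so allow a small tolerance
    return all(abs((c - r) - m) <= tol for _, _, c, r, m in rows)

def best_steps(rows):
    """Steps that reach the lowest validation loss (ties are kept)."""
    best = min(v for _, v, *_ in rows)
    return [s for s, v, *_ in rows if v == best]

print(margins_consistent(ROWS))  # True
print(best_steps(ROWS))          # [133, 171]
```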

Framework versions

Transformers 4.32.1
Pytorch 2.0.1+cu118
Datasets 2.14.4
Tokenizers 0.13.3
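Before reproducing the results, you may want to confirm that a local environment matches the versions above. The sketch below uses the standard-library `importlib.metadata`. Two choices are ours: we assume "Pytorch" corresponds to the PyPI distribution `torch`, and the comparison ignores local build tags such as `+cu118`.

```python
from importlib import metadata

# Versions listed under "Framework versions"; "torch" is PyTorch's PyPI name
PINNED = {
    "transformers": "4.32.1",
    "torch": "2.0.1+cu118",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
}

def installed_version(dist):
    # Returns None when the distribution is not installed
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None

def version_mismatches(pinned=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for packages that differ.

    The local build tag (e.g. "+cu118") is stripped before comparing,
    so a CPU build of torch 2.0.1 counts as a match.
    """
    out = {}
    for dist, expected in pinned.items():
        found = lookup(dist)
        if found is None or found.split("+")[0] != expected.split("+")[0]:
            out[dist] = (expected, found)
    return out

print(version_mismatches())  # empty dict when the environment matches
```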