llama-7b-SFT-qlora-wiki_DPO_ds_RM_top_2_1024_r_64_alpha_16

This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged on an unknown dataset. It achieves the following results on the evaluation set:

Loss: 0.6572
Rewards/chosen: -0.1473
Rewards/rejected: -0.2755
Rewards/accuracies: 0.6128
Rewards/margins: 0.1282
Logps/rejected: -203.3539
Logps/chosen: -207.2538
Logits/rejected: 1.1534
Logits/chosen: 1.1690
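
The reward margin is the chosen reward minus the rejected reward, which can be checked against the figures above. The sketch below also assumes the standard sigmoid DPO loss, -log(sigmoid(margin)) per example. The card does not state the loss variant, so treat that part as an assumption. Under it, the loss evaluated at the mean margin is a lower bound on the mean loss (Jensen's inequality), which is consistent with the reported 0.6572.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = -0.1473
rewards_rejected = -0.2755
reported_margin = 0.1282
reported_loss = 0.6572

# The margin is the difference between the chosen and rejected rewards.
margin = rewards_chosen - rewards_rejected

# Assumption: standard sigmoid DPO loss, -log(sigmoid(margin)) per example.
# Evaluated at the mean margin, this is a lower bound on the mean loss,
# because the function is convex in the margin.
loss_at_mean_margin = math.log1p(math.exp(-margin))

print(f"margin = {margin:.4f} (reported {reported_margin:.4f})")
print(f"loss at mean margin = {loss_at_mean_margin:.4f} (reported mean loss {reported_loss:.4f})")
```

A loss of log(2), about 0.6931, corresponds to a zero margin, so the reported value indicates a modest preference for the chosen responses.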

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 1
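
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and traces a cosine schedule with linear warmup. It assumes a single device and roughly 182 optimizer steps in the epoch (estimated from step 171 at epoch 0.94 in the results table). Neither figure is stated in the card, and `num_devices`, `total_steps`, and `lr_at` are illustrative names.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 2e-4
train_batch_size = 32
gradient_accumulation_steps = 4
warmup_ratio = 0.03

# Assumption: a single device, so the effective batch size is the
# per-device batch size times the gradient accumulation steps.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: about 182 optimizer steps in the epoch, estimated from the
# results table (step 171 at epoch 0.94). The exact count is not in the card.
total_steps = 182
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then half-cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("effective batch size:", total_train_batch_size)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

With a warmup ratio of 0.03, only the first handful of steps are spent ramping up. The rest of the epoch follows the cosine decay.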

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6925	0.1	19	0.6761	-0.1021	-0.1593	0.5697	0.0573	-202.1919	-206.8013	1.1506	1.1664
0.6754	0.21	38	0.6738	-0.4156	-0.5460	0.5701	0.1303	-206.0580	-209.9368	1.1257	1.1406
0.6799	0.31	57	0.6666	-0.0458	-0.1454	0.5932	0.0996	-202.0523	-206.2388	1.1176	1.1327
0.6618	0.42	76	0.6637	-0.1458	-0.2745	0.5971	0.1286	-203.3434	-207.2391	1.1195	1.1333
0.6706	0.52	95	0.6607	-0.0386	-0.1827	0.5971	0.1440	-202.4252	-206.1670	1.1334	1.1484
0.668	0.63	114	0.6596	-0.1615	-0.2945	0.6035	0.1330	-203.5434	-207.3955	1.1500	1.1661
0.6712	0.73	133	0.6597	-0.1703	-0.2905	0.5979	0.1202	-203.5037	-207.4840	1.1515	1.1672
0.6715	0.84	152	0.6588	-0.1516	-0.2745	0.6100	0.1229	-203.3436	-207.2964	1.1532	1.1691
0.673	0.94	171	0.6572	-0.1473	-0.2755	0.6128	0.1282	-203.3539	-207.2538	1.1534	1.1690
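
One way to read the table is to look for the evaluation step with the lowest validation loss and to confirm that each logged margin matches chosen minus rejected. The sketch below uses values copied from the rows above. The tuple layout and variable names are illustrative only.

```python
# Rows copied from the table above:
# (step, validation loss, rewards/chosen, rewards/rejected, accuracy, margin)
rows = [
    (19, 0.6761, -0.1021, -0.1593, 0.5697, 0.0573),
    (38, 0.6738, -0.4156, -0.5460, 0.5701, 0.1303),
    (57, 0.6666, -0.0458, -0.1454, 0.5932, 0.0996),
    (76, 0.6637, -0.1458, -0.2745, 0.5971, 0.1286),
    (95, 0.6607, -0.0386, -0.1827, 0.5971, 0.1440),
    (114, 0.6596, -0.1615, -0.2945, 0.6035, 0.1330),
    (133, 0.6597, -0.1703, -0.2905, 0.5979, 0.1202),
    (152, 0.6588, -0.1516, -0.2745, 0.6100, 0.1229),
    (171, 0.6572, -0.1473, -0.2755, 0.6128, 0.1282),
]

best_loss = min(rows, key=lambda r: r[1])    # lowest validation loss
best_acc = max(rows, key=lambda r: r[4])     # highest reward accuracy
best_margin = max(rows, key=lambda r: r[5])  # largest reward margin

# Each logged margin should equal chosen minus rejected, up to rounding.
max_gap = max(abs((c - r) - m) for _, _, c, r, _, m in rows)

print("lowest validation loss at step", best_loss[0])
print("highest accuracy at step", best_acc[0])
print("largest margin at step", best_margin[0])
print(f"largest margin discrepancy: {max_gap:.4f}")
```

In these rows, validation loss and accuracy are both best at the last logged step (171), while the margin peaks earlier, at step 95.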

Framework versions

Transformers 4.32.1
Pytorch 2.0.1+cu118
Datasets 2.14.4
Tokenizers 0.13.3

