llama-7b-SFT-qlora-eli5-wiki_DPO_ds_RM_random_1024_r_64_alpha_16

This model is a fine-tuned version of dhmeltzer/llama-7b-SFT_eli5_wiki65k_1024_r_64_alpha_16_merged. The training dataset is not recorded in this card. It achieves the following results on the evaluation set:

Loss: 0.6788
Rewards/chosen: -0.0760
Rewards/rejected: -0.1428
Rewards/accuracies: 0.5781
Rewards/margins: 0.0669
Logps/rejected: -202.0682
Logps/chosen: -199.2469
Logits/rejected: 1.0323
Logits/chosen: 1.0541
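
The reward metrics above use the naming common to DPO-style trainers; the card does not name the training library, so that reading is an assumption. Under it, the margin is the chosen reward minus the rejected reward, and a standard sigmoid preference loss equals ln 2 when the margin is zero. A minimal sketch that checks the reported numbers against those two reference points:

```python
import math

# Values copied from the evaluation results above.
loss = 0.6788
rewards_chosen = -0.0760
rewards_rejected = -0.1428
reported_margin = 0.0669

# Assumption: margin = chosen reward - rejected reward, as in common DPO
# implementations; the card itself does not define the metric.
margin = rewards_chosen - rewards_rejected

# Assumption: a sigmoid preference loss, for which a model that cannot
# separate chosen from rejected answers scores -log(0.5) = ln 2.
baseline_loss = math.log(2)

print(f"computed margin: {margin:.4f} (reported {reported_margin:.4f})")
print(f"zero-margin loss: {baseline_loss:.4f} (reported loss {loss:.4f})")
```

The computed margin differs from the reported one only in the last digit, which is consistent with each metric being rounded separately. The reported loss sits slightly below the zero-margin value, in line with the modest accuracy of 0.5781.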

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.03
num_epochs: 1
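
As a rough illustration of how these values fit together, the sketch below recomputes the total batch size and evaluates a cosine schedule with linear warmup. The device count and the total number of optimizer steps are assumptions (the step count is estimated from the training results table, where step 171 corresponds to epoch 0.93), and `lr_at` is a hypothetical helper written for this example, not a function from any library.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 32
gradient_accumulation_steps = 4
learning_rate = 2e-4
warmup_ratio = 0.03

# Assumption: a single device, which makes 32 * 4 match the reported 128.
num_devices = 1
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: about 184 optimizer steps in the single epoch (171 / 0.93);
# the card does not state this number.
total_steps = 184
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to zero (a common formulation)."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("warmup_steps:", warmup_steps)
for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the warmup lasts only a handful of steps, so nearly the whole epoch runs on the decaying part of the schedule.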

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6913	0.1	19	0.6845	-0.4006	-0.4672	0.5558	0.0665	-205.3114	-202.4936	1.0265	1.0467
0.6768	0.21	38	0.6796	-0.3409	-0.4196	0.5603	0.0787	-204.8360	-201.8965	1.0326	1.0538
0.6771	0.31	57	0.6788	-0.0760	-0.1428	0.5781	0.0669	-202.0682	-199.2469	1.0323	1.0541
0.6665	0.41	76	0.6826	-0.1511	-0.2355	0.5703	0.0843	-202.9944	-199.9986	1.0413	1.0635
0.6669	0.52	95	0.6830	-0.1285	-0.2165	0.5781	0.0880	-202.8050	-199.7720	1.0299	1.0522
0.6690	0.62	114	0.6800	-0.0932	-0.1803	0.5725	0.0871	-202.4429	-199.4187	1.0126	1.0352
0.6559	0.72	133	0.6829	-0.0011	-0.1074	0.5759	0.1063	-201.7135	-198.4980	1.0015	1.0232
0.6698	0.83	152	0.6810	-0.0519	-0.1530	0.5781	0.1011	-202.1696	-199.0062	0.9974	1.0192
0.6643	0.93	171	0.6799	-0.0579	-0.1589	0.5658	0.1010	-202.2284	-199.0658	1.0002	1.0220
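
The headline evaluation results at the top of this card coincide with the row that has the lowest validation loss. The short sketch below locates that row from values copied out of the table. Whether the published weights were taken from that checkpoint is not stated in the card.

```python
# (step, epoch, validation loss, rewards/accuracies) copied from the table above.
rows = [
    (19, 0.10, 0.6845, 0.5558),
    (38, 0.21, 0.6796, 0.5603),
    (57, 0.31, 0.6788, 0.5781),
    (76, 0.41, 0.6826, 0.5703),
    (95, 0.52, 0.6830, 0.5781),
    (114, 0.62, 0.6800, 0.5725),
    (133, 0.72, 0.6829, 0.5759),
    (152, 0.83, 0.6810, 0.5781),
    (171, 0.93, 0.6799, 0.5658),
]

# Pick the evaluation point with the lowest validation loss.
best_step, best_epoch, best_loss, best_acc = min(rows, key=lambda r: r[2])

print(
    f"lowest validation loss {best_loss:.4f} at step {best_step} "
    f"(epoch {best_epoch:.2f}), accuracy {best_acc:.4f}"
)
```

Validation loss moves within a narrow band (0.6788 to 0.6845) across the epoch while the training loss keeps falling, so later steps do not clearly improve on the step-57 evaluation.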

Framework versions

Transformers 4.32.1
Pytorch 2.0.1+cu118
Datasets 2.14.4
Tokenizers 0.13.3
