---
tags:
- generated_from_trainer
---


# llama-7b-SFT-qlora-wiki_DPO_ds_RM_random_1024_r_64_alpha_16

This model is a fine-tuned version of [dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged](https://huggingface.co/dhmeltzer/llama-7b-SFT_ds_wiki65k_1024_r_64_alpha_16_merged) on an unknown dataset. Evaluation-set results logged over the course of training are reported in the [Training results](#training-results) table below.
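
No usage snippet was generated with this card. Below is a minimal inference sketch, assuming this repository hosts a PEFT (QLoRA) adapter for the base model above and that the repo id matches the card title; both are assumptions, not verified by the card.

```python
# Minimal inference sketch. Assumptions (not verified by this card): the repo
# id below is correct, the repo hosts a PEFT adapter whose config points at
# the base model, and the tokenizer is bundled with the adapter.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

repo_id = "dhmeltzer/llama-7b-SFT-qlora-wiki_DPO_ds_RM_random_1024_r_64_alpha_16"  # assumed

# Loads the base model named in the adapter config, then applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
# If the tokenizer is not bundled with the adapter, load it from the base
# model repo instead.
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "Summarize the idea behind direct preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```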

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The hyperparameter values used during training were not captured when this card was generated; more information needed.
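
As rough orientation only: the model name encodes a QLoRA setup with LoRA rank r = 64, alpha = 16, and 1024-token sequences, trained with DPO. The sketch below shows configs consistent with that naming; every other value is an illustrative assumption, not a recorded setting.

```python
# Illustrative reconstruction from the model name only (qlora, r=64, alpha=16,
# 1024-token context). Learning rate, batch size, DPO beta, optimizer, and
# schedule were not recorded in this card and are NOT shown here.
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# QLoRA: the frozen base model is loaded in 4-bit NF4; pass this as
# quantization_config when calling AutoModelForCausalLM.from_pretrained.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter shape taken from the model name; dropout is an assumption.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,  # assumed, not recorded
    task_type="CAUSAL_LM",
)
```

In TRL, a run like this is typically driven by `DPOTrainer` with `peft_config` passed through; the exact trainer arguments used here are unknown.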

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6904        | 0.1   | 19   | 0.6904          | -0.3143        | -0.3636          | 0.5458             | 0.0493          | -207.3793      | -204.3384    | 1.1224          | 1.1416        |
| 0.6725        | 0.21  | 38   | 0.6850          | -0.3901        | -0.4540          | 0.5547             | 0.0640          | -208.2836      | -205.0964    | 1.1270          | 1.1469        |
| 0.6818        | 0.31  | 57   | 0.6801          | -0.1790        | -0.2369          | 0.5469             | 0.0578          | -206.1121      | -202.9860    | 1.1465          | 1.1674        |
| 0.6671        | 0.41  | 76   | 0.6863          | -0.2598        | -0.3469          | 0.5580             | 0.0871          | -207.2126      | -203.7936    | 1.1468          | 1.1665        |
| 0.6683        | 0.52  | 95   | 0.6841          | -0.1475        | -0.2325          | 0.5502             | 0.0851          | -206.0687      | -202.6704    | 1.1388          | 1.1590        |
| 0.6626        | 0.62  | 114  | 0.6846          | -0.0836        | -0.1600          | 0.5480             | 0.0764          | -205.3429      | -202.0314    | 1.1263          | 1.1474        |
| 0.6593        | 0.72  | 133  | 0.6864          | -0.1272        | -0.2184          | 0.5625             | 0.0912          | -205.9276      | -202.4675    | 1.1106          | 1.1306        |
| 0.672         | 0.83  | 152  | 0.6857          | -0.1452        | -0.2334          | 0.5592             | 0.0882          | -206.0777      | -202.6477    | 1.1086          | 1.1293        |
| 0.6671        | 0.93  | 171  | 0.6855          | -0.1472        | -0.2350          | 0.5547             | 0.0878          | -206.0934      | -202.6673    | 1.1071          | 1.1270        |
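
For readers unfamiliar with the column names: they follow the standard DPO bookkeeping, where the implicit reward of a response is the scaled log-probability ratio between the policy and the frozen reference model. Assuming the standard objective (Rafailov et al., 2023):

```latex
% Implicit DPO reward and pairwise loss; y_w = chosen, y_l = rejected.
r_\theta(x, y) = \beta \,\log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\!\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right)
```

Under these conventions, `Rewards/margins` is the mean chosen-minus-rejected reward, `Rewards/accuracies` is the fraction of pairs with a positive margin, and the `Logps/*` columns are the log-probabilities of each completion under the policy.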

### Framework versions