
Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora

This repository holds my QLoRA checkpoints of Mistral-11B-OmniMix trained on the PIPPA-ShareGPT dataset.

You can read more about the dataset on its relevant page. It's a ShareGPT reformat of the PIPPA dataset by PygmalionAI. The reformat was done to allow for axolotl compatibility.

Architecture

Training Details

Instruct Format

The ShareGPT data is converted to the Vicuna format. The dataset uses the modified roles USER and CHARACTER in place of the usual USER and ASSISTANT.

SYSTEM: Enter roleplay mode...
USER: {prompt}
CHARACTER:
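For clarity, here is a minimal sketch of how such a conversion could look. `to_vicuna_prompt` is a hypothetical helper, not part of the dataset tooling, and the exact whitespace and system prompt used during training may differ.

```python
def to_vicuna_prompt(conversations, system="Enter roleplay mode..."):
    """Build the modified Vicuna prompt (USER/CHARACTER) from a
    ShareGPT-style list of {"from": ..., "value": ...} turns."""
    role_map = {"human": "USER", "gpt": "CHARACTER"}
    lines = [f"SYSTEM: {system}"]
    for turn in conversations:
        lines.append(f"{role_map[turn['from']]}: {turn['value']}")
    # Trailing role tag cues the model to write the character's next reply.
    lines.append("CHARACTER:")
    return "\n".join(lines)

print(to_vicuna_prompt([{"from": "human", "value": "Hi there!"}]))
```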

Notes

This QLoRA was produced as an experiment to see how the public version of PIPPA affects a model. Mistral is also fairly new, and training/finetuning support for it may still be broken. As a result, I have no idea whether this LoRA is of great quality or absolute garbage; it was meant to be used only with OmniMix.
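If you want to try the adapter anyway, a minimal loading sketch with transformers + peft might look like the following. The repository ids shown are assumptions, so substitute the actual base-model and adapter paths.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base = "Undi95/Mistral-11B-OmniMix"  # assumed base-model repo id
adapter = "Undi95/Mistral-11B-OmniMix-pippa-sharegpt-11b-qlora"  # assumed adapter repo id

# Load the base model in 4-bit and attach the QLoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
tokenizer = AutoTokenizer.from_pretrained(base)
```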

Acknowledgments

Thanks to:

Donate?

If you'd like to donate to Kingbri, you can do so here: https://ko-fi.com/kingbri

If you'd like to donate to me, you can also do it here: https://ko-fi.com/undiai

You should not feel obligated to donate, but if you do, we'd appreciate it.

Axolotl stuff

Training procedure

The following bitsandbytes quantization config was used during training:
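The exact config values are not listed on this card. As a rough sketch, a typical 4-bit QLoRA quantization setup in bitsandbytes looks like the following; treat every value here as an assumption rather than the settings actually used for this run.

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit NF4 double-quant config commonly used for QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```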

Training hyperparameters

The following hyperparameters were used during training:
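The hyperparameter list itself is missing from this card. For illustration only, a comparable run could be configured as below: the ~3 epochs and the 50-step eval interval are inferred from the results table that follows, while everything else is a placeholder guess.

```python
from transformers import TrainingArguments

# Hypothetical values; the run's real hyperparameters were not recorded here.
training_args = TrainingArguments(
    output_dir="./qlora-out",
    num_train_epochs=3,             # inferred from the results table (last epoch ~2.7)
    evaluation_strategy="steps",
    eval_steps=50,                  # inferred from the 50-step eval cadence below
    learning_rate=2e-4,             # placeholder
    per_device_train_batch_size=2,  # placeholder
    gradient_accumulation_steps=4,  # placeholder
    bf16=True,
)
```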

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.6447        | 0.34  | 50   | 1.6321          |
| 1.6243        | 0.68  | 100  | 1.5702          |
| 1.527         | 1.01  | 150  | 1.5406          |
| 1.4873        | 1.35  | 200  | 1.5275          |
| 1.5005        | 1.69  | 250  | 1.5196          |
| 1.4054        | 2.03  | 300  | 1.5153          |
| 1.4145        | 2.36  | 350  | 1.5149          |
| 1.4867        | 2.7   | 400  | 1.5138          |

Framework versions