
About

Hi, this is the README.

This model was created as a study experiment, to re-create Alpaca on my end.
It uses the gururise/AlpacaDataCleaned dataset ( from April 7 ).


Specifications

Base Model:
  LLaMA 7B

Training Parameters:
  Micro_Batch_Size = 8
  Batch_Size = 128
  Gradient_Accumulation_Steps = Batch_Size / Micro_Batch_Size   # ( 16 )
  Epochs = 2
  Learning_Rate = 2e-5
  Cutoff_Len = 256   # This ( 256 ) accounts for about 96% of all data
  Lora_R = 4
  Lora_Alpha = 16
  Lora_Dropout = 0.05
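The relationship between the batch-size parameters above can be checked with a bit of plain Python (variable names are mine, mirroring the parameter list):

```python
# Hyperparameters from the Specifications section above.
MICRO_BATCH_SIZE = 8   # examples per forward/backward pass
BATCH_SIZE = 128       # effective batch size per optimizer step

# Gradients are accumulated over this many micro-batches before
# the optimizer updates the weights.
gradient_accumulation_steps = BATCH_SIZE // MICRO_BATCH_SIZE

print(gradient_accumulation_steps)                     # 16
print(gradient_accumulation_steps * MICRO_BATCH_SIZE)  # 128, the effective batch size
```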

Files

adapter_model.bin         # The fine-tuned LoRA weights that go over the base LLaMA model.
adapter_config.bin        # The config file for adapter_model.bin.

consolidated.00.pth       # The base model ( LLaMA 7B ) merged with the fine-tuned weights ( adapter_model.bin ).
tokenizer.model           # The tokenizer; it converts the input text ( prompt ) into tokens the model can understand.
params.json               # Parameters of the model.

ggml_model_f16.bin        # The same model ( consolidated.00.pth ), but in 'ggml f16' format. This format is needed to quantize it with llama.cpp.
llama-hf-7b               # This folder contains the same model ( consolidated.00.pth ), but in 'huggingface' format. This format is needed to quantize it with GPTQ.

quantized-model:
  ggml-model-q4_0.bin     # The 4-bit model quantized by llama.cpp. I found this to be better than GPTQ.
  llama7b-4bit-128g.pt    # The model quantized by GPTQ. It takes longer to quantize and gives worse results than llama.cpp, but its file is about ( 7.6% ) smaller.
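As a rough sanity check on the quantized file sizes, here is a back-of-the-envelope estimate in plain Python. It assumes the old ggml q4_0 layout ( 32 weights per block, stored as 16 bytes of 4-bit values plus a 2-byte fp16 scale ) and an approximate 7B parameter count; both figures are assumptions, not taken from the files above:

```python
# Assumed ggml q4_0 block layout: 32 weights -> 16 bytes of quants + 2-byte scale.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 16 + 2

bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS
print(bits_per_weight)   # 4.5 effective bits per weight

# LLaMA 7B has roughly 6.7 billion parameters (approximate).
n_params = 6.7e9
est_gb = n_params * BLOCK_BYTES / BLOCK_WEIGHTS / 1e9
print(round(est_gb, 2))  # rough estimate in GB for ggml-model-q4_0.bin
```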