Model Card for llama-2-13b-apollo-guana
Model Details
Model Description
This model is a fine-tuned version of meta-llama/Llama-2-13b-chat-hf, trained for causal language modeling. Fine-tuning was performed on the mlabonne/guanaco-llama2-1k dataset.
- Developed by: Iago Gaspar
- Shared by: AI Flow Solutions
- Model type: Causal Language Model
- Language(s) (NLP): English (European Portuguese, pt-PT, planned)
- License: Llama 2
- Fine-tuned from model: meta-llama/Llama-2-13b-chat-hf
Model Sources
- Repository: https://huggingface.co/Wtzwho/llama-2-13b-apollo-guana/tree/main
Uses
Direct Use
This model can be used for various NLP tasks such as text generation, summarization, and question answering.
Downstream Use
The model can be further fine-tuned for more specific downstream tasks such as sentiment analysis or translation.
Out-of-Scope Use
The model is not intended for generating harmful or biased content.
Bias, Risks, and Limitations
The model inherits the biases present in the training data and the base model. Users should be cautious while using the model in sensitive applications.
Recommendations
Users should evaluate the model for biases and other ethical considerations before deploying it for real-world applications.
How to Get Started with the Model
The model can be loaded with the Transformers library and used for text generation, as shown in the example below.
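A minimal sketch of loading the model for inference; the [INST] chat prompt format of the Llama-2-chat base model is assumed, and the repository id is taken from the Model Sources section above.

```python
# Minimal sketch: load the fine-tuned model from the Hub and generate text.
# The [INST] prompt format of the Llama-2-chat base model is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "Wtzwho/llama-2-13b-apollo-guana"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "<s>[INST] What is a large language model? [/INST]"
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```

For GPUs with limited memory, the model can instead be loaded in 4-bit precision using the bitsandbytes parameters documented further down this card.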
Environmental Impact
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture and Objective
The architecture is based on the Llama-2-13b-chat-hf model with causal language modeling as the primary objective.
Compute Infrastructure
Hardware
Training was conducted on a T4 GPU.
Software
The model was trained using the following Python packages:
- accelerate==0.21.0
- peft==0.4.0
- bitsandbytes==0.40.2
- transformers==4.31.0
- trl==0.4.7
Training Details
Training Data
The model was fine-tuned on a dataset named mlabonne/guanaco-llama2-1k.
Preprocessing
The text data was tokenized using the LLaMA tokenizer.
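A sketch of this step, using the dataset and base-model tokenizer named elsewhere in this card; the datasets library (not listed under Software) and the padding settings are assumptions, following common Llama 2 fine-tuning practice.

```python
# Load the fine-tuning dataset and the LLaMA tokenizer of the base model.
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
tokenizer.pad_token = tokenizer.eos_token  # assumption: LLaMA has no pad token, reuse EOS
tokenizer.padding_side = "right"           # assumption: right padding for fp16 training

# Inspect the first tokenized example.
print(tokenizer(dataset[0]["text"])["input_ids"][:10])
```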
Training Hyperparameters
- Training regime: fp32
- Learning rate: 2e-4
- Weight decay: 0.001
- Batch size per GPU for training: 4
- Batch size per GPU for evaluation: 4
- Gradient accumulation steps: 1
- Maximum gradient norm (gradient clipping): 0.3
- Optimizer: paged_adamw_32bit
- Learning rate schedule: cosine
- Number of training epochs: 1
- Hardware Type: T4 GPU
- Quantization type (fp4 or nf4): nf4
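As a sketch, the values above map onto transformers.TrainingArguments (version 4.31.0) roughly as follows; the output directory is illustrative, and fp16/bf16 are left at their False defaults to match the fp32 training regime listed above.

```python
# Hypothetical mapping of the listed hyperparameters onto TrainingArguments.
# Only values stated in this card are set explicitly.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # illustrative path, not from the card
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,                 # gradient clipping
    lr_scheduler_type="cosine",
    fp16=False,                        # fp32 training regime as listed above
    bf16=False,
)
```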
Additional Features
QLoRA parameters
- LoRA attention dimension: 64
- Alpha parameter for LoRA scaling: 16
- Dropout probability for LoRA layers: 0.1
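A sketch of the corresponding adapter configuration with peft 0.4.0; the bias and task_type settings are typical QLoRA choices assumed here, not taken from the card.

```python
# LoRA adapter configuration with the dimensions listed above.
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,              # LoRA attention dimension
    lora_alpha=16,     # alpha scaling parameter
    lora_dropout=0.1,  # dropout probability for LoRA layers
    bias="none",       # assumption: common default for QLoRA fine-tuning
    task_type="CAUSAL_LM",
)
```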
bitsandbytes parameters
- Activate 4-bit precision base model loading: True
- Compute dtype for 4-bit base models: float16
- Activate nested quantization for 4-bit base models (double quantization): False
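These flags correspond to a BitsAndBytesConfig roughly like the sketch below (transformers 4.31.0 with bitsandbytes 0.40.2), applied when loading the base model for fine-tuning.

```python
# 4-bit quantized loading of the base model, mirroring the flags above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # activate 4-bit base model loading
    bnb_4bit_quant_type="nf4",             # quantization type
    bnb_4bit_compute_dtype=torch.float16,  # compute dtype for the 4-bit base model
    bnb_4bit_use_double_quant=False,       # nested (double) quantization disabled
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```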
Evaluation
Testing Data
[soon...]
Factors
[soon...]
Metrics
[soon...]