Model Card for Apollo-13B-pt
Model Details
Model Description
This model is a fine-tuned version of meta-llama/Llama-2-13b-chat-hf, trained for causal language modeling. Fine-tuning was performed on the mlabonne/guanaco-llama2-1k and vlofgren/cabrita-and-guanaco-PTBR datasets.
- Developed by: Iago Gaspar
- Shared by: AI Flow Solutions
- Model type: Causal Language Model
- Language(s) (NLP): pt-PT (European Portuguese)
- License: Llama 2 Community License
- Finetuned from model: meta-llama/Llama-2-13b-chat-hf
Model Sources
- Repository: Hugging Face Repository
Uses
Direct Use
The model can be used directly for Portuguese text generation, including conversational responses, summarization, and question answering.
Downstream Use
The model can be further fine-tuned for more specific tasks such as sentiment analysis, translation, etc.
Out-of-Scope Use
The model is not intended for generating harmful or biased content.
Bias, Risks, and Limitations
The model inherits the biases present in the training data and in the base model. Users should exercise caution when applying it in sensitive applications.
Recommendations
Users should evaluate the model for biases and other ethical considerations before deploying it in real-world applications.
How to Get Started with the Model
The model can be loaded with the Transformers library and used for text generation, as shown in the example below.
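The following is a minimal sketch, assuming the model is published on the Hugging Face Hub; the repository id "Apollo-13B-pt" is a placeholder and should be replaced with the actual repository path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Apollo-13B-pt"  # placeholder; replace with the actual repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 13B model on a single GPU
    device_map="auto",
)

prompt = "Explica, em poucas palavras, o que é um modelo de linguagem."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```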
Environmental Impact
- Hardware Type: NVIDIA A100 GPU
- Hours used: 6
- Cloud Provider: Google Cloud
- Compute Region: EU
- Carbon Emitted: approximately 0.71 kg CO₂eq
Technical Specifications
Model Architecture
The architecture is based on the Llama-2-13b-chat-hf model with causal language modeling as the primary objective.
Training Details
- Hardware: NVIDIA A100 GPU for 5 hours
- Software:
- accelerate==0.21.0
- peft==0.4.0
- bitsandbytes==0.40.2
- transformers==4.31.0
- trl==0.4.7
- Training Data:
- mlabonne/guanaco-llama2-1k
- vlofgren/cabrita-and-guanaco-PTBR
- Learning Rate: 2e-4
- Batch Size: 4 (per GPU)
- Optimizer: paged_adamw_32bit
- Quantization: 4-bit with nf4 type
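As a rough sketch only, the hyperparameters above correspond to a `TrainingArguments` object along these lines (all other arguments are assumptions or library defaults; the original training script is not reproduced in this card):

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",          # assumed output directory
    per_device_train_batch_size=4,   # batch size of 4 per GPU
    learning_rate=2e-4,              # learning rate
    optim="paged_adamw_32bit",       # paged 32-bit AdamW optimizer
)
```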
QLoRA Parameters
- LoRA Attention Dimension: 64
- Alpha Parameter for LoRA Scaling: 16
- Dropout Probability for LoRA Layers: 0.1
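A minimal `LoraConfig` sketch with the values above, using the peft version listed in the software section (the task type and bias setting are assumptions):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,              # LoRA attention dimension
    lora_alpha=16,     # alpha parameter for LoRA scaling
    lora_dropout=0.1,  # dropout probability for LoRA layers
    bias="none",       # assumed default
    task_type="CAUSAL_LM",
)
```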
Bitsandbytes Parameters
- 4-bit Precision Base Model Loading: True
- Compute dtype for 4-bit Base Models: float16
- Nested Quantization for 4-bit Base Models: False
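These settings map onto a `BitsAndBytesConfig` roughly as sketched below; loading the base model this way is an assumption about the original training setup.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit precision base model loading
    bnb_4bit_quant_type="nf4",             # nf4 quantization type
    bnb_4bit_compute_dtype=torch.float16,  # compute dtype for 4-bit base model
    bnb_4bit_use_double_quant=False,       # nested quantization disabled
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```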
Additional Features
- The model supports Weights & Biases integration.
- Provides various options for inference post-training.
- Compatible with larger models up to 70B parameters.
- Supports merging of weights for enhanced performance.
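If the LoRA adapter weights are distributed separately from a merged model, folding them back into the fp16 base model can be sketched as follows (the adapter repository id shown is hypothetical):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Apollo-13B-pt-adapter")  # hypothetical adapter id
merged_model = model.merge_and_unload()  # fold LoRA weights into the base model
merged_model.save_pretrained("./apollo-13b-pt-merged")
```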
Evaluation
Testing Data
[Soon...]
Factors
[Soon...]
Metrics
[Soon...]