Model Card for Apollo-13B-pt
Model Details
Model Description
This model is a fine-tuned version of meta-llama/Llama-2-13b-chat-hf, trained for causal language modeling. Fine-tuning was performed on the mlabonne/guanaco-llama2-1k and vlofgren/cabrita-and-guanaco-PTBR datasets.
- Developed by: Iago Gaspar
- Shared by: AI Flow Solutions
- Model type: Causal Language Model
- Language(s) (NLP): pt-PT (European Portuguese)
- License: Llama 2 Community License
- Finetuned from model: meta-llama/Llama-2-13b-chat-hf
Model Sources
- Repository: Hugging Face Repository
Uses
Direct Use
The model can be used directly for Portuguese text generation, including conversational responses, summarization, and question answering.
Downstream Use
The model can be further fine-tuned for more specific tasks such as sentiment analysis, translation, etc.
Out-of-Scope Use
The model is not intended for generating harmful or biased content.
Bias, Risks, and Limitations
The model inherits the biases present in the training data and in the base model. Users should exercise caution when applying it in sensitive applications.
Recommendations
Users should evaluate the model for biases and other ethical considerations before deploying it in real-world applications.
How to Get Started with the Model
The model can be loaded with the Transformers library and used for text generation, as shown in the example below.
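The following is a minimal sketch, assuming the model is published on the Hugging Face Hub; the repository id "Apollo-13B-pt" is a placeholder and should be replaced with the actual repository path.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Apollo-13B-pt"  # placeholder; replace with the actual repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit the 13B model on a single GPU
    device_map="auto",
)

prompt = "Explica, em poucas palavras, o que é um modelo de linguagem."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```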
Environmental Impact
- Hardware Type: NVIDIA A100 GPU
- Hours used: 6
- Cloud Provider: Google Cloud
- Compute Region: EU
- Carbon Emitted: approximately 0.71 kg CO₂eq
Technical Specifications
Model Architecture
The architecture is based on the Llama-2-13b-chat-hf model with causal language modeling as the primary objective.
Training Details
- Hardware: NVIDIA A100 GPU for 5 hours
- Software:
- accelerate==0.21.0
- peft==0.4.0
- bitsandbytes==0.40.2
- transformers==4.31.0
- trl==0.4.7
- Training Data:
- mlabonne/guanaco-llama2-1k
- vlofgren/cabrita-and-guanaco-PTBR
- Learning Rate: 2e-4
- Batch Size: 4 (per GPU)
- Optimizer: paged_adamw_32bit
- Quantization: 4-bit with nf4 type
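As a rough sketch only, the hyperparameters above correspond to a `TrainingArguments` object along these lines (all other arguments are assumptions or library defaults; the original training script is not reproduced in this card):

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",          # assumed output directory
    per_device_train_batch_size=4,   # batch size of 4 per GPU
    learning_rate=2e-4,              # learning rate
    optim="paged_adamw_32bit",       # paged 32-bit AdamW optimizer
)
```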
QLoRA Parameters
- LoRA Attention Dimension: 64
- Alpha Parameter for LoRA Scaling: 16
- Dropout Probability for LoRA Layers: 0.1
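A minimal `LoraConfig` sketch with the values above, using the peft version listed in the software section (the task type and bias setting are assumptions):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,              # LoRA attention dimension
    lora_alpha=16,     # alpha parameter for LoRA scaling
    lora_dropout=0.1,  # dropout probability for LoRA layers
    bias="none",       # assumed default
    task_type="CAUSAL_LM",
)
```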
Bitsandbytes Parameters
- 4-bit Precision Base Model Loading: True
- Compute dtype for 4-bit Base Models: float16
- Nested Quantization for 4-bit Base Models: False
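These settings map onto a `BitsAndBytesConfig` roughly as sketched below; loading the base model this way is an assumption about the original training setup.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit precision base model loading
    bnb_4bit_quant_type="nf4",             # nf4 quantization type
    bnb_4bit_compute_dtype=torch.float16,  # compute dtype for 4-bit base model
    bnb_4bit_use_double_quant=False,       # nested quantization disabled
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
```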
Additional Features
- The model supports Weights & Biases integration.
- Provides various options for inference post-training.
- Compatible with larger models up to 70B parameters.
- Supports merging of weights for enhanced performance.
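If the LoRA adapter weights are distributed separately from a merged model, folding them back into the fp16 base model can be sketched as follows (the adapter repository id shown is hypothetical):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Apollo-13B-pt-adapter")  # hypothetical adapter id
merged_model = model.merge_and_unload()  # fold LoRA weights into the base model
merged_model.save_pretrained("./apollo-13b-pt-merged")
```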
Evaluation
Testing Data
[Soon...]
Factors
[Soon...]
Metrics
[Soon...]