Satoken
This is a SetFit model trained on multilingual datasets (listed below) for sentiment classification.
The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
It is used by Germla in its feedback analysis tool (specifically the sentiment analysis feature).
For other (language-specific) models, check here
Usage
To use this model for inference, first install the SetFit library:
```shell
python -m pip install setfit
```
You can then run inference as follows:
```python
from setfit import SetFitModel

# Download from Hub and run inference
model = SetFitModel.from_pretrained("germla/satoken")
# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
```
Training Details
Training Data
Training Procedure
We made sure the dataset was balanced. The model was trained on only 35% (50% for Chinese) of the train split of each dataset.
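A balanced 35% subsample can be drawn by sampling the same fraction from each label; the following is a sketch of the idea, not the exact sampling code used.

```python
import random
from collections import defaultdict

def balanced_subsample(texts, labels, fraction, seed=42):
    """Take the same fraction from each label so the subsample stays balanced."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    sampled_texts, sampled_labels = [], []
    for label, items in by_label.items():
        k = int(len(items) * fraction)
        for text in rng.sample(items, k):
            sampled_texts.append(text)
            sampled_labels.append(label)
    return sampled_texts, sampled_labels

texts = [f"example {i}" for i in range(200)]
labels = [i % 2 for i in range(200)]  # 100 per class
sub_texts, sub_labels = balanced_subsample(texts, labels, 0.35)
# 35% of each 100-example class -> 70 examples total, 35 per label
```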
Preprocessing
- Basic cleaning (removal of duplicates, links, mentions, hashtags, etc.)
- Removal of stopwords using NLTK
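The cleaning steps above can be sketched as follows; the regexes and the inline stopword set are illustrative assumptions (the actual pipeline used NLTK's per-language stopword lists).

```python
import re

# Small illustrative stopword set; the real pipeline uses NLTK's stopword corpora.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")

def clean(text: str) -> str:
    """Strip links, mentions, and hashtags, then drop stopwords."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = HASHTAG_RE.sub("", text)
    tokens = [t for t in text.split() if t.lower() not in STOPWORDS]
    return " ".join(tokens)

def dedupe(texts):
    """Remove duplicate examples while preserving order."""
    return list(dict.fromkeys(texts))

raw = [
    "Check this out https://t.co/x #spam @bot the app is great",
    "Check this out https://t.co/x #spam @bot the app is great",
    "an okay experience",
]
cleaned = [clean(t) for t in dedupe(raw)]
# -> ["Check this out app great", "okay experience"]
```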
Speeds, Sizes, Times
Training took 6 hours on an NVIDIA T4 GPU.
Evaluation
Testing Data, Factors & Metrics
Environmental Impact
- Hardware Type: NVIDIA T4 GPU
- Hours used: 6
- Cloud Provider: Amazon Web Services
- Compute Region: ap-south-1 (Mumbai)
- Carbon Emitted: 0.39 kg CO₂ eq.