LinkedCringe v0.2: e5-small
fine-tuned on LinkedCringe v0.2 from intfloat/e5-small
<a href="https://ibb.co/VMJPTwK"><img src="https://i.ibb.co/XFjvtYw/carbon.png" alt="carbon" border="0"></a>
<a href="https://colab.research.google.com/gist/pszemraj/0b0c2663aa38f3b5f2d923010cfda5a8/scratchpad.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> </a>
This is an initial work-in-progress checkpoint, but results so far are reasonable (0.8 accuracy in the evaluation below).
Model
This is a SetFit model that can be used for text classification. The model has been trained using an efficient few-shot learning technique that involves:
- Fine-tuning a Sentence Transformer with contrastive learning.
- Training a classification head with features from the fine-tuned Sentence Transformer.
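To make the first step more concrete, here is a minimal sketch of how sentence pairs for contrastive fine-tuning can be built from a handful of labeled examples. It is an illustration only: `make_contrastive_pairs` is a hypothetical helper, the toy texts are made up, and it enumerates every pair exhaustively, whereas the SetFit library's own pair sampling may differ.

```python
from itertools import combinations
from typing import List, Tuple


def make_contrastive_pairs(
    texts: List[str], labels: List[int]
) -> List[Tuple[str, str, float]]:
    """Build (text_a, text_b, target) triples from a small labeled set.

    target is 1.0 when both texts share a label (positive pair)
    and 0.0 when their labels differ (negative pair).
    """
    pairs = []
    examples = list(zip(texts, labels))
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs


# Toy examples (made up for illustration, not taken from the training data).
texts = ["post A", "post B", "post C", "post D"]
labels = [1, 1, 3, 4]

pairs = make_contrastive_pairs(texts, labels)
print(len(pairs))  # 6 pairs from 4 examples
print(pairs[0])    # ('post A', 'post B', 1.0)
```

Even four examples yield six training pairs, which is roughly why this style of training can work with very little labeled data.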
Labels
This model has been trained (using the method described above) to predict a single class label for `<text>` from the following:
# numeric id: text label
{
1: 'cringe',
2: 'relevant',
3: 'info',
4: 'noise'
}
Usage
To use this model for inference, first install the SetFit library:
python -m pip install setfit
basic inference
You can then run inference as follows:
from setfit import SetFitModel
# Download from Hub and run inference
model = SetFitModel.from_pretrained("pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e")
# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
# manually refer to labels above
preds
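The manual lookup can be scripted with a small helper. This is a sketch under the assumption that predictions come back as integer-like class ids matching the mapping in the Labels section; `ids_to_labels` is a hypothetical name, and `example_preds` is a stand-in rather than real model output.

```python
from typing import Iterable, List

# Mapping copied from the Labels section of this card.
ID2LABEL = {1: "cringe", 2: "relevant", 3: "info", 4: "noise"}


def ids_to_labels(pred_ids: Iterable) -> List[str]:
    """Convert numeric class ids (ints or int-like tensor items) to label text."""
    return [ID2LABEL[int(pred_id)] for pred_id in pred_ids]


# Stand-in for `preds`: hypothetical ids, not actual model predictions.
example_preds = [4, 1]
print(ids_to_labels(example_preds))  # ['noise', 'cringe']
```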
Class object with utils
Create a "custom" wrapper class with the labels:
from typing import Dict, List, Optional

from setfit import SetFitModel


class PostClassifier:
    DEFAULT_ID2LABEL = {1: "cringe", 2: "relevant", 3: "info", 4: "noise"}

    def __init__(
        self,
        model_id: str = "pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e",
        id2label: Optional[Dict[int, str]] = None,
    ):
        """Initialize PostClassifier with model name and/or label mapping."""
        self.model = SetFitModel.from_pretrained(model_id)
        self.id2label = id2label if id2label else self.DEFAULT_ID2LABEL

    def classify(self, texts: List[str]) -> List[str]:
        """Classify list of texts, return list of corresponding labels."""
        preds = self.model(texts)
        return [self.id2label[int(pred)] for pred in preds]

    def predict_proba(self, texts: List[str]) -> List[Dict[str, float]]:
        """Predict label probabilities for a list of texts, return a list of probability dictionaries."""
        proba = self.model.predict_proba(texts)
        # Column i of each row is assumed to correspond to label id i + 1.
        return [
            {self.id2label.get(i + 1, "Unknown"): float(p) for i, p in enumerate(row)}
            for row in proba
        ]

    def __call__(self, texts: List[str]) -> List[str]:
        """Enable class instance to act as a function for text classification."""
        return self.classify(texts)
Instantiate & classify:
# import PostClassifier if you defined it in another script etc
model_name="pszemraj/e5-small-LinkedCringe-setfit-skl-20it-2e"
classifier = PostClassifier(model_name)
# classify some posts (these should all be cringe maaaaybe noise)
posts = [
    "Innovation is our middle name! We're taking synergy to new heights and disrupting the market with our game-changing solutions. Stay tuned for the next paradigm shift! #CorporateRevolution #SynergisticSolutions",
    "Attention all trailblazers! Our cutting-edge product is the epitome of excellence. It's time to elevate your success and ride the wave of unparalleled achievements. Join us on this journey towards greatness! #UnleashYourPotential #SuccessRevolution",
    "We're not just a company, we're a global force for change! Our world-class team is committed to revolutionizing industries and making a lasting impact. Together, let's reshape the future and leave a legacy that will be remembered for ages! 💪 #GlobalTrailblazers #LegacyMakers",
    "🔥 Harness the power of synergy and unlock your true potential with our transformative solutions. Together, we'll ignite a fire of success that will radiate across industries. Join the league of winners and conquer new frontiers! #SynergyChampions #UnleashThePowerWithin",
    "💡 Innovation alert! Our visionary team has cracked the code to redefine excellence. Get ready to be blown away by our mind-boggling breakthroughs that will leave your competitors in the dust. It's time to disrupt the status quo and embrace the future! #InnovationRevolution #ExcellenceUnleashed",
    "Welcome to the era of limitless possibilities! Our revolutionary platform will empower you to transcend boundaries and achieve unprecedented success. Together, let's shape a future where dreams become realities and ordinary becomes extraordinary! ✨ #LimitlessSuccess #DreamBig",
    "Brace yourselves for a seismic shift in the industry! Our game-changing product is set to revolutionize the way you work, think, and succeed. Say goodbye to mediocrity and join the league of pioneers leading the charge towards a brighter tomorrow! #IndustryDisruptors #PioneeringSuccess",
    "Attention all innovators and disruptors! It's time to break free from the chains of convention and rewrite the rulebook of success. Join us on this exhilarating journey as we create a new chapter in the annals of greatness. The sky's not the limit; it's just the beginning! 💫 #BreakingBarriers #UnleashGreatness",
    "Unlock the secret to unprecedented achievements with our exclusive formula for success. Our team of experts has distilled years of wisdom into a powerful elixir that will propel you to the zenith of greatness. It's time to embrace the extraordinary and become a legend in your own right! #FormulaForSuccess #RiseToGreatness",
    "Step into the realm of infinite possibilities and seize the keys to your success. Our groundbreaking solutions will unlock doors you never knew existed, propelling you towards a future filled with limitless growth and prosperity. Dare to dream big and let us be your catalyst for greatness! #UnlockYourPotential #LimitlessSuccess",
]
post_preds = classifier(posts)
print(post_preds)
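When a hard label is too coarse, the probability dictionaries from the wrapper's `predict_proba` can be reduced to a top label with a confidence cutoff. The following is a rough sketch: `top_label` and `min_confidence` are hypothetical names, the 0.5 threshold is an arbitrary assumption, and the probabilities are made-up values rather than model output.

```python
from typing import Dict, List, Optional, Tuple


def top_label(
    probs: Dict[str, float], min_confidence: float = 0.5
) -> Tuple[Optional[str], float]:
    """Return (label, score) for the most likely label.

    The label is None when the best score is below min_confidence.
    """
    label, score = max(probs.items(), key=lambda item: item[1])
    if score < min_confidence:
        return None, score
    return label, score


# Made-up probabilities shaped like the wrapper's predict_proba output.
example_probas: List[Dict[str, float]] = [
    {"cringe": 0.85, "relevant": 0.05, "info": 0.07, "noise": 0.03},
    {"cringe": 0.40, "relevant": 0.10, "info": 0.35, "noise": 0.15},
]

for probs in example_probas:
    print(top_label(probs))
# ('cringe', 0.85)
# (None, 0.4)
```

Returning `None` for low-confidence posts leaves room to route them to manual review instead of forcing a label.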
eval - detailed
***** Running evaluation *****
{'accuracy': 0.8,
'based_model_id': 'intfloat/e5-small',
'tuned_model_id': 'e5-small-LinkedCringe-setfit-skl-20it-2e'}
# 10-post results
['cringe',
'cringe',
'info',
'cringe',
'cringe',
'cringe',
'cringe',
'cringe',
'cringe',
'cringe']
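As a quick sanity check, the ten predictions listed above can be tallied. The sketch below copies that list verbatim; it only summarizes these ten posts and is separate from the 0.8 accuracy figure, which appears to come from the evaluation run.

```python
from collections import Counter

# The 10-post results listed above, copied verbatim.
post_preds = [
    "cringe", "cringe", "info", "cringe", "cringe",
    "cringe", "cringe", "cringe", "cringe", "cringe",
]

counts = Counter(post_preds)
cringe_share = counts["cringe"] / len(post_preds)

print(counts.most_common())   # [('cringe', 9), ('info', 1)]
print(f"{cringe_share:.0%}")  # 90%
```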
BibTeX entry and citation info
Note: this citation is for `setfit`, not for this checkpoint.
@article{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}