textclassification roberta robertabase sentimentanalysis nlp tweetanalysis tweet analysis sentiment positive newsanalysis

<b>BYRD'S I - ROBERTA BASED TWEET/REVIEW/TEXT ANALYSIS</b>

This is a ro<b>BERT</b>a-base model fine-tuned on 8 datasets (~20M tweets). It is best suited for English, though it can do a fair job on other languages.

<b>Git Repo:</b><a href = "https://github.com/Caffeine-Coders/Sentiment-Analysis-Project"> SENTIMENTANALYSIS-PROJECT</a>

<b>Demo:</b><a href = "https://byrdi.netlify.app/"> BYRD'S I</a>

<b>Labels: </b> 0 -> Negative; 1 -> Neutral; 2 -> Positive
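If readable label names are preferred over raw ids, a minimal mapping (the dict name here is my own) works with the prediction code further below:

id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}
# e.g. [id2label[i] for i in labelassign(result)]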

<b>Model Metrics</b><br/> <b>Accuracy: </b> ~96% <br/> <b>Sparse Categorical Accuracy: </b> 0.9597 <br/> <b>Loss: </b> 0.1144 <br/> <b>val_loss (last training run): </b> 0.1482 <br/> <b>Note: </b> Due to discrepancies in the Neutral portion of the dataset, we published a second model, <a href = "https://huggingface.co/AK776161/birdseye_roberta-base-18"> Byrd's I positive/negative-only model</a>, to help isolate Neutral data, and we combine the two models with the <b>AdaBoot</b> method (averaging their logits, as in the code below) to get an accurate output.
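For intuition, here is a minimal sketch of that averaging step with made-up logits (class order: Negative, Neutral, Positive; all numbers below are hypothetical):

import numpy as np

logits_a = np.array([-1.6, -8.2, 2.6])  # hypothetical logits from the positive/negative-only model
logits_b = np.array([-1.8,  0.1, 2.8])  # hypothetical logits from the tweet-eval model

# the Neutral logit from the first model is below -7, so it is zeroed out
if logits_a[1] < -7:
  logits_a[1] = 0.0

mean_logits = np.mean([logits_a, logits_b], axis=0)  # [-1.7, 0.05, 2.7]
print(mean_logits.argmax())  # 2 -> Positive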

Example of Classification:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np

# Note: from_tf=True below loads TensorFlow checkpoints into PyTorch,
# so both torch and tensorflow must be installed.

# model 0: the positive/negative-only model
tokenizer = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-18", use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-18", from_tf=True)
# model 1: the tweet-eval model
tokenizer1 = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval", use_fast=True)
model1 = AutoModelForSequenceClassification.from_pretrained("AK776161/birdseye_roberta-base-tweet-eval", from_tf=True)

# ----------------------- AdaBoot technique ---------------------------
def nparraymeancalc(arr1, arr2):
  returner = []
  for i in range(len(arr1)):
    # model 0 was trained only on positive/negative data, so an extremely
    # low Neutral logit (< -7) is zeroed out before averaging
    if arr1[i][1] < -7:
      arr1[i][1] = 0
    # elementwise mean of the two models' logits for this input
    returner.append(np.mean([arr1[i], arr2[i]], axis=0))
  return np.array(returner)

def predictions(tokenizedtext):
  # run both models on the same tokenized input
  output1 = model(**tokenizedtext)
  output2 = model1(**tokenizedtext)

  logits1 = output1.logits.detach().numpy()
  logits2 = output2.logits.detach().numpy()

  # combine the two sets of logits (AdaBoot averaging step)
  return nparraymeancalc(logits1, logits2)

def labelassign(predictionresult):
  # map each row of averaged logits to its argmax label id
  # (0 -> Negative, 1 -> Neutral, 2 -> Positive)
  labels = []
  for i in predictionresult:
    labels.append(i.argmax())
  return labels

# tokenize your text (a single string or a list of strings)
tokenizeddata = tokenizer("----YOUR_TEXT---", return_tensors='pt', padding=True, truncation=True)
result = predictions(tokenizeddata)

print(labelassign(result))

Output for "I LOVE YOU" (per-class probabilities):

1) Positive: 0.994
2) Negative: 0.000
3) Neutral: 0.006
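The snippet above prints label ids; to get per-class probabilities like those shown, one option is to apply a softmax over the averaged logits (an assumption on my part, since the reported scores sum to 1):

def to_probabilities(logits):
  # numerically stable softmax over the last axis
  shifted = logits - logits.max(axis=-1, keepdims=True)
  exps = np.exp(shifted)
  return exps / exps.sum(axis=-1, keepdims=True)

probs = to_probabilities(result)
for negative, neutral, positive in probs:
  print(f"Positive: {positive:.3f}, Negative: {negative:.3f}, Neutral: {neutral:.3f}")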