NLP-reviews

This model is a fine-tuned version of bert-base-uncased on the Sentiment Labelled Sentences Data Set.

Model description

Given a sentence, the model returns the probability that its sentiment is positive or negative, along with the probabilities that it is a review from amazon.com, imdb.com, or yelp.com.

It is a multi-label classification model that predicts both the sentiment of a text and the website the text most likely came from.
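In a multi-label setup, each output is usually scored independently with a sigmoid rather than jointly with a softmax, so the probabilities need not sum to 1. The sketch below illustrates that step only. The label order and the logit values are assumptions made for illustration; the real order is defined by the model's configuration.

```python
import math

# Hypothetical label order (an assumption, not taken from the model config).
LABELS = ["positive", "negative", "amazon", "imdb", "yelp"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logits_to_probabilities(logits):
    """Map one logit per label to an independent probability per label."""
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


# Made-up logits for a single sentence, for illustration only.
example_logits = [2.0, -2.0, -1.0, 0.0, 1.5]
probabilities = logits_to_probabilities(example_logits)

for label, p in probabilities.items():
    print(f"{label}: {p:.3f}")
```

Because each label is scored on its own, a sentence can receive a high probability for one sentiment and for one website at the same time.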

Training and evaluation data

The data comes from the Sentiment Labelled Sentences Data Set.

Each entry has a sentiment score: 1 for positive or 0 for negative.

The data comes from one of three different websites:

amazon.com
imdb.com
yelp.com
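One plausible way to turn a sentiment score and a source website into a multi-label training target is a multi-hot vector. The sketch below assumes five outputs in the order positive, negative, amazon, imdb, yelp; that layout and the helper name `encode_targets` are hypothetical and may differ from the training code linked in this card.

```python
from typing import List

# Hypothetical ordering of the website labels (an assumption).
SOURCES = ["amazon", "imdb", "yelp"]


def encode_targets(sentiment: int, source: str) -> List[float]:
    """Build a multi-hot target: [positive, negative, amazon, imdb, yelp]."""
    if sentiment not in (0, 1):
        raise ValueError("sentiment must be 1 (positive) or 0 (negative)")
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    sentiment_part = [1.0, 0.0] if sentiment == 1 else [0.0, 1.0]
    source_part = [1.0 if source == name else 0.0 for name in SOURCES]
    return sentiment_part + source_part


# A positive sentence taken from yelp.com.
print(encode_targets(1, "yelp"))
```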

Each website contributes 500 positive and 500 negative sentences. They were selected randomly from a larger dataset of reviews and chosen for having a clearly positive or negative connotation.

The data was split 90-10 into training and test sets for model training and evaluation.
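With 3 websites and 1,000 sentences each, a 90-10 split gives 2,700 training and 300 test sentences. The sketch below shows one simple way to produce such a split. The shuffling approach, the helper name `train_test_split`, and reusing seed 42 for the split are assumptions; the linked training code may split the data differently (for example, with stratification).

```python
import random


def train_test_split(items, test_fraction=0.1, seed=42):
    """Shuffle a copy of `items` and hold out `test_fraction` of it."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder ids standing in for the 3 x (500 + 500) sentences.
sentence_ids = list(range(3 * (500 + 500)))
train_ids, test_ids = train_test_split(sentence_ids)

print(len(train_ids), len(test_ids))
```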

The code used to train the model is at https://github.com/josephtkim/huggingface-sentiment-analysis.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 10
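To make the linear scheduler concrete, the sketch below restates the listed values as a plain dictionary and computes the learning rate at a given step. It assumes zero warmup steps (no warmup is listed here) and takes the total of 3,380 steps from the training results table. The dictionary keys follow the Transformers `TrainingArguments` parameter names, but the exact arguments the author passed are not shown in this card.

```python
# Listed hyperparameters, keyed by Transformers TrainingArguments names.
HYPERPARAMETERS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
}

TOTAL_STEPS = 3380  # final step in the training results table


def linear_lr(step, base_lr=HYPERPARAMETERS["learning_rate"],
              total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1690, 3380):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 5e-05, is half that value midway through training, and reaches zero at the last step.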

Training results

Training Loss	Epoch	Step	Validation Loss
No log	1.0	338	0.2270
0.2235	2.0	676	0.2737
0.0644	3.0	1014	0.3171
0.0644	4.0	1352	0.3511
0.0193	5.0	1690	0.3726
0.0119	6.0	2028	0.3638
0.0119	7.0	2366	0.3337
0.0043	8.0	2704	0.3424
0.0019	9.0	3042	0.3387
0.0019	10.0	3380	0.3467
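The step counts and the repeated training-loss values in the table can be reproduced with a little arithmetic. The sketch below assumes 2,700 training sentences (90% of 3,000), a single device with no gradient accumulation, and training loss logged every 500 steps, which is the Trainer default; the logging interval is not stated in this card. Under those assumptions, each row shows the most recent logged loss, which explains "No log" at step 338 and identical values in consecutive epochs.

```python
import math

TRAIN_EXAMPLES = 2700  # 90% of 3,000 sentences
BATCH_SIZE = 8
NUM_EPOCHS = 10
LOGGING_STEPS = 500    # assumption: Trainer default logging interval

# Optimizer steps per epoch; the last batch of each epoch is partial.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)

# Step at which each end-of-epoch evaluation happens.
eval_steps = [steps_per_epoch * epoch for epoch in range(1, NUM_EPOCHS + 1)]

# Most recent logging step at each evaluation (0 means nothing logged yet).
last_logged = [(step // LOGGING_STEPS) * LOGGING_STEPS for step in eval_steps]

# Validation losses copied from the table, one per epoch.
validation_loss = [0.2270, 0.2737, 0.3171, 0.3511, 0.3726,
                   0.3638, 0.3337, 0.3424, 0.3387, 0.3467]
best_epoch = validation_loss.index(min(validation_loss)) + 1

print(steps_per_epoch, eval_steps[-1])
print(last_logged)
print(best_epoch)
```

The lowest validation loss in the table occurs after the first epoch while training loss keeps falling, which may indicate overfitting in later epochs.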

Framework versions

Transformers 4.29.1
Pytorch 2.0.0+cu118
Datasets 2.12.0
Tokenizers 0.13.3