MUmairAB/bert-ner

The model training notebook is available on my GitHub Repo.

This model is a fine-tuned version of bert-base-cased on the CoNLL-2003 dataset. It achieves the following results on the evaluation set:

Train Loss: 0.0003
Validation Loss: 0.0880
Epoch: 19

How to use this model

# Install the transformers library (the leading "!" is notebook syntax)
!pip install transformers

# Import the pipeline
from transformers import pipeline

# Load the model from the Hugging Face Hub
checkpoint = "MUmairAB/bert-ner"
model = pipeline(task="token-classification",
                 model=checkpoint)

# Run the model on a sample sentence and print the token-level predictions
raw_text = "My name is umair and i work at Swits AI in Antarctica."
print(model(raw_text))
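
The pipeline returns one prediction per token, so an entity that spans several tokens arrives in pieces. As a minimal sketch, the snippet below merges such token-level predictions into whole entity spans. It assumes the usual output format of the token-classification pipeline (dicts with `entity`, `word`, `start`, `end`) and BIO-style labels such as `B-ORG`/`I-ORG`. The helper name `group_entities` and the sample predictions are hypothetical: they were written by hand for illustration and were not produced by this model.

```python
def group_entities(text, predictions):
    """Merge token-level BIO predictions into (label, surface text) spans."""
    spans = []  # each item: [label, start_char, end_char]
    for pred in predictions:
        if pred["entity"] == "O":
            continue  # not part of any entity
        prefix, _, label = pred["entity"].partition("-")
        continues_span = (
            prefix == "I"
            and spans
            and spans[-1][0] == label
            # only merge tokens that touch or are separated by one character
            and pred["start"] - spans[-1][2] <= 1
        )
        if continues_span:
            spans[-1][2] = pred["end"]
        else:
            spans.append([label, pred["start"], pred["end"]])
    return [(label, text[start:end]) for label, start, end in spans]


# Hand-written sample in the pipeline's output format (illustrative only).
text = "Ada works at Acme Corp in Paris."
sample = [
    {"entity": "B-PER", "word": "Ada", "start": 0, "end": 3},
    {"entity": "B-ORG", "word": "Acme", "start": 13, "end": 17},
    {"entity": "I-ORG", "word": "Corp", "start": 18, "end": 22},
    {"entity": "B-LOC", "word": "Paris", "start": 26, "end": 31},
]
entities = group_entities(text, sample)
print(entities)  # [('PER', 'Ada'), ('ORG', 'Acme Corp'), ('LOC', 'Paris')]
```

The pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping internally, so a hand-rolled helper like this is only needed when custom merging rules are wanted.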

Model description

Model: "tf_bert_for_token_classification"

_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 bert (TFBertMainLayer)      multiple                  107719680 
                                                                 
 dropout_37 (Dropout)        multiple                  0         
                                                                 
 classifier (Dense)          multiple                  6921      
                                                                 
=================================================================
Total params: 107,726,601
Trainable params: 107,726,601
Non-trainable params: 0
_________________________________________________________________
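
The parameter counts in the summary can be reproduced by hand. The sketch below assumes the standard bert-base hidden size of 768 and the nine BIO labels of CoNLL-2003 (`O` plus `B-`/`I-` tags for the four entity types). Neither number is stated explicitly in this card, but together they match the reported classifier size.

```python
hidden_size = 768   # assumption: hidden width of bert-base-cased
num_labels = 9      # assumption: O + B-/I- tags for PER, ORG, LOC, MISC

# A Dense layer has one weight per (input, output) pair plus one bias per output.
classifier_params = hidden_size * num_labels + num_labels

# Encoder parameter count taken from the summary above.
encoder_params = 107_719_680
total_params = encoder_params + classifier_params

print(classifier_params)  # 6921
print(total_params)       # 107726601
```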

Intended uses & limitations

This model can be used for named entity recognition tasks. It is trained on the CoNLL-2003 dataset. The model can classify four types of named entities:

persons,
locations,
organizations, and
names of miscellaneous entities that do not belong to the previous three groups.

Training and evaluation data

The model is evaluated with the seqeval metric, and the results are as follows:

{'LOC': {'precision': 0.9655361050328227,
  'recall': 0.9608056614044638,
  'f1': 0.9631650750341064,
  'number': 1837},
 'MISC': {'precision': 0.8789144050104384,
  'recall': 0.913232104121475,
  'f1': 0.8957446808510638,
  'number': 922},
 'ORG': {'precision': 0.9075144508670521,
  'recall': 0.9366144668158091,
  'f1': 0.9218348623853211,
  'number': 1341},
 'PER': {'precision': 0.962011771000535,
  'recall': 0.9761129207383279,
  'f1': 0.9690110482349771,
  'number': 1842},
 'overall_precision': 0.9374068554396423,
 'overall_recall': 0.9527095254123191,
 'overall_f1': 0.944996244053084,
 'overall_accuracy': 0.9864013657502796}
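
Each `f1` value above is the harmonic mean of the corresponding precision and recall. A minimal sketch that recomputes two of the reported scores from the numbers in the dictionary; the helper name `f1_score` is illustrative and is not part of seqeval.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied from the results above
reported = {
    "LOC": (0.9655361050328227, 0.9608056614044638, 0.9631650750341064),
    "overall": (0.9374068554396423, 0.9527095254123191, 0.944996244053084),
}

recomputed = {name: f1_score(p, r) for name, (p, r, _) in reported.items()}
for name, value in recomputed.items():
    print(f"{name}: {value:.4f}")
```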

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 2e-05, 'decay_steps': 17560, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.01}
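
With `power` set to 1.0 and `end_learning_rate` set to 0.0, the `PolynomialDecay` schedule above reduces to a linear decay from 2e-05 to zero over 17560 steps. The sketch below reimplements the non-cycling form of that formula in plain Python to show the shape of the schedule. The function name `polynomial_decay` is illustrative, and this is not the Keras implementation itself.

```python
def polynomial_decay(step, initial_lr=2e-05, end_lr=0.0,
                     decay_steps=17560, power=1.0):
    """Learning rate at a given step for a polynomial decay with cycle=False."""
    step = min(step, decay_steps)  # the rate stays at end_lr after decay_steps
    remaining = 1 - step / decay_steps
    return (initial_lr - end_lr) * remaining ** power + end_lr


# Start, midpoint, and end of the schedule
for step in (0, 8780, 17560):
    print(step, polynomial_decay(step))
```

If the schedule was sized to cover all 20 epochs (0 to 19), 17560 decay steps would correspond to 878 optimizer steps per epoch. That is an inference from the numbers, not something the card states.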
training_precision: float32

Training results

Train Loss	Validation Loss	Epoch
0.1775	0.0635	0
0.0470	0.0559	1
0.0278	0.0603	2
0.0174	0.0603	3
0.0124	0.0615	4
0.0077	0.0722	5
0.0060	0.0731	6
0.0038	0.0757	7
0.0043	0.0731	8
0.0041	0.0735	9
0.0019	0.0724	10
0.0019	0.0786	11
0.0010	0.0843	12
0.0008	0.0814	13
0.0011	0.0867	14
0.0008	0.0883	15
0.0005	0.0861	16
0.0005	0.0869	17
0.0003	0.0880	18
0.0003	0.0880	19
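
The table can also be read programmatically to see where validation loss was lowest. A minimal sketch with the values copied from the table; the variable names are illustrative.

```python
# (train_loss, validation_loss) per epoch, copied from the table above
history = [
    (0.1775, 0.0635), (0.0470, 0.0559), (0.0278, 0.0603), (0.0174, 0.0603),
    (0.0124, 0.0615), (0.0077, 0.0722), (0.0060, 0.0731), (0.0038, 0.0757),
    (0.0043, 0.0731), (0.0041, 0.0735), (0.0019, 0.0724), (0.0019, 0.0786),
    (0.0010, 0.0843), (0.0008, 0.0814), (0.0011, 0.0867), (0.0008, 0.0883),
    (0.0005, 0.0861), (0.0005, 0.0869), (0.0003, 0.0880), (0.0003, 0.0880),
]

# Epoch index with the lowest validation loss
best_epoch = min(range(len(history)), key=lambda epoch: history[epoch][1])
best_val_loss = history[best_epoch][1]

print(best_epoch, best_val_loss)  # 1 0.0559
```

Validation loss is lowest at epoch 1 and rises afterwards while training loss keeps falling, which is commonly read as a sign of overfitting. Whether an earlier checkpoint would score better on seqeval is not reported in this card.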

Framework versions

Transformers 4.30.2
TensorFlow 2.12.0
Datasets 2.13.1
Tokenizers 0.13.3