BertNER

This model fine-tuned for the Named Entity Recognition (NER) task on a mixed NER dataset collected from ARMAN, PEYMA, and WikiANN that covered ten types of entities:

Date (DAT)
Event (EVE)
Facility (FAC)
Location (LOC)
Money (MON)
Organization (ORG)
Percent (PCT)
Person (PER)
Product (PRO)
Time (TIM)

Dataset Information

	Records	B-DAT	B-EVE	B-FAC	B-LOC	B-MON	B-ORG	B-PCT	B-PER	B-PRO	B-TIM	I-DAT	I-EVE	I-FAC	I-LOC	I-MON	I-ORG	I-PCT	I-PER	I-PRO	I-TIM
Train	29133	1423	1487	1400	13919	417	15926	355	12347	1855	150	1947	5018	2421	4118	1059	19579	573	7699	1914	332
Valid	5142	267	253	250	2362	100	2651	64	2173	317	19	373	799	387	717	270	3260	101	1382	303	35
Test	6049	407	256	248	2886	98	3216	94	2646	318	43	568	888	408	858	263	3967	141	1707	296	78

Evaluation

The following tables summarize the scores obtained by model overall and per each class.

Overall

Model	accuracy	precision	recall	f1
Bert	0.995086	0.953454	0.961113	0.957268

Per entities

	number	precision	recall	f1
DAT	407	0.860636	0.864865	0.862745
EVE	256	0.969582	0.996094	0.982659
FAC	248	0.976190	0.991935	0.984000
LOC	2884	0.970232	0.971914	0.971072
MON	98	0.905263	0.877551	0.891192
ORG	3216	0.939125	0.954602	0.946800
PCT	94	1.000000	0.968085	0.983784
PER	2645	0.965244	0.965974	0.965608
PRO	318	0.981481	1.000000	0.990654
TIM	43	0.692308	0.837209	0.757895

How To Use

You use this model with Transformers pipeline for NER.

Installing requirements

pip install transformers

How to predict using pipeline

from transformers import AutoTokenizer
from transformers import AutoModelForTokenClassification  # for pytorch
from transformers import TFAutoModelForTokenClassification  # for tensorflow
from transformers import pipeline


model_name_or_path = "HooshvareLab/bert-fa-zwnj-base-ner" 
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # Pytorch
# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # Tensorflow

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "در سال ۲۰۱۳ درگذشت و آندرتیکر و کین برای او مراسم یادبود گرفتند."

ner_results = nlp(example)
print(ner_results)

Questions?

Post a Github issue on the ParsNER Issues repo.