Hebrew NER Model

A named entity recognition (NER) model based on RoBERTa (HeRo) for extracting entities from Hebrew text.

You can clone this model using the following command:

git clone https://huggingface.co/etzion/hebrew_ner

For an implementation example, see the following Google Colab notebook:

Google Colab - Hebrew NER

Code example

Import Python libraries

from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers.pipelines.pt_utils import KeyDataset
from transformers import pipeline
import pandas as pd
import torch

Define model

model_path = 'etzion/hebrew_ner'

device     = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tokenizer  = AutoTokenizer.from_pretrained(model_path)
model      = AutoModelForTokenClassification.from_pretrained(model_path)
ner        = pipeline("ner", model=model, tokenizer=tokenizer, device=device, aggregation_strategy="average")
ner

<transformers.pipelines.token_classification.TokenClassificationPipeline at 0x7cda23542050>
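To tag several texts in one call, the pipeline can be wrapped in a small helper. The sketch below is not part of the original model card: the name `tag_texts` and its `min_score` parameter are hypothetical, and it assumes that calling the pipeline with a list of strings returns one list of entity dicts per input text.

```python
from typing import Callable, List


def tag_texts(ner: Callable, texts: List[str], min_score: float = 0.0) -> List[List[dict]]:
    """Run `ner` over all texts and keep entities with score >= min_score.

    `ner` is expected to behave like the pipeline defined above: given a
    list of strings, it returns one list of entity dicts per string
    (an assumption about the pipeline's batch behaviour).
    """
    results = ner(texts)
    return [
        [entity for entity in entities if entity["score"] >= min_score]
        for entities in results
    ]
```

With the `ner` object defined above, a call might look like `tag_texts(ner, [text_a, text_b], min_score=0.8)`, where `text_a` and `text_b` are placeholder Hebrew strings. The result keeps the input order, so each inner list lines up with the text at the same index.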

Run NER

text = 'דני ובני נסעו לתל אביב כדי להיפגש עם מנכ"ל גוגל'

pd.DataFrame(ner(text))

entity_group  score  word         start  end
PER           0.999  דני          0      3
PER           0.947  ובני         4      8
GPE           0.999  לתל אביב     14     22
TTL           0.823  מנכ"ל גוגל   37     47
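The `start` and `end` columns are character offsets into the input string, so each entity can be recovered by slicing the text. The following sketch groups entities by label and drops low-confidence ones. It uses the rows of the table above as literal data rather than calling the model; the helper name `group_entities` and the 0.9 threshold are illustrative assumptions, not part of the model.

```python
from collections import defaultdict

text = 'דני ובני נסעו לתל אביב כדי להיפגש עם מנכ"ל גוגל'

# Rows copied from the table above, in the dict shape the pipeline returns.
# Scores are the rounded values shown in the table.
entities = [
    {"entity_group": "PER", "score": 0.999, "word": "דני", "start": 0, "end": 3},
    {"entity_group": "PER", "score": 0.947, "word": "ובני", "start": 4, "end": 8},
    {"entity_group": "GPE", "score": 0.999, "word": "לתל אביב", "start": 14, "end": 22},
    {"entity_group": "TTL", "score": 0.823, "word": 'מנכ"ל גוגל', "start": 37, "end": 47},
]


def group_entities(text, entities, min_score=0.0):
    """Map each entity label to the text spans found for it.

    Spans are sliced from `text` using the start/end offsets, and
    entities scoring below `min_score` are skipped.
    """
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            span = text[entity["start"]:entity["end"]]
            grouped[entity["entity_group"]].append(span)
    return dict(grouped)


# With a 0.9 threshold the TTL entity (score 0.823) is filtered out.
grouped = group_entities(text, entities, min_score=0.9)
print(grouped)
```

Slicing by offsets returns the text exactly as it appears in the input, which is useful when the `word` field has been normalized by the tokenizer.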

license: MIT