Hebrew NER Model
A NER model based on RoBERTa (HeRo) for extracting entities from Hebrew text.
You can clone this model using the following command:
git clone https://huggingface.co/etzion/hebrew_ner
For an implementation example, see the following Google Colab notebook:
Code example
Import Python libraries
from transformers import AutoModelForTokenClassification, AutoTokenizer
from transformers.pipelines.pt_utils import KeyDataset
from transformers import pipeline
import pandas as pd
import torch
Define model
model_path = 'etzion/hebrew_ner'
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForTokenClassification.from_pretrained(model_path)
ner = pipeline("ner", model=model, tokenizer=tokenizer, device=device, aggregation_strategy="average")
ner
<transformers.pipelines.token_classification.TokenClassificationPipeline at 0x7cda23542050>
Activate NER
text = 'דני ובני נסעו לתל אביב כדי להיפגש עם מנכ"ל גוגל'
pd.DataFrame(ner(text))
entity_group | score | word | start | end |
---|---|---|---|---|
PER | 0.999 | דני | 0 | 3 |
PER | 0.947 | ובני | 4 | 8 |
GPE | 0.999 | לתל אביב | 14 | 22 |
TTL | 0.823 | מנכ"ל גוגל | 37 | 47 |
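Post-process NER output

The pipeline returns one dictionary per entity, so the result can also be grouped by label without pandas. The following is a minimal sketch, not part of the model itself: `group_entities` and `min_score` are hypothetical names, the 0.9 threshold is an arbitrary choice, and the `entities` list is copied by hand from the table above so the snippet runs without downloading the model. In practice you would pass `ner(text)` instead.

```python
from collections import defaultdict

# Sample input and pipeline output, copied from the table above.
# In real use: entities = ner(text)
text = 'דני ובני נסעו לתל אביב כדי להיפגש עם מנכ"ל גוגל'
entities = [
    {"entity_group": "PER", "score": 0.999, "word": "דני", "start": 0, "end": 3},
    {"entity_group": "PER", "score": 0.947, "word": "ובני", "start": 4, "end": 8},
    {"entity_group": "GPE", "score": 0.999, "word": "לתל אביב", "start": 14, "end": 22},
    {"entity_group": "TTL", "score": 0.823, "word": 'מנכ"ל גוגל', "start": 37, "end": 47},
]


def group_entities(text, entities, min_score=0.0):
    """Group entity spans by label, dropping entities scored below min_score.

    Spans are sliced from the original text with the start/end character
    offsets rather than taken from the "word" field.
    """
    grouped = defaultdict(list)
    for ent in entities:
        if ent["score"] < min_score:
            continue  # skip low-confidence predictions
        span = text[ent["start"]:ent["end"]]
        grouped[ent["entity_group"]].append(span)
    return dict(grouped)


# Keep only entities with a score of at least 0.9 (hypothetical threshold).
by_label = group_entities(text, entities, min_score=0.9)
print(by_label)
```

With this threshold the TTL entity (score 0.823) is dropped. Spans are returned exactly as the offsets delimit them, so attached prefixes such as the ו in ובני are kept.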