Model Name: BERT-base_NER-ar

Model Description:

BERT-base_NER-ar is a fine-tuned BERT multilingual base model for Named Entity Recognition (NER) in Arabic. The base model was pretrained on text from many languages and then fine-tuned for Arabic NER on the "wikiann" dataset. The model is cased: it distinguishes between letter cases, for example between "english" and "English".

Dataset

The model was fine-tuned on the wikiann dataset, which is a multilingual named entity recognition dataset. It contains Wikipedia articles annotated with three types of named entities: LOC (location), PER (person), and ORG (organization). The annotations are in the IOB2 format. The dataset supports 176 of the 282 languages from the original WikiANN corpus.
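
For reference, the Arabic portion of the dataset can be inspected with the Hugging Face datasets library. The snippet below is a minimal sketch; it assumes the dataset is available under the "wikiann" name with an "ar" configuration and the usual tokens/ner_tags columns.

# Minimal sketch: inspect the Arabic split of wikiann (config name assumed to be "ar")
from datasets import load_dataset

dataset = load_dataset("wikiann", "ar")

# Each example holds word-level tokens and their IOB2 tag IDs
example = dataset["train"][0]
print(example["tokens"])
print(example["ner_tags"])

# Map tag IDs back to their IOB2 label names (O, B-PER, I-PER, B-ORG, ...)
label_names = dataset["train"].features["ner_tags"].feature.names
print([label_names[i] for i in example["ner_tags"]])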

Supported Tasks and Leaderboards

The primary supported task for this model is named entity recognition (NER) in Arabic. However, it can also be used to explore the zero-shot cross-lingual capabilities of multilingual models, allowing for NER in various languages.
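
As an illustration only, the model can also be called through the transformers token-classification pipeline, which groups sub-word pieces into entity spans. This is a minimal sketch; the English sentence is included purely to illustrate the cross-lingual idea, not as a statement about accuracy on non-Arabic text.

# Minimal sketch using the transformers pipeline API
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="ayoubkirouane/BERT-base_NER-ar",
    aggregation_strategy="simple",  # merge sub-word pieces into entity spans
)

print(ner("عاصمة فلسطين هي القدس الشريف."))
# Cross-lingual exploration only; results on non-Arabic text are not guaranteed
print(ner("Barack Obama visited Paris."))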

Use Cases

Limitations

Usage:

from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer
model = AutoModelForTokenClassification.from_pretrained("ayoubkirouane/BERT-base_NER-ar")
tokenizer = AutoTokenizer.from_pretrained("ayoubkirouane/BERT-base_NER-ar")
model.eval()

# Tokenize the input text and build the model inputs
text = "عاصمة فلسطين هي القدس الشريف."
inputs = tokenizer(text, return_tensors="pt")

# Perform NER inference
with torch.no_grad():
    outputs = model(**inputs)

# Get the predicted label ID for each token
predicted_ids = outputs.logits.argmax(dim=-1)[0].tolist()

# Map label IDs to human-readable labels
predicted_labels = [model.config.id2label[label_id] for label_id in predicted_ids]

# Print each token and its predicted label
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, label in zip(tokens, predicted_labels):
    print(f"Token: {token}, Label: {label}")