
22 Language Identifier - BERT

This model is trained to identify 22 different languages.

Loading the model

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("SharanSMenon/22-languages-bert-base-cased")

model = AutoModelForSequenceClassification.from_pretrained("SharanSMenon/22-languages-bert-base-cased")

Inference

import torch

def predict(sentence):
  # Tokenize the input and run a forward pass without tracking gradients
  tokenized = tokenizer(sentence, return_tensors="pt")
  with torch.no_grad():
    outputs = model(**tokenized)
  # Map the highest-scoring logit to its language label
  return model.config.id2label[outputs.logits.argmax(dim=1).item()]
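The helper above returns only the single best label. A minimal sketch of a scoring variant (the `top_k_languages` name is illustrative, not part of the model card) that turns the raw logits into probabilities with a softmax and ranks the top candidates:

```python
import math

def top_k_languages(logits, id2label, k=3):
    """Rank languages by softmax probability.

    `logits` is a flat list of raw scores, e.g. outputs.logits[0].tolist(),
    and `id2label` is model.config.id2label.
    """
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort label indices by probability, highest first, and keep the top k
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]
```

With the model loaded as above, this could be called as `top_k_languages(outputs.logits[0].tolist(), model.config.id2label)` to see how confident the classifier is in its top guesses.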

Examples

sentence1 = "in war resolution, in defeat defiance, in victory magnanimity"
predict(sentence1) # English

sentence2 = "en la guerra resolución en la derrota desafío en la victoria magnanimidad"
predict(sentence2) # Spanish

sentence3 = "هذا هو أعظم إله على الإطلاق"
predict(sentence3) # Arabic