GPT2 Medium 4 Persian
This model is part of the Flax/JAX Community Week, organized by Hugging Face, with TPU usage sponsored by Google.
Team Members
Dataset
We used the OSCAR dataset, a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus.
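For reference, a minimal sketch of loading the Persian portion of OSCAR with the datasets library; the config name 'unshuffled_deduplicated_fa' is an assumption based on OSCAR's naming scheme and may differ from the exact subset used for training.
from datasets import load_dataset

# Persian subset of OSCAR; the config name is an assumption, not confirmed by this card
dataset = load_dataset('oscar', 'unshuffled_deduplicated_fa', split='train')
print(dataset[0]['text'][:200])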
How To Use
You can use this model directly with a pipeline for text generation.
from transformers import pipeline, AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained('flax-community/gpt2-medium-persian')
model = GPT2LMHeadModel.from_pretrained('flax-community/gpt2-medium-persian')

# Build a text-generation pipeline and generate from a Persian prompt
# (the prompt translates to "In a surprising event, researchers").
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
generated_text = generator('در یک اتفاق شگفت انگیز، پژوهشگران', max_length=100)
To use TensorFlow instead of PyTorch, import TFGPT2LMHeadModel instead of GPT2LMHeadModel, as in the sketch below.
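A minimal sketch of the TensorFlow variant, assuming no native TensorFlow weights are hosted for this checkpoint; from_pt=True (which requires PyTorch to be installed) converts the PyTorch weights on the fly.
from transformers import pipeline, AutoTokenizer, TFGPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained('flax-community/gpt2-medium-persian')
# from_pt=True is only needed if TensorFlow weights are not published for this model
model = TFGPT2LMHeadModel.from_pretrained('flax-community/gpt2-medium-persian', from_pt=True)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
generated_text = generator('در یک اتفاق شگفت انگیز، پژوهشگران', max_length=100)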
Demo
... SOON
Evaluation
... SOON