GPT2 Medium 4 Persian
This model is part of the Flax/JAX Community Week, organized by Hugging Face, with TPU usage sponsored by Google.
Team Members
Dataset
We used the OSCAR dataset, a huge multilingual corpus obtained by language classification and filtering of the Common Crawl corpus.
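For reference, a minimal sketch of loading the Persian portion of OSCAR with the datasets library; the config name 'unshuffled_deduplicated_fa' is an assumption based on OSCAR's naming scheme and may differ from the exact subset used for training.
from datasets import load_dataset

# Persian subset of OSCAR; the config name is an assumption, not confirmed by this card
dataset = load_dataset('oscar', 'unshuffled_deduplicated_fa', split='train')
print(dataset[0]['text'][:200])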
How To Use
You can use this model directly with a pipeline for text generation.
from transformers import pipeline, AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained('flax-community/gpt2-medium-persian')
model = GPT2LMHeadModel.from_pretrained('flax-community/gpt2-medium-persian')

# Build a text-generation pipeline and generate from a Persian prompt
# (the prompt translates to "In a surprising event, researchers").
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
generated_text = generator('در یک اتفاق شگفت انگیز، پژوهشگران', max_length=100)
To use TensorFlow instead of PyTorch, import TFGPT2LMHeadModel instead of GPT2LMHeadModel, as in the sketch below.
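A minimal sketch of the TensorFlow variant, assuming no native TensorFlow weights are hosted for this checkpoint; from_pt=True (which requires PyTorch to be installed) converts the PyTorch weights on the fly.
from transformers import pipeline, AutoTokenizer, TFGPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained('flax-community/gpt2-medium-persian')
# from_pt=True is only needed if TensorFlow weights are not published for this model
model = TFGPT2LMHeadModel.from_pretrained('flax-community/gpt2-medium-persian', from_pt=True)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer)
generated_text = generator('در یک اتفاق شگفت انگیز، پژوهشگران', max_length=100)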
Demo
... SOON
Evaluation
... SOON