Punctuator for Uncased English

This model is a fine-tuned DistilBertForTokenClassification that adds punctuation to plain, uncased English text.

Usage

from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

# Load the fine-tuned weights and the matching tokenizer from the Hugging Face Hub
model = DistilBertForTokenClassification.from_pretrained("Qishuai/distilbert_punctuator_en")
tokenizer = DistilBertTokenizerFast.from_pretrained("Qishuai/distilbert_punctuator_en")
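
Once the model and tokenizer are loaded, one way to inspect the predictions is sketched below. This is a minimal illustration, not the author's inference pipeline. The helper names `predict_labels` and `merge_wordpieces` are hypothetical, and the label names are read from `model.config.id2label` rather than assumed. Keeping the label of the last word piece when merging is also an assumption; the model may have been trained with the label on a different piece.

```python
def predict_labels(text, model, tokenizer):
    """Return (wordpiece, label) pairs for `text`, skipping special tokens.

    Hypothetical helper: `model` and `tokenizer` are the objects loaded
    with from_pretrained as in the usage snippet.
    """
    import torch  # imported lazily so merge_wordpieces works without torch

    # The model is uncased, so lowercase the input before tokenizing.
    encoded = tokenizer(text.lower(), return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**encoded).logits  # shape: (1, seq_len, num_labels)

    label_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(encoded["input_ids"][0].tolist())
    special = set(tokenizer.all_special_tokens)

    # Label names come from the model config; none are hard-coded here.
    return [
        (token, model.config.id2label[label_id])
        for token, label_id in zip(tokens, label_ids)
        if token not in special
    ]


def merge_wordpieces(pairs):
    """Join '##' continuation pieces into whole words.

    Assumption: the label of the last piece is kept for the merged word.
    """
    words = []
    for token, label in pairs:
        if token.startswith("##") and words:
            previous_word, _ = words[-1]
            words[-1] = (previous_word + token[2:], label)
        else:
            words.append((token, label))
    return words
```

With `model` and `tokenizer` loaded as above, `merge_wordpieces(predict_labels("how are you doing today", model, tokenizer))` yields one (word, label) pair per word. Mapping each label to a punctuation character depends on the label set stored in the model config.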

Model Overview

Training data

A combination of the following three datasets:

Model Performance