Punctuator for Simplified Chinese

This model adds punctuation to plain Simplified Chinese text. It is a DistilBertForTokenClassification model, fine-tuned from a distilled version of bert-base-chinese.

Usage

from transformers import DistilBertForTokenClassification, DistilBertTokenizerFast

# Load the fine-tuned punctuation model and its matching tokenizer from the Hugging Face Hub
model = DistilBertForTokenClassification.from_pretrained("Qishuai/distilbert_punctuator_zh")
tokenizer = DistilBertTokenizerFast.from_pretrained("Qishuai/distilbert_punctuator_zh")
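
Because this is a token-classification model, it predicts one label per input token, and those labels still have to be turned back into punctuated text. The following is a minimal sketch of that last step only, not the author's own post-processing. The label names in LABEL_TO_MARK and the helper restore_punctuation are hypothetical and chosen for illustration; the real label names should be read from model.config.id2label and may differ.

```python
from typing import Dict, List

# Hypothetical label-to-punctuation mapping (an assumption for illustration).
# The model's real label names are in model.config.id2label and may differ.
LABEL_TO_MARK: Dict[str, str] = {
    "O": "",          # no punctuation after this character
    "COMMA": "，",
    "PERIOD": "。",
    "QUESTION": "？",
}


def restore_punctuation(chars: List[str], labels: List[str]) -> str:
    """Append the punctuation mark predicted for each character, if any."""
    if len(chars) != len(labels):
        raise ValueError("chars and labels must have the same length")
    pieces = []
    for ch, label in zip(chars, labels):
        # Unknown labels are treated as "no punctuation".
        pieces.append(ch + LABEL_TO_MARK.get(label, ""))
    return "".join(pieces)


if __name__ == "__main__":
    chars = list("你好吗我很好")
    labels = ["O", "O", "QUESTION", "O", "O", "PERIOD"]
    print(restore_punctuation(chars, labels))
```

In practice the labels list would come from taking the argmax over the model's output logits for each token and mapping each index through model.config.id2label, after dropping special tokens such as [CLS] and [SEP].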

Model Overview

Training data

A combination of the following three datasets:

Model Performance