generated_from_trainer

<!-- This model card has been generated automatically according to the information the Trainer had access to. You should probably proofread and complete it, then remove this comment. -->

mt5_correct_puntuation

本模型使用中文維基百科語料微調 google/mt5-base預訓練模型之中文標點符號訂正器。目前之準確率為 0.794。

This is a google/mt5-base model trained on Mandarin Wikipedia corpus and finetuned for Mandarin punctuation correction. Currently the accuracy is 0.794.

Datasets

模型使用中文維基百科公開資料微調。將取得的文本以「。」或「,」切分為不超過100字的句子。因為逗號和句號數量壓倒性地多,為盡量平衡資料集,僅保留包含冒號、分號、驚嘆號、問號的句子,作為正確句。將正確句之「,。:;、!?」隨機以「,。:;、!?」,製作為不正確句。訓練用句子共有291,112句。

Training hyperparameters

The following hyperparameters were used during training:

Framework versions