中文预训练Longformer模型 | Longformer_ZH with PyTorch

相比于Transformer的O(n^2)复杂度,Longformer提供了一种以线性复杂度处理最长4K字符级别文档序列的方法。Longformer Attention可以直接替换标准自注意力,将局部滑动窗口注意力与任务驱动的全局注意力相结合,方便模型更好地学习超长序列的信息。

Compared with the O(n^2) complexity of the standard Transformer, Longformer processes document-level sequences of up to 4K tokens with linear complexity. Longformer's attention mechanism is a drop-in replacement for standard self-attention and combines a local sliding-window attention with task-motivated global attention.
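
The minimal sketch below illustrates the windowed + global attention interface using the English `allenai/longformer-base-4096` checkpoint from HuggingFace Transformers; it is only an illustration of the attention mechanism, not this repository's Chinese model.

```python
# Illustration of Longformer's sliding-window + global attention,
# using the public English checkpoint purely as an example.
import torch
from transformers import AutoTokenizer, LongformerModel

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

text = "Long documents " * 500  # far longer than BERT's 512-token limit
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Every token attends locally within a sliding window; tokens marked with 1 in
# global_attention_mask additionally attend to (and are attended by) all tokens.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1  # give the first ([CLS]-like) token global attention

outputs = model(**inputs, global_attention_mask=global_attention_mask)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```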

我们注意到关于中文Longformer或超长序列任务的资源较少,因此在此开源了我们预训练的中文Longformer模型参数, 并提供了相应的加载方法,以及预训练脚本。

There are few resources for Chinese Longformer models or long-sequence Chinese tasks, so we open-source our pretrained Longformer_ZH weights, together with loading code and the pretraining scripts, to help researchers.

加载模型 | Load the model

您可以使用谷歌云盘或百度网盘下载我们的模型
You can download Longformer_ZH from Google Drive or Baidu Yun.

我们同样提供了Huggingface的自动下载
We also support automatic download and loading via HuggingFace Transformers.

```python
# Download and load the pretrained Chinese Longformer from the HuggingFace Hub.
from Longformer_zh import LongformerZhForMaksedLM
model = LongformerZhForMaksedLM.from_pretrained('ValkyriaLenneth/longformer_zh')
```
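
The sketch below shows one way to run the loaded model on a masked Chinese sentence. It assumes the checkpoint ships a BERT-style Chinese tokenizer and follows the standard HuggingFace masked-LM interface (both are assumptions, not verified against this repository); a locally downloaded copy from Google Drive or Baidu Yun can be loaded by passing its directory path instead of the Hub name.

```python
# Hedged usage sketch: assumes a BERT-style Chinese vocabulary and a
# standard HuggingFace *ForMaskedLM forward interface.
import torch
from transformers import BertTokenizer
from Longformer_zh import LongformerZhForMaksedLM

# Either the Hub name, or the directory downloaded from Google Drive / Baidu Yun.
model = LongformerZhForMaksedLM.from_pretrained('ValkyriaLenneth/longformer_zh')
tokenizer = BertTokenizer.from_pretrained('ValkyriaLenneth/longformer_zh')  # assumption

text = "今天天气[MASK]好。"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # assumes a standard output with a .logits field

mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(logits[0, mask_pos].argmax().item()))
```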

注意事项 | Notice

关于预训练 | About Pretraining

效果测试 | Evaluation

CCF Sentiment Analysis

| Model | Dev F |
| --- | --- |
| Bert | 80.3 |
| Bert-wwm-ext | 80.5 |
| Roberta-mid | 80.5 |
| Roberta-large | 81.25 |
| Longformer_SC | 79.37 |
| Longformer_ZH | 80.51 |

Pretraining BPC

| Model | BPC |
| --- | --- |
| Longformer before training | 14.78 |
| Longformer after training | 3.10 |
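
BPC (bits per character) is the character-level cross-entropy expressed in bits, so lower is better. The snippet below only shows how a per-character loss in nats converts to BPC; the loss value is hypothetical, chosen to match the post-training figure above.

```python
import math

# Convert a mean per-character cross-entropy loss (in nats) to bits-per-character.
# The loss value here is hypothetical, used only to illustrate the conversion.
loss_nats_per_char = 2.15
bpc = loss_nats_per_char / math.log(2)
print(f"BPC = {bpc:.2f}")  # -> BPC = 3.10
```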

CMRC (Chinese Machine Reading Comprehension)

| Model | F1 | EM |
| --- | --- | --- |
| Bert | 85.87 | 64.90 |
| Roberta | 86.45 | 66.57 |
| Longformer_zh | 86.15 | 66.84 |

Chinese Coreference Resolution

| Model | Conll-F1 | Precision | Recall |
| --- | --- | --- | --- |
| Bert | 66.82 | 70.30 | 63.67 |
| Roberta | 67.77 | 69.28 | 66.32 |
| Longformer_zh | 67.81 | 70.13 | 65.64 |

致谢 | Acknowledgement

感谢东京工业大学 奥村·船越研究室 提供算力。

Thanks to the Okumura-Funakoshi Lab at Tokyo Institute of Technology for providing the computing resources and the opportunity to finish this project.