bert-base-chinese-stock

This model is a fine-tuned version of bert-base-chinese on financial news. It achieves the following results on the evaluation set:

Loss: 0.0819
Precision: 0.8762
Recall: 0.9044
F1: 0.8901
Accuracy: 0.9751
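
As a quick sanity check, the reported F1 is consistent with the harmonic mean of the reported precision and recall. The sketch below only reuses the numbers listed above; the helper name `f1_score` is introduced here for illustration and is not part of the model's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the evaluation results listed above.
reported_precision = 0.8762
reported_recall = 0.9044

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # 0.8901
```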

Model description

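To automate the extraction of stocks, monetary amounts, person names, locations, dates, quantities, and organizations mentioned in news articles, we fine-tuned bert-base-chinese on financial news with human-annotated labels.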

Usage

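from transformers import AutoTokenizer, pipeline

# Load the tokenizer and build a token-classification (NER) pipeline.
model_checkpoint = "JasonYan/bert-base-chinese-stock-ner"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
token_classifier = pipeline(
    "token-classification",
    model=model_checkpoint,
    tokenizer=tokenizer,
    aggregation_strategy="simple",  # merge sub-tokens into whole entity spans
)

# Example headline (roughly: "AI demand is hot and lifts TSMC with it! Mark Liu:
# advanced packaging is in short supply, plant expansion is being accelerated").
print(token_classifier("AI需求熱，帶台積電一起飛！劉德音：先進封裝供不應求、加快擴廠腳步"))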
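
With aggregation_strategy="simple", the pipeline returns a list of dictionaries with the keys entity_group, score, word, start, and end. A minimal sketch of grouping that output by entity type follows. The helper `group_entities`, the label names ("STOCK", "PER"), and the scores in the sample are assumptions made up for illustration; the real label set is whatever the checkpoint's configuration defines. Slicing the original text by the start and end offsets avoids relying on how the tokenizer joins characters in the word field.

```python
from collections import defaultdict


def group_entities(text, predictions, min_score=0.0):
    """Group predicted entity spans by label, using character offsets."""
    grouped = defaultdict(list)
    for item in predictions:
        if item["score"] >= min_score:
            # Recover the surface form from the original text.
            grouped[item["entity_group"]].append(text[item["start"]:item["end"]])
    return dict(grouped)


text = "AI需求熱，帶台積電一起飛！劉德音：先進封裝供不應求、加快擴廠腳步"

# Hypothetical predictions: labels and scores are invented for this example,
# not actual output of the model.
sample_predictions = [
    {"entity_group": "STOCK", "score": 0.98, "word": "台 積 電", "start": 7, "end": 10},
    {"entity_group": "PER", "score": 0.95, "word": "劉 德 音", "start": 14, "end": 17},
]

grouped = group_entities(text, sample_predictions, min_score=0.5)
print(grouped)
```

In real use, `sample_predictions` would be replaced by the list returned from `token_classifier(text)`.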

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 2
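
For readers who want to reproduce a similar setup, the sketch below maps the hyperparameters above onto keyword names accepted by `transformers.TrainingArguments`. The output directory name and the single-device assumption are not stated in the original; they are illustrative. The original training script is not shown here, so this is one plausible configuration rather than the author's exact code.

```python
# Hyperparameters from the list above, keyed by TrainingArguments argument names.
training_kwargs = {
    "output_dir": "bert-base-chinese-stock-ner",  # hypothetical output path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 42,
    "gradient_accumulation_steps": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 2,
}

# Assumption: a single device, which is what makes 4 x 2 equal the listed total of 8.
num_devices = 1
total_train_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
    * num_devices
)
print(total_train_batch_size)  # 8

# With transformers installed, the dictionary can be passed on directly:
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)
```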

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
0.1559	0.2	3947	0.1232	0.7981	0.8431	0.8200	0.9601
0.1135	0.4	7894	0.1043	0.8163	0.8595	0.8373	0.9646
0.1039	0.6	11841	0.1007	0.8259	0.8775	0.8509	0.9664
0.098	0.8	15788	0.0937	0.8503	0.8799	0.8649	0.9688
0.0922	1.0	19735	0.0894	0.8534	0.8841	0.8685	0.9698
0.0745	1.2	23682	0.0911	0.8550	0.8935	0.8738	0.9703
0.0718	1.4	27629	0.0880	0.8637	0.8944	0.8788	0.9712
0.0708	1.6	31576	0.0842	0.8656	0.8975	0.8813	0.9722
0.0685	1.8	35523	0.0856	0.8688	0.9011	0.8847	0.9725
0.0668	2.0	39470	0.0832	0.8706	0.9023	0.8862	0.9729
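
The step counts in the table also give a rough idea of the training set size. The arithmetic below assumes that each step is one optimizer update on a total batch of 8 sequences and that training ran on a single device; the resulting example count is an estimate, not a figure reported by the author.

```python
# Values taken from the training results table and the hyperparameter list.
final_step = 39470
num_epochs = 2
total_train_batch_size = 8
eval_interval = 3947  # spacing between consecutive rows in the table

steps_per_epoch = final_step // num_epochs
evals_per_epoch = steps_per_epoch // eval_interval

# Estimate only: the last batch of an epoch may be partial.
approx_examples_per_epoch = steps_per_epoch * total_train_batch_size

print(steps_per_epoch)            # 19735
print(evals_per_epoch)            # 5
print(approx_examples_per_epoch)  # 157880
```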

Framework versions

Transformers 4.28.1
PyTorch 2.0.0
Datasets 2.14.4
Tokenizers 0.13.3