Nasopharyngeal carcinoma Cancer

Background

This model is built on Microsoft's BERT model pretrained on uncased PubMed text (microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext). About 500 radiology reports for staging nasopharyngeal carcinoma (NPC), written at our center by board-certified radiologists, were retrieved retrospectively with ethics approval. To keep the focus on NPC, incidental findings and unrelated observations were removed before training. In addition, abbreviations for structures were expanded to their full wording so that the model could learn the prefixes and suffixes that indicate anatomical location (e.g. L neck -> left neck, IJC -> internal jugular chain).
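The abbreviation expansion described above could be sketched as follows. This is a minimal illustration, not the preprocessing code actually used: the function name `expand_abbreviations` and the dictionary layout are assumptions, and only the two mappings given in the text are included.

```python
import re

# Hypothetical abbreviation map. Only these two entries come from the text above;
# a real map would cover every abbreviation found in the reports.
ABBREVIATIONS = {
    "L neck": "left neck",
    "IJC": "internal jugular chain",
}

def expand_abbreviations(report, mapping=ABBREVIATIONS):
    """Replace whole-word abbreviations in a report with their full wording."""
    # Try longer keys first so multi-word abbreviations win over shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], report)

print(expand_abbreviations("Enlarged node in L neck along the IJC."))
```

Matching is case-sensitive and restricted to word boundaries, so a string such as "IJCX" is left untouched.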

A tokenizer was trained starting from the original PubMedBERT tokenizer, and the radiology reports were used to fine-tune PubMedBERT. A known weakness of the fine-tuned model is that it cannot identify phrases or multi-word nouns. For example, "nodal metastases" is treated as two separate words, so the model tends to fill in "nodes" when both words are masked.
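The multi-word masking behaviour can be probed with a sketch like the one below. It assumes the Hugging Face `transformers` fill-mask pipeline. The helpers `mask_phrase` and `fill_masks` are hypothetical names introduced here, and the example sentence is invented for illustration. One mask is placed per whitespace-separated word, although the tokenizer may split a word into several subword tokens.

```python
def mask_phrase(text, phrase, mask_token="[MASK]"):
    """Replace each whitespace-separated word of `phrase` in `text` with one mask token."""
    masks = " ".join(mask_token for _ in phrase.split())
    return text.replace(phrase, masks)

def fill_masks(masked_text, model_name, top_k=3):
    """Return the pipeline's top candidates for every mask in `masked_text`."""
    # Imported lazily so mask_phrase works without transformers installed.
    from transformers import pipeline
    fill = pipeline("fill-mask", model=model_name)
    return fill(masked_text, top_k=top_k)

# Illustrative sentence, not taken from the training reports.
masked = mask_phrase("There are nodal metastases in the left neck.", "nodal metastases")
print(masked)
```

Calling `fill_masks(masked, "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext")` would query the base model; the fine-tuned checkpoint's path or identifier, which is not given here, can be passed instead. When the input contains more than one mask, the pipeline returns one list of candidates per mask, predicted independently, which is consistent with the phrase-level weakness noted above.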

This model serves as a pilot analysis of whether transformer-based deep learning can be applied to a corpus of NPC radiology reports.

Affiliations

Imaging and Interventional Radiology,

Chinese University of Hong Kong

Training Losses

Epoch   Training Loss   Validation Loss
1       No log          3.474347
2       No log          3.174083
3       No log          2.944307
4       No log          2.674384
5       No log          2.574261
6       No log          2.390012
7       No log          2.209419
8       2.464700        2.107448
9       2.464700        1.974744
10      2.464700        1.841606
11      2.464700        1.783265
12      2.464700        1.674914
13      2.464700        1.572721
14      2.464700        1.546106
15      2.464700        1.507173
16      1.153500        1.445264
17      1.153500        1.394671
18      1.153500        1.345976
19      1.153500        1.312650
20      1.153500        1.256743
21      1.153500        1.233211
22      1.153500        1.213525
23      1.153500        1.182824
24      0.681100        1.164411
25      0.681100        1.128899
26      0.681100        1.145166
27      0.681100        1.079617
28      0.681100        1.087909
29      0.681100        1.102839
30      0.681100        1.066386
31      0.681100        1.094807
32      0.478400        1.060072
33      0.478400        1.016879
34      0.478400        0.999808
35      0.478400        0.987576
36      0.478400        1.011713
37      0.478400        0.996884
38      0.478400        1.018533
39      0.478400        1.015250
40      0.378400        0.945075
41      0.378400        0.950782
42      0.378400        1.004242
43      0.378400        0.984930
44      0.378400        0.966999
45      0.378400        0.988593
46      0.378400        0.970504
47      0.378400        0.976804
48      0.339400        1.001518
49      0.339400        0.986024
50      0.339400        0.987911
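
In the log above, the validation loss levels off at about 1.0 from roughly epoch 33 onward while the training loss keeps falling. A short sketch for locating the epoch with the lowest validation loss is given below. The values are copied from the table; the variable names are illustrative, and this is not a statement about which checkpoint was actually released.

```python
# Validation losses for epochs 1..50, copied from the table above.
val_losses = [
    3.474347, 3.174083, 2.944307, 2.674384, 2.574261, 2.390012, 2.209419, 2.107448, 1.974744, 1.841606,
    1.783265, 1.674914, 1.572721, 1.546106, 1.507173, 1.445264, 1.394671, 1.345976, 1.312650, 1.256743,
    1.233211, 1.213525, 1.182824, 1.164411, 1.128899, 1.145166, 1.079617, 1.087909, 1.102839, 1.066386,
    1.094807, 1.060072, 1.016879, 0.999808, 0.987576, 1.011713, 0.996884, 1.018533, 1.015250, 0.945075,
    0.950782, 1.004242, 0.984930, 0.966999, 0.988593, 0.970504, 0.976804, 1.001518, 0.986024, 0.987911,
]

# Epochs are 1-indexed, so add 1 to the list index of the minimum.
best_loss = min(val_losses)
best_epoch = val_losses.index(best_loss) + 1
epochs_without_improvement = len(val_losses) - best_epoch

print(best_epoch, best_loss, epochs_without_improvement)
```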