Tags: deidentification, medical notes, ehr, phi

Model Description

How to use

Dataset

              I2B2 TRAIN SET (790 NOTES)    I2B2 TEST SET (514 NOTES)
PHI LABEL     COUNT      PERCENTAGE         COUNT      PERCENTAGE
DATE           7502        43.69             4980        44.14
STAFF          3149        18.34             2004        17.76
HOSP           1437         8.37              875         7.76
AGE            1233         7.18              764         6.77
LOC            1206         7.02              856         7.59
PATIENT        1316         7.66              879         7.79
PHONE           317         1.85              217         1.92
ID              881         5.13              625         5.54
PATORG          124         0.72               82         0.73
EMAIL             4         0.02                1         0.01
OTHERPHI          2         0.01                0         0.00
TOTAL         17171       100.00            11283       100.00
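
The percentage columns above appear to be each label's count divided by the total for that split. A minimal sketch that recomputes them from the train-set counts follows; the names `train_counts` and `label_percentages` are illustrative and not part of the Robust DeID code.

```python
# PHI label counts copied from the I2B2 train-set column of the table above.
train_counts = {
    "DATE": 7502, "STAFF": 3149, "HOSP": 1437, "AGE": 1233, "LOC": 1206,
    "PATIENT": 1316, "PHONE": 317, "ID": 881, "PATORG": 124, "EMAIL": 4,
    "OTHERPHI": 2,
}


def label_percentages(counts):
    """Return each label's share of the total as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


train_total = sum(train_counts.values())
train_pct = label_percentages(train_counts)

print(train_total)        # 17171
print(train_pct["DATE"])  # 43.69
```

The same function can be applied to the test-set counts to check that column.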

Training procedure

Results

Questions?

Post a GitHub issue on the Robust DeID repo.