beto-prescripciones-medicas

Fine-tuning BETO for entity detection in medical prescriptions. More models and details can be found in our repository. This is a fine-tuned version of bert-clinical-scratch-wl-es from the PLN group @ CMM, which is itself a fine-tuned version of bert-base-spanish-wwm-uncased (BETO) from DCC UChile.

This work is part of a project that aims to build entity recognition models for prescription data from Minsal (the Chilean Ministry of Health), developed for the MDS7201 course of the Data Science MSc program at UChile. We use data from a Chilean hospital that is not available for public use, but we do provide the files with which we trained the models. The procedure is as follows:

The resulting evaluation metrics are:

| f1   | precision | recall |
|------|-----------|--------|
| 0.93 | 0.92      | 0.94   |
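
The card does not state the exact evaluation script; entity-level NER metrics like these are commonly computed with seqeval. A minimal sketch, with made-up placeholder tag sequences:

from seqeval.metrics import f1_score, precision_score, recall_score

# Made-up gold and predicted IOB tag sequences, for illustration only
y_true = [["B-ACTIVE_PRINCIPLE", "I-ACTIVE_PRINCIPLE", "B-FORMA_FARMA", "O"]]
y_pred = [["B-ACTIVE_PRINCIPLE", "I-ACTIVE_PRINCIPLE", "O", "O"]]

print("f1:", f1_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))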

Collaborators:

Supervised by:

Example

We provide a demo. Here we introduce the functions needed to translate the model's output into human-readable tags.
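
As a minimal sketch of what such a function looks like (loading the model with the standard transformers token-classification classes; the checkpoint id below is taken from this card's title and may differ from the actual Hub id), the translation goes through model.config.id2label:

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint id; replace with the actual Hub id of this model.
checkpoint = "beto-prescripciones-medicas"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForTokenClassification.from_pretrained(checkpoint)

def predict_tags(text):
    """Return (token, tag) pairs with label ids translated to tag names."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    # id2label turns each predicted class index into its tag string
    return [(tok, model.config.id2label[i]) for tok, i in zip(tokens, ids)]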

We also provide a complementary model: beto-prescripciones-medicas-ADMIN. This model further tags the tokens that the current model labels as ADMIN. The demo includes this model, and the combined output of both is shown in the example below:

| ACTIVE_PRINCIPLE   | FORMA_FARMA | CANT-ADMIN | UND-ADMIN  | VIA-ADMIN | PERIODICITY  | DURATION       |
|--------------------|-------------|------------|------------|-----------|--------------|----------------|
| PARACETAMOL 500 MG | COMPRIMIDO  | 1          | COMPRIMIDO | ORAL      | cada 6 horas | durante 3 dias |

This example is also shown in this notebook, which uses the model as a black box.
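
For black-box usage along those lines, a sketch with the transformers pipeline API could chain the two models; the Hub ids here are assumptions based on the model names in this card:

from transformers import pipeline

# Hub ids below are assumptions taken from the model names in this card.
ner = pipeline("token-classification",
               model="beto-prescripciones-medicas",
               aggregation_strategy="simple")
ner_admin = pipeline("token-classification",
                     model="beto-prescripciones-medicas-ADMIN",
                     aggregation_strategy="simple")

text = "PARACETAMOL 500 MG COMPRIMIDO 1 COMPRIMIDO ORAL cada 6 horas durante 3 dias"

for entity in ner(text):
    if entity["entity_group"] == "ADMIN":
        # Refine ADMIN spans with the complementary model
        for sub in ner_admin(entity["word"]):
            print(sub["entity_group"], sub["word"])
    else:
        print(entity["entity_group"], entity["word"])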

Reproducibility

Training parameters (fine-tuning on RegEx-tagged data):

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01
)

Training parameters (fine-tuning on human-tagged data):

training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=20,
    weight_decay=0.01,
)
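
To make the training step concrete, here is a sketch of how either of these argument sets plugs into a Trainer; train_dataset and eval_dataset are hypothetical tokenized splits built from the training files provided in the repository:

from transformers import Trainer, DataCollatorForTokenClassification

# Pads input ids and label ids together for token classification
data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)

trainer = Trainer(
    model=model,                  # an AutoModelForTokenClassification instance
    args=training_args,           # either TrainingArguments block above
    train_dataset=train_dataset,  # hypothetical tokenized train split
    eval_dataset=eval_dataset,    # hypothetical tokenized eval split
    tokenizer=tokenizer,
    data_collator=data_collator,
)
trainer.train()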