pytorch

Jordan_Name_Disambiguation

This model is a fine-tuned version of distilbert-base-uncased used for token classification. It achieves the following results on the evaluation set:

Model Description

This model distinguishes mentions of the country "Jordan" as a place where the veteran served from other uses of the word: "Jordan" as a personal name, "Jordan" as part of another location (e.g., West Jordan, Utah), or the country "Jordan" appearing as part of form language.

Intended uses & limitations

This model is intended only for determining whether a mention of "jordan" appears in the context of a service location.

This model was trained on a limited amount of data for a narrow classification task.
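
As a rough illustration of the intended use, the sketch below filters token-level predictions down to "jordan" tokens tagged as a service location. It assumes the model is loaded through the `transformers` token-classification pipeline and that the positive class is named `LABEL_1`; the real label names come from the model's `id2label` config and may differ. The helper names `load_tagger` and `jordan_service_mentions` are hypothetical, and the predictions shown are hand-written stand-ins, not real model output.

```python
def load_tagger(model_path):
    """Build a token-classification pipeline (requires the `transformers` package)."""
    # Imported lazily so the helper below can be used without the dependency.
    from transformers import pipeline

    return pipeline("token-classification", model=model_path)


def jordan_service_mentions(text, predictions, positive_label="LABEL_1"):
    """Return (start, end) spans of "jordan" tokens tagged with the positive label.

    `predictions` is a list of dicts shaped like token-classification pipeline
    output, each with "entity", "start", and "end" keys.
    `positive_label` is an assumption; check the model's id2label mapping.
    """
    spans = []
    for pred in predictions:
        token = text[pred["start"]:pred["end"]]
        if token.lower() == "jordan" and pred["entity"] == positive_label:
            spans.append((pred["start"], pred["end"]))
    return spans


text = "Served in Jordan from 2004 to 2005."
# Hand-written stand-in for pipeline output, so the example runs without the model.
fake_predictions = [
    {"entity": "LABEL_1", "score": 0.99, "index": 3,
     "word": "jordan", "start": 10, "end": 16},
]
print(jordan_service_mentions(text, fake_predictions))
```

With a real model, `predictions` would instead come from `load_tagger(model_path)(text)`, where `model_path` points at wherever the fine-tuned weights are stored.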

Training Data

The training data has two columns, "text" and "service_location". The "text" column contains a snippet of text that includes the word "Jordan" in various contexts. The "service_location" column indicates whether the mention of "Jordan" refers to a service location (1) or not (0).

NOTE: The training data has PII and is only accessible to team members on S3.
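
The two-column layout described above can be read and validated along the following lines. This is a minimal sketch that assumes the data is exported as CSV; the file format, the `load_examples` helper, and the sample rows are illustrative assumptions and are not taken from the actual dataset.

```python
import csv
import io

REQUIRED_COLUMNS = {"text", "service_location"}


def load_examples(handle):
    """Read (text, label) pairs from a CSV file object with the two documented columns."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    examples = []
    for row in reader:
        label = int(row["service_location"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label: {label}")
        examples.append((row["text"], label))
    return examples


# Made-up rows for illustration only; not real training data.
sample = io.StringIO(
    "text,service_location\n"
    '"Served in Jordan from 2004 to 2005.",1\n'
    '"Lives in West Jordan, Utah.",0\n'
)
examples = load_examples(sample)
```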

Data Analysis

The table below shows how many examples in each split contain a "jordan" service-location token (label 1) and how many do not (label 0).

| Label | Train | Test |
|-------|-------|------|
| 0     | 668   | 166  |
| 1     | 408   | 104  |
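
To make the class balance explicit, the short calculation below derives the split sizes and the share of positive examples from the counts in the table. The `positive_share` helper is only an illustration.

```python
# Counts copied from the table above.
counts = {
    "train": {0: 668, 1: 408},
    "test": {0: 166, 1: 104},
}


def positive_share(split_counts):
    """Fraction of examples labeled 1 (service location) in a split."""
    total = sum(split_counts.values())
    return split_counts[1] / total


for split, split_counts in counts.items():
    total = sum(split_counts.values())
    print(f"{split}: {total} examples, {positive_share(split_counts):.1%} positive")
```

This gives 1076 training and 270 test examples, with about 38% positive examples in each split, so the two splits have a similar label distribution.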

Training Procedure

Preprocessing

The data went through the following preprocessing steps:

Training Hyperparameters

Training Results

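| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|---------------|-------|------|-----------------|-----------|--------|--------|----------|
| 0.0891        | 1.0   | 45   | 0.0069          | 0.934     | 0.934  | 0.934  | 0.9984   |
| 0.0036        | 2.0   | 90   | 0.0037          | 0.9902    | 0.9528 | 0.9712 | 0.9993   |
| 0.0012        | 3.0   | 135  | 0.0025          | 0.9811    | 0.9811 | 0.9811 | 0.9995   |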
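
As a sanity check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this for each epoch; the `f1_score` helper is written here for illustration and is not the evaluation code used during training.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) per epoch, copied from the table above.
rows = [
    (0.934, 0.934, 0.934),
    (0.9902, 0.9528, 0.9712),
    (0.9811, 0.9811, 0.9811),
]

for epoch, (precision, recall, reported) in enumerate(rows, start=1):
    computed = f1_score(precision, recall)
    # Allow a small tolerance because the table values are rounded.
    assert abs(computed - reported) < 2e-4
    print(f"epoch {epoch}: computed F1 = {computed:.4f}, reported = {reported}")
```

The recomputed values agree with the reported F1 column to within rounding.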

Framework Versions