
dell-research-harvard/linktransformer-models-test

This model is part of the LinkTransformer ecosystem. Although it is built on a standard HuggingFace Transformer, this instance is tailored for text classification: it assigns input sentences or paragraphs to one of a fixed set of categories (labels).

The base model for this classifier is RoBERTa. It is pretrained for English (en).

Labels are mapped to integers as follows:

This is a LinkTransformer model for classification of text into 'Protest', 'Riot' or 'Neither' classes. It was trained on annotated newspaper articles.
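The exact integer assigned to each class is not listed in this card. As a sketch of how such a mapping is typically built and used, the example below assumes a hypothetical alphabetical ordering of the three class names and shows how to decode integer predictions back into labels. The names CLASS_NAMES, build_label_maps, and decode are illustrative only; check the model's own configuration (its id2label entry) for the real assignment before relying on any ordering.

```python
# Hypothetical: the real integer assignment for this model is not documented here.
CLASS_NAMES = ["Protest", "Riot", "Neither"]


def build_label_maps(class_names):
    """Assign integers to class names in sorted order and build the inverse map."""
    label2id = {name: i for i, name in enumerate(sorted(class_names))}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def decode(predictions, id2label):
    """Convert a list of integer predictions into class names."""
    return [id2label[p] for p in predictions]


label2id, id2label = build_label_maps(CLASS_NAMES)
print(label2id)                     # {'Neither': 0, 'Protest': 1, 'Riot': 2}
print(decode([2, 0, 1], id2label))  # ['Riot', 'Neither', 'Protest']
```

Sorting the names makes the mapping deterministic regardless of the order in which labels first appear, which is one common convention but not necessarily the one used for this model.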

Usage with LinkTransformer

After installing LinkTransformer:

pip install -U linktransformer

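Use the model to classify the text in one or more columns of a pandas DataFrame:

import pandas as pd
import linktransformer as lt
df = pd.DataFrame({"col_of_interest": ["Crowds gathered outside parliament."]})  # hypothetical example input
df_clf_output = lt.classify_rows(df, on=["col_of_interest"], model="dell-research-harvard/linktransformer-models-test")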

Training

Training your own LinkTransformer Classification Model

LinkTransformer's train_clf_model function trains a custom classification model from labeled data:

from linktransformer import train_clf_model

best_model_path, best_metric, label_map = train_clf_model(
    data="path_to_dataset.csv",
    model="your-model-path-or-name",
    on=["col_of_interest"],
    label_col_name="label_column_name",
    lr=5e-5,
    batch_size=16,
    epochs=3
)
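The data argument above points to a CSV file. A minimal sketch of preparing such a file with the standard library, assuming (not confirmed by this card) that it needs one text column named in on and one label column named in label_col_name. The column names mirror the placeholders above; the rows, file location, and the helper write_training_csv are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical column names; they must match `on` and `label_col_name`.
TEXT_COL = "col_of_interest"
LABEL_COL = "label_column_name"

# Hypothetical example rows, for illustration only.
rows = [
    {TEXT_COL: "Thousands marched peacefully to city hall.", LABEL_COL: "Protest"},
    {TEXT_COL: "Rioters set fire to several shops overnight.", LABEL_COL: "Riot"},
    {TEXT_COL: "The council approved a new budget.", LABEL_COL: "Neither"},
]


def write_training_csv(path, rows):
    """Write rows to a CSV whose header is the text and label column names."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[TEXT_COL, LABEL_COL])
        writer.writeheader()
        writer.writerows(rows)


path = os.path.join(tempfile.mkdtemp(), "path_to_dataset.csv")
write_training_csv(path, rows)

# Read the file back to check the header and the set of labels present.
with open(path, newline="", encoding="utf-8") as f:
    records = list(csv.DictReader(f))
print(sorted({r[LABEL_COL] for r in records}))  # ['Neither', 'Protest', 'Riot']
```

The resulting path would then be passed as data=path, together with on=["col_of_interest"] and label_col_name="label_column_name".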

Evaluation Results


Evaluation is typically based on metrics like accuracy, F1-score, precision, and recall.
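To make these metrics concrete, the sketch below computes accuracy plus per-class and macro-averaged precision, recall, and F1-score in plain Python from gold and predicted labels. The labels in the example are hypothetical and say nothing about this model's actual performance; in practice a library routine such as scikit-learn's classification_report would typically be used instead.

```python
def classification_metrics(y_true, y_pred, labels):
    """Return accuracy, per-class precision/recall/F1, and their macro averages."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)  # correctly predicted
        fp = sum(t != label and p == label for t, p in pairs)  # wrongly predicted as label
        fn = sum(t == label and p != label for t, p in pairs)  # label missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    # Macro average: unweighted mean over classes.
    macro = {
        m: sum(per_class[label][m] for label in labels) / len(labels)
        for m in ("precision", "recall", "f1")
    }
    return accuracy, per_class, macro


# Hypothetical gold labels and predictions, for illustration only.
y_true = ["Protest", "Riot", "Neither", "Protest", "Riot", "Neither"]
y_pred = ["Protest", "Protest", "Neither", "Protest", "Riot", "Riot"]
acc, per_class, macro = classification_metrics(y_true, y_pred, ["Protest", "Riot", "Neither"])
print(f"accuracy={acc:.3f} macro_f1={macro['f1']:.3f}")  # accuracy=0.667 macro_f1=0.656
```

Macro averaging weights each class equally, which matters when one class (for example 'Neither') dominates the data.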

Citing & Authors

@misc{arora2023linktransformer,
  title={LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models},
  author={Abhishek Arora and Melissa Dell},
  year={2023},
  eprint={2309.00789},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}