CTRL44 Simplification model
This is a pretrained version of the controllable simplification model presented in the NAACL 2022 paper "Controllable Sentence Simplification via Operation Classification". It was trained on the IRSD simplification dataset.
A control token is expected at the start of each input sequence to dictate which simplification operation should be performed. The token can either be chosen manually or predicted by an operation classifier.
Possible control tokens are: "<ident>", "<para>", "<ssplit>", and "<dsplit>".
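As a minimal sketch of the manual route, the helper below prepends one of the four control tokens listed above to a sentence and rejects anything else. The function name `with_control_token` and the constant `CONTROL_TOKENS` are hypothetical and not part of the released model. The single space between token and sentence follows the usage example in this card.

```python
# Control tokens listed in this model card.
CONTROL_TOKENS = ("<ident>", "<para>", "<ssplit>", "<dsplit>")


def with_control_token(sentence: str, token: str) -> str:
    """Return `sentence` prefixed with a supported control token.

    Hypothetical helper: raises ValueError for any token the card does not list.
    """
    if token not in CONTROL_TOKENS:
        raise ValueError(
            f"Unknown control token {token!r}; expected one of {CONTROL_TOKENS}"
        )
    # Drop stray whitespace so the input always looks like "<token> sentence".
    return f"{token} {sentence.strip()}"


example = with_control_token("The cat sat on the mat.", "<para>")
```

Validating the token up front is a design choice: a mistyped token would otherwise be tokenized as ordinary text and passed to the model without any error.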
How to use
Here is how to use this model in PyTorch:
from transformers import BartForConditionalGeneration, AutoTokenizer
model = BartForConditionalGeneration.from_pretrained("liamcripwell/ctrl44-simp")
tokenizer = AutoTokenizer.from_pretrained("liamcripwell/ctrl44-simp")
text = "<para> Barack Hussein Obama II is an American politician who served as the 44th president of the United States from 2009 to 2017."
inputs = tokenizer(text, return_tensors="pt")
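outputs = model.generate(**inputs, num_beams=10, max_length=128)
# Decode the generated token IDs back into text
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])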
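The same steps can be applied to several sentences at once. The sketch below is one possible way to do it, not part of the released code: `simplify_batch` is a hypothetical helper that takes an already loaded `model` and `tokenizer` (as created above), prefixes each sentence with its control token, pads the batch, and decodes the generated sequences. The generation settings are copied from the example above; whether padded batches give the same outputs as single-sentence calls is an assumption that has not been checked here.

```python
def simplify_batch(sentences, tokens, model, tokenizer, num_beams=10, max_length=128):
    """Simplify several sentences, each with its own control token.

    Hypothetical helper: `model` and `tokenizer` are assumed to be the
    objects loaded with `from_pretrained` in the usage example.
    """
    if len(sentences) != len(tokens):
        raise ValueError("sentences and tokens must have the same length")

    # Build "<token> sentence" inputs, matching the single-sentence example.
    texts = [f"{tok} {sent}" for tok, sent in zip(tokens, sentences)]

    # padding=True lets sentences of different lengths share one batch.
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    outputs = model.generate(**inputs, num_beams=num_beams, max_length=max_length)

    # One decoded string per input sentence.
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Usage, assuming `model` and `tokenizer` are already loaded:
# results = simplify_batch(
#     ["First sentence to simplify.", "Second sentence to simplify."],
#     ["<para>", "<ssplit>"],
#     model,
#     tokenizer,
# )
```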