# BART fine-tuned for keyphrase generation
This is the <a href="https://huggingface.co/facebook/bart-base">bart-base</a> (<a href="https://arxiv.org/abs/1910.13461">Lewis et al., 2019</a>) model <a href="https://arxiv.org/abs/2209.03791">fine-tuned for the keyphrase generation task</a> on fragments of the following corpora:
- Krapivin (<a href = "http://eprints.biblio.unitn.it/1671/1/disi09055%2Dkrapivin%2Dautayeu%2Dmarchese.pdf">Krapivin et al., 2009</a>)
- Inspec (<a href = "https://aclanthology.org/W03-1028.pdf">Hulth, 2003</a>)
- KPTimes (<a href="https://aclanthology.org/W19-8617.pdf">Gallina et al., 2019</a>)
- DUC-2001 (<a href="https://cdn.aaai.org/AAAI/2008/AAAI08-136.pdf">Wan & Xiao, 2008</a>)
- PubMed (<a href = "https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=08b75d31a90f206b36e806a7ec372f6f0d12457e">Schutz, 2008</a>)
- NamedKeys (<a href = "https://joyceho.github.io/assets/pdf/paper/gero-bcb19.pdf">Gero & Ho, 2019</a>).
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("beogradjanka/bart_finetuned_keyphrase_extraction")
model = AutoModelForSeq2SeqLM.from_pretrained("beogradjanka/bart_finetuned_keyphrase_extraction")

text = "In this paper, we investigate cross-domain limitations of keyphrase generation using the models for abstractive text summarization. " \
       "We present an evaluation of BART fine-tuned for keyphrase generation across three types of texts, " \
       "namely scientific texts from computer science and biomedical domains and news texts. " \
       "We explore the role of transfer learning between different domains to improve the model performance on small text corpora."

# Tokenize the input document (the deprecated prepare_seq2seq_batch call is
# replaced by invoking the tokenizer directly)
tokenized_text = tokenizer([text], return_tensors="pt")

# Generate and decode the keyphrase sequence
outputs = model.generate(**tokenized_text)
keyphrases = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(keyphrases)
```
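The model returns the keyphrases for a document as one decoded string. Continuing from the snippet above, here is a minimal sketch of controlling decoding and splitting the output into a list; the `num_beams` and `max_length` values are illustrative assumptions, not documented settings of this model, and the comma delimiter is likewise assumed:

```python
# Illustrative decoding parameters (assumptions, not the authors' settings):
# beam search often yields more stable keyphrase sequences than greedy decoding.
outputs = model.generate(**tokenized_text, num_beams=4, max_length=64)
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

# Assuming keyphrases are delimited by commas, split the string into a list.
keyphrases = [kp.strip() for kp in decoded.split(",")]
print(keyphrases)
```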
## Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-5
- train_batch_size: 8
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 6
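As a sketch of how these settings map onto the `transformers` training API, the following hypothetical fine-tuning script wires the listed hyperparameters into `Seq2SeqTrainingArguments`; the toy dataset, column names, and `output_dir` are illustrative placeholders, not the authors' actual training setup:

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")

# Toy stand-in for the keyphrase corpora listed above: each example pairs a
# source text with its gold keyphrases serialized as one target string.
raw = Dataset.from_dict({
    "text": ["A toy abstract about keyphrase generation with BART."],
    "keyphrases": ["keyphrase generation, BART"],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["text"], truncation=True, max_length=512)
    labels = tokenizer(text_target=batch["keyphrases"], truncation=True, max_length=64)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

train_dataset = raw.map(preprocess, batched=True, remove_columns=["text", "keyphrases"])

# The hyperparameters listed above; Trainer uses AdamW by default, so only
# the betas and epsilon need to be set explicitly.
args = Seq2SeqTrainingArguments(
    output_dir="bart_keyphrase_generation",  # placeholder path
    learning_rate=4e-5,
    per_device_train_batch_size=8,
    num_train_epochs=6,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```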
## Citation

BibTeX:

```bibtex
@article{glazkova2023applying,
  title={Applying Transformer-Based Text Summarization for Keyphrase Generation},
  author={Glazkova, Anna and Morozov, Dmitry},
  journal={Lobachevskii Journal of Mathematics},
  volume={44},
  number={1},
  pages={123--136},
  year={2023},
  doi={10.1134/S1995080223010134}
}
```