
GIRT-gpt2

Model description

This is the smallest version of GIRT, using GPT-2 as its base model (124M parameters).

How to use

You can use this model directly with a pipeline for text generation. Since generation can rely on some randomness, we set a seed for reproducibility:

>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='kargaranamir/GIRT-gpt2')
>>> set_seed(42)
>>> prompt = "<|startofissuetemplate|><|startofmetadata|>---\nrepo_name: huggingface\nname: bug report\nabout:"
>>> generator(prompt, num_beams=5, no_repeat_ngram_size=2, max_length=400, num_return_sequences=1)

[{'generated_text': "<|startofissuetemplate|><|startofmetadata|>---
repo_name: huggingface
name: bug report
about: Create a report to help us improve
title: ''
labels: 'bug'
---**Is your bug related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like** (e.g. '....' or '...' if you have a suggestion)
If applicable, add any other context or screenshots about the feature request here.
"}]

Here is how to use this model to get the features of a given text in PyTorch:

>>> from transformers import GPT2Tokenizer, GPT2Model
>>> tokenizer = GPT2Tokenizer.from_pretrained('kargaranamir/GIRT-gpt2')
>>> model = GPT2Model.from_pretrained('kargaranamir/GIRT-gpt2')
>>> text = "<|startofissuetemplate|><|startofmetadata|>---\nrepo_name: huggingface\nname: bug report\nabout:"
>>> encoded_input = tokenizer(text, return_tensors='pt')
>>> output = model(**encoded_input)
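The call returns a standard transformers model output; the per-token features live in its `last_hidden_state` tensor. A minimal continuation of the snippet above:

>>> features = output.last_hidden_state  # per-token features, shape (batch, sequence_length, hidden_size)
>>> features.shape[-1]  # hidden size is 768 for the 124M GPT-2 base
768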

Training data

The training data comes from the GIRT-Data paper and its accompanying codebase.

Citation

If you find this model useful, please cite the GIRT-Data paper:

@article{nikeghbal2023girt,
  title={GIRT-Data: Sampling GitHub Issue Report Templates},
  author={Nikeghbal, Nafiseh and Kargaran, Amir Hossein and Heydarnoori, Abbas and Sch{\"u}tze, Hinrich},
  journal={arXiv preprint arXiv:2303.09236},
  year={2023}
}