Gupshup

GupShup: Summarizing Open-Domain Code-Switched Conversations (EMNLP 2021)
Paper: https://aclanthology.org/2021.emnlp-main.499.pdf
Github: https://github.com/midas-research/gupshup

Dataset

Please request the GupShup data using this Google form.

The dataset is available for two tasks: Hinglish dialogues to English summarization (h2e) and English dialogues to English summarization (e2e). For each task, the dialogue/conversation files use the .source extension (e.g., train.source), while the summary files use the .target extension (e.g., train.target). Pass the ".source" file to the input_path argument and the ".target" file to the reference_path argument of the scripts.

Models

All model weights are available on the Huggingface model hub. You can either download the weights locally and pass that path to the model_name argument of the scripts, or pass the provided alias directly to model_name, in which case the scripts download the weights automatically.

Model aliases follow the pattern "gupshup_TASK_MODEL", where TASK is h2e or e2e and MODEL is mbart, pegasus, etc., as listed below.

1. Hinglish Dialogues to English Summary (h2e)

Model Huggingface Alias
mBART midas/gupshup_h2e_mbart
PEGASUS midas/gupshup_h2e_pegasus
T5 MTL midas/gupshup_h2e_t5_mtl
T5 midas/gupshup_h2e_t5
BART midas/gupshup_h2e_bart
GPT-2 midas/gupshup_h2e_gpt

2. English Dialogues to English Summary (e2e)

Model Huggingface Alias
mBART midas/gupshup_e2e_mbart
PEGASUS midas/gupshup_e2e_pegasus
T5 MTL midas/gupshup_e2e_t5_mtl
T5 midas/gupshup_e2e_t5
BART midas/gupshup_e2e_bart
GPT-2 midas/gupshup_e2e_gpt
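The alias scheme in the tables above can be sketched as a small helper (the function name and the validation sets are illustrative; the aliases themselves are the ones listed above):

```python
# Task and model identifiers as enumerated in the tables above.
TASKS = {"h2e", "e2e"}
MODELS = {"mbart", "pegasus", "t5_mtl", "t5", "bart", "gpt"}

def gupshup_alias(task, model):
    """Build the Huggingface alias 'midas/gupshup_TASK_MODEL'."""
    if task not in TASKS or model not in MODELS:
        raise ValueError(f"unknown task/model: {task}/{model}")
    return f"midas/gupshup_{task}_{model}"
```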

Inference

Using command line

  1. Clone this repo, create a Python virtual environment (https://docs.python.org/3/library/venv.html), and install the required packages:
git clone https://github.com/midas-research/gupshup.git
pip install -r requirements.txt
  2. Run the run_eval script with the arguments shown below.

Make sure you have downloaded the GupShup dataset using the Google form above and provide the correct paths to these files in the input_path and reference_path arguments. Alternatively, simply place test.source and test.target in the data/h2e/ (Hinglish to English) or data/e2e/ (English to English) folder. For example, to generate English summaries from Hinglish dialogues using the mBART model, run the following command:

python run_eval.py \
    --model_name midas/gupshup_h2e_mbart \
    --input_path  data/h2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/h2e/test.target \
    --score_path scores.txt \
    --bs 8

As another example, to generate English summaries from English dialogues using the PEGASUS model:

python run_eval.py \
    --model_name midas/gupshup_e2e_pegasus \
    --input_path  data/e2e/test.source \
    --save_path generated_summary.txt \
    --reference_path data/e2e/test.target \
    --score_path scores.txt \
    --bs 8
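The --bs flag sets the evaluation batch size. How such batching typically works can be sketched as follows (this is an illustration of the general pattern, not the actual run_eval implementation):

```python
def batched(lines, batch_size):
    """Yield successive batches of input lines; with --bs 8, the
    model summarizes 8 dialogues per forward pass."""
    for i in range(0, len(lines), batch_size):
        yield lines[i:i + batch_size]
```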

Please create an issue if you face any difficulties replicating the results.

References

Please cite [1] if you found the resources in this repository useful.

[1] Mehnaz, Laiba, Debanjan Mahata, Rakesh Gosangi, Uma Sushmitha Gunturi, Riya Jain, Gauri Gupta, Amardeep Kumar, Isabelle G. Lee, Anish Acharya, and Rajiv Shah. GupShup: Summarizing Open-Domain Code-Switched Conversations. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021).

@inproceedings{mehnaz2021gupshup,
  title={GupShup: Summarizing Open-Domain Code-Switched Conversations},
  author={Mehnaz, Laiba and Mahata, Debanjan and Gosangi, Rakesh and Gunturi, Uma Sushmitha and Jain, Riya and Gupta, Gauri and Kumar, Amardeep and Lee, Isabelle G and Acharya, Anish and Shah, Rajiv},
  booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
  pages={6177--6192},
  year={2021}
}