MT-LLaMA Model Card

Model details

Model type: MT-LLaMA is an open-source multi-task model obtained by fine-tuning LLaMA on the large collection of tasks in P3 (i.e., T0 Train). The datasets used during training and the task taxonomy are listed below:

Organizations developing the model: The MT-LLaMA team, with members from Alibaba DAMO Academy and The Chinese University of Hong Kong.

Intended use

You can try the code from our GitHub repository.

Zero-shot Evaluation

We primarily follow the BigScience T0 protocol to assess how well our multi-task LLaMA generalizes to: (1) unseen datasets (i.e., new datasets from seen tasks); (2) unseen tasks.
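
In the T0 protocol, tasks with a fixed set of answer options are evaluated by rank classification: every candidate answer is scored under the model and the highest-scoring one is taken as the prediction. The sketch below illustrates only that selection step. It is an assumption that MT-LLaMA is evaluated in exactly this way, and `score_fn` and `toy_score` are hypothetical stand-ins, not part of the MT-LLaMA code.

```python
from typing import Callable, Sequence


def rank_classify(
    prompt: str,
    options: Sequence[str],
    score_fn: Callable[[str, str], float],
) -> str:
    """Return the option that score_fn rates highest for the given prompt."""
    if not options:
        raise ValueError("options must not be empty")
    return max(options, key=lambda option: score_fn(prompt, option))


def toy_score(prompt: str, option: str) -> float:
    # Hypothetical placeholder: a real scorer would return the model's
    # log-likelihood of `option` conditioned on `prompt`.
    return 0.0 if option == "Yes" else -1.0


prediction = rank_classify("... Yes or no?", ["Yes", "no"], toy_score)
```

In practice `score_fn` would wrap a language-model forward pass; keeping it as a parameter lets the selection logic be tested without loading a model.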

Prompt Format

Extractive QA:

     Input: Answer the question according to the context. Question: ${question}. Context: ${context}. Answer:
     Output: ${Answer}


Sentiment:

  1. SST-2
    Input: ${sentence} Based on this review, would the user recommend this product? No or Yes?
    Output: Yes / No

Multiple-Choice QA:

  1. OpenbookQA
    Input: ${question} Which is the correct answer? - (A) ${choiceA} - (B) ${choiceB} - (C) ${choiceC} - (D) ${choiceD}
    Output: ${choiceA} / ${choiceB} / ${choiceC} / ${choiceD}

Sentence Completion:

  1. COPA
    Input: ${premise} {% if question == "cause" %} This happened because... {% else %} As a consequence... {% endif %} Help me pick the more plausible option: - ${text1} - ${text2}
    Output: ${text1} / ${text2}
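
The COPA input is the only template here with a conditional branch, so a small sketch of how it could be rendered may help. It assumes the conditional ends before "Help me pick", so that the final instruction appears for both question types; `render_copa` is a hypothetical helper name and is not taken from the repository.

```python
def render_copa(premise: str, question: str, text1: str, text2: str) -> str:
    """Build the COPA input string; `question` is "cause" or "effect"."""
    # Pick the connective that matches the question type.
    if question == "cause":
        connective = "This happened because..."
    else:
        connective = "As a consequence..."
    return (
        f"{premise} {connective} "
        f"Help me pick the more plausible option: - {text1} - {text2}"
    )


# Made-up example values, for illustration only.
example = render_copa(
    premise="The man broke his toe.",
    question="cause",
    text1="He got a hole in his sock.",
    text2="He dropped a hammer on his foot.",
)
```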

Coreference Resolution:

  1. Winogrande
    Input: ${sentence} In the previous sentence, does _ refer to ${option1} or ${option2}?
    Output: ${option1} / ${option2}

Word Sense Disambiguation:

  1. WiC
    Input: Does the word "${word}" have the same meaning in these two sentences? Yes, No? ${sentence1} ${sentence2}
    Output: Yes / No

Natural Language Inference:

  1. MNLI
    Input: ${premise} Question: Does this imply that ${hypothesis}? Please response with 'Yes', 'No', or 'Maybe'.
    Output: Yes / No / Maybe
  2. RTE
    Input: Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?
    Output: Yes / no
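
The `${name}` placeholders in these templates happen to match the syntax of Python's standard `string.Template`, so the unconditional prompts can be filled in directly. The following is a minimal sketch under that observation; the constant names and the `render` helper are hypothetical, and the repository may build its prompts differently.

```python
from string import Template

# Templates copied from the prompt formats listed in this card.
EXTRACTIVE_QA = Template(
    "Answer the question according to the context. "
    "Question: ${question}. Context: ${context}. Answer:"
)
RTE = Template(
    'Given ${premise} Is it guaranteed true that "${hypothesis}"? Yes or no?'
)


def render(template: Template, **fields: str) -> str:
    """Fill every placeholder; raises KeyError if a field is missing."""
    return template.substitute(**fields)


# Made-up example values, for illustration only.
qa_prompt = render(
    EXTRACTIVE_QA,
    question="Who wrote Hamlet",
    context="Hamlet is a play by William Shakespeare",
)
```

Using `substitute` rather than `safe_substitute` makes a missing field fail loudly instead of leaving a literal `${...}` in the prompt.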

Results on Unseen Datasets

| Model | XQuAD-en (F1/EM) | TyDiQA-en (F1/EM) | MLQA-en (F1/EM) | SQuAD (F1/EM) | SST-2 (Acc.) | OpenbookQA (Acc.) |
| --- | --- | --- | --- | --- | --- | --- |
| LLaMA-7b | 9.5 / 2.0 | 14.3 / 2.6 | 13.4 / 3.3 | 29.4 / 11.5 | 50.5 | 32.4 |
| MT-LLaMA-7b | 42.3 / 31.1 | 38.9 / 26.9 | 45.4 / 31.5 | 85.9 / 77.6 | 92.6 | 38.2 |
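
For readers unfamiliar with the F1/EM columns, the sketch below shows how exact match and token-level F1 are commonly computed for extractive QA, following the SQuAD-style convention of lowercasing and stripping punctuation and English articles before comparison. It is an assumption that the numbers above were produced with this exact normalization; the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores are then the mean over examples, usually taking the maximum over the available reference answers for each example and reporting the result as a percentage.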

Results on Unseen Tasks

| Model | COPA (Acc.) | Winogrande (Acc.) | WiC (Acc.) | MNLI (Acc.) | RTE (Acc.) |
| --- | --- | --- | --- | --- | --- |
| LLaMA-7b | 56.0 | 49.3 | 51.7 | 30.2 | 52.7 |
| MT-LLaMA-7b | 88.0 | 54.9 | 52.2 | 49.6 | 79.1 |


If you find this resource useful, please cite the repo as follows:

  author = {Xu, Weiwen and Li, Xin and Bing, Lidong},
  title = {Multi-task Instruction-tuned LLaMA},
  year = 2023,
  url = {}