This model is flan-t5-xxl fine-tuned on databricks-dolly-15k. It was fine-tuned in mid-April 2023 by Jack Hessel, a Research Scientist at AI2. Here's a screenshot of a demo I made with it:

<p align="center"> <img src="https://huggingface.co/jmhessel/flant5-dolly-xxl/resolve/main/demo_image.png" width=600px> </p>
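For context, databricks-dolly-15k is a corpus of ~15K human-written instruction/response pairs. A quick way to peek at it (a minimal sketch; field names follow the public dataset card):

```python
from datasets import load_dataset

# ~15K human-written instruction/response pairs, spanning several task categories.
dolly = load_dataset("databricks/databricks-dolly-15k", split="train")
ex = dolly[0]
print(ex["instruction"])  # the imperative request
print(ex["context"])      # optional supporting passage (often empty)
print(ex["response"])     # the human-written answer
```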

# fine-tuning details

Fine-tuning was performed with the t5x library on a v3-128 TPU. Input/output sequence lengths were set to 512/256. ~5K fine-tuning steps were taken with an effective batch size of ~320, with model checkpointing every 100 steps. The optimizer was Adafactor with a learning rate of 0.0007. The checkpoint with the best balance of validation BLEU and loss was selected. I also added very crude support for newlines and tabs in inputs and outputs, since T5's tokenizer otherwise discards them. Here's how to process the inputs and outputs to round-trip newlines (apologies for the crude hack!); the model/tokenizer loading and example sampling settings are added below for completeness:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jmhessel/flant5-dolly-xxl")
model = AutoModelForSeq2SeqLM.from_pretrained("jmhessel/flant5-dolly-xxl")  # 11B params; needs substantial memory

text_in = "Write a short poem about spring.\nMake it rhyme."  # example request
temp, top_p = 0.7, 0.95  # example sampling settings

# Swap newlines/tabs for sentinel strings that survive T5 tokenization.
text_in = ' '.join(text_in.replace('\n', ' <_LB_> ').replace('\t', ' <_T_> ').split())
inputs = tokenizer(text_in, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=256, temperature=temp, top_p=top_p, do_sample=True)
out = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=False)
# Map the sentinels back to real newlines/tabs, with or without surrounding spaces.
out = out.replace(' <_LB_> ', '\n').replace(' <_T_> ', '\t').replace('<_LB_>', '\n').replace('<_T_>', '\t')
```
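As a quick sanity check, the sentinel round-trip can be exercised without loading the model. Note that the hack collapses runs of whitespace, so consecutive newlines don't survive exactly:

```python
raw = "first line\nsecond line\twith a tab"
encoded = ' '.join(raw.replace('\n', ' <_LB_> ').replace('\t', ' <_T_> ').split())
print(encoded)  # first line <_LB_> second line <_T_> with a tab
decoded = encoded.replace(' <_LB_> ', '\n').replace(' <_T_> ', '\t')
assert decoded == raw
```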

# why?

I fine-tuned this model to better understand how flan-t5 performs when fine-tuned directly on a human-request-style instruction-tuning corpus.


# why release?

While it is fairly easy to get most (un-RLHFed) language models to output offensive text, there's something jarring about playing with a language model where you can make potentially offensive imperative requests. I hope this model can help folks explore instruction-tuned models and better understand the importance of safeguards.