🌍 Vulture-180B

Vulture-180B is a further fine-tuned, causal decoder-only LLM built by Virtual Interactive (VILM) on top of the famous Falcon-180B by TII. We collected a new dataset of news articles and Wikipedia pages in 12 languages (80GB in total) and continued the pretraining of Falcon-180B on it. Finally, we constructed a multilingual instruction dataset following Alpaca's approach.

While Vulture-180B is an adapter freely usable under Apache 2.0, Falcon-180B itself remains available only under the Falcon-180B TII License and Acceptable Use Policy. Users should ensure that any commercial applications built on Vulture-180B comply with the restrictions on Falcon-180B's use.

Technical Report coming soon 🤗

Prompt Format

The recommended prompt format is:

A chat between a curious user and an artificial intelligence assistant.

USER:{user's question}<|endoftext|>ASSISTANT:
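
For illustration, a minimal sketch of how this template could be assembled in Python (the `build_prompt` helper below is hypothetical and not part of the released code):

def build_prompt(question: str) -> str:
    # System preamble, then the user turn, terminated by <|endoftext|>
    # before the assistant turn the model is asked to complete.
    system = "A chat between a curious user and an artificial intelligence assistant."
    return f"{system}\n\nUSER:{question}<|endoftext|>ASSISTANT:"

prompt = build_prompt("Where is Ho Chi Minh City located?")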

Model Details

Model Description

Acknowledgement

Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

Bias, Risks, and Limitations

Vulture-180B is trained on large-scale corpora representative of the web, so it will carry the stereotypes and biases commonly encountered online.

Recommendations

We recommend that users of Vulture-180B consider fine-tuning it for their specific set of tasks of interest, and that guardrails and appropriate precautions be taken for any production use.

How to Get Started with the Model

To run inference with the model in full bfloat16 precision, you need approximately 8x A100 80GB GPUs or equivalent.
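
As a rough back-of-the-envelope check (an estimate only, assuming ~180B parameters at 2 bytes each in bfloat16), the weights alone take roughly 360GB, which fits across 8x80GB of GPU memory with headroom left for activations and the KV cache:

# Rough estimate of the memory needed for the bf16 weights alone.
params = 180e9          # approximate parameter count of Falcon-180B
bytes_per_param = 2     # bfloat16 uses 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB of weights")  # ~360 GB vs. 8 x 80 GB = 640 GB available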

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
from peft import PeftModel

model_id = "tiiuae/falcon-180b"
adapters_name = "vilm/vulture-180b"

# Load the base Falcon-180B tokenizer and weights in bfloat16,
# sharding the model across the available GPUs.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

# Apply the Vulture-180B adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapters_name)

# Prompt in the recommended format (the user question is Vietnamese for
# "Where is Ho Chi Minh City located?").
prompt = "A chat between a curious user and an artificial intelligence assistant.\n\nUSER:Thành phố Hồ Chí Minh nằm ở đâu?<|endoftext|>ASSISTANT:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample a completion for the assistant turn.
output = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=50,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))
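
If you only want the assistant's reply rather than the full sequence (which echoes the prompt), one optional variant is to slice off the prompt tokens before decoding:

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = output[inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))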