text-generation

Model Card for bt-opt-1.3b

Model Details

Model Description

Uses

Direct Use

This model can be used for the task of Text Generation

Downstream Use [Optional]

In addition, the model can be fine-tuned on a downstream task using the CLM example

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

Bias, Risks, and Limitations

As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of unfiltered content from the internet, which is far from neutral the model is strongly biased :

Like other large language models for which the diversity (or lack thereof) of training data induces downstream impact on the quality of our model, OPT-175B has limitations in terms of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern large language models.

See model facebook/opt-1.3b model card for example biased predictions

The model creators noted in the associated paper

we found OPT-175B does not work well with declarative instructions or point-blank interrogatives. Prompting with such instructions tends to produce a simulation of a dialogue beginning with such an instruction, rather than an execution of the instruction. Future work into instruction learning, in the vein of InstructGPT (Ouyang et al., 2022), may alleviate these limitations. OPT-175B also tends to be repetitive and can easily get stuck in a loop.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Training Details

Training Data

The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents:

The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally to each dataset’s size in the pretraining corpus.

The dataset might contains offensive content as parts of the dataset are a subset of public Common Crawl data, along with a subset of public Reddit data, which could contain sentences that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.

Alo see the dataset card in the associated paper.

Training Procedure

Preprocessing

The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.

The 175B model was trained on 992 80GB A100 GPUs. The training duration was roughly ~33 days of continuous training

Speeds, Sizes, Times

More information needed

Evaluation

Testing Data, Factors & Metrics

Testing Data

More information needed

Factors

Metrics

More information needed

Results

More information needed

Model Examination

More information needed

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Technical Specifications [optional]

Model Architecture and Objective

OPTForCausalLM

Compute Infrastructure

More information needed

Hardware

More information needed

Software

Transformers_version: 4.22.1

Citation

BibTeX:

@misc{zhang2022opt,
      title={OPT: Open Pre-trained Transformer Language Models}, 
      author={Susan Zhang and Stephen Roller and Naman Goyal and Mikel Artetxe and Moya Chen and Shuohui Chen and Christopher Dewan and Mona Diab and Xian Li and Xi Victoria Lin and Todor Mihaylov and Myle Ott and Sam Shleifer and Kurt Shuster and Daniel Simig and Punit Singh Koura and Anjali Sridhar and Tianlu Wang and Luke Zettlemoyer},
      year={2022},
      eprint={2205.01068},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Glossary [optional]

More information needed

More Information [optional]

More information needed

Model Card Authors [optional]

Opentensor in collaboration with Ezi Ozoani and the Hugging Face team

Model Card Contact

More information needed

How to Get Started with the Model

Use the code below to get started with the model.

<details> <summary> Click to expand </summary>

from transformers import AutoTokenizer, AutoModelForCausalLM
 
tokenizer = AutoTokenizer.from_pretrained("opentensor/bt-opt-1.3b")
 
model = AutoModelForCausalLM.from_pretrained("opentensor/bt-opt-1.3b")
 

</details>