Intro to 0-pos

With all the recent work on extending rotary positional embeddings to create long-context models, we asked: do we need positional information at all?

Can you finetune a transformer to remove the need for positional information?

The answer is: Yes!

This model is pythia-1b finetuned to zero out its use of rotary embeddings.
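The exact modification lives in modeling_pythia_0_pos.py in the repository. As a rough illustration (not the actual implementation), one way to fade out rotary embeddings is to interpolate between the rotated and unrotated query/key projections with a 0-pos factor alpha that is annealed from 1 to 0 during finetuning; the function and argument names below are our own.

import torch

def apply_rotary_with_0_pos_factor(q, k, cos, sin, alpha):
    """Blend rotary-encoded and position-free queries/keys.

    alpha = 1.0 -> standard rotary embeddings
    alpha = 0.0 -> positional information fully zeroed out (0-pos)

    Illustrative sketch only; see modeling_pythia_0_pos.py for the real code.
    """
    def rotate_half(x):
        # Standard RoPE helper: rotate the last dimension by half.
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    q_rot = (q * cos) + (rotate_half(q) * sin)
    k_rot = (k * cos) + (rotate_half(k) * sin)
    # Interpolate between the rotary-encoded and untouched projections.
    q_out = alpha * q_rot + (1.0 - alpha) * q
    k_out = alpha * k_rot + (1.0 - alpha) * k
    return q_out, k_out

At alpha = 0 the queries and keys pass through unchanged, so attention no longer receives any positional signal.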

See our Notebook for training code and examples of running inference with 0-pos on other models.

Usage

git clone https://huggingface.co/ontocord/pythia_1b_0_pos

from transformers import AutoTokenizer
from pythia_1b_0_pos.modeling_pythia_0_pos import GPTNeoXForCausalLM

# Load the 0-pos model from the cloned repo; the tokenizer is unchanged from pythia-1b
model = GPTNeoXForCausalLM.from_pretrained("pythia_1b_0_pos")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")

inputs = tokenizer("### Intro.\nWhen in the course of human history", return_tensors="pt")
outputs = model.generate(**inputs, repetition_penalty=1.1, no_repeat_ngram_size=4, max_length=700)
print(tokenizer.batch_decode(outputs)[0])

Will output:

## Intro.
When in the course of human history, there have been many wars and conflicts between nations. The most famous one is the American Civil War (1861–65). This was a war that lasted for more than two years. It was fought over slavery, which had been abolished by the U.S. Constitution. In this case, the United States were at war with Mexico.

The United States were not involved in any of these wars. They were involved in the following:

• The Mexican-American War (1846)

• Spanish-American War
• The Philippine Insurrection War

Lessons

Lessons so far:

1. Training appears to teach the model to produce text at roughly the same length as the average training example.
2. Use a relatively low learning rate, but not too low (1e-7 to 5e-8).
3. Use a long decay of the 0-pos factor (a simple schedule is sketched below).
4. Train on in-domain text (we tried wikitext, but minipile worked better).
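As a concrete example of point (3), a simple linear annealing of the 0-pos factor might look like the sketch below. The linear shape and the zero_pos_factor name are illustrative assumptions; the Notebook contains the training code we actually used.

def zero_pos_factor(step: int, total_decay_steps: int) -> float:
    """Anneal the 0-pos factor from 1.0 (standard rotary embeddings)
    down to 0.0 (no positional information) over a long horizon.
    Linear annealing is an assumption made for illustration."""
    return max(0.0, 1.0 - step / total_decay_steps)

# Example: decaying over 10,000 steps
# zero_pos_factor(0, 10_000)      -> 1.0
# zero_pos_factor(5_000, 10_000)  -> 0.5
# zero_pos_factor(12_000, 10_000) -> 0.0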

Cite Us

Please cite Ontocord.AI and this model if you find this research helpful.