# Model Details

## Model Description
This is the tokenizer associated with the long-context fine-tuning experiments we have published at:
- https://github.com/abacusai/long-context
- http://arxiv.org/abs/2308.10882
To avoid ambiguity around which tokenizer goes with these fine-tuned models, we have published this version for people to use when testing them. We expect that either the Llama 1 or Llama 2 tokenizer will work without issue with all of these models, but you can use this tokenizer if that is easier.
## Usage
The tokenizer can be loaded using `AutoTokenizer` with `use_fast=False`; we found that fast tokenizers have poor performance on long contexts. Using the code in the repository, you can load this tokenizer as follows:
```python
# Helper from the https://github.com/abacusai/long-context repository
from models import load_tokenizer

tokenizer = load_tokenizer()
```
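
If you are not working from the repository, here is a minimal sketch of the direct `transformers` route described above. The repo id below is a placeholder assumption; substitute this tokenizer's actual Hugging Face id:

```python
from transformers import AutoTokenizer

# Placeholder id; replace with this tokenizer's actual Hugging Face repo id.
MODEL_ID = "abacusai/long-context-tokenizer"

# use_fast=False selects the slow tokenizer, since fast tokenizers were
# found to perform poorly on long contexts.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

# Quick smoke test: encode a sentence and inspect the token ids.
ids = tokenizer("A quick smoke test of the tokenizer.")["input_ids"]
print(len(ids), ids[:10])
```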