Standard roberta-large model fine-tuned for one pass over the entire Pile dataset.

See Test-time training on nearest neighbors for large language models for details.