Hugging Face lists this repository as a model, but it contains only a tokenizer, with no model weights. The tokenizer was trained on https://huggingface.co/datasets/joelito/MultiLegalPile_Wikipedia_Filtered
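
Since there are no weights here, only the tokenizer can be loaded. Below is a minimal sketch of how that might look, assuming the tokenizer files in this repository are in a format that `transformers.AutoTokenizer` can read (not confirmed above). The helper names `load_tokenizer` and `token_count` are made up for illustration, and the repository ID is left as a parameter because it is not stated here.

```python
def load_tokenizer(repo_id: str):
    """Load only the tokenizer from a Hub repository (hypothetical helper).

    Assumes the repository ships tokenizer files that AutoTokenizer understands.
    """
    # Imported lazily so the rest of this sketch works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


def token_count(tokenizer, text: str) -> int:
    """Return how many tokens `text` is split into, excluding special tokens."""
    return len(tokenizer.encode(text, add_special_tokens=False))


# Usage (requires network access and the ID of this repository on the Hub):
#   tokenizer = load_tokenizer("<this-repo-id>")
#   print(token_count(tokenizer, "The court dismissed the appeal."))
```

Trying to load the repository with `AutoModel` would be expected to fail, since there is nothing but the tokenizer to load.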