tokenizer

A copy of Meta's Llama tokenizer, with three tokens added to mask PII: