PDB Protein BPE Tokenizer

A byte-pair encoding (BPE) tokenizer for protein sequences, trained on PDB sequences with a vocabulary size of 1024.
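To make the idea concrete, here is a minimal pure-Python sketch of how BPE merges can be learned over amino-acid sequences. It is an illustration only, not the code used to build this tokenizer: the function names (`train_bpe`, `apply_merge`, `encode`), the three toy sequences, and the tiny target vocabulary of 12 are all assumptions made for the example. The actual tokenizer targets 1024 entries, and its training library, special tokens, and tie-breaking rules are not stated here.

```python
from collections import Counter


def apply_merge(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(sequences, vocab_size):
    """Learn BPE merges until the vocabulary reaches `vocab_size` (or no pairs remain)."""
    # Start from single residues: one token per amino-acid letter.
    corpus = [list(seq) for seq in sequences]
    vocab = sorted({tok for seq in corpus for tok in seq})
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent token pairs across the whole corpus.
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically
        # (an arbitrary choice made here for determinism).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        merged = best[0] + best[1]
        if merged not in vocab:
            vocab.append(merged)
        corpus = [apply_merge(seq, best) for seq in corpus]
    return vocab, merges


def encode(sequence, merges):
    """Tokenize a sequence by replaying the learned merges in order."""
    tokens = list(sequence)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


# Hypothetical toy sequences, not taken from the PDB.
toy_sequences = ["MKTAYIAKQR", "MKTAYLLKQR", "MKTGG"]
vocab, merges = train_bpe(toy_sequences, vocab_size=12)
print(merges)                   # [('K', 'T'), ('M', 'KT')]
print(encode("MKTGG", merges))  # ['MKT', 'G', 'G']
```

With only 10 distinct residues in the toy data, a target of 12 allows two merges, so the shared prefix `MKT` becomes a single token. A vocabulary of 1024 leaves room for many more multi-residue tokens on top of the amino-acid alphabet.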