PDB Protein BPE Tokenizer

A byte-pair encoding (BPE) tokenizer for protein sequences, trained on PDB (Protein Data Bank) sequences with a vocabulary size of 1024.
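
To illustrate what BPE does on amino-acid strings, here is a minimal pure-Python sketch. It is not the training code behind this tokenizer: the function names (`train_bpe`, `merge_pair`, `encode`), the toy sequences, the tie-breaking rule, and the tiny vocabulary size of 9 are all assumptions made for illustration, and special tokens are left out. The idea is the same, though: start from single residues and repeatedly merge the most frequent adjacent pair until the target vocabulary size (1024 for this tokenizer) is reached.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace every adjacent occurrence of `pair` in `tokens` with one merged token."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(sequences, vocab_size):
    """Learn BPE merges from protein sequences (hypothetical helper, illustration only)."""
    # The base vocabulary is the set of single residues seen in the corpus.
    corpus = [list(seq) for seq in sequences]
    vocab = sorted({tok for seq in corpus for tok in seq})
    merges = []
    while len(vocab) < vocab_size:
        # Count adjacent token pairs across all sequences.
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break  # nothing left to merge
        # Most frequent pair wins; ties are broken lexicographically (an assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        merged = best[0] + best[1]
        if merged not in vocab:
            vocab.append(merged)
        corpus = [merge_pair(seq, best) for seq in corpus]
    return vocab, merges


def encode(sequence, merges):
    """Tokenize a sequence by replaying the learned merges in order."""
    tokens = list(sequence)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


# Toy corpus (made up, not real PDB entries): 6 residues + 3 merges = 9 tokens.
toy_sequences = ["MKVLAAGMKVL", "MKVLGG"]
vocab, merges = train_bpe(toy_sequences, vocab_size=9)
print(merges)                   # [('K', 'V'), ('KV', 'L'), ('M', 'KVL')]
print(encode("MKVLA", merges))  # ['MKVL', 'A']
```

With a vocabulary of 1024, frequent multi-residue motifs become single tokens in the same way, so a sequence is typically encoded in fewer tokens than it has residues.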