Protein language model

ProtBert-BFD finetuned on a Rosetta 20/40/60AA dataset

This model is finetuned to predict Rosetta fold energy from a dataset of 300k protein sequences: 100k sequences of length 20AA, 100k of 40AA, and 100k of 60AA.

Current model in this repo: prot_bert_bfd-finetuned-032822_1323
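
As a minimal sketch of how the finetuned checkpoint might be used, assuming it was saved with a standard Hugging Face single-output regression head (the local path, the example sequence, and the num_labels=1 head are assumptions, not confirmed details of this repo):

```python
import re

import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Hypothetical local path to the checkpoint named above.
MODEL_PATH = "./prot_bert_bfd-finetuned-032822_1323"

# ProtBert tokenizers expect uppercase, space-separated residues.
tokenizer = BertTokenizer.from_pretrained(MODEL_PATH, do_lower_case=False)
# num_labels=1 assumes the finetuning used a single-output regression head.
model = BertForSequenceClassification.from_pretrained(MODEL_PATH, num_labels=1)
model.eval()

sequence = "MKTAYIAKQRQISFVKSHFS"  # illustrative 20AA sequence
# ProtBert convention: map rare residues (U, Z, O, B) to X,
# then insert spaces between residues before tokenizing.
spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(spaced, return_tensors="pt")
with torch.no_grad():
    energy = model(**inputs).logits.item()  # predicted Rosetta fold energy
print(energy)
```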

Performance

prot_bert_bfd from ProtTrans

The starting pretrained model comes from ProtTrans and was trained on 2.1 billion protein sequences from the BFD database using a masked language modeling (MLM) objective. It was introduced in the ProtTrans paper and first released in the ProtTrans repository.
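
For reference, the pretrained base model is available on the Hugging Face Hub as Rostlab/prot_bert_bfd and can be exercised directly with the MLM objective it was trained on, e.g. via a fill-mask pipeline (the sequence below is illustrative):

```python
from transformers import pipeline

# Load the pretrained ProtBert-BFD checkpoint with its MLM head.
unmasker = pipeline("fill-mask", model="Rostlab/prot_bert_bfd")

# Residues are space-separated; [MASK] marks the position to predict.
predictions = unmasker("D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T")
for p in predictions:
    print(p["token_str"], p["score"])
```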

Created by Ladislav Rampasek