biology

GPTNeo

This study aims to develop a model that predicts the immunogenicity of neoantigens. The model takes as input human or mouse MHC molecules and the peptide sequences of the corresponding neoantigens. Existing immunogenicity prediction models rely only on the combination of MHC molecule and peptide sequence, so they are biologically limited: they do not consider T-cell reactivity. To develop models that are biologically valid and predict more accurately, this study applied, in turn, various embeddings used in prior studies and analyzed protein sequences as language using GPT, a key natural language processing technology. In future work, we will gather evidence on the training and performance evaluation of the model produced in this study and on its biological validity.
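
As a rough illustration of treating protein sequences as text, the sketch below encodes an MHC sequence and a peptide into a single padded list of token ids, the kind of input a GPT-style classifier could consume. Everything in it is an assumption made for illustration and not the study's implementation: the residue-level vocabulary, the `<pad>`, `<unk>`, and `<sep>` tokens, the `encode_pair` helper, the `max_len` default, and the choice to represent the MHC molecule by an amino-acid sequence.

```python
# Hypothetical sketch: the vocabulary, special tokens, and helper name are
# illustrative assumptions, not the tokenization used in this study.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino-acid residues
SPECIAL_TOKENS = ["<pad>", "<unk>", "<sep>"]

# Map each token to an integer id; special tokens take the first ids.
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + list(AMINO_ACIDS))}


def encode_pair(mhc_sequence, peptide, max_len=64):
    """Encode an MHC sequence and a peptide as one padded list of token ids."""
    # One token per residue, with a separator between the two inputs.
    tokens = list(mhc_sequence.upper()) + ["<sep>"] + list(peptide.upper())
    # Residues outside the 20-letter alphabet fall back to <unk>.
    ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in tokens]
    # Truncate, then right-pad to a fixed length.
    ids = ids[:max_len]
    return ids + [VOCAB["<pad>"]] * (max_len - len(ids))
```

Under this vocabulary, `encode_pair("AC", "DE", max_len=8)` returns `[3, 4, 2, 5, 6, 0, 0, 0]`: two MHC residues, the separator, two peptide residues, and padding.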

Keywords: immunogenicity, embedding, GPT, classification

Research supported with Cloud TPUs from Google's TPU Research Cloud (TRC)