Label Semantics:

Label 0: Non-crystallizable (Negative)

Label 1: Crystallizable (Positive)

Dataset

  1. DeepCrystal Train
  2. DeepCrystal Test
  3. BCrystal Test
  4. SP Test
  5. TR Test

Model

ESMCrystal_t12_35M_v2

ESMCrystal_t12_35M_v2 is a state-of-the-art protein crystallization prediction model fine-tuned from esm2_t12_35M_UR50D, which has 12 layers and 35M parameters (approx. 136 MB). It uses transfer learning to predict whether an input protein sequence will crystallize.
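A minimal inference sketch, assuming the fine-tuned checkpoint loads as a standard Hugging Face sequence-classification model and that its two output logits follow the label order given under Label Semantics. `MODEL_ID` is a placeholder and not a confirmed repository name, and the helper names `softmax`, `label_from_logits`, and `predict` are illustrative only.

```python
import math

# Label ids as defined under "Label Semantics".
ID2LABEL = {0: "Non-crystallizable", 1: "Crystallizable"}

# Placeholder (assumption): replace with the real local path or hub id.
MODEL_ID = "path/to/ESMCrystal_t12_35M_v2"


def softmax(logits):
    """Numerically stable softmax over a plain list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits):
    """Map class logits to (label name, probability of that label)."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[idx], probs[idx]


def predict(sequence, model_id=MODEL_ID):
    """Classify one amino-acid sequence (needs torch and transformers)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return label_from_logits(logits)
```

Calling `predict(sequence)` with a protein sequence string returns a `(label, probability)` tuple. The argmax decision rule is an assumption; a different probability threshold would trade precision against recall.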

Accuracy:

    | Dataset          | Accuracy           |
    | DeepCrystal Test | 0.8161222339304531 |
    | BCrystal Test    | 0.8052602126468943 |
    | SP Test          | 0.7637130801687764 |
    | TR Test          | 0.8389328063241107 |

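Comparison Table:

    | Dataset | Count | Positives | Negatives | TP | FP | FN | TN | Precision | Recall | F1 | Accuracy | ROC AUC | MCC | Sensitivity | Specificity |
    | DeepCrystal Test | 1898 | 898 | 1000 | 579 | 30 | 319 | 970 | 0.95073892 | 0.64476615 | 0.76841407 | 0.81612223 | 0.9403 | 0.657526117 | 0.64476615 | 0.97 |
    | BCrystal Test | 1787 | 891 | 896 | 573 | 30 | 318 | 866 | 0.95024876 | 0.64309764 | 0.76706827 | 0.80526021 | 0.9396 | 0.644635696 | 0.64309764 | 0.96651786 |
    | SP Test | 237 | 148 | 89 | 97 | 5 | 51 | 84 | 0.95098039 | 0.65540541 | 0.776 | 0.76371308 | 0.9293 | 0.586069704 | 0.65540541 | 0.94382022 |
    | TR Test | 1012 | 374 | 638 | 225 | 14 | 149 | 624 | 0.94142259 | 0.60160428 | 0.73409462 | 0.83893281 | 0.9562 | 0.658766192 | 0.60160428 | 0.97805643 |

The positive class is crystallizable (label 1). Precision = TP / (TP + FP); Recall = Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP); MCC is the Matthews correlation coefficient. With these definitions TP + FN equals the Positives column and FP + TN equals the Negatives column, and the precision and recall values agree with the crystallizable rows under Final scores.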
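The derived columns can be recomputed from the four raw counts. The sketch below does so for DeepCrystal Test, reading the confusion matrix with rows as actual classes and columns as predicted classes, so that TP = 579, FN = 319, FP = 30, TN = 970. The function name `binary_metrics` is illustrative and not part of the model's code.

```python
import math


def _ratio(numerator, denominator):
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Derive summary metrics from binary confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),          # also called sensitivity
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
    }


# DeepCrystal Test counts (positive class: crystallizable).
deepcrystal = binary_metrics(tp=579, fp=30, fn=319, tn=970)
for name, value in deepcrystal.items():
    print(f"{name:12s} {value:.6f}")
```

The same call with the counts of the other three test sets recomputes their rows.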

Graphs

ROC-AUC Curve

PR-AUC Curve

Final scores:

DeepCrystal Test

                        precision    recall  f1-score   support
    non-crystallizable       0.75      0.97      0.85      1000
        crystallizable       0.95      0.64      0.77       898
              accuracy                           0.82      1898
             macro avg       0.85      0.81      0.81      1898
          weighted avg       0.85      0.82      0.81      1898

BCrystal Test

                        precision    recall  f1-score   support
    non-crystallizable       0.73      0.97      0.83       896
        crystallizable       0.95      0.64      0.77       891
              accuracy                           0.81      1787
             macro avg       0.84      0.80      0.80      1787
          weighted avg       0.84      0.81      0.80      1787

SP Test

                        precision    recall  f1-score   support
    non-crystallizable       0.62      0.94      0.75        89
        crystallizable       0.95      0.66      0.78       148
              accuracy                           0.76       237
             macro avg       0.79      0.80      0.76       237
          weighted avg       0.83      0.76      0.77       237

TR Test

                        precision    recall  f1-score   support
    non-crystallizable       0.81      0.98      0.88       638
        crystallizable       0.94      0.60      0.73       374
              accuracy                           0.84      1012
             macro avg       0.87      0.79      0.81      1012
          weighted avg       0.86      0.84      0.83      1012
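The per-class scores and the macro and weighted averages follow directly from a confusion matrix. The sketch below reproduces the first report (supports 1000 and 898, that is, DeepCrystal Test), assuming `matrix[i][j]` counts samples of actual class `i` predicted as class `j`, with classes ordered as non-crystallizable then crystallizable. `class_report` and `averages` are illustrative names, not functions from the model's code.

```python
METRICS = ("precision", "recall", "f1")


def class_report(matrix):
    """Per-class precision, recall, F1, and support from a square matrix."""
    n = len(matrix)
    rows = []
    for k in range(n):
        hits = matrix[k][k]
        support = sum(matrix[k])                        # actual class k
        predicted = sum(matrix[i][k] for i in range(n))  # predicted as k
        precision = hits / predicted if predicted else 0.0
        recall = hits / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append({"precision": precision, "recall": recall,
                     "f1": f1, "support": support})
    return rows


def averages(rows):
    """Unweighted (macro) and support-weighted means of each metric."""
    total = sum(r["support"] for r in rows)
    macro = {m: sum(r[m] for r in rows) / len(rows) for m in METRICS}
    weighted = {m: sum(r[m] * r["support"] for r in rows) / total
                for m in METRICS}
    return macro, weighted


# DeepCrystal Test: rows are actual [non-crystallizable, crystallizable].
matrix = [[970, 30], [319, 579]]
rows = class_report(matrix)
macro, weighted = averages(rows)
for name, r in zip(("non-crystallizable", "crystallizable"), rows):
    print(f"{name:>18s} {r['precision']:.2f} {r['recall']:.2f} "
          f"{r['f1']:.2f} {r['support']}")
```

Weighted recall equals overall accuracy, which is why the two values coincide in each report.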

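Confusion matrix:

Rows are actual classes and columns are predicted classes, with crystallizable first, so each matrix reads TP, FN on the first row and FP, TN on the second.

DeepCrystal Test

    | 579 | 319 |
    |  30 | 970 |

BCrystal Test

    | 573 | 318 |
    |  30 | 866 |

SP Test

    |  97 |  51 |
    |   5 |  84 |

TR Test

    | 225 | 149 |
    |  14 | 624 |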

Metrics

ROC score:

Matthews Coefficient:

NPV:

PPV:

Researchers:

Credits: