Label Semantics:

Label 0: Non-crystallizable (Negative)

Label 1: Crystallizable (Positive)

Dataset

  1. DeepCrystal Train
  2. DeepCrystal Test
  3. BCrystal Test
  4. SP Test
  5. TR Test

Model

ESMCrystal_t6_8M_v1

ESMCrystal_t6_8M_v1 is a state-of-the-art protein crystallization prediction model fine-tuned from esm2_t6_8M_UR50D, which has 6 layers and 8M parameters (approx. 31.4 MB in size). It uses transfer learning to predict whether an input protein sequence will crystallize.
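A minimal inference sketch may help show how the label semantics above map onto model output. It assumes the fine-tuned checkpoint exposes a two-logit sequence-classification head loadable through the Hugging Face `transformers` auto classes; `model_path` is a hypothetical placeholder for wherever the checkpoint lives, and `decode_logits` and `predict` are illustrative helper names, not part of the released model.

```python
import math

# Label mapping as stated in this model card.
ID2LABEL = {0: "Non-crystallizable", 1: "Crystallizable"}


def decode_logits(logits):
    """Turn raw class logits into (label name, softmax probability of that label)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


def predict(sequence, model_path):
    """Classify one protein sequence.

    `model_path` is a hypothetical local directory or hub id of the
    fine-tuned checkpoint (assumption: it has a 2-class head).
    """
    # Heavy dependencies are imported lazily so decode_logits works without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(model_path)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # expected shape: (1, 2)
    return decode_logits(logits[0].tolist())
```

Calling `decode_logits` directly on a pair of logits is enough to check the label mapping without downloading any weights.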

Accuracy:

| Dataset | Accuracy |
| --- | --- |
| DeepCrystal Test | 0.7913593256059009 |
| BCrystal Test | 0.7811975377728035 |
| SP Test | 0.6962025316455697 |
| TR Test | 0.8191699604743083 |

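Comparison Table:

Crystallizable (label 1) is the positive class. Column labels follow the orientation that agrees with the class sizes and the per-class reports: each row's TP + FN matches its Positives count (within 4 for DeepCrystal Test) and FP + TN equals its Negatives count.

| Dataset | Count | Positives | Negatives | TP | FN | FP | TN | Recall | Precision | F1 | Accuracy | ROC AUC | MCC | Sensitivity | Specificity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DeepCrystal Test | 1898 | 898 | 1000 | 532 | 362 | 34 | 966 | 0.5950783 | 0.93992933 | 0.72876712 | 0.79091869 | 0.9467 | 0.611906376 | 0.5950783 | 0.966 |
| BCrystal Test | 1787 | 891 | 896 | 531 | 360 | 31 | 865 | 0.5959596 | 0.94483986 | 0.73090158 | 0.78119754 | 0.9396 | 0.604504011 | 0.5959596 | 0.96540179 |
| SP Test | 237 | 148 | 89 | 80 | 68 | 4 | 85 | 0.54054054 | 0.95238095 | 0.68965517 | 0.69620253 | 0.9328 | 0.501728679 | 0.54054054 | 0.95505618 |
| TR Test | 1012 | 374 | 638 | 207 | 167 | 16 | 622 | 0.55347594 | 0.92825112 | 0.69346734 | 0.81916996 | 0.9562 | 0.615341231 | 0.55347594 | 0.97492163 |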
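The derived columns can be reproduced from the four confusion counts. The sketch below is one way to do that, assuming crystallizable is the positive class and reading the DeepCrystal Test counts as 532 true positives, 362 false negatives, 34 false positives and 966 true negatives (the orientation consistent with the 898/1000 class sizes and the per-class reports). `binary_metrics` is an illustrative helper, not the authors' evaluation code.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion counts.

    No zero-division guards: every denominator is assumed to be non-zero.
    """
    precision = tp / (tp + fp)      # also called PPV
    recall = tp / (tp + fn)         # also called sensitivity
    specificity = tn / (tn + fp)
    npv = tn / (tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "npv": npv,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# DeepCrystal Test counts, under the orientation assumed above.
metrics = binary_metrics(tp=532, fn=362, fp=34, tn=966)
for name, value in metrics.items():
    print(f"{name:12s} {value:.6f}")
```

F1, accuracy and the Matthews correlation coefficient are unchanged if FP and FN are exchanged, so only precision, recall, specificity and NPV depend on the assumed orientation.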

Graphs

ROC-AUC Curve

PR-AUC Curve

Final scores:

DeepCrystal Test

                          precision    recall  f1-score   support

    non-crystallizable         0.73      0.97      0.83      1000
        crystallizable         0.94      0.60      0.73       898

              accuracy                             0.79      1898
             macro avg         0.83      0.78      0.78      1898
          weighted avg         0.83      0.79      0.78      1898

BCrystal Test

                          precision    recall  f1-score   support

    non-crystallizable         0.71      0.97      0.82       896
        crystallizable         0.94      0.60      0.73       891

              accuracy                             0.78      1787
             macro avg         0.83      0.78      0.77      1787
          weighted avg         0.83      0.78      0.77      1787

SP Test

                          precision    recall  f1-score   support

    non-crystallizable         0.56      0.96      0.70        89
        crystallizable         0.95      0.54      0.69       148

              accuracy                             0.70       237
             macro avg         0.75      0.75      0.70       237
          weighted avg         0.80      0.70      0.69       237

TR Test

                          precision    recall  f1-score   support

    non-crystallizable         0.79      0.97      0.87       638
        crystallizable         0.93      0.55      0.69       374

              accuracy                             0.82      1012
             macro avg         0.86      0.76      0.78      1012
          weighted avg         0.84      0.82      0.81      1012

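Confusion matrix:

Rows are true classes (crystallizable, then non-crystallizable); columns are predicted classes in the same order.

DeepCrystal Test

    | 532 | 362 |
    |  34 | 966 |

BCrystal Test

    | 531 | 360 |
    |  31 | 865 |

SP Test

    |  80 |  68 |
    |   4 |  85 |

TR Test

    | 207 | 167 |
    |  16 | 622 |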
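For readers who want to rebuild matrices in this layout from their own predictions, the following sketch tallies one with rows as true classes and columns as predicted classes, listing label 1 (crystallizable) first. That ordering is an assumption inferred from the row sums matching the class sizes; the function name and the toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, labels=(1, 0)):
    """Count (true, predicted) pairs; rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


# Toy labels for illustration only (not taken from any of the test sets).
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0]

matrix = confusion_matrix(y_true, y_pred)
(tp, fn), (fp, tn) = matrix  # unpack in the row/column order assumed above

for row in matrix:
    print("| " + " | ".join(f"{count:3d}" for count in row) + " |")
```

With label 1 listed first, the top-left cell holds true positives and the bottom-right cell holds true negatives.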

Metrics

ROC score:

Matthews correlation coefficient:

NPV:

PPV:

Researchers:

Credits: