
Dataset description

The cytochrome P450 (CYP) genes encode enzymes involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. CYP3A4 in particular is an important enzyme found mainly in the liver and the intestine. It oxidizes small foreign organic molecules (xenobiotics), such as toxins or drugs, so that they can be removed from the body.

Task description

Binary classification. Given a drug SMILES string, predict CYP3A4 inhibition.

Dataset statistics

Total: 12,328 drugs

Dataset split

Random split: 70% training, 10% validation, and 20% testing.

To load the dataset in TDC, type

from tdc.single_pred import ADME
data = ADME(name = 'CYP3A4_Veith')
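The 70/10/20 random split described above can be illustrated with a small standalone sketch. This is not TDC's implementation: `random_split`, its `frac` and `seed` parameters, and the use of integer row indices as stand-ins for the dataset rows are all assumptions made for illustration.

```python
import random

def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test lists by fraction."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * frac[0])
    n_valid = int(len(shuffled) * frac[1])
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        # The test set takes the remainder, so no row is dropped by rounding.
        "test": shuffled[n_train + n_valid:],
    }

# Hypothetical stand-in for the 12,328 dataset rows (indices only).
split = random_split(range(12328))
sizes = {name: len(rows) for name, rows in split.items()}
```

Fixing the seed matters when comparing models, since a different shuffle produces a different test set.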

Model description

The CNN model applies a convolutional neural network to the SMILES string fingerprint of each drug. The model was tuned over 100 runs using the Ax platform. To load the pre-trained model, type

from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("CYP3A4_Veith-CNN")
# load deeppurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['YOUR SMILES STRING'])
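Because the task is binary classification, a score per SMILES string usually has to be turned into an inhibitor / non-inhibitor label. The sketch below assumes the model yields one score in [0, 1] per input and that 1 denotes an inhibitor; neither the output format of `predict_deeppurpose` nor the 0.5 cutoff is confirmed by this card, and `to_labels` is a hypothetical helper.

```python
def to_labels(scores, threshold=0.5):
    """Map scores in [0, 1] to binary labels: 1 = inhibitor, 0 = non-inhibitor."""
    return [1 if s >= threshold else 0 for s in scores]

# Hypothetical scores for three SMILES strings; not real model output.
example_scores = [0.91, 0.12, 0.5]
labels = to_labels(example_scores)
```

The threshold is a design choice: lowering it flags more compounds as inhibitors at the cost of more false positives, so it is often chosen on the validation set.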

References