
Dataset description

The CYP P450 genes are involved in the formation and breakdown (metabolism) of various molecules and chemicals within cells. Specifically, CYP3A4 is an important enzyme in the body, mainly found in the liver and in the intestine. It oxidizes small foreign organic molecules (xenobiotics), such as toxins or drugs, so that they can be removed from the body.

Task description

Binary classification. Given a drug SMILES string, predict CYP3A4 inhibition.

Dataset statistics

Total: 12,328 drugs

Dataset split

Random split: 70% training, 10% validation, and 20% testing.
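As a rough illustration of what a 70/10/20 random split means for 12,328 records, the sketch below shuffles a list of indices and cuts it into three parts. The helper name `random_split`, the seed, and the rounding rule (train and validation sizes are truncated, the test set takes the remainder) are assumptions made for this example; TDC's own splitting code may round differently.

```python
import random

def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items and cut them into train/valid/test lists (hypothetical helper)."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder absorbs rounding
    return train, valid, test

# Indices stand in for the 12,328 drug records.
train, valid, test = random_split(range(12328))
print(len(train), len(valid), len(test))
```

Because the shuffle is seeded, rerunning the sketch gives the same partition, which is what makes results on the test portion comparable across runs.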

To load the dataset in TDC, run:

from tdc.single_pred import ADME
data = ADME(name = 'CYP3A4_Veith')

Model description

Morgan chemical fingerprint with an MLP decoder. The model is tuned with 100 runs using the Ax platform.

from tdc import tdc_hf_interface
tdc_hf = tdc_hf_interface("CYP3A4_Veith-Morgan")
# load deeppurpose model from this repo
dp_model = tdc_hf.load_deeppurpose('./data')
tdc_hf.predict_deeppurpose(dp_model, ['YOUR SMILES STRING'])
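
To give a feel for what "fingerprint in, inhibition probability out" looks like, here is a toy forward pass through a small MLP in plain Python. Everything in it is an assumption for illustration: the 64-bit placeholder vector is random rather than a real Morgan fingerprint, the layer sizes, ReLU and sigmoid activations are guesses, and the weights are random instead of trained. The actual architecture and weights are those of the DeepPurpose model loaded above.

```python
import math
import random

def mlp_forward(bits, layers):
    """Pass a bit vector through dense layers: ReLU on hidden layers, sigmoid on the output."""
    x = [float(b) for b in bits]
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        if is_last:
            x = [1.0 / (1.0 + math.exp(-v)) for v in z]  # squash to a probability
        else:
            x = [max(0.0, v) for v in z]  # ReLU
    return x[0]

def random_layer(n_in, n_out, rng):
    """Build one dense layer with small random weights and zero biases (untrained)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
n_bits = 64  # hypothetical size, far smaller than a real fingerprint
fingerprint = [rng.randint(0, 1) for _ in range(n_bits)]  # placeholder bits, NOT a real Morgan fingerprint
layers = [random_layer(n_bits, 16, rng), random_layer(16, 1, rng)]

prob = mlp_forward(fingerprint, layers)
label = int(prob >= 0.5)  # 1 = predicted inhibitor, 0 = predicted non-inhibitor (0.5 cutoff is an assumption)
print(prob, label)
```

With untrained weights the output is meaningless as a prediction; the point is only the shape of the computation, a fixed-length bit vector reduced to one probability that is then thresholded into a binary label.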

References