Model Card for ReactionT5-product-prediction

This is a ReactionT5 model pre-trained to predict the products of chemical reactions. You can try it in the demo here.

Model Details

Model Sources

Uses

How to Get Started with the Model

Download files and use the code below to get started with the model.

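from transformers import AutoTokenizer, T5ForConditionalGeneration

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained('sagawa/ReactionT5-product-prediction')
model = T5ForConditionalGeneration.from_pretrained('sagawa/ReactionT5-product-prediction')

# The input is a single string: 'REACTANT:' followed by the reactant SMILES, then 'REAGENT:' followed by the reagent SMILES.
inp = tokenizer('REACTANT:COC(=O)C1=CCCN(C)C1.O.[Al+3].[H-].[Li+].[Na+].[OH-]REAGENT:C1CCOC1', return_tensors='pt')
# Greedy decoding (num_beams=1) that returns one predicted sequence.
output = model.generate(**inp, min_length=6, max_length=109, num_beams=1, num_return_sequences=1, return_dict_in_generate=True, output_scores=True)
# Decode the token IDs, remove the spaces between tokens, and strip any trailing '.'.
output = tokenizer.decode(output['sequences'][0], skip_special_tokens=True).replace(' ', '').rstrip('.')
output # 'O=S(=O)([O-])[O-].O=S(=O)([O-])[O-].O=S(=O)([O-])[O-].[Cr+3].[Cr+3]'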
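The input string and the post-processing in the example above can be wrapped in two small helpers. This is a minimal sketch: `build_input` and `clean_output` are hypothetical names that are not part of the ReactionT5 repository, and the `REACTANT:...REAGENT:...` layout is inferred from the single example above rather than from a documented specification.

```python
def build_input(reactants, reagents):
    """Join SMILES strings into a 'REACTANT:...REAGENT:...' input string.

    Hypothetical helper: the layout is inferred from the model card example.
    """
    # '.' is the SMILES separator for disconnected components.
    return f"REACTANT:{'.'.join(reactants)}REAGENT:{'.'.join(reagents)}"


def clean_output(decoded):
    """Mirror the post-processing above: drop spaces, strip trailing '.'."""
    return decoded.replace(" ", "").rstrip(".")


text = build_input(["COC(=O)C1=CCCN(C)C1", "O"], ["C1CCOC1"])
print(text)  # REACTANT:COC(=O)C1=CCCN(C)C1.OREAGENT:C1CCOC1
```

The resulting string can be passed to `tokenizer(...)` in place of the hard-coded literal, and `clean_output` can be applied to the result of `tokenizer.decode(...)`.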

Training Details

Training Procedure

We used the Open Reaction Database (ORD) dataset for model training. The following command was used for training. For more information, please refer to the paper and GitHub repository.

python train.py \
    --epochs=100 \
    --batch_size=32 \
    --data_path='../data/all_ord_reaction_uniq_with_attr_v3.csv' \
    --use_reconstructed_data \
    --pretrained_model_name_or_path='sagawa/CompoundT5'

Results

| Model | Training set | Test set | Top-1 [% acc.] | Top-2 [% acc.] | Top-3 [% acc.] | Top-5 [% acc.] |
|---|---|---|---|---|---|---|
| Sequence-to-sequence | USPTO | USPTO | 80.3 | 84.7 | 86.2 | 87.5 |
| WLDN | USPTO | USPTO | 80.6 (85.6) | 90.5 | 92.8 | 93.4 |
| Molecular Transformer | USPTO | USPTO | 88.8 | 92.6 | - | 94.4 |
| T5Chem | USPTO | USPTO | 90.4 | 94.2 | - | 96.4 |
| CompoundT5 | USPTO | USPTO | 88.0 | 92.4 | 93.9 | 95.0 |
| ReactionT5 | ORD | USPTO | 0.0 <85.0> | 0.0 <90.6> | 0.0 <92.3> | 0.0 <93.8> |

Performance comparison of CompoundT5, ReactionT5, and other models in product prediction. Values enclosed in ‘<>’ are the scores of the model after fine-tuning on 200 reactions from the USPTO dataset. The score enclosed in ‘()’ is the one reported in the original paper.
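The Top-k accuracy reported in the table is the percentage of test reactions whose true product appears among the model's first k predictions. A minimal sketch of that metric follows. It assumes exact string matching on SMILES that are already in a common canonical form, which may differ from the evaluation script used in the paper. `top_k_accuracy` is a hypothetical helper, and the data is made up for illustration.

```python
def top_k_accuracy(ranked_predictions, targets, k):
    """Percentage of reactions whose target is among the first k predictions."""
    hits = sum(
        target in preds[:k]
        for preds, target in zip(ranked_predictions, targets)
    )
    return 100.0 * hits / len(targets)


# Toy data (made up for illustration, not model output).
ranked = [
    ["CCO", "CC=O", "CC"],  # target ranked first
    ["CC", "CCN", "CCO"],   # target ranked second
    ["C", "CC", "CCC"],     # target not predicted
    ["CCC", "CC", "CO"],    # target ranked third
]
targets = ["CCO", "CCN", "CCCC", "CO"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, targets, k))
```

To obtain several ranked candidates per reaction, `model.generate` can be called with `num_beams` and `num_return_sequences` set to the desired k instead of 1.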
