<h1>Model description</h1>

This is a fine-tuned BioBERT model for extracting the relation between a clinical trial outcome and its significance level. The task is framed as sentence classification: for each candidate outcome-significance pair, the outcome mention is masked with @OUTCOME$ and the significance-level mention with @SIGNIFICANCE$, and the model predicts whether the two are related (label 1) or not (label 0).

For example, the sentence "Intubation conditions (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; P = 0.7) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups." contains two outcomes ("Intubation conditions", "failed first intubation attempts") and two significance levels ("P = 0.7", "P = 1.0"). The masked sentence for each pair and the expected label (tab-separated) are as follows:

@OUTCOME$ (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; @SIGNIFICANCE$) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups.	1

@OUTCOME$ (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; P = 0.7) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; @SIGNIFICANCE$) did not differ between the groups.	0

Intubation conditions (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; P = 0.7) and @OUTCOME$ (succinylcholine 32/200; rocuronium 36/201; @SIGNIFICANCE$) did not differ between the groups.	1

Intubation conditions (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; @SIGNIFICANCE$) and @OUTCOME$ (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups.	0
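
For illustration, a masked input can be generated from the character spans of the two mentions. The helper below is a minimal sketch, not part of the released code; the function name and the span format are ours:

	# Sketch: build one masked sentence per outcome/significance pair
	def mask_pair(sentence, outcome_span, significance_span):
		spans = sorted([(*outcome_span, '@OUTCOME$'), (*significance_span, '@SIGNIFICANCE$')], reverse=True)
		# Replace the later span first so the earlier span's offsets stay valid
		for start, end, placeholder in spans:
			sentence = sentence[:start] + placeholder + sentence[end:]
		return sentence

	sentence = "Intubation conditions (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; P = 0.7) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups."
	outcome, significance = "Intubation conditions", "P = 0.7"
	o, s = sentence.find(outcome), sentence.find(significance)
	# Prints the first masked example above (expected label: 1)
	print(mask_pair(sentence, (o, o + len(outcome)), (s, s + len(significance))))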

This is the second version of the model; the original model development was reported in:

Anna Koroleva, Patrick Paroubek. Extracting relations between outcome and significance level in Randomized Controlled Trials (RCTs) publications. Proceedings of the ACL BioNLP Workshop, 2019. https://aclanthology.org/W19-5038/

The original work was conducted within the scope of the PhD project "Assisted authoring for avoiding inadequate claims in scientific reporting", part of the Methods in Research on Research (MiRoR, http://miror-ejd.eu/) program.

Model creator: Anna Koroleva

<h1>Intended uses & limitations</h1>

The model was originally intended to be used as part of a pipeline for detecting spin (unjustified presentation of trial results) in articles reporting randomised controlled trials (see Anna Koroleva, Sanjay Kamath, Patrick MM Bossuyt, Patrick Paroubek. DeSpin: a prototype system for detecting spin in biomedical publications. Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, 2020. https://aclanthology.org/2020.bionlp-1.5/). It can also be used on its own to predict the relation between an outcome and a significance level.

The main limitation is that the model was trained on a fairly small dataset annotated by a single annotator. Annotating more data or involving more annotators was not possible within the PhD project.

<h1>How to use</h1>

The model should be used with the BioBERT tokeniser. Sample code for getting model predictions is given below:

	import numpy as np
	from transformers import AutoTokenizer
	from transformers import AutoModelForSequenceClassification

	# Load the BioBERT tokeniser and the fine-tuned relation classification model
	tokenizer = AutoTokenizer.from_pretrained('dmis-lab/biobert-v1.1')
	model = AutoModelForSequenceClassification.from_pretrained("aakorolyova/outcome_significance_relation")
	
	text1 = "@OUTCOME$ (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; @SIGNIFICANCE$) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups."
	text2 = "@OUTCOME$ (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; P = 0.7) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; @SIGNIFICANCE$) did not differ between the groups."
	
	# Tokenise the first masked sentence and take the argmax over the two logits
	tokenized_input1 = tokenizer(text1, padding="max_length", truncation=True, return_tensors='pt')
	output1 = model(**tokenized_input1)['logits']
	output1 = np.argmax(output1.detach().numpy(), axis=1)
	print(output1)  # expected: [1] (related)

	# Same for the second masked sentence
	tokenized_input2 = tokenizer(text2, padding="max_length", truncation=True, return_tensors='pt')
	output2 = model(**tokenized_input2)['logits']
	output2 = np.argmax(output2.detach().numpy(), axis=1)
	print(output2)  # expected: [0] (not related)
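
As a follow-up, both masked sentences can also be scored in a single batched call, reusing the tokeniser and model loaded above; label 1 means the masked outcome and significance level are related:

	# Batch both sentences and map predictions to human-readable labels
	tokenized = tokenizer([text1, text2], padding=True, truncation=True, return_tensors='pt')
	predictions = np.argmax(model(**tokenized)['logits'].detach().numpy(), axis=1)
	for text, prediction in zip([text1, text2], predictions):
		print(prediction, '-', 'related' if prediction == 1 else 'not related')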

Some more useful functions can be found in our GitHub repository: https://github.com/aakorolyova/DeSpin-2.0

<h1>Training data</h1>

Training data can be found at https://github.com/aakorolyova/DeSpin-2.0/tree/main/data/Outcome_significance_relation
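
Assuming the data uses the tab-separated masked-sentence/label format shown in the examples above (check the repository for the exact file names and layout), a minimal loading sketch could look like this:

	# Sketch: read "masked sentence<TAB>label" lines; the file name is a placeholder
	def load_pairs(path):
		sentences, labels = [], []
		with open(path, encoding='utf-8') as f:
			for line in f:
				line = line.rstrip('\n')
				if not line:
					continue
				sentence, label = line.rsplit('\t', 1)
				sentences.append(sentence)
				labels.append(int(label))
		return sentences, labels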

<h1>Training procedure</h1>

The model was fine-tuned using the Hugging Face Trainer API. Training scripts can be found at https://github.com/aakorolyova/DeSpin-2.0
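
For reference, fine-tuning with the Trainer API follows the standard sequence-classification recipe. The sketch below is illustrative only: the dataset wrapping and hyperparameters are our assumptions, not the actual training configuration, so see the repository for the real scripts.

	import torch
	from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

	tokenizer = AutoTokenizer.from_pretrained('dmis-lab/biobert-v1.1')
	model = AutoModelForSequenceClassification.from_pretrained('dmis-lab/biobert-v1.1', num_labels=2)

	# Toy training pair taken from the example above; real training uses the full annotated data
	train_sentences = ["@OUTCOME$ (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; @SIGNIFICANCE$) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups."]
	train_labels = [1]

	class PairDataset(torch.utils.data.Dataset):
		def __init__(self, sentences, labels):
			self.encodings = tokenizer(sentences, padding=True, truncation=True)
			self.labels = labels
		def __len__(self):
			return len(self.labels)
		def __getitem__(self, idx):
			item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
			item['labels'] = torch.tensor(self.labels[idx])
			return item

	# Hyperparameters below are placeholders, not the original settings
	training_args = TrainingArguments(output_dir='outcome_significance_relation',
		num_train_epochs=3,
		per_device_train_batch_size=16)
	trainer = Trainer(model=model, args=training_args, train_dataset=PairDataset(train_sentences, train_labels))
	trainer.train()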

<h1>Evaluation</h1>

Precision: 94.96%

Recall: 96.35%

F1: 95.65%
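
As a quick consistency check, F1 is the harmonic mean of precision and recall:

	# Harmonic mean of the reported precision and recall
	precision, recall = 0.9496, 0.9635
	print(2 * precision * recall / (precision + recall))  # ~0.9565, matching the reported F1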