# flan-t5-base-geoqa

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the `petroglyphs-nlp-consulting/res_syn_sentences_qa_lg` dataset.
## Model description

The `google/flan-t5-base` model has been fine-tuned on 24,750 question-answer pairs obtained from the `petroglyphs-nlp-consulting/res_syn_sentences_qa_lg` dataset.
## Intended uses & limitations

The model can be used for question-answering inference on Geosciences topics, specifically Oil and Gas reservoir geology.
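A minimal inference sketch with the `transformers` library is shown below. The repository id is an assumption inferred from the model name and may need adjusting, and the question is a made-up example:

```python
# Usage sketch: the repository id below is assumed from the model name,
# not confirmed by the card; adjust it to the actual Hub id if different.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "petroglyphs-nlp-consulting/flan-t5-base-geoqa"  # assumed id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Example geoscience question (illustrative only)
question = "What controls porosity in a sandstone reservoir?"
inputs = tokenizer(question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```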
## Training and evaluation data
- Training: 24,750 question-answer pairs
- Validation: 4,980 question-answer pairs
- Test: 250 question-answer pairs
Evaluation was run on the test set using the following functions (answer normalization, exact match, and token-level F1):

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase, remove punctuation and articles, and collapse whitespace."""
    def remove_articles(text):
        return re.sub(r'\b(a|an|the)\b', ' ', text)

    def white_space_fix(text):
        return ' '.join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_articles(remove_punc(lower(s))))

def f1_score(prediction, ground_truth):
    """Token-level F1 between a predicted and a gold answer."""
    prediction_tokens = normalize_answer(prediction).split()
    ground_truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(prediction_tokens)
    recall = 1.0 * num_same / len(ground_truth_tokens)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1

def exact_match_score(prediction, ground_truth):
    """True if prediction and gold answer are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)

def evaluate(gold_answers, predictions):
    """Average exact-match and F1 scores (as percentages) over all pairs."""
    f1 = exact_match = total = 0
    for gold_answer, prediction in zip(gold_answers, predictions):
        total += 1
        exact_match += exact_match_score(prediction, gold_answer)
        f1 += f1_score(prediction, gold_answer)
    exact_match = 100.0 * exact_match / total
    f1 = 100.0 * f1 / total
    return {'exact_match': exact_match, 'f1': f1}
```
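As a worked illustration of the token-level F1 computed above, consider a prediction that contains one extra token relative to the gold answer (the two strings are made-up examples):

```python
from collections import Counter

pred_tokens = "porosity of reservoir is high".split()   # 5 tokens
gold_tokens = "reservoir porosity is high".split()      # 4 tokens

# Multiset intersection of tokens, as in f1_score above
common = Counter(pred_tokens) & Counter(gold_tokens)
num_same = sum(common.values())                         # 4 shared tokens

precision = num_same / len(pred_tokens)                 # 4/5 = 0.8
recall = num_same / len(gold_tokens)                    # 4/4 = 1.0
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.889
```

Note that token order does not matter: F1 only compares the multisets of normalized tokens.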
Results on the test set:

- `exact_match`: 70.4
- `f1`: 83.08
### Comparison with vanilla google/flan-t5-base

- `exact_match`: 65.2
- `f1`: 80.16
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 16
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 6
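The reported total train batch size follows directly from the per-device batch size and the gradient accumulation steps; a quick sanity check:

```python
train_batch_size = 32
gradient_accumulation_steps = 16

# Effective batch size = per-step batch size x accumulation steps
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 512
```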
### Training results

- train_loss: 0.0675
- train_samples_per_second: 40.608
- train_steps_per_second: 0.079
- epoch: 15
### Training hardware

2 × RTX Titan GPUs (24 GB each)
### Framework versions
- Transformers 4.27.2
- Pytorch 2.0.0+cu117
- Datasets 2.10.1
- Tokenizers 0.13.2
## Environmental footprint

Compute ran on a single GPU, with the second card used only for memory sharing. Estimated energy use: 280 W × 3 h = 0.84 kWh; at 0.3 kg CO2 eq./kWh, this corresponds to about 0.25 kg CO2 eq.