Training parameters:

    from simpletransformers.classification import ClassificationArgs

    model_args = ClassificationArgs()
    model_args.max_seq_length = 512
    model_args.train_batch_size = 12
    model_args.eval_batch_size = 12
    model_args.num_train_epochs = 5
    model_args.evaluate_during_training = False
    model_args.learning_rate = 1e-5
    model_args.use_multiprocessing = False
    model_args.fp16 = False
    model_args.save_steps = -1
    model_args.save_eval_checkpoints = False
    model_args.no_cache = True
    model_args.reprocess_input_data = True
    model_args.overwrite_output_dir = True

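
The settings above could be passed to a Simple Transformers classifier roughly as follows. This is a minimal sketch, not the exact training script: the `train_boolq_classifier` helper, the `"roberta"` / `"roberta-base"` defaults, and the layout of `train_df` are assumptions made for illustration. Only the values in `TRAIN_ARGS` come from the parameters listed above.

```python
# Settings copied from the training parameters listed above.
TRAIN_ARGS = {
    "max_seq_length": 512,
    "train_batch_size": 12,
    "eval_batch_size": 12,
    "num_train_epochs": 5,
    "evaluate_during_training": False,
    "learning_rate": 1e-5,
    "use_multiprocessing": False,
    "fp16": False,
    "save_steps": -1,
    "save_eval_checkpoints": False,
    "no_cache": True,
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
}


def train_boolq_classifier(train_df, model_type="roberta",
                           model_name="roberta-base", use_cuda=False):
    """Hypothetical helper: fine-tune a binary classifier with TRAIN_ARGS.

    Assumptions: `train_df` is a pandas DataFrame with `text_a`, `text_b`
    and `labels` columns (question, passage, 0/1 answer). The default model
    type and name are placeholders, not the checkpoint used for this card.
    """
    # Imported lazily so the settings can be inspected without the library installed.
    from simpletransformers.classification import ClassificationModel

    model = ClassificationModel(
        model_type,
        model_name,
        num_labels=2,
        args=TRAIN_ARGS,  # a plain dict is accepted in place of ClassificationArgs
        use_cuda=use_cuda,
    )
    model.train_model(train_df)
    return model
```

With `save_steps = -1` and `save_eval_checkpoints = False`, intermediate checkpoints are skipped, so the output directory should hold only the end-of-epoch and final saves.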
Evaluation on BoolQ Test Set:

| | Precision | Recall | F1-score |
|---|---|---|---|
| 0 | 0.82 | 0.80 | 0.81 |
| 1 | 0.88 | 0.89 | 0.88 |
| accuracy | 0.86 | | |
| macro avg | 0.85 | 0.84 | 0.85 |
| weighted avg | 0.86 | 0.86 | 0.86 |

ROC AUC Score: 0.844
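
As a sanity check on the reported figures, the per-class F1 scores and the macro averages can be recomputed from the rounded precision and recall values. This sketch uses only the numbers shown in the table; the variable and function names are illustrative. Because the inputs are already rounded to two decimals, the results can differ from the reported values in the last digit. The weighted average is not recomputed because the per-class support counts are not listed.

```python
# Per-class (precision, recall) as reported in the evaluation table.
reported = {"0": (0.82, 0.80), "1": (0.88, 0.89)}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# F1 for each class, derived from its precision and recall.
per_class_f1 = {label: f1_score(p, r) for label, (p, r) in reported.items()}

# Macro averages are unweighted means over the two classes.
n_classes = len(reported)
macro_precision = sum(p for p, _ in reported.values()) / n_classes
macro_recall = sum(r for _, r in reported.values()) / n_classes
macro_f1 = sum(per_class_f1.values()) / n_classes

for label, f1 in per_class_f1.items():
    print(f"class {label}: F1 = {f1:.4f}")
print(f"macro precision = {macro_precision:.4f}")
print(f"macro recall    = {macro_recall:.4f}")
print(f"macro F1        = {macro_f1:.4f}")
```

The recomputed class F1 scores round to 0.81 and 0.88, matching the table. The macro recall comes out at about 0.845 from the rounded inputs, which is consistent with the reported 0.84 within rounding error.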