ast-finetuned-audioset-10-10-0.4593_ft_ESC-50_aug_0-1

This model is a fine-tuned version of MIT/ast-finetuned-audioset-10-10-0.4593, trained on a subset of the ashraq/esc50 dataset. It achieves the following results on the evaluation set:

Training and evaluation data

Training and evaluation data were augmented with the audiomentations library (GitHub: iver56/audiomentations). The following augmentation methods were applied, chosen on the basis of previous experiments (Elliott et al.: Tiny transformers for audio classification at the edge):

Gain

Noise

Speed adjust

Pitch shift

Time masking
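
To make the listed augmentations concrete, here is a minimal sketch of three of them (gain, noise, time masking) in plain Python. It is not the audiomentations implementation and not the configuration used for this model: the function names, the 16 kHz sample rate, the sine-wave input, and all parameter values are hypothetical. Speed adjust and pitch shift are left out because they require resampling.

```python
import math
import random


def apply_gain(samples, gain_db):
    """Scale the waveform by a gain expressed in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def add_gaussian_noise(samples, amplitude, rng):
    """Add zero-mean Gaussian noise with standard deviation `amplitude`."""
    return [s + rng.gauss(0.0, amplitude) for s in samples]


def time_mask(samples, start, length):
    """Silence `length` samples beginning at index `start`."""
    end = min(start + length, len(samples))
    return samples[:start] + [0.0] * (end - start) + samples[end:]


# Hypothetical input: one second of a 440 Hz sine wave at 16 kHz.
sample_rate = 16000
wave = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
augmented = apply_gain(wave, gain_db=-6.0)
augmented = add_gaussian_noise(augmented, amplitude=0.005, rng=rng)
augmented = time_mask(augmented, start=4000, length=1600)
```

In practice each transform would be applied with some probability and with randomly drawn parameters, which is what an augmentation library handles for you.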

Training hyperparameters

The following hyperparameters were used during training:

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|
| 9.9002 | 1.0 | 28 | 8.5662 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5.7235 | 2.0 | 56 | 4.3990 | 0.0357 | 0.0238 | 0.0357 | 0.0286 |
| 2.4076 | 3.0 | 84 | 2.2972 | 0.4643 | 0.7405 | 0.4643 | 0.4684 |
| 1.4448 | 4.0 | 112 | 1.3975 | 0.7143 | 0.7340 | 0.7143 | 0.6863 |
| 0.8373 | 5.0 | 140 | 1.0468 | 0.8571 | 0.8524 | 0.8571 | 0.8448 |
| 0.7239 | 6.0 | 168 | 0.8518 | 0.8929 | 0.9164 | 0.8929 | 0.8766 |
| 0.6504 | 7.0 | 196 | 0.7391 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
| 0.535 | 8.0 | 224 | 0.6682 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
| 0.4237 | 9.0 | 252 | 0.6443 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
| 0.3709 | 10.0 | 280 | 0.6304 | 0.9286 | 0.9449 | 0.9286 | 0.9244 |
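
In every row above, recall equals accuracy while precision and F1 differ. That pattern is what support-weighted averaging of per-class metrics produces, although the card does not state which averaging was used, so treat that as an assumption. The sketch below shows how such weighted metrics could be computed; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    total = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / total  # classes absent from y_true get weight 0
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; not taken from the model's evaluation.
y_true = ["dog", "dog", "rain", "rain", "clock"]
y_pred = ["dog", "rain", "rain", "rain", "clock"]
metrics = weighted_metrics(y_true, y_pred)
```

With weighted averaging, the per-class recalls are multiplied by class frequency and summed, which reduces to the fraction of correctly classified samples. That is why weighted recall always matches accuracy.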

Test results

| Parameter | Value |
|---|---|
| test_loss | 0.5829914808273315 |
| test_accuracy | 0.9285714285714286 |
| test_precision | 0.9446428571428571 |
| test_recall | 0.9285714285714286 |
| test_f1 | 0.930292723149866 |
| test_runtime (s) | 4.1488 |
| test_samples_per_second | 6.749 |
| test_steps_per_second | 3.374 |
| epoch | 10.0 |
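
The size of the test set is not stated, but it can be estimated from the figures above. The following is a rough back-of-the-envelope sketch, assuming the throughput values are simple ratios of sample and step counts to the runtime; the variable names are illustrative.

```python
# Values copied from the test results above.
test_runtime = 4.1488            # seconds
samples_per_second = 6.749
steps_per_second = 3.374
test_accuracy = 0.9285714285714286

# Assumption: throughput = count / runtime, so count = throughput * runtime.
n_samples = round(test_runtime * samples_per_second)   # 28
n_steps = round(test_runtime * steps_per_second)       # 14
n_correct = round(test_accuracy * n_samples)           # 26

print(n_samples, n_steps, n_correct)
```

Under that assumption the test set holds about 28 clips, of which 26 are classified correctly. With so few samples, each misclassification moves accuracy by roughly 3.6 percentage points, so the reported metrics should be read with that granularity in mind.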

Framework versions