Model description:
DistilBERT is created by applying knowledge distillation during the pre-training phase, which reduces the size of a BERT model by 40% while retaining 97% of its language understanding capabilities. It is smaller and faster than BERT (about 60% faster, as reported by the DistilBERT authors).
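To make the idea of knowledge distillation concrete, here is a minimal sketch of a soft-target loss, in which a student is trained to match a teacher's temperature-softened output distribution. This is an illustration only: the function names and the temperature of 2.0 are assumptions, and DistilBERT's actual training objective combines a term of this kind with a masked language modeling loss and a cosine embedding loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions.

    The temperature value is illustrative, not taken from this model card.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


teacher = [2.0, 0.5, -1.0]
# A student that agrees with the teacher incurs a lower loss
# than one whose ranking of the classes is reversed.
matching = soft_target_loss([2.0, 0.5, -1.0], teacher)
reversed_ranking = soft_target_loss([-1.0, 0.5, 2.0], teacher)
```

The loss is lowest when the student's softened distribution equals the teacher's, which is what pushes the smaller model to imitate the larger one.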
distilbert-base-uncased was fine-tuned on the fake news dataset with the following hyperparameters:
learning rate: 5e-5
batch size: 32
num_train_epochs: 2
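As a rough sketch, the hyperparameters above could be passed to the Hugging Face `TrainingArguments` class as shown here. Several things are assumptions rather than details from this model card: that training used the `Trainer` API, that "batch size" means the per-device training batch size, and the `output_dir` name, which is hypothetical. The helper `total_optimizer_steps` is also illustrative and assumes a single device with no gradient accumulation.

```python
import math

# Values stated in this model card.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    # Assumption: "batch size 32" refers to the per-device train batch size.
    "per_device_train_batch_size": 32,
    "num_train_epochs": 2,
}


def build_training_args(output_dir="distilbert-fake-news"):
    """Build TrainingArguments from the values above.

    The output_dir default is a hypothetical name. transformers is
    imported here so the rest of the sketch runs without it installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def total_optimizer_steps(num_examples, batch_size=32, epochs=2):
    """Optimizer steps for a run, assuming one device, no gradient
    accumulation, and that the last partial batch is kept."""
    return math.ceil(num_examples / batch_size) * epochs
```

For example, `total_optimizer_steps(1000)` gives 64 steps: 32 batches per epoch over 2 epochs.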
Full code available @ DistilBert-FakeNews
Dataset available @ Fake News dataset