medical

Dataset: https://www.kaggle.com/datasets/timmayer/covid-news-articles-2020-2022

A comprehensive guide can be found here: https://medium.com/@shankar.arunp/easily-build-your-own-gpt-from-scratch-using-aws-51811b6355d3

The model is GPT-2, further pre-trained on the news articles from the dataset above to incorporate COVID-19-related context.
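
As a rough illustration of what further pre-training a causal language model typically involves on the data side, the sketch below splits one long stream of token ids into fixed-length training examples. It is not taken from the linked guide: the function name `group_into_blocks`, the `block_size` value, and the toy token ids are all hypothetical, and a real run would use ids produced by the GPT-2 tokenizer.

```python
from typing import Dict, List


def group_into_blocks(token_ids: List[int], block_size: int) -> List[Dict[str, List[int]]]:
    """Split one long token stream into fixed-length training examples."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Drop the trailing remainder so every example has exactly block_size tokens.
    usable = (len(token_ids) // block_size) * block_size
    examples = []
    for start in range(0, usable, block_size):
        block = token_ids[start:start + block_size]
        # For causal LM training the labels are a copy of the inputs;
        # the next-token shift is usually applied inside the model's loss.
        examples.append({"input_ids": block, "labels": list(block)})
    return examples


# Toy stand-in for tokenized articles (hypothetical ids, not real GPT-2 tokens).
stream = list(range(10))
blocks = group_into_blocks(stream, block_size=4)
print(len(blocks))
print(blocks[0]["input_ids"])
```

Dropping the remainder keeps every example the same length, which avoids padding at the cost of discarding a few tokens at the end of the stream.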

A similar article on how to train a BERT base model from scratch on these articles can be found here: https://medium.com/@shankar.arunp/training-bert-from-scratch-on-your-custom-domain-data-a-step-by-step-guide-with-amazon-25fcbee4316a