This model takes noisy text extracted from PDFs and EPUBs, cleans it, and converts it to Markdown. It removes extraction artifacts such as page headers, footers, and page numbers.
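
To make the kind of residue concrete, here is a small rule-based sketch. It is not the model and does not reflect how the model works internally; it only illustrates the sort of input and cleaned output described above. The function name `strip_page_artifacts`, the `min_repeats` parameter, and the sample pages are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical rule-based stand-in (NOT the model): matches a line that is
# only a page number, e.g. "12" or "Page 12".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d+\s*$", re.IGNORECASE)


def strip_page_artifacts(pages, min_repeats=2):
    """Drop standalone page numbers and lines repeated across pages."""
    # Count on how many pages each distinct non-empty line appears.
    counts = Counter()
    for page in pages:
        seen = {line.strip() for line in page.splitlines() if line.strip()}
        counts.update(seen)

    cleaned = []
    for page in pages:
        for line in page.splitlines():
            text = line.strip()
            if not text:
                continue
            if PAGE_NUMBER.match(text):
                continue  # standalone page number
            if counts[text] >= min_repeats:
                continue  # running header/footer repeated across pages
            cleaned.append(text)
    return "\n".join(cleaned)


# Made-up sample input: two extracted pages with a running header.
pages = [
    "My Book Title\nFirst paragraph of text.\n1",
    "My Book Title\nSecond paragraph of text.\nPage 2",
]
print(strip_page_artifacts(pages))
```

A heuristic like this breaks easily, for example when a real sentence repeats across pages or a body line consists only of a number, which is the kind of case a learned model is better placed to handle.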