This is a ByT5-small model fine-tuned for Early Middle English lemmatization. It is a proof of concept (PoC). The model was fine-tuned on series of 11-grams extracted from the eLAEME corpus, each prefixed with "Lemmatize: ". It is not intended to serve as a general lemmatizer for all sorts of Middle English texts, because eLAEME employs bespoke transcription rules that diverge from conventional transcription practice.
The manx package, which you can use the model with, can be found here: https://github.com/mdm-code/manx. The package gives a more general look at the data used to fine-tune the model. It lets you download the corpus files, parse them, and get them ready for fine-tuning the base model checkpoint. It links to a Colab notebook and a ready-made API that lets you feed in texts to have them lemmatized.
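
If you would rather call the checkpoint directly, the sketch below shows one way the input format described above might be reproduced with the Hugging Face `transformers` library. It is not taken from manx. The helper names `build_inputs` and `lemmatize` are hypothetical, and splitting on whitespace into consecutive, non-overlapping chunks of up to 11 tokens is an assumption about how the 11-grams were built. Only the "Lemmatize: " prefix, the 11-gram size, and the repository ID come from this card.

```python
from typing import List

PREFIX = "Lemmatize: "  # task prefix named in this model card
NGRAM_SIZE = 11         # n-gram size named in this model card


def build_inputs(text: str, size: int = NGRAM_SIZE) -> List[str]:
    """Split text on whitespace into consecutive chunks of at most
    `size` tokens and prefix each chunk.

    Assumption: non-overlapping chunks. The exact windowing used for
    fine-tuning is not documented here and may differ.
    """
    tokens = text.split()
    return [
        PREFIX + " ".join(tokens[i:i + size])
        for i in range(0, len(tokens), size)
    ]


def lemmatize(
    text: str,
    model_id: str = "mdm-code/me-lemmatize-byt5-small",
) -> List[str]:
    """Run each prefixed chunk through the model and return the
    decoded output, one string per chunk."""
    inputs = build_inputs(text)
    if not inputs:
        return []

    # Imported lazily so build_inputs works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    batch = tokenizer(inputs, return_tensors="pt", padding=True)
    # max_new_tokens is an illustrative limit, not a tuned value.
    outputs = model.generate(**batch, max_new_tokens=256)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `lemmatize(...)` downloads the checkpoint on first use and requires `transformers` and PyTorch. Because eLAEME uses its own transcription conventions, input text should follow them for the output to be meaningful.
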
Make sure to reference this Hugging Face repository (https://huggingface.co/mdm-code/me-lemmatize-byt5-small) and the GitHub repository for manx (https://github.com/mdm-code/manx) whenever you use this model for your own research. The model and package are published under the GPL-3 license.