This project pretrains a `roberta-base` model on the Alemannic (`als`) subset of the OSCAR corpus using JAX/Flax. Pretraining uses the masked-language-modeling (MLM) objective.
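To make the MLM objective concrete, here is a minimal sketch of BERT-style token masking as typically used for RoBERTa pretraining: 15% of tokens are selected, and of those, 80% are replaced with the mask token, 10% with a random token, and 10% left unchanged. This is an illustrative NumPy helper with hypothetical names (`mask_tokens`, `mask_token_id`), not the project's actual data collator.

```python
import numpy as np

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15, seed=0):
    """BERT-style masking sketch (hypothetical helper, not the project's collator).

    Of the `mlm_prob` fraction of selected positions:
      80% -> mask token, 10% -> random token, 10% -> unchanged.
    Returns (masked_ids, labels); labels are -100 at unselected positions,
    the convention for positions ignored by the cross-entropy loss.
    """
    rng = np.random.default_rng(seed)
    input_ids = np.array(input_ids)
    labels = np.full_like(input_ids, -100)

    # Select ~15% of positions as prediction targets.
    selected = rng.random(input_ids.shape) < mlm_prob
    labels[selected] = input_ids[selected]

    # 80% of selected positions become the mask token.
    replace_mask = selected & (rng.random(input_ids.shape) < 0.8)
    input_ids[replace_mask] = mask_token_id

    # Half of the remaining selected positions (10% overall) become random tokens.
    random_mask = selected & ~replace_mask & (rng.random(input_ids.shape) < 0.5)
    input_ids[random_mask] = rng.integers(0, vocab_size, size=int(random_mask.sum()))

    return input_ids, labels
```

The loss is then computed only at positions where `labels != -100`, so the model must reconstruct the selected tokens from their context.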