This CompoundPiece model was trained only on the Stage 1 training data, i.e. with self-supervised training on hyphenated and non-hyphenated words scraped from the web. For details, see the paper "CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models".
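To give a rough idea of what self-supervision from hyphenated and non-hyphenated words could look like, the sketch below builds (input, target) pairs from a word list. It assumes that a hyphenated word is turned into an example by stripping its hyphens from the input and keeping them in the target, and that a non-hyphenated word maps to itself. The function name `make_stage1_pairs` and the pair format are hypothetical and chosen for illustration only; this is not the authors' data pipeline, and the paper describes the actual procedure and filtering.

```python
from typing import Iterable, List, Tuple


def make_stage1_pairs(words: Iterable[str], hyphen: str = "-") -> List[Tuple[str, str]]:
    """Build illustrative (input, target) pairs from raw words.

    Assumption (not taken from the model card): hyphens mark constituent
    boundaries, so the input is the word with hyphens removed and the
    target is the word with hyphens kept. Words without hyphens map to
    themselves and act as "do not split" examples.
    """
    pairs = []
    for word in words:
        # Drop empty pieces caused by leading, trailing, or doubled hyphens.
        parts = [p for p in word.split(hyphen) if p]
        if not parts:
            continue  # skip empty strings and bare hyphens
        source = "".join(parts)      # e.g. "highquality"
        target = hyphen.join(parts)  # e.g. "high-quality"
        pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    for source, target in make_stage1_pairs(["high-quality", "window"]):
        print(f"{source} -> {target}")
```

Non-hyphenated words are kept in this sketch so that a model trained on such pairs also sees words that should remain unsplit.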

Citation

@article{minixhofer2023compoundpiece,
  title={CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models},
  author={Minixhofer, Benjamin and Pfeiffer, Jonas and Vuli{\'c}, Ivan},
  journal={arXiv preprint arXiv:2305.14214},
  year={2023}
}

License

MIT