Goal
This model can be used to add emoji to an input text.
To accomplish this, we framed the task as token classification: for each word/token, the model predicts the emoji that should follow it, and that emoji is treated as the token's entity label.
The accompanying demo, which includes all the pre- and postprocessing needed, can be found here.
For the moment, this only works for Dutch texts.
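As a rough illustration of this framing, the postprocessing step could look like the sketch below. It takes one predicted label per word and re-inserts the emoji behind the labelled words. The function name `add_emoji`, the word-level `tokens`/`labels` inputs and the spacing rules are assumptions made for illustration, not the demo's actual code; a real model typically predicts per subword, so its output would first have to be aggregated to word level.

```python
import string

# Single-character punctuation marks that should not be preceded by a space.
PUNCTUATION = set(string.punctuation)


def add_emoji(tokens, labels):
    """Rebuild a text, appending the predicted emoji after each labelled token.

    `tokens` and `labels` are parallel lists; a label is either "O" or
    "B-<emoji>" (hypothetical word-level model output).
    """
    text = ""
    for token, label in zip(tokens, labels):
        # Separate words with a space, but attach punctuation directly.
        if text and token not in PUNCTUATION:
            text += " "
        text += token
        if label.startswith("B-"):
            # Strip the "B-" prefix to recover the emoji itself.
            text += " " + label[2:]
    return text


if __name__ == "__main__":
    tokens = ["Wat", "een", "coole", "auto", "!"]
    labels = ["O", "O", "O", "B-🚗", "O"]
    print(add_emoji(tokens, labels))
```

Keeping the emoji in the label rather than in the token sequence means the input text is never altered by the model itself; the emoji only appear when the labels are merged back in.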
Dataset
For this model, we scraped about 1000 unique tweets per emoji we support: ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']
Such tweets could look like this:
Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾
After some processing, we can recast this into a more familiar NER format:
Word | Label |
---|---|
Wow | B-😍 |
, | O |
what | O |
a | O |
cool | O |
car | O |
! | B-🚗 |
This format can then be used to train a token classification model.
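The conversion from a raw tweet to word/label rows might be sketched as follows. This is not the original preprocessing code: the function name `tweet_to_ner_rows`, the regex tokenizer and the rule "attach each emoji to the token immediately before it, collapsing repeats" are assumptions, and the authors' processing may place labels differently, for instance around punctuation.

```python
import re

# The ten emoji supported by the model, as listed above.
EMOJIS = {"😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"}


def tweet_to_ner_rows(text):
    """Turn a tweet into (word, label) rows with the emoji moved into labels."""
    # Drop the emoji variation selector so "☕\ufe0f" matches the plain "☕".
    text = text.replace("\ufe0f", "")
    # Words stay whole; every other non-space character becomes its own token.
    tokens = re.findall(r"\w+|[^\w\s]", text)

    rows = []
    for token in tokens:
        if token in EMOJIS:
            # Label the preceding token, unless it already carries a label
            # (repeated emoji) or there is no preceding token at all.
            if rows and rows[-1][1] == "O":
                rows[-1] = (rows[-1][0], f"B-{token}")
        else:
            rows.append((token, "O"))
    return rows


if __name__ == "__main__":
    for word, label in tweet_to_ner_rows("Omg, I hate mondays 😠... I need a drink 🍾"):
        print(f"{word} | {label}")
```

With this rule, `mondays` receives `B-😠` and `drink` receives `B-🍾`, while every other token is labelled `O`.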
Unfortunately, the Terms of Service prohibit us from sharing the original dataset.
Training
The model was trained for 4 epochs.
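A matching fine-tuning setup might start from something like the sketch below, assuming the Hugging Face `transformers` library is used. Only the emoji list and the 4 epochs come from this card; the `B-`-only label scheme is inferred from the table in the Dataset section, and `build_training_args`, the `output_dir` value and every setting left at its default are assumptions rather than the authors' configuration.

```python
# Emoji supported by the model, as listed in the Dataset section.
EMOJIS = ["😨", "😥", "😍", "😠", "🤯", "😄", "🍾", "🚗", "☕", "💰"]

# One "outside" label plus one B- label per emoji (assumed label scheme).
LABELS = ["O"] + [f"B-{emoji}" for emoji in EMOJIS]
label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}


def build_training_args(output_dir="emoji-ner"):
    """Return training arguments with the 4 epochs mentioned above.

    `output_dir` is a hypothetical path; all other hyperparameters are
    left at the library defaults because the card does not state them.
    """
    # Imported lazily so the label maps can be used without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, num_train_epochs=4)


if __name__ == "__main__":
    print(len(LABELS), "labels:", LABELS)
```

The `label2id` and `id2label` dictionaries are in the shape that `AutoModelForTokenClassification.from_pretrained` accepts through its `label2id` and `id2label` keyword arguments.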