
Goal

This model can be used to add emoji to an input text.

To accomplish this, we framed the problem as a token-classification task: for each word/token, the model predicts, as an entity label, the emoji that should follow it.

The accompanying demo, which includes all the pre- and postprocessing needed, can be found here.

For the moment, this only works for Dutch texts.
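For illustration, a minimal inference sketch is shown below, assuming the model is loaded through the Hugging Face token-classification pipeline. The checkpoint id is a placeholder, and the real pre- and postprocessing are the ones in the demo.

```python
from transformers import pipeline

# Placeholder model id; substitute the checkpoint published with this card.
tagger = pipeline("token-classification",
                  model="your-org/dutch-emoji-tagger",
                  aggregation_strategy="simple")

def add_emoji(text: str) -> str:
    """Insert the predicted emoji directly after each word the model tags."""
    predictions = tagger(text)
    # With labels like "B-😍", the aggregated entity_group is just the emoji.
    insertions = sorted((p["end"], p["entity_group"]) for p in predictions)
    out, last = [], 0
    for end, emoji in insertions:
        out.append(text[last:end] + " " + emoji)
        last = end
    out.append(text[last:])
    return "".join(out)

print(add_emoji("Wow, wat een coole auto!"))  # e.g. "Wow 😍, wat een coole auto 🚗!"
```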

Dataset

For this model, we scraped about 1000 unique tweets per emoji we support: ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']

The raw tweets could look like this:

Wow 😍😍, what a cool car 🚗🚗!
Omg, I hate mondays 😠... I need a drink 🍾

After some preprocessing, we can rewrite this in the familiar NER (BIO) format:

Word   Label
Wow    B-😍
,      O
what   O
a      O
cool   O
car    B-🚗
!      O

This format can then be used to train a token-classification model.
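A rough sketch of this conversion, assuming whitespace/punctuation tokenization and only the ten supported emoji (the demo's actual preprocessing may differ):

```python
import re

EMOJI = ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']
# Split into words, individual emoji and punctuation, keeping their order.
TOKEN_RE = re.compile(r"[\w']+|[😨😥😍😠🤯😄🍾🚗☕💰]|[^\w\s]")

def tweet_to_bio(tweet: str):
    """Turn a raw tweet into (word, label) pairs.

    Emoji are dropped from the token stream; the word directly before a run
    of emoji gets the label B-<emoji>, all other words are labelled O.
    """
    pairs = []
    for token in TOKEN_RE.findall(tweet):
        if token in EMOJI:
            if pairs and pairs[-1][1] == "O":
                pairs[-1][1] = f"B-{token}"   # label the preceding word
        else:
            pairs.append([token, "O"])
    return pairs

print(tweet_to_bio("Wow 😍😍, what a cool car 🚗🚗!"))
# [['Wow', 'B-😍'], [',', 'O'], ['what', 'O'], ['a', 'O'],
#  ['cool', 'O'], ['car', 'B-🚗'], ['!', 'O']]
```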

Unfortunately, Twitter's Terms of Service prohibit us from sharing the original dataset.

Training

The model was trained for 4 epochs.
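A rough training sketch with the Hugging Face Trainer is given below. Only the number of epochs comes from this card; the base checkpoint (a Dutch RoBERTa model is assumed here), batch size, learning rate, and the toy dataset are placeholders, not the exact setup used.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          TrainingArguments, Trainer)

EMOJI = ['😨', '😥', '😍', '😠', '🤯', '😄', '🍾', '🚗', '☕', '💰']
label_list = ["O"] + [f"B-{e}" for e in EMOJI]
label2id = {label: i for i, label in enumerate(label_list)}

# Assumed Dutch base checkpoint; not necessarily the one actually used.
checkpoint = "pdelobelle/robbert-v2-dutch-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint, add_prefix_space=True)
model = AutoModelForTokenClassification.from_pretrained(
    checkpoint, num_labels=len(label_list))

# Toy stand-in for the scraped tweets, already in (words, BIO labels) form.
examples = [{"tokens": ["Wow", ",", "what", "a", "cool", "car", "!"],
             "tags":   ["B-😍", "O", "O", "O", "O", "B-🚗", "O"]}]

def tokenize_and_align(example):
    enc = tokenizer(example["tokens"], is_split_into_words=True, truncation=True)
    labels, prev = [], None
    for wid in enc.word_ids():
        # Label only the first sub-token of each word; mask the rest with -100.
        labels.append(-100 if wid is None or wid == prev
                      else label2id[example["tags"][wid]])
        prev = wid
    enc["labels"] = labels
    return enc

train_dataset = Dataset.from_list(examples).map(
    tokenize_and_align, remove_columns=["tokens", "tags"])

args = TrainingArguments(output_dir="emoji-tagger",
                         num_train_epochs=4,              # as stated above
                         per_device_train_batch_size=16,  # assumption
                         learning_rate=5e-5)              # assumption
trainer = Trainer(model=model, args=args, train_dataset=train_dataset,
                  data_collator=DataCollatorForTokenClassification(tokenizer))
trainer.train()
```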