Eurovision-inspired Lyrics Generator with Bidirectional LSTM
Model
This is a bidirectional LSTM text generation model trained on a corpus of Eurovision song lyrics translated into English, preprocessed into a vocabulary of 8,274 tokens (plus an OOV token) and 172,249 n-gram sequences. Each input sequence is pre-padded to a maximum length of 202 tokens.
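A minimal sketch of what such a model could look like in Keras. The embedding dimension and LSTM width here are assumptions for illustration, not taken from the repo; only the vocabulary size (8,274 + OOV) and the input length (202-token sequences minus the label token) come from the description above.

```python
import tensorflow as tf

vocab_size = 8275   # 8,274 tokens + 1 OOV token (from the model card)
max_len = 201       # 202-token sequences minus the final label token

# Hypothetical architecture sketch: embedding -> bidirectional LSTM -> softmax.
# Layer sizes (100, 150) are placeholder assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(max_len,)),
    tf.keras.layers.Embedding(vocab_size, 100),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(150)),
    tf.keras.layers.Dense(vocab_size, activation='softmax'),
])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
```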
The model was built with TensorFlow 2.12.0.
Dataset
The model was trained on the following dataset from Kaggle: https://www.kaggle.com/datasets/minitree/eurovision-song-lyrics
Usage
Use the tokenizer.json in this repo to convert your input text into sequences. The maximum sequence length is 202, but the model was trained on the n-grams of each sequence, with the last token of each n-gram serving as the label. The features are therefore at most 201 tokens long, so when running inference, the maxlen parameter of pad_sequences must be 201.
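The n-gram construction described above can be sketched as follows. This is an assumption about the preprocessing inferred from the description, not code from the repo: every prefix of a tokenized line becomes one training example, with the final token as the label.

```python
def make_ngrams(token_ids):
    """Return (features, label) pairs for every n-gram prefix of a sequence."""
    pairs = []
    for i in range(2, len(token_ids) + 1):
        ngram = token_ids[:i]
        # All tokens but the last are the features; the last token is the label.
        pairs.append((ngram[:-1], ngram[-1]))
    return pairs

make_ngrams([5, 12, 7, 3])
# → [([5], 12), ([5, 12], 7), ([5, 12, 7], 3)]
```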
In TensorFlow:
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Assumes `tokenizer` has been loaded from tokenizer.json
# (e.g. with tf.keras.preprocessing.text.tokenizer_from_json)
# and `model` is the trained model.

seed_text = "Once when I was a child"
next_words = 10

for _ in range(next_words):
    # Convert the running text to token ids and pre-pad to length 201
    sequence = tokenizer.texts_to_sequences([seed_text])[0]
    sequence = pad_sequences([sequence], maxlen=201, padding='pre')
    # Predict the next token and map it back to a word
    prediction = model.predict(sequence, verbose=0)
    prediction = np.argmax(prediction, axis=-1).item()
    output = tokenizer.index_word[prediction]
    seed_text += " " + output

print(seed_text)
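The loop above always takes the argmax, which can make the output repetitive. A common alternative (not part of this repo) is to sample from the softmax output with a temperature; a minimal sketch:

```python
import numpy as np

def sample_token(probs, temperature=1.0, rng=None):
    """Sample a token id from a probability vector, reshaped by temperature.

    Lower temperatures concentrate mass on the most likely token;
    higher temperatures flatten the distribution.
    """
    if rng is None:
        rng = np.random.default_rng()
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-9) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))
```

To use it, replace the `np.argmax(...)` line in the loop with `prediction = sample_token(prediction[0], temperature=0.8)`.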