Eurovision-inspired Lyrics Generator with Bidirectional LSTM

Model

This is a bidirectional LSTM text-generation model trained on a corpus of Eurovision song lyrics translated into English, preprocessed into a vocabulary of 8,274 tokens (plus an out-of-vocabulary token) and 172,249 n-gram sequences. Each input sequence is pre-padded to a maximum length of 202 tokens.
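The n-gram sequences are typically built by taking every prefix of each tokenized line and pre-padding it with zeros. The exact preprocessing script is not included in this repo, so the following is a minimal sketch of that common recipe in plain Python, using already-integer-encoded toy lines:

```python
# Sketch of the usual n-gram preprocessing recipe (hypothetical: the exact
# pipeline used to train this model is not part of the repo).
def make_ngram_sequences(token_lines, max_len):
    """For each tokenized line, emit every n-gram prefix, pre-padded with 0s."""
    sequences = []
    for tokens in token_lines:
        # Prefixes of length 2..len(tokens): at least one feature plus a label.
        for i in range(2, len(tokens) + 1):
            ngram = tokens[:i]
            padded = [0] * (max_len - len(ngram)) + ngram
            sequences.append(padded)
    return sequences

# Toy example: two integer-encoded lyric lines, padded to length 6.
lines = [[5, 12, 7], [3, 9]]
seqs = make_ngram_sequences(lines, max_len=6)
# Each row has the same length; the features would be row[:-1], the label row[-1].
```

At full scale (max_len=202, the real corpus) this kind of loop yields the 172,249 training sequences mentioned above.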

The model was built with TensorFlow 2.12.0.

Dataset

The model was trained on the Eurovision Song Lyrics dataset from Kaggle: https://www.kaggle.com/datasets/minitree/eurovision-song-lyrics

Usage

Use the tokenizer.json file in this repo to convert your input text into sequences. Sequences have a maximum length of 202 tokens, but the model was trained on n-gram prefixes of each sequence, with the last token of each prefix held out as the label. The features are therefore at most 201 tokens long, so when running inference the maxlen parameter of pad_sequences must be 201.
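To see why maxlen is 201 rather than 202, consider how the training pairs are split out of the padded sequences. This is a sketch under the assumption that training followed the standard features/label split (the training script itself is not part of this repo); the toy rows are shortened to length 6:

```python
import numpy as np

# Toy batch of pre-padded n-gram sequences (length 6 here; 202 in the real model).
sequences = np.array([
    [0, 0, 0, 4, 8, 15],
    [0, 0, 2, 4, 8, 16],
])

# Features are every token but the last; the final token is the label.
xs = sequences[:, :-1]      # shape (2, 5); at full scale, (N, 201)
labels = sequences[:, -1]   # shape (2,); the token the model learns to predict
```

Because the model only ever saw 201-token feature vectors, inference inputs must be padded (or truncated) to that same length.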

In TensorFlow:

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

seed_text = "Once when I was a child"
next_words = 10

for _ in range(next_words):
  # Encode the running text and pre-pad it to the trained feature length.
  sequence = tokenizer.texts_to_sequences([seed_text])[0]
  sequence = pad_sequences([sequence], maxlen=201, padding='pre')
  prediction = model.predict(sequence, verbose=0)

  # Pick the most likely next token and map it back to a word.
  predicted_index = np.argmax(prediction, axis=-1).item()
  output = tokenizer.index_word[predicted_index]
  seed_text += " " + output

print(seed_text)
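Greedy argmax decoding tends to fall into repetitive loops on lyrics models. A common alternative (not part of the original example, so treat it as an optional sketch) is to sample the next token from the predicted distribution, sharpened or flattened by a temperature:

```python
import numpy as np

def sample_from_probs(probs, temperature=1.0, rng=None):
    """Sample a token index from a probability vector.

    Lower temperatures sharpen the distribution (approaching argmax);
    higher temperatures flatten it, giving more varied lyrics.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Rescale log-probabilities by temperature, then renormalize.
    logits = np.log(np.asarray(probs, dtype=float) + 1e-9) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))

# In the generation loop, instead of np.argmax you could use:
#   sample_from_probs(prediction[0], temperature=0.7)
```

Temperatures around 0.7 to 1.0 are a typical starting point; values near 0 reproduce the greedy behavior above.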