sklearn machine learning movie-genre-prediction multi-class classification

Model Details

Model Description

The goal of the competition is to design a predictive model that accurately classifies movies into their respective genres based on their titles and synopses.

The model takes movie_name and synopsis, combined into a single string, as input and outputs the predicted genre of the movie.

Model Sources

Training Details

We use the Multinomial Naive Bayes algorithm because it works well with sparse vectorized data, which here is built from movie_name and synopsis. The output of the model is a single genre class (one of 10 classes).
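The setup described above can be sketched roughly as follows. This is a minimal illustration, not the exact training script: the toy rows, the two genre labels, and the `combine` helper are hypothetical stand-ins for the competition data, and default scikit-learn settings are assumed throughout.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy rows (movie_name, synopsis, genre); the real data has 10 genres.
rows = [
    ("Space Raiders", "A crew of astronauts fights aliens on a distant planet.", "scifi"),
    ("Starfall", "Astronauts discover an alien ship drifting in space.", "scifi"),
    ("Love in Paris", "Two strangers fall in love during a summer in Paris.", "romance"),
    ("The Wedding Letter", "A lost love letter reunites a couple before their wedding.", "romance"),
]


def combine(movie_name, synopsis):
    """Join title and synopsis into one string (hypothetical helper)."""
    return f"{movie_name} {synopsis}"


texts = [combine(name, synopsis) for name, synopsis, _ in rows]
genres = [genre for _, _, genre in rows]

# TF-IDF produces a sparse matrix, which MultinomialNB accepts directly.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, genres)

# Predict the genre of an unseen (made-up) movie.
predicted = model.predict([combine("Alien Orbit", "Astronauts meet aliens in space.")])
print(predicted[0])
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned at training time tied to the classifier, so the same transformation is applied at prediction time.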

Training Data

All the Training and Test Data can be found here:



  1. Label Encoding
  2. Tokenization
  3. TF-IDF Vectorization
  4. Removal of digits, special characters, symbols, extra spaces and stop words from the textual data
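As a rough sketch of the cleaning, tokenization, and label-encoding steps listed above, using only the Python standard library. The stop-word list and the function names are assumptions made for illustration; the actual list and helpers used for this model are not given here.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the list used for this model).
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is"}


def clean_text(text):
    """Lowercase, strip digits and symbols, then tokenize and drop stop words."""
    text = text.lower()
    # Replace anything that is not a letter or whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Splitting on whitespace tokenizes and also collapses extra spaces.
    tokens = text.split()
    return [token for token in tokens if token not in STOP_WORDS]


def encode_labels(labels):
    """Map each genre name to an integer id, in sorted order of the names."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


tokens = clean_text("The 2 Heroes:   a tale of courage & hope!!")
encoded, mapping = encode_labels(["horror", "action", "horror"])
print(tokens)   # ['heroes', 'tale', 'courage', 'hope']
print(encoded)  # [1, 0, 1]
```

The cleaned tokens can be joined back into a string and passed to a TF-IDF vectorizer such as scikit-learn's `TfidfVectorizer`, which covers the remaining step in the list.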


The evaluation metric is accuracy, as specified in the competition.
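For reference, accuracy is the fraction of predictions that match the true labels. A minimal sketch with made-up labels (the values below are illustrative, not results from this model):

```python
def accuracy(y_true, y_pred):
    """Return the share of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 3 of the 4 predictions are correct.
y_true = ["action", "horror", "crime", "action"]
y_pred = ["action", "horror", "action", "action"]
score = accuracy(y_true, y_pred)
print(score)  # 0.75
```

scikit-learn's `accuracy_score` in `sklearn.metrics` computes the same quantity.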