IMDb movie database

Adrián Rodríguez Saínz
4 min read · Feb 16, 2021

This blog post is part of a Data Science course using Python and Pandas.

For this post, I used publicly available IMDb datasets. To make the data more relevant and easier to use, I narrowed down the movies to be analysed using two conditions. Firstly, I only took movies made after 1960 into account, because the datasets have missing data for older movies. Secondly, I only considered movies that have received more than a thousand votes, to filter out potential outliers.
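The two conditions can be expressed as a single boolean mask in Pandas. The snippet below is only a minimal sketch: the rows are made up, and the column names (`title`, `year`, `votes`, `rating`) are assumptions for illustration, not necessarily the names used in the actual analysis.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D"],
        "year": [1955, 1972, 1999, 2015],
        "votes": [5000, 800, 120000, 1500],
        "rating": [7.9, 6.1, 8.3, 5.4],
    }
)

# Keep only movies made after 1960 that received more than 1,000 votes.
after_1960 = movies["year"] > 1960
enough_votes = movies["votes"] > 1000
filtered = movies[after_1960 & enough_votes]

print(filtered["title"].tolist())
```

Naming the two masks separately keeps each condition readable and makes it easy to check how many rows each one removes.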

This post will focus on the following questions:

· Is there a relationship between the number of people who vote and the average vote a given movie is getting?
· What types of movies are the most well reviewed?
· Did the length of movies change over time?
· Has there been an increase in popularity for a given genre?

Is there a relationship between the number of people who vote and the average vote a given movie is getting?

This question was really interesting to me, as I have always thought people are more likely to give feedback on negative experiences than on good ones. As it turned out, this is not the case for IMDb movie ratings; the opposite holds.

Plotting each movie's number of votes received (Votes) against the average of those votes (AR) makes it clear that higher-rated movies tend to receive significantly more votes as well.

Votes are in millions

This does not prove that people are more likely to rate good movies, but it shows that, overall, good movies receive a larger number of votes.
This might be because people tend to watch movies based on the recommendations of others, so many more people end up watching the higher-rated movies than the lower-rated ones.
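One way to put a number on the relationship seen in the scatter plot is a correlation coefficient. The sketch below uses invented values and assumed column names (`votes`, `rating`); it computes the ordinary Pearson correlation and a rank-based one, which is less sensitive to the handful of movies with millions of votes.

```python
import pandas as pd

# Hypothetical values chosen only to illustrate the calculation.
movies = pd.DataFrame(
    {
        "votes": [1200, 5000, 40000, 300000, 1500000],
        "rating": [5.1, 6.0, 6.8, 7.9, 8.8],
    }
)

# Pearson correlation between the vote count and the average rating.
pearson = movies["votes"].corr(movies["rating"])

# Rank-based correlation: correlate the ranks instead of the raw values,
# so a few extremely popular movies do not dominate the result.
rank_corr = movies["votes"].rank().corr(movies["rating"].rank())

print(f"Pearson: {pearson:.2f}, rank-based: {rank_corr:.2f}")
```

A positive value in both cases would agree with the plot, although a correlation on its own says nothing about why the two quantities move together.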

What types of movies are the most well reviewed?

0=drama, 1=action, 2=comedy, 3=crime, 4=horror

I used a box plot to visualise the ratings of each genre. Not surprisingly, horror films are at the low end of the scale, while crime movies are at the top.
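The numbers behind a box plot like this can be sketched with a `groupby`. The genre codes below follow the legend above (0=drama, 3=crime, 4=horror); the ratings themselves and the column names `genre_code` and `rating` are made up for illustration.

```python
import pandas as pd

# Genre codes as given in the legend of the plot.
genre_names = {0: "drama", 1: "action", 2: "comedy", 3: "crime", 4: "horror"}

# Hypothetical ratings; only three of the five genres appear here.
movies = pd.DataFrame(
    {
        "genre_code": [0, 0, 0, 4, 4, 4, 3, 3, 3],
        "rating": [6.5, 7.0, 7.5, 4.5, 5.0, 6.0, 6.8, 7.2, 7.9],
    }
)

# Replace the numeric codes with readable genre names.
movies["genre"] = movies["genre_code"].map(genre_names)

# The median is the line drawn in the middle of each box.
medians = movies.groupby("genre")["rating"].median().sort_values()

print(medians)
```

With matplotlib installed, `movies.boxplot(column="rating", by="genre")` should draw the corresponding plot from the same frame.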

Did the length of movies change over time?

Another interesting question is whether the length of movies has changed over time. The movie industry has become huge in recent decades, but what effect has this had on the length of films?

As can be seen from the graph, movie length has increased only mildly. However, since colour darkness measures density, it is apparent that the output of the movie industry, in terms of movies produced, has grown drastically.
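Both observations, the change in length and the growth in output, can be checked with one aggregation. This is a rough sketch on invented data; the column names `year` and `runtime` (in minutes) are assumptions, and grouping by decade is just one possible choice of bucket.

```python
import pandas as pd

# Hypothetical movies with release year and runtime in minutes.
movies = pd.DataFrame(
    {
        "year": [1965, 1968, 1985, 1989, 2012, 2015, 2018],
        "runtime": [95, 105, 100, 110, 108, 112, 116],
    }
)

# Bucket each movie into its decade, e.g. 1968 -> 1960.
movies["decade"] = movies["year"] // 10 * 10

# Average length and number of movies per decade.
by_decade = movies.groupby("decade")["runtime"].agg(
    mean_runtime="mean",
    n_movies="count",
)

print(by_decade)
```

The `mean_runtime` column corresponds to the trend in length, while `n_movies` corresponds to the density shown by the colour in the graph.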

Has there been an increase in popularity for a given genre?

I have taken the number of films produced as a measure of a genre's popularity in the film industry. I took five genres (drama, action, comedy, crime and horror) and plotted the number of films produced in each category per year.

As can be seen, the movie industry as a whole has grown substantially. Drama stands out, with over 3,000 movies made in the mid-2010s, followed by action, with around 1,400 movies made in the same period. Just as with the ratings, horror finished last, with over 700 movies made at the peak of the category, in the early 2010s.
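Counting films per genre and year could look something like the sketch below. It assumes that a movie's genres are stored as one comma-separated string in a column called `genres`, so a movie with several genres is counted once in each; the rows and column names are hypothetical.

```python
import pandas as pd

# Hypothetical movies; a movie can belong to more than one genre.
movies = pd.DataFrame(
    {
        "year": [2014, 2014, 2015, 2015],
        "genres": ["Drama,Crime", "Horror", "Drama", "Action,Drama"],
    }
)

# Split the genre string into a list, then give each genre its own row.
exploded = movies.assign(genre=movies["genres"].str.split(",")).explode("genre")

# Rows are years, columns are genres, values are the number of movies.
counts = pd.crosstab(exploded["year"], exploded["genre"])

print(counts)
```

Each column of `counts` is then one line in a plot of movies produced per year.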
