Movies in 2023 - Can you predict box office sales from IMDB votes?

Introduction

In my previous articles, I looked to see if it’s possible to predict the online voting for a movie from particular features, such as the director / producer / writer’s previous film ratings.

It had some success, but it could not predict the big swings: the runaway blockbusters, or the films that really miss the mark.

Whilst reviewing the prediction accuracy (using an automated integration and live dashboard I built), I decided it was time to bring in more data to see whether any other insights would emerge.

Predicting box office sales before a movie hits the screens would probably prove as difficult as predicting its user ratings. However, whilst digging around in the data, I stumbled across a fascinating, albeit probably obvious, insight: the number of votes correlates with box office success. After all, to vote, one has presumably seen the movie at the cinema.

Rather than simply looking back at historical data for a direct correlation, though, I wanted to know whether the volume of votes a film receives from its release date can help predict its box office sales.

Video walkthrough

Louis talks you through this analysis of box office sales and IMDB votes.

The visualisation

This chart plots each film's average box office takings per day against its average IMDB votes per day.

  • The ‘per day’ measures are the total box office takings or votes received, divided by the number of days since the film's release.

  • Bubble size indicates how recently the film was released i.e. the smaller the bubble, the newer the film.

  • Bubble colour indicates how viewers rated the film on IMDB: red is hot (rated above 7 out of 10), blue less so.
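As a rough sketch of how the ‘per day’ measures can be derived, the function below divides a cumulative total by the days elapsed since release. The function name and the choice to reject same-day snapshots (to avoid dividing by zero) are assumptions for illustration, not necessarily how the chart's data was prepared, and the figures are hypothetical.

```python
from datetime import date


def per_day(total: float, release_date: date, as_of: date) -> float:
    """Average a cumulative total (box office takings or IMDB votes) over the
    number of days since release. Hypothetical helper for illustration."""
    days_since_release = (as_of - release_date).days
    if days_since_release < 1:
        # Assumption: snapshots taken on (or before) release day are excluded,
        # since there are no elapsed days to divide by.
        raise ValueError("need at least one day since release")
    return total / days_since_release


# Hypothetical figures, not real film data: 30 days after a 1 May release.
release = date(2023, 5, 1)
snapshot = date(2023, 5, 31)
print(per_day(3_000_000, release, snapshot))  # box office per day: 100000.0
print(per_day(15_000, release, snapshot))     # votes per day: 500.0
```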

So what can we do with this?

The line running diagonally across the chart is the linear trend line, and its equation is:

Box office per day ($) = 6,039.17 × Votes per day - 2,645,150   (P-value: 0.0067051; R-squared: 0.502091)

The slope shows that for every additional vote per day a movie receives, its box office takings per day are expected to increase by $6,039.17.
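To make the equation concrete, here is a minimal sketch that plugs a hypothetical 1,000 votes per day into the trend line. It also shows a consequence of the negative intercept: the line only predicts positive takings above roughly 438 votes per day, so it should not be read too literally for films with few votes. The constant and function names are illustrative assumptions.

```python
# Coefficients taken from the trend line equation in the text.
SLOPE = 6039.17          # extra dollars per day for each extra vote per day
INTERCEPT = -2.64515e6   # dollars per day where the line meets zero votes


def predicted_box_office_per_day(votes_per_day: float) -> float:
    """Box office per day implied by the linear trend line."""
    return SLOPE * votes_per_day + INTERCEPT


# Votes per day at which the trend line crosses zero takings; below this
# the line predicts negative takings, which is not meaningful.
break_even_votes = -INTERCEPT / SLOPE

print(round(predicted_box_office_per_day(1_000)))  # 3394020
print(round(break_even_votes))                      # 438
```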

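The P-value (0.0067) is below 0.05, so the relationship is statistically significant: if there were no linear relationship between votes and takings, a slope this steep would be unlikely to arise by chance. Significance tells us that a relationship very probably exists, not how strong it is; that is what R-squared measures.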

The R-squared value of 0.502 indicates that about half of the variation in box office takings per day is explained by votes per day (equivalent to a correlation coefficient of roughly 0.71). The remaining half is down to other factors, such as the quality of the movie and its marketing.

Insights

Films close to the trend line are in keeping with the expected relationship: the more votes per day, the higher the box office takings per day.

On that basis, John Wick 4 has actually underperformed relative to the commercial hype around it: people are voting in numbers, but that is not translating into revenue. In contrast, Fast X and The Super Mario Bros. Movie have overperformed, attracting fewer votes than their takings would suggest but raking in the sales.
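The over- and underperformer reading comes from how far each film sits above or below the trend line, i.e. its residual. A minimal sketch of that classification follows; the `tolerance` band of $500,000 per day and the film names and figures are hypothetical, chosen only to illustrate the idea, and are not the values behind the chart.

```python
SLOPE = 6039.17          # from the trend line equation
INTERCEPT = -2.64515e6


def residual(votes_per_day: float, box_office_per_day: float) -> float:
    """Actual takings per day minus the trend line's prediction."""
    expected = SLOPE * votes_per_day + INTERCEPT
    return box_office_per_day - expected


def classify(votes_per_day: float, box_office_per_day: float,
             tolerance: float = 500_000) -> str:
    """Label a film by its residual. The tolerance band is an assumption."""
    r = residual(votes_per_day, box_office_per_day)
    if r > tolerance:
        return "overperforming"    # above the line: more takings than votes suggest
    if r < -tolerance:
        return "underperforming"   # below the line: votes not turning into revenue
    return "in line with trend"


# Hypothetical films, not real data.
for name, votes, takings in [("Film A", 1_000, 4_000_000),
                             ("Film B", 1_000, 2_000_000),
                             ("Film C", 1_000, 3_400_000)]:
    print(name, classify(votes, takings))
```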

What next?

The biggest drawback of this very quick analysis is that it captures a single point in time, so we can't see how things unfold. When you spot this in your own analysis, you can take action and start building time series data (if it doesn't already exist) so that your analysis can evolve.

I’ve now set up a date-stamped historical table that records the votes (and rating) for each movie every day, so that we can develop the next evolution of this analysis.
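For anyone wanting to do the same, here is a minimal sketch of a date-stamped snapshot table using SQLite from Python's standard library. The table name, columns and helper functions are hypothetical, not the author's actual setup. Keying on (date, movie) keeps one row per movie per day, and the daily new votes come from differencing consecutive snapshots, which assumes the job runs every day without gaps.

```python
import sqlite3


def create_snapshot_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS vote_snapshots (
            snapshot_date TEXT NOT NULL,   -- ISO date, e.g. '2023-06-01'
            movie_id      TEXT NOT NULL,
            votes         INTEGER NOT NULL,
            rating        REAL,
            PRIMARY KEY (snapshot_date, movie_id)
        )
        """
    )


def record_snapshot(conn, snapshot_date, movie_id, votes, rating) -> None:
    # INSERT OR REPLACE means re-running the daily job on the same date
    # overwrites that day's row instead of creating a duplicate.
    conn.execute(
        "INSERT OR REPLACE INTO vote_snapshots VALUES (?, ?, ?, ?)",
        (snapshot_date, movie_id, votes, rating),
    )


def daily_new_votes(conn, movie_id):
    """Votes gained between consecutive snapshots (assumes no missing days)."""
    rows = conn.execute(
        "SELECT snapshot_date, votes FROM vote_snapshots "
        "WHERE movie_id = ? ORDER BY snapshot_date",
        (movie_id,),
    ).fetchall()
    return [(later_date, later_votes - earlier_votes)
            for (_, earlier_votes), (later_date, later_votes)
            in zip(rows, rows[1:])]


# Hypothetical snapshots for illustration only.
conn = sqlite3.connect(":memory:")
create_snapshot_table(conn)
record_snapshot(conn, "2023-06-01", "film-a", 10_000, 7.4)
record_snapshot(conn, "2023-06-02", "film-a", 12_500, 7.3)
record_snapshot(conn, "2023-06-03", "film-a", 14_000, 7.3)
print(daily_new_votes(conn, "film-a"))  # [('2023-06-02', 2500), ('2023-06-03', 1500)]
```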

We can check back in a few months to see whether early enthusiasm for voting is an early indicator of people spending money on the big screen.


About the author

Louis Keating is the Founder of White Box, with over two decades of experience in data science and advanced analytics.


We specialise in data visualisation at White Box

We can help illuminate your data so that you can make the right, unbiased decisions for your organisation.

As your partner in data visualisation, we’ll help you to realise the full potential of your data and maximise your business success through advanced and innovative solutions that make all the difference.

Get in touch today for your free data strategy consultation.

