Can Machine Learning win the Melbourne Cup?
Whilst building the Melbourne Cup game, we had the chance to assess the variables and build some models to see if we could use our analytical skills to help predict the winner.
Firstly, our game showed us an important truth: predicting the winner is tough! So we settled on modelling a place (1st, 2nd or 3rd).
Important note: we do not encourage gambling, and by no means do we believe that this is a robust solution. Otherwise we would quit our jobs and do this for a living!
What did we learn?
CHAID (chi-squared automatic interaction detection) is a great way of visualising the modelling process as a series of decisions. Each node helps you decide which way to go by showing the predictive result (the middle decimal) and the percentage of the file it represents (the bottom %).
So firstly, at the top, we see that Price is a key variable: odds/price under $23 is ideal.
Then a crossroads. Do we use the Breeding Sire and Dam options that lead to the bright green in the bottom right or do we use colour to find the green bottom middle node? Both also use horse country (origin) and both have a decent outcome, although bottom right is better.
Either way, we now have some ideas of what is going to be predictive.
For the purposes of our scoring exercise, we used the bottom right node.
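As a rough illustration of the two numbers a node reports, here is a minimal Python sketch for a single split on price. The runners and their results are invented for the example and are not our data; only the $23 threshold comes from the tree above, and the helper name node_summary is hypothetical.

```python
# Hypothetical runners: (starting price in dollars, placed in the top three?).
# These rows are invented for illustration; they are not the article's data.
runners = [
    (5.0, True), (9.0, True), (12.0, False), (17.0, True),
    (21.0, False), (26.0, False), (34.0, False), (51.0, True),
    (81.0, False), (101.0, False),
]

PRICE_THRESHOLD = 23.0  # the split value mentioned in the article


def node_summary(rows, total):
    """Return (place rate, share of the whole file) for one tree node."""
    if not rows:
        return 0.0, 0.0
    placed = sum(1 for _, was_placed in rows if was_placed)
    return placed / len(rows), len(rows) / total


# Split the file into the two child nodes of the price decision.
under = [r for r in runners if r[0] < PRICE_THRESHOLD]
over = [r for r in runners if r[0] >= PRICE_THRESHOLD]

under_rate, under_share = node_summary(under, len(runners))
over_rate, over_share = node_summary(over, len(runners))

print(f"price < $23 : place rate {under_rate:.2f}, {under_share:.0%} of file")
print(f"price >= $23: place rate {over_rate:.2f}, {over_share:.0%} of file")
```

The place rate plays the role of the middle decimal in a node and the share plays the role of the bottom %. A real CHAID tree also picks the split points itself using chi-squared tests, which this sketch does not attempt.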
Next we tested a Logistic regression model and a Random Forest. Both showed Price as a significant and crucial variable.
Once our training and testing were complete, we needed to score the runners!
To simplify things, we have ranked the outputs for each horse to gauge a consensus across the models.
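One simple way to build that kind of consensus is to rank the horses within each model and then average the ranks. The sketch below assumes this approach; the horse names and scores are placeholders invented for the example, and the function names are hypothetical rather than anything from our actual workflow.

```python
# Hypothetical place probabilities from three models; all numbers are invented.
scores = {
    "chaid": {"Horse A": 0.40, "Horse B": 0.35, "Horse C": 0.10, "Horse D": 0.20},
    "logistic": {"Horse A": 0.30, "Horse B": 0.45, "Horse C": 0.15, "Horse D": 0.25},
    "random_forest": {"Horse A": 0.50, "Horse B": 0.30, "Horse C": 0.05, "Horse D": 0.35},
}


def rank_within_model(model_scores):
    """Map each horse to its rank (1 = highest score) within one model."""
    ordered = sorted(model_scores, key=model_scores.get, reverse=True)
    return {horse: position for position, horse in enumerate(ordered, start=1)}


def consensus(all_scores):
    """Average each horse's rank across models; a lower mean rank is better."""
    ranks = [rank_within_model(s) for s in all_scores.values()]
    horses = ranks[0].keys()
    mean_rank = {h: sum(r[h] for r in ranks) / len(ranks) for h in horses}
    return sorted(mean_rank.items(), key=lambda item: item[1])


ranking = consensus(scores)
for horse, mean_rank in ranking:
    print(f"{horse}: mean rank {mean_rank:.2f}")
```

Averaging ranks rather than raw scores avoids having to put the three models on a common probability scale, at the cost of throwing away how far apart the horses are within each model.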
Keeping in mind that we’re predicting a place (1st, 2nd or 3rd), Cross Counter feels like the strongest choice.
An important question you should always ask is “how good are these models?”
The answer: the accuracy isn’t great. The models suffer from a lack of data. Ideally we would use a longer race history for every runner. Maybe next year.
This exercise is purely for fun (we do find this kind of data analysis fun), but we are keen to see how the models do.
Now we wait and see what happens at 3pm today….
Post 3pm update
So the runners have run and the champagne has been sunk. How did we do?
If you look down the final column, very near the bottom, you’ll see Vow And Declare, the winner of this year’s (2019) Melbourne Cup. Not one of our predictions…
However, if you remember from the start of the article, we were looking for a “place”, and our intuition told us to ring-fence the top 5 predictions for an each-way bet.
So Prince of Arran came to our rescue with a second place.
Our conclusion: Don’t gamble, kids, even if you’re a data scientist.