Dolores: a model that predicts football match outcomes from all over the world

作者:Anthony C. Constantinou

摘要

The paper describes Dolores, a model designed to predict football match outcomes in one country by observing football matches in multiple other countries. The model is a mixture of two methods: (a) dynamic ratings and (b) Hybrid Bayesian Networks. It was developed as part of the international special issue competition Machine Learning for Soccer. Unlike past academic literature which tends to focus on a single league or tournament, Dolores is trained with a single dataset that incorporates match outcomes, with missing data (as part of the challenge), from 52 football leagues from all over the world. The challenge involved using a single model to predict 206 future match outcomes from 26 different leagues, played from March 31 to April 9 in 2017. Dolores ranked 2nd in the competition with a predictive error 0.94% higher than the top and 116.78% lower than the bottom participants. The paper extends the assessment of the model in terms of profitability against published market odds. Given that the training dataset incorporates a number of challenges as part of the competition, the results suggest that the model generalised well over multiple leagues, divisions, and seasons. Furthermore, while detailed historical performance for each team helps to maximise predictive accuracy, Dolores provides empirical proof that a model can make a good prediction for a match outcome between teams x and y even when the prediction is derived from historical match data that neither x nor y participated in. While this agrees with past studies in football and other sports, this paper extends the empirical evidence to historical training data that does not just include match results from a single competition but contains results spanning different leagues and divisions from 35 different countries. This implies that we can still predict, for example, the outcome of English Premier League matches, based on training data from Japan, New Zealand, Mexico, South Africa, Russia, and other countries in addition to data from the English Premier league.

论文关键词:Association football, Bayesian Networks, Dynamic ratings, Football betting, Soccer prediction, Time-series analysis

论文评审过程:

论文官网地址:https://doi.org/10.1007/s10994-018-5703-7