


Football is won and lost on the pitch, yet one researcher built 11 different prediction models to try to work out who will win the 2026 FIFA World Cup before the final whistle blows, and their answers differ significantly.
Produced by scientist and Wangari founder Ari Joury, who co-authored 'Soccer Analytics with Machine Learning', the predictions run real match data from the last 16 years through a range of models and classifiers in search of a conclusive answer.
Writing in Towards Data Science, Joury outlines how the models reached their outcomes from the same tournament simulation, with three rating systems, two goal models, and five classifiers going head to head to see which is right in the end.
We won't truly know the answer until a winner emerges on July 19, but if you want to get ahead of the curve, and perhaps impress your friends, it's worth paying attention to what the models have to say.

The 11 models did not settle on a single World Cup winner, but they did narrow the field to four, and it's mostly the usual suspects.
Seven of the eleven models predicted Spain to win, a view shared by many fans and the betting markets, as their success at EURO 2024 and the emergence of stars like Lamine Yamal make them a safe pick.
2022 World Cup winner Argentina is seemingly the second favorite, with two models backing Messi's side to go back-to-back. The remaining two models picked France and the Netherlands, with the latter something of a dark horse that you might not have expected.
It's a slightly different story when looking at the mean consensus across the models: Spain remain the most likely to lift the trophy with a 20% chance, while France and Argentina are tied for second with a 14% probability each.
The Netherlands and England are close behind with 10% and 9% respectively, with Brazil, Portugal, Germany, Belgium, and Croatia rounding out the rest of the top 10.
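To see why counting each model's top pick and averaging win probabilities can rank teams differently, here is a minimal Python sketch. The three models and their probabilities are invented purely for illustration and are not Joury's figures, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical win probabilities from three made-up models.
# These numbers are illustrative assumptions, not values from the article.
model_probs = {
    "model_a": {"Spain": 0.30, "France": 0.20, "Argentina": 0.10},
    "model_b": {"Spain": 0.25, "France": 0.20, "Argentina": 0.15},
    "model_c": {"Spain": 0.10, "France": 0.20, "Argentina": 0.30},
}


def top_pick_counts(probs):
    """Count how many models name each team as their single most likely winner."""
    return Counter(max(p, key=p.get) for p in probs.values())


def mean_consensus(probs):
    """Average each team's win probability across all models."""
    teams = {team for p in probs.values() for team in p}
    n_models = len(probs)
    return {
        team: sum(p.get(team, 0.0) for p in probs.values()) / n_models
        for team in teams
    }


if __name__ == "__main__":
    print(top_pick_counts(model_probs))
    means = mean_consensus(model_probs)
    for team in sorted(means, key=means.get, reverse=True):
        print(f"{team}: {means[team]:.1%}")
```

In this toy data France is never any model's top pick, yet it ranks second on the mean because every model gives it a steady share. That is the same kind of effect that can pull a team level with a rival that won more individual models.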
The models' results largely align with the consensus among fans and the betting markets, yet there are some clear flaws that Joury himself acknowledges when assessing the viability of these models.
Arguably the biggest issue is the data set the models work from, where two problems emerge. The data includes every World Cup match from 2010 to 2022, alongside 102 games from the European Championships in 2020 and 2024, which leaves teams outside of Europe with a much thinner record.
That could harm the chances of teams like Argentina and Brazil, and it significantly affects nations that are missing from the data altogether, such as non-European World Cup debutants Cabo Verde and Curaçao.

Additionally, by relying on old data the models not only fail to account for current form, as the most recent information is at least two years out of date, but also include results that are broadly irrelevant to the current strength of teams.
There's no hiding the fact that leading footballing nations like France, Argentina, and Spain are consistently strong, with each winning a World Cup within the included data set, but results from when Yamal was just three years old don't carry much weight for this year's winner.
"None of this sinks the exercise," Joury explains, "but a model suite is only as independent as its inputs, and being explicit about that is the difference between an actual ensemble and an echo chamber."
With data's influence on the modern game growing every year, it's arguably more interesting than ever to evaluate the viability of prediction models like this. Ultimately, though, it's impossible to know for certain who will win, as the joy of the beautiful game is that the 'deserved' winner doesn't always emerge victorious.