Who will win the processing war?

The silent battle many are avoiding

In the olden days, before even the Proper Football Men were denouncing spreadsheets, football had different rules. Depending on your viewpoint, it either had too many or none at all: yes, we’re talking pre-1863 codification.

Football was very popular even before it was The Beautiful Game™️ (sponsored by Kingdom Airlines), but everyone had their own way of doing things. This limited what you could do. It made it tough to play with other teams, and with so many interpretations some were bound to be worse than others. [a cheap crack at rugby could be made here]

And it’s with this in mind that we head to Japan. To quote a recent paper about analytics: “we propose […] a unified framework designed to streamline event annotation, data standardization, and various deep learning modeling for soccer analytics.” This is OpenSTARLab, the work of a group of Japanese researchers and the latest addition to a noble line of groups who want to make things easier.

As the paper references, they’re not the first. In 2019, a group of researchers from the Belgian university KU Leuven and Dutch company SciSports presented the Soccer Player Action Description Language (SPADL), in a paper on their possession value model. The kloppy Python package, which also hails from the Low Countries, has a similar concept. The idea: while data providers stubbornly produce different types of event data, there are fundamental similarities that can be mapped into the same ‘language’.
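To make the ‘same language’ idea concrete, here is a minimal sketch in Python. It is not how SPADL, kloppy or OpenSTARLab actually work: both provider formats, every field name and the `CommonEvent` class are invented for illustration. The point is only that two differently shaped records can land in one shared shape, with anything provider-specific kept to one side.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CommonEvent:
    """One on-ball action in a shared, provider-neutral 'language' (hypothetical schema)."""
    action: str   # e.g. "pass", "shot"
    player: str
    x: float      # position along the pitch length, scaled 0-100
    y: float      # position across the pitch width, scaled 0-100
    extras: dict[str, Any] = field(default_factory=dict)  # provider-only detail


def from_provider_a(raw: dict) -> CommonEvent:
    # Hypothetical provider A: numeric type codes, coordinates already 0-100.
    action_names = {1: "pass", 2: "shot"}
    return CommonEvent(
        action=action_names[raw["type_id"]],
        player=raw["player_name"],
        x=raw["x"],
        y=raw["y"],
    )


def from_provider_b(raw: dict) -> CommonEvent:
    # Hypothetical provider B: text labels, coordinates in metres on an
    # assumed 105 x 68 pitch, plus a field that provider A does not offer.
    length_m, width_m = 105, 68
    return CommonEvent(
        action=raw["event"].lower(),
        player=raw["player"],
        x=raw["pos_m"][0] / length_m * 100,
        y=raw["pos_m"][1] / width_m * 100,
        extras={"under_pressure": raw["under_pressure"]},
    )


# The same centre-circle pass, as each made-up provider might record it.
events = [
    from_provider_a({"type_id": 1, "player_name": "Player One", "x": 50.0, "y": 50.0}),
    from_provider_b({"event": "Pass", "player": "Player Two",
                     "pos_m": (52.5, 34.0), "under_pressure": True}),
]
```

Anything downstream only ever needs to understand `CommonEvent`, so swapping providers means writing one new converter rather than rebuilding everything. The `extras` field is one (assumed) way of keeping a provider’s shinier details without forcing them into the shared spec.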

An aside: there’s a tangent we could go down here, on a favourite Get Goalside topic: the internationalisation of analytics after a period of heavy English centrality. In fact, we’ll come back to it in a bit.

At one point in time, I even wondered whether FIFA would get in on the act. In 2021 they launched their FIFA Football Language. Arsène Wenger’s opening note uses the phrase “open-source”! Unlike the other frameworks, FIFA weren’t (yet?) trying to squeeze other data providers into their football language, but it formed the basis for their own data collection for FIFA tournaments.

Just like in the 1860s, the football community would quite like things to be simpler. Over the past few years, and for the next few years as well, a wave of football clubs will be going through their first major data provider switch. In many cases, this is because data providers have different offerings - more detailed and/or more shiny - but these differences don’t make up the majority of an event data spec.

In May last year, when writing about STILL-yet-to-be-replaced-as-Chelsea-front-of-shirt-sponsor Infinite Athlete, I asked a question. It was a question mainly about data engineering, but applies to the issues that OpenSTARLab is trying to solve too:

Which of the following is the more likely winner of the next three-to-five years?

  • Interoperability between data providers becomes seamless on its own, allowing for integration of different data sources within a provider’s own product, or allowing for foolproof entity matching between any provider to use data in third-party applications like Tableau

  • Organisations will turn to cloud providers like AWS for API integration and setting up data storage, either through some (semi-)automation (AI anyone??) or as an affordable managed service

  • The above, but provided by domestic leagues or national FAs

  • The scale of the task will have simply shrunk enough for clubs of all sizes to hire employees for the set-up and maintenance of data pipelines, and creation of internal tools

  • None of the above, it’ll be as complex as always

  • Something else

Maybe part of the ‘something else’ will be packages like kloppy and OpenSTARLab. If they become robust enough to use as standardisation systems while still offering the unique features of chosen data providers, a football club will only have to build a data system once*.

*(Well, as much as anyone only builds a data system once. The need to re-build and re-write would certainly reduce, as would the need for clubs to develop their own abstractions).

This still doesn’t mean that it’s sensible for all football clubs to create their whole data engineering and software infrastructure themselves. It’s not. Smart (and sufficiently wealthy) football associations should be helping their clubs to establish a baseline standard, particularly if they’re a country in some sort of continental competition coefficient race. Like, I dunno, Belgium.

Unfortunately (from certain perspectives), this is unlikely to be a problem that really slaps people round the face for another few years. Not only are there many leagues where ‘using data’ means using software rather than the raw data itself, but there are, as a result, many leagues where it’s a competitive advantage to quietly build things yourself, even if that means building inexpertly.

Now would, I suppose, be a good moment to declare the interest of working for a company that 1) deals with multiple event data providers 2) deals with interoperability between different data providers 3) has (smart, capable, witty) employees to pay and investors to create value for. Hopefully you, dear reader, trust that Get Goalside’s only biases are towards the entertainment of yourselves and, more importantly, of the writer.

(Also to the reduction in usage of the word ‘democratise’: democracy has enough on its plate without being dragged into sales pitches).

At some point, the boring parts of using data in football will get easier. What’s less clear is how.

Football got codified when a group of English elites decided to argue until they reached an agreement (which was then tweaked, and later completely rewritten for clarity in the 1930s). Their agreement was picked up as the standard by the world. Who will end up writing the standard for football data collection and processing?

All of that was written about a week ago, and I didn’t get around to finishing it off. And then along came the Open-Source Avengers.

🌉🌁 𝐁𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐛𝐫𝐢𝐝𝐠𝐞𝐬! We brought together contributors from different open source projects to discuss how we can align our work and improve interoperability in football analytics. Looking forward to the next steps. #FootballAnalytics #OpenSource #PySport

PySport (@pysport.org), 26 February 2025

Wonderfully, not only does it circle straight back to the origin of this post, it also goes right back to the internationalism point. Six nationalities are represented here, none of them English. England by no means has a monopoly on analytics history, but it did have a commercial tracking data provider in the 90s, it did supply the mainstream breakthrough for expected goals, and it does have the club(s) commonly cited as the sport’s leaders in analytics.

But it’s very exciting seeing this cross-continent collaboration, and it’s very exciting seeing interesting analytics job ads from a widening set of nations. I would love to see more of them outside the historic nations plus America; I would love to see more of them in women’s football.

The ‘globalness’ of the global game has clearly pushed players and coaches to be better. It seems likely that it’ll help push the data side of things to be better too.