A fifth birthday
Get Goalside is five years old. I knew it pre-dated the pandemic, but hadn’t worked out its age until I went back through the archive after publishing the latest post (understanding machine learning through the analogy of football).
So I went back and reviewed it all.
Inevitably, more of it than I’d like was… meh. Not bad, just a waste of precious words. For those who’ve been around a while, thank you for not unsubscribing. Thankfully, though, a lot of it was quite good, or at least had interesting ideas.
Because people are always newly arriving (welcome), and I’d forgotten a lot of this anyway, here’s a thematic wrap-up of Get Goalside’s greatest hits.
Defensive stats
Get Goalside actually started as a defensive analysis newsletter/blog. In April 2019 I published ‘Possession adjusting: an essay’. I would eventually look at some proper data on the topic, but two thoughts from it stick out even now:
“Adjusting defensive stats isn’t about seeing which are the ‘good’ defenders, but if you’re using the defensive stats to determine a player’s role, then you need to isolate that role as much as possible. In other words, you have three things that can affect a player’s stats. Their execution, their tactical role, and things outside their control like the quality of their or their opponent’s team and, from that, how much of the ball either side sees.”
And
“We don’t adjust attacking stats like shots or expected goals just because a striker plays for a good team.”
In public work, this latter sentence is still pretty accurate. The Athletic published a piece about adjusting attacking stats in 2022, but even adjusting per 90 minutes isn’t totally mainstream. That won’t be the case behind closed, professional doors. Coincidentally, Tim Keech of MRKT Insights (who are also celebrating a fifth birthday) noted the other day that Anthony Gordon’s time at Everton popped positively for them when adjusting his stats. Distilled, the point of these two thoughts is still a good one: what in the raw data needs contextualising, and why?
A couple of years later, I came back with some graphs, some Hamilton references, and a prompt:
“I present these not as definitive takeaways to apply to your own work, but to show [that] whether possession adjusting makes sense depends on both the stat and the position of the player.”
‘What is football’
The defensive stats focus led to rumination on the nature of football more broadly.
It really hit its stride in 2022, but there are a couple of good lines from earlier pieces as well:
“To me, football is too much of an inter-connected sport for things to be as simple as coming down to the strongest link or weakest link. […] Maybe instead of the theory argument being strong link vs weak link it should be strong selection vs weak selection, or maybe strong unit vs weak unit.”
— ‘Is football a ‘weak link’ sport?’, September 2020
“As [Seth] Partnow says about shooting in basketball [in his book, The Midrange Theory], if we were starting from scratch I don't think we'd have the statistical landscape for defending in football that we have now. In fact, as he also teases in a footnote, our lack of conceptual understanding of defending probably holds back how we choose to collect data on it.
If defending is all about space, why are the defensive statistics so much about how a player affects the ball?”
— ‘Do we know football well enough to have good defensive stats?’, December 2021
Partnow’s book, The Midrange Theory, and Twitter user @TiotalFootball are probably largely to blame for 2022, when I really hammered on the theme…
‘What’s in the way of analytics solving football?’ (partly the ball, partly the fallibility and slow learning rates of humans); ‘The Theory of Everything (in football)’ (split things into better conceptual categories); ‘Fear and fatigue in analytics modelling’ (maybe these would affect pitch control/possession value models); ‘What if we didn’t care about passes’ (and what stats we might have collected instead); ‘Ball control, space control, and why good teams play Pep-ball’ (with a 2×2 framework which also explains high pressing and deep blocks); ‘What is midfield for?’ (time-wasting).
Good collection of ideas there.
Conference-watch
The best conferences are collections of people and ideas, interrupted by sales pitches. The worst conferences are sales pitches, interrupted by ideas. As a result, post-conference pieces tend to weave half-threads together.
The post-conference pieces and their threads:
2021 Opta Pro Forum and StatsBomb ‘Evolve’ event (halfway house of event and tracking data; data providers competing on modelling); 2021 NESSIS and StatsBomb conferences (‘We know what players do, but we don't know how they do it’ — Vosse de Boode); 2022 StatsBomb conference (‘Insight is one thing, productising insight is another’); 2023 Opta Pro Forum (asking good questions); 2023 StatsBomb conference (‘everyone’s a decision-maker now’).
The best thing about these conferences (most of which I attended in person) wasn’t what I wrote about them, it was the conversations over various beverages. And one truly fantastic free StatsBomb pen.
‘Research’ work
At times I’ve even looked at real data (outside of the defensive data mentioned earlier).
In 2019, I wrote something that’s still a favourite of mine: looking at how good people think ‘good’ is and how that might affect the ratings they give to various players. The following year, I looked at the percentage of minutes Englishmen made up in the Premier League (amid Brexit rules controversy). I’d really like to see what updated data on that looks like.
Two really fun pieces were based on StatsBomb’s freely available data: ‘How would you play against the Invincibles’ and trying to find a dribbler similar to Messi. In more of a data science vein, in 2023 I had a go at adding player position labels to their 360 data freeze frames too.
Finally, and more seriously, I went back to the StatsBomb free data to question the rationale behind increased added time in matches. The follow-up piece also delved into what actually drives match length. Hopefully we’ve seen the back of that fad.
Silly ideas
But I guess I can’t throw too much shade: this newsletter has been home to lots of silly ideas in the past.
Get Goalside’s silliness is intentional and whimsical, though. The silliest things you can do with advanced tech (2019); stealing ideas from other sports (2021); the list of things Manchester United could task their new data scientists with doing (2022: “Optimal C-suite engineering (retain as few of your bosses as possible, as many as needed)”); silly decision-making that data scientists could do in-game (2022).
There was no whimsy in 2023.
Analytics history
Analytics is older than you (might) think, although even the recent stuff is now old enough to be the subject of books and podcasts. And what do we mean when we talk about ‘analytics’ anyway?
History is a particular preoccupation, partly because Get Goalside is, in some ways, a record of analytics history, and partly because, building on that thought, I’m aware of the holes in my knowledge (and memory). Here’s a paragraph from a post partially about the book Expected Goals by Rory Smith, in which Chris Anderson heavily features:
“While editing this post I leafed through my copy of The Numbers Game. Although Anderson, who co-wrote it, had been a blogger, he was kind of in the 'early' section that feels slightly separate from what came later; before the famous Opta expected goals blog by Sam Green, before the StatsBomb blog took off in a big way. The closing chapter of the book features forecasts, one of which is that ‘Geometry – space, vectors, triangles and dynamic lattices – will be the focus of many analytical advances’. How smart would I have looked in the 'early analytics Twitter' era of blogging if I’d just repeated that over and over again?”
The point being: if we’re building ideas, we should, as much as possible, use what’s come before as our foundation.
And then in a slightly different vein, from the same piece:
“[M]aybe Moneyball’s Anglosphere legacy is larger than we thought, but, for example, what, if any, legacy does Nemzeti Sport’s early twentieth century data visualisations have in Hungary?”
History is as much about who is forgotten as who is remembered.
Special guests
Finally…
The best newsletters, like the best conferences, are a collection of people and ideas. Get Goalside has been lucky enough to have a bunch of smart people pop up and form the basis of the following pieces:
‘Can you teach tactics in a lockdown?’, ‘Bringing advanced data to the public’, Get Goalside #100, ‘What do we actually know about football’, and ‘Understand football and you’ll understand AI’.
I’ll end with something that Javier Fernández (ex-Barcelona, now at Zelus Analytics) says in the special 100th issue of Get Goalside:
“This sport has an incredible opportunity of becoming even more popular and even more enjoyable. Organizations will benefit immensely if they share more data; analysts need to prepare more and better use the data. We will all grow and enjoy more. xG is great. But football is not simple. Don't settle.”
Thank you, everyone, for reading.