Get Goalside
Blogging's cool again

Let's share some links like they did in the dial-up days

Mark Thompson
December 17, 2024

Yes, to borrow Oscar Isaac’s memorable line from Star Wars: The Rise of Skywalker, “Somehow, analytics blogging returned.” (Or, if country star Lainey Wilson is more your speed: “Doggone, dadgum it, didn’t see that coming / Blogging’s cool again.”)

Let’s get the links up top, then share some data, and finally do some ‘takeaways’.

A brief aside: One of the writing skills I most admire is drawing together the different contextual threads that contribute to a particular story. Grace Robertson is excellent at this, e.g. a piece earlier this year on how Positional Play turned defensive. In that vein, I’d recommend her recent article on this year’s Rainbow Laces campaign.

The links

- Devin Pleuler, Senior Director of R&D at Maple Leaf Sports & Entertainment (a.k.a. ‘the Toronto sports teams you know (except the Blue Jays)’), has been admirably prolific with Central Winger over the past few weeks
- The folks at KU Leuven’s Sports Analytics Lab are doing a series on design decisions for possession value models
- Houston Dynamo’s Head of Analysis Carlon Carpenter wrote about movements in the final third, using Skillcorner data for the Austrian Bundesliga
- Analytics consultant Joris Bekkers just wrote up some work he presented last year on measuring pressing intensity (at a Skillcorner event, by coincidence)
- Over at American Soccer Analysis, Eliot McKinley has put together a detailed look at the impact of capping ‘minutes played’ at 90 (a pet niche for Get Goalside, given that it touches on both added-time decisions and how statistics defined by a dominant company have an outsized influence on public understanding of stats)
- Michael Caley’s Expecting Goals is back from US election vote analysis with a dissection of a different kind of symbol of statehood: the recent bad form of Manchester City’s men’s team
- Paul Johnson with a blog post on applying multilevel modelling to football finances (slightly less recent than the other things on this list, but he was on the Double Pivot podcast a few days ago chatting with the aforementioned Caley)

Those are recent beginnings or returns, joining things like:

- Ben Wylie’s Plot the Ball, which recently wrote about Barcelona’s use of young players
- Scouted Football, whose output is becoming more and more delightfully statty with the recent additions of Skillcorner data (them again) and Jake Entwistle (previously of Squawka)

Send along any more!

On top of this, it’s thematically appropriate that Hudl (who now own StatsBomb) have just released some fresh data to the public (provided you fill in one of those ‘is this really necessary?’ data collection forms).

This is a lot of fun. Having a ‘to read’ pile that starts tipping over and collapsing under its virtual weight is a nice change.

I asked Devin Pleuler, who’d written under the Central Winger title on MLSSoccer.com in the early 2010s*, why he decided to restart now. “The primary [reason] is that I enjoy it! I find writing to be the best way to organize my thoughts and opinions and effective communication remains the most important part of sports analytics.

“But also, with the shift away from Twitter, it felt like the right time to reengage with the community that has atrophied over the last few years. My belief is that teams are too secretive and perhaps this can motivate others to contribute to the conversation.”

*some of it is still online! Remember Chivas USA?

Where is the data…

If this has motivated you to join in, you might be wondering where to get some data.

Outside of the recently released J1 data, StatsBomb has a trove of publicly available event data. There are Champions League finals, one-off team seasons such as the 2003/04 Arsenal Invincibles, and entire league seasons for a selection of top European leagues (and more). They have Python and R packages to help you get started loading the data.
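As a rough illustration of what working with event data looks like, here is a minimal sketch that totals shots, goals and xG per team from a list of events. It assumes the JSON layout of StatsBomb’s open-data event files (each event has a `type` and `team`, and shots carry a `shot` object with `statsbomb_xg` and `outcome`), trimmed down to the fields used here. The three events are made up for illustration, not real match data; in practice you would load a real match, e.g. with StatsBomb’s Python package (statsbombpy), rather than typing events in by hand.

```python
import json
from collections import defaultdict

# Made-up events in the (simplified) shape of a StatsBomb open-data events file.
SAMPLE_EVENTS_JSON = """
[
  {"type": {"name": "Pass"}, "team": {"name": "Team A"}},
  {"type": {"name": "Shot"}, "team": {"name": "Team A"},
   "shot": {"statsbomb_xg": 0.12, "outcome": {"name": "Goal"}}},
  {"type": {"name": "Shot"}, "team": {"name": "Team B"},
   "shot": {"statsbomb_xg": 0.05, "outcome": {"name": "Saved"}}}
]
"""


def shot_summary(events):
    """Count shots, goals and total xG per team from a list of event dicts."""
    summary = defaultdict(lambda: {"shots": 0, "goals": 0, "xg": 0.0})
    for event in events:
        if event["type"]["name"] != "Shot":
            continue  # only shot events carry a 'shot' object with xG
        team = event["team"]["name"]
        shot = event["shot"]
        summary[team]["shots"] += 1
        summary[team]["xg"] += shot.get("statsbomb_xg", 0.0)
        if shot["outcome"]["name"] == "Goal":
            summary[team]["goals"] += 1
    return dict(summary)


events = json.loads(SAMPLE_EVENTS_JSON)
for team, stats in shot_summary(events).items():
    print(team, stats)
```

The same loop works on a full match file once it is loaded into a list of dicts; only the input changes.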

If you want to try tracking data, there are a handful of matches from Metrica Sports, Skillcorner, and PFF. To be frank, these aren’t as polished as the StatsBomb event data, but at least the Kloppy Python package can help out a bit.

[online edit: after publishing, Kloppy released an update of the package, including access to some Sportec event and tracking data!]

Get in touch if you know any other data providers who have data available to the public.

If you don’t code, or don’t want to, I’d just recommend scanning through FBref. You can get a free trial of Stathead to access the database in different ways. And, although it’s a bit of a pain, you can just copy and paste tables into Excel. (My first stat explorations, many years ago, were based on flicking between one of WhoScored, Squawka, or the StatsZone app and a spreadsheet.)

Beyond this, there is a range of ‘community’ data sources. If you want to scrape data, or access scraped data, it is not too difficult to find resources. I’m reluctant to point to them directly because 1) I don’t need to use them anymore and 2) unless you have a clear and specific use for them, you don’t need to either.

If you want to plot on-pitch data such as shot maps, mplsoccer is good for Python, ggsoccer was my go-to back when I used R (I still miss ggplot), and d3-soccer is good for the JavaScript hive.

Again, lemme know of any other similar packages in these or other coding languages.

What do all these new blogs teach us?

If I were starting (or restarting) analytics blogging after a time away from it, here are some notes of interest I’d take from the list of links earlier:

Time: In the VAR age, using full minutes rather than minutes ‘capped at 90’ makes a real difference to players’ per-90 numbers (as McKinley demonstrates). When dealing with in/out-of-possession data, ‘per 30’ minutes in or out of possession seems a neat standard (as Carpenter demonstrates).
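To make the arithmetic concrete, here is a minimal sketch with invented numbers (not McKinley’s or Carpenter’s data). It expresses one hypothetical player’s tackles per 90 using full minutes versus minutes capped at 90 per match, then as a rate per 30 minutes of a possession phase. The function names, the tackle counts and the minutes are all illustrative assumptions.

```python
def per_90(count, minutes):
    """Rate per 90 minutes played."""
    return count / minutes * 90


def capped_minutes(match_minutes, cap=90):
    """Total minutes if each match is recorded as at most `cap` minutes."""
    return sum(min(m, cap) for m in match_minutes)


def per_30_possession(count, phase_minutes):
    """Rate per 30 minutes spent in a given phase (in or out of possession)."""
    return count / phase_minutes * 30


# Invented example: a player who plays three full matches with varying added time.
match_minutes = [98, 101, 95]   # full minutes on the pitch, including added time
tackles = 9

full = sum(match_minutes)                # 294 minutes
capped = capped_minutes(match_minutes)   # 270 minutes

print(f"per 90 (full minutes):   {per_90(tackles, full):.2f}")
print(f"per 90 (capped minutes): {per_90(tackles, capped):.2f}")

# Invented: the opponent had the ball for 147 of those minutes.
print(f"per 30 out of possession: {per_30_possession(tackles, 147):.2f}")
```

Capping shrinks the denominator, so every rate stat is inflated for players who regularly play through long added time; normalising by time in the relevant phase removes a second source of distortion (how much the team actually had, or didn’t have, the ball).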

Data science can be fascinating: I overheard someone at October’s StatsBomb conference say that you can bank on the KU Leuven Sports Analytics Lab for strong work, and the series linked above, taken from a master’s thesis by Lode Van Tente, is just that. In a way, it’s as simple as switching one variable in a model and writing up the results, but both the reasoning and the write-up are excellently clear.

Paul Johnson’s post is also a nice example of the ‘modelling technique applied to football’ genre (a field which can otherwise produce some, to be honest, unreadable work). It benefits from focusing on explaining the technique, and not over-promising what its application means for the results.
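For readers new to the genre, the core idea behind multilevel modelling is partial pooling: each group’s estimate is pulled towards the overall average, and pulled harder when that group has little data. Below is a minimal sketch of that idea for a normal-normal model with the variances assumed known, using invented numbers. It is not Paul Johnson’s model; a real analysis would estimate the variances from the data (typically with a mixed-effects or Bayesian modelling library) rather than fixing them by hand.

```python
def partial_pool(group_mean, n, grand_mean, sigma2, tau2):
    """Shrunken estimate for one group in a normal-normal hierarchical model.

    sigma2: within-group (observation) variance, assumed known
    tau2:   between-group variance, assumed known
    """
    data_precision = n / sigma2   # weight on this group's own data
    prior_precision = 1 / tau2    # weight on the shared average
    return (data_precision * group_mean + prior_precision * grand_mean) / (
        data_precision + prior_precision
    )


# Invented example: two clubs with the same observed mean, different sample sizes.
grand_mean = 50.0
for label, club_mean, n in [("small sample", 80.0, 2), ("large sample", 80.0, 40)]:
    estimate = partial_pool(club_mean, n, grand_mean, sigma2=400.0, tau2=100.0)
    print(label, round(estimate, 1))
```

The small-sample club is dragged most of the way back towards the league average, while the large-sample club keeps an estimate close to its own mean.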

Remixing: The KU Leuven series is, notably, riffing on methodologies that existing possession value models use; part of Joris Bekkers’s post is based on an aspect of Will Spearman’s Pitch Control model; Devin Pleuler has a whole post about data science tricks he’s learnt from others. You can get a lot of mileage out of taking an idea of someone else’s that you like and spinning it into your own area of interest.
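As an example of the kind of building block that gets remixed, here is a minimal sketch of a simple ‘time to intercept’ calculation of the sort used in pitch control models: a player keeps moving at their current velocity for a reaction time, then runs straight at the target at a fixed top speed. The 0.7 s reaction time and 5 m/s top speed are assumed illustrative defaults, and this is not Bekkers’s or Spearman’s actual code.

```python
import math


def time_to_intercept(position, velocity, target, reaction_time=0.7, max_speed=5.0):
    """Seconds for a player to reach `target` (coordinates in metres).

    Simplifying assumption: the player carries on at their current velocity
    for `reaction_time` seconds, then runs straight to the target at `max_speed`.
    """
    # Where the player ends up once they have reacted
    rx = position[0] + velocity[0] * reaction_time
    ry = position[1] + velocity[1] * reaction_time
    remaining = math.hypot(target[0] - rx, target[1] - ry)
    return reaction_time + remaining / max_speed


# A stationary defender 10 m from the ball carrier: 0.7 s + 10 m / 5 m/s
print(round(time_to_intercept((0.0, 0.0), (0.0, 0.0), (10.0, 0.0)), 2))
# The same defender already moving towards the ball at 5 m/s arrives sooner
print(round(time_to_intercept((0.0, 0.0), (5.0, 0.0), (10.0, 0.0)), 2))
```

Computing this for every defender against every attacker gives a matrix of arrival times, which is the sort of raw material a pressing metric can then build on.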

Learning and thinking: There’s a three-minute segment of a lecture by Larry McEnerney, the director of the University of Chicago’s Writing Program, that I like to share. His aim in that section is to impress on the grad students taking the program how different writing for a general audience is from writing for the academic one they’re used to. But in doing that, in being specific about the way they currently write, he says:

“To help yourselves do your thinking, you have to do your writing. You have to do this, because the stuff you’re thinking about is too damn complicated to just do it in your head.”

This echoes one of the things Devin Pleuler said earlier: “I find writing to be the best way to organize my thoughts and opinions.”

This doesn’t need to be all of your writing, or all of your thinking, and it doesn’t need to run on a regular schedule either. There are seven bullet points in this newsletter’s first list of links, and six of those are one-offs or irregularly scheduled. And they were all still a joy to have.