Possession adjusting: Part Two

2 Possession 2 Adjusting

When we’re young, we believe everything is possible. At a certain point we get old, tired, prone to saying “It’s complicated”.

The idea of adjusting a player’s defensive statistics to account for [whatever]… it’s complicated.

After last week’s investigation into ‘possession adjusting’, I’ve looked into ‘turnover adjusting’. A write-up with data tables is here, but the long and short of it is the same as last time: defensive actions don’t seem to be linked to possession share or turnover rate.

Briefly, I should acknowledge an error I made in the code for the original study (now corrected and updated in the GitHub files for the project). It was a bad error but, as far as the results go, an insignificant one. (Lesson: do basic data exploration steps throughout, particularly if you’re in an unfamiliar dev environment.)

Here be numbers

A reminder of what these studies are: using StatsBomb’s open data for the 2015/16 ‘Big Five’ European men’s league seasons, I ran a simple correlation analysis of player defensive output (per 90 minutes, with each type of defensive action analysed separately rather than grouped) against a variable representing ‘possession’. Last time that variable was the team’s share of passes; this time it was the average number of possession sequences per 90 minutes of game-time.

Some headline numbers from this latest bit of analysis, comparing defensive actions to turnovers: outside of the small group of Wing Backs, no combination of player position and defensive action had a correlation with turnovers per 90 stronger than ±0.31. Only six of a possible 45 combinations had a correlation stronger than ±0.2.
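Headline figures like these come from scanning a grid of position-by-action correlations for the largest absolute value and counting how many clear a threshold. A sketch of that summary, assuming the correlations sit in a dict keyed by (position, action); the `summarise` function, the keys and the r values below are hypothetical placeholders, not the study’s results.

```python
def summarise(
    correlations: dict[tuple[str, str], float],
    threshold: float = 0.2,
    exclude_positions: frozenset[str] = frozenset(),
) -> tuple[float, int, int]:
    """Return (strongest |r|, number with |r| > threshold, combinations considered)."""
    # Drop any positions we want to treat separately (e.g. a small group like Wing Backs).
    kept = [
        r
        for (position, _action), r in correlations.items()
        if position not in exclude_positions
    ]
    strongest = max(abs(r) for r in kept)
    n_above = sum(1 for r in kept if abs(r) > threshold)
    return strongest, n_above, len(kept)


# Made-up grid: a few (position, action) pairs with illustrative r values.
grid = {
    ("Centre Back", "Tackles"): 0.12,
    ("Centre Back", "Interceptions"): -0.25,
    ("Defensive Midfield", "Tackles"): 0.05,
    ("Wing Back", "Tackles"): 0.45,
}
print(summarise(grid, exclude_positions=frozenset({"Wing Back"})))  # (0.25, 1, 3)
```

Using the absolute value is what makes ‘stronger than ±0.2’ count negative correlations too.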

I was genuinely surprised by this. I thought there’d be a stronger link between how ‘turnover-y’ game-time was and how many defensive actions players made, given that defensive actions are often turnovers themselves.

Even when I ran the correlations league by league (which controls a little for the fact that the Premier League had noticeably fewer turnovers than the other leagues), there was little of any meaning to draw out.

So what now

Looking through the data in more detail shed some light on this.

La Liga’s correlations are really confusing, with negative correlations for Defensive Midfielders: a (very weak) relationship where more turnovers were linked to fewer defensive actions.

Why? Partly because Rayo Vallecano’s DMs played in games with a really high number of turnovers per 90 but weren’t hugely defensively active. Meanwhile, players at the Spanish Big Two like Busquets, Mascherano, and Casemiro played in matches with far fewer turnovers but were much more defensively active.

And it was like that across leagues and positions. Shift to Premier League centre-backs (my favourite group): at the low-turnover end you have the incredibly active Nicolás Otamendi, alongside Otamendi aspirants like Ramiro Funes Mori and Laurent Koscielny; at the high-turnover end you have the (statistically speaking) almost lethargic Scott Dann and Brede Hangeland.

No, so really now what

At a certain point we get old, tired, prone to low defensive output, and saying “It’s complicated”.

And there are so many complications. The play style of the match affects statistical output, as do both the relative and absolute quality of the two teams. A player’s role affects their statistical output, and can often (but not always) be detected through it. A player performing their role badly may look like they’re playing a different role entirely, and a player’s inability to perform their role may be down to them, their teammates, or their manager.

Not only that, but weaker teams tend to defend with more players, which affects how they can attack after winning the ball back, which in turn affects how they can defend after losing it again. Sergio Busquets putting up ‘midfield destroyer(ish)’ tackle numbers on a possession-heavy, low-turnover team is a clear sign that football is played on an odd, odd playing field.

Isn’t it exciting?

If you’re new to football data, you may well have missed the introduction of expected goals, the introduction of possession value models, the jump into physics PhDs and graph neural networks that tracking data has brought, and you may well be missing the first steps of body pose data (for something cool on that, see here). Yet there’s still no public consensus about how to take the tackle numbers of a player on one team and say ‘this is how best to compare them to a player on another’.

I still reckon that’s possible.