What's your research question?

The craft of intention

Which is harder to do well, asking questions or answering them?

At some point a few months ago, I was speaking to someone who works with physical data. A few years ago, they’d have had to sell stakeholders on the answers the data could give them. Now, everyone eats it up. But the person was saying ‘I say to people, do you know why you want this data? What do you actually want to use it for?’.

It’s a good question to ask.

Ultimately, though, there is only one question that anyone in sport has: “how can I win more?”. And so you need to play a little game of trade-offs, narrowing that broad focus down to a tangible area, a specific factor.

This is why this year’s Hudl-Statsbomb conference (no more capitalised ‘B’ for ‘Bomb’) steered its research competition entrants towards ‘trade-offs’ as a theme. There was one on the value of a lesser-quality left-footed left-sided defender vs a better-quality right-footer; one on the value of booting it and pressing a throw-in instead of trying to play out of pressure. (The 2024 conference research papers can be found here)

It’s not just this year’s HudStatconf papers where the strength of the question shines through. Frequently, the thing that strikes me in analytics ‘research’ work is the clarity of the question. It’s there in some of my favourite research papers (who doesn’t have favourites?), like fellow 2022 Sloan conference appearances ‘Learning from the pros: extracting professional goalkeeper technique from broadcast footage’ and ‘Beyond action value: a deep reinforcement learning framework for optimising player decisions in soccer’. It’s there in my favourite broad genre of work: projects which didn’t find what they set out to find, but which were conceived clearly enough that the journey was worth it and the direction of future investigations is clear.

It’s a craft.

For example: Pep Guardiola has referred to formation notation as ‘telephone numbers’, Emma Hayes has said discussion around formations is ‘archaic’ (around 9:20) - but we talk about ‘formations’ for a reason, as a convenient shorthand. So there will be some cases where splitting data by ‘formation’ makes more sense than in others, and some methods of determining ‘formation’ that make more sense than others. (If you need in-the-moment formation/shape information, then tracking data or video analysis may be your only viable options, but build-up patterns might be gleaned from event data.)

But this whole thing about ‘what is your actual question’ is not just true of research.

Three years ago to the day (as this is being written), I wrote an overview of where you might spend money set aside for ‘analytics’. Quite frankly, I’d forgotten I’d written it. Let’s have a look at the end conclusion:

If you're one of the elite [teams], it makes sense to get ahead of the game and get a department set up internally, on the condition that you retain that knowledge. There's no point in the exclusivity benefits of an internal department if you don't make sure you still have it if someone leaves.

However, most clubs are not yet at the point where they're hiring data people and allowing them large amounts of time for research projects. For the majority, it probably makes sense to choose data provider smartly as much as cost allows (and to be honest about whether the shiny things in the data will get used) but then to make strategic use of third-parties.

‘Where to spend your analytics money?’, October 2021

There are two biases I have here:

  1. I work for a company you’d class as a ‘third party’, which one may want to make strategic use of (bosses and colleagues at Twenty3 Towers claim to read, so I better link the company website)

  2. It feels nice when things you wrote years ago still hold up

That said, I want to highlight two specific parts of this extract. The first: “There’s no point in the exclusivity benefits […] if you don’t make sure you still have it if someone leaves.” Apply this to your internally-created tools as well as to report formats and research projects.

If you think players are the only things whose performance can drop off a cliff due to getting old, you might wanna google ‘tech debt’. Several months ago, I quoted Charlie Marshall of the European Club Association: “There are so, so many [clubs] and the vast majority of them are quite small businesses.” If you’re a small business (a community events business, really), do you want to also be a software company? Why?

(This is worth its own blog, but can probably be boiled down to: 1) difficulties in combining services from different data sources; 2) difficulties in wrangling external software to work for team game models; 3) an employee’s time doesn’t appear as an additional cost on the balance sheet; 4) you don’t have to wait on the external company’s timeline to update your internal platform; 5) as Andy Warhol said, ‘in the future everyone will develop a scatterplot tool for 15 minutes’.)

The other particularly important part of the extract is “strategic use” of third-parties. A couple of paragraphs later in that 2021 piece, I wrote:

Using these would allow you to flex your capacity for 'analytics' as needed, without hiring full-time. As a club, this can also enable you to build up an internal knowledge bank if you make sure that the third-parties work and/or findings get stored somewhere people will remember them.

On top of this, assuming that you're not their first customer, third-parties are also likely to have processes in place that mean you can skip some of the tricky 'training wheels' stage of setting up an analytics department

‘Where to spend your analytics money?’, October 2021

(I neglected to mention data engineering, and I apologise to the gods of cloud computing for this)

The problem with all that, of course, is that following this advice may mean hiring a sufficiently capable head of data, who then spends money on a data provider, and then spends further money on outside services. Maybe that won’t play well with bosses who expect some guy (often a guy) with a quantitative degree to get things going within a month or two.

But look at Bayer Leverkusen. Thankfully not too tight-lipped in what they allow to be shared on LinkedIn, they’re on the roster of Kitman Labs, Catapult Matchtracker, MyGamePlan, and SportsDynamics, and those are only the software platform partnership announcements I could easily find.

It may well be that Leverkusen’s eventual aim is for all of that tech to be internally created and managed (as one of the ‘Analytics in the US’ panellists at the Hudl-Statsbomb conference put forward as a general truth). It may well be that they’re not even using those platforms (which would be strange, but ‘pays for an unused subscription’ is hardly a novel situation).

But, just as the best research papers ask a tight, well-framed question, the best clubs will be doing the same.

Questions for the crowd

  • What are your experiences with building up internal software? How would you approach it if you were starting over?

  • What’s the best use of team formation as a variable in data research you’ve seen?