Lessons in gen AI

For now: 'gen' as in generative; in future: 'gen' as in generation?

It’s a matter of time before someone releases an ‘AI assistant coach’. I imagine that, when they do, it will follow the common AI hype cycle:

  • product (or, more likely, product beta) released

  • interface is engaging enough that normies can use it for jokes

  • the ‘LinkedIn Apex’: declarations of a game-changer for the world

  • within two weeks, usage drops 80%

Despite this, generative AI systems are here to stay, in one form or another. Why? Because they’re getting quite good. As someone who spends a lot of time in code editors, I can attest to that.

And, in attesting to it, there are some features that point us towards what football-based systems might look like when they inevitably (but not necessarily imminently) arrive.

How quickly that road takes us there, I don’t know. But I think there are three strands to the direction of travel: tools, context, and accessible content.

Tools for tools

There’s a small part of the generative AI industry that really believes, or wants investors to believe, that large language models can become superintelligent on their own. But we don’t really need that - in either a practical or dystopian sense. “Agents being able to use software is how AI becomes more general,” Amjad Masad, the CEO of a very popular company called Replit, said recently.

These models, increasingly good at interacting with human languages and ‘non-human’ languages like code and APIs, can act as hooks into other things. And so your football LLM systems don’t need to be trained into being expert analysts. They just need to be good enough to use the tools at their disposal.

For example, some speculative fiction: “How well does this full-back defend against overlaps?” could be turned, by an LLM-underpinned tool, into a series of requests to other tools. Maybe you have the data for ‘overlaps faced’ already available, in which case gathering it is the first step. But maybe not, in which case a data science process can be kicked off (along the lines of previous work). After that, another process, gathering data about how often corners, shots, and goals are conceded shortly after the defender comes up against an overlap. Another process might grab and edit clips.
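To make the shape of that pipeline concrete, here’s a toy sketch. Everything in it is invented for illustration - the function names, the data, and the hard-coded routing that stands in for what an actual LLM would do - but it shows the structure: the language model only needs to pick and chain tools, not be the analyst itself.

```python
# A toy sketch of the orchestration described above. Every tool here is a
# hypothetical stub; a real system would call data providers, run proper
# analysis, and fetch video, with an LLM deciding which tools to chain.

def fetch_overlap_events(defender: str) -> list[dict]:
    """Stub for a data-provider query: overlaps faced by a defender."""
    return [
        {"minute": 12, "conceded_shot": True},
        {"minute": 55, "conceded_shot": False},
    ]

def summarise_outcomes(events: list[dict]) -> dict:
    """Stub for a data-science step: outcomes shortly after each overlap."""
    return {
        "overlaps_faced": len(events),
        "shots_conceded_after": sum(e["conceded_shot"] for e in events),
    }

def answer_question(question: str) -> dict:
    """Stand-in for the LLM layer: map a question to a chain of tool calls."""
    # A real LLM would parse the question; here the route is hard-coded.
    if "overlap" in question.lower():
        events = fetch_overlap_events("our full-back")
        return summarise_outcomes(events)
    raise ValueError("no tool route for this question")

print(answer_question("How well does this full-back defend against overlaps?"))
# {'overlaps_faced': 2, 'shots_conceded_after': 1}
```

The interesting design point is that the expertise lives in the tools, not the router: swap in a better overlaps model and the LLM layer doesn’t change.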

(This feels like it’ll have implications for how different services can be used alongside each other, if ‘interfaces’ are going to be more geared to code scripts and API calls than human users. But that’s for another time).

The point is, a ‘generative AI world’ in football won’t need the LLM to be a football expert; the ecosystem of football technology will do a lot of the heavy lifting.

Context is king

Context is fun. This is objectively true, because mistaken context or innuendo makes up 90% of Shakespearean comedy. But it’s not fun if your LLM is missing it.

Code assistance tools are settling on ways to help with this. For in-editor helpers, users can add files as context to their question - for example, you could ask it to write some data parsing functions based on the schema files you already have.

For football, the big and slightly hypey way this might be useful is context of terminology and game model. If we go back to the ‘overlap’ example from before, a club or coach might implicitly mean ‘overlaps in the final third’, calling similar movements elsewhere on the pitch something else. Or, a slightly more realistic example, terms for types of press or phases of in-possession play or player roles.
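As a minimal sketch of what ‘terminology as context’ could look like in practice: a club glossary prepended to every question, so the model reads terms the way the coach means them. The glossary entries below are made up, and real systems would do this via their own context mechanisms rather than raw string-building.

```python
# A hypothetical club glossary - every term and definition here is invented.
CLUB_GLOSSARY = {
    "overlap": "a full-back run outside the winger, in the final third only",
    "mid-block": "a defensive shape that starts pressing at the halfway line",
}

def build_prompt(question: str, glossary: dict[str, str]) -> str:
    """Prepend the club's terminology so the model interprets terms as intended."""
    context_lines = [f"- '{term}': {meaning}" for term, meaning in glossary.items()]
    return (
        "Club terminology:\n"
        + "\n".join(context_lines)
        + "\n\nQuestion: "
        + question
    )

prompt = build_prompt("How often do we concede from overlaps?", CLUB_GLOSSARY)
print(prompt)
```

The point isn’t the string concatenation; it’s that the club’s game model becomes a reusable artefact that travels with every question, instead of being re-explained each time.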

Beyond football, but perhaps in football more than elsewhere, I think LLM systems will live and die on understanding of context. It takes far longer to learn how to use a genAI helper if everything you ask it needs to include all relevant context. It’s like talking to a particularly stubborn child who’s decided to only follow instructions very literally. Or like working with someone in Quality Assurance.

But…

Content is king too

If context is important, that means the system needs to be able to access it. And that means that more of a coach’s/club’s work needs to be on a system that plays ball with an LLM-tool.

Think about training sessions. The majority of a coach’s work is done outside of matchday, which is where the bulk of data availability has traditionally been. They may well have a bank of reference video clips, but a lot of knowledge might come from conversations with the rest of the staff and exist in their heads, or on paper.

Think, as well, about coaches changing jobs. Currently, the clubs are the ones who buy software - but if part of what they’re hiring in a head coach is that coach’s methodology, it’s that coach’s ‘data’ that is the most important to access. Head coaches (at the top level where it can be afforded) often want to bring assistants along with them into a new role - would an LLM-based assistant be the same?

And, of course, if we’re talking about a coach’s knowledge as ‘data’, there’s the data ownership question. If LLM-based systems are going to be relying on coaching knowledge, coaches have got to make sure they can take this with them and use it in future roles. (The same is presumably already true for scouts, whose reports have long been logged into centralised knowledge banks).

The ‘pool of knowledge’ problem

Yes, LLMs spit out text that is incorrect. But it’s amazing that they work as well as they do, pretty reliable conversation-bots created by probability.

Of course, they’re more likely to get things wrong when the data isn’t there to produce ‘good’ probability estimates of the next word in a sentence. When coding, that’ll often happen when using a little-used package, or a non-mainstream language, or just a quite specific type of problem.

Fortunately, football is the biggest sport on the planet; and if you’re reading this newsletter in its original state, then your command of English will help avoid potential disadvantages of LLMs in other languages too. Purely keeping my British Isles locality in mind, it’d be interesting to know how good LLM systems are in Welsh, and Scots and Irish Gaelic.

All that being said, while I’ve made the point that it’s the football software ecosystem that’ll do most of the work, maybe this will be genuinely harder to achieve outside of the major languages and/or in more niche areas of the game. Will off-the-shelf LLMs fit into ecosystems where practitioners want to organise, say, periodisation and training microcycles in Japanese? (sidenote: I’m interested in how this would interact with things like the dialectal variation in Arabic too).

The end

You could probably go a long way without specialist software.

For text notes, tools like Notion have AI helpers which can search your notes and files elsewhere, like in Google Drive. The issues with domain knowledge of LLMs would come in here, but if you’re a low-budget club then you could probably get value out of using something like it as a repository for your coaching or scouting notes.

If you are the type of coach or analyst who collects video clips, and these clips are consistently labelled, an LLM-inflected search tool might help you find relevant clips more easily, without you having to act as a folder-expert organiser. But maybe this will be even easier than that: given that tools can integrate with Google Drive, perhaps you could keep your clips there alongside a spreadsheet of names, tags, and links (essentially metadata) and use that as the entry point for an LLM-searchbot.
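The spreadsheet-as-metadata idea is simple enough to sketch. The sheet below is invented (made-up clip names, tags, and links), and an LLM-searchbot would sit on top of something like this rather than replace it - the consistent labelling is what does the real work.

```python
import csv
import io

# A hypothetical clips sheet: names, tags, and links - all values invented.
SHEET = """clip,tags,link
CityAway_0312,overlap;cross,https://drive.example/abc
HomeDerby_0455,press;turnover,https://drive.example/def
CupTie_0078,overlap;goal,https://drive.example/ghi
"""

def find_clips(tag: str) -> list[str]:
    """Return links for clips whose tag list contains the requested tag."""
    rows = csv.DictReader(io.StringIO(SHEET))
    return [row["link"] for row in rows if tag in row["tags"].split(";")]

print(find_clips("overlap"))
# ['https://drive.example/abc', 'https://drive.example/ghi']
```

An LLM’s job here would only be translating “show me our overlap clips from the derby” into a lookup like this - which is exactly the ‘good enough to use the tools’ bar from earlier.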

Get Goalside has written about what we mean by the term ‘analytics’ before, and whether ‘football tech’ counts or is stretching the definition too far. But, regardless of your opinion on that, the end of that piece is still relevant here:

And the thing with Moneyball is that it wasn’t about data per se[…] What it’s about is questioning orthodoxies and finding edges, finding the most efficient way to get wins that you can.

So whaddya do when the Yankees are ‘doing analytics’ too? You’ve just gotta try and find another edge.

‘What we talk about when we talk about “analytics”’, March 2024

Maybe this is one of those edges.