The path to now

From Charles Reep to Skillcorner

Football vs Transphobia week of action, 25th-31st March: https://www.footballvhomophobia.com/fvt/

In the beginning there was darkness.

And then… Then there was the light of a headtorch, illuminating the notepad of Royal Air Force accountant, Wing Commander Charles Reep. It is 1950. Like many of us, Reep is annoyed at a football team. Unlike many of us, he’s so annoyed that he’s marking down his own data.

Within a few years of beginning his data notation at Swindon Town, of all teams, Reep will be a minor celebrity. He will be credited with helping the success of Wolverhampton Wanderers, one of the two most successful English teams of that decade alongside Manchester United. Wolves won three league titles and two FA Cups, and their midweek friendlies against Continental rivals helped inspire the European Cup.

This is mid-century England, and it’s the first wave of football analytics.

And now, seventy-five years later, we have team cohesion of pressing chains, among other shiny things that data company Skillcorner is promising in their latest launch of metrics. This is quarter-century England - at least where this newsletter is being written - and it’s the latest wave of football analytics.

How did we get here?

**

As much as it would be nice and neat to begin football’s data history with Charles Reep, we must go back into older and fuzzier times. Newspapers from 1920s and 1930s Hungary show visualisations that would look modern in football coverage today.

A ‘match chart’ from 1922, from ‘Three Sportviz Inventions By a Hungarian Newspaper’, Nightingale

According to Attila Bátorfy, the writer of the linked article, the Hungarian newspaper Nemzeti Sport mentioned its creations being copied by outlets in Italy and Sweden. We know disappointingly little about this spread of football datavis. Historical terms are always relative to geography and the present day, and so for the moment Charles Reep remains our most secure starting point for ‘football analytics history’. But it seems very likely that as long as there has been football, there have been dorks making data out of it. Often it hasn’t been imagination that has held things back, but finance or technology.

Those charts (which, to reiterate, were published around the time Hungarian icon Ferenc Puskás was born) focused on themes we’d find familiar. Shot maps, for the match’s chances, and a momentum chart for which team was on top. These were also themes that Reep homed in on in his data. In 1997, a paper was published in the journal The Statistician using Reep’s data. The statistician Richard Pollard - who by that point had worked with Reep for more than a decade - put together what would nowadays be recognised as expected goals and possession value models. The match’s chances, and which team was on top.

Reep, though, was motivated specifically by what helped teams to win. This was a difference from the match charts of earlier newspapers. It also seems like he was motivated by a particular tactical viewpoint, a familiar-sounding one: that the modern style of short passing was overhyped.

From our modern perspective, this seems flawed to say the least. And Reep’s analysis of the data he collected, in articles that he wrote in newspapers and magazines, often seemed cherry-picked. However, some context about 1950s football is useful here.

Imagine the dying stages of a modern football match, when both teams can still get a result but everyone is very tired. The game becomes spread, formations become less clear, as players get caught between wave after counter-wave of attack, unable to get back ‘into position’. This - based off a bit of Footballia-watching - is what 1950s football was like all game.

Teams were simply less compact, which meant that passing was much more forwards, as it often can be in the sunset period of tense, modern matches. Teams on the ball weren’t facing a defensive block so much as a defensive lattice. In this context, sideways passing instead of forward movement would allow a defensive structure to form, something which our modern experiences simply take for granted.

Technical ability was also lower. If Reep had been around for the tactical evolution of the 2010s, he’d have probably liked the high counterpressing approaches coming out of Germany. If your 1950s-calibre players were going to be prone to miscontrols, why make them in midfield and not, after a longer ball, in the final third?

Although he was probably correct that there was efficiency to be squeezed, Reep’s public analysis was contested in newspapers at the time, and his professional involvement in the ‘50s and ‘60s didn’t lead to an ‘analytical revolution’. Successful, popular English teams still played ‘nice football’. Yet Reep’s data collection being affected by his opinions about football was not to be a one-off.

**

All data collection implicitly has an opinion. As far back as Nemzeti Sport, the data that was created was the overlap of what ‘mattered’ and what could feasibly be collected. Part of Reep’s skill was developing a shorthand system that could collect much more information than you would expect to be possible. (It may have helped that he was an accountant who joined the British Royal Air Force’s administration - numbers and regimented systems were his day job.)

Those who took, and possibly bastardised, Reep’s research were inspired by contemporary trends themselves. Shortly after the 1966 World Cup triumph, results turned dismal for the England men’s team. They failed to qualify for any of the four major tournaments between 1972 and 1978, and had disappointing results at the following two, for which they did qualify. After the disappointing 1982 World Cup, an article in the Daily Express pointed to the beliefs of FA assistant director of coaching, Charles Hughes, as the way forward:

“Yet five years ago, and again before these finals, the FA had at their disposal information and analysis which could transform the future. […] [T]he logic of the Reep-Hughes analysis and statistics over 30 years, rejected by [England manager, Ron] Greenwood, is undeniable. Every four years we attempt to explain failure, we have a thousand arbitrary opinions. But Hughes has the facts.”

(Although the article says that Hughes learnt from Reep’s work, it names Reep as a ‘retired naval commander’, the wrong branch of the military. It seems likely that the journalist knew Hughes’s opinions far better than Reep’s.)

These facts: few goals come from long passing moves, lots of goals come from set-pieces or final third turnovers. Hughes and another figure the article mentions, Graham Taylor, got their chance a decade later. After a memorable fourth-place finish under Bobby Robson at Italia ‘90, Taylor took over as manager; unfortunately, his England side won zero games at Euro ‘92, and failed to qualify for the 1994 World Cup. Few goals came from long passing moves; fewer came from a long-ball England team.

**

Having travelled the historical timeline all this way, it’s only a short jump to the founding of two industry-defining modern data companies. As has been the case with the characters we’ve seen so far, operations were influenced by circumstance.

Opta and Prozone (like Reep, both English) came into being around 1996-1997. Fresh off the back of a memorable semi-final for Terry Venables’ England at the ‘96 Euros, Opta would be collecting data for the Premier League. But they needed to fund their enterprise. A hungry media with pages and airtime to fill helped immensely. Prozone, meanwhile, found business viability through video analysis and player running load data.

Technology helped each company create a fully systematic data collection operation, where Reep had initially been limited by needing to not only be at the games himself but to process all of his shorthand into data by himself too. Yet by the mid-2010s, the technology was not enough. The context of these companies’ origins and the context of contemporary football were beginning to frustrate people.

The wave of Pep Guardiola imitators had made it clear that incisive passes needed to be identified, while the pressing style emerging in Germany made the conventional collection of tackles and interceptions seem… small-fry.

A German company called Impect sought to address the first, with an emphasis on how many opposition players a pass ‘bypassed’, while the second was taken on by England-based (again) company StatsBomb and their ‘pressure events’.

Prozone, meanwhile, had apparently grown used to football clubs being uninterested in the details of their data. They “had been the gatekeepers of tracking data for many years and were loath to share it with anyone,” wrote Ian Graham, Liverpool’s Director of Research from 2012-2023, in How to Win the Premier League. “The data revolution in football could have happened years earlier if it wasn’t for Prozone’s protectionism.”

Little surprise, then, that Graham helped lobby for the Premier League to institute full access to tracking data in the early 2010s. Not only that, but a couple of years after league-wide sharing of data was finally agreed, in 2016, Liverpool Football Club welcomed a new tracking data company.

It was a relatively new company, and evidently neither side of the partnership wanted to wait around for innovation to evolve slowly. For once - unlike Charles Reep, Hughes and Taylor, Opta and Prozone - it was not English by birth. It was Parisian. And it wanted to do something completely crazy.

**

Around the time that Opta and Prozone were being set up, the host for the 2002 World Cup was chosen. It would be the first-ever shared World Cup, between South Korea and Japan; the first-ever in Asia; and the first since the 1930s in which a host, at the time of announcement, had never qualified for the competition before. (Japan would shortly afterwards qualify for the 1998 edition, their first appearance.)

Whether caught up in this World Cup buzz or not, this period saw some interesting work by Japanese academics. Beginning in 1996, Tsuyoshi Taki and Jun-ichi Hasegawa published a series of research papers on the ‘dominant region’ of teams - areas of the pitch that one team or the other had hold over. Or, in modern terminology, ‘pitch control’.
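For a feel of what a ‘dominant region’ calculation involves, here is a minimal sketch in Python. It is not Taki and Hasegawa’s method: their work accounted for how players were moving, whereas this version assumes everyone is standing still and equally quick, so ‘gets there first’ collapses into ‘is nearest’. The pitch dimensions, grid size, function name and player positions are all made up for illustration.

```python
from math import hypot

# Assumed pitch size in metres (a common size, not a figure from the papers).
PITCH_LENGTH, PITCH_WIDTH = 105, 68


def dominant_team_shares(home, away, step=1.0):
    """Return the share of the pitch that each team's players are nearest to.

    home, away: lists of (x, y) player positions in metres.
    Simplification: players are treated as stationary and equally fast,
    so the player who would arrive first is simply the nearest one.
    """
    counts = {"home": 0, "away": 0}
    nx = int(PITCH_LENGTH / step)
    ny = int(PITCH_WIDTH / step)
    for i in range(nx):
        for j in range(ny):
            # Use the centre of each grid cell as the point to be reached.
            x, y = (i + 0.5) * step, (j + 0.5) * step
            d_home = min(hypot(x - px, y - py) for px, py in home)
            d_away = min(hypot(x - px, y - py) for px, py in away)
            # Ties are given to the home side purely for simplicity.
            counts["home" if d_home <= d_away else "away"] += 1
    total = nx * ny
    return {team: n / total for team, n in counts.items()}


# Hypothetical positions: one player per side, level across the pitch.
shares = dominant_team_shares(home=[(10, 34)], away=[(100, 34)])
print(shares)
```

Swapping the distance for an estimated arrival time, using each player’s speed and direction, is what would move this toy version closer to the idea in the papers.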

The idea and calculations would still appear innovative twenty years later, but a problem at the time was getting the data itself. Work was published in academic journals for computer vision (a field where computers process images to detect objects, basically), but results were either small-scale or theoretical. Taki and Hasegawa even had to propose an in-stadium camera set-up that would work for their calculations. (“The telecasting image is not useful for analysis,” they wrote, “[…] because most scenes are intermittent and are focused on a player with the ball.”)

This in-stadium set-up was the kind of thing that Prozone was able to do, but it still suffered - to a lesser degree - from what Taki and Hasegawa believed made TV footage unfeasible to work with. The issue of players being off-screen occurred on a smaller scale when they passed in front of each other, blocking one player or another from the view of the in-stadium cameras.

Technology and imagination, though, find a way. Among other things, TVs got bigger.

When Liverpool partnered with Paris-based Skillcorner in the late 2010s (yes, we’re finally getting back to Skillcorner), the camera shots of football matches were both larger and crisper than they’d been in the late 1990s. Computer vision technology had also improved. And so too had understanding about how football teams moved.

Football coaches talk about ‘shape’ and ‘units’ a lot. Teams move in coordinated ways - in defence, particularly. This is hardly ground-breaking information, but it means that with enough processing power and prior data you can make smart guesses about where off-screen players are. And with a wider aspect-ratio of television, there tend to be fewer players off-screen, who are off-screen for a shorter time.
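To make the ‘smart guess’ idea concrete, here is a deliberately crude sketch in Python. It assumes an off-screen player has shifted by the same amount, on average, as the teammates who stayed in shot. This is an illustration of the reasoning only, not how Skillcorner or any other company actually does it, and the function name, player labels and coordinates are invented.

```python
def estimate_offscreen(last_seen, visible_then, visible_now):
    """Guess where an off-screen player is, from how visible teammates moved.

    last_seen: (x, y) of the player when they were last on screen.
    visible_then, visible_now: dicts of teammate label -> (x, y) at the moment
    the player was last seen and at the current moment.
    Assumption: the player moved with the average shift of the teammates
    who are visible at both moments.
    """
    shared = visible_then.keys() & visible_now.keys()
    if not shared:
        # Nothing to go on, so assume the player has stayed put.
        return last_seen
    dx = sum(visible_now[p][0] - visible_then[p][0] for p in shared) / len(shared)
    dy = sum(visible_now[p][1] - visible_then[p][1] for p in shared) / len(shared)
    return (last_seen[0] + dx, last_seen[1] + dy)


# Hypothetical back line stepping up while the left-back is out of shot.
then = {"cb": (30.0, 30.0), "rb": (32.0, 10.0)}
now = {"cb": (35.0, 32.0), "rb": (37.0, 12.0)}
print(estimate_offscreen((31.0, 55.0), then, now))
```

A real system would presumably lean on far more prior data than a single average shift, but the principle is the same: the players you can see tell you something about the ones you can’t.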

This combination of technological advances allows companies to create more and/or better tracking data from relatively cheap, and very available, TV footage. For the first time in history, teams could scout with tracking data.

But what to scout? Would you simply use the ‘distance run’ figures, the thing that English coaches of the ‘00s seemed to like?

The fascinating tension in innovation is finding the balance point where imagination can be pushed just enough past technology. The latter often proves to be easier to improve than the former.

As it turns out, just as Impect and StatsBomb created unique selling points born out of frustration with the industry’s existing data, the world of ‘physical data’ stats has its sticking points too. At a similar point in time, the early 2010s, researchers were trying to find better measurement points for player running as well.

Separation between speed thresholds, ‘running’ vs ‘high-intensity runs’ vs ‘sprinting’, was fairly established, but still dissatisfying. In a particularly punchy 2015 research paper, subtitled ‘Shedding some light on the complexity’, we get this line: “Contemporary time–motion analysis of soccer still only offers a basic snapshot, and it is imperative that future research attempts to integrate multiple approaches to unravel the complexity of the game and its performance determinants.”
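As a rough picture of what speed-threshold separation means in practice, the Python sketch below adds up the distance a player covers in each band. The threshold values, the band names and the ten-samples-per-second rate are assumptions chosen for illustration; they are not taken from the paper, and different providers and researchers draw the lines in different places.

```python
# Illustrative lower bounds in km/h for each band (assumed, not from the paper).
BANDS = [("running", 14.4), ("high-intensity run", 19.8), ("sprinting", 25.2)]


def distance_by_band(speeds_kmh, hz=10):
    """Sum the distance in metres covered within each speed band.

    speeds_kmh: one speed reading per tracking frame, in km/h.
    hz: tracking frames per second (assumed to be 10 here).
    Readings below the lowest threshold are counted as 'low intensity'.
    """
    totals = {"low intensity": 0.0}
    totals.update({name: 0.0 for name, _ in BANDS})
    for speed in speeds_kmh:
        band = "low intensity"
        # Bands are listed in ascending order, so the last match wins.
        for name, lower_bound in BANDS:
            if speed >= lower_bound:
                band = name
        # km/h -> m/s, then metres travelled during a single frame.
        totals[band] += speed / 3.6 / hz
    return totals


# Hypothetical readings: one second of jogging-pace running, one of sprinting.
print(distance_by_band([18.0] * 10 + [36.0] * 10))
```

The dissatisfaction the researchers describe is visible even here: the totals say how far and how fast, but nothing about why the player was running.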

Three years later, one of the co-authors was involved in a paper whose title went even further. It was called “Are Current Physical Match Performance Metrics in Elite Soccer Fit for Purpose or Is the Adoption of an Integrated Approach Needed?”.

The floors of the 2010s data scene were littered with gauntlets being thrown.

**

No longer would it be acceptable to just focus on running at different speeds. Running - in fact, all movement on a football pitch - has a tactical component. That, ultimately, is the field in which Skillcorner’s ‘Game Intelligence’ data, particularly the recent out of possession launch, is pitching itself.

Early event data captured on-ball events like tackles; later event data added direct pressures to the ball-carrier; the latest wave adds layers of variety about what effects that pressure has. There’s even a metric for how many times a defender turned their opponent backwards, something that Get Goalside imagined for an alternative football data universe several years ago.

It remains to be seen whether this fancy new data - like all data - is as reliable and insightful as it markets itself to be. The point at which Get Goalside tends to turn from enthusiasm to scepticism is - out of step with some media outlets - when someone starts flogging it. Thankfully, no-one is (yet) shilling a service based on the historical lineages of football data innovation.

To say that history repeats, or echoes, suggests that history itself is alive. Humanity repeats itself. People echo people that have come before them. It’s no surprise that association football is an echo of other ball sports from global history, because the bounce of a ball is a magical thing. It’s no surprise that the urge of a small group of people, to understand those bounces a little better, echoes too.

We know for sure that, a century ago, there were people tracking the key events of football matches. We know that, as England was still only five years removed from the Second World War, extensive and systematic data collection of match events was underway. We know that forerunners to modern tracking data applications are almost thirty years old.

Many things about the game have changed. For one, tactics no longer have the athletic limitations of the 1950s: research suggests that high-intensity running increased 30% in the Premier League between 2006 and 2013, and by a further 10% between 2014/15 and 2018/19. This increase in athleticism may well be an inspirational spur for some of the latest data innovations, or at least be making the ground fertile for Skillcorner to put its roots in.

But many things are the same.

In late 1960, Reep wrote a pointed article about Tottenham Hotspur’s victory over Stan Cullis’ Wolves in the FA Cup, suggesting that their short passing would make their success short-lived (they ended up winning the double that season). In a response to Reep in the Hull Daily Mail local newspaper, their football correspondent closed his piece with the following:

“Football is a sophisticated game, and it will grow more so as the years go by. At the moment, possession and change of rhythm are the two principal weapons of the world’s best teams.

The long, forward pass still has its place in the armory of any successful team. But its very value lies in a team’s ability to use it as a surprise, not, as Wolves [Reep’s former associates] have done, as a monotonous, obvious weapon.”

Tactical variety, tactical debate, and arguments about data. This is mid-century England.

References/Further Reading

Attila Bátorfy, ‘Three Sportviz Inventions By a Hungarian Newspaper’, https://medium.com/nightingale/three-sportviz-inventions-by-a-hungarian-newspaper-b5c0df489d6c

Ian Graham, How To Win The Premier League

Rory Smith, Expected Goals

Richard Pollard & Charles Reep, ‘Measuring the effectiveness of playing strategies at soccer’ (1997): https://www.researchgate.net/publication/227692321_Measuring_the_effectiveness_of_playing_strategies_at_soccer

Paul, Bradley, & Nassis, ‘Factors Affecting Match Running Performance of Elite Soccer Players: Shedding Some Light on the Complexity’ (2015): https://www.researchgate.net/publication/273071207_Factors_Affecting_Match_Running_Performance_of_Elite_Soccer_Players_Shedding_Some_Light_on_the_Complexity

Bradley & Ade, ‘Are Current Physical Match Performance Metrics in Elite Soccer Fit for Purpose or Is the Adoption of an Integrated Approach Needed?’ (2018): https://www.researchgate.net/publication/322277340_Are_Current_Physical_Match_Performance_Metrics_in_Elite_Soccer_Fit_for_Purpose_or_Is_the_Adoption_of_an_Integrated_Approach_Needed

Taki, Hasegawa, Fukumura, ‘Development of motion analysis system for quantitative evaluation of teamwork in soccer’ (1996): https://ieeexplore.ieee.org/document/560865

Taki, Hasegawa, ‘Dominant region: a basic feature for group motion analysis and its application to teamwork evaluation in soccer games’ (1998): https://www.spiedigitallibrary.org/conference-proceedings-of-spie/3641/1/Dominant-region--a-basic-feature-for-group-motion-analysis/10.1117/12.333797.short

Taki, Hasegawa, ‘Visualization of dominant region in team games and its application to teamwork analysis’ (2000): https://www.semanticscholar.org/paper/Visualization-of-dominant-region-in-team-games-and-Taki-Hasegawa/beff32a0a37a8d094a471067895cf420dd2e20de

Previous Get Goalside on the subject of Charles Reep and analytics history: https://www.getgoalsideanalytics.com/36315087-analytics-is-older-than-you-think/

Get Goalside, ‘What if we didn’t care about passes?’: https://www.getgoalsideanalytics.com/what-if-passes/