Everyone wants to win and pressure mounts on teams that have yet to win. Do early season victories matter? Does getting off to a good start in the season help a team’s results through the rest of the year and does having a poor start mean the rest of the year is going to suffer? Here’s a statistical analysis and some thoughts on the matter.
The idea is simple: a team that gets off to a bad start risks carrying on that way for the whole year. Riders are demoralised, the media gets on their backs and calm tactics give way to panic measures as the pressure mounts to deliver results, creating a negative spiral. Is it true?
Let’s crunch the numbers for 16 World Tour teams: Ag2r La Mondiale, Astana, BMC Racing, Cannondale, Etixx-Quickstep, FDJ, Giant-Alpecin, Katusha, Lampre-Merida, Lotto-Jumbo, Lotto-Soudal, Movistar, Orica-Greenedge, Team Sky, Tinkoff-Saxo and Trek-Segafredo over the last three seasons. Fellow World Tour teams IAM Cycling and Dimension Data are left out because they’ve been in the Pro Conti ranks during this period and this could skew the results. As a proxy for early season results the cut-off was the end of March: count the race days and almost precisely one quarter of the season’s race days have gone. The teams are ranked by wins for the first period to the end of March and then for April to the end of the season. Then these rankings are checked for correlation to see if teams that score early wins will keep on winning. Here are the rankings as scatter graphs for the three years.
The evidence says that there’s only a small correlation (rho = 0.591 in 2015, 0.32 in 2014, 0.404 in 2013) between early season success and having a good run through the rest of the year. In other words, win early and you don’t really get into a groove for the rest of the season, and teams struggling now can recover their mojo over the summer.
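For anyone wanting to repeat the exercise, here is a minimal sketch of the rank correlation described above. It assumes ties share the average rank and that rho is taken as the Pearson correlation of the two rank lists, which is the standard Spearman definition. The win counts are invented for six imaginary teams and are not the data behind the figures quoted here.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from highest (rank 1) to lowest, sharing the mean rank among ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next team has the same win count.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman_rho(early_wins, late_wins):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    rx, ry = average_ranks(early_wins), average_ranks(late_wins)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up win counts for six hypothetical teams, not the article's data.
early = [9, 7, 7, 4, 2, 1]
late = [30, 12, 25, 18, 9, 5]
print(round(spearman_rho(early, late), 3))
```

A value near 1 would mean the early and late rankings match closely, a value near 0 that they are unrelated. The function divides by zero if every team has the same win count in either period, which is left unhandled in this sketch.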
To illustrate this with anecdotes let’s see 2015 for some examples. From the start of the season to the end of March Etixx-Quickstep topped the team victory rankings while Lotto-Jumbo were last and this was true for the period from April to the end of the year too. But in between there was a lot of movement: Movistar, for example, were second equal for the early season period but 10th for the rest of the year, while Trek had a good start in sixth place but ended the year in 14th place. In 2014 Ag2r La Mondiale got off to a roaring start and were fourth on the team rankings table by the end of March only to finish last for the rest of the season and last for the whole of the year.
If there’s only a small correlation between a good start to the season and future success, this suggests any causation is going to be even more tenuous. Winning early often means having a house sprinter to collect early season success as the peloton is spread around the world, for example see Bryan Coquard racking up the wins for Direct Energie right now. Doing it later when teams and the best sprinters are concentrated together in races like the Tour de France is another matter. Similarly some teams just aren’t cut out for the shorter, flatter races at the start of the season. Instead of having a punchy sprinter with a track background Tinkoff has Alberto Contador and Rafał Majka who need long days in the mountains to take their wins, so over the last three seasons they’ve taken a disproportionate share of their wins later in the season with only 11.7% of wins achieved in the first quarter.
The Moneyball Pitch
So if you’re a team manager starting with a blank sheet of paper to assemble a team and plan the season ahead what do you do? There are few cheap sprinters right now. Ever since Nacer Bouhanni signed a €1.2 million deal with Cofidis prices for sprinters have risen substantially so hiring someone to take a few cheeky wins in February and March comes with an outrageous price tag. Prior to this it would have been good to try and recruit a second tier sprinter to collect some wins, for example Astana have Andrea Guardini for this role, he might be Italian but can’t get a ride in the Giro and has to settle for taking smaller wins.
All this focuses on data but in cycling storytelling counts for so much. Any team that has a slow start can quickly feel the pressure as the media and fans point out the lack of wins. Any manager saying “don’t worry, we’ve planned it like this” will find people saying “they would say that, wouldn’t they” and mocking a rider’s second place in a photo finish as “part of the plan”. But resources can be deployed differently: why spend all that money on big pre-season training camps in order to have your team red hot in February when you could save the budget for later in the year in order to strike when it really counts? Then again a few early wins are good for morale and cohesion.
The broader question is whether early wins count for anything more than morale and narratives. The data suggest they don’t bring momentum. It’s questionable whether they bring publicity. We might think the goal of a team is to win but really it’s to promote their sponsors, and if winning is the best way to do this, not all wins are equal. Take an untelevised race with a handful of spectators at the finish line where the results end up buried in a local newspaper and yes, you have a win in statistical terms but the publicity gains are tiny. Taking out an ad in the local paper would have been cheaper.
Instead the optimal solution is to draw up a list of target races that reach the target audience of the team sponsors. The Tour de France is obvious as it’s piped into so many homes and has huge roadside audiences. But elsewhere sales of lottery tickets and scratchcards in France won’t change if FDJ storm the Tour Down Under, while this race matters a lot more to Orica-Greenedge. Team Sky went big on the Tour of California in the days when 21st Century Fox was on the jersey as a sponsor, and given the enormity of the US bicycle market California is a big deal for Cannondale, Trek and other teams with prominent manufacturers.
Conclusion
Does a team that starts winning early carry on winning throughout the year? No. Going by the data for 16 teams over three seasons there’s little to suggest that good fortune early in the year brings momentum, that success breeds success. Instead it seems structural, as teams with climbers obviously wait until the snow melts while a team like Etixx-Quickstep that has a lot of riders capable of winning one day races wins a lot. A coach or team manager might like to hire a sprinter to pick up early wins but a sponsor might wonder if the price paid is too high given the meagre publicity available in many small races.
Interesting article.
The target races make sense although, to some degree, some teams almost seem to exist just to ride.
Take Etixx, will their targets vary because they now have Lidl on board? Probably not; they ride because of where they’re from and what their culture is.
Greenedge are another interesting example. Apart from their home race, they’re out on a limb in terms of much of their home audience – in another time zone. But they circumvent what could be a restriction through excellent PR and social media interaction. It really does add a whole new possibility for the team and sponsors. It would be difficult to put a value on this but if you could add all the video hits for Greenedge, would the value pay for one, two, several riders?
Having said that, these teams still need wins to keep the ball rolling. And both have excellent young sprinters in their ranks.
Excellent PR and social media interaction may well add a whole new possibility for the team and sponsors.
Yet, in what is now their fifth season, they are still Orica-YourNameHere. Odd, that.
Orica has business in many countries (one of their factories is located near Bergen, Norway) so the team sponsor is at home everywhere or at least everywhere some rock needs to be blasted. If riders and Aussie fans are less so, it might be less important. They are spending most of the year making Orica a familiar brand name in Europe.
Good article. More of this data-driven journalism is needed in cycling (and sport in general) rather than stories and “trends” driven by journalists forcing facts into a pre-conceived narrative. Do you read Five Thirty Eight, Football Outsiders or any of the baseball sites? They do a really good job of using statistics to inform our knowledge of sports in a much better way than almost any cycling coverage, which essentially amounts to sound bites, gossip and who-won-what reporting.
+1
I don’t read those but am aware of the large use of data in baseball, the sport seems to lend itself to this. In cycling we have fewer metrics. I’ve been toying with compiling an index of which races finish in a bunch sprint, for example, to have a definitive stat but it’s very subjective once you try to define what is a bunch sprint and what is a group.
What about a rider vs group bunch sprint metric? I’m thinking for sprinters, it’s a continuum between Marcel Kittel (Can’t get over mountains) and Valverde (Climber who has a fast finish). Amount of climbing / difficulty of racing determines how many are in the lead group.
For example, if the size of the group determines how hard / how much climbing the race experienced, then each type of sprinter can win in a given set of circumstances/terrain.
ie: Kristoff tends to win bunch sprints between 10-25 people, above 25 people and his sprint % suffers, etc.
Matthews can win sprints between 20-50, etc.
Kittel wins anything over 75, etc.
Or perhaps the % of wins of a sprinter as judged by which teammate is also in the bunch? Here I’m thinking that someone like Paolini or an Impey would do well. If Impey is in the bunch, then Matthews/Gerrans has x% wins, but if Impey misses the split then their % drops…
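As a rough sketch of the group-size idea above, one could tally a rider's win rate by the size of the group contesting the finish. Everything here is assumed for illustration: the bucket edges are arbitrary, and the results belong to an imaginary rider rather than to Kristoff, Matthews or Kittel.

```python
from collections import defaultdict

# Hypothetical bucket edges for the size of the group contesting the finish.
BUCKETS = [(1, 9), (10, 25), (26, 74), (75, 10_000)]


def bucket_for(group_size):
    """Return the (low, high) bucket containing group_size."""
    for low, high in BUCKETS:
        if low <= group_size <= high:
            return (low, high)
    raise ValueError(f"no bucket for group size {group_size}")


def win_rate_by_group_size(finishes):
    """finishes: iterable of (group_size, won) pairs for one rider.

    Returns {bucket: wins / sprints contested} for each bucket seen.
    """
    contested = defaultdict(int)
    wins = defaultdict(int)
    for group_size, won in finishes:
        bucket = bucket_for(group_size)
        contested[bucket] += 1
        wins[bucket] += int(won)
    return {b: wins[b] / contested[b] for b in contested}


# Invented results for an imaginary rider, purely to show the shape of the output.
sample = [(12, True), (18, True), (22, False), (40, False), (90, False)]
for bucket, rate in sorted(win_rate_by_group_size(sample).items()):
    print(bucket, f"{rate:.0%}")
```

The same tally could be split by whether a given teammate made the front group, which would cover the Impey example, but that needs finish-by-finish data on who was present.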
Agreed, baseball has lots of individual situations with a specific player attempting a specific action with a specific outcome, all of which can be tallied. Cycling doesn’t record anywhere near as much information.
I suppose the bunch sprint question comes down to what constitutes a bunch. More than a certain proportion of the starters finishing within x seconds of the winner?
You could possibly define a “Bunch” sprint by X number of teams with dedicated sprint trains with Y number of said train present who present their dedicated sprinter to within Z distance of the finish line. A “Group” finish, by contrast, would lack those requirements and result in defined non-dedicated sprinters arriving together. Also, if a certain percentage of the peloton finishes within a certain time of the dedicated sprinters, it would be a sprint. Just some musings.
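One of the definitions floated above, a set proportion of the starters finishing within x seconds of the winner, is simple enough to sketch. The thresholds below (3 seconds, 40% of starters) are arbitrary placeholders chosen for illustration, which is exactly where the subjectivity creeps in.

```python
def is_bunch_sprint(gaps_to_winner, starters, max_gap=3.0, min_share=0.4):
    """Classify a finish from each finisher's time gap to the winner in seconds.

    max_gap and min_share are arbitrary illustrative thresholds: the finish
    counts as a bunch sprint when at least min_share of the starters cross
    the line within max_gap seconds of the winner.
    """
    if starters <= 0:
        raise ValueError("starters must be positive")
    in_front_group = sum(1 for gap in gaps_to_winner if gap <= max_gap)
    return in_front_group / starters >= min_share


# Two invented finishes: a big peloton arriving together, then a small group.
peloton_finish = [0.0] * 120 + [45.0] * 40
small_group_finish = [0.0] * 8 + [62.0] * 140
print(is_bunch_sprint(peloton_finish, starters=170))
print(is_bunch_sprint(small_group_finish, starters=160))
```

Changing either threshold moves races from one category to the other, so any index built this way would need the thresholds stated alongside it.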
Totally agree – I nominate Inner Ring to start a fangraphs for cycling site.
Oh wait, there’s no way that the ASO or UCI or the teams will pay him to do that statistical analysis. Those bodies in other sports contract their services to the teams, leagues and TV networks. Unfortunately this is not on the table right now for cycling.
Sure, there’s no chance of anything nearly as sophisticated as what we have for baseball, NFL etc in terms of statistical analysis, but there’s definitely more scope for journalists and bloggers to utilise the information we do have and interpret cycling in a more informed data-driven way – precisely as TIR has done here extremely well. There are definitely small things which could be done to improve reporting and, just as crucially, cut down on bad reporting, right now with the resources cycling already has available.
I agree, it can be done, but the powers that be (eg. UCI and ASO) don’t like to look at sports analysis in the same way that North American sports fans and organisers do.
For example, who’s going to pay you or I or Inner Ring to keep these statistics? In North America, the leagues, networks and teams pay to have the statistics kept.
The never-ending quest for the empiricism of sport does seem to be a curiously North American phenomenon. I would suggest that the European model of sport appreciation tends to celebrate the capricious nature of fate, weather and timekeeping, as opposed to boiling everything down to statistics.
Colin one of the interesting things in cycling is the lack of repeatable performances. Maybe it works in track cycling but on the road it’s so hard to compare. You can have one race a year meaning only an annual repetition and then you have course changes, changing wind directions, different fields etc. Data are hard to find and equally hard to apply.
Exactly Inrng, long live external variables, and the relative unpredictability of the sport!
Probably the subject of an upcoming piece on here if I can stitch the thoughts together into one piece.
Interesting analysis, but aren’t these data really more suggestive of teams having some level of consistency in good or bad performances over a season? It seems a stretch to suggest that a team’s wins in the early season somehow set them up for wins later in the season. Seems much more likely that they were just good that season, so they won early and late.
The idea was to explore whether teams that get stuck in a rut stay there. Similarly there’s a big relief for teams when they win: FDJ talked about “unblocking” after their win yesterday and this mentality is quite prevalent.
Kind of remarkable a French team would require such a motivational compass. There are loads of races in France. By turning up and participating any French team would expect wins in that first quarter and beyond.
Is there any relation between a team being super-hot in form (like AG2r last year) and a sudden drop in win ratio after a positive test in the team??
I do wonder if it is possible to distinguish between the two situations using these figures. Perhaps it would be necessary first to isolate the teams who get a disproportionate share of wins in particular parts of the season?
Thanks for the data, though, M/Mme le host.
There are nice statistical models for just this. If inrng were able to share the data used, I would be happy to put up some analysis which speaks more directly to the benefits of an early win.
What also is the cost of running a second team to support the sprinter? Even if you don’t commit to a whole sprint train like Katusha in Qatar at the moment, you still need a rider or two to support the team sprinter. This then adds some complexity to planning the team as targets may be incompatible for the different riders.
True but you’re bound to have a couple of solid rouleurs on the team, whether for the classics, as bodyguards for climbers, for TTTs etc. Obviously hiring proven specialists here can cost more.
+1
You, sir, are quite a polymath.
Wow — fantastic work!
What if one-day races were compared to one-day races and stage races were compared to stage races?
Dear Inrng, Have you tried combining the three years worth of data into a single data set and graph? This would provide a larger sample and improve the reliability of the results.
What he said!
It’s 0.441. I wanted to show each season separately in case people wanted to see the underlying data more clearly.
Great article, as always. However, you made a very common mistake: there is no correlation! You adjusted the linear regression to data sets that are completely random. Even if they were linear, with rho less than 0.95, there would be still no correlation.
They’re over different time periods but surely it’s a way to test whether performance in the early season can influence the rest of the year?
I meant just the statistics. You used the wrong method to check the trend and correlation.
Whenever the rho is less than .95 there is no correlation, i.e. you really can’t draw any conclusion. I could fit a cow with a better rho and it would not tell us anything 😉
I’m confused KubaWinter. A low rho means the regression coefficient is pretty meaningless, but the correlation coefficient still tells you something, no? At least the fact that it is positive for all three years shows that probably ‘shine in february, fade in july’ does not hold for entire teams. With 48 samples, the 0.44 correlation coefficient for the combined data is significantly different from 0 for sure.
If I were forced to write down a conclusion from this set of data it would probably be something like: if you win often, you usually, but not necessarily, win some early races as well.
Correct me if I’m wrong INRNG, but the rho is the Spearman’s rank correlation coefficient. This is different from linear regression. The possible values of rho range from -1 to 1 and rho is an indicator of the relationship between the data points. As far as I can tell, linear regression was not used here and no test of significance was undertaken. There are many ways of testing whether the rho, the correlation coefficient, is significantly different from 0 (no relationship).
https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
Therefore, the datasets above do show correlation, linear regression has not been undertaken, and the threshold of 0.95 is not applicable in this scenario.
Hopefully this clarifies the situation but I’ll be happy to clarify further if needs be.
Dr Dewi Owen, Quantitative Social Scientist, University of Bristol
The above reply is in support of the analysis and clarifies what is in my opinion the misunderstanding of KubaWinter
Quite right, it’s the Spearman’s rank rho.
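For the question of whether these rho values differ significantly from zero, here is one rough way to check, sketched under stated assumptions. It uses the Fisher z-transformation with the 1.06 / (n - 3) variance often quoted for Spearman's rho, which is a large-sample approximation and only one of several possible tests. It also treats every team-season as independent, which is doubtful when the same 16 teams appear three times in the pooled figure.

```python
from math import atanh, sqrt
from statistics import NormalDist


def spearman_p_value(rho, n):
    """Approximate two-sided p-value for Spearman's rho against rho = 0.

    Uses the Fisher z-transformation with the 1.06 / (n - 3) variance
    commonly quoted for Spearman's rho. A large-sample approximation,
    so treat results for small n as rough.
    """
    if n <= 3 or not -1 < rho < 1:
        raise ValueError("need n > 3 and -1 < rho < 1")
    z = atanh(rho) * sqrt((n - 3) / 1.06)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Values quoted in the article and comments: 16 teams per season, 48 pooled.
for label, rho, n in [("2013", 0.404, 16), ("2014", 0.32, 16),
                      ("2015", 0.591, 16), ("pooled", 0.441, 48)]:
    print(label, round(spearman_p_value(rho, n), 3))
```

With only 16 teams per season an exact or permutation test on the underlying win counts would be more trustworthy than this approximation, but that needs the raw data.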
I love it when statisticians get together to argue….. Can I just say I prefer red dots and blue lines.
Can wins Qatar, I already forgot who came second!
Baseball stats with correlation to cycling:
http://www.sfgate.com/sports/article/MLB-Drug-Suspensions-6827350.php
ha