How do the rankings work?
Published on: Feb. 18, 2018
In this blog we will try to explain how the ranking system works.
To recap what is on the rankings page:
- To appear in the rankings table, athletes must have at least 4 recorded scores (although estimates will appear for all rowers on their individual results page), and their most recent score must date from within the previous year. The ranking is based on their estimated skill level minus the uncertainty attached to that estimate.
- The scores are only calculated on the data available. This is incomplete, but more data will be added over time. Browse the races to see what is included.
- "Skill" is an estimate of how good a rower is at sweep rowing or sculling. More points = more likely to win a race. The average/default is 100 points.
- "Uncertainty" is a measure of how sure we are of the skill estimate. In short, fewer races on the system or more inconsistent results mean a higher uncertainty. The default starting point is 10.
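In code, the eligibility rule and the displayed ranking score might look something like this (a minimal sketch; the function and field names are illustrative, not the site's actual implementation):

```python
from datetime import date, timedelta

def ranking_score(skill: float, uncertainty: float) -> float:
    # Conservative ranking: estimated skill minus the uncertainty of that estimate.
    return skill - uncertainty

def appears_in_table(num_scores: int, last_score: date, today: date) -> bool:
    # At least 4 recorded scores, with the most recent within the previous year.
    return num_scores >= 4 and (today - last_score) <= timedelta(days=365)
```

So a rower with the default 100 skill and 10 uncertainty would be ranked on a score of 90.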
The idea of ranking rowers comes from chess. For more than 50 years, chess players have been ranked on a system known as Elo. Players start out with a set number of points. If they beat someone, they gain points, and vice versa. However, the number of points awarded varies depending on how unexpected the win was - if a player beats someone much more skilled than they are, they will gain more points for that win than if they had beaten a novice. Live league tables are maintained online.
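The classic Elo update can be sketched in a few lines of Python (a textbook version with a fixed K-factor of 32; real federations tune K and add further adjustments):

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    # Expected score for player A under the Elo logistic model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    # A's gain is B's loss; an unexpected win moves more points.
    delta = k * ((1.0 if a_won else 0.0) - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta
```

Beating a far stronger opponent (a low expected score) yields a large points gain, while beating a novice yields almost nothing - exactly the behaviour described above.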
The calculations on this website are an adaptation and refinement of these ideas. The rankings are based upon the TrueSkill algorithm, first developed by Microsoft to ensure that people playing online multiplayer games face a fair level of challenge that matches their skill level. It is based upon the principle that everyone's skill in a particular domain (e.g. rowing or chess) can be represented by a statistical probability distribution. In other words, while we can never know exactly what someone's skill level is - particularly if race results are infrequent and rowers are inconsistent over time - we can give an informed central estimate of that skill level, and we can also say how certain we are of that estimate. For the technically minded, the skill level quoted is the mean of the individual's skill distribution, and the 'uncertainty' figure quoted is its standard deviation.
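Using Python's standard library, a skill distribution can be modelled directly with `statistics.NormalDist` (the 100/10 figures are the defaults quoted above; the probability query is just an illustration):

```python
from statistics import NormalDist

# Default rower: central skill estimate (mean) 100, uncertainty (std dev) 10.
skill = NormalDist(mu=100, sigma=10)

# We never know the true skill exactly, but we can make probability statements
# about it, e.g. the chance the true skill exceeds 110 (one sigma above the mean).
p_above_110 = 1 - skill.cdf(110)  # roughly 16%
```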
The maths involved in calculating the rankings are quite involved - see for example the original paper on the algorithm. However, there are a number of important factors to note:
- It is the result relative to other players in a race, rather than an absolute performance, that counts.
- This means that whether you beat someone by 5 lengths or by a bow ball, a win is a win. In a multi-lane race or a time trial, what matters is where you place in the finish results, not what your overall time was.
- As above, if a GB Squad Member beats a bunch of novices at (for example) Twickenham Regatta, their ranking is not going to be as affected as if they had won the A Final of the Olympics.
- The smaller the boat competing, the more meaningful the result and the more each rower's skill estimate will be affected.
- Similarly, the more people competing in a race, the more the result affects their skill level.
The two factors above combine so that a large time trial of singles or pairs will be much more influential on an athlete's ranking scores than a Henley-style one-on-one eights race.
At the moment, the data used to calculate the rankings is still quite limited - a handful of races over a short period of time. The rankings as they currently stand therefore reflect these individual data points more than what the people involved are really capable of. It remains to be seen whether these skill estimates can be used for predictive purposes (as was originally envisaged), as opposed to merely reflecting previous performances.
A final note in this regard is on the information that a race result can tell us. Mathematically speaking, there is a finite amount of information that can be extracted from any race result. This is approximately equal to the number of teams (crews) that compete in that event multiplied by the binary logarithm of that number. A race of 6 eights generates only 16 bits of data, compared to 110 bits generated by a time trial of 24 pairs. Similarly, if we have a number of people to rank (say 1000 active UK rowers), the amount of data needed is also approximately equal to the number of rowers multiplied by the binary logarithm of that number.
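The figures quoted above are easy to reproduce (this uses the n times log2(n) approximation from the text; the exact information in a complete finish order is log2(n!), which this slightly overstates):

```python
from math import log2

def race_bits(num_crews: int) -> float:
    # Information yielded by a complete finish order of n crews: n * log2(n).
    return num_crews * log2(num_crews)

# race_bits(6) is about 16 bits (6 eights); race_bits(24) is about 110 (24 pairs).
```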
These two factors can be combined. If all 1000 rowers got into a single and raced each other in a big time trial, we would only need one race to be able to produce a comprehensive ranking. If instead they switched to sweep and got into pairs (or sculled in doubles), they would need to time trial at least three times before enough data had been produced. If they raced in quads or fours, they would need 5 races. In eights they would need 12 races to be able to obtain the same amount of data (imagine repeating eights head 12 times!). Consider how little data is produced by a six lane side-by-side regatta semi or final, and you may understand the difficulty in producing an accurate ranking. The answer to the problem, of course, is to simply have more data!
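The race counts in the paragraph above fall out of the same arithmetic (assuming every crew races in a single full-field time trial; the fours/quads case comes out at just over 5, which the text rounds down):

```python
from math import ceil, log2

def races_to_rank(num_rowers: int, crew_size: int) -> float:
    # Bits needed to rank everyone, divided by bits yielded per time trial.
    bits_needed = num_rowers * log2(num_rowers)
    crews = num_rowers // crew_size
    return bits_needed / (crews * log2(crews))

# 1000 rowers: 1 time trial in singles, ~2.2 in pairs (so at least 3),
# ~5 in fours or quads, ~11.4 in eights (so 12).
```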
How do the predictions work?
It's straightforward. We take the two rowers' latest skill levels and uncertainties, and combine them to give a probability that rower A will beat rower B. If your skill level is higher, you should be more likely to beat your opponent, and the closer the two values are, the closer it gets to a 50:50 chance. For a given skill difference, higher uncertainty also pushes the probability back towards 50:50.
Two factors therefore affect your win probability: the skill difference (i.e. your skill minus that of your opponent) and the combined uncertainty (your uncertainty value plus that of your opponent). For example, if you have 5 points more skill than your opponent (e.g. you have 95 vs 90 points, or 110 vs 105) and you both have 9 uncertainty points, your win probability will be 61%. If we are more sure of both skill levels and reduce the uncertainty to 3 points each, the same skill difference of 5 points gives a win probability of 80%.
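A model matching these figures takes the normal CDF of the skill difference divided by the combined uncertainty. (Note that this simple sum of the two uncertainties is inferred from the quoted 61% and 80% figures; the full TrueSkill formulation combines the variances slightly differently.)

```python
from statistics import NormalDist

def win_probability(skill_a: float, sigma_a: float,
                    skill_b: float, sigma_b: float) -> float:
    # Probability that A beats B: skill difference scaled by combined uncertainty.
    return NormalDist().cdf((skill_a - skill_b) / (sigma_a + sigma_b))
```

With a 5-point skill edge, this gives 61% at 9 uncertainty points each and 80% at 3 each, matching the examples above.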