Four new studies test the accuracy of fitness prediction functions.
I’ve done enough “real” VO2 max tests – the trips to the lab, the expensive and cumbersome equipment, the brutally exhausting treadmill protocol, and in one case, the puking in the corner afterwards – that I’ve been intrigued to notice the recent trend of GPS watches and heart-rate monitors promising to estimate my VO2 max for me. Could it possibly be that simple?
I’m clearly not the only one wondering, because I noticed at least four presentations at the recent American College of Sports Medicine conference addressing that very question. Overall, the results look better than I might have expected, but there are some differences between the various approaches taken by different watches.
VO2 max is basically the definitive measurement of aerobic fitness. It tells you the maximum rate at which you can take oxygen from the air and deliver it through the lungs into the bloodstream for use by your working muscles. It’s an excellent measurement of current health and predictor of future health – in fact, last year, in the US, the American Heart Association argued that it should be classified as a new “vital sign” to be assessed yearly by your doctor.
As the AHA statement noted, there are various ways of measuring or estimating VO2 max. The best is to do it directly, by measuring the oxygen you consume while you exercise to exhaustion. Next best is to estimate it while exercising to exhaustion; for example, based on the distance covered during a 12-minute run.
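To make the 12-minute run idea concrete, here is a minimal sketch using the widely cited Cooper test formula, which converts the distance covered into a VO2 max estimate. The formula and its constants are my addition, not something taken from the AHA statement, and the function name is purely illustrative.

```python
def cooper_vo2max(distance_m: float) -> float:
    """Estimate VO2 max (ml/kg/min) from the metres covered in an all-out 12-minute run.

    Uses the commonly quoted Cooper test formula: (distance - 504.9) / 44.73.
    This is a population-level estimate, not a lab measurement.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return (distance_m - 504.9) / 44.73


# Illustrative distances only, not data from any of the studies.
for distance in (2000, 2400, 2800, 3200):
    print(f"{distance} m in 12 min -> roughly {cooper_vo2max(distance):.1f} ml/kg/min")
```

Under this formula, every extra 100 metres in 12 minutes adds a little over 2 ml/kg/min to the estimate.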
There are also “sub-maximal” exercise protocols that estimate VO2 max based on the relationship between your heart rate and pace, without forcing you to go all-out. This is what GPS watches like the Garmin Forerunner 230, 235, and 630 do, drawing also on basic information like your age and sex: they have you run for at least 10 minutes while simultaneously measuring your pace and heart rate. (The 230 and 630 measure heart rate with a chest strap, while the 235 uses a wrist sensor integrated into the watch.)
Finally, there are estimates that don’t involve exercise at all, but simply use information like your age, resting heart rate, and typical activity levels. Watches like the Polar V800 take this a step further by measuring your heart-rate variability (the subtle variations in the time between successive heart beats) for a few minutes while you’re lying down.
Here’s what the data presented at the ACSM conference found.
Garmin vs. Polar vs. Lab Tests
The most comprehensive study came from Bryan Smith and his colleagues at Southern Illinois University Edwardsville in the US. They estimated VO2 max for 23 women and 26 men using the Garmin 230, Garmin 235, and Polar V800, then compared those results to gold-standard lab testing.
Typical VO2 max values in healthy college students tend to be in the 40s or 50s (in units of millilitres of oxygen per kilogram of body mass per minute). In those units, here’s how much the various watch estimates over- or underestimated VO2 max (an upward bar indicates that the watch underestimated VO2 max):
There are some interesting patterns there. The Garmin measurements seem to consistently overestimate VO2 max, to a greater degree in men than women, and to a greater degree with the wrist sensor (which is a newer and less reliable way of monitoring heart rate) than the chest strap.
The Polar measurements appear to be less accurate – not surprisingly, given that they’re estimating a characteristic of maximal exercise while at rest. But the deviation seems to be completely different in men and women. It’s hard to know whether this is an artifact of the particular group of men and women in this study (the men in the study had slightly higher BMIs and also slightly higher VO2 max), or something more systematic.
Before taking these results as gospel, though, it’s worth checking what some of the other studies found.
Chest Strap vs. Wrist Sensor
Another analysis from the same group took a deeper head-to-head look at the data from the two Garmin systems. Given that the measurements were taken with both watches at the same time on the same run, what explains the different VO2 max estimates?
The most likely culprit appears to be the heart-rate measurements. Chest straps are considered highly accurate, and the wrist sensors produced heart rate values that were consistently lower than the chest strap. That, in turn, meant that the wrist sensors overestimated VO2 max – which makes sense, since the heart-rate data was artificially low.
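Why would a low heart-rate reading inflate the estimate? Here is a rough sketch of a textbook sub-maximal method: fit a straight line to heart rate versus oxygen cost, then extend it to maximum heart rate. It is emphatically not Garmin's algorithm, which the studies don't describe. The oxygen cost of each pace comes from the ACSM level-ground running equation (my assumption), and every pace, heart rate, and the 190 bpm maximum below are hypothetical numbers chosen for illustration.

```python
def running_vo2(speed_m_per_min: float) -> float:
    """Oxygen cost of level running (ml/kg/min), ACSM equation: 0.2 * speed + 3.5."""
    return 0.2 * speed_m_per_min + 3.5


def extrapolate_vo2max(speeds, heart_rates, hr_max):
    """Least-squares fit of VO2 = intercept + slope * HR, evaluated at hr_max."""
    vo2 = [running_vo2(s) for s in speeds]
    n = len(vo2)
    mean_hr = sum(heart_rates) / n
    mean_vo2 = sum(vo2) / n
    sxx = sum((h - mean_hr) ** 2 for h in heart_rates)
    sxy = sum((h - mean_hr) * (v - mean_vo2) for h, v in zip(heart_rates, vo2))
    slope = sxy / sxx
    intercept = mean_vo2 - slope * mean_hr
    return intercept + slope * hr_max


# Hypothetical run: four steady paces (m/min) and the heart rates recorded at each.
speeds = [160, 180, 200, 220]
chest_hr = [130, 142, 154, 166]
wrist_hr = [h - 5 for h in chest_hr]  # assume the wrist sensor reads 5 bpm low
hr_max = 190  # hypothetical maximum heart rate

chest_est = extrapolate_vo2max(speeds, chest_hr, hr_max)
wrist_est = extrapolate_vo2max(speeds, wrist_hr, hr_max)
print(f"chest strap estimate:  {chest_est:.1f} ml/kg/min")
print(f"wrist sensor estimate: {wrist_est:.1f} ml/kg/min")
```

With the same paces but lower recorded heart rates, the runner appears to have more headroom before hitting maximum heart rate, so the line reaches a higher VO2 max. In this toy example, a 5 bpm shortfall adds a bit under 2 ml/kg/min.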
The conclusion, not surprisingly, is that chest straps give you better data. The remaining question is whether the wrist band gives you “good enough” data, which depends on what you’re using it for.
A couple of other studies compared a single device to lab measurements.
Garmin vs. Lab Tests
Rebecca Moore and her colleagues at Eastern Michigan, led by grad student Andrew Pearson, used a treadmill test and a Garmin Forerunner 235 (the one with the wrist sensor) to measure VO2 max in 23 volunteers.
In this case, the average VO2 max in the lab was 52.4 ml/kg/min, compared to 49.3 ml/kg/min with the watch – so the watch with the wrist sensor underestimated the lab value, which is the opposite of what the Southern Illinois study found.
What explains the discrepancy? I have no idea, but it suggests we should be cautious about drawing definitive conclusions about either set of results. I asked Moore and Pearson about the individual variation in their data, and they said that the watch consistently underestimated VO2 max, particularly for those with higher values (above 50, a value generally found in sub-20:00 5K runners).
Polar vs. Lab Tests
Finally, Kent Johnson and Jenny Beadle of Lipscomb University compared the values produced by Polar’s FT60 Fitness Test (the one based on heart-rate variability while lying down) with lab values in 31 subjects. In this case, the average lab value was 44.9 ml/kg/min, and the Polar value was 49.8 ml/kg/min.
This overestimate of about 10 per cent, or just under 5 ml/kg/min, is similar to what the Southern Illinois group saw in men, but not in women. However, the Lipscomb volunteers were 13 men and 18 women, so that pattern of sex differences doesn’t seem consistent between the studies.
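To put the two single-device comparisons on the same footing, here is a small sketch that turns each pair of averages into an absolute and a percentage error. The lab and device averages are the ones reported above; the function name and the study labels are just for illustration.

```python
def device_error(lab_value: float, device_value: float):
    """Return (absolute error in ml/kg/min, error as a percentage of the lab value).

    Positive numbers mean the device overestimated; negative, underestimated.
    """
    diff = device_value - lab_value
    return diff, 100 * diff / lab_value


# Group averages reported in the two studies (ml/kg/min): (lab, device).
studies = {
    "Eastern Michigan, Garmin 235": (52.4, 49.3),
    "Lipscomb, Polar fitness test": (44.9, 49.8),
}

for name, (lab, device) in studies.items():
    diff, pct = device_error(lab, device)
    print(f"{name}: {diff:+.1f} ml/kg/min ({pct:+.1f}%)")
```

Run as written, this shows the Garmin wrist-sensor average about 6 per cent below the lab value, and the Polar average about 11 per cent above it.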
So what overall conclusions can we draw from these studies?
First, there appears to be a general hierarchy along exactly the lines that you would have guessed before seeing the studies. An exercise-based test with a chest strap is better than one with a wrist sensor, which in turn is better than a resting test.
None of them are perfect matches for maximal lab testing, but the chest strap data seems remarkably good, with statistically insignificant overestimates of 0.8 and 1.2 ml/kg/min, on average, in women and men. That’s a little bit more than two per cent.
Second, given the inconsistencies between different studies, we shouldn’t draw any final conclusions, particularly about patterns like how men and women respond differently. Taken as a whole, the studies suggest that the Garmin methodology can give you a VO2 max estimate within about five per cent of your true value.
To be really useful from a practical perspective, what we’d need to understand is how consistent the measures are when repeated multiple times, and how much individual variation there is. Knowing that the watches are off by less than, say, five per cent on average is nice – but does that mean nearly everyone is off by between three and five per cent, or are a few people right on while others are 10 per cent off?
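The difference between average error and individual error is easy to see with made-up numbers. In the sketch below, two hypothetical groups of five runners have the same average error of four per cent but very different individual spread. None of these figures comes from the studies. The "limits of agreement" (mean plus or minus 1.96 standard deviations) are one conventional way of summarising that spread.

```python
from statistics import mean, stdev

# Hypothetical percentage errors (watch vs. lab) for two groups of five runners.
errors_pct = {
    "everyone slightly off": [3.0, 4.0, 5.0, 4.0, 4.0],
    "some right on, some far off": [0.0, 0.0, 10.0, 10.0, 0.0],
}

for label, errors in errors_pct.items():
    avg = mean(errors)
    spread = stdev(errors)  # sample standard deviation
    low, high = avg - 1.96 * spread, avg + 1.96 * spread
    print(f"{label}: mean {avg:.1f}%, SD {spread:.1f}, "
          f"limits of agreement {low:.1f}% to {high:.1f}%")
```

Both groups would be reported as "four per cent off on average," yet only in the first could an individual runner trust the watch to within a few per cent.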
In the end, you pretty much get what you pay for (in terms of money and effort). For most of us, an estimate of VO2 max is interesting for curiosity’s sake, and an error of a few per cent is no big deal. If you want a more accurate fitness marker, head to your local exercise physiology lab… or, better yet, sign up for a race.