Do Some People Not Respond to Exercise?

It depends on what you measure, a new study finds.

Over the last few years, I’ve written several times about the debate over exercise ‘non-response’. Do some people, no matter how hard they try, simply not get fitter?

That has been the conventional view since landmark experiments in the 1990s on the genetic basis of exercise response. But a 2015 study from Queen’s University, in Canada, suggested that pretty much everyone gets fitter if they get a high enough volume and intensity of exercise. And another study earlier this year bolstered that conclusion.

The full picture may be a little more complex, though. I was at Queen’s a few weeks ago for a talk, and had a chance to chat with Louise de Lannoy, one of the authors of the 2015 study. She updated me on her latest work, and filled me in on the ongoing debate about individual response and the challenges in accurately measuring and analysing it.

De Lannoy’s newest study was published in PLoS ONE recently, and it provides further analysis from the experiment described in the 2015 study, in which subjects did 24 weeks of exercise at various combinations of low or high volume and low or high intensity.

In the original study, the outcome was aerobic fitness (what’s often called VO2 max), and in the group doing high volume (40 minutes per workout, five times a week) and high intensity (75 per cent of VO2 max), everyone improved by a significant margin.

The new study, in contrast, looks at insulin and glucose response, which are risk factors that typically precede Type 2 diabetes. It’s well established that moderate exercise improves these parameters on average – but does everyone improve?

De Lannoy’s results are sobering. The average results did show that both insulin and glucose response improved in the high-volume, high-intensity exercise group. But looking at the individual results, only about 20 per cent of the subjects showed significant improvements in these parameters, regardless of exercise group.

Why is this? Part of the explanation is that the researchers set a relatively high bar for what was considered a significant improvement. Based on the before-and-after measurements of the control group (which did no exercise), they estimated the typical day-to-day variation in these measures; a significant improvement, they argued, would be one that was more than twice the size of this typical variation.

If they used a lower threshold for improvement, like ‘anything above zero’, then there would be more responders – between 50 and 90 per cent, depending on the exercise group. But with this approach, you would also have to conclude that 50 per cent of the control group had made a ‘significant’ improvement, which is illogical.
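To make the trade-off between the two thresholds concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration, not the study's data or its exact calculation: the group size, the noise level, the true improvement of 1.15 noise units, and the use of the control group's standard deviation as a stand-in for ‘typical variation’.

```python
import random
import statistics

rng = random.Random(42)
N = 1000                 # hypothetical group size
TRUE_IMPROVEMENT = 1.15  # assumed, in units of the measurement-noise SD

# Positive numbers mean "improved". Controls have no true change, only noise.
control = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Every exerciser truly improves by the same amount, plus the same kind of noise.
exercise = [TRUE_IMPROVEMENT + rng.gauss(0.0, 1.0) for _ in range(N)]


def responder_rate(changes, threshold):
    """Fraction of subjects whose measured change exceeds the threshold."""
    return sum(c > threshold for c in changes) / len(changes)


# Stand-in for "typical variation": the spread of the control group's changes.
typical_variation = statistics.stdev(control)
strict = 2 * typical_variation

for name, group in (("control", control), ("exercise", exercise)):
    print(
        f"{name:8s} above zero: {responder_rate(group, 0.0):.0%}   "
        f"above 2x typical variation: {responder_rate(group, strict):.0%}"
    )
```

With the lenient threshold, roughly half of the do-nothing controls count as ‘responders’. With the strict one, almost no controls do, but most of the exercisers are missed too, even though in this toy set-up every one of them genuinely improved.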

In a sense, you’re stuck with an inevitable conflict between minimising false positives and false negatives, with no perfect answer. If you had a perfect measurement system, you’d probably conclude that more than 20 per cent of people improved their glucose and insulin response, but considerably fewer than 100 per cent.

When I asked de Lannoy about this, she said that they assume more than 20 per cent of people improve, but that they can only be confident about the improvement in 20 per cent of them. To correctly identify more responders, they would need to take multiple repeated measures – something that’s not cheap or easy in either a clinical or a research setting.

So where does this leave us on the overall question of non-response? I’m still inclined to believe that aerobic fitness will almost always respond to a sufficient dose of exercise – after all, it’s the primary outcome that we expect to be altered by exercise.

But it appears that some of the secondary benefits of exercise – in this case, glucose and insulin response – may be less universal. I wouldn’t be surprised if it turns out that the same thing applies to other parameters like blood pressure and cholesterol too.

There’s also an interesting postscript worth mentioning. In recent years, the topic of ‘individual response’ has become hot, and many studies now graph the individual results of their subjects instead of just average results. This, in theory, allows readers to get a sense of how individuals did (or didn’t) respond to an intervention, instead of just showing the average.

As de Lannoy’s results illustrate, this is an important consideration. But the way many studies report this data is deeply flawed, as a 2015 review in Experimental Physiology on the topic showed.

The problem is that there is random variation in every measurement (either inherent in the measurement itself, or in biological fluctuations of the quantity being measured). If I take two consecutive blood-pressure measurements of 1000 people, this random variation will mean that some people will have a higher second reading and others will have a lower second reading – but this doesn’t mean that we can divide them into ‘responders’ and ‘non-responders’!

The review paper illustrates this with a simulation in which everyone’s ‘true’ blood pressure decreases by 5 mmHg, but the measurements are subject to the usual random fluctuations. The results end up looking like a typical pattern of individual response (and non-response), even though everyone was really a responder:

Random variation in response.
Image courtesy of Experimental Physiology
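A simulation along these lines is easy to try for yourself. The sketch below is not the review paper's code: the 1000 subjects, the population average of 120 mmHg, and the 5 mmHg of measurement noise are all assumed values chosen for illustration. Only the identical 5 mmHg true decrease comes from the description above.

```python
import random


def simulate_changes(n=1000, true_change=-5.0, noise_sd=5.0, seed=1):
    """Measured change for n people who all truly change by the same amount."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        true_bp = rng.gauss(120.0, 10.0)  # assumed spread of true pressures
        first = true_bp + rng.gauss(0.0, noise_sd)                 # noisy "before"
        second = true_bp + true_change + rng.gauss(0.0, noise_sd)  # noisy "after"
        changes.append(second - first)
    return changes


changes = simulate_changes()
mean_change = sum(changes) / len(changes)
# Anyone whose measured pressure did not drop looks like a "non-responder".
apparent_non_responders = sum(c >= 0 for c in changes) / len(changes)

print(f"mean measured change: {mean_change:.1f} mmHg")
print(f"apparent non-responders: {apparent_non_responders:.0%}")
```

Under these assumptions the average change comes out close to the true 5 mmHg drop, yet a sizeable minority of subjects show no drop at all, purely because of measurement noise.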

In fact, the problem is even more insidious than you might think, thanks to “regression to the mean”. Subjects with a randomly high initial value will be more likely to have a lower second value, and vice-versa. This can give the biologically plausible illusion that, for example, the least fit subjects get the biggest response from an intervention, even if everyone’s response is actually identical.
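Regression to the mean can be sketched in a few lines of Python as well. As before, the numbers are assumptions for illustration (1000 subjects, 5 mmHg of measurement noise, an identical true drop of 5 mmHg in blood pressure for everyone), and the split into two halves by first reading is just one simple way to show the effect.

```python
import random

rng = random.Random(7)
NOISE_SD = 5.0       # assumed measurement noise, in mmHg
TRUE_CHANGE = -5.0   # identical true change for every subject

pairs = []  # (first reading, measured change)
for _ in range(1000):
    true_bp = rng.gauss(120.0, 10.0)  # assumed spread of true pressures
    first = true_bp + rng.gauss(0.0, NOISE_SD)
    second = true_bp + TRUE_CHANGE + rng.gauss(0.0, NOISE_SD)
    pairs.append((first, second - first))

# Split subjects into halves by their *measured* starting value.
pairs.sort(key=lambda p: p[0])
half = len(pairs) // 2
low_start = [change for _, change in pairs[:half]]
high_start = [change for _, change in pairs[half:]]

mean_low = sum(low_start) / len(low_start)
mean_high = sum(high_start) / len(high_start)
print(f"lower first reading:  mean change {mean_low:.1f} mmHg")
print(f"higher first reading: mean change {mean_high:.1f} mmHg")
```

The half with the higher first readings appears to respond more strongly than the half with the lower ones, even though the true change was the same for all of them. Part of a high first reading is simply bad luck on the day, and that part tends not to repeat.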

The upshot is that researchers need to be careful (and readers need to be skeptical) about the statistics of individual variation. (In technical terms, the review paper argues that if the standard deviations of your initial and final measurements are the same, you have no justification for analysing or discussing individual responses.)
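The standard-deviation check can be illustrated with the same kind of toy simulation. This is a simplified reading of the criterion as described above, not the review paper's own procedure. It assumes that each person's true response is unrelated to their starting value, and every number in it is invented for illustration.

```python
import random
import statistics


def sd_before_and_after(response_sd, n=5000, seed=3):
    """Return (SD of first readings, SD of second readings).

    response_sd is how much the true change differs between people;
    0.0 means everyone responds identically.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_bp = rng.gauss(120.0, 10.0)            # assumed true pressures
        true_change = rng.gauss(-5.0, response_sd)  # assumed average drop of 5 mmHg
        first.append(true_bp + rng.gauss(0.0, 5.0))                 # noisy "before"
        second.append(true_bp + true_change + rng.gauss(0.0, 5.0))  # noisy "after"
    return statistics.stdev(first), statistics.stdev(second)


identical = sd_before_and_after(response_sd=0.0)
varied = sd_before_and_after(response_sd=8.0)

print(f"identical response: SD before {identical[0]:.1f}, after {identical[1]:.1f}")
print(f"varied response:    SD before {varied[0]:.1f}, after {varied[1]:.1f}")
```

When everyone responds identically, the spread of the second readings is essentially the same as the first. Only when the true response genuinely differs between people does the second set of readings fan out, which is the signal the criterion looks for.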

In de Lannoy’s data, the results did meet that mathematical threshold to suggest there was really individual variation in response – but not by much. As this debate continues, I think (and hope) we’ll start to see a tighter focus on when we’re seeing responders and non-responders versus when we’re just seeing a bunch of random noise.

