Regression to the mean, variability, and the misuse of placebo response
methodology
randomization
simulation
superiority trials
trial design
Author
Fei Zuo
Published
December 14, 2025
Modified
March 5, 2026
In randomized clinical trials, especially in neurology and psychiatry, we often hear phrases like:
“The placebo response was high.”
“This indication has a large placebo effect.”
“The drug failed because of placebo response.”
These statements are often misleading. In this post, I’ll explain why placebo response is frequently misunderstood, why it cannot be interpreted as evidence of a causal improvement, and why misunderstanding it leads to flawed trial design and reasoning. The discussion here focuses on superiority trials.
What is placebo response?
Placebo response refers to the overall change observed in participants assigned to placebo between baseline and follow-up. This term does not refer solely to a psychobiological “placebo effect.” In methodological literature, placebo response encompasses all sources of change occurring in the placebo arm, including natural disease course and statistical artifacts, not just expectation-driven effects.
Placebo response is therefore a composite quantity.
The decomposition problem
In practice, observed change in the placebo arm reflects multiple distinct components:
Natural history (disease fluctuation, spontaneous remission, cyclical patterns)
Regression to the mean (especially when enrollment requires elevated baseline severity)
Measurement error (random variation in outcome assessment)
True placebo effect (expectation-driven psychobiologic response attributable to belief in treatment)
All of these are bundled together under the label “placebo response.”
And here is the problem:
These components cannot be separated in a standard parallel-group randomized trial. The design does not allow identification of which portion of within-arm change is attributable to which mechanism.
Placebo response is a within-arm change-from-baseline quantity. It is not a causal effect.
Within-arm change is not a causal estimand
When we calculate placebo response, we are computing:
\[
E\!\left[Y_{\text{follow-up}} - Y_{\text{baseline}} \mid \text{placebo arm}\right]
\]
That is simply change over time that bundles together everything that happens between baseline and follow-up, including regression to the mean, measurement error, natural fluctuation in disease severity, study participation effects, and any expectation-driven placebo effects. But it remains a before-after comparison within a single arm and does not isolate any specific component. More importantly, without a “no-placebo” counterfactual (i.e., a control arm in which participants receive no placebo intervention), there is no basis for concluding that placebo itself caused the observed change. Regression to the mean, measurement error, natural fluctuation, and study participation effects can all produce change over time even if no placebo were given.
Relative to not being in the trial?
Enrollment in a clinical trial changes behavior. Patients are monitored more closely, seen at scheduled visits, encouraged to adhere to treatment, and given structured clinical attention. These factors can influence outcomes even without active therapy — a so-called study effect. If “improvement on placebo” refers to improvement relative to what would have happened outside the trial entirely, then it incorporates these participation effects. That is a different counterfactual from baseline change, and it is not directly observed in a standard parallel-group randomized trial.
Relative to the natural disease course?
Many conditions fluctuate over time. Some improve spontaneously, some remit partially, and some follow cyclical patterns. If “improvement on placebo” refers to improvement relative to what would have happened under the untreated natural disease trajectory without study participation, then it invokes a different counterfactual altogether. That counterfactual is not identifiable in a standard parallel-group randomized trial.
Relative to drug?
This is the only comparison randomization is designed to answer. The causal estimand in a superiority trial is the expected difference in outcomes between the drug and placebo arms:
\[
E\!\left[Y_{\text{drug}}\right] - E\!\left[Y_{\text{placebo}}\right]
\]
A causal effect is defined as the contrast between two potential outcomes for the same individual:
\[
Y(1) - Y(0)
\]
where:
Y(1) is the outcome under treatment
Y(0) is the outcome under control
Because both potential outcomes cannot be observed for the same individual, causal effects cannot be directly measured. In a randomized trial, randomization allows unbiased estimation of the average causal effect by comparing outcomes between groups:
\[
E\!\left[Y(1) - Y(0)\right]
\]
Apparent improvement within the placebo arm alone does not identify a causal effect, because it does not compare observed outcomes to the unobserved counterfactual.
Without clarifying the counterfactual, “improvement on placebo” collapses multiple distinct estimands into a single phrase, and only one of them (drug vs placebo) is relevant to causal treatment effect estimation.
What randomization actually protects
Randomization ensures that everything happening in the placebo arm is also happening in the treatment arm in expectation.
That includes:
Regression to the mean
Natural history
Measurement noise
Study participation effects
Expectation effects (if blinding holds)
Because the estimand is the difference between groups, all shared components cancel in expectation when the between-arm contrast is taken.
This is the key insight:
Placebo response is shared variability.
Shared variability does not bias the treatment effect.
Randomization renders placebo response irrelevant to the treatment contrast.
The logical mistake
The reasoning often goes like this:
The placebo arm “improved” a lot.
Therefore, placebo is “competing” with the drug.
Therefore, we must reduce placebo response.
But this logic confuses within-arm change with the between-arm contrast that defines treatment effect.
The treatment effect is the difference between randomized groups, not the magnitude of change within one arm.
To see this clearly, consider a simple simulation.
A simulation demonstration
We simulate a fluctuating disease across many trials where:
Patients enroll during a symptomatic period (as is common in practice due to eligibility criteria).
Outcomes are noisy.
There is regression to the mean.
The drug has a modest true effect of −2 units.
The placebo has no causal effect, but outcomes naturally fluctuate.
Regression to the mean
Because enrollment is based on a high observed baseline, some patients qualify as a result of random positive noise fluctuation rather than consistently high underlying severity.
When re-measured at follow-up, that noise component does not systematically repeat. As a result, the group will tend to show apparent improvement on average even in the absence of any active treatment.
This is precisely why change from baseline is not informative about causal effect. The observed improvement reflects statistical artifact and natural variability, not evidence that placebo caused benefit.
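A minimal sketch of such a simulation in Python. The true drug effect of −2 units and the 2000 trials come from the setup above; all other numbers (a baseline mean of 50, an eligibility cutoff of 55, a measurement-noise SD of 8) are illustrative assumptions, not values from this post:

```python
import numpy as np

rng = np.random.default_rng(42)

def run_trial(n_per_arm=100, true_effect=-2.0,
              mu=50.0, sd_between=10.0, sd_noise=8.0, cutoff=55.0):
    """One parallel-group trial enrolling on an elevated observed baseline."""
    # Draw a large candidate pool; observed baseline = true severity + noise.
    pool = 20 * n_per_arm
    true_severity = rng.normal(mu, sd_between, pool)
    baseline = true_severity + rng.normal(0, sd_noise, pool)

    # Eligibility depends on the *observed* baseline, so some patients
    # qualify because of transient positive noise (regression to the mean).
    keep = baseline > cutoff
    sev = true_severity[keep][: 2 * n_per_arm]
    base = baseline[keep][: 2 * n_per_arm]

    # Randomize 1:1; placebo has no causal effect on underlying severity.
    arm = rng.permutation(np.repeat([0, 1], n_per_arm))
    followup = sev + true_effect * arm + rng.normal(0, sd_noise, 2 * n_per_arm)

    change = followup - base
    placebo_change = change[arm == 0].mean()
    effect_estimate = change[arm == 1].mean() - change[arm == 0].mean()
    return placebo_change, effect_estimate

res = np.array([run_trial() for _ in range(2000)])
print(f"mean placebo-arm change:    {res[:, 0].mean():.2f}")  # substantial "improvement"
print(f"mean estimated drug effect: {res[:, 1].mean():.2f}")  # close to -2
```

Under these assumptions the placebo arm shows a sizeable average improvement despite placebo having no causal effect, while the between-arm contrast recovers the true −2.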
Across 2000 simulated trials, you will typically observe that:
The placebo arm shows substantial “improvement” from baseline — a large “placebo response.”
The estimated treatment effect is very close to −2, the true drug effect.
The placebo arm shows an apparent improvement not because placebo is effective, but because regression to the mean and variability generate change over time. Those same forces operate in the treatment arm.
Randomization ensures that these shared forces cancel in expectation when we compute the between-arm contrast:
\[
\left(\bar{Y}_{\text{follow-up}} - \bar{Y}_{\text{baseline}}\right)_{\text{drug}} - \left(\bar{Y}_{\text{follow-up}} - \bar{Y}_{\text{baseline}}\right)_{\text{placebo}}
\]
A large placebo response (i.e., within-placebo change) is usually a symptom of high outcome variability. It is this variability, not the apparent improvement itself, that increases standard errors, widens confidence intervals, and reduces statistical power.
That is a precision issue, not a validity issue.
A high placebo response does not invalidate the comparison. It reflects that clinical outcomes are dynamic and randomized trials are designed precisely so that such dynamics do not bias the treatment contrast.
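The precision point can be sketched directly (the two noise levels and sample size below are arbitrary choices for illustration): tripling the outcome SD leaves the effect estimate unbiased but inflates its standard error.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate(sd_outcome, n=100, true_effect=-2.0, reps=5000):
    """Mean and empirical SE of the effect estimate across simulated trials."""
    placebo = rng.normal(0.0, sd_outcome, (reps, n))
    drug = rng.normal(true_effect, sd_outcome, (reps, n))
    estimates = drug.mean(axis=1) - placebo.mean(axis=1)
    return estimates.mean(), estimates.std()

for sd in (5.0, 15.0):
    mean_est, empirical_se = simulate(sd)
    # Both runs center on the true effect; only the spread differs.
    print(f"outcome SD {sd:4.0f}: mean estimate {mean_est:+.2f}, empirical SE {empirical_se:.2f}")
```

High variability widens the sampling distribution of the estimate (a power problem) without shifting its center (no bias).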
Why strategies to reduce placebo response are misguided
Many placebo-response reduction strategies aim to improve signal detection by reducing observed within-placebo change.
Placebo run-in periods
As Stephen Senn has noted in discussions of placebo run-in designs, excluding “placebo responders” during a run-in does not necessarily remove placebo response from a trial; it may simply exclude patients who happened to improve at that particular time.
Risk of overcorrection
Efforts to reduce placebo response can introduce new problems:
Selection effects that distort the target population
Reduced generalizability
Inflated apparent effect sizes
Fragile, non-replicable results
If the treatment only appears effective after aggressively filtering out natural variability, that raises deeper questions about clinical relevance.
Common approaches include:
Placebo run-in periods
Excluding early “improvers”
Enrichment designs
Highly restrictive eligibility criteria
When strategies aim to “reduce placebo response”, they are often attempting to suppress natural variability and heterogeneity in the outcome measure through design restrictions. Variability itself is not the problem — it is a property of the disease and the measurement process; the problem is inefficient analysis. The principled approach is to account for variability appropriately in the statistical model, not to alter the population or trial context in ways that may compromise generalizability, clinical applicability, and robustness.
Although regulatory guidance recognizes enrichment strategies including those aimed at reducing variability, the notion of a “placebo responder” warrants caution. In fluctuating conditions, short-term improvement during a run-in period often reflects regression to the mean or random temporal variation rather than a stable biological subtype. A patient labelled a “placebo responder” at one time point may not meet that definition if re-measured later. Excluding such individuals therefore selects on transient fluctuation rather than on a reproducible causal mechanism, and may do little to achieve its intended goal.
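The instability of the “placebo responder” label is easy to demonstrate in a hedged sketch (the severity scale, noise SD, and responder threshold below are illustrative assumptions): measure patients with stable underlying severity three times, label those who “improve” during a run-in as responders, and watch their later change.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
true_severity = rng.normal(50, 10, n)
# Three noisy measurements of a completely stable underlying severity.
measures = true_severity[:, None] + rng.normal(0, 8, (n, 3))

run_in_change = measures[:, 1] - measures[:, 0]  # change during the run-in
later_change = measures[:, 2] - measures[:, 1]   # change after the run-in
responder = run_in_change < -5                   # labelled "placebo responder"

# Regression to the mean: apparent run-in "responders" tend to rebound,
# because their run-in improvement was largely transient noise.
print(f"run-in change among responders: {run_in_change[responder].mean():.1f}")
print(f"later change among responders:  {later_change[responder].mean():.1f}")
```

Patients labelled responders during the run-in worsen on average afterwards, so excluding them selects on noise, not on a reproducible mechanism.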
If the goal is better signal detection, focus on efficient modeling
If variability makes treatment effects harder to detect, the principled solution is to improve statistical efficiency.
Use appropriate endpoints and efficient modelling
For clinical outcomes, whether continuous, count, or ordinal, analyze the outcome directly and adjust for baseline and other prognostic baseline covariates using an appropriate regression model for the outcome type.
This approach:
Accounts for baseline differences
Reduces unexplained variability
Improves statistical power
Uses all available information when repeated measurements are present
Avoids unnecessary data reduction (e.g., dichotomization or percent change from baseline)
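As a hedged sketch of the gain from baseline adjustment (the baseline-to-follow-up correlation of 0.6 and the other parameters are assumed for illustration, not taken from this post), compare a change-score analysis with a regression that adjusts for baseline:

```python
import numpy as np

rng = np.random.default_rng(7)
n, rho, sd, true_effect = 2000, 0.6, 10.0, -2.0

baseline = rng.normal(50, sd, n)
arm = rng.permutation(np.repeat([0.0, 1.0], n // 2))
# Follow-up correlated with baseline; the drug shifts it by the true effect.
followup = (50 + rho * (baseline - 50) + true_effect * arm
            + rng.normal(0, sd * np.sqrt(1 - rho**2), n))

def ols(X, y):
    """Least-squares fit returning coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

ones = np.ones(n)
# Change-score analysis: regress change from baseline on treatment only.
b_cs, se_cs = ols(np.column_stack([ones, arm]), followup - baseline)
# Baseline-adjusted (ANCOVA-style) analysis: regress follow-up on
# treatment plus baseline.
b_an, se_an = ols(np.column_stack([ones, arm, baseline]), followup)

print(f"change score:      effect {b_cs[1]:.2f}, SE {se_cs[1]:.3f}")
print(f"baseline-adjusted: effect {b_an[1]:.2f}, SE {se_an[1]:.3f}")
```

Both analyses are unbiased for the treatment contrast, but under these assumptions the baseline-adjusted model yields a smaller standard error, which is exactly the efficiency gain the text describes.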
Reduce measurement noise, not biological variability
Precision should be achieved through robust endpoints and efficient modeling that adjusts for prognostic baseline covariates, not by restricting the trial population.
Takeaway
“Placebo response” is a descriptive summary of within-arm change. It conflates regression to the mean, measurement error, natural fluctuation, study effects, and expectation effects into a single number. These components are not separately identifiable in a parallel-group randomized trial.
Apparent within-arm improvement is not evidence of a causal effect. Change from baseline, by itself, does not justify an improvement claim without an explicit comparison arm. In a randomized superiority trial, the only causal estimand the design justifies is the treatment contrast.
References
Hernán, M. A., & Robins, J. M. (2020). Causal inference: What if. Chapman & Hall/CRC.
International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). (2019). ICH E9(R1) addendum on estimands and sensitivity analysis in clinical trials.
Murray, E. J. (2021). Editorial: Demystifying the placebo effect. American Journal of Epidemiology, 190(1), 2–9. https://doi.org/10.1093/aje/kwaa162
Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350
Senn, S. (2021). Statistical issues in drug development (3rd ed.). Wiley.
U.S. Food and Drug Administration. (2019). Enrichment strategies for clinical trials to support determination of effectiveness of human drugs and biological products: Guidance for industry.