Rethinking Placebo Response

Regression to the mean, variability, and the misuse of placebo response

methodology
randomization
simulation
superiority trials
trial design
Author

Fei Zuo

Published

December 14, 2025

Modified

March 5, 2026


In randomized clinical trials, especially in neurology and psychiatry, we often hear phrases like:

  • “The placebo response in this trial was remarkably high.”

  • “Patients improved substantially on placebo.”

  • “We need strategies to reduce the placebo response.”

These statements are often misleading. In this post, I’ll explain why placebo response is frequently misunderstood, why it cannot be interpreted as evidence of a causal improvement, and why misunderstanding it leads to flawed trial design and reasoning. The discussion here focuses on superiority trials.

What is placebo response?

Placebo response refers to the overall change observed in participants assigned to placebo between baseline and follow-up. This term does not refer solely to a psychobiological “placebo effect.” In methodological literature, placebo response encompasses all sources of change occurring in the placebo arm, including natural disease course and statistical artifacts, not just expectation-driven effects.

Placebo response is therefore a composite quantity.

The decomposition problem

In practice, observed change in the placebo arm reflects multiple distinct components:

  1. Natural history (disease fluctuation, spontaneous remission, cyclical patterns)

  2. Regression to the mean (especially when enrollment requires elevated baseline severity)

  3. Measurement error (random variation in outcome assessment)

  4. Study effect (being observed, monitored, structured follow-up, adherence reinforcement, behavioral change)

  5. True placebo effect (expectation-driven psychobiologic response attributable to belief in treatment)

All of these are bundled together under the label “placebo response.”

And here is the problem:

  • These components cannot be separated in a standard parallel-group randomized trial. The design does not allow identification of which portion of within-arm change is attributable to which mechanism.

  • Placebo response is a within-arm change-from-baseline quantity. It is not a causal effect.
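To make the bundling concrete, here is a small R sketch. The component magnitudes below are hypothetical, chosen purely for illustration; the point is that a standard trial observes only their sum:

```r
set.seed(11)
n <- 1000

# Hypothetical component sizes, chosen only for illustration
natural_history <- rnorm(n, -1.5, 1)  # spontaneous improvement / fluctuation
reg_to_mean     <- rnorm(n, -3.0, 2)  # entry required an elevated baseline
study_effect    <- rnorm(n, -1.0, 1)  # monitoring, structured follow-up
true_placebo    <- rnorm(n, -0.5, 1)  # expectation-driven component
meas_error      <- rnorm(n,  0.0, 4)  # noise in outcome assessment

# A standard parallel-group trial observes only the bundled sum:
observed_change <- natural_history + reg_to_mean + study_effect +
                   true_placebo + meas_error
mean(observed_change)   # one number; the components are not identifiable
```

Nothing in the observed change labels which portion came from which mechanism; the decomposition exists only because we simulated it.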

Within-arm change is not a causal estimand

When we calculate placebo response, we are computing:

\[ \text{Placebo response} = \mathbb{E}\left( Y_{\text{post}} - Y_{\text{baseline}} \mid \text{Placebo} \right) \]

That is a descriptive statistic.

But the treatment effect in a randomized trial is:

\[ \text{Treatment effect} = \mathbb{E}\left( Y_{\text{post}} \mid \text{Drug} \right) - \mathbb{E}\left( Y_{\text{post}} \mid \text{Placebo} \right) \]

That is a between-arm contrast.

Randomization protects and justifies the treatment contrast, not the within-arm summaries.
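The distinction can be illustrated with a minimal R simulation (toy numbers assumed solely for illustration) in which time alone shifts everyone's score while the drug adds a true effect of −2:

```r
set.seed(42)
n <- 10000
arm <- rbinom(n, 1, 0.5)                 # 1 = drug, 0 = placebo
baseline <- rnorm(n, mean = 60, sd = 8)

# Time alone lowers everyone's score by 5 (natural history, regression to
# the mean, etc.); the drug adds a true causal effect of -2 on top of that.
post <- baseline - 5 + arm * (-2) + rnorm(n, 0, 4)

# Within-arm "placebo response": a descriptive change-from-baseline summary
placebo_response <- mean(post[arm == 0] - baseline[arm == 0])   # about -5

# Between-arm contrast: the causal estimand randomization protects
treatment_effect <- mean(post[arm == 1]) - mean(post[arm == 0]) # about -2
```

The placebo arm “improves” by roughly 5 points even though placebo does nothing causal in this setup; the between-arm contrast still recovers the true effect of −2.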

So when people say: “Patients improved on placebo.”

The statement sounds causal but it is ambiguous. Improvement always implies a comparison.

The question is:

Improved relative to what?

  • Relative to baseline?

    \[ \mathbb{E}\left( Y_{\text{post}} - Y_{\text{baseline}} \mid \text{Placebo} \right) \]

    That is simply change over time that bundles together everything that happens between baseline and follow-up, including regression to the mean, measurement error, natural fluctuation in disease severity, study participation effects, and any expectation-driven placebo effects. But it remains a before-after comparison within a single arm and does not isolate any specific component. More importantly, without a “no-placebo” counterfactual (i.e., a control arm in which participants receive no placebo intervention), there is no basis for concluding that placebo itself caused the observed effect. Regression to the mean, measurement error, natural fluctuation, and study participation effects can all produce change over time even if no placebo were given.

  • Relative to no study participation?

    \[ \mathbb{E}\left( Y_{\text{post}} \mid \text{Placebo} \right) - \mathbb{E}\left( Y_{\text{post}} \mid \text{No Study} \right) \]

    Enrollment in a clinical trial changes behavior. Patients are monitored more closely, seen at scheduled visits, encouraged to adhere to treatment, and given structured clinical attention. These factors can influence outcomes even without active therapy — a so-called study effect. If “improvement on placebo” refers to improvement relative to what would have happened outside the trial entirely, then it incorporates these participation effects. That is a different counterfactual from baseline change and it is not directly observed in a standard parallel-group randomized trial.

  • Relative to the natural disease trajectory?

    \[ \mathbb{E}\left( Y_{\text{post}} \mid \text{Placebo} \right) - \mathbb{E}\left( Y_{\text{post}} \mid \text{Natural History} \right) \]

    Many conditions fluctuate over time. Some improve spontaneously, some remit partially, and some follow cyclical patterns. If “improvement on placebo” refers to improvement relative to what would have happened under the untreated natural disease trajectory without study participation, then it invokes a different counterfactual altogether. That counterfactual is not identifiable in a standard parallel-group randomized trial.

  • Relative to drug?

    This is the only comparison randomization is designed to answer. The causal estimand in a superiority trial is:

    \[ \text{Treatment effect} = \mathbb{E}\left( Y_{\text{post}} \mid \text{Drug} \right) - \mathbb{E}\left( Y_{\text{post}} \mid \text{Placebo} \right) \]

What is a causal effect?

A causal effect is defined as the contrast between two potential outcomes for the same individual:

\[ Y(1) - Y(0) \]

where:

  • Y(1) is the outcome under treatment
  • Y(0) is the outcome under control

Because both potential outcomes cannot be observed for the same individual, causal effects cannot be directly measured. In a randomized trial, randomization allows unbiased estimation of the average causal effect by comparing outcomes between groups:

\[ E\!\left[Y(1) - Y(0)\right] \]

Apparent improvement within the placebo arm alone does not identify a causal effect, because it does not compare observed outcomes to the unobserved counterfactual.

Without clarifying the counterfactual, “improvement on placebo” collapses multiple distinct estimands into a single phrase, and only one of them (drug vs placebo) is relevant to causal treatment effect estimation.
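Because a simulation (unlike a real trial) can generate both potential outcomes for every individual, we can check directly that the randomized between-arm contrast recovers the average causal effect. A minimal R sketch with assumed toy parameters:

```r
set.seed(7)
n <- 10000

# In a simulation we can generate BOTH potential outcomes per person,
# something never observable in a real trial.
y_control <- rnorm(n, mean = 50, sd = 10)        # Y(0)
y_treated <- y_control - 2 + rnorm(n, 0, 3)      # Y(1); true ATE = -2

true_ate <- mean(y_treated - y_control)          # knowable only in simulation

# A real trial reveals one potential outcome per person, chosen at random:
a <- rbinom(n, 1, 0.5)
y_obs <- ifelse(a == 1, y_treated, y_control)

# Randomization makes the between-arm contrast an unbiased ATE estimate
est_ate <- mean(y_obs[a == 1]) - mean(y_obs[a == 0])
```

Here `true_ate` and `est_ate` agree closely, even though no individual's `y_treated - y_control` difference would ever be observed in practice.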

What randomization actually protects

Randomization ensures that everything happening in the placebo arm is also happening in the treatment arm in expectation.

That includes:

  • Regression to the mean

  • Natural history

  • Measurement noise

  • Study participation effects

  • Expectation effects (if blinding holds)

Because the estimand is the difference between groups, all shared components cancel in expectation in the between-arm contrast.

This is the key insight:

  • Placebo response is shared variability.

  • Shared variability does not bias the treatment effect.

  • Randomization renders placebo response irrelevant to the treatment contrast.

The logical mistake

The reasoning often goes like this:

  1. The placebo arm “improved” a lot.

  2. Therefore, placebo is “competing” with the drug.

  3. Therefore, we must reduce placebo response.

But this logic confuses within-arm change with the between-arm contrast that defines treatment effect.

The treatment effect is the difference between randomized groups, not the magnitude of change within one arm.

To see this clearly, consider a simple simulation.

A simulation demonstration

We simulate a fluctuating disease across many trials where:

  • Patients enroll during a symptomatic period (as is common in practice due to eligibility criteria).

  • Outcomes are noisy.

  • There is regression to the mean.

  • The drug has a modest true effect of −2 units.

  • The placebo has no causal effect, but outcomes naturally fluctuate.

Regression to the mean

Because enrollment is based on a high observed baseline, some patients qualify as a result of random positive noise fluctuation rather than consistently high underlying severity.

When re-measured at follow-up, that noise component does not systematically repeat. As a result, the group will tend to show apparent improvement on average even in the absence of any active treatment.

This is precisely why change from baseline is not informative about causal effect. The observed improvement reflects statistical artifact and natural variability, not evidence that placebo caused benefit.

Code
set.seed(1)

one_trial <- function(n = 200,
                      true_drug_effect = -2,
                      sigma_person = 6,      # between-subject heterogeneity in disease severity
                      sigma_noise = 8,       # measurement noise / day-to-day symptom variability
                      enroll_threshold = 55, # only candidates with an observed baseline >= 55 meet eligibility criteria
                      return_data = FALSE)   # for inspection only when set to TRUE
  {

  # keep generating candidates until we can enroll n
  keep_idx <- integer(0) #indices of candidates whose observed baseline meets eligibility
  mu_pool <- numeric(0) #each candidate’s latent (unobserved) “true” severity
  y0_pool <- numeric(0) #their observed baseline measurement (true severity + noise)

  while (length(keep_idx) < n) {
    # Keep generating candidates until we have at least n eligible participants.
    mu_new <- rnorm(5 * n, mean = 50, sd = sigma_person) # candidates' latent severities
    y0_new <- mu_new + rnorm(5 * n, 0, sigma_noise)      # candidates' observed baseline scores

    mu_pool <- c(mu_pool, mu_new)
    y0_pool <- c(y0_pool, y0_new)
    # Eligibility is based on the measured baseline, not the latent severity.
    keep_idx <- which(y0_pool >= enroll_threshold)
  }

  idx <- sample(keep_idx, n) #Randomly select exactly n enrolled patients
  mu <- mu_pool[idx]
  y0 <- y0_pool[idx]

  A <- rbinom(n, 1, 0.5)  # 1=drug, 0=placebo
  y1 <- mu + rnorm(n, 0, sigma_noise) + A * true_drug_effect

  placebo_change <- mean(y1[A == 0] - y0[A == 0])
  te_post <- mean(y1[A == 1]) - mean(y1[A == 0])

  # -- Precision: Welch-style SE, CI, one-sided test for power----------------
  n1 <- sum(A == 1)
  n0 <- sum(A == 0)
  s1_sq <- var(y1[A == 1])
  s0_sq <- var(y1[A == 0])
  # Standard error for difference in means (Welch formulation).
  # We estimate the variance from the data, so the sampling distribution
  # of the test statistic follows a t distribution rather than a normal.
  se_te <- sqrt(s1_sq / n1 + s0_sq / n0)
  
  # Welch–Satterthwaite df
  df_welch <- (s1_sq/n1 + s0_sq/n0)^2 /
    ((s1_sq/n1)^2/(n1 - 1) + (s0_sq/n0)^2/(n0 - 1))
  
  # Two-sided 95% CI using t critical value
  # If the population variances were known, a normal (z) critical value
  # could be used. Here they are estimated, so t is appropriate.
  t_crit <- qt(0.975, df = df_welch)
  ci_lo <- te_post - t_crit * se_te
  ci_hi <- te_post + t_crit * se_te
  
  # One-sided p-value for benefit (H1: treatment effect < 0)
  # The test statistic follows a t distribution with df_welch.
  t_stat <- te_post / se_te
  p_one_sided <- pt(t_stat, df = df_welch)
  
  # "Power" indicator = conclude benefit at alpha=0.025 (one-sided)
  # equivalent to checking whether the upper bound of a two-sided 95% confidence interval
  # is below zero, since both use the same critical value.
  sig_025 <- as.integer(p_one_sided < 0.025) 
  sig_benefit_025 <- as.integer(ci_hi < 0) #an alternative check

  # --------------------------------------------------------------
  if (return_data) { #for inspection only
    return(list(
      screened = data.frame(
        true_severity = mu_pool,
        baseline_observed = y0_pool,
        eligible = y0_pool >= enroll_threshold
      ),
      enrolled = data.frame(
        true_severity = mu,
        baseline_observed = y0,
        treatment = A,
        followup = y1
      ),
      summaries = c(placebo_change = placebo_change,
                    treatment_effect = te_post,
                     df_welch = df_welch,
                     t_crit = t_crit,
                     se_te = se_te,
                     ci_lo = ci_lo,
                     ci_hi = ci_hi,
                     sig_025 = sig_025,
                     sig_benefit_025 = sig_benefit_025,
                     p_one_sided = p_one_sided)
    ))
  }
  # --------------------------------------------------------------
  c(placebo_change = placebo_change,
    treatment_effect = te_post,
    df_welch = df_welch,
    t_crit = t_crit,
    se_te = se_te,
    ci_lo = ci_lo,
    ci_hi = ci_hi,
    sig_025 = sig_025,
    sig_benefit_025 = sig_benefit_025, 
    p_one_sided = p_one_sided)
}

#Repeat across 2000 simulations
B <- 2000
res <- do.call(rbind, lapply(seq_len(B), function(i) one_trial())) 

#Summarize the simulation results
out <- list(
  mean_placebo_change = round(mean(res[, "placebo_change"]), 2),
  mean_estimated_treatment_effect = round(mean(res[, "treatment_effect"]), 2),
  mean_se = round(mean(res[, "se_te"]), 2),
  mean_ci_width = round(mean(res[, "ci_hi"] - res[, "ci_lo"]), 2),
  approx_power = round(mean(res[, "sig_025"]), 2),
  #approx_power = round(mean(res[, "sig_benefit_025"]), 2), #an alternative check
  n_trials_used = nrow(res)
) 

print(out)
$mean_placebo_change
[1] -7.31

$mean_estimated_treatment_effect
[1] -2.05

$mean_se
[1] 1.35

$mean_ci_width
[1] 5.32

$approx_power
[1] 0.33

$n_trials_used
[1] 2000

Across 2000 simulated trials, you will typically observe that:

  • The placebo arm shows substantial “improvement” from baseline — a large “placebo response.”

  • The estimated treatment effect is very close to −2, the true drug effect.

The placebo arm shows an apparent improvement not because placebo is effective, but because regression to the mean and variability generate change over time. Those same forces operate in the treatment arm.

Randomization ensures that these shared forces cancel in expectation when we compute:

\[ \mathbb{E}\big(Y_{\text{post}} \mid \text{Drug}\big) \;-\; \mathbb{E}\big(Y_{\text{post}} \mid \text{Placebo}\big) \]

A large placebo response (i.e., within-placebo change) is usually a symptom of high outcome variability. It is this variability, not the apparent improvement itself, that increases standard errors, widens confidence intervals, and reduces statistical power.

That is a precision issue, not a validity issue.

A high placebo response does not invalidate the comparison. It reflects that clinical outcomes are dynamic and randomized trials are designed precisely so that such dynamics do not bias the treatment contrast.
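This point can be checked directly. The sketch below (hypothetical parameters) runs the same two-arm trial at a low and a high noise level: the estimate stays centered on the true effect in both cases, and only the standard error changes.

```r
set.seed(123)

# One two-arm trial with a true effect of -2; only the noise level varies
sim_trial <- function(sigma_noise, n = 200, effect = -2) {
  a <- rbinom(n, 1, 0.5)
  y <- 50 + a * effect + rnorm(n, 0, sigma_noise)
  est <- mean(y[a == 1]) - mean(y[a == 0])
  se  <- sqrt(var(y[a == 1]) / sum(a == 1) + var(y[a == 0]) / sum(a == 0))
  c(est = est, se = se)
}

low  <- replicate(2000, sim_trial(sigma_noise = 4))
high <- replicate(2000, sim_trial(sigma_noise = 12))

rowMeans(low)    # estimate near -2, small standard error
rowMeans(high)   # estimate still near -2, larger standard error
```

Variability degrades precision (wider intervals, lower power) without introducing bias, which is exactly the validity-versus-precision distinction above.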

Why strategies to reduce placebo response are misguided

Many placebo-response reduction strategies aim to improve signal detection by reducing observed within-placebo change. Common approaches include:

  • Placebo run-in periods

  • Excluding early “improvers”

  • Enrichment designs

  • Highly restrictive eligibility criteria

Placebo run-in periods

As Stephen Senn has noted in discussions of placebo run-in designs, excluding “placebo responders” during a run-in does not necessarily remove placebo response from a trial; it may simply exclude patients who happened to improve at that particular time.

Risk of overcorrection

Efforts to reduce placebo response can introduce new problems:

  • Selection effects that distort the target population
  • Reduced generalizability
  • Inflated apparent effect sizes
  • Fragile, non-replicable results

If the treatment only appears effective after aggressively filtering out natural variability, that raises deeper questions about clinical relevance.

When strategies aim to “reduce placebo response”, they are often attempting to suppress natural variability and heterogeneity in the outcome measure through design restrictions. Variability itself is not the problem — it is a property of the disease and the measurement process; the problem is inefficient analysis. The principled approach is to account for variability appropriately in the statistical model, not to alter the population or trial context in ways that may compromise generalizability, clinical applicability, and robustness.

Although regulatory guidance recognizes enrichment strategies including those aimed at reducing variability, the notion of a “placebo responder” warrants caution. In fluctuating conditions, short-term improvement during a run-in period often reflects regression to the mean or random temporal variation rather than a stable biological subtype. A patient labelled a “placebo responder” at one time point may not meet that definition if re-measured later. Excluding such individuals therefore selects on transient fluctuation rather than on a reproducible causal mechanism, and may do little to achieve its intended goal.
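A small R sketch (hypothetical parameters) makes the instability of the “placebo responder” label visible: with a stable latent severity and noisy measurements, many untreated patients qualify as “responders” in one interval, yet those same patients improve again in the next interval less often than average, exactly as regression to the mean predicts.

```r
set.seed(99)
n <- 5000
mu <- rnorm(n, 50, 6)              # stable latent severity; no treatment at all
baseline <- mu + rnorm(n, 0, 8)    # three noisy measurements of the same person
runin_1  <- mu + rnorm(n, 0, 8)
runin_2  <- mu + rnorm(n, 0, 8)

# Label a "responder" as anyone improving by 5+ points over an interval
# (lower scores = better)
resp_first  <- (runin_1 - baseline) <= -5   # improved in the first interval
resp_second <- (runin_2 - runin_1) <= -5    # improved in the second interval

mean(resp_first)               # many "responders" despite no intervention
mean(resp_second)              # similar marginal rate in the next interval
mean(resp_second[resp_first])  # first-interval "responders" improve again
                               # LESS often: regression to the mean
```

Excluding `resp_first` patients in a run-in would therefore select on a transient fluctuation, not on any stable characteristic.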

If the goal is better signal detection, focus on efficient modeling

If variability makes treatment effects harder to detect, the principled solution is to improve statistical efficiency.

Use appropriate endpoints and efficient modelling

For clinical outcomes, whether continuous, count, or ordinal, analyze the outcome directly and adjust for baseline and other prognostic baseline covariates using an appropriate regression model for the outcome type.

This approach:

  • Accounts for baseline differences

  • Reduces unexplained variability

  • Improves statistical power

  • Uses all available information when repeated measurements are present

  • Avoids unnecessary data reduction (e.g., dichotomization or percent change from baseline)
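As one illustration of the efficiency gain, the sketch below (hypothetical parameters) compares a change-from-baseline analysis with an ANCOVA that analyzes the follow-up outcome directly while adjusting for baseline:

```r
set.seed(2024)
n <- 400
mu <- rnorm(n, 50, 6)              # latent severity
baseline <- mu + rnorm(n, 0, 8)    # noisy baseline measurement
arm <- rbinom(n, 1, 0.5)
post <- mu + rnorm(n, 0, 8) + arm * (-2)

# Change-score analysis: contrast on (post - baseline)
fit_change <- lm(I(post - baseline) ~ arm)
se_change  <- summary(fit_change)$coefficients["arm", "Std. Error"]

# ANCOVA: analyze the outcome directly, adjusting for baseline
fit_ancova <- lm(post ~ arm + baseline)
se_ancova  <- summary(fit_ancova)$coefficients["arm", "Std. Error"]

c(change_score = se_change, ancova = se_ancova)  # ANCOVA SE is smaller
```

When the baseline–follow-up correlation is below 0.5, as in this setup, the change-score analysis is noisier than analyzing follow-up alone; ANCOVA is at least as precise as either regardless of the correlation.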

Other principled ways to improve precision

  • Increase sample size

  • Improve measurement reliability (better instruments, rater training, repeated measurements)

  • Reduce measurement noise, not biological variability

Takeaway

Precision should be achieved through robust endpoints and efficient modeling that adjusts for prognostic baseline covariates.

“Placebo response” is a descriptive summary of within-arm change. It conflates regression to the mean, measurement error, natural fluctuation, study effects, and expectation effects into a single number. These components are not separately identifiable in a parallel-group randomized trial.

Apparent within-arm improvement is not evidence of a causal effect. Change from baseline, by itself, does not justify an improvement claim without an explicit comparison arm. In a randomized superiority trial, the only causal estimand the design justifies is the treatment contrast.

References

Hernán, M. A., & Robins, J. M. (2020). Causal inference: What if. Chapman & Hall/CRC.

Holland, P. W. (1986). Statistics and causal inference. Journal of the American Statistical Association, 81(396), 945–960. https://doi.org/10.1080/01621459.1986.10478354

International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). (2019). ICH E9(R1) addendum on estimands and sensitivity analysis in clinical trials.

Murray, E. J. (2021). Editorial: Demystifying the placebo effect. American Journal of Epidemiology, 190(1), 2–9. https://doi.org/10.1093/aje/kwaa162

Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5), 688–701. https://doi.org/10.1037/h0037350

Senn, S. (2021). Statistical issues in drug development (3rd ed.). Wiley.

U.S. Food and Drug Administration. (2019). Enrichment strategies for clinical trials to support determination of effectiveness of human drugs and biological products: Guidance for industry.