A Conceptual Workflow for Designing a Seamless Bayesian Basket Trial
How to think through a two-stage adaptive basket trial design
This post outlines general methodological concepts for designing Bayesian basket trials. Any examples are illustrative and do not describe a specific study or disclose proprietary details. Similar approaches may be applied in practice, but the content here is presented at a conceptual level only.
Modern clinical trials are no longer viewed as fixed, one-shot experiments. Increasingly, they are designed to adapt and learn as data accrue, allowing for early stopping when signals are weak and continuation when evidence is promising.
This shift is especially important in rare disease settings, where patient populations are limited and efficient use of data is critical. In these contexts, more flexible designs such as seamless Bayesian basket trials become particularly attractive. These designs allow multiple subgroups (or “baskets”) to be evaluated in parallel, while adaptively prioritizing those showing evidence of benefit.
This post outlines one way to think about designing a seamless Bayesian basket trial, focusing on the key ideas and decision steps. The goal is to build intuition for how the design operates, before diving into implementation or simulation. There are many valid ways to implement such designs; the framework below is intended to illustrate a conceptual approach rather than a prescriptive template.
In this setting, trial decisions can be framed as probabilistic statements about clinically meaningful effects. Rather than relying on point estimates alone, decisions may be guided by the probability that the treatment achieves a clinically meaningful effect, as informed by the full posterior distribution.
The goal
We want to understand how this trial design makes decisions as data accrue:
which baskets continue
which stop early
and which are ultimately declared successful
The workflow below walks through these decision steps conceptually.
Trial data structure
To illustrate the design, we consider a setting where each patient contributes:
a baseline seizure count over a baseline period
a follow-up seizure count over a given exposure period
Seizure counts are modelled using a negative binomial distribution, with treatment effects expressed as rate ratios (RR) and baseline seizure frequency included as a prognostic covariate.
In practice, a small number of additional prognostic baseline covariates may also be included and pre-specified in the statistical analysis plan. Incorporating such covariates can improve the precision of the treatment effect estimate and overall efficiency of the design. However, care is needed not to include too many covariates, particularly in small sample settings, as this can introduce instability and unnecessary model complexity.
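As a concrete illustration, the data structure described above can be sketched in Python. Everything here is hypothetical: the parameter values, the simple baseline-dependent mean, and the dispersion are assumptions made for illustration, not values from any actual study.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_basket(n_patients, true_rr, dispersion=1.5, mean_baseline=8.0):
    """Simulate seizure counts for one basket under a negative binomial model.

    All parameter values are illustrative assumptions only.
    """
    # Baseline seizure counts over the baseline period
    baseline = rng.poisson(mean_baseline, size=n_patients)

    # Randomize 1:1 to treatment vs control
    treated = rng.integers(0, 2, size=n_patients)

    # Follow-up mean depends on the baseline count (prognostic covariate)
    # and on treatment through the rate ratio (RR)
    mu = (baseline + 0.5) * np.where(treated == 1, true_rr, 1.0)

    # Negative binomial with dispersion k: variance = mu + mu^2 / k
    p = dispersion / (dispersion + mu)
    followup = rng.negative_binomial(dispersion, p)
    return baseline, treated, followup

baseline, treated, followup = simulate_basket(n_patients=30, true_rr=0.8)
```

In a real design, the follow-up model would also adjust for exposure time and any pre-specified covariates; the sketch keeps only the pieces needed to see the structure.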
Stage 1 analysis (signal-seeking)
We analyze the first stage using a Bayesian hierarchical model, which allows information to be shared across baskets while accounting for between-basket variability.
The Stage 1 Bayesian model requires prior distributions for parameters such as the intercept, treatment effect, prognostic covariate effects, between-basket variability, and overdispersion. These details are set aside here to focus on the adaptive workflow and will be revisited when discussing implementation.
In general, Bayesian decision rules take the form
\[ \Pr(\theta > \theta_0 \mid \text{data}) \ge \gamma \]
- \(\theta\): treatment effect
- \(\theta_0\): clinically meaningful threshold
- \(\gamma\): decision threshold
In our setting, the treatment effect is a rate ratio (RR), where values less than one indicate treatment benefit relative to control.
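Given posterior draws of the treatment effect (e.g., from MCMC), the decision rule reduces to a tail-probability calculation. A minimal numerical sketch, with simulated normal draws standing in for actual output from a fitted hierarchical model; the location and scale below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for MCMC draws of log(RR) from the Stage 1 model (illustrative only)
log_rr_draws = rng.normal(loc=np.log(0.75), scale=0.25, size=10_000)

theta0 = 0.8    # clinically meaningful threshold on the rate-ratio scale
gamma = 0.3     # decision threshold on the posterior probability

# Pr(RR <= 0.8 | data), estimated as the fraction of draws below the threshold
prob_benefit = np.mean(np.exp(log_rr_draws) <= theta0)
decision = "continue" if prob_benefit >= gamma else "stop"
```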
Define “benefit”
Suppose a clinically meaningful benefit is defined as a ≥ 20% reduction in seizure frequency for treatment compared to control. This corresponds to RR ≤ 0.8.
Estimate posterior probability of benefit
After Stage 1, we estimate the posterior probability of benefit for each basket.
For example, Basket A might have:
posterior probability of benefit = 0.65
This means there is a 65% probability that the true treatment effect corresponds to at least a 20% reduction in seizure frequency relative to control (i.e., RR ≤ 0.8).
Apply continuation rule
We then compare this probability to a pre-specified continuation threshold (e.g., 0.3 or 30% probability).
If the posterior probability meets the threshold → the basket continues to Stage 2
Otherwise → the basket stops early
The continuation rule reflects two components:
a clinically meaningful target (RR ≤ 0.8)
a required level of statistical evidence (posterior probability ≥ 0.3)
These components are not evaluated separately. Instead, the entire posterior distribution of the treatment effect is assessed, and the decision is based on the resulting posterior probability that the target is achieved.
Decision for the example
Since 0.65 ≥ 0.3, Basket A continues to Stage 2.
Adaptive continuation
Each basket receives its own continuation decision at the Stage 1 analysis, even though the hierarchical model shares information across baskets.
Some baskets continue to Stage 2, while others stop early based on the observed evidence.
The continuation decision combines two key elements:
a clinically meaningful target (e.g., a ≥ 20% reduction in seizure frequency, corresponding to RR ≤ 0.8)
a decision threshold on the posterior probability (e.g., posterior probability ≥ 0.3)
This is what makes the design efficient:
resources are focused on baskets where there is meaningful evidence of benefit
baskets with little evidence of achieving a clinically meaningful effect are stopped early
In this way, the trial adapts by prioritizing signals that are both clinically relevant and statistically supported by the data.
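The basket-by-basket continuation logic can be sketched in a few lines. The posterior probabilities below are invented for illustration:

```python
# Hypothetical Stage 1 posterior probabilities of benefit, one per basket
post_prob_benefit = {"A": 0.65, "B": 0.12, "C": 0.41, "D": 0.28}

continuation_threshold = 0.3  # pre-specified in the design

continuing = [b for b, p in post_prob_benefit.items() if p >= continuation_threshold]
stopped = [b for b, p in post_prob_benefit.items() if p < continuation_threshold]
```

Under these hypothetical values, Baskets A and C would continue to Stage 2 while Baskets B and D would stop early.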
Stage 1 → Stage 2 flow (basket-specific)
Stage 2 data collection in continuing baskets
For baskets that continue:
new patients are enrolled
new data are observed
For baskets that stopped early:
no further data are collected
Stage 2 analysis (confirmatory)
We analyze Stage 2 using a similar Bayesian hierarchical model, focusing only on baskets that continued from Stage 1.
At this stage, the goal shifts from signal detection to confirming evidence of a clinically meaningful treatment effect.
Unlike a traditional two-stage design, Stage 2 borrows information from Stage 1 by incorporating the Stage 1 results into the prior distribution for the treatment effect.
Incorporate Stage 1 evidence (borrowing with discounting)
The posterior from Stage 1 is used to construct an informative prior for Stage 2.
To avoid over-reliance on early data, this information may be discounted, meaning:
strong Stage 1 signals still contribute
but uncertainty is intentionally inflated to avoid overconfidence
This allows Stage 2 to:
learn from Stage 1,
while still being driven by new data
The choice of whether and how much to discount Stage 1 information requires additional consideration. In practice, this is typically informed by simulation studies to evaluate operating characteristics under different assumptions, as well as by the scientific plausibility of borrowing and the degree of heterogeneity across baskets. Regulatory guidance (e.g., FDA) emphasizes the need to justify prior assumptions and assess robustness through sensitivity analyses, rather than prescribing specific levels of discounting. These details are set aside here and will be explored in a subsequent post.
Define “success”
We retain the same definition of clinically meaningful benefit for a basket:
- a ≥20% reduction in seizure frequency, corresponding to RR ≤ 0.8
Estimate posterior probability of success
Using:
Stage 2 data, and
the Stage 1-informed (discounted) prior
we estimate the posterior probability of success, defined as the probability that RR ≤ 0.8 for each continuing basket.
For example, Basket A might have:
posterior probability of success = 0.95
This means that, at the Stage 2 analysis, there is a 95% probability that the true treatment effect corresponds to at least a 20% reduction in seizure frequency relative to control (i.e., RR ≤ 0.8).
In this example, success is assessed separately for each basket. A given basket is declared successful if its posterior probability exceeds the pre-specified success threshold.
More generally, how “success” is defined—whether at the basket level (each basket evaluated independently) or at the trial level (e.g., requiring multiple or all baskets to meet a criterion)—is a clinical and scientific decision, not purely a statistical one. Different definitions can be evaluated through simulation to understand their impact on operating characteristics and overall trial conclusions.
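The distinction between basket-level and trial-level success can be made concrete with a small sketch; the posterior probabilities are hypothetical:

```python
# Hypothetical Stage 2 posterior probabilities of success for continuing baskets
post_prob_success = {"A": 0.95, "C": 0.62}

success_threshold = 0.9

# Basket-level: each basket is evaluated on its own
basket_success = {b: p >= success_threshold for b, p in post_prob_success.items()}

# Trial-level alternatives (illustrative): require at least one, or all,
# baskets to meet the criterion
any_success = any(basket_success.values())
all_success = all(basket_success.values())
```

Under these values, a basket-level definition declares Basket A successful, an "any basket" trial-level definition declares the trial successful, and an "all baskets" definition does not.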
Apply success rule
We then compare this probability to a pre-specified success threshold (e.g., 0.9 or 90% probability).
\[ \text{Declare success if } \Pr(\text{RR} \le 0.8 \mid \text{data}) \ge \gamma_{\text{eff}} \]
where \(\gamma_{\text{eff}}\) is the pre-specified success threshold.
If the posterior probability of success meets the threshold → the basket is declared successful
Otherwise → the basket does not meet the success criterion
Decision for the example
Suppose that after Stage 2, Basket A has a 95% posterior probability that the true rate ratio is less than or equal to 0.8. Since this exceeds the success threshold of 0.9, Basket A is declared successful.
\[ \Pr(\text{RR} \le 0.8 \mid \text{data}) = 0.95 \ge 0.90 \]
Why this matters
Stage 2 applies a more stringent threshold than Stage 1, while also leveraging earlier evidence:
Stage 1: identifies promising signals
Stage 2: confirms them using additional data and stronger evidence thresholds
Borrowing: improves efficiency by reusing information
Discounting: protects against overconfidence
This creates a seamless design that balances:
efficiency
robustness
and interpretability
Final decision
For each basket, we ask:
Is there strong evidence of a clinically meaningful benefit over control (RR ≤ 0.8)?
If yes → success; If not → failure
A basket is declared successful only if it:
continues past Stage 1, and
meets the success criterion in Stage 2
Stage 2 → Final Decision flow (basket-specific)
How are the thresholds chosen?
The continuation threshold in Stage 1 (e.g., 0.3) and the success threshold in Stage 2 (e.g., 0.9) are not arbitrary.
In practice, these thresholds are chosen through simulation studies to achieve desirable operating characteristics, such as:
controlling false positive rates
ensuring adequate power to detect meaningful treatment effects
balancing early stopping with the ability to confirm true signals
Although the decision rules are Bayesian, the operating characteristics used to calibrate these thresholds are often evaluated from a frequentist perspective. That is, we simulate many trials under fixed “true” treatment effects and assess how often the design:
incorrectly declares success (type I error)
correctly identifies true treatment effects (power)
This hybrid approach—Bayesian decision-making with frequentist calibration—is standard in practice and ensures that the design has acceptable long-run performance.
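To make the calibration idea concrete, here is a deliberately simplified simulation sketch. It replaces the full Bayesian hierarchical model with a normal approximation for the estimated log rate ratio, and all sample sizes, variability assumptions, and thresholds are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

def one_trial(true_log_rr, n=60, sigma=1.2, gamma_eff=0.9):
    """One simplified trial: estimate log(RR) from n patients, then apply
    the success rule via a normal posterior approximation.
    The sample size, sigma, and vague-prior posterior are illustrative.
    """
    se = sigma / np.sqrt(n)
    est = rng.normal(true_log_rr, se)        # simulated estimate of log(RR)
    # Approximate posterior: log(RR) ~ Normal(est, se) under a vague prior
    draws = rng.normal(est, se, size=4_000)
    prob_success = np.mean(draws <= np.log(0.8))
    return prob_success >= gamma_eff

n_sims = 2_000
# Null scenario (RR = 1): how often is success declared incorrectly?
type_i_error = np.mean([one_trial(np.log(1.0)) for _ in range(n_sims)])
# Alternative scenario (RR = 0.6): how often is a true effect confirmed?
power = np.mean([one_trial(np.log(0.6)) for _ in range(n_sims)])
```

A full calibration exercise would simulate the entire two-stage workflow, including the hierarchical model, the continuation rule, and borrowing with discounting, across a grid of thresholds and scenarios; the sketch above only shows the loop structure behind the operating characteristics.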
Different choices of thresholds lead to different trade-offs:
a lower Stage 1 threshold allows more baskets to continue, reducing the risk of discarding promising signals early
a higher Stage 2 threshold requires stronger evidence before declaring success
More recently, there has been increasing interest in Bayesian operating characteristics, which evaluate quantities such as average posterior probabilities or decision behaviour under prior distributions. Recent regulatory guidance (e.g., FDA) highlights that these approaches can provide additional insight, but also introduce additional complexity, as results may depend on prior assumptions and how uncertainty is characterized across scenarios.
In this post, we focus on the workflow and decision logic. In a subsequent post, we explore how simulation is used to calibrate these thresholds and evaluate the performance of the design.