Simulate Correlated Progression-Free Survival and Overall Survival Using a Gumbel Copula • TrialSimulator

Progression-free survival (PFS) and overall survival (OS) are common time-to-event endpoints in oncology trials. Because progression or death defines a PFS event, simulated trial data should always satisfy $\mbox{PFS} \leq \mbox{OS}$ . At the same time, the two endpoints are often highly correlated, so independent marginal simulation is usually not appropriate.

This vignette describes the copula-based generator implemented in TrialSimulator::CorrelatedPfsAndOs2(). The function is intended for simulation settings where the user wants to specify three interpretable quantities:

the marginal median PFS,
the marginal median OS,
Kendall’s tau between the observed, uncensored PFS and OS times.

The method uses a Gumbel survival copula for a latent time to progression (TTP) and OS, and then defines

$\mbox{PFS} = \min(\mbox{TTP}, \mbox{OS}).$

This construction guarantees $\mbox{PFS} \leq \mbox{OS}$ for every simulated patient. It also keeps both marginal PFS and OS exponential, so the inputs can be specified directly through their medians.

This is a useful distinction from the illness-death model in TrialSimulator::CorrelatedPfsAndOs3(). The illness-death model gives a clinically interpretable transition structure, but OS generated from that model can have a time-varying hazard ratio between treatment arms. Therefore, if OS will be analyzed using a proportional hazards Cox model and the simulation should be consistent with that analysis model, the copula-based method described here is often more appropriate.

Model

Let $X$ denote latent TTP and let $Y$ denote OS. The model assumes exponential marginal survival functions

$\Pr(X > x) = \exp(-\lambda_X x), \qquad \Pr(Y > y) = \exp(-\lambda_Y y),$

and a Gumbel–Hougaard survival copula

$\Pr(X > x, Y > y) = \exp\left[ -\left\{(\lambda_X x)^\theta + (\lambda_Y y)^\theta\right\}^{1/\theta} \right], \qquad \theta \geq 1.$

The observed PFS time is

$P = \min(X, Y).$

Under this model, OS is exponential with rate $\lambda_Y$ . PFS is also exponential, with survival function

$\Pr(P > t) = \Pr(X > t, Y > t) = \exp\left[ -\left\{(\lambda_X^\theta + \lambda_Y^\theta)t^\theta\right\}^{1/\theta} \right] = \exp(-\lambda_P t),$

where

$\lambda_P = \left(\lambda_X^\theta + \lambda_Y^\theta\right)^{1/\theta}.$

Therefore, if the target medians are $m_P$ for PFS and $m_Y$ for OS,

$\lambda_P = \frac{\log(2)}{m_P}, \qquad \lambda_Y = \frac{\log(2)}{m_Y}.$

For a candidate value of $\theta$ , the latent TTP rate is then

$\lambda_X = \left(\lambda_P^\theta - \lambda_Y^\theta\right)^{1/\theta}.$

This requires $m_P < m_Y$ , as expected for PFS and OS.

Choosing the Copula Parameter

For the Gumbel copula, Kendall’s tau between latent TTP and OS is

$\tau(X,Y) = 1 - \frac{1}{\theta}.$

However, CorrelatedPfsAndOs2() asks for Kendall’s tau between the observed, uncensored endpoints $P=\min(X,Y)$ and $Y$ , not between latent TTP and OS. These are not the same quantity. The formula below is derived in the appendix. Under the model above,

$\tau(P,Y) = 1 - \frac{1}{\theta} \left[ 1 - \left(\frac{m_P}{m_Y}\right)^\theta \right].$

Equivalently, writing $\tau_X = \tau(X,Y)$ and $\theta = 1/(1-\tau_X)$ ,

$\tau(P,Y) = 1 - (1-\tau_X) \left[ 1 - \left(\frac{m_P}{m_Y}\right)^{1/(1-\tau_X)} \right].$

CorrelatedPfsAndOs2() solves this equation numerically for the latent Kendall’s tau $\tau_X$ , converts it to $\theta$ , and then simulates from the Gumbel copula.

For any finite $\theta$ , $\tau(P,Y)$ is larger than $\tau(X,Y)$ :

$\tau(P,Y)-\tau(X,Y) = \frac{1}{\theta} \left(\frac{m_P}{m_Y}\right)^\theta > 0.$

Intuitively, even if latent TTP and OS have a weaker association, observed PFS contains deaths by construction, which increases the association between observed PFS and OS.

The requested value of $\tau(P,Y)$ has a lower bound. When $\theta=1$ , TTP and OS are independent, but PFS still contains OS deaths through the definition $P=\min(X,Y)$ . In this limiting case,

$\tau(P,Y) = \frac{m_P}{m_Y}.$

Thus, for fixed medians $m_P$ and $m_Y$ , CorrelatedPfsAndOs2() can only target

$\frac{m_P}{m_Y} \leq \tau(P,Y) < 1.$

If the requested Kendall’s tau is too small for the two medians, the function stops with an error.

Example

Suppose we want to simulate PFS with median 5 months and OS with median 11 months, targeting Kendall’s tau of 0.6 between observed, uncensored PFS and OS times.

pfs_and_os <- endpoint(name = c('pfs', 'os'),
                       type = c('tte', 'tte'),
                       generator = CorrelatedPfsAndOs2,
                       median_pfs = 5,
                       median_os = 11,
                       kendall = 0.6,
                       pfs_name = 'pfs',
                       os_name = 'os')

pfs_and_os

For verification only, we can call the generator directly. This is not the recommended way to use TrialSimulator for simulation studies; in practice, the generator should usually be supplied to endpoint() and used inside the trial simulation workflow. Direct calls are helpful here because they let us check the marginal medians, the observed Kendall’s tau, and the ordering $\mbox{PFS} \leq \mbox{OS}$ before adding enrollment, censoring, treatment arms, and analyses.

set.seed(123)
dat <- CorrelatedPfsAndOs2(n = 10000,
                           median_pfs = 5,
                           median_os = 11,
                           kendall = 0.6)

head(dat, 2)
#>        pfs       os pfs_event os_event
#> 1 2.744789 19.37430         1        1
#> 2 5.342269 25.04203         1        1

The simulation should approximately recover the requested medians and Kendall’s tau between observed, uncensored PFS and OS times.

with(dat, median(pfs))
#> [1] 5.102113
with(dat, median(os))
#> [1] 11.28231
with(dat, cor(pfs, os, method = 'kendall'))
#> [1] 0.5954657
with(dat, all(pfs <= os))
#> [1] TRUE

Because this generator returns event indicators equal to 1 for both endpoints, censoring and staggered enrollment should be handled by the broader trial simulation workflow rather than by this endpoint generator.

Practical Guidance

This generator is most useful when the simulation objective is to specify marginal PFS and OS medians and a rank-based association between the two observed endpoints. Kendall’s tau is often more stable and interpretable than Pearson correlation for skewed time-to-event variables, and it is directly connected to the Gumbel copula parameter.

There are still modeling assumptions to keep in mind:

PFS and OS are both marginally exponential.
Dependence is induced through a Gumbel survival copula on latent TTP and OS.
The Gumbel copula only allows non-negative dependence.
The requested Kendall’s tau is for observed PFS and OS event times before censoring is applied.
The attainable Kendall’s tau between observed PFS and OS is constrained by the PFS/OS median ratio.

For simulations that require transition-specific hazards or a mechanistic disease process, the illness-death model implemented in TrialSimulator::CorrelatedPfsAndOs3() may be more appropriate. For simulations where the main requirement is a direct medians-plus-Kendall’s-tau specification, especially when OS will be analyzed under a proportional hazards Cox model, CorrelatedPfsAndOs2() provides a compact alternative.

Appendix: Kendall’s Tau for Observed PFS and OS

This appendix gives the derivation behind the formula used in CorrelatedPfsAndOs2().

Let $X$ and $Y$ be nonnegative random variables with exponential margins

$X \sim \operatorname{Exp}(\lambda_X), \qquad Y \sim \operatorname{Exp}(\lambda_Y),$

and joint survival function

$S_{X,Y}(x,y) = \Pr(X>x, Y>y) = \exp\left[ -\left\{(\lambda_X x)^\theta + (\lambda_Y y)^\theta\right\}^{1/\theta} \right], \qquad \theta \ge 1.$

Define $P=\min(X,Y)$ . We want Kendall’s tau between $P$ and $Y$ .

Standardize the margins by setting

$A=\lambda_X X, \qquad B=\lambda_Y Y.$

Then the joint survival function of $(A,B)$ is

$\Pr(A>a, B>b) = \exp\left\{-(a^\theta+b^\theta)^{1/\theta}\right\}.$

Use the polar-type transformation

$R=(A^\theta+B^\theta)^{1/\theta}, \qquad W=\frac{A^\theta}{A^\theta+B^\theta}.$

A Jacobian calculation gives the joint density

$f_{R,W}(r,w)=\frac{1}{\theta} e^{-r}(r+\theta-1), \qquad r>0,\; 0<w<1.$

Thus, $R$ and $W$ are independent and $W \sim \operatorname{Unif}(0,1)$ . The inverse transformation is

$A = R W^{1/\theta}, \qquad B = R(1-W)^{1/\theta}.$

Let

$a=\lambda_X^\theta, \qquad b=\lambda_Y^\theta, \qquad c=\frac{a}{a+b}, \qquad d=\frac{b}{a+b}.$

The event $X \le Y$ is equivalent to $W \le c$ :

$X \le Y \iff \frac{A}{\lambda_X} \le \frac{B}{\lambda_Y} \iff W \le c,$

$P= \begin{cases} X, & W \le c,\\ Y, & W > c. \end{cases}$

Using the survival-copula representation of Kendall’s tau,

$\tau(P,Y)=4E\{S_{P,Y}(P,Y)\}-1.$

When $W \le c$ , $P=X$ and

$S_{P,Y}(P,Y)=S_{X,Y}(X,Y)=e^{-R}.$

When $W>c$ , $P=Y$ . Since

$S_{P,Y}(y,y)=\Pr(P>y,Y>y)=\Pr(X>y,Y>y),$

we have, using $Y=B/\lambda_Y=R(1-W)^{1/\theta}/\lambda_Y$ ,

$S_{P,Y}(Y,Y) = \exp\left[ -R\left(\frac{1-W}{d}\right)^{1/\theta} \right].$

Therefore,

$E\{S_{P,Y}(P,Y)\} = E\{e^{-R}\mathbf 1(W\le c)\} + E\left[ e^{-R((1-W)/d)^{1/\theta}} \mathbf 1(W>c) \right].$

By independence of $R$ and $W$ ,

$E\{S_{P,Y}(P,Y)\} = cL_R(1) + \int_c^1 L_R\left\{\left(\frac{1-w}{d}\right)^{1/\theta}\right\} dw,$

where $L_R(s)=E(e^{-sR})$ . From the density of $R$ ,

$L_R(s) = \frac{1}{\theta} \int_0^\infty e^{-(1+s)r}(r+\theta-1)\,dr = \frac{\theta+(\theta-1)s}{\theta(1+s)^2}.$

In particular,

$L_R(1)=\frac{2\theta-1}{4\theta}.$

For the integral term, use the substitution

$t=\left(\frac{1-w}{d}\right)^{1/\theta}, \qquad dw=-d\theta t^{\theta-1}dt.$

Then

$\int_c^1 L_R\left\{\left(\frac{1-w}{d}\right)^{1/\theta}\right\} dw = d\theta\int_0^1 t^{\theta-1}L_R(t)\,dt.$

Substituting $L_R(t)$ gives

$d\int_0^1 \frac{\theta t^{\theta-1}+(\theta-1)t^\theta}{(1+t)^2} dt.$

Because

$\frac{d}{dt}\left(\frac{t^\theta}{1+t}\right) = \frac{\theta t^{\theta-1}+(\theta-1)t^\theta}{(1+t)^2},$

the integral is

$d\left[\frac{t^\theta}{1+t}\right]_0^1 = \frac{d}{2}.$

Thus,

$E\{S_{P,Y}(P,Y)\} = c\frac{2\theta-1}{4\theta}+\frac{d}{2} = \frac{2\theta-1+d}{4\theta},$

using $c+d=1$ . Finally,

$\tau(P,Y) = 4E\{S_{P,Y}(P,Y)\}-1 = \frac{\theta-1+d}{\theta} = 1-\frac{1}{\theta} \left( \frac{\lambda_X^\theta}{\lambda_X^\theta+\lambda_Y^\theta} \right).$

Since $P$ has rate

$\lambda_P = \left(\lambda_X^\theta+\lambda_Y^\theta\right)^{1/\theta},$

this can also be written as

$\tau(P,Y) = 1-\frac{1}{\theta} \left( \frac{\lambda_X^\theta}{\lambda_P^\theta} \right) = 1-\frac{1}{\theta} \left[ 1-\left(\frac{\lambda_Y}{\lambda_P}\right)^\theta \right].$

Because $\lambda_Y/\lambda_P = m_P/m_Y$ , the formula used by CorrelatedPfsAndOs2() is

$\tau(P,Y) = 1-\frac{1}{\theta} \left[ 1-\left(\frac{m_P}{m_Y}\right)^\theta \right].$