In TrialSimulator, a trial arm is defined as a
collection of endpoints (and potentially other covariates or biomarkers)
with a data generation process. This vignette demonstrates how to use
the following key functions to define and summarize arms in a simulated
clinical trial setting.
- endpoint: Creates one or more endpoints
- add_endpoints: Adds one or more endpoints to an arm
- generate_data: Generates a dataset from an Arms object (for exploratory purposes)
- print: Displays a summary report of an Arms object, summarizing all endpoints in the arm
Define Arm with Multiple Sets of Endpoints
The function endpoint can be used to define one or multiple endpoints
simultaneously. These endpoints can be independent or correlated,
depending on the generator provided. In the following hypothetical
example, we construct a custom generator that simulates PFS, OS, and PSA
levels at baseline and at year 1. A pre-specified correlation matrix,
passed to the NORTA design of the simdata package, induces the desired
correlation among the endpoints. We also ensure that PFS is always less
than or equal to OS.
library(dplyr)  # provides %>% and mutate(); simdata must also be installed

rng <- function(n, pfs_rate, os_rate, psa_mean, psa_sd, corr_matrix){
  # marginal distributions, specified through their quantile functions
  dist <- list()
  dist[['PFS']] <- function(x) qexp(x, rate = pfs_rate)
  dist[['OS']] <- function(x) qexp(x, rate = os_rate)
  dist[['PSA_baseline']] <- function(x) qnorm(x, mean = psa_mean, sd = psa_sd)
  dist[['PSA_year1']] <- function(x) qnorm(x, mean = psa_mean - 12, sd = psa_sd)

  # NORTA design targeting corr_matrix on the scale of the final variables
  dsgn <- simdata::simdesign_norta(cor_target_final = corr_matrix,
                                   dist = dist,
                                   transform_initial = data.frame,
                                   names_final = names(dist),
                                   seed_initial = 1)

  simdata::simulate_data(dsgn, n_obs = n) %>%
    mutate(PFS = pmin(PFS, OS)) %>%       # enforce PFS <= OS
    mutate(PFS_event = 1, OS_event = 1)   # every PFS/OS time is an event here
}
In this example, with the parameter values passed to endpoint() below:
- Baseline PSA follows a normal distribution with mean 20 and SD 4.
- After one year of treatment, the mean PSA level decreases by 12 to 8 (with the same SD of 4).
- PFS and OS are exponentially distributed with median times of 2.5 and 4.5 years, respectively (rate = log(2)/median).
- A constraint ensures that PFS does not exceed OS.
- PSA readout times are 0 (baseline) and 1 (year 1).
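The generator delegates the correlation structure to the NORTA ("NORmal To Anything") design in simdata. As a rough illustration of the idea, the base-R sketch below draws correlated standard normals, maps them to uniforms with pnorm(), and then applies the same quantile functions as rng. The function norta_sketch is hypothetical and deliberately simplified: it uses corr_matrix directly as the latent normal correlation, whereas simdesign_norta(cor_target_final = ...) is intended to match the target on the final scale, so correlations involving the exponential endpoints will only approximate corr_matrix here.

```r
# Hypothetical, simplified NORTA sketch (not simdata's implementation):
# correlated standard normals -> uniforms -> target marginal distributions.
norta_sketch <- function(n, corr_matrix, dist, seed = 1) {
  set.seed(seed)
  p <- ncol(corr_matrix)
  # rows of z are multivariate normal with correlation corr_matrix,
  # because t(chol(R)) %*% chol(R) equals R
  z <- matrix(rnorm(n * p), nrow = n) %*% chol(corr_matrix)
  u <- pnorm(z)  # uniform margins; rank dependence is preserved
  out <- lapply(seq_len(p), function(j) dist[[j]](u[, j]))
  as.data.frame(setNames(out, names(dist)))
}

# same marginals and correlation matrix as in the endpoint() call below
dist <- list(
  PFS          = function(x) qexp(x, rate = log(2) / 2.5),
  OS           = function(x) qexp(x, rate = log(2) / 4.5),
  PSA_baseline = function(x) qnorm(x, mean = 20, sd = 4),
  PSA_year1    = function(x) qnorm(x, mean = 20 - 12, sd = 4)
)
corr_matrix <- matrix(c(1, .6, -.5, -.4,
                        .6, 1, -.4, -.3,
                        -.5, -.4, 1, .7,
                        -.4, -.3, .7, 1), nrow = 4)

sim <- norta_sketch(10000, corr_matrix, dist)
sim$PFS <- pmin(sim$PFS, sim$OS)  # same PFS <= OS constraint as rng
round(sapply(sim, median), 2)     # OS median near 4.5; PFS lowered by pmin()
round(cor(sim), 2)
```

Because both PSA margins are normal, their correlation stays at the latent value of 0.7; the exponential margins are where a real NORTA design has to adjust the latent correlation.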
The following code defines the endpoints and uses the print method to
generate a summary report based on 10,000 samples from the generator
rng.
ep1 <- endpoint(name = c('PSA_baseline', 'PSA_year1', 'OS', 'PFS'),
                type = c('non-tte', 'non-tte', 'tte', 'tte'),
                readout = c(PSA_baseline = 0, PSA_year1 = 1),
                generator = rng,
                pfs_rate = log(2)/2.5, os_rate = log(2)/4.5,
                psa_mean = 20, psa_sd = 4,
                # rows/columns follow the order of `dist` in rng:
                # PFS, OS, PSA_baseline, PSA_year1
                corr_matrix = matrix(c(1, .6, -.5, -.4,
                                       .6, 1, -.4, -.3,
                                       -.5, -.4, 1, .7,
                                       -.4, -.3, .7, 1), nrow = 4))
ep1
⚕⚕ Endpoint Name: PSA_baseline, PSA_year1, OS, PFS
⚕⚕ # of Endpoints: 4
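Because rng passes corr_matrix straight to the NORTA design, it is worth checking that it is a valid correlation matrix before simulating. A quick base-R sanity check (symmetric, unit diagonal, all eigenvalues positive) might look like the sketch below; is_valid_corr is a hypothetical helper, and the rows and columns are read in the order of the dist list inside rng, not the order of name in endpoint().

```r
# Rows/columns in the order of `dist` inside rng: PFS, OS, PSA_baseline, PSA_year1
corr_matrix <- matrix(c(1, .6, -.5, -.4,
                        .6, 1, -.4, -.3,
                        -.5, -.4, 1, .7,
                        -.4, -.3, .7, 1), nrow = 4)

# hypothetical helper: symmetric, unit diagonal, positive definite
is_valid_corr <- function(m) {
  isSymmetric(m) &&
    all(diag(m) == 1) &&
    all(eigen(m, symmetric = TRUE, only.values = TRUE)$values > 0)
}

is_valid_corr(corr_matrix)  # TRUE for the matrix used in ep1
```

Positive definiteness of the target is a necessary check, not a sufficient one: a NORTA design may still fail to find a latent normal correlation that reproduces every target exactly.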
We can define another set of endpoints using a separate call to
endpoint(). However, keep in mind that any endpoints defined separately
are assumed to be independent of those defined in prior calls (here,
PSA_baseline, PSA_year1, PFS, and OS).
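The independence assumption can be pictured with a small base-R sketch: when two sets of variables are drawn by separate generator calls and then placed side by side, nothing links them, so their sample correlation is close to zero. This only illustrates the assumption; it is not TrialSimulator's internal code.

```r
set.seed(2)
n <- 10000
# first set: drawn by one generator call (here, baseline PSA only)
set1 <- data.frame(PSA_baseline = rnorm(n, mean = 20, sd = 4))
# second set: drawn by a separate, unrelated generator call
set2 <- data.frame(biomarker = rbinom(n, size = 1, prob = 0.3))
combined <- cbind(set1, set2)
# close to 0: separately generated columns are independent by construction
cor(combined$PSA_baseline, combined$biomarker)
```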
In the following example, we use endpoint() to define a biomarker, even
though it is not an endpoint. In practice, endpoint() can introduce any
variable, including covariates, biomarkers, and subgroup indicators.
Ideally, a biomarker should be integrated into the generator rng so that
its correlation with the other endpoints is captured; here it is simply
drawn independently as a Bernoulli variable with success probability 0.3.
ep2 <- endpoint(name = 'biomarker',
                type = 'non-tte',
                readout = c(biomarker = 0),
                generator = rbinom,
                size = 1, prob = .3)
ep2
⚕⚕ Endpoint Name: biomarker
⚕⚕ # of Endpoints: 1
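As mentioned above, ep2 draws the biomarker independently of the other endpoints. If it should instead be correlated with them, one option in the spirit of rng is to give it a latent normal variable and map it through the Bernoulli quantile function qbinom(x, size = 1, prob = .3), just as rng maps uniforms through qexp() and qnorm(). The base-R sketch below does this for baseline PSA only; the latent correlation rho = 0.4 is an arbitrary illustrative value. Inside rng, the corresponding change would presumably be an extra dist entry plus a fifth row and column in corr_matrix.

```r
set.seed(3)
n <- 10000
rho <- 0.4  # hypothetical latent correlation, for illustration only
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)  # corr(z1, z2) = rho

psa_baseline <- 20 + 4 * z1  # same as qnorm(pnorm(z1), mean = 20, sd = 4)
biomarker <- qbinom(pnorm(z2), size = 1, prob = 0.3)  # Bernoulli(0.3) margin

mean(biomarker)               # close to 0.3
cor(psa_baseline, biomarker)  # positive, but smaller than rho
```

Thresholding a latent normal shrinks the observed correlation, which is exactly why a NORTA design tunes the latent correlation rather than reusing the target directly.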
We now create a treatment arm by combining ep1 and ep2. The print
method automatically summarizes the marginal distributions of all
endpoints. The summary report of the arm simply concatenates the two
reports of ep1 and ep2.
trt <- arm(name = 'treated')
trt$add_endpoints(ep1, ep2)
trt
⚕⚕ Arm Name: treated
⚕⚕ # of Endpoints: 5
⚕⚕ Registered Endpoints: PSA_baseline, PSA_year1, OS, PFS,
biomarker
Add Inclusion Criteria for the Arm
We can define inclusion criteria for the arm by passing logical filter
expressions via the ... argument in arm(). These filters are applied to
the generated trial data. For example, the following code restricts
enrollment to patients with
- Baseline PSA > 10
- Positive PSA values at year 1
The summary report will reflect the effect of these inclusion criteria on the simulated population.
trt <- arm(name = 'treated', PSA_baseline > 10 & PSA_year1 > 0)
trt$add_endpoints(ep1, ep2)
trt
⚕⚕ Arm Name: treated
⚕⚕ # of Endpoints: 5
⚕⚕ Registered Endpoints: PSA_baseline, PSA_year1, OS, PFS,
biomarker
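The inclusion expression acts as a row filter on the generated patient data. A rough base-R analogue using subset() is sketched below; for simplicity it draws the two PSA values independently (unlike rng, which correlates them), so the retained fraction is only indicative. It illustrates the filtering logic, not TrialSimulator's implementation.

```r
set.seed(4)
n <- 10000
patients <- data.frame(PSA_baseline = rnorm(n, mean = 20, sd = 4),
                       PSA_year1    = rnorm(n, mean = 8, sd = 4))
# same logical expression as passed to arm()
eligible <- subset(patients, PSA_baseline > 10 & PSA_year1 > 0)
nrow(eligible) / n  # share of simulated patients meeting both criteria
```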
Simulate Data Explicitly (Not Recommended)
Although TrialSimulator allows direct data generation using the
generate_data() method, it is generally discouraged. One of the core
principles of TrialSimulator is to separate trial simulation logic from
data generation, allowing the framework to manage data generation and
truncation (and/or censoring) dynamically based on trial milestones.
Nevertheless, for inspection or debugging, one can call generate_data() directly:
## not recommended
tmp <- trt$generate_data(100)
head(tmp)
#> PFS OS PSA_baseline PSA_year1 PFS_event OS_event
#> 1 1.2410022 2.9010900 17.05084 7.4786703 1 1
#> 2 1.5231352 4.0614705 14.86082 6.7132135 1 1
#> 3 0.8116851 3.0751050 24.34986 9.0405846 1 1
#> 4 3.5868602 10.1470295 13.49077 0.2577257 1 1
#> 5 1.2952825 1.2952825 21.89733 8.6974747 1 1
#> 6 0.3332482 0.3332482 20.30860 7.4350724 1 1
#> PSA_baseline_readout PSA_year1_readout biomarker biomarker_readout
#> 1 0 1 0 0
#> 2 0 1 0 0
#> 3 0 1 1 0
#> 4 0 1 1 0
#> 5 0 1 0 0
#> 6 0 1 0 0
This gives a preview of the patient-level data produced by the treatment
arm's configuration (generators and inclusion filters). However, the
enrollment schedule and dropout are not taken into account, which is
another reason we strongly discourage using TrialSimulator this way.
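To see why ignoring enrollment matters, consider administrative censoring at a data cutoff: a patient enrolled at calendar time e with OS time t has an observed death before cutoff c only if e + t <= c; otherwise OS is censored at c - e. The base-R sketch below applies this rule with a hypothetical uniform enrollment over 2 years and a cutoff at year 3; both values are assumptions for illustration, not TrialSimulator defaults.

```r
set.seed(5)
n <- 1000
os <- rexp(n, rate = log(2) / 4.5)      # OS times, median 4.5 years
enroll <- runif(n, min = 0, max = 2)    # hypothetical enrollment times (years)
cutoff <- 3                             # hypothetical analysis cutoff (years)

follow_up <- cutoff - enroll            # time each patient is followed
os_obs <- pmin(os, follow_up)           # observed (possibly censored) time
os_event <- as.integer(os <= follow_up) # 1 = death observed, 0 = censored

mean(os_event)  # well below 1, unlike OS_event = 1 from generate_data()
```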