Panel current status data arise in longitudinal studies where subjects are examined for the occurrence of an event only at pre-scheduled visit times, rather than being continuously monitored. For each subject at each examination time, we only observe whether the event has occurred (status = 1) or not (status = 0) by that time.
This type of data is common in biomedical studies where continuous monitoring is impractical or costly, such as:
The PanelCurrentStatus
package implements the
conditional censoring logistic (CCL) estimator, which transforms panel
current status data into a binary outcome analysis. This approach offers
advantages over traditional methods by:
The example dataset contains simulated panel current status data with the following structure:
d1
to d5
: Binary indicators (0/1) for
event status at each of 5 examination timesc1
to c5
: Corresponding examination
timesz1
to z3
: Predictor variables
(covariates)Let’s take a look at the first few rows:
data <- readRDS("data.rds")
head(data)
#> d1 d2 d3 d4 d5 c1 c2 c3 c4 c5 z1
#> 1 0 0 0 1 1 1.466907 1.763902 2.078570 2.352064 2.633078 -2.6990811
#> 2 0 1 1 1 1 1.541113 1.708809 2.081996 2.366938 2.586009 0.6203506
#> 3 0 0 0 0 1 1.581326 1.803390 2.054550 2.233843 2.656011 2.4248842
#> 4 0 0 1 1 1 1.667167 1.792692 2.179182 2.293049 2.626627 -5.2451395
#> 5 0 0 1 1 1 1.460037 1.779462 1.982407 2.415980 2.548397 0.9595520
#> 6 0 0 1 1 1 1.489140 1.856881 2.175501 2.363438 2.631624 1.1315754
#> z2 z3
#> 1 -2.6952075 0.73815782
#> 2 0.6741001 -0.03317348
#> 3 -3.4416333 -2.97365274
#> 4 1.4207321 -0.41903780
#> 5 1.5718480 -2.36532196
#> 6 -4.2616836 -3.41857940
Before analysis, we need to format the data appropriately for the CCL estimation. We’ll separate the components into matrices:
delta
: Matrix of binary event indicatorsctime
: Matrix of examination timespredictors
: Matrix of covariatesThe ccl.fit()
function computes the scaled coefficients
from the conditional censoring logistic (CCL) estimator. It takes the
following parameters:
delta
: Matrix of binary event indicatorsctime
: Matrix of examination timespredictors
: Matrix of covariatesn.ptb
: Number of bootstrap samples for standard error
estimationseed
: Random seed for reproducibility
fit <- ccl.fit(delta, ctime, predictors, n.ptb = 500, seed = 1)
data.frame(est = fit[[2]], est.se = fit[[3]])
#> est est.se
#> 1 0.6391523 0.04615756
#> 2 0.5501780 0.05454640
#> 3 0.5373904 0.04816519
The output shows the estimated coefficients (est
) and
their standard errors (est.se
) for each predictor. These
coefficients represent the association between each predictor and the
risk of the event, similar to coefficients in a Cox proportional hazards
model but up to a scale multiplier.
To evaluate the prediction performance of the estimated risk model, we can calculate various metrics from the ROC curve using kernel smoothing. First, we need to prepare the data in a “long” format:
alldata <- NULL
for (k in 1:ncol(delta)) {
DF <- data.frame(delta = delta[, k], ctime = ctime[, k], predictors)
alldata <- rbind(alldata, DF)
}
Then we calculate evaluation metrics using the ccl.roc()
function:
t0
: Prediction time point (here we use the median
examination time)h
: Bandwidth for kernel smoothing (calculated as a
function of sample size)
t0 <- median(alldata$ctime)
h <- sd(alldata$ctime) / n^0.3
ans <- ccl.roc(alldata, fit, t0, h)
data.frame(roc = ans[[1]], roc.se = ans[[2]])
#> roc roc.se
#> auc.tilde.lower 0.6049470 0.02153376
#> auc.tilde.upper 0.6049485 0.02153363
#> tpr.05.tilde 0.1307085 0.02770687
#> tpr.10.tilde 0.1799483 0.02987241
The output includes:
auc.tilde.lower
and auc.tilde.upper
: Lower
and upper bounds for the area under the ROC curve (AUC)tpr.05.tilde
and tpr.10.tilde
: True
positive rates at 5% and 10% false positive ratesAUC values range from 0.5 (no discrimination) to 1 (perfect discrimination). Higher values indicate better model performance.
proc.time()
#> user system elapsed
#> 7.721 0.706 9.880