Introduction to Panel Current Status Data

Panel current status data arise in longitudinal studies where subjects are examined for the occurrence of an event only at pre-scheduled visit times, rather than being continuously monitored. For each subject at each examination time, we only observe whether the event has occurred (status = 1) or not (status = 0) by that time.

This type of data is common in biomedical studies where continuous monitoring is impractical or costly, such as:

  • Disease progression studies with periodic clinical visits
  • Longitudinal cohort studies with scheduled examinations
  • Cancer screening programs with regular check-ups

The PanelCurrentStatus package implements the conditional censoring logistic (CCL) estimator, which transforms panel current status data into a binary outcome analysis. This approach offers advantages over traditional methods by:

  • Building on existing logistic regression estimators
  • Incorporating monitoring time information into the working model
  • Providing better performance in smaller sample sizes
  • Remaining robust under various model specifications

Example Dataset

The example dataset contains simulated panel current status data with the following structure:

  • d1 to d5: Binary indicators (0/1) for event status at each of 5 examination times
  • c1 to c5: Corresponding examination times
  • z1 to z3: Predictor variables (covariates)

Let’s take a look at the first few rows:

data <- readRDS("data.rds")
head(data)
#>   d1 d2 d3 d4 d5       c1       c2       c3       c4       c5         z1
#> 1  0  0  0  1  1 1.466907 1.763902 2.078570 2.352064 2.633078 -2.6990811
#> 2  0  1  1  1  1 1.541113 1.708809 2.081996 2.366938 2.586009  0.6203506
#> 3  0  0  0  0  1 1.581326 1.803390 2.054550 2.233843 2.656011  2.4248842
#> 4  0  0  1  1  1 1.667167 1.792692 2.179182 2.293049 2.626627 -5.2451395
#> 5  0  0  1  1  1 1.460037 1.779462 1.982407 2.415980 2.548397  0.9595520
#> 6  0  0  1  1  1 1.489140 1.856881 2.175501 2.363438 2.631624  1.1315754
#>           z2          z3
#> 1 -2.6952075  0.73815782
#> 2  0.6741001 -0.03317348
#> 3 -3.4416333 -2.97365274
#> 4  1.4207321 -0.41903780
#> 5  1.5718480 -2.36532196
#> 6 -4.2616836 -3.41857940

Data Preparation

Before analysis, we need to format the data appropriately for the CCL estimation. We’ll separate the components into matrices:

  • delta: Matrix of binary event indicators
  • ctime: Matrix of examination times
  • predictors: Matrix of covariates
n <- nrow(data)
delta <- as.matrix(data[, 1:5])
ctime <- as.matrix(data[, 5 + 1:5])
predictors <- as.matrix(data[, 10 + 1:3])

Estimating the Model Parameters

The ccl.fit() function computes the scaled coefficients from the conditional censoring logistic (CCL) estimator. It takes the following parameters:

  • delta: Matrix of binary event indicators
  • ctime: Matrix of examination times
  • predictors: Matrix of covariates
  • n.ptb: Number of bootstrap samples for standard error estimation
  • seed: Random seed for reproducibility
fit <- ccl.fit(delta, ctime, predictors, n.ptb = 500, seed = 1)
data.frame(est = fit[[2]], est.se = fit[[3]])
#>         est     est.se
#> 1 0.6391523 0.04615756
#> 2 0.5501780 0.05454640
#> 3 0.5373904 0.04816519

The output shows the estimated coefficients (est) and their standard errors (est.se) for each predictor. These coefficients represent the association between each predictor and the risk of the event, similar to coefficients in a Cox proportional hazards model but up to a scale multiplier.

Evaluating Model Performance

To evaluate the prediction performance of the estimated risk model, we can calculate various metrics from the ROC curve using kernel smoothing. First, we need to prepare the data in a “long” format:

alldata <- NULL
for (k in 1:ncol(delta)) {
  DF <- data.frame(delta = delta[, k], ctime = ctime[, k], predictors)
  alldata <- rbind(alldata, DF)
}

Then we calculate evaluation metrics using the ccl.roc() function:

  • t0: Prediction time point (here we use the median examination time)
  • h: Bandwidth for kernel smoothing (calculated as a function of sample size)
t0 <- median(alldata$ctime)
h <- sd(alldata$ctime) / n^0.3
ans <- ccl.roc(alldata, fit, t0, h)
data.frame(roc = ans[[1]], roc.se = ans[[2]])
#>                       roc     roc.se
#> auc.tilde.lower 0.6049470 0.02153376
#> auc.tilde.upper 0.6049485 0.02153363
#> tpr.05.tilde    0.1307085 0.02770687
#> tpr.10.tilde    0.1799483 0.02987241

The output includes:

  • auc.tilde.lower and auc.tilde.upper: Lower and upper bounds for the area under the ROC curve (AUC)
  • tpr.05.tilde and tpr.10.tilde: True positive rates at 5% and 10% false positive rates

AUC values range from 0.5 (no discrimination) to 1 (perfect discrimination). Higher values indicate better model performance.

proc.time()
#>    user  system elapsed 
#>   7.721   0.706   9.880

Conclusion

The PanelCurrentStatus package provides a robust method for developing and evaluating risk prediction models with panel current status data. The CCL estimator offers advantages in terms of computational efficiency and performance in smaller samples compared to traditional methods.