survmaximin.Rmd
To illustrate the usage of the SurvMaximin algorithm, first load the simulated data, which include a matrix of coefficients estimated locally at each source site (one column per site) and a covariance matrix of the covariates from one target site.
library(SurvMaximin)
data(B_source); dim(B_source)
#> [1] 25 15
data(Sigma_target); dim(Sigma_target)
#> [1] 25 25
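For readers who want to build inputs of the same shape from their own data, the following is a minimal sketch. It assumes that the local coefficients come from a Cox model fitted separately at each site and that the target covariance is the sample covariance of the target covariates; the vignette does not state how B_source and Sigma_target were generated, so both choices, along with the helper simulate_site and the objects B_toy and Sigma_toy, are illustrative assumptions.

```r
library(survival)

set.seed(1)
p <- 3      # number of covariates (toy value)
n <- 200    # patients per site (toy value)
L <- 2      # number of source sites (toy value)

# Hypothetical helper: simulate one site's survival data under a Cox-type model.
simulate_site <- function(n, p, beta) {
  z <- matrix(rnorm(n * p), n, p)
  event_time  <- rexp(n, rate = exp(drop(z %*% beta)))
  censor_time <- rexp(n, rate = 0.2)
  list(z = z,
       x = pmin(event_time, censor_time),               # observed time
       delta = as.integer(event_time <= censor_time))   # 1 = event, 0 = censored
}

beta_true <- c(0.5, -0.5, 0)
sites <- lapply(seq_len(L), function(l) simulate_site(n, p, beta_true))

# One column of locally estimated coefficients per source site (p x L).
B_toy <- sapply(sites, function(s) coef(coxph(Surv(s$x, s$delta) ~ s$z)))

# Sample covariance of the covariates at a (simulated) target site (p x p).
target <- simulate_site(n, p, beta_true)
Sigma_toy <- cov(target$z)

dim(B_toy)
dim(Sigma_toy)
```

Only summary-level quantities (coefficients and a covariance matrix) leave each site in this layout, which is what makes the approach usable when patient-level data cannot be pooled.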
Then we fit the SurvMaximin model by calling the survmaximin function. Note that the delta argument controls the ridge penalty and defaults to 0. The transfer-learning coefficients are saved as beta.est in the output list, and the weights for the source sites are stored as weight.
output <- survmaximin(B_source, Sigma_target, delta=0.5)
output$beta.est
#> [,1]
#> [1,] 2.636780e-01
#> [2,] 5.528960e-01
#> [3,] 1.153801e-02
#> [4,] 1.192360e-01
#> [5,] -1.323856e-01
#> [6,] -7.834660e-02
#> [7,] -4.310746e-01
#> [8,] -9.672508e-02
#> [9,] 0.000000e+00
#> [10,] -1.075174e-02
#> [11,] -8.814115e-03
#> [12,] 5.321782e-04
#> [13,] 0.000000e+00
#> [14,] 3.290181e-03
#> [15,] 5.013197e-03
#> [16,] -5.828835e-02
#> [17,] -2.895772e-03
#> [18,] 1.339693e-03
#> [19,] 0.000000e+00
#> [20,] 3.331753e-03
#> [21,] -3.333063e-03
#> [22,] 3.040951e-03
#> [23,] -1.409748e-02
#> [24,] 1.674652e-03
#> [25,] 5.870561e-05
output$weight
#> [1] 0.58479638 0.05892942 0.00000000 0.06756831 0.00000000 0.00000000
#> [7] 0.00000000 0.00000000 0.00000000 0.08022597 0.09000053 0.04396359
#> [13] 0.04215204 0.00000000 0.03236377
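The printed weights are non-negative, several are exactly zero, and they sum to one up to rounding. That pattern is what one would expect if the weights minimise a quadratic form over the probability simplex. The sketch below illustrates the role of a ridge term under that assumption: it takes the objective to be \(\gamma^\top (B^\top \Sigma B + \delta I)\gamma\) with \(\gamma\) on the simplex. This is an assumed formulation, not taken from the vignette, and the function maximin_weight_2site and the toy inputs are hypothetical. It handles only two source sites so that a one-dimensional search is enough.

```r
# Sketch under an ASSUMED objective: gamma' (B' Sigma B + delta I) gamma,
# minimised over the simplex. With two source sites the weight vector is
# (w, 1 - w), so a one-dimensional search over [0, 1] suffices.
maximin_weight_2site <- function(B, Sigma, delta = 0) {
  stopifnot(ncol(B) == 2)
  G <- t(B) %*% Sigma %*% B + delta * diag(2)
  objective <- function(w) {
    g <- c(w, 1 - w)
    drop(t(g) %*% G %*% g)
  }
  w <- optimize(objective, interval = c(0, 1))$minimum
  c(w, 1 - w)
}

# Toy inputs: 3 covariates, 2 source sites, identity covariance.
B_toy <- cbind(c(1, 0.5, 0), c(0.2, 0.4, 0.1))
Sigma_toy <- diag(3)

w_noridge <- maximin_weight_2site(B_toy, Sigma_toy, delta = 0)
w_ridge   <- maximin_weight_2site(B_toy, Sigma_toy, delta = 0.5)
w_heavy   <- maximin_weight_2site(B_toy, Sigma_toy, delta = 1e6)

# Weights printed in the vignette output above: check they lie on the simplex.
w_printed <- c(0.58479638, 0.05892942, 0, 0.06756831, 0, 0, 0, 0, 0,
               0.08022597, 0.09000053, 0.04396359, 0.04215204, 0, 0.03236377)
sum(w_printed)
```

In this toy example, the fit without a ridge term puts essentially all weight on the second site, delta = 0.5 moves some weight back to the first site, and a very large delta pushes the weights towards equal shares. Under the assumed objective, a larger delta therefore trades sparsity of the weights for stability.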
To evaluate the performance of the SurvMaximin model, we first import the validation dataset.
data(x.valid); length(x.valid)
#> [1] 2000
data(z.valid); dim(z.valid)
#> [1] 2000 25
data(delta.valid); length(delta.valid)
#> [1] 2000
We can use the Est.Cval function in the survC1 package to calculate the C-statistic on the validation data set and evaluate the model performance.
# Columns, in order: observed time, event indicator, risk score
valid.dat <- data.frame(t_to_event_ = x.valid,
                        death_ind = delta.valid,
                        score = z.valid %*% output$beta.est)
c.maximin <- survC1::Est.Cval(valid.dat, tau = 5)$Dhat
c.maximin
#> [1] 0.7433596
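To make concrete what a C-statistic measures, here is a base-R sketch of the simpler Harrell-type concordance: among pairs in which the earlier time is an observed event, it is the fraction in which the subject who fails first also has the higher risk score. This is not the estimator behind Est.Cval, which additionally truncates follow-up at tau and reweights for censoring; the function harrell_c and the toy data are hypothetical and serve only as illustration.

```r
# Harrell-type concordance (illustrative; NOT the truncated, censoring-weighted
# estimator computed by survC1::Est.Cval).
harrell_c <- function(time, status, score) {
  concordant <- 0
  comparable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    if (status[i] != 1) next                # subject i must have an observed event
    for (j in seq_len(n)) {
      if (time[i] < time[j]) {              # i fails before j is last seen
        comparable <- comparable + 1
        if (score[i] > score[j]) {
          concordant <- concordant + 1      # higher score failed first
        } else if (score[i] == score[j]) {
          concordant <- concordant + 0.5    # ties count as half
        }
      }
    }
  }
  concordant / comparable
}

# Toy data: a score that ranks subjects exactly by failure order.
toy_time   <- c(1, 2, 3, 4)
toy_status <- c(1, 1, 0, 1)
toy_score  <- c(4, 3, 2, 1)

c_perfect  <- harrell_c(toy_time, toy_status, toy_score)
c_reversed <- harrell_c(toy_time, toy_status, rev(toy_score))
```

A value of 0.5 corresponds to a score that carries no ranking information, so the validation result of about 0.74 reported above indicates a useful, though imperfect, ordering of risk.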
If every site is to be treated in turn as the target site, so that each site needs its own SurvMaximin model, users can first store the locally estimated coefficients in one matrix (\(p\times L\), where \(L\) denotes the total number of sites) and the covariance matrices in a list, then pass both to survmaximin_fed, which returns one fitted model per site:
data(B_all); dim(B_all)
#> [1] 25 16
data(Sigma_all)
length(Sigma_all); dim(Sigma_all[[1]])
#> [1] 16
#> [1] 25 25
output <- survmaximin_fed(B_all, Sigma_all, delta=0.5)
length(output)
#> [1] 16
output[[1]]$beta.est
#> [,1]
#> [1,] 0.3037710858
#> [2,] 0.4775245024
#> [3,] 0.0861233420
#> [4,] 0.2139044058
#> [5,] -0.2610936626
#> [6,] -0.1564759896
#> [7,] -0.4658850947
#> [8,] -0.2946007823
#> [9,] -0.0113981649
#> [10,] 0.0070111543
#> [11,] -0.0144088932
#> [12,] -0.0036474729
#> [13,] -0.0256351874
#> [14,] 0.0061266944
#> [15,] 0.0054613656
#> [16,] -0.0111721845
#> [17,] -0.0055267140
#> [18,] 0.0008517773
#> [19,] -0.0001285146
#> [20,] 0.0060341509
#> [21,] 0.0120849586
#> [22,] 0.0058385248
#> [23,] 0.0197415697
#> [24,] 0.0021760373
#> [25,] -0.0148486143
output[[1]]$weight
#> [1] 0.09415249 0.00000000 0.05996628 0.00733774 0.00000000 0.00000000
#> [7] 0.00000000 0.01096313 0.12935355 0.14112495 0.09401698 0.06373866
#> [13] 0.00000000 0.09032024 0.30902598
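Site 1 receives 15 weights although there are 16 sites in total, which suggests that each target is fitted using the remaining sites as sources. That reading is inferred from the printed output and is not stated in the vignette. The sketch below shows only this leave-one-site-out data layout; fit_one_target is a hypothetical stand-in that uses equal weights as a placeholder and does not reproduce the SurvMaximin weights.

```r
set.seed(7)
p <- 4   # covariates (toy value)
L <- 3   # sites (toy value)

# Toy inputs with the same layout as B_all and Sigma_all.
B_toy <- matrix(rnorm(p * L), p, L)
Sigma_toy <- replicate(L, diag(p), simplify = FALSE)

# Hypothetical stand-in for a single-target fit. Equal weights are a
# placeholder only and are NOT the SurvMaximin solution.
fit_one_target <- function(B_source, Sigma_target) {
  w <- rep(1 / ncol(B_source), ncol(B_source))
  list(beta.est = B_source %*% w, weight = w)
}

# Assumed layout: site k is the target, all other sites act as sources.
fits <- lapply(seq_len(L), function(k) {
  fit_one_target(B_toy[, -k, drop = FALSE], Sigma_toy[[k]])
})

length(fits)
length(fits[[1]]$weight)
```

The result has the same shape as the output above: a list with one element per site, each holding a p x 1 coefficient vector and L - 1 source weights.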
To visualize the estimates and inspect their sparsity, we plot, for each covariate, the density of its SurvMaximin coefficient across all sites.
library(ggplot2)
beta.all <- c()
p <- 25; L <- 16
# Stack the coefficients into long format: one row per (site, covariate) pair
for (i in 1:L) {
  for (j in 1:p) {
    beta.all <- rbind(beta.all,
                      data.frame(variable = paste0('covar', j), site = i,
                                 maximin_coeff = output[[i]]$beta.est[j]))
  }
}
beta.all$variable <- factor(beta.all$variable, levels = paste0('covar', 1:p))
ggplot(beta.all, aes(x = maximin_coeff)) +
  geom_density() +
  xlim(c(-0.2, 0.2)) +
  geom_vline(xintercept = 0, col = 'red', linetype = 2) +
  facet_wrap(. ~ variable, scales = 'free_y')
#> Warning: Removed 51 rows containing non-finite values (stat_density).
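The warning is consistent with how xlim() behaves in ggplot2: it sets the scale limits, and observations outside those limits are converted to missing values and dropped before the density is estimated. The sketch below builds the same long-format table without growing a data frame inside a loop, then counts how many coefficients fall outside the plotting window. The list fits is a hypothetical stand-in for the fitted output, filled with random numbers.

```r
set.seed(42)

# Hypothetical stand-in for the fitted output: 4 sites, 3 covariates each.
p <- 3
fits <- lapply(1:4, function(i) {
  list(beta.est = matrix(rnorm(p, sd = 0.2), ncol = 1))
})
L <- length(fits)

# Long format, one row per (site, covariate) pair, built in a single step.
beta.long <- data.frame(
  variable = factor(rep(paste0("covar", 1:p), times = L),
                    levels = paste0("covar", 1:p)),
  site = rep(1:L, each = p),
  maximin_coeff = unlist(lapply(fits, function(f) drop(f$beta.est)))
)

# Rows that a scale limit of [-0.2, 0.2] would drop from the density estimate.
n_outside <- sum(abs(beta.long$maximin_coeff) > 0.2)
n_outside
```

If dropping those values is not intended, replacing xlim(c(-0.2, 0.2)) with coord_cartesian(xlim = c(-0.2, 0.2)) zooms the panels without removing any data, so the densities are estimated from all coefficients.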