R/dcalasso.R
dcalasso.Rd
dcalasso
fits the adaptive lasso for big datasets using multiple linearization methods,
including one-step estimation and least square approximation. This function can
fit the adaptive lasso model either when the dataset is loaded as a whole into data,
or when the dataset has been split a priori and saved into multiple .rds files.
The algorithm uses a divide-and-conquer one-step estimator as the initial estimator
and a least square approximation to the partial likelihood, which
reduces the computational cost. The algorithm currently supports the adaptive lasso for
the Cox proportional hazards model with or without
time-dependent covariates. Ties in the survival data are handled by Efron's method.
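When the data do not fit in memory, the splits can be prepared ahead of time and passed through data.rds. A minimal sketch in base R of producing such files, assuming a toy data frame dat (the data frame, its column names, and the file paths are all illustrative, not part of the package):

```r
# Illustrative only: split a data frame into K chunks and save each
# chunk as an .rds file, to be passed to dcalasso via data.rds.
set.seed(1)
dat <- data.frame(start = 0, stop = rexp(100), status = rbinom(100, 1, 0.5),
                  x1 = rnorm(100), x2 = rnorm(100))
K <- 4
# Random assignment of rows to K roughly equal chunks
idx <- split(sample(nrow(dat)), rep_len(seq_len(K), nrow(dat)))
files <- file.path(tempdir(), sprintf("chunk%d.rds", seq_len(K)))
for (k in seq_len(K)) saveRDS(dat[idx[[k]], ], files[k])
```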
The first half of the routine computes an initial estimator (a sqrt(n)-consistent estimator). It first obtains a warm start by
fitting coxph to the first subset (the first random split of data, or the first file listed in data.rds), and then refines the
warm start with iter.os rounds of one-step estimation. Each round loops through the subsets, gathering their scores
and information matrices. The second half of the routine then shrinks the initial estimator using a least square approximation-based adaptive lasso step.
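The gather-and-update step can be illustrated with a simplified sketch in base R, using least-squares score and information matrices in place of the Cox partial likelihood quantities (the package's actual internals differ; all names here are illustrative):

```r
# Simplified divide-and-conquer one-step update: warm-start on the
# first chunk, then a Newton step using score and information
# accumulated over all chunks. Least squares stands in for the
# Cox partial likelihood.
set.seed(2)
n <- 1000; p <- 3; K <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X %*% c(1, -1, 0.5) + rnorm(n)
chunks <- split(seq_len(n), rep_len(seq_len(K), n))

# Warm start from the first chunk only
b <- coef(lm.fit(X[chunks[[1]], ], y[chunks[[1]]]))

# Accumulate score U and information I across all chunks
U <- numeric(p); I <- matrix(0, p, p)
for (idx in chunks) {
  Xi <- X[idx, ]; ri <- y[idx] - Xi %*% b
  U <- U + crossprod(Xi, ri)   # chunk score contribution
  I <- I + crossprod(Xi)       # chunk information contribution
}
b_onestep <- b + solve(I, U)   # one-step (Newton) update on pooled quantities
```

For least squares the one-step update recovers the full-data estimator exactly; for the Cox partial likelihood it is an approximation, which is why iter.os rounds are used.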
dcalasso(formula, family = cox.ph(), data = NULL, data.rds = NULL, weights, subset, na.action, offset, lambda = 10^seq(-10, 3, 0.01), gamma = 1, K = 20, iter.os = 2, ncores = 1)
formula | a formula specifying the model. For the Cox model, the outcome should be specified as a Surv(start, stop, status) or Surv(time, status) object from the survival package. |
---|---|
family | For Cox model, family should be cox.ph(), or "cox.ph". |
data | data frame containing all variables. |
data.rds | when the dataset is too big to load as a whole into the RAM, one can specify a character vector of paths to .rds files, each containing one split of the dataset. |
weights | prior weights on each observation |
subset | an expression indicating the subset of rows of data to use in model fitting |
na.action | how to handle missing values (NAs) |
offset | an offset term with a fixed coefficient of one |
lambda | tuning parameter sequence for the adaptive lasso penalty: penalty = lambda * sum_j |beta_j| / |beta_j,initial|^gamma |
gamma | exponent of the adaptive weights in the penalty: penalty = lambda * sum_j |beta_j| / |beta_j,initial|^gamma |
K | number of divisions of the full dataset. It will be overwritten to the number of .rds files when data.rds is specified. |
iter.os | number of iterations for one-step updates |
ncores | number of cores to use. When ncores > 1, the computation over subsets is parallelized. |
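Putting the arguments together, a hypothetical call might look as follows; dat, the covariate names x1 and x2, and the .rds file paths are placeholders for the user's own data, not objects shipped with the package:

```r
library(dcalasso)
library(survival)

# In-memory data: a single data frame with the survival outcome and covariates
fit <- dcalasso(Surv(start, stop, status) ~ x1 + x2, family = cox.ph(),
                data = dat, K = 10, iter.os = 2, ncores = 2)

# Pre-split data: paths to .rds files, each holding one chunk
fit2 <- dcalasso(Surv(start, stop, status) ~ x1 + x2, family = cox.ph(),
                 data.rds = c("chunk1.rds", "chunk2.rds"), ncores = 2)
```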
adaptive lasso shrinkage estimation
initial unregularized estimator
variance-covariance matrix of unpenalized model
variance-covariance matrix of penalized model
sequence of BIC evaluation at each lambda
number used to penalize the degrees of freedom in BIC
number of used rows of the data
index for the optimal BIC
minimal BIC
family object of the model
optimal lambda to minimize BIC
degrees of freedom at each lambda
number of covariates
number of one-step iterations
term object of the model
Wang, Yan, Chuan Hong, Nathan Palmer, Qian Di, Joel Schwartz, Isaac Kohane, and Tianxi Cai. "A Fast Divide-and-Conquer Sparse Cox Regression." arXiv preprint arXiv:1804.00735 (2018).
Yan Wang <yaw719@mail.harvard.edu>, Tianxi Cai <tcai@hsph.harvard.edu>, Chuan Hong <Chuan_Hong@hms.harvard.edu>