Divide-and-conquer (DAC) is a commonly used strategy for handling extraordinarily large datasets: the data are first broken into a series of data blocks, and the results from the individual blocks are then combined to obtain a final estimate. We propose a screening and one-step linearization infused DAC (SOLID) algorithm to fit sparse logistic regression to massive datasets by integrating the DAC strategy with a screening step and a sequence of linearizations.
The package solid consists of 1) a screening and one-step linearization infused DAC (SOLID) algorithm to fit sparse logistic regression to massive datasets, and 2) a modified cross-validation (MCV) procedure that reuses the side products of SOLID and thereby substantially reduces the computational burden.
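To make the divide-and-conquer idea concrete, here is a minimal sketch in base R of a naive DAC fit: split the rows into blocks, fit an ordinary logistic regression on each block, and average the block-level coefficients. This is an illustration only and is not the SOLID algorithm: it has no screening step, no one-step linearization, and no sparsity penalty. The simulated data and all names (`X`, `y`, `K`, `block`, `fit_block`, `beta_dac`) are assumptions made for the example and are not part of the solid package.

```r
# Naive divide-and-conquer logistic regression (illustration only; NOT SOLID).
set.seed(1)

# Simulated data; sizes and coefficients are hypothetical.
n <- 20000
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(1, -1, 0.5, 0, 0)
y <- rbinom(n, 1, plogis(drop(X %*% beta_true)))

# Divide: assign each row to one of K blocks.
K <- 10
block <- rep(seq_len(K), length.out = n)

# Conquer: fit an unpenalized logistic regression on a single block
# and return its coefficients (intercept first).
fit_block <- function(k) {
  idx <- block == k
  unname(coef(glm(y[idx] ~ X[idx, ], family = binomial())))
}
block_coefs <- sapply(seq_len(K), fit_block)  # (p + 1) x K matrix

# Combine: average the block-level estimates.
beta_dac <- rowMeans(block_coefs)

# Reference: the fit on the full data, which the DAC estimate approximates.
beta_full <- unname(coef(glm(y ~ X, family = binomial())))

print(round(cbind(dac = beta_dac, full = beta_full), 3))
```

Simple averaging is the most basic way to combine block results. It is used here only because it is short; the SOLID paper cited below describes the combination scheme the package actually implements.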
Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("celehs/solid")
Hong C, Wang Y, Cai T. A divide-and-conquer method for sparse risk prediction and evaluation. Biostatistics (Oxford, England). 2020 Sep. https://doi.org/10.1093/biostatistics/kxaa031