phecap_run_feature_extraction.Rd
Run surrogate-assisted feature extraction (SAFE) using unlabeled data and subsampling.
phecap_run_feature_extraction(data, surrogates, subsample_size = 1000L, num_subsamples = 200L, dropout_proportion = 0, frequency_cutoff = 0.5, start_seed = 45600L, verbose = 0L)
Argument | Description |
---|---|
data | An object of class PhecapData, obtained by calling PhecapData(...) |
surrogates | A list of objects of class PhecapSurrogate, for example list(PhecapSurrogate(...), PhecapSurrogate(...)) |
subsample_size | An integer scalar giving the size of each subsample |
num_subsamples | The number of subsamples drawn for each surrogate |
dropout_proportion | A scalar between 0 and 1. If it is positive, for each predictor a random subset of observations will be set to zero |
frequency_cutoff | A scalar between 0 and 1. A variable is kept in the final selection if it is selected in at least this proportion of the subsamples |
start_seed | An integer scalar. In the i-th subsample, the random seed is set to start_seed + i |
verbose | Print progress every verbose subsamples |
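
The seeding rule for start_seed and the zeroing described for dropout_proportion can be sketched in base R. This is a minimal illustration, not the package's code: draw_subsample is a hypothetical helper, and reading dropout_proportion as the fraction of rows zeroed independently in each column is an assumption.

```r
# Hypothetical helper (not part of PheCAP): draw the i-th subsample reproducibly.
draw_subsample <- function(x, i, subsample_size, start_seed = 45600L,
                           dropout_proportion = 0) {
  # Seed rule from the argument table: start_seed + i for the i-th subsample.
  set.seed(start_seed + i)
  rows <- sample.int(nrow(x), size = min(subsample_size, nrow(x)))
  sub <- x[rows, , drop = FALSE]
  if (dropout_proportion > 0) {
    # Assumed reading: zero this fraction of rows, chosen separately per column.
    n_drop <- floor(dropout_proportion * nrow(sub))
    for (j in seq_len(ncol(sub))) {
      drop_rows <- sample.int(nrow(sub), size = n_drop)
      sub[drop_rows, j] <- 0
    }
  }
  list(rows = rows, x = sub)
}

# Toy predictor matrix with strictly positive entries, so zeros come only from dropout.
x <- matrix(seq_len(200), nrow = 50, ncol = 4)
a <- draw_subsample(x, i = 1L, subsample_size = 20L, dropout_proportion = 0.25)
b <- draw_subsample(x, i = 1L, subsample_size = 20L, dropout_proportion = 0.25)
```

Because the seed depends only on start_seed and i, repeating the call for the same i reproduces the same rows and the same zeroed entries.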
In this unlabeled setting, the extremes of each surrogate are used to define cases and controls. The final selection consists of the variables selected in at least the proportion of subsamples given by frequency_cutoff (one half by default).
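
The idea of defining cases and controls from surrogate extremes and then keeping the features chosen in enough subsamples can be sketched in base R. This is a rough illustration under stated assumptions, not PheCAP's implementation: the data are simulated, the cutoffs lower_cutoff and upper_cutoff are hypothetical, and a simple two-sample t-statistic threshold stands in for whatever model the package fits within each subsample.

```r
set.seed(1)
n <- 2000L
y <- rbinom(n, 1, 0.3)                                   # latent phenotype, never used for selection
surrogate <- rpois(n, lambda = ifelse(y == 1, 5, 0.3))   # simulated surrogate count
features <- cbind(
  f1 = rnorm(n, mean = 2 * y),    # informative
  f2 = rnorm(n, mean = 1.5 * y),  # informative
  f3 = rnorm(n), f4 = rnorm(n), f5 = rnorm(n), f6 = rnorm(n)  # noise
)

lower_cutoff <- 0      # hypothetical: surrogate <= 0 is treated as a control
upper_cutoff <- 3      # hypothetical: surrogate >= 3 is treated as a case
subsample_size <- 400L
num_subsamples <- 50L
frequency_cutoff <- 0.5
start_seed <- 45600L

# Welch two-sample t statistic, used here only as a stand-in selector.
welch_t <- function(a, b) {
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

picked <- matrix(FALSE, nrow = num_subsamples, ncol = ncol(features),
                 dimnames = list(NULL, colnames(features)))
for (i in seq_len(num_subsamples)) {
  set.seed(start_seed + i)
  rows <- sample.int(n, subsample_size)
  s <- surrogate[rows]
  is_case <- s >= upper_cutoff        # upper extreme of the surrogate
  is_control <- s <= lower_cutoff     # lower extreme; values in between are ignored
  x <- features[rows, , drop = FALSE]
  t_stat <- apply(x, 2, function(col) welch_t(col[is_case], col[is_control]))
  picked[i, ] <- abs(t_stat) > 3
}

frequency <- colMeans(picked)                                  # selection proportion per feature
selected <- names(frequency)[frequency >= frequency_cutoff]    # features that pass the cutoff
```

Here frequency and selected play the roles of the two components described for the return value, though their names in this sketch are illustrative.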
An object of class PhecapFeatureExtraction, with components
the names of selected features
the proportion of subsamples in which each feature was selected
See PheCAP-package for code examples.