Run surrogate-assisted feature extraction (SAFE) using unlabeled data and subsampling.

phecap_run_feature_extraction(
  data, surrogates,
  subsample_size = 1000L, num_subsamples = 200L,
  dropout_proportion = 0, frequency_cutoff = 0.5,
  start_seed = 45600L, verbose = 0L)

Arguments

data

An object of class PhecapData, obtained by calling PhecapData(...)

surrogates

A list of objects of class PhecapSurrogate, typically built as list(PhecapSurrogate(...), PhecapSurrogate(...))

subsample_size

An integer scalar giving the size of each subsample

num_subsamples

The number of subsamples drawn for each surrogate

dropout_proportion

A scalar between 0 and 1. If positive, a random subset of the observations of each predictor is set to zero

frequency_cutoff

A scalar between 0 and 1. A variable is kept in the final selection if it is selected in at least this proportion of the subsamples

start_seed

The base random seed. In the i-th subsample, the seed is set to start_seed + i

verbose

If positive, progress is printed once every verbose subsamples. If zero, nothing is printed
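
The effect of dropout_proportion can be pictured with a small base-R sketch. It is not the package's code: apply_dropout is a hypothetical helper, and it assumes that a fixed number of rows, round(n * dropout_proportion), is zeroed independently for each predictor. The package may draw the subset differently.

```r
# Hypothetical helper, not part of PheCAP: zero out a random subset of
# observations separately for each predictor (column) of a numeric matrix.
apply_dropout <- function(x, dropout_proportion) {
  stopifnot(is.matrix(x), dropout_proportion >= 0, dropout_proportion <= 1)
  # Assumption: the subset size is a fixed count rather than a coin flip per cell.
  n_drop <- round(nrow(x) * dropout_proportion)
  for (j in seq_len(ncol(x))) {
    rows <- sample.int(nrow(x), n_drop)  # a fresh random subset per predictor
    x[rows, j] <- 0
  }
  x
}

set.seed(1)
x <- matrix(1, nrow = 10, ncol = 3)
x_dropped <- apply_dropout(x, dropout_proportion = 0.3)
zeros_per_column <- colSums(x_dropped == 0)
```

With dropout_proportion = 0 the matrix is returned unchanged, which matches the default behavior described above.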

Details

In this unlabeled setting, the extremes of each surrogate are used to define cases and controls. The final selection consists of the variables selected in at least frequency_cutoff (by default, half) of the subsamples.
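
The procedure described above can be sketched in base R for a single surrogate. This is an illustration under stated assumptions, not the package's implementation: safe_sketch, lower_cutoff, and upper_cutoff are hypothetical names, and a simple correlation threshold stands in for whatever model the package fits in each subsample.

```r
# Hypothetical base-R stand-in for the subsampling loop; not PheCAP's code.
safe_sketch <- function(x, surrogate, lower_cutoff, upper_cutoff,
                        subsample_size = 50L, num_subsamples = 20L,
                        frequency_cutoff = 0.5, start_seed = 45600L) {
  # Extremes of the surrogate define pseudo-labels; the middle range is unused.
  is_case <- surrogate >= upper_cutoff
  is_control <- surrogate <= lower_cutoff
  extreme <- which(is_case | is_control)
  y <- as.numeric(is_case[extreme])
  x <- x[extreme, , drop = FALSE]

  picked <- matrix(FALSE, nrow = num_subsamples, ncol = ncol(x),
                   dimnames = list(NULL, colnames(x)))
  for (i in seq_len(num_subsamples)) {
    set.seed(start_seed + i)  # seed rule documented for start_seed
    rows <- sample.int(nrow(x), min(subsample_size, nrow(x)))
    # Stand-in selector (assumption): keep features whose absolute
    # correlation with the pseudo-label exceeds 0.3.
    r <- abs(cor(x[rows, , drop = FALSE], y[rows]))[, 1]
    picked[i, ] <- !is.na(r) & r > 0.3
  }

  # Selection frequency per feature, then the frequency_cutoff rule.
  frequency <- colMeans(picked)
  list(selected = names(frequency)[frequency >= frequency_cutoff],
       frequency = frequency)
}

# Simulated data: one predictor tracks the surrogate, the other is unrelated.
set.seed(1)
n <- 400
signal <- rpois(n, 3)
x <- cbind(related = signal + rpois(n, 1), noise = rpois(n, 3))
result <- safe_sketch(x, surrogate = signal, lower_cutoff = 1, upper_cutoff = 5)
```

Because the seed of each subsample depends only on start_seed and i, repeating the call with the same inputs reproduces the same subsamples and therefore the same frequencies.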

Value

An object of class PhecapFeatureExtraction, with components

selected

the names of selected features

frequency

the proportion of subsamples in which each feature was selected
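
Since frequency is kept alongside selected, a different threshold can be tried without repeating the subsampling. The sketch below uses a plain list as a stand-in for the returned object and assumes its components can be read with $. The feature names and values are placeholders, not package output.

```r
# Stand-in for a PhecapFeatureExtraction object (placeholder values only).
extraction <- list(
  selected = c("feature_a", "feature_b"),
  frequency = c(feature_b = 0.70, feature_a = 0.95, feature_c = 0.20))

# Rank features by selection frequency, then apply a stricter cutoff of 0.9.
ranked <- sort(extraction$frequency, decreasing = TRUE)
stricter <- names(ranked)[ranked >= 0.9]
```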

See also

See PheCAP-package for code examples.