Train Phenotyping Model using the Training Labels

Train the phenotyping model on the training dataset, and evaluate its performance via random splits of the training dataset.

phecap_train_phenotyping_model(
  data, surrogates, feature_selected,
  method = "lasso_bic",
  train_percent = 0.7, num_splits = 200L,
  start_seed = 78900L, verbose = 0L)

Arguments

data	An object of class `PhecapData`, obtained by calling `PhecapData(...)`.
surrogates	A list of objects of class `PhecapSurrogate`, obtained by something like `list(PhecapSurrogate(...), PhecapSurrogate(...))`. The surrogates used here may differ from those used in feature extraction.
feature_selected	A character vector of the features to include in the model, typically (but not necessarily) returned by `phecap_run_feature_extraction`. The features listed here may differ from those returned by feature extraction.
method	Either a character vector or a list of two components. If a character vector is used, possible entries are given below. When at least two methods are specified, the predicted probability is the simple average of the predicted probabilities from each method.
	- `'plain'` (logistic regression without penalty)
	- `'ridge_cv'` (logistic regression with ridge penalty and CV tuning)
	- `'lasso_cv'` (logistic regression with lasso penalty and CV tuning)
	- `'lasso_bic'` (logistic regression with lasso penalty and BIC tuning)
	- `'alasso_cv'` (logistic regression with adaptive lasso penalty and CV tuning)
	- `'alasso_bic'` (logistic regression with adaptive lasso penalty and BIC tuning)
	- `'svm'` (support vector machine with CV tuning, package `e1071` needed, `subject_weight` not supported)
	- `'rf'` (random forest with default parameters, package `randomForestSRC` needed)
	- `'xgb'` (extreme gradient boosting with default parameters, package `xgboost` needed)
	If a list is used, it should contain two named components as follows.
	- `fit` (a function for model fitting, with arguments `x`, `y`, `subject_weight`, `penalty_weight`)
	- `predict` (a function for prediction, with arguments `object`, which was returned by `fit`, and `x`, which is the new data to predict on)
train_percent	The proportion (between 0 and 1) of labels used for model training in each random split.
num_splits	The number of random splits.
start_seed	The starting seed. In the i-th split, the seed is set to `start_seed + i`.
verbose	Print progress every `verbose` splits if `verbose` is positive, or remain quiet if `verbose` is zero.
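
As an illustration of the list form of `method`, the following is a minimal sketch of a `fit`/`predict` pair built on base R's `glm.fit`. The names `custom_fit`, `custom_predict`, and `custom_method` are hypothetical. It is an assumption (not stated in this page) that `predict` should return probabilities and that an unpenalized model may ignore `penalty_weight`.

```r
# Hypothetical custom method: unpenalized, weighted logistic regression.
# Assumption: x is a numeric matrix and y is a 0/1 vector.
custom_fit <- function(x, y, subject_weight = NULL, penalty_weight = NULL) {
  x <- as.matrix(x)
  if (is.null(subject_weight)) subject_weight <- rep(1, nrow(x))
  # penalty_weight is accepted for interface compatibility but unused here.
  fit <- suppressWarnings(
    glm.fit(cbind(1, x), y, weights = subject_weight, family = binomial())
  )
  coef <- fit$coefficients
  coef[is.na(coef)] <- 0  # aliased (collinear) columns get a zero coefficient
  coef
}

# Assumption: predictions are returned as probabilities in [0, 1].
custom_predict <- function(object, x) {
  drop(plogis(cbind(1, as.matrix(x)) %*% object))
}

custom_method <- list(fit = custom_fit, predict = custom_predict)

# Toy check on simulated data (no PheCAP objects involved).
set.seed(1)
x <- matrix(rnorm(200), nrow = 100, ncol = 2)
y <- rbinom(100, 1, plogis(x[, 1]))
obj <- custom_method$fit(x, y, subject_weight = rep(1, 100), penalty_weight = NULL)
p <- custom_method$predict(obj, x)
```

Under these assumptions, `custom_method` could be supplied as `method = custom_method` in place of a character vector.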

Value

An object of class `PhecapModel`, with components:

coefficients	the fitted object
method	the method used for model training
feature_selected	the features selected by SAFE
train_roc	ROC on training dataset
train_auc	AUC on training dataset
split_roc	average ROC on random splits of training dataset
split_auc	average AUC on random splits of training dataset
fit_function	the function used for fitting
predict_function	the function used for prediction
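
The `split_auc` component summarizes repeated random splits of the labeled data. The sketch below illustrates the idea in base R with a plain logistic regression. It is not the package's internal code, and the helper names `auc_rank` and `split_auc_sketch` are hypothetical. The seeding rule `start_seed + i` follows the description of `start_seed`.

```r
# AUC from the Mann-Whitney rank statistic (ties receive mid-ranks).
auc_rank <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Average held-out AUC over repeated random splits.
# Note: set.seed() inside the loop changes the global RNG state.
split_auc_sketch <- function(x, y, train_percent = 0.7,
                             num_splits = 200L, start_seed = 78900L) {
  n <- length(y)
  aucs <- vapply(seq_len(num_splits), function(i) {
    set.seed(start_seed + i)                      # seed for the i-th split
    train <- sample.int(n, size = round(train_percent * n))
    train_df <- data.frame(y = y[train], x[train, , drop = FALSE])
    test_df <- data.frame(x[-train, , drop = FALSE])
    fit <- glm(y ~ ., data = train_df, family = binomial())
    p <- predict(fit, newdata = test_df, type = "response")
    auc_rank(p, y[-train])                        # evaluate on held-out labels
  }, numeric(1))
  mean(aucs)
}

# Toy data: the first column is informative, the second is noise.
set.seed(42)
x <- matrix(rnorm(400), nrow = 200, ncol = 2)
y <- rbinom(200, 1, plogis(2 * x[, 1]))
est <- split_auc_sketch(x, y, num_splits = 20L)
```

Because each split is seeded deterministically, repeating the call with the same `start_seed` reproduces the same estimate.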

See also