The function requires as input:

* a surrogate, such as the ICD code
* the healthcare utilization

It can leverage other EHR features (optional) to assist risk prediction.

PheNorm.Prob(
  nm.logS.ori,
  nm.utl,
  dat,
  nm.X = NULL,
  corrupt.rate = 0.3,
  train.size = 10 * nrow(dat)
)

Arguments

nm.logS.ori

name of the surrogates (log(ICD+1), log(NLP+1) and log(ICD+NLP+1))

nm.utl

name of the healthcare utilization column (e.g. note count, encounter_num, etc.)

dat

the data; all columns must be log-transformed and must have column names

nm.X

names of additional features other than the main ICD and NLP surrogates (optional), default value NULL

corrupt.rate

rate for random corruption denoising, between 0 and 1, default value 0.3

train.size

size of training sample, default value 10 * nrow(dat)
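
To make the roles of corrupt.rate and train.size more concrete, the following is a minimal base R sketch of random corruption on a resampled training set. It is an illustration under stated assumptions, not the package's internal code: the matrix `x`, its column names, and the choice to replace corrupted entries with the column mean are all hypothetical.

```r
set.seed(1234)

# Hypothetical feature matrix standing in for log-transformed EHR features
x <- matrix(rnorm(50 * 3), nrow = 50,
            dimnames = list(NULL, c("f1", "f2", "f3")))

corrupt.rate <- 0.3          # probability that a single entry is corrupted
train.size <- 10 * nrow(x)   # mirrors the documented default, 10 * nrow(dat)

# Build the training sample by drawing rows with replacement
id <- sample(seq_len(nrow(x)), train.size, replace = TRUE)
x.train <- x[id, , drop = FALSE]

# Flag each entry independently with probability corrupt.rate
mask <- matrix(rbinom(length(x.train), 1, corrupt.rate) == 1,
               nrow = nrow(x.train))

# Assumption: a corrupted entry is replaced by the mean of its column
col.means <- colMeans(x.train)
x.corrupt <- x.train
x.corrupt[mask] <- col.means[col(x.train)[mask]]

# Observed share of corrupted entries, which should be close to corrupt.rate
mean(mask)
```

A larger train.size yields more corrupted copies of each observation, while a corrupt.rate closer to 1 leaves less of the original signal in each copy.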

Value

list containing the predicted probabilities (element probs in the example below) and the beta coefficients

Examples

if (FALSE) {
  set.seed(1234)
  fit.dat <- read.csv("https://raw.githubusercontent.com/celehs/PheNorm/master/data-raw/data.csv")
  fit.phenorm <- PheNorm.Prob("ICD", "utl", fit.dat, nm.X = NULL,
                              corrupt.rate = 0.3, train.size = nrow(fit.dat))
  head(fit.phenorm$probs)
}
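
Because dat must contain log-transformed, named columns, it may help to see how such a table could be built from raw counts. The sketch below uses simulated Poisson counts; the data, the sample size, and the column names ICD, NLP, ICDNLP and utl are assumptions chosen to match the surrogate forms listed under nm.logS.ori, not values taken from the package.

```r
set.seed(1234)
n <- 200

# Hypothetical raw counts per patient (simulated, for illustration only)
raw <- data.frame(
  ICD = rpois(n, lambda = 2),   # ICD code count
  NLP = rpois(n, lambda = 3),   # NLP mention count
  utl = rpois(n, lambda = 20)   # healthcare utilization, e.g. note count
)

# Apply log(x + 1) to every column and keep explicit column names
dat <- data.frame(
  ICD    = log1p(raw$ICD),
  NLP    = log1p(raw$NLP),
  ICDNLP = log1p(raw$ICD + raw$NLP),
  utl    = log1p(raw$utl)
)

head(dat)

# With the PheNorm package loaded, a table like this could be passed on
# in the same way as in the example above:
# fit <- PheNorm.Prob("ICD", "utl", dat, nm.X = NULL,
#                     corrupt.rate = 0.3, train.size = nrow(dat))
```

Using log1p(x) rather than log(x + 1) gives the same transform and is slightly more accurate for small counts.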