The function requires as input:

* a surrogate, such as the ICD code
* the healthcare utilization

It can leverage other EHR features (optional) to assist risk prediction.
PheNorm.Prob( nm.logS.ori, nm.utl, dat, nm.X = NULL, corrupt.rate = 0.3, train.size = 10 * nrow(dat) )
| Argument | Description |
|---|---|
| nm.logS.ori | names of the surrogate columns (log(ICD+1), log(NLP+1) and log(ICD+NLP+1)) |
| nm.utl | name of the healthcare utilization column (e.g. note count, encounter_num) |
| dat | input data; all columns must be log-transformed and have column names |
| nm.X | names of additional features other than the main ICD and NLP surrogates |
| corrupt.rate | rate of random corruption for denoising, between 0 and 1; default 0.3 |
| train.size | size of the training sample; default 10 * nrow(dat) |
    
A list containing the predicted probabilities and the beta coefficients.
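
The following is a minimal sketch of how the inputs might be prepared and passed to the function. The data are simulated, and the column names (`ICD`, `NLP`, `ICDNLP`, `note_count`) are hypothetical choices made for illustration. Applying `log(x + 1)` to the utilization column is also an assumption; the documentation only states that all columns must be log-transformed. The call is guarded so that the data preparation runs even when the PheNorm package is not installed.

```r
# Simulated raw counts for illustration only; not real EHR data.
set.seed(1)
n <- 200
raw <- data.frame(
  ICD        = rpois(n, lambda = 2),   # ICD code counts
  NLP        = rpois(n, lambda = 3),   # NLP mention counts
  note_count = rpois(n, lambda = 10)   # healthcare utilization
)

# Build the log-transformed input with named columns.
dat <- data.frame(
  ICD        = log(raw$ICD + 1),
  NLP        = log(raw$NLP + 1),
  ICDNLP     = log(raw$ICD + raw$NLP + 1),
  note_count = log(raw$note_count + 1)  # assumed transform for utilization
)

# Call PheNorm.Prob only if the package is available.
if (requireNamespace("PheNorm", quietly = TRUE)) {
  fit <- PheNorm::PheNorm.Prob(
    nm.logS.ori  = c("ICD", "NLP", "ICDNLP"),
    nm.utl       = "note_count",
    dat          = dat,
    nm.X         = NULL,            # no additional features in this sketch
    corrupt.rate = 0.3,
    train.size   = 10 * nrow(dat)
  )
  str(fit)  # inspect the returned list
}
```

To use additional EHR features, add them to `dat` as log-transformed columns and pass their names in `nm.X`.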