Surrogate-guided ensemble Latent Dirichlet Allocation
sureLDA(
  X,
  ICD,
  NLP,
  HU,
  filter,
  prior = "PheNorm",
  weight = "beta",
  nEmpty = 20,
  alpha = 100,
  beta = 100,
  burnin = 50,
  ITER = 150,
  phi = NULL,
  nCores = 1,
  labeled = NULL,
  verbose = FALSE
)
X nPatients x nFeatures matrix of EHR feature counts
ICD nPatients x nPhenotypes matrix of main ICD surrogate counts
NLP nPatients x nPhenotypes matrix of main NLP surrogate counts
HU nPatients-dimensional vector containing the healthcare utilization feature
filter nPatients x nPhenotypes binary matrix indicating filter-positives
prior 'PheNorm', 'MAP', or nPatients x nPhenotypes matrix of prior probabilities (defaults to 'PheNorm')
weight 'beta', 'uniform', or nPhenotypes x nFeatures matrix of feature weights (defaults to 'beta')
nEmpty Number of 'empty' topics to include in LDA step (defaults to 20)
alpha LDA Dirichlet hyperparameter for patient-topic distribution (defaults to 100)
beta LDA Dirichlet hyperparameter for topic-feature distribution (defaults to 100)
burnin Number of burn-in Gibbs iterations (defaults to 50)
ITER Number of subsequent iterations for inference (defaults to 150)
phi (optional) nPhenotypes x nFeatures pre-trained topic-feature distribution matrix
nCores (optional) Number of parallel cores to use; only used if phi is provided (defaults to 1)
labeled (optional) nPatients x nPhenotypes matrix of a priori labels (set missing entries to NA)
verbose (optional) Logical indicating whether to print verbose progress updates (defaults to FALSE)
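The dimension requirements on the inputs can be easier to follow with concrete objects. The sketch below builds simulated inputs in base R only; the toy sizes, the Poisson draws, and the rule used to build filter are all assumptions made for illustration, not part of the package. The call to sureLDA is left commented out because it requires the package to be installed and may not behave sensibly on random data.

```r
# Hypothetical toy dimensions, chosen only for illustration
set.seed(1)
nPatients <- 50
nFeatures <- 10
nPhenotypes <- 2

# nPatients x nFeatures matrix of EHR feature counts (simulated)
X <- matrix(rpois(nPatients * nFeatures, lambda = 2),
            nrow = nPatients, ncol = nFeatures)

# nPatients x nPhenotypes matrices of main ICD and NLP surrogate counts (simulated)
ICD <- matrix(rpois(nPatients * nPhenotypes, lambda = 1), nrow = nPatients)
NLP <- matrix(rpois(nPatients * nPhenotypes, lambda = 1), nrow = nPatients)

# nPatients-dimensional healthcare utilization vector (simulated, kept positive)
HU <- rpois(nPatients, lambda = 5) + 1

# Binary filter matrix; flagging patients with any nonzero surrogate is an
# assumed rule for this sketch, not the package's definition of a filter
filter <- (ICD > 0 | NLP > 0) * 1

# Check that every input has the shape the argument list asks for
stopifnot(
  nrow(X) == nPatients,
  identical(dim(ICD), dim(NLP)),
  identical(dim(filter), dim(ICD)),
  length(HU) == nPatients
)

# With the package installed, a call with default settings would look like:
# fit <- sureLDA(X, ICD, NLP, HU, filter)
```

Checking shapes before the call is cheap and catches the most common mistake, which is passing a phenotype-level matrix with patients in columns instead of rows.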
scores nPatients x nPhenotypes matrix of weighted patient-phenotype assignment counts from LDA step
probs nPatients x nPhenotypes matrix of patient-phenotype posterior probabilities
ensemble Mean of sureLDA posterior and PheNorm/MAP prior
prior nPatients x nPhenotypes matrix of PheNorm/MAP phenotype probability estimates
phi nPhenotypes x nFeatures topic distribution matrix from LDA step
weights nPhenotypes x nFeatures matrix of topic-feature weights
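To illustrate how the ensemble element relates to probs and prior, the sketch below combines two made-up probability matrices. It assumes that "mean" is an element-wise arithmetic mean, which the description suggests but does not state explicitly; the numbers are invented for the example and are not output from sureLDA.

```r
# Toy posterior and prior for 3 patients x 2 phenotypes (made-up values)
probs <- matrix(c(0.9, 0.2, 0.6,
                  0.1, 0.8, 0.4), nrow = 3)
prior <- matrix(c(0.7, 0.4, 0.2,
                  0.3, 0.6, 0.8), nrow = 3)

# Assumed combination rule: element-wise arithmetic mean of the two matrices
ensemble <- (probs + prior) / 2

# The result keeps the nPatients x nPhenotypes shape and stays within [0, 1]
stopifnot(identical(dim(ensemble), dim(probs)))
stopifnot(all(ensemble >= 0 & ensemble <= 1))
print(ensemble)
```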