Electronic Health Record (EHR)

PheCAP High-throughput phenotyping with electronic medical record data using a common semi-supervised approach. This package implements surrogate-assisted feature extraction (SAFE) and common machine learning approaches to train and validate phenotyping models. Link
PheNorm The algorithm combines the most predictive variable, such as count of the main International Classification of Diseases (ICD) codes, and other Electronic Health Record (EHR) features (e.g. health utilization and processed clinical note data), to obtain a score for accurate risk prediction and disease classification. Link
sureLDA A statistical learning method to simultaneously predict a range of target phenotypes using codified and natural language processing (NLP)-derived Electronic Health Record (EHR) data. Link
SCORNET The Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET) is a consistent, semi-supervised, non-parametric survival curve estimator optimized for efficient use of EHR data with a limited number of current status labels. Link
SAMGEP A novel semi-supervised machine learning algorithm to predict phenotype event times using Electronic Health Record (EHR) data. Link
MAP An automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). Link
KESER This package implements the Knowledge Extraction via Sparse Embedding Regression (KESER) algorithm. We provide functions to use large-scale code embeddings to facilitate effective feature selection and knowledge discovery with EHR data. Link
MASTA This package implements Multi-modal Automated Survival Time Annotation (MASTA), a two-step semi-supervised learning method for predicting event times from longitudinal encounter records. Link


Natural Language Processing (NLP)

CHANL The Chart Review Tool Powered by NLP (CHANL) is designed to facilitate chart review of narrative text notes from electronic medical records (EMR). Link
NILE NILE is efficient and effective software for natural language processing (NLP) of clinical narrative texts. It uses a prefix tree algorithm for named entity recognition and finite-state machines for semantic analysis, both of which were inspired by the natural reading behavior of humans. Link
EXTEND EXTEND is a natural language processing (NLP) tool that can efficiently extract numerical clinical data from different types of narrative notes with high accuracy. Link
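
The prefix tree approach to named entity recognition mentioned in the NILE entry above can be illustrated with a small sketch. This is not NILE's implementation: the term dictionary, the labels, and the function names (`build_trie`, `find_entities`) are hypothetical, and the matching rule assumed here is greedy longest match over whitespace-separated tokens.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a token-level prefix tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None  # set when a dictionary term ends here


def build_trie(terms: Dict[str, str]) -> TrieNode:
    """Insert every dictionary phrase into the trie, one token per level."""
    root = TrieNode()
    for phrase, label in terms.items():
        node = root
        for token in phrase.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.label = label
    return root


def find_entities(text: str, root: TrieNode) -> List[Tuple[str, str]]:
    """Scan left to right, keeping the longest dictionary match at each position."""
    tokens = text.lower().split()
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        node = root
        best_end, best_label = None, None
        j = i
        # Walk down the trie as long as the next token continues some term.
        while j < len(tokens) and tokens[j] in node.children:
            node = node.children[tokens[j]]
            j += 1
            if node.label is not None:
                best_end, best_label = j, node.label
        if best_end is not None:
            found.append((" ".join(tokens[i:best_end]), best_label))
            i = best_end  # resume after the matched phrase
        else:
            i += 1
    return found


# Hypothetical dictionary, for illustration only.
TERMS = {"heart failure": "DISEASE", "heart": "ANATOMY", "chest pain": "SYMPTOM"}
trie = build_trie(TERMS)
```

Calling `find_entities("Patient reports chest pain and heart failure", trie)` matches "heart failure" as a single disease term rather than stopping at "heart", because the scan keeps walking the trie until no longer term can be completed.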


Precision Medicine

OptimalSurrogate Provides functions to identify an optimal transformation of a potential surrogate marker, such that the proportion of the treatment effect on a primary outcome can be inferred from the treatment effect on the transformed marker, and to estimate the proportion of treatment effect explained by this optimal transformation. Link
PanelCurrentStatus This package contains R functions to compute the conditional censoring logistic (CCL) estimator and model metrics to evaluate risk predictions using panel current status data. Link
survAccuracyMeasures This R package computes non-parametric and semi-parametric estimates of common accuracy measures for risk prediction markers from survival data. Link
survCompetingRisk This package helps evaluate the prognostic accuracy of a marker with multiple competing risk events. Functions to calculate the AUC, ROC, PPV, and NPV are provided. Link
survMarkerTwoPhase This R package computes non-parametric and semi-parametric estimates of accuracy measures for risk prediction markers from survival data under two-phase study designs. Link
SurrogateOutcome Provides functions to estimate the proportion of treatment effect on a censored primary outcome that is explained by the treatment effect on a censored surrogate outcome/event. Link
SurrogateTest Provides functions to test for a treatment effect in terms of the difference in survival between a treatment group and a control group using surrogate marker information obtained at some early time point in a time-to-event outcome setting. Link
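
For readers unfamiliar with the accuracy measures named in the entries above (AUC, PPV, NPV), the following is a minimal sketch of their definitions for a plain binary outcome. It deliberately ignores censoring and competing risks, which the packages listed here are designed to handle, so it is not a substitute for them. The function names and the example data are hypothetical.

```python
from typing import Sequence, Tuple


def ppv_npv(scores: Sequence[float], labels: Sequence[int],
            threshold: float) -> Tuple[float, float]:
    """PPV and NPV when a marker value >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= threshold:
            if label == 1:
                tp += 1
            else:
                fp += 1
        else:
            if label == 1:
                fn += 1
            else:
                tn += 1
    # Return NaN when no subject falls on one side of the threshold.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (case, control) pairs ranked correctly; ties count 1/2."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical marker values and outcomes, for illustration only.
SCORES = [0.9, 0.8, 0.4, 0.3]
LABELS = [1, 0, 1, 0]
```

The pairwise form of the AUC is quadratic in the sample size; it is used here because it mirrors the definition directly, not because it is efficient.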


High Dimensional Inference

dcalasso Fast divide-and-conquer Cox proportional hazards model with adaptive lasso. The dcalasso package aims to fit Cox proportional hazards models to extremely large datasets, where both n and p are large and n >> p. Link
solid The package consists of 1) a screening and one-step linearization infused divide-and-conquer (DAC) algorithm, SOLID, to fit sparse logistic regression to massive datasets, and 2) a modified cross-validation (MCV) that uses the by-products of SOLID and hence substantially reduces the computational burden. Link
StructureMC Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Link
SRAT SRAT is a fully rank-based and flexible approach to test for association between a set of genetic variants and an outcome, while accounting for within-family correlation and adjusting for covariates. SRAT includes the well-known Wilcoxon rank sum test as a special case. Link
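
The divide-and-conquer idea behind the dcalasso and solid entries above can be sketched on a much simpler problem. The example below is an assumption-laden illustration, not either package's algorithm: it fits ordinary least squares with a single predictor by summarizing each data chunk separately and combining the summaries, whereas the packages target Cox and sparse logistic regression. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
# Per-chunk summary: n, sum(x), sum(y), sum(x*x), sum(x*y)
Summary = Tuple[int, float, float, float, float]


def summarize(chunk: Sequence[Point]) -> Summary:
    """Reduce one chunk to the sufficient statistics of a straight-line fit."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy


def combine(summaries: Iterable[Summary]) -> Summary:
    """Add chunk summaries; the raw data is never needed again."""
    n, sx, sy, sxx, sxy = 0, 0.0, 0.0, 0.0, 0.0
    for cn, csx, csy, csxx, csxy in summaries:
        n += cn
        sx += csx
        sy += csy
        sxx += csxx
        sxy += csxy
    return n, sx, sy, sxx, sxy


def fit_line(summary: Summary) -> Tuple[float, float]:
    """Return (intercept, slope) of the least-squares line from a summary."""
    n, sx, sy, sxx, sxy = summary
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x values must not all be equal")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


def split(data: Sequence[Point], size: int) -> List[Sequence[Point]]:
    """Cut the data into consecutive chunks of at most `size` points."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Hypothetical noiseless data on the line y = 2x + 1.
DATA = [(float(x), 2.0 * x + 1.0) for x in range(10)]
```

For least squares the combined summaries reproduce the full-data fit, so `fit_line(combine(summarize(c) for c in split(DATA, 3)))` agrees with `fit_line(summarize(DATA))`. For models without such additive summaries, a divide-and-conquer estimator is in general an approximation to the full-data fit.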



© 2018- CELEHS, Harvard University