Electronic Health Record (EHR)

PheCAP

High-throughput phenotyping with electronic medical record data using a common semi-supervised approach. This package implements surrogate-assisted feature extraction (SAFE) and common machine learning approaches to train and validate phenotyping models.

PheNorm

The algorithm combines the most predictive variable, such as the count of the main International Classification of Diseases (ICD) codes, with other Electronic Health Record (EHR) features (e.g., health care utilization and processed clinical note data) to obtain a score for accurate risk prediction and disease classification.

sureLDA

A statistical learning method to simultaneously predict a range of target phenotypes using codified and natural language processing (NLP)-derived Electronic Health Record (EHR) data.

SCORNET

The Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET) is a consistent, semi-supervised, non-parametric survival curve estimator optimized for efficient use of EHR data with a limited number of current status labels.

SAMGEP

A novel semi-supervised machine learning algorithm to predict phenotype event times using Electronic Health Record (EHR) data.

MAP

An automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP).

KESER

This package implements the Knowledge Extraction via Sparse Embedding Regression (KESER) algorithm. It provides functions that use large-scale code embeddings to facilitate effective feature selection and knowledge discovery with EHR data.

MASTA

This package implements MASTA (Multi-modal Automated Survival Time Annotation), a two-step semi-supervised learning method for predicting event times from longitudinal encounter records.


Natural Language Processing (NLP)

CHANL

The Chart Review Tool Powered by NLP (CHANL) is designed to facilitate chart review of narrative text notes from electronic medical records (EMRs).

NILE

NILE is efficient and effective software for natural language processing (NLP) of clinical narrative text. It uses a prefix tree algorithm for named entity recognition and finite-state machines for semantic analysis, both inspired by the natural reading behavior of humans.
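To illustrate what prefix-tree (trie) dictionary matching for named entity recognition looks like in general, here is a minimal sketch. It is not NILE's code or API: the tokenizer, the `$label` end marker, the longest-match rule, and the example labels (`MI`, `HEART`, `CHEST_PAIN`) are all hypothetical choices made for this illustration.

```python
import re

def tokenize(text):
    """Hypothetical tokenizer: lowercase alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_trie(terms):
    """Build a word-level trie; each complete term stores its label under '$label'."""
    root = {}
    for phrase, label in terms.items():
        node = root
        for token in tokenize(phrase):
            node = node.setdefault(token, {})
        node["$label"] = label  # '$' never appears in tokens, so no collision
    return root

def find_entities(text, trie):
    """Scan tokens left to right, keeping the longest dictionary match at each start."""
    tokens = tokenize(text)
    matches, i = [], 0
    while i < len(tokens):
        node, j, best = trie, i, None
        # Walk down the trie as long as the next token continues a known prefix.
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if "$label" in node:
                best = (j, node["$label"])
        if best:
            end, label = best
            matches.append((" ".join(tokens[i:end]), label))
            i = end  # resume after the matched span
        else:
            i += 1
    return matches

trie = build_trie({"heart attack": "MI", "heart": "HEART", "chest pain": "CHEST_PAIN"})
print(find_entities("Patient denies chest pain; history of heart attack.", trie))
```

Because each token is consumed by at most one walk down the trie per start position, lookup cost depends on the length of the matched phrases rather than on the dictionary size.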

EXTEND

EXTEND is a natural language processing (NLP) tool that efficiently extracts numerical clinical data from different types of narrative notes with high accuracy.
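As a rough picture of the task (pairing a clinical concept mention with a nearby number), the sketch below uses a single regular expression. It does not reflect how EXTEND works internally; the lab names, the 20-character gap limit, and the function name `extract_values` are hypothetical.

```python
import re

# Hypothetical pattern: a lab name, up to 20 non-digit characters, then a number.
PATTERN = re.compile(
    r"\b(?P<name>hba1c|a1c|ldl)\b[^0-9\n]{0,20}?(?P<value>\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_values(note):
    """Return (lab name, numeric value) pairs found in a free-text note."""
    return [(m.group("name").lower(), float(m.group("value")))
            for m in PATTERN.finditer(note)]

print(extract_values("HbA1c was 7.2% on 3/1; LDL: 130 mg/dL."))
```

A fixed pattern like this breaks easily on dates, ranges, and units written before the value, which is why dedicated tools for this task are worth having.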


Precision Medicine

OptimalSurrogate

Provides functions to identify an optimal transformation of a potential surrogate marker, such that the proportion of the treatment effect on a primary outcome can be inferred from the treatment effect on this transformation, and functions to estimate the proportion of the treatment effect explained by this optimal transformation.

PanelCurrentStatus

This package contains R functions to compute the conditional censoring logistic (CCL) estimator and model metrics to evaluate risk predictions using panel current status data.

survAccuracyMeasures

This R package computes non-parametric and semi-parametric estimates of common accuracy measures for risk prediction markers from survival data.

survCompetingRisk

This package helps evaluate the prognostic accuracy of a marker in the presence of multiple competing risk events. Functions to calculate the AUC, ROC curve, PPV, and NPV are provided.
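For readers unfamiliar with these measures, the sketch below computes them in the simplest setting: a binary case/control label at a fixed horizon, with no censoring and no competing events. That simplification is an assumption of this illustration; handling censoring and competing risks is exactly what the package adds, and none of these function names come from it.

```python
def accuracy_at_threshold(marker, case, c):
    """TPR, FPR, PPV, NPV for the rule 'marker > c' (case: 1 = event of interest)."""
    pairs = list(zip(marker, case))
    tp = sum(1 for m, y in pairs if m > c and y == 1)
    fp = sum(1 for m, y in pairs if m > c and y == 0)
    fn = sum(1 for m, y in pairs if m <= c and y == 1)
    tn = sum(1 for m, y in pairs if m <= c and y == 0)
    return {
        "TPR": tp / (tp + fn),  # sensitivity; one point on the ROC curve with FPR
        "FPR": fp / (fp + tn),  # 1 - specificity
        "PPV": tp / (tp + fp) if tp + fp else float("nan"),
        "NPV": tn / (tn + fn) if tn + fn else float("nan"),
    }

def empirical_auc(marker, case):
    """AUC = P(case marker > control marker), counting ties as 1/2."""
    cases = [m for m, y in zip(marker, case) if y == 1]
    controls = [m for m, y in zip(marker, case) if y == 0]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0
                for a in cases for b in controls)
    return score / (len(cases) * len(controls))

marker = [0.9, 0.8, 0.4, 0.35, 0.1]
case = [1, 1, 0, 1, 0]
print(accuracy_at_threshold(marker, case, 0.5), empirical_auc(marker, case))
```

Sweeping the threshold `c` over all observed marker values traces out the full ROC curve.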

survMarkerTwoPhase

This R package computes non-parametric and semi-parametric estimates of accuracy measures for risk prediction markers from survival data under two-phase study designs.

SurrogateOutcome

Provides functions to estimate the proportion of treatment effect on a censored primary outcome that is explained by the treatment effect on a censored surrogate outcome/event.
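For orientation, one common way to define the proportion of treatment effect explained (PTE) by a surrogate S compares the total treatment effect on the primary outcome with the residual effect that remains after accounting for S. The notation below is an assumption of this sketch and not necessarily the package's own; for a censored outcome, Y can be read as an indicator such as survival past a fixed time t.

```latex
\Delta = E\{Y^{(1)}\} - E\{Y^{(0)}\}, \qquad
R_S = 1 - \frac{\Delta_S}{\Delta}
```

Here Y^(1) and Y^(0) are the potential outcomes under treatment and control, and Δ_S is the residual treatment effect on Y after accounting for S. R_S near 1 means the surrogate captures most of the treatment effect.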

SurrogateTest

Provides functions to test for a treatment effect in terms of the difference in survival between a treatment group and a control group using surrogate marker information obtained at some early time point in a time-to-event outcome setting.


High Dimensional Inference

dcalasso

Fast divide-and-conquer Cox proportional hazards model with adaptive lasso. The dcalasso package fits Cox proportional hazards models to extremely large datasets in which both n and p are large and n >> p.
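The divide-and-conquer (DAC) pattern splits the data into chunks, computes a summary on each chunk, and combines the summaries into one estimate. The sketch below shows only that pattern, using simple least-squares regression, where summing per-chunk sufficient statistics reproduces the full-data fit exactly. It is a swapped-in illustration: dcalasso applies DAC to the Cox model with adaptive lasso and uses its own combination step, and the function names here are hypothetical.

```python
def chunk_stats(xs, ys):
    """Sufficient statistics for simple linear regression on one chunk."""
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys)))

def combine_and_fit(stats):
    """Sum per-chunk statistics, then solve the normal equations for (intercept, slope)."""
    n, sx, sy, sxx, sxy = (sum(s[k] for s in stats) for k in range(5))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

def dac_fit(xs, ys, n_chunks):
    """Split into roughly equal chunks (each could run on a separate worker) and combine."""
    size = -(-len(xs) // n_chunks)  # ceiling division
    stats = [chunk_stats(xs[i:i + size], ys[i:i + size])
             for i in range(0, len(xs), size)]
    return combine_and_fit(stats)

xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1 + 2 * x for x in xs]
print(dac_fit(xs, ys, 3))
```

For nonlinear models such as the Cox model, no small set of sufficient statistics exists, so DAC methods instead combine per-chunk estimates or score and information quantities, which is where approximation enters.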

solid

The package consists of 1) a screening and one-step linearization infused divide-and-conquer (DAC) algorithm, SOLID, to fit sparse logistic regression to massive datasets, and 2) a modified cross-validation (MCV) procedure that reuses the by-products of SOLID, substantially reducing the computational burden.

StructureMC

Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design.
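A useful intuition for structured missingness is a low-rank matrix partitioned into blocks [[A11, A12], [A21, A22]] where the whole A22 block is missing by design. If A11 has the same rank as the full matrix, then A22 = A21 A11⁺ A12 (a Schur-complement identity, with ⁺ the pseudo-inverse). The sketch below checks this on synthetic noise-free data. It assumes the rank is known and is only a simplified illustration, not the package's estimator; `complete_missing_block` is a hypothetical name.

```python
import numpy as np

def complete_missing_block(A11, A12, A21, rank):
    """Estimate the missing block A22 as A21 @ pinv_r(A11) @ A12.

    pinv_r is a pseudo-inverse built from the leading `rank` singular triplets
    of A11; truncating the rank is what keeps the inverse stable.
    """
    U, s, Vt = np.linalg.svd(A11)
    U_r, s_r, Vt_r = U[:, :rank], s[:rank], Vt[:rank, :]
    A11_pinv = Vt_r.T @ np.diag(1.0 / s_r) @ U_r.T
    return A21 @ A11_pinv @ A12

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(5, 2)).T  # synthetic rank-2 matrix
A11, A12 = A[:3, :3], A[:3, 3:]
A21, A22 = A[3:, :3], A[3:, 3:]
A22_hat = complete_missing_block(A11, A12, A21, rank=2)
print(np.allclose(A22_hat, A22))
```

With noisy data the identity holds only approximately, and choosing the truncation rank becomes the main practical difficulty.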

SRAT

SRAT is a fully rank-based and flexible approach to test for association between a set of genetic variants and an outcome, while accounting for within-family correlation and adjusting for covariates. SRAT includes the well-known Wilcoxon rank sum test as a special case.
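Since SRAT includes the Wilcoxon rank sum test as a special case, a minimal sketch of that special case may help. It covers only two independent samples with no covariates, no family structure, and no genetic variant sets, and it uses the plain normal approximation without a tie correction in the variance. The function name is hypothetical and this is not SRAT's code.

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum statistic W (sum of pooled mid-ranks of x), z-score,
    and two-sided normal-approximation p-value (no tie correction)."""
    values = sorted(x + y)

    def midrank(v):
        first = values.index(v) + 1       # 1-based rank of first occurrence
        count = values.count(v)           # tied values share the average rank
        return first + (count - 1) / 2

    W = sum(midrank(v) for v in x)
    n1, n2 = len(x), len(y)
    mean = n1 * (n1 + n2 + 1) / 2         # E[W] under the null of no difference
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (W - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return W, z, p

print(rank_sum_test([1.2, 3.4, 5.6], [2.0, 4.1, 6.3, 7.0]))
```

Because only ranks enter the statistic, the test is unaffected by monotone transformations of the outcome, which is the property a rank-based association test carries over to more complex settings.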


Shiny Web Applications



© 2018- CELEHS, Harvard University