High-throughput phenotyping with electronic medical record data using a common semi-supervised approach. This package implements surrogate-assisted feature extraction (SAFE) and common machine learning approaches to train and validate phenotyping models.
The algorithm combines the most predictive variable, such as the count of the main International Classification of Diseases (ICD) codes, with other Electronic Health Record (EHR) features (e.g., health care utilization and processed clinical note data) to obtain a score for accurate risk prediction and disease classification.
A statistical learning method to simultaneously predict a range of target phenotypes using codified and natural language processing (NLP)-derived Electronic Health Record (EHR) data.
The Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET) is a consistent, semi-supervised, non-parametric survival curve estimator optimized for efficient use of EHR data with a limited number of current status labels.
A novel semi-supervised machine learning algorithm to predict phenotype event times using Electronic Health Record (EHR) data.
An automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP).
This package implements the Knowledge Extraction via Sparse Embedding Regression (KESER) algorithm. We provide functions that use large-scale code embeddings to facilitate effective feature selection and knowledge discovery with EHR data.
This package implements a two-step semi-supervised learning method (Multi-modal Automated Survival Time Annotation) for predicting event times from longitudinal encounter records.
The Chart Review Tool Powered by NLP (CHANL) is designed to facilitate chart review of narrative text notes from electronic medical records (EMRs).
NILE is efficient and effective software for natural language processing (NLP) of clinical narrative text. It uses a prefix tree algorithm for named entity recognition and finite-state machines for semantic analysis, both inspired by the natural reading behavior of humans.
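The idea behind prefix-tree entity recognition can be sketched in a few lines. This is not NILE's implementation (NILE is its own software with its own tokenization and dictionaries). It is a minimal illustration under stated assumptions. It uses a word-level trie built from a toy dictionary whose concept codes (`C_HF`, `C_CHF`, `C_ASA`) are hypothetical. It tokenizes by whitespace only, and it scans left to right, keeping the longest dictionary phrase that starts at each position.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a word-level prefix tree (trie)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.concept: Optional[str] = None  # set when a dictionary phrase ends here


def build_trie(dictionary: Dict[str, str]) -> TrieNode:
    """Insert each phrase (lower-cased, split on whitespace) into the trie."""
    root = TrieNode()
    for phrase, concept in dictionary.items():
        node = root
        for word in phrase.lower().split():
            node = node.children.setdefault(word, TrieNode())
        node.concept = concept
    return root


def find_entities(text: str, root: TrieNode) -> List[Tuple[int, int, str]]:
    """Return (start_token, end_token_exclusive, concept) for longest matches.

    Simplifying assumption: tokens are whitespace-separated, lower-cased words.
    """
    tokens = text.lower().split()
    matches = []
    i = 0
    while i < len(tokens):
        node, best, j = root, None, i
        # Walk down the trie as long as the next token continues a phrase.
        while j < len(tokens) and tokens[j] in node.children:
            node = node.children[tokens[j]]
            j += 1
            if node.concept is not None:
                best = (i, j, node.concept)  # remember the longest match so far
        if best is not None:
            matches.append(best)
            i = best[1]  # resume scanning after the matched phrase
        else:
            i += 1
    return matches
```

For example, with the dictionary `{"heart failure": "C_HF", "congestive heart failure": "C_CHF", "aspirin": "C_ASA"}`, the text "Hx of congestive heart failure on aspirin" yields `[(2, 5, "C_CHF"), (6, 7, "C_ASA")]`. The longer phrase wins over the nested "heart failure". Because each token is examined only while it extends a known prefix, lookup cost grows with phrase length, not dictionary size.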
EXTEND is a natural language processing (NLP) tool that efficiently extracts numerical clinical data from different types of narrative notes with high accuracy.
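To make the task concrete, here is a deliberately simple, hypothetical regular-expression sketch of numeric extraction. It is not EXTEND's method. The function name `extract_values` and the `max_gap` parameter are illustrative only. The sketch looks for one of several synonyms of a measurement name, allows up to `max_gap` intervening words, and captures the first number that follows.

```python
import re
from typing import List


def extract_values(note: str, names: List[str], max_gap: int = 3) -> List[float]:
    """Hypothetical sketch: return numbers that follow a measurement name.

    A match is: one of `names` (case-insensitive, whole word), then at most
    `max_gap` filler words (matched lazily, so the nearest number wins),
    then an integer or decimal number.
    """
    name_alt = "|".join(re.escape(n) for n in names)
    pattern = re.compile(
        rf"\b(?:{name_alt})\b(?:\W+\w+){{0,{max_gap}}}?\W+(\d+(?:\.\d+)?)",
        re.IGNORECASE,
    )
    return [float(m.group(1)) for m in pattern.finditer(note)]
```

For instance, `extract_values("Echo: ejection fraction of 35%.", ["ejection fraction", "EF"])` returns `[35.0]`, and a note that mentions "EF" without a number returns an empty list. Real notes need much more than this, including units, ranges, negation, and dates that look like numbers. That gap is what a dedicated tool is meant to close.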
Provides functions to identify an optimal transformation of a potential surrogate marker, such that the proportion of the treatment effect on a primary outcome can be inferred from the treatment effect on the transformed marker, and to estimate the proportion of the treatment effect explained by this optimal transformation.
This package contains R functions to compute the conditional censoring logistic (CCL) estimator and model metrics to evaluate risk predictions using panel current status data.
This R package computes non-parametric and semi-parametric estimates of common accuracy measures for risk prediction markers from survival data.
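This package helps evaluate the prognostic accuracy of a marker in the presence of multiple competing risk events. Functions are provided to calculate the area under the curve (AUC), the receiver operating characteristic (ROC) curve, the positive predictive value (PPV), and the negative predictive value (NPV).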
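For orientation, the sketch below shows what these accuracy measures compute in the simplest setting. That setting is a single binary event status with no censoring and no competing risks, which are exactly the complications the package's estimators are designed to handle. It is an illustration only and not the package's API. It assumes a subject tests positive when `marker >= cutoff`.

```python
from typing import List, Tuple


def ppv_npv(marker: List[float], event: List[int], cutoff: float) -> Tuple[float, float]:
    """PPV = P(event | marker >= cutoff); NPV = P(no event | marker < cutoff)."""
    pos = [e for m, e in zip(marker, event) if m >= cutoff]
    neg = [e for m, e in zip(marker, event) if m < cutoff]
    return sum(pos) / len(pos), (len(neg) - sum(neg)) / len(neg)


def roc_points(marker: List[float], event: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs as the cutoff moves from high to low marker values."""
    n_case = sum(event)
    n_ctrl = len(event) - n_case
    points = [(0.0, 0.0)]
    for c in sorted(set(marker), reverse=True):
        tpr = sum(1 for m, e in zip(marker, event) if m >= c and e == 1) / n_case
        fpr = sum(1 for m, e in zip(marker, event) if m >= c and e == 0) / n_ctrl
        points.append((fpr, tpr))
    return points


def empirical_auc(marker: List[float], event: List[int]) -> float:
    """AUC as P(case marker > control marker), counting ties as 1/2."""
    cases = [m for m, e in zip(marker, event) if e == 1]
    controls = [m for m, e in zip(marker, event) if e == 0]
    total = 0.0
    for a in cases:
        for b in controls:
            total += 1.0 if a > b else (0.5 if a == b else 0.0)
    return total / (len(cases) * len(controls))
```

With `marker = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]` and `event = [1, 1, 0, 1, 0, 0]`, a cutoff of 0.5 gives PPV = NPV = 2/3, and the empirical AUC is 8/9. In a competing-risk analysis, "case" would instead be defined as experiencing the event of interest by a given time, and censoring would require weighting.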
This R package computes non-parametric and semi-parametric estimates of accuracy measures for risk prediction markers from survival data under two-phase study designs.
Provides functions to estimate the proportion of treatment effect on a censored primary outcome that is explained by the treatment effect on a censored surrogate outcome/event.
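As a point of reference, one common way the surrogate-marker literature defines the proportion of treatment effect explained (PTE) compares the overall treatment effect with the residual effect that remains once the surrogate is accounted for. The notation below is ours. Whether this package targets exactly this estimand for censored outcomes is an assumption.

$$
R_S = 1 - \frac{\Delta_S(t)}{\Delta(t)}, \qquad \Delta(t) = P\{T^{(1)} > t\} - P\{T^{(0)} > t\}
$$

Here $T^{(1)}$ and $T^{(0)}$ are the potential primary event times under treatment and control. $\Delta_S(t)$ is the residual treatment effect on survival at time $t$ after the information in the surrogate event is accounted for. Values of $R_S$ near 1 suggest the surrogate captures most of the treatment effect on the primary outcome.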
Provides functions to test for a treatment effect in terms of the difference in survival between a treatment group and a control group using surrogate marker information obtained at some early time point in a time-to-event outcome setting.
Fast divide-and-conquer Cox proportional hazards model with adaptive lasso. The dcalasso package fits Cox proportional hazards models to extremely large datasets in which both n and p are large and n >> p.
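The divide-and-conquer idea can be illustrated with a generic combination rule. The rule is assumed here for illustration rather than taken from dcalasso. Split the $n$ observations into $K$ subsets. Fit the model on each subset to obtain $\hat\beta_k$ and an information matrix $\hat I_k$, for example the negative Hessian of the Cox partial log-likelihood evaluated at $\hat\beta_k$. Then pool the subset estimates by information weighting:

$$
\hat\beta_{\mathrm{DAC}} = \Big(\sum_{k=1}^{K} \hat I_k\Big)^{-1} \sum_{k=1}^{K} \hat I_k \, \hat\beta_k
$$

Each subset fit touches only about $n/K$ rows, so the fits can run in parallel and never require the full data in memory. In a scheme like this, a penalty such as the adaptive lasso would be applied to the pooled quantities. The package's exact procedure may differ from this sketch.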
The package consists of 1) a screening and one-step linearization infused divide-and-conquer (SOLID) algorithm to fit sparse logistic regression to massive datasets, and 2) a modified cross-validation (MCV) that uses the by-products of SOLID, substantially reducing the computational burden.
Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design.
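The following is a minimal illustration of why missingness by design can be tractable. It relies on the idealized assumption of an exactly low-rank, noise-free matrix. A practical estimator also has to handle noise and choose the rank, which is not shown. Partition the matrix so that the block $A_{22}$ is entirely unobserved. If $\operatorname{rank}(A) = \operatorname{rank}(A_{11}) = r$, the missing block is determined by the observed ones:

$$
A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \qquad A_{22} = A_{21} A_{11}^{+} A_{12}
$$

Here $A_{11}^{+}$ is the Moore-Penrose pseudo-inverse. The identity holds because the rank condition forces the columns of $\begin{pmatrix} A_{12} \\ A_{22} \end{pmatrix}$ into the column space of $\begin{pmatrix} A_{11} \\ A_{21} \end{pmatrix}$. It likewise forces the rows of $(A_{21}\;A_{22})$ into the row space of $(A_{11}\;A_{12})$.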
SRAT is a fully rank-based and flexible approach to test for association between a set of genetic variants and an outcome, while accounting for within-family correlation and adjusting for covariates. SRAT includes the well-known Wilcoxon rank sum test as a special case.
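Since SRAT contains the Wilcoxon rank sum test as a special case, the sketch below shows that special case alone, for two independent groups. It does not cover SRAT itself, which additionally handles sets of genetic variants, within-family correlation, and covariates. Tied values receive average ranks. The normal approximation here omits the tie correction to the variance, a simplification.

```python
import math
from typing import List, Tuple


def midranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(group_a: List[float], group_b: List[float]) -> Tuple[float, float]:
    """Return (W, z): the rank sum of group_a and its normal-approximation z score."""
    n1, n2 = len(group_a), len(group_b)
    ranks = midranks(group_a + group_b)
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12  # no tie correction
    return w, (w - mean) / math.sqrt(var)
```

For `group_a = [3.1, 4.5, 5.0]` and `group_b = [1.2, 2.8, 3.1, 6.0]`, the tied 3.1 values share rank 3.5, giving W = 14.5 against an expected 12 under the null.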
© 2018- CELEHS, Harvard University