The PheCAP package implements surrogate-assisted feature extraction (SAFE) and common machine learning approaches to train and validate phenotyping models. PheCAP begins with data from the electronic medical record (EMR), including structured data and information extracted from narrative notes using natural language processing (NLP). Its standardized steps combine automated procedures, which reduce the amount of manual input, with machine learning approaches for algorithm training.
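The surrogate-assisted idea can be illustrated with a small simulation. The sketch below is not the PheCAP API. It loosely follows the SAFE procedure described by Yu et al. (2017) under several assumptions: a single surrogate (for example, a log count of the main ICD code), silver-standard labels taken from the top and bottom 20% of that surrogate, and lasso logistic regression from the glmnet package refit on random half-samples. The variable names, simulated data, 20% cut-offs, 20 repetitions, and 50% selection threshold are all hypothetical; the package's own procedure and defaults may differ.

```r
# Hypothetical simulation of the SAFE idea; this is NOT the PheCAP API.
# Requires the glmnet package: install.packages("glmnet")
library(glmnet)

set.seed(42)
n <- 5000
p <- 10
# Hypothetical candidate features: log-transformed counts of 10 EHR/NLP concepts
x <- matrix(log1p(rpois(n * p, lambda = 1)), nrow = n,
            dimnames = list(NULL, paste0("concept_", seq_len(p))))
# Hypothetical latent phenotype driven only by the first three concepts
latent <- drop(x[, 1:3] %*% c(1.5, 1.0, 0.8)) + rnorm(n)
# Hypothetical surrogate (e.g. log ICD count) that tracks the latent phenotype
surrogate <- latent + rnorm(n, sd = 0.5)

# Step 1: silver-standard labels from the surrogate extremes (middle is dropped)
hi <- quantile(surrogate, 0.8)
lo <- quantile(surrogate, 0.2)
keep <- surrogate >= hi | surrogate <= lo
x_ext <- x[keep, ]
y_ext <- as.integer(surrogate[keep] >= hi)

# Step 2: refit a cross-validated lasso logistic model on random half-samples
n_rep <- 20
selected <- matrix(FALSE, nrow = n_rep, ncol = p,
                   dimnames = list(NULL, colnames(x)))
for (r in seq_len(n_rep)) {
  idx <- sample(nrow(x_ext), floor(nrow(x_ext) / 2))
  fit <- cv.glmnet(x_ext[idx, ], y_ext[idx], family = "binomial")
  beta <- as.matrix(coef(fit, s = "lambda.min"))[-1, 1]  # drop the intercept
  selected[r, ] <- beta != 0
}

# Step 3: keep features selected in more than half of the repetitions
freq <- colMeans(selected)
safe_features <- names(freq)[freq > 0.5]
print(safe_features)
```

Features that keep reappearing across half-samples are treated as informative. In this simulation only the first three concepts drive the latent phenotype, so they should be among the features that pass the filter; some noise concepts may occasionally pass as well.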
Install the stable version from CRAN:
install.packages("PheCAP")
Install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("celehs/PheCAP")
Follow the main steps, then try the R code in the simulated-data and real EHR data examples.
Yichi Zhang*, Tianrun Cai*, Sheng Yu*, Kelly Cho, Chuan Hong, Jiehuan Sun, Jie Huang, Yuk-Lam Ho, Ashwin Ananthakrishnan, Zongqi Xia, Stanley Shaw, Vivian Gainer, Victor Castro, Nicholas Link, Jacqueline Honerlaw, Selena Huang, David Gagnon, Elizabeth Karlson, Robert Plenge, Peter Szolovits, Guergana Savova, Susanne Churchill, Christopher O’Donnell, Shawn Murphy, J Michael Gaziano, Isaac Kohane, Tianxi Cai*, and Katherine Liao*. Methods for High-throughput Phenotyping with Electronic Medical Record Data Using a Common Semi-supervised Approach (PheCAP). Nature Protocols (2019). *Contributed equally.
Yu, S., Chakrabortty, A., Liao, K. P., Cai, T., Ananthakrishnan, A. N., Gainer, V. S., … Cai, T. Surrogate-assisted feature extraction for high-throughput phenotyping. Journal of the American Medical Informatics Association (2017), e143–e149.
Liao, K. P., Cai, T., Savova, G. K., Murphy, S. N., Karlson, E. W., Ananthakrishnan, A. N., … Kohane, I. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. BMJ (2015), 350, h1885.