EXTraction of EMR Numerical Data (EXTEND) was developed by Tianrun Cai (Brigham and Women’s Hospital), Katherine P. Liao (Brigham and Women’s Hospital), Frank Rybicki (University of Ottawa), and Tianxi Cai (Harvard T.H. Chan School of Public Health). EXTEND is a natural language processing (NLP) tool that efficiently extracts numerical clinical data from different types of narrative notes with high accuracy. By expanding its dictionary and developing new rules, EXTEND can easily be adapted to extract additional numerical data important in clinical outcomes research.
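The dictionary-plus-rules idea in the paragraph above can be illustrated with a toy extractor. This is a minimal sketch, not EXTEND's code: the `DICTIONARY` terms, the `VALID_RANGE` plausibility bounds, the 20-character gap, and the `extract` function are all hypothetical. In this sketch, supporting a new variable means adding a dictionary entry (the terms that name it) and a rule (here, the range of values to accept).

```python
import re
from typing import Dict, List, Tuple

# Hypothetical dictionary: surface forms that may name each variable in a note.
# Illustrative only; this is not EXTEND's actual dictionary.
DICTIONARY: Dict[str, List[str]] = {
    "EF": ["ejection fraction", "LVEF", "EF"],
    "HR": ["heart rate", "pulse", "HR"],
}

# Hypothetical rule: accept only values inside a plausible range.
VALID_RANGE: Dict[str, Tuple[float, float]] = {
    "EF": (5.0, 90.0),
    "HR": (20.0, 250.0),
}


def extract(note: str, variable: str) -> List[float]:
    """Return numbers that follow a dictionary term for `variable`
    (within 20 non-digit characters) and fall inside its valid range."""
    terms = "|".join(re.escape(term) for term in DICTIONARY[variable])
    # A term as a whole word, a short non-digit gap such as ": " or " of ",
    # then an integer or decimal number captured in group 1.
    pattern = re.compile(
        rf"\b(?:{terms})\b\D{{0,20}}?(\d+(?:\.\d+)?)", re.IGNORECASE
    )
    low, high = VALID_RANGE[variable]
    values = []
    for match in pattern.finditer(note):
        value = float(match.group(1))
        if low <= value <= high:  # drop implausible numbers
            values.append(value)
    return values
```

For example, `extract("Echo: LVEF 55%. Pulse: 72 bpm.", "EF")` returns `[55.0]`, while a reading such as `EF 120` is discarded by the range rule.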
EXTEND
EXTraction of EMR Numerical Data
Overview
Installation
The following installation steps have been tested to work in a 64-bit Python 3.7 environment on both Windows 10 and Windows Server 2016.
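One way to compare a machine against the tested setup is to query the interpreter directly. This is a minimal sketch using only the standard library; `describe_environment` and `matches_tested_setup` are hypothetical helpers, and a mismatch only means the setup differs from the one reported as tested, not that EXTEND will fail.

```python
import platform
import struct
import sys
from typing import Any, Dict

# Setup this README reports as tested: 64-bit Python 3.7 on Windows
# (Windows 10 and Windows Server 2016). Other setups are untested, not excluded.
TESTED_PYTHON = (3, 7)


def describe_environment() -> Dict[str, Any]:
    """Collect the interpreter facts relevant to the tested setup."""
    return {
        "python": tuple(sys.version_info[:2]),  # e.g. (3, 7)
        "bits": struct.calcsize("P") * 8,       # pointer size: 64 on 64-bit Python
        "os": platform.system(),                # 'Windows' on Windows
    }


def matches_tested_setup(env: Dict[str, Any]) -> bool:
    """True if `env` describes 64-bit Python 3.7 on Windows."""
    return (
        env["python"] == TESTED_PYTHON
        and env["bits"] == 64
        and env["os"] == "Windows"
    )


if __name__ == "__main__":
    env = describe_environment()
    print(env, "tested setup" if matches_tested_setup(env) else "untested setup")
```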
Installation Steps:
- Create a system environment variable called `ENTEND_HOME` and set its value to the desired path of the EXTEND main folder.
- Download and unzip the EXTEND folder.
- In a command-line window, change the directory to the `EXTEND-master` folder.
- Run `python setup.py install`.
- To perform data extraction, select one or more of the variables below to run (note: variable names are case-sensitive).
- The current version of EXTEND can extract the following variables:
`['ECOG', 'EF', 'BMI', 'H', 'W', 'RR', 'T', 'BP', 'HR', 'Sat', 'PDL1', 'Crn', 'HbA1C']`
  - ECOG: Eastern Cooperative Oncology Group
  - EF: Ejection Fraction
  - BMI: Body Mass Index
  - H: Height
  - W: Weight
  - RR: Respiratory Rate
  - T: Temperature
  - BP: Blood Pressure
  - HR: Heart Rate
  - Sat: Oxygen Saturation
  - PDL1: Programmed death-ligand 1
  - Crn: Creatinine
  - HbA1C: Hemoglobin A1C
- Example: to extract EF and BMI, use `['EF', 'BMI']` in the script.
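Because variable names are matched case-sensitively, a typo such as `'bmi'` or `'SAT'` is easy to miss. The following is a minimal pre-flight sketch, assuming the selection is a plain Python list as in the example above; `check_selection` and `extend_home` are hypothetical helpers, not part of EXTEND, and the sketch does not call EXTEND itself. It also confirms that the environment variable from the installation steps (spelled exactly as given there) points at an existing folder.

```python
import os
from typing import Iterable, List

# Variable names listed in this README; matching is case-sensitive.
SUPPORTED_VARIABLES = ['ECOG', 'EF', 'BMI', 'H', 'W', 'RR', 'T', 'BP',
                       'HR', 'Sat', 'PDL1', 'Crn', 'HbA1C']


def check_selection(selection: Iterable[str]) -> List[str]:
    """Return the selection as a list; raise ValueError on unknown names.

    Names are compared exactly, so 'bmi' or 'SAT' are rejected."""
    chosen = list(selection)
    unknown = [name for name in chosen if name not in SUPPORTED_VARIABLES]
    if unknown:
        raise ValueError(
            f"Unsupported variable(s) {unknown}; choose from {SUPPORTED_VARIABLES}"
        )
    return chosen


def extend_home() -> str:
    """Return the EXTEND main folder named by the ENTEND_HOME variable."""
    path = os.environ.get("ENTEND_HOME")
    if not path:
        raise OSError("ENTEND_HOME is not set")
    if not os.path.isdir(path):
        raise OSError(f"ENTEND_HOME does not point to a folder: {path}")
    return path
```

For example, `check_selection(['EF', 'BMI'])` returns `['EF', 'BMI']`, whereas `check_selection(['EF', 'bmi'])` raises `ValueError`.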
Reference
Cai T, Zhang L, Yang N, Kumamaru KK, Rybicki FJ, Cai T, Liao KP. EXTraction of EMR numerical data: an efficient and generalizable tool to EXTEND clinical research. BMC Medical Informatics and Decision Making. 2019;19(1):226. doi: 10.1186/s12911-019-0970-1. PMID: 31730484; PMCID: PMC6858776.