Methodology
Federated Transfer Learning
Automated Feature Selection
ARCH (Aggregated naRrative Codified Health records analysis) generates a large-scale knowledge graph over a comprehensive set of codified and narrative EHR features (Gan et al., 2025). It supports:
- feature representation
- information extraction
- uncertainty quantification
The ARCH algorithm first derives embedding vectors from a co-occurrence matrix of all EHR concepts. It then computes cosine similarities between these embeddings, together with associated p-values, to measure the strength of relatedness between clinical features and to quantify the statistical uncertainty of each estimate. In the final step, ARCH performs a sparse embedding regression to remove indirect linkages between entity pairs.
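The first two ARCH steps can be illustrated with a minimal sketch. It is not the implementation of Gan et al. (2025): it assumes the embeddings come from a truncated SVD of a PPMI-transformed co-occurrence matrix, which is one common choice that the text above does not confirm. The p-value computation and the sparse embedding regression are omitted. The function names `ppmi`, `embed`, and `cosine_similarity` are hypothetical, and the counts are toy data.

```python
import numpy as np


def ppmi(cooc):
    """Positive pointwise mutual information of a co-occurrence count matrix."""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)
    col = cooc.sum(axis=0, keepdims=True)
    expected = row @ col / total  # counts expected under independence
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts carry no evidence
    return np.maximum(pmi, 0.0)


def embed(cooc, dim):
    """Assumed embedding step: truncated SVD of the PPMI matrix."""
    u, s, _ = np.linalg.svd(ppmi(cooc))
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine_similarity(emb):
    """Pairwise cosine similarity between the rows of an embedding matrix."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty concepts
    unit = emb / norms
    return unit @ unit.T


# Toy symmetric co-occurrence counts for four concepts:
# concepts 0 and 1 co-occur often, as do concepts 2 and 3.
toy_cooc = np.array([
    [10, 8, 0, 1],
    [8, 10, 1, 0],
    [0, 1, 10, 8],
    [1, 0, 8, 10],
])
toy_sim = cosine_similarity(embed(toy_cooc, dim=2))
```

In this toy example, concepts that co-occur frequently receive a higher cosine similarity than concepts that rarely co-occur. In ARCH, the p-values and the sparse regression step would then decide which of these similarities are retained as edges.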
DOME (DirectiOnal Medical Embedding) is another algorithm for knowledge graph construction from EHR data (Wen et al., 2025). The DOME algorithm encodes temporal directional relationships between medical concepts using summary-level EHR data. DOME begins by aggregating patient-level EHR data into an asymmetric co-occurrence matrix. It then calculates two Positive Pointwise Mutual Information (PPMI) matrices that encode the pairwise prior and posterior dependencies between medical concepts. Next, a joint matrix factorization is applied to these PPMI matrices, generating three vectors for each concept: one semantic embedding and two directional context embeddings. Together, these vectors provide a comprehensive representation of the temporal relationships between EHR concepts.
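The DOME pipeline described above can be sketched in simplified form. This is an illustration under stated assumptions, not the method of Wen et al. (2025). The window-based counting rule is assumed. The two PPMI matrices are approximated by the PPMI of the directional counts and of their transpose, which makes them transposes of each other; DOME's actual prior and posterior matrices are presumably defined differently. The joint factorization is replaced by an SVD of the two matrices stacked side by side. All function names and the `window` parameter are hypothetical.

```python
import numpy as np


def directional_cooccurrence(sequences, n_concepts, window=2):
    """C[i, j] counts how often concept j occurs within `window` steps AFTER concept i."""
    counts = np.zeros((n_concepts, n_concepts))
    for seq in sequences:  # one time-ordered list of concept ids per patient
        for pos, i in enumerate(seq):
            for j in seq[pos + 1 : pos + 1 + window]:
                counts[i, j] += 1
    return counts


def ppmi(counts):
    """Positive pointwise mutual information of a count matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row @ col))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)


def joint_factorize(p_after, p_before, dim):
    """Stand-in joint factorization: SVD of [p_after | p_before].

    Returns one shared (semantic) embedding per concept and two
    direction-specific context embeddings.
    """
    stacked = np.hstack([p_after, p_before])
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    scale = np.sqrt(s[:dim])
    semantic = u[:, :dim] * scale
    context = vt[:dim].T * scale
    n = p_after.shape[1]
    return semantic, context[:n], context[n:]


# Toy data: three patients, concept ids 0..2 in temporal order.
toy_sequences = [[0, 1, 2], [0, 1, 2], [0, 1]]
toy_counts = directional_cooccurrence(toy_sequences, n_concepts=3, window=1)
toy_after = ppmi(toy_counts)     # dependencies on concepts that follow
toy_before = ppmi(toy_counts.T)  # dependencies on concepts that precede
semantic, ctx_after, ctx_before = joint_factorize(toy_after, toy_before, dim=3)
```

The counts are asymmetric by construction: in the toy data, concept 1 follows concept 0 three times but never precedes it. With `dim` equal to the full rank, `semantic @ ctx_after.T` reproduces the "after" PPMI matrix; a smaller `dim` gives a low-rank approximation.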