Improving polygenic risk prediction performance by integrating electronic health records through phenotype embedding

EEPRS integrates phenotype embeddings derived from electronic health records (EHRs) with genome-wide association study (GWAS) summary statistics to improve polygenic risk prediction. Using embedding methods such as Word2Vec and GPT, EEPRS improves the performance of both single-trait and multi-trait polygenic risk scores (PRS) and reveals interpretable phenotype clusters. It offers a scalable, interpretable framework for combining clinical and genetic information.