TY - JOUR
T1 - The high-dimensional space of human diseases built from diagnosis records and mapped to genetic loci
AU - Jia, Gengjie
AU - Li, Yu
AU - Zhong, Xue
AU - Wang, Kanix
AU - Pividori, Milton
AU - Alomairy, Rabab M.
AU - Esposito, Aniello
AU - Ltaief, Hatem
AU - Terao, Chikashi
AU - Akiyama, Masato
AU - Matsuda, Koichi
AU - Keyes, David E.
AU - Im, Hae Kyung
AU - Gojobori, Takashi
AU - Kamatani, Yoichiro
AU - Kubo, Michiaki
AU - Cox, Nancy J.
AU - Evans, James
AU - Gao, Xin
AU - Rzhetsky, Andrey
N1 - KAUST Repository Item: Exported on 2023-05-25
Acknowledged KAUST grant number(s): FCC/1/1976-26-01, FCS/1/4102-02-01, REI/1/0018-01-01, REI/1/4473-01-01
Acknowledgements: We thank E. Gannon and M. Rzhetsky for comments on earlier versions of this manuscript, and many volunteers whose data are used in this study. This work was funded by the DARPA Big Mechanism program under ARO contract W911NF1410333, by National Institutes of Health grants R01HL122712, 1P50MH094267 and U01HL108634-01, and by a gift from L. and K. Dauten to A.R. Additional support came from King Abdullah University of Science and Technology, award numbers FCS/1/4102-02-01, FCC/1/1976-26-01, REI/1/0018-01-01 and REI/1/4473-01-01 to X.G.; and from the Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, and the Central Public-interest Scientific Institution Basal Research Fund (11024316000202300001) to G.J. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
PY - 2023/05/22
Y1 - 2023/05/22
N2 - Human diseases are traditionally studied as singular, independent entities, limiting researchers’ capacity to view human illnesses as dependent states in a complex, homeostatic system. Here, using time-stamped clinical records of over 151 million unique Americans, we construct a disease representation as points in a continuous, high-dimensional space, where diseases with similar etiology and manifestations lie near one another. We use the UK Biobank cohort, with half a million participants, to perform a genome-wide association study of newly defined human quantitative traits reflecting individuals’ health states, corresponding to patient positions in our disease space. We discover 116 genetic associations involving 108 genetic loci and then use ten disease constellations resulting from clustering analysis of diseases in the embedding space, as well as 30 common diseases, to demonstrate that these genetic associations can be used to robustly predict various morbidities.
UR - http://hdl.handle.net/10754/692028
UR - https://www.nature.com/articles/s43588-023-00453-y
U2 - 10.1038/s43588-023-00453-y
DO - 10.1038/s43588-023-00453-y
M3 - Article
C2 - 38177845
SN - 2662-8457
JO - Nature Computational Science
JF - Nature Computational Science
ER -