TY - JOUR
T1 - Latent space arithmetic on data embeddings from healthy multi-tissue human RNA-seq decodes disease modules
AU - de Weerd, Hendrik A.
AU - Guala, Dimitri
AU - Gustafsson, Mika
AU - Synnergren, Jane
AU - Tegnér, Jesper
AU - Lubovac-Pilav, Zelmina
AU - Magnusson, Rasmus
N1 - Publisher Copyright:
© 2024 The Author(s)
PY - 2024/11/8
Y1 - 2024/11/8
AB - Computational analyses of transcriptomic data have dramatically improved our understanding of complex diseases. However, such approaches are limited by small sample sets of disease-affected material. We asked if a variational autoencoder trained on large groups of healthy human RNA sequencing (RNA-seq) data can capture the fundamental gene regulation system and generalize to unseen disease changes. Importantly, we found this model to successfully compress unseen transcriptomic changes from 25 independent disease datasets. We decoded disease-specific signals from the latent space and found them to contain more disease-specific genes than the corresponding differential expression analysis in 20 of 25 cases. Finally, we matched these disease signals with known drug targets and extracted sets of known and potential pharmaceutical candidates. In summary, our study demonstrates how data-driven representation learning enables the arithmetic deconstruction of the latent space, facilitating the dissection of disease mechanisms and drug targets.
KW - disease mechanisms
KW - disease signal extraction
KW - drug repurposing
KW - gene expression analysis
KW - latent space analysis
KW - module inference
KW - variational autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85208221759&partnerID=8YFLogxK
U2 - 10.1016/j.patter.2024.101093
DO - 10.1016/j.patter.2024.101093
M3 - Article
C2 - 39568475
AN - SCOPUS:85208221759
SN - 2666-3899
VL - 5
JO - Patterns
JF - Patterns
IS - 11
M1 - 101093
ER -