TY - JOUR
T1 - Deriving disease modules from the compressed transcriptional space embedded in a deep autoencoder
AU - Dwivedi, Sanjiv K.
AU - Tjärnberg, Andreas
AU - Tegner, Jesper
AU - Gustafsson, Mika
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by the Swedish Foundation for Strategic Research and the Swedish Research Council. S.K.D. thanks Andreas Kalin for helpful discussions and suggestions on deep learning, and Tejaswi VS Badam for his other helpful suggestions.
PY - 2020/2/12
Y1 - 2020/2/12
N2 - Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet the biological networks that commonly define such modules are incomplete and biased toward some well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without prior knowledge of a biological network, by instead training a deep autoencoder on large transcriptional data. We hypothesize that modules could be discovered within the autoencoder representations. We find a statistically significant enrichment of genome-wide association study (GWAS)-relevant genes in the last layer, and to a successively lesser degree in the middle and first layers, respectively. In contrast, we find an opposite gradient where a modular protein–protein interaction signal is strongest in the first layer but then vanishes smoothly deeper in the network. We conclude that a data-driven discovery approach is sufficient to discover groups of disease-related genes.
AB - Disease modules in molecular interaction maps have been useful for characterizing diseases. Yet the biological networks that commonly define such modules are incomplete and biased toward some well-studied disease genes. Here we ask whether disease-relevant modules of genes can be discovered without prior knowledge of a biological network, by instead training a deep autoencoder on large transcriptional data. We hypothesize that modules could be discovered within the autoencoder representations. We find a statistically significant enrichment of genome-wide association study (GWAS)-relevant genes in the last layer, and to a successively lesser degree in the middle and first layers, respectively. In contrast, we find an opposite gradient where a modular protein–protein interaction signal is strongest in the first layer but then vanishes smoothly deeper in the network. We conclude that a data-driven discovery approach is sufficient to discover groups of disease-related genes.
UR - http://hdl.handle.net/10754/661511
UR - http://www.nature.com/articles/s41467-020-14666-6
UR - http://www.scopus.com/inward/record.url?scp=85079338123&partnerID=8YFLogxK
U2 - 10.1038/s41467-020-14666-6
DO - 10.1038/s41467-020-14666-6
M3 - Article
C2 - 32051402
SN - 2041-1723
VL - 11
JO - Nature Communications
JF - Nature Communications
IS - 1
ER -