TY - JOUR
T1 - Deep neural network prediction of genome-wide transcriptome signatures – beyond the Black-box
AU - Magnusson, Rasmus
AU - Tegner, Jesper
AU - Gustafsson, Mika
N1 - KAUST Repository Item: Exported on 2022-04-26
Acknowledgements: Supported by the Swedish Research Council (grant 2019-04193) (M.G.), the Swedish Foundation for Strategic Research (grant SB16-0095) (R.M., M.G.), the Center for Industrial IT (CENIIT) (R.M., M.G.), the Systems Biology Research Centre at the University of Skövde under grants from the Knowledge Foundation (grant 20200014) (R.M.), and the King Abdullah University of Science and Technology (KAUST) (J.N.T.). Computational resources were granted by the Swedish National Infrastructure for Computing (SNIC; SNIC 2020/5-177, and LiU-2019-25). The authors would like to thank Dr. Robert Lehman for his helpful suggestions to improve this manuscript.
PY - 2022/2/23
Y1 - 2022/2/23
AB - Prediction algorithms for protein or gene structures, including transcription factor binding from sequence information, have been transformative in understanding gene regulation. Here we ask whether human transcriptomic profiles can be predicted solely from the expression of transcription factors (TFs). We find that the expression of 1600 TFs can explain >95% of the variance in 25,000 genes. Using the light-up technique to inspect the trained NN, we find an over-representation of known TF-gene regulations. Furthermore, the learned prediction network has a hierarchical organization. A smaller set of around 125 core TFs could explain close to 80% of the variance. Interestingly, reducing the number of TFs below 500 induces a rapid decline in prediction performance. Next, we evaluated the prediction model using transcriptional data from 22 human diseases. The TFs were sufficient to predict the dysregulation of the target genes (rho = 0.61, P < 10^-216). By inspecting the model, key causative TFs could be extracted for subsequent validation using disease-associated genetic variants. We demonstrate a methodology for constructing an interpretable neural network predictor, where analyses of the predictors identified key TFs that were inducing transcriptional changes during disease.
UR - http://hdl.handle.net/10754/676526
UR - https://www.nature.com/articles/s41540-022-00218-9
UR - http://www.scopus.com/inward/record.url?scp=85125212745&partnerID=8YFLogxK
U2 - 10.1038/s41540-022-00218-9
DO - 10.1038/s41540-022-00218-9
M3 - Article
C2 - 35197482
SN - 2056-7189
VL - 8
JO - npj Systems Biology and Applications
JF - npj Systems Biology and Applications
IS - 1
ER -