TY - JOUR
T1 - DEEPStack-RBP: Accurate identification of RNA-binding proteins based on autoencoder feature selection and deep stacking ensemble classifier
AU - Wei, Qinqin
AU - Zhang, Qingmei
AU - Gao, Hongli
AU - Song, Tao
AU - Salhi, Adil
AU - Yu, Bin
N1 - KAUST Repository Item: Exported on 2022-10-03
Acknowledgements: We thank anonymous reviewers for valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (Nos. 62172248, 61863010), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the Key Laboratory Open Foundation of Hainan Province, China (No. JSKX202001).
PY - 2022/9/23
Y1 - 2022/9/23
N2 - RNA-binding proteins (RBPs) are involved in a number of biological processes, such as RNA synthesis, protein folding, and alternative splicing. Predicting RBPs can facilitate the discovery and treatment of human diseases, such as muscle atrophy, nervous system diseases, and cancer. However, identifying RBPs with experimental methods still faces various challenges. Computational methods, and deep learning in particular, are being deployed to alleviate some of these challenges and to provide new avenues of investigation in RBP prediction. Here, we propose DEEPStack-RBP, a novel RBP prediction tool based on deep learning and ensemble learning. First, conjoint triad (CT), local descriptors (LD), pseudo amino acid composition (PseAAC), multivariate mutual information (MMI), and position-specific scoring matrix-transition probability composition (PSSM-TPC) are applied to extract multiple features from the proteins. Subsequently, an autoencoder (AE) is used to eliminate redundancy in the features, and SMOTE-ENN is employed to balance the samples by minimizing the difference between the numbers of positive and negative cases. Finally, a stacked ensemble classifier composed of bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), and support vector machine (SVM) is used for prediction. On the training dataset RBP9873, the accuracy (ACC) of DEEPStack-RBP reaches 98.76% with a Matthews correlation coefficient (MCC) of 0.9508. On the three independent test datasets of Human, S. cerevisiae, and A. thaliana, the accuracy of the model is 97.16%, 97.67%, and 99.57%, respectively, and the MCC is 0.9405, 0.9499, and 0.9906, respectively. These results show that DEEPStack-RBP can be used as a powerful tool for RBP prediction.
AB - RNA-binding proteins (RBPs) are involved in a number of biological processes, such as RNA synthesis, protein folding, and alternative splicing. Predicting RBPs can facilitate the discovery and treatment of human diseases, such as muscle atrophy, nervous system diseases, and cancer. However, identifying RBPs with experimental methods still faces various challenges. Computational methods, and deep learning in particular, are being deployed to alleviate some of these challenges and to provide new avenues of investigation in RBP prediction. Here, we propose DEEPStack-RBP, a novel RBP prediction tool based on deep learning and ensemble learning. First, conjoint triad (CT), local descriptors (LD), pseudo amino acid composition (PseAAC), multivariate mutual information (MMI), and position-specific scoring matrix-transition probability composition (PSSM-TPC) are applied to extract multiple features from the proteins. Subsequently, an autoencoder (AE) is used to eliminate redundancy in the features, and SMOTE-ENN is employed to balance the samples by minimizing the difference between the numbers of positive and negative cases. Finally, a stacked ensemble classifier composed of bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), and support vector machine (SVM) is used for prediction. On the training dataset RBP9873, the accuracy (ACC) of DEEPStack-RBP reaches 98.76% with a Matthews correlation coefficient (MCC) of 0.9508. On the three independent test datasets of Human, S. cerevisiae, and A. thaliana, the accuracy of the model is 97.16%, 97.67%, and 99.57%, respectively, and the MCC is 0.9405, 0.9499, and 0.9906, respectively. These results show that DEEPStack-RBP can be used as a powerful tool for RBP prediction.
UR - http://hdl.handle.net/10754/681764
UR - https://linkinghub.elsevier.com/retrieve/pii/S0950705122009686
UR - http://www.scopus.com/inward/record.url?scp=85138345446&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2022.109875
DO - 10.1016/j.knosys.2022.109875
M3 - Article
SN - 0950-7051
VL - 256
SP - 109875
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
ER -