TY - JOUR
T1 - RPI-MDLStack: Predicting RNA–protein interactions through deep learning with stacking strategy and LASSO
AU - Yu, Bin
AU - Wang, Xue
AU - Zhang, Yaqun
AU - Gao, Hongli
AU - Wang, Yifei
AU - Liu, Yushuang
AU - Gao, Xin
N1 - KAUST Repository Item: Exported on 2022-04-21
Acknowledged KAUST grant number(s): FCC/1/1976-18-01, REI/1/4216-01-01, REI/1/4473-01-01, REI/1/4742-01-01, URF/1/4098-01-01, URF/1/4379-01-0
Acknowledgements: We thank the anonymous reviewers for their valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (No. 62172248), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under award Nos. FCC/1/1976-18-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, URF/1/4379-01-01, REI/1/4742-01-01 and URF/1/4098-01-01.
PY - 2022/3/18
Y1 - 2022/3/18
AB - RNA–protein interactions (RPI) play a crucial role in fundamental cellular physiological processes. Traditional methods for identifying RPI rely on expensive and labor-intensive biological experiments, and existing computational methods are far from satisfactory. There is therefore a pressing need for more cost-effective methods to predict RPI. In this study, a stacking ensemble deep learning-based framework, named RPI-MDLStack, is constructed for RPI prediction. First, sequential, physicochemical, structural and evolutionary information is extracted from RNA and protein sequences using eight feature extraction methods. Then, the optimal feature subset is generated by eliminating redundancy from the fused features with the least absolute shrinkage and selection operator (LASSO). Following the stacking strategy, the optimal features are first learned by a combination of base classifiers: multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), gated recurrent unit (GRU) and deep neural network (DNN). Finally, the prediction scores of the base classifiers are fed into a discriminative model for further training. The results of 5-fold cross-validation demonstrate the superior performance of RPI-MDLStack, with accuracies of 96.7%, 87.3%, 94.6%, 97.1% and 89.5% on RPI488, RPI369, RPI2241, RPI1807 and RPI1446, respectively. Additionally, RPI-MDLStack achieved an overall prediction accuracy of 97.8% on independent test sets using a model trained on RPI488. Compared with other state-of-the-art RPI prediction methods on the same datasets, RPI-MDLStack is more robust and stable in predicting RPI.
UR - http://hdl.handle.net/10754/676342
UR - https://linkinghub.elsevier.com/retrieve/pii/S156849462200148X
UR - http://www.scopus.com/inward/record.url?scp=85126516969&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2022.108676
DO - 10.1016/j.asoc.2022.108676
M3 - Article
SN - 1568-4946
VL - 120
SP - 108676
JO - Applied Soft Computing
JF - Applied Soft Computing
ER -