RNA–protein interactions (RPI) play a crucial role in fundamental cellular physiological processes. Traditional methods for predicting RPI rely on expensive and labor-intensive biological experiments, and existing computational methods remain far from satisfactory. There is therefore a pressing need for more cost-effective methods to predict RPI. In this study, a stacking ensemble deep learning-based framework (named RPI-MDLStack) is constructed for RPI prediction. First, sequential, physicochemical, structural, and evolutionary information is extracted from RNA and protein sequences using eight feature extraction methods. Then, the least absolute shrinkage and selection operator (LASSO) is applied to eliminate redundancy from the fused features, yielding an optimal feature subset. Following the stacking strategy, this optimal feature subset is first learned by a combination of base classifiers: multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), gated recurrent unit (GRU), and deep neural network (DNN). Finally, the base classifiers' prediction scores are fed into a discriminative model for further training. The results of 5-fold cross-validation demonstrate the superior identification performance of RPI-MDLStack, with accuracies of 96.7%, 87.3%, 94.6%, 97.1%, and 89.5% on RPI488, RPI369, RPI2241, RPI1807, and RPI1446, respectively. Additionally, RPI-MDLStack achieved an overall prediction accuracy of 97.8% in independent tests using a model trained on RPI488. Compared with other state-of-the-art RPI prediction methods on the same datasets, RPI-MDLStack is more robust and stable in predicting RPI.
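The pipeline described above (LASSO selection over fused features, base classifiers, then a discriminative meta-model trained on their prediction scores) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: synthetic data from `make_classification` stands in for the fused RNA/protein feature vectors, the LASSO `alpha` is an arbitrary choice, only the MLP, SVM, and RF base learners are included (the GRU and DNN base learners are omitted), and logistic regression is a hypothetical choice for the final discriminative model. The helper names `make_toy_data`, `build_stacking_model`, and `evaluate` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_toy_data(seed: int = 0):
    """Hypothetical stand-in for fused RNA/protein feature vectors (binary RPI labels)."""
    return make_classification(
        n_samples=300, n_features=100, n_informative=10, random_state=seed
    )


def build_stacking_model(alpha: float = 0.01, seed: int = 0):
    """LASSO feature selection followed by a stacked ensemble (simplified sketch)."""
    # Base learners: MLP, SVM, RF only; GRU and DNN from the paper are omitted here.
    base_learners = [
        ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=seed)),
        ("svm", SVC(probability=True, random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        # Assumed meta-learner: logistic regression on the base learners' scores.
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,  # meta-learner is trained on out-of-fold base-learner predictions
        stack_method="predict_proba",
    )
    return make_pipeline(
        StandardScaler(),
        # Keep only features whose LASSO coefficient is (effectively) non-zero.
        SelectFromModel(Lasso(alpha=alpha, max_iter=10000), threshold=1e-5),
        stack,
    )


def evaluate(seed: int = 0) -> float:
    """Mean accuracy of the sketch under stratified 5-fold cross-validation."""
    X, y = make_toy_data(seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(build_stacking_model(seed=seed), X, y, cv=cv, scoring="accuracy")
    return float(scores.mean())


if __name__ == "__main__":
    print(f"5-fold accuracy on synthetic data: {evaluate():.3f}")
```

Placing the scaler and LASSO selector inside the pipeline means feature selection is refit within each cross-validation fold, which avoids leaking information from held-out folds into the selected feature subset.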