LigandRFs: Random forest ensemble to identify ligand-binding residues from sequence information alone

Peng Chen, Jianhua Z. Huang, Xin Gao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

22 Scopus citations

Abstract

Background: Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of a protein that physically bind to ligands. Despite recent advances in computational prediction of protein-ligand binding sites, state-of-the-art methods still search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is not commonly available.

Results: In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effect of the choice of sliding residue window size when encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets, on each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor.

Conclusions: Experimental results on the CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
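The abstract names three ingredients: sliding-window feature encoding, several balanced data sets, and an ensemble of classifiers. The Python sketch below illustrates only that general shape and is not the authors' implementation. The one-hot encoding, the window half-widths, the concatenation used to combine windows, the function names, and the toy sequence and labels are all assumptions made for illustration. Training a random forest on each balanced subset is left out to keep the example dependency-free, so the ensemble step simply averages the scores of arbitrary callables.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def window_features(sequence, center, half_width):
    """One-hot encode the residues in a window centred on `center`.

    Positions that fall off either end of the sequence, and residues
    outside the 20 standard letters, are encoded as all zeros.
    """
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features


def combined_features(sequence, center, half_widths=(1, 2)):
    """Concatenate encodings from several window sizes.

    Concatenation is an assumed, simple way to combine windows;
    the abstract does not specify the combination technique.
    """
    combined = []
    for half_width in half_widths:
        combined.extend(window_features(sequence, center, half_width))
    return combined


def balanced_subsets(positives, negatives, n_subsets, seed=0):
    """Build n_subsets data sets, each holding every positive sample
    plus an equally sized random draw of negative samples."""
    if len(negatives) < len(positives):
        raise ValueError("expected at least as many negatives as positives")
    rng = random.Random(seed)
    return [positives + rng.sample(negatives, len(positives))
            for _ in range(n_subsets)]


def ensemble_score(classifiers, x):
    """Average the scores of the per-subset classifiers for one sample."""
    return sum(clf(x) for clf in classifiers) / len(classifiers)


# Toy data (hypothetical): label 1 marks a ligand-binding residue.
sequence = "MKTAYIAKQR"
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

samples = [(combined_features(sequence, i), labels[i])
           for i in range(len(sequence))]
positives = [s for s in samples if s[1] == 1]
negatives = [s for s in samples if s[1] == 0]
subsets = balanced_subsets(positives, negatives, n_subsets=3)
```

In a fuller version, one classifier would be fitted per element of `subsets` (for example scikit-learn's `RandomForestClassifier`), and `ensemble_score` would average their predicted probabilities for each residue.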

Original language: English (US)
Article number: 4
Journal: BMC Bioinformatics
Volume: 15
State: Published - 2014

ASJC Scopus subject areas

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
