Random forest method for predicting protein ligand–binding residues

Peng Chen, Bing Wang, Jun Zhang, Xin Gao

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

Protein–ligand binding is important for some proteins to perform their functions. Protein–ligand binding sites are the residues of a protein that physically bind to ligands. Despite recent advances in the computational prediction of protein–ligand binding sites, state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is often unavailable. This chapter proposes a sequence-based approach to identifying protein–ligand binding residues. We propose a combination strategy that reduces the effects of different sliding residue windows when encoding input vectors. Moreover, because the samples of ligand-binding and non-ligand-binding sites are highly imbalanced, we constructed several balanced data sets and trained a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein–ligand binding site predictor. Experimental results on CASP9 and CASP8 targets demonstrate that our method compares favorably with the state of the art.
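
The balancing and ensembling idea in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The abstract does not say how the balanced data sets were built or how the classifiers were combined, so two assumptions are made here: each balanced set pairs all binding residues with an equally sized random draw of non-binding residues, and the ensemble averages the per-classifier scores. The names `balanced_subsets` and `ensemble_score` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (feature vector, label); label 1 = binding residue, 0 = non-binding residue
Sample = Tuple[List[float], int]


def balanced_subsets(
    samples: Sequence[Sample], n_subsets: int, seed: int = 0
) -> List[List[Sample]]:
    """Build several balanced data sets by undersampling the negatives.

    Assumption: positives are the minority class, so every subset keeps all
    positives and adds the same number of randomly drawn negatives.
    """
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    subsets = []
    for _ in range(n_subsets):
        # random.sample raises ValueError if there are fewer negatives
        # than positives, which would violate the assumption above.
        drawn = rng.sample(negatives, len(positives))
        subset = positives + drawn
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


def ensemble_score(
    models: Sequence[Callable[[List[float]], float]], features: List[float]
) -> float:
    """Average the binding scores of the individual classifiers.

    Assumption: each model maps a feature vector to a score in [0, 1].
    """
    return sum(model(features) for model in models) / len(models)
```

In such a setup, each subset would be used to train one classifier (for example, a random forest from a library such as scikit-learn), and the trained classifiers' probability outputs would be passed to `ensemble_score` for each residue.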

Original language: English (US)
Title of host publication: Computational Intelligence in Protein-Ligand Interaction Analysis
Publisher: Elsevier
Pages: 1-25
Number of pages: 25
ISBN (Electronic): 9780128243862
ISBN (Print): 9780128244357
State: Published - Jan 1 2024

Keywords

  • Binding site
  • Ensembling
  • Imbalanced sample
  • Protein-ligand binding
  • Random forest

ASJC Scopus subject areas

  • General Biochemistry, Genetics and Molecular Biology
