Abstract
Protein–ligand binding is important for the function of some proteins. Protein–ligand binding sites are the residues of a protein that physically bind to ligands. Despite recent advances in the computational prediction of protein–ligand binding sites, state-of-the-art methods search for known structures similar to the query and predict binding sites from those solved structures. However, such structural information is not commonly available. This chapter proposes a sequence-based approach to identifying protein–ligand binding residues. We propose a combination scheme that reduces the effect of the sliding residue window size chosen when encoding input vectors. Moreover, because the samples of ligand-binding and non-ligand-binding sites are highly imbalanced, we construct several balanced data sets and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein–ligand binding site predictor. Experimental results on CASP9 and CASP8 targets demonstrate that our method compares favorably with the state of the art.
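The abstract does not give implementation details, so the following is only a minimal sketch of one way the described pipeline could be laid out. It covers extracting residue windows of several sizes, building balanced subsets by undersampling the larger class, and averaging the scores of per-subset models. The padding symbol `X`, the window sizes, the function names, and the use of plain undersampling are all assumptions made for illustration. The per-subset classifier is left as a generic scoring function, where a random forest would be plugged in.

```python
import random
from typing import Callable, List, Sequence, Tuple

PAD = "X"  # assumed padding symbol for positions beyond the sequence ends


def window(seq: str, i: int, size: int) -> str:
    """Return the window of odd length `size` centred on residue i."""
    half = size // 2
    padded = PAD * half + seq + PAD * half
    return padded[i:i + size]


def multi_window(seq: str, i: int, sizes: Sequence[int] = (5, 9)) -> List[str]:
    """Combine windows of several sizes (assumed values) for one residue."""
    return [window(seq, i, s) for s in sizes]


def balanced_subsets(
    pos: Sequence, neg: Sequence, n_subsets: int, seed: int = 0
) -> List[Tuple[list, list]]:
    """Pair all positives with an equally sized random draw of negatives.

    Assumes len(neg) >= len(pos), i.e. negatives are the majority class.
    """
    rng = random.Random(seed)
    return [(list(pos), rng.sample(list(neg), len(pos))) for _ in range(n_subsets)]


def ensemble_score(models: Sequence[Callable[[object], float]], x: object) -> float:
    """Average the scores of the per-subset models for one input."""
    return sum(m(x) for m in models) / len(models)
```

In this layout, each balanced subset would be used to train one classifier, and `ensemble_score` would combine their outputs for a query residue. Averaging is only one possible combination rule; the chapter may use a different one.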
Original language | English (US) |
---|---|
Title of host publication | Computational Intelligence in Protein-Ligand Interaction Analysis |
Publisher | Elsevier |
Pages | 1-25 |
Number of pages | 25 |
ISBN (Electronic) | 9780128243862 |
ISBN (Print) | 9780128244357 |
State | Published - Jan 1 2024 |
Keywords
- Binding site
- Ensembling
- Imbalanced sample
- Protein-ligand binding
- Random forest
ASJC Scopus subject areas
- General Biochemistry, Genetics and Molecular Biology