TY - GEN
T1 - LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone
AU - Chen, Peng
AU - Huang, Jianhua Z
AU - Gao, Xin
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): GRP-CF-2011-19-P-Gao-Huang, KUS-CI-016-04
Acknowledgements: This work was supported by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST). This work was also supported by the National Natural Science Foundation of China (Nos. 61300058, 61374181 and 61472282). Publication charges for this article have been funded by Award Numbers KUS-CI-016-04 and GRP-CF-2011-19-P-Gao-Huang, made by King Abdullah University of Science and Technology (KAUST).
PY - 2014/12/3
Y1 - 2014/12/3
N2 - Background
Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is not commonly available.
Results
In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor.
Conclusions
Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
AB - Background
Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in the computational prediction of protein-ligand binding sites, state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is not commonly available.
Results
In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because the samples are highly imbalanced between ligand-binding sites and non-ligand-binding sites, we construct several balanced data sets and train a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor.
Conclusions
Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with the state-of-the-art protein-ligand binding site prediction methods.
UR - http://hdl.handle.net/10754/344396
UR - http://www.biomedcentral.com/1471-2105/15/S15/S4
UR - http://www.scopus.com/inward/record.url?scp=84961572705&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-15-S15-S4
DO - 10.1186/1471-2105-15-S15-S4
M3 - Conference contribution
C2 - 25474163
SP - S4
BT - BMC Bioinformatics
PB - Springer Nature
ER -