TY - JOUR
T1 - PointSite: A Point Cloud Segmentation Tool for Identification of Protein Ligand Binding Atoms
AU - Yan, Xu
AU - Lu, Yingfeng
AU - Li, Zhen
AU - Wei, Qing
AU - Gao, Xin
AU - Wang, Sheng
AU - Wu, Song
AU - Cui, Shuguang
N1 - KAUST Repository Item: Exported on 2022-05-30
Acknowledgements: This work was supported in part by NSFC-Youth 61902335, by Key Area R&D Program of Guangdong Province with Grant No. 2018B030338001, by the National Key R&D Program of China with Grant No. 2018YFB1800800, by Shenzhen Outstanding Talents Training Fund, by Guangdong Research Project No. 2017ZT07X152, by Guangdong Regional Joint Fund-Key Projects 2019B1515120039, by the NSFC 61931024 and 81922046, by the Helixon Biotechnology Company Fund, and by the CCF-Tencent Open Fund.
PY - 2022/5/27
Y1 - 2022/5/27
N2 - Accurate identification of ligand binding sites (LBS) on a protein structure is critical for understanding protein function and designing structure-based drugs. Because previous pocket-centric methods are usually based on the investigation of pseudo-surface points outside the protein structure, they cannot take full advantage of the local connectivity of atoms within the protein or of the global 3D geometrical information from all the protein atoms. In this paper, we propose a novel point cloud segmentation method, PointSite, for accurate identification of protein ligand binding atoms, which performs protein LBS identification at the atom level in a protein-centric manner. Specifically, we first convert the original 3D protein structure to a point cloud and then conduct segmentation through a Submanifold Sparse Convolution-based U-Net. With the fine-grained atom-level representation of binding atoms and enhanced feature learning, PointSite can outperform previous methods in atom Intersection over Union (atom-IoU) by a large margin. Furthermore, our segmented binding atoms, that is, atoms assigned a high binding probability by our model, can work as a filter on predictions produced by previous pocket-centric approaches, which significantly decreases the number of false-positive LBS candidates. In addition, we directly apply PointSite trained on bound proteins to LBS identification on unbound proteins, which demonstrates the superior generalization capacity of PointSite. Through cascaded filtering and reranking aided by the segmented atoms, state-of-the-art performance can be achieved over various canonical benchmarks, CAMEO hard targets, and unbound proteins in terms of the commonly used DCA criteria.
UR - http://hdl.handle.net/10754/668465
UR - https://pubs.acs.org/doi/10.1021/acs.jcim.1c01512
U2 - 10.1021/acs.jcim.1c01512
DO - 10.1021/acs.jcim.1c01512
M3 - Article
C2 - 35621730
SN - 1549-9596
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
ER -