TY - JOUR
T1 - QAUST: protein function prediction using structure similarity search, protein interaction and functional sequence motifs
AU - Smaili, Fatima Z.
AU - Tian, Shuye
AU - Roy, Ambrish
AU - Alazmi, Meshari
AU - Arold, Stefan T.
AU - Mukherjee, Srayanta
AU - Hefty, P. Scott
AU - Chen, Wei
AU - Gao, Xin
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): URF/1/1976-04, URF/1/1976-06
Acknowledgements: We thank Mr. Chengxin Zhang, Dr. Wei Zhang and Professor Yang Zhang for helpful discussions. The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. URF/1/1976-04 and URF/1/1976-06. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [62], which is supported by National Science Foundation grant number ACI-1053575.
PY - 2020
Y1 - 2020
N2 - The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for understanding how biological systems operate. We propose a novel method, QAUST, to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. Our method uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred from protein-protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. The three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the CAFA benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of the TRIM22 protein predicted by QAUST can be experimentally validated. Availability: http://www.cbrc.kaust.edu.sa/qaust/submit/.
AB - The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for understanding how biological systems operate. We propose a novel method, QAUST, to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. Our method uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred from protein-protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. The three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the CAFA benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of the TRIM22 protein predicted by QAUST can be experimentally validated. Availability: http://www.cbrc.kaust.edu.sa/qaust/submit/.
UR - http://hdl.handle.net/10754/661370
M3 - Article
JO - Accepted by Genomics, Proteomics, and Bioinformatics
JF - Accepted by Genomics, Proteomics, and Bioinformatics
ER -