TY - JOUR
T1 - QAUST
T2 - Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs
AU - Smaili, Fatima Zohra
AU - Tian, Shuye
AU - Roy, Ambrish
AU - Alazmi, Meshari
AU - Arold, Stefan T.
AU - Mukherjee, Srayanta
AU - Hefty, P. Scott
AU - Chen, Wei
AU - Gao, Xin
N1 - Funding Information:
We thank Mr. Chengxin Zhang, Dr. Wei Zhang and Professor Yang Zhang for helpful discussions. The research reported in this publication was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Grant Nos. URF/1/1976-04 and URF/1/1976-06.
Publisher Copyright:
© 2021 The Authors
PY - 2021/12
Y1 - 2021/12
AB - The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for the understanding of how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred by protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
KW - EC number
KW - Functionally discriminative motif
KW - GO term
KW - Protein function prediction
KW - Protein structure similarity
UR - http://www.scopus.com/inward/record.url?scp=85120914343&partnerID=8YFLogxK
U2 - 10.1016/j.gpb.2021.02.001
DO - 10.1016/j.gpb.2021.02.001
M3 - Article
C2 - 33631427
AN - SCOPUS:85120914343
SN - 1672-0229
VL - 19
SP - 998
EP - 1011
JO - Genomics, Proteomics and Bioinformatics
JF - Genomics, Proteomics and Bioinformatics
IS - 6
ER -