TY - JOUR
T1 - Prediction of Metabolic Pathway Involvement in Prokaryotic UniProtKB Data by Association Rule Mining
AU - Boudellioua, Imene
AU - Saidi, Rabie
AU - Hoehndorf, Robert
AU - Martin, Maria J.
AU - Solovyev, Victor
N1 - Acknowledgements: IB, RH and VS were supported by funding provided by the King Abdullah University of Science and Technology.
PY - 2016/7/8
Y1 - 2016/7/8
AB - The widening gap between known proteins and their functions has encouraged the development of methods to automatically infer annotations. Automatic functional annotation of proteins is expected to meet the conflicting requirements of maximizing annotation coverage while minimizing erroneous functional assignments. This trade-off poses a great challenge in designing intelligent systems to tackle the problem of automatic protein annotation. In this work, we present a system that utilizes rule mining techniques to predict metabolic pathways in prokaryotes. The resulting knowledge represents predictive models that assign pathway involvement to UniProtKB entries. We evaluated the performance of our system using cross-validation. It achieved very promising results in pathway identification, with an F1-measure of 0.982 and an AUC of 0.987. Our prediction models were then successfully applied to 6.2 million UniProtKB/TrEMBL reference proteome entries of prokaryotes. As a result, 663,724 entries were covered, of which 436,510 lacked any previous pathway annotations.
UR - http://hdl.handle.net/10754/617797
UR - http://dx.plos.org/10.1371/journal.pone.0158896
UR - http://www.scopus.com/inward/record.url?scp=84978698009&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0158896
DO - 10.1371/journal.pone.0158896
M3 - Article
C2 - 27390860
SN - 1932-6203
VL - 11
SP - e0158896
JO - PLoS ONE
JF - PLoS ONE
IS - 7
ER -