TY - JOUR
T1 - DEEPre: sequence-based enzyme EC number prediction by deep learning
AU - Li, Yu
AU - Wang, Sheng
AU - Umarov, Ramzan
AU - Xie, Bingqing
AU - Fan, Ming
AU - Li, Lihua
AU - Gao, Xin
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): URF/1/1976-04, URF/1/3007-01
Acknowledgements: We would like to thank Prof. Kuo-Chen Chou for kindly providing the KNN dataset. This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. URF/1/1976-04 and URF/1/3007-01, and by the National Natural Science Foundation of China (61401131 and 61731008).
PY - 2017/10/23
Y1 - 2017/10/23
N2 - Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.
AB - Annotation of enzyme function has a broad range of applications, such as metagenomics, industrial biotechnology, and diagnosis of enzyme deficiency-caused diseases. However, the time and resource required make it prohibitively expensive to experimentally determine the function of every enzyme. Therefore, computational enzyme function prediction has become increasingly important. In this paper, we develop such an approach, determining the enzyme function by predicting the Enzyme Commission number. We propose an end-to-end feature selection and classification model training approach, as well as an automatic and robust feature dimensionality uniformization method, DEEPre, in the field of enzyme function prediction. Instead of extracting manually crafted features from enzyme sequences, our model takes the raw sequence encoding as inputs, extracting convolutional and sequential features from the raw encoding based on the classification result to directly improve the prediction performance. The thorough cross-fold validation experiments conducted on two large-scale datasets show that DEEPre improves the prediction performance over the previous state-of-the-art methods. In addition, our server outperforms five other servers in determining the main class of enzymes on a separate low-homology dataset. Two case studies demonstrate DEEPre's ability to capture the functional difference of enzyme isoforms. The server could be accessed freely at http://www.cbrc.kaust.edu.sa/DEEPre.
UR - http://hdl.handle.net/10754/625965
UR - https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx680/4562505/DEEPre-sequencebased-enzyme-EC-number-prediction
UR - http://www.scopus.com/inward/record.url?scp=85042918066&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx680
DO - 10.1093/bioinformatics/btx680
M3 - Article
C2 - 29069344
SN - 1367-4803
VL - 34
SP - 760
EP - 769
JO - Bioinformatics
JF - Bioinformatics
IS - 5
ER -