TY - JOUR
T1 - EcmPred: Prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection
AU - Kandaswamy, Krishna Kumar Umar
AU - Ganesan, Pugalenthi
AU - Kalies, Kai Uwe
AU - Hartmann, Enno
AU - Martinetz, Thomas M.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by the Graduate School for Computing in Medicine and Life Sciences funded by Germany's Excellence Initiative [DFG GSC 235/1]. KKK acknowledges Dr. Bianca Habermann, Max Planck Institute for Biology of Ageing, Germany for her support.
PY - 2013/1
Y1 - 2013/1
N2 - The extracellular matrix (ECM) is a major component of tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training and 77% on the test dataset. EcmPred predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins validated with Gene Ontology and InterPro. The dataset and standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred. © 2012 Elsevier Ltd.
AB - The extracellular matrix (ECM) is a major component of tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training and 77% on the test dataset. EcmPred predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins validated with Gene Ontology and InterPro. The dataset and standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred. © 2012 Elsevier Ltd.
UR - http://hdl.handle.net/10754/562580
UR - https://linkinghub.elsevier.com/retrieve/pii/S0022519312005486
UR - http://www.scopus.com/inward/record.url?scp=84872408863&partnerID=8YFLogxK
U2 - 10.1016/j.jtbi.2012.10.015
DO - 10.1016/j.jtbi.2012.10.015
M3 - Article
C2 - 23123454
SN - 0022-5193
VL - 317
SP - 377
EP - 383
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
ER -