TY - JOUR
T1 - Prediction of functionally important sites from protein sequences using sparse kernel least squares classifiers
AU - Tang, Ke
AU - Pugalenthi, Ganesan
AU - Suganthan, P. N.
AU - Lanczycki, Christopher J.
AU - Chakrabarti, Saikat
N1 - Funding Information:
K.T. is financially supported by a National Natural Science Foundation of China grant (No. 60802036). G.P. and P.N.S. acknowledge the financial support offered by the A*Star (Agency for Science, Technology and Research). S.C. and C.J.L. acknowledge the support provided by the Intramural Research Program of the National Library of Medicine at the National Institutes of Health/DHHS.
PY - 2009/6/26
Y1 - 2009/6/26
AB - Identification of functionally important sites (FIS) in proteins is a critical problem, and is of particular importance where protein structural information is limited. Machine learning techniques have proved very useful for classification in many important biological problems. In this paper, we adopt the sparse kernel least squares classifiers (SKLSC) approach for the classification and prediction of FIS using features derived from protein sequences. The SKLSC algorithm was applied to 5435 FIS extracted from 312 reliable alignments covering a wide range of protein families. We obtained 68.28% sensitivity and 68.66% specificity on the training dataset, and 65.34% sensitivity and 66.88% specificity on the testing dataset. Further, a large-scale benchmarking study using alignments of 101 protein families containing 1899 FIS showed that our method achieved an average sensitivity of ∼70% in predicting different types of FIS, such as active sites and metal, ligand, or protein binding sites. Our findings also indicate that active sites and metal binding sites are comparatively easier to predict than ligand and protein binding sites. Despite this moderate success, our results suggest the usefulness and potential of the SKLSC approach for predicting FIS using only information derived from protein sequences.
KW - Functionally important sites
KW - Machine learning algorithms
KW - Protein functional templates
KW - Sparse kernel least squares classifiers
UR - http://www.scopus.com/inward/record.url?scp=65649137001&partnerID=8YFLogxK
U2 - 10.1016/j.bbrc.2009.04.096
DO - 10.1016/j.bbrc.2009.04.096
M3 - Article
C2 - 19394310
AN - SCOPUS:65649137001
SN - 0006-291X
VL - 384
SP - 155
EP - 159
JO - Biochemical and Biophysical Research Communications
JF - Biochemical and Biophysical Research Communications
IS - 2
ER -