TY - JOUR
T1 - Towards fully automated structure-based NMR resonance assignment of 15N-labeled proteins from automatically picked peaks
AU - Jang, Richard
AU - Gao, Xin
AU - Li, Ming
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We would like to thank Xiong, Pandurangan, and Bailey-Kellogg for providing us with their program and the test data for five proteins. We would like to thank our colleagues Babak Alipanahi, Frank Balbach, Dongbo Bu, Thorsten Dieckmann, Logan Donaldson, Emre Karakoc, and Shuai Cheng Li for thoughtful discussions. This work is partially supported by NSERC (Grant OGP0046506), China's MOST 863 (Grant 2008AA02Z313), Canada Research Chair program, MITACS, an NSERC Collaborative Grant, Premier's Discovery Award, SHARCNET, Cheriton Scholarship, and a grant from King Abdullah University of Science and Technology.
PY - 2011/3
Y1 - 2011/3
N2 - In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both 15N-labeled and 13C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only 15N-labeled NMR data and avoid the added expense of using 13C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using 15N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra without any manual intervention via automatically picked peaks, which are much noisier than manually picked peaks, so methods must be error-tolerant. In the case of semi-automated/manually processed peak data, we compare our system with the Xiong-Pandurangan-Bailey-Kellogg contact replacement (CR) method, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method by a factor of five on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared to the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications in NMR protein mutant studies, where the assignment step is repeated for each mutant. © Copyright 2011, Mary Ann Liebert, Inc.
AB - In NMR resonance assignment, an indispensable step in NMR protein studies, manually processed peaks from both 15N-labeled and 13C-labeled spectra are typically used as inputs. However, the use of homologous structures can allow one to use only 15N-labeled NMR data and avoid the added expense of using 13C-labeled data. We propose a novel integer programming framework for structure-based backbone resonance assignment using 15N-labeled data. The core consists of a pair of integer programming models: one for spin system forming and amino acid typing, and the other for backbone resonance assignment. The goal is to perform the assignment directly from spectra without any manual intervention via automatically picked peaks, which are much noisier than manually picked peaks, so methods must be error-tolerant. In the case of semi-automated/manually processed peak data, we compare our system with the Xiong-Pandurangan-Bailey-Kellogg contact replacement (CR) method, which is the most error-tolerant method for structure-based resonance assignment. Our system, on average, reduces the error rate of the CR method by a factor of five on their data set. In addition, by using an iterative algorithm, our system has the added capability of using the NOESY data to correct assignment errors due to errors in predicting the amino acid and secondary structure type of each spin system. On a publicly available data set for human ubiquitin, where the typing accuracy is 83%, we achieve 91% accuracy, compared to the 59% accuracy obtained without correcting for such errors. In the case of automatically picked peaks, using assignment information from yeast ubiquitin, we achieve a fully automatic assignment with 97% accuracy. To our knowledge, this is the first system that can achieve fully automatic structure-based assignment directly from spectra. This has implications in NMR protein mutant studies, where the assignment step is repeated for each mutant. © Copyright 2011, Mary Ann Liebert, Inc.
UR - http://hdl.handle.net/10754/564361
UR - http://www.liebertpub.com/doi/10.1089/cmb.2010.0251
UR - http://www.scopus.com/inward/record.url?scp=79952374427&partnerID=8YFLogxK
U2 - 10.1089/cmb.2010.0251
DO - 10.1089/cmb.2010.0251
M3 - Article
C2 - 21385039
SN - 1066-5277
VL - 18
SP - 347
EP - 363
JO - Journal of Computational Biology
JF - Journal of Computational Biology
IS - 3
ER -