TY - JOUR
T1 - Learning probabilistic models of hydrogen bond stability from molecular dynamics simulation trajectories
AU - Chikalov, Igor
AU - Yao, Peggy
AU - Moshkov, Mikhail
AU - Latombe, Jean-Claude
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2011/2/18
Y1 - 2011/2/18
N2 - Background: Hydrogen bonds (H-bonds) play a key role in both the formation and stabilization of protein structures. They form and break while a protein deforms, for instance during the transition from a non-functional to a functional state. The intrinsic strength of an individual H-bond has been studied from an energetic viewpoint, but energy alone may not be a very good predictor.Methods: This paper describes inductive learning methods to train protein-independent probabilistic models of H-bond stability from molecular dynamics (MD) simulation trajectories of various proteins. The training data contains 32 input attributes (predictors) that describe an H-bond and its local environment in a conformation c and the output attribute is the probability that the H-bond will be present in an arbitrary conformation of this protein achievable from c within a time duration ?. We model dependence of the output variable on the predictors by a regression tree.Results: Several models are built using 6 MD simulation trajectories containing over 4000 distinct H-bonds (millions of occurrences). Experimental results demonstrate that such models can predict H-bond stability quite well. They perform roughly 20% better than models based on H-bond energy alone. In addition, they can accurately identify a large fraction of the least stable H-bonds in a conformation. In most tests, about 80% of the 10% H-bonds predicted as the least stable are actually among the 10% truly least stable. The important attributes identified during the tree construction are consistent with previous findings.Conclusions: We use inductive learning methods to build protein-independent probabilistic models to study H-bond stability, and demonstrate that the models perform better than H-bond energy alone. 2011 Chikalov et al; licensee BioMed Central Ltd.
AB - Background: Hydrogen bonds (H-bonds) play a key role in both the formation and stabilization of protein structures. They form and break while a protein deforms, for instance during the transition from a non-functional to a functional state. The intrinsic strength of an individual H-bond has been studied from an energetic viewpoint, but energy alone may not be a very good predictor.Methods: This paper describes inductive learning methods to train protein-independent probabilistic models of H-bond stability from molecular dynamics (MD) simulation trajectories of various proteins. The training data contains 32 input attributes (predictors) that describe an H-bond and its local environment in a conformation c and the output attribute is the probability that the H-bond will be present in an arbitrary conformation of this protein achievable from c within a time duration ?. We model dependence of the output variable on the predictors by a regression tree.Results: Several models are built using 6 MD simulation trajectories containing over 4000 distinct H-bonds (millions of occurrences). Experimental results demonstrate that such models can predict H-bond stability quite well. They perform roughly 20% better than models based on H-bond energy alone. In addition, they can accurately identify a large fraction of the least stable H-bonds in a conformation. In most tests, about 80% of the 10% H-bonds predicted as the least stable are actually among the 10% truly least stable. The important attributes identified during the tree construction are consistent with previous findings.Conclusions: We use inductive learning methods to build protein-independent probabilistic models to study H-bond stability, and demonstrate that the models perform better than H-bond energy alone. 2011 Chikalov et al; licensee BioMed Central Ltd.
UR - http://hdl.handle.net/10754/325468
UR - https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-S1-S34
UR - http://www.scopus.com/inward/record.url?scp=79951538322&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-S1-S34
DO - 10.1186/1471-2105-12-S1-S34
M3 - Article
C2 - 21342565
SN - 1471-2105
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
IS - S1
ER -