TY - JOUR
T1 - A Comparison Study for DNA Motif Modeling on Protein Binding Microarray
AU - Wong, Ka-Chun
AU - Li, Yue
AU - Peng, Chengbin
AU - Wong, Hau-San
N1 - KAUST Repository Item: Exported on 2020-10-01
PY - 2015/6/11
Y1 - 2015/6/11
N2 - Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k=810). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
AB - Transcription Factor Binding Sites (TFBSs) are relatively short (5-15 bp) and degenerate. Identifying them is a computationally challenging task. In particular, Protein Binding Microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner; for instance, a typical PBM experiment can measure binding signal intensities of a protein to all possible DNA k-mers (k=810). Since proteins can often bind to DNA with different binding intensities, one of the major challenges is to build motif models which can fully capture the quantitative binding affinity data. To learn DNA motif models from the non-convex objective function landscape, several optimization methods are compared and applied to the PBM motif model building problem. In particular, representative methods from different optimization paradigms have been chosen for modeling performance comparison on hundreds of PBM datasets. The results suggest that the multimodal optimization methods are very effective for capturing the binding preference information from PBM data. In particular, we observe a general performance improvement using di-nucleotide modeling over mono-nucleotide modeling. In addition, the models learned by the best-performing method are applied to two independent applications: PBM probe rotation testing and ChIP-Seq peak sequence prediction, demonstrating its biological applicability.
UR - http://hdl.handle.net/10754/584252
UR - http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7122289
UR - http://www.scopus.com/inward/record.url?scp=84963813115&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2015.2443782
DO - 10.1109/TCBB.2015.2443782
M3 - Article
SN - 1545-5963
VL - 13
SP - 261
EP - 271
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 2
ER -