TY - JOUR
T1 - Evolving Transcription Factor Binding Site Models From Protein Binding Microarray Data
AU - Wong, Ka-Chun
AU - Peng, Chengbin
AU - Li, Yue
N1 - KAUST Repository Item: Exported on 2020-10-01. Acknowledgements: This work was supported in part by the City University of Hong Kong under Project 7200444/CS, and in part by the Amazon Web Service Research Grant.
PY - 2016/2/2
Y1 - 2016/2/2
N2 - Protein binding microarray (PBM) is a high-throughput platform that can measure the DNA binding preference of a protein in a comprehensive and unbiased manner. In this paper, we describe the PBM motif model building problem. We apply several evolutionary computation methods and compare their performance with that of the interior point method, demonstrating their performance advantages. In addition, given the PBM domain knowledge, we propose and describe a novel method called kmerGA, which makes domain-specific assumptions to exploit PBM data properties and builds more accurate models than the other methods. The effectiveness and robustness of kmerGA are supported by comprehensive performance benchmarking on more than 200 datasets, time complexity analysis, convergence analysis, parameter analysis, and case studies. To demonstrate its utility further, kmerGA is applied to two real-world applications: 1) PBM rotation testing and 2) ChIP-Seq peak sequence prediction. The results support the biological relevance of the models learned by kmerGA, and thus its real-world applicability.
UR - http://hdl.handle.net/10754/597027
UR - http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7396941
UR - http://www.scopus.com/inward/record.url?scp=84958643010&partnerID=8YFLogxK
U2 - 10.1109/TCYB.2016.2519380
DO - 10.1109/TCYB.2016.2519380
M3 - Article
SN - 2168-2267
VL - 47
SP - 415
EP - 424
JO - IEEE Transactions on Cybernetics
JF - IEEE Transactions on Cybernetics
IS - 2
ER -