TY - JOUR
T1 - A Bayesian Search for Transcriptional Motifs
AU - Miller, Andrew K.
AU - Print, Cristin G.
AU - Nielsen, Poul M. F.
AU - Crampin, Edmund J.
N1 - KAUST Repository Item: Exported on 2021-09-16
Acknowledgements: This work was supported by the New Zealand Tertiary Education Commission [Top Achiever’s Doctoral Scholarship] to AKM (http://www.tec.govt.nz); the New Zealand Health Research Council International Investment Opportunities Fund (http://www.hrc.govt.nz/) to CGP and EJC; the Breast Cancer Research Trust (http://www.breastcancercure.org.nz/) to CGP and EJC; and the New Zealand Foundation for Research, Science and Technology to CGP and EJC (http://www.frst.govt.nz/). This publication is based on work (by EJC) that was supported in part by award No. KUK-C1-013-04, made by King Abdullah University of Science and Technology (http://www.kaust.edu.sa/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2010
Y1 - 2010
N2 - Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to use gaplessly aligned, experimentally supported TFBSs for a particular TF, and algorithmically search for more occurrences of the same TFBSs. The largest publicly available databases of TF binding specificities contain models which are represented as position weight matrices (PWM). There are other methods using more sophisticated representations, but these have more limited databases, or aren't publicly available. Therefore, this paper focuses on methods that search using one PWM per TF. An algorithm, MATCH™, for identifying TFBSs corresponding to a particular PWM is available, but is not based on a rigorous statistical model of TF binding, making it difficult to interpret or adjust the parameters and output of the algorithm. Furthermore, there is no public description of the algorithm sufficient to exactly reproduce it. Another algorithm, MAST, computes a p-value for the presence of a TFBS using true probabilities of finding each base at each offset from that position. We developed a statistical model, BaSeTraM, for the binding of TFs to TFBSs, taking into account random variation in the base present at each position within a TFBS. Treating the counts in the matrices and the sequences of sites as random variables, we combine this TFBS composition model with a background model to obtain a Bayesian classifier. We implemented our classifier in a package (SBaSeTraM). We tested SBaSeTraM against a MATCH™ implementation by searching all probes used in an experimental Saccharomyces cerevisiae TF binding dataset, and comparing our predictions to the data. We found no statistically significant differences in sensitivity between the algorithms (at fixed selectivity), indicating that SBaSeTraM's performance is at least comparable to the leading currently available algorithm.
Our software is freely available at: http://wiki.github.com/A1kmm/sbasetram/building-the-tools. © 2010 Miller et al.
AB - Identifying transcription factor (TF) binding sites (TFBSs) is an important step towards understanding transcriptional regulation. A common approach is to use gaplessly aligned, experimentally supported TFBSs for a particular TF, and algorithmically search for more occurrences of the same TFBSs. The largest publicly available databases of TF binding specificities contain models which are represented as position weight matrices (PWM). There are other methods using more sophisticated representations, but these have more limited databases, or aren't publicly available. Therefore, this paper focuses on methods that search using one PWM per TF. An algorithm, MATCH™, for identifying TFBSs corresponding to a particular PWM is available, but is not based on a rigorous statistical model of TF binding, making it difficult to interpret or adjust the parameters and output of the algorithm. Furthermore, there is no public description of the algorithm sufficient to exactly reproduce it. Another algorithm, MAST, computes a p-value for the presence of a TFBS using true probabilities of finding each base at each offset from that position. We developed a statistical model, BaSeTraM, for the binding of TFs to TFBSs, taking into account random variation in the base present at each position within a TFBS. Treating the counts in the matrices and the sequences of sites as random variables, we combine this TFBS composition model with a background model to obtain a Bayesian classifier. We implemented our classifier in a package (SBaSeTraM). We tested SBaSeTraM against a MATCH™ implementation by searching all probes used in an experimental Saccharomyces cerevisiae TF binding dataset, and comparing our predictions to the data. We found no statistically significant differences in sensitivity between the algorithms (at fixed selectivity), indicating that SBaSeTraM's performance is at least comparable to the leading currently available algorithm.
Our software is freely available at: http://wiki.github.com/A1kmm/sbasetram/building-the-tools. © 2010 Miller et al.
UR - http://hdl.handle.net/10754/671247
UR - https://dx.plos.org/10.1371/journal.pone.0013897
UR - http://www.scopus.com/inward/record.url?scp=78649513744&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0013897
DO - 10.1371/journal.pone.0013897
M3 - Article
SN - 1932-6203
VL - 5
SP - e13897
JO - PLOS ONE
JF - PLOS ONE
IS - 11
ER -