Modeling transcription factor binding sites (TFBSs) and predicting TFBSs in genomic
sequences are important steps toward elucidating transcriptional regulatory mechanisms.
Transcriptional regulation depends on many factors, such as chemical specificity, molecular
structure, genomic and epigenetic characteristics, and long-distance interactions, which makes
this a challenging problem. Different experimental procedures generate evidence that DNA-binding
domains of transcription factors show considerable DNA sequence specificity. Probabilistic
modeling of TFBSs has been moderately successful in identifying patterns from a family
of sequences. In this study, we compare the performance of different probabilistic models and
estimate their efficacy on experimental TFBS data. We build a pipeline to calculate
sensitivity and specificity from aligned TFBS sequences for several probabilistic models,
such as Markov chains, hidden Markov models, and Bayesian networks. Our work, which includes
relevant statistics and evaluations of the models, can help researchers choose the most
appropriate model for the problem at hand.
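
As an illustration of the kind of evaluation described above, the following is a minimal sketch, not the pipeline from this work. It assumes a first-order Markov chain trained on aligned sites, a uniform background, a log-odds threshold of 0, and toy sequences invented for the example. All function names (`train_markov_chain`, `log_odds`, `sensitivity_specificity`) are hypothetical.

```python
import math

ALPHABET = "ACGT"

def train_markov_chain(sites, pseudocount=1.0):
    """Estimate initial and transition probabilities from aligned sites."""
    init = {b: pseudocount for b in ALPHABET}
    trans = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for site in sites:
        init[site[0]] += 1
        for prev, cur in zip(site, site[1:]):
            trans[prev][cur] += 1
    # Normalize counts into probabilities.
    total = sum(init.values())
    init = {b: c / total for b, c in init.items()}
    for a in ALPHABET:
        row_total = sum(trans[a].values())
        trans[a] = {b: c / row_total for b, c in trans[a].items()}
    return init, trans

def log_odds(seq, model, background=0.25):
    """Log-odds of seq under the model versus a uniform background (assumed)."""
    init, trans = model
    score = math.log(init[seq[0]] / background)
    for prev, cur in zip(seq, seq[1:]):
        score += math.log(trans[prev][cur] / background)
    return score

def sensitivity_specificity(model, positives, negatives, threshold=0.0):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(log_odds(s, model) >= threshold for s in positives)
    tn = sum(log_odds(s, model) < threshold for s in negatives)
    return tp / len(positives), tn / len(negatives)

# Toy data, invented for illustration only.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
non_sites = ["GCGCGC", "CCGGCC", "GGCCGG"]

model = train_markov_chain(sites)
sens, spec = sensitivity_specificity(model, sites, non_sites)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

For brevity, the sketch scores the same sites it was trained on. A meaningful comparison between models would use held-out sites (for example, cross-validation) and a realistic set of negative sequences.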
| Date of Award | May 2012 |
|---|---|
| Original language | English (US) |
| Awarding Institution | Computer, Electrical and Mathematical Sciences and Engineering |
| Supervisor | Vladimir Bajic (Supervisor) |