TY - JOUR
T1 - TSSPlant: a new tool for prediction of plant Pol II promoters
AU - Shahmuradov, Ilham A.
AU - Umarov, Ramzan
AU - Solovyev, Victor V.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): URF/1/1976-02, FCS/1/2448-01
Acknowledgements: King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [URF/1/1976-02, FCS/1/2448-01]; Science Development Foundation under the President of the Republic of Azerbaijan [Grant EİF-2010-1(1)-40/27-3]. Funding for open access charge: King Abdullah University of Science and Technology (Awards No URF/1/1976-02 and FCS/1/2448-01).
PY - 2017/1/12
Y1 - 2017/1/12
N2 - Our current knowledge of eukaryotic promoters indicates that their architecture is complex, often comprising numerous functional motifs. Most known promoters include multiple, and in some cases mutually exclusive, transcription start sites (TSSs). Moreover, TSS selection depends on cell or tissue type, developmental stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences from a wide spectrum of plant genomes. The tool was developed using large promoter collections from ppdb and PlantProm DB. It uses eighteen significant compositional and signal features of plant promoter sequences, selected in this study, which feed an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy than the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80 and F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.
UR - http://hdl.handle.net/10754/622722
UR - https://academic.oup.com/nar/article/doi/10.1093/nar/gkw1353/2900189/TSSPlant-a-new-tool-for-prediction-of-plant-Pol-II
UR - http://www.scopus.com/inward/record.url?scp=85020193988&partnerID=8YFLogxK
U2 - 10.1093/nar/gkw1353
DO - 10.1093/nar/gkw1353
M3 - Article
C2 - 28082394
SN - 0305-1048
VL - 45
SP - gkw1353
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 8
ER -