TY - JOUR
T1 - Hybrid model for efficient prediction of Poly(A) signals in human genomic DNA
AU - Albalawi, Fahad
AU - Chahid, Abderrazak
AU - Guo, Xingang
AU - Albaradei, Somayah
AU - Magana-Mora, Arturo
AU - Jankovic, Boris R.
AU - Uludag, Mahmut
AU - Van Neste, Christophe Marc
AU - Essack, Magbubah
AU - Laleg-Kirati, Taous-Meriem
AU - Bajic, Vladimir B.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): BAS/1/1606-01-01, BAS/1/1627-01-01, FCC/1/1976-17-01
Acknowledgements: This work has been supported by the King Abdullah University of Science and Technology (KAUST) Base Research Fund (BAS/1/1606-01-01) to VBB, (BAS/1/1627-01-01) to TMLK, and KAUST Office of Sponsored Research (OSR) under Awards No CARF – FCC/1/1976-17-01.
PY - 2019/4/13
Y1 - 2019/4/13
N2 - Polyadenylation signals (PAS) are found in most protein-coding and some non-coding genes in eukaryotes. Their accurate recognition improves understanding of gene regulation mechanisms and recognition of the 3'-end of transcribed gene regions where premature or alternate transcription ends may lead to various diseases. Although different methods and tools for in silico prediction of genomic signals have been proposed, the correct identification of PAS in genomic DNA remains challenging due to a vast number of non-relevant hexamers identical to PAS hexamers. In this study, we developed a novel method for PAS recognition. The method is implemented in a hybrid PAS recognition model (HybPAS), which is based on deep neural networks (DNNs) and logistic regression models (LRMs). One such model is developed for each of the 12 most frequent human PAS hexamers. DNN models performed best for eight PAS types (including the two most frequent PAS hexamers), while LRMs performed best for the remaining four PAS types. The new models use different combinations of signal processing-based, statistical, and sequence-based features as input. The results obtained on human genomic data show that HybPAS outperforms the well-tuned state-of-the-art Omni-PolyA models, reducing the classification error for different PAS hexamers by up to 57.35% for 10 out of 12 PAS types, with Omni-PolyA models performing better for the remaining two PAS types. For the most frequent PAS types, 'AATAAA' and 'ATTAAA', HybPAS reduced the error rate by 35.14% and 34.48%, respectively. On average, HybPAS reduces the error by 30.29%. HybPAS is implemented partly in Python and partly in MATLAB and is available at https://github.com/EMANG-KAUST/PolyA_Prediction_LRM_DNN.
AB - Polyadenylation signals (PAS) are found in most protein-coding and some non-coding genes in eukaryotes. Their accurate recognition improves understanding of gene regulation mechanisms and recognition of the 3'-end of transcribed gene regions where premature or alternate transcription ends may lead to various diseases. Although different methods and tools for in silico prediction of genomic signals have been proposed, the correct identification of PAS in genomic DNA remains challenging due to a vast number of non-relevant hexamers identical to PAS hexamers. In this study, we developed a novel method for PAS recognition. The method is implemented in a hybrid PAS recognition model (HybPAS), which is based on deep neural networks (DNNs) and logistic regression models (LRMs). One such model is developed for each of the 12 most frequent human PAS hexamers. DNN models performed best for eight PAS types (including the two most frequent PAS hexamers), while LRMs performed best for the remaining four PAS types. The new models use different combinations of signal processing-based, statistical, and sequence-based features as input. The results obtained on human genomic data show that HybPAS outperforms the well-tuned state-of-the-art Omni-PolyA models, reducing the classification error for different PAS hexamers by up to 57.35% for 10 out of 12 PAS types, with Omni-PolyA models performing better for the remaining two PAS types. For the most frequent PAS types, 'AATAAA' and 'ATTAAA', HybPAS reduced the error rate by 35.14% and 34.48%, respectively. On average, HybPAS reduces the error by 30.29%. HybPAS is implemented partly in Python and partly in MATLAB and is available at https://github.com/EMANG-KAUST/PolyA_Prediction_LRM_DNN.
UR - http://hdl.handle.net/10754/631950
UR - https://www.sciencedirect.com/science/article/pii/S104620231830361X
UR - http://www.scopus.com/inward/record.url?scp=85064326020&partnerID=8YFLogxK
U2 - 10.1016/j.ymeth.2019.04.001
DO - 10.1016/j.ymeth.2019.04.001
M3 - Article
C2 - 30991099
SN - 1046-2023
VL - 166
SP - 31
EP - 39
JO - Methods
JF - Methods
ER -