TY - JOUR
T1 - DeepGSR: An optimized deep-learning structure for the recognition of genomic signals and regions
AU - Kalkatawi, Manal M.
AU - Magana-Mora, Arturo
AU - Jankovic, Boris R.
AU - Bajic, Vladimir B.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): BAS/1/1606-01-01
Acknowledgements: We are grateful to Mohammad Shoaib Amini for helping with the data extraction. This research made use of the resources of the compute and GPU clusters at King Abdullah University of Science & Technology (KAUST), Thuwal, Saudi Arabia. This work was supported by the King Abdullah University of Science and Technology (KAUST) through the baseline research fund BAS/1/1606-01-01 for Vladimir B. Bajic. The open access charges for this article are covered from the same fund.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - Motivation
Recognition of different genomic signals and regions (GSRs) in DNA is crucial for understanding genome organization, gene regulation, and gene function, which in turn leads to better genome and gene annotations. Although many methods have been developed to recognize GSRs, their purely computational identification remains challenging. Moreover, different GSRs usually require specialized sets of features for developing robust recognition models. Recently, deep-learning (DL) methods have been shown to generate more accurate prediction models than ‘shallow’ methods without the need to develop specialized features for the problems in question. Here, we explore the potential use of DL for the recognition of GSRs.
Results
We developed DeepGSR, an optimized DL architecture for the prediction of different types of GSRs. The performance of the DeepGSR structure is evaluated on the recognition of polyadenylation signals (PAS) and translation initiation sites (TIS) of different organisms: human, mouse, bovine, and fruit fly. The results show that DeepGSR outperformed the state-of-the-art methods, reducing the classification error rate of the PAS and TIS prediction in the human genome by up to 29% and 86%, respectively. Moreover, the cross-organism and genome-wide analyses we performed confirmed the robustness of DeepGSR and provided new insights into the conservation of the examined GSRs across species.
UR - http://hdl.handle.net/10754/628696
UR - https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/bty752/5089227
UR - http://www.scopus.com/inward/record.url?scp=85064111361&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty752
DO - 10.1093/bioinformatics/bty752
M3 - Article
C2 - 30184052
SN - 1367-4803
VL - 35
SP - 1125
EP - 1132
JO - Bioinformatics
JF - Bioinformatics
IS - 7
ER -