Genome annotation is an important topic because it provides the foundation for
downstream genomic and biological research. It can be viewed as a way of summarizing
part of the existing knowledge about the genomic characteristics of an organism. Annotating
different regions of a genome sequence is known as structural annotation, while
identifying the functions of these regions is known as functional annotation. In silico
approaches can facilitate both tasks, which would otherwise be difficult and time-consuming.
This study contributes to genome annotation by introducing several novel
bioinformatics methods, some based on machine learning (ML) approaches.
First, we present Dragon PolyA Spotter (DPS), a method for accurate identification of
polyadenylation signals (PAS) within human genomic DNA sequences. For this, we derived
a novel feature set that characterizes properties of the genomic region surrounding the
PAS, enabling the development of highly accurate, optimized ML predictive models. DPS
considerably improved on state-of-the-art results.
The second contribution concerns developing generic models for structural annotation,
i.e., the recognition of different genomic signals and regions (GSR) within eukaryotic DNA.
We developed DeepGSR, a systematic framework that facilitates generating ML models
to predict GSR with high accuracy. To the best of our knowledge, no generic and
automated method is available for such a task that could facilitate studies of newly
sequenced organisms. The prediction module of DeepGSR uses deep learning algorithms
to derive highly abstract features that depend mainly on proper data representation and
hyperparameter calibration. DeepGSR, which was evaluated on the recognition of PAS and
translation initiation sites (TIS) in different organisms, yields a simpler and more precise
representation of the problem under study than some hand-tailored
models, while producing highly accurate predictions.
Finally, we focus on deriving a model capable of facilitating the functional annotation of
prokaryotes. As far as we know, there is no fully automated system for detailed
comparison of functional annotations generated by different methods. Hence, we
developed BEACON, a method and supporting system that compares gene annotations
from various methods to produce a more reliable and comprehensive annotation. Overall,
our research contributed to different aspects of genome annotation.

| Date of Award | Nov 30 2017 |
|---|---|
| Original language | English (US) |
| Awarding Institution | Computer, Electrical and Mathematical Sciences and Engineering |
| Supervisor | Vladimir Bajic (Supervisor) |

- recognition
- prediction
- genomic signals
- genome annotation
- deep learning
- genomic regions