This work uses genetic algorithms (GA) to reduce the complexity of artificial neural networks (ANNs) and decision trees (DTs) used for the accurate recognition
of translation initiation sites (TISs) in Arabidopsis thaliana. The Arabidopsis data
were extracted directly from genomic DNA sequences. The methods derived in this work
both reduced the complexity of the predictors and improved their
prediction accuracy (generalization). Optimization through use of GA is generally
a computationally intensive task. One of the approaches to overcome this problem
is to use parallelization of code that implements GA, thus allowing computation on
multiprocessing infrastructure. However, further improvement in the performance of a GA
implementation can be achieved by modifying the basic GA operations
such as selection, crossover and mutation. In this work we explored two such improvements,
namely evolutive mutation and a GA-Simplex crossover operation.
In this thesis we studied the benefit of these modifications on the problem of TIS
recognition. Compared to the non-modified GA approach, we reduced the number of
weights in the resulting model's neural network component by 51% and the number of
nodes in the model's DT component by 97%, while also improving the model's accuracy.
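
For readers unfamiliar with the basic GA operations named above, the following is a minimal sketch of a plain GA (tournament selection, one-point crossover, bit-flip mutation) that searches for a binary mask over model weights. It does not implement evolutive mutation or the GA-Simplex crossover, which are not specified in this abstract. The function names, the `importance` scores and the toy fitness function are all assumptions made for illustration, not the thesis's objective.

```python
import random

def fitness(mask, importance, penalty=0.5):
    # Toy objective (an assumption): reward kept weights by their
    # importance score and charge a fixed penalty per kept weight.
    kept = sum(w for bit, w in zip(mask, importance) if bit)
    return kept - penalty * sum(mask)

def tournament_select(population, scores, rng, k=3):
    # Selection: pick k random individuals and return the fittest.
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def one_point_crossover(a, b, rng):
    # Crossover: head of parent a joined to tail of parent b.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def bit_flip_mutation(mask, rng, rate=0.05):
    # Mutation: flip each bit independently with probability `rate`.
    return [1 - bit if rng.random() < rate else bit for bit in mask]

def run_ga(importance, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    n = len(importance)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=lambda m: fitness(m, importance))
    for _ in range(generations):
        scores = [fitness(m, importance) for m in population]
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1 = tournament_select(population, scores, rng)
            p2 = tournament_select(population, scores, rng)
            children.append(bit_flip_mutation(one_point_crossover(p1, p2, rng), rng))
        population = children
        best = max(population, key=lambda m: fitness(m, importance))
    return best

if __name__ == "__main__":
    scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]  # hypothetical importances
    print(run_ga(scores))
```

In a setting like the one described, the mask would select which ANN weights or DT nodes to keep, and the fitness would be computed from the accuracy of the pruned predictor rather than from fixed scores.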
Separately, we developed another methodology for reducing the complexity of prediction
models: a new GA-based variant of bootstrap aggregation (bagging) that optimizes the
composition of each of the training data subsets. In our test cases, this approach
considerably enhanced the accuracy of the TIS prediction model compared to the original bagging
methodology.
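
The idea of optimizing the composition of each bagging subset can be sketched as follows. This is a hedged illustration only: the abstract does not state the encoding, fitness function or operators used, so everything here is an assumption, including the choice to score a subset by the validation accuracy of a base learner trained on it, the nearest-centroid base learner on 1-D features, and all function names.

```python
import random
from collections import Counter

def train_centroids(samples):
    # Nearest-centroid base learner on 1-D features: {label: mean feature}.
    sums, counts = Counter(), Counter()
    for x, y in samples:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}

def predict(model, x):
    return min(model, key=lambda y: abs(x - model[y]))

def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)

def evolve_subset(train, valid, rng, subset_size, pop_size=12, generations=15):
    # Each individual is a list of indices into `train` (one subset).
    def score(indices):
        return accuracy(train_centroids([train[i] for i in indices]), valid)

    # Start from ordinary bootstrap samples, drawn with replacement.
    population = [[rng.randrange(len(train)) for _ in range(subset_size)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[:pop_size // 2]      # truncation selection
        children = [parents[0][:]]            # elitism: keep the best subset
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            pos = rng.randrange(subset_size)                  # mutation: replace one
            child[pos] = rng.randrange(len(train))            # index at random
            children.append(child)
        population = children
    return max(population, key=score)

def ga_bagging(train, valid, n_models=5, subset_size=10, seed=0):
    # Evolve one subset per ensemble member, then train a model on each.
    rng = random.Random(seed)
    subsets = [evolve_subset(train, valid, rng, subset_size) for _ in range(n_models)]
    return [train_centroids([train[i] for i in s]) for s in subsets]

def vote(models, x):
    # Majority vote over the ensemble, as in ordinary bagging.
    return Counter(predict(m, x) for m in models).most_common(1)[0][0]
```

Ordinary bagging corresponds to skipping the evolution step and using the initial random subsets directly. Scoring subsets on held-out data, as assumed here, means that data should not also be used for the final accuracy estimate.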
Although these methods are applied here to the problem of accurate prediction of TISs, we
believe that they have potential for a wider scope of application.
| Date of Award | Jun 2011 |
|---|---|
| Original language | English (US) |
| Awarding Institution | Computer, Electrical and Mathematical Sciences and Engineering |
| Supervisor | Vladimir Bajic (Supervisor) |