Biological knowledge is widely represented in the form of ontologies and ontology-based
annotations. Biomedical ontologies describe known phenomena in biology using
formal axioms, and the annotations associate entities (e.g., genes, diseases, or chemicals)
with a set of biological concepts. In addition to formally structured
axioms, ontologies contain meta-data in the form of annotation properties, expressed
mostly in natural language, which provide valuable information that characterizes
ontology concepts. The structure and information contained in ontologies and
their annotations make them valuable for use in machine learning, data analysis and
knowledge extraction tasks.
I develop the first approaches that can exploit all of the information encoded in ontologies,
both formal and informal, to learn feature embeddings of biological concepts
and biological entities based on their annotations to ontologies. Notably, I develop the
first approach to use all the formal content of ontologies in the form of logical axioms
and entity annotations to generate feature vectors of biological entities using neural
language models. I extend the proposed algorithm by enriching the obtained feature
vectors through representing the natural language annotation properties within the
ontology meta-data as axioms. Transfer learning is then applied to learn from the
biomedical literature and apply what is learned to the formal knowledge of ontologies.
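
As a rough illustration of the idea of feeding axioms and annotations to a neural language model, the sketch below turns both into token sequences ("sentences") that a word-embedding model could be trained on. The abstract does not specify how axioms are serialized, so the whitespace tokenization, the `hasAnnotation` relation token, and the `EX:` identifiers are all assumptions made for this example only.

```python
from typing import Dict, List


def build_corpus(axioms: List[str], annotations: Dict[str, List[str]]) -> List[List[str]]:
    """Turn axioms and entity annotations into token sequences.

    Each axiom string is split on whitespace (an assumed serialization),
    and each entity-class annotation becomes a three-token sentence using
    a hypothetical 'hasAnnotation' relation token.
    """
    corpus = [axiom.split() for axiom in axioms]
    for entity, classes in annotations.items():
        for cls in classes:
            corpus.append([entity, "hasAnnotation", cls])
    return corpus


# Toy, made-up inputs; the EX: identifiers are not real ontology classes.
axioms = [
    "EX:0002 SubClassOf EX:0001",
    "EX:0003 SubClassOf part_of some EX:0001",
]
annotations = {
    "ProteinA": ["EX:0002"],
    "ProteinB": ["EX:0003", "EX:0002"],
}

corpus = build_corpus(axioms, annotations)
for sentence in corpus:
    print(" ".join(sentence))
```

A corpus of this shape could then be passed to a word-embedding model so that classes and annotated entities share one vector space; the choice of model and its parameters is left open here.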
To optimize learning that combines the formal content of biomedical ontologies
with natural language data such as the literature, I also propose a new approach
that uses self-normalization with a deep Siamese neural network to improve learning
from both the formal knowledge within ontologies and textual data.
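
The following is a minimal sketch of what a Siamese comparison with a self-normalizing activation can look like. It assumes that "self-normalization" refers to the SELU activation of self-normalizing neural networks, which the abstract does not state, and it uses a single shared dense layer with hand-picked weights instead of a trained deep network. The function names and the toy weights are hypothetical.

```python
import math
from typing import List

# Standard SELU constants from the self-normalizing networks literature.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit."""
    return SELU_SCALE * (x if x > 0 else SELU_ALPHA * (math.exp(x) - 1.0))


def encode(x: List[float], weights: List[List[float]]) -> List[float]:
    """One dense layer followed by SELU; one weight row per output unit."""
    return [selu(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def siamese_score(a: List[float], b: List[float], weights: List[List[float]]) -> float:
    """Encode both inputs with the SAME weights, then compare by cosine similarity."""
    ea, eb = encode(a, weights), encode(b, weights)
    dot = sum(p * q for p, q in zip(ea, eb))
    norm = math.sqrt(sum(p * p for p in ea)) * math.sqrt(sum(q * q for q in eb))
    return dot / norm if norm else 0.0


# Toy weights and inputs, chosen only for illustration (assumption).
weights = [[0.5, -0.2], [0.1, 0.3]]
score_same = siamese_score([1.0, 2.0], [1.0, 2.0], weights)
score_diff = siamese_score([1.0, 2.0], [-2.0, 0.5], weights)
print(score_same, score_diff)
```

The defining property of the Siamese design is that both inputs pass through one shared encoder, so the score is symmetric in its two arguments.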
I validate the proposed algorithms by applying them to the Gene Ontology to
generate feature vectors of proteins based on their functions, and to the PhenomeNet
ontology to generate features of genes and diseases based on the phenotypes they are
associated with. The generated features are then used to train a variety of
machine-learning-based classifiers to perform different prediction tasks, including the prediction
of protein interactions, gene–disease associations and the toxicological effects of chemicals.
I also use the proposed methods to conduct the first quantitative evaluation of
the quality of the axioms and meta-data included in ontologies, to demonstrate that including
axioms as background knowledge improves ontology-based prediction.
The proposed approaches can be applied to a wide range of other bioinformatics
research problems including similarity-based prediction and classification of interaction
types using supervised learning, or clustering.
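
Similarity-based prediction over learned feature vectors can be sketched as ranking candidates by cosine similarity to a query entity. The vectors below are invented for illustration and are not results from this work; the entity names and the `rank_by_similarity` helper are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def rank_by_similarity(query: str, vectors: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank every other entity by similarity to the query, most similar first."""
    q = vectors[query]
    scores = [(name, cosine(q, v)) for name, v in vectors.items() if name != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up three-dimensional embeddings (assumption, for illustration only).
vectors = {
    "geneA": [1.0, 0.0, 1.0],
    "diseaseX": [0.9, 0.1, 1.1],
    "diseaseY": [-1.0, 1.0, 0.0],
}

ranking = rank_by_similarity("geneA", vectors)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

In a gene–disease setting, the top-ranked diseases for a gene would be treated as candidate associations; a supervised classifier trained on the same vectors is the alternative the text mentions.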
Date of Award | Sep 14 2020 |
Original language | English (US) |
Awarding Institution | Computer, Electrical and Mathematical Sciences and Engineering |
Supervisor | Xin Gao (Supervisor) |
- Machine learning
- Bioinformatics
- Biomedical ontologies
- Symbolic AI