Machine Learning Models for Biomedical Ontology Integration and Analysis

  • Fatima Z. Smaili

Student thesis: Doctoral Thesis

Abstract

Biological knowledge is widely represented in the form of ontologies and ontology-based annotations. Biomedical ontologies describe known phenomena in biology using formal axioms, and the annotations associate an entity (e.g. a gene, disease, or chemical) with a set of biological concepts. In addition to formally structured axioms, ontologies contain meta-data in the form of annotation properties, expressed mostly in natural language, which provide valuable information that characterizes ontology concepts. The structure and information contained in ontologies and their annotations make them valuable for machine learning, data analysis and knowledge extraction tasks. I develop the first approaches that can exploit all of the information encoded in ontologies, both formal and informal, to learn feature embeddings of biological concepts and biological entities based on their annotations to ontologies. Notably, I develop the first approach to use all the formal content of ontologies, in the form of logical axioms and entity annotations, to generate feature vectors of biological entities using neural language models. I extend the proposed algorithm by enriching the obtained feature vectors through representing the natural language annotation properties within the ontology meta-data as axioms. Transfer learning is then used to learn from the biomedical literature and apply what is learned to the formal knowledge of ontologies. To optimize learning that combines the formal content of biomedical ontologies with natural language data such as the literature, I also propose a new approach that uses self-normalization with a deep Siamese neural network and improves learning from both the formal knowledge within ontologies and textual data.
I validate the proposed algorithms by applying them to the Gene Ontology to generate feature vectors of proteins based on their functions, and to the PhenomeNet ontology to generate features of genes and diseases based on the phenotypes they are associated with. The generated features are then used to train a variety of machine-learning-based classifiers to perform different prediction tasks, including the prediction of protein interactions, gene–disease associations and the toxicological effects of chemicals. I also use the proposed methods to conduct the first quantitative evaluation of the quality of the axioms and meta-data included in ontologies to prove that including axioms as background improves ontology-based prediction. The proposed approaches can be applied to a wide range of other bioinformatics research problems, including similarity-based prediction and classification of interaction types using supervised learning, or clustering.
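
As a rough illustration of the idea of treating axioms and entity annotations as a text corpus and comparing the resulting feature vectors, the following is a minimal sketch and not the method developed in the thesis. The thesis uses neural language models; this sketch substitutes plain co-occurrence counts so that it runs with the Python standard library alone. All identifiers (P1, GO:A, hasFunction, and so on) are hypothetical placeholders.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus: each "sentence" is one axiom or one annotation,
# written as a sequence of tokens. None of these identifiers are real.
corpus = [
    ["GO:B", "SubClassOf", "GO:A"],
    ["GO:C", "SubClassOf", "GO:A"],
    ["P1", "hasFunction", "GO:B"],
    ["P2", "hasFunction", "GO:B"],
    ["P3", "hasFunction", "GO:C"],
]


def cooccurrence_vectors(sentences):
    """For every token, count the other tokens that share a sentence with it.

    This is a simple stand-in for a learned embedding, not a neural model.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, token in enumerate(sentence):
            for j, other in enumerate(sentence):
                if i != j:
                    vectors[token][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(corpus)

# P1 and P2 share the same annotation; P1 and P3 share only the relation.
sim_p1_p2 = cosine(vectors["P1"], vectors["P2"])
sim_p1_p3 = cosine(vectors["P1"], vectors["P3"])
print(sim_p1_p2 > sim_p1_p3)
```

In this toy setting, entities with identical annotations score higher than entities that share only the annotation relation, which is the kind of signal a similarity-based predictor would rank on.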
Date of Award: Sep 14 2020
Original language: English (US)
Awarding Institution
  • Computer, Electrical and Mathematical Sciences and Engineering
Supervisor: Xin Gao

Keywords

  • Machine learning
  • Bioinformatics
  • Biomedical ontologies
  • Symbolic AI
