The number of available protein sequences is rapidly increasing, mainly as a consequence
of the development and application of high throughput sequencing technologies
in the life sciences. It is a key question in the life sciences to identify the functions of
proteins, and furthermore to identify the phenotypes that may be associated with a
loss (or gain) of function in these proteins. Protein functions are generally determined
experimentally, and it is clear that experimental determination of protein functions
will not scale to the current, and rapidly increasing, number of available protein
sequences (over 300 million). Furthermore, identifying phenotypes resulting from loss
of function is even more challenging as the phenotype is modified by whole-organism
interactions and environmental variables. It is clear that accurate computational prediction
of protein functions and loss of function phenotypes would be of significant
value both to academic research and to the biotechnology industry.
We developed and expanded novel methods for representation learning, predicting
protein functions and their loss of function phenotypes. We use deep neural network
algorithms and combine them with symbolic inference into neural-symbolic algorithms.
Our work significantly improves previously developed methods for predicting protein
functions through methodological advances in machine learning, incorporation
of broader data types that may be predictive of functions, and improved systems for
neural-symbolic integration.
The methods we developed are generic and can be applied to other domains in
which similar types of structured and unstructured information exist. In the future, our
methods can be applied to predicting protein functions for metagenomic samples in order to
evaluate the potential for discovering novel proteins of industrial value. They can also
be applied to predicting loss of function phenotypes in human genetics, with the results
incorporated into a variant prioritization tool for diagnosing patients with Mendelian disorders.
| Date of Award | Apr 8 2020 |
|---|---|
| Original language | English (US) |
| Awarding Institution | Computer, Electrical and Mathematical Sciences and Engineering |
| Supervisor | Robert Hoehndorf (Supervisor) |
- gene functions
- phenotypes
- ontologies
- embeddings
- deep neural networks
- machine learning