Semantic Disease Gene Embeddings (SmuDGE): Phenotype-based disease gene prioritization without phenotypes

Mona Alshahrani, Robert Hoehndorf*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

42 Scopus citations

Abstract

Motivation: In recent years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization. These methods commonly compute the similarity between a disease's (or patient's) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about the phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse.

Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of the phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and a gene, or a disease and a patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks comprising multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.

Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE.
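As a rough illustration of how vector-based representations can be used for prioritization, the sketch below ranks candidate genes by their similarity to a disease vector. It is not the SmuDGE implementation: the abstract describes a trainable similarity measure, whereas this example substitutes plain cosine similarity, and the function names, gene identifiers, and three-dimensional vectors are all hypothetical toy values chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(disease_vec, gene_vecs):
    """Return (gene, score) pairs sorted by decreasing similarity to the disease vector."""
    scored = [(gene, cosine(disease_vec, vec)) for gene, vec in gene_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumption: real embeddings would be learned, not hand-written).
disease = [1.0, 0.0, 1.0]
genes = {
    "GENE_A": [1.0, 0.0, 1.0],
    "GENE_B": [0.0, 1.0, 0.0],
    "GENE_C": [1.0, 1.0, 0.0],
}

ranking = rank_genes(disease, genes)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setting a gene needs only a vector, not its own phenotype annotations, to be ranked, which is the property the abstract highlights for genes that are reachable only through the interaction network.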

Original language: English (US)
Pages (from-to): i901-i907
Journal: Bioinformatics
Volume: 34
Issue number: 17
State: Published - Sep 1 2018

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
