TY - JOUR
T1 - Differentiating isoform functions with collaborative matrix factorization.
AU - Wang, Keyao
AU - Wang, Jun
AU - Domeniconi, Carlotta
AU - Zhang, Xiangliang
AU - Yu, Guoxian
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by National Natural Science Foundation of China [61872300, 61873214]; Fundamental Research Funds for the Central Universities [XDJK2019B024]; and Natural Science Foundation of CQ CSTC [cstc2018jcyjAX0228].
PY - 2020/3/17
Y1 - 2020/3/17
AB - MOTIVATION: Isoforms are alternatively spliced mRNAs of genes. They can be translated into different functional proteoforms, and thus greatly increase the functional diversity of protein variants. Differentiating the functions of isoforms (or proteoforms) helps in understanding the underlying pathology of various complex diseases at a finer granularity. Because existing functional genomic databases record annotations almost uniformly at the gene level and rarely at the isoform level, differentiating isoform functions is more challenging than traditional gene-level function prediction. RESULTS: Several approaches have been proposed to differentiate the functions of isoforms. They generally follow the multi-instance learning paradigm by viewing each gene as a bag and its spliced isoforms as instances, and push the functions of bags onto instances. These approaches implicitly assume that the collected gene annotations are complete, and they integrate only multiple RNA-seq datasets; as a result, their performance is compromised. We propose a data-integrative solution (called DisoFun) to Differentiate isoform Functions with collaborative matrix factorization. DisoFun assumes that the functional annotations of genes are aggregated from those of key isoforms. It collaboratively factorizes the isoform data matrix and the gene-term data matrix (storing Gene Ontology annotations of genes) into low-rank matrices to simultaneously explore the latent key isoforms and achieve function prediction by aggregating predictions to their originating genes. In addition, it leverages the PPI network and the Gene Ontology structure to further coordinate the matrix factorization. Extensive experimental results show that DisoFun improves the area under the receiver operating characteristic curve and the area under the precision-recall curve of existing solutions by at least 7.7% and 28.9%, respectively. We further investigate DisoFun on four exemplar genes (LMNA, ADAM15, BCL2L1 and CFLAR) with known isoform-level functions, and observe that DisoFun can differentiate the functions of their isoforms with 90.5% accuracy. AVAILABILITY AND IMPLEMENTATION: The code of DisoFun is available at mlda.swu.edu.cn/codes.php?name=DisoFun. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
UR - http://hdl.handle.net/10754/662248
UR - https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btz847/5625622
UR - http://www.scopus.com/inward/record.url?scp=85082015982&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btz847
DO - 10.1093/bioinformatics/btz847
M3 - Article
C2 - 32176770
SN - 1367-4803
VL - 36
SP - 1864
EP - 1871
JO - Bioinformatics (Oxford, England)
JF - Bioinformatics (Oxford, England)
IS - 6
ER -