TY - JOUR
T1 - DMIL-IsoFun: predicting isoform function using deep multi-instance learning.
AU - Yu, Guoxian
AU - Zhou, Guangjie
AU - Zhang, Xiangliang
AU - Domeniconi, Carlotta
AU - Guo, Maozu
N1 - KAUST Repository Item: Exported on 2021-07-27
PY - 2021/7/20
Y1 - 2021/7/20
N2 - Motivation: Alternative splicing creates the considerable proteomic diversity and complexity on relatively limited genome. Proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions of this gene, which reflect the functional knowledge of genes at a finer granular level. Recently, some computational approaches have been proposed to differentiate isoform functions using sequence and expression data. However, their performance is far from being desirable, mainly due to the imbalance and lack of annotations at isoform-level, and the difficulty of modeling gene-isoform relations. Result: We propose a deep multi-instance learning based framework (DMIL-IsoFun) to differentiate the functions of isoforms. DMIL-IsoFun firstly introduces a multi-instance learning convolution neural network trained with isoform sequences and gene-level annotations to extract the feature vectors and initialize the annotations of isoforms, and then uses a class-imbalance Graph Convolution Network to refine the annotations of individual isoforms based on the isoform co-expression network and extracted features. Extensive experimental results show that DMIL-IsoFun improves the Smin and Fmax of state-of-the-art solutions by at least 29.6% and 40.8%. The effectiveness of DMIL-IsoFun is further confirmed on a testbed of human multiple-isoform genes, and Maize isoforms related with photosynthesis. Availability: The code and data are available at http://www.sdu-idea.cn/codes.php?name=DMIL-Isofun. Supplementary information: Supplementary data are available at Bioinformatics online.
AB - Motivation: Alternative splicing creates the considerable proteomic diversity and complexity on relatively limited genome. Proteoforms translated from alternatively spliced isoforms of a gene actually execute the biological functions of this gene, which reflect the functional knowledge of genes at a finer granular level. Recently, some computational approaches have been proposed to differentiate isoform functions using sequence and expression data. However, their performance is far from being desirable, mainly due to the imbalance and lack of annotations at isoform-level, and the difficulty of modeling gene-isoform relations. Result: We propose a deep multi-instance learning based framework (DMIL-IsoFun) to differentiate the functions of isoforms. DMIL-IsoFun firstly introduces a multi-instance learning convolution neural network trained with isoform sequences and gene-level annotations to extract the feature vectors and initialize the annotations of isoforms, and then uses a class-imbalance Graph Convolution Network to refine the annotations of individual isoforms based on the isoform co-expression network and extracted features. Extensive experimental results show that DMIL-IsoFun improves the Smin and Fmax of state-of-the-art solutions by at least 29.6% and 40.8%. The effectiveness of DMIL-IsoFun is further confirmed on a testbed of human multiple-isoform genes, and Maize isoforms related with photosynthesis. Availability: The code and data are available at http://www.sdu-idea.cn/codes.php?name=DMIL-Isofun. Supplementary information: Supplementary data are available at Bioinformatics online.
UR - http://hdl.handle.net/10754/670286
UR - https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab532/6324304
U2 - 10.1093/bioinformatics/btab532
DO - 10.1093/bioinformatics/btab532
M3 - Article
C2 - 34282449
SN - 1367-4803
JO - Bioinformatics (Oxford, England)
JF - Bioinformatics (Oxford, England)
ER -