TY - JOUR
T1 - Attributed heterogeneous network fusion via collaborative matrix tri-factorization
AU - Yu, Guoxian
AU - Wang, Yuehui
AU - Wang, Jun
AU - Domeniconi, Carlotta
AU - Guo, Maozu
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work is supported by Natural Science Foundation of China (61872300 and 61873214).
PY - 2020/6/26
Y1 - 2020/6/26
N2 - Heterogeneous network based data fusion can encode diverse inter- and intra-relations between objects, and has been sparking increasing attention in recent years. Matrix factorization based data fusion models have been invented to fuse multiple data sources. However, these models generally suffer from the widely-witnessed insufficient relations between nodes and from information loss when heterogeneous attributes of diverse network nodes are transformed into ad-hoc homologous networks for fusion. In this paper, we introduce a general data fusion model called Attributed Heterogeneous Network Fusion (AHNF). AHNF firstly constructs an attributed heterogeneous network composed with different types of nodes and the diverse attribute vectors of these nodes. It uses indicator matrices to differentiate the observed inter-relations from the latent ones, and thus reduces the impact of insufficient relations between nodes. Next, it collaboratively factorizes multiple adjacency matrices and attribute data matrices of the heterogeneous network into low-rank matrices to explore the latent relations between these nodes. In this way, both the network topology and diverse attributes of nodes are fused in a coordinated fashion. Finally, it uses the optimized low-rank matrices to approximate the target relational data matrix of objects and to effectively accomplish the relation prediction. We apply AHNF to predict the lncRNA-disease associations using diverse relational and attribute data sources. AHNF achieves a larger area under the receiver operating curve 0.9367 (by at least 2.14%), and a larger area under the precision-recall curve 0.5937 (by at least 28.53%) than competitive data fusion approaches. AHNF also outperforms competing methods on predicting de novo lncRNA-disease associations, and precisely identifies lncRNAs associated with breast, stomach, prostate, and pancreatic cancers. AHNF is a comprehensive data fusion framework for universal attributed multi-type relational data. The code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=AHNF.
AB - Heterogeneous network based data fusion can encode diverse inter- and intra-relations between objects, and has been sparking increasing attention in recent years. Matrix factorization based data fusion models have been invented to fuse multiple data sources. However, these models generally suffer from the widely-witnessed insufficient relations between nodes and from information loss when heterogeneous attributes of diverse network nodes are transformed into ad-hoc homologous networks for fusion. In this paper, we introduce a general data fusion model called Attributed Heterogeneous Network Fusion (AHNF). AHNF firstly constructs an attributed heterogeneous network composed with different types of nodes and the diverse attribute vectors of these nodes. It uses indicator matrices to differentiate the observed inter-relations from the latent ones, and thus reduces the impact of insufficient relations between nodes. Next, it collaboratively factorizes multiple adjacency matrices and attribute data matrices of the heterogeneous network into low-rank matrices to explore the latent relations between these nodes. In this way, both the network topology and diverse attributes of nodes are fused in a coordinated fashion. Finally, it uses the optimized low-rank matrices to approximate the target relational data matrix of objects and to effectively accomplish the relation prediction. We apply AHNF to predict the lncRNA-disease associations using diverse relational and attribute data sources. AHNF achieves a larger area under the receiver operating curve 0.9367 (by at least 2.14%), and a larger area under the precision-recall curve 0.5937 (by at least 28.53%) than competitive data fusion approaches. AHNF also outperforms competing methods on predicting de novo lncRNA-disease associations, and precisely identifies lncRNAs associated with breast, stomach, prostate, and pancreatic cancers. AHNF is a comprehensive data fusion framework for universal attributed multi-type relational data. The code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=AHNF.
UR - http://hdl.handle.net/10754/664146
UR - https://linkinghub.elsevier.com/retrieve/pii/S1566253520303079
UR - http://www.scopus.com/inward/record.url?scp=85087277071&partnerID=8YFLogxK
U2 - 10.1016/j.inffus.2020.06.012
DO - 10.1016/j.inffus.2020.06.012
M3 - Article
SN - 1566-2535
VL - 63
SP - 153
EP - 165
JO - Information Fusion
JF - Information Fusion
ER -