Graph embedding, aiming to learn low-dimensional representations for nodes in graphs, has attracted increasing attention due to its critical application including node classification, link prediction and clustering in social network analysis. Most existing algorithms for graph embedding only rely on the topology information and fail to use the copious information in nodes as well as edges. As a result, their performance for many tasks may not be satisfactory. In this thesis, we proposed a novel and general framework for graph embedding with rich text information (GERI) through constructing a heterogeneous network, in which we integrate node and edge content information with graph topology. Specially, we designed a novel biased random walk to explore the constructed heterogeneous network with the notion of flexible neighborhood. Our sampling strategy can compromise between BFS and DFS local search on heterogeneous graph. To further improve our algorithm, we proposed semi-supervised GERI (SGERI), which learns graph embedding in an discriminative manner through heterogeneous network with label information. The efficacy of our method is demonstrated by extensive comparison experiments with 9 baselines over multi-label and multi-class classification on various datasets including Citeseer, Cora, DBLP and Wiki. It shows that GERI improves the Micro-F1 and Macro-F1 of node classification up to 10%, and SGERI improves GERI by 5% in Wiki.
|Date made available
|KAUST Research Repository