TY - GEN
T1 - Dataset Recommendation via Variational Graph Autoencoder
AU - Altaf, Basmah
AU - Akujuobi, Uchenna Thankgod
AU - Yu, Lu
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01. Acknowledgements: This work is supported by King Abdullah University of Science and Technology (KAUST), Saudi Arabia.
PY - 2019
Y1 - 2019
AB - This paper targets on designing a query-based dataset recommendation system, which accepts a query denoting a user's research interest as a set of research papers and returns a list of recommended datasets that are ranked by the potential usefulness for the user's research need. The motivation of building such a system is to save users from spending time on heavy literature review work to find usable datasets. We start by constructing a two-layer network: one layer of citation network, and the other layer of datasets, connected to the first-layer papers in which they were used. A query highlights a set of papers in the citation layer. However, answering the query as a naive retrieval of datasets linked with these highlighted papers excludes other semantically relevant datasets, which widely exist several hops away from the queried papers. We propose to learn representations of research papers and datasets in the two-layer network using heterogeneous variational graph autoencoder, and then compute the relevance of the query to the dataset candidates based on the learned representations. Our ranked datasets shown in extensive evaluation results are validated to be more truly relevant than those obtained by naive retrieval methods and adoptions of existing related solutions.
UR - http://hdl.handle.net/10754/661922
UR - https://ieeexplore.ieee.org/document/8970775/
UR - http://www.scopus.com/inward/record.url?scp=85078946614&partnerID=8YFLogxK
DO - 10.1109/ICDM.2019.00011
M3 - Conference contribution
SN - 9781728146041
SP - 11
EP - 20
BT - 2019 IEEE International Conference on Data Mining (ICDM)
PB - IEEE
ER -