TY - GEN
T1 - Mining top-k Popular Datasets via a Deep Generative Model
AU - Akujuobi, Uchenna Thankgod
AU - Sun, Ke
AU - Zhang, Xiangliang
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): 2639
Acknowledgements: This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. 2639. This work was performed when Ke Sun was affiliated with KAUST.
PY - 2019/1/25
Y1 - 2019/1/25
N2 - Finding popular datasets to work on is essential for data-driven research domains. In this paper, we focus on the problem of extracting top-k popular datasets that have been used in data mining, machine learning, and artificial intelligence fields. We solve this problem on an attributed citation network, which includes node content information (text of published papers) and paper citation relations. By formulating the problem as a semi-supervised multi-label classification one, we develop an efficient deep generative model for learning from both the document content and citation relations. The evaluation on a real-world dataset shows that our proposed model outperforms baseline methods. We then apply the model further to reveal the top-k frequently cited datasets in selected areas and report interesting findings.
UR - http://hdl.handle.net/10754/631711
UR - https://ieeexplore.ieee.org/document/8621957
UR - http://www.scopus.com/inward/record.url?scp=85062624116&partnerID=8YFLogxK
U2 - 10.1109/BigData.2018.8621957
DO - 10.1109/BigData.2018.8621957
M3 - Conference contribution
SN - 9781538650356
SP - 584
EP - 593
BT - 2018 IEEE International Conference on Big Data (Big Data)
PB - Institute of Electrical and Electronics Engineers (IEEE)
ER -