Mining top-k Popular Datasets via a Deep Generative Model

Uchenna Thankgod Akujuobi, Ke Sun, Xiangliang Zhang

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)


Abstract

Finding popular datasets to work on is essential for data-driven research domains. In this paper, we focus on the problem of extracting the top-k popular datasets used in the data mining, machine learning, and artificial intelligence fields. We solve this problem on an attributed citation network, which includes node content information (the text of published papers) and paper citation relations. By formulating the task as a semi-supervised multi-label classification problem, we develop an efficient deep generative model that learns from both document content and citation relations. An evaluation on a real-world dataset shows that our proposed model outperforms baseline methods. We further apply the model to reveal the top-k most frequently cited datasets in selected areas and report interesting findings.
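The final step described above, ranking datasets once each paper has been labeled with the datasets it uses, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a trained multi-label classifier (standing in for the deep generative model, which is not shown) has already produced a per-paper probability for each dataset label. The helper `top_k_datasets`, its `threshold` of 0.5, and the paper IDs and dataset names in the example are hypothetical and are not results from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_k_datasets(
    predictions: Dict[str, Dict[str, float]],
    k: int,
    threshold: float = 0.5,  # assumed decision threshold, not taken from the paper
) -> List[Tuple[str, int]]:
    """Count papers predicted to use each dataset and return the k most frequent.

    `predictions` maps a (hypothetical) paper ID to {dataset label: probability},
    as a multi-label classifier might output.
    """
    counts = Counter()
    for label_probs in predictions.values():
        for dataset, prob in label_probs.items():
            # Each paper contributes at most one vote per dataset label.
            if prob >= threshold:
                counts[dataset] += 1
    # Sort by descending count, breaking ties alphabetically for a deterministic ranking.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Illustrative predictions only; these are not data from the paper.
preds = {
    "p1": {"MNIST": 0.9, "CIFAR-10": 0.7, "Cora": 0.1},
    "p2": {"MNIST": 0.8, "Cora": 0.6},
    "p3": {"Cora": 0.95, "CIFAR-10": 0.2},
    "p4": {"MNIST": 0.55},
}
print(top_k_datasets(preds, k=2))  # [('MNIST', 3), ('Cora', 2)]
```

Counting one vote per paper makes a dataset's score the number of papers predicted to use it, which is one plausible proxy for "frequently cited"; weighting by probability instead of thresholding would be an equally reasonable assumption.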

Original language: English (US)
Title of host publication: 2018 IEEE International Conference on Big Data (Big Data)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 584-593
Number of pages: 10
ISBN (Print): 9781538650356
State: Published, Jan 25 2019
