TY - CHAP
T1 - TopSpin: TOPic Discovery via Sparse Principal Component INterference
AU - Takáč, Martin
AU - Ahipaşaoğlu, Selin Damla
AU - Cheung, Ngai-Man
AU - Richtárik, Peter
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was partially supported by the U.S. National Science Foundation under award numbers NSF:CCF:1618717, NSF:CMMI:1663256, and NSF:CCF:1740796.
PY - 2019/2/14
Y1 - 2019/2/14
AB - We propose a novel topic discovery algorithm for unlabeled images based on the bag-of-words (BoW) framework. We first extract a dictionary of visual words and then compute a visual word occurrence histogram for each image. We view these histograms as rows of a large matrix from which we extract sparse principal components (PCs). Each PC identifies a sparse combination of visual words that co-occur frequently in some images but seldom appear in others. Each sparse PC corresponds to a topic, and images whose interference with the PC is high belong to that topic, revealing the common parts possessed by the images. We propose to solve the associated sparse PCA (SPCA) problems using an Alternating Maximization (AM) method, which we modify to efficiently extract multiple PCs in a deflation scheme. Our approach attacks the maximization problem in SPCA directly and is scalable to high-dimensional data. Experiments on automatic topic discovery and category prediction demonstrate encouraging performance of our approach. Our SPCA solver is publicly available.
UR - http://hdl.handle.net/10754/631654
UR - http://link.springer.com/chapter/10.1007/978-3-030-12119-8_8
UR - http://www.scopus.com/inward/record.url?scp=85062075162&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-12119-8_8
DO - 10.1007/978-3-030-12119-8_8
M3 - Chapter
SN - 9783030121181
SP - 157
EP - 180
BT - Brain-Inspired Intelligence and Visual Perception
PB - Springer International Publishing
ER -