TY - CONF
T1 - Tagging Like Humans: Diverse and Distinct Image Annotation
AU - Wu, Baoyuan
AU - Chen, Weidong
AU - Sun, Peng
AU - Liu, Wei
AU - Ghanem, Bernard
AU - Lyu, Siwei
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work is supported by Tencent AI Lab. The participation of Bernard Ghanem is supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research. The participation of Siwei Lyu is partially supported by a National Science Foundation National Robotics Initiative (NRI) grant (IIS-1537257) and by the National Natural Science Foundation of China (Project 61771341).
PY - 2018/12/18
Y1 - 2018/12/18
AB - In this work, we propose a new automatic image annotation model, dubbed diverse and distinct image annotation (D2IA). The generative model D2IA is inspired by ensembles of human annotations, which produce semantically relevant yet distinct and diverse tags. In D2IA, we use sequential sampling from a determinantal point process (DPP) model to generate a relevant and distinct tag subset, in which the tags are relevant to the image content and semantically distinct from each other. Multiple such tag subsets, covering diverse semantic aspects or diverse semantic levels of the image content, are generated by randomly perturbing the DPP sampling process. We leverage a generative adversarial network (GAN) model to train D2IA. Extensive experiments on two benchmark datasets, including quantitative and qualitative comparisons as well as human subject studies, demonstrate that the proposed model produces more diverse and distinct tags than state-of-the-art methods.
UR - http://hdl.handle.net/10754/627545
UR - https://ieeexplore.ieee.org/document/8578929/
UR - http://www.scopus.com/inward/record.url?scp=85055520200&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2018.00831
DO - 10.1109/CVPR.2018.00831
M3 - Conference contribution
SN - 9781538664209
SP - 7967
EP - 7975
BT - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PB - IEEE Computer Society
ER -