Relationship proposal networks

Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, Ahmed Elgammal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

82 Scopus citations

Abstract

Image scene understanding requires learning the relationships between objects in the scene. A scene with many objects may have only a few individual interacting objects (e.g., in a party image with many people, only a handful of people might be speaking with each other). To detect all relationships, it would be inefficient to first detect all individual objects and then classify all pairs; not only is the number of all pairs quadratic, but classification requires limited object categories, which is not scalable for real-world images. In this paper we address these challenges by using pairs of related regions in images to train a relationship proposer that at test time produces a manageable number of related regions. We name our model the Relationship Proposal Network (Rel-PN). Like object proposals, our Rel-PN is class-agnostic and thus scalable to an open vocabulary of objects. We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals. We demonstrate its performance on the Visual Genome dataset and compare to other baselines that we designed. We also conduct experiments on a smaller subset of 5,000 images with over 37,000 related regions and show promising results.
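The quadratic growth mentioned in the abstract can be illustrated with a small sketch. It assumes relationships are directed, so that (subject, object) and (object, subject) count as separate pairs; the function name `ordered_pair_count` and the object counts are hypothetical, chosen only for illustration, and nothing here is taken from the paper's implementation.

```python
def ordered_pair_count(num_objects: int) -> int:
    """Number of ordered (subject, object) pairs among num_objects distinct objects."""
    if num_objects < 0:
        raise ValueError("num_objects must be non-negative")
    # Each object can pair with every other object, in either role.
    return num_objects * (num_objects - 1)


# Hypothetical detection counts, not figures from the paper.
for n in (10, 100, 1000):
    print(n, ordered_pair_count(n))
```

Under this assumption, 100 detected objects already yield 9,900 candidate pairs to classify, which is the cost that a fixed budget of relationship proposals is meant to avoid.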
Original language: English (US)
Title of host publication: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Print): 9781538604571
State: Published - Nov 6 2017
Externally published: Yes