TY - GEN
T1 - Relationship proposal networks
AU - Zhang, Ji
AU - Elhoseiny, Mohamed
AU - Cohen, Scott
AU - Chang, Walter
AU - Elgammal, Ahmed
N1 - Generated from Scopus record by KAUST IRTS on 2019-11-20
PY - 2017/11/6
Y1 - 2017/11/6
N2 - Image scene understanding requires learning the relationships between objects in the scene. A scene with many objects may have only a few individual interacting objects (e.g., in a party image with many people, only a handful of people might be speaking with each other). To detect all relationships, it would be inefficient to first detect all individual objects and then classify all pairs; not only is the number of all pairs quadratic, but classification requires limited object categories, which is not scalable for real-world images. In this paper we address these challenges by using pairs of related regions in images to train a relationship proposer that at test time produces a manageable number of related regions. We name our model the Relationship Proposal Network (Rel-PN). Like object proposals, our Rel-PN is class-agnostic and thus scalable to an open vocabulary of objects. We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals. We demonstrate its performance on Visual Genome dataset and compare to other baselines that we designed. We also conduct experiments on a smaller subset of 5,000 images with over 37,000 related regions and show promising results.
AB - Image scene understanding requires learning the relationships between objects in the scene. A scene with many objects may have only a few individual interacting objects (e.g., in a party image with many people, only a handful of people might be speaking with each other). To detect all relationships, it would be inefficient to first detect all individual objects and then classify all pairs; not only is the number of all pairs quadratic, but classification requires limited object categories, which is not scalable for real-world images. In this paper we address these challenges by using pairs of related regions in images to train a relationship proposer that at test time produces a manageable number of related regions. We name our model the Relationship Proposal Network (Rel-PN). Like object proposals, our Rel-PN is class-agnostic and thus scalable to an open vocabulary of objects. We demonstrate the ability of our Rel-PN to localize relationships with only a few thousand proposals. We demonstrate its performance on Visual Genome dataset and compare to other baselines that we designed. We also conduct experiments on a smaller subset of 5,000 images with over 37,000 related regions and show promising results.
UR - http://ieeexplore.ieee.org/document/8100038/
UR - http://www.scopus.com/inward/record.url?scp=85041918005&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2017.555
DO - 10.1109/CVPR.2017.555
M3 - Conference contribution
SN - 9781538604571
BT - Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
ER -