The task of situation recognition aims to solve the visual reasoning problem with the ability to predict the activity happening (salient action) in an image and the nouns of all associated semantic roles playing in the activity. This poses severe challenges due to long-tailed data distributions and local class ambiguities. Prior works only propagate the local noun-level features on one single image without utilizing global information. We propose a Knowledge-aware Global Reasoning (KGR) framework to endow neural networks with the capability of adaptive global reasoning over nouns by exploiting diverse statistical knowledge. Our KGR is a local-global architecture, which consists of a local encoder to generate noun features using local relations and a global encoder to enhance the noun features via global reasoning supervised by an external global knowledge pool. The global knowledge pool is created by counting the pairwise relationships of nouns in the dataset. In this paper, we design an action-guided pairwise knowledge as the global knowledge pool based on the characteristic of the situation recognition task. Extensive experiments have shown that our KGR not only achieves state-of-the-art results on a large-scale situation recognition benchmark, but also effectively solves the long-tailed problem of noun classification by our global knowledge.
|Original language||English (US)|
|Number of pages||13|
|Journal||IEEE Transactions on Pattern Analysis and Machine Intelligence|
|State||Published - Jan 23 2023|