Abstract
3D scene perception that provides a semantic interpretation for each point has a wide range of practical applications. However, existing methods generally operate on 3D point clouds. In this work, we investigate the inverse projection problem from a 2D planar image to 3D scene perception and propose a multi-task model that jointly predicts height information and semantic categories. Taking into account the commonalities and differences between the two tasks, we design the multi-task model to explicitly strengthen cross-task correlations. Specifically, a calibration refinement attention (CRA) module is placed before the classifier heads to incorporate beneficial information while filtering out characteristics that are inconsistent between the two tasks. In addition, a spatial structure enhanced (SSE) module is introduced to integrate spatial structure information into the output features of the CRA module through a skip connection. The neighboring pixel affinity (NPA) loss and the soft weighted ordinal (SWO) classification loss are then introduced for the two tasks to optimize the direction of the task gradients. To validate the effectiveness of the proposed method, a novel metric named height constrained semantic accuracy (HCSA) is proposed, which considers the accuracy of semantic segmentation and height estimation jointly. Extensive experiments on Vaihingen, Potsdam, and DFC2019 demonstrate that the proposed method generalizes well and achieves competitive performance.
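The abstract does not give the formula for HCSA, so the following is only a hedged sketch of one plausible reading of a metric that scores semantic and height predictions jointly: a pixel counts as correct when its predicted class matches the ground truth and its height error stays within a tolerance. The function name `height_constrained_accuracy`, the `tolerance` parameter, and the thresholding rule are all assumptions made for illustration, not the paper's definition.

```python
from typing import Sequence


def height_constrained_accuracy(
    pred_labels: Sequence[int],
    true_labels: Sequence[int],
    pred_heights: Sequence[float],
    true_heights: Sequence[float],
    tolerance: float = 1.0,  # assumed height tolerance, same unit as the heights
) -> float:
    """Fraction of pixels whose class is correct AND whose absolute
    height error is at most `tolerance` (illustrative definition only)."""
    n = len(true_labels)
    same_length = len(pred_labels) == len(pred_heights) == len(true_heights) == n
    if n == 0 or not same_length:
        raise ValueError("inputs must be non-empty and of equal length")

    hits = 0
    for pl, tl, ph, th in zip(pred_labels, true_labels, pred_heights, true_heights):
        # A pixel is a hit only when both task predictions are acceptable.
        if pl == tl and abs(ph - th) <= tolerance:
            hits += 1
    return hits / n


# Flattened toy example with four pixels.
score = height_constrained_accuracy(
    pred_labels=[0, 1, 1, 2],
    true_labels=[0, 1, 2, 2],
    pred_heights=[0.0, 5.0, 3.0, 10.0],
    true_heights=[0.5, 9.0, 3.0, 10.2],
)
```

Under this assumed rule, a pixel with the right class but a large height error is penalized just like a misclassified pixel, which is the kind of coupling a joint metric is meant to capture.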
Original language | English (US) |
---|---|
Pages (from-to) | 233-249 |
Number of pages | 17 |
Journal | ISPRS Journal of Photogrammetry and Remote Sensing |
Volume | 195 |
State | Published - Jan 1 2023 |
Externally published | Yes |
ASJC Scopus subject areas
- Engineering (miscellaneous)
- Atomic and Molecular Physics, and Optics
- Computers in Earth Sciences
- Computer Science Applications
- Geography, Planning and Development