TY - GEN
T1 - ACDNet: Adaptively Combined Dilated Convolution for Monocular Panorama Depth Estimation
AU - Zhuang, Chuanqing
AU - Lu, Zhengda
AU - Wang, Yiqun
AU - Xiao, Jun
AU - Wang, Ying
N1 - KAUST Repository Item: Exported on 2023-03-01
Acknowledgements: This work is supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA23090304), the National Natural Science Foundation of China (U2003109, U21A20515, 62102393), the Youth Innovation Promotion Association of the Chinese Academy of Sciences (Y201935), the State Key Laboratory of Robotics and Systems (HIT) (SKLRS-2022-KF-11), and the Fundamental Research Funds for the Central Universities.
PY - 2022/6/28
Y1 - 2022/6/28
N2 - Depth estimation has become a crucial step in 3D reconstruction from panorama images in recent years. Panorama images preserve complete spatial information about a scene but introduce distortion under equirectangular projection. In this paper, we propose ACDNet, based on adaptively combined dilated convolutions, to predict a dense depth map for a monocular panoramic image. Specifically, we combine convolution kernels with different dilation rates to extend the receptive field in the equirectangular projection. Meanwhile, we introduce an adaptive channel-wise fusion module that summarizes the feature maps and obtains diverse attention areas within the receptive field across channels. Because the fusion module is built on channel-wise attention, the network can capture and leverage cross-channel contextual information efficiently. Finally, we conduct depth estimation experiments on three datasets (both virtual and real-world), and the results demonstrate that our proposed ACDNet substantially outperforms the current state-of-the-art (SOTA) methods.
AB - Depth estimation has become a crucial step in 3D reconstruction from panorama images in recent years. Panorama images preserve complete spatial information about a scene but introduce distortion under equirectangular projection. In this paper, we propose ACDNet, based on adaptively combined dilated convolutions, to predict a dense depth map for a monocular panoramic image. Specifically, we combine convolution kernels with different dilation rates to extend the receptive field in the equirectangular projection. Meanwhile, we introduce an adaptive channel-wise fusion module that summarizes the feature maps and obtains diverse attention areas within the receptive field across channels. Because the fusion module is built on channel-wise attention, the network can capture and leverage cross-channel contextual information efficiently. Finally, we conduct depth estimation experiments on three datasets (both virtual and real-world), and the results demonstrate that our proposed ACDNet substantially outperforms the current state-of-the-art (SOTA) methods.
UR - http://hdl.handle.net/10754/689807
UR - https://ojs.aaai.org/index.php/AAAI/article/view/20278
U2 - 10.1609/aaai.v36i3.20278
DO - 10.1609/aaai.v36i3.20278
M3 - Conference contribution
SP - 3653
EP - 3661
BT - Proceedings of the AAAI Conference on Artificial Intelligence
PB - Association for the Advancement of Artificial Intelligence (AAAI)
ER -
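The abstract above describes two ideas: convolutions run with several dilation rates in parallel, and a channel-wise attention step that adaptively fuses their outputs. The following is a minimal NumPy sketch of that general scheme, not the authors' implementation; the function names, the 3x3 kernel size, the global-average-pooling descriptor, and the softmax-over-branches weighting are all illustrative assumptions.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Naive 'same'-padded 2D convolution of a (H, W) map with a dilated 3x3 kernel."""
    H, W = x.shape
    xp = np.pad(x, dilation)          # pad by the dilation so output keeps (H, W)
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * xp[i * dilation : i * dilation + H,
                                     j * dilation : j * dilation + W]
    return out

def adaptive_fuse(branches):
    """Fuse a list of (C, H, W) feature maps (one per dilation branch)
    with per-channel softmax weights derived from global average pooling."""
    stack = np.stack(branches)             # (K, C, H, W)
    desc = stack.mean(axis=(2, 3))         # (K, C) pooled channel descriptors
    w = np.exp(desc - desc.max(axis=0))    # numerically stable softmax
    w = w / w.sum(axis=0)                  # (K, C), weights sum to 1 per channel
    return (w[:, :, None, None] * stack).sum(axis=0)   # (C, H, W)
```

With identical branches the softmax weights are uniform and the fused map reduces to the shared input, which makes the fusion step easy to sanity-check.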