TY - CONF
T1 - Deep optics for monocular depth estimation and 3D object detection
AU - Chang, Julie
AU - Wetzstein, Gordon
N1 - KAUST Repository Item: Exported on 2022-06-30
Acknowledgements: We thank Vincent Sitzmann and Mark Nishimura for insightful advice. This project was supported by an NSF CAREER Award (IIS 1553333), an Okawa Research Grant, a Sloan Fellowship, a Visual Computing Center CCF Grant of KAUST Office of Sponsored Research, and a PECASE Award (W911NF-19-1-0120).
This publication acknowledges KAUST support, but has no KAUST affiliated authors.
PY - 2020/2/27
Y1 - 2020/2/27
N2 - Depth estimation and 3D object detection are critical for scene understanding but remain challenging to perform with a single image due to the loss of 3D information during image capture. Recent models using deep neural networks have improved monocular depth estimation performance, but there is still difficulty in predicting absolute depth and generalizing outside a standard dataset. Here we introduce the paradigm of deep optics, i.e., end-to-end design of optics and image processing, to the monocular depth estimation problem, using coded defocus blur as an additional depth cue to be decoded by a neural network. We evaluate several optical coding strategies along with an end-to-end optimization scheme for depth estimation on three datasets, including NYU Depth v2 and KITTI. We find that an optimized freeform lens design yields the best results, but chromatic aberration from a singlet lens also offers significantly improved performance. We build a physical prototype and validate that chromatic aberrations improve depth estimation in real-world experiments. In addition, we train object detection networks on the KITTI dataset and show that the lens optimized for depth estimation also improves 3D object detection performance.
UR - http://hdl.handle.net/10754/679475
UR - https://ieeexplore.ieee.org/document/9010976/
UR - http://www.scopus.com/inward/record.url?scp=85078707112&partnerID=8YFLogxK
DO - 10.1109/ICCV.2019.01029
M3 - Conference contribution
SN - 9781728148038
SP - 10192
EP - 10201
BT - 2019 IEEE/CVF International Conference on Computer Vision (ICCV)
PB - IEEE
ER -