TY - GEN
T1 - Deep Recurrent Belief Propagation Network for POMDPs
AU - Wang, Yuhui
AU - Tan, Xiaoyang
N1 - Funding Information:
This work is partially supported by National Science Foundation of China (61732006,61976115), AI+ Project of NUAA(XZA20005,56XZA18009), and research project(315025305), Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX19 0195). We would also like to thank the anonymous reviewers, for offering thoughtful comments and helpful advice on earlier versions of this work.
Publisher Copyright:
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2021
Y1 - 2021
N2 - In many real-world sequential decision-making tasks, especially in continuous control like robotic control, it is rare that the observations are perfect, that is, the sensory data could be incomplete, noisy or even dynamically polluted due to the unexpected malfunctions or intrinsic low quality of the sensors. Previous methods handle these issues in the framework of POMDPs and are either deterministic by feature memorization or stochastic by belief inference. In this paper, we present a new method that lies somewhere in the middle of the spectrum of research methodology identified above and combines the strength of both approaches. In particular, the proposed method, named Deep Recurrent Belief Propagation Network (DRBPN), takes a hybrid style belief updating procedure - an RNN-type feature extraction step followed by an analytical belief inference, significantly reducing the computational cost while faithfully capturing the complex dynamics and maintaining the necessary uncertainty for generalization. The effectiveness of the proposed method is verified on a collection of benchmark tasks, showing that our approach outperforms several state-of-the-art methods under various challenging scenarios.
AB - In many real-world sequential decision-making tasks, especially in continuous control like robotic control, it is rare that the observations are perfect, that is, the sensory data could be incomplete, noisy or even dynamically polluted due to the unexpected malfunctions or intrinsic low quality of the sensors. Previous methods handle these issues in the framework of POMDPs and are either deterministic by feature memorization or stochastic by belief inference. In this paper, we present a new method that lies somewhere in the middle of the spectrum of research methodology identified above and combines the strength of both approaches. In particular, the proposed method, named Deep Recurrent Belief Propagation Network (DRBPN), takes a hybrid style belief updating procedure - an RNN-type feature extraction step followed by an analytical belief inference, significantly reducing the computational cost while faithfully capturing the complex dynamics and maintaining the necessary uncertainty for generalization. The effectiveness of the proposed method is verified on a collection of benchmark tasks, showing that our approach outperforms several state-of-the-art methods under various challenging scenarios.
UR - http://www.scopus.com/inward/record.url?scp=85130019801&partnerID=8YFLogxK
U2 - 10.1609/aaai.v35i11.17227
DO - 10.1609/aaai.v35i11.17227
M3 - Conference contribution
AN - SCOPUS:85130019801
T3 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
SP - 10236
EP - 10244
BT - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
PB - Association for the Advancement of Artificial Intelligence
T2 - 35th AAAI Conference on Artificial Intelligence, AAAI 2021
Y2 - 2 February 2021 through 9 February 2021
ER -