TY - GEN
T1 - A Novel Framework for Robustness Analysis of Visual QA Models
AU - Huang, Jia-Hong
AU - Dao, Cuong Duc
AU - Alfadly, Modar
AU - Ghanem, Bernard
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: This work was supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research and used the resources of the Supercomputing Laboratory at KAUST in Thuwal, Saudi Arabia.
PY - 2019/7/17
Y1 - 2019/7/17
N2 - Deep neural networks have been playing an essential role in many computer vision tasks including Visual Question Answering (VQA). Until recently, the study of their accuracy was the main focus of research but now there is a trend toward assessing the robustness of these models against adversarial attacks by evaluating their tolerance to varying noise levels. In VQA, adversarial attacks can target the image and/or the proposed main question and yet there is a lack of proper analysis of the later. In this work, we propose a flexible framework that focuses on the language part of VQA that uses semantically relevant questions, dubbed basic questions, acting as controllable noise to evaluate the robustness of VQA models. We hypothesize that the level of noise is negatively correlated to the similarity of a basic question to the main question. Hence, to apply noise on any given main question, we rank a pool of basic questions based on their similarity by casting this ranking task as a LASSO optimization problem. Then, we propose a novel robustness measure Rscore and two largescale basic question datasets (BQDs) in order to standardize robustness analysis for VQA models.
AB - Deep neural networks have been playing an essential role in many computer vision tasks including Visual Question Answering (VQA). Until recently, the study of their accuracy was the main focus of research but now there is a trend toward assessing the robustness of these models against adversarial attacks by evaluating their tolerance to varying noise levels. In VQA, adversarial attacks can target the image and/or the proposed main question and yet there is a lack of proper analysis of the later. In this work, we propose a flexible framework that focuses on the language part of VQA that uses semantically relevant questions, dubbed basic questions, acting as controllable noise to evaluate the robustness of VQA models. We hypothesize that the level of noise is negatively correlated to the similarity of a basic question to the main question. Hence, to apply noise on any given main question, we rank a pool of basic questions based on their similarity by casting this ranking task as a LASSO optimization problem. Then, we propose a novel robustness measure Rscore and two largescale basic question datasets (BQDs) in order to standardize robustness analysis for VQA models.
UR - http://hdl.handle.net/10754/662272
UR - http://aaai.org/ojs/index.php/AAAI/article/view/4861
U2 - 10.1609/aaai.v33i01.33018449
DO - 10.1609/aaai.v33i01.33018449
M3 - Conference contribution
SP - 8449
EP - 8456
BT - Proceedings of the AAAI Conference on Artificial Intelligence
PB - Association for the Advancement of Artificial Intelligence (AAAI)
ER -