Adversarial examples are among the most widespread attacks in adversarial machine learning. In this work, we define new targeted and non-targeted attacks that are computationally less expensive than standard adversarial attacks. Besides practical purposes in some scenarios, these attacks can improve our understanding of the robustness of machine learning models. Moreover, we introduce a new training scheme to improve the performance of pre-trained neural networks and defend against our attacks. We examine the differences between our method, standard training, and standard adversarial training on pre-trained models. We find that our method protects the networks better against our attacks. Furthermore, unlike usual adversarial training, which reduces standard accuracy when applied to previously trained networks, our method maintains and sometimes even improves standard accuracy.
|Date made available
|KAUST Research Repository