Transferrability of Adversarial Attacks from Convolutional Neural Networks to ChatGPT4

AuthorBunzel, Niklas
AbstractThis research evaluates the ability of adversarial attacks, primarily designed for CNN-based classifiers, to target the multimodal image captioning tasks executed by large multimodal models (LMMs), such as ChatGPT4. The study included several attacks, with a particular emphasis on the Projected Gradient Descent (PGD) attack, considering various parameters, surrogate models, and datasets. Initial but limited experiments support the hypothesis that PGD attacks are transferable to ChatGPT. Subsequently, results demonstrated that PGD attacks could be adaptively transferred to disrupt the normal functioning of ChatGPT. On the other hand, other adversarial attack strategies showed a limited ability to compromise ChatGPT. These findings provide insights into the security vulnerabilities of emerging neural network architectures used for generative AI. Moreover, they underscore the possibility of cost-effectively crafting adversarial examples against novel architectures, necessitating the development of robust defense mechanisms for LMMs in practical applications.