Abstract

This research evaluates the ability of adversarial attacks, originally designed for CNN-based classifiers, to target the multimodal image captioning task performed by large vision-language models such as ChatGPT4. The study covered different versions of ChatGPT4 and several attacks, with particular emphasis on the Projected Gradient Descent (PGD) attack, evaluated across various parameters, surrogate models, and datasets. Initial, limited experiments support the hypothesis that PGD attacks are partly transferable to ChatGPT4. Subsequent results demonstrated that PGD attacks could be adaptively transferred to disrupt the normal functioning of ChatGPT. In contrast, other adversarial attack strategies showed only a limited ability to compromise ChatGPT. These findings provide insight into the security vulnerabilities of emerging neural network architectures used for generative AI. Moreover, they underscore the possibility of cost-effectively crafting adversarial examples against novel architectures, which necessitates the development of robust defense mechanisms for large vision-language models in practical applications.
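To make the transfer setting concrete, the sketch below crafts an untargeted L-infinity PGD perturbation against a publicly available CNN surrogate (a torchvision ResNet-50, chosen here for illustration; the paper's actual surrogate models, step sizes, and budgets may differ), producing a perturbed image that could then be uploaded to the vision-language model under test. The file names and the eps/alpha/steps values are illustrative assumptions, not the study's settings.

```python
# Minimal sketch of an L-infinity PGD transfer attack against a CNN surrogate.
# Assumptions: recent torchvision (weights enums available), an input image
# "cat.jpg" on disk; these are placeholders, not the paper's configuration.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def pgd_attack(surrogate, x, label, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted L-inf PGD: maximize the surrogate's cross-entropy on `label`."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # The surrogate expects ImageNet-normalized inputs; perturb in [0, 1] pixel space.
        logits = surrogate((x_adv - IMAGENET_MEAN) / IMAGENET_STD)
        loss = torch.nn.functional.cross_entropy(logits, label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                 # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)            # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                       # keep a valid image
    return x_adv.detach()

if __name__ == "__main__":
    surrogate = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
    img = TF.to_tensor(Image.open("cat.jpg").convert("RGB").resize((224, 224))).unsqueeze(0)
    with torch.no_grad():
        # Use the surrogate's own prediction as the label to move away from.
        label = surrogate((img - IMAGENET_MEAN) / IMAGENET_STD).argmax(dim=1)
    adv = pgd_attack(surrogate, img, label)
    # Save the adversarial image for manual upload to the vision-language model under test.
    TF.to_pil_image(adv.squeeze(0)).save("cat_adv.png")
```

In this transfer setup the gradients come only from the surrogate; whether the saved image actually disrupts the captioning of a black-box vision-language model is exactly the empirical question the study investigates.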