**Abstract.** Adversarial attacks pose a significant threat to the reliability and trustworthiness of machine learning systems, particularly in image classification tasks such as deepfake detection. This chapter presents a comprehensive approach to detecting two prominent types of adversarial attacks: noise perturbation attacks and patch-based attacks. Using the ImageNet classification task as a primary use case, we investigate methods based on statistical features for identifying adversarial noise across a diverse range of attacks. These methods are designed to detect subtle changes in image distributions caused by adversarial manipulations, offering a lightweight, interpretable solution that can serve as one component of a multi-class detector framework. Building on this foundation, the chapter turns to the security-critical application of deepfake detection, where patch-based attacks are examined in depth. The proposed detection framework leverages statistical and spatial features to identify patch artifacts, ensuring robustness against these localized yet highly effective attacks. Our analysis compares the effectiveness of statistical detectors across multiple adversarial attack types and evaluates their performance in real-world scenarios. By addressing both noise perturbation and patch attacks, this chapter provides actionable insights and tools for enhancing the security of machine learning systems deployed in high-stakes applications, bridging the gap between theory and practice in adversarial defense.
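
To make the idea of a statistical-feature detector concrete, the sketch below illustrates one plausible instantiation: per-image statistics (global variance, high-frequency energy from a Laplacian response, and histogram entropy) fed to a simple linear classifier trained on clean versus adversarially perturbed images. This is an illustrative assumption, not the chapter's exact feature set or model; the function names, feature choices, and logistic-regression head are hypothetical stand-ins.

```python
# Minimal sketch of a statistical adversarial-noise detector.
# Assumptions: images are float arrays in [0, 1] with shape (H, W, C);
# all feature choices and helper names are illustrative, not the chapter's method.
import numpy as np
from scipy.ndimage import laplace
from sklearn.linear_model import LogisticRegression


def statistical_features(img: np.ndarray) -> np.ndarray:
    """Per-image statistics meant to capture distribution shifts caused by adversarial noise."""
    gray = img.mean(axis=-1)                    # collapse color channels
    hf = laplace(gray)                          # high-frequency (Laplacian) residual
    hist, _ = np.histogram(gray, bins=64, range=(0.0, 1.0), density=True)
    hist = hist + 1e-12                         # avoid log(0)
    entropy = -np.sum(hist * np.log(hist))      # intensity-histogram entropy
    return np.array([
        gray.var(),                             # global intensity variance
        np.abs(hf).mean(),                      # mean high-frequency energy
        np.abs(hf).var(),                       # spread of high-frequency energy
        entropy,
    ])


def fit_detector(clean_imgs, adv_imgs) -> LogisticRegression:
    """Train a lightweight linear detector on labeled clean vs. adversarial examples."""
    X = np.stack([statistical_features(x) for x in list(clean_imgs) + list(adv_imgs)])
    y = np.array([0] * len(clean_imgs) + [1] * len(adv_imgs))
    return LogisticRegression(max_iter=1000).fit(X, y)


# Usage: p_adv = detector.predict_proba(statistical_features(img)[None, :])[0, 1]
```

A detector of this form stays interpretable (each feature has a direct statistical meaning) and cheap to evaluate, which is what allows it to be dropped into a larger multi-class detection framework as one component among several.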