Real-World Audio Deepfake Detection Using SSL-Based Speech Models and Diverse Training Data

Author: Schäfer, Karla; Neu, Matthias
Date: 2025
Type: Conference Paper
Abstract: The potential for audio deepfakes to be used for malevolent purposes is increasing in line with advances in artificial intelligence and synthesis methods. With this, the need for reliable audio deepfake detectors has increased. Most audio deepfake detectors comprise two principal components: a front-end, which is responsible for feature extraction, and a back-end, which performs the classification. Self-supervised learning (SSL) based front-ends are currently the most promising when faced with real-world data. We tested different combinations of six SSL-based front-ends and four back-ends (i.e., classifiers) using nine variously combined training sets, which allowed us to include the majority of the currently available training sets for audio deepfake detection. The combination of Wav2Vec2.0 XLS-R (2b) as the front-end and GF as the back-end performed best, with an equal error rate (EER) of 0.73% on the in-the-wild dataset, outperforming the current state of the art (SOTA). Furthermore, our findings highlight the significance of training set constellations and the use of large front-ends.
Conference: International Conference on Tools with Artificial Intelligence 2025
URL: https://publica.fraunhofer.de/handle/publica/506047
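
The abstract reports its headline result as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted as genuine) equals the false rejection rate (genuine audio rejected). The following is a minimal sketch of how an EER can be approximated from detector scores. It is not the authors' evaluation code. The function name `equal_error_rate`, the label convention (1 = bona fide, 0 = spoof), and the assumption that higher scores mean "more likely bona fide" are all illustrative choices, not taken from the paper.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER of a detector.

    Assumptions (illustrative, not from the paper):
    labels use 1 for bona fide and 0 for spoof, and a higher score
    means the detector considers the sample more likely bona fide.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    spoof = [s for s, y in zip(scores, labels) if y == 0]
    if not bona_fide or not spoof:
        raise ValueError("need at least one bona fide and one spoof sample")

    # Candidate thresholds: every observed score, plus one above the maximum
    # so that "reject everything" is also considered.
    thresholds = sorted(set(scores)) + [max(scores) + 1.0]

    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # A sample is accepted as bona fide when its score is >= t.
        far = sum(s >= t for s in spoof) / len(spoof)          # false acceptance rate
        frr = sum(s < t for s in bona_fide) / len(bona_fide)   # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            # and report their average as the EER estimate.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy scores only; these are not results from the paper.
    print(equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
```

With only a finite set of scores, the two error rates rarely cross exactly, so this sketch averages them at the closest threshold. Evaluation toolkits often interpolate between thresholds instead, which can give slightly different values on small test sets.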