Introduction to Audio Deepfake Generation: Academic Insights for Non-Experts

Abstract

With the advancement of artificial intelligence, methods for generating audio deepfakes have improved, but the underlying technology has become more complex. Even so, non-expert users can generate audio deepfakes because the latest technologies have become more accessible. These technologies can support content creators, singers, and businesses in industries such as advertising and entertainment. However, they can also be misused for disinformation, CEO fraud, and voice scams. As the demand for countermeasures against such misuse grows, continuous interdisciplinary exchange is required. This work introduces non-experts to recent techniques for generating audio deepfakes, with a focus on Text-to-Speech Synthesis and Voice Conversion. It covers background knowledge, the latest trends and models, and both open-source and closed-source software, exploring the technological and practical aspects of audio deepfakes.
