DOI: 10.3390/fi18070344 ISSN: 1999-5903

Evaluating Adversarial Robustness of Deepfake Audio Detectors and Vocoder Fingerprint Detectors Against Universal Adversarial Perturbations

Quang Minh Tran, Wei Zong, Yang-Wai Chow, Willy Susilo

Audio deepfake and vocoder fingerprint detectors are increasingly used to identify synthetic speech and attribute it to its generating model. However, their robustness against adversarial perturbations remains unclear across attack algorithms, perturbation domains, detector representations, and vocoder types. This paper presents a focused, quality-aware evaluation of four representative adversarial attacks, namely the Fast Gradient Sign Method (FGSM), Basic Iterative Method (BIM), Projected Gradient Descent (PGD), and Carlini–Wagner (CW) attack, against audio deepfake and vocoder fingerprint detectors. Each attack is implemented in both the waveform domain and the short-time Fourier transform (STFT) magnitude domain. All attacks are optimized against Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks (AASIST) under a targeted fake-to-real objective and are evaluated on synthetic speech generated by HiFi-GAN, Fullband MelGAN, StyleMelGAN, and Parallel WaveGAN. Attack performance is first measured on the source AASIST detector, after which black-box transferability is assessed on three target detector families: ResNet with Linear Frequency Cepstral Coefficient (LFCC) features, Light CNN (LCNN) with Constant-Q Cepstral Coefficient (CQCC) features, and a bidirectional long short-term memory (BiLSTM) detector. The results show that adversarial effectiveness depends strongly on perturbation domain and detector representation. STFT-magnitude PGD transfers strongly to LFCC-based ResNet detectors but has limited effect on CQCC-based and recurrent detectors. In contrast, waveform-domain attacks transfer more broadly across feature-based detectors, with each attack showing a distinct trade-off between attack success rate (ASR) and audio quality. Under the chosen waveform-domain budget, FGSM and BIM preserve transcription-level intelligibility while retaining meaningful black-box transferability, whereas CW provides the strongest overall source-detector and black-box attack performance. To distinguish effective adversarial perturbations from destructive signal degradation, we evaluate audio quality and intelligibility using word error rate (WER) and signal-to-noise ratio (SNR). Overall, the findings show that robustness claims in audio deepfake and vocoder fingerprint detection are limited when adversarial perturbations, black-box transferability, and audio quality are jointly considered.
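
As an illustration of the waveform-domain setting described above, the following is a minimal sketch of a targeted fake-to-real attack under an L-infinity budget, together with an SNR measurement of the resulting perturbation. It is not the authors' implementation. The detector here is a hypothetical logistic model (`real_score`) standing in for AASIST so that the gradient can be written analytically, and the names `targeted_linf_attack`, `eps`, `alpha`, and `steps` are assumptions made for this example. With `steps=1` and `alpha=eps` the loop reduces to a single FGSM step; with several smaller steps it behaves like BIM. The CW attack is optimization-based and is not covered by this sketch.

```python
import numpy as np


def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB, treating the perturbation as the noise."""
    noise = adversarial - clean
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        return float("inf")
    return 10.0 * np.log10(np.mean(clean ** 2) / noise_power)


def real_score(w, b, x):
    """Toy stand-in detector (assumption): P(real) = sigmoid(w . x + b)."""
    logit = np.clip(w @ x + b, -50.0, 50.0)  # keep exp() well-behaved
    return 1.0 / (1.0 + np.exp(-logit))


def targeted_linf_attack(x, w, b, eps, alpha, steps):
    """Push a fake waveform x toward the 'real' class within an L-inf ball.

    Minimises the targeted loss -log P(real) by signed gradient descent.
    """
    x_adv = x.copy()
    for _ in range(steps):
        p = real_score(w, b, x_adv)
        # Gradient of -log(sigmoid(w . x + b)) with respect to x.
        grad = -(1.0 - p) * w
        # Targeted attack: step against the gradient of the target-class loss.
        x_adv = x_adv - alpha * np.sign(grad)
        # Project back into the eps-ball around the original waveform.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        # Keep samples in the valid normalised audio range.
        x_adv = np.clip(x_adv, -1.0, 1.0)
    return x_adv
```

Reporting `snr_db(x, x_adv)` next to the change in `real_score` mirrors the joint view of attack success and signal quality taken in the abstract: a larger `eps` generally makes the attack easier but lowers the SNR.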
