Deepfake Audio Detection Using Ensemble CNN-BiLSTM Architecture

Authors : Dr. J Sharmila Joseph, Anaswar L, Harshwardhan Hande, and Soham Dange

Abstract :

This work proposes a deep learning approach for detecting AI-generated (deepfake) audio. Rather than relying on a single network, it ensembles a convolutional neural network (CNN) with a bidirectional long short-term memory (Bi-LSTM) model. Features extracted from the audio include mel-frequency cepstral coefficients (MFCCs), mel-spectrograms, chroma features, and pitch contours. The CNN analyzes mel-spectrograms to capture fixed, image-like spectral patterns, while the Bi-LSTM models sequences of MFCCs to capture the temporal artifacts characteristic of computer-generated voices. Together, the two branches distinguish genuine speech from synthetic speech.
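The abstract describes a two-branch ensemble: a CNN scoring the mel-spectrogram and a Bi-LSTM scoring the MFCC sequence, with their outputs combined for the final real/fake decision. The sketch below illustrates that late-fusion idea in NumPy only. The two branch functions are hypothetical stand-ins (random linear projections followed by a sigmoid), not the trained networks from the paper, and the fusion weight `w_cnn` is an assumed parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_branch_score(mel_spectrogram: np.ndarray) -> float:
    """Stand-in for a CNN mapping a (mel_bands, frames) spectrogram to P(fake)."""
    w = rng.standard_normal(mel_spectrogram.size)
    logit = float(mel_spectrogram.ravel() @ w) / mel_spectrogram.size
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> probability in [0, 1]

def bilstm_branch_score(mfcc_sequence: np.ndarray) -> float:
    """Stand-in for a Bi-LSTM mapping a (frames, coeffs) MFCC sequence to P(fake)."""
    pooled = mfcc_sequence.mean(axis=0)  # crude temporal pooling over frames
    w = rng.standard_normal(pooled.size)
    logit = float(pooled @ w) / pooled.size
    return 1.0 / (1.0 + np.exp(-logit))

def ensemble_score(mel_spectrogram: np.ndarray,
                   mfcc_sequence: np.ndarray,
                   w_cnn: float = 0.5) -> float:
    """Late fusion: weighted average of the two branch probabilities."""
    return (w_cnn * cnn_branch_score(mel_spectrogram)
            + (1.0 - w_cnn) * bilstm_branch_score(mfcc_sequence))

# Dummy inputs shaped like typical features: 64 mel bands x 200 frames,
# and 200 frames x 13 MFCC coefficients.
mel = rng.random((64, 200))
mfcc = rng.random((200, 13))
p_fake = ensemble_score(mel, mfcc)
print(f"ensemble P(fake) = {p_fake:.3f}")
```

In a real system the branch scores would come from trained networks (e.g. via a framework such as PyTorch or Keras), and the fusion weight could itself be learned; the averaging step shown here is the structural point.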

Keywords :

Multimodal Feature Fusion, Hybrid Deep Learning, Audio Anti-Spoofing Countermeasures.