DOI: 10.3390/jmse14131202 ISSN: 2077-1312

Learnable Two-Stage Frequency-Guided Front-End for Robust Underwater Acoustic Target Recognition

Heng Zhang, Haiyan Wang, Shaodong Zhang, Yongsheng Yan

Underwater acoustic target recognition is a key enabling technology for marine environmental monitoring, resource exploration, and maritime defense. However, ship-radiated noise is strongly affected by nonstationary propagation, multipath interference, frequency-dependent attenuation, and time-varying ambient noise, making robust acoustic representation a central challenge. Most existing deep learning-based recognition methods still rely on fixed time–frequency front-ends, such as Mel-spectrograms or wavelet scattering representations, whose analysis parameters are predefined and cannot be adapted to the recognition objective. This separation between acoustic preprocessing and deep model optimization may weaken narrowband tonal components, modulation-related structures, or weak target-dependent frequency patterns before they are processed by the classifier. To address this limitation, this paper proposes a Learnable Two-Stage Frequency-Guided Front-End, termed LTFF, for underwater acoustic signal recognition. In the first stage, LTFF performs frequency-guided discrimination by decomposing the input waveform into frequency-dependent components and applying learnable complex Gabor filters, whose center frequencies and bandwidths are optimized through end-to-end training. In the second stage, squared-modulus envelope extraction, learnable Gaussian pooling, and per-channel energy normalization are integrated to improve local temporal stability and attenuate slowly varying energy fluctuations. In this way, LTFF establishes a trainable connection between physically meaningful underwater acoustic structures and downstream deep recognition models. Experiments on two public underwater acoustic datasets with convolutional and Transformer-based backbones show that LTFF provides a task-adaptive alternative to fixed acoustic front-ends and is particularly compatible with convolutional architectures. 
These results indicate that learnable frequency-guided front-ends are a promising way to improve the robustness of underwater acoustic recognition under dynamic marine conditions.
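The two-stage pipeline described above can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the function names, filter lengths, stride, and PCEN constants are all assumptions chosen for illustration. The center frequencies, bandwidths, and pooling widths are passed in as plain arrays; in an actual trainable front-end they would be parameters updated by an autodiff framework during end-to-end training.

```python
import numpy as np


def gabor_filters(center_freqs, bandwidths, length):
    """Complex Gabor impulse responses, shape (channels, length).

    center_freqs are in radians per sample; bandwidths are the Gaussian
    standard deviations in samples (both would be learnable in training).
    """
    t = np.arange(length) - (length - 1) / 2.0
    eta = np.asarray(center_freqs, dtype=float)[:, None]
    sigma = np.asarray(bandwidths, dtype=float)[:, None]
    envelope = np.exp(-0.5 * (t / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    return envelope * np.exp(1j * eta * t)


def gaussian_pool(energy, pool_sigma, length, stride):
    """Per-channel Gaussian lowpass filtering followed by decimation."""
    t = np.arange(length) - (length - 1) / 2.0
    pooled = []
    for channel, sigma in zip(energy, pool_sigma):
        window = np.exp(-0.5 * (t / sigma) ** 2)
        window /= window.sum()  # unit DC gain
        pooled.append(np.convolve(channel, window, mode="same")[::stride])
    return np.stack(pooled)


def pcen(x, alpha=0.98, delta=2.0, r=0.5, s=0.04, eps=1e-6):
    """Per-channel energy normalization with a first-order IIR smoother.

    The constants are commonly used defaults, assumed here for illustration.
    """
    m = np.empty_like(x)
    m[:, 0] = x[:, 0]
    for n in range(1, x.shape[1]):
        m[:, n] = (1 - s) * m[:, n - 1] + s * x[:, n]
    return (x / (eps + m) ** alpha + delta) ** r - delta ** r


def front_end(wave, center_freqs, bandwidths, pool_sigma,
              filt_len=401, pool_len=401, stride=160):
    """Stage 1: Gabor filterbank. Stage 2: |.|^2, Gaussian pooling, PCEN."""
    filters = gabor_filters(center_freqs, bandwidths, filt_len)
    subbands = np.stack([np.convolve(wave, h, mode="same") for h in filters])
    energy = np.abs(subbands) ** 2  # squared-modulus envelope
    pooled = gaussian_pool(energy, pool_sigma, pool_len, stride)
    return pcen(pooled)
```

As a usage example under the same assumptions, a one-second 1 kHz tone sampled at 16 kHz can be passed through two channels centered at 1 kHz and 3 kHz by calling `front_end(wave, 2 * np.pi * np.array([1000.0, 3000.0]) / 16000, [40.0, 40.0], [60.0, 60.0])`. The result has one row per channel and one column per pooled frame, and away from the signal edges the 1 kHz channel carries the larger normalized energy.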
