DOI: 10.3390/electronics15132764 ISSN: 2079-9292

Asymmetric Spectral Filtering and Behavior-Guided Graph Convolution for Multimodal Recommendation

Ganglong Duan, Yi Yao, Zhiqiang Ji, Tianqiao Gong, Jun Yan

Multimodal recommender systems are challenged by heterogeneous noise across modalities and by coarse-grained feature fusion. Existing frequency-domain methods typically apply the same (symmetric) filter to every modality and ignore the distinct spectral characteristics of each. A single shared filter cannot both remove noise from visual features and preserve the semantics of textual features, so the resulting multimodal representations are suboptimal. Meanwhile, current fusion strategies mainly operate at the instance level with static modality weights. As a result, they cannot adjust individual feature channels to a user's collaborative context. To address these issues, this paper proposes MFA-GCN, a multimodal recommendation framework that combines asymmetric spectral filtering, multiview graph enhancement, and behavior-guided channel attention. For the visual modality, a multiscale frequency-domain module that integrates 1D convolution and self-attention suppresses high-frequency disturbances while preserving informative structures. For the textual modality, a lightweight complex-domain scaling strategy adjusts spectral energy while maintaining semantic consistency. In addition, auxiliary user–user and item–item graphs are constructed to supplement sparse user–item interactions with richer collaborative signals. Finally, a behavior-guided channel attention mechanism dynamically refines the multimodal representations. Experiments on three public Amazon datasets show that MFA-GCN consistently outperforms several representative baselines.
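The idea of asymmetric spectral filtering can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions. The FFT is taken along the feature dimension of each item embedding. The paper's multiscale convolution-plus-self-attention visual module is replaced by a fixed low-pass mask. The learnable textual scaling is replaced by a complex weight vector that is supplied directly. The function names `lowpass_visual` and `complex_scale_text` and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def lowpass_visual(x: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Hypothetical visual branch: zero out high-frequency bins along the feature axis.

    Stand-in (assumption) for the paper's multiscale conv + self-attention module.
    """
    spec = np.fft.rfft(x, axis=-1)                  # complex, shape (n_items, d // 2 + 1)
    n_bins = spec.shape[-1]
    cutoff = max(1, int(np.ceil(keep_ratio * n_bins)))
    mask = np.zeros(n_bins)
    mask[:cutoff] = 1.0                             # keep only the lowest `cutoff` bins
    return np.fft.irfft(spec * mask, n=x.shape[-1], axis=-1)


def complex_scale_text(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical textual branch: rescale every frequency bin by a complex weight.

    In a trained model `weight` would be learnable; here it is passed in (assumption).
    """
    spec = np.fft.rfft(x, axis=-1)
    if weight.shape[-1] != spec.shape[-1]:
        raise ValueError("weight must have d // 2 + 1 entries")
    return np.fft.irfft(spec * weight, n=x.shape[-1], axis=-1)


rng = np.random.default_rng(0)
d = 8
visual = rng.normal(size=(4, d))                   # 4 items, 8-dim visual features
text = rng.normal(size=(4, d))                     # 4 items, 8-dim textual features

visual_f = lowpass_visual(visual, keep_ratio=0.25)
# Illustrative weights: near-unit magnitude with a small per-bin phase shift.
w = 0.95 * np.exp(1j * 0.05 * np.arange(d // 2 + 1))
text_f = complex_scale_text(text, w)
print(visual_f.shape, text_f.shape)                # prints (4, 8) (4, 8)
```

In this sketch the asymmetry lies in how the two branches treat the spectrum. The visual path discards high-frequency bins outright, whereas the textual path only reweights the bins, so no frequency component of the text features is removed.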
