M3DSRec: Memory-Enhanced Multimodal Sequential Recommendation with Multiple Distillation
Zhicheng Zhou, Xiangwu Meng, Yujie Zhang

Multimodal sequential recommendation extracts multimodal features from user interaction sequences to improve the accuracy and expressiveness of user sequence modeling. However, the field still faces two major challenges: (1) how to effectively improve the synergy among multimodal features in user interaction sequences, and (2) how to reduce the number of model parameters while maintaining recommendation performance. To address these challenges, this paper proposes a memory-enhanced multimodal sequential recommendation method with multiple distillation (M3DSRec), which includes four key designs. First, a parameter standardization mixture of experts is used to mitigate the imbalance among multimodal feature distributions and to avoid interference between features. Second, a main-modal cooperative block is designed to fuse the features of the other modalities based on the primary unimodal feature, thereby enhancing the synergy among multimodal features. Third, a memory-enhanced dynamic cluster is designed, which uses a memory bank to enhance the representation of multimodal features in user interaction sequences. Finally, a multiple distillation strategy aligns the features and logits of the teacher and student networks at multiple levels. Experimental results on five public datasets show that M3DSRec strikes a balance between recommendation performance and the number of model parameters, with both the teacher and student networks outperforming existing mainstream methods.
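The abstract describes the multiple distillation strategy only as aligning teacher and student features and logits at multiple levels. As a rough illustration of that general idea, the sketch below combines the standard temperature-scaled KL knowledge-distillation term on logits with a mean-squared-error term on intermediate features. It is not M3DSRec's actual loss. The function names, the number of feature levels, the weight `alpha`, the temperature, and the assumption that teacher and student features at each level share the same dimension are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    # Temperature-scaled softmax; subtracting the max keeps exp() numerically stable.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # KL(p || q). Softmax outputs are strictly positive, so q never contains zeros here.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def feature_mse(a: Sequence[float], b: Sequence[float]) -> float:
    # Mean squared error between two equally sized feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def multi_level_distillation_loss(
    teacher_feats: Sequence[Sequence[float]],
    student_feats: Sequence[Sequence[float]],
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,  # hypothetical value
    alpha: float = 0.5,        # hypothetical feature/logit weighting
) -> float:
    # Feature-level alignment: average the MSE over all (hypothetical) levels.
    feat_loss = sum(
        feature_mse(t, s) for t, s in zip(teacher_feats, student_feats)
    ) / len(teacher_feats)
    # Logit-level alignment: KL between temperature-softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    logit_loss = temperature ** 2 * kl_divergence(p_teacher, p_student)
    return alpha * feat_loss + (1 - alpha) * logit_loss
```

In a real model these inputs would be batched tensors with gradient tracking, and a learned projection would usually map student features to the teacher's dimension; plain Python lists are used here only to keep the sketch self-contained.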