MMFN: MLLMs-guided Multi-source Information Fusion Network for Multimodal Fake News Detection
Ruihao Zhang, Shujuan Ji, Jiandong Lv, Ning Li, Haojie Li
The wide spread of false information has caused great harm to society, making Multimodal Fake News Detection (MFND) increasingly urgent. Traditional detectors, which are trained in isolation, struggle to acquire open-world facts directly. The advent of Multimodal Large Language Models (MLLMs) offers one potential solution to this challenge. In this paper, we first investigate the potential of MLLMs in MFND and find that: (1) their accuracy in detecting fake news is significantly lower than that of traditional detectors; (2) although MLLMs can generate reasoning grounds that closely match human cognition, their analysis still has problems, such as missing key information and logical faults. Based on these findings, we argue that current MLLMs cannot directly replace conventional detectors but can supply them with evidence and knowledge from multiple perspectives. Accordingly, we design an MLLMs-guided Multi-source Information Fusion Network (MMFN) for Multimodal Fake News Detection. Within the MLLMs, a layer-by-layer human cognitive path is simulated to provide reasoning analysis and relevant background knowledge for MFND. In parallel, a Fine-grained Clues Extraction (FCE) module that combines attention and uncertainty reasoning is designed to capture both similar clues and ambiguous clues. Finally, an Uncertainty-Driven Adaptive Fusion Network (UAFN) is designed to adaptively mine key information and weight information at different levels. Experimental results on four popular fake news datasets demonstrate the superiority of our method.
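The abstract does not specify how UAFN computes its weights. Purely as a generic illustration of uncertainty-driven weighting, and not as the authors' implementation, the Python sketch below assumes that each information source emits a probability distribution over {real, fake}, measures uncertainty as Shannon entropy, and down-weights uncertain sources with a softmax over negative entropies. The function names (`entropy`, `uncertainty_weights`, `fuse`) and the sample values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_weights(distributions):
    """Softmax over negative entropies: lower uncertainty gives a larger weight.

    Assumption for illustration only; the paper may define uncertainty differently.
    """
    scores = [-entropy(d) for d in distributions]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(distributions):
    """Weighted average of the per-source distributions."""
    weights = uncertainty_weights(distributions)
    n_classes = len(distributions[0])
    return [
        sum(w * d[k] for w, d in zip(weights, distributions))
        for k in range(n_classes)
    ]


# Hypothetical [P(real), P(fake)] outputs from three information sources.
sources = [
    [0.10, 0.90],  # confident source
    [0.50, 0.50],  # maximally uncertain source
    [0.40, 0.60],  # mildly confident source
]
weights = uncertainty_weights(sources)
fused = fuse(sources)
```

In this sketch the confident source receives the largest weight and the maximally uncertain one the smallest, so the fused prediction leans toward the sources that are most certain.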