MMTR: Strategy-Guided Multimodal Table Reasoning with Reflective Self-Correction
Lixin Bai, Yibo Ming, Yanmin Chen
Although multimodal large language models (MLLMs) have achieved remarkable progress in visual question answering, they remain limited in tabular tasks that require fine-grained perception of structured information and complex logical reasoning. This limitation stems primarily from the high density of structured information in tables and the scarcity of high-quality instruction-tuning data. To address these challenges and improve reasoning accuracy on tables, we propose MMTR, a strategy-guided multimodal table reasoning method with reflective self-correction. Architecturally, we design a dual-LoRA structure: the Strategy LoRA generates structured reasoning steps, while the Reflection LoRA verifies and corrects these initial outputs. Together they give the model a closed-loop “reasoning–reflection–correction” capability. On the data side, we construct StrTab-QA, a large-scale dataset comprising question-answering, negative, and reflection samples, which provides diverse supervision signals. During training, we further introduce a progressive “reasoning-to-reflection” fine-tuning strategy to gradually achieve cross-modal alignment and structural adaptation. In addition, an adaptive resizing and padding scheme preserves table structure and minimizes information distortion during visual encoding. Extensive experiments demonstrate that MMTR consistently outperforms strong baselines across multiple table reasoning benchmarks.
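The abstract does not give the details of the adaptive resizing and padding scheme. As a rough illustration of the general idea only, the sketch below computes the geometry for a common aspect-ratio-preserving resize followed by centered padding to a square canvas. The function name `resize_and_pad_geometry`, the default `target` of 448 pixels, and the choice of centered padding are all assumptions made for this example and are not taken from the paper.

```python
def resize_and_pad_geometry(width, height, target=448):
    """Compute an aspect-ratio-preserving resize plus centered padding.

    Hypothetical helper: `target` (side of the square canvas) is an assumed
    value, not a setting reported in the paper.

    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height, and target must be positive")

    # Scale so the longer side exactly fits the canvas; the shorter side
    # shrinks by the same factor, so rows and columns are not stretched.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))

    # Distribute the leftover space evenly; any odd pixel goes to the
    # right or bottom edge.
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top


if __name__ == "__main__":
    # A wide table image: 1000 x 250 pixels.
    print(resize_and_pad_geometry(1000, 250))
```

Under these assumptions, a wide 1000 x 250 table image is scaled to 448 x 112 and padded above and below, rather than being stretched to a square, which is the kind of distortion such a scheme is meant to avoid.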