DOI: 10.3390/info17070641 ISSN: 2078-2489

MMTR: Strategy-Guided Multimodal Table Reasoning with Reflective Self-Correction

Lixin Bai, Yibo Ming, Yanmin Chen

Although multimodal large language models (MLLMs) have achieved remarkable progress in visual question answering, they remain limited on tabular tasks that require fine-grained perception of structured information and complex logical reasoning. This limitation stems primarily from the high density of structured information in tables and the scarcity of high-quality instruction-tuning data. To address these challenges and improve reasoning accuracy on tables, we propose MMTR, a strategy-guided multimodal table reasoning method with reflective self-correction. Architecturally, we design a dual-LoRA structure: the Strategy LoRA generates structured reasoning steps, while the Reflection LoRA verifies and self-corrects these initial outputs. Together they give the model a closed-loop “reasoning–reflection–correction” capability. On the data front, we construct StrTab-QA, a large-scale dataset comprising question-answering, negative, and reflection samples that provide diverse supervision signals. During training, we further introduce a progressive “reasoning-to-reflection” fine-tuning strategy to gradually achieve cross-modal alignment and structural adaptation. In addition, an adaptive resizing and padding scheme preserves table structure and reduces information distortion during visual encoding. Extensive experiments demonstrate that MMTR consistently outperforms strong baselines across multiple table reasoning benchmarks.
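The closed-loop “reasoning–reflection–correction” idea described in the abstract can be illustrated with a minimal control-flow sketch. It is not the authors' implementation: the function name `reason_with_reflection`, the `max_rounds` parameter, and the two toy callables standing in for the Strategy LoRA and Reflection LoRA are all hypothetical, and the abstract does not say how many reflection rounds MMTR actually performs.

```python
from typing import Callable, List, Tuple

Table = List[List[str]]
# Hypothetical interfaces: a strategy model drafts an answer, a reflection
# model returns (accepted, revised_answer). Neither signature is from the paper.
StrategyFn = Callable[[str, Table], str]
ReflectFn = Callable[[str, Table, str], Tuple[bool, str]]


def reason_with_reflection(
    question: str,
    table: Table,
    strategy: StrategyFn,
    reflect: ReflectFn,
    max_rounds: int = 2,  # assumed cap on correction rounds
) -> str:
    """Draft an answer, then let the reflector verify and correct it."""
    draft = strategy(question, table)
    for _ in range(max_rounds):
        accepted, revised = reflect(question, table, draft)
        if accepted:
            break  # reflector is satisfied, stop correcting
        draft = revised  # adopt the correction and check it again
    return draft


# Toy stand-ins so the loop can be run without any model.
def toy_strategy(question: str, table: Table) -> str:
    # Deliberately naive: returns the value in the first data row.
    return table[1][1]


def toy_reflect(question: str, table: Table, draft: str) -> Tuple[bool, str]:
    # Checks the draft against the largest value in the second column.
    best = max((row[1] for row in table[1:]), key=int)
    return draft == best, best


TABLE = [["city", "sales"], ["A", "3"], ["B", "7"]]
answer = reason_with_reflection(
    "Which sales value is largest?", TABLE, toy_strategy, toy_reflect
)
print(answer)
```

In this toy run the naive draft is rejected once, replaced by the reflector's correction, and accepted on the second check. In a real system both callables would be calls to the same base MLLM with a different adapter active, which is one plausible reading of the dual-LoRA design.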
