Improving visual question answering for bridge inspection by pre‐training with external data of image–text pairs

doi:10.1111/mice.13086

Thannarot Kunlamai, Tatsuro Yamane, Masanori Suganuma, Pang‐Jo Chun, Takayuki Okatani

Computational Theory and Mathematics
Computer Graphics and Computer-Aided Design
Computer Science Applications
Civil and Structural Engineering
Building and Construction

Abstract

This paper explores the application of visual question answering (VQA) in bridge inspection using recent advancements in multimodal artificial intelligence (AI) systems. VQA involves an AI model providing natural language answers to questions about the content of an input image. However, applying VQA to bridge inspection poses challenges due to the high cost of creating training data that requires expert knowledge. To address this, we propose leveraging existing bridge inspection reports, which already include image–text pairs, as external knowledge to enhance VQA performance. Our approach involves training the model on a large collection of image–text pairs, followed by fine‐tuning it on a limited amount of training data specifically designed for the VQA task. The results demonstrate a significant improvement in VQA accuracy using this approach. These findings highlight the potential of AI models for VQA as valuable tools for assessing the condition of bridges.
