DOI: 10.1002/cpe.70836 ISSN: 1532-0626

A Comprehensive Survey on Deep Learning‐Based Infrared and Visible Image Fusion

Hai Zhou, Xuhui Zhang, Wei Gan, Tiantian Li, Jiazheng Li, Junhong Zhan, Lianglun Cheng, Zhuowei Wang

ABSTRACT

Infrared and visible image fusion (IVIF) aims to integrate complementary information from heterogeneous sensors into a unified representation, thereby enhancing scene perception and facilitating downstream vision tasks such as object detection and semantic segmentation. In recent years, deep learning has significantly advanced IVIF research, with architectures including convolutional neural networks (CNN), autoencoders (AE), generative adversarial networks (GAN), and Transformers driving rapid methodological evolution. In this study, we establish a unified and reproducible benchmarking framework for IVIF and provide a systematic evaluation of representative methods proposed between 2019 and 2025. Specifically, existing approaches are categorized into five architectural paradigms (CNN‐based, AE‐based, GAN‐based, Transformer‐based, and hybrid models), and their design principles and development trends are comprehensively analyzed. Furthermore, widely used datasets and evaluation metrics are standardized to ensure fair and consistent comparisons. Under the proposed evaluation protocol, 18 state‐of‐the‐art (SOTA) methods are assessed both qualitatively and quantitatively across multiple benchmarks. In addition to fusion quality, computational efficiency indicators, including model size, FLOPs, and inference speed, are incorporated to provide an in‐depth analysis of performance‐efficiency trade‐offs. Extensive experimental results reveal the strengths and limitations of different architectural paradigms and provide practical insights for real‐world deployment. Finally, we discuss current challenges in IVIF research and outline promising future directions.
