Fast Fourier Asymmetric Context Aggregation Network: A Controllable Photo-Realistic Clothing Image Synthesis Method Using Asymmetric Context Aggregation Mechanism
Haopeng Lei, Ying Hu, Mingwen Wang, Meihai Ding, Zhen Li, Guoliang Luo

Clothing image synthesis has emerged as a crucial technology in the fashion domain, enabling designers to rapidly transform creative concepts into realistic visual representations. However, existing methods struggle to effectively integrate multiple sources of guiding information, such as sketches and texture patches, which limits their ability to precisely control the generated content. This often results in semantic inconsistencies and the loss of fine-grained texture details, significantly hindering the advancement of this technology. To address these issues, we propose the Fast Fourier Asymmetric Context Aggregation Network (FCAN), a novel image generation network designed to achieve controllable clothing image synthesis guided by design sketches and texture patches. In the FCAN, we introduce the Asymmetric Context Aggregation Mechanism (ACAM), which leverages multi-scale and multi-stage heterogeneous features to achieve efficient global visual context modeling, significantly enhancing the model's ability to integrate guiding information. Complementing this, the FCAN incorporates a Fast Fourier Channel Dual Residual Block (FF-CDRB), which exploits the frequency-domain properties of Fast Fourier Convolution to enhance fine-grained content inference while maintaining computational efficiency. We evaluate the FCAN on the newly constructed SKFashion dataset and on the publicly available VITON-HD and Fashion-Gen datasets. The experimental results demonstrate that the FCAN consistently generates high-quality clothing images aligned with the design intentions and outperforms the baseline methods across multiple performance metrics. Furthermore, the FCAN is more robust to varying texture conditions than existing methods, highlighting its adaptability to diverse real-world scenarios. These findings underscore the potential of the FCAN to advance this technology by enabling controllable, high-quality image generation.
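The abstract attributes the FF-CDRB's efficiency to the frequency-domain properties of Fast Fourier Convolution (FFC) but does not give the block's internals. The sketch below covers only the spectral unit of a standard FFC as commonly described (real 2D FFT, a 1x1 convolution over the stacked real and imaginary parts, ReLU, inverse FFT). It is not the FF-CDRB itself. The function name `spectral_transform`, the random untrained weights, and the omission of batch normalization are all assumptions made for illustration. The demo perturbs a single input pixel to show the key property: every output position depends on every input position, giving an image-wide receptive field at roughly O(HW log HW) cost.

```python
import numpy as np


def spectral_transform(x, weight):
    """Sketch of the spectral unit of a Fast Fourier Convolution (FFC).

    Hypothetical illustration, not the FCAN/FF-CDRB implementation.
    x: real feature map of shape (C, H, W).
    weight: (2C, 2C) matrix acting as a 1x1 convolution on the stacked
            real and imaginary parts of the spectrum (untrained here).
    Batch normalization from the usual FFC unit is omitted for brevity.
    """
    c, h, w = x.shape
    # Real 2D FFT over the spatial axes -> complex array (C, H, W//2 + 1).
    spec = np.fft.rfft2(x, axes=(-2, -1))
    # Stack real and imaginary parts along channels -> (2C, H, W//2 + 1).
    stacked = np.concatenate([spec.real, spec.imag], axis=0)
    # A 1x1 convolution in the frequency domain mixes channels per frequency.
    mixed = np.einsum("oc,chw->ohw", weight, stacked)
    mixed = np.maximum(mixed, 0.0)  # ReLU nonlinearity
    # Rebuild a complex spectrum and return to the spatial domain.
    spec_out = mixed[:c] + 1j * mixed[c:]
    return np.fft.irfft2(spec_out, s=(h, w), axes=(-2, -1))


# Usage: perturb one pixel and count how many output values change.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
weight = rng.standard_normal((8, 8))
y = spectral_transform(x, weight)

x_pert = x.copy()
x_pert[0, 0, 0] += 1e-2  # change a single spatial location in channel 0
y_pert = spectral_transform(x_pert, weight)
changed = np.count_nonzero(np.abs(y_pert - y) > 1e-12)
```

A local 3x3 convolution would change only a 3x3 neighbourhood of outputs after the perturbation. Here, a one-pixel change alters output values throughout the map. This global mixing is the frequency-domain property the abstract credits for recovering fine-grained content efficiently.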