DOI: 10.3390/electronics15132800 ISSN: 2079-9292

Adaptive Content and Style Fusion for Text-to-Image Generations

Yi-Fang Lee, Chun-Chieh Lee, Chi-Hung Chuang, Chih-Lung Lin, Kuo-Chin Fan

Text-to-image generation aims to produce images that match the semantic content of a text prompt. In style transfer tasks, the model must additionally integrate a reference style while preserving the prompt's semantics. Balancing semantic consistency against style fidelity remains challenging: existing methods commonly rely on fixed feature weights and lack adaptive control, which often leads to style over-injection and content distortion. To address these issues, we propose a framework that performs dynamic regulation at both the feature level and the temporal level. At the feature level, we introduce an Entropy-Aware Adaptive Fusion (EAAF) module. It incorporates a bidirectional distribution transformation mechanism to enhance the statistical correlation between content and style features, and it uses information entropy as a dynamic control signal to adaptively adjust the strength of style injection, thereby balancing semantic consistency and style fidelity. At the temporal level, we design a Progressive Feature Reweighting (PFR) strategy, which applies stage-wise weighting to content and style features at different diffusion steps to improve structural stability and color consistency. The framework is modular and can be integrated into existing diffusion-based style transfer models without additional fine-tuning or retraining. Experimental results demonstrate that applying our approach to current state-of-the-art models, such as StyleStudio and CSGO, significantly enhances their performance, particularly in maintaining strong prompt alignment while achieving high-fidelity style transfer.
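
To make the two levels of regulation concrete, the following is a minimal NumPy sketch and not the authors' implementation. It illustrates only the general ideas named in the abstract: an entropy-driven gate on style injection and a stage-wise weighting over diffusion steps. The function names, the gate direction (higher style-feature entropy gives weaker injection), the three-stage schedule, and all numeric weights are assumptions made for illustration; the bidirectional distribution transformation is not modeled here.

```python
import numpy as np


def normalized_entropy(features):
    """Shannon entropy of softmax(features) per token, scaled to [0, 1]."""
    z = features - features.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    h = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return h / np.log(features.shape[-1])


def entropy_gated_fusion(content, style, max_strength=1.0):
    """Blend style into content with a per-token, entropy-driven gate.

    Assumption (hypothetical rule): tokens whose style features have high
    entropy receive weaker style injection.
    Both inputs have shape (tokens, channels).
    """
    gate = max_strength * (1.0 - normalized_entropy(style))
    gate = gate[..., None]  # broadcast the per-token gate over channels
    return (1.0 - gate) * content + gate * style


def stage_weights(step, total_steps,
                  schedule=((0.3, 1.0, 0.2), (0.7, 1.0, 0.6), (1.0, 0.8, 1.0))):
    """Return (content_weight, style_weight) for a given diffusion step.

    Each schedule entry is (upper bound on progress, content weight,
    style weight). The values are illustrative placeholders only.
    """
    progress = (step + 1) / total_steps
    for upper, w_content, w_style in schedule:
        if progress <= upper:
            return w_content, w_style
    return schedule[-1][1], schedule[-1][2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.normal(size=(4, 8))
    style = rng.normal(size=(4, 8))
    total_steps = 10
    for step in range(total_steps):
        w_c, w_s = stage_weights(step, total_steps)
        fused = entropy_gated_fusion(w_c * content, w_s * style)
    print(fused.shape)
```

In this sketch the gate is recomputed from the features at every step, so the injection strength varies per token rather than being a fixed global weight, while the schedule changes the relative scale of content and style features as denoising progresses.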
