DOI: 10.3390/jimaging12070279 ISSN: 2313-433X

Prompt-Guided Semantic Latent Direction Learning in Diffusion Models for Abstract Visual Concept Manipulation

Mahzaib Khalid, Fangli Ying, Al-Garadi Ahmed Mohammed Atef, Aniwat Phaphuangwittayakul, Riyad Dhuny

Diffusion-based generative models achieve high-fidelity image synthesis; however, controlling their internal representations for abstract visual concepts remains challenging because textual descriptions of such concepts are ambiguous. In this work, we propose a prompt-guided concept-vector learning framework for the controllable manipulation of such concepts without requiring external human-annotated image pairs, segmentation masks, identity labels, or manually annotated editing targets. The method introduces a learnable concept vector optimized in the bottleneck (mid-block) feature space of a pretrained Stable Diffusion U-Net, while keeping all pretrained model parameters frozen. A multi-prompt data generation strategy based on paired positive and neutral prompts provides weak semantic guidance for capturing the target concept direction and reduces dependence on a single prompt formulation. The learned vector is then applied in an image-to-image setting through controlled noise injection and concept-guided denoising, enabling the semantic modification of real images while preserving structural content. The concept strength is controlled by a scaling parameter γ, and the image-to-image noise strength is controlled by β, allowing for a practical balance between semantic modification and structural fidelity. Experiments are conducted on two main abstract concepts, perfect skin and peaceful lake, with additional qualitative analysis on subjective portrait-level concepts. Quantitative evaluation using SSIM, LPIPS, and CLIP similarity demonstrates that the proposed method improves semantic alignment while maintaining structural preservation compared with Stable Diffusion image-to-image baselines. A human preference study shows that concept-injected outputs are preferred in 76.0% of responses for perfect skin and 85.7% for peaceful lake. Ablation studies further demonstrate the controllability and robustness of the proposed framework. Overall, the method provides a simple and parameter-efficient approach for interpretable concept-level manipulation in diffusion models.
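
The roles of γ and β can be illustrated with a small toy sketch. The abstract does not give the exact formulas, so the code below rests on assumptions: the concept vector is added to the bottleneck features as h + γ·c, and β is mapped to a number of denoising steps in the way common image-to-image pipelines treat a strength parameter. The noising step uses the standard forward-diffusion formula. The function names (`inject_concept`, `steps_for_strength`, `add_noise`) are hypothetical and plain Python lists stand in for feature tensors.

```python
import math
import random


def inject_concept(h, concept, gamma):
    """Shift bottleneck features along the concept direction: h + gamma * c.

    Assumption: additive injection; the abstract only says gamma scales
    the concept strength.
    """
    return [hi + gamma * ci for hi, ci in zip(h, concept)]


def steps_for_strength(beta, num_steps):
    """Map noise strength beta in [0, 1] to how many denoising steps to run.

    Assumption: beta behaves like the strength parameter of typical
    image-to-image pipelines (larger beta means more noise and more steps).
    """
    return min(num_steps, max(0, int(round(beta * num_steps))))


def add_noise(x0, alpha_bar, rng):
    """Standard forward diffusion: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    keep = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [keep * x + noise * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    features = [1.0, 2.0]      # toy stand-in for mid-block features
    concept = [0.5, -1.0]      # toy stand-in for the learned concept vector
    print(inject_concept(features, concept, gamma=2.0))
    print(steps_for_strength(beta=0.6, num_steps=50))
    print(add_noise([0.2, -0.4], alpha_bar=0.5, rng=rng))
```

Under these assumptions, setting γ to 0 leaves the features unchanged, which corresponds to a plain image-to-image pass, while a smaller β keeps more of the input image's structure because less noise is injected before denoising.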
