A large language model persona-based framework for optimizing text-to-image prompts in fashion design applications
Minsuk Kim, Seungju Lim
This study presents a systematic framework that uses Large Language Model (LLM) personas to optimize text-to-image generation prompts for fashion design applications. Generative models such as Stable Diffusion show significant creative potential, but prompt engineering remains challenging for domain experts who lack technical expertise. Our methodology has five stages: generating prompts with AI personas of varying expertise, generating images from those prompts, evaluating image quality, identifying weaknesses, and optimizing the prompts accordingly. The optimized prompts improved significantly on the persona-based prompts across multiple evaluation dimensions. The Multi-expert persona achieved the highest baseline performance (9.11 out of 11 points), which the optimization process raised to 10.05 points, a statistically significant 10.3% improvement (p<0.01). Optimized prompts significantly outperformed all persona approaches in requirement implementation and were preferred in human preference assessments. They reached maximum CLIP scores of 0.9043 and maximum ImageReward scores of 1.7452, demonstrating peak performance advantages across all metrics. In head-to-head comparisons, optimized prompts were ranked first in 50% of human preference evaluations, significantly exceeding the 20% expected by chance. The framework bridges the gap between language models and image generation systems, enabling fashion professionals to obtain consistent, high-quality AI-generated designs without prompt engineering expertise, thereby accelerating creative workflows and reducing design iteration time.
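The two headline figures reported above follow from simple arithmetic, which the minimal Python sketch below reproduces. The function names are illustrative and not part of the study. The count of five ranked candidates is an assumption inferred from the stated 20% chance level, not a figure given in the text.

```python
def relative_improvement(baseline: float, optimized: float) -> float:
    """Return the percentage gain of `optimized` over `baseline`."""
    return (optimized - baseline) / baseline * 100.0


def chance_first_place_rate(num_candidates: int) -> float:
    """Return the probability of ranking first if rankings were random."""
    return 1.0 / num_candidates


# Scores reported in the abstract (out of 11 points).
gain = relative_improvement(baseline=9.11, optimized=10.05)

# Assumption: five candidates per comparison, inferred from the 20% chance level.
chance = chance_first_place_rate(num_candidates=5)

print(f"Relative improvement: {gain:.1f}%")
print(f"Chance first-place rate: {chance:.0%}")
```

Under these assumptions the sketch yields a gain of about 10.3% and a chance level of 20%, matching the reported values. The observed 50% first-place rate is therefore 2.5 times the chance level.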