Large Language Models for UML Class Diagram Modeling: A Preliminary Empirical Evaluation
Yong Cheng, You Huang, Shixin Yao, Yue Qian, Liang Zhang, Xueyin Fang, Tianjun Wu

UML class diagram modeling is a fundamental task in software engineering, yet the application of large language models (LLMs) to this domain remains underexplored. Existing studies predominantly focus on single closed-source models with simple prompting strategies and lack systematic comparisons across model types, prompt engineering techniques, and iterative refinement approaches. In this paper, we construct a difficulty-stratified dataset of 30 UML class diagram exercises and propose an automated weighted evaluation metric over generated PlantUML code; existing LLM-driven UML modeling research has rarely constructed or systematically applied either. We present a preliminary empirical evaluation comparing open-source and closed-source LLMs across multiple scales and types, diverse prompting strategies, and varying levels of requirement complexity. Going beyond the single-round static paradigm of prior work, we further introduce and evaluate iterative prompting schemes that improve model outputs over successive rounds through structured feedback. Our findings reveal that the effect of chain-of-thought prompting on output quality varies across models, that relationship modeling is the persistent bottleneck as complexity increases, and that attribute extraction remains a largely unsolved challenge for all tested LLMs. Further, automated feedback-driven iterative refinement yields uneven improvements: it brings notable performance gains for reasoning-oriented thinking models but only marginal gains for high-performing general chat models. These results provide actionable guidance for practitioners and researchers applying LLMs to UML modeling tasks.
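To make the idea of an automated weighted metric over PlantUML code concrete, the following is a minimal sketch and not the metric used in this paper. It assumes that a diagram is scored by extracting classes, attributes, and relationships, computing an F1 score per element type against a reference solution, and combining these with fixed weights. The weight values, the regular expressions, and the names `WEIGHTS`, `parse_plantuml`, `f1`, and `weighted_score` are all hypothetical and chosen only for illustration; the parser handles only a small subset of PlantUML syntax (for example, it ignores quoted multiplicities).

```python
import re

# Hypothetical weights for illustration only; the actual weighting is not
# specified in this section.
WEIGHTS = {"classes": 0.4, "attributes": 0.3, "relationships": 0.3}

# Simplified patterns covering a small subset of PlantUML class diagram syntax.
CLASS_RE = re.compile(r"^(?:abstract\s+)?(?:class|interface|enum)\s+(\w+)")
ATTR_RE = re.compile(r"^[+\-#~]?\s*(\w+)\s*:\s*\w+")
REL_RE = re.compile(r"^(\w+)\s+(\S*(?:--|\.\.)\S*)\s+(\w+)")


def parse_plantuml(text):
    """Extract sets of classes, (class, attribute) pairs, and relationships."""
    elements = {"classes": set(), "attributes": set(), "relationships": set()}
    current = None  # name of the class whose body is being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        m = CLASS_RE.match(line)
        if m:
            name = m.group(1).lower()
            elements["classes"].add(name)
            current = name if line.endswith("{") else None
            continue
        if line == "}":
            current = None
            continue
        if current is not None:
            a = ATTR_RE.match(line)
            if a and "(" not in line:  # skip method declarations
                elements["attributes"].add((current, a.group(1).lower()))
            continue
        r = REL_RE.match(line)
        if r:
            elements["relationships"].add(
                (r.group(1).lower(), r.group(2), r.group(3).lower())
            )
    return elements


def f1(pred, gold):
    """F1 score between two sets; two empty sets count as a perfect match."""
    if not pred and not gold:
        return 1.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def weighted_score(generated, reference):
    """Weighted sum of per-element-type F1 scores, in the range [0, 1]."""
    gen, ref = parse_plantuml(generated), parse_plantuml(reference)
    return sum(w * f1(gen[k], ref[k]) for k, w in WEIGHTS.items())
```

Under these assumptions, a generated diagram identical to the reference scores 1.0, while one that omits an attribute or reverses an association loses credit only in the corresponding element type, which is one way to expose per-component weaknesses such as relationship modeling or attribute extraction.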