MLLMto3D: An MCP-Driven Closed-Loop Framework for Architectural 3D Generation
Dong Yao, Bingcheng He, Xiaoxi Zhao

Multimodal large language models can read architectural images and design instructions, but they still struggle to turn architectural rules into editable, executable models in professional modeling environments. To address this limitation, this paper presents MLLMto3D, an MCP-driven closed-loop framework that connects multimodal reasoning with Rhino-based modeling, feedback, and revision. The framework consists of five phases: visual parsing, JSON-based intent serialization, code synthesis, MCP-driven Rhino execution and feedback, and verification with bounded repair. Its core mechanism is JSON-based intent serialization, which converts image-derived architectural information into machine-readable modeling parameters under a predefined JSON schema. The schema separates geometric and compositional constraints (height, bay rhythm, facade zones, and alignment rules) from design variables such as materials, openings, and ornament. Building on this mechanism, Skills modules externalize facade typology knowledge and safe Rhino scripting patterns, giving code synthesis callable professional constraints that reduce design-intent deviation and API hallucination. The framework is evaluated through an experimental design case study on a site in Shanghai's Hengfu Historic District, where new facades are generated with reference to a nearby heritage building. The results show that MLLMto3D can generate a parametrically adjustable Rhino model while preserving the main compositional constraints, thereby advancing AI-assisted architectural 3D generation toward a controllable, verifiable, and iterative modeling process.
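
To make the intent serialization and bounded repair phases more concrete, the following is a minimal Python sketch under stated assumptions, not the framework's actual implementation. The class names, field names, numeric values, and the functions `serialize`, `verify`, and `run_with_bounded_repair` are all hypothetical. The Rhino execution step is replaced by a stub so that the example runs without Rhino or an MCP server. It shows only the two ideas named in the abstract: constraints kept separate from design variables in a JSON-serializable structure, and a repair loop that stops after a fixed number of attempts.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Constraints:
    """Geometric and compositional constraints (hypothetical field names)."""
    height_m: float
    bay_count: int
    facade_zones: list
    alignment_rules: list


@dataclass
class Variables:
    """Free design variables (hypothetical field names)."""
    material: str
    opening_type: str
    ornament: str


@dataclass
class FacadeIntent:
    constraints: Constraints
    variables: Variables


def serialize(intent):
    """Serialize the intent to JSON, keeping constraints and variables apart."""
    return json.dumps(asdict(intent), indent=2, sort_keys=True)


def verify(intent, feedback):
    """Return the names of constraints that the measured feedback violates."""
    issues = []
    if abs(feedback["height_m"] - intent.constraints.height_m) > 0.01:
        issues.append("height_m")
    if feedback["bay_count"] != intent.constraints.bay_count:
        issues.append("bay_count")
    return issues


def run_with_bounded_repair(intent, script, execute, repair, max_repairs=3):
    """Execute, verify, and repair at most `max_repairs` times."""
    for attempt in range(max_repairs + 1):
        feedback = execute(script)
        issues = verify(intent, feedback)
        if not issues:
            return script, attempt  # attempt == number of repairs applied
        if attempt < max_repairs:
            script = repair(script, issues, intent)
    raise RuntimeError(f"unresolved after {max_repairs} repairs: {issues}")


# Illustrative values only; they are not taken from the case study.
intent = FacadeIntent(
    Constraints(12.0, 5, ["base", "body", "cornice"], ["align_cornice"]),
    Variables("brick", "arched", "minimal"),
)

# Stand-in for a synthesized script that got the bay count wrong.
draft = {"height_m": 12.0, "bay_count": 4}


def fake_execute(script):
    """Stub for the modeling environment: reports what the script built."""
    return dict(script)


def fake_repair(script, issues, intent):
    """Stub repair: reset each violated value from the constraints."""
    fixed = dict(script)
    for key in issues:
        fixed[key] = getattr(intent.constraints, key)
    return fixed


final, repairs = run_with_bounded_repair(intent, draft, fake_execute, fake_repair)
```

In this sketch the repair step may only restore values listed under the constraints, while the design variables remain free to change. That is one possible reading of why the schema separates the two groups. The cap on repair attempts keeps the loop from cycling indefinitely when a violation cannot be fixed.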