EfficientUICoder: A Bidirectional Token Compression Framework for Efficient MLLM-Based UI Code Generation

doi:10.1145/3808114

ISSN: 2994-970X

Jingyu Xiao, Zhongyi Zhang, Yuxuan Wan, Yintong Huo, Yang Liu, Michael R. Lyu

Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in UI2Code tasks (i.e., generating code from UI mockups), significantly enhancing website development efficiency. However, UI2Code tasks incur substantially higher computational overhead than traditional code generation tasks. This overhead is driven primarily by the large number of input image tokens required to represent complex visual designs and the large volume of output code tokens needed to describe complete webpage structures. In this paper, we conduct a comprehensive preliminary study of popular MLLMs on UI2Code tasks and identify significant redundancy in both image and code tokens. We observe that this redundancy not only increases computational cost but also hinders the model’s ability to focus on key UI elements, leading to excessively long and often invalid HTML files. To address these challenges, we propose EfficientUICoder, a bidirectional compression framework for efficient UI code generation that compresses both the input image tokens and the output code tokens. First, we introduce an Element and Layout-aware Token Compression method, which preserves essential UI element and layout information by detecting element regions and constructing a UI element tree for efficient representation. Second, we design a Region-aware Token Refinement strategy that uses attention scores to estimate the semantic importance of the selected tokens, discarding low-attention tokens from selected regions while adding high-attention tokens from unselected regions. Third, we develop an Adaptive Duplicate Token Suppression mechanism, which dynamically adjusts token probabilities during decoding by tracking the frequencies of HTML/CSS code structures and applying exponential penalties to minimize repetitive generation. Extensive experiments demonstrate that EfficientUICoder achieves a 55%-60% compression ratio without compromising the quality of the generated webpages, while effectively reducing output code redundancy.
In terms of efficiency, EfficientUICoder delivers substantial gains, reducing computational cost by up to 44.9%, generated tokens by up to 41.4%, prefill time by up to 46.6%, and inference time by up to 48.8% on 34B-level MLLMs. Code is available at https://github.com/WebPAI/EfficientUICoder.
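The abstract does not specify how detected element regions are mapped onto visual tokens. The following is a minimal sketch that assumes a ViT-style encoder, which splits the screenshot into a row-major grid of square patches. Under that assumption, it keeps only the patch tokens that overlap a detected element box. The `boxes_to_patch_indices` helper, the box format, and the 14-pixel patch size are hypothetical and do not come from the paper. The paper's UI element tree, which preserves layout, is omitted here.

```python
import math

def boxes_to_patch_indices(boxes, img_w, img_h, patch=14):
    """Return sorted row-major indices of patches overlapping any element box.

    boxes: iterable of (x0, y0, x1, y1) pixel boxes with exclusive x1/y1
           (hypothetical output format of a UI element detector).
    patch: side length of a square patch in pixels (assumed value).
    """
    cols = math.ceil(img_w / patch)
    rows = math.ceil(img_h / patch)
    keep = set()
    for x0, y0, x1, y1 in boxes:
        # Convert pixel edges to patch cells, clamping to the grid bounds.
        c0 = max(0, int(x0 // patch))
        r0 = max(0, int(y0 // patch))
        c1 = min(cols - 1, math.ceil(x1 / patch) - 1)
        r1 = min(rows - 1, math.ceil(y1 / patch) - 1)
        # Every patch touched by the box is kept (empty ranges for degenerate boxes).
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                keep.add(r * cols + c)
    return sorted(keep)
```

Consider a 56×28 screenshot with 14-pixel patches, which gives a 4×2 grid. With element boxes `(0, 0, 20, 10)` and `(42, 14, 56, 28)`, the function keeps patches `[0, 1, 7]`, so 5 of the 8 image tokens would be dropped before the language model sees them.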
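Region-aware Token Refinement, as summarized in the abstract, swaps tokens based on attention scores. The sketch below is a minimal reading of that idea under three assumptions:

- Each visual token already carries a scalar attention score.
- Fixed drop and add counts are used.
- The selected set comes from the region-based compression step.

The paper may derive the scores and counts differently. The names `refine_tokens`, `n_drop`, and `n_add` are hypothetical.

```python
def refine_tokens(attn, selected, n_drop, n_add):
    """Swap low-attention selected tokens for high-attention unselected ones.

    attn: per-visual-token attention scores (the score source is assumed).
    selected: token indices kept by the region-based compression step.
    n_drop: number of lowest-scoring selected tokens to discard (hypothetical).
    n_add: number of highest-scoring unselected tokens to add back (hypothetical).
    Returns the refined token indices in ascending (spatial) order.
    """
    chosen = set(selected)
    # Selected tokens ordered from least to most attended; drop the first n_drop.
    by_low = sorted(chosen, key=lambda i: attn[i])
    kept = set(by_low[n_drop:])
    # Unselected tokens ordered from most to least attended; add the first n_add.
    outside = sorted(
        (i for i in range(len(attn)) if i not in chosen),
        key=lambda i: attn[i],
        reverse=True,
    )
    kept.update(outside[:n_add])
    return sorted(kept)
```

Take scores `[0.9, 0.1, 0.5, 0.05, 0.8, 0.3]` with tokens `0` to `3` initially selected. Calling `refine_tokens(scores, [0, 1, 2, 3], 1, 1)` drops token `3` (score 0.05) and recovers token `4` (score 0.8) from an unselected region, returning `[0, 1, 2, 4]`.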
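The abstract describes the decoding-time penalty only at a high level. The sketch below is one possible reading of it, built on these assumptions:

- The tracked HTML/CSS "structures" are whole units, such as tags or CSS declarations, and each unit is a single candidate.
- A unit is penalized only after it has been emitted more than `tau` times within a recent window.
- Once over that threshold, the unit's unnormalized probability is divided by `gamma ** (count - tau)`.

The names `suppress_duplicates`, `gamma`, `tau`, and `window` are hypothetical and are not the paper's actual parameters.

```python
import math
from collections import Counter

def suppress_duplicates(logits, generated, gamma=1.5, tau=2, window=64):
    """Apply an exponential penalty to frequently repeated units.

    logits: dict mapping candidate unit -> raw next-step logit (assumed format).
    generated: list of units emitted so far.
    Returns a normalized probability dict over the candidates.
    """
    # Frequency of each unit in the most recent `window` emissions.
    counts = Counter(generated[-window:])
    # Subtracting (count - tau) * log(gamma) in logit space is equivalent to
    # dividing the unnormalized probability by gamma ** (count - tau).
    adjusted = {
        tok: z - max(0, counts[tok] - tau) * math.log(gamma)
        for tok, z in logits.items()
    }
    # Numerically stable softmax over the adjusted logits.
    m = max(adjusted.values())
    exps = {tok: math.exp(z - m) for tok, z in adjusted.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}
```

Suppose `"<div>"` and `"<p>"` have equal logits and `"<div>"` has already been emitted five times. With `gamma=1.5` and `tau=2`, `"<div>"` is down-weighted by a factor of 1.5³ = 3.375, so its probability falls from 0.5 to about 0.23. Because the penalty is exponential, it stays mild for occasional repeats and grows quickly for runaway repetition.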
