Dynamic Pipeline Scheduling for Hybrid Edge Collaboration in Multi‐Modal LLM Inference
He Li
ABSTRACT
Multi‐modal large language models (MLLMs) are revolutionising AI‐enabled technologies, enabling more intelligent and human‐like multimedia data analysis. Applying MLLMs has the potential to accelerate processes while reducing human resource costs. However, the substantial computational overhead of MLLMs poses challenges even at inference time. This paper introduces a paradigm that leverages high‐performance, low‐cost edge devices to enhance MLLM processing. The proposed approach employs in‐device and across‐device collaboration to improve overall system utilisation and reduce MLLM inference latency. Additionally, a multilevel pipeline scheduling method addresses bottlenecks in the edge system. Experimental results demonstrate that the proposed system significantly accelerates MLLM inference for multi‐modal data processing.