A Community Multi-Building Energy Management Method Based on Multi-Head Attention-Enhanced Multi-Agent Proximal Policy Optimization
Xiaoyuan Fu, Li Huang, Weiwei Du, Yuqi Jin

Community multi-building energy management is a key approach for reducing carbon emissions from the building sector and alleviating peak grid pressure. However, load coupling among buildings and coordinated energy-storage operation make control-policy design highly challenging. In the standard multi-agent proximal policy optimization (MAPPO) algorithm, the centralized critic simply concatenates building observations and therefore struggles to model inter-building interactions. To address this limitation, this paper proposes an improved MAPPO algorithm with a multi-head-attention-enhanced centralized critic, referred to as multi-head-attention MAPPO (MHA-MAPPO). Without changing the decentralized execution framework, the proposed method improves the critic network in three respects. First, a dual-branch gated embedding module adaptively fuses local building observations with global interaction information. Second, an interaction-attention path explicitly captures pairwise dependencies among buildings through multi-head attention. Third, a context-attention path extracts high-level, community-wide features by means of learnable query vectors. These improvements enable the critic to estimate the joint-state value more accurately and to provide more reliable advantage estimates for all agents. Experiments in the CityLearn environment show that, compared with the original MAPPO, MHA-MAPPO improves the mean evaluation reward by approximately 19.2%, reduces the reward standard deviation by an order of magnitude, and decreases peak net load and total net load by approximately 15.4% and 35.5%, respectively. The results verify the effectiveness of multi-head attention for coordinated multi-building scheduling, and the proposed method provides a useful reference for improving multi-agent reinforcement learning algorithms in community energy management.
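The three critic modifications described above can be illustrated with a minimal pure-Python sketch of a forward pass. It is not the authors' implementation. The scalar sigmoid gate, the community-mean "shared" branch, the linear value head, all dimensions, the random weights, and every function and parameter name (`critic_value`, `gated_fusion`, `w_local`, `queries`, and so on) are assumptions made for illustration only. The abstract does not specify these details, and a practical version would use a deep-learning framework with trained parameters.

```python
import math
import random


def linear(rows, weight):
    """Multiply each row vector by a weight matrix of shape (in_dim, out_dim)."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in rows]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head; returns outputs and weights."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        w = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        weights.append(w)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights


def multi_head_attention(queries, context, heads):
    """heads: list of per-head (Wq, Wk, Wv) projections; head outputs are concatenated."""
    head_out, head_w = [], []
    for wq, wk, wv in heads:
        out, w = attention_head(linear(queries, wq), linear(context, wk),
                                linear(context, wv))
        head_out.append(out)
        head_w.append(w)
    merged = [sum((h[i] for h in head_out), []) for i in range(len(queries))]
    return merged, head_w


def gated_fusion(local, shared, gate_w):
    """Assumed gate: one scalar g = sigmoid(gate_w . [local; shared]) per building."""
    fused = []
    for loc, sh in zip(local, shared):
        g = 1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(gate_w, loc + sh))))
        fused.append([g * a + (1.0 - g) * b for a, b in zip(loc, sh)])
    return fused


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_heads(rng, in_dim, head_dim, num_heads):
    return [tuple(random_matrix(rng, in_dim, head_dim) for _ in range(3))
            for _ in range(num_heads)]


def init_params(rng, obs_dim, embed_dim, num_heads, num_queries):
    head_dim = embed_dim // num_heads
    return {
        "w_local": random_matrix(rng, obs_dim, embed_dim),
        "w_shared": random_matrix(rng, obs_dim, embed_dim),
        "gate_w": random_matrix(rng, 1, 2 * embed_dim)[0],
        "interaction_heads": init_heads(rng, embed_dim, head_dim, num_heads),
        "context_heads": init_heads(rng, embed_dim, head_dim, num_heads),
        "queries": random_matrix(rng, num_queries, embed_dim),  # learnable in practice
        "w_out": random_matrix(rng, 1, num_queries * embed_dim)[0],
    }


def critic_value(observations, p):
    n = len(observations)
    # 1) Dual-branch gated embedding: per-building branch vs. community-mean branch.
    local = linear(observations, p["w_local"])
    mean_obs = [sum(col) / n for col in zip(*observations)]
    shared = linear([mean_obs] * n, p["w_shared"])
    emb = gated_fusion(local, shared, p["gate_w"])
    # 2) Interaction path: every building attends to every building.
    inter, inter_w = multi_head_attention(emb, emb, p["interaction_heads"])
    # 3) Context path: query vectors pool the buildings into community features.
    ctx, ctx_w = multi_head_attention(p["queries"], inter, p["context_heads"])
    flat = [x for row in ctx for x in row]
    value = sum(w * x for w, x in zip(p["w_out"], flat))  # assumed linear value head
    return value, inter_w, ctx_w


rng = random.Random(0)
params = init_params(rng, obs_dim=4, embed_dim=4, num_heads=2, num_queries=2)
# Three hypothetical buildings; the four features per building are placeholders.
observations = [[0.2, 0.5, 0.1, 0.9], [0.7, 0.3, 0.4, 0.6], [0.1, 0.8, 0.5, 0.2]]
value, interaction_weights, context_weights = critic_value(observations, params)
print(f"joint-state value estimate: {value:.4f}")
```

In this sketch, `interaction_weights` holds one building-by-building attention matrix per head, which is where pairwise dependencies would become visible after training. A critic that only concatenates observations has no comparable structure to inspect.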