Customer Baseline Credibility in Constrained Reinforcement Learning for Incentive-Based Demand Response
Jiyong Li, Kaiyue Wang

Incentive-based demand response is an important flexibility resource for power systems with high renewable energy penetration. However, practical incentive allocation depends not only on flexible capacity and user response uncertainty, but also on the credibility of the customer baseline load (CBL), which directly affects response measurement, verification, and incentive settlement. To address this issue, this paper proposes a constrained reinforcement learning method with customer baseline credibility (CBL-CRL) for dynamic resource allocation in incentive-based demand response. Based on user-side load measurements and demand response event records, the proposed framework evaluates user resources using flexible capacity, response reliability, response cost, and CBL credibility. The CBL credibility score reflects the measurement quality of the delivered response and is used as a pre-event allocation factor. Users are then grouped into different resource levels, and a group-level reinforcement learning agent dynamically determines incentive multipliers and response task allocation ratios. To improve feasibility, an action correction module revises raw policy outputs under budget, price, response capacity, and CBL risk constraints before implementation. Case studies are conducted using public industrial demand response measurements and open electricity-system time-series data. Under the normal scenario, the proposed CBL-CRL method reduces the normalized total operating cost to 0.897, the response tracking error to 0.108, and the CBL risk exposure to 0.087. Relative to the No-DR reference, CBL-CRL reduces the normalized total operating cost by 10.3 percent. Compared with MAPPO, the strongest learning-based baseline, CBL-CRL reduces the response tracking error by 10.7 percent and the CBL risk exposure by 40.8 percent, while maintaining the same renewable accommodation rate of 0.970. Compared with rule-based and learning-based baselines, CBL-CRL achieves a better balance among operational performance, incentive efficiency, action feasibility, and baseline-related settlement reliability. The results demonstrate that CBL credibility is useful not only for post-event settlement but also as an effective pre-event resource allocation factor for measurement-driven demand response programs.
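The action correction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `correct_action`, the order in which constraints are applied, the uniform rescaling rules, and all numerical values are assumptions chosen only to show how raw group-level outputs might be revised under price, capacity, CBL risk, and budget limits.

```python
import numpy as np

def correct_action(multipliers, ratios, capacity, cbl_risk,
                   base_price, demand, budget, price_cap, risk_cap):
    """Revise raw group-level actions so they satisfy four constraints.

    All names and correction rules here are hypothetical illustrations.
    Returns corrected incentive multipliers and per-group response tasks.
    """
    m = np.asarray(multipliers, dtype=float)
    r = np.asarray(ratios, dtype=float)
    capacity = np.asarray(capacity, dtype=float)
    cbl_risk = np.asarray(cbl_risk, dtype=float)

    # Price constraint: incentive price m * base_price must not exceed price_cap.
    m = np.clip(m, 0.0, price_cap / base_price)

    # Allocation ratios must be non-negative and sum to one.
    r = np.clip(r, 0.0, None)
    r = r / r.sum() if r.sum() > 0 else np.full_like(r, 1.0 / len(r))

    # Capacity constraint: a group's task cannot exceed its flexible capacity.
    task = np.minimum(r * demand, capacity)

    # CBL risk constraint (assumed form): risk-weighted task share <= risk_cap.
    exposure = (task * cbl_risk).sum() / demand
    if exposure > risk_cap:
        task = task * (risk_cap / exposure)

    # Budget constraint: scale multipliers down if total incentive cost is too high.
    cost = (m * base_price * task).sum()
    if cost > budget:
        m = m * (budget / cost)

    return m, task

# Hypothetical example with three resource groups.
capacity = np.array([40.0, 20.0, 50.0])
cbl_risk = np.array([0.05, 0.10, 0.30])
m, task = correct_action(
    multipliers=[1.5, 3.0, 0.5], ratios=[0.5, 0.3, 0.2],
    capacity=capacity, cbl_risk=cbl_risk,
    base_price=10.0, demand=100.0, budget=800.0,
    price_cap=20.0, risk_cap=0.08,
)
```

In this sketch the budget step only scales multipliers downward, so the price limit enforced earlier remains satisfied. A projection onto the feasible set that minimizes the distance to the raw action would be an alternative design.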