DOI: 10.3390/machines14070741 ISSN: 2075-1702

A Dual-Stream Deep Reinforcement Learning Framework for Hot Rolling Production Scheduling

Chi Wang, Wang Cao, Min Huang

Hot Rolling Production Scheduling (HRPS) is a crucial combinatorial optimization problem characterized by severe conflicts between rigid physical rolling rules and strict order due dates. Although real-time scheduling is essential for dynamic manufacturing, traditional meta-heuristics incur prohibitive computation times. Conversely, standard end-to-end Deep Reinforcement Learning (DRL) models offer rapid inference but typically struggle with spatio-temporal feature entanglement, training instability under extreme penalty landscapes, and poor zero-shot generalization across problem scales. To bridge these gaps, this paper proposes a novel framework named Dual-Stream Group-Optimize Policy Optimization with Multiple Optima (DSGO-POMO). The framework introduces three core innovations: (1) a Dual-Stream intervention network that explicitly decouples physical attributes from temporal slacks and then fuses the two streams; (2) a Group Relative Policy Optimization (GRPO) training mechanism that stabilizes policy updates; and (3) an Entropy-Aware and Dual-Annealed Differential Active Search (EA-DAS) strategy that adapts pre-trained weights to out-of-distribution problem scales. Extensive computational experiments validate the superiority of the proposed framework. On medium-scale instances (…
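GRPO and POMO, both named in the abstract, share one idea: several rollouts of the same instance are scored against each other instead of against a learned critic. The minimal sketch below shows only that group-relative advantage step and the resulting policy-gradient loss. It assumes each rollout yields a scalar reward (for example, the negative of a tardiness-plus-penalty cost). The function names, the reward definition, and the `eps` value are hypothetical. The PPO-style ratio clipping and KL terms that GRPO variants often add are omitted, and this is not the authors' implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against the other rollouts of its group.

    rewards: list of groups; each group holds scalar rewards from rollouts of
    the same instance (hypothetically, POMO-style rollouts that start from
    different first slabs). Returns advantages with the same nesting.
    """
    advantages = []
    for group in rewards:
        mu = mean(group)       # shared group baseline replaces a critic
        sigma = pstdev(group)  # group spread rescales the advantage
        advantages.append([(r - mu) / (sigma + eps) for r in group])
    return advantages


def policy_gradient_loss(advantages, log_probs):
    """REINFORCE-style loss: -mean(advantage * log-probability of the rollout).

    log_probs are plain floats here for illustration; in training they would
    be differentiable tensors and the advantages would be treated as constants.
    """
    terms = [
        a * lp
        for adv_group, lp_group in zip(advantages, log_probs)
        for a, lp in zip(adv_group, lp_group)
    ]
    return -sum(terms) / len(terms)
```

For a single instance with four rollouts, `group_relative_advantages([[-10.0, -12.0, -8.0, -10.0]])` gives roughly `[[0.0, -1.414, 1.414, 0.0]]`. Minimizing the loss therefore raises the probability of the lower-cost schedule and lowers that of the higher-cost one. Because each group is normalized separately, instances whose penalty magnitudes differ sharply still produce advantages on a comparable scale, which is one plausible reading of how a group-relative update helps stabilize training.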
