Abstract:
Conventional reinforcement learning methods explore randomly in the early stage of training and are therefore prone to driving the process into dangerous states, which limits their application to industrial process control characterized by strong coupling, strong nonlinearity, and strict safety constraints. This paper proposes a multi-agent continuous dynamic strategy planning method with a safety mechanism for the multivariable continuous control of an ethylene glycol distillation column. The reboiler steam flow rate, overhead reflux flow rate, bottom product draw flow rate, side draw flow rate, and post-cooling side draw temperature are modeled as the actions of multiple cooperative agents. A safety mechanism based on the column bottom level and the ethylene glycol purity corrects unsafe actions before they are executed, and both the original policy actions and the safety-corrected executed actions are stored in the replay buffer. Under the centralized training with decentralized execution (CTDE) framework, relative entropy regularization and the safety intervention information are used to constrain subsequent policy updates. Simulation results on the ethylene glycol distillation column show that, after 40 days of training, the proposed method increases product output by 5.72% and average net profit by 49.26% compared with human operation while keeping product purity within the quality constraint. The proposed method thus improves economic benefit and learning stability in chemical process control while ensuring operational safety during early-stage random exploration.
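As a rough illustration of the data flow summarized above, the following minimal sketch shows a rule-based safety layer that overrides a proposed action and a replay buffer that records both the original and the executed action together with an intervention flag. It is not the paper's implementation: the action names, the normalization of actions to [0, 1], the level and purity limits, and the override rules are all hypothetical and chosen only for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical limits for illustration only; they are not taken from the paper.
LEVEL_MIN, LEVEL_MAX = 0.2, 0.8  # column bottom level, fraction of full range
PURITY_MIN = 0.998               # ethylene glycol purity, mass fraction


@dataclass
class Transition:
    obs: tuple
    raw_action: dict       # joint action proposed by the agents' policies
    executed_action: dict  # joint action after safety correction
    intervened: bool       # True if the safety layer changed the action
    reward: float
    next_obs: tuple


def safety_correct(raw_action, level, purity):
    """Return (executed_action, intervened) using assumed override rules."""
    action = dict(raw_action)  # copy so the original policy action is preserved
    if level < LEVEL_MIN:
        # Level too low: cap the bottom product draw (assumed rule).
        action["bottom_draw"] = min(action["bottom_draw"], 0.1)
    elif level > LEVEL_MAX:
        # Level too high: force a larger bottom product draw (assumed rule).
        action["bottom_draw"] = max(action["bottom_draw"], 0.9)
    if purity < PURITY_MIN:
        # Purity too low: raise reflux and cap the side draw (assumed rule).
        action["reflux"] = max(action["reflux"], 0.8)
        action["side_draw"] = min(action["side_draw"], 0.2)
    return action, action != raw_action


class ReplayBuffer:
    """Fixed-size buffer that keeps both raw and executed actions."""

    def __init__(self, capacity=10000):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.storage), min(batch_size, len(self.storage)))

    def __len__(self):
        return len(self.storage)
```

In a training loop built on this sketch, each step would call `safety_correct` on the joint action, apply the returned action to the simulator, and store a `Transition`. The learner could then use the `intervened` flag and the gap between `raw_action` and `executed_action` when constraining policy updates.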