Efficient Reinforcement Learning with Safety Mechanism for Industrial Process Control

刘光玉; 黄文俊; 张惠泽; 崔允端; 喻学锋

doi:10.12146/j.issn.2095-3135.20260518001


Abstract

Conventional reinforcement learning methods tend to trigger dangerous states through random exploration early in training, which limits their use in industrial process control, where strong coupling, strong nonlinearity, and strict safety constraints are the norm. This paper addresses the multivariable continuous control problem of an ethylene glycol distillation column and proposes a multi-agent continuous dynamic strategy planning method with a safety mechanism. The method models the reboiler steam flow rate, overhead reflux flow rate, bottom product draw flow rate, side draw flow rate, and post-cooling side draw temperature as the actions of multiple cooperative agents, and introduces a safety mechanism based on the column bottom level and the ethylene glycol purity. Both the original policy actions and the safety-corrected actions that are actually executed are stored in the experience replay buffer. Under a centralized-training, decentralized-execution framework, relative entropy regularization and the safety intervention information are used to constrain subsequent policy updates. Simulation experiments on the ethylene glycol distillation column show that, after 40 days of training, the proposed method increases product output by 5.72% and average net profit by 49.26% compared with manual operation, while product purity satisfies the quality constraint throughout. The proposed method raises economic returns in chemical process control while improving learning stability, and it keeps operation safe during random exploration in the early stage of training.
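The idea of storing both the policy action and the safety-corrected executed action can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `safety_filter`, `Transition`, and `ReplayBuffer`, the level and purity thresholds, and the use of a fixed fallback action as the correction rule are all assumptions made for illustration, since the abstract does not specify how an unsafe action is corrected.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple
    policy_action: tuple    # action proposed by the agents
    executed_action: tuple  # action actually applied after the safety check
    intervened: bool        # True if the safety mechanism overrode the policy
    reward: float
    next_state: tuple


def safety_filter(policy_action, level, purity, fallback_action,
                  level_range=(0.2, 0.8), min_purity=0.998):
    """Return (executed_action, intervened).

    Thresholds and the fallback rule are hypothetical placeholders; the
    source only states that the check uses bottom level and product purity.
    """
    level_ok = level_range[0] <= level <= level_range[1]
    purity_ok = purity >= min_purity
    if level_ok and purity_ok:
        return tuple(policy_action), False
    return tuple(fallback_action), True


class ReplayBuffer:
    """Fixed-capacity buffer that keeps both action variants per step."""

    def __init__(self, capacity=10000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        # Sample without replacement, capped at the current buffer size.
        return random.sample(list(self.data), min(batch_size, len(self.data)))
```

A learner could then use the `intervened` flag and the gap between `policy_action` and `executed_action` as an extra signal during policy updates, which is one possible reading of "safety intervention information" in the abstract.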
