Coordinated Optimization of Generation and Compensation to Enhance Short-Term Voltage Security of Power Systems Using Accelerated Multi-Objective Reinforcement Learning

Bibliographic Details
Main Authors: Zhuoming Deng, Zhilin Lu, Zhifei Guo, Wenfeng Yao, Wenmeng Zhao, Baorong Zhou, Chao Hong
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/9000885/
Description
Summary: High proportions of asynchronous motors on the demand side place heavy pressure on the short-term voltage security of receiving-end power systems. To enhance short-term voltage security, this paper coordinates the optimal outputs of generation and compensation in a multi-objective dynamic optimization model. With equipment dynamics, network load flows, lower and upper limits, and security constraints considered, the model simultaneously minimizes two objectives: the cost of the control decisions and the voltage deviation. The Radau collocation method is employed to handle the dynamics by transforming all differential-algebraic equations into algebraic ones. Most importantly, Pareto solutions are obtained through an accelerated multi-objective reinforcement learning (AMORL) method by filtering out dominated solutions. The entire feasible region is partitioned into small independent regions to narrow the search scope for Pareto solutions. In addition, the AMORL method redefines the state functions and introduces novel state sensitivities, which accelerate the switch from learning to application once the agent has accumulated sufficient knowledge. Furthermore, the Pareto solutions are diversified by introducing potential solutions. Lastly, a fuzzy decision-making methodology selects the tradeoff solution. Case studies on a practical 748-node power grid validate the acceleration and efficiency of the AMORL method. The AMORL method is superior overall to the conventional reinforcement learning (RL) method, yielding better non-dominated objective values, much shorter CPU time, and better convergence to the accurate values. Moreover, compared with three other state-of-the-art RL methods, the AMORL method takes almost the same CPU time of several seconds but is slightly superior in terms of optimal objective values. Additionally, the values calculated by the AMORL method fit the accurate values best at each iteration, indicating good convergence.
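Two of the post-processing steps named in the summary, filtering dominated solutions to form a Pareto set and applying fuzzy decision-making to pick a single tradeoff, follow standard multi-objective patterns. The sketch below is not taken from the paper; it is a minimal illustration assuming linear fuzzy membership functions and hypothetical (control cost, voltage deviation) objective pairs, both to be minimized.

```python
import numpy as np

def filter_non_dominated(objectives):
    """Keep only non-dominated points from a set of (cost, deviation)
    objective vectors; both objectives are assumed to be minimized."""
    objectives = np.asarray(objectives, dtype=float)
    keep = []
    for i, candidate in enumerate(objectives):
        dominated = False
        for j, other in enumerate(objectives):
            if j == i:
                continue
            # 'other' dominates 'candidate' if it is no worse in every
            # objective and strictly better in at least one.
            if np.all(other <= candidate) and np.any(other < candidate):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return objectives[keep], keep

def fuzzy_tradeoff(pareto_objectives):
    """Pick one tradeoff solution from a Pareto front using linear fuzzy
    membership functions (a common normalized-satisfaction scheme)."""
    front = np.asarray(pareto_objectives, dtype=float)
    f_min = front.min(axis=0)
    f_max = front.max(axis=0)
    span = np.where(f_max > f_min, f_max - f_min, 1.0)  # avoid divide-by-zero
    # Membership is 1 at the best (smallest) value and 0 at the worst.
    membership = (f_max - front) / span
    # Normalized satisfaction per solution; the largest one is the tradeoff.
    score = membership.sum(axis=1) / membership.sum()
    return int(np.argmax(score))

if __name__ == "__main__":
    # Hypothetical (control cost, voltage deviation) pairs for candidate schedules.
    candidates = [(12.0, 0.08), (9.5, 0.12), (15.0, 0.05), (11.0, 0.09), (14.0, 0.11)]
    front, idx = filter_non_dominated(candidates)
    best = fuzzy_tradeoff(front)
    print("Pareto front:", front.tolist())
    print("Tradeoff solution (index within front):", best)
```

In this toy data, the point (14.0, 0.11) is dominated and dropped, and the fuzzy scoring selects the solution that balances cost and deviation rather than the extreme of either objective; the paper's actual decision rule may differ in its choice of membership functions.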
ISSN:2169-3536