Summary: | This thesis investigates modeling and optimization in storage management under
stochastic conditions using two different methodologies: Simulation
Optimization Techniques (SOT), which are usually categorized in the area of Reinforcement
Learning (RL), and Nonlinear Modeling Techniques (NMT).
For the first set of methods, simulation plays a fundamental role in evaluating the control
policy: learning techniques are used to deliver sub-optimal policies at the end of a
learning process. These iterative methods rely on agents interacting with the stochastic
environment by taking actions and observing the resulting states. To converge to
the steady-state condition, in which policies and value functions no longer change significantly
as learning continues, all or most of the important states must be visited
sufficiently often. This can be prohibitively time-consuming for large-scale problems.
To make these techniques more efficient in terms of both computation time and the robustness
of the resulting policies, the idea of Opposition-Based Learning (OBL, Type I and Type II) is
employed to modify and extend popular RL techniques, including Q-Learning, Q(λ), SARSA,
and SARSA(λ). Several new algorithms are developed using this idea. It is also shown
that function approximation techniques such as neural networks can contribute to the
learning process. State-of-the-art implementations usually consider the maximization
of the expected value of the accumulated reward. Extending these techniques to account for
risk and solving some well-known control problems are important contributions of this
thesis.
Furthermore, the new nonlinear modeling approach for reservoir management using indicator functions
and a randomized policy, introduced by Fletcher and Ponnambalam, is extended to
stochastic releases in multi-reservoir systems. In this extension, two different approaches
for defining the release policies are proposed. In addition, the main restriction of assuming
a normal distribution for the inflows is relaxed by using a beta-equivalent general
distribution. A five-reservoir case study from India is used to demonstrate the benefits
of these new developments. Using a warehouse management problem as an example,
the application of the proposed method to other storage management problems is outlined.
|