Optimization procedures for Markovian and semi-Markovian decision processes

Date

1994

Authors

Ren, Zhi-Zhong Oscar

Abstract

In this paper, we investigate both Markovian decision processes (MDP) and semi-Markovian decision processes (Semi-MDP), in either discrete or continuous time, and with or without discounting. Attention is focused primarily on the determination of optimal strategies for MDP or Semi-MDP with finite state and finite action spaces. The structures of system rewards, expressed as yields and bonuses associated with state occupancies, transitions among the states of the process, and the action taken in each state, are presented and incorporated into the appropriate optimization criteria and algorithms. The existence of optimal stationary strategies over an infinite horizon is noted and exploited in the algorithms for the different cases. The policy iteration methods, comprising the appropriate policy improvement algorithms (PIA) and value determination operations (VDO), are presented for obtaining the optimal stationary strategies and the total expected return values, or average gain per unit time, over infinite time horizons; moreover, the value iteration procedure (VIP) is presented for obtaining optimal time-dependent strategies and the corresponding optimized total expected return values for discrete-time MDP or Semi-MDP over finite horizons. All the algorithms are fully discussed and a number of examples are presented throughout the paper. In addition to discussing some of the underlying theory and properties of MDP and Semi-MDP, this paper also provides a set of programs, written and tested in Maple, implementing the various optimization algorithms.
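As a rough illustration of the value iteration procedure (VIP) mentioned in the abstract, the sketch below applies standard value iteration to a small discrete-time discounted MDP with finite state and action spaces. The transition probabilities `P` and rewards `R` are invented for illustration, not taken from the thesis, and the thesis's own programs are in Maple rather than Python.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Standard value iteration for a discounted discrete-time MDP.

    P[a, s, s'] -- probability of moving from state s to s' under action a
    R[a, s]     -- expected one-step reward for taking action a in state s
    Returns the optimal value vector and a greedy stationary policy.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)          # Bellman optimality update
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Illustrative 2-state, 2-action example (numbers are made up).
P = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.9, 0.1], [0.2, 0.8]]])
R = np.array([[5.0, -1.0],
              [10.0, 2.0]])
V, policy = value_iteration(P, R)
```

For an infinite horizon with discounting, the iteration converges to a stationary optimal policy, consistent with the existence result the abstract cites; over a finite horizon the same backup yields the time-dependent strategies the thesis obtains with VIP.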
