Markov decision process (MDP) – Reinforcement Learning decision model

On: July 6, 2019

In: AI, Deep Machine Learning, Reinforcement Learning

Tagged: action, control, corresponding, decision, discrete, probability, reward, state, stochastic, time

* An MDP is a discrete-time stochastic control process for decision making in situations where outcomes are partly random and partly under the control of a decision maker.

* At each discrete time step, the process is in some state s, and the decision maker may choose any action a that is available in state s.

* At the next time step, the process responds by moving randomly into a new state s' and giving the decision maker a corresponding reward R_a(s,s').

* The probability that the process moves into its new state s' depends on the chosen action. Specifically, it is given by the state transition function P_a(s,s') = Pr(s_{t+1} = s' | s_t = s, a_t = a).
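The Python sketch below makes the four points above concrete. It is illustrative only. The two states ("low", "high"), the two actions ("wait", "work"), every probability and reward, and the helper names `step` and `expected_reward` are hypothetical and were chosen just for this example. The transition function P_a(s,s') is stored as the nested dictionary `P[a][s][s']`, and the reward R_a(s,s') as `R[a][s][s']`.

```python
import random

# Hypothetical toy MDP: two states and two actions, chosen only for illustration.
STATES = ["low", "high"]
ACTIONS = {"low": ["wait", "work"], "high": ["wait", "work"]}  # actions available in each state

# P[a][s][s2] = P_a(s, s2): probability of moving from s to s2 when action a is taken.
P = {
    "wait": {"low": {"low": 0.9, "high": 0.1}, "high": {"low": 0.3, "high": 0.7}},
    "work": {"low": {"low": 0.4, "high": 0.6}, "high": {"low": 0.1, "high": 0.9}},
}

# R[a][s][s2] = R_a(s, s2): reward received after moving from s to s2 under action a.
R = {
    "wait": {"low": {"low": 0.0, "high": 1.0}, "high": {"low": 0.0, "high": 2.0}},
    "work": {"low": {"low": -1.0, "high": 1.0}, "high": {"low": -1.0, "high": 2.0}},
}


def step(s, a, rng=random):
    """Sample the next state s2 from P_a(s, .) and return (s2, R_a(s, s2))."""
    if a not in ACTIONS[s]:
        raise ValueError(f"action {a!r} is not available in state {s!r}")
    next_states = list(P[a][s])
    weights = [P[a][s][s2] for s2 in next_states]
    s2 = rng.choices(next_states, weights=weights, k=1)[0]  # random transition
    return s2, R[a][s][s2]


def expected_reward(s, a):
    """Expected immediate reward: sum over s2 of P_a(s, s2) * R_a(s, s2)."""
    return sum(p * R[a][s][s2] for s2, p in P[a][s].items())


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the run is reproducible
    s = "low"
    for t in range(5):
        a = rng.choice(ACTIONS[s])  # a random policy stands in for the decision maker
        s2, r = step(s, a, rng)
        print(f"t={t}: s={s}, a={a}, s'={s2}, reward={r}")
        s = s2
```

The nested dictionaries follow the subscripted notation P_a(s,s') and R_a(s,s') directly. For large state spaces, arrays indexed by (a, s, s') would usually be more practical.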
