Markov decision process (MDP) – Reinforcement Learning decision model

* An MDP is a discrete-time stochastic control process for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker.

* At each discrete time step, the process is in some state s, and the decision maker may choose any action a that is available in state s.

* The process responds at the next time step by randomly moving into a new state s’ and giving the decision maker a corresponding reward R_a(s,s’).

* The probability that the process moves into its new state s’ is influenced by the chosen action. Specifically, it is given by the state transition function P_a(s,s’).
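
The interaction described in the bullets above can be sketched in a few lines of Python. This is only an illustrative toy: the two states (`s0`, `s1`), the two actions (`stay`, `move`), every probability and reward value, and the helper name `step` are assumptions made up for the example, not part of the definition. The table `P` plays the role of P_a(s,s’) and `R` plays the role of R_a(s,s’).

```python
import random

# Hypothetical toy MDP: all states, actions, and numbers are illustrative.
# P[a][s] maps each possible next state s' to the probability P_a(s, s').
P = {
    "stay": {"s0": {"s0": 0.9, "s1": 0.1}, "s1": {"s0": 0.2, "s1": 0.8}},
    "move": {"s0": {"s0": 0.3, "s1": 0.7}, "s1": {"s0": 0.6, "s1": 0.4}},
}

# R[a][(s, s')] is the reward R_a(s, s') received for that transition.
R = {
    "stay": {("s0", "s0"): 0.0, ("s0", "s1"): 1.0,
             ("s1", "s0"): -1.0, ("s1", "s1"): 0.5},
    "move": {("s0", "s0"): -0.5, ("s0", "s1"): 2.0,
             ("s1", "s0"): 0.0, ("s1", "s1"): -0.5},
}


def step(state, action, rng):
    """Sample the next state s' from P_a(s, .) and return (s', R_a(s, s'))."""
    dist = P[action][state]
    next_states = list(dist)
    weights = [dist[s] for s in next_states]
    next_state = rng.choices(next_states, weights=weights, k=1)[0]
    reward = R[action][(state, next_state)]
    return next_state, reward


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same trajectory
    state = "s0"
    for t in range(5):
        # Placeholder policy: the decision maker picks an action at random.
        action = rng.choice(list(P))
        next_state, reward = step(state, action, rng)
        print(t, state, action, next_state, reward)
        state = next_state
```

Each call to `step` is one discrete time step: the decision maker supplies an action, the next state is drawn at random with probabilities that depend on that action, and the matching reward is returned. A real policy would replace the random action choice.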