![Python Reinforcement Learning](https://wfqqreader-1252317822.image.myqcloud.com/cover/708/36698708/b_36698708.jpg)
Markov Decision Process
A Markov Decision Process (MDP) is an extension of the Markov chain. It provides a mathematical framework for modeling decision-making situations. Almost all reinforcement learning problems can be modeled as an MDP.
An MDP is represented by five important elements:
- A set of states ($S$) the agent can actually be in.
- A set of actions ($A$) that can be performed by an agent, for moving from one state to another.
- A transition probability ($P_{ss'}^{a}$), which is the probability of moving from one state $s$ to another state $s'$ by performing some action $a$.
- A reward probability ($R_{ss'}^{a}$), which is the probability of a reward acquired by the agent for moving from one state $s$ to another state $s'$ by performing some action $a$.
- A discount factor ($\gamma$), which controls the importance of immediate and future rewards. We will discuss this in detail in the upcoming sections.
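
To make the five elements concrete, here is a minimal Python sketch of an MDP as a plain data structure. It is an illustration under stated assumptions, not the book's implementation: the `MDP` class, the `step` and `discounted_return` helpers, and the two-state `toy` example (states `cool`/`hot`, actions `slow`/`fast`) are all hypothetical names and numbers chosen for this example. For simplicity, the sketch also assumes each (state, action, next state) triple maps to a single fixed reward rather than a distribution over rewards.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    states: tuple        # S: states the agent can be in
    actions: tuple       # A: actions the agent can perform
    transitions: dict    # P: (s, a) -> {s_next: probability}
    rewards: dict        # R: (s, a, s_next) -> reward (assumed deterministic here)
    gamma: float         # discount factor

    def step(self, state, action, rng):
        """Sample a next state from P, then look up the reward in R."""
        dist = self.transitions[(state, action)]
        next_states = list(dist)
        weights = [dist[s] for s in next_states]
        next_state = rng.choices(next_states, weights=weights, k=1)[0]
        reward = self.rewards.get((state, action, next_state), 0.0)
        return next_state, reward


def discounted_return(rewards, gamma):
    """Sum rewards r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one trajectory."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical two-state example; all numbers are made up for illustration.
toy = MDP(
    states=("cool", "hot"),
    actions=("slow", "fast"),
    transitions={
        ("cool", "slow"): {"cool": 1.0},
        ("cool", "fast"): {"cool": 0.5, "hot": 0.5},
        ("hot", "slow"): {"cool": 0.5, "hot": 0.5},
        ("hot", "fast"): {"hot": 1.0},
    },
    rewards={
        ("cool", "slow", "cool"): 1.0,
        ("cool", "fast", "cool"): 2.0,
        ("cool", "fast", "hot"): 2.0,
        ("hot", "slow", "cool"): 1.0,
        ("hot", "slow", "hot"): 1.0,
        ("hot", "fast", "hot"): -10.0,
    },
    gamma=0.9,
)

if __name__ == "__main__":
    rng = random.Random(0)
    state, collected = "cool", []
    for _ in range(5):
        action = rng.choice(toy.actions)  # random policy, just to generate a trajectory
        state, reward = toy.step(state, action, rng)
        collected.append(reward)
    print(collected, discounted_return(collected, toy.gamma))
```

With `gamma` close to 0, `discounted_return` is dominated by the first reward; with `gamma` close to 1, later rewards count almost as much as immediate ones, which is the trade-off the discount factor controls.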