Agent environment interface
An agent is a software program that performs an action, $A_t$, at time $t$, to move from one state, $S_t$, to another state, $S_{t+1}$. Based on its actions, the agent receives a numerical reward, $R$, from the environment. Ultimately, RL is about finding the actions that maximize the total reward the agent accumulates:
![](https://epubservercos.yuewen.com/F4348E/19470379901496006/epubprivate/OEBPS/Images/d1491480-df3b-43f1-9be5-4e3aaccd0853.png?sign=1739279011-j2Up0UtVbmWP7XjUeDjwtW2S12tCcY2a-0-8439815ff286f20da4ef988b31d7bf1d)
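The interaction loop in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-dimensional corridor, the `GOAL` constant, the function names `env_step` and `run_episode`, and the reward values are all hypothetical and not taken from the book. The agent here picks actions at random; it does not learn anything yet.

```python
import random

GOAL = 3  # hypothetical goal state: the right end of a corridor with states 0..3


def env_step(state, action):
    """Environment side: given S_t and A_t, return (S_t+1, R)."""
    # Clamp the move so the agent cannot leave the corridor.
    next_state = max(0, min(GOAL, state + action))
    # Assumed reward scheme: 1.0 on reaching the goal, 0.0 otherwise.
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def run_episode(seed=0, max_steps=100):
    """Agent side: act at each time step and record what the environment returns."""
    rng = random.Random(seed)
    state = 0  # S_0
    trajectory = []
    for t in range(max_steps):
        action = rng.choice([-1, 1])                  # A_t (random policy)
        next_state, reward = env_step(state, action)  # environment responds
        trajectory.append((t, state, action, reward, next_state))
        state = next_state                            # S_t+1 becomes the new S_t
        if state == GOAL:
            break
    return trajectory


if __name__ == "__main__":
    for t, s, a, r, s_next in run_episode():
        print(f"t={t}: S_t={s}, A_t={a:+d}, R={r}, S_t+1={s_next}")
```

Each recorded tuple is one pass around the loop: the agent observes $S_t$, chooses $A_t$, and the environment answers with $R$ and $S_{t+1}$.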
Let us understand the concept of RL with a maze game:
![](https://epubservercos.yuewen.com/F4348E/19470379901496006/epubprivate/OEBPS/Images/70987800-38d3-43e3-9e61-5460aac2d036.png?sign=1739279011-k7Vc6FrQ5CgNDhJhwMJWdRl1r5EvH2vm-0-4a8418811f13572bc86e3a8fecfd74f9)
The objective of the maze game is to reach the destination without getting stuck on an obstacle. Here's how the pieces map onto RL terms:
- The agent is the one that travels through the maze; it is our software program/RL algorithm
- The environment is the maze
- The state is the agent's current position in the maze
- An agent performs an action by moving from one state to another
- An agent receives a positive reward when its action does not run into an obstacle, and a negative reward when its action gets it stuck on an obstacle, which keeps it from reaching the destination
- The goal is to clear the maze and reach the destination
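The mapping above can be made concrete with a small grid maze. The following is a minimal sketch, not the book's implementation: the 4x4 layout, the `MazeEnv` class, the `run_random_episode` helper, and the reward values (+1 for a free move, -1 for hitting an obstacle or the outer wall, +10 for reaching the destination) are all assumptions chosen for illustration. It also assumes that a blocked move leaves the agent where it was.

```python
import random

# Hypothetical layout: 'S' start, 'G' destination, '#' obstacle, '.' free cell
MAZE = [
    "S..#",
    ".#..",
    "...#",
    "#..G",
]
# Actions are moves expressed as (row delta, column delta).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class MazeEnv:
    """The environment: the maze itself."""

    def __init__(self, layout=MAZE):
        self.layout = layout
        self.n_rows, self.n_cols = len(layout), len(layout[0])
        self.start = self._find("S")
        self.state = self.start  # the state is the agent's (row, col) position

    def _find(self, ch):
        for r, row in enumerate(self.layout):
            if ch in row:
                return (r, row.index(ch))
        raise ValueError(f"{ch!r} not found in maze layout")

    def reset(self):
        self.state = self.start
        return self.state

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        dr, dc = ACTIONS[action]
        r, c = self.state[0] + dr, self.state[1] + dc
        inside = 0 <= r < self.n_rows and 0 <= c < self.n_cols
        if not inside or self.layout[r][c] == "#":
            return self.state, -1.0, False  # blocked: negative reward, no move
        self.state = (r, c)
        if self.layout[r][c] == "G":
            return self.state, 10.0, True   # destination reached
        return self.state, 1.0, False       # free move: positive reward


def run_random_episode(env, seed=0, max_steps=200):
    """The agent: here just a random policy that does not learn."""
    rng = random.Random(seed)
    state, total_reward, done = env.reset(), 0.0, False
    for _ in range(max_steps):
        action = rng.choice(list(ACTIONS))
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return state, total_reward, done


if __name__ == "__main__":
    print(run_random_episode(MazeEnv()))
```

An RL algorithm would replace the random choice in `run_random_episode` with a policy that improves from the rewards it observes. Note that rewarding every free move, as the description above does, can encourage wandering; practical reward designs often differ.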