cs231n spring 2017 lecture14 Reinforcement Learning 听课笔记

时间：2017-12-10 19:18:49 阅读：302 评论：0 收藏：0 [点我收藏+]

（没太听明白，下次重新听）

1. 增强学习

　　有一个 Agent 和 Environment 交互。在 t 时刻，Agent 获知状态是 s_t，做出动作是 a_t；Environment 一方面给出 Reward 信号 r_t，另一方面改变状态至 s_t+1；Agent 获得 r_t和 s_t+1。目标是 Agent 学习 s_t到 a_t的某种映射 π* 最大化累积的 Reward，∑γ^tr_t，其中 γ^t是折现系数（discount factor）。

技术分享图片

　　用Markov Decision Process描述RL problem。马尔可夫过程是拥有马尔可夫性质的过程。马尔可夫性质：未来的状态仅依赖当前状态，或者说该过程没有记忆特质。

cs231n spring 2017 lecture14 Reinforcement Learning 听课笔记

原文：http://www.cnblogs.com/zonghaochen/p/8017725.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年09月23日 (328)
2021年09月24日 (313)
2021年09月17日 (191)
2021年09月15日 (369)
2021年09月16日 (411)
2021年09月13日 (439)
2021年09月11日 (398)
2021年09月12日 (393)
2021年09月10日 (160)
2021年09月08日 (222)