CS231n Study Notes: 14. Reinforcement Learning

1. What is Reinforcement Learning

Overview:

An example:

Another example:

2. Markov Decision Process

  • Mathematical formulation of the RL problem
  • Markov property: Current state completely characterises the state of the world

Process flow:
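The interaction between agent and environment can be sketched as a simple loop: the environment provides an initial state, the agent picks an action, and the environment samples a reward and the next state. The code below is only a minimal sketch under assumptions: the two-state MDP in `TRANSITIONS`, the helper names `env_step`, `run_episode` and `random_policy`, and the discount `gamma=0.9` are all invented for illustration and do not come from the lecture.

```python
import random

# Hypothetical two-state MDP, invented for illustration only.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def env_step(state, action, rng):
    """Environment samples (next_state, reward) given (state, action)."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def run_episode(policy, steps=10, gamma=0.9, seed=0):
    """Roll out `policy` for a fixed number of steps and accumulate the discounted return."""
    rng = random.Random(seed)
    state = "A"  # initial state s_0 (fixed here instead of sampled)
    discounted_return = 0.0
    trajectory = []
    for t in range(steps):
        action = policy(state, rng)                        # agent selects a_t
        next_state, reward = env_step(state, action, rng)  # env samples r_t and s_{t+1}
        trajectory.append((state, action, reward))
        discounted_return += (gamma ** t) * reward
        state = next_state                                 # agent observes s_{t+1}
    return trajectory, discounted_return


def random_policy(state, rng):
    """Pick uniformly among the actions available in `state`."""
    return rng.choice(sorted(TRANSITIONS[state]))


trajectory, ret = run_episode(random_policy)
```

Because the only dependence on history is through `state`, the sketch also illustrates the Markov property: `env_step` needs nothing but the current state and action.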

The optimal policy π*
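As a reminder of what "optimal" means here, the objective can be written out explicitly. This is the standard formulation (maximize the expected sum of discounted rewards), with γ the discount factor and the randomness coming from the initial state, the policy's actions, and the transitions:

```latex
\pi^* = \arg\max_{\pi} \; \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, \pi \right],
\qquad
s_0 \sim p(s_0),\quad a_t \sim \pi(\cdot \mid s_t),\quad s_{t+1} \sim p(\cdot \mid s_t, a_t)
```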

3. Q-learning

Definitions: Value function and Q-value function:
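Written out in the usual notation, the two definitions are as follows. The value function measures how good a state is under policy π, and the Q-value function measures how good a state-action pair is:

```latex
V^{\pi}(s) = \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, s_0 = s,\ \pi \right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, s_0 = s,\ a_0 = a,\ \pi \right]
```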

Bellman equation:
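For reference, the standard form of the equation is given here. Q* denotes the optimal Q-value function, i.e. the maximum expected cumulative reward achievable from a state-action pair, and s' is the next state sampled from the environment:

```latex
\begin{aligned}
Q^*(s, a) &= \max_{\pi} \mathbb{E}\left[ \sum_{t \ge 0} \gamma^t r_t \,\middle|\, s_0 = s,\ a_0 = a,\ \pi \right] \\
Q^*(s, a) &= \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right]
\end{aligned}
```

The second line says that if the optimal values at the next step are known, the best strategy is to take the action that maximizes the expected value of r + γ Q*(s', a').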

Optimal policy:

Solving for the optimal policy: Q-learning
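To make the update rule concrete, here is a minimal sketch of tabular Q-learning, which uses the Bellman equation as an iterative update. Note the substitution: the lecture approximates Q with a neural network, while this sketch uses a plain table so it stays short. The 5-state chain environment, the `step` function, and all hyperparameters are hypothetical choices for illustration.

```python
import random

N_STATES = 5      # hypothetical chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # exploit, breaking ties at random so early episodes still move around
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bellman target: r + gamma * max_a' Q(s', a'); no bootstrap at a terminal state
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

After training, `greedy` holds the greedy action for each non-terminal state. On this chain the optimal behaviour is to always move right.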

An example: Playing Atari Games

Q-network Architecture
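As far as I recall from the lecture slide (which follows the original DQN setup), the network takes an 84x84x4 stack of the last four grayscale frames, applies a convolution with 16 8x8 filters at stride 4, then 32 4x4 filters at stride 2, then a 256-unit fully connected layer, and finally outputs one Q-value per action. Treat these numbers as assumptions rather than a verified spec. The sketch below only checks the resulting feature-map sizes. `conv_output_size` is a hypothetical helper, and the 4 actions are an example value.

```python
def conv_output_size(input_size, kernel_size, stride):
    """Spatial size after a convolution with no padding."""
    return (input_size - kernel_size) // stride + 1


# Layer sizes as recalled from the lecture slide; treat them as assumptions.
size_after_conv1 = conv_output_size(84, kernel_size=8, stride=4)
size_after_conv2 = conv_output_size(size_after_conv1, kernel_size=4, stride=2)
flattened = size_after_conv2 * size_after_conv2 * 32  # input width of the FC layer

layer_shapes = [
    ("input", (84, 84, 4)),                                # last 4 frames, stacked
    ("conv1", (size_after_conv1, size_after_conv1, 16)),   # 16 filters, 8x8, stride 4
    ("conv2", (size_after_conv2, size_after_conv2, 32)),   # 32 filters, 4x4, stride 2
    ("fc1", (256,)),
    ("q_values", (4,)),  # one Q-value per action; 4 actions assumed here
]
```

The design point is that a single forward pass yields the Q-values of all actions for the current state, so choosing the greedy action does not require one pass per action.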

Training the Q-network: Experience Replay
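The idea of experience replay is to store transitions as the agent plays and to train on random minibatches drawn from that store, rather than on consecutive samples, which are highly correlated. A minimal sketch of such a buffer follows. The class name `ReplayMemory` and its methods are hypothetical, not taken from the lecture or from any particular library.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage with dummy integer "states": push 150 transitions into a buffer of 100.
memory = ReplayMemory(capacity=100, seed=0)
for t in range(150):
    memory.push(t, t % 4, 0.0, t + 1, False)
batch = memory.sample(32)
```

Each stored transition can contribute to many weight updates, which also makes training more data-efficient.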

Deep Q-Learning with Experience Replay

4. Policy Gradients

Intuition:
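The intuition is easiest to read off the REINFORCE gradient estimator itself, given here in its standard form. J(θ) is the expected reward of trajectories τ sampled under the policy π_θ, and r(τ) is the total reward of one sampled trajectory:

```latex
J(\theta) = \mathbb{E}_{\tau \sim p(\tau;\theta)}\left[ r(\tau) \right],
\qquad
\nabla_\theta J(\theta) \approx \sum_{t \ge 0} r(\tau)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)
```

If r(τ) is high, the probabilities of the actions seen along the trajectory are pushed up; if it is low, they are pushed down. Credit is assigned to all actions in the trajectory at once, so the estimate only averages out over many samples, which is why it has high variance.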

Variance reduction:
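One variance-reduction idea is to weight each action only by the reward that comes after it (the "reward-to-go"), optionally with a discount factor so that delayed effects count less. A minimal sketch follows. The function name `discounted_rewards_to_go` and the example rewards are hypothetical.

```python
def discounted_rewards_to_go(rewards, gamma):
    """For each step t, return sum over t' >= t of gamma**(t' - t) * rewards[t']."""
    result = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry reuses the sum already computed for the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        result[t] = running
    return result


returns = discounted_rewards_to_go([1.0, 1.0, 1.0], gamma=0.5)
```

In a policy-gradient update, `returns[t]` would replace the whole-trajectory reward as the weight on the log-probability gradient of the action at step t.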

Variance reduction: Baseline
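The raw return says little on its own: if every reward is positive, every action's probability gets pushed up. What matters is whether the return is better or worse than expected, which is what subtracting a state-dependent baseline b(s_t) expresses. In the usual notation:

```latex
\nabla_\theta J(\theta) \approx \sum_{t \ge 0}
\left( \sum_{t' \ge t} \gamma^{\,t' - t}\, r_{t'} - b(s_t) \right)
\nabla_\theta \log \pi_\theta(a_t \mid s_t)
```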

How to choose the baseline?
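As I recall the lecture, two choices are discussed. A simple baseline is a constant moving average of the rewards seen so far across trajectories. A better one pushes up the probability of an action only if it did better than what is expected from that state, which is the difference between the Q-value and the value function (the advantage):

```latex
\nabla_\theta J(\theta) \approx \sum_{t \ge 0}
\left( Q^{\pi_\theta}(s_t, a_t) - V^{\pi_\theta}(s_t) \right)
\nabla_\theta \log \pi_\theta(a_t \mid s_t)
```

Q and V are not known in advance, so in practice they have to be learned, for example with Q-learning; combining the two gives an actor-critic method.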