billy ian in wonderland


The agent and environment interact at each of a sequence of discrete time steps, $t=0,1,2,3,\dots$.
At each time step $t$, the agent receives some representation of the environment’s state, $S_t \in \mathcal{S}$, where $\mathcal{S}$ is the set of possible states.
On that basis, the agent selects an action, $A_t \in \mathcal{A}(S_t)$, where $\mathcal{A}(S_t)$ is the set of actions available in state $S_t$.
One time step later, in part as a consequence of its action, the agent receives a numerical reward, $R_{t+1} \in \mathcal{R} \subset \mathbb{R}$, and finds itself in a new state, $S_{t+1}$.
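
The interaction described above can be sketched as a short loop. This is only an illustrative sketch, not a reference implementation: the `ChainEnv` environment (five states on a line, reward 1 on reaching the rightmost state), the `run_interaction` helper, and `random_policy` are all hypothetical names introduced for this example.

```python
import random
from typing import Callable, List, Tuple

# One recorded step: (S_t, A_t, R_{t+1}, S_{t+1})
Transition = Tuple[int, int, float, int]


class ChainEnv:
    """Hypothetical toy environment: states 0..n_states-1 arranged on a line."""

    def __init__(self, n_states: int = 5) -> None:
        self.n_states = n_states
        self.state = 0  # S_0, the initial state

    def actions(self, state: int) -> List[int]:
        """Return A(s), the actions available in state s."""
        moves = []
        if state > 0:
            moves.append(-1)  # step left
        if state < self.n_states - 1:
            moves.append(+1)  # step right
        return moves

    def step(self, action: int) -> Tuple[float, int]:
        """Apply A_t and return (R_{t+1}, S_{t+1})."""
        if action not in self.actions(self.state):
            raise ValueError(f"action {action} not available in state {self.state}")
        self.state += action
        # Assumed reward rule: 1.0 on arriving at the rightmost state, else 0.0.
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        return reward, self.state


def run_interaction(
    env: ChainEnv,
    policy: Callable[[int, List[int]], int],
    num_steps: int,
) -> List[Transition]:
    """Run the agent-environment loop for t = 0, 1, ..., num_steps - 1."""
    trajectory: List[Transition] = []
    state = env.state  # the agent observes S_t
    for _ in range(num_steps):
        action = policy(state, env.actions(state))  # choose A_t from A(S_t)
        reward, next_state = env.step(action)       # receive R_{t+1} and S_{t+1}
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


def random_policy(state: int, available: List[int]) -> int:
    """Pick uniformly among the actions available in the current state."""
    return random.choice(available)


if __name__ == "__main__":
    for s, a, r, s_next in run_interaction(ChainEnv(), random_policy, 10):
        print(f"S_t={s}  A_t={a:+d}  R_t+1={r}  S_t+1={s_next}")
```

Note that the policy only ever sees the current state and the actions available there, which mirrors the constraint $A_t \in \mathcal{A}(S_t)$, and that the reward is indexed one step later than the action that helped cause it.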
