A Reinforcement Learning Agent
by JS
I finally got around to putting up my code for a sarsa agent. A couple of things to note:
- The underlying function approximation is accomplished through a CMAC. Note that my CMAC implementation uses Python native hashing. In particular, if you know anything about CMAC, this is not memory bounded, as Python hashes are designed to deal with collisions.
- Trace updates are managed via a queue, which seems to result in more intuitive and efficient code than appears elsewhere. For example, the “meaning” of the lambda parameter is obvious if you consider its affect on the length of the trace queue. In fact, instead of specifying lambda, we could set a trace length directly.
- I’ve also included a mountain car implementation. This is taken directly from the book.
So what is a sarsa agent? Well, s.a.r.s.a. stands for “state”,”action”,”reward”,”state”,”action” and it summarizes the sequence of signals and actions that drive the agents learning process. So what is the agent learning? It’s learning how to act in response to the state it receives. So in the mountain car scenario, the state is the position and velocity of the car, and the action that the agent needs to decide every timestep is whether to push the car left or right.
Of course you could probably figure out the proper way to act if you think about the problem for a while. What reinforcement learning tries to do is learn the proper policy from experience alone. (Well, not quite alone — there’s a reward signal as well.)
Next on my list is an implementation of a LSTD agent (Least Squares Temporal Difference).