Motivation Redux
by JS
When pressed to explain my post on motivation the other day, in particular the idea of motivation as the gradient of the value function, I found my explanation, and indeed my own understanding, unsatisfactory. Since I like to aim for complete understanding, I thought I’d spend a little time on value function gradients and why they may serve as a technical definition of motivation.
To start, we should probably have a target notion of motivation in mind if we hope to explain it in technical terms. From the OED we have:
The (conscious or unconscious) stimulus for action towards a desired goal, esp. as resulting from psychological or social factors; the factors giving purpose or direction to human or animal behaviour. Now also more generally (as a count noun): the reason a person has for acting in a particular way, a motive.
WordNet has the following:
The psychological feature that arouses an organism to action toward a desired goal; the reason for the action; that which gives purpose and direction to behavior.
So what does “gradient of the value function” mean? Consider the following environment and value function. The red square is the goal, where an agent receives a reward of 1.0. The agent is currently at the blue square and is trying to reach the goal. I’ve labeled each state with its expected discounted reward under the optimal policy, which reaches the goal in the fewest steps. The discount factor is 0.95.
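Labels like these can be produced by value iteration, which repeatedly applies the Bellman optimality backup until the values stop changing. The sketch below is a minimal Python version under stated assumptions: the grid layout is hypothetical (it is not the environment in the figure), moves are deterministic, a move into a wall leaves the agent in place, and the goal is a terminal state valued at its reward of 1.0. Under those assumptions a state d steps from the goal ends up with value 0.95^d. The names `GRID`, `MOVES`, `step`, and `value_iteration` are illustrative, not from the post.

```python
from typing import Dict, List, Tuple

State = Tuple[int, int]

# Hypothetical layout, not the environment from the figure:
# '#' is a wall, '.' an open square, 'G' the goal (reward 1.0).
GRID = [
    "#########",
    "#.......#",
    "#.#####.#",
    "#...G...#",
    "#########",
]

# The four cardinal directions plus staying put.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]


def step(grid: List[str], s: State, move: Tuple[int, int]) -> State:
    """Deterministic transition; a move into a wall leaves the agent where it is."""
    r, c = s[0] + move[0], s[1] + move[1]
    return (r, c) if grid[r][c] != "#" else s


def value_iteration(grid: List[str], gamma: float, tol: float = 1e-12) -> Dict[State, float]:
    """Sweep the Bellman optimality backup V(s) = max_a gamma * V(s') until values settle."""
    states = [(r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch != "#"]
    goal = next(s for s in states if grid[s[0]][s[1]] == "G")
    values = {s: 0.0 for s in states}
    values[goal] = 1.0  # assumption: the goal is terminal and worth its reward
    while True:
        delta = 0.0
        for s in states:
            if s == goal:
                continue
            best = max(gamma * values[step(grid, s, m)] for m in MOVES)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


# Print the value of every open square; walls are shown as '#'.
V = value_iteration(GRID, gamma=0.95)
for r, row in enumerate(GRID):
    print(" ".join(f"{'#':>5}" if ch == "#" else f"{V[(r, c)]:5.2f}" for c, ch in enumerate(row)))
```

Because every value is a power of the discount factor, squares farther from the goal are always labeled lower, which is what gives the value function a slope for an agent to climb.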
A greedy policy, in which the agent selects the action that leads to the next state of highest value, would send the agent to the right (in this scenario the agent can move in any of the four cardinal directions unless obstructed, or stay put). Of its two available moves, left or right, moving right has the higher value, since it brings the agent closer to the goal and is therefore subject to less discounting. How would we describe the actions of this agent? We might say that the agent is moving up the gradient of the value function, since its path consists of actions that seek states of ever higher value.
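The greedy agent described here can be sketched as a loop that always moves to the reachable next state with the highest value. This is a minimal, self-contained Python sketch with hypothetical names (`discounted_values`, `greedy_path`) and a hypothetical corridor standing in for the figure. Rather than iterating the Bellman backup, it uses a shortcut that holds only under the assumptions of deterministic moves and a single terminal reward of 1.0: the value of a state d steps from the goal is gamma**d, with d found by breadth-first search.

```python
from collections import deque
from typing import Dict, List, Tuple

State = Tuple[int, int]

# Hypothetical corridor: walls above and below, so only left, right, or staying matter.
GRID = [
    "##########",
    "#......G.#",
    "##########",
]
START = (1, 2)  # hypothetical stand-in for the blue square
GAMMA = 0.95

# The four cardinal directions plus staying put.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]


def step(grid: List[str], s: State, move: Tuple[int, int]) -> State:
    """Deterministic transition; a move into a wall leaves the agent where it is."""
    r, c = s[0] + move[0], s[1] + move[1]
    return (r, c) if grid[r][c] != "#" else s


def discounted_values(grid: List[str], gamma: float) -> Dict[State, float]:
    """V(s) = gamma ** d(s), where d(s) is the shortest distance to the goal found by BFS."""
    goal = next((r, c) for r, row in enumerate(grid) for c, ch in enumerate(row) if ch == "G")
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for m in MOVES:
            # Moves are reversible here, so distance from the goal equals distance to it.
            n = step(grid, s, m)
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return {s: gamma ** d for s, d in dist.items()}


def greedy_path(grid: List[str], values: Dict[State, float], start: State,
                max_steps: int = 100) -> List[State]:
    """Repeatedly move to the reachable next state of highest value, i.e. climb the value function."""
    path = [start]
    s = start
    while grid[s[0]][s[1]] != "G" and len(path) <= max_steps:
        s = max((step(grid, s, m) for m in MOVES), key=lambda n: values[n])
        path.append(s)
    return path


V = discounted_values(GRID, GAMMA)
path = greedy_path(GRID, V, START)
print(path)  # [(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7)]
```

Every step along the printed path lands on a strictly higher value. When several moves tie, `max` keeps the first one in `MOVES` order, so the tie-breaking rule only matters once the values stop distinguishing between actions.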
Notice, however, that this characterization is tightly coupled with the way we’ve chosen to represent rewards and values. If we consider the undiscounted case, things look quite a bit different.
Now every state from which the goal can be reached is worth the same 1.0, so the agent’s choice of action hardly matters: it can always get to the reward state eventually, and how long it takes no longer affects the value. The point is not that one model of reward is more realistic than another; the point is simply that the model of reward matters. By extension, using the gradient of the value function as a meaningful proxy for motivation really just raises the question of how (and why) rewards are the way they are.
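The flattening can be made concrete by comparing the value of each available action from the same state with and without discounting. This is a minimal Python sketch on a hypothetical one-dimensional corridor, assuming that reaching the goal yields 1.0, nothing else is rewarded, and so the optimal value of a position x is gamma ** |x - goal|. The helpers `action_values` and `spread` are hypothetical; `spread`, the gap between the best and worst action, is only a crude stand-in for the size of the value gradient at a state.

```python
from typing import Dict


def action_values(position: int, goal: int, length: int, gamma: float) -> Dict[str, float]:
    """Q(s, a) = gamma * V(s') on a hypothetical corridor with V(x) = gamma ** |x - goal|."""

    def value(x: int) -> float:
        return gamma ** abs(x - goal)

    next_position = {
        "left": max(position - 1, 0),  # bumping into either end leaves the agent in place
        "right": min(position + 1, length - 1),
        "stay": position,
    }
    return {a: gamma * value(x) for a, x in next_position.items()}


def spread(q: Dict[str, float]) -> float:
    """Gap between the best and worst action: a crude measure of how much the choice matters."""
    return max(q.values()) - min(q.values())


# Compare the discounted and undiscounted cases from the same position.
for gamma in (0.95, 1.0):
    q = action_values(position=2, goal=7, length=10, gamma=gamma)
    print(gamma, {a: round(v, 3) for a, v in q.items()}, "spread:", round(spread(q), 3))
```

With gamma = 0.95, moving right (toward the goal) is strictly best and the spread is positive; with gamma = 1.0, every action is worth exactly 1.0 and the spread is 0.0, so there is no slope left for a greedy agent to follow.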
As stated, the dictionary definitions above seem to do just as well as “gradient of the value function,” but they also seem to fail in the same way: they describe motivation as directed toward a desired goal without saying why that goal is desired in the first place. The interesting question is not what motivation is, but why there are particular motivations.


