Modeling Fixed Memory Agents
by JS
[ For background on game theory I recommend starting here. ]
In two player iterated games, each player chooses an action each round. The reward each player receives is given by a payout matrix, and is based on the joint action of both players. In other words, if I choose action and my opponent chooses action
, then our joint reward is given by
where
is some payout matrix giving the reward for each possible joint action of me and my opponent.
If you are playing an agent with some fixed memory size and strategy, you may, as the opponent, want to reason about what the agent will play and, in particular, what you should play to maximize your own return every round. This is the topic of this paper.
While trying to understand the paper, and unsuccessfully trying to prove a conjecture in the paper concerning fixed strategy agents with non-trivial memory, I came across an interesting way to model the problem — de Bruijn graphs.
Since these are named graphs, you should assume that mathematicians know a great deal about them. They seem to come up in systems research as well, as the graphs have nice diameter properties that are beneficial in state based protocols and networks. The construction of a de Bruijn graph is simple. Each node is a length list of
symbols. There are
nodes in the graph (i.e. every possible list appears). If we view every node as a memory queue, then a directed edge occurs between two nodes if the destination node can be created from the origin node by putting a symbol at the front of the memory queue and “popping” off a symbol at the end of the memory queue. In other words, this is like remembering something recent and forgetting something old.
In the two player scenario above, our opponent is basing her actions on my previously recorded actions. The next action I take results in both some reward based on our joint actions, and an update to the opponent’s memory. In other words, we can model my actions in the game as a reward function over the de Bruijn graph that records both the state of the game in terms of the opponent’s memory, and my possible actions, which move from state to state (or node to node). The edges are weighted according to the payout matrix. This does not cause any problems since the opponent’s strategy, given the opponent’s current memory state, is entirely deterministic. As the opponent strategy changes, the edge weights change. As the opponent’s memory size changes, or the number of actions available to me changes, the underlying graph structure changes (de Bruijn graphs are parametrized by the number of symbols (e.g. my possible actions) and the memory size
.
We know that the diameter of unweighted de Bruijn graphs is since to get to any node requires
actions that have the effect of “overwriting memory” and thus terminating in the appropriate node. The conjecture in the paper mentioned above seems to be equivalent to the following question:
What is the length of the longest shortest path in a weighted de Bruijn graph?
This is a somewhat confusingly posed question. For instance, “length” refers to the number of hops in a path, but “shortest” means shortest in terms of the weight of the edges. For arbitrary weighting of edges, the diameter can probably be made arbitrarily large, but in the case where the edge weights are restricted to reflect a small number of rewards (relative to the size of the graph), and the rewards are somewhat evenly distributed across the edges, it seems like we can put some strong bounds on the length of shortest paths. If we have a mechanism for doing that, then we can reason about all sorts of fixed memory deterministic strategies for agents.
