When using CMACs to solve reinforcement learning problems, the goal is usually to approximate the value function. In this case the updates are not as simple as in the supervised learning case: rewards are often sparse while the value function is defined over every state, so the updates have to solve the credit assignment problem, i.e. decide which earlier states deserve credit for a reward received later.
I added a special kind of CMAC to my GitHub repository that performs temporal difference updates, which makes learning value functions with delayed rewards easy. The method also employs eligibility traces, which have an elegant sparse implementation as a Python dictionary.
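To illustrate the idea, here is a minimal sketch of a TD(λ) update over CMAC-style tile indices, with the eligibility traces kept in a dictionary so only tiles that have actually been active carry a trace. This is not the code from the repository; the class name, parameters, and the assumption that states arrive as lists of active tile indices are all illustrative.

```python
from collections import defaultdict


class TDLambdaCMAC:
    """Sketch: a linear value function over active tiles, trained with TD(lambda).

    Assumes some external tiling/hashing step maps a state to a list of
    active tile indices (hypothetical; not shown here).
    """

    def __init__(self, alpha=0.1, gamma=0.99, lam=0.9):
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor
        self.lam = lam                     # trace decay rate
        self.weights = defaultdict(float)  # tile index -> weight
        self.traces = {}                   # tile index -> eligibility (sparse)

    def value(self, active_tiles):
        # V(s) is the sum of the weights of the tiles active in s.
        return sum(self.weights[t] for t in active_tiles)

    def update(self, active_tiles, reward, next_tiles, done=False):
        # TD error: delta = r + gamma * V(s') - V(s); V(s') is 0 at the end
        # of an episode.
        v_next = 0.0 if done else self.value(next_tiles)
        delta = reward + self.gamma * v_next - self.value(active_tiles)

        # Accumulating traces: bump the trace of each currently active tile.
        for t in active_tiles:
            self.traces[t] = self.traces.get(t, 0.0) + 1.0

        # Apply the TD update to every tile with a nonzero trace, then decay
        # the traces and drop tiny entries to keep the dictionary sparse.
        for t, e in list(self.traces.items()):
            self.weights[t] += self.alpha * delta * e
            e *= self.gamma * self.lam
            if e < 1e-8:
                del self.traces[t]
            else:
                self.traces[t] = e
```

The dictionary makes the trace decay loop touch only the handful of recently visited tiles instead of every weight, which is what makes the implementation both simple and fast for sparse CMAC features.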