Psych/Cogst 4650/6650, Spring 2012: Reinforcement Learning: Computational and Brain Aspects

I. Introduction: the idea of reinforcement learning (RL) in machine learning and neuroscience

Jan 25:

Feb 1:

Readings for Jan 25 and Feb 1, introductory material (required):
  1. Woergoetter, F., and B. Porr (2008). Reinforcement learning. Scholarpedia, 3(3):1448.
  2. Parent, A. and L.-N. Hazrati (1995). Functional anatomy of the basal ganglia. I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Reviews 20:91-127.
  3. Wolpert, D. M., J. Diedrichsen, J. R. Flanagan (2011). Principles of sensorimotor learning. Nature Reviews Neuroscience 12:739-752.
  4. Atallah, H. E., Frank, M. J., & O'Reilly, R. C. (2004). Hippocampus, cortex, and basal ganglia: Insights from computational models of complementary learning systems. Neurobiology of Learning and Memory, 82(3), 253-267.
Other non-required readings:
  1. Chater, N. (2009). Rational and mechanistic perspectives on reinforcement learning. Cognition 113:350-364.
  2. Sutton, R. S. and A. G. Barto (1998). Reinforcement Learning: An Introduction. MIT Press (book).
Background reading for basic brain structure review:

II. Brain Mechanisms and Models of Reinforcement Learning

Feb 8:

Basic basal ganglia circuitry and models

  1. Aldridge, J. W., & Berridge, K. C. (1998). Coding of serial order by neostriatal neurons: A "natural action" approach to movement sequence. Journal of Neuroscience, 18(7), 2777-2787. Also, Aldridge and Berridge review.
  2. Frank, M. J. (2011). Computational models of motivated action selection in corticostriatal circuits. Current Opinion in Neurobiology, 21(3), 381-386.
  3. Graybiel, A. M. (2008). Habits, rituals, and the evaluative brain. Annual Review of Neuroscience, 31(1), 359-387.
  4. Redgrave, P., Vautrelle, N., & Reynolds, J. N. J. (2011). Functional properties of the basal ganglia's re-entrant loop architecture: selection and reinforcement. Neuroscience, 198, 138-151.
Additional material, not required (note: Most of these review similar material, but frame the content in the specific academic perspective indicated, except Cohen and Frank, which is a different version of Frank 2011):

Feb 15:

The process of learning and unlearning as insight into models

  1. Jin, X., & Costa, R. M. (2010). Start/stop signals emerge in nigrostriatal circuits during sequence learning. Nature, 466(7305), 457-462.
  2. Thorn CA, Atallah H, Howe M, Graybiel AM (2010). Differential dynamics of activity changes in dorsolateral and dorsomedial striatal loops during learning. Neuron 66:781-795.
  3. Howe, M. W., Atallah, H. E., McCool, A., Gibson, D. J., & Graybiel, A. M. (2011). Habit learning is associated with major shifts in frequencies of oscillatory activity and synchronized spike firing in striatum. Proceedings of the National Academy of Sciences, 108(40), 16801-16806.

Feb 22:

Chaining, embedding, interrupting and unlearning (1)

  1. Charlesworth, J. D., Tumer, E. C., Warren, T. L., & Brainard, M. S. (2011). Learning the microstructure of successful behavior. Nature Neuroscience, 14(3), 373-380. doi: 10.1038/nn.2748
  2. Bornstein, A. M., & Daw, N. D. (2011). Multiplicity of control in the basal ganglia: computational roles of striatal subregions. Current Opinion in Neurobiology, 21(3), 374-380.
  3. Amemori, K.-i., Gibb, L. G., & Graybiel, A. M. (2011). Shifting responsibly: The importance of striatal modularity to reinforcement learning in uncertain environments. Frontiers in Human Neuroscience, 5.

Feb 29:

Chaining, embedding, interrupting and unlearning (2)

  1. Ding, J. B., Guzman, J. N., Peterson, J. D., Goldberg, J. A., & Surmeier, D. J. (2010). Thalamic gating of corticostriatal signaling by cholinergic interneurons. Neuron, 67(2), 294-307.
  2. Isoda, M., & Hikosaka, O. (2011). Cortico-basal ganglia mechanisms for overcoming innate, habitual and motivational behaviors. European Journal of Neuroscience, 33(11), 2058-2069. doi: 10.1111/j.1460-9568.2011.07698.x
  3. Special topic: Michael Anderson, Colloquium Speaker in Psychology this week. Anderson, M. (2010). Neural re-use as a fundamental organizational principle of the brain. Behavioral and Brain Sciences.

Mar 7:

Individual differences in performance of models and subjects, disorders

  1. Montague, P. R., Dolan, R. J., Friston, K. J., & Dayan, P. (2012). Computational psychiatry. Trends in Cognitive Sciences, 16(1), 72-80.
  2. Redgrave, P., Rodriguez, M., Smith, Y., Rodriguez-Oroz, M. C., Lehericy, S., Bergman, H., . . . Obeso, J. A. (2010). Goal-directed and habitual control in the basal ganglia: implications for Parkinson's disease. Nature Reviews Neuroscience, 11(11), 760-772. doi: 10.1038/nrn2915
  3. Neiman, T. and Y. Loewenstein (2011). Reinforcement learning in professional basketball players. Nature Communications 2:569.

Mar 14:

Assigning agency

  1. Review Wolpert (under Jan. 25 readings).
  2. Gallagher, S. (2012). Multiple aspects in the sense of agency. New Ideas in Psychology 30:15-31.
  3. Wegner, D. (2004). Precis of The illusion of conscious will. Behavioral and Brain Sciences, 27:649-659 (not the commentaries). Whoever signs up for this should get the book and present some of the experiments reviewed, with an eye to the statistical assignment of agency, not the consciousness aspect.
  4. Whitham, E. M., Fitzgibbon, S. P., Lewis, T. W., Pope, K. J., DeLosAngeles, D., Clark, C. R., . . . Willoughby, J. O. (2011). Visual experiences during paralysis. Frontiers in Human Neuroscience, 5. doi: 10.3389/fnhum.2011.00160


Mar 28:

Multiple types of reinforcers and gates: Opiates and oxytocin

  1. Humphries, M. D., & Prescott, T. J. (2010). The ventral basal ganglia, a selection mechanism at the crossroads of space, strategy, and reward. Progress in Neurobiology, 90(4), 385-417.
  2. Ross, H. E., & Young, L. J. (2009). Oxytocin and the neural mechanisms regulating social cognition and affiliative behavior. Frontiers in Neuroendocrinology, 30(4), 534-547.
  3. Depue, R. A., & Collins, P. F. (1999). Neurobiology of the structure of personality: Dopamine, facilitation of incentive motivation, and extraversion. Behavioral and Brain Sciences, 22, 491-569.
An example, but not necessarily a "model", of research in this area using primates and neuroimaging:
  1. Chang, S. W. C., Barter, J. W., Ebitz, R. B., Watson, K. K., & Platt, M. L. (2012). Inhaled oxytocin amplifies both vicarious reinforcement and self reinforcement in rhesus macaques (Macaca mulatta). Proceedings of the National Academy of Sciences, 109:959-964.

April 4:

Hierarchical and other control architectures: computation

  1. Parr, R. and S. Russell (1997). Reinforcement Learning with Hierarchies of Machines. Proc. NIPS.
  2. Botvinick, M., Y. Niv, A. G. Barto (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition 113:262-280.
  3. Ribas-Fernandes, J. J. F., A. Solway, C. Diuk, J. T. McGuire, A. Barto, Y. Niv, and M. Botvinick (2011). A Neural Signature of Hierarchical Reinforcement Learning. Neuron 71:370-379.
  4. Barto, A. G. and S. Mahadevan (2003). Recent Advances in Hierarchical Reinforcement Learning. Discrete Event Dynamic Systems 13:41-77.
  5. Vigorito, C. M. and A. G. Barto (2010). Intrinsically Motivated Hierarchical Skill Learning in Structured Environments. IEEE Transactions on Autonomous Mental Development 2:132-144.

April 11:

Negative reinforcement and punishment

  1. Matsumoto M, Hikosaka O (2009). Two types of dopamine neuron distinctly convey positive and negative motivational signals. Nature 459:837-841.
  2. LeDoux, J. E. (2003). The emotional brain, fear and the amygdala. Cellular and Molecular Neurobiology, 23.
  3. Leknes, S., & Tracey, I. (2008). A common neurobiology for pain and pleasure. Nature Reviews Neuroscience, 9:314-320.

April 18:

The special case of anxiety: resource allocation and control

  1. Egner, T., Etkin, A., Gale, S., & Hirsch, J. (2008). Dissociable neural systems resolve conflict from emotional versus nonemotional distracters. Cerebral Cortex, 18(6), 1475-1484. (Also Egner commentary).
  2. Duncan, J. (2010). The multiple-demand (MD) system of the primate brain: mental programs for intelligent behaviour. Trends in Cognitive Sciences, 14:172-179.
  3. Hurley, M. M., Dennett, D. C., and Adams, R. (2011). Inside Jokes: Using humor to reverse-engineer the mind. MIT Press. Chapters 2 and 5 for a sketch of the argument and an interesting twist on what a "reinforcement" is.

April 25:

Integrating basal ganglia function into memory and decision-making

  1. O'Reilly, R. C. and M. J. Frank (2006). Making Working Memory Work: A Computational Model of Learning in the Prefrontal Cortex and Basal Ganglia. Neural Computation 18:283-328.
  2. van der Meer MA, Johnson A, Schmitzer-Torbert NC, Redish AD (2010). Triple dissociation of information processing in dorsal striatum, ventral striatum, and hippocampus on a learned spatial decision task. Neuron 67:25-32.
  3. McNab, F. and T. Klingberg (2008). Prefrontal cortex and basal ganglia control access to working memory. Nature Neuroscience 11:103-108.
  4. Ullman, M. T. (2006). Is Broca's area part of a basal ganglia thalamocortical circuit? Cortex 42:480-485.

May 2:


  1. Shmuelof, L., & Krakauer, J. W. (2011). Are we ready for a natural history of motor learning? Neuron, 72(3), 469-476.

Shimon Edelman <se37 at>
Last modified on Mon Feb 27 14:50:27 2012