SAIL Room, 111 Levin Building (425 S. University Ave.)

**Paul Schrater**

Departments of Psychology and Computer Science

University of Minnesota

**How to get modularity of mind by decomposing value into learnable probabilistic constraints**

To understand how the brain learns, makes sequences of decisions, and controls its motor effectors, computational neuroscience has profitably imported core concepts from artificial intelligence and economics: decision theory, reinforcement learning, and stochastic optimal control, all of which share a common foundation in solving optimization problems that maximize expected value. This framework raises a number of sticky problems: How can we acquire a reward or preference model? How are preferences or utilities computed for things we haven't seen? How do we integrate multiple preferences? How can we represent tasks and goals in terms of preferences? How can we learn modular, reusable solutions to sequential decision and control problems?

The goal of this talk is to present our theoretical and empirical work relating preference models, reinforcement learning, and compositionality in control. Traditionally, both decision and control problems are formulated within a framework in which the agent's goal is to maximize expected reward. Here, we make a substantial departure, reformulating decision and control problems as probabilistic constraint satisfaction. The talk will show how to answer the following questions in an integrated framework:

1. Using probabilistic preferences, we can acquire preferences by Bayesian learning and use our episodic and semantic knowledge to infer preferences for items not previously encountered. However, these preferences act as probabilistic constraints to be satisfied (satisficing rather than maximizing) and are equivalent to reward/utility only under key environmental conditions. Preferences are reinterpreted as probabilistic constraints linking intrinsic values to extrinsic events and items. We use these ideas to give new explanations for preference reversals, and show that we can experimentally induce reversals by designing appropriate experience sequences.

2. How to decompose complex control tasks into logical combinations of probabilistic constraints.

3. How this task decomposition also decomposes the optimal control problem into a composition of reusable solutions to the component constraints. This novel policy-weighting architecture allows fully optimal control even with dynamic changes in targets, payoffs, and obstacles, without having to re-solve the control problem.
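The abstract does not say how the probabilistic constraints are represented or combined. As a minimal illustration only, assuming each constraint is a binary outcome learned with a Beta-Bernoulli model and that constraints are statistically independent, the sketch below learns satisfaction probabilities by Bayesian updating and composes them with logical AND/OR, then picks the option most likely to satisfy the composite goal. The names `BetaConstraint`, `p_and`, `p_or`, and `choose`, and the apple/cake data, are hypothetical and are not the speaker's method.

```python
import math


class BetaConstraint:
    """Hypothetical probabilistic constraint: P(an option satisfies it),
    learned from binary outcomes with a Beta-Bernoulli model."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0) -> None:
        self.alpha = alpha  # prior pseudo-count of satisfied outcomes
        self.beta = beta    # prior pseudo-count of unsatisfied outcomes

    def observe(self, satisfied: bool) -> None:
        # Conjugate Bayesian update: add one success or one failure.
        if satisfied:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def prob(self) -> float:
        # Posterior mean probability that the constraint is satisfied.
        return self.alpha / (self.alpha + self.beta)


def p_and(*ps: float) -> float:
    # AND of constraints, assuming independence: all must be satisfied.
    return math.prod(ps)


def p_or(*ps: float) -> float:
    # OR of constraints, assuming independence: 1 - P(none satisfied).
    return 1.0 - math.prod(1.0 - p for p in ps)


def choose(options: dict) -> str:
    # Pick the option most likely to satisfy the composite constraint.
    return max(options, key=options.get)


# Usage: learn two constraints for each of two hypothetical items.
tasty = {"apple": BetaConstraint(), "cake": BetaConstraint()}
healthy = {"apple": BetaConstraint(), "cake": BetaConstraint()}
for outcome in [True, True, False]:
    tasty["apple"].observe(outcome)
for outcome in [True, True, True]:
    tasty["cake"].observe(outcome)
for outcome in [True, True, True]:
    healthy["apple"].observe(outcome)
for outcome in [False, False, True]:
    healthy["cake"].observe(outcome)

# Composite goal: the item must be tasty AND healthy.
scores = {item: p_and(tasty[item].prob(), healthy[item].prob()) for item in tasty}
print(scores, choose(scores))
```

Here the cake is more likely to be tasty, but the apple is more likely to satisfy the conjunction of both constraints, so a constraint-satisfying chooser prefers it. The independence assumption is what makes AND and OR reduce to simple products; correlated constraints would need a joint model.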

These results yield a novel interpretation of the hierarchical decision/control architecture in the brain, based on real-time adaptive re-weighting of a bank of control models. The approach also makes accurate predictions of reaching behavior when controlling limb movements toward dynamic targets and around obstacles.

*The talk will begin at 12pm. A pizza lunch will be served at 11:45am.*