Lecture 13.2: the mind's arrows

— the Bayesian underpinnings of cognition

Bayesian models of object perception
Daniel Kersten and Alan Yuille
Current Opinion in Neurobiology 13:1-9 (2003).

Theory-based Bayesian models of inductive learning and reasoning
Joshua B. Tenenbaum, Thomas L. Griffiths, and Charles Kemp
Trends in Cognitive Sciences 10:309-318 (2006).

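the Bayes Theorem

P(h | D,K) = P(D | h,K) P(h | K) / P(D | K),   where   P(D | K) = sum over all h' in H of P(D | h',K) P(h' | K)
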
D — the observed data;
h — the hypothesis in question;
H — the space of all possible hypotheses;
K — the background domain knowledge.

See the Wiki entry on the Bayes Theorem for mathematical details.

slide 2

Bayes in perception

P(h | K): prior probability; the prevalence of a certain category of shapes in the world
P(D | h,K) / P(D | K): likelihood ratio; how likely the image is, given that the object has the hypothesized shape
P(h | D,K): posterior probability; how probable the hypothesized shape is, given the image

slide 3

Bayes in decision-making

P(h | K): prior probability; the prevalence of a certain disease in the general population
P(D | h,K) / P(D | K): likelihood ratio; how likely the test result is, given that the patient has the hypothesized disease
P(h | D,K): posterior probability; how probable the hypothesized disease is, given the test result
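
The three quantities above can be illustrated with a minimal Python sketch. The prevalence, the test's hit rate, and its false-positive rate are hypothetical numbers chosen for illustration only; they do not come from the lecture, and the helper name `posterior` is likewise an assumption.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(h | D) for a two-way hypothesis (h or not h) via the Bayes Theorem."""
    # P(D): total probability of the data, summed over both hypotheses.
    evidence = p_data_given_h * prior + p_data_given_not_h * (1.0 - prior)
    return p_data_given_h * prior / evidence

# Hypothetical numbers (assumptions, not from the lecture):
prevalence = 0.01       # P(h | K): 1% of the population has the disease
hit_rate = 0.90         # P(D | h, K): positive test given the disease
false_positive = 0.05   # P(D | not h, K): positive test without the disease

p_disease = posterior(prevalence, hit_rate, false_positive)
print(round(p_disease, 3))  # about 0.154
```

With these assumed numbers, a positive result raises the probability of disease from 1% to roughly 15%: the low prior keeps the posterior modest even though the test is fairly accurate.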

slide 4

the Bayes Theorem: proof outline

The conditional probability of B given A (think darts) is defined as

P(B | A) = |A & B| / |A|

Divide numerator and denominator by the "universe" size |U| to obtain

P(B | A) = P(A & B) / P(A)

Now, by the definition of conditional probability, the joint probability can be expressed in two equivalent ways:

P(A & B) = P(A) P(B | A)
         = P(B) P(A | B)

Equating the two expressions and dividing both sides by P(A), which must be nonzero, yields the Bayes Theorem. Suppose that B is the hypothesis in question, and A is data that can be brought to bear on it. The theorem then gives the probability of the hypothesis being true, given the data:

P(B | A) = P(A | B) P(B) / P(A)
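
The counting argument can be checked numerically. The sketch below assumes a small, made-up universe of 20 equally likely dart positions; the sets `U`, `A`, and `B` and the helpers `prob` and `cond` are illustrative names, not part of the lecture.

```python
from fractions import Fraction

# A hypothetical finite universe of equally likely dart landing spots.
U = set(range(20))
A = {n for n in U if n % 2 == 0}   # the data event: an even-numbered spot
B = {n for n in U if n < 5}        # the hypothesis event: a spot below 5

def prob(event):
    """P(event) = |event| / |U|."""
    return Fraction(len(event), len(U))

def cond(x, given):
    """P(x | given) = |x & given| / |given|, by counting."""
    return Fraction(len(x & given), len(given))

lhs = cond(B, A)                        # P(B | A), counted directly
rhs = cond(A, B) * prob(B) / prob(A)    # P(A | B) P(B) / P(A)
print(lhs, rhs)  # 3/10 3/10
```

Exact fractions are used so that the two sides can be compared for equality without rounding error.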

slide 5

Bayes in perception

  1. The given image is consistent with many shape / viewpoint combinations.
  2. The likelihood: the compatibility of different scene interpretations with the observed image; here, "small curvature and large slant" hypotheses are more likely.
  3. The prior: highly convex objects viewed from above are expected.
  4. A Bayesian observer combines likelihood and prior to estimate the posterior probability for each possible interpretation of the given image in terms of the curvature and the slant of the perceived shape.
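
The four steps above can be sketched on a coarse grid of hypotheses. All numbers below are made up for illustration, and the grid reduces curvature and slant to two values each. The prior table additionally assumes that "viewed from above" corresponds to the larger slant value, which is an assumption of this sketch rather than something stated in the lecture.

```python
# Hypothetical 2x2 hypothesis grid: (curvature, slant).
hypotheses = [("small", "small"), ("small", "large"),
              ("large", "small"), ("large", "large")]

# Illustrative numbers only (assumptions, not taken from the lecture).
# The likelihood is a function of the hypothesis, so it need not sum to 1.
likelihood = {("small", "small"): 0.2, ("small", "large"): 0.6,
              ("large", "small"): 0.1, ("large", "large"): 0.3}
# The prior is a distribution over hypotheses and does sum to 1.
prior = {("small", "small"): 0.1, ("small", "large"): 0.2,
         ("large", "small"): 0.2, ("large", "large"): 0.5}

# Combine likelihood and prior, then normalize to obtain the posterior.
unnormalized = {h: likelihood[h] * prior[h] for h in hypotheses}
evidence = sum(unnormalized.values())
posterior = {h: unnormalized[h] / evidence for h in hypotheses}

best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this toy setting the likelihood alone favors small curvature with large slant, while the posterior favors large curvature with large slant, because the prior outweighs the difference in likelihood.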

slide 6

Bayes in shape perception: another example

slide 7

Bayes in shape perception

slide 8

slide 9

A hierarchical Bayesian framework for theory-based induction

The learner observes data about the world (e.g. examples of objects that a word refers to) and must predict other unobserved data (e.g. which other objects the word can refer to).

The learner's intuitive theory generates hypotheses that can explain the observed data and that support the desired predictions. The theory represents knowledge on at least two levels of abstraction:

— a structured probabilistic model generates expectations about the probability of possible data sets;

— more abstract domain principles generate the space of possible structures.

Priors for abstract domain principles can come from multiple sources, including higher-level domain knowledge or domain-general conceptual resources.

slide 10

Bayes in word learning

The hypothesis space of word meanings: a tree-structured taxonomy.


Domain principles constrain the structure of the hypothesis space and generate the priors and likelihoods necessary to evaluate the hypotheses given data:

— taxonomic principle
— contrast principle
— competent and cooperative speaker
— randomly sampled examples

slide 11

Bayes in word learning

Comparison of the model's predictions with 4-year-old children's patterns of generalization.

For both children and the model, the probability of generalization decreases with taxonomic distance to the examples.

This gradient becomes sharper as more examples are observed.


When several examples are drawn at random, it would be a highly suspicious coincidence for all of them to fall within a given taxonomic category (e.g. basset hounds) if the word in fact had a much broader extension (e.g. dogs), so the most specific consistent hypothesis is strongly preferred.
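
The "suspicious coincidence" argument can be made concrete with a short sketch. It assumes three nested hypotheses with made-up extension sizes, a uniform prior, and examples sampled uniformly at random from the word's true extension, so that each example has probability 1/size under a hypothesis. None of these numbers come from the lecture.

```python
# Hypothetical nested hypotheses with made-up extension sizes.
sizes = {"basset hounds": 10, "dogs": 100, "animals": 1000}
prior = {h: 1.0 / len(sizes) for h in sizes}  # uniform prior (assumption)

def posterior(n_examples):
    """Posterior over word meanings after n examples that are all basset hounds."""
    # Random sampling: each example has probability 1/size under a hypothesis,
    # so n examples have probability (1/size) ** n.
    scores = {h: prior[h] * (1.0 / sizes[h]) ** n_examples for h in sizes}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}

def generalize_to_other_dog(post):
    """P(the word also covers a non-basset dog): 'dogs' or 'animals' is true."""
    return post["dogs"] + post["animals"]

one, three = posterior(1), posterior(3)
print(round(generalize_to_other_dog(one), 4))
print(round(generalize_to_other_dog(three), 4))
```

Under these assumptions, the probability of extending the word to other dogs drops sharply between one and three examples, which mirrors the sharpening gradient described above.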

slide 12

theory-based Bayesian property induction

Three models for property induction: a taxonomic model (left), a food-web model (center) and a dimensional threshold model (right).


The "Data" level shows properties with high prior probability under each of these models.

For example, the dimensional threshold model favors hypotheses that include all species beyond some point in the linear order.
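
A minimal sketch of the dimensional threshold idea follows. The species, their ordering, and the uniform prior over thresholds are all hypothetical, and a hypothesis is treated simply as consistent or inconsistent with the observation (an assumption of this sketch).

```python
# Hypothetical linear order of species along some dimension.
species = ["mouse", "cat", "dog", "horse", "elephant"]

# Each hypothesis: the property holds for every species at or beyond index t.
hypotheses = [set(species[t:]) for t in range(len(species))]
prior = [1.0 / len(hypotheses)] * len(hypotheses)  # uniform prior (assumption)

def p_has_property(query, observed):
    """P(query has the property | observed species has it)."""
    # Keep only the threshold hypotheses that include the observed species.
    consistent = [(h, p) for h, p in zip(hypotheses, prior) if observed in h]
    total = sum(p for _, p in consistent)
    # Sum the renormalized prior mass of hypotheses that also include the query.
    return sum(p for h, p in consistent if query in h) / total

for q in ["mouse", "cat", "horse"]:
    print(q, round(p_has_property(q, "dog"), 3))
```

Learning that the dog has the property makes it certain for species further along the order and progressively less likely for species further back, which is the asymmetry a threshold model predicts.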

slide 13

Learning a theory for how biological properties are distributed over species

Given abstract domain knowledge that species should be organized in a taxonomic tree, with properties varying smoothly over that tree, a Bayesian learner can infer the tree structure that best explains a set of observed properties.

Two ways to organize animal species into a taxonomy are shown.

The preferred structure will be the tree that maximizes the likelihood P(Data|Structure).


Intuitively, the best choice allows features to vary smoothly over the tree: for example, because gorillas and monkeys share many properties, these species should be located nearby in the tree.

slide 14

Learning a theory for how biological properties are distributed over species

Animal species may be organized according to various structural principles, such as the three shown here.

Bayesian inference in the hierarchical framework can select the organizing principles best supported by a set of observed properties.


Choosing the best structure involves a trade-off between complexity and fit to the data [a kind of regularization], which can be formalized in terms of the hierarchical Bayesian framework.
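
The trade-off between complexity and fit can be seen in a toy analog that is much simpler than the structures in the lecture. The sketch below compares a rigid model (a binary feature is present with fixed probability 0.5) against a flexible one (unknown rate with a uniform prior, integrated out). Both models and the data counts are assumptions chosen for illustration.

```python
from math import comb

def marginal_fixed(k, n, p=0.5):
    """P(data | rigid model): each of n observations is 'on' with fixed probability p."""
    return p ** k * (1 - p) ** (n - k)

def marginal_flexible(k, n):
    """P(data | flexible model): unknown rate with a uniform prior, integrated out."""
    # Integral of r^k (1 - r)^(n - k) over r in [0, 1] equals 1 / ((n + 1) * C(n, k)).
    return 1.0 / ((n + 1) * comb(n, k))

# Hypothetical data sets: k 'on' observations out of n.
for k, n in [(5, 10), (9, 10)]:
    print(k, n, marginal_fixed(k, n), marginal_flexible(k, n))
```

With balanced data (5 of 10) the rigid model has the higher marginal likelihood, because the flexible model spreads its probability over many possible data sets. With skewed data (9 of 10) the flexible model wins, because its better fit outweighs that penalty.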

slide 15

extra: Bayesian causal induction

Abstract knowledge in a medical domain can be represented using a graph schema, a probabilistic generative grammar for graphs. Variables fall into three classes: risk factors, diseases, and symptoms.

Given a newly observed correlation (e.g. between working in a factory and chronic chest pain), the graph schema generates hypotheses for explaining the data (shown in red in the figure). In the simplest hypotheses, a disease known to be caused by working in a factory might cause chest pain, or a disease known to cause chest pain might actually be produced by working in a factory.

Failing that, the learner could posit a new disease X, which has chest pain as a symptom and is caused by working in a factory. Other hypotheses that may be simpler a priori but which violate the theory would never be considered, such as a direct causal link from working in a factory to chest pain, or from chest pain to working in a factory.
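
The way a schema restricts the hypothesis space can be sketched as follows. The variable names and the set of already known causal links are illustrative assumptions, and the sketch covers only the two simplest kinds of hypothesis; positing a new disease X is not modeled.

```python
# Hypothetical variables and known causal links (illustrative assumptions).
risk_factors = {"working in factory", "smoking"}
diseases = {"bronchitis", "heart disease"}
symptoms = {"chest pain", "coughing"}
known = {("working in factory", "bronchitis"), ("smoking", "heart disease"),
         ("heart disease", "chest pain"), ("bronchitis", "coughing")}

def allowed(cause, effect):
    """The schema permits only risk factor -> disease and disease -> symptom links."""
    return ((cause in risk_factors and effect in diseases) or
            (cause in diseases and effect in symptoms))

def explanations(factor, symptom):
    """Single new links that would explain a factor-symptom correlation."""
    out = []
    for d in sorted(diseases):
        if (factor, d) in known and (d, symptom) not in known:
            out.append((d, symptom))   # a known effect of the factor gains a symptom
        if (d, symptom) in known and (factor, d) not in known:
            out.append((factor, d))    # a known cause of the symptom gains a risk factor
    return out

hyps = explanations("working in factory", "chest pain")
print(hyps)
print(allowed("working in factory", "chest pain"))  # False: the direct link violates the schema
```

Every proposed link passes through a disease, so the direct link from factory work to chest pain never enters the candidate set, however simple it might look a priori.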

slide 16