[...] The problem of perception and action is essentially a form of Bayesian filtering: the state of sensory and motor variables has to be estimated online from priors and a stream of noisy and ambiguous observations.
[...] As these computations have to be performed on a temporal scale of the order of a single interspike interval, we propose to take single spikes as the basic unit of representation and computation. An alternative would be to consider that probabilities are represented by the average firing rate, defined over long periods of time or large populations of neurons (Shadlen & Newsome, 1994).
We will take single neurons, as opposed to a population of them, as the basic units of computation, considering each neuron as computing the probability of one particular hidden variable.
We propose that the basic meaning of a spike is the occurrence of new, unpredictable probabilistic information and that propagation of spikes in cortical networks corresponds to propagation of beliefs in a corresponding Bayesian network.
We consider that each neuron codes for a time-varying binary hidden variable, xt. This variable could correspond to a property of the real world, such as the presence or absence of an object in a limited portion of space (the neuron's receptive field) or whether motion goes in one particular direction in the neuron's receptive field. It could also be much more abstract and represent statistical regularities of the sensory input and motor output.
This model is called generative because it defines the way that observations (the sensory input) are assumed to be generated (or caused) by the state of the hidden variable (Hinton & Ghahramani, 1997). Thus, a generative model might describe how often a horizontal bar appears or disappears at a given retinal location and how its presence results in a particular pattern of light on the retina. [A complementary approach is to employ a discriminative model, which maps inputs into outputs (e.g., class labels) without attempting to approximate the joint distribution of the inputs and the outputs. Main disadvantage: a discriminative model feels like a "black box": relationships between variables are not made explicit, and it does not generalize to new tasks involving similar variables. Example: language learning. Cf. R. C. Conant and W. R. Ashby, Every good regulator of a system must be a model of that system, Intl. J. Systems Science, 1:89-97 (1970).]
Figure 1: Generative model for the synaptic input received by a single neuron.
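[A minimal sketch of sampling from such a generative model, assuming a two-state Markov chain for the hidden variable and independent Poisson-like synaptic inputs whose rates depend on the hidden state. The names r_on, r_off, q_on, q_off and the time step dt are notation chosen for this illustration and should be checked against the paper's own definitions.]

```python
import random


def sample_generative_model(n_steps, dt, r_on, r_off, q_on, q_off, seed=0):
    """Sample a hidden binary state x_t and the synaptic inputs s_t it causes.

    r_on, r_off: assumed rates (Hz) at which x switches 0 -> 1 and 1 -> 0.
    q_on[i], q_off[i]: assumed firing rate (Hz) of synapse i when x = 1 / x = 0.
    """
    rng = random.Random(seed)
    x = 0
    states, inputs = [], []
    for _ in range(n_steps):
        # Hidden state: two-state Markov chain, discretised with step dt.
        if x == 0 and rng.random() < r_on * dt:
            x = 1
        elif x == 1 and rng.random() < r_off * dt:
            x = 0
        # Observations: each synapse spikes with probability rate * dt.
        rates = q_on if x == 1 else q_off
        s = [1 if rng.random() < q * dt else 0 for q in rates]
        states.append(x)
        inputs.append(s)
    return states, inputs


if __name__ == "__main__":
    states, inputs = sample_generative_model(5000, 0.001, 1.0, 1.0, [50.0], [10.0])
    print("fraction of time with x = 1:", sum(states) / len(states))
```

[Each synapse is only weakly informative on its own: it fires in both states, just at different rates. Inference has to recover x from the spike counts.]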
[Discuss Equation 2.2]
Figure 2: The log-odds ratio reflects a leaky integration of sensory evidence. [LEAKY INTEGRATION: think of a bucket of water which is suspended from a coil spring and which has a small hole, on a rainy day, as a measurement device for precipitation. A more realistic example: a three-bucket system (2, feeding into 1) that measures both natural precipitation and contribution from the neighbor's sprinkler. Cf. the FORECASTING STONE.]
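[A sketch of the leaky integration, under stated assumptions. For a two-state hidden variable with switching rates r_on and r_off and Poisson inputs with rates q_on[i] and q_off[i], taking the continuous-time limit of Bayes' rule gives a log-odds equation of the form dL/dt = r_on (1 + exp(-L)) - r_off (1 + exp(L)) + sum_i w_i delta(s_i - 1) - theta, with w_i = log(q_on[i] / q_off[i]) and theta = sum_i (q_on[i] - q_off[i]). This is a reconstruction in notation chosen here, to be compared against Equation 2.2 in the paper. The code below is a plain Euler discretisation of it.]

```python
import math


def update_log_odds(L, s, dt, r_on, r_off, q_on, q_off):
    """One Euler step of the log odds L = log P(x=1 | inputs) / P(x=0 | inputs)."""
    # Prior dynamics: the "leak" that pulls L back toward log(r_on / r_off).
    drift = r_on * (1.0 + math.exp(-L)) - r_off * (1.0 + math.exp(L))
    # Constant bias: the evidence carried by the absence of input spikes.
    theta = sum(qa - qb for qa, qb in zip(q_on, q_off))
    L += (drift - theta) * dt
    # Each input spike adds its synaptic weight w_i = log(q_on / q_off).
    for s_i, qa, qb in zip(s, q_on, q_off):
        if s_i:
            L += math.log(qa / qb)
    return L


if __name__ == "__main__":
    L = 0.0
    for _ in range(20000):  # 20 s with no synapses at all
        L = update_log_odds(L, [], 0.001, 1.0, 3.0, [], [])
    print("L without evidence:", L, "prior log odds:", math.log(1.0 / 3.0))
```

[With no evidence the log odds relax to the prior log odds, which is the leak. The Euler step is only reliable when dt is small compared with the rates involved; a large L makes exp(L) * dt large and the update unstable.]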
[Fig.2] is reminiscent of the leaky integration of synaptic inputs in biological neurons. However, to understand the neural basis of probabilistic computation, we need to define the rules according to which a neuron will fire output spikes as a function of its synaptic inputs, that is, what relates its output spike train Ot with the synaptic input st. In other words, WHAT IS THE NEURAL CODE? This output spike train should provide a good representation of Lt, since it is all that will be available for performing further probabilistic computations.
Predictive coding. We propose that each spike deterministically reports new information about the state xt that is not redundant with what was already reported by the preceding spikes. In other words, the neuron performs a form of predictive coding and fires only when it cannot predict itself. This corresponds more or less to having spikes represent the temporal derivative of Lt.
Intuitively, this ensures that the model is self-consistent, in the sense that the output of the Bayesian neuron can be used as an input for another Bayesian neuron. If spikes represent only new information rather than an integration of sensory evidence, they can be harmlessly integrated in later processing stages without running into the problem of redundant successive integrations. [As illustrated in Figure 6.]
In order to fire only when new information is available, we propose that a neuron implements a form of spike-dependent adaptation, increasing its firing threshold after each spike. Thus, the neuron compares online the log odds for its hidden variable, Lt, with a prediction Gt computed from the output spike train. A spike is emitted when the log odds (a leaky integration of the synaptic input st) exceed the prediction (a leaky integration of the output spike train Ot).
[Discuss Equation 2.3]
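[A sketch of the firing rule just described, under assumptions that should be checked against Equation 2.3: the prediction G is taken to follow the same leak dynamics as L, to jump by a fixed amount g0 at each output spike, and a spike is emitted when L exceeds G + g0/2. The function name, the parameter g0 and the half-step threshold offset are choices made for this illustration.]

```python
import math


def predictive_spikes(L_trace, dt, r_on, r_off, g0):
    """Turn a log-odds trajectory into an output spike train O_t.

    G is the neuron's prediction of L based on its own past spikes: it decays
    toward the prior log odds and is bumped up by g0 after every spike.
    """
    G = math.log(r_on / r_off)  # start from the prior log odds
    spikes = []
    for L in L_trace:
        # Forget the contribution of past output spikes (same leak as L).
        G += (r_on * (1.0 + math.exp(-G)) - r_off * (1.0 + math.exp(G))) * dt
        if L > G + g0 / 2.0:
            G += g0  # spike-dependent adaptation: raise the prediction
            spikes.append(1)
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # Log odds at the prior, then high while the stimulus is "on", then back.
    trace = [0.0] * 500 + [3.0] * 2000 + [0.0] * 500
    spikes = predictive_spikes(trace, 0.001, 1.0, 1.0, 1.0)
    print("spikes before / during / after:",
          sum(spikes[:500]), sum(spikes[500:2500]), sum(spikes[2500:]))
```

[In this toy run the neuron stays silent at the prior, fires for as long as L is held high, because G keeps leaking back below L, and stops as soon as L drops. The initial burst of one spike per time step is an artifact of the discretisation.]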
Figure 3: Predictive coding: Mechanism and prediction for firing statistics.
Our model predicts neurons that respond as long as they receive evidence that their preferred stimulus is present and stop responding when their stimulus disappears, with fairly sharp transitions, rather than slow ramps, between active and inactive states. This is easy to understand intuitively: as long as the stimulus is present, the new sensory inputs received since the last output spike result in an increase of the log odds Lt. Meanwhile, the predicted log odds Gt decreases as the neuron forgets the contribution of its last output spike. This will lead to a new output spike when Lt crosses Gt once again.
Note that this model neuron signals unpredictable changes of the probability of xt = 1, not unpredictable changes in the state xt itself. Alternatives:
- A neuron using predictive coding for the state rather than the probability of xt would signal changes in this state, for example, switches from xt = 0 to xt = 1. Thus, this neuron would fire only at the time when the stimulus appears and become silent during longer presentation of the stimulus. A large class of sensory neurons may match this description, and we intend to explore this alternative hypothesis in future work.
- This is in contrast with neurons whose firing rate is proportional to Lt. The firing rate of such neurons would increase linearly when their stimulus is present and decrease linearly when their stimulus is absent (but eventually saturate). [...] In addition to the computational problems posed by this kind of representation (see section 3), it seems that sensory cortical neurons are more likely to fit the first description than the second.
Figure 4: Equivalence with leaky integrate-and-fire neuron: an example trial. Equation 2.6: cf. parallel RC circuit.
Similarity with Rate Coding. The model neuron's output firing statistics, in response to a Poisson distributed input, are close to those of a Poisson process (see Figures 3C and 3E). The mean output firing rate depends on the hidden state and is conditionally independent of time. Moreover, the firing rate can be described as a linear rectified function of the mean synaptic drive or mean rate of evidence received by the neuron. In this condition, one might wonder if it is really necessary to consider individual spikes: the output is not qualitatively different from a rate code model, where the firing rate provides information about xt.
However, obtaining this input-output function is a nonlinear transform that requires Lt and Gt as intermediate stages. In effect, it consists of selecting among a bombardment of weakly informative synaptic input exactly what is relevant for the hidden variable. This selective evidence is expressed in the output spike train, with typically many fewer spikes than in the synaptic input (this is quite important, given that cortical neurons receive hundreds of active connections and can fire only a few tens of spikes in a second). This computation conserves information rather than adding noise by sampling spikes at random from a particular rate.
We started from an interpretation of synaptic integration in single neurons as a form of inference in a hidden Markov chain. We derived a model of spiking neurons able to compute the marginal posterior probabilities of sensory and motor variables given evidence received in the entire network. In this view, the brain implements an underlying Bayesian network in a neural architecture, with conditional probabilities represented by synaptic weights. [Re the last claim: see the bottom paragraph on p.95 in the paper.]
Despite nonlinear processing at the single cell level, the emerging picture is relatively simple: the neuron acts as a leaky integrate-and-fire neuron driven by noise. [...] Spikes report fluctuations in the level of certainty that could not be predicted from the stability of its stimulus (contribution from Gt). Thus, firing will be, by definition, unpredictable. This last observation leads us to suggest that the irregular firing and Poisson statistics observed in cortical neurons arise as a direct consequence of the random fluctuations in the sensory inputs and the instability of the real world, but are not due to unreliable or chaotic neural processing.
Neural Representation of Probability. We propose that the probabilities of perceptual or motor variables are not represented explicitly in the output firing rates of the neurons. Rather, they correspond to an internal activation level of the neuron, which is not directly observable except by integrating its output spike train. The parameters of this integration, and thus what a spike means, are learned online by efferent neurons. Thus, we propose that neurons and neural networks are highly adaptive structures that continuously change their dynamical properties in order to interpret their input as well as possible. Our model neurons respond as long as they receive evidence in favor of their hypothesis, with a firing rate proportional to the strength of the evidence.
Figure 6: Why rate coding is not a good solution. One alternative neural coding of probability could be to fire spikes stochastically, with a probability that is proportional to (or a function of) the log probability ratio Lt. [...] This encoding is seductive in its simplicity. However, it has two major drawbacks.
- First, being a stochastic spike generation rule, it adds uncertainty, and thus noise, to an otherwise deterministic probability computation.
- Second, and more importantly, the resulting model would not be self-consistent since the input and output firing rates have different meanings and different dynamics.
Bayesian Learning and Spike-Time-Dependent Plasticity. Finally, it is crucial for the biological realism of the model to find adaptive neural dynamics and synaptic plasticity rules able to learn the generative model. In the companion letter, we show that single neurons can learn the synaptic weights and neural dynamics using spike-dependent plasticity rules.
Bayesian inference and the Bayesian coding hypothesis. The fundamental concept behind the Bayesian approach to perceptual computations is that the information provided by a set of sensory data about the world is represented by a conditional probability density function over the set of unknown variables: the posterior density function. A Bayesian perceptual system, therefore, would represent the perceived depth of an object, for example, not as a single number Z but as a conditional probability density function p(Z|I), where I is the available image information (e.g., stereo disparities). Loosely speaking, p(Z|I) would specify the relative probability that the object is at different depths Z, given the available sensory information.
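[A minimal sketch of what representing p(Z|I) instead of a single number can look like, assuming a discrete grid of candidate depths, a flat prior, and a Gaussian likelihood centred on a noisy depth estimate. The grid, the noise level and the function names are made up for this illustration.]

```python
import math


def gaussian(x, mu, sigma):
    """Gaussian density, used here as an assumed likelihood p(I | Z)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_on_grid(prior, likelihood):
    """Bayes' rule on a grid: p(Z | I) is proportional to p(I | Z) p(Z)."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


if __name__ == "__main__":
    depths = [0.5 + 0.1 * i for i in range(16)]  # candidate depths in metres
    prior = [1.0 / len(depths)] * len(depths)    # flat prior over depth
    likelihood = [gaussian(z, 1.2, 0.2) for z in depths]
    posterior = posterior_on_grid(prior, likelihood)
    for z, p in zip(depths, posterior):
        print(f"Z = {z:.1f} m  p(Z|I) = {p:.3f}")
```

[The output is a whole distribution over depth: its peak gives the most probable depth and its spread says how much to trust it.]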
A Bayes-optimal system maintains, at each stage of local computation, a representation of all possible values of the parameters being computed along with associated probabilities. This allows the system
- to integrate information efficiently over space and time,
- to integrate information from different sensory cues and sensory modalities, and
- to propagate information from one stage of processing to another without committing too early to particular interpretations.
Bayesian statisticians refer to the idea of representing and propagating information in the form of conditional density functions as belief propagation, and this approach has been highly successful in designing effective artificial vision systems.
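[A sketch of cue integration in the simplest textbook case, assuming two conditionally independent cues with Gaussian likelihoods and a flat prior. Under those assumptions the posterior is Gaussian, with each cue weighted by its inverse variance. The function name and the numbers are illustrative only.]

```python
def combine_gaussian_cues(mu1, var1, mu2, var2):
    """Posterior mean and variance from two independent Gaussian cues."""
    w1 = 1.0 / var1  # reliability (precision) of cue 1
    w2 = 1.0 / var2  # reliability (precision) of cue 2
    var = 1.0 / (w1 + w2)             # combined estimate is more precise than either cue
    mu = var * (w1 * mu1 + w2 * mu2)  # reliability-weighted average of the cue means
    return mu, var


if __name__ == "__main__":
    # A reliable cue (variance 1) and a less reliable one (variance 4).
    print(combine_gaussian_cues(10.0, 1.0, 20.0, 4.0))
```

[The combined estimate lies closer to the more reliable cue. This weighting is only possible because each cue carries its uncertainty along with its value.]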
The opposing view is that neural representations are deterministic and discrete, which might be intuitive but also misleading. This intuition might be due to the apparent 'oneness' of our perceptual world and the need to 'collapse' perceptual representations into discrete actions, such as decisions or motor behaviors.
Figure 3. Inferences with convolution codes [needed when the distributions in question are continuous rather than binary, as in Deneve's analysis].
Figure 4. Inferences with gain encoding [an alternative to convolution coding].
If we rank the neurons by their preferred orientations, the population response to a trial of particular orientation θ0 takes the form of a hill of activity (Figure 4b). On any given trial, the shape of the hill is corrupted by near-Poisson noise. To decode such noisy population codes, one can use a Bayesian decoder which returns the posterior distribution over θ given the hill of activity, p(θ|A). For independent Poisson noise, the posterior distribution is Gaussian, with its mean controlled mostly by the position of the peak of the hill and its variance inversely proportional to the gain of the hill. This is because, for Poisson noise, the variance of the spike count is proportional to the gain. This implies that the signal-to-noise ratio (the ratio of the gain over the square root of the variance) grows with the square root of the gain. Therefore, a high gain entails a high signal-to-noise ratio and a narrow posterior distribution. Consequently, the noisy hill of activity can be treated as a neural code for the posterior, with the position of the peak encoding the mean, and the amplitude (or gain) encoding the variance.
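[A sketch of such a Bayesian decoder, assuming Gaussian tuning curves, independent Poisson noise, a flat prior over θ, and, for simplicity, orientation treated as a linear rather than circular variable. The tuning width, the spacing of preferred orientations and the function names are assumptions made for this illustration. To keep the example deterministic it decodes the noise-free hill (the expected counts); on real trials the counts would be noisy integers.]

```python
import math


def tuning(theta, pref, gain, width):
    """Gaussian tuning curve: mean spike count of a neuron preferring `pref`."""
    return gain * math.exp(-0.5 * ((theta - pref) / width) ** 2)


def decode_posterior(counts, prefs, grid, gain, width):
    """Posterior p(theta | A) on a grid for independent Poisson noise, flat prior."""
    log_post = []
    for theta in grid:
        ll = 0.0
        for a, pref in zip(counts, prefs):
            f = tuning(theta, pref, gain, width)
            ll += a * math.log(f) - f  # Poisson log-likelihood, dropping log(a!)
        log_post.append(ll)
    top = max(log_post)  # subtract the maximum before exponentiating, for stability
    post = [math.exp(v - top) for v in log_post]
    total = sum(post)
    return [p / total for p in post]


def mean_and_variance(post, grid):
    """Mean and variance of a discrete posterior defined on `grid`."""
    mean = sum(p * t for p, t in zip(post, grid))
    var = sum(p * (t - mean) ** 2 for p, t in zip(post, grid))
    return mean, var


if __name__ == "__main__":
    prefs = [-90.0 + 5.0 * i for i in range(37)]  # preferred orientations (deg)
    grid = [-45.0 + 0.5 * i for i in range(181)]  # candidate orientations (deg)
    for gain in (2.0, 8.0):
        hill = [tuning(0.0, p, gain, 15.0) for p in prefs]
        post = decode_posterior(hill, prefs, grid, gain, 15.0)
        print("gain", gain, "posterior mean, variance:", mean_and_variance(post, grid))
```

[Under these assumptions the posterior variance is approximately the squared tuning width divided by the total spike count, so quadrupling the gain should shrink the variance by roughly a factor of four.]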
It turns out, however, that there is a real problem for the neurophysiologists. In the 1990s Jack Gallant recorded the response properties of neurons in a free-looking situation in monkeys, reconstructing after the fact what had fallen on the classical RF of each neuron. The neurons did not behave the way they did in the standard recording situation: they threw out a paltry few spikes even when precisely the required contrast happened to land exactly on the receptive field. Was it anesthesia, active vision, or something else that produced the difference? Maybe the Deneve reframing of leaky integrate-and-fire is part of the answer.