No reconstruction, no impenetrability (at least not much)
A commentary on ``Is vision continuous with cognition?'' by Z. Pylyshyn

Shimon Edelman
School of Cognitive and Computing Sciences
University of Sussex at Brighton
Falmer BN1 9QH, UK

September 1998


Two of the premises of the target paper -- surface reconstruction as the goal of early vision, and inaccessibility of intermediate stages in the process presumably leading to such reconstruction -- are questioned and found wanting.

According to Pylyshyn, a certain chunk of vision machinery, which he calls the early vision module, is not accessible to ``cognitive'' intervention; at most, attentional control over the locus of application of that module can be exercised. This approach is part of a general strategy that treats the mind as modular [Fodor, 1983], seeking to divide it into various cognitive faculties (presumably, each of these is to be conquered by theorists later, when the genies are securely confined to their bottles). In vision research, this amounts to postulating that all the visual processing between the retinal image and the mythical 2.5-D Sketch [Marr, 1982] is inaccessible to the rest of cognition.

For this claim to be of interest, the cognitively impenetrable early vision module must be charged with a substantial portion of the visual processing burden. If all the vision module did were to take a few small steps away from the raw image (say, detecting abrupt intensity transitions or ``edges'' in the stimulus image), the vision genie would be too puny to justify imprisonment.

The crucial issue, which the target article commendably puts squarely on the table in section 7, is, then, this: what is the output of the visual system? If there was ever a $64K question in vision research, this is surely it: if you know the answer, you know the nature of the internal representation of the visual world. Pylyshyn offers one possible answer: in section 7, he claims that ``...evidence favors the view that some depth-encoded surface representation of the layout [of the scene] is present in the output of the early-vision system.''

Such a surface representation must be both explicit and obligatory (``automatic''), as per the 2.5-D Sketch doctrine [Marr, 1982], if the modularity thesis is to be worth defending in the context of vision. In other words, the system must maintain a depth representation of all the visible surfaces at all times -- or else suffer the consequences of its inability to salvage the intermediate representations, to which, alas, the mind has no access (according to Pylyshyn).

In fact, neither of these two alternatives fully corresponds to the psychophysical reality. On the one hand, the postulate of obligatory surface reconstruction is undermined (i) by the scarcity of empirical support, (ii) by the ability of current models of recognition and categorization to do without surface representation as such, and (iii) by the indications that arbitrarily sketchy representations are passed off routinely as the real thing by the visual system. On the other hand, cognitive control can be easily exerted over the system's response to seemingly arbitrary combinations of very low-level features such as individual dots and lines. Let us consider each of these issues in turn.

Although some experiments testing the idea of explicit surface reconstruction have been carried out, the interpretation of their results is debatable. Studies that actually claim to have demonstrated that under certain circumstances surfaces are represented explicitly tend to rely on the subject's report of the perceived surface [Treue et al., 1995], a technique prone to what Dennett (1991) terms an internal revision. For example, it is conceivable that the system labels a portion of the visual field as ``that surface'' while actually marking only a tiny minority of the relevant pixels -- perhaps those near the experimenter's probe -- as belonging to it (see [Dennett, 1991], p.344).1 In 1992, Nakayama and Shimojo summarized their psychophysical study of surface interpolation as follows: ``We have suggested that sampled images can be associated with surfaces, not mentioning the representation of surfaces themselves. [...] Because we have no specific data to address this issue directly, we can only speculate.'' (pp.1362-3). At present, the issue of explicit representation of surfaces is still undecided: the survey of [Pessoa et al., 1998] ends with a series of six conclusions, which by no means settle the debate one way or the other.2

A serious problem facing the hypothesis of surface reconstruction stems from the indications that the perceptual functions thought by Marr to be the culmination and the very purpose of reconstruction -- object recognition and categorization -- need not actually involve anything like the 2.5-D Sketch [Bülthoff et al., 1995; Edelman and Duvdevani-Bar, 1997; Mel, 1997]. This raises the question of why, in perceptual scenarios that do not call for an interaction with object surfaces, the visual system should bother with the computationally intensive and error-prone reconstruction in the first place. Indeed, mounting evidence (some of it cited in the target article) suggests that little of the scene structure beyond its general layout (described in terms of object categories, not raw surfaces) may be retained even in short-term memory [O'Regan, 1992; Blackmore et al., 1995; Simons, 1996].

When the uncertainty of actual surface reconstruction is combined with Pylyshyn's belief that anything less than a reconstructed surface is not accessible to cognition,3 common perceptual phenomena that are the mainstay of classical visual psychophysics become great puzzles. For instance, one wonders why it is so easy to have subjects respond to simple stimuli consisting of dots or lines -- entities that, according to Pylyshyn's claim, are locked inside the early vision module (in Marr's terminology, these would be part of the Primal Sketch, a hypothetical stage preceding the surface reconstruction in the 2.5-D sketch; an example of a task involving such stimuli is vernier discrimination, mentioned in section 6.3 of the target article). Unless the ``cognitively impenetrable'' early vision module is assigned a more substantial role than chasing around a few points in the visual field, Pylyshyn's thesis loses much of its poignancy.


Blackmore et al., 1995
Blackmore, S. J., Brelstaff, G., Nelson, K., and Troscianko, T. (1995).
Is the richness of our visual world an illusion? Transsaccadic memory for complex scenes.
Perception, 24:1075-1081.

Bülthoff et al., 1995
Bülthoff, H. H., Edelman, S., and Tarr, M. J. (1995).
How are three-dimensional objects represented in the brain?
Cerebral Cortex, 5:247-260.

Bülthoff and Mallot, 1988
Bülthoff, H. H. and Mallot, H. A. (1988).
Interaction of depth modules: stereo and shading.
Journal of the Optical Society of America, 5:1749-1758.

Dennett, 1991
Dennett, D. C. (1991).
Consciousness explained.
Little, Brown & Company, Boston, MA.

Edelman and Duvdevani-Bar, 1997
Edelman, S. and Duvdevani-Bar, S. (1997).
A model of visual recognition and categorization.
Phil. Trans. R. Soc. Lond. (B), 352(1358):1191-1202.

Fodor, 1983
Fodor, J. A. (1983).
The modularity of mind.
MIT Press, Cambridge, MA.

Marr, 1982
Marr, D. (1982).
Vision.
W. H. Freeman, San Francisco, CA.

Mel, 1997
Mel, B. (1997).
SEEMORE: Combining color, shape, and texture histogramming in a neurally-inspired approach to visual object recognition.
Neural Computation, 9:777-804.

Nakayama and Shimojo, 1992
Nakayama, K. and Shimojo, S. (1992).
Experiencing and perceiving visual surfaces.
Science, 257:1357-1363.

O'Regan, 1992
O'Regan, J. K. (1992).
Solving the real mysteries of visual perception: The world as an outside memory.
Canadian J. of Psychology, 46:461-488.

Pessoa et al., 1998
Pessoa, L., Thompson, E., and Noe, A. (1998).
Finding out about filling in: A guide to perceptual completion for visual science and the philosophy of perception.
Behavioral and Brain Sciences, -:-.
in press.

Simons, 1996
Simons, D. J. (1996).
In sight, out of mind: When object representations fail.
Psychological Science, 7:301-305.

Treue et al., 1995
Treue, S., Andersen, R. A., Ando, H., and Hildreth, E. C. (1995).
Structure-from-motion: Perceptual evidence for surface interpolation.
Vision Research, 35:139-148.

The translation was initiated by Shimon Edelman on 1998-09-02


... p.344).1
Experiments that employ a depth probe to assess the subject's percept of the surface at a chosen spot in the visual field [Bülthoff and Mallot, 1988] cannot help interfering with the very process they aspire to measure. In these experiments, the visual system is effectively called upon to produce on demand an estimate of the perceived depth at the probe; the possibility that elsewhere in the visual field there is no representation of anything like surface depth or orientation is not ruled out.
... other.2
In the target article, Pylyshyn cites Pessoa et al., albeit in a different context.
... cognition,3
``...there is no evidence that ...outputs of specialized subprocesses are available to cognition in the normal course of perception'' (section 7; ``subprocesses'' are the intermediate steps that presumably lead to the computation of the surface layout).
