School of Cognitive and Computing Sciences
University of Sussex at Brighton
Falmer BN1 9QH, UK
According to Pylyshyn, a certain chunk of vision machinery, which he calls the early vision module, is not accessible to ``cognitive'' intervention; at most, attentional control over the locus of application of that module can be exercised. This approach is part of a general strategy that treats the mind as modular [Fodor, 1983], seeking to divide it into various cognitive faculties (presumably, each of these is to be conquered by theorists later, when the genies are securely confined to their bottles). In vision research, this amounts to postulating that all the visual processing between the retinal image and the mythical 2.5-D Sketch [Marr, 1982] is inaccessible to the rest of cognition.
For this claim to be of interest, the cognitively impenetrable early vision module must be charged with a substantial portion of the visual processing burden. If all the vision module did were to take a few small steps away from the raw image (say, detecting abrupt intensity transitions or ``edges'' in the stimulus image), the vision genie would be too puny to justify imprisonment.
The crucial issue, which the target article commendably puts squarely on the table in section 7, is, then, this: what is the output of the visual system? If there was ever a $64K question in vision research, this is surely it: if you know the answer, you know the nature of the internal representation of the visual world. Pylyshyn offers one possible answer: in section 7, he claims that ``...evidence favors the view that some depth-encoded surface representation of the layout [of the scene] is present in the output of the early-vision system.''
Such surface representation must be both explicit and obligatory (``automatic''), as per the 2.5-D Sketch doctrine [Marr, 1982], if the modularity thesis is to be worth defending in the context of vision. In other words, the system must maintain depth representation of all the visible surfaces at all times -- or else suffer the consequences of its inability to salvage the intermediate representations, to which, alas, the mind has no access (according to Pylyshyn).
In fact, neither of these two alternatives fully corresponds to the psychophysical reality. On the one hand, the postulate of obligatory surface reconstruction is undermined (i) by the scarcity of empirical support, (ii) by the ability of current models of recognition and categorization to do without surface representation as such, and (iii) by the indications that arbitrarily sketchy representations are passed off routinely as the real thing by the visual system. On the other hand, cognitive control can easily be exerted over the system's response to seemingly arbitrary combinations of very low-level features such as individual dots and lines. Let us consider each of these issues in turn.
Although some experiments testing the idea of explicit surface reconstruction have been carried out, the interpretation of their results is debatable. Studies that actually claim to have demonstrated that under certain circumstances surfaces are represented explicitly tend to rely on the subject's report of the perceived surface [Treue et al., 1995], a technique prone to what Dennett (1991) terms an internal revision. For example, it is conceivable that the system labels a portion of the visual field as ``that surface'' while actually marking only a tiny minority of the relevant pixels -- perhaps those near the experimenter's probe -- as belonging to it (see [Dennett, 1991], p.344).1 In 1992, Nakayama and Shimojo summarized their psychophysical study of surface interpolation as follows: ``We have suggested that sampled images can be associated with surfaces, not mentioning the representation of surfaces themselves. [...] Because we have no specific data to address this issue directly, we can only speculate.'' (pp.1362-3). At present, the issue of explicit representation of surfaces is still undecided: the survey of [Pessoa et al., 1998] ends with a series of six conclusions, which by no means settle the debate one way or the other.2
A serious problem facing the hypothesis of surface reconstruction stems from the indications that the perceptual functions thought by Marr to be the culmination and the very purpose of reconstruction -- object recognition and categorization -- need not actually involve anything like the 2.5-D Sketch [Bülthoff et al., 1995; Edelman and Duvdevani-Bar, 1997; Mel, 1997]. This raises the question of why, in perceptual scenarios that do not call for an interaction with object surfaces, the visual system should bother with the computationally intensive and error-prone reconstruction in the first place. Indeed, mounting evidence (some of it cited in the target article) suggests that little of the scene structure beyond its general layout (described in terms of object categories, not raw surfaces) may be retained even in short-term memory [O'Regan, 1992; Blackmore et al., 1995; Simons, 1996].
When the uncertainty of actual surface reconstruction is combined with Pylyshyn's belief that anything less than a reconstructed surface is not accessible to cognition,3 common perceptual phenomena that are the mainstay of classical visual psychophysics become great puzzles. For instance, one wonders why it is so easy to have subjects respond to simple stimuli consisting of dots or lines -- entities that, according to Pylyshyn's claim, are locked inside the early vision module (in Marr's terminology, these would be part of the Primal Sketch, a hypothetical stage preceding the surface reconstruction in the 2.5-D sketch; an example of a task involving such stimuli is vernier discrimination, mentioned in section 6.3 of the target article). Unless the ``cognitively impenetrable'' early vision module is assigned a more substantial role than chasing around a few points in the visual field, Pylyshyn's thesis loses much of its poignancy.
The translation was initiated by Shimon Edelman on 1998-09-02