Representation of visual structure


Intelligent processing of visual objects implies the ability to deal with their structure. Understanding the ability of human observers to perceive the arrangement of parts in a composite object, or the arrangement of objects in a scene, is the central concern of current theoretical and experimental work in high-level vision. Theories of visual structure processing necessarily analyze object representation in terms of structural units that are, in a sense, smaller than the entire object or scene. Common to all such theories is the need to explain the origins of the structural units, and, in particular, the dependence of the set of units used by a visual system on its experience with structured stimuli. This dependence — specifically, the probabilistic processes that may govern the acquisition of structural units by the human visual system — is the focus of the vision research in my lab.


What does the phenomenal impression made by a scene consist of? The answer, we conjecture, is scene structure: objects and their locations ("what+where"). If this is so, the impression must persist, even if for a short time, at those stages of the visual pathway where units tuned both to complex shapes and their locations are present (specifically, in areas V4 and TE). Studies of the neural correlates of visual awareness suggest that information represented at this level should be available to conscious access.
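The "what+where" conjecture can be made concrete with a minimal sketch. The Python fragment below is an illustration only, not a description of any actual model: the shape labels and coordinate conventions are invented, and a scene's structure is treated simply as a set of shape-to-location bindings.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical "what+where" binding: the identity of an object together
# with its location in the scene. Labels and coordinates are invented.
@dataclass(frozen=True)
class SceneItem:
    shape: str                      # "what": the object's identity
    location: Tuple[float, float]   # "where": its position in the scene

# A scene's structure is the set of what+where bindings it contains.
Scene = List[SceneItem]

def same_structure(a: Scene, b: Scene) -> bool:
    """Two scenes share structure iff they bind the same shapes to the same places."""
    return set(a) == set(b)
```

On this view, moving a familiar shape to a new place changes the scene's structure even though its "what" content is unchanged — which is exactly the kind of change the recall experiments probe.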

We study the ability of observers to recall, over a few intervening scenes, spatially anchored information concerning scene components, while varying the number of objects and the statistics of their absolute and relative locations. Our results indicate that scene structure — the phenomenal "what+where" — is psychologically real, and is briefly available to conscious recall. Moreover, the representation of such structure is modifiable by statistical learning, which can produce insensitivity to scene changes that fall within the expected norm (see the "inside boundary" condition in the figure on the right), and heightened sensitivity to unusual changes, such as the translation of a familiar spatial arrangement to a new location (see objects 1 and 2 in the "outside boundary" condition).
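One way to picture how statistical learning could yield insensitivity to within-norm changes is as density estimation over exposure: a displaced object is surprising to the extent that its new location is unlikely under the learned distribution of its past locations. The sketch below assumes a simple one-dimensional Gaussian norm, which is an expository assumption, not the form the experiments or any model commit to.

```python
import math

def fit_gaussian(samples):
    """Fit a 1-D Gaussian to the observed locations of an object."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, max(var, 1e-6)   # floor the variance for numerical safety

def surprise(x, mean, var):
    """Negative log-likelihood of a new location under the learned norm."""
    return 0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)

# Exposure phase: the object's horizontal position jitters within a boundary.
exposure = [0.40, 0.45, 0.50, 0.55, 0.60]
mean, var = fit_gaussian(exposure)

inside = surprise(0.52, mean, var)   # displacement within the expected norm
outside = surprise(0.95, mean, var)  # translation to a new location
```

Under the learned statistics, the within-boundary displacement is far less surprising than the translation outside the boundary — mirroring, at the level of a toy model, the pattern of insensitivity and heightened sensitivity described above.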

Joint work with Claudia M. Hunter.

Computational theory and modeling

We are developing a computational model of structure representation, which uses a common low-dimensional coarse code for shape and location, a notion derived from our earlier work on shape recognition and categorization, and supported by a body of recent neurobiological data, particularly the reports of "what+where" neurons in inferotemporal and prefrontal cortices. This effort explores the ability of a computational model of unsupervised learning to mimic the detailed pattern of human performance in the acquisition of composite structural-unit representations: the model will be used as a "subject" in replicas of psychophysical experiments, and as a testbed for new computational ideas and explanations.

We expect this project to facilitate the development of a detailed and explicit, hence explanatory, computational model of statistically driven unsupervised learning of structural primitives for vision. The resulting model will be biologically relevant, being based on findings from monkey electrophysiology. Our research should also result in the development of practical applications in computer vision, where the problem of dealing with object structure is a major challenge. Moreover, understanding the computational basis of structure processing should also be useful in cognitive domains other than vision, notably language, where new approaches rooted in statistical concepts are emerging both in theoretical linguistics and in the empirical field of natural language engineering.
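As an illustration only (the model's actual code is not specified here), a low-dimensional coarse code for location can be realized as a small population of units with Gaussian tuning curves, whose graded responses are concatenated with a shape code to give a common "what+where" vector. The receptive-field centers and tuning width below are invented for the sketch.

```python
import math

# Hypothetical coarse code: five location-tuned units with Gaussian
# receptive fields along a normalized 1-D location axis.
LOC_CENTERS = [0.0, 0.25, 0.5, 0.75, 1.0]   # receptive-field centers (assumed)
SIGMA = 0.2                                  # tuning width (assumed)

def location_code(x):
    """Population response of the location-tuned units to a stimulus at x."""
    return [math.exp(-((x - c) ** 2) / (2 * SIGMA ** 2)) for c in LOC_CENTERS]

def what_where_code(shape_code, x):
    """One low-dimensional vector coarsely binding shape and location."""
    return list(shape_code) + location_code(x)

def dist(u, v):
    """Euclidean distance between two codes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

A useful property of such a code is graded similarity: nearby locations yield nearby code vectors, so small displacements of a shape produce small changes in the joint representation, while large displacements move it far away in the common space.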

Joint work with Nathan Intrator.


Shimon Edelman
Last modified on Thu Jun 16 12:34:09 2005