

The digital magazine of InfoVis.net

Active Vision
by Juan C. Dürsteler [message nº 200]

Despite what we may think, our visual system is incredibly limited. We see only a tiny part of our environment sharply, and in a fragmentary way, as our eyes dart about scanning it. Even more deceptive is the fact that we retain only two or three objects of attention at a time, for a fraction of a second, and which ones we retain depends on the task we are trying to perform. These facts have important consequences for design and visualisation.

Most of us, when looking at our environment, have the sensation that we are seeing everything sharply. One could say that we perceive our environment in its entirety, that we see everything at once. This illusion is far from reality, as we can see simply by considering that to read a book we have to look at the words one by one.

To see this better, just do a simple exercise: stare at the first word of the next paragraph (ACTIVE) and, still looking at that word, try to read the word at the other end of the page.



Surprised? Indeed, it is not possible to read the word on the right while staring at the one on the left. This is because the fovea occupies a tiny area of the retina (a circle about 0.2 mm in diameter) covering only 1% of its surface, yet it is directly connected to 50% of the visual cortex. The fovea contains a high density of photoreceptors and is responsible for sharp vision.

What we see and what we don't see. Vision is sharp only in a very small area around the fixation point, whose image falls directly on the fovea.
In the images we can see an artistic recreation of the area of sharp vision. In the face on the left, the right eye is in foveal vision and appears sharp; in the face on the right, it is the mouth.
We can see the eye or the mouth clearly, but not both at the same time!
The constant movement of the eyes allows us to extract the information we are interested in, giving us the illusion that we see it all.
Source: Image by Luis Fuentes de Oca. Image treatment by the author.

On the other hand, the peripheral retina (the rest of it), with 99% of the surface, connects to only the remaining 50% of the visual cortex, integrating the signals of many individual cells into a limited number of nerve fibres.
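Taking the rounded figures above at face value (1% of the retinal surface feeding 50% of the visual cortex, 99% feeding the other 50%), a quick calculation shows how unevenly cortical resources are spread per unit of retinal area:

```latex
\frac{\text{cortex per unit area (fovea)}}{\text{cortex per unit area (periphery)}}
  = \frac{0.50 / 0.01}{0.50 / 0.99}
  = \frac{50}{0.\overline{50}}
  = 99
```

In other words, under these rounded numbers each square millimetre of fovea is served by roughly 99 times more visual cortex than a square millimetre of peripheral retina, which is consistent with sharp vision being confined to such a small area.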

The result is a high visual acuity zone just in the center of the retina (the fovea) and a large peripheral vision area with very low resolution but very sensitive to movement and spatial location.

Consequently we can talk about foveal vision, which allows us to see sharply and covers a very small solid angle (about 3° of the visual field). This forces us to look straight at an object to see it clearly. Peripheral vision, in contrast, uses the peripheral retina; it provides very low resolution but is very sensitive to movement, contrast and spatial location.
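To get a feel for how small the roughly 3° of foveal vision is, we can convert the angle into a width on a page or screen. The sketch below uses the standard relation w = 2d·tan(θ/2) and assumes a viewing distance of 500 mm; that distance is our assumption, not a figure from the article.

```python
import math

def visual_angle_width(angle_deg: float, distance_mm: float) -> float:
    """Width (mm) covered by a visual angle centred on the line of sight.

    Uses w = 2 * d * tan(angle / 2).
    """
    return 2.0 * distance_mm * math.tan(math.radians(angle_deg) / 2.0)

# Assumed reading distance of 500 mm (illustrative, not from the article)
width = visual_angle_width(3.0, 500.0)
print(f"{width:.1f} mm")  # 26.2 mm
```

At that distance the zone of sharp vision is only about 26 mm wide, roughly a couple of words of ordinary printed text, which fits the observation that we have to move our eyes word by word to read.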

To overcome these severe restrictions, our eyes are constantly moving in small, rapid "jumps" (saccadic movements), scanning our surroundings and gathering information from which the brain supposedly builds a mental map of the environment, giving us the illusion that we "see everything in detail".

This has important implications for the creation of visualisations, graphic charts and human-computer interfaces.

Until quite recently, the prevailing idea was that this mental map held a very rich image of the surrounding world. The concept of Active Vision represents a paradigm shift, since it recognises that this is only an illusion: the "map" is very poor and limited, and vision is a dynamic process.

The brain uses only the fragments of visual information it strictly needs for its mental activity at each instant. In reality, what we store of the environment we are immersed in is incredibly limited, and it is only thanks to associations with past experience that we can give it meaning and richness.

As Colin Ware explains in his highly recommendable book "Visual Thinking for Design", the illusion of having everything in view comes from the fact that we can extract any detail of the world through vision at any time just by moving our eyes, a movement that is, literally, faster than thought and consciousness.

Visual thinking is a dynamic interaction with the environment in which we use both information stored in our brain and information stored in external form (visualisations), in what Ware calls a dance with the environment. Understanding this dance is fundamental to designing effective visualisations.

Let's summarise some of the key concepts that appear in Ware's book along with their implications for design.


The act of perception is determined by two types of processes that take place along the different pathways of the visual cortex.

Perceptual processes: bottom-up and top-down.
Source: image treated by the author.
  • Bottom-up: driven by the visual information present in the light pattern incident on the retina.

  • Top-down: driven by the needs of visual attention, which derive from the tasks the brain is performing.

For a better understanding, look at the image on the right. Follow the arrows, starting with the letter "V", to build the word that the letters form. While we do so we still perceive the faces, but they clearly fade from our consciousness. Conversely, if we try to work out the nature of each face and its facial features, what fades away are the letters and arrows.

So what we see depends, on the one hand, on the patterns of light falling on our retina (bottom-up) and, on the other, on the specific attention that the task we are pursuing demands from our brain (top-down). The bottom-up process provides, in several stages, neural information about the main traits of the image; the top-down processes filter it, making whatever is not the object of our attention "vanish".

If in doubt, watch the clever videos on the dothetest website.

Bottom-up process

There are three basic concepts that help to understand this process:

  • Features. The raw information coming out of the photoreceptors in the retina is processed in different areas of the visual cortex that work in parallel, each identifying particular features of the retinal image. Some areas are specialised in detecting orientation, others size, others respond better to the channels that encode colour, and yet others are specialised in detecting movement or stereoscopic depth. In this way each part of the visual field is processed simultaneously to detect those elementary features, which are passed on to the next stage, while non-relevant information (information that does not trigger the appropriate sets of neurons) is discarded.

  • Patterns. The elementary features detected in the previous stage are combined to produce increasingly complex patterns. Visual space is divided into regions with common features, such as colour, orientation or size. Individual features are linked into long chains that form continuous contours, and in general different patterns built on top of the elementary features begin to emerge.

  • Objects. In the final phase, the sets of relevant patterns, built on top of millions of elementary features, end up "distilled" into a very small set of visual objects: groups of patterns that retain the relevant information of the scene. These visual objects are stored in the so-called visual working memory, which cannot hold more than three objects at a time.

    This very limited capacity of visual working memory has an important impact on what we can actually see and retain. Although the set of patterns that makes up a visual object could be interpreted as a horse, there is no such mental image of the horse; it is more like an associative network that can find a set of past experiences related to that type of visual object and trigger a series of concepts associated with it (neigh, mammal, friendly or hostile depending on our past experience, etc.).
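The storage limit and the associative nature of visual objects described above can be caricatured in a few lines of Python. This is a toy sketch, not a cognitive model: the capacity of three comes from the text, but dropping the oldest object first and the small hypothetical `ASSOCIATIONS` table are our own simplifying assumptions.

```python
from collections import deque

# Capacity taken from the text; treating it as a hard limit is a simplification.
WORKING_MEMORY_CAPACITY = 3

# Hypothetical associative store: a visual object acts as a key that
# triggers related concepts, rather than as a stored picture.
ASSOCIATIONS = {
    "horse": ["neigh", "mammal"],
    "pineapple": ["sweet", "acid"],
}

def attend(stream, capacity=WORKING_MEMORY_CAPACITY):
    """Feed visual objects into a bounded buffer and return what remains."""
    memory = deque(maxlen=capacity)
    for obj in stream:
        memory.append(obj)  # once full, the oldest object is dropped
    return list(memory)

def recall(memory):
    """Concepts triggered by each object currently held in memory."""
    return {obj: ASSOCIATIONS.get(obj, []) for obj in memory}

held = attend(["tree", "fence", "cloud", "horse", "barn"])
print(held)          # ['cloud', 'horse', 'barn']
print(recall(held))  # {'cloud': [], 'horse': ['neigh', 'mammal'], 'barn': []}
```

Using `collections.deque` with `maxlen` makes the capacity limit automatic: appending to a full buffer silently discards the entry at the opposite end.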
Bottom-up and top-down processes in perception. Both processes are shown for the task of finding a pineapple in a set of fruits.
Feature detection (here we have represented only contours and colour) evolves into pattern generation, and from there into the construction of visual objects. All this is stimulated by the attention required by the task the brain is trying to perform (detecting a pineapple).
In this way, the stimuli that do not correspond to the expected features of a pineapple tend to "vanish" (are inhibited). We have depicted this by making the orientation and colour features that do not correspond to those of the pineapple fade away.
Finally they converge into very simple patterns that trigger the concept of "pineapple" associated with them in our brain, including the multiple associations with that concept built up through experience over time, such as "sweet", "acid", "raw", etc.
Source: representation elaborated by the author.

The few visual objects that reach working memory are precisely those that respond best to the task we are trying to perform at that moment. As we will now see, "respond" is meant literally here.

Top-down process, driven by attention

The bottom-up process can be summarised as shown in the attached graphic. Starting from the retinal image (a set of coloured spots that in this example correspond to a series of fruits), the processes of feature detection and pattern generation take place, ending in the construction of visual objects. The arrow labelled "Integration" represents the progression through the steps of the bottom-up process.

At the same time, the complementary top-down process takes place. It modulates the signals of the bottom-up process according to the current cognitive task, symbolised by the arrow labelled "Attention".

Top-down processes are described in the literature as linked to the concept of attention. These processes are driven by the need to accomplish a given task at each moment.

Thus, if we are looking for an object with certain features (colour, orientation, etc.), the top-down process increases the response of the neurons specialised in detecting those particular features (in fact, they send inhibiting signals to the neurons specialised in other types of features), so that their signal becomes more intense than that of the neurons in charge of detecting other features.

The same happens with the generation of patterns and their fusion into visual objects. For example, if we are looking for a pineapple in a set of fruits (as in the graphic representation), the neurons detecting green, yellow and brown will increase their signals (they will "shout" louder).

At the same time, the neurons specialised in detecting orientation will reinforce their signals when they detect lines crossing in a rhomboid pattern, and the patterns compatible with round shapes will increase their response when detected in the image, at the expense of patterns compatible with cylinders or other non-relevant shapes.

Finally only those visual objects that could resemble a reticulated spheroid with a green spot on top will appear in the working memory.
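The gain-and-inhibition mechanism described above can be caricatured in a few lines. This is a toy sketch, not a neural model: each fruit is a hypothetical set of feature labels, a task-relevant feature gets its response doubled and any other feature is halved (both factors are arbitrary assumptions), and only the three objects with the highest average response enter working memory.

```python
# Toy model of top-down attentional gain (illustrative only).
# Each feature has a baseline response of 1.0, so its modulated response
# equals the factor applied to it. GAIN and INHIBITION are assumed values.
GAIN = 2.0        # task-relevant features "shout louder"
INHIBITION = 0.5  # all other features are damped

def salience(features, target):
    """Average feature response after top-down modulation."""
    if not features:
        return 0.0
    return sum(GAIN if f in target else INHIBITION for f in features) / len(features)

def select(scene, target, capacity=3):
    """Names of the objects that win a place in working memory."""
    ranked = sorted(scene, key=lambda name: salience(scene[name], target), reverse=True)
    return ranked[:capacity]

# Hypothetical feature descriptions of the fruits on display
scene = {
    "pineapple": {"green", "yellow", "brown", "rhomboid", "round"},
    "lime":      {"green", "round", "smooth"},
    "banana":    {"yellow", "elongated", "smooth"},
    "apple":     {"red", "round", "smooth", "stem"},
    "cucumber":  {"green", "cylinder", "elongated", "smooth"},
    "grapes":    {"purple", "cluster", "smooth"},
}
pineapple_task = {"green", "yellow", "brown", "rhomboid", "round"}
print(select(scene, pineapple_task))  # ['pineapple', 'lime', 'banana']
```

Averaging rather than summing keeps an object from winning merely because it has more features; that is a design choice of this sketch, not something the article specifies.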

Perhaps the most fascinating aspect of all this is that it is a dynamic process, in which working memory holds objects for only a very brief time, just long enough to keep attending and to accept or discard each object according to the task.

In the end this is a dynamic and iterative process. The brain, driven by attention to the current task, redirects the saccadic movements of the eyes towards the most relevant parts of the image, continuously feeding back into the process until the objective is attained. So if we are in a sports shop looking for a tennis ball, many round objects will show up in working memory, but the visual system will keep feeding back into gaze control in order to find round, yellow and fuzzy objects, dynamically using the contents of working memory as visual objects show up.
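The feedback loop in the tennis-ball example can be sketched as a simple visual search. Under our own assumptions (not the article's), peripheral vision only delivers coarse features such as shape and colour, a fine feature such as texture is checked only once an object is fixated, and gaze always jumps to the most salient object not yet inspected, scored with the same hypothetical gain and inhibition factors as in a toy attention model. The objects and features are invented for illustration.

```python
# Toy visual-search loop (illustrative only). All objects, features and
# factors are hypothetical.
GAIN, INHIBITION = 2.0, 0.5

def salience(features, target):
    """Average top-down-modulated response of the coarse features."""
    if not features:
        return 0.0
    return sum(GAIN if f in target else INHIBITION for f in features) / len(features)

def visual_search(scene, target, max_fixations=10):
    """Fixate objects in order of peripheral salience until one matches.

    scene maps a name to (coarse_features, fine_features).
    Returns (found_name_or_None, fixation_sequence).
    """
    remaining = dict(scene)
    fixations = []
    while remaining and len(fixations) < max_fixations:
        # Next "saccade" goes to the most salient object not yet inspected;
        # on ties, max() returns the first one encountered.
        name = max(remaining, key=lambda n: salience(remaining[n][0], target))
        coarse, fine = remaining.pop(name)  # inspected objects are not revisited
        fixations.append(name)
        if target <= coarse | fine:  # foveal check against the full description
            return name, fixations
    return None, fixations

shop = {
    "yellow football": ({"round", "yellow"}, {"smooth", "panels"}),
    "tennis ball":     ({"round", "yellow"}, {"fuzzy"}),
    "basketball":      ({"round", "orange"}, {"ridged"}),
    "racket":          ({"oval", "black"}, {"strings"}),
}
print(visual_search(shop, {"round", "yellow", "fuzzy"}))
# ('tennis ball', ['yellow football', 'tennis ball'])
```

The round yellow football "shows up" first because it looks right in the periphery, and is only rejected once fixated, which mirrors the continual feedback between attention and gaze described above.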

This explains why we often "don't see" real objects that are clearly in our visual field but not within our sphere of interest or attention. This is what happens in the videos on the dothetest website.

It also helps us understand why visual attributes (line orientation, coloured areas, etc.) emerge as elements of attention and tend to form patterns that are easily associated with concepts (visual variables), which makes them useful in graphics.

We have tried to give here a brief account of recent advances in the cognitive study of vision. We appear to be much more limited than we believe. This knowledge should guide us when we build representations that aim to make existing information more understandable, or when we intend to convey a coherent message in an easy-to-comprehend way.

Links of this issue:

http://www.infovis.net/printRec.php?rec=persona&lang=2#ColinWare   Brief profile of Colin Ware
http://www.dothetest.co.uk/   Awareness test website
© Copyright InfoVis.net 2000-2018