The main challenges in designing a model of object recognition

While it is relatively easy to explain how we see, that is, how our eyes work, it is much more difficult to explain how we interpret what we see and recognise objects for what they are. This essay will first briefly identify some of the main challenges in designing a model of object recognition. A number of models have emerged from cognitive psychology to address these challenges and to explain the processes involved in object recognition. Outlining two of these models, those of Biederman (1987) and Humphreys et al (1995), leads to a better understanding of how objects are recognised.

It is also necessary to consider how evidence from case studies has contributed to supporting the basic components of these two models. Finally, as the two models to be described do not compete with each other, an integration of the two will be proposed to provide a more complete description of the structure and processes of the object recognition system.

There are at least three issues that models of object recognition seek to address. The first is how the cognitive system recognises a three-dimensional object from a two-dimensional retinal image. The second, related issue is whether the object recognition process is viewer-centred or object-centred. That is, is it just as easy to recognise an object no matter what position it is in (if this were the case, an object-centred model would be appropriate)? Or do ease and speed of recognition depend on where the object is in relation to the viewer (in which case a viewer-centred model would be more accurate)? The third is whether we have a stored knowledge of structural descriptions. The two models that follow focus on these issues to differing degrees. As shall be seen later, these challenges are not merely academic: they are also challenges for patients with brain damage resulting in impaired object recognition capabilities.

Biederman (1987, 1990 in Eysenck & Keane, 2000) proposed a model of object recognition in response to a computational model published by Marr & Nishihara in 1978. Marr and Nishihara’s model addressed both the ‘2D to 3D’ problem and the ‘object/viewer-centred’ issue by suggesting that we see objects as various combinations of cylinders.

Biederman’s theory instead made use of thirty-six differently shaped basic components, which he called ‘geons’. He proposed initial stages of identifying basic edges and what he called ‘non-accidental properties’, which essentially amounts to identifying shapes using Gestalt principles. Once components were identified they would be matched to object representations. Biederman argued that because these geons were invariant in shape, they led to a generated 3D image that was object-centred. Biederman’s model therefore seems to have addressed two of the major issues in object recognition quite comprehensively. The model does not, however, adequately explain how components are matched to object representations.

Patient HJA (Humphreys, 1999 in Leek, 2001) seems to provide evidence both for the component aspect of Biederman’s model and for the steps taken in recognising an object. Studies with HJA confirm that he can identify ‘bits’ of objects, and indeed some simple objects, if they are separate, that is, not overlapping. So Biederman would seem to be correct in proposing an initial identification of ‘bits’ according to Gestalt principles. HJA was able to name the ‘bits’ he saw but he was not able to integrate them, which is the next step in Biederman’s model. As he was not able to do this, he was certainly not able to match these ‘bits’ to an object in his structural store of objects.

How components are matched to object representations is a strength of the Humphreys et al (1995 in Eysenck & Keane, 2000) model. This model is known both as a cascaded connectionist model and as an interactive activation and competition model.

Very simply, the model involves four levels of recognition: stored structural descriptions of objects, semantic representations, name representations and category labels. As information passes through this cascade of levels, relevant units are excited and competing units are inhibited; these processes are both bottom-up and top-down. As this model is centred on a structural knowledge store of objects, and therefore on prototypes, it can be said to be object-centred. Evidence for this structural knowledge store comes from case studies.
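
The excitation and inhibition just described can be made concrete with a small sketch. What follows is not Humphreys et al's implementation: it is a toy network that uses a generic interactive activation and competition update rule, and every number, unit label and function name in it (EXCITE, INHIBIT, DECAY, run, the ‘dog’ and ‘cup’ units) is an assumption chosen for illustration. The four levels are simply chained in the order listed above, which is also an assumption.

```python
# Toy interactive activation and competition (IAC) network.
# Illustrative only: parameters, labels and wiring are assumptions,
# not values taken from Humphreys et al (1995).

LEVEL_ORDER = ["structural", "semantic", "name", "category"]

# One unit per candidate object at every level (hypothetical labels).
UNITS = {
    "structural": ["dog-shape", "cup-shape"],
    "semantic": ["dog-meaning", "cup-meaning"],
    "name": ["dog", "cup"],
    "category": ["animal", "utensil"],
}

EXCITE = 0.2    # strength of between-level excitation
INHIBIT = 0.2   # strength of within-level competition
DECAY = 0.1     # pull back towards the resting level
A_MAX, A_MIN, REST = 1.0, -0.2, 0.0


def run(visual_input, steps=50):
    """Return the activation of every unit after `steps` update cycles."""
    n = len(visual_input)
    act = {level: [REST] * n for level in LEVEL_ORDER}
    for _ in range(steps):
        new_act = {}
        for i, level in enumerate(LEVEL_ORDER):
            neighbours = []
            if i > 0:
                neighbours.append(LEVEL_ORDER[i - 1])  # bottom-up source
            if i < len(LEVEL_ORDER) - 1:
                neighbours.append(LEVEL_ORDER[i + 1])  # top-down source
            updated = []
            for u in range(n):
                # Only the structural level receives the visual input.
                net = visual_input[u] if level == "structural" else 0.0
                # Excitation from the matching unit in adjacent levels.
                for nb in neighbours:
                    net += EXCITE * max(act[nb][u], 0.0)
                # Inhibition from competing units within the same level.
                for v in range(n):
                    if v != u:
                        net -= INHIBIT * max(act[level][v], 0.0)
                a = act[level][u]
                if net > 0:
                    delta = (A_MAX - a) * net
                else:
                    delta = (a - A_MIN) * net
                delta -= DECAY * (a - REST)
                updated.append(min(A_MAX, max(A_MIN, a + delta)))
            new_act[level] = updated
        # All levels update together, so activation cascades onwards
        # before any single level has finished settling.
        act = new_act
    return act


if __name__ == "__main__":
    result = run([0.3, 0.1])  # visual evidence favours the first object
    for level in LEVEL_ORDER:
        for label, value in zip(UNITS[level], result[level]):
            print(f"{level:10s} {label:12s} {value:+.2f}")
```

With visual input that favours the first object, the first unit at each level ends up more active than its competitor, which is the sense in which relevant units are excited and competing units inhibited. Because each level also feeds activation back to the level below it, the sketch is both bottom-up and top-down in the loose sense used above.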

One patient in particular, HJA, has provided many examples of the changes in his abilities following a stroke (Humphreys & Riddoch, 1987; Humphreys, 1989). HJA was able to recognise objects much better by touch than by vision alone. This indicates that his semantic store had not been impaired: representations of objects, both physical and semantic, were still stored somewhere. However, perhaps the most striking demonstration of HJA’s problem is that when he was asked to draw the very objects he had had difficulty in naming, his pictures were consistently excellent. Thus, he has mental pictures of everything and can reproduce these, but the bottom-up process of recognition is impaired.

Another patient further illustrates the gap between visual recognition and semantic knowledge (Beauvois, 1982; Beauvois & Saillant, 1985 in Humphreys & Riddoch, 1987). She knew that snow was white because of the association ‘Snow White’ (that is, she had verbal knowledge). She also knew whether snow in a picture was the correct colour or not (indicating she had intact visual knowledge). However, she could not go from correct visual knowledge to correct verbal knowledge. This is evidence of a semantic store that is separate from the structural store in the Humphreys et al model.

Evidence has been cited for aspects of both the Biederman model and the Humphreys et al model. In fact, the models are not in competition: they focus on different processes in object recognition. It would seem, therefore, that a more comprehensive model of object recognition could be arrived at by combining the two models.

In this case, Humphreys et al’s model would provide the framework for how the system works. That is, the new model would still be connectionist, interactive and both top-down and bottom-up driven. Within this framework, Biederman’s geons and his steps in object formation should be added so that the actual process, especially in the early stages of recognition, could be described. Finally, Davidoff and Warrington (2000, in Leek, 2001, p. 266) have suggested through their own work with a patient with only higher-level recognition problems that “… the high-level mechanisms that support object constancy are functionally distinct from those that are involved in the derivation of integrated lower-level perceptual representations of visual stimuli”. So if the Humphreys et al model could be extended to include this more complex higher-order recognition, then the new model would address all three challenges in the study of object recognition as well as being supported by the range of case studies available.

Two of the most important models in object recognition have been described. Although they focus on different aspects of the cognitive processes involved in object recognition, they are both valid. The case studies cited have provided evidence for a stored knowledge of structural descriptions. As Leek (2001, p. 265) puts it: “This pattern of performance suggests that low-level perceptual processes are functionally distinct from other intermediate stages in recognition that support the derivation of integrated perceptual representations of visual stimuli”.

Certainly HJA’s case, for example, does support a level of stored structural descriptions of objects because he is able to name things from memory, draw them from memory and describe them. Thus, his semantic and categorical stores are intact and so is his vision; it is the mapping of structures onto these stores that has been broken. Further case studies have also suggested that more work needs to be done on expanding our knowledge of the higher end of processing. To this end, an integrated model has been suggested as a way forward.