How do Inversion and Contrast-Reversal Limit Face Recognition?


A thesis submitted in conformity with the requirements for the degree of Master of Arts

Graduate Department of Psychology

University of Toronto

© Copyright by Carl Michael Gaspar 2001

National Library of Canada

Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.

The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

How do Inversion and Contrast-Reversal Limit Face Recognition?
Master of Arts, 2001
Carl Michael Gaspar
Department of Psychology, University of Toronto

Abstract

We analyzed the face inversion (FI) and contrast-reversal (CR) effects using a black-box model of recognition performance (Gold, Bennett, and Sekuler, 1999b), which quantifies performance in terms of three independent factors. Three advantages of this approach were described. First, this quantitative description of the face effects provides new information about these effects. Second, our black-box model reveals an ambiguity in the existing research that can only be resolved by the current study. Third, the literature on perceptual learning suggests that efficiency and not noise should change with both FI and CR. This was confirmed by the results. A separate experiment employed response classification to estimate observers' perceptual templates. An analysis of the changes in these templates with FI and CR verified the idea that these stimulus manipulations reduce an observer's sampling efficiency. The development of inflexible representations with learning is discussed as a possible cause of these efficiency losses.

Table of Contents

Introduction
    Interpreting the Face Inversion and Contrast-Reversal Effects
    A Black Box Approach
    From Quantitative Analyses to an Explanation of the Face Effects
    The Inherent Ambiguity of Simple Threshold Measurements
    An Independent Measure of Information Use: Response Classification
    Perceptual Learning and the FIE and CRE
    The Development of Inflexible Representations
    Hypothesis and Overview

Methods
    Observers
    Apparatus
    Stimuli
    Noise Fields
    Procedure
    Ideal Observer

Results
    Equivalent Input Noise
    Internal-to-External Noise
    Calculation Efficiency
    Analyses of NCORR
    Percentage of Significant Pixels
    Qualitative Analyses of Classification Images

Discussion
    Experiment 1: Black Box Model of Efficiency
    Experiment 2: Classification Images
    Impact of Results on Assumptions about the Relevance of Information Type
    What Does It Mean to Have an Efficiency Deficit?

References

Appendix

Limits to Face Recognition 1

Introduction

Face recognition is an integral part of everyday living. Research in computer vision has also shown that face recognition is an enormously difficult task to perform, perhaps leading to the popular idea that face recognition is subserved by cognitive processes that are qualitatively different from those underlying normal object recognition.

In cognitive neuroscience and perceptual psychology, this idea has taken two major forms: 1. Human face recognition is subserved by an innate mechanism sensitive to face stimuli, which is anatomically and functionally separate from normal recognition processes (Farah et al., 2000; 1998; Kanwisher, 2000). 2. Through repeated social interactions we are naturally expert at face recognition, and this ability is, thus, the strongest manifestation of expert object recognition, which extracts different types of information from the given stimulus (Carey, 1992; Gauthier et al., 2000; 1998). So while the debate about the origins of our face recognition ability remains animated, there has been striking consensus on the fundamental idea that this ability is qualitatively different from normal object recognition (whether normal is defined as non-face object recognition or as non-expert object recognition).

In behavioral research, the idea that face recognition processing is fundamentally unique has been encouraged by a focus on the face inversion effect (FIE) and contrast reversal effect (CRE) as phenomena that demonstrate different uses of information type. The FIE is a degradation in recognition performance, measured by reaction time (RT) or accuracy, that occurs when the face stimulus that must be recognized is rotated 180 degrees in the picture plane. The CRE is a similar degradation that occurs when the contrast of a face stimulus is reversed, similar to a photographic negative. Many researchers speculate that face inversion disrupts the normal processing of relational information, and the current debate has been on understanding what this relational information is. The less active debate on contrast-reversal effects has produced more alternatives to the information-type explanation, but it has still been largely driven by concerns, for example, about how we interpret lighting and extract 3-D, view-invariant information.

We suggest that, for the initial stages of exploratory research on face recognition, questions about the possible use of specific types of information may be overly ambitious. Instead, we seek to provide a quantitative description of the performance limitations elicited by inversion and contrast reversal. In other words, we begin with the question: What are the Face Inversion and Contrast Reversal Effects?

Interpreting the Face Inversion and Contrast Reversal Effects

The primary studies cited as support for a qualitatively different face recognition process have been examinations of the face inversion effect (FIE), and its variations like the Thatcher Illusion and chimeric face effect (or composite face effect). In 1969, Yin demonstrated that inversion produces a greater decrement in identification performance for face stimuli than for pictures of other types of objects. This phenomenon is significant because face inversion changes nothing about the stimulus that is important for its recognition; the stimulus, as a whole, retains its relative spatial frequency composition. As such, any decrement in performance introduced by inversion can reasonably be attributed to a change in how we use the information that is there.

As described earlier, many researchers have explained this apparent difference in information usage in terms of separate mechanisms underlying face and normal object recognition, which rely on different types of information. The most that can be logically inferred from face inversion, however, is that it causes a change in information usage. When the inference is stated in this way, we are really only emphasizing the fact that human face recognition must be limited by some factor internal to the observer under study. The ability to use a specific type of information is just one of these possible factors. Inversion may also decrease the overall amount of information we use, the relative amount of useful information we use, or the degree of variability in this usage.¹

Unfortunately, the notion that the FIE reflects the use of different information types often serves as a working assumption behind many studies conducted on other aspects of face recognition like facial expression (Calder et al., 2000), contrast polarity (Hole, George, and Dunsmore, 1999), neural mechanisms (Haxby, Ungerleider, Clark, Schouten, Hoffman, and Martin, 1999; for an exception, see Aguirre, Singh, and D'Esposito, 1999), and comparative psychology (Parr, Dove, and Hopkins, 1998). In order to gain a clear understanding of what the FIE and CRE can tell us about face recognition, one must first gain a precise understanding of what these effects are. In other words, the germane question is: How do inversion and contrast reversal limit performance? We propose a black-box model of calculation efficiency and internal noise to aid in answering this new question.

¹ There are a few experimental results that suggest face inversion disrupts the use of 'relational' information. The interested reader is referred to the Appendix for criticism of this research. The intention is not to argue against the idea that inversion disrupts relational processing, but to outline the limitations of current experimental methods in showing that this is true. This is one of the reasons why we have been careful not to assume much about what our results will be.

A Black Box Approach

A methodology is required to specify how recognition performance is limited during inversion or contrast reversal, without making any assumptions about the nature of face recognition in particular. Ideally, this would involve a generalized model of sensory performance that allows limitations to be reduced to two or more independent and measurable quantities. The general, or simple, nature of such a model would allow much room for speculation on the mechanistic properties involved in performance changes across a condition. At the same time, independence among parameter values provides us with the potential to rule out entire categories of explanation.

These two goals can be achieved by treating the observer in a psychophysical task as a black box, a sensory device whose general, internal properties can only be deduced from external characteristics. In this study, for example, when one measures hypothetical property X based on an observer's responses, one can conclude that the observer behaved as though limited by a certain amount of property X. An extension of the black box model of calculation efficiency described by Pelli (1990) will be used in this study (see Figure 1 for a schematic of Pelli's model). The relationship of interest, also used to analyze changes that occur with learning in Gold, Bennett, and Sekuler (1999b), is:

$$E_t = k\,(N_e + N_{eq}) \qquad \text{(Eq. 1)}$$

where Et is the contrast energy of the signal at threshold, Ne is the power spectral density of externally added noise, Neq is the constant for the energy of the internally added noise, and k is a constant inversely proportional to calculation efficiency. The exact model used here replaces k with an expression, k', which includes a new term for proportional noise Nm (or m) along with the original k that denotes the inverse of efficiency:

$$k' = \frac{k\,(1 + m)}{1 - k\,m} \qquad \text{(Eq. 2)}$$

In any case, our study is only concerned with consistent changes in the parameters k, Neq and Nm across the conditions of contrast reversal and inversion. Two independent procedures can be used, one to derive k' and Neq, and one to derive Nm (following Gold et al., 1999b). Therefore, inconsistent changes in Nm should force one to interpret a consistent k' change as a condition-driven change in calculation efficiency (k). Aside from this complication, the relationship modeled in Eq. 1 is a relatively simple one. It is a linear function of Ne, with k' as its slope and an x-intercept at -Neq. As constants, these two parameters are necessarily independent. If our psychophysical responses can be modeled by Eq. 1, then one can exploit the potential to narrow explanations of the contrast reversal and inversion effects. For example, factors which naturally affect k' but not Neq can exclude a wide range of other factors. Some examples of what these factors may be are described next.
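As an illustration of how the slope and equivalent input noise might be read off a set of thresholds, the following is a minimal sketch, assuming thresholds are expressed as contrast energy and that a plain least-squares line is an acceptable fit. The function name `fit_noise_masking` and the data values are hypothetical and do not come from the thesis; the actual fitting procedure is the one given in the Methods.

```python
import numpy as np

def fit_noise_masking(external_noise, threshold_energy):
    """Fit E_t = k' * (N_e + N_eq) by ordinary least squares.

    Returns (k_prime, n_eq). Names are illustrative, not from the thesis.
    """
    n_e = np.asarray(external_noise, dtype=float)
    e_t = np.asarray(threshold_energy, dtype=float)
    slope, intercept = np.polyfit(n_e, e_t, 1)  # E_t = slope * N_e + intercept
    k_prime = slope
    n_eq = intercept / slope  # the fitted line crosses zero at N_e = -N_eq
    return float(k_prime), float(n_eq)

# Hypothetical, noise-free thresholds generated with k' = 2.0 and N_eq = 0.5.
n_e = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
e_t = 2.0 * (n_e + 0.5)
k_prime, n_eq = fit_noise_masking(n_e, e_t)
print(f"k' = {k_prime:.3f}, N_eq = {n_eq:.3f}")
```

Because the two fitted quantities come from the slope and the intercept of the same line, a manipulation that changes only one of them shows up as a pure rotation or a pure horizontal shift of the fitted function.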

[Figure 1 schematic: the stimulus and external noise are combined with equivalent input noise, passed through a calculation that yields a decision variable, and then a decision.]

Figure 1. Black-Box Model of Recognition Performance. This is the model implied by Eq. 1 (after Pelli, 1990). Eq. 3 is different because it doesn't assume that the calculation is necessarily contrast-invariant.

It is important to emphasize that the extended model requires only that we reinterpret k in the original equation in light of changes in Nm, which are estimated using a procedure (also explained in Burgess & Colborne, 1988) that is wholly independent of that used to estimate Neq and k (now k') in Eq. 1. Issues concerning actual measurement and the applicability of these models to psychophysical behavior can, therefore, be treated separately. These are discussed next.

Equivalent Input Noise. No sensory device is perfect. Any physical transmission of a signal will degrade that signal by adding some degree of noise, intrinsic to the particular process or device used. The internally transformed stimulus that human observers base their decisions on in a given sensory task is degraded in the same way. Factors known to degrade visual stimuli, accounting for some of the noise in the effective stimulus, are the quantum fluctuation of light, the limited resolution of our retinal aperture and the variability of neuron firing (Banks, Sekuler, & Anderson, 1991). Pelli (1990) describes how such internally added and, thus, inaccessible noise (Na) is estimated in the case of audio amplifiers, and then shows how an observer's Na can be estimated using similar principles. Very briefly, it is known that linear amplifiers obey a simple rule (Eq. 3), where Nin is the power spectral density of noise inputted to the amplifier, and Nout is the power spectral density measured at the amplifier's output. Pelli notes that, at low levels of Nin, Nout is chiefly determined by Na such that, up until the point where Nin = Na, Nout is approximately equal to a constant proportion of Na in log-log coordinates. It is said that, at low levels of external noise (Nin), internal noise (Na) dominates. Conversely, where Nin contributes relatively greater variance, it dominates over Na and Nout is approximately equal to a linear function of Nin alone (these approximations are qualified in the next paragraph). In order to estimate a device's Na, one simply needs to find the value of Nin at which the device's measured Nout begins to change with Nin. In linear coordinates, this amounts to finding the x-intercept of the empirical Nout(Nin) function. Since Na is estimated in reference to the externally added noise (usually energy), it is also called equivalent input noise (Neq): this makes the procedure transparent. Neq will now be used in place of Na.²

² Although the math is simply linear, the term noise dominance and the log-log properties of Eq. 3 seem to imply something more, a physical property of noise perhaps. This is not true. The above description of noise effects in amplifiers all follows from the hypothetical, linear relation in Eq. 3. In fact, it can be shown that, for Nout measured at large Nin, Nout/Nin approaches a reasonable approximation to the true slope. Perhaps this is the reason why Pelli claims that calculation efficiency estimates require only a single threshold measurement. This may seem like pointless explanation of some trivial facts, but the belief that the term 'dominance' implies some special property of noise may cause important misunderstandings.
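The two claims about the amplifier, that internal noise can be recovered from an x-intercept and that Nout/Nin approaches the true slope at large Nin, can be checked numerically. The sketch below is only an illustration under a stated assumption: it takes the linear rule to have the form Nout = G (Nin + Na), with the internal noise added before a gain G, which is the form the description of Eq. 3 implies but which is an assumption here. The symbol G, the function name, and all numerical values are hypothetical.

```python
def amplifier_output_noise(n_in, gain, n_a):
    """Assumed linear rule: internal noise N_a is added before the gain."""
    return gain * (n_in + n_a)

gain, n_a = 3.0, 0.2  # hypothetical values

# Ratio N_out / N_in at a low and a high input-noise level.
ratio_low = amplifier_output_noise(0.02, gain, n_a) / 0.02
ratio_high = amplifier_output_noise(200.0, gain, n_a) / 200.0

# Recover N_a from two measurements via the x-intercept of the line.
x1, x2 = 1.0, 2.0
y1, y2 = (amplifier_output_noise(x, gain, n_a) for x in (x1, x2))
slope = (y2 - y1) / (x2 - x1)
x_intercept = x1 - y1 / slope  # lies at -N_a under the assumed rule
print(ratio_low, ratio_high, -x_intercept)
```

At the low input level the ratio is far above the gain because the internal term dominates, while at the high input level the ratio is within a fraction of a percent of the gain, which is all that 'dominance' amounts to under a linear rule.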

The only three physical properties assumed in Eq. 3 are a linear relationship, an order of effects, and additivity of noise; the order specified is that noise is added before the amplification or proportion constant. In the case of amplifiers, these properties usually hold. However, the choice of late or early noise seems to be largely based on what you are interested in modeling (please see the Appendix for a brief discussion). Lastly, separate noise sources in any device add simply because they are a property measured by variance, and variances add.

The two real issues to be confronted in applying equivalent input noise measurement to human sensory behavior are thus: 1. How is 'output' noise measured from a psychophysical response? 2. Is our approximation of the human Nout linear in the inputted noise? Fortunately, Pelli (1990) and others (Burgess et al., 1981; Legge et al., 1987) have shown that signal contrast power at threshold (c²) is similar to Nout in that:

$$c^2 = N_e + N_{eq} \qquad \text{(Eq. 4)}$$

This means that an observer's Neq can also be estimated by calculating the x-intercept of a linear input-output relationship. However, we are now interested in the c²(Ne) relationship.

Calculation Efficiency. Since c² is proportional to the contrast energy Et of a signal, Eq. 4 becomes Eq. 1. Now the proportionality factor k' takes on a special meaning. Rearranging terms, one can see that k' in Eq. 1 is:

$$k' = \frac{E_t}{N_e + N_{eq}} \qquad \text{(Eq. 5)}$$

Signal energy at threshold, Et, tells us how much signal an observer needs in order to perform at a specified level of performance, a percentage of correct responses in a psychophysical task. This percent correct, in turn, can be uniquely related to a separate signal-noise ratio, one that tells us exactly how much signal an ideal observer should need in order to perform at that percent correct (k' ideal). This is an optimal signal-noise ratio that is based solely on the type of psychophysical task and the stimuli involved, and thus reflects the intrinsic difficulty of the task. The derivation for k' ideal in our experiment is described in the Methods.

In terms of the general model used here (outlined in Figure 1), k' ideal is best understood as a 'final outcome' signal-noise ratio, which must somehow account for k', the hypothetical signal-noise ratio of the effective stimulus just prior to calculation (Eq. 5). Physically, any transformation of the effective stimulus can only reduce or maintain its signal-noise ratio: it is assumed that noise cannot be selectively subtracted, nor can the signal be selectively amplified. If we assume the model outlined in Figure 1, and represented by Eq. 1, any further reduction to the signal-noise ratio is limited to a signal reduction. This conclusion gives rise to the interpretation of k' as a measure of one's ability to use the effective stimulus efficiently. For example, if one's strategy for transforming the effective stimulus into a decision variable deviates from optimal in any way, one will need more signal energy than an ideal observer would to perform at the same level. In other words, k' will be larger than k' ideal. In order to quantify this deviation from ideal, we express k' ideal as a percentage of k'. This is called the calculation efficiency. We use a percentage in order to emphasize that this statistic refers to the actual proportion of the signal energy used by an observer.
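The arithmetic behind calculation efficiency is a single ratio, sketched below for concreteness. This is a minimal illustration, assuming that both slopes were measured at the same percent-correct criterion; the function name and the slope values are hypothetical, not results from the thesis.

```python
def calculation_efficiency(k_ideal, k_human):
    """Return efficiency as a percentage: 100 * k'_ideal / k'.

    Both arguments are slopes of the threshold-versus-noise function and
    must be positive. A human slope above the ideal slope gives a value
    below 100.
    """
    if k_ideal <= 0 or k_human <= 0:
        raise ValueError("slopes must be positive")
    return 100.0 * k_ideal / k_human

# Hypothetical slopes: the human observer needs 50 times the ideal energy.
eff = calculation_efficiency(k_ideal=0.5, k_human=25.0)
print(f"efficiency = {eff:.1f}%")
```

Read this way, an efficiency of a few percent says that the observer behaves as though only that fraction of the available signal energy reached the decision.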


Multiplicative Noise. There is one additional factor called multiplicative noise (Nm), which is qualitatively similar to Na but limits performance in the same way that sub-optimal sampling does. Nm is a type of internally added noise whose spectral density is dependent on the contrast level of the effective stimulus. In other words, Nm is a noise source that is proportional to signal energy, and to the spectral densities of external and internal, constant noise (Gold et al., 1999; Lillywhite, 1981). Since the presence of Nm will cause internal variability to increase with external noise, it will also result in an increase in the slope of the noise-threshold function that has nothing to do with the efficiency of calculation. The important qualitative difference between efficiency and Nm is that efficiency has to do with sampling (Pelli, 1990) and Nm has to do with the amplification of internal variability (Lillywhite, 1981), which relate to two very different phenomena. In order to separate the effects of these two variables, we estimate the total internal noise of an observer, using the double-pass technique described by Burgess & Colborne (1988; see Methods for more detail). It can be shown that, at high levels of externally added noise, proportional noise rather than a constant internal noise is the effective determinant of psychophysical performance (Murray et al., to be submitted; Burgess & Colborne, 1988). Any change in an observer's total internal noise is thus more likely to be related to a change in a proportional rather than a constant noise source. In this study, total internal noise is measured along with Neq and k', and we attribute any condition-related changes of this measure to a change in proportional noise.
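To see why proportional noise can masquerade as an efficiency change, it helps to evaluate the extended slope k' = k(1 + m)/(1 - km) for a fixed k and increasing m. The sketch below is illustrative only; the function name and the values of k and m are hypothetical, and the restriction km < 1 is simply what the formula requires to stay finite and positive.

```python
def effective_slope(k, m):
    """Extended slope k' = k (1 + m) / (1 - k m); requires k * m < 1."""
    if k * m >= 1:
        raise ValueError("the expression is only defined for k * m < 1")
    return k * (1 + m) / (1 - k * m)

k = 0.5  # hypothetical inverse-efficiency constant, held fixed
slopes = [effective_slope(k, m) for m in (0.0, 0.5, 1.0)]
print(slopes)
```

With k held fixed, the measured slope still rises as m grows, which is why an independent estimate of proportional noise is needed before a slope change is attributed to calculation efficiency.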


From Quantitative Analyses to an Explanation of the Face Effects

Earlier on, this paper stressed that researchers had focused too narrowly on explaining the FIE and CRE in terms of different types of information processing. Our application of a black box model to face recognition diverts attention away from explaining anything at all. Instead, the purpose of this quantitative analysis is to describe what the face inversion and contrast reversal effects are. Once we know the relative contributions of different limiting factors to face inversion and contrast reversal, we can then decide what it is we have to explain in the first place. We have shown that our black box model relies on a few simple assumptions and, thus, characterizes psychophysical decisions in a general way. This is why we are confident in its applicability to face recognition. Furthermore, we have shown that it characterizes performance-limiting factors that are theoretically independent. This is why we are confident that our quantitative descriptions of the FIE and CRE will be unique and precise. For example, if we know that inversion introduces internal additive noise of amount X and a decrease in calculation efficiency of amount Y, we can be certain that the change in X relative to the change in Y is a significant measure. This provides the potential to narrow the effects of inversion and contrast reversal to effects of a specific kind.

However, one might wonder about the kind of explanation that can be developed from knowledge about the relative changes in calculation efficiency, and internal-additive and -proportional noise. Each of these has been well defined, but little has yet been said about the concrete ways in which our calculation efficiency and internal noise can change, for example. If not much else has been analyzed using these methods, then how do we make sense of these results? We simply argue that, when there is much more to know about how different perceptual and cognitive phenomena relate to efficiency and noise, the generality and independence of these measures will be invaluable in relating these phenomena to each other; perhaps, even in surprising and informative ways.

Fortuitously, a few recent studies have used a black box model of efficiency and noise to analyze a variety of perceptual phenomena. Efficiency and noise have been used to characterize the change in acuity that occurs with aging (Bennett, Sekuler and Ozin, 1999), the improvement in psychophysical performance with perceptual learning (Gold, Bennett and Sekuler, 1999b), the degradation of face recognition in the periphery (Makela et al., 2001), the fluctuations in sensitivity to contrast differences (Legge, Kersten, & Burgess, 1987), and the degree to which quantum noise can account for the shape of our contrast sensitivity function (CSF) (Pelli, 1990). In fact, it turns out that there are enough relevant results in both this research and the face recognition literature to form a reasonable hypothesis about which black box parameter should change systematically with inversion and contrast reversal. This hypothesis will be presented shortly. However, it is imperative to emphasize the intrinsic importance of measuring calculation efficiency and internal noise changes with FI and CR. Therefore, one last argument concerning the utility of our black box model will be presented.

The Inherent Ambiguity of Simple Threshold Measurements

A careful analysis of our black box model suggests the following, rather surprising, possibility: Inversion improves calculation efficiency. Assume for the moment that observers are, in fact, slightly more efficient at recognizing inverted faces than they are at recognizing upright faces. Now, assume that face inversion also introduces some amount of internal, additive noise. Depending on the relative size of these differences in efficiency and equivalent noise, the relationship between contrast threshold and external noise (Eq. 2) may resemble that depicted in Figure 2. As one can see, the function representing our performance in the inverted condition has a lower slope, which is what we would expect with increased efficiency. Furthermore, the increased amount of equivalent noise naturally shifts this curve's x-intercept to the left.

The interesting property of this curve is that the combination of these features has resulted in the curve's y-intercept being at a point higher than that of the curve representing the upright condition. In other words, contrast thresholds will be significantly higher for the inverted condition when there is no external noise added to the stimuli, even though we are more efficient at recognizing inverted faces. Our current understanding of the face inversion and contrast reversal effects is based entirely on perceptual recognition tasks where there is, in fact, no externally added noise.³ It is possible then that inversion will actually generate a positive change, or improvement, in one of our black box parameters; such a result would not be inconsistent with what we already know. Naturally, however, this result would require a dramatic re-interpretation of the face inversion effect: It is a positive effect in one way, but a negative effect in another way. The current study will resolve this ambiguity.

³ Actually, contrast thresholds are not even measured in most face recognition studies related to the FIE and CRE. This raises a whole set of other problems concerning the interpretation of experimental results. However, the current argument is still very relevant. The use of percentage correct to gauge sensitivity merely adds another layer of ambiguity.

Figure 2. A possible ambiguity in the face recognition literature arises because performance is always measured without externally added noise. If face inversion increases efficiency but increases equivalent input noise by the right amounts, performance will naturally be better with upright faces when there is no noise.
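The ambiguity can be made concrete with two hypothetical threshold-versus-noise lines of the form E_t = k' (N_e + N_eq). The parameter values below are invented purely to reproduce the qualitative pattern described above (a shallower slope but a larger equivalent noise for inverted faces); they are not data from this study, and the function and variable names are illustrative.

```python
def threshold_energy(n_e, k_prime, n_eq):
    """Threshold energy predicted by E_t = k' * (N_e + N_eq)."""
    return k_prime * (n_e + n_eq)

# Hypothetical parameters: inverted faces are used MORE efficiently
# (lower slope) but carry more equivalent input noise.
upright = {"k_prime": 2.0, "n_eq": 1.0}
inverted = {"k_prime": 1.5, "n_eq": 2.0}

# With no external noise the inverted threshold is the higher one ...
no_noise_up = threshold_energy(0.0, **upright)
no_noise_inv = threshold_energy(0.0, **inverted)

# ... but in high external noise the ordering reverses.
high_noise_up = threshold_energy(10.0, **upright)
high_noise_inv = threshold_energy(10.0, **inverted)

# External noise level at which the two lines cross.
crossover = (inverted["k_prime"] * inverted["n_eq"]
             - upright["k_prime"] * upright["n_eq"]) / (
                 upright["k_prime"] - inverted["k_prime"])
print(no_noise_up, no_noise_inv, high_noise_up, high_noise_inv, crossover)
```

A single measurement at zero external noise would show only the first comparison, so it could not distinguish this scenario from one in which inversion lowers efficiency.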

An Independent Measure of Information Use: Response Classification

As shown in Ahumada & Beard (1998; 1999), the technique of response classification can be used to measure the influence of each pixel of the visual stimulus used in a psychophysical detection or discrimination task. This is a significant piece of information. The resulting map of independent pixel weights, the classification image, informs us about what locations of the stimulus are used the most when an observer decides whether or not a given stimulus, for example, is one image or another. The method for deriving classification images is described in the Methods. It is not the purpose of this paper to describe the method in any further detail, or to prove it actually results in a map of decision weights (see Richards & Zhu, 1994). The critical issue for this study is the specific type of information use that our classification image represents.

In Murray, Bennett & Sekuler (to be submitted), the response classification image is specifically described as an estimate of an observer's template. The template is simply the classification image as it is properly modeled in a psychophysical model of the observer's decisions. In this model, the template is cross-correlated with the presented stimulus after it has been corrupted by external and internal noise. This operation results in a single decision variable, which is just some linear combination of the signal + noise values in the presented stimulus. As will be described in the Methods, the optimal strategy for the 3AFC recognition task used in this study is for the observer to use a decision variable based on a cross-correlation of the presented stimulus with a specific template (derived from an optimal consideration of differences between the two image choices). Although the ideal observer's template itself may be totally different from that estimated of a human observer, our estimate of the human template assumes they both use the same operation.
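The operation described here, cross-correlating a noisy stimulus with a template to obtain one decision variable, can be sketched in a few lines. The example below is a simplified illustration under explicit assumptions: two equally likely image choices, white Gaussian external noise only (no internal noise), and random arrays standing in for face images. The names `face_a`, `face_b`, and `decision_variable` are hypothetical, and the difference template is the standard ideal choice for this simplified case rather than a restatement of the thesis's own derivation, which is given in the Methods.

```python
import numpy as np

def decision_variable(stimulus, template):
    """Zero-lag cross-correlation of stimulus and template (a dot product)."""
    return float(np.sum(stimulus * template))

rng = np.random.default_rng(0)

# Stand-ins for two face images (hypothetical random patterns).
face_a = rng.normal(size=(16, 16))
face_b = rng.normal(size=(16, 16))

# For two equally likely signals in white Gaussian noise, the ideal
# linear template is the difference between the two signals.
ideal_template = face_a - face_b

# Present face A corrupted by external noise.
stimulus = face_a + rng.normal(scale=0.5, size=face_a.shape)

# Compare the decision variable with the criterion that accounts for
# any difference in energy between the two signals.
dv = decision_variable(stimulus, ideal_template)
criterion = 0.5 * (np.sum(face_a ** 2) - np.sum(face_b ** 2))
response = "A" if dv > criterion else "B"
print(response)
```

A human template estimated by response classification would be plugged into the same dot product in place of `ideal_template`, which is what makes the two templates directly comparable.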

In other words, the response classification image provides us with an estimate of how an observer uses pixel information as a linear cross-correlator. Contrast this with another possibility. We might, following the principles of reverse-correlation set out in Rieke et al. (1997), attempt to form a method to estimate information-use for a specific type of non-linear strategy. This may provide important information about how recognition precisely differs with inversion and contrast reversal (this is planned for a future study). However, any such strategy would necessarily be sub-optimal. Therefore, our method of response classification is the best way to estimate how optimally an observer is using stimulus information (at least for the current task under study).

A direct comparison of the classification image with the ideal template (described in Methods) provides us with an independent measure of an observer's sampling efficiency. The statistical comparison used will be a normalized cross-correlation between both templates (NCORR). Templates derived in recognition tasks involving normal, contrast-reversed, inverted and inverted/contrast-reversed faces will be compared.

Their respective NCOR values should, hypothetically, reflect the differences in calculation efficiency we will measure in the same conditions.

The classification image, as a spatial mapping of information use, can be used to measure more than just efficiency. It informs us about what information an observer uses in different tasks (in this study we will employ spatial frequency decomposition to explore this). It can inform us about how much information an observer uses (this study will compare the number of pixels used at some constant level of statistical confidence). However, there are difficult problems to be overcome before more detailed statements can be made about these types of questions. The prime purpose of measuring classification images in this study will be to verify any findings from the initial experiments, which will evaluate the FIE and CRE as changes in an observer's black-box parameters.


Perceptual Learning and the IE and CRE

Gold, Bennett and Sekuler (1999b) applied the same black-box model used in this study to an analysis of perceptual learning. They compared calculation efficiency, equivalent input noise and proportional noise across multiple sessions of a 10-AFC face recognition task. The result was that efficiency increased with each new session, while both internal and proportional noise remained the same. Gold et al. thus concluded that the general improvement in face recognition that occurs with training must be driven solely by increased efficiency. This conclusion was generalized to all of complex-pattern recognition because a separate experiment, which required observers to recognize different Gaussian 'blobs', generated the same result. The relevant point here is that only efficiency changes with perceptual learning.

In a separate line of research, Gauthier et al. (1998) sought to demonstrate that the development of recognition expertise for a given class of objects results in inversion and contrast-reversal effects for that class of objects. Gauthier et al. compared the abilities of a novice and an expert group to recognize individual 'greebles', an artificial class of relatively homogeneous objects never encountered before. They also used a performance criterion independent of a developed FIE and CRE in order to decide when potential greeble experts had sufficient training. Gauthier et al. were thus able to experimentally create two discrete categories of observers, the expert and the novice. Gauthier et al. found that expert observers suffered greater performance losses with contrast-reversal than novice observers. Therefore, this study suggests that the development of recognition expertise for a class of objects causes a contrast-reversal effect to occur for that class. Although this study also suggests that this may not be true of object inversion, other studies lend support to the idea that extensive learning can generate inversion effects. Bruyer and Crispeels (1992) describe expert handwriting identifiers whose identification abilities suffer greatly with the inversion of handwriting samples and faces, but no other classes of visual stimuli. Similarly, Diamond and Carey (1986) describe dog experts who suffer from the same object-specific inversion effects.

While such studies do not control for an expertise category, the demonstrated isolation of the IE to one's domain of expertise does imply that at least a few decades' worth of training can result in the effect. The crucial consequence for this study is that, if it is true that the acquisition of recognition expertise results in the development of an IE or CRE, these effects are likely to reflect significant increases in the efficiency of this recognition. In order to become an expert at any sort of complex pattern recognition task, some degree of learning is presumably required. And, since Gold et al. have demonstrated that only efficiency changes with perceptual learning, it must be that some high level of efficiency results in an IE and CRE.

One might argue that it is possible that something other than efficiency changes with learning and it is this that results in the IE and CRE. However, recall from the previous discussion of our black-box model that the model's parameters are independent factors, which form a complete description of recognition performance. Therefore, there cannot be a model-independent factor that is relevant to the IE and CRE. Expertise is a significant improvement in recognition, which is an efficiency increase. If expertise results in the IE and CRE, then a significantly increased efficiency results in the IE and CRE.

The Development of Inflexible Representations

Based on the experiments described in the previous section, it seems valid to say that an increased efficiency results in the IE and CRE. But, given that efficiency, IE and CRE are all descriptions of performance, it seems illogical to postulate a causal relationship; increased efficiency cannot really cause the IE and CRE. In this section, we postulate that perceptual learning for an object class results in an Inflexible Representation (IR) for that class, and it is the inflexible quality of this representation that causes the IE and CRE. Based on this idea, one might hypothesize that observers will show much greater efficiency at recognizing upright, positive-polarity faces than either inverted, positive-polarity or upright, negative-polarity faces.

The argument that increased efficiency results in the IE and CRE can be put forth in a slightly different way: The IE and CRE simply mean that performance is generally better for upright and normally-contrasted stimuli relative to inverted and contrast-reversed stimuli. Since we know that training results in both an improved efficiency of some sort, and the IE and CRE, one must assume that this training resulted in a selective improvement of efficiency; strong gains in recognition efficiency occur exclusively for upright, normally contrasted stimuli. In other words, when observers develop a proficiency at discriminating among subtly varying stimuli, the knowledge or mechanism underlying this proficiency somehow cannot be applied to these same stimuli when they are inverted or contrast-reversed. To make matters simple, we will refer to this phenomenon in the rest of the paper as the development of an Inflexible Representation (IR).

Hypothesis and Overview

The argument described in the previous two sections predicts that efficiency will be significantly greater for observers in U-P as compared to those in I-P, U-N and I-N. Furthermore, it predicts that internal noise, both constant and proportional, will not be different across these conditions. These predictions are based on the selective enhancement of efficiency shown to exist in previous studies, and which we have termed IR. However, these predictions will not be used as hypotheses in the current study. They have been worked out in order to demonstrate to the reader how the use of our black-box model can lead to not only novel, quantitative descriptions of the FIE and CRE, but also the beginnings of an explanation of these effects.

Nevertheless, a hypothesis will be put forth. Given that the FIE and CRE do reflect a significant difference in recognition performance, of some kind, between U-P and all other conditions, at least one of our black-box parameters must change with inversion and contrast reversal. The same parameter, or combination of parameters, may not change with both inversion and contrast-reversal. If this is the case, our results may even indicate a difference in the type of deficit caused by these two stimulus manipulations. In any event, the results will help to resolve the ambiguity about whether or not the FIE or CRE can enhance performance in some way, and to form a stronger opinion about the validity of the IR hypothesis described in the previous section.

Furthermore, our classification image experiment is likely to also result in some difference between U-P and all other conditions, in either one or both of template efficiency and percentage of significantly used pixels. In fact, a significant difference in calculation efficiency but not proportional noise should lead us to expect the same difference in NCOR for our classification images. As described earlier, these templates are derived using a method independent of that used to derive our black-box parameters. Therefore, if there is a lack of any parameter difference in the black-box analysis due to methodological or sampling-quality shortcomings, our templates may be relied upon to reflect the hypothesized difference. For these reasons, all statistical tests, those for both Experiment 1 and 2, will be treated as planned comparisons.


Methods

Observers

Calculation Efficiency Experiments. Six female and two male students from the University of Toronto participated. All participants, except CG, were naive and reimbursed for their participation ($10/hour). Also, all participants but CG were familiar with the faces used as stimuli. Three participants (CG, AC and WL) were experienced psychophysical observers. All participants had normal or corrected-to-normal vision. The average age of participants was 22.

Classification Image Experiments. Three female and one male student from the same University participated, all of them different from those who participated in the Calculation Efficiency Experiments. All were naive and inexperienced psychophysical observers, and none were familiar with the faces used as stimuli. All participants had normal or corrected-to-normal vision. The average age of participants was 21.

Apparatus

Stimuli were displayed on an AppleColor High-Resolution RGB monitor (model M0401). The monitor's spatial resolution was 72 pixels per inch, and it had an approximate frame rate of 7mz. The active display area was 23 x 17 cm, which subtended a visual angle of 13 x 9.7 degrees from a viewing distance of 100 cm. Display luminance was calibrated with a Hagner Universal spot photometer (Model S1, Optikon Corp. Ltd.). A master look-up table (LUT), 1779 elements large, was constructed from the measured relationship between luminance and tabled pixel values. Maximum display luminance was 50.74 cd/m². New 8-bit LUTs were generated during every trial, for each new image to be displayed. This was done in order to generate images at a resolution beyond the 256 gray levels normally available from 8-bit display memory (see Tyler et al., 1992). All experiment software was written in MATLAB 5.1, using the Psychophysics Toolbox extensions (Brainard, 1997; Pelli, 1997).
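The stated visual angles follow from the display size and viewing distance. The short sketch below reproduces the arithmetic; the function names are illustrative, and the pixel helper simply assumes the stated resolution of 72 pixels per inch.

```python
import math

CM_PER_INCH = 2.54

def visual_angle_deg(size_cm, distance_cm):
    """Visual angle subtended by an extent centered on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))

def pixels_to_deg(n_pixels, pixels_per_inch=72.0, distance_cm=100.0):
    """Visual angle of an extent given in pixels (assumes square pixels)."""
    size_cm = n_pixels / pixels_per_inch * CM_PER_INCH
    return visual_angle_deg(size_cm, distance_cm)

display_width_deg = visual_angle_deg(23.0, 100.0)   # about 13.1 degrees
display_height_deg = visual_angle_deg(17.0, 100.0)  # about 9.7 degrees
```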

Stimuli

Digitized photographs of four faces, two male and two female, were used. All models wore no make-up and were instructed to maintain neutral expressions (see Gold, Bennett, & Sekuler, 1999a for details). The digitized photographs were then cropped to an oval window, excluding areas showing the chin and hair, including the hairline. The effective stimuli were then resized and centered in a 128 x 128 pixel box. The height of each face was a constant 99 pixels, while the widths ranged from 62 to 90 pixels. At a viewing distance of 1 meter, all faces spanned approximately 1.5 (+/-0.3) x 2 degrees visual angle. The amplitude spectra of the images were set to the average of a set of 10 faces, including the current images, used in a previous study (Gold et al., 1999a). This was done so that participants could not rely on simple differences in the overall magnitude of spatial frequencies between faces, but had to use phase/orientation information within channels in order to discriminate. Gold et al. were concerned with the actual sensitivity and category-dependent use of SF channels for recognition performance. This precaution may also provide us with an ecologically valid estimate of calculation efficiency and the internal template; overall amplitude spectra information is not likely to be a reliable measure for face discrimination in the real world. Stimuli were represented, pixel-by-pixel, in terms of contrast. We used the following definition of contrast:

$$c_{x,y} = \frac{l_{x,y} - l}{l}$$

where the $l$ without indices corresponds to the image's average luminance. This definition of contrast ensured that average luminance could be given an arbitrary value, which was necessary to display all of the images within the linear contrast range of the monitor. Calibration data generated by the Hagner Universal photometer indicated that the monitor's maximum contrast was 1.875. Maximum signal contrast variance in our experiment was set at 1. Since the high noise condition adds noise of variance 0.0625, the standard deviation of the displayed stimulus will be 1.0308. Therefore, any given pixel in an image will fall within the linear contrast range of the display 93% of the time. Any value beyond this was re-sampled from the Gaussian distribution.
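The 1.0308 and 93% figures can be checked with a few lines of arithmetic. The sketch assumes that displayed pixel contrasts are approximately Gaussian with zero mean and that the usable range is symmetric (from -1.875 to +1.875); neither assumption is stated explicitly in the text.

```python
import math

signal_var = 1.0      # maximum signal contrast variance
noise_var = 0.0625    # external noise variance in the high-noise condition
max_contrast = 1.875  # monitor's maximum contrast from calibration

# Variances of independent sources add.
stimulus_sd = math.sqrt(signal_var + noise_var)

# Range limit expressed in standard deviations of the displayed stimulus.
z_limit = max_contrast / stimulus_sd

# Probability that a zero-mean Gaussian value lies within +/- z_limit SDs.
p_in_range = math.erf(z_limit / math.sqrt(2.0))
```

Under these assumptions `stimulus_sd` comes out near 1.0308 and `p_in_range` near 0.93, in agreement with the values quoted above.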

Noise fields

External noise added to the face stimuli consisted of an image of the same dimension (128 x 128 pixels) whose individual pixel values were independently sampled from a Gaussian distribution.

Calculation Efficiency Experiments. On half of the trials, face stimuli were added to a Gaussian white noise field (m = 0; var = 0.0625) before presentation. No noise was added for the other half of the trials. In choosing this noise variance, we considered how much signal energy would be required at the criterion threshold level used, and the need to add noise that was not too similar to previous estimates of equivalent input noise. This was thus a problem of maximizing noise variance within the limits of our monitor's linear range. Data from Gold et al. (1999b), who estimated calculation efficiency in a similar experiment, were used for this decision.

Classification Image Experiments. On all of the trials, face stimuli were added to a Gaussian white noise field (m = 0; var = 0.019).
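A minimal sketch of how such a noise field might be added to a face image is given here. It assumes images are lists of rows of contrast values, that every face pixel already lies within the display range, and that an out-of-range result is handled by redrawing the noise sample for that pixel. The function `add_noise` is hypothetical and is not the routine used in the experiment software.

```python
import random

def add_noise(face, variance=0.0625, max_contrast=1.875, rng=None):
    """Add independent zero-mean Gaussian noise to each pixel of a face image.

    Assumes every pixel of `face` lies within +/- max_contrast; otherwise
    the redraw loop below could run for a very long time.
    """
    rng = rng or random.Random()
    sd = variance ** 0.5
    noisy = []
    for row in face:
        out_row = []
        for pixel in row:
            value = pixel + rng.gauss(0.0, sd)
            # Redraw the noise sample until the displayed value is in range.
            while abs(value) > max_contrast:
                value = pixel + rng.gauss(0.0, sd)
            out_row.append(value)
        noisy.append(out_row)
    return noisy
```

For the classification image experiments the same call would be made with `variance=0.019`.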

Procedure

Each experimental session consisted of 2000 trials of a 2AFC task, lasting approximately one hour. Sessions began with a two-minute adaptation period followed by a screen displaying the two face images to be discriminated during each test trial. This initial choice screen also instructed the participant to press a button when they felt they were ready to begin. Before each session, participants were instructed to carefully examine the faces during this time, and also during the selection period of each trial.

Trials consisted of the following sequence: First, a small fixation point was displayed at the center of the screen for 1 sec. Participants were instructed to focus on this marker until stimulus presentation. The stimulus was then displayed, centered, on the screen for 0.5 sec. This test stimulus was one of the two choice faces, chosen at random with equal probability and added to noise where applicable. Finally, a selection screen appeared. This screen displayed both face stimuli, noiseless and at contrast variance 1. Participants were instructed to press a button corresponding to the face they believed was the test stimulus (button-to-face matches were made explicit in each selection display). Feedback was provided in the form of short audio beeps: a low tone for incorrect responses and a high tone for correct responses. Participants were familiarized with this entire procedure in short demo sessions of approximately 20 trials.

Calculation Efficiency Experiments. Each participant ran in 8 sessions, one for each possible combination of stimulus gender, orientation and contrast polarity (see Figure 3 for a condition tree). The order of conditions varied between participants, except that sessions were grouped by the gender of the face image for 3 of the 8 participants. The order of conditions was randomized to avoid order effects, especially those related to perceptual learning.

[Figure 3: condition tree. Gender (Female, Male) branches into orientation (Upright, Inverted) and then contrast polarity (Positive, Negative); the female branch ends at the codes FU-P, FU-N, FI-P and FI-N.]

Figure 3. Condition tree. This tree depicts all combinations of stimulus manipulations and type as a path, ending at a condition code. Gender is indicated first, followed by orientation, and then contrast polarity.

Two contrast thresholds were estimated for each session, one corresponding to performance during trials where external noise had been added to the signal and one corresponding to performance where no noise was added. This was done by randomly interleaving four trial categories, two performance criteria for each of the noise and no-noise trial types. Contrast was adjusted independently for each trial category using the staircase method. This method ensured that, for each noise category, enough trials would display contrasts corresponding to a focused range of performance levels, one between 71% and 79%. A single staircase run at each trial category consisted of 250 trials. The first 1000 trials, during which all four staircase runs were completed, were replicated for the second half of the session; all aspects of the presented stimuli were replicated exactly, and in the same order. No participant except CG was aware that the second half of trials was a replicate of the first half.
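The text does not say which staircase rules produced the two performance criteria. The 71% and 79% targets match the convergence points of the common 2-down/1-up and 3-down/1-up rules, so the sketch below assumes those rules; the class name, the additive step and the starting level are hypothetical.

```python
def convergence_point(n_down):
    """Proportion correct targeted by an n-down/1-up staircase.

    The staircase settles where n consecutive correct responses are as
    likely as not, i.e. where p ** n_down == 0.5.
    """
    return 0.5 ** (1.0 / n_down)

class Staircase:
    """n-down/1-up staircase on a stimulus level (e.g. log contrast variance)."""

    def __init__(self, n_down, start_level, step):
        self.n_down = n_down
        self.level = start_level
        self.step = step
        self.correct_run = 0

    def update(self, correct):
        """Register one response and return the level for the next trial."""
        if correct:
            self.correct_run += 1
            if self.correct_run == self.n_down:
                self.level -= self.step  # harder after n correct in a row
                self.correct_run = 0
        else:
            self.level += self.step      # easier after any error
            self.correct_run = 0
        return self.level
```

Interleaving one 2-down and one 3-down staircase within each noise category would concentrate trials between roughly 71% and 79% correct, which is the range described above.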

A high-noise and a no-noise threshold were estimated by fitting the empirical relationship between contrast variance and accuracy to a Weibull function, and interpolating contrast variance at 71%. The range of contrast variances we were able to test, due to the described use of multiple staircases, ensured a more accurate interpolation. The normal use of a single staircase tends to under-sample high-contrast trials. Note that psychometric function fitting is the procedure normally used on data obtained by the method of constant stimuli. It is used here since a replication of trials for each staircase provides us with a larger sample at each contrast variance used.

Employing the model of equivalent input noise described earlier, we then derived a k' and Neq value for each participant in each condition. We bootstrapped the two contrast thresholds obtained at each noise condition, with their respective standard errors, to Equation 1. The best linear fit to Equation 1, with k' and Neq as free parameters, was obtained after 1500 iterations. Recall that all k' values are inversely proportional to calculation efficiency. All k' values were divided by the ideal k' (different for each stimulus gender category), and multiplied by 100 to obtain calculation efficiencies.

For each level of contrast variance, a percentage of agreement Pa between exactly replicated trials was calculated. One-to-one correspondences between contrast variance level and percent correct Pc were then derived from the fitted Weibull (psychometric) function described earlier. We were thus able to obtain a unique mapping between Pa and Pc. An observer's response consistency can be characterized by the slope s of this mapping, which was modeled by:


Eq.7

An observer modeled with different levels of internal noise, relative to an externally added noise of constant standard deviation, responds with systematic changes to the slope s of this equation. This was carried out by running several Monte Carlo simulations of an observer performing in this experiment with different levels of internal noise. By comparing a participant's slope to the modeled observer's, we were thus able to obtain an estimate of their total internal noise (SD) relative to external noise (SD). This noise standard deviation ratio was also calculated for each participant, in all conditions.

Classification Image Experiments. Each participant ran in 30 sessions: five sessions for each of the eight stimulus conditions in the Calculation Efficiency Experiments. Sessions of the same stimulus condition were run contiguously, as a session block, over a period of approximately one week. As with the first experiment, the order of session blocks was varied between participants. Contrast variance adjustment was simple in this experiment: a single staircase was used to maintain performance at 71%. Noise-field variance was also held constant, ensuring that for any particular trial the SNR, as defined in Murray et al. (to be submitted), would be the same.

We collected classification images by averaging the noise fields of a condition block according to their response categories and combining them linearly. These categories include all combinations of response R and actual test stimulus S; four in this experiment (summarized as S1R1, S2R1, S1R2, S2R2 in Table 1). The linear combination of response categories we used was:

$$(S_1R_1 + S_2R_1) - (S_1R_2 + S_2R_2) \qquad \text{(Eq. 8)}$$

As shown in Richards et al. (1994), the resultant image provides estimates of the relative decisional weight assigned to each pixel location. In other words, this is the classification image that maps out the influence of each noise pixel on responding with either R1 or R2.

Test stimulus   Respond "1"   Respond "2"
Stimulus 1      S1R1          S1R2
Stimulus 2      S2R1          S2R2

Table 1. Stimulus-Response categories used to classify trials in the Response Classification Method.
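The averaging and combination of noise fields over the four categories of Table 1 can be sketched as follows. The sketch assumes that each trial is stored as a tuple of stimulus number, response number and the noise field as a flat list of pixel values; the function names and this data layout are illustrative only.

```python
def mean_field(fields):
    """Pixel-wise mean of a non-empty list of equally sized noise fields."""
    n_fields = len(fields)
    n_pixels = len(fields[0])
    return [sum(f[i] for f in fields) / n_fields for i in range(n_pixels)]

def classification_image(trials):
    """Combine mean noise fields as (S1R1 + S2R1) - (S1R2 + S2R2).

    trials: iterable of (stimulus, response, noise), with stimulus and
    response each 1 or 2 and noise a flat list of pixel values.
    """
    groups = {(s, r): [] for s in (1, 2) for r in (1, 2)}
    for stimulus, response, noise in trials:
        groups[(stimulus, response)].append(noise)
    if any(len(fields) == 0 for fields in groups.values()):
        raise ValueError("every stimulus-response category needs at least one trial")
    means = {key: mean_field(fields) for key, fields in groups.items()}
    n_pixels = len(means[(1, 1)])
    return [
        means[(1, 1)][i] + means[(2, 1)][i] - means[(1, 2)][i] - means[(2, 2)][i]
        for i in range(n_pixels)
    ]
```

Pixels whose noise values pushed the observer toward response "1" end up positive in the result, and pixels that pushed toward response "2" end up negative.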

Two statistics of the observer template were employed: the normalized cross-correlation (NCOR) with the ideal template, and the percentage of significantly used pixels (PSP). NCOR is a measure of exactly what its name suggests: the sum of a pixel-by-pixel multiplication of observer and ideal templates (where each has been normalized by its respective image variance). This provides a correlation value that indicates how well the two images are matched; e.g., a value of one indicates the images are the same.

PSP is the percentage of pixels in an observer template that are significantly correlated with that observer's responses. Each pixel in the template (classification image) was treated as a linearly independent dimension. Therefore, multiple correlation significance tests were run to evaluate the significance of each pixel. A P-level of 0.0005 was used for each pixel correlation test. This ensured that, of the 128² total pixels in a given template, only about eight pixels would be expected to be incorrectly treated as significant. The current study is interested in template differences that may be caused by inversion and contrast reversal. It may be the case that the distribution of highly correlated pixel values also varies across these conditions. In this case, a difference in the choice of P-level may result in a difference in how PSP changes with condition. Histograms of the various templates were analyzed to ensure that P-level changes would not have this effect.
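A small sketch of both template statistics follows. The exact normalization behind NCOR is not spelled out in the text; the version here assumes each template is scaled by the square root of its summed squared pixel values, so that identical templates give a value of one. The function name `ncor` is illustrative.

```python
import math

def ncor(observer, ideal):
    """Normalized cross-correlation of two templates given as flat lists.

    Assumption: each template is normalized by the root of its summed
    squared pixel values, so identical templates yield 1.
    """
    numerator = sum(o * i for o, i in zip(observer, ideal))
    denominator = math.sqrt(sum(o * o for o in observer) * sum(i * i for i in ideal))
    return numerator / denominator

# Expected number of pixels passing the per-pixel test by chance alone.
n_pixels = 128 * 128
p_level = 0.0005
expected_false_positives = n_pixels * p_level  # about 8.2
```

Because NCOR is insensitive to the overall scale of either template, it compares only the spatial pattern of pixel weights, not their magnitude.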

Ideal Observer

As shown in Gold, Bennett, & Sekuler (1999a), the ideal decision rule in a 2AFC recognition task is to choose the image which results in the maximum correlation with the target stimulus. This can be shown to be equivalent to using a decision variable produced by correlating the target stimulus with a difference-image, the subtraction of one choice image from the other. If the decision variable is positive, the ideal decision is to choose the image that was subtracted from. If it is negative, the ideal decision is to choose the subtracted image. The performance of this ideal rule, in noise, was estimated with a Monte Carlo simulation in both. Separate ideal k' values were derived in this way for Male and Female stimuli, which were used for estimating observer efficiencies in Experiment 1. The difference image is also the ideal template, and was used to estimate NCOR in Experiment 2.
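The equivalence of the two decision rules, and the kind of Monte Carlo estimate mentioned above, can be illustrated with a short sketch. It assumes that "correlation" means the un-normalized sum of pixel-by-pixel products, that images are flat lists, and that the external noise is Gaussian; all function names and the simulation settings are hypothetical.

```python
import random

def dot(a, b):
    """Sum of pixel-by-pixel products of two flat images."""
    return sum(x * y for x, y in zip(a, b))

def choose_by_max_correlation(stimulus, image_1, image_2):
    """Pick the choice image that correlates best with the stimulus."""
    return 1 if dot(stimulus, image_1) >= dot(stimulus, image_2) else 2

def choose_by_difference_image(stimulus, image_1, image_2):
    """Pick image 1 if the stimulus correlates positively with image_1 - image_2."""
    difference = [x - y for x, y in zip(image_1, image_2)]
    return 1 if dot(stimulus, difference) >= 0 else 2

def simulate_accuracy(image_1, image_2, noise_sd, n_trials, rng):
    """Monte Carlo estimate of proportion correct for the difference-image rule."""
    n_correct = 0
    for _ in range(n_trials):
        target = rng.choice((1, 2))
        signal = image_1 if target == 1 else image_2
        stimulus = [s + rng.gauss(0.0, noise_sd) for s in signal]
        if choose_by_difference_image(stimulus, image_1, image_2) == target:
            n_correct += 1
    return n_correct / n_trials
```

The two rules agree because the correlation with the difference image is simply the correlation with image 1 minus the correlation with image 2.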


Initial tests consisted of five 3-factor ANOVAs: three for the black-box parameters measured in Experiment 1 and two for the template statistics measured in Experiment 2. The three factors, or independent variables, considered in each of these ANOVAs were the stimulus parameters: gender, orientation and contrast polarity. For all ANOVAs, the minimum criterion for statistical significance was a Type I error rate of 0.05. This also applied to all planned t-tests (e.g., all of those comparing the upright, normal-contrast condition with the others). Also, all t-tests executed after ANOVAs that indicated no effects involving stimulus gender were done on scores that were collapsed over stimulus gender. For example, a subject's score in the female-upright-positive stimulus condition would be averaged with his/her score in the male-upright-positive condition to derive a collapsed score for the upright-positive condition.

Particular conditions are referred to with the following convention: an abbreviation of F or M for the female and male genders, followed by U or I for upright or inverted orientation, then P or N for positive and negative contrasts. For example, the condition for female, upright and negative contrast stimuli is referred to as F-U-N. Where scores have been collapsed across gender categories, only the last two letter spaces are used (e.g., U-N).


Experiment 1: Black Box Model of Efficiency

Equivalent Input Noise

No significant effects were found in the 3-way ANOVA conducted for Equivalent Input Noise. Additional t-tests between U-P and U-N [t(6) = 1], I-P [t(6) = -0.08], and I-N [t(6) = -0.54] all failed to reach significance. The general performance deficits associated with stimulus inversion and contrast-reversal cannot be attributed to changes in an additive internal noise.
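As a reminder of where these parameters come from, the sketch below derives k' and Neq from the two thresholds of a session. Equation 1 is not reproduced in this section, so the sketch assumes the standard linear form, threshold = k'(N + Neq), with N the external noise variance. It also takes efficiency as the ideal k' over the observer's k', in line with k' being inversely proportional to efficiency. The function names and the example thresholds are hypothetical, and the bootstrap used in the actual analysis is omitted.

```python
def fit_noise_model(threshold_no_noise, threshold_high_noise, ext_noise_var):
    """Solve threshold = k * (N + Neq) from thresholds at N = 0 and N = ext_noise_var.

    Assumed form of Equation 1; returns (k, neq).
    """
    k = (threshold_high_noise - threshold_no_noise) / ext_noise_var
    if k <= 0:
        raise ValueError("the high-noise threshold must exceed the no-noise threshold")
    neq = threshold_no_noise / k
    return k, neq

def efficiency_percent(k_observer, k_ideal):
    """Calculation efficiency in percent, assuming efficiency = k_ideal / k_observer."""
    return 100.0 * k_ideal / k_observer

# Hypothetical thresholds (contrast variance), for illustration only.
k_obs, neq_obs = fit_noise_model(0.002, 0.012, 0.0625)
```

In this form, a change in Neq shifts the no-noise threshold without changing the slope, whereas a change in efficiency changes the slope k' itself.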

Internal-to-External Noise

No significant effects were found in the 3-way ANOVA conducted for Internal-to-External Noise. Additional t-tests between U-P and U-N [t(6) = -0.17], I-P [t(6) = -2], and I-N [t(6) = -1.29] all failed to reach significance. The general performance deficits associated with stimulus inversion and contrast-reversal cannot be attributed to changes in a multiplicative noise source. Any significant effects that may be found for calculation efficiency must, therefore, be attributed to a change in the observer's sampling strategy or calculation.

Calculation Efficiency

A 3-factor within-subjects ANOVA revealed significant main effects of orientation [F(1, 5) = 67.85, p < 0.001] and contrast polarity [F(1, 5) = 14.52, p < 0.05]. There was also a significant interaction effect between orientation and contrast polarity [F(1, 5) = 11.67, p < 0.05]. All other effects failed to reach significance. To understand the interaction between orientation and polarity, t-tests were executed for each of the 6 unique pairs of these stimulus conditions (these are shown in the legend of Figure 4). Since no interactions were found between gender and the other two factors, the average of calculation efficiency for male and female stimuli was used for these comparisons.

[Figure 4: bar plots of calculation efficiency, with FEMALE and MALE panels. Legend entries: POS/0, NEG/0, POS/180, NEG/180.]

Figure 4. Calculation efficiencies in all stimulus conditions. Error bars correspond to the standard error of k in Equation 1. Note that two bars are missing for AD (no data), and one for JYG (corrupted data).

There were six t-tests in total. The first three involved predictions based on past results; there must be some performance deficit associated with both contrast reversal and inversion, as reflected in simple differences between U-P and the three other conditions. No effects were found for Equivalent Input Noise or Internal-to-External Noise so, according to our black-box model, any performance deficit must take the form of decreased calculation efficiency. The remaining three tests were post-hoc. However, tolerance adjustment was not necessary because the resulting p-values were much higher than the most liberal test would allow.

Where both faces were upright, positive contrast faces (U-P) were identified with significantly greater efficiency than negative contrast (U-N) faces [t(6) = 5, p < 0.01]. U-P faces were also identified with significantly greater efficiency than I-P faces [t(7) = 7.36, p < 0.001] and I-N faces [t(5) = 8.77, p < 0.001]. In other words, the performance deficits generally associated with contrast reversal and inversion can now be attributed to a decrease in calculation efficiency.

Interestingly, the efficiency difference between U-N and I-P did not reach significance [t(6) = -0.697, p = 0.51]. Therefore, the magnitude of the efficiency decrements associated with contrast reversal and inversion cannot be differentiated. Equally striking is the lack of a significant difference between I-N and both U-N [t(5) = 1.94, p = 0.11] and I-P [t(5) = 2.35, p = 0.065]. This suggests that inversion and contrast-reversal do not additively combine to degrade efficiency. As shown in Figure 5, stimulus inversion decreases efficiency to a much lesser degree once the stimulus has been reversed in contrast, and vice versa. This summarizes the interaction effect between polarity and orientation found in the earlier ANOVA.

[Figure 5: plot of calculation efficiency against Contrast Polarity (POS, NEG), with separate lines for upright and inverted stimuli; panel title: Gender Collapsed.]

Figure 5. Mean calculation efficiencies between upright and inverted stimuli, in both positive and negative contrasts. Efficiency scores have been collapsed across stimulus gender.

Experiment 2: Classification Images

Analyses of NCOR

The Relationship Between Calculation Efficiencies and NCOR. Since the primary interest is in how either efficiency or NCOR changes across stimulus conditions, the values used in this correlation analysis were the ratios of efficiency (or NCOR) in one condition to the same in another condition; for example, efficiency in F-U-P divided by efficiency in F-I-P. Our concern was: "Is the change in efficiency between two conditions proportional to the change in NCOR?" Different subjects ran in each experiment (one for each measure), so it was also necessary to average these condition-ratios across subjects. Correlation analyses were thus conducted between the 6 condition-ratios for calculation efficiency and the 6 for NCOR.

For male stimuli, there was a significant positive correlation (r = 0.812) between the condition-ratios of calculation efficiency and NCOR [z(6) = 1.96, p < 0.05]. However, the positive correlation for female stimuli (r = 0.75) failed to reach significance [z(6) = 1.68, p = 0.093]. This provides modest support for the conclusion drawn earlier that significant changes in efficiency can be attributed to changes in an observer's sampling efficiency. It suggests that further statistical analysis may demonstrate a pattern of inversion and contrast polarity effects for NCOR similar to that shown for calculation efficiency, but this similarity may be stronger for Male stimuli.

Omnibus Test for Effects of Polarity & Inversion. A 3-factor within-subjects ANOVA revealed significant main effects of contrast polarity [F(1, 3) = 41.2, p < 0.01] and orientation [F(1, 3) = 10.3, p < 0.05], and a significant interaction between these two factors [F(1, 3) = 23.63, p < 0.05]. The performance deficits generally shown for contrast reversal and inversion can thus be attributed to changes in the efficiency of an observer's template. This was, in part, predicted from the previous effects found for calculation efficiency and its positive correlation with NCOR.
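The correlation tests reported earlier in this section are numerically consistent with a Fisher z-transformation test on six condition-ratios. This is an inference from the reported values and not something the text states, so the sketch below should be read as one plausible reconstruction; the function name is illustrative.

```python
import math
from itertools import combinations

def fisher_z_test(r, n):
    """Two-tailed test of a correlation r from n pairs via Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal probability
    return z, p

# Four gender-specific conditions give six unique condition pairs (ratios).
conditions = ["U-P", "U-N", "I-P", "I-N"]
n_ratios = len(list(combinations(conditions, 2)))

z_male, p_male = fisher_z_test(0.812, n_ratios)
z_female, p_female = fisher_z_test(0.75, n_ratios)
```

Under this assumption the male correlation gives z near 1.96 with p just under 0.05, and the female correlation gives z near 1.68 with p of roughly 0.09, close to the values reported above.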

In addition to an interaction between orientation and polarity, however, there was also a significant interaction between gender and orientation [F(1, 3) = 11.9, p < 0.05]. In order to explore this second interaction effect, two additional ANOVAs were conducted: a 2-way ANOVA between inversion and contrast polarity for the Female condition and one for the Male condition. For Female stimuli, there was a main effect of orientation [F(1, 3) = 14.18, p < 0.05] and an interaction effect between contrast polarity and orientation [F(1, 3) = 31.18, p < 0.05]. However, the effect of polarity did not reach a significant level [F(1, 3) = 5.19, p = 0.11]. This interaction is illustrated in Figure 6a. As one can see, the difference between I-P and I-N is slightly smaller than the difference between U-N and I-N. In other words, the effect of polarity seems more dependent on whether or not the face is upright.

[Figure 6 panels: x-axis, Contrast Polarity (NEG, POS); separate lines for upright and inverted faces.]

Figures 6a and 6b. Interaction effects between orientation and polarity, for female (a) and male (b) stimuli.

For Male stimuli, only the interaction effect between polarity and inversion reached a significant level [F(1, 3) = 15.12, p < 0.05]. This interaction, illustrated in Figure 6b, shows the same general trend as was shown for Female stimuli. The largest difference in NCOR is between U-P and the rest of the conditions, while differences among these non-U-P conditions remain relatively small. The major difference between these two results is that the effect of polarity seems to be slightly stronger for the Male condition: strangely, NCOR for U-N is actually smaller than for I-N. The fact that orientation effects fail to reach significance for Male stimuli but do reach significance for Female stimuli appears to be the cause of the gender X orientation effect found in the 3-way ANOVA. Furthermore, this appears to be driven by the slightly better performance of I-N compared to U-N for male stimuli (compare Figures 6a and 6b).

Percentage of Significant Pixels

OMNIBUS Test for Effects of Polarity & Inversion. A 3-way repeated measures ANOVA failed to find any significant effects. Additional t-tests between U-P and U-N [t(3) = 3.6], I-P [t(3) = 1.8], and I-N [t(3) = 2.8] all failed to reach significance. The general performance deficits associated with stimulus inversion and contrast-reversal cannot be attributed to changes in the percentage of pixels that are significantly correlated with identification decisions.

The Relationship Between Percent Significant Pixels and NCOR. Thus far, results show that NCOR changes with inversion and contrast polarity, but that the percentage of significant pixels (PSP) does not. However, this does not mean that PSP is not a useful statistic to differentiate among particular conditions. It may be that small decreases in PSP, too small to qualify as significant in an ANOVA, result in very large changes in efficiency. This seems even more plausible in the case where, overall, templates are very noisy. Correlation Z-tests for the 4 stimulus conditions given by polarity and inversion show that there is a significant positive correlation (r = 0.821) between PSP and NCOR [z(8) = 2.59, p < 0.01] for N-I. All other conditions failed to show significant correlations. Therefore, the significant changes in NCOR found for inversion and contrast reversal cannot be explained by subtle changes in PSP, which may amplify weak efficiency differences.
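The correlation Z-tests reported in this section can be illustrated with a short sketch. It assumes that the statistic is Fisher's r-to-z transform scaled by the square root of n - 3, and that the number in parentheses after z is the number of paired observations n; neither assumption is stated in the text, but under them the reported r values give z values close to the reported ones. The function name `fisher_z_test` is hypothetical.

```python
import math

def fisher_z_test(r, n):
    """Test H0: rho = 0 for a sample correlation r based on n pairs.

    Assumption: Fisher's r-to-z transform, with z approximately
    standard normal under H0. Returns (z, two-sided p).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Correlations reported in the text, with n read from the z(n) notation.
reported = [
    ("efficiency vs NCOR, male stimuli", 0.812, 6),
    ("efficiency vs NCOR, female stimuli", 0.75, 6),
    ("PSP vs NCOR, N-I condition", 0.821, 8),
]
for label, r, n in reported:
    z, p = fisher_z_test(r, n)
    print(f"{label}: z = {z:.2f}, two-sided p = {p:.3f}")
```

With so few observations the transform needs a large r to reach significance, which is why r = 0.75 with n = 6 falls short while r = 0.821 with n = 8 does not.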

Qualitative Analyses of Classification Images

Data from all four subjects were treated as though they were obtained from a single subject. This was done for each condition (defined by the 3 main factors), resulting in 8 averaged templates. More qualitative analyses of these average templates were performed in order to explore the differences and similarities between the inversion and contrast-reversal effects that occur across subjects, on average.

Cross-correlations Between Each of the Average Templates. Normalized cross-correlations were calculated between each of the estimated average templates in the 8 stimulus conditions. Two sets of comparisons were made, those for male stimuli and those for female stimuli. Within each set, templates for the 6 unique pairs, given by the two levels of orientation and contrast-polarity, were cross-correlated. The results are summarized in Table 2. Cross-correlation values among condition pairs were generally higher and more homogeneous for female stimuli. Also note that there are differences between the patterns for these correlations and those for the NCOR changes described previously. For example, while F-N-U is slightly less correlated with the ideal than F-P-I, this analysis shows that F-N-U is slightly more correlated with F-P-U than F-P-I is. While these differences are slight, they do suggest that extra care must be taken when interpreting NCOR and calculation efficiency results. Specifically, differences in how two templates correlate with the ideal do not always tell us how each differs from the other. Nevertheless, a comparison of Table 2 with the interaction effects for NCOR shown in Figures 6a and 6b does suggest that these correlations do preserve the basic pattern of NCOR results.

The highest cross-correlation for female stimuli was between P-U and N-U, whereas for male stimuli it was between P-U and P-I. Nevertheless, P-U and P-I were the second most highly correlated pair for female stimuli. In contrast, the lowest cross-correlations, for both male and female stimuli, were between N-U and P-I. For female stimuli, this pair resulted in a 2-fold decrease from the maximal correlation within the set (P-U with N-U). For male stimuli, this pair resulted in an 8-fold decrease from the maximal correlation within the set (P-U and P-I). Furthermore, correlations between N-U and P-I were surprisingly smaller than those between P-U and N-I, for both stimulus sets. One would expect P-U and N-I to have the lowest correlation, because N-I involves two conditions that, independently, reduce calculation efficiency.

The above results suggest that the relative decisional weights assigned to pixel locations are more similar between P-U and either P-I or N-U than they are among non-P-U conditions. This means that the calculation efficiency decreases that are associated with inversion and contrast-reversal occur for different reasons; specifically, they seem to differ in which areas are weighted differently from the P-U condition. Since there were no significant effects found for PSP, the differences between N-U and P-I are not likely related to differences in the gross amount of information used.

Finally, the second most highly correlated pairs, for both male and female sets, were N-U with N-I, and P-I with N-I. This suggests that the additional stimulus manipulation of inversion or contrast-reversal from either N-U or P-I preserves something about the templates for those conditions. Moreover, the correlation values for these 2 pairs were virtually the same in both male and female sets. This is a unique result. If P-I and N-U share very little in common, but N-I matches both of these to the same degree, then N-I is likely to be similar to that minuscule portion of the template shared by P-I and N-U. Another explanation is that N-I matches with different regions of P-I and N-U, and that these two matches happen to be of the same magnitude. Considering how little signal there seems to be in N-I (see spatial frequency analysis below), this latter explanation seems less likely.

         P-U      P-I      N-U      N-I
P-U               0.0245   0.034    0.0185
P-I      0.0206            0.0134   0.0209
N-U      0.0102   0.0025            0.0201
N-I      0.0094   0.0165   0.0126

Table 2. Normalized cross-correlations between observer templates in various stimulus conditions. The top portion of comparisons (above the diagonal) is for female stimuli, and the bottom (below the diagonal) for male stimuli.
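As a minimal sketch of how pairwise values like those in Table 2 could be computed, the code below takes the normalized cross-correlation at zero spatial lag after subtracting each template's mean. The exact normalization used in the thesis is not given in this section, so this definition is an assumption, and the random arrays are hypothetical stand-ins for the averaged templates, not the actual data.

```python
from itertools import combinations

import numpy as np

def normalized_cross_correlation(a, b):
    """Zero-lag normalized cross-correlation of two equally sized templates.

    Assumption: each template is mean-subtracted first, which makes this
    the Pearson correlation across pixels.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical noisy templates that share a weak common structure.
rng = np.random.default_rng(0)
shared = rng.normal(size=(64, 64))
conditions = ("P-U", "P-I", "N-U", "N-I")
templates = {name: 0.1 * shared + rng.normal(size=(64, 64)) for name in conditions}

# The 6 unique condition pairs within one stimulus set.
pairs = {
    (p, q): normalized_cross_correlation(templates[p], templates[q])
    for p, q in combinations(conditions, 2)
}
for (p, q), value in pairs.items():
    print(f"{p} with {q}: {value:.4f}")
```

Because the measure is invariant to the overall scale and offset of each template, it compares only the relative weights assigned to pixel locations, which is the property the discussion of Table 2 relies on.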

Spatial Frequency Content of Averaged Templates. Amplitude spectra for Female stimuli are shown in Figure 7a, and those for Male stimuli are shown in Figure 7b. General similarities among conditions are the following: when plotted across the x-axis in log coordinates, amplitude spectra show no consistent patterns after about 10 cycles per object. This upper region shows variations that are indistinguishable from its local mean amplitude. One significant general difference between male and female spectra is that the amplitudes for all Male stimulus conditions other than P-U are relatively close to the mean of the 'noisy' upper region. Stated another way, the P-U spectrum for Male stimuli is significantly larger in mean amplitude than all other conditions, whereas for Female stimuli the P-U spectrum is similar in mean amplitude to both N-U and P-I. Also, while global spectrum peaks for P-U and N-U both occur at 4 cycles for Male stimuli, global peaks for P-U and N-U differ by about 3 cycles for Female stimuli. The differences in mean amplitude relative to P-U suggest that contrast reversal and inversion result in less overall information use in the Male set. However, the differences in global peak position relative to P-U for Female stimuli suggest that these face effects result in some change in the type of information used.
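An amplitude spectrum of the kind described above can be sketched as a radial average of the 2-D Fourier amplitude of a template. The code below is an illustration under stated assumptions, not the procedure used in the thesis: it assumes a square template, bins frequencies into integer rings in cycles per image, and uses a synthetic grating in place of a real template. The function name `radial_amplitude_spectrum` is hypothetical.

```python
import numpy as np

def radial_amplitude_spectrum(template):
    """Mean FFT amplitude in integer-radius rings, in cycles per image.

    Assumption: the template is square, so one cycle per image means the
    same spatial period horizontally and vertically.
    """
    t = np.asarray(template, dtype=float)
    n_rows, n_cols = t.shape
    amplitude = np.abs(np.fft.fftshift(np.fft.fft2(t - t.mean())))
    fy = np.fft.fftshift(np.fft.fftfreq(n_rows)) * n_rows  # vertical cycles per image
    fx = np.fft.fftshift(np.fft.fftfreq(n_cols)) * n_cols  # horizontal cycles per image
    grid_y, grid_x = np.meshgrid(fy, fx, indexing="ij")
    ring = np.rint(np.hypot(grid_y, grid_x)).astype(int)
    max_ring = min(n_rows, n_cols) // 2
    sums = np.bincount(ring.ravel(), weights=amplitude.ravel())
    counts = np.bincount(ring.ravel())
    freqs = np.arange(1, max_ring + 1)
    return freqs, sums[1:max_ring + 1] / counts[1:max_ring + 1]

# Synthetic check: a vertical grating with 4 cycles across a 64-pixel image.
x = np.arange(64)
grating = np.tile(np.sin(2 * np.pi * 4 * x / 64), (64, 1))
freqs, amps = radial_amplitude_spectrum(grating)
peak = int(freqs[np.argmax(amps)])
print("global peak (cycles per image):", peak)
```

To express frequencies in cycles per object, as in the text, the values in `freqs` would be multiplied by the ratio of the face width to the image width in pixels; that ratio is not given in this section and would have to be supplied.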

[Figure 7a plot: one amplitude spectrum per stimulus condition.]

Figure 7a. Amplitude Spectra for Female Stimuli.

[Figure 7b plot: one amplitude spectrum per stimulus condition.]

Figure 7b. Amplitude Spectra for Male Stimuli.


Discussion

Experiment 1: Black Box Model of Efficiency

The results of Experiment 1 showed that internal constant and proportional noise did not change with either contrast-reversal or inversion of the face stimuli. The only black-box parameter that did change was calculation efficiency. This change occurred only between the upright, positive-contrast condition and each of the other three conditions. In other words, we found efficiency differences exactly between those condition pairs that were predicted to differ by at least one of the black-box parameters. The lack of effects of contrast-reversal and inversion on proportional noise means that we can attribute all efficiency variations to variations in the optimality of one's sampling strategy. This attribution was further validated by the positive correlation that was found between the condition-ratios of efficiency and NCOR, which was derived in Experiment 2 with an independent method.

A contrast-polarity X orientation effect was also found for calculation efficiency. This was explained by the lack of efficiency differences among those three conditions with at least one face manipulation (i.e., U-N, I-P, and I-N). This is an interesting result. It suggests that inversion and contrast-reversal do not additively combine to degrade efficiency; at least in part, inversion and contrast-reversal must affect the same mechanism that results in efficiency losses. Lastly, there were no effects of gender on efficiency. This provides some evidence that the face effects occur independently of stimulus gender. Since gender is one way of capturing a large portion of the variance between individual faces, it also suggests that there is something about face stimuli, or how we recognize face stimuli, in general that elicits the contrast-reversal and inversion effects; the face effects are the face effects.

Independently of Experiment 2, these results have provided a quantitative description of the face effects. The FIE is a loss of calculation efficiency with face inversion, and the CRE refers to the same loss with contrast-reversal of a face stimulus. We now know more about what these effects are, and can also rule out the possibility that the face effects can enhance performance in some way. FI and CR affect recognition performance in a completely negative manner, and they both do so by reducing efficiency.
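The link drawn above between calculation efficiency and the correlation of an observer's template with the ideal can be made concrete for an idealized case. The sketch assumes a linear observer with no internal noise who discriminates two faces A and B in white Gaussian pixel noise of standard deviation sigma. For a template w, sensitivity is d' = |w . (A - B)| / (sigma ||w||), the ideal template A - B gives d' = ||A - B|| / sigma, and so efficiency, the squared ratio of the two, equals the squared normalized correlation between w and A - B. The function name and the random "faces" are hypothetical, and the NCOR statistic of Experiment 2 may be defined differently in detail.

```python
import numpy as np

def template_efficiency(template, face_a, face_b):
    """Squared normalized correlation between a template and the ideal one.

    Assumptions: linear observer, no internal noise, white Gaussian pixel
    noise, so that the ideal template is the difference image A - B.
    """
    w = np.asarray(template, dtype=float).ravel()
    ideal = (np.asarray(face_a, dtype=float) - np.asarray(face_b, dtype=float)).ravel()
    rho = (w @ ideal) / (np.linalg.norm(w) * np.linalg.norm(ideal))
    return float(rho ** 2)

rng = np.random.default_rng(0)
face_a = rng.normal(size=(32, 32))  # hypothetical stand-ins for the two choice faces
face_b = rng.normal(size=(32, 32))
ideal = face_a - face_b

# An observer who weights the top half of the image optimally and ignores the rest.
half = ideal.copy()
half[16:, :] = 0.0

eff_ideal = template_efficiency(ideal, face_a, face_b)
eff_half = template_efficiency(half, face_a, face_b)
energy_fraction = float((ideal[:16] ** 2).sum() / (ideal ** 2).sum())
print(f"ideal: {eff_ideal:.3f}, top half only: {eff_half:.3f}, "
      f"energy fraction used: {energy_fraction:.3f}")
```

In this idealized case an observer who samples a subset of pixels optimally has an efficiency equal to the fraction of the difference-image energy contained in that subset, which is one way to read an efficiency expressed as a percentage of signal energy.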

Experiment 2: Classification Images

Results show that the pattern of NCOR with FI and CR is similar to that of calculation efficiency. Both contrast-reversal and inversion reduce template efficiency, but there is an interaction such that the effect of their combination is relatively similar to the effect of either manipulation alone. This verifies the conclusions drawn in Experiment 1. However, there was an unexplained interaction between gender and orientation; the difference in NCOR between upright and inverted stimuli was much greater for female than male stimuli. Further analyses were conducted on the relationship between the order of conditions and NCOR, but no simple patterns were found. For example, it was not true that more participants were assigned to the inverted, female stimulus condition first and then the inverted male stimulus condition. If true, there might have been a learning effect that was to the advantage of performance for male, inverted stimuli. Other simple effects were also ruled out.

Four explanations of the gender X orientation effect are possible. First, one participant who did show some sort of order effect, AMC, also demonstrated the strongest use of information across most of the conditions. The gender X orientation effect may simply reflect how strongly AMC drove the statistical results for the classification images. The second possibility is that there is, in fact, something significant about the stimuli of differently gendered faces. The third possibility is that there is something significant about the two individual stimuli we selected, which has nothing to do with gender but affects the FIE nonetheless. The final possibility, more likely than the previous two, is that observers may behave differently or be motivated to different degrees when faced with slightly more difficult tasks. Absolute differences in the choice stimuli for male faces were slightly larger than those for the female faces: more information was available to recognize a male face. While our measure of efficiency allows us to directly compare performance in either task, it cannot account for the psychological effects of task difficulty. It may be that the slightly higher degree of difficulty in recognizing the female stimulus makes inversion for that set seem disproportionately more difficult, which could reduce motivation. The one conclusion that we can draw is that individual differences related to the use of stimuli of particular types, whether by gender or specific identities, are more likely to influence the data of Experiment 2 given the small sample size of four. The lack of an orientation X gender effect in Experiment 1, which had double the number of participants, provides some support for this interpretation.


There were no effects found on PSP, nor a correlation between PSP and NCOR. This strongly suggests that efficiency differences reflect a difference in the relative values assigned to pixels (or shape of the template) and not simply how many pixels were used. However, it is possible that some small decrease in PSP may account for the low efficiencies found for the N-I condition. Considering how noisy these templates were, this seems like a reasonable claim.

Correlations among the templates themselves were most informative about differences between the FIE and CRE. Although both FI and CR resulted in a similar reduction of efficiency, the templates for U-N and I-P were the least correlated pair among all sets of template correlations (including the correlation between U-P and N-I). Combined with the positive findings for NCOR and efficiency but not PSP, the previous result implies that FI and CR elicit differently distributed templates; they are equally deficient, but deficient in different ways. Nevertheless, the polarity X orientation interaction does imply some overlap in the mechanisms underlying these effects.

One problem with the above analysis of inter-template correlations is the amount of noise in these templates. Theoretically, this shouldn't matter; noise portions will correlate to zero and only decrease the magnitude of the actual template correlations. It should not change the pattern of correlation effects. Nevertheless, the possibility does exist for two randomly generated noise-fields to have an above normal level of correlation. Further tests will reanalyze inter-template correlations after each has been reduced to a smaller portion that contains a criterion level of significantly correlated pixels. This manipulation should reduce the chance of non-signal regions influencing our correlation estimates.

The spatial frequency analyses suggested some interesting differences between the sampling strategies used among the various conditions. Many of these features, however, were not described in extensive detail. The statistical analysis of amplitude spectra requires an added degree of complexity that would have distracted from the key focus of this paper: providing a simple and quantitative description of the FIE and CRE. Nevertheless, these results do provide additional confidence that further study will demonstrate a significant difference in the sampling strategy changes induced by contrast-reversal and stimulus-inversion.
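The attenuation argument made above for inter-template correlations, that independent noise shrinks every correlation without reordering them, can be checked with a small simulation. Everything in the sketch is hypothetical: the templates are synthetic unit-variance vectors, the noise is independent Gaussian noise, and the noise level of 3 standard deviations is chosen only to produce correlations of a few hundredths.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(1)
n_pixels = 256 * 256

# Hypothetical noise-free templates with unit variance: 'similar' shares more
# structure with 'base' (correlation 0.9) than 'dissimilar' does (0.5).
base = rng.normal(size=n_pixels)
similar = 0.9 * base + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=n_pixels)
dissimilar = 0.5 * base + np.sqrt(1 - 0.5 ** 2) * rng.normal(size=n_pixels)

def add_noise(template, noise_sd):
    """Return the template plus independent Gaussian estimation noise."""
    return template + noise_sd * rng.normal(size=template.size)

noise_sd = 3.0
clean_r = (pearson(base, similar), pearson(base, dissimilar))
noisy_r = (
    pearson(add_noise(base, noise_sd), add_noise(similar, noise_sd)),
    pearson(add_noise(base, noise_sd), add_noise(dissimilar, noise_sd)),
)

# Expected shrinkage when both templates carry noise of the same variance.
attenuation = 1.0 / (1.0 + noise_sd ** 2)
# Spread of the correlation between two pure noise fields of this size.
chance_sd = 1.0 / np.sqrt(n_pixels)

print("clean correlations:", clean_r)
print("noisy correlations:", noisy_r)
print("expected attenuation:", attenuation, "chance-level sd:", chance_sd)
```

The chance-level spread, roughly one over the square root of the number of pixels, indicates how large a correlation two pure noise fields can produce by accident, which is the concern raised in the text about above normal correlations between randomly generated noise-fields.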

Impact of Results on Assumptions about the Relevance of Information Type

Earlier on, we motivated the current study's quantitative approach by explaining how current face recognition research focuses too narrowly on the FIE and CRE as deficits in the ability to use some special type of information. While our aim is not to question the validity of this approach (see Appendix A), it is important to note that our results can describe the FIE and CRE without assuming any difference in the type of information that is used. Our results simply indicate a difference in the efficiency of information use. In fact, the classification images we derived across the various conditions reflect information usage with the same decision strategy, cross-correlation. This can be taken to suggest that FI and CR reduce the efficient use of the same type of information, and there is no sudden switch to usage of an alternate type of information. However, we leave this completely open and simply say that our results suggest new descriptions of information use that are much simpler.

Nevertheless, it is still possible, at least theoretically, to obtain further estimates of different non-linear templates from the data we have gathered (see Rieke et al., 1997). Therefore, our method of gauging information use can be seen as a general one that can help us to explain the FIE and CRE at varying levels of complexity. An analysis of the non-linear use of the face stimulus is planned for future study. This will help to further describe the ways in which information use may change with FI and CR. Perhaps a specific type of non-linearity can be associated with the 'relational information use' that current face recognition researchers claim is disrupted by FI and CR. If anything, our approach forces one to be explicit about how information use can differ in kind, and not simply by degree. It also promises to provide more direct evidence for such strategies in the form of classification images.

What Does It Mean to Have an Efficiency Deficit?

Simply put, a low efficiency means that one is not giving the most consideration to the most informative pixels of the target stimulus. In our specific task, the most informative pixels were those that differed the most between the two choice faces. A strict interpretation of our task means that the level of consideration given to pixels was incommensurate with the degree to which they could signal important differences between face A and face B, in particular. However, our results were discussed as though the efficiency deficit discovered could be generalized to all faces. The main reason why we can do this is that it had already been established, across many studies that used many different faces, that there are face inversion and contrast-reversal effects. It seems highly unlikely that our quantitative description of the FIE and CRE could change for different faces, which nonetheless elicited general inversion and contrast-reversal effects in a previous study. Nevertheless, there were two features of our experimental setup that reduce the possibility of a specific-face effect. First, subjects were not familiar with the choice faces prior to testing. In Experiment 1, which consisted of only 3000 trials per condition, it seems safe to assume that these subjects could not have learned enough about these particular stimuli to establish a strategy highly separated from a general face recognition strategy. The high degree of inefficiency (less than 1% of signal energy) in all cases implies that this was generally true. Second, we controlled for face gender, which is very likely to account for a large portion of the variance across individual face stimuli. Since we found the same basic pattern of results for each stimulus gender, it seems unlikely that inversion and contrast-reversal efficiency effects will not occur for other, less exclusive, categories of faces. Therefore, we can say with confidence that inversion and contrast-reversal affect the ability to utilize the most informative pixels of faces in general.

There are two likely types of causes involved in these category-specific efficiency losses. First, we may have some general knowledge of areas of the face that differ the most among most of the individual faces that we have known; regions where we should expect differences in the particular two choice faces in our task. It may be that inversion and contrast-reversal make this knowledge irrelevant. For example, at an early enough

stage in recognition, a contrast-reversed face may not seem like a 'face' at all. The observer may then use some alternate strategy normally used for objects that are novel. Second, inversion or contrast-reversal may make it difficult to apply this knowledge correctly. For example, it may not be possible to re-orient highly learned representations, either in orientation or phase space. The observer may then use their knowledge of where informative face pixels should be, but do so without making the proper adjustments. Either scenario implies some inflexibility in the learned representation of face stimuli (IR). This is what was used in the introduction to predict a singular efficiency reduction with FI and CR.

While the above argument in favor of IR is rather elaborate, this explanation is highly favorable in two major ways. First, describing the CRE and FIE as being caused by the same type of phenomenon is simply more parsimonious relative to existing theories. For example, the current logic seems to be that, if inversion disrupts 'relational' information, then contrast-reversal must disrupt some other type of information important to face recognition. Previously, that information type was argued to be shape-from-shading (Liu, Collin, Burton, & Chaudhuri, 1999), but the very authors who pioneered research in this area soon rejected it in favor of an explanation based on IR (see Liu, Collin, & Chaudhuri, 2000). A second feature in favor of an IR explanation is that it is a general principle that

seems to have some direct analogy to early visual physiology. We know that V1 is highly organized, such that the anatomical location of specific neurons can be determined from their combined sensitivities to specific orientations, locations, spatial frequencies and, possibly, phases. Internal representations of face stimuli may be coded directly from V1. In this case, an inflexible representation would imply an inability to utilize this representation, as it would be rotated 180 degrees in one of the V1 dimensions: orientation, location, spatial frequency, or phase. The first and last of these could manifest themselves as the FIE and CRE. Inflexibility along location may reflect the results of Makela et al. (2001), who found reduced calculation efficiency for face recognition outside of the fovea, independent of retinal sampling. It is possible that future MRI studies of face recognition will be able to test these predictions. Meanwhile, perhaps simulations of different 'inflexible' uses of the upright, normal-contrast template may be able to predict the templates that were obtained for inverted and contrast-reversed stimuli. In conclusion, our quantitative analysis of the FIE and CRE suggests a number of novel ways in which our knowledge of face recognition and expertise may be enriched.


References

Aguirre, G.K., Singh, R., & D'Esposito, M. (1999). Stimulus inversion and the responses of face and object-sensitive cortical areas. NeuroReport, 10(1), 189-194.

Banks, M.S., Sekuler, A.B., & Anderson, S.J. (1991). Peripheral spatial vision: limits imposed by optics, photoreceptors, and receptor pooling. Journal of the Optical Society of America A, 8, 1775-1787.

Bartlett, J.C., & Searcy, J. (1993). Inversion and configuration of faces. Cognitive Psychology, 25(3), 281-316.

Bennett, P.J., Sekuler, A.B., & Ozin, L. (1999). Effects of aging on calculation efficiency and equivalent noise. Journal of the Optical Society of America A, 16, 654-668.

Brainard, D.H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 443-446.

Burgess, A.E., & Colborne, B. (1988). Visual signal detection. IV. Observer inconsistency. Journal of the Optical Society of America A, 4, 617-627.

Cabeza, R., & Kato, T. (2000). Features are also important: contributions of featural and configural processing to face recognition. Psychological Science, 11, 429-433.

Calder, A.J., Young, A.W., Keane, J., & Dean, M. (2000). Configural information in facial expressions perception. Journal of Experimental Psychology: Human Perception and Performance, 26, 527-551.


Carey, S. (1992). Becoming a face expert. Philosophical Transactions of the Royal Society, London, 335, 95-103.

Carey, S., & Diamond, R. (1977). From piecemeal to configurational representation of faces. Science, 195, 312-314.

de Gelder, B., Bachoud-Levi, A.C., & Degos, J.D. (1998). Inversion superiority in visual agnosia may be common to a variety of orientation polarised objects besides faces. Vision Research, 38, 2855-2861.

de Gelder, B., & Rouw, R. (2000b). Paradoxical configuration effects for faces and objects in prosopagnosia. Neuropsychologia, 38, 1271-1279.

Farah, M.J., Wilson, K.D., Drain, M., & Tanaka, J.N. (1998). What is special about face perception? Psychological Review, 105, 482-498.

Farah, M.J., Rabinowitz, C., Quinn, G.E., & Liu, G.T. (2000). Early commitment of neural substrates for face recognition. Cognitive Neuropsychology, 17, 117-123.

Freire, A., Lee, K., & Symons, L.A. (2000). The face-inversion effect as a deficit in the encoding of configural information: direct evidence. Perception, 29, 159-170.

Gauthier, I., Williams, P., Tarr, M.J., & Tanaka, J. (1998). Training 'greeble' experts: a framework for studying expert object recognition processes. Vision Research, 38, 2401-2428.

Gauthier, I. (2000). What constrains the organization of the ventral temporal cortex? Trends in Cognitive Sciences, 4, 1-2.


George, N., Dolan, R.J., Fink, G.R., Baylis, G.C., Russell, C., & Driver, J. (1999). Contrast polarity and face recognition in the human fusiform gyrus. Nature Neuroscience, 2, 574-580.

Gold, J., Bennett, P.J., & Sekuler, A.B. (1999a). Identification of band-pass filtered letters and faces by human and ideal observers. Vision Research, 39, 3537-3560.

Gold, J., Bennett, P.J., & Sekuler, A.B. (1999b). Signal but not noise changes with perceptual learning. Nature, 402, 176-178.

Gold, J., Murray, R.F., Bennett, P.J., & Sekuler, A.B. (2000). Deriving behavioural receptive fields for visually completed contours.

Haxby, J.V., Hoffman, E.A., & Gobbini, M.I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223-232.

Haxby, J.V., Ungerleider, L.G., Clark, V.P., Schouten, J.L., Hoffman, E.A., & Martin, A. (1999). The effect of face inversion on activity in human neural systems for face and object perception. Neuron, 22, 189-199.

Hole, G.J., George, P.A., & Dunsmore, V. (1999). Evidence for holistic processing of faces viewed as photographic negatives. Perception, 28, 341-359.

Kanwisher, N. (2000). Domain specificity in face perception. Nature Neuroscience, 3(8), 759-763.

Kemp, R., McManus, C., & Pigott, T. (1990). Sensitivity to the displacement of facial features in negative and inverted images.


Leder, H., & Bruce, V. (2000). When inverted faces are recognized: The role of configural information in face recognition. Quarterly Journal of Experimental Psychology, 53A, 513-536.

Leder, H., Candrian, G., Huber, O., & Bruce, V. (2001). Configural features in the context of upright and inverted faces. Perception, 30, 73-83.

Legge, G.E., Kersten, D., & Burgess, A.E. (1987). Contrast discrimination in noise. Journal of the Optical Society of America A, 4, 391-404.

Lewis, M.B., & Johnston, R.A. (1997). The Thatcher illusion as a test of configural disruption. Perception, 26, 225-227.

Lillywhite, P.G. (1981). Multiplicative intrinsic noise and the limits to visual performance. Vision Research, 21, 291-296.

Liu, C.H., Collin, C.A., Burton, A.M., & Chaudhuri, A. (1999). Lighting direction affects recognition of untextured faces in photographic positive and negative. Vision Research, 39, 4003-4009.

Liu, C.H., Collin, C.A., & Chaudhuri, A. (2000). Does face recognition rely on encoding of 3-D surface? Examining the role of shape-from-shading and shape-from-stereo. Perception, 29, 729-743.

Makela, P., Nasanen, R., Rovamo, J., & Melmoth, D. (2001). Identification of facial images in peripheral vision. Vision Research, 41, 599-610.

McMullen, P.A., Shore, D.I., & Henderson, R.B. (2000). Testing a two-component model of face identification: effects of inversion, contrast reversal, and direction of lighting. Perception, 29, 609-619.


Murray, R.F., Bennett, P.J., & Sekuler, A.B. (To be submitted). The Statistics of Response Classification.

Nasanen, R. (1999). Spatial frequency bandwidth used in the recognition of facial images. Vision Research, 39, 3824-3833.

O'Toole, A.J., Abdi, H., Deffenbacher, K.A., & Valentin, D. (1995). A perceptual learning theory of the information in faces. In T. Valentine (Ed.), Cognitive and computational aspects of face recognition (pp. 159-182). New York, NY: Routledge.

Pelli, D.G. (1990). The quantum efficiency of vision. In C. Blakemore (Ed.), Vision: Coding and Efficiency (pp. 3-24). Cambridge, UK: Cambridge Univ. Press.

Pelli, D.G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437-442.

Pelli, D.G., & Farell, B. (1999). Why use noise? Journal of the Optical Society of America A, 16, 647-653.

Rakover, S.S. (1999). Thompson's Margaret Thatcher illusion: when inversion fails. Perception, 28, 1227-1230.

Rhodes, G. (1995). Face recognition and configural coding. In T. Valentine (Ed.), Cognitive and computational aspects of face recognition (pp. 47-68). New York, NY: Routledge.

Richards, V.M., & Zhu, S. (1994). Relative estimates of combination weights, decision criteria, and internal noise based on correlation coefficients. Journal of the Acoustical Society of America, 95(1), 423-434.


Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1997). Spikes: Exploring the Neural Code (1st ed.). Cambridge, MA: MIT Press.

Sperling, G. (1989). Three stages and two systems of visual processing. Spatial Vision, 4, 183-207.

Tanaka, J.W., & Sengco, J.A. (1997). Features and their configuration in face recognition. Memory & Cognition, 25(5), 585-592.

Tarr, M.J., & Gauthier, I. (2000). FFA: a flexible fusiform area for subordinate-level visual processing automatized by expertise. Nature Neuroscience, 3(8), 764-769.

Treves, A. (1997). On the perceptual structure of face space. BioSystems, 40, 189-196.

Young, A.W., & Bruce, V. (1991). Perceptual categories and the computation of "Grandmother". In V. Bruce (Ed.), Face Recognition: a special issue of the European Journal of Cognitive Psychology (pp. 5-49). Hove, E. Sussex: Lawrence Erlbaum Associates Ltd.


Appendix A: The Relationship Between the FIE and 'Relational' Information Use

Researchers who have assumed some relationship between face inversion and the use of relational information have often cited the composite face effect and Thatcher illusion as justifications for this assumption (Farah et al., 1998; Freire, Lee, & Symons, 2000; Gauthier et al., 1998; Hole, George, & Dunsmore, 1999; Leder & Bruce, 2000; Lewis & Johnston, 1997; McMullen, Shore, & Henderson, 2000; Rakover, 1999; Tanaka & Sengco, 1997).

The Composite Face Effect

A composite face image is one whose top and bottom halves correspond to two different people. Where observers must identify one of these halves, the composite effect refers to the strong gain in recognition performance when the composite image of the face is misaligned (Young, Hellawell, and Hay, 1987). Young et al. attribute the relatively weaker ability to identify a face half during composite alignment to the normal processing of relational information that this alignment somehow encourages. Relational information in this case, at least between the top and bottom halves, would be detrimental because it is unlikely that this information would accurately correspond to the identity of the face half in question. The interesting aspect of this effect is that it can be nullified by inversion. Specifically, composite inversion results in the same performance advantage as composite misalignment, such that there is no performance difference between aligned and misaligned composites when both are inverted (Young et al., 1987). If conclusions about the nature of the original composite effect are true (i.e., stimulus alignment elicits the use of false relational information), then these results suggest that the FIE can be interpreted in the same way. Inversion may disrupt the use of relational information and, during a composite face task, provide the same type of advantage that misalignment does.

However, there may be a simpler explanation. An observer engaged in the composite identification task may simply be more inclined to use the entire face region during an identification task, even when made aware that such use may lead to incorrect decisions. In this case, a misaligned composite may lead to better performance because the misleading half of the stimulus is not where it is expected to be. The observer may attempt to use the misleading half in addition to the relevant half, and not necessarily in some interactive combination with the relevant half, as relational information use would suggest. Since the rest of the face is not where it is expected to be, only the top half is used for the decision; there is no 'fusing' of halves (Young et al., 1987; Gauthier et al., 1998).

In other words, there is an ambiguity about whether misalignment disrupts an ability to perceive relational properties between composite halves, or simply prevents us from using additional information that we should not be using. This ambiguity also extends to the advantage for composite recognition shown with inverted faces. Assume that the task is to identify the top half of a composite, the half spanning from the eyes to the hairline. Face stimuli may be over-learned in the upright view such that our attempt to use the rest of the face directs us toward the space below the eyes (toward the 'ground'), which is the forehead of the correct half when the composite is inverted, but the nose and mouth of the incorrect half when the composite is upright.

Difficulty Confounds

Configural, or relational, information is often defined as the relative spacing between local features. Local features, in turn, are often conventionally defined (e.g., eyes, mouth). Studies often attempt to evaluate recognition performance in cases where either type of information is used selectively. They ensure that this is the case by employing a task where, to evaluate configural information for example, all stimuli within a set to be matched with a target are equivalent in terms of their local features. In order to identify a target correctly, then, the subject must use the spacing of these features. In other words, experimenters use a match-to-sample or 1-of-many face identification task where faces within the set of possible targets are differentiated only by either configural or featural information (Leder & Bruce, 2000; Leder, Candrian, Huber & Bruce, 2001; Gauthier et al., 1998).

The intrinsic difficulty of identification in these experiments (the limitation determined solely by the availability of overall information) is never controlled for. This presents a problem in interpreting the FIE. The key interest in this type of study is the degree of interaction between the hypothetically important stimulus manipulation and orientation; this is the inversion effect, isolated to a specific aspect of the stimulus. However, if the degree of physical distinctiveness between members of the featural set is vastly greater than that between members of the configural set, the interaction we are studying may, in fact, be one between task difficulty and orientation.

A difficulty × orientation interaction might turn out to be interesting, but this is totally different from what researchers are really concerned with. This same principle can also apply to studies that infer configural information use with other methods. For example, Cabeza & Kato (2000) use the prototype effect in order to compare configural and featural information use. In this paradigm, subjects are required to indicate whether they have seen a given face in a previously learned stimulus set. Prototypical faces, defined either by the average of local feature geometries from the learned set (indicating a prototypical configuration) or by a combination of one feature from every member (featural), are included as targets along with the faces actually viewed. Hypothetically, a subject will be more likely to incorrectly perceive one of these prototypes as having been seen in the learned set if the prototype is similar to the set along a dimension that is important to face memory. The relative importance of this prototype category is assessed by comparing false recognition rates for prototypes defined from the learned set with those for prototypes of a completely different set. As one can see, however, two things determine this relative measure: the actual difference between the learned and unlearned sets (giving the difference among real and unreal prototypes), and a subject's sensitivity to the information defined by these prototypes. A measure of efficiency is, in fact, necessary in many cases.
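The difficulty confound described above can be illustrated with a small numerical sketch. It assumes that efficiency is defined as the ratio of an ideal observer's threshold energy to the human observer's threshold energy; the threshold values, the stimulus set labels, and the function names are all hypothetical and serve only to show how a raw set × orientation interaction can disappear once performance is normalized by the information available in each set.

```python
# Hypothetical illustration: every threshold value below is invented.

def efficiency(ideal_threshold_energy, human_threshold_energy):
    """Efficiency as ideal / human threshold energy (assumed definition)."""
    if ideal_threshold_energy <= 0 or human_threshold_energy <= 0:
        raise ValueError("threshold energies must be positive")
    return ideal_threshold_energy / human_threshold_energy

# (ideal, human) threshold energies in arbitrary units.
# The configural set is intrinsically harder: its ideal threshold is 4x higher.
thresholds = {
    ("featural", "upright"): (1.0, 10.0),
    ("featural", "inverted"): (1.0, 20.0),
    ("configural", "upright"): (4.0, 40.0),
    ("configural", "inverted"): (4.0, 80.0),
}

def raw_inversion_cost(stimulus_set):
    """Human threshold energy, inverted minus upright."""
    _, upright = thresholds[(stimulus_set, "upright")]
    _, inverted = thresholds[(stimulus_set, "inverted")]
    return inverted - upright

def efficiency_ratio(stimulus_set):
    """Upright efficiency divided by inverted efficiency."""
    upright = efficiency(*thresholds[(stimulus_set, "upright")])
    inverted = efficiency(*thresholds[(stimulus_set, "inverted")])
    return upright / inverted

for stimulus_set in ("featural", "configural"):
    print(stimulus_set,
          "raw cost:", raw_inversion_cost(stimulus_set),
          "efficiency ratio:", efficiency_ratio(stimulus_set))
```

With these invented numbers, the raw inversion cost is four times larger for the configural set than for the featural set, yet inversion halves efficiency in both. The apparent interaction in the raw measure reflects only the difference in intrinsic difficulty between the sets.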

Affective Condition Hypothesis

A simpler, more general explanation of results in many inversion effect experiments is that part-based processing is invariantly used, but that the shock or distraction caused by viewing a normally upright stimulus upside-down strongly affects behavior in some way. For example, the Thatcher illusion (Rhodes, 1995) is used to support part-based processing in the inverted condition since we do not notice the 'grotesque' distortion produced by inverting feature parts in this case; it is assumed that such part-inversions do not affect relational information. However, it could also be true that the inverted face, as a whole, elicits such shock as to distract the observer from the feature-inversion. In this case, their ability to perceive altered relationships in the inverted condition would have no relevance to actual performance.

Conveniently, the distracter hypothesis can also be used to explain more detailed FIE experiments like the Leder & Bruce study (1998). Following the basic logic of the distracter hypothesis, it seems likely that the more of the face that you see, the more the target stimulus will look face-like, and the more likely you are to experience shock at the inversion of an object that is hardly ever seen inverted in real life. Applying this to Leder & Bruce, the behaviorally relevant difference between the local and relational features may be that the latter features look more face-like, and are thus more distracting when viewed upside-down. This explanation can also apply to composite effects, since the 'weirdness' of face inversion may distract observers from the incorrect composition of different halves. In this case, inversion may not necessarily prevent the processing of the holistic or configural information between halves, but may distract the observer from doing so.