NASA Technical Memorandum 102279

Techniques and Applications for Binaural Sound Manipulation in Human-Machine Interfaces

Durand R. Begault and Elizabeth M. Wenzel
Ames Research Center, Moffett Field, California

August 1990

National Aeronautics and Space Administration
Ames Research Center
Moffett Field, California 94035-1000


CONTENTS

SUMMARY
INTRODUCTION
BINAURAL TECHNIQUES AND PSYCHOACOUSTIC CONSIDERATIONS
    Lateralization
    Headphone Localization
    Decorrelation
APPLICATIONS OF BINAURAL SOUND
    Introduction
    Increased Sensitivity for Communication
    Cognitive Representation of Auditory Space
        Introduction
        Urgency and redundancy
        Auditory icons
        Location of auditory cues in relation to exocentric objects
CONCLUDING REMARKS
REFERENCES


TECHNIQUES AND APPLICATIONS FOR BINAURAL SOUND MANIPULATION IN HUMAN-MACHINE INTERFACES

Durand R. Begault and Elizabeth M. Wenzel

Ames Research Center

SUMMARY

The implementation of binaural sound to speech and auditory sound cues ("auditory icons") is addressed from both an applications and technical standpoint. Techniques overviewed include processing by means of filtering with head-related transfer functions. Application to advanced cockpit human interface systems is discussed, although the techniques are extendable to any human-machine interface. Research issues pertaining to three-dimensional sound displays under investigation at the Aerospace Human Factors Division at NASA Ames Research Center are described.

INTRODUCTION

In normal hearing we use both ears, which allows access to the spatial information fundamental in interacting with the auditory environment. In spite of this, important auditory information in human-machine interface contexts such as aviation is usually received over a monotic (one-ear) headset. It is surprising that, while advanced cab aircraft such as the McDonnell-Douglas MD-88 and the Boeing 767 incorporate highly sophisticated visual displays, their warning sounds are largely a proliferation of semantically unrelated signals that take no advantage of the spatial information which plays so important a role in everyday experience.

Research into improving the human-machine interface is partially motivated by operator overload in "high stress" cockpit systems; for example, one source reports that at least 65% of jet transport accidents during 1977-1987 resulted from human errors (Hughes, 1989). One direction for improvement is to give attention to the perceptual advantages of hearing over vision for communicating important information to an operator.

Because spatial hearing is a part of everyday experience that is important for both survival and orientation, it is sensible to determine how it can be manipulated for conveying information in a human-machine interface.

The types of binaural sound manipulation that are feasible to implement depend on the source of the signal. In an aircraft context, there are two distinct types of sources: (1) speech communication using radio transmission that originates from ground control or other aircraft, and (2) speech and warning signals that originate from the audio system installed in the cockpit.

For both kinds of signals, binaural sound can improve the intelligibility of speech sources against noise, and assist in the segregation of multiple sound sources. For signals originating in the cockpit, sound spatialization can also be used to organize locations in perceptual auditory space, and to convey urgency or establish redundancy.

This paper reviews both established and evolving techniques for the binaural presentation of sound. Although the example in the application section of the paper makes reference to commercial aircraft cockpits, the results are extendable to air traffic control (ATC), rotorcraft, sonar display, and other multiple-channel human-machine interfaces. The first section reviews binaural spatialization techniques, along with relevant psychoacoustic considerations for their use. The second section shows how these techniques can be used for improving cockpit auditory displays.

This work was performed while the author held a National Research Council research associateship.

BINAURAL TECHNIQUES AND PSYCHOACOUSTIC CONSIDERATIONS

Spatial hearing refers to the perceived location, size, and environmental context of a sound source. In the case of headphone audition, three categories of techniques for manipulating the spatial element of sound are discussed: lateralization, headphone localization, and decorrelation. These categories are also generally (but not uniquely) distinguished by the kind of spatial percept that results from the particular technique.

Lateralization techniques involve manipulation of interaural intensity and/or time differences at each ear; the resulting percept is usually of the sound source moving along the intracranial axis between the ears. Headphone localization techniques simulate spatial hearing in the free field or in an environmental context, essentially by replicating over headphones the interaural differences that occur naturally. In its most successful implementation, the resulting percept can be of an externalized source in three-dimensional space. Decorrelation techniques implement interaural differences using methods other than those mentioned above; in this paper, phase inversion and reverberation techniques are examined.

Lateralization

Lateralization techniques take advantage of two separate mechanisms of the auditory system that are involved in spatial hearing. One mechanism evaluates the amplitude differences at the two ears, while another mechanism evaluates time differences. Human sensitivity to these differences supports what is known in the psychoacoustic literature as the "duplex theory" of localization; the differences are abbreviated as interaural level difference (ILD) and interaural time difference (ITD), respectively. It is widely accepted that ILD operates over the entire frequency range, while ITD operates on the fine structure of signals for frequencies below approximately 1.6 kHz, and on the envelope of the signal for frequencies between approximately 200 Hz-20 kHz (Blauert, 1983).

If we present a signal to each speaker of a set of headphones with no ITD or ILD, the sound is heard in the middle of the head. As ITD or ILD is increased past a particular threshold, the sound will begin to shift toward the ear leading in time or greater in amplitude. Once a critical value of ITD or ILD is reached, the sound stops moving along the intracranial axis and remains at the leading or more intense ear. The effective range of ITD is up to approximately 1.5 ms, and the effective range of ILD is around 10 dB. The upper range of ILD is more difficult to determine than that of ITD: beyond approximately 10 dB, a change in position resulting from ILD is easy to confuse with the corresponding change in auditory extent that occurs around this point (Blauert, 1983). Figure 1 shows these differences rated by subjects on a 1 to 5 scale, where 5 represents maximum displacement.

The results shown in figure 1 are valid for speech and noise. It is relatively easy to implement an ITD/ILD digital signal processing algorithm based on these data. For example, consider placing a signal at the extreme left position in the head. We derive the left and right channel outputs y(n)L and y(n)R from an input signal x(n) by multiplying by a gain factor g for ILD:

    y(n)L = x(n)
    y(n)R = x(n) g,    g ~ 0.3

and by delaying the input signal by t for ITD:

    y(n)L = x(n + t),    t = 1.5 ms
    y(n)R = x(n)

Figure 1. Measurements of interaural differences rated for lateral displacement: a) ILD; b) ITD.

Figure 2 shows the waveform displays and circuit diagrams for these techniques. In practical terms, the perceptual effects of ILD and ITD on lateralization are additive. Figure 3 illustrates how ILD and ITD can be combined so as to produce three different spatial positions for incoming radio sounds heard dichotically over headphones. In this example, ILD and ITD are pitted ("traded") against each other to produce three distinctly heard intracranial locations: extreme right, extreme left, and center.

The spatial positions attainable in lateralization are severely limited compared to spatial hearing in the real world. Usually, sounds are heard inside the head, along a line centered between the ears. A more desirable condition would be to have the ability to place sounds outside of the head in any position in three-dimensional space, as in normal spatial hearing.
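In code, the two lateralization operations above amount to a gain on one channel and a fractional-millisecond delay on the other. The sketch below is a minimal illustration; the function name and the 44.1 kHz sample rate are assumptions, not part of the report:

```python
import numpy as np

FS = 44100  # assumed sample rate in Hz (not specified in the report)

def lateralize_left(x, g=0.3, itd_ms=1.5):
    """Push a mono signal x toward the extreme left by attenuating the
    right channel (ILD, gain g) and delaying it (ITD, itd_ms)."""
    d = int(round(itd_ms * 1e-3 * FS))            # ITD in samples
    left = np.concatenate([x, np.zeros(d)])        # padded so channels match
    right = np.concatenate([np.zeros(d), x]) * g   # delayed and attenuated
    return left, right

x = np.random.default_rng(0).standard_normal(FS // 10)  # 100 ms noise burst
L, R = lateralize_left(x)
```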

Headphone Localization

At one time, localization of a sound outside of the head was thought to be only possible with "real world" (nonheadphone) listening, and all headphone listening was assumed to be lateralized; the term "headphone localization" would be considered an oxymoron. Plenge (1974) was one of the first researchers to demonstrate that localization was indeed possible with headphones. The basis for his argument was subjective judgments of the externalization of a sound source when recorded with a mannequin head with microphones placed behind the pinnae.

Figure 2. Waveform display and circuit diagrams for lateralization techniques, a) ILD; b) ITD.

Figure 3. Circuit scheme combining ITD and ILD lateralization techniques to spatially separate three inputs (e.g., radio transmission).

Recent work by Wightman and Kistler (1989b) has reasserted the notion that externalized localization of sounds in three-dimensional space is possible with headphone listening; their work differs from Plenge's in that they substituted digital signal processing techniques for the mannequin head recording process, a technique first used by Platte and Laws (Blauert, 1983). This technique, and its advantages and limitations, are reviewed below.

The technique for implementing headphone localization involves creating a digital filter based on measurements of the head-related transfer function (HRTF). The HRTF can be thought of as a frequency-dependent amplitude and time delay that results from the resonances of the pinnae and the ear canal, and from the effects of head shadowing. These effects combine differentially as a function of sound source direction; hence, there is a frequency and group delay transfer function imposed on an incoming signal that is unique for any given source position. In other words, the HRTF alters the spectrum of the input signal in a spatially dependent way.

Psychoacoustic research has established the importance of the HRTF for spatial hearing, partly because it complements the "duplex theory" of localization. It explains median plane and elevation perception and front-back positions, situations where the ILD and ITD are close to 0 and/or below threshold. Other researchers have shown that localization acuity is diminished overall without pinnae cues (Parker and Oldfield, 1984).

The HRTF is measured by placing a probe microphone close to the eardrum, or at the entrance to the ear canal, of a subject whose head is immobilized (Mehrgardt and Mellert, 1977; Wightman and Kistler, 1989a). An impulse is sounded from a speaker at a carefully adjusted position in relation to the listener; the signal at the microphone is then recorded, and the procedure is repeated for the desired number of positions. The goal of analysis is to obtain an impulse response for use in subsequent digital filtering algorithms. In simplified terms, the measured signal is the convolution of the speaker impulse x(n) with h(n) (the HRTF for that position) along its path of transmission to the microphone:

    y(n) = x(n) * h(n)

Since x(n) is the unit-sample impulse (x(n) = 1), it follows that h(n) = y(n). By taking the Fourier transform of h(n), we can obtain the frequency and phase response, in essence the spectrum of the HRTF:

    H(e^jw) = SUMn h(n) e^(-jwn)

Figure 4 shows the magnitude of the HRTF for a single subject, for different angles of incidence. To implement HRTF processing for headphone sound, the spectrum of the incoming sound must be multiplied by the spectra of two HRTF measurements, one for each ear. This is accomplished digitally by the equivalent operation of time-domain convolution, using two finite impulse response (FIR) filters (Oppenheim and Schafer, 1975):

    y(n)L = x(n) * h(n)L;    y(n)R = x(n) * h(n)R
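In code, the two filtering operations amount to convolving the input with each ear's measured impulse response. The following sketch uses NumPy; the short impulse responses are invented stand-ins, not measured HRTFs:

```python
import numpy as np

def binauralize(x, h_left, h_right):
    """Filter mono x with a left/right HRTF impulse-response pair
    (time-domain convolution with two FIR filters)."""
    return np.convolve(x, h_left), np.convolve(x, h_right)

# Stand-in impulse responses for illustration; real h(n)L and h(n)R
# would come from the measurement procedure described above.
h_left = np.array([1.0, 0.5, 0.25])        # nearer ear: earlier, stronger
h_right = np.array([0.0, 0.0, 0.4, 0.2])   # farther ear: later, weaker
x = np.random.default_rng(1).standard_normal(1024)
y_left, y_right = binauralize(x, h_left, h_right)

# The spectrum H(e^jw) of a filter is the DFT of its impulse response:
H_left = np.fft.fft(h_left, 256)
```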

Figure 4. HRTF measurements: one subject, ipsilateral ear, source at 0, 90, and 270° azimuth (based on data from Wightman and Kistler, 1989a).

Figure 5 shows this method; the frequency responses of the two filters for synthesizing a source directly opposite the right ear are shown. (The frequency response of the HRTF was given by Blauert and then derived by Begault using a digital filter design program (Begault, 1987; Blauert, 1983).)

How successful is the technique, in terms of producing a veridical experience of three-dimensional auditory space? Figure 6 illustrates results obtained by Wightman and Kistler (1989b) that compare free-field and headphone localization performance. The data for both conditions seem in close agreement, but it must be noted that front-back reversals (e.g., mistaking a 0° sound source for a 180° sound source) were corrected in the data analysis. These reversals increased in the headphone case: based on data from eight subjects, the percentage of front-back confusions from the total number of judgments made is 3-12% with free-field listening, and 6-20% with headphone localization. Individual differences were also marked; the subjects who localized badly over headphones also tended to be the ones that localized badly in free-field conditions.

There is a problem with the technique in that some people are unable to externalize HRTF-processed sound heard through headphones. This is particularly troublesome for sources synthesized on the median plane; sources are usually perceived as if they are inside of the head. Often, a "bow tie" pattern is reported when a circle of constant radius from the center of the head is specified (see fig. 7).

We propose three areas for improving headphone localization performance: (1) using HRTF filtering in conjunction with a head-tracking device to allow a dynamic orientation of the listener to the virtual sound position, (2) adding environmental cues in the form of reverberation and/or early sound reflections, and (3) obtaining or synthesizing "optimal" HRTFs.

A magnetic head-tracking device that is attached to a set of headphones can transmit three-dimensional coordinates about a listener's head and body position to a receiving device. At Ames, Wenzel, Wightman, and Foster have developed a hardware/software system known as the Convolvotron (Wenzel, Wightman, and Foster, 1988) that assigns appropriate FIR filters to an input source (see fig. 8). Because the granularity of head movement recorded by the tracker is finer than that of the sampling of positions measured by the HRTF, an interpolation scheme that takes the output coordinates of the tracker must be used to derive the appropriate impulse responses for the FIR filters at intermediate positions.
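The report does not specify the Convolvotron's weighting scheme; as an illustration only, one simple possibility is a bilinear blend of the four stored impulse responses surrounding the tracked direction:

```python
import numpy as np

STEP = 30  # assumed angular spacing (deg) of the measured HRTF positions

def interpolated_hrir(az, el, stored):
    """Blend the four stored impulse responses surrounding (az, el) with
    bilinear weights; `stored` maps (azimuth, elevation) in degrees to
    equal-length NumPy arrays."""
    az0 = (az // STEP) * STEP
    el0 = (el // STEP) * STEP
    fa = (az - az0) / STEP
    fe = (el - el0) / STEP
    h = np.zeros_like(stored[(az0 % 360, el0)])
    for da, wa in ((0, 1.0 - fa), (STEP, fa)):
        for de, we in ((0, 1.0 - fe), (STEP, fe)):
            h += wa * we * stored[((az0 + da) % 360, el0 + de)]
    return h

# Toy measurement set: unit impulses whose peak sample encodes direction.
stored = {(a, e): np.eye(6)[(a // 60) % 6]
          for a in range(0, 360, STEP) for e in (0, 30, 60)}
h = interpolated_hrir(40, 10, stored)
```

Because the four weights sum to one, the blended response stays at the overall level of the stored measurements.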

Figure 5. Headphone localization technique based on HRTF filtering.

Figure 6. Free-field and headphone localization performance for a single subject (based on data from Wightman and Kistler, 1989b).

Figure 7. Perceptual results of HRTF-processed stimuli: positions frequently reported by subjects when presented with stimuli synthesized at positions equally distant (approximately 2 meters) from the center of the head. Note the difficulty in externalizing the sounds.

Another solution to headphone localization errors is to add "environmental" cues. With the method of HRTF filtering described above, the stimulus is convolved with measurements made under anechoic conditions; it lacks environmental information in the form of early reflections and reverberation. Early reflections accompany a sound source in most listening contexts, and arrive differentially at the two ears in terms of time of arrival, intensity, and spatial angle of incidence. This situation can be modeled with a ray-tracing computer program that accounts for the dimensions of the enclosure and the positions of the listener and sound source; figure 9 shows one such program developed by Begault at Ames. Late reflections (reverberation) may also play a role in externalization and front-back confusions; manipulation of the ratio of direct to reverberant sound has been shown to affect perception of auditory distance (Begault, 1987; von Békésy, 1960).
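The core of such an early-reflection model can be sketched with first-order image sources; the room geometry, positions, and the single absorption factor below are illustrative assumptions, not values from the program shown in figure 9:

```python
import math

C = 343.0  # speed of sound in air, m/s

def first_order_reflections(src, lst, width, depth, absorption=0.5):
    """Delays (s) and relative amplitudes of the four first-order wall
    reflections in a 2-D rectangular room spanning [0, width] x [0, depth]."""
    sx, sy = src
    images = [(-sx, sy), (2 * width - sx, sy),   # left and right walls
              (sx, -sy), (sx, 2 * depth - sy)]   # front and back walls
    direct = math.dist(src, lst)
    return [(math.dist(img, lst) / C,                           # arrival time
             (direct / math.dist(img, lst)) * (1 - absorption))  # 1/r loss x wall loss
            for img in images]

refl = first_order_reflections(src=(4.9, 4.0), lst=(2.0, 3.0), width=6.0, depth=6.0)
```

Each reflection arrives later and weaker than the direct sound; convolving each with the HRTF pair for its angle of incidence would supply the binaural environmental cues described above.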

Figure 8. Basic method used by the Convolvotron; circuit shown for one sound source only. HT = head tracker; i = software for interpolating between four pairs of stored HRTF impulse responses.

An increase in the number of early reflections results in a corresponding change in the ratio of energy of the early reflections with respect to the direct sound, and the sound is less spatially correlated in its direction of arrival to the listener.

Another area for improvement of synthesized localization has to do with the HRTF itself. If we want to use headphone localization techniques for the general population, can we use a single "set" of HRTF measurements, and if so, how do we derive it? There are essentially two general approaches: averaging methods, and using the HRTFs of a "good localizer." In the first approach, one gathers HRTF data from a number of subjects and uses some method of averaging the data; examples can be found in the work of Blauert and of Mehrgardt and Mellert (Blauert, 1983; Mehrgardt and Mellert, 1977).

Figure 9. Example of user interface for the ray-tracing program used at Ames for synthesizing early reflections. The user supplies source and listener positions, the size and shape of the environmental context, the absorptive properties of the boundaries, the number of early reflections, and the orientation of sound source and listener; the program produces graphic output, a binaural reflectogram showing the relative amplitude of reflections within each 30-degree segment about the listener, and an ASCII score that allows signal processing of a given sound according to the model.

Another form of averaging is to use techniques such as principal components analysis to find significant features within each critical band. The second approach, which we are currently examining at Ames, is to use the HRTFs of a good localizer, i.e., someone whose localization performance, both with and without headphones, is superior compared to other subjects tested under the same conditions. The problem with the averaging approach is that individual differences in the magnitude and phase characteristics of the transfer function become "smoothed out," resulting in a transfer function with less extreme minima and maxima. The problem with the good localizer approach is not knowing to what degree the subject's performance was idiosyncratic. We anticipate that research using both approaches will help to converge on a set of generalized HRTFs.
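The principal-components idea can be sketched as follows; synthetic random spectra stand in for measured HRTF data, and the function name is an assumption:

```python
import numpy as np

def hrtf_principal_components(spectra, k=3):
    """First k principal components of a (subjects x frequency-bins)
    matrix of HRTF magnitude spectra, via the SVD of the centered data."""
    centered = spectra - spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]  # rows are orthonormal spectral basis functions

# Synthetic stand-in data: 16 subjects x 64 frequency bins (dB values).
spectra = np.random.default_rng(2).normal(size=(16, 64))
basis = hrtf_principal_components(spectra, k=3)
```

Projecting any subject's spectrum onto such a basis retains the features shared across subjects while discarding idiosyncratic detail.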

Decorrelation

Lateralization and headphone localization are not the only techniques for producing binaural sound. ILD, ITD, and HRTF filtering each transform a single input signal into two different output signals, one for each ear; this can be viewed as a decorrelation process. The number of decorrelation processes that can significantly affect both the spatial and timbral dimensions of a sound is perhaps innumerable, and many are based on combinations of these techniques. Two that are perhaps the most well-known and straightforward in their implementation, phase inversion and multiple delay lines, are examined here (see fig. 10).

Figure 10. Decorrelation techniques, a) Phase inversion; b) cascaded delay line with interaural shift.

Phase inversion and delay line decorrelation as described here do not affect the localization of the "center of gravity" of a broadband signal. Rather, they affect the percept of the "diffuseness" or spatial extent of a sound source. For example, two sounds can both be localized intracranially, but the decorrelated sound will seem to be more "spread out" than the correlated one (see fig. 10(a)). With binaural phase inversion of speech, the spreading-out effect occurs primarily with the lower frequency components. An explanation for this phenomenon can be given in terms of the frequency-selective nature of ITD perception. As mentioned previously, sensitivity to the ITD of the fine structure of a waveform operates only below approximately 1.6 kHz, while sensitivity to the ITD of the waveform envelope operates between approximately 200 Hz-20 kHz (Blauert, 1983). With 180° phase inversion, the envelope of the signals at the two ears is identical, but the fine structure of each harmonic component is time delayed by a half-cycle. Speech contains frequencies within the operating range of both forms of ITD; hence, a differential spreading of sound occurs as a function of frequency.

a simplified

adds four

kHz

at the two ears,

ponent is time delayed by a half-cycle. both forms of ITD, hence, a differential A multiple

approximately

occurs

but

of producing

version

time delays

two decorrelated

of a program

to the signal

available

within

signals on a typical

an approximately

from

of

a single

stereo

digital

25 ms period,

resulting in a timbral change to the signal that is similar to the effect of early reflections heard from walls in a small room. Additionally, by implementing a slight (< 2 ms) time shift between the delay pattern

at each

channel,

the sound

image

is perceived

as being

larger

phase inversion technique. This process is similar to the interaction environmental context: timbral changes result because of decorrelated

13

or more

spread

out, as with the

of a sound source patterns of reflected

within sound.

an
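A minimal sketch of such a delay-line decorrelator; the tap times, gains, sample rate, and function name are invented for illustration:

```python
import numpy as np

FS = 44100  # assumed sample rate in Hz

def multitap(x, taps_ms, gains):
    """Add delayed, scaled copies of x at the given tap times (ms)."""
    y = x.copy()
    for t_ms, g in zip(taps_ms, gains):
        d = int(round(t_ms * 1e-3 * FS))
        y[d:] += g * x[:len(x) - d]
    return y

taps = [5.0, 11.0, 17.0, 23.0]      # four taps within ~25 ms (illustrative values)
gains = [0.7, 0.5, 0.35, 0.25]
x = np.random.default_rng(3).standard_normal(FS // 2)
left = multitap(x, taps, gains)
right = multitap(x, [t + 1.5 for t in taps], gains)  # < 2 ms interaural shift
```

Shifting the right-channel tap pattern by a fraction of a millisecond decorrelates the two outputs, producing the spread-out image described above.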

APPLICATIONS OF BINAURAL SOUND

Introduction

This section overviews two application areas for binaural sound in a commercial cockpit application. The first is applicable only to speech that comes to the aircraft by radio transmission; both areas apply to speech and warning sounds that originate in the cockpit. These areas are Increased Sensitivity for Communication (using the binaural advantage over monotic or diotic listening for intelligibility of speech and for segregation of a desired source from multiple sources) and Cognitive Representation of Auditory Space (using perceived auditory spatial positions as carriers of semantic information). Three dimensions of semantic information important to warning signals are discussed: urgency (using auditory space to indicate the "levels of urgency" of an auditory warning signal), redundancy (organizing different perceived auditory spatial positions into a grammar that is redundant to a verbal or iconic, nonspatial grammar), and location (using the perceived location of auditory icons to indicate the positions of exocentric objects).

Increased Sensitivity for Communication

The advantage at issue is not the "pleasing" or "spacious" quality of sound that recording engineers create when they mix multitrack recordings for two-channel delivery. More profound is the fact that binaural delivery of sound allows a listener to clearly separate multiple sound sources: to suppress undesirable sources such as noise, and to segregate layers of sound sources to discrete positions in auditory space. This advantage can be easily demonstrated by listening to a person speak in a noisy environment with one ear plugged, compared to listening with both ears open; the noise seems to interfere less with the stream of speech when we use both ears. This is in stark contrast to the situation of a pilot listening to many undifferentiated voices over a monotic headset. Indeed, aviation accidents may have occurred because a pilot was unable to attend to the "correct" voice out of the several that he hears coming over the single-channel radio transmission.

Cherry (1953) and Cherry and Taylor (1954) called the means by which we separate a single stream of speech from a group of people who are all speaking simultaneously "the cocktail party effect." Although the segregation of sound sources in everyday spatial hearing is not fully understood, it is a powerful demonstration of the difference between binaural and monaural hearing, and it has led to a number of studies comparing intelligibility under binaural and monaural listening. Usually, the level of a masker (noise or other voices) and the level of the desired signal necessary for intelligibility are measured under both monotic and dichotic conditions. The difference in dB between the two cases, i.e., the improvement in intelligibility due to binaural presentation, is termed the binaural intelligibility level difference, BILD. Results from experimental data evaluating the BILD differ depending on the stimuli used and the

criteria for evaluating intelligibility; generally, it ranges from 3 to 12 dB (Blauert, 1983). Koenig (1950) also described the advantage of binaural over monaural hearing for squelching noise; his results were verified with burst stimuli by Zurek (1979).

Work by Bronkhorst and Plomp (1988) measured the BILD as a function of angle of incidence, using conditions that essentially compare the lateralization techniques discussed earlier (pure ITD or ILD) to headphone localization techniques (listening through the ears of a mannequin head). Figure 11 shows their results; the improvement using headphone localization techniques is around 2-4 dB.

Figure 11. BILD measurements as a function of azimuth: comparison of ILD, ITD, and HRTF-filtered conditions (based on data from Bronkhorst and Plomp, 1988).

Informal studies conducted at Ames using HRTF filtering for selective attention to four voices (60, 90, 240, and 300° azimuth) were judged to be superior to ILD and ITD techniques; although the BILD was not measured, the externalization provided by the HRTFs was maximal compared to ILD or ITD. Additional information can be found in NASA TM-102826. Because of binaural loudness summation, a binaurally equipped pilot would also need less amplitude of the signal at each ear, giving an additional advantage in suppressing hearing fatigue.

Some pilots object to using headphones on the basis that it isolates them from other crew members. It is possible to alleviate this by using microphones in a monitoring system arrangement: speech from crew members could be mixed into the overall sound texture of air traffic control (ATC) communication and cockpit warnings. This has the added advantage that repositioning the body and raising the voice to be heard above background noise (the Lombard effect) could be avoided, resulting in less fatigue.

Cognitive Representation of Auditory Space

Introduction-- A binaural simulation can potentially define a synthetic auditory space. We define a "synthetic auditory space" as the listener's organization of the location of sounds within the "mind's aural eye," by using mental imagery. It involves representation of sound within the mind that is primarily analog, rather than propositional, in content. The idea that an analog, imagery-based representational system exists is well supported by studies of mental transformations of objects (Cooper and Shepard, 1978), visual imagery (Kosslyn, 1981), and cognitive maps (Goldin and Thorndyke, 1982; Kuipers, 1980). Although an imagery system concerned with auditory space has not been as clearly studied, experiments based on auditory input indicate that the two systems interact in the encoding of spatial information, as well as in auditory and visual perceptual biases (Bertelson and Radeau, 1981).

Spatially correspondent analog representations are also supported by the importance of localization for survival. Spatial pattern recognition would seem to be a fundamental requirement for animal survival in an environment in which many events occur simultaneously in different spatial locations. Especially in the absence of visual cues, it is likely that a cognitive representation of auditory space exists in the listener in connection with spatial hearing.

Urgency-- The application of spatial content to auditory signals for indicating urgency has been discussed previously in the literature, albeit in a simplistic manner and with "impoverished" stimuli, such as pure tones (Mudd, 1965). Doll et al. (1986) concluded from their research that binaural cues for cockpit warning systems would be beneficial; they used a mannequin head recording system of the kind described previously. One possible technique for auditory displays is to manipulate the spatialization of voice commands, using the lateralization, headphone localization, and decorrelation techniques described previously. From an applications standpoint, we are interested in eliciting a spatial map for establishing semantic meaning as a function of position, so that the perceived position of a command, and not only its semantic content, conveys the urgency of the message.

An analogy is to the kind of communication that occurs between parents and children. Children can immediately tell from the voice intensity and position of a parent whether a command is urgent or not, well before they actually discern the semantic content of the message; the spatial and intensity cues tell them that they had better pay attention, and the urgency is interpreted well before the words themselves.

Figure 12 shows a possible spatial organization for indicating the relative urgency of a command. The numbered dots represent perceived auditory positions. Position 1 was synthesized with a diotic signal (no interaural difference); position 2 was synthesized with a dichotic time delay difference; and position 3 was synthesized with a dichotic time delay and amplitude difference. An urgent warning would be processed according to position 1; less urgent commands are assigned to position 2 or 3. If a sound is transformed from the perceived left-side positions to one directly inside of the head, there is a sort of "infringement" of the listener's personal space. It is suggested that, with minimal training, a pilot could easily associate a sound inside the head as occupying an urgency space, relative to sounds perceived to the sides.

Figure 12. A possible topography of perceived auditory spatial positions: 1. No interaural difference ("urgency space"); 2. Interaural time difference; 3. Interaural time and amplitude difference. [Diagram: positions 2 and 3 are perceived intracranially, toward the front-left of the head.]

Another possibility is that the same spatial topography could be used in relation to repetition of a warning signal. The repetition of a command implies urgency, because the fact that it is repeated means the action has not been taken. For example, the first time a command is given, it comes from position 3 on the left; the second time, from position 2 on the left; and the third time, from the center of the head. By reversing the output channels, the signals from positions 2 and 3 can be used to obtain two other positions (4 and 5) at mirror-image locations symmetrical to the right. An example of the elaboration of the spatial grammar is as follows: place one class of commands to the left (positions 2 and 3) and another to the right (positions 4 and 5), which reserves position 1 for urgent commands. Using Traffic Collision Avoidance System (TCAS) voice command classifications as an example, urgent resolution commands could come from position 1, traffic advisories could come from positions 2 and 3, and nonurgent resolution commands could come from positions 4 and 5. A command from position 1 could be further distinguished, to underscore its urgency, by changes in timbre and image size, using the decorrelation techniques described earlier. This use of lateralization and decorrelation techniques is an application of a redundancy strategy: urgency is conveyed as a function of location and re-emphasized by differences in the sound itself.

Auditory icons and redundancy-- The cockpit environment contains a wide range of warning signals that must be distinguished against a fairly high level of ambient noise from the engines. The pilot must extract meaning from multiple input sources while simultaneously interacting with other people and automated systems, on board and outside the cockpit, and attending to vocal communication. Depending on the particular aircraft, there are approximately 200-500 different types of warnings, from alarms to voice instructions, that can potentially be displayed to a pilot. In any communication system between source and receiver, redundancy of an intended message assures a better chance of the extraction of a desired signal from noise.
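The figure 12 urgency grammar can be sketched as a mapping from position to interaural parameters; the ITD and ILD magnitudes below are assumed values for illustration (the text specifies the cue type for each position, not particular numbers):

```python
# Hypothetical parameter values for the three urgency positions of figure 12.
# Position 1: no interaural difference -> intracranial "urgency space"
# Position 2: interaural time difference (ITD) only -> lateralized left
# Position 3: ITD plus interaural amplitude difference -> farther left
URGENCY_POSITIONS = {
    1: {"itd_s": 0.0,    "ild_db": 0.0},
    2: {"itd_s": 0.0003, "ild_db": 0.0},  # 0.3 ms delay to the right ear
    3: {"itd_s": 0.0006, "ild_db": 9.0},  # longer delay plus 9 dB attenuation
}

def lateralize(mono, fs, position):
    """Return (left, right) sample lists that lateralize `mono` toward the
    left by delaying and attenuating the right channel."""
    p = URGENCY_POSITIONS[position]
    delay = int(round(p["itd_s"] * fs))      # ITD as a whole-sample delay
    gain = 10.0 ** (-p["ild_db"] / 20.0)     # ILD as a linear gain factor
    left = list(mono) + [0.0] * delay        # pad so both channels match
    right = [0.0] * delay + [gain * s for s in mono]
    return left, right
```

Position 1 yields identical channels (a diotic, intracranial image); positions 2 and 3 add the dichotic time-delay and amplitude differences described above.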

There are redundant paradigms where information is encoded more than once. For instance, in the pulse code modulation (PCM) standard for digital encoding of auditory signals, an error correction scheme is used where 30% or more of the digital signal is literally replicated on a tape (Nakajima et al., 1983). Contrasting this is a redundancy paradigm where identical semantics are conveyed with different signs. An everyday example of the latter technique is the use of international traffic
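The replication style of redundancy can be illustrated with the simplest possible scheme, majority voting over repeated copies (the actual PCM error-correction codes are far more elaborate; this shows only the principle):

```python
from collections import Counter

def encode_repetition(bits, copies=3):
    """Replicate each bit `copies` times -- the simplest form of
    replication-style redundancy."""
    return [b for b in bits for _ in range(copies)]

def decode_repetition(stream, copies=3):
    """Majority-vote each group of `copies` symbols back to one bit."""
    out = []
    for i in range(0, len(stream), copies):
        group = stream[i:i + copies]
        out.append(Counter(group).most_common(1)[0][0])
    return out

coded = encode_repetition([1, 0, 1])
coded[1] = 0  # corrupt one copy; the majority vote still recovers the bit
print(decode_repetition(coded))  # [1, 0, 1]
```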

signs in the United States alongside verbal signs: two different types of signs, iconic and verbal, transmit the same message. The verbal message is redundant with the symbol, and the symbol method can be interpreted quickly. A similar technique is possible with auditory messages, for example, in TCAS voice commands, where each verbal command is linked with an appropriate auditory icon (see fig. 13A). This idea has been suggested in several studies. Patterson (1982) suggests a technique where verbal messages are "sandwiched" temporally between presentations of the auditory icon; the icon would then precede the verbal message (see fig. 13B). The advantage to this is that the semantic content of the message can be grasped quickly upon recognizing the icon, and can then be verified by the verbal message, reducing the possibility of misinterpretation.

Figure 13. Identical semantic content achieved with combinations of nonverbal and verbal semiology. [Diagram: A. a visual "NO LEFT TURN" symbol and its verbal equivalent; B. auditory icon and verbal message presented over a recognition period, time intervals A, B, C.]

Redundancy has been discussed so far in the context of a sequential, temporally differentiated communication, where two types of semiotic objects follow one another in time. But redundancy can also be established by presenting multiple percepts to a pilot during the same time interval. Multisensory cues have begun to be used as redundant indicators that parallel one another in advanced warning systems in some aircraft. For example, some modern aircraft use visual, auditory, and tactile cues (red light, horn, and a stick shaker) for indicating aircraft engine stall. The supposition is that by accessing multiple perceptual pathways at the same time, the chance of error is reduced (Mowbray and Gebhard, 1961).

It is also possible to achieve parallel redundancy in a purely aural manner, as long as the semiology is somewhat limited. The manipulation of auditory space discussed previously can be used for this purpose. Returning to the concept of indicating urgency by spatialization to a center position (position 1 in figure 12), the "invasion" of a sound into the listener's personal space can be combined with an appropriately urgent auditory icon. In this case both the icon and the spatial position of the sound simultaneously warn a pilot of urgency. Hence, a limited grammar of spatial cues in relation to urgent aural icons could further establish an urgency space.
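Patterson's "sandwich" presentation can be sketched as a playback schedule; the icon and message durations and the inter-event gap below are assumed values for illustration, not figures from Patterson (1982):

```python
def sandwich_schedule(icon, verbal, icon_dur=0.5, verbal_dur=2.0, gap=0.25):
    """Build an icon-verbal-icon playback schedule: a list of
    (start_time_s, event_name, duration_s) tuples, with the verbal
    message temporally sandwiched between two icon presentations."""
    events, t = [], 0.0
    for name, dur in ((icon, icon_dur), (verbal, verbal_dur), (icon, icon_dur)):
        events.append((round(t, 3), name, dur))
        t += dur + gap
    return events

for start, name, dur in sandwich_schedule("chime", "NO LEFT TURN"):
    print(f"{start:5.2f}s  {name}  ({dur}s)")
```

The listener can grasp the meaning from the leading icon and verify it against the verbal message that follows.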

Location of auditory objects-- Perhaps the most advanced use of spatial sound manipulation is to implement a headphone localization system where the location of an object in three-dimensional space is paralleled by its perceived auditory position. Related research is under way at Ames' virtual environment project VIEW (Fisher et al., 1988). In a telerobotics application of VIEW, an example would be to use spatialized auditory cues to indicate the position of objects near a robot, such as a spacecraft. With three-dimensional sound, the position of objects associated with these auditory cues can be monitored without having to capture their position visually.

Headphone localization is also applicable to cockpit warning systems that require communication of positional information. Even a sound display with limited sampling of positions along the 360° azimuth (e.g., restricted to one elevation at head level) would be useful. For example, a pilot maneuvering through a busy airport approach environment would be able to use an auditory spatial grammar based on the 12 hour-hand clock positions, conceiving themselves in the center of the clock with 12 at 0° azimuth and 6 at 180° azimuth. If a pilot heard a TCAS warning that stated, "traffic at 8 o'clock," a headphone localization system could provide redundancy by placing the perceived position at 120° azimuth.

The problems with front-back reversals mentioned previously, found by several researchers, limit the ability to relate all 12 clock positions with absolute confidence; it is still currently unfeasible to convey all positions in a virtual space reliably. With continued research, the externalization and front-back reversal problems inherent in HRTF-synthesized positions should be alleviated. It is definitely possible to use a limited number of positions, especially for conveying positions to the sides; their effect seems to the authors an improvement over traditional intensity or time difference stereo techniques for creating spatial auditory displays (Begault, 1986; Griesinger, 1989; Kendall and Martens, 1984).
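The clock-position grammar can be sketched as a conversion routine; folding the clockwise bearing into a 0-180° left/right value reproduces the 120° reported for "8 o'clock" (the folding convention is an assumption made to match the text):

```python
def clock_to_azimuth(hour):
    """Clockwise azimuth of an 'o'clock' bearing:
    12 -> 0 deg (dead ahead), 3 -> 90 deg, 6 -> 180 deg (behind)."""
    return (hour % 12) * 30

def fold_left_right(azimuth_cw):
    """Fold a 0-360 deg clockwise azimuth into a symmetric 0-180 deg
    left/right representation, so 8 o'clock (240 deg clockwise)
    is reported as 120 deg."""
    return min(azimuth_cw, 360 - azimuth_cw)

print(fold_left_right(clock_to_azimuth(8)))  # 120
```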

CONCLUDING REMARKS

Three techniques for controlling the binaural sound image--lateralization, headphone localization, and decorrelation--have been discussed in terms of implementation, perception, and current areas of research, including efforts at Ames. This was exemplified by showing an application of binaural sound to cockpit speech and warning systems. Improvements in the intelligibility of speech and the segregation of multiple source inputs were outlined. In addition, suggestions were made for the use of the mapping of auditory space to convey urgency, redundancy, and position within an auditory display. Using the techniques outlined would provide powerful control over a vivid perceptual faculty.

REFERENCES

Begault, D. R.: Spatial Manipulation and Computers. Ex Tempore, vol. IV, no. 1, 1986, pp. 56-88.

Begault, D. R.: Control of Auditory Distance. Dissertation, University of California San Diego, La Jolla, CA, 1987.

Bertelson, P.; and Radeau, M.: Cross-Modal Bias and Perceptual Fusion with Auditory-Visual Spatial Discordance. Perception and Psychophysics, vol. 29, 1981, pp. 578-584.

Blauert, J.: Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge: MIT Press, 1983.

Bronkhorst, A. W.; and Plomp, R.: The Effect of Head-Induced Interaural Time and Level Differences on Speech Intelligibility in Noise. J. Acous. Soc. America, vol. 83, no. 4, 1988, pp. 1508-1516.

Cherry, E. C.: Some Experiments on the Recognition of Speech with One and with Two Ears. J. Acous. Soc. America, vol. 25, no. 5, 1953, pp. 975-979.

Cherry, E. C.; and Taylor, W. K.: Some Further Experiments on the Recognition of Speech with One and Two Ears. J. Acous. Soc. America, vol. 26, no. 4, 1954, pp. 549-554.

Cooper, L. A.; and Shepard, R. N.: Transformations on Representations of Objects in Space. In E. C. Carterette and M. P. Friedman (Eds.), Handbook of Perception. New York: Academic Press, 1978.

Doll, T. J.; Gerth, J. M.; Engelman, W. R.; and Folds, D. J.: Development of Simulated Directional Audio for Cockpit Applications. Tech. Report AAMRL-TR-86-014, Armstrong Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, OH, 1986.

Fisher, S. S.; Wenzel, E. M.; Coler, C.; and McGreevy, M. W.: Virtual Interface Environment Workstations. In Proceedings of the Human Factors Society 32nd Annual Meeting, Anaheim, CA, 1988, pp. 91-95.

Goldin, S. E.; and Thorndyke, P. W.: Simulating Navigation for Spatial Knowledge Acquisition. Human Factors, vol. 24, 1982, pp. 457-471.

Griesinger, D.: Equalization and Spatial Equalization of Dummy Head Recordings for Loudspeaker Reproduction. J. Audio Eng. Soc., vol. 37, nos. 1-2, 1989, pp. 20-29.

Hughes, D.: Glass Cockpit Study Reveals Human Factors Problems. Aviat. Week and Space Tech., vol. 131, no. 6, 1989, pp. 32-36.

Kendall, G. S.; and Martens, W. L.: Simulating the Cues of Spatial Hearing in Natural Environments. Tech. Report, Northwestern University, 1984.

Koenig, W.: Subjective Effects in Binaural Hearing. J. Acous. Soc. America, vol. 22, no. 1, 1950, pp. 61-62.

Kosslyn, S. M.: The Medium and the Message in Mental Imagery: A Theory. Psychological Rev., vol. 88, 1981, pp. 46-66.

Kuipers, B. J.: The Cognitive Map: Could it Have Been Any Other Way? In H. L. Pick Jr. and L. P. Acredolo (Eds.), Spatial Orientation: Theory, Research, and Applications. Plenum Press, 1980.

Mehrgardt, S.; and Mellert, V.: Transformation Characteristics of the External Human Ear. J. Acous. Soc. America, vol. 61, no. 6, 1977, pp. 1567-1576.

Mowbray, G. H.; and Gebhard, J. W.: Man's Senses as Information Channels. In H. W. Sinaiko (Ed.), Selected Papers on Human Factors in the Design and Use of Control Systems. New York: Dover, 1961.

Mudd, S. A.: Experimental Evaluation of Binary Pure-Tone Audition. J. Appl. Psych., vol. 49, no. 2, 1965, pp. 112-121.

Nakajima, D.; Doi, T.; Fukuda, J.; and Iga, A.: Digital Audio Technology. Blue Ridge Summit, PA: Tab Books, 1983.

Oppenheim, A. V.; and Schafer, R. W.: Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.

Parker, S. P. A.; and Oldfield, S. R.: Acuity of Sound Localization: A Topography of Sound Space. II. Pinnae Cues Absent. Perception, vol. 13, 1984, pp. 600-617.

Patterson, R.: Guidelines for Auditory Warning Systems on Civil Aircraft. Tech. Rep. 82017. Civil Aviation Authority, London, 1982.

Plenge, G.: On the Differences Between Localization and Lateralization. J. Acous. Soc. America, vol. 56, no. 3, 1974, pp. 944-951.

von Békésy, Georg: Experiments in Hearing. New York: McGraw-Hill, 1960.

Wenzel, E. M.; Wightman, F. L.; and Foster, S. H.: A Virtual Display System for Conveying Three-Dimensional Acoustic Information. In Proceedings of the Human Factors Society 32nd Annual Meeting, Anaheim, CA, 1988.

Wightman, F. L.; and Kistler, D. J.: Headphone Simulation of Free-Field Listening. I: Stimulus Synthesis. J. Acous. Soc. America, vol. 85, no. 2, 1989a, pp. 858-867.

Wightman, F. L.; and Kistler, D. J.: Headphone Simulation of Free-Field Listening. II: Psychophysical Validation. J. Acous. Soc. America, vol. 85, no. 2, 1989b, pp. 868-878.

Zurek, P. M.: Measurements of Binaural Echo Suppression. J. Acous. Soc. America, vol. 66, no. 6, 1979, pp. 1750-1757.

Report Documentation Page

1. Report No.: NASA TM-102279
4. Title and Subtitle: Techniques and Applications for Binaural Sound Manipulation in Human-Machine Interfaces
5. Report Date: August 1990
7. Author(s): Durand R. Begault and Elizabeth M. Wenzel
8. Performing Organization Report No.: A-90066
9. Performing Organization Name and Address: Ames Research Center, Moffett Field, CA 94035
10. Work Unit No.: 505-69-01
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Washington, DC 20546-0001
13. Type of Report and Period Covered: Technical Memorandum
15. Supplementary Notes: Point of Contact: Elizabeth M. Wenzel, Ames Research Center, MS 239-3, Moffett Field, CA 94035-1000; (415) 604-6290 or FTS 464-6290
16. Abstract: The implementation of binaural sound to speech and auditory sound cues ("auditory icons") is addressed from both an applications and technical standpoint. Techniques overviewed include processing by means of filtering with head-related transfer functions. Application to advanced cockpit human interface systems is discussed, although the techniques are extendable to any human-machine interface. Research issues pertaining to three-dimensional sound displays under investigation at the Aerospace Human Factors Research Division at NASA Ames Research Center are described.
17. Key Words (Suggested by Author(s)): Psychoacoustics; Digital signal processing; Human-machine interfaces
18. Distribution Statement: Unclassified-Unlimited; Subject Category 53
19./20. Security Classification (of this report / this page): Unclassified / Unclassified
21. No. of Pages: 28
22. Price: A03

For sale by the National Technical Information Service, Springfield, Virginia 22161