NASA Technical Memorandum 102279

Techniques and Applications for Binaural Sound Manipulation in Human-Machine Interfaces

Durand R. Begault and Elizabeth M. Wenzel
Ames Research Center, Moffett Field, California

August 1990

National Aeronautics and Space Administration
Ames Research Center
Moffett Field, California 94035-1000
CONTENTS

                                                                          Page
SUMMARY ..................................................................... 1
INTRODUCTION ................................................................ 1
BINAURAL TECHNIQUES AND PSYCHOACOUSTIC CONSIDERATIONS ....................... 3
    Lateralization .......................................................... 3
    Headphone Localization .................................................. 4
    Decorrelation ........................................................... 12
APPLICATIONS OF BINAURAL SOUND .............................................. 14
    Introduction ............................................................ 14
    Increased Sensitivity for Communication ................................. 14
    Cognitive Representation of Auditory Space .............................. 16
        Introduction ........................................................ 16
        Urgency ............................................................. 16
        Auditory icons and redundancy ....................................... 17
        Location of auditory cues in relation to exocentric objects ......... 19
CONCLUDING REMARKS .......................................................... 20
REFERENCES .................................................................. 21
TECHNIQUES AND APPLICATIONS FOR BINAURAL SOUND MANIPULATION IN HUMAN-MACHINE INTERFACES

Durand R. Begault and Elizabeth M. Wenzel

Ames Research Center
SUMMARY
The implementation of binaural sound to speech and auditory sound cues ("auditory icons") is addressed from both an applications and a technical standpoint. Techniques overviewed include processing by means of filtering with head-related transfer functions. Application to advanced cockpit human interface systems is discussed, although the techniques are extendable to any human-machine interface. Research issues pertaining to three-dimensional sound displays under investigation at the Aerospace Human Factors Division at NASA Ames Research Center are described.
INTRODUCTION
In normal hearing we use both ears, which allows the auditory system important advantages in interacting with the environment. In spite of this, information in human-machine interface contexts such as aviation is usually received over a monotic (one-ear) headset. It is surprising that, while advanced cab aircraft such as the McDonnell-Douglas MD-88 and the Boeing 767 incorporate highly sophisticated visual displays, auditory displays take no advantage of the spatial information that plays so fundamental a role in everyday experience. The warning sounds in "high stress" human-machine interface contexts such as the cockpit are largely a proliferation of semantically unrelated sounds, which the operator must access during conditions of perceptual overload.

Research into improving the human-machine interface is partially motivated by reports that at least 65% of jet transport accidents during 1977-1987 resulted from human errors (Hughes, 1989). One direction for improvement is to take advantage of perceptual systems other than vision for communicating important information to an operator.

Because spatial hearing is a part of everyday experience that is important for both survival and orientation, it is sensible to determine how it can be manipulated for conveying information in a human-machine interface.
The types of binaural sound manipulation that are feasible to implement depend on the source of the signal. In an aircraft context, there are two distinct types of sources: (1) headphone speech communication using radio signals originating from ground control or other aircraft, and (2) speech and warning signals that originate from the audio system installed in the cockpit.

For both kinds of signals, binaural sound can improve the intelligibility of speech sources against noise, and assist in the segregation of multiple sound sources. For signals originating in the cockpit, sound spatialization can also be used to organize locations in perceptual auditory space, and to convey urgency or establish redundancy.

This paper reviews both established and evolving techniques for the binaural presentation of sound. Although the example in the application section of the paper makes reference to commercial aircraft cockpits, the results are extendible to air traffic control (ATC), rotorcraft, sonar display, and other multiple-channel human-machine interfaces. The first section reviews binaural spatialization techniques, along with relevant psychoacoustic considerations for their use. The second section shows how these techniques can be used for improving cockpit auditory displays.

This work was performed while the author held a National Research Council research associateship.
BINAURAL TECHNIQUES AND PSYCHOACOUSTIC CONSIDERATIONS
Spatial hearing refers to the perceived location, size, and environmental context of a sound source. In the case of headphone audition, three categories of techniques for manipulating the spatial element of sound are discussed: lateralization, headphone localization, and decorrelation. These categories are also generally distinguished (but not uniquely) by the kind of spatial percept that results from the particular technique.

Lateralization techniques involve manipulation of interaural time and/or intensity differences at each ear; the resulting percept is usually of the sound source moving along the intracranial axis between the ears. Headphone localization techniques simulate spatial hearing in the free field or in an environmental context, essentially by replicating over headphones the interaural differences that occur naturally. In its most successful implementation, the resulting percept can be of an externalized source in three-dimensional space. Decorrelation techniques implement interaural differences using methods other than those mentioned above. In this paper, phase inversion and reverberation techniques are examined.

Lateralization
Lateralization techniques take advantage of two separate mechanisms of the auditory system that are involved in spatial hearing. One mechanism evaluates the amplitude differences between the two ears, while another mechanism evaluates time differences. Human sensitivity to these differences supports what is known in the psychoacoustic literature as the "duplex theory" of localization; the differences are abbreviated as interaural level difference (ILD) and interaural time difference (ITD), respectively. It is widely accepted that ILD operates over the entire frequency range, approximately 200 Hz-20 kHz, while ITD operates on the fine structure of signals for frequencies below 1.6 kHz, and on the envelope of the signal for frequencies between 200 Hz-20 kHz (Blauert, 1983).
If we present a signal to each speaker of a set of headphones with no ITD or ILD, the sound is heard in the middle of the head. As ITD or ILD is increased past a particular threshold, the sound will begin to shift toward the ear leading in time or greater in amplitude. Once a critical value of ITD or ILD is reached, the sound stops moving along the intracranial axis and remains at the leading or more
intense ear. The effective range of ITD is up to approximately 1.5 ms, and the effective range of ILD is around 10 dB. The upper range of ILD is more difficult to determine than that of ITD: beyond approximately 10 dB, a change in position resulting from ILD is easy to confuse with the corresponding change in auditory extent that occurs around this point (Blauert, 1983). Figure 1 shows these differences rated by subjects on a 1 to 5 scale, where 5 represents maximum displacement. The results shown are valid for speech and noise.

It is relatively easy to implement an ITD/ILD digital signal processing algorithm based on the data shown in figure 1. For example, consider placing a signal at the extreme left position in the head. We derive y(n)L and y(n)R from an input signal x(n) by multiplying by a gain factor g for ILD:

    y(n)L = x(n)
    y(n)R = x(n) * g,    g ≈ 0.3
and by delaying the input signal by τ for ITD:

    y(n)L = x(n)
    y(n)R = x(n + τ),    τ = 1.5 ms

Figure 1. Measurements of lateralization displacement as a function of interaural differences: a) ILD; b) ITD.

Figure 2 shows the waveform display and circuit diagram for each of these lateralization techniques. In practical terms, the perceptual effects of ILD and ITD on lateralization are additive. Usually, in the psychoacoustic literature, ILD and ITD are pitted against each other ("traded") so as to produce a centered, intracranial position of the perceived sound. Figure 3 illustrates how ILD and ITD can be combined to produce three distinctly different spatial locations for incoming radio sounds heard dichotically. In this example, three positions are attainable: extreme right, extreme left, and center of the head.

The spatial positions attainable with lateralization techniques are severely limited when compared to spatial hearing in the real world. Lateralization techniques have the ability to place sounds only along a line centered between the ears, inside the head. A more desirable condition would be to allow positioning of sounds outside of the head, in any position in three-dimensional space, using headphones.

Figure 2. Waveform display and circuit diagrams for lateralization techniques: a) ILD; b) ITD.

Figure 3. Circuit scheme combining ITD and ILD lateralization techniques to spatially separate three inputs (e.g., radio transmission).

Headphone Localization

At one time, the term "headphone localization" would have been considered an oxymoron. Localization was thought to be possible only with "real world" (nonheadphone) listening, and all headphone listening was assumed to be lateralized. Plenge (1974) was one of the first researchers to demonstrate that localization was indeed possible with headphones. The basis for his argument was subjective judgments of externalization of a sound source when recorded with a mannequin head with microphones placed behind the pinnae. Recent work by Wightman and Kistler (1989b) has reasserted the notion that externalized localization of sounds in three-dimensional space is possible with headphone listening; their work differs from Plenge in that they substituted digital signal processing techniques for the mannequin head recording process, a technique first used by Platte and Laws (Blauert, 1983). This technique, and its advantages and limitations, are reviewed below.
The technique for implementing headphone localization involves creating a digital filter based on measurements of the head-related transfer function (HRTF). The HRTF can be thought of as a frequency-dependent amplitude and time delay that results from the resonances of the pinnae and the ear canal, and the effects of head shadowing. These effects combine differentially, as a function of sound source direction; hence, there is a frequency and group delay transfer function imposed on an incoming signal that is unique for any given source position. In other words, the HRTF alters the spectrum of the input signal in a spatially dependent way.

Psychoacoustic research has established the importance of the HRTF for spatial hearing, partly because it complements the "duplex theory" of localization. It explains the perception of elevation, median plane, and front-back positions, situations where the ILD and ITD are close to 0 and/or below threshold. Other researchers have shown that localization acuity is diminished overall without pinnae cues (Oldfield and Parker, 1984).

The HRTF is measured by placing a probe microphone close to the eardrum of a subject, or at the entrance to the ear canal (Mehrgardt and Mellert, 1977; Wightman and Kistler, 1989a). The goal is to obtain an impulse response measurement for a speaker at a carefully adjusted position in relation to a listener whose head is immobilized. An impulse is sounded from the speaker, the response at the microphone is recorded for use in subsequent filtering algorithms, and the procedure is repeated for the desired number of positions. For purposes of analysis, the signal y(n) obtained at the microphone is the convolution of the speaker impulse x(n) with h(n) (the HRTF for that position), imposed along its path of transmission to the microphone:

    y(n) = x(n) * h(n)

Since x(n) is the unit-sample impulse (x(n) = 1), it follows that h(n) = y(n). By taking the Fourier transform of h(n) we can obtain the frequency and phase response of the HRTF:

    H(e^jω) = Σn h(n) e^(-jωn)

Figure 4 shows the magnitude of the HRTF for a single subject, for different angles of incidence. To implement HRTF processing for headphone sound, the spectrum of the incoming sound must be multiplied by the spectra of two HRTF measurements, one for each ear. This is accomplished digitally by the equivalent operation of time-domain convolution, using two finite impulse response (FIR) filters (Oppenheim and Schafer, 1975):

    y(n)L = x(n) * h(n)L;    y(n)R = x(n) * h(n)R
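The pair of FIR convolutions above can be sketched directly. This is a hedged illustration: the two "HRIR" arrays here are random placeholders standing in for measured head-related impulse responses, not real data.

```python
import numpy as np

def hrtf_filter(x, h_left, h_right):
    """Binaural synthesis: convolve a mono input with the left- and
    right-ear head-related impulse responses (time-domain FIR filtering,
    equivalent to multiplying the spectra)."""
    y_left = np.convolve(x, h_left)
    y_right = np.convolve(x, h_right)
    return y_left, y_right

# Stand-in 128-tap "HRIRs" (random placeholders, NOT measured HRTFs)
rng = np.random.default_rng(0)
h_l = rng.standard_normal(128)
h_r = rng.standard_normal(128)

x = rng.standard_normal(1024)   # arbitrary mono source signal
y_l, y_r = hrtf_filter(x, h_l, h_r)

# The frequency/phase response H(e^jw) is the DFT of the impulse response
H_l = np.fft.rfft(h_l)
```

For real-time use the convolution would typically be done block-wise, but the mathematics is the same.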
Figure 4. HRTF measurements: one subject, ipsilateral ear, source at 0, 90, and 270° azimuth (based on data from Wightman and Kistler, 1989a).

Figure 5 shows this method for synthesizing a source opposite the right ear; the frequency responses of the two filters are shown. (The frequency response of the HRTF shown was given directly by Blauert and then derived by Begault using a digital filter design program (Begault, 1987; Blauert, 1983).)

How successful is the technique, in terms of producing a veridical experience of three-dimensional auditory space? Figure 6 illustrates results obtained by Wightman and Kistler (1989b) that compare free-field and headphone localization performance. The data for the two conditions seem in close agreement, but it must be noted that front-back reversals (e.g., mistaking a 0° sound source for a 180° sound source) were corrected in the data analysis. These reversals increased in the headphone case. Based on data from eight subjects, the percentage of front-back confusions from the total number of judgments made with free-field listening is 3-12%, and 6-20% with headphone localization. Individual differences were also marked; the subjects who localized badly over headphones also tended to be the ones that localized badly in free-field conditions.

There is a problem with the technique in that some people are unable to externalize the synthesized sound; sources processed through headphones are heard as if they are inside of the head. This is particularly troublesome for sources on the median plane, and for sources synthesized to appear from the front. Often, a "bow tie" pattern at a constant radius from the center of the head is reported (see fig. 7). We propose three areas for improving headphone localization performance: (1) using HRTF filtering in conjunction with a head-tracking device to allow a dynamic orientation of the listener to the virtual sound position, (2) adding environmental cues in the form of reverberation and/or early reflections, and (3) obtaining or synthesizing "optimal" HRTFs.
A magnetic head-tracking device that is attached to a set of headphones can transmit three-dimensional coordinates about a listener's head and body position to a receiving device. At Ames, Wenzel, Wightman, and Foster have developed a hardware/software system known as the Convolvotron (Wenzel, Wightman, and Foster, 1988) that assigns appropriate FIR filters to an input source, based on the output coordinates recorded by the tracker (see fig. 8). Because the granularity of head movement recorded by the tracker is finer than that of the sampling of positions measured by the HRTF impulse responses, an interpolation scheme that takes known impulse response filters must be used to derive the appropriate filters for intermediate positions.

Figure 5. Headphone localization technique based on HRTF filtering: the input is convolved with FIR filters #1 and #2, derived from the HRTF spectra measured at the left and right ears.
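The interpolation step can be sketched as follows. This is a hedged illustration of one plausible scheme (linear cross-fading between the responses measured at the azimuths bracketing the tracked position, in one dimension only), not the Convolvotron's actual four-pair algorithm; the table contents are placeholder data.

```python
import numpy as np

def interpolate_hrir(hrir_table, azimuths, az):
    """Derive an impulse response for an intermediate azimuth by linearly
    weighting the two nearest measured responses. hrir_table is a dict
    mapping measured azimuth (deg) -> impulse-response array."""
    azs = sorted(azimuths)
    lo = max(a for a in azs if a <= az)   # nearest measured azimuth below
    hi = min(a for a in azs if a >= az)   # nearest measured azimuth above
    if lo == hi:
        return hrir_table[lo]
    w = (az - lo) / (hi - lo)             # fractional distance to upper key
    return (1 - w) * hrir_table[lo] + w * hrir_table[hi]

# Stand-in table: "responses" measured every 30 degrees (placeholder data)
table = {a: np.full(64, float(a)) for a in range(0, 181, 30)}
h45 = interpolate_hrir(table, table.keys(), 45.0)  # midway between 30 and 60
```

A real system would interpolate left- and right-ear responses jointly, and in both azimuth and elevation.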
Figure 6. Free field and headphone localization performance for a single subject (based on data from Wightman and Kistler, 1989b).

Figure 7. Perceptual positions frequently reported by subjects presented with stimuli processed by HRTF filters; note the sounds localized equally distant from the center of the head (approximately 2 meters).

Another solution to headphone localization errors is to add "environmental" cues. With the method of HRTF filtering described above, the stimulus is convolved with measurements made under anechoic conditions. It lacks the environmental information, in the form of early reflections and reverberation, that arrives with a sound source in most listening contexts. Early reflections arrive differentially at the two ears in terms of time of arrival, intensity, and spatial angle of incidence. This situation can be modeled with a ray-tracing computer program that accounts for the dimensions of the enclosure and the positions of the listener and sound source. Figure 9 shows one such program developed by Begault at Ames. Late reflections (reverberation) may also play a role in externalization and front-back confusions; manipulation of the ratio of direct to reverberant sound has been shown to affect perception of auditory distance (Begault, 1987; von Békésy, 1960).
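A minimal version of such a model can be sketched with the image-source method. This is a hedged, rectangular-room illustration of the general idea only; the actual Ames program's geometry, absorption handling, and output format are not reproduced. Each first-order reflection is mirrored across a wall, and its path length yields an arrival delay and a crude 1/distance amplitude.

```python
import math

def first_order_reflections(src, listener, width, depth, c=343.0):
    """First-order image-source reflections in a 2-D rectangular room.

    Mirrors the source across each of the four walls (x = 0, x = width,
    y = 0, y = depth) and returns (delay_seconds, relative_amplitude)
    for each image, with amplitude falling off as 1/distance."""
    sx, sy = src
    images = [(-sx, sy), (2 * width - sx, sy),
              (sx, -sy), (sx, 2 * depth - sy)]
    out = []
    for ix, iy in images:
        d = math.hypot(ix - listener[0], iy - listener[1])
        out.append((d / c, 1.0 / d))   # arrival time, crude amplitude
    return out

refl = first_order_reflections(src=(2.0, 3.0), listener=(4.0, 1.0),
                               width=6.0, depth=6.0)
```

Each reflection would then be treated as a delayed, attenuated source arriving from its own angle of incidence, and convolved with the HRTF for that angle.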
Figure 8. Basic method used by the Convolvotron; circuit shown for one sound source only. HT = head tracker; i = software for interpolating between four pairs of stored HRTF impulse responses.

An increase in the number of early reflections results in a corresponding change in the ratio of energy of the early reflections with respect to the direct sound, and the sound is less spatially correlated in its direction of arrival to the listener.

Another area for improvement of synthesized localization techniques has to do with the HRTF itself. If we want to use headphone localization for the general population, can we use a single "set" of HRTF measurements, and if so, how do we derive it? There are essentially two general approaches: averaging methods, and using the HRTFs of a "good localizer." In the first approach, one gathers HRTF data from a
In the program of figure 9, the user supplies source and listener positions, the size and shape of the environmental context, absorptive properties of boundaries, the number of early reflections, and the orientation of sound source and listener. The program produces graphic output (a binaural reflectogram showing the relative amplitude of reflections within each 30° segment about the listener) and an ASCII score that allows signal processing of a given sound according to the model.
Figure 9. Example of user interface for the ray-tracing program used at Ames for synthesizing early reflections.

number of subjects and uses some method of averaging the data. Examples of averaging can be found in the work of Blauert and of Mehrgardt and Mellert (Blauert, 1983; Mehrgardt and Mellert, 1977). Another form of averaging is to use techniques such as principal components analysis to find significant features within each critical band. The second approach, which we are currently examining at Ames, is to use the HRTFs of a good localizer, i.e., someone whose localization performance both with and without headphones is superior compared to other subjects tested under the same conditions. The problem with the averaging approach is that individual differences in the magnitude and phase characteristics of the transfer function become "smoothed out," resulting in a transfer function with less extreme minima and maxima. The problem with the good localizer approach is not knowing to what degree the subject's performance was idiosyncratic. We anticipate that research using both approaches will help to converge on a set of generalized HRTFs.
Decorrelation

Lateralization and headphone localization techniques are not the only methods for producing two different output signals for each ear; this can be viewed as a decorrelation process, and ILD, ITD, and HRTF filtering techniques are based on combinations of such processes. While the number of techniques for differentiating two signals from a single input is perhaps innumerable, two processes that can significantly affect both the spatial and timbral dimensions of a sound are phase inversion and multiple delay lines (see fig. 10). Both are well-known and straightforward in their implementation for transforming a single input signal into a binaural signal.

Figure 10. Decorrelation techniques: a) phase inversion (showing the intracranial location of the composite signal and the diffusion of low frequencies below 1.6 kHz); b) cascaded delay line with interaural shift.
Phase inversion and delay line decorrelation as described here do not affect the percept of the localization of the "center of gravity" of a broadband signal. Rather, they affect the "diffuseness" or spatial extent of a sound source. For example, two sounds can both be localized intracranially, but the decorrelated sound will seem to be more "spread out" than the correlated one (see fig. 10(a)). With binaural phase inversion of speech, the spreading out effect occurs primarily with the lower frequency components. An explanation for this phenomenon can be given in terms of the frequency-selective nature of ITD perception. As mentioned previously, sensitivity to the ITD of the fine structure of a waveform operates only below approximately 1.6 kHz, while sensitivity to the ITD of the waveform envelope operates between 200 Hz-20 kHz (Blauert, 1983). With 180° phase inversion, the envelope of the signals at the two ears is identical, but the fine structure of each harmonic component is time delayed by a half-cycle. Speech contains frequencies within the operating range of both forms of ITD; hence, a differential spreading of sound occurs as a function of frequency.

A multiple delay line is another method of producing two decorrelated signals from a single input. Figure 10(b) illustrates a simplified version of a program available on a typical stereo digital delay device. This circuit adds four time delays to the signal within an approximately 25 ms period, resulting in a timbral change to the signal that is similar to the effect of early reflections heard from walls in a small room. Additionally, by implementing a slight (< 2 ms) time shift between the delay pattern at each channel, the sound image is perceived as being larger or more spread out, as with the phase inversion technique. This process is similar to the interaction of a sound source within an
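The multiple-delay-line decorrelator of figure 10(b) can be sketched as follows. This is a hedged illustration: the four tap times and gains are arbitrary choices within the roughly 25 ms window described in the text, and the right channel's delay pattern is shifted by under 2 ms.

```python
import numpy as np

FS = 44100  # assumed sample rate

def multitap_decorrelate(x, taps=(0.004, 0.009, 0.016, 0.024),
                         gains=(0.7, 0.5, 0.4, 0.3), shift=0.0015):
    """Two decorrelated channels from one input: four delay taps within
    ~25 ms; the right channel uses the same taps offset by < 2 ms."""
    def tapped(signal, offset):
        y = signal.copy()
        for t, g in zip(taps, gains):
            d = int(round((t + offset) * FS))
            delayed = np.zeros_like(signal)
            delayed[d:] = g * signal[:len(signal) - d]
            y += delayed            # sum the direct sound and each tap
        return y
    return tapped(x, 0.0), tapped(x, shift)

rng = np.random.default_rng(1)
x = rng.standard_normal(FS // 10)   # 100 ms of noise as a test input
left, right = multitap_decorrelate(x)
```

The interchannel shift is what makes the two delay patterns decorrelated rather than identical, widening the perceived image.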
APPLICATIONS OF BINAURAL SOUND

Introduction

This section overviews two application areas for binaural sound. The first is applicable to speech transmitted to aircraft by radio, and both areas apply to speech and warning signals that originate in the cockpit. The two areas are Increased Sensitivity for Communication (using the binaural advantage over monotic or diotic listening for intelligibility of speech and segregation of a desired source from multiple sources) and Cognitive Representation of Auditory Space (using perceived auditory spatial positions as carriers of semantic information). Three important advantages of the latter are discussed: urgency (using auditory space to indicate the "levels of urgency" of an auditory warning with a nonspatial grammar), redundancy (organizing position into a grammar that is redundant to a verbal or iconic grammar), and location of auditory cues in relation to exocentric objects (the use of perceived position to indicate the location of exocentric objects).

Increased Sensitivity for Communication

The advantage of increased sensitivity to a single source of sound in a noisy environment comes from the observation by Cherry (1953) called "the cocktail party effect": in a group of people who are all speaking simultaneously, a listener can clearly attend to a single stream of speech. This is because we use both ears; binaural hearing allows a listener to perceptually separate multiple layers of sound sources in interacting with the environment, and to suppress undesirable sources such as noise by listening to discrete positions in auditory space. Studies by Cherry (1953) and Cherry and Taylor (1954) demonstrated the profound difference between binaural and monaural listening for the segregation of multiple sound sources. The existence of the advantage can be easily established by comparing what one hears with one ear plugged to what one hears when both ears are open. This has led to a number of applications in two-channel sound delivery; recording engineers use the advantage when they mix multitrack recordings, although the "pleasing" or "spacious" quality of the sound, rather than the segregation of sources, is the usual criterion.

This is in stark contrast to the situation where a pilot must attend to the "correct" voice out of the several that he hears coming over the single-channel radio transmission, the multiple voices arriving as a single undifferentiated stream of sound over a monotic headset. Indeed, aviation accidents may have occurred because a pilot was unable to attend to the correct voice.

Intelligibility is commonly measured by comparing the level of a masker (noise or other voices) with the necessary level of the desired signal for intelligibility, under both monotic and dichotic conditions. The difference in dB between the two cases, i.e., the improvement in intelligibility due to binaural presentation, is termed the binaural intelligibility level difference, BILD. Results from experimental data evaluating the BILD differ depending on the stimuli used and the criteria for evaluating intelligibility; generally, it ranges from 3 to 12 dB (Blauert, 1983). Koenig (1950) also described the advantage of binaural over monaural hearing for squelching noise; his results were verified with burst stimuli by Zurek (1979).

Work by Bronkhorst and Plomp (1988) measured the BILD as a function of angle of incidence, using conditions that essentially compare the lateralization techniques discussed earlier (pure ITD or ILD) to headphone localization techniques (listening through the ears of a mannequin head). Figure 11 shows their results; the improvement using headphone localization techniques is around 2-4 dB.
Figure 11. BILD measurements as a function of azimuth: comparison of ILD, ITD, and HRTF-filtered conditions (based on data from Bronkhorst and Plomp, 1988).

Informal studies of binaural listening conducted at Ames used HRTF filtering for selective attention to four voices (60, 90, 240, and 300° azimuth). Although the BILD was not measured, the externalization and segregation provided by HRTF filtering were judged to be superior to ILD and/or ITD techniques; additional information can be found in NASA TM-102826. Because of binaural summation of loudness, a binaurally equipped pilot would also need less amplitude of the signal at each ear, an additional advantage in suppressing hearing fatigue.

Some pilots object to using headphones on the basis that it isolates them from other crew members. It is possible to alleviate this by using microphones in a monitoring system arrangement: speech from crew members could be mixed into the overall sound texture of air traffic control (ATC) and cockpit warnings. This has the added advantage that repositioning the body and raising the voice to be heard above maximal background noise (the Lombard effect) could be avoided, resulting in less fatigue.
Cognitive Representation of Auditory Space

Introduction-- A binaural sound system can potentially define a synthetic auditory space within the "mind's aural eye." We define a synthetic auditory space as the listener's organization of the perceived location of sounds within the mind, by using mental representation or imagery-based transformations. It involves representation that is primarily analog, rather than propositional, in content. The idea of mental maps and of cognitive simulation of spatial transformations has been clearly studied, primarily for visual objects (Cooper and Shepherd, 1978; Goldin and Thorndyke, 1982; Kosslyn, 1981; Kuipers, 1978, 1980). Although imagery based on auditory input has not been well studied, experiments concerned with auditory and visual perceptual biases indicate that the two systems interact in the encoding of spatial information (Bertelson and Radeau, 1981).

Spatially correspondent analog representations are also supported by the importance of localization for survival. Spatial pattern recognition would seem to be a fundamental requirement for animal survival in an environment in which many events occur simultaneously in different spatial locations.
Especially
it is likely
in the absence
connection of auditory
of visual
with spatial hearing. space in the listener
The application of spatially manipulated content has been discussed previously in the literature, albeit in a simplistic manner or with "impoverished" stimuli such as pure tones (Mudd, 1965). Doll et al. (1986) concluded from their research that a system using binaural cues for cockpit warning systems would be beneficial; they used a mannequin head recording technique. From an applications standpoint, we are interested in eliciting a spatial map of auditory space so that we can convey semantic meaning as a function of position. The applications described below use the lateralization, headphone localization, and decorrelation techniques described previously.

Urgency-- One possible use of a spatial audio map is for establishing the urgency of a command. An analogy is to the kind of communication that occurs between parents and children: children can immediately tell from the intensity and position of their parents' voice whether or not a command is urgent, and whether they had better pay attention, well before they actually discern the semantic content of the message. In the same way, the spatial position of a synthesized warning voice can be interpreted well before the semantic content of the message itself is interpreted.

Figure 12 shows a possible spatial organization that indicates urgency as a function of perceived position. The numbered dots represent a simple spatial grammar for this set of positions: position 1 was synthesized as a diotic signal with no interaural difference; position 2 was synthesized with a dichotic time delay; and position 3 was synthesized with a dichotic time delay and amplitude differences. An urgent warning would be processed according to position 1; less urgent commands are assigned to position 2 or 3. If a sound is transformed from the perceived left side positions to one directly inside of the head, there is a sort of "infringement" of the listener's personal space. It is suggested that, with minimal training, a pilot could easily associate a sound inside the head as occupying an "urgency space," relative to sounds perceived to the sides.

Figure 12. A possible topography of perceived auditory spatial positions. Key: 1, no interaural difference ("urgency space," perceived intracranially); 2, interaural time difference; 3, interaural time and amplitude difference. Front is toward the top of the figure.

By reversing the signals to the two output channels, positions 2 and 3 can be used to obtain two other positions (4 and 5) at mirror image locations, symmetrical about the center of the head.
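The three synthesis conditions just described (diotic, interaural time delay, and time delay plus amplitude difference), together with the mirror-image positions obtained by reversing the output channels, can be sketched as follows. The 0.5-ms delay and the 0.5 attenuation factor are assumed values chosen for illustration; the memo does not specify them.

```python
def lateralize(mono, fs, position):
    """Return a (left, right) sample-list pair for spatial positions 1-5.

    1: diotic (no interaural difference) -> perceived inside the head
    2: interaural time difference only (image toward one side)
    3: interaural time and amplitude difference (image further to that side)
    4, 5: mirror images of 2 and 3, obtained by swapping the output channels
    """
    itd_samples = int(0.0005 * fs)  # ~0.5 ms ITD; assumed, not from the memo
    delayed = [0.0] * itd_samples + list(mono)[:len(mono) - itd_samples]

    if position == 1:
        return list(mono), list(mono)
    if position == 2:  # ITD only: one ear receives a delayed copy
        return list(mono), delayed
    if position == 3:  # ITD plus ILD: delayed ear is also attenuated
        return list(mono), [0.5 * x for x in delayed]
    if position in (4, 5):  # mirror image: reverse the two output channels
        left, right = lateralize(mono, fs, position - 2)
        return right, left
    raise ValueError("position must be 1-5")
```

The function names and parameter values are hypothetical; the point is only that the whole five-position grammar needs nothing more than a delay line, a gain, and a channel swap.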
Using the Traffic Collision Avoidance System (TCAS) voice command classifications as an example, urgent resolution commands could come from position 1, nonurgent resolution commands could come from positions 2 and 3, and traffic commands could come from 4 and 5. A further elaboration of the spatial grammar is as follows: place one class of commands to the left (2 and 3) and one to the right (4 and 5), which reserves position 1 for urgent commands. A command from position 2 can be further differentiated by changes in timbre, by using the decorrelation techniques described earlier in section 1. Another possibility is that the same spatial topography could be used in relation to repetition of a warning signal. The repetition of a command implies urgency, because the fact that it is repeated means the action has not been taken. For example, the first time a command is given, it comes from position 3 on the left; the second time, from position 2 on the left; and the third time, from position 1 to underscore its urgency.
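The timbre change by decorrelation mentioned above could be realized, for example, by passing the same command through two different short random FIR filters, so that the two output channels keep roughly the same overall level but lose their sample-by-sample correlation. The filter length, tap values, and seeding below are arbitrary assumptions made for illustration, not a method specified in the memo.

```python
import random

def decorrelate(mono, taps=8, seed=0):
    """Return two differently filtered copies of a mono signal.

    Each channel is filtered by its own random FIR whose taps are
    normalized to unit absolute sum, keeping the levels comparable
    while the fine structure (and hence the timbre) differs.
    """
    rng = random.Random(seed)  # seeded, so the two filters are reproducible

    def random_fir():
        h = [rng.uniform(-1.0, 1.0) for _ in range(taps)]
        norm = sum(abs(c) for c in h)
        return [c / norm for c in h]

    h_left, h_right = random_fir(), random_fir()

    def fir(x, h):
        # direct-form FIR, same output length as the input
        return [sum(h[j] * x[i - j] for j in range(len(h)) if i - j >= 0)
                for i in range(len(x))]

    return fir(mono, h_left), fir(mono, h_right)
```

Because the two filters differ, the interaural cross-correlation of the output pair is reduced, which is the defining property of the decorrelation techniques discussed earlier.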
This use of lateralization and decorrelation techniques to define an urgency topography within auditory space is itself an application of a redundancy strategy.

Redundancy-- The cockpit environment contains a fairly high level of ambient noise from the engine, and a wide range of warnings of different types can potentially be displayed to a pilot; depending on the particular aircraft, there are approximately 200-500 types of warnings described in the instructions. The pilot must extract the meaning of each signal from multiple input sources set against this noise, while simultaneously attending to voice communication with other people and automated systems present in and outside the cockpit. In any communication system between source and receiver, redundancy of an intended message assures a better chance of the extraction of a desired signal from noise.
There are redundant paradigms where information is encoded more than once. For instance, in the pulse code modulation (PCM) standard for digital encoding of auditory signals, an error correction scheme is used where 30% or more of the digital signal is literally replicated on a tape (Nakajima et al., 1983). Contrasting this is a redundancy paradigm where identical semantics are conveyed with different signs. An everyday example of the latter technique is the use of international traffic signs in the United States, where two different types of signs, iconic and verbal, transmit the same message: the verbal message "NO LEFT TURN," for example, is redundant with its symbol (see fig. 13A). The icon can be interpreted more quickly than the verbal message, and the verbal message reduces the possibility of misinterpretation.

A similar method is possible for auditory messages. The possibility of including auditory icons in TCAS commands has been suggested in several studies; Patterson (1982) suggests a technique where verbal commands are temporally "sandwiched" between auditory icons (see fig. 13B). The technique is to link each verbal message with an appropriate icon; the advantage to this is that the semantic content of the message can be grasped quickly upon recognizing the icon, and can then be verified by the verbal message.

Figure 13. Identical semantic content achieved with combinations of nonverbal and verbal semiology. A: a visual example, the "NO LEFT TURN" sign combining icon and verbal symbol. B: an auditory example, a verbal message sandwiched between auditory icons, with the recognition period shown along the time axis at points A, B, and C.
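The replication paradigm can be illustrated with a toy repetition code: each bit is transmitted several times and recovered by majority vote. (The actual PCM error-correction scheme of Nakajima et al. interleaves redundant data on tape; this sketch is only an analogy, with the scheme and parameters chosen for illustration.)

```python
def encode(bits, copies=3):
    """Replicate each bit a fixed number of times."""
    return [b for b in bits for _ in range(copies)]

def decode(stream, copies=3):
    """Recover each bit by majority vote over its group of replicas."""
    decoded = []
    for i in range(0, len(stream), copies):
        group = stream[i:i + copies]
        decoded.append(1 if 2 * sum(group) > len(group) else 0)
    return decoded
```

With three copies per bit, any single corrupted replica in a group is outvoted, which is the sense in which literal replication "assures a better chance of the extraction of a desired signal from noise."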
Redundancy has been discussed so far in the context of a sequential, temporally differentiated presentation, where two types of semiotic objects follow one another in time. But redundancy can also be established by presenting multiple percepts that parallel one another during the same time interval.

Multisensory cues have begun to be used as redundant indicators in some aircraft. For example, some modern aircraft use visual, auditory, and tactile cues (red light, horn, and a stick shaker) for indicating aircraft engine stall. The supposition is that by accessing multiple perceptual pathways at the same time, the chance of error is reduced (Mowbray and Gebhard, 1961).

It is also possible to achieve parallel redundancy in a purely aural manner, although the manipulation of auditory space for this purpose is somewhat limited. Returning to the concept of indicating urgency by spatialization of position: the "invasion" of a sound to a center position (position 1 in figure 12) can be combined with an appropriately urgent auditory icon, so that both the icon and the spatial position of the sound simultaneously warn a pilot of urgency. Hence, a limited grammar of spatial cues in relation to urgent aural icons could further establish an urgency space.
Location of auditory objects-- Perhaps the most advanced use of auditory spatial cues is to implement a headphone localization system where the location of an exocentric object in three-dimensional space is paralleled by its perceived auditory position. Related research is under way at Ames' virtual environment project VIEW (Fisher et al., 1988). In a telerobotics application of VIEW, an example would be to use spatialized auditory cues to indicate the position of objects near a robot, such as a spacecraft. With three-dimensional sound, the position of objects associated with these auditory cues can be monitored without having to capture their position visually.

Headphone localization is also applicable to cockpit warning systems that require communication of positional information. Even a limited sampling of positions along the 360 degree azimuth (e.g., at one elevation, at head level) would be useful in a cockpit environment. For example, a pilot maneuvering through a busy airport approach would be able to use an auditory spatial grammar based on the 12 hour hand clock positions, conceiving themselves in the center of the clock with 12 at 0 degrees azimuth and 6 at 180 degrees azimuth. If a pilot heard a TCAS warning that stated, "traffic at 8 o'clock," a headphone localization technique could provide redundancy by placing the perceived sound at the position corresponding to 120 degrees azimuth.

The problems with front-back reversals mentioned previously limit this ability; it is still currently unfeasible to relate all 12 clock positions to perceived positions with absolute confidence. With continued research, the externalization and front-back reversal problems inherent in HRTF-synthesized localization should be alleviated. It is definitely possible to use a limited number of positions, especially for conveying positions to the sides; their effect has been found by several of the authors to be an improvement over traditional intensity or time difference stereo techniques for creating spatial auditory displays (Begault, 1986; Griesinger, 1989; Kendall and Martens, 1984).
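The clock-position grammar amounts to a fixed mapping from the hour hand to an azimuth angle. The sketch below assumes the counterclockwise-positive azimuth convention implied by the "8 o'clock at 120 degrees" example above; under a clockwise-positive convention the formula would instead be `(hour % 12) * 30`.

```python
def clock_to_azimuth(hour):
    """Map a 12-hour clock bearing to azimuth in degrees.

    12 o'clock -> 0 deg and 6 o'clock -> 180 deg, with azimuth increasing
    counterclockwise, which places 8 o'clock at 120 deg as in the text.
    """
    if not 1 <= hour <= 12:
        raise ValueError("hour must be in 1-12")
    return ((12 - hour) % 12) * 30
```

A TCAS phrase such as "traffic at 8 o'clock" could then drive the headphone localization stage directly from `clock_to_azimuth(8)`.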
CONCLUDING REMARKS

Three techniques for controlling the binaural sound image--lateralization, headphone localization, and decorrelation--have been discussed in terms of implementation, perception, and current areas of research, including efforts at Ames. Improvements in the intelligibility of speech and segregation of multiple source inputs were outlined. In addition, suggestions were made for the use of the mapping of auditory space to convey urgency, redundancy, and position within an auditory display. This was exemplified by showing an application of binaural sound to cockpit speech and warning systems. Using the techniques outlined would provide powerful control over a vivid perceptual faculty.
REFERENCES
Begault, D. R.: Spatial Manipulation and Computers. Ex Tempore, vol. IV, no. 1, 1986, pp. 56-88.
Begault, D. R.: Control of Auditory Distance. Dissertation, University of California, San Diego, La Jolla, CA, 1987.
Bertelson, P.; and Radeau, M.: Cross-Modal Bias and Perceptual Fusion with Auditory-Visual Spatial Discordance. Perception and Psychophysics, vol. 29, 1981, pp. 578-584.
Blauert, J.: Spatial Hearing: The Psychophysics of Human Sound Localization. Cambridge: MIT Press, 1983.
Bronkhorst, A. W.; and Plomp, R.: The Effect of Head-Induced Interaural Time and Level Differences on Speech Intelligibility in Noise. J. Acous. Soc. America, vol. 83, no. 4, 1988, pp. 1508-1516.
Cherry, E. C.: Some Experiments on the Recognition of Speech with One and with Two Ears. J. Acous. Soc. America, vol. 25, no. 5, 1953, pp. 975-979.
Cherry, E. C.; and Taylor, W. K.: Some Further Experiments on the Recognition of Speech with One and Two Ears. J. Acous. Soc. America, vol. 26, no. 4, 1954, pp. 549-554.
Cooper, L. A.; and Shephard, R. N.: Transformations on Representations of Objects in Space. In E. C. Carterette and M. P. Friedman (Eds.), Handbook of Perception. New York: Academic Press, 1978.
Doll, T. J.; Gerth, J. M.; Engelman, W. R.; and Folds, D. J.: Development of Simulated Directional Audio for Cockpit Applications. Tech. Report AAMRL-TR-86-014, Armstrong Aerospace Medical Research Laboratory, Wright-Patterson Air Force Base, OH, 1986.
Fisher, S. S.; Wenzel, E. M.; Coler, C.; and McGreevy, M. W.: Virtual Interface Environment Workstations. In Proceedings of the Human Factors Society 32nd Annual Meeting, Anaheim, CA, 1988, pp. 91-95.
Goldin, S. E.; and Thorndyke, P. W.: Simulating Navigation for Spatial Knowledge Acquisition. Human Factors, vol. 24, 1982, pp. 457-471.
Griesinger, D.: Equalization and Spatial Equalization of Dummy Head Recordings for Loudspeaker Reproduction. J. Audio Eng. Soc., vol. 37, nos. 1-2, 1989, pp. 20-29.
Hughes, D.: Glass Cockpit Study Reveals Human Factors Problems. Aviat. Week and Space Tech., vol. 131, no. 6, 1989, pp. 32-36.
Kendall, G. S.; and Martens, W. L.: Simulating the Cues of Spatial Hearing in Natural Environments. Tech. Report, Northwestern University, 1984.
Koening, W.: Subjective Effects in Binaural Hearing. J. Acous. Soc. America, vol. 22, no. 1, 1950, pp. 61-62.
Kosslyn, S. M.: The Medium and the Message in Mental Imagery: A Theory. Psychological Rev., vol. 88, 1981, pp. 46-66.
Kuipers, B. J.: The Cognitive Map: Could it Have Been Any Other Way? In H. L. Pick Jr. and L. P. Acredolo (Eds.), Spatial Orientation: Theory, Research, and Applications. New York: Plenum Press, 1980.
Mehrgardt, S.; and Mellert, V.: Transformation Characteristics of the External Human Ear. J. Acous. Soc. America, vol. 61, no. 6, 1977, pp. 1567-1576.
Mowbray, G. H.; and Gebhard, J. W.: Man's Senses as Information Channels. In H. W. Sinaiko (Ed.), Selected Papers on Human Factors in the Design and Use of Control Systems. New York: Dover, 1961.
Mudd, S. A.: Experimental Evaluation of Binary Pure-Tone Audition. J. Appl. Psych., vol. 49, no. 2, 1965, pp. 112-121.
Nakajima, D.; Doi, T.; Fukuda, J.; and Iga, A.: Digital Audio Technology. Blue Ridge Summit, PA: Tab Books, 1983.
Oppenheim, A. V.; and Schafer, R. W.: Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
Parker, S. P. A.; and Oldfield, S. R.: Acuity of Sound Localization: A Topography of Sound Space. II. Pinnae Cues Absent. Perception, vol. 13, 1984, pp. 600-617.
Patterson, R.: Guidelines for Auditory Warning Systems on Civil Aircraft. Tech. Rep. 82017, Civil Aviation Authority, London, 1982.
Plenge, G.: On the Differences Between Localization and Lateralization. J. Acous. Soc. America, vol. 56, no. 3, 1974, pp. 944-951.
von Békésy, Georg: Experiments in Hearing. New York: McGraw-Hill, 1960.
Wenzel, E. M.; Wightman, F. L.; and Foster, S. H.: A Virtual Display System for Conveying Three-Dimensional Acoustic Information. In Proceedings of the Human Factors Society 32nd Annual Meeting, Anaheim, CA, 1988.
Wightman, F. L.; and Kistler, D. J.: Headphone Simulation of Free-Field Listening. I: Stimulus Synthesis. J. Acous. Soc. America, vol. 85, no. 2, 1989a, pp. 858-867.
Wightman, F. L.; and Kistler, D. J.: Headphone Simulation of Free-Field Listening. II: Psychophysical Validation. J. Acous. Soc. America, vol. 85, no. 2, 1989b, pp. 868-878.
Zurek, P. M.: Measurements of Binaural Echo Suppression. J. Acous. Soc. America, vol. 66, no. 6, 1979, pp. 1750-1757.
REPORT DOCUMENTATION PAGE

1. Report No.: NASA TM-102279
4. Title and Subtitle: Techniques and Applications for Binaural Sound Manipulation in Human-Machine Interfaces
5. Report Date: August 1990
7. Author(s): Durand R. Begault and Elizabeth M. Wenzel
8. Performing Organization Report No.: A-90066
9. Performing Organization Name and Address: Ames Research Center, Moffett Field, CA 94035
10. Work Unit No.: 505-69-01
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Washington, DC 20546-0001
13. Type of Report and Period Covered: Technical Memorandum
15. Supplementary Notes: Point of Contact: Elizabeth M. Wenzel, Ames Research Center, MS 239-3, Moffett Field, CA 94035-1000; (415) 604-6290 or FTS 464-6290
16. Abstract: The implementation of binaural sound to speech and auditory sound cues ("auditory icons") is addressed from both an applications and technical standpoint. Techniques overviewed include processing by means of filtering with head-related transfer functions. Application to advanced cockpit human interface systems is discussed, although the techniques are extendable to any human-machine interface. Research issues pertaining to three-dimensional sound displays under investigation at the Aerospace Human Factors Research Division at NASA Ames Research Center are described.
17. Key Words: Psychoacoustics; Digital signal processing; Human-machine interfaces
18. Distribution Statement: Unclassified-Unlimited; Subject Category 53
19./20. Security Classif. (of this report / this page): Unclassified
21. No. of Pages: 28
22. Price: A03

For sale by the National Technical Information Service, Springfield, Virginia 22161