
Sound Source Localization Identification Analysis
Accuracy: Trueness and Precision
Errors: Biases, Confusions, and Misses

M. Torben Pastore and William A. Yost
Department of Speech and Hearing Science, College of Health Solutions

– root-mean-squared (RMS) error gives no indication of the bias in listeners’ responses. When errors are averaged across many speaker locations, this is not expected to be a problem.

Introduction

• An increasingly popular method for investigating auditory spatial acuity is the sound source identification method. Sound stimuli are presented over loudspeakers and listeners report the perceived location of the sound source.
• The data from such identification tasks can be presented in confusion matrices, which compare actual to reported sound source locations. While figures offer an important visual estimate of the patterns of listener responses, quantitative descriptions and inferences require statistical analyses.
• The dependent variable in identification tasks is not continuous, and so care must be taken in selecting the appropriate statistical tools.
• Hartmann (1983) and Rakerd and Hartmann (1986) provided a useful set of equations to analyze listener performance in sound source identification tasks. They allowed for the discretization of listener responses (e.g., loudspeaker numbers) and considered the intricacies inherent to the issues mentioned above over multiple publications (e.g., Hartmann, 1983; Hartmann and Rakerd, 1989).
• In general, this approach to analyzing identification data is part of a larger set of issues related to measurement per se. These issues have been standardized and updated in ISO Standard 5725 (1994). This standard defines the accuracy of any measured outcome in terms of trueness and precision.
• For experimental set-ups that present sounds from 360° around the listener, further complications may arise in the form of front-back confusions, resulting in skewed or even bimodal distributions. Whichever common statistic is used to quantify and partition accuracy, none offers a simple way to process data that include frequent front-back reversals.
• This poster considers important distinctions between different measures of accuracy, different kinds of errors, and approaches to quantitatively summarizing experimental outcomes in light of issues particular to sound source identification, especially for circular loudspeaker arrays.

Basic Concepts

Definition of Terms

For simplicity, we define all concepts in terms of a single target location. So, let $x_i$, with $i = 1, \dots, N$, signify individual responses to the same stimulus and target location. Usually, an array of loudspeaker locations is tested for multiple listeners. In this case, listeners, loudspeaker locations, etc. must also be identified with subscripts (e.g., Hartmann et al., 1998).

A listener’s accuracy (or error) in identifying the location of a sound source can be expressed in terms of accuracy, trueness, and precision. These terms have been defined in ISO Standard 5725.
• Accuracy - the difference between a single instance of a listener’s response and the veridical sound source location. In ISO 5725, this is the general term for the accuracy of a measurement, which is then partitioned into the systematic and non-systematic components below.
• Trueness - the systematic difference between listener responses and the target location. In ISO 5725, “Trueness” refers to the closeness of agreement between the arithmetic mean of a large number of test results and the true or accepted reference value.
• Precision - given the systematic error, precision is an estimate of the variability about the average listener response. In ISO 5725, “Precision” refers to the closeness of agreement between test results.

Figure 1: An illustration of the basic concepts described by ISO 5725. (Figure labels: target T, responses, trueness, precision.)

Measures of Accuracy

Percent/Proportion Correct - a poor estimate of trueness, and a nearly useless estimate of precision
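To see why proportion correct says little about trueness or precision, consider a minimal Python sketch with two hypothetical sets of responses, given as loudspeaker numbers. The values and the helper name `proportion_correct` are invented for illustration and are not data from this poster. Both sets contain the same number of correct responses, yet they differ in bias and in spread.

```python
from statistics import mean, pstdev


def proportion_correct(responses, target):
    """Fraction of responses that exactly match the target location."""
    return sum(r == target for r in responses) / len(responses)


target = 3  # hypothetical target loudspeaker number

# Two invented response sets with the same number of correct responses.
tight_unbiased = [3, 3, 3, 3, 3, 3, 2, 4, 2, 4]
broad_biased = [3, 3, 3, 3, 3, 3, 5, 6, 5, 6]

for name, responses in [("tight, unbiased", tight_unbiased),
                        ("broad, biased", broad_biased)]:
    pc = proportion_correct(responses, target)
    bias = mean(responses) - target   # mean signed error (trueness)
    spread = pstdev(responses)        # biased standard deviation (precision)
    print(f"{name}: proportion correct = {pc:.2f}, "
          f"bias = {bias:+.2f}, spread = {spread:.2f}")
```

The proportion correct is identical for the two sets, while the mean signed error and the standard deviation are not, so the statistic alone cannot distinguish a biased listener from an imprecise one.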

Measures Based on Absolute Error

$\xi_{rms} \neq \mathrm{MAE}$

Table 1: Descriptive statistics based on the absolute value of error. The target location is T. Overlines indicate the mean operation.

| common name | formula |
|---|---|
| mean signed error (trueness) | $\bar{\xi} = \overline{x_i - T}$ |
| mean absolute error (MAE) | $\overline{|\xi|} = \overline{|x_i - T|}$ |
| mean unsigned deviation from the mean (precision) | $\sigma_\xi = \overline{|x_i - \bar{x}|}$ |

• Advantages
  – Intuitively understandable.
  – Each error contributes to the overall measure in direct proportion to the magnitude of the error. Therefore outliers do not have an outsized effect on the estimate.
  – To gain a clearer sense of the underlying distribution of errors with less vulnerability to the effects of outliers, quantile measures (e.g., median/quartile) may be used to calculate error in place of mean/standard deviation. In this case, error retains its sign (not absolute).
• Disadvantages
  – Most statistical inference and decision theory rely on probability functions that work well with squared, moment-based measures such as variance, root-mean-squared error, etc.
  – Without this moment-based tool-set, inferences between data sets resulting from different methods are difficult.
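The measures in Table 1, together with the quantile alternative mentioned above, can be sketched in a few lines of Python. The function name `absolute_error_measures`, the dictionary keys, and the response values are hypothetical, chosen only for illustration. The quartiles use the default method of Python's `statistics.quantiles`, which is one of several conventions.

```python
from statistics import mean, median, quantiles


def absolute_error_measures(responses, target):
    """Absolute-error and quantile measures for a single target location T."""
    errors = [x - target for x in responses]   # signed errors, x_i - T
    x_bar = mean(responses)                    # mean response
    q1, _, q3 = quantiles(errors, n=4)         # quartiles of the signed error
    return {
        # trueness: mean of the signed errors
        "mean_signed_error": mean(errors),
        # MAE: mean of the unsigned errors about the target
        "mean_absolute_error": mean([abs(e) for e in errors]),
        # precision: mean unsigned deviation about the mean response
        "mean_unsigned_deviation": mean([abs(x - x_bar) for x in responses]),
        # quantile alternatives; the error keeps its sign here
        "median_signed_error": median(errors),
        "interquartile_range": q3 - q1,
    }


# Invented, skewed responses to a target at position 0.
responses = [-2, -1, 0, 0, 1, 1, 2, 3, 4, 12]
for name, value in absolute_error_measures(responses, target=0).items():
    print(f"{name}: {value}")
```

In this invented sample the single large response pulls the mean signed error above the median signed error, which is the kind of outlier sensitivity the quantile measures are meant to reduce.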

RMS error must be considered with care. It should not be misconstrued to be the average error of listener responses. This is because RMS error is never smaller than the mean unsigned error, and is larger whenever the unsigned errors are not all equal. The difference between MAE and RMS error will be a function of the degree of variability of listeners’ responses and the number, $N$, of responses. This difference will then be scaled by the angular separation, $A$, of the loudspeakers being identified. In other words,

$$\sqrt{\overline{(x_i - T)^2}} \neq \overline{|x_i - T|}$$

or, in the analogous form for deviations about the mean response $\bar{x}$, written as explicit sums and scaled by $A$,

$$\frac{A}{\sqrt{N}} \sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \neq \frac{A}{N} \sum_{i=1}^{N} |x_i - \bar{x}|.$$
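A short worked identity shows why the RMS error can never fall below the MAE. This is a standard result, sketched here for errors about the target, $\xi_i = x_i - T$, with overlines denoting the mean as in Table 1. It is offered as an illustration rather than as part of the original analysis.

```latex
\xi_{rms}^2 - \mathrm{MAE}^2
  = \overline{\xi_i^2} - \left(\overline{|\xi_i|}\right)^2
  = \overline{\left(|\xi_i| - \overline{|\xi_i|}\right)^2}
  \geq 0
```

The gap between the two squared measures is the variance of the unsigned errors, so the two agree only when every response misses the target by the same amount. For example, unsigned errors of 1, 1, and 4 give an MAE of 2 but an RMS error of $\sqrt{6} \approx 2.45$.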

An Example

For illustrative purposes, the various measures of response variability are calculated and illustrated together for the same hypothetical data set in Figure 2. A skewed, random distribution of 30 responses was generated in Matlab for a single target loudspeaker located at ‘position 0.’ Data are dimensionless for the sake of example. Several differences between these measures can be identified that will generalize across data sets. From the top of the figure:

1. measures based on absolute error (grey)

Measures Based on Moments

(a) the average (signed) error, $\bar{x} - T$, is a simple measure of trueness, or ‘bias.’
(b) all calculated absolute values lie within the range of the data values

Table 2: Moment-based measures. The generalized formulae account for only a single target location, T. See Rakerd and Hartmann (1986) for detailed descriptions and formulae for Hartmann’s statistics.

| common name | Hartmann’s notation | formula |
|---|---|---|
| mean response | $R$ | $\bar{x}$ |
| root-mean-squared error | $D$ | $\xi_{rms} = \sqrt{\overline{(x_i - T)^2}}$ |
| standard deviation (biased) | $s$ | $\sigma_\xi = \sqrt{\overline{(x_i - \bar{x})^2}}$ |
| mean signed error (trueness) | $C$ | $\bar{\xi} = \overline{x_i - T}$ |

2. measures based on moments (orange)
(a) the squared error values will often lie outside the range of the data
(b) the MAE is smaller than the RMS error

• Advantages
  – Squaring measures allows variance to be partitioned into systematic and non-systematic error. In Rakerd and Hartmann’s notation this is $D^2 = C^2 + s^2$. Note that this is NOT the case for the unsquared measures, e.g., trueness + precision $\neq$ total error.
  – These measures can be easily integrated into common methods of frequentist statistics, signal detection theory, and decision theory. Hartmann (1983), Hartmann and Rakerd (1989), and Hartmann et al. (1998), for example, have taken advantage of this to infer the relationship between measures of the minimum audible angle and sound source localization accuracy, as well as the necessary spacing of loudspeakers for source identification tasks.
• Disadvantages
  – squared measures of variability proportionally inflate the weight of outliers
  – squared variability is not easily or intuitively graspable
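The partition $D^2 = C^2 + s^2$ can be checked numerically. The following is a minimal Python sketch using the single-target formulae of Table 2. The function name `hartmann_statistics` and the response values are hypothetical, and the full multi-target statistics are given in Rakerd and Hartmann (1986).

```python
from math import isclose, sqrt
from statistics import mean, pstdev


def hartmann_statistics(responses, target):
    """Return (D, C, s) for a single target location T."""
    C = mean(responses) - target                             # mean signed error
    s = pstdev(responses)                                    # biased standard deviation
    D = sqrt(mean([(x - target) ** 2 for x in responses]))   # RMS error
    return D, C, s


# Invented, skewed responses to a target at position 0.
responses = [-2, -1, 0, 0, 1, 1, 2, 3, 4, 12]
target = 0

D, C, s = hartmann_statistics(responses, target)
x_bar = mean(responses)
mae = mean([abs(x - target) for x in responses])              # mean absolute error
mean_unsigned_dev = mean([abs(x - x_bar) for x in responses])

# The squared measures partition exactly ...
print("D^2 == C^2 + s^2:", isclose(D ** 2, C ** 2 + s ** 2))
# ... but the unsquared measures do not add up in the same way.
print("D == |C| + s:", isclose(D, abs(C) + s))
print("MAE == |C| + mean unsigned deviation:",
      isclose(mae, abs(C) + mean_unsigned_dev))
```

The biased (population) standard deviation is used here because the identity holds exactly only with the $1/N$ normalization shown in Table 2.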

Figure 3: Sample data for an experiment using 6 speakers evenly spaced 60° apart (axes: presented speaker and reported speaker). To display the behavioral data so that correct responses and reversals are easy to visualize, the numbers that listeners used to identify speakers are circularly shifted along the x and y axes. Correct responses are in blue, front-back confusions are in red, and misses are in black.

Confusions

Front-back confusions may occur in, or near, an azimuthal plane at any elevation (e.g., Middlebrooks, 1992). For simplicity, we here refer to errors in the 0° elevation (pinna height) azimuth plane only.
• when the source is located at an angle of α and a listener responds with a location at or near 180° − α, we call this a front-back confusion (FBC).
• if the measures of accuracy discussed above are intended to describe spatial processing, then the existence of FBCs presents serious issues of interpretation (e.g., the mean of a bimodal distribution resulting from numerous FBCs).
• it is not uncommon to “correct” FBCs by flipping them about the front-back plane and labeling them as correct. Such corrections are made because an FBC is not the result of errors in binaural processing. However, this must be approached carefully, in light of the research question.
• criteria for distinguishing misses from FBCs are essential. The distribution of errors is an important factor:
  – It is important to have some estimate of the expected errors in binaural processing due to “internal noise”
  – the stimulus may lead to a broad range of errors
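The 180° − α rule above can be turned into a simple response classifier. The following Python sketch is one possible implementation, not the authors' procedure. It assumes that azimuth is measured in degrees from straight ahead, and the function names and the `tolerance` criterion are hypothetical. An actual criterion should reflect the expected spread of binaural errors, as noted in the last bullet.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def classify_response(target_az, response_az, tolerance=0.0):
    """Label a response as 'correct', 'front-back confusion', or 'miss'.

    Assumes azimuths in degrees with 0 straight ahead, so that a source
    at alpha has its front-back mirror image at 180 - alpha.
    `tolerance` (degrees) is a hypothetical criterion for "at or near".
    """
    if abs(wrap_degrees(response_az - target_az)) <= tolerance:
        return "correct"
    mirrored_az = 180.0 - target_az
    if abs(wrap_degrees(response_az - mirrored_az)) <= tolerance:
        return "front-back confusion"
    return "miss"


# Hypothetical layout modeled on Figure 3: six loudspeakers, 60 degrees apart.
for target, response in [(60, 60), (60, 120), (240, 300), (0, 180), (60, 0)]:
    print(target, response, classify_response(target, response))
```

Because the exact match is tested first, a target on the interaural axis (90° or 270°), which is its own mirror image, is never labeled a confusion. With a nonzero tolerance, responses to targets near that axis become ambiguous between a small error and an FBC, which is one reason an explicit criterion is needed.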

Biases
• biases are errors of trueness; they emerge as a predominant pattern across multiple responses in which one set of responses occurs markedly more often than others. Simply put, if the distribution of errors is not uniform, there may be a response bias or biases.
• a bias could consist of one or several patterns in the distribution of errors
• for example, the responses of a listener might demonstrate a bias to the right. Unilateral deafness in the listener’s left ear could account for this bias.
• it is usually a good idea to have a statistical criterion for defining a bias
• across multiple target locations, response biases may average out.
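As one illustration of a statistical criterion for bias, the following Python sketch applies an exact sign-flip (randomization) test to the signed errors for a single target. This is an assumption made for the sake of example, not a criterion proposed in this poster. The function name and data are hypothetical, and the test assumes that unbiased errors are distributed symmetrically about zero.

```python
from itertools import product


def sign_flip_p_value(signed_errors):
    """Two-sided exact sign-flip test of the null hypothesis of no bias.

    Under the null, each error is assumed equally likely to be positive
    or negative. All 2**N sign patterns are enumerated, so this is only
    practical for a small number of responses N.
    """
    n = len(signed_errors)
    observed = abs(sum(signed_errors)) / n        # |mean signed error|
    magnitudes = [abs(e) for e in signed_errors]
    n_extreme = 0
    n_patterns = 0
    for signs in product((1, -1), repeat=n):
        flipped_mean = sum(s * m for s, m in zip(signs, magnitudes)) / n
        # Count patterns at least as extreme as the observed mean.
        if abs(flipped_mean) >= observed - 1e-12:
            n_extreme += 1
        n_patterns += 1
    return n_extreme / n_patterns


# Invented signed errors (response minus target), in loudspeaker positions.
rightward = [2, 1, 3, 2, 0, 1, 2, 3, 1, 2]
balanced = [2, -1, 3, -2, 0, 1, -2, -3, 1, 1]

print("rightward:", sign_flip_p_value(rightward))
print("balanced:", sign_flip_p_value(balanced))
```

A small p-value for the first set would support calling the rightward shift a bias, while the second set, whose errors sum to zero, gives no such evidence. Other criteria, such as a confidence interval on the mean signed error, could serve the same purpose.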

Figure 2: Hypothetical listener response data (black circles) for a single target loudspeaker (red ‘T’) using a continuous response method.

Note that these measures of accuracy can be applied to multiple target locations by averaging over those locations, as is typically done in the literature.

Measures of Errors

Misses
• misses are any errors that are not attributable to a front-back confusion

References

Hartmann, W. M. (1983). “Localization of sound in rooms,” J. Acoust. Soc. Am. 74, 1380–1391.
Hartmann, W. M. and Rakerd, B. (1989). “On the minimum audible angle–a decision theory approach,” J. Acoust. Soc. Am. 85, 2031–2041.
Hartmann, W. M., Rakerd, B., and Gaalaas, J. B. (1998). “On the source-identification method,” J. Acoust. Soc. Am. 104, 3546–3557.
International Organization for Standardization (1994). “ISO Standard 5725-1,” Technical Report, URL https://www.iso.org/obp/ui/#iso:std:iso:5725:-1:ed-1:v1:en.
Middlebrooks, J. C. (1992). “Narrow-band sound localization related to external ear acoustics,” J. Acoust. Soc. Am. 92, 2607–2624.
Rakerd, B. and Hartmann, W. M. (1986). “Localization of sound in rooms, III: Onset and duration effects,” J. Acoust. Soc. Am. 80, 1695–1706.

Acknowledgements

Partially supported by a grant from the National Institute on Deafness and Other Communication Disorders (NIDCD). For further information, please contact [email protected].