Perceptual and physical evaluation of differences among a large panel of loudspeakers

Mathieu Lavandier, Sabine Meunier, Philippe Herzog

Laboratoire de Mécanique et d’Acoustique, C.N.R.S., 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
e-mail: {lavandier, meunier, herzog}@lma.cnrs-mrs.fr

This study examines the restitution of timbre by loudspeakers in a listening room. The main objective is to create a measurement tool allowing a more relevant discrimination between loudspeakers regarding our perception of reproduced sound. A panel of loudspeakers is evaluated in two parallel ways: perceptual and physical measurements. The experimental protocol used is compatible with both approaches and has been described in a previous study [1]. It first consists of recording the sound radiated by loudspeakers in a room, and then submitting the recorded sounds to signal analysis, on the one hand, and to a series of listening tests over headphones, on the other hand. The final step is to define a suitable method of analysis in order to differentiate the recordings in the same way as listeners did. Whereas our first experiment involved only twelve loudspeakers in a room not specially designed for listening, this new experiment took place in the listening room used by a loudspeaker manufacturer to test its own products. A large panel of loudspeakers was chosen, covering a wide range of technologies, prices and brands. The experimental results allow us to identify the main perceptual dimensions involved in our tests and to refine our description of the objective attributes corresponding to these dimensions. The listening test was based on a free classification task, and the results are compared with those of the previous experiment, which was based on similarity ratings and pair-comparison tests.

1 Introduction

The standardized measurements currently used to differentiate loudspeakers do not seem to reflect what listeners hear when using these loudspeakers. The aim of our research is to look for relationships between perceived and measured differences between the sound fields radiated by different loudspeakers in a room.

The perception of reproduced sound has been widely investigated, and several studies have sought the link between listening tests and objective measurements on loudspeakers. The listening tests consisted of rating the loudspeakers on different perceptual scales [2] revealed in previous studies [3, 4], or on overall quality or fidelity scales [5]. These ratings were then compared to frequency response measurements made in an anechoic chamber [2, 6, 7, 8] or in a listening room [2, 8]. Klippel [9] simulated the sound produced by loudspeakers during his listening tests, using anechoic measurements. He then looked for objective attributes defined on this signal to explain the results of the listening tests.

We followed the same idea of bringing the perceptual and physical evaluations together in order to find relationships between them. However, given the difficulty of doing so a posteriori, we tried to keep these two approaches as close to each other as possible, for as long as we could, from the beginning of the experiment. The physical measurements had to be made in the same environment as the listening tests, and preferably at the same time, so that we would be sure to measure the same sound field in both approaches. We also chose to evaluate the relative differences between loudspeakers and not their absolute quality, which requires fewer a priori assumptions about what we are looking for. The main objective is to create a measurement tool allowing a more relevant discrimination between loudspeakers regarding our perception of reproduced sound. This tool might then be involved in quality evaluations of loudspeakers.

Listening tests on loudspeakers have to be rigorously controlled for their results to be valid [10, 11]. The perception of the sound radiated by a loudspeaker is greatly influenced by the room in which it is used and by the positions of both loudspeaker and listener [12, 13]. Studies on the perception of sound reproduction led to the publication of recommendations concerning listening tests on loudspeakers [14, 15].

To deal with all these constraints, we defined an experimental protocol described in a previous study [1]. It first consists of recording the sound radiated by loudspeakers in a room, and then submitting the recorded sounds to signal analysis, on the one hand, and to a series of listening tests over headphones, on the other hand. We then look for relationships between these two approaches, examining the way they differentiate loudspeakers. Our goal is to define a suitable method of analysis in order to differentiate the recordings in the same way as listeners did. Because the listening tests are run over headphones, the spatial dimension of sound reproduction is not investigated and our research focuses on the restitution of timbre by single loudspeakers.


Forum Acusticum 2005 Budapest

Lavandier, Meunier, Herzog

In a first experiment [1] involving twelve loudspeakers, two main perceptual dimensions underlying the differences perceived by listeners were revealed. These dimensions were found to be independent of the tested recording techniques and musical excerpts. A suitable objective discrimination between the recordings was achieved in parallel, with very good agreement between the two approaches. The experiment described in the present paper was designed to confirm our previous results, especially with measurements made in another listening room. More experimental data were also required, involving a larger variety of loudspeakers, to better describe the perceptual dimensions and to uncover other potential dimensions.

2 A larger panel of loudspeakers in another listening room

2.1 Recording sessions

The recordings were made in the listening room used by a loudspeaker manufacturer to test its products. Its ground plan is shown in Figure 1. This room is close to listening room standards [14, 15]. The floor is entirely carpeted. The reverberation time was measured at one position, between the recording microphones and the loudspeaker, with a source at the position of the loudspeaker. Its value was found to be around 0.4 s at midrange frequencies.

Figure 1: Listening room used for recording sessions

During five sessions, a panel of thirty-seven single loudspeakers was recorded, covering a wide range of technologies, prices and brands. They were all placed at the same position in the room (Figure 1). Their vertical position was defined by the point between the midrange driver and the tweeter. This point was placed one meter above the floor, unless the loudspeaker was designed to stand directly on the floor.

The recordings were made with the stereophonic AB ORTF technique, using microphones from the AKG Blue Line series, placed 1 m above the floor, in a restricted area corresponding approximately to the head of a listener seated 2.40 m in front of the loudspeaker (Figure 1). Several musical excerpts were stored on a compact disc and reproduced on the loudspeakers by a high-grade CD player and amplifier. One excerpt was used in the listening test (McCoy Tyner, "Miss Bea", jazz, 3.3 s). It was chosen for comparison with previous results, as it was one of the excerpts used in our previous experiment [1]. During the recording sessions, reproduction levels were roughly adjusted to normal listening conditions. Before the listening tests, the overall loudness of the recordings to be compared was set to the same level, around 65 phons, as judged by the experimenters. This equalization prevents loudness from appearing as an uninteresting dimension potentially masking other perceptual dimensions.

2.2 Listening test

Our psychoacoustical approach is based on the evaluation of perceptual similarities between the recordings of the loudspeakers. The most direct method to obtain these similarity judgments is to run a pair-comparison test. In our previous experiment [1], the twelve recordings were presented in pairs to listeners, who had to quantify the similarity within each pair. This method is no longer practicable with thirty-seven recordings: the number of pairs to be judged becomes too large, and the test would be too long for listeners to handle the task. So an indirect method based on a free classification task was used. The thirty-seven recordings were presented as thirty-seven crosses randomly distributed on a computer screen. Listeners could move each cross freely and listen to it as many times as they wanted by simply clicking on it. They were asked to group the recordings into classes: recordings found similar should be placed in the same class, whereas recordings found dissimilar should be placed in different classes. The number of classes was free, so the degree of similarity required for two recordings to be placed in the same class was left to the judgment of each listener. The partition of each listener was then transformed into a matrix of dissimilarity between the recordings. The dissimilarity of two recordings from the same class is set to zero, and the dissimilarity of two recordings from different classes is set to one. Averaging these matrices over listeners gives the mean dissimilarity matrix: the more often two recordings were grouped together by listeners, the more similar they are assumed to be, and the lower their dissimilarity.


The listening test was conducted in an isolated soundproof room using Stax Ib Pro headphones. Fifty-six listeners took part in the experiment, thirty-four men and twenty-two women. They were between twenty-two and fifty-three years old, with an average age of thirty-two years. They were members of the laboratory or students, and none of them was trained in loudspeaker comparison. It took them between ten and eighty minutes to complete the task, with an average of twenty-eight minutes per listener. Listening tests were followed by informal interviews in which listeners were asked to describe their partition as fully as they could.
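The conversion of free-classification partitions into dissimilarities described for the listening test can be sketched as follows. This is a minimal illustration, not the authors' program: the function names and the toy partitions are hypothetical, and the pair count assumes that each unordered pair would be judged once in a pair-comparison test.

```python
def n_pairs(n_items):
    """Number of unordered pairs among n_items recordings."""
    return n_items * (n_items - 1) // 2


def partition_to_dissimilarity(labels):
    """0 if two recordings share a class, 1 otherwise (one listener)."""
    n = len(labels)
    return [[0.0 if labels[i] == labels[j] else 1.0 for j in range(n)]
            for i in range(n)]


def mean_dissimilarity(partitions):
    """Average the 0/1 matrices over all listeners."""
    n = len(partitions[0])
    total = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        d = partition_to_dissimilarity(labels)
        for i in range(n):
            for j in range(n):
                total[i][j] += d[i][j]
    return [[value / len(partitions) for value in row] for row in total]


# Pair counts for twelve and thirty-seven recordings.
print(n_pairs(12), n_pairs(37))  # 66 666

# Hypothetical toy data: 3 listeners, 4 recordings.
# Class labels are arbitrary; only co-membership matters.
partitions = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]
D = mean_dissimilarity(partitions)
for row in D:
    print(row)
```

Each entry of `D` is the fraction of listeners who placed the two recordings in different classes, so it lies between 0 (always grouped together) and 1 (never grouped together).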

3 Results of the psychoacoustical approach

Figure 2: Dendrogram resulting from cluster analysis of similarity data. Each number corresponds to the recording of a loudspeaker

The number of classes used by listeners varied from three to twenty-five, with an average of eight classes per listener. A comparison of the different partitions based on the Rand index and cluster analysis [16] did not reveal any classes among listeners. As no clear difference of strategy appeared between them while defining their partitions, listeners might only have used different degrees of similarity to separate the recordings. The mean dissimilarity matrix was analyzed using cluster analysis. The dendrogram of Figure 2 does not reveal clear classes among the recordings. The partition seems to depend "continuously" on the degree of similarity chosen to discriminate the recordings. Only the recordings of loudspeakers 8 and 13 appear very different from the others. These two loudspeakers are very band-limited. They were included in the test as representative samples of very poor quality sound reproduction.

Figure 3: Four-dimensional auditory space resulting from MDS analysis of similarity data. Each number corresponds to the recording of a loudspeaker

Multidimensional scaling (MDS) was considered suitable for analyzing the similarity data obtained from our free classification task [16, 17]. A four-dimensional auditory space, shown in Figure 3, was retained as a proper solution with regard to stress measurement, interpretability while listening, and interviews of listeners after the test. The first perceptual dimension is predominant and seems to be strongly linked to bass-treble ratio or spectral balance. But this dimension does not seem monotonic. As we move towards the right along this dimension, we find "lack of bass" stimuli, then "boost of bass" stimuli and finally "well-balanced" stimuli. This dimension could be characteristic of the discoloration of the sound reproduction [9], if too much bass could be assumed to be perceived as more "well-balanced" than a reproduction lacking bass. Instead of considering only similarity, listeners might also have used the notions of quality or preference to group the recordings. The second perceptual dimension would be linked to "reverberation" and "feeling of space". It appears as a direct consequence of the interaction between the loudspeakers and the room. The third perceptual dimension depends on the emergence of the midrange compared to the rest of the spectrum. It determines the "clarity/clearness" or "dullness" of the sound. The last perceptual dimension is heard principally on cymbals and could be called the "softness" or "hardness" of the sound.
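To illustrate how an MDS analysis turns a dissimilarity matrix into a spatial configuration, here is a minimal sketch using classical (Torgerson) MDS. This is an assumption made for illustration only: the paper does not state which MDS variant or software was used, and the function name `classical_mds` and the toy matrix are hypothetical.

```python
import numpy as np


def classical_mds(D, n_dims):
    """Classical (Torgerson) MDS: find points whose Euclidean
    distances approximate the dissimilarities in D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims] # keep the n_dims largest
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale           # one row of coordinates per item


# Hypothetical toy dissimilarities: the four corners of a unit square.
s = np.sqrt(2.0)
D = np.array([
    [0.0, 1.0, s, 1.0],
    [1.0, 0.0, 1.0, s],
    [s, 1.0, 0.0, 1.0],
    [1.0, s, 1.0, 0.0],
])
X = classical_mds(D, n_dims=2)

# Distances between the recovered points, to compare with D.
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.round(D_hat, 3))
```

The recovered coordinates are defined only up to rotation and reflection, which is why MDS axes have to be interpreted afterwards, here through listening and interviews. In practice the number of dimensions is chosen by comparing a fit measure such as stress across solutions of increasing dimensionality.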

4 Results of the physical approach

As the physical approach has been presented in detail in a previous paper [1], we only give a brief summary here. Both channels of the recordings used in the listening test were analyzed as monophonic signals. Our goal was to define a suitable method of analysis to differentiate the recordings in the same way as listeners did. We investigated the time, spectral and time-frequency domains. Weighted spectral domains were tested: the A-weighting and a weighting based on the normal equal-loudness contour at 70 phons [18], which is close to the level of our sound reproduction. The specific loudness was also investigated. The overall specific loudness and the time-frequency pattern of specific loudness of our signals were computed [19]. The temporal mean of the time-frequency pattern of specific loudness was also considered. It is called "mean specific loudness" in the following. For each method of analysis, we defined metrics measuring the dissimilarities between signals. Details about these objective dissimilarities and their calculation can be found in [1].

The objective dissimilarities are then compared to the mean perceptual dissimilarities resulting from our listening test. Calculating the correlation between objective and perceptual dissimilarities is a way to evaluate the suitability of the corresponding objective method of analysis and its associated metric to differentiate the recordings in the same way as listeners did. Figure 4 displays the mean values of the correlation obtained with the different methods of analysis. The average is taken over the results from each channel of the stereophonic recordings. Phase information does not seem relevant to the perceived dimensions involved in our experiment, as shown by comparing the results given by spectrum and power spectral density. The tested spectral weightings do not improve the correlation between objective and perceptual dissimilarities. For a pertinent evaluation of loudspeakers, the importance of taking into account auditory masking effects by considering the specific loudness [1, 9, 20] is confirmed. More experimental data are required to clearly separate the results from the specific loudness determined on the overall signal from those determined as a function of time. This question is of great importance for assessing the need to take into account the temporal dependency of auditory masking effects. Mean specific loudness seems to contain as much useful information as specific loudness to describe the perceptual dissimilarities resulting from our listening test.

Figure 4: Correlations between objective and perceptual dissimilarities. Specific loudness 1 is the overall specific loudness. Specific loudness 2 is the time-frequency pattern of specific loudness. Mean specific loudness is the temporal mean of specific loudness 2.

Even if it appears to be the most pertinent method of analysis among those tested, the specific loudness is not sufficient to explain the perceptual dimensions involved in our listening test. Multidimensional scaling analysis was applied to the objective dissimilarities. This allows us to draw objective spaces that can be compared to the perceptual ones. In our previous study [1], mean specific loudness led to two-dimensional objective spaces very similar to the corresponding two-dimensional auditory spaces. The objective dimensions were then suitable to describe the perceptual ones. In the present study, the objective spaces based on mean specific loudness are still two-dimensional, so the mean specific loudness cannot explain the four perceptual dimensions involved in our listening test. As in the previous study, the two objective dimensions can be linked to bass-treble ratio or spectral balance for the first one, and to the emergence of midrange frequencies for the second one. The second objective dimension might properly describe the third perceptual dimension. On the other hand, the first objective dimension is monotonic and is therefore markedly different from the first perceptual dimension, even if they are linked to the same kind of attribute. The second and fourth perceptual dimensions do not appear to be contained in the information given by the mean specific loudness that we have computed.

5 Comparison with previous results

For comparison with our previous results [1], we carried out a partial analysis of our two experiments, keeping only the recordings of the eleven loudspeakers involved in both experiments. These recordings are based on the same musical excerpt but were made in a different room. The perceptual dissimilarities corresponding to the considered recordings were extracted from the dissimilarity matrices of each test; new listening tests were not run. Of course, the influence of the other stimuli involved in the listening tests could not be removed a posteriori, and this has to be kept in mind while interpreting the results.
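The partial analysis described above, together with the comparison of objective and perceptual dissimilarities, can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify the type of correlation used (Pearson correlation over the off-diagonal entries is assumed here), and the function names, toy matrices and selected indices are all hypothetical.

```python
from math import sqrt


def submatrix(D, keep):
    """Restrict a square dissimilarity matrix to the recordings in keep."""
    return [[D[i][j] for j in keep] for i in keep]


def upper_triangle(D):
    """Entries above the diagonal, so that each pair is counted once."""
    n = len(D)
    return [D[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy matrices for 4 recordings (not data from the paper).
perceptual = [
    [0.0, 0.2, 0.8, 0.9],
    [0.2, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.3],
    [0.9, 0.6, 0.3, 0.0],
]
objective = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 3.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 3.0, 1.0, 0.0],
]

# Keep only the recordings common to both experiments (hypothetical indices).
common = [0, 1, 3]
p = upper_triangle(submatrix(perceptual, common))
o = upper_triangle(submatrix(objective, common))
r = pearson(p, o)
print(round(r, 3))
```

Extracting a sub-matrix in this way leaves the retained dissimilarities unchanged, which is why the influence of the discarded stimuli on listeners' judgments cannot be removed after the fact.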

Figure 5: Two-dimensional auditory spaces resulting from MDS analysis of partial similarity data. Previous experiment on top, present experiment on bottom. Each number corresponds to the recording of a loudspeaker

The two-dimensional auditory spaces obtained by multidimensional scaling analysis of the partial similarity data from each experiment are found to be very similar (Figure 5). Only the recording of loudspeaker 5 moves substantially. This particular behavior has been noticed previously [1], and might be explained if we could take into account specificities in our MDS model [21]. If we consider only the partial data common to both experiments, and despite the influence of the other stimuli, the difference of room for the recordings and the difference of task during the listening tests, the two perceptual dimensions are the same in the two experiments.

Figure 6: Two-dimensional objective space based on the mean specific loudness of the eleven recordings selected from the present experiment. Left channel on top, right channel on bottom. Each number corresponds to the recording of a loudspeaker

The objective dissimilarities on the partial data from the present experiment were also computed. The correlations between objective and perceptual dissimilarities followed the same trend as in Figure 4, but the specific loudness gave even better results, with a correlation around 80%, as in our previous experiment [1]. The two-dimensional objective spaces based on mean specific loudness (Figure 6) could also describe the corresponding two-dimensional auditory space (bottom of Figure 5) as in [1].

6 Conclusion

Our experimental protocol appears to allow parallel perceptual and physical discriminations of loudspeakers radiating in a room. A first experiment [1] involving only twelve loudspeakers had revealed two perceptual dimensions, and the specific loudness was found to be a suitable method of analysis to explain these dimensions. These results were confirmed by partial analysis of a new experiment done in another room, considering only the recordings of the loudspeakers common to both experiments.

The complete analysis of this new experiment revealed four perceptual dimensions. Discovering more dimensions is not surprising, as in the previous experiment the number of dimensions that could be uncovered was limited by the number of loudspeakers involved. Our analysis using the specific loudness is no longer sufficient to explain all four perceptual dimensions, which is not surprising either. The way we extract information from specific loudness calculations is limited, and our definition of the objective dissimilarity based on specific loudness patterns might be refined. For now, it involves an averaging over time and frequency that might completely hide subtle information potentially perceived by listeners. Other auditory models might also be tested.

The nature of the first perceptual dimension is more surprising. In both experiments, this dimension is linked to bass-treble ratio or spectral balance, but in the second experiment it is no longer monotonic. From our experimental data, we cannot conclude whether this dimension is truly non-monotonic and was hidden in our first experiment because of the lack of stimuli, or whether this new behavior is due to the free classification task. Instead of considering only similarity, listeners might also have used the notions of quality or preference to group the recordings. A new listening test based on pair comparison and similarity ratings will have to be conducted. It should involve as many recordings as possible, in order to compare the results with those of the free classification task.

7 Acknowledgement

We wish to thank Jeremy Marozeau for providing us with his program for running free classification listening tests, the Mosquito Group for putting their listening room at our disposal, Mickael Lefebvre for his help during the recording sessions, the manufacturers, audio professionals and laboratories who lent us their loudspeakers (BC Acoustique, Cabasse, Conservatoire National des Arts et Métiers, Copper et Cobalt, France Telecom R&D, Genesis, Mosquito Group, Relief Sonore, Supravox), and finally all the listeners who took part in the experiment.

References

[1] M. Lavandier, P. Herzog, and S. Meunier. The restitution of timbre by loudspeakers in a listening room: perceptual and physical measurements. In AES 117th Convention, number 6240.

[2] A. Gabrielsson, B. Lindström, and O. Till. Loudspeaker frequency response and perceived sound quality. J. Acoust. Soc. Am., 90(2, Pt. 1):707–719, 1991.

[3] A. Gabrielsson, U. Rosenberg, and H. Sjögren. Judgments and dimension analyses of perceived sound quality of sound-reproducing systems. J. Acoust. Soc. Am., 55(4):854–861, 1974.

[4] A. Gabrielsson and H. Sjögren. Perceived sound quality of sound-reproducing systems. J. Acoust. Soc. Am., 65(4):1019–1033, 1979.

[5] F.E. Toole. Subjective measurements of loudspeaker sound quality and listener performance. J. Audio Eng. Soc., 33(1/2):2–32, 1985.

[6] F.E. Toole. Loudspeaker measurements and their relationship to listener preferences: Part 1. J. Audio Eng. Soc., 34(4):227–235, 1986.

[7] F.E. Toole. Loudspeaker measurements and their relationship to listener preferences: Part 2. J. Audio Eng. Soc., 34(5):323–348, 1986.

[8] H. Staffeldt. Correlation between subjective and objective data for quality loudspeakers. J. Audio Eng. Soc., 22(6):402–415, 1974.

[9] W. Klippel. Multidimensional relationship between subjective listening impression and objective loudspeaker parameters. Acustica, 70:45–54, 1990.

[10] S.P. Lipshitz and J. Vanderkooy. The great debate: Subjective evaluation. J. Audio Eng. Soc., 29(7/8):482–491, 1981.

[11] F.E. Toole. Listening tests - turning opinion into facts. J. Audio Eng. Soc., 30(6):431–445, 1982.

[12] S. Bech. Perception of timbre of reproduced sound in small rooms: Influence of room and loudspeaker position. J. Audio Eng. Soc., 42(12):999–1007, 1994.

[13] S.E. Olive, P.L. Schuck, S.L. Sally, and M.E. Bonneville. The effect of loudspeaker placement on listener preference ratings. J. Audio Eng. Soc., 42(9):651–669, 1994.

[14] IEC Publication 60268-13. Sound system equipment - part 13: Listening tests on loudspeakers. International Electrotechnical Commission, Geneva, Switzerland, 1998.

[15] AES20-1996. AES recommended practice for professional audio - subjective evaluation of loudspeakers. J. Audio Eng. Soc., 44(5):382–400, 1996.

[16] O. Houix. Catégorisation auditive des sources sonores. PhD thesis, Université du Maine, 2003.

[17] I. Borg and P. Groenen. Modern multidimensional scaling. Theory and applications. Springer, 1997.

[18] British Standard ISO 226:2003. Acoustics - normal equal-loudness level contours. BSi, 2003.

[19] E. Zwicker and H. Fastl. Psychoacoustics: facts and models. Springer, 1999.

[20] H. Staffeldt. Measurement and prediction of the timbre of sound reproduction. J. Audio Eng. Soc., 32(6):410–414, 1984.

[21] S. Winsberg and J.D. Carroll. A quasinonmetric method for multidimensional scaling via an extended Euclidean model. Psychometrika, 54(2):217–229, June 1989.