Interaction Studies
Voice features of telephone operators predict auditory preferences of consumers.
--Manuscript Draft--
Manuscript Number:
IS-D-15-00008R3
Full Title:
Voice features of telephone operators predict auditory preferences of consumers.
Short Title:
Voice features of operator predict auditory preferences
Article Type:
Original Article
First Author:
Vanessa André
Other Authors:
Christine Petr; Nicolas André; Martine Hausberger; Alban Lemasson
Corresponding Author:
Vanessa André
Rennes, Brittany, FRANCE
Funding Information:
Section/Category:
Social behaviour & interaction
Keywords:
acoustic structure, human, phone interaction, prosody, voice perception
Manuscript Classifications:
10.03.01: Animal social behaviour; 20.01: Experimental; 10.05.06: Psycholinguistics
Abstract:
What makes a human voice agreeable is a matter of scientific discussion. Whereas prosody has been shown to play a role in "male-female" attraction, the impact of frequency modulations in "non-sexual", notably commercial, contexts has attracted little attention. Another point unaddressed in the literature is auditory sensitivity to short-term frequency modulations, as current studies focus more on the sentence level. Thirty French female operators were recorded over the phone. All "bonjour" greeting words were classified in terms of frequency modulation linearity and orientation at the syllable and word levels. The different voices were then played back to students and seniors, who rated each voice according to its degree of agreeableness. Listeners preferred non-monotonous voices. Differences between age classes were greater than between sex classes. Results suggest that short-term frequency changes are important for auditory evaluation of voice agreeableness. This study opens new research perspectives concerning the importance of prosody during consumer-seller interactions.
Voice features of telephone operators predict auditory preferences of consumers.
André, Vanessa (a); Petr, Christine (b); André, Nicolas (a); Hausberger, Martine (a); Lemasson, Alban (c)
a C.N.R.S., Ethologie animale et humaine, U.M.R. 6552 - Université de Rennes 1, Campus de Beaulieu, 263 Avenue du Général Leclerc, Bâtiment 25, 35042 Rennes, France
b Institut d’études politiques, Université de Rennes 1, 104 bd de la Duchesse Anne, 35700 Rennes, France
c Université de Rennes 1, Ethologie animale et humaine, UMR6552 - C.N.R.S., Station Biologique, 35380, Paimpont, France
Corresponding author: André Vanessa
[email protected] UMR 6552, Bâtiment 25, Campus de Beaulieu, 263 Avenue Général Leclerc 35042 Rennes, France +33670856489
1. Introduction
Regardless of the syntactic and semantic content of speech, conversing interlocutors commonly use the phonetic organization of voices to assess each other’s identity, personality, arousal state and motivation (Bruckert, Lienard, Lacroix, Kreutzer and Leboucher, 2006; Collins, 2000; Feinberg, Jones, Little, Burt and Perrett, 2005; Jones, Feinberg, DeBruine, Little and Vukovic, 2010; Scherer, 1972; Scherer, 1978; Smith, Brown, Strong and Rencher, 1975). Indeed, the interpretation of a word by a receiver is based first on facial expressions (55%), second on voice features (38%), and only third on lexical content (7%) (Mehrabian and Ferris, 1967), suggesting that in the absence of any visual cues, prosody plays a role in auditory perception. However, most studies have focused on male-female auditory sexual attraction and evaluation (Bruckert et al., 2006; Collins, 2000; Jones, Feinberg, DeBruine, Little and Vukovic, 2008; Jones et al., 2010; Klofstad, Anderson and Peters, 2012; Re, O’Connor, Bennett and Feinberg, 2012; Simmons, Peters and Rhodes, 2011). Interestingly, a few studies have also investigated vocal attractiveness in other social contexts (e.g. Bruckert et al., 2010). Nevertheless, little is known about voice agreeableness and its impact on social interactions, notably in contexts where voices play a crucial role, as in commercial interactions over the phone.
A key acoustic feature playing a general role in auditory perception and evaluation is the pitch of a voice. Auditory recognition of gender and age is frequency-dependent: the voices of men and seniors are lower-pitched than those of women and juniors, respectively (Bruckert et al., 2006; Latinus and Belin, 2011). Regardless of the age and gender of both receivers and emitters, people with low-pitched voices are perceived as more dominant than people with high-pitched voices (Jones et al., 2010; Klofstad et al., 2012). In addition, men judge women with high-pitched voices to be more feminine than women with low-pitched voices (Collins and Missing, 2003; Jones et al., 2008). Similarly, men with low-pitched voices are considered to be more masculine, more corpulent (larger and taller) and more attractive to women than men with high-pitched voices (Collins, 2000; Feinberg et al., 2005; Jones et al., 2010). Higher pitches are associated with anger, joy and anxiety, whereas lower pitches are associated with sadness and indifference (Zetterholm, 1998).
Voice variations in both the temporal and frequency domains appear even more crucial for evaluation than pitch (Besson, Magne and Schon, 2002; Latinus and Belin, 2011). Some people with clinical disorders, such as schizophrenia or depression, are unable to detect these so-called prosodic variations in their interlocutors’ voices and thus have difficulties holding a proper conversation (Alpert, Pouget and Silva, 2001; Bach, Buxtorf, Grandjean and Strik, 2009; Péron et al., 2011). Among the different prosodic features, voice rhythm is of prime importance. For example, reasonably fast-speaking people are considered more competent and more persuasive than slow-speaking people in Western cultures (Peng, Zebrowitz and Lee, 1993). However, speakers with very slow or very fast rhythms are considered less benevolent than speakers with intermediate rhythms (Brown, Strong and Rencher, 1973; Brown, Strong and Rencher, 1974; Brown, Strong and Rencher, 1975; Smith et al., 1975). Frequency variations can also play a crucial role. In general, less monotonous voices (i.e. with frequent frequency changes, for instance while reading a speech) are associated with more positive personalities (Zuckerman and Miyake, 1993). However, most past studies focus on rhythm variations or frequency changes at the sentence or the word level (Brown et al., 1973; Brown et al., 1974; Brown et al., 1975; Peng et al., 1993; McAleer et al., 2014; Smith et al., 1975), and comparatively little is known about the impact of subtle frequency variations such as sudden within-word changes (at the syllable level). Subtle frequency variations concern both frequency linearity (linear vs non-linear frequencies within the syllable) and orientation (upward vs flat vs downward frequencies across the syllable) but, to our knowledge, nothing is known about the relative importance of these frequency modulations for evaluation by listeners.
Unfamiliar interlocutors frequently engage in oral non-visual conversation, notably over the phone. Many companies use phone platforms to approach potential clients, so there is no visual signaling during these conversations. Direct oral exchanges are rated more positively than electronically written messages (Dillman et al., 2009), probably because they allow a better evaluation of the interlocutor. Reports suggest that prosody plays a role during phone interviews. Questionnaire survey results indicate that people say they would agree more easily to answer questions over the phone when the caller’s voice is not monotonous in terms of sentence intonation (Oksenberg, Coleman and Cannell, 1986; Benkí, Broome, Conrad, Groves and Kreuter, 2011).
The current study evaluated the impact of subtle frequency changes (i.e. variation of frequency linearity and orientation) in phone operators’ voices (at the syllable and word levels) on their agreeableness rating by potential consumers. First, we phoned different grocery stores in order to record the voices of various female telephone operators and then classified the recorded voices in terms of frequency modulation linearity and orientation. Second, the different voices were played back to a panel of junior (students) and senior men and women for evaluation. Third, we related the acoustic classifications to the agreeableness ratings. The impacts of the sex and the age of the listener were assessed. The sex of the listener is thought to play a role in evaluation ratings (Collins, 2000; Feinberg et al., 2005; Jones et al., 2010), but it is not known whether this is also true in a commercial context. Although reports show that adult and child listeners differ in their agreeableness evaluations (Saxton, DeBruine, Jones, Little and Roberts, 2009), we do not know whether age affects evaluations among adults.
2. Methods
2.1 Participants
The group of participants (listeners) included 30 biology and psychology students at Rennes universities (France), between 18 and 26 years old (13 men, 17 women), and 30 retired persons from various socio-professional categories, between 60 and 75 years old (12 men, 18 women). There was no age difference between males and females in either of the two age groups (Mann-Whitney tests; students: U=92.500, Z=-0.756, P=0.457 / retired persons: U=88.000, Z=0.829, P=0.415). All participants were native French speakers living in the city of Rennes and were naive to playback experiments.
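
For readers unfamiliar with the test used above, the following minimal sketch shows how a Mann-Whitney U statistic and its normal-approximation Z score can be computed. The ages are made-up placeholder values, not the study's data, and the helper name `mann_whitney_u` is an assumption of this sketch. No tie or continuity correction is applied to Z, so statistical packages may report slightly different values, and sign conventions differ between packages.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, Z) for two independent samples.

    U is the smaller of the two U statistics; Z uses the plain normal
    approximation (no tie or continuity correction).
    """
    n1, n2 = len(x), len(y)
    # Count pairs where the x value exceeds the y value; ties count 0.5.
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u

# Hypothetical ages, for illustration only.
men_ages = [19, 21, 22, 24]
women_ages = [18, 20, 23, 25, 26]
u, z = mann_whitney_u(men_ages, women_ages)
print(f"U = {u:.1f}, Z = {z:.3f}")
```

A Z value close to zero, as in this toy example, is what one expects when the two groups do not differ in age.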
2.2 Protocol
2.2.1 Voice recording of telephone operators
The voices of 30 women operators working in grocery stores in Rennes (France) were recorded during a phone conversation in February 2011. One experimenter (N. A.) phoned the reception of each store and inquired about its opening hours. The conversation was recorded directly on a PC (Dell® Latitude D600) using Audacity® (sampling rate 11 kHz, resolution 16-bit, .wav format). From these conversations, only the first greeting word pronounced by the operator was saved. As the study focused on the pronunciation of a single word, with files anonymized after recording, speaker identification was not possible. Hence, no approval was necessary. We thus collected a data set of 30 “Bonjour” ([bɔ̃ʒuʀ], meaning ‘Good morning’ in French) that was used for subsequent acoustic and playback analyses. The 30 recordings were homogenized in intensity using ANA® software (Richard, 1991) so that the acoustic stimuli were comparable.
2.2.2 Playback to participants and evaluation of the agreeableness of the voices
To avoid any bias in our interpretation, we first made sure that all the seniors passed a cognitive test (MMSE “mini mental state examination”, Folstein, Folstein and McHugh, 1975) and a test for geriatric depression (Yesavage et al., 1983). All the seniors successfully passed the screening tests. The Mini Mental State Examination (MMSE) is an 11-question measure. The maximum score is 30. A score of 23 or lower is indicative of cognitive impairment (Kurlowicz and Wallace, 1999). The students were exposed to playbacks in a quiet room of our laboratory and the seniors were tested at home. All raters reported having no hearing problems.
Each participant was informed orally and in writing that this study aimed to determine the agreeableness of a person according to their vocal features. Participants were also informed that they were going to listen to several recordings of telephone operator voices pronouncing a particular word; that each voice would be played only once; that they had to rate each voice by responding to the question “Did you find this voice agreeable?” using the following rating scale: 1 (not agreeable at all), 2 (No), 3 (neither-yes-nor-no), 4 (Yes), 5 (Yes very agreeable); and that there was no time restriction to answer. During the test, all participants sat in front of the researcher in the isolated experimental room. All participants completed all trials. We chose a five-point scale to make our results comparable with those of studies evaluating human feelings (Johnson, 1996; Kokkinos, 2007; Nagy, 2002; Trout, Magnusson and Hedges, 2000).
The 30 voice stimuli were presented to each participant in a different (random) order. Sounds were played back using a Hewlett Packard® house dv9000 computer connected to the same Sennheiser® HD 25-1 II noise-cancelling headphones. We used the "Normalizer" function of Audacity software to homogenize the amplitude of the different recorded voices. Participants informed the experimenter when they were ready to listen to the first voice, and had to rate the voice just heard before asking to listen to the next one. The experimenter controlled the playbacks.
2.2.3 Acoustic analyses
The complexity of the frequency modulation (FM) of each voice stimulus was characterized using an acoustic classification based on visual and audio analysis of spectrograms (sampling frequency 11 kHz, FFT length 1024), as commonly used in animal bioacoustics (Datta and Sturtivant, 2002; Hausberger, 1997; Lemasson and Hausberger, 2011; McCowan and Reiss, 1997): (1) FM linearity (linear vs non-linear) and (2) FM orientation (upwards vs flat vs downwards) (Fig. 1). Both syllables (‘bon’ and ‘jour’) were scored blindly (i.e. the experimenter did not know the identity of the speaker or the score given to his/her voice) and independently in terms of linearity and orientation by a second experimenter (V. A.) (see examples in Fig. 2). In addition to this experimenter (A), two other persons (B and C, both naive to the experiment, C also being naive to bioacoustics) were asked to rate all sonograms according to the classification criteria defined above. Cohen’s kappa coefficients were then calculated to measure the agreement between raters (A-B: 0.97 / A-C: 0.88). This confirms the reproducibility of our rating, as a Cohen’s kappa greater than 0.85 is typically considered to be high (Cicchetti and Heavens, 1981).
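
As an illustration of the agreement measure used above, the following sketch computes Cohen's kappa for two raters assigning categorical labels. The label sequences are hypothetical examples of orientation classes, not the study's actual scorings, and `cohens_kappa` is an illustrative helper name rather than the authors' code.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical orientation labels for ten syllables.
rater_1 = ["up", "up", "flat", "down", "up", "flat", "down", "up", "up", "flat"]
rater_2 = ["up", "up", "flat", "down", "up", "up", "down", "up", "up", "flat"]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple proportion of matching labels (0.9 in this toy example).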
2.2.4 Data crossing and statistical analyses
The 1800 ratings collected (i.e. 60 participants rated 30 voices each) included the following score totals: 66 times “1”, 252 times “2”, 776 times “4” and 154 times “5”.
A first binomial Generalized Linear Model (GLM) (agreeable vs disagreeable, i.e. scores of ‘4 or 5’ vs ‘1 or 2’, respectively) compared the agreeableness scores of participants from the different age-sex classes in terms of FM linearity and orientation patterns for each syllable separately.
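
The binary recoding described above can be sketched as follows. Ratings of 4 or 5 map to "agreeable" and ratings of 1 or 2 to "disagreeable". How neutral ratings of 3 were handled is not stated here, so dropping them is an assumption of this sketch; the function name and the sample ratings are hypothetical.

```python
def binarize_rating(score):
    """Map a 1-5 rating to 1 (agreeable), 0 (disagreeable) or None (neutral)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"rating must be an integer from 1 to 5, got {score!r}")
    if score >= 4:
        return 1
    if score <= 2:
        return 0
    return None  # neutral "3": excluded from the binary response (assumption)

# Hypothetical ratings from one listener.
ratings = [5, 3, 2, 4, 1, 3, 4]
binary = [b for b in map(binarize_rating, ratings) if b is not None]
share_agreeable = sum(binary) / len(binary)
print(binary, share_agreeable)
```

The resulting 0/1 vector is the kind of response variable a binomial GLM expects.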
Another binomial (agreeable vs disagreeable) GLM evaluated the relative importance of FM orientation and linearity for the agreeableness scores given by participants from the different age-sex classes at the word level. For instance, a word was considered flat when both syllables were flat. Words were classified as follows (with L=linear, NL=non-linear, F=flat, NF=non-flat for each syllable): non-linear and non-flat (NLNL NFNF / LNL NFNF / LNL FNF / NLNL FNF), linear and non-flat (LL NFNF / LL NFF), non-linear and flat (NLNL FF / LNL FF), linear and flat (LL FF). Analyses were run with R software, with FDR (False Discovery Rate) correction for multiple comparisons.
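
The word-level classification rule can be expressed compactly: a word counts as linear only when both syllables are linear, and as flat only when both syllables are flat. The sketch below is one illustrative reading of that rule; the function name, argument layout and string labels are assumptions made for the example, not the authors' code.

```python
def classify_word(syllable1, syllable2):
    """Classify a two-syllable word from per-syllable FM scores.

    Each syllable is a (linearity, orientation) pair with linearity in
    {"L", "NL"} and orientation in {"up", "flat", "down"}.
    """
    for linearity, orientation in (syllable1, syllable2):
        if linearity not in ("L", "NL") or orientation not in ("up", "flat", "down"):
            raise ValueError(f"unexpected syllable score: {(linearity, orientation)!r}")
    # A word is linear (or flat) only if BOTH syllables are linear (or flat).
    word_is_linear = syllable1[0] == "L" and syllable2[0] == "L"
    word_is_flat = syllable1[1] == "flat" and syllable2[1] == "flat"
    linearity_label = "linear" if word_is_linear else "non-linear"
    orientation_label = "flat" if word_is_flat else "non-flat"
    return f"{linearity_label} and {orientation_label}"

# An upward linear "bon" followed by a downward non-linear "jour".
print(classify_word(("L", "up"), ("NL", "down")))
# A flat non-linear "bon" followed by a flat linear "jour".
print(classify_word(("NL", "flat"), ("L", "flat")))
```

Under this reading, any departure from linearity or flatness in either syllable is enough to move the whole word into the non-linear or non-flat class, which matches the groupings listed above (e.g. LNL FNF is non-linear and non-flat).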
3. Results
3.1 Analyses at the syllable level
The agreeableness scores given by subjects showed that 69.70% and 30.30% of the voices heard were considered agreeable and disagreeable, respectively. However, ratings varied according to the listeners’ characteristics as well as to the acoustic pattern of frequency modulations (FM). While the listener’s sex did not influence voice appreciation scores (Table 1), age appeared to be a major factor, as the seniors found the voices in general more agreeable than did the students (Fig. 3).
Analyses of voice acoustic structures showed that both linearity and orientation of frequency modulations in the two syllables “bon” (1) and “jour” (2) influenced agreeableness scores (Table 1). Non-linear FM was preferred to linear FM, but only for syllable 1 (Fig. 4). Agreeableness scores regarding FM orientation patterns differed between the two syllables. For syllable 1, upward orientations were allocated the highest scores, whereas for syllable 2 downward orientations were preferred (Fig. 5). Among non-flat voices, 81% went upward and 19% downward on the first syllable, versus 90% upward and 10% downward on the second syllable. Significant interactions among factors were limited to sex and age with FM orientations (Table 1). Women found upward FM in syllable 2 more agreeable than men did (Fig. 6). The students found downward syllable 1 and upward syllable 2 less agreeable than did the seniors. Flat syllables 1 and 2 were also found less agreeable by younger listeners (Fig. 7).
181
3.2 Analyses at the word level
182
None of the voices heard by subjects were linear and flat, 10% were flat and
183
non-linear, 13.33% were linear and non-flat and 76.67% were non-linear and non-flat.
184
Focusing the analysis on the word level showed that variations in evaluation appeared
185
mostly based on orientation of frequency modulations: words with non-linear and non-
186
flat frequency modulations were preferred to non-linear and flat words, whereas
187
agreeableness scores of linear and non-flat words did not differ from the two other
188
classes (Table 2, Fig. 8).
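
As a quick check on these proportions, the percentages can be converted back to whole numbers of voices out of the 30 recordings. This is simple arithmetic on the figures reported above; the dictionary keys are just labels for the four word classes and the variable names are our own.

```python
N_VOICES = 30  # number of recorded "bonjour" stimuli

# Percentages of voices in each word-level class, as reported in the text.
percentages = {
    "linear and flat": 0.0,
    "non-linear and flat": 10.0,
    "linear and non-flat": 13.33,
    "non-linear and non-flat": 76.67,
}

# Convert each percentage to the nearest whole number of voices.
counts = {label: round(pct / 100 * N_VOICES) for label, pct in percentages.items()}

# Sanity check: the four classes should account for all 30 voices.
assert sum(counts.values()) == N_VOICES
print(counts)  # 0, 3, 4 and 23 voices respectively
```

The small size of some classes (3 or 4 voices) is worth keeping in mind when comparing agreeableness scores between classes.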
Figure 9 shows spectrograms of typical agreeable (syllable 1: upwards and linear, syllable 2: downwards and non-linear) and disagreeable (syllable 1: flat and non-linear, syllable 2: flat and linear) voices.
Moreover, agreeableness scores were not affected by the listener’s sex but varied with age (Table 2). The seniors found non-linear words (both flat and non-flat) more agreeable than the students did (Fig. 10).
4. Discussion
This study shows that (1) subtle and sudden (at both syllable and word levels) frequency modulations in unfamiliar human voices impact auditory evaluation; (2) the voices of phone operators are not evaluated similarly by all potential consumers; (3) the age more than the sex of the listener impacts his/her agreeableness appreciation; and (4) variations in frequency orientations more than in frequency linearity determine the agreeableness of a voice, non-monotonous voices being allocated the highest agreeableness ratings. However, we first acknowledge that our conclusions are drawn from analyses conducted on fundamental frequencies only. Hence, further investigations on resonant frequencies would be interesting. Also, we used a relatively small sample of raters and speakers. Hence, replicating this study with more varied speakers (e.g. different age-sex classes), more raters and more rating contexts is now necessary.
Our study is, to our knowledge, the first to demonstrate the differential impact of different kinds of subtle and sudden prosodic frequency modulations of human voices on auditory evaluation, particularly at the beginning of a word (first syllable). Hence, short-term prosodic changes seem as crucial as long-term changes. In line with this, some authors have shown that humans are able to categorize a voice as neutral or emotional very quickly (in less than 200 ms) (Chen and Yang, 2012; Paulmann, Schmidt, Pell and Kotz, 2008). Thus, people attribute particular emotional states to particular voice prosody (Bach et al., 2009; Latinus and Belin, 2011; Mehrabian and Ferris, 1967). For example, some authors showed that voices with large frequency variations are associated with a friendly personality, and speakers with high-pitched voices are considered to be helpful and sincere (Brown et al., 1975; Chen and Yang, 2012; Zetterholm, 1998; Weirich, 2008). However, authors underline a cultural impact on prosody evaluation. One study stresses that Koreans and Americans do not judge a given speaker in similar ways. More precisely, Americans, contrary to Koreans, associate a fast voice with a powerful and qualified person. However, when Korean participants live in the United States, their judgments converge with those of Americans: they associate a fast rhythm with competence (Peng et al., 1993).
Variations in frequency modulation orientation had a greater impact than variations in frequency modulation linearity, for each syllable as well as for the entire word. Non-flat voices were evaluated particularly positively. One hypothesis is that listeners prefer changes in orientation to changes in linearity. A second hypothesis is that changes in orientation are more easily detected than changes in linearity by the human ear. Indeed, some authors underline that some sounds (like tones) are more easily perceived by the human ear than others (like clicks) (Szymaszek, Szelag and Sliwowska, 2006). Even though linearity was not the most determining acoustic criterion, it did impact the evaluation of the first syllable of the word “Bonjour”. We therefore suppose that linearity variations are detected to some extent, but that evaluation rests more on a preference for orientation changes. Moreover, we showed that the type of orientation preferred in the first (upward) and in the second (downward) syllables differed. This should warn researchers of the difficulty of drawing firm and general conclusions from prosodic acoustic analyses conducted at a broad (e.g. sentence) level, as often found in the literature (Brown et al., 1973; Brown et al., 1974; Brown et al., 1975; Peng et al., 1993; Smith et al., 1975). We must acknowledge that our conclusions at the word level are based on a single greeting item and cannot be generalized to other words. Also, our data concern only French-speaking female phone operators and French listeners (students and seniors). This limit is mentioned by other authors concerning the risk of “overgeneralization” (Montepare and Zebrowitz-McArthur, 1987), and the important variations due to the cultural background of the subjects (Peng et al., 1993) and to the experimental context (Jones et al., 2008). Finally, future studies may want to go beyond the simple analysis of the fundamental frequency pattern and notably investigate variations in the distribution of formants.
Here the effect of listener sex on auditory evaluation was very limited, although most studies currently underline its importance, probably because they focus on the characteristics allowing people to attribute masculine and feminine features to speakers in a “sexual” attraction context (Collins, 2000; Feinberg et al., 2005; Feinberg et al., 2006; Jones et al., 2008; Jones et al., 2010; Little et al., 2010; Puts, 2005; Vukovic et al., 2008). However, these authors found differences in women’s preferences for masculine men’s voices according to their estrogen cycle, to the listening context (short-term versus long-term mating contexts), and to their self-rated attractiveness (Feinberg et al., 2006; Puts, 2005; Vukovic et al., 2008). As our study in a “commercial” context could not demonstrate strong differences between men and women, the impact of prosody possibly varies according to the context of the conversation. It seems that, independently of context, men and women do not detect emotions in voices similarly. Men’s processing of emotional prosody is slower than women’s (Besson et al., 2002). Moreover, the brain organization for processing prosody differs between the sexes (Imaizumi, Homma, Ozawa, Maruishi and Muranaka, 2004; Rymarczyk and Grabowska, 2007). These differences in the perception and processing of human voices can explain the fact that we found subtle differences in the agreeableness scores of the participants according to their gender. In particular, we found that women allocated higher scores to upward frequency modulations than men did.
Unlike sex, the age of listeners seems to be an important factor for voice appreciation. Indeed, older people allocated higher scores to some voices than did younger people. Two hypotheses could explain this. First, because of an age-dependent auditory sensitivity, seniors could be less sensitive than juniors to acoustic changes or to the associated emotions. Several authors have reported that detection and perception of prosody vary with age. Older people are less efficient and accurate in the detection of emotionally associated prosodic changes in a voice (Mill, Alink, Realo and Valk, 2009; Mitchell, 2007). The second hypothesis predicts an age-dependent acoustic preference. Some authors have shown that the positive or the negative valence assigned to voices varies with age (Fecteau, Armony, Joanette and Belin, 2005). This is supported by our data showing that seniors did not systematically give higher ratings to all the voices, or to all the modulated voices, but only to some voices.
5. Conclusion
To summarize, this study is the first to our knowledge to demonstrate the impact of human voice prosody on a listener in a commercial context, where it is particularly important to use an agreeable voice, and it raises several fundamental and applied perspectives. Moreover, whereas studies generally focus on long-term frequency changes, our results underline the importance of focusing on subtle variations that have been neglected until now.
Acknowledgments
The idea of this study emerged from scientific discussions among members of the interdisciplinary research network GIS “Cerveau, Comportement, Société”. This study received financial support from the French Ministry of Research and the C.N.R.S. We thank the receptionists and all the students and seniors for their participation in this study. We are grateful to Ann Cloarec for correcting the English.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65
312
Figure captions
Fig. 1: Frequency modulation patterns used to classify syllables.
Fig. 2: Spectrograms of frequency modulation patterns of the second syllable “jour”: (a) non-linear downwards, (b) non-linear upwards, (c) linear flat.
Fig. 3: Voice agreeableness scores given by the students and the seniors (1: agreeable, 0: disagreeable). Binomial GLM test ** P