Interaction Studies

Interaction Studies
Voice features of telephone operators predict auditory preferences of consumers.
--Manuscript Draft--

Manuscript Number:

IS-D-15-00008R3

Full Title:

Voice features of telephone operators predict auditory preferences of consumers.

Short Title:

Voice features of operator predict auditory preferences

Article Type:

Original Article

First Author:

Vanessa André

Other Authors:

Christine Petr Nicolas André Martine Hausberger Alban Lemasson

Corresponding Author:

Vanessa André
Rennes, Brittany, FRANCE

Funding Information:

Section/Category:

Social behaviour & interaction

Keywords:

acoustic structure, human, phone interaction, prosody, voice perception

Manuscript Classifications:

10.03.01: Animal social behaviour; 20.01: Experimental; 10.05.06: Psycholinguistics

Abstract:

What makes a human voice agreeable is a matter of scientific discussion. Whereas prosody has been shown to play a role in "male-female" attraction, the impact of frequency modulations in "non-sexual", notably commercial, contexts has attracted little attention. Another point unaddressed in the literature is auditory sensitivity to short-term frequency modulations, as current studies focus more on the sentence level. Thirty French female operators were recorded over the phone. All "bonjour" greeting words were classified in terms of frequency modulation linearity and orientation at the syllable and word levels. The different voices were then played back to students and seniors, who rated each voice according to its degree of agreeableness. Listeners preferred non-monotonous voices. Differences between age classes were greater than between sex classes. Results suggest that short-term frequency changes are important for auditory evaluation of voice agreeableness. This study opens new research perspectives concerning the importance of prosody during consumer-seller interactions.

Author Comments:

Order of Authors Secondary Information:


Voice features of telephone operators predict auditory preferences of consumers.

André, Vanessa (a); Petr, Christine (b); André, Nicolas (a); Hausberger, Martine (a); Lemasson, Alban (c)

(a) C.N.R.S., Ethologie animale et humaine, U.M.R. 6552 - Université de Rennes 1, Campus de Beaulieu, 263 Avenue du Général Leclerc, Bâtiment 25, 35042 Rennes, France

(b) Institut d’études politiques, Université de Rennes 1, 104 bd de la Duchesse Anne, 35700 Rennes, France

(c) Université de Rennes 1, Ethologie animale et humaine, UMR6552 - C.N.R.S., Station Biologique, 35380, Paimpont, France

What makes a human voice agreeable is a matter of scientific discussion. Whereas prosody has been shown to play a role in “male-female” attraction, the impact of frequency modulations in “non-sexual”, notably commercial, contexts has attracted little attention. Another point unaddressed in the literature is auditory sensitivity to short-term frequency modulations, as current studies focus more on the sentence level. Thirty French female operators were recorded over the phone. All “bonjour” greeting words were classified in terms of frequency modulation linearity and orientation at the syllable and word levels. The different voices were then played back to students and seniors, who rated each voice according to its degree of agreeableness. Listeners preferred non-monotonous voices. Differences between age classes were greater than between sex classes. Results suggest that short-term frequency changes are important for auditory evaluation of voice agreeableness. This study opens new research perspectives concerning the importance of prosody during consumer-seller interactions.

Key words: acoustic structure, human, phone interaction, prosody, voice perception

Corresponding author: André Vanessa
[email protected]
UMR 6552, Bâtiment 25, Campus de Beaulieu, 263 Avenue Général Leclerc, 35042 Rennes, France
+33670856489


1. Introduction

Regardless of the syntactic and semantic content of speech, the phonetic organization of voices is commonly used by conversing interlocutors to assess each other’s identity, personality, arousal state and motivation (Bruckert, Lienard, Lacroix, Kreutzer and Leboucher, 2006; Collins, 2000; Feinberg, Jones, Little, Burt and Perrett, 2005; Jones, Feinberg, DeBruine, Little and Vukovic, 2010; Scherer, 1972; Scherer, 1978; Smith, Brown, Strong and Rencher, 1975). Thus, the interpretation of a word by a receiver is based first on facial expressions (55%), second on voice features (38%), and only third on lexical content (7%) (Mehrabian and Ferris, 1967), suggesting that in the absence of any visual cues, prosody plays a role in auditory perception. However, most studies have focused on auditory sexual attraction and evaluation between women and men (Bruckert et al., 2006; Collins, 2000; Jones, Feinberg, DeBruine, Little and Vukovic, 2008; Jones et al., 2010; Klofstad, Anderson and Peters, 2012; Re, O’Connor, Bennett and Feinberg, 2012; Simmons, Peters and Rhodes, 2011). Interestingly, a few studies have also investigated vocal attractiveness in other social contexts (e.g. Bruckert et al., 2010). Nevertheless, little is known about voice agreeableness and its impact on social interactions, notably in contexts where voices play a crucial role, as in commercial interactions over the phone.

A key acoustic feature playing a general role in auditory perception and evaluation is the pitch of a voice. Auditory recognition of gender and age is frequency-dependent; the voices of men and seniors are lower-pitched than those of women and juniors, respectively (Bruckert et al., 2006; Latinus and Belin, 2011). Regardless of the age and the gender of both receivers and emitters, people with low-pitched voices are perceived as more dominant than people with high-pitched voices (Jones et al., 2010; Klofstad et al., 2012). Conversely, men judge women with high-pitched voices to be more feminine than women with low-pitched voices (Collins and Missing, 2003; Jones et al., 2008). Similarly, men with low-pitched voices are considered to be more masculine, more corpulent (larger and taller) and more attractive to women than men with high-pitched voices (Collins, 2000; Feinberg et al., 2005; Jones et al., 2010). Higher pitches are associated with anger, joy and anxiety, whereas lower pitches are associated with sadness and indifference (Zetterholm, 1998).

Voice variations in both temporal and frequency domains appear even more crucial for evaluation than pitch (Besson, Magne and Schon, 2002; Latinus and Belin, 2011). Some people with clinical disorders, such as schizophrenia or depression, are unable to detect these so-called prosodic variations in their interlocutors’ voices and thus have difficulties holding a proper conversation (Alpert, Pouget and Silva, 2001; Bach, Buxtorf, Grandjean and Strik, 2009; Péron et al., 2011). Among the different prosodic features, voice rhythm is of prime importance. For example, reasonably fast-speaking people are considered more competent and more persuasive than slow-speaking people in Western cultures (Peng, Zebrowitz and Lee, 1993). However, speakers with very low or very high rhythms are considered less benevolent than speakers with intermediate rhythms (Brown, Strong and Rencher, 1973; Brown, Strong and Rencher, 1974; Brown, Strong and Rencher, 1975; Smith et al., 1975). Frequency variations can also play a crucial role. In general, less monotonous voices (i.e. with frequent frequency changes, for instance while reading a speech) are associated with more positive personalities (Zukerman and Miyake, 1993). However, most past studies focus on rhythm variations or frequency changes at the sentence or the word level (Brown et al., 1973; Brown et al., 1974; Brown et al., 1975; Peng et al., 1993; McAleer et al., 2014; Smith et al., 1975), and comparatively little is known concerning the impact of subtle frequency variations such as sudden within-word changes (at the syllable level). Subtle frequency variations concern both frequency linearity (linear vs non-linear frequencies within the syllable) and orientation (upward vs flat vs downward frequencies across the syllable), but again, to our knowledge, nothing is known concerning the relative importance of these frequency modulations for evaluation by listeners.

Unfamiliar interlocutors are frequently engaged in oral non-visual conversation, notably over the phone. Many companies use phone platforms to approach potential clients, so there is no visual signaling during conversations. Direct oral exchanges are rated more positively than electronically written messages (Dillma et al., 2009), probably because they allow a better evaluation of the interlocutor. Reports suggest that prosody plays a role during phone interviews. Questionnaire survey results indicate that people say they would agree more readily to answer questions over the phone when the caller’s voice is not monotonous in terms of sentence intonation (Oksenberg, Coleman and Cannell, 1986; Benkí, Broome, Conrad, Groves and Kreuter, 2011).

The current study evaluated the impact of subtle frequency changes (i.e. variation of frequency linearity and orientation) in phone operators’ voices (at the syllable and word levels) on their agreeableness rating by potential consumers. First, we phoned different grocery stores in order to record the voices of various female telephone operators and then classified the recorded voices in terms of frequency modulation linearity and orientation. Second, the different voices were played back to a panel of junior (students) and senior men and women for evaluation. Third, we cross-referenced acoustic and agreeableness data. The impacts of the sex and the age of the listener were assessed. The sex of the listener is thought to play a role in evaluation ratings (Collins, 2000; Feinberg et al., 2005; Jones et al., 2010), but it is not known whether this is also true in a commercial context. Although reports show that adult and child listeners differ in their agreeableness evaluations (Saxton, DeBruine, Jones, Little and Roberts, 2009), we do not know whether age affects adults’ evaluations.

2. Methods

2.1 Participants

The group of participants (listeners) included 30 biology and psychology students at Rennes universities (France), between 18 and 26 years old (13 men, 17 women), and 30 retired persons from various socio-professional categories, between 60 and 75 years old (12 men, 18 women). There was no age difference between males and females in either of the two age groups (Mann-Whitney tests; students: U=92.500, Z=-0.756, P=0.457; retired persons: U=88.000, Z=0.829, P=0.415). All participants were native French speakers living in the city of Rennes, and were naive to playback experiments.

2.2 Protocol

2.2.1 Voice recording of telephone operators

The voices of 30 women operators working in grocery stores in Rennes (France) were recorded during a phone conversation in February 2011. One experimenter (N. A.) phoned the reception of the store and inquired about their opening schedule. The conversation was recorded directly on a PC (Dell® Latitude D600) using Audacity® (sampling rate 11 kHz, resolution 16 bit, .wav format). From these conversations, only the first greeting word pronounced by the operator was saved. As the study focused on the pronunciation of a single word with post-recording anonymized files, speaker identification was not possible. Hence, no approval was necessary. We thus collected a data set of 30 “Bonjour” ([bɔ̃ʒuʀ], meaning ‘Good morning’ in French) that was used for subsequent acoustic and playback analyses. The 30 recordings were homogenized in intensity using ANA® software (Richard, 1991) so that the acoustic stimuli were comparable.

2.2.2 Playback to participants and evaluation of the agreeableness of the voices

To avoid any bias in our interpretation, we first made sure that all the seniors passed a cognitive test (MMSE, “Mini Mental State Examination”; Folstein, Folstein and McHugh, 1975) and a test for geriatric depression (Yesavage et al., 1983). The MMSE is an 11-question measure with a maximum score of 30; a score of 23 or lower is indicative of cognitive impairment (Kurlowicz and Wallace, 1999). All the seniors successfully passed the screening tests. The students were exposed to playbacks in a quiet room of our laboratory and the seniors were tested at home. All raters reported having no hearing problems. Each participant was informed orally and in writing that this study aimed to determine the agreeableness of a person according to their vocal features. Participants were also informed that they were going to listen to several recordings of telephone operators’ voices pronouncing a particular word; that each voice would be broadcast only once; that they had to rate each voice by responding to the question “Did you find this voice agreeable?” using the following rating scale: 1 (not agreeable at all), 2 (No), 3 (neither-yes-nor-no), 4 (Yes), 5 (Yes very agreeable); and that there was no time restriction to answer. During the test, all the participants sat in front of the researcher in the isolated experimental room. All participants completed all trials. We chose a five-point scale to make it comparable with the scales used in comparative studies on the evaluation of human feelings (Johnson, 1996; Kokkinos, 2007; Nagy, 2002; Trout, Magnusson and Hedges, 2000). The 30 voice stimuli were presented to each participant in a different (random) order. Sounds were played back using a Hewlett Packard® Pavilion dv9000 computer connected to the same Sennheiser® HD 25-1 II noise-cancelling headphones. The amplitude of the different recorded voices was homogenized using the "Normalizer" function of Audacity software. Participants informed the experimenter when they were ready to listen to the first voice, and had to rate the voice just heard before asking to listen to the next one. The experimenter controlled the playbacks.

2.2.3 Acoustic analyses

The complexity of the frequency modulation (FM) of each voice stimulus was qualified using an acoustic classification, based on visual and audio analysis of spectrograms (sampling frequency 11 kHz, FFT length 1024), commonly used in animal bioacoustics (Datta and Sturtivant, 2002; Hausberger, 1997; Lemasson and Hausberger, 2011; McCowan and Reiss, 1997): (1) FM linearity (linear vs non-linear), (2) FM orientation (upwards vs flat vs downwards) (Fig. 1). Hence, both syllables (‘bon’ vs ‘jour’) were blindly (i.e. the experimenter did not know the identity of the speaker or the score given to the voice) and independently scored in terms of linearity and orientation by a second experimenter (V. A.) (see examples in Fig. 2). In addition to this latter experimenter (A), two other persons (B and C, both naive to the experiment, the second also naive to bioacoustics) were asked to rate all sonograms according to the classification criteria defined above. Cohen’s kappa coefficients were then calculated to measure the agreement between raters (A-B: 0.97; A-C: 0.88). This confirms the reproducibility of our rating, as a Cohen’s kappa greater than 0.85 is typically considered high (Cicchetti and Heavens, 1981).
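The inter-rater agreement measure used here can be illustrated with a minimal sketch of Cohen's kappa for two raters, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the example labels are hypothetical; they are not the study's data, and the authors do not state which software they used for this step.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)

# Hypothetical FM orientation labels for six syllables (illustration only).
rater_a = ["up", "up", "flat", "down", "up", "flat"]
rater_b = ["up", "up", "flat", "down", "flat", "flat"]
kappa = cohen_kappa(rater_a, rater_b)
```

With these made-up labels the raters agree on five of six items, and the kappa value is lower than the raw agreement because part of that agreement is expected by chance.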

2.2.4 Data crossing and statistical analyses

Of the 1800 ratings collected (i.e. 60 participants rated 30 voices each), we obtained the following score totals: 66 times “1”, 252 times “2”, 776 times “4” and 154 times “5”.

A first Binomial (agreeable vs disagreeable: when subjects gave a score equal to respectively ‘4 or 5’ vs ‘1 or 2’) Generalized Linear Model (GLM) compared the agreeableness scores of participants from the different age-sex classes in terms of FM linearity and orientation patterns for each syllable separately.
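The recoding of the five-point scale into the binomial response can be sketched as follows. This is an illustration only: the helper name `binarize_score` is hypothetical, and setting aside the neutral score of 3 is an assumption inferred from the agreeable (‘4 or 5’) versus disagreeable (‘1 or 2’) definition. The models themselves were run in R.

```python
from typing import Optional

def binarize_score(score: int) -> Optional[int]:
    """Recode a 1-5 agreeableness score: 1 = agreeable, 0 = disagreeable.

    Returns None for the neutral score 3 (assumed here to be set aside).
    """
    if score not in (1, 2, 3, 4, 5):
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score >= 4:
        return 1
    if score <= 2:
        return 0
    return None

# Hypothetical ratings from one listener (illustration only).
ratings = [5, 3, 2, 4, 1, 3]
binary = [b for b in map(binarize_score, ratings) if b is not None]
```

The resulting 0/1 vector is the kind of response a binomial GLM expects, with listener age, listener sex and the FM categories as predictors.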

Another Binomial (agreeable vs disagreeable) GLM evaluated the relative importance of FM orientation and linearity on the agreeableness scores given by participants from the different age-sex classes at the word level. For instance, a word was considered flat when both syllables were flat. Words were classified as follows (with L=linear, NL=non-linear, F=flat, NF=non-flat for each syllable): non-linear and non-flat (NLNL NFNF / LNL NFNF / LNL FNF / NLNL FNF), linear and non-flat (LL NFNF / LL NFF), non-linear and flat (NLNL FF / LNL FF), linear and flat (LL FF). Analyses were run with R software with FDR (False Discovery Rate) correction for multiple comparisons.
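The word-level classes listed above follow from two rules: a word is linear only when both syllables are linear, and flat only when both syllables are flat. A minimal sketch of that mapping is given below; the function name `classify_word` and the string codes are illustrative choices, not the authors' code.

```python
def classify_word(linearity, orientation):
    """Combine per-syllable codes into a word-level class.

    linearity:   pair of "L" (linear) or "NL" (non-linear), one per syllable
    orientation: pair of "F" (flat) or "NF" (non-flat), one per syllable
    """
    if len(linearity) != 2 or len(orientation) != 2:
        raise ValueError("expected one code per syllable (two syllables)")
    if not all(code in ("L", "NL") for code in linearity):
        raise ValueError("linearity codes must be 'L' or 'NL'")
    if not all(code in ("F", "NF") for code in orientation):
        raise ValueError("orientation codes must be 'F' or 'NF'")
    # A word is linear (or flat) only if both of its syllables are.
    word_linearity = "linear" if all(c == "L" for c in linearity) else "non-linear"
    word_orientation = "flat" if all(c == "F" for c in orientation) else "non-flat"
    return f"{word_linearity} and {word_orientation}"

# The "LNL FNF" pattern from the list above.
word_class = classify_word(("L", "NL"), ("F", "NF"))
```

Because the rules are symmetric, the order of the two syllables does not change the word-level class in this sketch.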

3. Results

3.1 Analyses at the syllable level

The agreeableness scores given by subjects showed that 69.70% and 30.30% of the voices heard were considered agreeable and disagreeable respectively. However, appreciations varied according to the listeners’ characteristics as well as to the acoustic pattern of frequency modulations (FM). While the listener’s sex did not influence voice appreciation scores (Table 1), age appeared to be a major factor, as the seniors found the voices in general more agreeable than did the students (Fig. 3).

Analyses of voice acoustic structures showed that both linearity and orientation of frequency modulations in the two syllables “bon” (1) and “jour” (2) were influential (Table 1). Non-linear FM was preferred to linear FM, but only for syllable 1 (Fig. 4). Agreeableness scores regarding FM orientation patterns differed between the two syllables. Syllables 1 with an upward orientation were allocated the highest scores, whereas syllables 2 with a downward orientation were preferred (Fig. 5). Among non-flat voices, 81% went upward and 19% went downward on the first syllable, whereas 90% went upward and 10% went downward on the second syllable. Significant interactions among factors were limited to sex and age with FM orientations (Table 1). Women found upward FM in syllable 2 more agreeable than men did (Fig. 6). The students found downward syllable 1 and upward syllable 2 less agreeable than did the seniors. Flat syllables 1 and 2 were also found less agreeable by younger listeners (Fig. 7).

3.2 Analyses at the word level

None of the voices heard by subjects were linear and flat, 10% were flat and non-linear, 13.33% were linear and non-flat and 76.67% were non-linear and non-flat. Focusing the analysis on the word level showed that variations in evaluation appeared mostly based on the orientation of frequency modulations: words with non-linear and non-flat frequency modulations were preferred to non-linear and flat words, whereas agreeableness scores of linear and non-flat words did not differ from the two other classes (Table 2, Fig. 8).

Figure 9 shows spectrograms of typical agreeable (syllable 1: upwards and linear, syllable 2: downwards and non-linear) and disagreeable (syllable 1: flat and non-linear, syllable 2: flat and linear) voices.

Moreover, whereas sex did not affect agreeableness scores, these scores varied with age (Table 2). The seniors found non-linear words (both flat and non-flat) more agreeable than the students did (Fig. 10).

4. Discussion

This study confirms that (1) subtle and sudden (at both syllable and word levels) frequency modulations in unfamiliar human voices impact auditory evaluation; (2) the voices of phone operators are not evaluated similarly by all potential consumers; (3) the age more than the sex of the listener impacts his/her agreeableness appreciation; and (4) variations in frequency orientation more than in frequency linearity determine the agreeableness of a voice, non-monotonous voices being allocated the highest agreeableness ratings. However, we first acknowledge that our conclusions are drawn from analyses conducted on fundamental frequencies only. Hence, further investigations on resonant frequencies would be interesting. Also, we used relatively small samples of raters and speakers. Hence, replicating this study with more varied speakers (e.g. different age-sex classes), more raters and more rating contexts is now necessary.

Our study is, to our knowledge, the first to demonstrate the differential impact of different kinds of subtle and sudden prosodic frequency modulations of human voices on auditory evaluation, particularly at the beginning of a word (first syllable). Hence, short-term prosodic changes seem as crucial as long-term changes. In line with this, some authors have shown that humans are able to categorize a voice as neutral or emotional very quickly (in less than 200 ms) (Chen and Yang, 2012; Paulmann, Schmidt, Pell and Kotz, 2008). Thus, people attribute particular emotional states to particular voice prosody (Bach et al., 2009; Latinus and Belin, 2011; Mehrabian and Ferris, 1967). For example, some authors showed that voices with large frequency variations are associated with a friendly personality, and speakers with high-pitched voices are considered to be helpful and sincere (Brown et al., 1975; Chen and Yang, 2012; Zetterholm, 1998; Weirich, 2008). However, authors underline a cultural impact on prosody evaluation. One study stresses that Koreans and Americans do not judge a given speaker in similar ways. More precisely, Americans, contrary to Koreans, associate a fast voice with a powerful and qualified person. However, when Korean participants live in the United States, their judgments converge with those of Americans: they associate a fast rhythm with competence (Peng et al., 1993).

Variations in frequency modulation orientation have more impact than frequency modulation linearity, for each syllable as well as for an entire word. Non-flat voices are particularly positively evaluated. A first hypothesis predicts that changes in orientation are preferred to changes in linearity. A second hypothesis predicts that changes in orientation are more easily detectable by the human ear than changes in linearity. Indeed, some authors underline that some sounds (like tones) are more easily perceived by the human ear than others (like clicks) (Szymaszek, Szelag and Sliwowska, 2006). Even though linearity was not the most determining acoustic criterion, it did impact the evaluation of the first syllable of the word “Bonjour”. Thus we suppose that linearity variations are detected to some extent, and that evaluation is based more on a higher agreeableness score for orientation changes. Moreover, we showed that the type of orientation preferred in the first (upward) and in the second (downward) syllables differed. This should warn researchers of the difficulty of drawing firm and general conclusions based on prosodic acoustic analyses conducted at a broad (e.g. sentence) level, as often found in the literature (Brown et al., 1973; Brown et al., 1974; Brown et al., 1975; Peng et al., 1993; Smith et al., 1975). We must acknowledge that our conclusions at the word level are based on a single greeting item and cannot be generalized to other words. Also, our data concern only French-speaking female phone operators and French student and senior listeners. This limit is mentioned by other authors concerning the risk of “overgeneralization” (Montepare and Zebrowitz-McArthur, 1987), and the important variations due to the cultural background of the subjects (Peng et al., 1993) and to the experimental context (Jones et al., 2008). Finally, future studies may want to go beyond the simple analysis of the fundamental frequency pattern and notably investigate variations in the distribution of formants.

Here the effect of listener sex on auditory evaluation was very limited, although most studies currently underline its importance, probably because they focus on the characteristics allowing people to attribute masculine and feminine features to speakers in a “sexual” attraction context (Collins, 2000; Feinberg et al., 2005; Feinberg et al., 2006; Jones et al., 2008; Jones et al., 2010; Little et al., 2010; Puts, 2005; Vukovic et al., 2008). However, these authors found differences in women’s preferences for masculine men’s voices according to their estrogen cycle, to the listening context (short-term versus long-term mating contexts), and to their self-rated attractiveness (Feinberg et al., 2006; Puts, 2005; Vukovic et al., 2008). As our study in a “commercial” context could not evidence strong differences between men and women, the impact of prosody possibly varies according to the context of the conversation. It seems that, independently of context, men and women do not detect emotions in voices similarly. Men’s processing of emotional prosody is slower than women’s (Besson et al., 2002). Moreover, the brain organization for processing prosody differs between sexes (Imaizumi, Homma, Ozawa, Maruishi and Muranaka, 2004; Rymarczyk and Grabowska, 2007). These differences in the perception and processing of human voices can explain the fact that we found subtle differences in the agreeableness scores of the participants according to their gender. In particular, we found that women allocated higher scores to upward frequency modulations than men did.

In contrast to sex, the age of listeners seems to be an important factor for voice appreciation. Indeed, older people allocated higher scores to some voices than did younger people. Two hypotheses could explain this. First, because of an age-dependent auditory sensitivity, seniors could be less sensitive than juniors to acoustic changes or to the associated emotions. Several authors have reported that detection and perception of prosody vary with age. Older people are less efficient and accurate in the detection of emotionally associated prosodic changes in a voice (Mill, Alink, Realo and Valk, 2009; Mitchell, 2007). The second hypothesis predicts an age-dependent acoustic preference. Some authors have shown that the positive or negative valence assigned to voices varies with age (Fecteau, Armony, Joanette and Belin, 2005). This is supported by our data showing that seniors did not systematically give higher ratings to all the voices, nor to all the modulated voices, but only to some voices.

5. Conclusion

To summarize, this study is the first to our knowledge to demonstrate the impact of human voice prosody on a listener in a commercial context, where it is particularly important to use an agreeable voice, and it raises several fundamental and applied perspectives. Moreover, whereas studies generally focus on long-term frequency changes, our results underline the importance of focusing on subtle variations that have been neglected until now.

Acknowledgments

The idea of this study emerged from scientific discussions among members of the interdisciplinary research network GIS “Cerveau, Comportement, Société”. This study received financial support from the French Ministry of Research and the C.N.R.S. We thank the receptionists and all the students and seniors for their participation in this study. We are grateful to Ann Cloarec for correcting the English.


REFERENCES

313

Alpert, Murray, Pouget, Enrique R, and Silva, Raul R, 2001. Reflections of depression

314

in acoustic measures of the patients speech. Journal of Affective Disorders, 66, 59–69.

Bach, DR, Buxtorf, K, Grandjean, D, and Strik, WK, 2009. The influence of emotion clarity on emotional prosody identification in paranoid schizophrenia. Psychological Medicine, 39, 927–938.

Benkí, José, Broome, Jessica, Conrad, Frederick, Groves, Robert, and Kreuter, Frauke, 2011. Effects of speech rate, pitch, and pausing on survey participation decisions. Proc. Section on Survey Research Methods, American Statistical Association.

Besson, Mireille, Magne, Cyrille, and Schon, Daniele, 2002. Emotional prosody: sex differences in sensitivity to speech melody. Trends in Cognitive Sciences, 6, 405–407.

Brown, Bruce L, Strong, William J, and Rencher, Alvin C, 1973. Perceptions of personality from speech: effects of manipulations of acoustical parameters. Journal of the Acoustical Society of America, 54, 29–35.

Brown, Bruce L, Strong, William J, and Rencher, Alvin C, 1974. Fifty four voices from two: the effects of simultaneous manipulations of rate, mean fundamental frequency, and variance of fundamental frequency on ratings of personality from speech. Journal of the Acoustical Society of America, 55, 313–318.

Brown, Bruce L, Strong, William J, and Rencher, Alvin C, 1975. Acoustic determinants of perceptions of personality from speech. International Journal of the Sociology of Language, 6, 11–32.

Bruckert, Laetitia, Bestelmeyer, Patricia, Latinus, Marianne, Rouger, Julien, Charest, Ian, Rousselet, Guillaume A, Kawahara, Hideki, and Belin, Pascal, 2010. Vocal attractiveness increases by averaging. Current Biology, 20.2, 116–120.

Bruckert, Laetitia, Lienard, Jean-Sylvain, Lacroix, André, Kreutzer, Michel, and Leboucher, Gérard, 2006. Women use voice parameters to assess men's characteristics. Proc. Royal Society B: Biological Sciences, 273, 83–89.

Chen, Xuhai, and Yang, Yufang, 2012. When brain differentiates happy from neutral in prosody? 6th Int. Conference on Speech Prosody.

Cicchetti, Domenic V, and Heavens, Robert, 1981. A computer program for determining the significance of the difference between pairs of independently derived values of kappa or weighted kappa. Educational and Psychological Measurement, 41.1, 189–193.

Collins, Sarah A, 2000. Men’s voices and women’s choices. Animal Behaviour, 60, 773–780.

Collins, Sarah A, and Missing, Caroline, 2003. Vocal and visual attractiveness are related in women. Animal Behaviour, 65, 997–1004.

Datta, S, and Sturtivant, C, 2002. Dolphin whistle classification for determining group identities. Signal Processing, 82, 251–258.

Dillman, Don A, Phelps, Glen, Tortora, Robert, Swift, Karen, Kohrell, Julie, Berck, Jodi, and Messer, Benjamin L, 2009. Response rate and measurement differences in mixed-mode surveys using mail, telephone, interactive voice response (IVR) and the internet. Social Science Research, 38, 1–18.

Fecteau, Shirley, Armony, Jorge L, Joanette, Yves, and Belin, Pascal, 2005. Judgment of emotional nonlinguistic vocalizations: age-related differences. Applied Neuropsychology, 12, 40–48.

Feinberg, David R, Jones, Benedict C, Little, Anthony C, Burt, DM, and Perrett, DI, 2005. Manipulations of fundamental and formant frequencies influence the attractiveness of human male voices. Animal Behaviour, 69, 561–568.

Feinberg, David R, Jones, Benedict C, Law Smith, MJ, Moore, FR, DeBruine, Lisa M, Cornwell, RE, Hillier, SG, and Perrett, DI, 2006. Menstrual cycle, trait estrogen level, and masculinity preferences in the human voice. Hormones and Behavior, 49, 215–222.

Folstein, Marshal F, Folstein, Susan E, and McHugh, Paul R, 1975. Mini-mental state. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.

Hausberger, Martine, 1997. Social influences on song acquisition and sharing in the European starling (Sturnus vulgaris). In Social influences on vocal development (Cambridge University Press, Cambridge), pp. 128–156.

Imaizumi, Satoshi, Homma, Midori, Ozawa, Yoshiaki, Maruishi, Masahura, and Muranaka, Hiroyuki, 2004. Gender differences in the functional organization of the brain for emotional prosody processing. Int. Conference on Speech Prosody, pp. 23–26.

Johnson, Jeff W, 1996. Linking employee perceptions of service climate to customer satisfaction. Personnel Psychology, 49(4), 831–851.

Jones, Benedict C, Feinberg, David R, DeBruine, Lisa M, Little, Anthony C, and Vukovic, Jovana, 2008. Integrating cues of social interest and voice pitch in men’s preferences for women’s voices. Biology Letters, 4, 192–194.

Jones, Benedict C, Feinberg, David R, DeBruine, Lisa M, Little, Anthony C, and Vukovic, Jovana, 2010. A domain-specific opposite-sex bias in human preferences for manipulated voice pitch. Animal Behaviour, 79, 57–62.

Klofstad, Casey, Anderson, Rindy, and Peters, Susan, 2012. Sounds like a winner: voice pitch influences perception of leadership capacity. Proc. Royal Society B: Biological Sciences, 279, 2698–2704.

Kokkinos, Constantinos M, 2007. Job stressors, personality and burnout in primary school teachers. British Journal of Educational Psychology, 77(1), 229–243.

Kurlowicz, Lenore, and Wallace, Meredith, 1999. The mini-mental state examination (MMSE). Journal of Gerontological Nursing, 25(5), 8–9.

Latinus, Marianne, and Belin, Pascal, 2011. Human voice perception. Current Biology, 21, R143–R145.

Lemasson, Alban, and Hausberger, Martine, 2011. Acoustic variability and social significance of calls in female Campbell’s monkeys (Cercopithecus campbelli campbelli). Journal of the Acoustical Society of America, 129, 3341–3352.

Little, Anthony C, Saxton, Tamsin K, Roberts, Craig S, Jones, Benedict C, DeBruine, Lisa M, Vukovic, Jovana, Perrett, David I, Feinberg, David R, and Chenore, Todd, 2010. Women’s preferences for masculinity in male faces are highest during reproductive age range and lower around puberty and post-menopause. Psychoneuroendocrinology, 35, 912–920.

McAleer, Phil, Todorov, Alexander, and Belin, Pascal, 2014. How do you say ‘Hello’? Personality impressions from brief novel voices. PLoS ONE, 9.3, e90779.

McCowan, Brenda, and Reiss, Diana, 1997. Vocal learning in captive bottlenose dolphins: a comparison with humans and nonhuman animals. In Social influences on vocal development (Cambridge University Press, Cambridge), pp. 178–207.

Mehrabian, Albert, and Ferris, Susan R, 1967. Inference of attitudes from nonverbal communication in two channels. Journal of Consulting Psychology, 31, 248–252.

Mill, Aire, Allik, Jüri, Realo, Anu, and Valk, Raivo, 2009. Age-related differences in emotion recognition ability: a cross-sectional study. Emotion, 9, 619–630.

Mitchell, Rachel LC, 2007. Age-related decline in the ability to decode emotional prosody: primary or secondary phenomenon? Cognition and Emotion, 21(7), 1435–1454.

Montepare, Joann M, and Zebrowitz-McArthur, Leslie, 1987. Perceptions of adults with childlike voices in two cultures. Journal of Experimental Social Psychology, 23, 331–349.

Nagy, Mark S, 2002. Using a single-item approach to measure facet job satisfaction. Journal of Occupational and Organizational Psychology, 75(1), 77–86.

Oksenberg, Lois, Coleman, Lerita, and Cannell, Charles F, 1986. Interviewer voices and refusal rates in telephone surveys. Public Opinion Quarterly, 50, 97–111.

Paulmann, Silke, Schmidt, Patricia, Pell, Marc D, and Kotz, Sonja A, 2008. Rapid processing of emotional and voice information as evidenced by ERPs. 4th Int. Conference on Speech Prosody, pp. 205–209.

Peng, Ying, Zebrowitz, Leslie A, and Lee, Hoon Koo, 1993. The impact of cultural background and cross-cultural experience on impressions of American and Korean male speakers. Journal of Cross-Cultural Psychology, 24, 203–220.

Peron, Julie, El Tamer, Sarah, Grandjean, Didier, Leray, Emmanuelle, Travers, David, Drapier, Dominique, Verin, Marc, and Millet, Bruno, 2011. Major depressive disorder skews the recognition of emotional prosody. Progress in Neuro-Psychopharmacology and Biological Psychiatry, 35, 987–996.

Puts, David A, 2005. Menstrual phase and mating context affect women’s preferences for male voice pitch. Evolution and Human Behavior, 26, 388–397.

Re, Daniel E, O’Connor, Jillian JM, Bennett, Patrick J, and Feinberg, David R, 2012. Preferences for very low and very high voice pitch in humans. PLoS ONE, 7, e32719.

Richard, Jean-Pierre, 1991. Sound analysis and synthesis using an Amiga micro-computer. Bioacoustics, 3, 45–60.

Rymarczyk, Krystyna, and Grabowska, Anna, 2007. Sex differences in brain control of prosody. Neuropsychologia, 45, 921–930.

Saxton, Tamsin K, DeBruine, Lisa M, Jones, Benedict C, Little, Anthony C, and Roberts, Craig S, 2009. Face and voice attractiveness judgments change during adolescence. Evolution and Human Behavior, 30, 398–408.

Scherer, Klaus R, 1972. Judging personality from voice: A cross-cultural approach to an old issue in inter-personal perception. Journal of Personality, 40, 191–210.

Scherer, Klaus R, 1978. Personality inference from voice quality: the loud voice of extroversion. European Journal of Social Psychology, 8, 467–487.

Simmons, Leigh W, Peters, Marianne, and Rhodes, Gillian, 2011. Low pitched voices are perceived as masculine and attractive but do they predict semen quality in men? PLoS ONE, 6, e29271.

Smith, Bruce L, Brown, Bruce L, Strong, William J, and Rencher, Alvin C, 1975. Effects of speech rate on personality perceptions. Language and Speech, 18, 145–152.

Szymaszek, A, Szelag, E, and Sliwowska, M, 2006. Auditory perception of temporal order in humans: the effect of age, gender, listener practice and stimulus presentation mode. Neuroscience Letters, 403, 190–194.

Trout, Andrew, Magnusson, Roy A, and Hedges, Jerris R, 2000. Patient satisfaction investigations and the emergency department: what does the literature say? Academic Emergency Medicine, 7(6), 695–709.

Vukovic, Jovana, Feinberg, David R, Jones, Benedict C, DeBruine, Lisa M, Welling, LLM, Little, Anthony C, and Smith, FG, 2008. Self-rated attractiveness predicts individual differences in women’s preferences for masculine men’s voices. Personality and Individual Differences, 45, 451–456.

Weirich, Melanie, 2008. Vocal stereotypes. Proc. of International Symposium on Computer Architecture, pp. 25–27.

Yesavage, Jerome A, Brink, TL, Rose, Terence L, Lum, Owen, Huang, Virginia, Adey, Michael, and Leirer, Von Otto, 1983. Development and validation of a geriatric depression screening scale: a preliminary report. Journal of Psychiatric Research, 17, 37–49.

Zetterholm, Elisabeth, 1998. Prosody and voice quality in the expression of emotions. Proc. of the Seventh Australian International Conference on Speech Science and Technology, pp. 109–113.

Zuckerman, Miron, and Miyake, Kunitate, 1993. The attractive voice: what makes it so? Journal of Nonverbal Behavior, 17, 119–130.


Figure captions

Fig. 1: Frequency modulation patterns used to classify syllables.

Fig. 2: Spectrograms of frequency modulation patterns of the second syllable “jour”: (a) non-linear downwards, (b) non-linear upwards, (c) linear flat.

Fig. 3: Voice agreeableness scores given by the students and the seniors (1: agreeable, 0: disagreeable). Binomial GLM test ** P