Lombard Effect Mimicking

Dong-Yan Huang, Susanto Rahardja and Ee Ping Ong
Signal Processing Department, Institute for Infocomm Research, A*STAR
1 Fusionopolis Way, #21-01 Connexis, South Tower, Singapore 138632
{huang, rsusanto, [email protected]}

Abstract
Speakers raise the intensity of their voice when speaking in loud noise (the Lombard effect). This paper proposes a speech transformation approach that mimics the Lombard effect in order to improve the intelligibility of speech in noisy environments. The approach simulates the variations of duration, formant frequencies, formant bandwidth, fundamental frequency (F0), and energy in each frequency band caused by the Lombard effect, using the speech manipulation system STRAIGHT and three models that control three acoustic features: the F0 contour, phoneme duration and spectrum. Unlike other manipulation methods, the approach modifies these acoustic features simultaneously in the time-frequency representation of speech. It was evaluated by comparing the synthesized Lombard speech with noise-free Lombard speech in terms of similarity, naturalness and voice quality. The experimental results show that the proposed system converts neutral speech into Lombard speech with a quality very close to that of natural Lombard speech.
Index Terms: speech transformation, Lombard effect, duration, fundamental frequency, spectrum.

1. Introduction
Speech plays an important role in daily communication. However, its quality and intelligibility are degraded in loud, noisy environments. In particular, such degradation can make it difficult to recognize spoken words or phrases in telephone communication or over public address systems. Since intelligibility is the most important concern for communication in noisy environments, a speech enhancement system is desirable for delivering a speaking voice with enough clarity to be understood. A number of approaches have been proposed for robust speech recognition, using techniques such as microphone arrays [1], adaptive noise cancellation [2] and spectral subtraction [3]. Although these algorithms appear to improve speech quality, they cannot increase the auditory signal-to-noise ratio of the speaker's weakly spoken words. People change their articulation to communicate more effectively in noisy environments, which is called the Lombard effect [4]; our approach to enhancing the speech signal is inspired by the principles of this effect. According to [4, 5], the changes between normal and Lombard speech include a higher F0, a shift of energy from low frequency bands to middle or high bands, an increase in sound intensity, an increase in vowel duration, a change in spectral tilt, and a shift of the formant center frequencies of F1 (mainly) and F2.

A biologically inspired hand-annotated enhancement method [6] and automatic enhancement methods [7, 8, 9, 10] have been proposed that change some spectral features to enhance intelligibility. Hazan and Simpson located information-rich regions of consonants and enhanced them by amplifying and filtering the signal [6]. Two methods were proposed for automatic speech enhancement using energy redistribution between voiced/unvoiced (ERVU) regions and between spectrally stationary/transition (ERST) regions, while conserving global energy [7]. In the Blizzard Challenge 2009, changes to the formants and to the consonant-vowel power ratio of speech were made with signal processing methods [8, 9]. We have proposed a three-stage signal processing technique (TD-PSOLA, ERVU and a high-pass filter) that improves the intelligibility of speech by modifying pitch, duration, the consonant-vowel energy ratio, and formant frequencies [10]. All of these experiments showed improved intelligibility in noisy environments. However, because these methods modify only some of the acoustic features, in either the time or the frequency domain, the naturalness of the speech is degraded considerably. We conclude that simple signal processing results in poor naturalness of speech [8, 9]. In this paper, we propose a Lombard mimicking system that converts normal speech into noise-free Lombard speech by using the high-quality manipulation system STRAIGHT [11] and models that control acoustic features such as duration, F0, formant frequencies, and energy reassignment in both the time and frequency domains.

This paper is organized as follows. Section 2 describes the principles of human speech production and perception in noisy environments, together with previous psycho-acoustic studies that provide the basis for our strategies for modifying acoustic features. Section 3 briefly presents the statistical characteristics of Lombard speech in terms of duration, F0 contour, formant frequencies, formant bandwidth and intensity. Section 4 describes the Lombard mimicking system, which simulates the variations of these acoustic features in both the time and frequency domains. Section 5 presents the evaluation results, which show that the proposed system generates Lombard speech close to the target Lombard speech in similarity, naturalness and voice quality. Finally, Section 6 concludes the paper.

2. Speech Production and Perception in Noisy Environments
Over the last two decades, research has been carried out on language learning and language comprehension in language-learning impaired children, on speech directed at a hearing-impaired person (clear speech), and on speech produced in a noisy environment (Lombard speech) [12, 4, 5]. Here we briefly present these results, which provide the basis for the design of our Lombard mimicking system.

As early as 1911, Etienne Lombard observed that talkers changed their speech and increased their vocal effort when they communicated in noisy environments. He suggested that this acoustic modification allowed the speaker to maintain the ability to self-monitor through the auditory feedback chain. Junqua [4] found that, when speaking with masking noise, speakers reduced their speech rate and used a higher F0. Because background noise can mask the speech and may completely suppress faster and weaker sounds, listeners may be unable to identify vowels and consonants and to understand the message. Other studies reveal that speakers raise F0 to different degrees depending on the noise level [5]. This situation is analogous to that of a language-learning impaired (LLI) child, who shows deficits in both the perception and the production of phonemes characterized by a variety of fast elements embedded in ongoing speech. The authors of [12] pointed out that extending the duration of the fast-changing transitional elements within the acoustic waveform of speech syllables significantly improved the discrimination of those syllables by LLI children.

Besides the acuity of the listener's hearing, intelligibility is also affected by the predictability of the message and the speaker's enunciation. To increase the intelligibility of our speech, we change our vocal characteristics when we speak to a hearing-impaired person (clear speech) or in noisy environments (Lombard speech): we speak more loudly and at a slower rate, with a higher F0, higher formants, an increased consonant-to-noise ratio, and more and longer pauses [4, 5].

3. Lombard Speech
The previous section gave a qualitative description of the Lombard effect. To mimic it, we need a quantitative description of the acoustical characteristics of Lombard speech in noisy environments. Ideally, such a study requires noisy speech uttered in the real world, and the speech database should contain every distortion that could occur in noisy environments. However, it is not feasible to collect speech data in all such environments. One feasible way is to record noise-free Lombard speech, produced by speakers who listen to the noise through headphones; noisy Lombard speech is then obtained by adding the noise to the noise-free Lombard speech. Pellon and Hansen [13] collected the SUSAS database, which includes Lombard speech. From the feature analysis of the SUSAS database, Pellon and Hansen [13] showed that the F0 of vowels in Lombard speech is higher, and their duration longer, than in normal (neutral) speech.

Figure 1 shows the average F0 and duration of neutral and Lombard speech (13 speakers, each producing 70 utterances per speaking style, giving 910 utterances per style for the averages). We observed that the duration of vowels in Lombard speech is slightly longer than in neutral speech, while the duration of consonants is shorter. The duration of semivowels is about 30% longer than in neutral speech. Formant transitions occur in semivowels, and extending the duration of these transitions increases intelligibility. Based on these observations, we modify the duration and F0 of the neutral signal towards the duration and F0 of the Lombard signal.

Figure 1: (a) Global shift in duration (ms) of vowel, semivowel and consonant phonemes for neutral and Lombard signals; (b) average F0 values (Hz) for neutral and Lombard signals.

We choose the semivowel duration ratio as the time-scale modification factor for the vowels and semivowels of the neutral signal. The duration scale of the neutral signal is thus changed from 1 to 1.3:

$$ T_{scale} = 1 + \frac{T_{Lombard} - T_{neutral}}{T_{neutral}} = 1 + \frac{26 - 20}{20} = 1.3 \tag{1} $$

The duration scale of the stable, steady part of consonants is changed from 1 to 1.2. The duration is stretched in order to slow down rapid spectral changes. For the F0 modification, we observe that the F0 of the Lombard signal is about 10.3% higher than that of the neutral signal:

$$ \Delta F0_{scale} = \frac{F0_{Lombard} - F0_{neutral}}{F0_{neutral}} = \frac{160 - 145}{145} = 0.103 \tag{2} $$

We therefore change the F0 scale from 1 to 1.103. The vocal tract spectrum and the average spectral intensity of neutral and Lombard speech are shown in Figure 2. We observed that the average spectral intensity of Lombard speech is higher than that of neutral speech, that the first four formants of Lombard speech are higher than those of neutral speech, and that the shift differs from formant to formant. Each formant bandwidth of Lombard speech is narrower than the corresponding one of neutral speech. To keep the modification of formant frequencies simple to implement, we shift the formant frequencies by a constant relative amount, namely the relative difference between the formant frequencies of Lombard speech and the corresponding ones of neutral speech, calculated as:

$$ \Delta Frmt_i = \frac{Frmt_{i,Lombard} - Frmt_{i,neutral}}{Frmt_{i,neutral}} \tag{3} $$
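As a small worked check of Eqs. 1 to 3, the following Python sketch recomputes the scale factors from the average values quoted in the text (semivowel duration 20 ms to 26 ms, F0 145 Hz to 160 Hz). The function names `relative_change` and `scale_factor` are illustrative only and do not come from the paper.

```python
def relative_change(lombard: float, neutral: float) -> float:
    """Relative change (Lombard - neutral) / neutral, the form used in Eqs. 2 and 3."""
    if neutral == 0:
        raise ValueError("neutral value must be non-zero")
    return (lombard - neutral) / neutral


def scale_factor(lombard: float, neutral: float) -> float:
    """Multiplicative scale 1 + relative change, the form used in Eq. 1."""
    return 1.0 + relative_change(lombard, neutral)


# Average values quoted in the text: semivowel duration in ms, F0 in Hz.
t_scale = scale_factor(26.0, 20.0)        # Eq. 1
delta_f0 = relative_change(160.0, 145.0)  # Eq. 2
f0_scale = 1.0 + delta_f0                 # F0 scale applied to the neutral contour

print(f"T_scale  = {t_scale:.3f}")   # 1.300
print(f"dF0      = {delta_f0:.3f}")  # 0.103
print(f"F0 scale = {f0_scale:.3f}")  # 1.103
```

The same `relative_change` helper can be applied per formant to evaluate Eq. 3 once the neutral and Lombard formant locations have been paired.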

The first to fifth formants of the neutral speech are located at [411, 570, 1970, 2607, 3368] Hz. The Lombard speech has only four formants, located at [561, 2027, 2545, 3364] Hz. Based on Eq. 3, we found that the F1 of the neutral speech is absent from the Lombard speech: the F2 of the neutral speech is very close to the F1 of the Lombard speech, and the F3, F4 and F5 of the neutral speech are close to the F2, F3 and F4 of the Lombard speech. We therefore remove the F1 from the neutral speech and shift the remaining formants of the neutral speech by changing the scale from 1 to 1.014. We then use a weighting function to further change the formant bandwidth. Inspired by the principles of human speech perception and production in noisy environments and by the study of the acoustical characteristics of Lombard speech, we build a Lombard effect model and a Lombard speech mimicking system that transforms normal, conversational speech into noise-free Lombard speech.

Figure 2: (a) Average spectral intensity (dB) for the neutral signal (blue) and the Lombard signal (brown); (b) the four formant locations (Hz) and their corresponding bandwidths for neutral and Lombard signals (formant location: neutral (blue), Lombard (cyan); formant bandwidth: neutral (yellow), Lombard (brown)).

4. Lombard Effect Mimicking
4.1. Lombard Effect Model
The acoustical characteristics of Lombard speech can be identified in the time-frequency domain, so we build the model of the Lombard effect in that domain. In noisy environments, speakers alter their speech through the Lombard effect. Noisy Lombard speech is generated by adding various noises to noise-free Lombard speech. The Lombard effect is a nonlinear distortion that depends on the speaker, the noise level and the noise type. Here, only the alterations in the time-frequency domain are modelled, without the noise. Suppose that any acoustical feature of speech in the time-frequency domain is represented by S(ω, τ). The variations of duration, formant frequencies, formant bandwidth, F0, spectral tilt, and energy in each frequency band are represented by a nonlinear time-frequency warping F(·), an amplitude scaling of each frequency band A(·), and an intensity variation factor G(·). The time-frequency Lombard speech L(ω, τ) is S(ω, τ) altered by F(·), A(·) and G(·):

$$ L(\omega, \tau) = G(\omega, \tau) \cdot A(\omega, \tau)\, S(F(\omega, \tau)) = G(\omega, \tau) \cdot A(\omega, \tau)\, S(\Omega(\omega), \Gamma(\tau)) \tag{4} $$

where Ω(ω) and Γ(τ) are non-linear interpolations along the frequency axis and the time axis, respectively. The time-frequency representation of the speech signal allows the values along the frequency axis and the time axis to be modified separately.

4.2. Speech Model and Lombard Effect
In this section, we describe the Lombard speech production system shown in Figure 3, which is based on the speech manipulation system STRAIGHT [11] and on three models that control three acoustic features: fundamental frequency (F0), phoneme duration and spectrum. The purpose of this system is to generate Lombard speech. The system transforms normal speech into noise-free Lombard speech in the following steps: (1) decomposing the speaking voice into three acoustic parameters (F0 contour, spectral envelope, and aperiodicity index (AP)) estimated with the analysis part of the speech analysis-by-synthesis system STRAIGHT; (2) modifying the duration of the voiced/unvoiced parts; (3) modifying the pitch contour; (4) modifying the frequencies of the formants, the spectral envelope and the AP by using spectral control model I; (5) synthesizing the modified speech with the synthesis part of STRAIGHT; and (6) modifying the amplitude of the synthesized voice by using spectral control model II.

Figure 3: Block diagram of the speech-to-Lombard speech synthesis system

4.3. Duration control model
In Section 3, we observed that the duration of each phoneme of the Lombard speech differs from that of the neutral speech. First, we detect the voiced and unvoiced boundaries based on the F0 contour. We change the duration scale of the stable voiced phonemes according to Eq. 1, from 1 to 1.3, via linear interpolation. The transition part between voiced and unvoiced phonemes, which can be detected using the F0 contour information, is smoothed by a 15 ms linear interpolation. The duration scale of the unvoiced part is changed from 1 to 1.2. An example of the duration modification for the word "change" is shown in Figure 4. The unvoiced part is prolonged to the start point of the Lombard speech. The voiced part is stretched by 30% relative to the neutral speech. All the other features, such as the spectrogram and the aperiodicity index, are prolonged along the time axis in the same way. This is part of Γ(τ) in Eq. 4.

4.4. F0 control model
The F0 control model modifies the shape of the F0 contour. When the neutral speech is converted into noise-free Lombard speech, the F0 contour of the neutral speech is changed into that of the noise-free Lombard speech. According to the statistical study of the characteristics of Lombard speech in the SUSAS database [13], the average F0 of Lombard speech increases by about 10% (Eq. 2). The F0 scale of the speech is therefore changed from 1 to 1.103. As the variation at both ends of a voiced segment is sharp and fast, the interpolation of the F0 contour is normally carried out only for the stable part, either by linear transformation or by bilinear transformation. Both ends of the F0 contour are kept in the modified F0 contour shown in Figure 4. In the model of Lombard speech in Eq. 4, we change the scaling factor A(ω, τ).

Figure 4: Fundamental frequency trajectories (Hz) over time (ms): neutral signal (solid blue line), noise-free Lombard signal (':' blue line) and mimicked Lombard signal ('-.' blue line).

Figure 5: (a) Waveforms (amplitude over time in s) of the neutral signal and the Lombard signal for the word "change"; (b) the corresponding formants (LP filter magnitude in dB over frequency in Hz) of the neutral signal, the Lombard signal, and the mimicked Lombard signal.

4.5. Spectral control model
To generate the spectral envelope of the Lombard speech, the spectral envelope of the neutral speech is modified by two spectral control models (I and II), corresponding to two acoustic features. Spectral control model I modifies the formant frequencies of the neutral speech to those of the Lombard speech, which corresponds to changing the model along the frequency axis, i.e. the part A(ω, ·)S(Ω(ω), ·) in Eq. 4. Spectral control model II enhances the stop bursts and unvoiced fricatives, i.e. it modifies the part G(·, τ) in Eq. 4.

4.5.1. Formants Modification
As with the duration and F0 analysis of the SUSAS database, the formant analysis showed that the first four formants of the Lombard signal were higher than those of the neutral signal [13]. An example of the formants of the word "change" spoken in neutral and Lombard style is shown in Figure 5. We use a high-pass filter to remove the F1 formant of the neutral speech. The remaining formants of the neutral speech are then all shifted by changing the scale from 1 to 1.014. The formant frequencies are then further changed using a weighting function in each frequency band. The first four formants of the noise-free Lombard speech are located at [561, 2027, 2545, 3364] Hz, and those of the mimicked Lombard speech are estimated at [551, 2010, 2480, 3320] Hz, which is very close.

4.5.2. Energy Redistribution Voiced/Unvoiced Method
In order to enhance stop bursts and unvoiced fricatives, the voiced, unvoiced and transition decision boundaries are determined by examining the F0 and the aperiodic component. Once the voiced/unvoiced/transition decision is made, each region of the time-domain signal is multiplied by a separate gain factor determined empirically from experiments (e.g., 1.15 for unvoiced signal and 0.95 for voiced signal). The transition between the voiced and unvoiced gain factors is smoothed by a 15 ms linear interpolation. The word utterance is then scaled by a normalized gain factor so that the energy of the modified word is the same as that of the original.

Figure 6: Spectrogram of the neutral signal of the word "change"

Figures 6, 7 and 8 show the spectrograms of the neutral, noise-free Lombard and mimicked Lombard speech for the word "change". From Figures 6, 7 and 8, we observe that, with the proposed duration modification strategy, the duration of the mimicked Lombard speech is aligned with that of the Lombard speech. The formant frequencies of the neutral speech are shifted close to those of the Lombard speech, i.e. to higher formants. The formant weighting function enhances the energy of the formants. The energy reassignment changes the intensity distribution of the signal.
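The energy redistribution step of Section 4.5.2 can be sketched in Python with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-sample boolean mask `voiced` is assumed to be given (the paper derives the decision from F0 and the aperiodic component), there is no separate transition class, and the 15 ms linear interpolation is realised as a moving average of the gain track, which turns each gain step into a linear ramp. The function name `ervu`, its parameters and the test signal are hypothetical; the gains 1.15 and 0.95 are the example values quoted in the text.

```python
import numpy as np


def ervu(signal, voiced, fs, g_unvoiced=1.15, g_voiced=0.95, ramp_ms=15.0):
    """Redistribute energy between voiced and unvoiced regions of one word.

    signal : 1-D float array of time-domain samples.
    voiced : 1-D bool array of the same length, True where a sample is voiced
             (assumed to be given; hypothetical input for this sketch).
    fs     : sampling rate in Hz.
    """
    signal = np.asarray(signal, dtype=float)
    voiced = np.asarray(voiced, dtype=bool)
    if signal.shape != voiced.shape:
        raise ValueError("signal and voiced mask must have the same shape")

    # Piecewise-constant gain track: one gain per region.
    gain = np.where(voiced, g_voiced, g_unvoiced)

    # A moving average over n samples turns each gain step into a linear ramp
    # lasting about ramp_ms. Edge padding keeps the gain flat at both ends.
    n = max(1, int(round(ramp_ms * 1e-3 * fs)))
    padded = np.pad(gain, (n // 2, n - 1 - n // 2), mode="edge")
    gain = np.convolve(padded, np.ones(n) / n, mode="valid")

    modified = signal * gain

    # Normalise so that the modified word keeps the original total energy.
    e_in = np.sum(signal ** 2)
    e_out = np.sum(modified ** 2)
    if e_out > 0:
        modified *= np.sqrt(e_in / e_out)
    return modified


# Synthetic test word (hypothetical): 0.25 s of weak noise, then a 150 Hz tone.
fs = 8000
t = np.arange(fs // 2) / fs
rng = np.random.default_rng(0)
voiced = t >= 0.25
x = np.where(voiced, np.sin(2 * np.pi * 150 * t), 0.1 * rng.standard_normal(t.size))
y = ervu(x, voiced, fs)
```

With these settings the total energy of `y` equals that of `x`, while the share of energy in the unvoiced part increases, which is the intended consonant emphasis.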

Figure 7: Spectrogram of the Lombard signal of the word "change"

Figure 8: Spectrogram of the mimicked Lombard signal of the word "change"

5. Experimental Results
The performance of the Lombard effect mimicking is evaluated by comparing the mimicked Lombard speech with noise-free Lombard speech in terms of similarity, naturalness, and voice quality.

5.1. Stimuli
We use a subset of the SUSAS database to evaluate the proposed system. The neutral speech samples in SIM and BOSTON1 are used for testing the system. The original speech samples consist of 35 aircraft communication words, listed in Table 1. Each word is spoken twice by one male speaker in a quiet environment. No emphasis was placed on the discriminating phonemes of each word. There are three types of stimuli: the neutral speech signal, the noise-free Lombard speech signal, and the mimicked Lombard speech signal. In total, 210 stimuli were prepared (35 words × 3 speech types × 2 repetitions). The stimuli were arranged randomly.

Table 1: The 35-word SUSAS vocabulary set

break        eighty   go         nav    six     thirty
change       enter    hello      no     south   three
degree       fifty    help       oh     stand   white
destination  fix      histogram  on     steer   wide
east         freeze   hot        out    strafe  zero
eight        gain     mark       point  ten

5.2. Subjects
The listening test was conducted with 9 normal-hearing individuals. Listener age was 24 ± 5 years, and all listeners were engineering graduate students with no prior experience of speech research studies involving listening tests. Four listeners were female and five were male. All listeners were sufficiently proficient in English, having read English as their first language since the start of schooling age.

5.3. Procedure
The experiments were conducted in an office environment. The 35-word table was given to each participant. In each trial, an utterance of one word from the table was randomly drawn from one of three conditions (unmodified, noise-free Lombard speech, mimicked Lombard speech). Utterances selected for the unmodified control condition were not modified, while the Lombard mimicking was applied to those selected for the mimicked condition. The listener heard and identified the word through SENNHEISER HD 650 headphones.

5.4. Experimental Results
The mimicked Lombard speech is compared with noise-free Lombard speech and neutral speech to evaluate similarity, naturalness, and voice quality. Each of these is rated on a scale from 1 to 5, where 1 is the worst and 5 is the best. Table 2 shows the ratings averaged over the 70 test cases of each speaker. The results indicate that our Lombard effect mimicking system achieves ratings for similarity, naturalness and voice quality close to those of noise-free Lombard speech. The system is able to transform the prosodic and spectral properties of neutral speech into those of noise-free Lombard speech by modifying duration, formant frequencies, formant bandwidth, F0, and the energy in each frequency band.

Table 2: Average rating of subjective evaluation for 35 words in mimicked Lombard speech style

Speakers   Similarity   Naturalness   Voice Quality
Boston1    3.4          3.2           3.5

To understand the contribution of each acoustic feature to Lombard speech generation, we conducted a subjective listening test in which one feature was changed at a time while the others were kept unchanged. Table 3 shows the ratings averaged over the 70 test cases of each speaker. From Table 3, we observe that each acoustic feature contributes to the similarity with Lombard speech. The changes to the formants include the shift of formant frequencies, the change of formant bandwidth and the removal of F1. The formants play a more important role than the other features in Lombard speech generation. However, changing only one feature cannot generate a signal similar to Lombard speech, although naturalness and voice quality are acceptable with high-quality synthesizers.

Table 3: Average rating of subjective evaluation for 35 words in mimicked Lombard speech style as a function of each acoustic feature

Features    Similarity   Naturalness   Voice Quality
Duration    1.6          2.9           3.5
F0          1.9          2.7           3.0
Formants    2.8          3.3           3.3
Intensity   1.7          3.6           3.7

The purposes of these modifications can be explained from the point of view of speech analysis in the human brain as follows: 1) the prolonged speech signals give the listener's brain enough time to distinguish fast phonetic elements; 2) the mimicked Lombard speech increases the consonant-to-noise ratio, so that weak consonants are more easily distinguished in noisy environments; 3) the formant increase enhances the brief, rapidly changing transitional elements within speech, which significantly improves the discrimination of those syllables by listeners.

6. Conclusion
This paper proposes a Lombard speech synthesis system that transforms neutral speech into noise-free Lombard speech in order to increase speech intelligibility in noisy environments. We modeled human speech perception and production mechanisms using the high-quality speech manipulation system STRAIGHT and models for changing the duration, pitch contour, consonant-vowel energy ratio, and formants. Unlike other manipulation methods, this approach modifies these acoustic features in both the time and frequency domains. The subjective listening test shows that the proposed system can mimic the Lombard effect with high quality and in a natural way. However, no individual feature modification on its own can generate synthetic speech similar to Lombard speech. In the future, the strategies for energy redistribution in the time-frequency plane will be further developed to improve the quality of this Lombard effect mimicking system.

7. References
[1] I. McCowan and H. Bourlard, "Microphone Array Post-filter for Diffuse Noise Field," in Proceedings of ICASSP 2002, vol. 1, pp. 905-908, 2002.
[2] L. Arslan, A. McCree and V. Viswanathan, "New methods for adaptive noise suppression," in Proceedings of ICASSP, vol. 1, pp. 812-815, 1995.
[3] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. 27, pp. 113-120, 1979.
[4] J.C. Junqua, "The Lombard reflex and its role on human listeners and automatic speech recognizers," J. Acoust. Soc. Amer. 93(1), pp. 510-524, 1993.
[5] W.V. Summers, D.B. Pisoni, R.H. Bernacki, R.I. Pedlow, and M.A. Stokes, "Effects of noise on speech production: acoustic and perceptual analyses," J. Acoust. Soc. Amer. 84(3), pp. 917-928, 1988.
[6] V. Hazan and A. Simpson, "Cue-enhancement strategies for natural VCV and sentence materials presented in noise," Speech, Hearing and Language - Work in Progress, Phonetics and Linguistics, University College London, vol. 9, pp. 43-55, 1996.
[7] J.G. Harris and M.D. Skowronski, "Energy redistribution speech intelligibility enhancement, vocalic and transitional cues," J. Acoust. Soc. Amer., vol. 112, no. 5, pp. 2305-2305, 1998.
[8] M. Schröder, S. Pammi and O. Türk, "Multilingual MARY TTS participation in Blizzard Challenge 2009," Blizzard Challenge workshop 2009, Sep 2009, Edinburgh, UK.
[9] Minghui Dong, Ling Cen, Paul Chan, D.-Y. Huang, Donglai Zhu, Bin Ma, Haizhou Li, "I2R Text-to-Speech System for Blizzard Challenge 2009," Blizzard Challenge workshop 2009, Sep 2009, Edinburgh, UK.
[10] D.-Y. Huang, Susanto Rahardja and Ee Ping Ong, "Biologically Inspired Algorithm for Enhancement of Speech Intelligibility Over Telephone Channel," MMSP 2009, CD-ROM, Oct. 5-7, 2009, Rio de Janeiro, Brazil.
[11] H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, pp. 1303-1306, April 1997, Munich, Germany.
[12] P. Tallal, S. L. Miller, G. Bedi, G. Byma, X. Wang, S. S. Nagarajan, C. Schreiner, W. M. Jenkins, M. M. Merzenich, "Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech," Science, vol. 271, no. 5245, pp. 81-84, 1996.
[13] http://www.ldc.upenn.edu/Catalog/docs/LDC99S78/.