Blind speech watermarking using hybrid scheme based on DWT/DCT and sub-sampling

Ahmed Merrad & Slami Saadi

Multimedia Tools and Applications: An International Journal, ISSN 1380-7501, Volume 77, Number 20, pp. 27589–27615. DOI 10.1007/s11042-018-5939-z


Received: 12 June 2017 / Revised: 2 February 2018 / Accepted: 26 March 2018 / Published online: 21 April 2018. © Springer Science+Business Media, LLC, part of Springer Nature 2018

Corresponding author: Slami Saadi, [email protected]

Laboratory of Automation and Applied Industrial Diagnostics (LAADI), Faculty of Exact Sciences & Computers, Ziane Achour University of Djelfa (UZAD), BP 3117, Djelfa, Algeria

Abstract In this paper, a robust and blind speech watermarking technique is proposed, using a combined scheme based on the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) applied to sub-sampled signals. The hybridization achieves good imperceptibility together with robustness against many attacks, such as re-quantization, cropping, echo, amplification and additive white Gaussian noise (AWGN). We exploit the correlation between two successive samples by sub-sampling the signal. The obtained results, which compare favorably with recent research, show the robustness of the proposed approach.

Keywords Blind · Speech watermarking · DCT · DWT · Sub-sampling · Attacks

1 Introduction

Blind watermarking of speech signals is more challenging than watermarking images or video sequences, due to the broad dynamic range of the human auditory system (HAS) in comparison with the human visual system (HVS) [15]. The HAS perceives sounds over a power range greater than 10^9:1 and a frequency range larger than 10^3:1. The HAS is also highly sensitive to additive white Gaussian noise (AWGN); such noise in a sound file can be detected at levels as low as 70 dB below the ambient level [12, 17, 22, 27]. Depending on the objective and the type of watermark, watermarking systems should have certain properties. According to the International Federation of the Phonographic Industry (IFPI), audio watermarking must meet a range of requirements, such as:

a) Imperceptibility: The digital watermark must not affect the quality of the original audio signal after it is watermarked. The difference between the original and the watermarked audio should be barely distinguishable by the human ear. In addition, the signal-to-noise ratio (SNR) used to assess audio quality should be above 20 dB;
b) Robustness: The embedded watermark data should not be separable or removable by ordinary audio signal processing operations and attacks. Unless the audio undergoes severe damage, the embedded information should be correctly extracted from the watermarked audio even when the host audio undergoes common audio signal processing operations;
c) Capacity: The number of bits that can be embedded into the audio signal per unit of time;
d) Security: The watermark should only be identifiable by an authorized person [15]. The security of a scheme should not rely on the secrecy of its algorithm; people without authorization must not be able to extract the embedded information from a covered audio;
e) Payload: The payload of an audio watermarking scheme is generally estimated in bits per second, and the payload of a successful watermarking algorithm should be above 20 bps without disturbing the imperceptibility of the audio.

No algorithm is known to satisfy all of these requirements; watermarking algorithms aim to reach appropriate trade-offs among them. According to the IFPI [22], audio watermarking, at a definite data payload (generally a data embedding capacity above 20 bps, bits per second), must be able to withstand the most frequent signal processing operations and attacks, such as temporal scaling, noise, compression, re-sampling, re-quantization and analog/digital conversions, within the imperceptibility constraint (the signal-to-noise ratio (SNR) must be greater than 20 dB) [25]. The discrete cosine and wavelet transforms are nowadays used in a broad range of signal processing applications, such as compression, denoising and restoration of image, audio and video signals. The authors in [26] used the undecimated discrete wavelet transform (UDWT) and an invariant histogram for an audio watermarking algorithm with excellent audible quality and realistic resistance against de-synchronization attacks such as arbitrary cropping, time-scale modification, pitch shifting and jittering. A blind audio watermarking algorithm based on the vector norm and logarithmic quantization index modulation (LQIM) in the wavelet domain is presented in [24]. In [1], a non-blind digital audio watermarking algorithm that meets the minimum requirements of optimal audio watermarking set by the IFPI is proposed. The watermarking scheme in [23] is robust against de-synchronization attacks, improves security, and performs better at locating tampered positions. Recently, an adaptive audio watermarking algorithm in the wavelet domain was presented in [11]; it optimizes the payload within the perceptual precision limits of the audio signal by strategically exploiting some of its inherent characteristics. In this study, we combine DWT and DCT and apply them to sub-sampled speech signals before embedding the watermark image. This hybridization offers better imperceptibility as well as robustness against strong attacks and noise compared with recently published works on the subject, as shown in the simulation results below.

2 Discrete cosine transform (DCT)

The DCT is a well-known transform capable of representing segments of an audio signal as a sum of cosine functions at different frequencies. One of its most significant observable characteristics is energy compaction into a small number of coefficients (we make use of this advantage).

This property is used to reduce the distortion of the original signal in the speech watermarking procedure [6, 14]. The discrete cosine transform is a method for translating a signal into basic frequency components. The DCT of a 1-D sequence $f(x)$ of length $N$ is defined as:

$$c(u) = a(u) \sum_{x=0}^{N-1} f(x) \cos\left(\frac{\pi (2x+1) u}{2N}\right), \qquad u = 0, 1, 2, \dots, N-1$$

where $f(x)$ is the original speech signal and $N$ is the number of samples. In a similar manner, the inverse transform is expressed as:

$$f(x) = \sum_{u=0}^{N-1} a(u)\, c(u) \cos\left(\frac{\pi (2x+1) u}{2N}\right), \qquad x = 0, 1, 2, \dots, N-1$$

In both equations, $a(u)$ is defined as:

$$a(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u \neq 0 \end{cases}$$

Watermarks embedded in the DCT domain are well hidden and resistant to a variety of signal distortions; in particular, a digital watermark in the DCT transform domain has an inherent ability to resist lossy compression. The drawback is the large amount of computation required [20].
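The energy-compaction property can be checked numerically. The following sketch (Python with NumPy/SciPy rather than the MATLAB used in our experiments; the test signal and frame length are illustrative) applies an orthonormal DCT-II, whose normalization matches $a(u)$ above, to a toy frame and measures how much of the energy the largest coefficients capture:

```python
import numpy as np
from scipy.fft import dct, idct

# A toy "speech" frame: a low-frequency tone plus a little noise.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 5 * np.arange(256) / 256) + 0.01 * rng.standard_normal(256)

# Orthonormal DCT-II matches the a(u) normalization defined above.
c = dct(x, norm='ortho')

# Energy compaction: most of the energy sits in a few coefficients.
energy_top16 = np.sum(np.sort(c ** 2)[-16:]) / np.sum(c ** 2)
print(f"Energy captured by the 16 largest coefficients: {energy_top16:.4f}")

# The orthonormal inverse transform reconstructs the frame exactly.
assert np.allclose(x, idct(c, norm='ortho'))
```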

3 Discrete wavelet transform (DWT)

The DWT provides a time-frequency representation of a signal [4, 18]. It was developed to overcome the fact that short-time variations of a signal are not well captured by the Fourier transform in the frequency domain, and it can also be applied to analyze non-stationary signals [4]. It is employed in a large range of signal processing applications [3, 5]. The DWT decomposes an input signal into two sets of coefficients by means of a pair of filters, one low-pass and one high-pass: the approximation coefficients cA (low frequencies) are obtained by passing the signal through the low-pass filter, and the detail coefficients cD (high frequencies) by passing it through the high-pass filter, as illustrated in Fig. 1. Depending on the purpose and the duration of the signal, the signal can be decomposed over multiple discrete wavelet levels, where the approximation part is further decomposed into new approximation and detail coefficients [18]. Figure 2 illustrates a two-level DWT decomposition. The inverse DWT rebuilds (synthesizes) the original signal by gathering those components back without loss of information [21], as shown in Fig. 3. The wavelet domain is appropriate for frequency analysis because its multi-resolution property offers access both to the most important components of the spectrum and to its details [2].
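The analysis/synthesis structure of Figs. 1–3 can be sketched with PyWavelets (a Python stand-in for the MATLAB wavelet toolbox used in our experiments); 'db1' is the Haar wavelet employed by the proposed scheme, and the input here is an arbitrary test vector:

```python
import numpy as np
import pywt

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)

# One-level analysis: cA holds the low-frequency (approximation)
# coefficients, cD the high-frequency (detail) coefficients.
cA, cD = pywt.dwt(x, 'db1')

# Two-level decomposition further splits the approximation part (Fig. 2).
cA2, cD2, cD1 = pywt.wavedec(x, 'db1', level=2)

# Synthesis (inverse DWT) rebuilds the signal without loss (Fig. 3).
assert np.allclose(x, pywt.idwt(cA, cD, 'db1'))
```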

Fig. 1 Decomposing a signal with the Haar wavelet (Daubechies db1) [15]

4 Performance analysis

We evaluate the performance of our watermarking proposal with respect to three common metrics: (a) payload or capacity, (b) robustness, and (c) imperceptibility (inaudibility).

Fig. 2 Two-level DWT decomposition

A. Capacity (data payload)

The data payload is defined as the number of bits embedded in one second of audio [13], measured in bits per second (bps). Let S be the length of the original speech signal in seconds and K the number of embedded watermark bits; the capacity C of the proposed scheme is expressed as [10]:

$$C = \frac{K}{S} \ \text{bps}$$

B. Robustness

Robustness measures the resistance of the watermark against attempts to remove or corrupt it, intentionally or accidentally, by different sorts of digital signal processing attacks. We determine the resemblance between the original watermark and the watermark extracted from the attacked signal by means of the normalized correlation coefficient and the bit error rate [19].

B.1. Normalized correlation coefficient

The normalized correlation coefficient (NC) expresses the resemblance between the extracted watermark image and the original watermark image after an attack. It is given by:

$$NC(w, w') = \frac{\sum_{i=1}^{N} \sum_{j=1}^{M} w(i,j)\, w'(i,j)}{\sqrt{\sum_{i=1}^{N} \sum_{j=1}^{M} w^2(i,j)} \ \sqrt{\sum_{i=1}^{N} \sum_{j=1}^{M} w'^2(i,j)}}$$

where N × M is the size of the watermark, and w(i,j) and w′(i,j) are the original and recovered watermark images, respectively.

Fig. 3 Rebuilding a decomposed signal using DWT


B.2. Bit error rate

The bit error rate (BER) expresses the proportion of erroneous bits after an attack and is calculated as:

$$BER = \frac{B_{err}}{M \times N} \times 100\%$$

where $B_{err}$ is the number of erroneous bits and M × N is the dimension of the watermark image.

C. Imperceptibility

C.1. Signal-to-noise ratio

The signal-to-noise ratio is a measure of the amount by which a signal is contaminated by noise, defined as the ratio of the signal power to the noise power. It can be determined by the equation below, where S is the original speech signal, S′ is the watermarked signal, and both have M samples [14]:

$$SNR = 10 \log_{10} \left( \frac{\sum_{a=1}^{M} S^2(a)}{\sum_{a=1}^{M} \left( S(a) - S'(a) \right)^2} \right)$$

C.2. Subjective quality evaluation

Subjective listening tests are important for perceptual quality evaluation, because the final judgment is made by human auditory perception. In the subjective listening test, five participants are presented with the original and the watermarked speech signals and are asked to report the difference between the two signals, using the five-point subjective grade (SG) specified in Table 1 [22].
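The objective metrics above translate directly into code. The following is a hedged Python sketch (the function names are ours, not from the paper) of the capacity, NC, BER and SNR definitions:

```python
import numpy as np

def capacity(num_bits: int, duration_s: float) -> float:
    """Payload C = K / S in bits per second."""
    return num_bits / duration_s

def nc(w: np.ndarray, w_rec: np.ndarray) -> float:
    """Normalized correlation between original and recovered watermarks."""
    num = np.sum(w * w_rec)
    den = np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(w_rec ** 2))
    return num / den

def ber(w: np.ndarray, w_rec: np.ndarray) -> float:
    """Bit error rate in percent over an N x M watermark image."""
    return 100.0 * np.count_nonzero(w != w_rec) / w.size

def snr_db(s: np.ndarray, s_wm: np.ndarray) -> float:
    """SNR (dB) of the watermarked signal s_wm against the original s."""
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_wm) ** 2))
```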

5 The proposed scheme

We can insert watermarks in high-energy regions to which the human auditory system is less sensitive, such as the low-resolution approximation bands. Embedding watermarks in these regions lets us raise the robustness of the watermark with little to no additional impact on signal quality [3]. After the discrete wavelet transform, most of the speech signal's energy is concentrated in the approximation coefficients and the rest in the detail coefficients, so no energy is lost. Speech signals are decomposed into a low-frequency part and a high-frequency part with the discrete wavelet transform. The low-frequency part, represented by cA (approximation), concentrates the majority of the energy of the speech signal and is its most important component. The high-frequency part, represented by cD (detail), carries the small remaining share of the energy. The wavelet basis and decomposition level can be chosen according to the type of algorithm [4]; digital watermarking is therefore extremely flexible in design. The speech signal is decomposed using a multi-level discrete wavelet transform (Fig. 4).

Table 1 Subjective grades

SG    ODG    Description of impairments    Quality
5.0   −0.0   Imperceptible                 Excellent
4.0   −1.0   Perceptible                   Good
3.0   −2.0   Slightly annoying             Fair
2.0   −3.0   Annoying                      Poor
1.0   −4.0   Very annoying                 Bad

In the proposed scheme, the watermark image embedding process (Fig. 5) is described by the following steps:

Step 1: The input speech signal X(n) is decomposed into two segments by sub-sampling: Seg1 contains the samples with odd indices and Seg2 the samples with even indices, i.e. Seg1 = {x(1), x(3), x(5), ...} and Seg2 = {x(2), x(4), x(6), ...} (note: the ordering is preserved).
Step 2: Applying a 1-level DWT with 'db1' to each segment produces cAseg1, cDseg1 for Seg1 and cAseg2, cDseg2 for Seg2, where cA denotes the low frequencies (approximation coefficients) and cD the high frequencies (detail coefficients).
Step 3: Applying the DCT to cAseg1 and cAseg2 produces two vectors D1 and D2, respectively. The watermark bits are inserted into the DCT coefficients, so D1 holds the DCT coefficients of cAseg1 and D2 those of cAseg2.
Step 4: Take the watermark image W of size n × m and restructure it into a one-dimensional vector Wi = {wi(j), 1 ≤ j ≤ J}, where J = n × m.
Step 5: Introduce a key in order to randomize the insertion positions of the watermark image: generate a vector numbered from 1 to (length of D2)/4 (covering the component with higher energy), then randomize it with the introduced key to produce an additional vector named rD.
Step 6: D1 and D2 are modified as follows: for j = 1 to the watermark length J, the selected coefficients D1(rD(j)) and D2(rD(j)) are modified as a function of the step Δ, in one direction if Wi(j) = 1 and in the opposite direction otherwise, so that the two values become clearly separated.
Step 7: Apply the IDCT to the modified D1 and D2 to obtain the watermarked approximation coefficients.
Step 8: Apply the IDWT to the watermarked approximation coefficients (together with the unchanged detail coefficients) to obtain the modified segments mseg1 and mseg2.
Step 9: Rebuild the watermarked speech signal by inverting Step 1 (inverse sub-sampling): X′ = {mseg1(1), mseg2(1), mseg1(2), mseg2(2), mseg1(3), mseg2(3), ...}, where X′ is the watermarked speech signal.

The watermark image extraction process (Fig. 6) is described by the following steps:

Step 1: Apply Steps 1, 2, 3 and 5 of the embedding process to the watermarked speech signal X′.
Step 2: For j = 1 to the length of the watermark to be detected, compare the selected coefficients: if the detection condition on D1(rD(j)) and D2(rD(j)) is satisfied, set Wi′(j) = 1; otherwise set Wi′(j) = 0.

To illustrate these steps, we give the following examples; a compact end-to-end sketch of the algorithm follows them.

Step 1: The input speech signal x(n) is decomposed into two segments by sub-sampling, Seg1 containing the odd-indexed samples and Seg2 the even-indexed samples, the ordering being preserved. For example, for the input signal

x = [0.7, 0.03, 0.27, 0.04, 0.09, 0.82, 0.69, 0.31, 0.95, 0.03, 0.43, 0.38, 0.76, 0.79, 0.18, 0.48]

we get

Seg1 = [0.7, 0.27, 0.09, 0.69, 0.95, 0.43, 0.76, 0.18]; Seg2 = [0.03, 0.04, 0.82, 0.31, 0.03, 0.38, 0.79, 0.48].

Step 5: (a) A key is introduced to randomize the insertion positions of the watermark image; (b) a vector numbered from 1 to (length of D2)/4 is generated (for the component with high energy); (c) randomizing it with the introduced key produces an additional vector named rD. For example, suppose the length of D2 is 28; we construct a vector with elements from 1 to 7 and, using the key introduced in Step 5(a), randomize it to produce a random vector, for example rD = [3 2 6 7 4 1 5] (the randomization depends on the key value).

Step 6: D1 and D2 are modified as described above. For example, suppose the watermark length is 4 bits; then the values that change after watermarking are 4 samples of D1 and 4 samples of D2, selected using the first 4 values of the vector rD from the previous example:
- when j = 1, rD(1) = 3 and the first bit is embedded into samples D1(3) and D2(3) according to the rule of Step 6;
- when j = 2, rD(2) = 2 and the second bit is embedded into samples D1(2) and D2(2);
- when j = 3, rD(3) = 6 and the third bit is embedded into samples D1(6) and D2(6);
- when j = 4, rD(4) = 7 and the fourth bit is embedded into samples D1(7) and D2(7).

6 Results and discussion

All experiments are implemented on a Windows PC with an Intel 2.2 GHz processor and 2 GB of RAM, and run using MATLAB 7.10.0. All tests are carried out on different speech signals stored as 16-bit mono WAV files with a sampling frequency of 44,100 Hz. In order to evaluate the performance of the proposed scheme under real conditions, simulations are performed on speech signals of different lengths, different natures (male and female voices) and different languages (English, French, German). All of the speech files are downloaded from the SQAM (Sound Quality Assessment Material Recordings for Subjective Tests) collection at http://sound.media.mit.edu/resources/mpeg4/audio/sqam/.

Fig. 4 Block diagram of the proposed speech watermarking scheme

We edited the speech files to obtain different lengths. Table 2 lists the speech signals used in our experiments. We used a binary image (32 × 32 pixels) as the watermark, shown in Fig. 7; UZAD stands for Ziane Achour University of Djelfa.

6.1 Imperceptibility

Figure 8 shows the evolution of the SNR values for different values of the parameter Δ and demonstrates an inverse proportionality. Tables 3 and 5 give the corresponding quantitative values, for different speech signals and for a single signal, respectively.

Fig. 5 Watermark embedding process

All of the SNRs are above 20 dB. Figure 9 shows the evolution of the SNR with the speech signal length, and Table 4 gives the corresponding quantitative values. The SNR of the proposed scheme increases with the speech signal length.

Fig. 6 Watermark extraction process

6.2 Robustness

In order to evaluate the robustness of the proposed scheme, several attacks were applied, including additive noise (AWGN), re-quantization, cropping, amplification and added echo; a sketch of how such attacks can be simulated is given below.
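The following Python (NumPy) sketch shows one plausible way to implement the five attacks used in this section. The function names are ours, the echo delay is interpreted in seconds, samples are assumed normalized to [-1, 1), and parameter values mirror Tables 6–10 where available:

```python
import numpy as np

rng = np.random.default_rng(2)

def awgn(x, snr_db):
    # Measure the signal power first, then add noise at the target SNR (dB),
    # e.g. snr_db in {16, ..., 24} as in Table 6.
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10.0))
    return x + np.sqrt(p_noise) * rng.standard_normal(len(x))

def requantize(x, bits=8):
    # Quantize 16-bit samples down to `bits` bits and back (Table 7).
    q = 2 ** (bits - 1)
    return np.round(x * q) / q

def crop(x, n):
    # Set n randomly chosen samples to zero (Table 8).
    y = x.copy()
    y[rng.choice(len(x), size=n, replace=False)] = 0.0
    return y

def echo(x, delay_s, decay, fs=44100):
    # Add a delayed, attenuated copy of the signal (Table 9); delay_s > 0.
    d = int(delay_s * fs)
    y = x.copy()
    y[d:] += decay * x[:-d]
    return y

def amplify(x, rate):
    # Rescale the amplitude by +/- rate, e.g. rate=0.2 for +20% (Table 10).
    return (1.0 + rate) * x
```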

6.2.1 AWGN attack

White Gaussian noise is added to the speech signal (SP), the power of SP being measured before the noise is added. The performance of the watermarking scheme is evaluated in the presence of AWGN. Table 6 lists the different speech signals attacked with different AWGN powers and gives the SNR between the watermarked signal and the attacked watermarked signal.

Table 2 Speech properties

Name       Renamed  Type  Mono/stereo  Bits  Frequency   Length    Speaker  Language
spfe49_1   SP1      wav   Mono         16    44,100 Hz   5 s       Woman    English
spme50_1   SP2      wav   Mono         16    44,100 Hz   10.4 s    Man      English
spfg53_1   SP3      wav   Mono         16    44,100 Hz   15 s      Woman    German
spmf52_1   SP4      wav   Mono         16    44,100 Hz   20.01 s   Man      French
bass47_1   SP5      wav   Mono         16    44,100 Hz   24.86 s   Man      –
spmf52_1   SP6      wav   Mono         16    44,100 Hz   5.94 s    Man      French

Fig. 7 Watermark image (32 × 32)

Although the power of the attack is large, all the BERs are zero and all the NCs are 1, so we can state that our proposed scheme is robust against AWGN attacks. Figure 10 shows the watermarked signal (SP1), the attacked watermarked signal and the difference between them, demonstrating that the applied AWGN attack is strong.

6.2.2 Re-quantization attack

The 16-bit-per-sample watermarked speech signals are quantized down to 8 bits per sample and then back to 16 bits per sample. Table 7 gives the SNR between the watermarked speech signal and its re-quantized version, and shows that all of the BERs are zero and all of the NCs are one after the attack. Figure 11 shows the watermarked speech signal (SP2), the re-quantized watermarked signal and the difference between them, illustrating that the difference is small.

Fig. 8 SNR as a function of Δ

Table 3 Signal-to-noise ratio (SNR) as a function of Δ

Signal  Δ      SNR (dB)
SP1     0.015  34.8096
SP2     0.022  35.3223
SP3     0.023  34.0491
SP4     0.027  35.7496
SP5     0.03   35.8737

Table 4 SNR evolution with speech signal length for the same Δ = 0.015

Speech length  3 s      6 s      9 s      12 s     15 s     18 s     21 s     24 s
SNR (dB)       33.1824  35.6135  37.0236  37.5298  38.3341  39.6936  40.1691  40.8263

6.2.3 Cropping attack

We set a number (Nbr) of samples of the watermarked speech signal to zero at random positions. Table 8 gives the SNR between the watermarked speech signal and its cropped version, together with the number of samples randomly set to zero; all of the BERs are zero and all of the NCs are 1. Even though the attack is very strong, we can identify our watermark without difficulty. Figure 12 shows the watermarked speech signal (SP3), the cropped watermarked signal and the difference between them; the difference is very large.

6.2.4 Echo attack

We add an echo signal with different delay and decay values to the watermarked speech signal. Table 9 gives the SNR between the watermarked speech signal and the watermarked speech signal with echo, and shows that all of the BERs are zero and all of the NCs are 1. Although the attack is very strong (as shown by the low SNR between WS and EWS), we can detect our watermark easily.

Fig. 9 SNR as a function of speech signal length

Table 5 SNR evolution with Δ for the same speech signal (SP1, 5 s)

Δ         0.002    0.004    0.006    0.008    0.01     0.012    0.014    0.016    0.018    0.02     0.022    0.024    0.026
SNR (dB)  42.4787  41.4811  40.1902  38.8491  37.5699  36.3894  35.3119  34.3299  33.4327  32.6096  31.8509  31.1485  30.4951


Table 6 Different speech signals attacked with different AWGN powers

Signal  AWGN SNR (dB)  SNR between WS & AWS (dB)  BER (%)  NC
SP1     24             24.0276                    0        1
SP2     21             21.0421                    0        1
SP3     19             19.0533                    0        1
SP4     18             18.0711                    0        1
SP5     16             16.0966                    0        1

Fig. 10 Watermarked signal (SP1), AWGN-attacked watermarked signal, and the difference between them

Figure 13 shows the watermarked speech signal (SP4), the watermarked signal with echo and the difference between them, which is very large.

6.2.5 Amplification attack

The amplitude of the watermarked speech signal is rescaled by ±20% or ±30%; a positive scaling rate amplifies the amplitude and a negative rate attenuates it

Table 7 SNR between the watermarked signal and its re-quantized version

Signal  SNR between WS & QWS (dB)  BER (%)  NC
SP1     30.2062                    0        1
SP2     30.1334                    0        1
SP3     29.2238                    0        1
SP4     31.0973                    0        1
SP5     30.2941                    0        1

Fig. 11 The difference between watermarked signal and its quantized version

Table 8 SNR between the watermarked signal and its cropped version

Signal  SNR between WS & CWS (dB)  Nbr     BER (%)  NC
SP1     22.1194                    1,300   0        1
SP2     18.8495                    5,900   0        1
SP3     15.8470                    18,000  0        1
SP4     16.0157                    21,800  0        1
SP5     16.5463                    23,000  0        1

Fig. 12 The difference between the watermarked signal and its cropped version

Table 9 SNR between the watermarked signal and the watermarked signal with added echo

Signal  SNR between WS & EWS (dB)  (Delay, decay)  BER (%)  NC
SP1     7.5334                     (0.4, 0.2)      0        1
SP2     5.4215                     (0.5, 0.5)      0        1
SP3     5.9358                     (0.5, 0.5)      0        1
SP4     4.0261                     (0.6, 0.5)      0        1
SP5     6.9377                     (0.5, 0.4)      0        1

Fig. 13 The difference between watermarked signal and WS with added echo

(http://sound.media.mit.edu/resources/mpeg4/audio/sqam/). Table 10 gives the SNR between the watermarked speech signal and the amplified watermarked speech signal, and shows that all of the BERs are zero and all of the NCs are 1. Although the attack is strong, we can identify our watermark easily. Figure 14 shows the watermarked speech signal (SP5), the amplified watermarked signal and the difference between them; the difference is very large.

Table 10 SNR between the watermarked signal and its amplified version

Signal  SNR between WS & AMWS (dB)  Factor  BER (%)  NC
SP1     19.0363                     +20%    0        1
SP1     19.0363                     −20%    0        1
SP2     21.2669                     +20%    0        1
SP2     21.9868                     −20%    0        1
SP3     22.3301                     +20%    0        1
SP3     22.9700                     −20%    0        1
SP4     22.3303                     +20%    0        1
SP4     22.9702                     −20%    0        1
SP5     16.1312                     +30%    0        1
SP5     17.3911                     −30%    0        1

Fig. 14 The difference between watermarked signal and its amplified version

Fig. 15 Results of our proposed scheme

The experiments show the strength and robustness of the proposed method: the SNR between the watermarked speech signal (WS) and the attacked watermarked speech signal (AWS) is small for all types of strong attacks, indicating that the attacks degrade the signal significantly, to the point where it loses its quality and becomes unusable. In other words, the watermark survives until the signal itself becomes deficient; stronger attacks could eventually affect the watermark, but only at the cost of destroying the signal completely, which would make the attack pointless.

Fig. 16 Results of the proposed scheme in [7]

Table 11 Capacity of the watermarked speech signals (watermark image of 32 × 32 bits)

Speech          SP1    SP2      SP3      SP4      SP5
Capacity (b/s)  204.8  98.4615  68.2667  51.1744  41.1907

6.3 Capacity

All of the capacities are above 20 bits per second and satisfy the IFPI conditions (Table 11). For example, SP1 carries the 32 × 32 = 1024 watermark bits in 5 s, i.e. 204.8 bps.

7 Comparisons

7.1 Comparison with the results of [7]

Figure 15 shows the original speech signal (SP1), the watermarked signal and the difference between them. The difference is extremely small and the watermark is spread uniformly over the entire signal. Figure 16 shows the original speech signal (SP1), the signal watermarked with the scheme of [7] and the difference between them.

Table 12 Comparison with the scheme proposed in [7] based on SNR and capacity

Parameter       Scheme proposed in [7]  Proposed scheme
Δ               0.036                   0.015
SNR (dB)        33.2417                 34.8096
Capacity (b/s)  204.8                   204.8

Table 13 Comparison with the scheme proposed in [7] under different attacks, using speech signal SP1

                                         Scheme proposed in [7]  Proposed scheme
Attack                   Condition       BER (%)   NC            BER (%)  NC
Without attack           –               0         1             0        1
AWGN                     35 dB           0         1             0        1
AWGN                     30 dB           0         1             0        1
AWGN                     24 dB           0.8789    0.9933        0        1
Re-quantization          down (8 bits)   0         1             0        1
Cropping (1300 samples)  beginning       0         1             0        1
Cropping (1300 samples)  random          3.4180    0.9738        0        1
Echo                     (0.3, 0.2)      47.0703   0.6015        0        1
Amplification            +20%            90.8203   0.1156        0        1
Amplification            −20%            96.5820   0.0446        0        1

Nbr: as defined in Sect. 6.2.3 'Cropping attack' (the cropped samples are attacked with AWGN)

The difference is very small on most parts of the signal but very large on some parts, and the watermark is distributed over the signal without uniformity (Tables 12, 13 and 14).

7.2 Comparison with the results in [13]

Figure 17 shows the original speech signal (SP5), the watermarked signal and the difference between them. The difference is extremely small and the watermark is spread uniformly over the entire signal. Figure 18 shows the original speech signal (SP5), the signal watermarked with the scheme of [13] and the difference between them. The difference is very small on most parts of the signal but very large on some parts, and the watermark is distributed over the signal without uniformity (Tables 15 and 16).

Table 14 Comparison of elapsed times (seconds) between the scheme proposed in [7] and the proposed scheme

        Embedding time (s)                Extracting time (s)
Signal  Scheme in [7]  Proposed scheme    Scheme in [7]  Proposed scheme
SP1     2.955488       0.405195           1.006378       0.179673
SP2     3.676715       0.484274           1.033021       0.223196
SP3     4.282779       0.652848           1.051057       0.283282
SP4     4.938818       0.738192           1.088234       0.318821
SP5     5.875246       0.860793           1.145674       0.361185


Table 15 Comparison with the scheme proposed in [13] based on SNR and capacity

Parameter       Scheme proposed in [13]  Proposed scheme
Δ               –                        0.035
SNR (dB)        33.39                    34.6131
Capacity (b/s)  17.2                     41.19067

7.3 Comparison with the results in [8]

Figure 19 shows the original speech signal (SP6), the watermarked signal and the difference between them. The difference is extremely small and the watermark is spread uniformly over the entire signal. Figure 20 shows the original speech signal (SP6), the signal watermarked with the scheme of [8] and the difference between them. The difference is very small on most parts of the signal but very large on some parts, and the watermark is distributed over the signal without uniformity (Tables 17 and 18).

The proposed method works well. It is known that in most watermarking methods there is an inverse proportionality between robustness and imperceptibility. We tried to find a trade-

Table 16 Comparison with the scheme proposed in [13] under different attacks, using speech signal SP5

                                 Scheme proposed in [13] BER (%)  Proposed scheme
Attack           Condition       B*        AS+                    BER (%)  NC
Without attack   –               0         0                      0        1
AWGN             40 dB           6.86      5.71                   0        1
AWGN             36 dB           11.71     9.14                   0        1
AWGN             30 dB           x         18.57                  0        1
Re-quantization  down (8 bits)   17.71     16.00                  0        1
Cropping         8 × 25 ms       x         0                      0        1
Echo             (0.3, 0.2)      0.57      0.57                   0        1
Amplification    +20%            0         0                      0        1
Amplification    −20%            0         0                      0        1

*: Basic detection; +: Adaptive synchronization


Table 17 Comparison with the scheme proposed in [8] based on SNR and capacity

Parameter       Scheme proposed in [8]  Proposed scheme
Δ               0.96                    0.02
SNR (dB)        27.1210                 33.7528
Capacity (b/s)  172.39                  172.39

off by preserving imperceptibility while increasing strength and robustness. To preserve imperceptibility, we exploited the correlation between every two successive samples by sub-sampling the signal. To enhance robustness, we space the paired sub-sample values apart; this is done using Δ. Since each pair of adjacent samples is extremely close, a small Δ is enough to separate the paired values clearly, which gives superior robustness while keeping good signal imperceptibility.

Table 18 Comparison with the scheme proposed in [8] under different attacks, using speech signal SP6

                                         Scheme proposed in [8]  Proposed scheme
Attack                   Condition       BER (%)   NC            BER (%)  NC
Without attack           –               0         1             0        1
AWGN                     30 dB           1.1719    0.9910        0        1
AWGN                     24 dB           2.0508    0.9843        0        1
Re-quantization          down (8 bits)   0         1             0        1
Cropping (1300 samples)  beginning       1.1719    0.9910        0        1
Cropping (1300 samples)  random          0.7813    0.9940        0        1
Echo                     (0.3, 0.2)      8.8867    0.9314        0        1
Amplification            +20%            8.7891    0.9314        0        1
Amplification            −20%            7.4219    0.9423        0        1

Nbr: as defined in Sect. 6.2.3 'Cropping attack' (the cropped samples are attacked with AWGN)


Fig. 17 Results of our proposed scheme

The works published in [9, 16] present good results but with longer execution times, even though their computations are carried out on high-performance computers. This gives a further advantage to our proposed scheme.

8 Conclusion

In this work, we implemented a new scheme for speech signal watermarking using a hybrid approach based on the DWT and DCT algorithms. Sub-sampling is performed before the transform operations, and the watermark is then embedded.

Fig. 18 Results of the scheme proposed in [13]: a original; b difference between (a) and (c); c watermarked

Fig. 19 Results of our proposed scheme

The inverse process is applied, under real conditions, to the watermarked speech signal after strong attacks and added noise. Experimental results on speech signals of different lengths, different types (male and female voices) and different languages (English, French, German) indicate that the proposed scheme is robust against various attacks and noise compared with some recently published works, with good imperceptibility and better embedding capacity, satisfying the IFPI conditions.

Fig. 20 Results of scheme proposed in [8]


References

1. Al-Haj A (2013) A dual transform audio watermarking algorithm. Multimed Tools Appl. https://doi.org/10.1007/s11042-013-1645-z
2. Antony J, Sobin C, Sherly AP (2012) Audio steganography in wavelet domain – a survey. International Journal of Computer Applications 52(13)
3. Brannock E, Weeks M, Harrison R (2008) Watermarking with wavelets: simplicity leads to robustness. IEEE
4. Cai Y-m (2013) An audio blind watermarking scheme based on DWT-SVD. J Softw 8(7). https://doi.org/10.4304/jsw.8.7.1801-1808
5. Debnath L (2002) Wavelet transforms and their applications. Birkhäuser, Boston
6. Deokar SM, Dhaigude B (2015) Blind audio watermarking based on discrete wavelet and cosine transform. International Conference on Industrial Instrumentation and Control (ICIC)
7. Dhar PK (2014) A blind audio watermarking method based on lifting wavelet transform and QR decomposition. 8th International Conference on Electrical and Computer Engineering, 20–22 December
8. Dhar PK, Tetsuya S (2017) Blind audio watermarking in transform domain based on singular value decomposition and exponential-log operations. Radioengineering 26(2). https://doi.org/10.13164/re.2017.0552
9. Elshazly AR, Nasr ME, Fuad MM et al (2016) Synchronized double watermark audio watermarking scheme based on a transform domain for stereo signals. Fourth International Japan-Egypt Conference on Electronics, Communications and Computers (JEC-ECC). IEEE, pp 52–57
10. Hemis M, Boudraa B, Merazi-Meksen T (2015) New secure and robust audio watermarking algorithm based on QR factorization in wavelet domain. Int J Wavelets Multiresolution Inf Process 13:1550020. https://doi.org/10.1142/S0219691315500204
11. Kaur A, Dutta MK, Soni KM, Taneja N (2017) Localized & self adaptive audio watermarking algorithm in the wavelet domain. Journal of Information Security and Applications 33:1–15. https://doi.org/10.1016/j.jisa.2016.12.003
12. Khan MI, Sarker IH, Hasan Furhad KD (2012) A new audio watermarking method based on discrete cosine transform with a gray image. Int J Comput Sci Inf Technol (IJCSIT) 4(4)
13. Lin Y, Abdulla WH. Audio watermark: a comprehensive foundation using MATLAB. Springer, Cham. https://doi.org/10.1007/978-3-319-07974-5. ISBN 978-3-319-07974-5 (eBook)
14. Mehta V, Sharma N (2015) Secure audio watermarking based on Haar wavelet and discrete cosine transform. International Journal of Computer Applications 123(11):30–36
15. Mishra J, Patil PMV, Chitode JS (2013) An effective audio watermarking using DWT-SVD. International Journal of Computer Applications
16. Nair UR, Birajdar GK (2016) Audio watermarking in wavelet domain using Fibonacci numbers. International Conference on Signal and Information Processing (IConSIP). IEEE, pp 1–5
17. Nosrati M, Karimi R, Hariri M (2012) Audio steganography: a survey on recent approaches. World Applied Programming 2(3):202–205
18. Osman MA, Ali NH (2012) Audio watermarking based on wavelet transform. Appl Mech Mater 229-231:2784–2788
19. Pithiya PM, Desai HL (2013) DWT based digital image watermarking, de-watermarking & authentication. International Journal of Engineering Research and Development 7(5):104–109
20. Tiwari A, Sharma M (2012) Comparative evaluation of semi fragile watermarking algorithms for image authentication. J Inf Secur 3(3):189–195. https://doi.org/10.4236/jis.2012.33023
21. Villanueva-Luna AE, Jaramillo-Nuñez A, Sanchez-Lucero D, Ortiz-Lima CM, Aguilar-Soto JG, Flores-Gil A, May-Alarcon M (2011) De-noising audio signals using MATLAB wavelets toolbox. In: Assi A (ed) Engineering education and research using MATLAB. InTech. ISBN 978-953-307-656-0
22. Vivekananda Bhat K, Sengupta I, Das A (2011) An audio watermarking scheme using singular value decomposition and dither-modulation quantization. Multimed Tools Appl 52:369. https://doi.org/10.1007/s11042-010-0515-1
23. Wang J, He J (2016) A speech content authentication algorithm based on a novel watermarking method. Multimed Tools Appl. https://doi.org/10.1007/s11042-016-4027-5
24. Wang X, Wang P, Zhang P, Yang H (2012) A blind audio watermarking algorithm by logarithmic quantization index modulation. Multimed Tools Appl. https://doi.org/10.1007/s11042-012-1259-x
25. Xiang S, Huang J (2007) Robust audio watermarking against the D/A and A/D conversions. CoRR abs/07070397
26. Yang H, Bao D, Wang X, Niu P (2012) A robust content based audio watermarking using UDWT and invariant histogram. Multimed Tools Appl 57:453–476. https://doi.org/10.1007/s11042-010-0644-6
27. Zhao H, Wang F, Chen Z, Liu J (2014) A robust audio watermarking algorithm based on SVD-DWT. Electr Electr Eng 20(1):75–80

A. MERRAD received his Licence and MSc degrees in computer science (2012 and 2014, respectively) from the Ziane Achour University of Djelfa (Algeria). He is working towards his PhD degree in computer science and communications. His interests include artificial intelligence and soft computing with applications to signal/image processing.

S. SAADI received his BSc degree in Electrical and Electronics Engineering from INELEC, University of Boumerdes (Algeria), in 1993, and his MSc and PhD degrees in Electronics and Signal Processing from the University of Blida (Algeria) in 2005 and 2012. His interests include soft computing applied to signal/image processing, reconfigurable hardware and software implementation of digital signal processing tools, embedded electronics, etc. He worked as an engineer in industrial instrumentation for more than 10 years in the Algerian power systems company. He is currently with the Department of Electronics, Faculty of Sciences and Technology, University of Djelfa, Algeria, as an Associate Professor in Signal Processing and Computer Science.