
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 7, NO. 4, MAY 1989

A New Speech Signal Scrambling Method for Secure Communications: Theory, Implementation, and Security Evaluation

ENRICO DEL RE, SENIOR MEMBER, IEEE, ROMANO FANTACCI, MEMBER, IEEE, AND DAMIANO MAFFUCCI

Abstract-This paper presents a two-dimensional signal scrambling method implemented by digital signal processing techniques that eliminate the need for frame synchronization without impairing security. Such techniques include short-time Fourier analysis and the filter bank concept. The paper also discusses the use of special digital FIR filters which make it possible to implement the system algorithm completely via commercial processor software. As a result, the system can be configured with very little hardware. Finally, methods for determining available keyspace and selecting keys are also presented.

I. INTRODUCTION

WITH security an ever more vital requisite of communications systems, speech scramblers are gaining widespread acceptance as a means of enhancing protection in both military and civilian applications [1], [2]. A case in point is mobile radio communications, where the virtual impossibility of providing eavesdropping protection makes security dependent on scrambling effectiveness.

Speech scrambling methods may be classified under two major headings.

1) Analog Scrambling: In "analog" scrambling, the only real analog operation is signal transmission since processing is carried out digitally. Incoming speech signals are digitized, processed by a special algorithm, converted, and transmitted to a receiver, where they are digitized again, inversely processed, and reconverted to analog form for reconstruction.

2) Digital Encryption: In digital encryption, the signal is digitized and compressed to reduce the bit rate. The cipher modifies the sequence bit series by means of block or stream (multiregister, nonlinear, or combined) ciphering. The modified sequence is then transmitted via digital modulation. The drawback to this admittedly secure method is that it relies on a signal transmission system totally unlike the one used in conventional radio telephones. In fact, unless there is a provision for speech compression, which, however, considerably complicates processing and alters voice quality, the bandwidth of the ciphered signal far exceeds that of the clear signal [2].

The degree of security (deciphering difficulty) provided by a speech encryption system is related to 1) the amount of intelligibility left over in the encrypted signal (residual intelligibility) and 2) the number of keys available for encryption (keyspace). Generally speaking, it can be said that the lower a scrambling system's residual intelligibility and the bigger its keyspace, the higher its degree of security. However, since readily decipherable unintelligible signals may also be generated in large keyspaces, other factors, including bandwidth expansion, delay times, channel resistance (to noise, distortion, etc.), and reconstructed speech quality, cannot be ignored in assessing security.

Manuscript received August 1, 1988; revised January 15, 1989. This work was supported by OTE-ISC SPA. E. Del Re and R. Fantacci are with the Dipartimento di Ingegneria Elettronica, Università di Firenze, Firenze, Italy. D. Maffucci is with OTE SPA, Firenze, Italy. IEEE Log Number 8927173.

II. THEORY AND IMPLEMENTATION OF THE METHOD

Current speech scramblers rely on techniques such as frame synchronization to provide security, despite the fact that synchronization considerably complicates the implementation of the scrambling method and, in addition, links speech quality to channel condition. The method proposed in this paper avoids synchronization. It affords notable security, while causing no increase in signal bandwidth and introducing only a slight (approximately 100 ms) delay in transmission.

The proposed scrambling method is derived from that proposed by Lee et al. [4]. However, a different implementation structure is achieved; in particular, different digital filters are used to enhance the performance and flexibility of the method. The major characteristic of the scrambling method proposed by Lee et al. in [4] is that the number of available frequency bands is far higher than in conventional techniques. The method is implemented by dividing the spectrum into subbands and thereafter altering their relative positions. Security is enhanced by the use of time-domain ciphering to create a two-dimensional scrambling system. At the same time, frame synchronization is wholly avoided by the use of a fast Fourier transform (FFT), which computes a bank of N filters of FIR type resembling the digital vocoder in [6]; the end result is a notable increase in the computational efficiency of the signal spectral analysis [5].

0733-8716/89/0500-0474$01.00 © 1989 IEEE

Fig. 1. The kth filter bank channel.

Fig. 2. The filter bank for analysis and synthesis procedures.

In implementing the proposed speech signal scrambling method, the following relationship must exist between input and output signals to reconstruct signal $x(n)$ (Fig. 1):

$$x(n) = y(n) = \frac{1}{N}\sum_{k=0}^{N-1} y_k(n) = \frac{1}{N}\sum_{k=0}^{N-1} X_k(n)\, e^{j\omega_k n} = \frac{1}{N}\sum_{k=0}^{N-1}\sum_{m=-\infty}^{+\infty} x(m)\, h(n-m)\, e^{-j\omega_k m}\, e^{j\omega_k n} \tag{1}$$

where $\omega_k = 2\pi k/N$. While an ideal low-pass filter with cutoff frequency $\omega_c = \pi/N$ is not an essential requisite, it is necessary that the finite impulse response (FIR) filters satisfy the first Nyquist criterion

$$h(0) = 1, \qquad h(n) = 0 \quad \text{for } n = \pm N, \pm 2N, \pm 3N, \ldots \tag{2}$$
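The role of the Nyquist criterion in (1) can be checked numerically. The following is only a minimal sketch: the Hamming-windowed sinc, the function names `nyquist_lowpass` and `filter_bank_output`, and the test signal are illustrative assumptions, not the filters used by the authors. Summing (1) over k leaves only the terms with n - m a multiple of N, so y(n) equals x(n) whenever h(0) = 1 and h(rN) = 0 for r ≠ 0.

```python
import cmath
import math

def nyquist_lowpass(N, L):
    """Hamming-windowed sinc on n = -LN..LN (assumed filter), h(0)=1, h(rN)=0."""
    M = 2 * L * N + 1
    h = {}
    for i in range(M):
        n = i - L * N
        sinc = 1.0 if n == 0 else math.sin(math.pi * n / N) / (math.pi * n / N)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (M - 1))
        h[n] = sinc * window
    return h

def filter_bank_output(x, h, N):
    """Evaluate y(n) directly from (1); slow, intended only as a check."""
    y = []
    for n in range(len(x)):
        acc = 0j
        for k in range(N):
            wk = 2 * math.pi * k / N
            # X_k(n): x(m) weighted by h(n - m) and demodulated by exp(-j wk m)
            Xk = sum(x[m] * h.get(n - m, 0.0) * cmath.exp(-1j * wk * m)
                     for m in range(len(x)))
            acc += Xk * cmath.exp(1j * wk * n)
        y.append(acc / N)
    return y

x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
h = nyquist_lowpass(N=4, L=2)
y = filter_bank_output(x, h, N=4)
print(max(abs(yn - xn) for yn, xn in zip(y, x)))  # tiny when (2) holds
```

Any other filter with zeros at the nonzero multiples of N would pass the same check, which is why the text does not insist on an ideal low-pass response.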

In accordance with the Shannon sampling theorem, since signals $X_k(n)$, $k = 1, 2, \ldots, N$, are limited over the frequency range $-\pi/N < \omega < \pi/N$, they do not have to be computed more than once every $R$ samples, with $R \le N$. Signal $y(n)$ is then constructed by interpolation. Two sampling frequencies are used: frequency $f_s$ for input and output signals and frequency $f_s/R$ for spectral components.

If the number of frequency bands $N$ is a power of 2, then both the analysis and synthesis sections can be efficiently computed using the following procedure [6]. Let the low-pass filter be FIR with a length of $M = 2LN + 1$ ($L$ being an integer). The most recent $M$ samples of signal $x(n)$ are multiplied by the window $h(-n)$. The resulting weighted sequence is partitioned into sections of length $N$ each. These sections are then added together, each sample with its counterpart in the other $(2L - 1)$ sections. An $N$-point sequence, a function of the time index $n$ and denoted as $x_m(n)$, $m = 0, 1, \ldots, N - 1$, is thus obtained. The sequence $x_m(n)$ is circularly shifted (in $m$) modulo $N$ by $n$ samples to obtain the new sequence

$$\tilde{x}_m(n) = x_{[m-n]}(n) \tag{3}$$

where $[p]$ is $p$ modulo $N$. By computing the $N$-point FFT of the sequence $\tilde{x}_m(n)$, $m = 0, 1, \ldots, N - 1$, we obtain the $N$ complex values $X_p$, $p = 0, 1, \ldots, N - 1$ (Fig. 2), which represent the frequency components of the speech signal. As noted above, this procedure need be repeated only once every $R$ samples, where $R \le N$.

The synthesis procedure which reconstructs the speech signal from its frequency components $X_p(kR)$ can be described as follows. A computation is made of the inverse $N$-point FFT of the complex vector $X_p(kR)$, $p = 0, 1, \ldots, N - 1$, sampled at the rate of $f_s/R$. From this, the vector $s_m(kR)$, $m = 0, 1, \ldots, N - 1$, is obtained. Let the interpolating filter be FIR, of length $2QR + 1$ with integer $Q$, and let $f(n)$, $n = 0, 1, \ldots, 2QR$, be its impulse response. $R$ output samples are then obtained from the $2Q$ most recent vectors $s_m(kR)$ using the formula:

[Equation (4) is not legible in the source.]
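The analysis procedure described above (window, fold into sections of length N, circularly shift, transform) can be sketched and compared against the direct sum in (1). This is a hedged illustration: the function names, the Hamming window, and the plain O(N²) DFT standing in for the FFT are assumptions, and the shift is taken as `folded[(m - n) mod N]`, which is the form that follows from (1) when the window is centred on the current sample.

```python
import cmath
import math

def window(N, L):
    """Assumed analysis window: Hamming, indexed n = -LN..LN (length 2LN + 1)."""
    M = 2 * L * N + 1
    return {i - L * N: 0.54 - 0.46 * math.cos(2 * math.pi * i / (M - 1))
            for i in range(M)}

def dft(seq):
    """Plain O(N^2) DFT, standing in for the N-point FFT."""
    N = len(seq)
    return [sum(seq[p] * cmath.exp(-2j * math.pi * k * p / N) for p in range(N))
            for k in range(N)]

def analysis_direct(x, h, n, N):
    """X_k(n) = sum_m x(m) h(n - m) exp(-j w_k m), as in (1)."""
    return [sum(x[m] * h.get(n - m, 0.0) * cmath.exp(-2j * math.pi * k * m / N)
                for m in range(len(x)))
            for k in range(N)]

def analysis_folded(x, h, n, N, L):
    """Same X_k(n) via windowing, folding, circular shift, and one DFT."""
    folded = [0.0] * N
    for r in range(-L * N, L * N + 1):          # offsets around time n
        sample = x[n + r] if 0 <= n + r < len(x) else 0.0
        folded[r % N] += sample * h.get(-r, 0.0)  # weight by h(-r), add sections
    shifted = [folded[(m - n) % N] for m in range(N)]  # circular shift by n
    return dft(shifted)

x = [math.sin(0.2 * n) + 0.3 * math.cos(0.9 * n) for n in range(64)]
h = window(N=8, L=2)
a = analysis_direct(x, h, n=20, N=8)
b = analysis_folded(x, h, n=20, N=8, L=2)
print(max(abs(p - q) for p, q in zip(a, b)))  # agreement up to rounding error
```

The folded version touches each of the M windowed samples once and then performs a single N-point transform, which is where the computational saving over the direct sum comes from.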

In (4), $\lfloor k \rfloor$ denotes the largest integer contained in $k$; of the accompanying definitions, only the fragment $L_1(n) = \ldots - Q + 1$ is legible in the source.

Frequency components $X_p$, $p = 0, 1, \ldots, N - 1$, undergo permutation prior to a synthesis operation in which vector $X_p$ is multiplied by the scramble matrix $M$ in the cipher and by the descramble matrix $M^{-1}$ in the decipher. Since there is no synchronization, a timing difference is generated in the decimation process by the band permutations. This results in a phase error in the descrambled signal, which, however, is too small to be perceived by the human ear [4].

The major differences of the proposed scrambling method with respect to similar approaches are the following. The residual intelligibility of the scrambled signal is further reduced by time-domain ciphering. In this operation, a two-dimensional scrambler is created by adding a masking signal to the output. Considerable memory can be saved by ciphering on the $X_p$ components because of their low sampling rate. To get around synchronization, the masking signal is a variably delayed weighted version of the transmitted signal; the delayed weighted components are thereafter subtracted from the component actually received by the descrambler (Fig. 3).

Fig. 3. (a) Cipher for time-domain ciphering. (b) Decipher.

Due to the decimation process, signal reconstruction in the synthesis section is necessarily imperfect; in fact, neither the low-pass filter $h(n)$ nor the interpolating filter $f(n)$ is perfect. Furthermore, even if a Nyquist low-pass filter were used and there were no decimation process, signal reconstruction at the receiving end could not be perfect due to permutation. Thus, sharper filters, such as the FIR type (designed using the equiripple approximation method [ti]), may be employed instead of theoretical filters (i.e., the windowed sinc function); the maximum decimation factor $R = N$ can then be chosen without sacrificing signal quality significantly. Moreover, if $R = N$, another ciphering operation can be more easily introduced: a frequency component $X_p(kN)$ can be alternatively multiplied by $\pm 1$ to cause the inversion of that band. Finally, in order to avoid band expansion, permutation has been restricted to a number $P$ of central bands, while the extreme bands have been set at zero.

III. KEY SELECTION AND KEYSPACING EVALUATION

The procedure of key selection is the most important problem to be solved in any scrambling method. The key selection must be performed taking into account the following two parameters.

1) It must take into account residual signal intelligibility after scrambling, which is substantially related to the amount of shifts of the subbands (frequency scrambling efficiency).

2) It must take into account permutation security, i.e., the difficulty of finding easily the unknown key used in the scrambling and the unlikeliness that a scrambled signal may be transformed into a sufficiently intelligible signal using a different key. Permutation security can be related to the "distance" of the used keys in the keyspace.

To some extent the two parameters are conflicting: the maximum scrambling efficiency (minimum residual intelligibility) is provided by a spectral inversion, which is, however, highly insecure. Hence, the problem of key selection consists in determining the set of useful keys contained in the space of all possible keys, such that it guarantees low residual intelligibility and high security.

The key selection procedure is the following. Keyspace evaluation was performed considering the bandwidth of the speech signal, divided into 2 groups (A and B) each containing 12 subbands. Of the 12 subbands in group A, the 6 low-frequency subbands were classified as significant (AS), and 6 were classified as insignificant (ANS); in group B, all subbands were classified as insignificant. The distinction of insignificant subbands between groups A and B is fundamental, as their respective roles are quite different for the keyspace selection, as will be clarified at the end of this section.

The frequency scrambling efficiency was measured for intelligibility using a weighted shift parameter (WSP). Weighting was determined on the basis of the type of shift as follows.

Weight P1 was associated with shifts of 1) a subband in B within B, 2) an ANS subband within A, and 3) an AS to an ANS subband.

Weight P2 (P2 < P1) was associated with a shift of one AS subband to another AS subband.

The shift weights may be correlated with subband number as

$$P1 = (1/a)\, NNS, \qquad P2 = P1/4$$

where NNS is the number of insignificant subbands (i.e., 18) and $a$ is an experimentally derived weight (i.e., 150), so that

$$P1 = 0.12, \qquad P2 = 0.03. \tag{6}$$

The WSP is defined as the sum of subband shifts (neglecting the direction of shift) divided by the total number of subbands (i.e., 24) and multiplied by the shift weights. A high WSP, the highest being encountered in the case of spectral inversion, indicates a high number of subband shifts from one group to another. This improvement in signal unintelligibility is, however, accompanied by a reduction in permutation key security.

A simple procedure allows determination of the security parameter, the weighted mobile average (WMA). Consider a local scrambled sequence of four elements such as $S_1$, $S_2$, $S_3$, and $S_4$. The mean distance between the original positions of $S_1$ and the other subbands is calculated as

[Equation (7) is missing from the source.]

and the procedure is repeated for each group of four adjacent subbands, shifting one position each time. The minimum value of the calculated 21 local means is the WMA. In order to account for the greater importance of the significant subbands, each local mean must also be weighted according to the number of significant subbands it contains. With only one significant subband, the weight is unitary. With two or more significant subbands, the weight decreases 1) in relation to increases in the number of significant subbands and 2) in relation to the positions of the significant subbands; in the case of adjacent subbands, the weight decrement is even greater. Weights can be expressed as follows:

$$W = \begin{cases} 1 & \text{if } N \le 1 \\ W_1 = WK & \text{if } N = 2 \text{ and } NA = 0 \\ W_2 = WK/2^{N-1} & \text{if } N > 2 \text{ and } NA = 0 \\ \left(W_0 + W_0/(1 + NA)\right)/(1 + NA) & \text{if } N \ge 2 \text{ and } NA \ge 1 \end{cases} \qquad W_0 = \begin{cases} W_1 & \text{if } N = 2 \\ W_2 & \text{if } N > 2 \end{cases} \tag{8}$$

where $W$ is the weight to be applied to the local mean, $WK$ is a constant equal to 0.5, $N$ is the number of significant subbands present in the local mean, and $NA$ is the number of pairs of significant subbands resulting adjacent in the local mean.

High WSP and WMA are a sign of good keyspacing. These parameters increase at different rates. Too great an increase in the WSP creates an overconcentration of originally adjacent subbands. The end result is a decrease in the WMA and hence reduced key security. Permutation keyspacing limits were set by experimental testing designed to verify the calculated WSP and WMA values. In addition, the following constraints were established.

1) No significant subband must remain in its original position.

2) A maximum of two significant subbands (preferably originally nonadjacent) can remain in the AS cluster.

3) A minimum of 8 (and a maximum of 16) subbands must change group (4 or 8 from A, 4 or 8 from B).

4) At least two of the four subbands in A must be significant.

5) No more than two significant subbands must exist in a cluster of four.

The number of possible permutations on 24 elements is 24!. Dividing these into two 12-element groups and permuting each subband gives

$$12! \cdot 12!.$$

Since some of the elements are compelled to change group, the number of possible permutations $K$ is

$$12! \cdot 12! < K < 24! \tag{9}$$

and, since the number of elements changing group is between four and eight,

$$K = (12! \cdot 12!) \sum_{z=4}^{8} \binom{12}{z}^{2}. \tag{10}$$

This equation can be generalized as

$$K = NA! \cdot NB! \sum_{z=v_i}^{v_s} \binom{NA}{z}\binom{NB}{z} \tag{11}$$

where NA and NB are the number of elements in A and B and $v_s$ and $v_i$ are, respectively, the upper and lower bounds of the number of elements changing group. The approximations made with the above constraints cause a decrease in the keyspace estimate so that

[Equation (12), giving $K_{\min}$ as a product involving the constants $m$ and $N_e$, is not legible in the source. The surviving fragments are the leading factors $K_{\min} = m \cdot N_e \cdots$, a factor $6! \cdot 6!$, a sum with upper limit 8, and the index definition $h = 0$ if $z \le 6$, $h = z - 6$ if $z > 6$. The values of $m$ and $N_e$ survive only as the fragments "m = -6.8" and "N = 38. 10^7".]


The value of the numeric constant $m$ has been derived by using constraints 1) and 2) and by making some pessimistic assumptions. The value of the numeric constant $N_e$ has been obtained starting from constraint 5), again under pessimistic assumptions. The interested reader can find a detailed derivation of this constant in [8]. Equation (12) was obtained under pessimistic assumptions of subband distribution, thereby causing an underestimation of available keyspace, given by

$$K_{\min} = 1.3 \cdot 10^{2?}. \tag{13}$$

[The second digit of the exponent in (13) is not legible in the source.]

With the last constraint, i.e., that the original positions of the two significant subbands remaining in the AS cluster be nonadjacent, the estimate is further decreased to

$$K_{\min} = 2.3 \cdot 10^{?}. \tag{14}$$

[The exponent in (14) is not legible in the source.]
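The unconstrained counts in (9)-(11) are straightforward to evaluate exactly. The sketch below is illustrative only: the function name `keyspace` and its argument names are assumptions, and it covers (9)-(11) but not the constrained estimates $K_{\min}$, whose defining equation (12) is not reproduced here.

```python
from math import comb, factorial

def keyspace(n_a, n_b, v_lo, v_hi):
    """Evaluate (11): NA! * NB! * sum_{z=v_lo}^{v_hi} C(NA, z) * C(NB, z)."""
    exchanges = sum(comb(n_a, z) * comb(n_b, z) for z in range(v_lo, v_hi + 1))
    return factorial(n_a) * factorial(n_b) * exchanges

K = keyspace(12, 12, 4, 8)                 # equation (10)
lower, upper = factorial(12) ** 2, factorial(24)
print(lower < K < upper)                   # bounds of (9): True
print(f"{float(K):.3e}", f"{K / upper:.3f}")

# Letting z run over 0..12 recovers every permutation of 24 elements,
# since sum_z C(12, z)^2 = C(24, 12) (Vandermonde's identity).
print(keyspace(12, 12, 0, 12) == upper)    # True
```

The last line is a useful sanity check on the form of (11): restricting z to 4..8 only removes the permutations in which fewer than four or more than eight elements of each group change group.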

A consequence of introducing the WSP and WMA parameters is that several conditions regarding frequency scrambling must be satisfied during key encryption. These constraints determine a reduction in the theoretical keyspace without providing a solution to the problem of key diversity. In Fig. 4, where $K_T$ is the theoretical keyspace and $K_U$ is the set of useful keys satisfying the constraints imposed on the frequency scrambling, it can be seen that $K_D$, which is a subset of $K_U$ containing a set of "distinct" useful keys from which the encryption keys are taken, is characterized by the additional constraint of encryption key diversity.

Fig. 4. Key domains.

As illustrated in Fig. 5, with key $[K_1]$ belonging to the set of useful keys ($K_D$) and its inverse $[K_1]^{-1}$ to the set of inverse keys ($K_D^{-1}$), inverse key $[K_1]^{-1}$ must be such that only its exact counterpart $[K_1]$ can be retrieved in $K_D$; in other words, once generic key $[K_1]^{-1}$ has been assigned, only $[K_1]$ may be retrieved. Hence, the following general conditions may be stated.

1) Each key in $K_D$ must produce an unintelligible encrypted signal.

2) Each inverse key in $K_D^{-1}$ must produce an unintelligible signal if applied to a signal encrypted with a direct key not corresponding to the inverse key.

Having satisfied the conditions for creating an unintelligible encryption signal, 1) and 2) can be applied in the following way. With keys $[K_1]$ and $[K_2]$ belonging to set $K_D$, the application of $[K_2]^{-1}$ to $[K_1]$ must result in a permutation exceeding the unintelligibility threshold so that

$$[K_3] = [K_2]^{-1} \cdot [K_1]. \tag{15}$$

In other words, the WSP of permutation $[K_3]$ must exceed the unintelligibility threshold. Consequently, in generating set $K_D$, it is also necessary to respect key diversity constraints. Fortunately, however, given the definition of key diversity, virtually all of the results obtained for frequency scrambling may be used for key diversity.

The diversity constraint was determined as follows.

1) Several direct permutations were extracted from set $K_U$.

2) Modifications were made on the corresponding inverse permutations in order to determine the manipulation to achieve unintelligible reconstructed signals.

3) The conditions determined in 2) were transformed into constraints in the direct key structures.

Two conditions must be satisfied if encryption keys $[K_1]$ and $[K_2]$ are to differ.

1) If $a$, $b$, $c$, $d$, $e$, and $f$ indicate the positions occupied by the significant elements in $[K_1]$, then the same positions in permutation $[K_2]$ may be occupied by a maximum of two significant elements, neither of which is in the same position as in $[K_1]$.

2) At least two-thirds of the remaining positions occupied by the insignificant elements in $[K_1]$ must be modified in permutation $[K_2]$, so that the insignificant elements coincide in a maximum of six positions.

If keys $[K_1]$ and $[K_2]$ satisfy these two conditions, then the permutation $[K_3]$ in (15) has the following characteristics. No significant element regains its original position, and a maximum of two of the first six elements are significant; this means that the local constraint on the significant elements is satisfied. At most six insignificant elements regain their original positions. In the most favorable case, none of the elements in permutation $[K_3]$ should return to its original position. However, it is not essential to satisfy this condition, which would entail identical constraints on insignificant and significant elements, since not all insignificant elements are compelled to change position in different keys to achieve key diversity.

Some conclusions may be drawn regarding the foregoing constraints. Key diversity does not require modification of all positions of the insignificant elements; the positions of a maximum of six may coincide in different keys. The mere fact that limitations are imposed on the mobility of the significant elements' positions does not suffice to guarantee efficient encryption of transmitted signals.
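The relation between the two key-diversity conditions and the stated properties of $[K_3]$ can be explored with a small sketch. Everything below is an illustrative assumption rather than the authors' implementation: a key is stored as a list `k` in which `k[pos]` is the original subband placed at position `pos`, subbands 0..5 are taken to be the significant ones, and the function names are hypothetical.

```python
import random

N_SUB, N_SIG = 24, 6   # assumption: subbands 0..5 are the significant ones

def compose(k1, k2):
    """[K3] = [K2]^-1 . [K1]: scramble with k1, then descramble with k2's inverse."""
    inv2 = [0] * len(k2)
    for pos, band in enumerate(k2):
        inv2[band] = pos                      # position where k2 puts each band
    return [k1[inv2[i]] for i in range(len(k1))]

def diversity_ok(k1, k2):
    """Check the two key-diversity conditions for a pair of keys."""
    pairs = list(zip(k1, k2))
    shared_sig = sum(a < N_SIG and b < N_SIG for a, b in pairs)
    same_sig = sum(a == b and a < N_SIG for a, b in pairs)
    same_insig = sum(a == b and a >= N_SIG for a, b in pairs)
    return shared_sig <= 2 and same_sig == 0 and same_insig <= 6

def k3_properties(k3):
    """Count the quantities that the text bounds for the residual permutation."""
    return {
        "significant_fixed": sum(k3[i] == i for i in range(N_SIG)),
        "significant_in_first_six": sum(b < N_SIG for b in k3[:N_SIG]),
        "insignificant_fixed": sum(k3[i] == i for i in range(N_SIG, N_SUB)),
    }

rng = random.Random(0)
k1 = rng.sample(range(N_SUB), N_SUB)
k2 = rng.sample(range(N_SUB), N_SUB)
print(diversity_ok(k1, k2), k3_properties(compose(k1, k2)))
print(compose(k1, k1) == list(range(N_SUB)))  # matching key restores the order
```

Under this convention a fixed point of $[K_3]$ arises exactly where both keys place the same subband at the same position, which is why limiting such coincidences in the direct keys bounds the number of subbands that return to their original positions.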
