Hash Functions Based on Large Quasigroups - Semantic Scholar

2 downloads 11920 Views 242KB Size Report
Their simple and cheap hardware design is often preferred in real-world applications. ... possible to produce a fixed length digital signature that depends on the ...
Hash Functions Based on Large Quasigroups V´aclav Sn´asˇel1 , Ajith Abraham2 , Jiˇr´ı Dvorsk´y1 , Pavel Kr¨omer1 , and Jan Platoˇs1 1

2

ˇ – Technical University of Ostrava, Department of Computer Science, FEECS, VSB 17. listopadu 15, 708 33 Ostrava-Poruba, Czech Republic {vaclav.snasel}@vsb.cz Department of Computer Science Norwegian University of Science and Technology, Trondheim, Norway [email protected]

Abstract. In this article we discuss a simple hash function based upon properties of a well-known combinatorial design called quasigroups. The quasigroups are equivalent to the more familiar Latin squares and one of their most important properties is that all possible element of certain quasigroup occurs with equal probability. Actual implementations are based on look-up table implementation of the quasigroup, which is unusable for large quasigroups. In contrast, presneted hash function can be easily implemented. It allows us to compute hash function without storing large amount of data (look-up table). The hash function computation is illustrated by experiments summarized in the last section of this paper.

1 Introduction The need for random and pseudorandom sequences arises in many applications, e.g. in modelling, simulations, and of course in cryptography. Pseudorandom sequences are the core of stream ciphers. They are popular due to their high encryption/decryption speed. Their simple and cheap hardware design is often preferred in real-world applications. The design goal in stream ciphers is to efficiently produce pseudorandom sequences keystreams (i.e. sequences that possess properties common to truly random sequences and in some sense are ”indistinguishable” from these sequences). Hash functions map a large collection of messages into a small set of message digests and can be used for error detection, by appending the digest to the message during the transmission (the appended digest bits are also called parity bits). The error will be detected if the digest of the received message, in the receiving end, is not equal to the received message digest. This application of hash functions is only for random errors, since an active spoofer may intercept the transmitted message, modify it as he wishes, and resend it appended with the digest recalculated for the modified message. With the advent of public key cryptography and digital signature schemes, cryptographic hash functions gained much more prominence. Using hash functions, it is possible to produce a fixed length digital signature that depends on the whole message and ensures authenticity of the message. To produce digital signature for a message M , the digest of M , given by H(M ), is calculated and then encrypted with the secret key of the sender. Encryption may be either by using a public key or a private key algorithm. Encryption of the digest prevents active intruders from modifying the message G. Allen et al. (Eds.): ICCS 2009, Part I, LNCS 5544, pp. 521–529, 2009. c Springer-Verlag Berlin Heidelberg 2009 

522

V. Sn´asˇel et al.

and recalculating its check sum accordingly. It effectively divides the universe into two groups: outsiders who do not have access to the key of the encryption algorithm and hence cannot effectively produce a valid checksum, and insiders who do have access to the key and hence can produce valid checksums. We note that in a public key algorithm, the group of insiders consists of only one member (the owner of the private key) and hence the encrypted hash value uniquely identifies the signer. In the case of symmetric key algorithms, both the transmitter and the receiver have access to the secret key and can produce a valid encrypted hash for an arbitrary message and therefore, unique identification based on the encrypted hash is not possible. However, an outsider cannot alter the message or the digest. In the study of hash functions, Information Theory and Complexity Theory are two major approaches. The methods based on information theory provide unconditional security — an enemy cannot attack such systems even if he/she has unlimited power. This approach is generally impractical. In the second approach, some assumptions are made based on the computing power of the enemy or the weaknesses of the existing systems and algorithms, and therefore, the security cannot be proven but estimated by the analysis of the best known attacking algorithms and considering the improvements of the hardware and softwares. In other words, hash functions based on complexity theory are computationally secure. In this paper, we concentrate on the second approach. 1.1 Definitions Definition 1. A function H() that maps an arbitrary length message M to a fixed length hash value H(M ) is a OneWay Hash Function (OWHF), if it satisfies the following properties: 1. The description of H() is publicly known and should not require any secret information for its operation. 2. Given M , it is easy to compute H(M ). 3. Given H(M ) in the rang of H(), it is hard to find a message M for given H(M ), and given M and H(M ), it is hard to find a message M  (= M ) such that H(M  ) = H(M ). Definition 2. A function H() that maps an arbitrary length message M to a fixed length hash value is a Collision Free Hash Function (CFHF), if it satisfies the following properties: 1. The description of H() is publicly known and should not require any secret information for its operation. 2. Given M , it is easy to compute H(M ). 3. Given H(M ) in the rang of H(), it is hard to find a message M for given H(M ), and given M and H(M ), it is hard to find a message M  (= M ) such that H(M  ) = H(M ). 4. It is hard to find two distinct messages M and M  that hash to the same result (H(M ) = H(M  )).

Hash Functions Based on Large Quasigroups

523

2 Construction of Hashing Function Based on Quasigroup Definition 3. Let Q be a nonempty set with one binary operation (∗). Then Q is said to be a grupoid and is denoted by (Q, ∗). Definition 4. A grupoid (Q, ∗) is said to be a quasigroup (i.e. algebra with one binary operation (∗) on the set Q) satisfying the law: (∀u, v ∈ Q)(∃!x, y ∈ Q)(u ∗ x = v ∧ y ∗ u = v). This implies: 1. x ∗ y = x ∗ z ∨ y ∗ x = z ∗ x ⇒ y = z 2. The equations a ∗ x = b, y ∗ a = b have an unique solutions x, y for each a, b ∈ Q. However, in general, the operation (*) is neither a commutative nor an associative operation. Quasigroups are equivalent to the more familiar Latin squares. The multiplication table of a quasigroup of order q is a Latin square of order q, and conversely, as it was indicated in [1,2,8], every Latin square of order q is the multiplication table of a quasigroup of order q. Definition 5. Let A = {a1 , a2 , . . . , an } be a finite alphabet, a k × n Latin rectangle is a matrix with entries aij ∈ A, i = 1, 2, . . . , k, j = 1, 2, . . . , n, such that each row and each column consists of different elements of A. If k = n we say a Latin square instead of a Latin rectangle. Latin square is called reduced (or in standard form) if both the first row and the left column are in some standard order, alphabetical order being convenient. All reduced Latin squares of order n are enumerated for n ≤ 10 as it is shown in [3]. Let Ln be the number of Latin squares of order n, and let Rn be the number of reduced Latin squares of order n. It is easy to see that Ln = n!(n − 1)!Rn . The problem of classification and exact enumeration of quasigroups of order greater than 10 probably still remains unsolved. Thus, there are more then 1090 quasigroups of order 16 and if we take an alphabet A = {0 . . . 255} (i.e. data are represented by 8 bits) there are at least 256!255! . . . 2! > 1058000 quasigroups. Multiplication in quasigroups has important property: It is proved that each element occurs exactly q times among the products of two elements of Q, q 2 times among the products of three elements of Q and, generally q t−1 among the products of t elements of Q. Since there are q t possible ordered products of t elements of Q, this shows that each element occurs equally often among these q t products (see [4]). Definition 6. Let HQ () : Q → Q be projection defined as HQ (q1 q2 . . . qn ) = ((. . . (a ∗ q1 ) ∗ q2 ∗ . . .) ∗ qn Then HQ () is said to be hash function over quasigroup (Q, ∗). The element a is a fixed element from Q.

524

V. Sn´asˇel et al.

Example 1. Quasigroup of modular subtraction has following table representation: 0 1 2 3

3 0 1 2

2 3 0 1

1 2 3 0

The table above defines quasigroup because it satisfies conditions to be Latin Square. Multiplication in the quasigroup is defined in following manner: a ∗ b = (a + 4 − b) mod 4. It is obvious that the quasigroup is neither commutative (1 ∗ 2 = 3, 2 ∗ 1 = 1) nor associative. Value of hash function is H2 (0013) = (((2 ∗ 0) ∗ 0) ∗ 1) ∗ 3 = 2. 2.1 Sketch of Proof of Resistance to Attacks Hash function based on quasigroup is iterative process which computes hash value (digest) for message X = x1 x2 . . . xn . Suppose that HQ (X) = d. Hash function is preimage resistant when it is ”impossible” to compute from given digest source message X. The digest d should be factorized into message Y = y1 y2 . . . yn . In the first step we can divide digest d into two parts y1 and α1 , where d = y1 ∗ α1 . In the second step value α1 needs to be divided into y2 and α2 (α1 = y2 ∗ α2 ) and so for each element yi , 1 ≤ i ≤ n. Because each yi has a same probability of occurrence among products of Q, |Q|n possible choices should be checked to obtain message Y . Definition 7. Quasigroups Q and R are said to be homotopic, if there are permutations satisfying the law: (∀u, v ∈ R)(u ∗ v = π(ω(u) ∗ ρ(v))). We can imagine homotopy of quasigroups as permutation of rows and columns of quasigroup’s multiplication table. Example 2. Table of quasigroup, which is homotopic with quasigroup of modular subtraction: 0 2 1 3

3 1 0 2

2 0 3 1

1 3 2 0

The table was created from table of modular subtraction. The second and the third row were exchanged. Permutations π, ρ are identities and ω = [0213]. For example 1 ∗ 0 = ω(1) ∗ 0 = 2 ∗ 0 = 2. This example can be considered as a method how to construct new quasigroups. In following text we will use quasigroups homotopic with quasigroup of modular subtraction. Three random permutations will be generated and table will be used to modify original table. Such quasigroup we call ”table quasigroup”. Disadvantage of this method is huge space complexity (n2 elements must be stored). Homotopy gives us possibility to compute result of multiplication without table. Three functions must be chosen to calculate permutations π, ρ, ω. Then the multiplication is defined as follows: a ∗ b = π((ω(a) + n − ρ(b)) mod n) .

Hash Functions Based on Large Quasigroups

525

A sequence of n elements were divided into several parts; these were rotated in various directions and exchanged among themselves. Hereafter presented function P1 compute one of these permutations i.e. P 1(x) = ω(x). Two other permutations are implemented in the same way. const unsigned int cQuasiGroupA2::P1(unsigned int x) const { unsigned int Dimension2 = m_Dimension / 2; if (x < Dimension2 * 2) { if (x & 1) x = 2 * ((x / 2 + 1) % Dimension2) + 1; else x = 2 * ((x / 2 + Dimension2 - 1) % Dimension2); } return x; } This enables us to work with large quasigroups. Works that are already known use quasigroups of small order only, or only a small parts of certain quasigroup are there used mainly as a key for Message Authentication Code [3]. These are represented as a look-up table in main memory. Hypothesis mentioned above will be tested in next section.

3 Experimental Results A simple application was created to verify our hypothesis and expectations. Inputs to the application were sets of distinct words, which were extracted from particular text file. The first input was file bible from Canterbury Corpus [5] (section Large files) and it was about 4 megabytes long. About 10000 distinct words were extracted from this file. The second was file latimes from TREC document corpus (see http://trec.nist.gov). The file contained Los Angeles Times volumes 1989 and 1990. And it was about 450 megabytes long. We extracted 200000 distinct words from this text file. To get statistical data about distribution of words in range of hash values and other properties imaginary hash table have been implemented and values in the table were measured. We observed several parameters: 1. divergences in numbers of words in slots for given size of hash table between our hash function and uniform distribution of words in table, 2. distribution of lengths of slots, 3. how many bits are changed, when one bit in input has been inverted, 4. probability of bits alternation in given position, when one bit in input has been inverted.

526

V. Sn´asˇel et al.

3.1 Distribution of Words in Slots The distribution of words in slots of hash table is figured in charts 1, 2. It can be seen from charts 1(a) and 1(b) that distribution of words in slots of imaginary hash table is quite uniform, both for table quasigroup and for analytic one. But in chart 2 there are differences between table and analytic quaisgroup. Moreover analytic results have regular shape. This error is caused by constant parameters of functions that compute permutations in analytic quasigroup. 25

25 Analytic quasigroup

20

20

15

15

Words in slot

Words in slot

Table quasigroup

10

5

10

5

0

0 0

100

200

300

400

500

600

700

800

900

0

100

200

300

400

Slot

500

600

700

800

900

Slot

(a) Table quasigroup

(b) Analytic quasigroup

Fig. 1. Distribution of words in slots for file bible, hash size 997 200

200 Table quasigroup

Analytic quasigroup

150

Words in slot

Words in slot

150

100

50

100

50

0

0 0

1000

2000

3000

4000

5000

0

1000

Slot

(a) Table quasigroup

2000

3000

4000

5000

Slot

(b) Analytic quasigroup

Fig. 2. Distribution of words in slots for file latimes, hash size 5003

3.2 Distribution of Lengths of Slots We can observe good correspondence between distribution of table quasigroup and analytic quasigroup in chart 3(a) for the file bible. For file latimes there is absolute divergence in chart 3(b). 3.3 Probability of Change of Particular Number of Bits We focus on influence of inputs change on resultant value of hash function. Step by step every bit in each input word was inverted and value of hash function was computed.

Hash Functions Based on Large Quasigroups 140

350 Table quasigroup Analytic quasigroup

Table quasigroup Analytic quasigroup

120

300

100

250

Number of slots

Number of slots

527

80

60

200

150

40

100

20

50

0

0 5

10

15

20

0

20

40

60

80

Length of slot

100

120

140

160

180

Length of slot

(a) bible, hash size 997

(b) latimes, hash size 5003

Fig. 3. Distribution of slots’ length

0.3

0.25 Table quasigroup Analytic quasigroup

Table quasigroup Analytic quasigroup

0.25 0.2

Probability of change

Probability of change

0.2

0.15

0.15

0.1

0.1

0.05 0.05

0

0 0

1

2

3

4

5

6

7

8

9

10

11

0

1

2

3

Number of changed bits

(a) bible, hash size 997

4

5

6

7

8

9

10

11

12

13

14

Number of changed bits

(b) latimes, hash size 5003

Fig. 4. Probability of change of particular number of bits

Then Hamming distance between hash values for original word and modified one was measured. It can be seen from chart 4 that both distribution curve has the same shape and they are very close together. It is interesting especially for chart 4(b) with respect of bad characteristic of slots distribution in chart 3(b). Next we perform experiments with analytic quasigroup of order 216 and 232 i.e. for 16 and 32 bit long numbers. Result of the experiment is given in chart 5. 3.4 Probability of Bits Alternation in Given Position Alternations of bits in specific positions in result of hash function were observed. The experiment runs with the same conditions as previous experiment, but we kept track to positions of changed bits. Only minor errors can be seen (chart 6) between table quasigroup and analytic quasigroup. The changes are uniformly distributed over all bits in resultant hash value.

528

V. Sn´asˇel et al. 0.16

0.14 Analytic quasigroup

Analytic quasigroup

0.14

0.12

0.12

Probability of change

Probability of change

0.1 0.1

0.08

0.06

0.08

0.06

0.04 0.04 0.02

0.02

0

0 0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

Number of changed bits

Number of changed bits

16

(b) latimes, hash size 232

(a) latimes, hash size 2

Fig. 5. Probability of change of particular number of bits

0.125

0.1 Table quasigroup Analytic quasigroup

Table quasigroup Analytic quasigroup

0.12 0.09 0.115

Probability of change

Probability of change

0.08 0.11

0.105

0.1

0.07

0.06 0.095 0.05 0.09

0.085

0.04 0

1

2

3

4

5

6

7

8

9

0

1

2

3

4

5

Position of bit

6

7

8

9

10

11

12

Position of bit

(a) bible, hash size 997

(b) latimes, hash size 5003

0.045 Analytic quasigroup 0.04

0.035

Probability of change

0.03

0.025

0.02

0.015

0.01

0.005

0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Position of bit

(c) latimes, hash size 232 Fig. 6. Probability of particular bit’s change

4 Conclusion and Future Works We presented hash function based on non-associative algebraic structures. This work is continuation of our paper [6]. The presented hash function can be easily implemented. Comparison between look-up table and analytic quasigroup implementation is given.

Hash Functions Based on Large Quasigroups

529

Analytic quasigroup has some faults in it properties, but there is no need to store large table. For real usage arithmetic of long numbers (i.g. 512 bits) must be adopted. Nonassociative structure - neofield - could be incorporated in our future works.

References 1. Belousov, V.D.: Osnovi teorii kvazigrup i lup. Nauka, Moscow (1967) (in Russian) 2. D´enes, J., Keedwell, A.: Latin Squares and their Applications, Akad´emiai Kiad´o, Budapest. Academic Press, New York (1974) 3. McKay, B., Rogoyski, E.: Latin square of order 10. Electronic Journal of Combinatorics (1995), http://www.emis.de/journals/EJC/Volume_2/cover.html 4. D´enes, J., Keedwell, A.: A new authentication scheme based on latin squares. Discrete Mathematics (106/107), 157–161 (1992) 5. Arnold, R., Bell, T.: A corpus for evaluation of lossless compression algorithms. In: DCC 1997: Proceedings of the Conference on Data Compression, p. 201 (1997) 6. Dvorsk´y, J., Ochodkov´a, E., Sn´asˇel, V.: Hash Functions Based on Large Quasigroups. In: Proceedings of Velikonoˇcn´ı kryptologie, Brno, pp. 1–8 (2002) 7. Ochodkov´a, E., Sn´asˇel, V.: Using Quasigroups for Secure Encoding of File System. In: Proceedings of the International Scientific NATO PfP/PWP Conference Security and Information Protection 2001, Brno, Czech Republic, pp. 175–181 (2001) 8. Smith, J.D.H.: An introduction to quasigroups and their representations. Chapman and Hall, Boca Raton (2007)