Cloud Storage Using Convergent Encryption Technique

International Journal of Applied Engineering Research, ISSN 0973-4562 Vol. 10 No. 79 (2015) © Research India Publications; http://www.ripublication.com/ijaer.htm

Cloud Storage Using Convergent Encryption Technique

B. Seetharamulu1 and Dr. G.V. Uma2

1 Research Scholar, Dept. of Information Science & Technology, C E G, Anna University, Chennai-25, Mail Id: [email protected], [email protected]
2 Professor, Dept. of Information Science & Technology, C E G, Anna University, Chennai-25, Mail Id: [email protected]

Abstract—Data De-duplication is a technique for eliminating duplicate copies of data in cloud storage, and it is widely used in the cloud. This paper provides both data security and space efficiency for De-duplication in single-server storage. The De-duplication feature enables us to reduce storage space usage and to use network bandwidth efficiently. The scope of the paper is to perform secure De-duplication in server storage using convergent encryption. De-duplication identifies and eliminates redundant data, reducing not only the volume of data stored in the database but also the bandwidth required for data transfer. In convergent encryption, encryption keys are generated in a consistent manner from the data content; thus, identical files always encrypt to the same cipher text, and the retrieval process produces the same plain text. Space efficiency is achieved through the Dekey approach, and the AES algorithm is used for data security.

Keywords—De-duplication, Authentication, convergent encryption, key management

I. INTRODUCTION

Data De-duplication is a specialized data compression technique for eliminating duplicate copies of recurring data. The technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. This type of De-duplication is different from that performed by standard file-compression tools: whereas those tools identify short repeated substrings inside individual files, the intent of storage-based data De-duplication is to inspect large volumes of data and identify large sections, such as entire files, that are identical, in order to store only one copy of each. The types of data De-duplication are:
• Post-process De-duplication
• In-line De-duplication
• Source versus target De-duplication
With post-process De-duplication, new data is first stored on the storage device, and a later process then analyzes the data looking for duplication. The benefit is that there is no need to wait for hash calculations and lookups to complete before storing the data, so store performance is not degraded. Implementations offering policy-based operation can give users the ability to defer optimization on "active" files, or to process files based on type and location. One potential drawback is that duplicate data may be stored unnecessarily for a short time, which is an issue if the storage system is near full capacity. With in-line De-duplication, the De-duplication hash calculations are performed on the target device as the data enters the device in real time. If the device spots a block that is already stored on the system, it does not store the new block, but only a reference to the existing block (a minimal code sketch of this hash-based approach is given after the list at the end of this section). The benefit of in-line De-duplication over post-process De-duplication is that it requires less storage, as data is never duplicated. On the other hand, it is frequently argued that because hash calculations and lookups take time, data ingestion can be slower, reducing the backup throughput of the device. However, certain vendors with in-line De-duplication have demonstrated equipment with performance similar to their post-process De-duplication counterparts. In source versus target De-duplication, source De-duplication ensures that data on the data source is de-duplicated. This generally takes place directly within a file system: the file system periodically scans new files, creating hashes, and compares them to the hashes of existing files. When files with the same hash are found, the file copy is removed and the new file points to the old file. The De-duplication process is transparent to the users. Target De-duplication is the process of removing duplicates of data in the secondary store. Generally this will be a backup store such as a data repository or a virtual tape library. The most common methods of implementing De-duplication are:
• File-based compare
• File-based version
• File-based hashing
• Block or sub-block version
• Block or sub-block hashing
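As a minimal sketch of the in-line, hash-based approach described above (this example is not part of the original paper; the class name InlineDedupStore and the fixed 4 KB block size are illustrative assumptions), each incoming block is hashed and stored only if its hash has not been seen before; otherwise only a reference to the existing block is kept:

    import hashlib

    BLOCK_SIZE = 4096  # fixed-size blocks; real systems may use variable-size chunking


    class InlineDedupStore:
        """Illustrative in-line De-duplication: hash each block as it arrives and
        store only blocks whose hash has not been seen before."""

        def __init__(self):
            self.blocks = {}   # hash (hex) -> block bytes, stored exactly once
            self.recipes = {}  # file name -> list of block hashes (references)

        def ingest(self, name, data):
            refs = []
            for i in range(0, len(data), BLOCK_SIZE):
                block = data[i:i + BLOCK_SIZE]
                h = hashlib.sha256(block).hexdigest()
                if h not in self.blocks:      # new block: store it
                    self.blocks[h] = block
                refs.append(h)                # duplicate block: keep only a reference
            self.recipes[name] = refs

        def restore(self, name):
            return b"".join(self.blocks[h] for h in self.recipes[name])


    store = InlineDedupStore()
    store.ingest("a.txt", b"hello world" * 1000)
    store.ingest("b.txt", b"hello world" * 1000)   # identical content adds no new blocks
    assert store.restore("b.txt") == b"hello world" * 1000
    print(len(store.blocks), "unique blocks stored for two identical files")

Only the block index grows with the number of references, which is why in-line De-duplication never stores the duplicated data at all.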

A. File-Based Compare
File-system-based De-duplication is a simple method for minimizing duplicate data at the file level, using a compare operation within the file system, or a file-system-based algorithm, to eliminate duplicates. An example of this method is comparing the name, size, type and date-modified information of two files with the same name being stored in a system. If these parameters match, it is reasonably certain that the files are copies of each other and that one of them can be deleted with


no problems, as illustrated in Table 1.1.

Table 1.1 Duplicate Files
Name        Size   Date Modified
File 1.txt  1KB    25/05/2015
File 2.txt  1KB    25/05/2015
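As an illustration of the file-based compare just described (this sketch is not part of the original paper; the function name looks_like_duplicate is hypothetical), two paths are flagged as likely duplicates when their name, size and modification date all match, which are exactly the fields shown in Table 1.1:

    import os


    def looks_like_duplicate(path_a: str, path_b: str) -> bool:
        """File-based compare: treat two files as likely copies when their
        name, size and last-modified date all match (as in Table 1.1)."""
        stat_a, stat_b = os.stat(path_a), os.stat(path_b)
        return (os.path.basename(path_a) == os.path.basename(path_b)
                and stat_a.st_size == stat_b.st_size
                and int(stat_a.st_mtime) == int(stat_b.st_mtime))

A metadata match is only a strong hint that the contents are identical; the hashing methods in the following subsections confirm it by comparing the data itself.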

B. File-Based Delta Version and Hashing
More intelligent file-level De-duplication methods look inside files and compare differences within the files themselves, or compare updates to a file and then store only the differences as a "delta" to the original file. File-based hashing creates a unique mathematical "hash" representation of each file and then compares the hashes of new files to the originals. If the hashes match, the files are the same and one can be removed.

C. Block Delta Version and Hashing
Block-based solutions work on the way data is actually stored on disk and do not need to know anything about the files themselves, or even the operating system being used. Block delta version and hashing solutions can be used on both files (unstructured data) and databases (structured data). Block delta versioning works by monitoring updates on disk at the block level and storing only the data that changed relative to the original data block. Block-level delta versioning can also be used to reduce data replication requirements for disaster recovery (DR) purposes. If a block of data on disk at the local site is updated hundreds of times between the last replication and the new one, and the replication solution uses block delta versioning, only the last update to the block needs to be sent, which can greatly reduce the amount of data travelling from the local site to the DR site. Block-level hashing works similarly to file-level hashing, except that every block or chunk of data stored on the disk is mathematically hashed and indexed; the hashes are compared in the index, and if a new data block's hash matches the hash of a block already stored, the new data does not get stored. A short sketch of block delta versioning follows this subsection.
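The following minimal sketch (not from the paper) illustrates block delta versioning as described above: the updated image is compared to the baseline block by block, and only the blocks that differ are stored or replicated; the 4 KB block size and the function names are illustrative assumptions:

    BLOCK = 4096  # fixed block size for this sketch


    def block_delta(baseline: bytes, updated: bytes) -> dict:
        """Return only the blocks of `updated` that differ from `baseline`,
        keyed by block index; only these need to be sent to the DR site."""
        deltas = {}
        n_blocks = (max(len(baseline), len(updated)) + BLOCK - 1) // BLOCK
        for idx in range(n_blocks):
            old = baseline[idx * BLOCK:(idx + 1) * BLOCK]
            new = updated[idx * BLOCK:(idx + 1) * BLOCK]
            if new != old:
                deltas[idx] = new
        return deltas


    def apply_delta(baseline: bytes, deltas: dict) -> bytes:
        """Rebuild the updated image from the baseline plus the stored deltas."""
        blocks = [baseline[i * BLOCK:(i + 1) * BLOCK]
                  for i in range((len(baseline) + BLOCK - 1) // BLOCK)]
        for idx, block in sorted(deltas.items()):
            while idx >= len(blocks):
                blocks.append(b"")
            blocks[idx] = block
        return b"".join(blocks)

If the same block is updated hundreds of times between two replications, only its latest contents appear in the delta, which is what keeps the traffic to the DR site small.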


II. RELATED WORK

Shamir [1] proposed "How to Share a Secret", which shows how to divide data into n pieces in such a way that the data is easily reconstructable from any k pieces, while complete knowledge of k - 1 pieces reveals absolutely no information about the data. This technique enables the construction of robust key management schemes for cryptographic systems. Some useful properties of this (k, n) threshold scheme are: (1) the size of each piece does not exceed the size of the original data; (2) when k is kept fixed, pieces can be dynamically added or deleted without affecting the other pieces.

Storer et al. [2] proposed "Secure Data De-duplication", which provides both data security and space efficiency in single-server storage and distributed storage systems. Encryption keys are generated in a consistent manner from the chunk data; thus, identical chunks will always encrypt to the same cipher text. The keys cannot be deduced from the encrypted chunk data. Since the information each user needs to access and decrypt the chunks that make up a file is encrypted using a key known only to the user, even a full compromise of the system cannot reveal which chunks are used by which users. They developed two models for secure de-duplicated storage, authenticated and anonymous, and these two designs demonstrate that security can be combined with De-duplication in a way that provides a diverse range of security characteristics. In the models presented, security is provided through the use of convergent encryption. In both the authenticated and anonymous models, a map is created for each file that describes how to reconstruct the file from chunks; this map is itself encrypted using a unique key. In the authenticated model, sharing of this key is managed through the use of asymmetric key pairs. In the anonymous model, storage is immutable and file sharing is conducted by sharing the map key offline and creating a map reference for each authorized user.

Halevi et al. [3] proposed "Proofs of Ownership in Remote Storage Systems". They identify attacks that exploit client-side De-duplication, allowing an attacker to gain access to arbitrary-size files of other users based on very small hash signatures of these files. More specifically, an attacker who knows the hash signature of a file can convince the storage service that it owns that file, and the server then lets the attacker download the entire file. To overcome such attacks, they introduce the notion of proofs of ownership (PoWs), which let a client efficiently prove to a server that the client holds a file, rather than just some short information about it. They formalize the concept of proof of ownership under rigorous security definitions and rigorous efficiency requirements for petabyte-scale storage systems.

Bellare et al. [4] proposed "Message-Locked Encryption and Secure Deduplication", in which the key under which encryption and decryption are performed is itself derived from the message. Message-locked encryption (MLE) provides a way to achieve secure De-duplication, a goal targeted by numerous cloud storage providers. Based on this foundation, they make both practical and theoretical contributions. On the practical side, they provide ROM security analyses of a natural family of MLE schemes that includes deployed schemes. On the theoretical side, the challenge is standard-model solutions, and they make connections with deterministic encryption, hash functions secure on correlated inputs, and the sample-then-extract paradigm to deliver schemes under different assumptions and for different classes of message sources.

Kamara and Lauter [5] proposed "Cryptographic Cloud Storage", which considers the problem of building a secure cloud storage service on top of a public cloud infrastructure where the service provider is not completely trusted by the customer. At a high level, they describe several architectures that combine recent and non-standard cryptographic primitives in order to achieve this goal, survey the benefits such an architecture would provide to both customers and service providers, and give an overview of advances in cryptography motivated specifically by cloud storage.

Meyer and Bolosky [6] proposed "A Study of Practical De-duplication", which collected file system content data from 857 desktop computers at Microsoft over a span of four weeks and analyzed the data to determine the relative efficiency of data De-duplication, particularly considering whole-file versus block-level elimination of redundancy. They found that whole-file De-duplication achieves about three quarters of the space savings of the most aggressive block-level De-duplication for storage of live file systems, and 87% of the savings for backup images. They also studied file fragmentation, finding that it is not prevalent, and updated prior file system metadata studies, finding that the distribution of file sizes continues to skew toward very large unstructured files.

Ng et al. [7] proposed "Private Data De-duplication Protocols in Cloud Storage". A private data De-duplication protocol allows a client who holds private data to prove to a server who holds a summary string of the data that he or she is the owner of that data, without revealing further information to the server. The notion can be viewed as a complement to the state-of-the-art public data De-duplication protocols of Halevi et al. The security of private data De-duplication protocols is formalized in the simulation-based framework in the context of two-party computations. A construction of private De-duplication protocols based on standard cryptographic assumptions is then presented and analyzed. The proposed private data De-duplication protocol is provably secure assuming that the underlying hash function is collision-resilient, the discrete logarithm is hard, and the erasure coding algorithm can correct erasures of up to a bounded fraction of the bits in the presence of malicious adversaries. This is the first De-duplication protocol for private data storage.

Di Pietro and Sorniotti [8] proposed "Boosting Efficiency and Security in Proof of Ownership for De-duplication". De-duplication is a technique used to reduce the amount of storage needed by service providers. It is based on the intuition that several users may want to store the same content, so storing a single copy of these files is sufficient. Albeit simple in theory, the implementation of this concept introduces many security risks, and the work addresses the most severe one: an adversary claiming to possess such a file. The paper's contributions are manifold: first, it introduces a novel Proof of Ownership (POW) scheme that has all the features of the state-of-the-art solution while incurring only a fraction of the overhead experienced by the competitor; second, the security of the proposed mechanism relies on information-theoretic rather than computational assumptions; and third, it proposes viable optimization techniques that further improve the scheme's performance.

III. PROPOSED WORK

The architecture of the system consists of five major components, namely convergent encryption, tag generation, key secret sharing, outsourcing and retrieval of the file, and decryption. De-duplication, while improving storage and bandwidth efficiency, is incompatible with traditional encryption. Specifically, traditional encryption requires different users to encrypt their data with their own keys; thus, identical data copies of different users lead to different cipher texts, making De-duplication impossible.

Fig. 1. Architecture (the file is hashed to derive the convergent key and tag, encrypted with AES, and the key is split into N shares; the encrypted file, tag value and shares are sent to the server, and on retrieval the key is reconstructed from the shares and the file is decrypted)

The proposed system uses Dekey with a secret sharing scheme that enables the key management to adapt to different reliability and confidentiality levels. In the secret sharing scheme, the secret key is divided into N shares using a polynomial equation, and the secret key is reconstructed using Lagrange interpolation. Using this key, the file is encrypted with the AES algorithm. The client uploads the encrypted file, the key shares and the tag value to the server; the server stores all of these contents and sends an id value back to the client for accessing the file in the future. The five modules of the system, explained below, are: Convergent Encryption, Tag Generation, Secret Sharing Scheme, Outsource and Retrieval of File, and Decryption.

Convergent Encryption: Convergent encryption provides a viable option to enforce data confidentiality while realizing De-duplication. It encrypts/decrypts a data copy with a convergent key, which is derived by computing the cryptographic hash value of the content of the data copy itself. After key generation and data encryption, users retain the keys and send the cipher text to the cloud. Since encryption is deterministic, identical data copies will generate the same convergent key and the same cipher text. This allows the server to perform De-duplication on the cipher text, while the cipher text can only be decrypted by the corresponding data owners with their convergent keys. In this convergent encryption, the file is hashed using the SHA-256 algorithm to get a hash value which is used as the key for file encryption. The output of SHA-256 is 256 bits, which can be used as the key for file encryption with the AES algorithm.

SHA-256: The client needs to encrypt a file and store it in a database, and convergent encryption is used for secure De-duplication in the database. First the file is hashed using the hash algorithm SHA-256, and this hash value is used as the key to encrypt the file. SHA stands for Secure Hash Algorithm. Cryptographic hash functions are mathematical operations run on digital data; by comparing the computed "hash" to a known and expected hash value, a person can determine the data's integrity. The SHA-256 compression function operates on a 512-bit message block and a 256-bit intermediate hash value. It is essentially a 256-bit block cipher algorithm which encrypts the intermediate hash value using the message block as the key.
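To make the convergent-key step concrete, a minimal sketch is given below (it is not the authors' code; only the standard-library hashlib module is used, and the function name convergent_key is illustrative). Because the key is simply the SHA-256 digest of the file content, identical files always derive the identical 256-bit key:

    import hashlib


    def convergent_key(file_bytes: bytes) -> bytes:
        """Convergent key: the SHA-256 digest (256 bits) of the file content,
        so identical files always derive the same key."""
        return hashlib.sha256(file_bytes).digest()


    data = b"the same report saved by two different users"
    assert convergent_key(data) == convergent_key(bytes(data))  # deterministic
    print(convergent_key(data).hex())

Since the key depends only on the content, the cipher text produced under it is also identical across users, which is what allows the server to de-duplicate encrypted copies.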


AES Encryption: The output of the hash algorithm, which is 256 bits, is used as the key for AES encryption. In order for a client to store a file in the database in a secure manner, the file needs to be encrypted with a strong encryption algorithm. The AES algorithm is used for encrypting the file; before being stored in the database, the file is encrypted with this algorithm. AES is an algorithm for performing encryption and decryption as a series of well-defined steps that can be followed as a procedure. The original information is known as plaintext and the encrypted form as cipher text. The cipher text message contains all the information of the plaintext message, but it is not in a format readable by a human or computer without the proper mechanism to decrypt it, and it should resemble random gibberish to those not intended to read it. The encryption procedure varies depending on the key, which changes the detailed operation of the algorithm; without the key, the cipher cannot be used to encrypt or decrypt. AES is based on a design principle known as a substitution-permutation network, a combination of both substitution and permutation, and is fast in both software and hardware. Unlike its predecessor DES, AES does not use a Feistel network. AES is a variant of Rijndael with a fixed block size of 128 bits and a key size of 128, 192 or 256 bits. AES operates on a 4×4 column-major order matrix of bytes, termed the state, although some versions of Rijndael have a larger block size and additional columns in the state. Most AES calculations are done in a special finite field. The key size used for an AES cipher specifies the number of repetitions of transformation rounds that convert the input, called the plaintext, into the final output, called the cipher text. The numbers of cycles of repetition are as follows:
• 10 cycles of repetition for 128-bit keys
• 12 cycles of repetition for 192-bit keys
• 14 cycles of repetition for 256-bit keys
Here 256-bit keys are used, so the cipher consists of 14 rounds. Each round consists of several processing steps, each containing four similar but different stages, including one that depends on the encryption key itself. A set of reverse rounds is applied to transform cipher text back into the original plaintext using the same encryption key. Each round consists of the following stages (a short usage sketch follows this list):
1. SubBytes: a non-linear substitution step where each byte is replaced with another according to a lookup table. The SubBytes() transformation is a non-linear byte substitution that operates independently on each byte of the State using a substitution table (S-box).
2. ShiftRows: a transposition step where the last three rows of the state are shifted cyclically a certain number of steps. In the ShiftRows() transformation, the bytes in the last three rows of the State are cyclically shifted over different numbers of bytes.
3. MixColumns: a mixing operation which operates on the columns of the state, combining the four bytes in each column. The MixColumns() transformation operates on the State column by column, treating each column as a four-term polynomial. The columns are considered as polynomials over GF(2⁸) and multiplied modulo x⁴ + 1 with a fixed polynomial a(x), given by a(x) = {03}x³ + {01}x² + {01}x + {02}.
4. AddRoundKey: in the AddRoundKey() transformation, a round key is added to the State by a simple bitwise XOR operation.
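The paper does not include an implementation; the following sketch assumes the third-party PyCryptodome package and uses AES-256 in CBC mode with an IV derived from the key, so that encryption stays deterministic as convergent De-duplication requires (a production design would prefer a vetted deterministic construction; the function names are illustrative):

    import hashlib

    from Crypto.Cipher import AES              # PyCryptodome (assumed dependency)
    from Crypto.Util.Padding import pad, unpad


    def convergent_encrypt(file_bytes: bytes) -> bytes:
        key = hashlib.sha256(file_bytes).digest()   # 256-bit convergent key -> 14 AES rounds
        iv = hashlib.sha256(key).digest()[:16]      # IV derived from the key keeps output deterministic
        return AES.new(key, AES.MODE_CBC, iv=iv).encrypt(pad(file_bytes, AES.block_size))


    def convergent_decrypt(ciphertext: bytes, key: bytes) -> bytes:
        iv = hashlib.sha256(key).digest()[:16]
        return unpad(AES.new(key, AES.MODE_CBC, iv=iv).decrypt(ciphertext), AES.block_size)


    data = b"same file, same key, same cipher text"
    key = hashlib.sha256(data).digest()
    ct1, ct2 = convergent_encrypt(data), convergent_encrypt(data)
    assert ct1 == ct2                               # identical files give identical cipher text
    assert convergent_decrypt(ct1, key) == data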

Tag Generation: The tag value generated from a file is unique and is used for checking De-duplication. Before outsourcing a file to the database, the client sends the tag value to the server to check whether the file already exists in the database. The tag value plays a major role in De-duplication: instead of sending the entire file to the server to check for duplicates, the client sends only the unique tag value of the file. Because the tag is unique for each file, this makes the De-duplication check straightforward. An Adler-32 checksum is obtained by calculating two 16-bit checksums A and B and concatenating their bits into a 32-bit integer. A is the sum of all bytes in the stream plus one, and B is the sum of the individual values of A from each step. At the beginning of an Adler-32 run, A is initialized to 1 and B to 0. The sums are computed modulo 65521. The bytes are stored in network order, with B occupying the two most significant bytes.
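A tag of this form can be computed with the standard-library zlib module; the short sketch below is illustrative rather than the authors' code, and also unpacks the A and B components described above:

    import zlib


    def file_tag(file_bytes: bytes) -> int:
        """Adler-32 tag sent to the server for the duplicate check."""
        return zlib.adler32(file_bytes)


    tag = file_tag(b"example file contents")
    a = tag & 0xFFFF          # A: sum of the bytes plus one, modulo 65521
    b = (tag >> 16) & 0xFFFF  # B: running sum of the A values (most significant 16 bits)
    print(hex(tag), a, b)

The server looks the tag up in its index; only when the tag is not found does the client upload the encrypted file and key shares.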


Secret Sharing Scheme: The hash value is used as the key for encrypting the file, so the key itself must be stored in the database in a secure manner. For this a secret sharing scheme is used, in which n parties carry shares or parts of a message s, called the secret, such that the complete set s1, ..., sn of the parts determines the message. The secret sharing scheme is said to be perfect if no proper subset of shares leaks any information about the secret. Secret sharing refers to methods for distributing a secret amongst a group of participants, each of whom is allocated a share of the secret. The secret can be reconstructed only when a sufficient number of shares, possibly of different types, are combined together; individual shares are of no use on their own. Here the secret is the hash value used as the key for file encryption, and it is divided into n shares using the secret sharing scheme. Secret sharing schemes are ideal for storing information that is highly sensitive and highly important. Each such piece of information must be kept highly confidential, as its exposure could be disastrous; however, it is also critical that it not be lost. Traditional methods of encryption are ill-suited for simultaneously achieving high levels of confidentiality and reliability, because when storing the encryption key one must choose between keeping a single copy of the key in one location for maximum secrecy and keeping multiple copies of the key in different locations for greater reliability. Increasing the reliability of the key by storing multiple copies lowers confidentiality by creating additional attack vectors: there are more opportunities for a copy to fall into the wrong hands. Secret sharing schemes address this problem and allow arbitrarily high levels of confidentiality and reliability to be achieved.

Efficient Secret Sharing: In this scheme, any t out of n shares may be used to recover the secret. The system relies on the idea that a unique polynomial of degree (t-1) can be fitted to any set of t points that lie on the polynomial. It takes two points to define a straight line, three points to fully define a quadratic, four points to define a cubic curve, and so on; that is, it takes t points to define a polynomial of degree t-1. The method is to create a polynomial of degree t-1 with the secret as the first coefficient and the remaining coefficients picked at random, then find n points on the curve and give one to each of the players. When at least t out of the n players reveal their points, there is sufficient information to fit a (t-1)-th degree polynomial to them, the first coefficient being the secret. A secret sharing scheme can secure a secret over multiple servers and remain recoverable despite multiple server failures. The dealer may act as several distinct participants, distributing the shares among the participants. Each share may be stored on a different server, but the dealer can recover the secret even if several servers break down, as long as at least t shares can be recovered; at the same time, crackers that break into one server still learn nothing about the secret as long as fewer than t shares are stored on each server. A dealer could also send t shares, all of which are necessary to recover the original secret, to a single recipient: an attacker would then have to intercept all t shares to recover the secret, a task which is more difficult than intercepting a single file, especially if the shares are sent using different media. For large secrets, it may be more efficient to encrypt the secret and then distribute the key using secret sharing. Secret sharing is an important primitive in several protocols for secure multiparty computation. The scheme provides an elegant construction of a perfect (t, n) threshold scheme using the classical algorithm of Lagrange interpolation: given t distinct points (x_i, y_i) of the form (x_i, f(x_i)), where f(x) is a polynomial of degree less than t, f(x) is determined by

f(x) = \sum_{i=1}^{t} y_i \prod_{1 \le j \le t, j \ne i} \frac{x - x_j}{x_i - x_j}.
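A compact sketch of this (t, n) scheme over a prime field is given below; it is illustrative rather than the authors' implementation, it assumes Python 3.8+ (for the modular inverse via pow), and the 521-bit Mersenne prime is an assumption chosen so that a 256-bit convergent key fits as the secret:

    import secrets

    # A prime larger than any 256-bit key; 2**521 - 1 is a Mersenne prime.
    PRIME = 2 ** 521 - 1


    def split_secret(secret: int, t: int, n: int):
        """Shamir (t, n) sharing: hide the secret as the constant term of a random
        degree-(t-1) polynomial and hand out n points on that polynomial."""
        coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]

        def f(x):
            return sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME

        return [(x, f(x)) for x in range(1, n + 1)]


    def reconstruct(shares):
        """Lagrange interpolation at x = 0 recovers the constant term, i.e. the secret."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if i != j:
                    num = (num * -xj) % PRIME
                    den = (den * (xi - xj)) % PRIME
            secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
        return secret


    key = int.from_bytes(bytes(range(32)), "big")  # stand-in for a 256-bit convergent key
    shares = split_secret(key, t=3, n=5)
    assert reconstruct(shares[:3]) == key          # any 3 of the 5 shares suffice
    assert reconstruct(shares[1:4]) == key

Each of the N shares can then be stored on a different server or sent alongside the encrypted file, and the key is rebuilt from any t of them at retrieval time, matching the reconstruction step shown in Fig. 1.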