
An Efficient and Secure Data Sharing Framework using Homomorphic Encryption in the Cloud

Bharath K. Samanthula, Gerry Howser, Yousef Elmehdwi, and Sanjay Madria
Department of Computer Science, Missouri S&T
500 West 15th Street, Rolla, Missouri 65409

{bspq8, gwhrkb, ymez76, madrias}@mst.edu

ABSTRACT

Due to cost-efficiency and less hands-on management, data owners are outsourcing their data to the cloud, which can provide access to the data as a service. However, by outsourcing their data to the cloud, the data owners lose control over their data because the cloud provider is a third party. At first glance, having the owner encrypt the data and then export it to the cloud seems to be a good approach. However, there is a potential efficiency problem with the outsourced encrypted data when the data owner revokes some of the users’ access privileges. An existing solution to this problem is based on a symmetric key encryption scheme, and so it is not secure when a revoked user rejoins the system with different access privileges to the same data record. In this paper, we propose an efficient and Secure Data Sharing (SDS) framework using homomorphic encryption and proxy re-encryption schemes that prevents the leakage of unauthorized data when a revoked user rejoins the system. Our framework is secure under the security definition of Secure Multi-Party Computation (SMC) and is also a generic approach: any additive homomorphic encryption and proxy re-encryption schemes can be used as the underlying sub-routines. In addition, we modify our underlying SDS framework and present a new solution based on the data distribution technique to prevent information leakage in the case of collusion between a user and the Cloud Service Provider.

Categories and Subject Descriptors
K.6.5 [Management of Computing and Information Systems]: Security and Protection (D.4.6)

General Terms
Security, Management

Keywords
Privacy, Cloud Computing, Homomorphic Encryption, Proxy Re-encryption

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Cloud-I ’12, August 31 2012, Istanbul, Turkey. Copyright 2012 ACM 978-1-4503-1596-8/12/08...$15.00.

1. INTRODUCTION

Cloud computing [2] is a means by which highly scalable and technology-enabled services can be easily consumed over the Internet on an as-needed basis. This innovative paradigm has generated significant interest in both the marketplace and the academic world, resulting in a number of notable commercial and individual cloud computing services, e.g., from Amazon, Google, Microsoft, Yahoo, and Salesforce. Top database vendors such as Oracle are adding cloud support to their databases. Cloud computing is clearly one of today’s most enticing technologies, at least in part due to its cost-efficiency and flexibility. However, several security issues in the cloud are impeding the vision of cloud computing as a new IT procurement model. The security concerns that prevent companies from taking advantage of the cloud fall into three categories [5]:

• Traditional security - Involves concerns related to computer and network intrusions. Some of these attacks include VM-level attacks [18], cloud provider vulnerabilities [23], phishing of the cloud provider, authentication and authorization, and forensics in the cloud. Cloud providers respond to these concerns by arguing that their security measures and processes are more mature and tested than those of the average company.
• Availability - Involves concerns centering on critical applications and data availability. Server uptime issues, single points of failure and attack, and the inability of an enterprise to ensure that the cloud provider is faithfully running a hosted application and returning valid results make companies nervous.
• Third-party data control - For the data owner who outsources data to the cloud, the cloud acts as a semi-trusted third party. Data placed in the cloud can reside anywhere. Outsourcing data to the cloud should not lead to de facto outsourcing of data control to the cloud [5].

One of the main security concerns in cloud computing is how the data is being used by a third-party Cloud Service Provider (CSP). The legal implications of the data and applications being held by a third party are complex and are not well understood [18]. There is also a potential lack of control and transparency when a third party holds the data. Some of the resulting security concerns include due diligence, auditability, contractual obligations, cloud provider espionage, data lock-in, and the transitive nature of the data control. Trusted computing and applied cryptographic techniques [26] may offer new tools to solve these problems. However, more research needs to be done to alleviate much of today’s fear of data security problems in cloud computing.

In general, encryption is a useful tool for protecting the confidentiality of sensitive data: even if a database is successfully attacked or stolen by an intruder, the data remains protected. Provided that the encryption is done properly and the decryption keys are not also accessible to the attacker, encryption can protect the sensitive data stored in the database, reduce the legal liability of the data owners, and reduce the cost to society of fraud and identity theft. However, there remain issues with restricting each user to the fields he or she is authorized to access, efficiently revoking users’ privileges without re-encrypting massive amounts of data and re-distributing new keys to the authorized users, collusion between users, and issuing changes to a user’s access privileges. In this paper, we investigate efficient methods for handling user access rights, revoking those rights efficiently, and issuing either the same or different access rights to a returning user. We also address the issues of security from a “curious” cloud, collusion among users, and collusion between a user and the Cloud Service Provider.

In this work, we propose a scheme to achieve fine-grained data sharing and access control over data in the cloud. In contrast to the scheme proposed by Yang [25], which is based on attribute-based encryption and proxy re-encryption, we propose a new Secure Data Sharing (SDS) framework using homomorphic encryption and proxy re-encryption as the underlying sub-routines.

2. PROBLEM STATEMENT

Figure 1: System Model

In our problem setting (see Figure 1), the data owner Alice encrypts her data locally to ensure privacy. Alice wishes to outsource the data and store it on the cloud for easy user access. To facilitate fine-grained access control, a set of attributes is associated with each data record, which controls the specific set of data fields each authorized user (e.g., Bob) can access. The data owner Alice then issues a decryption key to each authorized user according to his access rights. There are potential issues in the similar fine-grained share/access control scheme proposed by Yang [25]. These issues include re-authorizing a revoked user Bob who later rejoins the system with potentially different access privileges, potential collusion between a revoked user Bob and an authorized user Charles, and collusion between a user and the Cloud Service Provider. We propose a new framework that addresses all of these issues in a single solution.

3. OUR CONTRIBUTION

We present a scheme to achieve fine-grained data sharing/access control over data outsourced to the cloud. Our scheme relies on homomorphic encryption and proxy re-encryption to overcome the issues noted above. The proposed SDS framework has the following features:

• Efficient user revocation - In our scheme, similar to Yang [25], revocation of user privileges requires neither re-encryption of the entire data set nor the distribution of new keys to all authorized users. The cloud simply removes the revoked user’s entry from the authorization list at the data owner’s command.
• Efficient and secure re-join of a previously revoked user - If at some future time the revoked user rejoins the system, whether with the same or different access privileges, all the data owner Alice needs to do is add a new entry to the authorization list, following the same procedure used to authorize any new user. The revocation of user privileges or the rejoin of a previously revoked user does not affect other users because no key re-distribution or data re-encryption is required.
• Prevention of collusion between a user and the CSP - In our modified SDS framework (detailed in Section 6.2), the encrypted data and the authorization token list can be outsourced to separate Cloud Service Providers, thus rendering any collusion between one CSP and a user useless. We assume that the likelihood of multiple CSPs colluding with a user is negligible in practice.
• Prevention of collusion between a revoked user and an authorized user - Collusion between a revoked user Bob and an authorized user Charles would also be unsuccessful. The authorized user Charles can successfully decrypt only the data the owner, Alice, has authorized. The decryption of all other data fields (i.e., unauthorized fields) will always yield a value of 0. Thus collusion between Bob and Charles would yield only the data fields to which Charles currently holds access, and is therefore useless.
• Generic approach - The proposed SDS framework is a generic scheme. Any additive homomorphic encryption [15, 17] and proxy re-encryption [3, 4] methods can be used as the underlying sub-routines.

4. RELATED WORK

We present a brief survey of related work on cloud security.

Background Works on Cloud Computing. Cloud computing is an emerging field with many research and implementation challenges. While it is tempting to view cloud computing in terms of existing computing models, the cloud presents many unique characteristics, outlined in [2]. Data security in the cloud is one of them.

General Security Issues Related to the Cloud. Because of the nature of cloud computing, there are a number of new security risks in addition to the traditional ones [6, 7, 12, 18, 20]. Efforts are underway to help guide users to a secure experience in the cloud and to develop standards to address security models [10, 12, 14], but there is still much work to be done [1].

Data Security Issues. Data security presents a number of general issues as well as issues specific to data in the cloud. Some proposed schemes rely upon the published SLA of the provider [5, 16]. Efforts are underway to provide the user with guidelines to help decide what data to move to the cloud and how to secure that data [13, 22].

Access Control to Encrypted Data in the Cloud. Our approach relies heavily upon encrypted data in the cloud. This approach has been studied in a number of different models [5, 23]. One of the issues in using encrypted data in the cloud is protecting the data from the cloud itself [19, 25, 26]. Another problem with encrypted data in the cloud is the limitation it places upon data searches [21, 26]. One solution is to couple access control with re-encryption to guard data from both the cloud and unauthorized access [24, 25, 27].

5. PRELIMINARIES

While the cloud presents unique challenges, techniques designed for other computing environments can often be moved to the cloud or used in a hybrid environment. Our solution relies heavily upon two specific encryption techniques, additive homomorphic encryption (such as Paillier’s cryptosystem [15]) and proxy re-encryption [3, 4], to secure data in the cloud and in transit from the cloud to a user. We briefly present some properties of these techniques here and build on them later.

5.1 Additive Homomorphic Probabilistic Public Key Encryption (AHPE)

An additive homomorphic probabilistic public key encryption scheme, as explained in [11], involves an encryption function $E_{pk}$ and a decryption function $D_{pr}$, where $pk$ and $pr$ are the public and private keys respectively. Given a ciphertext $E_{pk}(x)$, no one can discover $x$ without $pr$ in polynomial time. In addition, an AHPE system exhibits the following properties:

• $E_{pk}(x + y) = E_{pk}(x) +_h E_{pk}(y) \bmod N^2$, where $x$ and $y$ are plaintext messages, $+_h$ is the additive homomorphic operation, and $N$ is the group size of the encryption scheme.
• For a given constant $c$ and a ciphertext $E_{pk}(x)$, $E_{pk}(c \cdot x) = E_{pk}(x)^c \bmod N^2$.
• The encryption function is semantically secure [9], which means that an adversary cannot get any additional information about the plaintext from a given set of ciphertexts.

5.2 Proxy Re-Encryption

Proxy re-encryption (PRE) [3, 4] allows a semi-trusted intermediary, called an oracle, to re-encrypt data for delivery to a specific user without requiring the data to be decrypted and re-encrypted. Furthermore, a key pair can be generated so that the data is delivered in a re-encrypted form that the end user, whom we call Bob, can decrypt while the oracle cannot. More formally, we define it as follows:

Definition 1. Let $x$ be the data owned by Alice with public key $pk_a$. Without loss of generality, assume that $T$ (the oracle) knows the proxy re-encryption key, denoted $rk_{pk_a \to pk_b}$, where $pk_b$ is Bob’s public key. If Alice computes $E_{pk_a}(x)$ and hands it to $T$, then $T$ re-encrypts it for Bob:

$$\mathrm{PRE.ReEnc}\big(E_{pk_a}(x),\ rk_{pk_a \to pk_b}\big) \to E_{pk_b}(x) \qquad (1)$$

where PRE.ReEnc() is the proxy re-encryption function. After this, $T$ sends the output $E_{pk_b}(x)$ to Bob, who can decrypt it to retrieve $x$ using his private key $pr_b$.

The observation is that $T$ can first operate on the encrypted data sent by Alice, i.e., $E_{pk_a}(x)$, using the additive homomorphic properties, and then apply the PRE scheme to the updated data. Therefore, combining the additive homomorphic properties with the PRE scheme lets the data owners shift the workload to $T$ (i.e., the cloud, which is computationally unbounded) and provides a medium for $T$ to operate on the encrypted data.
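The AHPE properties of Section 5.1, on which the observation above relies, can be checked numerically. The following is a minimal sketch of a toy Paillier cryptosystem [15], assuming the common simplification $g = N + 1$ and two tiny primes chosen only for illustration; the function names (`keygen`, `encrypt`, `decrypt`, `add_h`, `scalar_mul`) are hypothetical and not taken from the paper. In Paillier, the operation written $+_h$ is realized as multiplication of ciphertexts modulo $N^2$.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key pair from two small primes (far too small to be secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)   # private exponent lambda
    mu = pow(lam, -1, n)           # valid inverse because we fix g = n + 1
    return n, (lam, mu)            # public key is n, private key is (lam, mu)

def encrypt(n, x):
    """E_pk(x) = (1 + n)^x * r^n mod n^2 with fresh randomness r."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, x % n, n2) * pow(r, n, n2) % n2

def decrypt(n, sk, c):
    """D_pr(c): recover the plaintext modulo n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n

def add_h(n, c1, c2):
    """The +_h operation: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

def scalar_mul(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k % n, n * n)

n, sk = keygen(1009, 1013)
cx, cy = encrypt(n, 20), encrypt(n, 22)
sum_plain = decrypt(n, sk, add_h(n, cx, cy))               # 20 + 22
scaled_plain = decrypt(n, sk, scalar_mul(n, cx, 3))        # 3 * 20
# N - x behaves as -x in Z_N, so adding E(N - 20) cancels the plaintext.
cancelled = decrypt(n, sk, add_h(n, cx, encrypt(n, n - 20)))
```

The last line previews the cancellation trick that the SDS tokens use: adding an encryption of $-x$ to an encryption of $x$ yields an encryption of 0.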

6. PROPOSED SOLUTIONS

We present a new protocol, Secure Data Sharing (SDS), to provide the data owner Alice with fine-grained access control over data outsourced to the cloud environment. SDS allows the owner Alice to authorize a user, Bob or Charles, to securely access the data at any time without the help of Alice. We will then extend this solution to address the problems of collusion between the CSP and a user.

6.1 Proposed SDS Framework

Our SDS protocol is constructed using additive homomorphic encryption and proxy re-encryption schemes. The SDS framework consists of the following five stages:

1. Key Generation and Distribution - In this stage Alice generates two kinds of key pairs based on homomorphic encryption (such as Paillier’s cryptosystem [15]) and distributes them appropriately to the cloud and to the users. In addition, Alice generates the proxy re-encryption key for each authorized user. More details are given later in this section.
2. Data Outsourcing - In this stage Alice encrypts each data record $d$ and generates the authorization tokens for $d$. For example, if Bob is authorized to access only a set of attributes in $d$ (say $S \subseteq d$), then Alice generates the authorization token for Bob, $T_b^d$, as described later, using $S$ and $d$. After this, the encrypted data records are sent to the cloud along with their corresponding authorization tokens.
3. Data Access - Upon a data request from Bob (who may or may not be an authorized user), the service provider checks whether Bob is an authorized user and takes the corresponding action.
4. User Revocation - In this stage Alice performs all the actions required to securely revoke the rights of a data consumer Bob.
5. User Rejoin - Consider the case of Alice revoking the access rights of Bob on a data record $d$. If at some future time Alice wishes to grant Bob access to $d$ again (which means Bob rejoins the system with respect to $d$), then the new authorization token $T_b^d$ can correspond to either the same or a different set of attributes, depending purely on the new access privileges granted by Alice.

Figure 2: The Proposed SDS Framework

The overall flow of communication involved in the proposed SDS framework is shown in Figure 2. Note that Stage 1 involves communication only between the data owner Alice and authorized data consumers (such as Bob). Stages 2, 4 and 5 involve communication between Alice and the cloud. Stage 3 involves communication only between the data consumer and the cloud. Using the SDS protocol, data is outsourced to the cloud only once, which avoids expensive key re-distribution and data re-encryption tasks in the case of user revocation.

1) Key Generation and Distribution - This stage acts as an initialization step where Alice generates two kinds of key pairs based on additive homomorphic encryption (such as Paillier’s cryptosystem [15]) as follows:

• First, Alice creates her master key pair, $(pk_a, pr_a)$. Alice keeps the private key $pr_a$ secret.
• Second, for each authorized user Bob, Alice creates a public/private key pair $(pk_b, pr_b)$ and sends it to Bob through a secure channel. In addition, Alice’s public key $pk_a$ and Bob’s public key $pk_b$ are treated as public.
• Furthermore, Alice generates the proxy re-encryption key $rk_{pk_a \to pk_b}$ for each authorized user Bob (which will be used during the Data Outsourcing stage) using $pr_a$ and $pk_b$ [3, 4].

2) Data Outsourcing - Without loss of generality, let us assume that Alice’s data is composed of records and each data record $d$ has $n$ attributes. Each record $d$ can be represented as $\langle d_1, \ldots, d_n \rangle$, where $d_i$ is the value of the $i$th attribute of $d$. In order to mask the number of attributes as well as the actual attribute values of $d$, Alice selects $n + m$ random integers (denoted by $r_1, \ldots, r_{n+m}$) from the domain $\mathbb{Z}_N$ for each record $d$. Note that the value of $m$ may vary for each data record $d$ and is chosen randomly from $[1, R_{max}]$, where $R_{max}$ is an additional security parameter decided by the data owner depending on the application’s security requirements. For example, $R_{max}$ can be set to 100 for sensitive data such as patients’ medical data. On the other hand, for applications such as U.S. Census data, $R_{max}$ can be set to 10.

Let $d' = \langle d_1 + r_1, \ldots, d_n + r_n, r_{n+1}, \ldots, r_{n+m} \rangle$ be the new data record generated from $d$, consisting of $n + m$ attributes. For convenience, we denote by $d'_i$ the $i$th value in $d'$. For each data record $d$ that Alice wishes to outsource, she computes $d'$ and encrypts it attribute-wise using her master public key $pk_a$. That is, Alice computes $E_{pk_a}(d_1 + r_1), \ldots, E_{pk_a}(d_n + r_n), E_{pk_a}(r_{n+1}), \ldots, E_{pk_a}(r_{n+m})$. Let $E_{pk_a}(d')$ denote the new encrypted data record, that is:

$$E_{pk_a}(d') = \langle E_{pk_a}(d'_1), \ldots, E_{pk_a}(d'_{n+m}) \rangle$$

In an actual implementation, Alice needs to compute $E_{pk_a}(d')$ only once for each data record $d$. After this, Alice generates the authorization tokens for $E_{pk_a}(d')$. We consider the following two cases for each data record $d$, depending on whether Bob is granted access to $d$ or not:

• Case 1: If Alice wishes to grant access to Bob for a set of attributes $S$ ($\subseteq d$), then she generates the authorization token for Bob corresponding to $d$ as below:

$$T_b^d = \{\text{Bob},\ rk_{pk_a \to pk_b},\ E_{pk_b}(\alpha_1), \ldots, E_{pk_b}(\alpha_{n+m})\}$$

Here $rk_{pk_a \to pk_b}$ is the proxy re-encryption key generated using the proxy re-encryption method in Stage 1. For $1 \le i \le n + m$, we define $\alpha_i$ as follows:

$$\alpha_i = \begin{cases} -r_i & \text{if } 1 \le i \le n \wedge d_i \in S \\ -d'_i & \text{otherwise} \end{cases}$$

Note that $d'_i = d_i + r_i$ for $1 \le i \le n$, and $d'_i = r_i$ for $n + 1 \le i \le n + m$. In general, $N - x$ is equivalent to $-x$ under $\mathbb{Z}_N$. To avoid cluttering the presentation, we simply use $-x$.
• Case 2: If Alice does not wish to grant access to Bob for a data record $d$, then no authorization token is generated; i.e., $T_b^d$ = null.

We denote by $T^d$ the list of authorization tokens, generated as explained above, of all authorized users who can access either all or some attributes of $d$. Alice exports the data $(T^d, E_{pk_a}(d'))$ to the cloud. Note that if $T_b^d$ is null, then it is not included in $T^d$. The overall steps performed by Alice to generate $T_b^d$ using $S$ and $d$ for Bob are shown in Algorithm 1. In practice, Alice first computes $(T^d, E_{pk_a}(d'))$ for all data records that she wishes to outsource to the cloud. She then uploads the authorization tokens along with the corresponding encrypted data records to the cloud as a single large chunk, thereby reducing the communication cost.

In addition, our approach can be easily extended to the scenario where a group of users has the same set of access attributes. In this situation, Alice generates a single public/private key pair for the group and sends it to all users of the group. Instead of generating a token for each user in the group for $d$, Alice generates a single token for the group by simply including the group ID in the token. For example, consider a group $G$ with two users Bob and Charles who are authorized to access the same set of attributes in $d$. The corresponding token can then be given as:

$$\{G\text{-ID},\ rk_{pk_a \to pk_g},\ E_{pk_g}(\alpha_1), \ldots, E_{pk_g}(\alpha_{n+m})\}$$

where $pk_g$ and $pr_g$ are the public and private keys of group $G$ (note that the private key $pr_g$ is known only to Alice, Bob, and Charles) and $G$-ID denotes the group ID of $G$.
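The masking of a record and the construction of the $\alpha_i$ values described above can be sketched at the plaintext level. This is only an illustration under stated assumptions: encryption is deliberately left out (in the protocol, each $d'_i$ would be encrypted under $pk_a$ and each $\alpha_i$ under $pk_b$), attributes are identified by 0-based index rather than by value, and the names `mask_record` and `token_alphas` are hypothetical.

```python
import secrets

def mask_record(d, n_mod, r_max):
    """Build d' = <d_1 + r_1, ..., d_n + r_n, r_{n+1}, ..., r_{n+m}> over Z_N."""
    n = len(d)
    m = secrets.randbelow(r_max) + 1                      # m drawn from [1, R_max]
    r = [secrets.randbelow(n_mod) for _ in range(n + m)]  # r_1 .. r_{n+m} from Z_N
    d_prime = [(d[i] + r[i]) % n_mod for i in range(n)] + r[n:]
    return d_prime, r

def token_alphas(d_prime, r, n_attrs, authorized, n_mod):
    """alpha_i = -r_i for authorized real attributes, -d'_i for everything else."""
    alphas = []
    for i in range(len(d_prime)):
        if i < n_attrs and i in authorized:
            alphas.append(-r[i] % n_mod)        # unmasks d_i when added to d'_i
        else:
            alphas.append(-d_prime[i] % n_mod)  # cancels d'_i to 0
    return alphas

N_MOD = 1009 * 1013            # stand-in for the modulus N (assumption)
d = [27, 163, 65]              # AGE, SSN, ROOM of one record
d_prime, r = mask_record(d, N_MOD, r_max=10)
alphas = token_alphas(d_prime, r, len(d), authorized={0, 2}, n_mod=N_MOD)
# What the authorized user would obtain after decryption: d'_i + alpha_i mod N.
view = [(dp + a) % N_MOD for dp, a in zip(d_prime, alphas)]
```

Because the $m$ padding fields always cancel to 0, the user cannot tell them apart from real attributes that were withheld.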
Algorithm 1 Data_Outsourcing(d, S)
Require: Alice wishes to outsource data record $d = \langle d_1, \ldots, d_n \rangle$ (Note: The master private key $pr_a$ is known only to Alice; $pr_b$ is known to both Alice and Bob; $pk_a$ and $pk_b$ are public.)
{Steps 1 - 13 performed by Alice}
 1: Pick $n + m$ random numbers $r_1, \ldots, r_{n+m}$ ($r_i \in \mathbb{Z}_N$, for $1 \le i \le n + m$)
 2: Generate $d' = \langle d_1 + r_1, \ldots, d_n + r_n, r_{n+1}, \ldots, r_{n+m} \rangle$
 3: $E_{pk_a}(d') \leftarrow \langle E_{pk_a}(d'_1), \ldots, E_{pk_a}(d'_{n+m}) \rangle$
 4: if $S \neq \emptyset$ then
 5:   for $i := 1$ to $n + m$ do
 6:     Compute $E_{pk_b}(\alpha_i)$
 7:   end for
 8:   $E_{pk_b}(\alpha) \leftarrow \langle E_{pk_b}(\alpha_1), \ldots, E_{pk_b}(\alpha_{n+m}) \rangle$
 9:   $T_b^d = \{\text{Bob}, rk_{pk_a \to pk_b}, E_{pk_b}(\alpha)\}$
10:   Add $T_b^d$ to $T^d$
11: else
12:   $T_b^d$ = null
13: end if

3) Data Access - Consider a user Bob who sends a data request to the cloud. Upon receiving the data request from Bob, the cloud first checks whether Bob is authorized to access any of the data records. For simplicity, we consider the case of a single data record $d$; similar steps can be used for all data records in the cloud. For a data record $d$, the cloud checks whether Bob is authorized to access $d$ by using the authorization token list $T^d$ associated with $d$ (as discussed in Stage 2). If there is no entry related to Bob in $T^d$, then the cloud simply aborts Bob’s request. On the other hand, if there exists an entry for Bob, i.e., $T_b^d$, then the cloud proceeds as follows:

• Uses the proxy re-encryption key $rk_{pk_a \to pk_b}$ in $T_b^d$ and converts $E_{pk_a}(d')$ to $E_{pk_b}(d')$ by computing $E_{pk_b}(d'_1), \ldots, E_{pk_b}(d'_{n+m})$.
• Computes $E_{pk_b}(d'_i + \alpha_i) \leftarrow E_{pk_b}(d'_i) +_h E_{pk_b}(\alpha_i) \bmod N^2$, for $1 \le i \le n + m$, where $+_h$ is the additive homomorphic operation and $N$ is the group size, which is also part of $pk_b$, i.e., the public key of Bob.
• Sends $E_{pk_b}(d'_1 + \alpha_1), \ldots, E_{pk_b}(d'_{n+m} + \alpha_{n+m})$ to Bob.

Upon receiving the data from the cloud, Bob decrypts each entry using his secret key $pr_b$ (which is sent by Alice during Stage 1). That is, Bob performs $D_{pr_b}(E_{pk_b}(d'_i + \alpha_i))$ to get $d'_i + \alpha_i$, for $1 \le i \le n + m$. Note that Bob is able to successfully decrypt only those attributes that he is authorized to access. The attributes he is not authorized to access will yield a value of 0 upon decryption, because $d'_i + \alpha_i = d_i$ only if Bob has access rights to the $i$th attribute of $d$. The detailed steps involved in the Data Access process between Bob and the cloud for a data record $d$ are given in Algorithm 2.

Algorithm 2 Data_Access(d)
Require: Bob sends a data request to the cloud (Note: The private key $pr_b$ is known only to Alice and Bob.)
{Steps 1 - 10 performed by the cloud}
 1: Receive data request from Bob
 2: if no entry for Bob in $T^d$ then
 3:   Abort
 4: else
 5:   for $i := 1$ to $n + m$ do
 6:     $E_{pk_b}(d'_i) \leftarrow \mathrm{PRE.ReEnc}(E_{pk_a}(d'_i), rk_{pk_a \to pk_b})$
 7:     $E_{pk_b}(d'_i + \alpha_i) \leftarrow E_{pk_b}(d'_i) +_h E_{pk_b}(\alpha_i) \bmod N^2$
 8:   end for
 9:   Send $E_{pk_b}(d'_i + \alpha_i)$ to Bob, for $1 \le i \le n + m$
10: end if
{Steps 11 - 14 performed by Bob}
11: if Bob is authorized to access $d$ then
12:   Receive data from the cloud
13:   $d'_i + \alpha_i \leftarrow D_{pr_b}(E_{pk_b}(d'_i + \alpha_i))$, for $1 \le i \le n + m$
14: end if

4) User Revocation - Alice revokes Bob’s access to data record $d$ by simply instructing the cloud to remove the authorization token $T_b^d$ corresponding to $E_{pk_a}(d')$. Here, we assume that the cloud acts as a semi-honest (honest-but-curious) party, which follows the rules of the protocol but is free to later use the intermediate results it sees to compromise security. However, since the data is in encrypted form and the encryption scheme is probabilistic, the intermediate results computed by the cloud are random values uniformly distributed in $\mathbb{Z}_N$.

5) User Rejoin - Let Bob be the user whose access to a data record $d$ was revoked earlier by Alice. Assume that Alice wishes to grant access rights to Bob again on data record $d$ (see footnote 1). Under this assumption, there are two possible scenarios to consider when Alice grants Bob access to the same data record $d$:

• Scenario 1 - Bob is authorized to access the same set of attributes ($S$) in $d$
• Scenario 2 - Bob is authorized to access a different set of attributes ($U$) in $d$

In scenario 1, Alice generates $T_b^d$ using the set $S$ as discussed in Algorithm 1, sends it to the cloud, and instructs the cloud to add it to the $T^d$ corresponding to $E_{pk_a}(d')$. Similarly, for scenario 2, Alice generates $T_b^d$ using the new set of attributes $U$, sends it to the cloud, and instructs the cloud to add it to the $T^d$ corresponding to $E_{pk_a}(d')$. In practice, Alice could treat both cases as scenario 2.
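The access, revocation, and rejoin stages above amount to lookups and updates on the authorization list $T^d$. The sketch below models that bookkeeping at the plaintext level, assuming modular addition in $\mathbb{Z}_N$ as a stand-in for the proxy re-encryption and $+_h$ steps of Algorithm 2; the class `ToyCloud`, the helper `make_alphas`, and the fixed masks are hypothetical and chosen only so the result is reproducible.

```python
class ToyCloud:
    """Holds one masked record d' and its authorization list T^d."""

    def __init__(self, d_prime, n_mod):
        self.d_prime = list(d_prime)
        self.n_mod = n_mod
        self.tokens = {}                     # user id -> list of alpha_i

    def add_token(self, user, alphas):       # Stage 2 (authorize) and Stage 5 (rejoin)
        self.tokens[user] = list(alphas)

    def revoke(self, user):                  # Stage 4: drop the entry, nothing else
        self.tokens.pop(user, None)

    def access(self, user):                  # Stage 3
        alphas = self.tokens.get(user)
        if alphas is None:
            return None                      # no entry in T^d: abort the request
        return [(dp + a) % self.n_mod for dp, a in zip(self.d_prime, alphas)]


def make_alphas(d_prime, r, n_attrs, authorized, n_mod):
    """alpha_i = -r_i for authorized real attributes, -d'_i otherwise."""
    return [(-r[i] if i < n_attrs and i in authorized else -d_prime[i]) % n_mod
            for i in range(len(d_prime))]


N_MOD = 1009 * 1013                          # stand-in modulus (assumption)
d = [27, 163, 65]                            # AGE, SSN, ROOM
r = [111, 222, 333, 444]                     # fixed masks, m = 1 padding field
d_prime = [(d[i] + r[i]) % N_MOD for i in range(3)] + r[3:]

cloud = ToyCloud(d_prime, N_MOD)
cloud.add_token("Bob", make_alphas(d_prime, r, 3, {0, 2}, N_MOD))
first_view = cloud.access("Bob")             # AGE and ROOM visible, SSN zeroed
cloud.revoke("Bob")
revoked_view = cloud.access("Bob")           # request is aborted
cloud.add_token("Bob", make_alphas(d_prime, r, 3, {2}, N_MOD))
rejoin_view = cloud.access("Bob")            # scenario 2: only ROOM after rejoin
```

Note that the stored record `d_prime` is never touched by `revoke` or by the rejoin, which mirrors the claim that no data re-encryption or key re-distribution is needed.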

Theorem 1. (Correctness) - For any data record $d$, Bob can successfully retrieve from the cloud only those attribute values in $d$ that he is authorized to access. The attribute values he is not authorized to access will result in 0 upon decryption. On the other hand, Bob cannot access $d$ if he is not authorized to access $d$, under the assumption that there is no collusion between Bob and the cloud.

PROOF. Without loss of generality, let us assume that $S$ denotes the set of attributes that Bob is authorized (by Alice) to access in $d$. Since Bob is authorized to access $d$, he has a token $T_b^d$ corresponding to $E_{pk_a}(d')$ on the cloud. As described in Algorithm 2, upon a data request from Bob, the cloud sends $E_{pk_b}(d'_1 + \alpha_1), \ldots, E_{pk_b}(d'_{n+m} + \alpha_{n+m})$ to Bob. Then, Bob decrypts them (see footnote 2) using his private key $pr_b$ and gets $d'_1 + \alpha_1, \ldots, d'_{n+m} + \alpha_{n+m}$. For $n + 1 \le i \le n + m$, as $\alpha_i = -d'_i$, the value of $d'_i + \alpha_i$ is always 0. In addition, for $1 \le i \le n$, we have $\alpha_i = -r_i$ only if $d_i \in S$; otherwise we have $\alpha_i = -d'_i = -r_i - d_i$. Therefore, the following property holds for $1 \le i \le n + m$:

$$d'_i + \alpha_i = \begin{cases} d_i & \text{if } d_i \in S \\ 0 & \text{otherwise} \end{cases}$$

Briefly, when Bob has access to $d_i$ we have $\alpha_i = -r_i$ and $d'_i + \alpha_i = (d_i + r_i) + (-r_i) = d_i$. When Bob does not have access to $d_i$, we have $d'_i + \alpha_i = d'_i + (-d'_i) = 0$. On the other hand, if Bob is not authorized to access $d$, then the cloud has no entry for Bob corresponding to $E_{pk_a}(d')$, and thus upon a data request from Bob, the cloud will not reveal any information about $d$ to Bob (under the assumption that there is no collusion between Bob and the cloud).

Footnote 1: Note that if Bob rejoins the system and is authorized for a new data record (other than $d$, say $x$), then Alice simply sends the authorization token $T_b^x$ to the cloud and asks the cloud to add $T_b^x$ to the authorization list corresponding to $x$.
Footnote 2: Observe that the decryption function involves a modulo $N$ operation internally.

Table 1: Sample Patient’s Medical data

NAME     AGE  SSN  ROOM  DISEASE
Tom      36   821  63    Migraine
Cherry   27   163  65    Diabetes
David    45   557  94    Thyroid
Alex     43   923  20    Diabetes
Richard  25   629  34    Skin Cancer
Smith    54   338  55    Cholesterol

Example 1. Let us consider the patient’s medical data, owned by Alice, with attributes NAME, AGE, SSN, ROOM, and DISEASE as shown in Table 1. Without loss of generality, let $d = \langle$Cherry, 27, 163, 65, Diabetes$\rangle$. Now, we consider two different users, Bob and Charles. Bob is the health caretaker for Cherry and is authorized to access the NAME, AGE, ROOM, and DISEASE attribute values of $d$. Charles, a friend of Cherry, is authorized to access only the NAME and ROOM attribute values of $d$. Note that neither Bob nor Charles is authorized to access the SSN of Cherry. Let $(pk_c, pr_c)$ be the public/private key pair of Charles.

Initially, Alice generates the new data record $d'$ for $d$. For simplicity, let us assume $m = 1$ (known only to Alice); then $d' = \langle$Cherry $+ r_1$, $27 + r_2$, $163 + r_3$, $65 + r_4$, Diabetes $+ r_5$, $r_6 \rangle$, where the $r_i$’s are randomly chosen by Alice from $\mathbb{Z}_N$. Alice encrypts $d'$ attribute-wise, creates tokens $T_b^d$ and $T_c^d$ for Bob and Charles respectively, and sends them to the cloud along with $E_{pk_a}(d')$ as shown in Table 2. Upon a data request from Bob, the cloud computes $E_{pk_b}(d') = \langle E_{pk_b}(\text{Cherry} + r_1), E_{pk_b}(27 + r_2), E_{pk_b}(163 + r_3), E_{pk_b}(65 + r_4), E_{pk_b}(\text{Diabetes} + r_5), E_{pk_b}(r_6) \rangle$ using the proxy re-encryption key $rk_{pk_a \to pk_b}$, performs a homomorphic addition as explained in Algorithm 2, and sends the values to Bob for decryption.

To make the presentation clearer, we simply write Cherry and Diabetes within the encryption. In practice, any string can be converted to a unique integer and the encryption can be performed directly on the corresponding integer representation. For example, the string abc can be converted to a unique integer by representing each character as a fixed-length ASCII value. Assuming a fixed length of 3 per character, the integer representation of abc is 970980990. Here the integer representation of character a is 970 (of length 3), and so on.

After decryption, Bob successfully learns only the NAME, AGE, ROOM, and DISEASE attribute values of $d$. The entries corresponding to SSN and the extra field will result in 0. Similarly, Charles can send a data request to the cloud and can retrieve only the NAME and ROOM attribute values of $d$. The final values retrieved by Bob and Charles are shown in Table 2.

Other Functionalities - Our SDS framework also supports the basic operations add, delete, and update on the outsourced data. For completeness, we give a brief description of each of these operations.

• Add - To add a new data record (say $x$) to Alice’s outsourced data on the cloud, Alice simply exports $(T^x, E_{pk_a}(x'))$ to the cloud, where $x'$ is the new data record obtained after masking $x$ as explained in Stage 2.
• Delete - To delete a data record $d$ from the cloud, Alice simply instructs the cloud to remove $(T^d, E_{pk_a}(d'))$.
• Update - To update a data record $d$, Alice simply instructs the cloud to replace $E_{pk_a}(d')$ with the updated one. However, if the update to $d$ changes any access rights, then Alice generates new tokens corresponding to those changes and sends them to the cloud to update $T^d$.
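Example 1, including its string-to-integer encoding, can be replayed at the plaintext level to check the property of Theorem 1. This is a sketch under assumptions: the characters are right-padded to three digits exactly as in the abc example, the `decode` helper is valid only for printable ASCII, `N_MOD` is just a large stand-in modulus rather than a real Paillier modulus, and all function names are hypothetical. Encryption and proxy re-encryption are omitted.

```python
import secrets

def encode(text):
    """Fixed-length-3 ASCII encoding from Example 1, e.g. 'a' (97) -> '970'."""
    return int("".join(str(ord(ch)).ljust(3, "0") for ch in text))

def decode(value):
    """Inverse of encode for printable ASCII (codes 32 to 126) only."""
    digits = str(value)
    codes = [int(digits[k:k + 3]) for k in range(0, len(digits), 3)]
    return "".join(chr(c if c <= 126 else c // 10) for c in codes)

N_MOD = 2**127 - 1     # stand-in modulus, large enough for the encoded strings
NAME, AGE, SSN, ROOM, DISEASE = range(5)

d = [encode("Cherry"), 27, 163, 65, encode("Diabetes")]
r = [secrets.randbelow(N_MOD) for _ in range(len(d) + 1)]     # m = 1 extra field
d_prime = [(x + ri) % N_MOD for x, ri in zip(d, r)] + [r[-1]]

def retrieve(authorized):
    """Values a user obtains after decryption: d'_i + alpha_i mod N."""
    alphas = [(-r[i] if i < len(d) and i in authorized else -d_prime[i]) % N_MOD
              for i in range(len(d_prime))]
    return [(dp + a) % N_MOD for dp, a in zip(d_prime, alphas)]

bob = retrieve({NAME, AGE, ROOM, DISEASE})
charles = retrieve({NAME, ROOM})
bob_name, bob_disease = decode(bob[NAME]), decode(bob[DISEASE])
```

Both users see 0 in the SSN position and in the padding position, matching the columns of Table 2.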

6.2

Modified SDS Framework - Handling Collusion

Most of the existing work related to cloud computing assumes that the cloud is semi-honest and that there is no collusion between the cloud and a valid or revoked data consumer. The assumption of semi-honest behavior is reasonable, as the cloud should behave honestly in managing the data owner's data, processing users' access requests, and performing other administrative activities. However, if a user and the cloud collude, even the semi-honest behavior of the cloud is violated, because leaking the data owner's data to the user is a step the data owner never authorized. From the data owner's perspective, it is very important to prevent data leakage in the case of collusion between a user and the cloud. We now present a new solution to prevent this unauthorized leakage.

Data Distribution: In the SDS framework, the outsourced data related to each data record d, i.e., $T^d$ and $E_{pk_a}(d)$, is stored on a single cloud. Since the cloud has both $T^d$ and $E_{pk_a}(d)$, and Bob (an authorized user of d) has $pr_b$, a collusion between the two can reveal the actual attribute values of d. Therefore, to prevent information leakage due to this collusion, we distribute the outsourced data. That is, instead of storing the data $(T^d, E_{pk_a}(d))$ of each data record d on one cloud, we distribute it between two clouds (resulting in a federated cloud). Alice can choose two different Cloud Service Providers, and we assume that the probability of collusion between the two clouds is negligible (which is reasonable in practice). We also assume that the user can collude with at most one of the clouds. For each data record d, Alice exports $E_{pk_a}(d)$ and, for every authorized user, a pair consisting of the user ID and the corresponding proxy re-encryption key, e.g., $\langle \text{Bob}, rk_{pk_a \to pk_b} \rangle$, to the first cloud (the primary cloud). She exports the pairs $\langle \text{Bob}, \langle E_{pk_b}(\alpha_1), \ldots, E_{pk_b}(\alpha_{n+m}) \rangle \rangle$ to the second cloud (the secondary cloud).

During the data access stage, Bob first sends the data request to the primary cloud. If Bob is an authorized user, the primary cloud converts $E_{pk_a}(d)$ to $E_{pk_b}(d)$ using the corresponding $rk_{pk_a \to pk_b}$, randomizes each field (in encrypted form, using the homomorphic property), and sends the result to the secondary cloud. The secondary cloud then performs the homomorphic operations component-wise, as described in Stage 3, and sends the results back to the primary cloud. Next, the primary cloud removes the randomness it added earlier from each field (in encrypted form) and sends the fields to Bob. Finally, Bob decrypts them component-wise to obtain the field values he is authorized to access.

We observe that if a user colludes with exactly one of the clouds, he/she cannot obtain any useful information, since the information needed to successfully decrypt the unauthorized data is distributed between the two clouds. We consider the following two cases:
• Case 1: Consider a collusion between Bob and the primary cloud. In this scenario, $E_{pk_a}(d)$ can be converted to $E_{pk_b}(d)$ using $rk_{pk_a \to pk_b}$. However, a component-wise decryption of $E_{pk_b}(d)$ yields only random values, since its i-th component is either $E_{pk_b}(d_i + r_i)$ or $E_{pk_b}(r_i)$, for $1 \le i \le n + m$.
• Case 2: Assume that there is a collusion between Bob and the secondary cloud. The decryption of $E_{pk_b}(\alpha_i)$, for $1 \le i \le n + m$, also yields only random values.
Hence, the data on one cloud is not useful without the information on the other cloud; therefore, collusion between a user and one of the clouds cannot leak any valuable information.

Table 2: Outsourced data record of Cherry and the corresponding data retrieved by Bob and Charles based on the SDS framework

Data outsourced to the cloud: $(T_b^d, T_c^d, E_{pk_a}(d))$, where
$T_b^d = \{\text{Bob}, rk_{pk_a \to pk_b}, E_{pk_b}(-r_1), E_{pk_b}(-r_2), E_{pk_b}(-r_3 - 163), E_{pk_b}(-r_4), E_{pk_b}(-r_5), E_{pk_b}(-r_6)\}$
$T_c^d = \{\text{Charles}, rk_{pk_a \to pk_c}, E_{pk_c}(-r_1), E_{pk_c}(-r_2 - 27), E_{pk_c}(-r_3 - 163), E_{pk_c}(-r_4), E_{pk_c}(-r_5 - \text{Diabetes}), E_{pk_c}(-r_6)\}$
$E_{pk_a}(d) = \langle E_{pk_a}(\text{Cherry} + r_1), E_{pk_a}(27 + r_2), E_{pk_a}(163 + r_3), E_{pk_a}(65 + r_4), E_{pk_a}(\text{Diabetes} + r_5), E_{pk_a}(r_6) \rangle$

Data retrieved by Bob:
$D_{pr_b}(E_{pk_b}(\text{Cherry} + r_1 - r_1)) = \text{Cherry}$
$D_{pr_b}(E_{pk_b}(27 + r_2 - r_2)) = 27$
$D_{pr_b}(E_{pk_b}(163 + r_3 - r_3 - 163)) = 0$
$D_{pr_b}(E_{pk_b}(65 + r_4 - r_4)) = 65$
$D_{pr_b}(E_{pk_b}(\text{Diabetes} + r_5 - r_5)) = \text{Diabetes}$
$D_{pr_b}(E_{pk_b}(r_6 - r_6)) = 0$

Data retrieved by Charles:
$D_{pr_c}(E_{pk_c}(\text{Cherry} + r_1 - r_1)) = \text{Cherry}$
$D_{pr_c}(E_{pk_c}(27 + r_2 - r_2 - 27)) = 0$
$D_{pr_c}(E_{pk_c}(163 + r_3 - r_3 - 163)) = 0$
$D_{pr_c}(E_{pk_c}(65 + r_4 - r_4)) = 65$
$D_{pr_c}(E_{pk_c}(\text{Diabetes} + r_5 - r_5 - \text{Diabetes})) = 0$
$D_{pr_c}(E_{pk_c}(r_6 - r_6)) = 0$
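The two-cloud access flow in this section can be illustrated with a minimal Python sketch. It is not the authors' C implementation. It assumes a toy Paillier instance built from two small Mersenne primes (insecure, chosen only for readability), and it skips proxy re-encryption by encrypting the masked record directly under Bob's key, which stands in for the output of the primary cloud's conversion step. The helper names (`encrypt`, `decrypt`, `add`, `encode`, `access`) and the list `bob_allowed` are hypothetical. The record and Bob's token follow the Cherry example of Table 2.

```python
import math
import secrets

# Toy Paillier parameters (assumption: tiny Mersenne primes, NOT secure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid for generator g = N + 1 (needs Python 3.8+)


def encrypt(m):
    """Paillier encryption of m mod N with generator g = N + 1."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption: L(c^lambda mod N^2) * mu mod N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def add(c1, c2):
    """Homomorphic addition: E(a) * E(b) mod N^2 encrypts a + b."""
    return c1 * c2 % N2


def encode(value):
    """Map a field to an integer; strings are read as big-endian bytes."""
    if isinstance(value, str):
        return int.from_bytes(value.encode(), "big")
    return value


# Record of Cherry from Table 2; the sixth field is a dummy with value 0.
d = [encode(v) for v in ["Cherry", 27, 163, 65, "Diabetes", 0]]
bob_allowed = [True, True, False, True, True, False]

# Alice: mask each field with r_i and build Bob's token values alpha_i.
r = [secrets.randbelow(N - 1) + 1 for _ in d]
masked = [encrypt(di + ri) for di, ri in zip(d, r)]  # stored on primary cloud
alpha = [-ri if ok else -ri - di for di, ri, ok in zip(d, r, bob_allowed)]
token = [encrypt(a) for a in alpha]                   # stored on secondary cloud


def access(masked, token):
    # Primary cloud: blind every field with fresh randomness s_i.
    s = [secrets.randbelow(N) for _ in masked]
    blinded = [add(c, encrypt(si)) for c, si in zip(masked, s)]
    # Secondary cloud: add Bob's token component-wise.
    combined = [add(c, t) for c, t in zip(blinded, token)]
    # Primary cloud: remove the randomness it added earlier.
    return [add(c, encrypt(-si)) for c, si in zip(combined, s)]


result = [decrypt(c) for c in access(masked, token)]
print(result[1], result[2], result[3])  # prints: 27 0 65

# What Bob learns when colluding with exactly one cloud.
view_primary = [decrypt(c) for c in masked]    # d_i + r_i: masked values
view_secondary = [decrypt(c) for c in token]   # alpha_i: random-looking values
```

In this sketch, `view_primary` and `view_secondary` correspond to Case 1 and Case 2: each is uniformly masked by the $r_i$ values, and only their component-wise sum modulo N reproduces the values Bob is authorized to see.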

7. EXPERIMENTAL RESULTS

In this section, we empirically study the efficiency of the proposed protocol. Note that the SDS protocol does not leak any additional information to a user beyond what he/she is authorized to access.

7.1 Efficiency

We conducted the experiments on a Linux system with an Intel® Core™ 2 Duo 3.00 GHz processor and 3 GB of memory. The proposed protocol was implemented in C using Paillier's cryptosystem. (Any additive homomorphic encryption scheme can be used; we chose Paillier's scheme in our experiments for its efficiency.) We generate a data record d with a varying number of attributes, with the value of m fixed at 10. Figure 3 shows the computational time for Alice to generate an authorization token³ $T_b^d$ for Bob together with the encryption of the data record d (the masked data record of d) for key sizes of 512 and 1024 bits. When the number of attributes increases from 1 to 15, Alice's computational time grows from 45ms to 86ms for key size 512. A similar trend can be observed for key size 1024, with the curve growing much faster because the exponentiation operations are costlier than with 512-bit keys. Also, when the number of attributes is 5, Alice's computational time rises from 55ms to 286ms (a factor of roughly 5) when the key size is changed from 512 to 1024. As Figure 3 shows, a similar increase occurs for the other numbers of attributes. In general, for any fixed n and m, we observe that Alice's computational time increases by a factor of roughly 5 when the key size is doubled.

³ We do not include the time for generating the proxy re-encryption key, since it is computed only once for each authorized user and takes a fixed amount of time.

Figure 3: Alice computation time for m = 10 (x-axis: number of attributes; y-axis: time in milliseconds; curves for key sizes 512 and 1024).
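As a quick consistency check on the timings quoted above, the short Python sketch below fits a straight line through the two 512-bit endpoints (1 and 15 attributes) and compares its estimate at 5 attributes with the reported value. The linear model is an assumption of this sketch, not a claim made in the experiments; only the four timing values come from the text, and the variable names are hypothetical.

```python
# Timings reported in the text for Figure 3, in milliseconds.
t512 = {1: 45.0, 5: 55.0, 15: 86.0}  # key size 512 bits
t1024_n5 = 286.0                      # key size 1024 bits, 5 attributes

# Assumption: cost grows linearly in the number of attributes.
slope = (t512[15] - t512[1]) / (15 - 1)
predicted_n5 = t512[1] + slope * (5 - 1)

# Slowdown at 5 attributes when the key size is doubled.
ratio = t1024_n5 / t512[5]

print(f"slope: {slope:.2f} ms per attribute")                   # 2.93
print(f"linear estimate at n = 5: {predicted_n5:.1f} ms")       # 56.7
print(f"1024-bit vs 512-bit at n = 5: {ratio:.1f}x")            # 5.2x
```

The linear estimate of about 56.7ms is close to the reported 55ms, which is consistent with a per-attribute cost that is roughly constant for a fixed key size, and the measured ratio of 5.2 matches the stated factor of roughly 5.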

8. CONCLUSION

Primarily due to cost-efficiency and less hands-on management, data owners are increasingly interested in outsourcing their data to the cloud, which can provide access to the data as a service. Because of cloud security issues, having the data owner encrypt the data before outsourcing it to the cloud seems to be a reasonable approach. However, this raises new efficiency issues when the data owner revokes some or all of a user's access privileges. In this paper, we propose an efficient and Secure Data Sharing (SDS) framework that prevents information leakage when a previously revoked user rejoins the system. The proposed framework is secure as per the security definition of Secure Multi-Party Computation [8] and is a generic solution. Furthermore, we propose a new solution that modifies the SDS framework to prevent information leakage in the case of collusion between a user and the cloud. The solution is based on distributing the encrypted data and the authorization tokens corresponding to each data record d between two clouds. Our approach prevents information leakage under the assumption that a user can collude with at most one of the clouds. An alternative solution for the collusion case is to distribute shares of Bob's private key among multiple clouds and Bob (a threshold cryptosystem). One possible extension of our work is to develop a hybrid approach that combines the data distribution and key distribution solutions. We are currently implementing the proposed framework in a cloud environment for further evaluation and experimentation.

Acknowledgment

This research was partially supported by the ONR award N0001411 10256, NSF award IIP 1156098 and a grant from AFRL.

9. REFERENCES

[1] T. Andrei. Cloud computing challenges and related security issues. Website, 2009.
[2] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia. A view of cloud computing. Commun. ACM, 53:50–58, April 2010.
[3] G. Ateniese, K. Benson, and S. Hohenberger. Key-private proxy re-encryption. In Proceedings of The Cryptographers' Track at the RSA Conference, CT-RSA '09, pages 279–294. Springer-Verlag, 2009.
[4] M. Blaze, G. Bleumer, and M. Strauss. Divertible protocols and atomic proxy cryptography. In EUROCRYPT, pages 127–144. Springer-Verlag, 1998.
[5] R. Chow, P. Golle, M. Jakobsson, E. Shi, J. Staddon, R. Masuoka, and J. Molina. Controlling data in the cloud: outsourcing computation without outsourcing control. In Proceedings of the 2009 ACM Workshop on Cloud Computing Security (CCSW), pages 85–90, 2009.
[6] K. Dahbur, B. Mohammad, and A. B. Tarakji. Security issues in cloud computing: A survey of risks, threats and vulnerabilities. International Journal of Cloud Applications and Computing (IJCAC), 1, 2011.
[7] S. N. Dhage, B. B. Meshram, R. Rawat, S. Padawe, M. Paingaokar, and A. Misra. Intrusion detection system in cloud computing environment. In Proceedings of the International Conference & Workshop on Emerging Trends in Technology, ICWET '11, pages 235–239, 2011.
[8] O. Goldreich, S. Micali, and A. Wigderson. How to play any mental game. In Proceedings of the 19th Annual ACM Symposium on Theory of Computing, pages 218–229, 1987.
[9] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal of Computing, 18:186–208, February 1989.
[10] W. Jansen and T. Grance. Draft special publication 800-144: Guidelines on security and privacy in public cloud computing. National Institute of Standards and Technology, U.S. Department of Commerce, 2011.
[11] W. Jiang, M. Murugesan, C. Clifton, and L. Si. Similar document detection with limited information disclosure. In IEEE 24th International Conference on Data Engineering, pages 735–743, April 2008.
[12] B. Kandukuri, V. Paturi, and A. Rakshit. Cloud security issues. In IEEE International Conference on Services Computing, pages 517–520, 2009.
[13] D. Lin and A. Squicciarini. Data protection models for service provisioning in the cloud. In Proceedings of the 15th ACM Symposium on Access Control Models and Technologies, SACMAT '10, pages 183–192, 2010.
[14] F. Lombardi and R. Di Pietro. Transparent security for cloud. In Proceedings of the 2010 ACM Symposium on Applied Computing, SAC '10, pages 414–415, New York, NY, USA, 2010. ACM.
[15] P. Paillier. Public key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology, Eurocrypt '99, pages 223–238. Springer-Verlag, 1999.
[16] S. Pearson. Taking account of privacy when designing cloud computing services. In Proceedings of the Workshop on Software Engineering Challenges of Cloud Computing, CLOUD '09, pages 44–52, 2009.
[17] D. K. Rappe. Homomorphic cryptosystems and their applications. Cryptology ePrint Archive, Report 2006/001, 2006.
[18] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS '09, pages 199–212, New York, NY, USA, 2009. ACM.
[19] S. Ruj, A. Nayak, and I. Stojmenovic. DACC: Distributed access control in clouds. In IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pages 91–98, Nov. 2011.
[20] G. Singh, A. Sharma, and M. S. Lehal. Security apprehensions in different regions of cloud captious grounds. International Journal of Network Security & Its Applications (IJNSA), 3, 2011.
[21] M. Singh, P. Krishna, and A. Saxena. A cryptography based privacy preserving solution to mine cloud data. In Proceedings of the Third Annual ACM Bangalore Conference, page 14. ACM, 2010.
[22] B. Thuraisingham, V. Khadilkar, A. Gupta, M. Kantarcioglu, and L. Khan. Secure data storage and retrieval in the cloud. In Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), pages 1–8, Oct. 2010.
[23] C. Wang, Q. Wang, K. Ren, and W. Lou. Ensuring data storage security in cloud computing. In International Workshop on Quality of Service, pages 1–9, July 2009.
[24] X. A. Wang and W. Zhong. A new identity based proxy re-encryption scheme. In International Conference on Biomedical Engineering and Computer Science, pages 1–4, 2010.
[25] Y. Yang and Y. Zhang. A generic scheme for secure data sharing in cloud. In 40th International Conference on Parallel Processing Workshops, pages 145–153, Sept. 2011.
[26] Z. Yang, S. Zhong, and R. Wright. Privacy-preserving queries on encrypted data. Computer Security, ESORICS 2006, pages 479–495, 2006.
[27] S. Yu, C. Wang, K. Ren, and W. Lou. Achieving secure, scalable, and fine-grained data access control in cloud computing. In Proceedings of IEEE INFOCOM, pages 1–9, 2010.