2011 Third IEEE International Conference on Cloud Computing Technology and Science

Multi-User Private Keyword Search for Cloud Computing

Yanjiang Yang∗, Haibing Lu†, Jian Weng‡§¶

∗ Institute for Infocomm Research, Singapore. Email: [email protected]
† Dept. of Operations Management and Information Systems, The Leavey School of Business, Santa Clara University, USA. Email: [email protected]
‡ Department of Computer Science, and Emergency Technology Research Center of Risk Evaluation and Prewarning on Public Network Security, Jinan University, Guangzhou, China
§ State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences, China
¶ State Key Lab of Networking and Switching Technology, Beijing University of Posts and Telecommunications, China. Email: [email protected]

access requests. A typical example is that a user may wish to retrieve the set of records that contain a certain keyword; normally, it is hard for the cloud to pinpoint the requested records within an encrypted database. In the literature, searchable encryption (e.g., [4], [5], [6], [7]) is a cryptographic primitive that enables such keyword-based searches over an encrypted database without revealing the keywords to the cloud. We thus also call it private keyword search. Almost all existing searchable encryption schemes work in the single-user setting: only the holder of a secret key, referred to as the query key hereafter, can issue valid search queries. We observe, however, that the enterprise-outsourcing-database scenario often requires multi-user searchable encryption: an enterprise outsources its database to the cloud and authorizes a group of users (e.g., all its members) to access the database. The multi-user setting involves more factors than the single-user setting, e.g., user accountability and user dynamics (joining of new users and revocation of existing users). A naïve approach to multi-user searchable encryption is to apply a single-user scheme directly by sharing the secret query key among the group of users. This clearly cannot attain user accountability, and user revocation entails key renewal among non-revoked users and, in turn, re-encryption of the outsourced database. Having the cloud enforce access control before serving user search requests may address these issues, but deploying access control in practice is prohibitively expensive, as pointed out in [4]; worse yet, users have to maintain an additional set of secrets for authentication/authorization. More importantly, this method does not really take away the revoked users' search ability: if a revoked user manages to get the encrypted data (more precisely, the indexes) through some out-of-band channel, she can still search the encrypted database. Curtmola et al. [4] proposed a different approach for

Abstract: Enterprises outsourcing their databases to the cloud and authorizing multiple users for access is a typical use scenario of cloud storage services. In such a case of database outsourcing, data encryption is a good approach for enabling the data owner to retain control over the outsourced data. Searchable encryption is a cryptographic primitive that allows private keyword-based search over the encrypted database. The above setting of an enterprise outsourcing its database to the cloud requires multi-user searchable encryption, whereas virtually all existing schemes consider the single-user setting. To bridge this gap, we propose a practical multi-user searchable encryption scheme, which has a number of advantages over the known approaches. The associated model and security requirements are also formulated. We further discuss how to extend our scheme in several ways so as to achieve different search capabilities.

Keywords: cloud computing; searchable encryption; private keyword search; privacy; data outsourcing

I. INTRODUCTION

Storage services over the cloud, e.g., Microsoft's Azure storage and Amazon's S3, are a fundamental component of cloud computing; they allow customers to outsource their databases to the cloud. Database outsourcing relieves customers from building and maintaining their own proprietary databases, which is usually extremely costly. However, cloud customers worry, among other things, that their data may be abused without their consent or even awareness. Ideally, then, database outsourcing should not deprive customers of control over their data [1]. Encrypting the outsourced data is deemed a good approach to attaining this objective, as well as to solving other issues such as regulatory compliance and geographic restrictions [2], [3]. However, data encryption would greatly restrict the cloud's ability in handling user

Corresponding Author: Jian Weng, who is supported by the National Science Foundation of China under Grant Nos. 60903178 and 61133014, the Fundamental Research Funds for the Central Universities under Grant No. 21610204, and the Guangdong Provincial Science and Technology Project under Grant No. 2010A032000002.

978-0-7695-4622-3/11 $26.00 © 2011 IEEE DOI 10.1109/CloudCom.2011.43


achieving multi-user searchable encryption, which converts their single-user searchable encryption scheme into one that works in the multi-user setting. Their idea is to integrate broadcast encryption (e.g., [8]): users encrypt their search queries using broadcast encryption before submitting them to the server that hosts the database. Since the server also knows the broadcast encryption key, it can decrypt and obtain the user search queries. User dynamics are handled by the underlying broadcast encryption, which guarantees that only the set of authorized users and the server can use the broadcast encryption. We remark that Curtmola et al.'s method essentially implements "implicit" access control, in the sense that a non-authorized user cannot issue valid search requests to the server. As above, revoked users retain their search ability: they can still search the encrypted database as long as they are given the database. Furthermore, broadcast encryption is in general quite an expensive primitive, and using broadcast encryption cannot achieve user accountability. Towards enabling private keyword search in the application scenario of enterprise-outsourcing-database-to-the-cloud, we are motivated to study searchable encryption in the multi-user setting. In particular, our contributions are as follows.





II. RELATED WORK

Our interest is to enable multiple users to perform private keyword-based search over an encrypted database outsourced to the cloud, without letting the cloud learn the keywords in question. The set of techniques that fulfil this objective is generally referred to as searchable encryption in the literature. The first practical searchable encryption scheme is due to Song et al. [7], who consider searches across encrypted keywords within a file, with an overhead linear in the file size. Subsequently, Goh [6] and Chang and Mitzenmacher [5] propose to search encrypted indexes of a set of documents. Their approaches improve search efficiency at the cost of large storage for the constructed indexes, where the bit-length of the index for each document is proportional to the total number of keywords. Simultaneous search of conjunctive keywords is considered in [9], [10], aiming to improve on the efficiency of the trivial method of searching each keyword separately. A formal security notion of searchable encryption is defined in [4], which also presents schemes secure against non-adaptive/adaptive adversaries. Yang et al. [11] consider searchable encryption for dynamic databases, where updates to the encrypted database are accommodated. Lately, Wang et al. [12] propose ranked keyword search over encrypted files, which not only finds files containing certain keywords, but also orders the matching files according to, e.g., keyword frequency. All of the above schemes belong to secret-key searchable encryption, where the same (secret) key is used to generate both the encrypted indexes and the search queries.

The first public-key searchable encryption scheme is due to Boneh et al. [13], where the private key holder can perform a search among messages encrypted under the corresponding public key. Abdalla et al. [14] further analyze the consistency property of public-key searchable encryption, and demonstrate a generic construction by transforming an anonymous identity-based encryption scheme. Conjunctive keyword search in the public-key setting has been studied by Hwang et al. [15]. It should be stressed that one of the main differences between secret-key searchable encryption and its public-key counterpart is that the latter can no longer retain query privacy once a query is known, because one can enumerate and encrypt each possible keyword using the public key, and then match the query in question.

Over time, searchable encryption has evolved into predicate encryption (e.g., [16], [17], [18]), a generalized concept covering various cryptographic primitives such as identity-based encryption, attribute-based encryption, and beyond. Predicate encryption targets more complex queries over encrypted data, e.g., range queries, disjunctions, and so on. While more powerful, predicate encryption comes at the price of higher computation cost. Considering the expected large volume of

∙ Tailored to the application scenario, we formulate a model for multi-user searchable encryption, and set out the security requirements. Under the model, we propose an efficient scheme, which not only achieves the conventional query privacy, but also possesses the following features.
  – Distinct Query Keys. Each authorized user has a distinct query key for constructing search queries. This makes user revocation and accountability possible in our scheme.
  – Efficient Yet Complete User Revocation. Our scheme allows for very efficient user revocation: revocation of a user does not affect non-revoked users at all, requiring neither key renewal for non-revoked users, nor any update to the encrypted database, including the index. This is the best we can expect for user revocation in terms of efficiency. Moreover, revoked users completely lose their search privileges, given that the semi-trusted cloud has destroyed the related helper keys (the concept will be explained later) as instructed.
  – Query Unforgeability. In our scheme, no one, including the cloud, can generate valid search queries on behalf of other authorized users. Query unforgeability turns out to be an important property in the multi-user setting where accountability is desired.
∙ We discuss how to extend our scheme in a number of ways, in order to achieve different search capabilities such as fuzzy keyword search and conjunctive keyword search.


user accesses over the cloud, the actual feasibility of using the primitive, e.g. [19], is unclear. We point out that none of the above schemes, whether searchable encryption or predicate encryption, fully considers the multi-user setting that we desire, in the sense that a single secret key (which we call the query key) is used to generate search queries. This means that in these schemes, only the holder of the query key can search the encrypted database. Sharing the secret query key among all authorized users and complementing single-user searchable encryption with access control is a straightforward approach to multi-user searchable encryption, but it has the weaknesses of expensive operation and "incomplete" user revocation, as mentioned earlier. Curtmola et al.'s approach of transforming single-user searchable encryption into multi-user searchable encryption [4] suffers from similar weaknesses. Our scheme solves all these issues. A particular advantage is the high efficiency of user revocation in our scheme, as no non-revoked users are affected at all by user revocation. As a final note, the scheme proposed in this work can be viewed as an adaptation of our earlier work [20] to the setting of enterprise-outsourcing-database-to-cloud. The scheme in [20] applies to the scenario in which multiple users are entitled not only to search but also to write to the database, whereas the current work considers a single data owner (i.e., the enterprise) that writes to the database, while multiple users are authorized to search. The two in fact complement each other. In addition, compared to [20], the extended schemes contained in Section V, e.g., implementing separation of duty and fuzzy keyword search, are new results.

user enrolment and user revocation). For security reasons, the database is encrypted against the cloud. To be able to access the database, each authorized user is issued a distinct query key by the enterprise. Only authorized users who hold valid query keys can generate valid access queries (this work is restricted to keyword-based search queries). The main objective of the system is to enable the cloud to process users' search queries without learning the keywords contained in the queries or the plaintext content of the encrypted records.

B. System Model

Geared to the application scenario in Figure 1, the system consists of $\{D, \mathrm{ENT}, \mathrm{CLD}, \mathcal{U}\}$, where $D$ is a database, ENT is the enterprise, CLD is the cloud providing storage services, and $\mathcal{U}$ is a set of users. The database $D$ is composed of a number of records $\{d_1, d_2, \cdots\}$, each with multiple attributes, one of which is the keyword used for search. The domain of the keyword attribute is denoted by $\mathcal{W}$. The keyword of $d_i$ is denoted by $d_i.w$. CLD hosts an encrypted version of $D$, denoted by $D' = \{d'_1, d'_2, \cdots\}$, where $d'_i = \langle Indx(d_i.w), Enc(d_i) \rangle$: the first component is an index generated from $d_i.w$, which is used for search, and the second is an encryption of the remaining attributes of $d_i$. Each authorized user $u \in \mathcal{U}$ is issued a distinct query key $qk_u$ by ENT, and can issue search queries on her chosen keywords by using $qk_u$. We use $q_u(w)$ to denote a query from user $u$ on keyword $w \in \mathcal{W}$. On receiving query $q = q_u(w)$, CLD is expected to return $\mathcal{RPLY}_q = \{Enc(d_i) \mid d_i \in D, d_i.w = w\}$. ENT can revoke an authorized user's search privileges, so $\mathcal{U} = \mathcal{U}_A \cup \mathcal{U}_R$, where $\mathcal{U}_A$ (resp. $\mathcal{U}_R$) is the set of authorized users (resp. revoked users). Formally, the system consists of the following algorithms:

III. MODEL AND SECURITY DEFINITIONS

A. Application Scenario

We consider the use scenario of cloud storage services shown in Figure 1, where an enterprise outsources its database to the cloud and authorizes a group of multiple users to access the database. More specifically, the enterprise

- Setup($1^\kappa$). This algorithm is executed by ENT to set up the system-wide parameters, where $\kappa$ is the security parameter. It outputs a master secret key $MK_{ENT}$ and the record encryption key $ek$ for a semantically secure symmetric key encryption scheme $Enc(\cdot)$.
- AddUser($MK_{ENT}, ek, u$). This algorithm is executed by ENT to enroll user $u$ in the system. Taking as input $MK_{ENT}$, $ek$ and user identity $u$, it outputs a pair $(qk_u, hk_u)$, where $qk_u$ is the query key that will be used by $u$ to generate search queries, and $hk_u$ is a helper key that helps CLD in processing user queries. $qk_u$ and $ek$ are then securely passed to $u$, and $hk_u$ is securely passed to CLD, which then updates its U-HKey list by inserting a new entry $(u, hk_u)$. Note that CLD maintains a U-HKey list, with each entry containing a user's identity and her helper key.
- RemoveUser($u$). This algorithm is executed by ENT to revoke a user's search capability. As a result, $\mathcal{U}_A = \mathcal{U}_A \setminus \{u\}$ and $\mathcal{U}_R = \mathcal{U}_R \cup \{u\}$.

[Figure 1: diagram with three parties (Cloud, Enterprise, Users) and three labeled interactions (Data Outsourcing, Authorization, Access).]

Figure 1. Application scenario of enterprise-outsourcing-database-to-cloud

is responsible for updating the database (e.g., add/delete records), and also for managing authorization of users (e.g.,


- WriteRecord($MK_{ENT}, ek, d_i$). This algorithm allows ENT to write an encrypted record $d'_i = \langle Indx(d_i.w), Enc(d_i) \rangle$ to $D'$ hosted by CLD, where the first element of $d'_i$ is generated by using $MK_{ENT}$ and the second by using $ek$.
- GenQuery($qk_u, w$). This algorithm is run by a user $u$ to generate a search query. On input the query key $qk_u$ and a chosen keyword $w$, it outputs a query $q_u(w)$.
- Search($q_u(w), hk_u, D'$). This algorithm is run by CLD to search $D'$ for records containing $w$. That is, on a query $q_u(w)$, it locates and outputs $\mathcal{RPLY}_q = \{Enc(d_i) \mid d_i \in D, d_i.w = w\}$ with the help of $hk_u$.

Correctness. A multi-user searchable encryption system is correct if an authorized user can always get the correct query reply. More formally, $\forall u \in \mathcal{U}_A$, $\forall w \in \mathcal{W}$, Search(GenQuery($qk_u, w$), $hk_u$, $D'$) $= \{Enc(d_i) \mid d_i \in D, d_i.w = w\}$.

by users are the first step towards accountability. Further, user accountability roughly means that a query should be uniquely bound to its claimed issuer. This property is defined as query unforgeability in this work; specifically, it mandates that other users (or CLD) cannot generate valid search queries on behalf of a user. Before specifying query unforgeability, we first define the validity of user queries. For a user $u \in \mathcal{U}$, we define $u$'s valid query set as $Q_u = \{q_u(w) \mid q_u(w) \leftarrow \mathrm{GenQuery}(qk_u, w), w \in \mathcal{W}\}$. Namely, a query is a valid query of user $u$ if it is indeed constructed by running GenQuery(.,.) with $qk_u$. Therefore, an informal meaning of query unforgeability is that for any user $u$, no adversary is able to compute $q$ satisfying $q \in Q_u$ without $qk_u$. A more formal definition could proceed as a game, where the adversary is allowed to ask for queries on keywords of its choice under the query key of a target user, and at the end of the game, the adversary forges a valid query on a keyword in the name of the target user. Query unforgeability requires that the adversary succeeds in a forgery with only negligible probability. The definitional rationale is quite similar to that of existential unforgeability of digital signatures, hence we do not elaborate. We do want to stress that there are two types of adversaries to be considered in our setting: a coalition of authorized users, and CLD. They have different knowledge and attack capabilities.

Revocability. User revocation is essential in the multi-user setting. It is important to allow ENT to revoke the search capabilities of users who are deemed no longer appropriate to search the database. Our definition of revocability is based on the following observation: a user's search queries are processed by CLD by using the user's helper key; without the helper key, it is impossible to perform searches. As the incapability of searching is implied by the incapability of distinguishing database indexes (i.e., $Indx(d_i.w)$), we define revocability based on index indistinguishability. More specifically, the adversary against revocability runs in two phases. In the first phase, the adversary is allowed to freely ask for indexes on keywords of its choice. At the end of the first phase, the adversary selects two new keywords $w_1$ and $w_2$, which have not been queried thus far. In the second phase, the adversary is revoked, and is given the index of one of the two keywords. The adversary finally guesses which keyword is contained in the index. Revocability requires that the probability of the adversary guessing correctly should not be significantly more than 1/2.
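The closing requirement of the revocability game can be written as an advantage bound. This is a hedged restatement rather than the authors' formal definition: the hidden bit $b \in \{1, 2\}$ (which keyword's index is handed over), the guess $b'$, and the symbol $\mathrm{Adv}^{\mathrm{rev}}_{\mathcal{A}}$ are notation assumed here for illustration.

```latex
\mathrm{Adv}^{\mathrm{rev}}_{\mathcal{A}}(\kappa)
  \;=\; \left|\, \Pr\!\left[\, b' = b \,\right] - \tfrac{1}{2} \,\right|
  \;\le\; \mathrm{negl}(\kappa)
```

Here $\mathrm{negl}(\kappa)$ stands for any function negligible in the security parameter $\kappa$, which is one standard way of reading "not significantly more than 1/2".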

C. Security Requirements

We formulate the following security requirements for multi-user searchable encryption.

Query Privacy. Query privacy is a security notion regulating the amount of information that user queries can leak to CLD. It should be noted that by processing user search queries, CLD inevitably learns the database access patterns, e.g., that two queries have the same reply. Query privacy mandates that apart from the information that can be acquired via such observation, user queries should reveal no other information to CLD. More formally, for a record $d_i$, we use $Id(d_i)$ to denote the identifying information which is uniquely associated with $d_i$, such as its database position or its memory location; for a user $u$, we let $Id(u)$ denote the position of the entry belonging to $u$ in the U-HKey list. Let $\mathbb{Q}_t = (q_1, \cdots, q_t)$ be a sequence of $t$ queries from users $\mathbb{U}_t = (u_1, u_2, \cdots, u_t)$, let $\mathbb{W}_t = (w_1, w_2, \cdots, w_t)$ be the corresponding queried keywords, and let $\mathcal{RPLY}_t$ be the corresponding $t$ replies, where $t \in \mathbb{N}$ is polynomially bounded. We define $\mathbb{V}_t$ as the view of CLD over the $t$ queries, which includes the transcript of the interactions between CLD and the query issuers, together with some common knowledge: $\mathbb{V}_t = (D', \mathbb{Q}_t, \mathbb{U}_t, \text{U-HKey list}, \mathcal{RPLY}_t)$. In addition, the trace of the $t$ queries is defined to be $\mathbb{T}_t = (|D'|, Id(\mathbb{U}_t), Id(\mathcal{RPLY}_{q_1}), \cdots, Id(\mathcal{RPLY}_{q_t}), |\mathcal{U}_A|)$, which contains all the abstract information on the database, the U-HKey list, and user queries, where $Id(\mathbb{U}_t) = \{Id(u_1), Id(u_2), \cdots, Id(u_t)\}$ and $Id(\mathcal{RPLY}_q)$ denotes the identifying information of each record in $\mathcal{RPLY}_q$. Note that $|\mathcal{U}_A|$ equals the number of entries in the U-HKey list. Query privacy requires that whatever information on $D$ and $\mathbb{W}_t$ CLD can compute from $\mathbb{V}_t$, it can also compute from $\mathbb{T}_t$ alone.

Query Unforgeability. In the multi-user setting, user accountability is a desired feature. Distinct query keys held

IV. OUR SCHEME

Our scheme uses bilinear maps, so we begin with a brief review of the related concepts. Bilinear Map: Let $G_1$ and $G_2$ be two groups of prime order $p$. A bilinear map is a function $\hat{e} : G_1 \times G_1 \to G_2$, satisfying the following properties:


Setup($1^\kappa$): ENT sets up public system parameters $G_1$, $G_2$, and $\hat{e}$; selects random $x \in Z_p^*$ and sets $MK_{ENT} = x$; selects a random record encryption key $ek$ for semantically secure symmetric key encryption $Enc(\cdot)$.

AddUser($MK_{ENT}, ek, u$): ENT selects random $x_u \in Z_p^*$ and sets $qk_u = x_u$; computes $hk_u = g^{MK_{ENT}/x_u} \in G_1$; securely sends $qk_u$, $ek$ to user $u$; also securely sends $hk_u$ to CLD, which then adds a new entry $(u, hk_u)$ to the U-HKey list.

RemoveUser($u$): To revoke user $u$, ENT simply instructs CLD to delete the entry $(u, hk_u)$ from the U-HKey list.

WriteRecord($MK_{ENT}, ek, d_i$): To write a record $d_i$ to $D'$, ENT first generates the index of $d_i.w$ using $MK_{ENT}$ as follows: computes in turn $e_w = \hat{e}(h_1(d_i.w), g^{MK_{ENT}})$ and $k = h_2(e_w)$, and sets $Indx(d_i.w) = \langle m, h_k(m) \rangle$, where $m \in \mathcal{M}$ is a random value. ENT then computes $Enc(d_i)$ using $ek$. Finally, ENT passes $d'_i = \langle Indx(d_i.w), Enc(d_i) \rangle$ to CLD.

GenQuery($qk_u, w$): User $u$ computes $q_u(w) = h_1(w)^{qk_u}$ and outputs $(u, q_u(w))$ as her search query on keyword $w$.

Search($q_u(w), hk_u, D'$): Upon receipt of a search query $(u, q_u(w))$, CLD first looks for $(u, hk_u)$ in the U-HKey list. If no matching entry is found, it outputs $\perp$. Otherwise, using $hk_u$, it computes $k' = h_2(\hat{e}(q_u(w), hk_u))$ and sets $\mathcal{RPLY}_{q_u(w)} = \emptyset$. Then CLD scans $D'$ and, for each $Indx(d_i.w)$ in the form $\langle m_i, c_i \rangle$, if $c_i = h_{k'}(m_i)$, includes $Enc(d_i)$ in the reply set, i.e., $\mathcal{RPLY}_{q_u(w)} = \mathcal{RPLY}_{q_u(w)} \cup \{Enc(d_i)\}$.

Figure 2. Multi-user searchable encryption for enterprise-outsourcing-database-to-cloud
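The algorithms of Figure 2 can be exercised end to end with a short Python sketch. It makes several assumptions that are not part of the paper: the pairing is replaced by a toy model in which a $G_1$ element $g^a$ is stored as its exponent $a$ (so $\hat{e}(g^a, g^b)$ becomes $ab \bmod p$), which is completely insecure and only checks the algebra; the helper key is taken to be $g^{x/x_u}$, the form that appears in the security analysis; SHA-256 and HMAC stand in for $h_1$, $h_2$, and $h_k$; record encryption is omitted, so payloads are opaque bytes. All class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Toy stand-in for a pairing group (an assumption, NOT secure): the element
# g^a of G1 is stored as the exponent a mod P, so e(g^a, g^b) = e(g,g)^(ab)
# is represented by a*b mod P. Discrete logs are exposed by construction.
P = 2**127 - 1  # a Mersenne prime, used as the group order


def h1(w: str) -> int:
    """Hash a keyword to (the exponent of) a G1 element; never zero."""
    return int.from_bytes(hashlib.sha256(b"h1" + w.encode()).digest(), "big") % P or 1


def pairing(a: int, b: int) -> int:
    """Toy bilinear map on exponent representations."""
    return (a * b) % P


def h2(gt: int) -> bytes:
    """Hash a G2 element to a key for the keyed hash."""
    return hashlib.sha256(b"h2" + gt.to_bytes(16, "big")).digest()


def keyed_hash(k: bytes, m: bytes) -> bytes:
    """h_k(m), instantiated here with HMAC-SHA256."""
    return hmac.new(k, m, hashlib.sha256).digest()


class Cloud:
    def __init__(self):
        self.db = []     # entries: ((m, h_k(m)), opaque payload)
        self.hkeys = {}  # U-HKey list: user -> helper key

    def remove_user(self, user):
        self.hkeys.pop(user, None)

    def search(self, user, q):
        hk = self.hkeys.get(user)
        if hk is None:
            return None  # plays the role of the "bottom" output
        k = h2(pairing(q, hk))  # k' = h2(e(q_u(w), hk_u))
        return [payload for (m, c), payload in self.db
                if hmac.compare_digest(c, keyed_hash(k, m))]


class Enterprise:
    def __init__(self):
        self.mk = secrets.randbelow(P - 1) + 1  # MK_ENT = x in Z_p^*

    def add_user(self, cloud, user):
        x_u = secrets.randbelow(P - 1) + 1
        cloud.hkeys[user] = (self.mk * pow(x_u, -1, P)) % P  # g^(x / x_u)
        return x_u  # query key qk_u

    def write_record(self, cloud, keyword, payload):
        k = h2(pairing(h1(keyword), self.mk))  # k = h2(e(h1(w), g^x))
        m = secrets.token_bytes(16)
        cloud.db.append(((m, keyed_hash(k, m)), payload))


def gen_query(qk: int, w: str) -> int:
    """q_u(w) = h1(w)^(qk_u), i.e. exponent multiplication in the toy model."""
    return (h1(w) * qk) % P
```

In this sketch, revoking a user touches only the cloud's U-HKey dictionary, which mirrors the claim that revocation requires no key renewal and no index update; a real deployment would need an actual pairing library and real record encryption.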

1) Bilinear: For all $g_1, g_2 \in G_1$ and all $x_1, x_2 \in Z_p^*$, $\hat{e}(g_1^{x_1}, g_2^{x_2}) = \hat{e}(g_1, g_2)^{x_1 x_2}$.
2) Non-degenerate: If $g$ is a generator of $G_1$, then $\hat{e}(g, g)$ is a generator of $G_2$.

We say $G_1$ is a Gap-Diffie-Hellman group if the Decisional DH problem (DDH) in $G_1$ is easy while the Computational DH problem (CDH) is hard. The CDH problem is to compute $g^{ab}$, given $g, g^a, g^b$; and the DDH problem is to determine whether $c = g^{ab}$, given $g, g^a, g^b, c$, where $a, b \in Z_p^*$ and $c \in G_1$ are random values.
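A short worked step, added here as an illustration, shows why DDH is easy in $G_1$ once such a bilinear map is available. Writing $c = g^z$ for some unknown $z$ (notation assumed for this example), bilinearity gives:

```latex
\hat{e}(g^{a}, g^{b}) = \hat{e}(g, g)^{ab},
\qquad
\hat{e}(g, c) = \hat{e}(g, g)^{z}
\quad\Longrightarrow\quad
c = g^{ab} \iff \hat{e}(g^{a}, g^{b}) = \hat{e}(g, c)
```

The equivalence uses non-degeneracy: $\hat{e}(g, g)$ generates $G_2$, which has prime order $p$, so the two pairing values agree exactly when $ab \equiv z \pmod{p}$. Two pairing evaluations therefore decide DDH, while nothing in this check helps compute $g^{ab}$.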

A. Our Scheme

Let $G_1$, $G_2$ be two cyclic groups of prime order $p$, and let a bilinear map $\hat{e} : G_1 \times G_1 \to G_2$ be defined as above. Let $g$ be a generator of $G_1$. Let $h_1 : \mathcal{W} \to G_1$ and $h_2 : G_2 \to \mathcal{K}$ be collision-resistant hash functions, and $h_k : \mathcal{K} \times \mathcal{M} \to \mathcal{H}$ be a keyed hash function under a secret key $k \in \mathcal{K}$, where $\mathcal{K}$, $\mathcal{M}$, and $\mathcal{H}$ are appropriate domains. The details of the scheme are depicted in Figure 2. As the figure is self-contained, we do not further elaborate on the algorithms. However, two points should be noted.

Remark 1. At the end of the Search algorithm, CLD needs to return $\mathcal{RPLY}_{q_u(w)}$ to the requesting user $u$ over a secret channel. This can be easily achieved by encrypting the search result with the public key of the user, assuming that each user has a public key encryption key pair. The reason for secretly delivering the search result is that all authorized users (including revoked users) share the record encryption key $ek$; without the secret channel, revoked users can still decrypt the records. Assuming an additional layer of secret channel is certainly not desirable. We do so for simplicity only (ease of understanding and security analysis), as the main concern of searchable encryption is realizing the functionality of private (keyword) search, rather than how the data payload is encrypted. Fortunately, it is not difficult to resolve this issue of key sharing (and the assumption of a secret channel), by following a similar rationale of query key + helper key for the "search" part. We provide the details shortly in Subsection IV-E.

Remark 2. The second point to be noted is that in our construction, while CLD maintains a U-HKey list, the list is not intended for access control, nor for verifying the legitimacy of the user or the relationship between the user and her helper key. In the Search algorithm, upon receipt of a search request, CLD simply uses the helper key corresponding to the claimed user identity, without any check of the veracity of the claimed identity.

B. Correctness

The correctness of the protocol is straightforward. Consider a record $\langle Indx(w), Enc(d) \rangle$; then $Indx(w) = \langle m, h_k(m) \rangle$, and $k = h_2(\hat{e}(h_1(w), g)^{MK_{ENT}}) = h_2(\hat{e}(h_1(w), g)^{x})$. Consider a user $\bar{u}$ with a query key $qk_{\bar{u}} = x_{\bar{u}}$ and a helper key $hk_{\bar{u}} = g^{x/x_{\bar{u}}}$. Her query on the keyword $w$ is $q_{\bar{u}}(w) = h_1(w)^{x_{\bar{u}}}$, and the key used in the algorithm Search(., ., .) is thus $k' = h_2(\hat{e}(q_{\bar{u}}(w), hk_{\bar{u}})) = h_2(\hat{e}(h_1(w)^{x_{\bar{u}}}, g^{x/x_{\bar{u}}})) = h_2(\hat{e}(h_1(w), g)^{x})$. Since $k = k'$, $Enc(d)$ will definitely be included in the reply set according to the protocol.

C. Performance

To issue a search query, the main computation at the user side is simply one exponentiation, so the user incurs $\mathcal{O}(1)$ computation. To process a user's query, the main overhead at the CLD side includes one pairing operation and $n$ hash operations, where $n$ is the number of records. Thus the computational complexity is asymptotically $\mathcal{O}(n)$. We point out that all existing single-user searchable encryption schemes except those in [4] require $\mathcal{O}(n)$ server computation. This suggests that the search efficiency of our scheme does not deteriorate due to the support of multiple users. We should clarify why the schemes in [4] stand out as an exception: their computation cost is linear to the size




of the keyword set, rather than of the document set. This performance gain is actually due to a preprocessing of all documents so that documents with the same keyword are linked together. Our scheme can also be optimized in a way that CLD groups together the records retrieved by a query by sharing a common index (since they contain the same keyword). This will relieve CLD from searching the whole database, when processing subsequent queries on the same keyword. As the system proceeds, this can greatly reduce the computation overhead of CLD.
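The grouping optimization described above can be sketched as a small cache on the cloud side. This is one possible reading, not the authors' design: the cache is keyed by the derived key $k'$ (which is the same for every user querying the same keyword), it remembers which record positions matched, and later searches scan only records appended since the previous scan. The class name and the HMAC instantiation of $h_k$ are assumptions made for illustration.

```python
import hashlib
import hmac


def keyed_hash(k: bytes, m: bytes) -> bytes:
    """h_k(m), instantiated here with HMAC-SHA256 (an assumption)."""
    return hmac.new(k, m, hashlib.sha256).digest()


class CachedIndexScanner:
    """Hypothetical cloud-side helper: remembers matches per derived key k'."""

    def __init__(self):
        self.indexes = []  # one (m_i, c_i) pair per stored record
        self.cache = {}    # k' -> (matching positions, records scanned so far)

    def append(self, m: bytes, c: bytes) -> None:
        self.indexes.append((m, c))

    def search(self, k: bytes) -> list:
        hits, scanned = self.cache.get(k, ([], 0))
        # Only records written after the last scan for this key are hashed.
        for pos in range(scanned, len(self.indexes)):
            m, c = self.indexes[pos]
            if hmac.compare_digest(c, keyed_hash(k, m)):
                hits.append(pos)
        self.cache[k] = (hits, len(self.indexes))
        return list(hits)
```

A repeated query on the same keyword then costs time proportional to the number of newly written records rather than to the whole database. Whether the cloud should retain $k'$ between queries is a policy question this sketch does not settle.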



D. Security Analysis

We next argue that our scheme in Figure 2 satisfies the security requirements put forth in Section III. We omit the detailed security proofs and only provide the intuitions. Interested readers are referred to [20].

Claim 1. Our scheme achieves query privacy if $Enc(\cdot)$ is a pseudorandom permutation, and $h_1(\cdot)$, $h_k(\cdot)$ are pseudorandom functions.

The intuition is that the view of the cloud $\mathbb{V}_t = (D', \mathbb{Q}_t, \mathbb{U}_t, \text{U-HKey list}, \mathcal{RPLY}_t)$ is simulatable, based on the trace $\mathbb{T}_t = (|D'|, Id(\mathbb{U}_t), Id(\mathcal{RPLY}_{q_1}), \cdots, Id(\mathcal{RPLY}_{q_t}), |\mathcal{U}_A|)$. We briefly explain how the simulation proceeds. For $D' = \{d'_i = \langle Indx(d_i.w), Enc(d_i) \rangle\}$, it is straightforward to simulate $Enc(d_i)$ by drawing a random value from the appropriate domain, as $Enc(\cdot)$ is a pseudorandom permutation. We defer the simulation of $Indx(d_i.w)$ for a moment. For the remaining discussion, we suppose that the users in $\mathbb{U}_t$ are distinct, but some of the queries in $\mathbb{Q}_t$ may search the same keywords. For $1 \le j \le |\mathcal{U}_A|$, the simulator selects random $x^*_j \in Z_p^*$ and sets $x^* = x^*_1 \times \cdots \times x^*_{|\mathcal{U}_A|}$.

- Simulating $\mathbb{Q}_t$ and $\mathbb{U}_t$: From $Id(\mathcal{RPLY}_{q_i})$, $1 \le i \le t$, contained in $\mathbb{T}_t$, the simulator can determine which queries ask the same keyword. For each query, it selects a random user identity $u^*_i$ as the querier, and picks an element from $\{x^*_1, \ldots, x^*_{|\mathcal{U}_A|}\}$, say $x^*_i$, for $u^*_i$. If there does not exist $j < i$ such that $Id(\mathcal{RPLY}_{q_j}) = Id(\mathcal{RPLY}_{q_i})$ (note that $Id(\mathcal{RPLY}_{q_j}) = Id(\mathcal{RPLY}_{q_i})$ means that $q_i$ and $q_j$ ask the same keyword), it selects a random element $r \in G_1$ and computes $q^*_i = r^{x^*_i}$. Otherwise, it re-uses the same $r$ as for $q^*_j$ to compute $q^*_i$. It can be seen that the simulated query $q^*_i = r^{x^*_i}$ is computationally indistinguishable from an actual query $q_i(w) = h_1(w)^{x_u}$ as long as $h_1(\cdot)$ is a pseudorandom function.
- Simulating the U-HKey list: We associate each $x^*_i$ with a user in $\mathcal{U}_A$. Accordingly, the helper key corresponding to $x^*_i$ is computed as $g^{x^*/x^*_i}$. To organize the U-HKey list, the users together with the corresponding helper keys involved in simulating $\mathbb{Q}_t$ are placed at the positions given by $Id(\mathbb{U}_t)$, while the remaining users and helper keys can be placed randomly to fill the remaining entries of the list. It can be seen that an actual helper key $g^{x/x_u}$ and a simulated helper key $g^{x^*/x^*_u}$ are indistinguishable, since $x_u$ and $x^*_u$ are random values.
- Simulating $\{Indx(d_i.w)\}$: From $Id(\mathcal{RPLY}_{q_i})$, $1 \le i \le t$, the simulator knows which records are retrieved by query $q_i$. Recall that $q^*_i$ is computed as $r^{x^*_i}$. The simulator computes $k^* = h_2(\hat{e}(r, g)^{x^*})$, and for each id in $Id(\mathcal{RPLY}_{q_i})$, the index is set as $\langle m^*, h_{k^*}(m^*) \rangle$, where $m^* \in \mathcal{M}$ is a random value. Finally, the index of each of the remaining records (i.e., those that are not contained in $Id(\mathcal{RPLY}_{q_i})$, $1 \le i \le t$) is set as $\langle m_1, m_2 \rangle$, where $m_1, m_2 \in_R \mathcal{M}$ are random values. The simulation is clearly computationally indistinguishable if $h_k(\cdot)$ is a pseudorandom function.
- Simulating $\mathcal{RPLY}_t$: This simulation is straightforward, as it simply consists of the records from the simulated $D'$ whose ids are contained in $Id(\mathcal{RPLY}_{q_i})$, $1 \le i \le t$.

Claim 2. Query unforgeability of our scheme is reduced to existential unforgeability of the BLS short signature [21], given that $h_1$ is a random oracle.

A brief recall of the BLS short signature presented in [21] is as follows. Let $G_1$, $G_2$, $\hat{e}$ be defined as above, and $g$ be a generator of $G_1$; let $h : \{0, 1\}^* \to G_1$ be a collision-resistant hash function. A user's key pair is $(x \in Z_p^*, y = g^x \in G_1)$, where $x$ is the private signing key. The signature on a message $m$ is defined to be $\sigma = h(m)^x$. Signature verification checks whether $\hat{e}(g, \sigma) = \hat{e}(y, h(m))$. The BLS short signature is existentially unforgeable if $h$ is modeled as a random oracle. In our scheme, a search query on keyword $w$ by user $u$ is $h_1(w)^{x_u}$, which is essentially a BLS short signature under the signing key $x_u$. This is the intuition behind the above claim.

Claim 3. Our scheme achieves revocability if $h_k(\cdot)$ is a pseudorandom function.

The intuition is straightforward. Consider the indexes of two keywords $w_1$ and $w_2$, which are $Indx(w_1) = \langle m_1, h_{k(w_1)}(m_1) \rangle$ and $Indx(w_2) = \langle m_2, h_{k(w_2)}(m_2) \rangle$, respectively, where $m_1, m_2 \in \mathcal{M}$ are random values, and $k(w_1)$ and $k(w_2)$ denote the secret keys generated from $w_1$ and $w_2$, respectively. When a user is revoked, her helper key is deleted from the U-HKey list. Thus, the revoked user can never obtain $k(w_1)$ and $k(w_2)$ from the keywords and her query key. This in turn means that $h_{k(w_1)}(\cdot)$ and $h_{k(w_2)}(\cdot)$ are pseudorandom functions to the revoked user. As a result, $Indx(w_1)$ and $Indx(w_2)$ are computationally indistinguishable to the revoked user, whose guessing probability thus cannot be significantly more than 1/2.

E. Resolving the Sharing of Record Encryption Key

As we pointed out earlier, sharing the record encryption key $ek$ among all users is not appropriate, although secretly delivering the search result from the cloud to the requesting user is not a hard problem in practice. We next provide a solution to resolve this issue of key sharing.
The solution follows a similar rationale of query key + helper key for the “private search” part. The basic idea follows. We adhere to the notations/algorithms in Figure 2. Let the record encryption key $ek \in_R G_1$. For each authorized user $u$, partition $ek$ into two random shares $ek_u$ and $ek_u'$ such that $ek = ek_u \cdot ek_u'$. Then $ek_u$ is given to $u$, and $ek_u'$ is given to CLD, who can manage $ek_u'$ together with $hk_u$ in the U-HKey list. When revoking the user, $ek_u'$ is deleted as well (together with $hk_u$).

The actual encryption key $ek_i$ for a record $d_i$ is calculated as follows: pick a random number $r_i \in_R G_1$, and compute $ek_i = h'(\hat{e}(ek, r_i))$, where $h'(\cdot)$ is an appropriate hash function. Encryption of $d_i$ is accomplished by computing $Enc(d_i)$ with $ek_i$. As a result, an encrypted record is in the form $d_i' = \langle Indx(d_i.w), r_i, Enc(d_i) \rangle$.

In the Search algorithm, if a record $d_i' = \langle Indx(d_i.w), r_i, Enc(d_i) \rangle$ in $D'$ is located and should be included in $\mathcal{RPLY}_{q_u(w)}$, CLD computes $tmp_i = \hat{e}(ek_u', r_i)$ and includes it together with $r_i$ and $Enc(d_i)$ in the reply set, i.e., $\mathcal{RPLY}_{q_u(w)} = \mathcal{RPLY}_{q_u(w)} \cup \{tmp_i, r_i, Enc(d_i)\}$. At the user side, the decryption key $ek_i$ is computed as

$$ek_i = h'(tmp_i \cdot \hat{e}(ek_u, r_i)) = h'(\hat{e}(ek_u', r_i) \cdot \hat{e}(ek_u, r_i)) = h'(\hat{e}(ek_u \cdot ek_u', r_i)) = h'(\hat{e}(ek, r_i)).$$

Given that $\hat{e}(\cdot)$ is one-way, we achieve: (1) there is no need to deliver $\mathcal{RPLY}_{q_u(w)}$ over a secret channel; (2) a revoked user (for whom $ek_u'$ has been deleted) can no longer decrypt records that she never accessed before, even if she obtains the encrypted records. Of course, we can adopt an alternative strategy of splitting $ek$ as $(ek_u, ek_u' = g^{ek/ek_u})$ (see footnote 1), which is exactly the same rationale as for the query key part. Other details should be straightforward and are omitted here.
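The algebra of the key-splitting step can be checked with a small sketch. The code below is an illustration under loud assumptions, not the scheme's implementation: it replaces the pairing groups with a toy model in which a $G_1$ element $g^a$ is stored as its exponent $a$ (so discrete logs are trivial and nothing here is secure), and the parameters `Q`, `P`, `GT_GEN` and all function names are hypothetical. It only demonstrates that the user recovers the same $ek_i$ that ENT derived.

```python
import hashlib
import secrets

# Toy parameters (assumption, NOT secure): P = 2Q + 1 with P and Q prime.
Q = 1019      # order of the toy groups
P = 2039      # modulus hosting the toy target group
GT_GEN = 4    # generates the order-Q subgroup of Z_P^*

# A G_1 element g^a is stored as its exponent a, so multiplication in G_1
# becomes addition mod Q. This only checks the algebra of the construction.

def g1_mul(a, b):
    """Group operation of G_1 in exponent form."""
    return (a + b) % Q

def g1_random():
    """Random non-identity element of G_1 (as an exponent)."""
    return secrets.randbelow(Q - 1) + 1

def pairing(a, b):
    """Toy bilinear map: e(g^a, g^b) = gT^(a*b)."""
    return pow(GT_GEN, (a * b) % Q, P)

def h_prime(gt_elem):
    """Stand-in for h'(.): hash a target-group element to a 32-byte key."""
    return hashlib.sha256(str(gt_elem).encode()).digest()

def split_key(ek):
    """Split ek into random shares with ek = ek_u * ek_u' in G_1."""
    ek_u = g1_random()
    ek_u_prime = (ek - ek_u) % Q   # ek / ek_u, written in exponent form
    return ek_u, ek_u_prime

def record_key(ek, r_i):
    """ENT side: ek_i = h'(e(ek, r_i))."""
    return h_prime(pairing(ek, r_i))

def cloud_tmp(ek_u_prime, r_i):
    """CLD side: tmp_i = e(ek_u', r_i)."""
    return pairing(ek_u_prime, r_i)

def user_record_key(ek_u, tmp_i, r_i):
    """User side: ek_i = h'(tmp_i * e(ek_u, r_i))."""
    return h_prime((tmp_i * pairing(ek_u, r_i)) % P)
```

For any `ek`, `r_i` and any shares returned by `split_key(ek)`, `user_record_key(ek_u, cloud_tmp(ek_u_prime, r_i), r_i)` equals `record_key(ek, r_i)`, which mirrors the chain of equalities above. A real deployment would need an actual pairing-friendly curve in place of this toy map.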

1 In this case, $ek_i = h'(\hat{e}(g^{ek}, r_i))$, and $r_i$ no longer needs to be included in the reply set $\mathcal{RPLY}_{q_u(w)}$.

V. EXTENSIONS

We propose several extensions/variations to the scheme presented in Figure 2, in order to achieve different search capabilities. We expect that our scheme can accommodate further extensions, such as ranked keyword search [12].

A. Implementation of Separation of Duty

In the above scheme, each authorized user has the freedom to issue search queries on any keyword of her choice. This is appropriate in many applications. For example, when a research institute outsources research data to the cloud, every involved researcher should have access to the entire data set. Nevertheless, in many other applications, an authorized user's search capability should be restricted, in the sense that she should be allowed to retrieve data only on a need-to-know basis. For example, consider a company that outsources the management of its employees' salary information to the cloud; for privacy reasons, an employee should only be able to access her own salary records. We next show how to adapt our scheme in Figure 2 to achieve this kind of “separation of duty” (SoD).

To implement SoD, we need to restrict a user's ability to generate search queries, in the sense that she can only issue queries on keywords that satisfy the restrictions imposed by SoD. To achieve this, a user $u$ should not be directly given $x_u$ as her query key $qk_u$ (referring to Figure 2). Suppose $u$ is authorized to issue queries on $w_1, w_2, \cdots, w_t$. We slightly modify the algorithm AddUser($MK_{ENT}$, $ek$, $u$) in such a way that $qk_u = (h_1(w_1)^{x_u}, h_1(w_2)^{x_u}, \cdots, h_1(w_t)^{x_u})$. Note that if the user wants to test the validity of $qk_u$, ENT can also pass $g^{x_u}$ to her, and the test equations are $\hat{e}(qk_u[1], g) = \hat{e}(h_1(w_1), g^{x_u})$, $\hat{e}(qk_u[2], g) = \hat{e}(h_1(w_2), g^{x_u})$, $\cdots$, $\hat{e}(qk_u[t], g) = \hat{e}(h_1(w_t), g^{x_u})$. Then in GenQuery($qk_u$, $w$), if $w$ is among the set of authorized keywords, the user simply picks the corresponding element from $qk_u$ and uses it as the search query. Other algorithms in Figure 2 remain unchanged.

B. Conjunctive Keyword Search

In some applications, one needs to decide whether a record contains several keywords. This is referred to as conjunctive keyword search [9], [10]. In fact, there is a straightforward way to achieve conjunctive keyword search by extending single-keyword searchable encryption: generate multiple indexes, each from one of the keywords; accordingly, a search query contains several elements corresponding to the set of keywords to be searched, and the search process involves matching the set of query elements against the set of indexes. The objective of designing searchable encryption for conjunctive keyword search is to improve on the efficiency of this trivial method.

In the following, we show how to enable conjunctive keyword search by modifying the scheme in Figure 2. Suppose the database $D$ has $t$ keyword attributes, $w_1, w_2, \cdots, w_t$. In the algorithm WriteRecord($MK_{ENT}$, $ek$, $d_i$), the index of $d_i'$ is still generated as $Indx(d_i.w_1, d_i.w_2, \cdots, d_i.w_t) = \langle m, h_k(m) \rangle$, but with $k = h_2(\hat{e}(h_1(d_i.w_1) \otimes h_1(d_i.w_2) \otimes \cdots \otimes h_1(d_i.w_t), g^{MK_{ENT}}))$, where $\otimes$ denotes the XOR operation. In the algorithm GenQuery($\cdot$, $\cdot$), the user's search query is computed as $q_u(w_1', w_2', \cdots, w_t') = (h_1(w_1') \otimes h_1(w_2') \otimes \cdots \otimes h_1(w_t'))^{qk_u}$. Other steps remain the same.

It can be easily seen that this method has almost the same efficiency as the single-keyword scheme, irrespective of the number of simultaneous keywords. Compared to the conjunctive keyword searchable encryption scheme in [10] (see footnote 2), our method has considerably better performance, even though ours is also designed to support multiple users. In the scheme of [10], generating the index for a record and generating a search query each require a number of exponentiations that is linear in the number of simultaneous keywords. In contrast, these costs are constant in our scheme: 1 exponentiation and 1 pairing for generating an index, and 1 exponentiation for generating a query.

2 The authors of the other (secret-key) conjunctive keyword scheme [9] acknowledged that their scheme has certain weaknesses.
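The constant cost of the conjunctive variant can be illustrated with a short sketch. Everything below is an assumption made for illustration: a toy bilinear map in which a $G_1$ element $g^a$ is stored as its exponent $a$ (insecure by construction), SHA-256 and HMAC as stand-ins for $h_1$, $h_2$ and $h_k$, a Search step inferred from the helper-key form $g^{x/x_u}$, and the interpretation that the keyword hashes are XORed as byte strings before being mapped into the group. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from functools import reduce

Q, P, GT_GEN = 1019, 2039, 4   # toy parameters (assumption, NOT secure)

def pairing(a, b):
    """Toy bilinear map on exponents: e(g^a, g^b) = gT^(a*b)."""
    return pow(GT_GEN, (a * b) % Q, P)

def h1_combined(keywords):
    """XOR the SHA-256 digests of all keywords, then map into G_1."""
    digests = [hashlib.sha256(w.encode()).digest() for w in keywords]
    xored = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), digests)
    return int.from_bytes(xored, "big") % (Q - 1) + 1   # exponent in [1, Q-1]

def h2(gt_elem):
    """Stand-in for h_2: derive the index key k from a pairing value."""
    return hashlib.sha256(b"h2" + str(gt_elem).encode()).digest()

def prf(k, m):
    """Stand-in for the pseudorandom function h_k(m)."""
    return hmac.new(k, m, hashlib.sha256).digest()

def add_user(mk_ent):
    """Return (qk_u, hk_u) with qk_u = x_u and hk_u = g^(MK/x_u)."""
    x_u = secrets.randbelow(Q - 1) + 1
    hk_u = (mk_ent * pow(x_u, -1, Q)) % Q   # modular inverse, Python 3.8+
    return x_u, hk_u

def write_index(mk_ent, keywords):
    """ENT: one pairing and one key derivation, whatever the keyword count."""
    m = secrets.token_bytes(16)
    k = h2(pairing(h1_combined(keywords), mk_ent))   # g^MK stored as MK
    return m, prf(k, m)

def gen_query(qk_u, keywords):
    """User: a single exponentiation of the combined hash."""
    return (h1_combined(keywords) * qk_u) % Q

def matches(query, hk_u, index):
    """CLD: recompute k from the query and helper key, then test h_k(m)."""
    m, tag = index
    k = h2(pairing(query, hk_u))
    return hmac.compare_digest(prf(k, m), tag)
```

Because XOR is commutative, a query built from the same keywords in any order matches the index, and the work per index or query does not grow with the number of keywords beyond the cheap hashing step.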

C. Fuzzy Private Keyword Search

Thus far, we have considered exact private keyword search. In practice, due to reasons such as input errors, fuzzy private keyword search is often quite useful. Simply put, fuzzy private keyword search locates the set of records that contain keywords similar (under certain criteria) to the keyword in question. In [22], Li et al. proposed a searchable encryption scheme for fuzzy private keyword search. Next we show that it is not difficult to integrate their techniques into our scheme in Figure 2 to achieve multi-user fuzzy private keyword search.

Li et al.'s scheme mainly uses edit distance as the criterion to determine the fuzzy set, and more precisely the wildcard-based fuzzy set. The edit distance between two keywords $w_1$ and $w_2$ is the minimum number of single-character operations (substitution, deletion, or insertion) required to transform one into the other. For example, the wildcard-based fuzzy set of keyword CLOUD with edit distance 1 is FS(CLOUD, 1) = {CLOUD, *CLOUD, *LOUD, C*LOUD, C*OUD, ..., CLOU*D, CLOU*, CLOUD*}. Then, the index for CLOUD is the set of indexes generated from each individual keyword in FS(CLOUD, 1).

It is straightforward to apply this idea to generating the index for $d_i.w$ (i.e., $Indx(d_i.w)$) in our scheme: construct the fuzzy set of $d_i.w$, then generate an index from each keyword of the fuzzy set in the same way as in WriteRecord($MK_{ENT}$, $ek$, $d_i$); finally, $Indx(d_i.w)$ is the set of all these indexes. Likewise, generation of the query for a word $w$ (which may not be the actual keyword intended by the user $u$ due to input error) proceeds in a similar fashion: (1) establish FS($w$, 1); (2) generate a query for each word in FS($w$, 1) using GenQuery($qk_u$, $\cdot$); (3) $q_u(w)$ consists of all these queries.

REFERENCES

[1] R. Chow, et al., Controlling Data in the Cloud: Outsourcing Computation without Outsourcing Control. Proc. IEEE International Conf. on Cloud Computing, pp. 85-90, 2010.
[2] S. Kamara and K. Lauter, Cryptographic Cloud Storage. Proc. Financial Cryptography: Workshop on Real-Life Cryptographic Protocols and Standardization, 2010.
[3] T. Mather, S. Kumaraswamy, and S. Latif, Cloud Security and Privacy: An Enterprise Perspective on Risks and Compliance. O'Reilly Media, 2009.
[4] R. Curtmola, J. Garay, S. Kamara, and R. Ostrovsky, Searchable Symmetric Encryption: Improved Definitions and Efficient Constructions. Proc. ACM Conf. on Computer and Communications Security, CCS'06, pp. 79-88, 2006.
[5] Y. Chang and M. Mitzenmacher, Privacy Preserving Keyword Searches on Remote Encrypted Data. Proc. Applied Cryptography and Network Security, ACNS'05, LNCS 3531, pp. 442-455, 2005.
[6] E. Goh, Secure Indexes. http://crypto.stanford.edu/~eujin/papers/secureindex/secureindex.pdf, 2003.
[7] D. Song, D. Wagner, and A. Perrig, Practical Techniques for Searches on Encrypted Data. Proc. IEEE Symp. on Security and Privacy, S&P'00, pp. 44-55, 2000.
[8] D. Naor, M. Naor, and J. Lotspiech, Revocation and Tracing Schemes for Stateless Receivers. Proc. Advances in Cryptology, Crypto'01, LNCS 2139, pp. 41-62, 2001.
[9] L. Ballard, S. Kamara, and F. Monrose, Achieving Efficient Conjunctive Keyword Searches over Encrypted Data. Proc. International Conf. on Information and Communications Security, ICICS'05, pp. 414-426, 2005.
[10] P. Golle, J. Staddon, and B. Waters, Secure Conjunctive Keyword Search over Encrypted Data. Proc. Applied Cryptography and Network Security, ACNS'04, pp. 31-45, 2004.
[11] Z. Yang, S. Zhong, and R. N. Wright, Privacy-Preserving Queries on Encrypted Data. Proc. Computer Security, ESORICS, LNCS 4189, pp. 479-495, 2006.
[12] C. Wang, et al., Secure Ranked Keyword Search over Encrypted Cloud Data. Proc. IEEE International Conf. on Distributed Computing Systems, ICDCS'10, pp. 253-262, 2010.
[13] D. Boneh, G. di Crescenzo, R. Ostrovsky, and G. Persiano, Public Key Encryption with Keyword Search. Proc. Eurocrypt'04, pp. 506-522, 2004.
[14] M. Abdalla, et al., Searchable Encryption Revisited: Consistency Properties, Relation to Anonymous IBE, and Extensions. Proc. Crypto'05, LNCS 3621, pp. 205-222, 2005.
[15] Y.H. Hwang and P.J. Lee, Public Key Encryption with Conjunctive Keyword Search and Its Extension to a Multi-User System. Proc. International Conf. on Pairing-Based Cryptography, Pairing'07, 2007.
[16] J. Katz, A. Sahai, and B. Waters, Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products. Proc. Advances in Cryptology, EUROCRYPT'08, LNCS 4965, pp. 146-162, 2008.
[17] V. Goyal, O. Pandey, A. Sahai, and B. Waters, Attribute-Based Encryption for Fine-Grained Access Control of Encrypted Data. Proc. ACM Conf. on Computer and Communications Security, CCS'06, pp. 89-98, 2006.
[18] E. Shen, E. Shi, and B. Waters, Predicate Privacy in Encryption Systems. Proc. Theory of Cryptography, TCC'09, LNCS 5444, pp. 457-473, 2009.
[19] M. Li, S. Yu, N. Cao, and W. Lou, Authorized Private Keyword Search over Encrypted Data in Cloud Computing. Proc. International Conf. on Distributed Computing Systems, ICDCS'11, pp. 383-392, 2011.
[20] Y. Yang, F. Bao, X. Ding, and R. H. Deng, Multiuser Private Queries over Encrypted Databases. Journal of Applied Cryptography, Vol. 1(4), pp. 309-319, 2009, Inderscience Publishers.
[21] D. Boneh, B. Lynn, and H. Shacham, Short Signatures from the Weil Pairing. Proc. Asiacrypt'01, LNCS 2248, pp. 514-532, 2001.
[22] J. Li, et al., Fuzzy Keyword Search over Encrypted Data in Cloud Computing. Proc. INFOCOM Mini-Conference, 2010.
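As an illustration of the wildcard-based fuzzy set used in Section V-C, the following minimal sketch enumerates FS($w$, 1) for a plaintext keyword. It is an assumption-laden example rather than the construction of [22]: the function names are hypothetical, only edit distance 1 is handled, and the comparison is done on plaintext sets, whereas in the scheme each element would first be turned into an index or query by WriteRecord or GenQuery.

```python
def wildcard_fuzzy_set(word, wildcard="*"):
    """Wildcard-based fuzzy set of `word` for edit distance 1."""
    fuzzy = {word}
    for i in range(len(word) + 1):
        # A wildcard inserted before position i stands for an edit there.
        fuzzy.add(word[:i] + wildcard + word[i:])
    for i in range(len(word)):
        # A wildcard replacing the character at position i.
        fuzzy.add(word[:i] + wildcard + word[i + 1:])
    return fuzzy


def fuzzy_match(stored_keyword, typed_word):
    """True if the two fuzzy sets share at least one element."""
    return bool(wildcard_fuzzy_set(stored_keyword) & wildcard_fuzzy_set(typed_word))
```

For example, `wildcard_fuzzy_set("CLOUD")` contains the elements listed in Section V-C, and `fuzzy_match("CLOUD", "CLOUT")` holds because both sets contain `CLOU*`. The set grows linearly with the keyword length, which is what keeps the per-keyword index size manageable for distance 1.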
