2015 IEEE Asia Pacific Conference on Wireless and Mobile

Selecting Key Generating Elliptic Curves for Privacy Preserving Association Rule Mining (PPARM)

Amiruddin
Department of Electrical Engineering, Faculty of Engineering, Universitas Indonesia, Depok, Jawa Barat, Indonesia

Riri Fitri Sari
Department of Electrical Engineering, Faculty of Engineering, Universitas Indonesia, Depok, Jawa Barat, Indonesia

Abstract: Privacy Preservation in Data Mining (PPDM), including Privacy Preserving Association Rule Mining (PPARM), has attracted much attention in recent research and practice. However, current methods still have drawbacks in that they trade off efficiency against privacy preservation. This paper describes our work towards a new, efficient PPARM protocol. We reviewed the current literature on PPARM and mapped the methods and approaches involved. Because previous research showed that Elliptic Curve Cryptography (ECC) performs better than other public-key systems such as RSA and Diffie-Hellman, we will use ECC to reduce the computational cost of the new PPARM protocol. To choose good elliptic curves for ECC, we measured the running time of key generation for various groups of recommended elliptic curves, i.e. Brainpool curves (by Brainpool), Prime, C2pnb, and C2tnb curves (by ANSI X9.62), Secp curves (by SECG), and PrimeCurve curves (by CDC Group). As a result, Secp curves outperformed all of the other curves on the overall average ratio of running time to key size for key generation, by 4.4% up to 357.6%.
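The key-generation measurement described above can be illustrated with a minimal sketch. It is not the authors' benchmark: the paper used Java with the FlexiProvider library, whereas this sketch swaps in a small pure-Python implementation of a single SECG curve, secp256k1, so absolute timings are not comparable to the paper's results. The function names (`generate_keypair`, `average_keygen_seconds`) and the number of rounds are assumptions made for illustration only.

```python
import secrets
import time

# secp256k1 domain parameters (SECG): y^2 = x^3 + 7 over the prime field F_P.
P = 2**256 - 2**32 - 977
A, B = 0, 7
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141


def point_add(p1, p2):
    """Add two affine points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the negation of p1 (covers doubling with y = 0)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P  # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication (not constant-time; sketch only)."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def generate_keypair():
    """Return (private scalar d, public point d*G) with d in 1..N-1."""
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mult(d, G)


def average_keygen_seconds(rounds=20):
    """Average wall-clock time of one key-pair generation over `rounds` runs."""
    start = time.perf_counter()
    for _ in range(rounds):
        generate_keypair()
    return (time.perf_counter() - start) / rounds
```

Dividing `average_keygen_seconds()` by the key size in bits (256 here) gives a time-per-key-size ratio in the spirit of the comparison reported in the abstract; comparing several curves would require repeating the measurement with each curve's own domain parameters.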

Keywords: Association rule mining, Elliptic Curve Cryptography, Privacy-Preserving, Multiparty Computation

I. INTRODUCTION

Rapid developments in technology, handheld device applications, and machine-to-machine (M2M) communication, as well as the emergence of mobile/online social networks, have triggered the generation of large volumes of data known as Big Data. According to IBM, as stated in [1], humans produce 2.5 quintillion bytes of data every day, and 90% of the data in the world today was generated in the last two years alone. The abundance of Big Data has attracted growing attention from government, business, and academia. Big Data has great potential for spotting trends, patterns, and relationships, and for predicting future states. To make the best use of Big Data, we need appropriate techniques, methods, and tools to mine it for knowledge that supports decision-making and can benefit economic growth and technical innovation. Data mining is an important tool for transforming data into information or knowledge [2]. However, data mining does not always process open or plain data; it also processes closed or confidential data. The problem in mining such confidential data is how to carry out the mining process while maintaining the confidentiality of the data (privacy preservation), especially if the data is owned by several parties or agencies. This problem has given rise to a new branch of data mining called Privacy-Preserving Data Mining (PPDM). We reviewed PPDM, especially Privacy-Preserving Association Rule Mining (PPARM), mapping the current methods and approaches used for different goals and environments. We will use Elliptic Curve Cryptography (ECC) to propose a new, efficient PPARM protocol. To choose good elliptic curves for the ECC implementation, we measured the key-generation performance of various recommended elliptic curves in a Java implementation using the library provided by FlexiProvider [3].

II. LITERATURE REVIEW

This section presents a literature review on PPARM and ECC.

A. PPARM

This section reviews PPARM in more detail. PPARM, as one of the tasks of PPDM, is a technique for discovering data patterns by analyzing the relationships (link analysis) among data. Collaborative computing can involve two or more parties (multiparty). Zhang et al. [4] proposed a two-party protocol for PPDM in the category of association rule mining (PPARM), applied to horizontally partitioned, distributed data. The study contributes to previous related studies by other researchers and gives a real application of that earlier research. The adversary model applied in the research is semi-honest, where the parties involved follow the protocol but try to learn additional information about the other party's data. The best model to apply is the malicious model, which tries to prevent any possible attack against the


protocol, although it requires expensive and complex techniques. Tassa [5] proposed a secure protocol for mining association rules on horizontally partitioned databases. The protocol was developed based on the Fast Distributed Mining (FDM) algorithm developed by Cheung et al., as stated in [5]. It improved on the Unifi-KC protocol proposed by Kantarcioglu and Clifton [6]. The protocol contributed two new algorithms for secure multiparty computation, i.e. a Threshold algorithm to compute the union of the private subsets held by each participant, and a SetInc algorithm for testing the inclusion of an element held by one party in the subset held by another party. The protocol was claimed to offer better privacy than Unifi-KC and to be simpler and more efficient in terms of communication rounds, communication cost, and computation cost.

Lou et al. [7] improved the Mining Associations with Secrecy Konstraints (MASK) algorithm proposed by Rizvi and Haritsa in 2002 for Privacy Preserving Association Rule Mining (PPARM). The algorithm used Randomized Disturbance and Reconstruction of Distributions to achieve privacy preservation, but the analysis showed that its time efficiency was still low. Using a single technique in PPARM performs poorly, which suggests using a hybrid approach. Lou et al. [7] proposed an algorithm based on a combination of Data Perturbation and Query Restriction (DPQR). The experimental results showed that the DPQR algorithm was much better than the MASK algorithm in terms of privacy preservation, i.e. each of > 90% and

Second, combine the rule sets generated in the previous stage. Third, mine the combined rule set to generate a decision tree model.

Zhang et al. [4] studied PPARM in a two-party environment, noting that some existing PPARM techniques rely on Secure Multiparty Summary and Union Computation but do not guarantee security when they involve two or more parties. Zhang et al. [4] proposed a technique that uses commutative encryption to design a secure division computation protocol for Privacy Preserving Distributed Association Rule Mining.

Giannotti et al. [10] studied the data mining outsourcing problem by proposing an attack model and a privacy-preserving scheme for association rules (AR). They defined the attack models, built an encryption/decryption scheme that transforms the data before it is sent to the server, proposed a compact structure called a synopsis, and conducted formal and experimental analysis with real datasets. The experimental results showed that their proposed scheme is effective, scalable, and privacy-preserving. Lai et al. [11] also studied data mining in the cloud, where data owners can outsource the data mining process. In such a setting, both the sensitive data and the mining results should be protected. For this problem, they provided the first semantically secure solution for categorical data.

Jung et al. [12] proposed an algorithm for PPARM on the Hadoop framework that adds dummy items as noise to the original transaction data. The experimental results showed that the proposed algorithm sufficiently prevents security violations, but it slightly reduces the performance of the mining process.

Iqbal et al. [13] applied PPDM to another application, XML documents. The proposed model is based on the Bayesian Network (BN). The study started from the question of how to find sensitive rules and how to count the number that can be called reliable. The BN approach, which uses the K2 algorithm and supports the Apriori algorithm for AR, can answer both questions.
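The commutative encryption mentioned above for Zhang et al. [4] rests on one property: encrypting with two keys gives the same result in either order. The following minimal sketch illustrates that property with exponentiation modulo a prime (a Pohlig-Hellman style cipher). It is not the protocol of [4]; the modulus, the SHA-256 item encoding, and the helper names (`keygen`, `encode`, `common_itemsets`) are assumptions chosen for illustration, and both parties are simulated in a single process.

```python
import hashlib
import math
import secrets

# Illustrative modulus: the Mersenne prime 2^521 - 1. Not a vetted parameter choice.
P = 2**521 - 1


def keygen():
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # uniform in 2..P-2
        if math.gcd(k, P - 1) == 1:
            return k


def encode(item):
    """Map an itemset label to an integer in 2..P-1 via SHA-256."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return h % (P - 2) + 2


def encrypt(value, key):
    """E_k(m) = m^k mod P, so E_a(E_b(m)) == E_b(E_a(m))."""
    return pow(value, key, P)


def decrypt(value, key):
    """Invert encrypt() using the inverse exponent modulo P - 1."""
    return pow(value, pow(key, -1, P - 1), P)


def common_itemsets(items_a, items_b):
    """Find itemsets held by both parties by comparing doubly encrypted values."""
    ka, kb = keygen(), keygen()  # party A's key and party B's key
    # A encrypts its own items first, then B adds its layer.
    double_a = {encrypt(encrypt(encode(x), ka), kb): x for x in items_a}
    # B encrypts its own items first, then A adds its layer.
    double_b = {encrypt(encrypt(encode(y), kb), ka) for y in items_b}
    # Equal items yield equal double encryptions regardless of layer order.
    return {x for c, x in double_a.items() if c in double_b}
```

For example, `common_itemsets({"bread,milk", "beer"}, {"beer", "tea"})` returns `{"beer"}`. In a real two-party setting each side would see only the other side's encrypted values, never the plain itemsets.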