A Novel Authentication Service for Hadoop in Cloud ... - IEEE Xplore

A Novel Authentication Service for Hadoop in Cloud Environment

S. Rubika, G. Sudha Sadasivam, K. Anitha Kumari
Department of Computer Science and Engineering / Information Technology, PSG College of Technology, Coimbatore, India.
[email protected], [email protected]

Abstract: Authentication remains a significant security challenge in the Hadoop environment. Hadoop does not strongly authenticate the client; as a result, data nodes can be accessed by anyone who knows the block locations. This paper suggests using the fundamental properties of a triangle, together with dual servers, to improve the security level of Hadoop clusters. The password given by the user is interpreted by the Authentication Server, split into more than one unit, and stored in multiple Backend Servers along with the corresponding username. The Authentication Server and the Backend Servers then work together, using the stored values, to authenticate the user. The registration and authentication processes are hosted as a web service that authenticates users before they log into the Hadoop cluster. Three approaches for security enhancement in the Hadoop environment, all based on triangle properties, are proposed, and an analysis of their security level and complexity is presented.

Index Terms: Authentication, Hadoop, Security, Triangle

I. INTRODUCTION

Hadoop [1] is an open source, Java-based framework that supports the processing of large data sets in a distributed computing environment. It was originally conceived on the basis of Google's MapReduce. Authentication remains a significant security challenge in the Hadoop environment: a user can communicate directly with a data node once the block location is known, which enables unauthorized clients to impersonate authorized users and access the cluster. Initially, a password was the only mechanism used to verify the validity of users. Later, the Kerberos protocol [2] was introduced. Kerberos provides mutual authentication between a client and a server: a user requests an encrypted "ticket" from an authentication agent and then uses this ticket to request a particular service from a server. One of the major limitations of Kerberos is that it is not effective against password guessing attacks. Further, Kerberos requires a trusted path to handle passwords and does not support multipart authentication.

PG Student, Department of Information Technology, PSG College of Technology, Coimbatore, India. [email protected]

Authorization, the process of granting access to requested resources, is pointless without suitable authentication. Both the cloud provider and the enterprises must consider the challenges associated with credential management and implement cost-effective solutions that reduce the risk appropriately [4]. Password authentication is considered one of the simplest and most convenient authentication mechanisms [5]. However, password authentication protocols are subject to replay, password guessing and stolen-verifier attacks, as described below [6].

(1) Replay attack: A replay attack is an offensive action in which an adversary impersonates or deceives a legitimate participant by reusing information obtained from an earlier run of the protocol.

(2) Guessing attack: In a guessing attack, an adversary tries passwords one at a time, randomly or systematically, in the hope of finding the correct one. Selecting passwords from an adequately large space resists exhaustive password searches. However, the majority of users choose passwords from a small subset of the full password space. Such weak, low-entropy passwords are easily guessed by means of a dictionary attack [7].

(3) Stolen-verifier attack: In the majority of applications, the server stores verifiers of users' passwords (e.g., hashed passwords) instead of the clear text of the passwords. In a stolen-verifier attack, an adversary who steals the password verifier from the server can use it directly to masquerade as a legitimate user during user authentication [7].

II. AUTHENTICATION SYSTEM ARCHITECTURES

Architectures for authentication systems fall under four categories, as described below.

a) Single-server model: A single server (Fig. 1) maintains a database of user passwords. Most existing authentication systems follow this model. Its main drawback is that the server is a single point of vulnerability: if it is compromised, the user password database is exposed to offline dictionary attacks.

978-1-4673-4422-7/12/$31.00 ©2012 IEEE

b) Simple multi-server model: In the simple multi-server model (Fig. 2), the server side comprises multiple servers in order to overcome the single point of vulnerability. The servers are equally exposed to users, and a user has to communicate in parallel with several or all servers for authentication. The main problems with this model are the demand on communication bandwidth and the need for synchronization at the user side, since a user has to engage in simultaneous communications with multiple servers.

c) Gateway augmented multi-server model: In the gateway augmented multi-server model (Fig. 3), a gateway is positioned as a relaying point between users and servers, and a user only needs to contact the gateway. The gateway removes the need for a user to communicate simultaneously with multiple servers, as required in the simple multi-server model. However, the gateway introduces a redundant layer in the architecture, whose only role is to relay messages between users and servers. Gateways also reduce system reliability.

d) Two-server model: The two-server model (Fig. 4) comprises two servers at the server side: a public server that is exposed to users and a back-end server that stays behind the scenes. Users contact only the public server, but the two servers work together to authenticate users.

We propose to use a modified version of the two-server model for our authentication framework. It consists of an authentication server and multiple backend servers that store the split portions of the password. Hence it combines the advantages of the multi-server model and the two-server model. From a security point of view, all servers in the multi-server models are equally exposed to outside attackers, while in the two-server model only the public server is exposed. This improves the server-side security and, in turn, the overall system security of the two-server model.

This model also eliminates the simultaneous communications between a user and multiple servers required by the simple multi-server model, and the redundancy of the gateway augmented multi-server model [8, 9, 10]. It distributes user passwords and the authentication functionality across two servers in order to eliminate the single point of vulnerability of the single-server model. As a result, the two-server model appears to be a sound model for practical applications [11, 12].
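As a rough illustration of the two-server flow described above, the following sketch shows a public authentication server that is the only component a user contacts, with two hidden backend stores that each hold one share of a password verifier. The split used here is plain XOR secret sharing of a SHA-256 digest, chosen only to keep the example short; it is not the triangle-based split proposed in this paper, and the class and method names (`BackendServer`, `AuthenticationServer`, `put`, `get`) are hypothetical.

```python
import hashlib
import hmac
import secrets


class BackendServer:
    """Hidden backend store (hypothetical); it only ever sees one share."""

    def __init__(self):
        self._store = {}

    def put(self, user, share):
        self._store[user] = share

    def get(self, user):
        return self._store.get(user)


class AuthenticationServer:
    """Public server: the only component the user talks to."""

    def __init__(self, bs1, bs2):
        self.bs1 = bs1
        self.bs2 = bs2

    @staticmethod
    def _digest(user, password):
        # Assumption: a plain SHA-256 digest stands in for the password verifier.
        return hashlib.sha256(f"{user}:{password}".encode("utf-8")).digest()

    def register(self, user, password):
        digest = self._digest(user, password)
        share1 = secrets.token_bytes(len(digest))               # random share for BS1
        share2 = bytes(a ^ b for a, b in zip(digest, share1))   # complementary share for BS2
        self.bs1.put(user, share1)
        self.bs2.put(user, share2)

    def authenticate(self, user, password):
        share1 = self.bs1.get(user)
        share2 = self.bs2.get(user)
        if share1 is None or share2 is None:
            return False
        # Both shares are needed to rebuild the verifier.
        stored = bytes(a ^ b for a, b in zip(share1, share2))
        return hmac.compare_digest(stored, self._digest(user, password))
```

In this sketch, either share on its own is indistinguishable from random bytes, which mirrors the claim that compromising a single backend server does not reveal the password.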

Fig. 1. Single-server model

Fig. 2. Plain multi-server model

Fig. 3. Gateway augmented multi-server model

Fig. 4. Two-server model

Fig. 5. Proposed architecture

The proposed authentication protocol enhances security by using an authentication server and two backend servers. The authentication server is exposed to the user, while the backend servers that store portions of the password are hidden from users. In this protocol, the password is interpreted based on the properties of a triangle and split into more than one unit. These units are stored in two different backend servers. The user obtains the privilege of accessing the requested resources only when the combined authentication scheme across the servers authenticates the user. The dual server authentication protocol authenticates the Hadoop user if and only if both servers are involved in the authentication mechanism, so it is not possible to obtain the password by compromising a single server. Because user authentication is a combined mechanism of two servers, the triangle-property-based protocol offers effective security against replay, guessing and stolen-verifier attacks.

The remainder of the paper is organized as follows: Section III reviews existing research, Section IV describes the proposed approaches, Section V presents the security and complexity analysis, and Section VI concludes the paper.

III. RELATED WORK

Kang and Zhang [13] proposed an Identity-Based Authentication (IBA) scheme as an alternative to traditional mutual authentication. In cloud storage sharing, mutual authentication between users, and between a user and the cloud environment, is critical in ensuring data security. However, traditional mutual authentication using public-key operations increases the load on the cloud storage system, adds computation and communication overhead, and reduces scalability. The IBA scheme has a short key size, is identity-based and is non-interactive. It divides the sharing users into domains; within a domain, a global master key is shared to perform mutual authentication. According to the performance analysis, the scheme improves computational and communication efficiency by more than a factor of two. The scheme is enabled by a cryptographic technique based on bilinear pairing, and its security rests on the Bilinear Diffie-Hellman Problem (BDHP). In the IBA scheme, the master key of a domain becomes the bottleneck of the cloud storage system's security: once the master key of a domain is leaked, the security of that domain is broken. In addition, if a user wants to share another user's data, both users must be in the same domain.

Shen and Tong [14] proposed a Trusted Computing Platform (TCP) to aid the process of authentication in cloud computing. The TCP is based on the Trusted Platform Module (TPM), a logically independent hardware module that can resist both software and hardware attacks. The TPM contains a private master key that protects other information stored in the cloud computing system. Because the hardware certificate is stored in the TPM, it is hard to attack, so the TPM provides the root of trust for users. Since users have full information about their identity, the cloud computing system can use a tracing mechanism to identify users and their origin.

In the TCP, the user's identity is proved by the user's personal key, and this mechanism is integrated in the hardware, such as the BIOS and the TPM, so it is very hard to forge a user ID. Each site in the cloud computing system records the visitor's information, and the cloud computing trace mechanism can use the TCP to trace participants. The TCP provides cloud computing with a secure base for achieving trusted computing. However, integrating hardware modules with a cloud computing system remains a challenging research issue.

Hadoop currently uses the Kerberos protocol [2] for its authentication. When organizations begin to use applications in the cloud, authenticating users in a trustworthy and manageable manner becomes an additional challenge. Our proposed dual server authentication protocol uses dual servers for authentication in order to enhance cloud security. The significance of this protocol is its use of the fundamental concepts and basic elements of the triangle for authentication.

IV. PROPOSED APPROACH

Three approaches are proposed. Of these, two are symmetric and one is asymmetric. The proposed approaches use the fundamental concepts of triangles for authentication. In the first approach, the password is interpreted in the form of angles between the medians of a triangle and stored in the servers along with a randomized parameter called the strengthening parameter. In the second approach, the centroid calculated from the medians stored during the registration process is verified against the centroid calculated from the triangle vertices during the authentication process. The third approach is based on the property of the Euler line, on which the circumcenter, orthocenter and centroid lie. During registration, two of these points are evaluated and stored. This is verified by calculating the Euler line from another pair of points during the authentication process. Finally, an analysis of the security and complexity of the three approaches is carried out. The entire authentication process is hosted as a web service that authenticates users before they log into the Hadoop cluster.

APPROACH 1: This approach is based on the properties of the medians of a triangle. Details of the registration process and authentication process are as follows.

A. REGISTRATION PROCESS: The steps in registration are given below.
1) The user logs into the Authentication Server (AS) by entering the username and password.
2) Generate the ASCII value for the password in the AS.
3) Split the ASCII value into three equal parts m1, m2 and m3.
4) Generate random numbers c1, c2 and c3 (the strengthening parameters). The sides of the triangle AB, BC and CA are y1 = m1x1 + c1, y2 = m2x2 + c2 and y3 = m3x3 + c3 respectively.
5) Find the intersections of these lines to obtain the vertices (A, B, C) of the triangle.
6) Evaluate the medians of the triangle BF, CE and AD.
7) Calculate the angles θ1, θ2, θ3 between the medians BF and CE, AD and BF, and AD and CE respectively.
8) Store the username along with the strengthening parameters in backend server 1 (BS1) and the θ values in backend server 2 (BS2).

B. AUTHENTICATION PROCESS: The steps in the authentication process (Fig. 6) are as follows.
1) The user enters the username and password in the authentication screen.
2) The ASCII value of the password is split into three equal parts m1, m2 and m3.
3) Retrieve c1, c2 and c3 from the strengthening parameters stored in BS1.
4) Calculate the medians of the triangle.
5) Calculate the angles θ1, θ2, θ3 between the medians.
6) Compare the calculated θ values with the θ values stored in BS2.
7) If the θ values match, the user is authenticated.
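The Approach 1 steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not say how the ASCII value is split, so the sketch assumes each third of the password is reduced to the sum of its ASCII codes; the backend servers are modelled as plain dictionaries `bs1` and `bs2`; and all function names are hypothetical.

```python
import math
import secrets
from fractions import Fraction


def password_to_slopes(password):
    """Assumed encoding: sum the ASCII codes of each third of the password."""
    codes = [ord(ch) for ch in password]
    n = len(codes)
    if n < 3:
        raise ValueError("password too short to split into three parts")
    bounds = [i * n // 3 for i in range(4)]
    slopes = tuple(Fraction(sum(codes[bounds[i]:bounds[i + 1]])) for i in range(3))
    if len(set(slopes)) < 3:
        raise ValueError("two sides are parallel; no triangle is formed")
    return slopes


def intersect(m_a, c_a, m_b, c_b):
    """Exact intersection of y = m_a*x + c_a and y = m_b*x + c_b."""
    x = (c_b - c_a) / (m_a - m_b)
    return (x, m_a * x + c_a)


def triangle_vertices(slopes, intercepts):
    (m1, m2, m3), (c1, c2, c3) = slopes, intercepts
    a = intersect(m3, c3, m1, c1)  # side CA meets side AB
    b = intersect(m1, c1, m2, c2)  # side AB meets side BC
    c = intersect(m2, c2, m3, c3)  # side BC meets side CA
    return a, b, c


def angle_between(u, v):
    """Unsigned angle in [0, pi] between two direction vectors."""
    cross = float(u[0] * v[1] - u[1] * v[0])
    dot = float(u[0] * v[0] + u[1] * v[1])
    return math.atan2(abs(cross), dot)


def median_angles(vertices):
    a, b, c = vertices
    mid = lambda p, q: ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    vec = lambda p, q: (q[0] - p[0], q[1] - p[1])
    ad = vec(a, mid(b, c))
    bf = vec(b, mid(c, a))
    ce = vec(c, mid(a, b))
    # theta1: BF and CE, theta2: AD and BF, theta3: AD and CE
    return (angle_between(bf, ce), angle_between(ad, bf), angle_between(ad, ce))


def register(user, password, bs1, bs2):
    slopes = password_to_slopes(password)
    while True:
        intercepts = tuple(secrets.randbelow(1000) for _ in range(3))
        vertices = triangle_vertices(slopes, intercepts)
        if len(set(vertices)) == 3:  # reject three concurrent lines
            break
    bs1[user] = intercepts               # strengthening parameters
    bs2[user] = median_angles(vertices)  # theta values


def authenticate(user, password, bs1, bs2):
    if user not in bs1 or user not in bs2:
        return False
    try:
        slopes = password_to_slopes(password)
    except ValueError:
        return False
    vertices = triangle_vertices(slopes, bs1[user])
    if len(set(vertices)) < 3:
        return False
    angles = median_angles(vertices)
    return all(math.isclose(x, y, abs_tol=1e-9) for x, y in zip(angles, bs2[user]))
```

One property of this sketch is worth noting: shifting an intercept translates a side without rotating it, so the three median angles are determined by the slopes alone.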

Fig. 6. Approach 1: Registration and Authentication Process

APPROACH 2: This is an asymmetric approach based on the centroid properties of a triangle. The registration and authentication processes are described as follows (Fig. 7).

A. REGISTRATION PROCESS
1) The user logs into the Authentication Server (AS) by entering the username and password.
2) Generate the ASCII value for the password.
3) Calculate the vertices of the triangle.
4) Calculate the medians of the triangle from the vertices.
5) Store the slopes (m) of the medians in BS1 and the intercepts (c) of the medians in BS2.
6) Store the username in a separate file in the authentication server.

B. AUTHENTICATION PROCESS
1) The user enters the username and password in the authentication screen.
2) Generate the ASCII value for the password.
3) Calculate the vertices of the triangle.
4) Calculate the centroid (c1) of the triangle from the triangle vertices.
5) Calculate the centroid (c2) from the m and c values stored in the backend servers BS1 and BS2.
6) The user is authenticated when c1 and c2 match.

Fig. 7. Approach 2: Registration and Authentication Process
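The Approach 2 registration and authentication steps can be sketched as follows. The paper does not specify how the triangle vertices are derived from the ASCII value, so this sketch assumes that each third of the password gives one vertex (sum of codes, sum of squared codes). The dictionaries `bs1` and `bs2` stand in for the backend servers, and all function names are hypothetical. Exact rational arithmetic is used so that the two centroids can be compared for equality.

```python
from fractions import Fraction


def password_to_vertices(password):
    """Assumed encoding: each third of the password gives one vertex
    (sum of ASCII codes, sum of squared ASCII codes)."""
    codes = [ord(ch) for ch in password]
    n = len(codes)
    if n < 3:
        raise ValueError("password too short to form three vertices")
    bounds = [i * n // 3 for i in range(4)]
    chunks = [codes[bounds[i]:bounds[i + 1]] for i in range(3)]
    return [(Fraction(sum(ch)), Fraction(sum(v * v for v in ch))) for ch in chunks]


def median_line(vertex, p, q):
    """Slope and intercept of the median from `vertex` to the midpoint of pq."""
    mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
    dx = mid[0] - vertex[0]
    if dx == 0:
        raise ValueError("vertical median has no finite slope")
    m = (mid[1] - vertex[1]) / dx
    return m, vertex[1] - m * vertex[0]


def centroid_from_vertices(vertices):
    xs = sum(v[0] for v in vertices)
    ys = sum(v[1] for v in vertices)
    return (xs / 3, ys / 3)


def centroid_from_medians(slopes, intercepts):
    """Intersect the first two stored median lines."""
    m1, m2 = slopes[0], slopes[1]
    k1, k2 = intercepts[0], intercepts[1]
    x = (k2 - k1) / (m1 - m2)
    return (x, m1 * x + k1)


def register(user, password, bs1, bs2):
    a, b, c = password_to_vertices(password)
    lines = [median_line(a, b, c), median_line(b, c, a), median_line(c, a, b)]
    if lines[0][0] == lines[1][0]:
        raise ValueError("collinear vertices; no triangle is formed")
    bs1[user] = [m for m, _ in lines]  # slopes go to BS1
    bs2[user] = [k for _, k in lines]  # intercepts go to BS2


def authenticate(user, password, bs1, bs2):
    if user not in bs1 or user not in bs2:
        return False
    try:
        vertices = password_to_vertices(password)
    except ValueError:
        return False
    c1 = centroid_from_vertices(vertices)
    c2 = centroid_from_medians(bs1[user], bs2[user])
    return c1 == c2
```

In this sketch, a password whose assumed encoding yields a vertical median or collinear vertices is rejected at registration, which a real deployment would need to handle differently.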

APPROACH 3: This approach is based on the properties of the Euler line: the circumcenter, centroid and orthocenter of a triangle are collinear on the Euler line. During the registration process, the authentication server calculates the circumcenter and centroid and stores them in backend servers BS1 and BS2. During the authentication process, the orthocenter and centroid are calculated. The user is authenticated when the Euler line formed by the circumcenter and centroid matches the Euler line formed by the orthocenter and centroid (Fig. 8).
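The Euler line check of Approach 3 can be sketched as follows. As in the other sketches, the mapping from password to vertices is an assumption (each third of the password gives a vertex made of the sum and the sum of squares of its ASCII codes), `bs1` and `bs2` are dictionaries standing in for the backend servers, and the function names are hypothetical. The sketch treats the two Euler lines as matching when the orthocenter and centroid computed at login both lie on the line through the stored circumcenter and centroid.

```python
from fractions import Fraction


def password_to_vertices(password):
    """Assumed encoding: each third of the password gives one vertex
    (sum of ASCII codes, sum of squared ASCII codes)."""
    codes = [ord(ch) for ch in password]
    n = len(codes)
    if n < 3:
        raise ValueError("password too short to form three vertices")
    bounds = [i * n // 3 for i in range(4)]
    chunks = [codes[bounds[i]:bounds[i + 1]] for i in range(3)]
    return [(Fraction(sum(ch)), Fraction(sum(v * v for v in ch))) for ch in chunks]


def centroid(a, b, c):
    return (Fraction(a[0] + b[0] + c[0]) / 3, Fraction(a[1] + b[1] + c[1]) / 3)


def circumcenter(a, b, c):
    (ax, ay), (bx, by), (cx, cy) = [tuple(map(Fraction, p)) for p in (a, b, c)]
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("collinear vertices; no triangle is formed")
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy)


def orthocenter(a, b, c):
    """Intersect the altitude from A (perpendicular to BC) with the one from B."""
    (ax, ay), (bx, by), (cx, cy) = [tuple(map(Fraction, p)) for p in (a, b, c)]
    e1 = (cx - bx, cy - by)
    r1 = e1[0] * ax + e1[1] * ay
    e2 = (ax - cx, ay - cy)
    r2 = e2[0] * bx + e2[1] * by
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("collinear vertices; no triangle is formed")
    return ((r1 * e2[1] - e1[1] * r2) / det, (e1[0] * r2 - r1 * e2[0]) / det)


def on_line(p, q, r):
    """True when r lies on the line through p and q (exact test)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0


def register(user, password, bs1, bs2):
    a, b, c = password_to_vertices(password)
    o, g = circumcenter(a, b, c), centroid(a, b, c)
    if o == g:
        raise ValueError("Euler line is undefined for this triangle")
    bs1[user] = o  # circumcenter goes to BS1
    bs2[user] = g  # centroid goes to BS2


def authenticate(user, password, bs1, bs2):
    if user not in bs1 or user not in bs2:
        return False
    try:
        a, b, c = password_to_vertices(password)
        h = orthocenter(a, b, c)
    except ValueError:
        return False
    g = centroid(a, b, c)
    o_stored, g_stored = bs1[user], bs2[user]
    return on_line(o_stored, g_stored, h) and on_line(o_stored, g_stored, g)
```

For a quick geometric sanity check, the right triangle with vertices (0, 0), (4, 0) and (0, 3) has its orthocenter at the right-angle vertex and its circumcenter at the midpoint of the hypotenuse.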

Fig. 8. Approach 3: Registration and Authentication Process

V. PERFORMANCE ANALYSIS

This section presents the security analysis and the complexity analysis of the proposed system.

A. SECURITY ANALYSIS: The common threats faced by the authentication process in a cloud environment, and the countermeasures that the triangle-property-based protocols provide against them, are as follows.

a) Replay attack: A replay attack is often mounted from a man-in-the-middle position: the adversary sits between the user and the server and captures the user credentials when the user contacts the server. To overcome this, the user would have to change the credentials frequently, which is unlikely in practice. Our protocol is robust when the replay attack happens between the two servers, as the credentials are interpreted and split into two parts.

b) Guessing attack: In a guessing attack, the adversary contacts the servers with randomly guessed credentials. An effective countermeasure is to choose a password with as many characters as possible, so that the probability of guessing the correct password is reduced. As the proposed work uses randomly generated prime numbers to represent the intercepts of the sides of the triangle, it is more difficult to guess the password.

c) Stolen-verifier attack: Instead of storing the original password, the server normally stores a verifier of the password. If an attacker steals the verifier from the server, the attacker can masquerade as the legitimate user. This does not happen in a two-server protocol, as the password is split into two modules. Hence our protocol is also more robust against this attack, as the password is interpreted, split into two modules and stored in the two servers.

B. COMPLEXITY ANALYSIS: The communication and computation complexity of the proposed system has been analysed. The number of bits communicated between the User and AS, AS and BS1, and AS and BS2 is presented in rows 1, 2 and 3 of TABLE I for the three approaches. The computational complexity of the three approaches is presented in row 4. The number of rounds of communication between the User and AS, AS and BS1, and AS and BS2 is presented in row 5; all approaches show the same performance with regard to the number of rounds of communication.

TABLE I. Complexity Analysis of the proposed approaches

Row  Metric                                        User->AS              AS->BS1                                   AS->BS2
                                                   Reg        Auth       Reg                  Auth                 Reg                  Auth
1    Communication (bits), Approach 1              |u|+|pwd|  |u|+|pwd|  |u|+|c1|+|c2|+|c3|   |u|+|c1|+|c2|+|c3|   |u|+|θ1|+|θ2|+|θ3|   |u|+|θ1|+|θ2|+|θ3|
2    Communication (bits), Approach 2              |u|+|pwd|  |u|+|pwd|  |u|+|slope|          |u|+|slope|          |u|+|intercepts|     |u|+|intercepts|
3    Communication (bits), Approach 3              |u|+|pwd|  |u|+|pwd|  |u|+|circum|         |u|+|circum|         |u|+|centroid|       |u|+|centroid|
4    Computation (exponentiation), Approaches 1-3  O(n)       O(n)       O(n)                 O(n)                 O(n)                 O(n)
5    Communication (rounds), Approaches 1-3        1          2          1                    2                    1                    2

Here 'n' is the number of bits in the password and |a| is the length of 'a'.

VI. CONCLUSION

This paper proposed three approaches using dual servers and the properties of triangles. The proposed authentication protocols enhance Hadoop security because the authentication mechanism uses two servers. As the servers keep the user credentials in an interpreted and separated form, there is little chance of the credentials being revealed to an adversary. Moreover, the use of the fundamental properties of the triangle makes the protocol more secure, as the split passwords are derived from these triangle parameters. The generation of random numbers further improves the security level, as they cannot be easily guessed. The use of this protocol will therefore make the Hadoop environment more secure.

ACKNOWLEDGEMENT

We are thankful to Mr Chidambaran Kollengode, Director, Cloud and Big Data Analytics group, and other team members for providing the required guidance to complete the project. We thank Dr R Rudramoorthy, Principal, PSG College of Technology, for providing the required support. This project is a result of the PSG-Nokia Research agreement on Big Data Analytics.

REFERENCES

[1] Tom White, "Hadoop: The Definitive Guide", O'Reilly, 2009.
[2] Owen O'Malley, "Integrating Kerberos into Apache Hadoop", Kerberos Conference 2010, 26-27 October 2010, MIT, USA.
[3] Mark Brunet, "Perfect Password: Selection, Protection, Authentication", Syngress, 2005.
[4] Cloud Security Alliance, "Domain 12: Guidance for Identity & Access Management V2.1", April 2010.
[5] Her-Tyan Yeh, Hung-Min Sun and Tzonelih Hwang, "Efficient Three-Party Authentication and Key Agreement Protocols Resistant to Password Guessing Attacks", Journal of Information Science and Engineering, vol. 19, no. 6, pp. 1059-1070, 2003.
[6] C.L. Lin and T. Hwang, "A password authentication scheme with secure password updating", Computers & Security, vol. 22, no. 1, pp. 68-72, 2003.
[7] Eun-Jun Yoon, Eun-Kyung Ryu and Kee-Young Yoo, "Attacks and Solutions of Yang et al.'s Protected Password Changing Scheme", Informatica, vol. 16, no. 2, pp. 285-294, April 2005.
[8] Yanjiang Yang and Feng Bao, "Enabling Use of Single Password Over Multiple Servers in Two-Server Model", 2010 10th IEEE International Conference on Computer and Information Technology (CIT 2010).
[9] Dexin Yang and Bo Yang, "A Novel Two-Server Password Authentication Scheme with Provable Security", 2010 10th IEEE International Conference on Computer and Information Technology (CIT 2010).
[10] Jun Ho Lee and Dong Hoon Lee, "Secure and Efficient Password-based Authenticated Key Exchange Protocol for Two-Server Architecture", 2007 International Conference on Convergence Information Technology.
[11] Yanjiang Yang, Robert H. Deng and Feng Bao, "A Practical Password-Based Two-Server Authentication and Key Exchange System", International Journal of Dependable and Secure Computing, vol. 3, no. 2, April-June 2006.
[12] Yanjiang Yang and Feng Bao, "Enabling Use of Single Password Over Multiple Servers in Two-Server Model", 2010 10th IEEE International Conference on Computer and Information Technology (CIT 2010).
[13] Lishan Kang and Xuejie Zhang, "Identity-Based Authentication in Cloud Storage Sharing", 2010 International Conference on Multimedia Information Networking and Security.
[14] Zhidong Shen and Qiang Tong, "The Security of Cloud Computing System enabled by Trusted Computing Technology", 2010 2nd International Conference on Signal Processing Systems.
[15] V. Ruckmani and G. Sudha Sadasivam, "A Novel Trigon-Based Dual Authentication Protocol for Enhancing Security in Grid Environment", International Journal of Computer Science and Information Security, vol. 6, no. 3, 2009.