POSTER PAPER International Journal of Recent Trends in Engineering, Vol. 1, No. 2, May 2009

Checkpointing Using Mobile Agents for Mobile Computing System

Chandreyee Chowdhury, Sarmistha Neogy
Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Email: {chandreyee.chowdhury,sarmisthaneogy}@gmail.com

concurrently on MHs and MSSs in the network. The processes communicate asynchronously with each other. A process undergoes a sequence of state transitions during its execution, and the atomic action that causes a state transition is called an event. An event that involves no interaction with another process is called an internal event; message sending and receipt are external events. A computation is a sequence of state transitions within a process. Mobile IP is used as the underlying protocol for message transmission, so the Home Agent (HA) and Foreign Agent (FA) are each treated as an MSS.

The huge computing potential of these systems is often hampered by transient and independent failures. To provide reliability despite the imposed constraints, checkpointing using mobile agents can be used. A mobile agent is a composition of software and data that is able to migrate from one system to another, all under its own control. It can be viewed as a distributed abstraction layer that provides the concepts and mechanisms for mobility and communication [2]. Here it is used to manage the checkpoints and message logs during recovery.

A local checkpoint is a recorded state of a process. A global checkpoint is a set of local checkpoints, one from each process in a distributed system [3]. A consistent global checkpoint is one in which every message that has been received is also shown to have been sent in the corresponding state of the sender. Rollback recovery can be either checkpoint-based or log-based [4]. In checkpoint-based rollback recovery, recovery relies solely on saved checkpoints. In log-based rollback recovery, both checkpointing and logging are used [5]. In this paper, we combine these two techniques.
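The consistency condition defined above can be illustrated with a small check. This is only a sketch under simplifying assumptions that are not part of the paper: each local checkpoint is modelled as the sets of message ids the process has sent and received so far, message ids are globally unique, and the names `LocalCheckpoint` and `is_consistent` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LocalCheckpoint:
    """Recorded state of one process (assumed model: ids of messages sent/received)."""
    process: str
    sent: set = field(default_factory=set)
    received: set = field(default_factory=set)


def is_consistent(global_checkpoint):
    """True if every message recorded as received is also recorded as sent."""
    all_sent = set()
    for cp in global_checkpoint:
        all_sent |= cp.sent
    return all(cp.received <= all_sent for cp in global_checkpoint)


p = LocalCheckpoint("P", sent={"m1"})
q = LocalCheckpoint("Q", received={"m1"})
# m2 is recorded as received, but no checkpoint records it as sent (orphan message).
q_orphan = LocalCheckpoint("Q", received={"m1", "m2"})

consistent = is_consistent([p, q])
inconsistent = is_consistent([p, q_orphan])
```

With these inputs the first global checkpoint satisfies the definition, while the second does not because `m2` has no recorded send.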
Checkpointing and recovery protocols previously proposed for mobile environments depend on finding a set of consistent global checkpoints. Most of these techniques are based on either coordinated checkpointing or communication-induced checkpointing. In coordinated checkpointing, synchronization among checkpoints incurs a large message overhead. In communication-induced checkpointing, messages are piggybacked with checkpointing information, which requires more bandwidth. Also, if a process fails, most of the algorithms force all the processes to recover. Here we use independent checkpointing to reduce message overhead. Messages are logged in the MSSs so that upon failure only the failed MH needs to recover.
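The idea that only the failed MH recovers, using its own checkpoint and the messages logged at the MSSs, can be sketched as a replay loop. This is an illustrative assumption about how a log could be replayed, not the recovery procedure of the paper: `recover` and `apply_message` are hypothetical names, and the log is modelled as a mapping from message sequence number to message.

```python
def recover(checkpoint_state, message_log, apply_message):
    """Restore the checkpointed state, then replay logged messages in sequence order."""
    state = checkpoint_state
    for seq in sorted(message_log):
        state = apply_message(state, message_log[seq])
    return state


# Assumed toy process state: the list of messages handled so far.
log = {2: "b", 1: "a", 3: "c"}
recovered = recover(["init"], log, lambda state, msg: state + [msg])
```

Replaying in sequence-number order is what lets a single stored checkpoint suffice in this sketch: everything received after the checkpoint is reconstructed from the log.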

Abstract- Mobile computing systems often suffer from failures that are transient and independent in nature. To add reliability and high availability to such distributed systems, checkpoint-based rollback recovery is one of the widely used techniques for applications such as scientific computing, databases, telecommunication applications and, more importantly, mission-critical applications. But control-message-based algorithms incur a large overhead in network traffic. To solve this problem and to address constraints such as the low bandwidth of wireless networks, mobile agents may be used for efficiency. We present a rollback recovery algorithm based on independent checkpointing and message logging. The novelty of the algorithm is that mobile agents are used to manage the message logs and checkpoints. Whenever a mobile node moves far away from its latest checkpoint, the agents move the checkpoint and the message logs that are stored in distant Mobile Service Stations. Thus the recovery time of a mobile node never exceeds a certain threshold. Logging of messages ensures that only one checkpoint needs to be stored in persistent storage.

Index Terms- rollback recovery, mobile agent, checkpoint

I. INTRODUCTION

Traditional rollback recovery algorithms work well with tightly coupled systems. Variations of these algorithms that are still based on the message-passing paradigm perform fairly well in static distributed systems. But because of low bandwidth, limited power, low memory and frequent disconnection, such checkpointing algorithms perform poorly in a mobile distributed system. A mobile distributed system consists of both Mobile Hosts (MH) and static Mobile Service Stations (MSS). A set of dynamic wireless communication links can be established between an MH and an MSS, and a set of high-speed communication links is assumed between the MSSs. An MSS may communicate with a number of MHs, but an MH communicates with only one MSS at a time. An MH communicates with the rest of the system via the MSS it is connected to. Message transmission through wireless links takes an unpredictable but finite amount of time. Reliable message delivery is assumed during normal operation. The system does not have any shared memory or global clock [1]. Distributed computation in such a mobile computing environment is performed by a set of processes executing

© 2009 ACADEMY PUBLISHER


is, the mobility profile. After the MH takes a new checkpoint, the previous message logs and checkpoint are deleted, and the mobility profile is initialized with the current MSS, where the checkpoint resides. If the mobility profile at any point contains more than N (n) distinct MSSs, the MH may have moved as far as (N+1) hops from the last checkpoint and hence, if it fails, may need more time to recover. So the CPA sends the LGA located at the MSS containing the last checkpoint a request to send the checkpoint. Upon receiving this request, the message log is migrated to the new MSS. So, the CPA will send a request for log transfer to at least K LGAs residing at the beginning of the profile. Here we assume that the log unification time of (n-k) MSSs is negligible compared to the total recovery time needed by the MH. The LGAs, upon receiving the request, transfer their logs asynchronously to the LGA of the destination MSS. Once all logs have been received, the receiving LGA sends a request to the CPA to update its mobility profile. If the CPA has already left that site, the request is forwarded to the next site.

If the MH roams around a particular area, there is no advantage in migrating the checkpoint, but the log becomes scattered, causing high network traffic during recovery. So, if at any time the length of the mobility profile reaches a maximum of N MSSs, the logs need to be moved to one place. Let us take an example. Before leaving an MSS (say A), the CPA finds that its mobility profile exceeds the length N. So the agent sends a control message to MSS O1 and the others residing at the beginning of the profile, where O1 contains the most recent checkpoint. The MSSs then send their logs asynchronously to A. Upon completion of the log transfer, the MSSs send a completion message to the sender (A). If the CPA has left A by that time, its next destination is searched for.

After receiving the message, the CPA updates its mobility profile. The log transfer process is aborted if the MH fails or takes a new checkpoint. When an MH fails, the current MSS detects the failure and reports the current location to the HS. To keep the system safe, a receiving host can authenticate the CPA. If, after authentication, the receiving host decides not to permit normal execution, then all message logs are passed either to the previous host or to the HA. Checkpoints, however, can be delayed until a trusted receiver is reached. Hence the mobility profile will not contain an entry for this host.
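The threshold rule on the mobility profile described above can be sketched as follows. This is a minimal illustration, assuming the profile is an ordered list of MSS ids whose first entry is the MSS holding the latest checkpoint; the function names and the values chosen for n and k are hypothetical.

```python
def record_visit(profile, mss_id):
    """Append an MSS to the mobility profile only if it is not already listed."""
    if mss_id not in profile:
        profile.append(mss_id)


def logs_to_transfer(profile, n, k, transfer_in_progress):
    """Return the first k MSSs of the profile once it holds more than n entries.

    An empty list means no transfer request is issued, either because the
    profile is still short enough or because a transfer is already under way.
    """
    if transfer_in_progress or len(profile) <= n:
        return []
    return profile[:k]


profile = []
for mss in ["MSS1", "MSS3", "MSS2", "MSS3", "MSS4"]:
    record_visit(profile, mss)

requested = logs_to_transfer(profile, n=3, k=2, transfer_in_progress=False)
```

Revisiting MSS3 does not lengthen the profile, so the transfer is triggered only when the fourth distinct MSS is added, and the request goes to the MSSs at the beginning of the profile.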

II. SALIENT FEATURES OF THE PROPOSED SCHEME

• Mobile agents are used to relieve the MSSs from managing the message logs of the MHs they serve.
• Mobile agents track the different positions of the message logs and the latest checkpoint. So, if the MH moves far away from any of the MSSs where the logs are stored, the checkpoint and a portion of the message log are shifted to assist faster recovery upon possible failure.
• Only one checkpoint is stored in stable storage. Neither the send nor the receive message log should grow large, because only a few messages are exchanged in a wireless network. Also, if a process communicates frequently, its checkpointing interval can be reduced.
• If an MH fails, it does not need much time to recover, as the message logs to be shifted and the checkpoint are always only a few hops away. This is possible because most of the time a process recovers right after the failure, in the same or a nearby location.

III. SYSTEM MODEL AND ASSUMPTIONS

The MSSs are assumed to be fault-tolerant, because it is quite feasible to apply hardware fault tolerance techniques such as hot swap at the MSSs. An MH communicates with the rest of the system via the MSS it is connected to, which may be referred to as the HA of that MH. If an MH moves to the cell of another base station, the wireless channel to the old MSS is disconnected and a wireless channel in the new MSS is allocated. The state of the MH at the time of disconnection is available from the old MSS. While the MH is disconnected, only local events take place at the MH. A fail-stop model of communication is assumed.

IV. THE MOBILE AGENT ENABLED CHECKPOINTING PROTOCOL

We assume independent checkpointing in the present work; that is, a process running on an MH takes a checkpoint whenever it is convenient [6]. Checkpoints are used to limit the log size at the MSSs [7]. Garbage collection is achieved without direct participation of mobile hosts, as in [7]. Two mobile agents are used in the algorithm: a checkpointing agent (CPA) and a log agent (LGA). A CPA corresponding to each MH initially resides at the HS. An LGA residing at every MSS keeps track of the message logs of the MHs. The CPA always resides in the MSS to which the MH is currently connected. Whenever an MH leaves a cell, its CPA gives its next destination id to the LGA of the current cell, so that the LGA can find the location of the CPA at any time by tracing the ids. As the MH moves, the checkpointing agent also moves to the corresponding MSS. The agent keeps track of the list of MSSs that contain the received message log of the MH, for example, MSS1->MSS3->MSS2 and so on, that

Ckpt_initiate=false //checkpoint transfer process is started or not
Next_destination=HA //next FA address of the MH
Mobility profile=null //list of MSSs (FAs) visited by the MH in the current checkpointing interval. Messages and checkpoint are logged in these MSSs.
List_for_log_transfer=null //list of MSSs whose logs are needed to be transferred for faster recovery.
Send_buff[][] //stores unacknowledged messages
Receive buff[][] //stores received messages

Figure 1. Data Structures used by the protocol
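One possible in-memory rendering of the data structures of Figure 1 is sketched below. The class name `ProtocolState`, the use of empty lists in place of null, the string id "HA", and the choice of byte strings for messages are assumptions made for illustration; Figure 1 does not state how the fields are divided between the CPA and the LGA, so they are kept together here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, List


def _buffer():
    """Two-level buffer indexed as buff[i][j]: MH index i, message sequence number j."""
    return defaultdict(dict)


@dataclass
class ProtocolState:
    ckpt_initiate: bool = False          # has the checkpoint transfer process started?
    next_destination: str = "HA"         # next FA address of the MH
    mobility_profile: List[str] = field(default_factory=list)       # MSSs visited this interval
    list_for_log_transfer: List[str] = field(default_factory=list)  # MSSs whose logs must move
    send_buff: DefaultDict[int, Dict[int, bytes]] = field(default_factory=_buffer)     # unacknowledged
    receive_buff: DefaultDict[int, Dict[int, bytes]] = field(default_factory=_buffer)  # received


state = ProtocolState()
state.send_buff[1][7] = b"hello"  # MH 1, message sequence number 7
state.mobility_profile.append("MSS1")
```

Using `default_factory` for every mutable field keeps separate instances from sharing the same profile or buffers.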

1 if MH takes a checkpoint and sends it to this MSS where the CPA is currently active
  1.1 store the latest checkpoint of the MH in stable storage of this MSS
  1.2 send request to LGAs of the MSSs from the mobility profile to clear the message logs and previous checkpoints
  1.3 current_ckpt=this MSS id
  1.4 initialize mobility profile with the current MSS id
2 if length of the mobility profile is greater than n and ckpt_initiate==false
  2.1 send request for message log and checkpoint transfer to k LGAs of the MSSs residing at the beginning of the mobility profile
  2.2 set ckpt_initiate=true
  2.3 keep the list of k MSSs in the current LGA in List_for_log_transfer
3 if “log transfer complete for MHi” message is received from any LGA then
  3.1 delete that MSS id from the mobility profile
4 If “all log transfer complete for MHi” message is received from an MSS containing the List_for_log_transfer of MHi
  4.1 Set ckpt_initiate=false
5 If the MH leaves this MSS and goes to a new MSS
  5.1 Set Next_destination= new MSS id before leaving the current cell
6 If the MH disconnects itself from the network
  6.1 the CPA should stay at this cell
  6.2 inform the HA about its current location
7 if a message is received by the MH
  7.1 Ask the local LGA to store that message
  7.2 if the current MSS id (where the MH currently resides) is not already present in the mobility profile
    7.2.1 Add the MSS id to the mobility profile
8 if a message is sent by the MH whose acknowledgement is yet to be received
  8.1 Ask the local LGA to store that message
  8.2 if the current MSS id (where the MH currently resides) is not already present in the mobility profile
    8.2.1 Add the MSS id to the mobility profile
  8.3 If a message acknowledgement comes
    8.3.1 Ask the LGA to delete that message

Figure 2. Steps that a CPA follows

1. if a message for/to an MHi arrives
  1.1 if the MHi is active at that time and is not recovering from failure
    1.1.1 store that message in buff[i][j] //j is the message sequence number
    1.1.2 send the message to the MHi
2. if acknowledgement for a message arrives from an MHi for MHk
  2.1 transfer the message from buff[i][j] to receive_buff[i][j] //j is the message sequence number
  2.2 if MHk is active in that cell
    2.2.1 send acknowledgement
  2.3 delete the message from send_buff[i][j] (if exists)
3. if a message is sent by MHi
  3.1 store the message in send_buff[i][j] //j is the message sequence number
4. if a log transfer request is received from the CPA corresponding to MHi
  4.1 move receive_buff[i][] to the LGA of sender MSS
  4.2 move checkpoint (if any) to the LGA of sender MSS
  4.3 send “log transfer complete MHi” message to the LGA of the sender MSS
5. if a request is received to clear message logs and checkpoints
  5.1 clear receive_buff[i][]
  5.2 if any checkpoint for MHi exists then delete that checkpoint
6. if a “log transfer complete for MHj” message is received from LGA of MSSi for MHj
  6.1 if MHj present in the current cell
    6.1.1 forward the msg to the corresponding CPA
  6.2 else
    6.2.1 forward the message to the LGA of Next_destination
  6.3 delete MSSi from List_for_log_transfer
  6.4 if List_for_log_transfer becomes empty
    6.4.1 if MHj is present in the current cell
      6.4.1.1 send “all log transfer complete for MHi” to the CPA of MHi
    6.4.2 else
      6.4.3 send “all log transfer complete for MHi” to the …

Figure 3. Steps an LGA follows
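A few of the LGA steps in Figure 3 (steps 1, 2.1, 3, 4 and 5) can be sketched as an in-memory class. This is a simplified illustration rather than the authors' implementation: the class `LogAgent` and its method names are hypothetical, network sends are replaced by return values, and acknowledgement forwarding (steps 2.2 and 2.3) and completion-message routing (step 6) are left out.

```python
from collections import defaultdict


class LogAgent:
    """Hypothetical in-memory LGA; buffers are indexed as buff[i][j] (MH i, sequence j)."""

    def __init__(self):
        self.buff = defaultdict(dict)          # delivered, not yet acknowledged
        self.receive_buff = defaultdict(dict)  # acknowledged received messages
        self.send_buff = defaultdict(dict)     # sent by the MH, not yet acknowledged
        self.checkpoint = {}                   # latest checkpoint per MH, if stored here

    def deliver(self, i, j, msg, active=True):
        """Step 1: buffer a message for MH i and hand it over if the MH is active."""
        if not active:
            return None
        self.buff[i][j] = msg
        return msg

    def ack_received(self, i, j):
        """Step 2.1: on acknowledgement from MH i, move message j to the receive log."""
        msg = self.buff[i].pop(j, None)
        if msg is not None:
            self.receive_buff[i][j] = msg

    def log_sent(self, i, j, msg):
        """Step 3: log a message sent by MH i."""
        self.send_buff[i][j] = msg

    def transfer_to(self, i, dest):
        """Step 4: move MH i's receive log and checkpoint (if any) to another LGA."""
        dest.receive_buff[i].update(self.receive_buff.pop(i, {}))
        if i in self.checkpoint:
            dest.checkpoint[i] = self.checkpoint.pop(i)
        return f"log transfer complete MH{i}"

    def clear(self, i):
        """Step 5: drop MH i's receive log and any checkpoint stored here."""
        self.receive_buff.pop(i, None)
        self.checkpoint.pop(i, None)


old, new = LogAgent(), LogAgent()
old.deliver(1, 0, "m0")
old.ack_received(1, 0)
old.checkpoint[1] = "ckpt"
status = old.transfer_to(1, new)
```

After the transfer, the receive log and checkpoint of MH 1 live only at the destination agent, which mirrors the intent of steps 4.1 and 4.2.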

If the mobile node resumes execution after prolonged disconnection at this location, it searches for the CPA in this MSS and then in neighboring MSSs. If the CPA is not found, it asks the HA for the current location of its CPA. Checkpoints are taken periodically according to the local clock of each MH and sent to the MSS to which it is currently connected. Mobile IP identifies this MSS as the FA. The LGA is a fixed software agent residing at every MSS and takes care of the message logs. The CPA is responsible for maintaining the mobility profile. It moves from one FA to another, and upon failure/disconnection it moves back to the HA. Figure 2 lists a CPA’s actions while arriving at a new MSS. Figure 3 details the actions of an LGA at any MSS. If an MH fails and does not start recovering for a long time, then nearby MSSs can be informed about the presence of the corresponding CPA of the MH in the

current MSS. Such a forward notification scheme [8] ensures that when the MH comes up under some MSS, its CPA is immediately forwarded to that MSS.

V. PERFORMANCE

Whenever an MH wants to recover after a failure, a checkpoint transfer request is sent to the MSS containing the latest checkpoint, and a log transfer request is sent to all other MSSs of the mobility profile. So the MH has to wait until the checkpoint arrives. Let us call this time the checkpoint transfer time (CTT). Then CTT can be calculated as

CTT = (checkpoint size)/(link speed) + propagation delay

While the checkpoint arrives, message logs also get transferred. So the log transfer time (LTT) can similarly be calculated to be:


movement of mobile agent is much less to ensure less network traffic.

TABLE I. COMPARISON BETWEEN OUR ALGORITHM AND OTHER EXISTING MOBILE AGENT BASED CHECKPOINTING SCHEMES

Metric | Protocol presented in [10] | Protocol presented in [3] | Our Algorithm
Network topology | Hamiltonian | General | General
Worst case moves complexity | O(n^2) | O(n^2) | O(n)
Time complexity | O(n) | O(n) | O(k) where k …
Number of checkpoints (each process) stored for k concurrent initiations | One permanent, one temporary | One permanent, one temporary |
Data carried by agents | O(1) | O(n/k) average |
Maximum number of checkpoints rollback after a failure | One temporary checkpoint | One temporary checkpoint |

CONCLUSION

In this paper we have presented a mobile agent based checkpointing and recovery protocol for a mobile computing system. Our algorithm uses the number of hops that an MH takes to decide whether to shift the checkpoints and message logs for faster recovery upon possible failure. Since independent checkpointing is used, no synchronization messages need to be exchanged, which saves network bandwidth. Since applications for wireless networks typically exchange fewer and smaller messages than their wired counterparts, the message log does not become too large for the MSS buffers. The metrics shown in Table I indicate linear growth of time and moves complexity. This suggests that the scheme is quite scalable for the mobile computing environment.