An Efficient and Bandwidth Sensitive Parallel Download Scheme in Data Grids

Ruay-Shiung Chang, Chun-Fu Lin, Jiing-Hsing Ruey and Shih-Chun Hsi
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, TAIWAN, R.O.C
[email protected], [email protected], [email protected]


Abstract—For modern scientific applications such as astrophysics, astronomy, aerography, and biology, a large amount of storage space is required because of their large-scale datasets. A Data Grid collects distributed storage resources such as hard disk space across heterogeneous networks to meet such requirements. In a data grid environment, the data replication service, which copies replicas to suitable storage systems, increases the reliability of data access. Building on these replicas, parallel download creates multiple connections from the client side to the replica servers to improve the performance of the data transfer. To adapt to bandwidth variation and make the data transfer more efficient, this paper proposes a parallel download scheme called EA (Efficient and Adaptive) parallel download. The scheme re-evaluates all of the replica servers during the download and replaces decaying selected servers with better backup servers. According to our experiments in the Unigrid environment, EA parallel download decreases the completion time by 1.63% to 13.45% in the natural Unigrid environment and by 6.28% to 30.56% in a choreographed Unigrid environment when compared to the Recursive Co-Allocation scheme. This shows that the proposed scheme adapts nicely to dynamic environments and effectively decreases the total download time.

I. INTRODUCTION

In recent years, a distributed computing technique called Grid Computing has been proposed. In brief, a Grid shares computing resources and all kinds of services through the Internet. Generally speaking, Grids are divided into two types according to the shared resources: Computational Grids and Data Grids. The two types are not mutually independent; in fact, both play important roles in grid environments and complement each other. Large-scale scientific projects need not only high-performance computing for analyses and simulations but also enough storage space for large datasets. Thus, the Data Grid collects distributed storage resources such as hard disk space to manage these large datasets.

A. Parallel Download

Parallel download is a modern technique to improve the performance of data transfer. The most significant difference between parallel download and single download is that parallel download creates multiple connections from the client side to the servers and transfers the data file over them at the same time. By transferring data from more than one server, parallel download increases the download speed. The premise is that every participating server must hold a copy of the requested file, called a "replica". Traditionally, there are three kinds of parallel download algorithms [1]:

1) Static Equal: The requested file of b bytes is divided into n equal parts, where n is the number of servers. Each server is assigned b/n bytes to send to the client. The transfer completes only when all selected servers finish their allocated tasks. The advantage of this method is its simple implementation, but if one of the selected servers is in a bad state or has a low-bandwidth connection, the whole transfer is held back by this slower server.

2) Static Unequal: The static unequal scheme improves on the static equal method by dividing the requested file of b bytes into n unequal parts, where n is the number of replica servers. The size of each segment is set according to the performance or throughput of each server.

3) Dynamic: The dynamic scheme divides the whole requested file into many small pieces of equal size before the transfer starts. Each server is first assigned one piece to send to the client; whenever a server finishes its piece, it is assigned another, until all pieces are delivered. The scheme adapts to changes in connection state: during the transfer, the faster and more stable servers send more pieces to the client. However, because there are so many small blocks, the large number of block requests causes extra overhead on the network.

B. Motivation

From the above description, the traditional parallel download schemes work well only when the network, servers, and client are all stable and some specific conditions are met. Therefore, a parallel download scheme that can adapt to fluctuations in network bandwidth and variance in the servers' condition will

save transfer time under changing environment conditions. Thus, in this paper, we propose an efficient and bandwidth-sensitive parallel download scheme for data grids to achieve this goal.

The Abort and Retransfer scheme and adPD both allow a faster server to help deliver the blocks of a slower server. Extending this simple concept, we propose a scheme that allows a slower server to be replaced by a faster server, even one that was not selected at first.
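To make the contrast concrete, the three traditional allocation strategies of Section I-A can be sketched as follows. This is an illustrative sketch of ours, not code from [1]; the function names are our own.

```python
# Illustrative sketches of the three traditional parallel download
# allocation strategies, for a file of `size` bytes and `n` servers.

def static_equal(size, n):
    """Each server gets an equal share; the slowest server gates completion."""
    share = size // n
    shares = [share] * n
    shares[-1] += size - share * n  # last server absorbs the remainder
    return shares

def static_unequal(size, bandwidths):
    """Shares proportional to each server's measured bandwidth."""
    total = sum(bandwidths)
    shares = [size * bw // total for bw in bandwidths]
    shares[-1] += size - sum(shares)  # absorb integer-division leftovers
    return shares

def dynamic(size, block_size):
    """Equal-size blocks handed out one at a time as servers finish."""
    blocks = [block_size] * (size // block_size)
    if size % block_size:
        blocks.append(size % block_size)
    return blocks  # a work queue: each idle server pops the next block
```

Note that `dynamic` only builds the work queue; the adaptivity (and the per-block request overhead the text mentions) comes from servers pulling blocks one at a time.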

II. RELATED WORK

A. Co-Allocation Architecture

Co-Allocation [2] is one of the common architectures in data grids, and the proposed EA scheme also adopts it. Fig. 1 depicts the Co-Allocation architecture, which works as follows. When a client submits a job through an application, the application sends the necessary information and a description of the file to the broker. The broker is responsible for receiving the file request (Agent) and allocating the job to the storage servers (Co-Allocator). The broker, working with information services such as the grid information service (GIS) and the monitoring and discovery system (MDS), obtains a list of the available resources holding the target file. Once the broker has selected the servers, the Co-Allocator downloads the data from the storage servers using GridFTP. Finally, the broker sends the received data to the application to complete the download task.

Recursive Co-Allocation [3] is proposed to reduce the waiting time for the slowest server to transfer the final block. The scheme divides the entire file into several sections, and each section into many small blocks. The number of blocks allocated to an individual server is based on its predicted bandwidth, I/O state, and CPU load. When the fastest server finishes its job, the next allocation is triggered, re-evaluating the selected servers to cope with the fluctuation of the environment. The only remaining flaw is the possible load imbalance between two rounds. Fig. 2 depicts the progress of Recursive Co-Allocation, where E(Ti) is the estimated finish time of the i-th round and Ti is the actual finish time. Note that the unfinished blocks of a round are allocated again in the next round.

B. Adaptive Parallel Download Schemes

Adaptive Dynamic Parallel Download (adPD) [4] divides the desired file into equal parts, each allocated to an individual server. When the fastest server finishes its job, it is assigned a part of the slowest server's job in proportion to the bandwidth values, and so on. The advantage of adPD is that it adjusts the size of the transferred data without too many block requests. The Abort and Retransfer scheme [5] is proposed to reduce the waiting time for the slowest server to deliver the final block. After all data blocks are assigned, the scheme allows the delivery of the slowest server to be aborted and the block reallocated to a faster server, checking recursively whether the completion time can be improved.
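For contrast with EA's equal-size sections, the section sizing of Recursive Co-Allocation can be sketched as follows. This is our reading of the (0.5)^n coefficient described above, namely that each round's section takes half of what remains; the exact details may differ from [3].

```python
# Sketch (our assumption) of Recursive Co-Allocation section sizing:
# each of the first rounds takes alpha = 0.5 of the *remaining* file,
# giving sections of size file/2, file/4, ..., with the leftover
# transferred in the final round.

def recursive_sections(file_size, rounds, alpha=0.5):
    sections, remaining = [], file_size
    for _ in range(rounds - 1):
        sections.append(int(remaining * alpha))
        remaining -= sections[-1]
    sections.append(remaining)  # final round transfers whatever is left
    return sections
```

With this sizing, early rounds are large (few filter points while bandwidth estimates are fresh) and later rounds shrink, which is exactly the behavior EA replaces with equal sections plus more frequent re-evaluation.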

Figure 1. The Co-Allocation Architecture (the client application, the broker containing the Agent and the Co-Allocator, the GIS/MDS information services, and the local storage systems)

Figure 2. The Recursive Co-Allocation Architecture (the file is divided into sections transferred in successive rounds across the servers; E(Ti) marks the estimated and Ti the actual finish time of round i)

III. EA PARALLEL DOWNLOAD

A. Assumptions of the EA Parallel Download

1) The Replica Location: The local area network generally provides the best transfer, so a transfer between the client and a local server usually achieves the maximum speed. In that case, creating additional connections to remote servers would not improve the transmission rate at all. Therefore, we assume there is no replica on a local server. If there is no replica of the target file in the local area network, the nearest replica server may be chosen to serve the client because of the fewer hops between them; intuitively, the nearest server has the highest transmission rate.

2) The Number of Replicas: The worst case for a parallel download scheme is that all of the replica servers fall into a bad state, or the links between the client and all replica servers become congested, at the same time. To be able to adapt to this worst case, we assume the number of replicas is much larger than the number of connections the client decides on at the beginning of the download.

3) The Size of the Target File: Parallel download schemes are designed for large data. If the data size is too small (say, less than 10 MB), parallel download is unnecessary and not recommended: building more than one connection between the client and the servers, and coordinating the servers, takes a high proportion of the time for a small transfer. Hence, the bigger the target file is, the larger the advantage the parallel download scheme achieves over single download.

B. The Procedure of EA Parallel Download

1) Step 1. The File Request: Once a user asks for a target file, the request is sent to an SRB [6] (Storage Resource Broker) server. The SRB gets the resource information from an MCAT (Metadata Catalog)-enabled server.

2) Step 2. Selecting the Replicas: We set two thresholds, one for CPU load and one for available memory, and then sort the qualified servers by the bandwidth between each replica server and the client. Finally, the best n replica servers are selected to serve the download request. The non-selected replica servers do not leave the transfer completely: they stand by, waiting to substitute for selected servers according to the result of the filter mechanism discussed in detail in Step 4. Thus, all of the non-selected servers act as backup servers during the EA parallel download.

3) Step 3. Allocating Data Blocks: The EA parallel download scheme divides the data file into small blocks and assigns a suitable number of blocks to each replica server. A "section" is a part of the data file consisting of a number of small blocks. Because the focal point of EA parallel download is the filter mechanism in Step 4, the division of the file into sections is critical for the efficiency of the download. In Recursive Co-Allocation, a large section at the beginning of the transfer and a small section near the end may be adequate for schemes with fixed replica servers. Conversely, a smaller section triggers the filter mechanism more frequently, but the overhead is also higher due to the frequent triggering and additional connection requests. Detecting decaying servers as accurately as possible is the key to a good substitution result. We divide the data file into S sections of equal size, where S is set in one of two modes, Low-Adaptation and High-Adaptation, described in detail in Section IV. As shown in Fig. 3, similarly to [3], the download progress is separated by filter points between rounds. The primary difference is that Recursive Co-Allocation divides the data file according to an α = (0.5)^n coefficient, where n is the number of sections, whereas EA parallel download divides the data file into equal sizes. In each round, every replica server is assigned a number of blocks of the section in proportion to its individual bandwidth. Once a selected replica server finishes its transfer, the filter mechanism is triggered and the next round starts. Note that the unfinished data of the other servers is added to the next section and transferred in the next round.

4) Step 4. The Filter Mechanism: At the filter point between two rounds, the substitution of replica servers is triggered: the selected servers and the non-selected (backup) servers are re-evaluated. But if the filter mechanism itself costs too much time, the performance of the entire download degrades. So, if the number of sections is a and the average time cost of the filter mechanism is b, the total transfer time is:

TotalTime = (a - 1) × b + Data_Transferring_Time  (2)

where Data_Transferring_Time is the time used for the data transfer itself.

Figure 3. The section allocation when S=5 (the file is divided into five equal sections; each round transfers one section plus the unfinished parts of the previous round, with a filter point between consecutive rounds)
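One round of Step 3 can be sketched as follows. This is our reading of the text, not the authors' code, and the names are our own: each round distributes the blocks of an equal-size section, plus any blocks carried over unfinished from the previous round, in proportion to current bandwidth.

```python
# Minimal sketch of one EA round: distribute the blocks of a section
# (plus carry-over from the previous round) proportionally to bandwidth.
# Assumes `bandwidths` is listed fastest-first, so that rounding
# leftovers go to the fastest server.

def allocate_round(section_blocks, bandwidths, unfinished=0):
    total_blocks = section_blocks + unfinished
    total_bw = sum(bandwidths)
    shares = [total_blocks * bw // total_bw for bw in bandwidths]
    shares[0] += total_blocks - sum(shares)  # fastest absorbs leftovers
    return shares
```

For example, a 10-block section split across servers with bandwidths 3, 1, and 1 yields shares of 6, 2, and 2 blocks; if 2 blocks were left unfinished from the previous round, the round distributes 12 blocks instead.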

According to the above equation, if we minimize the time cost of the filter mechanism (b), EA parallel download becomes more efficient. In [7], the authors discuss resource monitoring: for better performance, the resource evaluation for replica selection and the filter mechanism does not collect the complete information of the replica servers but only the needed values. The pseudo code of the simple evaluation function is shown in Fig. 4. The evaluation function first checks whether the CPU usage and the available memory of the replica server both meet a basic requirement for serving the download request. Ref. [8] indicates that a server needs an upgrade if the CPU usage of an SQL database server is more than 75% or the available memory is less than 50MB. Cisco HSI (H.323 Signaling Interface) [9] also sets 75% CPU usage as the default threshold for rejecting further calls. The EA parallel download process itself occupies about 2-5% CPU usage and 3-4MB of available memory on each selected replica server. Thus, the thresholds we set are 75% - 5% = 70% CPU usage and 50MB + 4MB = 54MB available memory.

Evaluation_Function () {
    if (CPU_Usage < 70% && Available_Memory > 54MB)
        return Available_Bandwidth;
    else
        return 0;
}

Figure 4. The pseudo code of the evaluation function
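The pseudo code of Fig. 4 translates directly into a runnable form. The sketch below is ours, with the thresholds derived in the text (70% CPU, 54MB memory) as named constants; the parameter names are assumptions.

```python
# Hedged transcription of the Fig. 4 evaluation function (names ours).
CPU_THRESHOLD = 70   # percent: the 75% rule of thumb minus EA's ~5% own usage
MEM_THRESHOLD = 54   # MB: the 50MB rule of thumb plus EA's ~4MB footprint

def evaluation_function(cpu_usage, available_memory_mb, available_bandwidth):
    """Return the server's available bandwidth if it qualifies, else 0."""
    if cpu_usage < CPU_THRESHOLD and available_memory_mb > MEM_THRESHOLD:
        return available_bandwidth
    return 0
```

Returning 0 for a disqualified server means it naturally sorts to the bottom when the client ranks servers by the returned bandwidth value.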

After the evaluation function is executed individually by each replica server, the replica servers send their available bandwidth values directly to the client. This lowers the load on the resource broker and saves some transfer time, because only the available bandwidth value of each replica server remains to be reported. The client then simply sorts the values to select the best n servers. Note that if a currently selected replica server is not in the list of the best n servers, the connection between the resource broker and that replica server is closed, and the newly selected replica servers in the list connect to the resource broker to begin downloading.

5) Step 5. The Final Allocation: After the final round, if the overhead of reallocating the remaining (unfinished) data plus the predicted transfer time exceeds the predicted transfer time of the fastest single download, EA assigns the entire unfinished data to the fastest replica server. Otherwise, each selected replica server is assigned a portion of the unfinished blocks, as in Step 3.

IV. EXPERIMENTS

Our testbed consists of 10 machines from Unigrid [10]. Globus Toolkit 4 [11], NWS [12] (Network Weather Service), and SRB (Storage Resource Broker) 3.4.1 are installed on every machine to build the data grid environment, and the experiments are implemented in Java. The NWS is used to measure the bandwidth between the replica servers and the resource broker; other information such as CPU load, available memory, and network load is monitored by Ganglia [13]. For fairness, the machines chosen for the EA scheme in the first round are the same as for the Recursive Co-Allocation scheme; that is, the backup machines are evaluated and taken into account only after the first filter point. The client side is at NDHU (National Dong Hwa University). Each experiment is repeated 20 times and the extreme results (the best and the worst) are removed; the data displayed in the following figures is the average of the remaining results.
A. The Natural Unigrid Environment

To decide the number of sections properly, we first analyze the relationship between the overhead and the number of sections. At a filter point, the time for evaluating the replica servers, re-connecting replica servers, and re-allocating sections is supposed to be a small part of the whole transfer. According to (2), if the number of filter points (a - 1) is too large, the total transfer time increases and may even outweigh the benefit of the filter mechanism. Therefore, we would like to find a tolerable overhead that helps us set an appropriate number of sections. The overhead of a transfer with a given number of sections is derived from the following equation:

Overhead(sec) = [T_Evaluation_Function × Number_of_Servers + T_Server_Reconnection × (Number_of_Connections × SP) + T_Section_Reallocation] × (Number_of_Sections - 1)  (3)


The average values of T_Evaluation_Function, T_Server_Reconnection, and T_Section_Reallocation in our implementation are about 10ms, 0.18s, and 1ms, respectively. SP is the probability that a selected server is substituted by a backup server at a filter point; in our experiments, the average SP in the natural Unigrid environment is about 23.62%. The number of servers is at most 10 and the number of connections is at most 5; for the average case, the number of connections is 3 and the number of servers is 6. With these settings, (3) simplifies to the following equation:


Overhead(sec) = 0.189 sec × (Number_of_Sections - 1)  (4)

Because the minimum data size suggested for EA is 128MB, we take a 128MB transfer with 3 connections as the base case. According to the experiments for this case in this environment, the minimum completion time occurs when Number_of_Sections - 1 (the number of filter points) is 3. Thus, according to (4), the tolerable overhead for a 128MB transfer is set to 0.567 seconds in Low-Adaptation mode for the natural Unigrid environment. Based on this analysis, the number of sections for other numbers of connections and data sizes is derived from the following inequality:

Overhead(sec) = [T_Evaluation_Function × Number_of_Servers + T_Server_Reconnection × (Number_of_Connections × 23.62%) + T_Section_Reallocation] × (Number_of_Sections - 1) < (Data_Size/128MB) × 0.567  (5)

The maximum positive integer satisfying the above inequality is the number of sections in Low-Adaptation mode. As shown in Fig. 5, EA parallel download and Recursive Co-Allocation are 1.51 to 2.78 times faster than single download from the fastest site (Institute of Information Science, Academia Sinica). Going from 1 to 2 connections gives the largest improvement in both schemes, while the improvement from 4 to 5 connections is marginal because the maximum transmission rate of the client is reached. More than 5 connections even decreases the performance due to the increased coordination overhead among more replica servers.
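Solving inequality (5) for the section count amounts to taking the largest integer that keeps the predicted filter overhead within the budget. Below is a sketch using the constants quoted in the text (10ms evaluation, 0.18s reconnection, 1ms reallocation); the function name and the rounding convention are our assumptions.

```python
# Sketch: largest section count whose predicted overhead, per the
# bracketed term of inequality (5)/(6), stays within the tolerable
# budget scaled by file size.
T_EVAL, T_RECONN, T_REALLOC = 0.010, 0.18, 0.001  # seconds (from the text)

def sections_for_mode(data_size_mb, n_servers, n_conns, sp, budget_per_128mb):
    # Predicted cost of one filter point.
    per_filter = T_EVAL * n_servers + T_RECONN * (n_conns * sp) + T_REALLOC
    budget = (data_size_mb / 128) * budget_per_128mb
    # (Number_of_Sections - 1) filter points must fit in the budget.
    return int(budget / per_filter) + 1
```

For the average natural-environment case (6 servers, 3 connections, SP = 23.62%, budget 0.567s per 128MB), the per-filter cost comes to about 0.189s, matching (4), and a 128MB file gets 4 sections, i.e. 3 filter points, as in the base case. High-Adaptation mode would plug in SP = 52.33% and 1.718s per 128MB as in (6).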

Figure 5. EA Parallel vs. Recursive Co-Allocation in the natural environment (completion time in seconds for 128MB, 256MB, 512MB, and 1024MB transfers; single download from IISAS compared with EA(Low) and Recursive Co-Allocation using 2 to 5 connections)

The completion time of EA parallel download is 1.63% to 13.45% less than that of Recursive Co-Allocation in nearly every case of our experiments. In fact, Recursive Co-Allocation still achieves a lower completion time in 33% to 56% of the runs of each case; nevertheless, when the filter mechanism is triggered in time, EA parallel download maintains a tolerable completion time. Recursive Co-Allocation has an about 2.5% to 12.5% better minimum completion time, but also an about 10.6% to 16.0% worse maximum completion time and a higher standard deviation.

B. The Choreographed Unigrid Environment

We conduct the second experiment in a different way: some of the resources have to perform another data transfer task, simulating a more dynamic environment. In this experiment, we schedule an FTP transfer every 30 minutes on half of the machines as a background process. Each FTP transfers a 1.5GB data file from the scheduled machine to a machine at the NDHU site that is not the client. That is, given the single-download transmission rate of each site (1.62MB/s to 2.313MB/s), each scheduled machine spends about 37% to 53% of the time (11.07min to 15.80min) under the additional FTP transfer. Based on the same analysis as for the natural Unigrid environment, the number of sections for different numbers of connections and data sizes is derived from the following inequality:

Overhead(sec) = [T_Evaluation_Function × Number_of_Servers + T_Server_Reconnection × (Number_of_Connections × 52.33%) + T_Section_Reallocation] × (Number_of_Sections - 1) < (Data_Size/128MB) × 1.718  (6)


The maximum positive integer satisfying the above inequality is the number of sections in High-Adaptation mode. Fig. 6 shows the completion times of 256MB and 1024MB transfers. In this environment, all cases are worse than in the natural environment, and the completion time of EA parallel download in Low-Adaptation mode is 6.44% to 16.87% less than that of Recursive Co-Allocation.

Figure 6. EA Parallel vs. Recursive Co-Allocation in the choreographed environment (completion time in seconds for 256MB and 1024MB transfers; single download from IISAS compared with EA(Low), EA(High), and Recursive Co-Allocation using 2 to 5 connections)

Compared to the experiments in the natural Unigrid environment, the difference between EA parallel download and Recursive Co-Allocation is more apparent: the completion time of EA parallel download in High-Adaptation mode is 6.28% to 30.56% less than that of Recursive Co-Allocation. According to the experiments in the two environments, if we know there are other applications running, such as large data transfer tasks, we can use High-Adaptation mode to adapt to the bandwidth variation more effectively.

V. FUTURE WORK AND CONCLUSIONS

The accuracy of information estimation is key to the performance of parallel download schemes. In our experiments, we found that the real transfer rates are a little better than the NWS measurements, though the deviation does not seem to change the relative order of the replica servers. In parallel download schemes, estimating the end-to-end bandwidth is important and not an easy task, and much remains to be done to predict the network condition accurately.

Compared to the Recursive Co-Allocation scheme, our proposed EA parallel download scheme reduces the data transfer time by 1.63% to 13.45% in the natural Unigrid environment and by 6.28% to 30.56% in the choreographed environment. That is, EA parallel download adapts nicely to dynamic environments. Generally, replication schemes do not replicate a data file to many replica servers except for important files. But in data grid environments such as Unigrid, hundreds of machines can be used, and with ever larger storage systems, replicating more copies is becoming feasible and affordable.

ACKNOWLEDGMENT

This research is supported in part by the ROC NSC under contract numbers 95-2422-H-259-001 and 94-2213-E-259-004.

REFERENCES

[1] Christos Gkantsidis, Mostafa Ammar, Ellen Zegura, "On the Effect of Large-Scale Deployment of Parallel Downloading," IEEE Workshop on Internet Applications (WIAPP '03), 2003.
[2] Sudharshan Vazhkudai, "Enabling the Co-Allocation of Grid Data Transfers," Proceedings of the Fourth International Workshop on Grid Computing, Phoenix, Arizona, pp. 41-51, November 2003.
[3] Chao-Tung Yang, I-Hsien Yang, Chun-Hsiang Chen, "Improve Dynamic Adjustment Mechanism in Co-allocation Data Grid Environments," Workshop on Compiler Techniques for High-Performance Computing (CTHPC '05), Taiwan, pp. 189-194.
[4] Zhou Xu, Lu Xianliang, Hou Mengshu, Zhan Chuan, "A Speed-based Adaptive Dynamic Parallel Downloading Technique," ACM SIGOPS Operating Systems Review, Vol. 39, pp. 63-69, January 2005.
[5] Ruay-Shiung Chang, Chih-Min Wang, P.H. Chen, "Replica Selection on Co-Allocation Data Grids," Second International Symposium on Parallel and Distributed Processing and Applications, Hong Kong, 2004.
[6] SRB main page, http://www.sdsc.edu/srb/index.php/Main_Page
[7] Kensuke Muraki, Yasuhiro Kawasaki, Yasuharu Mizutani, Fumihiko Ino, Kenichi Hagihara, "Grid Resource Monitoring and Selection for Rapid Turnaround Applications," IEICE Transactions on Information and Systems, Vol. E89-D, No. 9, pp. 2491-2501, September 2006.
[8] Mackin, Mike Hotek, Designing a Database Server Infrastructure Using Microsoft SQL Server 2005.
[9] Cisco H.323 Signaling Interface (HSI), http://www.cisco.com/
[10] Taiwan Unigrid project portal, http://www.unigrid.org.tw/
[11] Globus Toolkit, http://www.globus.org/toolkit/
[12] Network Weather Service, http://nws.cs.ucsb.edu/ewiki/
[13] Ganglia Monitoring System, http://ganglia.sourceforge.net/
