QDFS: A Quality-Aware Distributed File Storage Service Based on HDFS

SONG Guang-hua, CHUAI Jun-na, YANG Bo-wei, ZHENG Yao
Center for Engineering and Scientific Computation, Zhejiang University, Hangzhou, China
[email protected]

Abstract—On the basis of the Hadoop distributed file system (HDFS), this paper presents the design and implementation of QDFS, a distributed file storage service system that employs a backup policy based on recovery volumes and a quality-aware data distribution strategy. The experimental results show that QDFS increases the storage utilization ratio and improves the storage performance by being aware of the quality of service of the DataNodes. These improvements over HDFS make the Hadoop distributed computing model more suitable for unstable wide area network environments.

Keywords-hadoop; HDFS; cloud storage; redundant backup; quality-aware

I. INTRODUCTION

The Apache Hadoop project aims to develop open-source software for reliable, scalable, distributed computing [1]. The main subprojects of Hadoop include: (1) MapReduce, a software framework for distributed processing of large data sets on computing clusters [2-3], and (2) HDFS, a distributed file system that provides high-throughput access to application data [4]. Hadoop is mainly supported by Yahoo, and has been deployed and tested by Yahoo on more than 200 dedicated nodes in a stable network environment. One of Yahoo's Hadoop clusters sorted 1 terabyte of data in 209 seconds, beating the previous record of 297 seconds in the annual general-purpose (Daytona) terabyte sort benchmark [5]; this was the first time that either a Java or an open-source program had won. Currently, a wide variety of companies and organizations, such as Facebook, Amazon, and Last.fm, use Hadoop for both research and production. Hadoop has become one of the most suitable software frameworks for constructing enterprise or business data centers.

Hadoop is designed to run on commodity hardware, and HDFS can be deployed on inexpensive desktop computers. However, HDFS was initially designed to be deployed on stable cluster nodes; if it runs in an unstable network environment, storage utilization and adaptation to the environment cannot be guaranteed [6-9]. As a result, HDFS may not provide good storage services for Internet users. To fit the unstable network environment, we implement QDFS, a quality-aware cloud storage service platform based on HDFS. QDFS employs a backup policy based on recovery volumes and a quality-aware DataNode selection strategy. We have established an environment based on QDFS using inexpensive desktop PCs and a small-scale cluster, and have conducted experiments to verify the suitability of QDFS to the unstable network environment.

The rest of the paper is organized as follows. Section II describes the framework of QDFS. The backup policy based on recovery volumes is depicted in Section III. The quality-aware data distribution strategy is presented in detail in Section IV. The experimental results are analyzed in Section V. Finally, Section VI discusses the conclusions and future work.

II. QDFS FRAMEWORK

QDFS extends the master/slave architecture of HDFS; its framework is depicted in Fig. 1. In QDFS, autonomous computers voluntarily donate their free storage resources, which together comprise a mass storage cloud. In the figure, DNn is a DataNode that donates its storage resource; it is either a premium, stable cluster system (such as DN1) or a free desktop PC on the Internet (such as DN2~DN6). NN is the global reputation control node (NameNode) of the system: it monitors the quality of service (QoS) of the DataNodes (depicted as different line types) and computes the credit of each DN with the established reputation model. On the basis of each computed credit, NN adjusts the weight of each DN, which influences the judgment of on-going data distribution. To gain a higher credit, a DataNode should be online longer, provide higher bandwidth and lower latency of data transfer, and behave honestly. A DataNode with a high credit is encouraged and rewarded by the system. In addition, QDFS provides a Web portal, a convenient access interface for Internet users. The flow-path of QDFS-based file storage (upload) and download is depicted in Fig. 2. QDFS differs from HDFS in many respects, especially in the data backup policy and the data distribution strategy, which we discuss in detail in the following sections.

___________________________________
978-1-4244-8728-8/11/$26.00 ©2011 IEEE

III. BACKUP POLICY BASED ON RECOVERY VOLUMES

HDFS employs a rack-aware data redundancy policy [10]. Its main idea is to establish a certain number of backup blocks for each data block, according to a specified system parameter, and to store the data block as well as its backup blocks on DataNodes via the rack-aware policy: a data block and its first backup block are stored on DataNodes on the same rack, and the second backup block is stored on a DataNode in another rack. This backup policy is reliable; however, it costs much storage space. Typically, a data file occupies three times its size on HDFS. QDFS, by contrast, provides a redundancy policy based on recovery volumes; it costs less space than HDFS while offering considerable reliability.

A recovery volume is a special volume generated by calculation over other volumes. Any damaged or lost volumes (including data volumes and recovery volumes) can be recovered as long as the number of damaged or lost volumes does not exceed the number of recovery volumes. This kind of recovery policy is widely used, for example in the WinRAR software and in RAID systems. For a file with N data volumes, M recovery volumes are generated, and all N+M volumes are stored to provide data redundancy. As long as at least N of the N+M volumes can be read correctly, the original file can be recovered.

In QDFS, the recovery volumes are generated in two steps.

Step 1: define a function F(i, j):

    F(i, j) = i^(j-1)    (1)

where i is the input (data) volume number and j is the output (recovery) volume number.

Step 2: generate the recovery volumes via F(i, j):

    output(j) = Σ_i F(i, j) · input(i)    (2)

where input(i) is the ith original volume and output(j) is the jth recovery volume.

In case any volumes are damaged, recovery is conducted in two steps. For 5 original volumes (denoted A to E) and 3 recovery volumes (denoted X to Z), if B, C, and D are damaged, the recovery process is as follows.

Step 1: construct a matrix M:

        | 1       0       0       0       0      |
        | 0       0       0       0       1      |
    M = | F(1,1)  F(2,1)  F(3,1)  F(4,1)  F(5,1) |    (3)
        | F(1,2)  F(2,2)  F(3,2)  F(4,2)  F(5,2) |
        | F(1,3)  F(2,3)  F(3,3)  F(4,3)  F(5,3) |

According to (1) and (2), we have M · N = Q, where N = (A, B, C, D, E)^T and Q = (A, E, X, Y, Z)^T.

Step 2: calculate the inverse of M (denoted M^-1) via Gaussian elimination, and calculate N via (4):

    N = M^-1 · Q    (4)

Therefore B, C, and D are recovered. Furthermore, as all the calculations are carried out at the byte level, they are very efficient.

Figure 1. Framework of QDFS (DataNodes are connected to the NameNode by strong or weak data links)

Figure 2. The flow-path of QDFS-based file storage and download. Upload: a client uploads the file via the Web portal to NN and affirms the number of recovery blocks; NN segments the file into data blocks, generates recovery blocks for the data blocks, distributes the blocks via the DataNode selection policy, and recalculates the weight of each DN after the distribution. Download: a client submits a download request via the Web portal to NN; NN downloads blocks from the DataNodes, recovers the data blocks if any blocks are damaged, reconstructs the file from the blocks, and hands out the file via the Web portal.
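The encode/recover procedure of (1)-(4) can be sketched in Python. This is an illustrative reading of the equations, not the QDFS implementation: the paper computes at the byte level, whereas this sketch uses exact rational arithmetic (`fractions.Fraction`) so that the Gaussian elimination of Step 2 is exact, and all function names are hypothetical.

```python
from fractions import Fraction

def F(i, j):
    # Eq. (1): coefficient of data volume i in recovery volume j
    return Fraction(i) ** (j - 1)

def encode(data, m):
    # Eq. (2): output(j) = sum_i F(i, j) * input(i), for j = 1..m
    n = len(data)
    return [sum(F(i, j) * data[i - 1] for i in range(1, n + 1))
            for j in range(1, m + 1)]

def solve(M, Q):
    # Step 2 of recovery: solve M * N = Q by Gaussian elimination.
    # Exact Fractions keep the elimination free of rounding error.
    n = len(M)
    A = [row[:] + [q] for row, q in zip(M, Q)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [x / A[col][col] for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                A[r] = [x - A[r][col] * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

def recover(surviving, recovery, n):
    # Step 1 of recovery: build the matrix M of Eq. (3) from one selector
    # row per surviving data volume and one coefficient row per surviving
    # recovery volume, then apply Eq. (4): N = M^-1 * Q.
    M, Q = [], []
    for i, v in sorted(surviving.items()):       # intact data volumes
        M.append([Fraction(1 if k == i else 0) for k in range(1, n + 1)])
        Q.append(Fraction(v))
    for j, v in sorted(recovery.items()):        # intact recovery volumes
        M.append([F(k, j) for k in range(1, n + 1)])
        Q.append(Fraction(v))
    return solve(M[:n], Q[:n])

# The paper's example: data volumes A..E, recovery volumes X..Z.
data = [65, 66, 67, 68, 69]
X, Y, Z = encode(data, 3)
# B, C, D (volumes 2..4) are lost; recover from A, E, X, Y, Z.
restored = recover({1: data[0], 5: data[4]}, {1: X, 2: Y, 3: Z}, n=5)
assert [int(v) for v in restored] == data
```

Any N of the N+M volumes yield an invertible system here because the F rows form a Vandermonde-style matrix; a byte-level implementation would do the same arithmetic over a finite field.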



IV. QUALITY-AWARE DATA DISTRIBUTION

A. Background

In order to deal with DataNode failures, HDFS enforces a series of enhancement strategies [11-12]. Data blocks can be migrated from one DataNode to another, and the number of backups of a data block can be adjusted to balance reliability against storage space utilization. However, HDFS assumes that each DataNode has analogous storage characteristics and that all DataNodes are connected within a reliable network [13-14]. Therefore, when selecting DataNodes to store data blocks, HDFS uses a simple method: it randomly selects DataNodes that have enough storage space, so that all DataNodes carry analogous workloads. This kind of DataNode selection policy is not suitable for the dynamic nature of the Internet with autonomous desktop computers [15]. In QDFS, DataNode selection is based on the quality of service (QoS) of the DataNodes: a DataNode with high QoS carries a heavier weight than one with low QoS when the selection policy is enforced.

B. Weighting of a DataNode based on its QoS

For a distributed file system, the transfer bandwidth, the availability of service, and the free storage space are essential factors for providing a high-quality storage service. We define the weight of a DataNode as a combination of these factors.

We define the availability of a DataNode, denoted ρ, as follows:

    ρ = e^(RB/TB - 1)    (5)

where RB is the actual transferred data size to a DataNode, and TB is the theoretical distributed data size on that DataNode under the WBS algorithm, which is presented in the following section. As (5) indicates, ρ is 1 if a DataNode fulfilled the scheduled data transfer, greater than 1 if it stored more data than expected, and smaller than 1 if it stored less data than expected.

We use time t to measure the transfer speed of a DataNode: t is defined as the time a DataNode takes to transfer a fixed-size data block.

We denote the raw weight of a DataNode as wT*, defined in (6); it is proportional to ρ and inversely proportional to t:

    wT* = (ρ / t) · C    (6)

where C is a constant used to normalize the initial value of wT*, and T denotes the Tth data transfer of the DataNode.

In (6), wT* is a measurement of the quality of a data transfer to a DataNode. It concerns the speed and success ratio of a data transfer, but not the free space of the DataNode. In a dynamically changing network environment, wT* may fluctuate. To improve the stability of the QDFS storage performance, we use (7) to smooth casual leaps of wT*:

    wT = wT-1 + (wT* - wT-1) / K    (7)

where K is a constant that influences the smoothing effect on casual leaps of wT*. The initial value of wT, i.e., w0, is 1.

However, as stated above, the free space is an important factor for measuring the weight of a DataNode, especially when transferring large files. To consider the impact of free space, we define the free space utilization ratio, δ, as below:

    δ = min{sp / spv, 1}    (8)

where sp is the free space of a DataNode, and spv is the free space threshold for all DataNodes, defined as:

    spv = spmax × μ    (9)

where spmax is the maximum free space of all DataNodes, and μ is a coefficient that controls the threshold. Initially, μ is defined as:

    μ = spi / spa    (10)

where spi is the smallest free space of all DataNodes when new DataNodes arrive, and spa is the maximum free space of all DataNodes. Note that μ is re-evaluated when new DataNodes enter the system.

Once wT and δ have been evaluated, the weight of a DataNode is generated:

    W = wT × δ    (11)

The flow-path of generating the weight of a DataNode is depicted in Fig. 3.

C. WBS: Weight-based data distribution algorithm

Upon the weighting of DataNodes, we present WBS, a weight-based data distribution algorithm, as follows. Assume that the weight of DataNode i (denoted DNi) is Wi. If a file with m blocks (including data blocks and recovery blocks) is to be stored, the number of blocks distributed to DNi, denoted TBi, is defined as follows:

    TBi = (Wi / Σ_{j=1}^{n} Wj) · m,  if (Wi / Σ_{j=1}^{n} Wj) · m ≥ spv
    TBi = 0,                          otherwise    (12)

where n is the number of DataNodes in the system. Obviously, DataNodes with heavy weights are allotted more blocks. In practice, the DataNode selection process is conducted in descending order of the weights, i.e., DataNodes with heavy weights are preferred. If all DataNodes have been allotted blocks according to (12) and there remain blocks that have not been distributed, these blocks are distributed to the DataNodes with the heaviest weights.

V. EXPERIMENTS AND ANALYSES

A. The experimental environment

We deployed QDFS in an experimental environment containing one Dawning PHPC-100 computer (5 homogeneous computing nodes, denoted node1 to node5; the hardware of each node includes a 2-way Quad-Core AMD Opteron Processor 2350, 8 GB of main memory, a 160 GB SATA hard disk, and a Gigabit network interface) and 2 desktop PCs (denoted cesc and ubuntu). We assigned node1 as the NameNode and the other nodes as DataNodes. To verify the influence of network bandwidth and free storage space, we limited the network bandwidth and free storage space of the DataNodes as depicted in Table I. In the experiments, we transferred a file of 960 MB, split it into 15 data blocks, and built 15 recovery blocks using the redundancy policy depicted in Section III. The coefficient K in (7) is 3.

TABLE I. INITIAL CONSTRAINTS OF THE DATANODES

    Node    Bandwidth constraint (Mbps)    Free space (GB)
    node2   30                             9.71
    node3   18                             15.3
    node4   15                             19.75
    node5   12                             19.75
    cesc    8                              14.91
    ubuntu  5                              11.23

B. The experiments of file uploading

In the experimental environment described above, we ran a series of file uploading experiments. Fig. 4 demonstrates the upload time comparison between QDFS and HDFS. As HDFS randomly selects DataNodes, the file transfer time varies greatly due to the differing QoS of the DataNodes. QDFS, however, selects DataNodes with the weight-based selection algorithm; as a result, the upload time decreases as the upload proceeds and tends toward a stable level.
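The weight model of (5)-(11) and the allocation rule of (12) can be sketched as follows. This is a minimal illustrative sketch, not the QDFS code: the class and function names are hypothetical, the zero-allocation branch of (12) is omitted for brevity, and leftover blocks are simply pushed to the heaviest nodes as the text describes.

```python
import math

class DataNodeWeight:
    """Smoothed transfer weight of one DataNode, per Eqs. (5)-(7)."""

    def __init__(self, free_space, C=1.0, K=3):
        self.free_space = free_space
        self.C = C      # normalization constant of Eq. (6)
        self.K = K      # smoothing constant of Eq. (7); the experiments use K = 3
        self.w = 1.0    # initial weight w_0 = 1

    def record_transfer(self, rb, tb, t):
        # Eq. (5): availability rho = e^(RB/TB - 1); rho is 1 when the
        # node delivered exactly the scheduled amount of data.
        rho = math.exp(rb / tb - 1)
        # Eq. (6): raw per-transfer weight, proportional to rho, inverse in t.
        w_star = rho / t * self.C
        # Eq. (7): w_T = w_{T-1} + (w* - w_{T-1}) / K smooths casual leaps.
        self.w += (w_star - self.w) / self.K

def space_threshold(free_spaces, mu):
    # Eq. (9): sp_v = sp_max * mu; Eq. (10) initializes mu = sp_i / sp_a.
    return max(free_spaces) * mu

def weight(node, sp_v):
    # Eq. (8): delta = min{sp / sp_v, 1}; Eq. (11): W = w_T * delta.
    delta = min(node.free_space / sp_v, 1.0)
    return node.w * delta

def wbs_distribute(nodes, m, sp_v):
    # Eq. (12), simplified: allot the m blocks in proportion to weight;
    # blocks left over by integer truncation go to the heaviest nodes.
    ws = [weight(node, sp_v) for node in nodes]
    total = sum(ws)
    alloc = [int(w / total * m) for w in ws]
    order = sorted(range(len(nodes)), key=lambda i: ws[i], reverse=True)
    for i in range(m - sum(alloc)):
        alloc[order[i % len(order)]] += 1
    return alloc
```

With this reading, a node that repeatedly under-delivers or transfers slowly sees its smoothed weight sink over a few transfers, while the δ factor caps the weight of nodes whose free space falls below the threshold, matching the behavior reported for node2 and node3 in the experiments.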

C. Monitoring the weights of the DataNodes

Fig. 5 demonstrates the changes of the weights of the 6 DataNodes as the upload continues. DataNodes with different qualities clearly have different weights. node2 has high access bandwidth but small free space, so its weight decreases rapidly. node3 has high access bandwidth and relatively large free space; when the weight of node2 decreases because of its dwindling free space, node3 becomes the "best" DataNode. node4 and node5 have relatively high bandwidth and free space, so at the later stage they become the main storage nodes. cesc and ubuntu have the largest free space but the smallest bandwidth; their weights are relatively small, so they were distributed small amounts of data, and their weights were not noticeably influenced during the upload process. In addition, the weight of a DataNode changes smoothly under the WBS algorithm, avoiding leaps brought about by casual "mistakes" of the DataNodes; the weight of a DataNode thus reflects its overall QoS objectively. For a premium DataNode, the weight will not increase indefinitely as its free space decreases: the weights of node2 and node3 decrease markedly because of their inadequate free space, which prevents poor performance or even failure of the storage service.

VI. CONCLUSIONS AND FUTURE WORK

On the basis of the Hadoop distributed file system, this paper presented QDFS, a novel distributed file storage service system. QDFS enforces a data redundancy policy based on recovery volumes and evaluates the QoS of each DataNode dynamically. By applying the presented redundancy policy, the storage space a file occupies is decreased; by applying the WBS algorithm, QDFS becomes more suitable than HDFS for dynamic network environments. We realized the QDFS model on a Hadoop system deployed on an experimental network. The experimental results show that QDFS is practical, which in

turn lays a foundation for popularizing the Hadoop project on the open Internet. In the future, we will address the issue that internal DataNodes of a cluster cannot be connected directly by clients over the network: we plan to deploy a NameNode at the access point of each cluster and construct a tree-like QDFS platform.

ACKNOWLEDGMENT

This work is supported by the Science and Technology Department of Zhejiang Province, China, under grant No. 2009C14031. We would like to thank the Center for Engineering and Scientific Computation, Zhejiang University, for the computational and storage resources with which this research has been carried out.

REFERENCES

[1] Apache Hadoop Project, Available: http://hadoop.apache.org/, 2010.
[2] J. Dean and S. Ghemawat, "MapReduce: a flexible data processing tool," Commun. ACM, 53(1):72-77, 2010.
[3] J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Proc. of the 6th Symposium on Operating Systems Design & Implementation, p. 10, 2004.
[4] G. Attebury, A. Baranovski, K. Bloom, B. Bockelman, D. Kcira, J. Letts, et al., "Hadoop Distributed File System for the Grid," Proc. of the 2009 IEEE Nuclear Science Symposium Conference Record (NSS/MIC 2009), Orlando, FL, United States, Oct. 2009, pp. 1056-1061.
[5] O. O'Malley and A. C. Murthy, "Winning a 60 second dash with a yellow elephant," Technical report, Yahoo!, 2009.
[6] F. Cornelli, E. Damiani, and S. D. Capitani, "Choosing Reputable Servents in a P2P Network," Proc. of the 11th International World Wide Web Conference, Honolulu, Hawaii, USA, 2002, pp. 376-386.
[7] L. Xiong and L. Liu, "A Reputation-based Trust Model for Peer-to-Peer Ecommerce Communities," Proc. of the ACM Conference on Electronic Commerce, ACM Press, 2003, pp. 228-229.
[8] W. Yuan, J. S. Li, and P. L. Hong, "Distributed Peer-to-peer Trust Model and Computer Simulation," Journal of System Simulation, 3(13):66-69, 2006.
[9] T. G. Papaioannou and G. D. Stamoulis, "Reputation Based Policies That Provide the Right Incentives in Peer-to-Peer Environments," Computer Networks, 50(4):563-578, 2006.
[10] D. Borthakur, HDFS Architecture, Available: http://hadoop.apache.org/common/docs/r0.20.2/hdfs_design.html, Sep. 2, 2009.
[11] Z.-D. Zhao and M.-S. Shang, "User-Based Collaborative-Filtering Recommendation Algorithms on Hadoop," Proc. of the Third International Conference on Knowledge Discovery and Data Mining (WKDD 2010), Phuket, Thailand, Jan. 2010, pp. 478-481.
[12] L. Huang, X.-W. Wang, Y.-D. Zhai, and B. Yang, "Extraction of User Profile Based on the Hadoop Framework," Proc. of the 5th International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM 2009), Beijing, China, Sep. 2009.
[13] Y. Yu, P. K. Gunda, and M. Isard, "Distributed aggregation for data-parallel computing: Interfaces and implementations," Proc. of the 22nd ACM SIGOPS Symposium on Operating Systems Principles, Big Sky, MT, United States, Oct. 2009, pp. 247-260.
[14] M. Isard and Y. Yu, "Distributed data-parallel computing using a high-level programming language," Proc. of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems, Providence, RI, United States, pp. 987-994, Dec. 2008.
[15] C.-D. Li, Y.-F. Dai, and W. Wang, "Fault-tolerance in the Hadoop Framework," Computer Knowledge and Technology, 5(28):8053-8055, 2009.

Figure 3. The flow-path of generating the weight of a DataNode

Figure 4. Upload time comparison between QDFS and HDFS

Figure 5. Changes of the weights of the DataNodes along with the upload sequence