Traffic Distribution over Equal-Cost-Multi-Paths using LRU-based Caching with Counting Scheme

Wei Lin*, Bin Liu†, Yi Tang†

* Graduate School at Shenzhen, Tsinghua University, and City University of Hong Kong, [email protected]
† Dept. of Computer Science and Technology, Tsinghua University, Beijing, P. R. China, [email protected]

Abstract

In order to reduce network congestion and fully use the link bandwidth when there are Equal-Cost-Multi-Paths (ECMPs) between a forwarding node and a destination subnet, the traffic load should be balanced among the ECMPs while packets of the same TCP flow still reach the destination host in their original order. An algorithm called LRU (Least Recently Used)-based Caching with Counting (LCC) is proposed. Packet length differentiation is taken into account by keeping a byte counter for each ECMP, and counter overflow is avoided by relative counting and restrictions. UDP packets need to be considered only for load balance, not for ordering. Furthermore, the flow delay differentiation when forwarding to different hosts of the same destination subnet is transformed into differences in the lifetime of the corresponding cache entries. Simulations show that when the delay differentiation among ECMPs is not significant, the storage requirement is small, each cache lookup needs only one cycle, the load balance is near optimal, and only about 2% of packets arrive out of order.

This work is supported by the NSFC under Grant No. 60373007 and No. 60573121, the China-Ireland Science and Technology Collaboration Research Fund (CI-2003-02), the Specialized Research Fund for the Doctoral Program of Higher Education of China (No. 20040003048), and the 985 Fund of Tsinghua University (No. JCpy2005054).

1. Introduction

With the rapid deployment of the Internet, the network topology is becoming very complicated. For reasons of traffic engineering and network security, there can be multiple paths between two nodes of the network. At present most unique-path routing protocols are based on Dijkstra's shortest path algorithm, and only the least-cost path is selected as the routing path. The bandwidth offered by the network is therefore not fully used, since one of the paths can be heavily loaded while other paths remain idle, causing network congestion in certain areas. The load differentiation becomes even worse when packets traverse multiple hops. Moreover, if the costs of several paths are approximately equal, the computation result of the routing protocol keeps changing among these paths, causing frequent updates of the routing table; the network becomes unstable and performance suffers[1].

In contrast to unique-path routing protocols, equal-cost-multi-path routing protocols have been proposed to solve the above problems[2, 3]. For packets forwarded to the same destination subnet, ECMPs are provided as candidates for the next-hop selection, where equal cost means that the shortest-path metrics of the paths (for example, the delay or the number of hops) from the forwarding node to the destination subnet are approximately equal within a certain precision. When ECMPs are provided by the routing protocol, the traffic should be balanced among them in order to fully use the available bandwidth and to reduce routing instability. In this paper, for ease of discussion, each path's bandwidth is assumed to be equal; if this is not the case, a weight parameter representing the bandwidth difference should be attached to each path, and traffic should be distributed according to that weight.

For TCP packets, an out-of-order problem may arise at the destination host when packets belonging to the same TCP flow are forwarded through different paths. As the delays of the paths are not exactly equal and may fluctuate with the traffic load, it is possible that a later packet forwarded through a different path arrives at the destination host earlier than an earlier packet. If the number of consecutive out-of-order packets is smaller than the TCP sliding window, the window can absorb them without degrading network performance; otherwise, the destination host will issue a retransmission request.


Such retransmissions degrade the performance of the network and of the applications involved. Therefore the problems of load balance and packet order should be optimized together. If packets were forwarded packet by packet in a round-robin manner, ignoring the length differentiation of the packets, load balance would be well achieved, but the out-of-order problem would be serious. On the contrary, if packets belonging to the same flow must be forwarded through the same path, packet order would be guaranteed, but the load balance is not always optimal, depending on the traffic distribution; the distribution becomes even worse after several hops have been traversed, which is common in the Internet.

In this paper, an algorithm called LRU-based Caching with Counting (LCC) is proposed, which argues that it is unnecessarily rigid to restrict the packets of the same flow to the same path. The relationship between the out-of-order probability caused by flow redirection and the time interval between two consecutive packets of the same flow is explored, and the differentiation of packet length is taken into account. As UDP packets do not need to stay in order, they are used to balance the load on the ECMPs.

The rest of the paper is organized as follows. Section 2 describes the related work on distributing traffic among ECMPs. Section 3 illustrates the idea and the architecture of the LCC algorithm. Section 4 presents the simulation results of LCC and the other algorithms with respect to load balance and the percentage of out-of-order packets, together with a short theoretical analysis of packet reordering. Finally, concluding remarks are given in Section 5.

2. Related works

The principles of traffic distribution among ECMPs are: 1) balance the load of each path; 2) guarantee that TCP packets of the same flow arrive at the destination host in their original sequence. These two principles conflict with each other, so a tradeoff is required to satisfy both objectives. The following methods have been proposed in the literature to handle this problem.

A. Packet-By-Packet Using Round Robin (PBP)

Assuming the packet length is constant, PBP can achieve optimal load sharing at the cost of severe packet disorder. If the packet length is variable, PBP is not a viable solution, as neither load sharing nor packet order can be guaranteed.

B. Direct Hashing (DH)

A combination of the five tuples SIP, DIP, SP, DP and the protocol is used as the input of a hash function. The hash value is defined as the flow number, which distinguishes each flow.

If the number of ECMPs is k, each path is assigned an integer in the range 0 to k-1; the flow number is then taken modulo k, and the remainder is directly mapped to the path with the same value. DH is shown in Fig.1. The sequence of the packets of the same flow is strictly preserved, but the traffic load is not well balanced.

C. Table-based Hashing (TH)

The improvement of TH over DH is that the mapping relationship between the flows and the ECMPs can be manually pre-configured. Although this improves the load distribution without introducing packet disorder, the pre-computed traffic distribution is static, so the ever-changing traffic load still cannot be well balanced. TH is shown in Fig.2.

Fig. 1. Architecture of Direct Hashing
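As a minimal sketch of the hash-and-modulo selection behind DH, and of the table lookup that TH adds on top of it (the CRC32 hash and the field encoding are assumptions of the sketch; any hash over the five tuples behaves the same way):

```python
import zlib

def dh_select_path(sip: str, dip: str, sp: int, dp: int, proto: int, k: int) -> int:
    """Direct Hashing: hash the five tuples to a flow number, then take it modulo k."""
    flow_no = zlib.crc32(f"{sip}|{dip}|{sp}|{dp}|{proto}".encode())
    return flow_no % k                      # remainder is the path number (0 .. k-1)

def th_select_path(sip: str, dip: str, sp: int, dp: int, proto: int,
                   table: list) -> int:
    """Table-based Hashing: the hash picks a bin, and a pre-configured table maps bins to paths."""
    flow_no = zlib.crc32(f"{sip}|{dip}|{sp}|{dp}|{proto}".encode())
    return table[flow_no % len(table)]      # table entries are manually assigned path numbers
```

Because the mapping from flow number to path is fixed, packets of one flow always take the same path, which is why DH and TH never reorder packets but cannot adapt to the offered load.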

D. Table-based Hashing with Reassignment (THR)

Given that a small amount of packet disorder can be tolerated thanks to the TCP sliding window, reassignment is introduced to further balance the traffic load among the ECMPs. After each time interval, one of the flows assigned to the most heavily loaded path is reassigned to the most lightly loaded path[4]. The traffic load statistics of the last cycle are discarded when a new cycle begins, in order to avoid counter overflow. The time interval and the reassigned flow should be carefully selected to trade off load balance against packet order. THR is shown in Fig.3.

Fig. 2. Architecture of Table-based Hashing

Fig. 3. Architecture of THR
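A sketch of THR's periodic reassignment step under assumptions the text does not spell out (per-path and per-bin byte counters kept during each cycle; which flow or hash bin is moved, and how often, are tuning choices in [4]):

```python
def thr_reassign(bin_to_path: list, path_bytes: list, bin_bytes: list) -> None:
    """Move one hash bin from the most heavily loaded path to the most lightly loaded
    one, then reset the per-cycle statistics (as THR does to avoid counter overflow)."""
    heavy = max(range(len(path_bytes)), key=lambda p: path_bytes[p])
    light = min(range(len(path_bytes)), key=lambda p: path_bytes[p])
    candidates = [b for b, p in enumerate(bin_to_path) if p == heavy]
    if heavy != light and candidates:
        victim = max(candidates, key=lambda b: bin_bytes[b])   # busiest bin on the heavy path
        bin_to_path[victim] = light
    for p in range(len(path_bytes)):        # last cycle's statistics are discarded
        path_bytes[p] = 0
    for b in range(len(bin_bytes)):
        bin_bytes[b] = 0
```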

E. Fast Switch (FS)

Using a cache to handle the problem of traffic distribution among ECMPs was proposed by Cisco[5]. FS does not consider packet length differentiation.


When a flow cannot be found in the cache, FS selects a path in a round-robin manner, which is unable to provide balanced traffic. The different ordering requirements of the TCP and UDP protocols are not taken into account either.

To solve the problems associated with the above methods, LCC is proposed in this paper. It improves the load balance while keeping the probability of packet disorder small enough to be handled by the TCP sliding window. LCC is based on a cache maintained in an LRU manner. Packet length differentiation is considered by LCC via a counting scheme in bytes, and counter overflow is avoided by relative counting and certain restrictions. TCP packets and UDP packets are treated differently. Furthermore, the forwarding delay of each flow to the different hosts of the same destination subnet is reflected in the lifetime of the corresponding cache entries, so the cache space can be utilized more efficiently. Load balance can be further improved while the number of out-of-order packets stays low. The next section illustrates LCC in detail.

3. LCC algorithm and architecture

LCC is a cache-based distribution algorithm, built on the observation that if the time interval between two packets of the same flow is long enough, i.e., longer than the maximum difference between the path delays, the latter packet can be forwarded through an alternative path without causing packet disorder. Since the delay differentiation among ECMPs is not significant, only a short time interval suffices. Recalling that a TCP sliding window can absorb out-of-order packets as long as the number of consecutive out-of-order packets does not exceed the window length, packets of the same flow can therefore be forwarded through different paths with only a trivial probability of packet disorder. The cache works in the LRU manner: only recent forwarding information, namely the mapping between a flow and an ECMP, is retained. Because the time interval that needs to be retained is short, only a small cache is needed.

Packet lengths are not equal: the longest Ethernet packet is 1518 bytes and the shortest only 64 bytes. So when traffic is balanced among ECMPs, length differentiation should be considered. PBP and FS only use round robin to distribute packets and ignore length differentiation, so they cannot always achieve good results. LCC employs one counter per ECMP to monitor the traffic load. When a packet arrives belonging to a flow that does not exist in the cache, it is either a new flow or a flow whose inter-packet interval was so long that its record in the cache has been replaced by another, more recent flow; both cases are referred to as a new flow in this paper.

LCC selects the ECMP with the lightest load according to the counters, forwards the packet on that path, and updates the cache and the counters. The length of the packet is added to the corresponding counter, which could eventually cause the counter to overflow. Relative counting is applied to solve this problem: because only the relative differences of the counter values, rather than their absolute values, matter for choosing the least-loaded ECMP for a new flow, all the counters are updated each time, with the new value of each counter derived from its old value minus the smallest counter value. Consecutive arrivals of large flows can still overflow a counter, so a restriction is imposed: if the old counter value plus the packet length would overflow, the new value is set to the maximum value representable by the counter. Because the number of ECMPs is usually no more than 4[6], the update operation is simple. Since UDP packets need not be kept in sequence, they are directly distributed to the least-loaded ECMP. Load balance is approximately optimal when UDP packets account for 10%[7] of the total traffic and the number of flows is far larger than the number of records in the cache.
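A minimal sketch of the per-ECMP byte counters with relative counting and the overflow restriction described above (the counter width and the names are assumptions of the sketch; only the behavior is fixed by the text):

```python
COUNTER_MAX = (1 << 32) - 1    # assumed counter width

def pick_least_loaded(counters: list) -> int:
    """New TCP flows and all UDP packets are sent to the ECMP with the smallest counter."""
    return min(range(len(counters)), key=lambda i: counters[i])

def add_and_rebase(counters: list, path: int, pkt_len: int) -> None:
    """Add the packet length to the chosen path's counter, clamping at the maximum
    representable value, then subtract the current minimum from every counter
    (relative counting), since only the differences matter for path selection."""
    counters[path] = min(counters[path] + pkt_len, COUNTER_MAX)
    base = min(counters)
    for i in range(len(counters)):
        counters[i] -= base
```

With at most four ECMPs per destination in practice[6], the rebase step touches only a handful of counters per packet.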

Fig.4 Sketch map of ECMPs in the network

In order to use the cache more effectively, the flow delay differentiation from the forwarding node to the destination host should be considered. As shown in Fig.4, although the multiple paths from the forwarding node to the subnet are equal-cost, the delay differentiation from the entry node of the subnet to each destination host may be significant. This delay differentiation is transformed into a lifetime differentiation of the records in the cache, which are initially set to be equal. Flow records with short cache lifetimes can then be replaced before the cache becomes full, so load balance can be further improved while the probability of packet disorder stays low. A possible way to detect the delay differentiation from the forwarding node to particular destination hosts is to probe with ICMP packets such as "ping".

The position of the Multiple Next Hop Handling Module (ECMPs Selection Module) in the search engine is shown in Fig.5. An incoming packet is first processed by the Parser Module, which extracts the DIP field for the Routing Lookup Module and some fields for the ECMPs Selection Module (such as DIP, SIP, DP, SP, Protocol and Length).

Proceedings of the 20th International Conference on Advanced Information Networking and Applications (AINA’06) 1550-445X/06 $20.00 © 2006

IEEE

The packet is then buffered in the Packet Buffer, waiting for the routing result to be attached to its front. Subsequently, if only one next hop is returned by the Routing Lookup Module, the Unique Next Hop Handling Module responds with the Next Hop Address and the corresponding Port Number, and the Multiple Next Hop Handling Module simply ignores the input. Otherwise, if the indication shows that several ECMPs are provided for selection, the corresponding Multiple Next Hop Handling Module chooses one Next Hop Address and the corresponding Port Number using LCC. Finally, the routing lookup result is attached to the front of the packet by the Combination Module and exported for forwarding.

Fig.5 Position of ECMPs selection module

Fig.6 Inside architecture of ECMPs selection module

The internal architecture of the Multiple Next Hop Handling Module for one destination subnet is shown in Fig.6. It works in a pipelined manner. For packets that must be handled by the Multiple Next Hop Handling Module, the protocol field is first used to determine whether the packet is TCP or UDP, since the two are handled differently: UDP packets are sequence-indifferent. For TCP packets, a flow number is computed using the SIP, DIP, SP and DP as input; the flow number generator can use CRC32 or another hashing method to generate the flow number (FlowNo). In order to search the cache in one clock cycle per packet, parallel comparison is deployed to guarantee the lookup performance. When the cache of each ECMPs selection module is small, on-chip registers or small BCAMs can be used to implement the parallel comparison. A result bitmap is returned after comparing the FlowNo with the F field of every record whose V field is 1 (a valid record) in the cache. An "OR" over all bits of the bitmap yields "1" for a "cache hit" (the FlowNo is in the cache); otherwise the result is a "cache miss" (the FlowNo cannot be found in the cache). In the case of a cache hit, the address of the hit record is obtained from the position of the first 1 in the bitmap. The packet is forwarded through the path whose number equals the P field of the hit record, and the cache is updated using the LRU method (the lifetime information can be organized as a linked list, so only one record's index needs to be moved when the cache is updated). The counters are updated with the length of the packet using relative counting.

In the case of a cache miss, the new flow is forwarded through the path with the smallest counter value, and the mapping between the FlowNo and the path number is recorded into the cache using the LRU method; if the cache is already full before updating, the least recently used record is evicted. For UDP packets, which are indifferent to sequence, the path is chosen simply as the one with the smallest counter value, and all the counters are updated with the packet length using relative counting. With this assistance, LCC can achieve better load balancing among ECMPs than the other methods. Because the delay differentiation among the ECMPs is slight, a cache with a small number of records is adequate to keep the probability of packet disorder low (in our simulations, only 2% of one million packets are out of order when a 16-record cache is used). The small memory requirement allows a large number of ECMPs selection modules to be supported on chip.
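Putting the pieces together, the per-packet decision described above can be sketched in software, with an ordered map standing in for the on-chip LRU cache (CRC32 as the flow-number generator follows the text, and the 16-record default matches the simulations in Section 4; the record layout and counter width are assumptions of the sketch):

```python
import zlib
from collections import OrderedDict

class LccSelector:
    def __init__(self, num_paths: int, cache_size: int = 16):
        self.counters = [0] * num_paths      # per-ECMP byte counters (relative counting)
        self.cache = OrderedDict()           # FlowNo -> path number, kept in LRU order
        self.cache_size = cache_size

    def _least_loaded(self) -> int:
        return min(range(len(self.counters)), key=lambda i: self.counters[i])

    def _count(self, path: int, pkt_len: int) -> None:
        # add with overflow clamp, then rebase all counters by the current minimum
        self.counters[path] = min(self.counters[path] + pkt_len, (1 << 32) - 1)
        base = min(self.counters)
        self.counters = [c - base for c in self.counters]

    def select(self, sip, dip, sp, dp, proto, pkt_len) -> int:
        if proto == 17:                      # UDP: sequence-indifferent, pure load balancing
            path = self._least_loaded()
        else:                                # TCP: look the flow up in the LRU cache
            flow_no = zlib.crc32(f"{sip}|{dip}|{sp}|{dp}".encode())
            if flow_no in self.cache:        # cache hit: keep the flow on its recorded path
                path = self.cache[flow_no]
                self.cache.move_to_end(flow_no)
            else:                            # cache miss: treat as a new flow
                path = self._least_loaded()
                if len(self.cache) >= self.cache_size:
                    self.cache.popitem(last=False)   # evict the least recently used record
                self.cache[flow_no] = path
        self._count(path, pkt_len)
        return path
```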

4. Simulation and performance analysis

According to published statistics, the number of parallel packet flows in a core router of CERNET reached 300 million in 2003[8]. Table 1 shows that Internet traffic is dominated by TCP and UDP flows[7]; although TCP flows are in the majority, UDP flows cannot be neglected. In the LCC algorithm, UDP packets are used to balance the load of the ECMPs, given that they do not need to be kept in sequence. The other algorithms do not treat UDP packets specially: UDP and TCP packets are processed in the same manner. Table 2 shows the application composition in the backbone network[7]. The simulations are based on NS-2. The total number of flows in the simulations is 10000, and the simulated time lasts 30 seconds. The distribution of traffic follows Table 1 and Table 2. The network topology adopted in the simulation is shown in Fig.7.


Node B represents a boundary router of Subnet A and takes the responsibility of forwarding packets to Subnet D. Node C is a boundary router of the destination Subnet D. There are 8 ECMPs between Node B and Node C; the number attached to each link in Fig.7 represents its propagation delay in milliseconds.

Table 1 Protocol composition in the backbone network [7]

Protocol   Byte   Packet   Flow
TCP        95%    90%      80%
UDP        5%     10%      20%
Others     ICMP accounts for most of the remainder

Table 2 Application composition in the backbone network [7]

Application   Byte   Packet   Flow
Web           70%    75%      75%
DNS           1%     3%       18%
SMTP          5%     5%       2%
FTP           5%     3%       1%
NNTP          2%     1%       1%
Telnet        1%     1%       1%
Others        mainly web-related applications

Fig.7 Network topology in the simulation

Based on the simulation environment above, the performance of LCC is compared with the other algorithms (PBP, DH, TH, THR and FS).

A. Analysis of the load balancing performance

Referring to Fig.7, we focus on the traffic flows destined to Node C, which can be split among the 8 available ECMPs. Assume the desired traffic splitting ratio among the 8 ECMPs is 1:1:1:1:1:1:1:1. Table 3 shows the load balancing performance of the different algorithms. Analyzing Table 3, LCC, FS and PBP all perform well at load balancing. As expected, LCC has the best performance, with its variance reaching 10^-4; this is a result of the real-time load balancing inherent to LCC. FS and PBP distribute traffic among the ECMPs in a round-robin manner, and neither considers the length differentiation of the packets. The load balancing performance of the DH and TH algorithms is much worse because they use a fixed mapping relationship and cannot reconfigure the load of the ECMPs while traffic is being transferred. THR can provide satisfactory performance through reassignment, but its reconfiguration is not real-time: within a given time interval the load may still vary largely among the ECMPs. Moreover, the counters in THR are reset at the end of each cycle, which deprives the adjustment of continuity.

Table 3 Load balancing performance (percentage of traffic per port)

Port       LCC      PBP      DH       TH       THR      FS
Port1      12.49    10.37    12.36    12.39    11.57    10.43
Port2      12.51    12.07    11.73    11.68    12.38    12.21
Port3      12.51    11.77    18.04    13.47    11.62    14.45
Port4      12.49    12.32     5.86    12.35    12.40    12.42
Port5      12.52    11.63    15.50    10.67    13.36    12.17
Port6      12.50    14.68    13.42    17.49    13.56    13.23
Port7      12.49    13.50     6.99    13.15    12.53    11.97
Port8      12.49    13.62    16.10     8.79    12.54    13.08
Variance   0.0001   1.8644   18.365   6.2976   0.5044   1.3481

Fig.8 Real-time load variation among ECMPs

Referring to Fig.8, we compare the real-time load balance of the several algorithms. The X-axis represents the number of packets; the Y-axis represents the traffic deviation between the most heavily and the most lightly loaded paths. LCC achieves the best load balancing performance, and its traffic jitter is also trivial.

B. Analysis of packet order continuity

The statistics of out-of-order packets are collected per flow: if the ith packet of a flow forwarded from a host in Subnet A is not the ith packet of that flow received by a host in Subnet D (Subnets A and D are defined in Fig.7), the packet is counted as out of order in that flow.
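A minimal sketch of this per-flow counting rule (the per-flow packet identifiers are an assumption of the sketch; any per-flow sequence numbering serves the same purpose):

```python
def count_out_of_order(sent_ids: list, received_ids: list) -> int:
    """Per-flow bookkeeping: the i-th packet received is out of order whenever it is
    not the i-th packet that the source host sent for this flow."""
    return sum(1 for s, r in zip(sent_ids, received_ids) if s != r)

# e.g. a flow sent packets 0,1,2,3 but the receiver saw 0,2,1,3 -> 2 packets out of order
assert count_out_of_order([0, 1, 2, 3], [0, 2, 1, 3]) == 2
```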


We evaluate the order-preserving performance of each algorithm by comparing the proportions of out-of-order packets. The parameters related to the order-preserving performance are: the link bandwidth (α), the transfer delay of the ith ECMP (delay[i]), the minimal lifetime of records in the cache (c), the length of the kth packet (γ_k), the number of ECMPs (p), and the time interval between the kth and (k+1)th consecutive packets belonging to the same flow (t_k). A packet can be out of order only if Inequality (1) is satisfied:

\[
\exists\, i, j \;(i, j \in \mathbb{N},\ 0 \le i \le p,\ 0 \le j \le p):\quad
\begin{cases}
\mathrm{delay}[i] - \mathrm{delay}[j] > \sum_k \left( \gamma_k / \alpha + t_k \right) \\[4pt]
\sum_k \left( \gamma_k / \alpha + t_k \right) > c
\end{cases}
\tag{1}
\]
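As a rough worked illustration with assumed numbers (the DS3 rate matches the simulation settings below; the 3 ms maximum delay gap is only an assumption):

```latex
% Serialization time of a maximum-length Ethernet packet on a DS3 link:
\[
  \frac{\gamma_k}{\alpha} = \frac{1518 \times 8\ \text{bit}}{44.736\ \text{Mbit/s}} \approx 0.27\ \text{ms}
\]
% If \max_{i,j}(delay[i] - delay[j]) = 3 ms, any single inter-arrival gap
% t_k >= 3 ms already violates the first condition of Inequality (1); and
% choosing the minimal cache lifetime c >= 3 ms makes the two conditions
% c < sum_k(gamma_k/alpha + t_k) < delay[i] - delay[j] impossible to satisfy
% simultaneously, so a redirected flow cannot be reordered.
```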

The more pairs (i, j) satisfy Inequality (1), the higher the probability that packet disorder occurs. We can deduce that if Σ(γ_k/α + t_k) is reduced, the probability of packet disorder is reduced as well. This could be achieved by decreasing the packet size (γ_k), increasing the link bandwidth (α), or decreasing the interval between consecutive packets of the same flow (t_k), but each of these is hard to realize in practice. Reducing the number of ECMPs (p) can also reduce the possibility of packet disorder, but it leads to inefficient utilization of the link bandwidth and conflicts with the original intention. Consequently, to keep the probability of packet disorder under a threshold, we only adjust the minimal lifetime of the records in the cache (c), which corresponds to the cache size. If MAX(delay[i] - delay[j]) is small, i.e., the transfer delay differentiation among the ECMPs is not significant, a small cache suffices to keep the probability of packet disorder under the threshold.

In the simulations, the bandwidth of each ECMP is set to 44.736 Mbps (DS3), the cache size is set to 16 records, and the number of ECMPs is set to 8. The proportion of out-of-order packets is shown in Table 4.

Table 4 Percentage of out-of-order packets

Algorithm   LCC     PBP      DH   TH   THR     FS
Percent     2.36%   32.83%   0    0    2.59%   2.56%

Referring to Table 4, PBP has the highest probability of packet sequence discontinuity because it is purely round-robin. DH and TH reach zero because their flow-to-path mapping is fixed. FS and LCC adopt similar strategies to retain packet sequence, so their percentages of packet disorder are close, both under 3%. With reassignment enabled, THR has 2.59% of its packets out of order.

5. Conclusion

Although many distribution algorithms have been proposed to balance the load among ECMPs and keep packets in order, they fail to ensure both objectives simultaneously. This paper points out that if the time interval between two consecutive packets of the same flow is large enough, the packets can be forwarded through different paths while keeping the probability of packet disorder low. A corresponding cache-based scheme named LCC is presented. It updates the cache in an LRU manner, and fully takes into account the length differentiation of packets and the different ordering requirements of TCP and UDP packets. Relative counting is introduced to avoid counter overflow, and the delay differentiation from the forwarding node to different hosts is used to further improve the cache storage efficiency. Furthermore, a parallel pipelined architecture is deployed for high throughput: a cache lookup finishes in one clock cycle, and only a small cache is needed, allowing on-chip implementation. Finally, the simulation results clearly demonstrate the advantages of LCC: it balances the load near-optimally in real time while maintaining only a low probability of packet disorder.

References

[1] D. G. Andersen, A. C. Snoeren, and H. Balakrishnan, "Best-path vs. multi-path overlay routing," Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), pp. 91-100, 2003.
[2] C. Hopps, "RFC 2992: Analysis of an Equal-Cost Multi-Path Algorithm," www.ietf.org/rfc/rfc2992, November 2000.
[3] D. Thaler, "RFC 2991: Multipath Issues in Unicast and Multicast Next-Hop Selection," www.ietf.org/rfc/rfc2991, November 2000.
[4] T. W. Chim and K. L. Yeung, "Traffic distribution over equal-cost-multi-paths," 2004 IEEE International Conference on Communications, Paris, France, 2004.
[5] A. Zinin, Cisco IP Routing: Packet Forwarding and Intra-domain Routing Protocols, Section 5.5.1, Addison Wesley, 2002.
[6] Z.-Y. Liang, K. Xu, J.-P. Wu, and M.-W. Xu, "IP lookup scheme supporting routing compaction and multi next hops," Ruan Jian Xue Bao/Journal of Software, vol. 15, pp. 550-560, 2004.
[7] K. Thompson, G. J. Miller, and R. Wilder, "Wide-area internet traffic patterns and characteristics," IEEE Network, vol. 11, pp. 10-23, 1997.
[8] G. Cheng, J. Gong, W. Ding, and J.-L. Xu, "Hash algorithm for IP flow measurement," Ruan Jian Xue Bao/Journal of Software, vol. 16, pp. 652-658, 2005.
