Handling Double-Link Failures in Metro Ethernet Networks Using Fast Spanning Tree Reconnection

Jian Qiu, Gurusamy Mohan, Kee Chaing Chua and Yong Liu
Department of Electrical and Computer Engineering, National University of Singapore
{g0501087, elegm, eleckc, eleliuy}@nus.edu.sg

Abstract: Ethernet is becoming a preferred technology for deployment in the metro domain due to its low cost, simplicity and ubiquity. However, the traditional spanning tree based Ethernet protocol does not meet the network resilience requirements of Metro Area Networks. In [1], we proposed a Fast Spanning Tree Reconnection (FSTR) mechanism for Metro Ethernet networks to handle single link failures. Upon failure of a link on a spanning tree, the FSTR mechanism activates a reconnect-link to reconnect the broken spanning tree. The FSTR mechanism has the features of fast recovery, simplicity, and guaranteed protection. However, when more than one link fails in the network, the FSTR mechanism may generate unexpected loops and cannot function properly. In this paper, we propose a fast spanning tree reconnection mechanism to handle double-link failures with protection grade guarantees. The mechanism is distributed and avoids the loop problem of the previous FSTR mechanism. We formulate the reconnect-link pre-configuration problem for double-link failures as an integer linear programming problem. Through numerical results we demonstrate that the proposed mechanism can satisfy the protection grade required for each connection by efficiently utilizing the network capacity.

I. INTRODUCTION

Ethernet has been the dominant technology for Local Area Networks (LANs) over the past 30 years. It is becoming a preferred technology to be extended to Metro Area and Wide Area Networks. Economies of scale, ubiquity, high bandwidth, ease of configuration, support for various layer 3 technologies, and scalability are some prominent reasons for this preferential status of Ethernet [2].

Failure handling is an important issue in Metro Area Networks (MANs). It is a critical requirement for Metro Ethernet to have a fast, reliable and efficient failure handling mechanism. However, the spanning tree protocol family used in current Ethernet switched networks cannot guarantee fast failure recovery. In both the 802.1d Spanning Tree Protocol (STP) [3] and the 802.1w Rapid Spanning Tree Protocol (RSTP) [4], a new spanning tree is rebuilt upon failure, which takes a long time, especially when the failure happens near the root of the tree.

Single link failures are the most common failure scenarios in the network. However, unforeseen disasters such as earthquakes or tsunamis may result in two or more link failures. Recently, more and more research has concentrated on providing full or partial protection against double or multiple link failures in metropolitan and carrier-grade networks. 100% guaranteed protection against multiple link failures is expensive and difficult, since it requires too many network resources and a relatively complicated mechanism. Therefore, a resilient mechanism with simplicity and fast failure recovery should be designed for Metro Ethernet networks. Such a mechanism may not provide full protection against all multiple link failure scenarios, but it should be able to provide different qualities of protection according to the different requirements of customers.

The Fast Spanning Tree Reconnection (FSTR) mechanism proposed in [1] can efficiently protect against a single link failure with fast recovery. However, when multiple links fail in the network, the FSTR mechanism may generate unexpected loops and cannot function properly. This is because the FSTR mechanism handles each single link failure in a distributed manner. Upon multiple link failures, each affected Ethernet switch may only get limited failure information, resulting in a wrong decision when it reconfigures its switching table. Broadcasting the failure information throughout the whole network seems to be the simplest way to handle this problem; however, it would greatly increase the recovery time.

In this paper, we propose a different resilient mechanism based on the FSTR mechanism, which can handle double-link failures in Metro Ethernet networks without creating any loops. It also avoids broadcasting failure messages in the network, reducing the failure handling time. In addition to providing 100% protection against single link failures, this mechanism can also handle double-link failures with a specified protection grade requirement for each connection.

The rest of the paper is organized as follows. In Section II, an overview of related work is given. In Section III, we illustrate the problem of the FSTR mechanism when a double-link failure happens in Metro Ethernet, and describe how to solve the problem. In Section IV, we introduce the integer linear programming model that formulates the reconnect-link pre-configuration problem for double-link failures.
Optimization results for different network scenarios are given in Section V. Concluding remarks are made in the last section.

II. RELATED WORK

The IEEE 802.1d Spanning Tree Protocol (STP) [3] is responsible for building a loop-free logical forwarding topology over the physical network, providing connectivity among all nodes. It suffers from slow convergence (tens of seconds) and inefficient bandwidth usage. The IEEE 802.1w Rapid Spanning Tree Protocol (RSTP) [4] allows a switch to send

978-1-4244-4148-8/09/$25.00 ©2009 This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE "GLOBECOM" 2009 proceedings.

on its new root port in order to authorize immediate transition to forwarding, bypassing the listening and learning stages required by STP, each of which lasts one forward delay. However, the recovery time of RSTP is still on the order of seconds, which is not desirable in the metro domain.

Several mechanisms to reduce the recovery time or to provide guaranteed protection in Metro Ethernet have been suggested by modifying STP and RSTP. The approach in [7] builds a master tree and a set of subtrees for each node. Traffic is transmitted on the master tree during normal operation. Upon a link failure, a subtree replaces the failed part of the master tree, so re-convergence of the master tree is avoided. However, it does not address the problem of provisioning guaranteed backup capacity to carry the affected traffic. In [8], a tool for RSTP optimization is introduced so that protection of the traffic in Metro Ethernet can be guaranteed. Since the mechanism is based on RSTP, the failure recovery speed is not improved.

To achieve optimal network resource utilization, fast resilience mechanisms based on the IEEE 802.1s Multiple Spanning Tree Protocol (MSTP) [5] have been proposed, including Viking [9], which handles failures in a centralized way, and a distributed end-to-end protection method by Farkas [10]. In [11] and our previous work [12], a local restoration mechanism based on multiple spanning trees is proposed to restore traffic onto a backup spanning tree by changing the Virtual LAN (VLAN [6]) ID of Ethernet frames in the local Ethernet switch. In [12], we also discuss how bandwidth guarantees can be provided for protection through local restoration.

III. FSTR PROTOCOL WITH DOUBLE-LINK FAILURE

A. FSTR Mechanism

In [1], we proposed a Fast Spanning Tree Reconnection (FSTR) mechanism to protect against single link failures in Metro Ethernet networks. As shown in Fig. 1, under the FSTR mechanism, when a link on a working spanning tree fails, the spanning tree is divided into two separate subtrees, and a distributed fast spanning tree reconnection protocol is activated to reconnect the two subtrees by using a pre-configured single link defined as the reconnect-link. A failure notification table and an alternate output port table are configured on each switch based on the pre-computed reconnect-links. By checking the two tables, the switching tables of the Ethernet switches affected by the spanning tree reconnection are reconfigured in such a way that traffic passing the failed link is quickly rerouted and backward learning is avoided. The mechanism has the features of fast recovery and guaranteed protection with low implementation cost.

B. Double-Link Failures in Metro Ethernet

In addition to fully protecting against single link failures, the resilient mechanism in Metro Ethernet should also be able to handle multiple link failures with a certain protection grade. In this paper, we concentrate on double-link failures, since the probability that more than two links fail simultaneously is negligible.

Fig. 1. Illustration of the FSTR mechanism: upon failure of link A−B, A notifies D and B notifies E to reconnect the spanning tree with the reconnect-link D−E. [Figure: nodes A to E; legend: spanning tree, reconnect-link]

There are four situations in total when a double-link failure happens in Metro Ethernet.
• Case 1: Neither failed link is a link of the working spanning tree, so no resilient mechanism is needed to reconnect the tree.
• Case 2: One failed link is a link of the working spanning tree, while the other failed link is neither a tree link nor the reconnect-link of the first link. In this case, the FSTR mechanism for single link failure can provide the protection.
• Case 3: One failed link is a link of the working spanning tree, while the other failed link is the reconnect-link of the first link. This failure can be protected simply by activating an alternative reconnect-link.
• Case 4: Both failed links are links of the working spanning tree. Two reconnect-links must be used to reconnect the spanning tree.

Our resilient mechanism for double-link failures in Metro Ethernet should be able to protect against all four failure situations listed above, especially Case 4. However, the original FSTR mechanism has a problem when a failure as in Case 4 happens. This is because the mechanism handles link failures in a distributed manner and does not know which failure situation has occurred. Under the FSTR mechanism, when an Ethernet switch detects the failure of an adjacent link, it treats the failure as a single link failure without any knowledge of a possible failure in another part of the network. Thus, it may notify a wrong reconnect-link to reconnect the spanning tree and generate a loop in the network.

Figure 2 illustrates how a loop is formed when a double-link failure happens. Assume link C−F is pre-configured as the reconnect-link of link B−C, as shown in Fig. 2 (a). Under the FSTR mechanism, Ethernet switch B sends a notification message to F when B−C fails, and then C−F is activated to reconnect the spanning tree. In the same way, we assume link C−G is the reconnect-link of D−F (Fig. 2 (b)). When the single link B−C or D−F fails, the FSTR mechanism can reconnect the spanning tree with the proper reconnect-link. However, when links B−C and D−F fail simultaneously, F does not know the failure on link B−C, and will still activate the reconnect-link C−F. At the same time, C will activate the reconnect-link C−G, which generates a cycle among switches C, F and G (Fig. 2 (c)).
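The four cases and the loop of Fig. 2 can be checked with a small script. The following is a minimal sketch, not the FSTR protocol itself: it only models which links are active after each failed tree link independently activates its own reconnect-link. The tree links B−C, B−D, D−F and F−G are those that can be inferred from the reconnect-paths quoted in the text; nodes A and E are omitted, and the helper names `edge`, `classify`, `has_loop` and `reconnect_independently` are hypothetical.

```python
def edge(u, v):
    """Undirected link represented as an order-independent key."""
    return frozenset((u, v))

# Tree links inferred from the text for Figs. 2 and 3 (assumption: A and E omitted).
TREE = {edge("B", "C"), edge("B", "D"), edge("D", "F"), edge("F", "G")}

def classify(failed, tree, reconnect):
    """Return the case number (1 to 4) for a pair of failed links."""
    first, second = failed
    on_tree = [l for l in failed if l in tree]
    if not on_tree:
        return 1
    if len(on_tree) == 2:
        return 4
    tree_link = on_tree[0]
    other = second if tree_link == first else first
    return 3 if reconnect[tree_link] == other else 2

def has_loop(links):
    """Union-find test: True if the undirected link set contains a cycle."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for l in links:
        u, v = tuple(l)
        ru, rv = find(u), find(v)
        if ru == rv:
            return True
        parent[ru] = rv
    return False

def reconnect_independently(failed, tree, reconnect):
    """Each failed tree link activates its own reconnect-link,
    without knowledge of the other failure."""
    active = set(tree) - set(failed)
    for l in failed:
        if l in tree:
            active.add(reconnect[l])
    return active

# Pre-configuration of Fig. 2: B-C is protected by C-F, D-F by C-G.
fig2 = {edge("B", "C"): edge("C", "F"), edge("D", "F"): edge("C", "G")}
double_failure = (edge("B", "C"), edge("D", "F"))

case = classify(double_failure, TREE, fig2)
loop = has_loop(reconnect_independently(double_failure, TREE, fig2))
print(case, loop)
```

With the Fig. 2 pre-configuration the double failure is classified as Case 4, and the activated links C−F and C−G together with the surviving tree link F−G close the cycle C, F, G described above.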

Fig. 2. (a) Link C−F is used as the reconnect-link when link B−C fails; (b) link C−G is used as the reconnect-link when link D−F fails; (c) an unexpected loop is formed when links B−C and D−F fail. [Figure: three panels over nodes A to G; legend: spanning tree, reconnect-link]

The easiest way to solve the problem is to let every Ethernet switch adjacent to a failed link broadcast a failure message throughout the whole network. Switches can then make the correct decision once they have received all the failure messages, and a loop-free reconnection solution can be found. However, this method takes several round trip times for failure message broadcast and spanning tree reconnection, which can hardly meet the fast failure recovery requirement of Metro Ethernet.

We therefore look for a different method to handle the double-link failure problem. By carefully pre-configuring the reconnect-links, each Ethernet switch adjacent to a failed link can handle the failure independently, treating it as a single link failure, while no loops are generated in the network. In Fig. 3, another pre-configuration of the reconnect-links is shown. The reconnect-link of B−C is changed to C−D instead of C−F, while the reconnect-link of D−F remains C−G. When the two links B−C and D−F fail, B notifies D to activate C−D, and D notifies G to activate C−G via the path D → C. After that, the spanning tree is reconnected without any loops. Notice that node B does not know about the failure of D−F during the reconnection procedure; it only handles the failure of link B−C as a single link failure. The example shows that broadcasting failure messages all over the network to handle a double-link failure is unnecessary. The reconnect-links can be pre-configured in such a way that loops are avoided even though each failure is handled independently.

Next we give the necessary and sufficient condition for loop avoidance of the FSTR mechanism in Metro Ethernet. We first give the definition of a reconnect-path. The reconnect-path is defined as the path between the two end nodes of a protected link on the reconnected spanning tree upon single failure of that particular link [1]. It is determined by the protected link and the reconnect-link.
For example, in Fig. 3, the reconnect-path of link B−C is B ↔ D ↔ C, and that of D−F is D ↔ B ↔ C ↔ G ↔ F. Obviously, the end nodes of the protected link and of the reconnect-link are all on the corresponding reconnect-path.

Fig. 3. (a) Link C−D is used as the reconnect-link when link B−C fails; (b) link C−G is used as the reconnect-link when link D−F fails; (c) the spanning tree can be safely reconnected without any loop. [Figure: three panels over nodes A to G]

Lemma 1. Let RP_i be the reconnect-path of link i on a spanning tree, and let m and n be two links on the spanning tree. A double-link failure of m and n causes no loop during spanning tree reconnection if and only if the following condition is satisfied:
1) If m ∈ RP_n, then n ∉ RP_m.
2) If n ∈ RP_m, then m ∉ RP_n.

Proof: Assume links A−B and M−N are two links on a spanning tree, and let r_{A−B} and r_{M−N} be the reconnect-links of A−B and M−N. If the condition is not satisfied, we have A−B ∈ RP_{M−N} and M−N ∈ RP_{A−B}. Assume RP_{M−N} is longer than RP_{A−B}. Since A−B ∈ RP_{M−N}, the nodes A, B, M and N, which are the end nodes of the two links, all belong to RP_{M−N}. By the definition of a spanning tree, there is only one path on the spanning tree between any node pair; thus any node between M and A or between N and B on the reconnected tree upon failure of A−B should belong to RP_{M−N}. Otherwise, there would be two paths between a node pair on the spanning tree. On the other hand, since M−N ∈ RP_{A−B}, any node between M and A or between N and B on the reconnected tree upon failure of A−B should also belong to RP_{A−B}. Given the assumption that RP_{M−N} is longer than RP_{A−B}, it can be concluded that if a node belongs to RP_{A−B}, then it belongs to RP_{M−N}. Because the two end nodes of r_{A−B} are on RP_{A−B}, they must be on RP_{M−N}. Upon failure of A−B and M−N, r_{M−N} connects two nodes on RP_{M−N}, and r_{A−B} also connects two nodes on RP_{M−N}. Hence r_{M−N}, r_{A−B} and RP_{M−N} form a cycle, as shown in Fig. 4. Therefore, a loop is formed if the condition is not satisfied.

Fig. 4. Reconnect-links: r_{M−N} = E−D and r_{A−B} = B−C. Reconnect-path RP_{M−N} is M ↔ A ↔ B ↔ E ↔ D ↔ C ↔ N; reconnect-path RP_{A−B} is A ↔ M ↔ N ↔ C ↔ B. [Figure: nodes M, N, A, B, C, D, E; legend: spanning tree, reconnect-link, reconnect-path]

Conversely, suppose the two links satisfy the condition, with A−B ∈ RP_{M−N} and M−N ∉ RP_{A−B}. Since M−N ∉ RP_{A−B}, failure of M−N does not affect the reconnection upon failure of A−B. Hence reconnecting the spanning tree with r_{A−B} will not generate a loop. Also, because M−N ∉ RP_{A−B}, at least one end node of M−N is not on RP_{A−B}. Let node M be the node not on RP_{A−B}; then every node between M and one end node of the reconnect-link r_{M−N} does not belong to RP_{A−B} either. Otherwise, these links would form a loop in a spanning tree, which contradicts the definition of a spanning tree. Since these nodes on RP_{M−N} do not belong to RP_{A−B}, it can be concluded that at least one end node of r_{M−N} is not on RP_{A−B}. Therefore, when the spanning tree is reconnected upon failure of A−B and M−N, r_{M−N} is not a link between two nodes of RP_{A−B} and will not generate a loop with RP_{A−B} and r_{A−B}.

Obviously, we have B−C ∈ RP_{D−F} and D−F ∉ RP_{B−C} in Fig. 3. The condition is satisfied and no loop exists. It can also be seen that the example in Fig. 2 does not satisfy the condition in Lemma 1.

Our mechanism for handling double-link failures in Metro Ethernet is based on the FSTR mechanism, but with a smarter pre-configuration of reconnect-links. When a single link failure or a failure as in Case 1 or Case 2 happens, the new mechanism follows the same procedure as the original FSTR mechanism to reconnect the spanning tree. To handle a failure as in Case 3, two different reconnect-links are pre-configured for each protected link on the spanning tree. If the protected link and one of its reconnect-links fail at the same time, the end nodes adjacent to the failed reconnect-link send a message to the end nodes of the protected link, asking them to use the secondary reconnect-link. After that, the new reconnect-link is used to reconnect the spanning tree. To handle a failure as in Case 4, the pre-configuration of reconnect-links should make the protected link pairs on the spanning tree satisfy the condition in Lemma 1.

The mechanism uses the same failure notification and switching table reconfiguration method as the FSTR mechanism; hence it also has fast recovery speed and low signaling overhead. However, it is possible that, due to topology and spanning tree constraints, some protected link pairs on the spanning tree do not satisfy the condition in Lemma 1, and their simultaneous failure cannot be protected. Our mechanism therefore provides 100% protection for single link failures and partial protection for double-link failures. To achieve optimal performance, it should protect the links that are more fragile or more necessary to be protected. In the next section, we formulate the problem of reconnect-link pre-configuration for double-link failures so that the protection grade guarantees in the network can be met.
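The condition of Lemma 1 can be evaluated mechanically once the reconnect-paths are known. The sketch below is illustrative only: it computes a reconnect-path as the set of links on the path between the end nodes of the protected link in the reconnected tree, and then tests the condition for the pre-configurations of Fig. 2 and Fig. 3. The tree links are those inferred from the reconnect-paths quoted in the text (nodes A and E are left out), and the function names `tree_path`, `reconnect_path` and `lemma1_ok` are hypothetical.

```python
def edge(u, v):
    """Undirected link represented as an order-independent key."""
    return frozenset((u, v))

def tree_path(links, src, dst):
    """Links on the path from src to dst in an acyclic link set (iterative DFS)."""
    adj = {}
    for l in links:
        u, v = tuple(l)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [(src, [])], {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [edge(node, nxt)]))
    raise ValueError("end nodes are not connected")

def reconnect_path(tree, protected, reconnect_link):
    """Reconnect-path of `protected`: the path between its end nodes on the
    tree obtained by removing it and adding its reconnect-link."""
    reconnected = (set(tree) - {protected}) | {reconnect_link}
    u, v = tuple(protected)
    return set(tree_path(reconnected, u, v))

def lemma1_ok(tree, reconnect, m, n):
    """True if links m and n do not lie on each other's reconnect-paths."""
    rp_m = reconnect_path(tree, m, reconnect[m])
    rp_n = reconnect_path(tree, n, reconnect[n])
    return not (m in rp_n and n in rp_m)

TREE = {edge("B", "C"), edge("B", "D"), edge("D", "F"), edge("F", "G")}
BC, DF = edge("B", "C"), edge("D", "F")

fig2 = {BC: edge("C", "F"), DF: edge("C", "G")}
fig3 = {BC: edge("C", "D"), DF: edge("C", "G")}

fig2_ok = lemma1_ok(TREE, fig2, BC, DF)
fig3_ok = lemma1_ok(TREE, fig3, BC, DF)
print(fig2_ok, fig3_ok)
```

Under these assumptions the Fig. 2 pre-configuration fails the test, because each protected link lies on the other's reconnect-path, while the Fig. 3 pre-configuration passes it.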

IV. PROBLEM FORMULATION

A. Failure Patterns

We formulate the problem under the assumption that network link failures happen independently and constitute a memoryless process [13]. Denote by $f_i$ the failure of single link $i$ and by $f_{ij}$ the failure of links $i$ and $j$; the scenario of no failure is denoted by $f_0$. The stationary probabilities of $f_i$ and $f_{ij}$ are given by $\pi(f_i)$ and $\pi(f_{ij})$. If we only consider single and double-link failures, we have

$$\pi(f_0) + \sum_{f_i \in F} \pi(f_i) + \sum_{f_{ij} \in F} \pi(f_{ij}) = 1.$$

B. Definition of Protection Grade

The protection grade of a connection is defined as the probability that the connection can be protected by the mechanism. It also represents the percentage of the connection's traffic that can be protected, under the assumption that the failure patterns constitute a memoryless process. The network service provider should assign a protection grade requirement to each connection. In case of a failure, spanning trees should be reconnected and bandwidth provisioning should be reconfigured to guarantee the requirement. Denote by $\rho_c(f_{ij})$ the percentage of bandwidth that connection $c$ is provided in case of failure $f_{ij}$, with $\rho_c(f_{ij}) \in [0, 1]$. Since our mechanism can fully protect against single link failures, the total protection grade the network can provide to a particular connection $c$ is

$$\sum_{f_{ij} \in F} \pi(f_{ij})\,\rho_c(f_{ij}) + \sum_{f_i \in F} \pi(f_i) + \pi(f_0),$$

which must be no smaller than the protection grade required for that connection.

C. Integer Linear Programming Problem Formulation

We formulate the problem as follows: given one or more spanning trees established in a network and a set of end-to-end connections, assign each connection to a proper working spanning tree, and select the best reconnect-links to reconnect each spanning tree so as to satisfy the protection grade of each connection, such that the backup capacity reserved in the network is minimized. Variable notations are given in Table I. We assume that links are undirected and that end-to-end connections are bidirectional.

To calculate the backup capacity required for a particular failure on each link, we first obtain the working capacity on each link based on the working spanning tree assignment. Then the traffic of each link on the reconnected spanning tree upon the particular failure can be obtained, and the backup capacity required on the link is the additional traffic when the spanning tree is reconnected. The objective function of the ILP model can be expressed as:

$$\min \sum_{m \in E} r_m$$


TABLE I
NOTATIONS

Symbol : Description
$(V, E)$ : a network with node set $V$ and edge set $E$
$K$ : number of established spanning trees
$T^k$ : working spanning tree $k$
$F$ : set of all failure scenarios
$f_i$ : single link failure of link $i$
$f_{ij}$ : double-link failure of links $i$ and $j$
$T^k_{f_i,x}$ : reconnected spanning tree with the reconnect-link $x$ upon failure of $i$
$T^k_{f_{ij},xy}$ : reconnected spanning tree with the reconnect-links $x$ and $y$ upon failure of $i$ and $j$
$\pi(f)$ : stationary probability of a failure scenario $f$
$\rho_c(f_{ij})$ : percentage of bandwidth provided to connection $c$ upon failure of $i$ and $j$
$C$ : end-to-end connection set
$c$ : one connection in set $C$
$A_c$ : protection grade requirement of connection $c$
$P_c(T, m)$ : set to 1 if link $m$ belongs to the path of connection $c$ on spanning tree $T$, otherwise set to 0
$r_m$ : reserved backup capacity on link $m$
$r_m^f$ : reserved backup capacity on link $m$ for failure $f$
$w_m^f$ : working traffic on link $m$ upon failure $f$
$a_c^k$ : binary variable, $a_c^k = 1$ if connection $c$ uses spanning tree $k$ as its working spanning tree
$R^k_f$ : set of links which can be used to reconnect spanning tree $k$ upon failure of link $f$
$b^k_{f_i l}$ : binary variable, $b^k_{f_i l} = 1$ if on spanning tree $k$, link $l$ is the reconnect-link upon failure of link $i$
$bb^k_{f_i l}$ : binary variable, $bb^k_{f_i l} = 1$ if on spanning tree $k$, link $l$ is the secondary reconnect-link upon failure of link $i$, used to handle a Case 3 failure
$RP^k_{f_i l}$ : reconnect-path when link $l$ is the reconnect-link protecting link $i$ on spanning tree $k$

Similar to the constraints in [1], we also have constraints for working spanning tree assignment and reconnect-link pre-configuration. Only one working spanning tree is assigned to each connection. Only one primary and one secondary reconnect-link are assigned to each protected link on the spanning tree.

$$\sum_{k} a_c^k = 1 \quad \forall c \in C$$

$$\sum_{l \in R^k_{f_i}} b^k_{f_i l} = 1 \quad \forall k,\ \forall i \in T^k$$

$$\sum_{l \in R^k_{f_i}} bb^k_{f_i l} = 1 \quad \forall k,\ \forall i \in T^k$$

The backup capacity required for a single link failure is calculated as follows. It is also the backup capacity required for a failure as in Case 2. We first calculate the working traffic on each link, which is the traffic without any failure. Then the traffic on the reconnected tree upon a particular failure can be obtained. The backup capacity reserved for the failure should be the additional traffic traversing the link after the spanning tree is reconnected. In the equations below, the factor $c$ stands for the traffic demand of connection $c$.

$$w_m^{f_0} = \sum_{k} \sum_{c \in C} a_c^k\, P_c(T^k, m)\, c \quad \forall m \in E$$

$$w_m^{f_i} = \sum_{k} \sum_{c \in C} \sum_{l \in R^k_{f_i}} a_c^k\, b^k_{f_i l}\, P_c(T^k_{f_i,l}, m)\, c \quad \forall m, i \in E$$

$$r_m^{f_i} = \left(w_m^{f_i} - w_m^{f_0}\right)^{+} \quad \forall m, i \in E$$

To handle a failure as in Case 3, the backup capacity using the secondary reconnect-link should be reserved independently. Since the primary and secondary reconnect-links cannot fail concurrently, the backup capacity reserved in the network when the secondary reconnect-link is activated can be shared with the backup capacity for the primary reconnect-link. Equations similar to those for a single link failure are used, with the variable $b^k_{f_i l}$ replaced by $bb^k_{f_i l}$, and one additional condition ensures that the primary and the secondary reconnect-link of a protected link are not the same.

$$w_m^{f_{ij}} = \sum_{k;\, j \in R^k_{f_i}} \sum_{c \in C} \sum_{l \in R^k_{f_i}} a_c^k\, bb^k_{f_i l}\, P_c(T^k_{f_i,l}, m)\, c \quad \forall m, i, j \in E$$

$$b^k_{f_i l}\, bb^k_{f_i l} = 0 \quad \forall l \in E,\ i \in T^k$$

For a failure as in Case 4, the backup capacity required for a particular failure can be calculated as follows, where $l$ and $t$ denote the reconnect-links of $i$ and $j$.

$$w_m^{f_{ij}} = \sum_{k;\, i, j \in T^k} \sum_{c \in C} a_c^k\, \rho_c(f_{ij})\, P_c(T^k_{f_{ij},lt}, m)\, c \quad \forall m, i, j \in E$$

$$r_m^{f_{ij}} = \left(w_m^{f_{ij}} - w_m^{f_0}\right)^{+} \quad \forall m, i, j \in E$$

Furthermore, the conditions for double-link failures in Lemma 1 should be satisfied. When the conditions are not satisfied, $\rho_c(f_{ij}) = 0$, which means that connection $c$ cannot be protected upon failure $f_{ij}$.

$$\rho_c(f_{ij}) \le 1 - \sum_{k;\, i, j \in T^k} a_c^k \left( \sum_{m \in R^k_{f_i};\, j \in RP^k_{f_i m}} b^k_{f_i m} \sum_{n \in R^k_{f_j};\, i \in RP^k_{f_j n}} b^k_{f_j n} \right) \quad \forall c \in C,\ \forall i, j \in E$$

The backup capacity reserved on a particular link should be the maximum value among all the failure scenarios.

$$r_m = \max\{ r_m^{f_i},\, r_m^{f_{ij}} \} \quad \forall m \in E$$

Finally, we also have the constraint that the protection grade of each connection must meet a given requirement:

$$\sum_{f_{ij} \in F} \pi(f_{ij})\, \rho_c(f_{ij}) + \sum_{f_i \in F} \pi(f_i) + \pi(f_0) \ge A_c \quad \forall c \in C$$
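The backup-capacity calculation of Section IV-C can be illustrated numerically for a fixed pre-configuration. The sketch below is not the ILP model: it takes one spanning tree and one reconnect-link per protected link as given, considers single link failures only, and evaluates the working traffic, the traffic after reconnection, and the resulting reserved capacity per link. The tree links and reconnect-links follow the Fig. 3 example as inferred from the text; the two connections and their 1 Gbps demands, as well as the function names, are assumptions made for illustration.

```python
def edge(u, v):
    """Undirected link represented as an order-independent key."""
    return frozenset((u, v))

def path_links(links, src, dst):
    """Links on the src-dst path of an acyclic link set (iterative DFS)."""
    adj = {}
    for l in links:
        u, v = tuple(l)
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    stack, seen = [(src, [])], {src}
    while stack:
        node, path = stack.pop()
        if node == dst:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [edge(node, nxt)]))
    raise ValueError("end nodes are not connected")

def link_loads(tree, connections):
    """Traffic per link when every connection follows its tree path."""
    loads = {}
    for src, dst, demand in connections:
        for l in path_links(tree, src, dst):
            loads[l] = loads.get(l, 0.0) + demand
    return loads

def backup_capacity(tree, reconnect, connections):
    """Per-link reserve: maximum over single failures of the extra traffic
    a link carries on the reconnected tree compared with normal operation."""
    w0 = link_loads(tree, connections)
    reserve = {}
    for failed, r_link in reconnect.items():
        reconnected = (set(tree) - {failed}) | {r_link}
        wf = link_loads(reconnected, connections)
        for l, load in wf.items():
            extra = max(0.0, load - w0.get(l, 0.0))
            reserve[l] = max(reserve.get(l, 0.0), extra)
    return reserve

TREE = {edge("B", "C"), edge("B", "D"), edge("D", "F"), edge("F", "G")}
RECONNECT = {edge("B", "C"): edge("C", "D"), edge("D", "F"): edge("C", "G")}
CONNECTIONS = [("C", "G", 1.0), ("B", "F", 1.0)]  # hypothetical demands in Gbps

reserve = backup_capacity(TREE, RECONNECT, CONNECTIONS)
total_backup = sum(reserve.values())
print(total_backup)
```

In this toy instance only the two reconnect-links need reserved capacity, since the rerouted traffic on the remaining tree links never exceeds what they already carry in normal operation.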

V. PERFORMANCE EVALUATION

In this section, optimal solutions of the reconnect-link pre-configuration problem for double-link failures are obtained using CPLEX 9.0. We study the efficiency and characteristics of backup capacity provisioning when the mechanism is used to handle double-link failures in Metro Ethernet. We use a 4 × 4 grid network in which a number of end-to-end connections are randomly selected. We assume all the connections have the same traffic demand of 1 Gbps. We assume that link failures in the network are independent, and let the failure probability of a particular link $i$ be $Pr(i)$, which is randomly selected from {0.01, 0.001, 0.0001}. Since only single and double link failures are considered, we get

$$\pi(f_i) = Pr(i) \prod_{j \in E - \{i\}} (1 - Pr(j)) \quad \text{and} \quad \pi(f_{ij}) = Pr(i)\, Pr(j) \prod_{t \in E - \{i,j\}} (1 - Pr(t)).$$

We establish four spanning trees in the network according to the spanning tree algorithm used in [1], and let all the connections have the same protection grade requirement of 99% to obtain the reserved backup capacity. Then the protection grade requirement for all the connections is set to 99.9%. Figure 5 shows the backup capacity reserved in the 16-node grid network with different numbers of connections. When the protection grade requirement of each connection increases, the total backup capacity reserved in the network increases, because more double-link pairs need to be protected to meet the protection grade requirement.
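The failure probabilities above and the protection grade of Section IV-B are straightforward to evaluate. The following sketch assumes a hypothetical three-link network with probabilities taken from the set {0.01, 0.001, 0.0001}; it also assumes that the no-failure probability is the product of the per-link survival probabilities, which the text does not state explicitly. The names `failure_probabilities` and `protection_grade` are illustrative.

```python
import math
from itertools import combinations

def failure_probabilities(pr):
    """pr maps each link to its independent failure probability.
    Returns (pi_0, {i: pi(f_i)}, {(i, j): pi(f_ij)})."""
    links = list(pr)

    def all_up(excluded):
        # Probability that every link outside `excluded` is working.
        return math.prod(1.0 - pr[l] for l in links if l not in excluded)

    pi0 = all_up(())  # assumption: no-failure probability as a plain product
    pi_single = {i: pr[i] * all_up((i,)) for i in links}
    pi_double = {(i, j): pr[i] * pr[j] * all_up((i, j))
                 for i, j in combinations(links, 2)}
    return pi0, pi_single, pi_double

def protection_grade(pi0, pi_single, pi_double, rho):
    """Protection grade of one connection; rho maps a link pair to the
    fraction of bandwidth kept under that double failure (default 0)."""
    protected_double = sum(p * rho.get(pair, 0.0) for pair, p in pi_double.items())
    return pi0 + sum(pi_single.values()) + protected_double

pr = {"l1": 0.01, "l2": 0.001, "l3": 0.0001}  # hypothetical links
pi0, pi_single, pi_double = failure_probabilities(pr)

grade_none = protection_grade(pi0, pi_single, pi_double, {})
grade_all = protection_grade(pi0, pi_single, pi_double,
                             {pair: 1.0 for pair in pi_double})
print(grade_none, grade_all)
```

The gap between `grade_none` and `grade_all` is the total probability mass of double-link failures, which is the only part of the protection grade that the choice of reconnect-links can influence.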

Fig. 5. Total backup capacity (Gbps) reserved in the 16-node grid network versus the number of end-to-end connections (30, 40, 50), for 99% and 99.9% protection grade requirements.

The distribution of the protection grade of all connections, under the constraint that the protection grade of each connection must be larger than 99.9%, is shown in Fig. 6. It can be seen that the distribution of the protection grade is almost the same no matter how many connections are routed in the network. About 50% of the connections have a protection grade higher than 99.95%, and 25% of the connections have a protection grade higher than 99.98%. This indicates that even though the protection grade requirement is 99.9%, most of the connections are provided with a much higher protection grade.

Fig. 6. Protection grade distribution with different numbers of connections under the 99.9% protection grade constraint. [Figure: percentage of connections versus protection grade from 0.999 to 1; curves for 30, 40 and 50 connections]

VI. CONCLUSION

In this paper, we introduce a resilient mechanism to handle double-link failures in Metro Ethernet. The mechanism is based on the Fast Spanning Tree Reconnection mechanism, but solves the loop problem of the FSTR mechanism when handling double-link failures. The mechanism has the features of fast recovery and simplicity. In addition, we develop a reconnect-link pre-configuration model to provide a certain protection grade for each connection. The numerical results show that our model can pre-configure the reconnect-links such that the protection requirement of each connection is satisfied with efficient utilization of the backup capacity.

REFERENCES

[1] J. Qiu, Y. Liu, G. Mohan and K. C. Chua, “Fast Spanning Tree Reconnection in Resilient Metro Ethernet Networks,” in Proc. of ICC’09, 2009.
[2] M. Ali, G. Chiruvolu, and A. Ge, “Traffic Engineering in Metro Ethernet,” IEEE Network, vol.2, pp.11-17, 2005.
[3] IEEE 802.1d, Standard for Local and Metropolitan Area Networks-Media Access Control (MAC) Bridges.
[4] IEEE 802.1w, Standard for Local and Metropolitan Area Networks-Rapid Reconfiguration of Spanning Tree.
[5] IEEE 802.1s, Standard for Local and Metropolitan Area Networks-Multiple Spanning Trees.
[6] IEEE 802.1q, Standard for Local and Metropolitan Area Networks-Virtual Bridged Local Area Networks.
[7] M. V. Pardmaraj, V. S. Nair, M. Marchetti, G. Chiruvolu, and M. Ali, “Bandwidth Sensitive Fast Failure Recovery Scheme for Metro Ethernet,” Computer Networks, vol.52, pp.1603-1616, 2008.
[8] A. Kern, I. Moldocan, and T. Cinkler, “Bandwidth Guarantees for Resilient Ethernet Networks through RSTP Port Cost Optimization,” in Proc. of AccessNets’07, 2007.
[9] S. Sharma, K. Goplan, S. Nanda, and T. Chiueh, “Viking: A Multiple-Spanning-Tree Ethernet Architecture for Metropolitan Area and Cluster Networks,” in Proc. of IEEE INFOCOM’04, 2004.
[10] J. Farkas, C. Antal, L. Westberg, A. Paradisi, T. R. Tronco, and V. G. Oliveira, “Fast Failure Handling in Ethernet Network,” in Proc. of IEEE ICC’06, 2006.
[11] M. Huynh, P. Mohapatra, and S. Goose, “Cross-Over Spanning Trees Enhancing Metro Ethernet Resilience and Load Balancing,” in Proc. of Broadnet’07, 2007.
[12] J. Qiu, G. Mohan, K. C. Chua, and Y. Liu, “Local Restoration with Multiple Spanning Trees in Metro Ethernet,” in Proc. of ONDM’08, 2008.
[13] H. Ma, D. Fayek, P. Ho, “Availability-Constrained Multipath Protection in Backbone Networks with Double-Link Failure,” in Proc. of ICC’08, 2008.