IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. XX, 2014


A Linear Time Pessimistic Diagnosis Algorithm for Hypermesh Multiprocessor Systems under the PMC Model

Hong-Chun Hsu, Kuang-Shyr Wu, Cheng-Kuan Lin, Chiou-Yng Lee, Senior Member, IEEE, and Chien-Ping Chang

Abstract—In microprocessor-based systems, such as cloud computing infrastructure, high reliability is essential. As multiprocessor systems become more widespread and increasingly complex, system-level diagnosis will increasingly be adopted to determine their robustness. In this paper, we consider a pessimistic diagnostic strategy for hypermesh multiprocessor systems under the PMC model. The pessimistic strategy is a diagnostic process whereby all faulty processors are correctly identified and at most one fault-free processor may be misjudged to be faulty. We first determine the pessimistic diagnosability of a hypermesh to be $2n(k-1)-k$. We then propose an efficient pessimistic diagnostic algorithm that identifies at most $2n(k-1)-k$ faults in $O(N)$ time, where $k$ is the radix, $n$ is the number of dimensions, and $N = k^n$ is the total number of processors. This result is superior to the best precise diagnostic algorithm, which runs in $O(N \log_k N)$ time. Furthermore, the Cartesian product network is a subgraph of the hypermesh, so the proposed algorithm can also be employed to determine faults in the product network.

Index Terms—System-level diagnosis, pessimistic strategy, diagnosis algorithm, hypermesh

1 INTRODUCTION

HIGH-SPEED multiprocessor systems are becoming increasingly common in computer technology. A multiprocessor system consists of several processors and the communication links between them. The reliability and availability of the system are crucial, but even a few malfunctions, whether due to manufacturing defects, life expectancy of the device, or environmental disturbance, may make the system unreliable. Whenever processors are found to be faulty, they should be replaced with fault-free ones as soon as possible to guarantee that the system does not suffer from unplanned interruptions of service. The automatic, periodic execution of a built-in test to identify all the faulty processors in a system is known as system-level self-diagnosis. Once the faulty processors are determined, the system is reconfigured to logically remove such processors from the system. The maximum number of faulty processors that can be accurately identified is an important parameter, known as the diagnosability of a system. In order to identify faults, a number of tests must be performed

• H.-C. Hsu is with the Department of Medical Informatics, Tzu Chi University, Hualien 970, Taiwan. • K.-S. Wu and C.-P. Chang are with the Department of Computer Science and Information Engineering, Chien Hsin University of Science and Technology, Jungli 320, Taiwan. • C.-K. Lin is with the Institute of Information Sciences, Academia Sinica, Taipei 11529, Taiwan. • C.-Y. Lee is with the Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology, Taoyuan 333, Taiwan. Manuscript received 20 Nov. 2012; revised 07 Aug. 2013; accepted 13 Aug. 2013; published online 21 Aug. 2013. Recommended for acceptance by B. Ravindran. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-2012-11-0858. Digital Object Identifier no. 10.1109/TC.2013.172

on the processors, with the collection of all test results referred to as a syndrome. A system is said to be $t$-diagnosable if all its faulty processors can be precisely identified when the total number of faulty processors is at most $t$. In other words, the diagnosability of a system is the maximum integer $t$ such that the system is $t$-diagnosable. According to the traditional precise diagnosis strategy, all processors are correctly identified to be faulty or fault-free [16]. The pessimistic diagnosis strategy proposed by Kavianpour and Friedman [9] is a process of diagnosing faults where all faulty processors can be isolated to a set with at most one fault-free processor. The basic idea behind pessimistic diagnosis is to identify the greatest number of faulty processors in the least number of separate steps, resulting in a reduction in the number of times the system must be shut down for reconfiguration to remove those processors. Thus, reducing the diagnostic demand of self-diagnostic systems has far-reaching consequences. More specifically, a system is $t_1/t_1$-diagnosable if, provided the number of faulty processors is bounded by $t_1$, all the faulty processors can be isolated within a set no larger than $t_1$.

The problem of identifying faulty processors in a multiprocessor system has been addressed by many authors [2]-[7], [9]-[12], [14]-[17], [19]-[24]. A theoretical model for system-level fault diagnosis of multiprocessor systems was first proposed by Preparata, Metze, and Chien [16]. According to this model, called the PMC model, the system to be diagnosed can be represented by a directed graph $G = (V, E)$, where the set of nodes $V$ represents the processors and the set of edges $E$ represents the test relations between processors. Under the PMC model [16], many efficient algorithms have been developed to precisely diagnose systems [5], [12], [14], [15], [17], [22], [23].
However, only a few efficient algorithms have been designed to diagnose certain variants of

0018-9340 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

hypercube systems under the PMC model using a pessimistic strategy [4], [10], [19], [21], [24].

The topologies of most multiprocessor systems can be formally modeled as a graph $G = (V, E)$, defined as a set of vertices $V$ and a set of edges $E$ [13]. Each vertex typically represents a node, and each edge between two vertices represents a channel between the two nodes. A fundamental constraint of the graph model is that each edge joins exactly two vertices. If a network channel can interconnect more than two nodes, another class of network topologies emerges. Using a graph theoretical framework, members of this class can be modeled as hypergraphs, which are generalizations of the conventional graph in which individual edges are able to join an arbitrary number of vertices. One particular class of regular $k$-ary $n$-dimensional hypergraphs, known as hypermeshes [18], [25], has $N = k^n$ nodes. The one-dimensional hypermesh, referred to as a cluster, is a hypergraph consisting of $k$ nodes directly connected by optical interconnections, crossbar switches, or spanning busses. A $k$-ary $n$-dimensional hypermesh is a Cartesian $n$-product of a fundamental cluster, to which it owes some desirable topological properties [1], [8], [15], [18], [25]. The Hitachi SR2201, SR8000, the CP-CAPS, and the Cray XK7 adopted for Titan are examples of machines that use this approach. Therefore, the $k$-ary $n$-dimensional hypermesh is an attractive candidate for the infrastructure of cloud computing.

In this work, we develop a linear time algorithm to pessimistically diagnose a $k$-ary $n$-dimensional hypermesh under the PMC model. As mentioned above, many commercial machines have adopted this topology, making a system-level diagnostic algorithm necessary to ensure system reliability. Based on a Hamiltonian path in the hypermesh system, this pessimistic diagnostic algorithm achieves $O(N)$ time complexity, where $N$ denotes the number of processors in the system.
To our knowledge, the pessimistic diagnosability of hypermeshes under the PMC model has not previously been determined; herein it is calculated to be $2n(k-1)-k$.

The rest of this paper is organized as follows. In the next section, we provide necessary background and notations. Section 3 shows the pessimistic diagnosability of a hypermesh, followed by an algorithm for pessimistic diagnosis of a hypermesh under the PMC model in Section 4. In Section 5, the diagnostic performance of different networks is compared. Conclusions are drawn in Section 6.

2 BACKGROUND AND NOTATIONS

The topology of a multiprocessor system is often represented by an undirected graph $G = (V, E)$, where the set of nodes $V(G)$ represents the processors and the set of edges $E(G)$ represents the communication links between the processors. Two nodes $u$ and $v$ are adjacent if $(u, v) \in E$; the node $u$ is called a neighbor of $v$, and vice versa. A graph $G'$ is a subgraph of a graph $G$ if $V(G') \subseteq V(G)$ and $E(G') \subseteq E(G)$. The components of a graph $G$ are its maximally connected subgraphs. A component is trivial if it has no edges; otherwise, it is nontrivial. The connectivity of a graph $G$, denoted by $\kappa(G)$, is the minimum number of nodes whose removal results in a disconnected or a trivial graph. A graph $G$ is $t$-connected if $\kappa(G) \ge t$. Given a $t$-connected graph, Menger's theorem states that there exist $t$ internally node-disjoint (abbreviated as disjoint) paths between any two distinct nodes.

Let $V'$ and $V''$ be two node subsets of $V(G)$; $G - V'$ represents the subgraph of $G$ induced by $V(G) - V'$. For a node $u$ of $G$, the term $N(u)$ is the set of all its neighboring nodes, i.e., $N(u) = \{v \mid v \in V \text{ and } (u, v) \in E\}$, so the neighborhood of $u$ in $V'$ is $N(V', u) = N(u) \cap V'$. The neighborhood of the set $V''$ is defined as the set $N(V'') = \bigcup_{v \in V''} N(v) - V''$. The neighborhood of $V''$ in $V'$ is defined as the set $N(V', V'') = \bigcup_{v \in V''} N(V', v) - V''$.

Fig. 1. Cartesian product $G = G_1 \times G_2$ of two graphs.
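The neighborhood notation above maps directly onto set operations. The following is a minimal sketch, assuming the graph is stored as a dictionary `adj` from each node to the set of its neighbors; the function names are illustrative and do not come from the paper.

```python
def nbhd(adj, u):
    """N(u): the set of all neighbors of node u."""
    return set(adj[u])


def nbhd_in(adj, v_prime, u):
    """N(V', u) = N(u) ∩ V'."""
    return set(adj[u]) & set(v_prime)


def nbhd_of_set(adj, v_dprime):
    """N(V'') = (union of N(v) over v in V'') minus V''."""
    v_dprime = set(v_dprime)
    out = set()
    for v in v_dprime:
        out |= adj[v]
    return out - v_dprime


def nbhd_of_set_in(adj, v_prime, v_dprime):
    """N(V', V'') = (union of N(V', v) over v in V'') minus V''."""
    v_dprime = set(v_dprime)
    out = set()
    for v in v_dprime:
        out |= nbhd_in(adj, v_prime, v)
    return out - v_dprime


# A 4-cycle 0-1-2-3-0 used as a toy example.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

On the 4-cycle, `nbhd_of_set(cycle4, {0, 1})` is `{2, 3}`: the union of both neighborhoods with the two nodes themselves removed.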

Definition 1. The Cartesian product of two graphs $G_1$ and $G_2$ is the graph $G = G_1 \times G_2$, where the node set $V(G)$ and the edge set $E(G)$ are given by
1. $V(G) = \{\langle x, y \rangle \mid x \in V(G_1) \text{ and } y \in V(G_2)\}$, and
2. for any two distinct nodes $u = \langle u_1, u_2 \rangle$ and $v = \langle v_1, v_2 \rangle$ in $V$, $(u, v) \in E$ if and only if $(u_1, v_1) \in E(G_1)$ and $u_2 = v_2$, or $(u_2, v_2) \in E(G_2)$ and $u_1 = v_1$.

Let $y$ be a fixed node of $G_2$. The $G_1^y$-component of $G_1 \times G_2$ is the subgraph with node set $\{\langle x, y \rangle \mid x \in V(G_1)\}$ and edge set $\{(u, v) \mid u = \langle u_1, y \rangle, v = \langle v_1, y \rangle, \text{ and } (u_1, v_1) \in E(G_1)\}$. Similarly, let $x$ be a fixed node of $G_1$; the $G_2^x$-component of $G_1 \times G_2$ is the subgraph with node set $\{\langle x, y \rangle \mid y \in V(G_2)\}$ and edge set $\{(u, v) \mid u = \langle x, u_2 \rangle, v = \langle x, v_2 \rangle, \text{ and } (u_2, v_2) \in E(G_2)\}$. Clearly, the $G_1^y$-component (abbreviated as $G_1^y$) and the $G_2^x$-component (abbreviated as $G_2^x$) are isomorphic to $G_1$ and $G_2$, respectively (as illustrated in Fig. 1).

A hypergraph is represented by $H = (V, E)$, where the set of nodes $V(H)$ represents the processors and the set of hyperedges $E(H)$ represents the buses between the processors [18]. A component is a trivial hypergraph if it has no hyperedges; otherwise, it is nontrivial. The connectivity of a hypergraph $H$, denoted by $\kappa(H)$, is the minimum number of nodes whose removal results in a disconnected hypergraph or a trivial hypergraph. A $k$-ary $n$-dimensional hypermesh $HM_{n,k} = (V, E)$ is defined by a set of nodes $V$ with $k^n$ nodes and a set of hyperedges $E$ with $nk^{n-1}$ hyperedges, as shown in Fig. 2 [18]. Thus, we can rewrite the definition of the hypermesh as Definition 2.

Definition 2. The $k$-ary $n$-dimensional hypermesh $HM_{n,k}$ is recursively constructed as follows: $HM_{1,k}$ is a complete graph with $k$ nodes labeled $0, 1, 2, \ldots, k-1$, respectively. $HM_{n,k}$ is the Cartesian product of $HM_{n-1,k}$ and $HM_{1,k}$. A node $u$ can be represented by an element of $\{0, 1, \ldots, k-1\}^n$, and a hyperedge is a set of $k$ nodes that can be represented as a string of $n$ elements with $(n-1)$ fixed radix-$k$ digits and a symbol $\ast$: $e(x_{n-1}, \ldots, x_{i+1}, \ast, x_{i-1}, \ldots, x_0) = \bigcup_{l=0}^{k-1} \{(x_{n-1}, \ldots, x_{i+1}, l, x_{i-1}, \ldots, x_0)\}$ for $i \in \{0, 1, \ldots, n-1\}$.

Fig. 2. An example of $HM_{3,4}$.

By Definition 2, the $k$-ary $n$-dimensional hypermesh $HM_{n,k}$ is $n(k-1)$-regular and its connectivity is $\kappa(HM_{n,k}) = n(k-1)$ [18]. In particular, a $k$-ary $n$-dimensional hypermesh $HM_{n,k}$ is an $n$-dimensional hypercube for $k = 2$. Therefore, the hypercube is a special case of the hypermesh. It has been shown that the diagnosability of a hypercube under the pessimistic strategy is $2n-2$ and that $n$ must be greater than 3 [10]. In this paper, we thus focus on hypermesh systems with $k \ge 2$ and $n \ge 4$, so that our proposed algorithm also works for hypercubes.

The precise diagnostic strategy under the PMC model for $HM_{n,k}$ was proposed in [15] and [23], and a diagnostic algorithm for bijective connection graphs was presented in [14]. The pessimistic diagnostic strategy under the PMC model with respect to hypercube-like graphs was introduced in [19] and [24], and that for generalized hypercube graphs was presented in [4]. Furthermore, the authors of [2] proposed sufficient conditions on regular networks for a pessimistic diagnostic strategy under the PMC model. However, we cannot directly obtain the diagnosability of $HM_{n,k}$ under the pessimistic strategy from these results, because there are triangles and at most $k-2$ common neighbors for any two nodes in $HM_{n,k}$. Therefore, we investigate the pessimistic diagnosability of $HM_{n,k}$ under the PMC model. Furthermore, we develop an efficient pessimistic diagnosis algorithm to identify faulty nodes in $O(N)$ time, with $N = k^n$ nodes.

Under the PMC model, the testing processor $u$ conducts a test on the tested processor $v$ and provides a test outcome, $\sigma(u, v)$, defined as 0 if $u$ evaluates $v$ to be fault-free, and 1 otherwise. In this situation, $u$ is a tester, and $v$ is a testee.
The PMC model assumes that fault-free processors are able to accurately identify the status of other processors they test, while the test outcome of a faulty processor is arbitrary and unreliable, as shown in Table 1. The collection of test outcomes obtained after a test phase, denoted by $\sigma$, is called a syndrome. Given a syndrome $\sigma$, the procedure of determining the status of the processors is known as diagnosis.

TABLE 1 PMC Model Test Outcome

Let $\sigma(F)$ denote the set of all possible syndromes with which the faulty set $F$ can be consistent. Then two distinct faulty sets $F_1$ and $F_2$ of $G$ are said to be distinguishable if $\sigma(F_1) \cap \sigma(F_2) = \emptyset$; otherwise, $F_1$ and $F_2$ are said to be indistinguishable. That is, $(F_1, F_2)$ is a distinguishable pair (respectively, an indistinguishable pair) of faulty sets if $\sigma(F_1) \cap \sigma(F_2) = \emptyset$ (respectively, $\sigma(F_1) \cap \sigma(F_2) \ne \emptyset$). Let $F_1 \,\Delta\, F_2 = (F_1 - F_2) \cup (F_2 - F_1)$ be the symmetric difference. Dahbura and Masson [5] presented a necessary and sufficient characterization of $t$-diagnosable systems and exploited it to design a polynomial-time algorithm for identifying the set of faulty processors in a $t$-diagnosable system. Thus, $(F_1, F_2)$ is a distinguishable pair if and only if there exist a node $u \in V(G) - (F_1 \cup F_2)$ and a node $v \in F_1 \,\Delta\, F_2$ such that $(u, v) \in E$, as shown in Fig. 3.

Fig. 3. Illustration of a distinguishable pair $(F_1, F_2)$.
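The PMC test semantics and the distinguishability criterion stated above can be illustrated with a short sketch. It assumes an undirected graph stored as a dictionary `adj` of neighbor sets in which every node tests each of its neighbors; the helper names (`pmc_syndrome`, `consistent`, `distinguishable`) are hypothetical and not taken from the paper.

```python
import random


def pmc_syndrome(adj, faults, rng):
    """Generate one syndrome that the fault set `faults` can produce."""
    sigma = {}
    for u in adj:
        for v in adj[u]:
            if u in faults:
                sigma[(u, v)] = rng.randint(0, 1)  # faulty tester: arbitrary outcome
            else:
                sigma[(u, v)] = 1 if v in faults else 0  # fault-free tester: accurate
    return sigma


def consistent(adj, sigma, faults):
    """True if `sigma` belongs to sigma(F) for F = faults."""
    return all(
        sigma[(u, v)] == (1 if v in faults else 0)
        for u in adj if u not in faults
        for v in adj[u]
    )


def distinguishable(adj, f1, f2):
    """Criterion from the text: a node outside F1 ∪ F2 is adjacent to a node of F1 Δ F2."""
    union, delta = f1 | f2, f1 ^ f2
    return any(adj[v] - union for v in delta)


# Toy example: the path 0 - 1 - 2.
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
# This syndrome is consistent with both {0, 1} and {1, 2}, so the pair is indistinguishable.
shared = {(0, 1): 1, (1, 0): 0, (1, 2): 0, (2, 1): 1}
```

In the toy example no node lies outside $F_1 \cup F_2$, so the criterion fails, and `shared` is a concrete syndrome lying in both $\sigma(F_1)$ and $\sigma(F_2)$.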

Definition 3 [2]. Under the pessimistic diagnosis strategy, a system $G = (V, E)$ is $t/t$-diagnosable if for any two subsets $F_1$ and $F_2$ of $V(G)$ such that $|F_1| \le t$, $|F_2| \le t$, and $|F_1 \cup F_2| > t$, $(F_1, F_2)$ is a distinguishable pair.

3 PESSIMISTIC DIAGNOSABILITY OF A HYPERMESH

A symmetric graph $G$ is a graph that is both edge-transitive and node-transitive. In the following lemma, we show that $HM_{n,k}$ is a symmetric network.

Lemma 1. $HM_{n,k}$ is a symmetric network.

Proof. We first show $HM_{n,k}$ to be a node-transitive network. For any two distinct nodes $u = u_{n-1} u_{n-2} \cdots u_0$ and $v = v_{n-1} v_{n-2} \cdots v_0$ in $HM_{n,k}$, we define a function $f^i_{u,v}(l) = (l + v_i - u_i) \bmod k$, where $l \in \{0, 1, \ldots, k-1\}$, for $0 \le i \le n-1$. Then we set $f_{u,v}(w) = f^{n-1}_{u,v}(w_{n-1}) \ldots f^{0}_{u,v}(w_0)$, where $w = w_{n-1} w_{n-2} \ldots w_0$ is a node in $HM_{n,k}$. Obviously, $f_{u,v}$ is an automorphism and $f_{u,v}(u) = v$. Thus, $HM_{n,k}$ is a node-transitive network.

We then show $HM_{n,k}$ to be an edge-transitive network. Let $(u, v) = (u_{n-1} \ldots u_0, v_{n-1} \ldots v_0)$ and $(p, q) = (p_{n-1} \ldots p_0, q_{n-1} \ldots q_0)$ be two arbitrary distinct edges of $HM_{n,k}$. According to the definition of $HM_{n,k}$, there is an index $s$ such that $u_i = v_i$ for all $i \ne s$ and $u_s \ne v_s$, and there is an index $t$ such that $p_j = q_j$ for all $j \ne t$ and $p_t \ne q_t$. We can define the transitive function at $i$ as follows:

$$
g^i_{p,q}(x_{n-1} x_{n-2} \ldots x_0) =
\begin{cases}
x_i, & \text{if } i \in \{0, 1, \ldots, n-1\} - \{s, t\}; \\
x_s, & \text{if } i = t; \\
x_t, & \text{if } i = s;
\end{cases}
\qquad (1)
$$

where the edge $(p, q)$ with a different digit at $t$ is mapped to an edge $(u', v')$ with a different digit at $s$, and $x_{n-1} x_{n-2} \ldots x_0$ is a node in $HM_{n,k}$ for $0 \le i \le n-1$. Thus, the transitive function is defined as $g_{p,q}(w) = g^{n-1}_{p,q}(w) \ldots g^{0}_{p,q}(w)$ with $w = w_{n-1} \ldots w_0$. Then the edge-transitive function is defined by $g_{u,v,p,q}(x, y) = (f_{u',u}\, g_{p,q}(x), f_{v',v}\, g_{p,q}(y))$, where $(x, y)$ is an edge of $HM_{n,k}$. It is easy to verify that the function $g_{u,v,p,q}$ is an automorphism and $g_{u,v,p,q}(p, q) = (u, v)$. Hence, $HM_{n,k}$ is an edge-transitive network. Consequently, $HM_{n,k}$ is a symmetric network. ◽
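The node-transitive half of the proof of Lemma 1 can be checked by brute force on a small instance. The sketch below assumes nodes are stored as tuples of $n$ radix-$k$ digits and that two nodes are adjacent when they differ in exactly one digit. The names `hypermesh_adj`, `translate` and `is_automorphism` are illustrative only, and the sketch does not cover the edge-transitive map.

```python
from itertools import product


def hypermesh_adj(n, k):
    """Adjacency sets of HM_{n,k}: neighbors differ in exactly one digit."""
    adj = {}
    for u in product(range(k), repeat=n):
        adj[u] = {
            u[:i] + (d,) + u[i + 1:]
            for i in range(n)
            for d in range(k)
            if d != u[i]
        }
    return adj


def translate(u, v, k):
    """f_{u,v}: shift every digit w_i to (w_i + v_i - u_i) mod k."""
    shift = [(b - a) % k for a, b in zip(u, v)]
    return lambda w: tuple((x + s) % k for x, s in zip(w, shift))


def is_automorphism(adj, f):
    """True if f is a bijection on the nodes that preserves adjacency."""
    if {f(w) for w in adj} != set(adj):
        return False
    return all({f(x) for x in adj[w]} == adj[f(w)] for w in adj)


adj33 = hypermesh_adj(3, 3)
f = translate((0, 0, 0), (1, 1, 1), 3)
```

Here `f` is the map $f_{000,111}$ on $HM_{3,3}$; it sends `(0, 0, 0)` to `(1, 1, 1)` and passes the `is_automorphism` check.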

To show that there is a Hamiltonian path in $HM_{n,k}$, we can use the symmetric property to construct a Hamiltonian path starting from $(0, \ldots, 0, \ldots, 0)$ (abbreviated as $00 \ldots 0$). Thus, we can find a Hamiltonian path from any node $u$, as shown in the following lemma. Let $P(u, v) = \langle u = z_0, z_1, \ldots, z_{m-1}, z_m = v \rangle$ be a path in $G$. The path $P(u, v)$ can also be written as $\langle u = z_0, z_1, P(z_1, v) \rangle$ or $\langle u = z_0, z_1, z_2 \rangle \cup P(z_2, v)$, where $\cup$ is the concatenation of two paths.

Lemma 2. A Hamiltonian path can always be constructed in $HM_{n,k}$, starting from any node.

Proof. Let $u$ be a node in $HM_{n,k}$. Since $HM_{n,k}$ is a symmetric network, there is an automorphism such that the node $u$ can be mapped to $00 \ldots 0$. Thus, one Hamiltonian path can be constructed from $00 \ldots 0$ instead of $u$. Let $PH(1) = \langle 0, 1, \cdots, k-1 \rangle$ be a path with one digit. Then one path with two digits is $PH(2) = 0PH(1) \cup 1PH^r(1) \cup 2PH(1) \cup \cdots \cup (k-1)PH(1)$ if $k$ is odd, and $PH(2) = 0PH(1) \cup 1PH^r(1) \cup 2PH(1) \cup \cdots \cup (k-1)PH^r(1)$ if $k$ is even, where $iPH(1)$ is the path obtained by appending one digit with value $i$ to the front of all nodes of $PH(1)$ for $0 \le i \le k-1$, and the path $PH^r(1)$ denotes the reverse node sequence of $PH(1)$. Thus, we can recursively construct one path with $n$ digits as $PH(n) = 0PH(n-1) \cup 1PH^r(n-1) \cup 2PH(n-1) \cup \cdots \cup (k-1)PH(n-1)$ if $k$ is odd, and $PH(n) = 0PH(n-1) \cup 1PH^r(n-1) \cup 2PH(n-1) \cup \cdots \cup (k-1)PH^r(n-1)$ if $k$ is even. It is easy to check that $PH(1)$ and $PH(2)$ are Hamiltonian paths in $HM_{1,k}$ and $HM_{2,k}$, respectively. Therefore, $PH(n)$ is one Hamiltonian path in $HM_{n,k}$ when $iPH(n-1)$ is one Hamiltonian path in $HM_{n-1,k}$ for all $0 \le i \le k-1$. ◽

For example, we can construct a Hamiltonian path from 000 as $PH = \langle 000, 001, 002, 012, 011, 010, 020, 021, 022, 122, 121, 120, 110, 111, 112, 102, 101, 100, 200, 201, 202, 212, 211, 210, 220, 221, 222 \rangle$ in $HM_{3,3}$. Moreover, there is a Hamiltonian path starting from any node in $HM_{3,3}$, such as 111. Since $HM_{3,3}$ is a symmetric network, there exists an automorphism $f_{000,111}$ mapping node 000 to 111 as $f_{000,111}(u_2 u_1 u_0) = (v_2 v_1 v_0)$, where $v_i = (u_i + 1) \bmod 3$ for $0 \le i \le 2$. Therefore, this Hamiltonian path is $PH' = f_{000,111}(PH)$, obtained by applying $f_{000,111}$ to every node of $PH$. One Hamiltonian path in $HM_{n,k}$ is used as a test chain to determine the nodes to be fault-free or faulty.

We next determine the pessimistic diagnosability of $HM_{n,k}$ under the PMC model. Lemmas 3 and 4 will be used to derive the diagnosability of $HM_{n,k}$.

Lemma 3 [18]. $HM_{n,k}$ is $n(k-1)$-regular and $\kappa(HM_{n,k}) = n(k-1)$.

Lemma 4. Let $(u, v)$ be a pair of adjacent nodes in $HM_{n,k}$, where $k \ge 2$ and $n \ge 4$. Then $|N(\{u, v\})| = 2n(k-1) - k$.

Proof. Since $u$ and $v$ are adjacent nodes, by definition, $|N(u) \cap N(v)| = k-2$. There are a total of $|N(\{u, v\})| = |N(u)| + |N(v)| - |(N(u) \cap N(v)) \cup \{u, v\}| = n(k-1) + n(k-1) - ((k-2) + 2) = 2n(k-1) - k$ nodes in $N(\{u, v\})$. ◽

Let $u$ and $v$ be two adjacent nodes in $HM_{n,k}$ for $k \ge 2$ and $n \ge 4$. We then let $N(\{u, v\}) = S$, $F_1 = S \cup \{u\}$, and $F_2 = S \cup \{v\}$. Thus, we have $|F_1| = |F_2| = 2n(k-1) - k + 1$. Obviously, $(F_1, F_2)$ is an indistinguishable pair because there do not exist a node $x \in V(HM_{n,k}) - (F_1 \cup F_2)$ and a node $y \in F_1 \,\Delta\, F_2$ such that $(x, y) \in E(HM_{n,k})$. Therefore, the pessimistic diagnosability of $HM_{n,k}$ under the PMC model is less than or equal to $2n(k-1) - k$. We show that the pessimistic diagnosability of $HM_{n,k}$ is $2n(k-1) - k$ in Theorem 1.

Theorem 1. Given a system $G = (V, E)$ modeled by $HM_{n,k}$, for $k \ge 2$ and $n \ge 4$, the pessimistic diagnosability of $G$ is $2n(k-1) - k$.
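The counting in Lemma 4 and the indistinguishable pair used for the upper bound can be verified exhaustively on small hypermeshes. This is a minimal sketch under the assumption that nodes are digit tuples and adjacency means differing in exactly one digit; the helper names are hypothetical.

```python
from itertools import product


def hypermesh_adj(n, k):
    """Adjacency sets of HM_{n,k}: neighbors differ in exactly one digit."""
    adj = {}
    for u in product(range(k), repeat=n):
        adj[u] = {
            u[:i] + (d,) + u[i + 1:]
            for i in range(n)
            for d in range(k)
            if d != u[i]
        }
    return adj


def pair_nbhd(adj, u, v):
    """N({u, v}) = (N(u) ∪ N(v)) - {u, v}."""
    return (adj[u] | adj[v]) - {u, v}


def check_lemma4(n, k):
    """Assert regularity, common-neighbor count and |N({u,v})| for every edge."""
    adj = hypermesh_adj(n, k)
    t = 2 * n * (k - 1) - k
    for u in adj:
        assert len(adj[u]) == n * (k - 1)          # Lemma 3: n(k-1)-regular
        for v in adj[u]:
            assert len(adj[u] & adj[v]) == k - 2   # common neighbors of an edge
            assert len(pair_nbhd(adj, u, v)) == t  # Lemma 4
    return t


def upper_bound_witness(n, k):
    """Build F1 = S ∪ {u}, F2 = S ∪ {v} and test the distinguishability criterion."""
    adj = hypermesh_adj(n, k)
    u = (0,) * n
    v = (1,) + (0,) * (n - 1)
    s = pair_nbhd(adj, u, v)
    f1, f2 = s | {u}, s | {v}
    outside = set(adj) - (f1 | f2)
    indistinguishable = all(not (adj[y] & outside) for y in f1 ^ f2)
    return indistinguishable, len(f1)
```

For $HM_{4,3}$, `check_lemma4(4, 3)` returns 13 and `upper_bound_witness(4, 3)` returns `(True, 14)`: two fault sets of size $t + 1$ that no fault-free tester can separate.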

Proof. Let $F_1$ and $F_2$ be two distinct node sets with $|F_1| \le 2n(k-1) - k$ and $|F_2| \le 2n(k-1) - k$. Then let $S = F_1 \cap F_2$ with $|S| = p$ and $0 \le p \le 2n(k-1) - k - 1$. We prove this theorem by showing $F_1$ and $F_2$ to be distinguishable, i.e., that there exist a node $u \in V(G) - (F_1 \cup F_2)$ and a node $v \in F_1 \,\Delta\, F_2$ such that $(u, v) \in E(G)$. If $G - S$ is a connected graph, $(F_1, F_2)$ is a distinguishable pair. Hence, we consider the case in which $G - S$ is a disconnected graph with $n(k-1) \le p \le 2n(k-1) - k - 1$. Since there are at most $(k-2)$ common neighbors for an edge $(x, y)$, we have $|N(x)| + |N(y)| - (k-2) = 2n(k-1) - (k-2) > 2n(k-1) - k - 1$. If $G - S$ has at least two trivial components, we obtain $p > 2n(k-1) - k - 1$. Thus, it is impossible that $G - S$ has two trivial components. If $N(x) \subseteq S$, we have at least one trivial component. Now, we show that $G - S$ contains one trivial component and the other one is the nontrivial component $V_C = G - (S \cup \{x\})$. Let $HM^{i}_{n-1,k}$ be a $k$-ary $(n-1)$-dimensional subhypermesh with the $n$-th digit value $i$ for $0 \le i \le k-1$. Since $\kappa(HM_{n-1,k}) = (n-1)(k-1) < p$, there may be a disconnected subhypermesh. Without loss of generality, assume $HM^{0}_{n-1,k}$ is a disconnected subhypermesh. Therefore, there are at least $(n-1)(k-1)$ faulty nodes in $HM^{0}_{n-1,k}$. Let $x$ be a node in $HM^{0}_{n-1,k}$. If $x$ is a trivial component in $G - S$, the neighbor $y \in HM^{i}_{n-1,k}$ of $x$ must be faulty for all $1 \le i \le k-1$. Thus, there are at most $2n(k-1) - k - 1 - n(k-1) = n(k-1) - k - 1$ faulty nodes in $S - N(x)$. Since each $HM^{i}_{n-1,k}$ is an $(n-1)(k-1)$-connected graph with $(n-1)(k-1) > n(k-1) - k - 1$, each $HM^{i}_{n-1,k}$ is a connected graph for all $1 \le i \le k-1$. Let $w$ be any fault-free node in $HM^{0}_{n-1,k}$. If the neighbor $s$ of $w$ in $HM^{i}_{n-1,k}$ is fault-free for some $1 \le i \le k-1$, each fault-free node $w$ is connected by a path to the connected subhypermesh $HM^{i}_{n-1,k}$ via $s$ such that the nontrivial component

is $V_C = G - (S \cup \{x\})$. Next, suppose that the neighbor $s$ of $w$ in $HM^{i}_{n-1,k}$ is faulty for all $1 \le i \le k-1$. There are at most $n(k-1) - k - 1 - (k-1)$ nodes in $F' = S - (N(x) \cup N(HM^{i}_{n-1,k}, w))$, where $N(HM^{i}_{n-1,k}, w)$ denotes the neighbor of $w$ in $HM^{i}_{n-1,k}$ for all $1 \le i \le k-1$ and $F'$ is the remaining faulty node set. There must exist one fault-free neighbor $z$ of $w$ in $HM^{0}_{n-1,k}$. There are at most three neighbors of $z$ and $w$ in $N(HM^{0}_{n-1,k}, x)$. Let $T = ((N(w) \cap HM^{0}_{n-1,k}) \cup (N(z) \cap HM^{0}_{n-1,k})) - \{w, z\} - N(x)$ be the node set in $HM^{0}_{n-1,k}$. By Lemma 4, there are $|N(\{z, w\})| = 2(n-1)(k-1) - k$ neighbors in $HM^{0}_{n-1,k}$ with at least $2(n-1)(k-1) - k - 3$ neighbors not in $N(x)$, i.e., $|T| \ge 2(n-1)(k-1) - k - 3$. We choose the node $q_j$ in $HM^{i}_{n-1,k}$ for fixed $1 \le i \le k-1$ which is the neighbor of $p_j$ for $p_j \in T$ and $1 \le j \le 2(n-1)(k-1) - k - 3$. Thus, we have at least $2(n-1)(k-1) - k - 3$ pairs $(p_j, q_j)$, which are edges. Since $|T| \ge 2(n-1)(k-1) - k - 3 > n(k-1) - k - 1 - (k-1) \ge |F'|$ for $k \ge 2$ and $n \ge 4$, there exists an edge $(p_j, q_j)$ with fault-free $p_j \notin F'$ and $q_j \in HM^{i}_{n-1,k}$. Similarly, since each $HM^{i}_{n-1,k}$ is a connected subhypermesh for all $1 \le i \le k-1$, each fault-free node $w$ is connected by a path to the connected subhypermesh $HM^{i}_{n-1,k}$ via $p_j$ and $q_j$ such that the nontrivial component is $V_C = G - (S \cup \{x\})$. Hence, $(F_1, F_2)$ is a distinguishable pair.

Suppose that $N(x) \not\subseteq S$ for each $x \in F_1 \,\Delta\, F_2$. Then $G - S$ is connected by a path from $x$ to the connected subhypermesh $HM^{i}_{n-1,k}$ via $y$, which is the neighbor of $x$ with $y \in HM^{i}_{n-1,k}$ for $1 \le i \le k-1$. Since there is one nontrivial component in $G - S$, $(F_1, F_2)$ is a distinguishable pair. Thus, the proof is complete. ◽

4 A PESSIMISTIC DIAGNOSTIC ALGORITHM FOR A HYPERMESH

The pessimistic diagnosability of $HM_{n,k}$ has been derived to be $2n(k-1) - k$. We now consider the design of a pessimistic diagnostic algorithm. Given a syndrome $\sigma$ in a graph $G = (V, E)$, a path in $G$ is $\sigma$-zero (respectively, $\sigma$-one) if $\sigma(u, v) = 0$ (respectively, $\sigma(u, v) = 1$) for any two consecutive nodes. The following two lemmas, proven in [19], are used to determine the status of nodes in a graph.

Lemma 5 [19]. Let $G$ be a graph with at most $t$ faulty nodes, and let $\sigma$ be a syndrome of $G$ with the path $P = \langle v_1, v_2, \ldots, v_l \rangle$ being $\sigma$-zero. If $V(G) - V(P)$ contains $s$ faulty nodes, then $v_i$ is fault-free for all $t - s + 1 \le i \le l$. Moreover, $v_i$ is a possible fault for each $1 \le i \le t - s$.

Notably, from Table 1, at least one of the nodes $u$ and $v$ is faulty if $\sigma(u, v) = 1$. Thus, Lemma 6 follows.

Lemma 6 [19]. Let $G$ be a graph with at most $t$ faulty nodes, and let $\sigma$ be a syndrome of $G$ with the path $P = \langle v_1, v_2, \ldots, v_l \rangle$ being $\sigma$-one. Then, there are at least $\lfloor l/2 \rfloor$ faulty nodes in $P$.

Consider a syndrome $\sigma$ of a graph $G$ that has a Hamiltonian path $H$. The path $H$ can be decomposed into several $\sigma$-zero subpaths, namely $H^0_1, H^0_2, \ldots, H^0_{m_1}$, and $\sigma$-one subpaths, namely $H^1_1, H^1_2, \ldots, H^1_{m_2}$, where $|m_1 - m_2| \le 1$. After performing the following Path-Evaluation algorithm, all the undetermined nodes are located in a set $W$ and each node in $V(G) - W$ is fault-free. By Lemma 5 and Lemma 6, the Path-Evaluation algorithm determines as many nodes in a Hamiltonian path as possible to be fault-free with respect to a given syndrome $\sigma$, because all the nodes in $V(G) - W$ are fault-free. Moreover, nodes in $W$ cannot be determined to be faulty or fault-free by the Path-Evaluation algorithm.

Algorithm 1: Path-Evaluation PE($HM_{n,k}$, $\sigma$)
Input: A syndrome $\sigma$ of $HM_{n,k}$.
Output: A set $W$ containing all indeterminate nodes.
Initialization: Set $W$ to the empty set.
1 Construct a Hamiltonian path $H$ in $HM_{n,k}$ starting at any node $u$. Let $H^1_1, H^1_2, \ldots, H^1_{m_2}$ be all the $\sigma$-one paths in $H$ with respect to $\sigma$. Set $W$ as $\bigcup_{i=1}^{m_2} V(H^1_i)$ and $s = \sum_{i=1}^{m_2} \lfloor |V(H^1_i)|/2 \rfloor$;
2 for $i = 1$ to $m_1$ do
3   By Lemma 5 and given $s$, calculate each $\sigma$-zero path to find all possible faulty nodes in $H^0_i$ as $F$;
4   Set $W$ as $W \cup F$;
  end for
5 return $W$
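The idea behind Path-Evaluation can be sketched in a few lines. The following is an illustration rather than the authors' implementation: `hamiltonian_path` follows the $PH(n)$ recursion of Lemma 2, `make_sigma` is a hypothetical PMC test oracle, and `path_evaluation` uses a conservative variant of the bound $s$. The text does not spell out how nodes shared between adjacent $\sigma$-zero and $\sigma$-one subpaths are counted, so the sketch ignores any counted $\sigma$-one edge that touches the $\sigma$-zero run being examined.

```python
import random


def hamiltonian_path(n, k):
    """PH(n): prefix digit i to PH(n-1), reversing the block when i is odd."""
    path = [(d,) for d in range(k)]
    for _ in range(n - 1):
        path = [
            (i,) + node
            for i in range(k)
            for node in (path if i % 2 == 0 else path[::-1])
        ]
    return path


def make_sigma(faults, rng):
    """PMC oracle: fault-free testers are accurate, faulty testers answer arbitrarily."""
    def sigma(u, v):
        if u in faults:
            return rng.randint(0, 1)
        return 1 if v in faults else 0
    return sigma


def path_evaluation(path, sigma, t):
    """Return W, the nodes of `path` that cannot yet be declared fault-free."""
    m = len(path)
    labels = [sigma(path[i], path[i + 1]) for i in range(m - 1)]
    # Greedily pick node-disjoint 1-edges; each holds at least one fault (Lemma 6).
    chosen, i = set(), 0
    while i < m - 1:
        if labels[i] == 1:
            chosen.add(i)
            i += 2
        else:
            i += 1
    # Every node on a sigma-one edge stays undetermined.
    w = set()
    for i, lab in enumerate(labels):
        if lab == 1:
            w.update((path[i], path[i + 1]))
    # Each maximal sigma-zero run covers nodes a..b; apply Lemma 5 to it.
    a = 0
    while a < m - 1:
        if labels[a] == 1:
            a += 1
            continue
        b = a
        while b < m - 1 and labels[b] == 0:
            b += 1
        # Conservative s: skip chosen 1-edges that touch the run (edge a-1 or edge b).
        s = len(chosen) - ((a - 1) in chosen) - (b in chosen)
        w.update(path[a:a + max(0, t - s)][: b - a + 1])
        a = b
    return w
```

With at most $t$ faults, every node outside the returned set is fault-free, because the faulty nodes of a $\sigma$-zero run form a prefix of that run. The path is built in time linear in the number of nodes and each edge is inspected a constant number of times, which is in the spirit of the $O(N)$ claim.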

Statement 1 takes $O(N)$ time, where $N = k^n$. Since there are at most $2n(k-1) - k$ faulty nodes, statement 2 is performed $O(nk)$ times. The number of possible faulty nodes of each $\sigma$-zero path $H^0_i$ is obtained in $O(1)$ time by Lemma 5 for a given value of $s$. Inserting a possible faulty node in $W$ requires $O(1)$ time, and the number of insertions per path is bounded by the total number of faults $2n(k-1) - k$. Therefore, the set $W$ can be determined in $O((nk)^2)$ time, and the time complexity of the Path-Evaluation algorithm is $O(N)$. Thus, we have the following lemma.

Lemma 7. The number of nodes in the set $W$ obtained by Path-Evaluation is $O((nk)^2)$ provided that the number of faulty nodes does not exceed $2n(k-1) - k$.

Proof. Let $H$ be a Hamiltonian path, and let $H^0_1, H^0_2, \ldots, H^0_{m_1}$ be the $\sigma$-zero paths and $H^1_1, H^1_2, \ldots, H^1_{m_2}$ be the $\sigma$-one paths. It is observed that $s = \sum_{i=1}^{m_2} \lfloor |V(H^1_i)|/2 \rfloor \ge m_2$ and $m_2 - 1 \le m_1 \le m_2 + 1$. By Lemma 5, each $\sigma$-zero path $H^0_i$ has at most $2n(k-1) - k - m_2$ undetermined nodes. After performing Path-Evaluation, the set $W$ has at most $(2n(k-1) - k - m_2)(m_2 + 1) + \sum_{i=1}^{m_2} |V(H^1_i)|$ nodes. Since there are at most $2n(k-1) - k$ faulty nodes, $\sum_{i=1}^{m_2} |V(H^1_i)| \le 4n(k-1) - 2k$. The number of undetermined nodes in $W$ is bounded as follows:

$$
\begin{aligned}
|W| &\le (2n(k-1) - k - m_2)(m_2 + 1) + \sum_{i=1}^{m_2} |V(H^1_i)| \\
&\le (2n(k-1) - k - m_2)(m_2 + 1) + 4n(k-1) - 2k \\
&\le -\left(m_2 - \frac{2n(k-1) - k}{2}\right)^2 + \frac{(2n(k-1) - k)^2}{4} + 3(2n(k-1) - k) \\
&\le \frac{(2n(k-1) - k)^2}{4} + 3(2n(k-1) - k).
\end{aligned}
$$

Therefore, $|W|$ is $O((nk)^2)$. The lemma follows. ◽
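The closing bound of Lemma 7 can be sanity-checked numerically. The sketch below writes $t = 2n(k-1) - k$, maximizes the expression $(t - m_2)(m_2 + 1) + 2t$ over all integer values $0 \le m_2 \le t$, and compares the result with $t^2/4 + 3t$. The function names are illustrative.

```python
def w_bound(t, m2):
    """(t - m2)(m2 + 1) + 2t, the bound on |W| before maximizing over m2."""
    return (t - m2) * (m2 + 1) + 2 * t


def closed_form(t):
    """t^2 / 4 + 3t, the final bound in the proof of Lemma 7."""
    return t * t / 4 + 3 * t


def check(n, k):
    """Return (t, worst case over m2, closed-form bound) for HM_{n,k}."""
    t = 2 * n * (k - 1) - k
    worst = max(w_bound(t, m2) for m2 in range(t + 1))
    return t, worst, closed_form(t)
```

For $HM_{4,3}$ this gives $t = 13$, a worst case of 75 (attained at $m_2 = 6$), and a closed-form bound of 81.25, so the quadratic bound holds with some slack.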



We next introduce a structure, called a dual tree, to decide whether at least one node in this tree is fault-free or faulty.

Definition 4. Let $G$ be an $l$-regular graph with $t \ge l$. A dual tree of order $t$ rooted at nodes $u$ and $v$ is defined to be a subgraph of $G$, denoted by $DT_G(u, v, t)$, where $V(DT_G(u, v, t)) = \{u, v\} \cup \{u_i, x_i \mid 1 \le i \le t - l + 1\} \cup \{v_j, y_j \mid 1 \le j \le l - 1\}$ and $E(DT_G(u, v, t)) = \{(u, v)\} \cup \{(u, u_i), (u_i, x_i) \mid 1 \le i \le t - l + 1\} \cup \{(v, v_j), (v_j, y_j) \mid 1 \le j \le l - 1\}$ (as illustrated in Fig. 4).

Fig. 4. A dual tree $DT_G(u, v, t)$, rooted at nodes $u$ and $v$, consists of $2t + 2$ nodes and $2t + 1$ edges.

Using the dual tree structure, we can design a decision scheme to determine the status of one of the nodes $u$ and $v$. Let $n_{a,b}(u) = |\{i \mid (\sigma(x_i, u_i), \sigma(u_i, u)) = (a, b) \text{ and } 1 \le i \le t - l + 1\}|$ and $n_{a,b}(v) = |\{j \mid (\sigma(y_j, v_j), \sigma(v_j, v)) = (a, b) \text{ and } 1 \le j \le l - 1\}|$, where $\sigma$ is a syndrome of $G$ and $a, b \in \{0, 1\}$. The proof of Theorem 2 is omitted because it is similar to the proof of Theorem 1 in [19].

Theorem 2. Let $G$ be an $l$-regular and $t/t$-diagnosable graph, and let $DT_G(u, v, t)$ be a dual tree of order $t$ rooted at nodes $u$ and $v$. Then one of the following conditions is satisfied.
1. If $\sigma(u, v) = 0$ and $n_{0,0}(u) + n_{0,0}(v) < n_{0,1}(u) + n_{0,1}(v)$, the node $u$ is faulty;
2. If $\sigma(u, v) = 0$ and $n_{0,0}(u) + n_{0,0}(v) \ge n_{0,1}(u) + n_{0,1}(v)$, the node $v$ is fault-free;
3. If $\sigma(u, v) = 1$ and $n_{0,0}(u) + n_{0,1}(v) < n_{0,1}(u) + n_{0,0}(v)$, the node $u$ is faulty;
4. If $\sigma(u, v) = 1$ and $n_{0,0}(u) + n_{0,1}(v) \ge n_{0,1}(u) + n_{0,0}(v)$, the node $v$ is faulty.

Since the status of one of the nodes $u$ and $v$ can be identified in $DT_G(u, v, t)$, we use this structure to determine the status of all of the nodes in $W$. By Theorem 1, the $n$-dimensional hypermesh network is $t/t$-diagnosable with $t = 2n(k-1) - k$. Thus, we need to construct a dual tree $DT_{HM_{n,k}}(u, v, 2n(k-1) - k)$ in $HM_{n,k}$. In the following lemma, we show that there is a dual tree $DT_{HM_{n,k}}(u, v, 2n(k-1) - k)$ in $HM_{n,k}$ between any two distinct nodes.

Lemma 8. There is a dual tree $DT_{HM_{n,k}}(u, v, 2n(k-1) - k)$ rooted at nodes $u$ and $v$ in $HM_{n,k}$ for $n \ge 4$ and $k \ge 2$.

Proof. Let $HM^{i}_{n-1,k}$ be a $k$-ary $(n-1)$-dimensional subhypermesh with the $n$-th digit being $i$ for $0 \le i \le k-1$. Let $(u, v)$ be an edge of $HM_{n,k}$ where $u$ is a node in $HM^{i}_{n-1,k}$ and $v$ is a node in $HM^{j}_{n-1,k}$ for $0 \le i \ne j \le k-1$. Since $HM_{n,k}$ is a symmetric network, we can map the edge $(u, v)$ to one edge with a different digit at $n-1$. By Lemma 3, the connectivity of $HM_{n,k}$ is $n(k-1)$. Thus, there are $n(k-1) - 1$ disjoint paths from $u$ to $w$ which do not pass through the nodes in $HM^{j}_{n-1,k}$, where $w$ is a node in $HM^{i}_{n-1,k}$ and the distance between $u$ and $w$ is three. On the other hand, the connectivity of $HM^{j}_{n-1,k}$ is … Similarly, there are $n(k-1) - (k-1)$ disjoint paths from $v$ to $x$ which do not pass through the nodes in $HM^{i}_{n-1,k}$, where $x$ is a node in $HM^{j}_{n-1,k}$ and the distance between $v$ and $x$ is three. Thus, the dual tree $DT_{HM_{n,k}}(u, v, 2n(k-1) - k)$ rooted at nodes $u$ and $v$ has been constructed with the order $2n(k-1) - k$. Hence, the proof is complete. ◽

The sequence for generating the dual tree is as follows. Let $A$ be the $k-2$ neighbors of $u$ in $HM^{l}_{n-1,k}$ with a different digit at $n-1$, with $l \ne j$. We then choose the neighbors of $A$ with a different digit at 0 for each node in $A$. The edge sequence is denoted by $\langle n-1, 0 \rangle$. Thus, the other $n(k-1) - (k-1)$ edge sequences can be represented by $\langle n-2, 0 \rangle, \ldots, \langle 2, 0 \rangle, \langle 1, 2 \rangle, \langle 0, 1 \rangle$. Therefore, the subtree of order $n(k-1) - 1$ rooted at node $u$ is generated. Similarly, the subtree of order $n(k-1) - (k-1)$ rooted at node $v$ is generated as the sequence $\langle n-2, 0 \rangle, \ldots, \langle 2, 0 \rangle, \langle 1, 2 \rangle, \langle 0, 1 \rangle$. For example, the dual tree $DT_{HM_{4,3}}(u, v, 13)$ for $u = 0000 \in HM^{0}_{3,3}$ and $v = 1000 \in HM^{1}_{3,3}$ is as follows. The edge sequence is listed as $\langle 2000, 2001 \rangle$, $\langle 0100, 0101 \rangle$, $\langle 0200, 0201 \rangle$, $\langle 0010, 0110 \rangle$, $\langle 0020, 0120 \rangle$, $\langle 0001, 0011 \rangle$, $\langle 0002, 0012 \rangle$, $\langle 1100, 1101 \rangle$, $\langle 1200, 1201 \rangle$, $\langle 1010, 1110 \rangle$, $\langle 1020, 1120 \rangle$, $\langle 1001, 1011 \rangle$, $\langle 1002, 1012 \rangle$.
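The four cases of Theorem 2 translate into a small decision function. The sketch below is only a transcription of the stated conditions, not a proof of them. It assumes the syndrome is a dictionary keyed by (tester, testee) pairs and that each branch of the dual tree is given as a `(middle, leaf)` pair, for example `(u_i, x_i)`; these representations and the function names are assumptions made for illustration.

```python
def count_patterns(sigma, root, branches):
    """n_{a,b}(root): branches (mid, leaf) with (sigma(leaf, mid), sigma(mid, root)) == (a, b)."""
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for mid, leaf in branches:
        counts[(sigma[(leaf, mid)], sigma[(mid, root)])] += 1
    return counts


def decide(sigma, u, v, u_branches, v_branches):
    """Apply the four conditions of Theorem 2; return (node, status)."""
    nu = count_patterns(sigma, u, u_branches)
    nv = count_patterns(sigma, v, v_branches)
    if sigma[(u, v)] == 0:
        if nu[0, 0] + nv[0, 0] < nu[0, 1] + nv[0, 1]:
            return (u, "faulty")       # condition 1
        return (v, "fault-free")       # condition 2
    if nu[0, 0] + nv[0, 1] < nu[0, 1] + nv[0, 0]:
        return (u, "faulty")           # condition 3
    return (v, "faulty")               # condition 4


# Toy dual tree: two branches at u, one branch at v.
u_br = [("u1", "x1"), ("u2", "x2")]
v_br = [("v1", "y1")]
toy = {
    ("u", "v"): 0,
    ("x1", "u1"): 0, ("u1", "u"): 1,
    ("x2", "u2"): 0, ("u2", "u"): 1,
    ("y1", "v1"): 0, ("v1", "v"): 0,
}
```

In the toy syndrome both branches at `u` show the pattern (0, 1) and the branch at `v` shows (0, 0), so condition 1 applies and `decide(toy, "u", "v", u_br, v_br)` returns `("u", "faulty")`.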

Algorithm 2: Pessimistic Diagnosis PD($HM_{n,k}$, $\sigma$)

Input: A syndrome $\sigma$ of $HM_{n,k}$.
Output: A set $F$ with all faulty nodes.

1   Initialization: set $F$ to the empty set.
2   $W = PE(HM_{n,k}, \sigma)$;
3   for each node $u$ in $W$ do
4       Choose a node $v$ which is a neighbor of $u$, and map the edge $(u, v)$ to one edge with a different digit at $n-1$;
5       Construct a dual tree of order $2n(k-1) - k$ rooted at $u$ and $v$;
6       The subtree of order $n(k-1) - 1$ rooted at node $u$ is a sequence $\langle n-1, 0 \rangle, \langle n-2, 0 \rangle, \ldots, \langle 3, 0 \rangle, \langle 1, 2 \rangle, \langle 0, 1 \rangle$ and the subtree of order $n(k-1) - (k-1)$ rooted at node $v$ is a sequence $\langle n-2, 0 \rangle, \ldots, \langle 3, 0 \rangle, \langle 1, 2 \rangle, \langle 0, 1 \rangle$;
7       Determine the status of either $u$ or $v$ by Theorem 2.
8       if ($u$ is faulty) then
            insert the faulty node $u$ into $F$ as $F \cup \{u\}$; delete the node $u$ from $W$ as $W - \{u\}$; call Fault-Identify FI($W$, $F$, $u$);
9       end if
10      if ($v$ is faulty) then
            insert the faulty node $v$ into $F$ as $F \cup \{v\}$; delete the node $v$ from $W$ as $W - \{v\}$; call Fault-Identify FI($W$, $F$, $v$);
11      else determine the node $v$ as fault-free and delete $v$ from $W$ as $W - \{v\}$;
12      for each node $w$ in $W$ do
13          if $(w, v) \in E(HM_{n,k})$ then
14              if ($\sigma(v, w) = 0$) then
15                  insert the faulty node $w$ into $F$ as $F \cup \{w\}$; delete the node $w$ from $W$ as $W - \{w\}$; call Fault-Identify FI($W$, $F$, $w$);
                else
                    determine the node $w$ as fault-free and delete $w$ from $W$ as $W - \{w\}$;
16              end if
17          end if
18      end for
19      if ($|F| = 2n(k-1) - k$) then return $F$;
20      end if
21  end for
22  set $A$ as $V(HM_{n,k}) - (W \cup F)$, which is the fault-free set;
23  while (there exists an edge $(w, z)$ between $A$ and $W$ with $w \in W$, $z \in A$) do
24      if ($\sigma(z, w) = 0$) then set the node $w$ as fault-free, $A = A \cup \{w\}$; delete $w$ from $W$ as $W - \{w\}$;
25      else set the node $w$ as faulty, $F \cup \{w\}$; delete $w$ from $W$ as $W - \{w\}$;
26      end if
27      if ($|F| = 2n(k-1) - k$) then return $F$;
28      end if
29  end while
30  return $F \cup W$;

Algorithm 3: Fault-Identify FI($W$, $F$, $u$)

Input: An indeterminate node set $W$, a faulty set $F$ and a faulty node $u$.
Output: Two sets $W$ and $F$.

    for each node $v$ in $W$ do
        if $(u, v) \in E(HM_{n,k})$ then
            if ($\sigma(v, u) = 0$) then
                the node $v$ is determined as a faulty node; set $F$ as $F \cup \{v\}$; set $W$ as $W - \{v\}$;
            end if
        end if
    end for
    return $W$ and $F$.

Before showing the correctness of the algorithm, we need the following lemma, which is used in Theorem 3. We begin with the following observations. Let $S$ be a set of nodes of $HM_{n,k}$ with $|S| = p$. $HM_{n,k} - S$ is a connected graph if $p \le n(k-1) - 1$. Assume that $HM_{n,k} - S$ is disconnected. By Lemma 4 and Theorem 1, $HM_{n,k} - S$ contains one trivial component $v \in V(HM_{n,k}) - S$ and the other one is the nontrivial component $V_C = HM_{n,k} - (S \cup \{v\})$ if $N(v) \subseteq S$. Furthermore, $HM_{n,k} - S$ either contains two trivial components and the other is the nontrivial component $V_C = HM_{n,k} - (S \cup \{u, v\})$ for $u \in V(HM_{n,k}) - S$, $v \in V(HM_{n,k}) - S$ and $N(\{u, v\}) \subseteq S$, or it contains two nontrivial components, one edge being $(u, v)$ and the other $V_C = HM_{n,k} - (S \cup \{u, v\})$ for $N(\{u, v\}) \subseteq S$ if $2n(k-1) - k \le p \le 3n(k-1) - 2k - 1$.
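The rule applied by Fault-Identify to the neighbors of a known faulty node can be sketched as follows. This is only an illustration under stated assumptions: nodes are labelled by digit strings, two nodes are taken to be adjacent when their labels differ in exactly one digit, and the syndrome is stored in a dictionary keyed by (tester, tested) pairs. The names `adjacent`, `fault_identify` and `sigma` are hypothetical.

```python
def adjacent(x, y):
    """Assumed hypermesh adjacency: labels differ in exactly one digit."""
    return len(x) == len(y) and sum(a != b for a, b in zip(x, y)) == 1


def fault_identify(W, F, u, sigma):
    """One pass over W for a known faulty node u.

    Under the PMC model a fault-free tester always reports a faulty node
    as faulty, so a neighbor v with sigma[(v, u)] == 0 must itself be faulty.
    Returns updated copies of W and F.
    """
    W, F = set(W), set(F)
    for v in sorted(W):
        if adjacent(u, v) and sigma.get((v, u)) == 0:
            F.add(v)
            W.discard(v)
    return W, F


# Hypothetical syndrome fragment around the faulty node 1022.
sigma = {("1222", "1022"): 0, ("1012", "1022"): 1}
W, F = fault_identify({"1222", "1012", "0000"}, {"1022"}, "1022", sigma)
print(sorted(W), sorted(F))
```

Node 1222 moves into the faulty set because it reports the faulty node 1022 as fault-free; node 1012 stays indeterminate because a test outcome of 1 on a faulty node is consistent with either status of the tester.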

Lemma 9. Let $S$ be a set of nodes of $HM_{n,k}$ with $|S| = p$. Assume that $HM_{n,k} - S$ is disconnected. Then, if $2n(k-1) - k \le p \le 3n(k-1) - 2k - 1$, $HM_{n,k} - S$ either contains two trivial components $u$ and $v$ together with the nontrivial component $V_C = HM_{n,k} - (S \cup \{u, v\})$ for $u \in V(HM_{n,k}) - S$, $v \in V(HM_{n,k}) - S$ and $N(\{u, v\}) \subseteq S$, or it contains two nontrivial components, one being the edge $(u, v)$ and the other being $V_C = HM_{n,k} - (S \cup \{u, v\})$ for $N(\{u, v\}) \subseteq S$.

Proof. We first consider that there are three trivial components in $HM_{n,k} - S$. There must be $3n(k-1)$ nodes in $S$, which is contradictory to the assumption. Thus, there are at most two trivial components $u$ and $v$ in $HM_{n,k} - S$ and there are at most $3n(k-1) - 2k - 1 - 2n(k-1) = n(k-1) - 2k - 1$ nodes in $S - N(\{u, v\})$. Without loss of generality, assume $HM^{0}_{n-1,k}$ is a disconnected subhypermesh with two trivial components $u$ and $v$. Since $HM^{i}_{n-1,k}$ is a $k$-ary $(n-1)$-dimensional subhypermesh with connectivity $(n-1)(k-1) > n(k-1) - 2k - 1$, each $HM^{i}_{n-1,k}$ is a connected graph for all $1 \le i \le k-1$. Therefore, $V_C = HM_{n,k} - (S \cup \{u, v\})$ is a connected component.

On the other hand, by Lemma 4, there are $2n(k-1) - k$ neighboring nodes if $(u, v)$ is an edge. Hence, there are $3n(k-1) - 2k - 1 - 2n(k-1) + k = n(k-1) - k - 1$ nodes in $S - N(\{u, v\})$. Similarly, each $HM^{i}_{n-1,k}$ is a connected graph for all $1 \le i \le k-1$, because each $HM^{i}_{n-1,k}$ is an $(n-1)(k-1)$-connected graph with $(n-1)(k-1) > n(k-1) - k - 1$. Hence, the proof is complete. ◽

By Lemma 9, we can derive the fact that all faulty nodes are isolated to a set with at most one fault-free node. The correctness of the proposed algorithm is proved in the following theorems.
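The two connectivity inequalities used in the proof of Lemma 9 can be checked by expanding $(n-1)(k-1) = n(k-1) - (k-1)$. The short derivation below is a verification sketch added for the reader and is not part of the original proof.

```latex
\begin{align*}
(n-1)(k-1) - \bigl(n(k-1) - 2k - 1\bigr) &= -(k-1) + 2k + 1 = k + 2 > 0, \\
(n-1)(k-1) - \bigl(n(k-1) - k - 1\bigr)  &= -(k-1) + k + 1 = 2 > 0.
\end{align*}
```

Both differences are positive for every radix $k \ge 2$ and every dimension $n$, so the strict inequalities hold without further conditions.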


Fig. 5. An example of Path-Evaluation.


Theorem 3. Let $F$ be a faulty set and let $\sigma$ be a syndrome generated by the set $F$. For $k \ge 2$ and $n \ge 4$, the Pessimistic Diagnosis algorithm can identify all faulty nodes in $HM_{n,k}$ to the set $F$ with $|F| \le 2n(k-1) - k$ and at most one fault-free node in $F$.

Proof. It follows from the Path-Evaluation that the nodes in $W$ are indeterminate. Since all the nodes in $W$ are checked by the dual tree $DT_{HM_{n,k}}(u, v, 2n(k-1) - k)$, one of the two nodes $u$ and $v$ is identified to be faulty or fault-free. Therefore, the nodes in $F$ are faulty and all of the nodes in $A$ are fault-free. If there exists an edge $(w, z)$ between $A$ and $W$ with $w \in W$, $z \in A$, the node $w$ can be determined to be faulty or fault-free. By Theorem 1, if $|F| \le 2n(k-1) - k - 1$, $HM_{n,k} - F$ contains one trivial component $v$ and the other is the nontrivial component $V_C = HM_{n,k} - (F \cup \{v\})$, which implies the node set $W = \{v\}$ and $v$ may be a fault-free node. Therefore, the number of nodes in $F \cup W$ is at most $2n(k-1) - k$ and all faulty nodes are identified to a set.

We then consider the case of $|F| = 2n(k-1) - k$. This case implies that all of the nodes in $W$ are fault-free because there are at most $2n(k-1) - k$ faults. By Lemma 9, $HM_{n,k} - F$ contains two nontrivial components, with one edge being $(u, v)$ and the other $V_C = HM_{n,k} - (F \cup \{u, v\})$. Since all the nodes in $W$ are checked by the dual tree $DT_{HM_{n,k}}(u, v, 2n(k-1) - k)$, one of the two nodes $u$ and $v$ is identified to be fault-free such that the two nodes $u$ and $v$ are determined to be fault-free, with $u \in A$ and $v \in A$. Hence, the Pessimistic Diagnosis algorithm identifies all faulty nodes in $HM_{n,k}$ to the set $F$ with $|F| \le 2n(k-1) - k$ and at most one fault-free node in $F$. ◽

Next, we need to show the time complexity of our proposed algorithm, which is linearly related to the number of nodes.

Theorem 4. The Pessimistic Diagnosis algorithm runs in $O(N)$ time, where $N = k^n$ is the total number of nodes, $k$ is the radix, and $n$ is the number of dimensions.

Proof. In statement 2, the Path-Evaluation constructs a Hamiltonian path in $O(N) = O(k^n)$ time. By Lemma 7, $W$ can be found in $O((nk)^2)$ time such that statement 3 to statement 21 would be executed in at most $O((nk)^2)$ iterations. Statements 5 and 6 can generate a dual tree rooted at any node $u \in W$ in $O(nk)$ time and statement 7 determines the status of one node in $O(nk)$ time. The procedure Fault-Identify executes in $O(nk)$ time at statements 8 and 10. In fact, at least one node in $W$ will be deleted from statement 8 to statement 21 for each iteration. Therefore, the computation time is at most $O((nk)^3)$ from statement 3 to statement 21. Since all of the nodes $w \in W$ are checked by a dual tree to identify their faulty or fault-free status, all of the faulty nodes and fault-free nodes are determined in statements 3-15. If $|F| < 2n(k-1) - k$, statement 23 should be performed to identify whether the nodes in $W$ are faulty or fault-free. By Theorem 1, $HM_{n,k} - F$ contains one trivial component $v$ and the other is the nontrivial component $V_C = HM_{n,k} - (F \cup \{v\})$, which implies the node set $W = \{v\}$. Hence, statement 23 to statement 29 execute in $O(nk)$ time. Consequently, the time complexity of the proposed algorithm is asymptotically $O(N)$ with $N = k^n$ for $n \ge 4$ and $k \ge 65$. ◽

Fig. 6. An example of Pessimistic Diagnosis.

We list the pairs of $(n, k)$ that fulfill the condition that $O(k^n)$ is larger than $O((nk)^3)$ as (4, 65), (5, 12), (6, 6), (7, 5), (8, 4), (9, 3), and (10, 2). To evaluate the performance of the proposed algorithm, we present experimental results in the next section. The relative efficiency of different networks is calculated, including a binary hypercube, a 2D hypermesh, and a 3D hypermesh.
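The comparison between the two terms can be evaluated numerically for a given pair. The sketch below simply compares $k^n$ with $(nk)^3$ and ignores the constants hidden by the $O$-notation, so it is a rough check rather than a statement about measured running time; the function name is an illustrative assumption.

```python
def linear_term_dominates(n, k):
    """Return True when k**n is strictly larger than (n*k)**3."""
    return k ** n > (n * k) ** 3


# Dimension 4: the radix 65 named in Theorem 4, and the radix just below it.
for k in (64, 65):
    print(k, k ** 4, (4 * k) ** 3, linear_term_dominates(4, k))
```

For $n = 4$ the two terms coincide at $k = 64$, and $k^n$ first becomes strictly larger at $k = 65$.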

5 PERFORMANCE EVALUATION

First, we give an example of the algorithm Path-Evaluation. Given an $HM_{2,3}$ and the syndrome $\sigma$ as in Fig. 5, we construct a Hamiltonian path (the dotted lines) starting at 00 with the path format $PH_{(n)} = 0PH_{(n-1)} \cup 1PH^{r}_{(n-1)} \cup 2PH_{(n-1)}$, resulting in the $\sigma$-one paths $\langle 02, 12, 11 \rangle$ and $\langle 10, 20 \rangle$, and the $\sigma$-zero paths $\langle 00, 01, 02 \rangle$, $\langle 11, 10 \rangle$, and $\langle 20, 21, 22 \rangle$. Suppose there are at most two faulty nodes in the graph $HM_{2,3}$; the algorithm Path-Evaluation will then return the set $W$ as $\{02, 12, 11, 10, 20\}$. In this case, we can determine that 12 and 20 are the faulty nodes.

Next, we show an example of the algorithm Pessimistic Diagnosis. Fig. 6 represents a part of $HM_{4,3}$. In this figure, we construct a dual tree of order 13 rooted at $u = 0022$ and $v = 1022$. Clearly, the state of $v$ is unknown (i.e., $v$ is in $W$). Moreover, given the syndrome in Fig. 6, we can calculate that $n_{0,0}(u) = 4$, $n_{0,1}(u) = 2$, $n_{0,0}(v) = 3$, and $n_{0,1}(v) = 2$. Since $n_{0,0}(u) + n_{0,1}(v) = 6 > 5 = n_{0,1}(u) + n_{0,0}(v)$ and $\sigma(u, v) = 1$, we can conclude that $v$ is faulty. Next, the Pessimistic Diagnosis algorithm will check all the neighbors of $v$ in $W$. Using this process, we can identify that 1222, 1012, 1220, 1020, 1000, 1002, and 1102 are all faulty.

Finally, we evaluated the performance of Path-Evaluation and Pessimistic Diagnosis with the simulation setup as follows.

1. The hardware and software used to perform the simulation are an Intel Core i7-2600 CPU at 3.4 GHz, 8 GB DRAM, the 64-bit Windows 7 OS, and the C++ programming language.
2. We configure the infrastructure of the $k$-ary 2-dimensional hypermeshes, $k$-ary 3-dimensional hypermeshes, and $n$-dimensional hypercubes by running 10,000 simulations and computing the average time.
3. We randomly deployed $4(k-1) - k$ faulty nodes in the $k$-ary 2-dimensional hypermeshes, $6(k-1) - k$ faulty nodes in the $k$-ary 3-dimensional hypermeshes, and $2n - 2$ faulty nodes in the $n$-dimensional hypercubes.

Fig. 7. The time complexity of k-ary 2-dimensional hypermesh, k-ary 3-dimensional hypermesh, and n-dimensional hypercube.

Fig. 8. The number of randomly deployed faulty nodes and the number of returned faulty nodes in a k-ary 2-dimensional hypermesh, k-ary 3-dimensional hypermesh, and n-dimensional hypercube.

Fig. 9. The time complexity of 5-ary n-dimensional hypermesh and 6-ary n-dimensional hypermesh.
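The Hamiltonian path used in the Path-Evaluation example follows a reflected pattern: the middle block is traversed in reverse so that consecutive labels differ in a single digit. A minimal recursive sketch is given below. Reversing every odd-numbered block for radices above 3 is an assumption that generalises the three-block format of the example, and the function name and the restriction to single-character digits ($k \le 10$) are illustrative choices.

```python
def hamiltonian_path(n, k):
    """Return the labels of HM_{n,k} in reflected order (assumes k <= 10)."""
    if n == 0:
        return [""]
    sub = hamiltonian_path(n - 1, k)
    path = []
    for d in range(k):
        # Even-numbered blocks keep the sub-path, odd-numbered ones reverse it.
        block = sub if d % 2 == 0 else sub[::-1]
        path.extend(str(d) + s for s in block)
    return path


print(hamiltonian_path(2, 3))
# ['00', '01', '02', '12', '11', '10', '20', '21', '22']
```

For $HM_{2,3}$ this reproduces the node order of the example, from which the $\sigma$-one and $\sigma$-zero paths are cut according to the test outcomes along the path.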


The time complexities for the simulations are shown in Fig. 7. The greater the number of nodes, the longer the elapsed time. Furthermore, the execution time of the Pessimistic Diagnosis for a hypermesh is almost the same as the time for a hypercube when they have the same total number of nodes. Fig. 8 shows that all of the randomly deployed faulty nodes were determined by our algorithm. In other words, the number of faulty nodes returned by our algorithm was equal to the number of faulty nodes deployed. Also, as illustrated in Fig. 8, the hypermesh can tolerate more faults than a hypercube with the same total number of nodes. Thus, our algorithm is an effective approach. Fig. 9 plots time against the number of nodes $N$ for the 5-ary $n$-dimensional hypermesh and the 6-ary $n$-dimensional hypermesh. According to the statements in the last paragraph of Section 4, the proposed approach is a linear time algorithm, as shown in Figs. 9a and 9b, if the number of nodes is larger than $5^7$ and $6^6$, respectively.
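The fault-deployment step of the setup can be sketched as follows. The paper's simulation was written in C++ and its code is not shown, so this Python fragment is only an illustration under assumptions: nodes are digit strings, adjacency means differing in exactly one digit, and a faulty tester returns a random outcome (the PMC model allows any outcome from a faulty tester). All function names are hypothetical.

```python
import itertools
import random


def diagnosability(n, k):
    """Pessimistic diagnosability 2n(k-1) - k stated for HM_{n,k}."""
    return 2 * n * (k - 1) - k


def deploy_faults(n, k, rng):
    """Return all node labels and a random fault set of size 2n(k-1) - k."""
    nodes = ["".join(map(str, t)) for t in itertools.product(range(k), repeat=n)]
    return nodes, set(rng.sample(nodes, diagnosability(n, k)))


def pmc_syndrome(nodes, faulty, rng):
    """Build sigma[(tester, tested)] for every ordered pair of adjacent nodes."""
    sigma = {}
    for x in nodes:
        for y in nodes:
            if sum(a != b for a, b in zip(x, y)) == 1:
                if x in faulty:
                    sigma[(x, y)] = rng.randint(0, 1)   # unreliable tester
                else:
                    sigma[(x, y)] = 1 if y in faulty else 0
    return sigma


rng = random.Random(0)
nodes, faulty = deploy_faults(2, 4, rng)
sigma = pmc_syndrome(nodes, faulty, rng)
print(len(nodes), len(faulty), len(sigma))
```

The quadratic loop over node pairs keeps the sketch short; a larger experiment would enumerate the $n(k-1)$ neighbors of each node directly.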

6 CONCLUSIONS

The k-ary n-dimensional hypermesh has been implemented in commercial products such as the Hitachi SR2201, SR8000, the CP-CAPS, and the Cray XK7 adopted for Titan. Therefore, the hypermesh has practical applications for the infrastructure of cloud computing. In this work, we consider the pessimistic diagnostic strategy under the PMC model for a hypermesh system. The pessimistic strategy is a diagnostic process that ensures at most one fault-free processor may be misjudged to be a faulty processor. We first derive the pessimistic diagnosability of a hypermesh to be $2n(k-1) - k$. We then exploit Hamiltonian path and dual tree structures to design the pessimistic algorithm for finding faults. Based on these structural properties, we propose an efficient pessimistic diagnosis algorithm to identify at most $2n(k-1) - k$ faults in $O(N)$ time (i.e., linear time), where $k$ is the radix, $n$ is the number of dimensions, and $N = k^n$ is the total number of nodes. This result is superior to the best-known precise diagnosis algorithm [15], which runs in $O(N \log_k N)$ time. Furthermore, the hypercube network is a special case of the hypermesh, where $k = 2$. Therefore, our algorithm can be used to diagnose faults in a hypercube. Also, the Cartesian product network with $k$ nodes in the component subgraph is a subgraph of a hypermesh if the number of nodes in the component subgraph is the same as the radix $k$ of the hypermesh, as is the case with mesh and tori networks. Thus, the proposed algorithm can also be exploited to identify faults in the product network.

REFERENCES

[1] R.D. Chamberlain, M.A. Franklin, and C.S. Baw, “Gemini: An Optical Interconnection Network for Parallel Processing,” IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 10, pp. 1038-1055, Oct. 2002.
[2] G.Y. Chang, G.J. Chang, and G.H. Chen, “Diagnosabilities of Regular Networks,” IEEE Trans. Parallel and Distributed Systems, vol. 16, no. 4, pp. 314-323, Apr. 2005.
[3] K.Y. Chwa and S.L. Hakimi, “On Fault Identification in Diagnosable Systems,” IEEE Trans. Computers, vol. C-30, no. 6, pp. 414-422, June 1981.
[4] D.R. Duh, C.H. Chen, and K.N. Chang, “A Fast Pessimistic Diagnosis Algorithm for Generalized Hypercube Multicomputer Systems,” J. Supercomputing, vol. 61, no. 3, pp. 605-618, 2012.
[5] A.T. Dahbura and G.M. Masson, “An $O(n^{2.5})$ Fault Identification Algorithm for Diagnosable Systems,” IEEE Trans. Computers, vol. C-33, no. 6, pp. 486-492, June 1984.
[6] J. Fan, “Diagnosability of the Möbius Cubes,” IEEE Trans. Parallel and Distributed Systems, vol. 40, no. 1, pp. 88-93, Sept. 1991.
[7] J. Fan and X. Lin, “The t/k-Diagnosability of the BC Graphs,” IEEE Trans. Computers, vol. 54, no. 2, pp. 176-184, 2005.
[8] J.W. Goodman, F.I. Leonberger, S.Y. Kung, and R.A. Athale, “Optical Interconnections for VLSI Systems,” Proc. IEEE, vol. 72, no. 7, pp. 850-866, July 1984.
[9] A. Kavianpour and A.D. Friedman, “Efficient Design of Easily Diagnosable System,” Proc. IEEE Computer Society 3rd USA-Japan Computer Conf., 1978.
[10] A. Kavianpour and K.H. Kim, “Diagnosabilities of Hypercubes under the Pessimistic One-Step Diagnosis Strategy,” IEEE Trans. Computers, vol. 40, no. 2, pp. 232-237, Feb. 1991.
[11] A. Kavianpour, “Sequential Diagnosability of Star Graphs,” Computers Electrical Eng., vol. 22, no. 1, pp. 37-44, 1996.
[12] T.L. Kung, H.C. Chen, and J.M. Tan, “On the Faulty Sensor Identification Algorithm of Wireless Sensor Networks under the PMC Diagnosis Model,” Proc. 6th Int’l Conf. Network Computing and Advanced Information Management, pp. 657-661, 2010.
[13] F.T. Leighton, Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann, 1992.
[14] T.K. Li, C.H. Tsai, and H.C. Hsu, “A Fast Fault-Identification Algorithm for Bijective Connection Graphs Using the PMC Model,” Information Sciences, vol. 187, no. 15, pp. 291-297, Mar. 2012.
[15] X. Liu, X. Yang, and M. Xiang, “One-Step t-Fault Diagnosis for Hypermesh Optical Interconnection Multiprocessor Systems,” J. Systems and Software, vol. 82, pp. 1491-1496, 2009.
[16] F.P. Preparata, G. Metze, and R.T. Chien, “On the Connection Assignment Problem of Diagnosis Systems,” IEEE Trans. Electronic Computers, vol. 16, no. 12, pp. 848-854, Dec. 1967.
[17] G. Sullivan, “An $O(t^3 + |E|)$ Fault Identification Algorithm for Diagnosable Systems,” IEEE Trans. Computers, vol. 37, no. 4, pp. 388-397, Apr. 1988.
[18] T. Szymanski, “Hypermeshes: Optical Interconnection Networks for Parallel Computing,” J. Parallel and Distributed Computing, vol. 26, pp. 1-23, 1995.
[19] C.H. Tsai, “A Quick Pessimistic Diagnosis Algorithm for Hypercube-Like Multiprocessor Systems under the PMC Model,” IEEE Trans. Computers, vol. 62, no. 2, pp. 259-267, Feb. 2013.
[20] D. Wang, “Diagnosability of Enhanced Hypercubes,” IEEE Trans. Computers, vol. 43, no. 9, pp. 1054-1061, Sept. 1994.
[21] C.L. Yang, G.M. Masson, and R.A. Leonetti, “On Fault Isolation and Identification in $t_1/t_1$-Diagnosable Systems,” IEEE Trans. Computers, vol. C-35, no. 7, pp. 639-643, July 1986.
[22] C.L. Yang and G.M. Masson, “A Fault Identification Algorithm for $t_i$-Diagnosable Systems,” IEEE Trans. Computers, vol. C-35, no. 6, pp. 503-510, June 1986.
[23] E. Yang, X. Yang, Q. Dong, and J. Li, “Conditional Diagnosability of Hypermesh Optical Multiprocessor Systems under the PMC Model,” Int’l J. Computer Math., vol. 88, pp. 2275-2284, 2011.
[24] X. Yang, “A Fast Pessimistic One-Step Diagnosis Algorithm for Hypercube Multicomputer Systems,” J. Parallel and Distributed Computing, vol. 64, pp. 546-553, 2004.
[25] F. Rodríguez-Salazar and J.R. Barker, “Hamming Hypermeshes: High Performance Interconnection Networks for Pin-Out Limited Systems,” Performance Evaluation, vol. 63, pp. 759-775, 2006.

Hong-Chun Hsu received the BS degree in computer science and information management from Providence University, Taichung, Taiwan, in 1997, and the MS and PhD degrees from National Chiao-Tung University, Hsin-Chu, Taiwan, in 1999 and 2003, respectively. He is now with the Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan. His research interests include graph theory and its applications to interconnection networks, fault diagnosis of network systems, and graph embedding. In addition, his interests also include parallel computing and bioinformatics.


Kuang-Shyr Wu received the PhD degree in computer science from National Chiao-Tung University, Hsin-Chu, Taiwan, in 1997. He is currently an assistant professor in the Computer Science and Information Engineering Department of Chien Hsin University of Science and Technology, Taoyuan, Taiwan. His research interests include image processing, wireless communication, and fault diagnosis.

Chiou-Yng Lee received the bachelor’s degree in medical engineering and the MS degree in electronic engineering, both from Chung Yuan Christian University, Zhongli, Taiwan, in 1986 and 1992, respectively, and the PhD degree in electrical engineering from Chang Gung University, Taoyuan, Taiwan, in 2001. From 1988 to 2005, he was a research associate with Chunghwa Telecommunication Laboratory, Taiwan. He joined the department of project planning. He taught courses in related fields at Ching Yun University. Currently, he is a professor in the Department of Computer Information and Network Engineering at Lunghwa University of Science and Technology, Taoyuan, Taiwan. His research interests include computations in finite fields, error-control coding, signal processing, and digital transmission systems. He is a senior member of the IEEE Computer Society. He became an honor member of Phi Tau Phi in 2001.

Chien-Ping Chang received the BS degree in electrical engineering from Chung Cheng Institute of Technology, Chiayi, Taiwan, in 1986, and the PhD degree in computer and information science from National Chiao Tung University, Hsin-Chu, Taiwan, in 1998. He is currently a professor in the Department of Computer Science and Information Engineering, Chien Hsin University of Science and Technology, Jungli, Taiwan. His research interests include parallel computing, interconnection networks, image processing, data hiding, and computations in finite fields.


Cheng-Kuan Lin received the BS degree in applied mathematics from Chinese Culture University, Taiwan, in 2000, the MS degree in mathematics from the National Central University, Taoyuan, Taiwan, in 2002, and the PhD degree in computer science from the National Chiao-Tung University, Hsin-Chu, Taiwan, in 2011. He is now a postdoctoral fellow in the Institute of Information Science, Academia Sinica, Taipei, Taiwan. His research interests include interconnection networks, algorithms, graph theory, wireless networks, and wireless sensor networks.
