The Computational Complexity of Complex Systems: the Role of Topology and Functionality Gregory Provan Computer Science Department, University College Cork, Cork, Ireland {[email protected]}

Abstract

Recent research in complex systems has focused on the use of several classes of graph to model the topology of a wide variety of naturally-occurring systems, ranging from biological systems and the WWW to human-designed mechanical systems. However, little research has analysed which types of inference task can be performed efficiently on such a structure. Some empirical studies indicate that there may be wide variability in inference complexity: for example, routing has log-time complexity, yet graph colouring appears to have exponential-time complexity. To complement the average-case results shown for small-world graphs, we describe some worst-case complexity results for well-known NP-complete problems that are restricted to mimic particular properties of small-world graphs. We identify two broad functional problem classes that have different computational complexity on small-world networks, which we call path-based and consistency-based problems; we show that path-based problems can be solved more efficiently than consistency-based problems for every graph topology analysed.

1 Small-world Networks and Inference

Small-world networks have been used to model the topology of a wide variety of real-world applications, such as biological systems [31], the WWW [6], and human-designed mechanical systems [3, 7]. A small-world network is a complex network such that (a) the nodes are connected into several clusters which are loosely connected, and (b) every node can be reached from every other by a small number of hops or steps. We can measure whether a network is a small world or not according to two (empirical) network measures: clustering coefficient and characteristic (mean-shortest) path length [31].

The clustering coefficient (C) is a measure of how clustered, or locally structured, a graph is. This coefficient is an average of how interconnected each node's neighbours are. The characteristic path length (L) is the average distance between any two nodes in the network, or more precisely, the average length of the shortest path connecting each pair of nodes.

Given the prevalence of a topology that is common to most complex systems, it is important to identify what types of inference tasks can be performed efficiently on such a structure. To this end, we assume a graph model that displays properties of a small-world network, and study the complexity of inference within graphical models for complex systems, which we call small-world graphs (SWGs).

The most widely used algorithm on a real SWG, the WWW, is the PageRank algorithm [5]. This algorithm is based on computing the unique eigenvector corresponding to the largest eigenvalue for the webgraph's adjacency matrix. The inference thus involves repeated multiplication of the adjacency matrix into any non-zero vector that is itself not an eigenvector.
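The repeated multiplication described above is the power iteration method. The following is a minimal Python sketch of that idea on a small undirected graph, not the PageRank algorithm itself (which, among other differences, works on a normalised link matrix). The function name `power_iteration`, the fixed iteration count, and the example graph are illustrative assumptions.

```python
def power_iteration(adjacency, iterations=100):
    """Approximate the dominant eigenvector of a square matrix.

    adjacency is a list of rows (lists of numbers). The start vector is
    uniform, which is assumed not to be an eigenvector already.
    """
    n = len(adjacency)
    vector = [1.0 / n] * n
    for _ in range(iterations):
        # One matrix-vector multiplication.
        product = [
            sum(adjacency[i][j] * vector[j] for j in range(n))
            for i in range(n)
        ]
        # Rescale so that the entries sum to 1 (L1 normalisation).
        norm = sum(abs(x) for x in product)
        if norm == 0:
            raise ValueError("the matrix maps the vector to zero")
        vector = [x / norm for x in product]
    return vector


# Hypothetical example: a triangle (0, 1, 2) with a pendant vertex 3
# attached to vertex 0. The graph is connected and not bipartite, so the
# iteration converges.
EXAMPLE = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
scores = power_iteration(EXAMPLE)
```

In this example the highest score goes to vertex 0, which has the most neighbours, and the lowest to the pendant vertex 3.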

There are several other important forms of inference on SWGs, most of which have received less attention. For example, we may have a SWG model of a complex system for which we want to diagnose faults in the system. This task involves examining the consistency of observations throughout the system; as a consequence, we call such a task a consistency-based task. A second type of task involves finding paths through a network, such as for packet routing. We call such a task a path-based task.

Several results indicate that there may be wide variability in inference complexity within small-world networks. For example, routing has O(log² n) complexity for small-world graphs constructed from an n × n lattice [18]; in contrast, consistency-based tasks like colouring a graph of n vertices appear to have complexity exponential in n [8, 29].

This article focuses on worst-case complexity results for well-known NP-complete problems that are restricted to mimic particular properties of small-world graphs. In contrast to most analyses of complex systems, which focus on topology (e.g., [3, 16, 33]), we focus both on the influence of graph topology and on the function that is being computed. We summarise a set of complexity results on models with small-world properties, and discuss the implications of these results for the study of complex systems. The complexity results show that, given identical topology, the worst-case inference complexity differs significantly based on the type of inference. In particular, we identify two broad classes of problem: path-based problems, which in several cases admit polynomial-time algorithms, and consistency-based problems, which remain NP-complete and so take exponential time with all known algorithms.

To frame our problem as a decision problem (as is necessary to prove worst-case results), we translate the average-case-oriented SWG properties, such as clustering coefficient and mean-shortest path length, into deterministic bounds. We then examine well-known graph problems with bounded SWG properties.
We propose two classes of problems, path-based and consistency-based problems. We show that, given our SWG model: (1) several path-based problems, e.g., HAMILTON CYCLE, ROUTING and LONGEST-PATH (which are NP-complete for general graphs), are polynomial for a bounded SWG; (2) several consistency-based tasks remain NP-complete for both general graphs and a bounded SWG. We conjecture that these results generalise to two classes of problem that have different complexity for a SWG, i.e., that path-based problems can be more efficiently solved than consistency-based problems for every graph topology analysed. Figure 1 depicts these two classes.

[Figure 1: diagram with regions labelled Small-World Graph Structure, Path-Based Problems, Consistency-Based Problems, and NP-Complete Problems.]

Figure 1: Analysis of complexity results for various SWG models, in cases where the problem for an arbitrary graph is NP-complete. Consistency-based problems with SWG structure remain NP-complete, whereas path-based problems with SWG structure have complexity depending on the restrictions imposed by the SWG structure.

We organize the remainder of the document as follows. Section 2 presents the general framework for our complexity analysis. Section 3 describes the notions of computational complexity that are the focus of this article. Section 4 presents the SWG properties that we use for proving our worst-case complexity results. Section 5 presents our complexity results for SWG path analysis. Section 6 presents our complexity results for SWG consistency analysis. Section 7 compares our worst-case results with experimental analyses of SWG path- and consistency-problems. Section 8 summarises our results and discusses the wider implications of these results.

2 Notation

2.1 Graph Theory

This section introduces our notation. We assume that we have a graph G(V, E) with a set V of vertices and set E of edges.

Definition 1 (digraph). A digraph is an ordered pair of sets G(V, E), where V is a set of vertices and E is a set of ordered pairs (called arcs) of vertices of V. We say that $V_1$ is a parent of $V_2$ in G, denoted $V_1 = \pi(V_2)$, if $(V_1, V_2)$ is an arc in the digraph.

Definition 2 (degree of vertex). The in-degree of a vertex is the number of arcs coming to the vertex, and the out-degree is the number of arcs going out of the vertex.

Definition 3 (Path). A path from a vertex $V_0$ to a vertex $V_n$ in a graph G = (V, E) is a sequence of vertices $V_0, V_1, \ldots, V_n$ that satisfies the following: for each i, 0 ≤ i ≤ n − 1, $(V_i, V_{i+1}) \in E$ or $(V_{i+1}, V_i) \in E$, that is, each pair of consecutive vertices is connected by an arc. $V_0$ is the initial vertex and $V_n$ is the terminal vertex of the path. The length of a path P is the number of edges in P. A path is called a directed path if $(V_i, V_{i+1}) \in E$ for every 0 ≤ i ≤ n − 1.

Definition 4 (Cycle). A cycle is a path in which the initial and the terminal vertices are the same, that is, $V_0 = V_n$.

2.2 SWG Characteristic 1: Mean Distance

The characteristic path length L of a SWG is only a meaningful measure if the graph is connected, i.e., if there is a sequence of edges joining any two nodes. We adopt the convention that L is infinite if a graph is not connected. In general, to make comparisons more feasible, all graphs we deal with will be connected.

Definition 5 (Connected graph). A graph is connected if there is a path between every pair of its vertices.

We define a notion of distance between two vertices in a graph as follows.

Definition 6 (Graph Distance). Given a graph G(V, E), the distance between two vertices is the number of edges in a shortest path connecting the two vertices. We denote the distance between two vertices $V_i$ and $V_j$ by $d_G(V_i, V_j)$.

Definition 7 (Maximum (Minimum) Graph Distance). Given a graph G(V, E), the maximum (minimum) distance between two vertices is the number of edges in a longest (shortest) path connecting the two vertices. We denote the maximum (minimum) distance between two vertices $V_i$ and $V_j$ by $d_G^{\max}(V_i, V_j)$ ($d_G^{\min}(V_i, V_j)$).
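As an illustration of Definition 6 and of the characteristic path length L, the following Python sketch computes shortest-path distances by breadth-first search and averages them over all ordered pairs of distinct vertices. It assumes an unweighted, undirected graph stored as a dictionary of neighbour lists; the function names are illustrative and not part of the original analysis.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return the distance d_G(source, v) for every v reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct vertices.

    Follows the convention in the text: L is infinite when the graph is
    not connected.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0
    for source in adj:
        dist = bfs_distances(adj, source)
        if len(dist) < n:
            return float("inf")
        total += sum(dist.values())
    return total / (n * (n - 1))


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
PATH_GRAPH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
L = characteristic_path_length(PATH_GRAPH)
```

Each breadth-first search takes time linear in the number of vertices and edges, so computing L in this way is polynomial in the size of the graph.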

2.3 SWG Characteristic 2: Clustering Coefficient

We define graph clustering to characterise the degree of cliquishness of a typical neighbourhood (a node's immediately connected neighbours). For example, a graph is a clique if it is fully-connected. The clustering coefficient $C_i$ for a vertex $V_i$ is the number of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them.

Definition 8 (Graph Clustering Coefficient). The graph clustering coefficient is the average of the clustering coefficients for each vertex [31]:

$\bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i$.

The clustering coefficient for vertex i is given by:

$C_i = \frac{|\{e_{jk}\}|}{k_i (k_i - 1)}, \quad V_j, V_k \in N_i,\ e_{jk} \in E$,

where $N_i$ denotes the neighbourhood of $V_i$ and $k_i = |N_i|$.

The notion of SWG is derived primarily from empirical observation, i.e., real-world complex systems have graphs with a particular topology that can be captured in terms of mean distance L and clustering coefficient $\bar{C}$. In particular, we compare the parameters $(L, \bar{C})$ with those of an Erdős-Rényi random graph G(n, p) [13], i.e., $(L_r, \bar{C}_r) = \left( \frac{\ln(n)}{\ln(pn)}, p \right)$.

Definition 9 (SWG). A small-world-graph (SWG) is a graph G(V, E) that has small-world properties measurable in terms of its mean distance L and clustering coefficient $\bar{C}$; i.e., given a random graph G(n, p) with mean distance $L_r$ and clustering coefficient $\bar{C}_r$, $L \simeq L_r$ and $\bar{C} \gg \bar{C}_r$.
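Definition 8 and the random-graph reference values can be computed directly. The Python sketch below assumes an undirected graph stored as a dictionary of neighbour sets, counts each link between neighbours in both directions (so that the ratio with $k_i(k_i - 1)$ lies between 0 and 1), and assigns a coefficient of 0 to vertices with fewer than two neighbours. These conventions and the function names are assumptions made for illustration.

```python
import math


def clustering_coefficient(adj, v):
    """C_i: ordered neighbour pairs that are linked, over k_i * (k_i - 1)."""
    neighbours = adj[v]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention for vertices with fewer than 2 neighbours
    links = sum(
        1
        for j in neighbours
        for m in neighbours
        if j != m and m in adj[j]
    )
    return links / (k * (k - 1))


def average_clustering(adj):
    """Graph clustering coefficient: the mean of C_i over all vertices."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)


def random_graph_reference(n, p):
    """(L_r, C_r) = (ln n / ln(pn), p) for a random graph G(n, p), with pn > 1."""
    return math.log(n) / math.log(p * n), p


# Hypothetical example: a triangle (0, 1, 2) with a pendant vertex 3.
GRAPH = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c_bar = average_clustering(GRAPH)
l_r, c_r = random_graph_reference(100, 0.1)
```

A graph would then be classed as a SWG in the sense of Definition 9 when its measured L is close to `l_r` while its measured clustering coefficient is much larger than `c_r`.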

3 Complexity Analysis

This section introduces the notions of problem complexity that are the focus of this article. In particular, we compare and contrast worst-case complexity concepts with the average-case complexity analyses that are more common in the complex networks literature.

3.1 Worst-Case Complexity Analysis

The most common method of analyzing the performance of algorithms is to specify the worst-case complexity of the algorithm, i.e., the longest time that the algorithm could possibly take, expressed in terms of the problem parameters.¹ For example, an O(n) algorithm on a graph of n nodes will have to search, at the worst, all n nodes to return a result.

Within this framework, the primary classification of interest in this article is whether a problem is in class P or NP. We frame our problems in terms of a decision problem, for which a yes/no answer exists. For example, a decision problem is whether a graph contains a fully-connected subgraph of k vertices or more. The class P consists of all decision problems that can be solved on a deterministic sequential machine in an amount of time that is polynomial in the size of the input; the class NP consists of all those decision problems whose positive solutions can be verified in polynomial time given the right information, or equivalently, whose solution can be found in polynomial time on a non-deterministic machine.²

In this article we focus on NP-complete problems, which can be informally described as the hardest problems in NP, and as a consequence are the problems in NP least likely to have polynomial-time algorithms. NP-hard problems are those into which any problem in NP can be transformed through some efficient transformation, i.e., one which takes at most a polynomially-bounded number of steps. NP-complete problems are those NP-hard problems that are contained in NP. Unfortunately, many important problems have been shown to be NP-complete, and researchers have been unable to find a single fast algorithm for any of them. Some important classical NP-complete problems include SATISFIABILITY, HAMILTON CYCLE, TRAVELING SALESMAN, and k-COLOURING [14]. Showing that a problem is NP-complete is significant in that it provides a strong indication that no efficient algorithm exists for this problem.

¹ See a standard text such as [14] for more detail.
² These definitions are adapted from Wikipedia.
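To make the notion of polynomial-time verification concrete, the following Python sketch checks a proposed certificate for the clique decision problem mentioned above (does the graph contain a fully-connected subgraph of k or more vertices?). The representation of the graph as a dictionary of neighbour sets and the function name `verify_clique` are illustrative assumptions.

```python
def verify_clique(adj, certificate, k):
    """Check in polynomial time that certificate is a clique of size >= k.

    adj maps each vertex to the set of its neighbours; certificate is an
    iterable of vertices offered as evidence for a "yes" answer.
    """
    vertices = set(certificate)
    if len(vertices) < k:
        return False
    if not vertices <= set(adj):
        return False  # the certificate names a vertex that is not in the graph
    # Every pair of distinct certificate vertices must be adjacent.
    return all(v in adj[u] for u in vertices for v in vertices if u != v)


# Hypothetical example: a triangle (0, 1, 2) with a pendant vertex 3.
GRAPH = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

The check inspects at most a quadratic number of vertex pairs, whereas no polynomial-time method is known for finding such a certificate in a general graph.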

3.2 Average-case Complexity Analysis

The analysis of complex systems, since it is based on the theory of random graphs, is primarily focused on average-case analysis, since this type of analysis is well-suited to the probabilistic nature of random graphs. It is important to note that random graphs are used as a model of a real-world system, the structure of which may be inherently deterministic. Deterministic models, like the decision models we explore, may be equally valid methods for describing complex systems if they adequately capture the properties of the complex system.

Average-case complexity results express different concepts than worst-case results; they state the complexity of inference based on a distribution D of problem instances. In other words, efficient average-case algorithms solve most instances in a distribution D efficiently, although they may be inefficient on some instances. Hence, their results are valid to the extent that the distribution D faithfully reflects the space of problem instances.

Efficient average-case algorithms have been developed for several NP-complete problems, such as the vertex-colouring problem [32] and dense instances of the Hamiltonian path problem [15], under commonly used distributions on random graphs. However, the study of complex systems has moved away from the analysis of random graphs, since it has been shown that random graphs do not capture the structural properties of complex systems (see [22] for an overview). In the place of random graphs, researchers have proposed a variety of alternative models, all of which capture the structural properties of complex systems with varying degrees of fidelity.

There is some preliminary evidence that average-case inference on SWGs is harder than that for random graphs. For example, Walsh [29] has shown empirically that vertex-colouring is significantly harder for randomised distributions over small-world graphs than for similar distributions over random graphs; hence, it appears unlikely that the algorithm of [15] will be efficient for either dense or less-dense SWGs. Our worst-case complexity results also indicate that inference on SWGs is harder than inference on random graphs.

To complement this average-case analysis, we examine several model types, each of which captures different properties of a complex system. For example, our study of directed graphs captures the directionality inherent in many real-world systems, like the WWW. By comparing the worst-case complexity of these different models, we are able to compare the relative impact of model properties on inference. For example, by comparing directed vs. undirected graphs, we can examine the effect of edge directionality on path-based and consistency-based tasks.

4 Bounded Small-World Graph Model

4.1 Problem Classification

Based on evidence of efficient algorithms over SWGs for routing and none for colouring, we study two problem classes that generalise routing and colouring, which we call path-based and consistency-based classes, respectively. We define these classes as follows.

Path-based Problem. We characterise this class of problem in an SWG as identifying a particular type of path in the graph. This class of problems includes problems like Hamilton Cycle/Path (computing a Hamiltonian cycle/path) and Routing problems (computing a path from one vertex to another).

Definition 10 (Path-Based Problem). Given a graph G(V, E) (which may have costs ϕ associated with edges, vertices or both edges and vertices), a path-based problem concerns finding a path in G(V, E) with properties π and cost ϕ governed by a particular bound b.

For example, for Hamilton Path, π denotes a path that visits every $V_i \in V$ exactly once, and ϕ is a trivial cost function where every edge has cost 1. The difficulty of solving this class of problem is often proportional to the size of the graph diameter.

Consistency-based Problem. We characterise this problem class as identifying in an SWG G(V, E) a consistent assignment given a constraint set defined over G(V, E).

Definition 11 (Consistency-Based Problem). Given a graph G(V, E) where vertex $V_i$ has domain of values $D_i$, a consistency-based problem on a graph associates a constraint with every edge. That is, for every edge $E_k = (V_i, V_j)$, where $E_k \in E$ and $V_i, V_j \in V$, there exists a constraint $\phi_k$ that restricts the values that $V_i$ and $V_j$ can simultaneously hold. Given this representation, solving a consistency-based problem entails assigning a value to every vertex $V_j \in V$ such that all constraints $\phi_k \in \Phi$ are satisfied.

This class of problem includes colouring, k-clique, SAT, Constraint Satisfaction Problems (CSP), Bayesian Networks (BN), etc. For the k-clique problem the constraints are the (trivial) connectivity of k vertices, so we just have to identify the clique. Other problems have a nontrivial constraint set, e.g., for 3-colouring we must colour the vertices with 3 colours so that no adjacent vertices have the same colour. The complexity of solving this class of problem depends on the size of the maximum clique in G(V, E).

We will show that the inference complexity depends on the problem, with problems that involve path-analysis typically being simple for SWGs, and problems that involve constraint satisfaction typically being hard for SWGs.
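As a concrete instance of Definition 11, the following Python sketch treats k-colouring as a consistency-based problem: every vertex has the domain {0, ..., k − 1}, every edge carries the constraint that its endpoints differ, and a simple backtracking search looks for an assignment that satisfies all constraints. The search is a generic illustration with exponential worst-case running time, not an algorithm from the analysis; the names are assumptions.

```python
def colour_graph(adj, k):
    """Return a dict vertex -> colour in range(k), or None if none exists.

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    vertices = list(adj)
    assignment = {}

    def consistent(v, colour):
        # Edge constraint: no already-coloured neighbour may share the colour.
        return all(assignment.get(w) != colour for w in adj[v])

    def extend(index):
        if index == len(vertices):
            return True  # every vertex has a value and all constraints hold
        v = vertices[index]
        for colour in range(k):
            if consistent(v, colour):
                assignment[v] = colour
                if extend(index + 1):
                    return True
                del assignment[v]  # backtrack
        return False

    return dict(assignment) if extend(0) else None


# Hypothetical examples: a triangle needs 3 colours, and the complete
# graph on 4 vertices cannot be coloured with 3.
TRIANGLE = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
K4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```

The clique examples show why maximum clique size matters for this class: a clique of size k + 1 rules out any consistent assignment with k colours.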

4.2 Bounded SWG Models

This section defines the bounded SWG models that we use for proving worst-case complexity results.

4.2.1 Directed SWG Model

Our first bounded SWG model assumes that a SWG is directed, and is denoted di-SWG. This assumption holds in many real-world systems; for example, the WWW is arguably a digraph, as are natural systems such as river systems, biological systems (flows of blood, nervous-system signals, etc. are all directed), and technological systems (electrical circuits, etc.).

We look at a particular restriction on the di-SWG, a directed acyclic graph (DAG), or DAG-SWG. A DAG-SWG can model many complex systems, such as river systems, and it plays a role in several important computational frameworks, such as Bayesian Networks [24].

4.2.2 Path-Length-Bounded SWG Model

We introduce a graph with bounded path-length because of the following observations:

• real-world graphs have short distances (and in many cases path-lengths) [22];
• the results of theoretical analyses of scale-free graph models of a SWG indicate that graph distances are bounded by log n for n vertices, even for graphs with large n [22].

In order to develop NP-completeness results, we need to modify the SWG-based notion of mean distance, which is an average-case result, into a hard bound, that of maximum path-length. As a consequence, we use a model of a small-world-graph (SWG) with bounded maximum path-length (MPL), as follows.

Definition 12 (bMPL-SWG). A bounded-MPL small-world-graph (bMPL-SWG) is a SWG G(V, E) in which every pair of nodes has a graph maximum path-length (MPL) bounded by an integer b < |V|.

4.2.3 Cluster-Bounded SWG Model

The maximum cluster-size of a graph is important, since it governs the inference complexity of a wide range of consistency-based problems [12], such as constraint satisfaction, Bayesian network inference, and causal network model-based diagnosis. We assume that a SWG can be viewed as a set of clusters $\Gamma = \{\gamma_1, \ldots, \gamma_q\}$, which are loosely connected. In the extreme, each cluster will be fully-connected, i.e., a clique. Any vertex $V_i$ in a graph may be a member of multiple clusters; the cluster degree of $V_i$ is the size of the clusters in which $V_i$ is a member. The maximum cluster degree $\delta^*$ of $V_i$ is given by:

$\delta^*(V_i) = \max_{j : V_i \in \gamma_j} |\gamma_j|$.

We propose a model of a small-world-graph (SWG) with bounded cluster size, as follows.

Definition 13 (bcSWG). A bounded-cluster small-world-graph (bcSWG) is a SWG G(V, E) in which every vertex $V_i$ has a graph cluster degree $\delta_i$ bounded by an integer χ < |V|.

4.2.4 Bounded SWG Model

We can propose a "combined" model of a small-world-graph (SWG) with bounded path-length and cluster size, as follows.

Definition 14 (bSWG). A bounded SWG (bSWG) is a SWG G(V, E) in which (1) every pair of nodes has a graph MPL bounded by an integer b < |V| and (2) every vertex $V_i$ has a graph cluster degree $\delta_i$ bounded by an integer χ < |V|.

SWG Model Class | Sub-Class                 | Path-Based                                 | Consistency-Based
DiGraph         | Acyclic (DAG)             | Polynomial                                 | NP-complete
DiGraph         | Cyclic                    | NP-complete (polynomial-time approximable) | NP-complete
Bounded SWG     | Path-Length               | Polynomial-bounded                         | NP-complete
Bounded SWG     | Clique-Size               | NP-complete                                | NP-complete
Bounded SWG     | Path-Length + Clique-Size | Polynomial-bounded                         | NP-complete

Figure 2: Summary of worst-case complexity results for various SWG models. For each case, the problem for an arbitrary graph is NP-complete.
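The maximum cluster degree of Section 4.2.3 and the bound χ of Definition 13 can be checked mechanically once a cluster decomposition is given. The Python sketch below assumes that the clusters are supplied as a list of vertex sets (how to obtain them is outside its scope) and that a vertex belonging to no cluster has degree 0; the function names are illustrative.

```python
def max_cluster_degree(clusters, vertex):
    """Size of the largest cluster that contains vertex (0 if there is none)."""
    return max((len(c) for c in clusters if vertex in c), default=0)


def satisfies_cluster_bound(vertices, clusters, chi):
    """True if every vertex has maximum cluster degree at most chi."""
    return all(max_cluster_degree(clusters, v) <= chi for v in vertices)


# Hypothetical decomposition: three clusters that overlap in vertices 2 and 3.
CLUSTERS = [{0, 1, 2}, {2, 3}, {3, 4, 5, 6}]
VERTICES = range(7)
```

With this decomposition, vertex 2 has maximum cluster degree 3 and vertex 3 has maximum cluster degree 4, so the graph satisfies the bound for χ = 4 but not for χ = 3.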

5 Complexity of SWG Path Analysis

In this section we analyse the complexity of three path-based problems, HAMILTON CYCLE, LONGEST-PATH and ROUTING. Figure 2 summarises the complexity results for both the path-based and consistency-based problem classes.³ We selected these problems because they are all classical NP-complete problems [14]. A HAMILTON CYCLE (HC) is a cycle in an undirected graph that visits each vertex exactly once and also returns to the starting vertex. In addition, we examine the LONGEST-PATH problem, which is closely related to HC. In a graph G(V, E) with all edge weights of 1, LONGEST-PATH asks whether there is a simple path of weight (length) k or more. We also examine the ROUTING problem, which concerns finding a path between two distinguished vertices in a graph that meets particular weight bounds. We now examine the complexity of HAMILTON CYCLE, LONGEST-PATH and ROUTING when the underlying graph is modeled using a SWG with directed edges, and with bounds on MPL and/or cluster-size.

5.1 HAMILTON CYCLE

5.1.1 Di-Graph Path-Based Problems

Some path-based problems are simple on a digraph. For example, the ROUTING problem is O(log n) [19]. However, several path-based problems remain hard on a digraph. In particular, HAMILTON CYCLE and LONGEST-PATH both remain NP-complete on a digraph. For these NP-complete digraph problems, the known approximation results are promising. For example, it can be shown that a simple greedy algorithm finds long paths in a SWG when the graph is dense [17].

When we restrict the digraph to a DAG, then we have the following results:

• LONGEST-PATH has O(n) complexity, using dynamic programming [21].
• HAMILTON CYCLE has polynomial-time complexity.
• ROUTING has O(log n) complexity [19].
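The dynamic-programming result for LONGEST-PATH on a DAG listed above can be sketched as follows. This is a standard textbook formulation, not necessarily the one in [21]: vertices are processed in topological order (here via Kahn's algorithm), so its running time is linear in the number of vertices plus arcs. The representation (a dictionary of successor lists with every vertex present as a key) and the function name are assumptions.

```python
from collections import deque


def longest_path_length(dag):
    """Number of arcs on a longest directed path in a DAG.

    dag maps every vertex to the list of its successors.
    Raises ValueError if the graph contains a directed cycle.
    """
    indegree = {v: 0 for v in dag}
    for v in dag:
        for w in dag[v]:
            indegree[w] += 1

    # Kahn's algorithm: repeatedly remove vertices with no remaining parents.
    queue = deque(v for v in dag if indegree[v] == 0)
    longest = {v: 0 for v in dag}  # longest path ending at each vertex
    processed = 0
    while queue:
        v = queue.popleft()
        processed += 1
        for w in dag[v]:
            longest[w] = max(longest[w], longest[v] + 1)
            indegree[w] -= 1
            if indegree[w] == 0:
                queue.append(w)

    if processed != len(dag):
        raise ValueError("graph contains a directed cycle")
    return max(longest.values(), default=0)


# Hypothetical example: a -> b -> d -> e and a -> c -> d.
DAG = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
```

The decision version (is there a path of length k or more?) then reduces to comparing the returned value with k.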

5.1.2 Bounded-Path-Length HC

We adopt the bMPL-SWG notion described above, and define a Bounded-path-length HAMILTON CYCLE (BL-HC) problem as follows.

BL-HC:
INSTANCE: A bMPL-SWG G(V, E), with a graph path-length bounded by an integer b < |V|.
QUESTION: Does G contain a Hamilton Cycle, i.e., an ordering $\langle V_1, V_2, \ldots, V_n \rangle$ of the vertices of G, where n = |V|, such that $(V_n, V_1) \in E$ and $(V_i, V_{i+1}) \in E$ for all i, 1 ≤ i ≤ n − 1?

Theorem 1. Bounded-path-length HAMILTON CYCLE (BL-HC) is NP-complete if b ≥ n.

We prove this result by reduction from HAMILTON CYCLE, using a straightforward construction. We omit the full proofs for space reasons.

If b < n, we can prove that BL-HC is in both NP and co-NP,⁴ so its worst-case complexity is uncertain, similar to the situation for INTEGER FACTORING [4]. We prove that BL-HC is in co-NP by defining the complement of BL-HC, which asks the question of whether a bMPL-SWG G(V, E) has no Hamilton cycle. We show that an oracle can test an instance of BL-HC-complement in polynomial time.

³ Due to space constraints, we summarise the complexity results and omit proofs; full details and proofs can be found in [25].
⁴ Co-NP is the class of problems that have succinct non-membership witnesses.

However, for a given bound b = O(log n), we have a polynomial-time algorithm, a result that is proven by generalising a result of [1]. Note that this O(log n) bound is reasonable for a SWG, since L = O(log n) in most real-world systems [22].

Theorem 2. Bounded-path-length HAMILTON CYCLE (BL-HC) is solvable in polynomial time if b = O(log n).

5.1.3 Bounded-Cluster HC

Using the bounded-cluster SWG model, we can define a Bounded-cluster HAMILTON CYCLE (BC-HC) problem:

BC-HC:
INSTANCE: A bcSWG G(V, E), where |V| = n, with graph cluster degree bounded by an integer χ < n.
QUESTION: Does G contain a Hamilton Cycle, i.e., an ordering $\langle V_1, V_2, \ldots, V_n \rangle$ of the vertices of G, where n = |V|, such that $(V_n, V_1) \in E$ and $(V_i, V_{i+1}) \in E$ for all i, 1 ≤ i ≤ n − 1?

Theorem 3. Bounded-cluster HAMILTON CYCLE (BC-HC) is NP-complete for all non-trivial instances (i.e., where cluster degree δ ≥ 3) of a bcSWG G(V, E).

It is known that HC is solvable in polynomial time if G has no vertex with degree exceeding 2 (see, for example, [14]). This translates into a trivial case in terms of maximum cluster degree.

5.1.4 Combined-Model HC

This section studies the complexity of HC when we have both the properties of bounded path-length and cluster size.

B-HC:
INSTANCE: A bSWG G(V, E), where |V| = n, with graph path-length bounded by an integer b < n and graph cluster degree bounded by an integer χ < n.
QUESTION: Does G contain a Hamilton Cycle, i.e., an ordering $\langle V_1, V_2, \ldots, V_n \rangle$ of the vertices of G, where n = |V|, such that $(V_n, V_1) \in E$ and $(V_i, V_{i+1}) \in E$ for all i, 1 ≤ i ≤ n − 1?

Theorem 4. Bounded-SWG HAMILTON CYCLE (B-HC) is NP-complete if b ≥ n and cluster degree δ ≥ 3.

Corollary 1. Bounded-SWG HAMILTON CYCLE (B-HC) is solvable in polynomial time if b = O(log n), and cluster degree δ ≥ 3.

5.2 ROUTING

We refer to routing problems in a generic manner as problems associated with finding paths in networks (graphs). There is a wide variety of definitions of routing problems, such as those described in [14, pp. 211–214]. In this article, we focus on the following definition of routing, which is NP-complete [14].

ROUTING:
INSTANCE: A graph G(V, E), where |V| = n, with two distinguished vertices $V_i$, $V_j$, and integer k.
QUESTION: Does G contain a path from $V_i$ to $V_j$ of length at least k?

We use the bMPL-SWG notion described above to define a Bounded-MPL Routing (BMPL-ROUTING) problem as follows.

BMPL-ROUTING:
INSTANCE: A bMPL-SWG G(V, E), with |V| = n, graph path-length bounded by an integer b < n, two distinguished vertices $V_i$ and $V_j$, and integer k.
QUESTION: Does G contain a path from $V_i$ to $V_j$ of length at least k?
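The ROUTING question above can be answered by exhaustive search over simple paths, which makes the source of hardness visible: the number of simple paths can grow exponentially with the size of the graph. The Python sketch below is such a brute-force decision procedure, given only for illustration; it assumes that "path" means a simple path (no repeated vertices) in an undirected graph stored as a dictionary of neighbour lists.

```python
def has_long_path(adj, source, target, k):
    """Decide whether a simple path from source to target has length >= k."""

    def search(vertex, length, visited):
        if vertex == target:
            # A simple path ends as soon as it reaches the target.
            return length >= k
        for w in adj[vertex]:
            if w not in visited:
                visited.add(w)
                if search(w, length + 1, visited):
                    return True
                visited.remove(w)  # backtrack
        return False

    return search(source, 0, {source})


# Hypothetical example: the 4-cycle 0 - 1 - 2 - 3 - 0. From 0 to 1 there is
# a path of length 1 (direct) and a path of length 3 (around the cycle).
CYCLE = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

When the maximum path-length is bounded by b, no path longer than b exists, so any instance with k > b can be answered "no" without searching.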

Theorem 5. BMPL-ROUTING is NP-complete only if b ≥ k.

We prove Theorem 5 through reduction from BL-HC. We can apply the algorithm for proving Theorem 2 to show:

Theorem 6. BMPL-ROUTING is solvable in polynomial time if b = O(log n).

5.3 Discussion

The key SWG parameter for path-based problems is the MPL $d_G^{\max}(V_i, V_j)$, as shown by our worst-case complexity analysis. When $d_G^{\max}(V_i, V_j) = O(\log |V|)$, we can compute paths in time polynomial in |V|. In addition, there are several polynomial-time approximation algorithms for path-based problems [10] that can be applied, given the dense graphs typical of a small-world network.

6 Complexity of SWG Consistency-Based Problems

6.1 Complexity Results

DiGraph SWG

When we restrict a SWG to be directed, inference for some consistency-based problems remains hard. In particular, we can show the following:

Theorem 7. k-COLOURING on a digraph, where no two nodes joined by a directed edge can have the same colour, is NP-complete for k ≥ 3.

If we further restrict the graph to be a DAG, k-COLOURING still remains NP-complete. Note that a graphical consistency-based problem defined over probabilistic constraints based on a DAG, namely exact inference in a Bayesian network [24], has been shown to be NP-complete [9]. These results indicate that arc-directionality and acyclicity are not the source of hardness for this class of problems.

Bounded-Path-Length SWG

A range of consistency-based problems remain NP-complete when we examine a SWG with a bound on the MPL. In particular, k-COLOURING and k-CLIQUE remain NP-complete on a bMPL-SWG if the bound b < |V|.

Bounded-Cluster SWG

A range of consistency-based problems remain NP-complete when we examine a SWG with a non-trivial bound on the cluster-degree. In particular, k-COLOURING and k-CLIQUE remain NP-complete on a bcSWG if the cluster-degree δ ≥ 3.

Combined-Bound SWG

Finally, we examine a SWG in which we bound both the MPL and the cluster-degree. For this model, bSWG, a range of consistency-based problems remain NP-complete. In particular, k-COLOURING and k-CLIQUE remain NP-complete on a bSWG if the MPL is bounded such that b < |V| and the cluster-degree δ ≥ 3.

6.2 Discussion

This section has shown that several consistency-based problems remain NP-complete for all nontrivial SWG restrictions that we analysed. This means that graph directionality does not affect these consistency-based problems, nor does adding non-trivial bounds to path-length or cluster-size. This concurs with studies of phase transitions for similar problems [30], which show that these problems exhibit intractability for the range of graph parameters that we have examined.

In addition, we should note that many of these consistency-based problems remain computationally intractable under schemes that try to approximate within a bound of ε. For example, such results are known for Bayesian networks [11, 23], and problems related to satisfiability, such as prime implicants [28]. This contrasts with the path-based problems, for which polynomial-time approximation algorithms do exist, e.g., [10].

7 Experimental Evidence

This section reviews related research that provides empirical evidence of complexity of inference within SWG models.

7.1 SWG Path Analysis

7.1.1 Routing
There is evidence from average-case analysis that routing in SWGs is simple, which agrees with our worst-case analysis. Kleinberg [18] has shown that, with high probability, every pair of nodes in an SWG constructed from an $n \times n$ lattice is joined by a path whose length is bounded by a polynomial in $\log n$. This is exponentially smaller than the total number |V| of nodes in the network. This result has been extended [20]: a path of expected length $O(\log^2 n)$ between any two nodes can be found using a simple greedy algorithm that has no global knowledge of long-range links. Further, the expected delivery time is $\Theta(\log^2 n)$.

7.1.2 Average-Case Analysis of Hamilton Cycles in SWGs
Our results so far have focused on worst-case analyses in which we introduced a bound on the maximum path length. In a SWG, however, the path-length bound is defined over the average path length. This section shows how we can extend our analysis to average path lengths, which requires moving to an average-case analysis.
One piece of prior work of particular significance is the analysis of loops in a SWG [2]. This analysis provides further evidence that cycles are unlikely in a typical SWG, and that path lengths are bounded, which strengthens the notion that path-based problems are simple in SWGs. Bianconi and Marsili [2] analyse the average number of loops of arbitrary length $\ell$ in random scale-free networks containing a fixed number n of nodes. They show that Hamiltonian cycles are rare in random scale-free networks, and may fail to appear if the power-law exponent $\gamma$ of the degree distribution is close to two, even for minimal connectivity $k_{\min} \ge 3$. Further, they show that it is impossible to embed in a scale-free network a regular graph of connectivity $c \ge 3$ possessing a Hamilton cycle if $\gamma$ is too small, even if all nodes have $k_i \ge c$. In the intermediate region of relatively large loops, the expected number of loops attains its maximum for loops of size $\ell \simeq n$.
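The greedy routing discussed in Section 7.1.1 can be illustrated with a small Python sketch. It is a simplified illustration only, under stated assumptions: each node of an $n \times n$ grid receives one long-range contact drawn with probability proportional to $d^{-r}$ (with r = 2, as in Kleinberg's lattice model), and each step forwards the message to whichever known contact is closest to the target in lattice distance. The function names and the `seed` parameter are hypothetical and do not come from [18] or [20].

```python
import random


def manhattan(a, b):
    """Lattice (Manhattan) distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_long_range_links(n, r=2.0, seed=0):
    """Give every node of an n x n grid one long-range contact.

    Assumption: the contact v of node u is drawn with probability
    proportional to manhattan(u, v) ** -r.
    """
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(n) for y in range(n)]
    links = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [manhattan(u, v) ** -r for v in others]
        links[u] = rng.choices(others, weights=weights, k=1)[0]
    return links


def lattice_neighbours(u, n):
    """Grid neighbours of u that lie inside the n x n lattice."""
    x, y = u
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < n and 0 <= b < n]


def greedy_route(source, target, n, links):
    """Route greedily using only local knowledge of the current node."""
    path = [source]
    current = source
    while current != target:
        options = lattice_neighbours(current, n) + [links[current]]
        # Some lattice neighbour is always one step closer to the target,
        # so the distance strictly decreases and the loop terminates.
        current = min(options, key=lambda v: manhattan(v, target))
        path.append(current)
    return path
```

Because a lattice neighbour always reduces the distance by one, the route never exceeds the lattice distance between source and target; the long-range links are what can shorten it further.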

7.2 SWG Consistency-Based Problems
Walsh [29] has shown experimentally that colouring in SWGs is hard. Walsh used complete solvers to analyse a variety of combinatorial problems, such as graph colouring, and found a clear correlation between SWG structure and inference complexity: with complete inference algorithms, inference is easy for regular and random graphs, but hard for SWGs. Roli [26] has analysed SWG structure and its impact on SAT complexity using local-search algorithms. Roli found that the type of search algorithm had more impact on inference complexity than the structure. In particular, whereas the complete inference algorithms show a clear correlation between SWG structure and inference complexity, the case for local-search algorithms is nowhere near as clear-cut.
Further, Svenson [27] experimentally studied the relation between SWG structure and inference complexity using complete solvers. Svenson addresses the colouring problem on SWGs created by rewiring square, triangular, and two kinds of cubic lattice (with coordination numbers 5 and 6). His results show that, as the rewiring parameter $p \to 1$, not all cases showed the expected crossover to the behaviour of random graphs with corresponding connectivity. For example, for the cubic lattices there is a region near p = 0 for which the graphs are colourable. This could in principle be used as an additional heuristic for solving real-world colouring or scheduling problems. Moreover, his results show that SWGs with connectivity 5 and $p \simeq 0.1$ provide an interesting ensemble of graphs whose colourability is hard to determine.
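The rewiring parameter p used in these experiments can be made concrete with a short Python sketch. It is illustrative only and rests on a simplifying assumption: it rewires a ring lattice in the style of Watts and Strogatz [31], not the square, triangular or cubic lattices studied in [27]. The function names and the `seed` parameter are hypothetical.

```python
import random


def ring_lattice(n, k):
    """Undirected ring in which each vertex is linked to its k nearest
    neighbours (k even, k < n). Edges are stored as (small, large) pairs."""
    edges = set()
    for u in range(n):
        for step in range(1, k // 2 + 1):
            v = (u + step) % n
            edges.add((min(u, v), max(u, v)))
    return edges


def rewire(edges, n, p, seed=0):
    """With probability p, replace each edge (u, v) by (u, w) for a
    uniformly chosen w, avoiding self-loops and duplicate edges."""
    rng = random.Random(seed)
    result = set(edges)
    for (u, v) in sorted(edges):
        if rng.random() < p:
            candidates = [w for w in range(n)
                          if w != u and (min(u, w), max(u, w)) not in result]
            if candidates:
                w = rng.choice(candidates)
                result.remove((u, v))
                result.add((min(u, w), max(u, w)))
    return result
```

With p = 0 the lattice is returned unchanged, while p = 1 rewires every edge; the number of edges is the same in both cases, so ensembles at different p have identical connectivity.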

8 Summary and Discussion
This article has summarised a set of complexity results on models with small-world properties, focusing on problems solved within a SWG framework. These results indicate a clear distinction in the computational complexity of path-based and consistency-based problems on various SWG models. In particular, we have shown that:
• Path-based problems tend to be polynomial-time or approximable/solvable by greedy algorithms; these problems are not affected by the high clustering of a SWG, and in fact they are easier in dense graphs and directed graphs.
• Consistency-based problems remain NP-complete for the SWG restrictions studied; the computational complexity of these problems is directly affected by the high clustering of a SWG, and directionality does not reduce the complexity class from an intractable to a tractable one.
There are several important implications of these results for the study of complex systems. First, topology is just one issue to be considered in analysing real-world problems within the SWG framework. In particular, we have shown that the actual problem being addressed (problem functionality) is at least as important as the topology of the network. Second, our preliminary results indicate that there may be important distinctions in inference complexity on SWGs. We conjecture, based on a range of graph topologies, that path-based problems can be solved more efficiently than can consistency-based problems. Future work is necessary to see how general this observation is, and whether there are additional problem classes with distinct complexity properties. Third, our study provides supporting evidence for the observation that inference on SWGs is harder than inference on random graphs. It is of great interest to see whether this increased hardness holds for problems beyond HAMILTON CYCLE and COLOURING.
We have shown that the function being performed in a network is important in relation to discrete (atemporal) optimisation problems. In future work, we hope to generalise this analysis to address network dynamics. This will address the interplay between network topology and dynamics, i.e., it will assess which topological features are most important to different types of dynamics, from the perspective of both computational complexity and computability.

References
[1] N. Alon, R. Yuster, and U. Zwick. Color-coding. J. ACM, 42(4):844–856, 1995.
[2] G. Bianconi and M. Marsili. Loops of any size and hamilton cycles in random scale-free networks. J. Stat. Mech., P06005, 2005.
[3] D. Braha and Y. Bar-Yam. Topology of large-scale engineering problem-solving networks. Physical Review E, 69:016113, 2004.
[4] R. P. Brent. Recent progress and prospects for integer factorisation algorithms. In COCOON, pages 3–22, 2000.
[5] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30:107–117, 1998.
[6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. In Proceedings of the 9th international World Wide Web Conference on Computer networks: the international journal of computer and telecommunications networking, pages 309–320, Amsterdam, The Netherlands, 2000. North-Holland Publishing Co.
[7] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. In Proc. WWW Conf. on Computer networks, pages 309–320, 2000.
[8] P. Cheeseman, B. Kanefsky, and W. M. Taylor. Where the Really Hard Problems Are. In Proc. IJCAI-91, pages 331–337, 1991.
[9] G. F. Cooper. The computational complexity of probabilistic inference using bayesian belief networks. Artif. Intell., 42(2-3):393–405, 1990.
[10] B. Csaba, M. Karpinski, and P. Krysta. Approximability of dense and sparse instances of minimum 2-connectivity, TSP and path problems. In SODA ’02, pages 74–83, 2002.
[11] P. Dagum and R. M. Chavez. Approximating probabilistic inference in bayesian belief networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:246–255, March 1993.
[12] R. Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41–85, 1999.
[13] P. Erdos and A. Renyi. On random graphs. Publ. Math. Debrecen, 6:290–297, 1960.
[14] M. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1978.
[15] Y. Gurevich and S. Shelah. Expected computation time for hamiltonian path problem. SIAM J. Comput., 16(3):486–502, 1987.
[16] R. F. i Cancho, C. Janssen, and R. V. Sole. Topology of technology graphs: Small world patterns in electronic circuits. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 64(4):046119, 2001.
[17] D. R. Karger, R. Motwani, and G. D. S. Ramkumar. On approximating the longest path in a graph. Algorithmica, 18(1):82–98, 1997.
[18] J. Kleinberg. The small-world phenomenon: an algorithm perspective. In STOC ’00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 163–170, New York, NY, USA, 2000. ACM Press.
[19] C. Martel and V. Nguyen. Analyzing Kleinberg’s (and other) small-world Models. In PODC ’04, pages 179–188, 2004.
[20] C. Martel and V. Nguyen. Analyzing Kleinberg’s (and other) small-world Models. In PODC ’04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, pages 179–188, New York, NY, USA, 2004. ACM Press.
[21] L. Michel and P. V. Hentenryck. Maintaining longest paths incrementally. In Principles and Practice of Constraint Programming, pages 540–554, 2003.
[22] M. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.
[23] J. D. Park and A. Darwiche. Complexity Results and Approximation Strategies for MAP Explanations. J. Artif. Intell. Res. (JAIR), 21:101–133, 2004.
[24] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[25] G. Provan. The Computational Complexity of Complex Systems: the Role of Topology and Functionality. Technical report, Computer Science Department, University College Cork, Cork, Ireland, 2006.
[26] A. Roli. On the impact of small-world on local search. In Proceedings of the 9th Congress of the Italian Association for Artificial Intelligence, Milan, Italy, 2005.
[27] P. Svenson. From Neel to NPC: Colouring small worlds. cs.CC/0107015, 2004.
[28] C. Umans. On the complexity and inapproximability of shortest implicant problems. Lecture Notes in Computer Science, 1644:687–696, 1999.
[29] T. Walsh. Search in a small world. In IJCAI, pages 1172–1177, 1999.
[30] T. Walsh. The Interface between P and NP: COL, XOR, NAE, 1-in-k, and Horn SAT. In AAAI/IAAI, pages 695–, 2002.
[31] D. J. Watts and S. H. Strogatz. Collective dynamics of “small-world” networks. Nature, 393:440–442, 1998.
[32] H. S. Wilf. An o(1) expected time algorithm for the graph coloring problem. Information Processing Letters, 18:119–122, 1984.
[33] S. Wuchty, E. Ravasz, and A.-L. Barabási. The architecture of biological networks. In T. Deisboeck, J. Y. Kresh, and T. Kepler, editors, Complex Systems in Biomedicine, New York, 2003. Kluwer Academic Publishing.


1 Small-world Networks and Inference
Small-world networks have been used to model the topology of a wide variety of real-world applications, such as biological systems [31], the WWW [6], and human-designed mechanical systems [3, 7]. A small-world network is a complex network such that (a) the nodes are connected into several clusters which are loosely connected, and (b) every node can be reached from every other by a small number of hops or steps. We can measure whether a network is a small world or not according to two (empirical) network measures: clustering coefficient and characteristic (mean-shortest) path length [31]. The clustering coefficient (C) is a measure of how clustered, or locally structured, a graph is. This coefficient is an average of how interconnected each node’s neighbours are. The characteristic path length (L) is the average distance between any two nodes in the network, or more precisely, the average length of the shortest path connecting each pair of nodes.
Given the prevalence of a topology that is common to most complex systems, it is important to identify what types of inference tasks can be performed efficiently on such a structure. To this end, we assume a graph model that displays properties of a small-world network, and study the complexity of inference within graphical models for complex systems, which we call small-world graphs (SWGs).
The most widely used algorithm on a real SWG, the WWW, is the PageRank algorithm [5]. This algorithm is based on computing the unique eigenvector corresponding to the largest eigenvalue for the webgraph’s adjacency matrix. The inference thus involves repeated multiplication of the adjacency matrix into any non-zero vector that is itself not an eigenvector.
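The repeated matrix-vector multiplication described above is the power-iteration method, which the following minimal Python sketch illustrates. It works on the raw adjacency matrix, as in the description in the text; it is not the production PageRank computation of [5], which additionally normalises the link matrix and applies a damping factor. The function name and the fixed iteration count are illustrative assumptions.

```python
def power_iteration(adj, iterations=200):
    """Approximate the dominant eigenpair of a square matrix.

    adj is a list of rows. The start vector is all ones; after every
    multiplication the vector is rescaled to unit Euclidean length.
    """
    n = len(adj)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v (v has unit length) estimates the eigenvalue.
    av = [sum(adj[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(av, v))
    return eigenvalue, v
```

For a connected, non-bipartite undirected graph the iteration converges to a strictly positive eigenvector, whose entries can be read as a ranking of the vertices.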

There are several other important forms of inference on SWGs, most of which have received less attention. For example, we may have a SWG model of a complex system for which we want to diagnose faults in the system. This task involves examining the consistency of observations throughout the system; as a consequence, we call such a task a consistency-based task. A second type of task involves finding paths through a network, such as for packet routing. We call such a task a path-based task.
Several results indicate that there may be wide variability in inference complexity within small-world networks. For example, routing has $O(\log^2 n)$ complexity for small-world graphs constructed from an $n \times n$ lattice [18]; in contrast, consistency-based tasks like colouring a graph of n vertices appear to have complexity exponential in n [8, 29].
This article focuses on worst-case complexity results for well-known NP-complete problems that are restricted to mimic particular properties of small-world graphs. In contrast to most analyses of complex systems, which focus on topology (e.g., [3, 16, 33]), we focus on the influence of both the graph topology and the function that is being computed. We summarise a set of complexity results on models with small-world properties, and discuss the implications of these results for the study of complex systems. The complexity results show that, given identical topology, the worst-case inference complexity differs significantly based on the type of inference. In particular, we identify two broad classes of problem: path-based problems, which have at least polynomial-time complexity, and consistency-based problems, which have exponential-time complexity. To frame our problem as a decision problem (as is necessary to prove worst-case results), we translate the average-case-oriented SWG properties, such as clustering coefficient and mean-shortest path length, into deterministic bounds. We then examine well-known graph problems with bounded SWG properties.
We propose two classes of problems, path-based and consistency-based problems. We show that, given our SWG model: (1) several path-based problems, e.g., HAMILTON CYCLE, ROUTING and LONGEST-PATH (which are NP-complete for general graphs), are polynomial for a bounded SWG; (2) several constraint-based tasks remain NP-complete for both general graphs and a bounded SWG. We conjecture that these results generalise to two classes of problem that have different complexity for a SWG, i.e., that path-based problems can be more efficiently solved than can consistency-based problems for every graph topology analysed. Figure 1 depicts these two classes.

[Figure 1 diagram; labels: Small-World Graph Structure, Path-Based Problems, Consistency-Based Problems, NP-Complete Problems]
Figure 1: Analysis of complexity results for various SWG models, in cases where the problem for an arbitrary graph is NP-complete. Consistency-based problems with SWG structure remain NP-complete, whereas path-based problems with SWG structure have complexity depending on the restrictions imposed by the SWG structure.

We organise the remainder of the document as follows. Section 2 presents the general framework for our complexity analysis. Section 3 describes the notions of computational complexity that are the focus of this article. Section 4 presents the SWG properties that we use for proving our worst-case complexity results. Section 5 presents our complexity results for SWG path analysis. Section 6 presents our complexity results for SWG consistency analysis. Section 7 compares our worst-case results with experimental analyses of SWG path- and consistency-problems. Section 8 summarises our results and discusses the wider implications of these results.

2 Notation

2.1 Graph Theory
This section introduces our notation. We assume that we have a graph G(V, E) with a set V of vertices and a set E of edges.
Definition 1 (Digraph). A digraph is an ordered pair of sets G(V, E), where V is a set of vertices and E is a set of ordered pairs (called arcs) of vertices of V. We say that $V_1$ is a parent of $V_2$ in G, denoted $V_1 = \pi(V_2)$, if $(V_1, V_2)$ is an arc of the digraph.
Definition 2 (Degree of vertex). The in-degree of a vertex is the number of arcs coming into the vertex, and the out-degree is the number of arcs going out of the vertex.
Definition 3 (Path). A path from a vertex $V_0$ to a vertex $V_n$ in a graph G = (V, E) is a sequence of vertices $V_0, V_1, \ldots, V_n$ that satisfies the following: for each i, $0 \le i \le n-1$, $(V_i, V_{i+1}) \in E$ or $(V_{i+1}, V_i) \in E$; that is, every pair of consecutive vertices in the sequence is connected by an arc. $V_0$ is the initial vertex and $V_n$ is the terminal vertex of the path. The length of a path P is the number of edges in P. A path is called a directed path if $(V_i, V_{i+1}) \in E$ for every $0 \le i \le n-1$.
Definition 4 (Cycle). A cycle is a path in which the initial and terminal vertices are the same, that is, $V_0 = V_n$.

2.2 SWG Characteristic 1: Mean Distance
The characteristic path length L of a SWG is a meaningful measure only if the graph is connected, i.e., if there is a sequence of edges joining any two nodes. We adopt the convention that L is infinite if a graph is not connected. In general, to make comparisons more feasible, all graphs we deal with will be connected.
Definition 5 (Connected graph). A graph is connected if there is a path between every pair of its vertices.
We define a notion of distance between two vertices in a graph as follows.
Definition 6 (Graph Distance). Given a graph G(V, E), the distance between two vertices is the number of edges in a shortest path connecting the two vertices. We denote the distance between two vertices $V_i$ and $V_j$ by $d_G(V_i, V_j)$.
Definition 7 (Maximum (Minimum) Graph Distance). Given a graph G(V, E), the maximum (minimum) distance between two vertices is the number of edges in a longest (shortest) path connecting the two vertices. We denote the maximum (minimum) distance between two vertices $V_i$ and $V_j$ by $d_G^{\max}(V_i, V_j)$ ($d_G^{\min}(V_i, V_j)$).
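As an illustration of Definitions 5 and 6, the following Python sketch computes graph distances by breadth-first search and averages them to obtain the characteristic path length L, returning infinity for a disconnected graph as in the convention above. It assumes an unweighted, undirected graph given as a dictionary of neighbour lists; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Distance d_G(source, v) to every vertex v reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct
    vertices; infinite if the graph is not connected."""
    nodes = list(adj)
    total, pairs = 0, 0
    for u in nodes:
        dist = bfs_distances(adj, u)
        if len(dist) < len(nodes):
            return float("inf")
        total += sum(dist.values())
        pairs += len(nodes) - 1
    return total / pairs if pairs else 0.0
```

One breadth-first search per vertex is enough, so the whole computation is polynomial in the size of the graph.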

2.3 SWG Characteristic 2: Clustering Coefficient
We define graph clustering to characterise the degree of cliquishness of a typical neighbourhood (a node’s immediately connected neighbours). For example, a graph is a clique if it is fully-connected. The clustering coefficient $C_i$ for a vertex $V_i$ is the proportion of links between the vertices within its neighbourhood $N_i$ divided by the number of links that could possibly exist between them.
Definition 8 (Graph Clustering Coefficient). The graph clustering coefficient is the average of the clustering coefficients for each vertex [31]:
\[ \bar{C} = \frac{1}{n} \sum_{i=1}^{n} C_i . \]
The clustering coefficient for vertex i is given by:
\[ C_i = \frac{|\{e_{jk} : V_j, V_k \in N_i,\ e_{jk} \in E\}|}{k_i (k_i - 1)} . \]
The notion of SWG is derived primarily from empirical observation, i.e., real-world complex systems have graphs with a particular topology that can be captured in terms of mean distance L and clustering coefficient $\bar{C}$. In particular, we compare the parameters $(L, \bar{C})$ with those of an Erdős–Rényi random graph G(n, p) [13], i.e., $(L_r, \bar{C}_r) = \left(\frac{\ln(n)}{\ln(pn)}, p\right)$.
Definition 9 (SWG). A small-world graph (SWG) is a graph G(V, E) that has small-world properties measurable in terms of its mean distance L and clustering coefficient $\bar{C}$; i.e., given a random graph G(n, p) with mean distance $L_r$ and clustering coefficient $\bar{C}_r$, $L \simeq L_r$ and $\bar{C} \gg \bar{C}_r$.
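The quantities in Definition 8 can be computed directly, as in the following Python sketch. It is a minimal illustration under stated assumptions: the graph is undirected and stored as a dictionary of symmetric neighbour sets, $k_i$ is taken to be the number of neighbours of $V_i$, and each undirected edge between two neighbours is counted once in each direction so that the count matches the $k_i(k_i - 1)$ denominator. The function names are illustrative.

```python
import math


def clustering_coefficient(adj, i):
    """C_i: links among the neighbours of i, divided by k_i * (k_i - 1)."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention assumed here for vertices with < 2 neighbours
    links = sum(1 for j in neighbours for l in neighbours
                if j != l and l in adj[j])
    return links / (k * (k - 1))


def graph_clustering_coefficient(adj):
    """C-bar: the average of C_i over all vertices."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


def random_graph_reference(n, p):
    """(L_r, C_r) for an Erdos-Renyi graph G(n, p), as quoted in the text."""
    return math.log(n) / math.log(p * n), p
```

Comparing `graph_clustering_coefficient` with the second component of `random_graph_reference` gives the $\bar{C} \gg \bar{C}_r$ test of Definition 9 for a concrete graph.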

3 Complexity Analysis
This section introduces the notions of problem complexity that are the focus of this article. In particular, we compare and contrast worst-case complexity concepts with the average-case complexity analyses that are more common in the complex networks literature.

3.1 Worst-Case Complexity Analysis
The most common method of analyzing the performance of algorithms is to specify the worst-case complexity of the algorithm, i.e., the longest time that the algorithm could possibly take, expressed in terms of the problem parameters.¹ For example, an O(n) algorithm on a graph of n nodes will have to search, at the worst, all n nodes to return a result. Within this framework, the primary classification of interest in this article is whether a problem is in class P or NP. We frame our problems in terms of a decision problem, for which a yes/no answer exists. For example, a decision problem is whether a graph contains a fully-connected subgraph of k vertices or more. The class P consists of all decision problems that can be solved on a deterministic sequential machine in an amount of time that is polynomial in the size of the input; the class NP consists of all those decision problems whose positive solutions can be verified in polynomial time given the right information, or equivalently, whose solution can be found in polynomial time on a non-deterministic machine.²
In this article we focus on NP-complete problems, which can be informally described as the hardest problems in NP, and as a consequence are the problems in NP least likely to have polynomial-time algorithms. NP-hard problems are those into which any problem in NP can be transformed through some efficient transformation, i.e., one which takes at most a polynomially-bounded number of steps. NP-complete problems are those NP-hard problems that are contained in NP. Unfortunately, many important problems have been shown to be NP-complete, and researchers have been unable to find a single fast algorithm for any of them. Some important classical NP-complete problems include SATISFIABILITY, HAMILTON CYCLE, TRAVELING SALESMAN, and k-COLOURING [14]. Showing that a problem is NP-complete is significant in that it provides a strong indication that no efficient algorithm exists for this problem.
¹ See a standard text such as [14] for more detail.
² These definitions are adapted from Wikipedia.
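The phrase "verified in polynomial time given the right information" can be illustrated for HAMILTON CYCLE. In the Python sketch below the certificate is a proposed ordering of the vertices, and the check takes time polynomial in the size of the graph. The function name and the edge-list representation of an undirected graph are illustrative assumptions.

```python
def is_hamilton_cycle(vertices, edges, ordering):
    """Check a certificate for HAMILTON CYCLE on an undirected graph.

    ordering must list every vertex exactly once, and each pair of
    consecutive vertices (wrapping around to the start) must be an edge.
    """
    n = len(ordering)
    if n < 3 or n != len(vertices) or set(ordering) != set(vertices):
        return False
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((ordering[i], ordering[(i + 1) % n])) in edge_set
               for i in range(n))
```

Checking a given ordering is easy; the difficulty captured by NP-completeness lies in finding such an ordering among the exponentially many candidates.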

3.2 Average-case Complexity Analysis
The analysis of complex systems, since it is based on the theory of random graphs, is primarily focused on average-case analysis, since this type of analysis is well-suited to the probabilistic nature of random graphs. It is important to note that random graphs are used as a model of a real-world system, the structure of which may be inherently deterministic. Deterministic models, like the decision models we explore, may be equally valid methods for describing complex systems if they adequately capture the properties of the complex system.
Average-case complexity results express different concepts than worst-case results; they state the complexity of inference based on a distribution D of problem instances. In other words, an efficient average-case algorithm solves most instances in a distribution D efficiently, although it may be inefficient on some instances. Hence, average-case results are valid to the extent that the distribution D faithfully reflects the space of problem instances. Efficient average-case algorithms have been developed for several NP-complete problems, such as the vertex-colouring problem [32] and dense instances of the Hamiltonian path problem [15], under commonly used distributions on random graphs.
However, the study of complex systems has moved away from the analysis of random graphs, since it has been shown that random graphs do not capture the structural properties of complex systems (see [22] for an overview). In the place of random graphs, researchers have proposed a variety of alternative models, all of which capture the structural properties of complex systems with varying degrees of fidelity.
There is some preliminary evidence that average-case inference on SWGs is harder than that for random graphs. For example, Walsh [29] has shown empirically that vertex-colouring is significantly harder for randomised distributions over small-world graphs than for similar distributions over random graphs; hence, it appears unlikely that the algorithm of [15] will be efficient for either dense or less-dense SWGs. Our worst-case complexity results also indicate that inference on SWGs is harder than inference on random graphs.
To complement this average-case analysis, we examine several model types, each of which captures different properties of a complex system. For example, our study of directed graphs captures the directionality inherent in many real-world systems, like the WWW. By comparing the worst-case complexity of these different models, we are able to compare the relative impact of model properties on inference. For example, by comparing directed vs. undirected graphs, we can examine the effect of edge directionality on path-based and consistency-based tasks.

4 Bounded Small-World Graph Model

4.1 Problem Classification
Based on evidence of efficient algorithms over SWGs for routing and none for colouring, we study two problem classes that generalise routing and colouring, which we call path-based and consistency-based classes, respectively. We define these classes as follows.
Path-based Problem. We characterise this class of problem in an SWG as identifying a particular type of path in the graph. This class includes problems like Hamilton Cycle/Path (computing a Hamiltonian cycle/path) and Routing problems (computing a path from one vertex to another).
Definition 10 (Path-Based Problem). Given a graph G(V, E) (which may have costs ϕ associated with edges, vertices or both edges and vertices), a path-based problem concerns finding a path in G(V, E) with properties π and cost ϕ governed by a particular bound b.
For example, for Hamilton Path π denotes a path that visits every $V_i \in V$ exactly once, and ϕ is a trivial cost function where every edge has cost 1. The difficulty of solving this class of problem is often proportional to the size of the graph diameter.
Consistency-based Problem. We characterise this problem class as identifying in an SWG G(V, E) a consistent assignment given a constraint set defined over G(V, E).
Definition 11 (Consistency-Based Problem). Given a graph G(V, E) where vertex $V_i$ has domain of values $D_i$, a consistency-based problem on a graph associates a constraint with every edge. That is, for every edge $E_k = (V_i, V_j)$, where $E_k \in E$ and $V_i, V_j \in V$, there exists a constraint $\varphi_k$ that restricts the values that $V_i$ and $V_j$ can simultaneously hold. Given this representation, solving a consistency-based problem entails assigning a value to every vertex $V_j \in V$ such that all constraints $\varphi_k \in \Phi$ are satisfied.
This class of problem includes colouring, k-clique, SAT, Constraint Satisfaction Problems (CSP), Bayesian Networks (BN), etc. For the k-clique problem the constraints are the (trivial) connectivity of k vertices, so we just have to identify the clique. Other problems have a nontrivial constraint set, e.g., for 3-colouring we must colour the vertices with 3 colours so that no adjacent vertices have the same colour. The complexity of solving this class of problem is proportional to the size of the maximum clique in G(V, E).
We will show that the inference complexity depends on the problem, with problems that involve path-analysis typically being simple for SWGs, and problems that involve constraint satisfaction typically being hard for SWGs.
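Definition 11 can be made concrete with a small complete solver. The Python sketch below represents a consistency-based problem as a domain per vertex and a binary constraint per edge, and searches for a consistent assignment by chronological backtracking, which takes exponential time in the worst case. It is illustrative only and is not one of the solvers used in the cited experiments; the function names are hypothetical.

```python
def solve(domains, constraints):
    """Return an assignment satisfying every edge constraint, or None.

    domains:     dict vertex -> list of candidate values
    constraints: dict (Vi, Vj) -> predicate(value_i, value_j)
    """
    order = list(domains)
    assignment = {}

    def consistent(v, value):
        # Check v against every already-assigned vertex it shares an edge with.
        for (a, b), allowed in constraints.items():
            if a == v and b in assignment and not allowed(value, assignment[b]):
                return False
            if b == v and a in assignment and not allowed(assignment[a], value):
                return False
        return True

    def backtrack(index):
        if index == len(order):
            return True
        v = order[index]
        for value in domains[v]:
            if consistent(v, value):
                assignment[v] = value
                if backtrack(index + 1):
                    return True
                del assignment[v]
        return False

    return dict(assignment) if backtrack(0) else None


def different(a, b):
    """The colouring constraint: adjacent vertices take different values."""
    return a != b


def colouring_problem(vertices, edges, k):
    """Encode k-colouring as a consistency-based problem."""
    domains = {v: list(range(k)) for v in vertices}
    constraints = {edge: different for edge in edges}
    return domains, constraints
```

For example, `solve(*colouring_problem([0, 1, 2], [(0, 1), (1, 2), (0, 2)], 3))` returns a proper 3-colouring of a triangle, whereas the same call on the complete graph on four vertices returns `None`.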

4.2 Bounded SWG Models
This section defines the bounded SWG models that we use for proving worst-case complexity results.

4.2.1 Directed SWG Model
Our first bounded SWG model assumes that a SWG is directed, and is denoted di-SWG. This assumption holds in many real-world systems; for example, the WWW is arguably a digraph, as are natural systems such as river systems, biological systems (flows of blood, nervous-system signals, etc. are all directed), and technological systems (electrical circuits, etc.). We look at a particular restriction on the di-SWG, a directed acyclic graph (DAG), or DAG-SWG. A DAG-SWG can model many complex systems, such as river systems, and it plays a role in several important computational frameworks, such as Bayesian Networks [24].

4.2.2 Path-Length-Bounded SWG Model
We introduce a graph with bounded path-length because of the following observations:
• real-world graphs have short distances (and in many cases path-lengths) [22];

• the results of theoretical analyses of scale-free graph models of a SWG indicate that graph distances are bounded by $\log n$ for n vertices, even for graphs with large n [22].
In order to develop NP-completeness results, we need to modify the SWG-based notion of mean distance, which is an average-case result, into a hard bound, that of maximum path-length. As a consequence, we use a model of a small-world graph (SWG) with bounded maximum path-length (MPL), as follows.
Definition 12 (bMPL-SWG). A bounded-MPL small-world graph (bMPL-SWG) is a SWG G(V, E) in which every pair of nodes has a graph maximum path-length (MPL) bounded by an integer b < |V|.

4.2.3 Cluster-Bounded SWG Model
The maximum cluster-size of a graph is important, since it governs the inference complexity of a wide range of consistency-based problems [12], such as constraint satisfaction, Bayesian network inference, and causal network model-based diagnosis. We assume that a SWG can be viewed as a set of clusters $\Gamma = \{\gamma_1, \ldots, \gamma_q\}$, which are loosely connected. In the extreme, each cluster will be fully-connected, i.e., a clique. Any vertex $V_i$ in a graph may be a member of multiple clusters; the cluster degree of $V_i$ is the size of the clusters of which $V_i$ is a member. The maximum cluster degree $\delta^*$ of $V_i$ is given by:
\[ \delta^*(V_i) = \max_{j : V_i \in \gamma_j} |\gamma_j| . \]
We propose a model of a small-world graph (SWG) with bounded cluster size, as follows.
Definition 13 (bcSWG). A bounded-cluster small-world graph (bcSWG) is a SWG G(V, E) in which every vertex $V_i$ has a graph cluster degree $\delta_i$ bounded by an integer χ < |V|.

4.2.4 Bounded SWG Model
We can propose a “combined” model of a small-world graph (SWG) with bounded path-length and cluster size, as follows.
Definition 14 (bSWG). A bounded SWG (bSWG) is a SWG G(V, E) in which (1) every pair of nodes has a graph MPL bounded by an integer b < |V|, and (2) every vertex $V_i$ has a graph cluster degree $\delta_i$ bounded by an integer χ < |V|.

SWG Model Class | Sub-Class                 | Path-Based                                 | Consistency-Based
DiGraph         | Acyclic (DAG)             | Polynomial                                 | NP-complete
DiGraph         | Cyclic                    | NP-complete (polynomial-time approximable) | NP-complete
Bounded SWG     | Path-Length               | Polynomial-bounded                         | NP-complete
Bounded SWG     | Clique-Size               | NP-complete                                | NP-complete
Bounded SWG     | Path-Length + Clique-Size | Polynomial-bounded                         | NP-complete

Figure 2: Summary of worst-case complexity results for various SWG models. For each case, the problem for an arbitrary graph is NP-complete.

5 Complexity of SWG Path Analysis
In this section we analyse the complexity of three path-based problems, HAMILTON CYCLE, LONGEST-PATH and ROUTING. Figure 2 summarises the complexity results for both the path-based and consistency-based problem classes.³ We selected these problems because they are all classical NP-complete problems [14]. A HAMILTON CYCLE is a cycle in an undirected graph that visits each vertex exactly once and also returns to the starting vertex. In addition, we examine the LONGEST-PATH problem, which is closely related to HC. In a graph G(V, E) with all edge weights of 1, LONGEST-PATH asks whether there is a simple path of weight (length) k or more. We also examine the ROUTING problem, which concerns finding a path between two distinguished vertices in a graph that meets particular weight bounds. We now examine the complexity of HAMILTON CYCLE, LONGEST-PATH and ROUTING when the underlying graph is modelled using a SWG with directed edges, and with bounds on MPL and/or cluster-size.

5.1

HAMILTON CYCLE

5.1.1 Di-Graph Path-Based Problems

Some path-based problems are simple on a digraph. For example, the ROUTING problem is O(log n) [19]. However, several path-based problems remain hard on a digraph. In particular, HAMILTON CYCLE and LONGEST-PATH both remain NP-complete on a digraph. For these NP-complete digraph problems, the known approximation results are promising. For example, it can be shown that a simple greedy algorithm finds long paths in a SWG when the graph is dense [17]. When we restrict the digraph to a DAG, we have the following results:

• LONGEST-PATH has O(n) complexity, using dynamic programming [21].
• HAMILTON CYCLE has polynomial-time complexity.
• ROUTING has O(log n) complexity [19].
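The dynamic-programming idea behind the DAG result for LONGEST-PATH can be sketched as follows. This is a minimal illustration under stated assumptions, not the incremental algorithm of [21]: edges have unit weight, the graph is given as a dictionary mapping every vertex to its list of successors, and the function name `longest_path_length` is hypothetical. The sketch visits each vertex and edge once, so its running time is linear in the number of vertices plus edges.

```python
from collections import deque


def longest_path_length(succ):
    """Length (in edges) of the longest path in a DAG with unit edge weights.

    `succ` maps every vertex (including sinks) to a list of its successors;
    this input format is an assumption of the sketch.
    Raises ValueError if the graph contains a directed cycle.
    """
    # Count incoming edges so vertices can be processed in topological order.
    indegree = {v: 0 for v in succ}
    for u in succ:
        for v in succ[u]:
            indegree[v] += 1

    queue = deque(v for v in succ if indegree[v] == 0)
    best = {v: 0 for v in succ}  # best[v] = longest path ending at v
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in succ[u]:
            # Extending the best path into u by the edge (u, v).
            best[v] = max(best[v], best[u] + 1)
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    if processed != len(succ):
        raise ValueError("graph is not acyclic")
    return max(best.values(), default=0)
```

For example, on the DAG with edges a→b, a→c, b→d, c→d and d→e the function returns 3, the length of the path a, b, d, e.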

5.1.2 Bounded-Path-Length HC

We adopt the bMPL-SWG notion described above, and define a Bounded-path-length HAMILTON CYCLE (BL-HC) problem as follows.

BL-HC:
INSTANCE: A bMPL-SWG G(V, E), with a graph path-length bounded by an integer b < |V|.
QUESTION: Does G contain a Hamilton Cycle, i.e., an ordering ⟨V1, V2, ..., Vn⟩ of the vertices of G, where n = |V|, such that (Vn, V1) ∈ E and (Vi, Vi+1) ∈ E ∀i, 1 ≤ i < n?

Theorem 1. Bounded-path-length HAMILTON CYCLE (BL-HC) is NP-complete if b ≥ n.

We prove this result by reduction from HAMILTON CYCLE, using a straightforward construction. We omit the full proofs for space reasons. If b < n, we can prove that BL-HC is in both NP and co-NP,⁴ so its worst-case complexity is uncertain, similar to the situation for INTEGER FACTORING [4]. We prove that BL-HC is in co-NP by defining the complement of BL-HC, which asks whether a bMPL-SWG G(V, E) has no Hamilton cycle. We show that an oracle can test an instance of BL-HC-complement in polynomial time.

³ Due to space constraints, we summarise the complexity results and omit proofs; full details and proofs can be found in [25].
⁴ Co-NP is the class of problems that have succinct non-membership witnesses.

However, for a bound b = O(log n) we have a polynomial-time algorithm; this is proven by generalising a result of [1]. Note that this O(log n) bound is reasonable for a SWG, since L = O(log n) in most real-world systems [22].

Theorem 2. Bounded-path-length HAMILTON CYCLE (BL-HC) is solvable in polynomial time if b = O(log n).
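Reference [1] is the colour-coding paper of Alon, Yuster and Zwick. As a rough illustration of that technique (and not of the generalisation used to prove Theorem 2, whose proof is omitted here), the sketch below tests whether a graph contains a simple path on k vertices. Each trial colours the vertices at random with k colours and then searches, by dynamic programming over colour subsets, for a path whose vertices all have distinct colours. One trial costs time proportional to 2^k times the number of edges, which is polynomial in n when k = O(log n). The function names, the adjacency-dictionary input, the fixed trial count and the seed are all assumptions made for this example.

```python
import random


def has_colourful_path(adj, colour, k):
    """True if some path on k vertices uses k pairwise distinct colours."""
    # reach[v] holds bitmasks of colour sets of colourful paths ending at v.
    reach = {v: {1 << colour[v]} for v in adj}
    for _ in range(k - 1):
        extended = {v: set() for v in adj}
        for u in adj:
            for mask in reach[u]:
                for v in adj[u]:
                    bit = 1 << colour[v]
                    if not mask & bit:  # colour of v is still unused
                        extended[v].add(mask | bit)
        reach = extended
    return any(reach[v] for v in adj)


def has_k_path(adj, k, trials=200, seed=0):
    """Randomised test for a simple path on k vertices.

    `adj` maps every vertex to a list of its neighbours (assumed format).
    A True answer is always correct; a False answer can be wrong with a
    probability that shrinks as `trials` grows.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        colour = {v: rng.randrange(k) for v in adj}
        if has_colourful_path(adj, colour, k):
            return True
    return False
```

Distinct colours force distinct vertices, so any colourful path found is a genuine simple path; the randomness only affects how many trials are needed before an existing path happens to be coloured with k different colours.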

5.1.3 Bounded-Cluster HC

Using the bounded-cluster SWG model, we can define a Bounded-cluster HAMILTON CYCLE (BC-HC) problem:

BC-HC:
INSTANCE: A bcSWG G(V, E), where |V| = n, with graph cluster degree bounded by an integer χ < n.
QUESTION: Does G contain a Hamilton Cycle, i.e., an ordering ⟨V1, V2, ..., Vn⟩ of the vertices of G, where n = |V|, such that (Vn, V1) ∈ E and (Vi, Vi+1) ∈ E ∀i, 1 ≤ i < n?

Theorem 3. Bounded-cluster HAMILTON CYCLE (BC-HC) is NP-complete for all non-trivial instances (i.e., where cluster degree δ ≥ 3) of a bcSWG G(V, E).

It is known that HC is solvable in polynomial time if G has no vertex with degree exceeding 2 (see, for example, [14]). This translates into a trivial case in terms of maximum cluster degree.

5.1.4 Combined-Model HC

This section studies the complexity of HC when we have both the properties of bounded path-length and cluster size.

B-HC:
INSTANCE: A bSWG G(V, E), where |V| = n, with graph path-length bounded by an integer b < n and graph cluster degree bounded by an integer χ < n.
QUESTION: Does G contain a Hamilton Cycle, i.e., an ordering ⟨V1, V2, ..., Vn⟩ of the vertices of G, where n = |V|, such that (Vn, V1) ∈ E and (Vi, Vi+1) ∈ E ∀i, 1 ≤ i < n?

Theorem 4. Bounded-SWG HAMILTON CYCLE (B-HC) is NP-complete if b ≥ n and cluster degree δ ≥ 3.

Corollary 1. Bounded-SWG HAMILTON CYCLE (B-HC) is solvable in polynomial time if b = O(log n), and cluster degree δ ≥ 3.

5.2 ROUTING

We refer to routing problems in a generic manner as problems associated with finding paths in networks (graphs). There is a wide variety of definitions of routing problems, such as those described in [14, pp. 211–214]. In this article, we focus on the following definition of routing, which is NP-complete [14].

ROUTING:
INSTANCE: A graph G(V, E), where |V| = n, with two distinguished vertices Vi, Vj, and integer k.
QUESTION: Does G contain a path from Vi to Vj of length at least k?

We use the bSWG notion described above to define a Bounded-MPL Routing (BMPL-ROUTING) problem as follows.

BMPL-ROUTING:
INSTANCE: A bMPL-SWG G(V, E), with |V| = n, graph path-length bounded by an integer b < n, two distinguished vertices Vi and Vj, and integer k.
QUESTION: Does G contain a path from Vi to Vj of length at least k?
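To make the ROUTING question concrete, the following sketch decides it by exhaustive depth-first search. It assumes that "path" means a simple path (no vertex repeated) and that length is counted in edges; the function name `has_long_path` and the adjacency-dictionary input are illustrative choices. The search may enumerate exponentially many paths in the worst case, which is in keeping with the NP-completeness of the unrestricted problem.

```python
def has_long_path(adj, source, target, k):
    """True if a simple path from `source` to `target` has at least k edges.

    `adj` maps every vertex to a list of its neighbours (assumed format).
    """

    def extend(vertex, length, visited):
        if vertex == target:
            # A simple path to the target ends as soon as it arrives there.
            return length >= k
        for nxt in adj[vertex]:
            if nxt not in visited:
                visited.add(nxt)
                if extend(nxt, length + 1, visited):
                    return True
                visited.remove(nxt)  # backtrack and try another branch
        return False

    return extend(source, 0, {source})
```

On a four-vertex ring 0, 1, 2, 3 the two simple paths from vertex 0 to vertex 1 have lengths 1 and 3, so the answer is yes for k = 3 and no for k = 4.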

Theorem 5. BMPL-ROUTING is NP-complete only if b ≥ k.

We prove Theorem 5 through reduction from BL-HC. We can apply the algorithm for proving Theorem 2 to show:

Theorem 6. BMPL-ROUTING is solvable in polynomial time if b = O(log n).

5.3 Discussion

The key SWG parameter for path-based problems is the MPL d^max_G(Vi, Vj), as shown by our worst-case complexity analysis. When d^max_G(Vi, Vj) = O(log |V|), we can compute paths in time polynomial in |V|. In addition, there are several polynomial-time approximation algorithms for path-based problems [10] that can be applied, given the dense graphs typical of a small-world network.
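Whether a given graph falls in the logarithmic regime can be checked directly. The sketch below rests on an interpretation that this section does not itself spell out: it reads d^max_G as the largest shortest-path distance between any ordered pair of mutually reachable vertices, and computes it with one breadth-first search per vertex. The function name and input format are hypothetical.

```python
from collections import deque


def max_shortest_path_length(adj):
    """Largest shortest-path distance (in edges) over reachable vertex pairs.

    `adj` maps every vertex to a list of its neighbours (assumed format).
    Unreachable pairs are ignored rather than treated as infinite.
    """
    worst = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst
```

The result can then be compared with log |V| to judge whether the bound b = O(log n) assumed by Theorems 2 and 6 is plausible for the graph at hand.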

6 Complexity of SWG Consistency-Based Problems

6.1 Complexity Results

DiGraph SWG. When we restrict a SWG to be directed, inference for some consistency-based problems remains hard. In particular, we can show the following:

Theorem 7. k-COLOURING on a digraph, where no two nodes joined by a directed edge may have the same colour, is NP-complete for k ≥ 3.

If we further restrict the graph to be a DAG, k-COLOURING remains NP-complete. Note that a graphical consistency-based problem defined over probabilistic constraints based on a DAG, namely exact inference in a Bayesian network [24], has been shown to be NP-complete [9]. These results indicate that arc-directionality and acyclicity are not the source of hardness for this class of problems.

Bounded-Path-Length SWG. A range of consistency-based problems remain NP-complete when we examine a SWG with a bound on the MPL. In particular, k-COLOURING and k-CLIQUE remain NP-complete on a bMPL-SWG if the bound b < |V|.

Bounded-Cluster SWG. A range of consistency-based problems remain NP-complete when we examine a SWG with a non-trivial bound on the cluster-degree. In particular, k-COLOURING and k-CLIQUE remain NP-complete on a bcSWG if the cluster-degree δ ≥ 3.

Combined-Bound SWG. Finally, we examine a SWG in which we bound both the MPL and the cluster-degree. For this model, bSWG, a range of consistency-based problems remain NP-complete. In particular, k-COLOURING and k-CLIQUE remain NP-complete on a bSWG if the MPL is bounded such that b < |V| and the cluster-degree δ ≥ 3.
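For comparison with the path-based problems, a consistency-based problem such as k-COLOURING can be decided by backtracking search, sketched below. This is a generic textbook procedure rather than a method taken from this paper; it takes exponential time in the worst case, as the NP-completeness results above would lead one to expect. The sketch assumes that the graph is a dictionary from each vertex to its neighbours, with every edge listed in both directions so that a directed edge acts as a symmetric constraint. The function name is hypothetical.

```python
def k_colouring(adj, k):
    """Return a dict vertex -> colour in range(k), or None if none exists.

    `adj` maps every vertex to a list of its neighbours, with each edge
    listed in both directions (assumed format).
    """
    # Heuristic: colour high-degree vertices first to fail early.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    colour = {}

    def assign(i):
        if i == len(order):
            return True  # every vertex has a consistent colour
        v = order[i]
        used = {colour[u] for u in adj[v] if u in colour}
        for c in range(k):
            if c not in used:
                colour[v] = c
                if assign(i + 1):
                    return True
                del colour[v]  # undo and try the next colour
        return False

    return dict(colour) if assign(0) else None
```

A triangle, for instance, has no colouring with k = 2 but does have one with k = 3.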

6.2 Discussion

This section has shown that several consistency-based problems remain NP-complete for all non-trivial SWG restrictions that we analysed. This means that graph directionality does not affect these consistency-based problems, nor does adding non-trivial bounds to path-length or cluster-size. This concurs with studies of phase transitions for similar problems [30], which show that these problems exhibit intractability for the range of graph parameters that we have examined. In addition, we should note that many of these consistency-based problems remain computationally intractable under schemes that try to approximate within a bound of ε. For example, such results are known for Bayesian networks [11, 23], and for problems related to satisfiability, such as prime implicants [28]. This contrasts with the path-based problems, for which polynomial-time approximation algorithms do exist, e.g., [10].

7 Experimental Evidence

This section reviews related research that provides empirical evidence on the complexity of inference within SWG models.

7.1 SWG Path Analysis

7.1.1 Routing

There is evidence (using average-case analysis) that routing in SWGs is simple, as has been shown by our worst-case analysis. Kleinberg [18] has shown that, with high probability, there exist paths between every pair of nodes in a SWG constructed from an n×n lattice whose lengths are bounded by a polynomial in log n. This is exponentially smaller than the total number |V| of nodes in the network. This result has been extended [20] to show that a path between any two nodes with expected length O(log² n) can be found using a simple greedy algorithm which has no global knowledge of long-range links. Further, expected delivery time is Θ(log² n).

7.1.2 Average-Case Analysis of Hamilton Cycles in SWGs

Our results so far have focused on worst-case analyses where we introduced a bound on the maximum path length. In a SWG, the path-length bound is defined over the average path-length. This section shows how we can extend our analysis to consider the case of average path-lengths. To do this, we must move to an average-case analysis. One piece of prior work that is of particular significance is the analysis of loops in a SWG [2]. This analysis provides further evidence that cycles are unlikely in a typical SWG, and that path-lengths are bounded. This further strengthens the notion that path-based problems are simple in SWGs. Bianconi and Marsili [2] analyse the average number of loops in random scale-free networks containing a fixed number n of nodes, for loops of arbitrary length ℓ. They show that Hamiltonian cycles are rare in random scale-free networks and may fail to appear if the power-law exponent γ of the degree distribution is close to two, even for minimal connectivity k_min ≥ 3. Further, they show that it is impossible to embed in scale-free networks a regular graph of connectivity c ≥ 3 possessing a Hamilton cycle if γ is too small, even if all nodes have k_i ≥ c. In the intermediate region of relatively large loops, the expected number of loops attains its maximum for loops of size ℓ ≃ n.
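The greedy routing result discussed in Section 7.1.1 can be illustrated with a small simulation. The sketch below is a simplified version of the lattice model of [18], under assumptions made for this example only: each node of an n×n grid receives one directed long-range link, chosen with probability proportional to d⁻² where d is the lattice distance, and a message is always forwarded to the contact closest to the target. Grid size, seed and function names are hypothetical.

```python
import random


def lattice_distance(a, b):
    """Manhattan distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def build_long_links(n, rng):
    """Give each node one long-range contact, drawn with weight d**-2."""
    nodes = [(x, y) for x in range(n) for y in range(n)]
    links = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [lattice_distance(u, v) ** -2 for v in others]
        links[u] = rng.choices(others, weights=weights, k=1)[0]
    return links


def greedy_route(n, links, source, target):
    """Number of hops taken by greedy forwarding from source to target.

    Each step uses only local information: the four grid neighbours and
    the current node's own long-range contact.
    """
    current, hops = source, 0
    while current != target:
        x, y = current
        candidates = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
        ]
        candidates.append(links[current])
        current = min(candidates, key=lambda v: lattice_distance(v, target))
        hops += 1
    return hops
```

Some grid neighbour is always one step closer to the target, so the walk terminates and never takes more hops than the lattice distance; the long-range links are what can shorten it further.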

7.2 SWG Consistency-Based Problems

Walsh [29] has experimentally shown that colouring in SWGs is hard. Walsh used complete solvers to analyse a variety of combinatorial problems, such as graph colouring, and found a clear correlation between SWG structure and inference complexity using the complete inference algorithms, i.e., inference is easy for regular and random graphs, but hard for SWGs. Roli [26] has analysed SWG structure and its impact on SAT complexity using local-search algorithms. Roli found that the type of search algorithm had more impact on inference complexity than the structure. In particular, whereas the complete inference algorithms show a clear correlation between SWG structure and inference complexity, the case for local search algorithms is nowhere near as clear-cut. Further, Svenson [27] experimentally studied the relation between SWG structure and inference complexity using complete solvers. Svenson addresses the colouring problem on SWGs created by rewiring square, triangular, and two kinds of cubic lattice (with coordination numbers 5 and 6). His results show that, as the rewiring parameter p → 1, not all cases showed the expected crossover to the behaviour of random graphs with corresponding connectivity. For example, for the cubic lattices there is a region near p = 0 for which the graphs are colourable. This could in principle be used as an additional heuristic for solving real-world colouring or scheduling problems. Moreover, his results show that SWGs with connectivity 5 and p ≃ 0.1 provide an interesting ensemble of graphs whose colourability is hard to determine.

8 Summary and Discussion

This article has summarised a set of complexity results on models with small-world properties, focusing on problems solved within a SWG framework. These results indicate a clear distinction in the computational complexity of path-based and consistency-based problems on various SWG models. In particular, we have shown that:

• Path-based problems tend to be polynomial-time or approximable/solvable by greedy algorithms; these problems are not affected by high clustering of a SWG, and in fact they are easier in dense graphs and directed graphs.
• Consistency-based problems remain NP-complete for the SWG restrictions studied; the computational complexity of these problems is directly affected by the high clustering of a SWG, and directionality does not reduce the complexity class from an intractable to a tractable one.

There are several important implications of these results for the study of complex systems. First, topology is just one issue to be considered in analysing real-world problems within the SWG framework. In particular, we have shown that the actual problem being addressed (problem functionality) is at least as important as the topology of the network. Second, our preliminary results indicate that there may be important distinctions in inference complexity on SWGs. We conjecture that path-based problems can be solved more efficiently than can consistency-based problems, based on a range of graph topologies. Future work is necessary to see how general this observation is, and whether there are additional problem classes with distinct complexity properties. Third, our study provides supporting evidence for the observation that inference on SWGs is harder than inference on random graphs. It is of great interest to see if this increased hardness holds for problems beyond HAMILTON CYCLE and COLOURING.

We have shown that the function being performed in a network is important, in relation to discrete (atemporal) optimisation problems. In future work, we hope to generalise this analysis to address network dynamics. This will address the interplay between network topology and dynamics, i.e., it will assess which topological features are most important to different types of dynamics, from the perspective of both computational complexity and computability.

References

[1] N. Alon, R. Yuster, and U. Zwick. Color-coding. J. ACM, 42(4):844–856, 1995.
[2] G. Bianconi and M. Marsili. Loops of any size and hamilton cycles in random scale-free networks. J. Stat. Mech., P06005, 2005.
[3] D. Braha and Y. Bar-Yam. Topology of large-scale engineering problem-solving networks. Physical Review E, 69:016113, 2004.
[4] R. P. Brent. Recent progress and prospects for integer factorisation algorithms. In COCOON, pages 3–22, 2000.
[5] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30:107–117, 1998.
[6] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. In Proceedings of the 9th international World Wide Web Conference on Computer networks: the international journal of computer and telecommunications networking, pages 309–320, Amsterdam, The Netherlands, 2000. North-Holland Publishing Co.
[7] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. In Proc. WWW Conf. on Computer networks, pages 309–320, 2000.
[8] P. Cheeseman, B. Kanefsky, and W. M. Taylor. Where the Really Hard Problems Are. In Proc. IJCAI-91, pages 331–337, 1991.
[9] G. F. Cooper. The computational complexity of probabilistic inference using bayesian belief networks. Artif. Intell., 42(2-3):393–405, 1990.
[10] B. Csaba, M. Karpinski, and P. Krysta. Approximability of dense and sparse instances of minimum 2-connectivity, TSP and path problems. In SODA ’02, pages 74–83, 2002.
[11] P. Dagum and R. M. Chavez. Approximating probabilistic inference in bayesian belief networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:246–255, March 1993.
[12] R. Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, 113(1-2):41–85, 1999.
[13] P. Erdos and A. Renyi. On random graphs. Publ. Math. Debrecen, 6:290–297, 1960.
[14] M. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1978.
[15] Y. Gurevich and S. Shelah. Expected computation time for hamiltonian path problem. SIAM J. Comput., 16(3):486–502, 1987.
[16] R. F. i Cancho, C. Janssen, and R. V. Sole. Topology of technology graphs: Small world patterns in electronic circuits. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 64(4):046119, 2001.
[17] D. R. Karger, R. Motwani, and G. D. S. Ramkumar. On approximating the longest path in a graph. Algorithmica, 18(1):82–98, 1997.
[18] J. Kleinberg. The small-world phenomenon: an algorithm perspective. In STOC ’00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 163–170, New York, NY, USA, 2000. ACM Press.
[19] C. Martel and V. Nguyen. Analyzing Kleinberg’s (and other) small-world Models. In PODC ’04, pages 179–188, 2004.
[20] C. Martel and V. Nguyen. Analyzing Kleinberg’s (and other) small-world Models. In PODC ’04: Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing, pages 179–188, New York, NY, USA, 2004. ACM Press.
[21] L. Michel and P. V. Hentenryck. Maintaining longest paths incrementally. In Principles and Practice of Constraint Programming, pages 540–554, 2003.
[22] M. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.
[23] J. D. Park and A. Darwiche. Complexity Results and Approximation Strategies for MAP Explanations. J. Artif. Intell. Res. (JAIR), 21:101–133, 2004.
[24] J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
[25] G. Provan. The Computational Complexity of Complex Systems: the Role of Topology and Functionality. Technical report, Computer Science Department, University College Cork, Cork, Ireland, 2006.
[26] A. Roli. On the impact of small-world on local search. In Proceedings of the 9th Congress of the Italian Association for Artificial Intelligence, Milan, Italy, 2005.
[27] P. Svenson. From Neel to NPC: Colouring small worlds. cs.CC/0107015, 2004.
[28] C. Umans. On the complexity and inapproximability of shortest implicant problems. Lecture Notes in Computer Science, 1644:687–696, 1999.
[29] T. Walsh. Search in a small world. In IJCAI, pages 1172–1177, 1999.
[30] T. Walsh. The Interface between P and NP: COL, XOR, NAE, 1-in-k, and Horn SAT. In AAAI/IAAI, pages 695–, 2002.
[31] D. J. Watts and S. H. Strogatz. Collective dynamics of “small-world” networks. Nature, 393:440–442, 1998.
[32] H. S. Wilf. An o(1) expected time algorithm for the graph coloring problem. Information Processing Letters, 18:119–122, 1984.
[33] S. Wuchty, E. Ravasz, and A.-L. Barabási. The architecture of biological networks. In T. Deisboeck, J. Y. Kresh, and T. Kepler, editors, Complex Systems in Biomedicine, New York, 2003. Kluwer Academic Publishing.