
On the Complexity of Determining Autonomic Policy Constrained Behaviour

Mark Burgess and Lars Kristiansen
Oslo University College, Norway, and Department of Mathematics, University of Oslo, Norway

Abstract— Policy Based Management aims to constrain, and even to determine, the behaviour of computer systems that operate in dynamic environments, e.g. for the implementation of business goals. Autonomic computing supplements this with the aim of allowing computer systems to operate in a stable and predictable fashion with a minimum of human involvement. In this work we use a formulation of autonomic computing based on the cfengine model of convergent operations to discuss the computational cost of implementing autonomic regulation. By placing the autonomic properties of a system at a low level, but with a high degree of abstraction, we are able to make quite general statements about the computational cost of searching for autonomic policies.

Keywords— Autonomic methods, computational complexity.

I. Introduction

How hard is it to build and tune a computer system so that one can predict its operational behaviour from an original specification? This is the central question behind the value of autonomic computing: can we save valuable resources by giving systems a simple specification and self-repairing abilities, then leaving them to do the work without human intervention?

There are many measures one might use to discuss the complexity of solving this problem[1]: the number of lines of code required to solve the problem, the number of hours of planning needed to formulate it, the CPU cycles expended during the task, the memory resources used to run the algorithms, etc. All of these are potential measures of difficulty or complexity, but to answer the question in a precise manner, one must formalise the question.

In this paper we summarise a number of theorems that are within easy reach of a computational complexity theorist (whose repercussions are of great interest to the management community) and which show that our best models of autonomic policy based management involve non-trivial problems, i.e. problems which belong to complexity classes like np and pspace in general. The implication of this is that traditional software design-based approaches to autonomics are unlikely to find suitable algorithms using deterministic development approaches in the general case, and one must either restrict the problem domain of autonomics, or settle for heuristics and approximations. The full proofs of our arguments (which are too large to fit into a short paper) will be given elsewhere[2].

In bridging two disparate disciplines, we ask the forbearance of readers: we attempt to explain unfamiliar mathematical concepts to a management audience in a sketchy way, while maintaining sufficient rigour that complexity theorists will also recognize their own role in the argument. In this way, we hope that future work will be encouraged. Thus we concentrate on explaining how our conclusions are arrived at and their practical importance. There is very little work relating traditional computer science theory to management problem solving, but recent work based on a quite different approach confirms our general conclusions[3], [4]. The conclusions we present are directly relevant to the implementation of autonomic systems, as we have shown earlier[5], [6], and especially to those formulated in such a way as to use policy-based constraints on their behaviour.

II. Computer behaviour

We begin with the ability to configure the behaviour of a computer system. The often unstated assumption about configuration management is the following (see fig. 1): a high level policy can be used to determine a number of equivalent low level configurations on a computer system. There will then be a direct association between a "correctly configured computer" and a "correctly behaving computer", where "correct" means "policy or specification compliant".

The notion of policy based management[7], [8], [9] was an important step in separating decision-making from implementation. We understand policy to mean a catalogue of decisions whose outcomes have been pre-determined, like a code-book of behaviours. When an expected situation arises, there is already an answer to the problem encoded in policy, and we merely have to look up the appropriate behaviour. This idea is quite general and has been applied in many arenas, both for runtime behaviour and configuration choices.

In configuration management, policy allows a move away from explaining configuration in the imperative terms of an algorithmic recipe of configuration operations, towards the specification of declarative guidelines for the end goal itself. This focuses attention on the final behaviour rather than on intermediate changes of state. Of course, by focusing on goal-oriented, post-condition policies one does not remove the need to find an algorithm for implementing the end result. However, this approach furnishes us with a neat framework for discussing the complexity of solving a configuration problem. We can make a much more direct attack on the problem of configuration management's algorithmic complexity by studying ways of achieving a fixed final result, satisfying certain properties.
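To make the code-book reading of policy concrete, consider the following minimal Python sketch (ours, purely illustrative; the situation and action names are invented): a policy is a pre-computed mapping from expected situations to prescribed behaviours, and decision-making reduces to a lookup.

    # A policy as a catalogue of pre-determined decisions: when an expected
    # situation arises, the appropriate behaviour is simply looked up.
    # Situation and action names here are purely illustrative.
    policy = {
        "disk_full":      "rotate_and_compress_logs",
        "service_down":   "restart_service",
        "config_drifted": "reapply_declared_configuration",
    }

    def decide(situation):
        # Look up the pre-determined answer; fall back to a default when
        # the situation was not anticipated by the policy author.
        return policy.get(situation, "escalate_to_human")

    print(decide("service_down"))   # -> restart_service
    print(decide("cpu_on_fire"))    # -> escalate_to_human

The point of the sketch is only that, under policy based management, the hard work lies in compiling the catalogue, not in consulting it.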


[Figure 1 shows a high level policy (a block of object → promise declarations) mapped to a set of configurations, mapped in turn to a set of behaviours.]

Fig. 1. The main hypothesis in configuration management is that there is an association between high level policy, low level configuration and eventual behaviours.

III. Configuration

A machine configuration is a set of registers and symbolic file contents containing information that alters (but does not necessarily fully determine) its programmed behaviour. The description of symbolic patterns is well known in computer science as the theory of formal grammars[10]. Any pattern must be formed from a number of distinct symbols (the set of which is called the alphabet Σ of the language), and the specific rules to which a pattern conforms determine its language L. We define a general configuration as follows:

Definition 1 (Configuration): A configuration is a discrete pattern that is exhibited by a computer system. It is a string belonging to some language L, with alphabet Σ.

We shall confine ourselves to the simplest binary alphabet, consisting of the symbols Σ = {0, 1}, from which we can code any other. However, we must acknowledge that there are many levels of coding represented in a computer system. A configuration can, in other words, be thought of as a complicated, possibly distributed state.

To effect change on the state we employ 'operations' or operators that transform one state into another. Such operators might, for example, create, delete, or alter the attributes of managed objects, such as files or processes; they might make order-dependent modifications to grammar trees such as complex file structures, relational databases or command-driven interfaces, or a variety of other examples. To make these changes predictable and reliable, it has been shown that we can imbue them with certain properties which represent their compliance with policy[5], [6]. We require therefore a spanning set of operators, which act on symbolic patterns of configuration data at the appropriate level of coding. To end up with a predictable result, there are two properties of interest that we can give to these operators (see the sketch below):

• Idempotence: the repetition of an operation leads to no change after the initial application, i.e. f(f(x)) = f(x), or, for any state q, [[O]]^2 q = [[O]] q. This contains the essence of non-repetition, but it is a property without an anchor: it does not lead to a predictable post-condition unless we also know the initial state. So this property has the special status of a bridge between the two approaches to configuration management.

• Convergence at a fixed point: the first application of an operation leads to the specified policy result, regardless of the initial state. Thereafter, no further changes should take place, i.e. for any initial state q and a policy-compliant state p, [[O]]^2 q = [[O]] q = p. This is the approach preferred by cfengine[9], [5].

Other properties might also be of interest to us. For instance, injectivity of an operator tells us whether it has a unique inverse, i.e. whether it is a one-to-one map. This is a property that some system administrators value, since it means that operations can be 'rolled back' to their former state after a change.
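The distinction between the two properties is easy to demonstrate concretely. In the following minimal Python sketch (ours, not cfengine code; the target string 1010 is an invented policy state), both operators are idempotent, but only the second converges to the same policy-compliant fixed point from every initial state:

    # Operators on bit strings, illustrating the two properties.
    # 'zero_first' depends on the initial state (idempotent, no fixed
    # target); 'enforce_policy' reaches the same policy-compliant state
    # from anywhere (convergent at a fixed point).
    POLICY = "1010"  # hypothetical policy-compliant configuration

    def zero_first(state):
        # Idempotent: a second application changes nothing, but the
        # outcome still depends on where we started.
        return "0" + state[1:]

    def enforce_policy(state):
        # Convergent: one application yields the policy state p, and
        # further applications leave it alone: O(O(q)) = O(q) = p.
        return POLICY

    for op in (zero_first, enforce_policy):
        for q in ("0111", "1100"):
            once, twice = op(q), op(op(q))
            assert once == twice          # both operators are idempotent
            print(op.__name__, q, "->", once)
    # Only enforce_policy maps every q to the same final state POLICY.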

IV. Algorithmic complexity

The key question now is: what is the computational complexity of configuration management, and what do we mean by this? If we measure the complexity of carrying out a simple change in configuration, then the answer cannot be very hard: any change can be made in an amount of time linear in the input, simply by substitution of values. If, however, we have to determine the correct values and implementation method in advance, as part of our estimation of complexity, then the complexity of the task grows considerably, from the application of a single transformation to a search in a potentially huge space of possible operations.

The measure we look for here corresponds to the time it would take a somewhat dumb and blind brute-force search through the space of possible answers, by a systematic and somewhat dogged agent. Clearly, one could narrow the search and reduce the complexity if one could inject some intelligence in the form of additional constraints, but that is not what computer science usually measures. The usual way of characterising the answer is to say how much time and space it would take to complete a task, as a function of the size of its input. The most well known time complexity classes are p and np; the space complexity classes linspace and pspace are slightly less well known (see fig. 2; for a further discussion, see ref. [2]).

[Figure 2: a diagram of the complexity classes P, NP, LINSPACE and PSPACE in the space of algorithms.]

Fig. 2. The complexity classes in the space of algorithms. This is a likely picture.

The question we are asking, then, is not how long the configuration management software takes to run: rather it is, do we know how to write the configuration management software in the first place? The difficulty with system administration lies in determining the correct configuration, satisfying the constraints of policy in an efficient manner. We claim that even simple alphabetic configuration problems are, in general, at least np-hard.

V. Operators formalised as bit transitions

We must formalise the operators that change one configuration into another. We use an approach inspired by the matrix formulation in ref. [11] to state the problems with sufficient mathematical precision.

Let B_n be the set of all bit strings of length n, e.g.

B_4 = {0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111}.

If b ∈ B_n, then b_i denotes the i-th bit of b. We count the bits in a string from left to right; thus, if b = 0111, then b_1 = 0, b_2 = 1, b_3 = 1 and b_4 = 1.

An operator is nothing but a total function with domain B_n and range B_n. So, an operator transforms a string of bits into another string of bits of the same length. Even if there are only finitely many operators over the domain B_n, there are still a lot of them. The exact number of operators over the domain B_n is 2^(n·2^n). There are more than 10^19 operators over the domain B_4, and there are more operators over the domain B_8 than there are elementary particles in the universe.

We need a language strong enough to express all the operators over the domain B_n in a convenient and uniform way. That is our motivation for introducing operator expressions. An operator expression e is a sequence ⟨α_1, α_2, ..., α_n⟩ of boolean expressions. If there are n boolean expressions in the sequence, the only variables allowed in the expressions are x_1, x_2, ..., x_n.
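As an aside, the count quoted above follows from a routine calculation (ours, not spelled out in the paper): an operator is an arbitrary total function from B_n to B_n, so one of the 2^n possible output strings must be chosen independently for each of the 2^n possible inputs, giving

    |{T : B_n → B_n}| = (2^n)^(2^n) = 2^(n·2^n).

For n = 4 this is 2^(4·16) = 2^64 ≈ 1.8 × 10^19, the "more than 10^19" figure mentioned above.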

An operator expression e ≡ ⟨α_1, α_2, ..., α_n⟩ defines an operator over the domain B_n. As explained, we use [[e]] to denote the operator defined by the operator expression e. The value [[e]](b), that is, the value of [[e]] in the particular argument b ∈ B_n, is given by the value of the sequence ⟨α_1, ..., α_n⟩ under the assignment x_1 := b_1, ..., x_n := b_n. The first bit of [[e]](b) is given by the value of α_1 under the assignment, the second bit by the value of α_2, and so on.

We now state four problems that play a central role in predictable management operations:

• (Problem IDM) Input: an operator expression e. Question: Is [[e]] idempotent, i.e. do we have [[e]]([[e]](x)) = [[e]](x) for all x ∈ B_n?

• (Problem INJ) Input: an operator expression e. Question: Is [[e]] an injection, i.e. do we have [[e]](x) ≠ [[e]](y) whenever x ≠ y?

• (Problem CON) Input: an operator expression e. Question: Is [[e]] convergent, i.e. does there, for every x ∈ B_n, exist k ∈ N such that [[e]]^k(x) = [[e]]^(k+1)(x)?

• (Problem CONx) Input: a pair ⟨e, b⟩ where e is an operator expression and b ∈ B_n. Question: Is [[e]] convergent in b, i.e. does there exist a k ∈ N such that [[e]]^k(b) = [[e]]^(k+1)(b)?

The problem CON corresponds roughly to cfengine's notion of convergence at a fixed point, since it claims that, after a certain number of changes have taken place by repeated application of the operator, the result will converge to a fixed point for any input. It does not specify what the final point is, so the outcome is not as predictable as we would like. Ideally we want [[e]]^k(x) = [[e]]^(k+1)(x) = "policy" for all x, but from a complexity-theoretic viewpoint it is the convergence that is hard to verify, not the point at which the operator converges. Hence we are particularly interested in this case for configuration management[5]. What we mean by the complexity in this context is the complexity of determining whether the policy-operator expression e is convergent. More precisely: we declare the answer b to be a policy goal. This tells us how we would like the system to look after the repeated operation of [[e]]. Then we take the trial method [[e]], defined by the operator expression e, and we ask: is this policy implementable for every possible start configuration x? If not, then we have not truly solved the problem with [[e]].
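For small n, all four questions can be settled by brute-force search over B_n, which also shows where the difficulty lies: the search space doubles with every extra bit. A minimal Python sketch (our illustration; the operator expression below is invented, and Python's or/and/not stand in for the boolean connectives):

    from itertools import product

    n = 3
    B = ["".join(bits) for bits in product("01", repeat=n)]  # all of B_n

    def apply_op(alphas, b):
        # [[e]](b): evaluate each alpha_i under the assignment
        # x_1 := b_1, ..., x_n := b_n to produce the i-th output bit.
        env = {f"x{i+1}": bit == "1" for i, bit in enumerate(b)}
        return "".join("1" if eval(a, {}, env) else "0" for a in alphas)

    # An invented operator expression e = <alpha_1, alpha_2, alpha_3>.
    e = ["x1 or x2", "x2 and x3", "x3"]

    idm = all(apply_op(e, apply_op(e, b)) == apply_op(e, b) for b in B)
    inj = len({apply_op(e, b) for b in B}) == len(B)

    def converges(b, limit=2 ** n):
        # CONx: iterate [[e]]; in a finite space we either hit a fixed
        # point or are caught in a cycle within 2^n steps.
        for _ in range(limit):
            nxt = apply_op(e, b)
            if nxt == b:
                return True
            b = nxt
        return False

    con = all(converges(b) for b in B)  # CON quantifies over every start
    print("IDM:", idm, "INJ:", inj, "CON:", con)

Running this prints IDM: True, INJ: False, CON: True for the example. The point of the next two sections is that, in general, no method essentially better than such enumeration is likely to exist.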

VI. On the Complexity of IDM and INJ

An algorithm is said to run in polynomial time if the number of basic steps in an execution of the algorithm is bounded by a polynomial in the length of the input. If a problem cannot be solved by an algorithm running in polynomial time, the problem is considered to be intractable; that is, the problem is considered too hard for a computer, as no program solving the problem will terminate in reasonable time on every input. This explains why the notion of polynomial-time reducibility is a basic notion in complexity theory:

Definition 2: A function f polynomial-time reduces a problem A to a problem B if (and only if)
1. the answer to question A with input x is the same as the answer to question B with input f(x), and
2. the function f is computable in polynomial time.

If there exists a function f which polynomial-time reduces the problem A to the problem B, we say that A is polynomial-time reducible to B.

If a problem A polynomial-time reduces to a problem B, then any polynomial-time algorithm solving B will yield a polynomial-time algorithm solving A. Thus, the notion of polynomial-time reducibility formalises the notion of one problem being at least as hard as another. If A polynomial-time reduces to B, then B is at least as hard as A; and if, in addition, B does not polynomial-time reduce to A, then B is a strictly harder problem than A.

The problem SAT is well known from the literature as an intractable problem, and it is perhaps the most famous of all np-hard problems.

(Problem SAT) Input: a boolean expression α. Question: Is α satisfiable, i.e., is it possible to assign values to the variables of α such that α = 1?

The next theorem follows from results proved in [2].

Theorem 1: i) SAT is polynomial-time reducible to IDM. ii) SAT is polynomial-time reducible to INJ.

We offer a sketch of the proof below for illustrative purposes. For full proofs see ref. [2]. (What we prove in [2] is that SAT is polynomial-time reducible to coIDM and coINJ, that is, to the co-problems of IDM and INJ, respectively. Theorem 1 follows straightforwardly from these results.)

Thus, both IDM and INJ are at least as hard as SAT, and any algorithm solving the problem IDM, or the problem INJ, in polynomial time yields an algorithm solving SAT in polynomial time. The argument is like saying that if SAT is far away from the tractable, and INJ and IDM are at least as far away as SAT, then INJ and IDM must also be far away. For more details on the complexity of INJ and IDM, see ref. [2].

A. Proof sketch

From the definitions, SAT is polynomial-time reducible to coIDM if there exists a function f such that
i) the answer to the question SAT on input x is the same as the answer to the question coIDM on input f(x), and
ii) the function f is computable in polynomial time.
When we have provided such a function f, we have proved the theorem.

The input to SAT is a boolean expression, whereas the input to coIDM is an operator expression. Thus, the function f should transform an arbitrary boolean expression α into an operator expression e = f(α) such that
1) if α is satisfiable, then [[e]]([[e]](b)) ≠ [[e]](b) for some b ∈ B_n;
2) if α is not satisfiable, then [[e]]([[e]](b)) = [[e]](b) for all b ∈ B_n.

We can, without loss of generality, assume that if a boolean expression contains n different variables, then every variable in the expression occurs in the list x_1, ..., x_n.

Let inv be the operator inverting the bits of a string, and let the operator op_α : B_n → B_n be such that

op_α(b) = inv(b), if α = 1 under either the assignment x_1 := b_1, ..., x_n := b_n or the assignment x_1 := ¬b_1, ..., x_n := ¬b_n;
op_α(b) = b, otherwise.

The boolean expression α is a parameter to the operator, and for each α we have an operator op_α. In case (1), when α is satisfiable, there will be at least one bit string b such that op_α(b) = inv(b). Observe that if op_α(b) = inv(b), then we also have op_α(inv(b)) = inv(inv(b)). Further, for any bit string c we have inv(inv(c)) = c. Thus, if α is satisfiable, there will be at least one bit string b such that

op_α(op_α(b)) = inv(inv(b)) = b ≠ inv(b) = op_α(b),

that is, we have op_α(op_α(b)) ≠ op_α(b) for some b ∈ B_n. In case (2), when α is not satisfiable, we have op_α(b) = b for all b ∈ B_n, and hence also op_α(op_α(b)) = op_α(b) for all b ∈ B_n. This shows that if e is an operator expression such that [[e]] = op_α, then (1) and (2) will be satisfied.

We will proceed as follows. First we provide a function f transforming a boolean expression α into an operator expression e such that [[e]] = op_α. Thereafter we provide an algorithm computing f and argue that the algorithm runs in polynomial time. That will prove the theorem.

Let α̂ denote the boolean expression we get when we replace each variable in α by its complement, that is, we replace each occurrence of x_i by ¬x_i. Further, let

β_i ≡ ( ¬x_i · (α + α̂) ) + ( x_i · ¬(α + α̂) )

for i = 1, ..., n, and finally let f(α) = ⟨β_1, ..., β_n⟩. The observation that

β_i = ¬x_i, if α = 1 under either the assignment x_1 := b_1, ..., x_n := b_n or the assignment x_1 := ¬b_1, ..., x_n := ¬b_n;
β_i = x_i, otherwise

makes it easy to see that we indeed have [[e]] = op_α whenever e = f(α). Thus, we can devote the remainder of the proof to arguing that f is computable in polynomial time.

For any reasonable representation of β_i, it will be the case that the number of symbols required to represent x_i is bounded by |α|; the number of symbols required to represent ¬x_i is bounded by 2|α|; and the number of symbols required to represent α̂ is bounded by 2|α|. Thus, there exist fixed numbers k and ℓ such that the number of symbols required to represent β_i is bounded by k|α| + ℓ, and the number of symbols the algorithm prints will be bounded by 2 + (n − 1) + n(k|α| + ℓ). Now, n is the number of (different) variables in α, and there cannot be more variables in α than there are symbols in α; hence we also have n ≤ |α|, and the number of symbols printed will be bounded by

2 + (n − 1) + n(k|α| + ℓ) ≤ 2 + |α| + |α|(k|α| + ℓ).

Let p(x) = 2 + x + x(kx + ℓ); then p(|α|) is a bound on the number of symbols printed by the algorithm. It is now straightforward to argue that the algorithm runs in polynomial time, i.e. there will exist fixed numbers a and b such that the overall number of steps executed is bounded by (p(|α|) + a)^b. Thus, the algorithm runs in polynomial time. This completes the proof of the theorem.
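The construction f is easy to mechanise, and doing so makes the reduction tangible. The sketch below (our Python illustration of the proof's construction, not code from [2]; Python's not/and/or stand in for complement, · and +) builds ⟨β_1, ..., β_n⟩ from α and confirms on two tiny examples that [[f(α)]] fails to be idempotent exactly when α is satisfiable:

    import re
    from itertools import product

    def f(alpha, n):
        # f(alpha) = <beta_1, ..., beta_n> with
        # beta_i = (not x_i and (alpha or alpha_hat))
        #          or (x_i and not (alpha or alpha_hat)),
        # where alpha_hat replaces every variable by its complement.
        alpha_hat = re.sub(r"x(\d+)", r"(not x\1)", alpha)
        trig = f"(({alpha}) or ({alpha_hat}))"
        return [f"((not x{i}) and {trig}) or (x{i} and not {trig})"
                for i in range(1, n + 1)]

    def apply_op(alphas, b):
        # [[e]](b) under the assignment x_1 := b_1, ..., x_n := b_n.
        env = {f"x{i+1}": bit == "1" for i, bit in enumerate(b)}
        return "".join("1" if eval(a, {}, env) else "0" for a in alphas)

    def strings(n):
        return ("".join(t) for t in product("01", repeat=n))

    def satisfiable(alpha, n):
        return any(eval(alpha, {}, {f"x{i+1}": c == "1"
                                    for i, c in enumerate(b)})
                   for b in strings(n))

    def idempotent(e, n):
        return all(apply_op(e, apply_op(e, b)) == apply_op(e, b)
                   for b in strings(n))

    for alpha in ["x1 and not x1", "x1 and x2"]:  # UNSAT / SAT
        e = f(alpha, 2)
        # alpha satisfiable  <=>  [[f(alpha)]] is NOT idempotent
        print(alpha, "| SAT:", satisfiable(alpha, 2),
              "| IDM:", idempotent(e, 2))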

VII. On the Complexity of CON and CONx

The so-called pspace-hard problems are in general believed to be very hard, believed harder even than the np-complete problems, and way out of reach of algorithms running in polynomial time. It turns out that CONx is indeed pspace-hard, and we will sketch a proof.

A. Turing machines and pspace

The class pspace is defined by Turing machines. A Turing machine is a theoretical computing device, and to define a particular Turing machine, we need to give the machine's (i) alphabet, (ii) states, (iii) instructions and (iv) start state, that is, the state the Turing machine shall be in when the computation starts. At any particular moment in time the machine will be in one, and only one, of a finite number of possible states, and a device called the machine's head will be scanning one, and only one, of the cells on an infinite tape. The symbol in the scanned cell is called the scanned symbol. A cell containing B will be called a blank cell. The combination of the current state and the scanned symbol decides what the Turing machine will do next, and the possible actions are indeed very limited:
(i) write a symbol in the scanned cell and enter a new state;
(ii) move the head one position to the left and enter a new state;
(iii) move the head one position to the right and enter a new state.

The exact action to be executed is determined by a finite set of instructions of the form ⟨a, q, A, r⟩ where a is an alphabet symbol, q and r are states, and A is either an alphabet symbol, a left arrow (←), or a right arrow (→). A single instruction ⟨a, q, A, r⟩ is interpreted as follows: if the machine is in the state q and is scanning the symbol a, then the machine will carry out the action A and enter state r. If A is an alphabet symbol, the action will simply be to write that symbol in the scanned cell; if A is an arrow, the action will simply be to move the scanning head one position in the direction of the arrow.

We only consider deterministic Turing machines that always terminate, and thus we can, without loss of generality, assume that one, and only one, instruction is applicable in a particular situation. (If a Turing machine is nondeterministic, several instructions might be applicable, and the machine will execute one of them. We can also imagine that none of the instructions is applicable; then the machine will simply "hang" and never terminate.)
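A Turing machine in this instruction format can be executed directly. The following minimal Python sketch (ours; the one-state machine is an invented example that walks right over its input and accepts at the first blank) anticipates the tape convention described in the next paragraph, with the first cell blank and the head starting on the first input symbol:

    # Instructions as a dict (a, q) -> (A, r): in state q scanning symbol
    # a, perform action A and enter state r. A is either a symbol to
    # write, or "L"/"R" to move the head (standing in for <- and ->).
    BLANK = "B"
    instructions = {
        ("a", "scan"): ("R", "scan"),
        ("b", "scan"): ("R", "scan"),
        (BLANK, "scan"): (BLANK, "qA"),  # accept when input is exhausted
    }

    def run(tape_input, start="scan", accept="qA"):
        tape = [BLANK] + list(tape_input)  # first cell blank, then input
        head, state = 1, start             # head on first input symbol
        while state != accept:
            # By assumption, exactly one instruction applies here.
            action, state = instructions[(tape[head], state)]
            if action == "L":
                head -= 1
            elif action == "R":
                head += 1
                if head == len(tape):
                    tape.append(BLANK)     # tape is unbounded to the right
            else:
                tape[head] = action        # write in the scanned cell
        return state

    print(run("abbaa"))   # -> qA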

When a Turing machine starts, the leftmost cell on the tape, that is, the first cell, will be blank, and the input will be found, symbol by symbol, in the consecutive cells. The input will not contain any blanks, and the head will be scanning the first symbol of the input. The remaining cells of the infinite tape are all blank. E.g., if the input abbaa is given to a machine, the tape will be in the configuration

B a b b a a B B B B B B ···
  ↑

when the execution starts. The arrow marks the scanned cell.

Turing machines can perform tasks like computing functions and solving problems by manipulating the symbols on the tape and, somewhat loosely, pspace is the class of problems solvable by Turing machines working in polynomial space. Here is an exact definition of what it means for a Turing machine to solve a problem, followed by exact definitions of pspace and the pspace-hard problems.

Definition 3: A Turing machine M solves the problem A if, and only if, whenever we execute M on input x, then M will halt in
• a dedicated accept state q_A if the answer to A on the input x is YES;
• a dedicated reject state q_R if the answer to A on the input x is NO.

If x is the input to a Turing machine, then |x| denotes the number of alphabet symbols in x. A Turing machine M works in polynomial space if, and only if, there exists a polynomial p such that the number of tape cells visited by M's head during an execution on input x is bounded by p(|x|). A problem belongs to the class pspace if, and only if, the problem can be solved by a Turing machine working in polynomial space. A problem B is pspace-hard if, and only if, any problem in pspace is polynomial-time reducible to B.

B. Turing machines and operators

We can view a deterministic Turing machine M as an operator T_M, that is, T_M is a unary function transforming one configuration into another, and when M, by executing an instruction, makes a transition from the configuration c to the configuration c′, then T_M makes the same transition, that is, T_M(c) = c′. If none of M's instructions is applicable to a given configuration c, we define T_M(c) = c. Such an operator T_M is essentially a transition function of M. Note that if no instruction of M's is applicable in the configuration c, we have T_M(c) = c by definition.

As stated in the definition above, if M says NO and rejects the input, then M will terminate in a dedicated reject state q_R. We can modify M slightly such that, instead of rejecting by entering the state q_R, M rejects by moving the head back and forth between two cells forever and ever. Thus, when M is given the input x, the execution of M will either end up in the state q_A (accepting x) or end up alternating between two configurations c, c′, c, c′, c, c′, ... (rejecting x).
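In code, the passage above amounts to wrapping a single machine step as a function T on configurations and replacing the reject behaviour by a two-cell oscillation; repeated application of T then converges exactly on acceptance. A minimal Python sketch (ours; the machine, which accepts inputs ending in 'a', is invented for illustration):

    # A configuration c = (tape, head, state); T maps c to the next
    # configuration, and T(c) = c when no instruction applies.
    # This invented machine accepts words ending in 'a' (halting in qA);
    # otherwise it bounces between two configurations forever (the
    # modified reject behaviour).
    BLANK = "B"
    instructions = {
        ("a", "scan"): ("R", "scan"),
        ("b", "scan"): ("R", "scan"),
        (BLANK, "scan"): ("L", "check"),
        ("a", "check"): ("a", "qA"),        # last symbol 'a': accept
        ("b", "check"): ("R", "reject"),    # otherwise start oscillating
        (BLANK, "reject"): ("L", "check"),  # check <-> reject, forever
    }

    def T(c):
        tape, head, state = c
        key = (tape[head], state)
        if key not in instructions:
            return c                        # no instruction: fixed point
        action, new_state = instructions[key]
        tape = list(tape)
        if action == "L":
            head -= 1
        elif action == "R":
            head += 1
            if head == len(tape):
                tape = tape + [BLANK]
        else:
            tape[head] = action
        return (tuple(tape), head, new_state)

    def converges(c, limit=1000):
        # CONx for T in the argument c: is there k with T^(k+1)(c) = T^k(c)?
        for _ in range(limit):
            nxt = T(c)
            if nxt == c:
                return True
            c = nxt
        return False

    for word in ("abbaa", "abb"):
        c0 = ((BLANK,) + tuple(word), 1, "scan")
        print(word, "accepted:", converges(c0))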


Let T_M be the transition function of this modified version of M, and let c be the initial configuration of M on the input x. Then, if the answer to A on the input x is YES, there exists k such that T_M^(k+1)(c) = T_M^k(c), and if the answer to A on the input x is NO, no such k exists. In other words, T_M converges in the argument c if and only if the answer to A on the input x is YES.

Now, a Turing machine will only access a finite portion of the tape when it is executed on a particular input x, and thus a configuration can be represented as a finite sequence of bits. Moreover, if a Turing machine working in polynomial space generates the configurations c_0, c_1, c_2, ..., the number of bits required to represent each configuration c_i will be bounded by a polynomial in the length of the input; hence, if n ≥ p(|x|), where x is the input and p is some fixed polynomial, bit strings taken from B_n will be large enough to encode any configuration in the sequence c_0, c_1, c_2, .... This explains why the next theorem holds.

Theorem 2: Let M be a deterministic Turing machine solving a problem in polynomial space. There exists a fixed polynomial p such that for any input x to M there exist a bit string b ∈ B_n and an operator T : B_n → B_n, where n = p(|x|), such that the following assertions are equivalent:
• T converges in b, that is, T^k(b) = T^(k+1)(b) for all sufficiently large k;
• the execution of M on input x halts in the accept state q_A.

C. CONx is pspace-hard; what about CON?

To prove that CONx is pspace-hard, we have to prove that every problem in pspace is polynomial-time reducible to CONx; that is, we have to prove that for every problem A in pspace there exists a polynomial-time computable function f such that the answer to the problem A on the input x is the same as the answer to the problem CONx on the input f(x).

If a problem A belongs to pspace, there does of course exist a deterministic Turing machine M solving A in polynomial space. Now, have a second look at Theorem 2. Read the theorem carefully, and note in particular that

... for any input x to M there exist b ∈ B_n and T : B_n → B_n ...

The operator T is more or less the transition function for the Turing machine M, and the bit string b codes the initial configuration of the execution of M on input x. When x and M are given, it is possible to construct the bit string b and an operator expression e for the operator T, that is, e such that [[e]] = T. Moreover, the construction can be carried out by an algorithm running in polynomial time in the length of x. (The Turing machine M is fixed and should not be viewed as input to the algorithm.) Any trained complexity theorist can easily design such an algorithm, but since the details are involved and cumbersome, we will save the reader from the entire story. The average reader should be satisfied to realise the consequences of having such an algorithm: there exists a polynomial-time computable function f such that when f(x) = ⟨e, b⟩, the answer to the problem A on input x is the same as the answer to the problem CONx on input ⟨e, b⟩. Hence, A is polynomial-time reducible to CONx. But A is an arbitrary problem in pspace. Hence, any problem in pspace is polynomial-time reducible to CONx, and thus the next theorem holds.

Theorem 3: CONx is pspace-hard.

A brief summary sketching the proof of the theorem follows. Let A be an arbitrary problem in pspace. Let M be a deterministic Turing machine solving the problem A in polynomial space. (Since A is in pspace, we know that such a Turing machine exists.) We have a polynomial-time computable function f such that when f(x) = ⟨e, b⟩, the execution of M on input x halts in the accept state q_A if, and only if, the operator [[e]] converges in the argument b. Thus, when f(x) = ⟨e, b⟩, the following assertions are equivalent:
• the answer to A on input x is YES;
• the execution of M on input x halts in the state q_A;
• the operator [[e]] converges in the argument b;
• the answer to the problem CONx on input ⟨e, b⟩ is YES.
Thus, the answer to the problem A on input x is the same as the answer to the problem CONx on input f(x), and since f is a polynomial-time computable function, the problem A is polynomial-time reducible to CONx.

So, Theorem 3 states that CONx is pspace-hard. What about CON? Note the difference between the problem CON and the problem CONx. If you are asked to solve the problem CONx, you will be given a pair ⟨e, b⟩ where b ∈ B_n and e is an operator expression of the form ⟨α_1, ..., α_n⟩, and then it is your task to decide if the operator [[e]] converges in the particular argument b, that is, you should decide if there exists a k such that [[e]]^k(b) = [[e]]^(k+1)(b). If you are asked to solve the problem CON, you will be given nothing but an operator expression e of the form ⟨α_1, ..., α_n⟩, and it is your task to decide if the operator [[e]] converges for every x ∈ B_n. This sounds like a formidable task, as you have to check if the operator converges for each of the 2^n elements of B_n. Thus, at first glance the problem CON seems harder than the problem CONx, but a trained complexity theorist who thinks twice will realize that this is not necessarily the case. Indeed, we have not been able to prove that CON is pspace-complete, but neither have we been able to prove any results indicating that CON is not pspace-complete, e.g. that the problem belongs to np. Hence, we have an open problem that is interesting from a purely complexity-theoretic point of view.

VIII. Conclusions

Although we are unable to present the details of our proofs in this paper, both for reasons of space and audience, our intention has been to bring the central implications of our algorithmic complexity analysis to the management field, and to remark on the conclusions that can be drawn from them. We hope that this will serve as an indication of what can be achieved by such theoretical investigations, as well as provide a platform for future cross-disciplinary work.


Complexity analysis essentially tells us the worst-case cost of finding solutions: it does not, however, exclude approximate or heuristic solutions that might be obtained more cheaply; indeed, it tells us when we must seriously start to look for these. Our message here is that such heuristics are the way forward.

Our argument proceeds as follows: an autonomic system is equivalent to a repeatable and convergent policy specification, and since this can be implemented fully automatically using operators with special properties, we can address the broader problem from a number of simple and fundamental results. The strength of our argument is that we can infer results from the known behaviour of classical automata, without needing to know any details of implementation. The vision of autonomic computing is thus shown to be a non-trivial problem for classical deterministic computer science, without ever having to implement it. The thrust of our argument is to use the notion of convergence[9], [12], which enables policies to be regulated and maintained without human intervention.

The term np-hard is often used in the computing literature to scare us into believing that a problem is insoluble. The problems we present are np-hard (or even pspace-hard), as noted also by Sun and Couch[3]. Does this mean that there is no real hope of building autonomic systems? Clearly this is not true, as cfengine (a widely used autonomic management system) shows. Rather, it tells us that we need to limit either the scope or the expectations to something that can be solved. The vision offered by autonomics and its less marketed forerunner, computer immune systems[13], [14], [15], is extensive and ambitious, but we know that simple cellular systems performing basically dumb repeated operations do successfully regulate biological organisms, albeit imperfectly and approximately. Our work shows that the best strategy for realizing this vision is to limit the scope of system policies to those areas where solutions are known, by whatever means. A brute-force search for implementing a generic wish-list of autonomic behaviours would be an inadvisable strategy.

We are grateful to Jan Bergstra for helpful discussions. This work is supported by the EC IST-EMANICS Network of Excellence (#26854).

References

[1] A. Brown, A. Keller, and J. Hellerstein. A model of configuration complexity and its application to a change management system. In Proceedings of the IXth IFIP/IEEE International Symposium on Integrated Network Management (IM 2005), pages 631–644. IEEE, 2005.
[2] M. Burgess and L. Kristiansen. On the complexity of change and configuration management. In Handbook of Network and System Administration. Elsevier, 2007.
[3] A. Couch and Y. Sun. On observed reproducibility in network configuration management. Science of Computer Programming, 53:215–253, 2004.
[4] Y. Sun and A. Couch. Complexity of system configuration management. In Handbook of Network and System Administration. Elsevier, 2007.
[5] M. Burgess. Configurable immunity model of evolving configuration management. Science of Computer Programming, 51:197, 2004.
[6] M. Burgess and A. Couch. Autonomic computing approximated by fixed point promises. In Proceedings of the 1st IEEE International Workshop on Modelling Autonomic Communications Environments (MACE), pages 197–222. Multicon Verlag, ISBN 3-930736-05-5, 2006.
[7] M. Sloman. Policy driven management for distributed systems. Journal of Network and Systems Management, 2:333, 1994.
[8] M. S. Sloman and J. Moffet. Policy hierarchies for distributed systems management. Journal of Network and Systems Management, 11(9):1404, 1993.
[9] M. Burgess. A site configuration engine. Computing Systems (MIT Press: Cambridge, MA), 8:309, 1995.
[10] H. Lewis and C. Papadimitriou. Elements of the Theory of Computation, second edition. Prentice Hall, New York, 1997.
[11] M. Burgess. Analytical Network and System Administration: Managing Human-Computer Systems. J. Wiley & Sons, Chichester, 2004.
[12] A. Couch and N. Daniels. The maelstrom: Network service debugging via "ineffective procedures". In Proceedings of the Fifteenth Systems Administration Conference (LISA XV), page 63. USENIX Association, Berkeley, CA, 2001.
[13] S. Forrest, S. Hofmeyr, and A. Somayaji. Computer immunology. Communications of the ACM, 40:88–96, 1997.
[14] A. Somayaji, S. Hofmeyr, and S. Forrest. Principles of a computer immune system. In New Security Paradigms Workshop, pages 75–82. ACM, September 1997.
[15] M. Burgess. Computer immunology. In Proceedings of the Twelfth Systems Administration Conference (LISA XII), page 283. USENIX Association, Berkeley, CA, 1998.
