Dynamic Multiple Fault Diagnosis: Mathematical ... - CiteSeerX

0 downloads 0 Views 327KB Size Report
expressed in this paper are solely those of the authors and do not represent those of the sponsor. ...... were varied between 0.0049-0.0051 and 0.00025-0.00033, ..... binary case, we have used the notation δk(0) = δk(xi(k) = 0) and δk(1) .... [12] D. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, MA,. USA, 2nd ...
1

Dynamic Multiple Fault Diagnosis: Mathematical Formulations and Solution Techniques Satnam Singh, Anuradha Kodali, Kihoon Choi, Krishna Pattipati, Fellow, IEEE, Setu Madhavi Namburu, Shunsuke Chigusa Sean, Danil V. Prokhorov and Liu Qiao

Abstract— Imperfect test outcomes, due to factors such as

level facilitates coordination among each of the subproblems,

unreliable sensors, electromagnetic interference, and environ-

and can thus reside in a vehicle-level diagnostic control unit.

mental conditions, manifest themselves as missed detections

At the bottom level, we use a dynamic programming technique

and false alarms. This paper develops near-optimal algorithms

(specifically, the Viterbi decoding or Max-sum algorithm) to solve

for dynamic multiple fault diagnosis (DMFD) problems in the

each of the subproblems. The key advantage of our approach is

presence of imperfect test outcomes. The dynamic diagnostic

that it provides an approximate duality gap, which is a measure of

inference problem is to determine the most likely evolution of

suboptimality of the DMFD solution. Interestingly, the perfectly-

component states, the one that best explains the observed test

observed DMFD problem leads to a dynamic set covering

outcomes.

problem, which can be approximately solved via Lagrangian

Here, we discuss four formulations of the DMFD problem.

relaxation and Viterbi decoding. Computational results on real-

These include the deterministic situation corresponding to a

world problems are presented. A detailed performance analysis

perfectly-observed coupled Markov decision processes, to several

of the proposed algorithm is also discussed.

partially-observed factorial hidden Markov models ranging from the case where the imperfect test outcomes are functions of tests

I. I NTRODUCTION

only to the case where the test outcomes are functions of faults and tests, as well as the case where the false alarms are associated intractable NP-hard combinatorial optimization problems. We

O

solve each of the DMFD problems by decomposing them into

condition-based and opportunistic maintenance, and to reduce

separable subproblems, one for each component state sequence.

maintenance and operational costs by seamlessly integrating

Our solution scheme can be viewed as a two-level coordinated

the on-board and off-line diagnosis, thereby reducing trou-

with the nominal (fault-free) case only. All these formulations are

solution framework for the DMFD problem. At the top (coordination) level, we update the Lagrange multipliers (coordination variables, dual variables) using the subgradient method. The top

N-LINE vehicle health monitoring and fault diagnosis is essential to improve the vehicle availability via

bleshooting time. During on-line (dynamic) fault diagnosis, the test outcomes are obtained over time as compared to static fault diagnosis where the observed test outcomes are available

This work was supported, in part, by the Toyota Technical Center and the Office of Naval Research under contract no. 00014-00-1-0101. Any opinions expressed in this paper are solely those of the authors and do not represent those of the sponsor. Singh, Kodali, Choi, and Pattipati are with the Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-2157, USA (e-mail: krishna/[email protected]). Namburu, Sean, Prokhorov, and Qiao are with the Toyota Technical Center,1555 Woodridge, RR #7, Ann Arbor, MI 48105, USA. Versions of this paper were published at the IEEE Aerospace Conference, Big Sky, Montana, March 2007 and the DX-07 Workshop, Nashville, TN, May 2007.

as a block. On-line vehicle health monitoring heavily relies on extensive processing of data in real-time, which is made possible by smart on-board sensors. Using these intelligent sensors, the system parameters that are essential to vehicle fault diagnosis can be transmitted to an on-board diagnostic inference engine. A significant technical challenge in on-board vehicle health monitoring is the quality of tests. Generally, the tests are imperfect due to unreliable sensors, electromagnetic interference,

2

environmental conditions, or aliasing inherent in the signature

algorithm. The DMFD problem is decoupled into a set of

analysis of on-board tests. The imperfect tests introduce ad-

parallel subproblems (involving dynamic single HMM state

ditional elements of uncertainty into the diagnostic process:

estimation problems) using Lagrange multipliers. A dynamic

the pass outcome of a test does not guarantee the integrity

programming technique (the Viterbi algorithm) is used to solve

of components under test because the test may have missed

each of the subproblems, and their solutions are used to update

a fault; on the other hand, a fail outcome of a test does not

the Lagrange multipliers via the subgradient method. Feasible

mean that one or more of the implicated components are faulty

(primal) solutions are constructed using the dual solutions. In

because the test outcome may have been a false alarm. Hence,

Sections V to VII, we discuss the details of DMFD problems 2,

it is desired that an on-board diagnostic algorithm should be

3 and 4, respectively. On-line DMFD problem is solved using

able to accommodate missed detections and false alarms in

a sliding window method, which is presented in Section VIII.

test outcomes. The performance of on-board diagnosis can

The simulation results of DMFD problem 1 is performed on

be improved by incorporating the knowledge of reliabilities

several real-world datasets to validate our approach. Section

of tests, and by incorporating temporal correlations of test

IX discusses the simulation results of both block and on-line

outcomes.

DMFD problems. Finally, the paper concludes with a summary

The hidden Markov model (HMM) is a natural choice

and future research directions in Section X.

here to represent the individual component states of the system. The HMM is a doubly-embedded stochastic process

II. P REVIOUS WORK

with an underlying unobservable (hidden) stochastic process

The multiple fault diagnosis (MFD) problem originates in

(individual component state evolution), which can be observed

several fields, such as medical diagnosis [1], error correcting

through another set of stochastic processes (i.e., uncertain test

codes, speech recognition, distributed computer systems and

outcome sequences). The individual component state HMMs

networks [2]. The MFD problem in large-scale systems with

are coupled through the observation process. Consequently,

unreliable tests was first considered by Shakeri et al. in

the fault diagnosis problem corresponds to a factorial HMM,

[3]. They proposed near-optimal algorithms using Lagrangian

where each HMM characterizes the individual component

relaxation and subgradient optimization methods for the static

states of the system. The sequence of uncertain test outcomes

MFD problem. In the area of distributed system management,

are probabilistic functions of the underlying Markov chains

the MFD problem is studied by Odintsova et al. in [2].

characterizing the evolution of system states. Here, we inves-

They utilized an adaptive diagnostic technique, termed active

tigate the problem of determining the most likely states of

probing, for fault diagnosis and isolation. A probe can be

components, given a set of partial and unreliable test outcomes

viewed as a test in our terminology; the purpose of a probe is

over time.

to check the set of system components on the probed path. The probe outcomes determine if one or more of the components

A. Organization of the Paper

on the probed path are faulty or normal. Given the probe out-

The paper is organized as follows. In Section II, we

comes, a diagnostic matrix (D-matrix, diagnostic dictionary,

discuss the previous research on multiple fault diagnosis.

reachability matrix) defining the relationship among the probes

We formulate the NP-hard dynamic multiple fault diagnosis

and component faults, as well as the initial system state, they

(DMFD) problem with imperfect test outcomes in Section

developed a sequential multi-fault algorithm to diagnose the

III. Four formulations of the DMFD problem are also dis-

system state. They considered the probe outcomes as being

cussed in Section III. In Section IV, we decompose the

deterministic, which is analogous to the assumptions made in

DMFD problem formulation 1 using Lagrangian relaxation

our Problem 4, and in the work described in [4]-[6]. In [7],

3

Le et al. applied graphical model-based decoding algorithms

space; consequently, it is either computationally expensive or

to the MFD problem in the presence of unreliable tests. They

far from optimal. A major contribution of the present paper

proposed a suboptimal belief propagation algorithm used to

is that the missed-detection/false-alarm process is modeled as

decode low density parity check codes. They considered a

being a property of the component state: the model is perhaps

fault model, where tests are asymmetric, i.e., the D-matrix is

less realistic, but the computational benefit of a factorial HMM

not binary and the test outcomes are also unreliable, and they

is large.

termed it the Y model. Their implementation is parallel to

Another approach, developed by Ruan et al. [10], decom-

our Problem formulation 1; however, they considered only the

poses the original DMFD problem into a series of decoupled

static case.

subproblems, one for each epoch. For a single epoch MFD,

Dynamic single fault diagnosis problem using HMM for-

they developed a deterministic stimulated annealing (DSA)

malism was first proposed by Ying et al. [8], where it is

method, which is inspired by its sibling stochastic simulated

assumed that, at any time, the system has at most one compo-

annealing and the approximate belief revision (ABR) heuristic

nent state present. This modeling is somewhat unrealistic for

algorithm [1]. The single epoch MFD was extended to incor-

most real-world systems. Another version of the dynamic fault

porate component states of multiple consecutive epochs. In

diagnosis problem was studied in [9]: unknown probabilities

addition, they applied a local search and update scheme to

of sensor error, incompletely-populated sensor observations,

further smooth the “noisy” diagnoses stemming from imper-

and multiple faults were allowed, but the faults could only

fect test results, and, consequently increase the accuracy of

occur or clear once per sampling interval.

fault diagnosis.

In the dynamic single fault framework [8], a hidden Markov

The DMFD problem can be viewed as a factorial HMM

modeling framework was adopted, and a moving window

(FHMM), which is discussed in the machine learning literature

Viterbi algorithm was used to infer the evolution of com-

[11]. Here, the HMM state is factored into multiple state

ponent states. In the multiple fault case, the state space of

variables and represented in a distributed manner. The authors

hidden Markov model increases exponentially from (m + 1)

discussed an exact algorithm for inference computations in the

to 2m , where m is the number of possible component states.

FHMM framework. In this framework, inference and learning

Consequently, the HMM-based method would be viable only

involves computing the posterior probabilities of multiple

for small-sized systems. The solution method proposed in [9]

hidden layers (or states), given the test outcomes. However,

is a multiple hypothesis tracking approach, where at each

due to combinatorial nature of the hidden state representation,

observation epoch, k best component state configurations are

the exact algorithm is intractable. They presented approximate

stored. In that paper, the missed-detection/false-alarm process

inference algorithms based on Gibbs sampling and variational

was a property of the sensor rather than the fault, with the

methods. The latter methods are similar to Lagrangian relax-

effect that the underlying inference process could not be

ation, although motivated from a Fenchel duality perspective

decoupled into a factorial HMM. In [9], at each epoch, all

[1], [3], [4], [12].

candidate fault sets, derived from the previously identified

Here, we extend the work of Ruan et al. [10], Shakeri et al.

faults are listed, based on at most one change per epoch

[3] and Tu et al. [4] on MFD to solve the DMFD problem by

assumption. Then, of all k(m + 1) possible candidate sets,

combining the Viterbi algorithm and Lagrangian relaxation in

each has its score calculated, the candidate set which obtains

an iterative way. Depending on the probabilistic assumptions

the highest score is selected as the inference result at the

on fault-test relationships and test outcomes, one obtains

epoch, and the candidates with the k-best scores are updated.

various DMFD formulations. In summary, the contributions

The method is equivalent to enumeration in a limited search

of this paper are: (1) A primal-dual optimization framework

4

Component m (HMM m)

xm ( k − 1)

xm ( k )

xm ( k + 1)

. . .

. . .

. . .

. . .

Component 2 (HMM 2)

x2 (k −1)

x2 (k )

x2 (k + 1)

x1 (k − 1)

x1 (k )

x1 (k + 1)

Op(k-1)

Op(k)

Op(k+1)

Of(k-1)

Of(k)

Of(k+1)

k-1

k

Hidden component states Component 1 (HMM 1)

Op: Passed test outcomes Of : Failed test outcomes Time

Fig. 1.

k+1

DMFD problem viewed as a factorial hidden Markov model (FHMM)

to solve the DMFD problem; (2) Four formulations of the

sources) associated with the system. The state of component

DMFD problem along with their solutions; (3) Simulation

si is denoted by xi (k) at epoch k, where xi (k) = 1 if

results on several real world systems for the first and most

failure source si is present; xi (k) = 0, otherwise. Here,

general formulation of the DMFD problem; (4) A comparison

κ = {0, 1, ..., k, ...K} is the set of discretized observation

of the results between the subgradient and the deterministic

epochs. The status of all component states at epoch k is

simulated annealing methods [10]; and (5) Simulation results,

denoted by x(k) = {x1 (k), x2 (k), ..., xm (k)}. We assume that

along with performance analysis, of the on-line DMFD prob-

the initial state x(0) is known (or its probability distribution

lem using a sliding window method.

is known). Our problem is to determine the time evolution of compo-

III. DMFD P ROBLEM F ORMULATIONS

nent states based on imperfect test outcomes observed over

The dynamic multiple fault diagnosis problem consists of

time. Fig. 1 shows the DMFD problem viewed as a FHMM.

a set of possible component states in a system, and a set

The hidden component state of ith HMM at time epoch k is

of binary test outcomes that are observed at each sample

denoted by xi (k). Each component state xi (k) is modeled

(observation, decision) epoch. Component states are assumed

as a two-state HMM. The observations at each epoch are

to be independent. Each test outcome provides information on

subsets of binary outcomes1 of tests O = {o1 , o2 , ..., on },

a subset of the component states. At each sample epoch, a

i.e., oj ∈ {pass, f ail} = {0, 1}. Fig. 2 shows the DMFD

subset of test outcomes is available. Tests are imperfect in the

problem as a tri-partite digraph at epoch k. Component states,

sense that the outcomes of some of the tests could be missing,

tests and test outcomes represent the nodes of the digraph.

and tests have missed-detection/false-alarm processes associ-

Here, the true states of the component states and tests are

ated with them. The observations consist of imperfect binary

hidden. P = {P d, P f } represents a set of probabilities of

test outcomes, and are characterized by sets of passed test

detection and false alarm, which is defined differently for

outcomes, Op and failed test outcomes, Of . Formally, we rep-

each of the DMFD problem formulations. We also define the

resent the DMFD problem as DM = {S, κ, T, O, D, P, Π}, where S = {s1 , ..., sm } is a finite set of m components (failure

1 Extension

forward.

to multi-valued component states and test outcomes is straight-

5

a failure source si and test tj . For notational convenience, Components

x1 (k )

x2 (k )

x3 (k )

x4 (k )

… xm ( k )

corresponding P dij = P fij = 0. This problem scenario

Hidden Tests

Test outcomes

when si does not affect the outcome of test tj , we let the

t1 (k )

t2 ( k )

t3 ( k )

t4 ( k )



tn ( k )

frequently arises in medical fault diagnosis. For example, the QMR-DT (Quick Medical Reference, Decision-Theoretic) database used in the domain of internal medicine, contains

o1 (k )

o2 (k )

o3 (k )

o4 (k )



on (k )

approximately 600 disease nodes (faults or failure sources) and 4000 symptoms (tests) [1]. Each of the symptoms could have

Fig. 2.

Tri-partite digraph for DMFD problem

a probability pair (P dij , P fij ) associated with the symptom and the disease node. Fig. 3 shows the bi-partite graph, where

matrix D = [dij ] as the dependency matrix (D-matrix), which

the edges represent the probability pair (P dij , P fij ). These

represents the full-order dependency among failure sources

probabilities can be obtained from the tri-partite digraph (Fig.

and tests.

2) using the total probability theorem as follows:

Each component state is modeled as a two-state nonhomogenous Markov chain. For each component state, e.g., for component si at epoch k, Π = (P ai (k), P vi (k)) denotes the set of fault appearance probability P ai (k) and

Pr(oj (k)|xi (k)) =

X

Pr(oj (k), tj (k)|xi (k))

tj ∈{0,1}

=

X

Pr(oj (k)|tj (k)) Pr(tj (k)|xi (k))

(1)

tj ∈{0,1}

fault disappearance probability P vi (k) defined as P ai (k) =

Problem 2: In situations where the probability of detection

Pr(xi (k) = 1|xi (k − 1) = 0) and P vi (k) = Pr(xi (k) =

(P dij ) is associated with each failure source-test pair, but the

0|xi (k − 1) = 1). These probabilities are required to model

false alarm probability is specified only for the normal system

the intermittent faults. Here, T = {t1 , t2 , ..., tn } is a finite set

state, i.e., P fj = P (oj (k) = 1|x1 (k) = 0, ..., xm (k) = 0), we

of n available binary tests, where the integrity of the system

obtain a slightly complicated variation of Problem formulation

can be ascertained. We denote the set of passed tests, Tp

1 (in terms of computational complexity, but not in terms

and failed tests Tf . At each observation epoch, k, k ∈ κ,

of parameterization). This type of scenario arises when we

test outcomes upto and including epoch k are available, i.e.,

design class-specific classifiers that distinguish between nor-

we let Ok = {O(b) = (Op (b), Of (b))}kb=1 , where Ok is the

mal system operation and failure source, si only, or when the

set of observed test outcomes at epoch k, with Op (b)(⊆ O)

false alarms are defined on an overall system basis. Here, the

and Of (b)(⊆ O) as the corresponding outcomes of sets of

probability pair (P dij ,P fj ) is associated with test outcomes to

passed and failed tests at epoch b, respectively. The tests are

model imperfect test outcomes [3]. This model is also called

partially observed in the sense that outcomes of some tests

the Z model in [7]. Similar to problem 1, the probability pair

may not be available, i.e., (Op (b) ∪ Of (b)) ⊂ O. In addition,

(P dij ,P fj ) is shown as edges between the hidden component

tests exhibit missed detections and false alarms. Here, we also

states and test outcomes in Fig. 3, and they can be obtained

make the noisy-OR (“causal independence”) assumption [13].

from the tri-partite digraph (Fig. 2) using the total probability

The DMFD problem can be formulated in the following ways,

theorem on the nodes of test layer.

arranged from the general to simplified:

Problem 3: When the detection probability (P dj ) and false

Problem 1: When the probability of detection (P dij ) and

alarm probability (P fj ) are associated with each test tj only.

false alarm probability (P fij ) are associated with each failed

The probability pair (P dj ,P fj ) is shown as the edges between

test and each failure source, i.e., P dij = Pr(oj (k) =

the tests and test outcomes in the tri-partite digraph (Fig.

1|xi (k) = 1) and P fij = Pr(oj (k) = 1|xi (k) = 0) of

3). This formulation is quite useful in classifier fusion using

6

{x(1), ..., x(K)}, that best explains the observed test outcome Components

x1 (k )

x2 (k )

x3 (k )

x4 (k )



xm (k )

Hidden

sequence OK . We formulate this as one of finding the maximum a posteriori (MAP) configuration:

Test outcomes

o1 (k )

o2 (k )

o3 (k )

o4 (k )



b K = arg max Pr(X K |OK , x(0)). X

on (k )

XK

Fig. 3.

Bi-partite graph for the DMFD problem

(2)

Applying the Bayes rule in (2), the objective function is equivalent to b K = arg max Pr(OK |X K , x(0))Pr(X K |x(0)). X

error correcting codes. In the error correcting code (ECC)

XK

matrix, each column corresponds to a binary classifier with the associated (P dj ,P fj ) pair, which are learned during training and validation. In this case, the fault-test relationships are deterministic, but the test outcomes are unreliable and depend on the concomitant test only. This type of formulation is also

With passed and failed test outcomes being conditionally independent given the status of component states (“the noisy OR assumption”), and the Markov property of component state evolution, the problem is equivalent to: b K = arg max X

considered in [9].

XK

This formulation provides a nice vehicle for the dynamic

K Y

{Pr(Op (k)|x(k)) ·

k=1

Pr(Of (k)|x(k)) · Pr(x(k)|x(k − 1))},

fusion of classifiers, where each column of the ECC matrix is

(3)

a classifier, and their associated probability pairs (P dj ,P fj )

where Op (k) ⊆ O and Of (k) ⊆ O denote the sets of passed

are uncertainties associated with classifier outcomes. When

and failed test outcomes at epoch k, respectively. We define a

the learned parameters and the ECC matrix are fed as an

new function fk (x(k), x(k − 1)) as

input to the DMFD algorithm, it performs dynamic fusion of

fk (x(k), x(k − 1)) = ln{Pr(Op (k)|x(k)) ·

classifier outputs over time. Note that the sampling interval

Pr(Of (k)|x(k)) · Pr(x(k)|x(k − 1))}.

(4)

of the dynamic fusion algorithm can be different from the sampling interval of the raw sensor data. Problem 4: This is the deterministic case when tests are perfect i.e. P dij = 1 and P fij = 0 [4]. This formulation reduces the tri-partite digraph in Fig. 2 to a bi-partite graph between the components and tests. This scenario is useful in situations where the tests are highly reliable (e.g., automated testing of electronic cards), and leads to a novel dynamic set covering problem. Next, we discuss the DMFD formulations in detail.

Given the component state status, x(k), the test outcomes are independent. Consequently: Y Pr(Op (k)|x(k)) =

In this problem, we assume that the detection and false alarm probabilities (P dij , P fij ) are associated with each failure source and each test. Fig. 4 illustrates these probabilities.

(5)

Pr(oj (k) = 1|x(k)).

(6)

and Y

Pr(Of (k)|x(k)) =

oj (k)∈Of (k)

For test tj to pass at epoch k, it shall pass on all its associated component states, so that Pr(oj (k) = 0|x(k)) =

IV. DMFD P ROBLEM 1

Pr(oj (k) = 0|x(k)),

oj (k)∈Op (k)

m Y

Pr(oj (k) = 0|xi (k))

(7)

i=1

where

  1 − P f , x (k) = 0; ij i Pr(oj (k) = 0|xi (k)) =  1 − P d , x (k) = 1; ij i

= (1 − P dij )xi (k) (1 − P fij )1−xi (k) , xi (k) ∈ {0, 1}.

(8)

The DMFD problem is one of finding, at each decision epoch k, the most likely fault state candidates x(k) ∈ {0, 1}m , i.e., the fault state evolution over time, X K =

Evidently, Pr(oj (k) = 1|x(k)) = 1 − Pr(oj (k) = 0|x(k)).

(9)

7

Normal state

Component

+xi (k − 1)xi (k) ln(1 − P vi (k))}.

Faulty state

xi (k ) = 0

xi (k ) = 1

x(k), x(k − 1) ∈ {0, 1}m

1 − Pdij 1 − Pf ij Pf ij Test Outcome

(13)

Pdij

o j (k ) = 0

The primal DMFD problem posed in (12) and (13) is NP-

o j (k ) = 1

hard. Indeed, even the single epoch problem, i.e., x b(k) = arg max fk (x(k), x b(k − 1)), is NP-hard [3], which, for all

Fig. 4.

x(k)

Detection and false alarm probabilities for problem 1

practical purposes, means that it cannot be solved to optimality within a polynomially bounded computation time.

In the same vein, the assumption of independent evolution of component states leads to: Pr(x(k)|x(k − 1)) =

m Y

A. Primal-dual optimization framework Pr(xi (k)|xi (k − 1))

(10)

i=1

The NP-hard nature of the primal DMFD problem motivates us to decompose it into a primal-dual problem using

where,

a Lagrangian relaxation approach. By defining new variables

Pr(xi (k)|xi (k − 1)) =    1 − P ai (k) xi (k − 1) = 0, xi (k) = 0;      P a (k) xi (k − 1) = 0, xi (k) = 1; i  P v (k)  xi (k − 1) = 1, xi (k) = 0; i      1 − P v (k) x (k − 1) = 1, x (k) = 1; i i i

and constraints, the DMFD problem reduces to a combinatorial optimization problem with a set of equality constraints. The constraints are relaxed via Lagrange multipliers. The relaxation procedure generates an upper bound for the objective function. The procedure of minimizing the upper

Equivalently,

bound via a subgradient optimization produces a sequence of

Pr(xi (k)|xi (k − 1)) = (1 − P ai (k))

(1−xi (k−1))(1−xi (k))

·

dual feasible and the concomitant primal feasible solutions

P ai (k)(1−xi (k−1))xi (k) · P vi (k)xi (k−1)(1−xi (k)) ·

to the DMFD problem. If the objective function value for the best feasible solution and the upper bound are the same,

(1 − P vi (k))xi (k−1)xi (k) ;

the feasible solution is the optimal solution. Otherwise, the xi (k − 1), xi (k) ∈ {0, 1}.

(11)

So, the problem that is equivalent to (3) is as follows b K = arg max X XK

K X

fk (x(k), x(k − 1)).

difference between the upper bound and the feasible solution, termed the approximate duality gap, provides a measure of

(12)

k=1

where

suboptimality of the DMFD solution; this is a key advantage of our approach. Another advantage of the primal-dual method is that, although the primal DMFD problem is not concave, the

fk (x(k), x(k − 1)) = m X

X

[xi (k) ln(1 − P dij ) + (1 − xi (k))ln(1 − P fij )]

oj ∈Op (k) i=1

dual DMFD problem is a piecewise convex function, which can be optimized via the subgradient method. In order to write the primal DMFD problem, we define new variables Y K = {y(1), y(2), ..., y(K)} and y(k) = {yj (k), ∀oj ∈ Of (k)} such

+

X

ln[1 −

oj ∈Of (k)

+

m X

m Y

(1 − P dij )xi (k) (1 − P fij )(1−xi (k)) ]

i=1

{(1 − xi (k − 1))(1 − xi (k)) ln(1 − P ai (k))

that ln yj (k) = where,

+xi (k − 1)(1 − xi (k)) ln(P vi (k))

i=1

µ

i=1

+(1 − xi (k − 1))xi (k) ln(P ai (k))

m P

cij = ln

cij xi (k) + ηj ,

1 − P dij 1 − P fij

¶ , ηj =

∀oj ∈ Of (k), m X

ln(1 − P fij ).

(14)

i=1

After simple algebraic manipulations of (13) and using (12)

8

Two-level Lagrangian relaxation approach

Original DMFD problem

Appending constraints (14) to (16) via Lagrange multipliers

Update Lagrange multipliers using subgradient method

{λj (k)}oj ∈Of (k) , the Lagrangian function L(X, Y, Λ) can be written as

... Solve subproblem 1 using binary Viterbi algorithm

...

K P

L(X, Y, Λ) =

fk (x(k), x(k − 1), y(k))

k=1

Solve subproblem m using binary Viterbi algorithm

P

+

λj (k)(ln yj (k) −

∀oj ∈Of (k)

m P

cij xi (k) − ηj ),

i=1

(18)

where Λ = {λj (k) ≥ 0, k ∈ (1, K), oj ∈ Of (k)} is the set of Fig. 5.

Decomposition of the original DMFD problem

Lagrange multipliers. In (19), Lagrange multipliers {λj (k)} are nonnegative despite equality constraints (14), because the

and (14), the primal problem can be written as K X

max J(X, Y ) = max

X K ,Y K

where

X K ,Y K

the

component

yj (k) need to be nonnegative. Using the Lagrange multiplier theorem, we optimize the Lagrangian function in (17) w.r.t.

fk (x(k), x(k − 1), y(k)),

k=1

state

(15) sequence

XK

is

yj (k) to obtain optimal yj∗ (k) as yj∗ (k) =

=

λj (k) . 1 + λj (k)

(19)

{x(1), x(2), ..., x(K)}. Here, the primal objective function for

The dual of the primal DMFD problem as posed in (15)-(17),

an individual component state i.e., fk (x(k), x(k − 1), y(k))

can be written as

is defined as

min m X

X

fk (x(k), x(k − 1), y(k)) =

Q(Λ)

Λ

subject to Λ = {λj (k) ≥ 0, k ∈ (1, K), oj ∈ Of (k)}

cij xi (k)

(20)

oj ∈Op (k) i=1

+

m X

X

µi (k)xi (k) +

i=1

where the dual function Q(Λ) is defined by

ln(1 − yj (k))

oj ∈Of (k)

+

m X

Q(Λ) = max L(X, Y, Λ).

σi (k)xi (k − 1) + γ(k) + g(k)

Substituting (19) into (20) and simplifying further by rear-

i=1

+

m X

hi (k)xi (k)xi (k − 1)

(16)

i=1

ranging and combining the terms, we obtain the dual function as Q(X, Λ) = max

where, γ(k) = µ

X oj ∈Op (k)

XK

µ



(1 − P ai (k))(1 − P vi (k)) P ai (k)P vi (k)

g(k) =

m X

ln(1 − P ai (k)).

Qi (Λ) =

Qi (Λ).

(22)

i=1

K X

ξi (xi (k), xi (k − 1), λj (k)) +

k=1

ξi (xi (k), xi (k − 1), λj (k)) = Ã P P cij + µi (k) −



oj ∈Op (k)

,

1 wk (Λ) m

(17)

Note that the multiple HMMs are coupled here because their states are observed only via a set of test outcomes. In equation (16), the terms involving yj (k) and hi (k) show the coupling

(23)

! cij λj (k) xi (k)

oj ∈Of (k)

+σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1)

i=1

effects.

m X

Here

ηj ,

P ai (k) µi (k) = ln , 1 − P ai (k) µ ¶ P vi (k) σi (k) = ln , 1 − P ai (k) hi (k) = ln

(21)

X K ,Y K

(24)

and wk (Λ) = γ(k) + g(k) X + λj (k) ln λj (k) − λj (k)ηj ∀oj ∈Of (k)



X

∀oj ∈Of (k)

(1 + λj (k)) ln(1 + λj (k))

(25)

9

problems. This level facilitates coordination among each of

Step 1: Initialize Lagrange multipliers

the subproblems, and can thus reside in a diagnostic control

Λ = {λ j (k ), k ∈ (1, K ), j ∈ O f (k )}

unit. At the bottom level, we use the dynamic programming

Input fault universe Input test outcomes for K epochs

technique (the Viterbi algorithm) to solve each of the subprob-

1

lems with computational complexity O(K), i.e., we optimize ξi function in (24) to obtain the optimal state sequence x∗i Step 2:

Step 2:

Find optimal sequence for component state 1 . using Viterbi algorithm for fixed Λ l

..

Find optimal sequence for component state m using Viterbi algorithm for fixed Λ l

for each component state, given a fixed set of Lagrange multipliers Λ = {λj (k), k ∈ (1, K), oj ∈ Of (k)}. The Viterbi algorithm is a dynamic programming technique to find the most likely fault sequence [14]. It finds a recursive optimal solution to the problem of estimating the state sequence of a finite state Markov chain observed in memoryless noise. The

Step 3: Compute current primal and dual values, lower and upper bounds

key feature of Viterbi algorithm is that the objective function can be written as a sum of merit functions depending on one state and its preceding one. We obtain the optimal state

Step 4:

sequence for each component state, i.e. X ∗ = {x∗1 , x∗2 , ..., x∗m }

Update Lagrange multipliers via subgradient method

using a binary Viterbi algorithm. The key steps of the Viterbi algorithm are described in Appendix I.

Step 5: Meet stopping criteria?

No

Yes

B. Approximate and Exact Duality Gap After evaluating the optimum state sequence X ∗ for fixed Λ, the problem reduces to one of minimizing the dual function value Ql (Λ) = Q(X, Y, Λ) at iteration l, which is computed

Output: Feasible solution (most likely state sequence )

using (22)-(25). Denoting Q∗ as the optimal dual function value, i.e., Q∗ = Q(Λ∗ ) = min Q(Λ), where the dual problem Λ

is given by (22)-(25). The optimal primal solution is denoted Fig. 6.

Flow chart of the algorithm

by J ∗ = J(X ∗ , Y ∗ ) =

max J(X, Y ), where the primal

X K ,Y K

problem is given by (15)-(17). represents the dual function for the ith component. The main

The difference between the optimal dual and the primal

benefit of (22) is that now the original problem is separable.

function values, i.e. (Q∗ − J ∗ ) is termed the exact duality

As shown in Fig. 5, we employed the Lagrangian relaxation

gap. Since the DMFD problem is NP-hard, it is difficult to

method to decompose the original DMFD problem into m

obtain the global optimal solution J ∗ . However, we can obtain

separable subproblems, one for each component state sequence

several feasible solutions from the dual solution and select the

xi , where xi = (xi (1), xi (2)), ..., xi (K)), xi (k) ∈ {0, 1}

best feasible solution from the set. If Jf = J(Λ∗ , X f , Y f ) be

and i ∈ {1, m}. This scheme can be viewed as a two-level

the best feasible value, then we have

coordinated solution framework for the DMFD problem. At the top (coordination) level, we update Lagrange multipliers

Jf ≤ J ∗ ≤ Q∗ ≤ Ql .

(26)

Λ = {λj (k), k ∈ (1, K), oj ∈ Of (k)} using the subgradient

Using this method, we can obtain an approximate duality

method based on decoupled solutions of the individual sub-

gap Q∗ − Jf = (Q∗ − J ∗ ) + (J ∗ − Jf ) ≥ 0, which

10

Components Normal state x1 (k ) = 0

V. DMFD P ROBLEM 2

Normal state

...

Faulty state

xm (k ) = 0

In this formulation, we define P dij as P dij = Pr(oj (k) =

xi (k ) = 1

1|xi (k) = 1) and P fj = Pr(oj (k) = 1|x1 (k) = 0, x2 (k) =

1 − Pdij

0, ..., xm (k) = 0). This scenario is depicted in Fig. 7.

Pdij Test 1 − Pf j Outcome o j (k ) = 0

Fig. 7.

Pf j

Pr(oj (k) = 0|x(k)) =

o j (k ) = 1

(1 − P fj )(1−x1 (k))...(1−xm (k))

Detection and false alarm probabilities for problem 2

Using a new variable z(k) =

m Q i=1

provides an overestimate of the error between the global

m Q i=1

Pr(oj (k) = 0|x(k)) = (1 − P fj )z(k)

m Y

(1 − P dij )xi (k) ,

i=1

Taking log

bound Qlb as follows: If J(X ∗ (Λl ), Y (X ∗ (Λl )) ≥ Qlb then X f = X ∗ (Λl ),

(29)

(1 − xi (k)) and

optimal solution and the best feasible solution. To summarize, we update feasible solutions, i.e., X f , Y f , and the lower

(1 − P dij )xi (k)

ln z(k) =

Y f = Y (X ∗ (Λl )) and

m X

ln(1 − xi (k))

(30)

i=1

Qlb = Jf = J(X ∗ (Λl ), Y (X ∗ (Λl )).

ln(Pr(oj (k) = 0|x(k))) = z(k) ln(1 − P fj )

(27)

The upper bound Qub is obtained using the current dual value

+

m X

xi (k) ln(1 − P dij )

(31)

i=1

l

Q as follows

Following steps similar to those in Problem 1, we have Qub = Qmin = min(Qmin , Ql ).

(28) ln(Pr(Op (k)|x(k))) P = ln(Pr(oj (k) = 0|x(k)))

Since the dual function Ql (Λ) is a piecewise differentiable

oj (k)∈Op (k)

function of Lagrange multipliers Λ, this problem cannot be

P

=

solved using differentiable optimization algorithms. We use a subgradient algorithm to compute a sequence of upper

P

+

described in Appendix II. Fig. 6 shows the flow chart of our

= z(k)ηj (k) +

m X

algorithm. There are five major steps. In step 1, we initialize

and test information along with the associated probabilities (Pdij , Pfij , Pai and Pvi ). We also input test outcomes for the K epochs. In step 2, we run m binary Viterbi algorithms to obtain the optimal state sequences corresponding to the m f

f

faults. In step 3, we update the feasible solutions, i.e., X , Y , and the lower bound and upper bound i.e., Qlb and Qub using (24)-(26). Next, the Lagrange multipliers are updated using

algorithm outputs the most likely component state sequence for the m components.

xi (k) ln(1 − P dij )

X

xi (k) ln(1 − P dij )

(32)

i=1 oj (k)∈Op (k)

P

where ηj (k) =

ln(1 − P fj ), and z(k) is defined

oj (k)∈Op (k)

in (30). For failed tests

P

ln(Pr(Of (k)|x(k))) = =

P

ln(Pr(oj (k) = 1|x(k)))

oj (k)∈Of (k)

ln(1 − yj (k))

oj (k)∈Of (k)

where yj (k) = Pr(oj (k) = 0|x(k)) and using (31), ln(yj (k)) = z(k) ln(1 − P fj ) +

the subgradient method, which is described in Appendix II. If stopping criteria, defined in Appendix II, are met, then the

m P

oj (k)∈Op (k) i=1

bounds for Ql (Λ)[12]. The details of subgradient method are

the Lagrange multipliers and the input fault universe i.e., fault

z(k) ln(1 − P fj )

oj (k)∈Op (k)

m X

xi (k) ln(1 − P dij ) (33)

i=1

Here the DMFD problem is equivalent to b K = arg max X XK

K X k=1

fk (x(k), x(k − 1), y(k), z(k))

(34)

11

where, the primal objective function for an individual compo-

and optimizing w.r.t. z(k), we obtain optimal z ∗ (k) as

nent state i.e., fk (x(k), x(k − 1), y(k), z(k)) is defined as fk (x(k), x(k − 1), y(k), z(k)) = z(k)ηj (k) m X X + xi (k) ln(1 − P dij )

z ∗ (k) =

P

−ηj (k) +

µ(k) . λj (k) ln(1 − P fj )

(39)

∀oj ∈Of (k)

The dual function Q(Λ) of problem 4 is defined by

i=1 oj (k)∈Op (k)

X

+

ln(1 − yj (k)) +

+

σi (k)xi (k − 1) +

i=1

Q(Λ) =

τi (k)xi (k)

i=1

oj (k)∈Of (k) m X

m X

m X

L(X, Y, z, Λ).

(40)

Substituting (38), (39) into (37) and simplifying further by

hi (k)xi (k)xi (k − 1) + g(k) (35)

i=1

rearranging and combining the terms, we obtain the dual function as

where ln z(k) =

m X

Q(Λ) = max

ln(1 − xi (k)),

XK

i=1

X

ηj (k) =

ln(yj (k)) = z(k) ln(1 − P fj ) + µ τi (k) = ln µ σi (k) = ln µ hi (k) = ln

m X

Qi (Λ) = xi (k) ln(1 − P dij ),

i=1

P ai (k) 1 − P ai (k) P vi (k) 1 − P ai (k)

g(k) =

m X

, ¶ ¶

ln(1 − P ai (k)).

(36)

1 wk (λj (k), µ(k)) m

oj (k)∈Op (k)

X



λj (k)xi (k) ln(1 − P dij ) + σi (k)xi (k − 1) +hi (k)xi (k)xi (k − 1) − µ(k) ln(1 − xi (k)) (43)

and

fk (x(k), x(k − 1), y(k), z(k))

k=1

Ã

+µ(k) ln z(k) −

m X

wk (λj (k), µ(k)) =



!

 µ(k) 

ln(1 − xi (k))

i=1

−ηj (k) +

X

+

λj (k) (ln yj (k) − z(k) ln(1 − P fj )) m X

∀oj ∈Of (k)

[λj (k) ln λj (k) − (1 + λj (k)) ln(1 + λj (k))]



xi (k)λj (k) ln(1 − P dij )

(37)

∀oj ∈Of (k) i=1

 −µ(k) 

X

∀oj ∈Of (k)

where Λ = {µ(k), λj (k) ≥ 0, k ∈ (1, K), oj ∈ Of (k)} is



ηj (k) + ln(µ(k))  P  + g(k) λj (k) ln(1 − P fj )

∀oj ∈Of (k)

∀oj ∈Of (k)

X

(42)

oj (k)∈Of (k)

can be written as



ξi (xi (k), xi (k − 1), λj (k), µ(k))

ξi (xi (k), xi (k − 1), λj (k), µ(k)) X = xi (k) ln(1 − P dij ) + xi (k)τi (k)

,

µ(k), {λj (k)}j∈Of (k) , the Lagrangian function L(X, Y, z, Λ)

X

(41)

and

,

Appending constraints (30) and (33) via Lagrange multipliers

K X

Qi (Λ)

i=1

k=1

+

i=1

L(X, Y, z, Λ) =

K X



(1 − P ai (k))(1 − P vi (k)) P ai (k)P vi (k)

m X

where

ln(1 − P fj ),

oj (k)∈Op (k)

+

max

X K ,Y K ,z

 λj (k) ln(1 − P fj )  P  (44) −ηj (k) + λj (k) ln(1 − P fj ) ∀oj ∈Of (k)

the set of Lagrange multipliers. Using the Lagrange multiplier

The dual problem posed in (40)-(44) is separable and it can

theorem, we optimize the Lagrangian function in (34) w.r.t.

be solved by following a procedure similar to that used for

yj (k) to obtain optimal yj∗ (k) as

solving Problem 1. The only difference is that we also need

yj∗ (k) =

λj (k) , 1 + λj (k)

to update the Lagrange multiplier µ(k) using a subgradient (38)

method.

12

at least one component state xi (k) = 1 that satisfies dij = 1. Test

t j (k ) = 0

t j (k ) = 1

Thus, there must be one or more failure sources that cover the

1 − Pd j 1 − Pf j

failed tests. Let us consider a matrix A, which has each row

Pd j

representing the list of failure sources covered by a failed test.

Pf j Test Outcome

o j (k ) = 0

After excluding the failure sources covered by the passed tests,

o j (k ) = 1

the resulting matrix A is a binary matrix such that aij = dji . After substituting P dij = 1 and P fij = 0 in (13), the reliable Fig. 8.

Detection and false alarm probabilities for problem 3

test scenario with a binary D-matrix simplifies to a dynamic set covering problem with the following objective function term

VI. DMFD P ROBLEM 3

at epoch k:

In this formulation, we consider the case where the probabilities of detection and false alarm (P dj , P fj ) are asso-

fk (x(k), x(k − 1)) =

m X

+hi (k)xi (k)xi (k − 1)} + g(k) (47)

ciated only with each test tj (see Fig. 8). Formally, P dj = Pr(oj (k) = 1|tj (k) = 1) and P fj = Pr(oj (k) = 1|tj (k) = 0). We can convert these probabilities into a special case of Problem 1 by computing (P dij ,P fij ) using (1): P dij = (dij )P dj + (1 − dij )P fj

{µi (k)xi (k) + σi (k)xi (k − 1)

i=1

subject to following constraints: A(k)x(k) ≥ e for tj (k) ∈ Tf (k) where e is a vector of one’s. Appending constraints to (47) via Lagrange multipliers

(45)

Λ = {λj (k) ≤ 0, k ∈ (1, K), tj ∈ Tf (k)}, the Lagrangian function L(X, Λ) can be written as

Similarly P fij = (dij )P fj + (1 − dij )P dj

L(X, Λ) =

(46)

fk (x(k), x(k − 1))

k=1

Here, D = [dij ] is the dependency matrix (D-matrix). The

+

P

µ ¶ m P λj (k) 1 − aji xi (k) i=1

∀tj ∈Tf (k)

solution of Problem 3 can be obtained by substituting P dij and P fij in (45)-(46) in the solution of Problem 1.

K P

(48)

After rearranging the terms, the Lagrangian function of the original problem is shown as a sum of the Lagrangian func-

VII. DMFD P ROBLEM 4

tions of each subproblem as follows:

Next, we consider the case when the system consists of

L(X, Λ) =

Li (xi (k), Λ)

where K P

P

equivalently, the D-matrix completely characterizes the fault-

Li (xi (k), Λ) = µi (k)xi (k) −

test relationships [4]. This formulation can be represented as

+σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1) Ã ! P 1 λj (k) + m g(k) +

a bipartite graph between the components and tests. In this case, if some tests have passed, then we can infer that all the failure sources covered by these tests are good components.

(49)

i=1

reliable tests, and the fault-test relationships are deterministic, i.e. P dij = 1 and P fij = 0 for i = 1, ..., m and j = 1, ..., n or

m X

aji xi (k)

k=1 tj ∈Tf (k)

(50)

∀tj ∈Tf (k)

The dual function Q(Λ) is defined by

Thus, we need to infer failed components from those covered by the failed tests only, i.e., by excluding those

Q(Λ) = max L(X, Λ).

components covered by the passed tests. Consequently, the size

Simplifying further by rearranging and combining the terms,

of the DMFD problem can be reduced by removing all failure

we obtain the dual function as

sources {si |P fik = 0, P dik = 1, and tj (k) ∈ Tp (k)}. For each failed test tj (k) ∈ Tf (k), the optimal solution contains

XK

Q(Λ) = max XK

m X i=1

Qi (Λ)

(51)

(52)

13

where



Qi (Λ) =

K X

ξi (xi (k), xi (k − 1), λj (k)) +

k=1

1 wk (Λ), (53) m

Initialize component states at k = 2 using the results of previous window



Solve the online DMFD problem using the data from k = 2 to k = W + 1

ξi (xi (k), xi (k − 1), λj (k)) = µi (k)xi (k) +σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1) X λj (k)aji (k)xi (k) −



Step 3: Continue sliding the window until k = K − W + 1. (54)

tj ∈Tf (k)

and wk (Λ) = g(k) +

Make a decision at epoch k = 2

Selection of window size is a key issue and it depends on the system and fault behavior, i.e., permanent or intermittent

X

faults. λj (k).

(55)

∀tj ∈Tf (k)

The dual problem defined in (53)-(55) is separable. The Viterbi algorithm is used to solve each subproblem corresponding to each component state sequence {xi (k)}K k=1 . This algorithm can be viewed as a dynamic set covering problem, which is

IX. S IMULATIONS AND RESULTS We implemented and applied the solution of problem 1, the most general version of the DMFD problem formulation, to a small-scale system and few real world models.

NP-hard. Thus, the dynamic set covering problem is solved by combining the Viterbi algorithm and Lagrangian relaxation.

A. Small-scale system

This generalizes Beasley’s Lagrangian relaxation algorithm for

We randomly generated a small-scale system to illustrate

the static set covering problem [4], [15] to dynamic settings.

the inputs and outputs of our algorithm. The model was

We will explore the applications of this algorithm in our future

constructed for a system with 20 components, 20 tests and 20

work [17].

observation epochs. Each component can have binary states i.e. normal and faulty. The detection probabilities were set

VIII. S LIDING W INDOW DMFD M ETHOD During the online monitoring of a system, the observations and potential fault sequences are usually very long. Hence in order to reduce the amount of computation and storage, the DMFD problem is solved using a sliding window method. The window size, W is selected based on the performance criteria, such as low classification error and low false isolation rate. One of the key advantages of the sliding window method is that Lagrange multipliers are available W -1 samples ahead, which improves the speed of dual optimization. The sliding window method involves following steps: Step1: Solve the DMFD problem for the window size W (W ≤ K). Make a decision at epoch k = 1 Step 2: Move the window by 1 time epoch, i.e., k = 2 to k =W +1 •

between 0.7-0.9 and the false alarm probabilities were set between 0-0.02, and the tests uniformly cover the component states. The fault appearance and disappearance probabilities were varied between 0.0049-0.0051 and 0.00025-0.00033, respectively. These probabilities were chosen such that the average number of faults was 2 overa a span of 20 epochs. The true fault state set and accordingly the test outcomes of each epoch are generated using the above model parameters. The stopping criteria as defined in Appendix II were used for the subgradient method. We used the following metrics to evaluate the performance of our algorithm: Correct isolation rate (CI): CI is the percentage of true fault states which are detected by the algorithm at epoch k. Let x ˆ(k) be the fault state set at epoch k detected by algorithm, and r(k) is the true fault state set at epoch k. Then CI and average CI over all epochs are obtained as follows:

Initialize (W − 1) Lagrange multipliers using previous window

CI(k) =

|ˆ x(k) ∩ r(k)| |r(k)|

(56)

14

TABLE I S MALL - SCALE SCENARIO FOR SIMULATIONS Fault model (si , tj , P dij , P fij )

Pa

Pv

0.0050

0.00026

0.0049

0.00027

0.0050

0.00026

.

.

.

.

.

.

.

.

(19,18,0.88,0.016)

0.0051

0.00033

(20,11,0.82,0)

0.0051

0.00028

s1

(1,3,0.80,0.01)

(1,13,0.75,0)

(1,19,0.74,0)

s2

(2,10,0.86,0.01)

(2,12,0.88,0)

s3

(3,1,0.80,0.015)

(3,16,0.72,0.01)

.

.

.

.

.

.

.

.

.

.

s19

(19,12,0.85,0.015)

s20

(20,8,0.735,0.011)

Test outcomes k

No. of failed

Failed test outcomes Of (k)

tests

No. of passed

Passed test outcomes Op (k)

tests

1

1

3

19

1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

2

0

φ

20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

3

2

11 18

18

1 2 3 4 5 6 7 8 9 10 12 13 14 15 16 17 19 20

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

19

5

3 10 12 16 17

15

1 2 4 5 6 7 8 9 11 13 14 15 18 19 20

20

6

3 4 10 12 16 17

14

1 2 5 6 7 8 9 11 13 14 15 18 19 20

TABLE II

−45 Lower bound Upper bound Dual fun. current value

R ESULTS FOR SMALL - SCALE SCENARIO 13.2

Correct isolation rate (CI)

100.0

False isolation rate (F I)

0.0

Primal function value (J)

-68

Dual function value (Q)

-59

Computation time (sec) (t)

0.12

−55

−60 Bounds

Approximate duality gap (D) (%)

−50

−65

−70

−75

K P

CI =

−80

CI(k)

k=1

K

−85

(57)

0

5

10

15

20 Iterations

25

30

35

40

False isolation rate (FI): FI is percentage of fault states which are falsely detected by the algorithm as fault state at epoch k.

Fig. 9.

Approximate duality gap

FI and average F I are computed as F I(k) =

|ˆ x(k) ∩ ¬r(k)| |S| − |r(k)| K P

FI =

(58)

t19 with (P d1,3 , P f1,3 ) = (0.80, 0.01), (P d1,13 , P f1,13 ) =

F I(k)

k=1

K

example, state of component 1 can be detected by t3 , t13 and (0.75, 0), and (P d1,19 , P f1,19 ) = (0.74, 0), respectively. This

(59)

implies that when component state s1 occurs, test t3 detects

Table I shows only partial data (5 rows) of a small-scale exam-

it with probability 0.80, and if component state s1 does not

ple. Any component state can be detected by several tests. For

occur, test t3 has 0.01 probability of falsely implicating it. Sim-

15

TABLE III

sion system (Helitrans), and an engine simulator (EngineSim).

R EAL WORLD MODELS

Automotive

Details of these models are provided in [4]. Here, m, n, and c

m

n

P dij P fij

c, P ai

22

60

(0.85-0.95),

3, 9.13e-04

0-0.02

denote the number of components (failure sources), number of tests and the average number of intermittent faults that

Docmatch

257

180

(0.6-1),0

9, 3.12e-04

can occur over a span of 100 epochs. The fault appearance

Powerdist

96

98

(0.6-1),0

3,3.13e-04

probabilities (P ai ) were computed based on the average

Helitrans

34

51

(0.6-1),0

2, 2.95e-04

EngineSim

53

30

(0.6-1),0

2,5.68e-04

number of intermittent faults (c). These real-world systems are not ideal because they have fewer tests as compared to failure sources; hence, some failure sources are not covered

ilarly, test t13 detects s1 with probability 0.75, and has no false

by any tests. The fault disappearance probabilities (P vi ) were

alarms. If s1 is not present at epoch k, then at epoch k +1, the

varied between 0.0025-0.0049 to allow c intermittent faults,

probability of having s1 present is 0.005, while if s1 is present

on average. The probabilities of detection and false alarm

at epoch k, s1 has a probability 0.00026 of disappearing at

were varied as shown in Table III. Here, J , Q, D, CI, F I

epoch k + 1. The test outcomes at epochs k = 1, 2, 3, 19 and

and t denote the average primal function value, the average

20 are shown in the table; test outcomes at other epochs are not

dual function value, the average approximate duality gap, the

shown here for simplicity and for saving space. For example,

average correct isolation rate, the average false isolation rate

at k = 1, the outcome of tests t3 is observed as having failed,

and the average computation time per epoch. The maximum

while the outcome of all other tests except t3 is observed as

number of subgradient iterations was set at 80 and 100 Monte

having passed. The DMFD problem here is to identify the

Carlo runs were used to generate the test outcomes. Table

evolution of component states over the 20 epochs. The results

IV shows the results obtained using the subgradient (S) and

for this model are shown in Table II. Here, J, Q, D, CI, F I

the deterministic simulated annealing (DSA) [10] methods.

and t denote the primal function value, the dual function value,

The subgradient method (S) achieves higher correct isolation

the approximate duality gap, the correct isolation rate, the

rates as compared to (DSA) for all systems except Helitrans.

false isolation rate and the computation time per epoch. The

However, the DSA method achieves better primal function

primal and dual function values are computed using (15)-(17)

value and is also effective in reducing the computation time

and (22)-(25), respectively. The approximate duality gap (D)

(t). Also, note that we can obtain a hybrid duality gap by

is computed as a ratio of the difference between Q and J

taking the maximum primal solution from the subgradient (S)

divided by the absolute value of the primal feasible value J.

and the deterministic simulated annealing (DSA) methods and

The algorithms were implemented in MATLAB. We used a

the dual function value from the subgradient (S) method. The

standard PC having Pentium 4 Processor with 3.0 GHz clock

hybrid DSA-subgradient (HS) duality gaps are also shown in

speed and 512 MB RAM. The approximate duality gap is

Table V. The average computation time ( t ) is measured in

also shown in Fig. 9 is 13.2%. The duality gap reduces as the

seconds. These numbers are attractive practically, and they can

number of iterations increases, and the subgradient method

be further reduced significantly by a careful implementation

converges to the minimum dual function value.

in the C language. We also showed an application of the DMFD Problem 3

B. Real world data sets

formulation in our recent paper [16] where we performed

Table III illustrates the model parameters of an automotive

dynamic fusion of classifiers over time for automotive engine

system, a document matching system (Docmatch), a power

fault diagnosis. The temporal correlations considered by dy-

distribution system (Powerdist), a UH-60 helicopter transmis-

namic fusion improve classification accuracy over a variety of

16

TABLE IV R ESULTS ON REAL WORLD MODELS J

Q

D

CI

FI

t

(%) Automotive

Docmatch

Powerdist

Helitrans

EngineSim

S

-658

-481

27

99.5

0.05

0.43

DSA

-775





75

1.30

0.01

HS

-658

-481

27

S

-541

-311

42.5

88.2

0.36

4.53

DSA

-405





69

0.70

0.24

HS

-405

-311

23.2

S

-232

-125

46.1

91.6

0.75

1.56

DSA

-157





84

0.30

0.05

HS

-157

-125

20.3

S

-15

-14

6.7

94.8

0.31

0.47

DSA

-15





100

0.0

0.02

HS

-15

-14

6.7

S

-85

-33

61.1

95.1

2.28

0.64

DSA

-51





86

0.3

0.02

HS

-51

-33

35.3

Automotive system

Automotive system 20 False isolation rate

Correct isolation rate

100 80 60 40 20 0

15 10 5 0

1

5

10 15 20 Window size

25

1

Document matching system

25

2.5 False isolation rate

Correct isolation rate

10 15 20 Window size

Document matching system

100 80 60 40 20 0

2 1.5 1 0.5 0

1

Fig. 10.

5

5

10 15 20 Window size

25

Boxplots of CI and FI for automotive and document matching system

1

5

10 15 20 Window size

25

17

Power distribution system

Power distribution system 4 80

False isolation rate

Correct isolation rate

100

60 40 20 0

3 2 1 0

1

5

10 15 Window size

20

25

1

UH−60 helicopter transmission system

20

25

UH−60 helicopter transmission system

6 80

False isolation rate

Correct isolation rate

10 15 Window size

7

100

60 40 20

5 4 3 2 1

0

0 1

Fig. 11.

5

5

10 15 Window size

20

25

1

5

10 15 Window size

20

25

Boxplots of CI and FI for power distribution and UH-60 helicopter transmission system TABLE V

static fusion techniques (based on batch data).

T YPE OF FAULTS

C. Sliding Window DMFD Results Figs. 10, 11 and 12 show the boxplots of correct isolation (CI) and false isolation (FI) rates for various real-world

Fault disappearance probabilities

Fault behavior

Case 1

0.0247 to 0.0484

Intermittent

Case 2

6.8089e-004 to 9.2327e-004

Permanent

Case 3

0.2235 to 0.3926

Highly intermittent

models. These plots were obtained using the sliding-window DMFD method. The boxplot shows the dispersion of the data with lines at lower quartile (25%), median and upper quartile (75%). The

appearance probability was kept so that on average 3 faults occur over a span of 100 epochs.

whiskers are shown by extending the lines from each end of the

Fig. 13 illustrates the correct isolation rate for various fault

box and the maximum length of the line is kept as a function of

behaviors. The results show the mean value and the vertical

the inter-quartile range. The outliers are shown as ‘o’ in the

lines on the data points indicate the standard deviation. These

figures. The window size is selected such that it gives high

results were obtained using 1000 Monte Carlo runs. The

CI with a small boxsize and a low FI with a low boxsize.

results demonstrate that the algorithm achieves low variance

The most suitable window size using the above criterion for

when the fault behavior is highly intermittent.

Automotive, Docmatch, Powerdist, Helitrans, and EngineSim

False isolation rate plot (Fig.14) also illustrates the same

systems are 15, 10, 20, 15 and 15, respectively. Next, we

behavior as CI plot, i.e., the algorithm achieves the least

perform simulations to study the effect of intermittent faults

variance for the highly intermittent fault types. This illustrates

on the performance of the sliding-window DMFD method.

that the DMFD algorithm is highly suitable for intermittent

The automotive system is used for simulations. The fault

faults.

18

Window size Fig. 12.

Window size

Boxplots of CI and FI for engine simulator system

D. Complexity The algorithm presented here reduces the overall complexity

99

from O(K(2m )) to O(K(m + Of )) where m is the number

98

of component states, K is the number of epochs and Of is

97

the set of failed tests. More specifically, the complexities of

96

binary Viterbi algorithm over all component states and the

95

subgradient method are O(Km) and O(KOf ), respectively,

Correct isolation rate

100

per iteration; this is a substantial improvement over extensive 94 Intermittent Permanent Highly Intermittent

93

92 0

5

10

15

20

25

30

approaches based on exact inference. X. C ONCLUSION

Window size

In this paper, we discussed the problem of dynamic mulFig. 13.

Correct isolation rate for various fault behaviors

tiple fault diagnosis (DMFD) with imperfect tests. The original DMFD problem is an intractable NP-hard combinatorial optimization problem. Using a Lagrangian relaxation-based coordination framework, we decomposed the original DMFD

7

problem into parallel decoupled subproblems coordinated via

Intermittent Permanent Highly Intermittent

6

Lagrange multipliers. Each subproblem corresponds to finding False isolation rate

5

the optimal state sequence of a fault with fixed Lagrange multipliers. The subproblems were solved using a binary Viterbi

4

decoding algorithm. The coordination among the subproblems

3

was facilitated by Lagrange multipliers, which were updated 2

using a subgradient method. 1

0

We discussed four formulations of the DMFD problem. 0

5

10

15

20

Window size

Fig. 14.

False isolation rate for various fault behaviors

25

30

Analogous forms of these formulations have been studied widely in fault diagnosis community in a static context, and applied in various fields. Here, we provided a unified formulation of all the MFD formulations in a dynamic context. The

19

first formulation refers to a generalized version of the DMFD

states. The maximum function value of ξi in (24) at time

problem when the detection and false alarms probabilities are

k is denoted by δk (xi (k)), and the value of xi where the

associated with each test and fault. In the second formulation,

function value is maximum; is denoted by ψk (xi (k)). For

the false alarm probability is associated with fault-free case

binary case, we have used the notation δk (0) = δk (xi (k) = 0)

only. The solution to the second formulation was shown to be

and δk (1) = δk (xi (k) = 1). At time k = 1,

quite similar to that of problem formulation 1, except for the

δ1 (xi (1)) = ξi (xi (1), xi (0), {λj (1)}) P P = {µi (1) − cij λj (1) +

need to update an additional Lagrange multiplier. The third formulation considers the case where the uncertainties are as-

oj ∈Of (1)

cij }xi (1)

oj ∈Op (1)

+σi (1)xi (0) + hi (1)xi (1)xi (0)

sociated with only test outcomes. This models dynamic fusion of classifier outputs. In the fourth formulation, we considered

ψ1 (xi (1)) = φ; where xi (0) ∈ {0, 1}.

the deterministic case, which led to a novel dynamic set

(60)

covering problem. We implemented the algorithm on several real world data sets and the results validated the theory. The key advantage of our approach is that the method provides an

Recursion: The recursion step involves maximizing the objective function at each epoch k.

approximate duality gap, which is a worst case indicator of

δk (xi (k)) = {

the difference between the feasible solution and the optimal

annealing method. The latter provides better primal function



X

cij λj (k)xi (k)

oj ∈Of (k)

+

max

xi (k−1)∈{0,1}

[δk−1 (xi (k − 1)) + σi (k)xi (k − 1) +hi (k)xi (k)xi (k − 1)];

value as compared to the subgradient method. In this paper, we assumed that the DMFD model parameters are known

cij + µi (k)}xi (k)

oj ∈Op (k)

solution. Our results demonstrate that our algorithm achieves high isolation rate as compared to the deterministic simulated

X

(61)

for 2 ≤ k ≤ K; xi (k) ∈ {0, 1}

and faults evolve independently, and are coupled through the test outcomes via the diagnostic matrix (D-matrix). In our future work, we will implement techniques to learn the

ψk (xi (k)) =

arg max [δk−1 (xi (k − 1)) xi (k−1)∈{0,1}

+σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1))];

(62)

DMFD model parameters from the observed test outcome sequences and relax the independence assumption to solve the

Termination: This step computes the objective function at time

DMFD problem when faults are dependent. Coupled hidden

epoch k = K.

Markov models offer a promising platform for the solution

F∗ =

of dependent faults problem [18]. We will also focus on improving the primal solution using a soft Viterbi algorithm.

max

xi (K)∈{0,1}

{δK (xi (K))];

x∗i (K) = arg max [δK (xi (K))]

(63)

xi (K)∈{0,1}

A PPENDIX I

Optimal state sequence backtracking— The backtracking step

S OLVING SUBPROBLEMS USING V ITERBI A LGORITHM

computes the optimal state sequence by tracing the path

In this Appendix, we discuss the key steps of the Viterbi algorithm which is used to solve each subproblem corresponding to each component state sequence xi .

backwards. The optimal state x∗i (k) of ith fault at time epoch k is given by x∗i (k) = ψk+1 (xi (k + 1)∗ ),

k = K − 1, ...., 1.

(64)

Initialization: In this step, the objective function is computed at k = 1 for each node (component state). It is assumed

Similar to the recursion step, we can further simplify termi-

that the initial state x(0) is known for all the component

nation and backtracking for the binary case.

20

A PPENDIX II

[2] N. Odintsova, I. Rish, S. Ma. “Multifault Diagnosis in Dynamic Sys-

U PDATING LAGRANGE MULTIPLIERS VIA SUBGRADIENT

tems,” Proceedings of IM, 2005. [3] M. Shakeri, K. R. Pattipati, V. Raghavan and A. Patterson-Hine, “Op-

METHOD

timal and near-optimal algorithms for multiple fault diagnosis with unreliable tests,” IEEE Transactions on Systems, Man and Cybernetics-

Lagrange multipliers are updated via

Part C: Applications and Reviews, vol. 28, No. 3, August 1998.

λl+1 j (k)

=

max(0, λlj (k)



l

(k)dlj (k))

(65)

[4] F. Tu, K. R. Pattipati, S. Deb, and V. N. Malepati, “Computationally efficient algorithms for multiple fault diagnosis in large graph-based

for j ∈ Of (k) and k ∈ (1, K) where the subgradients dlj (k)

systems,” IEEE Transactions on Systems, Man and Cybernetics, vol.

at iteration l and at epoch k are

33, no. 1, pp. 73-85, January 2003.

dlj (k) = ln(yj∗ (k)) −

m X

[5] K. R. Pattipati and M. G. Alexandridis, “A heuristic search and infor-

cij x∗i (k) − ηj

(66)

mation theory approach to sequential fault diagnosis,” IEEE Trans. on Systems Man and Cybernetics, vol. 20, no. 4, pp. 872-887, 1990.

i=1

and step size β l (k)is

[6] V. Raghavan, M. Shakeri and K. R. Pattipati, “Optimal and near-optimal

(Ql − Q∗ ) β (k) = −υ T . f P l 2 (dj (k)) l

test sequencing algorithms with realistic test models,” IEEE Transactions

(67)

on Systems, Man, and Cybernetics: Part A - Systems and Humans, vol. 29, no. 1, pp. 11-27, January 1999.

j=1

[7] T. Le and C. N. Hadjicostis, “Graphical inference methods for fault

Since the optimal dual function value is not available, it is

diagnosis based on information from unreliable sensors,” Proceedings

estimated using the primal feasible solution Jf and best current

of Intl. Conf. on Control, Automation, Robotics and Vision, pp. 1012-

dual value Qmin using (27) and (28), respectively. We estimate the optimal dual function as ˆ ∗ = ω(Jf + Qmin ) Q 2

IEEE Trans. on SMC: Part C - Applications and reviews, 30, no. 4,

(68)

value Qmin doesn’t decreases in the previous 20 iterations of the subgradient procedure with the current value of υ, then υ is reduced by a factor. To improve the subgradient convergence, we also vary ω, which is increased or decreased based on whether the dual function value is decreasing or not [12]. We used the following stopping criteria for the subgradient

2000. [9] O. Erdinc, C. Raghavendra, and P. Willett, “Real-time diagnosis with sensors of uncertain quality,” SPIE Conference Proceedings, Orlando, 1490-1499, April 2003. [10] S. Ruan, Y. Zhou, F. Yu , K. R. Pattipati, P. Willett and A. PattersonHine, “Dynamic multiple fault diagnosis and imperfect tests,” IEEE Trans. on Systems, Man and Cybernetics: Part A, under review, June 2006. [11] Z. Ghahramani and M. I. Jordan, Factorial hidden Markov models, Machine Learning, Kluwer Academic Publishers, Boston, 1997. [12] D. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, MA,

method. Stop if

[8] J. Ying, T. Kirubarajan, and K. R. Pattipati, “A Hidden Markov model based algorithm for fault diagnosis with partial and imperfect tests,”

and initial value υ = 0.01 is used. If the best current dual



1017, Singapore, December 2006.

O Pf j=1

USA, 2nd Edition, 2003.

(dlj (k))2

= 0 since we cannot define a suitable

step size in this case, •

Stop if υ ≤ 10−4 because step sizes becomes too small,



Stop if number of iterations crossed the maximum no. of iterations i.e., l ≥ 100.

[13] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kauffmann Publishers Inc., San Mateo, CA, 1988. [14] D. Forney Jr., “The Viterbi algorithm,” Proc. IEEE, vol. 61, pp. 268-278, March 1973. [15] J. E. Beasley, “An algorithm for set covering problems,” European Journal of Operational Research, vol. 31, pp. 85-93, 1987.

R EFERENCES [1] F. Yu, F. Tu, H. Tu, and K. R. Pattipati, “Multiple disease (fault)

[16] S. Singh, K. Choi, A. Kodali, K. Pattipati, S. M. Namburu, S. Chigusa, D. V. Prokhorov, and L. Qiao, “Dynamic fusion of classifiers for fault diagnosis,” IEEE SMC Conference, Montreal, Canada, October 2007.

diagnosis with applications to the QMR-DT problem,” Proceedings

[17] A. Kodali, S. Singh, K. Choi, K. Pattipati, S. M. Namburu, S. Chigusa,

of Computing Communication and Control Technology International

D. V. Prokhorov, and L. Qiao, “Diagnostic Inference with Nearly Perfect

Conference, Austin, TX, 2004 and also to appear in IEEE Trans. on

Tests,” will be published in IEEE Aerospace Conference, Big Sky,

SMC: Part A,pp. 746-757, September 2007.

Montana, March 2008.

21

[18] M. Brand, “Coupled hidden markov models for modeling interacting processes,” Neural Computation, November 1996.