1
Dynamic Multiple Fault Diagnosis: Mathematical Formulations and Solution Techniques Satnam Singh, Anuradha Kodali, Kihoon Choi, Krishna Pattipati, Fellow, IEEE, Setu Madhavi Namburu, Shunsuke Chigusa Sean, Danil V. Prokhorov and Liu Qiao
Abstract— Imperfect test outcomes, due to factors such as
level facilitates coordination among each of the subproblems,
unreliable sensors, electromagnetic interference, and environ-
and can thus reside in a vehicle-level diagnostic control unit.
mental conditions, manifest themselves as missed detections
At the bottom level, we use a dynamic programming technique
and false alarms. This paper develops near-optimal algorithms
(specifically, the Viterbi decoding or Max-sum algorithm) to solve
for dynamic multiple fault diagnosis (DMFD) problems in the
each of the subproblems. The key advantage of our approach is
presence of imperfect test outcomes. The dynamic diagnostic
that it provides an approximate duality gap, which is a measure of
inference problem is to determine the most likely evolution of
suboptimality of the DMFD solution. Interestingly, the perfectly-
component states, the one that best explains the observed test
observed DMFD problem leads to a dynamic set covering
outcomes.
problem, which can be approximately solved via Lagrangian
Here, we discuss four formulations of the DMFD problem.
relaxation and Viterbi decoding. Computational results on real-
These include the deterministic situation corresponding to a
world problems are presented. A detailed performance analysis
perfectly-observed coupled Markov decision processes, to several
of the proposed algorithm is also discussed.
partially-observed factorial hidden Markov models ranging from the case where the imperfect test outcomes are functions of tests
I. I NTRODUCTION
only to the case where the test outcomes are functions of faults and tests, as well as the case where the false alarms are associated intractable NP-hard combinatorial optimization problems. We
O
solve each of the DMFD problems by decomposing them into
condition-based and opportunistic maintenance, and to reduce
separable subproblems, one for each component state sequence.
maintenance and operational costs by seamlessly integrating
Our solution scheme can be viewed as a two-level coordinated
the on-board and off-line diagnosis, thereby reducing trou-
with the nominal (fault-free) case only. All these formulations are
solution framework for the DMFD problem. At the top (coordination) level, we update the Lagrange multipliers (coordination variables, dual variables) using the subgradient method. The top
N-LINE vehicle health monitoring and fault diagnosis is essential to improve the vehicle availability via
bleshooting time. During on-line (dynamic) fault diagnosis, the test outcomes are obtained over time as compared to static fault diagnosis where the observed test outcomes are available
This work was supported, in part, by the Toyota Technical Center and the Office of Naval Research under contract no. 00014-00-1-0101. Any opinions expressed in this paper are solely those of the authors and do not represent those of the sponsor. Singh, Kodali, Choi, and Pattipati are with the Electrical and Computer Engineering Department, University of Connecticut, Storrs, CT 06269-2157, USA (e-mail: krishna/
[email protected]). Namburu, Sean, Prokhorov, and Qiao are with the Toyota Technical Center,1555 Woodridge, RR #7, Ann Arbor, MI 48105, USA. Versions of this paper were published at the IEEE Aerospace Conference, Big Sky, Montana, March 2007 and the DX-07 Workshop, Nashville, TN, May 2007.
as a block. On-line vehicle health monitoring heavily relies on extensive processing of data in real-time, which is made possible by smart on-board sensors. Using these intelligent sensors, the system parameters that are essential to vehicle fault diagnosis can be transmitted to an on-board diagnostic inference engine. A significant technical challenge in on-board vehicle health monitoring is the quality of tests. Generally, the tests are imperfect due to unreliable sensors, electromagnetic interference,
2
environmental conditions, or aliasing inherent in the signature
algorithm. The DMFD problem is decoupled into a set of
analysis of on-board tests. The imperfect tests introduce ad-
parallel subproblems (involving dynamic single HMM state
ditional elements of uncertainty into the diagnostic process:
estimation problems) using Lagrange multipliers. A dynamic
the pass outcome of a test does not guarantee the integrity
programming technique (the Viterbi algorithm) is used to solve
of components under test because the test may have missed
each of the subproblems, and their solutions are used to update
a fault; on the other hand, a fail outcome of a test does not
the Lagrange multipliers via the subgradient method. Feasible
mean that one or more of the implicated components are faulty
(primal) solutions are constructed using the dual solutions. In
because the test outcome may have been a false alarm. Hence,
Sections V to VII, we discuss the details of DMFD problems 2,
it is desired that an on-board diagnostic algorithm should be
3 and 4, respectively. On-line DMFD problem is solved using
able to accommodate missed detections and false alarms in
a sliding window method, which is presented in Section VIII.
test outcomes. The performance of on-board diagnosis can
The simulation results of DMFD problem 1 is performed on
be improved by incorporating the knowledge of reliabilities
several real-world datasets to validate our approach. Section
of tests, and by incorporating temporal correlations of test
IX discusses the simulation results of both block and on-line
outcomes.
DMFD problems. Finally, the paper concludes with a summary
The hidden Markov model (HMM) is a natural choice
and future research directions in Section X.
here to represent the individual component states of the system. The HMM is a doubly-embedded stochastic process
II. P REVIOUS WORK
with an underlying unobservable (hidden) stochastic process
The multiple fault diagnosis (MFD) problem originates in
(individual component state evolution), which can be observed
several fields, such as medical diagnosis [1], error correcting
through another set of stochastic processes (i.e., uncertain test
codes, speech recognition, distributed computer systems and
outcome sequences). The individual component state HMMs
networks [2]. The MFD problem in large-scale systems with
are coupled through the observation process. Consequently,
unreliable tests was first considered by Shakeri et al. in
the fault diagnosis problem corresponds to a factorial HMM,
[3]. They proposed near-optimal algorithms using Lagrangian
where each HMM characterizes the individual component
relaxation and subgradient optimization methods for the static
states of the system. The sequence of uncertain test outcomes
MFD problem. In the area of distributed system management,
are probabilistic functions of the underlying Markov chains
the MFD problem is studied by Odintsova et al. in [2].
characterizing the evolution of system states. Here, we inves-
They utilized an adaptive diagnostic technique, termed active
tigate the problem of determining the most likely states of
probing, for fault diagnosis and isolation. A probe can be
components, given a set of partial and unreliable test outcomes
viewed as a test in our terminology; the purpose of a probe is
over time.
to check the set of system components on the probed path. The probe outcomes determine if one or more of the components
A. Organization of the Paper
on the probed path are faulty or normal. Given the probe out-
The paper is organized as follows. In Section II, we
comes, a diagnostic matrix (D-matrix, diagnostic dictionary,
discuss the previous research on multiple fault diagnosis.
reachability matrix) defining the relationship among the probes
We formulate the NP-hard dynamic multiple fault diagnosis
and component faults, as well as the initial system state, they
(DMFD) problem with imperfect test outcomes in Section
developed a sequential multi-fault algorithm to diagnose the
III. Four formulations of the DMFD problem are also dis-
system state. They considered the probe outcomes as being
cussed in Section III. In Section IV, we decompose the
deterministic, which is analogous to the assumptions made in
DMFD problem formulation 1 using Lagrangian relaxation
our Problem 4, and in the work described in [4]-[6]. In [7],
3
Le et al. applied graphical model-based decoding algorithms
space; consequently, it is either computationally expensive or
to the MFD problem in the presence of unreliable tests. They
far from optimal. A major contribution of the present paper
proposed a suboptimal belief propagation algorithm used to
is that the missed-detection/false-alarm process is modeled as
decode low density parity check codes. They considered a
being a property of the component state: the model is perhaps
fault model, where tests are asymmetric, i.e., the D-matrix is
less realistic, but the computational benefit of a factorial HMM
not binary and the test outcomes are also unreliable, and they
is large.
termed it the Y model. Their implementation is parallel to
Another approach, developed by Ruan et al. [10], decom-
our Problem formulation 1; however, they considered only the
poses the original DMFD problem into a series of decoupled
static case.
subproblems, one for each epoch. For a single epoch MFD,
Dynamic single fault diagnosis problem using HMM for-
they developed a deterministic stimulated annealing (DSA)
malism was first proposed by Ying et al. [8], where it is
method, which is inspired by its sibling stochastic simulated
assumed that, at any time, the system has at most one compo-
annealing and the approximate belief revision (ABR) heuristic
nent state present. This modeling is somewhat unrealistic for
algorithm [1]. The single epoch MFD was extended to incor-
most real-world systems. Another version of the dynamic fault
porate component states of multiple consecutive epochs. In
diagnosis problem was studied in [9]: unknown probabilities
addition, they applied a local search and update scheme to
of sensor error, incompletely-populated sensor observations,
further smooth the “noisy” diagnoses stemming from imper-
and multiple faults were allowed, but the faults could only
fect test results, and, consequently increase the accuracy of
occur or clear once per sampling interval.
fault diagnosis.
In the dynamic single fault framework [8], a hidden Markov
The DMFD problem can be viewed as a factorial HMM
modeling framework was adopted, and a moving window
(FHMM), which is discussed in the machine learning literature
Viterbi algorithm was used to infer the evolution of com-
[11]. Here, the HMM state is factored into multiple state
ponent states. In the multiple fault case, the state space of
variables and represented in a distributed manner. The authors
hidden Markov model increases exponentially from (m + 1)
discussed an exact algorithm for inference computations in the
to 2m , where m is the number of possible component states.
FHMM framework. In this framework, inference and learning
Consequently, the HMM-based method would be viable only
involves computing the posterior probabilities of multiple
for small-sized systems. The solution method proposed in [9]
hidden layers (or states), given the test outcomes. However,
is a multiple hypothesis tracking approach, where at each
due to combinatorial nature of the hidden state representation,
observation epoch, k best component state configurations are
the exact algorithm is intractable. They presented approximate
stored. In that paper, the missed-detection/false-alarm process
inference algorithms based on Gibbs sampling and variational
was a property of the sensor rather than the fault, with the
methods. The latter methods are similar to Lagrangian relax-
effect that the underlying inference process could not be
ation, although motivated from a Fenchel duality perspective
decoupled into a factorial HMM. In [9], at each epoch, all
[1], [3], [4], [12].
candidate fault sets, derived from the previously identified
Here, we extend the work of Ruan et al. [10], Shakeri et al.
faults are listed, based on at most one change per epoch
[3] and Tu et al. [4] on MFD to solve the DMFD problem by
assumption. Then, of all k(m + 1) possible candidate sets,
combining the Viterbi algorithm and Lagrangian relaxation in
each has its score calculated, the candidate set which obtains
an iterative way. Depending on the probabilistic assumptions
the highest score is selected as the inference result at the
on fault-test relationships and test outcomes, one obtains
epoch, and the candidates with the k-best scores are updated.
various DMFD formulations. In summary, the contributions
The method is equivalent to enumeration in a limited search
of this paper are: (1) A primal-dual optimization framework
4
Component m (HMM m)
xm ( k − 1)
xm ( k )
xm ( k + 1)
. . .
. . .
. . .
. . .
Component 2 (HMM 2)
x2 (k −1)
x2 (k )
x2 (k + 1)
x1 (k − 1)
x1 (k )
x1 (k + 1)
Op(k-1)
Op(k)
Op(k+1)
Of(k-1)
Of(k)
Of(k+1)
k-1
k
Hidden component states Component 1 (HMM 1)
Op: Passed test outcomes Of : Failed test outcomes Time
Fig. 1.
k+1
DMFD problem viewed as a factorial hidden Markov model (FHMM)
to solve the DMFD problem; (2) Four formulations of the
sources) associated with the system. The state of component
DMFD problem along with their solutions; (3) Simulation
si is denoted by xi (k) at epoch k, where xi (k) = 1 if
results on several real world systems for the first and most
failure source si is present; xi (k) = 0, otherwise. Here,
general formulation of the DMFD problem; (4) A comparison
κ = {0, 1, ..., k, ...K} is the set of discretized observation
of the results between the subgradient and the deterministic
epochs. The status of all component states at epoch k is
simulated annealing methods [10]; and (5) Simulation results,
denoted by x(k) = {x1 (k), x2 (k), ..., xm (k)}. We assume that
along with performance analysis, of the on-line DMFD prob-
the initial state x(0) is known (or its probability distribution
lem using a sliding window method.
is known). Our problem is to determine the time evolution of compo-
III. DMFD P ROBLEM F ORMULATIONS
nent states based on imperfect test outcomes observed over
The dynamic multiple fault diagnosis problem consists of
time. Fig. 1 shows the DMFD problem viewed as a FHMM.
a set of possible component states in a system, and a set
The hidden component state of ith HMM at time epoch k is
of binary test outcomes that are observed at each sample
denoted by xi (k). Each component state xi (k) is modeled
(observation, decision) epoch. Component states are assumed
as a two-state HMM. The observations at each epoch are
to be independent. Each test outcome provides information on
subsets of binary outcomes1 of tests O = {o1 , o2 , ..., on },
a subset of the component states. At each sample epoch, a
i.e., oj ∈ {pass, f ail} = {0, 1}. Fig. 2 shows the DMFD
subset of test outcomes is available. Tests are imperfect in the
problem as a tri-partite digraph at epoch k. Component states,
sense that the outcomes of some of the tests could be missing,
tests and test outcomes represent the nodes of the digraph.
and tests have missed-detection/false-alarm processes associ-
Here, the true states of the component states and tests are
ated with them. The observations consist of imperfect binary
hidden. P = {P d, P f } represents a set of probabilities of
test outcomes, and are characterized by sets of passed test
detection and false alarm, which is defined differently for
outcomes, Op and failed test outcomes, Of . Formally, we rep-
each of the DMFD problem formulations. We also define the
resent the DMFD problem as DM = {S, κ, T, O, D, P, Π}, where S = {s1 , ..., sm } is a finite set of m components (failure
1 Extension
forward.
to multi-valued component states and test outcomes is straight-
5
a failure source si and test tj . For notational convenience, Components
x1 (k )
x2 (k )
x3 (k )
x4 (k )
… xm ( k )
corresponding P dij = P fij = 0. This problem scenario
Hidden Tests
Test outcomes
when si does not affect the outcome of test tj , we let the
t1 (k )
t2 ( k )
t3 ( k )
t4 ( k )
…
tn ( k )
frequently arises in medical fault diagnosis. For example, the QMR-DT (Quick Medical Reference, Decision-Theoretic) database used in the domain of internal medicine, contains
o1 (k )
o2 (k )
o3 (k )
o4 (k )
…
on (k )
approximately 600 disease nodes (faults or failure sources) and 4000 symptoms (tests) [1]. Each of the symptoms could have
Fig. 2.
Tri-partite digraph for DMFD problem
a probability pair (P dij , P fij ) associated with the symptom and the disease node. Fig. 3 shows the bi-partite graph, where
matrix D = [dij ] as the dependency matrix (D-matrix), which
the edges represent the probability pair (P dij , P fij ). These
represents the full-order dependency among failure sources
probabilities can be obtained from the tri-partite digraph (Fig.
and tests.
2) using the total probability theorem as follows:
Each component state is modeled as a two-state nonhomogenous Markov chain. For each component state, e.g., for component si at epoch k, Π = (P ai (k), P vi (k)) denotes the set of fault appearance probability P ai (k) and
Pr(oj (k)|xi (k)) =
X
Pr(oj (k), tj (k)|xi (k))
tj ∈{0,1}
=
X
Pr(oj (k)|tj (k)) Pr(tj (k)|xi (k))
(1)
tj ∈{0,1}
fault disappearance probability P vi (k) defined as P ai (k) =
Problem 2: In situations where the probability of detection
Pr(xi (k) = 1|xi (k − 1) = 0) and P vi (k) = Pr(xi (k) =
(P dij ) is associated with each failure source-test pair, but the
0|xi (k − 1) = 1). These probabilities are required to model
false alarm probability is specified only for the normal system
the intermittent faults. Here, T = {t1 , t2 , ..., tn } is a finite set
state, i.e., P fj = P (oj (k) = 1|x1 (k) = 0, ..., xm (k) = 0), we
of n available binary tests, where the integrity of the system
obtain a slightly complicated variation of Problem formulation
can be ascertained. We denote the set of passed tests, Tp
1 (in terms of computational complexity, but not in terms
and failed tests Tf . At each observation epoch, k, k ∈ κ,
of parameterization). This type of scenario arises when we
test outcomes upto and including epoch k are available, i.e.,
design class-specific classifiers that distinguish between nor-
we let Ok = {O(b) = (Op (b), Of (b))}kb=1 , where Ok is the
mal system operation and failure source, si only, or when the
set of observed test outcomes at epoch k, with Op (b)(⊆ O)
false alarms are defined on an overall system basis. Here, the
and Of (b)(⊆ O) as the corresponding outcomes of sets of
probability pair (P dij ,P fj ) is associated with test outcomes to
passed and failed tests at epoch b, respectively. The tests are
model imperfect test outcomes [3]. This model is also called
partially observed in the sense that outcomes of some tests
the Z model in [7]. Similar to problem 1, the probability pair
may not be available, i.e., (Op (b) ∪ Of (b)) ⊂ O. In addition,
(P dij ,P fj ) is shown as edges between the hidden component
tests exhibit missed detections and false alarms. Here, we also
states and test outcomes in Fig. 3, and they can be obtained
make the noisy-OR (“causal independence”) assumption [13].
from the tri-partite digraph (Fig. 2) using the total probability
The DMFD problem can be formulated in the following ways,
theorem on the nodes of test layer.
arranged from the general to simplified:
Problem 3: When the detection probability (P dj ) and false
Problem 1: When the probability of detection (P dij ) and
alarm probability (P fj ) are associated with each test tj only.
false alarm probability (P fij ) are associated with each failed
The probability pair (P dj ,P fj ) is shown as the edges between
test and each failure source, i.e., P dij = Pr(oj (k) =
the tests and test outcomes in the tri-partite digraph (Fig.
1|xi (k) = 1) and P fij = Pr(oj (k) = 1|xi (k) = 0) of
3). This formulation is quite useful in classifier fusion using
6
{x(1), ..., x(K)}, that best explains the observed test outcome Components
x1 (k )
x2 (k )
x3 (k )
x4 (k )
…
xm (k )
Hidden
sequence OK . We formulate this as one of finding the maximum a posteriori (MAP) configuration:
Test outcomes
o1 (k )
o2 (k )
o3 (k )
o4 (k )
…
b K = arg max Pr(X K |OK , x(0)). X
on (k )
XK
Fig. 3.
Bi-partite graph for the DMFD problem
(2)
Applying the Bayes rule in (2), the objective function is equivalent to b K = arg max Pr(OK |X K , x(0))Pr(X K |x(0)). X
error correcting codes. In the error correcting code (ECC)
XK
matrix, each column corresponds to a binary classifier with the associated (P dj ,P fj ) pair, which are learned during training and validation. In this case, the fault-test relationships are deterministic, but the test outcomes are unreliable and depend on the concomitant test only. This type of formulation is also
With passed and failed test outcomes being conditionally independent given the status of component states (“the noisy OR assumption”), and the Markov property of component state evolution, the problem is equivalent to: b K = arg max X
considered in [9].
XK
This formulation provides a nice vehicle for the dynamic
K Y
{Pr(Op (k)|x(k)) ·
k=1
Pr(Of (k)|x(k)) · Pr(x(k)|x(k − 1))},
fusion of classifiers, where each column of the ECC matrix is
(3)
a classifier, and their associated probability pairs (P dj ,P fj )
where Op (k) ⊆ O and Of (k) ⊆ O denote the sets of passed
are uncertainties associated with classifier outcomes. When
and failed test outcomes at epoch k, respectively. We define a
the learned parameters and the ECC matrix are fed as an
new function fk (x(k), x(k − 1)) as
input to the DMFD algorithm, it performs dynamic fusion of
fk (x(k), x(k − 1)) = ln{Pr(Op (k)|x(k)) ·
classifier outputs over time. Note that the sampling interval
Pr(Of (k)|x(k)) · Pr(x(k)|x(k − 1))}.
(4)
of the dynamic fusion algorithm can be different from the sampling interval of the raw sensor data. Problem 4: This is the deterministic case when tests are perfect i.e. P dij = 1 and P fij = 0 [4]. This formulation reduces the tri-partite digraph in Fig. 2 to a bi-partite graph between the components and tests. This scenario is useful in situations where the tests are highly reliable (e.g., automated testing of electronic cards), and leads to a novel dynamic set covering problem. Next, we discuss the DMFD formulations in detail.
Given the component state status, x(k), the test outcomes are independent. Consequently: Y Pr(Op (k)|x(k)) =
In this problem, we assume that the detection and false alarm probabilities (P dij , P fij ) are associated with each failure source and each test. Fig. 4 illustrates these probabilities.
(5)
Pr(oj (k) = 1|x(k)).
(6)
and Y
Pr(Of (k)|x(k)) =
oj (k)∈Of (k)
For test tj to pass at epoch k, it shall pass on all its associated component states, so that Pr(oj (k) = 0|x(k)) =
IV. DMFD P ROBLEM 1
Pr(oj (k) = 0|x(k)),
oj (k)∈Op (k)
m Y
Pr(oj (k) = 0|xi (k))
(7)
i=1
where
1 − P f , x (k) = 0; ij i Pr(oj (k) = 0|xi (k)) = 1 − P d , x (k) = 1; ij i
= (1 − P dij )xi (k) (1 − P fij )1−xi (k) , xi (k) ∈ {0, 1}.
(8)
The DMFD problem is one of finding, at each decision epoch k, the most likely fault state candidates x(k) ∈ {0, 1}m , i.e., the fault state evolution over time, X K =
Evidently, Pr(oj (k) = 1|x(k)) = 1 − Pr(oj (k) = 0|x(k)).
(9)
7
Normal state
Component
+xi (k − 1)xi (k) ln(1 − P vi (k))}.
Faulty state
xi (k ) = 0
xi (k ) = 1
x(k), x(k − 1) ∈ {0, 1}m
1 − Pdij 1 − Pf ij Pf ij Test Outcome
(13)
Pdij
o j (k ) = 0
The primal DMFD problem posed in (12) and (13) is NP-
o j (k ) = 1
hard. Indeed, even the single epoch problem, i.e., x b(k) = arg max fk (x(k), x b(k − 1)), is NP-hard [3], which, for all
Fig. 4.
x(k)
Detection and false alarm probabilities for problem 1
practical purposes, means that it cannot be solved to optimality within a polynomially bounded computation time.
In the same vein, the assumption of independent evolution of component states leads to: Pr(x(k)|x(k − 1)) =
m Y
A. Primal-dual optimization framework Pr(xi (k)|xi (k − 1))
(10)
i=1
The NP-hard nature of the primal DMFD problem motivates us to decompose it into a primal-dual problem using
where,
a Lagrangian relaxation approach. By defining new variables
Pr(xi (k)|xi (k − 1)) = 1 − P ai (k) xi (k − 1) = 0, xi (k) = 0; P a (k) xi (k − 1) = 0, xi (k) = 1; i P v (k) xi (k − 1) = 1, xi (k) = 0; i 1 − P v (k) x (k − 1) = 1, x (k) = 1; i i i
and constraints, the DMFD problem reduces to a combinatorial optimization problem with a set of equality constraints. The constraints are relaxed via Lagrange multipliers. The relaxation procedure generates an upper bound for the objective function. The procedure of minimizing the upper
Equivalently,
bound via a subgradient optimization produces a sequence of
Pr(xi (k)|xi (k − 1)) = (1 − P ai (k))
(1−xi (k−1))(1−xi (k))
·
dual feasible and the concomitant primal feasible solutions
P ai (k)(1−xi (k−1))xi (k) · P vi (k)xi (k−1)(1−xi (k)) ·
to the DMFD problem. If the objective function value for the best feasible solution and the upper bound are the same,
(1 − P vi (k))xi (k−1)xi (k) ;
the feasible solution is the optimal solution. Otherwise, the xi (k − 1), xi (k) ∈ {0, 1}.
(11)
So, the problem that is equivalent to (3) is as follows b K = arg max X XK
K X
fk (x(k), x(k − 1)).
difference between the upper bound and the feasible solution, termed the approximate duality gap, provides a measure of
(12)
k=1
where
suboptimality of the DMFD solution; this is a key advantage of our approach. Another advantage of the primal-dual method is that, although the primal DMFD problem is not concave, the
fk (x(k), x(k − 1)) = m X
X
[xi (k) ln(1 − P dij ) + (1 − xi (k))ln(1 − P fij )]
oj ∈Op (k) i=1
dual DMFD problem is a piecewise convex function, which can be optimized via the subgradient method. In order to write the primal DMFD problem, we define new variables Y K = {y(1), y(2), ..., y(K)} and y(k) = {yj (k), ∀oj ∈ Of (k)} such
+
X
ln[1 −
oj ∈Of (k)
+
m X
m Y
(1 − P dij )xi (k) (1 − P fij )(1−xi (k)) ]
i=1
{(1 − xi (k − 1))(1 − xi (k)) ln(1 − P ai (k))
that ln yj (k) = where,
+xi (k − 1)(1 − xi (k)) ln(P vi (k))
i=1
µ
i=1
+(1 − xi (k − 1))xi (k) ln(P ai (k))
m P
cij = ln
cij xi (k) + ηj ,
1 − P dij 1 − P fij
¶ , ηj =
∀oj ∈ Of (k), m X
ln(1 − P fij ).
(14)
i=1
After simple algebraic manipulations of (13) and using (12)
8
Two-level Lagrangian relaxation approach
Original DMFD problem
Appending constraints (14) to (16) via Lagrange multipliers
Update Lagrange multipliers using subgradient method
{λj (k)}oj ∈Of (k) , the Lagrangian function L(X, Y, Λ) can be written as
... Solve subproblem 1 using binary Viterbi algorithm
...
K P
L(X, Y, Λ) =
fk (x(k), x(k − 1), y(k))
k=1
Solve subproblem m using binary Viterbi algorithm
P
+
λj (k)(ln yj (k) −
∀oj ∈Of (k)
m P
cij xi (k) − ηj ),
i=1
(18)
where Λ = {λj (k) ≥ 0, k ∈ (1, K), oj ∈ Of (k)} is the set of Fig. 5.
Decomposition of the original DMFD problem
Lagrange multipliers. In (19), Lagrange multipliers {λj (k)} are nonnegative despite equality constraints (14), because the
and (14), the primal problem can be written as K X
max J(X, Y ) = max
X K ,Y K
where
X K ,Y K
the
component
yj (k) need to be nonnegative. Using the Lagrange multiplier theorem, we optimize the Lagrangian function in (17) w.r.t.
fk (x(k), x(k − 1), y(k)),
k=1
state
(15) sequence
XK
is
yj (k) to obtain optimal yj∗ (k) as yj∗ (k) =
=
λj (k) . 1 + λj (k)
(19)
{x(1), x(2), ..., x(K)}. Here, the primal objective function for
The dual of the primal DMFD problem as posed in (15)-(17),
an individual component state i.e., fk (x(k), x(k − 1), y(k))
can be written as
is defined as
min m X
X
fk (x(k), x(k − 1), y(k)) =
Q(Λ)
Λ
subject to Λ = {λj (k) ≥ 0, k ∈ (1, K), oj ∈ Of (k)}
cij xi (k)
(20)
oj ∈Op (k) i=1
+
m X
X
µi (k)xi (k) +
i=1
where the dual function Q(Λ) is defined by
ln(1 − yj (k))
oj ∈Of (k)
+
m X
Q(Λ) = max L(X, Y, Λ).
σi (k)xi (k − 1) + γ(k) + g(k)
Substituting (19) into (20) and simplifying further by rear-
i=1
+
m X
hi (k)xi (k)xi (k − 1)
(16)
i=1
ranging and combining the terms, we obtain the dual function as Q(X, Λ) = max
where, γ(k) = µ
X oj ∈Op (k)
XK
µ
¶
(1 − P ai (k))(1 − P vi (k)) P ai (k)P vi (k)
g(k) =
m X
ln(1 − P ai (k)).
Qi (Λ) =
Qi (Λ).
(22)
i=1
K X
ξi (xi (k), xi (k − 1), λj (k)) +
k=1
ξi (xi (k), xi (k − 1), λj (k)) = Ã P P cij + µi (k) −
¶
oj ∈Op (k)
,
1 wk (Λ) m
(17)
Note that the multiple HMMs are coupled here because their states are observed only via a set of test outcomes. In equation (16), the terms involving yj (k) and hi (k) show the coupling
(23)
! cij λj (k) xi (k)
oj ∈Of (k)
+σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1)
i=1
effects.
m X
Here
ηj ,
P ai (k) µi (k) = ln , 1 − P ai (k) µ ¶ P vi (k) σi (k) = ln , 1 − P ai (k) hi (k) = ln
(21)
X K ,Y K
(24)
and wk (Λ) = γ(k) + g(k) X + λj (k) ln λj (k) − λj (k)ηj ∀oj ∈Of (k)
−
X
∀oj ∈Of (k)
(1 + λj (k)) ln(1 + λj (k))
(25)
9
problems. This level facilitates coordination among each of
Step 1: Initialize Lagrange multipliers
the subproblems, and can thus reside in a diagnostic control
Λ = {λ j (k ), k ∈ (1, K ), j ∈ O f (k )}
unit. At the bottom level, we use the dynamic programming
Input fault universe Input test outcomes for K epochs
technique (the Viterbi algorithm) to solve each of the subprob-
1
lems with computational complexity O(K), i.e., we optimize ξi function in (24) to obtain the optimal state sequence x∗i Step 2:
Step 2:
Find optimal sequence for component state 1 . using Viterbi algorithm for fixed Λ l
..
Find optimal sequence for component state m using Viterbi algorithm for fixed Λ l
for each component state, given a fixed set of Lagrange multipliers Λ = {λj (k), k ∈ (1, K), oj ∈ Of (k)}. The Viterbi algorithm is a dynamic programming technique to find the most likely fault sequence [14]. It finds a recursive optimal solution to the problem of estimating the state sequence of a finite state Markov chain observed in memoryless noise. The
Step 3: Compute current primal and dual values, lower and upper bounds
key feature of Viterbi algorithm is that the objective function can be written as a sum of merit functions depending on one state and its preceding one. We obtain the optimal state
Step 4:
sequence for each component state, i.e. X ∗ = {x∗1 , x∗2 , ..., x∗m }
Update Lagrange multipliers via subgradient method
using a binary Viterbi algorithm. The key steps of the Viterbi algorithm are described in Appendix I.
Step 5: Meet stopping criteria?
No
Yes
B. Approximate and Exact Duality Gap After evaluating the optimum state sequence X ∗ for fixed Λ, the problem reduces to one of minimizing the dual function value Ql (Λ) = Q(X, Y, Λ) at iteration l, which is computed
Output: Feasible solution (most likely state sequence )
using (22)-(25). Denoting Q∗ as the optimal dual function value, i.e., Q∗ = Q(Λ∗ ) = min Q(Λ), where the dual problem Λ
is given by (22)-(25). The optimal primal solution is denoted Fig. 6.
Flow chart of the algorithm
by J ∗ = J(X ∗ , Y ∗ ) =
max J(X, Y ), where the primal
X K ,Y K
problem is given by (15)-(17). represents the dual function for the ith component. The main
The difference between the optimal dual and the primal
benefit of (22) is that now the original problem is separable.
function values, i.e. (Q∗ − J ∗ ) is termed the exact duality
As shown in Fig. 5, we employed the Lagrangian relaxation
gap. Since the DMFD problem is NP-hard, it is difficult to
method to decompose the original DMFD problem into m
obtain the global optimal solution J ∗ . However, we can obtain
separable subproblems, one for each component state sequence
several feasible solutions from the dual solution and select the
xi , where xi = (xi (1), xi (2)), ..., xi (K)), xi (k) ∈ {0, 1}
best feasible solution from the set. If Jf = J(Λ∗ , X f , Y f ) be
and i ∈ {1, m}. This scheme can be viewed as a two-level
the best feasible value, then we have
coordinated solution framework for the DMFD problem. At the top (coordination) level, we update Lagrange multipliers
Jf ≤ J ∗ ≤ Q∗ ≤ Ql .
(26)
Λ = {λj (k), k ∈ (1, K), oj ∈ Of (k)} using the subgradient
Using this method, we can obtain an approximate duality
method based on decoupled solutions of the individual sub-
gap Q∗ − Jf = (Q∗ − J ∗ ) + (J ∗ − Jf ) ≥ 0, which
10
Components Normal state x1 (k ) = 0
V. DMFD P ROBLEM 2
Normal state
...
Faulty state
xm (k ) = 0
In this formulation, we define P dij as P dij = Pr(oj (k) =
xi (k ) = 1
1|xi (k) = 1) and P fj = Pr(oj (k) = 1|x1 (k) = 0, x2 (k) =
1 − Pdij
0, ..., xm (k) = 0). This scenario is depicted in Fig. 7.
Pdij Test 1 − Pf j Outcome o j (k ) = 0
Fig. 7.
Pf j
Pr(oj (k) = 0|x(k)) =
o j (k ) = 1
(1 − P fj )(1−x1 (k))...(1−xm (k))
Detection and false alarm probabilities for problem 2
Using a new variable z(k) =
m Q i=1
provides an overestimate of the error between the global
m Q i=1
Pr(oj (k) = 0|x(k)) = (1 − P fj )z(k)
m Y
(1 − P dij )xi (k) ,
i=1
Taking log
bound Qlb as follows: If J(X ∗ (Λl ), Y (X ∗ (Λl )) ≥ Qlb then X f = X ∗ (Λl ),
(29)
(1 − xi (k)) and
optimal solution and the best feasible solution. To summarize, we update feasible solutions, i.e., X f , Y f , and the lower
(1 − P dij )xi (k)
ln z(k) =
Y f = Y (X ∗ (Λl )) and
m X
ln(1 − xi (k))
(30)
i=1
Qlb = Jf = J(X ∗ (Λl ), Y (X ∗ (Λl )).
ln(Pr(oj (k) = 0|x(k))) = z(k) ln(1 − P fj )
(27)
The upper bound Qub is obtained using the current dual value
+
m X
xi (k) ln(1 − P dij )
(31)
i=1
l
Q as follows
Following steps similar to those in Problem 1, we have Qub = Qmin = min(Qmin , Ql ).
(28) ln(Pr(Op (k)|x(k))) P = ln(Pr(oj (k) = 0|x(k)))
Since the dual function Ql (Λ) is a piecewise differentiable
oj (k)∈Op (k)
function of Lagrange multipliers Λ, this problem cannot be
P
=
solved using differentiable optimization algorithms. We use a subgradient algorithm to compute a sequence of upper
P
+
described in Appendix II. Fig. 6 shows the flow chart of our
= z(k)ηj (k) +
m X
algorithm. There are five major steps. In step 1, we initialize
and test information along with the associated probabilities (Pdij , Pfij , Pai and Pvi ). We also input test outcomes for the K epochs. In step 2, we run m binary Viterbi algorithms to obtain the optimal state sequences corresponding to the m f
f
faults. In step 3, we update the feasible solutions, i.e., X , Y , and the lower bound and upper bound i.e., Qlb and Qub using (24)-(26). Next, the Lagrange multipliers are updated using
algorithm outputs the most likely component state sequence for the m components.
xi (k) ln(1 − P dij )
X
xi (k) ln(1 − P dij )
(32)
i=1 oj (k)∈Op (k)
P
where ηj (k) =
ln(1 − P fj ), and z(k) is defined
oj (k)∈Op (k)
in (30). For failed tests
P
ln(Pr(Of (k)|x(k))) = =
P
ln(Pr(oj (k) = 1|x(k)))
oj (k)∈Of (k)
ln(1 − yj (k))
oj (k)∈Of (k)
where yj (k) = Pr(oj (k) = 0|x(k)) and using (31), ln(yj (k)) = z(k) ln(1 − P fj ) +
the subgradient method, which is described in Appendix II. If stopping criteria, defined in Appendix II, are met, then the
m P
oj (k)∈Op (k) i=1
bounds for Ql (Λ)[12]. The details of subgradient method are
the Lagrange multipliers and the input fault universe i.e., fault
z(k) ln(1 − P fj )
oj (k)∈Op (k)
m X
xi (k) ln(1 − P dij ) (33)
i=1
Here the DMFD problem is equivalent to b K = arg max X XK
K X k=1
fk (x(k), x(k − 1), y(k), z(k))
(34)
11
where, the primal objective function for an individual compo-
and optimizing w.r.t. z(k), we obtain optimal z ∗ (k) as
nent state i.e., fk (x(k), x(k − 1), y(k), z(k)) is defined as fk (x(k), x(k − 1), y(k), z(k)) = z(k)ηj (k) m X X + xi (k) ln(1 − P dij )
z ∗ (k) =
P
−ηj (k) +
µ(k) . λj (k) ln(1 − P fj )
(39)
∀oj ∈Of (k)
The dual function Q(Λ) of problem 4 is defined by
i=1 oj (k)∈Op (k)
X
+
ln(1 − yj (k)) +
+
σi (k)xi (k − 1) +
i=1
Q(Λ) =
τi (k)xi (k)
i=1
oj (k)∈Of (k) m X
m X
m X
L(X, Y, z, Λ).
(40)
Substituting (38), (39) into (37) and simplifying further by
hi (k)xi (k)xi (k − 1) + g(k) (35)
i=1
rearranging and combining the terms, we obtain the dual function as
where ln z(k) =
m X
Q(Λ) = max
ln(1 − xi (k)),
XK
i=1
X
ηj (k) =
ln(yj (k)) = z(k) ln(1 − P fj ) + µ τi (k) = ln µ σi (k) = ln µ hi (k) = ln
m X
Qi (Λ) = xi (k) ln(1 − P dij ),
i=1
P ai (k) 1 − P ai (k) P vi (k) 1 − P ai (k)
g(k) =
m X
, ¶ ¶
ln(1 − P ai (k)).
(36)
1 wk (λj (k), µ(k)) m
oj (k)∈Op (k)
X
−
λj (k)xi (k) ln(1 − P dij ) + σi (k)xi (k − 1) +hi (k)xi (k)xi (k − 1) − µ(k) ln(1 − xi (k)) (43)
and
fk (x(k), x(k − 1), y(k), z(k))
k=1
Ã
+µ(k) ln z(k) −
m X
wk (λj (k), µ(k)) =
!
µ(k)
ln(1 − xi (k))
i=1
−ηj (k) +
X
+
λj (k) (ln yj (k) − z(k) ln(1 − P fj )) m X
∀oj ∈Of (k)
[λj (k) ln λj (k) − (1 + λj (k)) ln(1 + λj (k))]
xi (k)λj (k) ln(1 − P dij )
(37)
∀oj ∈Of (k) i=1
−µ(k)
X
∀oj ∈Of (k)
where Λ = {µ(k), λj (k) ≥ 0, k ∈ (1, K), oj ∈ Of (k)} is
ηj (k) + ln(µ(k)) P + g(k) λj (k) ln(1 − P fj )
∀oj ∈Of (k)
∀oj ∈Of (k)
X
(42)
oj (k)∈Of (k)
can be written as
−
ξi (xi (k), xi (k − 1), λj (k), µ(k))
ξi (xi (k), xi (k − 1), λj (k), µ(k)) X = xi (k) ln(1 − P dij ) + xi (k)τi (k)
,
µ(k), {λj (k)}j∈Of (k) , the Lagrangian function L(X, Y, z, Λ)
X
(41)
and
,
Appending constraints (30) and (33) via Lagrange multipliers
K X
Qi (Λ)
i=1
k=1
+
i=1
L(X, Y, z, Λ) =
K X
¶
(1 − P ai (k))(1 − P vi (k)) P ai (k)P vi (k)
m X
where
ln(1 − P fj ),
oj (k)∈Op (k)
+
max
X K ,Y K ,z
λj (k) ln(1 − P fj ) P (44) −ηj (k) + λj (k) ln(1 − P fj ) ∀oj ∈Of (k)
the set of Lagrange multipliers. Using the Lagrange multiplier
The dual problem posed in (40)-(44) is separable and it can
theorem, we optimize the Lagrangian function in (34) w.r.t.
be solved by following a procedure similar to that used for
yj (k) to obtain optimal yj∗ (k) as
solving Problem 1. The only difference is that we also need
yj∗ (k) =
λj (k) , 1 + λj (k)
to update the Lagrange multiplier µ(k) using a subgradient (38)
method.
12
at least one component state xi (k) = 1 that satisfies dij = 1. Test
t j (k ) = 0
t j (k ) = 1
Thus, there must be one or more failure sources that cover the
1 − Pd j 1 − Pf j
failed tests. Let us consider a matrix A, which has each row
Pd j
representing the list of failure sources covered by a failed test.
Pf j Test Outcome
o j (k ) = 0
After excluding the failure sources covered by the passed tests,
o j (k ) = 1
the resulting matrix A is a binary matrix such that aij = dji . After substituting P dij = 1 and P fij = 0 in (13), the reliable Fig. 8.
Detection and false alarm probabilities for problem 3
test scenario with a binary D-matrix simplifies to a dynamic set covering problem with the following objective function term
VI. DMFD P ROBLEM 3
at epoch k:
In this formulation, we consider the case where the probabilities of detection and false alarm (P dj , P fj ) are asso-
fk (x(k), x(k − 1)) =
m X
+hi (k)xi (k)xi (k − 1)} + g(k) (47)
ciated only with each test tj (see Fig. 8). Formally, P dj = Pr(oj (k) = 1|tj (k) = 1) and P fj = Pr(oj (k) = 1|tj (k) = 0). We can convert these probabilities into a special case of Problem 1 by computing (P dij ,P fij ) using (1): P dij = (dij )P dj + (1 − dij )P fj
{µi (k)xi (k) + σi (k)xi (k − 1)
i=1
subject to following constraints: A(k)x(k) ≥ e for tj (k) ∈ Tf (k) where e is a vector of one’s. Appending constraints to (47) via Lagrange multipliers
(45)
Λ = {λj (k) ≤ 0, k ∈ (1, K), tj ∈ Tf (k)}, the Lagrangian function L(X, Λ) can be written as
Similarly P fij = (dij )P fj + (1 − dij )P dj
L(X, Λ) =
(46)
fk (x(k), x(k − 1))
k=1
Here, D = [dij ] is the dependency matrix (D-matrix). The
+
P
µ ¶ m P λj (k) 1 − aji xi (k) i=1
∀tj ∈Tf (k)
solution of Problem 3 can be obtained by substituting P dij and P fij in (45)-(46) in the solution of Problem 1.
K P
(48)
After rearranging the terms, the Lagrangian function of the original problem is shown as a sum of the Lagrangian func-
VII. DMFD P ROBLEM 4
tions of each subproblem as follows:
Next, we consider the case when the system consists of
L(X, Λ) =
Li (xi (k), Λ)
where K P
P
equivalently, the D-matrix completely characterizes the fault-
Li (xi (k), Λ) = µi (k)xi (k) −
test relationships [4]. This formulation can be represented as
+σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1) Ã ! P 1 λj (k) + m g(k) +
a bipartite graph between the components and tests. In this case, if some tests have passed, then we can infer that all the failure sources covered by these tests are good components.
(49)
i=1
reliable tests, and the fault-test relationships are deterministic, i.e. P dij = 1 and P fij = 0 for i = 1, ..., m and j = 1, ..., n or
m X
aji xi (k)
k=1 tj ∈Tf (k)
(50)
∀tj ∈Tf (k)
The dual function Q(Λ) is defined by
Thus, we need to infer failed components from those covered by the failed tests only, i.e., by excluding those
Q(Λ) = max L(X, Λ).
components covered by the passed tests. Consequently, the size
Simplifying further by rearranging and combining the terms,
of the DMFD problem can be reduced by removing all failure
we obtain the dual function as
sources {si |P fik = 0, P dik = 1, and tj (k) ∈ Tp (k)}. For each failed test tj (k) ∈ Tf (k), the optimal solution contains
XK
Q(Λ) = max XK
m X i=1
Qi (Λ)
(51)
(52)
13
where
•
Qi (Λ) =
K X
ξi (xi (k), xi (k − 1), λj (k)) +
k=1
1 wk (Λ), (53) m
Initialize component states at k = 2 using the results of previous window
•
Solve the online DMFD problem using the data from k = 2 to k = W + 1
ξi (xi (k), xi (k − 1), λj (k)) = µi (k)xi (k) +σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1) X λj (k)aji (k)xi (k) −
•
Step 3: Continue sliding the window until k = K − W + 1. (54)
tj ∈Tf (k)
and wk (Λ) = g(k) +
Make a decision at epoch k = 2
Selection of window size is a key issue and it depends on the system and fault behavior, i.e., permanent or intermittent
X
faults. λj (k).
(55)
∀tj ∈Tf (k)
The dual problem defined in (53)-(55) is separable. The Viterbi algorithm is used to solve each subproblem corresponding to each component state sequence {xi (k)}K k=1 . This algorithm can be viewed as a dynamic set covering problem, which is
IX. S IMULATIONS AND RESULTS We implemented and applied the solution of problem 1, the most general version of the DMFD problem formulation, to a small-scale system and few real world models.
NP-hard. Thus, the dynamic set covering problem is solved by combining the Viterbi algorithm and Lagrangian relaxation.
A. Small-scale system
This generalizes Beasley’s Lagrangian relaxation algorithm for
We randomly generated a small-scale system to illustrate
the static set covering problem [4], [15] to dynamic settings.
the inputs and outputs of our algorithm. The model was
We will explore the applications of this algorithm in our future
constructed for a system with 20 components, 20 tests and 20
work [17].
observation epochs. Each component can have binary states i.e. normal and faulty. The detection probabilities were set
VIII. S LIDING W INDOW DMFD M ETHOD During the online monitoring of a system, the observations and potential fault sequences are usually very long. Hence in order to reduce the amount of computation and storage, the DMFD problem is solved using a sliding window method. The window size, W is selected based on the performance criteria, such as low classification error and low false isolation rate. One of the key advantages of the sliding window method is that Lagrange multipliers are available W -1 samples ahead, which improves the speed of dual optimization. The sliding window method involves following steps: Step1: Solve the DMFD problem for the window size W (W ≤ K). Make a decision at epoch k = 1 Step 2: Move the window by 1 time epoch, i.e., k = 2 to k =W +1 •
between 0.7-0.9 and the false alarm probabilities were set between 0-0.02, and the tests uniformly cover the component states. The fault appearance and disappearance probabilities were varied between 0.0049-0.0051 and 0.00025-0.00033, respectively. These probabilities were chosen such that the average number of faults was 2 overa a span of 20 epochs. The true fault state set and accordingly the test outcomes of each epoch are generated using the above model parameters. The stopping criteria as defined in Appendix II were used for the subgradient method. We used the following metrics to evaluate the performance of our algorithm: Correct isolation rate (CI): CI is the percentage of true fault states which are detected by the algorithm at epoch k. Let x ˆ(k) be the fault state set at epoch k detected by algorithm, and r(k) is the true fault state set at epoch k. Then CI and average CI over all epochs are obtained as follows:
Initialize (W − 1) Lagrange multipliers using previous window
CI(k) =
|ˆ x(k) ∩ r(k)| |r(k)|
(56)
14
TABLE I S MALL - SCALE SCENARIO FOR SIMULATIONS Fault model (si , tj , P dij , P fij )
Pa
Pv
0.0050
0.00026
0.0049
0.00027
0.0050
0.00026
.
.
.
.
.
.
.
.
(19,18,0.88,0.016)
0.0051
0.00033
(20,11,0.82,0)
0.0051
0.00028
s1
(1,3,0.80,0.01)
(1,13,0.75,0)
(1,19,0.74,0)
s2
(2,10,0.86,0.01)
(2,12,0.88,0)
s3
(3,1,0.80,0.015)
(3,16,0.72,0.01)
.
.
.
.
.
.
.
.
.
.
s19
(19,12,0.85,0.015)
s20
(20,8,0.735,0.011)
Test outcomes k
No. of failed
Failed test outcomes Of (k)
tests
No. of passed
Passed test outcomes Op (k)
tests
1
1
3
19
1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2
0
φ
20
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
3
2
11 18
18
1 2 3 4 5 6 7 8 9 10 12 13 14 15 16 17 19 20
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
19
5
3 10 12 16 17
15
1 2 4 5 6 7 8 9 11 13 14 15 18 19 20
20
6
3 4 10 12 16 17
14
1 2 5 6 7 8 9 11 13 14 15 18 19 20
TABLE II
−45 Lower bound Upper bound Dual fun. current value
R ESULTS FOR SMALL - SCALE SCENARIO 13.2
Correct isolation rate (CI)
100.0
False isolation rate (F I)
0.0
Primal function value (J)
-68
Dual function value (Q)
-59
Computation time (sec) (t)
0.12
−55
−60 Bounds
Approximate duality gap (D) (%)
−50
−65
−70
−75
K P
CI =
−80
CI(k)
k=1
K
−85
(57)
0
5
10
15
20 Iterations
25
30
35
40
False isolation rate (FI): FI is percentage of fault states which are falsely detected by the algorithm as fault state at epoch k.
Fig. 9.
Approximate duality gap
FI and average F I are computed as F I(k) =
|ˆ x(k) ∩ ¬r(k)| |S| − |r(k)| K P
FI =
(58)
t19 with (P d1,3 , P f1,3 ) = (0.80, 0.01), (P d1,13 , P f1,13 ) =
F I(k)
k=1
K
example, state of component 1 can be detected by t3 , t13 and (0.75, 0), and (P d1,19 , P f1,19 ) = (0.74, 0), respectively. This
(59)
implies that when component state s1 occurs, test t3 detects
Table I shows only partial data (5 rows) of a small-scale exam-
it with probability 0.80, and if component state s1 does not
ple. Any component state can be detected by several tests. For
occur, test t3 has 0.01 probability of falsely implicating it. Sim-
15
TABLE III
sion system (Helitrans), and an engine simulator (EngineSim).
R EAL WORLD MODELS
Automotive
Details of these models are provided in [4]. Here, m, n, and c
m
n
P dij P fij
c, P ai
22
60
(0.85-0.95),
3, 9.13e-04
0-0.02
denote the number of components (failure sources), number of tests and the average number of intermittent faults that
Docmatch
257
180
(0.6-1),0
9, 3.12e-04
can occur over a span of 100 epochs. The fault appearance
Powerdist
96
98
(0.6-1),0
3,3.13e-04
probabilities (P ai ) were computed based on the average
Helitrans
34
51
(0.6-1),0
2, 2.95e-04
EngineSim
53
30
(0.6-1),0
2,5.68e-04
number of intermittent faults (c). These real-world systems are not ideal because they have fewer tests as compared to failure sources; hence, some failure sources are not covered
ilarly, test t13 detects s1 with probability 0.75, and has no false
by any tests. The fault disappearance probabilities (P vi ) were
alarms. If s1 is not present at epoch k, then at epoch k +1, the
varied between 0.0025-0.0049 to allow c intermittent faults,
probability of having s1 present is 0.005, while if s1 is present
on average. The probabilities of detection and false alarm
at epoch k, s1 has a probability 0.00026 of disappearing at
were varied as shown in Table III. Here, J , Q, D, CI, F I
epoch k + 1. The test outcomes at epochs k = 1, 2, 3, 19 and
and t denote the average primal function value, the average
20 are shown in the table; test outcomes at other epochs are not
dual function value, the average approximate duality gap, the
shown here for simplicity and for saving space. For example,
average correct isolation rate, the average false isolation rate
at k = 1, the outcome of tests t3 is observed as having failed,
and the average computation time per epoch. The maximum
while the outcome of all other tests except t3 is observed as
number of subgradient iterations was set at 80 and 100 Monte
having passed. The DMFD problem here is to identify the
Carlo runs were used to generate the test outcomes. Table
evolution of component states over the 20 epochs. The results
IV shows the results obtained using the subgradient (S) and
for this model are shown in Table II. Here, J, Q, D, CI, F I
the deterministic simulated annealing (DSA) [10] methods.
and t denote the primal function value, the dual function value,
The subgradient method (S) achieves higher correct isolation
the approximate duality gap, the correct isolation rate, the
rates as compared to (DSA) for all systems except Helitrans.
false isolation rate and the computation time per epoch. The
However, the DSA method achieves better primal function
primal and dual function values are computed using (15)-(17)
value and is also effective in reducing the computation time
and (22)-(25), respectively. The approximate duality gap (D)
(t). Also, note that we can obtain a hybrid duality gap by
is computed as a ratio of the difference between Q and J
taking the maximum primal solution from the subgradient (S)
divided by the absolute value of the primal feasible value J.
and the deterministic simulated annealing (DSA) methods and
The algorithms were implemented in MATLAB. We used a
the dual function value from the subgradient (S) method. The
standard PC having Pentium 4 Processor with 3.0 GHz clock
hybrid DSA-subgradient (HS) duality gaps are also shown in
speed and 512 MB RAM. The approximate duality gap is
Table V. The average computation time ( t ) is measured in
also shown in Fig. 9 is 13.2%. The duality gap reduces as the
seconds. These numbers are attractive practically, and they can
number of iterations increases, and the subgradient method
be further reduced significantly by a careful implementation
converges to the minimum dual function value.
in the C language. We also showed an application of the DMFD Problem 3
B. Real world data sets
formulation in our recent paper [16] where we performed
Table III illustrates the model parameters of an automotive
dynamic fusion of classifiers over time for automotive engine
system, a document matching system (Docmatch), a power
fault diagnosis. The temporal correlations considered by dy-
distribution system (Powerdist), a UH-60 helicopter transmis-
namic fusion improve classification accuracy over a variety of
16
TABLE IV R ESULTS ON REAL WORLD MODELS J
Q
D
CI
FI
t
(%) Automotive
Docmatch
Powerdist
Helitrans
EngineSim
S
-658
-481
27
99.5
0.05
0.43
DSA
-775
–
–
75
1.30
0.01
HS
-658
-481
27
S
-541
-311
42.5
88.2
0.36
4.53
DSA
-405
–
–
69
0.70
0.24
HS
-405
-311
23.2
S
-232
-125
46.1
91.6
0.75
1.56
DSA
-157
–
–
84
0.30
0.05
HS
-157
-125
20.3
S
-15
-14
6.7
94.8
0.31
0.47
DSA
-15
–
–
100
0.0
0.02
HS
-15
-14
6.7
S
-85
-33
61.1
95.1
2.28
0.64
DSA
-51
–
–
86
0.3
0.02
HS
-51
-33
35.3
Automotive system
Automotive system 20 False isolation rate
Correct isolation rate
100 80 60 40 20 0
15 10 5 0
1
5
10 15 20 Window size
25
1
Document matching system
25
2.5 False isolation rate
Correct isolation rate
10 15 20 Window size
Document matching system
100 80 60 40 20 0
2 1.5 1 0.5 0
1
Fig. 10.
5
5
10 15 20 Window size
25
Boxplots of CI and FI for automotive and document matching system
1
5
10 15 20 Window size
25
17
Power distribution system
Power distribution system 4 80
False isolation rate
Correct isolation rate
100
60 40 20 0
3 2 1 0
1
5
10 15 Window size
20
25
1
UH−60 helicopter transmission system
20
25
UH−60 helicopter transmission system
6 80
False isolation rate
Correct isolation rate
10 15 Window size
7
100
60 40 20
5 4 3 2 1
0
0 1
Fig. 11.
5
5
10 15 Window size
20
25
1
5
10 15 Window size
20
25
Boxplots of CI and FI for power distribution and UH-60 helicopter transmission system TABLE V
static fusion techniques (based on batch data).
T YPE OF FAULTS
C. Sliding Window DMFD Results Figs. 10, 11 and 12 show the boxplots of correct isolation (CI) and false isolation (FI) rates for various real-world
Fault disappearance probabilities
Fault behavior
Case 1
0.0247 to 0.0484
Intermittent
Case 2
6.8089e-004 to 9.2327e-004
Permanent
Case 3
0.2235 to 0.3926
Highly intermittent
models. These plots were obtained using the sliding-window DMFD method. The boxplot shows the dispersion of the data with lines at lower quartile (25%), median and upper quartile (75%). The
appearance probability was kept so that on average 3 faults occur over a span of 100 epochs.
whiskers are shown by extending the lines from each end of the
Fig. 13 illustrates the correct isolation rate for various fault
box and the maximum length of the line is kept as a function of
behaviors. The results show the mean value and the vertical
the inter-quartile range. The outliers are shown as ‘o’ in the
lines on the data points indicate the standard deviation. These
figures. The window size is selected such that it gives high
results were obtained using 1000 Monte Carlo runs. The
CI with a small boxsize and a low FI with a low boxsize.
results demonstrate that the algorithm achieves low variance
The most suitable window size using the above criterion for
when the fault behavior is highly intermittent.
Automotive, Docmatch, Powerdist, Helitrans, and EngineSim
False isolation rate plot (Fig.14) also illustrates the same
systems are 15, 10, 20, 15 and 15, respectively. Next, we
behavior as CI plot, i.e., the algorithm achieves the least
perform simulations to study the effect of intermittent faults
variance for the highly intermittent fault types. This illustrates
on the performance of the sliding-window DMFD method.
that the DMFD algorithm is highly suitable for intermittent
The automotive system is used for simulations. The fault
faults.
18
Window size Fig. 12.
Window size
Boxplots of CI and FI for engine simulator system
D. Complexity The algorithm presented here reduces the overall complexity
99
from O(K(2m )) to O(K(m + Of )) where m is the number
98
of component states, K is the number of epochs and Of is
97
the set of failed tests. More specifically, the complexities of
96
binary Viterbi algorithm over all component states and the
95
subgradient method are O(Km) and O(KOf ), respectively,
Correct isolation rate
100
per iteration; this is a substantial improvement over extensive 94 Intermittent Permanent Highly Intermittent
93
92 0
5
10
15
20
25
30
approaches based on exact inference. X. C ONCLUSION
Window size
In this paper, we discussed the problem of dynamic mulFig. 13.
Correct isolation rate for various fault behaviors
tiple fault diagnosis (DMFD) with imperfect tests. The original DMFD problem is an intractable NP-hard combinatorial optimization problem. Using a Lagrangian relaxation-based coordination framework, we decomposed the original DMFD
7
problem into parallel decoupled subproblems coordinated via
Intermittent Permanent Highly Intermittent
6
Lagrange multipliers. Each subproblem corresponds to finding False isolation rate
5
the optimal state sequence of a fault with fixed Lagrange multipliers. The subproblems were solved using a binary Viterbi
4
decoding algorithm. The coordination among the subproblems
3
was facilitated by Lagrange multipliers, which were updated 2
using a subgradient method. 1
0
We discussed four formulations of the DMFD problem. 0
5
10
15
20
Window size
Fig. 14.
False isolation rate for various fault behaviors
25
30
Analogous forms of these formulations have been studied widely in fault diagnosis community in a static context, and applied in various fields. Here, we provided a unified formulation of all the MFD formulations in a dynamic context. The
19
first formulation refers to a generalized version of the DMFD
states. The maximum function value of ξi in (24) at time
problem when the detection and false alarms probabilities are
k is denoted by δk (xi (k)), and the value of xi where the
associated with each test and fault. In the second formulation,
function value is maximum; is denoted by ψk (xi (k)). For
the false alarm probability is associated with fault-free case
binary case, we have used the notation δk (0) = δk (xi (k) = 0)
only. The solution to the second formulation was shown to be
and δk (1) = δk (xi (k) = 1). At time k = 1,
quite similar to that of problem formulation 1, except for the
δ1 (xi (1)) = ξi (xi (1), xi (0), {λj (1)}) P P = {µi (1) − cij λj (1) +
need to update an additional Lagrange multiplier. The third formulation considers the case where the uncertainties are as-
oj ∈Of (1)
cij }xi (1)
oj ∈Op (1)
+σi (1)xi (0) + hi (1)xi (1)xi (0)
sociated with only test outcomes. This models dynamic fusion of classifier outputs. In the fourth formulation, we considered
ψ1 (xi (1)) = φ; where xi (0) ∈ {0, 1}.
the deterministic case, which led to a novel dynamic set
(60)
covering problem. We implemented the algorithm on several real world data sets and the results validated the theory. The key advantage of our approach is that the method provides an
Recursion: The recursion step involves maximizing the objective function at each epoch k.
approximate duality gap, which is a worst case indicator of
δk (xi (k)) = {
the difference between the feasible solution and the optimal
annealing method. The latter provides better primal function
−
X
cij λj (k)xi (k)
oj ∈Of (k)
+
max
xi (k−1)∈{0,1}
[δk−1 (xi (k − 1)) + σi (k)xi (k − 1) +hi (k)xi (k)xi (k − 1)];
value as compared to the subgradient method. In this paper, we assumed that the DMFD model parameters are known
cij + µi (k)}xi (k)
oj ∈Op (k)
solution. Our results demonstrate that our algorithm achieves high isolation rate as compared to the deterministic simulated
X
(61)
for 2 ≤ k ≤ K; xi (k) ∈ {0, 1}
and faults evolve independently, and are coupled through the test outcomes via the diagnostic matrix (D-matrix). In our future work, we will implement techniques to learn the
ψk (xi (k)) =
arg max [δk−1 (xi (k − 1)) xi (k−1)∈{0,1}
+σi (k)xi (k − 1) + hi (k)xi (k)xi (k − 1))];
(62)
DMFD model parameters from the observed test outcome sequences and relax the independence assumption to solve the
Termination: This step computes the objective function at time
DMFD problem when faults are dependent. Coupled hidden
epoch k = K.
Markov models offer a promising platform for the solution
F∗ =
of dependent faults problem [18]. We will also focus on improving the primal solution using a soft Viterbi algorithm.
max
xi (K)∈{0,1}
{δK (xi (K))];
x∗i (K) = arg max [δK (xi (K))]
(63)
xi (K)∈{0,1}
A PPENDIX I
Optimal state sequence backtracking— The backtracking step
S OLVING SUBPROBLEMS USING V ITERBI A LGORITHM
computes the optimal state sequence by tracing the path
In this Appendix, we discuss the key steps of the Viterbi algorithm which is used to solve each subproblem corresponding to each component state sequence xi .
backwards. The optimal state x∗i (k) of ith fault at time epoch k is given by x∗i (k) = ψk+1 (xi (k + 1)∗ ),
k = K − 1, ...., 1.
(64)
Initialization: In this step, the objective function is computed at k = 1 for each node (component state). It is assumed
Similar to the recursion step, we can further simplify termi-
that the initial state x(0) is known for all the component
nation and backtracking for the binary case.
20
A PPENDIX II
[2] N. Odintsova, I. Rish, S. Ma. “Multifault Diagnosis in Dynamic Sys-
U PDATING LAGRANGE MULTIPLIERS VIA SUBGRADIENT
tems,” Proceedings of IM, 2005. [3] M. Shakeri, K. R. Pattipati, V. Raghavan and A. Patterson-Hine, “Op-
METHOD
timal and near-optimal algorithms for multiple fault diagnosis with unreliable tests,” IEEE Transactions on Systems, Man and Cybernetics-
Lagrange multipliers are updated via
Part C: Applications and Reviews, vol. 28, No. 3, August 1998.
λl+1 j (k)
=
max(0, λlj (k)
+β
l
(k)dlj (k))
(65)
[4] F. Tu, K. R. Pattipati, S. Deb, and V. N. Malepati, “Computationally efficient algorithms for multiple fault diagnosis in large graph-based
for j ∈ Of (k) and k ∈ (1, K) where the subgradients dlj (k)
systems,” IEEE Transactions on Systems, Man and Cybernetics, vol.
at iteration l and at epoch k are
33, no. 1, pp. 73-85, January 2003.
dlj (k) = ln(yj∗ (k)) −
m X
[5] K. R. Pattipati and M. G. Alexandridis, “A heuristic search and infor-
cij x∗i (k) − ηj
(66)
mation theory approach to sequential fault diagnosis,” IEEE Trans. on Systems Man and Cybernetics, vol. 20, no. 4, pp. 872-887, 1990.
i=1
and step size β l (k)is
[6] V. Raghavan, M. Shakeri and K. R. Pattipati, “Optimal and near-optimal
(Ql − Q∗ ) β (k) = −υ T . f P l 2 (dj (k)) l
test sequencing algorithms with realistic test models,” IEEE Transactions
(67)
on Systems, Man, and Cybernetics: Part A - Systems and Humans, vol. 29, no. 1, pp. 11-27, January 1999.
j=1
[7] T. Le and C. N. Hadjicostis, “Graphical inference methods for fault
Since the optimal dual function value is not available, it is
diagnosis based on information from unreliable sensors,” Proceedings
estimated using the primal feasible solution Jf and best current
of Intl. Conf. on Control, Automation, Robotics and Vision, pp. 1012-
dual value Qmin using (27) and (28), respectively. We estimate the optimal dual function as ˆ ∗ = ω(Jf + Qmin ) Q 2
IEEE Trans. on SMC: Part C - Applications and reviews, 30, no. 4,
(68)
value Qmin doesn’t decreases in the previous 20 iterations of the subgradient procedure with the current value of υ, then υ is reduced by a factor. To improve the subgradient convergence, we also vary ω, which is increased or decreased based on whether the dual function value is decreasing or not [12]. We used the following stopping criteria for the subgradient
2000. [9] O. Erdinc, C. Raghavendra, and P. Willett, “Real-time diagnosis with sensors of uncertain quality,” SPIE Conference Proceedings, Orlando, 1490-1499, April 2003. [10] S. Ruan, Y. Zhou, F. Yu , K. R. Pattipati, P. Willett and A. PattersonHine, “Dynamic multiple fault diagnosis and imperfect tests,” IEEE Trans. on Systems, Man and Cybernetics: Part A, under review, June 2006. [11] Z. Ghahramani and M. I. Jordan, Factorial hidden Markov models, Machine Learning, Kluwer Academic Publishers, Boston, 1997. [12] D. Bertsekas, Nonlinear programming, Athena Scientific, Belmont, MA,
method. Stop if
[8] J. Ying, T. Kirubarajan, and K. R. Pattipati, “A Hidden Markov model based algorithm for fault diagnosis with partial and imperfect tests,”
and initial value υ = 0.01 is used. If the best current dual
•
1017, Singapore, December 2006.
O Pf j=1
USA, 2nd Edition, 2003.
(dlj (k))2
= 0 since we cannot define a suitable
step size in this case, •
Stop if υ ≤ 10−4 because step sizes becomes too small,
•
Stop if number of iterations crossed the maximum no. of iterations i.e., l ≥ 100.
[13] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kauffmann Publishers Inc., San Mateo, CA, 1988. [14] D. Forney Jr., “The Viterbi algorithm,” Proc. IEEE, vol. 61, pp. 268-278, March 1973. [15] J. E. Beasley, “An algorithm for set covering problems,” European Journal of Operational Research, vol. 31, pp. 85-93, 1987.
R EFERENCES [1] F. Yu, F. Tu, H. Tu, and K. R. Pattipati, “Multiple disease (fault)
[16] S. Singh, K. Choi, A. Kodali, K. Pattipati, S. M. Namburu, S. Chigusa, D. V. Prokhorov, and L. Qiao, “Dynamic fusion of classifiers for fault diagnosis,” IEEE SMC Conference, Montreal, Canada, October 2007.
diagnosis with applications to the QMR-DT problem,” Proceedings
[17] A. Kodali, S. Singh, K. Choi, K. Pattipati, S. M. Namburu, S. Chigusa,
of Computing Communication and Control Technology International
D. V. Prokhorov, and L. Qiao, “Diagnostic Inference with Nearly Perfect
Conference, Austin, TX, 2004 and also to appear in IEEE Trans. on
Tests,” will be published in IEEE Aerospace Conference, Big Sky,
SMC: Part A,pp. 746-757, September 2007.
Montana, March 2008.
21
[18] M. Brand, “Coupled hidden markov models for modeling interacting processes,” Neural Computation, November 1996.