Specification and Design Considerations for Reliable Embedded Systems

Adeel Israr and Sorin A. Huss
Dept. of Computer Science, TU Darmstadt, Darmstadt, Germany
{israr|huss}@iss.tu-darmstadt.de

Abstract

The objective of this paper is to introduce a novel representation as a means to consider both permanent and transient errors in order to increase the overall reliability of an embedded system. The deployment of embedded systems in safety-critical applications, e.g. in the automotive domain, demands that the fundamental set of design criteria consisting of functionality, timeliness, and production costs be extended to include reliability as an optimization criterion. Thus, reliability engineering becomes part of the overall design flow for embedded systems. The proposed approach is based on the introduction of Permanent/Transient error Decision Diagrams and on dedicated algorithms for the generation of system implementation sets which feature maximum reliability at minimal costs in terms of redundant resources. The proposed approach is demonstrated for a control system taken from the automotive domain.

1. Introduction

Information processing systems are becoming increasingly pervasive, such that the reliance on their continual provision of correct services, in spite of external perturbations such as node crashes, has correspondingly increased from a user's point of view. From the very first computer systems, the emphasis has been on the correctness of the result, in addition to the cost of the system and the overall execution time of the application. With the emergence of embedded systems, the requirement of correct results has become more stringent, because embedded computer systems are being deployed in applications in which incorrect results may translate into loss of life or property. In the case of safety-critical systems, dedicated and possibly replicated processors can be used to execute the safety-critical tasks, where the correctness and the timeliness of results are the main design objectives. Thus, reliability has to be engineered in the same way as the functionality of an embedded system throughout the whole life cycle of a product. Most of the existing approaches for the design of a reliable system rely on a sequence of basic methods: a) design the functionality, b) analyze

978-3-9810801-3-1/DATE08 © 2008 EDAA

the reliability figures, and c) replicate and/or exchange critical parts of the embedded system in order to meet given reliability constraints. Designing for reliability in late stages of the design process may lead to a complete redesign of the system. Hence, there are certain reliability-driven factors that need to be taken into consideration early on. One of these factors is the replication of computing tasks; the other is the redundancy of execution modules. The errors that can occur in an embedded system can be divided into two categories based on their duration: permanent and transient errors. During the infancy of computers, permanent errors were dominant [13], but the advancement of VLSI technologies has greatly reduced the probability of their occurrence. Nevertheless, a system designer should not ignore them outright. On the other hand, transient error rates have increased considerably over time [9]. Especially radiation from different sources (such as the environment [2] or radioactive impurities within the silicon and the packaging [12]) can lead to single event upsets (SEU) resulting in transient errors. Permanent errors are easier to locate because of their persistent behavior: once such an error is identified and located, the system can be reconfigured, even up to full functionality, given sufficient redundant and accordingly bound resources. A transient error may be overwritten by a correct result before causing any flaw. On the other hand, a transient fault in the system can disappear before being isolated, and the error thus caused may propagate into the final result, ultimately causing the failure of the system. Thus, a mechanism is required that can mask these errors and prevent them from propagating to the final result.
This paper makes the following contributions: a) it proposes a HW/SW co-design framework in which the reliability of the system is ensured in the presence of both permanent and transient errors; b) it presents a new data structure (PTDD) that denotes both system specifications and viable system implementations. This data structure helps in selecting a system implementation for a particular permanent error state of the system resources. It also provides an efficient way of calculating the reliability of the whole embedded system in the presence of transient errors.

2. Related Work

Any attempt to increase the reliability of a computing system generally results in additional overall costs, execution times, area, and/or power figures. This fact has been considered by researchers while developing methods for the design of reliable systems. Nickel and Somani [10] exploited the empty instruction slots in the execution stream of a superscalar processor: the result of an instruction about to be "committed" is compared with its re-executed version before actually committing it to the state of the processor. Since the proposed architecture implements a kind of "rollback", it cannot be used in real-time systems. In addition, this approach operates on a rather low abstraction level, whereas reliability as a design criterion has to be considered at higher levels of abstraction when conceiving modern embedded systems. Suri et al. [14] developed a fault-containment-based approach, whereby applications are first decomposed into their constituent tasks and procedures, which are subsequently transformed to create fault-containment blocks. This approach works well when the source code is already available, but it is not readily applicable at an early entry point of the system design process. A similar approach is considered by Dave and Jha [6]. Recently, some researchers have studied reliability as one of the central design criteria in the embedded system HW/SW co-design flow when starting the design at the system level of abstraction. A co-design framework which assesses reliability properties, together with a dedicated "target architecture" as the processing environment in the enhanced co-design flow, has been presented by Bolchini et al. [3, 4]. This architecture features primary and auxiliary processing resources. A totally self-checking "system-reconfigurator", as part of the architecture, can reconfigure the system in case of a fault in the primary processing resource.
However, this architecture does not support online masking of transient errors. In [8], Jhumka et al. developed the concept of using dependability as an optimization criterion in the system-level design process of embedded systems. Multiple instances of the tasks in the task graph are executed in parallel to make the produced results more reliable and hence to increase the dependability of the complete system. However, they do not describe any method of error detection. Unlike many others, Xie et al. [15] utilize the presence of empty time slots in the schedule of the control flow graph produced by the scheduling process during HW/SW co-design. These empty time slots are used to execute replicas of the tasks executed beforehand by the system architecture, revealing a certain similarity to the ideas of [10]. The replication of the tasks is done either in time or in space in order to arrange for duplicated comparison. The problem with this approach is that implementing a task by means of duplicated comparison does not increase its reliability [13]. Glaß et al. [7] have proposed a novel approach to bind some or all tasks of the task graph to multiple allocated resources. This arrangement ensures continuation of the service of the system in spite of errors in some resources. In the following sections, we will discuss some restrictions of this scheme.

3. System Specification and Implementation Space

Soft (transient) errors are a much greater cause for concern in embedded systems than hard (permanent) errors [9], but in spite of this fact hard errors cannot be ignored right away. Therefore, in this work we consider both hard and soft errors in the system-level design process. We assume that the errors that occur during the execution of processes/tasks on the resources are soft errors, whereas hard errors are assumed to occur during the off-time. Such an assumption is plausible because the probability of a soft error occurring in the system is much greater than that of a hard one; so, even if a hard error occurs during the execution time of the system, it can be classified as a soft error for this execution period. The system is reconfigured upon the detection of hard errors and continues its operation given enough redundant resources. An online soft error detection mechanism is required during the execution of processes on the resources. However, if there are not enough resources left after a hard error has been detected, then the system is said to have failed. In the sequel, the proposed strategy for dealing with this problem is detailed.

The specification graph Gs = (Gp, Ga, Em) as proposed in [7] captures both the functionality and the basic architecture of envisaged implementations of an embedded system; an example is depicted in Fig. 1. Gs consists of two subgraphs: the acyclic task graph Gp = (Vp, Ep) models the data flow with processes p ∈ Vp and data dependencies e ∈ Ep, and the architecture graph Ga = (Va, Ea) models resources r ∈ Va and their interconnections e ∈ Ea. The dashed edges are called mapping edges m ∈ Em and indicate possible process-to-resource mappings. During system design, a set of resources α ⊆ Va is selected to be used for system development, i.e., an allocation is performed. This is followed by assigning each process to one or more resources; this design step is called binding β ⊆ Em.
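The specification graph Gs = (Gp, Ga, Em) can be held in a small record type. The sketch below is a minimal, illustrative encoding; the class and field names are assumptions, and the concrete edge sets are invented, since Fig. 1 only fixes the node labels pA–pC and r1–r4.

```python
# Minimal encoding of a specification graph Gs = (Gp, Ga, Em), following
# the definitions above; all names are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class SpecGraph:
    processes: set   # Vp, process nodes of the task graph
    data_deps: set   # Ep, directed data-dependency edges (p, p')
    resources: set   # Va, resource nodes of the architecture graph
    links: set       # Ea, directed interconnection edges (r, r')
    mappings: set    # Em, mapping edges (p, r)

# A toy instance in the spirit of Fig. 1 (edges are made up for illustration).
gs = SpecGraph(
    processes={"pA", "pB", "pC"},
    data_deps={("pA", "pB"), ("pB", "pC")},
    resources={"r1", "r2", "r3", "r4"},
    links={("r1", "r2"), ("r2", "r3"), ("r3", "r4")},
    mappings={("pA", "r1"), ("pA", "r2"), ("pB", "r2"),
              ("pB", "r3"), ("pC", "r3"), ("pC", "r4")},
)
```

An allocation α is then a subset of `gs.resources` and a binding β a subset of `gs.mappings`, as defined above.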
As in [7], a process is bound to multiple resources to ensure the possibility of a working system even in case of hard errors. Then, a valid schedule has to be determined as the last step of system synthesis. In our approach, to decrease the probability of system failure due to soft errors, we have decided to implement each process in TMR [13] style. Duplicated comparison cannot be used, as it has lower reliability than a singular implementation, and NMR (N > 3) is overkill [13]. In the following we give some definitions that constitute the system model.

Figure 1. Specification Graph 1

Definition 1 (TMRable Implementation). A 4-tuple (p, ra, rb, rc) is TMRable if ra, rb, rc ∈ α and ma = (p, ra), mb = (p, rb), mc = (p, rc) ∈ β. The process p is executed on all three resources, but the voting takes place at resource ra. The binary function T : Vp × Va × Va × Va → {0, 1} determines whether a 4-tuple is TMRable:

T(p, ra, rb, rc) = 1 ⟺ ra, rb, rc ∈ α ∧ ma = (p, ra), mb = (p, rb), mc = (p, rc) ∈ β ∧ ({(rb, ra), (rc, ra)} ⊆ Ea ∨ {(rc, rb), (rb, rc)} ⊆ Ea)
Each process has at least one TMRable implementation t ∈ Γ, where Γ is the set of TMRable implementations of all processes in the task graph. If, during the run time of the system, a process cannot be reconfigured to another of its TMRable implementations due to hard errors, then a singular process mapping implementation m ∈ β is selected. Such a system implementation is more susceptible to soft errors, but it is still better than a failed system.

Definition 2 (COMMable Implementations). Two implementations of two directly dependent processes are said to be COMMable if the data produced by the implementation of the predecessor process can be transferred to that of the successor process. The binary function COMMable : (Em ∪ Γ) × (Em ∪ Γ) → {0, 1} verifies whether the implementations of two processes can communicate in a unidirectional way:

COMMable(tm, tm′) = 1 ⟺
(p, p′) ∈ Ep ∧ (tm, tm′ are implementations of p, p′, respectively) ∧
(((tm = (p, ra) ∨ tm = (p, ra, rb, rc)) ∧ tm′ = (p′, r′)) → (ra = r′ ∨ (ra, r′) ∈ Ea)) ∧
(((tm = (p, ra) ∨ tm = (p, ra, rb, rc)) ∧ tm′ = (p′, r′a, r′b, r′c)) →
  ({(ra, r′a), (ra, r′b), (ra, r′c)} ⊆ Ea ∨
   ((ra = r′) → {(ra, r1), (ra, r2)} ⊆ Ea, where r′, r1, r2 ∈ {r′a, r′b, r′c} and r′ ≠ r1 ≠ r2)))

Definition 3 (Feasible Binding). The function BindFeas(β, ImpTyp) verifies whether the binding β is feasible for implementing the system. If the parameter ImpTyp is "Γ", the function verifies whether β is feasible for implementing a system designed to handle both soft and hard errors; if ImpTyp is "Em", the function verifies the feasibility of a system that handles hard errors only. The binding β is feasible if:
1. ((ImpTyp = Γ) → ∀p ∈ Vp ∃T(p, ra, rb, rc) = 1) ∨ ((ImpTyp = Em) → ∀p ∈ Vp ∃(p, r) ∈ Em, r ∈ Va)
2. ∃L ⊆ β : ∀(p, p′) ∈ Ep, (((ImpTyp = Γ) → tm, tm′ ∈ Γ) ∨ ((ImpTyp = Em) → tm, tm′ ∈ β)) ∧ COMMable(tm, tm′) = 1
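Definition 1 amounts to a few membership tests over the allocation, binding, and architecture edges. The following is an illustrative transcription; the container names are assumptions, and only the replicas-to-voter branch of the printed communication condition is shown.

```python
# Predicate T from Definition 1: (p, ra, rb, rc) is TMRable iff all three
# resources are allocated, all three mappings are bound, and the replica
# results on rb and rc can reach the voting resource ra over Ea edges.
# alpha: allocated resources; beta: mapping edges (p, r); ea: edges (r, r').
def tmrable(p, ra, rb, rc, alpha, beta, ea):
    allocated = {ra, rb, rc} <= alpha
    bound = {(p, ra), (p, rb), (p, rc)} <= beta
    # replicas must be able to deliver their results to the voter ra
    reachable = (rb, ra) in ea and (rc, ra) in ea
    return allocated and bound and reachable

# Hypothetical allocation/binding where pA is triplicated with voter r1.
alpha = {"r1", "r2", "r3"}
beta = {("pA", "r1"), ("pA", "r2"), ("pA", "r3")}
ea = {("r2", "r1"), ("r3", "r1")}
print(tmrable("pA", "r1", "r2", "r3", alpha, beta, ea))  # True
```

Removing either edge toward the voter, or dropping one mapping from β, makes the predicate false, matching the conjunction in Definition 1.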

A process mapping implementation for a process is selected during runtime if there are not enough system resources without hard errors to constitute a TMR implementation for it. Such a graceful degradation is only possible if, for a given binding B, BindFeas(B, Γ) = BindFeas(B, Em) = true holds. It can easily be shown that if a binding is feasible for a system that handles hard and soft errors, it is also feasible for the development of a system that handles hard errors only. Both approaches use the concept of multiple binding to tolerate hard errors.

Definition 4 (α-string). Let RW : Γ ∪ Em → {0, 1} be a function that translates the input implementation to a binary variable; a value of "1" indicates that there is no hard error in the resource(s) that carry the execution of the process. Let a bit string "α-string" of length |α| be defined such that a 1 indicates no hard error in the corresponding resource, and 0 otherwise. The reliability of a resource r and of a process mapping m is either entered as a number or calculated as a time function based on the respective hard/soft error rates. For a TMR implementation, the reliability is calculated from the reliability values of the constituent process mappings as given in [13]. Note that we assume that the probability of a hard error occurrence is much smaller than the probability of a soft error of a resource.

Definition 5 (System Implementation). A set System Implementation (SysImp) is defined as: tm ∈ SysImp ⊆ Γ ∪ Em with |SysImp| = |Vp|. The elements of this set must be valid implementations, i.e., either a simple process mapping or a TMR implementation of a process p ∈ Vp. Two implementations tm_p of process p and tm_p′ of process p′ are members of a SysImp if ((p, p′) ∈ Ep) → COMMable(tm_p, tm_p′) ∧ RW(tm_p) ∧ RW(tm_p′) is a tautology. The probability that a given system is working depends upon the reliability of every implementation within a SysImp set. Therefore, the reliability of a system implementation set SI is given by:

RelSysImp(SI) = ∏_{tm ∈ SI} RelImp(tm)

Consider the specification graph of the simple example depicted in Fig. 1. Let β̂ = Em be selected as the binding of the system specified in Fig. 1. The binding can be verified to be feasible according to BindFeas(β̂, Γ) and BindFeas(β̂, Em). Figs. 2a, b, c, and d represent the system implementation sets for the α-strings "1111", "1011", "0111", and "1001", respectively. Large boxes in Fig. 2 represent TMR implementations, with red-colored resources representing the voting resource.
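Definition 4's remark that TMR reliability is computed from the constituent process mappings [13], together with the product RelSysImp, can be sketched numerically. The function names below are illustrative, and a perfect voter is assumed, as in the standard 2-out-of-3 formula.

```python
# Reliability of a TMR implementation with per-replica reliability R
# (standard 2-out-of-3 result, perfect voter assumed): R_TMR = 3R^2 - 2R^3.
def rel_tmr(r):
    return 3 * r**2 - 2 * r**3

# RelSysImp: product of the reliabilities of the implementations in a
# system implementation set SI.
def rel_sysimp(rels):
    prod = 1.0
    for r in rels:
        prod *= r
    return prod

r = 0.99                 # reliability of one process mapping (example value)
print(rel_tmr(r))        # 0.999702 -- TMR beats a single copy for r > 0.5
print(rel_sysimp([rel_tmr(0.99), 0.999, rel_tmr(0.98)]))  # mixed SysImp
```

Note that for r < 0.5 the TMR formula yields a *lower* reliability than a single copy, which is consistent with the remark above that redundancy pays off only for sufficiently reliable parts.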

Figure 2. Possible SysImp Sets for the Example in Fig. 1

4. Permanent/Transient-error Decision Diagram

The well-known data structure of Binary Decision Diagrams (BDD) has been adopted in [7] to calculate the reliability of the overall system. In that solution it is assumed that the system can be reconfigured when a soft error occurs, provided that there are enough resources. This assumption, however, is questionable, because the system cannot detect the occurrence of a soft error due to its short duration. In contrast, in our work a system implementation is statistically selected for a given error permutation of an α-string, thus tolerating hard errors and choosing a TMR process implementation when possible in order to mask soft errors. The genuine BDD concept is not feasible in this context because of the statistical decision making involved. Therefore, a new data structure called the joint Permanent/Transient-error Decision Diagram (PTDD) is proposed to capture the effects of both hard and soft errors and to determine feasible reconfigurations of the system.

Figure 3. Specification Graph 2

For the further development of the PTDD concept, a simple specification graph, see Fig. 3, is used with the binding β̃ = Em, where only BindFeas(β̃, Em) = true holds. The first, i.e. upper, part of the PTDD, the Permanent-error Decision Diagram (PDD), is a data structure used to find the fundamental state of the system (i.e., failed or reconfigurable) depending upon the presence of hard errors in the architecture resources. The ability to tolerate these hard errors comes from multiple bindings. The definition and construction algorithms for the PDD are similar to those for BDDs [1, 5]. The main difference is that, unlike a BDD, which has two types of leaf nodes, "0" and "1", a PDD has only SysImp Set leaf nodes. An empty SysImp Set node represents the failure of the system due to hard errors, whereas a non-empty node represents the possibility of reconfiguration after a hard error. Each path in the PDD from the root node represents an α-string; the SysImp Set node connected to that path represents the set of all SysImps that are feasible for that α-string according to Definition 5. The second, i.e. lower, part of the PTDD, hereafter called the Transient-error Decision Diagram (TDD), determines whether the system has succeeded or failed

due to soft errors occurring in any implementation of the selected SysImp set. The TDD, further explained in Def. 6, is a data structure that contains the entire set of system implementations for all resources in the set α.

Definition 6 (TDD). A TDD is a directed graph consisting of one or more inverted tree representations, with a vertex set V_TDD containing two types of vertices: a non-terminal vertex v has as attributes an implementation and two children Failure(v), Success(v) ∈ V_TDD; a terminal vertex v has as attribute the state state(v) ∈ {F, S}, representing failure and success, respectively.

Each path from a topmost vertex of the TDD graph represents a system implementation set (SysImp). The state of the SysImp is the state of the top vertex of the graph representing that set. The number of nodes in a TDD is reduced by eliminating similar nodes, as developed by Akers [1] for his binary decision diagram variant. Akers describes an algorithm that builds a BDD in a top-down manner. The TDD, however, cannot be built like a normal BDD, because there is no root node in a TDD and the nodes are arranged in inverted tree structures. To generate a TDD, every implementation of the process at one end of the sorted ProcList (the list of processes in the task graph) forms the roots of the inverted trees. The ProcList is sorted according to the number of valid implementations associated with each process; this is a simple heuristic, found through experiment, that generates minimum-sized TDDs. Then every implementation of the successive process in the list is made a predecessor node of every topmost node of the TDD at that stage of development. This procedure is repeated for all processes in the ProcList, and finally each path represented by a topmost node is checked to verify whether it results in a valid SysImp according to Def. 5, assuming that all resources in α are free of hard errors.
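The construction order described above — sort the ProcList by the number of valid implementations, then stack each process's implementations on top of the current topmost nodes — can be sketched as path enumeration. The real TDD additionally shares equal nodes in the style of Akers [1], which this illustrative sketch omits; all names (`build_tdd_paths`, `is_valid`, the `p@r` labels) are assumptions.

```python
from itertools import product

# Sketch of the TDD construction order: processes sorted by their number
# of valid implementations (the experimentally found heuristic for small
# TDDs); every path through the stacked levels is one candidate SysImp,
# kept only if it passes the validity check of Def. 5 (is_valid is a
# placeholder for the COMMable/Def.-5 test).
def build_tdd_paths(impls, is_valid):
    proc_list = sorted(impls, key=lambda p: len(impls[p]))
    levels = [impls[p] for p in proc_list]
    # every root-to-topmost path is a candidate SysImp
    return [path for path in product(*levels) if is_valid(path)]

# Hypothetical implementations: "p@r" denotes process p mapped to resource r.
impls = {"pA": ["pA@r1", "pA@r2"], "pB": ["pB@r2"], "pC": ["pC@r3", "pC@r4"]}
paths = build_tdd_paths(impls, lambda path: True)
print(len(paths))  # 4 candidate SysImps before any pruning
```

With a real validity check, invalid paths are dropped, mirroring the statement above that paths failing Def. 5 are not inserted into the TDD.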
A path that is found not to be a valid SysImp is not inserted into the TDD data structure. The TDD thus stores all SysImp sets under the assumption that all resources in the resource set α are free of hard errors. However, during the generation of the PDD, i.e., the upper portion of the PTDD, all possible permutations of the corresponding α-strings have to be investigated. It is not feasible to generate a TDD for every permutation of an α-string; therefore, a function MatchingSysImps generates the set of SysImps, i.e., the set of topmost nodes of the TDD, for a given α-string. Starting from the root nodes of the TDD data structure, the function recursively searches every path for system implementations that are feasible for the given α-string according to Def. 5, while deleting every path that contains an implementation depending on a "faulty" resource. The TDD is, thus, a special BDD that has an inverted tree structure with

Figure 4. Permanent/Transient-error Decision Diagram for the System Specification of Fig. 3

the false/failure side of every node pointing to the failure of the whole system and with the topmost nodes representing a complete SysImp. Therefore, the algorithm for calculating the reliability of a BDD-encoded fault tree [11] cannot be used directly. The size of the SysImps set for each permutation of an α-string can be large. However, given an α-string, only one SysImp can be selected; therefore, a selection criterion has to be defined. For this work, the SysImp with the highest reliability is selected from the SysImps set for an α-string. Fig. 4 depicts a complete PTDD for the example system specification given in Fig. 3 with binding β̃. We have developed an algorithm denoted PTDDReliability to calculate the reliability of the entire system represented as a PTDD. This dedicated algorithm is developed along similar lines to the algorithm by Rauzy [11], which calculates the failure probability of BDD-encoded fault trees. The algorithm recursively calculates the reliability of the entire PTDD. Unlike Rauzy's algorithm, whose complexity depends upon the size of the BDD, the complexity of our algorithm depends only on the size of the PDD.
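The paper does not spell out the PTDDReliability recursion itself. A plausible sketch in the spirit of Rauzy's scheme [11] weighs each PDD branch by the probability that its resource is free of hard errors and uses, at each leaf, the soft-error reliability of the SysImp selected for the corresponding α-string; the node layout and all names below are assumptions.

```python
# Rauzy-style recursive evaluation over the PDD: each internal node tests
# one resource for a hard error; leaves carry the soft-error reliability
# of the best SysImp selected for the corresponding alpha-string
# (0.0 for an empty SysImp set, i.e. system failure).
# Hypothetical node layout: ("node", p_ok, ok_child, fail_child)
# or ("leaf", reliability_of_selected_sysimp).
def ptdd_reliability(node):
    if node[0] == "leaf":
        return node[1]
    _, p_ok, ok_child, fail_child = node
    return (p_ok * ptdd_reliability(ok_child)
            + (1 - p_ok) * ptdd_reliability(fail_child))

# Two resources: if r1 fails, a degraded SysImp still works; if both
# fail, the system fails (empty SysImp set -> leaf reliability 0.0).
pdd = ("node", 0.999,
       ("node", 0.99, ("leaf", 0.9997), ("leaf", 0.98)),
       ("node", 0.99, ("leaf", 0.95), ("leaf", 0.0)))
print(round(ptdd_reliability(pdd), 6))  # 0.999444
```

Because the recursion visits each PDD node once (with memoization on shared nodes), its cost depends only on the PDD size, matching the complexity claim above.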

5. Results

In the following, we present some preliminary experimental results of our design methodology for an application from the automotive domain. We use the specification graph given in Fig. 5 and the reliability values of the resources in the architecture graph, with Table 1 representing the mapping of the processes in the task graph onto the resources (Em). For this experiment we take β̂ = Em. It can be shown that BindFeas(β̂, Γ) = BindFeas(β̂, Em) = true. The soft error rate used in this experiment is 0.01 errors/sec. Table 2 depicts some of the α-strings with the corresponding non-empty SysImps sets. We generated two PTDDs: one is dedicated to a system that only covers hard errors (HrdSys), and the second corresponds to

Figure 5. Simplified Specification of a Cruise Control in a Car (task graph and architecture graph)

Resource reliabilities:
r1: 0.999999971
r2: 0.999999956
r3: 0.999999928
r4: 0.999999994
r5: 0.999999999
r6: 0.999999991

Table 1. Process to Resource Mappings Em and Execution Times (processes pA–pJ on resources r1–r6)

Table 2. α-strings and corresponding System Implementation Sets

α-string (r1 r2 r3 r4 r5 r6) | Simple |   Mixed | TMR
111111                       |   2026 | 1668268 | 126
111110                       |   1966 | 1334194 |  72
111101                       |    700 |   42756 |   0
111100                       |    700 |   42756 |   0
110111                       |    756 |   12516 |   0
110110                       |    696 |    6042 |   0
110101                       |    128 |      32 |   0
101111                       |    709 |   59031 |   0
101110                       |    649 |   31947 |   0
101101                       |     45 |     205 |   0
100111                       |    336 |    1332 |   0
011111                       |    653 |   95719 |   0
011110                       |    640 |   81716 |   0
011101                       |    180 |    1980 |   0
010111                       |    203 |     977 |   0
001111                       |    229 |    2547 |   0

a system that handles both hard and soft errors (HrdSftSys). For HrdSftSys, TMR-based, process-mapping-based, and mixed SysImp sets are feasible, whereas for HrdSys only process-mapping-based SysImp sets are feasible. The "Simple" column in Table 2 shows the number of pure process-mapping-based SysImp sets per α-string for both HrdSftSys and HrdSys, whereas the columns "Mixed" and "TMR" show the number of possible mixed and pure TMR-based SysImp sets per α-string for HrdSftSys, respectively. Using the algorithm PTDDReliability, the reliability of HrdSftSys and HrdSys was found to be 0.999985 and 0.993859, respectively. Thus, the HrdSftSys and HrdSys implementations on the same architecture have different reliability values, because one of them can only tolerate hard errors, whereas the other can also tolerate soft errors. The experiment was carried out on an Intel Pentium 4 2.60 GHz machine with 1 GB RAM. It took 3 min 48 s to generate the TDD and 2 min 17 s to generate the PDD for HrdSftSys. The run time of the PTDDReliability algorithm itself was too small to be measured. The long execution time to generate the PTDD stems from the fact that it inherits many of its properties, including long construction times, from BDDs.

Figure 6. Selected SysImp Sets for α-string "111111" (HrdSftSys: Rel = 0.999985; HrdSys: Rel = 0.993859)

6. Conclusion and Future Work

In this paper we introduced a novel representation as a means to consider both permanent and transient errors in order to increase the overall reliability of an embedded system. A data structure denoted as PTDD and some dedicated algorithms were introduced. The PTDD stores the system specification and helps in finding the 'best' system implementation for a particular state of the system resources. It also helps in efficiently calculating the reliability of the entire system. However, like that of a BDD, the construction process is quite

expensive. Therefore, in the future we will explore hierarchical PTDDs as part of reliability engineering. Similarly, different criteria for selecting a system implementation from a given SysImp set, based on a mixture of reliability, execution and communication time, and power figures, will be studied.

Figure 7. Selected SysImp Sets for α-string "001111"

References

[1] S. B. Akers. Binary decision diagrams. IEEE Trans. on Computers, C-27(6):509–516, June 1978.
[2] L. S. Blanchard and D. Whitehead. A study to assess the possible effects on radio based services of electromagnetic emission from the proposed increase of electrically powered public and private transport. Final Report, UK Transport Research Laboratory, October 2000.
[3] C. Bolchini, L. Pomante, F. Salice, and D. Sciuto. Reliability properties assessment at system level: a co-design framework. Proc. IEEE Intl. On-Line Testing Workshop, 7:165–171, 2001.
[4] C. Bolchini, L. Pomante, F. Salice, and D. Sciuto. A system level approach in designing dual-duplex fault tolerant embedded systems. Proc. IEEE Intl. On-Line Testing Workshop, pages 32–36, July 2002.
[5] R. E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Trans. on Computers, C-35(8):677–691, August 1986.
[6] B. P. Dave and N. Jha. COFTA: Hardware-software co-synthesis of heterogeneous distributed embedded systems for low overhead fault tolerance. Proc. IEEE Intl. Symp. Fault-Tolerant Computing, pages 417–441, 1997.
[7] M. Glaß, M. Lukasiewycz, T. Streichert, C. Haubelt, and J. Teich. Reliability-aware system synthesis. Proc. IEEE/ACM Design, Automation and Test in Europe, pages 409–414, April 2007.
[8] A. Jhumka, S. Klaus, and S. Huss. A dependability-driven system-level design approach for embedded systems. Proc. IEEE/ACM Design, Automation and Test in Europe, pages 372–377, March 2005.
[9] S. R. McConnel, D. P. Siewiorek, and M. M. Tsao. The measurement and analysis of transient errors in digital computer systems. Proc. IEEE Intl. Symp. Fault-Tolerant Computing, pages 67–70, 1979.
[10] J. B. Nickel and A. K. Somani. REESE: A method of soft error detection in microprocessors. Proc. Intl. Conf. Dependable Systems and Networks, pages 401–410, 2001.
[11] A. Rauzy. New algorithms for fault tree analysis. Reliability Eng. and System Safety, 40:203–211, 1993.
[12] R. C. Baumann. Soft errors in advanced semiconductor devices. Part I: The three radiation sources. IEEE Trans. on Device and Materials Reliability, 1:17–22, March 2001.
[13] D. P. Siewiorek and R. S. Swarz. Reliable Computer Systems: Design and Evaluation. A. K. Peters Ltd., 3rd edition, 1998.
[14] N. Suri, S. Ghosh, and T. Marlowe. A framework for dependability driven SW integration. Proc. IEEE Distributed Computing Systems, pages 405–416, 1998.
[15] Y. Xie, L. Li, M. Kandemir, N. Vijaykrishnan, and M. J. Irwin. Reliability-aware co-synthesis for embedded systems. Proc. Intl. Conf. on Application-Specific Systems, Architectures and Processors, pages 41–50, September 2004.