An Algorithm Providing Fault-Tolerance for Layered Distributed Systems

M. Taghelit, S. Haddad and P. Sens
Université Pierre et Marie Curie, C.N.R.S. MASI
4, Place Jussieu, 75252 Paris Cedex 05, France
e-mail: [email protected]

Abstract

This paper presents a new approach for fault-tolerance in layered distributed systems. We develop the dynamic regeneration method, in which faulty software components are dynamically regenerated. In contrast to other techniques, which duplicate critical components, this method does not increase the complexity of the system and tolerates an unlimited number of failures. To apply this technique, we develop a method for the design of reliable systems. We transform an initial software architecture into a model which encloses homogeneous elements. Our study relies on the OSI standard model defined within the ISO organization. So, from an initial layered distributed architecture, we exhibit a homogeneous communication chain. The building and the maintenance of this chain in an unreliable environment are achieved by an algorithm using dynamic regeneration.

Keywords: fault-tolerance, dynamic regeneration, distributed systems, layered systems, end-to-end communication.
1. Introduction

Distributed systems provide new opportunities for developing high performance applications; at the same time, because the components depend on one another, such systems are particularly fragile: the failure of one component may cause the failure of the whole system. So it is essential to have a fault-tolerant management. Several methods based on redundancy techniques exist [7][1]. These methods prevent the failure of critical elements by duplicating them in several copies. There are two basic kinds of redundancy [6]: the passive one, where copies run only if the element fails, and the active one, where all copies run in parallel with the element. These techniques impose a great overhead on the system and tolerate a limited number of failures (this number is proportional to the number of copies).

We propose a new solution that offers an optimum degree of tolerance for a low cost: the dynamic regeneration. Instead of duplicating the elements, they are regenerated in case of failure.
From the OSI standard model defined within the ISO organization, we generalize this kind of architecture to any communication chain. A communication chain allows a dialogue between an initial and a terminal entity through inner ones which relay messages.
The aim of the proposed algorithm is to build and preserve the communication chain in an unreliable environment. The entities of the chain are dynamically generated, unlike the other techniques where elements have a static existence. When the failure of an entity is detected, we regenerate a new entity and integrate it into the chain.
In the first part, we describe how to obtain a communication chain from the OSI model. In the second part, we outline the different methods providing fault-tolerance and we introduce the principle of the dynamic regeneration. The third part informally presents the algorithm and the fourth one describes it. Finally, the last part shows the correctness of the algorithm.
2. Functional scheme for layered distributed systems

The ISO organization defines an architecture for communicating open systems (the OSI standard) [3]. This hierarchical model is composed of seven layers. Each layer has a specific functionality and the communication between two layers of the same level is done through the lower layers. Although the OSI standard does not impose a specific localization of the layers, current systems locally implement all of them.
figure 1: The OSI model

We would like to make this kind of architecture reliable. For that, we modify the basic functional scheme. The first step consists in unlayering the architecture. The notion of layer disappears and this new architecture, even though identical to the old one, makes a chain composed of communicating entities. This unlayering shows three kinds of entities: the initial entity at the beginning of the chain, the ending entity at the end of the chain and the inner entities, each of which has a predecessor and a successor. All functionalities of one layer are included in an entity, but we generalize the initial model by making no assumption about the localization.
figure 2: Unlayering of the model

The global architecture being defined, we proceed to the functional division of entities. All the entities have three functionalities: the receipt and sending of messages, which are homogeneous for all the entities, and the processing, which is heterogeneous. We bring together these functions in two specialized modules: a communication module that carries out the communication tasks and a processing module that carries out the processing. The communication module becomes the communication interface between an entity and the others.
figure 3: Functional division

Communication modules (CM) compose a homogeneous communication chain, since each module has the same functionality (sending/receipt). The processing modules remain heterogeneous and each of them locally communicates with the communication module of the same entity. Thus, in the sequel we concentrate on ensuring communication between entities, and this is done by preserving the communication chain in case of failures.
Note that the modularity of an entity implies a limited propagation of the failures. The failure of the processing module does not mean the breaking of communication. Moreover, the separation of functionalities allows a more precise diagnosis of failures and thus a more efficient recovery.
figure 5: Communication chain
3. Tolerance by dynamic regeneration

Generally, techniques providing fault-tolerance use redundancy methods [7]. These methods duplicate a basic element in several copies. In this way, they prevent the failure of the basic element, called the primary. There are two kinds of redundancy techniques: the active redundancy, where copies are active in parallel with the primary element, and the passive redundancy, where copies are active only if the primary element fails.

3.1. Active redundancy

Each copy receives the same input and does the same treatment. Only one result is taken into account [8]. To achieve this, two different methods are currently used.
In the first one, a vote mechanism chooses one result from among all of those produced by the redundant elements. A disagreement between elements provides a failure detection. The simple majority is the most used vote technique [6].
figure 6: The vote mechanism

In the second method, only the result of the primary element is considered. If the primary module fails, it is replaced by one of its copies. The recovery is very simple: it only consists in taking new results on the output of the chosen copy.
The vote technique allows a limited degree of tolerance (i.e. the number of element failures): for a number of n redundant elements, a majority vote tolerates only n/2 failures. Moreover, the vote component can itself fail. This kind of failure is catastrophic, and a replication of the vote component is recommended.
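As an illustration of the tolerance bound of the vote technique, the following Python sketch implements a simple majority vote. It is not taken from the cited systems: the function name `majority_vote` and the assumption that faulty elements produce arbitrary (possibly distinct) results are ours. With a strict majority, the vote succeeds only while fewer than half of the elements fail.

```python
from collections import Counter

def majority_vote(results):
    """Return the value produced by a strict majority of elements, or None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

# Five redundant elements, two of them faulty: the correct value still wins.
print(majority_vote([42, 42, 42, 7, 13]))   # 42
# Three faulty elements out of five: no strict majority remains.
print(majority_vote([42, 42, 7, 13, 99]))   # None
```

A `None` result corresponds to the disagreement case, which the text describes as a failure detection.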
The second technique allows a greater degree of tolerance than the vote. For n redundant elements, it tolerates n-1 failures. But this method does not prevent a faulty behaviour of the primary element. Contrary to the vote technique, erroneous results produced by the primary element will be taken into account.

These methods lead to a great overhead of treatment. If the degree of redundancy is n (i.e. there are n redundant elements), there must be n treatments for the production of one result. Moreover, each redundant element has a failure probability and so the global failure probability is greater. But active redundancy provides a fast recovery from failure, because it requires no state restoration (all the time, copies do the same process and are in the same state). This method is currently used for reliable real time systems where the recovery time is bounded [10].
3.2. Passive redundancy

A redundant element can only start running when the primary element fails. At any time, only one element is active.
figure 8: Passive redundancy

When a copy replaces the faulty primary element, a previous state of the primary element must be restored. Passive redundancy therefore also needs a checkpoint management. This kind of technique is expensive in processing time and space. Moreover, as with the active redundancy, the failure probability of redundant modules is non-null. Generally, this technique has a lower cost than active redundancy and is often preferred to it [9].

The main drawbacks of redundancy techniques are the increase of the system's complexity and the limited fault-tolerance degree. We propose a new technique that solves these problems while taking advantage of the characteristics of our architecture and of the functional division: the dynamic regeneration.
3.3. Principle of the dynamic regeneration

The entities are no longer duplicated but generated dynamically after their failure. The generated element has no prior existence and is integrated in the system only after a failure. This element has the same functionality as the faulty element. This approach implies the following points:

• The system has as many elements as a fault-free system. So, it is less complex than a system using redundancy techniques.
• The overhead is only related to the fault detection.
• The fault-tolerance degree is not limited.
• The generated elements have no localization constraint, unlike the redundant elements which have a fixed localization. A new module may be dynamically generated on a non-faulty host.

More precisely, the treatment of a failure is composed of three steps: the detection, the regeneration of a new element and its insertion in the chain.
figure 9: The dynamic regeneration

Each module has to supervise the one it has generated (its successor) and to regenerate it in case of failure. This ensures a decentralized control which answers our tolerance and performance requirements.
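The three steps (detection, regeneration, insertion) and the supervision of each module by its predecessor can be sketched as follows. This is a deliberately simplified, centralized simulation and not the distributed algorithm itself: the `Module` class, the `alive` flag used as a failure oracle and the identity allocator are assumptions made for the example.

```python
import itertools

_ids = itertools.count(1)  # hypothetical identity allocator

class Module:
    def __init__(self, level):
        self.ident = next(_ids)
        self.level = level
        self.alive = True

def recover(chain):
    """Scan the chain; each live module regenerates a failed successor."""
    regenerated = 0
    for i in range(len(chain) - 1):
        supervisor, successor = chain[i], chain[i + 1]
        if supervisor.alive and not successor.alive:      # 1. detection
            substitute = Module(successor.level)          # 2. regeneration
            chain[i + 1] = substitute                     # 3. insertion
            regenerated += 1
    return regenerated

chain = [Module(level) for level in range(1, 6)]
chain[2].alive = False        # two adjacent modules fail
chain[3].alive = False
print(recover(chain), all(m.alive for m in chain))   # 2 True
```

Note that the substitute of the third module is the one that repairs the fourth, which mirrors the decentralized control described above. The failure of the first module is not handled here.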
4. Implementation of the dynamic regeneration

4.1. Targets of the algorithm

• The building of the communication chain

The building of the chain consists in generating a finite set of communication modules such that each inner module has a predecessor and a successor. Unlike the other systems where the elements are static, the modules are generated dynamically. On the other hand, an initialization step is required. This step consists in generating the first module, which is done on the user's initiative.

• Maintenance of the communication chain in an unreliable environment

If one or several modules fail, the chain breaks and no communication between the initial and ending entities is possible. To keep end-to-end communication, it is necessary to replace the faulty modules. This is carried out by the dynamic regeneration of the faulty modules.
4.2. Principles of the algorithm

We want to construct and keep a chain of N modules, each module communicating with the adjacent modules. There are four principles:

• Activation principle: Each non-ending module carries on the building of the chain.

Each module with no successor and which is not the Nth one generates a successor. This allows the building of the chain to be carried on if it is not complete, and the chain to be rebuilt if modules fail.

• Knowledge principle: Each module knows all its successors.

When a module fails, it is regenerated according to the activation principle. Then, this new module needs to be attached to the predecessor of the faulty module (the one which generated it) and to its successor. The regenerating module must know its two immediate successors so as to give to the regenerated module the information necessary to attach itself to the chain. But that is not enough in case of multiple failures of adjacent modules. In that case, each regenerated module has to regenerate a successor until a valid successor has been found. So, to restore the chain whatever the number of faulty adjacent modules, each module has to know all its successors.

If the regenerated module is the initial one, it cannot receive information about its successors from its predecessor, since the latter does not exist. In order to avoid the building of a new chain, the initial module must keep information about its successors in stable memory. This information will be communicated to it if it is regenerated after a failure.

• Purge principle: Each non-initial module with no predecessor destroys itself after a finite time.

When the predecessor of a module fails, this module must be eliminated if no new module has replaced the predecessor after a finite time. In this way, we avoid the creation of "parasite" sub-chains.

• Suspicion principle: The validity of the chain is periodically tested.

Periodically, each module supervises its successor. This mechanism allows faulty modules to be detected.

More precisely, the detection is based on an acknowledgement mechanism. Each module periodically transmits a control message to its successor so as to verify that it is still valid. If no answer is given after a delay t, the successor is considered faulty. This mechanism can lead to false detections, whose rate depends on t.
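The suspicion principle and the origin of false detections can be illustrated with a small Python sketch. The model is an assumption made for the example: the answer of a successor is reduced to a single reply delay, and `None` stands for a crashed successor that never answers. The function names are hypothetical.

```python
def is_suspected(reply_delay, timeout):
    """A successor is suspected when no answer arrives within `timeout`.

    `reply_delay` is None for a crashed successor (it never answers).
    """
    return reply_delay is None or reply_delay > timeout

def classify(reply_delay, timeout):
    suspected = is_suspected(reply_delay, timeout)
    crashed = reply_delay is None
    if suspected and not crashed:
        return "false detection"      # slow but valid successor
    return "faulty" if suspected else "valid"

# Reply delays in arbitrary time units; None models a crash.
for delay in (0.2, 1.5, None):
    print(delay, classify(delay, timeout=1.0))
```

In this simplified model, a valid successor that answers later than the delay t is wrongly suspected, which is the situation that later produces parasite sub-chains.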
4.3. Few executions

• The chain building

The activation principle allows the chain building. The building of a chain of length N is done in N steps. Each step corresponds to the generation of a new module and its joining to the chain already built. This joining is achieved when all its predecessors know its existence. The new module communicates its identity to its predecessor (the generating module), which sends it back to its own predecessor, and so on until the initial module receives it. Then, the initial module transmits an acknowledgement which is retransmitted from successor to successor until it arrives at the new module. It is only at this moment that the new module is joined to the chain. This process is repeated until the joining of the N-th module.

• Chain maintenance in case of failures

The principles of suspicion, activation and knowledge allow the chain maintenance. Assume that two modules, CMi and CMi+1, fail at the same time. The CMi-1 module detects the CMi module failure by the suspicion principle and regenerates a substitute module CMi' by the activation principle. The substitute CMi' module detects the CMi+1 failure and follows the same process: it regenerates the substitute CMi+1' module, to which it gives its own identity and those of the CMi+2 to CMN modules according to the knowledge principle. Then, the CMi+1' module connects itself to the CMi+2 module and thus restores the chain. This situation is shown in figure 10.

Figure 10: Recovery from multiple failures
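The recovery from adjacent failures can be sketched in Python under simplifying assumptions: liveness is given by a dictionary acting as an oracle (in the algorithm it is observed through the suspicion principle), identities are plain integers, and the function `restore` is a hypothetical centralized stand-in for the successive regenerations performed by the substitutes.

```python
def restore(alive, knowledge, next_id):
    """Rebuild the chain after a module found its successor faulty.

    alive:     dict identity -> bool (hypothetical liveness oracle)
    knowledge: identities of all successors of the detecting module,
               in chain order (its knowledge vector)
    Returns the repaired list of successor identities and the next free id.
    """
    repaired = []
    for position, ident in enumerate(knowledge):
        if alive[ident]:
            # Valid successor found: reconnect to it and keep the rest.
            return repaired + knowledge[position:], next_id
        # Faulty successor: generate a substitute, which in turn
        # inspects the following identity of the knowledge vector.
        alive[next_id] = True
        repaired.append(next_id)
        next_id += 1
    return repaired, next_id

# Chain 1..6; CM3 and CM4 fail at the same time, CM2 detects the failure.
alive = {1: True, 2: True, 3: False, 4: False, 5: True, 6: True}
print(restore(alive, [3, 4, 5, 6], next_id=7))   # ([7, 8, 5, 6], 9)
```

The sketch stops at the first valid successor: a failure located further down the chain would be handled by the predecessor of the failed module, as in the decentralized scheme.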
With the knowledge principle, it is possible to restore a communication chain, whatever the number of faulty modules. The failure of the first communication module (CM1 module) is particular, because its detection and regeneration are done by the user.

• Creation and destruction of a parasite sub-chain

We saw that the probability of false detection is non-null. In case of false detection, a new module is regenerated and will be reconnected to the successor of the module supposed to be faulty. The supposedly faulty module becomes isolated (with no successor and no predecessor). It constitutes a sub-chain which can grow by following the activation principle. Such a case is shown in figure 11.

Figure 11: Creation of a sub-chain from an isolated module (CM3)

This situation is not the only one where a sub-chain is created. Several adjacent modules which are no longer accessible can also constitute a sub-chain. The destruction of parasite sub-chains is achieved by the purge principle.
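The purge principle can be illustrated by the following sketch. The data layout (a dictionary per module with the fields `pred`, `initial` and `orphan_since`) and the explicit clock values are assumptions made for the example; in the algorithm each module applies the purge to itself.

```python
def purge(modules, now, delay):
    """Return the modules that survive the purge at time `now`.

    modules: dict identity -> {"pred": identity or None,
                               "initial": bool,
                               "orphan_since": time or None}
    """
    survivors = {}
    for ident, m in modules.items():
        orphan = m["pred"] is None and not m["initial"]
        if orphan and now - m["orphan_since"] >= delay:
            continue                      # the module destroys itself
        survivors[ident] = m
    return survivors

modules = {
    1: {"pred": None, "initial": True, "orphan_since": None},
    2: {"pred": 1, "initial": False, "orphan_since": None},
    3: {"pred": None, "initial": False, "orphan_since": 10},  # isolated CM3
}
print(sorted(purge(modules, now=12, delay=5)))   # [1, 2, 3]
print(sorted(purge(modules, now=15, delay=5)))   # [1, 2]
```

Once the head of a parasite sub-chain has destroyed itself, its own successor becomes a module without predecessor and is eliminated in turn, hence a gradual elimination.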
4.4. Definition of the algorithm

4.4.1. Description of a module

The state transitions of a communication module are given by the reduced graph of figure 12. For the sake of simplicity, only the main transitions are described. In this graph, we specify neither the various recovery cases (during or after the complete building of the chain) nor the fact that a module can fail at any moment and thus in any state.
Figure 12: State transitions of a communication module

Non-existent: the module does not exist.
Generated: the CMi module has just been generated.
WaitConnPred: the CMi module gave its identity to the module which generated it (CMi-1 module) and waits for the acknowledgement message.
WaitConnSucc: the CMi module generated a CMi+1 module, to which it gave its identity, and waits for the reception of the CMi+1 identity.
Connected: the CMi module received the identity from the module it has generated (CMi+1 module).
WaitReConnPred: the CMi module detected its disconnection from its predecessor; it waits for a reconnection message.

Each module has a set of variables including its level in the chain and a knowledge vector which contains all its successors' identities.
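A possible encoding of these states as a transition table is sketched below. The figure itself is not reproduced here, so the table is only our reading of the textual description of the states and of the rules of section 4.4.3; the event names are hypothetical labels.

```python
from enum import Enum

class State(Enum):
    NON_EXISTENT = "Non-existent"
    GENERATED = "Generated"
    WAIT_CONN_PRED = "WaitConnPred"
    WAIT_CONN_SUCC = "WaitConnSucc"
    CONNECTED = "Connected"
    WAIT_RECONN_PRED = "WaitReConnPred"

# (state, event) -> next state; event names are hypothetical labels.
TRANSITIONS = {
    (State.NON_EXISTENT, "generated"): State.GENERATED,
    (State.GENERATED, "send_new"): State.WAIT_CONN_PRED,
    (State.WAIT_CONN_PRED, "ack_new_last"): State.CONNECTED,
    (State.WAIT_CONN_PRED, "ack_new_generate"): State.WAIT_CONN_SUCC,
    (State.WAIT_CONN_SUCC, "new_from_successor"): State.CONNECTED,
    (State.CONNECTED, "successor_failed"): State.WAIT_CONN_SUCC,
    (State.CONNECTED, "predecessor_lost"): State.WAIT_RECONN_PRED,
    (State.WAIT_RECONN_PRED, "update"): State.CONNECTED,
}

def run(events, state=State.NON_EXISTENT):
    """Apply a sequence of events; an unknown transition raises KeyError."""
    for event in events:
        state = TRANSITIONS[(state, event)]
    return state

print(run(["generated", "send_new", "ack_new_generate",
           "new_from_successor"]).value)   # Connected
```

As in the reduced graph, failures of the module itself and the special handling of the first module are left out.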
4.4.2. Description of message types

The various message types that are exchanged between communication modules are:

New: allows a newly generated module to be known by all its predecessors. Each predecessor which receives this message memorizes the identity of the newly generated module and transfers it to its own predecessor. When this message arrives at the first module of the chain, this module memorizes the identity in the stable memory and sends an acknowledgement message to its successor.

AckNew: is the acknowledgement message for the New message sent by the newly generated module. It is used to confirm to this module that it is actually known by all its predecessors.

Update: in case of recovery, this message is sent by the regenerated module in order to inform the successor that it is its new predecessor. This message is always taken into account and acknowledged.

AckUpdate: confirms to a regenerated module that its successor considers it as its new predecessor.

When a module is generated, or regenerated, it receives the identity and the knowledge vector from its predecessor (the generating module). When the initial module is generated, or regenerated, it receives the knowledge vector saved in stable memory.
This example shows the generation and the insertion of the fourth module. In the first step, the third module generates a new module and changes to the WaitConnSucc (WCS) state. The new module is in the Generated (G) state. In the second step, the new module sends its identity (New message) to its predecessor and changes to the WaitConnPred (WCP) state. The New message is retransmitted from predecessor to predecessor. In the last step, the initial module sends the acknowledgement of the New message (AckNew message), which is retransmitted from successor to successor. When the new module receives the AckNew message, it generates a successor and changes to the WaitConnSucc state. This process is repeated until the generation and insertion of the Nth module.
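The message flow of this example can be traced with a short Python sketch. It only models the propagation of the New and AckNew messages and the update of the knowledge vectors; states, failures and the stable memory are left out, and the function `join` and its data layout are assumptions made for the illustration.

```python
def join(knowledge, new_ident):
    """Insert `new_ident` at the end of the chain described by `knowledge`.

    knowledge: list of knowledge vectors, one per existing module
               (index 0 is the initial module).
    Returns the ordered trace of (message, sender level, receiver level).
    """
    k = len(knowledge) + 1            # level of the new module
    trace = []
    # New travels from predecessor to predecessor down to the initial module.
    for level in range(k, 1, -1):
        trace.append(("New", level, level - 1))
        knowledge[level - 2].append(new_ident)
    # AckNew travels back from successor to successor up to the new module.
    for level in range(1, k):
        trace.append(("AckNew", level, level + 1))
    knowledge.append([])              # the new module has no successor yet
    return trace

knowledge = [[2, 3], [3], []]         # three modules with identities 1, 2, 3
for step in join(knowledge, 4):
    print(step)
print(knowledge)                      # [[2, 3, 4], [3, 4], [4], []]
```

In this model, joining the module of level k costs 2(k-1) messages: k-1 New messages followed by k-1 AckNew messages.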
This second example shows the recovery of a faulty module. In the first step, the second module detects the failure of the third one. It regenerates a substitute module and changes to the WaitConnSucc (WCS) state. The fourth module detects its disconnection from the faulty module and changes to the WaitReConnPred (WRCP) state. The regenerated module is in the Generated state. In the second step, the substitute module is recognized by its predecessors, as in the above example, sends an Update message to its successor and changes to the WaitConnSucc state. In the last step, the fourth module receives the Update message, acknowledges it by sending an AckUpdate message and changes to the Connected state. When the substitute module receives the AckUpdate message, it changes to the Connected state.
4.4.3. Definition of rules

We give the various rules for the building of the chain, its rebuilding after the failure of particular modules and the elimination of sub-chains.
4.4.3.1. Rules of building

Initialization rule: The original chain creation demand comes from the user, who generates the first module (CM1 module). When this first module is generated (Generated state), it then also generates the second module, and so on.

Connection rule: When a module is generated, it simultaneously receives the identity of the module which generated it (except for the first module). A module that is in state Generated sends its identity to its predecessor (the generating module) through the emission of a New message, and waits for an acknowledgement (WaitConnPred state).

The identity of a communication module includes its localization, the localization of the associated processing module, etc.
Knowledge rule: A module which receives its successor's identity from a newly generated module memorizes it and sends it back to its predecessor. If the receiver is the initial module of the chain, it saves the identity in stable memory and sends an acknowledgement message (AckNew message) to its successor. Moreover, if the receiver is in state WaitConnSucc (it has no successor), then the message sender is considered as its successor and the receiver changes to the Connected state (it generated the sender of the message).

Generation rule: When a module CMi receives an AckNew message from its predecessor (CMi-1), it sends it to its successor (CMi+1). If CMi is the newly generated one (WaitConnPred state), two cases are possible:
. CMi is the Nth module of the chain (i.e. the last one): then it changes to the Connected state and the building is over;
. CMi is not the last of the chain (i ≠ N): two other cases can occur. Either it received the identity of CMi+1 (rebuilding step), and then it can connect itself to it and the rebuilding is over; or CMi did not receive the identity of CMi+1, and then it generates a successor and changes to the WaitConnSucc state (building step).

When a module detects its successor's failure, it only generates another module when it is connected to a predecessor.
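The case analysis of the generation rule can be summarized by a small decision function. This is only a restatement of the rule in Python: the function name, its parameters and the returned strings are hypothetical, and the actual sending of messages is not modelled.

```python
def on_ack_new(level, n, state, successor_known):
    """Decide what CMi does when an AckNew message reaches it (sketch).

    level:           position i of the module in the chain
    n:               target length N of the chain
    state:           current state of the module
    successor_known: True if CMi received the identity of CMi+1
    """
    if state != "WaitConnPred":
        # CMi is not the newly generated module: it only relays the message.
        return "forward AckNew to successor"
    if level == n:
        return "Connected: building is over"
    if successor_known:
        return "connect to CMi+1: rebuilding is over"
    return "generate successor, go to WaitConnSucc"

print(on_ack_new(2, 4, "Connected", True))
print(on_ack_new(4, 4, "WaitConnPred", False))
print(on_ack_new(3, 4, "WaitConnPred", True))
print(on_ack_new(3, 4, "WaitConnPred", False))
```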
4.4.3.2. Rules of maintenance

The rebuilding rules of the chain consist in starting the recovery process as soon as a failure is detected. Two cases must be taken into account: either the chain is already built, or it is in a building or rebuilding step.
Regeneration rule: When a CMi-1 module detects the failure of its successor, the CMi module, it replaces it by generating a CMi' substitute module and waits in the WaitConnSucc state. At the same time, it gives it its own coordinates and those of the CMi+1 to CMN modules, if they exist.

Update rule: A CMi module receiving an Update message from a CMj module always acknowledges it, and considers the CMj module as its new predecessor.

Purge rule: When a module that is disconnected from its predecessor does not receive a reconnection demand (Update message) during a particular delay, it destroys itself. This rule allows the gradual elimination of the modules of a wrongly created sub-chain.
4.5. Verification of the algorithm
4.5.1. Model

The specification of a distributed application always requires the choice of a model. Two possibilities are mainly considered: either the programming language that will be used to build the application, or a formal model such as C.C.S. [5] or Petri nets [2]. Though the first solution is more attractive for reasons of reduced cost, it does not include any proving method, which makes it unsuitable for complex systems that require a validity proof. With the second solution, some validation tools are available which allow the formal proof of the system correctness.

We choose a formal model, more general than the ones mentioned above and highly inspired from [4]: the so-called event model. We consider it worthwhile to choose a general model rather than others which are more specific.
4.5.1.1. Definition of transition systems

The event model describes a system by the totality of its states and the set of events that make it change from one state to another.
Definition 1: A transition system S is a pair (E, R) where:
• E is the set of system states.
• R is the set of system rules, where for each $r \in R$:
  $r.p : E \to \{true, false\}$ is a predicate which defines the guard,
  $r.a : E \to E$ is a function which defines the action.

More precisely, $r.p$ is a predicate on a system state which takes the value true if the action $r.a$ is possible in that state, and the value false if the action is not possible in that state. Thus, R defines the system behaviour and a step of the system is defined by:

$e \xrightarrow{r} e'$ if and only if $r.p(e)$ and $e' = r.a(e)$

Such a model is called an event model, for the event is the occurrence of an action. An event thus makes the system change from one state to another.
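Definition 1 translates almost literally into code. In the following Python sketch a rule is a pair (guard, action); the toy system, whose state is simply the number k of modules already generated, is an assumption made for the illustration and is much simpler than the model of the algorithm.

```python
from collections import namedtuple

# A rule r is a pair (guard r.p, action r.a), as in Definition 1.
Rule = namedtuple("Rule", ["p", "a"])

def successors(e, rules):
    """All states e' such that e -r-> e' for some rule r."""
    return [r.a(e) for r in rules if r.p(e)]

# Toy system: the state is the number k of generated modules, bounded by N.
N = 3
generate = Rule(p=lambda k: k < N, a=lambda k: k + 1)

print(successors(0, [generate]))   # [1]
print(successors(N, [generate]))   # []
```

A state in which no guard holds, such as k = N here, has no successor.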
4.5.1.2. Proving techniques on the model

System states and properties may be expressed as predicates. The proof method based on the capture of system states and properties by predicates, which is adapted to our model, is called the assertions-oriented method.
We are often interested in ensuring that if a system verifies a property at a given moment or in a given state, then this property is retained whatever the system evolution. Predicates which express this kind of property are called invariants.

Given a transition system, we generally study the behaviour of the system when started in certain initial states. It is, of course, unlikely that we are interested in considering all elements of E as possible initial states. When a system starts in $e_0$, the only interesting states are those reached by the system from $e_0$.

Definition 2: Let S = (E, R) be a transition system and $e_0 \in E$. We define the function Acc over E as follows:
(i) $e_0 \in Acc(e_0)$
(ii) if $e \in Acc(e_0)$ and $e \xrightarrow{r} e'$ then $e' \in Acc(e_0)$

Acc(e) is the set of all states reachable from e.

Definition 3: Let S = (E, R) be a transition system and $e_0 \in E$. A predicate P is said to be $e_0$-invariant if for each $e \in Acc(e_0)$, P(e) is true.
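For a finite state space, Definitions 2 and 3 can be checked by exhaustive exploration. The sketch below computes Acc by a breadth-first search and tests a predicate on every reachable state; the representation of rules as (guard, action) pairs and the toy counter system are assumptions made for the example.

```python
from collections import deque

def acc(e0, rules):
    """Acc(e0): every state reachable from e0 (finite state spaces only)."""
    seen, todo = {e0}, deque([e0])
    while todo:
        e = todo.popleft()
        for guard, action in rules:
            if guard(e):
                nxt = action(e)
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return seen

def is_invariant(pred, e0, rules):
    """Definition 3: pred holds in every state of Acc(e0)."""
    return all(pred(e) for e in acc(e0, rules))

N = 3
rules = [(lambda k: k < N, lambda k: k + 1)]
print(sorted(acc(0, rules)))                             # [0, 1, 2, 3]
print(is_invariant(lambda k: 0 <= k <= N, 0, rules))     # True
```

This direct check does not terminate when Acc is infinite, which is the motivation for the induction principle.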
To prove that a predicate P is $e_0$-invariant requires proving that P is true for all states of $Acc(e_0)$. This is tedious for complex systems and impossible when $Acc(e_0)$ is infinite. Following Keller [4], we propose a more restrictive concept, namely induction.

Definition 4: Let S = (E, R) be a transition system and $e_0 \in E$. A predicate P is said to be $e_0$-inductive provided that:
(i) $P(e_0)$ and
(ii) $\forall e, e' \in E$, $P(e)$ AND $e \xrightarrow{r} e'$ implies $P(e')$

The power which lies in the induction principle is that it does not require a complete characterization of reachability in order to demonstrate that a predicate is invariant. Proving that a predicate remains true after every action which makes the system change from one state to another is enough to conclude that this predicate is invariant. Thus, we have just to study the initial state and the rules.
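The difference between an invariant and an inductive predicate can be seen on a very small example. The following sketch checks Definition 4 over an explicit finite set of states; the toy system (a counter increased by two) and the predicates are assumptions chosen for the illustration. From the initial state 0 the reachable states are 0, 2 and 4, so "k ≠ 3" is invariant, yet it is not inductive because the unreachable state 1 satisfies it and leads to 3. The stronger predicate "k is even" is inductive and implies it.

```python
def is_inductive(pred, e0, rules, states):
    """Check Definition 4 over an explicit finite set `states` (E)."""
    if not pred(e0):
        return False
    for e in states:
        if not pred(e):
            continue
        for guard, action in rules:
            if guard(e) and not pred(action(e)):
                return False
    return True

rules = [(lambda k: k < 4, lambda k: k + 2)]
E = range(0, 7)
not_three = lambda k: k != 3
even = lambda k: k % 2 == 0

print(is_inductive(not_three, 0, rules, E))   # False
print(is_inductive(even, 0, rules, E))        # True
```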
It is not a restriction to consider an inductive predicate instead of an invariant predicate, because the first one is obtained by strengthening the second one and therefore includes it.

Proposition 1 [4]: Not every invariant is an inductive invariant, but any invariant has an inductive invariant which implies it.
The discussion above has involved the use of the induction principle to show that certain properties always hold. However, there are other conditions which might be required before a system can be considered correct.

In some cases, it is desirable that a system always terminates for certain initial values. This is the case with a system designed to complete a specific task. On the other hand, many systems are designed not to terminate, or to terminate only in abnormal situations.
Our algorithm matches the first case. The chain building must terminate when N modules are generated and each of them is in its "Connected" state. This constitutes a home state, noted $e_H$, for the system. However, it is not obvious to show that, starting from an initial state, the system will inevitably reach its home state. Hence we use another method for proving termination.
Definition 5: Let S = (E, R) be a transition system and $e_0 \in E$. We say that a function $\eta : E \to \omega$ (any well-ordered set will do in place of $\omega$) is an R'-norm ($R' \subset R$) with minimal state $e_H$ provided that:
(i) $\eta(e)$ is minimal iff $e = e_H$
(ii) $\forall e \in Acc(e_0)$, $\eta(e)$ is not minimal [...] $\forall e \in E$, $\forall r \in R'$, $e \xrightarrow{r} e'$ implies $\eta(e') < \eta(e)$

R' is a well-founded subset of R such that each rule of R' decreases $\eta$. To prove the termination, we associate an R'-norm function $\eta$ with the system such that $\eta$ decreases each time an action is executed by the system and is minimal for the state we wish the system to reach. This means that no matter what state the system goes to, it must always get to its home state.
4.5.2. Modelling

4.5.2.1. Problems related to the model

Time does not exist in the event model. Thus, it is tricky to model events whose occurrences obey temporal constraints. Scheduling may also be tedious to achieve. The absence of time sometimes implies uncontrolled event occurrences which express conditions easily avoidable in the real system. The time abstraction is not an inconvenience, since the model allows the description of a behaviour which includes the real system behaviour.

4.5.2.2. Modelling choice

To model our system we choose an approach based on the characterization of its states. The communication chain is composed of a set of modules and a communication channel. The module state is defined by the values of its variables and the channel state is defined by the set of messages sent but not yet received. The system state is then defined by a combination of the states of all the modules and of the channel. In this way, we can study the system by means of its states and the actions which make it change from one state to another. An instance of an action is called an event and each sequence of events expresses the system behaviour.

We assume that the system is composed of an infinite array (mod[]) of modules. Each module is indexed with its identity, which designates it in a unique manner. We give the structure components of a module which are used to prove the correctness of the algorithm in the next section.
mod[Id]: Array of modules indexed with Id
  Id:    Integer ∪ {-1} is the module identity
  MLev:  [1,...,n] ∪ {-1} expresses the module level in the chain
  Pred:  Id ∪ {-1} is its predecessor's identity
  Succ:  Id ∪ {-1} is its successor's identity
  State: [Non-existent, Generated, WaitConnPred, WaitConnSucc, Connected, WaitReConnPred]

mod[Id] is the module indexed by its identity Id. When a component is not defined, we give it the -1 value. We use a global variable Next to allocate a unique identity for each module.
Initially, all the modules have their State component equal to Non-existent and Next is equal to one (1).
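The module structure and the initial configuration can be written down as follows. This Python sketch is only one possible encoding: the `defaultdict` standing in for the infinite array mod[], the field names and the helper `generate` are assumptions made for the example.

```python
from dataclasses import dataclass
from collections import defaultdict

UNDEFINED = -1   # value given to a component that is not defined

@dataclass
class Module:
    ident: int = UNDEFINED
    mlev: int = UNDEFINED
    pred: int = UNDEFINED
    succ: int = UNDEFINED
    state: str = "Non-existent"

mod = defaultdict(Module)   # stands in for the infinite array mod[]
next_id = 1                 # the global variable Next

def generate(level, pred):
    """Allocate a fresh identity and create the module in Generated state."""
    global next_id
    ident = next_id
    next_id += 1
    mod[ident] = Module(ident, level, pred, UNDEFINED, "Generated")
    return ident

first = generate(level=1, pred=UNDEFINED)
print(mod[first])
print(mod[99].state)        # Non-existent
```

Any identity that has not been allocated yet denotes a module whose State component is Non-existent, as in the initial configuration.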
The channel is modelled by a multi-set "Channel". The different types and contents of the exchanged messages are given below. We give only the messages that are used to prove the correctness of the algorithm in the next section.
New = (type = New, sender, receiver, source-sender, Level)
AckNew = (type = AckNew, sender, receiver, final-receiver)

Two actions are defined on the messages:
• Channel := Channel - m expresses the reception of the message m.
• Channel := Channel + (type, comp1, comp2, ...) expresses the emission of the message (type, comp1, comp2, ...).

Moreover,
• Channel ≥ m expresses that there is at least one message m in the channel.
• Channel = ∅ expresses that the channel is empty.
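The channel operations map directly onto a multiset. The sketch below uses `collections.Counter` from the Python standard library; representing a message as a tuple and the function names are assumptions made for the illustration.

```python
from collections import Counter

channel = Counter()                       # Channel = ∅

def emit(channel, message):               # Channel := Channel + m
    channel[message] += 1

def receive(channel, message):            # Channel := Channel - m
    if channel[message] == 0:
        raise ValueError("message not in channel")
    channel[message] -= 1
    if channel[message] == 0:
        del channel[message]

def contains(channel, message):           # Channel >= m
    return channel[message] > 0

def is_empty(channel):                    # Channel = ∅
    return sum(channel.values()) == 0

# A New message: (type, sender, receiver, source-sender, level)
m = ("New", 4, 3, 4, 4)
emit(channel, m)
emit(channel, m)                          # a multiset may hold duplicates
receive(channel, m)
print(contains(channel, m), is_empty(channel))   # True False
```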
4.5.2.3. Proof

The assertions-oriented method relies on predicates that often express system global variables. In this way, the user has a general view of the system and can then better master its evolution as events occur.
For the sake of simplicity and for lack of space, we give the correctness of the algorithm for the chain building in a reliable environment.
At each time, in the chain building step we have:

I0: ∃ k, 0 ≤ k ≤ N, ∀ i, i ≤ k ⇔ mod[i].state ≠ Non-existent

which expresses that k modules have been generated. N is the length of the chain we want to build. The invariant I, which expresses all the states stemming from the algorithm execution, i.e. the states of the k modules and of the channel, is defined as follows:

I = I0 AND (Ideb OR I1 OR I2 OR I3 OR I4 OR I5)

where

Ideb: k = 0 (the building is not started)

I1:   ∀ i ∈ [1, k-2], mod[i].state = Connected
      mod[k-1].state = WaitConnSucc
      mod[k].state = Generated

I2:   ∀ i ∈ [1, k-2], mod[i].state = Connected
      mod[k-1].state = WaitConnSucc
      mod[k].state = WaitConnPred
      Channel = m, m.type = New

I3:   ∀ i ∈ [1, k-1], mod[i].state = Connected
      mod[k].state = WaitConnPred
      Channel = m, m.type = New

I4:   ∀ i ∈ [1, k-1], mod[i].state = Connected
      mod[k].state = WaitConnPred
      Channel = m, m.type = AckNew

I5:   ∀ i ∈ [1, k], mod[i].state = Connected
      Channel = ∅
      k = N
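To make the invariant concrete, the following Python sketch evaluates I on a snapshot of the system. It is a simplified reading of the predicates: module states are given as a list, the channel is reduced to the list of message types in transit, and, as an assumption of ours, the condition on mod[k-1] is taken to be vacuous when k = 1.

```python
def invariant(states, channel, n):
    """Evaluate I on a snapshot; states[i-1] is mod[i].state.

    channel is the list of message types currently in transit.
    """
    k = sum(s != "Non-existent" for s in states)
    # I0: the generated modules are exactly mod[1..k], with k <= N.
    i0 = k <= n and all(s != "Non-existent" for s in states[:k])
    st = lambda i: states[i - 1]
    conn = lambda upto: all(st(i) == "Connected" for i in range(1, upto + 1))
    # Assumption: for k = 1 the condition on mod[k-1] is vacuous.
    wcs = k >= 1 and (k == 1 or st(k - 1) == "WaitConnSucc")
    ideb = k == 0
    i1 = wcs and conn(k - 2) and st(k) == "Generated"
    i2 = wcs and conn(k - 2) and st(k) == "WaitConnPred" and channel == ["New"]
    i3 = k >= 1 and conn(k - 1) and st(k) == "WaitConnPred" and channel == ["New"]
    i4 = k >= 1 and conn(k - 1) and st(k) == "WaitConnPred" and channel == ["AckNew"]
    i5 = k == n and conn(k) and channel == []
    return i0 and (ideb or i1 or i2 or i3 or i4 or i5)

N = 3
print(invariant(["Non-existent"] * N, [], N))                        # Ideb
print(invariant(["Connected", "WaitConnSucc", "Generated"], [], N))  # I1
print(invariant(["Connected", "Connected", "Connected"], [], N))     # I5
print(invariant(["Connected", "Generated", "Connected"], [], N))     # False
```

Such a check can be run on the states produced by a simulation of the building phase, but it does not replace the inductive proof.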
Informally, the Ideb predicate takes into account the system states for which the chain building has not started. This predicate becomes and remains false as soon as the initialization starts. The last predicate I5 expresses the termination condition of the building. The other predicates take into account the situations where the building is still in progress.

We now show that the previously defined invariant remains true whatever the action that can change the state of the system. For that, we systematically study all the actions which can alter I. We consider each predicate of I one by one and establish that if an action of the system alters the considered predicate, then one of the other predicates of I becomes true.

Let I1 be true. The only possible event is the sending of a New message by the k-th module. Two cases are possible. If the k-th module is the initial one (k = 1), then it generates a successor and the predicate remains true. Otherwise (k ≠ 1), it sends its identity (included in a New message) to its predecessor and changes to the WaitConnPred state. This alters I1 but makes I2 true.

We proceed in the same manner with the other predicates to demonstrate that I is an inductive invariant, since we take into account all the possible rules of the system.
5. Conclusion

We have presented an algorithm providing fault-tolerance for layered distributed systems. From the OSI model, we have considered a communication chain. Our algorithm ensures the building and the preserving of the chain in an unreliable environment. This is achieved by introducing the dynamic regeneration of faulty elements. In contrast to other methods, the dynamic regeneration method tolerates an unlimited number of failures with a smaller overhead. Naturally, this technique is only applicable to software architectures. The correctness of the algorithm is formally proved.
As a prospect, we consider the generalization of the dynamic regeneration to any architecture where each component has one predecessor and one or more successors. This kind of architecture includes rings, trees and so on, which are the most often used.
References

[1] A. Avizienis, The N-version Approach to Fault-Tolerant Software, IEEE Transactions on Software Engineering, vol n°12, December 1985.
[2] G.W. BRAMS (collective name), Réseaux de Petri: Théorie et pratique, Edited by Masson, Vol. 1 and 2, Paris, 1982 and 1983.
[3] J. Henshall, S. Shaw, OSI Explained: End-to-End Computer Standards, Second Edition, Ellis Horwood limited, 1990.
[4] R.M. Keller, Formal Verification of Parallel Programs, Communications of the ACM, July 1976, Vol. 19, No. 7, pp. 371-384.
[5] R. Milner, Communication and Concurrency, Edited by Prentice Hall, 1989.
[6] V.P. Nelson, Fault-Tolerant Computing: Fundamental Concepts, COMPUTER, July 1990.
[7] B. Randell, Design Fault Tolerance, The Evolution of Fault-Tolerant Computing, A. Avizienis, H. Kopetz, J-C. Laprie, Edited by Springer-Verlag, 1987, Vol. 1, pp. 251-270.
[8] S.S.B. Shi, G.G. Belford, Consistent Replicated Transactions, A Highly Reliable Program Execution Environment, Eighth Symposium on Reliable Distributed Systems, Seattle, Washington, October 1989, pp. 30-41.
[9] N.A. Speirs, P.A. Barett, Using Passive Replicates in DELTA-4 to Provide Dependable Distributed Computing, 19-th Fault-Tolerant Computing Systems, 1989.
[10] P. Thambidurai, K.S. Trivedi, Transient Overloads in Fault-Tolerant Real-Time Systems, Real Time Systems Symposium, Santa Monica, California, 1989, pp. 126-133.