Huntsville Simulation Conference 2009
Kylowave Inc. (formerly Castel Research Canada Inc.) and Applied Dynamics International

Advances in Real-Time Hardware-in-the-Loop (HIL) Simulation Technology

Introductions

Castel Research Inc.
• Complex system experts
• Automatic code generation from high-level system models
• Leverage the large base of Matlab/Simulink users to develop applications for massively parallel platforms
• Integrated solutions
• Zero-defect process, hardware design, verification, and real-time simulation
• Complex system applications: Aerospace/Defense, Automotive, and CleanTech power systems

Applied Dynamics Intl.
• Founded 1957
• Real-time simulation expertise
• Tools & total systems
• Supporting programs in: Aerospace, Defense, Automotive, Land Vehicles, Power Generation

Agenda
• Introductions
• Objectives
• General Architectures
• Emerging Alternative Architectures
• Real-time Integration Algorithms
• Multi-rate Simulation
• HIL Interface Characteristics
• Time Synchronization
• Some High-Performance Architectures for HIL
• Summary & Strategy for the Future

Objectives
• Overview of HIL requirements
• Available HIL simulator architectures
• More challenging HIL requirements
• Strategies and architectures to address the requirements

General HIL Architecture & Requirements


Typical HIL Applications
• HIL is an efficient, cost-effective method to develop and test dynamic systems with embedded devices, control systems, and avionics
• Control System Development and Test
– Jet engine HIL simulation for FADEC testing
– Iron Bird (fly-by-wire actuation)
– Automotive
• Integration Labs
– Aircraft
– Ground vehicles
– “Satellite-in-a-Box”

Aircraft Integration Test Facilities
• Distributed architecture
– Modular
– Parallel development
– Accessibility

Typical HIL Applications
• Historically, the class of problems solved using HIL simulation required ~1 ms frame-times
– mechanical systems with controls: missile, aircraft, automotive
– characteristically:
• sets of non-linear, scalar differential equations
• a relatively small number of data samples per frame
• often stiff systems
• Over the previous 5-10 years we’ve worked with applications requiring frame-times in the 100 µs range and down to ≤10 µs

Higher-Speed Real-time Requirements
• Applications requiring higher speed (on the order of 1 µs)
• Power systems research and development
– Hybrid-electric vehicles
– More-electric vehicles (aircraft, ship)
– Smart grid
• Optimize & test control strategies
– Detailed models of energy sources: solid oxide fuel cell and Li-ion battery
– Electrical component simulation
• high-speed DC-DC converters
• high-speed AC-DC rectifiers
• high-speed DC-AC inverters
• high-speed AC-AC matrix converters

Simulation Frame Time Constraints
• The maximum integration step-size is limited for accurate and stable simulation
– simulation models representing systems with fast response
– example: 2nd-order systems at 800 Hz vs 100 Hz
[Figure: responses compared at 100 µs and 10 µs frame cycles]
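As a point of reference (standard stability analysis, not from the slides): for explicit Euler applied to the test equation dx/dt = λ * x, stability requires |1 + λT| ≤ 1, so for a real pole T ≤ 2/|λ|. A pole of magnitude comparable to an 800 Hz mode, |λ| ≈ 2π * 800 ≈ 5000 s⁻¹, caps the step at

T ≤ 2/|λ| ≈ 2/5000 s ≈ 400 µs

and accuracy typically demands steps an order of magnitude smaller, consistent with the 100 µs and 10 µs frame cycles shown above.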


Simulation Frame Time Constraints
• Limited by sampling requirements
– switching used in electric motor control: PWM frequencies are generally in the 10 to 20 kHz range
– DC-to-AC inverters
• the degree of realism of the generated phase signals depends on the ratio of the PWM frequency to the frequency of the phase signals; the higher this ratio, the better
• increasing the PWM frequency allows smaller components, such as inductors and capacitors, to be used, thus saving space
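To make the sampling constraint concrete (illustrative arithmetic, not slide data): a 20 kHz PWM carrier has a 50 µs period, so

50 µs / 10 µs = 5 samples per PWM period, while 50 µs / 1 µs = 50 samples per PWM period

which is why frame-times approaching 1 µs are needed to resolve the switching waveforms faithfully.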


HIL Requirements & Architecture
• HIL real-time requirements
– determinism
– low interrupt response time
– high-resolution clock
– high performance
– low-latency hardware interface
– low-latency process dispatch

System Load
• Application code
• Sampling & updating external hardware
• Communication with other applications
• Communication interfaces (MIL-STD-1553, ARINC, CAN, FireWire, UDP, etc.)
• RTOS overhead
• Data acquisition and storage
• Operator input

RTOS Considerations
• Stripped-down executive
– extremely low overhead
– may require expertise to support lower-priority threads
• POSIX-compliant
– supports operator-friendly utilities
– software maintenance based on a large user community
– overhead is measurable for high-performance applications
• Support and scheduling for multiple threads

Other Considerations
• Cost-effectiveness
• Portability
• Upgradeability
• Scalable processing
• Scalable IO
• Reconfigurability for different projects
• Cabling ease and accessibility
• Environmental conditions
• Service life

Real-time Buses
• Hardware interface boards supported via bus
– VME: 40 MByte/s
– PCI, PCI-X, cPCI, PXI: 133 MByte/s
– PCI-E: point-to-point, 250 MByte/s to 16 GByte/s
• Published bus speeds are “burst” rates
• Realistically they must be tempered with
– interrupt latency
– bus arbitration
– software overhead
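A rough illustration of why burst rates mislead (assumed transfer size and latencies, not slide data): reading a 128-byte block at the 133 MByte/s burst rate takes about 1 µs of bus time, but adding roughly 10 µs of interrupt and setup overhead gives

128 B / (1 µs + 10 µs) ≈ 11.6 MByte/s effective, versus 133 MByte/s burst

so per-transaction overhead, not raw bus speed, dominates the small transfers typical of real-time frames.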


Hardware Interface Considerations
• Although high-speed Data Acquisition (DAQ) boards are available (50 MS/s, 16-bit), they often are not optimal for HIL simulation applications
• Problems for HIL applications:
– latency due to pipelining
– setup to access data is time-consuming
– must read a large number of samples to get the latest sample

PC-Based Simulators
• Cost-effective with relatively high performance
• The large consumer market drives PC technology advancements
• Clock speeds ~3 GHz
• Optimized program execution: branches, etc.
• Multiprocessor platforms with high-speed internal interconnect capability
• Common operating systems:
– QNX Neutrino RTOS
– enhanced real-time Linux kernel
• PC buses: PCI, cPCI, PXI, PCI-E

Distributed Architectures
• Advantages
– Remote
– Efficiency
– Economy
– Specific functionality
– Division into physical subsystems
– Horsepower
• Implementation
– VME bus shared memory
– Reflective memory
– Ethernet UDP and TCP
– IEEE-1394

Bottlenecks & Overhead
• Latency from hardware sample to compute
– bus traffic
– bus speed
– hardware interface latency
– time to set up external hardware access
• Overhead
– RTOS
– synchronization with other processes
– low-priority services

Hardware Interface Strategies
• Direct Memory Access
• Averaging and extrapolation techniques
• Point-to-point buses (PCI-E)
• Drivers optimized for real-time simulation rather than DAQ
• Development of IO boards specific to HIL

Execution Strategies
• Distributed simulation on a multi-core PC
– off-load the RTOS and operator interface to one processor
– other processors are dedicated to application subsystems
– shared clock
– shared memory
– each core has direct access to the PC bus
• Typical partition (see the sketch below): CPU1 runs the QNX RTOS, slower I/O, slower subsystems, and operator support; CPU2-4 run the faster subsystems with IO access, each with its own simulation cycle
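A minimal sketch of this core partitioning on a Linux-style target (the subsystem names and `step_model` body are hypothetical; `os.sched_setaffinity` is Linux-specific, and a QNX system would use its own APIs):

```python
import multiprocessing as mp
import os
import time

def step_model(name: str) -> None:
    # Stand-in for one frame of the named subsystem.
    time.sleep(0.001)

def subsystem(core: int, name: str) -> None:
    """Pin this process to a dedicated core and run its frame loop."""
    os.sched_setaffinity(0, {core})   # 0 = the calling process
    while True:
        step_model(name)

if __name__ == "__main__":
    os.sched_setaffinity(0, {0})      # core 0 keeps OS services / operator UI
    for core, name in [(1, "fast_plant"), (2, "actuators"), (3, "io")]:
        mp.Process(target=subsystem, args=(core, name), daemon=True).start()
    time.sleep(1.0)                   # stand-in for operator-support work
```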

Emerging Alternative Technologies


What you will learn
• Why we need to consider these technologies
• The main competing technologies
• A high-level view of their internal architectures
• Programming models
• A comparison of some key metrics

Motivation
• Problem: the end of uniprocessors
– we have hit the Brick Wall: Power Wall + Memory Wall + ILP Wall
– no more clock frequency increases: the new “Moore’s Law” is 2× “cores” per chip every technology generation
• But applications are becoming bigger and more complex every day
– multi-physics and multi-domain simulation
– all-electrical ships, aircraft, vehicles, etc.
– low-latency, ultra-high-speed HIL real-time simulation, including
• high-speed power converters
• high-speed controllers
• requires timesteps below 1 µs
• Challenge: multicore CPUs alleviate some problems for now, but the main question is how to scale to massively parallel processing

Emerging Architectures
• GPGPU - General-Purpose Graphics Processing Unit
– designed to accelerate graphics applications
– highly parallel processors capable of 100’s of GFlops/s
– mature technology driven by the strong PC gaming market
– three main vendors: Nvidia, AMD/ATI, and Intel
• Cell/BE Processor
– includes one PowerPC and 8 RISC processors
– the IBM Cell processor is used in video games, e.g. the PS3
– developed by SCEI/Sony, Toshiba, and IBM
– next generation: 2 PPCs and 16 processors @ 3.2 GHz
• FPGA - Field-Programmable Gate Arrays
– widely used in high-speed image and signal processing
– two main vendors: Xilinx and Altera

GPGPU
• Differences between CPUs and GPGPUs
• CPUs
– are “smarter” but “narrow” in data-processing elements
– their control units can realize many functions - very versatile
– achieve a moderate level of parallelism through multi-core designs
• GPGPUs - massively parallel
– devote many more transistors to data processing
– are capable of massively parallel processing
– efficiently implement the data-streaming paradigm

GPGPU
• Example: NVIDIA 8800 GTX
[Block diagram: a device of N multiprocessors, each with M processors, shared memory, registers, an instruction unit, constant and texture caches, and device memory]
– N = 16 multiprocessors, M = 8 processors each (128 processors total)
– many processors together, with no cache coherency
– 256 kb of cache used by 128 processors
– 8192 registers per multiprocessor
– constant and texture memories: 8 KB per multiprocessor (each)
– peak processing: ~300 GF/s
– GPGPU peak data transfer
• to internal memory: ~530 Gb/s
• to CPU: ~80 Gb/s (PCIe bus)
– external memory peak transfer rate
• CPU to memory (DDR2-800 - PC-6400): ~60 Gb/s

GPGPU
• Main solutions
– NVIDIA: GeForce (gaming/movie playback), Quadro (professional graphics), and Tesla (HPC)
– AMD/ATI: Radeon (gaming/movie playback) and FireStream (HPC)
– Intel: Larrabee (HPC/gaming); Polaris research project - above 1 TFlop/s (next generation)
• Examples of programming languages
– platform-specific: NVIDIA’s CUDA and CTM from AMD/ATI
– commercial: RapidMind, PeakStream
– standards: OpenCL and OpenGL

Cell/BE
• High-level internal architecture
• 1 Power Processing Element (PPE)
– power processing unit (PXU) - a general-purpose 64-bit RISC processor (PowerPC)
– two-way hardware multithreading
– on-chip L1/L2 cache: 512 KB L2, 32 KB L1 instruction, 32 KB L1 data
• 8 Synergistic Processor Elements (SPEs)
– synergistic processor unit (SXU) - a special-purpose 128-bit SIMD processor
– Local Store (LS) - 256 KB local memory
– Direct Memory Access engine (SMF) (250 Gb/s)
• Interconnected by the Element Interconnect Bus (EIB)

Cell/BE
• 2nd generation: IBM PowerXCell 8i (2008)
– 9 cores, 10 threads
– ~230 GFlops peak (SP) @ 3.2 GHz
– ~110 GFlops peak (DP) @ 3.2 GHz
– up to 250 Gb/s memory bandwidth
– up to 750 Gb/s I/O bandwidth
– top frequency > 4 GHz (observed in the lab)

FPGA
• Basic FPGA architecture
• FPGAs
– are semiconductor devices consisting of programmable logic components and programmable interconnects
– allow programmers to hardcode a specific algorithm without incurring the overheads of instruction-style programming
[Block diagram: I/O Blocks (IOBs), Configurable Logic Blocks (CLBs), programmable interconnect, Block SelectRAM resources, dedicated multipliers, and clock management (DCMs, BUFGMUXes); each CLB includes 4 slices built from LUTs, carry logic, and flip-flops]

FPGA
• General internal architecture
• Modern FPGAs can run at 600 MHz and can include:
– RAM and ROM memories
– multipliers and accumulators
– high-speed serial transceivers
– lots of glue logic and lookup tables
– lots of registers
– better yet: they are reprogrammable
• Virtex-6 LXT FPGAs include:
– a 600 MHz switch matrix
– up to 720 36-Kbit Block RAMs @ 600 MHz
– up to 864 25×18 multipliers, each with a 48-bit adder and 48-bit accumulator
– up to 36 low-power GTX 6.5 Gb/s serial transceivers
– up to 2 interface blocks for PCIe

Device     | Fixed-point Oper/s (1) | GF/s (SFP) (2)/(3) | GF/s (DFP) (2)/(3) | Max. IO transfer (Gb/s) (4) | Number of FFPs (millions) | Maximum BlockRAM (kb)
XC6VLX75T  | 345                    | 172/56             | 20/13              | 78                          | 11.6                      | 5,616
XC6VLX240T | 922                    | 462/150            | 54/35              | 156                         | 37.7                      | 14,976
XC6VSX475T | 2420                   | 924/300            | 108/78             | 234                         | 74.4                      | 38,304

(1) one multiply and one accumulate of 25×18 bits @ 600 MHz
(2)/(3) peak / sustainable, based on Floating-Point Operator v5.0
(4) estimate based on using only the GTX 6.5 Gb/s transceivers

FPGA
• Specifics
– FPGAs have a long history in embedded processing and specialized computing
– they are generally slower than custom ASICs (“Application-Specific Integrated Circuits”) and can only implement designs of moderate complexity
– on the other hand, custom ASICs are not programmable, and their development cycle is much longer and more expensive
– FPGAs are very good at implementing
• small integer and floating-point calculations with a small number of bits
• control logic
• Programming languages
– VHDL and Verilog
• allow very efficient implementations to be generated
• but are not what the HPC community is used to using
– C-like languages
• conventional wisdom: the closer to standard C the solution is, the worse the resulting application performs
• e.g., Celoxica’s Handel-C, Mitrionics’ Mitrion-C, and Impulse C from Impulse Technologies
– Higher-level-of-abstraction tools
• signal processing: Xilinx’s System Generator and Altera’s DSP Builder
• controllers: MathWorks HDL Coder
• LiquidWave from Castel Research

Comparison metrics
• PROGRAMMING - there is a clear tradeoff between performance and programming effort [2]
• Example: time-domain FIR filter

Metric                         | C      | SIMD C | Hand Coding | Parallel C (8 SPEs)
Lines of code                  | 33     | 110    | 371         | 546
Design time                    | Minute | Hour   | Hour        | -
Coding time                    | Minute | Hour   | Day         | -
Debug time                     | Minute | Minute | Day         | -
Performance efficiency (1 SPE) | 0.014  | 0.27   | 0.88        | 0.82
GFLOPS @ 2.4 GHz               | 0.27   | 5.2    | 17          | 126

Data for a Mercury Cell Processor System (@ 2.4 GHz). 1 Cell processor: ~150 GF/s max (single precision), ~15 GF/s max (double precision)

• This example runs on an IBM Cell/BE processor
• But similar numbers apply to the other two technologies as well
• CONCLUSIONS:
1) Brick walls - no more faster processors for larger and more complex applications
2) Challenge - how to program massively parallel hardware without sacrificing too much performance
3) Highly optimized libraries are helpful, but what if the computation you want is not in the library?

Summary
• Each type is good for specific kinds of applications
– CPUs (single- and multi-core)
• are very versatile and “smart”, but “narrow” in processing elements
• it is not clear how to scale them to massively parallel processing
– GPGPUs and Cell/BEs
• scale well for massively parallel processing
• SIMD parallel data model, very rich in processing elements
• performance has scaled faster than multicore CPUs
• $/W and GF/W are much higher than CPUs
• very good at: graphics, 32-bit floating-point
– FPGAs
• scale well for massively parallel processing
• very good at: embedded applications, controllers, signal pre-processing, digital filters, etc.
• are used in HIL for sampling and pre-processing of fast signals
• very low W/GF but very high $/GF
• very good at integer operations and applications that require a small number of bits

Integration Algorithms


Discretization Methods
• Numerical integration (see the sketch below)
– forward rectangular (Euler)
– backward rectangular (implicit Euler)
– trapezoidal (Tustin or bilinear), with prewarping
• Zero-pole mapping
• Hold equivalence
– ZOH
– FOH
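A small sketch of how these discretizations differ in practice (the first-order lag and step size are assumed for illustration, not slide content): each method maps the continuous pole s = -a to a different discrete pole z.

```python
import math

# Discretize the first-order lag dx/dt = -a*x + a*u (continuous pole s = -a)
# with step T, and compare where each method places the discrete pole z.
a = 2 * math.pi * 100.0   # assumed 100 Hz lag
T = 1e-4                  # assumed 100 us step

z_forward  = 1 - a * T                           # forward Euler:  z = 1 + sT
z_backward = 1 / (1 + a * T)                     # backward Euler: z = 1/(1 - sT)
z_tustin   = (1 - a * T / 2) / (1 + a * T / 2)   # Tustin:         z = (1 + sT/2)/(1 - sT/2)
z_zoh      = math.exp(-a * T)                    # hold-equivalent (exact) pole: z = e^{sT}

print(f"forward={z_forward:.4f}  backward={z_backward:.4f}  "
      f"tustin={z_tustin:.4f}  zoh={z_zoh:.4f}")
```

Note that forward Euler goes unstable once a*T > 2 (z leaves the unit circle), while backward Euler and Tustin remain stable for any step; this is the stability/accuracy tradeoff the deck returns to below.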


In General Use
• Typically, simulation packages are used to model the application (Simulink, ACSL, SystemBuild, Dymola, etc.)
• The application can be implemented in the continuous domain, and one of a number of integration algorithms can be selected for its solution
• Methods such as:
– Euler (1st order)
– Heun or Runge-Kutta (2nd order)
– predictor algorithms: Adams-Bashforth (2nd, 3rd, and 4th order)
– predictor-corrector algorithms: Adams-Moulton
– Runge-Kutta (4th order)

Solver Considerations
• Explicit vs implicit
– iterative methods for solving implicit algorithms are not typically used for fixed-step real-time integration
– if derivatives depend on hardware inputs and require an implicit solution, the hardware input must be approximated for the next time-step

Solver Considerations
• Single-step vs multi-pass
• Algorithm order

Definitions: state Y, input U, integration step-size T, nonlinear differential equation F(Y, U)

Euler:  Yk+1 = Yk + T * F(Yk, Uk)
AB2:    Yk+1 = Yk + 0.5 * T * [3 * F(Yk, Uk) - F(Yk-1, Uk-1)]
RK2:    Ŷk+1 = Yk + T * F(Yk, Uk)
        Yk+1 = Yk + 0.5 * T * [F(Yk, Uk) + F(Ŷk+1, Uk+1)]
RKRT2:  Ŷk+1/2 = Yk + 0.5 * T * F(Yk, Uk)
        Yk+1 = Yk + T * F(Ŷk+1/2, Uk+1/2)

Howe, R.M., “Dynamics of Real-time Digital Simulation,” 1996, Applied Dynamics International
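The four updates above translate directly into code. A minimal sketch (the model f and its first-order-lag form are assumed for illustration):

```python
def f(y, u):
    """Assumed example model: first-order lag dy/dt = -y + u."""
    return -y + u

def euler(y, u_k, T):
    return y + T * f(y, u_k)

def ab2(y, u_k, dydt_prev, T):
    """Adams-Bashforth 2 needs the previous derivative, so the first
    frame of a run (or after a discontinuity) is taken with Euler."""
    return y + 0.5 * T * (3.0 * f(y, u_k) - dydt_prev)

def rk2(y, u_k, u_k1, T):
    """Heun/RK2: the corrector needs U_{k+1}, which a real-time frame
    has not sampled yet (see the compatibility slide below)."""
    y_hat = y + T * f(y, u_k)
    return y + 0.5 * T * (f(y, u_k) + f(y_hat, u_k1))

def rkrt2(y, u_k, u_half, T):
    """Real-time RK2: needs only the mid-frame sample U_{k+1/2}."""
    y_half = y + 0.5 * T * f(y, u_k)
    return y + T * f(y_half, u_half)
```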

Solver Considerations
• A higher-order algorithm may allow a larger integration step-size, given its increased accuracy
[Figure: current waveform comparing Adams-Moulton 2nd order to Euler]

Compatibility with Real-time Samples
[Timing diagrams: for each method, the inputs expected during the frame from kh to (k+1)h]

• Euler method: Yk+1 = Yk + T * F(Yk, Uk)
– uses only Uk, available at the start of the frame
• AB2 method: Yk+1 = Yk + 0.5 * T * [3 * F(Yk, Uk) - F(Yk-1, Uk-1)]
– uses only current and past inputs
• RK2 method: Ŷk+1 = Yk + T * F(Yk, Uk); Yk+1 = Yk + 0.5 * T * [F(Yk, Uk) + F(Ŷk+1, Uk+1)]
– Uk+1 is not available yet!
• RKRT2 method: Ŷk+1/2 = Yk + 0.5 * T * F(Yk, Uk); Yk+1 = Yk + T * F(Ŷk+1/2, Uk+1/2)
– uses the mid-frame sample Uk+1/2, which can be acquired in real time

Solver Considerations
• Stability regions depend on the eigenvalues and step-size
• Predictor algorithms depend on previous derivatives, so use Euler for the startup step and after discontinuities
• Multi-rate simulations

Error Estimates
Error measures in numerical integration are difficult to develop, partly due to the propagation of errors from step to step; however, error estimates have been derived for comparing the performance of integration methods. These error estimates include:
• estimates of local and global errors (direct quantifications of round-off and truncation errors made over the integration interval)
• estimates that incorporate the dynamics of the linearized system model (characteristic root errors, transfer function errors, phase shift errors, etc.)

Quantifying Dynamic Errors
• Error in simulations due to
– finite integration step
– sample & update rates
– latencies
• Common methods
– compare real-time simulation outputs with accurate off-line results
– halve the integration step-size and compare (sketched below)
• On-line calculation of dynamic error
– R.M. Howe, “A New Method for On-line Measurement of Dynamic Errors in Real-time Simulation,” ADIUS 1997
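The step-halving check can be expressed in a few lines. A sketch (the Euler step and model below are assumed placeholders, not slide content):

```python
def step_halving_error(step_fn, y0, T, n_frames):
    """Re-run the same simulation at half the step size and compare the
    endpoints; the difference estimates the dynamic error at step T."""
    y_full, y_half = y0, y0
    for _ in range(n_frames):
        y_full = step_fn(y_full, T)
    for _ in range(2 * n_frames):
        y_half = step_fn(y_half, T / 2)
    return abs(y_full - y_half)

# Example with an assumed Euler step for dy/dt = -y:
err = step_halving_error(lambda y, h: y + h * (-y), y0=1.0, T=1e-3, n_frames=1000)
print(err)
```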


Multi-rate Simulation

What you will learn
• Why we need to consider it
• The main groups of methods
• The multi-rate filtering/extrapolation methods
• Some other multi-rate methods

Motivation
• Problem: real-time simulation of complex multi-physics systems is a large problem
• Three main reasons why the direct approach can become inefficient
– sparse matrix solution time grows super-linearly with the size of the problem
– for large differential equation systems, the state variables normally change at very different rates
– systems may include stiff and non-stiff models
• More efficient strategy
– partition the system under test
– use a different multi-rate method in each part

Current state
• During the last decades, many partitioning and multi-rate methods have been proposed and widely used for offline simulation
• Real-time requirements add more complexity
– time must advance in fixed steps (no variable time steps)
– all calculations in any time step must be completed before the start of the next time step (the simulation can’t go back in time)
• In the last five years, partitioning and multi-rate methods have attracted new attention from academia and industry

Partitioning strategies
• Sub-systems can be partitioned at:
– the physical level
– the equation level
• Most commercial real-time simulators partition at the system level
– manual partitioning done by the user
– assisted or not by the simulator
• Automatic partitioning is still not widely used
– stiff versus non-stiff equations
– slower versus faster dynamics
– linear versus non-linear sub-systems

Research directions in the last decades
• Multi-method schemes
– used for systems containing both non-stiff and stiff parts
– the partitioning is done at the level of the discretization scheme
– an explicit scheme is used for the non-stiff parts, and an implicit method for the stiff ones
• Multi-order schemes
– used for non-stiff problems only
– the same explicit method, and the same step size, is used for all parts
– but the order of the method is chosen according to the activity level of the subsystem
• Multi-rate schemes
– used for both stiff and non-stiff problems
– the same (implicit or explicit) method, with the same order, is applied to all subsystems
– but the step size is chosen according to the activity level
• Presently, the majority of research and development is done on multi-rate methods

Multi-rate scheme
• Basic idea of multi-rate methods
– most systems are formed by sub-systems with very different dynamics
– partition the system according to those dynamics
– simulate each sub-system using the most appropriate time step
[Figures: typical dynamics of some sub-systems; power converter partition example]

Filtering/extrapolation methods
• Consider two sub-systems running at a fast (h) and a slow (H) frame rate, such that H = N * h
– they cannot immediately exchange information
• We must decrease (“downsample”) the sample rate from the fast sub-system to the slow one
– otherwise, the sampled values of rapidly changing signals from the fast sub-system can become very unpredictable
• Increase (“upsample”) the data sample rate from the slow sub-system to the fast one
• Decide which should be done first: down- or up-sampling
• And ensure the two parallel simulations are kept synchronized

Filtering/extrapolation methods
• Downsampling
– decimation - used in multi-rate digital filter design
• for every N samples, keep one and discard the remaining (N-1)
– averaging over the larger time step:

iD,avg = [1 / (Tb - Ta)] * Σk=1..N { [iD(Ta + (k-1)·Δt) + iD(Ta + k·Δt)] / 2 } * Δt

– low-pass filtering
• use linear-phase filters to minimize error
• Upsampling
– zero-order hold: y(t) = y(kT) for kT ≤ t < (k+1)T
• GOL > 1 → instability
• Note the effect of the delay


Bergeron Travelling Wave Models
• Also called the Transmission Line Model (TLM) [4]
– Basic idea: realize the partitioning over existing transmission lines
– Three different implementations
• Frequency-Dependent Line Model, Bergeron Line Model, and Lossless Bergeron Line Model
• Requires the transmission line length to correspond to at least one time delay to improve accuracy (some implementations require at least 2 time-step delays)
• Exchange voltages/currents at the end of every transmission-line-related delay
• Issue: the modeling resistance Rlk introduces losses, which may be unacceptable

GOL = (ZS / Zlk) * (1 - α·e^(-2sτ)) / (1 + α·e^(-2sτ)),  α = (ZL - Rlk) / (ZL + Rlk)
Rlk(L) = L/τ  or  Rlk(C) = τ/C
Va(k) = V2(k-1) + Rlk * I2(k-1)
Vb(k) = V1(k-1) + Rlk * I1(k-1)
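The interface exchange above is one line of computation per side per step. A sketch (variable names follow the equations; the solvers on each side of the link are assumed):

```python
def tlm_interface(prev, R_lk):
    """Build each subsystem's boundary source at step k from the other
    side's voltage and current at step k-1 (the line's one-step transport
    delay).  `prev` = (V1, I1, V2, I2) measured at step k-1."""
    V1, I1, V2, I2 = prev
    Va = V2 + R_lk * I2   # boundary source seen by subsystem 1
    Vb = V1 + R_lk * I1   # boundary source seen by subsystem 2
    return Va, Vb
```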


Voltage-Voltage Model
• Also called the Partial Circuit Duplication Method [4]
– Basic idea: duplicate the link component between both circuits on each partitioned subsystem and use a relaxation technique to calculate the interface variables
• The method is easy to implement
• It can achieve higher stability than the ideal transformer method for resistive systems (because it is easier to ensure GOL < 1)
• Requires many iterations to improve accuracy → not good for real-time

GOL = (Za * Zb * e^(-sτ)) / [(Za + Zab) * (Zb + Zab)]

First-Order Compensation Model
• Also known as the Time-variant First-order Approximation
– proposed by Wu et al. [5]
– the basic idea is to approximate the HUT (hardware under test) by a first-order linear system (RL or RC topologies) and use it as a predictor
– for example, assume an RL implementation (the RC case is similar)
• First-order HUT: di2/dt = a * i2 + b * v2
• Apply trapezoidal discretization:
i2(k) ≈ α * v1(k-1) + β * i2(k-1),  i.e.  i2(k) ≈ Geq * v1(k-1) + Ieq
• Solve for the coefficients:
[α; β] = [v1(k-2), i2(k-2); v1(k-3), i2(k-3)]^(-1) * [i2(k-1); i2(k-2)]
• Limitations of the method
– the matrix formulation above can easily become ill-conditioned
– it is also extremely sensitive to sensor noise
– predictor-based methods can lead to instabilities
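The coefficient identification is a 2×2 linear solve per step. A sketch (the sample bookkeeping is assumed; note that the solve itself is exactly where the ill-conditioning warned about above appears):

```python
import numpy as np

def first_order_coeffs(v1, i2):
    """Identify alpha, beta from the three most recent samples, where
    v1[-1] = v1(k-1), v1[-2] = v1(k-2), v1[-3] = v1(k-3), etc."""
    A = np.array([[v1[-2], i2[-2]],
                  [v1[-3], i2[-3]]])
    b = np.array([i2[-1], i2[-2]])
    alpha, beta = np.linalg.solve(A, b)   # ill-conditioned when rows align
    return alpha, beta

def predict_i2(alpha, beta, v1_km1, i2_km1):
    """Predictor step: i2(k) ~ alpha*v1(k-1) + beta*i2(k-1)."""
    return alpha * v1_km1 + beta * i2_km1
```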


Delay-Related Stability Analysis
• As we have seen, delays over the simulator-hardware interfaces are a major source of errors
• The previous methods can reduce the instability, but cannot completely eliminate it
• Similarly to the HIL and PHIL interfaces, the partitioning required by some multi-rate simulation methods can also cause instability
• CONCLUSION: we need methods to predict the limit of stability related to the delay on the interfaces

Delay-Related Stability Analysis
• Traditional methods
– Bode plot
– Padé approximation of the exponential exp(-sT) term
– Z-transform methods
• All these methods are based on the ODE mathematical framework
– exp(-sT) is highly non-linear and causes the phase to decrease as the frequency increases
• We would like to be able to answer the following question: what is the maximum delay which guarantees the stability of the partitioned system under these scenarios:
– i) commensurate delays (the ratio between any two delays is rational)?
– ii) time-varying delays?
– iii) multiple delays?
– iv) delay-dependent and delay-independent stability?
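For reference, the first-order Padé approximation mentioned above (a standard identity, not taken from the slides) replaces the transcendental delay factor with a rational one so that ODE-based tools can be applied:

e^(-sT) ≈ (1 - sT/2) / (1 + sT/2)

Higher-order Padé forms extend the frequency range over which the phase of the true delay is matched.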

Delay-Related Stability Analysis
• We will discuss these questions tomorrow
– “Study of the Stability of Transient Simulation of Power Systems: A New Approach Based on Delay Differential Equations Theory” [6]
• The proposed method allows us to determine:
– the delay-dependent and delay-independent stability conditions
– the maximum delay bound that guarantees stability
– for partitioned systems with any number of commensurate delays

Summary
• The delay over the interfaces (partitioning or the simulator-hardware interface) can cause the simulation to become unstable
• Many interface models have been proposed to increase stability
• But, at this time, no method can completely eliminate the instability for every interface condition
• We have proposed a new method, based on the DDE framework, to predict the maximum delay beyond which the simulation becomes unstable

Time Synchronization


Common Practice: Fixed Integration Time Step
• Typically, real-time simulation is executed with a fixed step
• straightforward, as the update rate is equal to the execution rate
• the selected integration step-size must be at least as large as the longest execution time
• Other approaches: variable step based on execution time; time-tagged data exchanged between asynchronous processes
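A sketch of the fixed-step frame loop (illustrative only: a real HIL executive uses RTOS timers and interrupts rather than `time.sleep`, and `step_fn` stands in for the model update plus IO):

```python
import time

def realtime_loop(step_fn, T, n_frames):
    """Run one model step per frame, then wait for the next frame boundary.
    The step must finish within T; an overrun here is treated as a fault."""
    next_deadline = time.monotonic()
    for _ in range(n_frames):
        step_fn()
        next_deadline += T
        remaining = next_deadline - time.monotonic()
        if remaining < 0:
            raise RuntimeError("frame overrun: step took longer than T")
        time.sleep(remaining)
```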


Synchronization Strategies
• Synchronization between processes or distributed compute nodes
– to ensure coherent data
– to remain time-synchronous
• Synching methods to coordinate processes
– polling
– interrupts
– standardized modulated “time code” signals (IRIG-B) - 0.1 µs resolution
– master/slave architectures so that there is a common clock

High Performance Architectures for HIL


High-Performance Simulation
• “Hyperfast” simulation
– use multi-core technology
– frame-times ~10 µs
– offload the operator interface and data-logging
– remove RTOS overhead

Adding FPGAs to the Simulator
• FPGAs are finding widespread use
– becoming the “new” DSP
• High-performance solutions are possible

Ultra-High-Speed Simulation & Control
• Frame-times at or below the 10 µs threshold require another type of technology beyond today’s CPU-based simulators
• FPGAs are one answer

FPGA Board for Hardware Interface
• Boost converter application example (R.M. Howe, HSC 2007)
• 10 kHz PWM control
• 1 MHz input sampling & switch-time averaging using the FPGA
• 0.1 MHz frame rate (10 µs frame-time)

Distributing the Application
• Higher-speed subsystems executed on the FPGA
[Diagram: runtime utilities and slower subsystems on the host; the FPGA runs the fast subsystems]

FPGA Board Considerations
• Gate count, logic cell count
• Internal FPGA architecture: Xilinx SX vs LX
– Xilinx LX is optimized for logic/gate count (economical)
– Xilinx SX is optimized for multiply/accumulate (better for state-space models)
• Onboard standard bus supporting digital/analog acquisition daughter cards
• Onboard high-speed communication interfaces available
– latency
– bandwidth
– PCI-E for communication with the simulator platform
– a standard for FPGA-to-FPGA links (very time-sensitive): Aurora
• Cost-effectiveness

Challenges
• Programming FPGAs efficiently per application
• Rapid prototyping requires quick turnaround on an application
• Programming tools?
– efficiency
– operator knowledge-level required
• Available FPGA board options
– difficult to find a board with efficient IO and open driver access

Summary & Strategy for the Future


Future Architecture
• Combine multi-core PC technology with FPGA technology
• Explore FPGA board availability & development with onboard IO
• High-speed bus interface

LiquidWave Flow
• From Simulink to FPGA in real time
[Simulink schematic: full-bridge rectifier (“Rectificateur en pont complet”), discrete solver Ts = 1 µs; Vsource = 100 Vpp, Rs = 50 Ω, Cs = 4 µF, Rd = 0.01 Ω, Ld = 15 µH, L = 50 mH, diodes D1-D4, Rload = 10 Ω, Cload = 20 µF; scoped signals include Vsource, Il, Vd1, Vd3, Id1, Id3, Vs, Vout]
• Flow steps:
– Partitioning
– Interface Modeling
– Multi-rate Discretization
– Architecture
– Time Synchronization
– Design Optimizations
– Script Generation
– VHDL Generation
– Commercial FPGA CAE
– FPGA Board

High-Level Modeling and Design Flow
• LiquidWave automates the flow:
– Partitioning
– Interface Modeling
– Multi-rate Discretization
– Architecture
– Time Synchronization
– Design Optimizations
– Script Generation
– VHDL Code Generation
• Related tools: Matlab HDL Coder, Xilinx System Generator, Altera DSP Builder, LabView, C-like tools

LiquidWave Simulation Result #1
• Three-phase DC-AC converter with internal PWM
[Simulink schematic: DC-AC PWM inverter, discrete solver Ts = 0.5 µs; VDC = 100 V; PWM: F0 = 10 kHz, Fc = 60 Hz, m = 0.8; switches: Rs = 200 Ω, Lon = 60 µH, Ron = 0.01 Ω; load: Rload = 10 Ω, Lload = 50 mH; scoped signals: Vsw1A, Vsw2A, Isw1A, Isw2A, Ia, Ib, Ic]

LiquidWave Simulation Result #1
• Three-phase DC-AC converter with internal PWM
[Plots (model ThreePhInv0u5): Matlab/SimPowerSystems simulation vs FPGA simulation @ Ts = 0.5 µs; traces Vsw1A, Vsw2A, Isw1A, Isw2A, Ia, Ib, Ic versus time (seconds)]

LiquidWave Simulation Result #2
• 50 Hz to 60 Hz cycloconverter
[Simulink schematic: discrete solver Ts = 1 µs; source Vs = 150 V, Fs = 50 Hz; filter Lf = 100 mH, Cf = 100 µF; switch bridge: Rs = 100 Ω, Ron = 0.1 Ω, Lon = 100 µH; diode bridge: Rs = 100 Ω, Ron = 0.1 Ω, Lon = 50 µH; PWM: Fc = 2000 Hz, Fs = 60 Hz, m = 0.8; scoped signals: Vcap, Vab, Iind, Ia]

LiquidWave Simulation Result #2
• 50 Hz to 60 Hz cycloconverter
[Plots (model cicloconv1): Matlab/SimPowerSystems simulation vs FPGA simulation @ Ts = 1 µs; traces Vcap, Vab, Iind, Ia versus time (seconds)]

Thank you! Questions?


References


[1] Lupton, G. & Thulin, D., “Accelerating HPC Using GPUs,” Georgia Tech Cell BE Workshop, June 13, 2008
[2] Sacco, S., et al., “A High-Level Signal Processing Library for Multicore Processors,” GT Cell Workshop, SMHS, June 19, 2007
[3] Bovay, J., Henderson, B., Lin, H.Y., & Wadleigh, K., “Accelerators for High Performance Computing Investigation,” High Performance Computing Division, Hewlett-Packard Company, January 24, 2007
[4] Kuffel, R., Wierckx, R.P., Forsyth, P., Duchén, H., Lagerkvist, M., Holmberg, P., & Wang, X., “Expanding an Analogue HVDC Simulator’s Modeling Capability Using a Real-Time Digital Simulator (RTDS),” First International Conference on Digital Power System Simulators (ICDS ’95), College Station, Texas, USA, April 5-7, 1995
[5] Wu, X., Lentijo, S., & Monti, A., “A Novel Interface for Power-Hardware-in-the-Loop Simulation,” IEEE Workshop on Computers in Power Electronics, Urbana, IL, USA, August 15-18, 2004, pp. 178-182
[6] Pimentel, J.C.G., “Study of the Stability of Transient Simulation of Power Systems: A New Approach Based on Delay Differential Equations Theory,” Huntsville Simulation Conference 2009, Huntsville, AL, USA, October 27-29, 2009
[7] Lelarasmee, E., Ruehli, A.E., & Sangiovanni-Vincentelli, A.L., “The Waveform Relaxation Method for Time Domain Analysis of Large Scale Integrated Circuits,” IEEE Trans. on CAD of ICs and Systems, Vol. 1, No. 3, July 1982, pp. 131-145
[8] Szczupak, J., Facerolli, S.T., & Guedes, K.B., “Electrical Network Simulation by Multirate Parallel Digital Filter Structures,” 2003 IEEE Bologna PowerTech Conference, Bologna, Italy, June 23-26, 2003