Huntsville Simulation Conference 2009
Kylowave Inc. (formerly Castel Research Canada Inc.)
Applied Dynamics International
Advances in Real-Time Hardware-in-the-Loop (HIL) Simulation Technology
1
Introductions

Castel Research Inc.
• Complex System Experts
• Automatic code generation from high-level system models
• Leverage large base of Matlab/Simulink users to develop applications for massively parallel platforms
• Integrated solutions
• Complex system applications: Aerospace/Defense, Automotive, and CleanTech Power Systems

Applied Dynamics Intl.
• Founded 1957
• Real-time simulation expertise
• Tools & total systems
• Supporting programs in: Aerospace, Defense, Automotive, Land Vehicles, Power Generation
• Zero-defect process, hardware design, verification, and real-time simulation
2
Agenda
• Introductions
• Objectives
• General Architectures
• Emerging Alternative Architectures
• Real-time Integration Algorithms
• Multi-rate Simulation
• HIL Interface Characteristics
• Time Synchronization
• Some High-Performance Architectures for HIL
• Summary & Strategy for the Future
3
Objectives
• Overview of HIL requirements
• Available HIL simulator architectures
• More challenging HIL requirements
• Strategies and architectures to address requirements
4
General HIL Architecture & Requirements
5
Typical HIL Applications
• HIL is an efficient, cost-effective method to develop and test dynamic systems with embedded devices, control systems, and avionics
• Control System Development and Test
– Jet Engine HIL Simulation for FADEC Testing
– Iron Bird (fly-by-wire actuation)
– Automotive
• Integration Labs
– Aircraft
– Ground Vehicles
– "Satellite-in-a-Box"
6
Aircraft Integration Test Facilities
• Distributed Architecture
– Modular
– Parallel development
– Accessibility
7
Typical HIL Applications
• Historically, the class of problems solved using HIL simulation required ~1 ms frame-time
– mechanical systems with controls: missile, aircraft, automotive
– characteristically
• sets of non-linear, scalar differential equations
• relatively small number of data samples per frame
• often stiff systems
• In the previous 5-10 years we've worked with applications requiring the 100 µs range and ≤10 µs
8
Higher-Speed Real-time Requirements
• Applications requiring higher speed (on the order of 1 µs)
• Power systems research and development
– Hybrid-Electric Vehicles
– More-Electric Vehicles (Aircraft, Ship)
– Smart Grid
• Optimize & test control strategies
– detailed models of energy sources: Solid Oxide Fuel Cell and Li-Ion Battery
– electrical component simulation
• high-speed DC-DC converters
• high-speed AC-DC rectifiers
• high-speed DC-AC inverters
• high-speed AC-AC matrix converters
9
Simulation Frame Time Constraints
• Maximum integration step-size limited for accurate and stable simulation
– simulation model representing a system with fast response
[Plots: response with a 100 microsecond frame cycle vs a 10 microsecond frame cycle]
– Example 2nd-order systems: 800 Hz vs 100 Hz
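The 800 Hz vs 100 Hz comparison can be made concrete with a small sketch (not from the slides) that estimates the largest forward-Euler step keeping a 2nd-order mode stable; the damping ratio of 0.1 is an assumed illustrative value.

```python
import numpy as np

def max_euler_step(f_n, zeta):
    """Largest forward-Euler step keeping a 2nd-order system with
    natural frequency f_n [Hz] and damping ratio zeta stable.
    Stability requires |1 + h*lam| < 1 for each eigenvalue lam."""
    wn = 2 * np.pi * f_n
    lam = -zeta * wn + 1j * wn * np.sqrt(1 - zeta**2)
    # |1 + h*lam| = 1  =>  h = -2*Re(lam) / |lam|^2
    return -2 * lam.real / abs(lam)**2

# An 800 Hz mode forces a much smaller step than a 100 Hz one
h_800 = max_euler_step(800.0, 0.1)   # roughly 40 microseconds
h_100 = max_euler_step(100.0, 0.1)   # roughly 320 microseconds
print(h_800, h_100)
```

Since |lam| equals the natural frequency here, the bound reduces to h = 2*zeta/wn, which is why a faster mode directly shrinks the allowable frame time.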
10
Simulation Frame Time Constraints
• Limited by sampling requirement
– switching used in electric motor control: PWM frequencies are generally in the 10 to 20 kHz range
– DC-to-AC inverter
• the degree of realism of the generated phase signals depends on the ratio of the PWM frequency to the frequency of the phase signals; the higher this ratio, the better
• increasing the PWM frequency allows smaller components such as inductors and capacitors to be used, thus saving space
11
HIL Requirements & Architecture
• HIL real-time requirements
– determinism
– low interrupt response time
– high-resolution clock
– high performance
– low-latency hardware interface
– low-latency process dispatch
12
System Load
• Application code
• Sampling & updating external hardware
• Communication with other applications
• Communication interfaces (MIL-STD-1553, ARINC, CAN, FireWire, UDP, etc.)
• RTOS overhead
• Data acquisition and storage
• Operator input
13
RTOS Considerations
• Stripped-down executive
– extremely low overhead
– may require expertise to support lower-priority threads
• POSIX-compliant
– supports operator-friendly utilities
– software maintenance based on a large user community
– overhead is measurable for high-performance applications
• Support and scheduling for multiple threads
14
Other Considerations
• Cost-effective
• Portability
• Upgradeability
• Scalable processing
• Scalable IO
• Reconfigurable for different projects
• Cabling ease and accessibility
• Environmental conditions
• Service life
15
Real-time Buses
• Hardware interface boards supported via bus
– VME: 40 MByte/s
– PCI, PCI-X, cPCI, PXI: 133 MByte/s
– PCI-E: point-to-point, 250 MByte/s to 16 GByte/s
• Published bus speeds are "burst"
• Realistically must be tempered with
– interrupt latency
– bus arbitration
– software overhead
16
Hardware Interface Considerations
• Although high-speed Data Acquisition (DAQ) boards are available (50 Ms/s, 16-bit), they are often not optimal for HIL simulation applications
• Problems for HIL applications:
– latency due to pipelining
– setup to access data is time-consuming
– must read a large number of samples to get the latest sample
17
PC-Based Simulators
• Cost-effective with relatively high performance
– large consumer market drives PC technology advancements
• Clock speeds ~3 GHz
• Optimized program execution: branches, etc.
• Multiprocessor platforms with high-speed internal interconnect capability
• Common operating systems:
– QNX Neutrino RTOS
– enhanced real-time Linux kernel
• PC buses: PCI, cPCI, PXI, PCI-E
18
Distributed Architectures
• Advantages
– Remote
– Efficiency
– Economy
– Specific functionality
– Divided as physical subsystems
– Horsepower
• Implementation
– VME bus shared memory
– Reflective memory
– Ethernet UDP and TCP
– IEEE-1394
19
Bottlenecks & Overhead
• Latency from hardware sample to compute
– bus traffic
– bus speed
– hardware interface latency
– time to set up external hardware access
• Overhead
– RTOS
– synchronization with other processes
– low-priority services
20
Hardware Interface Strategies
• Direct Memory Access
• Averaging and extrapolation techniques
• Point-to-point buses (PCI-E)
• Drivers optimized for real-time simulation rather than DAQ
• Development of IO boards specific to HIL
21
Execution Strategies
• Distributed simulation on a multi-core PC
– off-load RTOS and operator interface to one processor
– other processors are dedicated to application subsystems
– shared clock
– shared memory
– each core has direct access to the PC bus
[Diagram: CPU1 runs the QNX RTOS, slower I/O, slower subsystems, and operator support; CPU2-4 run faster subsystems with IO access; all cores follow the common simulation cycle.]
22
Emerging Alternative Technologies
23
What you will learn
• Why we need to consider it
• What are the main competing technologies
• High-level view of their internal architectures
• Programming models
• Comparison of some key metrics
24
Motivation
• Problem: end of uniprocessors
– We have hit the Brick Wall: Power Wall + Memory Wall + ILP Wall
– No more clock frequency increase: the new "Moore's Law" is 2X cores per chip every technology generation
• But applications are becoming bigger and more complex every day
– multi-physics and multi-domain simulation
– all-electric ships, aircraft, vehicles, etc.
– low-latency and ultra-high-speed HIL real-time simulation including
• high-speed power converters
• high-speed controllers
• requires a timestep below 1 µs
• Challenge: multicore CPUs alleviate some problems for now, but the main question is how to scale to massively parallel processing
25
Emerging Architectures
• GPGPU: General-Purpose Graphics Processing Unit
– designed to accelerate graphics applications
– highly parallel processors capable of 100's of Gflops/sec
– mature technology driven by strong demand in the PC gaming market
– three main vendors: Nvidia, AMD/ATI, and Intel
• Cell/BE Processor
– includes one PowerPC and 8 RISC processors
– the IBM Cell Processor is used in video games, e.g. the PS3
– developed by SCEI/Sony, Toshiba, and IBM
– next generation: 2 PPCs and 16 processors @3.2 GHz
• FPGA: Field-Programmable Gate Arrays
– widely used in high-speed image and signal processing
– two main vendors: Xilinx and Altera
26
GPGPU
• Difference between CPUs and GPGPUs
• CPUs
– are "smarter" but "narrow" in data processing elements
– their control units are capable of realizing many functions: very versatile
– achieve a moderate level of parallelism through multi-core
• GPGPUs: massively parallel
– devote many more transistors to data processing
– capable of massively parallel processing
– efficient implementation of the data streaming paradigm
27
GPGPU
• Example: NVIDIA 8800 GTX
– N = 16 multiprocessors, M = 8 processors each
– many processors together with no cache coherency
– 256 KB cache used by the 128 processors
– 8192 registers per multiprocessor
– constant and texture memories: 8 KB per multiprocessor (each)
– peak processing: ~300 GF/s
• GPGPU peak data transfer
– to internal memory: ~530 Gb/s
– to CPU: ~80 Gb/s (PCIe bus)
• External memory peak transfer rate
– CPU to memory (DDR2 800, PC-6400): ~60 Gb/s
[Diagram: a device holding multiprocessors #1..#N; each multiprocessor contains processors #1..#M with registers, shared memory, an instruction unit, a constant cache, and a texture cache, all connected to device memory.]
28
GPGPU
• Main solutions
– NVIDIA: GeForce (gaming/movie playback), Quadro (professional graphics), and Tesla (HPC)
– AMD/ATI: Radeon (gaming/movie playback) and FireStream (HPC)
– Intel: Larrabee (HPC/gaming); Polaris research project, above 1 TFlop/sec (next generation)
• Example programming languages
– platform-specific: NVIDIA's CUDA and CTM from AMD/ATI
– commercial: RapidMind, PeakStream
– standards: OpenCL and OpenGL
29
Cell/BE
• High-level internal architecture
• 1 Power Processing Element (PPE)
– power processing unit (PXU): general-purpose 64-bit RISC processor (PowerPC)
– two-way hardware multithreading
– on-chip L1/L2 caches: 512 KB L2, 32 KB L1 instruction, 32 KB L1 data
• 8 Synergistic Processor Elements (SPEs)
– synergistic processor unit (SXU): special-purpose processor
– 128-bit SIMD
– Local Store (LS): 256 KB local memory
– Direct Memory Access engine (SMF) (250 Gb/s)
• Interconnected with the Element Interconnect Bus (EIB)
30
CellBE
• 2nd generation: IBM PowerXCell 8i (2008)
– 9 cores, 10 threads
– ~230 GFlops peak (SP) @3.2 GHz
– ~110 GFlops peak (DP) @3.2 GHz
– up to 250 Gb/s memory bandwidth
– up to 750 Gb/s I/O bandwidth
– top frequency > 4 GHz (observed in lab)
31
FPGA
• Basic FPGA architecture
• FPGAs
– are semiconductor devices consisting of programmable logic components and programmable interconnects
– allow programmers to hardcode a specific algorithm without incurring the overheads of instruction-style programming
[Diagram: FPGA fabric with Configurable Logic Blocks (CLBs, each containing 4 slices of LUTs, carry logic, and D flip-flops with PRE/CE/CLR), I/O Blocks (IOBs), programmable interconnect, Block SelectRAM resources, dedicated multipliers, and clock management (DCMs, BUFGMUXes).]
32
FPGA
• General internal architecture
• Modern FPGAs can run at 600 MHz and can include:
– RAM and ROM memories
– multipliers and accumulators
– high-speed serial transceivers
– lots of glue logic and lookup tables
– lots of registers
– better yet: they are reprogrammable
• Virtex-6 LXT FPGAs include:
– 600 MHz switch matrix
– up to 720 36-Kbit Block RAMs @600 MHz
– up to 864 25x18 multipliers, 48-bit adder, and 48-bit accumulator
– up to 36 low-power GTX 6.5 Gb/s serial transceivers
– up to 2 interface blocks for PCIe

Device     | Fixed-point Oper/s (1) | GF/s SFP (2)/(3) | GF/s DFP (2)/(3) | Max IO transfer (Gb/s) (4) | Number of FFPs (millions) | Max BlockRAM (kb)
XC6VLX75T  | 345                    | 172/56           | 20/13            | 78                         | 11.6                      | 5,616
XC6VLX240T | 922                    | 462/150          | 54/35            | 156                        | 37.7                      | 14,976
XC6VSX475T | 2420                   | 924/300          | 108/78           | 234                        | 74.4                      | 38,304

(1) one multiply and one accumulate of 25x18 bits @600 MHz
(2)/(3) peak / sustainable, based on Floating-Point Operator v5.0
(4) estimate based on using only the GTX 6.5 Gb/s transceivers
33
FPGA
• Specifics
– have a long history in embedded processing and specialized computing
– are generally slower than custom ASICs ("Application-Specific Integrated Circuits") and can only implement designs of moderate complexity
– on the other hand, custom ASICs are not programmable and their development cycle is much longer and more expensive
– are very good at implementing
• small integer and floating-point calculations with a small number of bits
• control logic
• Programming languages
– VHDL and Verilog
• allow very efficient implementations to be generated
• but not what the HPC community is used to using
– C-like languages
• conventional wisdom: the closer to standard C the solution is, the worse the resulting application performs
• e.g. Celoxica's Handel-C, Mitrionics's Mitrion-C, and Impulse C from Impulse Technologies
– Higher-level abstraction tools
• signal processing: Xilinx's System Generator and Altera's DSP Builder
• controllers: Mathworks HDL Coder
• LiquidWave from Castel Research
34
Comparison Metrics
• PROGRAMMING: clear tradeoff between performance and programming effort [2]
• Example: time-domain FIR filter

                               | C      | SIMD C | Hand Coding | Parallel (8 SPEs)
Lines of code                  | 33     | 110    | 371         | 546
Design time                    | Minute | Hour   | Hour        | -
Coding time                    | Minute | Hour   | Day         | -
Debug time                     | Minute | Minute | Day         | -
Performance efficiency (1 SPE) | 0.014  | 0.27   | 0.88        | 0.82
GFLOPS @ 2.4 GHz               | 0.27   | 5.2    | 17          | 126

Data for Mercury Cell Processor system (@ 2.4 GHz); 1 Cell processor: ~150 GF/s max (single precision), ~15 GF/s max (double precision)
• This example runs on an IBM Cell/BE processor
• But similar numbers apply to the other two technologies as well
• CONCLUSIONS:
1) Brick walls: no more faster processors for larger and more complex applications
2) Challenge: how to program massively parallel hardware without sacrificing too much performance
3) Highly optimized libraries are helpful, but what if the computation you want is not in the library?
35
Summary
• Each type is good for specific types of applications
– CPUs (single- and multi-core)
• are very versatile, "smart" but "narrow" in processing elements
• it is not clear how to scale to massively parallel processing
– GPGPUs and CellBEs
• scale well for massively parallel processing
• SIMD parallel data model and very rich in processing elements
• performance has scaled faster than multicore CPUs
• $/W and GF/W are much higher than CPUs
• very good at: graphics, 32-bit floating-point
– FPGAs
• scale well for massively parallel processing
• very good at: embedded applications, controllers, signal pre-processing, digital filters, etc.
• are used in HIL for sampling and pre-processing of fast signals
• very low W/GF but very high $/GF
• very good at integer and applications that require a small number of bits
36
Integration Algorithms
37
Discretization Methods
• Numerical integration
– Forward rectangular (Euler)
– Backward rectangular (implicit Euler)
– Trapezoidal (Tustin or bilinear) with prewarping
• Zero-pole mapping
• Hold equivalence
– ZOH
– FOH
38
In General Use
• Typically simulation packages are used to model the application (Simulink, ACSL, SystemBuild, Dymola, etc.)
• The application can be implemented in the continuous domain and one of a number of integration algorithms can be selected for its solution
• Methods such as:
– Euler (1st order)
– Heun or Runge-Kutta (2nd order)
– predictor algorithms: Adams-Bashforth (2nd, 3rd, and 4th order)
– predictor-corrector algorithms: Adams-Moulton
– Runge-Kutta (4th order)
39
Solver Considerations
• Explicit vs implicit
– iterative methods to solve implicit algorithms are not typically used for fixed-step real-time integration
– if derivatives depend on hardware inputs and require an implicit solution, the hardware input must be approximated for the next time-step
40
Solver Considerations
• Single-step vs multi-pass
• Algorithm order

Definitions: state Y; input U; integration step-size T; nonlinear differential equation F(Y, U)

Euler: Yk+1 = Yk + T * F(Yk, Uk)
AB2: Yk+1 = Yk + 0.5 * T * [3 * F(Yk, Uk) - F(Yk-1, Uk-1)]
RK2: Ŷk+1 = Yk + T * F(Yk, Uk); Yk+1 = Yk + 0.5 * T * [F(Yk, Uk) + F(Ŷk+1, Uk+1)]
RKRT2: Ŷk+1/2 = Yk + 0.5 * T * F(Yk, Uk); Yk+1 = Yk + T * F(Ŷk+1/2, Uk+1/2)

Howe, R.M., "Dynamics of Real-time Digital Simulation," 1996, Applied Dynamics International
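The one-step formulas above can be coded directly; this is an illustrative sketch (not from the slides), with an assumed scalar test system dy/dt = -2y + u.

```python
import math

def f(y, u):
    return -2.0 * y + u  # assumed linear test system dy/dt = -2y + u

def euler_step(y, u_k, T):
    return y + T * f(y, u_k)

def rk2_step(y, u_k, u_k1, T):
    # Heun's method: Euler predictor, then average the two slopes
    y_hat = y + T * f(y, u_k)
    return y + 0.5 * T * (f(y, u_k) + f(y_hat, u_k1))

def rkrt2_step(y, u_k, u_half, T):
    # Real-time RK2: half-step predictor, full step from the midpoint
    y_half = y + 0.5 * T * f(y, u_k)
    return y + T * f(y_half, u_half)

# One step of dy/dt = -2y from y(0) = 1; exact answer is exp(-2T)
T = 0.1
exact = math.exp(-2 * T)
err_euler = abs(euler_step(1.0, 0.0, T) - exact)
err_rk2 = abs(rk2_step(1.0, 0.0, 0.0, T) - exact)
err_rkrt2 = abs(rkrt2_step(1.0, 0.0, 0.0, T) - exact)
print(err_euler, err_rk2, err_rkrt2)
```

The 2nd-order methods land an order of magnitude closer to the exact solution for the same step, which is the accuracy-per-step argument behind choosing a higher-order solver.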
Solver Considerations
• A higher-order algorithm may allow a larger integration step-size given its increased accuracy
[Plot: compares Adams-Moulton 2nd order to Euler]
42
Compatibility with Real-time Samples

• Euler Method: Yk+1 = Yk + T * F(Yk, Uk)
• AB2 Method: Yk+1 = Yk + 0.5 * T * [3 * F(Yk, Uk) - F(Yk-1, Uk-1)]
– expected inputs Uk (and earlier) are already sampled at time kh
• RK2 Method: Ŷk+1 = Yk + T * F(Yk, Uk); Yk+1 = Yk + 0.5 * T * [F(Yk, Uk) + F(Ŷk+1, Uk+1)]
– the corrector expects input Uk+1, which is not available yet during the frame
• RKRT2 Method: Ŷk+1/2 = Yk + 0.5 * T * F(Yk, Uk); Yk+1 = Yk + T * F(Ŷk+1/2, Uk+1/2)
– uses the mid-frame input sample Uk+1/2
[Timing diagrams: expected inputs Uk, Uk+1/2, Uk+1 sampled at kh, (k+1/2)h, (k+1)h; outputs Yk, Ŷ, Yk+1 for each method.]
43
Solver Considerations
• Stability regions based on eigenvalues and step-size
• Predictor algorithms depend on previous derivatives, so use Euler for the startup step and at discontinuities
• Multi-rate simulations
44
Error Estimates
Error measures in numerical integration are difficult to develop, partly due to the propagation of errors from step to step; however, error estimates have been derived for comparing the performance of integration methods. These include:
• estimates of local and global errors (direct quantification of round-off and truncation errors made over the integration interval)
• estimates that incorporate the dynamics of the linearized system model (characteristic root errors, transfer function errors, phase-shift errors, etc.)
45
Quantifying Dynamic Errors
• Error in simulations due to
– finite integration step
– sample & update rates
– latencies
• Common methods
– compare real-time simulation outputs with accurate off-line results
– decrease the integration step-size by half and compare
• On-line calculation of dynamic error
– R.M. Howe, "A New Method for On-line Measurement of Dynamic Errors in Real-time Simulation," ADIUS 1997
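The step-halving check in the second bullet can be sketched as follows; the forward-Euler runs and the test system dy/dt = -5y are illustrative assumptions, not taken from the slides.

```python
import math

def euler_run(h, n_steps, lam=-5.0, y0=1.0):
    """Fixed-step forward-Euler integration of dy/dt = lam * y."""
    y = y0
    for _ in range(n_steps):
        y += h * lam * y
    return y

# Integrate to t = 1 s with step h, then again with step h/2
h = 0.01
y_h = euler_run(h, 100)
y_h2 = euler_run(h / 2, 200)

est_err = abs(y_h - y_h2)               # step-halving error estimate
true_err = abs(y_h - math.exp(-5.0))    # actual error vs exact solution
print(est_err, true_err)
```

For a 1st-order method the error roughly halves with the step, so the difference between the two runs is close to half the true error of the coarse run; doubling the estimate recovers the error magnitude without an off-line reference.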
46
Multi-rate Simulation
47
What you will learn
• Why we need to consider it
• What are the main groups of methods
• The multi-rate filtering/extrapolation methods
• Some other multi-rate methods
48
Motivation
• Problem: real-time simulation of complex multi-physics systems is a large problem
• Three main reasons why the direct approach can become inefficient
– sparse matrix solution time grows super-linearly with the size of the problem
– for large differential equation systems, the state variables normally change at very different rates
– systems may include stiff and non-stiff models
• More efficient strategy
– partition the system under test
– use a different multi-rate method in each part
49
Current State
• Over the last decades, many partitioning and multi-rate methods have been proposed and widely used for offline simulation
• Real-time requirements add more complexity
– time must advance in fixed steps (no variable time step)
– all calculations in any time step must be completed before the start of the next time step (the simulation can't go back in time)
• In the last five years, partitioning and multi-rate methods have attracted new attention from academia and industry
50
Partitioning Strategies
• Sub-systems can be partitioned at:
– physical level
– equation level
• Most commercial real-time simulators partition at the system level
– manual partitioning done by the user
– assisted or not by the simulator
• Automatic partitioning is still not widely used
– stiff versus non-stiff equations
– slower versus higher dynamics
– linear versus non-linear sub-systems
51
Research Directions in the Last Decades
• Multi-method schemes
– used for systems containing both non-stiff and stiff parts
– the partitioning is done at the level of the discretization scheme
– an explicit scheme is used for the non-stiff parts, and an implicit method for the stiff ones
• Multi-order schemes
– used for non-stiff problems only
– the same explicit method, and the same step size, is used for all parts
– but the order of the method is chosen according to the activity level of the subsystem
• Multi-rate schemes
– used for both stiff and non-stiff problems
– the same (implicit or explicit) method, with the same order, is applied to all subsystems
– but the step size is chosen according to the activity level
• Presently, the majority of research and development is done on multi-rate methods
52
Multi-rate Scheme
• Basic idea of multi-rate methods
– most systems are formed by sub-systems with very different dynamics
– partition the system according to those dynamics
– simulate each sub-system using the most appropriate time step
[Figures: typical dynamics of some sub-systems; power converter partition example]
53
Filtering/Extrapolation Methods
• Consider two sub-systems running at a fast (h) and a slow (H) frame rate, such that H = N * h
– they cannot immediately exchange information
• We must decrease ("downsample") the sample rate from the fast sub-system to the slow one
– otherwise, the sampled value of the rapidly changing signals from the fast sub-system can become very unpredictable
• Increase ("upsample") the data sample rate from the slow sub-system to the fast one
• Decide which should be done first: down- or up-sampling
• And ensure the two parallel simulations are kept synchronized
54
Filtering/Extrapolation Methods
• Downsampling
– Decimation, as used in multi-rate digital filter design
• for every N samples, keep one and discard the remaining (N-1)
– Average over the larger time step [Ta, Tb]:
  i_D = [1/(Tb - Ta)] * Σ(k=1..N) { [i_D(Ta + (k-1)·Δt) + i_D(Ta + k·Δt)] / 2 } · Δt
– Low-pass filtering
• use linear-phase filters to minimize error
• Upsampling
– Zero-order hold: y(t) = y(kT), kT ≤ t < (k+1)T
[Fragment from the interface-method slides: GOL > 1 causes instability; note the effect of the delay]
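Two of the downsampling options and the zero-order-hold upsampler can be sketched as follows (illustrative NumPy sketch; the 5 Hz test signal and rate ratio N = 10 are assumed values).

```python
import numpy as np

def downsample_decimate(x, N):
    """Decimation: keep every N-th sample, discard the rest."""
    return x[::N]

def downsample_average(x, N):
    """Average each block of N fast-rate samples over the larger step."""
    return x[: len(x) // N * N].reshape(-1, N).mean(axis=1)

def upsample_zoh(x, N):
    """Zero-order hold: y(t) = y(kT) for kT <= t < (k+1)T."""
    return np.repeat(x, N)

# Fast sub-system signal at step h; slow sub-system runs at H = 10 * h
fast = np.sin(2 * np.pi * 5.0 * np.arange(1000) / 1000.0)
slow = downsample_average(fast, 10)   # fast -> slow exchange
held = upsample_zoh(slow, 10)         # slow -> fast exchange
print(slow.shape, held.shape)
```

Averaging acts as a crude anti-aliasing filter, which is why it is preferred over plain decimation when the fast sub-system carries switching ripple.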
69
Bergeron Travelling Wave Models
• Also called the Transmission Line Model (TLM) [4]
– basic idea: realize the partitioning over existing transmission lines
– three different implementations
• Frequency-Dependent Line Model, Bergeron Line Model, and Lossless Bergeron Line Model
• requires the transmission line length to correspond to at least one time delay to improve accuracy (some implementations require at least 2 time-step delays)
• exchange voltages/currents at the end of every transmission-line-related delay
• issue: the modeling resistance Rlk introduces losses, which may be unacceptable

GOL = (ZS / Zlk) · (1 - α·e^(-2sτ)) / (1 + α·e^(-2sτ)),  α = (ZL - Rlk) / (ZL + Rlk)
Rlk(L) = L/τ  or  Rlk(C) = τ/C
Va(k) = V2(k-1) + Rlk · I2(k-1)
Vb(k) = V1(k-1) + Rlk · I1(k-1)
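The one-delay exchange behind the update equations above can be illustrated with a minimal sketch (assumed lossless link joining a resistive source and load; all component values are illustrative): each port sees the line as a Thevenin source 2·vi behind the link impedance, and reflected pulses become the other port's incident pulse one step later.

```python
def tlm_link_sim(Vs, Rs, RL, Z, steps):
    """Two subsystems coupled by a lossless TLM link of characteristic
    impedance Z; wave pulses swap ends with a one-step delay."""
    vi1 = vi2 = 0.0  # incident (history) pulses at each port
    for _ in range(steps):
        # Port 1: source Vs behind Rs in parallel with the line Thevenin
        v1 = (Vs / Rs + 2 * vi1 / Z) / (1 / Rs + 1 / Z)
        # Port 2: line Thevenin 2*vi2 behind Z driving load RL
        v2 = 2 * vi2 * RL / (RL + Z)
        # Reflected pulse (v - vi) travels to the other port next step
        vi1, vi2 = v2 - vi2, v1 - vi1
    return v2

# Steady state should approach the resistive divider Vs * RL / (Rs + RL)
v2 = tlm_link_sim(Vs=100.0, Rs=10.0, RL=10.0, Z=10.0, steps=200)
print(v2)
```

With the link impedance matched to the terminations the reflection coefficients vanish and the exchange settles immediately; a mismatch makes the loop gain nonzero, which is the GOL mechanism the slides analyze.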
70
Voltage-Voltage Model
• Also called the Partial Circuit Duplication Method [4]
– basic idea: duplicate the link component between both circuits on each partitioned subsystem and use a relaxation technique to calculate the interface variables
• the method is easy to implement
• it can achieve higher stability than the ideal transformer method for resistive systems (because it is easier to ensure GOL < 1)
• requires many iterations to improve accuracy: not good for real-time

GOL = (Za · Zb · e^(-sτ)) / [(Za + Zab) · (Zb + Zab)]
71
First-Order Compensation Model
• Also known as Time-variant First-order Approximation
– proposed by Wu et al. [5]
– the basic idea is to approximate the HUT by a first-order linear system (RL or RC topologies) and use it as a predictor
– for example, assume an RL implementation (RC is similar)
• first-order HUT: di2/dt = a · i2 + b · v2
• apply trapezoidal discretization:
  i2(k) ≈ α · v1(k-1) + β · i2(k-1),  i.e. i2(k) ≈ Geq · v1(k-1) + Ieq
• solve for the coefficients:
  [α; β] = [v1(k-2), i2(k-2); v1(k-3), i2(k-3)]^(-1) · [i2(k-1); i2(k-2)]
• Limitations of the method
– the matrix formulation above can easily become ill-conditioned
– it is also extremely sensitive to sensor noise
– predictor-based methods can lead to instabilities
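The 2x2 coefficient solve can be sketched directly; the recorded samples and the "true" coefficients alpha = 0.2, beta = 0.9 are hypothetical values invented for the illustration.

```python
import numpy as np

# Hypothetical samples generated by an assumed first-order relation
# i2(k) = alpha*v1(k-1) + beta*i2(k-1) with alpha = 0.2, beta = 0.9
alpha_true, beta_true = 0.2, 0.9
v1 = [1.0, 1.2, 0.8, 1.1]          # v1(k-3) .. v1(k)
i2 = [0.5]                          # i2(k-3)
for v in v1[:-1]:
    i2.append(alpha_true * v + beta_true * i2[-1])

# Matrix from the slide: rows built from the three most recent samples
A = np.array([[v1[-3], i2[-3]],     # [v1(k-2), i2(k-2)]
              [v1[-4], i2[-4]]])    # [v1(k-3), i2(k-3)]
b = np.array([i2[-2], i2[-3]])      # [i2(k-1), i2(k-2)]
alpha, beta = np.linalg.solve(A, b)
print(alpha, beta)
```

On noise-free data the solve recovers the coefficients exactly; with sensor noise the two rows become nearly collinear whenever v1 and i2 vary slowly, which is the ill-conditioning the slide warns about.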
72
Delay-Related Stability Analysis
• As we have seen, delays over the simulator-hardware interfaces are a major source of errors
• The previous methods can reduce the instability but cannot completely eliminate it
• Similarly to the HIL and PHIL interface, the partitioning required by some multi-rate simulation methods can also cause instability
• CONCLUSION: we need methods to predict the limit of stability related to the delay on the interfaces
73
Delay-Related Stability Analysis
• Traditional methods
– Bode plot
– Padé approximation of the exp(-sT) term
– Z-transform methods
• All these methods are based on the ODE mathematical framework
– exp(-sT) is highly nonlinear and causes the phase to decrease as the frequency increases
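The Padé route can be sketched numerically (illustrative, not from the slides): a first-order Padé approximant (1 - sT/2)/(1 + sT/2) stands in for exp(-sT), and its error stays small only while ω·T is small, which reflects the phase behaviour just noted.

```python
import numpy as np

def pade1_delay(s, T):
    """First-order Pade approximation of the delay term exp(-s*T)."""
    return (1 - s * T / 2) / (1 + s * T / 2)

T = 1e-4                       # assumed 100 us interface delay
w = np.logspace(2, 5, 50)      # frequency sweep, rad/s
true = np.exp(-1j * w * T)
approx = pade1_delay(1j * w, T)
err = np.abs(true - approx)
print(err[0], err[-1])         # tiny at low frequency, large at high
```

Both the true delay and the approximant have unit magnitude (the Padé term is all-pass), so the error is purely a phase mismatch; any stability verdict drawn from it is trustworthy only in the band where the error stays small.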
• We would like to be able to answer the following question:
– What is the maximum delay which guarantees the stability of the partitioned system under these scenarios:
• i) commensurate delays (the ratio between any two delays is rational)?
• ii) time-varying delays?
• iii) multiple delays?
• iv) delay-dependent and delay-independent stability?
74
Delay-Related Stability Analysis
• We will discuss these questions tomorrow
– "Study of the Stability of Transient Simulation of Power Systems: A New Approach Based on Delay Differential Equations Theory" [6]
• The proposed method allows us to determine:
– the delay-dependent and delay-independent stability conditions
– the maximum delay bound to guarantee stability
– for partitioned systems with any number of commensurate delays
75
Summary
• The delay over the interfaces (partitioning or simulator-hardware interface) can cause the simulation to become unstable
• Many interface models have been proposed to increase stability
• But, at this time, no method can completely eliminate the instability for every interface condition
• We have proposed a new method, based on the DDE framework, to predict the maximum delay causing the simulation to become unstable
76
Time Synchronization
77
Common Practice: Fixed Integration Time Step
• Typically real-time simulation is executed with a fixed step
• straightforward, as the update rate is equal to the execution rate
• the selected integration step-size must be at least as large as the longest execution time
• Other: variable step based on execution time; time-tagged data exchanged between asynchronous processes
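The fixed-step pattern can be sketched as a soft-real-time frame loop (illustrative only; a real HIL executive would use a hardware timer or RTOS scheduler, and the frame time here is an arbitrary example).

```python
import time

def run_fixed_step(step_fn, h, n_frames):
    """Fixed-step frame loop: run the frame's work, then sleep out the
    remainder of the frame against an absolute deadline. Frames whose
    work overruns the deadline are counted, not silently absorbed."""
    overruns = 0
    next_deadline = time.monotonic() + h
    for k in range(n_frames):
        step_fn(k)                                  # application code
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        else:
            overruns += 1
        next_deadline += h  # absolute schedule: no drift accumulation
    return overruns

# Trivial 2 ms frames with negligible work on a lightly loaded machine
overruns = run_fixed_step(lambda k: None, h=0.002, n_frames=50)
print(overruns)
```

Advancing an absolute deadline (rather than sleeping a fixed amount after the work) keeps the update rate equal to the intended execution rate even when individual frames jitter, which is the property the fixed-step convention relies on.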
78
Synchronization Strategies
• Synchronization between processes or distributed compute nodes
– to ensure coherent data
– to remain time-synchronous
• Synching methods to coordinate processes
– polling
– interrupts
– standardized modulated "time code" signals (IRIG-B): 0.1 µs resolution
– master/slave architectures so that there is a common clock
79
High Performance Architectures for HIL
80
High-Performance Simulation
• "Hyperfast" simulation
– use multi-core technology
– frame times ~10 µs
– offload operator interface and data-logging
– remove RTOS overhead
81
Adding FPGAs to the Simulator
• FPGAs are finding widespread use
– becoming the "new" DSP
• High-performance solutions are possible
82
Ultra-High-Speed Simulation & Control
• frame times at or below the 10 µs threshold require another type of technology with today's silicon
• FPGAs are one answer
83
FPGA Board for Hardware Interface
• Boost converter application example
• R.M. Howe, HSC 2007
• 10 kHz PWM control
• 1 MHz input sampling & switch-time averaging using the FPGA
• 0.1 MHz frame rate
84
Distributing the Application
• Higher-speed subsystems executed on FPGA
[Diagram: runtime utilities on the simulator host alongside FPGA-hosted subsystems]
85
FPGA Board Considerations
• Gate count, logic cell count
• Internal FPGA architecture: Xilinx SX vs LX
– Xilinx LX is optimized for logic/gate count (economical)
– Xilinx SX is optimized for multiply/accumulate (better for state space)
• Onboard standard bus supporting digital/analog acquisition daughter cards
• Onboard high-speed communication interfaces available
– latency
– bandwidth
– PCI-E for communication to the simulator platform
– standard for FPGA to FPGA (very time-sensitive)
– Aurora
• Cost-effective
86
Challenges
• Programming FPGAs efficiently per application
• Rapid prototyping requires quick turnaround on an application
• Programming tools?
– efficiency
– operator knowledge-level required
• Available FPGA board options
– difficult to find a board with efficient IO and open driver access
87
Summary & Strategy for the Future
88
Future Architecture
• Combine multi-core PC technology with FPGA technology
• Explore FPGA board availability & development with onboard IO
• High-speed bus interface
89
LiquidWave Flow
From Simulink to FPGA in Real Time
[Simulink schematic: full-bridge rectifier ("Rectificateur en pont complet"); discrete solver, Ts = 1e-6 s; source Vs = 100 Vpp with Rs = 50, Cs = 4 µF; L = 50 mH; diode bridge D1-D4 with Rd = 0.01, Ld = 15 µH; load Rload = 10, Cload = 20 µF; scopes on Vout and Il.]
Flow: Partitioning; Interface Modeling; Multi-rate Discretization; Architecture; Time Synchronization; Design Optimizations; Script Generation; VHDL Generation; Commercial FPGA CAE; FPGA Board
90
High-Level Modeling and Design Flow
[Diagram: LiquidWave automates the flow: Partitioning, Interface Modeling, Multi-rate Discretization, Architecture, Time Synchronization, Design Optimizations, Script Generation, VHDL Code Generation. Partial alternatives: Matlab HDL Coder (?), Xilinx System Generator, Altera DSP Builder, LabView, C-like tools.]
91
LiquidWave Simulation Result #1
Three-phase DC-AC converter with internal PWM
[Simulink schematic: DC-AC PWM inverter; discrete solver, Ts = 5e-7 s; VDC = 100 V; PWM generator with F0 = 10 kHz, Fc = 60 Hz, m = 0.8; switch legs sw1-sw3 with Rs = 200 Ohms, Lon = 60 µH, Ron = 0.01 Ohms; loads A-C with Rload = 10 Ohms, Lload = 50 mH; measured signals Vsw1A, Vsw2A, Isw1A, Isw2A, Ia, Ib, Ic.]
92
LiquidWave Simulation Result #1
Three-phase DC-AC converter with internal PWM
[Plots: Matlab/SimPowerSystems simulation vs FPGA simulation at Ts = 0.5 µs; traces of Vsw1A, Vsw2A, Isw1A, Isw2A, Ia, Ib, and Ic over 0-0.05 s, with zoomed views around 0.02-0.024 s.]
93
LiquidWave Simulation Result #2
50 Hz to 60 Hz cycloconverter
[Simulink schematic: discrete solver, Ts = 1e-6 s; three-phase source Vs = 150 V, Fs = 50 Hz; switched bridge (Sbridge) and diode bridge (Dbridge); filter Lf = 100 mH, Cf = 100 µF; PWM generator with Fc = 2000 Hz, Fs = 60 Hz, m = 0.8; Rs = 100 Ohms, Ron = 0.1 Ohms, Lon = 50-100 µH; measured signals Iind, Vcap, Vab, Ia.]
94
LiquidWave Simulation Result #2
50 Hz to 60 Hz cycloconverter
[Plots: Matlab/SimPowerSystems simulation vs FPGA simulation at Ts = 1 µs; traces of Vcap, Iind, Vab, and Ia over 0-0.06 s.]
95
Thank you! Questions?
96
References
97
References
[1] Lupton, G. & Thulin, D., "Accelerating HPC Using GPUs," Georgia Tech Cell BE Workshop, June 13, 2008
[2] Sacco, Sharon, et al., "A High-Level Signal Processing Library for Multicore Processors," GT Cell Workshop, SMHS, June 19, 2007
[3] Bovay, J., Henderson, B., Lin, H.Y., & Wadleigh, K., "Accelerators for High Performance Computing Investigation," High Performance Computing Division, Hewlett-Packard Company, January 24, 2007
[4] Kuffel, R., Wierckx, R.P., Forsyth, P., Duchén, H., Lagerkvist, M., Holmberg, P., & Wang, X., "Expanding an Analogue HVDC Simulator's Modeling Capability Using a Real-Time Digital Simulator (RTDS)," First International Conference on Digital Power System Simulators (ICDS '95), College Station, Texas, USA, April 5-7, 1995
[5] Wu, X., Lentijo, S., & Monti, A., "A Novel Interface for Power-Hardware-in-the-Loop Simulation," IEEE Workshop on Computers in Power Electronics, Urbana, IL, USA, August 15-18, 2004, pp. 178-182
[6] Pimentel, J.C.G., "Study of the Stability of Transient Simulation of Power Systems: A New Approach Based on Delay Differential Equations Theory," Huntsville Simulation Conference 2009, Huntsville, AL, USA, October 27-29, 2009
[7] Lelarasmee, E., Ruehli, A.E., & Sangiovanni-Vincentelli, A.L., "The Waveform Relaxation Method for Time Domain Analysis of Large Scale Integrated Circuits," IEEE Trans. on CAD of IC and Systems, Vol. 1, No. 3, July 1982, pp. 131-145
[8] Szczupak, J., Facerolli, S.T., & Guedes, K.B., "Electrical Network Simulation by Multirate Parallel Digital Filter Structures," 2003 IEEE Bologna PowerTech Conference, June 23-26, 2003, Bologna, Italy
98