An Overview of (Electronic) System Level Design: beyond hardware ...

An Overview of (Electronic) System Level Design: beyond hardware-software co-design

PARADES

Alberto Ferrari Deputy Director PARADES GEIE [email protected]

Outline
• Embedded System Applications
• Platform Based Design Methodology
• Electronic System Level Design
  – Functions: MoC, Languages
  – Architectures: Network, Node, SoC
• Metropolis
• Conclusions

ESL Design
• Designing embedded systems requires addressing concurrently different engineering domains, e.g., mechanics, sensors, actuators, analog/digital electronic hardware, and software.
• In this tutorial, we focus on Electronic System Level Design (ESLD), traditionally considered as the design step that pertains to the electronic part (hardware and software) of an embedded system.
• ESL design starts from system specifications and ends with a system implementation that requires the definition and/or selection of hardware, software and communication components.

Outline
• Embedded System Applications
• Coping with heterogeneity
• Methodology: platform based design
• Electronic System Level Design
  – Functions: MoC, Languages
  – Architectures: Network, Node, SoC
• Metropolis
• Conclusions

Embedded Systems
• Computational – but not first-and-foremost a computer
• Integral with physical processes – sensors, actuators
• Reactive – at the speed of the environment
• Heterogeneous – hardware/software, mixed architectures
• Networked – shared, adaptive
Source: Edward A. Lee

OTIS Elevators

1. EN: GeN2-Cx

2. ANSI: Gen2/GEM

3. JIS: GeN2-JIS

$4 billion development effort
40-50% system integration & validation cost

Electronics and the Car
• More than 30% of the cost of a car is now in Electronics
• 90% of all innovations will be based on electronic systems

Complexity, Quality, & Time To Market today

                                   PWT UNIT      BODY GATEWAY   INSTRUMENT CLUSTER   TELEMATIC UNIT
Memory                             256 Kb        128 Kb         184 Kb               8 Mb
Lines Of Code                      50,000        30,000         45,000               300,000
Productivity                       6 Lines/Day   10 Lines/Day   6 Lines/Day          10 Lines/Day*
Residual Defect Rate @ End Of Dev  3000 ppm      2500 ppm       2000 ppm             1000 ppm
Changing Rate                      3 Years       2 Years        1 Year               < 1 Year
Dev. Effort                        40 Man-yr     12 Man-yr      30 Man-yr            200 Man-yr
Validation Time                    5 Months      1 Month        2 Months             2 Months
Time To Market                     24 Months     18 Months      12 Months            < 12 Months

* C++ code
Source: FABIO ROMEO, Magneti-Marelli, DAC, Las Vegas, June 20th, 2001

Distributed Car Systems Architectures
[Figure labels. Domains: System Electronics, Body Electronics, Body Functions, Driving and Vehicle Dynamic Functions, Information Systems, Telematics. Functions: Theft warning, Door Module, Light Module, Air Conditioning, ABS, Engine Management, Steer by Wire, Shift by Wire, Brake by Wire, Navigation, DAB, Mobile Communications, Access to WWW. Networks: CAN, LIN, TTCAN, FlexRay, MOST, Firewire, Gateways, Firewall. Requirement classes: Soft Real Time, Real Time, Hard Real Time, Fail Safe, Fail Stop, Fault Functional.]

Design
• From an idea…
• … build something that performs a certain function
• Never done directly:
  – some aspects are not considered at the beginning of the development:
     · Node and Network
     · Processes and Processors
     · SoC Software and Hardware
  – the designer wants to explore different possible implementations in order to maximize (or minimize) a cost function
• The solution is a trade-off among:
  – Mechanical partition
  – Hardware partition: analog and digital
  – Software partition: low, middle and application level

(Automotive) V-Models: Car level
V-model steps: Development of Distributed System; Distributed System Sign-Off!; Sub-System(s) Integration, Test, and Validation
• What:
  – Functionality
• How:
  – Architecture
• Trading (ES):
  – Computation (hw/sw)
  – Communication (hw/sw)
     · Time trigger/Event trigger
• Abstractions?
• Cost evaluation?
[Figure: Distributed Car Systems Architectures, with the same domain, function, network, and requirement-class labels as on the earlier slide of that name.]

(Automotive) V-Models: Subsystem Level
Development steps: Development of Distributed System; Development of Sub-System; Development of Mechanical Part(s); ECU Development
Sign-off and integration steps: Distributed System Sign-Off!; Sub-System(s) Integration, Test, and Validation; Sub-System Sign-Off!; ECU/Sens./Actrs./Mech. Part(s) Integration, Calibration, and Test
• What: Functionality
• How: Architecture
• Trading (ES):
  – Algorithm complexity (hw/sw)
  – Sensors/Actuators
• Abstractions?
• Cost evaluation?

(Automotive) V-Models: ECU level (Hw/Sw)
Development steps: Development of Distributed System; Development of Sub-System; Development of Mechanical Part(s); ECU Development; ECU SW Development; ECU HW Development; ECU SW Implementation
Sign-off and integration steps: Distributed System Sign-Off!; Sub-System(s) Integration, Test, and Validation; Sub-System Sign-Off!; ECU/Sens./Actrs./Mech. Part(s) Integration, Calibration, and Test; ECU Sign-Off!; ECU HW/SW Integration and Test; ECU HW Sign-Off!; ECU SW Integration and Test
• What: Functionality
• How: Architecture
• Trade (ES):
  – Hardware
  – Software
• Abstractions?
• Cost evaluation?

(Automotive) V-Models
[Figure: the complete V-model, combining the car, subsystem, and ECU levels. Development steps: Development of Distributed System; Development of Sub-System; Development of Mechanical Part(s); ECU Development; ECU SW Development; ECU HW Development; ECU SW Implementation. Sign-off and integration steps: Distributed System Sign-Off!; Sub-System(s) Integration, Test, and Validation; Sub-System Sign-Off!; ECU/Sens./Actrs./Mech. Part(s) Integration, Calibration, and Test; ECU Sign-Off!; ECU HW/SW Integration and Test; ECU HW Sign-Off!; ECU SW Integration and Test.]

Common Situation in Industry
• Different hardware devices and architectures
• Increased complexity
• Non-standard tools and design processes
• Redundant development efforts
• Increased R&D and sustaining costs
• Lack of standardization results in greater quality risks
• Customer confusion

How to…
• How to propagate functionality from top to bottom
• How to evaluate the trade-offs
• How to cope with:
  – Design Time
  – Design Reuse
  – Design Heterogeneity
• How to abstract with models that can be used to reason about the properties

Heterogeneity in Electronic Design
• Heterogeneity in:
  – Specification:
     · formal/semi-formal/natural language
     · MoC
     · Language
  – Analysis
  – Synthesis:
     · Manual/automatic/semi-automatic
  – Verification
  – Methodology
  – Design Process


Separation of concerns
• Computation versus Communication
• Function versus Architecture
• Function versus Time

Separation of Concerns (1990 Vintage!)
[Figure labels. Behavior Components: C-Code, Matlab, ASCET. Virtual Architectural Components: IPs, Buses, CPUs, Operating Systems. System Behavior: f1, f2, f3. System Platform: ECU-1, ECU-2, ECU-3, Bus. Flow: Mapping, Performance Analysis, Evaluation of Architectural and Partitioning Alternatives, Refinement. Development Process: Analysis, Specification, Implementation, Calibration, After Sales Service.]

Principles of Platform methodology: Meet-in-the-Middle
• Top-Down:
  – Define a set of abstraction layers
  – From specifications at a given level, select a solution (controls, components) in terms of components (Platforms) of the following layer and propagate constraints
• Bottom-Up:
  – Platform components (e.g., micro-controller, RTOS, communication primitives) at a given level are abstracted to a higher level by their functionality and a set of parameters that help guide the solution selection process. The selection process is equivalent to a covering problem if a common semantic domain is used.
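The covering view of the selection process can be illustrated with a small sketch. It assumes that every platform component is abstracted as a set of provided functions plus a cost parameter, and it uses a greedy heuristic, which is only one possible way to attack a covering problem and is not guaranteed to find the cheapest cover. The component names, functions, and costs are hypothetical.

```python
def greedy_cover(required, library):
    """Pick library components until every required function is covered.

    required: set of function names the specification asks for.
    library:  dict name -> (set of provided functions, cost); hypothetical data.
    Greedy heuristic: at each step take the component with the lowest cost
    per newly covered function. This is a sketch, not an optimal solver.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        for name, (functions, cost) in library.items():
            gain = len(uncovered & functions)
            if gain == 0:
                continue
            ratio = cost / gain
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("library cannot cover: %s" % sorted(uncovered))
        chosen.append(best)
        uncovered -= library[best][0]
    return chosen


# Hypothetical platform library (names and costs are illustrative only).
library = {
    "mcu_a": ({"compute"}, 3.0),
    "rtos_x": ({"scheduling"}, 1.0),
    "can_stack": ({"can_comm"}, 1.5),
    "mcu_b_with_can": ({"compute", "can_comm"}, 4.0),
}
required = {"compute", "can_comm", "scheduling"}
selection = greedy_cover(required, library)
total_cost = sum(library[name][1] for name in selection)
```

Here the greedy choice costs 5.5, while combining `mcu_b_with_can` with `rtos_x` would cost 5.0, which shows why an exact covering formulation can be worth the extra effort.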

Platform Models for Model Based Development
[Figure labels: Development of Distributed System; Distributed System Requirements; Distributed System Partitioning; Sub-System(s) Requirements; Network Protocol Requirements; Sub-Systems Model Based Development; Platform Abstraction; Sub-System(s) Implementation Models Sign-Off!; Network Communication Protocol Sign-Off!; Virtual Integration of Sub-System(s) w/ Network Protocol, Test, and Validation; Sub-System(s) Sign-Off!; Sub-System(s) Integration, Test, and Validation; Distributed System Sign-Off!]

Meet-in-the-middle
[Figure labels: WHAT?, Platform Abstraction, HOW?]
• Design Exploration
  – Partitioning
  – Scheduling
  – Estimation
• Interface Synthesis (or configuration)
• Component Synthesis (or configuration)

Aspects of the Hw/Sw Design Problem
• Specification of the system (top-down)
• Architecture export (bottom-up)
  – Abstraction of processor, of communication infrastructure, interface between hardware and software, etc.
• Partitioning
  – Partitioning objectives
     · Minimize network load, latency, jitter
     · Maximize speedup, extensibility, flexibility
     · Minimize size, cost, etc.
  – Partitioning strategies
     · partitioning by hand
     · automated partitioning using various techniques, etc.
• Scheduling
  – Computation
  – Communication
• Different levels:
  – Transaction/Packet scheduling in communication
  – Process scheduling in operating systems
  – Instruction scheduling in compilers
  – Operation scheduling in hardware
• Modeling the partitioned system during the design process

Platform-based Design
[Figure labels: Application Space, Application Instance, Platform Mapping, System (Software + Hardware) Platform, Platform Design-Space Export, Platform Instance, Architectural Space. Example: Intercom Platform (BWRC, 2001), with Tensilica Xtensa RISC CPU, ASICs, SRAM, Flash, Sonics Silicon Backplane, Speech Samples Interface, UART Interface, External Bus Interface, Wireless Processor Protocol, Baseband Processor, Bus, Xilinx FPGA, ADC, DAC, RF Frontend.]
• Platform: library of resources defining an abstraction layer
  – hide unnecessary details
  – expose only relevant parameters for the next step

Formal Mechanism
[Figure labels: Function Space, Function; Architecture Platform, Platform Instance; Library Elements; Closure under constrained composition (term algebra).]

Mapping
[Figure labels: Function Space, Function; Platform Instance; Semantic Platform; Mapped Instance; Admissible Refinements.]

Platform stack & design refinements
[Figure labels: Application Space, application instance; Platform 1; Platform 2, plat.2 instance; Platform 3, plat.3 instance; Platform 4; generic step between Platform i (platform i instance) and Platform i+1 (platform i+1 instance) with Platform Mapping, Refinement, and Platform Design-Space Export; Implementation Space, implementation instance.]

Automotive Supply Chain: Tier 1 Subsystem Providers
1. Transmission ECU
2. Actuation group
3. Engine ECU
4. DBW
5. Active shift display
6/7. Up/Down buttons
8. City mode button
9. Up/Down lever
10. Accelerator pedal position sensor
11. Brake switch
• Subsystem Partitioning
• Subsystem Integration
• Software Design: Control Algorithms, Data Processing
• Physical Implementation and Production

Magneti Marelli Power-train Platform Stack
[Figure labels: Powertrain System Specifications; Powertrain System Behavior; Functional Decomposition; Functions; Functional Network; Capture System Architecture; Capture Electrical/Mechanical Architecture; Capture Electronic Architecture; Operations and Macro Architecture; Operation Refinement; Operational Architecture (ES); Partitioning and Optimization; HW/SW partitioning; Performance Back-Annotation; Verify Performance; HW and SW Components Implementation; Verify Components; Electronic System Mapping; Components; Design Mechanical Components; DESIGN; stages A2, A3, A4, A5; Only SW components.]


Design Formalization
• Model of a design with precise unambiguous semantics:
• Implicit or explicit relations: inputs, outputs and (possibly) state variables
• Properties
• “Cost” functions
• Constraints
Formalization of Design + Environment = closed system of equations and inequalities over some algebra.

What: Functional Design
• A rigorous design of functions requires a mathematical framework
  – The functional description must be an invariant of the design
  – The mathematical model should be expressive enough to capture the functions easily
     · The different nature of functions might be better captured by heterogeneous models of computation (e.g. finite state machines, data flows)
• The functional design requires the abstraction of
  – Time (i.e. un-timed model)
     · Time appears only in constraints that involve interactions with the environment
  – Data type (i.e. infinite precision)
• Any implementation MUST be a refinement of this abstraction (i.e. functionality is “guaranteed”):
  – E.g. Un-timed -> logic time -> time
  – E.g. Infinite precision -> float -> fixed point
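The data-type refinement chain can be made concrete with a short sketch. It assumes an exact rational number stands in for the "infinite precision" value and a signed fixed-point format with 8 fractional bits stands in for the final implementation; both choices, and the helper names, are illustrative only.

```python
from fractions import Fraction


def to_fixed(value, frac_bits):
    """Quantize value to an integer holding frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Recover the exact value represented by a fixed-point integer."""
    return Fraction(raw, 1 << frac_bits)


FRAC_BITS = 8                        # assumed format, for illustration only
exact = Fraction(1, 3)               # "infinite precision" specification value
as_float = float(exact)              # first refinement: floating point
raw = to_fixed(exact, FRAC_BITS)     # second refinement: fixed point
as_fixed = from_fixed(raw, FRAC_BITS)

float_error = abs(exact - Fraction(as_float))
fixed_error = abs(exact - as_fixed)
half_lsb = Fraction(1, 1 << (FRAC_BITS + 1))
```

Each refinement step adds a bounded error: the fixed-point error stays within half of the least significant bit, which is the kind of property that must hold for the implementation to remain a refinement of the abstract function.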

Models of Computation
Definition: A mathematical description that has a syntax and rules for computation of the behavior described by the syntax (semantics). Used to specify the semantics of computation and concurrency.
• FSMs
• Discrete Event Systems
• CFSMs
• Data Flow Models
• Petri Nets
• The Tagged Signal Model
• Synchronous Languages and De-synchronization
• Heterogeneous Composition: Hybrid Systems and Languages
• Interface Synthesis and Verification
• Trace Algebra, Trace Structure Algebra and Agent Algebra

Usefulness of a Model of Computation
• Expressiveness
• Generality
• Simplicity
• Compilability/Synthesizability
• Verifiability
The Conclusion
One way to get all of these is to mix diverse, simple models of computation, while keeping compilation, synthesis, and verification separate for each MoC. To do that, we need to understand these MoCs relative to one another, and understand their interaction when combined in a single system design.

Reactive Real-time Systems
• Reactive Real-Time Systems
  – “React” to external environment
  – Maintain permanent interaction
  – Ideally never terminate
  – timing constraints (real-time)
• As opposed to
  – transformational systems
  – interactive systems

Models Of Computation for reactive systems
• We need to consider essential aspects of reactive systems:
  – time/synchronization
  – concurrency
  – heterogeneity
• Classify models based on:
  – how they specify behavior
  – how they specify communication
  – implementability
  – composability
  – availability of tools for validation and synthesis

Models Of Computation for reactive systems
• Main MOCs:
  – Communicating Finite State Machines
  – Dataflow Process Networks
  – Petri Nets
  – Discrete Event
  – (Abstract) Codesign Finite State Machines
  – Synchronous Reactive
  – Task Programming Model
• Main languages:
  – StateCharts
  – Esterel
  – Dataflow networks
  – Simulink
  – UML

The Synchronous Programming Model
• The synchronous programming model* deals with concurrency as follows:
  – non-overlapping computation and communication phases taking zero time and triggered by a global tick
• Widely used and supported by several tools: Simulink, SCADE, ESTEREL …
• Strong constraints on the final implementation to preserve the separation between computation and communication phases
* A. Benveniste and G. Berry: The synchronous approach to reactive and real-time systems, Proc. IEEE, 1991

The Synchronous Reactive (SR) MoC (*)
• Discrete model of time (global set of totally ordered “time ticks”)
• Blocks execute atomically at every time tick
• Blocks are computed in causal order (writer before reader)
• State variables (MEMs) are used to break combinatorial paths
• Combinatorial loops have fixed-point semantics
[Figure: feedback loop in which an adder sums $V_k$ and $Y_k$ into $W_k$, a MEM block delays $W_k$ to give $U_k$, and a gain block $G$ computes $Y_k$ from $U_k$.]
$U_k = W_{k-1}$
$Y_k = G \cdot U_k = G \cdot W_{k-1}$
$W_k = V_k + Y_k = V_k + G \cdot W_{k-1}$
(*) S. A. Edwards and E. A. Lee, “The semantics and execution of a synchronous block-diagram language”, Science of Computer Programming, 48(1):21–42, Jul. 2003.
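The three equations of the loop can be executed tick by tick. The following is a minimal sketch, assuming scalar signals and an initial MEM content of zero; the function name and argument names are illustrative and do not come from any tool.

```python
def simulate_sr(v_inputs, gain, w_init=0.0):
    """Run the feedback loop W_k = V_k + G * W_{k-1} for one tick per input.

    The MEM block holds the previous W, which breaks the combinatorial loop,
    so within a tick the blocks can be evaluated in causal order:
    MEM output, then gain, then adder.
    """
    mem = w_init
    trace = []
    for v in v_inputs:
        u = mem            # U_k = W_{k-1}   (MEM output)
        y = gain * u       # Y_k = G * U_k   (gain block)
        w = v + y          # W_k = V_k + Y_k (adder)
        trace.append(w)
        mem = w            # state update, visible only at the next tick
    return trace


step_response = simulate_sr([1, 1, 1], gain=0.5)
```

The state update happens after all blocks of the tick have been computed, which mirrors the atomic execution of blocks at every time tick.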

The Task Programming Model
• The Task Programming Model (TPM)
  – A task is a logically grouped sequence of operations
  – Each task is released for execution on an event/time reference
  – Task execution can be deferred as long as it meets its deadline
  – Task scheduling is priority-based possibly with preemption
     · Priorities can be static or dynamic
  – Communication between tasks occurs:
     · Locally: via shared variables
     · Globally: via communication network
  – Output values depend on scheduling
• Represented by Task Graphs
[Figure: example task graph with tasks T7 to T14.]


(Automotive) V-Models: Car level
V-model steps: Development of Distributed System; Distributed System Sign-Off!; Sub-System(s) Integration, Test, and Validation
[Figure: Distributed Car Systems Architectures, with the same domain, function, network, and requirement-class labels as on the earlier slide of that name.]

Distributed Embedded Systems: Architectural Design
The Design Components at work
[Figure labels: Functions, Functional Networks, Solution Patterns, Topologies, Resources, bus, Mapping, Evaluation and Iteration, Solution n+1.]

Co-Design Problem
• From:
  – a model of the functionality (e.g. TPM or SPM)
  – a model of the platform (abstraction of topology, network protocol, CPU, Hw/Sw etc.)
• Allocate:
  · The tasks to the nodes
  · The communication signals to the network segments
• Schedule:
  · The task sets in each node
  · The packets (mapping signals) in each network segment
• Such that:
  · The system is schedulable and the cost is minimized
• Design solutions:
  – Architectural constraints
  – Analytical approaches
  – Simulation models
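As an illustration of what an analytical approach to the schedulability check can look like, the sketch below applies the classical response-time analysis for fixed-priority preemptive tasks on one node. The slides do not name a specific analysis, so this choice is an assumption, as are the simplifications: independent periodic tasks, deadlines equal to periods, no blocking, and integer time units. The task set is hypothetical.

```python
import math


def response_time(index, tasks):
    """Worst-case response time of tasks[index], or None if it misses its deadline.

    tasks: list of (wcet, period) pairs sorted by decreasing priority.
    Iterates R = C_i + sum over higher-priority j of ceil(R / T_j) * C_j
    until a fixed point is reached or the period (assumed deadline) is exceeded.
    """
    wcet, period = tasks[index]
    r = wcet
    while True:
        interference = sum(
            math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:index]
        )
        r_next = wcet + interference
        if r_next > period:
            return None
        if r_next == r:
            return r
        r = r_next


def schedulable(tasks):
    """True if every task on the node meets its deadline under this analysis."""
    return all(response_time(i, tasks) is not None for i in range(len(tasks)))


# Hypothetical task set allocated to one node: (WCET, period), highest priority first.
node_tasks = [(1, 4), (2, 6), (3, 12)]
responses = [response_time(i, node_tasks) for i in range(len(node_tasks))]
```

An allocation step can call `schedulable` on each candidate task-to-node assignment and discard the assignments that fail before comparing costs.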

The Time Triggered Approach
• Time Triggered Architecture: Global notion of time
  – Communication and computation are synchronized and MUST HAPPEN AND COMPLETE in a given cyclic time-division schema
  – Time-Triggered Architecture (TTA), C. Scheidler, G. Heiner, R. Sasse, E. Fuchs, H. Kopetz
• Find optimal allocation and scheduling of a Time Triggered TPM
  – An Improved Scheduling Technique for Time-Triggered Embedded Systems, Paul Pop, Petru Eles, and Zebo Peng
  – Extensible and Scalable Time Triggered Scheduling, Wei Zheng, Jike Chong, Claudio Pinello, Sri Kanajan, Alberto L. Sangiovanni-Vincentelli
• Models of bus/network speed and topology (Hw) and WCET (Hw/Sw) are needed
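The requirement that every activity must happen and complete inside a cyclic time-division schema can be checked mechanically. The sketch below assumes a single shared resource (one CPU or one bus), a schedule table given as (name, start, duration) slots, and durations taken from WCET or transmission-time models; the slot names and numbers are hypothetical.

```python
def check_tt_schedule(slots, cycle):
    """Check a static time-triggered table for one shared resource.

    slots: list of (name, start, duration) within one cycle.
    Returns True when no two slots overlap and every slot
    completes before the end of the cycle.
    """
    end_of_previous = 0
    for name, start, duration in sorted(slots, key=lambda slot: slot[1]):
        if start < end_of_previous:      # overlaps the previous slot
            return False
        if start + duration > cycle:     # does not complete within the cycle
            return False
        end_of_previous = start + duration
    return True


# Hypothetical 10-unit cycle: a task, its message, then the consuming task.
table = [("task_a", 0, 2), ("msg_ab", 2, 1), ("task_b", 3, 2)]
valid = check_tt_schedule(table, cycle=10)
overlapping = check_tt_schedule([("task_a", 0, 3), ("msg_ab", 2, 1)], cycle=10)
```

A scheduler that searches for an optimal allocation would generate many candidate tables and use a check of this kind as its feasibility test.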

The Holistic Scheduling and Analysis
• Based on a Time and Event Triggered Task Graph Model allocated to a set of nodes
  – Worst Case Execution Time of Tasks and Communication time of each message are known
• Construct a correct static schedule for the TT tasks and ST messages (a schedule which meets all time constraints related to these activities) and conduct a schedulability analysis in order to check that all ET tasks meet their deadlines.
Holistic Scheduling and Analysis of Mixed Time/Event-Triggered Distributed Embedded Systems (2002), Traian Pop, Petru Eles, Zebo Peng

Network Calculus Modeling
• Network calculus:
  – “Network calculus”, J-Y Le Boudec and P. Thiran, Lecture Notes in Computer Science vol. 2050, Springer Verlag

PARADES

Event Models

52

Composition and Analysis
Px transformation based on:
• Output event dependency
• WCET
• BCET
Provide:
• Schedulability check
• Output stream models
Another strategy to search for solutions (allocation and scheduling)

Executable Model: Computation and Communication
[Figure: Task_A (port out) connected to Task_B (port in).]

Communication Refinement: Platform Model
[Figure labels: Task_A (out) and Task_B (in) joined by a Communication Pattern with Sender and Receiver; Post() from Task_A; Value()/Enabled() from Task_B. Each of the two nodes shows: CLib, RTOS, NetwLayer, Device Driver, CPU, CPU Port, Memory Access, Memory, Bus Adapter, Bus Arbiter, Slave Adapter, Local Bus, Controller Network, LLC/MAC. The nodes are joined by a Network Bus.]

Exploring Solutions by Simulation
[Figure: Cadence SYSDESIGN model My_Vehicle_Application. Labels: Project_Driver, Project_Car_v06, Project_Steer_Control_v06, Project_Brake_Control_v06; Driver, Car_brake, Plant_brake, Car_steer, Plant_steer, Control_steer, Vote_steer, Control_brake, Vote_brake, Interrupt_counter; Task_1ms, Task_2ms, Task_10ms, Init, Prc_count, SW_IRQ1; Single Disconnect, Double Disconnect, Corrupt Data.]
• Requires a model of the functionality and performance models of CPUs and network protocols
• It is trace based!

(Automotive) V-Models: Subsystem Level
Development steps: Development of Distributed System; Development of Sub-System; Development of Mechanical Part(s); ECU Development
Sign-off and integration steps: Distributed System Sign-Off!; Sub-System(s) Integration, Test, and Validation; Sub-System Sign-Off!; ECU/Sens./Actrs./Mech. Part(s) Integration, Calibration, and Test

Control system design
• Specifications given at a high level of abstraction:
  – known input/output relation (or properties) and constraints on performance indexes
• Control algorithms design
• Mapping to different architectures using performance estimation techniques and automatic code generation from models
• Mechanical/Electronic architecture selected among a set of candidates

HW/SW implementation architecture
• a set of possible hw/sw implementations is given by
  – M different hw/sw implementation architectures
  – for each hw/sw implementation architecture m ∈ {1,...,M},
     • a set of hw/sw implementation parameters z
        – e.g. CPU clock, task priorities, hardware frequency, etc.
     • an admissible set XZ of values for z
[Figure labels: Application Specific Software (Speedometer, Tachometer, Odometer, Water temp.); Application Programming Interface; Application Libraries; Customer Libraries; OSEK RTOS; OSEK COM; Sys. Config.; Boot Loader; I/O drivers & handlers (> 20 configurable modules); CCP; KWP 2000; Transport; μControllers Library.]

The classical and the ideal design approach
• Classical approach (decoupled design)
  – controller structure and parameters (r ∈ R, c ∈ XC)
     · are selected in order to satisfy system specifications
  – implementation architecture and parameters (m ∈ M, z ∈ XZ)
     · are selected in order to minimize implementation cost
  – if system specifications are not met, the design cycle is repeated
• Ideal approach
  – both controller and architecture options (r, c, m, z) are selected at the same time to
     · minimize implementation cost
     · satisfy system specifications
  – too complex!!
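The difference between the two approaches can be sketched on a toy design space. Everything here is an assumption made for illustration: controllers are reduced to a name and a processing load, architectures to a name, a capacity, and a cost, and "meeting the specification" is reduced to the load fitting the capacity.

```python
from itertools import product


def decoupled_design(controllers, architectures, meets_spec, cost):
    """Classical flow: fix the controller first, then pick the cheapest
    architecture that still meets the specification; if none does,
    repeat the cycle with the next controller."""
    for controller in controllers:            # designer's order of preference
        feasible = [a for a in architectures if meets_spec(controller, a)]
        if feasible:
            return controller, min(feasible, key=cost)
    return None


def joint_design(controllers, architectures, meets_spec, cost):
    """Ideal flow: search controller and architecture options together."""
    feasible = [
        (c, a) for c, a in product(controllers, architectures) if meets_spec(c, a)
    ]
    return min(feasible, key=lambda pair: cost(pair[1]), default=None)


# Hypothetical options: (name, load) and (name, capacity, cost).
controllers = [("lqr", 5.0), ("pi", 2.0)]
architectures = [("mcu_small", 3.0, 10.0), ("mcu_large", 8.0, 25.0)]


def meets_spec(controller, architecture):
    return controller[1] <= architecture[1]


def cost(architecture):
    return architecture[2]


classical = decoupled_design(controllers, architectures, meets_spec, cost)
ideal = joint_design(controllers, architectures, meets_spec, cost)
```

The joint search finds a cheaper pair here, but its search space is the product of all controller and architecture options (and, in the real problem, of their parameter sets), which is why it quickly becomes too complex.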

Algorithm Explorations and Control Synthesis
[Figure: the Magneti Marelli power-train platform stack annotated with Simulink block diagrams. Stack labels: Powertrain System Specifications; Powertrain System Behavior; Functional Decomposition; Functions; Functional Network; Capture System Architecture; Capture Electrical/Mechanical Architecture; Capture Electronic Architecture; Operations and Macro-Architecture; Operation Refinement; Operational Architecture (ES); Partitioning and Optimization; HW/SW partitioning; Performance Back-Annotation; Verify Performance; HW and SW Components Implementation; Verify Components; Electronic System Mapping; Components; Design Mechanical Components; DESIGN; stages A2, A3, A4, A5; Only SW components. Block-diagram labels: inputEvent_1, inputEvent_2, events, fc_event1, fc_event_2, InData, InData_1, InData_2, function(), SF-SS, FC-SS-1, FC-SS-2, Merge, MergeOutData, OutData.]

Implementation abstraction layer
• we introduce an implementation abstraction layer
  – which exposes ONLY the implementation non-idealities that affect the performance of the controlled plant, e.g.
     · control loop delay
     · quantization error
     · sample and hold error
     · computation imprecision
• at the implementation abstraction layer, platform instances are described by
  – S different implementation architectures
  – for each implementation architecture s ∈ {1,...,S},
     · a set of implementation parameters p (e.g. latency, quantization interval, computation errors, etc.)
     · an admissible set XP of values for p

Effects of controller implementation in the controlled plant performance
[Figure: feedback loop of Plant and Controller with signals u, y, d, w, r; time-domain perturbation blocks Δu, Δr, Δw and additive value-domain perturbations nu, nr, nw on the controller signals.]
• modeling of implementation non-idealities:
  – Δu, Δr, Δw: time-domain perturbations
     · control loop delays, sample & hold, etc.
  – nu, nr, nw: value-domain perturbations
     · quantization error, computation imprecision, etc.
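The two families of perturbations can be mimicked on sampled signals. The sketch below assumes a pure delay of a whole number of samples as the time-domain perturbation and uniform rounding as the value-domain perturbation; the function names and the numbers are illustrative only.

```python
def delay(samples, n, initial=0.0):
    """Time-domain perturbation: shift a sampled signal by n samples,
    filling the first n positions with an assumed initial value."""
    samples = list(samples)
    return [initial] * n + samples[:len(samples) - n]


def quantize(value, step):
    """Value-domain perturbation: round to the nearest multiple of step."""
    return round(value / step) * step


control = [1, 2, 3, 4]                      # ideal controller output u
delayed = delay(control, 1)                 # effect of a one-sample loop delay
quantized = quantize(0.37, 0.25)            # effect of a 0.25-wide quantizer
quantization_error = abs(0.37 - quantized)  # bounded by half a step
```

Feeding such perturbed signals into a plant simulation is one way to estimate how much performance is lost for a given latency or quantization interval before any hardware is chosen.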

Algorithm Development
Control Algorithm Design
• Control Algorithm Specification
• Model and Simulation files
  · Simulink model
  · Calibrations data
  · Time history data
[Figure labels: Simulink Model, Simulation Results, Time History, Calibration data.]

(Automotive) V-Models: ECU level (Hw/Sw)
Main design tasks:
• Define ECU Hardware/Software Partitioning
  – Platform instance structure selection
• Software Implementation
• Hardware (SoC) Design and Implementation
[Figure: V-model at ECU level. Development steps: Development of Distributed System; Development of Sub-System; Development of Mechanical Part(s); ECU Development; ECU SW Development; ECU HW Development; ECU SW Implementation. Sign-off and integration steps: Distributed System Sign-Off!; Sub-System(s) Integration, Test, and Validation; Sub-System Sign-Off!; ECU/Sens./Actrs./Mech. Part(s) Integration, Calibration, and Test; ECU Sign-Off!; ECU HW/SW Integration and Test; ECU HW Sign-Off!; ECU SW Integration and Test.]

Control Algorithm Implementation Strategy
• Control algorithms are mapped to the target platform to achieve the best performance/cost trade-off.
  – In most cases the platform can accommodate in software the control algorithms, if not:
  – New platform services might be required or
  – New hardware components might be implemented or
  – New control algorithms must be explored.

Platform Design Strategy
• Minimize software development time
  – Maximize model based software
     · Software generation is possible today from several MoC and languages: StateCharts, Dataflow, SR, …
     · Implement the same MoC of specification or guarantee the equivalence
     · Fit into the chosen software architecture to maximize reuse at component level (e.g. AUTOSAR for automotive)
  – Maximize the reuse of hand-written software components
     · Define application and platform software architecture
• Minimize the change requests for the hardware platform
  – Implement as much as possible in software

System Platform Definition
[Figure: Simulink block diagrams (input events, function-call subsystems FC-SS-1 and FC-SS-2, SF-SS, Merge, OutData) forming the Application Software, on top of the layers Sensor/Actuator Layer, Software Platform (API services), RTOS, Net Device Drivers, BIOS, CPUs, ECU input devices, ECU output devices.]
• The software application is composed of model-based and hand-written application-dependent software components (sources)
• The software platform is cross applications and cross HW platforms and is composed of parameterized software components (sources)

Software Implementation Flow
[Figure: same layering as on the System Platform Definition slide, with the Application Software (Simulink block diagrams) on top of the Sensor/Actuator Layer, Software Platform (API services), RTOS, Net Device Drivers, BIOS, CPUs, ECU input devices, ECU output devices. The slide repeats the two statements about the software application and the software platform.]

Example of Specification of Control Algorithms
• A control algorithm is a (synchronous or asynchronous) composition of extended finite state machines (EFSM).
[Figure: Simulink block diagram with two annotated parts, control-logic and data-flow computational blocks. Labels: inputEvent_1, inputEvent_2, events, fc_event1, fc_event_2, InData, InData_1, InData_2, function(), SF-SS, FC-SS-1, FC-SS-2, Merge, MergeOutData, OutData.]

Code Generation
• Mapping a functional model to software platform:
  – Data refinement
  – Software platform services mapping (communication and computation)
  – Time refinement (scheduling)
• Data refinement
  – Float to Fixed Point Translation.
     · Range, scaling and size setting (by the designer).
     · Worst case analysis for internal variable ranges and scaling.
  – Signals and parameters to C-variables mapping.
• Software platform model:
  – variables and services (naming).
     · Variable access methods are mapped to variable classes.
  – execution model:
     · Multi-rate subsystems are implemented as multi-task software components scheduled by an OSEK/VDX standard RTOS
• Time refinement
  – Task scheduling

72
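The float to fixed-point step can be sketched as follows. This is an illustrative example only, assuming signed 16-bit storage and a power-of-two scaling; the function names are hypothetical and do not come from any of the tools named in this tutorial. `choose_frac_bits` stands in for the range and scaling decision, and `float_to_fixed` applies it with rounding and saturation.

```c
#include <stdint.h>

/* Pick the largest number of fractional bits (0..15) such that a value of
 * magnitude max_abs still fits in a signed 16-bit word. */
int choose_frac_bits(double max_abs)
{
    int frac = 15;
    while (frac > 0 && max_abs * (double)(1L << frac) > 32767.0)
        frac--;
    return frac;
}

/* Convert a real value to 16-bit fixed point, rounding half away from
 * zero and saturating at the limits of the 16-bit range. */
int16_t float_to_fixed(double x, int frac_bits)
{
    double scaled = x * (double)(1L << frac_bits);
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled >= 32767.0)
        return INT16_MAX;
    if (scaled <= -32768.0)
        return INT16_MIN;
    return (int16_t)scaled;  /* the cast truncates toward zero */
}

/* Convert back to a real value, e.g. to check the quantization error. */
double fixed_to_float(int16_t q, int frac_bits)
{
    return (double)q / (double)(1L << frac_bits);
}
```

For a signal known to stay within ±100, `choose_frac_bits(100.0)` selects 8 fractional bits, so 1.5 is stored as 384.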

Mapping Control Algorithms to the Platform

[Figure: the control-algorithm model (Application Software) mapped onto the Sensor/Actuator Layer and the Software Platform (API services): Net, Device Drivers, BIOS, and RTOS, running on CPUs with ECU input and output devices. The figure also carries a "Handwritten code" label.]

Automatic synthesis from high-level models:
• Automatic translation to C/C++ code
• (Semi-)automatic data refinement for computation
• Automatic refinement of communication services

Flow examples: ASCET, Simulink/eRTW/TargetLink, UML

Example: Gasoline Direct Injection Engine Control

                         Modelled components            SLOC     % of model-compiled SLOC
Platform components      26 hand-coded                  26500    0%
Application components   86 auto-coded, 13 hand-coded   93600    90%

% of the total memory occupation:

              ROM %    RAM %
Platform      17.9     2.9
Application   82.1     97.1

Example: Gasoline Direct Injection Engine Control

• Tremendous increase in application-software productivity:
  – Up to 4 times faster than in the traditional hand-coding cycle.
• Tremendous decrease in verification effort:
  – Close to 0 ppm
• Tremendous reuse of models and source code

Defining the Platform

[Figure: platform-based design diagram. At the top, the Application Space (Features) with application instances (Application Software). In the middle, the Platform Specification, the Platform API, and the System Platform (no ISA), with Platform Design Space Exploration. At the bottom, the Architectural Space (Performance) with platform instances built on ST10, HITACHI, and DUAL-CORE hardware, each with a software platform (RTOS, BIOS, device drivers, network communication), input devices, output devices, and a network connection.]

Simulation Based (C/C++/SystemC) Exploration Flow

[Figure: exploration flow. Inputs in different languages and MoCs (UML, ASCET, Simulink, StateMate) pass through algorithm analysis, generators, and code generation (synthesis) into a unique C/C++/SystemC representation with defined MoC and languages. Platform models, including platform non-idealities, are exported to C/C++/SystemC through exporters (platform export). Mapping, build, and integration produce a C/C++/SystemC simulator that yields performance traces for simulation and performance estimation.]

SystemC and OCP Abstraction Levels

Communication (I/F) abstraction levels and the accuracy each one adds:
• Untimed Functional: token
• Programmers View (PV): + address
• Programmers View + Time (PVT): + transaction time
• Bus Cycle Accurate (BCA): + clock cycle
• Pin Cycle Accurate (PCA): + pin/clock

OCP layers and what each abstraction removes:
• Message (L-3): time, resource sharing
• Transaction (L-2): clocks, protocols
• Transfer (L-1): wires, registers
• RTL (L-0): gates

Computation abstraction levels:
• Untimed Functional (UTF): function
• Timed Functional (TF): + computation time
• Register Transfer (RT): + clock cycle

Mapping application to platform

[Figure: four bar charts comparing the mappings "zero", "uno", "due", and "tre": CPU load (%), IRQ/s, task switching (activations/s), and number of tasks.]

SW estimation

• SW estimation is needed to
  – Evaluate HW/SW trade-offs
  – Check performance/constraints
    – Higher reliability
  – Reduce system cost
    – Allow slower hardware, smaller size, lower power consumption

SW estimation: Static vs. Dynamic

• Static estimation
  – Determination of runtime properties at compile time
  – Most of the (interesting) properties are undecidable => use approximations
  – An approximate program analysis is safe if its results can always be depended on.
    – E.g. WCET, BCET
  – Quality of the results (precision) should be as good as possible
• Dynamic estimation
  – Determination of properties at runtime
  – DSP processors
    – relatively data independent
    – most time spent in hand-coded kernels
    – static data-flow consumes most cycles
    – small number of threads, simple interrupts
  – Regular processors
    – arbitrary C, highly data dependent
    – commercial RTOS, many threads
    – complex interrupts, priorities
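The difference between a safe static bound and a dynamic measurement can be illustrated with a small sketch. Everything here is an assumption made for the example: the per-path cycle costs, the loop bound `MAX_LEN`, and the function names are hypothetical, not figures from a real processor. The static bounds are computed without running the code; the dynamic estimate accumulates cost along the path that a particular input actually takes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model in cycles, plus a loop-bound annotation. */
enum {
    MAX_LEN        = 64,  /* maximum number of loop iterations     */
    COST_SETUP     = 10,  /* cost outside the loop                 */
    COST_ITER_SKIP = 3,   /* iteration where the test fails        */
    COST_ITER_HIT  = 6    /* iteration where the test succeeds     */
};

/* Static estimation: worst case assumes the longest path (a hit) on every
 * iteration of the longest allowed loop; best case assumes an empty input. */
unsigned long static_wcet(void)
{
    return COST_SETUP + (unsigned long)MAX_LEN * COST_ITER_HIT;
}

unsigned long static_bcet(void)
{
    return COST_SETUP;
}

/* Dynamic estimation: execute the function on real data and add up the
 * cost of the path actually taken. Returns the number of hits. */
size_t count_above(const int *v, size_t n, int threshold,
                   unsigned long *cycles)
{
    size_t hits = 0;
    assert(n <= MAX_LEN);  /* the static bound is only valid under this */
    *cycles = COST_SETUP;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold) {
            hits++;
            *cycles += COST_ITER_HIT;
        } else {
            *cycles += COST_ITER_SKIP;
        }
    }
    return hits;
}
```

The static analysis is safe in the sense used above when every dynamic measurement falls between `static_bcet()` and `static_wcet()`; it is precise when that interval is narrow.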

SW estimation overview

• Two aspects to be considered
  – The structure of the code (program path analysis)
    – E.g. loops and false paths
  – The system on which the software will run (micro-architecture modeling)
    – CPU (ISA, interrupts, etc.), HW (cache, etc.), OS, compiler
• Level at which it is done
  – Low-level
    – e.g. gate-level, assembly-language level
    – Easy and accurate, but long design iteration time
  – High/system-level
    – Fast: reduces the exploration time of the design space
    – Accurate “enough”: approximations are required
    – Processor model must be cheap
      • “what if” my processor did X
      • future processors not yet developed
      • evaluation of processor not currently used
    – Must be convenient to use
      • no need to compile with cross-compilers and debug on my desktop

SW estimation in VCC

Virtual Processor Model (VPM): compiled-code virtual instruction set simulator

• A virtual processor functional model with its own ISA, estimating computation time based on a table with instruction time information
  – Pros:
    – does not require target software development chain (uses host compiler)
    – fast simulation model generation and execution
    – simple and cheap generation of a new processor model
    – needed when target processor and compiler are not available
  – Cons:
    – hard to model target compiler optimizations (requires “best in class” Virtual Compiler that can also serve as a C-to-C optimizer for the target compiler)
    – low precision, especially for data memory accesses
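The table-driven estimate behind a virtual processor model can be sketched in a few lines of C. This is not VCC code: the virtual opcodes, the cycle table, and the function name are hypothetical, chosen to show how a per-instruction timing table turns execution counts into a cycle estimate.

```c
/* Hypothetical virtual instruction classes. */
typedef enum { V_LD, V_ST, V_ADD, V_MUL, V_BR, V_NUM_OPS } vop_t;

/* Hypothetical timing table: cycles charged per virtual instruction.
 * In a real flow this table would be characterized per target processor. */
static const unsigned cycles_per_op[V_NUM_OPS] = {
    2,  /* V_LD  */
    2,  /* V_ST  */
    1,  /* V_ADD */
    3,  /* V_MUL */
    2   /* V_BR  */
};

/* Estimate execution time from the number of times each virtual
 * instruction was executed while running host-compiled code. */
unsigned long vpm_estimate(const unsigned long counts[V_NUM_OPS])
{
    unsigned long total = 0;
    for (int op = 0; op < V_NUM_OPS; op++)
        total += counts[op] * cycles_per_op[op];
    return total;
}
```

Because every load is charged the same cost regardless of address, a model of this kind cannot see cache behavior, which matches the low precision for data memory accesses listed among the cons.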

SW estimation by ISS

Interpreted instruction set simulator (I-ISS)

• A model of the processor interpreting the instruction stream and accounting for clock cycle accurate or approximate time evaluation
  – Pros:
    – generally available from processor IP provider
    – often integrates fast cache model
    – considers target compiler optimizations and real data and code addresses
  – Cons:
    – requires target software development chain and full application (boot, RTOS, interrupt handling, etc.)
    – often low speed
    – different integration problem for every vendor (and often for every CPU)
    – may be difficult to support communication models that require waiting to complete an I/O or synchronization operation
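An interpreted ISS reduces to a fetch, decode, execute loop that also advances a cycle counter. The sketch below uses a toy instruction set invented for this example (four opcodes, four registers, made-up cycle costs); it does not model any real processor and assumes that register indices and jump targets in the program are valid.

```c
/* Toy ISA, hypothetical: load immediate, add, decrement-and-jump, halt. */
typedef enum { OP_LOADI, OP_ADD, OP_DJNZ, OP_HALT } opcode_t;

typedef struct {
    opcode_t op;
    int a;  /* destination register index                         */
    int b;  /* immediate, source register index, or jump target   */
} insn_t;

typedef struct {
    int reg[4];
    unsigned pc;
    unsigned long cycles;
} cpu_t;

/* Interpret the program and return the number of cycles consumed. */
unsigned long iss_run(cpu_t *cpu, const insn_t *prog, unsigned len)
{
    cpu->pc = 0;
    cpu->cycles = 0;
    while (cpu->pc < len) {
        insn_t i = prog[cpu->pc];          /* fetch */
        switch (i.op) {                    /* decode and execute */
        case OP_LOADI:
            cpu->reg[i.a] = i.b;
            cpu->cycles += 1;
            cpu->pc++;
            break;
        case OP_ADD:
            cpu->reg[i.a] += cpu->reg[i.b];
            cpu->cycles += 1;
            cpu->pc++;
            break;
        case OP_DJNZ:                      /* decrement, jump if not zero */
            cpu->reg[i.a]--;
            if (cpu->reg[i.a] != 0) {
                cpu->pc = (unsigned)i.b;
                cpu->cycles += 2;          /* taken branch costs more */
            } else {
                cpu->pc++;
                cpu->cycles += 1;
            }
            break;
        case OP_HALT:
            cpu->cycles += 1;
            return cpu->cycles;
        }
    }
    return cpu->cycles;
}
```

Unlike a table-only estimate, the cycle count here depends on the data: the branch cost differs between taken and not-taken iterations, so the total follows the actual loop count held in a register.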

Accuracy vs Performance vs Cost

                                           Accuracy   Speed   $$$*
Hardware emulation                         +++        +-      ---
Cycle accurate model, cycle counting ISS   ++ ++      -+      --
Dynamic estimation                         +          ++      ++
Static spreadsheet                         -          +++     +++

*$$$ = NRE + per model + per design

CoWare Platform Modeling Environment

• Focus on computation/communication separation
• Leverage their LISA platform and SystemC Transaction Level Models

CoWare Support for Multiple Abstraction Levels

• Support successive refinement for both processors and bus models
• Depending on abstraction level, simulation performance of 100 to 200 Kcycles/sec

Refining the Control Algorithm

[Figure: refinement of the control algorithm from a model-based to a code-based representation. Stages: untimed with host data types; untimed with target data types; timed with target data types; real target. Labels: model level; UF platform-in-the-loop (C code on platform model); TF/RT platform-in-the-loop (C code on platform model).]

Model Based Control-Platform Co-Design

[Figure: several copies of the control-specification block diagram (SF-SS, FC-SS-1, FC-SS-2, Merge) labelled Control Specification, connected through a Platform Abstraction to the Software Platform (API services): Net, Device Drivers, BIOS, and RTOS, running on CPUs with ECU input and output devices. A fragment of generated C code is shown alongside:]

  void integratutto4_initializer( void )
  {
    /* Initialize machine's broadcast event variable */
    _sfEvent_ = CALL_EVENT;
    _integratutto4MachineNumber_ = sf_debug_initialize_machine("integratutto4","sfun",0,3,0,0,0);
    sf_debug_set_machine_event_thresholds(_integratutto4MachineNumber_,0,0);
    sf_debug_set_machine_data_thresholds(_integratutto4MachineNumber_,0);
  }

Platform Design

[Figure: the control-algorithm model (blocks SF-SS, FC-SS-1, FC-SS-2, and Merge) shown as Application Software on top of the Sensor/Actuator Layer and the Software Platform (API services): Net, Device Drivers, BIOS, and RTOS, running on CPUs with ECU input and output devices.]

The software application is composed of model-based and hand-written application-dependent software components (sources). The software platform is shared across applications and across hardware platforms, and is composed of parameterized software components (sources).

Choosing an Implementation Architecture

[Figure: platform-based design diagram. At the top, the Application Space (Features) with application instances (Application Software). In the middle, the Platform Specification, the Platform API, and the System Platform (no ISA), with Platform Design Space Exploration. At the bottom, the Architectural Space (Performance) with platform instances built on ST10, HITACHI, and DUAL-CORE hardware, each with a software platform (RTOS, BIOS, device drivers, network communication), input devices, output devices, and a network connection.]

Platform Design and Implementation

• Hardware, computation:
  – Cores:
    – Core selection
    – Core instantiation
  – Coprocessors:
    – Selection (Peripherals)
    – Configuration/Synthesis
  – Instructions:
    – ISA definition (VLIW)
    – ISA Extension Flow
• Hardware, communication:
  – Busses
  – Networks
• Software, granularity:
  – Set of Processes
  – Process/Thread
  – Instruction sequences
  – Instructions
• Software, layers:
  – RTOS
  – HAL
  – Middle layers

AUTOSAR Software Platform Standardization


Hardware Design Flow

• There is no unified approach for exploring the different levels of parallelism
• The macro-level architecture must be selected
  – Implementing the function in RTL (SystemC/C++ flow)
    – Hardware implementation of RTOS
  – Partitioning the function and implementing some parts using a dedicated coprocessor
  – Changing the core Instruction Set Architecture (ISA):
    – Parameterization of a configurable processor
    – Custom extension of the ISA
    – Definition of a new ISA (e.g. VLIW)

Traditional System-On-Chip Design Flow

C/C++ Synthesis Flow

Evolution of System-On-Chip Design Flow

Implementing Function in RTL

General-purpose CPUs used in traditional SOCs are not fast enough for data-intensive applications, don’t have enough I/O or compute bandwidth, and lack efficiency.

[Figure: SOC block diagram with a General Purpose 32b CPU, ROM, RAM, A/D, I/O, PHY, and Hardwired Logic.]

Hardwired Logic
• High performance due to parallelism
• Large number of wires in/out of the block
• Languages / tools familiar to many

But …
• Slow to design and verify
• Inflexible after tapeout
• High re-spin risk and cost
• Slows time to market

Courtesy of Grant Martin, Chief Scientist, Tensilica

SystemC/C++ Synthesis Flow

[Figure: synthesis flow with the following stages. Inputs: high-level models (TLM/Simulink) and SystemC/C++ models. Analysis: IR (Control Flow Data Graph), chunks identification and system partitioning, cost function evaluation, hardware cost estimation, software cost estimation, performance estimation. Hardware branch: high-level synthesis, hardware implementations, hardware refinement, RTL level. Software branch: software extraction, software compilation, software refinement. Joining steps: HW/SW integration and HW/SW co-verification.]

Celoxica and Forte Flows

[Figures: Celoxica DK Design Suite flow; Forte Cynthesizer flow.]

Coprocessor Synthesis

• Loosely coupled coprocessor that accelerates the execution of compiled binary executable software code offloaded from the CPU
  – Delivers the parallel processing resources of a custom processor.
  – Automatically synthesizes programmable coprocessor from software executable (hw and sw).
  – Maximizes system performance through memory access and bus communication optimizations.

Criticalblue Approach

• Bottleneck identification:
  – Analyze the profiling results of the application software running on the main microprocessor.
  – Manually identify the specific tasks to be migrated to the coprocessor.
• Architecture synthesis and performance estimation:
  – User-defined constraints such as gate count, clock cycle count, and bus utilization.
  – Analysis of the instruction code and architecting of the coprocessor to deploy the maximum parallelism consistent with the input constraints.
  – Estimation of gate count and performance, including estimates of communication overhead with the main processor.
• Coprocessor performance and “what-if” analysis:
  – Generation of an instruction- and bit-accurate C model of the coprocessor architecture, used in conjunction with the main processor’s instruction-set simulator (ISS).
  – Typical analysis: performance profiling, memory-access activity, and activation trace data.
  – The model is also used to validate the coprocessor within a standard C or SystemC simulation environment.
• Hardware synthesis and microcode generation:
  – Generation of the coprocessor hardware, delivering synthesizable RTL code in either VHDL or Verilog, and of the circuitry needed to enable the coprocessor to communicate with the main processor’s bus interface.
  – Generation of the coprocessor microcode.
  – The flow automatically modifies the original executable code so that function calls are directed to a communications library.
  – This library manages the coprocessor handoff. It also communicates parameters and results between the main processor and the coprocessor.
  – Microcode can be generated independently of the coprocessor hardware, allowing new microcode to be targeted at an existing coprocessor design.

Configurable and Extensible Processor

Courtesy of Grant Martin, Chief Scientist, Tensilica

Fully Configurable Processor Features

[Figure: Xtensa processor block diagram. Legend: Base ISA Feature; Configurable Functions; Optional Function; Optional & Configurable; Designer Defined Features (TIE). Blocks: Instruction Fetch / Decode; Base ISA Execution Pipeline (Base ALU, Register File, Optional Execution Units); Designer-defined FLIX parallel execution pipelines, “N” wide; User Defined Execution Units, Register Files and Interfaces; User Defined Queues / Ports up to 1M pins; Vectra LX DSP Engine; Data Load/Store Unit and Load/Store Unit #2; Local Instruction Memories; Local Data Memories; Xtensa Local Memory Interface; External Bus Interface; Processor Interface (PIF) to System Bus; Processor Controls (Trace/TJAG/OCD, Interrupts, Breakpoints, Timers).]

Instruction Extension: Simple Example

Courtesy of Grant Martin, Chief Scientist, Tensilica

  operation TRUNCATE_16 {out AR z, in AR m} {} {
    assign z = {16'b0, m[23:8]};
  }

The operation statement describes an entire new instruction, including:
1. Instruction name
2. Instruction format and arguments
3. Functional behavior

From this single statement, Tensilica’s technology generates processor hardware, simulation and software development tool support for the new instruction.
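A host-side reference model can help when checking what the TRUNCATE_16 example computes. The following is a minimal sketch in C, not output of the Tensilica tools; the function name `truncate_16` is hypothetical. It mirrors the `assign` statement: bits 23..8 of the input land in the low half of the result and the upper 16 bits are zero.

```c
#include <stdint.h>

/* Reference model of z = {16'b0, m[23:8]}. */
uint32_t truncate_16(uint32_t m)
{
    return (m >> 8) & 0xFFFFu;
}
```

For instance, an input of 0x00ABCDEF yields 0x0000ABCD, and any bits above bit 23 are discarded.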

More Complex Extensions

Courtesy of Grant Martin, Chief Scientist, Tensilica

  operation MUL_SAT_16 {out AR z, in AR a, in AR b} {} {
    wire [31:0] m = TIEmul(a[15:0], b[15:0], 1);
    assign z = {16'b0,
                m[31] ? ((m[31:23]==9'b1) ? m[23:8] : 16'h8000)
                      : ((m[31:23]==9'b0) ? m[23:8] : 16'h7fff) };
  }
  schedule ms {MUL_SAT_16} {def z 2;}

[Figure: datapath for MUL_SAT_16. Operands a and b are read from the core 32-bit register file (AR) as OPERAND1 and OPERAND2 and pass through a MUL block and a SAT block across pipeline stages E1 and E2 to produce RESULT z.]
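The saturating multiply can be cross-checked against a small C reference model. This sketch rests on two assumptions that the slide does not state: that the third argument of `TIEmul` selects a signed multiply, and that the comparison against `9'b1` is meant as "all nine upper bits set", i.e. the product is a negative value that fits after truncation. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x without relying on
 * implementation-defined narrowing conversions. */
static int32_t sx16(uint32_t x)
{
    x &= 0xFFFFu;
    return (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
}

/* Reference model of MUL_SAT_16: signed 16x16 multiply, keep product
 * bits 23..8, saturate to 0x7fff / 0x8000 when bits 31..23 are not all
 * copies of the sign bit. The result sits in the low 16 bits of z. */
uint32_t mul_sat_16(uint32_t a, uint32_t b)
{
    uint32_t m   = (uint32_t)(sx16(a) * sx16(b));  /* 32-bit product  */
    uint32_t top = m >> 23;                        /* bits 31..23     */
    uint32_t mid = (m >> 8) & 0xFFFFu;             /* bits 23..8      */

    if (m >> 31)                                   /* negative product */
        return (top == 0x1FFu) ? mid : 0x8000u;
    return (top == 0u) ? mid : 0x7FFFu;            /* positive product */
}
```

Read as a fixed-point format with 8 fractional bits, 0x0100 represents 1.0, so `mul_sat_16(0x0100, 0x0100)` gives 0x0100, while products that overflow the 16-bit result clamp to the nearest limit.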

SIMD: Exploiting Data Parallelism

Courtesy of Grant Martin, Chief Scientist, Tensilica

  operation MUL_SAT_2x16 {out AR z, in AR a, in AR b} {} {
    wire [31:0] m1 = TIEmul(a[31:16], b[31:16], 1);
    wire [31:0] m0 = TIEmul(a[15:0],  b[15:0],  1);
    assign z = {m1[31] ? ((m1[31:23]==9'b1) ? m1[23:8] : 16'h8000)
                       : ((m1[31:23]==9'b0) ? m1[23:8] : 16'h7fff),
                m0[31] ? ((m0[31:23]==9'b1) ? m0[23:8] : 16'h8000)
                       : ((m0[31:23]==9'b0) ? m0[23:8] : 16'h7fff) };
  }
  schedule ms {MUL_SAT_2x16} {def z 2;}

[Figure: operands a and b from the core 32-bit register file (AR) are split into 16-bit halves a1, a0 and b1, b0; each pair feeds its own MUL and SAT, and the two 16-bit results are packed into z.]
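The two-lane version can be modeled in C by applying the same saturating multiply to each 16-bit half and packing the results. As with any host-side model of this kind, this is a sketch under assumptions: `TIEmul(..., 1)` is taken to be a signed multiply, the test against `9'b1` is read as "all nine upper bits set", and the function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x. */
static int32_t lane_sx16(uint32_t x)
{
    x &= 0xFFFFu;
    return (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
}

/* One lane: signed 16x16 multiply, keep bits 23..8, saturate. */
static uint32_t lane_mul_sat(uint32_t a, uint32_t b)
{
    uint32_t m   = (uint32_t)(lane_sx16(a) * lane_sx16(b));
    uint32_t top = m >> 23;
    uint32_t mid = (m >> 8) & 0xFFFFu;

    if (m >> 31)
        return (top == 0x1FFu) ? mid : 0x8000u;
    return (top == 0u) ? mid : 0x7FFFu;
}

/* Reference model of MUL_SAT_2x16: the upper halves of a and b produce
 * the upper half of z, the lower halves produce the lower half. */
uint32_t mul_sat_2x16(uint32_t a, uint32_t b)
{
    uint32_t hi = lane_mul_sat(a >> 16, b >> 16);
    uint32_t lo = lane_mul_sat(a, b);
    return (hi << 16) | lo;
}
```

The lanes are independent, so one lane can saturate while the other returns an exact result; this is the data parallelism that the single SIMD instruction exploits.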

Multiple Instruction Issues - FLIX™ Architecture

Courtesy of Grant Martin, Chief Scientist, Tensilica

• FLIX – Flexible Length Instruction Xtensions
• Multiple, concurrent, independent, compound operations per instruction
  – Modeless intermixing of 16, 24, and 32 or 64 bit instructions
  – Fast and concurrent code (concurrent execution) when needed
  – Compact code when concurrency / parallelism isn’t needed
  – Full code compatibility with base 16/24 bit Xtensa ISA
• Minimal overhead
  – No VLIW-style code-bloat
  – ~2000 gates added control logic

Designer-Defined FLIX Instruction Formats with Designer-Defined Number of Operations

[Figure: three example instruction formats, each ending in the bit pattern 1 1 1 0. Example 3-operation, 64b instruction format (bits 63..0): Operation 1, Operation 2, Operation 3. Example 5-operation, 64b instruction format (bits 63..0): Operation 1, Operation 2, Op 3, Op 4, Operation 5. Example 4-operation, 32b instruction format (bits 31..0): Op 1, Op 2, Op 3, Op 4.]

Parallelism at Three Levels in Extensible Instructions

Courtesy of Grant Martin, Chief Scientist, Tensilica

Three forms of instruction-set parallelism:
• Very Long Instruction Word (VLIW): L operations packed in one long instruction (multi-issue instruction)
• Single Instruction Multiple Data (SIMD) aka “vectors”: M copies of storage and function (SIMD operation)
• Fused operations aka “complex operations”: N dependent operations implemented as single fused operation

Parallelism: L x M x N
Example: 3 x 4 x 3 = 36 ops/cycle

[Figure: a multi-issue instruction whose slots contain SIMD operations and fused operations, with register and constant inputs.]

HW & SW automatically generated

Courtesy of Grant Martin, Chief Scientist, Tensilica

Xtensa Xplorer
  – Integrated Development Environment
  – TIE development tools
  – C development tools
  – Profiling & visualization tools

Hardware
  – Synthesizable RTL
  – Synopsys/Cadence flows

Software
  – Scheduling assembler
  – Xtensa C/C++ Compiler: vectorizing C/C++ compiler
  – Xtensa Instruction Set Simulator – pipeline accurate
  – Debuggers
  – XTMP: System Modeling API
  – Bus Functional Model for HW/SW co-simulation model
  – RTOS: VxWorks, Nucleus, XTOS

Design Flow

Courtesy of Grant Martin, Chief Scientist, Tensilica

Automation: Optimized Processor & Matching Software Tools Complete Hardware Design Source RTL, EDA scripts, test suite

ANSI C/C++ Code Source code

Processor

int int main() main() {{ int int i; i; short short c[100]; c[100]; for for (i=0;i