Monolithic Compiler Experiments Using C++ Expression Templates*


Lenore R. Mullin**, Edward Rutledge, Robert Bond
HPEC 2002, 25 September 2002, Lexington, MA

* This work is sponsored by the Department of Defense, under Air Force Contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the Department of Defense.

MIT Lincoln Laboratory 999999-1 XYZ 10/3/02

** Dr. Mullin participated in this work while on sabbatical leave from the Dept. of Computer Science, University at Albany, State University of New York, Albany, NY.

Outline
• Overview
  – Motivation
  – The Psi Calculus
  – Expression Templates
• Implementing the Psi Calculus with Expression Templates
• Experiments
• Future Work and Conclusions


Motivation: The Mapping Problem
Mapping y = conv(x) onto a machine involves both intricate math and intricate memory accesses (indexing).

Approach: Mathematics of Arrays
• Math and indexing operations in same expression
• Framework for design space search
  – Rigorous and provably correct
  – Extensible to complex architectures

Example: “raising” array dimensionality
[Figure: the vector x = < 0 1 2 … 35 > is regrouped into three-element subarrays < 0 1 2 >, < 3 4 5 >, …, < 33 34 35 >, which are mapped across Main Memory, L2 Cache, L1 Cache, and Parallelism.]
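Raising dimensionality regroups a flat vector into nested subarrays without moving any data; only the index arithmetic changes. The following is a minimal C++ sketch, assuming row-major storage order. The function name `rowMajorOffset` and the shapes used here (12×3 and 2×6×3) are hypothetical, since the exact nesting used in the figure cannot be recovered. Because x[k] = k in the example, the computed offset is also the value stored at that position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: maps a multi-index to a flat row-major offset.
// "Raising" the dimensionality of x only changes the shape used to
// interpret the same flat storage; no element is copied or moved.
std::size_t rowMajorOffset(const std::vector<std::size_t>& shape,
                           const std::vector<std::size_t>& index) {
    assert(shape.size() == index.size());
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); ++d) {
        assert(index[d] < shape[d]);
        offset = offset * shape[d] + index[d];  // Horner-style accumulation
    }
    return offset;
}
```

For example, if x is viewed as 12 rows of three, element (5, 2) is at offset 5·3 + 2 = 17, which is the last element of the subarray < 15 16 17 >.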

Basic Idea
• Expression Templates (implementation)
  – Efficient high-level container operations
  – C++
• Psi Calculus (theory)
  – Array operations that compose efficiently
  – Minimum number of memory reads/writes
• Together: PETE-style array operations

Benefits
• Theory based
• High level API
• Efficient

Combining Expression Templates and Psi Calculus yields an optimal implementation of array operations.

Psi Calculus¹ Key Concepts
Denotational Normal Form (DNF):
• Minimum number of memory reads/writes for a given array expression
• Independent of data storage order
Gamma function: Specifies data storage order
Operational Normal Form (ONF):
• Like DNF, but takes data storage into account
• For 1-d expressions, consists of one or more loops of the form: x[i]=y[stride*i+offset], l ≤ i < u
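A single ONF loop can be written directly in C++. This is a sketch: the function name `onfLoop` and its parameter order are hypothetical. Signed arithmetic is used for stride and offset because a reversal, for instance, needs a negative stride.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of one ONF loop: x[i] = y[stride*i + offset], l <= i < u.
// Signed types are used because stride (and offset) may be negative.
void onfLoop(std::vector<double>& x, const std::vector<double>& y,
             std::ptrdiff_t stride, std::ptrdiff_t offset,
             std::ptrdiff_t l, std::ptrdiff_t u) {
    for (std::ptrdiff_t i = l; i < u; ++i) {
        const std::ptrdiff_t j = stride * i + offset;
        assert(i >= 0 && i < static_cast<std::ptrdiff_t>(x.size()));
        assert(j >= 0 && j < static_cast<std::ptrdiff_t>(y.size()));
        x[static_cast<std::size_t>(i)] = y[static_cast<std::size_t>(j)];
    }
}
```

With stride -1 and offset 9 over 0 ≤ i < 4, the loop copies y[9], y[8], y[7], y[6]; with stride 2 and offset 1 it gathers every other element starting at y[1]. Either way each result element costs one read and one write.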

Example: computing y = conv(x) with filter h (N = size of x, M = size of h)
h = < 5 6 7 >
x' = cat(reshape(, ), cat(x, reshape(,))) = < 0 0 1 . . . 4 0 0 >, i.e. x padded with M-1 zeros on each end
x'rot = binaryOmega(rotate, 0, iota(N+M-1), 1, x'), one rotated copy of x' per row, e.g. < 0 1 2 3 . . . >
x'final = binaryOmega(take, 0, reshape(,), 1, x'rot), the first M elements of each row, e.g. < 0 1 2 >
Prod = binaryOmega(*, 1, h, 1, x'final), with rows such as < 0 6 14 > and < 5 12 21 >
Y = unaryOmega(sum, 1, Prod) = < 7 20 38 . . . >

Psi Calculus reduces this to DNF with minimum memory accesses.
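The example can be checked numerically. This sketch assumes x = < 1 2 3 4 >, which is consistent with the padded x' and the products shown but is inferred rather than stated, and it takes rotate to shift left, which matches the row < 0 1 2 3 . . . >. It follows the example's index convention, which pairs h[0] with the first element of each window. `convStepwise` builds every intermediate the way the unreduced expression would; `convReduced` is a hand-reduced version in which rotate and take collapse into the index r + k, so no intermediate arrays exist. Both function names are hypothetical, and `convReduced` was written by hand, not generated by a Psi Calculus tool.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unreduced version: materializes each intermediate row, mirroring
// x', x'rot, x'final, and Prod in the example.
std::vector<int> convStepwise(const std::vector<int>& x, const std::vector<int>& h) {
    const std::size_t N = x.size(), M = h.size();
    assert(M >= 1);
    // x': x padded with M-1 zeros on each side
    std::vector<int> xp(M - 1, 0);
    xp.insert(xp.end(), x.begin(), x.end());
    xp.insert(xp.end(), M - 1, 0);
    std::vector<int> y;
    for (std::size_t r = 0; r < N + M - 1; ++r) {
        // row r of x'rot: x' rotated left by r
        std::vector<int> rot(xp.size());
        for (std::size_t k = 0; k < xp.size(); ++k) rot[k] = xp[(k + r) % xp.size()];
        // row r of x'final: the first M elements of the rotated row
        std::vector<int> fin(rot.begin(), rot.begin() + static_cast<std::ptrdiff_t>(M));
        // row r of Prod, summed as in unaryOmega(sum, 1, Prod)
        int sum = 0;
        for (std::size_t k = 0; k < M; ++k) sum += h[k] * fin[k];
        y.push_back(sum);
    }
    return y;
}

// Hand-reduced version: x' is never built; index r + k addresses the
// (virtual) padded array, and the padding contributes zeros.
std::vector<int> convReduced(const std::vector<int>& x, const std::vector<int>& h) {
    const std::size_t N = x.size(), M = h.size();
    assert(M >= 1);
    std::vector<int> y(N + M - 1, 0);
    for (std::size_t r = 0; r < N + M - 1; ++r)
        for (std::size_t k = 0; k < M; ++k) {
            const std::size_t j = r + k;  // index into the virtual x'
            if (j >= M - 1 && j - (M - 1) < N) y[r] += h[k] * x[j - (M - 1)];
        }
    return y;
}
```

For h = < 5 6 7 > and the assumed x, both functions return < 7 20 38 56 39 20 >, whose first three entries match Y above.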

Typical C++ Operator Overloading
Example: A=B+C vector add; 2 temporary vectors created
1. Main passes B and C references (B&, C&) to operator+
2. operator+ creates a temporary result vector
3. operator+ calculates the results and stores them in the temporary
4. operator+ returns a copy of the temporary
5. Main passes the result reference to operator=
6. operator= performs the assignment

Additional memory use
• Static memory
• Dynamic memory (also affects execution time)

Additional execution time
• Cache misses/page faults
• Time to create a new vector
• Time to create a copy of a vector
• Time to destruct both temporaries
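The cost of these temporaries can be observed with a small counting class. This is a hypothetical sketch, not code from the presentation: `Vec` and `allocationsDuringAdd` are illustrative names, and the class deliberately has copy operations only, to mimic 2002-era C++. Current compilers are allowed to elide the copy in step 4 (named return value optimization), so the sketch only guarantees that at least one temporary vector is allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-C++11-style vector: copy operations only, plus a static
// counter of the vectors it allocates.
class Vec {
public:
    static int allocations;

    explicit Vec(std::size_t n) : data_(n) { ++allocations; }
    Vec(const Vec& other) : data_(other.data_) { ++allocations; }  // deep copy
    Vec& operator=(const Vec& other) {  // step 6: copy into existing storage
        assert(size() == other.size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = other.data_[i];
        return *this;
    }
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<double> data_;
};

int Vec::allocations = 0;

// Steps 2-4: allocate a temporary, fill it, and return it by value.
Vec operator+(const Vec& b, const Vec& c) {
    assert(b.size() == c.size());
    Vec temp(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) temp[i] = b[i] + c[i];
    return temp;  // copied into a second temporary unless the copy is elided
}

// Runs A = B + C and reports how many vectors that statement allocated.
int allocationsDuringAdd(std::size_t n) {
    Vec a(n), b(n), c(n);
    for (std::size_t i = 0; i < n; ++i) { b[i] = 1.0 * i; c[i] = 10.0 * i; }
    const int before = Vec::allocations;
    a = b + c;
    assert(n == 0 || a[n - 1] == 11.0 * (n - 1));
    return Vec::allocations - before;
}
```

The count is 1 when the compiler elides the copy and 2 when it does not; in both cases the sum is written to a temporary and then copied again into A.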

C++ Expression Templates and PETE
Example: A=B+C with expression templates; parse trees, not vectors, are created
1. Main passes B and C references to operator+
2. operator+ creates an expression parse tree: a BinaryNode with operator + and children B& and C& (the tree's structure is encoded in its expression type)
3. operator+ returns (a copy of) the expression parse tree
4. Main passes a reference to the expression tree to operator=
5. operator= calculates the result and performs the assignment A = B+C

Reduced memory use
• Parse tree only contains references

Reduced execution time
• Better cache use
• Loop fusion style optimization
• Compile-time expression tree manipulation

• PETE, the Portable Expression Template Engine, is available from the Advanced Computing Laboratory at Los Alamos National Laboratory (PETE: http://www.acl.lanl.gov/pete)
• PETE provides:
  – Expression template capability
  – Facilities to help navigate and evaluate parse trees
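A stripped-down expression template illustrates the mechanism. This is a hypothetical sketch in the spirit of PETE, not PETE's actual API: `Leaf`, `BinaryNode`, and `Plus` are illustrative names. operator+ only builds a small tree that refers to its operands; the arithmetic happens once, in a single fused loop inside `Vec::operator=`. Nested nodes are stored by value and vectors by reference, so the tree stays valid for the whole assignment statement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Vec {
public:
    explicit Vec(std::size_t n) : data_(n) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }

    // Steps 4-5: receive the parse tree and evaluate it in one loop,
    // writing straight into this vector (no temporary vector).
    template <class Expr>
    Vec& operator=(const Expr& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = e[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Leaf of the parse tree: holds only a reference to an existing vector.
struct Leaf {
    const Vec& v;
    double operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }
};

struct Plus {
    static double apply(double a, double b) { return a + b; }
};

// Interior node: children are leaves (references) or other small nodes,
// so copying a node never copies array data.
template <class Op, class L, class R>
struct BinaryNode {
    L left;
    R right;
    double operator[](std::size_t i) const { return Op::apply(left[i], right[i]); }
    std::size_t size() const { return left.size(); }
};

// Steps 1-3: operator+ only builds and returns the parse tree.
inline BinaryNode<Plus, Leaf, Leaf> operator+(const Vec& b, const Vec& c) {
    return {Leaf{b}, Leaf{c}};
}

template <class Op, class L, class R>
BinaryNode<Plus, BinaryNode<Op, L, R>, Leaf> operator+(const BinaryNode<Op, L, R>& lhs,
                                                       const Vec& rhs) {
    return {lhs, Leaf{rhs}};
}
```

With this sketch, A = B + C + D runs one loop computing B[i] + C[i] + D[i] and allocates no temporary vectors, which is the loop-fusion effect listed above.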


Implementing Psi Calculus with Expression Templates
Example: A=take(4,drop(3,rev(B)))

1. Form expression tree:
   take
     4
     drop
       3
       rev
         B
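With expression templates the tree is encoded in C++ types, so the compiler sees the whole expression; this is what can let later steps happen at compile time. A hypothetical sketch follows: the node and builder names are illustrative, not the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical tree nodes; each operation wraps its child by value.
struct Leaf { const std::vector<double>& data; };  // the array B
template <class E> struct Rev  { E child; };
template <class E> struct Drop { std::size_t n; E child; };
template <class E> struct Take { std::size_t n; E child; };

// Builder functions: calling them only assembles the tree.
inline Leaf leaf(const std::vector<double>& b) { return {b}; }
template <class E> Rev<E> rev(E e) { return {e}; }
template <class E> Drop<E> drop(std::size_t n, E e) { return {n, e}; }
template <class E> Take<E> take(std::size_t n, E e) { return {n, e}; }

// The type of take(4, drop(3, rev(B))) spells out the tree shape.
using ExampleTree = Take<Drop<Rev<Leaf>>>;
```

For example, `auto t = take(4, drop(3, rev(leaf(B))));` gives t the type `ExampleTree`, and `t.child.child.child.data` refers back to B.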

2. Add size information, starting at the leaf:
   B: size=10

   rev: size=10

   drop 3: size=7

   take 4: size=4
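The size rules can be attached to the tree nodes. In this hypothetical sketch (illustrative names, not the authors' code) each node derives its size from its child's, reproducing the bottom-up annotation: rev keeps the size, drop n subtracts n, and take n yields n. Here the sizes are computed at run time from B.size(); if B's size were a compile-time constant, for instance a template parameter, the same rules could be evaluated during compilation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nodes that compute their size from their child's size.
struct Leaf {
    const std::vector<double>& data;
    std::size_t size() const { return data.size(); }
};
template <class E> struct Rev {
    E child;
    std::size_t size() const { return child.size(); }  // rev keeps the size
};
template <class E> struct Drop {
    std::size_t n;
    E child;
    std::size_t size() const {  // drop n removes n elements
        assert(n <= child.size());
        return child.size() - n;
    }
};
template <class E> struct Take {
    std::size_t n;
    E child;
    std::size_t size() const {  // take n keeps n elements
        assert(n <= child.size());
        return n;
    }
};

inline Leaf leaf(const std::vector<double>& b) { return {b}; }
template <class E> Rev<E> rev(E e) { return {e}; }
template <class E> Drop<E> drop(std::size_t n, E e) { return {n, e}; }
template <class E> Take<E> take(std::size_t n, E e) { return {n, e}; }
```

For a 10-element B, the nodes of take(4, drop(3, rev(B))) report sizes 10, 7, and 4 from rev upward.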

3. Apply Psi reduction rules, again starting at the leaf:
   B: A[i]=B[i]

   rev: A[i] = B[-i+B.size-1] = B[-i+9]
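The rev rule can be checked against the standard library. A small sketch with hypothetical function names; `std::reverse_copy` serves only as a reference result. Because the index is unsigned here, -i + size - 1 is written as size - 1 - i so that no intermediate value goes negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// rev reduction rule: A[i] = B[-i + B.size - 1].
std::size_t revIndex(std::size_t i, std::size_t size) {
    assert(i < size);
    return size - 1 - i;
}

// Evaluates rev(B) through the index rule, in one pass over B.
std::vector<double> revByIndex(const std::vector<double>& b) {
    std::vector<double> a(b.size());
    for (std::size_t i = 0; i < b.size(); ++i) a[i] = b[revIndex(i, b.size())];
    return a;
}

// Cross-check against the standard library's reversal.
bool revMatchesStd(const std::vector<double>& b) {
    std::vector<double> expected(b.size());
    std::reverse_copy(b.begin(), b.end(), expected.begin());
    return revByIndex(b) == expected;
}
```

For B.size = 10 the rule gives revIndex(i, 10) = 9 - i, matching B[-i+9].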

   drop 3: A[i] = B[-(i+3)+9] = B[-i+6]
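The reduction rules can be combined into evaluation. In this hypothetical sketch (illustrative names), each node's `at(i)` applies its reduction rule and forwards the rewritten index to its child, so evaluating the root reads B directly and builds no intermediate arrays. The take rule is assumed to leave the index unchanged and only restrict i to 0 ≤ i < 4, which follows from take keeping the leading elements.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nodes: at(i) applies each operation's reduction rule and
// forwards the rewritten index to the child.
struct Leaf {
    const std::vector<double>& data;
    std::size_t size() const { return data.size(); }
    double at(std::size_t i) const { return data[i]; }  // A[i] = B[i]
};
template <class E> struct Rev {
    E child;
    std::size_t size() const { return child.size(); }
    // A[i] = child[-i + size - 1], written as size - 1 - i for unsigned i
    double at(std::size_t i) const { return child.at(child.size() - 1 - i); }
};
template <class E> struct Drop {
    std::size_t n;
    E child;
    std::size_t size() const { return child.size() - n; }
    double at(std::size_t i) const { return child.at(i + n); }  // A[i] = child[i + n]
};
template <class E> struct Take {
    std::size_t n;
    E child;
    std::size_t size() const { return n; }
    double at(std::size_t i) const {  // index unchanged; only the bounds shrink
        assert(i < n);
        return child.at(i);
    }
};

inline Leaf leaf(const std::vector<double>& b) { return {b}; }
template <class E> Rev<E> rev(E e) { return {e}; }
template <class E> Drop<E> drop(std::size_t n, E e) { return {n, e}; }
template <class E> Take<E> take(std::size_t n, E e) { return {n, e}; }

// One loop over the result; each iteration reads exactly one element of B.
template <class E>
std::vector<double> evaluate(const E& e) {
    std::vector<double> a(e.size());
    for (std::size_t i = 0; i < a.size(); ++i) a[i] = e.at(i);
    return a;
}
```

For B = < 0 1 … 9 >, evaluating take(4, drop(3, rev(B))) yields < 6 5 4 3 >, i.e. A[i] = B[-i+6] for 0 ≤ i < 4.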

Implementing Psi Calculus with Expression Templates
Recall: Psi reduction for 1-d arrays always yields one or more expressions of the form:
x[i]=y[stride*i+offset], l ≤ i < u
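One way to see why this form results for the example: take, drop, and rev each map an index affinely (i → stride·i + offset), and composing two affine maps gives another affine map, so the whole chain collapses into a single stride and offset. The sketch below covers only operations of this kind. The `Affine` type and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index map i -> stride*i + offset.
struct Affine {
    std::ptrdiff_t stride;
    std::ptrdiff_t offset;
    std::ptrdiff_t operator()(std::ptrdiff_t i) const { return stride * i + offset; }
};

// inner(outer(i)) = s_in*(s_out*i + o_out) + o_in, again of the form stride*i + offset.
Affine compose(const Affine& inner, const Affine& outer) {
    return {inner.stride * outer.stride, inner.stride * outer.offset + inner.offset};
}

// Per-node maps from the example, each from a node's index to its child's index.
Affine revMap(std::ptrdiff_t childSize) {  // A[i] = B[-i + size - 1]
    assert(childSize > 0);
    return {-1, childSize - 1};
}
Affine dropMap(std::ptrdiff_t n) { return {1, n}; }  // A[i] = child[i + n]
Affine takeMap() { return {1, 0}; }                  // A[i] = child[i]; only bounds change
```

Composing rev, drop 3, and take 4 for a 10-element B gives stride -1 and offset 6, the ONF loop A[i] = B[-i+6], 0 ≤ i < 4.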