## Parallel Computing and Parallel Programming - LIP Lisboa

Miguel Afonso Oliveira
Laboratório de Instrumentação e Física Experimental de Partículas (LIP)

LNEC, April 2010

## Outline

1. Parallel Computing
   - What is Parallel Computing?
   - Why do Parallel Computing?
   - Limits of Parallel Computing
2. Parallel Programming Notions
   - Scalability
   - Speedup factor
   - Efficiency
   - Maximum speedup
   - Amdahl's Law
   - Practical Limits: Amdahl's Law versus Reality
   - Networking
3. Parallel Computers
   - Flynn's Taxonomy
   - Memory Model Taxonomy
4. Parallel Programming
   - The Two Extreme Models
   - Parallel Programming: The Real World

## Parallel Computing

### What is Parallel Computing?

Parallel computing is the use of multiple processing units or computers for a common task. Each processing unit works on its own section of the problem, and processing units can exchange information.

[Figure: a problem domain divided into four areas; PU_1, PU_2, PU_3 and PU_4 each work on their own area of the problem.]
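This idea of splitting a domain among processing units can be sketched in a few lines; a minimal illustration (not from the talk) using Python's standard-library process pool, where each worker computes a partial result on its own chunk and the results are then combined:

```python
# Domain decomposition sketch: split the data into one chunk per
# processing unit, let each worker handle its own area of the problem,
# then combine the partial results.
from multiprocessing import Pool

def partial_sum(chunk):
    # Each "PU" works on its own section of the problem.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_workers=4):
    # Split the domain into n_workers roughly equal areas.
    size = (len(data) + n_workers - 1) // n_workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(n_workers) as pool:
        # The workers exchange information only through their results.
        partials = pool.map(partial_sum, chunks)
    return sum(partials)

if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1000))))
```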

### Why do Parallel Computing?

To compute beyond the limits of single-PU systems:
- achieve more performance;
- utilize more memory.

To be able to:
- solve problems that can't be solved in a reasonable time on a single-PU system;
- solve problems that don't fit on a single PU, or even a single system.

So we can:
- solve larger problems;
- solve problems faster;
- solve more problems.

### Limits of Parallel Computing

Theoretical upper limits:
- Amdahl's Law.

Practical limits:
- Load balancing.
- Non-computational sections.

Other considerations:
- Time to develop/rewrite code.
- Time to debug and optimize code.

## Parallel Programming Notions

### Scalability

An imprecise term: it is used to indicate whether an algorithm or a system can be increased in size and, in doing so, obtain increased performance.

### Speedup factor

$$S(p) = \frac{\text{Execution time for best sequential algorithm}}{\text{Execution time using } p \text{ processors}} = \frac{t_s}{t_p}$$

### Efficiency

$$E = \frac{\text{Execution time using one processor}}{\text{Execution time on multiprocessor} \times p} = \frac{t_s}{t_p \times p} = \frac{S(p)}{p}$$
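The two definitions above are simple ratios of measured times; a small helper (illustrative, not from the slides) makes the arithmetic concrete:

```python
# Speedup S(p) = t_s / t_p and efficiency E = S(p) / p,
# computed from measured execution times.
def speedup(t_s, t_p):
    """Speedup factor: sequential time divided by parallel time."""
    return t_s / t_p

def efficiency(t_s, t_p, p):
    """Efficiency: speedup divided by the number of processors."""
    return speedup(t_s, t_p) / p

# Example: a job taking 100 s sequentially and 30 s on 4 processors
# achieves S(4) of about 3.33 and an efficiency of about 0.83.
print(speedup(100.0, 30.0))
print(efficiency(100.0, 30.0, 4))
```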

### Maximum speedup

All parallel codes contain parallel sections and serial sections.

The maximum speedup is usually the linear speedup, reached when the whole code parallelizes perfectly, i.e. $t_p = t_s / p$:

$$S(p) = \frac{t_s}{t_s / p} = p$$

Superlinear speedup, $S(p) > p$, is not theoretically excluded, but is usually due to: a suboptimal sequential algorithm; a unique feature of the parallel architecture; or the non-deterministic nature of the algorithm.

### Amdahl's Law

Derivation: let $f$ be the fraction of the code that is inherently serial, so the sequential time splits as $t_s = f t_s + (1-f) t_s$. On $p$ processors only the parallel part is sped up:

$$S(p) = \frac{t_s}{f t_s + \frac{(1-f)\,t_s}{p}} = \frac{p}{1 + (p-1)f}$$

Corollary:

$$\lim_{p \to \infty} S(p) = \frac{1}{f}$$
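A quick numeric illustration (not from the slides) shows how the corollary bites: even a small serial fraction caps the achievable speedup.

```python
# Amdahl's Law: S(p) = p / (1 + (p - 1) * f), where f is the serial fraction.
def amdahl_speedup(p, f):
    return p / (1 + (p - 1) * f)

# With a 5% serial fraction the speedup saturates well below p:
for p in (2, 16, 256, 65536):
    print(p, round(amdahl_speedup(p, 0.05), 2))

# As p grows, S(p) approaches 1/f = 20, no matter how many processors we add.
```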

### Practical Limits: Amdahl's Law versus Reality

Amdahl's Law provides a theoretical upper limit on parallel speedup, but in reality the situation is even worse due to: load balancing, scheduling, communications, and I/O.

### Networking

The purpose of the interconnecting network is to provide a physical path for memory accesses or for messages. There are several key issues when considering the network:
- Design (mesh, hypercube, crossbar, tree, ...).
- Bandwidth.
- Latency.
- Cost.

A "good" algorithm takes into account the underlying network characteristics.

## Parallel Computers

### Early days: Flynn's Taxonomy

[Figure: Flynn's taxonomy, classifying machines by their instruction and data streams: SISD, SIMD, MISD, MIMD.]

### Recently: Flynn's Taxonomy

Within MIMD, two programming styles are distinguished:
- SPMD: Single Program, Multiple Data.
- MPMD: Multiple Program, Multiple Data.
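The SPMD style can be sketched in plain Python (an illustration, not an actual MPI launcher): every worker runs the same function, and only its rank differs, selecting which part of the data it works on.

```python
# SPMD sketch: the same program runs in every process; the rank
# decides which slice of the data each process handles.
from multiprocessing import Process, Queue

def program(rank, nprocs, data, results):
    # One program for all processes: the rank picks this one's slice.
    my_slice = data[rank::nprocs]          # round-robin decomposition
    results.put((rank, sum(my_slice)))

def run_spmd(data, nprocs):
    results = Queue()
    procs = [Process(target=program, args=(r, nprocs, data, results))
             for r in range(nprocs)]
    for p in procs:
        p.start()
    collected = sorted(results.get() for _ in range(nprocs))
    for p in procs:
        p.join()
    return collected

if __name__ == "__main__":
    # Ranks 0 and 1 each sum their own share of the data.
    print(run_spmd(list(range(10)), 2))    # [(0, 20), (1, 25)]
```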

### Memory Model Taxonomy

[Figure: two memory models. Shared memory: CPU_0 … CPU_N access a single memory through a common interconnect. Distributed memory: each CPU_i has its own local memory Mem_i, and the CPU/memory pairs communicate over an interconnect.]

## Parallel Programming

### The Two Extreme Parallel Programming Models

- Shared memory → shared-memory programming: OpenMP.
- Distributed memory → message-passing programming: MPI.

The distributed-memory model can be used directly on a shared-memory system. Using the shared-memory model on a distributed-memory system is only possible indirectly. Both models can be combined to optimize performance.
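The contrast between the two models can be sketched with Python stand-ins (illustrative only, not OpenMP or MPI themselves): in the shared-memory style, workers update one common variable under a lock; in the message-passing style, workers hold no shared state and exchange explicit messages.

```python
# Two styles of summing a list in parallel.
from threading import Thread, Lock
from multiprocessing import Process, Pipe

def shared_memory_sum(data, n_workers=2):
    """Shared-memory style (cf. OpenMP): threads accumulate into one variable."""
    total = [0]
    lock = Lock()
    def worker(chunk):
        s = sum(chunk)
        with lock:                 # like a critical section
            total[0] += s
    size = (len(data) + n_workers - 1) // n_workers
    threads = [Thread(target=worker, args=(data[i:i + size],))
               for i in range(0, len(data), size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

def _mp_worker(conn):
    chunk = conn.recv()            # receive work (cf. a message receive)
    conn.send(sum(chunk))          # send the partial result back
    conn.close()

def message_passing_sum(data, n_workers=2):
    """Message-passing style (cf. MPI): no shared state, only messages."""
    size = (len(data) + n_workers - 1) // n_workers
    conns, procs = [], []
    for i in range(0, len(data), size):
        parent, child = Pipe()
        p = Process(target=_mp_worker, args=(child,))
        p.start()
        parent.send(data[i:i + size])
        conns.append(parent)
        procs.append(p)
    result = sum(c.recv() for c in conns)
    for p in procs:
        p.join()
    return result

if __name__ == "__main__":
    data = list(range(100))
    print(shared_memory_sum(data), message_passing_sum(data))  # 4950 4950
```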

### Parallel Computers and Parallel Programming: The Real World

[Figure: a map of real-world options, layered by hardware, operating system, software, and library/language.]

Homogeneous systems:
- Distributed memory: MPI, PVM.
- Distributed Global Address Space (DGAS): Intel Cluster OpenMP, over RDMA interconnects such as Myrinet and Infiniband.
- Partitioned Global Address Space (PGAS): CAF, HPF, UPC, Chapel, SUN Fortress, IBM X10.
- Shared memory: OpenMP.

Heterogeneous systems:
- CPU+GPU: CUDA, OpenCL.
- CPU+Coprocessor.
- CPU+FPGA.