NASA Contractor Report 187634, ICASE Report No. 91-72

VIENNA FORTRAN - A FORTRAN LANGUAGE EXTENSION FOR DISTRIBUTED MEMORY MULTIPROCESSORS

Barbara Chapman
Piyush Mehrotra
Hans Zima

Contract No. NAS1-18605
September 1991

Institute for Computer Applications in Science and Engineering (ICASE)
NASA Langley Research Center
Hampton, Virginia 23665-5225

Operated by the Universities Space Research Association

National Aeronautics and Space Administration
Langley Research Center
Hampton, Virginia 23665-5225

VIENNA FORTRAN - A FORTRAN LANGUAGE EXTENSION FOR DISTRIBUTED MEMORY MULTIPROCESSORS*

Barbara Chapman†   Piyush Mehrotra‡   Hans Zima†

Abstract

Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the placement of data. In this paper, we present the basic features of Vienna Fortran along with a set of examples illustrating the use of these features.
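To make the abstract's central idea concrete (programs use global indices while the user fixes where each element lives), here is a minimal Python sketch of the lookup that a compiler or run-time system has to perform for a one-dimensional array declared with a BLOCK distribution. It assumes 1-based Fortran indexing and a block size of ceil(N/P), the usual convention; the function names `block_owner` and `block_range` are hypothetical, and the paper's exact treatment of uneven splits may differ.

```python
import math


def block_owner(i: int, n: int, p: int) -> tuple[int, int]:
    """Map global index i (1..n) of an n-element array distributed BLOCK
    over processors 1..p to (owning processor, local index on that processor)."""
    if not 1 <= i <= n:
        raise IndexError(f"global index {i} outside 1..{n}")
    b = math.ceil(n / p)       # assumed block size: ceil(n/p)
    proc = (i - 1) // b + 1    # which block, hence which processor (1-based)
    local = (i - 1) % b + 1    # offset inside that block (1-based)
    return proc, local


def block_range(proc: int, n: int, p: int) -> range:
    """Global indices owned by processor proc; empty if n is too small to reach it."""
    b = math.ceil(n / p)
    lo = (proc - 1) * b + 1
    hi = min(proc * b, n)
    return range(lo, hi + 1)


if __name__ == "__main__":
    # A(100) DIST(BLOCK) on 8 processors
    for i in (1, 13, 14, 100):
        print(f"A({i}) -> (processor, local index) = {block_owner(i, 100, 8)}")
```

With N = 100 and P = 8 the block size is 13, so the first seven processors each hold 13 elements and the last one holds the remaining 9. The user program only ever writes A(I); the mapping above is what the implementation hides.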

*The work described in this paper is being carried out as part of the research project "Virtual Shared Memory for Multiprocessor Systems with Distributed Memory", funded by the Austrian Research Foundation (FWF) under grant number P7576-TEC, and the ESPRIT project "An Automatic Parallelization System for Genesis", funded by the Austrian Ministry for Science and Research (BMWF). This research was also supported by the National Aeronautics and Space Administration under NASA contract NAS1-18605 while the authors were in residence at ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23666. The authors assume all responsibility for the contents of the paper.

†Department of Statistics and Computer Science, University of Vienna, Rathausstrasse 19/II/3, A-1010 Vienna, AUSTRIA

‡ICASE, MS 132C, NASA Langley Research Center, Hampton, VA 23666, USA

1 Introduction

In recent

years,

troduced

into

a number the

market

systems).

In contrast

build

are

and

of distributed (e.g.

programming

the

between

the communication

of data

has

shown

the

memory

programming

The

apparent

to a number range

providing

(e.g.

Research totype

the

ory

in the

code

to be written but

distribution

(Single

Program

local

non-local

is then

used

Multiple

multiprocessor. into

the

The

and

references code.

combining

statements

In this called machines critical mapping specifying so that of data. the

paper,

Vienna

parallel

data

for performance, of data data it has

the

a fixed

user

by Vienna

as easy

the

analyzes

in the

the

based

on

by inserting

the

communication data

at the

allows

the

user

only.

earliest

described Vienna

including with

another

mappings.

is to make

as possible,

Fortran

without

the

be specified The

transition sacrificing

overall

where

from

the

performance.

references

the

user.

distributed to the

in the by

range

memory processors

is the

of facilities

for

one array

is mapped

redistribution

manner,

language

sequential

77,

control

dynamic

of the

The

for FORTRAN

in a simple aim

an SPMD memory

to explicitly

a wide

It also supports

into

This

in time.

of data

supports

data.

in particular

extension

by alignment,

array. can

point

for

mem-

statements

possible

the user

systems

data

by

possible,

permit

of pro-

distributed

where

distribution

here

distribution

distributions

complex

the

code

global

programs

system

These

program's

specified

language to write

Since

operating

message-passing

is optimized

led

efforts

do on a shared

translating

appropriate

These

[16].

target

distributions

have

of a number

the

on the

code,

distributed

parallelization.

of the

of restructuring

source

associated

Experience

makes

and

as one would

for execution

to

synchronization

machines.

MIMDizer

distribution

process

based

paradigm

development

references, the

such

the

in-

experimentation.

to automatic

[6, 26], and

data

program

processors.

occurring

Fortran

resulted

to specify

to guide

relationship

permits

has

global

references

distributions,

Frequently

version

areas

the extensions

onto

only

mechanisms

resolution),

a machine-independent

which

global

conflict

by sending

we present

full language

provided

and

Fortran,

using

and

the

algorithm.

not

inhibits

of hardware

[10], SUPERB

the

also

details

on

references

Finally,

but

by the

memory

are satisfied

generated

low level

of the

shared

combination

compiler

as dictated

such prone

details

simulating

Data)

non-local

processors

transputer

However,

programming

using

require

complete

been

are less expensive

of processors.

to specify

error

architectures

memory

of these

as Kali

number

have

several

a shared

paging

last

these

and

of using

a suitable

such

machine,

data

at

automatic

systems,

allow

and

advantages

user

computers

NCUBE,

machines,

to provide

tedious

of attempts

from

support

user

multiprocessing

series,

to a large

requires

and

forcing

iPSC

memory

scalable

paradigm

that

Intel's

to shared

potentially

memory

whereas extensions

algorithm

to a

In this guage.

paper,

we concentrate

Future

compilation

model.

plements

their

is more This

papers

will The

describing

a full

in Section

by a discussion

the

description

section

introduces

by several

illustrated

is followed

give

next

description

fully

on

and

semantics

of the

of both

the

language

features

the

short

examples.

3, where

three

of related

syntax

work.

Vienna

Fortran

The

use

examples Finally,

in the

last

sup-

constructs

and

section

the

and

language

presented

lan-

and

extensions

of the

are

basic

discussed.

we reach

some

conclusions.
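The compilation scheme outlined in the introduction (the compiler analyzes a program written with global references, produces SPMD code, and inserts communication wherever non-local data is referenced) can be illustrated with a small planning sketch. It assumes the owner-computes rule (each processor executes the iterations whose left-hand-side element it owns) and BLOCK distribution of both arrays; the loop A(I) = B(I-1), I = 2..N, and the helper names are hypothetical examples rather than code from the paper.

```python
import math


def owner(i: int, n: int, p: int) -> int:
    """Processor (1..p) owning global element i under BLOCK with block size ceil(n/p)."""
    return (i - 1) // math.ceil(n / p) + 1


def owned(proc: int, n: int, p: int) -> range:
    """Global indices owned by processor proc under the same BLOCK rule."""
    b = math.ceil(n / p)
    return range((proc - 1) * b + 1, min(proc * b, n) + 1)


def spmd_plan(n: int, p: int) -> dict[int, tuple[list[int], dict[int, list[int]]]]:
    """Plan the global loop  A(I) = B(I-1), I = 2..N  for BLOCK-distributed A and B.

    For each processor: the iterations it executes (owner-computes on A) and
    the non-local elements of B it must receive, grouped by sending processor.
    """
    plan = {}
    for q in range(1, p + 1):
        iters = [i for i in owned(q, n, p) if i >= 2]
        recv: dict[int, list[int]] = {}
        for i in iters:
            j = i - 1                  # global index of the B element read
            src = owner(j, n, p)
            if src != q:               # non-local reference: a message is needed
                recv.setdefault(src, []).append(j)
        plan[q] = (iters, recv)
    return plan


if __name__ == "__main__":
    for q, (iters, recv) in spmd_plan(16, 4).items():
        print(f"processor {q}: iterations {iters}, receives {recv}")
```

For N = 16 on 4 processors, processors 2, 3 and 4 each need exactly one element of B from their left neighbour (B(4), B(8) and B(12)); this boundary exchange is the kind of message a compiler would generate in place of the user's global reference.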

2

The

The

critical

across the

the

distributed

processors

of the

machine.

of the

hardware

and

provides

trol over

the

nisms

data

the

characteristics

simple

2.1

basic

model,

Vienna

is specified

Fortran

arrays

where

above the

Processor as shown

be a constant processors the

number

arrays

user

subsets

include

available,

user

language

to handle

the

provides

more

of processors.

of

Vienna

complete

con-

extensions

can

most

frequently

general

In this together

size

details

provided.

the

the full language,

the

for

mecha-

section,

with

we

a formal

available

declares

value

R)

keywords

a set

be declared

ASSERT

of processors

in the

to be greater

is determined

and

R .GE.

a two dimensional

in the underlying

of processors

can

requires

onto

program

which

in a manner

the

data

similar

to

here:*

of R is asserted whose

*In this paper,

These

of data

is crucial

choice

allow

set which

constructs;

the

mechanisms which

the

distribution

of processors

program.

allow

arbitrary

model

P2D(R,

statement

value

in the

distribution

in [27].

programming

PROCESSORS The

number

an extended

of language

influencing

constructs

which

onto

data

communication

structures

structures set

the

set of language

and

factors

is the

Structure

can be distributed. Fortran

code;

of features

machines

An appropriate

of the

data

Fortran

memory

patterns,

distributions

Processor

The

set

data

only the

programming

access

of the

a basic

for mapping

describe

parallel

the

an extensive

into but

target

resulting

mapping

be divided occurring

of Vienna

in programming

application,

Fortran

Features

issue

performance

of an the

Basic

thus

avoids

in code segments

processor

than

at load

machine.

8 array,

or equal time

This

are emboldened

with

Here,

R is considered

depending

allows

recompilation

to 8.

P2D,

the

if the while

code number

on

the

R _ processors

total

number

to be parameterized of processors

comments

to of by

change.

are in italics.

The

processor

bounds

can be compile-time

constants

if the code

has to run on a fixed

number

of processors. The

R 2 processors

introduced

as a two-dimensional defined

array

individual

topology:

two-dimensional The

processor

secondary

the

P1D

The

number

between

not

imply

are

way

a wella specific

connected

by a

arrays

static

can

This

and the

of distributed may

processor

both

(I:$NP).

with

of a

Thus,

built-in

yields

the

total

An implicit

each

program.

are implicitly

If

declared

if the program

of processors

can

the

execution.

provided

number

declaration

view

to establish

with

$NP

current

$PL

range

maximum

be annotated

dynamic.

of the

The

of the program

strict

highly

efficient of arrays

arrays, optimize

i.e., the

to specify

arrays former

have

in Vienna

Arrays

currently

as only

available

be omitted.

arrays

changed

during

code

for the

for which

whose

implementation

of its elements

can

be

of arrays

distribution

DYNAMIC

divided

may

distribution

into

whose

the

two

cat-

distribution

in the

remains facilitates

onto

be modified

as discussed

execution,

static the

is

during next

sub-

and

others

compiler's

task

machine.

no distribution

is conceptually

Fortran

whose

program target

distribution

consists

to be declared

between

there

the

category

declaration.

separation maybe

semantics

the

the

and

is used

function

is also $P

alternative

associated

intrinsic

with the subscript

Distributed

scope

distribution

then

di-

structures.

during

then

of the

to lower

P1D(R*R)

an

ordering

is implicitly

$PL(I:$NP)

array

RESHAPE

provides

processor

program

declaration,

8

which

parameterless

by the

be reshaped

of Arrays

processors.

of generating

compiler

machine,

to the

declaration

processor

declaration

execution

The

does

can

column-major

of a program

used

a one-dimensional

within

whose

structure

identical

Distribution

section.
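The relationship between a two-dimensional processor array and a one-dimensional view of the same processors (for example P2D(R, R) viewed as P1D(R*R), or the implicit $PL(1:$NP)) rests on a fixed linear ordering of the processors. The sketch below assumes the column-major order that Fortran uses for array storage, which the text mentions; `col_major_position` and `col_major_index` are hypothetical helpers, shown here for a processor array of shape (3, 7).

```python
def col_major_position(idx: tuple[int, ...], shape: tuple[int, ...]) -> int:
    """1-based column-major position of processor idx in an array of the given shape."""
    pos, stride = 1, 1
    for i, extent in zip(idx, shape):
        if not 1 <= i <= extent:
            raise IndexError(f"index {idx} outside shape {shape}")
        pos += (i - 1) * stride   # first subscript varies fastest
        stride *= extent
    return pos


def col_major_index(pos: int, shape: tuple[int, ...]) -> tuple[int, ...]:
    """Inverse mapping: 1-based column-major position back to a multi-index."""
    rest = pos - 1
    idx = []
    for extent in shape:
        idx.append(rest % extent + 1)
        rest //= extent
    return tuple(idx)


if __name__ == "__main__":
    shape = (3, 7)   # P2D(3,7) viewed as a one-dimensional array of 21 processors
    for idx in [(1, 1), (3, 1), (1, 2), (3, 7)]:
        print(f"P2D{idx} is position {col_major_position(idx, shape)} of the 1-D view")
```

Because the mapping is a bijection, any data distributed over the two-dimensional structure can equally be described in terms of the one-dimensional one; only the naming of the processors changes.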

determines

processors

above,

array

secondary

structure

2.2

fixed

target

R .GE.

processor

and

being

target

egories,

the

ASSERT

primary

of processors

program's

R)

Fortran

a reference

array

declaration

same

as follows:

The

SP while

on the

An

in the

P2D(3,7)

the

as declared

P2D.

process

one-dimensional

the

the

is no explicit

requires

that

structures

P2D(R,

array

one-dimensional there

can be accessed

for example

that

specify

is a one-dimensional

primary

identifier

not

processor

two-dimensional

relationship

however,

structure,

PROCESSORS Here,

declaration

use of subscripts;

Note,

it does

above

mesh.

primary

mensional

by the

processor.

in particular,

in the

has

been

specified

a single

copy

of the

by replicating

the

data

is the data

structure

same

as that

structure.

The

on each

of the

processors executing the program. It is the compiler's task to maintain consistency among these copies. Note that scalar variables are handled in a similar manner. Static The

Distribution

distribution

laration

of Arrays

of arrays

of the

array.

is specified

arrays,

expression, distribution

Intrinsic as block,

cyclic

distribution

Here

and

rows

The

for distributing

specifying following data

across

)

REAL

C(100)

DIST

( CYCLIC

(U))

REAL

D(100,

the

then

i.e.

- the

rank.

In the

rank

of either

blocks of the

processors. the

are

use of the

array

that D are

columns

the

as shown

dec-

are

language, primary in the

such

that

array

the

left

each

in one

distribution

of which

into disjoint

spec-

sections.

distributions

the

use

array

such

of the

intrinsic

of processors.

elements

) each

B are

cyclically

elision

the

occurring show

distribution

)

( BLOCK,:

processor

distributed

of array assigned

function,

into

to the

dimension blocks

with

owns

a contiguous

cyclically,

C are

denoted

corresponding

partitioned

of distributed

basic

The

blocks

specifies

of the

by

elements

the

the

specified

language,

functions

commonly

the

arrays

a one-dimensional

( CYCLIC

DIST

the

basic

declarations

DIST

the

with

while

is to be partitioned

the

B(100)

100)

All

In the

array

REAL

number

structures

of the

The

ranges

of n distribution

( BLOCK

D shows

processor

block-cyclic.

rank.
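The intrinsic distribution functions discussed here can also be written as ownership rules. This sketch covers CYCLIC, a block-cyclic variant, and a two-dimensional array with an elided dimension, as in DIST ( BLOCK, : ). It assumes 1-based indices, that a block-cyclic distribution with block length K deals out runs of K consecutive elements round-robin, and that ':' marks a dimension that is not distributed; the function names are hypothetical.

```python
import math


def cyclic_owner(i: int, p: int, k: int = 1) -> tuple[int, int]:
    """(processor, local index) of global element i under a cyclic distribution.

    Runs of k consecutive elements are dealt round-robin to processors 1..p;
    k = 1 is plain CYCLIC, k > 1 the block-cyclic variant (assumed semantics).
    """
    blk = (i - 1) // k                        # 0-based number of the run containing i
    proc = blk % p + 1                        # runs go to the processors in turn
    local = (blk // p) * k + (i - 1) % k + 1  # earlier runs on proc, plus offset in run
    return proc, local


def block_elided_owner(i: int, j: int, n_rows: int, p: int) -> int:
    """Owner of D(i, j) for D(n_rows, m) DIST(BLOCK, :).

    Only the first dimension is distributed, so the column index j does not
    influence ownership: each processor holds a band of whole rows.
    """
    b = math.ceil(n_rows / p)
    return (i - 1) // b + 1


if __name__ == "__main__":
    print([cyclic_owner(i, 4)[0] for i in range(1, 9)])       # plain CYCLIC on 4 processors
    print([cyclic_owner(i, 4, 2)[0] for i in range(1, 17)])   # runs of 2 on 4 processors
    print(block_elided_owner(26, 1, 100, 4), block_elided_owner(26, 100, 100, 4))
```

Under plain CYCLIC every processor receives every P-th element, which balances work when cost varies smoothly along the index; the block-cyclic form trades some of that balance for longer contiguous runs, and the elided dimension keeps whole rows local to one processor.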

DIST

and

the

same

consists

for

subscript

distributed.

A(100)

across

array,

array

are

REAL

while

the

the

dimension

A is partitioned

fashion,

associated

arrays

have

provided

functions

elements,

array

are

their

the

must

corresponding

functions

K

how

for a n-dimensional

ifies how the

size

specifies

declaration

expression

expression

DISTdex

adi, 1 __ i