NASA Contractor Report 187634
ICASE Report No. 91-72

VIENNA FORTRAN - A FORTRAN LANGUAGE EXTENSION FOR DISTRIBUTED MEMORY MULTIPROCESSORS

Barbara Chapman
Piyush Mehrotra
Hans Zima

Contract No. NAS1-18605
September 1991

Institute for Computer Applications in Science and Engineering
NASA Langley Research Center
Hampton, Virginia 23665-5225

Operated by the Universities Space Research Association

National Aeronautics and Space Administration
Langley Research Center
Hampton, Virginia 23665-5225
VIENNA FORTRAN - A FORTRAN LANGUAGE EXTENSION FOR DISTRIBUTED MEMORY MULTIPROCESSORS*

Barbara Chapman†   Piyush Mehrotra‡   Hans Zima†

Abstract

Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna Fortran is a language extension of Fortran which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna Fortran are written using global data references. Thus, the user has the advantages of a shared memory programming paradigm while explicitly controlling the placement of data. In this paper, we present the basic features of Vienna Fortran along with a set of examples illustrating the use of these features.
*The work described in this paper is being carried out as part of the research project "Virtual Shared Memory for Multiprocessor Systems with Distributed Memory" funded by the Austrian Research Foundation (FWF) under the grant number P7576-TEC and the ESPRIT project "An Automatic Parallelization System for Genesis" funded by the Austrian Ministry for Science and Research (BMWF). This research was also supported by the National Aeronautics and Space Administration under NASA contract NAS1-18605 while the authors were in residence at ICASE, Mail Stop 132C, NASA Langley Research Center, Hampton, VA 23666. The authors assume all responsibility for the contents of the paper.

†Department of Statistics and Computer Science, University of Vienna, Rathausstrasse 19/II/3, A-1010 Vienna, AUSTRIA

‡ICASE, MS 132C, NASA Langley Research Center, Hampton, VA 23666, USA
1 Introduction

In recent years, a number of distributed memory multiprocessing systems have been introduced into the market (e.g., Intel's iPSC series, NCUBE, transputer systems). In contrast to shared memory architectures, these machines are less expensive to build and are potentially scalable to a large number of processors. However, programming them with the message-passing paradigm is tedious and error prone: the user is forced to specify low level details such as the distribution of data across the processors and the communication and synchronization between them.

The apparent advantages of the shared memory programming paradigm have led to a number of attempts to provide a shared memory view of distributed memory computers, for example by simulating shared memory with operating system paging or with hardware support. Such approaches, however, give the user no control over the placement of data, which is critical for performance.

Research in the last several years [6, 10, 26] has resulted in a range of systems, such as SUPERB, MIMDizer and the research prototype Kali [16], which allow code to be written using global references, as one would do on a shared memory machine, but require the user to specify the distribution of the program's data. The compiler analyzes the source code and, based on the specified distributions, translates it into an SPMD (Single Program Multiple Data) program for execution on the target multiprocessor. In the generated code, local references are satisfied directly, while non-local references are satisfied by inserting code that sends data between processors; this communication code is then optimized, for example by combining statements.

Experience with these systems has shown that the development of a parallel program is a process of restructuring and experimentation, and that a fixed distribution of data for the overall program frequently does not yield good performance: different areas of the code may require different distributions, as dictated by the algorithm. The user therefore needs to specify complex mappings, including the relationship between the distributions of different arrays, and to change distributions dynamically.

In this paper, we describe Vienna Fortran, a machine-independent language extension for FORTRAN 77 which allows the user to write programs for distributed memory systems using global references only. Its aim is to make the transition from sequential or shared memory programming as easy as possible, without sacrificing performance. Vienna Fortran provides a wide range of facilities for the mapping of data: distributions can be specified in a simple manner, one array can be mapped to another by alignment, and data can be redistributed dynamically at a particular point in time. The compiler handles the remaining details, such as the insertion of communication.

Here, we concentrate on the basic features of the language; the full language supplements these with more general mechanisms, and future papers will describe the compilation model. The next section introduces the basic language features, describing their syntax and semantics and illustrating them by several short examples. This is followed in Section 3 by three examples which fully illustrate the use of the language constructs. Finally, in the last section, related work is discussed and we reach some conclusions.
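The compile-time approach outlined in this introduction (global references in the source, an SPMD program in which local references are served from local memory and non-local ones by communication) can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the code generated by any of the systems mentioned: a one-dimensional array is block-distributed over p simulated processes, every process computes the elements it owns, and message passing is modelled by a dictionary. The helper names `block_bounds`, `owner` and `run_spmd` are hypothetical.

```python
"""Toy simulation of an SPMD program generated from global references.

Assumptions (illustrative only, not the output of any particular compiler):
a 1-D array is block-distributed over p simulated processes, each process
computes the elements it owns, and messages are modelled by a dictionary.
"""
from collections import defaultdict


def block_size(n, p):
    return -(-n // p)  # ceiling of n / p


def block_bounds(n, p, rank):
    """0-based half-open range [lo, hi) of the block owned by `rank`."""
    lo = min(rank * block_size(n, p), n)
    hi = min(lo + block_size(n, p), n)
    return lo, hi


def owner(n, p, i):
    """Rank of the process that owns global element i (0-based)."""
    return i // block_size(n, p)


def run_spmd(a, p):
    """Compute b[i] = a[i] + a[i-1] (and b[0] = a[0]) with the same code on every process."""
    n = len(a)
    # Each process holds only its own block of `a` in local memory.
    local = {r: {i: a[i] for i in range(*block_bounds(n, p, r))} for r in range(p)}

    # Phase 1: find non-local references; owners send the needed elements,
    # combined into one message per (sender, receiver) pair.
    messages = defaultdict(dict)
    for r in range(p):
        lo, hi = block_bounds(n, p, r)
        for i in range(max(lo, 1), hi):
            src = owner(n, p, i - 1)
            if src != r:
                messages[(src, r)][i - 1] = local[src][i - 1]

    # Phase 2: compute owned elements from local data plus received copies.
    b = [None] * n
    for r in range(p):
        lo, hi = block_bounds(n, p, r)
        received = {}
        for (_sender, receiver), payload in messages.items():
            if receiver == r:
                received.update(payload)
        for i in range(lo, hi):
            if i == 0:
                b[i] = local[r][i]
            elif i - 1 in local[r]:
                b[i] = local[r][i] + local[r][i - 1]  # local reference
            else:
                b[i] = local[r][i] + received[i - 1]  # satisfied by communication
    return b, dict(messages)


b, msgs = run_spmd(list(range(10)), p=3)
print(b)     # [0, 1, 3, 5, 7, 9, 11, 13, 15, 17]
print(msgs)  # {(0, 1): {3: 3}, (1, 2): {7: 7}}
```

Only the element just left of each block boundary is non-local, so each neighbouring pair of processes exchanges a single combined message; with p = 1 no communication is needed and the result is unchanged.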
2 The Basic Features of Vienna Fortran

The most critical issue in programming distributed memory machines is the distribution of data across the processors of the target machine. An appropriate data distribution, which matches the data access patterns occurring in the application and thus avoids unnecessary communication, is crucial for performance; the choice of distribution is influenced by several factors, including the characteristics of the application and of the underlying hardware.

Vienna Fortran provides an extensive set of language constructs which give the user control over the mapping of data structures onto the processors. This set can be divided into a basic set of features, which allow the user to specify the most frequently occurring distributions in a simple manner, and more general mechanisms, which allow arbitrary distributions to be specified. In this section, we describe only the basic features, together with the programming model on which they are based. A complete description of the full language, including a formal description of its semantics, is available in [27].

2.1 Processor Structures

In Vienna Fortran, the user can declare one or more arrays of processors with the keyword PROCESSORS, in a manner similar to the declaration of Fortran arrays. The size of a processor array need not be a constant; it can depend on the number of processors available at load time, as shown here:*

      PROCESSORS P2D(R, R)
      ASSERT (R .GE. 8)

Here, P2D is declared as a two-dimensional R × R array of processors, where the value of R is determined at load time and is asserted to be greater than or equal to 8. This allows the code to be parameterized by the total number of processors, thus avoiding recompilation if the number of processors changes. The processor bounds can be compile-time constants if the code has to run on a fixed number of processors.

*In this paper, keywords are emboldened while comments are in italics.
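The effect of the load-time parameterization described above can be modelled in a few lines of Python. This is a hypothetical sketch: the function name `declare_p2d` is invented for illustration, the sketch simply takes R as an argument instead of obtaining it from the loader, and treating a failed ASSERT as a `ValueError` is an assumption, since the text does not say what happens when an assertion fails.

```python
from itertools import product


def declare_p2d(r):
    """Model PROCESSORS P2D(R, R) with ASSERT (R .GE. 8), R bound at load time.

    Returns the 1-based (i, j) subscripts of all R*R processors.  A failed
    ASSERT is modelled as a ValueError; that choice is an assumption.
    """
    if r < 8:
        raise ValueError(f"ASSERT (R .GE. 8) violated: R = {r}")
    return list(product(range(1, r + 1), repeat=2))


# The same code serves machines of different sizes without recompilation.
for r in (8, 10):
    print(r, len(declare_p2d(r)))  # prints "8 64", then "10 100"
```

The point of the sketch is that nothing in `declare_p2d` depends on a particular machine size; only the value supplied for R changes between runs.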
The R² processors introduced by this declaration are defined as a two-dimensional array. Note, however, that this declaration does not imply a specific topology: the individual processors are not assumed to be connected by a two-dimensional mesh, and the way in which the processors of the target machine are actually connected is not visible in the program.

If no processor declaration is provided in a program, a one-dimensional processor array $PL(1:$NP) is implicitly declared, where the built-in parameterless intrinsic function $NP yields the total number of processors available for the execution of the program.

The user can establish an alternative view of the processors by annotating a processor declaration with RESHAPE:

      PROCESSORS P2D(R, R) RESHAPE P1D(R*R)
      ASSERT (R .GE. 8)

Here, P2D is the primary processor structure of the program, while P1D is a secondary processor structure: a one-dimensional array consisting of the same R*R processors. The relationship between the two structures is implicitly determined by the column-major ordering of array elements in Fortran; for example, P2D(3,7) and P1D(3 + 6*R) denote the same processor. Individual processors are thus accessed by the use of subscripts, in either view.
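The column-major correspondence between P2D(R, R) and its secondary view P1D(R*R) can be written out explicitly. The Python sketch below is illustrative only: the function names `p2d_to_p1d` and `p1d_to_p2d` are hypothetical, and indices are 1-based to match the Fortran declarations.

```python
def p2d_to_p1d(i, j, r):
    """Position in P1D(1:R*R) of processor P2D(i, j), using Fortran column-major order."""
    if not (1 <= i <= r and 1 <= j <= r):
        raise IndexError("subscript out of range")
    return i + (j - 1) * r


def p1d_to_p2d(k, r):
    """Subscripts (i, j) in P2D(R, R) of processor P1D(k)."""
    if not 1 <= k <= r * r:
        raise IndexError("subscript out of range")
    j_minus_1, i_minus_1 = divmod(k - 1, r)
    return i_minus_1 + 1, j_minus_1 + 1


print(p2d_to_p1d(3, 7, 8))  # 51
print(p1d_to_p2d(51, 8))    # (3, 7)
```

For R = 8, processor P2D(3,7) is therefore P1D(51); the two functions are inverses of each other, so every processor has exactly one position in each view.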
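2.2 Distribution of Arrays

A distribution of an array is a mapping of its elements onto the processors of the target machine. Distributed arrays in Vienna Fortran fall into two categories: statically distributed arrays, whose distribution remains fixed during the execution of the program, and dynamically distributed arrays, which must be annotated with the keyword DYNAMIC and whose distribution may be modified during execution, as discussed in the next subsection. This separation facilitates the compiler's task of generating efficient code, since the distribution of a static array is known throughout its scope.

If no distribution is specified for an array, the array is replicated: conceptually, a copy of the whole data structure is kept on each of the processors executing the program. It is the compiler's task to maintain consistency among these copies. Note that scalar variables are handled in a similar manner.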
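The replication rule can be pictured as one full copy of the array per processor. The Python class below is a hypothetical model, not Vienna Fortran semantics in full: how the compiler actually maintains consistency is not specified in the text, and the sketch simply assumes that every write is applied to all copies (as happens naturally when every processor of an SPMD program executes the same assignment on its own copy).

```python
class ReplicatedArray:
    """Model of an array with no distribution: one full copy per processor.

    Hypothetical illustration: a write is applied to every copy, so that
    reads on any processor see the same value (indices are 0-based here).
    """

    def __init__(self, size, nprocs, init=0.0):
        self.copies = [[init] * size for _ in range(nprocs)]

    def read(self, proc, i):
        # Every processor reads from its own local copy; no communication.
        return self.copies[proc][i]

    def write(self, i, value):
        # Keep all copies consistent by updating each of them.
        for copy in self.copies:
            copy[i] = value

    def consistent(self):
        return all(copy == self.copies[0] for copy in self.copies)


x = ReplicatedArray(size=5, nprocs=4)
x.write(2, 3.5)
print(x.read(3, 2), x.consistent())  # 3.5 True
```

Reads of a replicated array never need communication, which is why replication suits data that is read often and written rarely; the cost is paid on every write.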
Static Distribution of Arrays

The distribution of a statically distributed array is specified in its declaration, by annotating the declaration with the keyword DIST followed by a distribution expression. For an n-dimensional array, the distribution expression DIST (d1, ..., dn) consists of n distribution functions, where each di, 1 ≤ i ≤ n, specifies how the i-th dimension of the array is to be partitioned across the processors; the number of distribution functions must therefore equal the rank of the array. Vienna Fortran provides intrinsic distribution functions for the most commonly used distributions, such as block, cyclic and block-cyclic. The following declarations show the use of these basic distribution functions:

      REAL A(100) DIST ( BLOCK )
      REAL B(100) DIST ( CYCLIC )
      REAL C(100) DIST ( CYCLIC (K) )
      REAL D(100, 100) DIST ( BLOCK, : )

In these declarations, the target processor structure has been omitted, so the arrays are distributed across the primary processor structure of the program. The elements of A are partitioned into blocks of contiguous elements, each processor owning one block. The elements of B are distributed cyclically, i.e., consecutive elements are assigned to consecutive processors in round-robin fashion. C is distributed in a block-cyclic fashion: it is divided into blocks of K elements, which are assigned to the processors cyclically. Finally, the declaration of D shows the use of elision, denoted by ':', in a distribution expression: the first dimension of D is partitioned into blocks, while the second dimension is not distributed, so that each processor owns a contiguous block of rows and every row of D resides entirely on a single processor.
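The four declarations above can be characterized by the processor that owns each element. The Python functions below are a sketch under stated assumptions, not the Vienna Fortran definition: indices and processor numbers are 1-based, the p processors are viewed as a one-dimensional array, BLOCK is assumed to use blocks of ceiling(n/p) elements (the text does not give the rule when p does not divide n), and the block length K = 5 in the usage lines is a hypothetical value.

```python
"""Owner computation for the basic distribution functions (a sketch).

Assumptions: 1-based indices and processor numbers, p processors viewed as a
one-dimensional array, and BLOCK using blocks of ceiling(n / p) elements.
"""


def block_owner(i, n, p):
    """DIST ( BLOCK ): owner of element i of an n-element dimension."""
    size = -(-n // p)  # ceiling(n / p)
    return (i - 1) // size + 1


def cyclic_owner(i, p):
    """DIST ( CYCLIC ): elements are dealt to the processors one at a time."""
    return (i - 1) % p + 1


def block_cyclic_owner(i, k, p):
    """DIST ( CYCLIC (K) ): blocks of k consecutive elements dealt round robin."""
    return ((i - 1) // k) % p + 1


def block_elided_owner(i, j, n, p):
    """DIST ( BLOCK, : ) for D(n, m): only the row index i matters; j is ignored."""
    return block_owner(i, n, p)


p = 4
print([block_owner(i, 100, p) for i in (1, 25, 26, 100)])       # [1, 1, 2, 4]
print([cyclic_owner(i, p) for i in (1, 2, 3, 4, 5)])             # [1, 2, 3, 4, 1]
print([block_cyclic_owner(i, 5, p) for i in (1, 5, 6, 20, 21)])  # [1, 1, 2, 4, 1]
```

With 100 elements on 4 processors, BLOCK gives each processor 25 contiguous elements, CYCLIC spreads neighbouring elements over different processors, and CYCLIC(K) sits between the two; for D, all 100 elements of a row share the owner of that row.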