NASA/CR-1999-209546
ICASE Report No. 99-35

Parallel Implementation of the Discontinuous Galerkin Method

Abdelkader Baggag
Purdue University, West Lafayette, Indiana

Harold Atkins
NASA Langley Research Center, Hampton, Virginia

David Keyes
Old Dominion University, Norfolk, Virginia,
ICASE, Hampton, Virginia, and
Lawrence Livermore National Laboratory, Livermore, California

Institute for Computer Applications in Science and Engineering
NASA Langley Research Center, Hampton, Virginia
Operated by Universities Space Research Association

National Aeronautics and Space Administration
Langley Research Center, Hampton, Virginia 23681-2199

Prepared for Langley Research Center under Contract NAS1-97046

August 1999
PARALLEL IMPLEMENTATION OF THE DISCONTINUOUS GALERKIN METHOD*

ABDELKADER BAGGAG†, HAROLD ATKINS‡, AND DAVID KEYES§

*This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-97046 while Baggag and Keyes were in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681-2199.
†Department of Computer Sciences, Purdue University, 1398 Computer Science Building, West Lafayette, IN 47907-1398 (email: baggag@cs.purdue.edu).
‡Computational Modeling and Simulation Branch, NASA Langley Research Center, Hampton, VA 23681-2199 (email: h.l.atkins@larc.nasa.gov).
§Department of Mathematics & Statistics, Old Dominion University, Norfolk, VA 23529-0162; ICASE, NASA Langley Research Center, Hampton, VA 23681-2199; and ISCR, Lawrence Livermore National Laboratory, Livermore, CA 94551-9989 (email: keyes@k2.llnl.gov).

Abstract. This paper describes a parallel implementation of the discontinuous Galerkin method. The discontinuous Galerkin method is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time-dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, the IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem, due to cache effects.

Key words. discontinuous Galerkin methods, Euler equations, parallelization strategies, object oriented, unstructured grids, high-order accuracy

Subject classification. Computer Science

1. Motivation. The discontinuous Galerkin (DG) method is a robust and compact finite element projection method that provides a practical framework for the development of high-order accurate methods on unstructured grids. The method is well suited for large-scale time-dependent computations in which high accuracy is required. An important distinction between the DG method and the usual finite element methods is that, in the DG method, the equations are local to the generating element; the solution within each element is not reconstructed by looking to neighboring elements. Thus, each element may be thought of as a separate entity that merely needs to obtain boundary data from its neighbors. The compact form of the DG method allows a heterogeneous treatment of problems; that is, the element topology, the degree of approximation, and even the choice of governing equations can vary from element to element and over the course of a calculation without loss of rigor in the method. Many properties of the DG method, such as its stability and accuracy, have been rigorously proven [1, 2, 3, 4, 5] for very general problems, and the method has been shown [6] to be insensitive to the smoothness of the mesh; its accuracy and robustness are retained even on non-smooth grids. The usual explicit time-marching methods, such as Runge-Kutta, can be applied to the DG method in semi-discrete form. These features greatly ease the treatment of complex geometries, are crucial for the accurate and robust simulation of problems such as aeroacoustic scattering, and, combined with the method's compactness, make the DG method well suited for parallel computer platforms.
Parallel implementations of the DG method have been developed previously by other investigators. Biswas, Devine, and Flaherty [7] applied a quadrature-based form of the DG method to a scalar wave equation on a NCUBE/2 hypercube platform, and Bey, Oden, and Patra [8] implemented a parallel hp-adaptive DG method for hyperbolic conservation laws on structured grids using MPI, reporting a 97.57% parallel efficiency on 256 processors. In both works the elements were of a Cartesian type and the parallelization was based on cell sub-division; nearly optimal speedups were obtained when the ratio of interior elements to subdomain-interface elements is sufficiently large.

Quadrature-based implementations of the DG method can be costly in both computation and storage; the quadrature-free form of the method developed by Atkins and Shu [6] has greatly ameliorated these concerns. The code used in this work is a recently developed quadrature-free DG code that solves the unsteady linear Euler equations in two dimensions on a general unstructured mesh of mixed elements (squares and triangles), and it has been previously validated on model problems of aeroacoustic scattering from complex configurations [6, 9, 10]. The code is implemented in an object-oriented form and has been ported, using MPI, to several parallel platforms. In this paper, three different approaches to the parallelization of the DG method are described and evaluated; a detailed description of the code structure and of the parallelization of the numerical algorithm can be found in reference [11], and selected performance results are reported here.
The next section provides a brief description of the numerical method. It is followed by a description of the parallelization strategies, a citation of our standard test case, and performance results of the code on the Origin2000 and several other computing platforms.

2. Discontinuous Galerkin Method.
The DG method is readily applied to any equation of the form

    ∂U/∂t + ∇·F(U) = 0                                            (2.1)

on a domain Ω that has been divided into arbitrarily shaped nonoverlapping elements Ω_i that cover the domain. The method is defined by choosing a set of local basis functions B = { b_l, 1 ≤ l ≤ N(p,d) } for each element, where N is a function of the local polynomial order p and of the number of space dimensions d, and by approximating the solution in the element in terms of the basis set:
    U ≈ V_i = Σ_{l=1}^{N(p,d)} v_{i,l} b_l .                      (2.2)
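For orientation, if the local basis spans the complete polynomial space of degree p (one common choice; the paper's mixed square/triangle elements may use a different span), the basis size is

    N(p,d) = C(p+d, d) = (p+d)! / (p! d!) ,    N(p,2) = (p+1)(p+2)/2 ,

so, for example, a p = 4 basis in two dimensions carries 15 coefficients per equation.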
The governing equation is projected onto each member of the basis set and cast in a weak form to give

    ∂/∂t ∫_{Ω_i} b_k V_i dΩ − ∫_{Ω_i} ∇b_k · F(V_i) dΩ
        + Σ_j ∫_{∂Ω_{ij}} b_k F^R(V_i, V_j) · n̂_{ij} ds = 0 ,     (2.3)

where n̂_{ij} is the unit outward-normal vector on ∂Ω_{ij}, and ∂Ω_{ij} is the segment of the boundary of element Ω_i that is adjacent to the neighboring element Ω_j. The projection generates a set of equations governing the new unknowns, the coefficients v_{i,l}. V_i and V_j denote the traces of the approximate solution on ∂Ω_{ij} taken from within element Ω_i and from the neighboring element Ω_j, respectively, and F^R denotes a numerical flux, usually an approximate Riemann flux of the Lax-Friedrichs type. Because each element has a distinct local approximate solution, the approximate solution is double valued and discontinuous on ∂Ω_{ij}; the approximate Riemann flux F^R(V_i, V_j) resolves the discontinuity and provides the only mechanism by which adjacent elements communicate.
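For reference, a standard form of the Lax-Friedrichs flux mentioned above is

    F^R(V_i, V_j) · n̂ = ½ ( F(V_i) + F(V_j) ) · n̂ − (λ/2) ( V_j − V_i ) ,

where λ is an upper bound on the local characteristic wave speed; this generic form is quoted from common practice rather than from the paper, and any consistent numerical flux that resolves the jump plays the same role.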
The fact that this communication occurs in an edge integral means that the solution in a given element V_i depends only on the edge trace of the neighboring solution V_j, not on the whole of the neighboring solution. Also, because the solution within each element is stored as a function, the edge trace of the approximate solution is obtained without additional approximations.
The DG method is efficiently implemented on general unstructured grids with the quadrature-free formulation developed by Atkins and Shu in [6]. In the quadrature-free formulation, the flux vector F is approximated in terms of the basis set b_l, and the Riemann flux F^R is approximated in terms of a lower dimensional basis set b̃_l defined on the edge:

    F(V_i) ≈ Σ_{l=1}^{N(p,d)} f_{i,l} b_l ,
    F^R(V_i, V_j) ≈ Σ_{l=1}^{N(p,d−1)} f̃^R_l b̃_l .              (2.4)
With these approximations, the volume and boundary integrals can be evaluated analytically, instead of by quadrature, to any order of accuracy, leading to a simple sequence of matrix-vector operations:

    ∂[v_{i,l}]/∂t = (M⁻¹A)[f_{i,l}] − Σ_j (M⁻¹B_{ij})[f̃^R_{ij,l}] ,    (2.5)

where [·] denotes a vector containing the coefficients of an element or edge quantity. The matrices M, A, and B_{ij} depend only on the shape of the element and on the degree p of the solution; thus, the set of matrices associated with a particular similarity element can be precomputed and applied to all elements similar to it, at a considerable savings of both storage and computation. The residual of equation (2.5) is evaluated by the following sequence of operations:

    [ṽ_{j,l}] = T_j [v_{i,l}]                                     ∀ Ω_i ,
    ∂[v_{i,l}]/∂t = (M⁻¹A)[f_{i,l}] − Σ_j (M⁻¹B_j)[f̃^R_{j,l}]    ∀ Ω_i ,

where T_j is the trace operator on edge j, the flux coefficients [f_{i,l}] are evaluated from [v_{i,l}], and the Riemann flux coefficients [f̃^R_{j,l}] are evaluated on each edge from the traces [ṽ_{j,l}] on its two sides. Edges will be referred to as interior edges or boundary edges when it is necessary to distinguish the two.
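To make the element-level work concrete, the following C++ sketch evaluates the right-hand side of equation (2.5) for one element as a pair of dense matrix-vector products. It is a minimal, illustrative sketch only: the SimilarityElement layout, the dense row-major storage, and the routine names are assumptions, not the structure of the authors' code.

    #include <cstddef>
    #include <vector>

    // Precomputed matrices for one similarity element shape: M^-1 A is
    // N x N and M^-1 B_j is N x Ne for each edge j, where N = N(p,d) is
    // the number of volume coefficients and Ne = N(p,d-1) the number of
    // edge coefficients.  (Illustrative layout: dense, row-major.)
    struct SimilarityElement {
        std::size_t N, Ne, nEdges;
        std::vector<double> MinvA;               // N*N entries
        std::vector<std::vector<double>> MinvB;  // nEdges blocks of N*Ne
    };

    // y += sign * (Mat * x), with Mat stored row-major as rows x cols.
    static void gemv(const std::vector<double>& Mat, std::size_t rows,
                     std::size_t cols, const double* x, double sign,
                     double* y) {
        for (std::size_t r = 0; r < rows; ++r) {
            double s = 0.0;
            for (std::size_t c = 0; c < cols; ++c)
                s += Mat[r * cols + c] * x[c];
            y[r] += sign * s;
        }
    }

    // Residual of equation (2.5) for one element:
    //   d[v]/dt = (M^-1 A)[f] - sum_j (M^-1 B_j)[fR_j]
    void residual(const SimilarityElement& e,
                  const std::vector<double>& f,                // [f_{i,l}]
                  const std::vector<std::vector<double>>& fR,  // [fR_{j,l}] per edge
                  std::vector<double>& dvdt) {
        dvdt.assign(e.N, 0.0);
        gemv(e.MinvA, e.N, e.N, f.data(), +1.0, dvdt.data());
        for (std::size_t j = 0; j < e.nEdges; ++j)
            gemv(e.MinvB[j], e.N, e.Ne, fR[j].data(), -1.0, dvdt.data());
    }

Because every element that maps to the same similarity element reuses the same MinvA and MinvB blocks, the precomputation cost is amortized over all similar elements; this is the storage and computation savings noted above.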
3. Parallel Computation. In this section, three different approaches to the parallelization of the DG method are described. The first approach is symmetric and is easily implemented in the serial code, but it results in redundant flux calculations. The second approach groups the elements so that more of the computation is available to hide the communication. The third approach eliminates the redundant flux calculations entirely; however, its communication occurs in two stages, making it more difficult to overlap communication with computation and increasing the complexity of the parallel implementation.

3.1. Parallelization Strategies. The following notation will be used to describe the implementations. Let Ω denote any element, ∂Ω_p denote any edge on the partition boundary, and ∂Ω_I denote any other edge. The first approach can be summarized as follows:
1. Compute [ṽ_{j,l}] and [f̃_{j,l}]   ∀ Ω
2. Send [ṽ_{j,l}] and [f̃_{j,l}] on ∂Ω_p to neighboring partitions
3. Compute [f_l] and (M⁻¹A)[f_l]   ∀ Ω
4. Receive [ṽ_{j,l}] and [f̃_{j,l}] on ∂Ω_p from neighboring partitions
5. Compute [f̃^R_{j,l}] and (M⁻¹B_j)[f̃^R_{j,l}]   ∀ Ω
In this approach, nearly all of the computation is scheduled to occur between the nonblocking send and receive; however, the edge flux [f̃^R_{j,l}] is doubly computed on all partition edges ∂Ω_p. It is observed in actual computations that the complete edge integral calculation requires only 2% to 3% of the total CPU time, and ∂Ω_p represents only a fraction of these edges; thus, the redundant computation is not a significant factor. The above approach offers the potential for further improvement, however: by collecting the elements into groups according to whether or not they are adjacent to a partition boundary, some of the work can be performed between the send and the receive.
Let Ω_p denote any element adjacent to a partition boundary, and Ω_I denote any other element. The following sequence provides maximal overlap of communication and computation:

1. Compute [ṽ_{j,l}] and [f̃_{j,l}]   ∀ Ω_p
2. Send [ṽ_{j,l}] and [f̃_{j,l}] on ∂Ω_p to neighboring partitions
3. Compute [ṽ_{j,l}] and [f̃_{j,l}]   ∀ Ω_I
4. Compute [f_l] and (M⁻¹A)[f_l]   ∀ Ω
5. Compute [f̃^R_{j,l}] and (M⁻¹B_j)[f̃^R_{j,l}]   ∀ ∂Ω_I
6. Receive [ṽ_{j,l}] and [f̃_{j,l}] on ∂Ω_p from neighboring partitions
7. Compute [f̃^R_{j,l}] and (M⁻¹B_j)[f̃^R_{j,l}]   ∀ ∂Ω_p

The redundant flux calculation on the partition boundary remains, but more of the computation is now available to hide the communication time. The redundant flux calculation can also be eliminated entirely by dividing the flux computation between processors according to edge ownership: any edge shared by two processors is said to be "owned" by only one of the two processors, the Riemann flux on a shared edge is computed only by the processor that owns the edge, and the result is communicated to the neighboring processor. Two variations of this alternative strategy are described. For the purpose of illustration, let ownership of all edges shared by processors A and B be given to processor A in the first variation; in the second variation, the edge ownership is divided equally between the two. Let ∂Ω^(a) denote any edge owned by processor A, and ∂Ω^(ab) denote any edge shared by processors A and B that is owned by processor A. Thus { ∂Ω^(ba) } = { 0 } in the first variation, and { ∂Ω^(a) } ∩ { ∂Ω^(b) } = { 0 } in both variations. Both variations can be summarized in the following steps, written here for process A; the steps for process B are identical with a and b interchanged:
 1. Compute [ṽ_{j,l}] and [f̃_{j,l}]   ∀ Ω_p
 2. Send [ṽ_{j,l}] and [f̃_{j,l}]   ∀ ∂Ω_p \ ∂Ω^(a)
 3. Compute [ṽ_{j,l}] and [f̃_{j,l}]   ∀ Ω_I
 4. Compute [f_l] and (M⁻¹A)[f_l]   ∀ Ω
 5. Receive [ṽ_{j,l}] and [f̃_{j,l}]   ∀ ∂Ω_p ∩ ∂Ω^(a)
 6. Compute [f̃^R_{j,l}]   ∀ ∂Ω_p ∩ ∂Ω^(a)
 7. Send [f̃^R_{j,l}]   ∀ ∂Ω_p ∩ ∂Ω^(a)
 8. Compute [f̃^R_{j,l}]   ∀ ∂Ω_I
 9. Compute (M⁻¹B_j)[f̃^R_{j,l}]   ∀ ∂Ω_I
10. Compute (M⁻¹B_j)[f̃^R_{j,l}]   ∀ ∂Ω_p ∩ ∂Ω^(a)
11. Receive [f̃^R_{j,l}]   ∀ ∂Ω_p \ ∂Ω^(a)
12. Compute (M⁻¹B_j)[f̃^R_{j,l}]   ∀ ∂Ω_p \ ∂Ω^(a)

In both variations, there are no redundant flux calculations, and the total amount of data sent is actually less than in the symmetric approach presented earlier; however, the number of sends is twice that of the symmetric approach, and because the communication is performed in two stages, it is more difficult to overlap the communication with computation. Also, the asymmetric form of edge ownership used in the first variation makes it difficult to balance the work that is assigned to each processor.
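A consistent edge ownership only requires a rule that both processors evaluate identically from shared data. One possible rule is sketched below; the paper does not specify how ownership is assigned, so both the rule and the names are illustrative.

    // Returns true if this rank owns the shared edge.  Awarding every
    // shared edge to the lower rank reproduces the one-sided ownership
    // of the first variation; alternating on a globally consistent edge
    // id (below) splits the shared edges roughly in half, as in the
    // second variation.
    bool ownsEdge(int myRank, int neighborRank, long globalEdgeId) {
        const bool iAmLower = myRank < neighborRank;
        const bool evenEdge = (globalEdgeId % 2) == 0;
        return evenEdge ? iAmLower : !iAmLower;
    }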
5. Results. Three fixed problem sizes were used to evaluate the performance and scalability of the parallel code on the SGI Origin2000. A fifth-order method was used in all cases, and the problem size was controlled by varying the element size of the mesh. The computational rate, defined as the number of elements times the number of time steps divided by the wall clock time, is used as the measure of performance. (Note: because the computations on all processors are synchronized at each stage of the Runge-Kutta method, the wall clock time is essentially the same on all processors.) Because the symmetric parallelization strategy introduces a redundant flux calculation whose size depends on the location of the partition boundaries, the usual "speedup" on a fixed-size problem is not a fully accurate measure; the computational rate may be a better indicator. Good scalability is observed for all three problem sizes as the number of processors is increased to 128. For the small and medium size problems the speedup is superlinear: the computational rates show a 50% increase as the number of processors grows. This superlinear speedup occurs because, as a fixed-size problem is divided into smaller and smaller parts, the working sets of the computation eventually become cache-resident. This explanation is supported by Figure 5.1(b), which shows the computational rate as a function of the average number of elements per processor; the improvement occurs at about 2000 elements per processor. The larger problem does not reach this range on the Origin2000 and shows no superlinear speedup; however, it is expected that its performance will improve in the same manner once the number of processors is large enough that its working sets also fit in cache.
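Stated symbolically, with symbols introduced here only to restate the definition above: for N_e elements, N_t time steps, and wall clock time T_w(P) on P processors,

    R(P) = (N_e × N_t) / T_w(P) ,    speedup(P) = R(P) / R(1) ,

so ideal scaling of a fixed-size problem corresponds to R(P) growing linearly in P, and the cache-induced superlinear behavior appears as R(P)/R(1) > P.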
6. Conclusions. Three approaches to the parallelization of the discontinuous Galerkin method have been described. A simple and symmetric approach overlaps most of the computation with communication and is easily implemented, but it introduces a redundant flux calculation on the partition boundaries. The redundant flux calculation can be eliminated by introducing some form of edge ownership; however, the communication must then be performed in two stages with twice the number of sends, the overlap of communication and computation is harder to achieve, and the implementation becomes more complex. The symmetric parallelization approach has been implemented, using MPI, in an object-oriented code that simulates aeroacoustic scattering. The compact form of the DG method, in which the computational work is local to each element, makes the method well suited for distributed-memory platforms and is effectively exploited by the parallel implementation. Performance results presented for several platforms show that the communication overhead is negligible and that the code scales well to large numbers of processors. In practice, superlinear accelerations can occur on fixed-size problems when the work is distributed so that a significant amount of the computation and its working sets become cache resident; the slightly superlinear speedup observed on the SGI Origin is attributed to this cache effect.
REFERENCES

[1] C. Johnson and J. Pitkäranta, An Analysis of the Discontinuous Galerkin Method for a Scalar Hyperbolic Equation, Mathematics of Computation, 46 (1986), pp. 1-26.
[2] B. Cockburn and C. W. Shu, TVB Runge-Kutta Local Projection Discontinuous Galerkin Finite Element Method for Conservation Laws II: General Framework, Mathematics of Computation, 52, No. 186 (1989), pp. 411-435.
[3] B. Cockburn, S. Y. Lin, and C. W. Shu, TVB Runge-Kutta Local Projection Discontinuous Galerkin Finite Element Method for Conservation Laws III: One Dimensional Systems, Journal of Computational Physics, 84, No. 1 (1989), pp. 90-113.
[4] B. Cockburn, S. Hou, and C. W. Shu, The Runge-Kutta Local Projection Discontinuous Galerkin Finite Element Method for Conservation Laws IV: The Multidimensional Case, Mathematics of Computation, 54, No. 190 (1990), pp. 545-581.
[5] G. Jiang and C. W. Shu, On Cell Entropy Inequality for Discontinuous Galerkin Methods, Mathematics of Computation, 62, No. 206 (1994), pp. 531-538.
[6] H. L. Atkins and C. W. Shu, Quadrature-free Implementation of Discontinuous Galerkin Method for Hyperbolic Equations, AIAA Journal, 36 (1998), pp. 775-782.
[7] R. Biswas, K. Devine, and J. Flaherty, Parallel, Adaptive Finite Element Methods for Conservation Laws, Applied Numerical Mathematics, 14, No. 1-3 (1994), pp. 255-283.
[8] K. Bey, J. T. Oden, and A. Patra, A parallel hp-adaptive discontinuous Galerkin method for hyperbolic conservation laws, Applied Numerical Mathematics, 20 (1996), pp. 321-336.
[9] H. L. Atkins, Continued Development of the Discontinuous Galerkin Method for Computational Aeroacoustics Applications, AIAA Paper 97-1581, May 1997.
[10] H. L. Atkins, Local Analysis of Shock Capturing Using Discontinuous Galerkin Methodology, AIAA Paper 97-2032, June 1997.
[11] A. Baggag, H. L. Atkins, C. Özturan, and D. Keyes, Parallelization of an Object-oriented Unstructured Aeroacoustics Solver, in Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, March 22-24, 1999; also ICASE Report No. 99-11.
[12] Second Computational Aeroacoustics (CAA) Workshop on Benchmark Problems, NASA Conference Publication 3352, Proceedings of a workshop sponsored by NASA, June 1997.