NASA/CR-1999-209546
ICASE Report No. 99-35

Parallel Implementation of the Discontinuous Galerkin Method

Abdelkader Baggag
Purdue University, West Lafayette, Indiana

Harold Atkins
NASA Langley Research Center, Hampton, Virginia

David Keyes
Old Dominion University, Norfolk, Virginia; ICASE, Hampton, Virginia; and
Lawrence Livermore National Laboratory, Livermore, California

Institute for Computer Applications in Science and Engineering (ICASE)
NASA Langley Research Center, Hampton, Virginia
Operated by Universities Space Research Association

Prepared for Langley Research Center under Contract NAS1-97046

August 1999



PARALLEL IMPLEMENTATION OF THE DISCONTINUOUS GALERKIN METHOD*

ABDELKADER BAGGAG†, HAROLD ATKINS‡, AND DAVID KEYES§

Abstract. This paper describes a parallel implementation of the discontinuous Galerkin method. The discontinuous Galerkin method is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time-dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.

Key words. discontinuous Galerkin methods, Euler equations, parallelization strategies, object oriented, unstructured grids, high-order accuracy

Subject classification. Computer Science

1. Motivation. The discontinuous Galerkin (DG) method is a robust and compact finite element projection method that provides a practical framework for the development of high-order accurate solutions of time-dependent problems, such as aeroacoustic scattering, in which high accuracy is required. An important distinction between the DG method and the usual finite element methods is that the solution within each element is not reconstructed by looking to neighboring elements; each element can therefore be thought of as a separate entity that merely needs the trace of the neighboring solution on its boundary. The degree of approximation, and even the choice of governing equations, can vary from element to element without loss of rigor. The compact form of the method accommodates elements of arbitrary shape, arbitrary mesh topology, and local mesh refinement, and it allows boundary conditions to be applied without special treatment, which greatly increases the robustness and accuracy of the boundary treatment. Stability and accuracy have been rigorously proven [1, 2, 3, 4, 5] for nonlinear problems in the semi-discrete form, and the accuracy of the method has been shown [6] to be insensitive to the smoothness of the mesh, even near boundaries. These properties are crucial for the robust treatment of complex geometries. In time, the semi-discrete DG method is usually combined with an explicit time-marching method such as Runge-Kutta. One disadvantage of the DG method is its high computational and storage requirements; these concerns have been greatly ameliorated by the quadrature-free formulation of the DG method [6].

Many of the features of the DG method that lead to its accuracy and robustness also make it well suited for parallel implementation: the computation within each element is local, and elements communicate only through the trace of the solution on shared edges.

Parallel implementations of the DG method have been developed previously by other investigators. Biswas, Devine, and Flaherty [7] applied a quadrature-based DG method to a scalar wave equation on an NCUBE/2 hypercube platform and reported a 97.57% parallel efficiency on 256 processors. Bey et al. [8] implemented a parallel hp-adaptive DG method for hyperbolic conservation laws on structured grids. In both works, the grids were of a Cartesian type with cell sub-division. The DG code developed by Atkins [6, 9, 10] implements the quadrature-free form of the DG method and solves the unsteady Euler equations on a general unstructured mesh of mixed elements (squares and triangles) in two dimensions. The code has been validated and applied to the prediction of aeroacoustic scattering from complex configurations. In this work, the code has been ported to parallel platforms using MPI, three different parallelization approaches are studied and evaluated, and parallel performance results obtained with a model problem are reported; nearly optimal speedups are obtained when the problem is sufficiently large. A detailed description of the code structure, the parallelization approach, and the interface to the subdomain-generation routines can be found in reference [11]. The next section provides a brief description of the numerical method and is followed by a discussion of parallelization strategies, a citation of our standard test case, and performance results of the code on the SGI Origin2000 and several other computing platforms.

* This research was supported by the National Aeronautics and Space Administration under NASA Contract No. NAS1-97046 while Baggag and Keyes were in residence at the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, Hampton, VA 23681-2199.
† Department of Computer Sciences, Purdue University, West Lafayette, IN 47907-1398 (email: baggag@cs.purdue.edu).
‡ Computational Modeling and Simulation Branch, NASA Langley Research Center, Hampton, VA 23681-2199 (email: h.l.atkins@larc.nasa.gov).
§ Department of Mathematics & Statistics, Old Dominion University, Norfolk, VA 23529-0162; ISCR, Lawrence Livermore National Laboratory, Livermore, CA 94551-9989; and ICASE, NASA Langley Research Center, Hampton, VA 23681-2199 (email: keyes@k2.llnl.gov).

2. Discontinuous Galerkin Method. The DG method is readily applied to any equation of the form

(2.1)    \frac{\partial U}{\partial t} + \nabla \cdot \vec{F}(U) = 0

on a domain that has been divided into arbitrarily shaped nonoverlapping elements \Omega_i that cover the domain. The DG method is defined by choosing a set of local basis functions \mathcal{B} = \{ b_l, \ 1 \le l \le N(p,d) \} for each element, where N is a function of the order of the local polynomial basis p and the number of space dimensions d, and by approximating the solution in the element in terms of the basis set

(2.2)    V_i \approx U|_{\Omega_i} = \sum_{l=1}^{N(p,d)} v_{i,l} \, b_l .
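To make the expansion (2.2) concrete, the short sketch below shows how a local approximation of this form can be evaluated at a point from its stored coefficients; it is an illustrative example only, and the function name and data layout are assumptions rather than the report's code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: evaluate V_i = sum_l v_{i,l} b_l(x) at one point,
// given the N(p,d) coefficients of the element and the basis functions
// already evaluated at that point.
double evaluate_solution(const std::vector<double>& v,          // coefficients v_{i,l}
                         const std::vector<double>& basis_at_x) // b_l(x), same length
{
    double u = 0.0;
    for (std::size_t l = 0; l < v.size(); ++l)
        u += v[l] * basis_at_x[l];
    return u;
}
```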

The governing equation is projected onto each member of the basis set and cast in a weak form to give

(2.3)    \frac{\partial}{\partial t} \int_{\Omega_i} b_k V_i \, d\Omega \;-\; \int_{\Omega_i} \nabla b_k \cdot \vec{F}(V_i) \, d\Omega \;+\; \int_{\partial\Omega_i} b_k \, \vec{F}^R(V_i, V_j) \cdot \hat{n}_{ij} \, ds \;=\; 0,

where \partial\Omega_{ij} is the segment of the element boundary \partial\Omega_i that is adjacent to the neighboring element \Omega_j, \hat{n}_{ij} is the unit outward-normal vector on \partial\Omega_{ij}, V_j denotes the trace of the approximate solution of the neighboring element on \partial\Omega_{ij}, and \vec{F}^R denotes an approximate Riemann flux, which is usually of the Lax-Friedrichs type. The projection generates, in each element, a set of equations for the new unknowns v_{i,l}. Because each element has a distinct local approximate solution, the approximate solution is double valued and discontinuous on \partial\Omega_{ij}; the approximate Riemann flux \vec{F}^R(V_i, V_j) resolves the discontinuity and provides the only mechanism by which adjacent elements communicate. The fact that this communication occurs in an edge integral means that the solution in a given element V_i depends only on the trace of the neighboring solution V_j, not on the whole of the neighboring solution. Also, because the solution within each element is stored as a function, the edge trace of the solution is obtained without additional approximations.
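The approximate Riemann flux is stated above to be usually of the Lax-Friedrichs type. As a hedged illustration only, the sketch below writes that flux for a scalar one-dimensional case; the scalar setting and the function names are assumptions, not the report's implementation.

```cpp
#include <algorithm>
#include <cmath>

// Minimal scalar Lax-Friedrichs sketch (assumed example):
//   F^R(uL, uR) = 1/2 [ f(uL) + f(uR) - alpha (uR - uL) ],
// where alpha bounds the local wave speed |f'(u)| at the edge.
double lax_friedrichs(double uL, double uR,
                      double (*f)(double),   // physical flux f(u)
                      double (*df)(double))  // wave speed f'(u)
{
    const double alpha = std::max(std::fabs(df(uL)), std::fabs(df(uR)));
    return 0.5 * (f(uL) + f(uR) - alpha * (uR - uL));
}
```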

The DG method is efficiently implemented on general unstructured grids using the quadrature-free formulation developed by Atkins and Shu in [6]. In the quadrature-free formulation, the flux vector \vec{F} is approximated in terms of the basis set b_l, and the Riemann flux \vec{F}^R is approximated in terms of the lower dimensional basis set \tilde{b}_l:

(2.4)    \vec{F}(V_i) \approx \sum_{l=1}^{N(p,d)} \vec{f}_{i,l} \, b_l , \qquad \vec{F}^R(V_i, V_j) \approx \sum_{l=1}^{N(p,d-1)} f^R_{ij,l} \, \tilde{b}_l .

With these approximations, the volume and boundary integrals can be evaluated analytically, instead of by quadrature, to any order of accuracy, leading to a simple sequence of matrix-vector operations

(2.5)    \frac{\partial [v_{i,l}]}{\partial t} = (M^{-1}A)[\vec{f}_{i,l}] \;-\; \sum_j (M^{-1}B_j)[f^R_{ij,l}] ,

where [\,\cdot\,] denotes a vector containing the coefficients of an element or edge quantity. The matrices M, A, and B_j depend only on the shape of the element and the degree of the solution p. Thus, the set of matrices associated with a particular element similarity class can be precomputed and applied to all elements of that class, at a considerable savings of both storage and computation. The residual of equation (2.5) is evaluated by the following sequence of operations:

[\vec{f}_{i,l}] = \vec{F}([v_{i,l}]) \quad \forall\,\Omega_i, \qquad [v_{j,l}] = T_{ij}[v_{i,l}], \qquad [f^R_{ij,l}] = \vec{F}^R([v_{i,l}],[v_{j,l}]), \qquad \frac{\partial [v_{i,l}]}{\partial t} = (M^{-1}A)[\vec{f}_{i,l}] - \sum_j (M^{-1}B_j)[f^R_{ij,l}],

where T_{ij} is the trace operator on edge j. Edges that lie between two elements will be referred to as interior edges, and edges that lie on a boundary as boundary edges, when it is necessary to distinguish the two.
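The matrix-vector structure of equation (2.5) can be illustrated with a small sketch. The code below is an assumed example, not the report's code: the DenseMatrix type, the data layout, and the function names are hypothetical. It accumulates one element's time derivative from precomputed M^{-1}A and M^{-1}B_j matrices.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-element residual sketch for equation (2.5):
//   d[v_i]/dt = (M^{-1}A)[f_i] - sum_j (M^{-1}B_j)[fR_ij].
struct DenseMatrix {
    int rows = 0, cols = 0;
    std::vector<double> a;                                 // row-major storage
    double operator()(int r, int c) const { return a[r * cols + c]; }
};

// y += sign * (M * x)
static void accumulate(const DenseMatrix& M, const std::vector<double>& x,
                       std::vector<double>& y, double sign)
{
    for (int r = 0; r < M.rows; ++r) {
        double s = 0.0;
        for (int c = 0; c < M.cols; ++c) s += M(r, c) * x[c];
        y[r] += sign * s;
    }
}

void element_residual(const DenseMatrix& MinvA,                   // precomputed M^{-1}A
                      const std::vector<DenseMatrix>& MinvB,      // one M^{-1}B_j per edge
                      const std::vector<double>& f,               // volume flux coefficients [f_i,l]
                      const std::vector<std::vector<double>>& fR, // Riemann flux coefficients per edge
                      std::vector<double>& dvdt)                  // output: d[v_i,l]/dt
{
    std::fill(dvdt.begin(), dvdt.end(), 0.0);
    accumulate(MinvA, f, dvdt, +1.0);
    for (std::size_t j = 0; j < MinvB.size(); ++j)
        accumulate(MinvB[j], fR[j], dvdt, -1.0);
}
```

Because M, A, and B_j are shared by all elements of a similarity class, only the coefficient vectors are per-element data, which is the source of the storage savings noted above.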

3. Parallel Computation. In this section, three possible parallelization approaches for the DG method are described. The first approach is symmetric and easy to implement, but it results in redundant flux calculations. The second approach eliminates the redundant flux calculations; however, the communication occurs in two stages, making it more difficult to overlap communication with computation. The third approach also eliminates the redundant flux calculations and is easily implemented in the serial code, but it increases the complexity of the parallel implementation. The following notation will be used to describe the strategies: let \Omega denote any element, \partial\Omega_p denote any edge on the partition boundary, and \partial\Omega_I denote any other edge. The first approach can be summarized as follows:

1. Compute [v_{j,l}] and [f_{j,l}]  \forall\,\Omega
2. Send [v_{j,l}] and [f_{j,l}] on \partial\Omega_p to neighboring partitions
3. Compute [f_l] and (M^{-1}A)[f_l]  \forall\,\Omega, and [f^R_{j,l}] and (M^{-1}B_j)[f^R_{j,l}]  \forall\,\partial\Omega_I
4. Receive [v_{j,l}] and [f_{j,l}] on \partial\Omega_p from neighboring partitions
5. Compute [f^R_{j,l}] and (M^{-1}B_j)[f^R_{j,l}]  \forall\,\partial\Omega_p

In this approach, nearly all of the computation is scheduled to occur between the nonblocking send and receive; however, the edge flux [f^R_{j,l}] is doubly computed on all partition edges \partial\Omega_p. It is observed in actual computations that \partial\Omega_p represents a small fraction of the edges and that the redundant flux calculation accounts for only 2% to 3% of the total CPU time; thus, the redundant calculation is not a significant factor. The above schedule, however, does not distinguish the elements according to whether or not they are adjacent to a partition boundary. By collecting the elements into groups according to whether or not they are adjacent to a partition boundary, some of the work associated with the edge integral calculation can also be performed between the send and the receive, which offers the potential for further improvement. Let \Omega_p denote any element adjacent to a partition boundary and \Omega_I denote any other element. The following sequence provides the maximal overlap of communication and computation (an illustrative MPI sketch of this pattern is given after the list):

1. Compute [v_{j,l}] and [f_{j,l}]  \forall\,\Omega_p
2. Send [v_{j,l}] and [f_{j,l}] on \partial\Omega_p to neighboring partitions
3. Compute [v_{j,l}] and [f_{j,l}]  \forall\,\Omega_I
4. Compute [f_l] and (M^{-1}A)[f_l]  \forall\,\Omega
5. Compute [f^R_{j,l}] and (M^{-1}B_j)[f^R_{j,l}]  \forall\,\partial\Omega_I
6. Receive [v_{j,l}] and [f_{j,l}] on \partial\Omega_p from neighboring partitions
7. Compute [f^R_{j,l}] and (M^{-1}B_j)[f^R_{j,l}]  \forall\,\partial\Omega_p
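A minimal MPI skeleton of this overlap pattern is sketched below. It is illustrative only: the helper routines shown as comments (packing the edge data, computing the interior terms) and the buffer layout are assumptions, not the report's object-oriented code.

```cpp
#include <mpi.h>
#include <vector>

// Illustrative skeleton of the overlapped schedule (assumed helpers, not the
// report's code). One send/receive pair per neighboring partition.
void advance_one_stage(const std::vector<int>& neighbors,
                       std::vector<std::vector<double>>& send_buf,
                       std::vector<std::vector<double>>& recv_buf)
{
    std::vector<MPI_Request> reqs;
    reqs.reserve(2 * neighbors.size());

    // Steps 1-2: compute and post the partition-boundary edge data first.
    for (std::size_t n = 0; n < neighbors.size(); ++n) {
        // pack_edge_data(neighbors[n], send_buf[n]);  // fill [v_j,l], [f_j,l] on shared edges
        MPI_Request r;
        MPI_Irecv(recv_buf[n].data(), (int)recv_buf[n].size(), MPI_DOUBLE,
                  neighbors[n], 0, MPI_COMM_WORLD, &r);
        reqs.push_back(r);
        MPI_Isend(send_buf[n].data(), (int)send_buf[n].size(), MPI_DOUBLE,
                  neighbors[n], 0, MPI_COMM_WORLD, &r);
        reqs.push_back(r);
    }

    // Steps 3-5: interior elements and interior edges are processed while the
    // messages are in flight.
    // compute_interior_volume_terms();
    // compute_interior_edge_fluxes();

    // Step 6: wait for the neighboring edge data.
    MPI_Waitall((int)reqs.size(), reqs.data(), MPI_STATUSES_IGNORE);

    // Step 7: finish the Riemann fluxes on the partition-boundary edges.
    // compute_partition_edge_fluxes(recv_buf);
}
```

Posting the receives before the sends and doing the interior work between them is what allows the communication to be hidden when the interior workload per partition is large enough.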

3.1. Other Parallelization Strategies. Two variations of an alternative approach that eliminates the redundant flux calculation are described next. In these approaches, each partition edge is said to be "owned" by a processor, and any edge shared by two processors is owned by only one of the two. In the first variation, the flux calculation on an edge is performed only by the processor that owns the edge, and the result is communicated to the neighboring processor. In the second variation, the flux calculation on the edges shared by two processors is divided equally between the two. For the purpose of illustration, let ownership of all edges shared by processors A and B be given to processor A. Let \partial\Omega^{(a)} denote any edge owned by processor A, and \partial\Omega^{(ab)} denote any edge shared by processors A and B. Thus \{\partial\Omega^{(ab)}\} \cap \{\partial\Omega^{(b)}\} = \{0\} in the first variation, and \{\partial\Omega^{(a)}\} \cap \{\partial\Omega^{(b)}\} = \{0\} in both variations. Both variations can be summarized by the following sequence of computations, carried out concurrently by processors A and B.

In both variations, each processor first computes [v_{j,l}] and [f_{j,l}] on its elements adjacent to the partition boundary and sends these data on the shared edges that it does not own; it then computes [v_{j,l}] and [f_{j,l}] on the remaining elements and [f_l] on all elements while the messages are in transit. When the neighboring edge data are received, the processor that owns a shared edge (or, in the second variation, each processor for its share of the edges) computes the Riemann flux [f^R_{j,l}] there and sends the result back to its neighbor; the contributions (M^{-1}A)[f_l] and (M^{-1}B_j)[f^R_{j,l}] are accumulated as the fluxes become available, with a final receive supplying the fluxes computed by the neighbor on the edges it owns.

In both variations the number of sends is twice that of the symmetric approach presented earlier, although the total amount of data sent is actually less, and there are no redundant flux calculations. It is more difficult, however, to overlap the communication with computation in this approach because the edge flux calculation is performed in two stages, with the sends and receives interleaved between them. Also, the asymmetric form of edge ownership used in the first variation complicates the balancing of the work that is assigned to each processor.
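The report does not specify how edge ownership is assigned. One simple way to realize the two variations, shown below purely as an assumed illustration, is to decide ownership deterministically from the two partition ranks (first variation) or to split the shared edges evenly between them (second variation).

```cpp
// Assumed illustration of the two ownership variations (not the report's code).

// Variation 1: every shared edge is owned by exactly one of the two ranks,
// e.g., the lower rank.
bool owns_edge_variation1(int my_rank, int neighbor_rank)
{
    return my_rank < neighbor_rank;
}

// Variation 2: the shared edges between two ranks are split evenly,
// e.g., by alternating ownership with the local index of the shared edge.
bool owns_edge_variation2(int my_rank, int neighbor_rank, int shared_edge_index)
{
    const bool lower = my_rank < neighbor_rank;
    return (shared_edge_index % 2 == 0) ? lower : !lower;
}
```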

Three fixed problem sizes were used to evaluate the scalability of the parallelization approaches on the SGI Origin2000; a fifth-order accurate scheme was used, and the problem size was controlled by varying the element size. Performance is reported in terms of the computational rate, defined as the number of elements times the number of time steps divided by the wall clock time. (Note: the wall clock times of all processors are essentially the same because the code is synchronized at each stage of the Runge-Kutta time stepping.) For a fixed-size problem, the computational rate may be a better indicator of performance than the usual "speedup" measure, because the number of elements per processor changes as the number of processors is increased.

The computational rates for all three strategies are shown as a function of the number of processors. For the small and medium size problems, the rates show a 50% increase as the number of processors is increased, indicating superlinear speedup; the increase occurs at about 2000 elements per processor, as shown in Figure 5.1.b. This result is attributed to cache effects: as a fixed-size problem is divided into smaller and smaller parts, the number of elements per processor decreases and the working sets of the computation become cache-resident. No superlinear speedup is observed for the large problem on up to 128 processors; however, it is expected to occur when the number of processors becomes larger. Thus, some of the improvement in the computational rate is due to the location of the working sets in cache rather than to the parallelization strategy.
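For clarity, the computational-rate metric described above reduces to a one-line calculation; the helper below is an assumed illustration of that definition, not code from the report.

```cpp
// Computational rate as described in the text:
//   rate = (elements * time_steps) / wall_clock_seconds,
// with all quantities taken from one run on a given number of processors.
double computational_rate(long elements, long time_steps, double wall_clock_seconds)
{
    return static_cast<double>(elements) * static_cast<double>(time_steps)
           / wall_clock_seconds;
}
```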

6. Conclusions. Three parallelization strategies for the quadrature-free form of the discontinuous Galerkin method have been described. A simple and symmetric strategy has been implemented in an object-oriented, parallel code for computational aeroacoustics that runs on distributed memory platforms using MPI. The compact form of the DG method, in which a significant amount of the computation is local to each element, is effectively exploited to hide the communication by overlapping it with computation; in practice, the communication overhead is negligible. The redundant flux calculation of the symmetric strategy can be eliminated by introducing an additional sequence of sends, but this leads to a more complex implementation. Performance results presented for several platforms and problems show that the parallel implementation of the DG method gives good speedup, and that superlinear accelerations occur when the computational working sets become cache resident.

REFERENCES

[1] C. Johnson and J. Pitkäranta, An Analysis of the Discontinuous Galerkin Method for a Scalar Hyperbolic Equation, Mathematics of Computation, 46 (1986), pp. 1-26.
[2] B. Cockburn and C.-W. Shu, TVB Runge-Kutta Local Projection Discontinuous Galerkin Finite Element Method for Conservation Laws II: General Framework, Mathematics of Computation, 52, No. 186 (1989), pp. 411-435.
[3] B. Cockburn, S.-Y. Lin, and C.-W. Shu, TVB Runge-Kutta Local Projection Discontinuous Galerkin Finite Element Method for Conservation Laws III: One Dimensional Systems, Journal of Computational Physics, 84, No. 1 (1989), pp. 90-113.
[4] B. Cockburn, S. Hou, and C.-W. Shu, The Runge-Kutta Local Projection Discontinuous Galerkin Finite Element Method for Conservation Laws IV: The Multidimensional Case, Mathematics of Computation, 54, No. 190 (1990), pp. 545-581.
[5] G. Jiang and C.-W. Shu, On Cell Entropy Inequality for Discontinuous Galerkin Methods, Mathematics of Computation, 62, No. 206 (1994), pp. 531-538.
[6] H. L. Atkins and C.-W. Shu, Quadrature-Free Implementation of Discontinuous Galerkin Method for Hyperbolic Equations, AIAA Journal, 36 (1998), pp. 775-782.
[7] R. Biswas, K. D. Devine, and J. Flaherty, Parallel, Adaptive Finite Element Methods for Conservation Laws, Applied Numerical Mathematics, 14, No. 1-3 (1994), pp. 255-283.
[8] K. S. Bey, J. T. Oden, and A. Patra, A Parallel hp-Adaptive Discontinuous Galerkin Method for Hyperbolic Conservation Laws, Applied Numerical Mathematics, 20 (1996), pp. 321-336.
[9] H. L. Atkins, Continued Development of the Discontinuous Galerkin Method for Computational Aeroacoustic Applications, AIAA Paper 97-1581, May 1997.
[10] H. L. Atkins, Local Analysis of Shock Capturing Using Discontinuous Galerkin Methodology, AIAA Paper 97-2032, June 1997.
[11] A. Baggag, H. Atkins, C. Özturan, and D. Keyes, Parallelization of an Object-Oriented Unstructured Aeroacoustics Solver, ICASE Report No. 99-11; Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, March 22-24, 1999.
[12] Second Computational Aeroacoustics (CAA) Workshop on Benchmark Problems, NASA Conference Publication 3352, Proceedings of a workshop sponsored by NASA, June 1997.
