NTRS - NASA
AUTOMATED CFD PARAMETER STUDIES ON DISTRIBUTED PARALLEL COMPUTERS

Stuart E. Rogers, Michael Aftosmis, and Shishir Pandya
NASA Ames Research Center, Moffett Field, California 94035

Edward Tejnil and Jasim Ahmad
Eloret, Moffett Field, California 94035

Introduction

As Computational Fluid Dynamics (CFD) software matures and the cost of computing continues to drop, it becomes possible to use CFD to perform large parameter and trade studies. The computational-resource requirements for a complete aerospace vehicle solution vary depending on the complexity of the vehicle and on the solver used: the wall-clock time of an Euler solution on a high-end PC workstation is typically less than an hour, while the requirements for viscous CFD solvers are typically 10 to 20 times larger, on the order of 5 to 20 hours per solution. By taking advantage of the parallel computing platforms that are available to a user, and of parallel flow-solver technologies for both inviscid and viscous methods, it becomes possible to run the thousands of unique CFD solutions needed to build stability and control databases for a complete vehicle.

A number of difficulties arise when attempting to run a large CFD parameter study. A user must perform a number of mundane tasks for each job. These include pre-processing input files, logging into the compute system, executing the job, monitoring the running solver, post-processing the solution, and archiving and copying the output files. This can become rather tedious when running more than a few jobs. The use of simple scripts will help when running a few dozen jobs, but when one wants to run more than that, a more sophisticated control system is required. This is particularly true when the compute jobs are spread out in a number of different ways, including over heterogeneous compute platforms or at more than one location. If enough of these jobs are running at one time, then monitoring and controlling the study becomes a full-time job.
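The scale of such a study comes from the cross product of the flow-condition sweeps: each unique combination of inputs is one CFD job. As a small illustration (the parameter names and values below are hypothetical, not the conditions of any study described here), a run matrix can be generated as:

```python
from itertools import product

# Hypothetical flow-condition sweeps; a real study would use its own values.
mach_numbers = [0.3, 0.6, 0.9, 1.2]
alphas = [-4.0, 0.0, 4.0, 8.0]   # angle of attack, degrees
betas = [0.0, 2.0]               # sideslip angle, degrees

# Each case in the parameter study is one unique combination of inputs.
cases = [
    {"mach": m, "alpha": a, "beta": b}
    for m, a, b in product(mach_numbers, alphas, betas)
]

print(len(cases))  # 4 * 4 * 2 = 32 unique CFD jobs
```

Each dictionary would then drive the input-file generation for one job, which is why even a modest sweep quickly outgrows manual bookkeeping.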

The current work focuses on the use of the NASA Information Power Grid (IPG). The IPG is a major effort developed under the NASA Computing, Information, and Communication Technologies (CICT) Program and the Computing, Networking, and Information Systems (CNIS) Project. The IPG provides software and services for security authentication, for job submission to many different computers, and for monitoring of running jobs on IPG resources.

The objective of the current effort is to build an automated software system for running CFD parameter studies, one which removes the need for user intervention in the process as much as possible. Such a software system should enable a single user to populate a massive matrix of CFD jobs, monitor the status of every job, and obtain a solution for each case in the shortest possible wall-clock time. A prototype of such a system has been developed in the current effort; it is known as the AeroDB system.

AeroDB Design

The AeroDB system consists of several discrete modules. These include a database, a run-manager script for monitoring each individual job, a job-launcher module, and a web-based user portal for monitoring all the jobs of a parameter study. The following paragraphs provide details on the design and development of each of these modules; a flowchart of the AeroDB system is shown in Fig. 1. The next section presents the progress of a parameter study of a reusable launch vehicle (RLV) performed with AeroDB, and the paper concludes with lessons learned in this effort and ideas for future work.

The database was built on a MySql database server, which served as the communication hub for all of the AeroDB scripts: all status information about each job in the parameter study is stored in the database. The web portal provided a job-submission interface with which the user could enter a new job and its attributes into the database; it also provided a mechanism for monitoring the status and convergence of each job, and for re-running a job.

The Job Launcher script monitored the database for new jobs submitted for execution. For each job, it chose an appropriate compute resource using a resource Broker service; it pre-staged and pre-processed the input files; it launched the job using the globusrun command from the Remote Execution area of the Grid Common Services (GCS) provided by the IPG; and when the flow solver finished, it post-staged and post-processed the solution files.

The Run Manager script was executed on the compute host while the flow solver was running. It monitored the convergence of the flow solver, reported information about each job to the database, and provided a mechanism for automatically restarting the flow solver until a converged solution was obtained.
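The database-as-communication-hub pattern can be sketched in miniature. The sketch below substitutes Python's built-in SQLite for the MySql server AeroDB used, and the table and column names are illustrative assumptions, not AeroDB's actual schema:

```python
import sqlite3

# In-memory SQLite stands in for the MySql server at the center of the design.
db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE jobs (
           job_id   INTEGER PRIMARY KEY,
           mach     REAL,
           alpha    REAL,
           status   TEXT,      -- e.g. 'submitted', 'running', 'converged'
           residual REAL       -- latest flow-solver residual
       )"""
)

# Portal's job-submission step: the user enters a new job and its attributes.
db.execute(
    "INSERT INTO jobs (mach, alpha, status) VALUES (?, ?, 'submitted')",
    (0.9, 4.0),
)

# Run manager's reporting step: update status and convergence information.
db.execute("UPDATE jobs SET status = 'running', residual = 1e-3 WHERE job_id = 1")

# Portal's monitoring step: any script can read the current state of a job.
status, residual = db.execute(
    "SELECT status, residual FROM jobs WHERE job_id = 1"
).fetchone()
print(status, residual)
```

Because every module reads and writes the same table, the launcher, run manager, and portal never need to talk to each other directly.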

Fig. 1 AeroDB flowchart.
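The launch cycle — poll the database for submitted jobs, pick a compute resource, stage files, and submit — can be sketched as a simple loop. Every function and field below is a hypothetical stand-in; the real system delegated these steps to the IPG resource Broker and the globusrun command:

```python
# Sketch of a job-launcher polling cycle. Helper names and the "queued_jobs"
# load metric are assumptions for illustration, not AeroDB's actual interfaces.

def choose_resource(resources):
    """Stand-in for the resource Broker: pick the least-loaded host."""
    return min(resources, key=lambda r: r["queued_jobs"])

def launch(job, resources):
    host = choose_resource(resources)
    host["queued_jobs"] += 1
    # Real AeroDB: pre-stage and pre-process input files here, then
    # submit the flow solver to `host` via the globusrun command.
    return {"job_id": job["job_id"], "host": host["name"], "status": "running"}

resources = [
    {"name": "chapman.nas.nasa.gov", "queued_jobs": 3},
    {"name": "lomax.nas.nasa.gov", "queued_jobs": 1},
]
submitted = [{"job_id": 1}, {"job_id": 2}]

launched = [launch(job, resources) for job in submitted]
for rec in launched:
    print(rec["job_id"], rec["host"])
```

Tracking the load it has already placed on each host lets a launcher like this spread a burst of submissions across resources instead of piling them onto one machine.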

Results

The AeroDB system was used to execute a large parameter study for a Liquid Glide-Back Booster (LGBB) reusable launch vehicle. Both the Cart3D and Overflow flow solvers were used to obtain solutions: the study had 1000 Cart3D cases and 211 Overflow cases. Jobs were launched starting at 9:00 AM on September 10th, and we continued to run and monitor jobs for seven days. Over that time there were 2863 globusrun job submissions; many of these cases required multiple submissions in order to obtain enough compute time to reach a converged solution. By the end of the run, all but 72 of the Cart3D cases had completed, and 100 of the Overflow cases had completed successfully.

The jobs were sent to 13 different computing resources at four different locations. No special-priority queues were used on any of the computers; the jobs competed for time with the other jobs on each host. Table 1 lists the compute resources that were used and the number of jobs executed on each host; it also lists the approximate number of CPU hours utilized on each resource.

Table 1  Compute resources used in the parameter study

Location  Host                        Hardware   # of Jobs  CPU Hrs
ARC       chapman.nas.nasa.gov        SGI O3K         3489    25485
ARC       lomax.nas.nasa.gov          SGI O3K         1074    15678
ARC       steger.nas.nasa.gov         SGI O2K          477     8017
ARC       hopper.nas.nasa.gov         SGI O2K          411     4702
ARC       evelyn.nas.nasa.gov         SGI O2K           61      262
ARC       simak.nas.nasa.gov          Sun Ultra        136      234
GRC       sharp.as.nren.nasa.gov      SGI O2K          126     1014
GRC       aeroshark.as.nren.nasa.gov  Linux PC          70      976
NCSA      modi4.ncsa.uiuc.edu         SGI O2K           99      483
ISI       jupiter.isi.edu             SGI O2K           21      212
Total                                                 5964    57065

Conclusion

The final version of this paper will include more details of the design of the AeroDB system, results of the parameter study, an assessment of the effectiveness of the current approach for automating a CFD parameter study, and suggestions for future improvements.