Tutorial 25. Parallel Processing


Introduction

This tutorial illustrates the setup and solution of a simple 3D problem using FLUENT's parallel processing capabilities. In order to be run in parallel, the mesh must be divided into smaller, evenly sized partitions. Each FLUENT process, called a compute node, will solve on a single partition, and information will be passed back and forth across all partition interfaces. FLUENT's solver allows parallel processing on a dedicated parallel machine or on a network of workstations running Linux, UNIX, or Windows. The tutorial assumes that both FLUENT and the network communication software have been correctly installed (see the separate installation instructions and related information for details). The case chosen is the mixing elbow problem you solved in Tutorial 1.

This tutorial demonstrates how to do the following:

• Start the parallel version of FLUENT using either Linux/UNIX or Windows.
• Partition a grid for parallel processing.
• Use a parallel network of workstations.
• Check the performance of the parallel solver.

Prerequisites

This tutorial assumes that you are familiar with the menu structure in FLUENT and that you have completed Tutorial 1. Some steps in the setup and solution procedure will not be shown explicitly.


Problem Description

The problem to be considered is shown schematically in Figure 25.1. A cold fluid at 20°C flows into the pipe through a large inlet and mixes with a warmer fluid at 40°C that enters through a smaller inlet located at the elbow. The pipe dimensions are in inches, and the fluid properties and boundary conditions are given in SI units. The Reynolds number for the flow at the larger inlet is 50,800, so a turbulent flow model will be required.

The fluid properties and inlet conditions from Figure 25.1 are:

  Density:        ρ  = 1000 kg/m³
  Viscosity:      µ  = 8 × 10⁻⁴ Pa·s
  Conductivity:   k  = 0.677 W/m·K
  Specific heat:  Cp = 4216 J/kg·K

  Large inlet (4" dia.): Ux = 0.4 m/s, T = 20°C, I = 5%
  Small inlet (1" dia.): Uy = 1.2 m/s, T = 40°C, I = 5%

Figure 25.1: Problem Specification
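
The data in Figure 25.1 lets you verify the quoted Reynolds number. The following is a minimal check in Python (a sketch, not part of the tutorial files), taking the characteristic length to be the 4 in. diameter of the larger inlet:

# Reynolds number at the larger (4 in. dia.) inlet, from the properties
# and inlet velocity given in Figure 25.1.
rho = 1000.0      # density (kg/m^3)
mu = 8e-4         # viscosity (Pa.s)
u_x = 0.4         # inlet x-velocity (m/s)
d = 4 * 0.0254    # inlet diameter, 4 in. converted to meters
print(rho * u_x * d / mu)   # 50800.0, matching the stated value of 50,800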


Setup and Solution

Preparation

1. Download parallel_process.zip from the Fluent Inc. User Services Center or copy it from the FLUENT documentation CD to your working folder (as described in Tutorial 1).

2. Unzip parallel_process.zip. elbow3.cas can be found in the parallel_process folder created after unzipping the file.

You can partition the grid before or after you set up the problem (define models, boundary conditions, etc.). It is best to partition after the problem is set up, since partitioning has some model dependencies (e.g., sliding-mesh and shell-conduction encapsulation). Since you have already followed the procedure for setting up the mixing elbow in Tutorial 1, elbow3.cas is provided to save you the effort of redefining the models and boundary conditions.

Step 1: Starting the Parallel Version of FLUENT

Since the procedure for starting the parallel version of FLUENT depends on the type of machine(s) you are using, two versions of this step are provided here. Follow the procedure for the machine configuration that is appropriate for you.

• Step 1A: Multiprocessor Windows, Linux, or UNIX Computer
• Step 1B: Network of Windows, Linux, or UNIX Computers

Step 1A: Multiprocessor Windows, Linux, or UNIX Computer

You can start the 3D parallel version of FLUENT on a Windows, Linux, or UNIX machine using 2 processes by performing either of the following steps:

• At the command prompt, type

  fluent 3d -t2

  See Chapter 31 of the User's Guide for additional information about parallel command line options.


• For Linux or UNIX, at the command prompt, type fluent. For Windows, type fluent -t2.

!  Do not specify any argument (e.g., 3d).

1. Specify the 3D parallel version.

File −→Run...

(a) Enable the 3D and the Parallel options in the Versions group box.
(b) Set Processes to 2 in the Options group box.
(c) Retain the selection of Default in the Interconnect drop-down list.
(d) Click Run.
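
If you prefer to script the startup rather than use the Run dialog, the same command line can be launched from Python. A minimal sketch, assuming the fluent launcher is on your PATH:

import subprocess

# Start the 3D parallel version with 2 processes, equivalent to typing
# "fluent 3d -t2" at the command prompt. Blocks until FLUENT exits.
subprocess.run(["fluent", "3d", "-t2"], check=True)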

Step 1B: Network of Windows, Linux, or UNIX Computers

You can start the 3D parallel version of FLUENT on a network of Windows, Linux, or UNIX machines using 2 processes and check the network connectivity by performing the following steps:

1. Start parallel FLUENT.

• At the command prompt, type

  fluent 3d -t2 -cnf=fluent.hosts


where -cnf indicates the location of the hosts text file. The hosts file is a text file that contains a list of the computers on which you want to run the parallel job. If the hosts file is not located in the directory where you are typing the startup command, you will need to supply the full pathname to the file.

For example, the fluent.hosts file may look like the following:

my_computer
another_computer

See Chapter 31 of the User's Guide for additional information about hosts files and parallel command line options.

• For Linux or UNIX, at the command prompt, type fluent. For Windows, type fluent -t2.

!  Do not specify any additional arguments (e.g., 3d).

(a) Specify the 3D network parallel version.

File −→Run...

i. Enable the 3D and the Parallel options in the Versions group box.
ii. Retain the default value of 1 for Processes in the Options group box.
iii. Specify the name and location of the hosts text file in the Hosts File text box.


iv. Retain the selection of Default in the Interconnect drop-down list.
v. Click Run.

2. Check the network connectivity information.

Although FLUENT displays a message confirming the connection to each new compute node and summarizing the host and node processes defined, you may find it useful to review the same information at some time during your session, especially if more compute nodes are spawned to several different machines.

Parallel −→Show Connectivity...

(a) Set Compute Node to 0.

For information about all defined compute nodes, you will select node 0, since this is the node from which all other nodes are spawned.

(b) Click Print.

------------------------------------------------------------------------------
ID    Comm.   Hostname          O.S.        PID    Mach ID  HW ID  Name
------------------------------------------------------------------------------
n1    mpich2  another_computer  Windows-32  21240  1        1      Fluent Node
host  net     my_computer       Windows-32  1204   0        3      Fluent Host
n0*   mpich2  my_computer       Windows-32  1372   0        0      Fluent Node
------------------------------------------------------------------------------

ID is the sequential denomination of each compute node (the host process is always host), Comm. is the communication library (i.e., MPI type), Hostname is the name of the machine hosting the compute node (or the host process), O.S. is the architecture, PID is the process ID number, Mach ID is the compute node ID, and HW ID is an identifier specific to the communicator used.

(c) Close the Parallel Connectivity panel.
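
If you start networked sessions often, both the hosts file and the startup command can be scripted. The following is a minimal sketch in Python, assuming the plain one-hostname-per-line format shown above and that the fluent launcher is on your PATH (the hostnames are the example names from this step):

import subprocess

# Write a hosts file in the one-hostname-per-line format shown above,
# then launch the networked 2-process session with it. A sketch: assumes
# FLUENT is installed on all machines and remote access is configured.
hosts = ["my_computer", "another_computer"]
with open("fluent.hosts", "w") as f:
    f.write("\n".join(hosts) + "\n")

subprocess.run(["fluent", "3d", "-t2", "-cnf=fluent.hosts"], check=True)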


Step 2: Reading and Partitioning the Grid

When you use the parallel solver, you need to subdivide (or partition) the grid into groups of cells that can be solved on separate processors. If you read an unpartitioned grid into the parallel solver, FLUENT will automatically partition it using the default partition settings. You can then check the partitions to see if you need to modify the settings and repartition the grid.

1. Inspect the automatic partitioning settings.

Parallel −→Auto Partition...

If the Case File option is enabled (the default setting) and there exists a valid partition section in the case file (i.e., one where the number of partitions in the case file divides evenly into the number of compute nodes), then that partition information will be used rather than repartitioning the mesh. You need to disable the Case File option only if you want to change other parameters in the Auto Partition Grid panel.

(a) Retain the Case File option.

When the Case File option is enabled, FLUENT will automatically select a partitioning method for you. This is the preferred initial approach for most problems. In the next step, you will inspect the partitions created and be able to change them, if required.

(b) Click OK to close the Auto Partition Grid panel.

2. Read the case file elbow3.cas.

File −→ Read −→Case...


3. Display the grid (Figure 25.2).

Display −→Grid...

Figure 25.2: Grid Along the Symmetry Plane for the Mixing Elbow

4. Check the partition information.

Parallel −→Partition...

(a) Click Print Active Partitions.


FLUENT will print the active partition statistics in the console.

>> 2 Active Partitions:
   P    Cells  I-Cells  Cell Ratio   Faces  I-Faces  Face Ratio  Neighbors
   0    11329     1900       0.168   37891     2342       0.062          1
   1    11329      359       0.032   38723     2342       0.060          1

----------------------------------------------------------------------
Collective Partition Statistics:       Minimum    Maximum      Total
----------------------------------------------------------------------
Cell count                               11329      11329      22658
Mean cell count deviation                 0.0%       0.0%
Partition boundary cell count              359       1900       2259
Partition boundary cell count ratio       3.2%      16.8%      10.0%

Face count                               37891      38723      74272
Mean face count deviation                -1.1%       1.1%
Partition boundary face count             2342       2342       2342
Partition boundary face count ratio       6.0%       6.2%       3.2%

Partition neighbor count                     1          1
----------------------------------------------------------------------
Partition Method                 Principal Axes
Stored Partition Count           2

Done.
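
The ratios in this report are simple quotients of the printed counts; the following sketch (not FLUENT code) reproduces the key figures of merit from the raw counts:

# Reproduce the partition-quality ratios from the statistics printed above.
cells = [11329, 11329]    # cells per partition
i_cells = [1900, 359]     # interface (partition boundary) cells per partition

# "Cell Ratio" column: interface cells / cells, for each partition.
print([i / c for i, c in zip(i_cells, cells)])   # [0.168, 0.032] (rounded)

# "Partition boundary cell count ratio", Total column:
# total interface cells / total cells.
print(sum(i_cells) / sum(cells))                 # 0.0997 -> reported as 10.0%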

Note: FLUENT distinguishes between two cell partition schemes within a parallel problem: the active cell partition and the stored cell partition. Here, both are set to the cell partition that was created upon reading the case file. If you repartition the grid using the Partition Grid panel, the new partition will be referred to as the stored cell partition. To make it the active cell partition, you need to click the Use Stored Partitions button in the Partition Grid panel. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file. This distinction is made mainly to allow you to partition a case on one machine or network of machines and solve it on a different one. See Chapter 31 of the User's Guide for details.

(b) Review the partition statistics.

An optimal partition should produce an equal number of cells in each partition for load balancing, a minimum number of partition interfaces to reduce interpartition communication bandwidth, and a minimum number of partition neighbors to reduce the startup time for communication. Here, you will be looking for relatively small values of mean cell and face count deviation, and total partition boundary cell and face count ratio.

(c) Close the Partition Grid panel.

5. Examine the partitions graphically.

(a) Initialize the solution using the default values.

Solve −→ Initialize −→Initialize...

In order to use the Contours panel to inspect the partition you just created, you have to initialize the solution, even though you are not going to solve the problem at this point. The default values are sufficient for this initialization.

(b) Display the cell partitions (Figure 25.3).

Display −→Contours...

i. Enable Filled in the Options group box.
ii. Select Cell Info... and Active Cell Partition from the Contours of drop-down lists.
iii. Select symmetry from the Surfaces selection list.
iv. Set Levels to 2, which is the number of compute nodes.
v. Click Display and close the Contours panel.

Figure 25.3: Cell Partitions

As shown in Figure 25.3, the cell partitions are acceptable for this problem. The position of the interface reveals that the criteria mentioned earlier will be matched. If you are dissatisfied with the partitions, you can use the Partition Grid panel to repartition the grid. Recall that, if you wish to use the modified partitions for a calculation, you will need to make the Stored Cell Partition the Active Cell Partition by either clicking the Use Stored Partitions button in the Partition Grid panel, or saving the case file and reading it back into FLUENT. See Section 31.5.4 of the User's Guide for details about the procedure and options for manually partitioning a grid.

6. Save the case file with the partitioned mesh (elbow4.cas).

File −→ Write −→Case...


Step 3: Solution

1. Initialize the flow field using the boundary conditions set at velocity-inlet-5.

Solve −→ Initialize −→Initialize...

(a) Select velocity-inlet-5 from the Compute From drop-down list.
(b) Click Init.

A Warning dialog box will open, asking if you want to discard the data generated during the first initialization, which was used to inspect the cell partitions.

(c) Click OK in the Warning dialog box to discard the data.
(d) Close the Solution Initialization panel.

2. Enable the plotting of residuals during the calculation.

Solve −→ Monitors −→Residual...

3. Start the calculation by requesting 200 iterations.

Solve −→Iterate...

The solution will converge in approximately 180 iterations.

4. Save the data file (elbow4.dat).

File −→ Write −→Data...


Step 4: Checking Parallel Performance

Generally, you will use the parallel solver for large, computationally intensive problems, and you will want to check the parallel performance to determine if any optimization is required. Although the example in this tutorial is a simple 3D case, you will check the parallel performance as an exercise. See Chapter 31 of the User's Guide for details.

Parallel −→ Timer −→Usage

Performance Timer for 179 iterations on 2 compute nodes
  Average wall-clock time per iteration:     0.574 sec
  Global reductions per iteration:           123 ops
  Global reductions time per iteration:      0.000 sec (0.0%)
  Message count per iteration:               70 messages
  Data transfer per iteration:               0.907 MB
  LE solves per iteration:                   7 solves
  LE wall-clock time per iteration:          0.150 sec (26.1%)
  LE global solves per iteration:            2 solves
  LE global wall-clock time per iteration:   0.001 sec (0.1%)
  AMG cycles per iteration:                  12 cycles
  Relaxation sweeps per iteration:           479 sweeps
  Relaxation exchanges per iteration:        141 exchanges

  Total wall-clock time:                     102.819 sec
  Total CPU time:                            308.565 sec

The most accurate way to evaluate parallel performance is to run the same parallel problem on 1 CPU and on n CPUs, and compare the Total wall-clock time (the elapsed time for the iterations) in both cases. Ideally, the Total wall-clock time with n CPUs would be 1/n times the Total wall-clock time with 1 CPU. In practice, this improvement is reduced by the performance of the communication subsystem of your hardware and by the overhead of the parallel process itself.

As a rough estimate of parallel performance, you can compare the Total wall-clock time with the Total CPU time. In this case, the CPU time was approximately 3 times the Total wall-clock time. For a parallel process run on two compute nodes, this reveals very good parallel performance, even though the advantage over a serial calculation is small, as expected for this simple 3D problem.

Note: The wall-clock time, the CPU time, and the number of iterations required for convergence may differ depending on the type of computer you are running (e.g., Windows 32-bit, Linux 64-bit, etc.).
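
The arithmetic behind these two checks is simple enough to script. The following is a minimal sketch using the timer values printed above; the 1-CPU wall-clock time is a hypothetical placeholder, since this tutorial only runs the 2-process case:

n_cpus = 2
t_wall = 102.819    # Total wall-clock time on 2 compute nodes (sec)
t_cpu = 308.565     # Total CPU time summed over all processes (sec)

# Rough single-run estimate: compare CPU time with wall-clock time.
print(t_cpu / t_wall)               # ~3.0, as noted above

# More accurate estimate: rerun the case on 1 CPU and compare times.
t_wall_serial = 190.0               # hypothetical 1-CPU time; measure your own
speedup = t_wall_serial / t_wall    # ideal value: n_cpus
efficiency = speedup / n_cpus       # ideal value: 1.0 (i.e., 100%)
print(speedup, efficiency)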


Step 5: Postprocessing

See Tutorial 1 for complete postprocessing exercises for this example. Here, two plots are generated so that you can confirm that the results obtained with the parallel solver are the same as those obtained with the serial solver.

1. Display an XY plot of temperature across the exit (Figure 25.4).

Plot −→ XY Plot...

(a) Select Temperature... and Static Temperature from the Y Axis Function drop-down lists.
(b) Select pressure-outlet-7 from the Surfaces selection list.


(c) Click Plot and close the Solution XY Plot panel.

Figure 25.4: Temperature Distribution at the Outlet

2. Display filled contours of the custom field function dynam-head (Figure 25.5).

Display −→ Contours...

(a) Select Custom Field Functions... from the Contours of drop-down list.


The custom field function you created in Tutorial 1 (dynam-head) will be selected in the lower drop-down list.

(b) Enter 80 for Levels.
(c) Select symmetry from the Surfaces selection list.
(d) Click Display and close the Contours panel.

Figure 25.5: Contours of the Custom Field Function, Dynamic Head

Summary

This tutorial demonstrated how to solve a simple 3D problem using FLUENT's parallel solver. Here, the automatic grid partitioning performed by FLUENT when you read the mesh into the parallel version was found to be acceptable. You also learned how to check the performance of the parallel solver to determine whether any optimization is required.

See Section 31.6 of the User's Guide for additional details about using the parallel solver.
