The INSIDE project: Some remarks concerning Control Flow in 'Real Life' Parallel Software and the Parallelization of the p-Version of FEM

M. Krafczyk, E. Rank, M. Rücker, A. Düster
LS Bauinformatik, Civil Eng. Department, TU Munich, Arcisstr. 21, D-80290 München
E-mail: [email protected]

Abstract

This contribution describes intermediate results and motivates approaches for two topics relevant to the ESPRIT project INSIDE. After a short sketch of the project itself, the paper first discusses the need for a software tool to control information and data flow in an HPC network used for real-life applications, and the properties of the resulting prototype software. The second part of the paper motivates the use of the p-version of the FEM and addresses some reasons why this method is expected to be superior in parallel efficiency to standard h-version codes when targeting structural analysis problems relevant to the INSIDE project.

1 The INSIDE project

The INSIDE project (Integrated Simulation and Design System for Civil and Structural Engineering, ESPRIT project 20.216, [1]) was initiated in 1996 with the objective of developing an HPCN-based integrated CAD-FEA system for civil and structural engineering, supporting interactive optimization of constructions using distributed and parallel software on workstation clusters and low-cost parallel computers. The speedup of computation necessary for this interactive structural design is to be achieved by combining methodological improvements (mesh adaptivity and p-elements), which reduce the required number of unknowns without loss of accuracy, with parallel computing supported by high-performance networking. A multitude of software modules interact to form the system: the user works on a PC, using a CAD program to define a structural problem. A so-called monitor program allows (as an additional feature within the CAD user interface) the definition of a virtual parallel machine consisting of UNIX workstations and the starting and monitoring of a distributed simulation, which itself consists of several task-specific groups of processes, each organized according to a master-slave hierarchy.


2 Information- and Control Flow: The monitor

Computing in a distributed system requires a hierarchical approach to process control, information flow and data flow. At the top of the hierarchy, a monitor daemon process has to control the behaviour of the virtual machine at any instant of the computation. This includes the initialization of the virtual machine and of the computing processes, detection of their termination, reporting of occurring errors to the user, and a set of measures to synchronize groups of processes. To increase efficiency, the monitor does not control the data flow between the CAD program and the database during post-processing.

Basically, the monitor is the link between the user and the virtual machine, with its own additional window on the CAD graphics interface. If an error in one of the groups of processes is detected, the corresponding slave signals its master process, which in turn signals the monitor. The monitor informs the user via a message window and asks for further decisions while offering appropriate default actions. This requires considerably more complexity than merely acting as a switch to start up processes. The monitor will thus be a complex PVM-based program that should be able to handle all possible errors except its own termination. This goal can only be achieved in a limited sense, as the complexity of the system will grow until the end of the project, demanding a posteriori extensions of the monitor. Furthermore, it may turn out that routing every interprocess communication through the monitor implies an unacceptable decrease of the overall system performance; in this respect it may become necessary to find an acceptable compromise between system stability and system performance.

Initialization is done by activating a menu entry in the PC-based CAD program. This starts the monitor process on the UNIX side of the network and its minor counterpart process on the PC. The monitor analyzes the availability of hardware and the actual system load (estimated by a short-time test which is compared to system installation data) and returns this information on request to the user, who optionally has to decide whether the virtual machine is sufficiently powerful for the task to be computed. If so, the monitor starts one (or a sequence) of the master processes shown in Figure 1. Based on the load data of the processors and their relative power, an initial load balancing is provided at every job startup.

Figure 1 sketches the system hierarchy. Every distributed task running on a number of slave processors communicates any event relevant for the overall computational flow to its master, which bundles this information for all members of its group and sends a concise message to the monitor. The monitor transfers information about the state of a calculation from the master process of a distributable task to the user. Processes which are not parallelized (e.g. mesh generation) are initialized and monitored as well. Figure 2 gives an impression of the monitor GUI prototype with the properties mentioned above.
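The slave-to-master-to-monitor error escalation described above can be sketched in a few lines. The following is a purely conceptual illustration (the actual monitor is a PVM-based program, cf. [2]); Python queues stand in for PVM messages, and all names as well as the failure scenario are invented for the example.

```python
# Conceptual sketch of the slave -> master -> monitor error escalation.
# The real monitor uses PVM message passing; queues stand in for PVM here.
from multiprocessing import Process, Queue

def slave(slave_id, to_master):
    try:
        if slave_id == 2:                      # simulate a failing worker
            raise RuntimeError("host failure")
        to_master.put(("done", slave_id))      # normal completion event
    except Exception as exc:
        to_master.put(("error", slave_id, str(exc)))

def master(group, n_slaves, to_master, to_monitor):
    n_done, errors = 0, []
    while n_done + len(errors) < n_slaves:
        msg = to_master.get()                  # collect events from the group
        if msg[0] == "error":
            errors.append(msg)
        else:
            n_done += 1
    # bundle the group's events into one concise message for the monitor
    to_monitor.put((group, "failed" if errors else "ok", errors))

if __name__ == "__main__":
    to_master, to_monitor = Queue(), Queue()
    n = 4
    workers = [Process(target=slave, args=(i, to_master)) for i in range(n)]
    boss = Process(target=master, args=("static analysis", n, to_master, to_monitor))
    for p in workers + [boss]:
        p.start()
    group, state, errors = to_monitor.get()    # the monitor informs the user
    print(f"monitor: group '{group}' reported '{state}', errors: {errors}")
    for p in workers + [boss]:
        p.join()
```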


Figure 1: Structure of control-, information- and data-flow in the distributed system. The PC world comprises the user front-end, the CAD system and the database; the UNIX world comprises the MONITOR and the master processes (mesh generator, post-processing, p-version static analysis, h-version static analysis, dynamic analysis), each controlling its slaves via domain decomposition or a pool of tasks.

Figure 2: Monitor GUI consisting of main, parameter and job windows


Tests so far indicate that the functionality of the future prototype has to be extended with respect to giving advice to the non-expert end-user in case the virtual machine suffers from unforeseeable events such as host failures. However, the authors feel that it is not possible to obtain absolute stability on a heterogeneous architecture under all real-life circumstances. This is partially due to the shortcomings of some UNIX and PVM ([2]) features.

3 The p-FEM in INSIDE

3.1 Motivation

While in the standard h-version of the finite element method the mesh is refined to achieve convergence, the polynomial degree of the shape functions remains unchanged; usually a low order approximation of degree p=1 or p=2 is chosen. The p-version leaves the mesh unchanged and increases the polynomial degree of the shape functions locally or globally. In most implementations a hierarchical set of shape functions is applied, providing a simple and consistent implementation in one-, two- or three-dimensional analysis. Oscillations in the approximate solution, which might be expected when working with high order shape functions, can be avoided by using properly designed meshes. Guidelines for constructing these meshes a priori can often be given much more easily for the p-version than for the h-version [3]. For linear elliptic problems it has also been proven mathematically that a sequence of meshes can be constructed such that the approximation error depends only on the polynomial degree p and not on the order of the singularities in the exact solution.

Following [4], our p-version implementation uses hierarchical basis functions, which can easily be implemented up to any desired polynomial degree. Starting with the two linear shape functions on the standard element [-1, +1], a whole family of shape functions in one space dimension is obtained by defining a sequence of polynomials Pi of degree i satisfying the condition

\[
\{\, P_i(\xi) \;|\; i \ge 2,\ \xi \in [-1, +1],\ P_i(-1) = P_i(+1) = 0 \,\}
\qquad (1)
\]

A possible choice for the P_i is given by integrals of the Legendre polynomials, defined by

\[
P_i(\xi) = \int_{t=-1}^{\xi} L_{i-1}(t)\, dt
\qquad \text{with} \qquad
L_n(x) = \frac{1}{2^{\,n-1}\,(n-1)!}\,\frac{d^n}{dx^n}\bigl(x^2 - 1\bigr)^n
\quad \text{for } n \ge 1
\qquad (2)
\]
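As an illustration of Eqs. (1) and (2), the following sketch generates the hierarchic 1D modes numerically. It uses numpy's standard Legendre polynomials, so the modes differ from the definition in Eq. (2) only by a constant normalization factor per degree; the function name is ours and not part of any project code.

```python
# Hierarchic 1D shape functions: P_i(xi) = integral_{-1}^{xi} L_{i-1}(t) dt, i >= 2.
import numpy as np
from numpy.polynomial.legendre import Legendre

def hierarchic_shape_1d(i):
    """Return the i-th hierarchic mode (i >= 2) as a callable polynomial."""
    assert i >= 2
    # antiderivative of L_{i-1}, with the integration constant chosen so that
    # the mode vanishes at xi = -1; it also vanishes at xi = +1 since the
    # integral of L_{i-1} over [-1, 1] is zero for i - 1 >= 1 (cf. Eq. (1))
    return Legendre.basis(i - 1).integ(lbnd=-1)

xi = np.linspace(-1.0, 1.0, 5)
for i in range(2, 6):
    N = hierarchic_shape_1d(i)
    print(i, N(-1.0), N(1.0), N(xi))   # endpoint values are (numerically) zero
```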

A hierarchical basis has an immediate consequence for the structure of the resulting stiffness matrix. If the equations are ordered so that all linear modes get the numbers 1 to n1, all quadratic modes the numbers n1+1 to n2 and so on, the stiffness matrices corresponding to polynomial orders 1 to p-1 are submatrices of the stiffness matrix corresponding to polynomial order p.

The construction of a hierarchical family of two-dimensional (and similarly three-dimensional) elements is quite straightforward. Standard shape functions are defined on the standard square [-1, 1] x [-1, 1] in local coordinates ξ, η by a tensor product construction. They can be grouped into three classes. Shape functions of the first class are called nodal (basic) modes and are the 'usual' bilinear shape functions. The second class are the edge modes. Hierarchical modes for the edge η = 1 (and analogously for the other edges of the standard element) are defined by

\[
H_i(\xi, \eta) := P_i(\xi)\,\frac{\eta + 1}{2}
\qquad (3)
\]

Edge modes are identically zero on all edges except their defining edge. The third class are the bubble modes, given by

\[
H_{kl}(\xi, \eta) = P_k(\xi)\, P_l(\eta), \qquad k, l \ge 2
\qquad (4)
\]

and disappearing on all edges of the element. Figure 3 shows a linear node mode, an edge mode of degree three and a bubble mode of degree five.
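A corresponding sketch for the two-dimensional modes of Eqs. (3) and (4), checking that an edge mode vanishes on all edges except its defining edge and that a bubble mode vanishes on all four edges (again with numpy's Legendre polynomials standing in for the P_i):

```python
# 2D hierarchic modes on the standard square [-1, 1] x [-1, 1], cf. Eqs. (3)-(4).
import numpy as np
from numpy.polynomial.legendre import Legendre

def P(i):
    # 1D hierarchic mode: antiderivative of L_{i-1}, vanishing at xi = +-1
    return Legendre.basis(i - 1).integ(lbnd=-1)

def edge_mode(i, xi, eta):
    # Eq. (3): mode attached to the edge eta = +1
    return P(i)(xi) * (eta + 1.0) / 2.0

def bubble_mode(k, l, xi, eta):
    # Eq. (4): internal mode, k, l >= 2
    return P(k)(xi) * P(l)(eta)

s = np.linspace(-1.0, 1.0, 7)
print(edge_mode(3, s, +1.0))   # varies along the defining edge eta = +1
print(edge_mode(3, s, -1.0))   # identically zero on the opposite edge
print(edge_mode(3, +1.0, s))   # (numerically) zero on the edges xi = +-1
print(bubble_mode(2, 2, s, 1.0), bubble_mode(2, 2, 1.0, s))  # zero on all edges
```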

Figure 3: A node-, edge- and bubble-mode typically used in p-version FEM

Another main difference between h- and p-version finite element methods lies in the mapping requirements. Since in the p-version the element size is not reduced as the number of degrees of freedom is increased, the description of the geometry must be independent of the number of elements. This makes it necessary to construct elements with an exact representation of the boundary. The isoparametric mapping used in standard finite element formulations can be seen as a special case of mapping with the blending function method (see e.g. [5, 3]). Following these ideas, element boundaries can be implemented as arbitrarily curved edges; a sketch of such a mapping is given below. The relative insensitivity of p-version accuracy to element distortion, combined with the blending function technique, makes it possible to modify the geometry without remeshing for some classes of mechanical problems and thus naturally allows for parameter studies, as shown in the first example, an extension of an example given in [4]. Figs. 4a and 4b show an attachment lug with a hole at different positions, discretized by a very coarse mesh (Figs. 5a and 5b). The hole position is the design parameter, constrained by a maximum stress condition that must hold when the hole is moved too close to the edge.
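A minimal sketch of the blending function (transfinite) mapping mentioned above, following the Gordon-Hall construction [5]; the quadrilateral, its corner coordinates and the parabolic bottom edge are invented purely for illustration:

```python
# Blending function (transfinite) mapping of the standard square onto a
# quadrilateral whose bottom edge is curved (Gordon-Hall / Coons construction).
import numpy as np

V = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]])  # corner points

def bottom(xi):
    # curved bottom edge: a shallow parabolic arc through V[0] and V[1] (assumed)
    return np.array([1.0 + xi, 0.3 * (1.0 - xi**2)])

def straight(a, b, s):
    return 0.5 * (1.0 - s) * a + 0.5 * (1.0 + s) * b   # linear edge parametrization

def blend(xi, eta):
    """Map local coordinates (xi, eta) in [-1, 1]^2 to global coordinates."""
    e1 = bottom(xi)                       # edge eta = -1 (curved)
    e3 = straight(V[3], V[2], xi)         # edge eta = +1
    e4 = straight(V[0], V[3], eta)        # edge xi  = -1
    e2 = straight(V[1], V[2], eta)        # edge xi  = +1
    edge_part = (0.5 * (1.0 - eta) * e1 + 0.5 * (1.0 + eta) * e3
                 + 0.5 * (1.0 - xi) * e4 + 0.5 * (1.0 + xi) * e2)
    corner_part = (0.25 * (1.0 - xi) * (1.0 - eta) * V[0]
                   + 0.25 * (1.0 + xi) * (1.0 - eta) * V[1]
                   + 0.25 * (1.0 + xi) * (1.0 + eta) * V[2]
                   + 0.25 * (1.0 - xi) * (1.0 + eta) * V[3])
    return edge_part - corner_part        # transfinite interpolation

# the mapping reproduces the curved boundary exactly, independent of the mesh
print(blend(-1.0, -1.0), blend(1.0, -1.0))   # corners V[0], V[1]
print(blend(0.0, -1.0))                       # midpoint of the curved edge
```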

Accuracy can easily be tuned by varying p without remeshing. The convergence history for such a procedure is shown in Figs. 6a and 6b.

Figure 4: Test case for a parameter study (hole position)

Figure 5: Meshes for the test cases

Figure 6: Convergence of the stresses Sxy with polynomial degree p for the two hole positions (lug, hole position 1 and hole position 5)

3.2 Investigation of the p-version with respect to Parallel Computing

Typically the FEM computation can be divided into several subprocesses requiring different amounts of time:

- definition of the system (preprocessing): Tpre
- mesh generation: Tmesh
- computation of stiffness matrices and load vectors: Tst
- solution of the generated equation system: Teq
- post-processing: Tpost

For the h-version, usually the largest amount of CPU time is needed for the solution of the equation system. For the p-version things are different: for p >= 4 the computation of the element stiffness matrices usually takes more time than solving the equation system. The element stiffness matrix then has a typical structure as shown in Fig. 7.

Figure 7: Structure of the element stiffness matrix K for a quadrilateral element, p=8 (blocks corresponding to nodal, edge and bubble modes)

As the bubble DOF are purely local to the element, they can be condensed using a modified Cholesky decomposition of the element stiffness matrices. This results in a further increase of Tst and a drastic decrease of Teq, because the condition number of the global stiffness matrix is strongly reduced. Several authors ([8, 7, 9]) have investigated these observations in detail, interpreting the bubble DOF condensation as a preconditioning procedure.

When assessing a priori the parallelization efficiency of FEM computations, the individual FE subtasks have to be weighted with their percentage of the overall computational time needed for the problem. Efficient parallel equation solvers, mesh generators and domain decomposers are still an active field of research. For the p-version, parallel mesh generators are not needed due to the coarse meshes used, domain decomposers therefore need very little CPU time, and the CPU percentage of the parallel equation solver decreases with increasing p-level. As the most CPU-intensive part is then the computation of the element stiffness matrices, one notes that these computations can be parallelized naturally, as they hardly require any communication. A similar argument holds for the post-processing computations, which take relatively more time for the p- than for the h-version. If the overall sequential speed of a p-code is comparable to that of an h-code implementation, this implies that the p-code can be expected to perform better than an h-code with respect to parallel efficiency. We found in numerical experiments that our p-code is indeed comparable in speed to commercial (e.g. [10]) h-version codes (not considering the strongly increased accuracy obtained by the p-code for the same number of DOF):

    code                       elements    dof      time
    AdhoC, p=8, full space     88          17059    371
    SOFiSTiK, SEPP             5700        17254    206
    AdhoC, p=1                 6404        19399    228
    SOFiSTiK, SEPP             6480        19604    269
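To make the condensation of the bubble DOF mentioned above concrete, the following sketch eliminates the element-local bubble block as a Schur complement (the project code uses a modified Cholesky decomposition, which performs the same elimination for symmetric positive definite element matrices); the matrix, the partitioning and all names are illustrative only:

```python
# Static condensation of element-local (bubble) DOF via a Schur complement.
import numpy as np

def condense(K, f, n_ext):
    """Eliminate the internal DOF of one element.

    K, f  : element stiffness matrix and load vector
    n_ext : number of external (nodal + edge) DOF; the remaining DOF are bubbles
    Returns the condensed matrix/vector acting on the external DOF only.
    """
    Kee, Keb = K[:n_ext, :n_ext], K[:n_ext, n_ext:]
    Kbe, Kbb = K[n_ext:, :n_ext], K[n_ext:, n_ext:]
    fe, fb = f[:n_ext], f[n_ext:]
    Kc = Kee - Keb @ np.linalg.solve(Kbb, Kbe)   # Schur complement on bubbles
    fc = fe - Keb @ np.linalg.solve(Kbb, fb)
    return Kc, fc

# illustrative SPD test matrix partitioned into 6 external and 4 bubble DOF
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10))
K = A @ A.T + 10.0 * np.eye(10)
f = rng.standard_normal(10)
Kc, fc = condense(K, f, n_ext=6)
# the condensed system yields the same external solution as the full system
u_full = np.linalg.solve(K, f)[:6]
u_cond = np.linalg.solve(Kc, fc)
print(np.allclose(u_full, u_cond))               # True
```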

The timing comparison above refers to a plate problem; all calculations were performed on an HP 715/100. For the parallel version of our p-code to be used in the INSIDE project we thus expect very good parallel performance, because the bottleneck of depending on a highly efficient parallel equation solver is relaxed: the computational weight is shifted to the almost ideally parallelizable computation of the element stiffness matrices. The following test case demonstrates this shift.

3.2.1 A numerical example

Plate problems make up a large part of the civil engineer's daily design work, and calculations often suffer from modelling and meshing problems when it comes to the inclusion of foundations. We show a 'real life' test case from one of the INSIDE industrial partners (Heitkamp) to verify the above assumptions for a non-academic problem. A more complete description of the p-code for plate problems can be found in [6]. The geometry and the computational mesh are depicted in Fig. 8. The plate is supported by 22 pillars and is loaded by its own weight. Parts of the system are supported by walls. Fig. 9 shows the deflections on the post-processing mesh, which is derived from the coarse computational mesh shown in Fig. 8.


Figure 8: Real life meshed plate example with columns (plate thicknesses t=0.38 cm, t=0.40 cm, t=0.60 cm; mesh: 129 vertices, 91 curves, 16 regions, 104 quadrilaterals)

Figure 9: Deflections of the plate example shown on the post-processing mesh


The overall computation time adds up as follows:

                                   p=8, 20045 DOF    p=6, 11291 DOF
    element stiffness matrices     317.5 s           77.1 s
    elimination of interior dof    38.8 s            8.1 s
    assembly                       4.6 s             1.9 s
    equation solver                28.7 s            16.0 s
    post-processing                111.2 s           63.6 s
    Total time                     500.8 s           166.7 s

As equation solver, a conjugate gradient method with incomplete Cholesky preconditioning was used. Even for this real-life example it can be seen that by far the largest part of the computational time is needed by tasks which are naturally parallelizable.
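A hedged sketch of such a solver setup: a preconditioned conjugate gradient iteration on a sparse SPD test matrix, with scipy's incomplete LU factorization standing in for the incomplete Cholesky preconditioner of the project code (scipy does not ship an incomplete Cholesky routine), and a finite difference Laplacian standing in for the plate stiffness matrix.

```python
# Conjugate gradient with an incomplete-factorization preconditioner (sketch).
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg, spilu, LinearOperator

# simple SPD test matrix: 2D finite-difference Laplacian (stand-in for K)
n = 30
T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
K = (sp.kron(sp.eye(n), T) + sp.kron(T, sp.eye(n))).tocsc()
f = np.ones(K.shape[0])

ilu = spilu(K, drop_tol=1e-4)                      # incomplete factorization
M = LinearOperator(K.shape, matvec=ilu.solve)      # preconditioner operator

iterations = 0
def count(xk):
    global iterations
    iterations += 1

u, info = cg(K, f, M=M, callback=count)            # preconditioned CG
print(info, iterations, np.linalg.norm(K @ u - f)) # info == 0 means convergence
```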

4 Conclusions

The purpose of this article was to report on two aspects of the ongoing INSIDE project. We discussed the need for and the desired functionality of the so-called monitor software, responsible for control and information flow in a distributed 'agent-based' heterogeneous parallel computing environment. A prototype was introduced which is to be tested extensively in the near future. Although such a piece of software is hardly necessary in the academic world, we would like to stress that for industrial applications the amount of work to be invested in such system control software modules is easily underestimated. Nevertheless, we feel that the development of such tools is mandatory for an HPCN project, as the transparent behaviour of a (virtual) parallel machine is a necessary condition for the broad acceptance of sophisticated parallel simulation tools in the market.

The second part of this work motivates the use of the p-version of the FEM for some typical problems in the field of civil engineering. We examined the influence of the condensation of bubble mode DOF on the overall parallel efficiency to be expected when parallelizing the p-code in the near future. We found evidence that for higher p-levels the p-version will be superior in parallel efficiency compared to a classic h-version approach. Within the next year it is planned to integrate all software components necessary to build the INSIDE prototype. The functionality and efficiency of the system will then be reported elsewhere.

References

[1] see http://www.bauwesen.uni-dortmund.de/nmi/inside.html

[2] see e.g. http://www.epm.ornl.gov/pvm/

[3] B. SZABÓ: Mesh design for the p-version of the finite element method, Comp. Meth. in Applied Mech. and Eng. 55, 181-197, 1986.


[4] B. SZABÓ, I. BABUŠKA: Finite Element Analysis, Wiley, Chichester, 1990.

[5] W.J. GORDON, CH.A. HALL: Construction of Curvilinear Co-ordinate Systems and Applications to Mesh Generation, International Journal for Numerical Methods in Engineering, Vol. 7, 461-477, 1973.

[6] E. RANK, R. KRAUSE, K. PREUSCH: On the accuracy of p-version elements for the Reissner-Mindlin plate problem, submitted to International Journal for Numerical Methods in Engineering, 1996.

[7] J. MANDEL: Iterative solvers by substructuring for the p-version finite element method, Comp. Meth. Appl. Mech. Engrg., 80, 117-128, 1990.

[8] M. AINSWORTH: A preconditioner based on domain decomposition for hp-FE approximation on quasi-uniform meshes, SIAM J. Num. Anal., Vol. 33, No. 4, 1358-1376, 1996.

[9] M. PAPADRAKAKIS, G.B. BABILIS: Solution techniques for the p-version of the adaptive FE method, Int. J. Num. Meth. Eng., Vol. 37, 1413-1431, 1994.

[10] SOFiSTIK, Bruckmannring 6, D-85764 Oberschleissheim, 1997.
