4th ECCOMAS Computational Fluid Dynamics Conference, Athens 7-11 September, 1998, Presentation at the Mini-Symposium “Dynamic Balancing: Current Status and Recent Progress”

Dynamic Re-Allocation of Meshes for parallel Finite Element Applications

DRAMA: Project No. 24953 of the European Commission's ESPRIT Programme (Long Term Research)

Project partners and contributing personnel:
CEMEF (Ecole des Mines/ARMINES): T. Coupez, H. Digonnet
Engineering Systems International S.A.: J. Clinckemaillie, G. Thierry
K.U. Leuven: B. Maerten, D. Roose
NEC Europe Ltd., C&C Research Laboratories: A. Basermann, J. Fingberg, G. Lonsdale
Transvalor S.A.: R. Ducloux

Contact for further information: G. Lonsdale (lonsdale@ccrl-nece.technopark.gmd.de)
Project Web-page: http://www.cs.kuleuven.ac.be/cwis/research/natw/DRAMA.html

Project Overview

Background to the developments

The ESPRIT project DRAMA has been initiated to support the take-up of large-scale parallel simulation in industry by dealing with the main problem which restricts the use of message-passing simulation codes - the inability to perform dynamic load balancing. The particular focus of the project is on the requirements of industrial Finite Element codes, but codes using Finite Volume formulations will also be able to make use of the project results. The focus on the message-passing approach corresponds to the target of addressing large-scale, and thus highly scalable, parallel applications. The most obvious cases where message-passing codes require dynamic load balancing are those where parallelisation via mesh partitioning is combined with adaptive meshing (as in local mesh refinement and coarsening) or adaptive re-meshing. However, as will be seen when considering the applications included within the DRAMA project, a need for dynamic load balancing also arises in applications with fixed meshes where computational and/or communications costs vary greatly as the simulation progresses. Major advances have been made in recent years in the two areas which form the starting point for the project activities: the development of parallel mesh-partitioning algorithms suitable for dynamic re-partitioning (re-allocation of sub-meshes to processors at run-time); and the migration and optimisation of industrial-strength simulation codes to HPC platforms using the message-passing paradigm. However, most industrial-strength parallel simulations using large processor numbers are performed with static partitioning and non-adaptive meshing - or, when meshing adaptively, with a sequentialised re-partitioning phase which greatly reduces the parallel performance. Thus, much of the exploitation within the end-user industry can currently be categorised as “exploratory installations”. The DRAMA project aims to bring together the developments in parallel partitioning and parallel FE applications to ensure that the potential of scalable computing can be achieved for fully-functional industrial simulation, including efficient adaptive meshing (and re-meshing) options. The parallel dynamic re-partitioning routines should also be able to handle the full complexity and range of finite elements used in industrial structural mechanics codes, as exemplified by the applications within the project.

The DRAMA Approach

The central product of the project will be the DRAMA Library, comprising various tools for dynamic re-partitioning of unstructured finite element applications. The core library functions will perform a parallel computation of a mesh re-allocation that re-balances the costs of the application code based on the DRAMA cost model. The DRAMA cost model is able to take account of dynamically changing computational and communications requirements. Furthermore, it is formulated in such a way that all information can be provided by the application based on its actual local data and measured costs (via code instrumentation). The library will provide support information to enable an efficient migration of the re-allocated data between processors. Via the DRAMA Library, dynamic load balancing may be achieved which will enable scalable, efficient parallel FE applications, even with adaptive mesh refinement (coarsening) and re-meshing. As a by-product of this approach, fully parallel mesh generation will be enabled via exploitation of the parallel re-partitioning of adaptively generated meshes. The mesh re-allocation approach to dynamic load balancing will be demonstrated and validated in the leading industrial codes PAM-CRASH (for crashworthiness simulation), PAM-STAMP (for metal stamping / deep-drawing and related simulations) and FORGE3 (for forging with viscoplastic incompressible materials). Despite this emphasis on the validation codes within the project, the library has been designed to be general purpose. Since the final DRAMA library will be put into the public domain, it is hoped that a wide range of applications will be able to make use of the project results.

The DRAMA Applications

While the technology to be developed has an impact for a wide range of applications, one possible ‘classical’ application being the adaptive shock-capturing features in aerodynamics codes, the DRAMA project focuses on structural mechanics codes whose large-deformation simulations highlight the importance of dynamic load handling. The industrial simulation codes chosen for the validation of the DRAMA approach and library are representative of the wide range of finite element simulation codes which have a natural requirement for a re-partitioning library as a parallelisation aid. All the DRAMA applications use time-marching as the basic solution procedure, and both explicit (PAM-CRASH/-STAMP) and implicit (FORGE3) methods are included. Causes of load imbalance, and the resulting degradation of scalability, are: (a) dynamic behaviour of the computational cost per element and of the communication patterns; (b) meshes which change during the calculation - adaptive meshing or re-meshing, including reshaping, refinement and coarsening. The self-impacting contact-impact algorithms used in PAM-CRASH are extreme cases of the former. Adaptive meshing is essential for codes like FORGE3 or PAM-STAMP, where large deformations would otherwise result in extremely severe distortion of the mesh elements.
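Cause (a) can be made concrete with a small sketch: given per-element costs (in practice obtained by code instrumentation) and a fixed element-to-process assignment, the load-imbalance factor is the maximum process load divided by the average. All numbers below are illustrative, not measurements from any of the project codes.

```python
# Sketch (not DRAMA code): how uneven per-element costs on a fixed
# partition translate into a load-imbalance factor.

def imbalance(element_costs, partition, nprocs):
    """Return max-load / average-load for an element-to-process map."""
    loads = [0.0] * nprocs
    for elem, proc in enumerate(partition):
        loads[proc] += element_costs[elem]
    avg = sum(loads) / nprocs
    return max(loads) / avg

# Eight elements on two processes; a (hypothetical) contact treatment
# makes the last two elements four times as expensive as the rest.
costs = [1.0] * 6 + [4.0, 4.0]
static_partition = [0, 0, 0, 0, 1, 1, 1, 1]   # elements 4-7 on process 1
print(imbalance(costs, static_partition, 2))  # > 1: process 1 overloaded
```

With a perfectly balanced re-partitioning the factor would return to 1.0; a static partition leaves it growing as the contact costs evolve.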



FORGE3 & Parallel adaptive re-meshing

FORGE3 from Transvalor is an implicit finite element code designed for the simulation of three-dimensional metal forming. It is able to simulate the large deformations of viscoplastic incompressible materials with unilateral contact conditions. The code is based on a stable mixed velocity/pressure formulation using tetrahedral unstructured meshes and employs an implicit time-stepping technique. Central to the Newton iteration dealing with the non-linearity arising from the material behaviour and the unilateral contact condition is an iterative procedure based on a conjugate residual method for the solution of the large linear system. The parallelisation of the full code, including adaptive re-meshing, was done within the EUROPORT project ([1]) employing a mesh-partitioning approach. For forging simulations, the capability for re-meshing is a unique, competitive advantage of the FORGE3 code. A functioning 3-D parallel re-meshing procedure has been established which requires a re-partitioning stage ('element migration'), not only to avoid load imbalance but also to deal with the interface re-meshing. This has to date been achieved via a centralised re-allocation process, which becomes a bottleneck, especially for large problems or when a large number of processors is used. Figure 1 shows an example of the evolving mesh as local re-meshing is followed by re-partitioning and (subsequent) interface re-meshing as the local re-meshing procedure is performed at later time-steps.
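For illustration, a textbook conjugate residual iteration of the kind referred to above can be sketched as follows. This is a generic serial sketch, not the FORGE3 implementation; the parallel code distributes the matrix-vector products and inner products over the mesh partition.

```python
# Minimal conjugate residual (CR) solver for a symmetric system A x = b.
# Pure-Python, dense, and serial: purely a didactic sketch.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_residual(A, b, tol=1e-10, max_iter=100):
    x = [0.0] * len(b)
    r = b[:]                      # residual r = b - A*0
    p = r[:]
    Ar = matvec(A, r)
    Ap = Ar[:]
    rAr = dot(r, Ar)
    for _ in range(max_iter):
        alpha = rAr / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:
            break
        Ar = matvec(A, r)
        rAr_new = dot(r, Ar)
        beta = rAr_new / rAr      # update search direction
        rAr = rAr_new
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        Ap = [ari + beta * api for ari, api in zip(Ar, Ap)]  # saves a matvec
    return x

# Small symmetric test system: 4x + y = 1, x + 3y = 2
x = conjugate_residual([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

In the parallel setting, each matrix-vector product requires communication across sub-domain boundaries, which is why the partition quality directly affects solver performance.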

Figure 1: Adaptive re-meshing and re-partitioning for a crankshaft forging simulation with FORGE3 (A) initial mesh & partition; (B) parallel meshing without repartitioning; (C) repartitioning; (D) mesh and partition after several further increments.

The reader is referred to [2,3] and the references therein for further information.



PAM-CRASH & PAM-STAMP

The PAM-CRASH and PAM-STAMP codes are two of the ESI/PSI Group products that are built around the PAM-SOLID core solver libraries. This means that they share the same basic algorithms and computational kernels, but include different algorithms and routines for application-specific functions. The most crucial components within the crashworthiness code PAM-CRASH are the contact-impact algorithms, whose main feature, from the DRAMA viewpoint, is the dynamically changing computation and communications costs. Contact-impact algorithms are also crucial to the simulations performed by PAM-STAMP, but there the much more significant parallelisation requirement is the efficient handling of adaptive meshing, since around 90% of stamping applications rely on the adaptive meshing features. In contrast to the re-meshing approach adopted by FORGE3, PAM-STAMP uses a mesh-refinement (and coarsening) strategy based on the original user-defined mesh. An example of such a mesh can be seen in Figure 2. Leaving details of the algorithms and their parallelisation to [6,7] and the references therein, a summary of the PAM-SOLID-based codes is as follows: The non-linear explicit finite element method employed uses a Lagrangian formulation of the equations of motion of the nodes of the unstructured mesh, constructed by the replacement of the physical model by an interconnected set of mechanical elements. Modelling of the materials involved in the physical model is done on an element level. This locality of discretisation enables all stress-strain calculations to be performed element-wise and allows the use of a simple central-difference time-marching scheme for the, thus diagonalised, equations of motion. For most industrial models, the majority of elements employed are 4-node thin shell (reduced integration) elements. For PAM-CRASH, these are supplemented by a whole range of, in part highly specialised, mechanical elements.
The two dominant (in terms of CPU time) computational components are: the element-wise (and thus, with mesh partitioning, highly local) stress-strain calculations; and the contact-impact calculations. The contact algorithms serve to detect and correct penetration of structural components and have, in contrast to the stress-strain calculations, a pseudo-global nature. They first perform a proximity and penetration search, followed by a penetration correction procedure. An implementation (or practical usage) issue which affects parallelisation is that the contact calculations are performed only within user-defined (and not necessarily disjoint) areas, referred to as “slide-lines”. The current message-passing version of PAM-CRASH, further developed from the prototypes produced within the CAMAS-EUROPORT ([1]) and EUROPORT-D ([8]) projects, employs a static partitioning approach. This can lead to greatly reduced scalability due to the dynamically changing costs within the contact-impact slide-lines, whose distribution across processes may in any case not be balanced.
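The proximity-search stage can be illustrated with a simple bucket-based sketch: nodes are sorted into a uniform grid so that each contact segment only tests nearby nodes. The 2D setting, the cell size and all names are illustrative; the production search in PAM-CRASH is considerably more elaborate.

```python
# Generic proximity search sketch: bucket nodes on a uniform grid and
# report (node, segment) candidate pairs whose grid cells overlap.

from collections import defaultdict

def proximity_pairs(nodes, segments, cell):
    """nodes: list of (x, y); segments: list of (node_a, node_b)."""
    buckets = defaultdict(list)
    for i, (x, y) in enumerate(nodes):
        buckets[(int(x // cell), int(y // cell))].append(i)
    pairs = []
    for s, (a, b) in enumerate(segments):
        (x0, y0), (x1, y1) = nodes[a], nodes[b]
        cx0, cx1 = sorted((int(x0 // cell), int(x1 // cell)))
        cy0, cy1 = sorted((int(y0 // cell), int(y1 // cell)))
        # only nodes in cells touched by the segment's bounding box
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for n in buckets[(cx, cy)]:
                    if n not in (a, b):
                        pairs.append((n, s))
    return pairs

nodes = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.1), (5.0, 5.0)]
segments = [(0, 1)]                 # one segment between nodes 0 and 1
print(proximity_pairs(nodes, segments, cell=1.0))
```

The number of candidate pairs, and hence the cost of the contact phase, depends on the current geometry, which is precisely why these costs change dynamically during a crash simulation.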

Figure 2: Adaptive meshing as occurring in PAM-STAMP simulations


The DRAMA Library

The DRAMA Library is designed to be called by parallel message-passing (MPI) finite element applications. The “expectation” of such applications is the rapid provision of information about: a re-partitioning of the mesh which balances the costs occurring in the application; and the interaction between processes required to achieve the re-partitioning. Given the normal complexity and application dependence of such algorithms, the actual data migration would not be expected of the library. Thus, the DRAMA library and its re-partitioning algorithms must be efficient and parallel (operating on distributed data), and must also take the current partition into account in order to avoid high communication costs during the resulting data migration. Furthermore, the re-partitioning should be based on actual occurring costs, rather than on some abstract heuristic. The current library design and the re-partitioning modules included have taken these requirements into account through the careful definition of the cost model and library interface. A summary of this strategy would be: “The DRAMA Library is designed to balance in parallel the actual costs occurring on the application's finite element mesh”.

The DRAMA Cost Model & Library Interface

An introduction to the initial definition of the DRAMA Cost Model and the interface has been given in some detail in [9]. Full details of the library interface will be provided (November '98) in the publicly available project deliverable [10]. In the following, an overview of the features included in the cost model is given, followed by the components of the library and a simple example of the mesh-information transfer between the code and the DRAMA library. The DRAMA Library is written in C and exploits MPI for message passing. The library may be called by applications written in both Fortran and C. The interface between the application code and the library is designed around the DRAMA cost model (which results in an objective cost function for the load-balancing re-partitioning algorithms) and the instrumentation of the application code to specify current and future computational and communication costs. The DRAMA cost model provides a measure of the quality of the current distribution and allows prediction of the effect on the computation of moving some parts of the mesh to other sub-domains. Calculation and communication speeds of the processors are taken into account by a combination of hardware-specific parameters and costs which are based on time measurements and enumeration provided by application-code instrumentation. Heterogeneous machine architectures can also be taken into account in this way. The essential feature is that the cost model is mesh-based, so that it is able to take account of the various workload contributions and communication dependencies that can occur in finite element applications. Being mesh-based, the DRAMA cost model includes both per-element and per-node computational costs.
Indeed, within a finite element code, part of the computations may be performed element-wise, for example a matrix-assembly phase, while other operations are node-based, such as the update of physical variables and nodal co-ordinates or the solution of systems of linear equations. Furthermore, the inter-sub-domain communication is frequently carried out using node lists. Therefore, the cost model includes element-element, node-node, and element-node data dependencies. In addition to data dependencies between neighbouring elements and nodes in the mesh, dependencies between arbitrary parts of the mesh can occur. For the PAM-CRASH code, such data dependencies originate within the contact-impact algorithms when the penetration of mesh segments by non-connected nodes is detected and corrected. The DRAMA cost model (and of course the library interface) allows the construction of “virtual elements” which represent the occurring costs of such dependencies. The current library design includes several types of mesh re-partitioners that may be selected by the application: mesh-migration, graph partitioning and co-ordinate partitioning. An overview of these approaches will be given in the next sub-section. The library builds upon these partitioning options with modules to provide the interface to the full DRAMA input mesh and the cost-monitoring parameters, and to deliver the full DRAMA output mesh and data migration information (old ↔ new mesh relationships). An overview of the library design is given in Figure 3.
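The mesh-based cost accounting described above can be sketched as follows: the cost charged to a process is the sum of its per-element and per-node computational weights, plus a communication term for each element-node dependency that crosses the partition. All weights, data layouts and the function name are hypothetical illustrations, not the DRAMA interface.

```python
# Hedged sketch of a mesh-based cost model with element, node and
# element-node (cut-edge) contributions.

def process_cost(proc, elem_w, node_w, elem_owner, node_owner,
                 elem_nodes, comm_w):
    """Computational weight owned by `proc` plus cut-edge communication."""
    cost = sum(w for e, w in elem_w.items() if elem_owner[e] == proc)
    cost += sum(w for n, w in node_w.items() if node_owner[n] == proc)
    # each element-node dependency crossing the partition costs comm_w
    for e, nodes in elem_nodes.items():
        if elem_owner[e] != proc:
            continue
        cost += comm_w * sum(1 for n in nodes if node_owner[n] != proc)
    return cost

# Two triangles sharing an edge, split across two processes.
elem_nodes = {0: [0, 1, 2], 1: [1, 2, 3]}
elem_owner = {0: 0, 1: 1}
node_owner = {0: 0, 1: 0, 2: 1, 3: 1}
elem_w = {0: 2.0, 1: 2.0}
node_w = {n: 0.5 for n in range(4)}

print(process_cost(0, elem_w, node_w, elem_owner, node_owner, elem_nodes, 0.1))
```

In this toy configuration both processes carry the same cost (2.0 element weight, 1.0 node weight, one cut element-node edge each), i.e. the partition is balanced under the model.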

Figure 3: The DRAMA Library Design

The reader is referred to [10] for details of the interface format. The following example will focus on the input and output DRAMA meshes and mesh relationships, and will omit the definition of weights per element and node type and also the definition of timing and enumerated operation/communication counts. The numbering format used within the DRAMA library interface is a dual numbering which is globally unique - it combines local node and element numbering with a unique processor number to which the node/element is “assigned”. Typical finite element applications with replicated nodes on sub-domain boundaries or with overlap/halo regions will be able to conform to this numbering, though mapping to and from this numbering will have to be performed by the application. The simple original mesh with its partitioning on two processes (using the above dual numbering) is shown in Figure 4a, together with the two parts of the input mesh provided to the DRAMA Library (in parallel by the two calling application processes). The horizontal line within the table is a demarcation between the two parallel inputs. The resulting partition (in the updated numbering system) is shown in Figure 4b. Figure 4c shows the output from the DRAMA Library - tables giving each process its new partition and the data migration relationships.
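The dual-numbering round trip can be sketched as follows. The mapping functions and all identifiers are hypothetical helpers on the application side, not DRAMA routines; the application owns its global numbering and the ownership tables.

```python
# Illustration of a globally-unique "dual numbering": each node or
# element is identified by the pair (process, local number), and the
# application maps its own global numbering to and from it.

def to_dual(global_ids, owner, local_of):
    """Application global ids -> (process, local number) pairs."""
    return [(owner[g], local_of[g]) for g in global_ids]

def from_dual(pairs, global_of):
    """(process, local number) pairs -> application global ids."""
    return [global_of[p] for p in pairs]

owner     = {10: 0, 11: 0, 12: 1}   # global id -> owning process
local_of  = {10: 1, 11: 2, 12: 1}   # global id -> local number
global_of = {(0, 1): 10, (0, 2): 11, (1, 1): 12}

dual = to_dual([10, 11, 12], owner, local_of)
assert from_dual(dual, global_of) == [10, 11, 12]   # round trip
```

Because the pair (process, local number) is unique across the machine, replicated boundary nodes can be identified consistently without a global renumbering pass.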

DRAMA Input mesh (on the two processes)

Figure 4a: Existing partitioned example mesh and the DRAMA input mesh

Figure 4b: The resulting re-partitioned mesh



New Mesh, i.e. New partition

Figure 4c: DRAMA Output information



Re-Partitioning Modules

As mentioned above, several types of mesh re-partitioners are available for selection by the application: mesh-migration, graph partitioning and co-ordinate partitioning. The mesh-migration approach uses the DRAMA cost model directly as a cost function when applying an iterative procedure in which processor pairs perform load balancing (or, more precisely, cost-function balancing) by the logical exchange of elements between their sub-domains. While theoretical convergence proofs show that in worst cases the number of iterative steps grows with the square of the number of processes, practical experience with finite element meshes shows that load balance is achieved after only a small number of iterations. For further information see [11,12]. ‘Classical’ graph partitioning methods employing weighted graphs derived from either element or nodal mesh connections would, in general, be unable to fully account for the costs arising in a finite element application. The mesh-to-graph module of the DRAMA library constructs an appropriate weighted graph from the distributed mesh. Depending on the properties and the needs of the application, the resulting graph can be an 'element graph', a 'node graph', or a combined 'element-node graph'. The latter contains all possible relevant cost contributions for finite element codes. For a given partition, edges between nodes, elements, or elements and nodes represent different communication requirements between processors. For instance, edges between elements and nodes lead to communication when a sub-domain possesses an element but not all of its nodes. The combination of the mesh-to-graph module with a suitable graph partitioner results in a mesh partitioner based on the DRAMA cost model. Within the current version of the DRAMA library, the subsequent graph partitioning is carried out by calling routines from PARMETIS, the software package developed by Karypis et al., University of Minnesota ([13,14]).
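As an illustration of the mesh-to-graph step, an 'element graph' can be constructed by connecting any two elements that share at least one node. This generic sketch omits the cost weights that the DRAMA module attaches to vertices and edges.

```python
# Build an element graph from element-to-node connectivity: two
# elements are adjacent when they share a node. Generic sketch only.

from collections import defaultdict

def element_graph(elem_nodes):
    """elem_nodes: list of node lists, one per element -> adjacency sets."""
    elems_of_node = defaultdict(set)
    for e, nodes in enumerate(elem_nodes):
        for n in nodes:
            elems_of_node[n].add(e)
    adj = defaultdict(set)
    for elems in elems_of_node.values():
        for e in elems:
            adj[e] |= elems - {e}   # all other elements sharing this node
    return adj

# Three triangles in a strip: 0-1 and 1-2 share edges; 0 and 2 share
# only node 2, which still creates a graph edge.
mesh = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(dict(element_graph(mesh)))
```

A weighted version of such a graph is what gets handed to the graph partitioner; the edge weights then encode the communication volume implied by the shared nodes.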
PARMETIS contains several strategies for graph re-partitioning, in particular a multilevel method based on 'diffusing' load to adjacent partitions. The idea behind this multilevel technique is that from the original graph a hierarchy of coarser graphs is generated (by merging graph vertices into 'super-vertices'). A careful re-partitioning of the coarsest graph is computed, and this new partitioning is then successively 'projected' onto the next finer graph and improved. The latter is achieved by a load diffusion scheme. The co-ordinate partitioning option refers to the standard recursive co-ordinate bisection approach. In addition to providing a possible default partitioning scheme, it is also included to provide a possibility for a later investigation of a dual-partitioning approach (see [15]) for the contact-impact phase of the calculations within PAM-CRASH.
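Recursive co-ordinate bisection itself is simple to sketch: split the points at the median of their widest coordinate direction, then recurse on the two halves. This is the generic serial algorithm, not the DRAMA implementation.

```python
# Recursive co-ordinate bisection (RCB) sketch: returns a dict mapping
# each point id to a part number in 0..nparts-1.

def rcb(points, ids, nparts):
    if nparts == 1:
        return {i: 0 for i in ids}
    # choose the coordinate direction with the largest extent
    dim = max(range(len(points[0])),
              key=lambda d: max(points[i][d] for i in ids) -
                            min(points[i][d] for i in ids))
    order = sorted(ids, key=lambda i: points[i][dim])
    half = len(order) // 2          # split at the median
    left = rcb(points, order[:half], nparts // 2)
    right = rcb(points, order[half:], nparts - nparts // 2)
    part = dict(left)
    part.update({i: p + nparts // 2 for i, p in right.items()})
    return part

pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(rcb(pts, list(range(4)), 2))   # splits at the median x
```

Because the cuts are purely geometric, RCB is cheap and well suited to proximity-driven work such as contact detection, which is the motivation for the dual-partitioning idea mentioned above.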



Future Outlook

Following the library design and initial implementation phase, investigations are underway to verify that the re-partitioning defined by the library, based on the parameters and costs provided by code instrumentation via the library interface, leads to the maintenance of load balance within the continuing simulations. Results of very preliminary tests with the self-impacting box-beam model for the PAM-CRASH code (which includes the use of virtual elements defined by the contact-impact algorithms) are shown in Figure 5.

Initial Partition

After 1st re-partitioning

After 2nd re-partitioning

Figure 5: Preliminary re-partitioning results with PAM-CRASH interfaced to DRAMA

In the near future, validation and performance benchmarking will be carried out with both PAM-CRASH and FORGE3 simulations using (limited size) industrial examples. In addition, the possibilities for parallel mesh generation based on the combination of parallel re-meshing and re-partitioning through mesh migration will be demonstrated. In the latter stages of the project, the expectation is to achieve high scalability with large-scale industrial modelling with all three DRAMA application codes, including adaptive meshing and re-meshing. In addition to continuing modification of the re-partitioning modules currently included in the library, a co-operation with the University of Greenwich will lead to the inclusion of a DRAMA interface to a modified version of the Jostle mesh partitioning software ([16]).


References

[1] K. Stüben, H. Mierrendorff, C.-A. Thole and O. Thomas, Parallel industrial Fluid Dynamics and Structural Mechanics codes, 90-98, in [3], 1996
[2] J. A. Elliott, S. H. Brindle, A. Colbrook, D. G. Green and F. Wray, Real industrial HPC applications, 29-35, in [3], 1996
[3] H. Liddell, A. Colbrook, B. Hertzberger and P. Sloot (Eds.), Proceedings of the HPCN '96 Conference, Lecture Notes in Computer Science 1067, Springer-Verlag, 1996
[4] T. Coupez and S. Marie, From a direct solver to a parallel iterative solver in 3D forming simulation, Int. J. Supercomputer Applications and High Performance Computing, 11(4), 205-211, 1997
[5] T. Coupez, S. Marie and R. Ducloux, Parallel 3D simulation of forming processes including parallel remeshing and reloading, Numerical Methods in Engineering '96 (Proceedings of the 2nd ECCOMAS Conference, J.-A. Désidéri et al., Eds.), 738-743, Wiley, 1996
[6] J. Clinckemaillie, B. Elsner, G. Lonsdale, S. Meliciani, S. Vlachoutsis, F. de Bruyne and M. Holzner, Performance issues of the parallel PAM-CRASH code, Int. J. Supercomputer Applications and High Performance Computing, 11(1), 3-11, 1997
[7] G. Lonsdale, A. Petitet, F. Zimmermann, J. Clinckemaillie, S. Meliciani and S. Vlachoutsis, Programming crashworthiness simulation for parallel platforms, Mathematical and Computer Modelling, to appear.
[8] EUROPORT-D, ESPRIT HPCN Project No. 21102, World-Wide Web document: http://www.gmd.de/SCAI/europort-d/
[9] B. Maerten, A. Basermann, J. Fingberg, G. Lonsdale and D. Roose, Parallel dynamic mesh re-partitioning in FEM codes, Advances in Computational Mechanics with High Performance Computing (Proceedings of the 2nd Euro-Conference on parallel and distributed computing for computational mechanics, B.H.V. Topping, Ed.), Saxe-Coburg, 163-167, 1998
[10] The DRAMA Consortium, Library Interface Definition, DRAMA Project Deliverable D1.2a, 1998
[11] T. Coupez, Parallel adaptive remeshing in 3D moving mesh finite element, Numerical Grid Generation in Computational Field Simulation, Vol. 1 (B.K. Soni et al., Eds.), 783-792, Mississippi University, 1996
[12] C. Ozturan, H. L. de Cougny, M. S. Shephard and J. E. Flaherty, Parallel adaptive mesh refinement and redistribution on distributed memory computers, Comp. Meth. Mech. Engnrg., 119, 123-137, 1994
[13] G. Karypis, K. Schloegel and V. Kumar, PARMETIS: Parallel graph partitioning and sparse matrix ordering library, Version 1.0, Dept. of Computer Science, University of Minnesota, 1997
[14] K. Schloegel, G. Karypis and V. Kumar, Multilevel diffusion schemes for repartitioning of adaptive meshes, J. Parallel and Distributed Computing, 47, 109-124, 1997
[15] S. A. Attaway, E. J. Barragy, K. H. Brown, D. R. Gardner, B. A. Hendrickson and S. J. Plimpton, Transient solid dynamics simulations on the Sandia/Intel Teraflop computer, Supercomputing '97 (Proceedings on CD-ROM), Technical Paper, 1997
[16] C. Walshaw, M. Cross and M. Everett, Dynamic load-balancing for parallel adaptive unstructured meshes, Parallel Processing for Scientific Computing (M. Heath et al., Eds.), SIAM, 1997

11/11

Dynamic Re-Allocation of Meshes for parallel Finite Element Applications

A M A R D Project No. 24953 of the European Commission's ESPRIT Programme (Long Term Research) Project Partners and contributing personnel: CEMEF (Ecole des Mines/ARMINES): Engineering Systems International S.A.: K.U. Leuven: NEC Europe Ltd., C&C Research Laboratories: Transvalor S.A.: Contact for further information: Project Web-page:

T. Coupez, H. Digonnet J. Clinckemaillie, G. Thierry B. Maerten, D. Roose A. Basermann, J. Fingberg, G. Lonsdale R. Ducloux

G. Lonsdale (lonsdale @ ccrl-nece.technopark.gmd.de) http://www.cs.kuleuven.ac.be/cwis/research/natw/DRAMA.html

Project Overview Background to the developments The ESPRIT project DRAMA has been initiated to support the take-up of large scale parallel simulation in industry by dealing with the main problem which restricts the use of message-passing simulation codes - the inability to perform dynamic load-balancing. The particular focus of the project is on the requirements of industrial Finite Element codes, but codes using Finite Volume formulations will also be able to make use of the project results. The focus on the message-passing approach corresponds to the target of addressing large scale and thus highly scalable parallel applications. The most obvious cases where message-passing codes require dynamic load balancing are those where parallelisation via mesh partitioning is combined with adaptive meshing (as in local mesh refinement and coarsening) or adaptive re-meshing. However, as will be seen when considering the applications included within the DRAMA project, a need for dynamic load balancing arises in applications with fixed meshes where computational and/or communications costs vary greatly as the simulation progresses. Major advances have been made in recent years in the two areas which form the starting point for the project activities: the development of parallel mesh-partitioning algorithms suitable for dynamic repartitioning (re-allocation of sub-meshes to processors at run-time); the migration and optimisation of industrial-strength simulation codes to HPC platforms using the message-passing paradigm. However, most industrial-strength parallel simulations using large processor numbers are performed with static partitioning and non-adaptive meshing - or when adaptive meshing, then with a sequentialised repartitioning phase which greatly reduces the parallel performance. Thus, much of the exploitation within the end-user industry can currently be categorised as “exploratory installations”. The DRAMA 1/11

4th ECCOMAS Computational Fluid Dynamics Conference, Athens 7-11 September, 1998, Presentation at the Mini-Symposium “Dynamic Balancing: Current Status and Recent Progress”

project aims to bring together the developments in parallel partitioning and parallel FE applications to ensure that the potential of scalable computing can be achieved for fully-functional industrial simulation, which includes efficient adaptive meshing (and re-meshing) options. The parallel dynamic re-partitioning routines should also be to handle the full complexity and range of finite elements as used in industrial structural mechanics codes, as exemplified by the applications within the project.

The DRAMA Approach The central product of the project will be the DRAMA Library comprising various tools for dynamic re-partitioning of unstructured finite element applications. The core library functions will perform a parallel computation of a mesh re-allocation that will re-balance the costs of the application code based on the DRAMA cost model. The DRAMA cost model is able to take account of: dynamically changing computational and communications requirements. Furthermore, it is formulated in such a way that all information can be provided by the application based on its actual local data and measured costs (via code instrumentation). The library will provide support information to enable an efficient migration of the re-allocation between processors. Via the DRAMA Library, dynamic load balancing may be achieved which will enable scalable, efficient parallel FE applications, even with adaptive mesh refinement (coarsening) and re-meshing. As a by-product to this approach, fully parallel mesh generation will be enabled via exploitation of the parallel re-partitioning of adaptively generated meshes. The mesh re-allocation approach to dynamic load balancing will be demonstrated and validated by the leading industrial codes PAM-CRASH (for crashworthiness simulation), PAM-STAMP (for metal stamping / deep-drawing and related simulations), FORGE-3 (for forging with viscoplastic incompressible materials). Despite this emphasis on the validation codes within the project, the library has been designed to be general purpose. Since the final DRAMA library will be put into the public domain, it is hoped that a wide range of applications will be able to make use of the project results.

The DRAMA Applications While the technology to be developed has an impact for a wide range of applications, one possible ‘classical’ application being the adaptive shock-capturing features in aerodynamics codes, the DRAMA project focuses on structural mechanics codes whose large deformation simulations highlight the importance of the dynamic handling. The industrial simulation codes chosen for the validation of the DRAMA approach and library are representative of the wide-ranging finite element simulation codes which have a natural requirement for a re-partitioning library as parallelisation aid. All the DRAMA applications use time-marching as basic solution procedure and both explicit (PAMCRASH/-STAMP) and implicit (FORGE3) methods are included. Causes of load-imbalance, and resulting degradation of scalability, are: (a) a dynamic behaviour of computational cost per element and of the communication patterns ; (b) meshes which are changing during the calculation - adaptive meshing or re-meshing, including reshaping, refinement and coarsening. The self-impacting contactimpact algorithms used in PAM-CRASH are extreme cases of the former. Adaptive meshing is essential for codes like FORGE3 or PAM-STAMP where the large deformations would otherwise result in extremely severe distortions of the mesh elements.


FORGE-3 & Parallel adaptive re-meshing FORGE3 from Transvalor is an implicit finite element code designed for the simulation of three-dimensional metal forming. It is able to simulate the large deformations of viscoplastic incompressible materials with unilateral contact conditions. The code is based on a stable mixed velocity/pressure formulation using tetrahedral unstructured meshes and employs an implicit time stepping technique. Central to the Newton iteration dealing with the non-linearity arising from the material behaviour and the unilateral contact condition is an iterative procedure based on a conjugate residual method for the solution of the large linear system. The parallelisation of the full code, including adaptive re-meshing, was done within the EUROPORT project ([1]) employing a mesh partitioning approach. For forging simulations, the capability for re-meshing is a unique, competitive advantage of the FORGE3 code. A functioning 3-D parallel re-meshing procedure has been established which requires a re-partitioning stage ('element migration'), not only to avoid load imbalance but also to deal with the interface re-meshing. This has to date been achieved via a centralised re-allocation process, which becomes a bottleneck, especially for large problems or when a large number of processors is used. Figure 1 shows an example of the evolving mesh as local re-meshing is followed by re-partitioning and (subsequent) interface re-meshing as the local re-meshing procedure is performed at later time-steps.
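The conjugate residual iteration mentioned above can be sketched for a small symmetric system. This is a textbook formulation for illustration only, not the FORGE3 implementation (which operates on the large mixed velocity/pressure systems in parallel):

```python
import numpy as np

def conjugate_residual(A, b, x0=None, tol=1e-10, max_iter=200):
    """Minimal conjugate residual iteration for a symmetric system A x = b."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    r = b - A @ x
    p = r.copy()
    Ar = A @ r
    Ap = Ar.copy()
    rAr = r @ Ar
    for _ in range(max_iter):
        alpha = rAr / (Ap @ Ap)      # minimises the residual norm along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r
        rAr_new = r @ Ar
        beta = rAr_new / rAr
        rAr = rAr_new
        p = r + beta * p             # new search direction
        Ap = Ar + beta * Ap          # A @ p maintained without an extra product
    return x

# small symmetric positive-definite test system
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_residual(A, b)
```

For a symmetric system the method needs one matrix-vector product per iteration, which is what makes it attractive inside each Newton step.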

Figure 1: Adaptive re-meshing and re-partitioning for a crankshaft forging simulation with FORGE3 (A) initial mesh & partition; (B) parallel re-meshing without re-partitioning; (C) re-partitioning; (D) mesh and partition after several further increments.

The reader is referred to [2,3] and the references therein for further information.


PAM-CRASH & PAM-STAMP The PAM-CRASH and PAM-STAMP codes are two of the ESI/PSI Group products that are built around the PAM-SOLID core solver libraries. This means that they share the same basic algorithms and computational kernels, but include different algorithms and routines for application-specific functions. The most crucial components within the crashworthiness code PAM-CRASH are the contact-impact algorithms whose main feature, from the DRAMA viewpoint, is the dynamically changing computation and communications costs. Contact-impact algorithms are also crucial to the simulations performed by PAM-STAMP, but the much more significant parallelisation requirement is the efficient handling of adaptive meshing since around 90% of stamping applications rely on the adaptive meshing features. In contrast to the re-meshing approach adopted by FORGE3, PAM-STAMP uses a mesh-refinement (and coarsening) strategy based on the original user-defined mesh. An example of such a mesh can be seen in Figure 2. Leaving details of the algorithms and their parallelisation to [6,7] and the references therein, a summary of the PAM-SOLID-based codes is as follows: The non-linear explicit finite element method employed uses a Lagrangian formulation of the equations of motion of the nodes of the unstructured mesh constructed by the replacement of the physical model by an interconnected set of mechanical elements. Modelling of the materials involved in the physical model is done on an element level. This locality of discretisation enables all stress-strain calculations to be performed element-wise and the use of a simple central difference time-marching scheme for the, thus diagonalised, equations of motion. For most industrial models, the majority of elements employed are 4 node thin shell (reduced integration) elements. For PAM-CRASH, these are supplemented by a whole range of, in part highly specialised, mechanical elements.
The two dominant (in terms of CPU time) computational components are: the element-wise (and thus, with mesh partitioning, highly local) stress-strain calculations; contact-impact calculations. The contact algorithms serve to detect and correct penetration of structural components and have, in contrast to the stress-strain calculations, a pseudo-global nature. They first perform a proximity and penetration search, followed by a penetration correction procedure. An implementation (or practical usage) issue which affects parallelisation is that the contact calculations are performed only within user-defined (and not necessarily disjoint) areas, referred to as “slide-lines”. The current message-passing version of PAM-CRASH, further developed from the prototypes produced within the CAMAS-EUROPORT ([1]) and EUROPORT-D ([8]) projects, employs a static partitioning approach. This can lead to greatly reduced scalability due to the dynamically changing costs within the contact-impact slide-lines, whose distribution across processes may in any case be unbalanced.
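The explicit scheme described above - element-wise force evaluation followed by a central-difference update of the diagonalised (lumped-mass) equations of motion - can be sketched generically. This is illustrative only, not PAM-SOLID code; the single-spring example below stands in for the assembled internal force vector:

```python
import numpy as np

def central_difference_step(x, v_half, f_int, f_ext, m, dt):
    """One explicit central-difference step with a lumped (diagonal) mass matrix.

    x      : nodal positions at time t
    v_half : nodal velocities at t - dt/2 (leapfrog staggering)
    f_int  : internal (stress-divergence) nodal forces at t
    f_ext  : external nodal forces at t
    m      : lumped nodal masses (diagonal of the mass matrix)
    """
    a = (f_ext - f_int) / m          # diagonal "solve": no linear system needed
    v_half_new = v_half + dt * a     # velocity at t + dt/2
    x_new = x + dt * v_half_new      # position at t + dt
    return x_new, v_half_new

# single degree of freedom: unit spring and mass, i.e. a harmonic oscillator
k, m, dt = 1.0, 1.0, 0.01
x = np.array([1.0])
v = np.array([0.0])
for _ in range(int(round(2 * np.pi / dt))):   # integrate roughly one period
    x, v = central_difference_step(x, v, k * x, np.zeros(1), np.array([m]), dt)
```

Because the mass matrix is diagonal, each step is purely local per node, which is why the stress-strain phase partitions so well across sub-domains.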

Figure 2: Adaptive meshing as occurring in PAM-STAMP simulations


The DRAMA Library The DRAMA Library is designed to be called by parallel message-passing (MPI) finite element applications. The “expectation” of such applications is for the rapid provision of information about: a re-partitioning of the mesh which balances the costs occurring in the application; the interaction between processes required to achieve the re-partitioning. Given the normal complexity and application dependence of such algorithms, the actual data migration would not be expected of the library. Thus, the DRAMA library and its re-partitioning algorithms must be efficient, parallel (operating on distributed data) and must also take the current partition into account, in order to avoid high communication costs during the resulting data migration. Furthermore, the re-partitioning should be based on actually occurring costs, rather than on some abstract heuristic. The current library design and the re-partitioning modules included take these requirements into account through the careful definition of the cost model and the library interface. A summary of this strategy would be: “The DRAMA Library is designed to balance in parallel the actual costs occurring on the application's finite element mesh”.
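As a simple illustration of balancing measured rather than abstract costs, a calling application might monitor the ratio of the slowest process's measured cost to the average, and only request a re-partitioning when the imbalance outweighs the migration overhead. The criterion and threshold below are hypothetical, not part of the DRAMA interface:

```python
def imbalance(costs):
    """Ratio of the slowest process's measured cost to the average cost.

    1.0 means perfect balance; e.g. 1.25 means the critical path is 25%
    longer than it would be under ideal balance.
    """
    return max(costs) / (sum(costs) / len(costs))

def should_repartition(costs, threshold=1.1):
    # Trigger re-partitioning only when the imbalance exceeds a tolerance,
    # since re-partitioning and the subsequent data migration have a cost.
    return imbalance(costs) > threshold

measured = [9.0, 10.0, 14.0, 7.0]   # e.g. seconds per time step, per process
```

Here `imbalance(measured)` is 1.4, so a re-partitioning would be requested.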

The DRAMA Cost Model & Library Interface An introduction to the initial definition of the DRAMA Cost Model and the interface has been given in some detail in [9]. Full details of the library interface will be provided (November '98) in the publicly available project deliverable [10]. In the following, an overview of the features included in the cost model will be given, followed by the components of the library and a simple example of the mesh information transfer between the code and the DRAMA library. The DRAMA Library is written in C and uses MPI for message passing. The library may be called by applications written in either Fortran or C. The interface between the application code and the library is designed around the DRAMA cost model (which results in an objective cost function for the load-balancing re-partitioning algorithms) and the instrumentation of the application code to specify current and future computational and communication costs. The DRAMA cost model provides a measure of the quality of the current distribution and allows the prediction of the effect on the computation of moving some parts of the mesh to other sub-domains. Calculation and communication speeds of the processors are taken into account by a combination of hardware-specific parameters and costs which are based on time measurements and operation counts provided by application code instrumentation. Heterogeneous machine architectures can also be taken into account in this way. The essential feature is that the cost model is mesh-based, so that it is able to take account of the various workload contributions and communication dependencies that can occur in finite element applications. Being mesh-based, the DRAMA cost model includes both per-element and per-node computational costs.
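A drastically simplified sketch of such a mesh-based, speed-aware cost estimate is given below. The function and its arguments are illustrative stand-ins for the instrumented operation counts and hardware parameters described above, not the actual DRAMA cost model:

```python
def subdomain_time(elem_ops, node_ops, speed):
    """Predicted compute time of one sub-domain.

    elem_ops : instrumented operation counts, one entry per local element
    node_ops : instrumented operation counts, one entry per local node
    speed    : calibrated operations/second of the owning processor
               (different per processor on a heterogeneous machine)
    """
    return (sum(elem_ops) + sum(node_ops)) / speed

# the same sub-domain workload placed on two processors,
# the second one twice as fast as the first
t0 = subdomain_time([100, 120, 110], [10, 10, 10, 10], speed=1e3)
t1 = subdomain_time([100, 120, 110], [10, 10, 10, 10], speed=2e3)
```

Because the estimate is built per element and per node, the effect of moving any subset of the mesh to another sub-domain can be predicted by simply re-summing under the new ownership.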
Indeed, within a finite element code, part of the computations may be performed element-wise, for example, a matrix-assembly phase, while other operations are node-based, such as the update of physical variables and nodal co-ordinates or the solution of systems of linear equations. Furthermore, the inter-sub-domain communication is frequently carried out using node lists. Therefore, the cost model includes element-element, node-node, and element-node data dependencies. In addition to data dependencies between neighbouring elements and nodes in the mesh, dependencies between arbitrary parts of the mesh can occur. For the PAM-CRASH code, such data dependencies originate within the contact-impact algorithms when the penetration of mesh segments by non-connected nodes is detected and corrected. The DRAMA cost model (and of course the library interface) allows the construction of “virtual elements” which represent the occurring costs of such dependencies. The current library design includes several types of mesh re-partitioners that may be selected by the application: mesh-migration, graph partitioning and co-ordinate partitioning. An overview of these approaches will be given in the next sub-section. The library builds upon these partitioning options with modules to provide the interface to the full DRAMA input mesh and the cost monitoring parameters, and to deliver the full DRAMA output mesh and data migration information (old ↔ new mesh relationships). An overview of the library design is given in Figure 3.
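Since the actual data migration is left to the application, the migration information essentially tells each process which mesh entities change owner. The sketch below derives per-pair send lists from old and new element ownership; it is generic bookkeeping for illustration, not the DRAMA output format (which is defined in [10]):

```python
from collections import defaultdict

def migration_lists(old_owner, new_owner):
    """For each (source, destination) process pair, list the elements that
    must move. old_owner / new_owner map element id -> owning process."""
    sends = defaultdict(list)
    for elem, src in old_owner.items():
        dst = new_owner[elem]
        if dst != src:
            sends[(src, dst)].append(elem)
    return dict(sends)

# five elements, two processes, before and after re-partitioning
old = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
new = {1: 0, 2: 1, 3: 1, 4: 1, 5: 0}
moves = migration_lists(old, new)
```

Keeping the new partition close to the old one (as the DRAMA re-partitioners do) directly shrinks these send lists and hence the migration cost.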

Figure 3: The DRAMA Library Design

The reader is referred to [10] for details of the interface format. The following example will focus on the input and output DRAMA meshes and mesh relationships and will omit the definition of weights per element and node type and also the definition of timing and enumerated operation/communication counts. The numbering format used within the DRAMA library interface is a dual numbering which is globally unique - it combines a local node and element numbering with the number of the processor to which the node/element is “assigned”. Typical finite element applications with replicated nodes on sub-domain boundaries or with overlap/halo regions will be able to conform to this numbering, though mapping to and from this numbering will have to be performed by the application. The simple original mesh with its partitioning on two processes (using the above dual numbering) is shown in Figure 4a together with the two parts of the input mesh provided to the DRAMA Library (in parallel by the two calling application processes). The horizontal line within the table is a demarcation between the two parallel inputs. The resulting partition (in the updated numbering system) is shown in Figure 4b. Figure 4c shows the output from the DRAMA Library - tables giving each process its new partition and the data migration relationships.

DRAMA Input mesh (on the two processes)

Figure 4a: Existing partitioned example mesh and the DRAMA input mesh

Figure 4b: The resulting re-partitioned mesh


New Mesh, i.e. New partition

Figure 4c: DRAMA Output information
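The dual (processor, local number) numbering illustrated in Figures 4a-4c can be mimicked generically: local numbers are assigned consecutively per owning process, so each pair is globally unique without any global renumbering step. The mapping below starts from a hypothetical application-global numbering, for illustration only:

```python
def to_dual(global_ids, owner):
    """Map an application's global ids to (processor, local id) pairs.

    Local ids count up per owning process, so the pair (processor, local id)
    is globally unique even though local ids repeat across processes."""
    counters = {}
    dual = {}
    for gid in sorted(global_ids):
        p = owner[gid]
        counters[p] = counters.get(p, 0) + 1
        dual[gid] = (p, counters[p])
    return dual

# five entities distributed over two processes
owner = {10: 0, 11: 0, 12: 1, 13: 1, 14: 0}
dual = to_dual(owner.keys(), owner)
```

The application would keep such a table in both directions to translate between its own numbering and the library's dual numbering at each call.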


Re-Partitioning Modules As mentioned above, several types of mesh re-partitioners are available for selection by the application: mesh-migration, graph partitioning and co-ordinate partitioning. The mesh-migration approach uses the DRAMA cost model directly as a cost function when applying an iterative procedure in which processor pairs perform load balancing (or, more precisely, cost function balancing) by the logical exchange of elements between their sub-domains. While theoretical convergence proofs show that in worst cases the number of iterative steps grows with the square of the number of processes, practical experience with finite element meshes shows that load balance is achieved after only a small number of iterations. For further information see [11,12]. ‘Classical’ graph partitioning methods employing weighted graphs derived from either element or nodal mesh connections would, in general, be unable to fully account for the costs arising in a finite element application. The mesh-to-graph module of the DRAMA library constructs an appropriate weighted graph from the distributed mesh. Depending on the properties and the needs of the application, the resulting graph can be an 'element graph', a 'node graph', or a combined 'element-node graph'. The latter contains all possible relevant cost contributions for finite element codes. For a given partition, edges between nodes, elements or elements and nodes represent different communication requirements between processors. For instance, edges between elements and nodes lead to communication when a sub-domain possesses an element but not all its nodes. The combination of the mesh-to-graph module with a suitable graph partitioner results in a mesh partitioner based on the DRAMA cost model. Within the current version of the DRAMA library, the subsequent graph partitioning is carried out by calling routines from PARMETIS, the software package developed by Karypis et al., University of Minnesota ([13,14]).
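The pairwise balancing at the heart of the mesh-migration approach can be illustrated with a much-simplified sketch. Here the cost of a sub-domain is just its element count, and only the amount transferred between neighbouring process pairs is modelled - not the choice of which elements move, nor the DRAMA cost function itself:

```python
def pairwise_balance(loads, rounds=10):
    """Even out per-process element counts by repeated neighbour-pair
    exchanges (diffusion along a chain of processes)."""
    loads = list(loads)
    for _ in range(rounds):
        for i in range(len(loads) - 1):
            # shift half of the surplus from the heavier of the two neighbours
            shift = (loads[i] - loads[i + 1]) // 2
            loads[i] -= shift
            loads[i + 1] += shift
    return loads

# four processes with an initially very uneven element distribution
balanced = pairwise_balance([40, 10, 10, 20])
```

As with the real procedure, near-balance is typically reached after only a few sweeps over the process pairs, even though the worst-case bound is much larger.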
PARMETIS contains several strategies for graph re-partitioning, in particular a multilevel method based on 'diffusing' load to adjacent partitions. The idea behind this multilevel technique is that from the original graph a hierarchy of coarser graphs is generated (by merging graph vertices into 'super-vertices'). A careful re-partitioning of the coarsest graph is computed, and then this new partitioning is successively 'projected' onto the next finer graph and improved. The latter is achieved by a load diffusion scheme. The co-ordinate partitioning option refers to the standard recursive co-ordinate bisection approach. In addition to providing a possible default partitioning scheme, it is also included to provide a possibility for a later investigation of a dual-partitioning approach (see [15]) for the contact-impact phase of the calculations within PAM-CRASH.
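The recursive co-ordinate bisection idea can be sketched as follows: split the point set at the median along its longest co-ordinate direction, then recurse on the two halves. This is a generic 2-D illustration, not the DRAMA module itself:

```python
def rcb(points, depth):
    """Recursive co-ordinate bisection: split the point set at the median of
    its longest co-ordinate direction, then recurse, giving 2**depth parts."""
    if depth == 0:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # bisect along the direction in which the point set is most extended
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return rcb(pts[:mid], depth - 1) + rcb(pts[mid:], depth - 1)

# 8 mesh-node co-ordinates split into 4 equally sized parts
pts = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
parts = rcb(pts, 2)
```

Because the split uses only co-ordinates, the scheme is cheap and robust, which is why it serves well as a default and as a candidate for the geometric contact-phase partitioning mentioned above.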


Future Outlook Following the library design and initial implementation phase, investigations are underway to verify that the re-partitioning defined by the library, based on the parameters and costs provided by code instrumentation via the library interface, maintains load balance within the continuing simulations. Results of very preliminary tests with the self-impacting box beam model for the PAM-CRASH code (which includes the use of virtual elements defined by the contact-impact algorithms) are shown in Figure 5.

Initial Partition

After 1st re-partitioning

After 2nd re-partitioning

Figure 5: Preliminary re-partitioning results with PAM-CRASH interfaced to DRAMA

In the near future, validation and performance benchmarking will be carried out with both PAM-CRASH and FORGE3 simulations using (limited size) industrial examples. In addition, the possibilities for parallel mesh generation based on the combination of parallel re-meshing and re-partitioning through mesh migration will be demonstrated. In the latter stages of the project, the expectation is to achieve high scalability for large-scale industrial modelling with all three DRAMA application codes, including adaptive meshing and re-meshing. In addition to the continuing modification of the re-partitioning modules currently included in the library, a co-operation with the University of Greenwich will lead to the inclusion of a DRAMA interface to a modified version of the Jostle mesh partitioning software ([16]).


References
[1] K. Stüben, H. Mierrendorff, C.-A. Thole and O. Thomas, Parallel industrial Fluid Dynamics and Structural Mechanics codes, 90-98, in [3], 1996
[2] J. A. Elliott, S. H. Brindle, A. Colbrook, D. G. Green and F. Wray, Real industrial HPC applications, 29-35, in [3], 1996
[3] H. Liddell, A. Colbrook, B. Hertzberger and P. Sloot (Eds.), Proceedings of the HPCN '96 Conference, Lecture Notes in Computer Science 1067, Springer-Verlag, 1996
[4] T. Coupez and S. Marie, From a direct solver to a parallel iterative solver in 3D forming simulation, Int. J. Supercomputer Applications and High Performance Computing, 11(4), 205-211, 1997
[5] T. Coupez, S. Marie and R. Ducloux, Parallel 3D simulation of forming processes including parallel remeshing and reloading, Numerical Methods in Engineering '96 (Proceedings of the 2nd ECCOMAS Conference, J.-A. Désidéri et al., Eds.), 738-743, Wiley, 1996
[6] J. Clinckemaillie, B. Elsner, G. Lonsdale, S. Meliciani, S. Vlachoutsis, F. de Bruyne and M. Holzner, Performance issues of the parallel PAM-CRASH code, Int. J. Supercomputer Applications and High Performance Computing, 11(1), 3-11, 1997
[7] G. Lonsdale, A. Petitet, F. Zimmermann, J. Clinckemaillie, S. Meliciani and S. Vlachoutsis, Programming crashworthiness simulation for parallel platforms, Mathematical and Computer Modelling, to appear.
[8] EUROPORT-D, ESPRIT HPCN Project No. 21102, World-Wide Web document: http://www.gmd.de/SCAI/europort-d/
[9] B. Maerten, A. Basermann, J. Fingberg, G. Lonsdale and D. Roose, Parallel dynamic mesh re-partitioning in FEM codes, Advances in Computational Mechanics with High Performance Computing (Proceedings of the 2nd Euro-Conference on parallel and distributed computing for computational mechanics, B.H.V. Topping, Ed.), Saxe-Coburg, 163-167, 1998
[10] The DRAMA Consortium, Library Interface Definition, DRAMA Project Deliverable D1.2a, 1998
[11] T. Coupez, Parallel adaptive remeshing in 3D moving mesh finite element, Numerical Grid Generation in Computational Field Simulation, Vol. 1 (B.K. Soni et al., Eds.), 783-792, Mississippi State University, 1996
[12] C. Ozturan, H. L. de Cougny, M. S. Shephard and J. E. Flaherty, Parallel adaptive mesh refinement and redistribution on distributed memory computers, Comp. Meth. Appl. Mech. Engrg., 119, 123-137, 1994
[13] G. Karypis, K. Schloegel and V. Kumar, PARMETIS: Parallel graph partitioning and sparse matrix ordering library, Version 1.0, Dept. of Computer Science, University of Minnesota, 1997
[14] K. Schloegel, G. Karypis and V. Kumar, Multilevel diffusion schemes for repartitioning of adaptive meshes, J. Parallel and Distributed Computing, 47, 109-124, 1997
[15] S. A. Attaway, E. J. Barragy, K. H. Brown, D. R. Gardner, B. A. Hendrickson and S. J. Plimpton, Transient solid dynamics simulations on the Sandia/Intel Teraflop computer, Supercomputing '97 (Proceedings on CD-ROM), Technical Paper, 1997
[16] C. Walshaw, M. Cross and M. Everett, Dynamic load-balancing for parallel adaptive unstructured meshes, Parallel Processing for Scientific Computing (M. Heath et al., Eds.), SIAM, 1997
