DIPLOMARBEIT

A Highly Versatile Tool Chain for Robust Computational Fluid Dynamics Simulations

ausgeführt am Institut für Mikroelektronik der Technischen Universität Wien

unter Anleitung von
Ao.Univ.Prof. Dipl.-Ing. Dr.techn. Erasmus Langer,
Dipl.-Ing. Dr.techn. Josef Weinbub, BSc und
Dipl.-Ing. Dr.techn. Philipp Schwaha

durch

Georg Mach, BSc
Lazarettgasse 41/6
A-1090 Wien, Österreich
Matr. Nr. 9925083
geboren am 24. Dezember 1979, Wien

Wien, im Dezember 2016

Abstract

The challenges of Computational Fluid Dynamics require careful preparation of models in order to ensure successful and correct calculations with reasonable resource consumption. This careful preparation complicates the simple and wide use of a single simulation package for multiple different problem domains. This work introduces a robust tool chain which allows for a versatile application of Computational Fluid Dynamics calculations to greatly differing problem domains. Generic concepts ensuring and enhancing robustness and versatility are at the core of the thesis. These generic concepts are applied in the construction of a concrete Computational Fluid Dynamics tool chain, which stretches from problem definition to final evaluation. The performance of the software is of great importance in order to allow for application on a wide variety of different hardware; performance is thus crucial for ensuring a high degree of versatility. To put the versatility and robustness to the test, two applications from very different scientific disciplines are selected. On the one hand, simulations are carried out to analyze water cooled heat sinks in both a consumer and an industrial application, which represents a classic engineering task. On the other hand, the tool chain is also applied to medical input data in the form of Magnetic Resonance Imaging scans, which are processed to construct models for biomedical stress studies. This provides access to physical properties (e.g. velocity profiles and shear stress) otherwise inaccessible in situ. The applicability of the tool chain to these thematically different topics illustrates not only the claims of both versatility and robustness, but also that the performance of the software is not degraded by the developed generic approach.


Kurzfassung

Die Herausforderungen von computergestützter Fluiddynamik erfordern die gewissenhafte Vorbereitung der Modelle, um erfolgreiche und korrekte Berechnungen mit vertretbarem Aufwand gewährleisten zu können. Diese wohl überlegten Vorarbeiten erschweren allerdings den einfachen Einsatz der einzelnen Simulationspakete für eine Vielzahl an Problemstellungen. Diese Diplomarbeit präsentiert eine robuste und universell einsetzbare Kette von Werkzeugen, die computergestützte Fluiddynamik-Simulationen in unterschiedlichsten Problemstellungen ermöglicht. Kernthema dieser Arbeit sind generische Konzepte zur Gewährleistung und Erhöhung von Versatilität und Robustheit. Diese generischen Ansätze werden bei der Erstellung einer konkreten Werkzeugkette für computergestützte Fluiddynamik angewandt, die sich von der Definition der Problemstellung bis zur abschließenden Auswertung erstreckt. Um die Software auf möglichst vielen unterschiedlichen Hardware-Umgebungen einsetzen zu können, spielt Performanz eine große Rolle und ist daher ein unverzichtbarer Faktor, um einen hohen Grad an Versatilität sicherzustellen. Um die Robustheit und Vielseitigkeit zu testen, wurden zwei Anwendungen aus sehr unterschiedlichen Wissenschaftsdisziplinen ausgewählt. Auf der einen Seite wurden Simulationen zur Analyse von Wasserkühlungen sowohl aus dem Endverbraucherbereich als auch aus dem industriellen Einsatz durchgeführt, was eine klassische Ingenieursaufgabe darstellt. Auf der anderen Seite wurde die Werkzeugkette auch auf medizinische Bilddaten aus Magnetresonanztomographien angewandt, die verarbeitet wurden, um Modelle für biomedizinische Stresstests zu bekommen. Dadurch wurden physikalische Größen (z.B. Geschwindigkeitsprofile und Scherspannungen) zugänglich gemacht, die vor Ort nicht bestimmt werden können. Die Anwendbarkeit der Werkzeugkette auf diese inhaltlich unterschiedlichen Thematiken illustriert nicht nur die Anforderungen an Versatilität und Robustheit, sondern auch, dass die Performanz der Software nicht unter den entwickelten generischen Ansätzen leidet.


Acknowledgments

First, I want to thank Prof. Erasmus Langer for supervising my master's thesis. My sincere thanks go to Josef Weinbub, who undertook the tasks of being my second supervisor and reviewing this work on very short notice. During our shared time at the institute we had many exhilarating discussions and inspiring projects I am very thankful for. I cannot thank Philipp Schwaha enough, who has supported me in many ways from my first days at the Institute for Microelectronics until the completion of this work. He answered questions regarding operating systems and programming just as well as queries concerning the English language. Together with René Heinzl, to whom special thanks are given too, Philipp introduced me to meshing, generic programming and scientific computing. I also want to thank Prof. Siegfried Selberherr for the possibility to work at the institute, participate in projects and even visit a conference as an undergraduate student.

Many thanks go to all my colleagues, especially to Michael Stadler and Franz Stimpfl, for fruitful discussions, joint projects and enjoyable lunch breaks. Furthermore, I want to express my gratitude to Camillo Sherif for introducing me to the exciting field of biomedical simulations and to Christoph Lehrenfeld for many bug fixes and improvements of ngs-flow on short notice. I also thank the Cerebrovascular Research Group Vienna for providing the opportunity to use the Comsol Multiphysics suite for my thesis.

This work could not have been done without the steady support of many friends. Thanks for discussions, proofreading or providing a place to work without any distractions. Special thanks go to Florian Gerstl for pushing me every now and then and being a true companion. Last but not least I want to thank my beloved ones. I could not have finished my studies without the help of my parents and my sister, who helped me in many different ways and always believed in me. Also the backing of Giulia Raberger was crucial for the success of my work. I am grateful for all sacrifices she made.


Contents

1 Introduction
  1.1 Computational Modeling
    1.1.1 Model Generation and Formalization
    1.1.2 Simulation
  1.2 Robustness and Versatility
    1.2.1 Definitions
    1.2.2 Tool Chain
    1.2.3 Application
  1.3 Overview
2 Theoretical Background
  2.1 Fluid Motion
  2.2 Heat Transfer
  2.3 Interaction Between Heat and Motion
3 Concepts for Robust CFD Simulations
  3.1 Modern Programming Paradigms
    3.1.1 Modularity and Encapsulation
  3.2 Parallel Computing
  3.3 Improvements on Resolution and Contrast (Image Post-Processing)
    3.3.1 Digital Subtraction
    3.3.2 Layer Shifting
  3.4 Smoothing and Cropping (Geometry Post-Processing)
  3.5 Mesh Generation
    3.5.1 Interfaces
    3.5.2 Mesh Generation Strategies
    3.5.3 Mesh Refinement and Adaption
  3.6 Discretization
  3.7 Time Integration
4 Software
  4.1 Tool Chain Overview
  4.2 Constructive Solid Geometry
  4.3 Imaging and Image Post-Processing
    4.3.1 Insight Segmentation and Registration Toolkit
  4.4 Segmentation and Geometry Post-Processing
    4.4.1 itkSNAP
    4.4.2 Vascular Modeling Toolkit
    4.4.3 MeshLab
  4.5 Mesh Generation
    4.5.1 Generic Mesh Generation Interface
    4.5.2 Ballistics Research Laboratory Computer-Aided Design
    4.5.3 Triangle and Show Me
    4.5.4 Computational Geometry Algorithms Library
    4.5.5 Tetgen and Tetview
    4.5.6 Netgen
  4.6 Computational Fluid Dynamics Simulation
    4.6.1 NGSolve and ngs-flow
    4.6.2 Comsol Multiphysics
  4.7 Conclusions for the Tool Chain Used
5 Applications
  5.1 Thermal Management
    5.1.1 Problem to Solve
    5.1.2 Problem Modeling
    5.1.3 Results
  5.2 Biomedical Stress Calculation
    5.2.1 Problem to Solve
    5.2.2 Problem Modeling
    5.2.3 Results
6 Summary and Outlook
Acronyms and Definitions
Bibliography
Eidesstattliche Erklärung

List of Tables

3.1 Comparison of the systems used for the performance tests.
4.1 Comparison of the advantages and disadvantages of itkSNAP.
4.2 Comparison of the advantages and disadvantages of VMTK.
4.3 Comparison of the advantages and disadvantages of Triangle.
4.4 Comparison of the advantages and disadvantages of CGAL.
4.5 Comparison of the advantages and disadvantages of Tetgen.
4.6 Comparison of the advantages and disadvantages of Netgen.
4.7 Comparison of the advantages and disadvantages of the NGSolve suite.
4.8 Comparison of the advantages and disadvantages of Comsol Multiphysics.
4.9 Usage of the described software tools within the investigated tool chains for the evaluation applications in Chapter 5.
5.1 Material properties used in the first example.
5.2 Material properties used in the simulations of the second example.
5.3 Comparison of the characteristics of all simulated designs.
5.4 Comparison of the results of both simulated designs.
5.5 Comparison of the results of all three positions.

List of Listings

3.1 simple C approach for traversing using multidimensional arrays
3.2 generic approach of GSSE to traverse over a container of vertices
3.3 C++11 approach for a traversal algorithm, using std::tuple for the container
3.4 Example for task parallelism
3.5 Example for data parallelism
3.6 Layer Shifting Algorithm

List of Figures

1.1 The sequence of the tool chain. The colors indicate the steps of computational simulations: Structure definition (blue), meshing (green), formalization (yellow) and the simulation itself (red).
3.1 Performance (iterations per second) of generic code compared to simple C implementation [9] on an Intel P4 (left) and an AMD64 (right).
3.2 Results of the performance tests: Performance (iterations / s) of three different implementations of a traversal algorithm on an Intel i7 (upper left), an AMD Turion II (upper right) and an ARM Cortex-A7 (lower left); memory usage for all three implementations measured on the Intel i7 (lower right).
3.3 DSA of a rabbit. The sequence shows the mask (native sequence without contrast agent) on the left, the contrast enhanced image (where the arteries are apparent) in the middle and the subtraction image (with the arteries only) on the right.
3.4 Layer shifting used to increase the resolution of the image: Optimal sequence, scanned sequences, combination and enhanced sequence.
3.5 Demonstration of the need for smoothing: A segmented structure (blue) and the correspondent mesh (green) before (left) and after (right) smoothing.
3.6 Structure of the Generic Mesh Generation Interface. Each kernel module is available at each step of the process.
3.7 Hybrid meshes. The black parts are meshed using structured grids, the blue parts are unstructured. Left: Two-dimensional (2D) mesh of a bifurcation. Right: Three-dimensional (3D) mesh of a simplified model of the Raspberry Pi 1 A+.
3.8 Surface mesh generation using the advancing front method. The rules on the left are applied to the problem on the right.
3.9 Left: Explanation of the Delaunay criterion. In the left variant the points are within the circumcircles of the respective other triangle, in the right variant the circumcircles are empty. Thus the minimum inner angles have been maximized. Right: Duality of the Delaunay triangulation (black) and the Voronoi tessellation (red). The dotted lines are prolongations of Voronoi edges (dashed lines), the edges of the mesh are displayed as continuous lines.
3.10 A bar before (left) and after (right) adaptive refinement. The color represents the strain on the bar, the gradient of strain was used to determine the refinement.
3.11 Several refinement candidates for a fourth order element. The size h is depicted directly as size of the triangles, the number in and color of the triangles indicate the polynomial degree p.
3.12 Left: Domain decomposition of a volume mesh of a transformer into four subdomains. For better view, the meshes of the coils (blue, green) have been partly removed. The core has been split into two subdomains (yellow: Prism elements - red: Tetrahedra elements). Right: Partitioning of a mesh (black) into disjoint element sets (red, green) along the domain boundary (cyan). These sets are divided between processors (0 - green, 1 - red). The boundary nodes are assigned to the processor with lower ID (0 - green).
3.13 Degrees of freedom of different linear Galerkin FEMs in 2D: Left: CG-FEM, center: DG-FEM, right: HDG-FEM.
3.14 The continuity of tangential (dashed) and normal (dotted) components of different Galerkin FEMs. Continuous components are marked with green parallel arrows, discontinuous components are marked with red antiparallel arrows. Left: CG-FEM, center: Divergence free DG-FEM, right: Divergence free HDG-FEM.
4.1 Possible combinations of tools in the chain when using CSG.
4.2 Possible combinations of software tools when using imaging and segmentation.
4.3 CSG tree example. The operations are labeled as follows: ∪ for union, ∩ for intersection, \ for difference.
4.4 Two basic segmentation methods: Left: Threshold technique, right: Edge based method with seeding.
4.5 Surface reconstruction of MRI images with VMTK. Left: Selection of the region of interest, right: Reconstructed surfaces of blood vessels.
4.6 Smoothing of a surface with MeshLab. Left: Before smoothing, a lot of artifacts like sharp edges and steps are present, right: Post-processing significantly reduces the artifacts. Color implies curvature in respect to the current extreme values.
4.7 Meshing of the TU Wien logo with Triangle. Left: Geometry given by points, segments (blue lines) and holes (red crosses), right: Resulting mesh with quality criteria set.
4.8 Meshing of a simple cylinder containing a sphere with Tetgen. Left: Geometry to mesh given by the nodes, right: Resulting mesh (clipped).
4.9 Meshing of a water cooling device for CPUs with Netgen. Left: Geometry to mesh, right: Meshing process.
4.10 Simulation of a water cooling device for CPUs with NGSolve and ngs-flow. Left: Boundary conditions and model components, right: Results of the simulation.
4.11 Simulation of a human aortic arch with Comsol Multiphysics. Left: Model tree with equations, right: Results of the simulation.
5.1 Left: Installed water-cooling system. Right: Diagram of the cooling circuit consisting of the heat sink, the radiator unit to cool the water and a pump as well as an expansion tank.
5.2 Geometry of the original CPU heat sink design in upright position as it is mounted in a tower casing.
5.3 Tool chain used to simulate the CPU heat sink examples.
5.4 Screenshots from an IBM Research video [96]. From upper left to lower right: Highest to lowest layer of a hot water cooling heat sink using microchannels. In the first picture the connections can be seen, the last picture shows the direct contact of the water with the processor.
5.5 Tool chain used to simulate the hot water cooling examples.
5.6 Meshes of two different cooler designs. Left: Original design found in a cooler of a server (1a). Right: Same design but thicker base plate (1b).
5.7 Velocity streamlines of the water in the cooler designs 1a and 1b. The color indicates the velocity magnitude from blue (lower velocities) to red (higher velocities).
5.8 Temperature of the copper element and the surrounding water (cut through the cooler) in designs 1a and 1b. The minimum and maximum values of the scale are given in white and black respectively.
5.9 Distribution of the temperature at the bottom of the base plate in designs 1a and 1b. The minimum and maximum values of the scale are given in white and black respectively.
5.10 Meshes of two cooler designs with pyramids. Left: Design 2a, right: Design 2b with shorter outflow cavity.
5.11 Velocity streamlines of the water in the cooler designs 2a and 2b. The color indicates the velocity magnitude from blue (lower velocities) to red (higher velocities).
5.12 Temperature of the copper element and the surrounding water (cut through the cooler) in designs 2a and 2b. The minimum and maximum values of the scale are given in white and black, respectively.
5.13 Distribution of the temperature at the bottom of the base plate in designs 2a and 2b. The minimum and maximum values of the scale are given in white and black, respectively.
5.14 Meshes of three cooler designs with Manhattan type structure. Left: Design 3a with five cuts each in both directions of the base plate. Middle: Design 3b with only three cuts each in both directions of the base plate. Right: Design 3c with three cuts each in both directions of the base plate and one in each diagonal direction.
5.15 Velocity streamlines of the water in the cooler designs 3a, 3b and 3c. The color indicates the velocity magnitude from blue (lower velocities) to red (higher velocities).
5.16 Temperature of the copper element and the surrounding water (cut through the cooler) in designs 3a, 3b and 3c. The minimum and maximum values of the scale are given in white and black, respectively.
5.17 Distribution of the temperature at the bottom of the base plate in designs 3a, 3b and 3c. The minimum and maximum values of the scale are given in white and black respectively.
5.18 Geometry and mesh of a cooler. The designs 2a and 3c have been combined to design 4.
5.19 Simulation results for design 4. The minimum and maximum values of the temperature scales are given in white and black respectively. Left: Velocity streamlines of the water. The color indicates the velocity magnitude from blue (lower velocities) to red (higher velocities). Middle: Temperature of the copper element and the surrounding water. Right: Distribution of the temperature at the bottom of the base plate.
5.20 Construction of the first layer of the IBM microchannel structure. Left: Basic element. Middle: Inflow and outflow element with smoothed corners. The outflow element is extended by a cone. Right: Array of elements.
5.21 Construction of the second layer of the IBM microchannel structure. Left: One element with an additional connection pipe. Middle: Array of the elements. Right: The outflow elements are added to complete the second level.
5.22 The elements of the first layer marked in blue are not connected to the second layer and have to be removed. This problem occurs on two of the four edges of the structure.
5.23 Final IBM microchannel structure. Left: The cavity directly upon the processor has been added connecting the in- and outflow channels. Middle: Copper (depicted in brown) is added to the water-bearing channels (marked blue). Right: Depiction of the inflows (shown in blue) and the outflow elements (marked red).
5.24 Mesh of the simulated IBM microchannel structure. The color indicates the size of the tetrahedra relative to the mesh volume. The mesh is clipped to reveal the inner structure. Blue shows fine, white normal and red coarse elements.
5.25 Quality of the mesh elements depending on the aspect ratios of the edges. Only the 1% of all tetrahedra falling into the worst quality category (0.01 - 0.31) are shown. The histogram shows the distribution of all mesh elements over quality from 0 to 1.
5.26 Results of the CFD simulation of the IBM heat sink: The pressure levels are depicted in the upper picture, whereas the lower image shows the velocity of the water stream.
5.27 Results of the heat transfer simulation of the IBM heat sink shown from below: Temperature (left) and magnitude of temperature gradient (right). The minimum and maximum values are taken from the contact plane.
5.28 CSG building process of the advanced microchannel structure. Upper row: The basic element (left), one in- and outflow each (middle); an array of first layer elements (right). Lower row: A complete second layer element with in- and outflow (left), the completed water bearing structure including the connecting cavity (middle) and the whole heat sink with (brown) copper and (blue) water (right).
5.29 Clipped mesh of the advanced microchannel structure. The color indicates the size of the tetrahedra relative to the mesh volume (from blue to red showing finer to coarser).
5.30 Quality of the mesh elements depending on the aspect ratios of the elements. The worst 1% of all tetrahedra are shown, the histogram reveals the dispersion of all mesh elements over quality.
5.31 Results of the fluid motion simulation of the advanced heat sink: The upper picture shows the pressure levels, velocity of the water stream is depicted in the lower image.
5.32 Results of the heat transfer simulation of the advanced heat sink shown from below: Temperature (left) and magnitude of temperature gradient (right). The minimum and maximum values are again taken from the contact plane.
5.33 Cerebral aneurysm [106]. Left: Angiogram image and 3D reconstruction of a large aneurysm at the basilar artery. Right: Intraoperative surgical images of a large intracranial aneurysm before treatment (A) and after successful clipping (B).
5.34 The consequence of the rupture of an intracranial aneurysm can be a hemorrhagic stroke. The diagram depicts the bleeding following such a rupture.
5.35 Gross appearance of a subarachnoid hemorrhage [107]. Left: The ruptured aneurysm at the circle of Willis is marked by a white arrow on a black card. Right: The basilar artery was marked with a yellow ∗, the ruptured aneurysm with a yellow ◦.
5.36 CT used to diagnose a hemorrhagic stroke. The white areas are bone, nearly black areas are liquor, gray areas are brain and nearly white areas represent blood. Left: CT image of a healthy brain [109]. Right: CT image of a brain with a subarachnoid hemorrhage (surrounded by a red line) and following intraventricular hemorrhage (surrounded by a green line) [110].
5.37 Time-dependent velocity of blood. Left: Doppler sonogram of a human artery [118]. Right: Flow profile derived from averaging different human Doppler sonograms.
5.38 Tool chains used to simulate the blood flow examples.
5.39 (From left to right) Meshes of the positions A, B and C of the aneurysm sac.
5.40 (From left to right) Velocity of the blood flow (from blue to red implying slower to faster) for the positions A, B and C of the aneurysm sac.
5.41 (From left to right) Absolute value of the tangential components (shear stress) of the surface forces (from blue to red implying lower to higher) for the positions A, B and C of the aneurysm sac. The lower images show a cross section of the simulated vessels including the stream lines depicting the velocity of the blood stream.
5.42 (From left to right) Absolute value of the normal components of the surface forces (from blue to red implying lower to higher) for the positions A, B and C of the aneurysm sac. Again the lower images show a cross section of the simulated vessels including the stream lines depicting the velocity of the blood stream.
5.43 Generation of an artificial bifurcation with an aneurysm model. Left: Schematic diagram showing the artificial bifurcation of the parent vessels (1, 2) and a venous pouch as an aneurysm model (∗) [119]. Right: Inter-operational picture showing the completed bifurcation (1, 2) and the blood-filled aneurysm (∗) [120].
5.44 Segmentation from MRI images. Left: Segmentation of the parent vessels and the recanalized part (∗) of the aneurysm. Right: Segmentation of the parent vessels, the embolized part of the aneurysm sac (◦) and the recanalized part (∗).
5.45 Meshes of the segmented structures. Left: Mesh of the whole structure including the parent vessels, the embolized part of the aneurysm sac (◦) and the recanalized part (∗). Right: Finer mesh including a smaller part of the parent vessels and the recanalized part (∗) only.
5.46 Velocity of the blood flow (from blue to red implying slower to faster). Upper row: Coarser mesh including the embolized part of the aneurysm sac. Lower row: Finer mesh including a smaller part of the parent vessels and the recanalized part only. Left column: Flow through the parent vessels and the recanalized part. The embolized part of the aneurysm sac has no flow. Right column: Turbulences caused by the recanalized part.
5.47 Absolute value of the tangential components (shear stress) of the surface forces (from blue to red implying lower to higher). Upper left: Overview, upper right: Detailed view including the embolized part of the aneurysm sac. Lower left: Finer mesh including the parent vessels and the recanalized part only. Lower right: Cut through the finer mesh showing not only the shear stress but also the streamlines of the blood flow coming from the left.
5.48 Absolute value of the normal component of the surface forces including pressure (from blue to red implying lower to higher). Upper left: Overview, upper right: Detailed view including the embolized part of the aneurysm sac. Lower left: Finer mesh including the parent vessels and the recanalized part only. Lower right: Cut through the finer mesh showing not only the surface forces but also the streamlines of the blood flow coming from the right.
6.1 Tool chain presented in this work. All possible combinations of the modules are depicted with arrows.

Chapter 1

Introduction

Computational Fluid Dynamics (CFD) is a branch of fluid mechanics dedicated to solving problems involving the flow of fluids by means of computer simulations. Since many of these problems do not admit analytical solutions in general, numerical methods and algorithms are used to compute approximations. The following parts of this chapter give a general introduction to the topic of the thesis, provide an overview of the structure of this work and define the ideas of robustness and versatility.

1.1 Computational Modeling

Even though simulations certainly cannot be properly utilized in each and every application, in many cases they are a fast and cheap alternative to other development techniques like prototyping. In other cases simulations even provide access to physical properties which elude direct measurement and other forms of determination. However, to get meaningful and accurate results, the principles of modeling (setting up a computational model of a real process or system) have to be respected in any case [1]. Therefore, the following six main stages of simulation have to be completed:

1. Set up a model, including all substantial characteristics of the actual system, but utilize simplification for economic reasons and ease of perception.
2. Formulate the model in mathematical expressions, for example using coupled differential equations.
3. Solve the set of equations using simulation tools to find steady states.
4. Analyze the stability of the found solutions. This step is crucial to ensure the validity of subsequent conclusions.
5. Determine practical applications of the used model. Since the characteristics of the system have been investigated already in the first step, solid fundamentals are provided.
6. Analyze the model used. If any flaws are found, return to step one.

The first three stages (model generation, formalization and simulation) are briefly described in the following two subsections. The subsequent subsections define the ideas of robustness and versatility and summarize the situation at the time when this work was started as well as some points where improvements could be made.


1.1.1 Model Generation and Formalization

The aim of model generation and formalization is to set up a computational model including the geometrical data as well as all relevant characteristics of the materials and systems involved in a mathematical and computer-readable language. Therefore, the necessary steps include the identification of the relevant formalisms and their expression as equations. All parameters required for the equations due to the involved materials need to be identified. Finally, a mesh on which to calculate the solutions needs to be constructed.

In general, it is not always possible or suitable to solve the differential equations directly. In such cases, numerical methods like the Finite Element Method (FEM) have to be used. Therefore, and because of the limitation of memory and computational power, the simulation domain has to be represented not by continuous areas but by distinct points, edges, faces and solids. To get the needed geometric information into a computer-readable form there are two possibilities:

• Given a construction plan (i.e., all data specifying the needed geometrical elements like points, edges and so on), the geometry can be modeled using standard geometrical bodies like planes, cylinders and spheres. This technique is called Constructive Solid Geometry (CSG) [2], see Section 4.2; a small illustrative sketch follows at the end of this subsection.
• Given images (like tomography images) or similar data sets of the structure to be modeled, the borders between distinct areas can be reconstructed by segmentation (see Section 4.4), dividing the image or data set into parts of different tissues or materials. Thereby the geometrical model is obtained.

Once the geometry has been (re)constructed, the simulation domains have to be locally discretized. This process is called meshing. The different techniques are described in Chapter 3. Not only the simulation domains have to be discretized, but also the continuous differential equations have to be transformed to discrete difference equations to make them accessible to numerical computing.
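To make the CSG idea concrete, the following purely illustrative C++ sketch (not taken from any of the tools discussed later) represents solids by point-membership tests and combines two primitives with the boolean operations union, intersection and difference:

    #include <cstdio>
    #include <functional>

    // A solid is represented by its point-membership test: does (x, y, z) lie inside?
    using Solid = std::function<bool(double, double, double)>;

    Solid makeSphere(double cx, double cy, double cz, double r)
    {
        return [=](double x, double y, double z) {
            double dx = x - cx, dy = y - cy, dz = z - cz;
            return dx * dx + dy * dy + dz * dz <= r * r;
        };
    }

    Solid makeBox(double x0, double y0, double z0, double x1, double y1, double z1)
    {
        return [=](double x, double y, double z) {
            return x >= x0 && x <= x1 && y >= y0 && y <= y1 && z >= z0 && z <= z1;
        };
    }

    // Boolean operations forming the nodes of a CSG tree.
    Solid unite(Solid a, Solid b)     { return [=](double x, double y, double z) { return a(x, y, z) || b(x, y, z); }; }
    Solid intersect(Solid a, Solid b) { return [=](double x, double y, double z) { return a(x, y, z) && b(x, y, z); }; }
    Solid subtract(Solid a, Solid b)  { return [=](double x, double y, double z) { return a(x, y, z) && !b(x, y, z); }; }

    int main()
    {
        // A unit box with a spherical cavity cut out of its center.
        Solid model = subtract(makeBox(0, 0, 0, 1, 1, 1), makeSphere(0.5, 0.5, 0.5, 0.3));
        std::printf("corner inside: %d, center inside: %d\n",
                    model(0.05, 0.05, 0.05), model(0.5, 0.5, 0.5));
        return 0;
    }

Real CSG kernels, such as the one used by Netgen, of course operate on analytic surface descriptions rather than pure membership tests, but the tree of boolean operations is the same concept.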

1.1.2 Simulation

The aim of the simulation itself is to find a solution to the difference equations derived in the previous step. The solution of the difference equations also has to be a solution of the differential equations from which they have been derived. This ensures the validity of the outcome. Subsequently, the results have to be provided in a format suitable for analysis (like visualization, data tables or other forms of data analysis). Again, the limited memory and computational power of computer systems are the main constraints to consider while choosing algorithms and parameters for solving. Nevertheless, the fit of the solver (the algorithm used to find a solution of the equations) to the system of equations has the biggest impact on both the validity of the results and the pace of the computation. Different solvers are described in Chapter 3 and are broadly discussed in the literature [3].
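As a minimal illustration of these two steps (an assumption for illustration only, not part of the presented tool chain), the following sketch discretizes the one-dimensional Poisson equation -u''(x) = f(x) with homogeneous Dirichlet boundary conditions on a uniform grid and solves the resulting difference equations with a simple Jacobi iteration; for f(x) = 1 the exact solution u(x) = x(1 - x)/2 allows a quick sanity check:

    #include <cstdio>
    #include <vector>

    int main()
    {
        const int    n = 101;            // number of grid points
        const double h = 1.0 / (n - 1);  // uniform grid spacing on [0, 1]

        std::vector<double> u(n, 0.0), u_new(n, 0.0), f(n, 1.0); // f(x) = 1 as example load

        // Jacobi iteration for the difference equation
        //   (-u[i-1] + 2 u[i] - u[i+1]) / h^2 = f[i]
        // with u[0] = u[n-1] = 0 (Dirichlet boundary conditions).
        for (int it = 0; it < 10000; ++it)
        {
            for (int i = 1; i < n - 1; ++i)
                u_new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
            u.swap(u_new);
        }

        std::printf("u(0.5) ~ %f (exact: 0.125)\n", u[n / 2]);
        return 0;
    }

Production solvers replace the Jacobi loop by far more efficient methods (e.g. multigrid or Krylov solvers), which is precisely the "fit of the solver to the system of equations" mentioned above.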

1.2 Robustness and Versatility

At the beginning of this work, most of the available tools were specialized for very specific tasks in narrow fields of application. Only a few tools had a broader, more generic approach. Not only do reusability, maintainability and extensibility suffer from overly specialized approaches, but the necessary robustness is also difficult to achieve.

1.2.1 Definitions

Robustness

Robustness is the ability of a system to sustain its functionality despite variations in the characteristics of the environment, without the need for modifications to the system itself. For simulation software, possible variations of the environment include, for example, the field of application, implying changes to the involved materials, their characteristics and thus the values of their parameters, or the typical dimensions. Further modifications of the used models can lead to mathematically ill-conditioned cases of a previously well posed problem. The following factors have a profound impact on robustness; within this thesis it will be shown how robustness can be increased:

• Segmentation: Smoother structures from grainy images are partly a topic of this thesis.
• Meshing: Better meshes are a main topic of this work.
• Simulation: Different schemes for discretization and time integration are main topics of the thesis.

Versatility

Versatility - deriving from the Latin word versatilis, which means agile, deft, alterable - describes the ability to turn to or be used for different fields or tasks. Simulation tool chains can be versatile in more than one way: In the sense of turning to different fields, versatile tool chains can easily be used for multiple areas of application. In the sense of being alterable, they can easily be modified, for example by exchanging single parts or methods for others better suited to solve the specific task.

Comparing the two definitions of robustness and versatility given above, it stands out that both deal with the ability to react to changing environments. In this way they support each other: Robust components allow the application of the software while turning to different fields; a versatile tool chain can help to sustain functionality despite changing surroundings. The following factors are crucial to achieve versatility; the following chapters will show where and how versatility is achieved:

• Different geometries and dimensions: Local discretization is a main topic of the thesis.
• Diversified materials: Different fields of application are a main topic of this work.
• Altering mathematical conditions: Different solvers are partly a topic of this thesis.

1.2.2 Tool Chain

The following characteristics of the used software are crucial to reach the aforementioned goals, considering the constraints regarding functionality of the software, expenditure of time and trust in the results:


• Modularity, compatibility, and extensibility for easily exchanging and adding components, which is generally important due to ongoing enhancements of all parts of simulation tools.
• Usability and high performance (performance being a measure for the amount of work accomplished per time) are essential criteria for using simulation software in any practical application. Not only the time spent waiting for results of a simulation but also the time to get familiar with the software are typically strictly limited.
• Accuracy and stability for a wide range of applications and for solving even mathematically ill-conditioned cases.

These criteria can best be matched by using a flexible chain of tools instead of a monolithic tool, since some of the aforementioned characteristics, like modularity and extensibility, are satisfied to a certain degree on their own when using a tool chain. Figure 1.1 depicts the sequence of the developed tool chain from modeling to flow simulation. While the chain of tools satisfies some criteria automatically, the other requirements listed above are not necessarily met as a matter of course. Chapters 3 and 4 will point out different approaches to increase both robustness and versatility as well as tools that implement these approaches.

Figure 1.1: The sequence of the tool chain. The colors indicate the steps of computational simulations: Structure definition (blue), meshing (green), formalization (yellow) and the simulation itself (red).

1.2.3 Application

The first application chosen is a classical engineering problem: Analyze and improve upon the specification of a Central Processing Unit (CPU) heat sink design for liquid coolants. Miniaturization has resulted in packaging densities that dissipate considerable amounts of heat even as the total power consumption of individual devices is reduced, so this topic continues to be of interest. Since it deals with liquids, the simulation falls into the challenging domain of CFD. Starting with a simple, straightforward geometry of a CPU heat sink for home use, more complex, state of the art, industrial deployment scenarios are investigated. This quickly leads to issues with obtaining detailed data directly, due to the confidentiality of design specifics. Promotional material is available, however, from which basic aspects of the deployed solutions can be reconstructed. The promotional material comes in the form of videos and presentations, not exact structural descriptions. Therefore, methods of image processing and structure identification are employed to enable reverse engineering of more complex structures.

The increased complexity and level of detail is a first test of the robustness strived for in the developed tool chain. However, further challenges to the claim of combining modularity and robustness are highly desirable. During the investigations of structure extraction from image data, medical imaging is encountered repeatedly. The very different nature of the medical field of application renders it a perfect vehicle to further investigate the robustness and versatility of the developed tool chain approach. As a result, the presented tool chain is confidently called robust with a broad spectrum of possible applications, as demonstrated in Chapter 5.

1.3 Overview

This thesis is organized as follows: In Chapter 2 a short introduction to the mathematical and physical background of fluid motion and heat transport is given. Chapter 3 presents the concepts used for robust CFD simulations and provides the theoretical foundation of this thesis. Chapter 4 puts the thesis into context by showing how the used software applies the previously defined concepts. Short descriptions of the used libraries and programs are provided, and their advantages and disadvantages are discussed. Chapter 5 demonstrates the wide area of possible applications. Four exemplary showcases from two entirely different fields of application are presented to underline the versatility of the approach. Chapter 6 summarizes the results of the work as well as the lessons learned. Finally, an outlook on possible further improvements is given.


Chapter 2

Theoretical Background

This chapter gives an introduction to the theoretical background of fluid motion, heat transfer and their interactions, mainly by providing the most important equations that are used in the simulations. Further information can be found in any introduction to fluid mechanics [4, 5, 6] and thermodynamics [7, 8].

2.1 Fluid Motion

Starting with the Leibniz-Reynolds transport theorem

\[ \frac{d}{dt}\int_{\Omega} u \, dV + \int_{\partial\Omega} u\,\vec{v}\cdot\vec{n}\, dA - \int_{\Omega} Q \, dV = 0, \tag{2.1} \]

where $\Omega$ is a test volume, $u$ is an intensive quantity, $\vec{v}$ is the velocity and $Q$ represents the sources and sinks, one can use the Green theorem as well as Leibniz's rule to get

\[ \int_{\Omega} \left( \frac{\partial u}{\partial t} + \vec{\nabla}\cdot(u\otimes\vec{v}) - Q \right) dV = 0. \tag{2.2} \]

Since this has to be valid for any test volume, the integrand itself has to vanish:

\[ \frac{\partial u}{\partial t} + \vec{\nabla}\cdot(u\otimes\vec{v}) - Q = 0 \tag{2.3} \]

This equation has the form of the continuity equation. For the conservation of momentum $u = \rho\vec{v}$ and $Q = \vec{b}$ is set, where $\rho$ is the density, $\vec{v}$ is the velocity vector and $\vec{b}$ are body forces:

\[ \frac{\partial}{\partial t}(\rho\vec{v}) + \vec{\nabla}\cdot(\rho\vec{v}\otimes\vec{v}) - \vec{b} = 0 \tag{2.4} \]

Rearranging leads to

\[ \vec{v}\left( \frac{\partial\rho}{\partial t} + \vec{\nabla}\cdot(\rho\vec{v}) \right) + \rho\left( \frac{\partial\vec{v}}{\partial t} + \vec{v}\cdot\vec{\nabla}\vec{v} \right) = \vec{b}. \tag{2.5} \]

Then conservation of mass (with no sources or sinks of mass)

\[ \frac{\partial\rho}{\partial t} + \vec{\nabla}\cdot(\rho\vec{v}) = 0 \tag{2.6} \]

is assumed. This leads to

\[ \rho\left( \frac{\partial\vec{v}}{\partial t} + \vec{v}\cdot\vec{\nabla}\vec{v} \right) = \vec{b}. \tag{2.7} \]

With the material derivative

\[ \frac{D}{Dt}(\star) := \frac{\partial}{\partial t}(\star) + \vec{v}\cdot\vec{\nabla}(\star) \tag{2.8} \]

one gets

\[ \rho\,\frac{D\vec{v}}{Dt} = \vec{b}. \tag{2.9} \]

Splitting the body force $\vec{b}$ into forces caused by stresses and other ones (all non-stress forces are grouped together in $\vec{f}$) leads to

\[ \rho\,\frac{D\vec{v}}{Dt} = \vec{\nabla}\cdot\underline{\sigma} + \vec{f}, \tag{2.10} \]

where $\underline{\sigma}$ is a stress tensor. With the assumption of small deformations Hooke's law can be used to split the stress tensor into a volumetric and a deviatoric strain tensor:

\[ \underline{\sigma} = 2\eta\,\underline{\varepsilon} + \underbrace{\lambda\,\mathrm{tr}(\underline{\varepsilon})}_{-p}\,I = -pI + \underline{T}, \tag{2.11} \]

where $\lambda$ and $\eta$ are Lamé parameters, $p$ is the pressure, $I$ is the identity matrix and $\underline{T}$ is a deviatoric strain tensor. Therefore, one can write

\[ \rho\,\frac{D\vec{v}}{Dt} = \vec{\nabla}\cdot\left(-pI + \underline{T}\right) + \vec{f} = -\vec{\nabla}p + \vec{\nabla}\cdot\underline{T} + \vec{f}. \tag{2.12} \]

This represents the most basic form of the Navier-Stokes-Equation (NSE). For Newtonian fluids the following characteristics are assumed:

• The stress tensor is a linear function of the strain rates.
• The fluid is isotropic.
• For a fluid at rest, the divergence of the deviatoric strain tensor must equal zero.

So, the deviatoric stress tensor $\underline{T}$ is given by

\[ \underline{T} = \eta\left( \vec{\nabla}\vec{v} + (\vec{\nabla}\vec{v})^{T} \right) + I\,\lambda\,\vec{\nabla}\cdot\vec{v}. \tag{2.13} \]

With

\[ \vec{\nabla}\cdot\left( \vec{\nabla}\vec{v} + (\vec{\nabla}\vec{v})^{T} \right) = \nabla^{2}\vec{v} + \vec{\nabla}(\vec{\nabla}\cdot\vec{v}) \tag{2.14} \]

the NSE can be written as

\[ \rho\,\frac{D\vec{v}}{Dt} = -\vec{\nabla}p + \eta\Delta\vec{v} + (\lambda+\eta)\,\vec{\nabla}(\vec{\nabla}\cdot\vec{v}) + \vec{f} \tag{2.15} \]

or

\[ \rho \overbrace{\bigg( \underbrace{\frac{\partial \vec{v}}{\partial t}}_{\text{unsteady acceleration}} + \underbrace{\vec{v}\cdot\vec{\nabla}\vec{v}}_{\text{convective acceleration}} \bigg)}^{\text{inertia}} = \overbrace{\underbrace{-\vec{\nabla}p}_{\text{pressure}} + \underbrace{\underbrace{\eta\nabla^{2}\vec{v}}_{\text{shear stress}} + \underbrace{(\lambda+\eta)\,\vec{\nabla}\big(\vec{\nabla}\cdot\vec{v}\big)}_{\text{compression}}}_{\text{viscosity term}}}^{\text{divergence of stresses}} + \underbrace{\vec{f}}_{\text{other body forces}}. \tag{2.16} \]

When simulating the flow of incompressible fluids the constant mass density leads to a vanishing divergence of the velocity (the velocity field is solenoidal)

\[ \vec{\nabla}\cdot\vec{v} = 0, \tag{2.17} \]

allowing to express the incompressible NSE (iNSE):

\[ \rho \overbrace{\bigg( \underbrace{\frac{\partial \vec{v}}{\partial t}}_{\text{unsteady acceleration}} + \underbrace{\vec{v}\cdot\vec{\nabla}\vec{v}}_{\text{convective acceleration}} \bigg)}^{\text{inertia}} = \overbrace{\underbrace{-\vec{\nabla}p}_{\text{pressure}} + \underbrace{\eta\nabla^{2}\vec{v}}_{\text{shear stress (viscosity term)}}}^{\text{divergence of stresses}} + \underbrace{\vec{f}}_{\text{other body forces}} \tag{2.18} \]

If a steady state system is simulated, the time-dependent term of the equation equals zero, leaving

\[ \rho\,\vec{v}\cdot\vec{\nabla}\vec{v} = -\vec{\nabla}p + \eta\nabla^{2}\vec{v} + \vec{f}. \tag{2.19} \]

For the calculation of initial values to consequently solve the iNSE the incompressible Stokes Equation (iSE), which neglects the inertia by setting the unsteady and convective acceleration terms in the iNSE to zero, is used:

\[ -\vec{\nabla}p + \eta\nabla^{2}\vec{v} + \vec{f} = 0 \tag{2.20} \]

2.2 Heat Transfer

Heat transfer by bulk motion is called convection and described by

\[ \dot{Q} = hA(T_s - T_b), \tag{2.21} \]

where

• $\dot{Q}$ is the rate of heat transfer,
• $h$ is a transfer constant (dependent on, e.g., the materials),
• $A$ is the transfer area and
• $T_s$ and $T_b$ are the temperatures at the surface and the bulk, respectively.

There are two different types of convection:

• Natural convection: Bulk motion is caused by circulation of fluids due to buoyancy from the density changes induced by heating itself.
• Forced convection: Bulk motion is somehow forced.

In general, heat transfer in moving fluids or solids can be described by

\[ \rho C_p\left( \frac{\partial T}{\partial t} + \vec{u}\cdot\vec{\nabla}T \right) + \vec{\nabla}\cdot\vec{q} = Q, \tag{2.22} \]

where

• $\rho$ is the mass density of the material,
• $C_p$ is the specific heat capacity of the material,
• $k$ is the thermal conductivity of the material,
• $\vec{u}$ is the movement in the material,
• $T$ is the absolute temperature,
• $\vec{q}$ is the heat flux by conduction and
• $Q$ summarizes all heat sources.

The heat transfer by conduction within materials (with as well as without inner movement) is described by Fourier's law:

\[ \vec{q} = -k\vec{\nabla}T \tag{2.23} \]

In the case of isotropic and homogeneous material constants Equation 2.22 can be rewritten to

\[ \rho C_p\left( \frac{\partial T}{\partial t} + \vec{u}\cdot\vec{\nabla}T \right) - k\Delta T = Q. \tag{2.24} \]
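To connect Equation 2.24 to the discrete computations used later in the tool chain, the following minimal sketch (not taken from the thesis software; grid, boundary temperatures and material values are illustrative assumptions, here roughly those of copper) advances the one-dimensional, source-free special case with a stationary medium, $\rho C_p\,\partial T/\partial t = k\,\partial^2 T/\partial x^2$, by an explicit finite-difference step:

    #include <cstdio>
    #include <vector>

    int main()
    {
        const int    n   = 101;
        const double dx  = 1.0e-3;          // 1 mm grid spacing
        const double rho = 8960.0;          // density [kg/m^3], roughly copper
        const double cp  = 385.0;           // specific heat capacity [J/(kg K)]
        const double k   = 400.0;           // thermal conductivity [W/(m K)]
        const double alpha = k / (rho * cp);        // thermal diffusivity
        const double dt    = 0.4 * dx * dx / alpha; // stable explicit time step

        std::vector<double> T(n, 300.0), T_new(n, 300.0);  // initial temperature 300 K

        // Explicit (FTCS) update of  rho*cp * dT/dt = k * d2T/dx2  without sources (Q = 0).
        for (int step = 0; step < 5000; ++step)
        {
            for (int i = 1; i < n - 1; ++i)
                T_new[i] = T[i] + alpha * dt / (dx * dx) * (T[i - 1] - 2.0 * T[i] + T[i + 1]);
            T_new.front() = 360.0;           // fixed temperatures at both ends
            T_new.back()  = 300.0;
            T.swap(T_new);
        }

        std::printf("temperature at the midpoint: %.1f K\n", T[n / 2]);
        return 0;
    }

The actual simulations in this work solve the full, coupled three-dimensional equations with the Finite Element Method rather than such an explicit scheme; the sketch merely shows the transition from continuous to difference equations for the heat conduction part.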

2.3 Interaction Between Heat and Motion

To account for natural convection in the iNSE the body forces $\vec{f}$ can be set to

\[ \vec{f}(T) = \rho(T)\,\vec{g}, \tag{2.25} \]

where $\rho(T)$ is the mass density of the material, dependent on the temperature, and $\vec{g}$ is the gravitational acceleration, expressing the buoyancy forces due to different densities. For full coupling when simulating an incompressible heat-driven flow the iNSE has to be rewritten:

\[ \frac{\partial}{\partial t}\left[ \rho\left(e + \frac{|\vec{v}|^2}{2}\right) \right] + \vec{\nabla}\cdot\left[ \left( \rho\left(e + \frac{|\vec{v}|^2}{2}\right) + p \right)\vec{v} \right] = \nabla^{2}\left( \kappa e + \frac{\eta}{2}|\vec{v}|^2 \right) + \lambda\,\vec{\nabla}\cdot\left[ \left(\vec{\nabla}\cdot\vec{v}\right)\vec{v} \right] + \eta\,\vec{\nabla}\cdot\left[ \left(\vec{\nabla}\vec{v}\right)\vec{v} \right] \tag{2.26} \]

In case of steady state simulations also the time-dependent term vanishes, leaving

\[ \vec{\nabla}\cdot\left[ \vec{v}\left( \rho\left(e + \frac{|\vec{v}|^2}{2}\right) + p \right) \right] = \nabla^{2}\left( \kappa e + \frac{\eta}{2}|\vec{v}|^2 \right). \tag{2.27} \]

In many cases of mixed convection, where the contribution of the forced convection is larger by orders of magnitude than that of the natural convection, the latter can be neglected and the simulations of the fluid motion and the heat transfer can be decoupled. This leads to much simpler simulations, which need less time to be solved and consume less memory.


Chapter 3

Concepts for Robust CFD Simulations

This chapter illustrates the concepts used in the tool chain for robust CFD simulations. The first two sections explain general design patterns; the subsequent sections describe the tool chain from initial structure definition and generation, including the post-processing of image data for this purpose, to meshing and the formalization of solution methods. Each section pays special attention to matters of robustness and versatility.

3.1 Modern Programming Paradigms

The evolution of programming paradigms [9] is always directed towards improving:

• Reusability: The ability of software to be used for more than one specific task is not only crucial for rapid and cheap software development but also provides already tested source code, thus ensuring quality.
• Maintainability: The source code has to be maintainable by different programmers at all times of its life cycle. Therefore, the code has to be readable and understandable.
• Orthogonality: By enabling modifications of single components without creating or propagating side effects, orthogonality in software design facilitates feasibility and compactness of even complex designs.

All of these goals ensure higher stability and robustness, and each paradigm tries to reach them with its own methods. So, basically, modern programming paradigms still have the same goals but different points of view on the problem. Especially generic programming [10] in combination with the C++ programming language combines high performance general solutions with the possibility of highly specialized code parts for specific tasks [9]. Although the higher abstraction of modern programming paradigms increases robustness, the question remains to which extent performance is lost. To investigate the possible loss of performance due to higher abstraction, generic code is compared to a simple implementation without abstractions.


The following example from literature [9] evaluated the performance of two implementations on two different architectures: The first approach is a simple C code (see Listing 3.1), the second approach is generic code taken from the Generic Scientific Simulation Environment (GSSE) [11] (see Listing 3.2), like it is used in the Generic Mesh Generation Interface (see Section 3.5.1). Both code snippets present the respective source code to traverse over the vertices of a container.

Listing 3.1: simple C approach for traversing using multidimensional arrays

    // traversal
    for (it1 = 0; it1 < size1; it1++)
      for (it2 = 0; it2 < size2; it2++)
        for (it3 = 0; it3 < size3; it3++)
        {
          // operations with container[it1][it2][it3]
        }

Listing 3.2: generic approach of GSSE to traverse over a container of vertices

    // preparation
    vit1 = container.vertex_begin();
    vit2 = container.vertex_end();

    // traversal
    for (; vit1 != vit2; ++vit1)
    {
      // operations with *vit1
    }

The simple C implementation uses a multidimensional, static array, which has to be fully declared at compile-time, i.e., the data type of the array has to be given and the size is fixed. Additionally, the static array will be one block within the memory, representing the values of the array one after another. The generic GSSE implementation, on the other hand, utilizes the concept of iterators. This code can be used for containers of any type which implements this concept. Furthermore, only the iterators are raised, allowing more flexibility of representing the container's data in memory. The performance of these two approaches on two different architectures has been evaluated in literature [9] and is illustrated in Figure 3.1. Even in 2007, the performance loss due to abstraction was negligible if at all detectable.

Figure 3.1: Performance (iterations per second) of generic code compared to simple C implementation [9] on an Intel P4 (left) and an AMD64 (right).

Many programming languages are still evolving; C++ is one of them. The C++11 standard [12] added more concepts for higher abstraction, like the std::tuple class or the std::iota function. Listing 3.3 shows how the aforementioned task of traversing over vertices in a container could be solved with these new possibilities; these concepts are used in some of the tools, e.g. the latest versions of NGSolve.

Listing 3.3: C++11 approach for a traversal algorithm, using std::tuple for the container

/ / preparation i n d i c e s = s t d : : g e t ( c o n t a i n e r ) ; a r r a y = s t d : : g e t ( c o n t a i n e r ) ;

4 5 6 7 8

// traversal f o r ( a u t o c o n s t& i t e m : i n d i c e s ) { / / operations with array [ item ] }

To review the performance of the three different approaches given above on different CPU types and hosts, a setup similar to [13] is chosen and the respective performance tests are executed. Table 3.1 lists the parameters of the used systems.

                    Intel i7-2670QM   AMD Turion II Neo N40L   ARM Cortex-A7
Architecture        x86_64            x86_64                   armv7l
Physical cores      4                 2                        4
Logical cores       8                 2                        4
CPU min MHz         800               800                      600
CPU max MHz         3100              2200                     1000
BogoMIPS            4389.95           2994.86                  38.40
L1 cache KB         512               256                      32
L2 cache KB         2048              2048                     512
L3 cache KB         6144              0                        0
RAM type            DDR3              DDR3                     DDR2
RAM GB              32                4                        1
Compiler            GCC 4.9.3         GCC 4.9.3                GCC 4.9.2

Table 3.1: Comparison of the systems used for the performance tests (RAM: Random Access Memory, DDR: Double Data Rate, GCC: GNU C Compiler).

To use only standardized code, the GSSE implementations of the container and iterator were replaced by std::vector and std::vector::iterator, respectively, for the performance tests, as they are essentially equal. The vertices were chosen to be double values, since the data type does not matter for the relations between the traversal operations. Multiple measurements are taken on each host to remove jitter due to varying basic conditions like CPU load or memory usage by other processes. To compensate for the limited temporal resolution of the timer, the test times for small numbers of vertices have to be extended. This is done by performing multiple repetitions of the same task within one measurement cycle and dividing the metered times by the number of repetitions, thus getting averaged (but more accurate) values.
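The repetition and averaging scheme can be sketched as follows. This is only an illustrative C++ fragment under the stated assumptions; the traversal body and the choice of the repetition count are placeholders and not the exact benchmark code used for Figure 3.2:

#include <chrono>
#include <cstddef>
#include <vector>

// Measure the average time of one traversal by repeating it often enough to
// exceed the timer resolution and dividing by the repetition count.
double average_traversal_seconds(const std::vector<double>& vertices,
                                 std::size_t repetitions)
{
    volatile double sink = 0.0;  // prevents the compiler from removing the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repetitions; ++r)
        for (auto it = vertices.begin(); it != vertices.end(); ++it)
            sink += *it;  // stand-in for the "operations with *vit1" of Listing 3.2
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / repetitions;
}

Repeating the whole measurement several times and taking, e.g., the minimum or median of the averaged values further suppresses jitter caused by other processes.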


Not only the time but also the memory requirements are measured. Although no values can be retrieved for memory usage below about 256 KB, the metered values for a usage of approximately 64 MB and above correspond tightly with the calculated data. The simple C implementation and the GSSE-like std::vector implementation only need the size of the vertex data type times the number of vertices; in the latter case a small, static overhead for the vector has to be taken into account. By contrast, for the C++11 tuple implementation not only the static overhead for the vectors and the std::tuple but also the size of the size_t data type times the number of vertices has to be provided. For small vertex data types this is nearly equivalent to a doubling of the memory usage.

Figure 3.2 shows the results of the performance tests. As expected, the maximum container size is strictly limited by the available RAM. For this reason the possible number of vertices is much smaller for the tuple implementation than for the other two. For the Intel i7 and the AMD Turion no noteworthy abstraction penalty can be observed. Only on the ARM Cortex a considerable loss in performance is determined for the C++11 implementation. All three systems reveal an abrupt decline of the iterations per second at a certain point, which is in line with previous findings [13]. It appears that this point coincides with the limit of the respective CPU caches. Some other spikes (e.g. the one of the C++11 implementation on the AMD Turion at about 5.6 · 10^6 vertices) should be further investigated but are not relevant for this work.

The main advantage of modern programming paradigms such as generic or functional programming is their high level of abstraction, giving the opportunity to write easily readable and highly reusable code. In that way, not only quality is improved by providing already tested and easily maintainable code, but extensibility, e.g. for new data types, is advanced as well. This facilitates versatility and robustness in a similar manner. Due to the orthogonality of algorithms and data structures, the amount of source code that needs to be written is greatly reduced.

3.1.1 Modularity and Encapsulation

Modularity, compatibility and extensibility are essential for easily exchanging and adding components, which is generally important due to the ongoing enhancement of all parts of the tool chain. Modularity, the emphasis on separating the functionality (of code) into solitary fractions (modules) [14], and encapsulation, the packaging of data with the respective methods operating on that data [15], both attempt to reach common goals by means of self-contained bundles:

• Increase clarity and abstraction.
• Enable interchangeability.
• Raise extensibility and compatibility.
• Simplify error location and troubleshooting.
• Facilitate maintenance.

By splitting the code into modules such that each of them contains everything necessary (functions and data) to execute only one aspect of the desired functionality, both robustness and versatility can be improved, as sketched in the example below. While interchangeability and extensibility primarily enhance the ability to turn to different fields, the advancements in compatibility and maintenance boost the tolerance to environmental variations.
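As a small illustration of these goals (not code taken from the tool chain), a container module that encapsulates its storage and only exposes the iterator concept used in Listing 3.2 could look as follows; callers depend only on the interface, so the internal representation can be exchanged without propagating side effects:

#include <vector>

// Encapsulation: vertex data and the methods operating on it are bundled,
// the storage layout is hidden behind a minimal iterator interface.
class VertexContainer
{
public:
    using iterator = std::vector<double>::const_iterator;

    void add(double vertex) { data_.push_back(vertex); }

    iterator vertex_begin() const { return data_.begin(); }
    iterator vertex_end()   const { return data_.end(); }

private:
    std::vector<double> data_;  // could be replaced, e.g., by a memory-mapped buffer
};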


Figure 3.2: Results of the performance tests: Performance (iterations / s) of three different implementations of a traversal algorithm on an Intel i7 (upper left), an AMD Turion II (upper right) and an ARM Cortex-A7 (lower left); memory usage for all three implementations measured on the Intel i7 (lower right).

3.2 Parallel Computing

No matter how sophisticated the deployed strategies for image processing, mesh generation, refining and adaption are, they increase the computational burden of the simulation. Therefore, it is necessary to make efficient use of all computing resources available on any given platform. Parallelization accommodates the principle of divide and conquer: it slices the problem into smaller parts and deals with these chunks simultaneously. There are five different kinds of parallel computing [16, 17]:


• Bit-level parallelism, based on the processor word size,
• Instruction-level parallelism, based on the operations that can be performed simultaneously,
• Memory-level parallelism, based on the ability to have multiple memory operations pending simultaneously,
• Task parallelism, based on the simultaneous execution of different tasks (that may execute the same code) on the same or different data, and
• Data parallelism, based on the simultaneous execution of the same task on different data.

The introduction of multi-core CPUs to the mainstream, the recent trend to more cores with lower clock frequencies and the rising popularity of general purpose computations on Graphics Processing Units (GPUs) [18] made task and data parallelization essential, even for desktop systems. Otherwise the execution performance of serial code will stagnate or even decrease with, for instance, a new CPU generation, due to the lower capabilities of each single core. A multi-core processor is capable of efficiently executing several threads in parallel (a thread is a sequence of instructions being executed in a program that can be managed independently [19]). The following pseudo code examples (Listings 3.4 and 3.5) depict the difference between task and data parallelism using a dual threaded system. One could consider data parallelization a special form of task parallelization, given that the tasks are the same (execution of the same code) and the data are different.

Listing 3.4: Example for task parallelism

if (thread_id == 0) then
    task_A(data_A)
else if (thread_id == 1) then
    task_B(data_B)
end if

Listing 3.5: Example for data parallelism

if (thread_id == 0) then
    limit1 := 1
    limit2 := round(data.length / 2)
else if (thread_id == 1) then
    limit1 := round(data.length / 2) + 1
    limit2 := data.length
end if

for i from limit1 to limit2 do
    task(data[i])
done
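A concrete C++ counterpart to Listing 3.5 can be sketched with std::thread. This is an illustrative fragment only (the tool chain itself relies on the parallel facilities of the solver libraries mentioned below); the per-element operation is a placeholder:

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Data parallelism: every thread applies the same task to its own chunk of data.
// num_threads is assumed to be at least 1.
void parallel_for_all(std::vector<double>& data, unsigned num_threads)
{
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t)
    {
        const std::size_t begin = t * chunk;
        const std::size_t end   = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, begin, end]() {
            for (std::size_t i = begin; i < end; ++i)
                data[i] *= 2.0;  // stand-in for task(data[i])
        });
    }
    for (auto& worker : workers)
        worker.join();
}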

The most auspicious steps of the tool chain to profit from parallelization are the assembly and the solution of the equation system. This is due to the usually high proportion of parallelizable parts in these tasks. Already in 1967 Gene Amdahl stated that the potential speed-up of an algorithm through parallelization is limited by the fraction of the whole task that cannot benefit from this improvement [20]. Utilizing the fact that matrices are easily partitionable, matrix-vector multiplications and FEM simulations in general profit from data parallelism [21].

Assembly generates the matrices the solver needs by collecting the parameters of all elements. Therefore, as already stated above, it is a promising candidate for parallelization in the FEM procedure. This can be facilitated by partitioning the set of traversed elements into disjoint parts, which thus enables parallel processing [9] (for an example see Section 3.5.3). For the generation of large finite element matrices multiple GPUs can also be utilized [22]: the GPUs generate the large sparse matrices while the CPUs perform the collection and addition of the fragments. In addition, parallelized algorithms such as the MapReduce approach [23] can be used for preconditioning these matrices using sparse approximate inverses [24].

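Amdahl's bound mentioned above can be written down explicitly. If a fraction p of the work can be parallelized and N processors are available, the attainable speed-up is

\[
S(N) = \frac{1}{(1-p) + \frac{p}{N}} \;\le\; \frac{1}{1-p},
\]

so the serial fraction (1 - p) limits the benefit no matter how many processors are added.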

The actual solution process of the equation system, e.g. the Lower-Upper factorization (LU factorization), is another area suitable for parallelization. LU factorization decomposes a matrix into two matrices of the same size, where the first matrix is lower triangular and the second matrix is upper triangular. Along the lines of assembly, for the actual solution process the matrices are split into disjoint parts and distributed for parallel processing. In this tool chain parallel matrix solvers, with PETSc [25], Trilinos [26, 27] and Pardiso [28] among them, were deployed.

Parallelization contributes to both robustness and versatility by enabling the tool chain to solve bigger systems in a shorter amount of time, making the available memory more prominent as the limiting factor. Thus, even models too big to solve on serial environments within reasonable time can be simulated by utilizing parallelization. Thereby, not only versatility is enlarged; on the side of robustness, refinements are, e.g., enabled where needed due to rapidly varying quantities, resulting in increased stability.

3.3 Improvements on Resolution and Contrast (Image Post-Processing)

In the tool chain, the quality of the output of each step depends on the input from the preceding steps. Hence, any improvements should start at the beginning of the chain; if the geometry has to be reconstructed from images, this is the imaging. Often, resolution and contrast of the imaging process itself are limited by factors that cannot be influenced. For example, the Signal to Noise Ratio (SNR) of Magnetic Resonance Imaging (MRI) depends on the intensity of the magnetic field, and a higher SNR can be traded for higher spatial or temporal resolution. So, the spatial resolution of the MRI is limited by the available magnetic field. If the limiting factors cannot be influenced, improvements have to be made by post-processing the images. In the tool chain used for this thesis, mainly two possibilities of post-processing the image data were used:

• Digital subtraction methods to increase the contrast and eliminate data which is not needed, and
• Shifted layers with contrast enhancement algorithms to increase the resolution.

Digital subtraction methods can be used with any imaging technique capable of some contrast enlargement accentuating the parts of interest. Medical applications like Computed Tomography (CT) or MRI widely make use of them, but it is also possible to utilize them for Transmission Electron Microscopy (TEM) scans, like they are used to examine micro- and nanoelectronic devices. The shifted layer algorithms can be applied to any tomographic technique, although the need to physically shift the specimen against the detector in the order of magnitude of the desired resolution constrains the area of application.

3.3.1 Digital Subtraction

Digital subtraction methods always use two sequences of images: some sort of mask to subtract, and one sequence in which the parts of interest are somehow accentuated. In medical imaging, especially in CT and MRI, the first sequence would be a native sequence and the second one a sequence using contrast agents to highlight the respective tissues or regions. In general, there are no restrictions on how to enhance the contrast of the regions of interest. After image acquisition the mask is digitally subtracted from the second, contrast-enhanced sequence, leaving only the accentuated regions. Hence, not only is the contrast enhanced but all unnecessary data is eliminated as well. Figure 3.3 shows three images from a digital subtraction angiography (DSA) [29], which is used to represent blood vessels. The DSA is performed on a rabbit's neck, showing an experimental aneurysm. Not all blood vessels but only the ones carrying contrast agents are enhanced.

Figure 3.3: DSA of a rabbit (X-ray images taken by C. Sherif and G. Mach within a project of the Cerebrovascular Research Group Vienna). The sequence shows the mask (native sequence without contrast agent) on the left, the contrast enhanced image (where the arteries are apparent) in the middle and the subtraction image (with the arteries only) on the right.
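The core operation is a voxel-wise subtraction of the mask from the contrast-enhanced sequence. In the tool chain this is carried out with ITK (see Section 4.3.1); a minimal stand-alone sketch of the idea, omitting the registration of the two sequences, could look as follows:

#include <cstddef>
#include <cstdint>
#include <vector>

// Subtract the native mask from the contrast-enhanced image voxel by voxel;
// values are clamped at zero so that only the accentuated regions remain.
std::vector<std::uint16_t> digital_subtraction(const std::vector<std::uint16_t>& enhanced,
                                               const std::vector<std::uint16_t>& mask)
{
    std::vector<std::uint16_t> result(enhanced.size(), 0);
    for (std::size_t i = 0; i < enhanced.size(); ++i)
        result[i] = enhanced[i] > mask[i]
                        ? static_cast<std::uint16_t>(enhanced[i] - mask[i])
                        : 0;
    return result;
}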

The contributions of digital subtraction methods to robustness are, on the one hand, the reduction of data by eliminating regions which are not in the focus of the investigation and, on the other hand, the enhancement of contrast. The latter enables an easy and correct distinction between different tissues and regions. Being less dependent on the contrast of the original images increases the versatility regarding imaging procedures.

3.3.2 Layer Shifting

Although the theoretically achievable resolution of imaging techniques is improving continuously, the actually available resolution is often limited by non-system-intrinsic factors (for example maximum radiation rates or maximum examination duration). Hence, methods to improve the resolution are needed. Layer shifting [30] is one possibility to do so and was developed in conjunction with MRI scans. A short, illustrative example is given to provide an overview: Given a machine that can provide images at a resolution of 2x, with x being an arbitrary unit of length, the region of interest is a 9x thick layer with distinct features measuring x. The resolution can be increased by doing two scans at optimum resolution, shifting the two images by x (half the resolution) and combining them to one image with double the number of voxels (a voxel is a volume element representing some numerical quantity, analogous to a (dimensionless) pixel). The contrast of the overlapping voxels can be increased by combining the data of both scans. By these means a resolution of 1.25x can be achieved. The contrast of the virtual resolution image is comparable to images directly recorded with a resolution of 1.25x. Figure 3.4 depicts the steps and the outcome of the example above.

Figure 3.4: Layer shifting used to increase the resolution of the image: Optimal sequence (9 voxels), scanned sequences (2x4 voxels), combination to one sequence (8 voxels) and contrast enhanced sequence (8 voxels).

The following pseudo code describes the steps of the algorithm used in the tool chain:

Listing 3.6: Layer Shifting Algorithm

// combine the two sequences a and b with each n layers to one sequence c
for each layer i from 1 to n do
    c[2i-1] := a[i]
    c[2i]   := b[i]
done

// prepare the output sequence w[i,0]
// where i indicates the layer and 0 indicates the iteration step
w[1,0] := c[1]
for i from 2 to 2n do
    w[i,0] := (c[i-1] + c[i]) / 2
done
w[2n+1,0] := c[2n]

// iterations
for j from 1 to m do
    for i from 1 to 2n do
        // calculate interim values d[i,j]
        d[i,j] := (w[i,j-1] + w[i+1,j-1]) / 2

        // calculate differences e[i,j]
        e[i,j] := d[i,j] - c[i]
    done

    // calculate correction proposal values
    f[1,j] := e[1,j]
    for i from 2 to 2n do
        f[i,j] := e[i-1,j] + e[i,j]
    done
    f[2n+1,j] := e[2n,j]

    // calculate correction values
    g[1,j] := 0
    g[2,j] := k1 * f[2,j] - k2 * f[3,j] / 2
    for i from 2 to 2n do
        g[i,j] := k1 * f[i,j] - k2 * (f[i-1,j] + f[i+1,j]) / 2
    done
    g[2n,j] := k1 * f[2n,j] - k2 * f[2n-1,j] / 2
    g[2n+1,j] := 0

    // calculate new output sequence w[i,j]
    for i from 1 to 2n+1 do
        w[i,j] := w[i,j-1] - g[i,j]
    done
done

The factors k1 = 1.95 and k2 = 1.94 were found empirically to increase convergence (decreasing absolute correction values). Testing against true values revealed that the maximum error is strictly monotonically decreasing, whereas single error values can oscillate. The convergence rates decrease with ongoing iteration.

First and foremost, the improvements in resolution help to raise the quality of the model and thereby to facilitate valid solutions. Second, proper resolution is crucial for a stable solution process. These are the main points enhancing the robustness of the tool chain. But versatility is advanced as well: techniques to improve resolution enable the examination of models and applications that could not be dealt with before. Using MRI models instead of CT models for intracranial fluid dynamics simulations is an example of this.

3.4 Smoothing and Cropping (Geometry Post-Processing)

Due to limitations of the quality (e.g. resolution, SNR) of images from tomographic methods, segmented geometries are often noisy and coarse grained. Hence, smoothing the segmented structure is important to avoid non-optimal meshes (e.g. unnecessarily high mesh resolution) or corrupt meshes (e.g. singularities). Figure 3.5 demonstrates this aspect. The smoothing process must capture important geometrical features while reducing noise, sharp edges and too finely scaled structures. In the same manner, the geometry often has to be cropped to further use only relevant parts and avoid unneeded use of resources.

Figure 3.5: Demonstration of the need for smoothing: A segmented structure (blue) and the corresponding mesh (green) before (left) and after (right) smoothing.

One possibility for smoothing segmented surfaces is to apply methods working directly on the surface, for example

• Laplacian smoothing [31] (see the sketch below),
• Taubin smoothing [32],
• Poisson surface reconstruction [33] or
• Marching cubes [34].

The other possibility is to utilize knowledge about the principal design of the surfaces (for example the circular type profile of blood vessels) for feature reconstruction methods like skeletonization and reinflation [35].

Not only segmented surfaces need to be smooth; structures created with CSG (see Section 4.2) are also subject to this requirement. A simple method for smoothing CSG structures is to add rounding elements, especially at junctions and terminal points. Also, touching surfaces with only few common elements and extremely acute angles can be avoided using additional elements to prepare the surfaces for meshing.
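Laplacian smoothing, the simplest of the methods listed above, moves every vertex towards the average of its neighbors. A minimal sketch, assuming a precomputed adjacency list and not reproducing the implementation of the cited tools, is:

#include <array>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::array<double, 3>;

// One Laplacian smoothing pass: every vertex is relaxed towards the centroid
// of its neighbors; lambda in (0,1] controls the smoothing strength.
void laplacian_smooth(std::vector<Point>& vertices,
                      const std::vector<std::vector<std::size_t>>& neighbors,
                      double lambda)
{
    std::vector<Point> updated = vertices;
    for (std::size_t v = 0; v < vertices.size(); ++v)
    {
        if (neighbors[v].empty())
            continue;
        Point centroid = {0.0, 0.0, 0.0};
        for (std::size_t n : neighbors[v])
            for (int d = 0; d < 3; ++d)
                centroid[d] += vertices[n][d];
        for (int d = 0; d < 3; ++d)
        {
            centroid[d] /= static_cast<double>(neighbors[v].size());
            updated[v][d] = vertices[v][d] + lambda * (centroid[d] - vertices[v][d]);
        }
    }
    vertices = std::move(updated);
}

Repeated passes of this kind tend to shrink the geometry; Taubin smoothing counteracts this by alternating passes with positive and negative smoothing factors.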


Immoderately fine meshes and singularities result in simulation breakdowns (if the problem is too ill-conditioned) or extremely long calculations; therefore, smoothing is a powerful tool to increase the robustness of the subsequent steps. Again, the influence of the taken measures on the validity of the model, for better or for worse, has to be considered: While the elimination of intentionally fine structures or required elements through too aggressive smoothing can jeopardize the validity of the model, the removal of artifacts originating from imaging, additional (re-)construction processes or numerical issues in CSG is a prerequisite for meaningful results. The ability to suppress and remove unwanted features in the geometry also broadens the field of application, thus enhancing versatility.

3.5 Mesh Generation

The meshing process lies between the segmentation and post-processing of the resulting structure on the one side and the specification of the model as a mathematical problem and solving the respective equations on the other side. Hence, it is worthwhile to wrap the mesh generator with interfaces in order to be able to use many different input and output formats, thus encapsulating the meshing process and avoiding compatibility issues.

3.5.1 Interfaces

To achieve the highest possible grade of modularity at the input of the meshing process, an extended version of a Generic Mesh Generation Interface [36, 37] has been used in the tool chain, allowing generic access to various mesh generators (like Tetgen [38], Netgen [39], Triangle [40] and CGAL [41]). While this interface encapsulates the whole meshing process on the outside, it is able to combine different meshing strategies and kernels on its inside. The combinations are chosen by meta information gathered from the geometry, such as the local feature size [42]. Thus, using generic and orthogonal algorithms, a single meshing interface for arbitrary dimensions is given without taking detours through different file formats. Thereby, interoperability issues like loss of information due to limited convertibility can be avoided. Figure 3.6 depicts the modular structure of the mesher interface as well as the encapsulation of the meshing process.

A different tool that can be used for the conversion of geometric file formats is the Ballistics Research Laboratory Computer-Aided Design (BRL-CAD) solid modeling platform [43]. It allows the conversion of more than ten different geometry file formats and provides the ability to easily add new converters as needed. In analogy to BRL-CAD, which converts different geometry file formats, MeshLab [44] can be used on the output side of the mesh generation process to convert different mesh file formats. Also, the aforementioned mesh generator interface provides various output formats. Although the use of converters has no direct influence on the robustness of the tool chain, using them can increase stability through the use of tools best suited to the problem specifics that could not be used otherwise due to incompatible file formats. In the same manner the versatility of the tool chain can be enhanced.


Figure 3.6: Structure of the Generic Mesh Generation Interface, leading from the input geometry via edge extraction, boundary meshing and volume meshing to the resulting mesh. Each kernel module (Triangle, Tetgen, CGAL, Netgen) is available at each step of the process.

The mesh generator interface adds robustness and versatility in two different ways: On the one hand it has the same effect as the converter tools mentioned above, on the other hand it encapsulates the meshing process. As depicted in Section 3.1.1, modularization and encapsulation can boost both the ability to turn to new fields of application as well as the insensitivity to variations in the characteristics of the environment.
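The encapsulation idea behind the interface can be sketched as follows. This is a strongly simplified illustration, not the actual API of [36, 37]: every kernel implements the same minimal interface, and meta information of the geometry selects the kernel:

#include <memory>

// Simplified stand-ins for the data passing through the interface.
struct Geometry { double local_feature_size = 1.0; };
struct Mesh     { /* vertices, cells, ... */ };

// Common interface every meshing kernel has to implement.
class MeshingKernel
{
public:
    virtual ~MeshingKernel() = default;
    virtual Mesh mesh(const Geometry& geometry) const = 0;
};

class TetgenKernel : public MeshingKernel
{
public:
    Mesh mesh(const Geometry&) const override { return Mesh{}; }  // would call Tetgen
};

class NetgenKernel : public MeshingKernel
{
public:
    Mesh mesh(const Geometry&) const override { return Mesh{}; }  // would call Netgen
};

// The interface selects a kernel based on meta information of the geometry;
// the threshold is a hypothetical example value.
std::unique_ptr<MeshingKernel> select_kernel(const Geometry& geometry)
{
    if (geometry.local_feature_size < 0.1)
        return std::make_unique<NetgenKernel>();
    return std::make_unique<TetgenKernel>();
}

Because callers only see the MeshingKernel interface, kernels can be exchanged or added without touching the rest of the tool chain, which is exactly the encapsulation benefit discussed above.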

3.5.2 Mesh Generation Strategies

No matter which numerical technique, for example FEM or Finite Volume Method (FVM), is used for CFD simulations, they all work with discrete spaces and therefore need spatial discretization. There are many ways of discretizing space, and all of them have their respective pros and cons; the most common scheme, particularly when using FEM, is triangulation, the subdivision into unstructured simplices [45, 46]. A k-simplex is the k-dimensional polytope given by the convex hull of its k + 1 vertices. The most commonly used simplices in CFD are the 2-simplex (triangle) and the 3-simplex (tetrahedron). Since curved boundaries can usually be approximated by piecewise linear boundaries, most meshing algorithms are based upon the idealized assumption that the input geometry is piecewise linear [47]. This enables the representation of the objects by simplices. Many problems are defined on irregularly shaped domains, making structured discretization very difficult if not impossible. In many other cases it is necessary to refine or adapt the meshes (see also Section 3.5.3) or restrict the maximum element size in specific areas to ensure adequate resolution.

In such cases structured meshes tend to comprise an excessive number of elements, which outweighs their advantages over unstructured meshes regarding ease of use (uniform number of neighbors) and memory usage (no storage of pointers to neighbors necessary). An example showing the Los Angeles Basin, where the structured grid has five times as many nodes as the unstructured mesh, is available in literature [47].

Figure 3.7: Hybrid meshes. The black parts are meshed using structured grids, the blue parts are unstructured. Left: Two-dimensional (2D) mesh of a bifurcation. Right: Three-dimensional (3D) mesh of a simplified model of the Raspberry Pi 1 A+.

Meshes combining structured and unstructured parts are called hybrid meshes. Figure 3.7 depicts two examples of hybrid meshes. Sections marked in blue cannot be meshed in a structured way or would have a disproportionate number of elements. In the 2D example the unstructured region is meshed using triangles; the blue marked areas in the 3D example are sweeps of triangle and quadrangle grids. Notwithstanding the advantages of structured grids and the possibility of combining structured and unstructured meshes, triangulation provides the most robust and versatile method and results. Hence, the meshes used in the tool chain all consist of unstructured simplices only. The primary algorithm utilized in the tool chain is called "advancing front" and is described in the next paragraphs.

Starting from a geometric model given by CSG or segmented images, the steps to get an unstructured volume mesh consisting of tetrahedra are as follows [39]:

1. Calculation of special points (points of intersection and extreme coordinates, degenerated points)
2. Edge detection (starting at special points and following until the terminal point of the edge, e.g. another special point, is reached)
3. Surface mesh generation (if the boundary has a special form, then "cut off" these triangles by connecting points and inserting new nodal points using rules (see Figure 3.8); then get the new boundary)
4. Surface optimization (improve quality of triangles by modifying nodal points and connections)
5. Volume mesh generation (like surface mesh generation but using tetrahedra)
6. Volume optimization (like surface optimization)

For the aforementioned surface mesh generation there are different sets of rules to find triangles. Some of them start with the insertion of nodal points [48], some of them with the "triangle cut off" [39]. Figure 3.8 depicts four rules and their application to a simple meshing problem, starting with the "cut off".



Figure 3.8: Surface mesh generation using the advancing front method. The rules on the left are applied to the problem on the right.

If a part of the problem matches the setting of a rule regarding the combination of points and the free zone (the parts of the geometric model that have not been meshed yet), this rule is applied. While the first two rules do not add an additional point, rules number three and four each add one nodal point. After the application of a rule the front "advances" from the borders to the inner regions of the geometric model until there is no free zone left; therefore, this method is called "advancing front". After the actual mesh generation processes for the surface and the volume, optimization steps to improve the quality of the mesh are performed.

Good mesh quality is essential for the robustness of the tool chain: The meaningfulness and accuracy of the numerical solution depend on an adequate spatial discretization. Since particularly very small and very large angles cause issues like ill-conditioned equation systems, discretization errors and interpolation errors [36], a perfect mesh element (cell) should be as "round" as possible. A perfectly round element would be an equilateral triangle in 2D or an equilateral tetrahedron in 3D, respectively. Badly shaped triangles are daggers (one short edge and at least one small angle) and blades (no short edge, one big and two small angles). Badly shaped tetrahedra have at least three badly shaped triangles: tetrahedra with all four vertices close to a line (spire, spear, spindle, spike and splinter) can be distinguished from ones with the vertices close to a plane (wedge, spade, cap and sliver) [49].

One obvious method of getting well shaped elements is to maximize the minimum interior angle and minimize the maximum interior angle of all cells. That can be reached if the circumcircle or circumsphere of any cell is empty. This condition is called "Delaunay condition" and the mesh generation method "Delaunay triangulation". In the left picture of Figure 3.9 this criterion is shown on a simple example. Boris Delaunay found the duality of the Delaunay triangulation to the Voronoi tessellation in 1934 [50]. A Voronoi tessellation is the partitioning of a surface into cells, using a distance norm and a set of "seed" points. If the seeds are the set of vertices of the mesh and the underlying norm is the Euclidean distance, the resulting Voronoi diagram is the dual graph of the respective Delaunay triangulation. The edges of the mesh are perpendicular to the ones of the Voronoi diagram. The Voronoi cells are the regions that are closer to the corresponding points than to any other point. The right part of Figure 3.9 depicts the relation between Delaunay triangulation and Voronoi tessellation.

The Delaunay algorithm removes all poor quality tetrahedra and triangles, except slivers [51]. A sliver is a tetrahedron with vertices lying close to a plane and whose orthogonal projection onto that plane is a convex quadrilateral with no short edge. To remove slivers, a "weighted Delaunay triangulation" can be used [49].


Figure 3.9: Left: Explanation of the Delaunay criterion. In the left variant the points are within the circumcircles of the respective other triangle, in the right variant the circumcircles are empty. Thus the minimum inner angles have been maximized. Right: Duality of the Delaunay triangulation (black) and the Voronoi tessellation (red). The dotted lines are prolongations of Voronoi edges (dashed lines), the edges of the mesh are displayed as continuous lines.
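The empty-circumcircle test at the heart of the Delaunay condition can be written as a sign test of a determinant. A sketch for the 2D case, without the robust arithmetic that production mesh generators such as Triangle employ, is:

#include <array>

using Point2 = std::array<double, 2>;

// Returns a positive value if d lies inside the circumcircle of the
// counter-clockwise oriented triangle (a, b, c), a negative value if it lies
// outside, and (approximately) zero if the four points are cocircular.
double in_circumcircle(const Point2& a, const Point2& b,
                       const Point2& c, const Point2& d)
{
    const double adx = a[0] - d[0], ady = a[1] - d[1];
    const double bdx = b[0] - d[0], bdy = b[1] - d[1];
    const double cdx = c[0] - d[0], cdy = c[1] - d[1];

    const double ad = adx * adx + ady * ady;
    const double bd = bdx * bdx + bdy * bdy;
    const double cd = cdx * cdx + cdy * cdy;

    // 3x3 determinant of the lifted points, expanded along the first row.
    return adx * (bdy * cd - bd * cdy)
         - ady * (bdx * cd - bd * cdx)
         + ad  * (bdx * cdy - bdy * cdx);
}

A Delaunay triangulation contains no vertex for which this test is positive with respect to any triangle it does not belong to; with floating point arithmetic the nearly cocircular cases require exact or adaptive predicates.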

To sum up this section, high quality meshes are crucial for the robustness and versatility of the tool chain. Hence, it is worth investing a lot of thought into mesh generation and optimization. "Skinny" simplices can lead to stability and accuracy problems due to glancing intersections and other shortcomings. For this reason the tool chain uses weighted Delaunay triangulation to minimize the number of badly shaped elements. It is important to ensure high robustness and versatility not only of the results but also of the mesh generation process itself. The use of simple rule sets and fuzzy classification schemes as well as encapsulation and modularization using interfaces as depicted above can provide the desired improvements [52].

3.5.3 Mesh Refinement and Adaption

Mesh refinement allows a given mesh to be adapted to better serve the calculations, e.g. by increasing the (local) resolution of the mesh to accommodate accuracy requirements. Uniform refinement often results in many unnecessarily fine regions and, therefore, an excessive need for computational power and memory. The consequences can be excessively long simulation times as well as crashes of the simulation software due to unstable simulation programs or operating systems. Adaptive meshing, on the contrary, addresses the need for inhomogeneous and anisotropic refinement [53].

Figure 3.10: A bar before (left) and after (right) adaptive refinement. The color represents the strain on the bar, the gradient of strain was used to determine the refinement.


Figure 3.10 depicts these refinement techniques. The gradient of a strain was used to determine the necessary refinement factors. Regions of higher gradient were refined to increase the accuracy and stability of the solutions whereas other regions of low gradients were coarsened to reduce the demand for memory and computational power. Besides pure mesh refinement, hp-adaptivity provides different ways to refine an element, adaptively combining elements with variable size h and polynomial degree p of the ansatz functions [54]. These ways are preferable to both h- and p-adaptivity due to the large number of combinations for the polynomial degrees on the subelements. The number of different refinements is dependent on the implementation and the specific elements but in general hp-adaption allows all combinations of the possibilities of both h- and p-adaption. When employing hp-refinement exponential convergence rates [55] can be reached.


Figure 3.11: Several refinement candidates for a fourth order element. The size h is depicted directly as size of the triangles, the number in and color of the triangles indicate the polynomial degree p.

Some refinement choices for an element with fourth order ansatz functions, including an example each for a standard h- and a pure p-refinement, are shown in Figure 3.11. Since homogeneous adaption (like in the first candidate) is not always reasonable, the third candidate illustrates an inhomogeneous but isotropic adaption. Anisotropic adaptions are also possible.

Mesh refinement and adaption can improve the robustness of the tool chain in at least three ways:

• By refining areas with high gradients, the convergence rates of the solving algorithms can be increased. Therefore, the computational time and often even the demand for computational power and memory are reduced. Because of lower update norms the stability of the solution process is also increased (a simple marking rule is sketched after Figure 3.12).

• By not refining, or even coarsening, other areas the need for memory can be drastically reduced. Additionally, the computational time of some algorithms used in CFD is proportional to the square root of the number of elements [56], giving a good reason to keep the latter as small as possible.

• By increasing the mesh quality [36, 57], the convergence rates as well as the stability of the algorithms used for solving the NSE can be increased.

Mesh refinement and adaption are also suitable targets for parallelization, at least if there are efficient algorithms for domain decomposition and repartitioning. Such algorithms [58] are provided for example by METIS [59] and ParMETIS [60]; the latter supports parallel partitioning methods. Figure 3.12 shows an example of a decomposed domain as well as the assignment of element sets to different processors.


Figure 3.12: Left: Domain decomposition of a volume mesh of a transformer into four subdomains. For better view, the meshes of the coils (blue, green) have been partly removed. The core has been split into two subdomains (yellow: Prism elements - red: Tetrahedra elements). Right: Partitioning of a mesh (black) into disjoint element sets (red, green) along the domain boundary (cyan). These sets are divided between processors (0 - green, 1 - red). The boundary nodes are assigned to the processor with lower ID (0 - green).
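Gradient-driven marking, as used for the bar in Figure 3.10, can be sketched as a simple threshold rule. This is an illustrative fragment under the stated assumptions, not the refinement logic of the libraries used in the tool chain:

#include <cmath>
#include <cstddef>
#include <vector>

enum class Adaption { Coarsen, Keep, Refine };

// Mark each element for refinement or coarsening based on the magnitude of a
// per-element error/gradient indicator relative to two thresholds.
std::vector<Adaption> mark_elements(const std::vector<double>& gradient_indicator,
                                    double refine_threshold,
                                    double coarsen_threshold)
{
    std::vector<Adaption> marks(gradient_indicator.size(), Adaption::Keep);
    for (std::size_t e = 0; e < gradient_indicator.size(); ++e)
    {
        const double g = std::abs(gradient_indicator[e]);
        if (g > refine_threshold)
            marks[e] = Adaption::Refine;
        else if (g < coarsen_threshold)
            marks[e] = Adaption::Coarsen;
    }
    return marks;
}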

The libMesh framework [61] provides a generic library for parallel adaptive mesh refinement and coarsening. Originally developed for the solution of the iNSE, it now supports a wide range of applications including electrostatics and wave equations. To summarize, appropriately resolved meshes decrease the demand for computational power, time and memory on the one hand, while on the other hand they increase the stability and convergence rates of the used algorithms. Thereby, mesh adaption, refinement and coarsening enhance robustness as well as versatility.

3.6 Discretization

Galerkin methods are a class of FEMs, which in turn are a class of discretization methods converting a continuous operator problem (i.e. differential equations) into a discrete problem [62]. Introduced in 1973 by Reed and Hill to solve the hyperbolic neutron transport equation [63], the Discontinuous Galerkin FEM (DG-FEM) received notable interest in the late 1990s for calculating fluid dynamics [64]. Different Galerkin methods with different characteristics and fields of application were developed, for example the Continuous Galerkin FEM (CG-FEM), the DG-FEM and the Hybrid Discontinuous Galerkin FEM (HDG-FEM). Figure 3.13 shows the degrees of freedom of the mentioned linear Galerkin methods in 2D [65]. In the developed tool chain, the local discretization is performed by the HDG-FEM [66], which is an enhancement of the DG-FEM. The DG methods combine features of the FEMs, as for example higher order approximations, with features of the FVMs, like the conservation of flows [67]. When modeling the problem

\[ -\nabla^2 \vec{u} = f \qquad (3.1) \]

HDG-FEM approximates u with piece-wise polynomials on the elements and additional polynomials on extra facets between the elements [65]. Figure 3.14 depicts the continuity of tangential and normal components of different Galerkin FEMs as well as the extra facets used in HDG-FEM. While continuity of the normal component is necessary for global conformity, the tangential components may be discontinuous. 26

Figure 3.13: Degrees of freedom of different linear Galerkin FEMs in 2D: Left: CG-FEM, center: DG-FEM, right: HDG-FEM.

Figure 3.14: The continuity of tangential (dashed) and normal (dotted) components of different Galerkin FEMs. Continuous components are marked with green parallel arrows, discontinuous components are marked with red antiparallel arrows. Left: CG-FEM, center: Divergence free DG-FEM, right: Divergence free HDG-FEM.

Using a suitable pair of finite element spaces, a formulation of the iNSE which is locally conservative and energy-stable at the same time can be achieved. This leads to discrete solutions which are exactly divergence-free. Therefore, higher order basis functions (with non-zero divergence) will have no contribution and can be discarded [68]. In comparison to DG-FEM, HDG-FEM has even more unknowns but fewer matrix entries and allows element-wise assembling as well as static condensation of element unknowns [69]. Thus it requires only direct neighbor communication, making it very suitable for parallel computing. Summarizing, the accumulated facts lead to the conclusion that divergence-free HDG-FEM can improve both robustness and versatility. Divergence-free discrete solutions and fewer matrix entries decrease the need for memory and enhance performance. The possibility of partitioning the elements into disjoint sets enables parallelization, which further improves performance.

3.7 Time Integration

Solving time-dependent problems requires not only spatial discretization but also time integration, i.e., the solution of the problem at different (discrete) time steps. Typically, the solution is done in two steps:

1. Solve a steady state problem to get initial values for t = 0.
2. Use a time integration scheme to compute solutions for consecutive time steps.

As pointed out in Section 2.1, to get initial values of the iNSE the unsteady and convective acceleration terms can be neglected, leading to the iSE, which is solved in step one. Thereafter the iNSE is solved for the chosen time steps.


For the time integration of spatially discretized partial differential equations of convection-diffusion type, like the iNSE, it is common to use implicit-explicit (IMEX) schemes. These are methods using an implicit scheme for the diffusion part

\[ \frac{\partial \vec{u}}{\partial t} = D \nabla^2 \vec{u} \qquad (3.2) \]

and an explicit scheme for the convection part [70]

\[ \frac{\partial \vec{u}}{\partial t} + \vec{u} \cdot \nabla \vec{u} = 0. \qquad (3.3) \]

28

Chapter 4

Software This chapter gives a brief overview about the application of the forementioned concepts (Chapter 3) by the used software. Again the sections are following the tool chain from modeling over meshing to the simulation itself. The first section links the later described software packages to the structure of the tool chain as shown in the introduction (Chapter 1). The last section presents a ranking of the previously described packages and depicts which tools and applications are used in the tool chain for the applications and results in Chapter 5.

4.1 Tool Chain Overview There are many possible ways to implement the principles and methods described in Chapter 3, hence there are innumerable tools for each task of the tool chain. Figures 4.1 and 4.2 show possible combinations of tools used during the work on this thesis. This is a link of the individual software packages to the structure depicted in Figure 1.1. Constructive Solid Geometry

Meshing

Formalization

Flow Simulation

Comsol Multiphysics Meshing Module

CSG Module

CFD Module

Solver Module

Triangle

Texteditor

CGAL

MesherInterface

Tetgen NG-Suite Netgen

NGSolve + NGSFlow

Figure 4.1: Possible combinations of tools in the chain when using CSG.

In comparison to Figure 1.1 the chain has an additional step ”interface” (containing the Mesher-Interface), which is basically part of the meshing process. It encapsulates the mesh generators Triangle, CGAL, Tetgen and Netgen (see Sections 4.5.3 to 4.5.6).

29

While the Comsol Multiphysics suite (see Section 4.6.2) has its own geometry module, which facilitates the comfortable creation of even complex structures by means of CSG, the import of geometry files as well as the combination of imported structures and CSG, Netgen on the other side only allows the import of various geometry file formats. Since most of them are structured extremely simply, an arbitrary text editor can be used to create them. Both Comsol Multiphysics and the NG-suite (consisting of Netgen, NGSolve and ngs-flow, for the latter two see Section 4.6.1) allow importing meshes in different file formats, enabling different combinations of modules and thus increasing flexibility.


Figure 4.2: Possible combinations of software tools when using imaging and segmentation.

The imaging process starts with the capture of data sets, which is not part of this thesis. All used imaging technologies (X-ray, CT and MRI) deliver their respective results using the Digital Imaging and COMmunication in medicine (DICOM) file standard [72], which can be read by various tools for image post-processing (like the Insight segmentation and registration ToolKit (ITK), see Section 4.3.1) and segmentation (like itkSNAP, see Section 4.4.1, or the Vascular Modeling ToolKit (VMTK), see Section 4.4.2).

In comparison to Figure 4.1, the respective geometry part was replaced by modules for imaging, segmentation and an additional step of post-processing; this latter step is part of the segmentation process. Apart from these changes, an additional interface (BRL-CAD, see Section 4.5.2) is used to convert the corresponding geometry files to a more easily handled mesh file format. This extension of the tool chain is necessary, since Comsol Multiphysics does not work well with files delivered by MeshLab (see Section 4.4.3). The simple extensibility by incorporating this additional step is due to the modularity of this tool chain.

4.2 Constructive Solid Geometry

CSG is a method for geometric modeling using simple base objects called primitives, Boolean operations and geometric transformations to construct objects. Examples of primitives are cuboids, cylinders, spheres, pyramids and cones, as well as tori and helices. They can be combined by Boolean operations (union, intersection and difference) or transformed (copied, moved, rotated and mirrored) [2]. The construction plan can always be represented as a sequential list or, if only Boolean operations are used, as a binary tree. The latter has primitives as leaves, operations as nodes and the finished geometry as root. Since the Boolean difference is not commutative, the corresponding branches have to be named. Figure 4.3 depicts an example of a CSG tree, using four primitives as leaves and all three different Boolean operations.





Figure 4.3: CSG tree example. The operations are labeled as follows: ∪ for union, ∩ for intersection, \ for difference.
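The binary-tree representation described above can be captured with a small data structure. The following is an illustrative C++ sketch, independent of the CSG implementations of the tools used in the tool chain:

#include <memory>
#include <string>
#include <utility>

// A CSG tree: primitives are leaves, Boolean operations are inner nodes.
struct CSGNode
{
    enum class Kind { Primitive, Union, Intersection, Difference };

    Kind kind = Kind::Primitive;
    std::string primitive;                 // only used for leaves, e.g. "sphere A"
    std::unique_ptr<CSGNode> left, right;  // only used for operation nodes

    static std::unique_ptr<CSGNode> leaf(std::string name)
    {
        auto node = std::make_unique<CSGNode>();
        node->primitive = std::move(name);
        return node;
    }

    static std::unique_ptr<CSGNode> op(Kind kind,
                                       std::unique_ptr<CSGNode> left,
                                       std::unique_ptr<CSGNode> right)
    {
        auto node = std::make_unique<CSGNode>();
        node->kind  = kind;
        node->left  = std::move(left);
        node->right = std::move(right);
        return node;
    }
};

// Example: A \ B -- the operand order matters, since the difference is not commutative.
// auto geometry = CSGNode::op(CSGNode::Kind::Difference,
//                             CSGNode::leaf("A"), CSGNode::leaf("B"));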

Many programs include CSG modules or interfaces to load CSG files. In this tool chain the simpler CSG files are written by hand, while the more complex ones are set up using the geometry module of Comsol Multiphysics (see Section 4.6.2).

4.3 Imaging and Image Post-Processing

Instead of reconstructing the respective geometry using CSG, in many cases it is suitable to use a scanning device to create a proper geometric model of the simulation area. Such imaging devices include in particular CT and MRI scanners, which are not used exclusively in medical applications. The imaging process itself (getting images using a scanner) is not a focus of this work. But since the capabilities of these scanning devices are limited, all used forms of image improvement and post-processing (as described in Section 3.3) are described here.

4.3.1 Insight Segmentation and Registration Toolkit

The ITK [73] is an open-source suite of image processing and analysis tools. Besides the opportunity of writing custom segmentation tools (see also Section 4.4), it provides the possibility of registration (the task of aligning or developing correspondences between data) and post-processing of images gathered from a scanner. Within this work ITK is used for three tasks:

1. Digital subtraction: Since not all scanners (especially MRI devices) have built-in support for digital subtraction, this task is integrated into the image post-processing workflow using the registration capabilities of the toolkit.

2. Layer shifting: The process described in Section 3.3.2 has been built using ITK libraries.

3. Resampling: Although resampling cannot improve the quality of an image (since no information is added), it can be useful to ensure a certain resolution (for example equidistant scaling).


The ITK was chosen for various reasons; the most crucial ones are that it is a widely used open-source project with an active community (and therefore has a vivid update and life cycle) and that it is implemented in C++ using a generic programming style, making it highly efficient. It is cross-platform and generates wrapping interfaces, enabling use from multiple interpreted programming languages (e.g. Java and Python).

4.4 Segmentation and Geometry Post-Processing

If no construction plan is available but only tomographic images to describe the simulation domain, the geometric and topological information must be reconstructed by analysis and segmentation. A 3D structure has to be extracted from a series of 2D (gray value) images, which represent slices through the area of interest. The extraction of structures follows the boundaries between different materials by applying one of two basic methods of segmentation [74] (as shown in Figure 4.4):

• Threshold techniques separate the regions by distinguishing between different ranges of gray values. To get only cohesive domains, seeding is used: the user selects "seeds" which are expanded or contracted to meet the previously set thresholds. If all regions within the range of gray values should be selected, there is no need for seeding.

• Edge-based methods, on the other hand, reconstruct the segments by using edge detection algorithms. To select a certain area, seeding can be used in the same manner as with the previously discussed threshold techniques.

Figure 4.4: Two basic segmentation methods: Left: Threshold technique, right: Edge based method with seeding.

Both approaches make use of and rely on contrast. Which of them is more efficient usually depends on the resolution and the geometrical factors encountered, making an optimal default choice virtually impossible. The experience and expertise of a user are essential, as all subsequent steps of the simulation depend on ensuring adequate results in this step. In general, the segmented structure has to be smoothed and cropped afterwards to achieve fair surfaces with planar inlets and outlets. This is necessary to avoid breakdowns while meshing (see Chapter 3) and to clearly distinguish between ports and sidewalls. The following subsections introduce the tools used for segmentation and post-processing of the 3D surfaces.

4.4.1 itkSNAP

itkSNAP [75] is a segmentation tool focusing on usability and therefore providing a Graphical User Interface (GUI) during the whole process of segmentation (see Figure 4.4). Table 4.1 provides an overview of the advantages and disadvantages of this application.

Advantages:
• GUI-controlled suite
• Short training period
• Partly automated segmentation
• Manual refining possible
• Different output formats

Disadvantages:
• Only two different segmentation algorithms
• No post-processing available

Table 4.1: Comparison of the advantages and disadvantages of itkSNAP.

4.4.2 Vascular Modeling Toolkit

Vascular means "regarding blood vessels"; hence VMTK [76] is a collection of small applications focusing on the segmentation of these vessels (throughout this work "vessel" always refers to "blood vessel"). The toolkit provides methods for smoothing as well as a feature for skeletonization to center lines and rebuilding of the structure with circles of average diameter [35]. Table 4.2 depicts the advantages and disadvantages of VMTK. Screenshots of a segmentation process can be seen in Figure 4.5.

Advantages:
• Many different segmentation algorithms
• Methods for smoothing and cropping
• Algorithm for skeletonization and rebuilding
• Partly automated segmentation
• Different output formats

Disadvantages:
• Discrete components
• Only partly GUI controlled
• Manual refining not possible
• Long training period

Table 4.2: Comparison of the advantages and disadvantages of VMTK.

Figure 4.5: Surface reconstruction of MRI images with VMTK. Left: Selection of the region of interest, right: Reconstructed surfaces of blood vessels.


4.4.3 MeshLab

MeshLab [44] is an application which cannot be used for segmentation itself but for post-processing the extracted structures. For this purpose, different methods of smoothing, trimming and rebuilding the surfaces are provided. Figure 4.6 depicts the smoothing of the surface of a blood vessel.

Figure 4.6: Smoothing of a surface with MeshLab. Left: Before smoothing, a lot of artifacts like sharp edges and steps are present. Right: Post-processing significantly reduces the artifacts. Color implies curvature with respect to the current extreme values.

4.5 Mesh Generation

After segmentation and post-processing, the resulting structure has to be meshed. The main concepts and issues of meshing are described in Chapter 3; the following subsections give an overview of the mesh generation tools used in this work.

4.5.1 Generic Mesh Generation Interface

The Generic Mesh Generation Interface [36, 37] is used in the tool chain to encapsulate the mesh generation process (see Section 3.5.1). This enables generic access to various mesh generators, which are described in the next sections. Figure 3.6 depicts the functionality of the interface. Not only the generic approach but also the automatic selection of a meshing kernel based on meta information are reasons for using this library.

4.5.2 Ballistics Research Laboratory Computer-Aided Design

BRL-CAD [43] is an open-source cross-platform solid modeling system. Its main capabilities are

• CSG modeling,
• Ray tracing (rendering),
• Framebuffering including network display capabilities and
• Image manipulation and analysis including format conversions.

Although the software package is rather old (the first versions date from the 1980s), there is a vivid community further developing this tool. Some geometry editing tools and mesh generators, such as those from Comsol Multiphysics (see below), work better with Computer-Aided Design (CAD) exchange file formats (like IGES) than with the widely-used STereoLithography file format (STL), which is also used by the segmentation applications mentioned above. Therefore, the main task for which BRL-CAD is used within the tool chain is the conversion between different geometry formats.

4.5.3 Triangle and Show Me

Triangle [40] is a 2D mesh generation application for Delaunay triangulation. The program is written in C and uses very fast, stable and memory efficient algorithms. Triangle has no GUI itself, but Show Me enables viewing the .poly input files as well as the results. These files contain lists of points, segments and holes. It is possible to set quality criteria such as minimum angles or maximum areas. Table 4.3 depicts the advantages and disadvantages of Triangle; Figure 4.7 shows two screenshots of Show Me.

Advantages:
• Ensured Delaunay triangulation
• Robust and efficient algorithms
• Short training period
• Lightweight application

Disadvantages:
• Only 2D meshing
• No real GUI
• Programming style is not generic

Table 4.3: Comparison of the advantages and disadvantages of Triangle.

Figure 4.7: Meshing of the TU Wien logo with Triangle. Left: Geometry given by points, segments (blue lines) and holes (red crosses), right: Resulting mesh with quality criteria set.


4.5.4 Computational Geometry Algorithms Library

The Computational Geometry Algorithms Library (CGAL) [77] is a C++ library suite providing efficient and reliable geometric algorithms and data structures. The range of packages reaches from arithmetic, algebraic and combinatorial algorithms over geometric kernels and operations, mesh generation and optimization as well as geometry processing and optimization to spatial searching and sorting, interpolation and kinetic data structures. Additional support libraries such as interfaces to solvers and visualization programs are also available. Within the tool chain CGAL is used for 2D and 3D mesh generation, controlled through the Generic Mesh Generation Interface. Table 4.4 shows the properties of the library suite.

Advantages:
• Fast and robust
• Generic software design
• Various algorithms for meshing, refining and coarsening
• Large community, vivid life and update cycle

Disadvantages:
• No meshing application available; only a library
• Long training period

Table 4.4: Comparison of the advantages and disadvantages of CGAL.

4.5.5 Tetgen and Tetview

Tetgen [38] is a mesh generation application generating 3D tetrahedral meshes. Tetgen itself is a command line program, but a GUI called Tetview is available. Table 4.5 depicts the advantages and disadvantages of Tetgen; Figure 4.8 shows two screenshots of Tetview.

Advantages Fast and robust Mathematically guaranteed mesh algorithm Topological tests Short training period

• • • •

Disadvantages No generic software design at all Difficult to use for multiple materials No topological repair possible Only unintuitive mesh refinement, no coarsening

Table 4.5: Comparison of the advantages and disadvantages of Tetgen.
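Being a command line tool, Tetgen is driven by switches appended to a single option string. The following invocation is an illustrative example; the file name and parameter values are placeholders:

    tetgen -pq1.414a0.1 model.poly

Here -p tetrahedralizes the piecewise linear complex described in model.poly, -q1.414 imposes a radius-edge ratio bound on the tetrahedra and -a0.1 limits the maximum tetrahedron volume; the result can then be viewed with Tetview.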

4.5.6 Netgen

Netgen [39] is a mesh generation application generating 2D and 3D triangle meshes as well as 3D tetrahedral meshes. Furthermore, it is the basis for the NGSolve suite introduced in the next section. Table 4.6 gives an overview of the advantages and disadvantages of Netgen, Figure 4.9 shows screenshots of the meshing process.

Advantages
• Fast and robust
• Mathematically guaranteed stable mesh algorithm
• Different mesh refinement methods
• Topological tests and repair
• Easy handling of multiple materials

Disadvantages
• Source code difficult to work with
• Long training period
• No mesh coarsening

Table 4.6: Comparison of the advantages and disadvantages of Netgen.
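Netgen describes CSG geometries in plain text .geo files. The following minimal example, with arbitrary dimensions, sketches the format: a cylinder capped by two planes with a spherical cavity subtracted.

    algebraic3d

    # infinite cylinder along the z-axis, capped by two planes
    solid cyl = cylinder (0, 0, 0; 0, 0, 50; 10)
            and plane (0, 0, 0; 0, 0, -1)
            and plane (0, 0, 50; 0, 0, 1);

    # spherical cavity subtracted from the cylinder
    solid cavity = sphere (0, 0, 25; 4);
    solid body = cyl and not cavity;

    tlo body;

The tlo statement marks body as a top level object to be meshed; several such objects can be declared to define multiple sub-domains and thus multiple materials.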


Figure 4.8: Meshing of a simple cylinder containing a sphere with Tetgen. Left: Geometry to mesh given by the nodes, right: Resulting mesh (clipped).

Figure 4.9: Meshing of a water cooling device for CPUs with Netgen. Left: Geometry to mesh, right: Meshing process.

4.6 Computational Fluid Dynamics Simulation

After meshing, the model has to be specified as a mathematical problem, and the respective equations have to be solved. The subsequent subsections describe two radically different suites for performing CFD simulations.

4.6.1 NGSolve and ngs-flow

NGSolve [78] in conjunction with ngs-flow [65] is a suite capable of CFD simulations and is based on Netgen. The local discretization is performed by the DG-FEM; for the time discretization IMEX schemes are used. Both concepts are described in Chapter 3. Table 4.7 provides an overview of the advantages and disadvantages of these applications, Figure 4.10 shows screenshots of the suite. Together, Netgen, NGSolve and ngs-flow are referred to as the NG-suite.


Advantages
• Open-source software
• Latest mathematical developments implemented
• Multipurpose simulation software
• Short development periods
• Various input and output formats

Disadvantages
• Long training period
• High complexity
• High effort to install

Table 4.7: Comparison of the advantages and disadvantages of the NGSolve suite.
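To make the term IMEX more concrete, a first-order IMEX-Euler step for the iNSE can be written as below (with p denoting the kinematic pressure). This is only a generic illustration of the implicit-explicit splitting idea, not necessarily the exact scheme implemented in ngs-flow, which is described in Chapter 3.

    \frac{u^{n+1} - u^{n}}{\Delta t} + (u^{n} \cdot \nabla)\, u^{n}
        = \nu \, \Delta u^{n+1} - \nabla p^{n+1},
    \qquad \nabla \cdot u^{n+1} = 0

The stiff viscous term is treated implicitly, which avoids the severe time step restriction it would otherwise impose, while the nonlinear convective term is treated explicitly, so that only a linear system has to be solved per time step.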

Figure 4.10: Simulation of a water cooling device for CPUs with NGSolve and ngs-flow. Left: Boundary conditions and model components, right: Results of the simulation.

4.6.2 Comsol Multiphysics

Comsol Multiphysics [79] is a proprietary simulation suite which, equipped with the CFD module, can be used for CFD simulations. The local discretization is performed by the FEM. Table 4.8 depicts the advantages and disadvantages of this tool, Figure 4.11 shows two screenshots.

Advantages
• Short training period
• Multipurpose simulation software
• Low effort to install

Disadvantages
• Proprietary software
• Long development periods

Table 4.8: Comparison of the advantages and disadvantages of Comsol Multiphysics.


Figure 4.11: Simulation of a human aortic arch with Comsol Multiphysics. Left: Model tree with equations, right: Results of the simulation.

4.7 Conclusions for the Tool Chain Used

All described software tools can be used to set up a robust and high performance tool chain for CFD simulations. Table 4.9 shows the use of the above-mentioned applications and their modules, respectively.

Software Package                          | Category                   | Usage
Text editor & Netgen (geometry analysis)  | CSG modeling               | yes
Comsol Multiphysics (geometry modeling)   | CSG modeling               | yes
BRL-CAD (geometry modeling)               | CSG modeling               | no
ITK                                       | Image post-processing      | yes
itkSNAP                                   | Segmentation               | yes
VMTK                                      | Segmentation               | no
MeshLab                                   | Geometry post-processing   | yes
Generic Mesh Generation Interface         | Interface                  | yes
BRL-CAD (format conversion)               | Interface                  | no
Triangle                                  | Mesh generation & adaption | no
CGAL                                      | Mesh generation & adaption | no
Tetgen                                    | Mesh generation & adaption | no
Netgen (meshing)                          | Mesh generation & adaption | yes
Comsol Multiphysics (meshing)             | Mesh generation & adaption | yes
NGSolve, ngs-flow                         | Simulation suite           | yes
Comsol Multiphysics (simulation)          | Simulation suite           | yes

Table 4.9: Usage of the described software tools within the investigated tool chains for the evaluation applications in Chapter 5.

Not every described software tool is used in the applications depicted in Chapter 5. The choice of some tools over others is based on various considerations:


• When the complexity of the CSG models is rather low, a simple text editor is sufficient to set up the geometry files. Hence, the use of an additional tool like BRL-CAD is unnecessary. With rising complexity of a model's geometry, the motivation to use modeling tools or built-in geometry editing modules increases. Thus, the more complex CSG models are built with the CSG module of Comsol Multiphysics. Preferring built-in modules over external tools or editors is self-evident: not only additional chain links, which bring along additional risks, but also complex and error-prone input files can be avoided. Additionally, there is no need for any format conversions when using built-in tools.

• itkSNAP is chosen over VMTK for its better usability while giving approximately equivalent results.

• Among the interfaces, only the Generic Mesh Generation Interface is used, in combination with the NGSolve suite. The format conversion of BRL-CAD only has to be used to enhance robustness if the Comsol Multiphysics suite is applied to segmentation-based geometries.

• In all cases where the Generic Mesh Generation Interface is used, Netgen is chosen for mesh generation and adaption because it provides adaptive meshing with hp-refinement for 2D and 3D geometric models.

• For the simulation task itself both suites, NGSolve with ngs-flow as well as Comsol Multiphysics, are used with different settings. While NGSolve is used for CSG models and models based on imaging and segmentation, Comsol Multiphysics solely performs simulations of constructively modeled geometries with higher complexity. The choice of NGSolve over Comsol Multiphysics for most applications is primarily made due to the open-source approach, enabling modifications of the code when required. This is especially important in more scientific cases, where changes and adaptations of the simulation are often inevitable. For industrial applications, on the other hand, a ready-made solution is often preferred.


Chapter 5

Applications

This chapter demonstrates two applications of robust CFD simulations, both from completely different scientific fields, to point out the versatility and the resulting wide area of possible applications. The first application is from the field of electrical engineering and physics, dealing with flows for thermal management. The second application comes from the biological and medical area and discusses mechanical effects of fluid flow. In each section, the problem to solve is analyzed, a solution technique is developed and the results of the simulations are presented based on the developed flexible CFD tool chain.

5.1 Thermal Management

The first field of application is the area of thermal management, especially water-cooled semiconductor-based devices. This application not only demonstrates the basic advantages of simulations over prototyping, but also shows the practical utilization of the modeling principles presented in Chapter 1. Furthermore, it combines different physical phenomena, thus challenging the robustness of the solution procedure.

5.1.1 Problem to Solve

The increasing computational power of both CPUs and GPUs comes at the expense of an increasing amount of electrical power. Therefore, cooling elements of increasing sophistication have to take care of the rising amount of dissipated heat [80, 81]. Water cooling solutions are an attractive means of dealing with the excess heat. Water-cooled CPUs can be found first and foremost in high performance computing clusters, supercomputers, computational grid computers and servers [82], but also in noise-limited Personal Computers (PCs) with overclocked CPUs, such as used in compute-intensive professional workstations for video and audio development as well as in gaming environments. Additionally, some considerations with regard to the power consumption of the cooling elements themselves have to be taken into account, not least due to the rising economic pressure. It is estimated that in 2009/2010 between 1.1% and 2% of the total electrical energy worldwide was used for data centers [83, 84]. IBM has found a method to reduce the excess heat by about 80% [85] using hot water cooling [86].


The main advantages [87] of a water-based cooling system over cooling with air are the higher heat capacity and thermal conductivity of water, so that
• An increase of computing power and the associated higher heat loss can be dealt with more easily,
• The heat sinks may be smaller in size,
• Components last longer due to a more homogeneous temperature distribution and
• It offers the potential for significantly reduced noise during operation.

The use of hot water cooling methods adds
• Huge economic and ecological benefits as a result of the reduction of energy waste and
• Even longer component life times compared to standard water cooled systems
to the list of advantages [88]. On the other hand, the gain in sustainable performance levels of cold water cooled systems in comparison to air cooled systems is almost completely lost. Below, two examples are presented to show the application of both standard water cooling and the hot water cooling method.

Water Cooling in Mainstream Personal Computers

The aim of this first example is to enhance the cooling characteristics of an existing heat sink for CPUs by various modifications to its geometry. Since this particular use case is from the consumer sector, less consideration can be given to the optimization itself; instead, particular attention has to be paid to the financial pressure. This provokes additional considerations regarding the complexity and manufacturability of the optimized designs. Different, partly conflicting criteria for the simulation steps have to be met:
• The maximum temperature at the contact plane has to be kept as low as possible.
• The temperature at the contact plane should be dispersed as evenly as possible.
• The heat sink has to be easy to manufacture and as small as possible.
• Since a stable state of the fluid motion has to be reached, the flow has to be steady state. Hence, quick convergence of the flow simulation has to be achieved.
• The same stability requirement applies to the temperature distribution, which has to be steady state as well and should be reached as quickly as possible. Therefore, the temperature simulation should converge as fast as possible.

Hot Water Cooling for Data Centers

This second example is driven by an industrial application and deals with the optimization of the cooling systems in data centers. In contrast to the above use case, the price for the production of the heat sink plays a minor role in this application. Due to rising energy prices and constantly increasing environmental awareness leading to more and more ecological constraints, the driving force here is the energy consumption [83]. Other important cost factors are personnel costs as well as costs for spare parts.

Water cooling is much more efficient than cooling with air, but the most energy-hungry parts are the chillers and compressors [85]. A possible solution for this problem is hot water cooling: To cool all necessary parts of the system, water is forced through microchannels, making direct contact with the processors, where it absorbs excess heat. It is then pumped to a heat exchanger where it loses the additional heat. Afterwards the water is pumped back into the circuit [86]. Thus, about 73% of the cooling power and 50% of the auxiliary power can be saved. About 80% of the excess heat can be used for heating the human-occupied parts of the building if needed, which means an additional saving of energy [85, 89]. Additionally, hot water cooling keeps parts within a very small temperature range in an extremely stable manner. Thus the average life time of parts can be prolonged, reducing personnel costs and the necessity of spare parts.

The idea of hot water cooling is inspired by nature: Dense, complex but efficient systems are used in biology to transport gases and fluids within animal and human bodies [90]. These streams can be used for material transport (e.g. supply of oxygen using both air and blood) as well as for the regulation of temperature. All these systems are constructed in a branching, tree-like manner with layers, where each layer has more but smaller elements. Hot water cooling uses the same principle, going from pipes down to microchannels.

The aim of the example is to reconstruct the microchannel-based heat sink IBM used with the SuperMUC supercomputer at the Leibniz Rechenzentrum in Munich in 2012. Since the design is proprietary and the layout has not been disclosed, the heat sink is reverse engineered from available data. This motivates further investigations of image processing and structure reconstruction. Since the first attempt reveals a problem at the edges of the heat sink, a second step is performed to redesign the cooling element and overcome this flaw. As always, conflicting criteria for the simulation are present:
• The temperature at the contact plane should be dispersed as evenly as possible.
• The temperature of the cooling water at the inlet should be about 45 degrees Celsius and its temperature rise may not exceed 15 Kelvin.
• The decrease in pressure over the cooling element has to be as low as possible.
• The heat sink has to be small enough so that the whole layer fits into the standardized 19-inch racks.
• The flow has to be steady state, hence quick convergence of the flow simulation has to be reached.
• The temperature distribution has to be steady state and should be reached as quickly as possible, hence the temperature simulation should converge as fast as possible.

5.1.2 Problem Modeling

Below, the methodology for solving the two examples (heat sink optimization in mainstream PCs and hot water cooling in data centers) is presented.

Heat Sink Optimization in Mainstream Personal Computers

Besides the actual cooling elements (either for the CPU only or for each CPU and GPU), cooling circuits consist of a radiator unit to cool the water, a pump and an expansion tank. The latter keeps the system of tubes and cooling elements free of air bubbles, manages thermally induced expansion and allows refilling to compensate for any loss of water [91]. The whole cooling circuit is shown in Figure 5.1.


Figure 5.1: Left: Installed water-cooling system¹. Right: Diagram of the cooling circuit consisting of the heat sink, the radiator unit to cool the water and a pump as well as an expansion tank.

In these simulations only the CPU cooler, containing a water-surrounded heat sink made of copper, is modeled. The radiator unit is expected to be well dimensioned, which means that a constant temperature at the inlet of the heat sink can be assumed. Due to the considerable differences of the thermodynamic material constants and of the velocities of the cooling water compared to the surrounding air, the effects of the forced convection can be expected to be orders of magnitude bigger than those of the natural convection. Hence, the influence of the thin wall of the box as well as the influence of buoyancy-driven fluid motions can be neglected. For this reason the simulations of fluid motion and heat transfer can be decoupled, as stated in Section 2.3. Figure 5.2 shows the original geometry of the simulated heat sink.
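A common way to quantify this argument, added here only as an illustration alongside the treatment in Section 2.3, is the Richardson number, which compares buoyancy-driven to forced convection:

    \mathrm{Ri} = \frac{\mathrm{Gr}}{\mathrm{Re}^2} = \frac{g \, \beta \, \Delta T \, L}{U^2} \ll 1

Here g is the gravitational acceleration, β the thermal expansion coefficient of the fluid, ΔT a characteristic temperature difference, L a characteristic length and U the velocity of the forced flow. A value far below one, as assumed above for the pump-driven water flow, means that buoyancy effects may safely be neglected.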

Figure 5.2: Geometry of the original CPU heat sink design in upright position as it is mounted in a tower casing.

To meet the typical dimensions of cooling elements for CPUs, all lengths are measured in cm throughout these models and the units for other physical quantities are adapted to cm as well. The material constants for water and copper are found in the literature [93, 94].

¹ The original photo [92] was published under the Creative Commons Attribution-Share Alike 2.0 Generic license and then altered for this thesis.


The materials are modeled with their characteristics at normal pressure (101 325 Pa) and a temperature of 24 ◦C. Table 5.1 shows the material characteristics used.

Material constant                          | Water         | Copper
Density ρ [kg · cm−3]                      | 0.9973 · 10−3 | 8.7 · 10−3
Dynamic viscosity η [kg · cm−1 · s−1]      | 9.107 · 10−6  | —
Kinematic viscosity ν [cm2 · s−1]          | 9.132 · 10−3  | —
Thermal conductivity k [W · K−1 · cm−1]    | 6.054 · 10−3  | 4.000
Specific heat capacity cp [J · kg−1 · K−1] | 4182          | 385

Table 5.1: Material properties used in the first example.
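As an example of the unit conversion underlying Table 5.1, the density of water at 24 ◦C follows from its SI value of roughly 997.3 kg · m−3:

    \rho_{\mathrm{water}} = 997.3\ \mathrm{kg\,m^{-3}}
                          = 997.3 \cdot 10^{-6}\ \mathrm{kg\,cm^{-3}}
                          = 0.9973 \cdot 10^{-3}\ \mathrm{kg\,cm^{-3}}

All other quantities in the table are converted in the same manner, so that lengths expressed in cm can be used consistently throughout the model.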

The heat source is chosen to emit an excess heat of Q = 180 W, which is slightly above the value of the Intel QX9775 processor [95]. The emission is uniformly distributed over an area of 36 cm2, therefore a heat flow rate of q = Q/A = 180 W / 36 cm2 = 5 W · cm−2 is derived. The tool chain used includes the following steps:
• Construction of the cooler with CSG (see Section 4.2).
• Meshing of the constructed geometry (see Sections 3.5 and 4.5). Regions expected to have high temperature gradients are resolved finer to improve accuracy and decrease visualization errors. The resolution of areas with high gradients in velocity also has to be high to improve convergence and stability.
• Flow simulation by solving the stationary iNSE (see Sections 2.1, 3.6 and 4.6).
• Heat simulation by solving the heat convection equations (see Sections 2.2, 3.6, 3.7 and 4.6).

After performing all steps, the results are analyzed regarding the criteria set above. Eligible methods of improvement, constrained by the set goals, are considered and the geometry is modified accordingly before starting a new simulation cycle. The tool chain used to perform the simulations for the optimization of the CPU heat sinks for the mainstream PCs is depicted in Figure 5.3: The CSG files are created using a text editor, the Generic Mesh Generation Interface and Netgen are selected for meshing. The simulation itself is performed by NGSolve and ngs-flow.


Figure 5.3: Tool chain used to simulate the CPU heat sink examples.

The rising complexity of the heat sink designs would make the usage of a text editor for CSG modeling laborious and error-prone. Due to the flexibility of the developed tool chain, this module could easily be replaced by another CSG module like the one built into Comsol Multiphysics or BRL-CAD. In the same manner, the mesh generation process could fall back on other interfaces and mesh generators if necessary.


Reconstruction of an Industry-Inspired Example

In contrast to the first example, hot water cooling is not intuitive. To get a better understanding of this industrially motivated use case, a heat sink using microchannels and hot water is reconstructed and simulated. As before, only the heat sink (without the rest of the cooling circuit) is examined, but this time the reconstruction of the geometry is more complex.

Figure 5.4: Screenshots from an IBM Research video [96]. From upper left to lower right: Highest to lowest layer of a hot water cooling heat sink using microchannels. In the first picture the connections can be seen; the last picture shows the direct contact of the water with the processor.

The first step is to reverse engineer the geometry of the IBM heat sink, using a video from the IBM Research YouTube channel [96] as template. Figure 5.4 shows screenshots from the video used to reconstruct the cooler. After the reconstruction of the geometry the workflow is nearly identical to the above use case:
• Construction of the heat sink using the CSG module (see Sections 4.2 and 4.6.2).
• Meshing of the constructed geometry (see Sections 3.5 and 4.5). Again, regions expected to have high gradients of temperature or velocity are resolved finer, thus improving accuracy and stability.
• Simulation of the static fluid motion and the heat transfer (see Chapter 2 as well as Sections 3.6, 3.7 and 4.6).

To verify the assumption that the two simulations can be decoupled, as stated in Section 2.3, the simulation is carried out twice: First, a coupled flow and heat simulation solving Equation 2.27 is performed; second, the fluid motion simulation is done neglecting heat-induced flows, with the heat transfer simulation performed afterwards. The tool chain used is shown in Figure 5.5. All steps are performed using the appropriate modules of Comsol Multiphysics. Since the geometry is far more complex than in the first case, a CSG modeling tool has to be used. The CFD module of the simulation suite is capable of coupling the flow and heat simulations.
