DISCRETE ELEMENT METHOD FOR 3D SIMULATIONS OF MECHANICAL SYSTEMS OF NON-SPHERICAL GRANULAR MATERIALS

JIAN CHEN

THE UNIVERSITY OF ELECTRO-COMMUNICATIONS GRADUATE SCHOOL OF ELECTRO-COMMUNICATIONS A DISSERTATION SUBMITTED FOR DOCTOR OF PHILOSOPHY IN ENGINEERING MARCH 2012

DISCRETE ELEMENT METHOD FOR 3D SIMULATIONS OF MECHANICAL SYSTEMS OF NON-SPHERICAL GRANULAR MATERIALS

APPROVED BY SUPERVISORY COMMITTEE:
CHAIRPERSON: ASSOC. PROF. Hans-Georg Matuttis
MEMBER: PROF. Hiroshi Maekawa
MEMBER: PROF. Takeshi Miyazaki
MEMBER: PROF. Takashi Kida
MEMBER: PROF. Tomio Okawa
MEMBER: ASSOC. PROF. Takaaki Nara

Copyright by JIAN CHEN 2012

Discrete Element Method for 3D Simulations of Mechanical Systems of Non-Spherical Granular Materials

Jian Chen

ABSTRACT (translated from the Japanese)

The discrete element method (DEM) is a micromechanical simulation method widely used in the field of granular mechanics. Conventional studies have used disks or spheres as particle shapes, but real granular particles are non-spherical, and such simulations have suffered from many unphysical artifacts. In this study, polyhedra are used as particle shapes in order to develop a reliable three-dimensional method for analyzing the many-body dynamics of granular materials. The particle interaction must then take into account the details of the contact configuration of the polyhedra; such a polyhedral discrete element method is implemented here and yields physically meaningful simulation results.

Chapter 1, the introduction, positions this research. The granular phenomena treated in this work are limited to dense states such as sand heaps. Various simulation methods used in granular mechanics are introduced, and the modeling of particle interactions in the DEM and the outline of the program are explained. The stability and accuracy of time integration in the DEM are also discussed.

Chapter 2 explains the kinematics of many-particle systems and the choice of numerical method (time integration). The translational motion of the particles is treated in ordinary Cartesian coordinates. For the rotational motion, because the equations of motion in Euler angles are numerically unstable, unit quaternions are used as variables. The equations of motion are integrated in time with the Gear predictor-corrector method (backward difference formulation). We found that, to suppress noise arising from numerical error, the quaternion and its time derivative must be forcibly orthogonalized at every integration step.

Chapter 3 describes the particle data structures (vertices, edges, faces, etc.) and the method for detecting contacts between particles. The computation of the volume, center of mass and moment of inertia of convex polyhedra is explained. Furthermore, contact detection algorithms using minimal bounding boxes and sorting are described.

Chapter 4 presents the computation of the overlap region. The intersection body of two convex polyhedra must itself be a convex polyhedron. However, no fast algorithm for computing this intersection body had been reported in the field of computational geometry, and a completely new algorithm is proposed in this work. In outline: first the intersection points and intersection lines are computed; then the overlap polyhedron is constructed from the computed intersection points and the original vertices of the polyhedra.

Chapter 5 explains the particle interactions. The normal force is proportional to the volume of the overlap region and to the Young's modulus; the line of action of the force and the contact area are computed from the intersection lines. The tangential force (Coulomb friction) is introduced via the Cundall-Strack model. A damping force proportional to the time derivative of the overlap volume is introduced as a model for an effective coefficient of restitution.

Chapter 6 presents the parallelization of the functions, the parallelization strategy of the program, and the resulting parallel efficiency. Using OpenMP, shared-memory parallelization on a two-processor, eight-core machine is realized.

Chapter 7 discusses the simulation results and their verification against experiments. First, our three-dimensional sand-heap simulations yield more realistic, higher angles of repose than conventional numerical methods based on the "penetration depth" between spherical or polyhedral particles. For verification of the program, the density distribution inside quasi-two-dimensional sand heaps was compared between experiment and simulation. Laser measurements of the bulk density distribution in the heaps show higher density near the axis of symmetry, a feature consistent with the simulation results.

Chapter 8 gives the conclusions of this thesis. This thesis proposes a stable and effective discrete element method using polyhedral particles. The numerically stable Gear predictor-corrector method is used for time integration, and the force model incorporates particle interactions which take the detailed geometric contact configuration into account. The polyhedral DEM method implemented here should also be applicable to phenomena which are difficult to analyze with conventional penetration-depth-based methods.
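The quaternion constraint mentioned for Chapter 2 can be sketched as follows. This is a minimal illustration in Python with assumed variable names, not the thesis code: after each integration step the quaternion is renormalized to unit length, and its time derivative is projected so that the orthogonality condition q . qdot = 0 (the condition for |q| to remain constant) holds again.

```python
import numpy as np

def enforce_quaternion_constraints(q, qdot):
    """Renormalize a unit quaternion and project its time derivative.

    For a rotation quaternion, |q| = 1 must hold, which implies
    q . qdot = 0.  Numerical integration slowly violates both
    conditions; this restores them after every time step.
    Illustrative sketch only, not the thesis implementation.
    """
    q = np.asarray(q, dtype=float)
    qdot = np.asarray(qdot, dtype=float)
    q = q / np.linalg.norm(q)          # restore |q| = 1
    qdot = qdot - np.dot(q, qdot) * q  # remove the component parallel to q
    return q, qdot

# usage: after the corrector step of the integrator
q, qdot = enforce_quaternion_constraints([1.0, 0.01, 0.0, 0.0],
                                         [0.02, 0.5, 0.0, 0.0])
```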

Discrete Element Method for 3D Simulations of Mechanical Systems of Non-Spherical Granular Materials

Jian Chen

ABSTRACT

Granular materials are ubiquitous in nature and technology. Nevertheless, no macroscopic equation for granular materials is known. The discrete element method (DEM) has been widely used to simulate the complex behavior of granular materials without constitutive laws. It has become increasingly clear that the dynamics of non-spherical granular materials is governed essentially by the deviations of the particle shape from ideal spheres. Up to now, while most simulations have used round particles, a few DEM codes have modeled particles with polyhedral shape but computed the contact force from the "penetration depth", which is essentially the same approach as for round particles. None of these methods can investigate the "history effect" of granular materials (construction-history-dependent phenomena) in three dimensions, nor can they reproduce realistically high angles of repose on a flat surface for practical friction parameters. The main objective of this study is to develop a DEM code for granular materials, using polyhedral particle shapes, with a contact force model which takes into account the whole geometry of the "overlap polyhedron" between non-deformed polyhedral particles. The contact force point is defined as the center of mass of the overlap polyhedron, and the normal force direction as the average of the area-weighted normals of the contact triangles formed by the centroid of the overlap polyhedron and the generated vertices (the intersection points of the two polyhedra). The volume of the overlap polyhedron is used as a measure for the elastic force, and its rate of change for the damping force in the normal direction. A characteristic length is introduced in the contact force model, with which the continuum-mechanical sound velocity can be reproduced in a DEM simulation of a space-filling packing of cubic blocks.

The two-dimensional Cundall-Strack model is generalized to three dimensions as an approximation for friction. A systematic approach for the overlap computation is introduced and implemented to obtain the overlap polyhedron (its center of mass, its volume and the normal of the contact area). Several methods and algorithms are presented to obtain the overlap geometry efficiently: the vertex and face computation, triangle intersection algorithms based on the point-direction form and the point-normal form representations of a plane, and the neighboring-feature algorithm for the vertex computation. To further improve the efficiency, a contact detection strategy which precedes the overlap computation is also applied: the determination of possibly contacting particle pairs via a neighborhood algorithm which sorts the axis-aligned bounding boxes ("sort and sweep") in three dimensions, and the refinement of the contact list via bounding spheres and extremal projections of the vertices along the central line of the particle pairs.

Simulation results for heaps constructed on a flat surface show more realistic, higher angles of repose than any penetration-depth-based simulations with either round or polyhedral particles. As verification, consistent results for the angle of repose and for the density patterns have been obtained from the DEM code and from experiments for quasi-two-dimensional heaps constructed by wedge sequences. The simulations showed clear pressure dips for heaps constructed by wedge sequences and pressure maxima for heaps constructed by layered sequences. This shows that the (construction) history effect on the ground pressure distribution can be resolved with our DEM method. With the polyhedral DEM code, a larger phenomenology of granular materials is accessible than with round particles or with penetration-depth-based force models.
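The contact-normal construction described above (triangles formed by the centroid of the overlap polyhedron and the generated vertices, with the normals weighted by triangle area) can be sketched as follows. The function name and inputs are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def area_weighted_normal(centroid, contact_loop):
    """Average of the area-weighted normals of the contact triangles.

    `contact_loop` is an ordered list of the generated vertices (the
    intersection points of the two polyhedra) forming the closed
    contact line; each consecutive pair, together with the centroid
    of the overlap polyhedron, defines one contact triangle.
    Illustrative sketch only, not the thesis implementation.
    """
    c = np.asarray(centroid, dtype=float)
    pts = np.asarray(contact_loop, dtype=float)
    total = np.zeros(3)
    for i in range(len(pts)):
        a, b = pts[i], pts[(i + 1) % len(pts)]
        # 0.5 * cross product = (area) * (unit normal) of triangle (c, a, b),
        # so summing these accumulates the area-weighted normals
        total += 0.5 * np.cross(a - c, b - c)
    area = np.linalg.norm(total)
    return total / area, area  # unit normal and effective contact area

# usage: a square contact loop of side 1 in the plane z = 0
n, A = area_weighted_normal([0.5, 0.5, 0.0],
                            [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]])
# n points along +z, A equals the loop area (1.0)
```

For a planar contact loop the vector sum reduces exactly to the loop's area vector; for a non-planar loop its direction coincides with the direction of the area-weighted average of the triangle normals.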

Contents

1 Introduction
  1.1 Overview of granular materials
    1.1.1 Classification
    1.1.2 Scales and examples
  1.2 Phenomenology and methodology
    1.2.1 Rheology
    1.2.2 Plastic behavior versus elasticity
    1.2.3 Statistical physics
    1.2.4 Size- and shape effects
  1.3 Theoretical and computational methods
    1.3.1 The event-driven method
    1.3.2 Lattice models
    1.3.3 The Monte Carlo method
    1.3.4 Continuum dynamics
  1.4 DEM modeling
    1.4.1 Geometry models
    1.4.2 Normal force models
    1.4.3 Modeling of solid friction
  1.5 Accuracy and stability
  1.6 Disorder and qualitative results
  1.7 Objective and significance
    1.7.1 Objective
    1.7.2 Significance
    1.7.3 Granular heap as test case
  1.8 Organization of the thesis

2 Kinematics and Time Integration
  2.1 Kinematics
    2.1.1 Translation and rotation
    2.1.2 Equation of motion
  2.2 Time integration
    2.2.1 Explicit and implicit methods
    2.2.2 Gear predictor corrector for 2nd-order ODEs
    2.2.3 Stability
    2.2.4 Number of iterations

3 Geometry and Contact Detection
  3.1 Particle geometry
    3.1.1 Basic concepts and data structure
    3.1.2 Particle generation and geometry update
  3.2 Physical properties
    3.2.1 Decomposition of a polyhedron into tetrahedra
    3.2.2 Volume, mass and center of mass
    3.2.3 Moment of inertia
  3.3 Contact detection
    3.3.1 Conventional neighborhood algorithms
    3.3.2 Neighborhood algorithm via sorting
    3.3.3 Refinement of the contact list

4 Overlap Computation
  4.1 Triangle intersection computation
    4.1.1 Intersection algorithm via point-direction form
    4.1.2 Intersection algorithm via point-normal form
  4.2 Vertex computation
    4.2.1 Inherited vertices
    4.2.2 Generated vertices
  4.3 Face and contact area determination
    4.3.1 Face determination
    4.3.2 Contact area and normal determination
  4.4 Optimization for vertex computation
    4.4.1 Determination of neighboring features
    4.4.2 Neighboring features for vertex computation
    4.4.3 Statistics from a test case

5 Force Modeling
  5.1 The normal- and tangential direction and the force point
  5.2 Modeling of the normal force
    5.2.1 Magnitude of the elastic force
    5.2.2 Characteristic length
    5.2.3 Sound propagation in discrete chain
    5.2.4 Continuity of the time-evolution of the elastic force
    5.2.5 Dissipative force in normal direction
    5.2.6 Estimation of the time step
  5.3 Modeling of the tangential force
    5.3.1 Cundall-Strack friction in two dimensions
    5.3.2 Cundall-Strack friction in three dimensions
    5.3.3 Dissipative force in tangential direction
    5.3.4 Caveats
  5.4 A simple test: a cube on an inclined plane

6 Parallelization
  6.1 Glossary
  6.2 Compiler: FORTRAN versus C
  6.3 Programming model and hardware
    6.3.1 Shared memory versus distributed memory
    6.3.2 MIMD versus SIMD
  6.4 OpenMP for parallelization
  6.5 Parallelization of the DEM code
    6.5.1 Profiling of the scalar code
    6.5.2 Parallelization for overlap computation
    6.5.3 Profiling of the parallelized code
    6.5.4 Influences of hardware and software
    6.5.5 Attempts for further optimization
  6.6 Concluding remarks

7 Verification
  7.1 Heap formation in 3D
    7.1.1 Heap construction
    7.1.2 Angle of repose
    7.1.3 Oscillations in DEM heap
    7.1.4 Summary and concluding remarks
  7.2 Study of quasi-two dimensional heaps
    7.2.1 Background
    7.2.2 Experimental investigation
    7.2.3 Simulation
    7.2.4 Results and discussion
    7.2.5 Simulation of layered sequence heap
    7.2.6 Summary and concluding remarks

8 Summary and Conclusions
  8.1 Summary
  8.2 Conclusions

References

List of Figures

1.1 Closest packing and reduction of the density (Reynolds dilatancy)
1.2 Closest packing and shear band formation
1.3 Physical situation and the simulation with finite element method and discrete element method
1.4 Clusters of round particles in DEM simulations
1.5 Higher effective Young's modulus of composed particles made from round particles
1.6 Penetration depths of a pair of particles at three different time steps
1.7 Impact on a Newton cradle as a model for the shock (sound) propagation in DEM simulations
1.8 A space-filling packing of bricks
1.9 Alternative definitions of the contact line for overlapping polygons
1.10 Numerical approaches to Coulomb friction (one-dimensional case)
1.11 Unphysical motion of a block on an inclined slope in a numerical simulation ignoring the difference between static friction and dynamic friction
1.12 Artificial data of configuration averages for flow problems with clogging
1.13 Artificial data of taking an average over a pressure distribution with a marked dip
1.14 Pressure distribution of a single measurement
1.15 Improper average of pressure measurements under granular heaps
1.16 General DEM flow diagram and the diagram in our DEM implementation for a time integration with a predictor-corrector method
2.1 Sketch of the translation and rotation of a rigid body in a space-fixed coordinate system and a body-fixed coordinate system
2.2 Euler explicit method
2.3 Euler implicit method
2.4 Simulation of a bouncing ball with the explicit Runge-Kutta method and the Gear-Predictor-Corrector formula
3.1 A sketch of the geometry of a tetrahedron and its topological information of vertices and faces
3.2 An example of the VERT_COORD, FACE_EQUATION, FACE_VERTEX_TABLE and VERTEX_FACE_TABLE arrays used in the DEM simulation to represent the tetrahedron in Fig. 3.1
3.3 Various kinds of polyhedra for DEM simulations
3.4 Comparison of the polyhedra with 62 vertices and 120 faces generated from Schinner's and our method
3.5 Sketch of a pyramid for decomposing a complex polyhedron
3.6 Decomposition of a sub-tetrahedron of a polyhedron
3.7 Contact detection via bounding boxes in 2D
3.8 Verlet table
3.9 Neighborhood table
3.10 Sketch of a cell and its neighboring cells needed to be checked in two dimensions and in three dimensions
3.11 A case where only one particle is larger than the rest in the neighborhood table
3.12 Positions of the particles and of the bounding boxes in a one-dimensional case for the neighborhood algorithm via sorting
3.13 Relative movement of bounding boxes in one dimension
3.14 Embedding of the linear array a for the bounding box positions between a sentinel of extremal values
3.15 Relative movement of bounding boxes in two dimensions
3.16 Relative movement of bounding boxes in three dimensions
3.17 Refinement of the contact detection results via bounding circles in 2D
3.18 Refinement of the contact detection results via vertex projection in 2D
4.1 A point-direction form representation for a triangle
4.2 Two intersecting triangles represented by point-direction forms
4.3 Possible relative positions of a triangle and a plane
4.4 Intersections of two triangles with the two corresponding planes
4.5 The sorted intersection segments of two triangles
4.6 Overlap of two irregular-shape polyhedra
4.7 One intersection point enters twice into the intersection point list
4.8 The degenerate cases in the overlap polyhedron computation
4.9 The inherited and generated vertices of the overlap polyhedron of two intersecting polyhedra
4.10 Example of an inherited face and a generated face
4.11 Ordering the vertices on a generated face with respect to the centroid
4.12 Ordering the vertices for a generated face with respect to an edge
4.13 Sketches of the contact lines of two intersecting polyhedra
4.14 The overlap polyhedron and the contact area and normal
4.15 Determination of neighboring features by the overlap bounding box method and the projection method
4.16 Statistics of the efficiency of the contact detection algorithm
4.17 Fraction of the number of particles which overlap in the contact list
4.18 Statistics of the efficiency of the neighboring feature algorithm
5.1 The normal direction and tangential direction for two-dimensional particles
5.2 Two interacting polyhedral particles and their overlap region
5.3 Deformations and overlap in elastic models
5.4 Sound propagation in continuum and in discrete systems
5.5 Sound wave in the continuum, in a packing of cubes and in a packing of parallelepipeds
5.6 Vectors from the center of mass to the contact point for arbitrarily shaped and regular particles
5.7 Chains consisting of large-size and small-size particles
5.8 Propagation of the maximum velocities in the discrete chain
5.9 Sound velocities of the small-particle and the large-particle chain
5.10 Two arbitrarily shaped polyhedral particles approaching each other
5.11 The time evolution of the volume of the overlap polyhedron
5.12 The time evolution of the center of mass of the overlap polyhedron
5.13 The time evolution of the contact normal and the elastic force
5.14 Sketch of the variation of the total normal force
5.15 Deformation of the "tangential spring" in the Cundall-Strack friction model
5.16 Intended behavior of the Cundall-Strack friction model and its actual oscillatory behavior
5.17 Cundall-Strack tangential increment in two dimensions
5.18 A simple test for the force models: a cube on an inclined plane
5.19 The volume of the overlap region
5.20 Normal force and its convergence
5.21 Tangential force and its convergence
5.22 Position equilibrium of the block on the wedge
5.23 Force equilibrium of the block on the wedge
6.1 Sketch of a shared memory and a distributed memory computer
6.2 Independent loop, forward dependency and backward dependency
6.3 Syntax of the loop construct of OpenMP in FORTRAN
6.4 Pseudocode of the force computation iterations
6.5 Pseudocode of the (inherited) vertex computation iterations
6.6 Pseudocode of the (triangular) face intersection computation iterations
6.7 Coarse-granular (above) and fine-granular (below) parallelization approach for multi-core machines (in OpenMP style with four cores available)
6.8 Pseudocode of the parallelized force computation iterations
7.1 Sand heap on a mirror
7.2 Granular heap of polygons on top of a zigzag surface
7.3 A heap constructed by polyhedral particles on a flat surface
7.4 The centers of mass of the particles of the heap
7.5 The time evolution of the total energy of the granular heap
7.6 The velocity in z direction of a single particle dropped on a floor
7.7 The angular velocity of a single particle dropped on a floor
7.8 Sketches of the dispersion of a laser beam going through a granular system
7.9 A laser-sensor pair to measure the strength of the laser beam after dispersion
7.10 Calibration setup and samples of glass beads used for the experiment
7.11 Calibration result: each data point is the average of 80 measurements
7.12 Construction and measurement setup
7.13 Construction of a quasi-two-dimensional heap and its density measurement
7.14 Particles used in the simulation of the quasi-two-dimensional heap
7.15 Snapshot of the centers of mass of particles in the quasi-two-dimensional heap
7.16 Three density homogenization methods for heaps
7.17 Moving cell homogenization scheme for density computation
7.18 "Proper" and "improper" angles of repose from the experiments
7.19 Density results from the experiment with a lifting velocity of 1 mm/s
7.20 Density results from the experiment with a lifting velocity of 3 mm/s
7.21 Density results from the experiment with a lifting velocity of 5 mm/s
7.22 "Practical" and "impractical" angles of repose from the simulations
7.23 Density results from the DEM simulation
7.24 Pressure "dip" of the heap in the DEM simulation
7.25 Construction of a quasi-two-dimensional heap by layered sequence
7.26 Density results from the DEM simulation of a layered sequence heap
7.27 No pressure "dip" for the layered sequence heap in the DEM simulation

List of Tables

2.1 Gear corrector coefficients for second-order differential equations
4.1 Performance comparison of PDTIA (without matrix rank check) and PNTIA
4.2 Performance comparison of PDTIA (with matrix rank check) and PNTIA
6.1 Profiling result for the scalar code from time-based sampling
6.2 Profiling result for the parallelized code based on function count
6.3 Performance for parallel executions on two different multi-core machines

Chapter 1

Introduction

In this introduction, we first outline the systems for which our DEM simulation is intended. We then outline those properties of granular materials which make more conventional simulation approaches problematic, and which are the reason we choose the DEM as our approach for simulating granular materials. Next we give an overview of other computational methods, to show how they are inadequate for many of the aspects outlined in the preceding sections. We continue with an overview of the alternatives in force modeling for DEM methods, to motivate our choice of polyhedra for the particle shape. Before giving an overview of the thesis, we outline the considerations with respect to accuracy and stability of DEM simulations.

1.1 Overview of granular materials

1.1.1 Classification

By granular materials we will understand in the following materials made of solid particles which are definitely larger than atoms, and whose deformations under shear are insignificant compared to the displacements of the centers of mass of the particles. The interaction between the surfaces in the normal direction of the contact is mostly due to elastic deformation, while in the tangential direction Coulomb friction is significant. The latter leads to static friction, which cannot be neglected, as the usual coefficients of friction imply tangential forces of the order of half the normal forces. Very small polymer particles, the "particles" in gels and colloids, and bubbles in foams interact by viscous tangential forces and will not be included in our considerations. In other words, it is the quality of the "surface interactions" between the particles which is characteristic of granular materials. A special case is fracturing, where the particle shape is destroyed, but which nevertheless again leads to granular particles on a smaller scale. The opposite is sintering, where smaller particles aggregate into larger ones. Neither case will be treated in this thesis, which is limited to a purely mechanical treatment of granular materials, due to the amount of thermodynamics and surface chemistry involved in a meaningful treatment of these processes.

The classical phases or states of matter relevant for mechanical considerations are solid (definite shape, definite volume), liquid (definite volume, indefinite shape) and gas (indefinite volume, indefinite shape). The "granular" state is in fact sometimes considered a "fourth" state, allowing for all the other three: shape and volume may be definite (granular solid in soil, or storage between walls); volume may be definite but shape indefinite (usually land- and mudslides, avalanches, liquefaction); or both volume and shape may be indefinite (granular gases, like dust avalanches, sand storms and pneumatic transport). The transitions between these phases can be induced by mechanical excitations, like vibration and shaking (shaking implies a higher amplitude than vibration), by fluid flow (aeolian transport by air, or hydraulic transport by water), or even by electrostatic forces for very fine powders. The focus of this work will be mainly on granular solids and fluids.

We discriminate between the macroscopic scale (much larger than the particle size), the mesoscopic or micromechanical scale (of the particle size or fractions of it, depending on the extension of the contact areas), and the microscopic scale (the fine structure of the contacts, with ridges, wear, and the atomic and molecular structure). Our discrete element method treats granular materials on the mesoscopic scale.

Beyond the macroscopic classification into granular solid, liquid and gas, another classification is based on the mesoscopic interactions between the particles. If there are no effects from the surrounding fluid, one speaks of "dry" granular materials if the interactions are only repulsive, or otherwise of "cohesive" granular materials (usually for particles below 0.5 mm).
In this work, we will limit ourselves to dry granular materials, though the force laws could also be implemented for cohesive materials. For many processes in granular materials the fluid is actually relevant, as it inhibits the motion of the granular particles. A typical case is a pile driven into muddy ground or clay, which starts to move very slowly when an external force is applied: the granular matrix inhibits the flow of the fluid, as a sponge inhibits flow, while the fluid inhibits the motion of the granular particles by adding cohesiveness. Given enough time, however, the external force will move the material nevertheless. The opposite is true for liquefaction during earthquakes, where the fluid destabilizes the granular matrix. These processes cannot be investigated in this thesis, but are mentioned here to show that there are important micromechanical aspects beyond dry granular materials. The names of granular materials mixed with fluid depend on the fluid and on the mixing ratio: in "suspensions" the particles are only in loose contact in a fluid of relatively low viscosity (air or water), while in pastes (asphalt, toothpaste) the fluid has high viscosity. The space between the particles is called the "pore space", while the particles form the "granular matrix". For porous media, the granular matrix can be considered as fixed, only the fluid is in motion, and the volume of the granular phase is usually larger than that of the fluid phase.

1.1.2 Scales and examples

If we consider granular materials as composed of frictional particles, the smallest grains will be particles for which Coulomb friction is possible. In accordance with results from atomic force microscopy, the smallest scale on which Coulomb friction has been found would be particles of a few dozen atoms in diameter, i.e. nano-powders. They may be formed from solids, i.e. glassy (no atomic order), crystalline (atomic order) or ceramic (mixture of glassy and crystalline material) particles, where inside the solid the neighborhood between the constituting atoms does not change; hence, Coulomb friction is possible. While macromolecules may have much higher molecular weight than grains of nano-powders, their molecules are usually partially reorientable, and the inter-molecular interactions do not show friction of the Coulomb type (velocity-independent dynamic friction and static friction), but rather viscous friction. This allows the discrimination between granular materials and polymers based on the type of the interaction and the structure of the constituents. In this work, we focus on a type of particle simulation able to investigate shape effects in considerable detail. Nevertheless, we concede that shape effects may be irrelevant for some systems of granular materials below the mm-scale, where the cohesive forces will dominate the interaction so that geometric effects become negligible. Even so, the size dispersion is relevant for the yield stress even for cement, so we think that our approach may also have serious implications for cohesive materials [1]. In principle, rock mechanics would constitute the largest particle scale for granular systems.
Nevertheless, with our definition of granular materials as particles which move mostly under the influence of the surface interactions of the constituents, much larger systems can be treated as granular: Shelf ice which floats on water has been treated as a two-dimensional granular material [2], and block-spring models, simple models with granular interaction, are used to model the Gutenberg-Richter distribution of earthquakes, so in some communities, continents are at least implicitly understood as "two-dimensional" granular structures, too. Granular systems are also found way beyond the earth. Wandering dunes have even been found on Mars. The asteroid belt between Mars and Jupiter has been discussed in the context of granular materials by astrophysicists1 [3]. Even if the grain size (maximally several hundred km) is smaller than that of the continents on earth, at least the extension of the whole system is much larger. Another cosmic aspect of granular matter is the discussion of dark matter and the formation of planets: "interstellar dust" is often mentioned, even if the term "granular material" is usually avoided [4].

1 e.g. An experiment carried out by the European Space Agency: http://eea.spaceflight.esa.int/?pg=exprec&id=9139&t=2542184240


1.2 Phenomenology and methodology

At least on the terrestrial scale, between nano-powders and rocks, research dedicated to granular materials is found in the fields of (particle size from small to large) powder technology, chemical engineering, process engineering, mechanics, physics, hydrology, civil engineering, geotechnics, geology, rock mechanics and earthquake engineering. As there is no universal governing equation for granular materials, the approach to the modelization varies strongly, depending on the field. In this section, we describe the phenomenology of granular materials and various aspects which have led to the application of various methodologies. When we discuss the different approaches with respect to the granular phenomenology they are supposed to capture, or fail to capture, some overlap with the sections on simulation methods will be unavoidable. Many approaches which have been fruitful in other fields of research (continuum mechanics, molecular dynamics, statistical physics) have been destined to fail with granular materials, at least beyond some narrow parameter windows. Instead of articles, we will rather cite research groups which are representative for the respective approaches.

1.2.1 Rheology

As long as the rheology of the granular material is of interest, fluid-like models are often employed in powder technology and in chemical and process engineering (Nikos Christiakis, University of Crete, http://www.tem.uoc.gr/~nchristakis). Rheological approaches work well as long as there is continuous flow due to controlled external excitation, e.g. by fluid flow, vibration or milling. Unfortunately, when the smooth flow breaks down due to clogging, the fluid models predict neither the conditions for the breakdown nor the conditions after the breakdown. Therefore, some researchers in these fields have switched to particle models (Tomas-Group, Univ. of Magdeburg, http://www.uni-magdeburg.de/ivt/mvt/). Similarly, there have been attempts to describe granular materials as very viscous fluids (Jaeger-Group, Univ. of Chicago, http://jfi.uchicago.edu/~jaeger/group/granular.html). A marked difference to truly viscous fluids is nevertheless that the flow resistance of viscous fluids vanishes with vanishing flow rate, while for granular materials, due to Coulomb friction, the flow resistance remains finite even for vanishing velocity.

1.2.2 Plastic behavior versus elasticity

Classical continuum approaches via elasticity theory are common in geotechnics. Unfortunately, dry granular materials are not "elastic" in the sense that granular assemblies would have any resistance against pulling forces. On the contrary, the change of neighborhoods under static friction gives nearly optimal plastic behavior, so that some jugglers practice with particle-filled objects, which don't roll or jump about when they fall to the floor. As a workaround for the lack of elasticity, the linear relation for shear, obtained by externally


Fig. 1.1 Closest packing (left) and reduction of the density (Reynolds dilatancy) after the application of external stresses (right).

Fig. 1.2 Closest packing (left) and shear band formation after application of external stresses (right).


confining the system, is used for the "elastic" parameters. An additional problem is that granular materials may increase their volume under shear, the so-called Reynolds dilatancy, where the pore space is increased if the aggregate is sheared while in a closest packing; this means effectively a "negative Young's modulus", as the volume increases under the influence of an external force, see Fig. 1.1. In milder form, Reynolds dilatancy also occurs in shear bands in granular materials, see Fig. 1.2. This, as well as the possible plasticity and friction, has to be incorporated into the models. Consequently, a wide variety of continuum theories is in use which are supposed to cure some or all of these problems, including nonlinearities (Dimitrios Kolymbas at Univ. of Innsbruck, http://www.uibk.ac.at/geotechnik/ and Theodoros Triantafyllidis at the University of Karlsruhe, http://www.ibf.uni-karlsruhe.de/). Continuum methods are essentially "mean field" approximations, so no fluctuations in the analytical stress-strain curves are assumed, in stark contrast to the experimental reality and to the results one obtains with particle methods.

1.2.3 Statistical physics

Statistical physics tries to describe macroscopic assemblies by distributions of properties of the constituents. In granular material research, these are often energy-dissipation properties (H. Hayakawa, Kyoto Univ., http://www2.yukawa.kyoto-u.ac.jp/~hisao/hisao-e.html) or the interaction strength via the dispersion relation (O'Hern, Yale University, http://jamming.research.yale.edu/people.html). In this respect, there is an important difference between thermodynamics and statistical physics: While thermodynamic results should be independent of the microscopic constituents (the classical heat capacity for gases depends only on the degrees of freedom of a molecule, not on the kind of molecule), statistical physics has to take the character of the microscopic constituents into account (ferromagnetism is limited to very few kinds of atoms, and cannot be deduced for arbitrary materials).

1.2.4 Size- and shape effects

Most researchers who claim to work on the statistical physics of granular materials (obtaining a macroscopic description depending on the microscopic structure) are actually trying to obtain thermodynamic descriptions (ignoring the micro-structure) by ignoring size and shape effects. Very often, the attempts are based on spherically symmetric models, though aggregates of round particles have definitely different properties from those of non-round particles (angle of repose of only about 20° instead of over 30°, lower strength under triaxial compression etc.). This brings us to the issue of the effects of the particle shape on the aggregates: while the size has already been mentioned, the role of the shape is equally decisive. The strength of a granular assembly is basically a result of the competition between rolling and sliding of the particles in a granular matrix: For round particles, even large coefficients of friction will not be able to stabilize a granular structure against disintegration by relative rolling at inter-particle contacts. Ironically, a researcher in the field of geotechnics (Matsushima-group, Univ. of Tsukuba) is closest to the attitude of our group in insisting on the importance of the shape. Our conceptual approach is to take into account the micro-mechanics of the granular assembly, the shape and the detailed particle contacts. Further, the full description is by a mechanical theory, where friction turns out to be a constraint on the motion.

1.3 Theoretical and computational methods

1.3.1 The event-driven method

In general, the event-driven method (ED) or discrete event simulation refers to any simulation method where some process A is simulated, and, when an "event" is detected, another process B is effectuated, and then usually process A is continued [5]. For granular materials, the "event-driven" method is a DEM method based on the collision dynamics of two rigid particles, which is obtained from the conservation of momentum. In the "time integration", the particles fly along (under gravity, parabolic) trajectories until a collision occurs. At the next instant at which two particles collide (the "event", hence event-driven), the simulation for all particles is stopped and the velocities of the colliding particles are dealt with; e.g. for frontal collisions of particles with the same mass, the velocity with respect to the collision axis is reversed. For systems of low density, i.e. granular gases, the method is very fast for round particles; but as each collision dissipates energy, systems with physical coefficients of restitution eventually become too dense to be dealt with via two-particle collisions only. The workaround by many researchers is to "switch off" the coefficient of restitution once a certain density is reached. The larger the system is, the shorter the interval between collisions becomes, so the algorithms become less efficient for larger systems. A (physical) remedy for this problem is not to deal with the whole system as one unit, but to partition the system into subsystems, each with a "local clock" [6], so that the time is advanced according to the collisions of neighboring particles, not according to the next collision in the global system. For the event-driven method, where the particles are practically "never" in contact, except at delta-like events, the sound velocity depends on the particle density [7].

The event-driven method is the simplest example of a rigid-body DEM method; nevertheless, its inability to treat resting or permanent contacts between particles or static configurations makes it useless for most systems with "physical" densities.
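The two-particle collision rule at the heart of the event-driven method can be stated in a few lines. The following is a minimal illustrative sketch, not code from this thesis: a frontal 1D collision resolved from momentum conservation, with a coefficient of restitution e (the function name and signature are hypothetical).

```python
def collide_1d(m1, v1, m2, v2, e=1.0):
    """Post-collision velocities for a frontal 1D collision.

    Momentum is conserved exactly; the coefficient of restitution e
    scales the reversed relative velocity after impact (e = 1: elastic).
    """
    v_cm = (m1 * v1 + m2 * v2) / (m1 + m2)  # center-of-mass velocity
    # relative velocity w.r.t. the center of mass is reversed and damped by e
    v1_new = v_cm - e * (v1 - v_cm)
    v2_new = v_cm - e * (v2 - v_cm)
    return v1_new, v2_new

# equal masses, elastic: the velocities along the collision axis are exchanged
print(collide_1d(1.0, +2.0, 1.0, -2.0))
```

For equal masses and e = 1 this reproduces the velocity reversal described above; for e < 1, kinetic energy is dissipated while momentum stays conserved, which is exactly the mechanism that drives event-driven granular gases into the dense regime.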

1.3.2 Lattice models

Lattice models for granular materials, as will become clear in the following sub-sections, belong to what could at best be called "semi-quantitative" methods. Nevertheless, as they have found considerable attention in the field, they are outlined here.

Cellular automata

The original meaning of cellular automaton in computer science is an algorithm which reacts to each of a finite number of input patterns with a deterministic output. In the field of computer simulation in physics, in contrast, it usually means a simulation method which makes use of bits "0" and "1", so that a bit pattern at discrete time t results deterministically in (usually another) bit pattern at time t+1. The wide use of cellular automata in physics originates in the field of pattern formation in the 1980s (an offspring of the research on fractals, but with less mathematical rigor), which came into fashion after fractals had split off as an independent research interest from the field of chaos and nonlinear dynamics in the 1970s; many researchers later changed from pattern formation to granular materials. The scientific fame of Stephen Wolfram, the man behind MATHEMATICA, is basically due to his work on cellular automata. In cellular automata, the bits "0" and "1" are treated as physical entities; for granular materials in particular, "0" stands for "no particle" and "1" for "particle". While the method has no proper physical motivation, it allows very fast simulations with many "particles" (bits). As many researchers in the granular community received their basic training in the statistical physics of phase transitions, where large system sizes must be computed to reduce finite-size effects, the appeal of the method comes from the possibility to simulate "large" systems. Moreover, in statistical physics, critical exponents are determined by the symmetry and the range of the interactions, so that correct exponents could be obtained even when the details of the physics were ignored.
Nevertheless, as granular systems are mechanical, not statistical systems, and the correlation length in granular systems is usually limited by static friction, the potential of cellular automata to allow the simulation of large systems is hardly of any use, compared to the fact that an essential part of the physics (Newton's equations of motion) is lost. Cellular automata have been used for flow problems [8] and for modeling avalanches in sand piles [9]. The latter article had tailored the avalanching in heap formation according to a power law, similar to the Gutenberg-Richter law for earthquakes, where the frequency is inversely proportional to the magnitude. This triggered a whole new fashion in physics, the quest for "self-organized criticality", as the inverse power-law phenomena were called. Nevertheless, for avalanches in sand heaps, experiments showed significant deviations from the cellular-automata predictions [10, 11], so cellular automata are rather more pliable than reliable.
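To make the lattice idea concrete, here is a minimal sketch (illustrative only, not taken from the cited articles) of a sandpile-style cellular automaton of the kind popularized by [9]: a cell holding at least zc "grains" topples, distributing one grain to each of its four neighbors; the avalanche size is the number of topplings. All names and the 5x5 example are assumptions for illustration.

```python
def relax(grid, zc=4):
    """Relax a sandpile lattice: any cell with >= zc grains topples,
    sending one grain to each of its four neighbors (grains crossing
    the boundary are lost). Returns the stable grid and the number of
    topplings (the 'avalanche size')."""
    n, m = len(grid), len(grid[0])
    grid = [row[:] for row in grid]          # work on a copy
    topplings = 0
    active = True
    while active:
        active = False
        for i in range(n):
            for j in range(m):
                if grid[i][j] >= zc:
                    grid[i][j] -= zc
                    topplings += 1
                    active = True
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < n and 0 <= nj < m:
                            grid[ni][nj] += 1
    return grid, topplings

# drop 8 grains on the center of an empty 5x5 lattice
g = [[0] * 5 for _ in range(5)]
g[2][2] = 8
stable, size = relax(g)
```

The rule set fits in a dozen lines and runs extremely fast, which illustrates the appeal described above; equally visible is what is missing: no masses, no forces, no equations of motion.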

Other lattice models

The basic structure of cellular automata can also be adapted to allow more than only binary outputs [12]. As in the case of cellular automata, the attraction for researchers lies in the fact that one can determine the outcome of the simulation by fixing unphysical parameters, without being inconvenienced by the basic laws of mechanics.

Tetris

As many researchers are not able to deal with non-spherical particles via proper physical equations of motion, a lattice algorithm based on the TV game "Tetris" was once devised to investigate compaction problems (packing densities under external excitations). The fact that it was accepted by the prestigious journal Physical Review Letters demonstrates the need for simulations of non-spherical particles, rather than the computational prowess of its authors in dealing adequately with the problem [13]. While shape is certainly an important parameter in compaction, the friction coefficient, and to a lesser degree the filling process, are important as well. Both effects can only be incorporated by accounting for the full equations of motion in the simulation, something which will always be out of reach for lattice models, which even have to make do with discretized space.

1.3.3 The Monte Carlo method

The Monte Carlo (MC) method is a statistical method based on the use of (usually uniformly distributed) random (or pseudo-random) numbers. Before pseudo-random numbers from computers became available, tabulated random numbers were used [14], which were generated by roulette-like machines. Before that, only tabulated outcomes of roulette games in casinos were available, e.g. [15], so the name was derived from the most famous of casinos, where the outcomes were regularly posted outside and therefore used for the first tables2.

Conventionally one discriminates between "direct" and "importance sampling" Monte Carlo methods. Direct methods are usually used for kinetic problems of trajectories, going back in physics to the mid-60s for modeling the transport of neutrons [16]. In contrast, importance-sampling methods model the underlying stochastic processes, e.g. accept or reject events according to the underlying probability, e.g. the Boltzmann distribution

P = exp( −ΔE / (k_b T) ),

for a change of the energy ΔE, Boltzmann constant k_b, at temperature T, so that changes of the system which increase the energy are exponentially suppressed. If P is used directly, the method is called a Metropolis algorithm [17] or M(RT)² algorithm, in allusion to the initials of the authors of the first paper. If instead of P

P′ = P / (1 + P)

is used, the algorithm is called the "heat-bath algorithm", which limits the acceptance probability to about 50% even if P ≈ 1. This is useful where the acceptance of every Monte Carlo move may lead to a repetition of the initial configurations for certain sampling strategies. In granular materials, hybrids of direct MC (for the propagation) and importance sampling (for the interaction) are used [18, 19, 20]. A drawback of the method is its lack of physicality, as the deterministic mechanical behavior must be mimicked by statistical laws. Usually, Newton's laws cannot be fulfilled, especially the force balance, and stochastic collision laws are often also difficult to formulate on a Galilei-invariant basis. Like the lattice models, the Monte Carlo method for granular materials belongs to the realm of at best "semi-quantitative" methods. Nevertheless, due to its versatility and success for systems with well-stated thermodynamic properties [21, 22], it has also been tentatively used for granular materials, but without leading to algorithms with actual predictive power.

2 http://www.lehrer-online.de/683293.php
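The two acceptance rules can be stated compactly. The sketch below is illustrative only (the function names and the injectable random-number source are hypothetical); it implements the Metropolis probability P = exp(−ΔE/(k_b T)) and the heat-bath variant P/(1 + P).

```python
import math
import random

def metropolis_accept(dE, T, kb=1.0, rng=random.random):
    """M(RT)^2 rule: always accept moves that lower the energy; accept
    energy increases with probability P = exp(-dE / (kb * T))."""
    P = math.exp(-dE / (kb * T)) if dE > 0.0 else 1.0
    return rng() < P

def heat_bath_accept(dE, T, kb=1.0, rng=random.random):
    """Heat-bath rule: accept with probability P / (1 + P), which stays
    at or below 50% even when P is close to 1."""
    P = math.exp(-dE / (kb * T))
    return rng() < P / (1.0 + P)
```

Passing a deterministic `rng` (e.g. `lambda: 0.3`) makes the rules testable; in a real sampler, a proper pseudo-random generator is used. Note that for dE = 0 the heat-bath rule accepts with probability 1/2, which is exactly the damping of repeated moves described above.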

1.3.4 Continuum dynamics

Some additional remarks on continuum mechanics, beyond the ones in section 1.2, are in order here. Continuum methods are widely and with considerable success used in materials science and fluid dynamics. Nevertheless, for granular materials, they have some significant problems. In this section, we want to discuss some aspects which arise in modeling granular materials with structural-mechanics methods, rather than with fluid-dynamics-like approaches. While the continuum approximation of metals in structural mechanics can rely on the fact that the discrete constituents are atoms with a size below 10⁻⁹ m in a homogeneous mixture, the grain diameter in granular materials has a much larger variation. Another aspect is the mixture: While it is the mission of steel mills to produce metals with a homogeneous distribution of atoms, in e.g. geotechnics applications one cannot necessarily assume homogeneous properties of a soil which must be taken as it is. Accordingly, the experimental input data for the continuum modeling show a much larger variation. The next conceptual problem is that granular materials, in contrast to metal parts, may not necessarily be stressed below the yield stress, but intentionally beyond it: For the outflow from a silo, the shear bands which form at the boundary between moving and non-moving material are definitely relevant. Accordingly, a correct modeling of the transition between "elastic" and "plastic" properties is necessary. The "elastic" properties can be taken into account at least approximately via the linear part of triaxial stress-strain diagrams. Nevertheless, there is no elasticity in the sense of an elastic restoring force, only a linearity between external stress and reacting strain.

While at least this dynamics is similar to conventional Newtonian mechanics, where the deformation is computed from the external stresses, for plasticity the stresses have to be computed based on the external deformation, as in constraint dynamics. Other approaches try to set up different differential equations according to the problem,


rather than having one computing method which adapts itself to the boundary conditions. The hypoplastic continuum in geotechnics has already been mentioned; another example is the Savage-Hutter model for avalanches [23]. To conclude, the drawback of continuum dynamics is that the concept itself leads to a rather paradoxical approach: In the first step, a continuum approximation for a discrete system must be found, for which impressive names like "homogenization" etc. are used, but finding a smooth description for something which is plagued with jumps is certainly not something which can claim physical or mathematical rigor. In the next step, one is stranded with a partial differential equation for which no exact solution exists; instead one has to deal with the well-known artifacts of partial differential equations, e.g. the infinite signal-propagation speed of diffusive terms, where a delta-function-like peak at t = 0 spreads out so that even at the largest distances one can still find a finite amplitude at t > 0. Finally, because of the lack of an analytical solution, one has to discretize the equation again to solve it on the computer. As long as the problem is one of civil engineering, the problem of validity is "dealt with" by taking a safety margin of a factor of 2, but that can hardly be considered exact science.

1.4 DEM modeling

Fig. 1.3 Physical situation (a soft sphere is deformed while contacting a plane, left) and the simulation with finite element method (middle, not penalty method, many degrees of freedom necessary in the discretization) and discrete element method (right, overlapping shapes, only degrees of freedom of the corresponding rigid-body problem necessary).

The discrete element method (DEM, see Fig. 1.3, right) is a method which models inter-particle forces based on elasticity parameters and on the overlap of the undeformed particle shapes [24, 25]. The penalty method for the finite element method (FEM) in structural mechanics uses the overlap between finite-element shapes in a similar way for contact-dynamics simulations [26]. The overlap between particles can be understood as the amount of deformation necessary so that the particles could physically occupy the space in their actual configuration. Compared to a finite element method of deformable particles [27],


which needs a discretization of the elastic particles, the discrete element method needs only the degrees of freedom which are necessary for rigid bodies: three in two dimensions and six in three dimensions. For the accuracy, there are hardly any drawbacks, as the only additional information one could gain from the finite element method, internal stresses and strains, is either not of interest, or unreliable due to the fact that microscopic surface asperities will lead to random alterations of the FEM results anyway. In the following we will give an overview of the geometry and force models used in DEM simulations.

1.4.1 Geometry models

The principle of the DEM method is to compute forces proportional to the geometrical overlap of the particles used. The most obvious choice in the first DEM simulations [24] were disks, so that the overlap problem for two-dimensional particles could in fact be reduced to a one-dimensional intersection problem. Nevertheless, circular or spherical particles mean that the forces involved are basically central forces, while in actual granular materials the particles are not spherical, and the forces are not central forces at all. There are important implications for the force networks of round particles in two dimensions: The force network has basically "weak" and "strong" forces, and the network looks as if made of "meshes" of large forces, so that particle groups interacting with weak forces are embedded in meshes of larger force magnitude. This is an artifact of the round particles; for e.g. ellipses, the mesh structure vanishes, and the bimodal (strong and weak forces) network becomes a wide, continuous unimodal distribution. Therefore, there is a need for modeling non-circular or non-spherical shapes. We will showcase here several aspects of this modelization, the apparent advantages, and the actual drawbacks of different choices of geometry. As a further aspect, the corresponding two- and three-dimensional models are discussed in the same section. The possibility to generalize a model from two to three dimensions is an important aspect of modeling: on the one hand, it shows how universal the approach is; on the other hand, two-dimensional models allow one to gain experience with the simpler lower-dimensional system (stability, timesteps, magnitude of overlap), which is necessary for a successful implementation in three dimensions.

Generally, what is necessary for a DEM computation is the force magnitude, from the overlap, and, for physical torques, a force direction and a force point; the latter is also necessary for a physical computation of static friction. Not for all models described below can definitions be found which yield a smooth variation of force magnitude, force point and force direction when the relative motion of the particles is smooth. This has disastrous effects on the stability of the algorithm. In this section, we discuss the issues related to simulation with soft contacts, where the overlap of the particles is finite; simulations of rigid contacts, where only point-wise (or point-line or line-face or face-face) contacts can exist, are discussed in the section on contact mechanics.
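For the simplest geometry, two disks, the force magnitude, force direction and force point just mentioned can all be read off directly from the overlap. The following is a minimal illustrative sketch with hypothetical names and a simple linear force-overlap law, not the volume-based polyhedral force model developed later in this thesis.

```python
import math

def disk_contact(x1, r1, x2, r2, k=1.0e6):
    """Contact data for two overlapping disks (illustrative linear law).

    Returns (force_on_1, contact_point), or None if there is no contact.
    The force magnitude is k * overlap, directed along the line of
    centers; the contact point lies midway inside the overlap region."""
    dx, dy = x2[0] - x1[0], x2[1] - x1[1]
    d = math.hypot(dx, dy)
    overlap = r1 + r2 - d
    if overlap <= 0.0 or d == 0.0:
        return None                                # no contact
    nx, ny = dx / d, dy / d                        # unit normal from 1 to 2
    f = k * overlap                                # force magnitude
    cx = x1[0] + (r1 - 0.5 * overlap) * nx         # force point
    cy = x1[1] + (r1 - 0.5 * overlap) * ny
    return (-f * nx, -f * ny), (cx, cy)
```

For disks, force direction and force point vary smoothly with the relative motion, which is exactly the property that, as discussed below, is hard to guarantee for some non-round geometry models.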


Ellipses and their generalizations

A natural next step from round particles to elongated particles would be the choice of ellipses. Nevertheless, it should not be forgotten that the overlap computation should be fast, so that double-precision computations should be sufficient. A single ellipse can be written in normal form

x²/r_a² + y²/r_b² = 1,

with half axes r_a and r_b. Nevertheless, for two ellipses in arbitrary relative position and orientation, the mixed terms in x, y and xy cannot be eliminated, so to represent two ellipses we need two equations

a₁x² + b₁y² + c₁xy + d₁x + e₁y + f₁ = 0,   (1.1)
a₂x² + b₂y² + c₂xy + d₂x + e₂y + f₂ = 0.   (1.2)

According to classroom geometry, we can obtain the intersection points of the ellipses, which are needed to compute the interaction, by eliminating y in the above equations, so the intersection points between the ellipses are the solutions of the fourth-order equation

Ax⁴ + Bx³ + Cx² + Dx + E = 0.   (1.3)

Analytically, if all solutions are complex, there are no intersection points in the x−y-plane. Unfortunately, when one implements this approach with double-precision floating-point numbers, it turns out that it does not work [28]. The computed intersection points are off the actual outlines of the ellipses in the percent range of the half axes. The reason can easily be seen when the amount of data in the original representation, Eqs. (1.1, 1.2), is compared with that in the equation for the intersection, Eq. (1.3): For double precision, the coefficients in Eqs. (1.1, 1.2) contain information of 2 × 6 × 8 = 96 bytes, compared to 40 bytes for the coefficients in Eq. (1.3): In the transformations to eliminate y, more than half of the data have been lost. This analysis can easily be confirmed by trying out overlap computations of ellipses whose orientation is e.g. parallel to the axes: Because c₁, d₁, e₁, c₂, d₂, e₂ drop out of the original equations, the information loss is reduced, and the computations from Eq. (1.3) become more accurate (personal communication, H.-G. Matuttis). Of course, this would not happen in analytical computations, but as long as we are limited to floating-point computations for performance reasons, the use of Eq. (1.3) is not feasible. To actually compute the intersection points, Eqs. (1.1, 1.2) must be solved simultaneously via Newton iteration. For very elongated ellipses, when one end of one ellipse protrudes over the end of the other, the choice of initial values becomes rather problematic, and the iteration may fail. In principle, the same reasoning applies to "super-ellipses" [29]

|x/r_a|ⁿ + |y/r_b|ⁿ = 1,   (1.4)


with positive n. While the contact line and the force point can be computed easily from the intersection points, the generalization to three dimensions is not so easy. First of all, when ellipses are generalized to ellipsoids [30], the intersection lines become spatial fourth-order curves, which suffer from the same numerical problems as the intersection points. Further, the definition of a meaningful computation of the overlap volume is also not trivial. The problems are worse for super-quadrics, the three-dimensional generalizations of super-ellipses; algorithms have been proposed [31], but we have seen no results published with such methods. Problems with the computation of force point and overlap volume are usually circumvented by computing only "penetration depths" instead of the detailed contact geometry (volume, contact line and force point). Nevertheless, as will become apparent in the section on Coulomb friction, the inability to specify a contact point, which also applies to "ellipsoidal potentials" [32], makes the implementation of static friction impossible. Additionally, as will become apparent later, the inability to assign a contact distance and a contact area in two dimensions, or a contact volume in three dimensions, makes it impossible to fix the sound velocity of a space-filling packing to the theoretical continuum-mechanical value derived from the density and the Young's modulus. Even if this is only hypothetical for ellipses and ellipsoids, such packings would nevertheless be possible for n → ∞ in Eq. (1.4); the sound velocity of the continuum is an important quantity to make sure that the dynamics obeys the basics of elasticity theory.
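The simultaneous Newton iteration on Eqs. (1.1, 1.2) mentioned above can be sketched as follows. This is an illustrative implementation with hypothetical names, not code from this thesis; it also exhibits the failure mode described in the text when a poor initial guess leads to a singular Jacobian.

```python
import math

def conic(c, x, y):
    """Evaluate a*x^2 + b*y^2 + cc*x*y + d*x + e*y + f at (x, y)."""
    a, b, cc, d, e, f = c
    return a * x * x + b * y * y + cc * x * y + d * x + e * y + f

def conic_grad(c, x, y):
    """Gradient of the conic: (dF/dx, dF/dy)."""
    a, b, cc, d, e, f = c
    return (2 * a * x + cc * y + d, 2 * b * y + cc * x + e)

def intersect_newton(c1, c2, x, y, tol=1e-12, itmax=50):
    """Solve conic(c1) = conic(c2) = 0 by 2x2 Newton iteration from the
    initial guess (x, y); returns the last iterate (may not converge)."""
    for _ in range(itmax):
        f1, f2 = conic(c1, x, y), conic(c2, x, y)
        if abs(f1) < tol and abs(f2) < tol:
            return x, y
        (j11, j12), (j21, j22) = conic_grad(c1, x, y), conic_grad(c2, x, y)
        det = j11 * j22 - j12 * j21
        if det == 0.0:
            break                      # singular Jacobian: iteration fails
        x -= (f1 * j22 - f2 * j12) / det
        y -= (f2 * j11 - f1 * j21) / det
    return x, y

# two unit circles with centers (0,0) and (1,0) intersect at x = 0.5
unit = (1.0, 1.0, 0.0, 0.0, 0.0, -1.0)       # x^2 + y^2 - 1 = 0
shifted = (1.0, 1.0, 0.0, -2.0, 0.0, 0.0)    # (x-1)^2 + y^2 - 1 = 0
print(intersect_newton(unit, shifted, 0.5, 1.0))
```

Working in the full 12-coefficient representation avoids the information loss of the elimination to Eq. (1.3), at the price of the initial-guess sensitivity discussed above for very elongated ellipses.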

Clusters of round particles

An alternative route to non-spherical particles is the use of clusters of rigidly connected spheres to model granular materials; molecules had been modeled as rigid assemblies of spheres or spherical potentials even before the first particle simulations for granular materials were published. When intersections for e.g. disks can be computed, including overlap, force line and force point, clusters of several disks can easily be modeled as rigidly connected, so that the center of mass moves according to the forces on all of these particles, and the orientation changes are computed accordingly. Some researchers [33] who obviously did not grasp the simplicity of this rigid approach connected round particles with springs, which leads to rather wobbly and rather unconvincing particles. For reasonable stiffness, the vibration frequency of the springs would have to be resolved. This approach needs about the same data structures as the rigid connection, but without the advantage of the "heavier" composite particles, and therefore with a reduced step size for the time integration. Depending on how realistic the simulations shall be, the clusters can be made up of non-overlapping or of overlapping particles. Some [34] have even gone so far as to produce optimized fitting algorithms for the modeling of given surfaces. While the method of rigidly connected particles is very attractive because it allows one to generalize a round-particle simulation easily to a simulation of non-round particles, dealing with the artifacts for the friction due to the surface roughness is not so easy. If one puts an elongated particle on a slope, for

DEM MODELING


a given friction coefficient µ, the critical angle θ beyond which a particle will slide rather than stick is given by tan θ = µ. Nevertheless, if the same coefficient of friction is used for a composite particle with a rough surface, the non-convex surfaces interlock and produce additional tangential forces, so that not even for a two-particle problem (a particle sliding on a slope) will the correct dynamics be reproduced. Mixing tangential and normal forces is a severe problem for this kind of simulation: Even for the multiply overlapping particles in the simulations by Matsushima et al. [35, 36], which were made to reproduce the macroscopic stress-strain curves for different kinds of sand (in Japanese geotechnics, Toyoura sand is usually the preferred object of investigation), the authors were not satisfied with the results. The Matsushima group has used very faithful particle modeling in two dimensions, but now uses very rough particle modeling in three dimensions, see Fig. 1.4. Officially, performance reasons are given, but another conceivable aspect is numerical stability, as emerged from our discussions on that subject at Tsukuba University in February 2010: Overlapping of many spheres may lead to contacts which are much harder than a single sphere, so that a reduction of the time step becomes necessary. While twice the number of particles would reduce the time step only by a factor of the square root of two, using overlapping spheres would improve the shape of the particles considerably, see Fig. 1.5. Additionally, we have not encountered in Matsushima’s work any detailed discussion of the numerical implementation of the rotational degrees of freedom and the equations of motion, as we have given here in this thesis, so beyond the modeling, there may also be additional issues.

Fig. 1.4 Matsushima in 2005 [35] and in 2008 [36]: The two-dimensional simulations are much more faithful to the particle shape than the ones for three dimensions.
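When disks or spheres are rigidly connected, the only extra bookkeeping compared to a single particle is the reduction of all member contact forces to one net force and one net torque about the common center of mass. A minimal sketch of this reduction (the function name and array layout are our own illustration, not code from any of the cited groups):

```python
import numpy as np

def aggregate_cluster(forces, points, com):
    """Reduce the contact forces acting on the members of a rigidly
    connected cluster to a net force and a net torque about the
    cluster's center of mass (3D)."""
    forces = np.asarray(forces, dtype=float)   # (n, 3): force on each member
    points = np.asarray(points, dtype=float)   # (n, 3): points of application
    com = np.asarray(com, dtype=float)
    f_net = forces.sum(axis=0)
    # torque about the center of mass: sum over (r_i - com) x f_i
    t_net = np.cross(points - com, forces).sum(axis=0)
    return f_net, t_net

# two equal and opposite forces on the ends of a dumbbell:
# zero net force, pure torque about the center
f, t = aggregate_cluster(forces=[[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]],
                         points=[[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]],
                         com=[0.0, 0.0, 0.0])
```

The center of mass is then advanced with the net force and the orientation with the net torque, exactly as for a single rigid particle.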

Rigidly connected primitives Apart from spheres and circles, spheres can also be combined with cylinders to reduce the number of primitives and obtain smoother surfaces. In the DEM field, cylinders in three dimensions have been combined with spheres into rods [37] and “polyhedra” [38] (or rather skeletons of polyhedra). Nevertheless, as the corners must still be modeled with circles in two dimensions, while surfaces must be modeled with cylinders in three dimensions, the


INTRODUCTION


Fig. 1.5 Composite particles made from round particles will have a higher effective Young’s modulus due to the higher number of contacts (larger contact area), as indicated by the 13 dotted lines in the configuration on the right. Three-dimensional configurations would have an even higher number of contacts. This must be taken into account in the choice of the time step: for 13 contacts and three times the mass of the particle, the time step should be √(3/13) ≈ 0.48, i.e. nearly only half that of the corresponding round-particle simulation.

problem of interlocking and the wrong allocation of the total granular forces to tangential and normal components persists. Algorithmically, the inclusion of elongated primitives makes a decision necessary “up to where” the forces of neighboring primitives should be in effect: The connection must be smooth, else the forces for particles with smoothly varying relative position and orientation will vary non-smoothly, and the simulation will become unstable.
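The time-step estimate quoted in the caption of Fig. 1.5 follows from the rough scaling of the contact frequency ω ∝ √(k_eff/m): with contact_ratio times more contacts and mass_ratio times the mass, the stable time step scales by √(mass_ratio/contact_ratio). A hedged sketch of this back-of-the-envelope rule (not the exact stability limit of any particular integrator):

```python
import math

def composite_step_factor(mass_ratio, contact_ratio):
    """Rough scaling of the stable time step for a composite particle
    relative to a single round particle: the effective contact
    stiffness grows with the number of contacts, the mass with the
    number of members, and dt scales like 1/omega = sqrt(m/k)."""
    return math.sqrt(mass_ratio / contact_ratio)

# the configuration on the right of Fig. 1.5: three times the mass,
# 13 contacts instead of one
factor = composite_step_factor(mass_ratio=3.0, contact_ratio=13.0)  # ~0.48
```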

Polygons and polyhedra Polygons and polyhedra are a natural choice for simulating granular particles, as granular particles are “edgy” in the first place, and the walls used in technical devices are usually straight (or can at least be modeled by straight lines). Unfortunately, the approach has a bad reputation for being time-consuming. For the polygonal code of our group, this is certainly not the case: less than half of the computing time goes into the overlap computation, at least for moderate numbers of particle corners. Of course, for large numbers of corners, the update of the particle outline will dominate, but simulations of particles with 128 corners are hardly meaningful, except for control runs to reproduce the results for round particles. The remainder of the computer time goes into the time integration and the collision detection, which have to be performed for any other particle shape as well. Nevertheless, there is an indication that the performance problems of many groups were due to the faulty computation of simulation quantities (e.g. the moment of inertia in the simulations of H.-J. Tillemans; personal communication, H.-G. Matuttis), or to a careless implementation of the overlap computation. In that case, the simulation becomes unstable and usually the users run the simulation with a much smaller time step than necessary. While some work has been done on DEM simulation of polyhedra [39, 40], no simulation results have been published up to now. The conclusion in the granular community is that these codes did not work properly or became unstable easily. Therefore, the subject of this thesis is the generalization of a polygonal simulation to a working and stable polyhedral simulation, while the UIUC group has distributed the work over two theses [41, 42].

Piecewise curves The use of piecewise curves, like circle segments, shares characteristics with the simulation of ellipsoids, as the intersections of the contact lines are more complicated than for straight lines. There is also a common feature with the composite primitives, as it is necessary to decide during the overlap computation where the segments end. Arcs of circles have been used [43, 44], and even (non-convex) shavings of hollow cylinders have been implemented [45]. Surprisingly, splines seem not to be in use in the granular or discrete element community, maybe due to the lack of reliable overlap computations.

Finite element codes Finite element packages also offer the possibility to model discrete elements as particles made up of finite elements, in two [46, 47] and three dimensions [48]. The DEM functionality is also incorporated into software packages, e.g. in the ANSYS workbench. The interaction between the particles is then usually computed by a penalty method, i.e. the force is proportional to the overlap. The discrete element inter-particle forces can also be interpreted as such a penalty, so the discrete element method could be described as a penalty finite element method without finite elements. Nevertheless, for “usual” stresses far below the yield stress, the shape of the contacting surfaces is more important for the dynamics than the shape changes from the strains, so the introduction of the deformation only blows up the computational effort by increasing the number of degrees of freedom. Another aspect of the finite element methods is that attempts have been made to couple discrete with finite elements [49]. Nevertheless, if the discrete element method is run “fully” dynamically (instead of static or quasi-static calculations), vibrations may occur on a timescale close to the particle’s eigenfrequency (which results from the particle mass and the Young’s modulus). The coupling between particles and finite elements is then only possible if the elements neighboring the discrete element particles are of similar size, so that no abrupt changes in the wave resistance (mechanical impedance) R = √(Y ρ) (Young’s modulus Y, density ρ) occur; otherwise arbitrary reflections of vibrations are possible at the particle-FEM boundary, and the simulation becomes noisy or unstable. In that case, there is not much gain from replacing discrete element particles with finite element regions.
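The impedance argument can be made quantitative with the textbook amplitude reflection coefficient for one-dimensional waves at normal incidence, r = (R2 − R1)/(R2 + R1); the material values below are arbitrary placeholders:

```python
import math

def impedance(Y, rho):
    """Wave resistance (mechanical impedance) R = sqrt(Y * rho)."""
    return math.sqrt(Y * rho)

def reflection_coefficient(R1, R2):
    """Amplitude reflection coefficient for a wave running from
    medium 1 into medium 2 at normal incidence (standard 1D result)."""
    return (R2 - R1) / (R2 + R1)

# matched impedances: nothing is reflected at the particle-FEM boundary
r_matched = reflection_coefficient(impedance(1.0e9, 2500.0),
                                   impedance(1.0e9, 2500.0))
# a tenfold jump in Young's modulus reflects about half of the amplitude
r_mismatch = reflection_coefficient(impedance(1.0e9, 2500.0),
                                    impedance(1.0e10, 2500.0))
```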


1.4.2 Normal force models

In this section, we will give an overview of possible normal force models for DEM simulations, and the following explanations will justify our particular choice of computing the full overlap geometry for our polyhedral DEM simulation.

Penetration depth

Fig. 1.6 Penetration depths of a pair of particles P1 and P2 at three different time steps.

The original molecular dynamics code for “soft disks” by Alder and Wainwright [50, 51] used the penetration depth as the parameter of the normal interaction strength, and as such is a prototype of a DEM method. This is sufficient to obtain qualitatively realistic dynamics for round particles, where the forces are basically central forces, so that together with the particle distance, force point and force direction are also determined. While such a minimalist approach is appealing, it should not be forgotten that the minimal computational effort also yields only a minimal amount of information about the contact geometry for more complicated particle shapes. In that case, additional information about the direction of the force, not only its magnitude, is needed. In particular, for simulations of polyhedra, one reason that several projects [52, 53, 54, 55] have not led to the publication of simulation results is in all likelihood that while the absolute value of the penetration depth is a smooth function for smoothly varying relative particle positions, the definition of the direction is a different matter, and may have led to destabilization of the whole simulation. If one uses the penetration depth, one could define the force direction at those features where (in two dimensions) the contact line is longest and in that direction


Fig. 1.7 Impact on a Newton cradle, as a model for the shock propagation (sound propagation) in DEM simulations: the frames proceed from left to right in time.

Fig. 1.8 A space-filling packing of bricks.

where the penetration depth is deepest. Nevertheless, this does not resolve the problem of the smoothness of the direction change for polyhedra, as can be seen in Fig. 1.6: Assume that a particle P1 is at rest, and a particle P2 moves along P1’s perimeter from time t = 1 to t = 3 with a constant penetration depth ||d1|| = ||d2|| = ||d3||. Even then, the direction change from d1 to d2 as well as from d2 to d3 is not smooth. Such an abrupt change of the force direction would be fatal for even the most stable numerical integrators. In our preliminary studies with auxiliary points in three dimensions (weighted averages of distances from corners or centers of the edges etc.), we were not able to devise any algorithm which could have guaranteed a smooth variation of the force direction. The use of the penetration depth also has a physical drawback: Because the unit of the penetration depth is a length, one has to resort to an unphysical spring constant as the parameter of the particle elasticity, instead of the Young’s


modulus. Related to this question is another disadvantage of this minimalist approach: It is difficult to adapt the sound velocity of a space-filling packing to the continuum limit. While the sound velocity in continuum theory describes a microscopic pressure change (or deformation) traveling through the material, in a DEM simulation the pressure change translates into a variation of the overlap and into a variation of the positions of the centers of mass of the particles, the latter similar to a Newton cradle (see Fig. 1.7). If only the penetration depth is used, a space-filling packing of bricks (Fig. 1.8), which should exhibit practically the sound velocity of the continuum, would have a different sound velocity along the height direction of the bricks than along the length direction: The acceleration due to the overlap is the same in the x- and y-directions, but as the particles are much closer in the vertical direction, i.e. there are more particles per unit length, the sound would propagate faster along the horizontal, as the acceleration of a single particle covers a larger distance.
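This anisotropy can be illustrated with the long-wavelength sound speed of a one-dimensional chain of masses m coupled by springs k with spacing a, c = a√(k/m), a standard result from lattice dynamics: with a penetration-depth force law, k is the same for both brick orientations, so c scales purely with the particle spacing (all numbers below are arbitrary illustration values):

```python
import math

def chain_sound_speed(k, m, a):
    """Long-wavelength sound speed of a 1D mass-spring chain:
    c = a * sqrt(k / m)  (lattice spacing a, stiffness k, mass m)."""
    return a * math.sqrt(k / m)

# bricks twice as long as they are high, identical contact stiffness:
# the signal runs twice as fast along the length direction
c_height = chain_sound_speed(k=1.0e6, m=1.0, a=0.5)
c_length = chain_sound_speed(k=1.0e6, m=1.0, a=1.0)
```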

Computation of the full overlap geometry

Fig. 1.9 Alternative definitions of the contact line (dashed lines) for overlapping polygons: For realistic Young’s modulus, the difference between both definitions will be negligible.

As the argument for the sound velocity in the previous subsection shows, it is desirable to have more parameters than just the penetration depth to describe the particle interaction adequately. It is certainly realistic to choose the force between the DEM particles proportional to the amount of deformed material, which automatically leads to the overlap area as the measure of force in two dimensions and the overlap volume in three dimensions. Additionally, we have to define the contact line in two and the contact area in three dimensions for the direction of the elastic force. As shown in Fig. 1.9, the direction of the normal force is defined by the normal to the contact line (or, in case there are several parts, by the directions weighted with the magnitudes of the respective parts). Because cohesion will be proportional to the contact length (or area), together with the overlap we have two parameters which allow us to define force laws for cohesive particles [56]. The attractive force due to cohesion will be proportional to the contact length (area in three dimensions), while the repulsive force due to the elasticity will be proportional to the overlap area (volume in three dimensions). Further, we will need to introduce the


distance from the contact point, from which the forces obtained from the overlap can be rescaled to obtain a homogeneous sound velocity for space-filling packings of elongated particles.

Discontinuous deformation analysis Gen-Hua Shi devised his “discontinuous deformation analysis” (DDA) [57] in analogy to finite element analysis (or rather, the older variant of finite elements which used the Ritz variational principle): A DEM configuration undergoes those deformations which lead to the minimal energy of the configuration. Each particle is one “element”, similar to one mesh cell in a FEM grid, and the shape of its “element matrix” depends on the particle shape: Particles of different shape will deform differently in the presence of the same external contacts. The totality of all particles corresponds to the whole FEM grid; for the DDA, the total matrix is made up of the “element matrices” of the single particles. Depending on the contacts, the total matrix will change with the contact situation. The drawback of the DDA is that of all energy methods: only conserved properties can be dealt with adequately. Dissipation cannot be included in the (Ritz) variation and therefore cannot be taken into account properly. The reason why dissipation is not a problem for dynamic FEM simulations is that today’s methods are not based on the Ritz variational principle, but on the Galerkin method (weak formulation). The DDA was originally intended for rock mechanics, where this may not be too dramatic, as rebounds etc. will hardly happen; but for granular materials, where the particles can splash around, this leads to a serious lack of verisimilitude.
In our collaboration with the Japanese railway research institute, we were asked by our collaborators to analyze why the DDA package gave unrealistic time evolutions for the forces during the deformation of ballast under the wheel of a Shinkansen train; basically, a whole timescale was missing in the over-rolling process, compared to the experiment [58]. Our conclusion was that the simulation was only able to give quasi-stationary results for a variation of the forces (e.g. rock slides), not real-time dynamics like Newton’s equation of motion, a judgement which seems to be confirmed by the current Wikipedia entry (http://en.wikipedia.org/wiki/Discontinuous_Deformation_Analysis). From the experience of our research group of how “zero-order approximations” [59] approach mechanical equilibrium, we would classify the DDA as an effectively zero-order approximation to discrete element methods.

Rigid body mechanics Rigid body mechanics comes in several flavors. Collision dynamics (the “event-driven method”, see section 1.3.1) is not applicable to particles which are at rest with respect to each other, as it can treat only instantaneous collisions and translations of the particles. Contact mechanics [60] and rigid body dynamics in computer games [61, 62, 63, 64] can simulate


rigid particles with resting contact. The approach for the normal forces is basically via constraints. For contacts which are not due to instantaneous collisions, there are two approaches to model resting contacts: In TV games, bilateral contacts (“the relative position stays as it is”) are modeled, while in contact mechanics, unilateral contacts (with an inequality “≤” instead of an “=”) are modeled: The penetration δ between two particles must be smaller than or equal to zero (where a finite overlap would count as positive δ). While in conventional systems of ordinary differential equations the positions are computed from the forces, in such a constraint mechanics the forces are computed from the (relative) positions. The advantage of this approach is that there are always point-point contacts for particles with curved surfaces, point-point and point-line contacts for polygons, and point-point, point-line, line-line, line-face and face-face contacts for polyhedra. This means that an extended overlap never has to be evaluated, be it with respect to penetration depth, area, or volume. Nevertheless, as “in contact” means “exactly zero penetration”, the approach needs a strategy to deal with the fact that due to rounding, “exactly zero” hardly exists in floating-point computations. Therefore, it is numerically not so easy to get these algorithms stable: even if small violations of the constraints can be dealt with so that no “explosions” due to huge force fluctuations for finite overlap happen, there is still noise from the ambiguity due to finite precision. This can be seen in a paper by Moreau, where he computed a granular heap of polygons on top of a string of wedges instead of a flat surface [65]. Another drawback of rigid-body DEM is that the sound velocity for aggregates of rigid bodies is always infinite, which creates problems with causality; moreover, there is no smoothness of the time evolution, which makes investigations related to granular acoustics impossible.
Though we have high respect for the work of the group of F. Radjai, who devised a contact mechanics of polyhedral particles [66], there is a conceptual problem with rigid body models. They are sometimes considered to be “universal”, as they are “independent” of the Young’s modulus. Because the particles are rigid, researchers erroneously assume that the results apply to “very hard” particles. Nevertheless, even rather hard granulates, e.g. glass beads, show scratches in processes where they are under only moderate pressure, so the concentrated contact forces in granular matter made of brittle materials are sufficient to induce wear. As the yield stress is of the order of the Young’s modulus for a wide variety of materials [67], for the elastic case this corresponds to a deformation which cannot be considered small, so there is no reason to assume that hard particles are adequately modeled with “rigid contacts”. If we then ask what rigid body modeling means in physical reality, the only answer is: the limit of vanishing external forces (vanishing pressures, vanishing gravity etc.), because those are the only boundary conditions which leave the particles undeformed at the contacts. Though the contact area in “soft” DEM models is at a length scale where the modeling cannot be considered reliable any more, for rigid body models, where one contact cannot be forced past another via deformation, even the macroscopic behavior must be assumed to be modeled insufficiently, as finite but large pressures should be able to enforce reorientations of particles.

1.4.3 Modeling of solid friction

Fig. 1.10 Numerical approaches to Coulomb friction (one-dimensional case): (a) discontinuous function which distinguishes dynamic from static friction; (b) “regularization” with viscous friction replacing static friction below a threshold velocity; (c) zero static friction.

Coulomb friction Coulomb (solid) friction is the force which acts on a contact between two bodies to reduce their relative motion. Between surfaces of solids, there can exist either static or dynamic friction, which are hugely different in character. Mathematically, dynamic friction is a function,

ff = −µ ||fn|| v/||v||,   (1.5)

while static friction is described by a relation for v = 0,

−µ||fn|| ≤ ||ff|| ≤ µ||fn||.   (1.6)

Such a relation is often also called a unilateral constraint (with a “≤”, in contrast to bilateral constraints with an “=”, like a pendulum modeled as the two-dimensional motion of the bob with the fixed rod length as a constraint [68]). Physically, dynamic friction dissipates energy during the relative motion of the surfaces, in contrast to static friction, which constrains the surfaces relative to each other without any energy loss. As shown in Fig. 1.10 (a) for the one-dimensional case, when the velocity v ≠ 0, the value of the friction is uniquely determined by Eq. (1.5); when v = 0, however, the friction force could take any value within the bounds of the inequality (1.6), but is in fact unique when it compensates all other external forces so that v = 0 is maintained. There are many “regularizations” to obtain a closure for the static friction, i.e. to obtain a value for the tangential force at vanishing tangential velocity. Nevertheless, not all of them are physical (e.g. Fig. 1.10 (b) and (c)). An approach to implementing friction in a DEM simulation must therefore be evaluated not with respect to its ability to dissipate energy, but with respect to its ability to model the constraint of particles not moving in the tangential direction relative to each other. Many researchers avoid modeling free surfaces,


e.g. heaps, because their friction models would be unable to stabilize heaps without additional external forces. Nevertheless, even with walls everywhere, a failure to model static friction appropriately will become evident through wrong (too high) packing densities. In the absence of static friction, the stress-strain diagram has only a linear characteristic, with the stress proportional to the strain and without a saturation region, as Reynolds dilatancy cannot be modeled adequately.
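A minimal sketch of how Eqs. (1.5) and (1.6) can be evaluated for a single contact. The function and its arguments are our own illustrative assumptions (in particular, the trial force for the sticking case must be supplied by some tangential model, e.g. the Cundall-Strack spring discussed below); this is not the implementation used later in this thesis:

```python
import numpy as np

def coulomb_friction(fn, ft_trial, v, mu, v_eps=1e-12):
    """Tangential force for one contact: dynamic friction, Eq. (1.5),
    opposes the sliding velocity; for a (numerically) resting contact,
    the trial force ft_trial is clamped to the cone of Eq. (1.6),
    |ft| <= mu * |fn|."""
    fn_mag = float(np.linalg.norm(fn))
    v = np.asarray(v, dtype=float)
    speed = float(np.linalg.norm(v))
    if speed > v_eps:                         # sliding: Eq. (1.5)
        return -mu * fn_mag * v / speed
    ft = np.asarray(ft_trial, dtype=float)    # resting: Eq. (1.6)
    ft_mag = float(np.linalg.norm(ft))
    if ft_mag <= mu * fn_mag:
        return ft                             # inside the friction cone
    return ft * (mu * fn_mag / ft_mag)        # projected onto the cone

# resting contact with |fn| = 10 and mu = 0.5: a trial force of 3 is
# sustained unchanged; a sliding contact gets dynamic friction of magnitude 5
f_stick = coulomb_friction([0, 0, 10.0], [3.0, 0, 0], [0, 0, 0], 0.5)
f_slide = coulomb_friction([0, 0, 10.0], [0, 0, 0], [2.0, 0, 0], 0.5)
```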


Fig. 1.11 Unphysical motion of a block on an inclined slope (of angle θ) with dynamic friction µ = 3 tan θ, when the difference between static and dynamic friction is ignored in the numerical simulation.

Ignoring the problem altogether If one writes Coulomb friction as ft = −µ sign(v) and then takes the usual sign convention sign(0) = 0 literally, one arrives at a friction law which is finite for finite velocity and vanishes for static configurations. In other words, there is only dynamic friction, no static friction. For a block on an inclined slope of angle θ with an initial downhill velocity, the result in Fig. 1.11 shows that for µ > tan θ the simulation is physically meaningless, since the block is supposed to stop, rather than slide further down, after the velocity reaches zero. Of course, computationally it is pretty difficult to obtain exactly zero velocities, and particles in relative force equilibrium will show an oscillation of the relative tangential velocity around zero. The numerical noise due to the discretization error for the velocity will lead to oscillations of the velocity’s sign, causing jumps in the dynamic friction which will destabilize any time integrator. Accordingly, such a force model, which has no physical justification, is used in mathematical proofs for rigid-body


systems with friction [69] rather than in simulations; practically the same instabilities can be expected whenever the force law for dynamic friction, ft = −µ sign(v), is used in an actual simulation.

Introduction of viscous friction Using viscous instead of dynamic Coulomb friction will certainly lead to unphysical results, as Coulomb friction is nearly independent of the velocity; if there is a dependence, it is a logarithmic decay with the velocity [67]. A more plausible “regularization” uses dynamic Coulomb friction above a threshold value of the absolute value of the velocity, below which the friction is approximated by viscous friction [70, 71]. Physically, that would mean that a car parked on the slope of a hill could slide down overnight. While such a force law is not suitable to model static friction, at least it does not lead to the jumps of the force law in the previous section.

Cundall-Strack friction model Cundall and Strack [24] are credited with the invention of the discrete element method for granular materials, as they actually implemented a force law which mimics static friction by incrementing the tangential force from the previous time step. A cut-off is used if the friction exceeds the product of friction coefficient and normal force. While in two dimensions the tangential space of the particles is one-dimensional, so that the increment can only be either positive or negative, for the three-dimensional discrete element method the tangential space is a plane, so the choice of the direction requires special caution. In many codes, where the rotational degrees of freedom are neglected, or where unphysically high values of rolling friction are used, the problem is not so grave. Nevertheless, in this work we additionally take rotation into account without inhibiting the rolling, so in section 5.3.2 we will introduce our variant of the Cundall-Strack friction law, which turned out to work reasonably well also for heap formation with particles which can rotate.
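The incremental scheme can be sketched in a few lines. This is a generic Cundall-Strack-type update (the explicit increment and the names are our own simplifications); our actual three-dimensional variant, including the choice of direction in the tangential plane, is the subject of section 5.3.2:

```python
import numpy as np

def cundall_strack_update(ft_prev, v_t, kt, mu, fn_mag, dt):
    """One time step of a Cundall-Strack-type tangential force:
    increment the stored spring force with the relative tangential
    velocity v_t, then cut it off at the Coulomb limit mu*|fn|."""
    ft = np.asarray(ft_prev, dtype=float) - kt * dt * np.asarray(v_t, dtype=float)
    ft_mag = float(np.linalg.norm(ft))
    limit = mu * fn_mag
    if ft_mag > limit:                 # sliding: project back onto the cone
        ft *= limit / ft_mag
    return ft

# constant tangential creep: the stored force ramps up by kt*dt*|v_t| = 1
# per step and saturates at mu*|fn| = 5, mimicking the stick-slip transition
ft = np.zeros(2)
for _ in range(10):
    ft = cundall_strack_update(ft, [1.0, 0.0], kt=100.0, mu=0.5,
                               fn_mag=10.0, dt=0.01)
```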
Contact mechanics Coulomb friction is a reactive force, i.e. it results from the normal force. For contact mechanics, J. J. Moreau introduced the “sweeping process” [72], an iteration which satisfies the unilateral constraints for the volume exclusion (“normal force”) and for Coulomb friction (“tangential force”) simultaneously. Though the solution of the “sweeping process” is unique and well-defined, this is not the same as obtaining the tangential forces as reactions to the normal forces. Though the tangential forces are computed as constraint forces within the limits set by Coulomb’s friction laws, we know of no proof that the computation of the tangential forces as reactions to the normal forces would lead to the same result. In other words, there is no proof that the tangential force computed from the “sweeping process” is identical to Coulomb friction. Another problem of the contact


mechanics algorithm concerns the time integration. Though the positions will be smooth, the velocities may not be. This can easily be seen for the simplest case, the application of a two-particle collision operator in the event-driven method, of which “contact mechanics” can be considered a generalization. As the accelerations might not even be defined in this case [73], it cannot be proven that the allocation of the forces on a particle between normal forces and tangential forces is smooth either, something one would have to demand for classical trajectories.

Exact friction As we have seen in the previous subsections, in the presence of static friction, Coulomb’s laws Eq. (1.6) give only necessary conditions, but not sufficient conditions, for the computation of the friction forces as reactive forces for a particle agglomerate under given elastic forces. For single-particle contacts [74] and chains [75], exact solutions are possible, which are computed in a formalism with Lagrange multipliers for unilateral constraints. For the general contact case of many-particle simulations, no formalism has been developed yet. If one uses the Lagrange-multiplier formalism to compute tangential constraint forces from the normal forces [68], the resulting tangential force cannot be guaranteed to be bounded by µ times the normal force. What is worse, for too many contacts between the particles, the problem would become statically undetermined. Nevertheless, a unique solution of the constrained many-particle problem must exist, conforming to Gauss’ principle of least constraint, which is as general a principle of mechanics as Newton’s equations of motion or the Lagrange-Euler equations [76].

1.5 Accuracy and stability

In granular materials, the question of what can be computed, and what accuracy means, runs rather deep. In the mechanics of few bodies, as far as linear problems are concerned, the question is relatively simple: The general wisdom states that accuracy can always be reached by reducing the time step. The common attitude on accuracy is that if one compares the trajectories obtained from higher-order methods and lower-order methods, for an appropriate choice of the time step, one should obtain equivalent results. Numerical analysis textbooks are more careful and also require stability of the method for the proofs. Another issue is that the underlying physical process may itself be unstable. Stable in the sense of Lyapunov means that if a problem is computed with an initial condition xa(0) and another initial condition xb(0) = xa(0) + δx, the final values xa(t) and xb(t) should deviate only proportionally to a power of t; if the deviation grows faster (i.e. exponentially), we have (Lyapunov) instability. This is in fact a grave issue for computer simulations, as many problems in physics are not stable, namely any system which is “chaotic”, and many which are “nonlinear”, which


occurs daily in the weather forecast. For our DEM simulation, we see immediately the cause of the nonlinearity: even if we choose a force law which is proportional to the penetration depth δx of two particles, which is often called a “linear force law”, we still have

f_lin = −kδx   for particles in contact,
f_lin = 0      else.

As a result, there is no guarantee of obtaining the same trajectories if even only the time step is chosen differently, once the number of particles exceeds a certain value. One can accordingly quantify the degree of the nonlinearity via Lyapunov exponents, as for many other systems, including turbulent flows, the hard-sphere gas (going back to Boltzmann), etc. For all these systems, the actual “accuracy” of the computation of the trajectories is not such that the trajectories themselves could be used as physical quantities. Increasing the order of the integrator or reducing the time step will only lead to precision without accuracy. Nevertheless, macroscopic variables which are computed for such systems, like wavelength-dependent energy spectra (for turbulent flow) or critical angles (for granular materials), are usually “stable” quantities, at least up to the fluctuations which are given by the statistical error bars. Therefore, our choice of time integrator will not be dictated by a quest for arbitrary and illusory high accuracy. There is a more important issue related to the non-smoothness of the force laws: Even for the simplest “reasonable looking” force laws, where the dissipation for a normal collision is velocity-dependent, like

f_diss = −k1 δx − k2 v   for particles in contact,
f_diss = 0               else,

there will be a term proportional to the contact velocity v. As the contact velocity is maximal at the impact, f_diss is finite already at the initial contact time and does not grow smoothly from zero. This is a serious problem for all conventional explicit integrators, which are based on the assumption of a smooth variation of the force. One could require that the force law should be smooth, but in experiments, too, impacts of particles create sound, so there is a smaller timescale than that of the mesoscopic motion which nevertheless creates a non-smooth variation of the force, at least on the timescale of a time step in the simulation, of which a few should be able to resolve a two-particle collision. Therefore, when choosing an integration method, we will be much more concerned with stability than with accuracy, as the underlying mesoscopic processes are not necessarily “smooth”.
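The jump at contact onset can be seen by evaluating the spring-dashpot law just before and just after the surfaces touch; signs and parameter values are chosen for illustration only:

```python
def contact_force(delta, v, k1, k2):
    """Linear spring-dashpot normal force: active only while the
    particles overlap (delta > 0), zero otherwise."""
    if delta > 0.0:
        return -k1 * delta - k2 * v
    return 0.0

# just before contact the force is zero; at the first instant of contact
# the dashpot term -k2*v switches on at the full impact velocity, so the
# force magnitude jumps by roughly k2*v instead of growing smoothly from zero
f_before = contact_force(-1e-9, 1.0, k1=1.0e5, k2=10.0)
f_onset = contact_force(+1e-9, 1.0, k1=1.0e5, k2=10.0)
```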

1.6 Disorder and qualitative results

The flow field for turbulence is disordered, i.e. the field amplitudes may vary strongly from one region to the next. Nevertheless, over time the fields vary, and averages accumulate in such a way that they are continuous across neighboring regions. This allows one to give, e.g.,


drag coefficients with reasonable accuracy. In contrast, the disorder in granular statics is quenched: the pressure exerted by particles under a heap on the ground will not vary with time, and the pressure distribution will not become continuous via time-averaging. Even for flow problems, while ten realizations of a flow through a narrow outlet may give similar flow rates, in the eleventh experiment the flow may clog and get stuck altogether. This makes a different philosophy necessary than in ergodic systems, where time and configuration averages give the same values.

Fig. 1.12 Artificial data (probability over flow rate) to show that configuration averages for flow problems with clogging would lead to results which would be characteristic neither for flow with nor without clogging.

Fig. 1.13 Artificial data (pressure over position) to show that taking an average over a pressure distribution with a marked dip and the left-right reversed data may actually eliminate the dip.

In the example in Fig. 1.12, where some realizations lead to clogging (flow rate zero), the averages computed from different experiments may lie between zero and the time averages of realizations without clogging. This may lead to averages which are found neither in the clogging nor in the non-clogging experiments, which makes the notion of an average rather absurd. Of course, these data are artificial, but we rather prefer not to perform the actual experiment, as the clogging probability will


change with the air humidity (as with any friction-related experiment on untreated surfaces), and therefore, in Japan, with the season. Similar cases arise for configuration averages of pressure minima under heaps: as the minima are located close to the middle, but not all exactly in the middle, the averages of distributions with minima may actually have no minimum at all, as in Fig. 1.13, where the left-right averaging effectively erases the minimum. Equally important is to look at the standard deviations of the measured values, which give meaning to the actual measurement values. Very often, standard deviations are presented as “error bars”, though in disordered systems these bars don't represent errors in the measurement, but the fluctuation of the actual physical observables.

Fig. 1.14 Pressure distribution of a single measurement (Brockbank et al. [77]): single measurement for a lead shot heap (left) and a sand heap (right).

Fig. 1.15 Averaged pressure distribution over several measurements (Brockbank et al. [77]): lead shot heaps (left) and sand heaps (right).

Carefully recorded experiments [77] show strong fluctuations in single experiments for lead shot and for sand (Fig. 1.14), while at least for sand the configuration averages show a pressure minimum (Fig. 1.15). In fluid dynamics, data trends tend to be much smoother, but in granular materials, the nature of the disorder typically leads to such qualitative (existence of dips for non-spherical poly-disperse systems, non-existence of the dip for round, nearly


mono-disperse particles) and quantitative (average width and magnitude of the pressure distributions) results. Likewise, symmetry arguments are very treacherous in granular materials: that the left and right angles of repose of a heap are the same says nothing about the symmetry of the internal stress or density distribution.

1.7 Objective and significance

1.7.1 Objective

The main objective of this thesis is to develop a stable and reliable DEM code for granular-materials research with realistic particle geometry (polyhedra) and with a force model which is smooth enough that the time integration is always stable. It should be able to produce heaps on smooth grounds with realistically high angles of repose, without unphysical parameters or artificial boundary conditions. It should also be able to reproduce history-dependent phenomena (e.g. the dependence of the pressure distribution and density on the construction history).

1.7.2 Significance

History effects in granular materials have long been noticed in practice but ignored in theoretical analysis, e.g. in the geotechnical field [78]. Up to now, no theoretical or numerical study has been able to demonstrate and systematically investigate history effects for fully three-dimensional cases. In the 19th century, James Clerk Maxwell proposed to a researcher (George Darwin, later published in [79]) to measure the pressure of embankments on walls depending on the filling method. While the theories (among others, by Boussinesq) all indicated unique results, Maxwell suspected a “historical element”, i.e. an influence of the construction history. In fact, the pressure differences turned out to be up to 30%, a result which was later confirmed by Terzaghi [80]. That different filling methods lead to different pressure distributions has been ignored by the theories in geotechnics until today, a fact which is even deplored in a current handbook on geotechnics [78]. The question turned up again in the 1990s, when the pressure distribution under heaps became a controversial subject among researchers of different research communities (mechanics, physics, powder technology and geotechnics), who all insisted that only their (unique) predictions could be correct. So far, the discrete element method (where particles instead of continua are used) with a proper friction model looks like the only candidate which can deal with history effects in granular materials, due to the one-to-one correspondence between a particle in nature and an element in the simulation. Nevertheless, previous attempts to implement simulation methods with (arbitrary convex) polyhedra [41, 42] led to implementations where the penetration-depth-based force laws are not smooth, and the simulation results are not stable. To


date, as a consequence, there are no systematic computational methods which are able to predict the effects of the grain shape and the construction history of granular assemblies.

1.7.3 Granular heap as test case

The ability to simulate a dynamically constructed heap on a smooth ground with a realistically high angle of repose in the (quasi-)static state serves as a crucial test for any DEM simulation method intended for practical usage. Unfortunately, to the knowledge of the author, no such heap has yet been reported with the existing DEM codes on a smooth surface with realistic coefficients of friction. In the education of scientists and engineers, statics is treated first and dynamics second, because the analytical treatment of statics needs only force equations, while the analytical treatment of dynamics needs ordinary differential equations. Nevertheless, in computer simulations, simulating statics with dynamical methods is much more difficult than simulating dynamic processes where objects stay in motion. The reason is that noise (due to numerical errors, careless programming etc.) will always keep the velocity amplitudes finite and prevent opposing forces from becoming equal enough for objects to stay at rest. In general, for DEM simulations in the presence of walls or supports, the elastic force will be able to restore the force equilibrium. Nevertheless, there is one case where a force equilibrium cannot be reached by the elastic force alone, and that is a heap on a smooth surface, where at the contacts between particles and floor the tangential friction forces must be able to balance the remaining forces. Any insufficiency in a simulation method will lead to the disintegration of the heap, and that is the reason why we choose it as a test case. Hardly any research group dares to present simulation results for this case: most of the time, the floor is made up of small particles, so that the heap particles interlock in the normal direction and the heap looks stable [65]. Even in that case, the convex curvature is a treacherous sign that the tangential friction does not work reliably. Many researchers therefore prefer to simulate systems with walls (e.g. [41]), where the lack of stability is less obvious from the graphical output. Nevertheless, the use of walls does not imply that the simulation results are reliable: the stress-strain curve for such a simulation will be different from that of a simulation method which is able to produce a heap on a flat surface.

1.8 Organization of the thesis

Before we explain the content of the thesis, we first give the flow diagram of a DEM simulation. As shown in Fig. 1.16, left, a typical DEM simulation usually starts with a particle generation routine (“routine” hereafter refers to a module or subroutine in FORTRAN) which prepares the data for the particles, and then enters the main loop, which consists of the following successive routines:


Fig. 1.16 General DEM flow diagram (left: the DEM loop in general) and the diagram in our DEM implementation (right: the DEM loop for polyhedra) for a time integration with a predictor-corrector method. Diagram boxes: particle generation (Chapter 3); predictor (Chapter 2); geometry update (Chapter 3); contact detection (Chapter 3); overlap computation (Chapter 4); force and torque computation (Chapter 5); corrector (Chapter 2).

1. Geometry update, which updates the vertex coordinates and the orientations of the particles according to the results of the integrator (or to the initial conditions at the first iteration).

2. Contact detection, which detects the possibly contacting particle pairs.

3. Force and torque computation, which computes the contact forces and torques for contacting particle pairs and computes external forces, e.g. gravity.

4. Time integration, which integrates the equations of motion according to the forces and torques.

The four routines above are carried out sequentially inside the main loop until it terminates at the final time step. The flow diagram of our DEM code is shown in Fig. 1.16, right: a predictor routine precedes the geometry update, due to the Gear predictor-corrector formula we use for time integration (complemented by the corrector routine). After the contact detection and before the force and torque computation, we need to obtain the overlap polyhedron (via the overlap computation routine) for contacting particle pairs, since the contact forces in our DEM code are modeled based on the overlap geometry of two contacting particles.
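The control flow just described can be sketched as follows (a Python sketch with stub routine names chosen for illustration; the actual code is organized as FORTRAN routines, as described in the following chapters):

```python
# Stubs standing in for the routines of the following chapters (names illustrative).
def predict(particles, dt):           pass          # predictor (Chapter 2)
def update_geometry(particles):       pass          # vertices, orientations (Chapter 3)
def detect_contacts(particles):       return []     # candidate pairs (Chapter 3)
def compute_overlap(pair):            return None   # overlap polyhedron (Chapter 4)
def compute_forces_torques(particles, overlaps):    # contact forces + gravity (Chapter 5)
    return []
def correct(particles, forces, dt):   pass          # corrector (Chapter 2)

def dem_main_loop(particles, n_steps, dt):
    """One DEM run: the routines of Fig. 1.16 (right) executed each time step."""
    for _ in range(n_steps):
        predict(particles, dt)
        update_geometry(particles)
        pairs = detect_contacts(particles)
        overlaps = [compute_overlap(pair) for pair in pairs]
        forces = compute_forces_torques(particles, overlaps)
        correct(particles, forces, dt)
```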


The content of this thesis is organized in close relation to the routines in the DEM flow diagram.

In Chapter 2, we discuss the fundamentals of a DEM simulation: the kinematics of a granular particle system and the choice of the time integration methods for solving the equations of motion, which are independent of the geometry of the particles and of the contact force models applied (corresponding to the time integration routine and the complementary predictor and corrector routines in the diagram of Fig. 1.16).

In Chapter 3, we describe how the polyhedral particles are represented in our DEM code. We also explain how to generate polyhedral particles (the particle generation routine) and how to update the particle geometry (the geometry update routine). The contact detection algorithms (the contact detection routine) are also discussed in this chapter, since they are related to the shapes of the particles used.

In Chapter 4, we introduce the approaches to compute the overlap polyhedron of two intersecting polyhedra (the overlap computation routine). The overlap polyhedron is used for modeling the contact forces.

In Chapter 5, we explain how we model the normal force and the tangential force, based on which we evaluate the forces and torques (the force and torque computation routine).

In Chapter 6, we explain how to parallelize our DEM code on shared-memory machines.

In Chapter 7, we show the simulation results for heap formation in three dimensions. To verify the code, we also investigate the density distributions and ground pressures in quasi-two-dimensional heaps with the DEM code and corresponding experiments.

In Chapter 8, we summarize our development of the DEM code and the results of the verification studies obtained with the code.

Chapter 2

Kinematics and Time Integration

The fundamental task of the DEM investigation of granular materials is to resolve the time evolution of all the particles. Leaving aside the inter-particle force laws, the motion of a single particle is treated as that of a rigid body subject to external forces and torques. In this chapter, we introduce the basics of rigid-body kinematics and numerical algorithms for solving ordinary differential equations (ODEs). The basics of the first part can be found in classical textbooks such as Sommerfeld [81] or Goldstein [82], while the second part is based on the monographs of Gear [83] and Hairer et al. [74, 84]. Nevertheless, with respect to practicability, some additional remarks are necessary regarding the applicability to the DEM method.

2.1 Kinematics

2.1.1 Translation and rotation

To represent a mass point in three-dimensional Euclidean space at a certain time t, we use a vector r(t) in a fixed (chosen) Cartesian frame,

    r(t) = x(t)i + y(t)j + z(t)k,        (2.1)

where i, j and k are the unit vectors of the x-, y- and z-axes of the coordinate system. The velocity of such a mass point can then be represented as the first time derivative of r(t),

    v(t) = ṙ(t) = ẋ(t)i + ẏ(t)j + ż(t)k.        (2.2)

However, in our approach, granular particles are not simplified as mass points, since they occupy a certain space and preserve a definite shape. The motion of a granular particle is usually treated as that of a rigid body, which is not only translated with respect to a fixed reference frame, but also rotated with respect to its center of mass, as shown in Fig. 2.1. This is in contrast to three-dimensional simulation packages for round particles


like EDEM (http://www.dem-solutions.com/) and PFC3D (http://www.itascacg.com/pfc3d/) which neglect the rotational degrees of freedom. Hereafter, with respect to kinematics, a (granular) particle is treated as a rigid body rather than a point mass [85]. The motion of a particle is described by the displacement of its center of mass (from c′ to c in Fig. 2.1) together with the rotation around its center of mass. The center of mass of a particle can be represented by a vector rc,

    rc(t) = xc(t)i + yc(t)j + zc(t)k,        (2.3)

and its velocity by vc,

    vc(t) = ṙc(t) = ẋc(t)i + ẏc(t)j + żc(t)k.        (2.4)

Fig. 2.1 Sketch of the translation and rotation of a rigid body in three-dimensional space with respect to a space-fixed coordinate system (x, y, z) and a body-fixed coordinate system (x′, y′, z′): the dashed lines indicate the boundary of the particle and the body-fixed coordinates at a previous time, while the solid lines indicate the boundary and the body-fixed coordinates after translation and rotation.

To discuss the rotation of a particle, an additional body-fixed coordinate system must be introduced, which rotates together with the particle in the space-fixed coordinate system, see Fig. 2.1. The origin of the body-fixed coordinate system is usually set to the center of mass of the particle, for the sake of intuitive understanding as well as analytical convenience. If a vector rb in the body-fixed coordinate system is transformed into the vector rs =


Rrb of the space-fixed coordinate system by a rotation matrix

        ( rxx  ryx  rzx )
    R = ( rxy  ryy  rzy ) ,        (2.5)
        ( rxz  ryz  rzz )

then R defines a rotation of the particle about its center of mass from the body-fixed system to the chosen space-fixed system. Each column of R represents the direction of the x′-, y′- or z′-axis of the body-fixed system in the space-fixed coordinate system. R is orthogonal, which means its inverse equals its transpose,

          ( rxx  rxy  rxz )
    R⁻¹ = ( ryx  ryy  ryz ) .
          ( rzx  rzy  rzz )

If pb is an arbitrary point on the particle in its body-fixed system, then the corresponding vector p(t) at time t in the space-fixed system is the vector sum of the rotation of pb about the center of mass and the translation rc(t) of the center of mass in the space-fixed system:

    p(t) = R(t)pb + rc(t).        (2.6)

Since for an unconstrained particle there are only three rotational degrees of freedom, there must be six constraints on the nine components of the rotation matrix R, which follow from the fact that the matrix is orthogonal. Though the rotation matrix and its time derivative can be used in the equations of motion directly [61], the redundancy of the variables and the extra effort to deal with the additional constraints are rather inconvenient. To represent the orientation and the angular degrees of freedom, a more common practice in simulations of physical problems is to use Euler angles (ϕ, θ, ψ) as rotational degrees of freedom (for one definition of Euler angles, see page 86 of Allen & Tildesley [86]). There are also other choices of three angle variables to represent the rotational degrees of freedom, like the Cardan angles [87], also called Tait-Bryan angles, commonly used in aircraft engineering and in fluid dynamics as yaw, pitch and roll. A general drawback of those angle representations is the singularity in the transformation between the angle variables and the components of the rotation matrix R, e.g. sin ϕ = rzx / sin θ when θ approaches nπ (n = 0, 1, ...). The same instability also occurs in the equations of motion for the Euler angles. To avoid this instability, for the representation of the rotational degrees of freedom in the equations of motion, we choose unit quaternions (q = [s, (x, y, z)], ||q|| = √(s² + x² + y² + z²) = 1) and their time derivatives. Unit quaternions have four components with one straightforward length constraint, which is far less cumbersome than the 3 × 3 rotation matrix, and without the singularity problem which Euler angles have. A rotation by an angle α around an arbitrary unit vector u (which is the rotation


axis) can be represented by the unit quaternion q = [cos(α/2), sin(α/2)u]. The full rotation of a vector r can be composed from the unit quaternion q and its conjugate q∗ = [cos(α/2), −sin(α/2)u] via the transformation r′ = qrq∗. Because of cos(α/2) = cos(−α/2), only α/2 instead of α appears in the argument of the trigonometric functions: the occurrence of α/2 in both q and q∗ leads in total to a rotation by α. A point of a particle in the space-fixed coordinate system at time t can be represented by the corresponding vector pb in the body-fixed coordinate system together with the unit quaternion q as

    p(t) = q(t)pb q(t)∗ + rc(t),        (2.7)

where q∗ = [s, −(x, y, z)] is the conjugate of q and rc is the center of mass of the particle in the space-fixed coordinate system. The rotation matrix R which performs the same rotation as the quaternion q = [s, (x, y, z)] is given by

    R = ( 1 − 2(y² + z²)   2(xy − sz)       2(xz + sy)
          2(xy + sz)       1 − 2(x² + z²)   2(yz − sx)
          2(xz − sy)       2(yz + sx)       1 − 2(x² + y²) ).        (2.8)

2.1.2 Equation of motion

The motion of granular particles is decomposed, according to König's theorem [85], into the translational motion of the centers of mass and the rotational motion around the centers of mass. The translational motion of a particle at time t obeys Newton's law

    r̈(t) = M⁻¹F(t),        (2.9)

where M is the mass matrix, r(t) the vector of the center of mass and F(t) the external force vector. The external force

    F(t) = G + Σ_{j=1}^{l} fcj(t)        (2.10)

is the sum of the gravitational force G and the contact forces fcj(t) (l is the number of contacts). The contact force consists of normal and tangential components, discussed in Chapter 5. Though in this thesis we focus on repulsive contact forces in the normal direction for dry granular particles, attractive contact forces can also be included in Eq. (2.10) to simulate cohesive particles. A crucial difference between the translational degrees of freedom and the rotational degrees of freedom is that rotations don't commute (like matrix multiplications). The angular degrees of freedom are subject to Euler's equation of motion

    L̇ = τ,        (2.11)


in which L is the angular momentum and τ the external torque introduced by the contact forces. Substituting L = Iω into Eq. (2.11) gives

    İω + Iω̇ = τ,        (2.12)

where I is the moment of inertia and ω the angular velocity. While Newton's equation of motion is Galilei-invariant, i.e. it can be solved in any coordinate system, the Euler equation of motion must be solved in the center-of-mass system. Thus, Eq. (2.9) is valid in any chosen space-fixed coordinate system, while the quantities in Eq. (2.11) and Eq. (2.12) are in the body-fixed coordinate system whose origin is located at the center of mass of the particle. The transition between the space-fixed and the body-fixed coordinate system is through the rotation matrix R, e.g. for the angular velocity ωs = Rωb (the subscript s stands for the space-fixed and b for the body-fixed coordinate system). Hereafter, variables in the space-fixed coordinate system will be used without any subscript, while variables in the body-fixed coordinate system will be specified by the subscript or superscript b.

In section 2.1.1 we mentioned that Euler angles are not a satisfactory choice for representing the orientations, due to the singularity in the transformation between the angle variables and the components of the rotation matrix. As angular degrees of freedom, they suffer from the same singularity in the equations of motion, as can be seen from the first time derivatives of the Euler angles (ϕ, θ, ψ),

    ( ϕ̇ )   ( −sin ϕ cos θ / sin θ    cos ϕ cos θ / sin θ    1 ) ( ωx )
    ( θ̇ ) = (  cos ϕ                   sin ϕ                 0 ) ( ωy ) :
    ( ψ̇ )   (  sin ϕ / sin θ          −cos ϕ / sin θ         0 ) ( ωz )

when θ approaches nπ, the evaluation of ϕ̇ and ψ̇ becomes unstable or impossible. For unit quaternions q and their time derivatives, from Eq. (2.11) and Eq. (2.12), the second time derivative of the unit quaternion q is

    q̈ = ½(ω̇q + ωq̇),        (2.13)

with the auxiliary equations

    ω = 2q̇q∗,                 (2.14a)
    ω̇ = I⁻¹(L × ω + τ),       (2.14b)
    L = Iω,                    (2.14c)
    I = R Ib Rᵀ,               (2.14d)
    I⁻¹ = R Ib⁻¹ Rᵀ,           (2.14e)

in which Ib is the moment of inertia in the body-fixed system. Eq. (2.13) is solved as a


substitute for solving Eq. (2.12) directly (to avoid the use of the time derivative of the inertia tensor). In practice, several points about solving Eq. (2.13) with its auxiliary Eqs. (2.14) are worth mentioning:

1. The moment of inertia I is obtained from the transformation Eq. (2.14d), instead of computing it from its time derivative in the space-fixed coordinate system (which is very complicated).

2. The inverse of the moment of inertia, I⁻¹, needed for computing the time derivative of the angular velocity ω̇ (Eq. (2.14b)), is computed by the transformation Eq. (2.14e) rather than by a direct matrix inversion, which is much more time consuming and less stable than matrix multiplication in numerical computation.

3. The direct computation of the moment of inertia from the particle geometry, and the matrix inversion to compute its inverse, are needed only once, in the body-fixed coordinate system at the initial stage of the simulation.

4. The unit quaternion q should be normalized at each time step after the integrator is called, to preserve its unit length; otherwise it would not only rotate the particle but also change its shape.

5. We noticed in our research that the time derivative q̇ of the unit quaternion should also be constrained, which is not stated in any references we have found so far. Since ω is a pure vector, the scalar part of the quaternion product q̇q∗ on the right-hand side of Eq. (2.14a) must vanish:

    sq̇ sq + xq̇ xq + yq̇ yq + zq̇ zq = 0,        (2.15)

i.e. q̇ and q must be orthogonal (in the four-vector sense). The orthogonalization should be performed after each call of the integrator, as should the normalization of the quaternion.
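Points 4 and 5 above, together with the quaternion-to-matrix conversion of Eq. (2.8), can be sketched as follows (in Python rather than the FORTRAN of the actual code; q is stored as the tuple (s, x, y, z)):

```python
import math

def normalize(q):
    """Point 4: rescale q = (s, x, y, z) to unit length after each integrator call."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def orthogonalize(qdot, q):
    """Point 5 / Eq. (2.15): remove the component of qdot parallel to the unit
    quaternion q, so that the four-vector dot product qdot . q vanishes."""
    d = sum(a * b for a, b in zip(qdot, q))
    return tuple(a - d * b for a, b in zip(qdot, q))

def rotation_matrix(q):
    """Rotation matrix of Eq. (2.8) for the unit quaternion q = (s, x, y, z)."""
    s, x, y, z = q
    return [[1 - 2*(y*y + z*z), 2*(x*y - s*z),     2*(x*z + s*y)],
            [2*(x*y + s*z),     1 - 2*(x*x + z*z), 2*(y*z - s*x)],
            [2*(x*z - s*y),     2*(y*z + s*x),     1 - 2*(x*x + y*y)]]
```

For example, for a rotation by α about the z-axis, q = (cos(α/2), 0, 0, sin(α/2)) yields the familiar planar rotation matrix in the upper-left 2×2 block.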

2.2 Time integration

As the numerical approximation for the equations of motion, Eq. (2.9) and Eq. (2.13) (with the auxiliary Eqs. (2.14)), the backward-difference formula (BDF), in the form of the “Gear predictor-corrector” [83, 86], is used. Its advantages are its stability and efficiency: it is an implicit method which, in the predictor-corrector formulation, needs neither a matrix inversion nor the solution of a non-linear system of equations. In the following, the principle of implicit predictor-corrector methods is explained and the Gear predictor-corrector formulation for second-order ordinary differential equations (ODEs) is given. The efficiency and stability of the method are discussed briefly.


2.2.1 Explicit and implicit methods

We will explain here the principle of the implicit predictor-corrector methods with the (first-order) Euler methods as example. Since we can always rewrite a higher-order system of differential equations as a lower-order system by transforming to a larger number of equations and variables, we start from the general form of a first-order differential equation,

    ẏ(t) = f(t).        (2.16)

Fig. 2.2 Euler explicit: the new solution y(t) + δt·f(t) at time t + δt is computed from the old solution y(t) and the old gradient f(t) at time t.

In principle, if we want to discretize this equation in lowest order by replacing the differential with the finite difference δt, we have two choices: we can either use the “old” gradient

    (y(t + δt) − y(t)) / δt = f(t)        (2.17)

as the right-hand side, which is called the “forward formula”: seen from t, the tangent is taken in the forward direction (Fig. 2.2). Alternatively, we can use the new gradient

    (y(t + δt) − y(t)) / δt = f(t + δt),        (2.18)

which is the “backward formula” for the tangent as seen from t + δt (Fig. 2.3). When we rewrite the equations, from Eq. (2.17) we obtain the solution

    y(t + δt) = y(t) + δt·f(t),        (2.19)


Fig. 2.3 Euler implicit: the new solution y(t) + δt·f(t + δt) at time t + δt is computed from the old solution y(t) at time t and the new gradient f(t + δt) at time t + δt.

from the old solution y(t) and the old right-hand side f(t). This is an “explicit formula”, because all terms necessary to compute y(t + δt), namely y(t) and f(t), are known at time t. In contrast, from Eq. (2.18) the “implicit formula”

    y(t + δt) = y(t) + δt·f(t + δt)        (2.20)

makes use of the known y(t) and the not yet known f(t + δt). To obtain an estimate for the yet unknown right-hand side f(t + δt) at the new time step t + δt, we can approximate it via the explicit Euler formula in a “predictor step”,

    yᵖ(t + δt) = y(t) + δt·f(t),        (2.21)

from which we can compute fᵖ(t + δt). We then advance y(t) to

    y(t + δt) = y(t) + δt·fᵖ(t + δt).        (2.22)

As can be seen from Fig. 2.2 and Fig. 2.3, for the given exact trajectory the numerical approximation via the explicit method overestimates the solution, while the implicit method underestimates it. Accordingly, in the predictor-corrector scheme, the predicted and corrected values can be combined in such a way that the errors partially compensate, which leads to a smaller error constant and better stability than either method alone.
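The predictor-corrector pair (2.21)/(2.22) can be written out directly (a sketch for the slightly more general right-hand side f(t, y), so that the prediction actually enters the correction):

```python
def pc_euler_step(f, t, y, dt):
    """One Euler predictor-corrector step for y' = f(t, y):
    predictor, Eq. (2.21): explicit Euler estimate of y at t + dt;
    corrector, Eq. (2.22): implicit Euler with f evaluated at the prediction."""
    y_pred = y + dt * f(t, y)             # predictor step
    return y + dt * f(t + dt, y_pred)     # corrector step

# Example: the linear test equation y' = -10*y with y(0) = 1
y, t, dt = 1.0, 0.0, 0.01
for _ in range(100):
    y = pc_euler_step(lambda t, y: -10.0 * y, t, y, dt)
    t += dt
# after t = 1, y approximates exp(-10) with the low-order accuracy of the scheme
```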


For higher-order methods, higher-order predictors have to be used, which use not only y(t), f(t) from the current time step but also y(t − δt), f(t − δt), ... from previous time steps. In this way, a “higher-order expansion” for the predicted value is obtained. To obtain the higher-order corrector formulae, either the combination of old right-hand sides f(t), f(t − δt), f(t − 2δt), ... is used (Adams-Moulton family [74]), or the old right-hand sides f(t), f(t − δt), f(t − 2δt), ... together with the old function values y(t), y(t − δt), ... (Gear family [83]). Since for computing the results of a new time step not only the results from the current time step but also those from previous time steps are used, predictor-corrector formulas are classified as multi-step methods, in contrast to one-step methods such as the Runge-Kutta methods. We chose the Gear predictor-corrector formulation for its stability and efficiency. The advantage of the Gear predictor-corrector over the Adams-Moulton family is that it is “stiffly stable”, i.e. able to neglect small oscillations in the solution, and “A-stable”, i.e. able to approximate the solution of some equations with arbitrarily large time steps. Moreover, it is an implicit method which, in the predictor-corrector form, does not need a matrix inversion or the solution of a non-linear system of equations, as is the case for implicit Runge-Kutta methods.

2.2.2 Gear predictor-corrector for 2nd-order ODEs

A common practice in the solution of higher-order ODEs is to rewrite them as a system of first-order ODEs, since most solvers in the field of numerical mathematics are developed for first-order ODEs. The second-order Newton equation of motion for a particle (Eq. (2.9)) can be rewritten as two coupled first-order ODEs,

    ṙ(t) = v,
    v̇(t) = M⁻¹F(t),

and solved simultaneously. With the Gear predictor-corrector algorithms, we have the choice whether we want to perform this transformation or whether we want to solve the second-order equations of motion (Eq. (2.9) and Eq. (2.13)) directly. Taking the equation of motion for translation as example, let us denote by r0 the vector of the center of mass and by

    rn = (δtⁿ / n!) · dⁿr0/dtⁿ

the n-th time derivative of r0 rescaled by the factor δtⁿ/n!. The six-value predictor for ri (i = 0, 1, ..., 5) at time t + δt from t is a simple application of the Taylor series,

    ( r0ᵖ(t + δt) )   ( 1  1  1  1  1  1  ) ( r0(t) )
    ( r1ᵖ(t + δt) )   ( 0  1  2  3  4  5  ) ( r1(t) )
    ( r2ᵖ(t + δt) ) = ( 0  0  1  3  6  10 ) ( r2(t) )        (2.23)
    ( r3ᵖ(t + δt) )   ( 0  0  0  1  4  10 ) ( r3(t) )
    ( r4ᵖ(t + δt) )   ( 0  0  0  0  1  5  ) ( r4(t) )
    ( r5ᵖ(t + δt) )   ( 0  0  0  0  0  1  ) ( r5(t) )


The corrected values riᶜ take the form

    ( r0ᶜ(t + δt) )   ( r0ᵖ(t + δt) )   ( c0 )
    ( r1ᶜ(t + δt) )   ( r1ᵖ(t + δt) )   ( c1 )
    ( r2ᶜ(t + δt) ) = ( r2ᵖ(t + δt) ) + ( c2 ) ∆r,        (2.24)
    ( r3ᶜ(t + δt) )   ( r3ᵖ(t + δt) )   ( c3 )
    ( r4ᶜ(t + δt) )   ( r4ᵖ(t + δt) )   ( c4 )
    ( r5ᶜ(t + δt) )   ( r5ᵖ(t + δt) )   ( c5 )

in which ∆r = r2ᶜ(t + δt) − r2ᵖ(t + δt) is the difference between the predicted value r2ᵖ(t + δt) and the corrected value r2ᶜ obtained by substituting the predicted values r1ᵖ(t + δt) and r0ᵖ(t + δt) into the equation of motion for translation (Eq. (2.9)). The coefficients ci for the six-value corrector are listed in Table 2.1, see also the references [83, 86]. Note that for second-order ODEs where the first time derivative appears on the right-hand side of the equation, as in our case, where the velocities and the first time derivative of the quaternion are used for evaluating forces and torques, the coefficients for the higher-order correctors are slightly different from the standard ones for velocity-independent forces.

Table 2.1 Gear corrector coefficients for second-order differential equations like r̈ = f(r, ṙ) (velocity-dependent forces), of order two to five.

    Order    c0       c1         c2    c3       c4      c5
    2        0        1          1
    3        1/6      5/6        1     1/3
    4        19/90    3/4        1     1/2      1/12
    5        3/16     251/360    1     11/18    1/6     1/60

The variables used in the Gear predictor-corrector formula,

    rn = (δtⁿ / n!) · dⁿr0/dtⁿ,

are the physical quantities dⁿr0/dtⁿ rescaled by factors δtⁿ/n! (powers of the time step size δt); e.g. r1 corresponds to the velocity rescaled by δt, and r2 corresponds to the acceleration rescaled by δt²/2. Such time-step-rescaled variables were originally introduced by Nordsieck [88] for the purpose of changing the time step easily. Since the corrector coefficients would be time-step dependent when using the original variables without rescaling, Gear adopted this variable representation when he constructed the methods [83], while Allen and Tildesley [86] made it common practice when using Gear predictor-corrector methods, without explanation.
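For a scalar second-order ODE, the predictor (2.23) and corrector (2.24) with the fifth-order coefficients of Table 2.1 can be sketched as follows (a minimal Python transcription with one corrector evaluation per step; the harmonic-oscillator example is illustrative):

```python
# Pascal-triangle predictor matrix of Eq. (2.23).
PREDICT = [[1, 1, 1, 1, 1, 1],
           [0, 1, 2, 3, 4, 5],
           [0, 0, 1, 3, 6, 10],
           [0, 0, 0, 1, 4, 10],
           [0, 0, 0, 0, 1, 5],
           [0, 0, 0, 0, 0, 1]]
# Fifth-order corrector coefficients (Table 2.1, velocity-dependent forces).
C = [3/16, 251/360, 1, 11/18, 1/6, 1/60]

def gear_step(r, accel, dt):
    """One Gear step for r'' = accel(r, v), with r the Nordsieck vector
    [r0, ..., r5], rn = (dt**n / n!) * d^n r0 / dt^n."""
    rp = [sum(row[j] * r[j] for j in range(6)) for row in PREDICT]
    a = accel(rp[0], rp[1] / dt)          # evaluate the RHS at the predicted state
    dr = 0.5 * dt * dt * a - rp[2]        # defect in the rescaled acceleration r2
    return [rp[i] + C[i] * dr for i in range(6)]

# Example: harmonic oscillator r'' = -r, started from the exact Nordsieck
# vector of cos(t) at t = 0 (derivatives 1, 0, -1, 0, 1, 0, rescaled).
dt = 0.01
r = [1.0, 0.0, -dt * dt / 2, 0.0, dt ** 4 / 24, 0.0]
for _ in range(100):                      # integrate to t = 1
    r = gear_step(r, lambda x, v: -x, dt)
# r[0] now approximates cos(1)
```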


Fig. 2.4 Simulation of a bouncing ball (height z versus time t) with gravitational constant g = 9.8, mass m = 1, damping constant 0.3 and spring constant k = 10³, dropped from height h = 2 onto a floor: the explicit Runge-Kutta method (Prince-Dormand 4/5th order) needs time steps so small, even at the equilibrium position, that the individual steps cannot be resolved in the graph, while the Gear backward-difference formula (orders 1 to 5) allows much larger time steps.

2.2.3 Stability

Many texts by authors with an applied-science rather than a numerical-analysis background (notably the first two editions of Numerical Recipes [89, 90], or Garcia's "Numerical Methods for Physics" [91]) focus exclusively on accuracy when they discuss the numerical integration of ordinary differential equations. Nevertheless, for an efficient use of such methods, it is desirable to choose the time step δt as large as possible. The largest usable time step is not determined by the accuracy properties of an algorithm (usually given by the Taylor order), but by its stability properties. Stability is a prerequisite for accuracy, as can be seen from numerical analysis texts, where stability is proved before accuracy. Moreover, numerical analysis classifies integration methods by their behavior with respect to small noise: "A-stable" and "stiffly stable" methods are able to suppress such noise [74, 84]. The advantage of stiffly stable behavior can be seen in Fig. 2.4, where a bouncing particle is simulated with the same error for large time steps with the adaptive Gear family of orders 1 to 5, while the Runge-Kutta method of orders 4 and 5 ("Prince-Dormand" [74, 84]) needs much smaller time steps, and accordingly many more function evaluations and much more computer time. The difference is especially visible at the equilibrium position at the end, for which the Gear family can select time steps orders of magnitude larger. In our simulation, we have implemented only a constant-time-step algorithm, because the changing many-particle contacts do not allow much variation in the time step: the characteristic frequency of packings is given by the mass of the particles and the Young's modulus. Nevertheless, the maximal time step which can safely be chosen is much larger than for any comparable method.
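The contrast in Fig. 2.4 can be reproduced with off-the-shelf integrators. The following sketch uses SciPy's solvers in place of the thesis implementation (BDF playing the role of the stiffly stable Gear family); the force law and the interpretation of "damping constant 0.3" as a viscous coefficient are assumptions based on the figure caption.

```python
from scipy.integrate import solve_ivp

# Bouncing ball of Fig. 2.4 (parameters assumed from the caption): free
# fall above the floor at z = 0, linear spring-dashpot force in contact.
g, m, k, c = 9.8, 1.0, 1.0e3, 0.3

def rhs(t, y):
    z, v = y
    contact = -k * z - c * v if z < 0.0 else 0.0   # floor reaction
    return [v, -g + contact / m]

kw = dict(t_span=(0.0, 20.0), y0=[2.0, 0.0], rtol=1e-6, atol=1e-9)
explicit = solve_ivp(rhs, method="RK45", **kw)   # explicit Runge-Kutta
implicit = solve_ivp(rhs, method="BDF", **kw)    # stiffly stable BDF
print(explicit.nfev, implicit.nfev)              # function-evaluation counts
```

The function-evaluation counts illustrate the cost difference between the two solver classes; the exact numbers depend on the tolerances chosen.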

2.2.4 Number of iterations

In the previous section, we have assumed that a single corrector step gives a result which is "good enough", namely that it deviates only slightly from the predictor step, which is computed under the assumption that the force does not change until the next time step. In fact, the corrector steps can be iterated, so that if the deviation between the predictor and the corrector result is too large for a given time step δt, a "good" result can still be obtained for a "too large" time step by successive corrector iterations. In "Computer Simulation of Liquids" by Allen & Tildesley [86] it was remarked that corrector iterations do not improve the results significantly, but that remark applies to the smooth potentials of inter-molecular forces, not to our rather stiff potentials. As we adapt the time step to the oscillation frequency of particles which are more or less in force equilibrium, the time step may actually be too large for particles which accumulate speed in free fall and then impact into a granular assembly. In that case, a simulation without additional corrector iterations may "explode", i.e. unphysically fail to dissipate the kinetic energy of the impacting particles, and transmit the excess kinetic energy to neighboring particles.

Chapter 3

Geometry and Contact Detection

In our DEM formulation, a granular particle occupies a certain space and preserves a certain shape. Though most studies of granular materials use disks in two dimensions or spheres in three dimensions, realistic particles are polyhedral rather than round. Thus, in our DEM simulation, we model granular particles as convex polyhedra. In this chapter, we discuss how polyhedral particles are represented and generated in the simulation and how their geometric and physical properties (volume, mass, center of mass and moment of inertia) are computed. The overlap detection algorithm, which is closely related to the particle geometry, is also discussed.

3.1 Particle geometry

3.1.1 Basic concepts and data structure

While for round particles the radius alone represents the geometry, for polyhedral particles more considerations are necessary for their usage in a DEM simulation. While the basic concepts explained in this section can be found in books on computational geometry [92], we focus on how features like vertices, edges and faces have to be represented to compute the overlap of polyhedra. In the following, it will sometimes be convenient to talk about "features" [63], which means vertices, edges and faces of a polyhedron. To represent a polyhedron, we need to specify two kinds of information: geometric and topological. The geometric information refers to the boundary of the polyhedron, that is, the surface which separates the interior (bounded) from the exterior (unbounded) region. The surface of a polyhedron is a set of polygons which are triangles or can be subdivided into triangles in our algorithm (such subdivisions are termed triangulations in computational geometry). A face of a polyhedron is one of its separating triangles. An edge is the intersection of two adjacent faces. A vertex is the intersection of several edges. The topological information needed is the adjacencies and connectivities


of the vertices, edges and faces. In our simulation, only limited but sufficient features are used, namely the vertices and faces and their connectivity. A vertex is a point in three-dimensional Euclidean space and is represented by its Cartesian coordinates V = (Vx, Vy, Vz). For a polyhedron with n_v vertices, the array VERT_COORD(1 : 3, 1 : n_v) is used to store the coordinates of the vertices, where the first index runs over the coordinate axes and the second index over all the vertices. For our purpose, it is convenient to describe a plane by the point-normal form

    n · r − d = 0,    (3.1)

where · denotes the vector inner product, n = (nx, ny, nz) is the unit normal vector of the plane, r = (rx, ry, rz) is an arbitrary point on the plane and d is the distance from the origin to the plane. Thus the four parameters (nx, ny, nz, −d) can be used to represent a plane. A face is represented by a table with its vertex indices (referred to as FACE_VERTEX_TABLE hereafter) and by the coefficients of the plane equation. For the n_f triangular faces of a polyhedron, we use the FACE_VERTEX_TABLE(1 : 3, 1 : n_f) array to store the vertex indices of each face and the FACE_EQUATION(1 : 4, 1 : n_f) array to store the coefficients of the plane equations. Suppose a face is determined by three vertices (V1, V2, V3); when the coordinates of the vertices (Vix, Viy, Viz) are known, we can compute the unit normal vector of the triangular face by the vector cross product (indicated by ×)

    n = (V2 − V1) × (V3 − V1) / ||(V2 − V1) × (V3 − V1)||.    (3.2)

The distance d can be solved for by substituting the coordinates of one of the vertices, e.g. V1, into Eq. (3.1), so that we obtain

    d = n · V1.    (3.3)

As can be seen in Fig. 3.1, the unit normal vectors are chosen to point towards the outside of the polyhedron. If Eq. (3.1) holds, a point r is on the surface of the polyhedron. If the condition

    n_i · r − d_i > 0    (3.4)

is fulfilled for any of the n_f faces, the point is outside the polyhedron. If

    n_i · r − d_i < 0    (3.5)

for all faces, it is inside the polyhedron. Instead of providing adaptive data structures with a variable number of features (vertices, edges and faces) for each polyhedron, we have fixed a maximal default size for the number of vertices, edges and faces. The advantage of saving
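The inside/outside tests of Eqs. (3.4) and (3.5) can be written directly in code. The following is a sketch for a unit tetrahedron with precomputed outward normals; the function name `classify` is illustrative.

```python
import numpy as np

# Point classification against a convex polyhedron from its face equations
# (Eqs. (3.4)/(3.5)); outward unit normals n_i and distances d_i play the
# role of the FACE_EQUATION entries.
def classify(point, normals, dists, eps=1e-12):
    s = normals @ point - dists       # signed distances n_i . r - d_i
    if np.all(s < -eps):
        return "inside"
    if np.any(s > eps):
        return "outside"
    return "surface"

# Tetrahedron with vertices (0,0,0), (1,0,0), (0,1,0), (0,0,1).
normals = np.array([[0., 0., -1.],    # face z = 0
                    [0., -1., 0.],    # face y = 0
                    [-1., 0., 0.],    # face x = 0
                    [1., 1., 1.]])    # slanted face x + y + z = 1
normals[3] /= np.sqrt(3.0)
dists = np.array([0.0, 0.0, 0.0, 1.0 / np.sqrt(3.0)])

print(classify(np.array([0.1, 0.1, 0.1]), normals, dists))
```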



Fig. 3.1 A sketch of the geometry of a tetrahedron and the topological information of its vertices and faces: Vi stands for the vertices and Fi for the triangular faces, with their normals ni and distances to the origin di. F1, for example, consists of the three vertices (V1, V2, V3), and the equation of the plane on which it lies is n1 · r − d1 = 0, in which r is an arbitrary point on the plane.

VERT_COORD (vertex index 1 → 4):
    V1x  V2x  V3x  V4x
    V1y  V2y  V3y  V4y
    V1z  V2z  V3z  V4z

FACE_VERTEX_TABLE (face index 1 → 4):
    1  1  1  2
    2  3  4  4
    3  4  2  3

FACE_EQUATION (face index 1 → 4):
    n1x  n2x  n3x  n4x
    n1y  n2y  n3y  n4y
    n1z  n2z  n3z  n4z
    d1   d2   d3   d4

VERTEX_FACE_TABLE (vertex index 1 → 4):
    1  1  1  2
    2  3  2  3
    3  4  4  4

Fig. 3.2 An example of the VERT_COORD, FACE_EQUATION, FACE_VERTEX_TABLE and VERTEX_FACE_TABLE arrays used in the DEM simulation to represent the tetrahedron of Fig. 3.1: (Vix, Viy, Viz) are the coordinates of the vertex Vi and (nix, niy, niz) are the components of the normal vector ni of the triangular face Fi.

The memory saved by adaptive data structures is not significant for codes like ours, which use only several dozen MB of storage on today's computers, while the disadvantage, much more complicated algorithms and debugging, is rather prohibitive. Since we need to compute the overlap of two polyhedra (Chapter 4) to determine the contact force (Chapter 5), it is convenient to construct a VERTEX_FACE_TABLE(1 : nvf_max, 1 : n_v) array, in which for each vertex the faces on which it lies are stored (nvf_max is the maximum number of faces a vertex can lie on). It can be regarded as a "reverse table" of the FACE_VERTEX_TABLE array.


The necessary information for representing a polyhedron in our algorithm consists of the vertex coordinates (the VERT_COORD array, the geometric information) and the topological information, the FACE_VERTEX_TABLE array. The point-normal equations of the faces (the FACE_EQUATION array) and the VERTEX_FACE_TABLE array, which are used as auxiliary information, can be derived from the VERT_COORD and FACE_VERTEX_TABLE arrays. An example of how these arrays appear in the simulation is given in Fig. 3.2 for the tetrahedron in Fig. 3.1.

3.1.2 Particle generation and geometry update

Fig. 3.3 Various kinds of polyhedra for DEM simulations: the two irregularly shaped polyhedra in the upper left corner are generated by a convex hull algorithm, while the remaining polyhedra are generated by selecting points on the surface of an ellipsoid or sphere.

Irregularly shaped polyhedra can be generated relatively easily with convex hull algorithms [92, 93] applied to random points, but the result is rather random connectivities with strongly varying edge lengths (Fig. 3.3). For a more regular geometry, we choose points on the surface of an ellipsoid with given half radii [94], see Fig. 3.3. From MATLAB's built-in convex hull function, for a set of random points, we directly obtain the vertex coordinates (VERT_COORD array) and the faces in terms of vertex indices (FACE_VERTEX_TABLE array) of the polyhedron which is the convex hull of the initial points. To generate a polyhedron of regular geometry, we start from one of the poles of an ellipsoid and select points sequentially on several layers (perimeters) towards the other pole. In Schinner's scheme [94], as the layers approach the equator, the number of points selected is doubled (as shown in Fig. 3.4, left), while after crossing the equator towards the other pole the number is halved. A problem with this method is that the length of the edges at the equator is considerably smaller, while the "density" of the edges is larger than near the poles, when the number of layers or the number of points on the first layer is
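The convex-hull workflow described above can be sketched with SciPy's convex hull standing in for the MATLAB built-in (an assumption; the thesis uses MATLAB). The remapping step reflects that the face table should index the polyhedron's own vertex list.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Build VERT_COORD and FACE_VERTEX_TABLE from random points via a convex
# hull (SciPy stands in here for the MATLAB built-in used in the text).
rng = np.random.default_rng(0)
points = rng.standard_normal((40, 3))

hull = ConvexHull(points)
# hull.simplices indexes into `points`; remap so the face table indexes
# the polyhedron's own vertices, as in the thesis data structure.
remap = {p: i for i, p in enumerate(hull.vertices)}
VERT_COORD = points[hull.vertices].T                           # 3 x n_v
FACE_VERTEX_TABLE = np.vectorize(remap.get)(hull.simplices).T  # 3 x n_f
```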


large. We have modified the algorithm so that we use the same number of points for the two layers next to the poles and twice that number for the rest, see Fig. 3.4, right. If we connect the vertices on a layer to those on its adjacent layers (or to the pole) with triangles, we obtain the faces. For the same number of vertices and faces, our method generates polyhedra with a better triangulation (smaller variation of the triangle areas). Small variations in the edge length are beneficial for our computation, as our overlap algorithm has problems in dealing with degenerate features (edge-edge intersections), as will be discussed in Chapter 4. As mentioned in Chapter 2, the configuration (position and orientation) of a polyhedron changes in the space-fixed coordinate system while it remains the same in the body-fixed coordinate system. While the position of the center of mass of the polyhedron moves according to Newton's equation of motion (Eq. (2.9)), the coordinates of the vertices are rotated with respect to the initial configuration in the body-fixed coordinate system. Thus, a record of the coordinates of each vertex in the body-fixed coordinate system (a VERT_COORD_BODY array) is kept during the simulation and used for computing the new coordinates in the space-fixed coordinate system (a VERT_COORD_SPACE array). For every time step, the new position of the center of mass rc and the new quaternion q for the rotation are obtained from the integration of the equations of motion (Eq. (2.9) and Eq. (2.13)). The vertex coordinates of the polyhedron, the entries of the VERT_COORD_SPACE array, are computed from the corresponding entries of the VERT_COORD_BODY array together with the new rc and q, according to the rotation of vectors between coordinate systems via the unit quaternion (Eq. (2.7) in section 2.1.1).
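The geometry update can be sketched as follows. The quaternion-to-matrix conversion is the standard formula for a unit quaternion q = (w, x, y, z); the exact convention of Eq. (2.7) lies outside this chapter, so it is an assumption here, as are the function names.

```python
import numpy as np

# Update VERT_COORD_SPACE from VERT_COORD_BODY with the new center of
# mass rc and unit quaternion q = (w, x, y, z), via a rotation matrix.
def quat_to_matrix(q):
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def update_vertices(vert_body, rc, q):
    # vert_body: 3 x n_v array of coordinates in the body-fixed frame
    return quat_to_matrix(q) @ vert_body + rc[:, None]

# A 90-degree rotation about the z-axis moves (1, 0, 0) to (0, 1, 0).
q = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
v = update_vertices(np.array([[1.0], [0.0], [0.0]]), np.zeros(3), q)
```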

3.2 Physical properties

To treat a polyhedron as a rigid body with the Euler-Newton equations of motion, we need its physical properties: volume, mass, center of mass and moment of inertia. Such information is computed most easily by a decomposition scheme [94]: partition the polyhedron into several special tetrahedra with orthogonal features and then sum up the properties of these special tetrahedra appropriately. The physical properties are computed at the beginning of the DEM simulation, after the particle features have been initialized.

3.2.1 Decomposition of a polyhedron into tetrahedra

To compute the geometric and physical properties of a convex polyhedron, we divide it into simpler geometrical objects, whose volume, center of mass and moment of inertia are easier to calculate. We use a special kind of tetrahedron as shown in Fig. 3.5, called a pyramid hereafter, whose vertices (V1, V2, V3, V4) are shifted to

    V1 = (0, 0, 0), V2 = (a, 0, 0), V3 = (a, d, 0), V4 = (a, b, c).    (3.6)


Fig. 3.4 Comparison of polyhedra with 62 vertices and 120 faces generated by Schinner's method [94] (left) and by our method (right). To obtain a polyhedral particle, we first divide an ellipsoid into several layers (l1, l2, l3, ...). We start from one of the poles and select m evenly spaced points on the perimeter of the first layer; moving towards the equator, we then double the number of points on each layer until we cross or reach the equator (left), or double the number of points only once (right). After crossing the equator, we halve the number of points on each layer until we reach the last layer (left), or keep the number of points constant and halve it only for the last layer (right). Finally, we connect the selected points of adjacent layers (or the poles) sequentially with triangles.

The face (V1, V2, V3) lies in the X-Y-plane, while the face through (V2, V3, V4) is parallel to the Y-Z-plane. Such a decomposition into pyramids with an edge perpendicular to the base, both parallel to coordinate axes, makes the following computations easier. For the decomposition of a polyhedron P with N triangular faces T_i (i = 1, ..., N) and vertices V_j^{T_i} (j = 1, 2, 3), we choose the origin O = (0, 0, 0) inside P. Then P can be decomposed into N sub-tetrahedra P_i with the vertices (O, V_1^{T_i}, V_2^{T_i}, V_3^{T_i}). Each tetrahedron P_i is further decomposed into pyramids (as shown in Fig. 3.5) with an edge perpendicular to the base. To obtain such special tetrahedra, the line segment from the origin O along the normal of the plane of T_i to its intersection point L_i with that plane is taken. Line segments extend from L_i to the vertices V_j^{T_i}, and the three special tetrahedra P_{i,j} of the


Fig. 3.5 Sketch of a pyramid (vertices V1 = O, V2, V3, V4; edge parameters a, b, c, d; axes x′, y′, z′) for decomposing a complex polyhedron.

Fig. 3.6 A sub-tetrahedron P_i of a polyhedron P: P_i is formed from the origin O = (0, 0, 0) (inside P) and a triangular face T_i; P_i is further divided into three special pyramids by the line OL_i perpendicular to the face T_i. To orient the special pyramids as in Fig. 3.5, e.g. P_{i,1} = (O, L_i, V_1^{T_i}, V_2^{T_i}), the unit vector along OL_i is taken as the x′-axis, the unit vector from O parallel to L_i V_1^{T_i} as the y′-axis, and the cross product of x′ and y′ as the z′-axis.

sub-tetrahedron P_i,

    P_{i,1} = (O, L_i, V_1^{T_i}, V_2^{T_i}),
    P_{i,2} = (O, L_i, V_2^{T_i}, V_3^{T_i}),
    P_{i,3} = (O, L_i, V_3^{T_i}, V_1^{T_i}),


are determined as shown in Fig. 3.6. To orient a pyramid P_{i,j} as in Fig. 3.5, we take the unit vector along OL_i as the x′-axis, the unit vector parallel to L_i V_j^{T_i} as the y′-axis, and the cross product of x′ and y′ as the z′-axis. For P_{i,1} as an example, the axes x′, y′ and z′ of the special pyramid coordinate system (x′, y′, z′),

    x′_(x′,y′,z′) = (1, 0, 0)^T,
    y′_(x′,y′,z′) = (0, 1, 0)^T,    (3.7)
    z′_(x′,y′,z′) = (0, 0, 1)^T,

are as follows in the (x, y, z) coordinate system of the polyhedron P:

    x′_(x,y,z) = OL_i / ||OL_i|| = (x′_x, x′_y, x′_z)^T,
    y′_(x,y,z) = L_i V_1^{T_i} / ||L_i V_1^{T_i}|| = (y′_x, y′_y, y′_z)^T,    (3.8)
    z′_(x,y,z) = x′_(x,y,z) × y′_(x,y,z) = (z′_x, z′_y, z′_z)^T,

where || · || denotes the norm of a vector. The rotation of a vector r′_(x′,y′,z′) in the (x′, y′, z′) coordinate system to a vector r_(x,y,z) in the (x, y, z) coordinate system is performed by the rotation matrix R through

    r_(x,y,z) = R r′_(x′,y′,z′).    (3.9)

Substituting Eq. (3.7) and Eq. (3.8) into Eq. (3.9) gives

    (x′_x, x′_y, x′_z)^T = R (1, 0, 0)^T,
    (y′_x, y′_y, y′_z)^T = R (0, 1, 0)^T,    (3.10)
    (z′_x, z′_y, z′_z)^T = R (0, 0, 1)^T,

from which the rotation matrix 

x′x yx′ zx′

 R =  x′y

yy′

x′z

yz′



 zy′  zz′

(3.11)

is determined. For each pyramid Pi,j , the rotation matrix Rij which turns Pi,j in the (x, y, z) coordinate system into the P′ i,j in the (x′ , y ′ , z ′ ) coordinate system is the transpose of R in Eq. (3.11) 

x′x x′y x′z

 Rij =  yx′

yy′

zx′

zy′



 yz′  . zz′

(3.12)

To calculate the geometric and physical properties, each pyramid P_{i,j} of the polyhedron P in the (x, y, z) system is first turned by the rotation matrix R_ij into P′_{i,j} in the (x′, y′, z′) system, and afterwards the calculated properties are rotated back into the (x, y, z) system via the transpose of R_ij.
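Eq. (3.11) amounts to stacking the frame vectors as columns. A small sketch follows, with illustrative values chosen so that OL_i is perpendicular to L_i V_1, as guaranteed by the construction (OL_i is normal to the face containing V_1).

```python
import numpy as np

# Construct the rotation matrix R of Eq. (3.11) from the foot point L_i
# of the perpendicular and one triangle vertex (illustrative values).
def frame_matrix(L, V1):
    xp = L / np.linalg.norm(L)                # x' axis along O L_i
    yp = (V1 - L) / np.linalg.norm(V1 - L)    # y' axis along L_i V_1
    zp = np.cross(xp, yp)                     # z' axis completes the frame
    return np.column_stack((xp, yp, zp))      # columns are x', y', z'

L = np.array([1.0, 1.0, 0.0])     # foot of the perpendicular
V1 = np.array([1.0, 1.0, 2.0])    # vertex in the plane through L, normal to L
R = frame_matrix(L, V1)
```

Because the three columns are orthonormal and right-handed, R is a proper rotation (R^T R = E, det R = 1), so properties can be rotated back with its transpose, as stated in the text.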

3.2.2 Volume, mass and center of mass

The total volume V_total of a polyhedron P is the sum over its sub-pyramids,

    V_total = Σ_{i=1}^{N} Σ_{j=1}^{3} V′_{i,j},    (3.13)

in which the volume of a sub-pyramid P′_{i,j} (Fig. 3.5) is V′_{i,j} = (1/6) a c d. For a homogeneous density distribution (ρ(r) = const), the mass of a sub-pyramid P′_{i,j} is m_{i,j} = ρ V′_{i,j}, and the total mass of the polyhedron is the sum of the masses of the pyramids,

    m_total = Σ_{i=1}^{N} Σ_{j=1}^{3} V′_{i,j} ρ = ρ V_total.    (3.14)

The center of mass r_P of a polyhedron is given by the weighted average of the centers of mass of the pyramids,

    r_P = Σ_{i=1}^{N} Σ_{j=1}^{3} m_{i,j} r_{i,j} / m_total,    (3.15)

where m_{i,j} is the mass and r_{i,j} the center of mass, in the (x, y, z) coordinate system, of a sub-pyramid. The center of mass of a sub-pyramid P′_{i,j} in the (x′, y′, z′) coordinate system (Fig. 3.5) is given by

    r′_{i,j} = (1/4) (3a, b + d, c)^T.    (3.16)

To compute the center of mass of the polyhedron, r′_{i,j} must be rotated into the (x, y, z) coordinate system of the polyhedron P via the rotation matrix R (Eq. (3.11)),

    r_{i,j} = R r′_{i,j}.    (3.17)

The center of mass of the polyhedron then turns out as

    r_P = Σ_{i=1}^{N} Σ_{j=1}^{3} m_{i,j} R r′_{i,j} / m_total.    (3.18)
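The decomposition idea can be sketched with ordinary tetrahedra (O, V1, V2, V3) built from an interior point and each triangular face, instead of the thesis's special pyramids; volumes and centroids sum in the same way. The triangulated unit cube used as a test body, and the function name, are illustrative assumptions.

```python
import numpy as np

# Volume and center of mass of a convex polyhedron by summing tetrahedra
# (O, V1, V2, V3) formed from an interior point O and each triangular face.
def volume_and_com(verts, faces):
    O = verts.mean(axis=0)                        # interior point (convex body)
    vol, weighted = 0.0, np.zeros(3)
    for (i, j, k) in faces:
        a, b, c = verts[i] - O, verts[j] - O, verts[k] - O
        v = abs(np.dot(a, np.cross(b, c))) / 6.0  # tetrahedron volume
        vol += v
        weighted += v * (O + (a + b + c) / 4.0)   # tetrahedron centroid
    return vol, weighted / vol

# Unit cube, each square face split into two triangles.
verts = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                 dtype=float)
faces = [(0, 1, 3), (0, 3, 2), (4, 5, 7), (4, 7, 6), (0, 1, 5), (0, 5, 4),
         (2, 3, 7), (2, 7, 6), (0, 2, 6), (0, 6, 4), (1, 3, 7), (1, 7, 5)]
vol, com = volume_and_com(verts, faces)
```

For the unit cube this yields volume 1 and center of mass (0.5, 0.5, 0.5), as expected by symmetry.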

3.2.3 Moment of inertia

We start by computing the moment of inertia of the sub-tetrahedron P′_{i,j},

    I′_{i,j} = (ρacd/60) [ b² + bd + c² + d²    −2a(b + d)          −2ca
                           −2a(b + d)           c² + 6a²            −c(2b + d)/2        (3.19)
                           −2ca                 −c(2b + d)/2        b² + bd + 6a² + d² ]

in the (x′, y′, z′) coordinate system. According to the parallel axis theorem (Steiner's theorem [95]), from I′_{i,j} we obtain the moment of inertia of P_{i,j} in the (x, y, z) coordinate system,

    I_{i,j} = R I′_{i,j} R^T + m_{i,j} (a^T a E − a a^T),    (3.20)

in which the rotation matrix R is given by Eq. (3.11), E is the 3 × 3 identity matrix, and a = r_P − r_{i,j} is the vector between the center of mass of the polyhedron P and the center of mass of the pyramid P_{i,j}. As the moment of inertia is linear in the mass density ρ(r), it is additive, which means that the moment of inertia of the polyhedron P is equal to the sum of the moments of inertia of all the sub-pyramids I_{i,j} with respect to the center of mass,

    I_total = Σ_{i=1}^{N} Σ_{j=1}^{3} I_{i,j}.    (3.21)

In a DEM simulation, the body-fixed coordinate system of a polyhedron has its origin at the center of mass r_P, and I_total from Eq. (3.21) is the moment of inertia I_b in the body-fixed coordinate system. While the moment of inertia is fixed in the body-fixed coordinate system, rotating the particle changes its moment of inertia in the space-fixed coordinate system. Therefore, when the particle orientation changes, the moment of inertia has to be updated in each time step (as stated in Eq. (2.14d)).
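Eq. (3.20) in code form, with a displaced point mass as the standard sanity check; the function name is illustrative.

```python
import numpy as np

# Steiner's theorem as in Eq. (3.20): shift an inertia tensor I' (about the
# sub-body's own center of mass) by the offset vector a, with rotation R.
def shift_inertia(I_body, R, m, a):
    E = np.eye(3)
    return R @ I_body @ R.T + m * (np.dot(a, a) * E - np.outer(a, a))

# A unit point mass (I' = 0) displaced by a = (0, 0, 1) must give
# I = diag(1, 1, 0): full inertia about the x- and y-axes, none about z.
I = shift_inertia(np.zeros((3, 3)), np.eye(3), 1.0,
                  np.array([0.0, 0.0, 1.0]))
```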

3.3 Contact detection

For a general simulation with n particles, in principle the interaction with all other n − 1 particles must be computed. This gives a computational effort of the order n(n − 1)/2, which is expressed in "big-O" notation (highest order of the variable only, without prefactors) as O(n²). The prefactor 1/2 stems from the fact that for the force F_ij of particle i on particle j, we have F_ji = −F_ij from Newton's "action = reaction" principle, so every interacting pair has to be dealt with only once. If the interaction is "long-range", it is in principle necessary to evaluate all these interaction forces directly (if the interaction falls off more steeply than 1/r², cut-offs can be used, or at least often are used). The effort is O(n²) e.g. for gravitational problems, in electrodynamics with unscreened Coulomb forces, or for the interaction of vortices in fluid dynamics, if no tricks like Ewald sums or oct-trees can be used.


Fig. 3.7 Contact detection via bounding boxes in 2D: (a) non-intersecting bounding boxes, no overlap between the two polygonal particles; (b) intersecting bounding boxes (dashed lines) for overlapping particles; (c) intersecting bounding boxes (dashed lines) for non-overlapping particles.

Since our DEM interaction is via the particle elasticity, only particles which touch experience an interaction. The contact force depends on the overlap geometry of two intersecting particles (the details of the force models are discussed in Chapter 5). Instead of computing all possible n(n − 1)/2 overlap polyhedra, and obtaining zero overlap most of the time, for short-range interactions like ours we apply a hierarchy of contact detections before the contact list is forwarded to the overlap computation: less computational effort goes hand in hand with less information, but as long as the purpose is only the elimination of non-interacting pairs, this does not matter. We start with "computationally cheap" operations which yield inaccurate contact information, and continue with more "computationally expensive" operations to refine the results. Each time, a longer list of particle pairs which are possibly in contact is reduced by successively more complicated and computationally costly algorithms. We apply the following scheme to detect possibly intersecting particle pairs:

1. Determination of possibly contacting particle pairs: We first compare the extremal coordinates to decide whether there can be an overlap at all (see Fig. 3.7 (a)). The extremal coordinates form the "bounding box" of the particle (see Fig. 3.7 (b)). The comparison of the bounding boxes is used in a "neighborhood algorithm", which is explained in section 3.3.2. From the neighborhood algorithm, the list of possibly contacting particle pairs (referred to as the contact list hereafter) is prepared for the further overlap computation.

2. Refinement of the contact list: In the next step, when we have determined the neighborhood and obtained a contact list, we would like to have a "cheap" method (compared to the full computation of the overlap polyhedron discussed in Chapter 4) to determine whether particles do not overlap even though their bounding boxes intersect, as in Fig. 3.7 (c). The "refinement algorithm" to detect and remove (some of) such cases from the contact list supplements the neighborhood algorithm for polyhedral particles and will be discussed in section 3.3.3.
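The bounding-box comparison of step 1 is a per-axis interval test: two axis-aligned boxes intersect exactly when their intervals overlap on every axis. A minimal sketch (illustrative function name):

```python
# Axis-aligned bounding boxes, given as (lower corner, upper corner)
# tuples, intersect iff their intervals overlap on every axis.
def boxes_intersect(lo1, hi1, lo2, hi2):
    return all(l1 <= h2 and l2 <= h1
               for l1, h1, l2, h2 in zip(lo1, hi1, lo2, hi2))

# Two unit cubes shifted by 0.5 intersect; shifted by 2.0 they do not.
print(boxes_intersect((0, 0, 0), (1, 1, 1),
                      (0.5, 0.5, 0.5), (1.5, 1.5, 1.5)))
print(boxes_intersect((0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3)))
```

As Fig. 3.7 (c) shows, a positive box test is only a necessary condition for particle overlap, which is why the refinement step follows.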

3.3.1 Conventional neighborhood algorithms

Many neighborhood algorithms have been used in computer simulations over the last couple of decades to reduce the cost of computing the O(n²) interactions. We discuss briefly the neighborhood algorithms via Verlet tables and neighborhood tables [86], mainly to show that they are not suitable for our simulation.


Fig. 3.8 Verlet table: For a maximal velocity v and a timestep dt, the maximal distance covered will be dx = vdt. If the Verlet-table has been set up for all particles in a radius R around particle i, after n = R/dx timesteps the Verlet table must be recomputed.

Verlet tables

"Verlet tables" are not to be confused with the integration algorithm rediscovered by the same author [96] (and originally due to Störmer [97]). Verlet tables are built by going over the whole simulation domain, so that the particles j in the neighborhood of a particle i are registered in a list, the Verlet table. These tables are then reused for a certain number of timesteps, depending on the changes of the neighborhoods, or on the velocities, which are simpler to obtain. If the particles move faster, the neighborhoods change faster, and the new Verlet table has to be recomputed earlier, see Fig. 3.8. The amount of work is still O(n²), but if the recomputation of the Verlet table is only necessary every 100 timesteps, the prefactor 1/100 speeds up the computation significantly; still, the neighborhood computation remains the most time-consuming part of the simulation for a large number of particles n. Verlet tables were originally devised for molecular potentials, where the interaction is relatively long range, so also for this reason they are not efficient for our type of simulation. We can estimate the efficiency of a neighborhood routine by comparing the closest packing density with the number of pairs which would result from the routine. If we have an elongated particle which is 2.5 times longer than its "mean" radius, we need a table radius of at least 5 particles. If Verlet tables are computed in two dimensions with a radius of 5 particles, there are about 25 possible collision pairs for each particle, compared to 6 pairs in a closest packing. In three dimensions, the same radius leads to 125 possible collision pairs, compared to 12 pairs in a densest packing. As can be seen, Verlet tables soon become inefficient for elongated particles and higher dimensions, even if one neglects the additional O(n²) effort.


Fig. 3.9 Neighborhood table: Particle i in square (kx, ky) (dark shading) may have overlaps with all particles in the lightly shaded squares. Elongated particles make the inclusion of farther-lying cells necessary.

Neighborhood tables

Neighborhood tables (Fig. 3.9) are arrays in which the particles in a certain region are registered. Interactions are possible with particles in neighboring cells. The tables can be set up easily, e.g. by dividing the particle coordinates by the cell size and rounding to the next integer. Depending on the size of the cells, there is a loop over the neighboring regions in the x-direction, a loop over the neighboring regions in the y-direction, and, if more than one particle is located in a region, a loop over the particles inside the region. Often, if the particles are mono-disperse, the cell size is chosen so that only a single particle fits in a cell, for efficiency [98]. In that case, no additional list of particles in a


Fig. 3.10 Sketch of a cell (gray) and the neighboring cells that need to be checked, in two dimensions and in three dimensions.

Fig. 3.11 A case where only one particle is larger than the rest in the neighborhood table.

cell is needed. In principle, this algorithm also need not be run in every timestep, but can be employed only every n-th time step, with a certain "safety margin". For too elongated particles, or too large size dispersions, the algorithm becomes inefficient, because too many neighboring regions (most of them empty) have to be checked: while in two dimensions 8 neighboring cells can be sufficient even for elongated particles, in three dimensions 3 × 9 − 1 = 26 neighboring cells have to be compared, as shown in Fig. 3.10. This again makes the effort prohibitive. If only a few known particles (e.g. walls) are much larger than the other particles, the efficiency can nevertheless be improved by going over the particle geometry and entering the particle into every cell it overlaps, as shown in Fig. 3.11. Nevertheless, this leads to varying list lengths of colliding particles, which have to be set up after the neighborhood table has been initialized.
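The cell indexing described above (divide the coordinates by the cell size and round down) can be sketched as a hash table of cells; all names here are illustrative.

```python
from collections import defaultdict
from math import floor

# Neighborhood table: hash each particle into a cell by integer division
# of its coordinates, then gather candidates from the 27 surrounding cells.
def build_table(positions, cell):
    table = defaultdict(list)
    for idx, (x, y, z) in enumerate(positions):
        key = (floor(x / cell), floor(y / cell), floor(z / cell))
        table[key].append(idx)
    return table

def candidates(table, pos, cell):
    cx, cy, cz = (floor(c / cell) for c in pos)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                out += table.get((cx + dx, cy + dy, cz + dz), [])
    return out

pos = [(0.1, 0.1, 0.1), (0.9, 0.9, 0.9), (5.0, 5.0, 5.0)]
table = build_table(pos, cell=1.0)
near = candidates(table, pos[0], cell=1.0)   # far-away particle excluded
```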

3.3.2 Neighborhood algorithm via sorting

Both Verlet tables and neighborhood tables have a disadvantage: at certain intervals, the whole neighborhood relation for all particles has to be recomputed from scratch, even if the particles only vibrate around an equilibrium position, i.e. do not move at all. It is more economical if the computational effort is limited to those particles which change their relative positions. Moreover, for the Cundall-Strack friction model (details in Chapter 5), it is necessary to increment data from the previous timestep for each pair of particles in contact. It would be very inconvenient if we had to search previous and new pairs to decide whether a contact already existed before, and which value was registered for the tangential force. Fortunately, a neighborhood algorithm that sorts the axis-aligned bounding boxes for contact detection ("sort and sweep" [62]) solves both problems: its computational effort is proportional to the number of particles which change their relative positions, and for contacts which persist, data from previous timesteps can easily be retrieved for the force computation. While for Verlet and neighborhood tables the shape of the cells is independent of the shape of the particles, i.e. crucial information which could be used to improve the efficiency is neglected, the bounding boxes take this information into account for each particle. Computer science enjoys discussing sorting algorithms in detail and at length. The usual assumptions are that the lists are not ordered, and the bounds given are usually worst-case bounds. Nevertheless, in our simulation we have already partially sorted lists (of the positions of the granular particles), and on top of that, the possible changes in the list are rather limited, as particles under elastic interaction laws cannot move far, usually only small fractions of their diameter in a single time step. Therefore, sorting with binary exchange is sufficient, as we have to decide the possible interactions on a particle-particle basis anyway. We do not have to implement sophisticated sorting schemes, except for the initialization, where a linsort-like effort of O(n²) for a badly chosen initial configuration would be an unnecessary delay.
We are also not affected by "worst case bounds", because after the first timestep our lists will be partially sorted, and only the relative changes of the bounding box positions make additional comparisons necessary. If a granular configuration moved such that all centers of mass had constant velocity, only an O(n) effort would be necessary. Even though O(n²) has been postulated as the worst case [99], in our simulations the effort will still be of O(n). In the field of design, robotics and solid modeling, where the "particles" are not necessarily convex, there is a huge research literature on the problem of "collision detection" (see [100] and references therein). Because we stick with convex polyhedra (even if in future codes non-convex polyhedra will be decomposed into convex polyhedra), we can keep our algorithms and data structures simple.

Implementation for one dimension

First we explain how the "sort and sweep" algorithm works in one dimension for a change of the particle configuration over the timesteps t0, t1 and t2 in Fig. 3.12. A data structure of three linear arrays a, b, c is used, with two entries in each array for each particle i. Thus, each array has the length of twice the number of particles: a(k) contains the lower and upper bounding box values for each particle i,



Fig. 3.12 Configuration (position of the particles and of the bounding box) of a system at timesteps t0 (top), t1 (middle) and t2 (bottom).

b(k) contains -1 if the entry in a(k) is for the lower bound and +1 for the upper bound, c(k) contains the index i of the particle for the entry in a(k).

Initialization at t0: In the first timestep, without overlapping particles, the positions of the bounding boxes are entered in ascending order into array a. For the sketch in Fig. 3.12 (top), the three arrays a, b, c have the following entries:

index  (1)   (2)   (3)   (4)   (5)   (6)
a      0.5   2.0   3.5   6.0   7.0   9.5
b      -1     1    -1     1    -1     1
c       1     1     2     2     3     3

First timestep at t1: Particles move to their new positions in Fig. 3.12 (middle). The old values for the bounding boxes are replaced with the new values:

index  (1)   (2)   (3)   (4)   (5)   (6)
a      1.5   3.0   4.0   6.5   7.0   9.5
b      -1     1    -1     1    -1     1
c       1     1     2     2     3     3

Mark those entries which are not sorted appropriately (none in this timestep, as no overlap occurs).

Second timestep at t2: Particles move to new positions in Fig. 3.12 (bottom). The old values for the bounding boxes are replaced with the new values:


index  (1)   (2)   (3)   (4)   (5)   (6)
a      2.5   4.0   4.5   7.0   6.5   9.0
b      -1     1    -1     1    -1     1
c       1     1     2     2     3     3

Mark those entries which are not sorted appropriately and sort array a so that it is again in monotonically ascending order. During the sorting, also exchange the corresponding entries in b and c together with those in a, and register the particle indices in c (c(4) = 2, c(5) = 3) whose entries exchanged their positions in array a:

index  (1)   (2)   (3)   (4)   (5)   (6)
a      2.5   4.0   4.5   6.5   7.0   9.0
b      -1     1    -1    -1     1     1
c       1     1     2     3     2     3

Only if a lower bound has moved below an upper bound (here this is the case: b(4) = −1, b(5) = 1) is there a possible overlap between the two particles, so the particle pair (2, 3) is registered as a new entry in the contact list, a list of overlap-candidate pairs. This entry corresponds to the illustration in Fig. 3.13 (a). For the other cases in Fig. 3.13: If a lower bound had moved above an upper bound (case (b)), there is no overlap between the two particles; if two lower bounds (cases (c) and (d)) or two upper bounds (cases (e) and (f)) have exchanged positions, the bounding boxes of the two particles remain overlapping as in the previous timestep(s).

[Fig. 3.13 panels: (a) the lower bounding box of the right particle moves over the upper bounding box of the left particle: intersection; (b) the lower bounding box of the right particle moves below the upper bounding box of the left particle: no intersection; (c)-(f) for all other position changes of the bounding boxes in the list, the neighborhood relation does not change.]

Fig. 3.13 Configurations and relative movement of particles (the thinner arrow for the particle in white and the thicker arrow for the gray one), those which lead to a possible overlap (a), and those which separate overlapping particles (b) or which do not change the overlapping situation (c-f).


We only register new pairs (Fig. 3.13 (a)) in the above procedure. For pairs whose bounding boxes remain in overlap, Fig. 3.13 (c-f), we keep a record of the old pairs and update it by eliminating those entries which no longer overlap. Additionally, to allow a simpler formulation of the indices, which uses comparisons with the previous and subsequent list elements, a "sentinel" at both ends of the system is introduced, so that the linear array a of 2n elements lies between −maxval and +maxval. This simplifies the programming, as no additional if-conditions are necessary to prevent going beyond the list ends, see Fig. 3.14:

a(1) a(2) . . . a(2n)
          ⇓
a(0) = −maxval,  a(1) a(2) . . . a(2n),  a(2n+1) = +maxval

Fig. 3.14 Embedding of the linear array a for the bounding box positions between a sentinel of extremal values to simplify the if-conditions and loop-iteration in the neighborhood algorithm.
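As an illustration, the one-dimensional bookkeeping described above can be condensed into a few lines. The following is an illustrative Python sketch with hypothetical names (not the thesis' FORTRAN implementation), using the t2 bounding-box values of Fig. 3.12:

```python
# Illustrative sketch of the 1D "sort and sweep" step: insertion sort of the
# bounding-box array a, with bound types b (-1 lower, +1 upper) and particle
# indices c carried along during the exchanges.

def sweep_1d(a, b, c, pairs):
    """Sort a (with b, c attached); whenever a lower bound moves below an
    upper bound during an exchange, register the pair as an overlap candidate."""
    n = len(a)
    # sentinels at both ends avoid extra if-conditions (cf. Fig. 3.14)
    A = [float("-inf")] + list(a) + [float("inf")]
    B = [0] + list(b) + [0]
    C = [0] + list(c) + [0]
    for k in range(2, n + 1):
        j = k
        while A[j] < A[j - 1]:
            A[j - 1], A[j] = A[j], A[j - 1]
            B[j - 1], B[j] = B[j], B[j - 1]
            C[j - 1], C[j] = C[j], C[j - 1]
            if B[j - 1] == -1 and B[j] == 1:
                # a lower bound moved below an upper bound: possible new overlap
                pairs.add(tuple(sorted((C[j - 1], C[j]))))
            j -= 1
    return A[1:-1], B[1:-1], C[1:-1]

# Bounding-box values at t2 (Fig. 3.12, bottom), in the sorted order of t1:
pairs = set()
a, b, c = sweep_1d([2.5, 4.0, 4.5, 7.0, 6.5, 9.0],
                   [-1, 1, -1, 1, -1, 1],
                   [1, 1, 2, 2, 3, 3], pairs)
# pairs now contains the candidate pair (2, 3), and a is sorted again
```

Since the list is already almost sorted, the effort stays proportional to the number of bounding-box positions which actually changed their order.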

Implementation for two dimensions

Fig. 3.15 Relative movement of bounding boxes in two dimensions: (a) new overlap in x-direction; (b) new overlap in y-direction; (c) new overlap in both x- and y-directions.

When Baraff introduced the neighborhood algorithm via sorting axis-aligned bounding boxes ("sort and sweep") in one dimension [62], he did not explicitly explain what to do in higher dimensions and only referred to tree structures. In principle, in two dimensions the algorithm works the same as in one dimension. Apart from the fact that the size of the bounding box may vary if the particle rotates, there is another problem, see Fig. 3.15: While in (a) the new particle pair (i, j) would enter the neighborhood list only once, for the x-coordinate, and in (b) only once, for the y-coordinate, in (c) it would be entered twice in the list of possible contacting pairs, for the x- and the y-coordinate. Purging the lists of double entries is inconvenient. Instead, it is better to use the information about the old bounding boxes in the previous timestep [101], as in the following piece of pseudocode:

1. If there is a new overlap in the x-direction for the pair (i, j), register it in the contact list.

2. If there is a new overlap in the y-direction for the pair (i, j), only register it in the contact list if there was already an overlap of the bounding boxes in the x-direction in the previous timestep.

With this scheme, the case in Fig. 3.15 (a) will be registered directly in the possible contacting particle list for the position changes of the bounding boxes along the x-direction. For the overlap case in Fig. 3.15 (b), it will be registered for the position changes of the bounding boxes along the y-direction, since their bounding boxes along the x-direction overlapped at the previous time step. For the overlap case in Fig. 3.15 (c), it will be registered for the position changes of the bounding boxes when sorting along the x-direction while it will not be registered again for the y-direction, since there was no overlap of the two bounding boxes along the x-direction at the previous time step. With this scheme, the problem of double entries is dealt with in two dimensions without additional computational complexity.
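The two-rule scheme above can be condensed into a few lines. The following illustrative Python sketch uses hypothetical set names (new_x, new_y for pairs with a newly found bounding-box overlap in that direction, prev_x for pairs whose boxes already overlapped in x at the previous timestep):

```python
# Illustrative sketch of the 2D registration scheme (hypothetical names):
# a pair with a new y-overlap only registers if its bounding boxes already
# overlapped in x at the previous timestep, which avoids double entries.

def register_2d(new_x, new_y, prev_x):
    contacts = set(new_x)          # rule 1: new x-overlaps always register
    for pair in new_y:             # rule 2: new y-overlaps only register
        if pair in prev_x:         #         if x overlapped before
            contacts.add(pair)
    return contacts

# Fig. 3.15 (c): pair (i, j) newly overlaps in BOTH x and y, with no previous
# x-overlap -- it is registered exactly once (via the x-rule):
once = register_2d(new_x={(1, 2)}, new_y={(1, 2)}, prev_x=set())
```

For the case of Fig. 3.15 (b), the pair appears only in new_y, and it is registered by the second rule because its x-overlap already existed before.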


Fig. 3.16 Relative movement of bounding-boxes in three dimensions where the overlap candidate pairs would only be entered once (a-c), twice (d-f) and three times (g) if no precautions are taken.


Implementation for three dimensions

For three dimensions, the situation is analogous to two dimensions, only there are more cases which may lead to double or multiple entries in the contact list. The algorithm works as in one dimension, except for the following cases: While in Fig. 3.16 (a) to (c) the new particle pair (i, j) would enter the neighborhood list only once, in (d) to (f) it would enter twice, and in (g) it would enter even three times. Searching the lists for double or triple entries is even more inconvenient than in two dimensions. Again, we can use the information about the old bounding boxes in the previous timestep, as in the following piece of pseudocode:

1. In the loop for sorting the x-direction, if there is a new overlap in the x-direction for the pair (i, j), immediately register it in the contact list.

2. In the loop for sorting the y-direction, if there is a new overlap in the y-direction for the pair (i, j), only register it in the contact list if there was already an overlap of the bounding boxes in the x-direction in the previous timestep.

3. In the loop for sorting the z-direction, if there is a new overlap in the z-direction for the pair (i, j), only register it in the contact list if there was already an overlap of the bounding boxes in both the x- and the y-direction in the previous timestep.

Applying this scheme to Fig. 3.16, we first check the new overlaps along the x-direction, (a), (e), (f) and (g), and record all of these pairs immediately in the loop for the x-direction. We then check the new overlaps along the y-direction for (b), (d), (e) and (g): In the loop for the y-direction, (b) and (d) are registered, since their bounding boxes overlapped in the x-direction at the previous timestep, while (e) and (g) are excluded for not having such overlaps.
Lastly, we check the new overlaps along the z-direction for (c), (d), (e) and (g): only (c) is registered, for having overlaps in both the x- and the y-direction at the previous timestep; (d) and (e) are not registered, as each lacked a previous overlap in the x- or the y-direction, and (g) did not have overlaps in both the x- and the y-direction at the previous timestep. It is clear that with this scheme, no double (or triple) entries of pairs occur in three dimensions.

In principle, the subroutines for the computation of the contact list for the x-, y- and z-direction via sorting can be executed in parallel. From the point of view of load balancing it should be noted that the loop for the y-component has to execute more comparisons than the loop for the x-component, and the loop for the z-component more than the loop for the y-component. The amount of data which has to be dealt with (bounding boxes from the previous timestep) also increases for the loops in the y- and z-direction.
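The three registration rules can be sketched analogously to the two-dimensional case (illustrative Python, hypothetical names; the sets hold candidate pairs per direction):

```python
# Illustrative 3D extension (hypothetical names): a pair with a new overlap in
# z only registers if its bounding boxes overlapped in BOTH x and y at the
# previous timestep, so each candidate pair enters the contact list once.

def register_3d(new_x, new_y, new_z, prev_x, prev_y):
    contacts = set(new_x)                                        # rule 1
    contacts |= {p for p in new_y if p in prev_x}                # rule 2
    contacts |= {p for p in new_z if p in prev_x and p in prev_y}  # rule 3
    return contacts

# Like Fig. 3.16 (g): new overlaps in all three directions, no previous
# overlaps -- the pair is registered exactly once, in the x-loop:
pair = (4, 7)
once = register_3d({pair}, {pair}, {pair}, prev_x=set(), prev_y=set())
```

A pair with a new overlap only in z (as in Fig. 3.16 (c)) is registered by the third rule, provided its boxes already overlapped in x and y before.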


3.3.3 Refinement of the contact list

We have seen in the previous sections that the neighborhood algorithm gives just a "rough" estimate of a possible overlap between the registered candidate pairs; to be more precise, it only indicates an overlap of two bounding boxes, not an overlap of the two particles, as shown in Fig. 3.7 (c). Next we introduce two algorithms dedicated to refining the contact list. In the neighborhood algorithm, only the coordinates of the vertices are used (the extrema along the x-, y- and z-axes serve as the bounding box), while for the refinement via projection the centers of mass of the particles are additionally taken into consideration. We use figures of polygons in two dimensions to explain the algorithms for the corresponding polyhedra in three dimensions. While the contact list obtained from the bounding boxes must be retained as it is, because it makes use of the information from previous timesteps, a list with fewer pairs can nevertheless be passed on to the computation of the overlap polyhedron.

Refinement via bounding spheres


Fig. 3.17 Refinement of the contact detection results via bounding circles in two dimensions: (a): Intersecting bounding circles (dotted line) for overlapping bounding boxes (dashed line) and overlapping polygonal particles; (b): Non-intersecting bounding circles (dotted line) for overlapping bounding box (dashed line) but non-overlapping polygonal particles; (c): Intersecting bounding circles (dotted line) and overlapping bounding box (dashed line) for non-overlapping elongated polygonal particles: Neither bounding boxes nor bounding circles are good indicators for the overlap.

The bounding boxes overlap in many cases where the particles do not: The issue with axis-aligned bounding boxes is that their overlap depends on the axis orientation. For particles whose centers of mass lie along the ±x ± y directions relative to each other, bounding boxes become unreliable. Therefore, we add another criterion to the computation which is independent of the axis orientation. A bounding sphere for a particle, with its center at the center of mass, encloses the whole particle. For a candidate pair with centers of mass c1 and c2, a possible overlap of the bounding spheres with radii r1 and r2 is computed: If the bounding spheres overlap (||c1 − c2|| < r1 + r2), the particle pair is kept in the list, e.g. Fig. 3.17 (a); if no overlap (or just osculating contact) exists between the spheres (||c1 − c2|| ≥ r1 + r2), the pair is removed from the list which is passed to the function for the computation of the overlap polyhedron, e.g. Fig. 3.17 (b). For particles which are not elongated, the use of the bounding sphere is suitable, similar to the cases for bounding circles in Fig. 3.17 (a) and (b). As we have outlined the numerical problems with intersection algorithms for ellipses in section 1.4.1, it makes no sense to try to generalize bounding spheres to bounding ellipsoids. For very elongated particles (see Fig. 3.17 (c)), we have to resort to the following projection algorithm.
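The bounding-sphere test amounts to a single distance comparison per candidate pair. A minimal sketch (illustrative Python; comparing squared distances avoids the square root):

```python
# Keep a candidate pair only if the bounding spheres overlap, i.e.
# ||c1 - c2|| < r1 + r2; separated or osculating pairs (>=) are removed.

def spheres_overlap(c1, r1, c2, r2):
    dist_sq = sum((u - v) ** 2 for u, v in zip(c1, c2))
    return dist_sq < (r1 + r2) ** 2   # squared form avoids the sqrt

# Overlapping spheres are kept, touching ones removed:
keep = spheres_overlap((0.0, 0.0, 0.0), 1.0, (1.5, 0.0, 0.0), 1.0)
drop = spheres_overlap((0.0, 0.0, 0.0), 1.0, (2.0, 0.0, 0.0), 1.0)
```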

Refinement via projection of the extremal vertices


Fig. 3.18 Refinement of the contact detection results via projection in two dimensions: (a) Overlap of the minimal projection of Polygon 1 and the maximal projection of Polygon 2, this pair is kept in the contact particle pair list; (b) No overlap of the minimal projection of Polygon 1 and the maximal projection of Polygon 2, this pair is removed from the contact list which is passed to the computation of the overlap polyhedron.

The projection algorithm compares the protrusion of the vertices of the two particles along the line connecting their centers of mass, see e.g. Fig. 3.18 for polygonal particles. The whole algorithm for two polyhedra P1 and P2 with centers of mass c1 and c2 is as follows:

1. Compute the unit vector which starts at c1 and points to c2: (c2 − c1)/||c2 − c1||;

2. Compute the projections of the vertices of P1 and of P2 onto this unit vector;

3. Find the maximum of the projections of P1 (max_projection_P1) and the minimum of the projections of P2 (min_projection_P2);


4. If max_projection_P1 > min_projection_P2, keep the particle pair in the contact list; otherwise remove the pair from the list.

As can be seen in Fig. 3.18 (b), the projection algorithm can deal with particles of very elongated shape, although the computation of the projections of the vertices (a vector inner product for each vertex) needs more computational effort than comparing the bounding spheres (a single scalar comparison involving the radii). Though the projection method alone could detect and remove those pairs found by the bounding spheres method, in our DEM method we use both algorithms: first the bounding spheres method to "sieve" the possible contact pairs roughly, and then the projection method for further refinement.
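The four steps can be sketched as follows (illustrative Python; the vertex lists and function name are assumptions, not the thesis' implementation):

```python
import math

# Projection refinement (sketch): project all vertices of the two particles
# onto the unit vector from c1 to c2 and compare the extremal projections.

def keep_pair_by_projection(c1, c2, verts1, verts2):
    e = [q - p for p, q in zip(c1, c2)]                     # c2 - c1
    norm = math.sqrt(sum(x * x for x in e))
    e = [x / norm for x in e]                               # step 1: unit vector
    proj = lambda v: sum(vi * ei for vi, ei in zip(v, e))   # step 2: projections
    max_p1 = max(proj(v) for v in verts1)                   # step 3: extrema
    min_p2 = min(proj(v) for v in verts2)
    return max_p1 > min_p2                                  # step 4: decision

# Two elongated particles on the x-axis whose bounding circles would overlap
# (cf. Fig. 3.17 (c)) are correctly removed here:
separated = keep_pair_by_projection((0, 0, 0), (3, 0, 0),
                                    [(-1, 0, 0), (1, 0, 0)],
                                    [(2, 0, 0), (4, 0, 0)])
```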

Chapter 4

Overlap Computation

In our DEM simulation of polyhedral particles, we need the full overlap geometry (volume, center of mass, contact area) of two contacting particles. This is computationally different from, and rather more complex than, the effort needed in other algorithms for polyhedral particles [53, 54, 55] which use the "penetration depth" to evaluate the contact force. To compute the overlap polyhedron, we have to obtain its vertex coordinates and its faces, which are needed for representing a polyhedron in a DEM simulation, as described in Chapter 3. Though this problem looks very elementary from the point of view of stereometry/three-dimensional geometry, there are no standard algorithms in the literature. Additionally, the contact area of the two overlapping polyhedra is needed for the force modeling (in Chapter 5), which is generally not available from commercial packages (the algorithms of the MATLAB GBT toolbox¹ are not public, but are for "general dimensions" and additionally need an error tolerance). Apart from the paper on error control in polytope computations [102] by the author of the GBT toolbox, we know of no algorithm for the computation of the overlap of polyhedra. Thus, in this chapter we describe the overlap computation algorithm we developed for DEM simulations of polyhedral particles, which computes the full overlap polyhedron rather than a mere distance [63] or penetration depth [39, 40]. We start with the algorithms for computing the intersections of two triangles and then explain how the vertices, faces and contact area are found for two intersecting polyhedra. Finally, we introduce an algorithm to optimize the vertex computation.

We deal with the intersection of two convex polyhedra; for simulations of particles as non-convex polyhedra, we would compose the particles of convex polyhedra. The overlap of two convex polyhedra is again convex. We assume that no polyhedron pierces the other one, i.e. that the intersection of each of the original polyhedra with the overlap polyhedron is a singly connected domain. This is sufficient for not too sharp angles, which would not be mechanically stable for brittle materials anyway.

¹ http://www.sysbrain.com/gbt/


4.1 Triangle intersection computation

To compute the vertices of an overlap polyhedron of two polyhedra A and B, we need to compute the intersections of the faces of A with the faces of B. We choose triangles to represent the faces of a polyhedron (described in Chapter 3), which is easier than working with two general polygons; polyhedra with polygonal faces can always be triangulated so that the triangle intersection computation applies. We discuss two different approaches for the intersection computation of triangles in arbitrary three-dimensional orientation: in the point-direction and in the point-normal form.

4.1.1 Intersection algorithm via point-direction form

If we represent a triangle T by adding to the foot point f1 a λ1-fold multiple of a vector a, and to the tip of the latter a λ2-fold multiple of a vector b, we represent the triangle via the point-direction form

T = f1 + λ1 a + λ2 b.   (4.1)

The area inside the triangle is given by the relations 0 < λ1 < 1 and 0 < λ2 ≤ λ1. The asymmetry in the ranges of the λi is due to the fact that the second vector b is added to the tip of a, not to the foot point, as shown in Fig. 4.1.


Fig. 4.1 A point-direction form representation of a triangle T: O is the origin, f the foot point, a and b the two edge vectors used to construct the equation in the point-direction form, Eq. (4.1).

We first explain the computation of the intersection of an edge E with a triangle T. An edge E can be represented by adding a λ3-fold multiple of a vector c to a foot point f2, so that we have

E = f2 + λ3 c,   (4.2)

where the segment without its endpoints is given by 0 < λ3 < 1. The intersection of triangle T with edge E is given by T = E, so that

f2 + λ3 c = f1 + λ1 a + λ2 b,

or, in matrix form,

[a  b  −c] (λ1, λ2, λ3)ᵀ = f2 − f1 ,   with A = [a  b  −c].   (4.3)

The vector λ with components λi, i = 1, 2, 3, can then be obtained in backslash terminology² [103], which indicates the left division, as

λ = A\(f2 − f1).   (4.4)

Whether the intersection point really lies between the endpoints of the line and inside the triangle can be decided by verifying that the λi fulfill the conditions

0 < λ1 < 1,    (4.5a)
0 < λ2 ≤ λ1,   (4.5b)
0 < λ3 < 1.    (4.5c)
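Solving Eq. (4.4) and checking the conditions (4.5) can be sketched in a few lines. Since the system is only 3×3, Cramer's rule stands in for the LU-decomposition in this illustrative Python sketch (the function names are assumptions):

```python
# Edge-triangle intersection in point-direction form (sketch):
# solve [a b -c] (l1, l2, l3)^T = f2 - f1 and test conditions (4.5a-c).

def cross(v, w):
    return (v[1]*w[2] - v[2]*w[1], v[2]*w[0] - v[0]*w[2], v[0]*w[1] - v[1]*w[0])

def det3(u, v, w):                      # scalar triple product u . (v x w)
    return sum(ui * ci for ui, ci in zip(u, cross(v, w)))

def edge_triangle_intersection(f1, a, b, f2, c):
    nc = tuple(-x for x in c)                     # third matrix column is -c
    r = tuple(q - p for p, q in zip(f1, f2))      # right-hand side f2 - f1
    D = det3(a, b, nc)
    if D == 0.0:                                  # singular matrix A
        return None
    l1 = det3(r, b, nc) / D                       # Cramer's rule: replace one
    l2 = det3(a, r, nc) / D                       # column by the right-hand side
    l3 = det3(a, b, r) / D
    if 0 < l1 < 1 and 0 < l2 <= l1 and 0 < l3 < 1:   # conditions (4.5a-c)
        return tuple(p + l3 * ci for p, ci in zip(f2, c))
    return None

# Edge piercing the triangle (f1, f1+a, f1+a+b) perpendicularly:
point = edge_triangle_intersection((0, 0, 0), (1, 0, 0), (0, 1, 0),
                                   (0.5, 0.25, -1.0), (0, 0, 2.0))
```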

Thus, one way to compute the intersection of two triangles is to solve Eq. (4.4) for all 3×2 edge-triangle intersections.

Fig. 4.2 Point-direction forms (Eq. (4.1)) for two intersecting triangles.

Since in the intersection of the two triangles the equations are

not all independent, a more concise formulation is possible, which needs fewer operations than the intersection computation of one triangle with three independent line segments.

² Backslash notation is popular in numerical linear algebra, as it is a reminder that the left division needs only one LU-decomposition and one backward insertion. In contrast, left-multiplication with the inverse of the matrix A would need rank(A) backward insertions and is therefore computationally more costly.


For two triangles T1 and T2 which can be represented as

T1 = f1 + λ1,1 a1 + λ1,2 b1 ,
T2 = f2 + λ2,1 a2 + λ2,2 b2 ,

with 0 ≤ λi,1 ≤ 1 and 0 ≤ λi,2 ≤ λi,1, the intersection points are located on both T1 and T2, so we have T1 = T2, which gives

f1 + λ1,1 a1 + λ1,2 b1 = f2 + λ2,1 a2 + λ2,2 b2 ,   (4.6)

a three-dimensional vector equation in four unknowns (the geometrical representation can be seen in Fig. 4.2). Moving terms between the two sides, we get

λ1,1 a1 + λ1,2 b1 − λ2,2 b2 = f2 − f1 + λ2,1 a2 ,

or, in matrix form,

[a1  b1  −b2] (λ1,1, λ1,2, λ2,2)ᵀ = f2 − f1 + λ2,1 a2 .   (4.7)

Writing the matrix on the left as A = [a1  b1  −b2] again allows left division, so that we obtain

(λ1,1, λ1,2, λ2,2)ᵀ = A\(f2 − f1) + A\a2 λ2,1 = c + d λ2,1 ,   (4.8)

with c = A\(f2 − f1) and d = A\a2,

which means that only one LU-decomposition must be computed; it is then applied to the two right-hand sides³, f2 − f1 and a2, to obtain the affine equations for λ1,1, λ1,2, λ2,2 as functions of λ2,1. In contrast, six LU-decompositions must be computed if we compute all six edge-triangle intersections through Eq. (4.4). Next we have to deal with the λ2,1-dependence of Eq. (4.8). We can rewrite the vector equation Eq. (4.8) in components as

λ1,1 = c1 + d1 λ2,1 ,   (4.9)
λ1,2 = c2 + d2 λ2,1 ,   (4.10)
λ2,2 = c3 + d3 λ2,1 ,   (4.11)

and solve these equations by inserting the λi,j for the edges of the triangles. To compute the overlap polyhedron later on, we need the intersection points of the edges of one triangle with

³ The MATLAB routine A\[e f] for vectors e, f is able to apply the same solution simultaneously to two different "right-hand sides" e, f. The equivalent routine in the LAPACK library [104] is xGESV, where x is replaced by S for single-precision floating-point variables, by D for double precision and by C for complex numbers.


the face of the other triangle, and vice versa. We separate the considerations for intersection points on the boundary ("A") and inside ("B"). The possible cases for the intersection points of triangle T2 on the edges of triangle T1 are the following:

A1. If the intersection point is on a1, the coefficient of b1, λ1,2, must be 0. Then, according to Eq. (4.10), λ2,1 = −c2/d2 and therefore λ1,1 = c1 − d1 c2/d2.

A2. If the intersection point is on b1, the coefficient of a1, λ1,1, must be one. In that case, from Eq. (4.9), λ2,1 = (1 − c1)/d1, and from Eq. (4.10) we finally obtain λ1,2 = c2 + d2 (1 − c1)/d1.

A3. If the intersection point is on e1 = a1 + b1, it follows that λ1,1 = λ1,2. Then we have from Eqs. (4.9, 4.10) c1 + d1 λ2,1 = c2 + d2 λ2,1, or λ2,1 = (c2 − c1)/(d1 − d2).

If the intersection points inside triangle T1 lie along the edges of triangle T2, the intersection conditions are the following:

B1. If the intersection is along the edge a2, λ2,2 = 0: We solve λ2,1 = −c3/d3 from Eq. (4.11).

B2. If the intersection is along the edge b2, λ2,1 = 1: We solve λ2,2 = c3 + d3 from Eq. (4.11).

B3. If the intersection is along the edge e2 = a2 + b2, then λ2,1 = λ2,2: We solve λ2,1 = c3/(1 − d3) from Eq. (4.11).

Finally, we have to ascertain that 0 ≤ λi,1 ≤ 1 and 0 ≤ λi,2 ≤ λi,1 to make sure the intersection points really lie in the triangles.

4.1.2 Intersection algorithm via point-normal form

In the previous section 4.1.1, the disadvantage of the point-direction form for planes in the edge-triangle intersection computation was the necessity of multiple calls to the "expensive" LU-decomposition (for the matrix inversion) to solve Eq. (4.4), while the direct triangle-triangle intersection computation (solving Eq. (4.8)), which needs only one LU-decomposition, is obviously superior. However, a problem with the intersection algorithm


via the point-direction form is the possible singularity of the matrix A in Eq. (4.4) or in Eq. (4.8). Since there is no prior information about the matrix A, checking whether it is of full rank or ill-conditioned introduces additional computational effort. In the following, we introduce an intersection algorithm via the point-normal form for planes. It needs only scalar instead of vector equations, comprises simpler operations such as vector inner products and cross products, and is also more robust when dealing with "degenerate cases" (Fig. 4.3 (b), (c) and (f)).

Triangle-plane intersection in point-normal form

We first explain the triangle-plane intersection computation and turn to the triangle-triangle intersection computation later. The possible relative positions of a triangle and a plane are shown in Fig. 4.3. The cases in (b), (c) and (f) we call "degenerate cases", since their intersection points can be determined simply by a point-on-plane test (by substituting the vertex coordinates into the plane equation). As the resulting overlap volumes for these cases would be negligible, we neglect them in our algorithm. We only discuss here how to compute the intersection points for the cases where at least one intersection point is not a vertex, the cases (d) and (e). Let us recall the point-normal form for a plane,


Fig. 4.3 Possible relative positions of a triangle and a plane, the black dot indicates the intersection point: (a) Separation: no intersection; (b) Touching: one intersection point (which is a vertex); (c) Collinear: two intersection points (of which both are vertices); (d) Intersection: two intersection points (of which one is a vertex); (e) Intersection: two intersection points (of which none is a vertex); (f) Coplanar: three intersection points (of which all are vertices).

Eq. (3.1), introduced in Chapter 3 on the geometry of polyhedra:

n · r − d = 0.   (4.12)


The equation of a line which passes through the points V1 (V1x, V1y, V1z) and V2 (V2x, V2y, V2z) (equivalent to a vector V1V2 = V2 − V1) can be written in point-parameter form as

r = V1 + λ · V1V2 ,   (4.13)

where the scalar parameter λ determines the position of a point on the line. The intersection point is determined by substituting the line equation (Eq. (4.13)) into the plane equation (Eq. (4.12)) and solving for the parameter λ:

λ = (d − n · V1) / (n · V1V2) ,   (4.14)
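Eq. (4.14) reduces the edge-plane test to one inner product and one division; a minimal sketch (illustrative Python, function names assumed):

```python
# Line-plane intersection in point-normal form (sketch):
# lambda = (d - n.V1) / (n.(V2 - V1)), cf. Eq. (4.14).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def plane_edge_lambda(n, d, v1, v2):
    v1v2 = tuple(q - p for p, q in zip(v1, v2))
    denom = dot(n, v1v2)
    if denom == 0.0:          # edge parallel to (or lying in) the plane
        return None
    return (d - dot(n, v1)) / denom

# Edge crossing the plane z = 0 halfway between its endpoints:
lam = plane_edge_lambda((0.0, 0.0, 1.0), 0.0, (0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
```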

when n · V1V2 ≠ 0. The value of the parameter λ gives information about the geometric relation of the triangle (which has an edge V1V2) and the plane (with its normal n and distance d to the origin):

1. 0 < λ < 1: r = V1 + λ · V1V2 is the intersection point of the line and the plane, as in Fig. 4.3 (e), or in (d) where the intersection point is not a vertex.

2. λ = 0 or λ = 1: V1 or V2 lies on the plane, as in Fig. 4.3 (b), (c), and for the intersection point which is a vertex in (d).

3. λ > 1 or λ < 0: There is no intersection, as in Fig. 4.3 (a).

4. No solution for λ (n · V1V2 = 0): V1V2 lies on (or is parallel to) the plane, as in Fig. 4.3 (c) or (f).

For a triangle T with vertices (V1, V2, V3) and a plane given by n and d, we can compute λ for all three edges V1V2, V1V3, V2V3, apply the criteria above to check the number of intersection points, and obtain the intersection points (if they exist) accordingly.

Intersection of two triangles

Consider two triangles T1 (on the plane P1) and T2 (on the plane P2) for which we have obtained two intersection points of T1 and P2, denoted Vi1 and Vi2, and two intersection points of T2 and P1, denoted Vi3 and Vi4, as shown in Fig. 4.4. When the four intersection points are available, an intuitive way to determine whether the two triangles intersect is to check whether at least one of Vi1 and Vi2 lies inside the triangle T2, or at least one of Vi3 and Vi4 lies inside the triangle T1. However, this is rather tedious. Sorting is more favorable, as the four intersection points all lie on the intersection line of the two planes: The decision can be reduced to the relative positions of the two segments Vi1Vi2 and Vi3Vi4. If Vi1Vi2 and Vi3Vi4 overlap, the two triangles intersect, and the intersection points are the inner two of the four sorted points. The possible relative positions of the two intersection segments are shown in Fig. 4.5.

Fig. 4.4 Two triangles T1 and T2 and the planes P1 and P2: Vi1 and Vi2 are the intersection points of T1 and P2; Vi3 and Vi4 are the intersection points of T2 and P1. Whether T1 and T2 intersect can be determined from the relative positions of the four intersection points on the intersection line of P1 and P2, see Fig. 4.5.

[Fig. 4.5 shows the five orderings of the sorted points along the intersection line: Vi1 Vi2 Vi3 Vi4: no intersection; Vi1 Vi3 Vi2 Vi4: intersection points Vi3, Vi2; Vi3 Vi1 Vi4 Vi2: intersection points Vi1, Vi4; Vi1 Vi3 Vi4 Vi2: intersection points Vi3, Vi4; Vi3 Vi1 Vi2 Vi4: intersection points Vi1, Vi2.]

Fig. 4.5 The sorted intersection segments of two triangles, Vi1Vi2 of T1 and Vi3Vi4 of T2: the middle two points in the sorted list are the intersection points if Vi1Vi2 and Vi3Vi4 overlap.

Vi1Vi2 and Vi3Vi4 can be sorted with the points from the point-parameter form of a line, Eq. (4.13). If we substitute Vi1 and Vi2 into the equation as V1 and V2, we can obtain the parameters λ3 and λ4 for Vi3 and Vi4 from

Vi3 = Vi1 + λ3 (Vi2 − Vi1),   (4.15a)
Vi4 = Vi1 + λ4 (Vi2 − Vi1).   (4.15b)
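Once the four collinear points are parameterized (λ1 = 0 and λ2 = 1 for Vi1 and Vi2, and λ3, λ4 from Eq. (4.15)), the overlap decision reduces to an interval comparison; an illustrative Python sketch (names assumed):

```python
# Overlap of the two collinear segments [0, 1] (Vi1Vi2) and [l3, l4] (Vi3Vi4):
# the intersection segment, if any, is the inner pair of the four sorted values.

def segment_overlap(l3, l4):
    lo = max(0.0, min(l3, l4))
    hi = min(1.0, max(l3, l4))
    if lo < hi:
        return (lo, hi)       # the triangles intersect along [lo, hi]
    return None               # disjoint (or merely touching) segments

# Vi3 lies inside Vi1Vi2 and Vi4 beyond Vi2 -> intersection points Vi3 and Vi2:
seg = segment_overlap(0.5, 1.5)
```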

According to the values of the λi (λ1 = 0 for Vi1 and λ2 = 1 for Vi2), we can determine the relative positions of the four points on the intersection line: There are three ranges for λ3 (and λ4): λ3 < 0, 0 ≤ λ3 ≤ 1 and λ3 > 1, and thus nine combinations of λ3 and λ4. Among these cases, only λ3 > 1 & λ4 > 1 or λ3 < 0 & λ4 < 0 indicate that there is no intersection of the two triangles. In the other seven cases, the intersection segment can be determined. While Fig. 4.5 only shows four intersection cases, since the sequence


between Vi3 and Vi4 looks sorted in the graphical representation, they still have to be sorted in the algebra of the computer program. A simpler way to sort the four points is based on the fact that the projections of collinear points onto the coordinate axes do not change their order; thus solving for λ3 and λ4 in Eq. (4.15) becomes unnecessary. We can instead sort by one of the non-zero coordinates of the four points. In other words, instead of sorting the intersection points along the intersection line of the two planes, we sort them along their projections onto one of the coordinate axes.

We summarize the intersection computation procedure (as a subroutine in pseudocode) based on the point-normal form of the plane equation for two triangles T1 and T2 lying on the planes P1 and P2 (Fig. 4.4), and give a few comments about the implementation in practice:

Algorithm 4.1 Compute triangle-triangle intersection points with Eq. (4.14)
 1: subroutine compute_triangle_intersection
 2:   compute the three λ for T1 and P2
 3:   if two intersection points Vi1, Vi2 are found then
 4:     compute the three λ for T2 and P1
 5:     if two intersection points Vi3 and Vi4 are found then
 6:       find a non-zero coordinate axis for Vi1, Vi2, Vi3 and Vi4
 7:       sort Vi1 and Vi2 in ascending order
 8:       sort Vi3 and Vi4 in ascending order
 9:       if max(Vi1, Vi3) < min(Vi2, Vi4) then
10:         {% Vi1Vi2 and Vi3Vi4 overlap}
11:         output intersection points max(Vi1, Vi3) and min(Vi2, Vi4)
12:       else if max(Vi1, Vi3) = min(Vi2, Vi4) then
13:         {% the two points in the middle coincide}
14:         only one intersection point (an edge-edge intersection)
15:       else
16:         no intersection
17:       end if
18:     else
19:       no intersection or degenerate cases for T1 and T2
20:     end if
21:   else
22:     no intersection or degenerate cases for T1 and T2
23:   end if
24: end subroutine compute_triangle_intersection

1. In line 2 and line 4, we do not need to compute all three λ if we have already found the two intersection points (since we are not interested in detecting to which degenerate case the two triangles belong);


2. In line 3, at least one of Vi1 and Vi2 is not a vertex of T1, and in line 5 at least one of Vi3 and Vi4 is not a vertex of T2. Otherwise, it is a degenerate case in which an edge of one triangle lies in the plane of the other triangle;

3. In line 11, after we have obtained two intersection points, an edge-edge intersection test (similar to line 12) should be performed for the pair (Vi1, Vi3) and the pair (Vi2, Vi4) to check whether the intersection points stem from edge-edge intersections;

4. In line 12, we use the "=" sign to indicate "equality" of the coordinates of two points for simplicity. In good programming practice, the equality comparison of two floating-point numbers should always be avoided [105]; instead, the difference of the two variables is compared against a certain acceptable tolerance. In our case, a coordinate difference of the order of 10−14 (an absolute error is used, since our particle length is usually not less than 10−3 m) can be regarded as an "acceptable" tolerance for the "same" point;

5. In line 14, we obtain only one intersection point, and the two triangles are treated as non-intersecting in our DEM simulation.

Comparison of the two algorithms

The algorithm which uses the point-direction form in section 4.1.1 will be referred to as the point-direction triangle intersection algorithm (PDTIA for short). The other one, using the point-normal form, will be referred to as the point-normal triangle intersection algorithm (PNTIA). We implemented both algorithms in MATLAB and in FORTRAN. The former is generally faster than the latter in MATLAB but slower in FORTRAN, because MATLAB is optimized for matrix computations.

Table 4.1 Performance comparison of PDTIA (without checking the rank of matrix A in Eq. (4.8) for the LU decomposition) and PNTIA in FORTRAN code for test triangle pairs with vertex differences of the order of 10−10. The triangles are prepared according to Eq. (4.16).
Number of tests | Computation time (PDTIA / PNTIA) | Intersecting pairs (PDTIA / PNTIA) | Order of difference
1,000           | 1.62E-03 / 1.56E-03              | 731 / 731                          | 4.09E-07
10,000          | 1.58E-02 / 1.57E-02              | 7506 / 7506                        | 4.51E-07
100,000         | 0.15 / 0.15                      | 74833 / 74833                      | 4.49E-07
1,000,000       | 1.55 / 1.52                      | 750826 / 750826                    | 1.36E-06
10,000,000      | 15.57 / 15.25                    | 7502474 / 7502474                  | 1.59E-06
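The sorting-and-overlap test at the heart of Algorithm 4.1 (lines 7–11), applied after projecting the four points onto a single coordinate axis, reduces to a one-dimensional interval test. The following Python sketch illustrates it; the function name and tolerance are illustrative, not taken from the thesis's FORTRAN code:

```python
def segment_overlap_1d(a1, a2, b1, b2, tol=1e-14):
    """Overlap of two collinear segments after projection onto one
    coordinate axis (sketch of Algorithm 4.1, lines 7-11).  Returns the
    overlapping interval, or None when the segments are disjoint or touch
    only in a single point (the degenerate edge-edge case)."""
    lo1, hi1 = sorted((a1, a2))       # sort Vi1, Vi2 in ascending order
    lo2, hi2 = sorted((b1, b2))       # sort Vi3, Vi4 in ascending order
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    if hi - lo > tol:                 # the two segments overlap
        return (lo, hi)
    return None                       # disjoint, or a single-point contact
```

For example, `segment_overlap_1d(0.0, 1.0, 0.5, 2.0)` yields the interval `(0.5, 1.0)`, while segments that merely touch in one endpoint are rejected, matching the treatment of the edge-edge case in the text.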

The comparison of the two algorithms in (compiled) FORTRAN code is summarized in Tables 4.1 and 4.2. We want to test our algorithms with small intersections, to make sure that they can be used when the overlap is small, as it usually is at the initial


Table 4.2 Performance comparison of PDTIA (with rank check via QR decomposition) and PNTIA. The numbers of intersecting pairs found by the two algorithms differ because the computation is terminated in PDTIA when the matrix A (in Eq. (4.8)) is ill-conditioned (abs(Rii) < 10−12); R is the result of the QR decomposition of the matrix A.

Number of tests | Computation time (PDTIA / PNTIA) | Intersecting pairs (PDTIA / PNTIA) | Min(abs(Rii)) PDTIA
1,000           | 2.51E-03 / 1.56E-03              | 719 / 731                          | 1.25E-12
10,000          | 2.43E-02 / 1.57E-02              | 7405 / 7506                        | 1.14E-12
100,000         | 0.240 / 0.15                     | 73865 / 74833                      | 1.01E-12
1,000,000       | 2.40 / 1.52                      | 741271 / 750826                    | 1.00E-12
10,000,000      | 24.31 / 15.25                    | 7407744 / 7502474                  | 1.00E-12

stage of a contact or at the end of a contact, when the elastic moduli of the particles are large. We thus prepare the triangles with

T1 = rand(3) − 0.5,                          (4.16a)
T2 = T1 + 10−n · (rand(3) − 0.5),            (4.16b)

in which rand(3) is a call to a random number generator which generates different random 3×3 matrices with entries evenly distributed in the range [0, 1]. We shift the distribution range of the random numbers to [−0.5, 0.5] so that the variation term 10−n · (rand(3) − 0.5) does not increase the coordinates monotonically (which would only push T2 away from T1 rather than introduce intersections). We choose n = 10 so that the coordinate difference between the two triangles is of the order of 10−10. We compare the PDTIA under two conditions, without and with a rank check of the matrix A in Eq. (4.8) (by examining the minimum of the diagonal entries of the R matrix from the QR decomposition4 of A) before the LU decomposition. As can be seen from Table 4.1, the computation times of the two algorithms are almost the same when the rank deficiency of the matrix A is not checked (which would be risky for a granular simulation), while with the rank check, the PNTIA is faster than the PDTIA, as shown in Table 4.2. The deviations between the two algorithms are of the order of 10−6, as shown in Table 4.1. The minimum values of the diagonal entries of R (from the QR decomposition of A) are of the magnitude of 10−12, as shown in Table 4.2; since the remaining two diagonal entries are usually of the order of 10−1, the condition number of A is about 1011 (10−1/10−12), which means we lose about 11 digits when we use the PDTIA algorithm for those test cases of intersecting triangles with coordinate variations of the order of 10−10. This also explains why the differences between the two algorithms are up to 10−6 (since the precision of double-precision floating-point data is about 10−16). Therefore, we will use the PNTIA algorithm in our DEM simulation, as it is also more stable and its accuracy is usually

Done by the QR factorization subroutine DGEQRF from the LAPACK library [104].


better. Considering that we already have the point-normal information for the triangular faces (stored in the FACE_EQUATION array, explained in section 3.1.1), the PNTIA algorithm can skip the computation of the plane equation, which makes it more efficient and better integrated in the whole DEM code.
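The rank check described above can be sketched in Python/NumPy as follows; the thesis calls LAPACK's DGEQRF directly, so NumPy's QR routine stands in here, and the test matrix A_bad is a hypothetical nearly rank-deficient example:

```python
import numpy as np

def rank_check_ok(A, tol=1e-12):
    """Rank check before the LU solve, as in PDTIA with rank check
    (sketch): the matrix passes when the smallest |R_ii| of the QR
    decomposition of A is not below the tolerance of the text."""
    R = np.linalg.qr(A, mode='r')                 # R factor only
    return float(np.min(np.abs(np.diag(R)))) >= tol

# Hypothetical nearly rank-deficient 3x3 system (|det| ~ 1e-13):
A_bad = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [1.0, 1.0, 1e-13]])
```

A well-conditioned matrix such as the identity passes the check, while A_bad is rejected, mirroring the terminated PDTIA computations in Table 4.2.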

4.2 Vertex computation

For vertex computation, we distinguish between the “inherited” and the “generated” vertices of an overlap polyhedron, since the methods to compute their coordinates are different. By inherited vertices, we mean the vertices of one polyhedron penetrating into the interior of the other one, becoming vertices of the overlap polyhedron. The generated vertices are the intersection points of the triangulated faces of the two polyhedra, e.g. see Fig. 4.6. In this section, we denote the two intersecting polyhedra as P1 and P2 and discuss how to compute the coordinates of the inherited and generated vertices of their overlap polyhedron Po .

Fig. 4.6 Overlap of two irregularly-shaped polyhedra P1 and P2: the circles are inherited vertices from P1 and P2 and the stars are generated vertices from the intersection of the triangular faces of P1 and P2.

4.2.1 Inherited vertices

The distance of a point r′(r′x, r′y, r′z) to a plane represented in the point-normal form n · r − d = 0 (Eq. (4.12)) is

d′ = n · r′ − d.   (4.17)

This distance is not a norm but an oriented distance, whose sign indicates the position of the point relative to the normal of the plane. The function

dist(V, F) = n · V − d,   (4.18)

for the distance computation after Eq. (4.17) for a point V(Vx, Vy, Vz) and a plane F(n, d) will be used in the pseudocode to explain the computation of the inherited vertices of Po. As stated in section 3.1.1, we orient the normals of all the faces towards the outside of a polyhedron and define the interior of the polyhedron according to Eq. (3.5). In terms of dist(V, F), any point V(Vx, Vy, Vz) located inside a polyhedron satisfies

dist(V, Fk) < 0,   (4.19)

where the subscript k indicates the k-th face of the polyhedron. Thus, if a vertex V of P1 satisfies Eq. (4.19) for all the faces Fk of P2 (or vice versa), it is an inherited vertex of the overlap polyhedron Po.

Algorithm 4.2 Compute the inherited vertices

 1: {Check the vertices of P1 against the faces of P2}
 2: for all the vertices Vi of P1 do
 3:   for all the faces Fk of P2 do
 4:     if dist(Vi, Fk) > −ϵd then
 5:       Exit the inner loop over the faces {Vi is outside of P2}
 6:     end if
 7:   end for
 8:   Record Vi in the list of inherited vertices from P1
 9: end for
10: {Check the vertices of P2 against the faces of P1}
11: for all the vertices Vi of P2 do
12:   for all the faces Fk of P1 do
13:     if dist(Vi, Fk) > −ϵd then
14:       Exit the inner loop over the faces {Vi is outside of P1}
15:     end if
16:   end for
17:   Record Vi in the list of inherited vertices from P2
18: end for


We can find all inherited vertices of Po by checking the vertices of P1 against the faces of P2 and vice versa, as shown in Algorithm 4.2, in which a precision ϵd for the distance check is included. For a chosen precision, e.g. an absolute error ϵd = 10−14, a point whose distance to a plane is within ±ϵd is regarded as lying on the plane; we only treat vertices which penetrate into the other polyhedron further than ϵd as inherited vertices. As a result of Algorithm 4.2, we obtain a list of the vertices of P1 inside P2 and a list of the vertices of P2 inside P1, with which we obtain not only the coordinates of the inherited vertices but also the topological information about the faces on which those vertices are located.
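The inside test of Algorithm 4.2 can be sketched as follows. This is a Python illustration of one checking direction only (vertices of one polyhedron against the faces of the other); the array names and the unit-cube test data are hypothetical, and convexity is assumed as in the thesis:

```python
import numpy as np

EPS_D = 1e-14  # distance tolerance, as in the text

def inherited_vertices(vertices, normals, offsets, eps=EPS_D):
    """One direction of Algorithm 4.2 (sketch): return the vertices of one
    polyhedron lying strictly inside the other, whose faces are given in
    point-normal form n.r - d = 0 with outward-pointing normals."""
    inside = []
    for v in np.asarray(vertices, float):
        # oriented distances to every face plane, Eq. (4.17)
        dists = np.asarray(normals, float) @ v - np.asarray(offsets, float)
        if np.all(dists < -eps):   # Eq. (4.19): penetrating further than eps
            inside.append(v)
    return inside

# Axis-aligned unit cube centered at the origin as the "other" polyhedron:
cube_normals = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                         [0, -1, 0], [0, 0, 1], [0, 0, -1]], float)
cube_offsets = np.full(6, 0.5)
```

With this data, the origin is reported as inside the cube while a point at x = 0.7 is rejected, because its oriented distance to the face x = 0.5 is positive.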

4.2.2 Generated vertices

Fig. 4.7 An intersection point may be recorded twice in the list of intersection points during the overlap-polyhedron computation based on triangle intersections: (a) the triangle-plane intersection case from Fig. 4.3 (e) appears again in the overlap computation; the edge E of a triangular face F1 intersects the shaded triangle from the other polyhedron at Vi. (b) The face F2, which shares E with F1, intersects the shaded triangle at the same point Vi. Thus the intersection point Vi is recorded twice in the loop that computes the triangle intersections of the two polyhedra.

For the generated vertices of Po, indicated by stars in Fig. 4.6, which are the intersection points of the triangulated faces of P1 and P2, we need to resort to the triangle intersection algorithm introduced in section 4.1.2. In the current code, the PNTIA (Algorithm 4.1), which uses the point-normal form to represent a plane, is applied. We can compute the generated vertices by brute force, i.e. by first computing the intersections of all the faces of P1 with all the faces of P2; for two polyhedra with nf faces each, this takes O(nf2) operations, as shown in Algorithm 4.3. Then we index the intersection points as the generated vertices, as shown in Algorithm 4.4.


Fig. 4.8 Degenerate cases in the overlap polyhedron computation: one intersection point of the two triangular faces F1 and F2 stems from the left edge-edge intersection and two points stem from the right edge-edge intersections. The circles are intersection points from edge-plane intersections, while the black dots are from edge-edge intersections. As can be seen in Fig. 4.7, an edge is shared by two triangular faces, so an intersection point from an edge-edge intersection is recorded twice for the edge of F1 as well as twice for the edge of F2.

Algorithm 4.3 Compute all the intersection points (brute force)
 1: {Compute generated vertices: Part I}
 2: set num_int_pair = 0 {the number of intersecting face pairs}
 3: for all the faces F1i of P1 do
 4:   for all the faces F2k of P2 do
 5:     call compute_triangle_intersection(F1i, F2k) {defined in Algorithm 4.1}
 6:     if two intersection points Vint1 and Vint2 are found then
 7:       num_int_pair = num_int_pair + 1
 8:       {record (F1i, F2k) in the list of contact face pairs}
 9:       contact_face_pair(1:2, num_int_pair) = (F1i, F2k)
10:       {record the two points in the list of intersection point pairs}
11:       intersect_point_pair(1:2, num_int_pair) = (Vint1, Vint2)
12:     end if
13:   end for
14: end for

From Algorithm 4.3, we obtain a list of pairs of intersection points, which is then used by Algorithm 4.4 to determine each generated vertex and its coordinates. In addition, we also get a list of pairs of intersecting faces, contact_face_pair in line 9, which will be used for determining the faces of the overlap polyhedron and the contact line. We need to index the intersection points as generated vertices, since each generated vertex enters the list of intersection point pairs (intersect_point_pair in line 11 of Algorithm 4.3) at least twice. An intersection point of two triangles most of the time comes from the intersection of an edge of one triangle with the other triangle, e.g. see Fig. 4.3 (e). For a polyhedron, an edge


is always shared by two triangular faces. Thus, if a face of polyhedron P1 intersects a face of polyhedron P2 via one of its edges, the face of P1 which shares the same edge also intersects the same face of P2 and reports the intersection point again, as shown in Fig. 4.7. This is the usual case in our polyhedral intersection computation, but exceptional cases may also occur, i.e. edge-edge intersections (see the discussion of Algorithm 4.1 and Fig. 4.8). If an intersection point stems from an edge-edge intersection, it may enter the intersect_point_pair list four times. We refer to such cases as degenerate cases of the overlap computation (though they are not exceptional from the point of view of the triangle intersection computation), which necessitate additional bookkeeping to index the generated vertices. The reason will usually be a penetration of two particles which can be regarded as unphysical, caused e.g. by a too large time step or by a wrong initialization with unintended overlaps of particle positions. Thus, we need to index the generated vertices in the intersection point pair list (intersect_point_pair) to identify the generated vertices and record their coordinates. Simultaneously, we also obtain a list of segments in terms of the generated vertex indices, the intersect_point_pair_idx list in Algorithm 4.4, which is used to determine the contact line.

Algorithm 4.4 Index the intersection points as generated vertices

 1: {Compute generated vertices: Part II}
 2: {set the first two intersection points as the first two generated vertices}
 3: vert_gen(1:2) = intersect_point_pair(1:2,1)
 4: {set the indices of the generated vertices for the intersection point pairs}
 5: intersect_point_pair_idx(1:2,1) = (1:2)
 6: set vert_idx = 2 {initialize the index of the generated vertices}
 7: for i = 2 to num_int_pair do
 8:   for j = 1 to 2 do
 9:     Vcheck = intersect_point_pair(j, i)
10:     if Vcheck is not in the vert_gen list then
11:       vert_idx = vert_idx + 1
12:       vert_gen(vert_idx) = Vcheck
13:       intersect_point_pair_idx(j, i) = vert_idx
14:     end if
15:   end for
16: end for
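The indexing of Algorithm 4.4 amounts to deduplicating the intersection points under the "same point" tolerance discussed earlier. A Python sketch of the idea (names are illustrative; the thesis code works with FORTRAN arrays):

```python
import numpy as np

TOL = 1e-14  # "same point" tolerance, as in the text

def index_intersection_points(point_pairs, tol=TOL):
    """Sketch of Algorithm 4.4: assign a unique generated-vertex index to
    every intersection point, merging points closer than tol, and return
    the segments as pairs of vertex indices (used for the contact line)."""
    verts = []       # unique generated vertices
    pair_idx = []    # intersection segments in terms of vertex indices
    for p, q in point_pairs:
        idx = []
        for v in (np.asarray(p, float), np.asarray(q, float)):
            for k, w in enumerate(verts):
                if np.linalg.norm(v - w) < tol:   # already indexed
                    idx.append(k)
                    break
            else:                                 # a new generated vertex
                verts.append(v)
                idx.append(len(verts) - 1)
        pair_idx.append(tuple(idx))
    return verts, pair_idx
```

Two segments sharing an endpoint, e.g. ((0,0,0),(1,0,0)) and ((1,0,0),(1,1,0)), yield three unique vertices and the index pairs (0,1) and (1,2), so the doubly reported point enters the vertex list only once.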

The brute-force approach (Algorithm 4.3) has computational complexity O(nf2): for nfi (i = 1, 2) faces of polyhedron Pi, in total nf1 · nf2 triangle pairs are computed to obtain the intersection points, and most of them return no intersection. To reduce the simulation time, algorithms which significantly reduce the number of triangle pairs for computing both the inherited and generated vertices will be discussed in section 4.4. For the time being, with the result of the brute-force methods (Algorithm 4.2 for the inherited


vertices and Algorithm 4.3 together with Algorithm 4.4 for the generated vertices), we obtain all the vertices of the overlap polyhedron, e.g. in Fig. 4.9.

Fig. 4.9 The vertices of the overlap polyhedron (left) obtained after the computation of the inherited vertices (circles) and of the generated vertices (stars) for the two intersecting polyhedra (right, also in Fig. 4.6). The vertices obtained are points scattered in space, and we need to find their topological relations, namely the faces, to determine the overlap polyhedron.

4.3 Face and contact area determination

When all vertices of the overlap polyhedron Po have been computed as scattered points in space, we need to determine the topological relations of those vertices to obtain its faces. As soon as the vertex coordinates and the faces in terms of vertex indices are known, we can compute the volume and the center of mass of Po according to section 3.2. Then we can define the contact areas (triangles) for our force model (Chapter 5).

4.3.1 Face determination

Similar to the vertices, which are partly inherited from P1 and P2 and partly generated from the triangular face intersections, the faces of Po can be divided into generated faces and inherited faces. The inherited faces are those faces of P1 whose three vertices are all inside P2, and vice versa, see Fig. 4.10 (a). The generated faces are parts of the original faces of P1 and P2 bounded by the generated vertices (or by the generated vertices together with inherited vertices), e.g. Fig. 4.10 (b). In contrast to the generated vertices, which all originate from the intersection, the generated faces are not something "totally new" but


Fig. 4.10 Example of an inherited face Fi and a generated face Fg: In (a), since the three vertices (Vi) of the triangular face of P1 (filled dark gray) are all inside P2, the face is an inherited face of the overlap polyhedron. In (b), only one vertex (Vi) of P1 lies inside P2, so its face is not an inherited face. The gray triangle on the face F of P2, which consists of three generated vertices Vg, is a generated face of the overlap polyhedron, produced by the intersections of F with the faces on which Vi is located. Faces which consist of both generated vertices Vg and inherited vertices Vi are also generated faces.

parts of all intersecting faces of P1 and P2 which become the generated faces of the overlap polyhedron. With the list of intersecting face pairs (contact_face_pair from Algorithm 4.3) from the computation of the generated vertices, what remains to be determined for those faces are the indices of the generated vertices located on them (since one face may intersect more than one face of the other polyhedron). Instead of finding the inherited faces directly by checking their vertices, we make use of the VERTEX_FACE_TABLE array (in which, for each vertex, all the faces it is located on are stored, as explained in section 3.1.1) for the inherited vertices. For an inherited vertex, we check all its faces: if a face has already been registered as a face of Po, we register the vertex for that face of Po; if not, we register the face as a new entry in the face list of Po and register the inherited vertex for this newly registered face. By doing so, we not only register the inherited vertices on the generated faces they may belong to, but also find the inherited faces. The algorithm to find the faces of the overlap polyhedron Po of P1 and P2 is summarized in Algorithm 4.5.
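The inherited-vertex bookkeeping just described can be sketched as follows. This Python illustration uses a hypothetical dict-based face list; the thesis stores the same information in FORTRAN arrays:

```python
def register_inherited(inherited, vertex_face_table, faces_po):
    """Sketch of the inherited-face pass of Algorithm 4.5: faces_po maps a
    face id of the original polyhedron to the vertex indices already
    registered for the corresponding face of the overlap polyhedron Po."""
    for v in inherited:                    # each inherited vertex
        for f in vertex_face_table[v]:     # every face this vertex lies on
            # register the vertex; create the face entry if it is new,
            # which is exactly how inherited faces are discovered
            faces_po.setdefault(f, []).append(v)
    return faces_po
```

If two inherited vertices 0 and 1 share face 11, which was already registered with a generated vertex, both are appended to that face's entry, while their remaining faces (10 and 12) enter the face list of Po as new, inherited faces.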


Algorithm 4.5 Determine the faces of the overlap polyhedron Po
 1: {generated faces from P1}
 2: for all the faces of P1: Fi(P1) do
 3:   if Fi(P1) is intersecting with P2 then
 4:     register Fi(P1) as a face of Po: Fk(Po)
 5:     find the entries of Fi(P1) in contact_face_pair
 6:     find the vertex indices for all the intersection points of Fi(P1) from the corresponding entries in intersect_point_pair_idx
 7:     register all the vertex indices for Fk(Po)
 8:   end if
 9: end for
10: {inherited faces from P1}
11: for all the inherited vertices from P1: Vin,i(P1) do
12:   for all the faces Fi in the VERTEX_FACE_TABLE of Vin,i(P1) do
13:     if Fi is in the list of faces of Po (Fi = Fk(Po)) then
14:       register Vin,i(P1) for the face Fk(Po)
15:     else
16:       register Fi as a new entry in the face list of Po
17:       register Vin,i(P1) for the face entry corresponding to Fi
18:     end if
19:   end for
20: end for
21: {Repeat the same process for the intersecting and inherited faces of P2}

Though the faces of the original polyhedra P1 and P2 are triangles, the generated faces which consist of two inherited vertices Vi and two generated vertices Vg are not necessarily triangular, as can be seen in Fig. 4.10 (a). Since our formulae (and the corresponding subroutines in the DEM code) for computing the physical properties of a polyhedron are based on triangular faces (section 3.2), to obtain the volume and the center of mass of the overlap polyhedron we need to triangulate the generated faces with more than three vertices. For this purpose, we devised two algorithms, one using the centroid (Fig. 4.11) and the other using an edge (Fig. 4.12) to determine the relative orientations of the vertices and order them counter-clockwise. For the method which uses the centroid, we first need to set up a reference system for determining the orientations of the vertices to be ordered: we choose as origin the centroid C(Cx, Cy, Cz) of the set of k vertices Vi(Vix, Viy, Viz), given as the algebraic average of the vertex coordinates,

C = (1/k) ∑i=1..k Vi ;   (4.20)

the w-axis as the normal of the generated face nf, which can either be found in the


Fig. 4.11 Ordering the vertices on a generated face with respect to the centroid: Vi is the entry in the vertex-index list of the generated face before ordering and V′i the entry after ordering; C is the average of the vertex coordinates; nf is the normal of the face; a unit vector from C to the first vertex V1 is selected as the base unit vector Vb; an auxiliary unit vector Va is chosen as nf × Vb. The angle between Vb and the vector CVi is θi from Eq. (4.23). The vertices Vi are ordered according to the values of θi. From the ordered list V′i we obtain a triangulation with the triangles (C, V′i, V′i+1) for i = 1, ..., 5 and (C, V′6, V′1).

FACE_EQUATION array (section 3.1.1) or computed from three non-collinear vertices (Eq. (3.2)). The u-axis is chosen as the unit base vector Vb from C to the first vertex V1 in the list,

Vb = (V1 − C) / ||V1 − C|| ;   (4.21)

and an auxiliary vector Va as the v-axis,

Va = nf × Vb .   (4.22)

The sin(θi) and cos(θi) for the Vi (i = 2, ..., k) can be computed from the vector inner products

sin(θi) = (Vi − C) · Va ,    cos(θi) = (Vi − C) · Vb .

Using the atan2(y, x) function from FORTRAN5, we can obtain the angle θi for Vi with

5 The atan2(y, x) function gives the angle in radians of a point (x, y) to the positive x-axis in all four quadrants. The results are positive for counter-clockwise angles (y > 0) and negative for clockwise angles (y < 0).


respect to the unit base vector Vb as

θi = atan2(sin(θi), cos(θi)),      (4.23a)
θi = 2π + θi    if θi < 0.         (4.23b)

Sorting the Vi according to θi in the vertex list of a generated face leads to a list of vertices ordered counter-clockwise, as shown in Fig. 4.11.
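The centroid-based ordering of Eqs. (4.20)-(4.23) can be sketched as follows; this is a Python illustration, not the thesis's FORTRAN implementation:

```python
import math
import numpy as np

def order_ccw(vertices, normal):
    """Order coplanar vertices counter-clockwise about their centroid
    (sketch of the centroid-based method).  Returns the permutation of
    the input indices, sorted by the angle of Eq. (4.23)."""
    V = np.asarray(vertices, float)
    C = V.mean(axis=0)                              # centroid, Eq. (4.20)
    Vb = (V[0] - C) / np.linalg.norm(V[0] - C)      # u-axis, Eq. (4.21)
    Va = np.cross(np.asarray(normal, float), Vb)    # v-axis, Eq. (4.22)

    def angle(v):
        s, c = np.dot(v - C, Va), np.dot(v - C, Vb)
        th = math.atan2(s, c)                       # Eq. (4.23a)
        return th + 2 * math.pi if th < 0 else th   # Eq. (4.23b)

    return sorted(range(len(V)), key=lambda i: angle(V[i]))
```

For the four corners of a square listed out of order, the function recovers the counter-clockwise sequence around the centroid, with the first vertex (which defines Vb) at angle zero.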

Fig. 4.12 Ordering the vertices of a generated face with respect to an edge: the first two entries in the vertex list of a generated face always form one of its edges in our algorithm. We use the edge V1V2 as the unit base vector Vb and compute cos(θi) for the remaining vertices Vi. The larger cos(θi) is, the closer Vi is to the edge V1V2. A triangulation is obtained automatically from the ordered list with the triangles (V′1, V′i, V′i+1) for i = 2, ..., 5.
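The edge-based ordering shown in Fig. 4.12 can be sketched as follows. This Python illustration assumes, as for a convex face, that all remaining vertices lie on one side of the edge, so that all angles stay below π and sorting by cos(θi) alone suffices:

```python
import numpy as np

def order_by_edge(vertices):
    """Order coplanar vertices relative to the edge V1V2 (sketch of the
    edge-based method of Fig. 4.12): the first two list entries form an
    edge; the rest are sorted by descending cos(theta) to that edge, so
    no atan2 evaluation is needed."""
    V = [np.asarray(v, float) for v in vertices]
    Vb = (V[1] - V[0]) / np.linalg.norm(V[1] - V[0])  # edge as base vector

    def cos_theta(v):
        d = v - V[0]
        return np.dot(d, Vb) / np.linalg.norm(d)

    rest = sorted(range(2, len(V)), key=lambda i: cos_theta(V[i]), reverse=True)
    return [0, 1] + rest
```

For a unit square with the bottom edge given first, the vertex above V2 (larger cos θ) is ordered before the vertex above V1, matching the triangulation (V′1, V′i, V′i+1) of the figure.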

The centroid-based algorithm is self-sufficient: no information beyond the coordinates of the vertices to be ordered is necessary, so it can also be used in other applications as a general approach for ordering a set of disordered points on a plane. In contrast, in our overlap computation we have convenient information at hand which facilitates the ordering, namely the intersection segments (the intersect_point_pair_idx array from Algorithm 4.4), which are edges of the overlap polyhedron. When we register the vertex indices of the intersection points for a generated face, the first two vertices are always recorded in the list as a pair, which means that for a generated face we know at least one edge from the first two vertices of its vertex index list, as shown in Fig. 4.12. Since the angles of all the other vertices to the edge V1V2 cannot be larger than π, sorting the remaining vertices by cos(θi) is sufficient and the angles themselves need not be computed.

Algorithm 4.7 Compute the inherited vertices via neighboring features
1: {Check the neighboring vertices of P1 against the neighboring faces of P2}
2: for all the neighboring vertices Vi of P1 do
3:   for all the neighboring faces Fk of P2 do
4:     if dist(Vi, Fk) > −ϵd then Exit the inner loop over the faces {Vi is outside of P2}
5:     end if

6:   end for
7:   Record Vi in the list of inherited vertices from P1
8: end for
9: {Repeat the same procedure to check the neighboring vertices of P2 against the neighboring faces of P1}

When we have a list of neighboring vertices and a list of neighboring faces for each polyhedron, we can just check the neighboring vertices of one polyhedron against the neighboring faces of the other one. Thus, Algorithm 4.2 for the computation of the inherited vertices can be optimized and we obtain Algorithm 4.7. For the generated vertices, instead of computing the intersection points for all the pairs of triangular faces of the polyhedra (Algorithm 4.3), we compute all the local triangular face pairs, as shown in Algorithm 4.8. The computational effort for the optimized algorithms for the neighboring features would become O(m2 ), where m is the number of the neighboring features. For polyhedra with a small number of vertices and faces, the improvement can be expected to be insignificant, while for polyhedra with a large number of vertices and faces, for small deformations, the number of the neighboring features would only be a fraction of the total number of features. Therefore, the effort for the overlap computation can be considerably reduced compared to the brute-force algorithms.


Algorithm 4.8 Compute all the intersection points via neighboring features
 1: set num_int_pair = 0 {the number of intersecting face pairs}
 2: for all the neighboring faces F1i of P1 do
 3:   for all the neighboring faces F2k of P2 do
 4:     call compute_triangle_intersection(F1i, F2k)
 5:     if two intersection points Vint1 and Vint2 are found then
 6:       num_int_pair = num_int_pair + 1
 7:       {record (F1i, F2k) in the list of contact face pairs}
 8:       contact_face_pair(1:2, num_int_pair) = (F1i, F2k)
 9:       {record the two points in the list of intersection point pairs}
10:       intersect_point_pair(1:2, num_int_pair) = (Vint1, Vint2)
11:     end if
12:   end for
13: end for

4.4.3 Statistics from a test case

In this section, we give statistics from a dynamic DEM simulation of 648 polyhedra (each consisting of 62 vertices and 120 faces) and 34 wall particles (modeled as polyhedra with 8 vertices and 12 faces). The wall particles form a drum which rotates with constant speed, and the particles move according to the equations of motion. To verify that the proposed neighboring-feature algorithm works properly, we record three variables: the total number of particle pairs obtained from our contact detection algorithm, the average number of triangle pairs per polyhedron pair passed to the subroutine for computing the intersections, and the total number of particle pairs which turn out to have an overlap. The values are recorded from timestep 1 to beyond timestep 600000 at every 5000 time steps, to make sure that there are no effects from the equilibration. Since the contact detection algorithm is applied to eliminate polyhedron pairs without overlap at lower computational effort, its efficiency is also evaluated here. In Fig. 4.16, we plot the total number of particle pairs from our contact detection algorithm divided by n2 and by n, where n is the total number of particles, to show the merit of avoiding the n2-loop. The number of (possible) contact particle pairs from our contact detection algorithm is about 4n, so we are much better off compared to the n2 overlap computation. If we compare the number of particle pairs which turn out to have an overlap with the reported particle pair number, as shown in Fig. 4.17, over 50% of the pairs from the contact detection do have overlaps. The efficiency of the neighboring-feature algorithm is shown in Fig. 4.18: for a pair of polyhedra with 120 faces (nf = 120), instead of computing 14400 triangle pairs, only about 0.55 · nf triangle pairs are passed to the intersection computation with the neighboring-feature algorithm.


Fig. 4.16 The number of particle pairs from our contact detection, normalized by n2 (left y-axis) and by n (right y-axis), where n = 682 is the total number of particles in the simulation. The number of particle pairs from our contact detection is about 4n, compared to 6n for the densest packing in three dimensions.

Fig. 4.17 The fraction of particle pairs in the contact list which actually overlap, relative to the total number of pairs in the contact list, during the time evolution: about 55% of the pairs reported by the contact detection are really in contact.


Fig. 4.18 The pairs of faces passed to the triangle intersection computation after refinement with the neighboring-feature algorithm, normalized by the total number of faces of a particle nf and by its square nf2 (nf = 120). Only about 0.55 · nf pairs are sent to the triangle intersection computation.
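As a worked example with the numbers from Fig. 4.18 (nf = 120, about 0.55 · nf refined pairs), the neighboring-feature refinement reduces the triangle-pair count by roughly two orders of magnitude:

```python
# Reduction achieved by the neighboring-feature algorithm for nf = 120,
# using the figures quoted in the text (14400 brute-force pairs, 0.55*nf
# refined pairs):
nf = 120
brute_force_pairs = nf ** 2        # 120 * 120 = 14400 triangle pairs
refined_pairs = 0.55 * nf          # about 66 triangle pairs
speedup = brute_force_pairs / refined_pairs   # roughly a factor of 220
```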

Chapter 5

Force Modeling

In DEM simulations of granular particles, while the degrees of freedom are governed by the equations of classical mechanics (as discussed in Chapter 2), the interaction between particles is a matter of responsible modeling. To model the interaction between two particles, three basic properties have to be defined: the magnitude of the force, the direction of the force, and the action point (to define torques). When Cundall and Strack [24] first introduced the DEM for granular media research with discs, they defined the vector connecting the two centers of mass as the normal direction and used the penetration depth to model the magnitude of the elastic force. However, the force point was not explicitly stated in their model: the normal forces would not introduce torques, while in the definition of the torque caused by the tangential forces, the radii of the discs were used. For polygonal and polyhedral particles, one possibility is to define the vertices inside a certain "contact zone" [41, 42] and accumulate the contact force from each vertex according to its penetration depth. Though not explicitly stated in Refs. [41, 42], a similar treatment for the torques can be inferred. An obvious shortcoming of such multiple contact points for a single contact is the discontinuous behavior of the selected contact points, which leads to force discontinuities in the DEM simulation, as one author himself concedes in his thesis [41]. The discontinuity stems from the fact that vertices may appear in or vanish from the contact zone discontinuously. In our DEM simulation with polyhedral particles, we take the full geometric information of the overlap polyhedron into account to model the contact force: the overlap polyhedron for the force magnitude, the contact line, which defines the contact area, for the force direction, and the centroid of the overlap polyhedron for the force point.
In this chapter, we will first discuss how the contact points and the normal and tangential directions of the contact forces are defined. For the simulation of two-dimensional polygonal particles, our group has used an elastic force model which led to stable simulations over considerable times [25, 106]; we therefore discuss its generalization to the normal contact forces of polyhedral particles, and why the polyhedral force model is stable. For the tangential force, we will discuss the adaptation and implementation


of the Cundall-Strack friction model (originally formulated for two-dimensional discs [24]) for three-dimensional polyhedral contacts. We test our force models with the dynamics of a cube on an inclined plane, for which convergence of the normal and tangential forces is expected and the tangential force model should be equivalent to static friction.

5.1 The normal and tangential direction and the force point

Fig. 5.1 Definition of the normal direction n̂ and tangential direction t̂ for two-dimensional particles: for circular particles (left), for polygons via the contact line (middle) and via the center of mass of the overlap polygon and its connections to the intersection points (right).

For normal collisions of particles, we take into account the deformation and the collision velocity. Tangential sliding leads to Coulomb friction, so we need a mathematically unique definition of the normal and the tangential force. For round particles, it is common to define the vector connecting the two centers of mass as the normal direction (Fig. 5.1, left). In two dimensions, for polygonal particles, we can use the “contact line”, which goes through the two intersection points of the contacting particles, as the tangential direction and its normal as the normal direction (Fig. 5.1, middle). Alternatively, one can connect the center of mass of the overlap polygon with the two intersection points to obtain a “contact line” which is actually a connection of two line segments. We take the normals of the two segments and define a unique normal direction by summing up the two normals weighted by their lengths. Accordingly, the direction perpendicular to the normal direction is the tangential direction (Fig. 5.1, right). For polyhedral particles, the length-weighted normal direction becomes area-weighted: as explained in section 4.3.2, the normal of the contact area of the overlap polyhedron is the average of the area-weighted normals of all the contact triangles, see Fig. 5.2, right. As the line segments on the boundary between the surfaces of two polyhedra vary smoothly for smooth relative motion, so will the areas of the triangles, and accordingly the normal direction will also vary smoothly.

Fig. 5.2 Two interacting particles (left) and their (magnified) overlap region (right): The stars show the intersection points of the two polyhedra; the contact area consists of the triangles formed by the solid lines connecting adjacent intersection points and the dashed lines connecting intersection points with the center of the overlap region. The force point is at the center of mass Co of the overlap polyhedron; the normal direction is along the vector starting from Co, which is the average of the normals of the contact triangles.

For the force point, in our DEM simulation we choose the center of mass of the overlap polyhedron, as shown in Fig. 5.2. As the volume of the overlap polyhedron varies continuously, so does its center of mass, in contrast to the definition via multiple contact points. The normal force fn acts at the center co of the overlap region and along the direction which is the average of the vector sum of all normals of the contact triangles, weighted with their respective areas Ai,

n = ∑_{i=1}^{s} Ai ni / || ∑_{i=1}^{s} Ai ni ||,

(5.1)

in which s stands for the number of contact triangles and ni for the unit normal vector of the i-th contact triangle. To sum up, in our DEM simulations, for a pair of polyhedral particles in contact, the contact force acts at the center of mass of the overlap polyhedron, the normal direction is the normal of the contact area, and the tangential plane is the plane perpendicular to the normal that passes through the center of mass. All of these quantities are uniquely defined and vary smoothly for smooth relative motion of the contacting particles.
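The area-weighted averaging of Eq. (5.1) can be sketched as follows. This is a hypothetical helper, not part of the thesis code; the array layout of the contact triangles is an assumption:

```python
import numpy as np

def contact_normal(triangles):
    """Area-weighted average normal of the contact triangles, Eq. (5.1).

    `triangles` is an (s, 3, 3) array: s triangles, 3 vertices, 3 coordinates,
    all oriented consistently (e.g. counter-clockwise seen from one particle).
    """
    a = triangles[:, 1] - triangles[:, 0]
    b = triangles[:, 2] - triangles[:, 0]
    cross = np.cross(a, b)         # |cross| = 2 * A_i, direction = n_i
    n = cross.sum(axis=0)          # sum of A_i * n_i (up to the common factor 2)
    return n / np.linalg.norm(n)   # the normalization removes the factor
```

The factor 2 between the cross product and the triangle area cancels in the normalization, so the unnormalized cross products can be summed directly.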

5.2

Modeling of the normal force

The modeling of forces in the normal direction for the discrete element method borrows from the basic models of elasticity, namely Hooke's and Hertz's law (for the elastic force) and the harmonic oscillator (for the relation between the elastic and the dissipative force), see Fig. 5.3.


Fig. 5.3 Deformations and overlap in elastic models: linear contact model (for rectangles, left) and Hertzian contact model (for spheres, right), with penetration depth dx.

5.2.1 Magnitude of the elastic force

When we model an overlap with the DEM for a bar, the elastic force Fe will be proportional to the penetration depth dx. If, on the other hand, we want to model the contact of a spherical particle, the elastic force Fe will be proportional to dx^(3/2) [107]. Many simulation codes for round particles exist which make use of either linear or Hertzian potentials. For arbitrarily shaped particles, the use of the penetration depth as the parameter of the force is not practicable, because analyzing the contact shapes (corner-on-corner, corner-on-edge, corner-on-face, edge-on-face, face-on-face) and switching between force laws is tedious. For our polyhedral simulation (see Fig. 5.2) it is more convenient to use the volume of the overlap region as parameter, which can be shown to reproduce the linear regime and the Hertz regime for the corresponding contact geometries. As far as the accuracy of the computation of the overlap polyhedron is concerned, if we calculate the overlap of two cubes of e.g. 0.01 m × 0.01 m × 0.01 m size along their faces, with a realistic Young's modulus of about 100 GPa = 10^11 N/m², we get the following estimate: assuming a weight of 1 g for the particle, the relative penetration depth will be

Δl/l = F/(A·Y) = (0.001 kg · 9.81 m/s²) / (0.01 m · 0.01 m · 100·10⁹ N/m²) ≈ 10⁻⁹.

With double precision (16 digits of accuracy), we have enough digits left so that rounding errors will not affect our simulation. For sharper contacts or softer materials, which both result in larger penetration depths, the accuracy problem becomes less severe.
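The order-of-magnitude estimate above can be checked directly (values taken from the text):

```python
# Relative penetration depth of a 1 g cube, 0.01 m x 0.01 m contact face,
# with a realistic Young's modulus of 100 GPa.
m, g = 0.001, 9.81           # mass [kg], gravitational acceleration [m/s^2]
A = 0.01 * 0.01              # contact area [m^2]
Y = 100e9                    # Young's modulus [N/m^2]
dl_over_l = m * g / (A * Y)  # relative penetration depth, ~1e-9
```

With about nine digits consumed by the ratio of particle size to penetration depth, roughly seven significant digits remain in double precision for the overlap computation itself.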

5.2.2 Characteristic length

While our interaction should locally reproduce the contact laws depending on the shape of the contacting particles, we also have to take the units into account. If we use as parameter for the particle elasticity the Young's modulus (unit: N/m²) multiplied with the volume of the overlap polyhedron (unit: m³), a factor L with the unit of a length is still missing. The sound velocity in bulk solids is a physical property which can be used to fix this length factor, analogously to the two-dimensional polygonal model [106]. Sound waves (microscopic deformations of the continuum traveling at the sound speed c, which is only material-dependent, not e.g. amplitude-dependent) are mimicked in DEM simulations by microscopic displacements of the centers of mass of the particles, which should propagate with c for space-filling packings, as shown in Fig. 5.4.

Fig. 5.4 Sound propagation in continuum (left) and discrete (right) systems: a volume element of length L is treated with respect to its relative deformation (left); the particle length L between the center of mass and the contact point must be dealt with appropriately (right).

Fig. 5.5 Sound wave in the continuum (wavy line, left), in a packing of cubes (shaded overlap region of particles, middle) and in a packing of parallelepipeds (right).

Space-filling packings of cubes or parallelepipeds should have the same sound velocity as the bulk continuum,

c_bulk = √(Y/ρ),

(5.2)

which should depend only on the material parameters, namely the density ρ and the Young's modulus Y. Obviously, if we chose the factor L constant, we see in Fig. 5.5 that


a sound wave/overlap amplitude in a packing of “short” particles would lead to smaller accelerations than a larger amplitude for “longer” particles. Thus, instead of using a constant length factor, which makes the sound velocity dependent on the particle size, we define the “characteristic length” Lc for two intersecting particles i and j with arbitrary shapes of different “radii” ||ri|| and ||rj|| (distances between center of mass and contact point, ri = co − ci and rj = co − cj, Fig. 5.6, left) as

Lc = 4 ||ri|| ||rj|| / (||ri|| + ||rj||).

(5.3)

Fig. 5.6 Vectors ri, rj from the centers of mass ci, cj to the contact point co (the center of the overlap region) for arbitrarily shaped (left) and regular (right) particles.

For two rectangular particles of the same shape contacting with parallel sides, this definition gives exactly the length of the particle (Fig. 5.6, right). The characteristic length has to be introduced to make the sound velocity in granular materials independent of the particle size. It serves to compensate the “time of flight” which the sound wave spends while it passes through the particle, away from the inter-particle contacts. This is not an effect of our choice of force law: also for packings of spheres, one has to assume that the sound velocity should not change with the particle size, but with the packing density. The characteristic length is therefore complementary to the force law (linear, Hertz or other): while the overlap polyhedron takes care of the microscopic interaction at the contact point, the characteristic length takes care of the macroscopic propagation speed of this interaction through the bulk.

Fig. 5.7 Chains consisting of large-size and small-size particles


With the definition of the “characteristic length”, we have the complete form of the elastic force,

||fe|| = Y · Vo / Lc,

(5.4)

in which Y is the Young's modulus of the material, Vo the volume of the overlap polyhedron and Lc the characteristic length of the two contacting particles as defined in Eq. (5.3). The force acts at the center of the overlap polyhedron and along the normal direction n defined in Eq. (5.1). As verification of the elastic force model, the sound velocity in a discrete chain of cubes will be discussed in section 5.2.3.
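Equations (5.3) and (5.4) can be sketched as follows; the function names are hypothetical, not taken from the thesis code:

```python
import numpy as np

def characteristic_length(c_i, c_j, c_o):
    """Characteristic length L_c of Eq. (5.3), computed from the centers of
    mass c_i, c_j and the center c_o of the overlap polyhedron."""
    r_i = np.linalg.norm(np.asarray(c_o) - np.asarray(c_i))
    r_j = np.linalg.norm(np.asarray(c_o) - np.asarray(c_j))
    return 4.0 * r_i * r_j / (r_i + r_j)

def elastic_force_magnitude(Y, V_overlap, L_c):
    """||f_e|| = Y * V_o / L_c, Eq. (5.4)."""
    return Y * V_overlap / L_c
```

For two unit cubes touching face-on-face, with the contact center midway between the centers of mass, `characteristic_length` returns the side length 1, as stated in the text.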

5.2.3 Sound propagation in a discrete chain

For a linear chain of particles with constant cross-section, a valid elastic force model should reproduce the sound velocity of a bulk of the same material. Deviations will occur if the sound propagation excites resonances at the particle contacts, determined by the effective inter-particle contact strength and the particle mass. We set up two chains consisting of 400 particles each, one chain with particles two times larger than in the other chain along the sound propagation direction, see Fig. 5.7. The Young's modulus is chosen as Y = 2 × 10⁷ N/m² and the density as ρ = 5000 kg/m³, which gives a continuum sound velocity of c_bulk = 63.25 m/s.

Fig. 5.8 Propagation of the maximum velocities of the first six particles in the chain; we measure the sound velocity via the distance between particles and the time span after which the maximum velocity arrives.

The sound wave in the chains is triggered by the impact


of the first particle at the initial time step. The propagation of the displacements of the particles in the chain is sketched in Fig. 5.8. As can be seen from Fig. 5.9, the sound velocities of both chains converge to c_bulk.

Fig. 5.9 Sound velocities of the small-particle and the large-particle chain: both converge to the sound velocity of a continuum bulk of the same material.

The increase of the sound velocity should not be confused with a violation of momentum conservation: the initial impacts and collisions lead to larger movements of the centers of mass of the particles than the finally obtained sound velocity in the chain, where nearly only the dislocation propagates. With the force model defined in Eq. (5.4), we succeed in reproducing the bulk sound velocity in a discrete chain of cubes with the same material properties.
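The chain test can be reproduced with a minimal one-dimensional sketch. This is not the thesis code: the linear unilateral contact stiffness k = Y·l (equivalent to Eq. (5.4) for face-on-face cube contacts with Lc = l), the symplectic Euler integrator and the measurement particles are all assumptions:

```python
import numpy as np

# Chain of 400 cubes of side l, material parameters as in the text.
Y, rho, l, N = 2e7, 5000.0, 0.01, 400
m, k = rho * l**3, Y * l                   # particle mass, contact stiffness
c_bulk = np.sqrt(Y / rho)                  # continuum sound velocity, 63.25 m/s

dt, steps = 1e-5, 7000
x = np.zeros(N)                            # displacements from rest positions
v = np.zeros(N); v[0] = 1.0                # impact of the first particle
v_max = np.zeros(N); t_max = np.zeros(N)   # arrival of the maximum velocity

for step in range(steps):
    d = np.maximum(x[:-1] - x[1:], 0.0)    # overlap between neighbors
    f = np.zeros(N)
    f[:-1] -= k * d                        # repulsion on the left particle,
    f[1:] += k * d                         # reaction on the right one
    v += f / m * dt                        # symplectic Euler step
    x += v * dt
    newmax = v > v_max
    v_max[newmax] = v[newmax]
    t_max[newmax] = (step + 1) * dt

# sound velocity from the arrival times of the maximum at particles 100 and 300
c_chain = 200 * l / (t_max[300] - t_max[100])
```

Measuring between two particles far from the impact cancels most of the start-up transient; `c_chain` should come out within a few percent of `c_bulk`.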

5.2.4 Continuity of the time evolution of the elastic force

To integrate the dynamics of two polyhedral particles in contact, it is necessary that the elastic interaction varies smoothly. As our force law (see Eqs. (5.1), (5.4)) depends smoothly on the overlap features (overlap volume, centroid of the overlap polyhedron, normal of the contact area), it remains to show that these overlap features themselves vary smoothly. Smooth (continuous) relative motion of the particles means here a smooth change of the relative position of the centers of mass and of the orientation of the particles.

1. Obviously, for smooth relative motion, the variation of the overlap volume is also smooth, as all corner coordinates and intersection points vary smoothly. At the same time, the intersection line (the intersection of the surfaces of the two polyhedra) will also vary smoothly.


2. If the overlap volume varies smoothly, the position of the centroid and with it the force point will also vary smoothly.

3. If the position of the centroid varies smoothly, the inverse characteristic length will also vary smoothly.

4. Due to the smooth variation of the overlap volume, the position of the centroid and the inverse characteristic length, the magnitude of the elastic interaction force also varies smoothly.

5. It remains to be shown that the direction of the force also varies smoothly. As in point 1, smooth variation of the overlap region implies smooth variation of the intersection line, so the triangles which form the contact area also vary smoothly. As the direction is the weighted average of the normals of the contact triangles, it will also vary smoothly.

In the algorithm by the University of Illinois at Urbana-Champaign group [41, 42], new points entering the overlap region created the discontinuities. If in our algorithm a new vertex becomes part of the overlap polyhedron, its effect is weighted by the area change, which is initially zero, so that the variation is still smooth in this case.

Fig. 5.10 Two arbitrarily shaped polyhedral particles approaching each other

As an example, we show the time evolution of the volume, the center of mass and the normal direction for two particles of arbitrary (but convex) shape, as shown in Fig. 5.10. The time evolution of the volume of the overlap polyhedron is shown in Fig. 5.11, the center of mass in Fig. 5.12 and the normal of the contact area in Fig. 5.13 (left), all of which vary continuously during the contact process. The elastic force is proportional to the volume of the overlap polyhedron, acts at the center of mass of the overlap polyhedron and points along the normal of the contact area; thus it also varies continuously, as shown in Fig. 5.13 (right).


Fig. 5.11 The time evolution of the volume of the overlap polyhedron

Fig. 5.12 The time evolution of the center of mass of the overlap polyhedron

Differentiable variation is not necessary, and is also not implied in DEM codes: already for the simplest case of a linear force law in one dimension, independent of the particle shape, the force law is not differentiable (see section 1.5), something which is taken care of by the use of stiff solvers. The velocity-dependent force laws will in general also not be smooth, as they are maximal at the closing and opening of a contact, and zero immediately before, respectively immediately after.

Fig. 5.13 The time evolution of the contact normal and the elastic force

With the use of stiff solvers this is not a problem, as in the case of dense packings, the contact velocities are very small. Nevertheless, a non-smooth variation of the force direction would lead to considerable noise, which would be devastating for the reliability of the simulation and the stability of the results.

5.2.5 Dissipative force in normal direction

In our two-dimensional force model for contacting polygons [108, 106], in analogy to the harmonic oscillator ẍ = −kx − γẋ, where in addition to the elastic force −kx a viscous force −γẋ is introduced, we model the dissipative force proportional to the area change dA/dt of the overlap polygon and use the characteristic length Lc to fix the unit. To make the dissipation equivalent for various masses m and stiffness constants k, a prefactor √(km) is additionally multiplied with γ, which for our two-dimensional model was γ√(Y^(2D) m), with [N/m] as the unit of the two-dimensional Young's modulus Y^(2D), which acts like the spring constant k of the oscillator. The damping force in the normal direction for a two-dimensional contact of polygonal particles takes the form

||fd||^(2D) = γ (√(Y^(2D) m) / Lc) dA/dt,

which is a signed quantity according to the sign of the change of the overlap area dA/dt. The damping force acts along the normal direction as defined in section 5.1.


To generalize the two-dimensional damping force to three-dimensional contacts, we use the change of the volume δVo of the overlap polyhedron. Since Y has the unit [N/m²], an additional length unit must be dealt with under the root, so all in all a higher power of the characteristic length Lc has to be used. We choose the magnitude of the damping force ||fnd|| proportional to the change of the volume of the overlap region as

||fnd|| = γ^n √(Y Mred / Lc³) δVo/δt,

(5.5)

in which γ^n is a dimensionless damping coefficient for the normal force and Mred is the reduced mass given by Mred = mi mj/(mi + mj), where mi and mj stand for the masses of the particles Pi and Pj, respectively. ||fnd|| is a signed quantity whose value is smaller than zero when δVo < 0. The choice of the damping coefficient γ^n for DEM simulations can be based on the coefficient of restitution of the material to be investigated, a quantity which tells how fast energy is dissipated in normal collisions [109].

Fig. 5.14 Sketch of the variation of the total normal force (thick black line) as the sum of the elastic force (gray line) and the damping force (dashed line) from impact (at t = 0) to T, with the time T* where the elastic and the damping force compensate each other exactly. In the DEM simulation, we need to set the total normal force to zero after the time T*.

To avoid an unphysical “attractive” contact force in DEM simulations of dry granular materials (as shown in Fig. 5.14), the following modification is needed when the damping force ||fnd|| is larger than and opposite to the elastic component ||fe||:

||fnd|| = −||fe||, if ||fe|| + ||fnd|| < 0.

(5.6)


For a contact pair of polyhedral particles Pi and Pj (see Fig. 5.2, left), suppose that n is oriented such that n · ri < 0; then the total normal force on the particle Pi is given by

fn = (||fe|| + ||fnd||) · n,

(5.7)

while −fn acts on the particle Pj .

5.2.6 Estimation of the time step

Since the normal force is defined in analogy to the harmonic oscillator, Eq. (5.7), whose eigenfrequency of the undamped vibration is determined by the mass m and the stiffness constant k as ω = √(k/m), we can estimate the vibration frequency of a contact between two particles as

ωc = √(k/m) ≈ √( Y·Lc / (ρ l³) ) ≈ (1/l) √(Y/ρ),

(5.8)

in which ρ is the density of the material, l the length of the particle and m the reduced mass of both particles. From Eq. (5.8) we see: the larger the Young's modulus Y and the smaller the density ρ and the particle size l, the higher the vibration frequency ωc of the contact. For a collision of particles, the contact time τc is roughly half a period,

τc ≈ (1/2)(2π/ωc) = π · l · √(ρ/Y).

(5.9)

For a simulation with parameters Y = 10⁷, ρ = 10³ and l = 10⁻², all in SI units, the estimate for the contact time τc is of the order of 10⁻⁴ seconds. Thus a suitable time step size to resolve the contact process is of the order of 10⁻⁵ to 10⁻⁶ s, so that we have 10 to 100 evaluations in the numerical approximation of a contact duration. Contact times τc of the order of 10⁻⁴ s and time step sizes of the order of 10⁻⁵ s to 10⁻⁶ s are typical in our DEM simulations. For BDF integrators (see section 2.2.2), a resolution with 10 steps is usually enough; less stable integrators need more steps per collision.
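The estimate of Eq. (5.9) is easily scripted; the helper name is hypothetical:

```python
import numpy as np

def contact_time(Y, rho, l):
    """Contact duration tau_c = pi * l * sqrt(rho / Y), Eq. (5.9)."""
    return np.pi * l * np.sqrt(rho / Y)

# parameters from the text, SI units
tau = contact_time(Y=1e7, rho=1e3, l=1e-2)   # of the order of 1e-4 s
dt = tau / 30                                # 10-100 evaluations per contact
```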

5.3 Modeling of the tangential force

The need to model the tangential force arises from the difficulty of implementing Coulomb friction numerically exactly for static friction, a problem not limited to DEM simulations of granular materials, but a general problem for numerical approaches to the dynamics of many-body systems. While many-body systems of rigid particles with Coulomb friction forces have been dealt with some time ago via “contact dynamics” [60] (see section 1.4.3), for the case where the Coulomb friction forces are reactions to elastic normal forces with velocity constraints in many-body systems, a general solution is still outstanding. In


granular research, friction is usually “approximated” with the “breaking spring” model dating back to Cundall and Strack [24], who first introduced the discrete element method (DEM) for granular materials research, or worse, by “fudging” a velocity-dependent, viscous-like damping (e.g. Chong et al. [110]), or it is neglected entirely [111]. So far, numerically exact solutions for static friction have been discussed for one- and two-dimensional problems, treating static friction as a non-holonomic constraint [74, 68, 112]. For three dimensions, there are still open questions, due to foreseeable difficulties, among them the lack of uniqueness of the tangential direction: while in one or two dimensions the external forces between particles can be compensated uniquely by the static friction force at the contact, in three dimensions the force directions may be skewed and thus no unique direction for the friction exists a priori. Instead of seeking the exact solution for Coulomb friction, we generalize the Cundall-Strack friction model, which has worked well in two dimensions [106, 108], to three dimensions. In this section, we first explain how Cundall and Strack's friction model works in two dimensions, and then how we adapt and implement it in three dimensions in our current DEM code. The caveats of using such models are also discussed.

5.3.1 Cundall-Strack friction in two dimensions

Fig. 5.15 Deformation of the “tangential spring” in the Cundall-Strack friction model during a contact process (approach at t0, contact at t1, sliding at t2 to t4, stopped at t5); the friction in the model is approximated as a spring in the tangential direction.

In two dimensions, the tangential contact force introduced by Cundall and Strack is modeled incrementally [24]: at time t, the magnitude of the tangential force ft is given by ||ft (t)|| = ||ft (t − dt)|| + kt ||vt || · dt,

(5.10)


in which kt is the tangential stiffness, vt the tangential velocity; the direction of ft is in the direction opposite to vt ; additionally, the magnitude of the tangential force is checked against the maximal possible value µ||fn || to apply the rule ft (t) = sgn(ft (t)) · µ||fn (t)|| if ||ft (t)|| > µ||fn ||,

(5.11)

where sgn is the sign function, µ the friction coefficient and fn the normal contact force given by Eq. (5.7). Since the direction of the tangential force is obtained from the tangential velocity, the Cundall-Strack model uses a scalar increment of the magnitude. As kt must have the dimension of a spring constant, the Cundall-Strack model is sometimes referred to as a model of breaking tangential springs.
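In scalar form, the update of Eqs. (5.10)-(5.11) can be sketched as follows (a hypothetical helper; all quantities are signed scalars along the tangential direction):

```python
def cundall_strack_2d(ft_prev, v_t, k_t, mu, fn_mag, dt):
    """One Cundall-Strack update of the tangential "spring", Eqs. (5.10)-(5.11).

    ft_prev, v_t: signed tangential force and velocity; fn_mag = ||f_n||.
    Returns the signed tangential force, capped at the Coulomb limit.
    """
    ft = ft_prev - k_t * v_t * dt             # incremental spring update
    if abs(ft) > mu * fn_mag:                 # Coulomb cutoff, Eq. (5.11)
        ft = (1.0 if ft > 0 else -1.0) * mu * fn_mag
    return ft
```

Without the cutoff, repeated application of this update is exactly the harmonic-oscillator equation discussed below.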

Fig. 5.16 Intended behavior of the Cundall-Strack friction (thick black dashed line) and actual oscillatory behavior (thin black full line).

The behavior we want to obtain with this model is the one in Fig. 5.16: a monotonic increase of the tangential force until saturation occurs at the theoretical value. Unfortunately, the behavior will not be monotonic: if we divide Eq. (5.10) by dt, we obtain (in scalar form)

dft/dt = −kt vt.

(5.12)

If we integrate this equation over dt, we see that it is essentially the equation of a harmonic oscillator. That means the tangential force does not always act strictly opposite to the actual velocity: due to the inertia of the “harmonic oscillator”, it may, because of the delay, even act in the direction of the actual velocity. Only when the tangential force reaches the value for sliding friction (the condition before Eq. (5.11)) is energy dissipated. Without additional damping, the tangential force leads to oscillatory behavior (see the thin full line in Fig. 5.16); only additional damping leads to a purely monotonic curve (black


dashed line in Fig. 5.16). While the damping may reduce the oscillations, one also sees that the Cundall-Strack model leads to a tangential friction whose action is delayed in comparison to the “exact” friction.

Fig. 5.17 In the two-dimensional case, the tangential friction from the previous time-step ft(t − dt) is used irrespective of a possible shift of the direction at the new time-step.

As shown in Fig. 5.17, we can interpret the directions via vector projection: the tangential force from the previous time-step ft(t − dt) is projected onto the tangential direction of the current time-step, along the current contact velocity vt(t), while its magnitude is kept by rescaling the projection of ft(t − dt). The new tangential force ft is then obtained by the vector addition of the incremental force −kt vt dt for the time interval dt and the rescaled projection of the previous tangential force ft(t − dt).

5.3.2 Cundall-Strack friction in three dimensions To generalize the Cundall-Strack model to three dimensions, several issues should be addressed. In three dimensions, the manifold of the tangential movement is two-dimensional (r1 (t), r2 (t)), while the actual trajectory of the contact point is three dimensional (x(t), y(t), z(t)). Therefore, we need a way to relate the contact manifold (r1 (t), r2 (t)) with the contact trajectory (x(t), y(t), z(t)) in a unique way. At the same time, we want to retain the incremental feature of the Cundall-Strack model in two dimensions that the magnitude of the contact force at the previous timestep is used irrespective of a shift in the direction. The three concepts, projection, rescaling and vector addition, are crucial to adapt the Cundall-Strack model for three dimensional contacts: 1. Projection: During the advance from time-step t − dt to time-step t, where we have ˆ (t) and the new tangential velocity vt (t), we project the old the new contact normal n


tangential force ft(t − dt) onto the new tangential plane,

ft(t − dt)p = ft(t − dt) − (ft(t − dt) · n̂(t)) n̂(t).

(5.13)

2. Rescaling: We then rescale the projection ft(t − dt)p to the magnitude of the previous tangential force ||ft(t − dt)||,

ft(t − dt)r = ||ft(t − dt)|| · ft(t − dt)p / ||ft(t − dt)p||.

(5.14)

3. Vector addition: To the rescaled projection ft(t − dt)r, the incremental vector for the interval dt is added,

ft(t) = ft(t − dt)r − kt vt(t) dt.

(5.15)

A cut-off is applied if the result of the vector addition exceeds the maximal friction allowed (the dynamic friction),

ft(t) = µ||fn(t)|| · ft(t)/||ft(t)|| if ||ft(t)|| > µ||fn||,

(5.16)

to obtain the tangential force as dynamic friction. Applying the above scheme, we obtain a three-dimensional vector ft(t) which lies in the tangential plane of the contact and has the incremental property; it will be used as an approximation of friction in our DEM simulations. A test case of a block on an inclined plane will be used in section 5.4 to validate the generalization of Cundall-Strack friction to three dimensions. In practice, the incremental term −kt vt(t)dt in Eq. (5.15) deserves more consideration: similar to the elastic force model, where the one-dimensional “spring constant” k (with unit [kg/s²], equivalent to “force/length”) of the harmonic oscillator is replaced by a “two-dimensional” modulus Y (with unit [kg/(m·s²)], equivalent to “force/area”), for the tangential stiffness constant in a three-dimensional problem we need the characteristic length Lc (Eq. (5.3)) to fix the unit of the tangential force increment,

∆ft = Yt · Lc vt(t)dt,

(5.17)

when using a “tangential Young's modulus” Yt [106] instead of a spring constant kt. The tangential Young's modulus is often chosen as Yt = νY (0 < ν < 1), in which ν is functionally an analogue of the Poisson ratio, as a measure of the resistance of a particle in force equilibrium (which leads to static friction) against a change of position along the tangential direction (as Y measures the resistance against changes along the normal direction). In our DEM simulations, Yt = (2/7)Y is used, as in our two-dimensional simulations for polygonal particles [25, 106, 108]. If Yt is used instead of kt in Eq. (5.15) and the length dimension


Lc is missing, then for cases where the characteristic length is small (Lc < 1, which is usually the case in our DEM simulations with small particles), we will have a too large increment term,

Yt vt dt > (Yt vt dt) Lc,

(5.18)

which may render the “incremental” mechanism (Eq. (5.15)), the essence of the model, dysfunctional. For example, for a simulation with Lc of the order of 10⁻³, the incremental tangential force without the factor Lc would be 100 to 1000 times larger than the proper increment ∆ft in Eq. (5.17). Due to the cut-off in Eq. (5.16), the tangential force would then be maximal almost all the time in the simulation, which means the “incremental” mechanism would not work and the model would be equivalent to dynamic friction, neglecting static friction entirely.
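The projection, rescaling, increment and cutoff of Eqs. (5.13)-(5.17) can be sketched as one update step; the function name and signature are hypothetical, not the thesis code:

```python
import numpy as np

def cundall_strack_3d(ft_prev, n_hat, v_t, Y_t, L_c, mu, fn_mag, dt):
    """One 3D Cundall-Strack update, Eqs. (5.13)-(5.17)."""
    # 1. Projection of the old tangential force onto the new tangential plane
    ft_p = ft_prev - np.dot(ft_prev, n_hat) * n_hat          # Eq. (5.13)
    # 2. Rescaling of the projection to the old magnitude
    norm_p = np.linalg.norm(ft_p)
    if norm_p > 0.0:
        ft_r = np.linalg.norm(ft_prev) * ft_p / norm_p       # Eq. (5.14)
    else:
        ft_r = np.zeros(3)
    # 3. Vector addition of the increment, Y_t * L_c fixing the units
    ft = ft_r - Y_t * L_c * v_t * dt                         # Eqs. (5.15), (5.17)
    # 4. Coulomb cutoff at the dynamic friction, Eq. (5.16)
    norm_t = np.linalg.norm(ft)
    if norm_t > mu * fn_mag:
        ft = mu * fn_mag * ft / norm_t
    return ft
```

With vanishing tangential velocity the previous force is carried over unchanged (static friction); with a large velocity increment the cutoff caps the result at µ||fn||.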

5.3.3 Dissipative force in tangential direction

The damping force in the tangential direction for polygonal particles in two dimensions [106] takes the following form,

ftd = −γ^t √(Yt Mred^t) vt,

(5.19)

in which Mred^t is the reduced “tangential mass”,

Mred^t = 1 / (1/mi + 1/mj + ri²/Ii + rj²/Ij),

(5.20)

which takes into account the moments of inertia Ii and Ij of the particles. To adapt it to three dimensions, we need to consider two things: first the dimension of the equation and second the form of the reduced “tangential mass”. Since Yt has the unit [kg·s⁻²·m⁻¹], compared to [kg·s⁻²] in two dimensions, the characteristic length Lc enters the tangential damping force in three dimensions:

f_t^d = −γ^t √(Y_t · M_red^t · L_c) v_t.

(5.21)

M_red^t is the reduced “tangential mass” in three dimensions; the moments of inertia of the particles in three dimensions are matrices rather than the scalar quantities of Eq. (5.20). We consider the tangential velocity at the contact point for two particles i and j,

v_t = [v_1 + ω_1 × r_1 − (v_2 + ω_2 × r_2)]_t.

(5.22)
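The projection implied by the subscript t in Eq. (5.22) can be sketched in a few lines. This is an illustrative implementation assuming a known unit contact normal n (the helper names are ours, not from the thesis code):

```python
def cross(a, b):
    # 3D cross product of two 3-tuples
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def add(a, b): return tuple(x + y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def tangential_contact_velocity(v1, w1, r1, v2, w2, r2, n):
    # Eq. (5.22): relative velocity of the two material points at the
    # contact, projected into the tangential plane with unit normal n
    v_rel = sub(add(v1, cross(w1, r1)), add(v2, cross(w2, r2)))
    vn = dot(v_rel, n)                       # normal component
    return tuple(v - vn * ni for v, ni in zip(v_rel, n))

# hypothetical configuration: particle 1 translating along x, normal along z
v1, w1, r1 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, -0.05)
v2, w2, r2 = (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.05)
n = (0.0, 0.0, 1.0)
vt = tangential_contact_velocity(v1, w1, r1, v2, w2, r2, n)
```

By construction, the returned vector has no component along n.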

Its time derivative gives the tangential acceleration

v̇ = v̇_1 + ω̇_1 × r_1 + ω_1 × ṙ_1 − (v̇_2 + ω̇_2 × r_2 + ω_2 × ṙ_2),

(5.23)


where the subscript “t”, which indicates the tangential direction, is dropped. Substituting ṙ_i = ω_i × r_i into Eq. (5.23) and rearranging the terms gives

v̇ = v̇_1 − v̇_2 + ω̇_1 × r_1 − ω̇_2 × r_2 + ω_1 × (ω_1 × r_1) − ω_2 × (ω_2 × r_2),

(5.24)

where ω_i × (ω_i × r_i) vanishes for round particles, since it is along the r_i direction and thus has no component in the tangential direction. We denote A = ω_1 × (ω_1 × r_1) − ω_2 × (ω_2 × r_2), since it does not contain the force terms explicitly; substituting ω̇_i = I_i⁻¹((I_i ω_i) × ω_i + r_i × f_i) into Eq. (5.24) yields

v̇ = v̇_1 − v̇_2 + (I_1⁻¹(r_1 × f_1)) × r_1 − (I_2⁻¹(r_2 × f_2)) × r_2 + A + B,

(5.25)

where

B = (I_1⁻¹((I_1 ω_1) × ω_1)) × r_1 − (I_2⁻¹((I_2 ω_2) × ω_2)) × r_2

is another term which does not contain the force explicitly. Since the two particles are in contact and we ignore the influences of the other particles as well as gravity, we have f_2 = −f_1, and Eq. (5.25) becomes

v̇ = (1/m_1 + 1/m_2) f_t
  + (I_1⁻¹(r_1 × f_t)) × r_1 + (I_2⁻¹(r_2 × f_t)) × r_2
  + (I_1⁻¹(r_1 × f_n)) × r_1 + (I_2⁻¹(r_2 × f_n)) × r_2 + A + B,

(5.26)

where f_t and f_n are the tangential and the normal components of f_1. For round particles, the term (I_i⁻¹(r_i × f_n)) × r_i can be dropped, since r_i is parallel to f_n, while for polyhedral (or polygonal) particles it usually cannot. The reduced tangential mass for two-dimensional problems, Eq. (5.20), comes from the simplification of Eq. (5.26) for two-dimensional discs:

v̇ = (1/m_1 + 1/m_2 + r_1²/I_1 + r_2²/I_2) f_t + B
  = (1/M_red^t) f_t + B,

(5.27)

which can be treated as a physical harmonic oscillator with mass M_red^t, whose viscous damping force would be

f_d = −γ √(k · M_red^t) v.

Since it is difficult to decouple the tangential and the normal components of the contact force in Eq. (5.26) and to write it in the form of a harmonic oscillator along the tangential direction alone, as we could do for two-dimensional discs in Eq. (5.27), we did not proceed to derive the reduced “tangential mass” for polyhedral particles. Moreover, the derivation is only exact for two-particle problems, while most of the time our particles will have more


contacts. Instead, we use the same value as for the dissipative force in the normal direction,

M_red^t = 1 / (1/m_i + 1/m_j),

(5.28)
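The difference between the two-dimensional reduced “tangential mass” of Eq. (5.20) and the purely translational value of Eq. (5.28) can be checked numerically. The sketch below assumes two equal discs with the moment of inertia I = m r²/2; all numbers are illustrative:

```python
def reduced_tangential_mass_2d(mi, mj, ri, rj, Ii, Ij):
    # Eq. (5.20): reduced "tangential mass" for two discs in 2D,
    # including the rotational terms r^2/I
    return 1.0 / (1.0/mi + 1.0/mj + ri**2/Ii + rj**2/Ij)

def reduced_mass_3d(mi, mj):
    # Eq. (5.28): purely translational reduced mass used instead in 3D
    return 1.0 / (1.0/mi + 1.0/mj)

# two equal discs (hypothetical data): m = 1 kg, r = 0.1 m, I = m r^2 / 2
m, r = 1.0, 0.1
I = 0.5 * m * r**2
m2d = reduced_tangential_mass_2d(m, m, r, r, I, I)   # = 1/6
m3d = reduced_mass_3d(m, m)                          # = 1/2
```

Because m3d > m2d and the damping force scales with the square root of the mass, the choice of Eq. (5.28) yields a larger damping force, consistent with the overdamping discussed in the text.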

for polyhedral particles Pi and Pj in our DEM code. Compared with its two-dimensional counterpart in Eq. (5.20), it may overdamp, since its effect on the rotational degrees of freedom has not been taken into account. Cundall and Strack suggested choosing the same value for the normal damping constant γn and the tangential damping constant γt [24]. As a compensation for our overdamping, we could take the damping constant for the tangential direction as half of that for the normal direction. In practice, this is not necessary: we do not want oscillations along the tangential direction anyway, so an overdamped system is preferable. What really has to be taken into consideration is whether the choice of “mass” and “stiffness” in the damping model increases the damping so much that the corresponding system of equations becomes stiff. This would make the timesteps in explicit integrators prohibitively small, but it is not a problem with our BDF-integrators. The total force in the tangential direction is then

f_t = f_t^e + f_d,

(5.29a)

f_t = sgn(f_t) · µ||f_n|| if ||f_t|| > µ||f_n||,

(5.29b)

where fet is the tangential force from Eq. (5.15) or its cut-off, Eq. (5.16).
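The interplay of the incremental update and the Coulomb cut-off can be shown with a scalar (one-dimensional) sketch of Eqs. (5.15)/(5.17) and (5.29b). All parameter values are hypothetical; the real model works with three-dimensional vectors in the tangential plane:

```python
def update_tangential_force(ft_prev, vt, dt, Yt, Lc, mu, fn):
    # One step of the incremental tangential force (scalar sketch of
    # Eqs. (5.15)/(5.17)) with the Coulomb cut-off of Eq. (5.29b)
    ft = ft_prev - Yt * Lc * vt * dt      # incremental update
    fmax = mu * abs(fn)
    if abs(ft) > fmax:                    # cut-off: dynamic friction
        ft = fmax if ft > 0.0 else -fmax
    return ft

# a contact sliding at constant velocity (all values hypothetical):
# the stored force ramps up linearly, then saturates at mu*|fn|
ft = 0.0
history = []
for _ in range(10000):
    ft = update_tangential_force(ft, vt=-0.01, dt=1e-4,
                                 Yt=1e4, Lc=1e-2, mu=0.5, fn=1.0)
    history.append(ft)
```

As long as the force stays below the cut-off, the “spring” stores the tangential displacement (static friction); once the cut-off is reached, the force stays at µ||fn|| (dynamic friction).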

5.3.4 Caveats

In the Cundall-Strack model [24] and its generalization [113, 114], the tangential force is incremented from the previous time step, with some additional modifications of the direction, and there is a cut-off determined by the normal force and the friction coefficient. Such models are widely used and are believed to yield reasonable results when the velocity variations in the system are not large, e.g. when shearing a granular system or in dynamic processes like heap formation. There are several drawbacks of such a friction model compared to the (currently not accessible) analytic solution. While actual solid friction is practically instantaneous, there is a delay in the Cundall-Strack model which depends on the spring constant: the larger kt in Eq. (5.12) is chosen, the smaller this delay. Nevertheless, a large choice of kt necessitates a reduction of the timestep dt for the time integration, which otherwise depends only on Young’s modulus and the particle mass, i.e. the necessary amount of computer time increases for “better” modeling. Another problem is that the “tangential spring” is an unphysical degree of freedom which may randomly store and release energy and therefore act as a noise term. Especially for strongly oscillatory problems with high amplitudes, the consequences for the verisimilitude of the simulation are difficult to fathom.

5.4 A simple test: a cube on an inclined plane

We show how the normal and the tangential force models work via a simple test case: a cube is placed above the inclined face of a wedge (see Fig. 5.18) without contact. When the simulation starts, the block falls down along the z-axis during the first time step and takes some time to find its equilibrium position, as can be seen from the variation of the volume of the overlap polyhedron in Fig. 5.19.

Fig. 5.18 A block on a wedge: the edge of the cubic block is 0.1 m, while the wedge is 0.5 m high along the z-axis and 1 m long along the x- and y-axes.

The density of the block is chosen as 1 kg/m³ and Young’s modulus as Y = 10⁴ N/m². The tangential stiffness is 0.2Y and the damping coefficients γn and γt are both set to γ = 0.2. Since we have an initial velocity anyway, we choose a friction coefficient µ = 0.6, larger than the µ̃ = 0.5 which corresponds to the angle of the wedge. The normal force is computed according to Eq. (5.7) and the tangential force according to Eq. (5.29). As can be seen in Fig. 5.20, the normal force converges to G cos(θ), where G is the gravity force, and the tangential force converges to G sin(θ), which is smaller than the value µG cos(θ) for dynamic friction, see Fig. 5.21. As can be seen from Fig. 5.22 and Fig. 5.23, the variation of the positions and forces converges to zero. When assessing the oscillatory motion, it should not be forgotten that for the actual problem, bouncing in the vertical direction also takes place, so the physical time evolution of the forces and velocities is not monotonous either.
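The expected equilibrium values for this test can be verified with a short calculation. The geometry and material data are as given above; g = 9.81 m/s² is our assumption, since the gravitational acceleration used in the simulation is not quoted here:

```python
import math

# Equilibrium check for the cube-on-wedge test of section 5.4
edge = 0.1                    # cube edge [m]
rho = 1.0                     # density [kg/m^3]
g = 9.81                      # gravitational acceleration [m/s^2] (assumed)
mu = 0.6                      # friction coefficient
theta = math.atan(0.5 / 1.0)  # wedge slope: 0.5 m height over 1 m length

G = rho * edge**3 * g         # weight of the block [N]
Fn = G * math.cos(theta)      # expected limit of the normal force
Ft = G * math.sin(theta)      # expected limit of the tangential force

# the block sticks (static friction), because tan(theta) = 0.5 < mu = 0.6,
# so the converged tangential force stays below mu*G*cos(theta)
is_static = Ft <= mu * Fn
```

This reproduces the behavior of Figs. 5.20 and 5.21: the normal force converges to G cos(θ) and the tangential force to G sin(θ) < µG cos(θ).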

120

FORCE MODELING

Fig. 5.19 The volume of the overlap region: overlap volume (m³, scale 10⁻⁸) versus time (s).

Fig. 5.20 Normal force and its convergence: Fn compared with G cos(θ); force (N, scale 10⁻³) versus time (s).

A SIMPLE TEST: A CUBE ON AN INCLINED PLANE

121

Fig. 5.21 Tangential force and its convergence: Ft compared with µG cos(θ) and G sin(θ); force (N, scale 10⁻³) versus time (s).

Fig. 5.22 Position equilibrium of the block on the wedge: X, Y and Z positions (m) versus time (s).


Fig. 5.23 Force equilibrium of the block on the wedge: FX, FY and FZ (N, scale 10⁻³) versus time (s).

Chapter 6

Parallelization

In this chapter, we discuss the parallelization of our DEM simulation, both with respect to the type of parallelization which was chosen and the types of parallelization which were avoided. We start with a list of computer-science terminology which is relevant for the later discussion. Then we explain the choice of FORTRAN over C/C++. After the discussion of parallel programming models and the related hardware, we give a brief introduction to the usage of OpenMP, the current standard for shared memory parallelization. Last, we discuss how to parallelize our DEM code with OpenMP for multi-core machines.

6.1 Glossary

In the following, we give a brief alphabetic glossary of terms from high performance computing which will be used in the later sections of this chapter.

Cache is the “fast” memory between the main memory and the registers of the CPU. Nowadays, there are usually cache levels 1 to 3, level 1 being the fastest, closest to the registers of the CPU, and usually on-chip. On multi-core machines, the level 1 and level 2 caches are usually assigned to a single core, while the level 3 cache may be accessed by several cores and can therefore also serve as a communication device between the cores.

Cache coherence means that the data necessary for some operations are accessible in the cache, or at least by the next loading operation from the next-lowest memory into the cache. If the data are not available, this leads to cache misses, which cause idling time for the CPU during which no operations can be performed.

Cache miss see cache coherence.


Distributed memory means that each node has its own memory. Parallelism takes place by message passing, the sending of data between the nodes over a network.

Domain decomposition is a coarse-level parallelization strategy where each node performs the computation for a certain domain. The information at the domain boundaries must usually be treated in a special way, so that nodes which are responsible for neighboring domains can obtain the necessary information. Domain decomposition is usually the method of choice for message-passing parallelization of problems with narrow interaction range, as the amount of communication can be limited. For many applications with partial differential equations, where a matrix inverse must be parallelized, domain decomposition is not so efficient.

Granularity refers to the level on which the parallelization occurs: if a matrix-matrix multiplication is parallelized with block matrices, one speaks of “coarser” granularity than if the computation of the inner loop is distributed over the same number of nodes.

Message passing is a parallelization strategy where data are exchanged between different nodes which cannot be accessed by use of shared memory alone.

Efficiency for a parallelized program is

E = t_s / (n · t_n)

on n nodes, where the program took the time t_n to execute in parallel, while it took t_s on a single node. It gives the percentage of the execution time during which the parallel processors are actually doing the work a single processor does in the original scalar program. If the program modifications necessary for the parallelization need a lot of additional time (e.g. due to communication), the efficiency can become very small. Unfortunately, in such cases it is possible to “improve” the efficiency by comparing with an unnecessarily time-consuming single-node code to make the efficiency look better.

Node A computing node is a unit which executes parts of a parallel program independently. While for shared memory parallelization a single core of a multi-core processor is a unit, in message passing programs on specialized computers a whole main board made up of several processors may act as a node. With current multiprocessor multi-core machines, nodes can also be the cores of different processors.

Kernel is the part of the program which is iterated and consumes most of the computer time. For granular material simulations, it is not a single task, but neighborhood routines, overlap computation and time integration. Usually, it is enough in parallel applications to parallelize the kernel; input and output must be parallelized only for excessively large amounts of data.


Load balancing is the distribution of work over the nodes of a parallelized program. To maximize speedup, the workload should be distributed as uniformly as possible over the nodes, so that no processor has to wait.

MPI (Message Passing Interface) is the current standard library for parallelization with message passing. Calls to the library have to be inserted explicitly as function calls.

OpenMP (Open Multi-Processing) is the current standard for shared memory parallelization, for use with parallelizing compilers. Parallel execution is obtained by marking program sections for parallelization with “pragmas”, special flags which are syntactically comments in the language standard and which are ignored by non-parallelizing compilers.

Shared memory means that different nodes can access the same main memory. The connection between nodes for parallel computing is then simply realized by some processors writing data into the memory while other processors read these data.

Speedup is the quotient

S = t_s / t_n

for a program which takes t_n on n processors of a parallel computer and t_s on a single node. Especially for shared memory parallelization on multi-core processors, a speedup larger than n is possible: computation is usually delayed by the limited amount of cache memory, and as each core of a multi-core machine has its own cache, running a program on an n-core machine means running it with n times the cache of a single-core run.
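The two glossary quantities can be written down directly; the timing numbers in the example are hypothetical:

```python
def speedup(ts, tn):
    # S = ts / tn: single-node run time over n-node run time
    return ts / tn

def efficiency(ts, tn, n):
    # E = ts / (n * tn) = S / n: fraction of the parallel run time
    # during which the n processors do useful (scalar-equivalent) work
    return ts / (n * tn)

# hypothetical timings: 100 s on one node, 30 s on 4 nodes
S = speedup(100.0, 30.0)         # about 3.33
E = efficiency(100.0, 30.0, 4)   # about 0.83, i.e. 83%
```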

6.2 Compiler: FORTRAN versus C

History: FORTRAN is the “first” programming language, conceptualized as “Formula Translation” for scientific applications. Nevertheless, the standard is continuously updated, the most notable revisions being FORTRAN77 and FORTRAN90. Because in recent years the attention of compiler and processor manufacturers has drifted away from accuracy concerns towards performance, the possibility to compare with older compilers and older hardware will be an important aspect of code verification.

Accuracy: The FORTRAN standard defines accuracy via declaration statements, in contrast to the C/C++ standards, which do not define the accuracy at all, apart from the fact that “double” should be at least as accurate as “float”. This ensures portability to other hardware platforms with comparable accuracy. For modern compilers and processors, if the accuracy is not maintained via compiler flags, there is a danger that the actual computations for “double precision” are not performed with the full 8 bytes, but only with about 6 bytes, corresponding to 9 or 10 digits of accuracy instead of 16.


Modules: The FORTRAN-95 standard does not offer objects (unlike C++, which dates from the 1980s), because there are serious performance issues involved with instantiating and deleting objects. Nevertheless, FORTRAN90 introduced modules as name-spaces, which can be used similarly to C++ objects, but in a more down-to-earth (and less performance-endangering) fashion: data encapsulation occurs via variable names which are extended with the name of the module they occur in, and are therefore not accessible by other modules. Interfaces and data hiding are monitored by the compiler, not by separate tools (as in C) which are unrelated to the language syntax. Instances (objects of a C++ class) cannot be realized with modules, but the need to use any has not arisen up to now, and creating instances during run-time would not be efficient compared to allocation at compile-time.

Cache usage: In the devising of the FORTRAN standards, special attention is paid to performance issues. In contrast to C/C++, FORTRAN compilers do not allow the same argument to be passed into a function several times. If only pointers to the beginning of an array are passed, the memory access structure becomes unclear, and the data all have to be taken from the main memory instead of the cache, which slows down the program considerably. Also, a more transparent memory layout due to the use of indices instead of pointers helps compilers to optimize code. The problems with the cache for pointer usage become worse for multi-core architectures, where the bandwidth is already at its limits even if the caches of the individual cores can be used.

Hard typing vs soft typing: Currently, the program uses soft typing, i.e. the double precision data are declared as double precision. Hard typing would mean the introduction of new data types, like data types for mass, velocity and position, which in the interface definition would then only be allowed to be used with specified other data types. This reduces programming mistakes in projects with many programmers and many data types; in our case, the relation of the variables is defined by physics anyway, so there is no merit in hard typing for our simulation. On the other hand, the additional programming effort of defining the types and the interfaces would be prohibitive.

High-Performance-FORTRAN (HPF) is, in contrast to FORTRAN95, not a standard, but a proposal of features which (hardware and compiler) manufacturers are free to choose from in their implementations. Because portability of features in HPF is neither intended nor guaranteed, we avoid its use.

6.3 Programming model and hardware

6.3.1 Shared memory versus distributed memory

Fig. 6.1 Sketch of a shared memory (left) and a distributed memory computer (right): four CPUs connected via a system bus to one common memory, versus four CPUs, each with its own memory, connected by a network.

In shared memory computers, all processing nodes have access to the same data in the same physical memory, see Fig. 6.1, left. The parallelization is done by the compiler on the thread level: a Unix thread contains the instructions and the data for execution. All Unix processes are composed of threads anyway, which on single-core processors are computed sequentially. The programmer’s task is therefore to introduce pragmas into the source code, so that the compiler can identify data dependencies and assemble independent threads, which can then be executed independently and in parallel on different cores. For simple loops with no forward or backward dependency (see Fig. 6.2), the compiler usually performs “automatic parallelization” without any additional information. For subroutines for which the relation between input and output is not clear, especially if different calls access the same data, the relation between the data has to be indicated extensively. Nevertheless, the most complex task in shared memory parallelization, for a programmer who is the author of the scalar code, is to understand the effect of the different pragmas and compiler options. If the program is written clearly (loops for operations on vectors instead of function calls on single scalar data; subroutine interfaces where the programmer has a clear understanding of what the input and output data are), the programming effort is not excessive, especially if parts of the program are inherently parallel anyway, as the interactions between particle pairs are for granular materials. Therefore, this kind of parallelization strategy was chosen for the DEM code.

do i = 1,n              do i = 1,n              do i = 1,n
  a(i)=a(i)*b(i)          a(i)=a(i-1)*b(i)        a(i)=a(i+1)*b(i)
end do                  end do                  end do

Fig. 6.2 Independent loop (left), forward (middle) and backward dependency (right).
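Why the dependent loops of Fig. 6.2 must not be reordered or parallelized can be demonstrated with a Python sketch (the function names are ours; the loop bodies mirror the FORTRAN loops above):

```python
def independent(a, b):
    # Fig. 6.2, left: a(i) = a(i)*b(i) -- each iteration touches only
    # index i, so any execution order gives the same result
    return [x * y for x, y in zip(a, b)]

def forward_dependent(a, b):
    # Fig. 6.2, middle: a(i) = a(i-1)*b(i) -- iteration i reads the value
    # just written by iteration i-1, forcing sequential execution
    a = a[:]
    for i in range(1, len(a)):
        a[i] = a[i - 1] * b[i]
    return a

def forward_dependent_reversed(a, b):
    # the same loop body executed in reverse order gives a different
    # result, which is why the compiler must not reorder or parallelize it
    a = a[:]
    for i in reversed(range(1, len(a))):
        a[i] = a[i - 1] * b[i]
    return a

a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 2.0, 2.0, 2.0]
```

For the independent loop, any iteration order (and hence any distribution over threads) yields the same array; for the forward-dependent loop, the result changes with the execution order.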

Distributed Memory computing demands a much higher effort from the programmer than shared memory computing. Venturing into the realm of distributed memory computing is tempting, as it is easy to connect several workstations to a parallel computer just


using a LAN network, see Fig. 6.1, right. Nevertheless, the actual programming effort is difficult to estimate. While the effort to write a working and stable computer simulation of granular materials is of the order of three years, or one PhD thesis, the parallelization takes about the same effort: several researchers who made spirited attempts to parallelize granular particle codes, like Georg Ristow [115], with the extensive help of the superuser of the parallel computer R. Knecht, and Stefan Schwarzer [116], with the help of several PhD students, finally retired from academia, as the amount of programming work turned out to be disproportionate to the scientific merit, and the parallelization became a drag on their scientific output1. One reason is the long time which program development and debugging take with the message passing programming model: though there are tools and libraries which are supposed to help, the programmer has to take care that all the data are present whenever they are needed; as long as the data have not been received, the program must be halted. Small programming mistakes, which could be detected immediately on a shared memory machine, as the data can be compared across the nodes directly, may take days to debug, as first of all a strategy is needed to communicate the correct data redundantly. The development of parallel supercomputers also takes several years, so when they are delivered, single processors are already several times faster than the nodes in the parallel supercomputer. After three more years of development time, a new generation of single processors becomes available, so that very often, after the completion of a parallelization project, the same program runs as fast on the newest single processor as in the parallelized version.
An additional problem is that while the program would run faster on the newest available supercomputers, computer time there is only available after tedious reviewing processes, often with uncertain outcomes. When Colin Thornton, the current maintainer of P. Cundall’s BALL and TRUBAL programs, was asked at the Powders & Grains conference in Sendai in 2001 how he intended to speed up the code for parallel computing, he answered: “I won’t, I wait till single processors become faster”2.

6.3.2 MIMD versus SIMD

MIMD stands for Multiple Instruction Multiple Data parallelization, while SIMD stands for Single Instruction Multiple Data parallelization. Usually, shared memory parallelization as well as distributed memory parallelization follows the MIMD paradigm: the threads or processes which are generated by the compiler may contain different instructions (or the same instructions, but if-conditions which depend on the data lead to the execution of actually different instructions in different parallel threads). The MIMD approach is flexible enough to allow the independent execution of different subroutines with different amounts of data on different nodes. In contrast, SIMD computers perform identical operations in the same order on all nodes on different sets of data. After Thinking Machines replaced the SIMD Connection Machine 200 in the first half of the 1990s with the (at least internally) MIMD CM-5, nearly a decade passed without any SIMD architecture available on the market. In the meantime, GPGPUs (General-Purpose Graphics Processing Units) and TV game consoles (Playstation 3) have become veritable SIMD supercomputers, both from the point of view of programming (special libraries, similar to new languages, like CUDA for NVIDIA cards, have been devised) and from the point of view of hardware performance: under favorable conditions, hundreds or thousands of times the performance of a scalar chip can be obtained with graphics cards. Of course, there is a drawback: basically all operations must be performed in the same order, only the input data may change. While this is perhaps not a grave limitation for simulations with central potentials, for overlap computations of polyhedra, where each pair has a slightly different orientation, every overlap needs a slightly different treatment. Even if only the order of the operations differs, we have not been able to devise a strategy to implement our overlap computation in SIMD style on graphics cards, nor have any practitioners made us hope that such an approach would be feasible without losing orders of magnitude of the peak computing performance. Another aspect is that reliable data for performance and accuracy are difficult to obtain: from the point of view of computer graphics, there are characteristic computational problems like ray-tracing which are much less sensitive to rounding errors than our overlap and intersection computations. If extensive data preparation on the CPU is necessary to supply the GPGPU, accurate tools are necessary to analyze the relative time consumption of the different processes on the CPU and the GPGPU.

1 Personal communication H.-G. Matuttis
2 Personal communication H.-G. Matuttis
Nevertheless, on the Unix operating system which we used with an NVIDIA GTX-275 card, timing measurements were neither reliable nor reproducible, even in preliminary studies with dozens of repetitions of relatively coarse-grained functions from linear algebra. This is not an isolated experience of ours: researchers at the Earth Simulator who try to work with NVIDIA graphics cards have also repeatedly requested better analysis tools from the manufacturer, to no avail3. Due to these problems, which do not seem to be destined for a solution in the near term, we have abandoned any attempts to port the overlap computation to GPGPUs.

6.4 OpenMP for parallelization

OpenMP is the current de-facto standard for shared memory parallelization. The OpenMP instructions are inserted into a program as syntactical comments (in FORTRAN usually starting with !$omp, see Fig. 6.3), so the same code can be used for scalar execution if the OpenMP library is not included during compilation. In this section, we explain only the OpenMP language features [117, 118] which we have used to parallelize our DEM code. The main clauses of OpenMP we used are shown in Fig. 6.3. We create a parallel region with the pair of constructs !$omp parallel do and !$omp end parallel do. The way the loop iterations are distributed over the available threads is controlled by the schedule clause. All iterations are divided into several “chunks” [118]; for the dynamic schedule, a new chunk of iterations is assigned to a thread automatically when it has finished an old chunk and requests a new one. In a parallel run of loop iterations where several processor cores can access the same memory, the results of a program may change in the case of forward or backward dependencies if the operations are performed or the variables are accessed in the wrong order. If such a possibility exists, the compiler usually refuses to parallelize the code. To avoid memory access conflicts between processor cores, OpenMP asks for additional descriptions of the variables in a parallel region, which are specified by the shared, private, firstprivate and lastprivate clauses:

3 Personal communication Hide Sakaguchi

!......(scalar region)
!$omp parallel do schedule(dynamic)
!$omp& shared(variable list)
!$omp& private(variable list)
!$omp& firstprivate(variable list)
!$omp& lastprivate(variable list)
loop_to_be_parallelized: do cnt = 1, n
  !.........(loop region)
end do loop_to_be_parallelized
!$omp end parallel do
!......(scalar region)

Fig. 6.3 Syntax of the loop construct of OpenMP in FORTRAN: the loop of a program to be parallelized starts with !$omp parallel do and ends with !$omp end parallel do. OpenMP divides the loop iterations into pieces and assigns them to different threads (CPU cores) according to the argument of the schedule clause (dynamic is used here as an example). The variables involved in the parallel region carry attributes such as shared, private, firstprivate, lastprivate to avoid memory access conflicts.

Shared variables are shared among multiple threads. Only a single memory location is assigned to a shared variable. Such variables should be used with caution, to avoid multiple threads reading and writing the memory location simultaneously. The array which stores all the overlap information obtained from the overlap computation routine, as a preparation for the force computation, should be a shared variable.

Private variables are private to a thread, which means that for the “same” variable there are several memory locations, one for each thread. The value of a private variable is undefined before entering the parallel section as well as after leaving it. Special attention should be paid to the fact that the value of a variable with the same name as a private variable also becomes undefined after the parallel section is executed [118]. In the overlap computation for all particle pairs, the pair index should be private.


firstprivate variables retain the value they had before entering the parallel section, in a sense acting as the input variables of a parallel section. In the overlap computation for all particle pairs, the list of particle pairs can be specified as firstprivate.

lastprivate variables leave a parallel loop section with the values from the last loop iteration (the last iteration in sequential order). This clause is not used in the current parallel DEM code, since we are not only interested in the last iteration but in all the results of the overlap computations.

There is one more clause related to declaring the scope of variables, the threadprivate directive, which is not shown in Fig. 6.3. It is used to make “global” data private to a single thread; it is not located inside the region to be parallelized but at the place where the “global” data are defined, e.g. right after the module data declaration. We need the threadprivate directive to specify the module data shared by the subroutines which compute the overlap of two polyhedra, in case we want to assign one pair of polyhedra per thread for the overlap computation. Further aspects relevant for implementing the parallelization with OpenMP are the following:

1. Inappropriate use of the variable declarations will lead to hopelessly wrong simulation results, as we could ascertain: the compiler does not check the validity of the declarations, so the responsibility lies solely with the programmer. No parallelization of a loop marked with !$omp parallel do may result if the attribute (private, firstprivate or lastprivate) of the variables is not supplied.

2. To enable the parallelized execution of a code compiled with OpenMP parallel constructs, e.g. Fig. 6.3, the number of threads must be set beforehand in the operating system (e.g. for 4 cores in Unix with the bash shell via export OMP_NUM_THREADS=4), not in the program.

3. To profile a parallel code, the profiler pgprof works on data files created with the compiler option -Mprof (of the PGI FORTRAN compiler4) for multi-threading, i.e. the profile is made not only for the functions called, but also for each of the threads (e.g. numbered from 0 to 3 for an execution with 4 threads).

4. Sometimes source codes have idiosyncrasies or errors which are not detectable by a single compiler, and another compiler gives complementary or more easily understandable error messages. For this purpose, beside the PGI compiler we used, the GNU GFORTRAN compiler which comes free of charge with the UNIX installation is absolutely sufficient.

⁴ http://www.pgroup.com/
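The role of threadprivate described above can be sketched as follows. Since the thesis code is FORTRAN, the C analogue below is an illustration only: all variable and function names are invented, and the scratch variable stands in for the module data of the overlap routines. Without OpenMP enabled, the pragmas are simply ignored and the loop runs serially with the same result.

```c
/* Illustration only (names invented): "scratch_volume" plays the role of
   module data shared by the overlap subroutines; the threadprivate
   directive gives each thread its own copy of this "global" variable. */

static double scratch_volume = 0.0;     /* "module" data */
#pragma omp threadprivate(scratch_volume)

/* placeholder for the overlap computation of one particle pair */
static double overlap_of_pair(int pair_id)
{
    scratch_volume = 0.5 * pair_id;     /* safe: each thread writes its own copy */
    return scratch_volume;
}

/* sums the (dummy) overlap volumes of n_pairs particle pairs */
double total_overlap(int n_pairs)
{
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (int pair_id = 0; pair_id < n_pairs; pair_id++)
        total += overlap_of_pair(pair_id);
    return total;
}
```

Compiled with -mp (PGI) or -fopenmp (GCC/GFORTRAN), each thread updates only its own copy of scratch_volume; compiled without OpenMP support, the loop runs serially and returns the same sum.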


6.5 Parallelization of the DEM code

To parallelize our current DEM code, we first analyze the scalar code through profiling, to find out how much time is spent in each routine during a simulation. Based on the profiling result, we can choose a more fine- or coarse-granular approach to parallelize the DEM simulation. The coarse-granular approach, which parallelizes on the particle-pair level, is implemented with the PGI FORTRAN compiler on an AMD Opteron dual-processor quad-core machine, and a speedup of 1.4 with a parallelization efficiency of about 35% is achieved.

6.5.1 Profiling of the scalar code

Table 6.1 Profiling result for the scalar code from a time-based sampling (with the PGI compiler option -Mprof=time). Profiled: ./granu3d_scalar on Wed Oct 12 10:07:54 JST 2011 for 27,656.086624 seconds.

Function                   Time
polyover_tri_int_dist_2    4,583.59 = 17%
polyover_interpoints       4,106.5  = 15%
pgf90_mm_real8_str1_vxm    3,262.4  = 12%
polyover_overpoly          2,089.36 =  8%
_pgf90_mm_real8_str1a      1,016.6  =  4%
......
geometry_update_fplane       654.656 = 2%
pre_cor_corrector            411.956 = 1%
boundingbox_find_pplist      354.033 = 1%
pre_cor_predictor            308.522 = 1%
......

To profile the scalar code, we prepared a simulation setup with a drum modeled by 34 polyhedra of 8-vertex-12-face shape and 375 granular particles which are a mixture of polyhedra of 8-vertex-12-face and of 12-vertex-24-face shape. The code was compiled with the PGI compiler profiling option -Mprof=time to obtain a time-based sampling for each subroutine (and function). We simulated the dynamics of the granular particles inside the drum with a constant rotation speed for thirty seconds of real time. The time step was 2 · 10⁻⁵ s and the simulation lasted about 27656 seconds (about 7 hours and 40 minutes). The most time-consuming subroutines are listed in Table 6.1. Most of the subroutines which take the largest amount of computer time are related to the overlap computation. The amount of computer time spent for the geometry update of the polyhedra (geometry_update_fplane), for the predictor-corrector (pre_cor_corrector and pre_cor_predictor) and for the contact detection (boundingbox_find_pplist) is almost the same, around 1% to 2%, which is insignificant compared with the time-consuming subroutines of the overlap computation. Since the overlap computation consumes most of the simulation time according to the profiling information, when we parallelize the scalar DEM code, we need to focus on the parallelization of the overlap computation.

6.5.2 Parallelization for overlap computation

......
! comment: loop over all the particle pairs in the contact list
do pair_id = 1, number_of_particle_pair
   p1 = contact_list(1, pair_id)
   p2 = contact_list(2, pair_id)
   compute_overlap(p1, p2, overlap_polyhedron)
   compute_force_torque(overlap_polyhedron, contact_force_torque)
   add_force_torque(contact_force_torque, total_force, total_torque)
end do
......

Fig. 6.4 Pseudocode of the force computation iterations: for each pair of particles in the contact list, the overlap computation and the force computation are carried out sequentially.

There are two regions in the program that are closely related to the overlap computation: the force (and torque) computation routine, which uses the result of the overlap computation to determine the contact force (torque) between two particles, and the overlap computation itself. As shown in Fig. 6.4, in the force computation routine, the overlap computation precedes the force computation, since the latter needs the information of the overlap polyhedron to determine the contact force (details in Chapter 5). Thus one approach to parallelize the code for multi-core machines would be to distribute the overlap computations of all particle pairs in the contact list over several CPU cores, which means that the overlap polyhedra of multiple pairs are computed "simultaneously". Alternatively, for the overlap computation of a single pair of particles, we can also seek opportunities for parallelization. To compute the inherited vertices, as shown in Fig. 6.5, with a single CPU core, we need to treat the vertices of the two polyhedra sequentially. From the profiling (Table 6.1), we know that the most computationally intensive subroutines (polyover_tri_int_dist_2 and polyover_interpoints) are related to the triangular face intersection computation. As shown in the pseudocode in Fig. 6.6, to obtain the generated vertices, the intersections are computed sequentially for all combinations of the neighboring face pairs. On a multi-core machine, we can nevertheless parallelize the inherited and generated vertex computations. How the tasks would be distributed in the coarse-granular and in the fine-granular approach (using OpenMP) is shown in Fig. 6.7, assuming that a maximum of four threads


......
! comment: compute inherited vertices from P1
do vertex_P1 = 1, number_of_neighboring_vertices_P1
   check_inside_polyhedron(vertex_P1, neighboring_faces_P2)
   if (inside) register_inherited_vertex(vertex_P1)
end do
! comment: compute inherited vertices from P2
do vertex_P2 = 1, number_of_neighboring_vertices_P2
   check_inside_polyhedron(vertex_P2, neighboring_faces_P1)
   if (inside) register_inherited_vertex(vertex_P2)
end do
......

Fig. 6.5 Pseudocode of the inherited vertex computation (part of the overlap computation) iterations: the computation of the vertices for the two polyhedra P1 and P2 is carried out sequentially.

......
! comment: compute intersection points of P1 and P2
do face_P1_id = 1, number_of_neighboring_faces_P1
   face_P1 = neighboring_faces_P1(face_P1_id)
   do face_P2_id = 1, number_of_neighboring_faces_P2
      face_P2 = neighboring_faces_P2(face_P2_id)
      compute_intersection(face_P1, face_P2, intersect_point)
      if (intersect) register_intersect_point(intersect_point)
   end do
end do
......

Fig. 6.6 Pseudocode of the (triangular) face intersection computation (part of the overlap computation) iterations: for each neighboring face combination of the two polyhedra, the intersection computation is carried out sequentially.


[Figure 6.7: flow diagrams. Above, parallelization at the particle-pair list level: the master thread (thread 0) holds the particle-pair list from the contact detection; threads 0 to 3 each carry out the overlap computation of one pair; the master thread then computes the forces. Below, parallelization at the level of a specific particle pair: the face-pair list from the neighboring features is distributed, and threads 0 to 3 each carry out one intersection computation before the master thread computes the inherited vertices; other parts of the overlap computation could also be parallelized.]

Fig. 6.7 Coarse-granular (above) and fine-granular (below) parallelization approach for multi-core machines (in OpenMP style with four cores available).

are available: In the coarse-granular approach, the parallelization over all particle pairs, when the program enters the parallel region, i.e. the overlap computation for all possibly overlapping particle pairs, the single initial thread (the master thread) creates four threads which carry out the overlap computations for four different pairs independently. After all the overlap computations are finished, the master thread continues with the force computation while the other threads terminate. Since the distribution of the polyhedra with actual overlap within the list of possibly interacting pairs is practically random or unknown, this parallelization may cause considerable overhead for synchronizing threads with varying time consumption. In the fine-granular approach, the parallelization over all faces of a single pair, after the program enters the parallel region, four pairs of faces are tested for intersection simultaneously. After all the intersection computations are finished, the master thread continues while the others terminate. Similar to the coarse-granular parallelization, a considerable overhead would also exist for the fine-granular parallelization: when we compute the intersections of the face pairs of two polyhedra, only a few of the face-face computations really yield intersections, and the face pairs which really intersect are also distributed randomly and unpredictably. Currently, we have parallelized our DEM code over the particle-pair list. To alleviate the overhead issue, we distribute the particle pairs over the threads with the dynamic option of OpenMP [118], with which new iterations are assigned automatically to threads that have finished their previous batch of particle pairs.
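The coarse-granular loop with dynamic scheduling can be sketched as follows, transliterated to C for illustration (the work function and all names are invented; the point is that schedule(dynamic) hands a new pair to whichever thread finishes first, and each iteration writes only to its own slot of the result array):

```c
enum { NPAIRS = 64 };

/* stand-in for the overlap computation; the cost varies strongly from pair
   to pair, which is why a static work distribution balances poorly */
static double overlap_of_pair(int pair_id)
{
    int work = 1 + pair_id % 5;         /* uneven, pair-dependent work load */
    double v = 0.0;
    for (int i = 0; i < work; i++)
        v += 1.0;                       /* dummy work; v ends up equal to work */
    return v;
}

/* parallel "overlap" stage followed by a serial "force" stage */
double total_overlap(void)
{
    double result[NPAIRS];              /* one slot per pair: no data race */

    #pragma omp parallel for schedule(dynamic)
    for (int pair_id = 0; pair_id < NPAIRS; pair_id++)
        result[pair_id] = overlap_of_pair(pair_id);

    double sum = 0.0;                   /* serial stage after the parallel region */
    for (int pair_id = 0; pair_id < NPAIRS; pair_id++)
        sum += result[pair_id];
    return sum;
}
```

Without -fopenmp the pragma is ignored and the loop runs serially; with OpenMP enabled, the result is identical because no two iterations touch the same array slot.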


To implement the OpenMP parallelization of the overlap computations for all particle pairs, a few modifications of the original scalar code have to be made. In the scalar code, after the overlap polyhedron is obtained, the necessary information is handed to the force computation routine seamlessly. In the parallelized code, in contrast, we have to wait until all overlap computations have finished before starting the force computations. Due to the way OpenMP handles memory, each thread has its own copies of variables (termed private variables) when it enters a parallel region, and after the execution of the parallel region, the private variables of the threads become undefined, except for the variables of the master thread. Thus we need to preserve all the overlap polyhedra in the parallelized code; otherwise, all overlap polyhedra would be lost except the overlap polyhedron of the last pair computed by the master thread. We use an array of the length of the number of particle pairs in the contact list, shared by all the threads (a shared variable), to store the results obtained by all threads. The access of each thread to this result array is limited to the column given by the particle-pair index. In this way, there is no data conflict (also referred to as a data race [117]) between the threads accessing the array. The modified pseudocode of the force computation is shown in Fig. 6.8, in which the array storing all the overlap polyhedra is denoted as over_poly_all.

......
!$omp parallel do
!$omp& private(pair_id, p1, p2) schedule(dynamic)
!$omp& firstprivate(number_of_particle_pair, contact_list)
!$omp& shared(over_poly_all)
do pair_id = 1, number_of_particle_pair
   p1 = contact_list(1, pair_id)
   p2 = contact_list(2, pair_id)
   compute_overlap(p1, p2, over_poly_all(:, pair_id))
end do
!$omp end parallel do
compute_all_force_torque
add_all_force_torque
......

Fig. 6.8 Pseudocode of the parallelized force computation iterations: the parallel region starts with "!$omp parallel do" and ends with "!$omp end parallel do". With the parallel compiler option -mp of the PGI FORTRAN compiler, the code is compiled into a parallel executable. The number of threads for the execution is specified as an environment variable of the operating system (e.g. "export OMP_NUM_THREADS=x" in the bash shell or "setenv OMP_NUM_THREADS x" in the c-shell or tc-shell, where x stands for the number of threads intended for OpenMP applications).

6.5.3 Profiling of the parallelized code

We profiled the code with two threads and the same simulation setup used for the profiling of the scalar code in section 6.5.1. In contrast to the scalar code, the parallelized one is profiled by measuring the execution time of the functions (instrumentation-based profiling⁵), which needs additional computational effort compared to an execution without profiling and causes a considerable overhead if used in a production run. Thus, as can be seen from Table 6.2, the execution of the parallelized code with 2 threads cost even more time (about 11 hours and 24 minutes) than that of the scalar code (about 7 hours and 40 minutes). The reason to use the instrumentation-based profiling rather than the run-time-based profiling is that the latter is only suitable for measuring single-thread simulations, which means that for parallel runs only the statistics for the master thread are available. Though time-consuming, the instrumentation-based profiling method lets us evaluate how well the load was distributed over the threads in a parallel run.

Table 6.2 Profiling result for the parallelized code based on function counts (with the PGI compiler option -Mprof). The simulation was run with 2 threads on an AMD Opteron dual-processor quad-core machine. Profiled: ./granu3d_parallel on Thu Oct 13 00:18:40 JST 2011 for 41,005.806527 seconds.

Function                 Time                Count
interpoints              6,790.66 = 17%       2,575,960,696 =  1%
tri_int_dist_2           6,031.48 = 15%      34,706,002,960 = 19%
overpoly                 4,359.07 = 11%         913,005,784 =  1%
orderintersp             2,795.22 =  7%         913,005,844 =  1%
checkinsider             2,346.72 =  6%         913,005,784 =  1%
......
update_fplane            1,775.81 =  4%           1,500,001 =  0%
update_geometry            734.065 = 2%           1,500,001 =  0%
update_inertial_tensor     636.861 = 2%           1,500,001 =  0%
find_pplist                453.514 = 1%           1,500,001 =  0%
corrector                  421.253 = 1%           1,500,001 =  0%
predictor                  421.253 = 1%           1,500,001 =  0%

Profiling of function tri_int_dist_2 for each thread:

Thread   Time              Count
0        6,031.48 = 51%    17,668,577,443 = 51%
1        5,784.16 = 49%    17,037,445,517 = 49%

As can be seen from the profiling of the function tri_int_dist_2 (the triangle intersection computation), the work has been divided evenly between the two threads (Table 6.2, bottom).

⁵ PGI Tools Guide, www.pgroup.com.


Besides the time consumption of each function, the number of times a function is called during the simulation is also available in the profiling result (the Count column in Table 6.2). The count statistics shows that the triangle intersection computation was executed an enormous number of times in the simulation. With the coarse-granular parallelization approach, the same simulation setup and machine as for the profiling (Table 6.1 for the scalar code and Table 6.2 for the parallel code), and the compiler options -c -O3 -mp -Minfo=mp,loop,ccff, we measured the execution for different numbers of cores on the eight cores of our two-processor machine. The result is listed in Table 6.3: the parallelization efficiency decreases as the number of cores specified for the parallel execution increases. A speedup of 1.4 with 35.5% parallelization efficiency was achieved when using 4 out of the 8 cores of the machine. The best parallelization efficiency was obtained with 2 cores, and for more than 4 cores the speedup saturated. The low parallelization efficiency may be due to the overhead of the dynamic allocation of threads; however, the efficiency of another parallelization scheme with a manual distribution of the parallel loop at compile time (striding the whole loop with the number of cores for each thread) turned out to be of the same order.
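The quoted numbers follow the usual definitions, speedup S = ts/tn and efficiency E = ts/(n·tn); since Table 6.3 reports rates (updates per second) rather than times, the speedup can equivalently be computed as a ratio of rates. A small sketch (function names are ours, not from the thesis code):

```c
/* speedup from a serial and an n-core execution time: S = ts / tn */
double speedup_from_times(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* when the measurement is a rate (updates per second), S = rn / r1 */
double speedup_from_rates(double rate_serial, double rate_parallel)
{
    return rate_parallel / rate_serial;
}

/* parallelization efficiency on n cores: E = S / n = ts / (n * tn) */
double efficiency(double speedup, int n_cores)
{
    return speedup / n_cores;
}
```

For the Linux workstation on 4 cores (Table 6.3), speedup_from_rates(22e3, 31e3) gives about 1.41, and efficiency(1.41, 4) about 0.35, the roughly 35% quoted above.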

6.5.4 Influences of Hardware and Software

To investigate the influences of hardware and compilers, we compare two multi-core machines with two different compilers: a Linux Opteron-based workstation with 2.5 GHz clock rate, with the PGI FORTRAN compiler 10.2 (the one used for profiling the parallelized code in section 6.5.3), and an i7-based MAC OS X (Lion) MacBook Pro with 2.2 GHz clock rate, with the GFORTRAN compiler (compiler options: -c -O3 -fopenmp). Both machines support carrying out up to 8 threads simultaneously, since the Opteron system has 8 cores while the i7 supports eight-way multithreading. This allows us to estimate the overhead created by the operating system or by differences in the memory handling. We already know from the profiling of the scalar code that the most expensive part is the overlap computation, where the (convex) overlap polyhedron of two overlapping (convex) polyhedral particles must be determined. For the overlap computation, we have used extensively (with respect to the number of function calls) standard FORTRAN90 functions which have no equivalent in standard C, e.g. the computation of matrix-matrix products or of maximal values of vectors. For a scalar run of the GFORTRAN/MAC implementation, 40% of the runtime went into the matrix-matrix products from GFORTRAN's intrinsic MATMUL function, which are used for the solution of the equations of the intersection between planes and lines, as well as in the projections to determine whether objects can have an overlap at all. In the PGI/Linux implementation, this program part took only about 10%. Replacing GFORTRAN's MATMUL on the MAC with Apple's native BLAS routines reduced the CPU consumption to about the same


amount as on the Linux implementation. With respect to single-processor performance, it also turned out that the compilation with OpenMP on MAC/GFORTRAN increased the runtime by 2%, as all local arrays are then allocated on the stack, not taken from the heap. As most of the runtime is spent in the overlap computation, it turned out to be sufficient to parallelize the loop over all possibly contacting pairs. The comparison of the efficiency and speedup for the two multi-core machines is given in Table 6.3. The speed in the second column is given in updates per second (u/s), i.e. the number of particles times the number of sweeps, divided by the time, a quantity which should be independent of the number of particles. This shows that the cores of the two processors are comparable in speed. Table 6.3 also shows that the efficiency decays fast with increasing core number. Nevertheless, the MAC system is affected less by this breakdown than the Linux system. Since the i7 has a larger (L3) cache than the Opteron, the performance of the Linux workstation decays more markedly than that of the MAC for large amounts of data transfer (usage of many cores). If larger (unnecessary) array sizes are initialized, the performance of the MAC also decreases further, but not as strongly as that of the Linux workstation.

Table 6.3 Performance for parallel executions on the two different multi-core machines (Linux Opteron workstation / MAC i7).

number of cores   10³ u/s    Efficiency ts/(n·tn)   Speedup ts/tn
1                 22 / 22    100% / 98%             1   / .98
2                 28 / 48     65% / 86%             1.3 / 1.7
4                 31 / 76     36% / 67%             1.4 / 2.6
8                 31 / 86     17% / 37%             1.4 / 3.1

6.5.5 Attempts for further optimization

For a code which, due to its data independencies, should parallelize "ideally", the speedup in Table 6.3 is definitely deplorable. On the other hand, the profiling tools gave hardly any useful information about further speedup possibilities: on Linux, waiting times like barriers and cache misses were indicated (about 20% of the total runtime), while on the MAC these data had to be inferred from the sum of the CPU time of the functions (roughly 80% for the functions, which leaves 20% for waiting time). As there is no other overhead, there are three candidates for the performance decay: use of the stack instead of the heap due to new memory allocations, cache misses (different cores getting in each other's way while accessing neighboring data), or the inability of the bus to keep up with the memory traffic when all cores work and request data simultaneously. We rewrote the program so that each core would call its own subroutine, so that with the SAVE attribute of FORTRAN, memory allocation on the heap could be enforced. There was no measurable difference, so the usage of the stack can be excluded as the reason for the performance degradation. To investigate the effect of the bus, we compared several few-core timings with one many-core result. On the MAC, two simultaneously started three-core jobs took about 30% longer than a single three-core job, which corresponds to a speedup of 2 × 1.6; this is slightly better than the speedup of 2.8 on six cores (running two times four cores vs. 8 cores makes no sense, as there are more background jobs on the MAC than under Linux, which would just contaminate the result). One can attribute 30% to the limited capacity of the system bus; the remaining 15% are due to interference of the cores with each other's caches. For the Linux machine, the slowdown for running two 4-core jobs instead of one was about 4%, so here the majority of the performance degradation is due to interference of the different cores with each other's loading operations.

6.6 Concluding remarks

The parallelization via shared memory was relatively straightforward, at least after the intricacies of the variable declarations with respect to their shared or private character were understood. Nevertheless, the efficiency for a code whose input and output data are practically independent is very unsatisfying. A better data layout is needed, so that the cache accesses of different cores do not interfere with each other, while the loading of unnecessary data should be suppressed as far as possible. We hope that in the future we will be able to use FORTRAN's common blocks (which prescribe the successive order of variables in memory) and maybe also FORTRAN90 types (which allow a sequence attribute, so that elements with the same index are ordered sequentially in memory, instead of elements of the same variable name) to influence the data layout, so that the overhead will be reduced and the ideal data independency will be complemented by an ideal efficiency.
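The data-layout idea can be sketched in C terms (an illustration only, not the thesis code): a structure groups the coordinates of one particle adjacently in memory, like a FORTRAN90 sequence type, while separate arrays keep all values of one variable together, like individual module arrays. Which layout is faster depends on whether a loop touches all members together or only one of them.

```c
/* "array of structures": x, y, z of the same particle are adjacent,
   analogous to a FORTRAN90 type with the sequence attribute */
struct particle { double x, y, z; };

/* sum of x over an array of structures: strided access (stride 3 doubles) */
double sum_x_aos(const struct particle *p, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += p[i].x;
    return s;
}

/* sum of x over a separate x array: unit-stride, cache-friendly access
   when only x is needed, analogous to one array per variable name */
double sum_x_soa(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}
```

Both functions compute the same sum; the difference lies purely in how many cache lines must be loaded to do so.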

Chapter 7

Verification

DEM models granular material at the micromechanical level (the particle level) to study phenomena at the macroscopic level, via observables such as the angle of repose, stress-strain diagrams etc. In this chapter, we first investigate the angle of repose of a heap formed by pouring polyhedral particles onto a flat floor, which is the most straightforward way to tell whether the noise and the numerical errors are insignificant and whether static friction has been implemented properly in the DEM code. We then apply the code to simulate "quasi-two-dimensional" heaps between two narrow walls. For verification, experiments on heaps formed by glass beads between acrylic plates have been carried out. The density distributions inside the heaps have been measured in the simulation and in the experiment. The effect of the (construction) history on the pressure distribution under the heap is also investigated in the simulation.

7.1 Heap formation in 3D

To apply our algorithm to the investigation of macroscopic properties of granular systems with particles of relatively complicated geometries, we simulate the construction of a heap by pouring particles onto a smooth, flat floor (modeled as one particle) and measure the angle of repose. By smooth we mean that the floor is geometrically one flat surface. As it is possible to construct a heap even on a mirror (Fig. 7.1), one should be able to do without "roughness" or "zig-zag" shapes (see Fig. 7.2 from reference [65]) and without a very high coefficient of friction for the particle-floor interactions. (For example, in the thesis of Zhao [41], which also modeled particles as polyhedra but with a "penetration depth" based force model, the inter-particle friction angle is set to 35°, while the particle-floor friction angle is as high as 60°; moreover, that heap was constructed almost statically by depositing particles inside a box first, and two walls of the box were removed quickly before measuring the angle of repose.) In our simulation, the heap was constructed dynamically by dropping batches of particles continuously onto the apex.

Fig. 7.1 Sand heap on a mirror: no need of manipulation of the "roughness" of the ground.

Fig. 7.2 Granular heap of polygons on top of a zigzag surface (a string of wedges) instead of a flat surface, computed by Moreau [65] via contact dynamics: the necessity to use a ground which is not flat indicates stability issues.

7.1.1 Heap construction

Fig. 7.3 A heap constructed by polyhedral particles on a flat surface: new batch added (above) and the final stage (below).

The particles are generated by choosing corners on the hull of an ellipsoid with half radii 7 mm, 6.5 mm and 6.5 mm (according to the polyhedral particle generation method described in Chapter 3). Each particle consists of 18 corners and 32 faces, and the heap is made of 2000 particles. The Young's modulus is chosen as Y = 6.5 × 10⁷ N/m² and the density as ρ = 1000 kg/m³. The coefficient of friction is chosen as 0.6 (critical angle 31°), both for the particle-particle and for the particle-floor interaction. For integrating the equations of motion, Gear's second-order predictor-corrector is used with a fixed timestep of dt = 1.5 × 10⁻⁵ s for ten seconds of physical duration. During the simulation, the particles are added batch by batch, as shown in Fig. 7.3 (scaled in centimeters), close to the apex of the heap to reduce the total energy input into the system (otherwise it would take longer for the system to damp out the energy of the impacting particles before reaching equilibrium). The last batch of particles was added at around 6 seconds, and we waited another 4 seconds before measuring the angle of repose of the heap. A screenshot of the last frame of the simulation can be seen in Fig. 7.3, scaled in centimeters.


7.1.2 Angle of repose

The positions of the centroids of the particles projected onto the y = 0, y = x and x = 0 planes are drawn in Fig. 7.4. The angle of repose at the final time step is about 30°. Because the number of corners/faces is relatively large and the particles are inscribed in "nearly" roundish ellipsoids, the angle of repose is smaller than for many technical materials. Nevertheless, it is larger than the angle of repose for round particles (about 22° [119]).

Fig. 7.4 The centers of mass of the particles of the heap projected onto the y = 0, y = x and x = 0 planes (unit in centimeters, final frame of the simulation); the angle of repose of the heap is around 31° to 32°, almost the same as the friction parameter used in the simulation. The flat apex of the heap is due to the impact of the particles added above it.

The Urbana-Champaign group [41] also used polyhedral particles, but with a

penetration depth force model, and reported 26° for their "heap" (actually not a whole heap but only one quarter of a heap, kept up by walls on two sides) created by removing two walls of a container of particles. To increase the slope of the heap, they added new particles to the quarter-heap. With 25000 particles in total, they claimed a final angle of repose of 38°. Though they claimed that the configuration of the pile had reached a stable state, in their figure (Fig. 64 in reference [41]) there were still some particles flying around the tail of the heap. Thus we think the 26° is the more reliable estimate for their angle of repose, and it might even decrease if no new particles were added after removing the walls. The 38° just indicates that the particles along the slope did not have time to slide down before the simulation was terminated. Without the two supporting walls and the obviously large coefficient of friction (60°, i.e. 1.73) between the particles and the ground, it is questionable whether even an angle of repose of 26° would be obtainable in their simulation. Angles of repose from polyhedral DEM simulations are so rare that reference [41] is our only point of comparison here; still, we do not think their data are comparable with our simulation. Generally, in our simulation and also in reference [41], polyhedral particle simulations tend to give realistically high angles of repose which are not usually seen in simulations of spherical particles (except if the rotation is eliminated [120], which is clearly unphysical anyway).

7.1.3 Oscillations in DEM heap

Fig. 7.5 The time evolution of the total energy of the granular heap of which we measured the angle of repose: the zig-zag ascending range corresponds to the period when new particles were added to the heap, while the monotonically decreasing region corresponds to the heap after construction.

When angles of repose for heaps are reported from DEM simulations, the states of the heaps are usually not explicitly stated, except by the rather ambiguous term "stable" [41]. Here we analyze the state of the heap for the measurement of the angle of repose, as can be seen from Fig. 7.5. At the termination time, the total energy of the granular system is still decreasing, corresponding to an average downward velocity of the particles in the heap of about 1 mm/s (while the maximum height of the center of mass of the particles is about 69 mm). The decrease of the energy of the heap is due to the damped oscillations of the particles inside. Since the contact force model works essentially as a "spring", the particles inside DEM heaps oscillate, and such oscillations, if not well controlled, may lead

Fig. 7.6 The velocity in the z direction of a single particle dropped on a floor in a 4-second simulation (above) and the later half of the simulation (below, velocity scale 10⁻¹³ m/s). Oscillations of the same order in the x and y directions can also be found.

Fig. 7.7 The angular velocity (components wx, wy, wz) of a single particle dropped on a floor in a 4-second simulation (above) and the later half of the simulation (below, scale 10⁻¹² rad/s). The angular velocity does not oscillate around zero but stays at a constant value, though as small as the order of 10⁻¹² rad/s.


to the disintegration of the heap. When a single particle with the same parameter settings as in the heap formation is dropped on the floor, the velocity oscillation of the particle is well controlled (of the order of 10⁻¹³ m/s) by the integrator (Gear predictor-corrector of second order), while a noticeable phenomenon (or artifact) is that the angular velocity converges to a non-zero constant, as shown in Fig. 7.7. For single particles, oscillations of the order of 10⁻¹² to 10⁻¹³ m/s are insignificant, while in the heap the amplitude of the oscillations is orders of magnitude larger and no longer negligible. Thousands of such oscillations will not be damped out properly by two-particle force models with the integrator used. These oscillations of the particles inside the heap are the main reason why DEM heaps are not as stable as would be desirable, and they are the reason for the failure of most DEM codes to reproduce angles of repose of heaps on smooth, flat floors without any supporting walls. From the single-particle case, we may infer that "rolling without sliding" occurs, which makes the introduction of rolling friction and drilling friction necessary for DEM simulations.
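The decay of such contact oscillations can be illustrated with a one-dimensional toy model. This is a hedged sketch, not the force model of Chapter 5 (there the normal force is proportional to the overlap volume and the damping to its time derivative); here a particle bounces on a linear spring-dashpot floor, integrated with a symplectic Euler step, and the residual velocity after the bounces have died out is many orders of magnitude below the impact velocity. All parameter values below are illustrative.

```c
/* drop a particle of mass m from height z0 onto a spring-dashpot floor
   (stiffness k, damping d) and return its velocity after t_end seconds */
double settle_velocity(double m, double k, double d,
                       double z0, double dt, double t_end)
{
    const double g = 9.81;              /* gravitational acceleration */
    double z = z0, v = 0.0;             /* height above the floor, velocity */
    long nsteps = (long)(t_end / dt);
    for (long i = 0; i < nsteps; i++) {
        double f = -m * g;              /* gravity */
        if (z < 0.0)                    /* "overlap" with the floor */
            f += -k * z - d * v;        /* spring plus dashpot contact force */
        v += dt * f / m;                /* symplectic Euler update */
        z += dt * v;
    }
    return v;
}
```

With, e.g., m = 0.01 kg, k = 1000 N/m, d = 0.5 kg/s, z0 = 1 mm and dt = 10⁻⁵ s, the velocity after one second of simulated time lies far below the impact velocity of about 0.14 m/s: the dashpot damps the contact oscillation of a single particle very effectively, in the spirit of the 10⁻¹³ m/s residuals above, while thousands of coupled contacts in a heap decay much more slowly.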

7.1.4 Summary and concluding remarks

With our DEM method, we succeeded in constructing a three-dimensional heap on a smooth ground with a realistically high angle of repose. This outperforms existing DEM codes, which need manipulations of the boundary conditions or unrealistically large friction parameters. We also found that oscillations affect the stability of the heap, which would explain why other researchers have not yet published simulations of heaps on smooth ground.

7.2 Study of quasi-two-dimensional heaps

Density distributions in granular heaps have up to now been investigated mostly with the discrete element method (DEM) in two dimensions, as few research groups have the necessary simulation codes for non-spherical particles in three dimensions. The angle of repose of quasi-two-dimensional granular heaps in three-dimensional simulations is higher, and therefore more realistic, than in two-dimensional simulations. We investigate the density distribution in a granular heap between walls in relation to the "pressure dip" with a three-dimensional DEM simulation with polyhedral particles. We verify the results experimentally with glass beads in a similar configuration. We find consistent density patterns in the experiment and in the simulation.

7.2.1 Background

There is generally a problem in granular materials research in relating two-dimensional results (i.e. results with two-dimensional materials, like rods, i.e. Schneebeli-Material, similar to our toothpicks in Fig. 1.1) with quasi-two-dimensional results (three-dimensional grains between two narrow walls) and fully three-dimensional geometries (i.e. unbounded, cone-shaped heaps). For the formation both of actual and quasi-two-dimensional heaps, avalanches are restricted to go either "to the left" or "to the right", while on the surfaces of conical heaps, continuous direction changes are possible. The heap geometry will also affect the internal dynamics. While for two- and fully three-dimensional heaps the ground carries the whole weight, quasi-two-dimensional heaps act as silos, i.e. a part of the weight of the heap may be carried by the walls (depending on the height of the heap and the narrowness of the walls, due to the Janssen effect [121]). A massive controversy arose in the granular community in the second half of the 1990s about the occurrence of "pressure dips" (relative minima in the pressure distribution in the center of heaps, between larger pressure amplitudes) of granular heaps, see the literature [25, 122, 123] and references therein. As H.-G. Matuttis [25] pointed out, mostly papers from powder mechanics (heaps poured from a point source) showed pressure minima, while papers from civil engineering (heaps built layer-wise) did not. While arching was discussed as a "prime suspect", the identification of the mechanism which causes the arching is still under debate. Savage [122] had pointed out inconsistencies in the weight computation of heaps built from point sources. Subsequently, Schinner [123] found in two-dimensional simulations pressure minima under cores of higher density for heaps built from point sources, and "flat" pressure distributions under homogeneously dense heaps which had been built layer-wise, which indicated that density inhomogeneities were the cause of "arching" and "pressure minima".
For experiments with quasi-two-dimensional heaps of glass beads between parallel walls, Chen et al. [124] found density patterns consistent with the two-dimensional polygonal simulations. In this section, we would like to go as far as possible to show that density inhomogeneities are not just an artifact of two-dimensional simulations, but exist also in three dimensions, both in simulation and experiment. Beyond the geometrical differences, dynamically there are more rolling degrees of freedom in three dimensions, so the competition between rolling and sliding in the formation of the heap is also different. Therefore, for an understanding of realistic granular heaps between narrow walls, a full three-dimensional simulation is necessary, though up to now most of the simulations concerning the density distribution were two-dimensional [124, 125, 126] or only for round particles [127]. In this section, we will study the density distribution of quasi-two-dimensional granular systems both experimentally and numerically. Compared to numerical simulations, experimental studies of the density distribution inside granular heaps are difficult or costly. With nuclear magnetic resonance (NMR), Šmid et al. [128] had found elevated densities in the middle of silo fillings from a point source. Matuttis [129] and Chen et al. [124] succeeded in calibrating laser-sensor pairs from production engineering to measure the density distribution of granular heaps constructed from glass beads deposited between two narrow transparent walls. We have improved our optical measurement method [124, 129] to capture the density inhomogeneities inside the heaps (Section 7.2.2), while applying our newly developed three-dimensional DEM code with convex polyhedral particles to form a quasi-two-dimensional heap and investigate its density distribution and bottom pressure distribution.

Fig. 7.8 Sketches of the dispersion of a laser beam going through a granular system

Fig. 7.9 A laser-sensor pair to measure the strength of the laser beam after dispersion

7.2.2 Experimental investigation

When a light beam travels through transparent granular particles, the intensity of the beam decreases due to dispersion (scattering at the particle surfaces away from the initial direction of incidence, Fig. 7.8). The scattering per covered distance is stronger the more particle surfaces lie in the optical path, i.e. it increases with density and decreases with the size of the particles. This leads to a characteristic decrease of the light transmissivity, which can be formulated as an exponential decay [129]. Thus we apply a laser-sensor pair (Fig. 7.9) to measure the density distribution in granular glass-bead heaps, with a calibration for densities of a given particle size.

Calibration The calibration setup is shown in Fig. 7.10: A container with adjustable width and fixed length and height (150 mm × 65 mm) is placed between a laser-sensor set (Keyence FU-E11/FS-N11MN), and the measured laser intensity from the sensor can be read from a meter as a quantity ranging from 0 to 9999.

Fig. 7.10 Calibration setup and samples of glass beads used for the experiment

For a selected width (e.g. 15 mm), the volume of the container is constant. By filling the container with varying amounts of glass beads and varying the filling method, we could obtain different homogenized packing densities for the granular assembly in the container. For the calibration, the container should be filled as evenly as possible, to minimize the density differences in the granular system to be measured. Lower densities can be obtained by filling the container while it is laid nearly horizontally and then turning it upright, higher densities by vibration, e.g. tapping the container or shaking it along the z-axis to let the particles compactify to a closer packing. Since the measured intensity decays exponentially with increasing width of the container in such systems [124, 129], we calibrated the new laser-sensor set only for a granular assembly of 15 mm width. The container was filled with 210 g, 225 g, 240 g, 255 g and 277 g of non-spherical glass beads of 2-4 mm length, as shown in Fig. 7.10. The calibration between the averaged packing density and the readings of the measured laser intensity is shown in Fig. 7.11. The "error bar" should rather be interpreted as a signature of the inhomogeneous density than as an actual measurement error. The calibration


curve shows the capability of the laser-sensor set to resolve a density difference of about 0.1 g/cm³ (approximately 5% to 6% with respect to the mean packing density). With a least-squares fit of the experimental data, we obtained an exponential relation between the intensity reading I_r and the packing density ρ_p as

log I_r = −3.69 ρ_p + 7.14,    (7.1)

where ρ_p is a dimensionless variable obtained from rescaling the filling density with the bulk density of the glass beads, 2.78 g/cm³. Using Eq. (7.1), we can then recalculate the average density distribution inside the heap from the intensity data. Due to the low transmissivity, measurements can only be obtained for heaps which are up to a few tens of particles wide.

Fig. 7.11 Calibration result: each data point is the average of 80 measurements
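The calibration fit of Eq. (7.1) and its inversion can be reproduced with an ordinary least-squares fit of the log-intensity. The data below are synthetic (generated from Eq. (7.1) itself), not the measured values; only the fit procedure is illustrated.

```python
import numpy as np

# Synthetic calibration points: dimensionless packing density (filling
# density rescaled by the bulk density 2.78 g/cm^3) vs. intensity reading.
rho = np.array([0.53, 0.56, 0.59, 0.62, 0.65])
I_r = np.exp(-3.69 * rho + 7.14)        # readings following Eq. (7.1)

# Least-squares fit of log(I_r) = a * rho + b, as in Eq. (7.1)
a, b = np.polyfit(rho, np.log(I_r), 1)

def density_from_intensity(reading):
    """Invert the calibration to recover the packing density."""
    return (np.log(reading) - b) / a
```

Fitting the logarithm of the reading rather than the reading itself keeps the problem linear in the parameters.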

Heap construction and density measurement We used an "X-Z axis linear robot system" [124, 129] (IAI ICSA2-ZICM-A-60-40B-T1-5LCT, see Fig. 7.12) for the construction of the granular heaps and for the density measurement described in the calibration. The robot can move along the x- and the z-axis with constant speed, making the construction and measurement of granular heaps reproducible. To construct a granular heap, we first fill a hopper with glass beads (the same kind as used in the calibration) and put its funnel at the bottom of an acrylic container of 15 mm width (and 1200 mm length and 300 mm height); then we let the robot lift the hopper at a slow speed (see Fig. 7.13, above) while refilling it to keep a steady granular flow pouring down from

Fig. 7.12 Construction and measurement setup

Fig. 7.13 The heap is constructed by lifting a hopper with particles (above); the density is measured continuously by moving the laser-sensor set along the dashed line (below).


the hopper. If the hopper is lifted too fast and the glass beads are accelerated, the heap structure is affected by the fluidization of regions by particles with high impact velocities. To measure the density distribution inside the heap, we move the laser-sensor set along the route shown in Fig. 7.13, below; the measured signal of the sensor is recorded by a data acquisition system (Keyence MS2-H50). With the calibrated relation Eq. (7.1), we can convert the measured intensity data into a density distribution. The experimental results are discussed together with the simulation results in Section 7.2.4.

7.2.3 Simulation

Modeling of particles For the particles in our quasi-two-dimensional granular heaps, the vertices of a particle are chosen on the hull of an ellipsoid with given half radii, and the faces of a particle are triangles, or are divided into triangles for computational simplicity. Fig. 7.14 shows the particle shapes used in our simulation (the sizes are rescaled for better visibility): a 400 mm × 150 mm × 20 mm container, a hopper with a funnel of an inner radius of 9.5 mm, and a sample granular particle with 12 vertices chosen from the surface of an ellipsoid with half radii of 2 mm, 1.6 mm and 1.6 mm. A certain randomness is introduced into the coordinates of the vertices of each particle, while the same 12-vertex, 20-face geometric structure and edge connectivity is kept. A snapshot of the centers of mass of the particles from the simulation is shown in Fig. 7.15, from which we see that the particles fall down from the hopper and pile up inside the container. The detailed outlines of the particles are simplified as dots in Fig. 7.15: since the size of the particles is so small compared to the whole heap, the outlines would become illegible.
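The particle generation described above can be sketched as follows. The icosahedral base directions and the noise amplitude are illustrative assumptions; the thesis' actual generation algorithm (Chapter 3) is not reproduced here.

```python
import numpy as np

def ellipsoid_vertices(half_radii=(2.0, 1.6, 1.6), noise=0.05, seed=0):
    """Place the 12 vertices of an icosahedron on the hull of an ellipsoid,
    with a small random perturbation of the directions, so that every
    particle is slightly different while the same 12-vertex, 20-face
    connectivity is kept.  `noise` and the icosahedral base shape are
    illustrative choices, not the thesis' exact construction."""
    rng = np.random.default_rng(seed)
    phi = (1.0 + np.sqrt(5.0)) / 2.0      # golden ratio
    # icosahedron vertices: (0, +-1, +-phi) and its cyclic permutations
    base = np.array([(0.0, s1, s2 * phi) for s1 in (-1, 1) for s2 in (-1, 1)])
    base = np.vstack([base, np.roll(base, 1, axis=1), np.roll(base, 2, axis=1)])
    dirs = base + noise * rng.standard_normal(base.shape)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # back onto unit sphere
    return dirs * np.asarray(half_radii)                  # stretch to ellipsoid
```

Because each direction is renormalized before the componentwise stretch, every vertex lies exactly on the ellipsoid hull.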

Density computation There are several ways to define average densities of continuous volumes of granular particles, depending on whether the control volume is defined with or without respect to the particles, and whether only the centers of mass or the explicit volumes are taken into account, see Fig. 7.16 for a two-dimensional sketch. In the same way, we define the homogenized granular density in three dimensions for a cubic cell as the ratio of the volume occupied by the polyhedral particles V_i inside the cell to the volume of the cell V_c itself:

ρ_h = (V_1 + V_2 + · · · + V_n) / V_c.    (7.2)

There are three obvious possible selections for the V_i and the V_c: The simplest way is to count the volumes of all the particles which have their center of mass in the cell V_c (Fig. 7.16, left); adding the volume of the polyhedra which extend beyond the cell V_c to the cell V_c (Fig. 7.16, middle) is another choice; for our measurement we chose


Fig. 7.14 Particles used in the simulation: the hopper consists of 16 equal-sized parts for the cone and 16 equal-sized parts for the funnel; the granular particle consists of 12 vertices and 20 triangular faces.

Fig. 7.15 A snapshot of the centers of mass of the granular particles from the simulation: each point marks the center of a granular particle; the container and the hopper are omitted.


only the volume inside the cell V_c (Fig. 7.16, right), which gives the most accurate result, though it is the most time-consuming. For our "quasi-two-dimensional" heaps, the density distribution is averaged along the width direction. Along the length direction, a moving-cell homogenization scheme (Fig. 7.17) was used to improve the resolution for a given cell length.

Fig. 7.16 Three density homogenization methods: in black the grain volumes which are counted, while the dotted line indicates the volume V_c used for the given cell.

Fig. 7.17 Moving-cell homogenization scheme: each cell has a neighbor cell with a certain overlap along the length direction (homogenization cell 1: solid line; cell 2: dashed line).
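As an illustration of Eq. (7.2) combined with the moving-cell scheme of Fig. 7.17, the following sketch uses the simplest homogenization variant of Fig. 7.16 (a particle's whole volume is assigned to the cell containing its center of mass). The actual measurement in this work clips the particle volumes at the cell faces; all names here are illustrative.

```python
import numpy as np

def moving_cell_density(centers, volumes, x_min, x_max, cell_len, overlap, cell_vol):
    """Homogenized density rho_h = sum(V_i) / V_c, Eq. (7.2), evaluated in
    cells along the length (x) direction.  Neighboring cells overlap by
    `overlap` (moving-cell scheme).  A particle contributes its whole
    volume to every cell that contains its center of mass."""
    stride = cell_len - overlap
    edges = np.arange(x_min, x_max - cell_len + stride, stride)
    rho = []
    for x0 in edges:
        inside = (centers[:, 0] >= x0) & (centers[:, 0] < x0 + cell_len)
        rho.append(volumes[inside].sum() / cell_vol)
    return np.array(edges), np.array(rho)
```

The overlap increases the number of sampling points along x without shrinking the averaging volume, which is what improves the resolution for a given cell length.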

7.2.4 Results and discussion

Experimental results The angles of repose from the experiment are shown in Fig. 7.18. The upper heap in Fig. 7.18, with a larger angle of repose, is the result of a low lifting velocity (2 mm/s) of the hopper with refilling of particles to keep a continuous flow, while the lower one, with a quite small angle of repose, is caused by the high impact velocities of particles


accelerated by a relatively high lifting velocity (8 mm/s) and by the intermittent flow of particles caused by clogging in the hopper. For proper measurements of density distributions in heaps, we need a certain height to keep the quasi-two-dimensional structure. Thus, heaps should be built with small inflow velocities; high inflow velocities decrease the angle of repose and are probably not relevant for the discussion of the pressure dip.

Fig. 7.18 "Proper" (34°) and "improper" (14°) angles of repose from the experiments: relatively high lifting velocities (8 mm/s instead of 1 mm/s) of the hopper lead to flow rates so high that the angle of repose is significantly reduced.

The density measurement in the experiment is averaged every 8 mm along the x-axis and 11 mm along the y-axis. The density results from the experiments for lifting velocities of 1 mm/s, 3 mm/s and 5 mm/s are shown in Fig. 7.19, Fig. 7.20 and Fig. 7.21, respectively. It seems that in the experiment, due to the shear flow of the avalanches, lower densities could be reached than in the calibration, while the maximal densities of the calibration, obtained by tapping and vibrating, could not be reached. The experimental data towards the ground (bottom graphs in Figs. 7.19-7.21) show a core of high density near the center (middle graphs in Figs. 7.19-7.21) of the heap close to volumes with lower densities.

Simulation results and comparison with the experiment By implementing the damping for the rotational degrees of freedom, we could obtain realistic angles of repose in the simulation, like the upper heap of Fig. 7.22, while without the damping mechanism, the angles of repose are significantly reduced, as for the lower heap of Fig. 7.22. In the simulation, we measure the density distribution of a heap whose angle of repose is higher than the 30° obtainable with polygons in two-dimensional simulations [124] and much larger than in simulations with round particles (about 22° [119], if no unphysical tricks like the elimination of the rotational degrees of freedom or unphysically large rolling frictions are used). The physical parameters for the simulation are as follows: the Young's modulus is chosen as Y = 6.5 × 10⁷ N/m², the density as ρ = 1000 kg/m³ and

Fig. 7.19 Experiment for a lifting velocity of 1 mm/s: two-dimensional normalized density contour plot (top), average density along the length dimension (middle) and average density along the height dimension (bottom).


Fig. 7.20 Experiment for a lifting velocity of 3 mm/s: two-dimensional normalized density contour plot (top), average density along the length dimension (middle) and average density along the height dimension (bottom).

Fig. 7.21 Experiment for a lifting velocity of 5 mm/s: two-dimensional normalized density contour plot (top), average density along the length dimension (middle) and average density along the height dimension (bottom).


Fig. 7.22 "Practical" (with realistic damping, 35°) and "impractical" (without realistic damping, 20°) angles of repose from the simulations.

the coefficient of friction as µ = 0.65. A fixed timestep of dt = 6 × 10⁻⁶ s is used for the time integration, and the simulated time span is 12 s. There are 3500 particles in the simulation for the upper heap of Fig. 7.22 and 1800 particles in the lower one. The angle of repose of the former is comparable to the physical experiment, and its density distribution is measured. The density result from the simulation is shown in Fig. 7.23. The density is homogenized in cells of 8 mm × 8 mm × 8 mm and averaged along the y-axis. When we lift the hopper with the funnel (e.g. see Fig. 7.14), the impacting particles vibrate the particle layers below them, which explains the high density in the center. On the other hand, those particles which are not deposited in the middle, but which move to the left and right in the shear flow of the avalanches, or which lead to slipping of the particle layers below them, create regions of low density. The reduced densities on the ground, away from the center, can be attributed to avalanches which are deposited without much reordering. Compared with the experiment, the simulation result has a higher resolution and shows a pattern with less fluctuation: higher-density regions in the middle surrounded by regions of lower density. Also in the simulation, the density increases towards the ground (bottom graph in Fig. 7.23) and towards the middle (middle graph in Fig. 7.23). Thus, the results for both the experiment and the simulation show a consistent density distribution pattern for the quasi-two-dimensional heaps. The pressure distribution in the simulation shows a minimum for different lengths of the sampling cell (Fig. 7.24), so the correlation of the density maxima with the pressure dip is established also for the three-dimensional simulation. More simulation runs (with particles with slightly different shapes) showed consistent density and ground pressure results.
Unfortunately, we have up to now not been able to find an affordable experimental device to measure the pressure on the ground. In the simulation, the weight carried by the ground is 85% of the total weight of the heap. As the ratio of the length to the height to the width of the heap in the simulation is about 140 : 50 : 20,

Fig. 7.23 Simulation results: two-dimensional normalized density contour plot (top), average density along the length dimension (middle) and average density along the height dimension (bottom).


the Janssen effect is not as marked as it would be for "silo geometries", where the height is considerably larger than the width.

Fig. 7.24 Simulation results: pressure on the bottom for a homogenization cell length of 8 mm (upper) and a cell length of 12 mm (lower).

7.2.5 Simulation of layered sequence heap

Fig. 7.25 The heap is constructed by moving the hopper along the route indicated by the line with the arrow.

We also constructed a heap via a layered sequence, as shown in Fig. 7.25. The density result from the simulation is shown in Fig. 7.26. As for the heap built via the wedge sequence, the density is homogenized in cells of 8 mm × 8 mm × 8 mm and averaged along the y-axis. Compared to the density results for the wedge-sequence heap (Fig. 7.23), there is a clear valley reaching to the ground in the density contour plot for the layered-sequence heap,

Fig. 7.26 Simulation results of the layered sequence heap: two-dimensional normalized density contour plot (top), average density along the length dimension (middle) and average density along the height dimension (bottom).


while for the wedge-sequence heap there is a transition region of valley and ridge in the middle. Another distinct difference is that the density variation along the x-axis is so large that the normalized density is relatively smaller in the middle than in the adjacent regions.

Fig. 7.27 Simulation results of the layered sequence heap: pressure on the bottom for a homogenization cell length of 8 mm (upper) and a cell length of 12 mm (lower).

Unlike the case of the wedge sequence (Fig. 7.24), the pressure distribution of the layered-sequence heap in the simulation shows no relative minimum, if not even a maximum, for homogenization cells of different lengths (Fig. 7.27). More simulation runs (with particles with slightly different shapes) have been carried out, and consistent results have been obtained. Thus, the relation of the pressure dip to the construction history is established also for the three-dimensional simulation of a quasi-two-dimensional system. We can also find a relation between the density distribution and the pressure distribution: a density valley above the ground corresponds to a local pressure maximum, while a density ridge above the ground corresponds to a local pressure minimum. Further study is needed to confirm this relation and to clarify the mechanism.

7.2.6 Summary and concluding remarks As a verification of the DEM method, we have calibrated a laser-sensor pair to measure the density distribution of quasi-two-dimensional heaps constructed by depositing glass beads in a wedge sequence between two narrow acrylic walls, and conducted corresponding three-dimensional DEM simulations with polyhedral particles. For DEM simulations, attention should be paid to the damping of the rotational degrees of freedom in dynamic simulations of granular systems in three dimensions, as long as


only phenomenological damping and friction models are used. If the implementation is careless, the damping will increase the noise in the system, fluidize the whole assembly, or at least distort the results. A further source of noise is the possible degeneration of the unit quaternions for the rotational degrees of freedom: not only must their norm be constrained, but also the orthogonality to their time derivative must be enforced by a projection. To our knowledge, this has not been noted in the literature, as in most dynamic simulations the particles are in perpetual motion. Only when static configurations are investigated do the high noise amplitudes become obvious, in the unphysical depletion of the angle of repose. We do not exclude the possibility that such noise unfavorably affects dynamic results in other fields of computer simulation. Minimizing the noise both in the modeling and in the numerics, our three-dimensional simulation produced higher angles of repose than purely two-dimensional simulations. For low flow rates and high angles of repose, we found consistent density distribution patterns in the experiment and in the simulation for the quasi-two-dimensional system of particles between walls: a high-density column near the middle of the heap reaches to the ground and is surrounded by lower-density regions. In the simulation, we can clearly identify a pressure dip, while for the experiment, we are still hoping that an affordable measurement device becomes available. For the experiment, the density fluctuations were much larger than for the simulation: future work will have to reveal whether this is due to the fact that the experimental particles are not all strictly convex, like the particles in the simulation, or whether it is a defect of the simulation.
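The projection mentioned above can be written down in a few lines. This is a hedged sketch of the idea, not the thesis' Fortran implementation: for a unit quaternion q, d/dt(q·q) = 2 q·q̇ = 0 must hold, so after renormalizing q, the component of q̇ parallel to q is removed.

```python
import numpy as np

def project_quaternion(q, qdot):
    """After each integrator step, restore the unit norm of the rotation
    quaternion q and remove the component of its time derivative qdot
    parallel to q, so that q . qdot = 0 holds exactly."""
    q = q / np.linalg.norm(q)            # renormalize |q| = 1
    qdot = qdot - np.dot(q, qdot) * q    # enforce orthogonality q . qdot = 0
    return q, qdot
```

Without this projection, numerical noise lets q drift off the unit sphere and qdot acquire a spurious radial component, which in static configurations shows up as the residual oscillations discussed above.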
For high flow rates, where the slope is not straight any more and the dynamics of the formation is different, the density results are inconclusive, but these were also not the slopes for which pressure dips had been reported. All density averages increase towards the ground. The history effect of granular assemblies can be reproduced with our simulation method: for a heap constructed via a layered sequence, there is no pressure dip, in contrast to the heap constructed via the wedge sequence, which proves the relation of the pressure dip to the construction history for the three-dimensional simulation of a quasi-two-dimensional system. Future work will have to reveal the mechanical principles which cause the arching due to the higher densities. Though improvements will be necessary for both the experiment (better calibration and adjustment) and the simulation (implementation of exact many-body friction), qualitatively comparable results have been achieved, with a resolution and verisimilitude of the simulation much better than what has been published up to now.

Chapter 8

Summary and Conclusions

We have developed a DEM code for granular material research with polyhedral particles, which are more realistic than conventional round shapes. The force model uses the full overlap information, which yields continuously varying elastic forces, in contrast to penetration-depth based force models for polyhedral particles. With the code, we obtained realistically high angles of repose for three-dimensional heaps on smooth ground, outperforming all existing DEM codes with either round or polyhedral particles. We obtained consistent results between the simulations and the corresponding experiments for the density distributions in quasi-two-dimensional heaps constructed by a wedge sequence. The dependence on the construction history and the existence of a (ground) pressure dip have also been reproduced in the simulation.

8.1 Summary

We summarize the thesis from two perspectives: the development and the verification of the DEM code. In this thesis, we developed a three-dimensional DEM code for granular material research, characterized by the following aspects:

1. The granular particles are subject to the Newton-Euler equations of motion. Unit quaternions are used to represent the rotational degrees of freedom. We found it necessary to ensure the orthogonality between the quaternions and their time derivatives after each call of the integrator in the DEM simulation. We chose the Gear predictor-corrector to integrate the equations of motion, for its stability, capability and efficiency.

2. The particles are modeled as convex polyhedra. A polyhedron is represented via its vertices and triangular faces in the DEM code. We improved a polyhedron generation algorithm to generate polyhedra with better triangulated surfaces.


3. For the contact detection, a hierarchy of algorithms is applied, starting with a neighborhood algorithm with little computational effort to detect possible contact particle pairs, and refining the detected result with a relatively complex algorithm which compares the projections of the vertices along the line connecting the centers of mass of both particles.

(a) For the neighborhood algorithm, we generalized the "sort and sweep" method, which detects neighboring particles by sorting the axis-aligned bounding boxes of the particles in one dimension, to three dimensions.

(b) We proposed and implemented two methods to refine the contact list, one using bounding spheres and the other using the projections of the vertices along the line connecting the centers of mass of the two polyhedra.

We demonstrated that with the above contact detection scheme, the number of detected (possible) contacting particle pairs is proportional to the total number of particles, which means that for a simulation of n particles, the computational effort needed for the overlap computation of particle pairs is linear (O(n)) rather than quadratic (O(n²)).

4. For the overlap computation of two polyhedra in contact, we proposed and implemented a systematic approach to obtain the vertices, the faces, the center of mass, the contact line, the contact area and its normal for the overlap polyhedron.

(a) We compared two triangle intersection computation methods, one using the point-direction and the other the point-normal representation for the planes in which the triangles lie. The point-normal method is implemented in the DEM code for its efficiency and its compliance with the data structure representing a face.

(b) In the computation of the vertices, we distinguish the inherited vertices, obtained from the original polyhedra, and the generated vertices, obtained from face intersections.
The brute-force approaches for computing the inherited vertices (by checking the relative positions of all vertex-face pairs of the two polyhedra) and the generated vertices (as the intersection points of all triangular face pairs of the two polyhedra) are optimized via the introduction of neighboring features (faces and vertices).

(c) We introduced and compared two methods to locate the neighboring features which might intersect, one using the overlap of the axis-aligned bounding boxes of the two polyhedra, the other using the projection of the vertices along the line connecting the centers of mass of the two polyhedra. The projection method is implemented in the current code, since the projected vertices are already available from the refinement of the contact list in the contact detection process.


(d) For computing the faces of the overlap polyhedron, we proposed two methods to order the vertices on a face: one compares the angles formed by the vertices with respect to the arithmetic center of all the vertices, the other with respect to one known edge of the face. The latter, faster one is implemented.

(e) After the vertices and the faces are determined, the center of mass of the overlap polyhedron is computed. By connecting the intersection segments from the face intersections, we obtain the contact line. By connecting each contact segment with the center of mass, we obtain the contact triangles, which together form the contact area. We then compute the average of the area-weighted normals of the contact triangles, which gives the normal of the contact area.

With the application of neighboring features, we could improve the efficiency of the vertex computation dramatically, from the quadratic brute-force approaches to linear performance. With the above overlap computation scheme, we can efficiently obtain the volume of the overlap polyhedron and the normal of the contact area, which are used to model the contact force.

5. We proposed a normal force model for polyhedral contacts and generalized the two-dimensional Cundall-Strack friction model to three dimensions.

(a) In the direction of the normal of the contact area, the elastic force is modeled as proportional to the volume of the overlap polyhedron, and the dissipative force is proportional to the change of the overlap volume. A characteristic length is introduced in the normal force model so that the continuum limit of the material stays the upper limit for sound propagation in a granular packing.

(b) In the tangential direction, we generalized the Cundall-Strack friction model via three successive steps:

i. A projection step, in which the tangential force from the previous time step is projected onto the current tangential plane.

ii.
A rescaling step in which the magnitude of the projected tangential force is rescaled to the magnitude before the projection. iii. An addition step in which the incremental tangential force vector is added to the rescaled tangential force. A cut-off is applied after the addition to ensure that the magnitude of the tangential force would not exceed the magnitude of the dynamic friction as determined by the normal force and the coefficient of friction. A viscous-like damping is modeled in the tangential direction as well. With our definition of the elastic force which contains the characteristic length, for a linear chain of particles with constant cross-section, the sound velocity in a bulk with the same kind of material can be reproduced. We analyzed the continuity of our elastic force model and showed the continuous time evolution of the elastic force
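The face-construction steps (d) and (e) above can be sketched in Python (the thesis code is Fortran; all names are illustrative). The first routine orders coplanar vertices by their angle about the arithmetic center, corresponding to the first of the two ordering methods in (d); the second averages area-weighted triangle normals as in (e). The basis construction assumes the first vertex does not coincide with the center.

```python
import numpy as np

def order_face_vertices(pts, normal):
    """Order coplanar vertices counter-clockwise (seen against `normal`)
    by the angle they form about their arithmetic center."""
    c = pts.mean(axis=0)                      # arithmetic center of the face
    n = normal / np.linalg.norm(normal)
    # build an orthonormal basis (u, v) spanning the face plane
    u = pts[0] - c
    u = u - np.dot(u, n) * n
    u = u / np.linalg.norm(u)
    v = np.cross(n, u)
    ang = np.arctan2((pts - c) @ v, (pts - c) @ u)
    return pts[np.argsort(ang)]

def area_weighted_normal(triangles):
    """Sum of area-weighted triangle normals: half the cross product of
    two edges per triangle, accumulated over all contact triangles."""
    s = np.zeros(3)
    for a, b, c in triangles:
        s += 0.5 * np.cross(b - a, c - a)
    return s
```

Normalizing the accumulated vector yields the contact-area normal used in the force model; its length is the (projected) contact area.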
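The three-step tangential update (i-iii) with the Coulomb cut-off can be sketched as follows; a hedged Python illustration (the production code is Fortran) with illustrative parameter names, using a simple explicit spring increment for the added tangential force.

```python
import numpy as np

def update_tangential_force(t_old, n, k_t, v_t, dt, mu, f_n):
    """One step of the generalized Cundall-Strack friction.

    t_old: tangential force from the previous time step
    n:     current unit contact normal
    k_t:   tangential stiffness, v_t: tangential relative velocity
    mu*f_n: Coulomb limit from the normal force and friction coefficient
    """
    # i. projection: remove the component along the new normal
    t_proj = t_old - np.dot(t_old, n) * n
    # ii. rescaling: restore the magnitude from before the projection
    norm_proj = np.linalg.norm(t_proj)
    if norm_proj > 0.0:
        t_proj = t_proj * (np.linalg.norm(t_old) / norm_proj)
    # iii. addition: add the incremental tangential (spring) force
    t_new = t_proj - k_t * v_t * dt
    # cut-off: cap the magnitude at the dynamic-friction limit mu * f_n
    norm_new = np.linalg.norm(t_new)
    limit = mu * f_n
    if norm_new > limit:
        t_new = t_new * (limit / norm_new)
    return t_new
```

The rescaling step keeps the stored friction-spring "elongation" when the contact plane rotates between time steps, so that static friction is not artificially released by a change of the contact normal.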

This continuity holds even during the contact process of two arbitrarily shaped polyhedra. We set up a cube with a finite initial speed on a plane inclined below the critical angle of the friction coefficient. The cube is stopped by friction and reaches force equilibrium as expected: the normal force converges to the component of gravity orthogonal to the plane, and the tangential force converges to the tangential component of gravity, acting as static friction.

6. Parallelization: To speed up the simulation, we parallelized the DEM code with OpenMP for multi-core machines. Profiling of the scalar code showed that the overlap computation is, as expected, the most time-consuming part. To parallelize the code, we compared two strategies: a coarse-grained approach, in which the overlap polyhedra of several particle pairs are computed simultaneously, and a fine-grained approach, in which the vertex computation for a single particle pair is parallelized. We implemented the former, as it required only a modest effort to rewrite the scalar code compared to the latter. On a workstation with two quad-core AMD Opteron processors, we obtained parallelization efficiencies of about 64.8%, 35%, and 17.4% for speedups of 1.3, 1.4, and 1.4 on 2, 4, and 8 processor cores with dynamic scheduling. We also tried out the parallelization on an Intel Core i7-based Mac with the GFORTRAN compiler. The i7-based machine outperforms the Opteron machine, owing to its larger L3 cache.

For the verification of the DEM code, we carried out simulations and the corresponding experiments:

1. Simulation of heap formation in three dimensions: We simulated heap formation on flat surfaces and obtained realistically high angles of repose (larger than 30 degrees), which are not usually seen in simulations of spherical particles. We also noticed that the angle of repose decreases gradually as the simulation duration is extended.
The instability of the angle of repose is an inherent problem for simulations that use the spring-like Cundall-Strack friction model to approximate static friction, without the possibility of damping out microscopic vibrations in a numerically exact way.

2. Simulations and experiments of quasi-two-dimensional heaps: We studied the density distributions of granular heaps deposited in a wedge sequence between two narrow walls, with our DEM code and in experiments with glass beads. We found consistent density distribution patterns in experiment and simulation: a high-density region near the middle of the heap reaches down to the ground and is surrounded by lower-density regions. In the simulation of the wedge-sequence heap, we can clearly identify a pressure dip, which is for the time being not accessible in the experiment. In the simulation of the layered-sequence heap, we find no pressure dip, in contrast to the wedge-sequence heap. The influence of the construction history on the pressure distribution was thus reproduced by our DEM code.


8.2 Conclusions

Our DEM code turns out to be the only simulation method worldwide which can be used to compute history effects in granular assemblies. It is the only code with which three-dimensional granular heaps with realistically high angles of repose on smooth surfaces can be obtained for physically realistic parameters. To conclude:

• Our way of defining the forces, together with the overlap computation, gives a continuous time evolution of the forces, which is the prerequisite for the use of solvers for differential equations.

• Our simulation code allows the investigation of Maxwell's "historical element", i.e. the dependence of densities and pressures on the building method, for three-dimensional granular aggregates.

• Not "a single trick" was necessary to obtain a reliable simulation code, but various improvements over previous techniques were necessary, ranging from the choice of integrator, through the enforcement of orthogonality between the unit quaternions and their time derivatives, to the definition of the force law and the type of intersection-point computation between faces and edges. Applying dubious "numerical damping" (as used in many finite element codes via the Newmark method) is not necessary.

• The "angle of internal friction" is different from the value one obtains from the friction coefficient µ via θ = arctan µ. The macroscopic "angle of internal friction" is moreover strongly influenced by the particle shape.

For further application as well as verification, discrete avalanches in rotating drums, sound propagation in dense packings, stress propagation, and stress-strain diagrams can be investigated with the DEM code. To improve the stability, an attempt should be made to obtain a computationally viable solution of the exact friction problem.


List of Publications Related to the Thesis

1. Jian Chen and Hans-Georg Matuttis, Study of Quasi Two Dimensional Granular Heaps, Theoretical and Applied Mechanics Japan, Vol. 60, pp. 225-238, 2011. (Related to Chapters 2, 5, and 7)

2. Jian Chen, Alexander Schinner, and Hans-Georg Matuttis, Discrete Element Simulation for Polyhedral Granular Particles, Theoretical and Applied Mechanics Japan, Vol. 59, pp. 335-346, 2010. (Related to Chapters 2, 3, 4, 5, and 7)

3. Jian Chen, Lisa Watanabe, Hans-Georg Matuttis, and Hide Sakaguchi, History Dependence of the Density Distribution on Granular Heaps, in M. Nakagawa and S. Luding, editors, Powders and Grains 2009, Proceedings of the 6th International Conference on Micromechanics of Granular Media, pp. 199-202 (AIP, Golden, Colorado, 2009). (Related to Chapters 5 and 7)

List of Other Publications

1. Yusuke Sakai, Jian Chen, and Hans-Georg Matuttis, Simulation of Polygonal Grains in a Finite-Element-Fluid, in M. Nakagawa and S. Luding, editors, Powders and Grains 2009, Proceedings of the 6th International Conference on Micromechanics of Granular Media, pp. 1019-1022 (AIP, Golden, Colorado, 2009).

2. Jian Chen, Alexander Schinner, and Hans-Georg Matuttis, Static Friction, Differential Algebraic Systems and Numerical Stability, Physics Procedia, Vol. 6, pp. 65-75, 2010. (Proceedings of the 21st Workshop Computer Simulations Studies in Condensed Matter Physics XXI)


Author Biography

Jian Chen (陳 健)

Nationality: China. Born December 5, 1983.

September 2001: Entered the Department of Mechatronics Engineering, University of Electronic Science and Technology of China.
April 2004: Entered the University of Electro-Communications as an exchange student (JUSST Program).
March 2005: Completed the exchange program (JUSST Program) at the University of Electro-Communications.
July 2005: Graduated from the Department of Mechatronics Engineering, University of Electronic Science and Technology of China.
September 2005: Entered the graduate program in Mechatronics Engineering, University of Electronic Science and Technology of China.
July 2008: Completed the graduate program in Mechatronics Engineering, University of Electronic Science and Technology of China.
April 2009: Entered the doctoral program, Department of Mechanical Engineering and Intelligent Systems, Graduate School of Electro-Communications, The University of Electro-Communications.

Acknowledgments

First and foremost, I would like to express my sincere gratitude and deepest thanks to my supervisor, Associate Professor Dr. rer. nat. Hans-Georg Matuttis, for his patient guidance of and substantial contributions to this research, for his extraordinary enthusiasm for teaching and serious passion for science, and for his unremitting inspiration and encouragement ever since I was an exchange student at UEC. I would like to express my sincere thanks to Professor Takeshi Miyazaki for his support and encouragement during the study. I would like to thank Professor Hiroshi Maekawa, Professor Takashi Kida, Professor Tomio Okawa, and Associate Professor Takaaki Nara for their valuable comments and insightful recommendations as my thesis committee members. I would like to thank the Japanese Government (MEXT) Scholarship for financial support, and the Japanese Railway Technical Research Institute (Tokyo) for its cooperation and financial support. I also wish to thank the members of the Matuttis laboratory for providing a supportive and friendly environment and offering helpful and interesting discussions, and all my friends for their support and friendship. Finally, I want to thank my parents, to whom this work is dedicated, and my younger brother for their love and support.
