Fast Algebraic Methods in Computational Electromagnetics

5 downloads 0 Views 654KB Size Report
Northrop Grumman's proprietary code SWITCH, which is based on the hybrid FEM-BIE method. Keywords-fast methods; adaptive cross approximation;.
Fast Algebraic Methods in Computational Electromagnetics Tri Van, Member, IEEE, Laura R. C. Suzuki, Damir Latypov, Joy Von Holle, Tom Voss, Ton Van, Greg Wilson BerrieHill Research Corporation, Dayton, OH, USA [email protected]

Abstract—Applicability of the Adaptive Cross Approximation and hierarchical matrix approach to the problems of computational electromagnetics is examined both theoretically and numerically. Reduction of the required storage and computational speed-up during each step of a numerical solution are analyzed. Numerical examples are produced using a Northrop Grumman’s proprietary code SWITCH, which is based on the hybrid FEM-BIE method. Keywords-fast methods; adaptive cross approximation; hierarchical matrix; computational electromagnetics

I.

INTRODUCTION

The history of fast solvers in computational electromagnetics (CEM) goes back to 1971 when Bojarski noticed that all integrals in the CEM boundary integral equations (BIE) were convolutions [1]. Therefore, if these integrals are discretized on a uniform grid, the resultant matrix-vector product can be evaluated using the fast Fourier Transform (FFT), which requires only O(NlogN) operations instead of O(N2) operations for direct calculation of the product. Originally, the FFT-based methods were limited due to their inherent requirement for uniform meshes. To obtain more flexibility, it was necessary to project arbitrary irregular meshes onto a uniform mesh for solution, and then return to the irregular mesh after calculation. This has been accomplished by the introduction of the precorrected FFT methods and the adaptive integral method [2]. These methods are very effective for planar structures for which they reduce the computational complexity to O(NlogN). However, they are less efficient (O(N1.5logN)) when the surface basis functions have to be projected onto a three-dimensional grid. The next big step in development of the fast solvers was the introduction of the Fast Multipole Method (FMM) by Rokhlin and Greengard [3]. The method is based on the fact that a description of distant interactions can be greatly simplified by using a familiar multipole expansion. The multilevel FMM (MLFMM) has the computational complexity of the FFT method and can be implemented for an arbitrary grid. The development of the FMM has had a significant impact on computational electromagnetics as it tremendously enhanced solver capabilities for dealing with large-scale openNorthrop Grumman Corporation

George Antilla, Member, IEEE, Matt Warren Integrated Systems Northrop Grumman Corporation El Segundo, CA, USA

region problems involving complex geometry and material inhomogeneity. However, although the principles of FMM are simple, its implementation, especially the MLFMM is a nontrivial task. Another problem with the MLFMM is that it has only been implemented as an iterative solver. The theory of iterative solvers is well established for Hermitian and normal matrices but for general matrices it is still an open area of research. In practical terms this means that MLFMM implementations sometimes suffer from poor convergence or no convergence at all [4]. The success of FMM increased interest in data-sparse representations of the fully populated matrices. In 1999 Hackbusch introduced the notion of a hierarchical matrix (Hmatrix) [5]. A hierarchical matrix is an algebraic structure in which many sub-blocks can be represented by low rank matrices. In addition to efficient storage, hierarchical matrices provide approximate variants of the standard matrix operations such as addition, multiplication, inversion and LUdecomposition with almost linear complexity. All that is needed to take advantage of the H-matrix theory is to supply an algorithm for a low rank approximation for sub-blocks corresponding to distant interactions. In the FMM this approximation is generated by the multipole expansion of the kernel of the integral operator. Thus, implementing an FMMbased fast solver into an existing CEM code requires a complete rewrite of the impedance matrix build portion of the code. Therefore, when Adaptive Cross Approximation (ACA), a pure algebraic fast method which generates a low rank approximant from a few entries of the original matrix, was proposed, it was received with great interest. In this paper, we examine applicability and feasibility of a fast method based on H-matrix arithmetic and ACA to the problems of computational electromagnetics. Numerical examples are produced using a Northrop Grumman hybrid finite element – boundary integral method code called SWITCH. The paper is organized as follows. In the next section we describe an algorithm to represent an impedance matrix in Hmatrix format and a simple version of ACA. Section III provides information about SWITCH and how the H-matrixACA fast method has been integrated to produce a new code,

SWITCH-H. The fourth section summarizes numerical results on matrix compression, matrix build and LU-factorization speed-up and accuracy for a number of the Electromagnetic Code Consortium (EMCC) benchmark geometries. In the fifth section we discuss limitations and possible problems with ACA. Conclusions are given in the last section. II.

H-MATRIX REPRESENTATION OF THE IMPEDANCE MATRIX AND ACA

Large dense matrices arising from integral equations do not have explicit structure in general. However, in case of an asymptotically degenerate kernel it is possible to find a permutation so that a matrix with appropriately permuted rows and columns contains blocks that can be well-approximated by low-rank matrices. A suitable permutation depends on the identification of geometrically separated clusters of elements (typically facets or interior edges of the mesh). Sufficiently separated clusters determine sub-blocks that are numerically low rank, that is, can be approximated by low-rank matrices. Such sub-blocks are called admissible. A sub-block corresponding to a pair of closely spaced clusters is called inadmissible and is not approximated. The threshold distance that divides the admissible and inadmissible sub-blocks is a parameter, typically set to one wavelength of the frequency in consideration. Another parameter of the geometry decomposition process is the minimum block size. An inadmissible sub-block may be divided until it reaches that size. This process of geometry decomposition and the resulting H-matrix structure are illustrated in Fig. 1. In this paper, the admissible sub-blocks are approximated by low rank matrices using ACA. This algorithm produces a sequence of decompositions of an admissible sub-block matrix A into a sum A=Sk +Rk, where Sk is a low rank (less than or equal to k) matrix and Rk is the error of approximation. The algorithm consists of the following steps [6]. Let S0=0, i1=1 and for k=0, 1,… compute: k

Step 1:

T e= e Ti k +1 A − ∑ (U l )i i k + 1 Rk i =1

k +1

V Tl ,

Step 2:

jk +1 :| ( Rk )ik +1 , jk +1 | = max(| ( Rk )ik +1 , j |, j ),

Step 3:

V Tk +1= e Ti k +1 Rk / ( Rk )i k +1 , j k +1 , k

Step 4:

= U k +1 Ae Tj k +1 − ∑ (Vl )i i =1

k +1

Ul ,

Step 5:

ik + 2 :| (U k +1 )ik +2 | max(| (U k +1 )i |, ), i ≠ ik +1 =

Step 6:

S( k +1)= S k + U ( k +1) V(Tk +1) .

In the first step of the algorithm, the row with index ik+1 of the matrix A is generated and the corresponding row of the error Rk is computed. At Step 2 the column with the maximum element in the row ik+1 is found. The corresponding element is

Figure 1. Geometry decomposition and the structure of the resulting Hmatrix Green blocks are generally admissible and pink blocks are inadmissible.

the pivot element for an ACA iteration. In Step 3 the ik+1-th row of Rk is normalized. Since the position jk+1 of the pivot element in the ik+1-th row of Rk is known, at Step 4 we can compute the corresponding column of this matrix (Uk+1). Next, position of the next pivot element in the jk+1-th column is found (Step 5). The last step of the algorithm is an update of the approximation from Sk to Sk+1. A stopping criterion is given by:

Uk

Vk ≤ ε S k ,

(1)

where ||.|| indicates the Frobenius matrix norm. Thus, the algorithm approximates an m×n matrix A by an outer product form of an m×k matrix U and n×k matrix V, A≃UVT. Saving the U and V matrices requires 2k(m+n) units of storage as compared to mn units for matrix A. Therefore, if k is sufficiently small, the outer product form requires less storage than the original matrix and ACA compresses the matrix. Under the same condition on rank k the outer product form also leads to faster matrix addition and multiplication [7]. It is important to note that in ACA neither the matrix A nor the error Rk need to be computed completely. Since matrix A will not be generated completely only the norm of its approximation Sk is available. Therefore with ACA we don’t know the true error 𝜺true

A − S k ≤ ε true A ,

(2)

of the approximant Sk, and have only an estimate 𝜺 of it. This will be discussed in greater detail in section on limitations of ACA. Once a matrix is brought to the H-matrix form, its inverse, LU factorization, etc. can be computed using approximate, reduced complexity H-matrix arithmetic [5]. The accuracy of these approximate operations is controlled by an error tolerance parameter 𝜺H. III.

INTEGRATION OF THE H-MATRIX-ACA FAST METHOD INTO A CEM CODE

BerrieHill has been working with Northrop Grumman to incorporate an H-matrix/ACA fast method into a CEM code SWITCH [6]. SWITCH is one of the EMCC sponsored CEM codes and has the following features. It is based on threedimensional curvilinear hybrid finite element-boundary integral equation approach. The curvilinear coordinates permit arbitrarily curved geometries to be modeled accurately. The hybrid approach eliminates some disadvantages of both methods. For modeling regions of inhomogeneous or anisotropic materials, in which the integral equation method is difficult to implement, the finite-element method is used. The difficulty with the finite-element method of accurately terminating the computational mesh is eliminated by application of the exact boundary integral method. SWITCH is capable of solving Radar Cross Section (RCS) scattering, cavity and antenna radiation problems. In this paper we report results for the RCS problems only. The modified version of SWITCH with a fast method based on H-matrix/ACA added has been named SWITCH-H. The flowchart in Fig. 2 shows a workflow for SWITCH-H. IV.

NUMERICAL RESULTS

SWITCH-H has been extensively tested and compared against the SWITCH RCS calculations on a number of benchmark problems including several benchmark geometries

Figure 2. SWITCH-H workflow. The left path shows normal SWITCH progress without employing the new H-matrix/ACA code, and the right side shows the path taken by the accelerated version. Both versions return to the original SWITCH after matrix decomposition to solve for the RCS.

from EMCC. This section describes the problems used in this paper to quantify the performance of the fast method. These problems were selected in order to exercise all basic echo source mechanisms - tip, edge and corner diffraction, cavity return, traveling and creeping wave echoes, specular scatterers, surface discontinuities and interactions. A. Test geometries For this paper we have chosen the following 7 geometries: 1) 2) 3) 4) 5) 6) 7)

Two 1m diameter spheres 10 m apart at 300 MHz 10ft Ogive (EMCC benchmark) at 1GHz VFY-218 at 140 MHz 11ft Cone-Sphere (EMCC benchmark) at 2 GHz 3ft Open Pipe (EMCC benchmark) at 8 GHz C-29 at 1 GHz Small Half-Kite (EMCC benchmark) at 11.81 GHz

Fig. 3 shows these geometries and the number of unknowns resulting from their discretization when meshed with element edge length approximately equal to 1/7 of the scanning wavelength. All test bodies except for the last one (the Small Half Kite) are the perfect electric conductors (PEC). The Small Half-Kite is made of two parts, one being PEC and the other a lossy dielectric. The properties of the dielectric as well as the precise dimensions of this geometry can be found in Fig. 7. B. ACA Compression rate The compression rate of the impedance matrix resulting from ACA depends on the problem being solved as well as two ACA parameters – minimum block size and the ACA error tolerance ε, defined in Eq. (1). Smaller values of the minimum block size yield a larger number of admissible blocks thus resulting in higher compression rates. The reverse holds with respect to error tolerance – smaller values of ε give higher approximation accuracy but less memory saving. An example of a distribution of ACA compression rate over the impedance matrix is shown in Fig. 4. Black-colored blocks correspond to

Figure 3. Some of the test geometries used in verification of SWITCHH.

Figure 6. Typical SVD of an impedance matrix and of an admissible block. For an admissible block, the singular values decay much faster than for the impedance matrix as a whole, allowing ACA to give high compression even when the error tolerance ε is as small as 1.0e-13.

Figure 4. H-matrix for VFY-218 color coded by sub-block compression rate. Black blocks are inadmissible and therefore incompressible; darker blue tones indicate higher compression and red tones lower compression

no compression (inadmissible blocks), blue tones to higher compression and red tones to lower compression. It is interesting to note that significant compression rates are achieved even when the error tolerance is set to very small values, as seen in Fig. 5. This may seem to be counterintuitive, as one might expect that for ε as small as that shown in Fig. 5, ACA iterations will recover the full impedance matrix and thus produce no compression at all. Fig. 6 explains why this is not the case. It shows the singular value decompositions (SVD) of the full impedance matrix for the Double Ogive (an EMCC test body not shown here) and one of its admissible blocks. One can see that the SVD values of the admissible block decay much faster than those of the full impedance matrix and the 20-th SVD value is already several orders of magnitude smaller than the first one. This means that the block is highly compressible even at very small values of error tolerance ε. In this paper we omit the details of the H-matrix arithmetic

Figure 5. ACA compression rate (percent of original, uncompressed matrix memory used) as function of error tolerance 𝜺.

and the hierarchical LU decomposition, H-LU. These can be found in [5] and [7]. We note, however, that the H-LU decomposition leads to some loss of compression, that is, storing the lower (L) and upper (U) factors of a hierarchical matrix takes up more space than storing the original H-matrix. This loss of compression in case of Small Half-Kite is illustrated in Fig. 7. A reader may find it tempting to draw an analogy with the matrix fill-in effect in the ordinary LU decomposition of a sparse matrix. H-matrices, however, are able to handle dense matrices and it is not the fill-in but the increase of the effective rank that leads to some loss of the compression. Formally, it is a consequence of the fact that Hmatrices do not form a linear space [7]. The ACA impedance matrix compression rates as well as loss of compression after the H-LU decomposition for test problems is summarized in Table I. Each test case is listed with the number of unknowns. The compression rate yielded by ACA alone, before H-LU decomposition, is shown in Before H-LU. In some cases, some loss of compression occurs before H-LU decomposition, and compression rates after this step are shown in After H-LU, followed by percent loss of compression (% Loss). C. Speed-up of matrix operations ACA reduces the algorithmic complexity of the impedance matrix calculation, or the matrix build, from O(N2) to almost

Figure 7. Loss of compression during H-LU. Again, black blocks are inadmissible and have no compression, while dark blue blocks yield the highest compression rates and red tones give lower compression rates. White areas are an artifact of the rendering and occur at locations of large amounts of small inadmissible blocks.

TABLE II. SWITCH-H COMPRESSION RATES. Test Case Two Spheres Ogive VFY-218 Cone-Sphere Open Pipe C-29 Small Half-Kite

Number of Unknowns 1,584 13,376 25,508 28,980 82,496 134,848 221,119

Matrix Compression Before H-LU

After H-LU

Loss(%)

0.44 0.63 0.55 0.76 0.87 0.89 0.85

0.44 0.53 0.45 0.66 0.78 0.69 0.73

0 16 18 13 10 22 14

Fig. 8 shows RCS of the Ogive test body computed by SWITCH and SWITCH-H for two polarizations (top plots). Two plots on the bottom of the figure show the difference between the SWITCH and SWITCH-H results. As can be seen, results deviate by less than 0.1 dBsm everywhere except for locations with very low values of RCS (𝜑~0° for 𝜑𝜑polarization and 𝜑~-150° for 𝜃𝜃-polarization). The RMS deviation between the SWITCH and SWITCH-H computed RCS values for all test cases are summarized in Table III. V.

linear because not all the entries of the matrix are actually computed. It also brings the matrix to the H-matrix form, for which approximate, reduced complexity variants of the linear algebra operations, such as inverse and LU-decomposition, have been developed [5], [7]. This results in significant acceleration of the solution of the impedance matrix equation. In this paper, the impedance matrix equation is solved using the H-matrix variant of the LU-decomposition (H-LU) as a direct method. Another alternative is to use the H-LU with a relaxed error tolerance 𝜺H as a preconditioner with an iterative solver. The final version of SWITCH-H will have this capability as well. Table II summarizes the speed-up of the matrix build, factorization and the solution of the LU-factorized impedance matrix equation using SWITCH-H, as compared to the corresponding steps when running the original SWITCH. Percent speed-up is given as the percent of runtime of the original SWITCH code used by SWITCH-H within each category, calculated using CPU seconds.

LIMITATIONS OF ACA

In this section we consider some potential problems that can arise when using ACA for low rank approximation of the admissible sub-blocks of an H-matrix. First we note that since with the ACA algorithm we do not compute the entire impedance matrix, the true error of the approximation remains unknown. ACA provides only an estimate 𝜺 of the true error 𝜺true as defined in (1) and (2). Fig. 9 shows a comparison of this estimate with the true error for a 3343×1672 size admissible block as function of the ACA iteration number k. One can see that ε(k) is a biased estimate – most of the time it is optimistically underestimating the true error. This fact has also been observed in [8]. The authors of this work argue that the stopping criteria (1) can lead to a premature stop and yield inaccurate results as a consequence. Besides an

D. Accuracy Accuracy of a solution obtained using the H-matrix-ACA fast method depends on the error tolerances ε of ACA and 𝜺H of H-matrix arithmetic. In this section the accuracy of the method is studied numerically, by comparing the RCS computed for seven test bodies using SWITCH and SWITCHH. A theoretical analysis of the method’s accuracy is postponed until next section. Numerical experiments have established that the error tolerance 𝜺 values of between 1.0e-4 and 1.0e-5 produce a good compromise between the compression and speed-up performance of the method and the accuracy of the solution. Typically, the H-matrix arithmetic error tolerance 𝜺H is set at the same value as 𝜺. TABLE I. MATRIX BUILD AND OPERATIONS SPEED-UP

Test Case Two Spheres Ogive VFY-218 Cone-Sphere Open Pipe C-29 Small Half-Kite

Number of Unknowns 1,584 13,376 25,508 28,980 82,496 134,848 221,119

Speed-up (%) Matrix Build

Matrix Factor

Matrix Solve

71 76 64 79 76 91 34

89 79 78 81 95 85 94

5 73 74 73 74 82 84

Figure 8. Accuracy of the SWITCH-H for the Ogive test case. The top two plots show, from left to right, RCS from the φφ polarization and from the θθ polarization in dBsm over a full rotation around the test body. The lower set of plots show the difference between SWITCH and SWITCH-H results. TABLE III. ACCURACY OF THE SWITCH-H RCS CALCULATIONS Test Case Two Spheres Ogive VFY-218 Cone-Sphere Open Pipe C-29 Small HalfKite

Number of Unknowns 1,584 13,376 25,508 28,980 82,496 134,848 221,119

RMS Error in dBsm 𝜑𝜑 polarization 5.3e-7 1.1e-4 2.3e-6 7.2e-6 1.9e-5 4.3e-5 8.9e-6

𝜃𝜃 polarization 5.5e-7 9.3e-6 2.7e-6 9.0e-6 6.4e-6 3.4e-5 8.0e-5

𝑡𝑡1 𝑥𝑥𝑖𝑖 ∊ � 𝑡𝑡2

𝑖𝑖𝑖𝑖 𝑖𝑖 ≤ 𝑛𝑛 𝑜𝑜𝑜𝑜ℎ𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒

𝑦𝑦𝑗𝑗 ∊ �

𝑠𝑠1 𝑠𝑠2

𝑖𝑖𝑖𝑖 𝑗𝑗 ≤ 𝑛𝑛 𝑜𝑜𝑜𝑜ℎ𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒𝑒 ,

(4)

Nyström discretization of the double layer potential which is asymptotically smooth in the x variable but not in the y variable in the points xi and yj yields a matrix sub-block

X∊ℝ . The regions t1 and s2 as well as t2 and s1 lie in the same plane, while the outer normal vectors of t1 and s1 and t2 and s2 respectively are perpendicular. Thus, all entries Xij with i≤n and j>n or i>n and j≤n vanish, while the diagonal blocks are non-zero. Thus, the sub-block X has the structure: (n+m)×(n+m)

𝑋𝑋 𝑋𝑋 = � 11 0

Figure 9. Comparison of the ACA error estimate with the true error. As can be seen, ACA occasionally estimates the error to be much smaller than the true error, leading to premature termination of the ACA algorithm and continuation of the program.

obvious fix – decreasing the error tolerance ε – they have suggested a multiple restart procedure. After the estimated error becomes smaller than ε, it proceeds with a number of additional iterations, randomly choosing a starting column. If the estimated error remains below ε after these iterations, the algorithms terminates; otherwise it continues as usual. In certain cases ACA iterations may appear to converge while the true error remains unacceptably large. This can be demonstrated by discretizing the double layer potential u(x) in 3D:

u ( x) = ∫ ρ ( y ) S

nˆ ( y )( y − x) dσ y , y−x 3

(3)

for a simple geometry shown in Fig. 10. Although this integral does not appear explicitly in the integral equations relevant to SWITCH, it is nonetheless instructive to see an example how ACA may fail. Let cluster T be a union of regions t1 and t2, and cluster S – union of regions s1 and s2. Clusters T and S are assumed to be far from each other so that the corresponding matrix sub-block is admissible. Let us fix n, m and points xi and yj, i, j=1, 2,…,n+m satisfying

0 � , 𝑋𝑋11 𝜖𝜖ℝ𝑛𝑛×𝑛𝑛 , 𝑋𝑋22 𝜖𝜖ℝ𝑚𝑚 ×𝑚𝑚 𝑋𝑋22

.

(5)

If ACA is applied to sub-block X with the starting pivot in X11, then the next pivot can only be found within X11. In other words, the ACA algorithm works on the sub-block X11 (where we can expect it to converge exponentially) but does not recognize the second sub-block X22. If we chose the initial pivot index i>n, we would instead approximate X22 but completely miss the sub-block X11. In both cases the ACA error estimator converges exponentially to zero, thus suggesting a good convergence of the total error, when, in fact, no convergence occurs. A theoretical proof of convergence of ACA has been given in [9] for integral equations with the kernels which are asymptotically smooth in both variables. This proof has been questioned in [10] by constructing an explicit counter example. We note that neither EFIE nor MFIE satisfy even the conditions stated in [9]. Therefore at present for CEM problems the ACA should only be considered as a heuristic method. VI.

CONCLUSION

A pure algebraic fast method based on the H-matrix arithmetic and ACA is integrated into a Northrop Grumman hybrid finite element - boundary integral CEM code SWITCH. Impedance matrix compression on the order of 90% and run time speed-up of 75-85% are observed for a scattering problem on a number of test geometries. Limitations of the ACA methods were also discussed. REFERENCES [1]

[2]

[3] [4] [5]

Figure 10. Two clusters leading to a reducible matrix block for the double layer potential.

N.N. Bojarski, “k-Space formulation of the electromagnetic scattering problem”, Tech. Rep. AFAL-TR-71-75 Air Force Avionics Lab, March 1971. E. Bleszynski, M. Bleszynski and T. Jaroszewicz, “A fast integral solver for electromagnetic scattering problems”, IEEE APS Int. Symp. Dig, vol. 1, pp. 416-419 , 1994. L. Greengard, V. Rokhlin, “A fast algorithm for particle simulations”, J. Comp. Phys. Vol. 73, pp. 325-348, 1987. A. Greenbaum, Iterative methods for solving linear systems. SIAM, 1997. W. Hackbusch, “A sparse matrix arithmetic based on H-matrices. Part I: Introduction to H-matrices”, Computing vol. 62, no. 2, pp. 89-108, 1999.

[6]

[7] [8]

G.E. Antilla and N.G. Alexopoulos, “Scattering from complex threedimensional geometries by a curvilinear hybrid finite-element-integral equation approach”, J. Opt. Soc. Am., A 11, pp. 1445-1457, 1994. M. Bebendorf, Hierarchical Matrices, Springer, 2008. J. Laviada, et al., “On the convergence of the ACA”, Microwave and Optical Technology Letters, vol. 51, pp. 2458-2460, 2009.

[9]

M. Bebendorf, S., Ryazanov, “Adaptive low-rank approximation of collocation matrices”, Computing, vol. 70, no. 1, pp. 1-24, 2003. [10] S. Borm, L. Grasedyck and W. Hackbusch, “Hierarchical Matrices”, Lecture Note No. 21, Max Planck Institute for Mathematics in the Sciences, Leipzig , 2006.