
IOP PUBLISHING | JOURNAL OF MICROMECHANICS AND MICROENGINEERING

J. Micromech. Microeng. 18 (2008) 055029 (17pp)

doi:10.1088/0960-1317/18/5/055029

Atomistic methods for the simulation of evolving surfaces

M A Gosálvez¹, Y Xing², K Sato³ and R M Nieminen¹

¹ Laboratory of Physics, Helsinki University of Technology, PO Box 1100, 02015 HUT, Finland
² Department of Mechanical Engineering, Jiu Longhu Campus, Southeast University, Jiangning Section, Nanjing 211189, People's Republic of China
³ Department of Micro-Nanosystem Engineering, Nagoya University, 464-8603 Aichi, Japan

E-mail: [email protected]

Received 29 January 2008
Published 21 April 2008
Online at stacks.iop.org/JMM/18/055029

Abstract
The currently available atomistic simulation tools for an evolving complex interface are described, focusing on the kinetic Monte Carlo (KMC) and cellular automata (CA) methods. The different versions of the two methods are considered, stressing their relative weaknesses and strengths. Data storage considerations lead to the choice of an octree data structure, of use in both KMC and CA simulations. The octree results in a substantial reduction in memory use, simultaneously benefiting from rapid addressability and fast searching properties. Examples in anisotropic etching are used to depict the use of the methods in practice and to describe the algorithms.

(Some figures in this article are in colour only in the electronic version)

© 2008 IOP Publishing Ltd. Printed in the UK.

1. Introduction

Since the mid 1980s there has been a significant increase in the popularity of atomistic simulations for the analysis and description of collective behavior at the liquid–solid and gas–solid interface. Relevant examples can be found in both equilibrium and non-equilibrium systems, where the evolving surface may be subject to redistributions and/or net flows of particles, such as in growth or etching. Collective behavior can be found, e.g., in the concerted motion of islands and pits, where an overall complex deformation and reconstruction of the clusters provides low-energy routes for their motion [1–3]. It can also be found in step flow, where it involves a delicate interplay between kink propagation, step reactivity and diffusion transport [1, 4–6]. Another example is the formation of morphological defects such as mesa structures, hillocks, cavities and other surface features, where a wide range of microscopic mechanisms needs to be considered [1, 4, 7]. At the macroscopic level, the interface may exhibit a markedly complex evolution. As an example, anisotropic etching can lead to complex multivalued surface fronts propagating with different rates and displaying different morphologies depending on the exposed local surface orientation [4, 8, 9]. This can be used for the fabrication of micro/nano-electromechanical systems (MEMS/NEMS).

For instance, growth and etching can be applied sequentially in alternation over many steps, resulting in a constantly reshaped evolving complex interface that propagates in three-dimensional space [10–12]. In the same manner as the simulation of the collective microscopic behavior enables an improved understanding of the surface evolution mechanisms, the realistic simulation of the resulting multivalued propagating surface becomes essential for proper device design and development. At present, one may claim that the successful simulation of such a variety of microscopic and macroscopic phenomena has been enabled by the constant increase in computational power through the years, as reflected by the elimination of the traditional need for supercomputers, which have largely been replaced by fast desktop (and laptop) computers. However, much of the modeling success is due to the development of improved algorithms and methods. In this paper, we describe the kinetic Monte Carlo (KMC) [13, 14] and cellular automata (CA) [15, 16] methods as two prominent examples of popular atomistic approaches for the simulation of collective microscopic processes and complex macroscopic propagating surfaces. By closely comparing the two methods in their most relevant variants, we provide a comprehensive description of their major features, stressing their relative benefits and


disadvantages. In doing so, we address important questions such as why atomistic simulations may be preferred over a continuum formulation, or why KMC is suitable for describing microscopic collective processes while CA better predicts the overall propagation of the surface. We also present the fundamental components of an atomistic simulator and discuss which features make a simulator accurate.

1.1. Why to use atomistic simulations?

In general, one can distinguish three main approaches for the simulation of a propagating interface by considering the underlying atomistic nature: molecular dynamics (MD), Monte Carlo (MC) and cellular automata (CA). As compared to MD [17], the MC and CA methods simplify the nature of the interactions between the atoms, substituting them by simple rules which may be tailored to retain the essential behavior of the system. This enables the simulation of larger systems containing up to a few tens of millions of atoms (or cells). There are several reasons to favor the use of atomistic simulations, including the following: (i) direct use of the resulting geometries in post-processing tools, such as finite element method (FEM) analysis of mechanical, thermal and/or piezoresistive properties, with the whole domain already conveniently discretized, including convex and concave regions and/or protrusions; (ii) a realistic description of the surface morphology and its origin is included, enabling an improved understanding of the interplay between the microscopic phenomena and the macroscopic features; as an example, the underlying atomistic process can be visualized and the macroscopic features understood better, as is the case for the macroscopic surface striations appearing due to step bunching during step-flow etching [18]; (iii) the correctness of the underlying atomistic processes is tested, serving as a screening method for different possible mechanisms; and (iv) a reduction in the number of experimentally required parameters can be obtained; as an example, the use of atomistic cellular automata for the simulation of anisotropic etching reduces the number of experimental parameters to just a few, as compared to the hundreds or thousands of etch rates required by the continuum formulation using the Wulff–Jaccodine method [19, 20].

1.2. Anisotropic etching

As a particularly demanding process from a computational perspective, and one in widespread use in micromachining, we describe the major aspects necessary for the successful simulation of anisotropic etching, serving as an example for a broader simulation scope that includes other surface processing techniques. Anisotropic wet chemical etching of crystalline silicon stands as a prevalent method for the fabrication of MEMS and NEMS devices. Although alternative technologies such as deep reactive ion etching (DRIE) have penetrated the traditional wet etching territory, the higher costs and limitations for batch processing using DRIE still make chemical etching the most affordable method for the reliable production of three-dimensional structures with multipurpose functionalities. Similarly, other technologies, including epitaxial and conformal growth, such as atomic layer deposition (ALD), and surface micromachining, involving the deposition and etching of sacrificial and functional layers based on the use of different materials, provide additional levels of complexity and flexibility for the realization of the microstructures. From the point of view of performing simulations, these processes share a fundamental, common feature: they result in the propagation of an evolving complex interface in three-dimensional space. Taking this perspective, we discuss the major concepts necessary for the atomistic simulation of such evolution. Although the paper is tailored mostly to wet etching, many of the issues discussed are applicable to other processes. In particular, the use of the octree data structure and the description of the different KMC and CA methods can be directly applied to the simulation of step-flow, epitaxial and conformal growth as well as to dry etching and surface micromachining.

Many tools and algorithms have been developed for the simulation of etching, with the main focus on either the prediction of the etch front propagation in engineering applications or the understanding of the atomistic mechanisms. Generally, the engineering applications have been dominated by geometrical simulators [20–23]. However, there have been recent advances in the atomistic methods for this purpose using improved cellular automata (CA) implementations [19, 24–26]. This has effectively expanded the scope of the simulations, used more traditionally in the form of kinetic Monte Carlo (KMC) codes for explaining the origin of the morphologic features [5–7, 9, 27] as well as the site-specific origin of the etch rate anisotropy [27–30] and its temperature dependence [31, 32]. Since both KMC and CA are also used for the simulation of growth processes [33–36], and three-dimensional attempts are being made to use them for the simulation of deep reactive ion etching [37, 38] as well as atomic layer deposition [39–41], an overview of the different KMC and CA methods themselves is timely.

1.3. Outline

Based on our experience in building an atomistic simulator [42], we focus on the description of four essential features: (i) the incorporation of the crystalline structure of the substrate (section 2), (ii) important considerations about data storage structures (section 3), (iii) the determination of the process rates (section 4), which can vary from fully manual to semi-automatic calibration and can involve the use of sophisticated physical models to capture diffusion and micromasking phenomena, and (iv) the different KMC and CA methods for the time propagation of the interface (section 5), where we make a particular effort to stress their relative strengths and weaknesses. In addition, the paper offers an overview of simulated results. Sections 2 and 4 are very specific, in principle limited to crystalline silicon and its anisotropic etching in alkaline or acidic solutions. Some of the ideas, however, are applicable to other materials and other processes, as pointed out correspondingly in each section. In contrast, sections 3 and 5 contain very general concepts about data storage and the


KMC and CA methods. In our opinion, the latter are useful for a larger readership with interests outside etching.

2. Crystallographic considerations

This section presents a way to incorporate the crystallographic structure of silicon in a simulation so that a generic process can be atomistically simulated on any generic surface orientation {h k l}. As explained at the end of the section, one can easily introduce some modifications in order to simulate other crystalline materials.

2.1. Use of orthorhombic unit cells to represent any surface orientation

In order to preserve the basic crystallographic properties of silicon, such as the n-fold symmetry of the etched structures on {1 1 0} (n = 2), {1 1 1} (n = 3) and {1 0 0} (n = 4), the atomistic simulation must contain the correct Bravais lattice, crystallographic unit cell and crystallographic basis. For the case of silicon, these are shown in figure 1. Accordingly, the position of any atom in the crystal can be addressed by using four coordinates (n1, n2, n3, m), where the trio (n1, n2, n3) designates the position of the cell in the lattice (figure 1(a)) and the last index m corresponds to a particular atom in the crystallographic basis (figure 1(c)). We refer to the four indices as the nm-coordinates.

Since the etch and/or growth rate of silicon can be anisotropic and thus depend on the crystallographic orientation of the surface, one needs to simulate the process on any surface in order to obtain an improved understanding of the differences. To do this, the simulator must be able to represent any surface orientation, characterized by the three Miller indices {h k l}. In practice, one does this by always generating a silicon crystal with the crystallographic planes {1 0 0}, {0 1 0} and {0 0 1} perpendicular to the X, Y and Z directions, and then rotating the crystal in order to place the {h k l} plane parallel to the Z-axis (figures 1(d)–(f)). This rotation ensures that the atom at the origin, with nm-coordinates (0, 0, 0, 1), remains centered. After this, one can search for an orthorhombic unit cell, i.e. a cell with rectangular faces, as shown in figure 1(e). There are several reasons for this choice, including (i) ease of implementation of periodic boundary conditions (see below), (ii) ease of storage, as rectangular arrays are suitable for direct use in computations, and (iii) ease of visualization (see section 3 in connection with the use of an octree).

The determination of the orthorhombic unit cell is based on finding periodicity along three orthogonal directions, including the Z-axis as one of them. The process has three stages. (1) Given that the origin already has an atom located on it, we search for the closest atom lying on the Z-axis. The vector between the origin and this atom constitutes the third lattice vector of the searched unit cell. (2) Restricting the search to the plane perpendicular to the previous vector, we look for the two atoms closest to the origin such that, additionally, the vectors reaching them from the origin are mutually perpendicular. These vectors form the first and second lattice vectors of the searched orthorhombic cell. (3) Typically, after applying (2), one will find several sets of vectors corresponding to different unit cells which are rotated with respect to each other about the Z-axis and contain different realizations of the basis. In principle, any of them is valid to represent the {h k l}-oriented crystal. A criterion to single out one of them is to choose the one that simultaneously minimizes the lengths of the first and second lattice vectors as well as the area of the rectangle defined by them. An example of the resulting orthorhombic unit cell is shown in figure 1(e).

Figure 1. Crystallographic features of silicon: (a) cubic lattice, (b) conventional unit cell, (c) crystallographic basis for the conventional cell (8 atoms), (d) a generic {h k l} plane before rotation of the crystal to place {h k l} parallel to the Z-axis, (e) one possible orthorhombic unit cell to describe a {7 5 5} silicon substrate and (f) a small crystal with {7 5 5} orientation.


Once the lattice vectors of the orthorhombic unit cell have been determined, the corresponding crystallographic basis can be found immediately. This is a simple procedure where one searches for the atoms whose positions lie within the boundaries defined by the six faces of the orthorhombic cell. To obtain the crystallographic basis correctly, the atoms lying exactly on the facets are included only for three mutually perpendicular faces (not all six faces).

2.2. Locating the first neighbors

Since the atomistic simulations are based on associating reaction rates to the atoms according to the particular configuration of their neighborhood (section 4), it is necessary to know the location and state of the first neighbors. This is a basic procedure that is needed very often during a simulation and, thus, should be a fast subroutine. In practice, this is achieved by storing a list of the first neighbors for only the atoms in the crystallographic basis, as shown in figure 2. Finding the neighbors is then limited to reading from this list. In the case of silicon, since the atoms have tetrahedral coordination, the list has four neighbors for each basis atom. Note that the nm-coordinates are stored as relative values so that the same list can be used to find the neighbors of any atom in the crystal.

Figure 2. Creation of the list of neighbors for the atoms in the crystallographic basis.

2.3. Periodic boundary conditions

The list of neighbors is an essential component for the efficient incorporation of periodic boundary conditions in the simulations. In general, there are three types of boundary conditions: (i) the crystal has a finite size in the three dimensions (representing a silicon chip), (ii) the crystal is finite in the Z-direction and infinite in X and Y (a planar slab), and (iii) the crystal is semi-infinite in the Z-direction and infinite in X and Y (a semi-infinite crystal). In order to realize cases (ii) and (iii) in a simulation, periodic boundary conditions are needed in the planar directions X and Y so that the same silicon chip repeats itself indefinitely on the XY plane. This can be easily implemented by slightly modifying the procedure where the neighbors of an atom are read off from the list of neighbors. Assuming that the number of orthorhombic cells in the X and Y directions is N1 and N2, implementing the periodicity means that when a neighbor is required from cell n1 = N1 + 1 and/or n2 = N2 + 1, the subroutine should output the corresponding neighbor from cell n1 = 1 and/or n2 = 1. Similarly, for cell n1 = 0 (n2 = 0) the subroutine provides the corresponding neighbor at n1 = N1 (n2 = N2). Nothing else is needed to obtain periodic boundary conditions in the two planar directions.

2.4. Other substrates

The previous approach to handling the crystallographic structure of crystalline silicon can easily be modified to include other crystalline materials. Since any crystal can be represented using an orthorhombic unit cell, one only needs to replace the conventional unit cell of silicon by a convenient orthorhombic cell for the new material. Because the atom coordination is not necessarily tetrahedral as in silicon, the list of neighbors will have a number of entries differing from four for each atom in the basis. The presence of atoms from different elements in the basis introduces some technical differences with respect to the previous methodology, but the same basic ideas can be applied.

3. Data storage

3.1. Complex multivalued surfaces

Anisotropic and dry etching of the exposed areas enforced by a masking pattern frequently leads to the removal of large regions of the material and often gives rise to complex multivalued surfaces, as schematically shown in figure 3(a). Together, the resulting intricate bulk and etchant (or empty) regions appear as inactive domains for the atomistic simulations, whose primary focus is the processing of the atoms located at the interface. On the other hand, as explained in section 2, the atomistic simulators favor a representation of the {h k l}-oriented crystal as an orthorhombic sample, with right angles and three different lengths. Thus, a simple three-dimensional computational array is the natural choice to store the state of the system. However, in this approach the bulk and etchant regions are kept in memory (as the zeros in figure 3(a)), leading to very inefficient memory use and resulting in unnecessary limits to the maximum possible simulation size. As a more flexible and efficient alternative, we consider the octree data structure, which is simultaneously compatible with the use of orthorhombic cells and a rather tight representation of the active regions only.
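The neighbor-list lookup of section 2.2 and the periodic wrap of section 2.3 can be sketched as follows. This is a minimal illustration using 0-based indices (so the wrap is a modulo operation rather than the 1-based bookkeeping of the text) and a hypothetical two-atom basis; a real silicon basis would carry four entries, one per tetrahedral bond, for each of its basis atoms.

```python
# Table of *relative* nm-coordinates: for each basis atom m, a list of
# (dn1, dn2, dn3, m_neighbour) offsets. One table serves every atom in
# the crystal, exactly as described in section 2.2.
NEIGHBOUR_TABLE = {
    0: [(0, 0, 0, 1), (-1, 0, 0, 1), (0, -1, 0, 1), (0, 0, -1, 1)],
    1: [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)],
}

def neighbours(n1, n2, n3, m, N1, N2):
    """nm-coordinates of the first neighbours of atom (n1, n2, n3, m) in a
    slab of N1 x N2 orthorhombic cells, periodic in X and Y, finite in Z."""
    out = []
    for dn1, dn2, dn3, mn in NEIGHBOUR_TABLE[m]:
        out.append(((n1 + dn1) % N1,    # wrap in X
                    (n2 + dn2) % N2,    # wrap in Y
                    n3 + dn3,           # no wrap in Z (slab)
                    mn))
    return out
```

With this arrangement, "nothing else is needed" indeed: the periodicity lives entirely inside the neighbor subroutine, and the rest of the simulator never sees the boundary.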

Figure 3. (a) Schematic representation of a 2D matrix for storage of a simulated system at two different times (active band [29] and dynamic array [43] methods); the bulk and etchant (empty) areas are not needed and memory is used inefficiently. Schematic structure of an octree using (b) the logical and (c) the three-dimensional representation. (d) Example of the use of an octree for a silicon substrate with orientation {3 1 0}. (e) Comparison of memory use by an octree and a dynamic 3D array [44].

3.2. The octal tree

An octal tree or 'octree' is a tree in which each branch bifurcates into eight branches or, otherwise, into no branches at all. An example is shown in figures 3(b)–(c). Both the logical tree and the corresponding three-dimensional structure are displayed using colors that match each set of children. The lowest level (or root node) corresponds to the outermost box, which is divided into eight octants. In this example, seven octants are empty and one has an inner structure. Correspondingly, the logical tree stops at seven children and branches out only at the remaining child. The non-empty octant is itself divided into eight octants, of which seven are empty and one is further subdivided. In this manner, higher branches in the logical tree are added corresponding to the subdivisions of the 3D region until the grid size of the represented data is reached. In this particular example, where only eight data points are in use (i.e. having state 1 while the rest have state 0), the octree uses 41 storage elements while a 3D array for the finest grid would require 32 × 32 × 32 = 32 768 elements (8 with state 1 and 32 760 with state 0). Although this extreme case of memory reduction will rarely happen, the example illustrates the key idea: the octree bifurcates only in the regions where data are held, and this significantly reduces the memory requirements.

The use of the octree data structure in connection with an orthorhombic unit cell is shown in figure 3(d). Instead of holding a single grid point at the leaves, as in the previous example, the whole orthorhombic unit cell is stored. The branches ending at lower levels represent empty regions, either in the bulk of the silicon or in the etchant. In this manner, during a simulation, the octree does not keep in memory the already etched regions nor the still unused bulk regions, which can represent a large fraction of the overall system volume when using masks (figure 3(a)). As an example, figure 3(e) shows the reduction in memory occupation obtained with an octree as compared to a dynamic 3D array which allocates and deallocates memory according to the size of the active band [29], which is a very frequent approach [24, 25, 43] and is illustrated by figure 3(a). Figure 3(e) shows that the octree uses about 75% less memory and that the memory used by the octree is equivalent to that used for simulating a flat surface (compare the black and red curves). Although these results have been obtained for kinetic Monte Carlo (KMC) simulations [44] (see section 5), similar reductions are obtained when using cellular automata (CA) [26] (also introduced in section 5). Similar behavior is observed for other three-dimensional shapes such as micropyramids, cantilevers, AFM probes, etc. More details on the implementation of the octree data structure can be found in [26].

3.3. Benefits of the octree

In addition to allowing reduced memory use, the octree provides a number of useful features, including (i) fast addressability, (ii) flexible visualization and (iii) ease of use in both KMC and CA simulations. Fast addressing of the stored data is possible because the octree is a spatially organized sparse data structure which enables rapid calculation of the exact location of the data and, if needed, supports fast searching [26, 44]. This is particularly advantageous when compared to an orthogonal list, often used to represent sparse systems (e.g. sparse matrices). As a generalization of a linked list into two or more dimensions, an orthogonal list can hold the useful data very tightly (with minimal waste of memory, by holding precisely the useful data without any of the coarser structures characteristic of the lower levels of an octree), but searching for a node is very slow and direct calculation of a storage location is not possible. As shown in section 5, searching a tree is much faster than searching a one-dimensional array (or a multidimensional generalization such as an orthogonal list), differing even by orders of magnitude in terms of computational time.
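The storage idea can be sketched with a minimal octree. This is an illustration of the principle, not the implementation of [26, 44]: cells are addressed by descending one octant per level, and nodes are allocated only along paths that actually hold data.

```python
class OctreeNode:
    __slots__ = ("children", "value")
    def __init__(self):
        self.children = None   # None = no subdivision yet; else a list of 8 slots
        self.value = None

class Octree:
    """Sparse 2**depth-per-side grid: branch only where data exist."""
    def __init__(self, depth):
        self.depth = depth
        self.root = OctreeNode()

    def _octant(self, x, y, z, level):
        # Octant index 0..7 from one bit of each integer coordinate.
        bit = 1 << level
        return bool(x & bit) | (bool(y & bit) << 1) | (bool(z & bit) << 2)

    def set(self, x, y, z, value):
        node = self.root
        for level in range(self.depth - 1, -1, -1):
            if node.children is None:
                node.children = [None] * 8
            i = self._octant(x, y, z, level)
            if node.children[i] is None:
                node.children[i] = OctreeNode()   # allocate only on demand
            node = node.children[i]
        node.value = value

    def get(self, x, y, z):
        node = self.root
        for level in range(self.depth - 1, -1, -1):
            if node.children is None:
                return None                       # empty region: nothing stored
            node = node.children[self._octant(x, y, z, level)]
            if node is None:
                return None
        return node.value

    def node_count(self):
        def count(n):
            return 1 + sum(count(c) for c in (n.children or []) if c is not None)
        return count(self.root)
```

Addressing a cell costs one octant computation per level, i.e. O(depth), which is the fast addressability referred to above; an empty region is detected as soon as a missing child is met, without ever visiting the finest grid.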


Flexible visualization is based on truncating the tree at a desired level. At the lower levels, the detailed structure of the surface atoms is not resolved; this allows the display of larger systems with less detail (zoom-out). If needed, the full atomistic structure can be displayed for the whole system or for a region of interest (zoom-in). Finally, the octree can be successfully used as the underlying data structure for both KMC and CA simulations, as shown at the end of section 5. For the case of KMC, the octree data structure and the octree search algorithm discussed in section 5 can be realized using the same tree [44], demonstrating that the method is a suitable solution to simultaneously store a complex spatial topology and maintain computational efficiency. For the case of CA, the octree is suitable for parallel computations, which are an important feature of this method. The use of the octree does not affect in any way the implementation of the periodic boundary conditions for either method, and the guidelines given in section 2 remain valid. In conclusion, the octree provides a good balance between memory efficiency and calculation speed, and it is suitable for the description of a multivalued dynamic surface propagating in three-dimensional space.

4. Physical models for the process rates

The fundamental assumption in the atomistic methods based on the KMC and CA approaches is that each atomistic process (such as, e.g., the atom removals taking place during wet etching) occurs at a different rate which depends on the particular arrangement of the neighborhood. In other words, one assumes that the reaction rates of the simulated processes depend on the state of the neighborhood. This is based on the underlying idea that the neighborhood contains the local coordination and bonding configuration, which suffer a transformation from an initial to a final state during one such process. In many cases, however, the final state can be considered to be the same for all different initial states, effectively yielding process rates that depend only on the initial state. In other cases, the initial states can be assumed equal and the difference in behavior can be correlated to the final states. In etching, as an example, the removed state of an atom as a reaction product in the etchant (final state) can be considered the same for a wide set of initial states (e.g. different surface sites) and the removal rates of the atoms are a function of the surface site type only (initial state). For step-flow growth, in contrast, it is the final state that becomes the key element describing the differences in growth rates.

4.1. Removal probability function

In practice, this dependence of the process rates on the local environment can be considered as an explicit function of the number of neighbors, e.g. the number of nearest and next-nearest neighbors, n1 and n2. One assumes the existence of a function r(n1, n2) describing the removal rate (or, when normalized, the removal probability) of an atom in the different configurations. For instance, it has been shown that anisotropic etching in KOH at medium concentration (e.g. 30–40 wt%) can be described by a removal probability function (RPF) [29]:

r(n_1, n_2) = \frac{1 + e^{-\beta \epsilon_1 n_1^0}}{1 + e^{\beta \epsilon_1 (n_1 - n_1^0)}} \cdot \frac{1 + e^{-\beta \epsilon_2 n_2^0}}{1 + e^{\beta \epsilon_2 (n_2 - n_2^0)}}    (1)

where \beta = 1/k_B T, k_B is the Boltzmann constant and T is the temperature. This particular analytical form is based on geometrical arguments and visual fitting of simulated and experimental etched profiles for a set of mask designs. The use of the RPF reduces the number of adjustable variables from 48 removal rates to 4 physically meaningful parameters (\epsilon_1, \epsilon_2, n_1^0, n_2^0). As an additional example of the viability of this parameterization, section 5 of this paper shows that this same function can be used to describe the propagation of the etch front for the case of an etched spherical silicon sample.

4.2. Removal probability tables

In most cases, however, the dependence on the neighborhood is not treated as a parameterized function but rather as an adjustable matrix (or table) in which the process rates p_{n1,n2} can be manually fitted by matching simulations and experiments. This can be based on the comparison of (i) the simulated and experimental surface morphologies, (ii) the etch rates of different surface orientations and (iii) the shape of the etching profile in the engineering applications. As an example, this approach has been used in [27, 28, 45] by comparing detailed STM images and KMC simulations of the surface morphologies of stepped {1 1 1} surfaces, and in [7] by simultaneously matching experimental and simulated morphologies for a wide range of surface orientations. Similarly, direct comparison of the etched profile around a convex corner using CA simulations has also been used [46–48]. In this case, however, the neighborhood is classified according to the number of first neighbors located at the surface and in the interior (or bulk), n1s and n1b, respectively, and the second neighbors are disregarded.

One should notice that, typically, the values of the process rates obtained for the KMC and CA methods by this type of visual fit or, in fact, any other fitting strategy (see the following paragraphs) can differ significantly. This is due to an inherent difference between the two methods, which makes KMC behave more atomistically and CA more macroscopically, as explained in section 5. A more elaborate but also more flexible example of a removal probability table (RPT) originates from a generalized classification of the neighborhood based on distinguishing the first and second neighbors according to their location with respect to the propagating surface (i.e. are they at the surface or in the bulk?). This results in a four-index (n1s, n1b, n2s, n2b) scheme [25] which unambiguously separates many surface sites that are otherwise similar according to the (n1, n2) classification, as explained in detail in [19]. In this respect, using (n1, n2) leads to a degenerate classification, while the introduction of the surface and bulk locations, (n1s, n1b, n2s, n2b), provides a nondegenerate categorization that distinguishes the full spectrum of different surface sites. This has been successfully used in both CA [19, 25, 26] and KMC [49] simulations of the etching process. The number of entries in the RPT matrix is greatly expanded with this classification.
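Equation (1) above can be transcribed into code directly. The sketch below uses k_B in eV/K and purely illustrative parameter values, not the fitted ones of [29]:

```python
import math

K_B = 8.617333e-5  # Boltzmann constant in eV/K

def removal_probability(n1, n2, T, eps1, eps2, n1_0, n2_0):
    """Removal probability function of equation (1): close to unity for
    weakly coordinated atoms, dropping towards zero once the neighbor
    counts n1, n2 exceed the thresholds n1_0, n2_0."""
    beta = 1.0 / (K_B * T)
    term1 = (1 + math.exp(-beta * eps1 * n1_0)) / (1 + math.exp(beta * eps1 * (n1 - n1_0)))
    term2 = (1 + math.exp(-beta * eps2 * n2_0)) / (1 + math.exp(beta * eps2 * (n2 - n2_0)))
    return term1 * term2

# Illustrative parameters only (not fits): eps1 = 0.1 eV, eps2 = 0.05 eV,
# n1_0 = 2.5, n2_0 = 6.0, T = 350 K.
```

By construction r(0, 0) = 1 and r decreases monotonically as the coordination grows, reproducing the qualitative trend that well-coordinated (bulk-like) atoms are removed far more slowly than under-coordinated surface sites.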


A completely manual calibration of the table entries is therefore a very challenging task and alternative approaches must be used. For the CA case, it is possible to derive analytical expressions for the etch rates R_{hkl} of a set of surface orientations [19]:

\begin{aligned}
R_{100} &= \frac{a}{4}\, r_D \\
R_{110} &= \frac{a}{2\sqrt{2}}\, r_M \\
R_{111} &= \frac{a}{\sqrt{3}}\, \frac{r_T\, r_{L(T)}}{r_T + r_{L(T)}} \\
R_{210} &= \frac{a}{4\sqrt{5}}\, (r_M + r_{EMK} + r_K) \\
R_{310} &= \frac{a}{2\sqrt{10}}\, (r_{KR(1)-EMK} + r_K) \\
R_{331} &= \frac{a}{\sqrt{19}}\, \frac{(r_T + r_{ETM})(r_M + r_{L(ETM)}) - r_M^2}{r_T + r_{ETM} + r_{L(ETM)} - r_M} \\
&\;\;\vdots
\end{aligned}    (2)

and then 'invert' the system to solve for the atomistic removal rates {r_D, r_M, r_T, r_EMK, ...} in terms of the experimental etch rates for those surfaces [19, 25, 26]. The form of the previous equations depends on the actual realization of step flow [19]. Since the system cannot be solved fully numerically and some input is needed from the user, we refer to this approach as semi-automatic calibration of the atomistic removal rates.

4.3. Use of ab initio calculations

For KMC, the corresponding derivation of a system such as equation (2), linking the experimental etch rates to the atomistic removal rates, is an open, unsolved problem. The difficulty lies in the fact that the expression for the etch rate of a generic surface depends strongly on the values of the atomistic removal rates. Thus, only an iterative fitting procedure will in principle succeed, since the equations themselves change as the atomistic rates are modified. However, this iterative solution is computationally very expensive. Because of this, calibration of the RPT values for the KMC simulations is typically based on morphology comparisons between experiment and simulation [7, 27, 28, 45]. In addition, calibration can be assisted by the use of first-principles (ab initio) calculations [50]. As an example, density functional theory (DFT) calculations have been used to determine the removal rates of different atoms based on back-bond weakening and steric interactions [48]. Due to the steric hindrance for OH-terminations on the surface, a surface site classification scheme is used that considers the number of direct and indirect second neighbors, n2d and n2i, in addition to the total number of first neighbors, i.e. r = r(n1, n2d, n2i), distinguishing surface sites with different numbers of bonds. The use of such a physical model satisfactorily explains the anisotropy of the macroscopic etch rate [30, 48].

4.4. Mesoscopic phenomena

In addition to considering only the local neighborhood, more complex physical models for the reaction rates can be obtained by also taking into account mesoscopic aspects of the process. As an example in wet etching, the removal rate of the atoms can be a function of local gradients in the etchant concentration and/or temperature, corresponding to a delay between the processes occurring at the surface and the diffusion transport of the reactants and/or products to those locations. In practice, this can be incorporated into the simulations by multiplying the neighborhood-dependent removal rates by a factor that models the additional phenomena. For instance, during anisotropic etching of silicon, step bunching on vicinal {1 1 1} surfaces and zigzag formation on {1 1 0} can be described by using the factor 1 + aρ(r), where ρ(r) is the density of steps on the surface at position r and a is a parameter. The steps are the most active regions for etching and, correspondingly, the strongest gradients in the concentration and/or temperature occur around them. Using the concept of 'activity monitoring' [6], an approximation to ρ can be efficiently maintained in the simulations and the formation of inhomogeneities (in the etchant concentration and/or temperature) can be reproduced, which leads to the appearance of step bunches on {1 1 1} and zigzags on {1 1 0}. Another example of a complex physical model is the modification of the atomistic removal rates due to the adsorption of metal impurities such as Cu on the silicon surface during etching [49]. In this case, the removal rate is modified from the neighborhood-dependent value by multiplying it with a factor 1 + aρ(r), where ρ now represents the density of copper clusters. As above, activity monitoring can also be used to obtain a good approximation for ρ and, through it, realistic simulations of the micromasking effects of the impurities can be carried out, showing the formation of trapezoidal hillocks on {1 1 0}. Based on the success of activity monitoring for the simulation of aggregation-dependent phenomena (e.g. variations in the etchant concentration, impurity concentration, etc), we encourage its use for simulations of similar phenomena in other systems.

An important aspect of the previous physical models is the fact that the multiplying factor or the ab initio model parameters effectively lead to a truly large set of different process rates (typically as large as the number of surface atoms themselves), severely reducing the number of sensible choices for an efficient kinetic Monte Carlo method. As explained in section 5, in practice one needs to use the K-level search method (including any of the tree search methods) for these KMC simulations.
The use of ab-initio in this case leads to a reduction of the number of adjustable variables by introducing a set of physically meaningful parameters, including (i) the backbond energy and its weakening due to OH-termination, (ii) the interaction energies between the H and OH terminating species and (iii) the transition state energies (or ‘threshold’ energies), considered different for the surface atoms having 7
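As a minimal illustration of the multiplying-factor approach, the Python sketch below (our own pseudocode, not taken from the original simulator; the parameter value and the density estimate are hypothetical) scales a neighborhood-dependent removal rate by the mesoscopic factor 1 + aρ(r):

```python
def modified_rate(base_rate, rho, a=0.5):
    # Mesoscopic correction: scale the neighborhood-dependent removal
    # rate by 1 + a*rho(r), where rho(r) approximates the local density
    # of steps (or impurity clusters) at position r.
    return base_rate * (1.0 + a * rho)

def local_density(active_neighbors):
    # Crude 'activity monitoring' estimate of rho: the fraction of
    # neighboring sites that were recently active (etched).
    return sum(active_neighbors) / len(active_neighbors)

rho = local_density([1, 0, 1, 1])      # 3 of 4 neighbors recently active
rate = modified_rate(0.2, rho, a=0.5)  # 0.2 * (1 + 0.5*0.75) = 0.275
```

In a full simulation, the activity flags would be updated after every removal, so that ρ tracks the evolving step bunches or impurity clusters.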

J. Micromech. Microeng. 18 (2008) 055029

M A Gosálvez et al

In both methods, the atoms of the surface are visited one by one and their neighborhoods are inspected in order to determine the removal rate (or probability), deciding whether the atom is removed or remains attached. The essential difference between KMC and CA is that KMC removes each atom as soon as its removal has been decided, while CA first determines which atoms need to be removed and then removes them all simultaneously. As a result, the CA does two loops to complete a time step while KMC does only one, typically making the latter the faster method. These loops are actually the origin of the difference in behavior: the CA provides a mean-field-like evolution with a large degree of parallelism while KMC generates a purely atomistic propagation that is completely sequential. In terms of sequentiality, the two methods may be considered as extreme examples. One can conceive a hybrid method where part of the time evolution is realized using the KMC and CA methods in alternation.

The strongly sequential nature of KMC has so far prevented the successful implementation of this method on parallel computer architectures (parallelization) and only approximate methods have been introduced [51, 52]. In contrast, parallelization of a CA program is straightforward as it can be realized through simple domain decomposition [53].

In order to discuss and compare the most relevant KMC and CA methods in more detail, the following notation will be used. The microstates visited during the time evolution of a system are considered to contain a variable (i.e. non-constant) number N of processes (e.g. atom removals). Since many of these processes are typically identical (i.e. they represent the same physical phenomenon, such as the removal of the atoms at identical kink sites), they can be grouped into M (≤ N) process types. [...] (e.g. N > 10^6) and only a few tens of binary searches are necessary to locate the next process. With respect to LS, this can speed up the search time by orders of magnitude.
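The logarithmic speed-up can be sketched with a partial-sum binary tree over the process rates (an illustrative Python implementation of the generic tree-search idea, not the authors' code; the class and method names are ours):

```python
import random

class RateTree:
    """Binary sum tree over process rates: choosing the next event and
    updating one rate both cost O(log2 N), instead of the O(N) scan of
    a linear search."""
    def __init__(self, rates):
        n = 1
        while n < len(rates):          # pad leaf count to a power of two
            n *= 2
        self.n = n
        self.tree = [0.0] * (2 * n)    # tree[1] is the root (total rate)
        for i, r in enumerate(rates):
            self.tree[n + i] = r
        for i in range(n - 1, 0, -1):  # internal nodes hold partial sums
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        return self.tree[1]

    def update(self, i, rate):
        # Change rate of process i and repair the sums up to the root.
        i += self.n
        self.tree[i] = rate
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def choose(self, r):
        # Descend to the leaf where the cumulative sum crosses r*total.
        x = r * self.total()
        i = 1
        while i < self.n:
            i *= 2
            if x >= self.tree[i]:      # skip the left subtree
                x -= self.tree[i]
                i += 1
        return i - self.n

tree = RateTree([0.1, 0.4, 0.2, 0.3])
idx = tree.choose(random.random())     # index of the next process
```

With one million processes, `choose` performs about 20 comparisons, versus up to a million for a linear scan.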

M-fold (BKL). An alternative, popular VTS method was introduced by Bortz, Kalos and Lebowitz (BKL) [58]. Here, we refer to it as the M-fold method since we group the processes into M lists, one for each process type. Usually, the number of process types in the BKL method is denoted N (not M) and BKL is more commonly referred to as the 'N-fold method' [58]. In this respect, all processes having exactly the same rate define a process type. This includes the typical case in which the grouped processes represent the same physical phenomenon. The search for the next event is realized in two steps: first, a process type is chosen by means of a linear search in the M-fold space. This search is similar to that of the previous LS method but with a smaller set of rates, and it leads to the fastest process types being chosen more often. In a second step, a particular event is randomly picked from the chosen process-type list.

The M-fold method is useful if the number of process types M is small. In this case, finding one process type from the small set of M probabilities is typically much faster than finding one particular process from the list of all N events (as done with LS). In the case that the number of groups M equals the number of processes N (i.e. one process per list), the BKL method becomes the LS method and provides no advantage. Since the search for the process type depends only on the number of process types M, the computational cost of BKL is O(M), which is independent of the system size N. Thus, if the number of distinct rates M is small, BKL offers the best possible performance for any system size, outperforming all of the other existing VTS methods. This explains the popularity of the method, as the number of distinct rates M is usually small for most applications. For the case of anisotropic etching, however, this is typically not the case and the KLS family of methods becomes more advantageous.

In the BKL method, the next process is found by first searching for the most probable group and then randomly picking one process from the group. Since the number of groups (i.e. process types) is typically small (e.g. of the order of 10), a linear search can be used to determine the next group. If the number of process types increases, a binary search for the group will be a better (faster) choice. In the case that the number of groups is very large (e.g. when there is only one process for each process type), BKL becomes the LS or BTS method, depending on which search method is used to find the group. As compared to the LS method, the BKL method involves a major programming effort to write the subroutines for updating the lists. This is necessary because every successful event will typically affect the neighboring processes, changing not only their rates but also their process types. This means that the processes have to be moved from one list to another. Book-keeping is an essential feature of the M-fold algorithm and it consumes most of the computational time used by this method. As a matter of fact, book-keeping renders the method unnecessarily expensive for systems with a high total rate, where typically a simple Metropolis implementation (i.e. CTS) will perform more efficiently. However, when the rates of the processes are low, the BKL method typically provides faster simulations (by many orders of magnitude) as compared to the Metropolis method. A potential limitation of BKL for some applications is the necessity to fix the number of process types (lists) before a simulation can be started. There have been recent efforts to overcome this limitation by using a self-learning KMC (SLKMC) algorithm based on BKL [3], in which the number of lists is dynamically modified at run time as demanded by the evolution of the system.

Binning. The binning method [57] is a special case of KLS. Like BKL, the binning method corresponds to a two-level search. However, in this case, each bin contains processes with different rates, as in KLS. The grouping (or binning) can be the result of a spatial or random decomposition of the system. As a matter of fact, any grouping is valid; however, keeping the bin sizes similar (as in the KLS trees) will improve the performance. As in BKL, choosing a bin is stochastically determined by means of a linear or a binary search. However, in contrast to BKL, where the chosen process is determined by randomly picking one process in the chosen bin, in the binning method the process must be chosen using a linear search (or equivalent) within the bin. In order to balance the computational effort between the two searches, the system should be decomposed into N^(1/2) groups, each of them containing N^(1/2) processes. In this way, it will take on average the same time to find a group as to find a process within a group. If linear searches are conducted at each search level, the computational cost of the binning method is O(2N^(1/2)). If binary searches are done at the two levels, the computational cost is O(2 log2 N^(1/2)) = O(log2 N).

K-level search (KLS). The binary tree and the corresponding tree search (TS) shown in figure 4(g) are only an example of a more general VTS method known as K-level search (KLS) [56], where the tree has K levels and the bin size, denoted g, can be different at each level. By increasing the bin size, the number of levels can be reduced. For instance, a system with one million different rates can be represented with K = 9 levels and a bin size of g = 5 (5^9 = 1 953 125 > 10^6). Thus, finding a process will typically be done in nine searches, each within a bin containing five rates. Depending on the implementation details, this can be faster than the binary tree search involving about 20 searches, each within a bin holding two rates. KLS is a very general method containing many popular methods as particular realizations. These are generated by fixing the number of levels K and/or the bin size g. Examples are Maksym's binning method [57] (or 'binning', with K = 2 and g = ⌈√N⌉), linear search (or LS, with K = 1 and g = N), binary search (or BS, with K = ⌈log2 N⌉ and g = 2), quadtree search (or QS; K = ⌈log4 N⌉, g = 4) and octree search (or OS; K = ⌈log8 N⌉, g = 8), where ⌈x⌉ denotes the smallest integer not smaller than x. Linear search and binary search are the most extreme choices, with the largest possible bin size for LS and the smallest for BS. In addition, it is possible in principle to find an optimal KLS tree by minimizing K and g simultaneously so that a minimum number of linear searches will be done in a minimum bin size. This is done by choosing K as the lowest integer that satisfies K^K > N and choosing g = K. In practice, the methods where g is even (such as BS, QS and OS) are typically fastest since their implementation can take advantage of bit-shifting operations.

Table 1. KMC methods. N is the number of surface atoms. M is the number of different surface sites.

  Type  Acronym     Method                           Computational cost                                 Book keeping
  CTS   Metropolis  Metropolis acceptance/rejection  O(N)                                               None
  VTS   KLS         K-level search                   O(K N^(1/K)) = O(g log_g N)                        Large
  VTS   LS          Linear search                    O(N)                                               Low
  VTS   BS          Binary search                    O(2 log2 N)                                        Large
  VTS   QS          Quadtree search                  O(4 log4 N)                                        Large
  VTS   OS          Octree search                    O(8 log8 N)                                        Large
  VTS   Binning     Maksym's binning                 O(2 N^(1/2)) (linear); O(2 log2 N^(1/2)) (binary)  Medium
  VTS   BKL         M-fold grouping                  O(M), independent of N                             Very large

Summary. Table 1 provides a summary of all the constant and variable time stepping methods for the realization of KMC simulations. In addition, a description of the corresponding algorithms for a generic simulation is given in the appendix. The algorithms can be used for etching and growth, amongst other processes. For growth, unlike etching, diffusion events along the terraces and steps must typically be included. Apart from the difference in the time increments, the VTS and CTS methods differ mostly in the amount of book-keeping effort. Although the implementation of CTS is very simple, as no book keeping is required, it is only advantageous for systems dominated by fast processes. On the other hand, the VTS methods always make use of some form of book keeping (except LS) and are thus more complicated to write and test. For systems where all the rates are high, the VTS methods are not worthwhile compared to Metropolis (CTS). However, for stiff systems, where the lowest and highest rates can differ by orders of magnitude, the VTS methods are unavoidable. BKL is useful only for systems with a few (i.e. fewer than ten) distinct rates, while KLS is valid for systems with any number of different (also time-dependent) rates. Using bit shifting as an acceleration procedure, BS, QS and OS are the fastest forms of KLS. The amount of book-keeping effort is largest for BKL, not only at run time but also for the programmer writing the code. In comparison, KLS is simpler to program, both in its generic form and tuned as any of BS, QS, OS or others. In practice, a KLS program also spends less time on book-keeping tasks than BKL. As a rule of thumb, if the number of different rates (i.e. M) is larger than 10, KLS should be chosen over BKL.

5.3. CA methods

The cellular automata (CA) methods provide a deterministic, mean-field approach to the evolution of an interface. The basic feature is that, in a first pass, the algorithm determines which processes need to be performed and, in a second pass, those processes are realized, effectively producing them simultaneously (i.e. in parallel). Here, we focus on the presentation of the basic cellular automaton (BCA) and the continuous cellular automaton (CCA), which differ by defining discrete or continuous occupation states for the substrate atoms, respectively.

5.3.1. Basic cellular automaton. The discrete CA method, which we refer to as the basic cellular automaton (BCA), defines the substrate as a collection of atoms which are either present (state 1) or absent (state 0). From a more general

Figure 5. Comparison of etch rate accuracy for (a) the BCA and (b) the CCA methods.

perspective, the atoms may be defined as cells, which can be full (state 1) or empty (state 0). Since there are no other intermediate states between 0 and 1, the system is discrete in nature. The atoms have associated removal rates which depend on the state of the neighborhood, and the propagation of the surface is the result of the removal of groups of identical surface atoms whose type alternates sequentially. In this manner, for instance, the advancement of a Si{1 1 1} vicinal crystallographic plane due to wet etching (or epitaxial growth) is realized as the removal (or addition) of rows of atoms, mimicking an average form of step flow.

For anisotropic etching, the rules controlling the removal of an atom in a BCA simulation were introduced by Tan and Büttgenbach [46, 47]. An atom is removed if it has two neighbors, which corresponds to the typical configuration of the ideal {1 0 0} surfaces, or if it has three neighbors with at least one of them in the etch front, which corresponds to the neighborhood configuration for {1 1 0}. The atom is not removed if it has three bulk neighbors, corresponding to {1 1 1}. Finally, an atom with any other neighborhood arrangement is also removed.

The standard implementation of the BCA method uses constant time stepping, which consists of choosing the size of the time increment (e.g. 1) and performing the two CA loops. In the first loop, the removal of each surface atom is decided according to the previous removal rules and the atoms that will be removed are marked with a special flag. In the second loop, the marked atoms are removed, updating the state of their neighborhoods. The propagation of the surface is obtained as these complete surface updates are consecutively generated. The combination of the previous removal rules and CA time evolution with the experimentally measured etch rate of {1 0 0} for calibration purposes leads to rather good qualitative simulations of etching for engineering applications.

The explanation of this qualitative agreement is purely geometrical in origin, as we explain here. For many anisotropic etchants the etch rate of silicon presents a global minimum at {1 1 1}, a global maximum at {1 1 0} and a local minimum and maximum at {1 0 0} and {h 1 1}, respectively, with h = 3 or 4, as shown in figure 5 for 50% w/v KOH at 70 °C as an example [59]. On the other hand, it is a simple exercise to derive the etch rates of the different surface orientations according to the BCA removal rules given above. One gets, for instance: R111 = 0, R110 = a/(2√2), R100 = a/4, R311 = a/√11, R211 = a/(2√6) and so on. Here, the unit of time is taken as 1 and a is the lattice parameter of the conventional unit cell. The quoted expressions simply differ due to the geometrical differences between the surfaces.

Incidentally, figure 5(a) shows that the BCA method qualitatively reproduces the existence of the major features, including the global minimum at {1 1 1}, the local minimum at {1 0 0}, the local maximum at {3 1 1} and the global maximum at {1 1 0}. In this figure, the simulations have been calibrated according to the etch rate of {1 0 0} (31 µm min⁻¹ at 54.74°). In agreement with the theoretical values, the simulations show that R311 = (4/√11) R100 (approximately 37.4 µm min⁻¹ at 29.50°), R110 = √2 R100 (approximately 43.8 µm min⁻¹ at 144.74°) and R211 = (2/√6) R100 (approximately 25.3 µm min⁻¹ at 19.47°). Since the variation of the etch rate as a function of orientation for the BCA method is only due to the change in geometry, the qualitative agreement might be described as a happy coincidence, valid for most etchants but not for all. In particular, it is impossible for the BCA method to describe the orientation dependence of the etch rate in TMAH at low concentration, where the etch rate for {1 1 0} is smaller than that for {1 0 0}. The BCA method is limited to the fixed value of R110/R100 = √2. Similarly, the method cannot be used to fine-tune the relative etch rates of e.g. {1 0 0} and {3 1 1} or any other pair. In fact, the rates of any one pair of orientations are fixed by the underlying geometry and the given set of removal rules, and it is only possible to shift the whole CA curve of figure 5(a) up or down, which is what calibration with respect to {1 0 0} does.

5.3.2. Continuous cellular automaton.
The continuous cellular automaton (CCA) method defines the substrate as a collection of sites (or cells) which can be fully or partially occupied by the atoms (or occupants). The atoms have associated removal rates which depend on the site type (or cell type) that they occupy, and the propagation of the surface


is described as the overall result of gradually decreasing the occupation of the surface sites (or cells) according to the removal rates of the atoms (or occupants). The 'occupation' of the sites corresponds to the 'mass' of the atoms in the original presentation of the CCA method by Zhu and Liu [43]. For an explanation of the benefits of the use of the 'occupation', see [19]. In short, the occupation of any surface site is constantly reduced by an amount equal to the current removal rate of the occupying atom. This has to take into account the changes in the neighborhood, as they typically affect the value of the rate (section 4). Since this is a CA method, two loops are used, which ensures that the removal of identical sites occurs simultaneously, thus providing an average propagation mode of the interface, typically consisting of the gradual removal of entire rows of atoms. In most cases, the removed rows alternate according to a predictable sequence. For a very detailed description of the method, see [19]. In order to implement the CCA method, one can use both variable and constant time stepping. For the CCA simulations, there exists only one VTS implementation, which produces exact results. In contrast, there exist several CTS methods, but they only provide approximate time evolutions.
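One exact (VTS) CCA step can be sketched as follows (a toy Python illustration under an assumed data layout, a dict mapping each site to [occupation, rate]; the neighbor-rate updates after a removal are omitted):

```python
def vts_cca_step(cells):
    # One exact (VTS) CCA step: advance time by the minimum
    # occupation/rate over all sites, reduce every occupation
    # accordingly, and remove the exhausted sites.
    dt = min(occ / rate for occ, rate in cells.values() if rate > 0)
    for site, (occ, rate) in list(cells.items()):   # loop 1: decrement
        cells[site][0] = occ - rate * dt
    removed = []
    for site, (occ, rate) in list(cells.items()):   # loop 2: remove
        if occ <= 1e-12:
            removed.append(site)
            del cells[site]
    return dt, removed

cells = {'A': [1.0, 0.5], 'B': [1.0, 0.25]}
dt, gone = vts_cca_step(cells)   # dt = 2.0; only site 'A' is removed
```

A CTS step would instead subtract rate·Δt with a fixed Δt, at the cost of the approximations discussed in the text.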

VTS-CCA: variable time stepping CCA (Exact CCA). In the variable time stepping (VTS) method for CCA simulations, time is always advanced by an exact amount calculated according to the values of the occupation and rate for all possible processes at the current state. During the first loop over the whole surface, the minimum time to remove one surface atom (or a group of atoms having the same time increment) is determined and, in the second loop, only those atoms with this time increment are removed. This is schematically shown in figure 6(a). Note that this exact approach is not possible for the BCA method.

As for the BCA method, calibration is needed in order to match the simulations with the experiments. For BCA, calibration reduces to fitting {1 0 0}. For the CCA method, however, calibration is a more complicated task than just shifting all the rates up or down to match that of {1 0 0}; but it is also more flexible. In the same way as the etch rates of the different surface orientations can be written down exactly for the BCA method (see section 5.3.1), the same can also be done for CCA. In this case, however, the expressions are more complicated as they incorporate the removal rates of the involved surface atoms [19]. This results simply from the fact that the (normalized) removal rates for these atoms are not only 1 or 0, as in BCA, but any value between those two extremes. As a result, one obtains a system of non-linear equations linking the experimental etch rates for the crystallographic planes with the removal rates for the atoms [19]. By minimizing the global error [26] or, otherwise, numerically solving the system [19], one can obtain calibrated values for the removal rates. As an example, the triangular (black) markers in figure 5(b) show a small set of seven orientations that have been chosen for calibration of the atomistic removal rates. After that, wet etching on any surface orientation can be simulated and the corresponding etch rates plotted for comparison with the experimental points. In this case, etching has been simulated using the exact CCA method for all shown surface orientations. As compared with the curve for the BCA method in figure 5(a), the CCA reproduces not only qualitatively but also quantitatively the major features of the etch rate anisotropy, thus becoming a more accurate method than BCA for the simulation of three-dimensional etch profiles of interest in engineering applications.

Constant time stepping CCA (CTS-CCA or, simply, CCA). Constant time stepping (CTS) is realized by choosing the size of the time increment (e.g. 1) and performing the two CA loops. In the first loop, the occupation of every site is reduced according to the current value of the removal rate of the occupying atom. The atoms at the sites reaching zero (or negative) occupation are marked for removal. In the second loop, the marked atoms are removed and the state and rate of their neighbors are updated. The CTS procedure is schematically shown in figure 6(b).

There are two problems associated with CTS-CCA (see figure 6(b)): (i) the occupation of a site can become negative just before the removal of the corresponding atom, and (ii) the method enables the simultaneous removal of different groups of atoms that should occur sequentially according to the exact evolution. The first problem occurs because the removal rate of a surface atom can take any value (between 1 and 0, when normalized), most times far from being a multiple of the current value of the occupation, which itself is changing (decreasing constantly). Thus, at some point during the evolution, the occupation becomes smaller than the time step, eventually becoming negative right before the atom is removed. The second problem occurs because the time step can be larger than the two smallest time increments to remove the next two scheduled groups of atoms. As a result, both groups are removed simultaneously. This can be wrong since the removal of the first group can sometimes lead to the formation of very active atoms which should be removed next instead of the second group. As a result, constant time stepping can affect the correct sequence for groups of events such as the removal of complete rows of atoms, affecting their order and, thus, the overall etch rate.

Time compensation (TC). The time compensation (TC) method was introduced in order to improve the accuracy of the CTS-CCA simulations [43]. Since the cell occupation eventually becomes smaller than the time step (problem (i) above), only a portion (t*₁,₂ in figure 6(b)) of the last time increment before the removal should be used, in this way preventing a negative occupation value. TC adds the end part of the last time increment (red segments in figure 6(b)) to the next time step, thus eliminating negative occupation and compensating an otherwise erroneous measure of time. Unfortunately, TC works well only when the sites removed in a time step have an identical value of occupation and removal rate. TC does not correct the second prevalent error associated with the use of CTS, namely, the simultaneous removal of different cells (problem (ii) above). Thus, a simulation using TC for the CTS-CCA method does not necessarily follow the correct evolution, resulting in deviating etch rates for some surface orientations according to our experience.

Backward time compensation (BTC). Backward time compensation (BTC) is an alternative way of improving a CTS-CCA simulation by simply stepping back time an amount equal to the end part of the time increment leading to a removal (see figure 6(b)). Although it improves the accuracy of the simulations, it does not correct the second problem (ii). The use of TC or BTC for improving the CTS-CCA simulations leads to slightly different results. According to our experience, based mainly on visual comparison, BTC generates results slightly closer to the exact evolution given by VTS-CCA.

Figure 6. Differences between variable and constant time stepping methods for CCA simulations. Time compensation (TC) is illustrated for the standard and backward implementations.

Summary. Table 2 provides a summary of all the constant and variable time stepping methods considered in this paper for the realization of CCA simulations. The use of a carefully selected time increment for a CTS-CCA simulation together with TC or BTC can provide results similar to an exact VTS simulation, but much faster. The efficiency of a standard/backward time compensated CTS-CCA computation improves by using larger time steps, but this may affect the accuracy. A trade-off can typically be found.

Table 2. CA methods.

  Type  Acronym                   Method
  VTS   Exact CCA (also VTS-CCA)  Exact continuous cellular automaton (variable time stepping CCA)
  CTS   CA (also BCA)             Cellular automaton (basic CA)
  CTS   CCA (also CTS-CCA)        Continuous cellular automaton (constant time stepping CCA)
  CTS   TC-CCA                    (Standard) time-compensated CCA
  CTS   BTC-CCA                   Backward time-compensated CCA

Both the exact (VTS) and the approximate (CTS-CCA with TC or BTC) methods can be used in our simulator [42]. Exact CCA is used for deriving and verifying analytical equations describing the etch rate of different surface orientations [19] and the approximate methods are used for demonstrating and/or developing MEMS engineering applications. Because the VTS-CCA method (exact) helps the system to directly jump from one removal (or group of removals) to the next, this method runs faster than CTS-CCA with TC or BTC when the surface atoms have relatively low rates. In this situation, many loops in the CTS-CCA method are spent decreasing the occupation of the cells without removing the atoms. In addition, exact time stepping is more efficient when evaluating the etch rate of pure surface orientations. However, if the simulation uses mask patterns (such as oxide or nitride layers, as in the engineering applications), constant time stepping with time compensation will typically show faster performance. This is due to the fact that the correct sequential chain of removals at the convex corners usually demands the removal of only a few atoms for each loop in the VTS-CCA method, while more atoms are removed with CTS-CCA. The difference in computational performance becomes more accentuated when more site types are identified with different removal rates. The faster performance of CTS-CCA with TC or BTC typically comes at the expense of a certain (acceptable) inaccuracy in the results.

As compared to the fastest KMC methods such as BKL and KLS, a CCA simulation is typically slower. Although the numbers depend on the simulated system and the size of the neighborhood for the atomistic updates, as well as on the computer architecture, the typical computational cost for our anisotropic etching simulations running on a desktop computer (Pentium III) is about 50 microseconds/event for the fastest CCA simulations. This is almost 10 times slower than our fastest KMC simulations. The reason can be found in the two CA loops, which have a large impact on the efficiency.

Figure 7. Comparison of results from experiment and CCA and KMC simulations for anisotropic etching of a sphere. Experiment (a)–(b): KOH 34 wt% at 70 °C. Simulations (c), (e)–(f): KOH 30 wt% at 70 °C. (d) Visualization of the octree structure for a small silicon sphere.

An example comparing the CCA and KMC methods for the simulation of an etched sphere is shown in figure 7. The CCA results are characterized by an average behavior following exactly the same pattern in equivalent regions, while the KMC is characterized by a strong randomness. Both methods capture the 4-fold, 3-fold and 2-fold symmetry around {1 0 0}, {1 1 1} and {1 1 0}, respectively. Moreover, the two methods provide a rather good quantitative description of the overall etch rate distribution. One should notice that the simulation of the etched profile of a spherically shaped sample is a very challenging problem since many competing orientations must be described accurately. Other simulation results concerning the use of CCA and KMC for engineering applications such as microneedles, microfluidics and other complex processes have been presented in [19, 26, 38].

6. Conclusions

Based on the accumulated experience from the development of a versatile atomistic simulator (http://www.fyslab.hut.fi/~mag/VisualTAPAS), the paper presents the currently available atomistic simulation methods, focusing on the kinetic Monte Carlo (KMC) and cellular automata (CA) approaches, describing their similarities and differences, and stressing their relative strengths and weaknesses for the simulation of an evolving complex interface. An example of such dynamic surfaces typically results from the application of different growth and etching processes, such as in the fabrication of micro/nano electromechanical systems. However, the use of the atomistic methods is not restricted to these cases. In addition, important data storage considerations are highlighted, presenting the structure, properties and benefits of the use of an octree for atomistic simulations. As a specific example for the simulations, we consider anisotropic etching of silicon and discuss some of the key crystallographic features that allow simulating etching or growth on other crystalline substrates. Similarly, we provide a brief review of the different physical models used for the determination of the reaction rates in etching, serving as an example of different calibration strategies for the comparison of simulations and experiments. The paper compares KMC and CA simulations of spherically shaped samples with the experimental results and discusses the simulation of an array of AFM probes, providing links to other simulated systems.

Acknowledgments

We acknowledge support by the Japanese MEXT 21st Century Center of Excellence Program ('Micro- and Nano-Mechatronics for Information-Based Society' (2003-2007)), the JSPS Bilateral Programs (Joint Research Project 'Physical Chemistry of Anisotropic Chemical Etching of Silicon Applied for MEMS' (2005-2007) with the Academy of Finland) and the MEXT Scholarship Programs (Research Student (2005-2007)). This research has also been supported by the Academy of Finland through its Centers of Excellence Program 2006-2011.

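The appendix below lists the inner loops of the KMC variants in pseudocode form. As an illustration, the selection-plus-time-increment step shared by the rejection-free VTS algorithms, in its simplest linear-search form (cf. Algorithm 3), might be sketched as follows; the function name and interface are ours, not taken from the paper:

```python
import math
import random

def vts_step(rates, rng=random.random):
    """One rejection-free KMC step: pick process i with probability
    p_i / S_N by a linear search, then advance time by -ln(e)/S_N."""
    total = sum(rates)                   # S_N, the total rate
    target = rng() * total               # r * S_N, with r uniform in [0, 1)
    acc = 0.0
    chosen = len(rates) - 1              # fallback guards against round-off
    for i, p in enumerate(rates):
        acc += p                         # running cumulative sum S_i
        if target < acc:                 # first i with S_{i-1} <= r*S_N < S_i
            chosen = i
            break
    dt = -math.log(1.0 - rng()) / total  # e = 1 - rng() lies in (0, 1]
    return chosen, dt
```

Replacing the linear search by a binary search over stored cumulative sums, or by a K-level search, yields the BS and KLS variants at the cost of the additional bookkeeping described in the appendix.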
Appendix

A description of the algorithms for the different KMC methods is given. These are classified as shown in table 1. The tree search methods are all similar. Here we present the generic

case, KLS, and one particular example, BS. The inner loop of the CTS method can be described as follows:

Algorithm 0: CTS (Metropolis)
(1a) Choose randomly (with uniform probability 1/N) a particular microscopic process i from the N processes at the current state;
(1b) Read off the process rate p_i;
(1c) Choose a random number r in [0, 1];
(2) Increment time by Δt = 1/N;
(3) Update the sums used for calculating the averages of the observables (S_A^t = S_A^{t−Δt} + A Δt);
(4) If r ≤ p_i, accept the process; update the rates of all affected processes;
(5) Test a new event by repeating steps (1)–(4).

Note that the next trial process in the CTS algorithm can typically be chosen by simply picking one process at random. This is referred to as Random Pick Up (RPU). In some systems it is possible to speed up the simulation by choosing the next event deterministically from the list of all possible processes (i.e. by always choosing the next entry), owing to the inherently stochastic nature of the acceptance criterion. This is referred to as Sequential Pick Up (SPU). SPU is not always possible, as it may introduce correlations in the system, but it is always a good idea to check whether it is valid for the particular system at hand, since it may speed up a simulation by a factor of 2 or more. Whether this gain is realized depends on whether the random numbers need to be generated at run time or have been stored previously, and the speed-up factor depends on the efficiency of the random number generator.

There are multiple variations for the inner loop of the VTS algorithms. The ones below are just one possible choice.

Algorithm 1: VTS-1 (M-fold or BKL)
(1a) Choose a process type α according to its probability of success. This is done by picking the first α for which Σ_{β=1}^{α−1} w_β < r ≤ Σ_{β=1}^{α} w_β, where r is a random number in [0, 1] and w_α = N_α p_α / Σ_{β=1}^{M} N_β p_β is the success fraction for process type α;
(1b) Choose a particular realization i of process α by randomly picking one process from the chosen process-type list;
(2) Increment time by Δt = −ln(e)/S_N, where e is a random number in (0, 1] and S_N = Σ_{i=1}^{N} p_i = Σ_{α=1}^{M} N_α p_α (i.e. the total rate);
(3) Always accept the process; update the process types and rates of all affected processes and their corresponding lists (intensive bookkeeping); update the sums used for calculating the averages of the observables (S_A^t = S_A^{t−Δt} + A Δt);
(4) Generate new processes by repeating (1)–(3).

Algorithm 2: VTS-2 (BS)
(1) Choose one process i according to its probability of success by using a binary-tree search. This is done by keeping in memory the N cumulative sums S_i = Σ_{j=1}^{i} p_j (i = 1, 2, ..., N) and bisecting the index interval [0, N] until the index i satisfies S_{i−1} < r S_N ≤ S_i, where r is a random number in [0, 1];
(2) Increment time by Δt = −ln(e)/S_N, where e is a random number in (0, 1] and S_N = Σ_{i=1}^{N} p_i (i.e. the total rate);
(3) Always accept the process; update the rates of all affected processes and the affected cumulative sums (moderate bookkeeping); update the sums used for calculating the averages of the observables (S_A^t = S_A^{t−Δt} + A Δt);
(4) Generate new processes by repeating (1)–(3).

Algorithm 3: VTS-3 (LS)
(1) Randomly choose one process i according to its probability of success. This is done by picking the first i for which Σ_{j=1}^{i−1} w_j < r ≤ Σ_{j=1}^{i} w_j, where r is a random number in [0, 1] and w_i = p_i / Σ_{j=1}^{N} p_j is the success probability for process i;
(2) Increment time by Δt = −ln(e)/S_N, where e is a random number in (0, 1] and S_N = Σ_{i=1}^{N} p_i (i.e. the total rate);
(3) Always accept the process; update the rates of all affected processes (little bookkeeping); update the sums used for calculating the averages of the observables (S_A^t = S_A^{t−Δt} + A Δt);
(4) Generate new processes by repeating (1)–(3).

Algorithm 4: VTS-4 (KLS)
(0) Choose K as the lowest integer that satisfies K^K > N. Choose g = K. Define N^(0) = N and calculate the number of subsystems at level k as N^(k) = ⌈N^(k−1)/g⌉, with k = 1, 2, ..., K. Here ⌈x⌉ denotes the nearest integer towards infinity, i.e. the ceiling function. Define S_i^(0) = p_i with i = 1, 2, ..., N, where p_i is the rate of process i, and calculate the total rate of subsystem i at level k as S_i^(k) = Σ_{j=1+(i−1)g}^{ig} S_j^(k−1), with i = 1, 2, ..., N^(k) and k = 1, 2, ..., K.
(1) Choose one process i according to its probability of success by using a K-level search. This consists in performing K linear searches, k = K, K−1, ..., 1, starting at k = K and ending at k = 1. In the kth search, pick up the first subsystem i for which Σ_{j=1}^{i−1} S_j^(k) < r Σ_{j=1}^{g} S_j^(k) ≤ Σ_{j=1}^{i} S_j^(k), where r is a random number in [0, 1] and S_i^(k) is the total rate of subsystem i at level k. When level k = 1 is reached, a process i with rate p_i = S_i^(0) has been chosen.
(2) Increment time by Δt = −ln(e)/S_N, where e is a random number in (0, 1] and S_N = Σ_{i=1}^{N} p_i (i.e. the total rate);
(3) Always accept the process; update the rates of all affected processes; update the affected subsystem sums at all levels k = 1, 2, ..., K (moderate bookkeeping); update the sums used for calculating the averages of the observables (S_A^t = S_A^{t−Δt} + A Δt);
(4) Generate new processes by repeating (1)–(3).

References

[1] Oura K, Lifshits V G, Saranin A A, Zotov A V and Katayama M 2003 Surface Science: An Introduction (Berlin: Springer) chapters 13–14
[2] Salo P, Hirvonen J, Koponen I T, Trushin O S, Heinonen J and Ala-Nissila T 2001 Phys. Rev. B 64 161405
[3] Trushin O, Karim A, Kara A and Rahman T S 2005 Phys. Rev. B 72 115401
[4] Gosálvez M A, Sato K, Foster A S, Nieminen R M and Tanaka H 2007 J. Micromech. Microeng. 17 S1–26
[5] Garcia S P, Bao H and Hines M A 2004 Etchant anisotropy controls the step bunching instability in KOH etching of silicon Phys. Rev. Lett. 93 166102
[6] Gosálvez M A, Xing Y, Hynninen T, Uwaha M, Foster A S, Nieminen R M and Sato K 2007 J. Micromech. Microeng. 17 S27–37
[7] Gosálvez M A and Nieminen R M 2003 New J. Phys. 5 100
[8] Shikida M, Masuda T, Uchikawa D and Sato K 2001 Sensors Actuators A 90 223–31

[9] van Veenendaal E, Sato K, Shikida M, Nijdam A J and van Suchtelen J 2001 Sensors Actuators A 93 232–42
[10] Elwenspoek M and Jansen H V 1998 Silicon Micromachining (Cambridge: Cambridge University Press)
[11] Park S, Lee S, Yi S and Cho D D 1999 Japan. J. Appl. Phys. 38 4244–9
[12] Kwon J W and Kim E S 2002 Sensors Actuators A 97–98 729–33
[13] Jansen A P J 2003 An introduction to Monte Carlo simulations of surface reactions Preprint arXiv cond-mat/0303028v1
[14] Voter A F 2005 Introduction to the kinetic Monte Carlo method Radiation Effects in Solids ed K E Sickafus and E A Kotomin (Dordrecht: Springer)
[15] Weimar J R 1997 Simulation with Cellular Automata (Berlin: Logos)
[16] Wolfram S 2002 A New Kind of Science (Champaign, IL: Wolfram Media)
[17] Allen M P 2004 Introduction to molecular dynamics simulation Computational Soft Matter: From Synthetic Polymers to Proteins, Lecture Notes (NIC Series vol 23) ed N Attig, K Binder, H Grubmüller and K Kremer (Jülich: John von Neumann Institute for Computing) pp 1–28
[18] Werkmeister J, Gosálvez M A, Willoughby P, Slocum A and Sato K 2006 J. Microelectromech. Syst. 15 1671–80
[19] Gosálvez M A, Xing Y and Sato K 2008 J. Microelectromech. Syst. 17 410–31
[20] Asaumi K, Iriye Y and Sato K 1997 Proc. IEEE Int. Conf. on Micro Electro Mechanical Systems (MEMS 97) (Nagoya, Japan, 26–30 Jan.) pp 412–7
[21] Frühauf J, Trautman K, Wittig J and Zielke D 1993 J. Micromech. Microeng. 3 113–5
[22] Sequin C H 1992 Sensors Actuators A 34 225–41
[23] Li G, Hubbard T and Antonsson E K 1998 Proc. 1998 Int. Conf. on Modeling and Simulation of Microsystems (MSM 98) (Santa Clara, CA, 6–8 Apr.) pp 356–61
[24] Zhou Z F, Huang Q A, Li W H and Deng W 2006 Proc. 5th Int. Workshop on Phys. Chem. of Wet Etching of Semiconductors (PCWES) (Saarbrücken, Germany, 19–21 Jun.) p 59
[25] Zhou Z F, Huang Q A, Li W H and Deng W 2007 J. Micromech. Microeng. 17 S38–49
[26] Xing Y, Gosálvez M A and Sato K 2007 New J. Phys. 9 346
[27] Flidr J, Huang Y C, Newton T A and Hines M A 1998 J. Chem. Phys. 108 5542–53
[28] Newton T A, Huang Y-C, Lepak L A and Hines M A 1999 J. Chem. Phys. 111 9125
[29] Gosálvez M A, Kilpinen P, Haimi E, Nieminen R M and Lindroos V 2001 Appl. Surf. Sci. 178 7–26
[30] Gosálvez M A, Foster A S and Nieminen R M 2002 Europhys. Lett. 60 467–73
[31] Gosálvez M A, Cheng D, Nieminen R M and Sato K 2006 New J. Phys. 8 1–11
[32] Gosálvez M A, Nieminen R M and Sato K 2005 Sensors Mater. 17 187–97
[33] Schulze T P 2004 J. Cryst. Growth 263 605–15
[34] Chou C-C and Falk M L 2006 J. Comput. Phys. 217 519–29
[35] Akis R and Ferry G K 2005 IEEE Trans. Nanotechnol. 4 345–8
[36] Backofen R and Voigt A 2007 J. Cryst. Growth 303 100–4
[37] Pani S K, Tjiptoharsono F, Wong C C, Premachandran C S, Ramama P V and Iyer M K 2007 Appl. Phys. A 88 401–7
[38] Xing Y, Gosálvez M A and Sato K 2008 Octree-search kinetic Monte Carlo for the simulation of complex 3D MEMS structures Proc. MEMS 2008, 21st IEEE Int. Conf. on Micro Electro Mechanical Systems (Tucson, AZ, USA, 13–17 Jan. 2008) pp 323–6
[39] Mazaleyrat G, Estève A, Jeloaica L and Djafari-Rouhani M 2005 Comput. Mater. Sci. 33 74–82
[40] Knizhnik A A, Bagaturyants A A, Belov I V, Potapkin B V and Korkin A A 2002 Comput. Mater. Sci. 24 128–32
[41] Brunev D V, Karpov A N, Neizvestny I G, Shwartz N L, Yanovitskaja Z S and Zverev A V 2003 Proc. 4th Ann. 2003 Siberian Russian Workshop on Electron Devices and Materials (1–4 July) pp 21–6
[42] VisualTAPAS: http://www.fyslab.hut.fi/∼mag/VisualTAPAS/Home.html
[43] Zhu Z and Liu C 2000 J. Microelectromech. Syst. 9 252–61
[44] Gosálvez M A, Xing Y, Sato K and Nieminen R M 2008 New J. Phys. submitted
[45] Garcia S P, Bao H and Hines M A 2003 Surf. Sci. 541 252–61
[46] Tan O and Büttgenbach S 1994 Sensors Actuators A 45 85–9
[47] Büttgenbach S and Tan O 1996 Proc. IEEE European Design and Test Conf. (ED and TC 96) pp 454–8
[48] Gosálvez M A, Foster A S and Nieminen R M 2002 Appl. Surf. Sci. 202 160–82
[49] Hynninen T, Gosálvez M A, Foster A S, Nieminen R M, Tanaka H, Sato K and Uwaha M 2008 New J. Phys. 10 013033
[50] Foster A S, Gosálvez M A, Hynninen T, Nieminen R M and Sato K 2007 Phys. Rev. B 76 075315
[51] Shim Y and Amar J G 2005 Phys. Rev. B 71 115436
[52] Shim Y and Amar J G 2006 J. Comput. Phys. 212 305–17
[53] Official page of Domain Decomposition Methods: http://www.ddm.org/
[54] Metropolis N, Rosenbluth A W, Rosenbluth M N, Teller A H and Teller E 1953 J. Chem. Phys. 21 1087–91
[55] Levi A C and Kotrla M 1997 J. Phys.: Condens. Matter 9 299
[56] Blue J L, Beichl I and Sullivan F 1995 Phys. Rev. E 51 867
[57] Maksym P A 1988 Semicond. Sci. Technol. 3 594–6
[58] Bortz A B, Kalos M H and Lebowitz J L 1975 J. Comput. Phys. 17 10–8
[59] Wind R A, Jones H, Little M J and Hines M A 2002 J. Phys. Chem. B 106 1557–69
