A Distributed Simulation-Based Computational Intelligence Algorithm for Nanoscale Semiconductor Device Inverse Problem

Yiming Li and Cheng-Kai Chen

Department of Communication Engineering, National Chiao Tung University, 1001 Ta-Hsueh Road, Hsinchu 300, Taiwan

Abstract. In this paper, a distributed simulation-based computational intelligence algorithm for the inverse problem of nanoscale semiconductor devices is presented. The approach features a simulation-based optimization strategy and integrates semiconductor process simulation, semiconductor device simulation, an evolutionary strategy, and empirical knowledge in a distributed computing environment. For a set of given target current-voltage (I-V) curves of metal-oxide-semiconductor field effect transistor (MOSFET) devices, the developed prototype executes evolutionary tasks to solve an inverse doping profile problem and thereby optimize fabrication recipes. In the evolutionary loop, the management server allocates the process simulation and device simulation jobs on a PC-based Linux cluster with message passing interface (MPI) libraries. Benchmark results for the speed-up, the load balancing, and the parallel efficiency are presented. Computed results, compared with measured data of a 65 nm n-type MOSFET, show the accuracy and robustness of the method.

G. Min et al. (Eds.): ISPA 2006 Ws, LNCS 4331, pp. 231-240, 2006. © Springer-Verlag Berlin Heidelberg 2006

1 Introduction

Technology computer-aided design (TCAD) continues to play a central role in the fabrication of metal-oxide-semiconductor field effect transistor (MOSFET) devices in the semiconductor industry [1-6]. For a set of given current-voltage (I-V) curves of MOSFETs, searching for an optimal fabrication configuration forms an engineering inverse problem [7-9]. A conventional trial-and-error procedure has been used to seek an acceptable fabrication recipe. Unfortunately, this procedure faces serious challenges and complicates the development of next-generation technology because of, for example, the evident variations of electrical characteristics in modern 65 nm MOSFETs [10-12]. A computationally efficient approach is therefore required and will also benefit the sub-65 nm MOSFET era. A simulation-based evolutionary TCAD methodology provides an alternative for new technology development. The computational cost of the process simulation and device simulation is known to dominate the efficiency of the simulation-based computational intelligence method, so the approach becomes practical for real-world applications once distributed computing techniques [13-22] are properly incorporated.

In this work, a distributed implementation of the simulation-based computational intelligence technique [8] is presented for solving the semiconductor inverse problem on a PC-based Linux cluster. The approach integrates a two-dimensional (2D) process simulation, a device simulation, a computational intelligence algorithm [13, 23-24], and empirical knowledge on our own cluster with message passing interface (MPI) libraries. Empirical knowledge accumulated from semiconductor foundry experience is implemented in the developed prototype and provides a good starting point for the loop of evolutionary processes. Fabrication steps are analyzed in the semiconductor process simulation stage. To simulate the device characteristics of 65 nm MOSFETs and beyond, quantum mechanical effects are taken into account to describe the transport phenomena of the device accurately. A set of 2D density-gradient drift-diffusion equations [1-2] is solved numerically with an adaptive computing technique [16]. A hybrid genetic algorithm, which combines a genetic algorithm with a numerical optimization method [23-24], is used in the simulation-based computational intelligence approach.

The prototype of the simulation-based computational intelligence algorithm is implemented on our PC-based Linux cluster, which is equipped with 16 CPUs. On the management server of the distributed system, all process and device simulation jobs are gathered in a queue. The server then dynamically allocates the jobs to the CPUs of the cluster. For a specified set of target I-V curves and electrical characteristics, the prototype performs process and device simulations and searches for several suitable process recipes, such as doping profiles. The stopping criterion is a given error tolerance between the simulation and the specification of the designed target. Compared with realistic experimental data and the process recipe, the achieved results confirm the capability of the implemented prototype for a 65 nm n-type MOSFET (NMOSFET) on the PC-based cluster. The accuracy and the computational performance are reported in terms of different benchmarks. The distributed realization of the simulation-based computational intelligence algorithm is of great value in advanced TCAD development and also provides a novel way to diagnose device characteristics in the sub-65 nm MOSFET era.

This paper is organized as follows. In Sec. 2, we state the methodology. In Sec. 3, results and discussion are presented. Finally, we draw conclusions.

2 The Evolutionary Technique and Distributed Implementation

The developed distributed simulation-based computational intelligence algorithm includes 2D process and device simulations, a computational intelligence algorithm, and empirical knowledge of fabrication technology. The architecture of the proposed system is shown in Fig. 1. We apply the distributed computation technique to the hybrid evolutionary system and the external simulators. The distributed system management bridges the PC cluster and the simulation-based evolutionary system: it allocates the computing resources while the evolutionary system performs the optimization task. The process and device simulation, shown in Algorithm 1, simulates several important fabrication processes to obtain the device geometry of the MOSFET and the corresponding doping profile. The output of the process simulation is then used in the device simulation to examine the device characteristics. Both the target and the simulated I-V curves are inputs to the optimization kernel. After the evolutionary process, a newly updated set of parameters is proposed for the next process and device simulation. This work features the computational intelligence approach to the inverse problem of the doping profile. The developed evolutionary prototype, shown in Algorithm 1, relies mainly on a hybrid genetic algorithm.

This evolutionary technique works together with several practical empirical rules that are necessary for pre-processing the optimization and that provide good initial starting (and re-starting) points for all evolutionary steps. In this investigation, only the hybrid genetic algorithm among the evolutionary algorithms is enabled, because a moderate number of parameters (about 30) is to be optimized.
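As a rough illustration of how a management layer can hand simulation jobs to whichever worker is free, the sketch below farms out the evaluation of a GA population to a worker pool. It is not the authors' MPI implementation: Python's `concurrent.futures` thread pool stands in for the cluster manager, and `simulate_and_score` and `evaluate_population` are hypothetical names with a toy scoring function in place of the external process and device simulators.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_and_score(genes):
    """Hypothetical stand-in for one process + device simulation job.

    A real worker would launch the external simulators for this recipe and
    compare the simulated I-V curves with the target. Here a toy quadratic
    score is returned so that the sketch is runnable.
    """
    return sum((g - 0.5) ** 2 for g in genes)


def evaluate_population(population, n_workers=16):
    """Evaluate every individual; idle workers pull the next queued job."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() queues all jobs at once and returns results in input order,
        # so score i always belongs to individual i.
        return list(pool.map(simulate_and_score, population))


if __name__ == "__main__":
    print(evaluate_population([[0.5, 0.5], [1.5, 0.5]], n_workers=2))
```

Because the jobs share no data, the only coordination needed is the queue itself, which is why a simple pool is enough for this kind of sketch.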

Fig. 1. A system diagram for the proposed system

Algorithm 1. A procedure of the proposed optimization system to solve the semiconductor device inverse problem

    While optimal recipe is not found
        Use current recipe to simulate I-V curves:
            Perform process simulation
            Obtain doping profile
            Perform device simulation
            Obtain simulated I-V curves
        Evaluate error (target and simulated I-V curves)
        Generate new recipe with empirical knowledge

Algorithm 2. A procedure of the inverse modeling problem

    Initialize GA environment
    Generate initial process recipes
    While device electrical characteristic is not converged
        Invoke process simulator to obtain doping profile
        While I-V curves are not converged
            Invoke device simulator
            Retrieve I-V characteristic from device simulator
            Evaluate result (I-V curves and device electrical characteristics)
        End while
    End while

As shown in Algorithm 2, with a set of selected process recipes and device model parameters, the external numerical programs are called to perform the process and device simulations and to retrieve the newest I-V curves and device characteristics. Together with the specified target I-V curves, the results are used to calculate the newest fitness score, and the newest parameters are then suggested for the next simulation and optimization. The fitness score evaluates how well the solution being tested fits the desired outcome. In Eq. (1), the drain current $I_D$ is the simulated data and $I_D^{\mathrm{target}}$ is the specified target to be achieved. We distribute the external simulation programs, which dominate the CPU time. In the procedure of the hybrid genetic algorithm, we only need to pass the genes, which represent different sets of parameters, to the management server and then wait for the new fitness returned from the PC cluster. The procedure of the distribution method implemented in this work is shown in Algorithm 3.

$$\mathrm{fitness} \equiv \left( \frac{\log(I_D) - \log(I_D^{\mathrm{target}})}{\log(I_D^{\mathrm{target}})} \right)^{2}. \qquad (1)$$
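A minimal sketch of Eq. (1) in code may help make the definition concrete. The per-point function follows the equation as written; averaging the per-point values over a whole I-V curve is an assumption made here for illustration, since the text does not state how points are combined. The names `point_fitness` and `curve_fitness` are hypothetical.

```python
import math


def point_fitness(i_d, i_d_target):
    """Eq. (1) for one bias point: squared relative error of log currents.

    Both currents must be positive, and the target must differ from 1 A so
    that log(i_d_target) is non-zero. The ratio of logarithms does not
    depend on the logarithm base, so base 10 is used for readability.
    """
    log_target = math.log10(i_d_target)
    return ((math.log10(i_d) - log_target) / log_target) ** 2


def curve_fitness(simulated, target):
    """Assumed aggregation: mean of Eq. (1) over all points of a curve."""
    if len(simulated) != len(target):
        raise ValueError("curves must have the same number of points")
    scores = [point_fitness(s, t) for s, t in zip(simulated, target)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    target = [1e-12, 1e-9, 1e-6, 1e-4]
    simulated = [1e-11, 1e-9, 1e-6, 1e-4]
    print(curve_fitness(simulated, target))
```

As written, Eq. (1) evaluates to zero when the simulated current matches the target exactly, and working with logarithms keeps the sub-threshold region from being swamped by the much larger on-state currents.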

Algorithm 3. A work flow of the distribution implemented in this work

    While evolutionary system requires evaluation
        Request PC-cluster manager to allocate resources
        For each assigned PC-cluster CPU
            Call external simulator for simulation
            Return results to PC-cluster manager
        End for
        PC-cluster manager returns results to evolutionary system
    End while

The physics-based empirical knowledge directly indicates the relationship between the parameters and the tendency of the device characteristics. During optimization, once a large error is observed in a certain region of the I-V curves, empirical rules are employed to disrupt the evolution, which may result in a different mutation and is useful in the iteration loop of simulation and optimization. We adopt the relationship between the target I-V curves to be optimized and several physical quantities of greatest concern. For different regions of the I-V curves, we can first tune the corresponding process or device parameters by following the empirical rules built into our evolutionary prototype. The corresponding pseudo code of several pieces of empirical knowledge of fabrication technology is shown in Algorithm 4.

Algorithm 4. A procedure of the considered experimental engineering knowledge

    Empirical Knowledge for MOSFET IV-Optimization
    For each I-V point PT in I-V curves {
        If( PT.voltage < 0.0 )
            PT in Band-to-band tunneling model region
        Else if( PT.voltage >= 0.0 && PT.voltage < 0.6 )
            PT in VT.Implementation region
        Else if( PT.voltage >= 0.6 && PT.voltage < 0.8 )
            PT in Saturation region
        Else
            PT in Mobility model region
    }
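The region rules of Algorithm 4 translate almost directly into code. The sketch below mirrors the voltage thresholds given in the pseudo code; the region labels follow the names used there, while `classify_region`, `error_by_region`, and the per-point error values are hypothetical additions showing how a large error could be attributed to one region.

```python
def classify_region(voltage):
    """Map the voltage of an I-V point to the region named in Algorithm 4."""
    if voltage < 0.0:
        return "band_to_band_tunneling_model"
    if voltage < 0.6:
        return "vt_implementation"
    if voltage < 0.8:
        return "saturation"
    return "mobility_model"


def error_by_region(points):
    """Sum per-point errors for each region.

    `points` is an iterable of (voltage, error) pairs; the error measure
    itself is left to the caller (for example a per-point fitness value).
    """
    totals = {}
    for voltage, error in points:
        region = classify_region(voltage)
        totals[region] = totals.get(region, 0.0) + error
    return totals


if __name__ == "__main__":
    sample = [(-0.1, 0.02), (0.3, 0.10), (0.7, 0.01), (1.0, 0.03)]
    totals = error_by_region(sample)
    print(max(totals, key=totals.get))  # region with the largest total error
```

The region with the largest accumulated error indicates which group of process or device parameters the empirical rules would tune first.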

[Figure 2: two panels plotting fitness score versus number of generations. Panel (a) legend: mutation rate = 0.1, 0.3, and 0.5. Panel (b) legend: Process+Device, Process, Device.]

Fig. 2. (a) Performance comparison among three different mutation rates, and (b) performance comparison among three different evolutionary strategies. In total, 31 process and device parameters are optimized in the case of process and device simulations.

3 Results and Discussion

To inversely extract the doping profile of the designed 65 nm MOSFET for the given target I-V data, the implemented evolutionary prototype runs on our PC-based Linux cluster with 16 CPUs. Figure 2a compares three GA configurations with different mutation rates when only the device parameters are optimized. A mutation rate of 0.3 gives the best convergence behavior. Under the same setting, the fitness score versus the number of evolutionary generations is shown in Fig. 2b for three different calibration strategies. If only the process parameters or only the device parameters are optimized, the accuracy of the extraction is limited; the results suggest that process and device parameters must be extracted simultaneously. For the given target, if the inverse extraction considers only the device-model parameters of the 2D device simulation, the fitness suggests that the proposed optimization methodology is ineffective even after a long evolution. For the simulation-based evolutionary technique with only the process-related parameters (i.e., only the parameters of the doping profile), a better fitness score is expected. However, considering the process and device parameters simultaneously clearly confirms the computational efficiency of the methodology.

The extracted I-V curves for the explored 65 nm MOSFET are shown in Fig. 3. The symbols are the desired target to be optimized, the solid lines are the final achieved result, and the dashed lines are the original I-V characteristics corresponding to the initial setting of the process and device simulations. We note that the target can ideally be regarded as the realistic silicon data after fabrication and measurement. For the 65 nm MOSFET, the optimized doping profiles are shown in Fig. 4. The ratio of the on- and off-state currents is determined by several mechanisms: the level of the off-state current is significantly affected by the threshold-voltage, lightly doped drain (LDD), and source/drain implantations, whereas the on-state current depends directly on the adjustment of the device mobility model and on the threshold-voltage and LDD implantations. Table 1 shows a partial list of the process parameters to be extracted, with their numeric ranges, for the explored 65 nm N-MOSFETs, together with the extracted results.
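To make the role of the numeric ranges and the mutation rate more tangible, here is a minimal sketch of a range-bounded mutation operator. It is an assumption-laden illustration rather than the authors' hybrid genetic algorithm: the uniform resampling rule and the names `RANGES`, `CALIBRATED`, and `mutate` are hypothetical, while the three ranges and starting values are the well-implant entries of Table 1.

```python
import random

# Numeric ranges of three well-implant parameters, taken from Table 1.
RANGES = {
    "well_energy_keV": (200.0, 500.0),
    "well_dose_cm2": (5e12, 5e13),
    "well_tilt_deg": (0.0, 45.0),
}

# Calibrated values reported in Table 1, used here only as a starting recipe.
CALIBRATED = {
    "well_energy_keV": 462.0,
    "well_dose_cm2": 2.6e13,
    "well_tilt_deg": 5.0,
}


def mutate(recipe, rate, rng):
    """Return a copy of `recipe` in which each gene is resampled uniformly
    inside its allowed range with probability `rate` (assumed rule)."""
    child = dict(recipe)
    for name, (low, high) in RANGES.items():
        if rng.random() < rate:
            child[name] = rng.uniform(low, high)
    return child


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so that runs are repeatable
    print(mutate(CALIBRATED, 0.3, rng))
```

Resampling inside the declared range guarantees that every mutated recipe remains a valid input for the process simulator, whatever mutation rate is chosen.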

[Figure 3: drain current (A), logarithmic scale, versus gate voltage (V). Legend: NMOS Targets, NMOS Optimized Simulation, NMOS Initial Simulation.]

Fig. 3. The achieved accuracy of the extracted I-V curves for the 65 nm N-type MOSFET. The device has a gate length L = 65 nm and a device width W = 1 μm. Results are obtained by considering the device and process configurations simultaneously. Symbols are measured data and lines are the final optimized result.

[Figure 4: panels (b) and (c) plot doping concentration (cm-3) for the initial and the optimized doping profiles, versus depth into substrate (μm) in panel (b) and versus channel direction from source to drain (μm) in panel (c).]

Fig. 4. (a) A plot of the extracted doping profile of the 65 nm N-MOSFET. The result corresponds to the final optimized I-V curves shown in Fig. 3. (b) Orange cutting-line (horizontal direction) plots of the corresponding doping profiles from the surface into the substrate, located at the center of the device channel. (c) Pink cutting-line (perpendicular direction) plots of the corresponding doping profiles from the source side to the drain side, 50 nm below the device surface.

The difference between the initial and the final optimized doping profiles is shown in Fig. 4b, which plots the corresponding profiles along a perpendicular cutting line from the surface into the substrate at the center of the device channel. A 30% difference is observed at the surface (i.e., the position at 0 μm). The plot along the channel from the source side to the drain side, 50 nm below the device surface, is shown in Fig. 4c. The difference exceeds 50% at both ends of the device channel (i.e., near the source and drain sides, respectively).

Figure 5 shows the achieved speed-up and efficiency for three different optimization configurations, where the speed-up is the ratio of the execution time of the simulation codes on a single processor to that on multiple processors. The efficiency of the distributed system is defined as the speed-up divided by the number of CPUs. The speed-up is about 13 for the simulation running on the 16-CPU PC-based cluster, and the efficiency is maintained at about 80%. Because the distributed genetic algorithm requires no data exchange among the simulation jobs, the presented implementation achieves high performance.

Preliminary load-balancing results of the established system are shown in Table 2. The distributed management server properly maintains the load of each CPU in the cluster. For three different population sizes, the maximum difference of the calculation time ranges from 5.8% to 13.12% when the optimization configuration extracts process and device parameters simultaneously. A small population size hampers the distribution in the evolutionary process and results in poor load balancing among CPUs; increasing the population size improves the load balancing.

Table 1. A partial list of process parameters to be extracted, their numeric ranges, and the calibrated results for the explored 65 nm N-MOSFET

    Process parameter     Numeric range              Calibrated result
    Well Imp. Energy      200 ~ 500 keV              462
    Well Imp. Dose        5e+12 ~ 5e+13 cm-2         2.6e+13
    Well Imp. Tilt        0 ~ 45 (degree)            5
    LDD Imp. Energy       10 ~ 50 keV                30
    LDD Imp. Dose         5e+12 ~ 5e+13 cm-2         3.7e+13
    LDD Imp. Tilt         0 ~ 45 (degree)            30
    LDD Imp. Rotation     0 ~ 360 (degree)           43
    S/D Imp. Energy       10 ~ 80 keV                17
    S/D Imp. Dose         1e+13 ~ 1e+14 cm-2         2.1e+13
    S/D Imp. Tilt         0 ~ 45 (degree)            2

Table 2. The achieved load balancing of the prototype running on the cluster with 16 CPUs with respect to three different population (Pop) sizes

                                Time (min)
    CPU                         Pop size 8    Pop size 16    Pop size 32
    #1                          250           492            1042
    #2                          259           501            1023
    #3                          234           511            982
    #4                          231           530            987
    #5                          229           481            1002
    #6                          245           521            1034
    #7                          225           481            1021
    #8                          254           485            1012
    #9                          -             512            990
    #10                         -             532            1015
    #11                         -             498            986
    #12                         -             492            989
    #13                         -             487            996
    #14                         -             490            1031
    #15                         -             482            1027
    #16                         -             521            981
    (Max − Min)/Max × 100%      13.12%        9.2%           5.8%
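The load-balancing measure in the last row of Table 2, (Max − Min)/Max × 100%, can be sketched as follows. The function name `load_imbalance_percent` is hypothetical, and the example uses only the population-size-8 column of Table 2, in which eight CPUs receive work.

```python
def load_imbalance_percent(times_min):
    """(Max - Min) / Max * 100 over the CPUs that actually received jobs.

    `times_min` holds per-CPU busy times in minutes; idle CPUs are passed
    as None and ignored.
    """
    busy = [t for t in times_min if t is not None]
    if not busy:
        raise ValueError("no CPU has a recorded time")
    return (max(busy) - min(busy)) / max(busy) * 100.0


if __name__ == "__main__":
    # Population size 8 column of Table 2: CPUs #9 to #16 stay idle.
    pop8 = [250, 259, 234, 231, 229, 245, 225, 254] + [None] * 8
    print(f"{load_imbalance_percent(pop8):.2f}%")
```

A smaller value means the slowest and fastest CPUs finish closer together, so less time is lost waiting for stragglers at the end of each generation.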

[Figure 5: panel (a) plots speed-up versus number of CPUs and panel (b) plots efficiency (%) versus number of CPUs. Legend in both panels: Device, Process, Device+Process.]

Fig. 5. The achieved performance versus the number of CPUs for three different optimization configurations, where (a) is the speed-up and (b) is the achieved efficiency
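The two metrics plotted in Fig. 5 follow directly from the definitions given in the text: speed-up is the single-processor time divided by the multi-processor time, and efficiency is the speed-up divided by the number of CPUs. The short sketch below applies them to the reported figures; the function names and the timing values in the first call are hypothetical.

```python
def speed_up(time_single, time_parallel):
    """Ratio of single-processor to multi-processor execution time."""
    return time_single / time_parallel


def efficiency(speedup, n_cpus):
    """Speed-up divided by the number of CPUs (1.0 would be ideal)."""
    return speedup / n_cpus


if __name__ == "__main__":
    # Hypothetical timings chosen to give the reported speed-up of about 13.
    s = speed_up(time_single=1300.0, time_parallel=100.0)
    print(s, efficiency(s, n_cpus=16))
```

With a speed-up of 13 on 16 CPUs the efficiency is 13/16, or roughly 81%, which agrees with the figure of about 80% quoted in the text.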

4 Conclusions

We have presented a distributed realization of the simulation-based computational intelligence algorithm for the inverse doping profile problem of semiconductor devices. The prototype has been successfully implemented on our PC-based cluster and tested on 65 nm MOSFET devices. In the evolutionary processes, various process and device parameters were considered simultaneously, and the doping profiles of the 65 nm MOSFETs were successfully extracted according to the desired target, which closely reflects realistic measured silicon data. Running on the PC-based Linux cluster with MPI libraries, the distribution with dynamic job allocation has achieved preliminary performance in terms of the speed-up and the efficiency. We are currently extending this approach to explore the inverse doping profile problem with minimization of the characteristic fluctuation for sub-65 nm MOSFET devices.

Acknowledgments

This work was supported in part by Taiwan National Science Council (NSC) under Contract NSC-94-2215-E-009-084, Contract NSC-95-2221-E-009-336, and Contract NSC-95-2752-E-009-003-PAE, by the MoE ATU Program, Taiwan, under a 2006 grant, and by the Taiwan Semiconductor Manufacturing Company under a 2005-2007 grant.

References

1. Li, Y., Chou, H.-M., Lee, J.-W.: Investigation of Electrical Characteristics on Surrounding-Gate and Omega-Shaped-Gate Nanowire FinFETs. IEEE Trans. Nanotech. 4 (2005) 510-516
2. Li, Y., Chou, H.-M.: A Comparative Study of Electrical Characteristic on Sub-10 nm Double Gate MOSFETs. IEEE Trans. Nanotech. 4 (2005) 645-647
3. Li, Y., Yu, S.-M.: A Two-Dimensional Quantum Transport Simulation of Nanoscale Double-Gate MOSFETs using Parallel Adaptive Technique. IEICE Trans. Inf. Syst. E87-D (2004) 1751-1758
4. Li, Y.: A Parallel Monotone Iterative Method for the Numerical Solution of Multidimensional Semiconductor Poisson Equation. Comput. Phys. Commun. 153 (2003) 359-372
5. Li, Y., Sze, S. M., Chao, T.-S.: A Practical Implementation of Parallel Dynamic Load Balancing for Adaptive Computing in VLSI Device Simulation. Eng. Comput. 18 (2002) 124-137
6. Li, Y., Liu, J.-L., Chao, T.-S., Sze, S. M.: A new parallel adaptive finite volume method for the numerical simulation of semiconductor devices. Comput. Phys. Commun. 142 (2001) 285-289
7. Binder, T., Heitzinger, C., Selberherr, S.: A Study on Global and Local Optimization Techniques for TCAD Analysis Tasks. IEEE Trans. CAD. 23 (2004) 814-822
8. Li, Y., Yu, S.-M., Chen, C.-K.: A Simulation-Based Evolutionary Technique for Inverse Problems of Sub-65nm CMOS Devices. In: Kosina, H., Selberherr, S. (eds.): Book of Abstracts of the 11th International Workshop on Computational Electronics. Technische Universität Wien (TU Wien), Institute for Microelectronics, Vienna, Austria (2006) 69-70
9. Dupre, L., Slodicka, M.: Inverse problem for magnetic sensors based on a Preisach formalism. IEEE Trans. Mag. 40 (2004) 1120-1123
10. Li, Y., Yu, S.-M.: Comparison of Random Dopant-Induced Threshold Voltage Fluctuations in Nanoscale Single-, Double-, and Surrounding-Gate Field Effect Transistors. Jpn. J. Appl. Phys. 45 (2006) 6860-6865
11. Li, Y., Yu, S.-M.: Study of Threshold Voltage Fluctuations of Nanoscale Double Gate Metal-Oxide-Semiconductor Field Effect Transistors Using Quantum Correction Simulation. J. Comput. Elec. 5 (2006) 125-129
12. Li, Y., Chou, Y.-S.: A Novel Statistical Methodology for Sub-100 nm MOSFET Fabrication Optimization and Sensitivity Analysis. In: Extended Abstract of the 2005 Int. Conf. Solid State Devices and Materials (2005) 622-623
13. Cantú-Paz, E.: Efficient and Accurate Parallel Genetic Algorithms. Kluwer Academic Publishers, Boston (2000)
14. Goldberg, D. E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, New York (1989)
15. Thierauf, G., Cai, J.: Parallel evolution strategy for solving structural optimization. Eng. Struct. 19 (1997) 318-324
16. Schoneveld, A., de Ronde, J. F., Sloot, P. M. A.: Task Allocation by Parallel Evolutionary Computing. J. Paral. Distribu. Comput. 47 (1997) 91-97
17. Migdalas, A., Toraldo, G., Kumar, V.: Nonlinear optimization and parallel computing. Paral. Comput. 29 (2003) 375-391
18. Van Veldhuizen, D. A., Zydallis, J. B., Lamont, G. B.: Evolutionary computing and optimization: Issues in parallelizing multiobjective evolutionary algorithms for real world applications. In: Proc. ACM Symp. Appl. Computing (2002) 595-602
19. Nanda, P. K., Ghose, B., Swain, T. N.: Parallel genetic algorithm based unsupervised scheme for extraction of power frequency signals in the steel industry. IEE Proc.: Vision, Image and Signal Processing. 149 (2002) 204-210
20. Lee, C.-H., Parl, K.-H., Kim, J.-H.: Hybrid parallel, evolutionary algorithms for constrained optimization utilizing PC clustering. In: Proc. Congress on Evolutionary Computation. 2 (2001) 1436-1441
21. Cantú-Paz, E., Goldberg, D. E.: Efficient parallel genetic algorithms: theory and practice. Comput. Meth. Appl. Mech. Eng. 186 (2000) 221-238
22. High, K. A., LaRoche, R. D.: Parallel nonlinear optimization techniques for chemical process design problems. Comput. Chemical Eng. 19 (1995) 807-825
23. Li, Y., Cho, Y.-Y.: Intelligent BSIM4 Model Parameter Extraction for Sub-100 nm MOSFET Era. Jpn. J. Appl. Phys. 43 (2004) 1717-1722
24. Li, Y.: A Hybrid Intelligent Computational Methodology for Semiconductor Device Equivalent Circuit Model Parameter Extraction. In: Anile, A. M., Alì, G., Mascali, G. (eds.): Scientific Computing in Electrical Engineering. Springer-Verlag, Berlin Heidelberg New York (2006) 345-350