Program Schemes for Multilevel Flash Memories

MARCO GROSSI, MASSIMO LANZONI, AND BRUNO RICCÒ, FELLOW, IEEE

Invited Paper

This paper presents a brief overview of multilevel (ML) Flash memory program methods. The problem of program time increasing with the number of bits stored in each cell is discussed, and methods based on both channel hot electrons (CHE) and Fowler–Nordheim tunneling (FNT) are reviewed. In the case of CHE, the use of an increasing voltage rather than a constant one on the control gate (CG) leads to narrower threshold voltage distributions and smaller current absorption, with positive effects on degree of parallelism and program throughput. As for FNT, programming much faster than that commonly used today can be achieved using high CG voltages without producing intolerable degradation of cell reliability.

Keywords: Flash, memories, multilevel, programming.

Manuscript received July 1, 2002; revised January 5, 2003. The authors are with the Department of Electronics, Computer Science, and Systems, University of Bologna, 40136 Bologna, Italy (e-mail: [email protected]). Digital Object Identifier 10.1109/JPROC.2003.811714

I. INTRODUCTION

Emerging new applications for Flash memories (e.g., audio and video storage) have greatly increased the demand for high-density, low-cost memories. In this context, multilevel (ML) storage [1] allows more than one bit to be stored in each cell, thus offering a significant reduction in cost per bit for the same cell dimension. ML storage, however, implies more critical constraints in terms of program and sensing accuracy, charge retention, and read and write disturbs. In particular, accurate programming requires the placement of the right amount of charge on the cell floating gate (FG) to produce tight threshold voltage ($V_T$) distributions. If $n$ denotes the number of bits per cell, $2^n$ such distributions, adequately separated from each other, must cover a total voltage window (TVW) (in practice the difference between the highest and the lowest value of $V_T$) that tends to shrink with new technologies aimed at low-voltage operation.

Accurate charge placement is normally obtained by means of program and verify (P&V) algorithms featuring a sequence of small steps, each followed by a read operation to determine whether or not further programming is needed. This approach obviously leads to the required accuracy, provided that the individual program steps are small enough. On the other hand, precision is heavily paid for in terms of program throughput (PT), i.e., the number of bits that can be programmed per second, since the number of P&V steps increases with decreasing $V_T$ distribution widths. This, of course, is particularly true for increasing values of $n$ (3, 4, …), since the width of the distribution decreases essentially as $2^{-n}$ (for the same TVW). In spite of these problems, ML programming with 2 b/cell in both NOR [2], [3] and NAND [4], [5] technology is already a reality, while a substantial research effort is dedicated to the cases with $n$ = 3 and 4.

As for architectures, the NOR solution has so far been the mainstream Flash technology since: 1) it allows one to program cells by both channel hot electrons (CHE) and Fowler–Nordheim tunneling (FNT); and 2) the absence of serially connected cells allows faster programming and reading and avoids write disturbs (seriously affecting the NAND case). On the other hand, the NAND solution is gaining increasing interest due to: 1) its more compact layout (leading to higher memory density and lower cost per bit); and 2) the possibility to use very low (or even negative) $V_T$ values, thus effectively eliminating the problem of overerased cells and the consequent need for an erase and verify algorithm. A symmetrical problem exists in NAND memories for overprogramming. Since unselected cells become pass transistors, if the $V_T$ of a cell is too high, this can prevent it from turning on. The problem is, however, less important than overerase in ML memories, since high accuracy in programming must be guaranteed in both NOR and NAND architectures to allow many levels to be stored in the same TVW.

In the case of a NOR Flash memory, Fig. 1 illustrates the $V_T$ distributions required for 4, 8, and 16 levels, respectively. The need to avoid read disturbs due to excessively low $V_T$ values, as well as undesired programming of low-$V_T$ cells during reading, imposes a minimum and a maximum $V_T$ value, thus effectively determining the TVW.
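To make the scaling with the number of bits per cell concrete, the following sketch divides a fixed TVW evenly among $2^n$ levels. The even division and the function name `level_pitch` are illustrative assumptions, not taken from the paper; the 4.5 V window is the value quoted for Fig. 1.

```python
def level_pitch(tvw_volts: float, bits_per_cell: int) -> float:
    """Voltage available per level when 2**n levels share one TVW evenly.

    Assumption: levels are equally spaced; real designs may not be.
    """
    levels = 2 ** bits_per_cell
    return tvw_volts / levels


for n in (2, 3, 4):
    pitch_mv = level_pitch(4.5, n) * 1000.0
    print(f"{n} b/cell: {2 ** n} levels, {pitch_mv:.2f} mV per level")
```

Each per-level slot has to hold both the distribution width and the separation margin to the next level, which is why every added bit per cell roughly halves the tolerable $V_T$ spread.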

0018-9219/03$17.00 © 2003 IEEE


PROCEEDINGS OF THE IEEE, VOL. 91, NO. 4, APRIL 2003

Fig. 1. $V_T$ distributions for ML programming of NOR Flash memory in a TVW of 4.5 V. (a) Four-level programming. (b) Eight-level programming. (c) 16-level programming.

In the case of Fig. 1, where a TVW of 4.5 V is considered, the maximum gate voltage applied during reading is 5.25, 5.4, and 5.85 V for 4, 8, and 16 levels, respectively. In the NAND architecture, Fig. 2 illustrates the $V_T$ distributions for the eight-level NAND memory discussed in [5]: the distributions are well separated (0.4 V), and, although the maximum voltage applied to nonselected word-lines in reading is 6 V (a trade-off between fast reading and device reliability), a reliable and efficient device is achieved.

II. MULTILEVEL PROGRAM METHODS

Flash memory programming is achieved by injecting electrons into the FG. This can be obtained by means of two different physical mechanisms.

1) CHE: electrons in the channel of the cell MOSFET gain enough energy from the driving electric field to be injected into the FG (helped by the vertical electric field).
2) FNT: electrons are injected into the FG by tunneling due to the high vertical electric field.

Compared with FNT, CHE requires lower voltages, with benefits for the driving circuitry and device reliability, but is also characterized by large current absorption that limits the degree of parallelism (DOP) and is problematic for low-power applications. In the following sections, program methods for both CHE and FNT are briefly discussed.

GROSSI et al.: PROGRAM SCHEMES FOR MULTILEVEL FLASH MEMORIES

III. CHANNEL HOT ELECTRONS

NOR Flash memories can be programmed by CHE using two different techniques: 1) conventional box programming; and 2) ramped voltage programming. In the former method, a constant voltage is applied on the CG during the whole operation, while in the latter the CG voltage is raised linearly during programming.
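The qualitative difference between the two waveforms can be illustrated with a toy simulation. The sketch below assumes a simple charge-balance model in which the FG voltage follows the CG ramp through a coupling ratio and is lowered by the injection current, with the drain voltage held constant. The exponential injection law and every numerical value are assumptions chosen only to make the example run; they are not data from the paper.

```python
import math

C_T = 1e-15                        # total FG capacitance in farads (assumed)
ALPHA = 0.6                        # CG-to-FG coupling ratio (assumed)
I0, V_REF, V0 = 6e-11, 5.0, 0.3    # toy injection-law parameters (assumed)


def injection_current(v_fg):
    """Toy law: injection current grows exponentially with the FG voltage."""
    return I0 * math.exp((v_fg - V_REF) / V0)


def simulate(v_fg_start, cg_slope, t_end=50e-6, dt=1e-9):
    """Forward-Euler integration of dV_FG/dt = ALPHA*cg_slope - I_G/C_T.

    Returns the injection current sampled at every time step.
    """
    v_fg = v_fg_start
    currents = []
    for _ in range(int(round(t_end / dt))):
        i_g = injection_current(v_fg)
        currents.append(i_g)
        v_fg += (ALPHA * cg_slope - i_g / C_T) * dt
    return currents


box = simulate(v_fg_start=6.0, cg_slope=0.0)    # constant CG voltage
ramp = simulate(v_fg_start=5.5, cg_slope=1e5)   # 0.1 V/us CG ramp
print(f"box : I_G from {box[0]:.2e} A to {box[-1]:.2e} A")
print(f"ramp: I_G from {ramp[0]:.2e} A to {ramp[-1]:.2e} A")
```

In this model the box waveform starts with a large injection current that keeps decaying, whereas the ramp settles to a constant current equal to `ALPHA * C_T * cg_slope`, which is the equilibrium behavior the ramped technique relies on.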


Fig. 2. Target $V_T$ distributions for eight-level NAND Flash memory. The picture is taken from [5].

The FG voltage ($V_{FG}$) and the injection current into the FG ($I_G$) are linked by the following equation [6]:

$$\frac{dV_{FG}}{dt} = \frac{C_{CG}}{C_T}\,\frac{dV_{CG}}{dt} + \frac{C_D}{C_T}\,\frac{dV_D}{dt} - \frac{I_G}{C_T} \qquad (1)$$

where $C_{CG}$ is the FG to CG capacitance; $C_D$ is the FG to drain capacitance; and $C_T$ is the total capacitance between FG and the other MOSFET regions.

In conventional box programming, $dV_{CG}/dt = 0$ (and the drain voltage is constant during the pulse), thus $dV_{FG}/dt = -I_G/C_T$. Since $I_G$ decreases with decreasing $V_{FG}$, both the programming speed ($dV_T/dt$) and $I_G$ are high at the beginning of programming, but decrease with program time and reach a low value at the end of the operation, as schematically illustrated in Fig. 3(a) [6]. This behavior represents a problem because high values of the injection current (hence of the drain current) limit the DOP, thus the PT. Moreover, the strongly nonuniform programming speed produces a (relatively) wide dispersion in programmed $V_T$ distributions.

With ramped voltage programming, instead, $dV_{CG}/dt$ is constant (equal to the slope of the gate bias waveform). If the initial value of the ramp applied to CG is set appropriately, the write operation takes place under equilibrium conditions ($dV_{FG}/dt = 0$), where both $V_{FG}$ and $I_G$ are constant (from (1), $I_G = C_{CG}\,dV_{CG}/dt$), as schematically illustrated in Fig. 3(b) [6]. Qualitative waveforms of the CG and drain voltages for ramped voltage programming are sketched in Fig. 4(a) [6], while Fig. 4(b) [6] shows the expected transient behavior of $V_{FG}$ and $I_G$, including the time necessary to reach the equilibrium condition. In Fig. 4(c) [6], the expected waveforms for $V_T$ and the drain current are schematically described. As already mentioned, a constant programming current helps to maximize DOP, hence PT. Furthermore, the linear relationship between programmed $V_T$ and program time produces a better accuracy in programming, hence tighter $V_T$ distributions.

Fig. 3. Conceptual plots of the programming speed as a function of the FG voltage and corresponding typical behavior of the FG voltage during programming operation where the CG has (a) a box waveform or (b) a ramp waveform.

$V_T$ distribution widths obtained with ramped voltage programming depend on programming conditions, i.e., drain and substrate bias ($V_D$ and $V_B$, respectively) as well as on the ramp slope. Fig. 5 shows the standard deviation ($\sigma$) of the programmed $V_T$ distribution measured on 10 K cells as a function of the ramp slope, for different values of $V_D$ and $V_B$. For all considered bias configurations, the minimum $\sigma$ is obtained at low program speeds (low ramp slope), and $\sigma$ increases with the slope. Thus, a tradeoff is in order between high program speeds and good accuracy in achieving the final $V_T$ value.

From this point of view, the ramped voltage programming technique has been shown to be able to program a Flash memory array on four levels (2 b/cell) without the need for P&V algorithms [7], with substantial benefits for PT. In particular, assuming the same DOP (256), the method of [7] results in a PT of 0.8 MB/s, instead of the 0.17 MB/s achieved in [2]. The obtained $V_T$ distributions are well separated, and the minimum read margin (i.e., the difference between the cell $V_T$ and the gate bias used in reading) is 0.4 V. Also, after 20 K program/erase (P/E) cycles, the read margin does not degrade much; thus, the reliability constraints for the memory are guaranteed.

However, programming the memory on eight or more levels without P&V algorithms requires a significant increase in TVW that is not compatible with desirable circuit specifications. On the other hand, the use of ramped voltage programming in conjunction with P&V is problematic, because before each program step the exact value of the cell $V_T$ must be determined in order to set the correct initial value of the CG ramp. Since $V_T$ determination is a time-consuming operation, ramped voltage programming with P&V is more convenient than conventional box programming only if a minimum number of verifications is used.

A new programming method that combines ramped voltage programming with verify operations is described in [8]. With this algorithm, programming is performed using only two steps, each preceded by a $V_T$ determination. In detail, and with reference to Fig. 6, the program algorithm consists of the following steps. First, the initial $V_T$ of the cell is determined. Second, the cell is programmed from this initial value to an intermediate target value using a ramped CG voltage with a given slope and the same overdrive for all cells. Third, the $V_T$ obtained after this program step is determined. Fourth, the cell is programmed from this measured value to the final target value with a ramped CG voltage whose overdrive is adjusted on the basis of the measured value. The determination of the initial $V_T$ guarantees quasi-equilibrium conditions during the first program operation, thus avoiding initial high current absorption and loss of accuracy. The determination of the intermediate $V_T$, instead, allows one to adjust the program overdrive to account for the characteristics of each individual cell, and
represents the essential element to obtain adequate program accuracy. The algorithm is capable of achieving $V_T$ distribution widths and displacement of the distribution mean value from the targets smaller than 150 and 20 mV, respectively. This method is adequate for 3 b/cell ML schemes while, for the case of 4 b/cell, the separation between $V_T$ distributions is probably insufficient for direct use in real memories, although the adoption of error correcting codes makes it possible to use it also for 16-level schemes. The achieved program time at cell level is nearly six times lower than that obtained with the algorithm of [2] for 4 b/cell (70.75 instead of 400 µs); with a cell matrix scheme featuring DOP = 256 and parallel analog determination of the cells' $V_T$, this results in a PT about three times larger (0.9 instead of 0.32 MB/s).

Fig. 4. (a) Qualitative waveforms of the CG and drain voltages for ramped voltage programming scheme as well as corresponding behavior of (b) $V_{FG}$, $I_G$ and (c) $V_T$, $I_D$.

Fig. 5. Dependence of $\sigma$ on the ramp slope for 10 K cells at different $V_D$ and $V_B$. BWP indicates the $\sigma$ for box programming.

IV. FOWLER–NORDHEIM TUNNELING

Compared with CHE, this programming method has the advantage of small current absorption, particularly interesting for low-power applications. Moreover, it allows very high DOP, thus leading to a strong increase in PT. In this regard, the NAND state of the art (based on FNT programming) produces a PT as high as 10 MB/s [9]. However, as described in [10], FNT has several drawbacks that make it less effective than CHE for ML applications. In
particular, programming by tunneling is more sensitive than CHE to process parameters, and this produces wider $V_T$ distributions. Furthermore, the applied voltages are higher than with CHE, and this produces high stress in the oxide, resulting in worse device reliability. In this regard, Fig. 7 shows the read disturb time, i.e., the time to produce a 0.5-V $V_T$ shift due to drain stress, as a function of the number of P/E cycles [11]. Thus, since the applied voltages cannot be too high, programming currents are low; this leads to high programming times (in the range of 10 ms as opposed to the few µs for CHE programming). To maintain competitive PT, highly parallel programming is required, and this leads to high circuit complexity and die-size overhead, although parallel programming for FNT is simpler to implement than for CHE.

Compared to CHE, FNT tends to produce wider $V_T$ distributions and higher programming time; thus, efficient P&V algorithms are needed in ML programming to guarantee good program accuracy and PT. In [12], three different P&V algorithms (schematically shown in Fig. 8) are presented for a NAND Flash memory. Fig. 8(a) illustrates the conventional P&V technique where pulses of variable width are applied on the CG, while a verify operation is carried out between two write pulses. The first write pulses are sufficiently short to ensure that fast cells will not be overprogrammed; then the pulse width is increased to minimize the number of verify steps for slow cells. Fig. 8(b) shows the trapezoidal pulse algorithm that achieves much better results than that of Fig. 8(a). Higher programming speed can be obtained, while the oxide electric field ($F_{ox}$) can be reduced. Moreover, the increase in programming time as the $V_T$ distribution width is reduced is much weaker than in the previous case. Fig. 8(c) instead shows the staircase pulse algorithm that uses the same approach as in Fig. 8(b) but is much easier to generate on-chip.

In Fig. 9, the main characteristics of both FNT and CHE are compared. Since the advantages of fewer disturbs and lower electric fields are more important than the large DOP allowed by FNT, CHE seems to be more suitable for ML applications, at least when low power consumption is not the main constraint.

Of course, with FNT it is possible to reduce program time ($T_P$) by increasing $F_{ox}$, thus trading off $T_P$ and device reliability. In this regard, stress-induced leakage current (SILC), degrading data retention time, is the main phenomenon, and it has conventionally been considered to increase with $F_{ox}$, thus with the decrease of $T_P$ [13] (for the same charge fluence, i.e., total charge injected through the oxide). However, recent studies [14] have shown that, for the same charge fluence, SILC initially increases with decreasing $T_P$, but it tends to decrease with $T_P$ as the stress time becomes comparable to the characteristic time required for permanent oxide degradation.

Fig. 6. Representation of the novel algorithm that combines ramped voltage programming with verify operations. Inside the boxes, the CG voltage during the two program steps is shown.

Fig. 7. A comparison between the read disturb due to CHE and FNT programming, as a function of P/E cycles. The picture is taken from [11].

Fig. 10 shows SILC characteristics of Flash memory cells as a function of $F_{ox}$ and $T_P$ for different program conditions. Fig. 10(a) shows that SILC after 10 K P/E cycling with $T_P$ = 20 ns is not much larger than the one obtained with $T_P$ = 30 µs. Instead, Fig. 10(b) shows that SILC stops increasing for $T_P$ around 1 µs and (slightly) decreases for $T_P$ below such a value. This shows that FNT programming of Flash memory with $T_P$ as low as 20 ns is feasible, with good results in terms of data retention, provided that a sufficiently low voltage is applied during reading. In this regard, in Fig. 11 the maximum read disturb voltage compatible with a data retention time of ten years after 10 K P/E cycles is shown as a function of $T_P$. For $T_P$ as low as 20 ns, this maximum value is about 2.5 V.

However, a significant problem for FNT is due to the high voltages needed for fast programming [in the case of Fig. 10(a), for $T_P$ = 20 ns, the required voltage is 26.5 V], since this leads to challenging constraints for the high-voltage programming circuit. Scaling the oxide thickness has favorable effects because it decreases the required voltage for the same oxide field, but also produces a drastic decrease in data retention time. In [15], measurements performed on 6.5-nm oxide Flash memories have shown a data retention time of 13 hours after 10 K P/E cycles with a maximum read disturb voltage of 2.5 V during reading. Such a retention time is small compared to the ten-year retention of conventional nonvolatile memories, but it is more than three orders of magnitude greater than typical DRAM refresh time, thus making fast FNT potentially interesting for DRAM-like applications.
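As a rough cross-check of the throughput figures quoted in Sections III and IV, PT can be estimated as the number of bits written in parallel divided by the cell-level program time. This formula and the helper name `program_throughput_mb_s` are illustrative assumptions: the estimate ignores verify, $V_T$ determination, and data-transfer overhead, and takes 1 MB as $10^6$ bytes. Only the input numbers come from the text.

```python
def program_throughput_mb_s(dop, bits_per_cell, cell_time_s):
    """Estimated throughput in MB/s when `dop` cells are programmed in parallel.

    Assumption: one parallel operation takes exactly `cell_time_s` seconds.
    """
    bytes_per_op = dop * bits_per_cell / 8
    return bytes_per_op / cell_time_s / 1e6


# 4 b/cell, DOP = 256, 400 us per cell (the figures quoted for [2])
print(program_throughput_mb_s(256, 4, 400e-6))
# Same array with the 70.75 us cell-level time of the two-step scheme
print(program_throughput_mb_s(256, 4, 70.75e-6))
```

The first estimate comes out at about 0.32 MB/s, in line with the value quoted for [2]. The second is about 1.8 MB/s, which should be read as an upper bound: the 0.9 MB/s quoted for the two-step scheme presumably also accounts for the $V_T$ determination steps that this estimate leaves out.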

Fig. 8. Conventional (a), trapezoidal (b), and staircase (c) programming pulses. A verify step is carried out after each pulse. The picture is taken from [12].
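The staircase scheme of Fig. 8(c) can be sketched as a loop that applies a pulse, verifies, and raises the CG voltage by a fixed step until the cell reaches its verify level. The cell model below is hypothetical: each pulse is assumed to raise $V_T$ at most to the CG voltage minus a per-cell offset, which stands in for fast and slow cells. All names and numbers are chosen only to make the loop runnable, and the sketch does not reproduce the algorithm of [12] in detail. Voltages are in integer millivolts to avoid rounding surprises.

```python
def staircase_program(v_t_mv, target_mv, offset_mv,
                      v_start_mv=14000, step_mv=200, max_pulses=100):
    """Program-and-verify with a staircase CG voltage (idealized).

    v_t_mv    : starting threshold voltage of the cell
    target_mv : verify level the cell must reach
    offset_mv : hypothetical per-cell parameter; a pulse at v_cg can raise
                V_T at most to v_cg - offset (smaller offset = faster cell)
    Returns (final V_T in mV, number of pulses applied).
    """
    v_cg = v_start_mv
    for pulse in range(1, max_pulses + 1):
        v_t_mv = max(v_t_mv, v_cg - offset_mv)   # idealized program pulse
        if v_t_mv >= target_mv:                  # verify after each pulse
            return v_t_mv, pulse
        v_cg += step_mv                          # next stair
    raise RuntimeError("cell did not reach the verify level")


for name, offset in (("fast", 12050), ("typical", 13010), ("slow", 13900)):
    v_t, pulses = staircase_program(v_t_mv=0, target_mv=2000, offset_mv=offset)
    print(f"{name:8s} V_T = {v_t} mV after {pulses} pulses")
```

In this idealized model every cell stops less than one step above the verify level, so the step size bounds the width of the programmed distribution while the number of pulses sets the program time.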

Fig. 10. SILC characteristics of the Flash memory cells after 10 K P/E cycling (a) as a function of $F_{ox}$ for different program conditions and (b) as a function of $T_P$.

Fig. 9. Comparison of FNT and CHE programming mechanisms for ML applications. The picture is taken from [11].

V. CONCLUSION

This paper has presented a brief review of different program techniques for ML Flash memories based on both CHE injection and FNT. In the case of CHE, ramped voltage programming has been shown able to achieve tighter $V_T$ distributions and higher program throughput than the conventional box technique. In fact, programming on four levels is feasible without the use of P&V algorithms. With 8 or 16 levels, instead, P&V is mandatory, and problems arise because of the difficulty of combining ramped voltage programming with verify operations. In the case of FNT, fast programming with a pulse duration of 20 ns seems able to produce very high PT (comparable with DRAMs). However, problems occur because of the need to use high-voltage circuitry and/or the reduction of data retention time due to decreased tunnel oxide thickness. For these reasons, fast FNT seems more suitable for DRAM-like applications than for conventional nonvolatile memories.

Fig. 11. Maximum read disturb voltage that still guarantees a data retention time of 10 years versus $T_P$ after 10 K P/E cycles.

REFERENCES

[1] B. Riccò, G. Torelli, M. Lanzoni, A. Manstretta, H. Maes, D. Montanari, and A. Modelli, “Nonvolatile multilevel memories for digital applications,” Proc. IEEE, vol. 86, pp. 2399–2421, Dec. 1998.
[2] A. Silvagni, S. Zanardi, A. Manstretta, and M. Scotti, “Modular architecture for a family of multilevel 256/192/128/64 Mbit 2-bit/cell 3 V only NOR Flash memory devices,” IEEE Trans. Electron Devices, vol. 48, pp. 937–940, Jan. 2001.
[3] M. Bauer, “A multilevel-cell 32 Mb Flash memory,” in IEEE ISSCC Tech. Dig., 1995, pp. 132–133.
[4] T.-S. Jung, Y.-J. Choi, and K.-D. Suh, “A 117 mm² 3.3 V only 128 Mb multilevel NAND Flash memory for mass storage applications,” IEEE J. Solid-State Circuits, vol. 31, pp. 1575–1583, Nov. 1996.
[5] H. Nobukata, S. Takagi, and K. Hiraga, “A 144-Mb, eight-level NAND Flash memory with optimized pulsewidth programming,” IEEE J. Solid-State Circuits, vol. 35, pp. 682–690, May 2000.
[6] D. Esseni, A. D. Strada, P. Cappelletti, and B. Riccò, “A new and flexible scheme for hot-electron programming of nonvolatile memory cells,” IEEE Trans. Electron Devices, vol. 46, pp. 125–133, Jan. 1999.
[7] R. Versari, D. Esseni, G. Falavigna, M. Lanzoni, and B. Riccò, “Optimized programming of multilevel Flash EEPROMs,” IEEE Trans. Electron Devices, vol. 48, pp. 1641–1646, Aug. 2001.
[8] M. Grossi, M. Lanzoni, and B. Riccò, “A novel algorithm for high throughput programming of multi-level Flash memories,” IEEE Trans. Electron Devices, submitted for publication.
[9] H. Nakamura, K. Imamiya, and T. Himeno, “A 125 mm² 1 Gb NAND Flash memory with 10 MB/s program throughput,” in IEEE ISSCC Tech. Dig., vol. 1, 2002, pp. 106–450.
[10] B. Eitan, R. Kazerounian, A. Roy, G. Crisenza, P. Cappelletti, and A. Modelli, “Multilevel Flash cells and their trade-offs,” in IEEE IEDM Tech. Dig., 1996, pp. 169–172.
[11] B. Eitan and A. Roy, “Binary and multilevel Flash cells,” in Flash Memories, P. Cappelletti, C. Golla, P. Olivo, and E. Zanoni, Eds. Boston, MA: Kluwer, 1999, pp. 91–152.
[12] G. Hemink, T. Tanaka, and T. Endoh, “Fast and accurate programming method for multi-level NAND EEPROM’s,” in Symp. VLSI Technology Dig. Tech. Papers, 1995, pp. 129–130.
[13] R. Moazzami and C. Hu, “Stress-induced current in thin silicon dioxide film,” in IEEE IEDM Tech. Dig., 1992, pp. 139–141.
[14] R. Versari, A. Pieracci, D. Morigi, and B. Riccò, “Fast tunneling programming of nonvolatile memories,” IEEE Trans. Electron Devices, pp. 1285–1287, June 2000.
[15] R. Versari, A. Pieracci, and B. Riccò, “Fast programming/erasing of thin-oxide EEPROMs,” IEEE Trans. Electron Devices, pp. 817–819, Apr. 2001.

Marco Grossi was born in Bologna, Italy, in 1973. He received the Laurea degree in electronic engineering from the University of Bologna in 2000. He is currently working toward the Ph.D. degree at the Department of Electronics, Computer Science, and Systems Laboratory, University of Bologna. His research interest is the characterization of nonvolatile memories. He is currently working in the field of Flash memories and the multilevel programming of these memories using the ramped gate technique.


Massimo Lanzoni was born in Bologna, Italy, in 1961. He received the Laurea degree in electronic engineering from the University of Bologna, Bologna, Italy, in 1987. He is with the Microelectronics Research Group, Department of Electronics, Computer Science, and Systems, University of Bologna, working on research projects in the fields of nonvolatile memories, MOS devices, virtual instrumentation, and testing. His research interests include the characterization of thin dielectrics reliability, nonvolatile memory cell characteristics and reliability, experimental characterization of MOS transistors, and new techniques for IC testing, such as nonvolatile memory endurance testing and CMOS IC latch-up testing. He is now involved in projects concerning analog applications of nonvolatile memories and multilevel programming.

Bruno Riccò (Fellow, IEEE) was born in Parma, Italy, in 1947. He received the Laurea degree in electrical engineering from the University of Bologna, Bologna, Italy, in 1971 and the Ph.D. degree from the University of Cambridge, Cambridge, U.K., in 1976, where he worked at the Cavendish Laboratory. In 1980, he was a Full Professor of Electronics at the University of Padova, Padova, Italy. In 1983, he was a Full Professor of Electronics at the University of Bologna. In 1983 and 1986, he was Visiting Professor at the University of Stanford, Stanford, CA; at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY; and at the University of Washington, Seattle. He is currently with the Department of Electronics, Computer Science, and Systems, University of Bologna. He has also been a Consultant for major companies and for the Commission of the European Union in the definition, evaluation, and review of research projects in microelectronics. He is author or coauthor of more than 300 publications (more than half of which have been published in major international journals), three books, and six patents in the field of nonvolatile memories. His research interests include solid-state devices and ICs. He is currently also working in the field of IC design, evaluation, and testing. Prof. Riccò has been President of the Group of Electron Devices, Technologies, and Circuits of the Italian Association of Electrical and Electronics Engineers (AEI) since 1996, and was President of the Italian Group of Electronics Engineers from 1998 to 2001. In 1996, he received the G. Marconi Award from the AEI. He was European Editor of the IEEE TRANSACTIONS ON ELECTRON DEVICES from 1986 to 1996, European Cochair at the International Electron Device Meeting (IEDM) from 1992 to 2001, and Vice-Chairman of the North Italy Section of IEEE from 1999 to 2001. He has been Chairman of the IEEE North Italy Section since 2002.
