ASP-DAC 2011 26th January 2011 Yokohama, Japan
Non-Volatile Memory and Normally-Off Computing
T. Kawahara Central Research Laboratory, Hitachi, Ltd.
T.Kawahara, Hitachi
Acknowledgements
Original works cited in this presentation were supported in part by:
• The “High-Performance Low-Power Consumption Spin Devices and Storage Systems” program (headed by Professor Hideo Ohno of Tohoku University) under Research and Development for Next-Generation Information Technology of MEXT,
• One of the projects on “Fundamental Technologies for the Next Generation Supercomputing” by MEXT, and
• The Japan Society for the Promotion of Science (JSPS) through its “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program).”
The author would like to thank:
Tohoku University: S. Ikeda, T. Meguro, R. Sasaki, M. Yamanouchi, I. Morita, T. Hirata, T. Hanyu, and H. Ohno.
Hitachi: R. Takemura, K. Ono, T. Ishigaki, T. Yamada, A. Kotabe, S. Hanzawa, S. Yamaguchi, K. Miura, H. Yamamoto, J. Hayakawa, N. Matsuzaki, Y. Mouri, K. Ito, H. Takahashi, H. Hasegawa, Y. Goto, N. Osakabe, and H. Matsuoka.
Outline
• Introduction
• Non-volatile memories
• Spin-transfer torque RAM scalability
• Normally-off instant-on computing
• Conclusion
Social system needs large power
• The world is power hungry, but...
[Figure: Power-consuming social systems: home, office, city infrastructure (lighting), distribution, travel, tracking, house, health, entertainment, study, and events, connected by networks such as LAN and small area networks.]
Necessity of low power IT
• In Japan, the power consumption of IT equipment in 2025 is expected to be five times that of 2005.
• Worldwide, it is expected to reach nine times.
[Figure: Power Dissipation in Japan (BkWh/year, 0 to 25) by year (2005 to 2025) for network equipment (routers, switches), servers, PCs, and TVs. Source: Ministry of Economy, Trade and Industry (METI).]
Sustainable and innovative society
• LSI technology is not the only solution, but LSI can contribute in many ways to a sustainable world.
[Figure: Power consumption versus year (1990 to 2050). The conventional trend, with conventional nano-transistors, low voltage, multi-core, etc., is contrasted with lower power by innovation (new: Normally-OFF, Instant-ON), the direction for a sustainable and innovative society. The target is the carbon emission target of the 2007 G8 Summit at Heiligendamm, Germany (June 2007).]
Beyond Low Voltage Operation
• Extensive progress has been made over the last twenty years toward low-voltage operation with high performance.
• A complementary, new kind of innovation is needed for further progress.
[Figure panels:
1993: Projected Current Consumption (Hitachi), T. Sakata, et al., VLSI93. Current (A, 10^-6 to 10^1) versus DRAM capacity (16 Mb to 64 Gb) and VDD (3.3 V to 0.8 V), showing IACT, IAC, and IDC, with 1.2 A marked; tRC = 180 ns, T = 75°C, S = 97 mV/decade; extrapolated VT at 25°C from 0.53 V to 0.13 V; Wtotal/Leff from 2.0 to 470 (×10^6).
1993: Well-driven SA (Mitsubishi), T. Ooishi, et al., VLSI93.
1993: 256Mb DRAM (Hitachi), G. Kitsukawa, et al., ISSCC93.
2004: Dopeless Channel CMOS (SOTB) (Hitachi), R. Tsuchiya, et al., IEDM04. NMOS/PMOS cross section with NiSi (FUSI) gate, source, drain, STI, SOI, BOX, well (back gate), back-bias terminal, and substrate.]
Memory and Computing
• Memory systems are built as a deep hierarchy based on speed and capacity, and volatile and nonvolatile memories are often combined.
• This leads to slow start-up, long idle times, and low power efficiency.
[Figure: Memory hierarchy. Volatile: FF/register, cache, main memory. Non-volatile: HDD, SSD. Speed runs from fast to slow and density from low to high down the hierarchy, with access times labeled ~ns, ~10 ns, ~100 ns, and ~ms. A big difference in speed and in volatile/NV separates the two groups.]
Normally OFF, Instant ON Computing
• A new innovation in addition to low voltage.
• Normally OFF, Instant ON
– Turn off anytime when not in use,
– but operate instantly with full performance when needed.
• The data in the operating state must be stored before any power-off, with zero power for retaining it.
– > Nonvolatile RAM (NV-RAM)
• NV-RAM: an infinite number of fast write and read operations, with non-volatility.
• Adaptive solutions over a wide time domain and a wide area domain, from the LSI level to the system level.
Example (but not ideal yet)
• Power-ON time is reduced to 1/9 with NV-RAM.
• Non-volatility also achieves low power.
[Figure: Power-ON time (s, 0 to 45) and stand-by power (mA, 0 to 100) for four cases.
HDD+DRAM, normal power-ON: 45 s, including OS data transfer and expansion from HDD to DRAM.
SSD+DRAM, normal power-ON: 25 s.
DRAM resume from stand-by: data is stored in RAM; stand-by power 100 mA.
With NV-RAM (RAM: 1 GB, all the data stored): power-ON in 5 s; stand-by power 0.]
RAM Comparison
• NV-RAM: infinite number of write cycles, non-volatile, fast R/W.
• Magnetic RAMs are good candidates for NV-RAM.
[Figure: General Memory Performance Map. One panel: endurance (10^5 to 10^20, with 3-year and 10-year endurance lines) versus write time (1 ns to 10 µs) for SRAM/DRAM (volatile), magnetic RAM, FeRAM, PRAM, and FLASH. Other panel: read latency (1 ns to 100 µs) versus cell area (F², ∝ 1/density; 0 to 30, with SRAM at 100-150) for DRAM, SRAM, magnetic RAM, FeRAM, PRAM, FLASH (NOR), and FLASH (NAND).]
Origin of Non-volatility
• Various phenomena are applicable: an insulator barrier, a change of structure (bistable), and alignment of electron condition (bistable).
[Figure: Insulator barrier: floating gate, silicon nitride (Si3N4 between gate, S, and D; Bit1, Bit2), nano-dots. Change of structure: phase change. Alignment of electron condition: ferroelectrics, and ferromagnetism (3d; plus magnetoresistive effect for memory). Also shown: ionization.]
-> Easier to achieve infinite write cycle (endurance).
Phase Change Memory
• Resistance changes between the amorphous and polycrystalline phases of a chalcogenide material. The resistance change is large.
• Good scalability.
[Figure: Memory cell with plate, chalcogenide (polycrystalline and amorphous regions), electrode, bitline, and wordline. Write current (0.1 mA to 10 mA) versus electrode diameter φ (10 nm to 1 µm), S. Lai, IEDM 2003. Temperature versus time for the RESET pulse (M.P., T1) and SET pulse (C.T., T2). Current (a.u., 0 to 120) versus voltage (0 to 1.2 V) showing crystalline “0”, amorphous “1”, Vth, and the resistance change, from Ovonyx HP. Phase change region, Y.N. Hwang et al., IEDM 2003.]
Development of Phase Change Memory
• The principle was announced in the 1970s.
• Production as a NOR flash replacement began in recent years.
[Figure: Timeline from the 1970s and 2002 to 2010.
Product: 128Mb and 512Mb (Numonyx (Micron), Samsung).
Conference, stand-alone: 256b (ECD, Intel), 4Mb (Ovonyx, Intel), 4Mb and 8Mb (STMicro.), 64Mb, 256Mb, and 512Mb (Samsung), 256Mb, 1Gb (Numonyx).
Conference, embedded: 4Mb (Hitachi/Renesas), 4Mb (STMicro.).]
Phase Change Memory Chips
Stand-alone (Numonyx)
• Process: 45 nm
• Power supply: 1.8 V
• Density: 1 Gb
• Chip size: 37.5 mm²
• Cell size: 0.011 µm² (5.5F²)
• Cell Tr: BJT
• Endurance: ~10^9
C. Villa, et al., ISSCC 2010; G. Servalli, IEDM 2009
Embedded (STMicroelectronics)
• Process: 90 nm
• Power supply: 1.2 V
• Density: 4 Mb
• Chip size: 3 mm²
• Cell size: 0.29 µm² (36F²)
• Cell Tr: MOS
• Endurance: ~10^7
G. De Sandre, et al., ISSCC 2010; R. Annunziata, et al., IEDM 2009
[Figures: memory cell and chip photo for each chip.]
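The cell sizes quoted for these chips are given both in µm² and in units of F², where F is the process feature size. A minimal sketch of that conversion follows; the function name is illustrative, and F is assumed to equal the quoted process node.

```python
def cell_area_in_f2(cell_area_um2: float, feature_nm: float) -> float:
    """Express a cell area (in um^2) in units of F^2, with F given in nm."""
    f_um = feature_nm / 1000.0          # feature size in micrometres
    return cell_area_um2 / (f_um ** 2)  # dimensionless multiple of F^2


# Values quoted on the slide (F assumed equal to the process node).
pcm_standalone = cell_area_in_f2(0.011, 45)  # quoted as 5.5F^2
pcm_embedded = cell_area_in_f2(0.29, 90)     # quoted as 36F^2

print(f"stand-alone: {pcm_standalone:.1f} F^2, embedded: {pcm_embedded:.1f} F^2")
```

The results land close to the quoted 5.5F² and 36F²; small differences come from rounding of the quoted cell areas.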
Stacked Phase Change Memory
• The selecting element and the memory device are stacked.
• The selecting element is made of non-substrate Si, formed on the wiring.
DerChang Kau, et al., IEDM 2009: a switch made of phase-change material itself (Numonyx). [Figure: chip, memory cell, I-V characteristics.]
Y. Sasago, et al., VLSI Tech. 2009: polysilicon diode formed on wiring. Current drivability: 160 µA at 30 nm. [Figure: cross-sectional view, memory cell.]
TMR Device and Memory cell
• Resistance changes between the parallel and anti-parallel states.
• These two states are used as an information bit.
[Figure: Circuit diagram: a TMR device (free layer (CoFeB), MgO barrier, pinned layer (CoFeB)) and a selecting transistor, with bit line (BL), word line (WL), and source line (SL). Parallelized state: low resistance, RP. Anti-parallelized state: high resistance, RAP.]
TMR ratio = (RAP - RP) / RP
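As a small illustration of the TMR ratio definition above, the sketch below computes the ratio from the two resistances and, inversely, the anti-parallel resistance from RP and a given ratio. The function names are illustrative; the example values (RP of 2 kΩ with a 100% ratio, and a 604% ratio) are figures that appear elsewhere in this deck.

```python
def tmr_ratio(r_ap: float, r_p: float) -> float:
    """TMR ratio = (RAP - RP) / RP, returned as a fraction (1.0 means 100%)."""
    return (r_ap - r_p) / r_p


def r_ap_from_tmr(r_p: float, ratio: float) -> float:
    """Anti-parallel resistance implied by RP and a TMR ratio (fraction)."""
    return r_p * (1.0 + ratio)


# RP = 2 kOhm and TMR ratio = 100% give RAP = 4 kOhm.
print(r_ap_from_tmr(2000.0, 1.0))

# A 604% TMR ratio means RAP is 7.04 times RP.
print(r_ap_from_tmr(1.0, 6.04))
```

A larger ratio widens the gap between the two read levels, which is why the MgO barrier matters for sensing margin.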
Innovation in TMR device: MgO
• The MgO barrier provides two breakthroughs:
• a high TMR ratio and a small write current.
[Figure: TMR ratio (%, 0 to 800) versus year (1995 to 2010) for Al2O3-barrier MTJs (free layer, amorphous AlO barrier, pinned layer) and (100) bcc MgO-barrier MTJs (free layer (CoFeB), crystalline MgO barrier, pinned layer (CoFeB)), with results from IBM, AIST/Canon ANELVA, and Hitachi & Tohoku Univ., reaching 604% (RT). In-plane and perpendicular devices are shown. TMR ratio = (RAP - RP) / RP.]
SPRAM (SPin-transfer torque RAM)
• MRAM: mature as a product, but inferior in scalability.
• SPRAM: good potential for scalability.
[Figure: Conventional MRAM cell (BL, MTJ, SL, WWL, RWL): write current Ic (mA/bit, 0 to 20) versus TMR device width (10 nm to 1000 nm), showing IBL, IWWL, and IWWL + IBL. SPRAM cell (BL, MTJ, SL, WL, with current I through the MTJ): write current Ic (µA/bit, 10^0 to 10^3) versus TMR device width (0 to 500 nm) for threshold current densities Jc of 1 × 10^7, 1 × 10^6, 5 × 10^5, and 1 × 10^5 A/cm².]
Thermal Stability Factor: E/kBT
• The thermal field affects retention and disturbance.
• A high E/kBT is attainable with perpendicular magnetization.
• A perpendicular TMR device with typical CoFeB was reported (S. Ikeda et al., Nature Materials, vol. 9, pp. 721–724 (2010)).
Retention: error transition between the parallelized and anti-parallelized states by thermal fluctuation over the barrier E/kBT.
P = 1 - exp{-(t/τ0) exp[-E/kBT]}
Disturbance: error transition between the parallelized and anti-parallelized states over the lowered barrier E/kBT (1 - Icell/Iw).
P = 1 - exp{-(t/τ0) exp[-E/kBT (1 - Icell/Iw)]}
Icell: read current, Iw: write current
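To give a feel for the two error-probability formulas above, the following sketch evaluates them numerically. The attempt time τ0 = 1 ns and the ten-year retention period are assumptions chosen for illustration, not values from the slide, and the function name is hypothetical.

```python
import math


def error_probability(t_s: float, delta: float,
                      tau0_s: float = 1e-9, i_ratio: float = 0.0) -> float:
    """P = 1 - exp{-(t/tau0) exp[-delta (1 - Icell/Iw)]}.

    delta   : thermal stability factor E/kBT
    tau0_s  : attempt time (assumed 1 ns here)
    i_ratio : Icell/Iw; 0 gives the retention formula
    """
    rate = (t_s / tau0_s) * math.exp(-delta * (1.0 - i_ratio))
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-rate)


TEN_YEARS_S = 10 * 365.25 * 24 * 3600

for delta in (40, 60, 80):
    p_retention = error_probability(TEN_YEARS_S, delta)
    p_disturb = error_probability(TEN_YEARS_S, delta, i_ratio=0.2)
    print(f"E/kBT={delta}: retention P={p_retention:.3e}, "
          f"disturbance P (Icell/Iw=0.2)={p_disturb:.3e}")
```

Because E/kBT sits inside an exponential, a moderate increase changes the error probability by many orders of magnitude, and a read current that is a sizeable fraction of the write current erodes that margin.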
Development of Magnetoresistive Memory
• MRAM: mature as a product.
• R&D has moved to SPRAM.
[Figure: Timeline from 2000 to 2010 for MRAM and SPRAM.
Product: 4Mb (Freescale), 4Mb (Everspin), 16Mb.
Conference: 1kb (IBM), 512b, 256kb, 1Mb, and 4Mb (Motorola, Freescale), 512kb (NEC), 128kb (IBM/Infineon), 16kb (Samsung), 16Mb (Toshiba/NEC), 16Mb, 1Mb (Renesas), 32Mb (NEC), 4kb (Sony/AIST), 4Mb (TDK; IBM, TDK), 256kb (Grandis), 2Mb and 32Mb (Hitachi/Tohoku Univ.), 64Mb (Toshiba), 64Mb (Hynix).]
SPRAM Chips
Perpendicular TMR (Toshiba)
• Process: 65 nm
• Power supply: 1.2 V
• Density: 64 Mb
• Chip size: 47.12 mm²
• Cell size: 0.3584 µm² (84.8F²)
• Write current: 50 µA
• TMR device: Perpendicular
K. Tsuchida, et al., ISSCC 2010
Modified DRAM Process (Hynix, Grandis)
• Process: 54 nm
• Power supply: 1.8 V
• Density: 64 Mb
• Chip size: 23.36 mm²
• Cell size: 0.041 µm² (14F²)
• Write current: 140 µA
• TMR device: In-plane
S. Chung, et al., IEDM 2010
Progress in memory development
• Memory is tied closely to deep material and device technology.
• Modeling is indispensable for quantitative cooperation with circuitry.
[Figure: Diversity and complexity versus time. Phase 1 is a feasibility study (literature, SPICE modeling, simulation, TEG design, BS, ...) across diverse materials (organic, bio, MEMS, nano; spin, polymer, CNT, DNA) and mechanisms (analog, multi-value, destructive/ND, volatile/NV), covering cell operation and array operation. Phases 2, 3, ... (team building, management) lead to a proto chip and a winner.]
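Ex. Modeling of TMR Memory Cell (1/2)
• Physics was built into an analog and mixed-signal (AMS) simulation and simulated together with CMOS circuits.
• The dependence of memory function on basic physical characteristics can be directly simulated.
K. Ono, et al., IEDM 2009
[Figure: SPICE simulation of a memory cell with data input, WL, write drivers (Write DV), the TMR model, and the cell transistor (BSIM).]
TMR model, resistance:
• Non-linear I-R behavior
• T-dependent I-R behavior
• P-state: GP ∝ V^2 (Simmons)
• AP-state: ΔR/R ∝ exp(-|V|/Vh)
• ΔR/R(V=0): function of T-dependent polarization
• Non-linear parameter Vh: T-independent
G(V, T, s) = A(V, T, s) GP(V)
GP: P-state conductance, A: modulation term, V: voltage, T: temperature, s: spin state
TMR model, switching:
• Numerical solution of the stochastic LLG equation
• Includes thermal fluctuation of the macro-spin
\[ \frac{\partial \mathbf{s}}{\partial \tau} = \mathbf{s} \times (\mathbf{h}_{\mathrm{eff}} + \mathbf{h}_{\mathrm{fl}}) - \alpha\, \mathbf{s} \times \left[ \mathbf{s} \times (\mathbf{h}_{\mathrm{eff}} + \mathbf{h}_{\mathrm{fl}}) \right] \]
\[ \Delta\tau = \frac{\gamma_0 H_K}{1 + \alpha^2}\, \Delta t \]
\[ \mathbf{h}_{\mathrm{fl}} = \sqrt{\frac{\alpha}{1 + \alpha^2}\, \frac{2 k_B T}{M_S H_K V_o}}\; \boldsymbol{\varsigma}_{\mathrm{fl}} \]
\[ \langle \varsigma_{\mathrm{fl}}(t) \rangle = 0, \qquad \langle \varsigma^{\mathrm{fl}}_{\varphi}(t)\, \varsigma^{\mathrm{fl}}_{\theta}(t') \rangle = 2\, \delta_{\varphi\theta}\, \delta(t - t') \]
s: normalized spin vector, HK: easy-plane anisotropy field, γ0: gyromagnetic ratio, heff: effective field, hfl: stochastic field, Ms: saturation magnetization, α: damping coefficient, Vo: free-layer volume, kB: Boltzmann’s constant, T: temperature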
Ex. Modeling of TMR Memory Cell (2/2)
Transient write current and stochastic switching (K. Ono, et al., IEDM 2009)
[Figure: Cell current (mA, -0.1 to 0.5) versus time (0 to 100 ns), measured and simulated. Cumulative probability (0 to 1.0) versus switching time (0 to 50 ns), measurements and TMR model.]
For emerging memory simulation
• A path is needed to generate physical models for new materials. Memory technology should handle the physical phenomena.
• Estimating the phenomena between adjacent cells and the size effect is important.
“Visualization”: important for an intuitive understanding of phenomena peculiar to the material and for grasping the disturbance between adjacent cells in the array.
“Size dependent phenomena”: TCAD must handle more materials than current Si; otherwise the size dependence, which is important, cannot be estimated.
[Figures: Y. Sasago, et al., VLSI Tech. 2009; Emerging Memories (from ITRS ERD/ERM WG, 2010).]
Memory cell size scalability
• Transistor drivability ∝ F, but TMR write current ∝ F².
• The cell becomes easier to shrink as the feature size advances.
[Figure: Transistor drivability and write current (µA, 1 to 1000) versus feature size F (90, 65, 45, 32, 22 nm). Transistor drivability lines for gate widths W = 10F, 4F, 2F, and 1F at Id = 300 µA/µm; write current lines for Jc = 4.0, 2.0, 1.0, and 0.5 MA/cm² with TMR size F × F. Cell layouts: 8F² 1T1MTJ and 8F² 2T1MTJ.]
Gate width   10F    4F     2F      F
Cell size    40F²   16F²   8-6F²   4F²
TMR size     F²     F²     F²      F²
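The scaling argument on this slide (drivability ∝ F, write current ∝ F²) can be sketched numerically using the figure's parameters: Id = 300 µA/µm, a TMR device of size F × F, and Jc between 0.5 and 4.0 MA/cm². The function names are illustrative, and the model is deliberately simple (write current is just Jc times the device area).

```python
def write_current_uA(jc_MA_per_cm2: float, f_nm: float) -> float:
    """Write current of an F x F TMR device: Ic = Jc * F^2."""
    # 1 MA/cm^2 = 1e6 A / 1e14 nm^2 = 0.01 uA/nm^2
    return jc_MA_per_cm2 * 0.01 * f_nm ** 2


def drivability_uA(width_in_f: float, f_nm: float,
                   id_uA_per_um: float = 300.0) -> float:
    """Transistor drive current for a gate width of width_in_f * F."""
    return id_uA_per_um * width_in_f * f_nm / 1000.0


def min_gate_width_in_f(jc_MA_per_cm2: float, f_nm: float,
                        id_uA_per_um: float = 300.0) -> float:
    """Gate width (in units of F) needed to supply the write current."""
    return write_current_uA(jc_MA_per_cm2, f_nm) / drivability_uA(1.0, f_nm, id_uA_per_um)


for f_nm in (90, 65, 45, 32, 22):
    widths = ", ".join(
        f"Jc={jc}: {min_gate_width_in_f(jc, f_nm):.1f}F" for jc in (4.0, 2.0, 1.0, 0.5)
    )
    print(f"F={f_nm} nm -> required gate width: {widths}")
```

Under these assumptions the required gate width, measured in F, shrinks linearly with F, which is the sense in which the cell becomes easier to shrink at smaller nodes.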
Scalable 4F² Memory-cell configurations
• A TMR memory cell has better scalability than a DRAM cell because of its low aspect ratio.
R. Takemura, et al., IMW 2010
[Figure: DRAM cell (PL, capacitor with height ~2 µm, Tr., BL) compared with a TMR cell (SL, TMR, WL, BL) of width 2F. Cell layout and memory cell configuration with two MTJs (MTJ1, MTJ2; 72 nm) between BL and SL, selected by WL; TMR2 has ΔR2 at Ic2±. Characteristic: resistance versus current with states “11”, “01”, “10”, and “00”, resistance levels RAP1+RAP2, RAP1+RP2, RP1+RAP2, and RP1+RP2 separated by ΔR1 and ΔR2, and switching currents Ic2-, Ic1-, Ic1+, and Ic2+.]
Toward 2F²/bit and More…
• 2^n resistance levels can achieve n bits per cell.
• Needs a high E/kBT and a sufficiently small RA (RA: resistance-area product of the TMR device).
T. Ishigaki, et al., VLSI Tech. 2010
[Figure: Memory cell circuit with three TMR devices (TMR1, TMR2, TMR3; resistances R1, R2, R3). 3-bit/cell state transition scheme: resistance R versus current I, with switching currents IP3, IP2, IP1 and IAP1, IAP2, IAP3.]
State      Resistance
“1 1 1”    R1AP+R2AP+R3AP
“1 1 0”    R1AP+R2AP+R3P
“1 0 1”    R1AP+R2P+R3AP
“1 0 0”    R1AP+R2P+R3P
“0 1 1”    R1P+R2AP+R3AP
“0 1 0”    R1P+R2AP+R3P
“0 0 1”    R1P+R2P+R3AP
“0 0 0”    R1P+R2P+R3P
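The state table above sums the resistances of the stacked TMR devices. The sketch below enumerates those sums and shows why the devices need different resistance changes for all 2^n levels to be distinguishable. All resistance values are hypothetical, chosen only so that the ΔR of the three devices is binary-weighted; the function name is also illustrative.

```python
from itertools import product


def series_levels(r_p, r_ap):
    """Map each bit pattern to the total series resistance.

    Bit i = 1 means device i is anti-parallel (r_ap[i]); 0 means parallel (r_p[i]).
    The first bit corresponds to TMR1, as in the state table.
    """
    levels = {}
    for bits in product((0, 1), repeat=len(r_p)):
        levels[bits] = sum(ap if b else p for b, p, ap in zip(bits, r_p, r_ap))
    return levels


# Hypothetical devices with a 100% TMR ratio and binary-weighted resistance (ohms).
weighted = series_levels(r_p=[4000, 2000, 1000], r_ap=[8000, 4000, 2000])
for bits, total in sorted(weighted.items(), key=lambda kv: kv[1]):
    print("".join(map(str, bits)), total)

# Three identical devices give only 4 distinct levels, not 8.
identical = series_levels(r_p=[1000] * 3, r_ap=[2000] * 3)
print(len(set(weighted.values())), len(set(identical.values())))
```

With identical devices several bit patterns share one total resistance, so the read circuit could not tell them apart.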
Future tech.: Voltage-driven magnetic RAM
• Basic research is progressing on voltage-driven magnetic RAM. A current-driven device faces concerns about power consumption, voltage drop along the lines, and the current-supply capability of the switch. -> Solved by voltage-driven operation.
Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)
Non-Volatile Architecture
Merits of non-volatile architecture (non-volatile computing)
• The difference between suspend and hibernation disappears.
– Easy to switch servers on and off according to the load.
• Assured write-back operation.
– Fast restart.
Issues
• A new OS is needed to make the best use of non-volatile memory.
– When restarting, activate the contents of memory.
– For security, handle the erasure of contents.
– Same as uninterruptible computation.
• Cooperation with processor vendors and OS vendors.
– Ultimately, the compiler schedules On/Off.
[Figure: Memory hierarchy (FF/register, cache, main memory, HDD/SSD) in the conventional architecture, where FF/register, cache, and main memory are volatile and HDD/SSD are non-volatile, compared with the non-volatile architecture.]
In CPU and Basic Circuitry Layer
• Achieve a normally-off state as often as possible by detecting every standby period, preferably whenever the power needed for the transition is lower than the power saved.
• Eliminate the power required for communication between the memory and logic circuits.
• High energy efficiency in operation, and no DC power consumed in the CPU's long standby state, which occurs often in most business applications.
[Figure: Memory hierarchy (FF/register, cache, main memory, HDD/SSD) in the conventional architecture, where FF/register, cache, and main memory are volatile and HDD/SSD are non-volatile, compared with the non-volatile architecture.]
Nonvolatile Logic-in-Memory Architecture
• Logic-in-Memory Architecture (proposed in 1969): storage elements are distributed over a logic-circuit plane.
[Figure: MTJ layer on top of the CMOS layer. MTJ: Magnetic Tunnel Junction device (TMR device).]
● Storage is nonvolatile (leakage current is cut off)
● MTJ devices are put on the CMOS layer
● Storage and logic are merged (global-wire count is reduced)
Static power is cut off. Chip area is reduced. Wire delay is reduced. Dynamic power is reduced.
MTJ device features:
• Non-volatility
• Unlimited endurance
• Fast writability
• Scalability
• CMOS compatibility
• 3-D stack capability
Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)
Design of a Nonvolatile Full Adder
• Demonstrated quick on/off operation.
• Dynamic power is reduced to about 1/4 while keeping performance.
S. Matsunaga et al., Applied Physics Express 1 (2008) 091301.
[Figure: Chip layout with SUM and CARRY circuits (dimension labels 15.5 µm, 13.9 µm, and 10.7 µm) and measured waveforms (780 mV/div, 1 ms/div) of VDD, CLK, inputs (A, B, and Ci), S (= A ⊕ B ⊕ Ci), and S' through active, standby, and “power-off” periods. P: precharge phase, E: evaluate phase. For A=0, B=0, Ci=0: Sbefore=0, Safter=0, S'before=1, S'after=1. For A=0, B=0, Ci=1: Sbefore=1, Safter=1, S'before=0, S'after=0. Sbefore (S'before): S (S') just before power-off. Safter (S'after): S (S') just after power-on.]
                           CMOS                Proposed
Delay                      224 ps              219 ps
Dynamic power (@500 MHz)   71.1 µW             16.3 µW
Write time                 2 ns/bit            10 ns/bit
Write energy               4 pJ/bit            20.9 pJ/bit
Static power               0.9 nW              0.0 nW
Area (device counts)       333 µm² (42 MOSs)   315 µm² (34 MOSs + 4 MTJs)
Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)
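Two quick checks can be made from the comparison figures above: the dynamic power ratio behind the "1/4" claim, and a rough break-even standby time after which powering off pays back the energy spent writing the MTJs. The break-even estimate is an illustrative simplification, not a result from the cited work: it assumes a single bit is written per power-off and that the only saving is the CMOS static power.

```python
WRITE_ENERGY_J = 20.9e-12       # proposed circuit, write energy per bit
CMOS_STATIC_POWER_W = 0.9e-9    # CMOS circuit, static power


def break_even_standby_s(write_energy_j: float, static_power_w: float) -> float:
    """Standby time at which the leakage energy saved equals the write energy spent."""
    return write_energy_j / static_power_w


# Dynamic power of the proposed circuit relative to CMOS (same unit for both values).
dynamic_ratio = 16.3 / 71.1

break_even = break_even_standby_s(WRITE_ENERGY_J, CMOS_STATIC_POWER_W)
print(f"dynamic power ratio: {dynamic_ratio:.2f}")
print(f"break-even standby time: {break_even * 1e3:.1f} ms per written bit")
```

Under these assumptions, standby periods longer than a few tens of milliseconds favor powering off, which matches the idea of switching off only when the transition costs less than it saves.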
High-Density TCAM Cell Design
• Halves the active power-delay product, with 1/100 of the stand-by power.
• 2-bit TMR storage is merged into CMOS logic.
S. Matsunaga, et al., APEX, 2, 2, 023004, Feb. 2009.
[Figure: Conventional TCAM cell (two 1-bit storage cells and an equality-detection (ED) circuit, with leakage current paths between VDD and VSS; ML, BL, WL, SL, SL') and proposed TCAM cell (2-bit storage in MTJs, logic in MTJs and MOSs; ML/BL, BL1, BL2, SL/WL1, SL'/WL1); dimension labels 9.8 µm and 3.0 µm. Match-line sense amplifier (MLSA) with reference cell, dynamic current comparator, and output generator. Waveforms (780 mV, 10 µs) of CLK, S, and OUT for match and mismatch through active (P, E), standby, and power-off periods: with stored data B=0, S=0, OUTbefore=1 and OUTafter=1; with S=1, OUTbefore=0 and OUTafter=0.]
Active power   CMOS-only   Proposed
Cell array     109.6 µW    107.3 µW
SA             30.8 µW     9.6 µW
ACC            3.7 µW      3.7 µW
CLK            32.7 µW     62.0 µW
Proposed relative to CMOS-only: active power 103%, standby power 1.2%, delay 43% (1.39 ns versus 0.60 ns).
Standby power entries (cell array, SA): 340.9 µW, 1.8 µW, 1.2 µW, 2.3 µW.
HSPICE simulation under a 90 nm CMOS/MTJ technology @125 MHz, RP: 2 kΩ, TMR ratio: 100%
Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)
Design of a Compact Nonvolatile FPGA
• Directly combining TMR devices and logic by a differential current-mode technique simplifies the circuit and reduces power.
D. Suzuki, et al., Symp. on VLSI Circuits, 80/81, June 2009.
[Figure: Conventional: nonvolatile SRAM (MTJs, each with a sense amplifier, SA) feeding combinational logic (CMOS). Proposed: MTJs combined with current-mode combinational logic (selection transistor tree) and one SA. Waveforms (0.78 V/div, 50 µs/div) of VDD (high voltage, low voltage), CLK, and output Z = A ⊕ B for inputs 00, 01, 10, 11 through active, standby (VDD = 0), and active periods.]
                   Conventional        Proposed
Device counts      102 MOSs + 8 MTJs   29 MOSs + 4 MTJs
Area *1)           702 µm²             287 µm²
Delay *2)          140 ps              185 ps
Active power *2)   26.7 mW             17.5 mW
Standby power      0 mW                0 mW
*1) Estimation based on a 0.14 µm process
*2) HSPICE simulation based on a 0.14 µm MOS/MTJ-hybrid process
Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)
In Main Memory and System Layer
• For instant-on, keep the status in main memory during standby without power dissipation.
– The control circuit is simplified because the refresh operation is minimized.
– Battery backup for the cache logging area is eliminated.
• A hot-standby configuration (speed) with a deep-standby mode (power) is attainable.
– Achieves system reliability for mission-critical systems (high SLA) with low power.
[Figure: Memory hierarchy (FF/register, cache, main memory, HDD/SSD) in the conventional architecture, where FF/register, cache, and main memory are volatile and HDD/SSD are non-volatile, compared with the non-volatile architecture.]
Ex. In Parallel Processing System
H. Aoki, et al., VLSI Circ. 2008
[Figure: Block diagram of Node 0, Node 1, ..., Node M-1. Each node has a main processor (CPU 0) and sub processors 1 to 3, each with memory. A processor contains a CPU, FPU 1 to FPU n, cache, memory controller, and other blocks, together with a back-bias selector, clock divider and selector, PLL, back-bias voltage generator, internal voltage generator, control registers, and a network I/F. Control from main to sub covers back-gate and frequency control, sleep SW/back-gate control, and gated clock/operating frequency control.]
Power-down during Parallel Processing?
H. Aoki, et al., VLSI Circ. 2008
[Figure: Activity over time of processors 0 to 3 in Node 0, Node 1, ..., Node M-1, connected by a network.]
Preparation for processing; communication with other processors
• Needs µs order
• Not scalable with processor speed
• The larger the parallelism, the longer the time
-> use for fine-grain power control
In parallel processing
• Hard to allocate an entire processor
• Hard to assign equal load
• Hard to assume homogeneity
-> use for fine-grain power control
Power-down Controllability
H. Aoki, et al., VLSI Circ. 2008
Program:
T=f()
S=SUM(T*A(1:N))
B(1:N)=S+A(1:N)
Execution steps: calculate T; broadcast T; set up parallel; calculate partial sum for each thread; broadcast partial sum; calculate total sum; set up parallel; calculate each element of B.
[Figure: Activity over time of the main and sub processors (1 to 3) in Node 0, Node 1, ..., Node M-1, communicating over the network (send, isend, recv). Each processor is fully active, FPU inactive, or waiting; imbalanced-load and no-load periods are marked.]
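The program on this slide (T=f(), S=SUM(T*A(1:N)), B(1:N)=S+A(1:N)) and its execution steps can be sketched as follows. This is an illustration only: Python threads stand in for the sub processors, f() is a hypothetical placeholder, and the chunking rule is an assumption. With 10 elements and 4 workers the last chunk is shorter than the others, a small instance of the imbalanced load noted in the figure.

```python
from concurrent.futures import ThreadPoolExecutor


def f() -> float:
    """Hypothetical stand-in for the scalar computation T = f()."""
    return 2.0


def split(n: int, workers: int):
    """Split range(n) into contiguous chunks, one per worker (last may be shorter)."""
    step = max(1, -(-n // workers))  # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def run(a, workers: int = 4):
    t = f()                                    # calculate T (main thread)
    parts = split(len(a), workers)             # set up parallel
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # calculate partial sum for each thread
        partial = list(pool.map(lambda r: sum(t * x for x in a[r[0]:r[1]]), parts))
        s = sum(partial)                       # calculate total sum
        # calculate each element of B in parallel
        b_parts = list(pool.map(lambda r: [s + x for x in a[r[0]:r[1]]], parts))
    b = [x for part in b_parts for x in part]
    return s, b, parts


a = [float(i) for i in range(1, 11)]
s, b, parts = run(a)
print(s, b[:3], parts)
```

The two serial points, computing T and the total sum, are where the sub processors wait, which is the kind of idle interval the slide targets for fine-grain power control.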
Power Control in Parallel Processing
• High-performance processor capacity is underutilized.
• Power can be reduced. Automatic implementation tools are needed.
H. Aoki, et al., VLSI Circ. 2008
[Figure: Execution cases. Case 1-1: main thread execution. Case 1-2: communication. Case 1-3: parallel execution. Each case shows the main and sub processors (1 to 3) as fully active, FPU inactive, or waiting.]
Case 1-1 + 1-2, time grain size: µs order
In case 1-3:
[Figure: Execution cases with memory hierarchy, Case 2-1, Case 2-2, and Case 2-3: processor, cache, and memory, with execution characteristics, cache bandwidth, and time grain size.]