Non-Volatile Memory and Normally-Off Computing - ASP-DAC

ASP-DAC 2011 26th January 2011 Yokohama, Japan


Non-Volatile Memory and Normally-Off Computing

T. Kawahara Central Research Laboratory, Hitachi, Ltd.


Acknowledgements

Original works cited in this presentation were supported in part by:
• The “High-Performance Low-Power Consumption Spin Devices and Storage Systems” program (headed by Professor Hideo Ohno of Tohoku University) under Research and Development for Next-Generation Information Technology of MEXT,
• One of the projects on “Fundamental Technologies for the Next Generation Supercomputing” by MEXT, and
• The Japan Society for the Promotion of Science (JSPS) through its “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program).”

The author would like to thank:
Tohoku University: S. Ikeda, T. Meguro, R. Sasaki, M. Yamanouchi, I. Morita, T. Hirata, T. Hanyu, and H. Ohno.
Hitachi: R. Takemura, K. Ono, T. Ishigaki, T. Yamada, A. Kotabe, S. Hanzawa, S. Yamaguchi, K. Miura, H. Yamamoto, J. Hayakawa, N. Matsuzaki, Y. Mouri, K. Ito, H. Takahashi, H. Hasegawa, Y. Goto, N. Osakabe, and H. Matsuoka.

Outline
• Introduction
• Non-volatile memories
• Spin-transfer torque RAM scalability
• Normally-off instant-on computing
• Conclusion

Social system needs large power

• World is power hungry, but...

[Figure: Power-consuming social systems: home, office, city lighting, infrastructure, distribution, travel, tracking, house, health, entertainment, study, and events, connected by network, LAN, and small area network.]

Necessity of low power IT

• In 2025, power consumption in Japan is expected to be five times that in 2005.
• Worldwide, it is expected to reach nine times.

[Figure: Power dissipation in Japan (BkWh/year, 0 to 25) versus year (2005 to 2025), broken down into network (router, switch), server, PC, and TV. Source: Ministry of Economy, Trade and Industry (METI).]

Sustainable and innovative society

• LSI technology is not the only solution, but LSI can contribute in many ways to a sustainable world.

[Figure: Power consumption versus year (1990, 2005, 2025, 2050). The conventional trend (with conventional nano-transistors, low voltage, multi-core, etc.) is contrasted with lower power by innovation (new: Normally-OFF, Instant-ON), the direction for a sustainable and innovative society. Target: carbon emission target of the 2007 G8 Summit at Heiligendamm, Germany (June 2007).]

Beyond Low Voltage Operation

• Extensive progress has already been made in low-voltage operation with high performance over the last twenty years.
• Complementarily, a new kind of innovation is needed for further progress.

[Figure: 1993: Projected current consumption (Hitachi). Current (A, 10^-6 to 10^1; IACT, IAC, IDC) versus capacity (16 Mbit to 64 Gbit), with VDD (3.3 V to 0.8 V), extrapolated VT at 25°C (0.53 V to 0.13 V), and Wtotal/Leff (×10^6). Conditions: tRC = 180 ns, T = 75°C, S = 97 mV/decade. T. Sakata et al., VLSI93.]
[Figure: 1993: Well-driven SA (Mitsubishi). T. Ooishi, et al., VLSI93.]
[Figure: 1993: 256Mb DRAM (Hitachi). G. Kitsukawa, et al., ISSCC93.]
[Figure: 2004: Dopeless Channel CMOS (SOTB) (Hitachi): NMOS and PMOS on SOI with BOX, NiSi (FUSI) gate, STI, well (back gate), and back bias terminal. R. Tsuchiya, et al., IEDM04.]

Memory and Computing

• Memory systems are constructed according to a deep hierarchy based on speed and capacity. Volatile memory and nonvolatile memory are often combined.
• This leads to slow start-up, long idle times, and low power efficiency.

[Figure: Memory hierarchy from fast, low density to slow, high density: FF/register (~ns), cache (~10 ns), and main memory (~100 ns) are volatile; HDD/SSD (~ms) is non-volatile. There is a big difference between the levels in speed and in volatile/NV.]

Normally OFF, Instant ON Computing

• New innovation in addition to low voltage.
• Normally OFF, Instant ON
– Turn off anytime when not in use,
– but operate instantly with full performance when needed.
• The operating state must be stored before any turn-off and retained with zero power.
– -> Nonvolatile RAM (NV-RAM)
• NV-RAM: infinite number of fast write and read operations, with non-volatility.
• Adaptive solutions across a wide time domain and a wide area domain, from the LSI level to the system level.

Example (but not ideal yet)
• Power-on time is reduced to 1/9 with NV-RAM.
• Non-volatility also achieves low power.

[Figure: Power-on time (s, 0 to 45) and stand-by power (mA, 0 to 100) for four cases. HDD+DRAM, normal power-on: 45 s (OS data transfer and expansion from HDD to DRAM). SSD+DRAM, normal power-on: 25 s. DRAM resume from stand-by: data is stored in RAM, 100 mA stand-by power. Power-on with NV-RAM (RAM: 1 GB, all the data stored): 5 s, 0 stand-by power.]
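The 1/9 figure can be checked against the chart labels. The sketch below is only an illustration in Python: it assumes the 45 s label belongs to the HDD+DRAM case and the 5 s label to the NV-RAM case, and the function name is hypothetical.

```python
def power_on_ratio(baseline_s, improved_s):
    """Return the improved power-on time as a fraction of the baseline."""
    return improved_s / baseline_s

# Times as read from the chart labels (the pairing is an assumption).
hdd_dram_s = 45.0  # HDD+DRAM, normal power-on
nv_ram_s = 5.0     # power-on with NV-RAM

ratio = power_on_ratio(hdd_dram_s, nv_ram_s)
print(f"NV-RAM power-on takes {ratio:.3f} of the HDD+DRAM time")
```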

Outline
• Introduction
• Non-volatile memories
• Spin-transfer torque RAM scalability
• Normally-off instant-on computing
• Conclusion

RAM Comparison
• NV-RAM: infinite number of write cycles, non-volatile, fast R/W.
• Magnetic RAMs are good candidates for NV-RAM.

[Figure: General memory performance map. Left: endurance (10^5 to 10^20, with 3-year and 10-year endurance lines) versus write time (1 ns to 100 µs) for SRAM/DRAM (volatile), magnetic RAM, FeRAM, PRAM, and FLASH. Right: read latency (1 ns to 100 µs) versus cell area (F^2, ∝ 1/density, 0 to 30) for FLASH (NAND), FLASH (NOR), magnetic RAM, FeRAM, PRAM, DRAM, and SRAM (100-150 F^2).]

Origin of Non-volatility
• Various phenomena are applicable: insulator barrier, change of structure (bistable), and alignment of electron condition (bistable).

[Figure: Origins of non-volatility grouped as insulator barrier, change of structure, and alignment of electron condition. Examples shown: floating gate, silicon nitride (Si3N4 gate storing Bit1 and Bit2 between S and D), nano-dots, phase change, ionization, ferroelectrics, and ferromagnetism (3d; plus magnetoresistive effect for memory).]
-> Easier to achieve infinite write cycles (endurance).

Phase Change Memory
• Resistance changes between the amorphous and polycrystalline states of a chalcogenide material. The resistance change is large.
• Good scalability.

[Figure: Memory cell: chalcogenide between plate and electrode, with bitline and wordline; the phase change region switches between polycrystalline and amorphous.]
[Figure: Temperature versus time for the RESET pulse (above the melting point, M.P.) and the SET pulse (above the crystallization temperature, C.T.), with levels T1 and T2.]
[Figure: Write current (0.1 mA to 10 mA) versus electrode diameter φ (10 nm to 1 µm).]
[Figure: Current (a.u.) versus voltage (0 to 1.2 V) showing the threshold Vth, crystalline “0”, and amorphous “1”.]
Sources: S. Lai, IEDM 2003; Ovonyx HP; Y.N. Hwang et al., IEDM 2003.

Development of Phase Change Memory
• The principle was announced in the 70's.
• Production as a NOR flash replacement in recent years.

[Figure: Timeline (70's, then 2002 to 2010) of phase change memory products and conference reports, stand-alone and embedded. Entries shown: 256b (ECD, Intel); 4Mb (Ovonyx, Intel); 64Mb, 256Mb, 512Mb (Samsung); 4Mb, 8Mb (STMicro.); 256Mb; 1Gb (Numonyx); 128Mb and 512Mb (Numonyx (Micron), Samsung); embedded 4Mb (Hitachi/Renesas); embedded 4Mb (STMicro.).]

Phase Change Memory Chips

Stand-alone (Numonyx)
• Process: 45 nm
• Power supply: 1.8 V
• Density: 1 Gb
• Chip size: 37.5 mm^2
• Cell size: 0.011 µm^2 (5.5F^2)
• Cell Tr: BJT
• Endurance: ~10^9
[Figures: memory cell, chip photo.]
C. Villa, et al., ISSCC 2010; G. Servalli, IEDM 2009

Embedded (STMicroelectronics)
• Process: 90 nm
• Power supply: 1.2 V
• Density: 4 Mb
• Chip size: 3 mm^2
• Cell size: 0.29 µm^2 (36F^2)
• Cell Tr: MOS
• Endurance: ~10^7
[Figures: memory cell, chip photo.]
G. De Sandre, et al., ISSCC 2010; R. Annunziata, et al., IEDM 2009

Stacked Phase Change Memory
• The selecting element and the memory device are stacked.
• The selecting element is made of non-substrate Si, formed on wiring.

[Figure: Chip, memory cell, and I-V characteristics. A switch made of phase-change material itself (Numonyx). DerChang Kau, et al., IEDM 2009.]
[Figure: Cross-sectional view and memory cell. Polysilicon diode formed on wiring. Current drivability: 160 µA at 30 nm. Y. Sasago, et al., VLSI Tech. 2009.]

TMR Device and Memory cell
• Resistance changes between the parallel and anti-parallel states.
• These two states are used as an information bit.

[Figure: Circuit diagram of the memory cell: the TMR device (CoFeB free layer, MgO barrier, CoFeB pinned layer) sits between the Bit Line (BL) and a selecting transistor controlled by the Word Line (WL), which connects to the Source Line (SL). Parallelized state: low resistance, RP. Anti-parallelized state: high resistance, RAP.]

TMR ratio = (RAP - RP) / RP
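As a worked instance of the TMR ratio definition, here is a minimal Python sketch. The function names are illustrative, and the example values (RP = 2 kΩ, TMR ratio = 100%) are the simulation conditions quoted on the TCAM slide later in this deck, reused here only as sample inputs.

```python
def tmr_ratio(r_ap, r_p):
    """TMR ratio = (RAP - RP) / RP, as a fraction (1.0 means 100%)."""
    return (r_ap - r_p) / r_p

def r_ap_from_ratio(r_p, ratio):
    """Anti-parallel resistance implied by RP and a TMR ratio (fraction)."""
    return r_p * (1.0 + ratio)

r_p = 2000.0                        # ohms, parallel state
r_ap = r_ap_from_ratio(r_p, 1.0)    # TMR ratio of 100%
print(f"RAP = {r_ap:.0f} ohm, TMR ratio = {tmr_ratio(r_ap, r_p):.0%}")
```

A TMR ratio of 100% therefore means the anti-parallel resistance is twice the parallel resistance.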

Innovation in TMR device: MgO
• The MgO barrier provides two breakthroughs:
• high TMR ratio and small write current.

[Figure: TMR stacks with an AlO barrier (amorphous) and with an MgO barrier (crystal), each between a free layer and a pinned layer (CoFeB), showing RP and RAP states for in-plane and perpendicular magnetization.]
[Figure: TMR ratio (%, 0 to 800) versus year (1995 to 2010) for Al2O3-barrier MTJs and (100) bcc MgO-barrier MTJs (AIST/Canon ANELVA, IBM, Hitachi & Tohoku Univ.), reaching 604% at room temperature (Hitachi & Tohoku Univ.).]

TMR ratio = (RAP - RP) / RP

SPRAM (SPin-transfer torque RAM)
• MRAM: matured in product. Inferior in scalability.
• SPRAM: good potential in scalability.

[Figure: SPRAM cell (MTJ between BL and SL, selected by WL, written by current I through the MTJ). Write current Ic (µA/bit, 10^0 to 10^3) versus TMR device width (0 to 500 nm) for threshold current densities Jc of 1 x 10^7, 1 x 10^6, 5 x 10^5, and 1 x 10^5 A/cm^2.]
[Figure: Conventional MRAM cell (MTJ with BL, SL, write word line WWL, and read word line RWL). Write current Ic (mA/bit, 0 to 20) versus TMR device width (10 to 1000 nm), showing IBL, IWWL, and IWWL + IBL.]

Thermal Stability Factor: E/kBT
• The thermal field affects retention and disturbance.
• High E/kBT is attainable by perpendicular magnetization.
• Perpendicular TMR with typical CoFeB was reported. (S. Ikeda et al., Nature Materials, Volume 9, pp. 721-724 (2010))

Retention: thermal fluctuation causes an error transition between the parallelized and anti-parallelized states over a barrier of E/kBT.
P = 1 - exp{-(t/τ0) exp[-E/kBT]}

Disturbance: during read, the barrier between the parallelized and anti-parallelized states is lowered to E/kBT(1 - Icell/Iw).
P = 1 - exp{-(t/τ0) exp[-E/kBT(1 - Icell/Iw)]}
Icell: read current, Iw: write current
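The two error-probability expressions can be evaluated numerically. The following Python sketch implements them as written on the slide; the attempt time τ0 = 1 ns, the E/kBT values, and the time spans are assumptions chosen for illustration and are not given in the presentation.

```python
import math

def retention_error(t_s, delta, tau0_s=1e-9):
    """P = 1 - exp{-(t/tau0) exp[-E/kBT]}, with delta = E/kBT.

    tau0_s = 1 ns is an assumed attempt time, not a value from the slide.
    """
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x.
    return -math.expm1(-(t_s / tau0_s) * math.exp(-delta))

def disturb_error(t_s, delta, i_cell, i_w, tau0_s=1e-9):
    """Same form with the barrier scaled by (1 - Icell/Iw)."""
    return retention_error(t_s, delta * (1.0 - i_cell / i_w), tau0_s)

ten_years_s = 10 * 365 * 24 * 3600
for delta in (40, 60, 80):  # hypothetical E/kBT values
    print(delta, retention_error(ten_years_s, delta))

# Hypothetical 10 ns read at half the write current, E/kBT = 60.
print(disturb_error(10e-9, 60, i_cell=0.5, i_w=1.0))
```

Because E/kBT sits inside a nested exponential, modest increases in the thermal stability factor reduce the error probability by many orders of magnitude, which is why the slide stresses a high E/kBT.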

Development of Magnetoresistive Memory
• MRAM: matured in product.
• R&D moved to SPRAM.

[Figure: Timeline (2000 to 2010) of MRAM and SPRAM products and conference reports. Product entries: 4Mb (Freescale), 4Mb (Everspin), 16Mb. Conference entries: 512b, 256kb, 1Mb, 4Mb (Motorola, Freescale); 1kb (IBM); 512kb (NEC); 128kb (IBM/Infineon); 16kb (Samsung); 16Mb (Toshiba/NEC); 16Mb; 1Mb (Renesas); 32Mb (NEC); 4Mb (TDK); (IBM, TDK); 256kb (Grandis); 64Mb (Toshiba); 2Mb, 32Mb, 64Mb (Hynix; Hitachi/Tohoku Univ.); 4kb (Sony/AIST).]

SPRAM Chips

Perpendicular TMR (Toshiba)
• Process: 65 nm
• Power supply: 1.2 V
• Density: 64 Mb
• Chip size: 47.12 mm^2
• Cell size: 0.3584 µm^2 (84.8F^2)
• Write current: 50 µA
• TMR device: perpendicular
K. Tsuchida, et al., ISSCC 2010

Modified DRAM Process (Hynix, Grandis)
• Process: 54 nm
• Power supply: 1.8 V
• Density: 64 Mb
• Chip size: 23.36 mm^2
• Cell size: 0.041 µm^2 (14F^2)
• Write current: 140 µA
• TMR device: in-plane
S. Chung, et al., IEDM 2010
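The cell sizes quoted in units of F^2 on the chip slides can be cross-checked against the quoted areas in µm^2. This is a small Python sketch of that arithmetic; it assumes F equals the stated process node, and the function name and the `chips` table are illustrative.

```python
def cell_area_um2(feature_nm, area_in_f2):
    """Convert a cell size in units of F^2 to um^2, taking F as the process node."""
    f_um = feature_nm / 1000.0
    return area_in_f2 * f_um ** 2

# (label, process node in nm, cell size in F^2, quoted cell size in um^2)
chips = [
    ("PCM stand-alone (Numonyx)", 45, 5.5, 0.011),
    ("PCM embedded (STMicroelectronics)", 90, 36.0, 0.29),
    ("SPRAM (Toshiba)", 65, 84.8, 0.3584),
    ("SPRAM (Hynix, Grandis)", 54, 14.0, 0.041),
]

for label, node_nm, f2, quoted in chips:
    computed = cell_area_um2(node_nm, f2)
    print(f"{label}: computed {computed:.4f} um^2, quoted {quoted} um^2")
```

The computed and quoted values agree to within rounding, so the F^2 figures give a node-independent way to compare cell compactness across these chips.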

Progress in memory development
• Memory is closely tied to deep material and device technology.
• Modeling is indispensable for quantitative cooperation with circuitry.

[Figure: Memory development versus time, from diversity to a winner. Feasibility study (literature, SPICE modeling, simulation, TEG design, BS, ...) across diverse materials (spin, polymer, CNT, DNA; organic, bio, MEMS, nano) and mechanisms (analog, multi-value, destructive/ND, volatile/NV). Complexity grows through Phase 1 (cell operation, array operation) and Phases 2, 3, ... (team building, management) to a proto chip and a winner.]

Ex. Modeling of TMR Memory Cell (1/2)
• Physics was built into analog and mixed-signal (AMS) simulation, and simulated with CMOS circuits.
• The dependence of memory function on basic physical characteristics can be directly simulated.
K. Ono, et al., IEDM 2009

[Figure: SPICE simulation setup: data and WL inputs, write drivers (Write DV), cell transistor (BSIM), and the TMR model.]

TMR model, resistance:
• Non-linear I-R behavior
• T-dependent I-R behavior
• P-state: GP ∝ V^2 (Simmons)
• AP-state: ∆R/R ∝ exp(-|V|/Vh)
• ∆R/R(V=0): function of T-dependent polarization
• Non-linear parameter Vh: T-independent
G(V, T, s) = A(V, T, s) GP(V)
GP: P-state conductance; A: modulation term; V: voltage; T: temperature; s: spin state

TMR model, switching:
• Numerical solution of the stochastic-LLG equation
• Includes thermal fluctuation of macro-spin

\frac{\partial \mathbf{s}}{\partial \tau} = \left[ \mathbf{s} \times (\mathbf{h}_{eff} + \mathbf{h}_{fl}) - \alpha \mathbf{s} \times \left( \mathbf{s} \times (\mathbf{h}_{eff} + \mathbf{h}_{fl}) \right) \right]
\Delta\tau = \frac{\gamma_0 H_K}{1 + \alpha^2} \Delta t
\mathbf{h}_{fl} = \frac{\alpha}{1 + \alpha^2} \frac{2 k_B T}{M_S H_K V_o} \boldsymbol{\varsigma}_{fl}
\langle \boldsymbol{\varsigma}_{fl}(t) \rangle = 0, \quad \langle \varsigma_{\phi}^{fl}(t) \varsigma_{\theta}^{fl}(t') \rangle = 2 \delta_{\phi\theta} \delta(t - t')

s: normalized spin vector; HK: easy-plane anisotropy field; γ0: gyromagnetic ratio; heff: effective field; hfl: stochastic field; Ms: saturation magnetization; α: damping coefficient; Vo: free-layer volume; kB: Boltzmann's constant; T: temperature

Ex. Modeling of TMR Memory Cell (2/2)
• Physics was built into analog and mixed-signal (AMS) simulation, and simulated with CMOS circuits.
• The dependence of memory function on basic physical characteristics can be directly simulated.

Transient write current and stochastic switching (K. Ono, et al., IEDM 2009)
[Figure: Cell current (mA, -0.1 to 0.5) versus time (0 to 100 ns), measurement and simulation.]
[Figure: Cumulative probability (0 to 1.0) versus switching time (0 to 50 ns), measurements and TMR model.]

For emerging memory simulation
• A path is needed to generate physical models for new materials. Memory technology should handle physical phenomena.
• Estimation of phenomena between adjacent cells and of the size effect is important.

[Figures: Y. Sasago, et al., VLSI Tech. 2009; emerging memories (from ITRS ERD/ERM WG, 2010).]

“Visualization”: important for an intuitive understanding of phenomena peculiar to the material and for grasping the disturbance between adjacent cells in the array.

“Size dependent phenomena”: TCAD is necessary to handle more materials than current Si; otherwise the size dependence, which is important, cannot be estimated.

Outline
• Introduction
• Non-volatile memories
• Spin-transfer torque RAM scalability
• Normally-off instant-on computing
• Conclusion

Memory cell size scalability
• Transistor drivability ∝ F, but TMR write current ∝ F^2.
• The cell becomes easier to shrink with advancing feature size.

Gate width    10F      4F       2F        F
Cell size     40F^2    16F^2-   8-6F^2    4F^2
TMR size      F^2      F^2      F^2       F^2

[Figure: Transistor drivability and write current (µA, 1 to 1000) versus feature size F (90, 65, 45, 32, 22 nm). Transistor drivability curves for Tr W = 10F, 4F, 2F, and 1F with Id = 300 µA/µm; write current curves for Jc = 4.0, 2.0, 1.0, and 0.5 MA/cm^2 with TMR size F x F. Cell examples: 8F^2 1T1MTJ and 8F^2 2T1MTJ.]
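The scaling argument (drivability ∝ F, write current ∝ F^2) can be sketched numerically with the parameters printed on the figure, Jc = 4.0 MA/cm^2, Id = 300 µA/µm, and an F x F TMR device. The Python below is a simplified sketch: it assumes the transistor only needs to supply the threshold write current, with no margin, and the function names are illustrative.

```python
JC_A_PER_CM2 = 4.0e6   # Jc = 4.0 MA/cm^2 (from the slide)
ID_UA_PER_UM = 300.0   # Id = 300 uA/um (from the slide)

def write_current_ua(f_nm, jc=JC_A_PER_CM2):
    """Write current of an F x F TMR device in uA (scales as F^2)."""
    side_cm = f_nm * 1e-7              # 1 nm = 1e-7 cm
    return jc * side_cm ** 2 * 1e6     # A -> uA

def drive_current_ua(f_nm, width_in_f, id_per_um=ID_UA_PER_UM):
    """Transistor drive current for gate width = width_in_f * F, in uA (scales as F)."""
    return id_per_um * width_in_f * f_nm * 1e-3   # nm -> um

def min_gate_width_in_f(f_nm):
    """Smallest gate width, in units of F, whose drive matches the write current."""
    return write_current_ua(f_nm) / drive_current_ua(f_nm, 1)

for f_nm in (90, 65, 45, 32, 22):
    print(f"F = {f_nm} nm: Ic = {write_current_ua(f_nm):.1f} uA, "
          f"min W = {min_gate_width_in_f(f_nm):.1f} F")
```

Under these assumptions the required gate width in units of F falls linearly with F, which is the reason a narrower transistor, and hence a smaller cell, suffices at smaller feature sizes.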

Scalable 4F^2 Memory-cell configurations
• The TMR memory cell has better scalability than the DRAM cell due to its low aspect ratio.
R. Takemura, et al., IMW 2010

[Figure: DRAM cell (PL, capacitor of height ~2 µm, Tr., BL) compared with the TMR cell (SL, TMR, WL, BL), and the cell layout with a 2F pitch.]



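[Figure: Memory cell configuration and characteristic of a cell with two TMR devices, MTJ1 and MTJ2 (72 nm), between BL and SL and selected by WL. The devices switch by ΔR1 and ΔR2 (TMR2: ΔR2 at Ic2±) at currents Ic1-, Ic1+, Ic2-, and Ic2+. The four states “11”, “10”, “01”, and “00” are formed by the resistances RAP1+RAP2, RAP1+RP2, RP1+RAP2, and RP1+RP2.]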

Toward 2F^2/bit and More…
• 2^n value levels can achieve n bits per cell.
• Needs high E/kBT and sufficiently small RA. (RA: resistance-area product of TMR)
T. Ishigaki, et al., VLSI Tech. 2010

[Figure: Memory cell circuit with three TMR devices, TMR1, TMR2, and TMR3 (resistances R1, R2, R3).]

3-bit/cell state transition scheme (resistance R versus current I, with switching currents IP1, IP2, IP3 and IAP1, IAP2, IAP3):

State      Resistance
“1 1 1”    R1AP+R2AP+R3AP
“1 1 0”    R1AP+R2AP+R3P
“1 0 1”    R1AP+R2P+R3AP
“1 0 0”    R1AP+R2P+R3P
“0 1 1”    R1P+R2AP+R3AP
“0 1 0”    R1P+R2AP+R3P
“0 0 1”    R1P+R2P+R3AP
“0 0 0”    R1P+R2P+R3P
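The state table can be generated programmatically: each TMR device contributes its AP or P resistance to a series sum, so n devices give 2^n levels provided the resistance steps are distinct. The Python sketch below illustrates this; the resistance values are hypothetical (a TMR ratio of 100% with steps in a 4:2:1 ratio), and the function name is illustrative.

```python
from itertools import product

def series_levels(r_p, r_ap):
    """Map each bit pattern to the total series resistance.

    Bit i = 1 means device i is anti-parallel (AP); 0 means parallel (P).
    """
    levels = {}
    for bits in product((1, 0), repeat=len(r_p)):
        total = sum(r_ap[i] if b else r_p[i] for i, b in enumerate(bits))
        levels["".join(str(b) for b in bits)] = total
    return levels

# Hypothetical device resistances in ohms.
r_p = [4000, 2000, 1000]
r_ap = [8000, 4000, 2000]

levels = series_levels(r_p, r_ap)
for state, resistance in levels.items():
    print(state, resistance)
```

With these assumed values the eight states map to eight evenly spaced resistance levels. Real devices need enough separation between neighboring levels for the sense circuit, which is one reason a high TMR ratio matters.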

Future tech.: Voltage-driven magnetic RAM
• Basic research is progressing on voltage-driven magnetic RAM.
Current-driven devices face concerns about power consumption, voltage drop in lines, and the current supply ability of the switch. -> Solved by voltage-driven operation.

Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)

Outline
• Introduction
• Non-volatile memories
• Spin-transfer torque RAM scalability
• Normally-off instant-on computing
• Conclusion

Non-Volatile Architecture

Merits of non-volatile architecture (non-volatile computing)
• The difference between suspend and hibernation disappears.
– Easy to control the on/off state of servers according to the load.
• Assured write-back operation.
– Fast and quick restarting.

Issues
• A new OS for the best use of non-volatile memory.
– When restarting, activate the contents of memory.
– For security, handle the erasure of contents.
– Same as uninterruptible computation.
• Cooperation with processor vendors and OS vendors.
– Ultimately, the compiler schedules on/off.

[Figure: Memory hierarchy (FF/register, cache, main memory, HDD/SSD) in the conventional architecture and in the non-volatile architecture, with the volatile and non-volatile levels marked.]

In CPU and Basic Circuitry Layer
• Achieving a normally-off state as much as possible by detecting any standby operation, preferably when the power needed is lower than that required for the transition.
• Eliminating the power required for communication between the memory and logic circuits.
• High energy efficiency in operation and no DC power consumed in the CPU's long standby state, as occurs often in most business applications.

[Figure: Memory hierarchy (FF/register, cache, main memory, HDD/SSD) in the conventional architecture and in the non-volatile architecture, with the volatile and non-volatile levels marked.]
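One way to read the first bullet is as a break-even rule: powering off pays only when the energy saved during the idle period exceeds the energy spent on the off/on transition. The Python sketch below illustrates that reading; the rule itself, the function names, and all numeric values are assumptions for illustration and do not come from the presentation.

```python
def break_even_time_s(transition_energy_j, standby_power_w):
    """Idle time beyond which powering off saves energy."""
    return transition_energy_j / standby_power_w

def should_power_off(expected_idle_s, transition_energy_j, standby_power_w):
    """True when the expected idle period exceeds the break-even time."""
    return expected_idle_s > break_even_time_s(transition_energy_j, standby_power_w)

# Hypothetical block: 1 nJ to store and restore state, 1 uW standby power.
transition_energy_j = 1e-9
standby_power_w = 1e-6

t_be = break_even_time_s(transition_energy_j, standby_power_w)
print(f"break-even idle time: {t_be * 1e3:.2f} ms")
print(should_power_off(10e-3, transition_energy_j, standby_power_w))
print(should_power_off(0.1e-3, transition_energy_j, standby_power_w))
```

A lower store/restore energy shortens the break-even time, so finer-grained idle periods become worth exploiting.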

Nonvolatile Logic-in-Memory Architecture
• Logic-in-Memory Architecture (proposed in 1969): storage elements are distributed over a logic-circuit plane.

[Figure: MTJ layer stacked on the CMOS layer. Magnetic Tunnel Junction (MTJ) device (TMR device).]

● Storage is nonvolatile (leakage current is cut off).
● MTJ devices are put on the CMOS layer.
● Storage and logic are merged (global-wire count is reduced).

MTJ device features:
• Non-volatility
• Unlimited endurance
• Fast writability
• Scalability
• CMOS compatibility
• 3-D stack capability

Results: static power is cut off, chip area is reduced, wire delay is reduced, and dynamic power is reduced.

Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)

Design of a Nonvolatile Full Adder
• Demonstrated quick on/off operation.
• Dynamic power is reduced to about 1/4 while keeping performance.
S. Matsunaga et al., Applied Physics Express 1 (2008) 091301.

[Figure: Chip layout with SUM and CARRY blocks (dimensions 15.5 µm, 13.9 µm, 10.7 µm).]
[Figure: Measured waveforms (780 mV/div, 1 ms/div) of VDD, CLK, inputs (A, B, and Ci), S, and S' through standby, active (P: precharge phase, E: evaluate phase), and power-off periods, for inputs A=0, B=0, Ci=0 and A=0, B=0, Ci=1. Sbefore (S'before): S (S') just before power-off. Safter (S'after): S (S') just after power-on. The outputs after power-on match those before power-off.]

                            CMOS                  Proposed
Delay                       224 ps                219 ps
Dynamic power (@500 MHz)    71.1 mW               16.3 mW
Write time                  2 ns/bit              10 ns/bit
Write energy                4 pJ/bit              20.9 pJ/bit
Static power                0.9 nW                0.0 nW
Area (device counts)        333 µm^2 (42 MOSs)    315 µm^2 (34 MOSs + 4 MTJs)

Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)

High-Density TCAM Cell Design
• Halves the active power-delay product, with about 1/100 of the stand-by power.
• 2-bit TMR storage merged into CMOS logic.
S. Matsunaga, et al., APEX, 2, 2, 023004, Feb. 2009.

[Figure: Conventional TCAM cell: two 1-bit storage elements plus an equality-detection (ED) circuit, with leakage current paths from VDD to VSS (ML, BL1, BL2, SL, SL', WL). Proposed TCAM cell (3.0 µm x 9.8 µm): 2-bit storage (MTJs) merged with logic (MTJs and MOSs), with shared lines ML/BL, SL/WL1, and SL'/WL1.]
[Figure: Match-line sense amplifier (MLSA): dynamic current comparator with reference cell, and output generator, giving Match or Mismatch.]
[Figure: Waveforms (780 mV, 10 µs) of CLK, S, and OUT through active (P, E), standby, and power-off periods for stored data B=0, S=0 and S=1. OUTbefore equals OUTafter in both cases (1 for S=0, 0 for S=1).]

                             CMOS-only    Proposed    Proposed/CMOS-only
Active power, cell array     109.6 µW     107.3 µW
Active power, SA             30.8 µW      9.6 µW
Active power, ACC            3.7 µW       3.7 µW
Active power, CLK            32.7 µW      62.0 µW     103% (total)
Standby power, cell array    340.9 µW     1.8 µW
Standby power, SA            1.2 µW       2.3 µW      1.2% (total)
Delay                        1.39 ns      0.60 ns     43%

HSPICE simulation under a 90 nm CMOS/MTJ technology @125 MHz, RP: 2 kΩ, TMR ratio: 100%

Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)

Design of a Compact Nonvolatile FPGA
• Direct combination of TMR devices and logic by differential current-mode technology simplifies the circuit and reduces power.
D. Suzuki, et al., Symp. on VLSI Circuits, 80/81, June 2009.

[Figure: Conventional (nonvolatile SRAM): four MTJs, each with a sense amplifier (SA), feeding CMOS combinational logic. Proposed: four MTJs feeding current-mode combinational logic (selection transistor tree) with a single SA.]
[Figure: Measured waveforms (0.78 V/div, 50 µs/div) of VDD, CLK, inputs A and B (00, 01, 10, 11), and output Z = A ⊕ B through active, standby (VDD = 0), and active periods.]

                    Conventional          Proposed
Device counts       102 MOSs + 8 MTJs     29 MOSs + 4 MTJs
Area *1)            702 µm^2              287 µm^2
Delay *2)           140 ps                185 ps
Active power *2)    26.7 mW               17.5 mW
Standby power       0 mW                  0 mW

*1) Estimation based on a 0.14 µm process
*2) HSPICE simulation based on a 0.14 µm MOS/MTJ-hybrid process

Ref.: http://www.csis.tohoku.ac.jp/ (Center for Spintronics Integrated Systems)

In Main Memory and System Layer
• For instant-on, the status is kept in main memory during standby without power dissipation.
– Simplified control circuit because the refresh operation is minimized.
– Battery backup in the cache logging area is eliminated.
• A hot standby configuration (speed) with deep standby mode (power) is attainable.
– Achieving system reliability for mission-critical systems (high SLA) with low power.

[Figure: Memory hierarchy (FF/register, cache, main memory, HDD/SSD) in the conventional architecture and in the non-volatile architecture, with the volatile and non-volatile levels marked.]

Ex. In Parallel Processing System

[Figure: Block diagram of a parallel processing system with Node 0, Node 1, ..., Node M-1. Each node has a main processor (CPU 0) and sub-processors 1 to 3, each containing a CPU, FPU 1 to FPU n, other units, cache, memory controller, memory, and a back-bias selector. Shared blocks per node: clock divider and selector, PLL, back-bias voltage generator, internal voltage generator, control register, and network I/F. Control from main to sub-processors: back-gate and frequency control, sleep SW/back-gate control, and gated clock/operating frequency control.]

H. Aoki, et al., VLSI Circ. 2008

Power-down during Parallel Processing?
H. Aoki, et al., VLSI Circ. 2008

[Figure: Activity over time of processors 0 to 3 in Node 0, Node 1, ..., Node M-1, connected by a network, through processing steps 1 to 4.]

Preparation for processing and communication with other processors
• Needs µs order
• Not scalable to processor speed
• Larger parallelism, longer time
-> use for fine-grain power control

In parallel processing
• Hard to allocate an entire processor
• Hard to assign equal load either
• Hard to assume homogeneity
-> use for fine-grain power control

Power-down Controllability
H. Aoki, et al., VLSI Circ. 2008

Program:
T=f()
S=SUM(T*A(1:N))
B(1:N)=S+A(1:N)

[Figure: Activity over time of processors 0 to 3 (main, sub 1 to 3) in Node 0, Node 1, ..., Node M-1, exchanging data over the network (send, isend, recv). Legend: fully active, FPU inactive, waiting.]

Steps shown in the figure:
• Calculate T
• Broadcast T
• Set up parallel
• Calculate partial sum for each thread (imbalanced load)
• Broadcast partial sum
• Calculate total sum
• Set up parallel
• Calculate each element of B (some processors with no load)
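The three-line program and its steps can be mimicked in a few lines of Python. This is a sequential sketch, not the authors' parallel implementation: `f`, the array contents, and the chunking scheme are hypothetical, and the per-thread work is simulated with list slices rather than real threads or message passing.

```python
def run_program(f, a, num_threads):
    """Sequentially mimic: T=f(); S=SUM(T*A(1:N)); B(1:N)=S+A(1:N)."""
    t = f()                                   # calculate T (main thread)
    n = len(a)
    chunk = -(-n // num_threads)              # ceiling division
    # Each "thread" k sums its own slice; trailing threads may get no elements.
    partial = [sum(t * x for x in a[k * chunk:(k + 1) * chunk])
               for k in range(num_threads)]
    s = sum(partial)                          # calculate total sum
    b = [s + x for x in a]                    # calculate each element of B
    return partial, s, b

partial, s, b = run_program(lambda: 2.0, [1, 2, 3, 4, 5], num_threads=4)
print(partial, s, b)
```

With five elements split over four threads, the last thread receives no elements and one thread receives fewer than the others, a small instance of the imbalanced-load and no-load periods that the slide proposes to exploit for power-down.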

Power Control in Parallel Processing
• High-performance processor capacity is underutilized.
• Power can be reduced. Automatic implementation tools are needed.
H. Aoki, et al., VLSI Circ. 2008

[Figure: Case 1-1: main thread execution. Case 1-2: communication. Case 1-3: parallel execution. Each case shows the state of the main processor and sub-processors 1 to 3. Legend: fully active, FPU inactive, waiting.]

Cases 1-1 and 1-2: time grain size of µs order.
In Case 1-3: execution cases with memory hierarchy.

[Figure: Cases 2-1, 2-2, and 2-3: processor with cache and memory, compared by execution characteristics, cache bandwidth, and time grain size.]