
What does the Future Hold?
Hakim Weatherspoon
CS 3410, Spring 2013
Computer Science, Cornell University

Announcements
How to improve your grade? Submit a course evaluation and drop your lowest homework score.
• To receive credit, submit before Tuesday, May 7th

Announcements
Final Project
• Design Doc sign‐up via CMS: sign up for Sunday, Monday, or Tuesday (May 5th, 6th, or 7th)
• Demo sign‐up via CMS: sign up for Tuesday, May 14th or Wednesday, May 15th
• CMS submission due: 6:30pm Wednesday, May 15th

Big Picture about the Future

Big Picture
How does a processor work? How is a computer organized?

[Figure: the five‐stage pipelined datapath – Instruction Fetch, Instruction Decode, Execute, Memory, and Write‐Back – separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Shown are the PC and instruction memory, the register file, immediate extend, control, hazard detection, the ALU and jump/branch target computation (producing the new pc), the data memory (addr, din, dout), and the forwarding unit.]
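To make the clocking concrete, here is a minimal sketch (it is not the course's pipeline simulator, and the type and variable names, e.g. Instr and if_id, are made up for illustration): on each clock edge every pipeline register is copied back‐to‐front, so each in‐flight instruction advances exactly one stage per cycle.

```
// Minimal sketch of pipeline clocking; names are illustrative only.
#include <stdio.h>

typedef struct { int valid; int pc; } Instr;  /* stand-in for an in-flight instruction */

int main(void) {
    Instr if_id = {0}, id_ex = {0}, ex_mem = {0}, mem_wb = {0};  /* pipeline registers */
    int pc = 0;

    for (int cycle = 0; cycle < 8; cycle++) {
        /* Clock edge at the end of this cycle: copy back-to-front so nothing is overwritten. */
        mem_wb = ex_mem;        /* Memory  -> Write-Back */
        ex_mem = id_ex;         /* Execute -> Memory     */
        id_ex  = if_id;         /* Decode  -> Execute    */
        if_id.valid = 1;        /* Fetch the next instruction */
        if_id.pc = pc;
        pc += 4;

        /* The instruction fetched in cycle 0 reaches MEM/WB at the end of cycle 3
           and is written back during cycle 4. */
        printf("end of cycle %d: MEM/WB holds pc=%d (valid=%d)\n",
               cycle, mem_wb.pc, mem_wb.valid);
    }
    return 0;
}
```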

What’s next?

More of Moore

Moore’s Law

Moore’s Law introduced in 1965

• Number of transistors that can be integrated on a single  die would double every 18 to 24 months (i.e., grow  exponentially with time).

Amazingly visionary:
• 2,300 transistors, 1 MHz clock (Intel 4004) – 1971
• 16 million transistors (UltraSPARC III)
• 42 million transistors, 2 GHz clock (Intel Xeon) – 2001
• 55 million transistors, 3 GHz, 130nm technology, 250mm² die (Intel Pentium 4) – 2004
• 290+ million transistors, 3 GHz (Intel Core 2 Duo) – 2007
• 731 million transistors, 2–3 GHz (Intel Nehalem) – 2009
• 1.4 billion transistors, 2–3 GHz (Intel Ivy Bridge) – 2012
(a back‐of‐the‐envelope check of the doubling claim follows)
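A quick sanity check of the claim, assuming one doubling every 24 months starting from the 4004's 2,300 transistors in 1971 (a simplification of the 18–24 month range; the years below are just the slide's data points):

```
// Back-of-the-envelope check of "transistor count doubles every 24 months".
#include <stdio.h>
#include <math.h>

int main(void) {
    const double start = 2300.0;              /* Intel 4004, 1971 */
    const int years[] = {1971, 2001, 2012};
    for (int i = 0; i < 3; i++) {
        double doublings = (years[i] - 1971) / 2.0;   /* one doubling per 2 years */
        printf("%d: ~%.2e transistors predicted\n",
               years[i], start * pow(2.0, doublings));
    }
    /* Prints ~2.3e3, ~7.5e7, and ~3.4e9. The actual parts (2,300 for the 4004,
       42 million for the 2001 Xeon, 1.4 billion for Ivy Bridge) track the
       prediction within a small factor across four decades. */
    return 0;
}
```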

Moore's Law
[Figure: transistor counts over time, from the 4004, 8008, 8080, 8088, 286, 386, 486, and Pentium through the P4, Atom, K8, K10, Itanium 2, Dual‐core Itanium 2, and Ivy Bridge.]

Why Multicore?
Moore's law
• A law about transistors
• Smaller means more transistors per die
• And smaller means faster too

But: power consumption is growing too…

What to do with all these transistors?

Multi‐core

Multi‐core

http://www.theregister.co.uk/2010/02/03/intel_westmere_ep_preview/

The first transistor
• on a workbench at AT&T Bell Labs in 1947
• Bardeen, Brattain, and Shockley

An Intel Westmere
– 1.17 billion transistors
– 240 square millimeters
– 32 nanometer transistor gate width
– Six processing cores
– Release date: January 2010

Multi‐core

http://forwardthinking.pcmag.com/none/296972‐intel‐releases‐ivy‐bridge‐first‐processor‐with‐tri‐gate‐transistor

The first transistor
• on a workbench at AT&T Bell Labs in 1947
• Bardeen, Brattain, and Shockley

An Intel Ivy Bridge
– 1.4 billion transistors
– 160 square millimeters
– 22 nanometer transistor gate width
– Up to eight processing cores
– Release date: April 2012

What to do with all these transistors?

Many‐core and Graphics Processing Units (GPUs)

Faster than Moore's Law
[Figure: peak graphics performance in polygons/sec (10^4 to 10^9, log scale) versus year, 1986–2000 and beyond. The slope is roughly 2.4×/year, compared with ~1.7×/year for Moore's Law, passing milestones of flat shading, Gouraud shading, antialiasing, textures, and one‐pixel polygons (~10M polygons @ 30 Hz). Systems plotted include SGI (Iris, GT, VGX, SkyWriter, RE1, RE2, IR, R‐Monster, Cobalt), HP (CRX, VRX, TVRX), UNC Pxpl4/Pxpl5 and UNC/HP PixelFlow, Division (Pxpl6, VPX), Stellar GS1000, Megatek, E&S (F300, Freedom, Harmony), 3DLabs Glint, Accel/VSIS Voodoo, and PC graphics parts (Nvidia TNT and GeForce, ATI Radeon 256, nVidia G70). Graph courtesy of Professor John Poulton (from Eric Haines)]

AMD's Answer: Hybrid CPU/GPU

Cell (IBM/Sony/Toshiba): Sony PlayStation 3
• PPE (Power Processing Element) plus SPEs (Synergistic Processing Elements)

Parallelism
Must exploit parallelism for performance
• Lots of parallelism in graphics applications
• Lots of parallelism in scientific computing

SIMD: single instruction, multiple data
• Perform the same operation in parallel on many data items
• Data parallelism

MIMD: multiple instruction, multiple data
• Run separate programs in parallel (on different data)
• Task parallelism
(a small sketch contrasting the two follows)
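A minimal sketch of the contrast, with made‐up array names and sizes: the data‐parallel loop applies one operation to every element (what SIMD hardware and GPU threads exploit), while the independent sum and max functions are separate tasks that MIMD cores could run concurrently (they run sequentially here to keep the sketch self‐contained).

```
// Data parallelism vs. task parallelism, in miniature.
#include <stdio.h>
#define N 8

/* Data parallelism (SIMD-style): the same operation applied to many data items. */
void vector_add(const float *a, const float *b, float *c) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* Task parallelism (MIMD-style): different computations that could run on different cores. */
float sum(const float *a)  { float s = 0;    for (int i = 0; i < N; i++) s += a[i];              return s; }
float maxv(const float *a) { float m = a[0]; for (int i = 1; i < N; i++) if (a[i] > m) m = a[i]; return m; }

int main(void) {
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    vector_add(a, b, c);                          /* data-parallel work */
    printf("c[3]=%.1f  sum=%.1f  max=%.1f\n",     /* two independent tasks */
           c[3], sum(a), maxv(a));
    return 0;
}
```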

NVidia Tesla Architecture

Why are GPUs so fast?

FIGURE A.3.1 Direct3D 10 graphics pipeline. Each logical pipeline stage maps to GPU hardware or to a GPU processor.  Programmable shader stages are blue, fixed‐function blocks are white, and memory objects are grey. Each stage processes a vertex,  geometric primitive, or pixel in a streaming dataflow fashion. Copyright © 2009 Elsevier, Inc. All rights reserved.

Pipelined and parallel. Very, very parallel: 128 to 1,000 cores.

FIGURE A.2.5 Basic unified GPU architecture. Example GPU with 112 streaming processor (SP) cores organized in 14 streaming  multiprocessors (SMs); the cores are highly multithreaded. It has the basic Tesla architecture of an NVIDIA GeForce 8800. The  processors connect with four 64‐bit‐wide DRAM partitions via an interconnection network. Each SM has eight SP cores, two special function units (SFUs), instruction and constant caches, a multithreaded instruction unit, and a shared memory. Copyright © 2009  Elsevier, Inc. All rights reserved.

General computing with GPUs
Can we use these for general computation?
Scientific computing
• MATLAB codes
• Convex hulls
• Molecular dynamics
• Etc.

NVIDIA's answer: Compute Unified Device Architecture (CUDA)
• MATLAB/Fortran/etc. → "C for CUDA" → GPU codes (a minimal kernel is sketched below)
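A minimal "C for CUDA" sketch of the idea (assuming a machine with the CUDA toolkit; the kernel and variable names such as vecAdd, da, and db are illustrative, not from the lecture): each GPU thread computes one output element, which is exactly the data parallelism described earlier.

```
// Vector addition with one GPU thread per element.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) c[i] = a[i] + b[i];                   // one element per thread
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host arrays
    float *ha = (float*)malloc(bytes), *hb = (float*)malloc(bytes), *hc = (float*)malloc(bytes);
    for (int i = 0; i < n; i++) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device arrays and host-to-device copies
    float *da, *db, *dc;
    cudaMalloc((void**)&da, bytes);
    cudaMalloc((void**)&db, bytes);
    cudaMalloc((void**)&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    int threads = 256, blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);  // copy-back also synchronizes

    printf("c[0] = %f\n", hc[0]);   // expect 3.0
    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

Compiled with nvcc, the same source file holds both the host (CPU) code and the __global__ kernel; the <<<blocks, threads>>> launch asks the GPU to run roughly n lightweight threads in parallel.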

What to do with all these transistors?

Cloud Computing

Cloud Computing
Datacenters are becoming a commodity: order one online and have it delivered
• Datacenter in a box: already set up with commodity hardware & software (Intel, Linux, a petabyte of storage)
• Plug in data, power & cooling and turn it on
  – typically connected via optical fiber
  – may have a network of such datacenters

Cloud Computing = Network of Datacenters

Cloud Computing
Enable datacenters to coordinate over vast distances
• Optimize availability, disaster tolerance, and energy
• Without sacrificing performance
• "Cloud computing"

Drive underlying technological innovations.

Cloud Computing

Vision
The promise of the Cloud
• A computer utility; a commodity
• A catalyst for the technology economy
• Revolutionizing health care, financial systems, scientific research, and society

However, cloud platforms today
• Entail significant risk: vendor lock‐in vs. control
• Entail inefficient processes: energy vs. performance
• Entail poor communication: fiber optics vs. COTS endpoints

Example: Energy and Performance
Why don't we save more energy in the cloud? No one deletes data anymore!
• Huge amounts of seldom‐accessed data

Data deluge
• Google (YouTube, Picasa, Gmail, Docs), Facebook, Flickr
• Data arrives at roughly 100 GB per second (about 8.6 PB per day) – faster than hard disk capacity growth!
• Max amount of data accessible at one time