thesis - The University of Texas at Austin

This thesis focuses on the logical design of binary adders. It covers topics ... the method of parallel prefix analysis can be used to unify the conventional.


[...] addition follows that for carry select, except the formation of the sums is left until the end:

1. For each block perform both steps simultaneously:

(a) Form the zero conditional carry (group generate) by calculating the carry-out according to the ripple carry algorithm while holding the carry-in at zero.

(b) Form the one conditional carry (group transfer) by conjunct-ing (and-ing) all of the transfer bits together.

2. Take the carry-out from the bottom block, and select the correct carry-out for the next block; continue until the top block has a correct carry-in.

3. Perform ripple carry additions in each block to form the final sums.

Figure 3.8 shows an 8 bit carry skip add composed of four two bit blocks. The first two rows show the operand bits. [...] two rows show the conditional carries derived from the operand bits. The next row shows the actual block carry outputs. The 0 conditional carry output of the bottom block is an actual carry, so this value is just copied from [...] row. The other block carry outputs are selected in series. The next to last row shows the intermediate carries for the blocks, which are produced in a ripple carry fashion. The bottom row shows the sums, which are found by column-wise exclusive-or-ing the two operand bits and the carry bit.

The level of performance of this algorithm is linear since the carries ripple serially across N/n blocks, where n is the constant number of bits per block.2

2We use the variable N for adder widths. Small adders can be arrayed to make larger adders; [...] a member of such an array [...] is signified with k instead of n. Conventionally block widths [...]
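The three steps above can be sketched in a few lines of Python. This is only an illustrative model, not the thesis's own notation: the function names (`to_bits`, `carry_skip_add`, `add_ints`), the little-endian bit lists, and the use of the or-form transfer `a | b` are assumptions made for the example.

```python
def to_bits(x, n):
    """Little-endian list of the n low bits of x (hypothetical helper)."""
    return [(x >> i) & 1 for i in range(n)]


def carry_skip_add(a_bits, b_bits, block=2):
    """One-level carry skip addition on little-endian bit lists."""
    n = len(a_bits)
    assert n == len(b_bits) and n % block == 0
    starts = range(0, n, block)

    # Step 1: per block, form both conditional carries.
    cond0, cond1 = [], []
    for lo in starts:
        g, t = 0, 1
        for a, b in zip(a_bits[lo:lo + block], b_bits[lo:lo + block]):
            g = (a & b) | ((a | b) & g)  # (a) ripple with carry-in held at zero
            t &= a | b                   # (b) and of all transfer bits
        cond0.append(g)
        cond1.append(t)

    # Step 2: select the actual block carries in series, bottom block first.
    carry_ins = [0]
    for g, t in zip(cond0, cond1):
        carry_ins.append(g | (t & carry_ins[-1]))

    # Step 3: ripple inside each block, now with a correct carry-in.
    sums = []
    for lo, c in zip(starts, carry_ins):
        for a, b in zip(a_bits[lo:lo + block], b_bits[lo:lo + block]):
            sums.append(a ^ b ^ c)
            c = (a & b) | ((a | b) & c)
    return sums, carry_ins[-1]


def add_ints(x, y, n=8, block=2):
    """Convenience wrapper: add two n-bit integers, returning x + y."""
    sums, carry_out = carry_skip_add(to_bits(x, n), to_bits(y, n), block)
    return sum(bit << i for i, bit in enumerate(sums)) + (carry_out << n)
```

The serial loop in step 2 visits one block per iteration, which is where the N/n term in the linear performance estimate comes from.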

[Figure 3.8: Carry Skip Addition. Row labels: A bits, B bits, generates, transfers, conditional carries 0, conditional carries 1, actual block carry outs, carries within block.]

$O(\mathrm{carry\_skip}(\mathrm{worst\_operands})) = N$ (3.28)

3.6.3 Carry Lookahead

Carry lookahead addition is performed by following these steps:

1. Calculate bit generates and propagates/transfers.

2. Evaluate each carry equation in parallel.

3. Use the carries and propagates to calculate the sums.

Carry lookahead is based on the fact that Boolean logic equations can always be evaluated in two stages of logic.

$c_1 = g_0 + p_0 c_0$ (3.29)

$c_2 = g_1 + p_1 g_0 + p_{1:0} c_0$ (3.30)

$c_3 = g_2 + p_2 g_1 + p_{2:1} g_0 + p_{2:0} c_0$ (3.31)

[...] (3.32)

$O(\mathrm{carry\_lookahead}(\mathrm{worst\_operands})) = C$ (3.33)

Where C is a constant.
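As a rough illustration of the steps above, the following Python sketch evaluates every carry from its own flattened and-or expression rather than from the previous carry, so the carry equations are independent of one another. The function names, the little-endian bit lists, and the choice of the xor form for the propagate bits are assumptions made for this example.

```python
def to_bits(x, n):
    """Little-endian list of the n low bits of x (hypothetical helper)."""
    return [(x >> i) & 1 for i in range(n)]


def lookahead_add(a_bits, b_bits, c0=0):
    """Carry lookahead addition; returns (sum bits, carries c_0 .. c_n)."""
    n = len(a_bits)
    # Step 1: bit generates and propagates.
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    def group_p(hi, lo):
        """Group propagate p_{hi:lo}: the and of p[lo] through p[hi]."""
        out = 1
        for k in range(lo, hi + 1):
            out &= p[k]
        return out

    # Step 2: each carry is a two level and-or expression of g, p and c0 only.
    carries = [c0]
    for i in range(n):
        terms = [g[i]]
        terms += [group_p(i, j + 1) & g[j] for j in range(i)]
        terms.append(group_p(i, 0) & c0)
        carries.append(int(any(terms)))

    # Step 3: the sums use the propagates and the carries.
    sums = [p[i] ^ carries[i] for i in range(n)]
    return sums, carries
```

In software the loop over `i` still runs sequentially; the point is only that no iteration reads a carry computed by an earlier one, which is what allows the hardware to evaluate all of them at once.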

Such calculations, based on two level and-or evaluation with unlimited fan-in/fan-out, ignore several real effects. This isn't the issue, however, as all adder models which show less than linear order evaluation time are ignoring real effects.3 The issue is whether the model appropriately predicts the dominant device effects for the size of adder of interest. In the case of the carry lookahead adder, this is only the case for N of a very few bits.

3The problem of the propagation of information over a distance is at best linear time.

Chapter 4

Gate Delay Models for the Conventional Adders

during my study of the adder unit I got the idea of solving virtually all statements today we speak of data or information with yes/no values. We realized that this principle could be applied to all computing machine components, especially to the control device, and led to switching algebra with the aid of propositional calculus.

K. Zuse

In this chapter we show gate level logical implementations of the adder algorithms discussed in the previous chapter. After obtaining a net list for an algorithm, we derive the worst case path lengths through the net lists.1 A net list is often an ASCII file which contains a list of all the gate instances with their input and output pins labeled, along with a list of the named connections (nets) between the pins. Various attributes of the circuit can be attached to the nodes, pins, and nets, such as device sizes, connection lengths, and parasitics. The connection information in the net list can be formally interpreted as a graph. Hence, the net list of an adder is the nexus of its logical topology, gate implementation, and physical parameters. Apparently the net list is the appropriate place to start our study of adder implementations.

1By gate level we mean common gates such as nand and nor. Such macros are typically instantiated as objects in the net list [...] all of the macros in the net list would be expanded out to the transistor [...] physical information [...] for the gates when needed.

According to the graph interpretation, a gate delay count is a length of a path through the net list.2 Because net lists are computer based representations, functions for describing net list properties such as path lengths can simply be implemented as computer programs. The functions in this chapter have a syntax akin to a procedure call and accept as input objects such as net lists, nets, pins, and numbers.

In general, the longest path through a net list can not be assumed to be the slowest to evaluate, since evaluation time is also a function of the device physics and geometries involved, but it is a reasonable heuristic. It was apparent from the previous chapter that the rate limiting sequence of steps is the propagation of the carries. Propagate, generate, and sum formation are typically constant time operations. [...] worst case path lengths of the conventional adder [...] carry paths.

The simple timing model approach can be improved upon by using an improved timing model3 [...] of the nodes crossed instead of a simple [...] traversed. This refinement is necessary when optimizing multiple paths [...] not all the same. To obtain accurate estimates of the worst case evaluation time, device sizes and loads should be extracted from layout along with parasitics, and simulated with SPICE. The worst result obtained from SPICE is the worst case evaluation time estimate.

2In actuality, gate delay counts are not directly proportional to actual evaluation times, [...] path lengths for determining evaluation time [...]

3Of course there must exist [...] inputs which activate the paths recognized by [...] the dual graph [...]
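The graph interpretation of a net list can be illustrated with a small Python sketch. The dictionary format, the net names, and the function `gate_delay_count` are hypothetical and chosen only for this example; they are not the net list syntax or the functions used in the thesis. Each driven net maps to the gate that drives it and that gate's input nets, and the gate delay count of a net is the length of the longest path from any primary input.

```python
from functools import lru_cache

# Hypothetical net list for a single full adder bit:
# net name -> (gate type, input nets). Nets absent from the dict are primary inputs.
FULL_ADDER = {
    "p": ("XOR", ("a", "b")),
    "g": ("AND", ("a", "b")),
    "s": ("XOR", ("p", "cin")),
    "pc": ("AND", ("p", "cin")),
    "cout": ("OR", ("g", "pc")),
}


def gate_delay_count(netlist, net):
    """Length, in gates, of the longest path from a primary input to `net`."""

    @lru_cache(maxsize=None)
    def depth(name):
        if name not in netlist:  # primary input: no gates crossed yet
            return 0
        _gate, fanin = netlist[name]
        return 1 + max(depth(src) for src in fanin)

    return depth(net)
```

Here `gate_delay_count(FULL_ADDER, "cout")` counts three gates (xor, and, or) and `gate_delay_count(FULL_ADDER, "s")` counts two. Every gate is given unit delay, which is exactly the simplification the text cautions about.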

Note, there are five distinct usages of and in this thesis (the other operators are analogous). There is the grammatical version in text, 'and'. The propositional logic operator, '∧'. The word for referring to the logical operation, 'and'. The word for referring to the gate which performs this operation, 'and'. Finally, the name of the macro function called out in the net list, 'AND'. All of these forms carry unique information. We have endeavored to use them consistently; however, there were some ambiguous situations. For example, when a net list macro was used to show how a schematic was implemented.

4.1 Ripple Carry

Figure 4.1 shows three variations in logic for creating the propagate and generate signals, or the transfer and generate [...]. The variations are all based on:

$ab = (a \lor b)\,\overline{(a \oplus b)}$ (4.1)

[...] xor implementations are shown in Figure 4.2. [...] two gate delays in the worst case [...]

Figure 4.3 shows the implementation of two and three input NAND, NOR, and the OR-NAND and AND-NOR macros. In general, higher fan-in gates are slower. The delay is roughly related to the number of series transistors between the output and the power supply rails; this is known as the "stack height". Hence, the performance of the OR-NAND and AND-NOR is comparable with the three input NOR and NAND [...]
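The following Python sketch checks, by exhaustive enumeration, a standard fact about the ripple carry recurrence: the carries are the same whether the per-bit signal is formed as a propagate (exclusive-or) or as a transfer (or). It is offered as an illustration of why such logic variations are interchangeable, not as a model of the specific circuits in Figure 4.1; the function names are invented for the example.

```python
from itertools import product


def ripple_carries(a_bits, b_bits, c0, use_transfer):
    """Carries c_0 .. c_n of a ripple carry adder on little-endian bits."""
    carries = [c0]
    for a, b in zip(a_bits, b_bits):
        g = a & b                                # bit generate
        p = (a | b) if use_transfer else (a ^ b)  # transfer or propagate
        carries.append(g | (p & carries[-1]))
    return carries


def variants_agree(width=4):
    """True if both forms give identical carries for every operand pair."""
    for bits in product((0, 1), repeat=2 * width + 1):
        a, b, c0 = bits[:width], bits[width:2 * width], bits[-1]
        if ripple_carries(a, b, c0, True) != ripple_carries(a, b, c0, False):
            return False
    return True
```

The two forms differ only when both operand bits are one, and in that case the generate term already forces the carry, so the difference never reaches the output.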

[Figure 4.1: Generate and Propagate Logic Blocks]

[Figure 4.2: XOR and XNOR Blocks]

[Figure: transistor level schematics between Vdd and Vss. Panels: 2-NAND, 2-NOR, 3-NAND, 3-NOR.]


Neither of [...] shown in figures 6.4 and 6.5 [...] an optimum carry skip adder [...] equal block sizes. A faster adder can be created by varying the block sizes. Note, [...] Brent and Kung's [...]. It is apparent from context which cells contain fco logic.

[...] simplified figure [...] by darkening the circles where the fco logic is. This is shown in Figure 6.6, where the cells have been packed to the left. This corresponds to a probable layout. There is one fewer fco delay here than in the worst case path through [...] adder.

Many adders are made from hybrid [...] of different technologies [...] Manchester carry chains with restoring inverter stages [...] nonrestoring dynamic circuits [...]. Hence, it may be useful to generalize prefix graphs by placing time on the x axis and N (bits) on the y axis. Figure 6.7 shows the time diagram for the one level carry skip adder.

Figure 6.8 shows a generalized prefix graph for the [...] carry skip adder. The shared fco logic chains are marked with identical letters (A, B, C, or D). For figure 6.8 we have assumed that evaluation time of the Manchester carry chain blocks is [...] square law based [...] x^2, [...] 3x, where x is the "width" of the MCC block. Thus, the skipped carry is seen traveling across the adder in linear time, while the evaluating carry [...] is parabolic. For example, at time 6ns block c has just finished evaluating its first pass, and [...] for the second [...] carry would be between the third and fourth bits.

[Figure 8.2: Transforming Standard Ripple to Majerski Style Ripple]

Although it is possible to apply Majerski's [...] in a tree because it is associative (see chapters 7 and 8), a complication is that the group g signals are still needed in the carry path [...]. The adder can be simplified by using t in place of p [...]; then k is no longer required [...] p, k, and [...] just t, but the requirement for group g remains.

8.3 Ling's Adder

Ling suggested using the fco variant [...] equation 8.14, and then procrastinating the final and. In Ling's paper the subscripts go in the opposite direction, and do not follow the same [...]. To make the [...] consistent with the rest of the thesis we [...] find:

[...] (8.33)

[...] $H_{i+1}$ can be propagated between bits:

[...] (8.34)

[...] $H_{i+1}$, and then the carry must be recovered by assembling $c_{i+1}$ [...] the pieces:

[...] (8.35)

8.4 Other Adders

Reed, et al. [...] the gamma of [...] $(g_i \lor a)$ [...] initial value for carry. In the form