A Tutorial on Hybrid PLL Design for Synchronization in Wireless Receivers (companion paper to the workshop “Synchronization and Receiver Structures in Digital Wireless Communications”)

Yair Linn
University of British Columbia, Vancouver, BC, Canada
[email protected]

Abstract - Coherent reception in digital wireless communications involves generating a local carrier that is in phase with the received carrier, and then using this local carrier in order to demodulate the received signal. Generation of the local carrier is generally done via a Phase Lock Loop (PLL). The receiver also includes a symbol timing synchronization PLL, whose purpose is to determine the optimal time for sampling of the received signal in order to sample each symbol at its peak and thus minimize the error rate. The objective of the workshop entitled “Synchronization and Receiver Structures in Digital Wireless Communications” [1] is to develop a systematic method for the incorporation of digital signal processing theory into the design and implementation of hybrid PLLs used in digital communications. Additionally, the workshop aims to provide an overview of all the important elements in the receiver, their operation, and their interactions. In this companion paper we shall only outline the core material of the workshop while omitting many mathematical derivations and other important content that shall only be presented in the workshop. Nonetheless, an attempt has been made to make this paper self-contained and thus it is hoped that this paper can serve as a useful reference for workshop attendees and for others.

I. INTRODUCTION

Until the mid-to-late 1980s, most literature on coherent receivers was devoted to analog implementations. In the past decade or so, some books (including [2], [3]) and many papers (e.g. [4], [5]) have been published that deal exclusively with digital receivers, i.e. those in which the received bandpass or near-baseband signal is sampled and the entire demodulation process is done digitally. Many contemporary coherent receivers, however, have neither a completely analog nor a completely digital implementation. Such receivers have hybrid architectures similar to Fig. 1, in which I-Q demodulation is done in the analog domain using a coherent carrier that is controlled by a digitally implemented loop filter and phase detector. This structure is particularly suited to high-data-rate communications (for example, in current technology, symbol rates above 50 MHz), where sampling and real-time processing of the IF or near-baseband signal is either impossible or uneconomical.

Using the architecture of Fig. 1 allows the designer to enjoy architectural benefits which are unavailable when using a completely analog or completely digital receiver. As alluded to earlier, sampling the baseband signal rather than the IF or near-baseband signal generally allows less expensive samplers, operating at a lower clock rate, to be used. Furthermore, the fact that downconversion of the incoming signal is done in the analog domain considerably simplifies the digital logic required, since the latter is relieved of the need to perform downconversion of the complex (i.e. I-Q) sampled signal (see [2 Chap. 4] for a description of such a completely digital receiver operating on a near-baseband signal). At the same time, having the I-Q demodulator in the analog domain driven by an analog local carrier allows an arbitrarily high IF frequency f_IF to be used with the aid of an external mixer and a fixed oscillator (see Fig. 1). Thus, with a relatively inexpensive low-frequency DDS (Direct Digital Synthesizer) and an additional (relatively inexpensive) fixed RF oscillator, the use of an expensive and difficult-to-use high-frequency VCO (Voltage Controlled Oscillator) is averted. Moreover, the long-term stability and phase noise characteristics achievable with DDS chips are difficult to attain using a VCO.

The advantages of implementing the phase detector and the matched and loop filters digitally are numerous. First and perhaps foremost, the repeatability of filter and transient response specifications that can be achieved via a digital implementation is exceedingly difficult to duplicate in an analog implementation. Second, arbitrarily complicated phase detector and filter structures may be implemented, with the complexity limited only by the amount of logic and computing power available for the digital section’s implementation. Finally, in this very non-exhaustive list, the implementation of certain synchronization loop elements by digital means allows testability and probing with accuracy and availability that is hard to attain in a completely analog system; for example, if the loop filter is implemented digitally then its (digital) input may be monitored by a computer console or analyzed in real time using the FFT.


Though the digital portion of the receiver could be implemented either in hardware or in software, the workshop and this paper focus on fixed-point hardware implementations. The reason is that, first, fixed-point hardware implementations, as shall be shown, will always have distinct performance advantages due simply to the fact that fixed-point hardware implementations can always be made to operate faster than any software and/or floating point implementation. Secondly, more intriguing challenges are present when trying to design a hardware system: while a software system could be implemented in a high-level language, in contrast when implementing a fixed-point hardware system the designer must explicitly address such issues as scaling and overflow, logic resource usage, implementation of mathematical operations, etc. In this paper, we shall endeavor to bridge the gap between, on the one hand, what is known in DSP, PLL and control theory, and, on the other hand, the architectural and implementation issues pertaining to hybrid receivers. The process will begin with rigorous analysis of “classical” or analog PLL theory, and we will arrive at a step-by-step method with which we can design analog PLLs. Subsequently we shall develop a mathematical model that describes the hybrid PLL. Then, the implementation of the components of that model will be dwelled upon, culminating with the presentation of a step-by-step procedure for hybrid PLL design.

[Fig. 1 block diagram. Blocks: IF Input, IF Filter, I and Q mixers driven by 2cos(2π f_IF t + Δω t + θo) and −2sin(2π f_IF t + Δω t + θo), 90° phase shifter, Fixed Oscillator, DDS (Direct Digital Synthesizer), Antialiasing Filters with outputs I(t) and Q(t), samplers producing I(nTs) and Q(nTs), Matched Filters h(t), Phase Detector, Loop Filter.]

Fig. 1 – General structure of a hybrid carrier PLL for digital communications. The parts within the dashed line are implemented digitally, while the rest are analog components (the samplers and DDS are mixed-signal components). 1/Ts is the sample rate.

II. UNDERSTANDING THE MEANING OF SYNCHRONIZATION

Before delving into mathematics, it is worthwhile to attain an intuitive understanding of the meaning of carrier and symbol synchronization in a receiver. To this end, let us assume a BPSK communications system where the baseband data pulse is rectangular, and let us look at the waveforms at the outputs of the I and Q matched filters (note that the post-matched-filter pulse shape is triangular). The system’s symbol rate is denoted as 1/T.

In Fig. 2 we see the situation that occurs when there is perfect carrier and symbol synchronization (a noiseless BPSK signal is assumed). The small circles denote the samples (at 1 sample/symbol). As we can see, the receiver’s carrier is in complete synchronization with the input carrier (as seen by the fact that the Q channel is always 0) and the symbol timing synchronization loop is also working perfectly (as seen by the fact that the samples of the I channel are always taken at the symbol’s peak). Conversely, in Fig. 3 we see what happens when the carrier loop is unlocked and there is also a timing mismatch between the receiver and transmitter symbol clocks. The carrier frequency error manifests itself in the signal meandering between the I and Q channels, while the timing error manifests itself in that the samples (the small circles) are offset from the peaks of the symbols. Clearly, data cannot be recovered under these conditions.

Fig. 2 – Ideal situation: no carrier error, no timing error.

Fig. 3 – Carrier frequency error of 1/(50·T). Timing error of T/4.

It is also instructive to look at various scenarios using I-Q graphs. In Fig. 4 we see a BPSK signal where the carrier PLL is locked. On the left-hand side of Fig. 4 we see a signal with E_S/N_0 = 7 dB where the symbol timing recovery is perfect. On the right, we see the effects of a timing error of T/4. Clearly, the timing error causes some data points to cross the decision region boundary (which is the vertical line I = 0), hence worsening the error rate as compared to the perfect-timing recovery case.

Fig. 4 – I-Q graphs for E_S/N_0 = 7 dB. Left: no timing or carrier error. Right: no carrier error, timing error of T/4.

On the left of Fig. 5 we see the effect of carrier frequency error on the I-Q graph; obviously, in this case no data recovery is possible. The case of a constant carrier phase error is shown on the right-hand side. Though data recovery is possible, the error rate will suffer tremendously because many more of the recovered symbols have crossed the decision region boundary I = 0 (as compared to Fig. 4, left).

Fig. 5 – I-Q graphs. Left: E_S/N_0 = 15 dB with carrier frequency error, no timing error. Right: E_S/N_0 = 7 dB with constant carrier phase error of 0.4π, no timing error.

In Fig. 6 we see the effects of carrier phase jitter. On the left, we see how an I-Q graph would look for E_S/N_0 = 30 dB for an ideally synchronized receiver. On the right, we see the effects of carrier phase jitter, that is, a phase error θe which has a zero-mean Gaussian distribution with var(θe) = 20°. Although for BPSK and E_S/N_0 = 30 dB the effect upon the error rate will be mild, it would be much more severe for a lower E_S/N_0 and/or a higher-order modulation (e.g. QPSK, 8-PSK, etc.).

Clearly, as we have seen in this section, achieving carrier and symbol timing synchronization is crucial for proper data recovery. In the receiver, the synchronized carrier and symbol clock are generated by using PLLs. In this paper we treat only carrier synchronization, though the application to symbol timing PLLs is straightforward (see [1]).

Fig. 6 – I-Q graphs for E_S/N_0 = 30 dB. Left: no carrier or frequency error. Right: Gaussian carrier phase jitter with var(θe) = 20°.

III. ANALOG PLL THEORY

A PLL’s purpose is to generate an output signal $y_o(t)$ that is in phase and frequency synchronization with the input signal $y_i(t)$. A general schematic of the PLL is shown in Fig. 7. There, we have $y_i(t) = A \sin(\omega_i t + \theta_i(t))$ and $y_o(t) = B \cos(\omega_i t + \theta_o(t))$. The fact that both $y_i(t)$ and $y_o(t)$ contain the term $\omega_i t$ does not necessarily mean that we assume that their frequencies are equal; we have simply incorporated the frequency-error induced phase difference, defined as $\Delta\omega \cdot t$, into $\theta_o(t)$ (see [6 Sec. 3.1]). The purpose of the PLL is to control the VCO in such a manner that $\theta_i(t) = \theta_o(t)$. Note that $y_i(t)$ and $y_o(t)$ are in quadrature (i.e. one is a sine and the other is a cosine), but a signal that is phase-coherent with $y_i(t)$ is easily obtained (when in lock) via a 90° phase shift network applied to $y_o(t)$ (not shown in Fig. 7).

[Fig. 7 block diagram. Blocks: Phase Detector, Loop Filter, VCO; input $y_i(t)$, output $y_o(t)$.]

Fig. 7 – General diagram of a PLL showing input and output signals.

A. The equivalent baseband nonlinear model

The problem with PLLs is that $\theta_i(t)$, the phase information upon which the PLL operates, is “hidden” within the input signal $y_i(t) = A \sin(\omega_i t + \theta_i(t))$. The only way that we can extract phase information upon which the PLL can operate is to use a nonlinearity in the phase detector. This makes exact theoretical analysis of the PLL very difficult because we are dealing with nonlinear differential equations. For example, consider the phase detector defined as $PD(y_i(t), y_o(t)) = y_i(t) \cdot y_o(t)$. We then have:

$$
\begin{aligned}
PD(y_i(t), y_o(t)) &= y_i(t) \cdot y_o(t) \\
&= A \sin(\omega_i t + \theta_i(t)) \cdot B \cos(\omega_i t + \theta_o(t)) \\
&= \frac{AB}{2}\Big( \sin(\theta_i(t) - \theta_o(t)) + \sin(2\omega_i t + \theta_i(t) + \theta_o(t)) \Big)
\end{aligned}
\tag{1}
$$

If we assume that the loop filter obliterates the double-frequency term, then we can denote the effective phase detector output as:

$$
PD(y_i(t), y_o(t)) = 0.5\,AB \sin(\theta_i(t) - \theta_o(t)) \tag{2}
$$

Note that (2) is a nonlinear function of $\theta_i(t)$ and $\theta_o(t)$. Analysis of the PLL is done in the phase domain. We already saw in (2) that the effective phase detector output is a function of $\theta_i(t)$ and $\theta_o(t)$, not of $y_i(t)$ and $y_o(t)$. Thus, an equivalent baseband nonlinear model of the PLL is given in Fig. 8.
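The step from (1) to (2) can be checked numerically. The following is a minimal sketch, assuming that the loop filter's removal of the double-frequency term can be stood in for by averaging the product over an integer number of carrier periods; the function name and all numeric values are illustrative and do not come from the paper.

```python
import math

def multiplier_pd_average(A, B, theta_i, theta_o,
                          f_i=1000.0, n_periods=50, samples_per_period=200):
    """Average of y_i(t) * y_o(t) over an integer number of carrier periods.

    The averaging is an idealized stand-in (an assumption) for a loop filter
    that completely removes the double-frequency term of eq. (1).
    """
    w_i = 2.0 * math.pi * f_i
    n = n_periods * samples_per_period
    total = 0.0
    for k in range(n):
        t = k / (f_i * samples_per_period)
        y_i = A * math.sin(w_i * t + theta_i)   # input signal
        y_o = B * math.cos(w_i * t + theta_o)   # VCO output (in quadrature)
        total += y_i * y_o
    return total / n

# Illustrative amplitudes and phases (hypothetical values).
A, B = 2.0, 1.5
theta_i, theta_o = 0.7, 0.2

measured = multiplier_pd_average(A, B, theta_i, theta_o)
predicted = 0.5 * A * B * math.sin(theta_i - theta_o)   # eq. (2)
print(measured, predicted)
```

The two printed values agree to within floating-point rounding, because the double-frequency term averages to zero over whole carrier periods.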

Let’s make some preliminary and general assumptions regarding the loop components, as shown in Fig. 9:

• The phase detector is a nonlinear device that gives as its output a function of the phase difference $\theta_i(t) - \theta_o(t)$. We define this phase detector function as $PD(\theta_i(t) - \theta_o(t))$.

• The loop filter is linear time invariant (LTI) with impulse response $f(t)$.

• The VCO is a device that outputs a frequency as a function of the voltage. Frequency is the derivative of the phase. Therefore, we can write for the VCO $\frac{d}{dt}\theta_o(t) = K_V \cdot u_2(t)$. In the Laplace domain, the VCO transfer function is therefore, from inspection, $V(s) = \theta_o(s)/U_2(s) = K_V/s$. We see that in the phase domain the VCO is an integrator.

Thus we have the following:

$$ u_1(t) = PD(\theta_i(t) - \theta_o(t)) \tag{3} $$

$$ u_2(t) = u_1(t) \otimes f(t) \tag{4} $$

$$ \frac{d}{dt}\theta_o(t) = K_V \cdot u_2(t) \tag{5} $$

We can combine (3)-(5) to reach the differential equation:

$$ \frac{d}{dt}\theta_o(t) = K_V \cdot \big( PD(\theta_i(t) - \theta_o(t)) \otimes f(t) \big) \tag{6} $$

Eq. (6) is problematic because it is a nonlinear differential equation (because the function $PD$ is generally nonlinear). For the forthcoming nonlinear analysis, we assume the phase detector has a sinusoidal function, i.e. that:

$$ PD(\theta_i(t) - \theta_o(t)) = K_d \sin(\theta_i(t) - \theta_o(t)) \tag{7} $$

$K_d$ is called the phase detector gain. For example, for the multiplier phase detector we have from (2) that $K_d = AB/2$.

[Fig. 8 block diagram. Blocks: Phase Detector, Loop Filter, VCO; signals $\theta_i(t)$, $\theta_o(t)$, $u_1(t)$, $u_2(t)$.]

Fig. 8 – General equivalent baseband model of the PLL in the phase domain.

[Fig. 9 block diagram. Blocks: $PD(\theta_i(t) - \theta_o(t))$, loop filter, and VCO with $\frac{d}{dt}\theta_o(t) = K_V \cdot u_2(t)$; signals $\theta_i(t)$, $\theta_o(t)$, $u_1(t)$, $u_2(t)$.]

Fig. 9 – Nonlinear equivalent baseband model of the PLL.

B. The loop filter

The loop filters for the vast majority of PLLs are extremely simple (see Sec. III.G and [7 Sec. 2.4]). We thus limit ourselves to discussing the following loop filters.

1) Constant gain
The simplest loop filter is simply a constant gain, as follows:

$$ F(s) = K_a \tag{8} $$

2) Lag filter
The lag filter is a simple lowpass filter:

$$ F(s) = K_a \frac{1}{1 + s\tau_1} \tag{9} $$

3) Lag-lead filter
The lag-lead filter is:

$$ F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1} \tag{10} $$

4) Integrator-lead filter
The integrator-lead filter is:

$$ F(s) = K_a \frac{1 + s\tau_2}{s\tau_1} \tag{11} $$

C. Nonlinear PLL equations

Using these filter functions, we can arrive at nonlinear differential equations for each of the filter cases. Let us define for convenience the loop gain as $K \triangleq K_d K_a K_V$. We then can use eq. (6), along with the equivalence of the Laplace variable to the derivative operator in the time domain (that is, $s = \mathcal{L}\{d(\cdot)/dt\}$) to arrive at [6 Sec. 3.5]:

• For $F(s) = K_a$:

$$ \frac{d\theta_o}{dt} = K \sin(\theta_i - \theta_o) \tag{12} $$

• For $F(s) = K_a \frac{1}{1 + s\tau_1}$:

$$ \tau_1 \frac{d^2\theta_o}{dt^2} + \frac{d\theta_o}{dt} = K \sin(\theta_i - \theta_o) \tag{13} $$

• For $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$:

$$ \tau_1 \frac{d^2\theta_o}{dt^2} + \frac{d\theta_o}{dt} = K \sin(\theta_i - \theta_o) + K\tau_2 \cos(\theta_i - \theta_o)\left( \frac{d\theta_i}{dt} - \frac{d\theta_o}{dt} \right) \tag{14} $$

• For $F(s) = K_a \frac{1 + s\tau_2}{s\tau_1}$:

$$ \tau_1 \frac{d^2\theta_o}{dt^2} = K \sin(\theta_i - \theta_o) + K\tau_2 \cos(\theta_i - \theta_o)\left( \frac{d\theta_i}{dt} - \frac{d\theta_o}{dt} \right) \tag{15} $$
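Although the nonlinear equations are hard to solve analytically, they are straightforward to integrate numerically. The following is a minimal sketch, assuming a forward-Euler integration of the lag-lead equation (14) with an input that has a constant frequency offset, $\theta_i(t) = \Delta\omega \cdot t$. The function name, step size and parameter values are illustrative choices, not taken from the paper.

```python
import math

def simulate_lag_lead_pll(K, tau1, tau2, delta_omega, t_end, dt=1e-3):
    """Forward-Euler integration of the nonlinear lag-lead PLL equation (14).

    The input phase is theta_i(t) = delta_omega * t (a constant frequency
    offset). Returns the phase error theta_i - theta_o at t = t_end.
    """
    theta_o = 0.0    # VCO output phase
    omega_o = 0.0    # d(theta_o)/dt
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        theta_e = delta_omega * (k * dt) - theta_o
        # Eq. (14) solved for the second derivative of theta_o.
        accel = (-omega_o
                 + K * math.sin(theta_e)
                 + K * tau2 * math.cos(theta_e) * (delta_omega - omega_o)) / tau1
        theta_o += dt * omega_o
        omega_o += dt * accel
    return delta_omega * (n_steps * dt) - theta_o

# Hypothetical normalized loop: K = 100, tau1 = 100 s, tau2 = 1.89 s.
final_error = simulate_lag_lead_pll(K=100.0, tau1=100.0, tau2=1.89,
                                    delta_omega=0.5, t_end=30.0)
print(final_error)
```

With these values the loop locks and the phase error settles to the point where the derivatives in (14) vanish, i.e. $\sin\theta_e = \Delta\omega / K$. A smaller step size or a higher-order integrator would be a natural refinement for stiffer parameter choices.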

The filters are easy to implement in analog form (see [7 Chap. 2]). However, we shall not discuss the analog implementation of the loop filters because we are interested in implementing them digitally.

D. Linearized PLL model

Eqs. (12) to (15) are nonlinear differential equations which are very difficult to solve theoretically (although it should be noted that they are very easy to simulate). Thus, for theoretical analysis, what we can do is to assume that the PLL is locked and that the phase error is small enough so that we can approximate $\sin(\theta_i(t) - \theta_o(t)) \approx \theta_i(t) - \theta_o(t)$. A general block diagram of the linearized PLL model is shown in Fig. 10.

There is an important conclusion from the assumption of small $\theta_e(t)$ (where $\theta_e(t) \triangleq \theta_i(t) - \theta_o(t)$) that led to the linear model. We can deduce that the linear model is valid for small $\theta_e(t)$ for any phase detector for which $PD(\theta_e(t)) \to K_d \theta_e(t)$ as $\theta_e \to 0$. In fact, all phase detectors which have good performance will have $PD(\theta_e(t)) \to K_d \theta_e(t)$ as $\theta_e \to 0$. Thus, we can conclude that the linear model is valid for all phase detectors, provided that $\theta_e(t) \to 0$. The only thing we need to do in order to model a particular phase detector is to ascertain its $K_d$.

Using the small signal assumption $\theta_e = \theta_i - \theta_o \to 0$ we can simplify (12)-(15) to the following linear differential equations (using $\cos(\theta_i - \theta_o) \to \cos(0) = 1$ and $\sin(\theta_i - \theta_o) \to \theta_i - \theta_o$):

• For $F(s) = K_a$:

$$ \frac{d\theta_o}{dt} + K\theta_o = K\theta_i \tag{16} $$

• For $F(s) = K_a \frac{1}{1 + s\tau_1}$:

$$ \tau_1 \frac{d^2\theta_o}{dt^2} + \frac{d\theta_o}{dt} + K\theta_o = K\theta_i \tag{17} $$

• For $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$:

$$ \tau_1 \frac{d^2\theta_o}{dt^2} + (1 + K\tau_2)\frac{d\theta_o}{dt} + K\theta_o = K\tau_2 \frac{d\theta_i}{dt} + K\theta_i \tag{18} $$

• For $F(s) = K_a \frac{1 + s\tau_2}{s\tau_1}$:

$$ \tau_1 \frac{d^2\theta_o}{dt^2} + K\tau_2 \frac{d\theta_o}{dt} + K\theta_o = K\tau_2 \frac{d\theta_i}{dt} + K\theta_i \tag{19} $$

[Fig. 10 block diagram. Summing junction forming $\theta_e = \theta_i - \theta_o$, gain $K_d$, loop filter $F(s)$, VCO $V(s) = K_V/s$, output $\theta_o$.]

Fig. 10 – General block diagram of linearized PLL model.

[Fig. 11 block diagram. Summing junction forming $\theta_e = \theta_i - \theta_o$, gain $K_d$, loop filter $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$, VCO $V(s) = K_V/s$, output $\theta_o$.]

Fig. 11 – Linearized PLL model using the lag-lead filter.
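Since the linear model needs only $K_d$ for a given phase detector, it is useful to see how $K_d$ can be read off a phase detector characteristic and how quickly the linear approximation degrades. The sketch below is illustrative only: it assumes the sinusoidal multiplier characteristic of (2) with made-up amplitudes, estimates $K_d$ as the slope at $\theta_e = 0$ by a central difference, and reports the relative error of the approximation $PD(\theta_e) \approx K_d \theta_e$.

```python
import math

def s_curve_slope(pd, h=1e-4):
    """Central-difference estimate of the phase detector slope at theta_e = 0."""
    return (pd(h) - pd(-h)) / (2.0 * h)

def linearization_error(pd, K_d, theta_e):
    """Relative error of the linear model K_d * theta_e against pd(theta_e)."""
    return abs(pd(theta_e) - K_d * theta_e) / abs(pd(theta_e))

# Hypothetical amplitudes for the multiplier phase detector of eq. (2).
A, B = 2.0, 1.5

def multiplier_pd(theta_e):
    return 0.5 * A * B * math.sin(theta_e)

K_d = s_curve_slope(multiplier_pd)   # close to A*B/2
print(K_d)
print(linearization_error(multiplier_pd, K_d, 0.1))   # small phase error
print(linearization_error(multiplier_pd, K_d, 1.0))   # large phase error
```

For this characteristic the linear model is accurate to a fraction of a percent at 0.1 rad but is off by well over 10% at 1 rad, which is why the linearized equations are only trusted when the loop is locked.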

E. Solutions of the linearized PLL equations

Once we are given $\theta_i(t)$ we can solve the linear differential equations (16)-(19) in the time domain (see for example [7 Chap. 4], [6 Chap. 5]). However, perhaps a more informative approach is to solve the equations in the Laplace domain. We exemplify this for a PLL using the lag-lead filter $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$. The open loop function is:

$$ G(s) = K_d F(s) V(s) = \frac{K_d F(s) K_V}{s} \tag{20} $$

The closed loop system function of the system in Fig. 11 is:

$$
\begin{aligned}
H(s) = \frac{\theta_o(s)}{\theta_i(s)} = \frac{G(s)}{1 + G(s)}
&= \frac{K_d K_a \frac{1 + s\tau_2}{1 + s\tau_1} \frac{K_V}{s}}{1 + K_d K_a \frac{1 + s\tau_2}{1 + s\tau_1} \frac{K_V}{s}} \\
&= \frac{K_d K_a K_V (1 + s\tau_2)}{s(1 + s\tau_1) + K_d K_a K_V (1 + s\tau_2)}
= \frac{\frac{K\tau_2}{\tau_1} s + \frac{K}{\tau_1}}{s^2 + \frac{1 + K\tau_2}{\tau_1} s + \frac{K}{\tau_1}}
= \frac{P(s)}{Q(s)}
\end{aligned}
\tag{21}
$$

This corresponds to the transfer function of a second order system, whose denominator is often written in the form of:

$$ Q(s) = s^2 + 2\zeta\omega_n s + \omega_n^2 \tag{22} $$

where $\zeta$ is known as the damping ratio and $\omega_n$ is known as the natural (radian) frequency. $\zeta$ and $\omega_n$ have been extensively studied in control literature, and their effects on the temporal and frequency response characteristics of control loops have been thoroughly documented. The reader is directed for example to [7], [8], [9], and [10] for more details. Comparing (21) and (22), we find that:

$$ \omega_n = \sqrt{K/\tau_1} \tag{23} $$

and using (21), (22) and (23):

$$
\zeta = \frac{1}{2\omega_n}\left( \frac{1}{\tau_1} + \frac{K\tau_2}{\tau_1} \right)
= \frac{1}{2\omega_n}\left( \frac{\omega_n^2}{K} + \omega_n^2 \tau_2 \right)
= \frac{\omega_n}{2}\left( \tau_2 + \frac{1}{K} \right)
= \frac{1}{2}\sqrt{\frac{K}{\tau_1}}\left( \tau_2 + \frac{1}{K} \right)
\tag{24}
$$

and by plugging (23) and (24) into (21) we get:

$$ H(s) = \frac{\left( 2\zeta - \frac{\omega_n}{K} \right)\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{25} $$
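Relations (23) and (24) are easy to evaluate and to invert. The following sketch is only an illustration with made-up numbers: it maps $(K, \tau_1, \tau_2)$ to $(\omega_n, \zeta)$, inverts the mapping by solving (23) for $\tau_1$ and (24) for $\tau_2$, and compares the coefficients against the denominator in (21). The function names and the example values ($f_n$ = 1 kHz, $K = 100\,\omega_n$) are assumptions for the example.

```python
import math

def loop_parameters(K, tau1, tau2):
    """Natural frequency and damping ratio from eqs. (23) and (24)."""
    omega_n = math.sqrt(K / tau1)
    zeta = 0.5 * omega_n * (tau2 + 1.0 / K)
    return omega_n, zeta

def filter_time_constants(K, omega_n, zeta):
    """Inverse mapping: solve (23) for tau1 and (24) for tau2."""
    tau1 = K / omega_n ** 2
    tau2 = 2.0 * zeta / omega_n - 1.0 / K
    return tau1, tau2

# Hypothetical target: f_n = 1 kHz, zeta = 0.95, loop gain K = 100 * omega_n.
omega_n_target = 2.0 * math.pi * 1000.0
zeta_target = 0.95
K = 100.0 * omega_n_target

tau1, tau2 = filter_time_constants(K, omega_n_target, zeta_target)
omega_n, zeta = loop_parameters(K, tau1, tau2)

# Coefficients of Q(s) in eq. (21) should match the standard form (22).
q1 = (1.0 + K * tau2) / tau1   # should equal 2 * zeta * omega_n
q0 = K / tau1                  # should equal omega_n ** 2
print(tau1, tau2, omega_n, zeta)
```

The round trip reproduces the target $\omega_n$ and $\zeta$, and `q1`, `q0` match $2\zeta\omega_n$ and $\omega_n^2$ up to rounding.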

As is customary, we assume a high-gain loop, as defined by:

$$ K \gg \omega_n \tag{26} $$

There is no true loss of generality incurred by making this assumption, as virtually all PLLs (indeed, all control loops) are intentionally designed to have a high loop gain. This is because a high loop gain makes the loop bandwidth and damping factor relatively insensitive to variations in the gains $K_d$, $K_a$, and $K_V$ (see [8 Chap. 14]). After making the assumption of (26), eq. (25) can be reduced to:

$$ H(s) = \frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{27} $$

which is a transfer function form that has been analyzed extensively in the PLL and control systems literature (e.g. [7], [8], [12], [10]). Table 1 lists several key PLL characteristics as a function of the loop filter.

Table 1 – PLL Characteristics for 1st and 2nd-order PLLs (from [11 App. A], [10 Chap. 2])

F(s) | ωn | ζ | H(s)
-----|----|---|-----
$K_a$ | --- | --- | $\frac{K}{s + K}$
$K_a \frac{1}{1 + s\tau_1}$ | $\sqrt{K/\tau_1}$ | $\frac{1}{2\sqrt{K\tau_1}}$ | $\frac{\omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$
$K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ | $\sqrt{K/\tau_1}$ | $\frac{1}{2}\sqrt{\frac{K}{\tau_1}}\left(\tau_2 + \frac{1}{K}\right)$ | $\frac{\left(2\zeta - \frac{\omega_n}{K}\right)\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$
$K_a \frac{1 + s\tau_2}{s\tau_1}$ | $\sqrt{K/\tau_1}$ | $\frac{\tau_2}{2}\sqrt{\frac{K}{\tau_1}}$ | $\frac{2\zeta\omega_n s + \omega_n^2}{s^2 + 2\zeta\omega_n s + \omega_n^2}$

We want to be able to set $\zeta$ and $\omega_n$ independently, and at the same time ensure a high loop gain, that is $K \gg \omega_n$. The only filters that allow us to do this are $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ and $F(s) = K_a \frac{1 + s\tau_2}{s\tau_1}$. There is an advantage in using $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ when implemented digitally because it has a bounded dynamic range (whereas the integrator in the denominator of $F(s) = K_a \frac{1 + s\tau_2}{s\tau_1}$ could potentially cause overflow problems in the digital loop filter under certain conditions). We thus choose $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ for the remainder of this paper.

F. The phase error response

We can easily arrive at the transfer function that relates the phase error $\theta_e(s)$ to the input phase:

$$ H_e(s) \triangleq \frac{\theta_e(s)}{\theta_i(s)} = \frac{\theta_i(s) - \theta_o(s)}{\theta_i(s)} = 1 - \frac{\theta_o(s)}{\theta_i(s)} = 1 - H(s) \tag{28} $$

For our choice of filter $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ we have, for $K \gg \omega_n$:

$$ H_e(s) \approx \frac{s^2}{s^2 + 2\zeta\omega_n s + \omega_n^2} \tag{29} $$

G. 3rd-order and higher order loops

Sometimes, for very specialized applications, a 3rd-order or higher order PLL is necessary. This is generally necessary to be able to accurately track phase signals which include frequency accelerations or higher-order frequency changes. The problem with higher-order loops is that there is always the threat of them being unstable depending upon the loop gain. This can be easily seen in the Root-Locus plot (see [7 Sec. 2.5]). In contrast, 2nd-order PLLs are unconditionally stable. Therefore higher-order loops should be avoided if possible. This is why the vast majority of PLLs in digital communications are 2nd-order [7 Sec. 2.4].

IV. CHOICE OF PARAMETERS FOR THE ANALOG PLL

A. Choosing ζ

The choice of $\zeta$ is determined by our desire to achieve the fastest PLL response but with minimal overshoot. Moreover, we would like $\zeta$ to be optimal in terms of noise performance. To aid us in our choice we can look at [7 Table 7.2], [7 Chap. 2], and [13 Chap. 2]. We choose $\zeta = 0.95$ because:

• It is a good compromise between the optimal values for several optimization criteria (see [7 Table 7.2]).

• The response is underdamped (i.e. fast response), but not too much (i.e. not too much overshoot) (see [7 Chap. 2] and [13 Chap. 2]).

• We get a phase margin of about 75°, which doesn’t increase much if we increase $\zeta$ further (see [13 Fig. 2.4-3]).
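The phase-margin figure quoted for ζ = 0.95 in Sec. IV.A can be checked with a few lines of arithmetic. The sketch below assumes the high-gain closed-loop form (27), for which the corresponding open-loop function is $G(s) = (2\zeta\omega_n s + \omega_n^2)/s^2$ (so that $H = G/(1+G)$); the crossover frequency then follows from $|G(j\omega_c)| = 1$, and the phase margin is $\arctan(2\zeta\omega_c/\omega_n)$. The function name is illustrative.

```python
import math

def phase_margin_deg(zeta):
    """Phase margin of G(s) = (2*zeta*wn*s + wn^2) / s^2, in degrees.

    Frequencies are normalized to wn. |G(j*wc)| = 1 gives
    wc^4 - 4*zeta^2*wc^2 - 1 = 0, whose positive root is used below.
    """
    wc = math.sqrt(2.0 * zeta ** 2 + math.sqrt(4.0 * zeta ** 4 + 1.0))
    # Phase of G at crossover is atan(2*zeta*wc) - 180 degrees,
    # so the margin (180 degrees + phase) is simply atan(2*zeta*wc).
    return math.degrees(math.atan(2.0 * zeta * wc))

for zeta in (0.5, 0.707, 0.95, 1.3):
    print(zeta, round(phase_margin_deg(zeta), 1))
```

Under this assumption the margin for ζ = 0.95 comes out at about 75°, and raising ζ to 1.3 adds only a few more degrees, in line with the third bullet of Sec. IV.A.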

B. Finding the optimal natural radian frequency

Before going on to choose $\omega_n$ (or, equivalently, the natural frequency $f_n \triangleq \omega_n / 2\pi$) let’s characterize the PLL a little more. For a given PLL, some fundamental questions are:

• What is the frequency offset for which we are guaranteed that the PLL will lock within a given time? (Answer: the Lock Range $\Delta\omega_L$, and it will lock within $T_L$ seconds, the lock-in time.)

• What is the largest frequency step that the PLL can handle without losing lock? (Answer: the Pull-out Range $\Delta\omega_{PO}$.)

• What is the maximum frequency range where we are guaranteed that the PLL will lock eventually? (Answer: the Pull-in Range $\Delta\omega_P$.)

• What is the maximum frequency range within which the PLL can maintain lock if the input frequency is changed very slowly? (Answer: the Hold Range $\Delta\omega_H$.)

In Table 2 we see PLL parameters for the second order PLL. For more information see [10 Chap. 2].

Table 2 – PLL Parameters for second-order PLL with loop filter $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ (see [10 Chap. 2], [7 Chap. 4]).

Parameter | Meaning | Value
----------|---------|------
$\Delta\omega_L$ | Lock Range | $\Delta\omega_L \approx 2\zeta\omega_n$
$T_L$ | Lock-in time | $T_L \approx 2\pi/\omega_n = 1/f_n$
$\Delta\omega_{PO}$ | Pull-out range | $\Delta\omega_{PO} \approx 1.8\,\omega_n(\zeta + 1)$
$\Delta\omega_P$ | Pull-in range | $\Delta\omega_P \approx \frac{4}{\pi}\sqrt{2\zeta\omega_n K_d K_V}$
$\Delta\omega_H$ | Hold Range | $\Delta\omega_H \approx K$

Where should the PLL be operating? The pull-in process is too unreliable and too long for us to count on during normal operation. Therefore, we must design our PLL so that it operates in the lock range $\Delta\omega_L$ (note that the PLL will lock within $T_L$ seconds within the range $\pm\Delta\omega_L$). This, in turn, influences our choice of $\omega_n$ and may affect the stability requirements upon the transmission carrier.

Additional considerations in choosing $\omega_n$ emanate from the desire to achieve minimal phase-error jitter $\overline{\theta_e^2} = \overline{(\theta_i - \theta_o)^2}$. For detailed treatment of such optimizations, see for example [14] (especially [14 Chap. 10]) as well as [6 Chap. 8]. Here, we only briefly present the main aspects of the optimization procedure. Essentially, we want our output carrier to follow the input carrier exactly. Therefore, we want our PLL to pass the input carrier phase noise, which means we want a large $\omega_n$. Conversely, we want our output carrier to suppress the contribution of thermal-induced phase noise, which means we want a small $\omega_n$. These two constraints are contradictory. We therefore must optimize, i.e. find the best $\omega_n$ that will pass as much input carrier phase noise as possible but at the same time suppress as much thermal-induced phase noise as possible. It can be shown that this is equivalent to choosing $\omega_n$ in order to minimize $\overline{\theta_e^2} = \overline{(\theta_i - \theta_o)^2}$. Formally, suppose that the input carrier phase noise has a spectrum $\Phi_{\theta_{Pi}}(f)$ and that the thermal-induced phase noise has a spectrum $\Phi_{\theta_{Ti}}(f)$. It can be shown that the phase-error variance will be [11 App. A]:

$$ \overline{\theta_e^2} = \int_0^\infty \Phi_{\theta_{Ti}}(f)\,\big| H(j2\pi f) \big|^2 \, df + \int_0^\infty \Phi_{\theta_{Pi}}(f)\,\big| 1 - H(j2\pi f) \big|^2 \, df \tag{30} $$

The optimization according to minimal phase jitter (i.e. minimum $\overline{\theta_e^2}$) is to design the PLL so that (30) is minimized.

C. Other optimality criteria

Minimum phase jitter is not the only optimization criterion of the loop. The loop can be optimized for something else (e.g. to maximize the sweep rate, minimize the pull-in time, etc.). However, the minimum-jitter criterion is the most important and most widely used criterion in wireless communications. It is easy to see why: phase jitter has a direct effect upon the error rate (see for example Fig. 6 and [3 Fig 2.9]).

V. A STEP-BY-STEP PLL DESIGN PROCEDURE

We are now ready to present a step-by-step PLL design procedure. Assumed given are the input carrier phase noise characteristics and the input SNR (Signal to Noise Ratio), denoted as $SNR_i$ (see [15 Chap. 6] or the workshop notes [1] for a detailed treatment and definition of $SNR_i$). Clearly, the PLL must operate at various $SNR_i$, so we should design the PLL for the worst-case value (i.e., the lowest $SNR_i$). The procedure is shown in Fig. 12. It is quite straightforward. The most difficult step is determination of the optimal $\omega_n$, since this involves taking into account the carrier phase noise characteristics as well as computing the effects of thermal noise (see [14 Chap. 10], [6 Chap. 8]).

[Fig. 12 flowchart; recoverable box contents: choice of loop filter (we choose $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$); decide upon $\zeta$ (usually $0.8 \le \zeta \le 1.3$; we choose $\zeta = 0.95$); choose $\omega_n$ according to performance requirements; compute loop filter gain $K_a$, making sure it is large enough so that the loop gain $K = K_d K_a K_V \gg \omega_n$ and that steady-state errors are small enough; compute loop filter poles and zeros, where for $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ we have $\tau_1 = \frac{K}{\omega_n^2}$ and $\tau_2 = \frac{2\zeta}{\omega_n} - \frac{1}{K}$.]

Fig. 12 – Step-by-Step PLL design procedure for analog PLLs.

A. Design of PLLs for digital modulations

Although Fig. 12 was developed with no specific modulation in mind, it is entirely applicable to any modulation type. The only modulation-dependent items are (a) the phase detector and its gain $K_d$, and (b) the squaring loss (which influences the noise calculations and hence the choice of $\omega_n$, see for example [14 Chap. 10], [7 Sec. 11.2]). Both $K_d$ and the squaring loss are easy to calculate: $K_d$ is calculated from the slope of the phase detector S-Curve at $\theta_e = 0$ (this can be done via simple open-loop simulations), while the squaring loss is calculated either theoretically or via simple open-loop simulations (see for example [16], [17], [18], [2 Chap. 5, 6]).

VI. HYBRID IMPLEMENTATION OF THE PLL

A. Preliminary assumptions

In this paper we are dealing with a fixed-point hardware implementation of the hybrid receiver in Fig. 1. We shall assume that the system has a high data rate (for example above 1 MHz) for, as discussed in the introduction, it is for such high data rates that hybrid receivers become particularly attractive.

For the purposes of this discussion it is sufficient to broadly characterize the phase detector sample rate. A phase detector in digital communications usually produces a new phase estimate for every symbol, and thus the phase detector output sample rate is assumed to be the same order of magnitude as the symbol rate. Thus, if we denote the phase detector output sample rate as $f_p = 1/T_p$, then we have that $1/T_p \sim 1/T$, where $1/T$ is the symbol rate and “~” denotes equal orders of magnitude. Again, since we are dealing with high-data-rate communications systems, this rate is on the order of at least several MHz.

Carriers that are used in coherent communications generally have phase noise whose content can be assumed non-negligible up to a distance from the carrier of 1 kHz to at most several kHz; for example, a common specification for a frequency source used for coherent digital communications is that its phase noise have a spectral density of about −70 dBc/Hz at a 1 kHz distance from the carrier. Such a specification can be found for example in the DVB-S2 standard (the European home satellite dish broadcasting standard), where the specification is −68 dBc/Hz at 1 kHz (see [19]). For coherent communications $f_n$ is in general the same order of magnitude as the bandwidth of the significant phase-noise content of the received carrier. While this bandwidth is thus specific to the particular carrier and operating conditions of the particular receiver, we can safely say that the order of magnitude for $f_n$ is at most several kHz. An important conclusion can now be inferred. We can safely say that:

$$ f_p \gg f_n \tag{31} $$

The relation expressed in (31) will turn out to be of pivotal importance. In fact, it is the relation in (31) that differentiates the analysis of the system discussed in this paper from that of just any arbitrary hybrid control loop.

B. The Direct Digital Synthesizer (DDS)

In the hybrid implementation of the PLL, the loop filter is implemented digitally, and the VCO is implemented through the use of a Direct Digital Synthesizer. A Direct Digital Synthesizer (DDS) is an Integrated Circuit (IC) that accepts a digital control word (hereafter referred to as the tuning word) at its input, and outputs a continuous-time sinusoidal waveform whose frequency corresponds to that tuning word. The DDS consists of a Numerically Controlled Oscillator (NCO) that is the input to a Digital to Analog Converter (DAC), which is followed by a smoothing filter that rejects the image frequencies that appear at the DAC output ([11 App. A], [20]). It is stressed that the waveform produced by the DDS is phase continuous, regardless of the frequencies through which its output transitions. This is a fundamental requirement if the DDS is to be used in a PLL. The linearized equivalent PLL model for Fig. 1 is shown in Fig. 13.

[Fig. 13 block diagram. Summing junction forming $\theta_e = \theta_i - \theta_o$, Phase Detector with gain $K_d$ (data rate $1/T_p$), Loop Filter $B(z)$, Direct Digital Synthesizer with output $\theta_o$.]

Fig. 13 – Hybrid PLL equivalent linearized baseband model.

An example of a DDS is the AD9851 chip manufactured by Analog Devices, discussed in [20]. The advantages of using a Direct Digital Synthesizer instead of a VCO include:

1. The frequency of the DDS can be controlled precisely by the tuning word, typically to a precision of a fraction of a Hertz.

2. The phase noise and frequency drift of the DDS are primarily determined by its clock source, which can be a fixed-frequency crystal oscillator that can have excellent characteristics at very inexpensive prices.

3. Parasitic analog effects, which often plague VCO circuits, are eliminated when a DDS is used. This includes, for example, the sensitivity of the VCO to its power supply; often, this sensitivity causes frequency modulation of the VCO output due to power supply fluctuations, thus severely hampering the VCO's operation.

As seen in Fig. 14, the tuning word controls the output frequency by determining by how much the phase of the output wave increases at each rising edge of the reference clock. Denoting this clock rate as $f_D = 1/T_D$, the number of bits in the phase accumulator register as $N$, and the tuning word as $w_D$, the DDS's output frequency is:

$$f_w = \frac{w_D}{2^N}\cdot\frac{1}{T_D} = \frac{w_D}{2^N} f_D \qquad (32)$$

Of course, (32) is only valid so long as the Nyquist criterion is observed, namely that there are at least two samples in each period of the output sinusoid, so that the DAC can reconstruct the waveform without aliasing. This means that (32) is valid so long as:

$$f_w = \frac{w_D}{2^N} f_D < \frac{f_D}{2} \qquad (33)$$

which will only occur if $w_D < 2^{N-1}$. Thus the maximum value of the control word that can theoretically produce an unaliased sinusoid is $w_D = 2^{N-1} - 1$, and that waveform will have a frequency of

$$f_w = \frac{2^{N-1} - 1}{2^N}\cdot f_D \xrightarrow{\;N \to \infty\;} \frac{f_D}{2}.$$

However, this limit cannot be achieved nor approached in practice because the post-DAC filter needed to separate the fundamental frequency from its first image (which is at f D − f w , see [20]) will be unrealizable. A more practical limit which allows for a realizable output filter is f w ,max ≈ 0.4 f D .
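The relations (32) and (33) can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, and the values $N = 32$ and $f_D = 180$ MHz are illustrative assumptions chosen only to show the orders of magnitude involved.

```python
def dds_output_frequency(w_d: int, n_bits: int, f_d: float) -> float:
    """Output frequency f_w of (32) for tuning word w_d."""
    if not 0 <= w_d < 2 ** (n_bits - 1):
        raise ValueError("tuning word violates the Nyquist condition (33)")
    return w_d / 2 ** n_bits * f_d


def dds_tuning_word(f_w: float, n_bits: int, f_d: float) -> int:
    """Tuning word whose output frequency is nearest to f_w (inverse of (32))."""
    w_d = round(f_w * 2 ** n_bits / f_d)
    if not 0 <= w_d < 2 ** (n_bits - 1):
        raise ValueError("requested frequency is outside [0, f_D/2)")
    return w_d


def dds_resolution(n_bits: int, f_d: float) -> float:
    """Frequency step per tuning-word increment, f_D / 2^N."""
    return f_d / 2 ** n_bits


if __name__ == "__main__":
    N_BITS, F_D = 32, 180e6  # assumed example values
    w = dds_tuning_word(70e6, N_BITS, F_D)
    print("tuning word:", w)
    print("actual output frequency [Hz]:", dds_output_frequency(w, N_BITS, F_D))
    print("frequency resolution [Hz]:", dds_resolution(N_BITS, F_D))
```

With these assumed values the resolution $f_D/2^N$ is a small fraction of a Hertz, which is consistent with the first advantage listed for the DDS.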

VII. MATHEMATICAL MODELING OF THE DDS

From Fig. 14 it emerges that the DDS outputs a given frequency for a given tuning word at its input. Assume that the control word of the DDS is updated at a rate of $f_u = 1/T_u$ Hz. $f_u$ is much lower than the reference clock of the DDS, $f_D$, because several samples of the NCO must be traversed between updates of the control word if those updates are to be expressed at the DDS output. A low update rate is further warranted by the time it takes to physically program the new control word into the DDS (i.e. to send the right control pulses and data values to the appropriate pins of the DDS chip). Thus we may write $f_u \ll f_D$.

[Fig. 14 – Explanation of DDS Operation.]

The rates in the loop are related by:

$$f_p \ge f_u \gg f_n \qquad (40)$$

The difference in magnitudes expressed in (40) can be quite dramatic. Consider for example a 100 MHz symbol rate data system, i.e. $1/T_p = 100 \cdot 10^6$ Hz. A typical value for $f_n$ is $f_n = 2000$ Hz. A typical value for the DDS update rate in this case, as we shall see in the subsequent sections, would be for example $1/T_u = 2 \cdot 10^6$ Hz. Given the sometimes substantial disparity between $f_p$ and $f_u$, one would strive to achieve two goals. First, we want to find the minimum value of $f_u$ that allows the PLL to function appropriately. Secondly, we want to implement the digital loop filter at the much lower sampling rate of $f_u$ rather than at rate $f_p$. Having established the goal of implementing $B(z)$ at a sampling rate of $f_u$, it is clear that the sampling rate of the data at the input of $B(z)$ needs to be reduced through the use of decimation, i.e. by filtering with a decimation filter and then using a decimator. This is shown in Fig. 19.

[Fig. 18 – Hybrid PLL model with DDS equivalent model inserted.]

[Fig. 19 – Hybrid PLL model with decimation. Blocks: phase detector $K_d$ (data rate $1/T_p$); decimation filter with passband $[-\pi/M, \pi/M]$; decimator $\downarrow M$; loop filter $B(z)$; DDS equivalent model, consisting of a DAC (digital-sequence-to-impulse-train converter at rate $1/T_u$ followed by a ZOH, $T_u e^{-sT_u/2}\,\frac{\sinh(sT_u/2)}{sT_u/2}$) and a VCO, $\frac{2\pi f_D}{2^N}\cdot\frac{1}{s}$.]

The decimation ratio in Fig. 19 is:

$$M = T_u / T_p = f_p / f_u \qquad (41)$$

An implicit assumption in the development of the model of Fig. 19 is that the spectral content of $K_d\theta_e(nT_p)$ is fully contained in the discrete-time frequency interval $\Omega \in [-\pi/M, \pi/M]$, or, equivalently, that $K_d\theta_e(t)$ is fully contained in the continuous domain in the interval $\left[-\frac{\pi}{M}\cdot\frac{f_p}{2\pi},\ \frac{\pi}{M}\cdot\frac{f_p}{2\pi}\right] = \left[-\frac{f_u}{2},\ \frac{f_u}{2}\right]$, where the last equality was facilitated by (41). This always holds. To see why, let us define $f \in [-f_{max}, f_{max}]$ as the frequency bandwidth of

$K_d\theta_e(t)$. It is easily seen (even intuitively) that we will have $f_{max} \sim f_n$, say for example $f_{max} = 2 f_n$. But from (40) we have $f_p \ge f_u \gg f_n$, so we also have $f_p \ge f_u \gg f_{max}$.

VIII. CALCULATION OF THE DIGITAL LOOP FILTER

The models obtained in the previous sections now allow us to compute the $B(z)$ for which a desired closed loop transfer function of the PLL is achieved. In this section, we shall assume that the virtual DAC (which is part of the DDS equivalent model), the decimation filter, and the decimator are all ideal (in later sections, we derive the conditions that ensure that these assumptions are appropriate). Under those conditions, the succession of sampling at rate $f_p$, decimation filtering, and decimation by $M$ is equivalent to sampling the signal $K_d\theta_e(t)$ at a rate of $f_u$. Thus, under the assumptions of this section, we may simplify Fig. 19 to Fig. 20.

A. How do we determine B(z)?

If one compares Fig. 20 to Fig. 11, it is seen by inspection that the blocks amalgamated and denoted collectively as $\hat{F}(s)$ in Fig. 20 correspond to the loop filter $F(s)$ that appears in Fig. 11. Since all other loop components are equal in both figures, if we can design a filter $B(z)$ so that $\hat{F}(s) = F(s)$, this will imply that the closed loop transfer functions of both figures are identical. Consequently, it can immediately be seen from the structure of $\hat{F}(s)$ that $B(z)$ may be deduced from $F(s)$ by using the bilinear transformation method [21 Sec. 7.1]. We now proceed to find $B(z)$ using that method. The filter $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ has a pole at $s = -1/\tau_1$ and a zero at $s = -1/\tau_2$. To use the bilinear transformation, we must first pre-warp [21 Sec. 7.1] these two frequency points:

$$\frac{1}{\tilde{\tau}_1} = \frac{1}{\pi T_u}\tan\left(\frac{\pi T_u}{\tau_1}\right) \quad \text{and} \quad \frac{1}{\tilde{\tau}_2} = \frac{1}{\pi T_u}\tan\left(\frac{\pi T_u}{\tau_2}\right) \qquad (42)$$

and then construct the pre-warped transfer function of the analog filter:

$$D(s) = K_a \frac{1 + s\tilde{\tau}_2}{1 + s\tilde{\tau}_1} \qquad (43)$$

[Fig. 20 – Hybrid PLL model, assuming ideal virtual DAC, ideal decimation filter, and ideal decimator. Blocks: $K_d$; $\hat{F}(s)$, comprising sampling at rate $f_u$, the loop filter $B(z)$, and an ideal DAC; VCO, $\frac{2\pi f_D}{2^N}\cdot\frac{1}{s}$.]

Now we can employ the bilinear transformation [21 Sec. 7.1] in order to determine $B(z)$ as follows:

$$B(z) = D(s)\Big|_{s = \frac{2}{T_u}\frac{1 - z^{-1}}{1 + z^{-1}}} = K_a \frac{1 + \tilde{\tau}_2 \frac{2}{T_u}\frac{1 - z^{-1}}{1 + z^{-1}}}{1 + \tilde{\tau}_1 \frac{2}{T_u}\frac{1 - z^{-1}}{1 + z^{-1}}} = K_a \frac{1 + \frac{2\tilde{\tau}_2}{T_u}}{1 + \frac{2\tilde{\tau}_1}{T_u}} \cdot \frac{1 + \frac{1 - 2\tilde{\tau}_2/T_u}{1 + 2\tilde{\tau}_2/T_u}\, z^{-1}}{1 + \frac{1 - 2\tilde{\tau}_1/T_u}{1 + 2\tilde{\tau}_1/T_u}\, z^{-1}} \qquad (44)$$

Now let us define:

$$\beta_1 = \frac{1 - 2\tilde{\tau}_2/T_u}{1 + 2\tilde{\tau}_2/T_u}, \qquad \alpha_1 = \frac{1 - 2\tilde{\tau}_1/T_u}{1 + 2\tilde{\tau}_1/T_u} \qquad (45)$$

and:

$$\gamma = K_a \frac{1 + 2\tilde{\tau}_2/T_u}{1 + 2\tilde{\tau}_1/T_u} \qquad (46)$$

then $B(z)$ may be expressed as:

$$B(z) = \gamma \cdot \frac{1 + \beta_1 z^{-1}}{1 + \alpha_1 z^{-1}} \qquad (47)$$

The transition to direct-form II implementation is immediate [21 Chap. 6] and is shown in Fig. 21.
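As an illustration of (45) to (47), the sketch below computes the filter coefficients and runs a floating-point direct-form II recursion. It is a hedged example rather than the author's implementation: the function names are hypothetical, the time constants passed in are assumed to be already pre-warped, and the numeric values in the usage section are arbitrary.

```python
def loop_filter_coefficients(k_a, tau1_pw, tau2_pw, t_u):
    """Return (gamma, beta1, alpha1) of (45)-(46).

    tau1_pw and tau2_pw are assumed to be the pre-warped time constants.
    """
    a = 2.0 * tau1_pw / t_u
    b = 2.0 * tau2_pw / t_u
    beta1 = (1.0 - b) / (1.0 + b)
    alpha1 = (1.0 - a) / (1.0 + a)
    gamma = k_a * (1.0 + b) / (1.0 + a)
    return gamma, beta1, alpha1


def direct_form_ii(x, gamma, beta1, alpha1):
    """Filter the sequence x with B(z) of (47) using a single delay element."""
    w_prev = 0.0  # contents of the z^-1 register
    y = []
    for sample in x:
        w = sample - alpha1 * w_prev              # feedback (pole) part
        y.append(gamma * (w + beta1 * w_prev))    # feedforward (zero) part
        w_prev = w
    return y


if __name__ == "__main__":
    gamma, beta1, alpha1 = loop_filter_coefficients(2.0, 5.0, 1.0, 1.0)
    step_response = direct_form_ii([1.0] * 500, gamma, beta1, alpha1)
    # The DC gain of B(z) equals K_a, so the step response settles at K_a.
    print(step_response[0], step_response[-1])
```

A useful sanity check on any set of coefficients is that $B(1) = \gamma (1 + \beta_1)/(1 + \alpha_1) = K_a$, i.e. the digital filter preserves the DC gain of $F(s)$.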

[Fig. 21 – Direct-form II implementation of B(z). Elements: input $x(n)$, output $y(n)$, one delay element $z^{-1}$, feedback coefficient $-\alpha_1$, feedforward coefficient $\beta_1$, and gain $\gamma$.]

IX. EFFECTS OF NON-IDEAL VIRTUAL DAC AND COMPUTATION OF REQUIRED DDS UPDATE RATE

The derivations of the previous section assumed that the (virtual) DAC (which is part of the DDS mathematical model (Fig. 16)) could be considered ideal. However, that DAC is not ideal, and in fact it was shown that it is a DAC of the ZOH type (Fig. 17). As shall be seen shortly, this observation has a profound impact on the PLL's performance and serves to constrain the rate $f_u$ above a certain minimum, which will be calculated. Modifying Fig. 20 to account for the DAC's non-ideality, while still assuming that the decimation process is perfect, results in the system depicted in Fig. 22.

[Fig. 22 – Hybrid PLL model, assuming ideal decimation but non-ideal DAC. Blocks: $K_d$; $\hat{F}(s)$, comprising sampling at rate $f_u$, the loop filter $B(z)$, and the DAC (digital-sequence-to-impulse-train converter at rate $1/T_u$ followed by a ZOH, $T_u e^{-sT_u/2}\cdot\frac{\sinh(sT_u/2)}{sT_u/2}$); VCO, $\frac{2\pi f_D}{2^N}\cdot\frac{1}{s}$.]

A. Influence of virtual DAC nonideality on the phase margin of the PLL

While amplitude interference from aliases passed by the non-ideal virtual DAC will cause a degradation in performance, it will not in general lead to PLL instability (see [1]). In contrast, as we shall now see, the effect upon the phase response of the signals in the PLL is much more important and will lead to a higher necessary minimum $f_u$. We analyze the effect upon the phase response by considering the effect on the phase margin of the PLL. If the DAC were ideal, we would have $\hat{F}(s) = F(s)$ (see Fig. 20 and Fig. 11), and then the open loop transfer function of the hybrid PLL would be $G(s) = K_d F(s) K_V / s$. We now assume that the aliases contribute negligibly to the signal at the output of the DAC (see [1] for justification). Furthermore, we assume that the ZOH magnitude distortion of the primary reconstructed signal is negligible (this is true due to $f_u \gg f_{max}$; see Sec. VII and [1]). Under these assumptions, we can reduce Fig. 22 to the circuit of Fig. 23.

[Fig. 23 – Hybrid PLL model, assuming DAC incurs only phase distortion of the fundamental reconstructed signal. Blocks: $K_d$; $\hat{F}(s)$, comprising sampling at rate $f_u$, the loop filter $B(z)$, an ideal DAC, and the delay $\exp(-sT_u/2)$; VCO, $\frac{2\pi f_D}{2^N}\cdot\frac{1}{s}$.]

A quick comparison of Fig. 23, Fig. 20, and Fig. 11 shows that in this case the open loop transfer function will be:

$$\hat{G}(s) = K_d \hat{F}(s) K_V / s = K_d\, e^{-sT_u/2} F(s) K_V / s = e^{-sT_u/2}\, G(s) \qquad (48)$$

The phase margin of a PLL is defined as the number of degrees above −180 that the phase of the open loop response possesses when its magnitude is unity [13 Chap. 2]. As a formula, this means that for the systems of Fig. 20 and Fig. 11, if $f_C$, which we call the crossover frequency, is the frequency at which $|G(j2\pi f_C)| = 1$, then the phase margin for those systems is:

$$PM(G) = \angle G(j2\pi f_C) - (-180) \ \text{Degrees} \qquad (49)$$

where $\angle G(j2\pi f_C)$ is in degrees. If we now turn our attention to the system of Fig. 23, we have from (48) that:

$$\left|\hat{G}(j2\pi f_C)\right| = \left|e^{-j\pi f_C T_u}\, G(j2\pi f_C)\right| = \left|G(j2\pi f_C)\right| = 1 \qquad (50)$$

that is, the crossover frequency remains unchanged. Evaluating the phase margin of $\hat{G}(s)$, we have:

$$PM(\hat{G}) = \angle \hat{G}(j2\pi f_C) - (-180) = \angle\left(e^{-j\pi f_C T_u}\right) + \angle G(j2\pi f_C) - (-180) = PM(G) - 180 \cdot f_C T_u \ \text{Degrees} \qquad (51)$$

We see that the effect of the phase shift is a decrease in the phase margin by $180 \cdot f_C T_u$ degrees. A decrease in the phase margin is an extremely undesirable occurrence, as it may cause an underdamped response of the loop and could lead to instability of the PLL (see [13 Chap. 2]). Thus we must find a condition on $f_u$ so that this decrease is tolerable. If we are willing to tolerate a decrease in the phase margin of $d_A$ degrees, we see from (51) that this means that

$$180 \cdot f_C T_u < d_A \qquad (52)$$

or:

$$f_u > \frac{180 \cdot f_C}{d_A} \qquad (53)$$

It can be shown that for a loop filter of the form $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$ we have [13 Sec. 2.4]:

$$f_C \approx 2\zeta f_n \qquad (54)$$

Thus from (53) and (54) we have:

$$f_u > \frac{180 \cdot f_C}{d_A} = \frac{180 \cdot 2\zeta f_n}{d_A} \qquad (55)$$

Exact linear-system theory analysis of models which include terms of the type $\exp(-s\tau)$ is impossible due to the non-polynomial nature of this term and the infinite number of root-locus branches which such terms generate [8 Sec. 7.12]. Instead, we compute the effects on the phase margin (PM) and then (if we assume $F(s) = K_a \frac{1 + s\tau_2}{1 + s\tau_1}$), using the relationship $\zeta \approx 0.01 \cdot PM$ (see [13 Fig. 2.4-3]), we can define an effective $\zeta$, denoted as $\zeta_{eff}$, which is defined as $\zeta_{eff} \triangleq 0.01 \cdot PM$. We can then think of the PLL with time delay as a second-order system with damping factor $\zeta_{eff}$.

Acceptable values of $\zeta$ are $0.7 \le \zeta \le 1.5$, where for carrier synchronization PLLs the recommended values are generally $0.8 \le \zeta \le 1.3$ (see [7 Chap. 7]). So in (55) we cannot adjust $\zeta$ freely, but only $f_u$. If we wish to limit the degradation in the phase margin to $d_A = 10^\circ$ w.r.t. the analog implementation, then, for $\zeta = 0.95$, using (55) this means that we need to have $f_u > \frac{180}{10} \cdot 2 \cdot 0.95 f_n = 34.2 f_n$. If we wish to limit the degradation to $d_A = 3^\circ$ then, for $\zeta = 0.95$, using (55) this means that we need to have $f_u > \frac{180}{3} \cdot 2 \cdot 0.95 f_n = 114 f_n$. Once $f_u$ is determined, the phase margin is computed, and for subsequent analysis the system is considered to be a second-order system with natural frequency $f_n$ and a damping factor of $\zeta_{eff} = 0.01 \cdot PM$. In Fig. 24 we see a Bode-diagram example of how the phase margin of the PLL is affected by the delay term $\exp(-sT_u/2)$. As we can see there, the crossover frequency remains unchanged, as expected, but the phase margin is degraded due to the presence of the delay term.
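The bound (55) and the approximation $\zeta_{eff} = 0.01 \cdot PM$ are simple enough to tabulate. The sketch below is illustrative only; the function names are hypothetical, and the phase-margin loss uses (51) together with the approximation (54).

```python
def min_update_rate(zeta: float, f_n: float, d_a_deg: float) -> float:
    """Lower bound on the DDS update rate f_u from (55), in Hz."""
    return 180.0 * 2.0 * zeta * f_n / d_a_deg


def phase_margin_loss_deg(zeta: float, f_n: float, f_u: float) -> float:
    """Phase-margin decrease 180 * f_C * T_u of (51), with f_C ~ 2*zeta*f_n (54)."""
    return 180.0 * 2.0 * zeta * f_n / f_u


def effective_damping(pm_deg: float) -> float:
    """Effective damping factor zeta_eff = 0.01 * PM (PM in degrees)."""
    return 0.01 * pm_deg


if __name__ == "__main__":
    for d_a in (10.0, 3.0):
        # Normalised to f_n = 1 Hz, so the result reads as a multiple of f_n.
        print(d_a, "degrees ->", min_update_rate(0.95, 1.0, d_a), "x f_n")
    # Example from Sec. VII: f_n = 2000 Hz and f_u = 2 MHz.
    print("loss [deg]:", phase_margin_loss_deg(0.95, 2000.0, 2e6))
```

The loop over $d_A$ reproduces the multiples of $f_n$ quoted in the text for $\zeta = 0.95$.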

[Fig. 24 – Bode diagram comparison of an example open loop function G(s) with and without delay.]

X. IMPLEMENTATION OF THE DECIMATION FILTER

We shall now analyze the decimation filter and find a structure for its efficient implementation in hardware. A filter with impulse response $h_{DF}(n)$ relates its output to its input through the well-known convolution relation:

$$y(n) = \sum_{m=-\infty}^{\infty} h_{DF}(m)\, x(n - m) \qquad (56)$$

If we decimate by $M$ at the output of this filter, that is, take only one out of every $M$ output samples, we have:

$$y(k) = \sum_{m=-\infty}^{\infty} h_{DF}(m)\, x(Mk - m) \qquad (57)$$

The decimation filter $h_{DF}(n)$ is an FIR filter [22] whose length we shall denote as $L$, and thus (57) can be reduced to:

$$y(k) = \sum_{m=0}^{L-1} h_{DF}(m)\, x(Mk - m) \qquad (58)$$

The direct-form realization of (58) is shown in Fig. 25.

[Fig. 25 – Direct-form implementation for $h_{DF}(n)$ (see [21 Chap. 6]): a tapped delay line of $z^{-1}$ elements fed by $x(n)$, with tap multipliers $h_{DF}(0), h_{DF}(1), h_{DF}(2), \ldots, h_{DF}(L-2), h_{DF}(L-1)$, whose summed output is decimated by $M$ to give $y(k)$.]

Regrettably, this direct-form implementation, while easy to visualize and analyze, often requires too many resources for it to be implemented in fixed-point hardware. To see this, note that there are $L$ multipliers that need to operate at rate $f_p$. Furthermore, the addition that produces the input to the decimator by $M$ also needs to operate at the high sample rate, which becomes increasingly difficult to implement for a large $L$. Some improvements can be attained by using alternate filter topologies such as those in [22 Fig. 6.28] and [22 Fig. 6.31]; however, even with these manipulations, the logic resource requirements are often still excessive. To make efficient hardware implementation possible, consider the unit-gain rectangular window filter, that is:

$$h_{DF}(n) = \begin{cases} \dfrac{1}{L} & 0 \le n \le L-1 \\ 0 & \text{otherwise} \end{cases} \qquad (59)$$

which has the following transfer function (using the summation formula $\sum_{n=0}^{N-1} \alpha^n = \frac{1 - \alpha^N}{1 - \alpha}$ [23 eq. 19.4]):

$$H_{DF}(e^{j\Omega}) = \sum_{n=-\infty}^{\infty} h_{DF}(n)\, e^{-j\Omega n} = \frac{1}{L}\sum_{n=0}^{L-1} \left(e^{-j\Omega}\right)^n = \frac{1}{L}\cdot\frac{1 - e^{-j\Omega L}}{1 - e^{-j\Omega}} = \frac{1}{L}\cdot\frac{e^{-j\Omega L/2}}{e^{-j\Omega/2}}\cdot\frac{\left(e^{j\Omega L/2} - e^{-j\Omega L/2}\right)/2j}{\left(e^{j\Omega/2} - e^{-j\Omega/2}\right)/2j} = e^{-j\Omega\frac{L-1}{2}}\cdot\frac{1}{L}\cdot\frac{\sin(\Omega L/2)}{\sin(\Omega/2)} \qquad (60)$$

The (one-sided) passband of that filter is $[0, 2\pi/L]$, and consequently, since we are aiming for a passband of $[0, \pi/M]$, we choose $L = 2M$, and (60) is transformed to:

$$H_{DF}(e^{j\Omega}) = e^{-j\Omega\frac{2M-1}{2}}\cdot\frac{1}{2M}\cdot\frac{\sin(\Omega M)}{\sin(\Omega/2)} \qquad (61)$$

The response of this filter for $M = 8$ is shown in Fig. 26.
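The closed form (61) can be cross-checked against the defining sum in (60). The following is a small numerical sketch with hypothetical function names; it only verifies the algebra and does not reproduce Fig. 26.

```python
import cmath
import math


def h_df_by_sum(omega: float, m: int) -> complex:
    """H_DF(e^{j omega}) evaluated directly from the DTFT sum, with L = 2M."""
    length = 2 * m
    return sum(cmath.exp(-1j * omega * n) for n in range(length)) / length


def h_df_closed_form(omega: float, m: int) -> complex:
    """H_DF(e^{j omega}) from the closed form (61)."""
    if omega == 0.0:
        return 1.0 + 0.0j  # limit of sin(omega*M) / (2M*sin(omega/2)) as omega -> 0
    delay = cmath.exp(-1j * omega * (2 * m - 1) / 2.0)
    return delay * math.sin(omega * m) / (2 * m * math.sin(omega / 2.0))


if __name__ == "__main__":
    M = 8
    for i in range(0, 9):
        omega = i * math.pi / 8
        print(round(omega, 3), abs(h_df_closed_form(omega, M)))
```

The magnitude is unity at $\Omega = 0$ and has its first null at $\Omega = \pi/M$, which is the edge of the one-sided passband $[0, 2\pi/L]$ for $L = 2M$.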

[Fig. 26 – Amplitude response $|H_{DF}(e^{j\Omega})|$ for M=8 (2M=16).]

As can be seen in Fig. 26, there is some distortion in the passband, and the sidelobes can be considered relatively high. However, it can be shown (see the workshop notes [1]) that these effects cause an increase of at most 10.9% (or less than 0.5 dB) in the noise power inside the loop, which is acceptable for the overwhelming majority of applications. The rectangular filter therefore provides acceptable performance when it serves as a decimation filter inside the PLL. We now show that dramatic savings in logic resources can be attained (w.r.t. Fig. 25) if that rectangular filter is implemented wisely. Combining (58) and (59), and since $L = 2M$, we have:

$$y(k) = \frac{1}{2M}\sum_{m=0}^{2M-1} x(Mk - m) \qquad (62)$$

Now let us separate (62) into even and odd samples:

$$y(k) = \begin{cases} \dfrac{1}{2M}\displaystyle\sum_{m=0}^{2M-1} x(2Mr - m) & k = 2r \\[2ex] \dfrac{1}{2M}\displaystyle\sum_{m=0}^{2M-1} x(2Mr - M - m) & k = 2r - 1 \end{cases} \qquad (63)$$

An illustration of (63) is given in Fig. 27, which immediately suggests the simple implementation shown in Fig. 28.

[Fig. 27 – Explanation of decimation by M using a rectangular window filter. Shown is operation for M=4 (2M=8).]

[Fig. 28 – Efficient hardware implementation of the decimation filter (the "Staggered Integrate and Dump" implementation). The input $x(n)$ feeds two Integrate and Dump modules, each integrating 2M samples: the upper one starts its integrations at times $n = (2r-1)M - (2M-1)$, the lower one at times $n = 2rM - (2M-1)$. A switch selects the upper branch for $k = 2r - 1$ and the lower branch for $k = 2r$; the result is divided by 2M to give $y(k)$.]

Note that the Integrate and Dump (IAD) modules in Fig. 28 require no multiplications at all (their precise structure will be shown shortly). The division by $2M$ shown in that figure is still prohibitive to compute in hardware; however, if $2M$ is chosen to be a power of 2, i.e. $2M = 2^P$, then the division by $2M$ can be approximated by a block that simply discards the lower $\log_2(2M) = P$ bits, which is a trivial operation to perform in hardware (for example, for $2M = 8$ we have $2M = 2^3$ and so the division would be approximated by discarding the lower 3 bits). A simplified diagram of the implementation of the integrate and dump module is shown in Fig. 29.

[Fig. 29 – Implementation of the Integrate and Dump module. A B-bit input at rate $1/T_p$ is accumulated in Reg1 ($B + \log_2(2M)$ bits, clock rate $1/T_p$, with a Clear input); the lower $\log_2(2M)$ bits of the sum are discarded (divide by 2M) and the B-bit result is latched into Reg2 (clock rate $1/(2T_u)$), giving the output at rate $1/(2T_u)$. A control logic block drives the Clear and clock signals.]

Note that in that figure the division by $2M$ is done within the IAD module (in order to reduce the number of bits required for the output register). Also, the output rate is $1/(2T_u)$, not $1/T_u$; this is because the IAD will be used in conjunction with a second IAD module (as in Fig. 28), which will produce an output sample at a rate of $1/(2T_u)$ Hz halfway between each pair of samples of the first IAD module, and hence the total output rate for $y(k)$ will be $1/T_u$, as required. The "control logic" cloud tells the IAD module when to "integrate" and when to "dump" its result to the output register, and thus must work in tandem with the adjacent IAD module in order to produce the desired result at the output of the decimation filter in Fig. 28. Hence it is advantageous to implement the structure of Fig. 28 as a single module. The "control logic" cloud is essentially little more than a carefully controlled counter that controls the timing of signals within the IAD modules, as shown in Fig. 28. This control logic is straightforward to implement in hardware. In conclusion, we see that we can implement a nearly ideal decimation filter with only two adders (one per IAD), 4 registers (2 per IAD), and some additional (yet quite simple) control logic. This is a huge reduction in logic resources as compared to the implementation of Fig. 25.
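The equivalence between (62) and the staggered structure can be demonstrated in software. The sketch below is a behavioural model under stated assumptions, not a hardware description: the function names are hypothetical, samples are Python integers, $2M$ is assumed to be a power of two, and the first output corresponds to $k = 2$ so that every window lies inside the input sequence.

```python
def _shift_for(m: int) -> int:
    """Return P such that 2M = 2^P, or raise if 2M is not a power of two."""
    two_m = 2 * m
    p = two_m.bit_length() - 1
    if 1 << p != two_m:
        raise ValueError("2M must be a power of two")
    return p


def direct_decimation(x, m):
    """Reference: evaluate (62) for k = 2, 3, ... with the division done as a shift."""
    p = _shift_for(m)
    out = []
    k = 2
    while m * k < len(x):
        window = x[m * k - (2 * m - 1): m * k + 1]  # the 2M samples ending at n = Mk
        out.append(sum(window) >> p)
        k += 1
    return out


def staggered_iad(x, m):
    """Two integrate-and-dump accumulators offset by M samples, as in Fig. 28."""
    p = _shift_for(m)
    acc = [0, 0]
    # Negative counts are start-up delays: branch 0 starts integrating at n = 1,
    # branch 1 starts M samples later, at n = M + 1.
    count = [-1, -(m + 1)]
    out = []
    for sample in x:
        for branch in (0, 1):
            if count[branch] >= 0:
                acc[branch] += sample           # integrate
            count[branch] += 1
            if count[branch] == 2 * m:
                out.append(acc[branch] >> p)    # dump, discarding the lower P bits
                acc[branch] = 0
                count[branch] = 0
    return out


if __name__ == "__main__":
    samples = list(range(-20, 44))
    print(staggered_iad(samples, 4) == direct_decimation(samples, 4))
```

Each accumulator performs one addition per input sample and no multiplications, which is the source of the resource savings relative to the direct form.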

XI. EFFECTS OF IMPLEMENTATION LATENCY

In Section IX we found that the update rate of the DDS has a measurable impact on the phase margin of the PLL. This is in fact a special case of a more general phenomenon, which is the effect of a delay element within a control loop on the latter's phase margin. Assume we have designed the decimation filter as per (61). We can incorporate this filter into the model of Fig. 19 as shown in Fig. 30.

[Fig. 30 – Hybrid PLL model with decimation filter response. As Fig. 19, with the decimation filter block given by $H_{DF}(e^{j\Omega}) = e^{-j\Omega\frac{2M-1}{2}}\cdot\frac{1}{2M}\cdot\frac{\sin(\Omega M)}{\sin(\Omega/2)}$.]

The effects of the decimation filter magnitude response on the loop are negligible (see Sec. X and [1]), and thus we shall assume for the ensuing analysis that the decimation filter magnitude response is ideal, i.e. that it has an ideal LPF magnitude response in the interval $[-\pi/M, \pi/M]$. Regarding the delay introduced by the decimation filter, which takes the form of $e^{-j\Omega\frac{2M-1}{2}}$, we shall shortly see that it must be taken into account when the PLL is designed. It is advantageous for the purposes of analysis to transform this delay into the continuous (= Laplace) domain. A delay of $e^{-j\Omega\frac{2M-1}{2}}$ in the discrete-time Fourier domain means a delay of $0.5(2M-1)$ samples, which, taken at a rate of $f_p = 1/T_p$, corresponds to a delay of $0.5(2M-1)T_p$ seconds. Such a delay has the Laplace transfer function representation $\exp\left(-s\frac{2M-1}{2}T_p\right)$. Transforming the delay into the Laplace domain and ignoring any magnitude effects of the decimation filter, we have the equivalent representation of the PLL of Fig. 31.

[Fig. 31 – Hybrid PLL model with decimation filter latency converted to Laplace domain. Magnitude effects of the decimation filter are ignored. As Fig. 22, with the delay block $e^{-s\frac{2M-1}{2}T_p}$ inserted after $K_d$.]

We shall also insert into the open loop transfer function a pure delay $\exp(-sT_I)$, which will model any additional implementation latencies totaling $T_I$ seconds (such as the delays associated with data path pipelining, latencies incurred in the various digital chips, PCB trace length latencies, etc.). The revised model is shown in Fig. 32 (where $\hat{G}(s)$ is defined in Sec. IX).

[Fig. 32 – Hybrid PLL model with implementation delay added. Forward path: $\exp\left(-s\frac{2M-1}{2}T_p\right)$, $\exp(-sT_I)$, and $\hat{G}(s)$.]

The impact of the delay elements in Fig. 32 is best analyzed through the decrease in the phase margin that they incur. We see that, taking into account the delays in Fig. 32, the open loop transfer function is:

$$\tilde{G}(s) = \exp\left(-s\left(T_I + \frac{2M-1}{2}T_p\right)\right)\hat{G}(s) \qquad (64)$$

Using precisely the same set of derivations as was used to derive (51) from (48), we find that:

$$PM(\tilde{G}) = PM(\hat{G}) - 180 \cdot 2 f_C \left(T_I + \frac{2M-1}{2}T_p\right) \qquad (65)$$

where it is emphasized that, due to the fact that $|\tilde{G}(j2\pi f)| = |\hat{G}(j2\pi f)| = |G(j2\pi f)|$ (see (48) and (64)), the crossover frequency $f_C$ is identical for all three transfer functions $G(s)$, $\hat{G}(s)$, and $\tilde{G}(s)$, and is approximated by $f_C \approx 2\zeta f_n$ (see (54)). Define for convenience the total implementation delay as:

$$T_d \triangleq T_I + \frac{2M-1}{2}T_p \qquad (66)$$

In the same manner as was done in the derivations (52) to (55), it is easy to show that if we wish to bound the latency's impact on the phase margin by $d_L$ degrees, we must have:

$$T_d = T_I + \frac{2M-1}{2}T_p < \frac{d_L}{180 \cdot 2 f_C} \approx \frac{d_L}{720 \cdot \zeta f_n} \qquad (67)$$

For insight into the meaning of (67), it is instructive to construct a table of the maximum allowable $T_d$ as a function of $f_n$, assuming a maximum allowed phase margin degradation of $d_L = 10^\circ$ and $\zeta = 0.95$. This is shown in Table 3.

Table 3 – Maximum allowed implementation latency as a function of $f_n$, for $d_L = 10^\circ$ and $\zeta = 0.95$

f_n         Maximum Allowed T_d
1 MHz       14.6 nanoseconds
100 KHz     146 nanoseconds
10 KHz      1.46 microseconds
1 KHz       14.6 microseconds
100 Hz      146 microseconds

As seen there, the constraint on the phase margin is clearly a limiting factor on what loops can be implemented, and this phase margin degradation must be taken into account when designing the loop. From Table 3 it is also clear that with a hardware implementation one can strive (using current technology) to implement loops with $f_n$ up to an order of several tens of KHz, while for a software implementation current technology limits the implementation to $f_n$ of at most several KHz. Thus we observe the more general conclusion that a hardware implementation will always have the advantage of being able to implement PLLs which have higher natural frequencies; this is equivalent to saying that a hardware implementation will be able to cope with a noisier carrier (i.e. one with a wider phase-noise bandwidth).

Another quantity that affects the phase margin is the DDS update rate $f_u$, as detailed in (51). The value of $T_d$ has a strong connection to $T_u = 1/f_u$. To see this, consider that $M = f_p / f_u$, and putting that into (66) yields:

$$T_I + \frac{2 f_p / f_u - 1}{2}T_p = T_I + \frac{f_p}{f_u}T_p - \frac{T_p}{2} = T_I + T_u - \frac{T_p}{2} \qquad (68)$$

Furthermore, $T_I$ is also a strong function of $f_u$ (but not necessarily exclusively determined by $f_u$). Justification for this is as follows: recall that $T_I$ represents the additional implementation delays. Consider for example a logic path that is part of the data path to the DDS, and which contains two registers that are clocked by the post-decimation clock of rate $f_u$, as detailed in Fig. 34. Each such stage will add a delay of $T_u$ to the open loop transfer function. Additional delays which are not of this type may be present too and are also included in $T_I$. Hence it is clear that in order to bound the total degradation in the phase margin which is caused both by the DDS and by the system delay, it is necessary to first find $f_u$ that satisfies (55), and then, using this value of $f_u$, compute or estimate $T_I$ and $T_d$ and check if (67) is satisfied. If not, then either $f_u$ must be increased, $d_L$ increased, or, if at all possible, $f_n$ decreased. It is generally unwise to meddle with the value of $\zeta$ (see Sec. IX).

[Fig. 34 – How a pipeline stage in the datapath from the loop filter to the DDS inserts a delay of Tu into the open loop transfer function. The path from the loop filter towards the DDS passes through Reg1, a logic block, and Reg2, both registers being clocked at rate $f_u$.]

XII. STEP-BY-STEP PROCEDURE FOR DESIGNING A HYBRID PLL

We are now in a position to outline a step-by-step procedure for designing a hybrid PLL. The procedure is shown in Fig. 33.

[Fig. 33 – Flow chart of procedure for design of hybrid PLL. Steps, starting from the analog PLL design procedure: (1) decide on the allowed phase-margin degradation due to the DDS, $d_A$ (for example $d_A = 3^\circ$); (2) decide on the allowed phase-margin degradation due to the digital logic latency, $d_L$ (for example $d_L = 7^\circ$); (3) set $M = 2$ and check whether $\frac{f_p}{M} > \frac{180 \cdot 2\zeta f_n}{d_A}$ and $T_d = T_I + \frac{2M-1}{2}T_p < \frac{d_L}{720 \cdot \zeta f_n}$; if yes, set $M = 2M$ and repeat the check; if no, set $M = M/2$ (this algorithm finds the maximum $M$ which is a power of 2 such that both conditions hold); (4) set $f_u = f_p / M$; (5) pre-warp the time constants as in (42); (6) design the loop filter as $B(z) = \gamma \cdot \frac{1 + \beta_1 z^{-1}}{1 + \alpha_1 z^{-1}}$ with $\alpha_1$, $\beta_1$, and $\gamma$ as in (45) and (46); (7) implement the loop filter and the decimation filter using the hardware-efficient structures that were presented.]

A. Comments on the design procedure

The series of steps outlined in Fig. 33 can be easily performed by a computer program, e.g. using Matlab or C code. In fact, the values obtained by such a computer program may be directly incorporated into the design of the ASIC or FPGA that is used to implement the digital section. The author has indeed carried out such an algorithm to automatically generate values that were then used in the FPGA implementation of hybrid systems, and has found it to be quite an effective and accurate design methodology. Moreover, due to the tight link between $T_d$ and $T_u$ (see Sec. XI), very often the system's architecture will make it possible to deduce a formula linking $T_d$ and $T_u$, which will enable $d_L$ to be determined from $d_A$ or vice versa. For example, if we have $T_u = T_d / \alpha$, then it can be easily seen that if (55) is satisfied then setting $d_L = 2\alpha \cdot d_A$ will ensure that (67) is satisfied. Hence, if a relation between $T_d$ and $T_u$ can be found, then the first two steps in Fig. 33 can be united, where $d_L$ would be determined from $d_A$ in order to ensure a total phase margin degradation of $d_A + d_L$.
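The search over $M$ in Fig. 33 can be sketched as a short program. This is an illustrative interpretation rather than the author's program: the function names are hypothetical, and the latency model $T_I = (\text{pipeline stages}) \cdot T_u + \text{fixed latency}$ is an assumption made here so that $T_I$ can be evaluated for each candidate $M$. The sketch starts the search at $M = 1$ (no decimation) and rejects designs that fail even there.

```python
def min_update_rate(zeta, f_n, d_a_deg):
    """Lower bound on f_u from (55)."""
    return 180.0 * 2.0 * zeta * f_n / d_a_deg


def max_total_delay(zeta, f_n, d_l_deg):
    """Upper bound on the total implementation delay T_d from (67)."""
    return d_l_deg / (720.0 * zeta * f_n)


def choose_decimation(f_p, f_n, zeta, d_a_deg, d_l_deg,
                      pipeline_stages=0, fixed_latency=0.0):
    """Largest power-of-two M for which both (55) and (67) hold."""
    t_p = 1.0 / f_p

    def feasible(m):
        t_u = m * t_p                                   # from (41)
        t_i = pipeline_stages * t_u + fixed_latency     # assumed latency model
        t_d = t_i + (2 * m - 1) / 2.0 * t_p             # from (66)
        return (f_p / m > min_update_rate(zeta, f_n, d_a_deg)
                and t_d < max_total_delay(zeta, f_n, d_l_deg))

    if not feasible(1):
        raise ValueError("constraints cannot be met even without decimation")
    m = 1
    while feasible(2 * m):
        m *= 2
    return m


if __name__ == "__main__":
    # Rates from the example in Sec. VII; d_A and d_L from the examples in Fig. 33.
    for stages in (0, 2):
        m = choose_decimation(100e6, 2000.0, 0.95, 3.0, 7.0, pipeline_stages=stages)
        print("pipeline stages:", stages, "M:", m, "f_u [Hz]:", 100e6 / m)
    print("max T_d for f_n = 1 MHz [s]:", max_total_delay(0.95, 1e6, 10.0))
```

Because $f_u$ falls and $T_d$ grows as $M$ increases, feasibility is monotone in $M$, so doubling until the first failure finds the largest admissible power of two. The last line evaluates (67) for the first row of Table 3.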

B. Digital implementation inside FPGA or ASIC

Finally, we now have enough information to allow us to draw a schematic of the implementation of the digital logic inside the FPGA or ASIC. This is shown in Fig. 35. The widths of the internal busses are given in Fig. 35; while simply an example, they are typical choices, as they emanate from a variety of system parameters, such as the number of sampler bits, the number of bits needed for a DDS tuning word, as well as the general desire to minimize logic resources by minimizing the width of the datapath in the chip. To that end, notice that the only part of the chip that has a datapath width greater than 8 bits is from the output of the loop filter to the DDS. The decimation filter and the loop filter are implemented as per Fig. 28 and Fig. 21, respectively. In order to actually implement the loop filter, its implementation-specific parameters (such as coefficient quantization, the datapath widths inside the filter, etc.) need to be determined. These topics are beyond the scope of this paper (see for example [21 Secs. 6.7-6.10]), but it shall be commented here that in the author's experience an excellent implementation of the loop filter (in terms of quantization noise) can be achieved when the coefficients are quantized to 16 bits (1 sign bit + 15 data bits). Finally, some explanation may be in order regarding the addition of the value $w_{CEN}$ to the output of the loop filter before the result is passed to the DDS. This is due to the fact that the PLL provides the correction to the DDS that is needed in order to maintain lock. The tuning word $w_{CEN}$ is the tuning word that corresponds to the center frequency of the DDS (= the center frequency of the VCO in the DDS equivalent model).

[Fig. 35 – Schematic of the contents of the FPGA or ASIC used to implement the digital section of the carrier PLL. Signal flow: 8-bit inputs $I(nT_s)$ and $Q(nT_s)$; matched filtering and decimation to 1 sample/symbol, where the output samples are the "even" samples (i.e. on the peak of the symbols; timing is achieved using the symbol PLL), giving 8-bit $I_e(n)$ and $Q_e(n)$; phase detector (8-bit output); decimation filter and decimation by M (Staggered Integrate-and-Dump implementation, 8-bit output); loop filter $B(z)$ (32-bit output); addition of the 32-bit word $w_{CEN}$; DDS programming state machine; output to DDS.]

XIII. CONCLUSIONS

This paper has developed and presented techniques for the design and implementation of hybrid PLLs. The paper began with a discussion of "classical" analog PLL design. Then, a mathematical model that described the hybrid architecture was developed and analyzed. It was found that the update rate of the DDS and the total allowed implementation latency were crucial parameters concerning the PLL's stability. Implementation of the decimation filter in fixed-point hardware was discussed, and it was found that an exceedingly efficient implementation exists. Design of the loop filter in fixed-point hardware was also addressed. We then proceeded to outline a step-by-step procedure for the design and implementation of a hybrid system. Further analysis and derivations can be found in [1], which may be obtained by contacting the author.

REFERENCES

[1] Y. Linn, "Synchronization and Receiver Structures in Digital Wireless Communications (workshop notes)," in Seminario Internacional 15 Años. Bucaramanga, Colombia, Aug. 15-19, 2006.
[2] H. Meyr, M. Moeneclaey, and S. Fechtel, Digital communication receivers: synchronization, channel estimation, and signal processing. NY: Wiley, 1998.
[3] U. Mengali and A. N. D'Andrea, Synchronization techniques for digital receivers. NY: Plenum Press, 1997.
[4] F. M. Gardner, "Interpolation in digital modems. I. Fundamentals," IEEE Trans. Commun., vol. 41, no. 3, pp. 501-507, Mar. 1993.
[5] L. Erup, F. M. Gardner, and R. A. Harris, "Interpolation in digital modems. II. Implementation and performance," IEEE Trans. Commun., vol. 41, no. 6, pp. 998-1008, Jun. 1993.
[6] A. Blanchard, Phase-locked loops. Application to coherent receiver design. NY: Wiley, 1976.
[7] F. M. Gardner, Phaselock techniques, 2nd ed. NY: Wiley, 1979.
[8] J. J. D'Azzo and C. H. Houpis, Linear control system analysis and design: conventional and modern, 3rd ed. NY: McGraw-Hill, 1988.
[9] R. C. Dorf, Modern control systems, 5th ed. MA: Addison-Wesley, 1989.
[10] R. E. Best, Phase-locked loops: theory, design, and applications, 2nd ed. NY: McGraw-Hill, 1993.
[11] R. L. Peterson, R. E. Ziemer, and D. E. Borth, Introduction to spread-spectrum communications. NJ: Prentice Hall, 1995.
[12] R. C. Dorf, Modern control systems, 4th ed. MA: Addison-Wesley, 1986.
[13] H. Meyr and G. Ascheid, Synchronization in digital communications. NY: Wiley, 1990.
[14] W. P. Robins, Phase noise in signal sources. (Theory and applications). London: Peter Peregrinus, 1982.
[15] D. H. Wolaver, Phase-locked loop circuit design. NJ: Prentice Hall, 1991.
[16] B. T. Kopp and W. P. Osborne, "Phase jitter in MPSK carrier tracking loops: analytical, simulation and laboratory results," IEEE Trans. Commun., vol. 45, no. 11, pp. 1385-1388, Nov. 1997.
[17] W. P. Osborne and B. T. Kopp, "An analysis of carrier phase jitter in an M-PSK receiver utilizing MAP estimation," in Proc. MILCOM '93, Boston, MA, USA, 1993, pp. 465-470.
[18] W. P. Osborne and B. T. Kopp, "Synchronization in M-PSK modems," in Proc. ICC '92, Chicago, IL, USA, 1992, pp. 1436-1440.
[19] ETSI (European Telecommunications Standards Institute), "DVB-S2 Technical Report ETSI TR 102 376 V1.1.1," 2005.
[20] Analog Devices, "AD9851 Datasheet, Rev. C," retrieved from www.analog.com.
[21] A. V. Oppenheim and R. W. Schafer, Discrete-time signal processing. NJ: Prentice Hall, 1989.
[22] R. E. Crochiere and L. R. Rabiner, Multirate digital signal processing. NJ: Prentice-Hall, 1983.
[23] M. R. Spiegel, Mathematical handbook of formulas and tables. NY: McGraw-Hill, 1968.