Flexible baseband transmitter for OFDM


Fredrik Kristensen¹
¹ Dep. of Electroscience and CCCD, Box 118, Lund University, SE-221 00 Lund, Sweden
email:[email protected]

Peter Nilsson¹ and Anders Olsson²
² Axis Communications AB, Emdalavägen 14, SE-223 69 Lund, Sweden
email:[email protected], [email protected]

Abstract To fully utilize the available spectrum for a wireless communication system it is feasible to adapt to different situations on the channel. In this paper a flexible OFDM transmitter is presented together with basic theory behind OFDM transmission. It is shown that high flexibility can be obtained with a reasonable amount of additional hardware. Part of the design, the FFT-processor, has already been fabricated and measurement results are presented.

KEY WORDS OFDM, Transmitter, Flexibility, ASIC

1

Introduction

Licensed spectrum is becoming more and more expensive and the free parts of the spectrum are becoming more and more crowded. As a result, modern radio systems must have high spectrum efficiency and be able to adapt to changes in the interference situation on the channel. Many recent standards, e.g., Hiperlan/2, IEEE 802.11a, and DAB, use Orthogonal Frequency Division Multiplexing (OFDM) to achieve high performance. In future wireless systems using OFDM, terminals will be expected to handle a wide range of different applications, from pure voice communication to high-speed data transfers, with as little overhead in power consumption and protocol procedure as possible. As the terminals will be battery powered and should be cheap to manufacture, an ASIC solution is required. To be able to adapt to different communication schemes and to fully utilize the ever-changing communication channels, the OFDM transmitter must be flexible.

In this paper the digital baseband of an OFDM transmitter is explored and a flexible architecture is presented. In section 2 the basics of OFDM transmission are explained. The architecture of the design is presented in section 3 and the internal precision is discussed in section 4. Section 5 puts a price on the flexibility and in section 6 the results are presented. Conclusions are given in section 7.

2

OFDM transmission

The digital baseband parts of an OFDM transmitter consist of four basic blocks [1], shown in Figure 1. The basic idea of OFDM transmission is that the available spectrum is divided into N orthogonal subcarriers. The encoded data is mapped onto the subcarriers using an N-point IFFT, which transforms the data representation into the time domain. Finally, an η sample long cyclic prefix (CP) is added to reduce the intersymbol interference (ISI). A complete OFDM frame is shown in Figure 2.

Figure 1. The digital baseband parts of an OFDM transmitter (data, encoder, mapper, IFFT, cyclic prefix, to D/A converter).

Figure 2. One OFDM frame (a CP of η samples, copied from the end of the data, followed by the N data samples; time indices 0 to N + η − 1).
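The frame construction described above can be sketched in a few lines. This is only an illustrative model, not the hardware: a direct O(N²) inverse DFT stands in for the IFFT block, and the names `idft`, `add_cyclic_prefix`, and `ofdm_frame`, as well as the example values N = 8 and η = 2, are assumptions made for the example.

```python
import cmath

def idft(freq):
    """Direct N-point inverse DFT (O(N^2)); stands in for the IFFT block."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def add_cyclic_prefix(samples, eta):
    """Prepend a copy of the last eta samples (eta = 0 adds nothing)."""
    return samples[len(samples) - eta:] + samples

def ofdm_frame(symbols, eta):
    """One frame: constellation points -> time domain -> CP + data."""
    return add_cyclic_prefix(idft(symbols), eta)

# Hypothetical example: N = 8 subcarriers carrying BPSK points, eta = 2.
symbols = [1, -1, 1, 1, -1, 1, -1, -1]
frame = ofdm_frame(symbols, eta=2)
print(len(frame))  # the frame holds N + eta samples
```

The first η samples of `frame` are identical to its last η samples, which is the structure drawn in Figure 2.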

2.1

Channel encoder

The incoming data are encoded to reduce the bit error rate (BER). The idea is to insert controlled redundancy in order to correct errors that are introduced by the channel. An encoder suitable for flexible OFDM is found in [2]. The encoder is not part of this project and will not be addressed further in this paper.

2.2

Mapper

The mapper converts input data into complex-valued constellation points, according to a given constellation. Typical constellations for wireless applications are BPSK, QPSK, and 16QAM, see Figure 3. In Figure 3, I is the in-phase component and Q is the quadrature component. The amount of data transmitted on each subcarrier depends on the constellation, e.g. BPSK and 16QAM transmit one and four data bits per subcarrier, respectively. Which constellation to choose depends on the channel quality. In a high-interference channel a small constellation like BPSK is favorable, since the required signal-to-noise ratio (SNR) in the receiver is low, whereas in a low-interference channel a larger constellation is more beneficial due to the higher bit rate. Three examples of how the constellations can be chosen are:

1. Only one constellation is included, which is often the case in low-end transmitters. How to choose the included constellation is a design decision, depending on the delay and multipath propagation situation.
2. More than one constellation is included, but only one constellation is used per OFDM frame, which is the case in the Hiperlan/2 and IEEE 802.11a standards. The choice of constellation can be based on measurements of the BER.
3. More than one constellation is included, where each subcarrier can use a different constellation. This is called bit loading. Bit-loading algorithms base the choice of constellation on the frequency response in each subcarrier. A subcarrier with high SNR will get a larger constellation and vice versa [3].

Thus a flexible transmitter must provide the user with the possibility to use one of several constellations for each subcarrier.

Figure 3. Typical constellations for wireless applications (BPSK, QPSK, and 16QAM in the I/Q plane).

2.3

IFFT

The Inverse Fast Fourier Transform (IFFT) transforms the signals from the frequency domain to the time domain. In 1971 it was discovered that the FFT could be used in multicarrier systems like OFDM [4]. However, it was not until 1980, with the introduction of the CP, that OFDM was established [5]. The number of subcarriers, N, determines how many sub-bands the available spectrum is split into. The more subcarriers used, the less overhead is introduced by the CP, see section 2.4. On the other hand, each sub-band must be much wider than the Doppler frequency for the subcarriers to remain orthogonal [6]. In other words, the faster the terminal moves, the fewer subcarriers can be used and the more overhead is introduced. The crest factor, the ratio between the peak and the average amplitude of the OFDM symbol, will also limit the maximum number of subcarriers that can be used. A high N will give a high crest factor, which will cause linearity problems in the power amplifier [7]. Another drawback of using a large number of subcarriers is that more hardware will be required to perform the IFFT and the latency through the hardware will increase. The latency can be critical if the transmitter is used in a real-time application. Thus, for the system to be efficient, the number of subcarriers has to be adjusted over time.

2.4

Cyclic prefix

The cyclic prefix (CP) is a copy of the last η samples from the IFFT, which are placed at the beginning of the OFDM symbol, see Figure 2. There are two reasons to insert a CP:

1. The convolution between the data and the channel impulse response will act like a circular convolution instead of a linear one. Circular convolution makes equalization easier.
2. Interference from the previous symbol will only affect the CP, which is discarded in the receiver.

Both reasons assume that the CP is longer than the channel's impulse response. If the CP is shorter than the impulse response, the convolution will not be circular and intersymbol interference will occur. However, if the number of samples in the CP is large, the data transmission rate will decrease significantly, since the CP does not carry any useful data. The data rate will decrease by the factor R = N/(N + η). Thus, it is important to choose the shortest possible CP to maximize the system's efficiency.
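Both properties can be checked numerically. The sketch below is a toy model under stated assumptions: the channel is a short real-valued impulse response `h`, there is no noise, and no previous symbol is transmitted (so a too-short CP shows up as missing samples rather than as interference). The function names and example values are illustrative only.

```python
def linear_convolve(x, h):
    """Plain linear convolution, modelling the channel."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def circular_convolve(x, h):
    """Circular convolution of x with h over len(x) samples."""
    n = len(x)
    return [sum(h[j] * x[(t - j) % n] for j in range(len(h))) for t in range(n)]

def receive(data, h, eta):
    """Add a CP of eta samples, pass the channel, discard the CP."""
    tx = data[len(data) - eta:] + data
    return linear_convolve(tx, h)[eta:eta + len(data)]

def rate_factor(n, eta):
    """Data rate reduction R = N / (N + eta)."""
    return n / (n + eta)

data = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 1.5, 0.0]  # N = 8 (hypothetical samples)
h = [1.0, 0.5, 0.25]                               # hypothetical 3-tap channel

long_cp = receive(data, h, eta=3)    # CP covers the channel memory
short_cp = receive(data, h, eta=1)   # CP too short
print(long_cp == circular_convolve(data, h))
print(short_cp == circular_convolve(data, h))
print(rate_factor(64, 16))
```

With η = 3 the received block equals the circular convolution of the data with the channel, while with η = 1 the first received sample differs. The last line evaluates R for N = 64 and η = 16, i.e. 64/80.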

3

Flexible architecture

The main idea behind the chosen architecture is to provide the system with the desired flexibility. In the chosen architecture each block is split into small and independent sub-blocks. To reduce the energy consumption, each sub-block can be turned on or off, depending on the configuration. Clock gates are used to turn sub-blocks on and off. Figure 4 shows the clock gate architecture [8] and a timing diagram showing how to turn off the gated clock for one clock cycle. As seen, the timing for the signal Enable is non-critical.
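A small behavioural model illustrates why the Enable timing is non-critical. It assumes the common latch-based arrangement suggested by Figure 4: a latch that is transparent while the clock is low produces Enable L, which is ANDed with the clock. The latch polarity and the half-cycle sampling used here are assumptions of this sketch, not details taken from [8].

```python
def gated_clock(clock, enable):
    """Latch-and-AND clock gate, sampled once per half clock cycle.

    clock, enable: equal-length lists of 0/1 values.
    """
    enable_l = 0
    gated = []
    for clk, en in zip(clock, enable):
        if clk == 0:
            # Assumed polarity: the latch is transparent while the clock is low.
            enable_l = en
        # While the clock is high the latch holds, so Enable may change freely.
        gated.append(clk & enable_l)
    return gated

clock  = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 1, 0, 0, 1, 1, 1]  # Enable toggles while the clock is high
gated = gated_clock(clock, enable)
print(gated)
```

In this trace Enable changes twice while the clock is high, yet the gated clock shows no partial pulses: exactly one full pulse (the third) is suppressed.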

3.1

Mapper

The mapper is designed to be used with a bit-loading algorithm. The included constellations are no modulation (ZERO), BPSK, QPSK, 8PSK, 16QAM, and 64QAM. The ZERO constellation does not carry any data, but is included for unused subcarriers. These can be subcarriers with very low SNR or subcarriers at the edges of the spectrum, where a spectrum-shaping filter corrupts them. The DC subcarrier is sometimes also left unused, because all offset errors in the IFFT end up in this component. Constellations larger than 64QAM are not likely to be used in a wireless system, due to the high SNR required at the receiver side.

Figure 4. A clock gate (a) and the timing diagram (b). In (a), Enable passes through a latch to form Enable L, which is ANDed with Clock to produce the gated clock.

Figure 5. Mapper architecture, where CG denotes a clock gate. The data from the encoder feed six clock-gated lookup tables (ZERO, BPSK, QPSK, 8PSK, 16QAM, 64QAM), followed by the power controller and the output to the IFFT.

Figure 6. A radix-2² stage: two butterflies with FIFOs of length x and x/2, a trivial multiplier (TM), and a complex multiplier (CM), where $x = 2^{2 \cdot \mathrm{stage} - 1}$.

Figure 5 shows the mapper architecture. Each constellation is implemented as a lookup table controlled by a clock gate. The clock gates ensure that at most one lookup table is clocked in each clock cycle and thus prevent unnecessary switching activity. A power controller, which can reduce the power separately in each subcarrier, is included to support bit-loading algorithms.
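A software analogue of the lookup-table mapper might look as follows. It is a sketch under several assumptions that the text does not specify: the bit-to-point assignment (Gray-coded 16QAM levels here), unnormalized constellation amplitudes, and the power-control values of Table 4 applied as amplitude scale factors. Only four of the six constellations are included to keep the example short, and all names are hypothetical.

```python
# Gray-coded amplitude levels for one 16QAM axis (assumed mapping).
GRAY_2BIT_LEVELS = {0b00: -3, 0b01: -1, 0b11: 1, 0b10: 3}

# One lookup table per constellation, indexed by the input word.
LOOKUP = {
    "ZERO": [0j],
    "BPSK": [-1 + 0j, 1 + 0j],
    "QPSK": [complex(i, q) for i in (-1, 1) for q in (-1, 1)],
    "16QAM": [
        complex(GRAY_2BIT_LEVELS[w >> 2], GRAY_2BIT_LEVELS[w & 3])
        for w in range(16)
    ],
}
BITS = {"ZERO": 0, "BPSK": 1, "QPSK": 2, "16QAM": 4}
POWER_STEPS = (1.0, 0.5, 0.25, 0.125)  # values listed in Table 4

def map_subcarriers(bits, plan):
    """Map a bit list onto subcarriers.

    plan: one (constellation name, power index) pair per subcarrier,
    as a bit-loading algorithm might supply it.
    """
    points, pos = [], 0
    for name, power_index in plan:
        word = 0
        for bit in bits[pos:pos + BITS[name]]:
            word = (word << 1) | bit
        pos += BITS[name]
        # Only the selected table is read, mirroring the clock gating.
        points.append(LOOKUP[name][word] * POWER_STEPS[power_index])
    return points

plan = [("BPSK", 0), ("ZERO", 0), ("QPSK", 1), ("16QAM", 0)]
points = map_subcarriers([1, 0, 1, 1, 0, 0, 1], plan)
print(points)
```

The number of bits consumed per subcarrier varies with the plan (here 1, 0, 2, and 4), which is the source of the uneven input data rate that a bit-loading mapper has to handle.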

3.2

FFT processor

The FFT processor is designed to handle 32, 64, 128, 256, 512 and 1024 points, corresponding to the number of subcarriers. The FFT processor can just as easily be used to perform an IFFT, by swapping the real and imaginary parts of the input and output data. The architecture of the FFT processor is based on Decimation In Frequency (DIF) with radix-2² stages, as shown in Figure 6. The advantage of the radix-2² FFT compared to a straightforward radix-2 FFT is that every other complex multiplier (CM) is replaced with a trivial multiplier (TM) that only multiplies by -j or 1. Compared to radix-4 structures it has the advantage of simple control and the smallest possible memory needed for a pipelined FFT processor [9].

To compute the 1024-point FFT, five radix-2² stages are connected as shown in Figure 7. This design allows FFTs of size 64, 256, and 1024. To realize the sizes 32, 128 and 512, a radix-2 stage is added to stage 5. Since the radix-2 stage and the radix-2² stage in stage 5 are never used simultaneously, parts of the hardware are reused: the last FIFO in stage 5 and the complex multiplier are shared between the radix-2 and radix-2² parts. All internal control is managed by an on-chip counter that ripples through the stages.

The pipelined structure makes the processor fast. Due to the parallel processing blocks, the processor produces one output for each input value, apart from an initial latency of N-1 clock cycles. The structure supports easy reconfiguration of the size by bypassing some of the stages. For example, in a 32-point FFT the data pass through stage 5 (the radix-2 part), bypass stages 4 and 3, and finally pass through stages 2 and 1. To reduce power consumption, the clocks for the bypassed stages are turned off.

Figure 7. FFT architecture, where C is the counter (stage 5, combining radix-2² and radix-2, followed by the radix-2² stages 4 to 1, with clock gates on the stage clocks).

Figure 8. The SRU architecture (two memories, Memory 0 and Memory 1, between the FFT processor and the D/A converter).

Table 1. Average current consumption for 128-1024 words memories. The values are given for VDD=3.3 V and a wordlength of 24 bits.

Care has been taken when implementing the complex multiplier and the FIFOs, since they are large in terms of both power consumption and area. A straightforward complex multiplication involves four real multiplications and two real additions, which can be transformed into three real multiplications and five additions. Another approach is to use distributed arithmetic, which gives an area roughly equivalent to two multipliers with the speed of a single multiplier [9]. In the FFT processor, complex multipliers using distributed arithmetic and Wallace trees are used [10]. In stages 3 to 5, see Figure 7, RAMs are used to implement the FIFOs, since they are more area efficient than flip-flops [11]. A consequence of the decimation-in-frequency approach is that the output data will be in bit-reversed order. For the FFT in the receiver to be computed correctly, the data have to be reordered.
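Two of the arithmetic tricks mentioned in this section are easy to verify in software. The sketch below uses a direct O(N²) DFT in place of the pipelined processor; it checks that swapping real and imaginary parts on input and output turns a forward transform into an inverse one, and shows one common way of computing a complex product with three real multiplications and five additions. The 1/N scaling and all function names are assumptions of the sketch; the multiplier on the chip uses distributed arithmetic instead.

```python
import cmath

def dft(x):
    """Direct forward DFT; stands in for the FFT processor."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def swap(z):
    """Exchange the real and imaginary parts of a complex value."""
    return complex(z.imag, z.real)

def idft_by_swapping(x):
    """Inverse DFT using only a forward DFT plus swaps (scaled by 1/N here)."""
    n = len(x)
    return [swap(v) / n for v in dft([swap(v) for v in x])]

def complex_multiply_3(a, b, c, d):
    """(a + jb)(c + jd) with three multiplications and five additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2   # (real part, imaginary part)

x = [1 + 2j, -1j, 3 + 0j, 0.5 - 0.5j]   # hypothetical test vector
roundtrip = idft_by_swapping(dft(x))
print(max(abs(u - v) for u, v in zip(roundtrip, x)))
print(complex_multiply_3(1, 2, 3, 4))
```

The swap works because exchanging real and imaginary parts equals conjugation followed by multiplication by j, and conjugating the input and output of a forward DFT yields the inverse transform up to scaling.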

3.3

Signal-reordering unit

The signal-reordering unit (SRU) handles two tasks: the insertion of a CP and the bit reversal of the data. In order to adapt to the channel impulse response, the length of the inserted CP is arbitrary, up to a maximum of 256 samples. Since the CP is copied from the end of a data frame and inserted at the beginning of the frame, a memory of N words is required, see Figure 2. To produce continuous output data, an additional memory of equal size has to be included, see Figure 8. While one data frame is written to one memory, the previous frame is read from the other memory. With this design the reordering comes for free: the data are written to memory with a bit-reversed address and read from memory with a normal address.

Since the largest N in this design is 1024, the memories have to be 1024 words long. If each memory is implemented as one block, it will consume much power even when only 32 of the addresses are actually used. To reduce the power consumption for cases when N is less than 1024, both memories are split into smaller blocks. As a trade-off between power reduction and the difficulty of placing and routing many memories, each memory is split into four blocks of size 128, 128, 256, and 512 words. Table 1 shows the average current that each memory consumes; note that the 128-word memory is a low-power memory. The resulting current consumption for the split memory design is shown in Table 2. With N smaller than 512, only the low-power memories are used. The values are given for memories in a standard 0.35 µm CMOS process with five metal layers.
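The addressing scheme can be sketched as follows. Only one of the two memories is modelled, the frame length is assumed to be a power of two, and the function names are illustrative. Samples are written with a bit-reversed address and read back in natural order, starting η addresses before the end so that the CP is emitted first.

```python
def bit_reverse(index, width):
    """Reverse the lowest `width` bits of index."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def sru_frame(fft_output, eta):
    """Reorder one frame and prepend its cyclic prefix.

    fft_output: samples in the bit-reversed order delivered by a DIF FFT.
    """
    n = len(fft_output)
    width = n.bit_length() - 1            # n is assumed to be a power of two
    memory = [None] * n
    for i, sample in enumerate(fft_output):
        memory[bit_reverse(i, width)] = sample   # bit-reversed write address
    # Normal read addresses: the last eta words first, then the whole frame.
    read_addresses = list(range(n - eta, n)) + list(range(n))
    return [memory[a] for a in read_addresses]

# Hypothetical frame: each sample is labelled with its natural-order index.
bit_reversed_input = [0, 4, 2, 6, 1, 5, 3, 7]
out = sru_frame(bit_reversed_input, eta=2)
print(out)
```

The read sequence simply starts at address N − η, which is consistent with the single subtractor in the address counter that the design lists as the hardware cost of a variable CP length.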

Size [words]    IDD [µA/MHz]
1024            516
512             463
256             436
128             74

Table 2. Average current consumption for N=32-1024. The values are given for VDD=3.3 V and a wordlength of 24 bits.

N       IDD [µA/MHz]
1024    359
512     255
256     74
128     74
64      74
32      74
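The figures in Table 2 are consistent with a simple access-weighted average of the per-block currents in Table 1. The sketch below makes that reading explicit. It is an inference rather than something stated in the text: it assumes that the blocks are filled in the order 128, 128, 256, 512 words, that every address is accessed equally often, and that only the addressed block draws current.

```python
# (block size in words, IDD in µA/MHz) for the four blocks, from Table 1.
BLOCKS = [(128, 74), (128, 74), (256, 436), (512, 463)]

def average_current(n):
    """Access-weighted average IDD in µA/MHz for a frame of n words."""
    remaining, weighted_sum = n, 0
    for words, idd in BLOCKS:
        used = min(words, remaining)   # words of this block that are in use
        weighted_sum += used * idd
        remaining -= used
        if remaining == 0:
            break
    return weighted_sum / n

for n in (1024, 512, 256, 128, 64, 32):
    print(n, average_current(n))
```

Under these assumptions the model reproduces the six values in Table 2, including the flat 74 µA/MHz for every N up to 256, where only the two low-power blocks are addressed.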

3.4

Transmitter

The transmitter consists of the mapper, the FFT processor, the SRU, and a controller, see Figure 9. The FFT processor and the SRU can be reused in the receiver to perform the FFT and to reorder the data. The inputs to the design are the data signals receive and transmit; the control signals reset, enable, and power down; and the configuration signals number of subcarriers, length of the CP, constellation type, and constellation power. The power down input is used to tell the transmitter to finish processing the internally available data and turn off each block as soon as possible. With the power down signal raised from the start, a minimum of two OFDM frames are processed.

To fully utilize the available spectrum, the transmitter must be able to produce continuous output data. Therefore the first parts of the transmitter will be affected by the performance of the last block, the SRU. While the CP is read, the other blocks must hold. The SRU generates a halt signal and the controller temporarily freezes the mapper and the FFT processor for the duration of the CP. The halt signal is also available as an output signal for other blocks to use.

The bit-loading algorithm will also contribute to an uneven data stream, since the number of required input bits changes according to the constellation used. For this design the number of processed bits per clock cycle varies between zero and six. To handle the variable data rate, a buffer could be used. However, a buffer is not included in this design, since it would have to store the redundancy inserted by the channel encoder as well as the data. An interface between the encoder and the mapper that handles the uneven data stream is suggested in Figure 10.

Figure 9. The transmitter architecture (mapper, FFT processor, SRU, and controller, with the signals Receive, Transmit, Data out, Valid data, Control, Power down, and Halt).

Figure 10. Suggested interface between encoder and mapper.

Table 3. SNR for 32-1024 point FFTs, with 2 ∗ 12 bits twiddle factors.

N       SNR [dB]
32      51
64      49
128     45
256     42.5
512     39.5
1024    36.5

Table 4. Included runtime flexibility in the design and the possible values.

Block            Parameter        Values
Mapper           Constellation    ZERO-64QAM
Mapper           Power control    1, 0.5, 0.25, 0.125
FFT processor    Size             32-1024
FFT processor    IFFT/FFT         1/0
SRU              CP length        0-256

4

Precision

An important decision when designing a system is the choice of internal wordlengths. Not only does the wordlength affect precision, but also size, speed, and power consumption. Thus, the internal wordlength must be kept to a minimum. The wordlength of the output from the mapper depends on the largest constellation. In this design the largest constellation is 64QAM and thus three bits must be used in both the real and imaginary part of the complex-valued output. Another three bits are added for soft decoding and two bits for noise and interference margin. The result is a 2 ∗ 8 bits complex word. In the FFT processor the FIFOs are long in the first stages and become short in the last stages. This means that a large wordlength in the last stages will not affect the FIFO area much. To take advantage of this, the wordlength is allowed to increase, from 8 to 14 bits, in the first three stages of the FFT processor. Hence, no precision is lost in these stages. Figure 11 shows the wordlength for the real and imaginary part of the data in each stage. To keep the SRU memories small, the output from the FFT processor is rounded off to 2 ∗ 12 bits. As a consequence, the wordlength in stages 1 and 2 can be kept constant without losing more than approximately 1 dB in SNR. The SNR for the FFT processor is shown in Table 3.

Figure 11. Internal wordlength in the FFT processor (wordlength, 8 to 14 bits, for stages 5 to 1).

5

A price on flexibility

Two types of flexibility are included in the design: runtime flexibility and design flexibility. Design flexibility means that each important parameter, such as the wordlength, is kept generic in the VHDL description, i.e., it can easily be changed before the design is instantiated. Design flexibility comes for free and is helpful if the design is to be revised with different optimization criteria. Runtime flexibility is controlled by input signals and can change the operations performed by the design during execution. The included runtime flexibility in the design is shown in Table 4. Unfortunately, runtime flexibility adds area to the design and leads to more design work and more time spent on verification. The degree of flexibility is limited by the amount of additional hardware. The price paid in hardware to achieve runtime flexibility is often hard to estimate. If the parameters are fixed, there is a greater chance of co-optimization between the blocks; a good example is found in [12], where the SRU is replaced with a smaller FIFO. The price in hardware for the different parameters is shown in Table 5. The transmitter is compared to a design with the same architecture but without the flexibility. The architecture is split into individual sub-modules, where each sub-module can be turned on or off, depending on the configuration. Thus, the price for flexibility is mostly paid in area and not in power consumption. The only source of power consumption in the unused sub-modules is the leakage power, which is low in non-high-speed CMOS technologies.

Table 5. Additional hardware for each parameter, compared to a non-flexible design with the same architecture and N=1024. Additionally, input ports and some logic are needed.

Parameter        Additional hardware
Constellation    A lookup table, a clock gate, and a MUX per constellation
Power control    2 MUXes
FFT size         4 clock gates, 8 MUXes, 1 butterfly, and a 256 words long ROM
IFFT/FFT         2 MUXes
CP length        1 subtractor, in the address counter

6

Results

A flexible OFDM transmitter is implemented in a standard 0.35 µm CMOS process with five metal layers. The transmitter is synthesized for a clock speed of 50 MHz and the core area of the chip is 8.5 mm². There are no reliable figures on the power consumption to report, since the chip had not yet returned from fabrication when this paper was written. However, care has been taken during the design to keep it low, e.g., low-power memories are used as much as possible, unused parts of the design are turned off, and the wordlength is kept short. A prototype of the FFT processor has already been fabricated in the same process. The core power consumption varied, depending on the size of the FFT, from 1.25 to 2.78 mW/MHz at 2.0 V. The FFT processor is verified for clock frequencies up to 50 MHz. In the prototype the multipliers did not use distributed arithmetic and the wordlength was 2 ∗ 8 bits in and 2 ∗ 18 bits out. The FFT chip is shown in Figure 12.

Figure 12. FFT chip

7

Conclusions

To achieve a high spectrum efficiency, parameters such as cyclic prefix, number of carriers, and constellation must adapt to the current situation of the communication channel. In this paper it is shown that such flexibility can be obtained with a reasonable amount of extra hardware. The flexibility also contributes to a larger set of possible applications and thus larger fabrication volumes and lower price. The only extra source of power consumption due to the flexibility is the leakage power in the unused parts of the design. However, due to the moderate speed requirements, a low-leakage CMOS process can be used. Part of the design, the FFT processor, has already been fabricated and verified to perform a 1024-point FFT in 51 µs with a power consumption of 55.6 mW.

References

[1] O. Edfors, “Low-Complexity Algorithms in Digital Receivers,” Ph.D. dissertation, Luleå University of Technology, Nov. 1996.
[2] M. Kamuf, J. Anderson, and V. Öwall, “Providing flexibility in a convolutional encoder,” in Proc. of IEEE International Symposium on Circuits and Systems (ISCAS’03), Bangkok, Thailand, Mar. 2003.
[3] C. Yui and R. Cheng, “Multiuser OFDM with adaptive subcarrier, bit, and power allocation,” IEEE J. Select. Areas Commun., Oct. 1999.
[4] S. B. Weinstein and P. M. Ebert, “Data transmission by frequency-division multiplexing using the discrete Fourier transform,” IEEE Trans. Commun., vol. 19, pp. 628–634, Oct. 1971.
[5] A. Peled and A. Ruiz, “Frequency domain data transmission using reduced computational complexity algorithms,” in Int. Conf. Acoust, Speech, Signal Processing, Denver, CO, 1980, pp. 964–967.
[6] M. Russel and G. Stuber, “Interchannel interference analysis of OFDM in a mobile environment,” in Proc. IEEE Vehic. Technol. Conf., vol. 2, Chicago, IL, 1995, pp. 820–824.
[7] N. Petersson, “Peak and power reduction in multicarrier systems,” 2002, licentiate, Lund University, Sweden.
[8] K. Nazifi and G. Hasson, “Industry’s first RTL power optimization feature significantly improves Power Compiler’s quality of results,” 1998, www.synopsys.com/news/pubs/rsvp/spr98/ rsvp spr98 6.html.
[9] S. He, “Concurrent VLSI Architecture for DFT Computing and Algorithms for Multi-output Logic Decomposition,” Ph.D. dissertation, Lund University, 1995.
[10] A. Berkeman, V. Öwall, and M. Torkelson, “A Low Logic Depth Complex Multiplier Using Distributed Arithmetic,” IEEE J. Solid-State Circuits, vol. 35, no. 4, Apr. 2000.
[11] F. Kristensen, P. Nilsson, and A. Olsson, “A flexible FFT processor,” in Proc. of 20th NORCHIP Conference, Copenhagen, Nov. 2002, pp. 121–126.
[12] W. Ullah, “A low power FFT-processor for OFDM transceivers using cyclic postfix,” in Proc. of 20th NORCHIP Conference, Copenhagen, Nov. 2002, pp. 68–73.