A 100-Gb-Ethernet Subsystem for Next-Generation Metro-Area Network

Hidehiro Toyoda, Shinji Nishimura, Michitaka Okuno, Ryouji Yamaoka

Hiroaki Nishi

Central Research Laboratory, Hitachi, Ltd., Tokyo, Japan

Faculty of Science and Technology Keio University Kanagawa, Japan

Abstract—An ultra-high-speed Ethernet subsystem, which realizes 100-Gb/s throughput and transmission up to 40 km, is examined for next-generation metro-area networks. A parallel link of twelve synchronized 10-Gb/s optical lanes is proposed. Ten of the optical lanes are used to transmit 10-bit parallel data. One of the two redundant lanes transmits a forward error correction code ((132b, 140b) Hamming code) to achieve highly reliable (BER < 10^-12) data transmission, and the other transmits parity data used for fault-lane recovery. A 64B/66B code-sequence-based de-skewing mechanism is proposed, and its effectiveness in realizing low-latency compensation of the inter-lane skew (< 80 ns) is shown. We have implemented the 100-Gb-Ethernet interface architecture in FPGA circuits and confirmed 100-Gb/s data communication with a compact circuit size of 385 kgates, which is small enough for implementation in a single LSI circuit.

Keywords: Ethernet; MAN; skew; FEC

0-7803-8939-5/05/$20.00 (C) 2005 IEEE

I. INTRODUCTION

Broadband networking and data-center services are rapid-growth fields within the enterprise and residential markets. Such services require high throughput, long-distance transmission, and high-quality yet low-cost access to data through the main Internet pipes. Broadband service providers and end users, however, are frustrated by the lack of bandwidth in connections between ISPs (Internet service providers) and the Internet backbone. Such connections within a given urban area are collectively called a metro-area network (MAN). The typical MAN currently provides high-speed (< 10-Gb/s) data transmission over maximum distances (i.e., edge-to-edge or edge-to-core connections) of less than 10 km. The rapid increase in the popularity and availability of broadband IP-network services has been such that these MAN connections are already insufficient. This situation will worsen as the bandwidths of access networks and core networks are increased through the popularization of 10GBASE-T (wire-based 10-Gb/s Ethernet standard) [1], FTTB (fiber to the building), and DWDM (dense wavelength-division multiplexing) technologies.

The physical layer (PHY) of a typical MAN consists of Gigabit-Ethernet (1000BASE-SX/LX) [2] or 10-Gb-Ethernet (10GBASE-LR/ER) [3, 4] links. While this level of technology provides long-distance and low-cost data communication without protocol conversion (for example, Ethernet over SONET), the current requirement is for Ethernet links that are at least 10 times faster (i.e., 100 Gb/s) and reasonably priced. The 40-Gb/s SONET/SDH standard (OC-768) [5] has already been released. However, OC-768 requires erbium-doped optical fiber amplifiers (EDFAs) and dispersion compensation for data transmission over more than 10 km. These requirements increase the costs of installation and management to levels that are unacceptable to both providers and users. We have thus developed a 100-Gb-Ethernet physical layer for next-generation metro-area networks. The 100-Gb-Ethernet links are relatively low-cost yet lift the performance of the MAN to levels that will be sufficient for some time to come (Fig. 1).

[Figure 1 diagram: a metro-core ring of 100GbE edge and core switches connecting an ISP router and an iDC (servers and disks), with metro-access 10GbE links to offices, servers, and a 10G-PON ONU.]

Figure 1. Schematic view of 100-Gb-Ethernet links in the MAN context

II. REQUIREMENTS FOR THE 100-GB-ETHERNET PHYSICAL LAYER

The requirements for higher speeds, longer distances, and better quality (lower latency, higher reliability, and lower cost) that MAN services impose on 100-Gb-Ethernet technologies are summarized below.

A. Large-throughput and long-distance transmission

Transmission up to 40 km is already specified in the 10-Gb-Ethernet (10GBASE-ER, EW) standard [3], so the structure of the current MAN is based on transmission over this distance. The 100-Gb-Ethernet concept thus requires data transmission over at least this distance [6]. The need to keep costs reasonable means that we will be able to use neither signal amplification by EDFAs nor dispersion compensation.

B. Highly reliable and low-latency signal transmission

For an approach to become an Ethernet standard, it must be capable of a bit-error rate (BER) below 10^-12. Realizing a BER in this range for 100-Gb-Ethernet will require the use of a forward error correction (FEC) code. However, while the FEC code will have a strong effect in terms of error reduction, the encoding and decoding procedures will introduce delays that contribute to signal latency. The combination of high reliability and low latency is thus one of the difficult obstacles we face. Since a MAN will be required to carry important data, 100-Gb-Ethernet will also have to be fault-tolerant. Mechanisms for fault tolerance should include both error detection and error recovery.

[Figure 2 diagram: layer stack, top to bottom: higher layers, logical link control (LLC), media access control (MAC), reconciliation sublayer (RS), CGMII, PCS (64B/66B x10, with de-skewing), physical medium attachment (PMA), PMD service interface, PMD (10λ CWDM over SMF, or 10 parallel lanes over ribbon fiber), MDI. The PCS, PMA, and PMD together form the PHY.]

Figure 2. The context and structure of the physical layer (PHY) for 100-Gb-Ethernet. CGMII: 100-gigabit media-independent interface, PMD: physical medium dependent, MDI: medium-dependent interface, SMF: single-mode fiber

C. Compatibility with 10-Gb-Ethernet

The need for compatibility with existing equipment means that 100-Gb-Ethernet units will have to be compatible with serial 10-Gb-Ethernet interfaces. That is, when a 100-Gb-Ethernet unit is connected to a 10-Gb-Ethernet unit, the former should detect the configuration of the latter and operate in a compatible way. This should be achieved through a combination of link-speed detection and auto-negotiation. For the sake of simplicity, the circuit structure of 100-Gb-Ethernet (clock speed, framing mechanism, etc.) should be compatible with that of 10-Gb-Ethernet.

[Figure 3 diagram: on the Tx side, the 160-bit CGMII input and the idle-pattern generator feed, for each of lanes 0 to 9, a 16-bit pattern inserter, a 64B/66B encoder, and a serializer (PMA), which drive a VCSEL array module (PMD). On the Rx side, a PD array module feeds, for each lane, a de-serializer, a 64B/66B decoder, and a de-skew buffer; a skew detector and a delay controller are shared by the lanes, and the 16-bit lane outputs form the 160-bit CGMII output.]

Figure 3. Block diagram of 100-Gb-Ethernet

III. TECHNOLOGIES

A. Overview

To realize the above requirements, we developed the following technologies for 100-Gb-Ethernet (Figs. 2 and 3).

1) Synchronized parallel data transmission: Synchronized parallel data transmission over 12 x 10-Gb/s lanes (ten data lanes and two redundant lanes) achieves large throughput; ribbon fiber is used for short-reach (< 300 m) transmission, while coarse wavelength-division multiplexing (CWDM) is used for long-distance (< 40 km) transmission. The 10-Gb/s signaling speed equals that of 10-Gb-Ethernet and can be achieved at reasonable cost and with sufficient reliability.

2) 64B/66B-code-sequence-based skew compensation: Lane-by-lane delay-time skew (< 80 ns) is compensated by inserting reserved characters defined in the 64B/66B code between frames and during periods in which no data is to be transmitted; this skew compensation is what makes synchronized parallel data transmission possible.

3) Hamming-code-based error correction: Low-latency error correction based on a (132b, 140b) Hamming code improves the bit-error rate (BER) to the range (less than 10^-12) required by Ethernet standards.

4) Fault-lane recovery mechanism: The two redundant lanes (which carry the parity and FEC data) are used to implement fault detection and fault recovery for the link.

5) Auto link-speed selection mechanism: An automatic link-speed selection mechanism is used to realize compatibility between 100-Gb-Ethernet and 10-Gb-Ethernet with no operator intervention. Keeping this simple means that the clock rate and framing-circuit structure must be compatible with both 100-Gb-Ethernet and 10-Gb-Ethernet.
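The fault-lane recovery mechanism listed in the overview (Section III-A, item 4) relies on a parity lane, but this section does not state how the parity is computed. The following is a minimal sketch, assuming a plain bitwise XOR across the ten data lanes; the function names `parity_word` and `recover_lane` and the 16-bit word width are illustrative choices, not taken from the paper.

```python
from functools import reduce

DATA_LANES = 10  # ten data lanes, as described in the paper


def parity_word(lane_words):
    """Bitwise XOR of the given lane words (assumed content of the parity lane)."""
    return reduce(lambda a, b: a ^ b, lane_words, 0)


def recover_lane(lane_words, parity, failed_lane):
    """Rebuild the word of one failed lane from the surviving lanes and the parity word."""
    survivors = [w for i, w in enumerate(lane_words) if i != failed_lane]
    return parity_word(survivors) ^ parity


# One 16-bit word per data lane (arbitrary example values).
words = [0x1234, 0xABCD, 0x0F0F, 0xFFFF, 0x0001,
         0x8000, 0x5555, 0xAAAA, 0x00FF, 0xFF00]
parity = parity_word(words)

damaged = list(words)
damaged[3] = 0  # lane 3 has failed, so its received content is meaningless
restored = recover_lane(damaged, parity, failed_lane=3)
print(hex(restored))
```

Under this assumption, a single failed data lane can be rebuilt once the failure has been located; locating the failed lane is a separate step that the sketch does not cover.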

B. Synchronized parallel data transmission

The combination of CWDM (wavelength: 1490–1600 nm) and parallel optical data transmission provides a way to attain transmission at 100 Gb/s over at least 40 km at reasonable cost. Since the transmission of serial data at 100 Gb/s remains technically difficult, we have specified the use of parallel data transmission over twelve synchronized 10-Gb/s lanes (10-bit data and 2-bit redundant lanes) with both ribbon-fiber and serialized CWDM approaches (Fig. 2). A clock rate of 10 Gb/s has already been realized by 10-gigabit small-form-factor-interface (XFI) technology with CMOS logic. We thus use a 10-Gb/s clock with both the optical and electrical interfaces, so neither multiplexers nor demultiplexers are required. This 10-Gb/s-based configuration needs fewer I/O lines than a 2.5-Gb/s-based configuration. Furthermore, the signal integrity is better than in a 40-Gb/s-based configuration unless optical amplification and dispersion compensation are used with the latter. The 10-Gb/s-based configuration thus provides a greater transmission distance than the 40-Gb/s case, which is limited by fiber dispersion.

Multiplexed 100-Gb/s signal transmission could also be provided by a 10-Gb-Ethernet x 10-lane CWDM system (simple link aggregation as defined in Ethernet standards, with no synchronization). However, 100-Gb-Ethernet provides greater bandwidth efficiency, and packet-data transmission is easier to control (in particular, transmission does not require a load balancer).

C. 64B/66B code-sequence-based skew compensation

The de-skewing procedure (the related blocks are shown in Fig. 3), which compensates for the time skew of the 10 high-speed signal lanes relative to each other, is an important part of 100-Gb-Ethernet. In the CWDM link solution, this skew is induced by the different transmission speeds of signals at different wavelengths (the differences are due to the wavelength dispersion of the fiber) and is less than 80 ns. In a parallel link, the skew is caused by small differences in the effective lengths of the individual fibers. In 100-Gb-Ethernet, special 64B/66B code-sequence-based data patterns are used for de-skewing the 10 lanes. The maximum skew that the system must cover in the worst case is calculated below.

$$\text{Maximum skew between the lanes} = \frac{L \times S}{T} = \frac{40000 \times 2}{100} = 800\ \text{bits (80 ns)} \qquad (1)$$

where the transmission length L is 40 km (40,000 m); the maximum skew between signals on any two CWDM lanes, S, is 2 ps/m in the given wavelength range; and the signal period per bit, T, is 100 ps.

Thus, the de-skewing data pattern should be longer than 1,600 bits (+/- 80 ns), which corresponds to twice the maximum skew (80 ns). The 1,600 bits are covered by a sequence of 32 64B/66B codes (32 × 64 bits = 2,048 bits > 1,600 bits) in our proposal.

1) Structure of the circuit for de-skewing: On the transmitter side of the physical coding sub-layer (PCS), each bit of the 160-bit-parallel input from the media access control (MAC) layer arrives at 625 Mb/s. This data is divided into 16-bit units, each of which is assigned to one of the 10 lanes (Fig. 3). The idle-pattern generator sequentially generates the 2,048-bit de-skewing data; the current data is inserted in each of the lanes in 16-bit units over 4 cycles (total: 64 bits) (Fig. 4). Each de-skewing data pattern is inserted in all 10 lanes at the same time. An independent encoder in each lane produces the 64B/66B code.

The PCS on the receiver side decodes the 2,048-bit de-skewing data in each lane and then resynchronizes the decoded signals with each other. The decoded data of each lane is stored in a de-skew buffer (1,024 bits) and, at the same time, sent to the skew detector. The skew detector detects the skew of the respective lanes. The delay controller compensates for the skew by controlling the timing of read-out from the de-skew buffer for each of the lanes. The de-skew buffer in each lane is a 1,024-bit FIFO; this is larger than the 800-bit maximum skew and is therefore sufficient for skew compensation.

The de-skewing data patterns are constructed from 32 64B/66B codes (S0 to S31, shown in Fig. 5). In the 64B/66B codes for skew compensation, each of the first five characters (C0, C1, C2, C3, and C4) is assigned an /I/ or /K/ character, and the remaining three characters are 'don't care'. In all, 32 codes (S0 to S31) can be constructed from the combinations of /I/ and /K/ in these five positions. The 32 codes for skew compensation are transmitted in sequence (S0, S1, ..., S30, S31, S0, S1, ...) during gaps between frames.
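The sizing argument for the de-skewing pattern and the de-skew buffer can be checked numerically. The short sketch below only restates the figures given in the text (40 km, 2 ps/m, 100 ps per bit, 32 codes of 64 bits, 1,024-bit FIFO); the constant and function names are illustrative.

```python
LINK_LENGTH_M = 40_000        # transmission length L (40 km)
SKEW_PS_PER_M = 2             # maximum inter-lane skew S per metre of fiber
BIT_PERIOD_PS = 100           # signal period per bit T at 10 Gb/s
CODES_IN_SEQUENCE = 32        # S0 to S31
PAYLOAD_BITS_PER_CODE = 64    # payload bits per 64B/66B code, as counted in the text
FIFO_BITS = 1024              # de-skew buffer size per lane


def max_skew_bits(length_m, skew_ps_per_m, bit_period_ps):
    """Equation (1): worst-case inter-lane skew expressed in bit periods."""
    return length_m * skew_ps_per_m / bit_period_ps


skew_bits = max_skew_bits(LINK_LENGTH_M, SKEW_PS_PER_M, BIT_PERIOD_PS)
skew_ns = skew_bits * BIT_PERIOD_PS / 1000
pattern_bits = CODES_IN_SEQUENCE * PAYLOAD_BITS_PER_CODE

print(f"maximum skew: {skew_bits:.0f} bits ({skew_ns:.0f} ns)")
print(f"pattern length {pattern_bits} bits covers +/- skew: {pattern_bits > 2 * skew_bits}")
print(f"FIFO of {FIFO_BITS} bits covers maximum skew: {FIFO_BITS > skew_bits}")
```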

[Figure 4 diagram: on the Tx side, every lane (lane 0, lane 1, ..., lane 10) carries the same code Sn at each Tx phase n; on the Rx side after transmission, the same codes arrive at different Rx phases on different lanes.]

Figure 4. The skew-compensation mechanism

Phase  Code  C0   C1   C2   C3   C4   C5  C6  C7
0      S0    /I/  /I/  /I/  /I/  /I/  DC  DC  DC
1      S1    /I/  /I/  /I/  /I/  /K/  DC  DC  DC
2      S2    /I/  /I/  /I/  /K/  /I/  DC  DC  DC
3      S3    /I/  /I/  /I/  /K/  /K/  DC  DC  DC
4      S4    /I/  /I/  /K/  /I/  /I/  DC  DC  DC
5      S5    /I/  /I/  /K/  /I/  /K/  DC  DC  DC
6      S6    /I/  /I/  /K/  /K/  /I/  DC  DC  DC
7      S7    /I/  /I/  /K/  /K/  /K/  DC  DC  DC
8      S8    /I/  /K/  /I/  /I/  /I/  DC  DC  DC
9      S9    /I/  /K/  /I/  /I/  /K/  DC  DC  DC
10     S10   /I/  /K/  /I/  /K/  /I/  DC  DC  DC
11     S11   /I/  /K/  /I/  /K/  /K/  DC  DC  DC
12     S12   /I/  /K/  /K/  /I/  /I/  DC  DC  DC
13     S13   /I/  /K/  /K/  /I/  /K/  DC  DC  DC
14     S14   /I/  /K/  /K/  /K/  /I/  DC  DC  DC
15     S15   /I/  /K/  /K/  /K/  /K/  DC  DC  DC
16     S16   /K/  /I/  /I/  /I/  /I/  DC  DC  DC
17     S17   /K/  /I/  /I/  /I/  /K/  DC  DC  DC
18     S18   /K/  /I/  /I/  /K/  /I/  DC  DC  DC
19     S19   /K/  /I/  /I/  /K/  /K/  DC  DC  DC
20     S20   /K/  /I/  /K/  /I/  /I/  DC  DC  DC
21     S21   /K/  /I/  /K/  /I/  /K/  DC  DC  DC
22     S22   /K/  /I/  /K/  /K/  /I/  DC  DC  DC
23     S23   /K/  /I/  /K/  /K/  /K/  DC  DC  DC
24     S24   /K/  /K/  /I/  /I/  /I/  DC  DC  DC
25     S25   /K/  /K/  /I/  /I/  /K/  DC  DC  DC
26     S26   /K/  /K/  /I/  /K/  /I/  DC  DC  DC
27     S27   /K/  /K/  /I/  /K/  /K/  DC  DC  DC
28     S28   /K/  /K/  /K/  /I/  /I/  DC  DC  DC
29     S29   /K/  /K/  /K/  /I/  /K/  DC  DC  DC
30     S30   /K/  /K/  /K/  /K/  /I/  DC  DC  DC
31     S31   /K/  /K/  /K/  /K/  /K/  DC  DC  DC

Figure 5. Sequence of de-skewing codes. C0 to C7: characters of the block payload; DC: don't care
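The code table of Fig. 5 follows a binary-counting pattern: reading /I/ as 0 and /K/ as 1, the characters C0 to C4 spell the phase number with C0 as the most significant bit. The sketch below reproduces the table under that reading; the function names `deskew_code` and `phase_of` are hypothetical, and the 'don't care' characters are represented by the string "DC".

```python
SEQUENCE_LENGTH = 32  # codes S0 to S31


def deskew_code(phase):
    """Return the eight payload characters C0..C7 of de-skewing code S<phase>."""
    if not 0 <= phase < SEQUENCE_LENGTH:
        raise ValueError("phase must be in the range 0..31")
    # C0 carries the most significant of the five phase bits, C4 the least.
    chars = ["/K/" if (phase >> (4 - i)) & 1 else "/I/" for i in range(5)]
    return chars + ["DC"] * 3


def phase_of(code):
    """Recover the phase number from the first five characters of a code."""
    phase = 0
    for ch in code[:5]:
        phase = (phase << 1) | (1 if ch == "/K/" else 0)
    return phase


for n in (0, 5, 16, 31):
    print(f"S{n}:", " ".join(deskew_code(n)))
```

A receiver that has decoded one of these codes can therefore read the sequence position directly from five characters, without tracking any history of earlier codes.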

2) Procedures of skew detection and compensation: The 10-Gb-Ethernet standard specifies a minimum inter-frame gap (IFG) of 12 bytes. The IFG for 100-Gb-Ethernet can be longer but no shorter than this. In 100-Gb-Ethernet, the unique 64B/66B codes (8 bytes, 64 bits) used for skew detection are inserted in every IFG. Thus, at least one pattern for detecting the inter-lane skew can always be inserted in each IFG. However, if the total IFG is less than 80 bytes (8 bytes × 10 data lanes), the skew of all lanes cannot be detected at the same time (that is, the skew can be calculated only for those lanes that actually carry the code in the present IFG). Even so, the minimum 12-byte IFG is long enough for single-lane skew detection. As shown in Fig. 6, when the de-skewing code fills an 80-byte IFG (the same pattern S10 is detected in all ten lanes), the skew of all lanes is detected at once as "phase 10". When the de-skewing code fills a 12-byte IFG, the skew of lane 2 only is detected, as "phase 5".

[Figure 6 diagram: Rx-phase timeline for lanes 0 to 9 in which idles and data coexist, showing an 80-byte IFG that carries S10 on all ten lanes and a 12-byte IFG that carries S5 on lane 2 only.]
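The detection and compensation steps can be sketched as follows. This is an illustrative model rather than the authors' circuit: it assumes the skew detector sees, in one receive cycle, the phase number (0 to 31) of the code currently arriving on each lane, and that every lane is within 16 codes of lane 0, which the 800-bit (12.5-code) maximum skew guarantees. The function names are hypothetical, and delays are expressed in whole codes (64 payload bits each).

```python
SEQUENCE_LENGTH = 32  # codes S0 to S31 repeat cyclically


def signed_offset(phase, reference):
    """Offset of `phase` relative to `reference`, wrapped into the range -16..15."""
    half = SEQUENCE_LENGTH // 2
    return (phase - reference + half) % SEQUENCE_LENGTH - half


def readout_delays(observed_phases):
    """Per-lane read-out delay (in codes) that aligns every lane with the latest one.

    A lane showing a larger phase has already received more of the sequence,
    so it is ahead and its de-skew buffer must hold the data longer.
    """
    reference = observed_phases[0]
    offsets = [signed_offset(p, reference) for p in observed_phases]
    latest = min(offsets)  # the lane that is furthest behind needs no delay
    return [o - latest for o in offsets]


# Three lanes observed at phases 1, 5 and 0 in the same receive cycle.
print(readout_delays([1, 5, 0]))
# The same computation across the S31 -> S0 wrap-around.
print(readout_delays([31, 2, 0]))
```

When an IFG is too short for every lane to carry a code, the same calculation could be applied lane by lane as phases become available, reusing the most recent value for the lanes not updated in that IFG; this scheduling is an assumption and is not described in the text.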

environmental fluctuations at any time during data transmission. The receiver side obtains the skew from the differences between the sequence positions, within the 2,048-bit sequence of de-skewing data patterns, that are received on the 10 lanes.

D. Hamming-code-based error correction

The 100-Gb-Ethernet format includes error correction based on a (132b, 140b) Hamming code (Fig. 7). Error correction is necessary to provide sufficiently fast (over 10 Gb/s) and reliable (BER < 10^-12) data transmission. While a Reed-Solomon (RS) code or trellis-coded modulation (TCM) achieves good error-correction performance, both take a relatively long time to process and require relatively large data buffers, and thus increase both latency and circuit size. For these reasons, we selected a Hamming-code-based forward error correction code for 100-Gb-Ethernet. This achieves low-latency (