Data-Driven Implementation of Efficient Protocol Handlers


Hiroaki NISHIKAWA†, Kazuhiro AOKI† and Hiroshi ISHII‡

Abstract: This paper discusses and clarifies the effectiveness of a data-driven implementation of protocol handling in the multimedia networking environment. It first describes the cases in which various types of multi-processing capability are required in that environment. Among them, it focuses on protocol handling, which typically requires realtime multi-processing. It then clarifies the ineffectiveness of pseudo multi-processing on sequential processors and the superiority of a data-driven multi-processing implementation. The authors implement protocol handlers for GIOP/IIOP (General Inter-ORB (Object Request Broker) Protocol / Internet Inter-ORB Protocol) and TCP/IP over ATM on a prototype data-driven processor, CUE-p (Coordinating Users' requirements and Engineering constraints-prototype). We evaluate the performance of the data-driven protocol handlers and show that the turn-around time is kept at its minimum, independent of the variety of media and of the degree of multiplexing, within an acceptable throughput. Finally, this paper shows a possible application of the data-driven protocol handlers to a CDN (Contents Delivery Network).

Keywords: data-driven, multimedia, networking, protocol, realtime multi-processing

† Institute of Information Sciences and Electronics, University of Tsukuba, Tsukuba Science City, Ibaraki 305-8573 JAPAN. E-mail: {nisikawa, kaoki}@is.tsukuba.ac.jp
‡ NTT Information Sharing Platform Labs., Musashino-shi, Tokyo 180-8585 JAPAN. E-mail: [email protected]

I. Introduction

The recent explosion of the Internet has brought about a situation in which "anything is over the Internet". The Internet is becoming more and more indispensable for business, education, government, amusement, medical services and so on. Versatile applications and various types of underlying network are devoted to the Internet. In particular, the increase of multimedia handling on the Internet, as in a CDN (Contents Delivery Network), requires broadband network transport as well as high-speed contents processing. The Internet itself is a technology below the network layer, but it is also strongly related to the application. We call the network and its usage environment, covering the layers from the physical to the application related to the Internet, the "multimedia networking environment". In this environment, the typical technology related to all the layers is the protocol. This paper hence focuses on the effective handling of protocols in the multimedia networking environment. The requirement on multimedia protocol handling in this environment is "realtime multi-processing", which satisfies the time constraints of each medium and enables multiple media to be processed concurrently. Pseudo multiplexing that extends sequential processing methods inherently cannot avoid the overhead of thread switching. Considering this situation, studies so far have tried to satisfy the time constraints

based on scheduling and/or priority handling for realtime multi-processing [1]. Those studies suppose that there exist processes without any time constraint or with only a weak constraint. However, this supposition is not realistic in a communication environment where each medium has its own time constraint. To satisfy the above requirement, multi-processing that achieves a constant turn-around time is needed.

The authors have been studying a networking environment based on the data-driven processor named CUE (Coordinating Users' requirements and Engineering constraints) [2], which can natively realize any type of parallel processing such as concurrency, pipelining and multi-processing [3]-[5]. In addition, we have clarified that a data-driven processor like CUE has high capability in reliability, multi-processing and multimedia information processing [6][7], high-performance network configuration [8][9], distributed network fault management [10][11], and highly efficient network architecture [5]. We have also studied the data-driven implementation of protocol multi-processing [12][13].

Based on these studies, this paper first describes the cases in which various types of multi-processing capability are required in the multimedia networking environment, and the applicability of the data-driven processor to that environment. Then it focuses on protocol handling, which typically requires realtime multi-processing in all the layers of the multimedia networking environment. It discusses the data-driven implementation of TCP/IP (Transmission Control Protocol / Internet Protocol) and of GIOP/IIOP (General Inter-ORB (Object Request Broker) Protocol / Internet Inter-ORB Protocol), the ORB protocols, both of which are realized on the prototype data-driven processor CUE-p (CUE-prototype).
This paper then clarifies the performance of these data-driven protocol handlers and shows that CUE-p based protocol handlers can keep the turn-around time at its minimum, independent of the media and of the degree of multiplexing, within an acceptable throughput. Finally, this paper shows a possible application of the data-driven protocol handlers to a CDN (Contents Delivery Network).

II. Multimedia Processing in the Multimedia Networking Environment and Applicability of Data-Driven Processor

A. Needs of Realtime Multi-Processing

In the multimedia networking environment, multimedia information must be handled concurrently and effectively in each layer, so multi-processing is needed. Examples in which multi-processing is required are shown here.

A typical example in which the multimedia networking environment is used is a CDN (Contents Delivery Network). A CDN consists of contents servers, a server assignment server that assigns an appropriate contents server to each contents request, and a network that transfers the contents. Since the contents servers are accessed by multiple users simultaneously, they must handle multiple media while keeping multiple sessions at the same time. On the user side, the user appliance must handle multiple media at the same time, although it keeps a single session. Server assignment servers must process many user requests simultaneously. The network by nature needs to handle multiple sessions simultaneously. This situation creates the need for realtime multi-processing in the environment, because multimedia must be handled simultaneously while multiple communication sessions are controlled under the specific time constraint of each medium.

When eBusiness applications are realized on application servers, a three-tier model is generally adopted. The platform of the application servers is component based, using EJB (Enterprise Java Beans) and CORBA (Common Object Request Broker Architecture). Both the first tier (web servers) and the second tier (function servers) are installed on top of the common application server platform. Especially in the case of transaction processing, the time constraint on a process is very severe. The application server platform therefore needs realtime multi-processing in order to handle multiple processes with time constraints simultaneously.

Communication protocols in layer 2 (ATM, Ethernet), layer 3 (IP) and layer 4 (TCP/UDP (User Datagram Protocol)) must simultaneously handle multiple sessions, connections and datagrams, which makes multi-processing capability necessary. Thus, multi-processing is necessary everywhere in the multimedia networking environment.

Among these cases, this paper focuses on the multiprocessor implementation of protocol handling for TCP/IP, the typical Internet protocols, with ATM as the underlying protocol and GIOP/IIOP, which supports CORBA, as the upper layer.

B. Ineffectiveness of Pseudo Multi-Processing in Sequential Processor

The authors evaluated pseudo multi-processing of TCP/IP on sequential processors. Two workstations accommodating CORBA (Common Object Request Broker Architecture), one acting as a client and the other as a server, were interconnected through an Ethernet network. The client generates threads, and each thread sets up a TCP connection. On each TCP connection, the client requests process execution from the server. Every time a request arrives at the server from a client thread, the server generates a thread. Several kinds of processing time per thread were measured 10 times while changing the number of threads from 1 to 45. Specifically, the process execution time at the server and the TCP/IP data transfer time on Ethernet were measured. Both times increase in proportion to the number of threads. The increase in time per thread is apparently caused by the overhead of thread switching.

Besides, TCP/IP communication boards that shift protocol handling from software on the workstation to hardware on the board have been developed to satisfy shorter time constraints. These boards achieve their design throughput when the IP datagram is longer than 4096 bytes. Header handling is the main process in TCP/IP handling, so the processing load per byte is lighter when the IP datagram is long, because the proportion of the header is relatively small. However, the length of an IP datagram in multimedia communication will be about 200 bytes. For example, the length of a Transport Stream Packet (TSP) in MPEG2 is 188 bytes. Hence, a processor architecture that can realize realtime multi-processing without any additional overhead should be applied to protocol handling that requires realtime multi-processing.

C. Data-Driven Processor Architecture

The data-driven implementation of protocol handling described in this paper is studied on super-integrated data-driven processors. Fig.1 shows the configuration of these processors. They have already demonstrated high-definition video signal processing in realtime. Considering the results of their application to video signal processing, it is also necessary to achieve protocol handling on the same processors in order to realize the multimedia networking environment.

[Figure 1: block diagram of two DDPs and one TAM. Each DDP contains INT, GNT, TBL and SYC elements with a router and RAM; the TAM contains VM and SUM elements with a router and RAM.]
INT: Integer & Logical Operation; GNT: Generation Manipulation; TBL: Table Operation Manipulation; SYC: Synchronization & Constant Updating; VM: Video Memory; SUM: Summation; DDP: Data-Driven Processor; TAM: Tag Addressable Memory
Fig. 1. Super-Integrated Data-Driven Processor for Video Signal Processing
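The header-overhead argument of Section II-B can be made concrete with a small calculation. The following is a minimal sketch, assuming the minimum header sizes without options (20 bytes for IPv4 and 20 bytes for TCP); the function name `header_share` is hypothetical and the datagram lengths are the two mentioned in the text.

```python
# Minimum header sizes without options (assumption of this sketch):
# 20 bytes for IPv4 and 20 bytes for TCP.
IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20


def header_share(datagram_bytes: int) -> float:
    """Return the fraction of an IP datagram occupied by IPv4 and TCP headers."""
    headers = IPV4_HEADER_BYTES + TCP_HEADER_BYTES
    if datagram_bytes < headers:
        raise ValueError("datagram is shorter than its headers")
    return headers / datagram_bytes


if __name__ == "__main__":
    # About 200 bytes is the multimedia case; 4096 bytes is the length at
    # which the hardware boards reach their design throughput.
    for length in (200, 4096):
        print(f"{length:5d}-byte datagram: headers are {header_share(length):.1%}")
```

Under these assumptions the headers make up a fifth of a 200-byte datagram but less than one percent of a 4096-byte one, which is why per-datagram header handling dominates for short multimedia datagrams.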

As shown in Fig.1, two DDPs (Data-Driven Processors), each including four processor cores called PEs (Processing Elements), are used with one TAM (Tag Addressable Memory) to realize each part of TCP/IP. All PEs and routers in the DDP and TAM are realized as self-timed elastic pipelines. A self-timed elastic pipeline is driven by handshake control between pipeline stages. It is easy to construct a multiprocessor because inter-PE and inter-chip transfers are also controlled by handshake. Super-integration in the DDP and TAM utilizes this characteristic. Each PE in a DDP or TAM has a heterogeneous instruction set, which makes it possible to reduce the hardware compared with a homogeneous instruction set.

The DDP and TAM realize a token as a packet that carries all the information needed to execute an operation. A packet consists of 2 words (32 bits/word) and holds an operand (12 bits), a generation (24 bits), which is the tag, and a destination (13 bits). In high-definition video signal processing, the operand is color information and the generation is coordinate information; these bit lengths are long enough for that processing. In the DDP, INT (INTeger & logical operation) processes the operand, GNT (GeNeraTion manipulation) processes the generation, TBL (TaBLe operation manipulation) handles memory operations using table memory, and SYC (SYnchronization & Constant updating) handles synchronization. The TAM is a data-driven processor specialized for memory operations. In the TAM, VM (Video Memory) handles history-sensitive processing and SUM (SUMmation) processes multiplex operations. Multi-processing of memory operations across generations is realized in the TAM because memory is addressed by generation.

D. Data-Driven Implementation of TCP/IP Handling

On sequential processors, TCP/IP handling is implemented sequentially using memory. On our data-driven processor, however, a sequential process decreases throughput and increases turn-around time, because the processor consists of a circular elastic pipeline.
In fact, we estimate that a sequential implementation has lower throughput and longer turn-around time than a concurrent implementation for a part of IP sending. In this section, we discuss a data-driven implementation of TCP/IP handling that is realized as concurrently as possible. Fig.2 shows the configuration of the TCP/IP handling program. The figure is a screen shot of the specification on RESCUE (Realtime Execution System for CUE series data-driven processors) [14]. TCP/IP handling consists of a receiving process, which passes data to the application layer, and a sending process, which sends data from the application, and acknowledgments in response to received TCP segments, to the network. In Fig.2, "IP recv", "TCP recv 1/2" and "TCP recv 2/2" are the receiving processes, and "TCP send" and "IP send" are the sending processes. Besides, "init TCP/IP program" is the initialization process, and "handle TCP retransmit" is re-transmission in TCP handling. When re-transmission occurs, it is difficult for protocol handling to ensure realtimeness. Therefore, we study realtime multi-processing of TCP/IP handling for the receiving and sending processes, which are the main processes of TCP/IP

handling. As shown in Fig.2, "IP recv" takes as input the received IP datagram, an initial signal and a received-interface indicator. The data-driven implementation treats an IP datagram as a row of packets. Each packet carries 1 byte of data, and the order of the packets is kept by the generation. In "IP recv", the IP header check and the buffering of the datagram are handled simultaneously. "IP recv" outputs the address of the buffered data, the IP address, the data length and the checksum. "TCP recv 1/2" takes these as input, performs the TCP header check and outputs header information for the acknowledgement to "TCP send". At the same time, "TCP recv 2/2" controls connections and outputs data to the upper layer, such as GIOP/IIOP (General Inter-ORB Protocol / Internet Inter-ORB Protocol), the protocols of CORBA. "TCP send" generates the TCP header and pseudo header with the buffered data, and then outputs the headers to "IP send". "IP send" generates the IP header from the pseudo header and outputs the frame or PDU (Protocol Data Unit) carrying the IP datagram, the data length and a trigger for the network interface.

The data-driven implementation performs header checking and generation concurrently, because these steps can be processed independently of each other except for the checksum. Therefore, the turn-around time can be kept at its minimum as long as the checksum and data buffering take less time than header checking and generation. Table I shows the turn-around time of TCP/IP estimated using RESCUE. As shown in Table I, the turn-around time stays constant up to 250 bytes. If the IP datagram is longer than 250 bytes, the turn-around time increases in proportion to the length of the IP datagram. For example, it exceeds 300 µsec when the IP datagram is 11 kbyte long. Table I also shows the estimated turn-around time of UDP/IP. In this estimation, the UDP checksum is not processed. The data-driven implementation of UDP keeps the turn-around time constant up to 1.4 kbyte. Therefore, it is the checksum that increases the turn-around time in TCP.

TABLE I Turn-Around Time in TCP/UDP/IP Handling

                IP recv     TCP recv (1/2, 2/2)   TCP send    IP send
< 250 byte      18 µsec     31 µsec               25 µsec     14 µsec
11 kbyte        48 µsec     117 µsec              105 µsec    38 µsec

                IP recv     UDP recv              UDP send    IP send
< 1.4 kbyte     18 µsec     12 µsec               10 µsec     14 µsec
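As a cross-check of the statement that an 11 kbyte datagram takes more than 300 µsec, the per-stage values of Table I can be added up. This is only a sketch: it assumes that Table I is read as four stages per datagram size and that the overall turn-around time is the sum of those stages, which the paper does not state explicitly. The dictionary layout and the name `total_us` are hypothetical.

```python
# Stage times in microseconds, copied from Table I.
TCP_IP_US = {
    "< 250 byte": {"IP recv": 18, "TCP recv": 31, "TCP send": 25, "IP send": 14},
    "11 kbyte": {"IP recv": 48, "TCP recv": 117, "TCP send": 105, "IP send": 38},
}
UDP_IP_US = {
    "< 1.4 kbyte": {"IP recv": 18, "UDP recv": 12, "UDP send": 10, "IP send": 14},
}


def total_us(stages: dict) -> int:
    """Sum the stage times, assuming the stages simply add up."""
    return sum(stages.values())


if __name__ == "__main__":
    for table in (TCP_IP_US, UDP_IP_US):
        for size, stages in table.items():
            print(f"{size:12s} total {total_us(stages)} usec")
```

Under this assumption the 11 kbyte TCP/IP case sums to 308 µsec, consistent with "beyond 300 µsec" in the text, while short TCP/IP and UDP/IP datagrams sum to 88 µsec and 54 µsec.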

E. Protocol Handling Oriented Data-Driven Processor Prototype CUE-p

In this study, we first implemented TCP/IP using 8 DDPs and 4 TAMs. With TCP/IP divided into 4 processes (TCP send, TCP receive, IP send and IP receive), each process is implemented on 2 DDPs and a TAM as shown in Fig.DDPemu. In this configuration, the maximum throughput is about 80 Mbit/sec, and the off-chip penalty is the bottleneck of the throughput.
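Section II-D states that an IP datagram is treated as a row of packets carrying one byte each, ordered by generation. The sketch below illustrates that idea in software and what it implies for the packet rate at a given throughput. It is an illustration under stated assumptions, not the CUE implementation: the `Packet` class, the function names and the use of the byte index as the generation are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    generation: int  # order tag; here simply the byte index (assumption)
    operand: int     # one byte of datagram data


def packetize(datagram: bytes) -> list:
    """Turn a datagram into one-byte packets tagged with their position."""
    return [Packet(generation=i, operand=b) for i, b in enumerate(datagram)]


def reassemble(packets) -> bytes:
    """Rebuild the datagram from packets that may arrive in any order."""
    ordered = sorted(packets, key=lambda p: p.generation)
    return bytes(p.operand for p in ordered)


def packets_per_second(throughput_bit_per_s: int) -> int:
    """Packet rate needed when every packet carries exactly one byte."""
    return throughput_bit_per_s // 8


if __name__ == "__main__":
    packets = packetize(b"datagram")
    # Reverse the packets to mimic out-of-order arrival.
    print(reassemble(list(reversed(packets))))
    print(packets_per_second(80_000_000), "packets/s at 80 Mbit/sec")
```

With one byte per packet, 80 Mbit/sec corresponds to 10 million packets per second, which gives a feel for the load that the inter-chip routes have to sustain.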

[Figure 2: RESCUE screen shot of the TCP/IP handling program. Signal labels recoverable from the figure: Received IP datagram; IP receive initial signal; Received I/F indicator; Received stream header; Received stream data; Received stream data offset; Initial routing; Set buffer hash table; Set routing hash table; Initial TCP/IP trigger; Invoke IIOP; Connection established; Output for emulator; Send data stream; Ethernet frame/AAL-5 PDU; Ethernet frame/AAL-5 PDU header; Length of output data; Checksum complete signal; Net-IF init signal; Re-transmit clock.]
Fig. 2. Program Structure in Data-Driven Implementation of TCP/IP
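Section II-D singles out the checksum as the one step that cannot be overlapped freely with header checking and generation. For readers unfamiliar with it, the following is a minimal software sketch of the standard Internet checksum (the 16-bit one's-complement sum used by IP, TCP and UDP). It is a plain sequential reference, not the data-driven program of the paper; the function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement Internet checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    sample = bytes.fromhex("0001f203f4f5f6f7")
    checksum = internet_checksum(sample)
    print(hex(checksum))
    # A message followed by its own checksum verifies to zero.
    print(internet_checksum(sample + checksum.to_bytes(2, "big")))
```

Every byte of the datagram contributes to the sum, so the work grows with datagram length, unlike fixed-size header processing. Because one's-complement addition is commutative and associative, partial sums over separate parts of a datagram can in principle be computed independently and folded together afterwards.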

The target throughput is 135 Mbit/sec, which is the actual maximum throughput of OC-3 ATM (Optical Carrier-3 Asynchronous Transfer Mode). To achieve the target throughput, we studied an implementation of a protocol handling oriented data-driven processor prototype, CUE-p. CUE-p is realized with a 0.35 µm design rule. Since the DDP and TAM are implemented with a 0.6 µm design rule, two DDPs, or a DDP and a TAM, can be integrated in one chip.

We first evaluated which inter-chip route is the bottleneck. It is evident that the route between the two DDPs restricts the throughput: the maximum throughput between two DDPs is about 80 Mbit/sec, whereas the maximum throughput between a DDP and a TAM is about 140 Mbit/sec. Therefore, we decided to integrate two DDPs in a CUE-p.

Besides, the throughput of the checksum must be improved. The maximum throughput of the checksum is about 120 Mbit/sec because of the latency of memory access in TBL. To achieve the target throughput, it is sufficient to minimize the latency of memory access in the checksum. Hence, CUE-p has a multiplexer for the checksum. Fig.3(a) shows a CUE-p board, which has 4 CUE-p chips and 2 TAMs. Fig.3(b) shows the construction of the CUE-p board, and Fig.3(c) shows the block diagram of CUE-p as mentioned above. The CUE-p board is realized so that CUE-p can be applied to the networking environment using a conventional network interface card. Besides, the CUE-p board has an IEEE 1394 interface so that CUE-p can be applied to the user access network. In the experimental evaluation, the maximum throughput of TCP/IP on CUE-p is 142 Mbit/sec. This shows that it is essential to utilize super-integration and to tune the instruction set in order to remove the bottlenecks.

III. Experimental Study on Multimedia Networking Environment

A. Experimental Study on Protocol Multi-Processing

We experimentally study the data-driven implementation of protocol multi-processing to show its efficiency. We first implemented TCP/IP on a CUE-p board. Fig.3(b) shows the allocation of the TCP/IP handling program. We assume the following multimedia networking environment. In a client-server model, a client requests servers to communicate with another client. The servers then synchronize and multicast media such as video, sound, documents and control sequences. Besides, we implemented GIOP/IIOP of CORBA to realize interoperability in the networking environment. In this study, we evaluate the multi-processing capability of this protocol handling.

[Figure 3: (a) Overview of the CUE-p board, with the IEEE1394 interface, CUE-p chips, TAMs and RAM labeled; (b) Board Block Diagram, in which the IP recv, TCP recv 1/2, TCP recv 2/2, TCP send and IP send programs are allocated to CUE-p 0 to CUE-p 3 with TAM 0 and TAM 1, between the AAL and AP sides; (c) CUE-p Block Diagram, with INT, GNT, TBL and MUL elements, routers, RAM, input and output.]
INT: Integer & Logical Operation; GNT: Generation Manipulation; TBL: Table Operation Manipulation; MUL: Multiplex Operation Manipulation; TAM: Tag Addressable Memory; AAL: ATM Adaptation Layer; AP: Application Layer
Fig. 3. Data-Driven Processor Prototype; CUE-p

TABLE II Condition of Media in an Assumed Environment

Media               Bandwidth      Data length (per datagram)    Input interval
Video               2 Mbit/sec     - (250 byte)                  1 msec
Sound               64 kbit/sec    - (40 byte)                   5 msec
Documents                          11 Kbyte x 3 (1408 byte)      -
Control Sequence                   100 byte (100 byte)           100 msec
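The video and sound rows of Table II are consistent with each other: the per-datagram length divided by the input interval gives the listed bandwidth. The short sketch below checks this. The dictionary layout and function name are hypothetical, and the figure it produces for the control sequence is derived here, not given in the paper. Documents are omitted because Table II gives no input interval for them.

```python
# (datagram length in bytes, input interval in milliseconds) from Table II.
MEDIA = {
    "Video": (250, 1),
    "Sound": (40, 5),
    "Control Sequence": (100, 100),
}


def bandwidth_bit_per_s(datagram_bytes: int, interval_ms: int) -> int:
    """Bandwidth of one stream sending one datagram every interval_ms."""
    return datagram_bytes * 8 * 1000 // interval_ms


if __name__ == "__main__":
    for name, (length, interval) in MEDIA.items():
        print(f"{name:17s} {bandwidth_bit_per_s(length, interval)} bit/sec")
```

The sketch reproduces 2 Mbit/sec for video and 64 kbit/sec for sound; by the same calculation a control sequence stream would need 8 kbit/sec.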

We assume the bandwidth, data length and input rate shown in Table II. We consider MPEG4 as the video, PCM as the sound and a typical Web page as the document. Considering the actual maximum throughput of OC-3 ATM (135 Mbit/sec), the maximum number of datagrams that need to be processed concurrently is 35. Fig.4 shows the evaluation results for multi-processing in protocol handling. For all media, the turn-around time stays constant as the number of datagrams increases up to 35. Besides, for sound and control sequences, the turn-around time is kept at its minimum because the data length is shorter than 250 bytes. For documents, the turn-around time is 3 times that of sound, but we consider this to be no problem because the time constraint on documents is looser than that on the other media.

B. Application for Multimedia Contents Processing

We study an implementation of multimedia contents processing that utilizes the data-driven implementation of protocol multi-processing. Fig.5 shows the configuration of an ASP (Application Service Provider) [15], which provides contents on a CDN. An ASP gives users the benefits of no compatibility issues, no installation hassle and reduced downtime, because it provides applications to users via the network without installation. An ASP also gives software vendors and service providers several benefits: no distribution cost, reduced piracy, immediate upgrades and constant revenue potential. Therefore, ASPs will also become indispensable for business, education, amusement and so on. We will study the application of CUE processors to the implementation of an ASP.

As shown in Fig.5, a typical ASP consists of Web servers, application processing servers and storage servers. Web access is the main access to an ASP from clients via a wide area network. Applications provided by an ASP should be usable instantly via the network without installation. Web access is a practical way to provide applications online, because online applications such as some Java applets are

[Figure 4: four plots of turn-around time (µsec) against number of users: (a) Video, (b) Sound, (c) Document, (d) Control Sequence.]
Fig. 4. Evaluation in Multi-processing of Protocol Handling

[Figure 5: labels recoverable from the figure: Web Server (http, TCP/IP); Application Processing Servers (Application, CORBA, TCP/IP); Storage Servers; CUE processor; System Network.]
Fig. 5. Prototype of Application Service Provider

realized on Web browsers. The Web servers then process connection requests from clients simultaneously and allocate the traffic to the contents processing layer. Therefore, it is possible to apply the data-driven implementation of protocol handling, together with HTTP (Hyper Text Transfer Protocol), to Web servers. In the application processing layer, application processing functions are developed on application servers based on a distributed processing environment such as CORBA and EJB (Enterprise Java Beans). An ASP requires interoperability and portability because it must provide applications regardless of the client's platform, such as the operating system, processor and interface. Therefore, CORBA and EJB are applied to the application servers in order to realize interoperability and portability. We think that an efficient application server that can ensure realtimeness can be implemented by utilizing the data-driven implementation of CORBA and TCP/IP handling.

Besides, as another example, we study the application of the data-driven implementation of TCP/IP to the user access network. The requirements for users to access the multimedia networking environment are considered to be as follows:

(1) Efficient use of access line: It is required to use the access line capability as efficiently as possible, because the access line generally has a rather small bandwidth. In the multimedia networking environment, simultaneous communication between distributed servers in the network and user terminals will

[Figure 6: labels recoverable from the figure: Servers; IP network; TCP connections; Hub accommodating data-driven TCP/IP handler; IEEE1394; GIOP/IIOP; Clients on PCs.]
Fig. 6. Prototype of User Access Network

occur. The concurrent use of multiple connections must not allow them to affect each other adversely.

(2) Shared use of access by stream and control/management messages: Even if control or management messages and stream information are transferred on logically different connections, those connections may share the same physical access line. In particular, the case in which multimedia information at several to tens of Mbit/s and control/management information at tens of kbit/s are transferred on different ATM VP/VCs but on the physically same access line should be well taken into account.

(3) Multiple terminal accommodation: It is required that multiple PCs in the same customer premises share a single access line.

(4) Simple interface: Since it is not desirable to add advanced and complex functions to simple PCs, it is required to utilize open and standard PC interfaces such as PCMCIA and IEEE1394.

By making the most of data-driven TCP/IP protocol handling, a desirable access configuration can be achieved. Requirement (1) is satisfied by the high efficiency of data-driven TCP/IP protocol handling. To satisfy the other requirements (2)-(4), we are planning to construct a TCP/IP hub that can accommodate multiple PCs via the IEEE1394 interface [13]. The IEEE1394 interface is increasingly being introduced into home bus environments built from ordinary PCs (requirement (4)). It can easily realize a multiple-PC configuration in a plug & play manner (requirement (3)). IEEE1394 can also realize a high speed that is comparable to internal bus speed (at most 400 Mb/s). Another of its characteristics is that it has both asynchronous and isochronous transfer modes. This characteristic is suitable for information transfer covering both stream and control/management messages (requirement (2)). Fig.6 shows an overview of the planned configuration. Since there is at present no standard for TCP/IP over IEEE1394, we assume the following:

(a) A data-driven TCP/IP hub has one IP address, on which multiple TCP connections are multiplexed.
(b) Information provided from/to each PC is transferred in TCP packets assembled/disassembled in the hub. Each TCP connection corresponds to one PC.
(c) Within a hub, an ORB stub is installed by downloading it via a Java applet, if needed.

Each PC can take advantage of high-speed ORB communication via GIOP/IIOP, and these communications are multiplexed onto a single access line by our CUE processor based TCP/IP hub, which realizes efficient use of the access line.

C. Data-Driven Implementation of Multimedia Networking Environment

As mentioned in the sections above, we will study the data-driven implementation of the multimedia networking environment. The CUE project has already realized a super-integrated data-driven processor, CUE-v1 (CUE-version1), and a networking interface board. Fig.7(a) shows an overview of the CUE-v1 board. The CUE-v1 board consists of 7 CUE-v1 chips as shown in Fig.7(b). Fig.7(c) is the block diagram of CUE-v1. CUE-v1, which super-integrates a CUE-p and 2 TAMs, is realized to establish on-chip memory operation. CUE-v1 consists of 6 kinds of PEs and 12 PEs in total. Fig.8 shows our networking interface board. Fig.8(a) is an overview of the networking interface board. The board consists of OC-3 ATM/FastEthernet LSIs, an FPGA (ALTERA EPF10K70), a DPM (IDT 70V27X25) and 2 CUE-v1 chips as shown in Fig.8(b). There are bottlenecks

 

[Figure 7: (a) CUE-v1 Board, top and bottom views; (b) Block Diagram of CUE-v1 Board, showing CUE-v1 chips with table memory and video memory; (c) Block Diagram of CUE-v1, showing PEs, routers, TBL memory, VM memory, input and output.]
Fig. 7. CUE-v1 Board

between the networking interface and the processors that handle protocols, because data transmission between them is implemented in software. At the same time, it is necessary to study how CUE processors can cooperate with clock-controlled processors, in order to realize interoperability between applications on von Neumann processors and protocol handling/conversion on CUE processors. Therefore, the implementation applies an FPGA and a DPM to data transmission between the 2 CUE-v1 chips and the network LSIs (netLSIs), namely the ATM LSI and the FastEthernet LSI, to realize multi-processing of data transmission. The specification of data transmission among the FPGA, DPM and netLSIs on the networking interface board is as follows.

When IP datagrams are sent from CUE-v1 to the network, the packets are stored in a buffer to construct the IP datagrams. It is necessary to restore the order of the packets that make up an IP datagram. Restoring the order of the packets and outputting them are essentially sequential processes. As mentioned for the data-driven implementation of TCP/IP handling, sequential processes can be a bottleneck on a CUE processor.

We therefore applied the DPM to the buffer, because the write process to the buffer and the read process from the buffer can be independent of each other. As a result, transmission of IP datagrams can be realized without any runtime overhead.

Receiving IP datagrams from the network is easier than sending them. When an IP datagram is received by CUE-v1, it is not necessary to store it, because the order of the packets is already correct. When the ATM cells or Ethernet frames that make up an IP datagram are input from the network, the netLSI checks their headers. The netLSI then restores the correct ATM cells or Ethernet frames to IP datagrams. After that, the netLSI gives the IP datagrams the address that causes them to be input to CUE-v1. The FPGA transforms each IP datagram into a row of packets and gives the packets generations that indicate their order. CUE-v1 then receives the packets.

On the other hand, when an IP datagram is sent from CUE-v1 to the network, the FPGA stores the IP datagram in the DPM and writes a packet descriptor (PD) to the DPM. When CUE-v1 issues commands to the netLSI, the FPGA interprets the commands to send the IP datagram to the network. After that, the netLSI reads the IP datagram and the PD from the DPM and transforms the IP datagram into ATM cells or Ethernet frames. Finally, the netLSI outputs them to the network. Multi-processing of these processes is realized in order to sustain the throughput of the network and of protocol handling on CUE-v1. As a result, we show that the networking interface has achieved the maximum throughput of OC-3 ATM/FastEthernet.

[Figure 8: (a) Overview of the board, face and bottom views, with the Fast Ethernet LSI, ATM LSI, FPGA and CUE-v1 labeled; (b) Block Diagram, showing the ATM LSI, Fast Ethernet LSI, FPGA, DPM, RAM, two CUE-v1 chips with input and output, and the PCI bus.]
Fig. 8. Data-Driven Networking Interface Board

IV. Conclusion

This paper discussed the data-driven implementation of protocol handlers that achieves realtime multi-processing in the multimedia networking environment. It clarified the necessity of multi-processing capability in the multimedia networking environment, which includes CDNs, eBusiness platforms and so on. In particular, this paper focused on protocol handling, which typically requires realtime multi-processing in this environment in order to handle multiple simultaneous sessions, connections and datagrams for each medium under its time constraint. Through the experimental study, which considered applying the handlers to streaming media, this paper clarified that the data-driven implementation can realize multi-processing while keeping the turn-around time at its minimum, thereby achieving realtimeness. By applying CUE-p to protocol handling, it also showed that both an on-chip multiprocessor configuration and adjustment of the instruction set are effective in implementing a data-driven processor suitable for protocol handling.

The realtime multi-processing characteristics of CUE processors are useful and essential for realizing multimedia contents processing environments such as ASPs and user access networks. In ASPs and user access networks, realtime multi-processing capability is required in protocol handling and

distributed processing environments such as CORBA. This paper discussed the possibility of applying CUE processors to the implementation of ASPs and user access networks.

Besides, we realized our networking interface board as a prototype. The networking interface board applies an FPGA and a DPM to convert the data format between the data-driven processors and the OC-3 ATM/FastEthernet LSIs in order to realize multi-processing of data transmission. We have demonstrated that the networking interface board can achieve the maximum throughput of OC-3 ATM. Besides, the networking interface board keeps its programmability because protocol handling is implemented on data-driven processors. Through this study, we showed the possibility of cooperation between CUE processors and von Neumann processors. To implement a networking node on a single chip, the processor integrates a data-driven processor core for protocol handling/conversion and a von Neumann processor core for applications.

Further study will be made to show more clearly the superiority of the data-driven implementation by applying the data-driven processor to other systems and functions in the multimedia networking environment. For example, we have been studying a data-driven implementation of protocol conversion to realize a self-evolutional networking environment. Data-driven implementations of media processing and media conversion will also be studied for the multimedia networking environment.

Acknowledgment

Although it is impossible to give credit individually to all those who organized and supported the CUE project, the

authors would like to express their sincere appreciation to all the colleagues in the project. This research is partially supported by the Scientific Grant of Japan Society for the Promotion of Science, the Telecommunication Advancement Foundation and Semiconductor Technology Academic Research Center.

References

[1] M. Adaka and E. Okubo, “A Dynamic Synchronization Protocol and Scheduling Method Based on Timestamp Ordering for Real-Time Transactions,” IEICE transaction D-I, Vol. J82-D-I, No. 4, pp. 560-570, 1999 [In Japanese].
[2] H. Nishikawa and S. Miyata, “Design Philosophy of Super-Integrated Data-Driven Processors: CUE,” Proc. of the 1998 Int. Conf. on Parallel and Distributed Processing Techniques and Applications, pp. 415-422, Las Vegas, U.S.A., July 1998.
[3] G. Nilsson, F. Duppy and M. Chapman, “An Overview of the Telecommunications Information Networking Architecture,” TINA95, pp. 1-12, Feb. 1995.
[4] P. Hellemans, H. Vanderstraeten, P. Lago, J. Yelmo, J. Villamor, and G. Canal, “TINA Service Architecture: From Specification to Implementation,” TINA’97, pp. 174-183, Nov. 1997.
[5] H. Nishikawa, S. Miyata, S. Yoshida, T. Muramatsu, H. Ishii, Y. Inoue, and K. Kitami, “Data-Driven Implementation of TINA kernel Transport Network,” Proc. of the TINA’97 Conf., IEEE, pp. 184-192, Santiago, Chile, Nov. 1997.
[6] H. Ishii, H. Nishikawa, H. Kobayashi, H. Tanaka and Y. Inoue, “A study on applying data-driven processor to TINA environment,” IEICE Networking Architecture Workshop, 3-3-1-8, Dec. 1996 [In Japanese].
[7] H. Nishikawa, H. Ishii, Y. Inoue, “A Stream-Oriented Data-Driven Processor Realizing Hyper-Distributed Systems,” IASTED PDCS 96, pp. 47-51, Oct. 1996.
[8] H. Ishii, H. Nishikawa, H. Kobayashi and Y. Inoue, “Implementation of TINA-based High Quality Multimedia Networks,” IEICE transaction B-I, Vol. J80-B-I, No. 6, pp. 457-464, 1997 [In Japanese].
[9] H. Nishikawa, S. Miyata, S. Yoshida, T. Muramatsu, H. Ishii, H. Kobayashi and Y. Inoue, “A Data-Driven Implementation of Telecommunication Network Systems,” ISADS97, pp. 51-58, Apr. 1997.
[10] H. Ishii, H. Nishikawa, Y. Inoue, “Data-Driven Fault Management for TINA Applications,” IEICE Trans. Commun., Vol. E80-B, No. 6, pp. 907-914, 1997.
[11] H. Ishii, H. Tanaka, H. Nishikawa, “Reliable TINA-based Telecommunication Networking Environment,” IASTED EuroPDS’97.
[12] H. Ishii, H. Nishikawa, and Y. Inoue, “Data-Driven Implementation of Highly Efficient TCP/IP Handler to Access the TINA Network,” IEICE Trans. Commun., vol. E83-B, no. 6, pp. 1355-1362, June 2000.
[13] K. Aoki, S. Kudo and H. Nishikawa, “Data-Driven Protocol-Handling for Interoperable Networking Environment,” Proc. of the 2001 Int. Conf. on Parallel and Distributed Processing Techniques and Applications, pp. 243-249, Las Vegas, U.S.A., June 2001.
[14] Y. Wabiko and H. Nishikawa, “A Data-Driven Paradigm to Develop and Tune Data-Driven Realtime System,” Proc. of the 2001 Int. Conf. on Parallel and Distributed Processing Techniques and Applications, pp. 350-356, Las Vegas, U.S.A., June 2001.
[15] Lixin Tao, “Shifting Paradigms with the Application Service Provider Model,” IEEE COMPUTER, vol. 34, no. 10, pp. 32-39, Oct. 2001.