Continuous Media Playback and Jitter Control

Sanjay K. Jha and Michael Fry
School of Computing Sciences
University of Technology, Sydney
P.O. Box 123, NSW-2007, Australia
Phone: +612 3301858  Fax: +612 3301807
e-mail: {sanjay,mike}@socs.uts.edu.au


Abstract

The purpose of this paper is to examine problems associated with the display (playback) of live continuous media under varying conditions on an internetwork of workstations running a general-purpose operating system such as Unix. Under the assumption that the network cannot guarantee the required bounds on delay and jitter, there is a need to accommodate the delay jitter in the end systems. Various methods of jitter smoothing at the end systems and their suitability to audio as well as video transmission are discussed. An experimental test bed used to investigate these problems is described. Some initial empirical results obtained using this test bed are also presented.

I. Introduction

There has been a dramatic increase in the processing power of workstations and in the bandwidth of high-speed networks. This has given rise to new real-time applications such as multimedia applications. These applications have traffic characteristics and performance requirements which are quite different from those of existing data-oriented applications. In order to satisfy these requirements, the underlying system needs to guarantee a predictable “Quality of Service” (QoS). A multimedia communication system should satisfy the following properties:

• bound on delay and jitter
• effective utilization of bandwidth
• acceptable error rate
• low processing overhead for the underlying communication and end system
• adaptability to dynamically changing network and traffic conditions


Variance in end-to-end delay is called jitter. For example, if the minimum end-to-end delay experienced by a packet is 2 ms and the maximum delay is 7 ms, then the maximum delay jitter of the connection is 5 ms. Delay jitter is an important performance metric for real-time traffic such as video-conferencing. For example, if the screen is not updated regularly every 30th of a second or faster, the user will notice flickering in the image. Similarly, if voice samples are not played out at regular intervals, voice output may sound distorted. Some factors responsible for causing jitter are:

• the time required to digitise, compress and decompress
• operating system scheduling latencies
• variation in delay in the network transport system

Interactive multimedia applications such as video-conferencing require a bound on jitter in addition to a bound on delay. These guarantees can be provided if the network is connection oriented and resources can be reserved in advance during the connection set-up phase. Dedicated links such as T1 (high bandwidth) or ISDN (low bandwidth) already support room-based teleconferencing systems; Intel's ProShare is an example of a commercially available video-conferencing product based on ISDN [Int93]. New-generation networking technologies such as Fiber Distributed Data Interface (FDDI-II) [Kes91], Asynchronous Transfer Mode (ATM) [Bou91][DeP93] and Distributed Queue Dual Bus (DQDB) have been designed to support continuous media with guaranteed QoS. The Pandora System is an example of bringing real-time audio and video traffic onto desktops via an ATM network [Hop90, Jon93].

Under the assumption that the network cannot guarantee the required bounds on delay and jitter, there is a need to accommodate the delay jitter in the end systems. If the underlying network is connectionless (individual packets may take different routes), video frames may arrive at the destination out of sequence or after the time at which they should be displayed. A gap may result if a frame is not available for display. This may affect the quality of audio/video playout.

This paper is organised as follows. In the next section we present a summary of research work done to solve this problem. Section 3 presents some methods of controlling delay jitter and playout of real-time audio. Problems specific to video playback are discussed in Section 4. Section 5 describes the experimental test bed used to investigate this problem and Section 6 presents some initial empirical results obtained using this test bed. Finally, our conclusions and future work are presented in Section 7.

II. Related Work

Jitter control and playback of real-time information is an active area of research. In this section, some research efforts in this direction are described. Ferrari describes the performance guarantees required for real-time traffic in packet-switched networks [Fer90]. These performance parameters are termed Quality of Service (QoS) parameters. They include bounds on delay jitter and, in turn, bounds on the maximum and minimum delay to be experienced by packets on the channel. Verma proposes a scheme for guaranteeing delay jitter in a connection-oriented packet-switching network [Ver91]; each intermediate packet-forwarding node guarantees the bounds during the connection set-up phase. The Real-Time Channel Administration Protocol (RCAP) is another example of a resource reservation protocol, in which applications specify their QoS requirements and the network reserves CPU, buffer space, bandwidth etc. to satisfy those requirements [Ban91].

To guarantee QoS on IP-based internetworks, the resource reservation protocol ST-II has been proposed [Top90]. The Heidelberg Transport System (HeiTS) was implemented over Token Ring using ST-II at the IBM European Networking Center [Heh91, Her91, Her92]; it provides an end-to-end communication system for continuous media and normal data. QoS negotiation is supported by the most prominent reservation protocols. The Session Reservation Protocol (SRP) of the DASH project allows applications to reserve capacity at each host in an IP internetwork and then use standard IP protocols to transmit data [And90]. RSVP is a resource reservation protocol which also supports a dynamic multipoint communication model; it is a simplex protocol and provides receiver-initiated reservations to accommodate heterogeneity among receivers as well as dynamic membership changes [Zha93].


Dupuy et al. [Dup92] provide a survey of networks and protocols for supporting QoS guarantees. Researchers at Lancaster University have been actively working on QoS provisioning in current distributed systems architectures; their QoS architecture framework is described in [Cam94].

Some work has been done to reduce delay jitter at the end workstations. Non-real-time operating systems such as Unix have unpredictable response times when generating, processing and displaying continuous media, which may result in high levels of delay jitter. Azuma observed that variable delays experienced by processes running under Unix lead to unacceptably large errors in the correspondence between objects in the real world and objects in the virtual world [Azu94]. To address this problem, some experiments have been performed running multimedia applications as high-priority threads. Mauthe and others have implemented such applications using high-priority threads on PS/2 workstations running OS/2 [Mau92]. Fisher modified the Unix (Ultrix 4.2) kernel to support better response times for real-time processes [Fis92]. The ARTS project uses Real-Time Mach to provide guaranteed response times to real-time processes [Mer94, Tok93]. The DASH kernel allows applications to specify their resource requirements using the DASH resource model; the kernel then tries to meet the QoS requirements [And90, Gov91]. Running a process at high priority, however, affects the performance of other processes in a general-purpose operating system: it has been shown that running applications as real-time class processes in SVR4 affects other applications, such as the graphical user interface, which also need a quick response [Nei93].

Some systems use special-purpose devices to eliminate jitter at the end systems. The Pandora project handles audio/video by using a specialised device directly connected to the network [Hop90, Jon93]. The Etherphone project at Xerox PARC used special telephones to process audio packets and transmit them directly onto the Ethernet [Ter88]. Craig Partridge argues that isochronous applications do not require jitter-controlled networks, assuming that the receiving application has buffer space equal to the channel bandwidth multiplied by the maximum inter-arrival variance, and that the maximum variance is equal to the maximum delay [Pat91].

The work described so far has attempted to eliminate delay jitter. Other work attempts to smooth jitter rather than eliminate it. In this approach, the receiving system selects a destination wait time and attempts to play the received frames after this time. Selection of this destination wait time is a very challenging task and should take into consideration the variance in end-to-end delay; such systems should also adjust the destination wait time dynamically. Montgomery suggested various ways of determining end-to-end delay and then displaying frames based on this estimate [Mon83]. The Real-Time Transport Protocol (RTP) uses techniques described by Montgomery to estimate end-to-end delay when clocks are not synchronised [Sch93]. This method uses blind delay, which means that the receiver assumes that the first received frame has experienced the maximum possible delay, and delays the display of frames accordingly. Naylor and Kleinrock have investigated an adaptive destination buffering scheme which may be used to eliminate jitter of streams such as voice [Nay82]. A discussion of their methods, called E (expand the playout time) and I (ignore late-arriving packets), is presented in the next section. The Pandora System [Jon93] uses clawback buffers to place packets as they arrive; these buffers are designed to remove the effects of jitter and are placed as close to the destination as possible. Various suggestions have been made as to how much should be buffered, which in turn accommodates the jitter. Stone and Jeffay have implemented the I and E policies of Naylor, along with their own Queue Monitoring policy, and evaluated the performance of these policies for the playout of audio packets [Sto95]. In the next section we discuss some jitter smoothing methods including the Queue Monitoring policy.

III. Jitter smoothing methods

We discuss some of the methods used for smoothing packet delay variability in the output by destination buffering. The basic idea behind each of them is to delay the display of the first packet by an amount D (called the destination wait time) so that, in most cases, more than one frame has arrived at the receiver before playout begins. This reduces the frequency and duration of gaps caused by the late arrival of packets, but results in increased latency. Selection of the value of D is the key to the success of any algorithm.

Naylor and Kleinrock have shown, through analytical work and simulation, that a destination wait time D can be chosen at the beginning of each talkspurt (in voice communications, a short stream of voice is followed by a silence period; this short stream is called a talkspurt). A reasonable choice is to set D equal to the amount by which the maximally delayed packet exceeds the delay of the first packet. Their approach observes the delays of the last m audio fragments; the k largest delays are discarded and the largest remaining delay is selected as the value of D. A mathematical model is also discussed and a rule of thumb is proposed for choosing m and k (m > 40 and k ≈ 0.07*m). Two playback schemes are also proposed by Naylor and Kleinrock: method E (time expanded in order to preserve information) and method I (late data ignored in order to preserve timing).
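As an illustration only, the following sketch shows one way the selection rule described above could be coded. The helper names, the fixed history size and the circular-buffer bookkeeping are our assumptions, not part of [Nay82].

    /*
     * Illustrative sketch (not from [Nay82]) of destination wait time
     * selection: remember the delays of the last m fragments, discard
     * the k largest, and use the largest remaining delay as D.
     */
    #include <stdlib.h>
    #include <string.h>

    #define M 50                  /* delays remembered; rule of thumb m > 40   */
    #define K ((int)(0.07 * M))   /* delays discarded; rule of thumb k = 0.07*m */

    static double recent[M];      /* end-to-end delays (ms) of recent fragments */
    static int    count, next;

    /* Record the measured end-to-end delay of the latest audio fragment. */
    void record_delay(double delay_ms)
    {
        recent[next] = delay_ms;
        next = (next + 1) % M;
        if (count < M)
            count++;
    }

    static int desc(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x < y) - (x > y);             /* sort in descending order */
    }

    /* Called at the start of a talkspurt to pick the destination wait time D. */
    double choose_destination_wait(void)
    {
        double sorted[M];

        if (count == 0)
            return 0.0;                       /* no history yet */
        memcpy(sorted, recent, count * sizeof(double));
        qsort(sorted, count, sizeof(double), desc);
        return sorted[count > K ? K : 0];     /* largest delay after dropping K */
    }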


A hand simulation is shown in Figure 1 to illustrate how buffering can improve the playout of frames. We assume that frames suffer variable delays in the network and are ready to be displayed at the arrival times shown in Figure 1(a). Frames may arrive out of sequence. The light horizontal bars on the left-hand side of each vertical line show the playout times (spaced 33 ms apart, corresponding to a video display rate of 30 frames/second). The thick bars on the right-hand side show the frames actually displayed at each playout time.

No Buffering Policy
In the first case (b) we do not buffer frames and start displaying frames as they arrive. Out of 8 frames only five could be displayed, and there are many gaps between the displayed frames. Frame number 2 arrived after 3 was displayed, so it was discarded. Similarly, when a burst of frames (5, 6, 7) arrives, only 7 gets displayed as the others are late.

E & I Policies
In the case of policies E and I we wait for an amount D (in our example 100 ms), which could be selected based on past observations as described above. For both policies (c) and (d) there is a gap in playout at the 349 ms playout time, as the delay suffered by the 5th packet is 226 ms, which is more than the delay suffered by the first packet (117 ms) plus D (100 ms). For policy I, frame 5 is not displayed because it arrives late (at 358 ms, whereas its playout time was 349 ms). In the case of policy E we extend the playout period for the 5th frame and display it at the next playout time. This results in the display of all 8 frames.
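The per-tick decision that separates the two policies in this example can be sketched as follows; the frame layout and the queue and display helpers are hypothetical stand-ins, not taken from any system described in this paper.

    /* Toy playout loop step contrasting methods I and E. */
    #include <stddef.h>

    struct frame { int seq; /* compressed data omitted */ };

    extern struct frame *queue_peek(void);   /* oldest buffered frame, or NULL */
    extern void queue_pop(void);
    extern void display(const struct frame *f);

    enum policy { POLICY_I, POLICY_E };

    /* Called once per display period (33 ms at 30 frames/sec).  'due' is the
     * sequence number scheduled for this tick; the return value is the
     * sequence number due at the next tick. */
    int playout_tick(enum policy p, int due)
    {
        struct frame *f = queue_peek();

        while (f != NULL && f->seq < due) {   /* discard frames that missed earlier ticks */
            queue_pop();
            f = queue_peek();
        }
        if (f != NULL && f->seq == due) {     /* the due frame arrived in time */
            display(f);
            queue_pop();
            return due + 1;
        }
        /* The due frame is missing: this tick is a gap either way. */
        if (p == POLICY_I)
            return due + 1;                   /* I: ignore the late frame, keep timing */
        return due;                           /* E: expand playout and wait for it */
    }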

Figure 1: Impact of policies on playout. (a) Arrival times; (b) No Buffering; (c) I-Policy; (d) E-Policy.

The problem with the E policy is that frames 5 to 8 are all displayed with an end-to-end display latency of 250 ms, in contrast to a display latency of 217 ms for frames 6 to 8 under policy I. A burst of frames caused an increase in the display latency which affects all subsequent frames. Thus we cannot buffer all late-arriving frames if we want to limit the display latency. For an interactive video-conferencing scenario, where there is a limit of 250 ms maximum on the end-to-end display latency [Cam93], we need to drop frames from the buffer.

Pandora Policy
The Pandora system has adopted a policy in which the destination wait time is adjusted dynamically by adjusting the queue length [Hop90]. An incoming frame is not added to the audio display queue if the queue length has been greater than a target value (4 ms of audio data) for a sufficiently long interval (8 seconds). This mechanism allows the delay for jitter correction to be reduced at the rate of 2 ms every 8 seconds. In simple terms, if there are consistently enough frames in the queue then the destination wait time for subsequent frames can be reduced.

Queue Monitoring Policy
The queue monitoring policy proposed by Stone and Jeffay [Sto95] is a variation of the policy used in the Pandora system. In this policy, a threshold value (called the “target value” in Pandora) is defined for each possible buffer length (size). At each display initiation time, when a frame is added to the buffer, the length of the queue is checked against the threshold for that queue length; if the queue has exceeded that length for a sufficiently long duration, the incoming audio frame is discarded. The assumption here is that large variations in end-to-end delay occur infrequently, whereas small variations occur more frequently. The threshold for long queue lengths therefore specifies a short duration, whereas for short queue lengths the duration is longer. Stone and Jeffay performed experiments to compare the I and E policies with queue monitoring. They collected traces of end-to-end delay for audio packets on a LAN and a campus-wide internetwork, and using a simulator they tested the effect of applying each policy on audio playout. Initially they used a fixed threshold value for each queue length greater than 2, and then a variable threshold value for each queue size. They found that the queue monitoring policy performed better than the I and E policies in terms of display latency and gap frequency.
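A minimal sketch of the queue-monitoring check, written from the description above rather than from the code of [Sto95], might look as follows. The threshold values are made up, and which frame to discard (the incoming one, as stated above, or the oldest) is a policy detail left to the caller.

    #include <string.h>

    #define MAX_QLEN 16

    /* thresholds[n]: consecutive display times the queue may stay longer than
     * n frames before a frame is discarded.  Long queues get short thresholds,
     * short queues get long ones.  Values here are illustrative only. */
    static const int thresholds[MAX_QLEN] =
        { 0, 0, 600, 300, 150, 75, 40, 20, 10, 8, 6, 5, 4, 3, 2, 1 };

    static int over[MAX_QLEN];   /* consecutive times the queue exceeded length n */

    /* Call once per display initiation time with the current queue length.
     * Returns 1 if one frame should be discarded to shorten playout delay. */
    int queue_monitor_should_discard(int qlen)
    {
        int n;

        for (n = 2; n < MAX_QLEN; n++) {
            if (n < qlen) {
                if (++over[n] > thresholds[n]) {
                    memset(over, 0, sizeof over);   /* reset after a discard */
                    return 1;
                }
            } else {
                over[n] = 0;     /* queue no longer exceeds n: restart its count */
            }
        }
        return 0;
    }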


IV. Problems with video playback

Most of the work reported so far is primarily concerned with voice traffic. In these methods, the start of a talkspurt is used to choose a new target destination wait time. We need to assess the suitability of these schemes for traffic such as video, or for continuous voice traffic such as music, which does not have talkspurts and silence periods. To the best of our knowledge, existing video-conferencing software either lacks jitter control and playback policies or implements only very simple mechanisms. For example, the Internet video-conferencing tool vic does not have any playback policy implemented [McCa95]. The Vosaic (video Mosaic) client adapts to the received video rate by discarding frames that have missed their deadline and buffering early frames [Che95].

Some problems are unique to video streams. First of all, the display initiation period is longer for video (33 ms at 30 frames/sec). This implies that the threshold values (target values) for dropping packets from the queue will differ from those for audio. The compression techniques used for video frames may also use inter-frame dependent coding. For example, MPEG compression [Gal91] uses I, P and B frames, and the I frames are needed to decode the P and B frames that depend on them. This means that frames cannot be dropped at random under such compression schemes (see the sketch below).
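To make the dependency constraint concrete, the following sketch classifies which frames a smoothing policy could discard safely. It reflects the general MPEG I/P/B structure discussed above, not a policy measured in this paper.

    /*
     * B frames are not used as references, so they are the only frames that
     * can be dropped in isolation.  Dropping a P frame also invalidates every
     * frame up to the next I frame, and dropping an I frame loses the whole
     * group of pictures, so a policy should prefer B frames and otherwise
     * shed a complete I-to-I run.
     */
    enum frame_type { FRAME_I, FRAME_P, FRAME_B };

    /* Can this single frame be discarded without corrupting later frames? */
    int droppable_in_isolation(enum frame_type t)
    {
        return t == FRAME_B;
    }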


There is also a need to verify how the policies for audio discussed in the previous section scale in a wide area network, where the variance in delay is much larger. Since video frames occupy a larger amount of buffer space, there has to be a limit on how much can be buffered: an uncompressed ten-minute video clip consumes some 22 Gbytes of storage and its voice-grade audio requires another 5 Mbytes; uncompressed NTSC composite video represents almost 28 Mbytes of pixel data per second, and HDTV needs a data rate about 20 times higher. For a non-real-time operating system such as Unix, the display initiation time also varies, contrary to our assumptions in the hand simulation and in the simulation performed by Stone. We also need to test these policies for a variety of applications. In the case of interactive teleconferencing, we need a bound on delay but some gaps are acceptable. For video applications it may be desirable to display the previous frame if a gap occurs, to retain continuity, whereas for audio, silence can be inserted. Playout and jitter control problems become more complicated in the case of synchronised audio and video. Charles Arthur argues that “It's good to talk: speech alone can be more effective than delayed video pictures” [Cha95]. The next section describes an experimental testbed set up for the purpose of testing these policies for video packets in a real environment.

V. Experimental Testbed

In order to perform an empirical study of various jitter and playout management policies we use a video-conferencing application running on Sun Sparc-20 workstations (60 MHz CPU). The workstations are connected by a LanBooster-2000 series Ethernet switch (Onet Data Communication Technology), which provides 10 Mbps switched links per node. The Sparc-20 workstations are equipped with SunVideo cards to provide connectivity to video cameras. Figure 2 shows the hardware configuration used in the experiments.

Figure 2 Hardware System Overview (video camera connected to a SunVideo card with video capture subsystem, compression engine, frame storage and S-Bus video interface inside a Sun Sparc-20 workstation; workstations connected via a LanBooster-2000 Ethernet switch)


The video-conferencing program is implemented using the X-Image library (Solaris) and Motif's event-driven programming model for the graphical user interface. Figure 3 shows an overview of the various software modules. Real-time video frames are captured and then compressed by the SunVideo card using the Joint Photographic Experts Group (JPEG) compression method (the SunVideo card supports other compression methods too). Frames are compressed using X-Image library calls. Each captured frame is packetised and a header is appended. The header contains the sequence number, grab time, frame size and send time. Frames are marshalled into an equivalent External Data Representation (XDR) format and then sent over the User Datagram Protocol (UDP) via the socket interface. The sending function is called every timeout period specified by the user (using Motif's timeout callback); for example, to send frames at 30 frames/sec the timeout is set to 33 ms. A delay module can introduce an artificial delay for each packet, which is useful for emulating WAN/internetwork delay behaviour for testing purposes. Delays are read from a file and packets are held in a transmission queue for that period of time; the packet scheduler removes each packet after its specified delay and transmits it on the network. Frames can also be dropped at random to simulate the loss of frames in the network. Solaris recommendations for writing real-time applications were followed to get optimum performance.

The receiver listens on the UDP socket (the socket is asynchronous), and frames are received asynchronously using a Motif callback. At display time a frame is decompressed, rescaled and dithered. Decompression is done in software on the host machine (the SunVideo card does not support decompression). If the jitter or playout policy requires frames to be queued before display, the frames are put into a display queue. Display of frames is done through a function which is called every timeout period, similar to the way frames are sent. A monitor collects traces from frames and writes them to a file; in order not to affect the real-time behaviour, the traces are written using memory-mapping techniques. The monitor also displays quality of service parameters, such as the end-to-end delay and end-to-end display latency of frames, through the GUI.


These traces can be analysed later through another GUI, which enables automatic calculation of statistics and plotting by clicking a few buttons. Appendix A shows pictures of the various GUIs developed for this test bed.
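As a rough illustration of the packetisation just described (a header with sequence number, grab time, size and send time, XDR-encoded in front of the JPEG data and sent as one UDP message), the sketch below uses the standard Sun RPC XDR routines. The header layout, buffer size and function name are our assumptions, not the paper's code.

    #include <rpc/xdr.h>
    #include <sys/socket.h>

    struct frame_hdr {
        u_int  seq;          /* frame sequence number                  */
        u_int  size;         /* length of the compressed frame (bytes) */
        u_long grab_ms;      /* time the frame was captured            */
        u_long send_ms;      /* time the frame was handed to UDP       */
    };

    static char msgbuf[64 * 1024];   /* JPEG frames average ~8000 bytes here */

    int send_frame(int sock, const struct sockaddr *dst, socklen_t dstlen,
                   struct frame_hdr *hdr, char *jpeg, u_int jpeg_len)
    {
        XDR   xdrs;
        u_int len;

        xdrmem_create(&xdrs, msgbuf, sizeof(msgbuf), XDR_ENCODE);
        if (!xdr_u_int(&xdrs, &hdr->seq)      || !xdr_u_int(&xdrs, &hdr->size) ||
            !xdr_u_long(&xdrs, &hdr->grab_ms) || !xdr_u_long(&xdrs, &hdr->send_ms) ||
            !xdr_bytes(&xdrs, &jpeg, &jpeg_len, sizeof(msgbuf)))
            return -1;                               /* encoding failed */

        len = xdr_getpos(&xdrs);                     /* bytes actually encoded */
        return sendto(sock, msgbuf, len, 0, dst, dstlen) == (ssize_t)len ? 0 : -1;
    }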

Figure 3 Software System Overview (sender: frame capture, compress/rescale, delay queue, send; receiver: receive, policy-controlled display queue, decompress/rescale/dither, display; monitor writing traces to a file, QoS parameter display and analyser; kernel device drivers over switched Ethernet)

VI. Empirical Results

JPEG was used as the compression method for video frames. JPEG frame sizes vary according to the content of the scene; the worst case can be 3 times larger than the best case. NTSC frames (640 x 480) were scaled down by a factor of 2. Measurements of the average size of JPEG frames in our video-conferencing application showed that compressed frames averaged about 8000 bytes. Timestamping was used to calculate the various timings. Since the LAN was running the Network Time Protocol (NTP) [Mil89], the clock drift was very small.
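Because the clocks are NTP-synchronised, end-to-end delay and jitter can be derived directly from the timestamps carried in the frame header. The small sketch below, with hypothetical field names, shows the arithmetic, following the jitter definition given in the introduction.

    struct delay_stats { long min_ms, max_ms; int samples; };

    /* Update running delay statistics when a frame arrives. */
    void record_frame_delay(struct delay_stats *s, long send_ms, long recv_ms)
    {
        long delay = recv_ms - send_ms;   /* valid because clocks are synchronised */

        if (s->samples == 0 || delay < s->min_ms) s->min_ms = delay;
        if (s->samples == 0 || delay > s->max_ms) s->max_ms = delay;
        s->samples++;
    }

    /* Maximum delay jitter as defined in the introduction: max minus min delay. */
    long delay_jitter_ms(const struct delay_stats *s)
    {
        return s->samples ? s->max_ms - s->min_ms : 0;
    }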

VI.1 Effect of message fragmentation

Since the underlying network is switched Ethernet, allowing a maximum packet size of around 1500 bytes, the frames are fragmented by the IP layer. There is a choice of either fragmenting at the application level before sending UDP messages or allowing the IP layer to do it. A packet filter called snoop (Solaris 2.3) was used to count the number of messages received on the host machine for a particular UDP port. It was found that fragmentation at the application level resulted in the loss of more frames (see Table 1). It was also observed that although more messages were received by the UDP layer itself, the receiving process could assemble fewer frames when packets were fragmented at the application level. There are two possible reasons: the loss of a single packet results in discarding the whole frame (up to 5 packets), and the UDP buffer was probably being over-run because the receiving process could not empty it fast enough. This issue was not investigated further because it was not the main focus of this work. We continued experimenting without fragmenting the frames at the application level.

Table 1: Effect of packetization (800 frames sent in each run)

Frame rate (sender) / timeout   Message size (bytes) at UDP layer   Messages received by UDP (snoop)   Frames received (receive process)
30 fps   / 33 ms                1350                                4697                               120
15 fps   / 66 ms                1350                                4800                               304
12.5 fps / 99 ms                1350                                4800                               545
30 fps   / 33 ms                8100                                 800                               444
15 fps   / 66 ms                8100                                 800                               636
12.5 fps / 99 ms                8100                                 800                               800

VI.2 Slow decompression and its effect on playout

Initially an attempt was made to send frames at 30 fps and display them at the same rate. However, it was found that the display of frames was much slower (Table 2). It is also worth noticing that the number of frames actually displayed was far less than the number of frames received at 30 fps and at 15 fps; this resulted in the wastage of 49% of received frames at 30 fps and 12.5% of received frames at 15 fps. Current video protocols perform rate control on the basis of frames received [Bol94]; if feedback about the rate of frame processing could also be sent to the sender, considerable bandwidth would be saved.

Table 2: Total frames received vs displayed

Frame rate (sender)   Total frames sent   Frames received by UDP layer   Frames displayed
30 fps                5000                3862 (77%)                     1953 (39%)
15 fps                5000                4994 (99.88%)                  4369 (87%)
12.5 fps              5000                4994 (99.88%)                  4992 (99.84%)

The main reason for the loss of frames is that decompression takes place on the host machine (i.e. software decompression), so its speed is a function of the CPU. A benchmark was carried out to measure the performance of the various modules of the video-conferencing software shown in Figure 3; Table 3 shows the results. As expected, the decompression cycle is the most time consuming. The average decompression cycle was around 82 ms, which indicated that for this test bed one could expect an average of about 12 fps to be displayed by the receiver. This is also supported by Table 2: if frames are sent at 12.5 fps, the receiver can display up to 99.84% of the frames received.

Table 3: Benchmarking of various modules

Sender                    Time (ms)     Receiver                        Time (ms)
compression + scaling     9.97          decompress + rescale + dither   79.09
marshalling (XDR)         0.9           demarshalling                   1.58
send                      2.092         receive                         1.044
Total                     12.96         Total                           81.71

The values in Table 3 are only approximations, as many of the X-Image library calls are executed asynchronously. The X-Image library also executes many calls together (so-called XIL molecules), which results in fewer memory copies and consequently better performance. In fact, the sender side of the video-conferencing application was initially written without following the molecule-related recommendations, which resulted in very poor performance (only a few frames per second); after following the molecule-related recommendations, a 30 fps rate was achieved at the sender side. Since the video-conferencing application was running as a normal application under Solaris (applications can be run in the real-time class, but this affects the performance of other processes adversely), we measured the inter-arrival rate of frames at the sending process. On average, frames suffered an additional delay of the order of a few milliseconds; for example, if a frame rate of 30 fps (i.e. an inter-arrival time of 33 ms) was specified, then frames were available to the sending process every 40 ms on average (Table 4). Internal event processing in SunOS 4.0 and 4.1 produces variable delays measured at 5 ms [Fis92].

Table 4: Inter-frame arrival time at the sending process

Frame rate         Measured time (ms)
30 fps (33 ms)     40
15 fps (66 ms)     70
12.5 fps (80 ms)   90
10 fps (99 ms)     100

VI.3 Impact of receiver load on playout

In a real video-conferencing application, the host will be running both sending and receiving processes. The impact of other concurrent processes on the display of frames was therefore also investigated. For this experiment, the same machine was running both the sender and the receiver process. All the frames received were put into a queue before being decompressed and displayed, to try to minimise the loss due to transient overloads (as Table 2 indicates, many received frames could not be displayed). Figure 4 shows that the inter-frame display time is much higher while the sender is still sending frames.


By the time the receiver has displayed frame number 1500, the sender had already finished sending all 5000 frames. In the absence of an active sending process, the CPU is able to devote more time to receiving and displaying frames; as a result, the inter-frame display time drops. This is also evident from Figure 5, where the number of frames waiting in the display queue rises until frame number 1500 but starts dropping gradually after that.

Figure 4: Inter-frame display time distribution at 30 fps (qqa33.dat: mean 126.26 ms, SD 29.02, max 518.56, min 82.65)

Figure 5: Display queue size at 30 fps (qqa33.dat: mean 1180.70 frames, SD 662.81, max 2309, min 1)

Figure 6: Inter-frame display time distribution at 15 fps (qqa66.dat: mean 126.76 ms, SD 38.66, max 1361.15, min 79.05)

Figure 7: Display queue size at 15 fps (qqa66.dat: mean 1004.25 frames, SD 638.45, max 2172, min 1)

Figures 6 and 7 confirm the same behaviour when the frame rate at the sender is changed from 30 fps to 15 fps. Similar behaviour was observed by Cho et al. [Cho95]. Fall and others conducted a more formal study of the impact of process load on video playback and came to similar conclusions [Fal95]. It is evident from these results that flow control mechanisms need to take into consideration the current load of the receiving host: the receiver should provide additional feedback to the sender at regular intervals about its load, which will help the sender regulate its data rate.
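The kind of feedback loop suggested here could look like the following sketch. The report fields and the adjustment rule are illustrative assumptions, not something implemented in the test bed.

    #include <stdint.h>

    struct load_report {              /* sent by the receiver every few seconds */
        uint32_t frames_received;     /* frames delivered by UDP in the interval */
        uint32_t frames_displayed;    /* frames actually decoded and shown       */
        uint32_t display_queue_len;   /* frames still waiting for decompression  */
    };

    /* Sender side: stretch the inter-frame timeout when the receiver cannot
     * keep up, shrink it again (down to the nominal rate) when it can. */
    unsigned adjust_timeout_ms(unsigned current_ms, unsigned nominal_ms,
                               const struct load_report *r)
    {
        double shown;

        if (r->frames_received == 0)
            return current_ms;
        shown = (double)r->frames_displayed / (double)r->frames_received;
        if (shown < 0.90 || r->display_queue_len > 8)
            return current_ms * 2;            /* receiver overloaded: halve the rate */
        if (shown > 0.98 && current_ms > nominal_ms)
            return current_ms / 2 > nominal_ms ? current_ms / 2 : nominal_ms;
        return current_ms;                    /* keep the current rate */
    }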


Since all the video frames were buffered before being displayed, it was important to look at the impact of buffering on the display latency. Figures 8 and 9 show the end-to-end display latency corresponding to 30 fps and 15 fps.

Figure 8: End-to-end display latency at 30 fps (summary statistics in Table 5)

Figure 9: End-to-end display latency at 15 fps (summary statistics in Table 5)

End-to-end display latency is the time difference between the acquisition of a frame and the display of that frame.

Table 5: End-to-end display latency (all received frames queued)

Frame rate   Mean (ms)   SD (ms)   Maximum (ms)   Minimum (ms)
30 fps       152866      76067     262108         253
15 fps       133409      83531     265815         142

As we can see from the graphs and Table 5, the display latency has a mean of the order of a few minutes: around 2.5 minutes for 30 fps and about 2.2 minutes for 15 fps. For an interactive application such as teleconferencing, the acceptable end-to-end display latency is 250 ms. If we had infinite storage and no constraint on timing, we could display the frames at a suitable later time, but for interactive applications the conversation becomes unintelligible if the end-to-end display latency becomes very high. In order to reduce the end-to-end display latency we can choose one of the following approaches:

a) drop frames from the display queue as the display latency increases above a certain threshold value (as discussed in the previous section; a sketch follows this list);
b) acquire faster software/hardware for decompression;
c) reduce the frame rate at the sender so that the receiver can keep pace.
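A minimal sketch of approach (a) is shown below: frames are dropped from the head of the display queue once their end-to-end display latency would exceed a threshold (250 ms is the interactive bound cited above). The frame layout, queue helpers and clock function are assumptions, not the test bed's code.

    struct frame { long grab_ms; /* capture timestamp; payload omitted */ };

    extern struct frame *queue_head(void);   /* oldest queued frame, or NULL */
    extern void queue_pop(void);
    extern long now_ms(void);                /* NTP-disciplined wall clock, ms */

    #define MAX_DISPLAY_LATENCY_MS 250L

    /* Run just before each display initiation time. */
    void prune_stale_frames(void)
    {
        struct frame *f;

        while ((f = queue_head()) != NULL &&
               now_ms() - f->grab_ms > MAX_DISPLAY_LATENCY_MS)
            queue_pop();                      /* too old to be worth displaying */
    }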

VI.4 Buffer size and end-to-end display latency

In order to determine a limit on the queue size, some experiments were carried out to measure the impact of increasing the queue size on the inter-frame display time and the display latency. Table 6 shows that the inter-frame display time does not change noticeably; as observed earlier, the decompression cycle takes on average 82 ms. Table 7 shows that the end-to-end display latency increases as we increase the number of frames in the buffer. This is an important quality of service parameter and would play a decisive role in the selection of a particular jitter and playout management policy, since the users of video-conferencing will dictate what quality of service they need. A GUI was developed to facilitate on-line monitoring of the end-to-end delay suffered by packets and the end-to-end display latency, along with the frame rate (refer to Appendix A for a screen snapshot of the monitoring widget). By monitoring these parameters, the user can select a particular jitter control and playout policy.

Table 6: Inter-frame display time

Queue size   Mean (ms) @30fps   Mean (ms) @15fps
2            126                106
3            113                105
4            121                116
5            122                116

Table 7: End-to-end display latency

Queue size   Mean (ms) @30fps   Mean (ms) @15fps
2            320                237
3            510                443
4            545                502
5            676                571

VII. Conclusion and Future Work

The experiments showed that application-layer fragmentation of video frames was not desirable. The decompression cycle was found to be the most time-consuming part, and it restricted the maximum display rate to about 12.5 frames/sec. Many frames received by the receive side could not be displayed because the receiver did not have sufficient CPU power to keep up with decoding them. Under additional receiver load, even fewer frames were displayed. The receive side needs to provide feedback to the sender about how many received frames could not be displayed, so that the sender can adapt its sending rate accordingly. The existing jitter smoothing and playback policies discussed in Section 3 were mainly developed for audio, and there is a need to evaluate their suitability for video. Future work will include testing these policies, and modifying them where necessary, using the test bed described earlier. We would also like to experiment with synchronised audio/video and multicast communications.

VIII. Acknowledgment

Experiments described in this report were carried out at the Distributed Computing and Communications (DCC) Laboratory at Columbia University, New York, while Sanjay Jha was on leave from the University of Technology, Sydney.

IX. References

[And90] Anderson D., Herrtwich R. and Schaefer C., "SRP: A Resource Reservation Protocol for Guaranteed-Performance Communication in the Internet", The International Computer Science Institute, Berkeley, CA, 1991.
[Azu94] Azuma R., Bishop G., "Improving Static and Dynamic Registration in an Optical See-through HMD", Proceedings of SIGGRAPH '94, Orlando, FL, pp. 197-204, July 24-29, 1994.
[Ban91] Banerjea A., Mah B., "The Real-Time Channel Administration Protocol", 2nd International Workshop on NOSSDAV, Heidelberg, Germany, November 1991.
[Bol94] Bolot J. C., Turletti T., "A Rate Control Mechanism for Packet Video in the Internet", IEEE INFOCOM '94, June 1994.
[Cam93] Campbell A., Coulson G., Garcia F., Hutchison D., Leopold H., "Integrated Quality of Service for Multimedia Communications", IEEE INFOCOM '93, San Francisco, March 1993.
[Cam94] Campbell A., Coulson G. and Hutchison D., "A Quality of Service Architecture", ACM Computer Communications Review, April 1994.
[Cha95] Charles A., "The difficult art of videophone conversation", New Scientist, p. 20, 14 January 1995.
[Che95] Chen Z. et al., "Real Time Video and Audio in the World Wide Web", 4th International World Wide Web Conference, Boston, Massachusetts, December 11-14, 1995.
[Cho95] Cho H., Fry M., Seneviratne A., Witana V., "Towards a Hybrid Scheme for Application Adaptivity", Proc. 2nd International COST 237 Workshop on Multimedia Transport and Teleservices, Copenhagen, November 1995.
[DeP93] De Prycker M., "Asynchronous Transfer Mode", Ellis Horwood, 1993.
[Dup92] Dupuy S., Tawbi W., Horlait E., "Protocols for High-Speed Multimedia Communication Networks", Computer Communications, Vol. 15, No. 6, pp. 349-358, July/August 1992.
[Fal95] Fall K., Pasquale J., McCanne S., "Workstation Video Playback Performance with Competitive Process Load", NOSSDAV '95, Boston, USA, 1995.
[Fer90] Ferrari D., "Client Requirements for Real-Time Communication Services", IEEE Communications Magazine, pp. 65-72, November 1990.
[Fis92] Fisher T., "Real-Time Scheduling Support in Ultrix 4.2 for Multimedia Communication", 3rd International Workshop on NOSSDAV, San Diego, CA, November 1992.
[Gov91] Govindan R., Anderson D. P., "Scheduling and IPC Mechanisms for Continuous Media", Proc. ACM Symposium on Operating Systems Principles, ACM Operating Systems Review, Vol. 25, No. 5, pp. 68-80, 1991.
[Gal91] Le Gall D., "MPEG: A Video Compression Standard for Multimedia Applications", Communications of the ACM, 34(4):46-58, April 1991.
[Gru85] Gruber J. G., Strawczynski L., "Subjective Effects of Variable Delay and Speech Clipping in Dynamically Managed Voice Systems", IEEE Transactions on Communications, Vol. COM-33, pp. 801-808, August 1985.
[Heh91] Hehmann D., Herrtwich R. G., Schulz W., Schütt T., Steinmetz R., "Implementing HeiTS: Architecture and Implementation Strategy of the Heidelberg High-Speed Transport System", 2nd International Workshop on NOSSDAV, Heidelberg, Germany, November 1991.
[Her92] Herrtwich R. G., Delgrossi L., "Beyond ST-II: Fulfilling the Requirements of Multimedia Communication", 3rd International Workshop on NOSSDAV, San Diego, CA, November 1992.
[Her91] Herrtwich R. G., Nagarajan R., Vogt C., "Guaranteed Performance Multimedia Communication Using ST-II over Token Ring", Technical Report, IBM European Networking Center, 1991.
[Hop90] Hopper A., "Pandora: An Experimental System for Multimedia Applications", ACM Operating Systems Review, Vol. 24, No. 2, pp. 19-34, April 1990.
[Int93] Intel ProShare Personal Conferencing Video System 200, Intel Corporation, 1993.
[Jon93] Jones A., Hopper A., "Handling Audio and Video Streams in a Distributed Environment", Proc. ACM Symposium on Operating Systems Principles, Asheville, NC, Operating Systems Review, Vol. 27, No. 5, pp. 231-243, December 1993.
[Kes91] "Inside FDDI-II", LAN Magazine, pp. 117-125, March 1991.
[Bou91] Le Boudec J.-Y., "The Asynchronous Transfer Mode: A Tutorial", IBM Research Report RZ2133, May 1991.
[Mau92] Mauthe A., Schulz W., Steinmetz R., "Inside the Heidelberg Multimedia Operating System Support: Real-Time Processing of Continuous Media in OS/2", IBM ENC Technical Report No. 43.9214, September 1992.
[McCa95] McCanne S., Jacobson V., "vic: A Flexible Framework for Packet Video", Proc. 3rd ACM International Multimedia Conference (Multimedia '95), San Francisco, CA, November 5-9, 1995.
[Mer94] Mercer C., Savage S., Tokuda H., "Processor Capacity Reserves: Operating System Support for Multimedia Applications", International Conference on Multimedia Computing and Systems, Boston, MA, pp. 90-99, May 14-19, 1994.
[Mil89] Mills D., "Measured Performance of the Network Time Protocol in the Internet System", RFC 1128, UDEL, October 1989.
[Mon83] Montgomery W. A., "Techniques for Packet Voice Synchronization", IEEE Journal on Selected Areas in Communications, Vol. SAC-1, No. 6, pp. 1022-1028, December 1983.
[Nay82] Naylor W. E., Kleinrock L., "Stream Traffic Communication in Packet Switched Networks: Destination Buffering Considerations", IEEE Transactions on Communications, Vol. COM-30, No. 12, 2534, 1982.
[Nei93] Nieh J., Hanko J., Northcutt D., "SVR4 UNIX Scheduler Unacceptable for Multimedia Applications", Proceedings of the Fourth International Workshop on NOSSDAV, Lancaster, U.K., December 1993.
[Pat91] Partridge C., "Isochronous Applications Do Not Require Jitter-Controlled Networks", RFC 1257, 1991.
[Sch92] Schulzrinne H., "Voice Communication Across the Internet: A Network Voice Terminal", Technical Report, University of Massachusetts, 1992.
[Sch93] Schulzrinne H., Casner S., "RTP: A Transport Protocol for Real-Time Applications", Internet Engineering Task Force, Internet Draft, October 1993.
[Sto95] Stone D. L., Jeffay K., "An Empirical Study of Delay Jitter Management Policies", Multimedia Systems, Vol. 2, No. 6, pp. 267-279, January 1995.
[Ter88] Terry D. B., Swinehart D. C., "Managing Stored Voice in the Etherphone System", ACM Transactions on Computer Systems, Vol. 6, No. 1, pp. 3-27, February 1988.
[Tok93] Tokuda H., Kitayama T., "Dynamic QoS Control Based on Real-Time Threads", Proceedings of the Fourth International Workshop on NOSSDAV, Lancaster, U.K., December 1993.
[Top90] Topolcic C., "Experimental Internet Stream Protocol: Version 2 (ST-II)", RFC 1190, October 1990.
[Tur92] Turner C. J., Peterson L. L., "Image Transfer: An End-to-End Design", Computer Communication Review, Vol. 22, No. 4, pp. 258-268, October 1992.
[Ver91] Verma D., Zhang H., Ferrari D., "Guaranteeing Delay Jitter Bounds in a Packet Switching Network", Proceedings of Tricomm '91, Chapel Hill, North Carolina, pp. 35-46, April 1991.
[Zha93] Zhang L. et al., "RSVP: A New Resource Reservation Protocol", IEEE Network Magazine, September 1993.