Adaptive Anti-jitter Mechanism for Multi-Party Conferencing in an H.323 Multi-Point Control Unit

Miroslaw Narbutt and Liam Murphy
Department of Computer Science, University College Dublin, Belfield, Dublin 4

Abstract

In this paper we propose a mechanism that can support multi-party conferencing. The main objective of this mechanism is to minimize the effect of delay jitter. We have tested this mechanism over the Internet using H.323 terminals and an H.323 Multipoint Control Unit. Our results show that the playout algorithms traditionally designed to work in the receiving endpoints can also be implemented successfully in the Multipoint Control Unit. As a result, one can lower the rate of packets lost due to late arrival.

Introduction

A typical VoIP terminal buffers incoming packets and delays their playout in order to compensate for variable network delays. This allows slower packets to arrive in time to be played out in the correct order. Clearly, the longer the buffering delay, the more packets will arrive before their scheduled playout time, and the better the jitter compensation will be. The size of the jitter buffer may be kept fixed, or adjusted adaptively during the transmission. A number of algorithms have been proposed in the technical literature to control the size of the jitter buffer [1], [2], [3], [4], [5]. All of these algorithms have been implemented at the receiver, supporting point-to-point connections only. In contrast, we propose an anti-jitter buffer mechanism that can be implemented in an H.323 Multipoint Control Unit to support centralised multi-party conferences.
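As a simple illustration of this buffering rule, the sketch below computes a packet's scheduled playout time under a fixed buffering delay and flags packets that arrive too late. The function names and the millisecond time base are illustrative assumptions, not part of any implementation described in this paper.

```python
def playout_time(generation_time_ms: float, buffering_delay_ms: float) -> float:
    """Scheduled playout instant: generation time plus the buffering delay."""
    return generation_time_ms + buffering_delay_ms

def arrives_too_late(arrival_time_ms: float, generation_time_ms: float,
                     buffering_delay_ms: float) -> bool:
    """A packet is discarded if it arrives after its scheduled playout time."""
    return arrival_time_ms > playout_time(generation_time_ms, buffering_delay_ms)
```

A larger buffering delay makes this test fail less often, at the cost of higher end-to-end latency; adaptive schemes, discussed later, tune the delay at run time.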

Multi-point conferences: centralized and decentralized model

Multi-point conferences are defined as calls between three or more parties. ITU-T recommendation H.323 defines a Multi-Point Control Unit (MCU) to support multi-point conferencing [6]. According to the H.323 recommendation, an MCU contains two logical parts:

• Multipoint controller (MC), which handles the signalling and control messages necessary to set up and manage conferences.

• Multipoint processor (MP), which accepts streams from endpoints, replicates them, and forwards them to the participating endpoints.

An MCU can implement both MC and MP functions, in which case it is referred to as a centralized MCU. Alternatively, a decentralized MCU handles only the MC functions, leaving the multipoint processing function to the endpoints. The two basic models of multipoint conferences, centralized and decentralized, differ in the way they handle real-time media streams (audio and video). In the centralized model, all of the audio and video is transmitted to the central MCU, which mixes the multiple audio streams, selects one video stream, and retransmits the result to all of the participants. In the decentralized model, each participant is responsible for its own audio mixing and video selection; the media may be sent between all entities using either multicast, or multiple unicasts if the underlying network does not support multicast. In the centralized model, each endpoint only has to encode its locally produced media streams and decode the set sent by the MCU. Terminals do not need to be modified and do not have to perform media mixing or transcoding; there is standard point-to-point communication between each terminal and the MCU.
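To make the contrast concrete, the following back-of-the-envelope sketch (our own illustration, not from the recommendation) counts the media streams each endpoint handles in the two models when multicast is unavailable:

```python
def streams_per_endpoint(n_participants: int):
    """Media streams sent/received per endpoint, assuming unicast transport."""
    # Decentralized: each endpoint unicasts its stream to every other
    # endpoint and receives one stream from each of them.
    decentralized = {"sent": n_participants - 1, "received": n_participants - 1}
    # Centralized: each endpoint exchanges exactly one stream with the MCU;
    # the MCU itself terminates n streams in and sends n mixed streams out.
    centralized = {"sent": 1, "received": 1}
    return decentralized, centralized

print(streams_per_endpoint(3))  # the three-party case studied in this paper
```

The per-endpoint load stays constant in the centralized model as the conference grows, which is what allows unmodified terminals to participate.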

[Figure 1 omitted: in the decentralized model, endpoints A, B, and C exchange their streams directly under the control of the MC; in the centralized model, each endpoint sends its stream to the MCU and receives back the mix of the other participants' streams (B+C to A, A+C to B, A+B to C).]
Figure 1. Media exchange in the decentralized (left) and centralized (right) models of a multipoint conference.

On the other hand, the decentralized multipoint model does not require the presence of an MCU, a potentially expensive and limited resource. It only requires that one of the participating entities contain an MC. The MC must stay active for the duration of the call: if the endpoint that hosts the MC leaves the conference, the conference is terminated.

Media processing in centralized MCU

In the centralized model (star call topology), an MCU receives media streams from all participants, mixes them, and redistributes the appropriate media stream back to the participants. Because mixing can only be performed on uncompressed data, the mixing algorithm follows a decode-mix-encode sequence. When an audio packet arrives at the mixing module, it is decoded into an audio frame, labelled with its RTP timestamp, and appended to the participant's audio buffer queue. Every packetisation interval, a timer triggers a routine that mixes audio samples from the appropriate input buffers into a combined audio frame. This frame is then encoded, packetised, and sent back to the participants.

[Figure 2 omitted: streams from A, B, and C each pass through adaptive anti-jitter buffering, decoding, and stream synchronization; they are then mixed, and the resulting stream (e.g. A+B) is encoded and sent to the appropriate participant (e.g. C).]
Fig. 2. Media processing in the MCU.

Anti-jitter buffering and stream synchronization take place before mixing. The adaptive delay algorithms work independently on each of the input buffers. The timer intervals that trigger the mixing routines are shortened or lengthened depending on the jitter delay estimation; this results in stretching or shrinking the silence periods between talkspurts.
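The sketch below illustrates one such timer-triggered mixing routine. The queue layout and the encode and send callbacks are assumptions made for illustration; they are not the OpenH323 interfaces used in our implementation.

```python
# Illustrative decode-mix-encode cycle, invoked once per packetization interval.
PACKETIZATION_INTERVAL_MS = 20  # one GSM frame per packet, as in our tests

def mix_tick(decoded_queues, encode, send):
    """decoded_queues: dict mapping participant id -> list of decoded PCM
    frames (each a list of samples), already jitter-buffered and synchronized."""
    # Take the frame scheduled for this interval from each participant.
    frames = {p: q.pop(0) for p, q in decoded_queues.items() if q}
    for recipient in decoded_queues:
        # Mix every stream except the recipient's own by summing samples.
        others = [f for p, f in frames.items() if p != recipient]
        if not others:
            continue
        mixed = [sum(s) for s in zip(*others)]  # a real mixer also clips/scales
        send(recipient, encode(mixed))
```

Each tick consumes at most one frame per input buffer, so the adaptive buffering described below can stretch or shrink silence periods simply by adjusting when the tick fires.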

Synchronization: Audio and video packets are sent across the Internet using the best-effort UDP transport protocol, supported by the application-layer RTP protocol. Each RTP header contains a timestamp, a sequence number, a marker bit, and a source identifier (SSRC) that distinguishes the different streams. All of these fields are useful during the synchronization process: the sequence number is needed to detect packet losses, the timestamp is needed for inter-stream and intra-stream synchronization, and the marker bit indicates the beginning of a talkspurt.
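For reference, the fixed part of the RTP header can be parsed as follows. The field layout is standard RTP (RFC 1889); the function itself is only an illustrative sketch.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Extract the fields used for synchronization from the 12-byte RTP header."""
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": (b1 >> 7) & 1,     # set on the first packet of a talkspurt
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,      # detects losses and reordering
        "timestamp": timestamp,      # intra- and inter-stream synchronization
        "ssrc": ssrc,                # identifies the source stream
    }
```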

Adaptive anti-jitter buffering

Adaptive anti-jitter buffering dynamically adjusts the size of the receiving buffer in order to minimize packet loss due to late arrival. A good anti-jitter buffering scheme should be able to trade off playout delay against packet loss rate. When the packet loss ratio exceeds 5% and the one-way end-to-end delay exceeds 400 ms, holding an IP-based telephone conversation becomes difficult. ITU-T recommendation G.114 specifies a round-trip delay of 300 ms for high-quality voice traffic, which corresponds to a one-way delay of 150 ms [7]. An effective way to choose the buffering delay is to adapt it to the delay characteristics of the network. Since the current delay characteristics are not known a priori, adaptive algorithms calculate the playout time of each incoming talkspurt based on the delays experienced by already-received packets. The first algorithm for adaptive jitter compensation was proposed by Ramjee et al. in 1994 [1]. This algorithm estimates, for each incoming packet, the average network delay and a variation measure of this delay.

Algorithm:



Let $\hat{p}_i$ be the estimate of the playout delay of the $i$-th packet, and let $\hat{d}_i$ and $\hat{v}_i$ be the estimates of the packet delay and its variation during the flow, respectively. At the beginning of a new talkspurt, the playout delay is computed as follows:

$$\hat{p}_i = \hat{d}_i + B \cdot \hat{v}_i$$

Any subsequent packets of that talkspurt are played out at a rate equal to the generation rate at the sender.

The estimates $\hat{d}_i$ and $\hat{v}_i$ are running estimates of the packet delay and its variation, and are computed as follows:

$$\hat{d}_i = A \cdot \hat{d}_{i-1} + (1-A) \cdot n_i$$

$$\hat{v}_i = A \cdot \hat{v}_{i-1} + (1-A) \cdot |\hat{d}_i - n_i|$$

They are initialized when the first packet of the flow is received and updated each time a new packet arrives, according to the total delay $n_i$ of that packet introduced by the network. The constant $A$ is a fixed weighting factor that characterizes the memory properties of this estimation; $A$ is usually chosen to be 0.99802 to limit sensitivity to short-term packet jitter. $B$ is a variation coefficient that controls the delay/packet-loss trade-off (the larger the coefficient, the more packets are played out, at the expense of longer delays). In our implementation we have chosen $A = 0.99$ and $B = 3$.
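A minimal sketch of this estimator, using the parameter choices $A = 0.99$ and $B = 3$ stated above; the class and method names are our own, and $n_i$ is assumed to be measured in milliseconds.

```python
class AdaptivePlayout:
    """Running estimator in the style of Ramjee et al. [1]."""

    def __init__(self, A: float = 0.99, B: float = 3.0):
        self.A, self.B = A, B
        self.d = None  # running delay estimate d_i
        self.v = 0.0   # running delay variation estimate v_i

    def update(self, n_i: float) -> None:
        """Update the running estimates on every packet arrival."""
        if self.d is None:            # initialize with the first packet's delay
            self.d = n_i
        else:
            self.d = self.A * self.d + (1 - self.A) * n_i
            self.v = self.A * self.v + (1 - self.A) * abs(self.d - n_i)

    def playout_delay(self) -> float:
        """Playout delay applied at the start of a new talkspurt: p = d + B*v."""
        return self.d + self.B * self.v
```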

Examining the anti-jitter mechanism

To examine the performance of the anti-jitter mechanism we built an MCU based on the OpenH323 source code [8]. This entity is H.323-compliant and can interoperate with other H.323 software that uses the G.711 A-law, G.711 u-law, GSM audio, and H.261 video compression schemes. For sending audio streams we used three H.323 terminals situated at Poznan University of Technology, Poland. The MCU was situated in the Performance Engineering Laboratory, Dublin City University, Ireland. The distance between the terminals and the MCU was 18 hops, and all of the interconnecting links had bandwidths ranging from 34 to 155 megabits per second. We made our measurements in the afternoon, between 3 pm and 4 pm.

[Figure 3 omitted: three H.323 terminals (A, B, C; GSM audio) at the Institute of Telecommunications, Poznan University of Technology, Poland, connected across the Internet (delay: 197-453 ms) to the H.323 MCU at the Performance Engineering Laboratory, School of Electronic Engineering, Dublin City University, Ireland.]

Fig. 3. The routes taken by the packets sent over the studied connection.

For the tests we chose one audio source that was common to all three sending terminals. We used a GSM encoder producing one frame of audio (33 bytes) every 20 ms, and each terminal was set to one audio frame per packet. During 30 minutes of transmission our MCU received almost 30000 audio packets.

stream | encoding scheme | session duration | packets received | max network delay | min network delay | avg network delay | standard deviation
A | GSM | 1839.3 s | 13111 | 425 ms | 197 ms | 237 ms | 34 ms
B | GSM | 1839.7 s | 13659 | 413 ms | 197 ms | 233 ms | 32 ms
C | GSM | 1839.6 s | 2663 | 453 ms | 294 ms | 381 ms | 32 ms

At the receiving MCU we collected the experimental data (arrival times, timestamps, sequence numbers, and marker bits) of all received packets of all three streams.

In order to examine our solution, we wrote a simulator that processes this data and simulates the behaviour of the mechanism. The simulator ran the playout algorithm on the three data sets representing the three received streams.

The figures below show the calculated playout delays (darker line) and the network delays (dots) of the received packets. Packets whose delays lie above the darker line are lost; all the others are successfully played out.
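A hypothetical trace replay in the spirit of this simulator, reusing the AdaptivePlayout sketch above: a packet counts as lost if its network delay exceeds the playout delay in force for its talkspurt. The trace format here is an assumption, not the format of our logged data.

```python
def replay(trace, estimator) -> float:
    """trace: iterable of (network_delay_ms, starts_talkspurt) per packet.
    Returns the fraction of packets lost due to late arrival."""
    lost = total = 0
    playout = None
    for delay, starts_talkspurt in trace:
        estimator.update(delay)
        if starts_talkspurt or playout is None:
            playout = estimator.playout_delay()  # adjusted between talkspurts only
        if delay > playout:
            lost += 1  # arrived after its scheduled playout time
        total += 1
    return lost / total if total else 0.0
```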

All three media streams received by the MCU were generated from the same audio source. The MCU should synchronize them before mixing and send the mixed media back to the participants. The idea behind the synchronization process is that one of the streams is chosen as a master stream, and all the others are synchronized according to its estimated playout delays. The MCU should mix streams A and B to create a new stream A+B to be sent to participant C; in a similar way the MCU creates the new streams to be sent to participants A and B. Below we show possible solutions for choosing common buffering delays for streams A+B, B+C, C+A, and A+B+C based on the already-calculated buffering delays.
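As one such possible solution (an assumption on our part, not a definitive rule from this paper), the common buffering delay for a mixed stream can be taken as the maximum of the contributing streams' individually estimated playout delays, so that the slowest contributing stream is not cut off. The per-stream values below are illustrative.

```python
def common_buffering_delay(per_stream_delay_ms: dict) -> float:
    """Buffer every contributing stream to the largest individual estimate."""
    return max(per_stream_delay_ms.values())

# Example with hypothetical per-stream playout delay estimates (ms):
d = {"A": 240.0, "B": 250.0, "C": 390.0}
for combo in (("A", "B"), ("B", "C"), ("C", "A"), ("A", "B", "C")):
    print("+".join(combo), common_buffering_delay({k: d[k] for k in combo}))
```

This rule trades extra delay on the faster streams for fewer late-arrival losses on the slowest one; other combining rules would shift that balance.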

Conclusions and Future Work

In this paper we proposed an anti-jitter mechanism that can be implemented in a Multi-point Control Unit. The main objective of this mechanism is to keep the rate of packets lost due to late arrival as low as possible. We tested our solution over the Internet with an H.323 Multipoint Control Unit. Our results show that using adaptive playout algorithms in the MCU is feasible and can support media stream synchronisation. In future work we plan to implement other adaptive playout algorithms in the MCU. We will also focus on the synchronization process between media streams at the receiving MCU.

References:

1. Ramachandran Ramjee, Jim Kurose, Don Towsley, and Henning Schulzrinne, "Adaptive playout mechanisms for packetized audio applications in wide-area networks", in Proceedings of the Conference on Computer Communications (IEEE INFOCOM), Toronto, Canada, June 1994.

2. Sue B. Moon, Jim Kurose, and Don Towsley, "Packet Audio Playout Delay Adjustment: Performance Bounds and Algorithms", ACM Multimedia Systems, 1998.

3. J.-C. Bolot and A. Vega-Garcia, "Control Mechanisms for Packet Audio in the Internet", in Proceedings of IEEE INFOCOM, April 1996.

4. M. Narbutt and L. Murphy, "Adaptive Playout Buffering for Audio/Video Transmission over the Internet", in Proceedings of the 17th IEE UK Teletraffic Symposium, Dublin, Ireland, May 16-18, 2001, pp. 27/1-27/6.

5. M. Narbutt and L. Murphy, "Adaptive Playout Buffering for H.323 Voice over IP Applications", in Proceedings of the Irish Signals and Systems Conference 2001, Maynooth, Ireland, June 25-27, 2001, pp. 201-206.

6. ITU-T Recommendation H.323, "Packet-Based Multimedia Communications Systems", September 1999.

7. ITU-T Recommendation G.114, "One-Way Transmission Time", March 1993.

8. OpenH323 source code, available from www.openh323.org.