Multimedia Group Synchronization Algorithm Based on RTP/RTCP

Fernando Boronat Seguí (IEEE Member), Juan Carlos Guerri Cebollada and Jaime Lloret Mauri
Departamento de Comunicaciones, Universidad Politécnica de Valencia
Camino de Vera s/n, CP. 46071, Valencia (SPAIN)
{fboronat,jcguerri,jlloret}@dcom.upv.es

Abstract

Most current multimedia tools use RTP/RTCP for inter-stream synchronization, but not for group synchronization. In this paper we describe and evaluate a proposal to modify RTCP packets in order to provide a sender-based method for the synchronization of a group of receivers.

1. Introduction

Nowadays there are many distributed multimedia applications for PCs, mobile devices (laptops, PDAs, etc.) and other platforms. Most of these tools include intra-stream synchronization (for example, the temporal relationship between the frames of the same video sequence) and inter-stream synchronization (for example, the playback of a remote user's audible words together with the associated lip movements). But the applications mentioned above also require another kind of multimedia synchronization: group synchronization. It refers to the synchronization of the playout of all the streams, both locally at each receiver (intra- and inter-stream) and simultaneously at all the receivers (group). There are recent proposals of intra- and inter-stream synchronization algorithms for different scenarios [1], but there are very few proposals for group synchronization, such as [2], [3], [4] and [5]. All of them are receiver-based and, except for the one in [4], which uses RTP [6], they do not use a standard protocol and define specific control messages to be exchanged between sources and receivers. In this paper, a new sender-based method to achieve group synchronization is proposed, based on the use of RTP/RTCP and NTP (RFC 958, already included in most current operating systems). It minimises the control traffic and incorporates the most common techniques used by inter-stream synchronization algorithms.

2. Group synchronization algorithm

Our algorithm is valid in scenarios with one or more sources transmitting media streams via multicast to one or several receivers, over deterministic networks or networks with minimal QoS guarantees (at least, a bound on the network delay experienced by the Media Units (MU) must be known), and with all devices sharing a global time reference. We tackle the synchronization problem by dividing it into two phases (Figure 1): a first phase that gets all the receivers to start the playout process and to play one media stream (considered the master stream) in a synchronised way (group synchronization), and a second phase that gets all the streams played at each receiver locally synchronised (local inter-stream synchronization).

Figure 1. Inter-stream and group synchronization: the Synchroniser Source and the other sources multicast their media streams (audio and video) to the master (reference) receiver and to receivers 2 to N; group synchronization control packets align the master stream playout across all receivers, and inter-stream synchronization then aligns the remaining streams locally at each receiver over time.
The algorithm is sender-based; that is, the media source controls the playout of the receivers so that it is synchronous, taking the information received in modified RTCP report packets from the receivers and sending new RTCP control packets to make the receivers correct their playout processes. For this, the source of the master stream (Synchroniser Source) needs feedback information about the state of the master stream playout process in each receiver. Since RTP/RTCP is malleable enough to carry the information required by a particular application, we define new packet extensions for this purpose. We propose to extend the Receiver Report RTCP packet (RR, [6]), calling it the RR EXT RTCP packet, with a profile-specific extension part containing the information useful for our algorithm: the last MU played by the receiver and the NTP timestamp of the instant at which it was played (Figure 2a). With that information and an estimate of the limits of the network delay, the source can know the state of the master stream playout process in every receiver. It takes one receiver as the reference (master receiver) and multicasts action packets to make the receivers correct their master stream playout process accordingly (late receivers will skip MUs and fast receivers will pause MUs). Action packets are new APP RTCP control packets ([6]), called APP ACT packets, with an extension that includes an MU number and the NTP timestamp of the instant at which that MU should be played by all the receivers (Figure 2b).
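To make the sender-side behaviour more concrete, the following sketch (in Python, with assumed names and data structures; it is not the authors' implementation) shows how a synchroniser source could turn the RR EXT feedback into an APP ACT correction; the ±120 msec group limit used later in the evaluation is assumed here as the trigger threshold.

# Sketch of the Synchroniser Source decision logic (illustrative, assumed names).
from dataclasses import dataclass

MU_PERIOD_MS = 20        # master (audio) MU period used in the evaluation
GROUP_LIMIT_MS = 120     # maximum allowed asynchrony between receivers

@dataclass
class RRExtFeedback:
    receiver_id: str
    last_mu: int           # last MU played by the receiver
    ntp_play_ms: float     # NTP instant (ms) at which that MU was played

def playout_origin_ms(fb: RRExtFeedback) -> float:
    """NTP instant at which MU number 0 would have been played by this receiver."""
    return fb.ntp_play_ms - fb.last_mu * MU_PERIOD_MS

def build_act(feedback: list[RRExtFeedback], master_id: str, lead_ms: float = 500.0):
    """Return the (MU number, target NTP ms) pair to multicast in an APP ACT
    packet, or None when every receiver is within the group limit."""
    master = next(fb for fb in feedback if fb.receiver_id == master_id)
    worst = max(abs(playout_origin_ms(fb) - playout_origin_ms(master))
                for fb in feedback)
    if worst <= GROUP_LIMIT_MS:
        return None
    # All receivers must play this MU at the same future NTP instant; late
    # receivers will skip MUs to reach it and fast receivers will pause.
    target_ntp_ms = master.ntp_play_ms + lead_ms
    target_mu = master.last_mu + round(lead_ms / MU_PERIOD_MS)
    return target_mu, target_ntp_ms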

Figure 2. Format of the proposed packets. a) RR EXT packet: the standard RR fields (version, padding, reception report count, PT = RR, length, sender SSRC, and one report block per source with SSRC, fraction lost, cumulative number of packets lost, extended highest sequence number received, interarrival jitter, last SR (LSR) and delay since last SR (DLSR)), followed by a profile-specific extension carrying the NTP timestamp (64 bits) and, for each source, the last MU played from that source plus padding. b) ACT APP packet: version, padding, subtype, PT = APP = 204, length, SSRC, the ASCII name 'ACT', the NTP timestamp (64 bits) and the Media Unit sequence number, plus padding.
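As an illustration only (the exact bit layout is the one in Figure 2; the helper names and the exact field widths below are assumptions of this sketch), the extension parts could be serialised in network byte order as follows:

import struct

RTCP_APP = 204  # RTCP payload type for APP packets (RFC 3550)

def pack_rr_extension(ntp_ts: int, last_mus: list[int]) -> bytes:
    """Profile-specific extension appended to an RR EXT packet: the 64-bit NTP
    timestamp of the playout instant, then the last MU played from each source."""
    ext = struct.pack("!II", (ntp_ts >> 32) & 0xFFFFFFFF, ntp_ts & 0xFFFFFFFF)
    for mu in last_mus:
        ext += struct.pack("!I", mu)      # one 32-bit word per source, already aligned
    return ext

def pack_act_app(ssrc: int, ntp_ts: int, mu_number: int, subtype: int = 0) -> bytes:
    """APP ACT packet: RTCP header, SSRC, ASCII name 'ACT', the target NTP
    timestamp and the MU that all receivers should play at that instant."""
    payload = struct.pack("!I4sIII",
                          ssrc,
                          b"ACT\x00",                    # 4-octet APP name field
                          (ntp_ts >> 32) & 0xFFFFFFFF,   # NTP seconds
                          ntp_ts & 0xFFFFFFFF,           # NTP fraction
                          mu_number)
    words_minus_one = (4 + len(payload)) // 4 - 1        # RTCP length field
    header = struct.pack("!BBH", (2 << 6) | (subtype & 0x1F), RTCP_APP, words_minus_one)
    return header + payload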

Once group synchronization of the receivers playing the master stream has been achieved, a local information exchange process provides inter-stream synchronization between the slave streams and the master stream. We propose that the master stream playout process uses an internal channel to send its playout state to the playout processes of all the other media streams, so that they know it and adapt to it by skipping or pausing MUs.
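A minimal sketch of such an adaptation (assumed interface; the actual tools communicate through mbus, as described in the next section):

AUDIO_MU_PERIOD_MS = 20   # master (audio) MU period used in the evaluation
VIDEO_MU_PERIOD_MS = 40   # slave (video) MU period used in the evaluation
SKEW_LIMIT_MS = 80        # ideal audio-video skew bound

def adapt_slave(master_mu: int, master_ntp_ms: float,
                slave_mu: int, slave_ntp_ms: float) -> str:
    """Decide how the slave (video) playout process reacts to the master (audio)
    playout state announced over the internal channel: 'skip', 'pause' or 'ok'."""
    master_origin = master_ntp_ms - master_mu * AUDIO_MU_PERIOD_MS
    slave_origin = slave_ntp_ms - slave_mu * VIDEO_MU_PERIOD_MS
    skew_ms = slave_origin - master_origin    # > 0: slave is late, < 0: slave is ahead
    if skew_ms > SKEW_LIMIT_MS:
        return "skip"     # discard video MUs until the skew is recovered
    if skew_ms < -SKEW_LIMIT_MS:
        return "pause"    # hold the current video MU until the audio catches up
    return "ok"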

3. Evaluation

We tested our proposal in a real scenario, our University WAN, by modifying RTP-based Mbone tools such as vic and rat (for video and audio, respectively) and evaluating, both objectively and subjectively, the transmission of a sequence with separate audio and video streams from a videotape located at the Valencia Campus to ten receivers located at the Gandia Campus, 70 km away. All the equipment was globally synchronised by a stratum-1 NTP server in the Spanish Academic and Research network (IRIS network). For group synchronization, we selected one receiver as the master receiver and the audio stream as the master stream, because the synchronization requirements are stricter for audio than for video. For inter-stream synchronization, the playout processes of both streams communicated in each receiver through a local internal bus called mbus (http://www.mbus.org). The video playout process adapts its state to that of the audio playout process by skipping or pausing video frames (MUs). As playout delay negotiation between stream processes needs a common format and a common reference clock, RTCP messages provide the individual mapping between the media timestamps and NTP timestamps. For both kinds of evaluation the video and audio formats were H.261 and GSM, respectively. For the objective evaluation, the capture and playout rates were 1 audio MU every 20 milliseconds and 1 video MU every 40 milliseconds. According to [7], we fixed the following limits: 120 msec as the maximum allowed asynchrony between receivers for the master stream (group, distributed), and 160 msec (even though values lower than 80 msec are considered ideal) as the maximum allowed asynchrony between the audio and video playout processes (inter-stream, local).
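The per-stream mapping can be derived from the (NTP, RTP) timestamp pair carried in the most recent RTCP Sender Report of that stream; a minimal, illustrative sketch:

def rtp_to_ntp_ms(rtp_ts: int, sr_rtp_ts: int, sr_ntp_ms: float,
                  clock_rate_hz: int) -> float:
    """Map an RTP media timestamp to NTP wallclock time (ms) using the timestamp
    pair (sr_ntp_ms, sr_rtp_ts) taken from the stream's latest RTCP Sender Report."""
    ticks = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF   # RTP timestamps wrap modulo 2**32
    return sr_ntp_ms + 1000.0 * ticks / clock_rate_hz

# GSM audio uses an 8000 Hz RTP clock, so 160 ticks after the SR correspond to
# sr_ntp_ms + 20.0 ms; H.261 video uses a 90000 Hz RTP clock.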


3.1. Subjective Evaluation

This kind of evaluation determines the user acceptance of a multimedia system. We focused the evaluation on studying how a user perceives the synchronization effect from the psychophysical point of view, following the work presented by Steinmetz in [7]. We selected 20 users, none of them with experience in subjective evaluation or synchronization techniques; only 15% of them had experience as users of multimedia systems. The parameters and the scales used in the tests are shown in Tables 1 and 2. The scales are based on Recommendation ITU-R BT.500-7.

Table 1. Test parameters

Without Synchr.: video stream playout rates of 15 and 25 frames/sec; sequence types: First Plane and Action film.
With Inter-stream Synchr.: video stream playout rates of 15 and 25 frames/sec; sequence types: First Plane and Action film; maximum allowed asynchrony (skew) audio-video: ±80 ms.
With Group Synchr.: video stream playout rates of 15 and 25 frames/sec; sequence types: First Plane and Action film; maximum allowed asynchrony between audio receivers: ±120 ms; maximum allowed asynchrony (skew) audio-video: ±80 ms.

Table 2. Scales used in the tests

Grade | Synchr. quality | Synchr. degradation
0 | Not sure | Not sure of accepting it
1 | Totally without synchronization | Very annoying
2 | --- | Annoying
3 | --- | Slightly annoying
4 | --- | Perceptible but not annoying
5 | Totally with synchronization | Imperceptible

4. Results

Figure 3. Playout delay of the master stream: playout delay (ms) of the audio stream playout process versus time (s) for receivers PC1 to PC9 and the master receiver (PC_MASTER).

4.1. Results of the Objective Evaluation

We transmitted a 10-minute film with separate video and audio streams. Without synchronization, the receivers showed a mean asynchrony of 2.5 seconds (along the 10-minute film) between them with respect to the master stream playout. Figure 3 presents the playout delay (measured from the transmission instant) of the audio (master) stream playout processes in the 10 receivers along the session using our algorithm. The playout delay of each receiver adjusted to that of the master receiver (thick line). The control messages sent by the synchroniser source (ACT APP packets) amounted to only 0.14% of the total number of packets (control and data) it sent, while the control messages sent by the receivers (RR EXT packets) barely reached 6.88% of the total number of packets sent by all the applications. We analysed the square value of the detected group asynchrony and observed that it does not exceed the limit value of 14400 msec² (the square of the maximum permitted ±120 msec) in any receiver. Regarding inter-stream synchronization, our algorithm works properly because it keeps this parameter below 6400 msec² (corresponding to ±80 msec asynchrony) or, in the worst case, below 25600 msec² (corresponding to ±160 msec asynchrony). Figure 4 shows the distribution of the square value of the detected inter-stream asynchrony for one receiver (PC7). The above limits (dashed lines in the figure) were exceeded on only a very few occasions, and the subjective evaluation showed that those occasions were not annoying for the application users.
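These thresholds are simply the squares of the allowed asynchronies; the check behind the plotted values can be expressed as follows (illustrative only, not the authors' code):

def within_limit(asynchrony_ms: float, limit_ms: float) -> bool:
    """True if the squared asynchrony stays below the squared limit."""
    return asynchrony_ms ** 2 <= limit_ms ** 2

# Group case:      120 ms -> 14400 ms^2
# Inter-stream:     80 ms ->  6400 ms^2 (ideal), 160 ms -> 25600 ms^2 (worst case)
assert within_limit(100.0, 120.0)        # 10000 <= 14400
assert not within_limit(90.0, 80.0)      #  8100 >  6400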

Figure 4. Square value of detected asynchrony (square error, in ms², versus time, in s, for receiver PC7).

4.2. Results of the Subjective Evaluation

Figure 5 shows the results of the evaluation of the synchronization quality. The group synchronization (distributed) algorithm obtained very good marks, even better than the sequences with only (local) inter-stream synchronization. Figure 6 presents the synchronization degradation perceived by the users. In sequences with group synchronization they detected abnormal effects due to the synchronization processes, but rated them as imperceptible or not annoying. We observed that in first-plane (close-up) sequences users could easily perceive a situation of asynchrony between streams: they can easily detect a loss of lip synchronization because they are watching the speaker's face movements directly. In action film sequences there are frequent changes of plane, so the synchronization actions (skipping and/or pausing MUs) are harder for users to notice. Moreover, users are accustomed to watching foreign films dubbed into their own language, a process that itself introduces asynchronies, so they tolerate asynchronies well in that kind of sequence and do not consider them abnormal. The tests also showed that the playout rate did not considerably affect the perceived synchronization quality. Nevertheless, as expected, users detected more abnormal effects in sequences played at 15 frames/sec, such as mechanical and artificial movements or skips between different images.

Figure 5. Synchronization quality: percentage of answers per grade for the First Plane and Action Film sequences (25 frames/sec), without synchronization, with inter-stream synchronization and with group synchronization.

Figure 6. Synchronization degradation: percentage of answers per grade (25 frames/sec), without synchronization and with group synchronization, for the First Plane and the Action film sequences.

5. Conclusions

A modification of RTCP packets has been presented to make group synchronization possible and easy. It takes advantage of the feedback RR RTCP messages and of the malleability of RTP/RTCP to carry the information required by a particular application, defining a new APP RTCP packet for the synchronization purpose. This modification hardly increases the network load, and it keeps the asynchronies between receivers (distributed) and between streams (local) from exceeding the limits established as acceptable in the literature. The proposed group synchronization solution has obtained good results in both the objective and the subjective evaluation, which validates it as a possible solution for multimedia applications that need group synchronization.

6. References

[1] "Media Synchronization and QoS Packet Scheduling Algorithms for Wireless Systems", Mobile Networks and Applications, vol. 10, issue 1-2, February 2005, pp. 233-249.
[2] Yavatkar, R. and Lakshman, K., "Communication support for distributed collaborative applications", Multimedia Systems, 2(4), 1994.
[3] Akyildiz, I. F. and Yen, W., "Multimedia Group Synchronization Protocols for Integrated Services Networks", IEEE JSAC, vol. 14, pp. 162-173, January 1996.
[4] Diot, C. and Gautier, L., "A Distributed Architecture for Multiplayer Interactive Applications on the Internet", IEEE Network, vol. 13, pp. 6-15, July/August 1999.
[5] Ishibashi, Y., Tasaka, S. and Miyamoto, H., "Joint Synchronization between Stored Media with Interactive Control and Live Media in Multicast Communications", IEICE Trans. Commun., vol. E85-B, no. 4, pp. 812-822, April 2002.
[6] Schulzrinne, H., Casner, S., Frederick, R. and Jacobson, V., "RTP: A Transport Protocol for Real-Time Applications", RFC 3550, July 2003.
[7] Steinmetz, R., "Human Perception of Jitter and Media Skew", IEEE JSAC, vol. 14, no. 1, January 1996.
