A Communication Architecture for Critical Distributed Multimedia Applications: Design, Implementation, and Evaluation

Fabio Panzieri

Marco Roccetti

Technical Report UBLCS-98-7 June 1998

Department of Computer Science University of Bologna Mura Anteo Zamboni 7 40127 Bologna (Italy)

The University of Bologna Department of Computer Science Research Technical Reports are available in gzipped PostScript format via anonymous FTP from the area ftp.cs.unibo.it:/pub/TR/UBLCS or via WWW at URL http://www.cs.unibo.it/. Plain-text abstracts organized by year are available in the directory ABSTRACTS. All local authors can be reached via e-mail at the address [email protected]. Questions and comments should be addressed to [email protected].

Recent Titles from the UBLCS Technical Report Series

96-4 May and Must Testing in the Join-Calculus, C. Laneve, March 1996.
96-5 The Shape of Shade: a Coordination System, S. Castellani, P. Ciancarini, D. Rossi, March 1996.
96-6 Engineering Formal Requirements: an Analysis and Testing Method for Z Documents, P. Ciancarini, S. Cimato, C. Mascolo, March 1996.
96-7 Using Bayesian Belief Networks for the Automated Assessment of Students’ Knowledge of Geometry Problem Solving Procedures, M. Roccetti, P. Salomoni, March 1996 (Revised March 1997).
96-8 Virtual Interactions: An Investigation of the Dynamics of Sharing Graphs, A. Asperti, C. Laneve, April 1996.
96-9 Towards an Algebra of Actors, M. Gaspari, April 1996.
96-10 Mobile Petri Nets, A. Asperti, N. Busi, May 1996.
96-11 Communication Support for Critical Distributed Multimedia Applications: an Experimental Study, F. Panzieri, M. Roccetti, May 1996.
96-12 A Logic Coordination Language Based on the Chemical Metaphor, P. Ciancarini, D. Fogli, M. Gaspari, July 1996.
96-13 Towards Parallelization of Concurrent Systems, F. Corradini, R. Gorrieri, D. Marchignoli, August 1996 (Revised December 1996).
96-14 The Compositional Security Checker: A Tool for the Verification of Information Flow Security Properties, R. Focardi, R. Gorrieri, August 1996.
96-15 Jada: a Coordination Toolkit for Java, P. Ciancarini, D. Rossi, October 1996.
96-16 Fault Tolerance through View Synchrony in Partitionable Asynchronous Distributed Systems, A. Montresor, December 1996.
96-17 A Tutorial on EMPA: A Theory of Concurrent Processes with Nondeterminism, Priorities, Probabilities and Time, M. Bernardo, R. Gorrieri, December 1996 (Revised January 1997).
97-1 Partitionable Group Membership: Specification and Algorithms, Ö. Babaoğlu, R. Davoli, A. Montresor, January 1997.
97-2 A Truly Concurrent View of Linda Interprocess Communication, N. Busi, R. Gorrieri, G. Zavattaro, February 1997.
97-3 Knowledge-Level Speech Acts, M. Gaspari, March 1997.
97-4 An Algebra of Actors, M. Gaspari, G. Zavattaro, May 1997.
97-5 On the Turing Equivalence of Linda Coordination Primitives, N. Busi, R. Gorrieri, G. Zavattaro, May 1997.
97-6 A Process Algebraic View of Linda Coordination Primitives, N. Busi, R. Gorrieri, G. Zavattaro, May 1997.
97-7 Validating a Software Architecture with respect to an Architectural Style, P. Ciancarini, W. Penzo, July 1997.
97-8 System Support for Partition-Aware Network Applications, Ö. Babaoğlu, R. Davoli, A. Montresor, R. Segala, October 1997.
97-9 Generalized Semi-Markovian Process Algebra, M. Bravetti, M. Bernardo, R. Gorrieri, October 1997.
98-1 Group Communication in Partitionable Systems: Specification and Algorithms, Ö. Babaoğlu, R. Davoli, A. Montresor, April 1998.
98-2 A Catalog of Architectural Styles for Mobility, P. Ciancarini, C. Mascolo, April 1998.
98-3 Comparing Three Semantics for Linda-like Languages, N. Busi, R. Gorrieri, G. Zavattaro, May 1998.
98-4 Design and Experimental Evaluation of an Adaptive Playout Delay Control Mechanism for Packetized Audio for use over the Internet, M. Roccetti, V. Ghini, P. Salomoni, M.E. Bonfigli, G. Pau, May 1998.
98-5 Analysis of MetaRing: a Real-Time Protocol for Metropolitan Area Network, M. Conti, L. Donatiello, M. Furini, May 1998.
98-6 GSMPA: A Core Calculus With Generally Distributed Durations, M. Bravetti, M. Bernardo, R. Gorrieri, June 1998.
98-7 A Communication Architecture for Critical Distributed Multimedia Applications: Design, Implementation, and Evaluation, F. Panzieri, M. Roccetti, June 1998.

A Communication Architecture for Critical Distributed Multimedia Applications: Design, Implementation, and Evaluation¹

Fabio Panzieri²

Marco Roccetti²

Abstract

Distributed Multimedia Applications (DMMAs) in general manipulate data streams from both continuous and discrete devices in a networked computing environment, and require support for the communication, integration, and synchronization of those streams at geographically distant sites. A relevant class of DMMAs can be implemented to support critical activities on which financial investments, human lives, or both may depend (e.g. electronic commerce applications, telemedicine services). The design and development of such critical DMMAs is notably complex, owing to the need to provide dependable services that are both real-time and highly available. In order to support the development of such DMMAs, we have designed and developed a communication software architecture that i) supports synchronization and isochronous rendering of multimedia data streams, ii) provides group management and group communication primitives for use from DMMAs, and iii) effectively meets the dependability and scalability requirements that these applications may have. In this paper, we introduce the design and the prototype implementation of that architecture, and discuss its performance. The performance results we have obtained illustrate the adequacy of our approach for supporting critical DMMAs.

Keywords: Multimedia Applications, Multimedia Protocols, Scalability, Reliability, Fault Tolerance.

1. Partial support for this work was provided by the Commission of European Communities under ESPRIT Programme Basic Research Project 6360 (BROADCAST).
2. Dipartimento di Scienze dell’Informazione, Università di Bologna, Mura A. Zamboni 7, 40127 Bologna (Italy).


1 Introduction

The complexity inherent in the design of large scale distributed computing systems (LSDCSs) can be notably exacerbated by the requirement for reliable communication services suitable for the implementation of Distributed MultiMedia Applications (DMMAs). LSDCSs typically extend across wide geographical distances, and incorporate and manage a possibly very large number of physical and logical resources. In this context, communications are generally based on asynchronous communication networks, i.e. networks characterized by arbitrary communication delays. In contrast, as pointed out in [10, 29], DMMAs exhibit typical real-time requirements with respect to such issues as timeliness in data capturing, communication, and rendering, i.e. requirements whereby both processing and communication delays must be bounded and predictable. In fact, the implementation of DMMAs is based on the processing and exchange of both continuous (or time-dependent, or synchronous) information, such as audio and video data streams, and discrete (or time-independent, or asynchronous) information, such as streams of text data and still images. Thus, in order to support the implementation of these applications, the application designer must be provided with appropriate software that maintains the timing relationships between those data streams, so that their rendering can be synchronized. This requirement becomes even more stringent if the DMMAs to be supported are critical applications, such as the video and audio monitoring of crucial sites of a power plant, a geographically distributed electronic auction bidding system, or teleconferencing support for cooperative medicine (i.e. applications whose unreliable behaviour may endanger human lives, large financial investments, or both). In general, these applications can scale in terms of the number of users and the geographical separation among them.
However, regardless of the number of users and their physical distance, the performance and reliability of the protocols that support these applications must be maintained within a range of values that is acceptable for these applications. Moreover, these applications manipulate both continuous and discrete data streams, consisting for example of mixed audio, video and text data, that are to be rendered isochronously at a collection (i.e. a group) of geographically distant end users. Hence, in addition to the inter-stream synchronization mechanisms mentioned above, these applications require support for coordinating the isochronous rendering of those streams at the end users; in this context, the use of real-time group communication mechanisms, such as those described in [32, 59], suggests itself. These mechanisms offer a further benefit, as they can be used for constructing adequate fault tolerance support [30] that copes effectively with the critical nature of these applications. Indeed, the variety of requirements that characterizes DMMAs in general, and critical DMMAs in particular, is much wider than that outlined above; for example, these applications may well exhibit requirements for security, and for efficient information storage, sharing and retrieval (see, for example, [2, 8, 10, 11, 34, 36, 37, 54, 56, 60]). However, for the purposes of our discussion, we shall focus on the design, implementation, and performance evaluation of communication services that i) respond to the DMMAs’ scalability requirement, and provide support for the implementation of ii) reliable synchronization and isochronous rendering mechanisms for multimedia data streams, and iii) real-time group communication and membership services for DMMAs. (It is worth mentioning that meeting these requirements can be thought of as part of the more general design issue of providing Quality of Service (QoS) guarantees [61].)
In the following, we describe the design and the prototype implementation of a communication software architecture, for use from DMMAs, that meets the above requirements, and discuss its performance. That architecture is fully described and motivated in [43, 45, 46]. This paper is structured as follows. In the next Section we discuss the principal design issues we have addressed, and motivate our design decisions. In particular, in that Section we first examine our application scenario and its requirements; second, we introduce the scalability support, the continuous media synchronization, and the group membership services we propose in order to meet those requirements; third, we discuss the fault model we assume. Section 3 describes the algorithms designed for the implementation of the media synchronization support service, and discusses an analytical evaluation of these algorithms in terms of the expected bandwidth usage and message overheads they introduce. Section 4 details the group membership algorithms and evaluates them analytically (again, in terms of expected bandwidth usage and message overheads). Section 5 introduces a prototype implementation of our architecture (based on the IP Multicast protocol). Section 6 discusses the performance results of that implementation. Section 7 compares and contrasts our work with related work; finally, Section 8 provides some concluding remarks.

2 Critical DMMAs

Scale is a primary factor that can influence the design and implementation of a distributed system [50]. In particular, mechanisms that work adequately in small distributed systems may fail to do so when deployed within the context of larger systems. Hence, for the purposes of this paper, we term “scalable” a system that can provide its services, according to the performance and reliability specifications of those services, regardless of both the number of resources it accommodates and the geographical separation among these resources. Thus, for example, if the users of an interactive DMMA can tolerate an end-to-end latency of at most 300 milliseconds on the audio channels, the algorithms that support that DMMA should guarantee that this latency is never exceeded, regardless of the geographical separation among the application users and the number of users concurrently using that application. (In this context, the “end-to-end latency” is the elapsed time between the acquisition of a data object at its source and the rendering of that object at its destination [21].) The DMMAs mentioned earlier can be thought of as belonging to the same class of critical applications, as they share a common set of requirements. In addition to scalability, these requirements include: ordered and isochronous rendering of the multimedia data they manipulate, high reliability and availability of the services they use, and support for group communications. In the following, we introduce each of these applications in turn, in order to motivate those requirements.

Power plant control. Assume that, in order to deploy a particular security control policy in a critical establishment, such as a nuclear or electrical power plant, it is required to develop a distributed system that implements video and audio monitoring of crucial sites of that plant, e.g. the gates.
The architecture of one such system can be based on the interconnection, via a high speed local area network, of both a number of microprocessors distributed around the plant, and a set of workstations, possibly scattered within a limited distance outside the plant. Each microprocessor can be dedicated to the control of a particular gate of the plant. To this end, each microprocessor can be equipped with a camera and a microphone, and made responsible for transmitting the live data streams originating from those two devices to the workstations. Those workstations render the data streams transmitted by the microprocessors, so as to allow the security personnel to monitor the plant gates. In this context, a group of workstations may be required to render isochronously the live data streams transmitted by one of the microprocessors, in order to allow members of the security staff to monitor one particular gate simultaneously from geographically separate locations. (Needless to say, a group of workstations may well be required to render isochronously, in separate windows, the data streams transmitted by more than one microprocessor.) Thus, the requirements for isochronous rendering and group communication support emerge clearly. In addition, owing to the critical nature of the application it implements and of the environment in which it is deployed, the distributed system we are discussing will have to embody sufficient component redundancy to provide highly reliable and available services. The system scalability requirement derives from the fact that it may be necessary to increase the number of microprocessors and workstations out of which the monitoring system is constructed, for example so as to increase the number of monitored sites in the plant. Finally, it is worth pointing out that “security” requirements are of primary importance in the design of a power plant control system; however, those requirements can hardly be met in the absence of reliable system behaviour, as discussed at length in [41].

Distributed auction bidding. It is required to design an electronic auction bidding system that can be used to carry out auctions such as those held at Sotheby’s and Christie’s. The principal requirements to be met by one such system can be summarized as follows.

1. Participants from different countries can take part in the auction (in a real-life auction, a remote participant is represented by an agent who attends the auction and acts on his behalf); during an auction, they can join and leave it at arbitrary times.
2. In order to acquire an object on sale in an auction, the participants submit bids to an auctioneer. A “synchronous” model of interaction between the auction participants and the auctioneer, such as that described in [4], can be assumed; this model requires that all participants be available for the auction at the same time, as in a conventional auction. (An alternative “asynchronous” model has been proposed in [44].)
3. Participants need to have the same view of the object on sale, and of the proposed bids, at the same time. They share the same notion of time, and must adhere to the time constraints imposed by the auctioneer (e.g. a limited time interval in which to propose a bid).

The design of an auction bidding system that meets the above requirements can be based on the provision of highly available services that maintain a consistent view of what can be termed the “auction state”, i.e. the object on sale, the current bids for that object, etc. These services can be implemented by a collection of auction servers, geographically distributed in separate auction branches, and communicating via a broadband communication network.
(Indeed, an alternative design can be based on the use of a single auction server; however, that design is of little interest in this context, owing to the high availability requirement mentioned above.) An auction participant can connect to any branch by means of a workstation that allows him to observe the current auction state and submit a bid. In view of requirement 3 above, the workstations of all the auction participants must render the auction state isochronously. The auction state is replicated at the branches. Thus, the auction servers are required to act as a “group”, i.e. to cooperate with each other in order to maintain the consistency of the replicated data. In essence, for each object on sale, these servers multicast the auction state to the workstations with which they are connected, including a deadline by which the participants have to place their bids (assume that the servers maintain the same view of the external time). When that deadline expires, the servers exchange the collected bids with each other in order to determine the highest bid, update the auction state (atomically at all branches), and, if necessary, start a new cycle for accepting new bids. In addition to the isochronous rendering, high availability, and group communication support requirements, a requirement for the scalability of the electronic auction bidding system suggests itself since, for example, the number of participants in an auction, the number of auction servers, and the geographical separation among participants and servers may all become very large.

Remote assistance of surgical operations. Support for medical services, such as that provided by the Bermed system [31], allows various health specialists to cooperate.
In order to bridge the physical distance that can separate those specialists, telecommunication systems can be used to retrieve and exchange patient multimedia data such as still images, video sequences, hand-written notes, interactive audio communications, reports, etc. One such system can be thought of as a teleconferencing application for supporting cooperative medicine. That application can be deployed in order to provide interactive assistance and expert consultancy during a surgical operation. The end users (i.e. the specialists mentioned above) of that application may well be located at geographically distant sites. In view of the critical nature of that application, the software and hardware infrastructure supporting it will have to provide highly reliable and available services, including high quality audio and video channels, and high performance. That infrastructure can be constructed out of a possibly limited number of workstations, equipped with the necessary video and audio devices for use by their end users, and interconnected through a broadband communication system. During a surgical operation, a workstation can be used to monitor that operation through a camera, and to transmit the live images captured by that camera, as well as possible audio communications originated by its end user (i.e. the surgeon carrying out the operation), to the workstations used by remote expert consultants. These workstations will cooperate in order to provide isochronous rendering of the live data they receive, and to maintain the consistency of the multimedia data exchange. Note that this infrastructure may require a relatively small number of instantiations of the cooperative medicine application; however, these instantiations may be separated by arbitrary geographical distances. Hence, the requirement for system scalability emerges. In conclusion, we wish to mention that the three DMMAs introduced above have been chosen as they represent three different types of scalable distributed applications. In particular, the first application we have described is an example of a large scale application distributed in a relatively small scale networking environment. In contrast, the second application is a large scale application distributed in a large scale networking environment. Finally, the third application is a small scale application distributed in a large scale networking environment.

2.1 Common Requirements

In general, the DMMAs we have described can be based on a communication infrastructure that supports the exchange of asynchronous, synchronous, and isochronous information. That infrastructure can interconnect a great variety of nodes, characterized by diverse capabilities. For example, a node may well consist of a continuous input device (e.g. a camera) directly connected to the communication infrastructure, a multimedia storage server, such as in [2], or a powerful workstation equipped with both continuous and discrete (e.g. a keyboard) I/O devices. These I/O devices can generate/render data streams that may consist either of individual data object sequences (e.g.
sequences of audio samples, or video frames, or characters) or of composite multimedia data streams that represent, for example, a motion video and its sound track, as transmitted by, or stored in, a video server. Within this scenario, the action of maintaining accurate time relationships among data objects within a single individual stream is generally referred to as intrastream synchronization, whereas the action of maintaining accurate time relationships among data objects of different streams (either individual or composite) is referred to as interstream synchronization [49]. A variety of algorithms has been proposed for achieving both intrastream and interstream synchronization (e.g. [23, 51, 58, 62]). These algorithms can be implemented according to one of the following three synchronization policies, termed synchronization at the source, synchronization at the destination, and synchronization at the network [35]. Both an analysis of these algorithms and a detailed description of these three policies are beyond the scope of this paper, as they are discussed at length in the cited references. Rather, our principal concern is the design of a particular synchronization policy (introduced in Subsection 2.3) that can effectively support the implementation of intrastream and interstream synchronization algorithms. Thus, in the following, we shall use the phrase Composite Multimedia Data Stream (CMDS) to indicate the data object that results from the timely integration of different data streams (either individual or composite), regardless of the particular algorithm used to construct it. In the DMMAs mentioned earlier, the sources and/or destinations (i.e. the I/O devices) of the data streams that have to be synchronized may be geographically dispersed. Moreover, we have seen that these applications require isochronous rendering (IR) of different multimedia data streams at different destination sites.
This requirement can be formulated as follows:

IR: independent streams of data, originating from possibly geographically dispersed and heterogeneous (i.e. continuous and discrete) input devices, are to be integrated so as to form a CMDS to be rendered isochronously at a collection of output devices.

In order to meet the IR requirement, a real-time multicast service [14, 15, 18, 6] can be used for exchanging multimedia data streams in a distributed context. Typically, one such service, combined with a real-time group membership abstraction, provides a useful paradigm for implementing distributed applications. Thus, in this paper, we propose both a real-time multicast service, which implements the multicasting of CMDSs, and a group membership service, which maintains the group abstraction among DMMA components. Finally, in order to meet the scalability requirement that characterizes the DMMAs we have examined, we propose algorithms (introduced in Subsection 2.2) that allow the DMMA designer to construct applications that can grow in terms of the number of users and the geographical separation among them, and yet provide the expected services. In summary, the DMMA requirements introduced earlier have led us to address the following four principal design issues in the development of our architecture: i) the provision of what we have termed “scalability support”, ii) the choice of a particular data stream synchronization policy that meets the IR requirement, iii) the choice of a particular group management policy, and, finally, iv) the definition of the fault model that our algorithms are intended to cope with. Below, we discuss each of these issues in turn.

2.2 Scalability Support

In order to deal with scalable DMMAs, we have decided to design algorithms that allow one to structure those DMMAs hierarchically. The rationale behind this decision is that i) as shown in [58], hierarchical architectures can scale an order of magnitude beyond purely centralized or purely distributed architectures, while continuing to meet the application requirements, and ii) the hierarchical virtual communication architecture we propose fits well with communication infrastructures based on global interconnection networks [53] that deal with small groups of components separated by large geographical distances. In particular, the implementation of the algorithms we propose shields the DMMA components from the details of the physical communication network by providing them with the abstraction of a hierarchical, tree-structured interconnection infrastructure, which we term the k-Augmented M-ary Tree (k-AMT) architecture.
This architecture is structured as a complete M-ary tree (MT) with N leaves, augmented with k additional links (see below). The leaf nodes of the tree represent the multimedia data sources and destinations of a given instance of a DMMA. Nonleaf nodes represent synchronizers of multimedia data streams, which implement the Synchronization Support Service mentioned previously. A link between two nodes represents a virtual communication channel between those nodes. Nodes connected by a virtual channel can communicate by exchanging messages. An MT-structured interconnection architecture can be constructed out of a physical multimedia distributed system, as described in the following example. Consider the multimedia distributed system depicted in Figure 1. That system consists of a broadband communication network that interconnects: (i) a workstation equipped with three input devices, namely a camera, a microphone, and a keyboard, and a video monitor output device, (ii) a workstation equipped with two output devices only, i.e. a video monitor and a loudspeaker, and (iii) a multimedia file server and a camera connected to the communication network via a local area network. For the purposes of this example, we assume that the input and output devices are labeled as illustrated in Figure 1. In addition, we assume $M = 2$, and construct the binary tree illustrated in Figure 2. (An MT construction algorithm that works for arbitrary even values of M is described in Subsection 3.2.) In order to construct the binary tree abstraction in Figure 2, the root node of that tree can be created in one of the workstations (and implemented by a specific synchronizer process); in addition, each I/O device in Figure 1 can be represented as a leaf node of that tree. Those devices can be grouped into the following four clusters, according to a physical neighborhood criterion, for example.
A first cluster can consist of the camera and the file server, labeled 8 and 9, respectively, in Figure 1. A second cluster can include the loudspeaker 10 and the video monitor 11; a third cluster can include the video monitor 12 and the keyboard 13. Finally, a fourth cluster can consist of the microphone 14 and the camera 15. The activity of each of these clusters of devices is to be managed by a separate synchronizer process. Hence, in our example, four synchronizer processes are required. A synchronizer process can be represented as a node linked to the leaf nodes of the cluster that synchronizer is managing. Thus, the four nodes 4, 5, 6, and 7 in Figure 2 can be created.

[Figure 1. Multimedia Distributed System: camera 8 and file server 9 on a local area network; loudspeaker 10 and video monitor 11 on one workstation; video monitor 12, keyboard 13, microphone 14, and camera 15 on another workstation; all interconnected by a communication network.]

[Figure 2. M-ary Tree: a binary tree with root node 1, synchronizer nodes 2 to 7, and leaf nodes 8 to 15.]

Yet again, owing to our initial assumption that $M = 2$, the activity of each pair of synchronizers is to be coordinated by a further synchronizer process. Thus, two such processes are required in our example; each of these processes can be represented as a node in the tree we are constructing (namely, nodes 2 and 3 in Figure 2), linked to a pair of synchronizers and to the root node (i.e. node 1 in Figure 2). The k-AMT abstraction we propose can be derived from the MT abstraction described above by adding at most k spare links to each node in the tree; in this way, the tree acquires sufficient redundancy to tolerate a number of communication faults that depends on k. In Section 3 we describe an algorithm for the construction of the k-AMT. In addition, in that Section, we show that the synchronization policy we propose scales well, and can tolerate a number of faulty links that depends on k, at the cost of a communication overhead estimated in terms of bandwidth usage and message overheads. The message overhead can be quantified as $O(N(\log_M N + 1))$, where N is the number of k-AMT leaf nodes; the bandwidth usage, instead, turns out to depend on both k (i.e. the number of additional links per node) and M (i.e. the number of children of each node).
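The tree-building example above can be reproduced with the short Python sketch below, which builds a complete M-ary tree and lists its synchronizers and leaves. It is only an illustration under assumptions of our own: the level-order numbering and the names `build_mary_tree` and `hops_to_root` are hypothetical, the sketch is not the construction algorithm of Subsection 3.2, and it does not add the k spare links of the k-AMT. With M = 2 and height 3, the numbering appears consistent with the node labels of Figure 2.

```python
def build_mary_tree(m: int, height: int):
    """Build a complete m-ary tree of the given height (root at depth 0).

    Nodes are numbered 1, 2, ... in level order, so the children of node i
    are m*(i-1)+2 .. m*i+1. Returns (parent, leaves): parent maps every
    non-root node to its parent, leaves lists the leaf nodes.
    """
    if m < 2 or height < 0:
        raise ValueError("need m >= 2 and height >= 0")
    total = (m ** (height + 1) - 1) // (m - 1)      # all nodes of the tree
    first_leaf = (m ** height - 1) // (m - 1) + 1  # nonleaf nodes are numbered first
    parent = {}
    for i in range(1, first_leaf):                  # only synchronizers have children
        for child in range(m * (i - 1) + 2, m * i + 2):
            parent[child] = i
    leaves = list(range(first_leaf, total + 1))
    return parent, leaves


def hops_to_root(node: int, parent: dict) -> int:
    """Number of virtual links between a node and the root."""
    hops = 0
    while node in parent:
        node = parent[node]
        hops += 1
    return hops


parent, leaves = build_mary_tree(2, 3)       # M = 2, N = 8 leaves
synchronizers = sorted(set(parent.values()))
print(leaves)                                # [8, 9, 10, 11, 12, 13, 14, 15]
print(synchronizers)                         # [1, 2, 3, 4, 5, 6, 7]
print(parent[8], parent[9], parent[4], parent[5])  # 4 4 2 2
print(hops_to_root(8, parent))               # 3
```

For a complete tree with $N = M^h$ leaves, the sketch yields $(N-1)/(M-1)$ synchronizers, and every leaf is $h = \log_M N$ links away from the root.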

2.3 Synchronization Policy

The approach to multimedia stream synchronization we propose meets the IR requirement. It is based on a scalable, fault tolerant, decentralized synchronization policy, obtained as an extension of the algorithm described in [58]. That algorithm operates on a hierarchical virtual interconnection architecture, structured as an arbitrary rooted tree, and manages the synchronized integration of multimedia data streams. In [58] it is assumed that each source generates media packets at a constant rate, and that the communication delays are bounded within a time interval. Under these assumptions, and in the absence of globally synchronized clocks, the proposed algorithm minimizes the difference between the generation times of the data packets that are being synchronized. By reducing these differences, it also minimizes the buffering time and space required for the data packets. The algorithm operates as follows. Distinct leaf nodes can be clustered together according to some physical or logical neighborhood criteria. Each cluster of leaf nodes may include “source” nodes, which generate so-called individual data streams to be integrated into a CMDS. Each cluster may also include “destination” nodes, which perform the rendering of a CMDS. The parent node of a cluster of leaves receives data streams from its source leaves, integrates them to produce a CMDS, and then transmits it to its immediate ancestor. This procedure is executed by each nonleaf node (i.e. each synchronizer) of the hierarchical virtual interconnection architecture, up to its root. The root node constructs the Final Synchronized Multimedia Data Stream (FSMDS), and multicasts it to the destination leaves.

In contrast, our approach is based on the following four assumptions.

• A1. No conditions are placed on the data packet generation rate.
• A2. The communication subsystem is assumed to guarantee a bounded delay (denoted $D_1$) for the communications between two directly connected sites in the k-AMT architecture.
• A3. A bounded processing time (denoted $D_2$) is assumed to be guaranteed for each multimedia data stream processed at a given node of the k-AMT architecture. $D_2$ (see [19]) may include the following processing delays:
  i) the collection delay, i.e. the time elapsed from the acquisition of the data at a k-AMT source node to the delivery of those data to the network transport system of that node. Thus, $D_2$ includes such delays as those caused by data digitization and encoding;
  ii) the equalization delay, i.e. the time consumed by the synchronization algorithm to produce a CMDS at a given k-AMT site;
  iii) the delivery delay, i.e. the time elapsed between the delivery of a CMDS by the network transport system of a k-AMT node and the rendering of that CMDS at that node.
  In the following, we shall indicate with $D = D_1 + D_2$ the overall processing and communication delay which occurs in the communication between two directly connected nodes in the k-AMT architecture.
• A4. The clocks of the processors survive failures, measure the passage of time accurately, and are synchronized. Hence the measurable difference between the readings of all non faulty clocks at any instant is bounded by a known constant (denoted $\varepsilon$). (This assumption is made realistic by technological support such as that provided by the satellite based GPS [25].)

In the following, we shall assume $\varepsilon = 0$, without loss of generality. However, the properties of the algorithms described in this paper can be proved even when the “quite accurate synchronization” hypothesis (i.e. $\varepsilon > 0$) is satisfied. In summary, the communication subsystem can provide delay guarantees, sufficient buffer space, and quite accurate clock synchronization. It can do so by implementing, for example, the delay jitter control scheme described in [20, 23]. Moreover, we assume that the basic communication interface supporting the k-AMT architecture provides a multicast real time transport service, such as that introduced in [22, 62] and then developed in [6]. This transport service provides timely multicasting of real time data streams from a sending node to a collection of receiving nodes directly connected to that sending node in the k-AMT architecture. In our approach, we extend the synchronization policy proposed in [58] so that it operates over a k-AMT. Our objective, in fact, is to combine reliable communication and multimedia synchronization services. Thus, the k additional links at each node of the k-AMT are used for transmitting replicas of the individual data streams and of the CMDSs. This overcomes problems that may arise from faults of both communication links and nodes (see Subsection 3.3). We implement our synchronization policy over a complete and ordered MT, rather than over an arbitrary rooted tree as in [58], in order to master and control the complexity of the algorithms we propose. These algorithms form what we have termed the “Synchronization Support Service”.

2.4 Group-Membership Management Policy

In general, the orchestration and coordination activities to be performed by a DMMA require communication support that allows the implementation of the following three activities:

• synchronized playing out, at a single destination node, of CMDSs originated from multiple, possibly distributed, source nodes;
• synchronized playing out, at a collection of destination nodes, of CMDSs originated from a single source node;
• synchronized playing out, at a collection of destination nodes, of CMDSs originated from multiple source nodes.

Thus, the principal communication models required by a DMMA are those for many to one, one to many and many to many real-time communications. These models can be adequately supported by the real-time group communication paradigm [6, 13, 32, 57, 59]. Hence, the algorithms we propose are designed to support reliable group communications over the k-AMT abstraction introduced earlier. Group communications over the k-AMT can be based on two algorithms. The first is a message diffusion algorithm that allows its users to transmit messages to multiple destinations. The second is a group membership algorithm that maintains the group abstraction among DMMA components. The message diffusion algorithm transmits messages to their destinations via different routes, transparently to its users. In addition, this algorithm ensures that, under predefined failure hypotheses, the transmitted messages reach their destinations within bounded time intervals. The group membership algorithm provides all the non faulty source and destination components of a DMMA with a consistent view (defined in Section 4) of their relative group membership. It also guarantees a time bounded delay for site failure detection and join. In essence, our group membership algorithm periodically monitors the occurrence of possible failures, deliberate disconnections of DMMA components, and requests to join that originate from new components.
If the occurrence of any of these events is detected within a monitoring period, a new k-AMT abstraction is constructed at the end of that period. That new k-AMT abstraction will not contain the failed or disconnected components, and will include those components whose requests to join have been detected. Finally, our algorithm updates the view of all the DMMA components in the new k-AMT abstraction, prior to the beginning of a new monitoring period. (Both the message diffusion and the group membership algorithms are discussed in detail in Sections 3 and 4.) We have decided to design real-time group membership algorithms, such as those described in [12, 13, 32, 33], because these algorithms can provide distributed applications with high dependability and guaranteed timeliness. We have explored an alternative approach based on the virtual synchrony abstraction, as proposed in [3, 38, 52]. This abstraction allows one to develop reliable applications in asynchronous distributed systems. In particular, it ensures that the processes implementing those applications eventually perceive possible failures and modifications of the system configuration. However, in a distributed real-time context such as the one we are considering, one of the principal concerns is to ensure that possible system failures and configuration changes are detected within a bounded real-time interval. Hence our design decision above.

2.5 Fault Model

Faults have been classified as benign and byzantine faults [38]. Benign faults include omission and timing faults; byzantine faults are those that exhibit an arbitrary or even malicious behavior. The fault model we consider for the k-AMT consists of benign faults only, which may occur at the virtual communication links and at the non root nodes. Typically, these faults may cause message loss and delays. In particular, an omission fault occurring at a k-AMT component (i.e. either a node or a link) causes a message never to be delivered to its destination. A timing fault, instead, causes a message not to be delivered to its destination within $D$ time units of the instant at which that message was sent. This model captures such faults as audio/video frame loss caused by, for example, network congestion, as discussed in [28]. The strategy we propose to deal with those faults, described later, is based on redundant links between k-AMT nodes. (In Section 5 of this paper, this strategy is compared with the Forward Error Correction strategy proposed in [28].) In the fault model we propose, faults causing network partitioning will result in omission or timing failures, and will be dealt with accordingly. Transient network partitions will have the same effect. However, when network connectivity is re-established, the DMMA components affected by the network partitioning will have to request to join that DMMA in order to resume their activity. A fault at a k-AMT node may cause timing and omission faults at one or more virtual links that directly connect that node to the k-AMT. Thus, in the following, we shall be concerned only with those node faults that have that effect. In addition, we shall assume that those faults are “permanent”, i.e. it is either impossible or too expensive (e.g. in terms of additional communication delays) to recover from them within a particular instance of a given DMMA.
Hence, we say that a k-AMT component is faulty at time T if it has been affected by a fault at some time $V \le T$. A k-AMT component is non faulty at time T, instead, if it has never been affected by a fault. Finally, faults at the root node can be dealt with in two ways. One is to use conventional replication techniques [1]. The other is to extend the algorithm proposed in the next Section with dynamic reconfiguration strategies; this extension is not discussed in this paper.

3 The Synchronization Support Service

In this Section we first introduce the definition of the k-AMT. Next, we describe two algorithms: the k-AMT Construction (k-AMTC) algorithm and the k-AMT Synchronization (k-AMTS) algorithm. The former constructs the k-AMT; the latter implements our proposed synchronization strategy over the k-AMT. Subsequently, we present a collection of theorems that provide, under the specified fault hypotheses, sufficient conditions for the successful termination of the k-AMTS algorithm. Finally, we discuss the performance of that algorithm and provide an analytical evaluation of it.

For the purposes of this discussion, we shall assume that no faults occur during the execution of the k-AMTC algorithm; faults may, instead, occur during the execution of the k-AMTS algorithm.

3.1 Definition of k-AMT

Let MT be a complete M-ary tree with N leaf nodes, where M is an even integer and $N = M^{d-1}$ (d being the depth of the tree). The total number of nodes in MT is $T = (MN - 1)/(M - 1)$. Each node in MT can be uniquely identified by an integer valued label i, such that $1 \le i \le T$, termed “node identification number”. The root node is assigned label $i = 1$. The M children of each node i are labeled from left to right with the sequence of consecutive integers generated by $M \cdot i - (M - 2) + j$, $j = 0, 1, \ldots, M - 1$. Each node i in MT, $1 \le i \le T$, resides at an MT level $l(i)$, such that $l(i) = l(j) + 1$, where j is the parent node of i in MT. The root is at MT level 0; hence the leaves are at MT level $d - 1$. (For the sake of brevity, this paper does not report how to construct the identification numbers of the leaves of the subtree rooted at a generic non-leaf node i. The interested reader may refer to Proposition A.1 in [45].)

An MT defined as above, and characterized by a depth $d > 2$, can be augmented to form a k-AMT by adding at most k redundant links to each of its nodes, with $1 \le k < d - 1$. In essence, a k-AMT can be derived from such an MT by linking pairs of nodes i and j at the same MT level l (i.e. such that $l = l(i) = l(j)$). The condition is that i and j share a common ancestor other than their parent node. In particular, given the value k, any node i can be linked to those nodes at the same level l as i itself whose labels j are defined as follows, for $1 \le h \le k$ and $h < l$:
• $j = i + M^h$ if $\lfloor (i - p)/M^h \rfloor$ is even;
• $j = i - M^h$ otherwise.
Here $p = (M^l - 1)/(M - 1) + 1$ is the label of the leftmost node at level l. Each such node j shares with i a common ancestor at level $l - h - 1$.

For example, the MT of Figure 2 is characterized by $M = 2$, $d = 4$, $N = 8$, $T = 15$, and its nodes are labeled as illustrated in that Figure. The k-AMT of Figure 3 can be obtained from the MT of Figure 2 by introducing a $k = 1$ redundancy of the MT links. In particular, the redundant links are those connecting the pairs of nodes (4,6) and (5,7) at level 2, and (8,10), (9,11), (12,14), and (13,15) at level 3. As shown in Figure 3, the nodes at the end points of these links share a common ancestor other than the parent node. In a k-AMT, each node at level $l > k$ has exactly k additional links, whereas each node at level $0 < l \le k$ has exactly $l - 1$ additional links.

To conclude this Subsection, we introduce the following two definitions. An “ascending path” from a node i at level $l(i)$ of the k-AMT to a node j at level $l(j)$ of the k-AMT ($l(j) < l(i)$) is a simple path from i to j. It may consist of both directed links from nodes at a generic level l to nodes at level $l - 1$, and spare undirected links between nodes at the same level. In addition, we denote with $d_{\max,i}(k\text{-AMT})$ the length of the longest ascending path existing between any leaf node j of the k-AMT and a node i situated at level $l(i) \le d - 2$ of the k-AMT. Thus, the length of the longest ascending path existing between any leaf node j of the k-AMT and the root is denoted by $d_{\max,1}(k\text{-AMT})$. (The value $d_{\max,i}(k\text{-AMT})$ is proportional to the depth of the k-AMT and may be easily calculated as shown in Proposition A.2 in [45].)
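The labelling and linking rules of this Subsection can be tried out on small trees. The Python sketch below is illustrative only; all helper names are hypothetical. It assumes a spare-link rule that pairs node i with $i + M^h$ when $\lfloor (i - p)/M^h \rfloor$ is even, and with $i - M^h$ otherwise, p being the leftmost label at level l. This rule has been checked only against the links listed for Figure 3. The function `d_max` finds $d_{\max,i}(k\text{-AMT})$ by exhaustive search over ascending paths, following the definition directly; it does not use the closed form of Proposition A.2 in [45].

```python
"""Sketch: k-AMT labels, levels and spare links, plus a brute-force d_max,i."""
from typing import Optional


def children(i: int, M: int) -> list[int]:
    # The M children of node i are labelled M*i - (M - 2) + j, j = 0, ..., M - 1.
    return [M * i - (M - 2) + j for j in range(M)]


def build_mt(M: int, d: int) -> tuple[dict[int, Optional[int]], dict[int, int]]:
    """Parent and level maps of the complete M-ary tree of depth d (root label 1)."""
    parent: dict[int, Optional[int]] = {1: None}
    level = {1: 0}
    frontier = [1]
    for lvl in range(1, d):
        next_frontier = []
        for i in frontier:
            for c in children(i, M):
                parent[c], level[c] = i, lvl
                next_frontier.append(c)
        frontier = next_frontier
    return parent, level


def spare_links(M: int, k: int, level: dict[int, int]) -> set[tuple[int, int]]:
    """Spare links as pairs (a, b), a < b, under the ASSUMED parity pairing rule."""
    assert M % 2 == 0, "the k-AMT definition requires an even M"
    links = set()
    for i, lvl in level.items():
        p = (M**lvl - 1) // (M - 1) + 1          # leftmost label at level lvl
        for h in range(1, min(k, lvl - 1) + 1):   # 1 <= h <= k and h < lvl
            j = i + M**h if ((i - p) // M**h) % 2 == 0 else i - M**h
            links.add((min(i, j), max(i, j)))
    return links


def d_max(target: int, parent: dict[int, Optional[int]], level: dict[int, int],
          links: set[tuple[int, int]]) -> int:
    """Longest ascending path from any leaf to `target` (exponential-time DFS)."""
    spare: dict[int, list[int]] = {}
    for a, b in links:
        spare.setdefault(a, []).append(b)
        spare.setdefault(b, []).append(a)
    leaf_level = max(level.values())
    best = -1

    def dfs(node: int, visited: set[int], length: int) -> None:
        nonlocal best
        if node == target:
            best = max(best, length)
            return
        if level[node] < level[target]:
            return                                # ascending paths never go back down
        up = [parent[node]] if parent[node] is not None else []
        for nxt in spare.get(node, []) + up:      # spare links, then the up-link
            if nxt not in visited:
                visited.add(nxt)
                dfs(nxt, visited, length + 1)
                visited.remove(nxt)

    for leaf, lvl in level.items():
        if lvl == leaf_level:
            dfs(leaf, {leaf}, 0)
    return best


parent, level = build_mt(M=2, d=4)               # the MT of Figure 2
links = spare_links(M=2, k=1, level=level)
print(sorted(links))  # [(4, 6), (5, 7), (8, 10), (9, 11), (12, 14), (13, 15)]
print(d_max(1, parent, level, links))            # 5
```

For the tree of Figure 3, the longest ascending path to the root uses one spare link at level 3 and one at level 2, in addition to the three up-links. The exhaustive search is only practical for small trees.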

3.2 k-AMTC Algorithm

At start up time, the k-AMTC algorithm is provided with a description of the DMMA as a collection of sites. Each site is identified by a unique site address that can be used to communicate with that site. In essence, a site represents the abstraction of a DMMA component that can be either a source, or a destination, or a synchronizer of DMMA data streams. (A site can be implemented by, for example, a single process, or a collection of communicating processes.) In this context, a site other than the source and destination sites is elected to be the root node of the k-AMT (the root election algorithm is beyond the scope of this paper). The k-AMTC algorithm operates as follows. The root node executes the following three steps.

Figure 3. k-AMT

1. The root receives as input the values of k, M, d, and the addresses of the N sites that represent the sources and the destinations of the DMMA data streams. It also receives, as input, the activation time value T. This is the time instant at which the k-AMTS algorithm is scheduled to be executed for the first time (see Section 4.1). The root sets its identification number to 1 and its level to $l(1) = 0$, and computes the value $d_{\max,1}(k\text{-AMT})$ as introduced in the previous Subsection. The M and d input values may be such that no complete M-ary tree with N leaves can be constructed. In that case, the root introduces enough additional sites, termed “virtual leaves”, to form the required complete M-ary tree. These virtual leaves act only as data stream repeaters, i.e. they do not act as data sources or destinations. (In the following, N indicates the total number of leaves of the k-AMT, including possible virtual leaves.)

2. The root assigns to the N leaves their identification numbers (calculated as described in the previous Subsection) and provides them with the activation time T. It then groups the N leaves into $M^{d-2}$ clusters of M leaves each.

3. The root elects M distinct sites, that are not leaves, to be its own son nodes. Then, it creates a link with each son i, and transmits to it the following parameters:
• its own identification number;
• the identification number i;
• the level value associated with i (i.e. $l(i) = 1$);
• the values of M, d, and k;
• the set of $M^{d-1-l(i)}$ identification numbers, with their corresponding addresses, of the leaves belonging to the subtree rooted at i (calculated as introduced in the previous Subsection).

Each node i, other than the k-AMT root, executes the following two steps.

4. Node i maintains its own identification number i, its parent identification number, the value of its own level $l(i)$, and the values of M, d, and k.
It calculates and maintains the identification numbers of the nodes to which it is to be connected through the k spare links, and establishes the appropriate links with those nodes. In addition, it computes the value $d_{\max,i}(k\text{-AMT})$, as indicated in the previous Subsection.

5. Node i elects M distinct unlabeled sites to be its own sons. It creates a link with each son j and provides it with the following parameters:
• its own identification number;
• the identification number j;
• the level value associated with j (i.e. $l(j) = l(i) + 1$);
• the values of M, d, and k;
• the set of $M^{d-l(j)-1}$ identification numbers, with their corresponding addresses, of the leaves belonging to the subtree rooted at j.

As mentioned above, steps 4 and 5 are executed identically by each node i such that $0 < l(i) < d - 1$, with the exception of the nodes at level $d - 2$. Each node j at that level does not have to select its own son nodes in step 5. Its parent node has already provided it with the addresses and identification numbers of those nodes. Finally, each node whose level is equal to $d - 1$ (i.e. a leaf node of the k-AMT) executes only step 4 of the above algorithm. To conclude this Subsection, we make the following three observations.

• The k-AMTC algorithm can be executed, in a distributed fashion, in time $O(\log_M N)$.
• Within the scope of this paper, we have not investigated fault tolerance policies for the k-AMTC algorithm; we regard this as a subject for future studies.
• We have assumed that the values of M and d are provided as input to the root. These two input values could be replaced by other parameters, for example particular Quality of Service (QOS) [34] values, from which the root node calculates M and d. This approach may well require some sort of knowledge base that helps the root evaluate the M and d values most appropriate to the required QOS. Yet again, we consider this issue a possible subject for future studies.
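The Python sketch below covers steps 1 and 2 executed by the root, together with the leaf sets forwarded in steps 3 and 5. It is a hedged illustration, not the authors' code, and the function names are hypothetical. Sites are plain strings standing in for site addresses, and `virtual-<n>` is a placeholder name for a virtual leaf. Placing the virtual leaves at the rightmost positions is an arbitrary choice. The contiguous leaf ranges follow from the child-labelling formula $M \cdot i - (M - 2) + j$; they are not taken from Proposition A.1 in [45].

```python
"""Sketch of k-AMTC steps 1-2 (virtual leaves, clustering) and of subtree leaf sets."""


def leaf_labels(M: int, d: int) -> list[int]:
    """Identification numbers of the M**(d-1) leaves, i.e. the nodes at level d - 1."""
    first = (M ** (d - 1) - 1) // (M - 1) + 1
    return list(range(first, first + M ** (d - 1)))


def assign_leaves(sites: list[str], M: int, d: int) -> tuple[dict[int, str], list[list[int]]]:
    """Map leaf labels to sites, padding with virtual leaves, and form M**(d-2) clusters."""
    n_leaves = M ** (d - 1)
    if len(sites) > n_leaves:
        raise ValueError("M and d are too small for the given number of sites")
    # ASSUMPTION: virtual leaves (stream repeaters only) take the rightmost labels.
    padded = list(sites) + [f"virtual-{n}" for n in range(n_leaves - len(sites))]
    labels = leaf_labels(M, d)
    assignment = dict(zip(labels, padded))
    # Siblings carry consecutive labels, so each cluster is a run of M labels.
    clusters = [labels[c * M:(c + 1) * M] for c in range(M ** (d - 2))]
    return assignment, clusters


def subtree_leaves(i: int, level_i: int, M: int, d: int) -> list[int]:
    """Labels of the M**(d-1-level_i) leaves in the subtree rooted at node i."""
    lo = hi = i
    for _ in range(d - 1 - level_i):
        lo = M * lo - (M - 2)              # first child of the leftmost node
        hi = M * hi - (M - 2) + (M - 1)    # last child of the rightmost node
    return list(range(lo, hi + 1))


assignment, clusters = assign_leaves(["s1", "s2", "s3", "s4", "s5"], M=2, d=4)
print(assignment[13])                      # virtual-0
print(clusters)                            # [[8, 9], [10, 11], [12, 13], [14, 15]]
print(subtree_leaves(2, 1, M=2, d=4))      # [8, 9, 10, 11]
```

Because siblings have consecutive labels, every subtree covers a contiguous range of leaf labels. This is why `subtree_leaves` only needs to track the two end points while descending.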

3.3 k-AMTS Algorithm

The k-AMTS algorithm implements the synchronization strategy introduced in Subsection 2.2. This algorithm operates over a k-AMT in two distinct phases.
• In the first phase (termed Collect Phase), the parent node of a cluster of leaves receives data streams from its (source) leaves, integrates them to produce a CMDS, and transmits that CMDS to its immediate ancestor. Each CMDS includes the identification numbers of the k-AMT leaf nodes that originated the individual data streams used to construct it. This procedure is executed by each nonleaf node of a k-AMT, up to the root of the k-AMT, which constructs the FSMDS.
• In the second phase (termed Disseminate Phase), the root multicasts the FSMDS to the (destination) leaves.

Before describing the two phases in detail, we define the worst case message transmission delay that can be experienced during each of them as the value $\Delta_1 = d_{\max,1}(k\text{-AMT}) \cdot D$. Here D is the communication-processing delay introduced in assumption A3 of Subsection 2.3. The total worst case message transmission delay is defined by the value $\Delta = 2\Delta_1$. The Collect and Disseminate phases are described in detail below.

Collect Phase

In this phase, each nonleaf node i integrates the data streams originated from its own son nodes and delivered to i within a bounded time interval. The identification numbers provided with those streams may match the set of identification numbers which identify the (source) leaves of the subtree rooted at i. If so, the data stream integration executed by node i is said to produce a CMDS for node i. As mentioned earlier, the CMDS for the root node is termed FSMDS; the CMDS of a leaf node, instead, is its own individual data stream.
The set of stream identification numbers received at i may instead fail to match the set of identification numbers of the (source) leaves of the subtree rooted at i. In that case, the Collect Phase uses the additional links at each node, as described below.

• Each (source) k-AMT leaf node i sends to its immediate ancestor the individual data stream it originates, timestamped with the data stream origin time O, together with its own identification number. In addition, it sends “replicas” of that stream (with the timestamp O and its identification number) to each node linked with i through a spare link. Finally, i may receive the first undelivered replica of a data stream from any other leaf node j. It then compares the origin time O attached to that replica with the time value F of its own clock. If $F > O + \Delta_1$, that replica has been received too late for retransmission and is discarded. Otherwise, it is delivered to node i, and forwarded both to its immediate ancestor and to each node linked with i through a spare link. Needless to say, i does not send to j the replicas it has received from j itself.
• Each nonleaf node operates in three distinct sequential sub phases: the reception sub phase, the synchronization sub phase, and the transmission sub phase. The principal actions carried out in each of these three sub phases are introduced below.

Transmission sub phase
– Each nonleaf node i (other than the root) sends its CMDS (if any, see below) to its immediate ancestor. The CMDS is timestamped with the origin time O of the individual data streams that compose it.
– Each nonleaf node i (other than the root) forwards each delivered replica to its immediate ancestor and to each node j linked with i through an additional link. As usual, i does not send to j the replicas it has received from j itself.
Reception sub phase
– Each nonleaf node i receives from its M sons the CMDSs they transmit. Node i can detect possible timing faults in the transmission of those CMDSs. It compares the origin time value O attached to each CMDS with the time value F of its own clock. If $F > O + D \cdot d_{\max,i}(k\text{-AMT})$, the CMDS has been received too late for integration and is discarded. Otherwise, it is delivered to node i. A timing fault may cause a CMDS (timestamped with some origin time value O) from any of the M sons to arrive too late for integration. Node i then raises a “CMDS timing fault” exception and discards all the CMDSs timestamped with the same origin time O received from all the other sons.
– Each nonleaf node i receives all the transmitted replicas, both from its M sons and from each node j linked with i through a spare link. Yet again, node i can detect possible timing faults. It compares the origin time O attached to the “first instance” of each undelivered replica with the time value F of its own clock. Thus, for a replica generated at time O, if $F > O + \Delta_1$, that replica has been received too late for retransmission and is discarded. Otherwise, it is delivered to node i. Since multiple paths exist between any two nodes in a k-AMT, a data stream originated from a source node j may reach a synchronizer node i more than once. Hence the phrase “first instance” above.
Synchronization sub phase
– Assume first that no “CMDS timing fault” exception has been raised. Each node i then integrates the CMDSs (timestamped with the same origin time O) received from its sons, when its local clock displays time $F = O + D \cdot d_{\max,i}(k\text{-AMT})$. The resulting multimedia data stream is a CMDS for node i.
– Suppose instead that a “CMDS timing fault” exception has been raised, or that node i has not received all the expected CMDSs from its sons by time F. Node i then examines all the replicas it has delivered. Some of these replicas (all timestamped with the same origin time O) may carry identification numbers matching the set of identification numbers of all the leaves of the subtree rooted at i. If so, the integration of these replicas is executed, and the resulting multimedia data stream constitutes a CMDS for node i. Otherwise, no CMDS for node i can be produced; thus, node i (with the exception of the root) acts as a replica repeater only.

Disseminate Phase

In this phase, the root of the k-AMT transmits a copy of the FSMDS to each son. The root executes this transmission exactly when its local clock displays the time $O + \Delta_1$. If the root has not been able to produce the FSMDS by that time, it transmits a “null” message down to the leaf nodes to alert them. Those nodes will deal with “null” messages by discarding them, as discussed below. The transmission procedure is then executed by each nonleaf node of the k-AMT, which forwards copies of the FSMDS to all its sons, down to the leaf destination nodes. In this phase too, the additional links at each node are used to overcome problems that may arise from faulty links, as described in the following.

• Each nonleaf node i receives the first undelivered copy of an FSMDS (timestamped with the origin time O). It delivers that copy only if it is not too late for retransmission with respect to the time value F of its own clock; it is too late if $F > O + \Delta$. If the copy is not too late, node i forwards it both to each son and to each node linked with i through a spare link. (As usual, i does not send to j an FSMDS it has received from j itself.)
• Each (destination) leaf i receives the first undelivered copy of an FSMDS. It compares the origin time O attached to that FSMDS with the time value F of its own clock. The message is discarded if $F > O + \Delta$ (i.e. the FSMDS has been received too late for rendering), or if it is a “null” message. Otherwise, it is delivered and played out exactly when the local clock of i displays the time value $O + \Delta$. Finally, if the message is not discarded, i forwards it to each leaf node linked with i through a spare link. As usual, this excludes the leaf nodes from which i has received the FSMDS.
• Each leaf node i which is not a destination receives the first not yet delivered copy of an FSMDS (timestamped with the origin time O). It compares O with the time value F of its own clock. If $F > O + \Delta$ (i.e. the FSMDS has been received too late for retransmission), the message is discarded. Otherwise, it is forwarded to each other leaf node linked with i through its spare links. (Needless to say, i does not send to leaf node j the FSMDS it has received from j itself.)

As a final remark, a data stream may in general experience timing faults over a link that directly connects its sending node to its receiving node. Such a fault can be easily detected. The sending node includes its transmission time in the stream, and the receiving node compares that time with its local clock value. An exception can be raised if the local clock value is greater than the transmission time included in the stream plus the bounded delay D (defined in assumption A3 in Subsection 2.3). That exception can be handled appropriately at the application level, if required.
The k-AMTS algorithm deliberately ignores that exception, because it only requires streams to reach their destinations within bounded end-to-end time intervals. The particular timing fault mentioned above does not necessarily cause a violation of those time intervals.
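The timing rules and per-node decisions of the k-AMTS algorithm can be condensed into a few pure functions. The Python sketch below is hypothetical throughout, including the names `Timing`, `synchronization_subphase` and `leaf_on_fsmds`. It assumes perfectly synchronized clocks (the zero clock-skew case assumed in Subsection 2.3) and integer clock readings in milliseconds. It defines $\Delta_1 = d_{\max,1}(k\text{-AMT}) \cdot D$ and $\Delta = 2\Delta_1$ explicitly, and it models only the decision taken for a single origin time O, not the transport of the streams.

```python
"""Sketch of k-AMTS timing rules and node-local decisions (one origin time O at a time)."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Timing:
    D1: int          # bounded link delay (assumption A2), ms
    D2: int          # bounded processing delay (assumption A3), ms
    d_max_root: int  # d_max,1(k-AMT)

    @property
    def D(self) -> int:
        return self.D1 + self.D2              # D = D1 + D2

    @property
    def delta1(self) -> int:
        return self.d_max_root * self.D       # Delta1: worst case delay of one phase

    @property
    def delta(self) -> int:
        return 2 * self.delta1                # Delta = 2 * Delta1

    def integration_time(self, O: int, d_max_i: int) -> int:
        """Clock value F at which nonleaf node i integrates the CMDSs stamped O."""
        return O + self.D * d_max_i


def synchronization_subphase(sons: set[int], cmds_from: set[int], timing_fault: bool,
                             replica_ids: set[int], subtree: set[int]) -> str:
    """Decision of nonleaf node i at F = O + D * d_max,i.

    cmds_from   -- sons whose CMDS stamped O was delivered in time
    replica_ids -- source-leaf ids attached to delivered replicas stamped O
    subtree     -- ids of the (source) leaves of the subtree rooted at i
    """
    if not timing_fault and cmds_from == sons:
        return "integrate CMDSs"
    if subtree <= replica_ids:                # replicas cover every source leaf
        return "integrate replicas"
    return "act as replica repeater"


def leaf_on_fsmds(O: int, F: int, delta: int, is_null: bool, is_destination: bool,
                  sender: int, spare_neighbours: set[int],
                  handled: set[int]) -> tuple[Optional[int], set[int]]:
    """Leaf handling of one FSMDS copy: (play-out time or None, leaves to forward to)."""
    if O in handled:                          # not the first instance: ignore it
        return None, set()
    if is_null or F > O + delta:              # "null" message, or too late
        return None, set()
    handled.add(O)
    forward_to = spare_neighbours - {sender}  # never send back to the sender
    return (O + delta if is_destination else None), forward_to


t = Timing(D1=20, D2=30, d_max_root=5)
print(t.delta1, t.delta)                                               # 250 500
print(synchronization_subphase({8, 9}, {8}, False, {8, 9, 10}, {8, 9}))  # integrate replicas
seen = set()
print(leaf_on_fsmds(0, 400, t.delta, False, True, 4, {10}, seen))      # (500, {10})
print(leaf_on_fsmds(0, 450, t.delta, False, True, 10, {10}, seen))     # (None, set())
```

Keeping these decisions free of I/O makes the deadlines easy to check in isolation. The second FSMDS copy above is ignored because only the first instance for a given origin time is handled.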


3.3.1 Correctness Analysis

To meet the IR requirement effectively in a critical DMMA context, the k-AMTS algorithm has to possess the following three properties, derived from [13].

• Atomicity property: every data stream generated by a non faulty source is played out either by all the non faulty destinations or by none of them.
• Isochronous termination property: all the data streams generated by non faulty sources at a given time T are played out isochronously by all the non faulty destinations, after the same time interval I from T.
• Order property: all the non faulty destinations render the data streams in the same order as they are transmitted.

The motivations for these three properties can be summarized as follows. The k-AMTS algorithm deals with a data object, the FSMDS, that is effectively shared among a collection of k-AMT output nodes for rendering purposes. The Atomicity property ensures that the consistency of that FSMDS is maintained among those nodes. The Isochronous termination property guarantees that those nodes can render the FSMDS at the same time. Finally, the Order property guarantees that the FSMDS is played out accurately preserving the time relationships among the individual data streams from which it is constructed. In essence, if the k-AMTS algorithm lacked any one of these properties, the FSMDS rendered by the k-AMT destination nodes could become inconsistent. As we consider critical DMMAs that support cooperation among groups of end users, such inconsistencies may have disastrous consequences. To discuss these three properties of the k-AMTS algorithm in detail, we introduce the following definitions. Consider the individual data streams originated at some particular time O. We indicate with $F_1(O)$ the set of (both omission and timing) faults occurring at the k-AMT nodes and links during the Collect Phase of those data streams. Similarly, $F_2(O)$ is the set of (both omission and timing) faults occurring at the k-AMT nodes and links during the Disseminate Phase of the FSMDS that integrates those streams.


We say that the set of faults $F_1(O)$ leaves the k-AMT connected if the following holds: for each individual data stream originated at time O, there exists an ascending path of non faulty nodes and links from its source node to the root of the k-AMT. Similarly, we say that the set of faults $F_2(O)$ leaves the k-AMT connected if the following holds: there exists an ascending path of non faulty nodes and links from each (destination) leaf node to the root, through which the root can transmit the FSMDS. In the following, we show that the k-AMTS algorithm possesses the atomicity, isochronous termination and order properties according to the above definitions. This holds provided that:
(i) the accurate clock synchronization hypothesis is satisfied (i.e. assumption A4 in Subsection 2.3);
(ii) the faults occurring at the nodes and links of the k-AMT during the execution of the two sequential phases of the k-AMTS algorithm leave the k-AMT connected.
(For the sake of brevity, the proofs of the theorems are not included in the following; the interested reader can refer to [45].)
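The Collect Phase connectivity condition can be checked mechanically with a breadth-first search that follows only up-links and spare links. Any such walk can be shortened to a simple ascending path. The helper below is hypothetical and not part of the paper. It models faulty links as unordered pairs and faulty nodes as a set. It encodes the k-AMT of Figure 3 by hand; for $M = 2$, the parent of node i is $\lfloor i/2 \rfloor$.

```python
"""Sketch: does a set of faults leave the k-AMT connected (Collect Phase sense)?"""
from collections import deque


def collect_connected(parent: dict[int, int], spare: set[tuple[int, int]],
                      sources: list[int],
                      faulty_links: frozenset[frozenset[int]] = frozenset(),
                      faulty_nodes: frozenset[int] = frozenset(),
                      root: int = 1) -> bool:
    """True if every non faulty source reaches the root via non faulty up/spare links."""
    neighbours: dict[int, set[int]] = {}
    for a, b in spare:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def usable(u: int, v: int) -> bool:
        return frozenset((u, v)) not in faulty_links and v not in faulty_nodes

    for s in sources:
        if s in faulty_nodes:
            continue                          # only non faulty sources are considered
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            steps = set(neighbours.get(u, set()))
            if parent.get(u) is not None:
                steps.add(parent[u])          # directed up-link towards the root
            for v in steps:
                if v not in seen and usable(u, v):
                    seen.add(v)
                    queue.append(v)
        if root not in seen:
            return False
    return True


parent = {i: i // 2 for i in range(2, 16)}    # k-AMT of Figure 3 (M = 2)
spare = {(4, 6), (5, 7), (8, 10), (9, 11), (12, 14), (13, 15)}
leaves = list(range(8, 16))
print(collect_connected(parent, spare, leaves))                                 # True
print(collect_connected(parent, spare, leaves, frozenset({frozenset((8, 4))})))  # True
print(collect_connected(parent, spare, leaves,
                        frozenset({frozenset((8, 4)), frozenset((8, 10))})))     # False
```

With the link (8,4) faulty, leaf 8 still reaches the root through its spare link to node 10. If that spare link fails as well, leaf 8 is cut off.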


Proposition 3.1 Let S be the set of data streams originated by all the non faulty (source) leaf nodes of a given k-AMT at time O of their clocks. Assume that the set $F_1(O)$ of faults occurring at the k-AMT nodes and links during the execution of a Collect Phase of the k-AMTS algorithm leaves the k-AMT connected. Assume also that the accurate clock synchronization hypothesis is satisfied. Then the root of the k-AMT produces the FSMDS containing S by time $O + \Delta_1$ of its clock.


Assuming that the hypotheses of Proposition 3.1 are satisfied, we have the following:


Proposition 3.2 Assume that the set $F_2(O)$ of faults occurring at the nodes and links of the k-AMT during the execution of a Disseminate Phase of the k-AMTS algorithm leaves the k-AMT connected. Then, if the FSMDS is delivered to a non faulty (destination) leaf node j, it is also delivered to all the non faulty (destination) leaf nodes by time $O + \Delta$ of their clocks.

+

Theorem 3.1 Assume that both the fault hypotheses specified in Propositions 3.1 and 3.2 above and the accurate clock synchronization hypothesis are satisfied. Then the k-AMTS algorithm possesses the isochronous termination, atomicity and order properties.

The following theorems provide sufficient conditions guaranteeing that the k-AMT infrastructure remains connected when omission and timing faults occur at its nodes and links during the execution of the Collect and Disseminate Phases. Note that, as introduced in Subsection 2.3, we assume that the failure of a node during the execution of the two sequential phases of the k-AMTS algorithm may correspond to a fault occurring at one or more of the communication links connecting that node to its neighbors.

Theorem 3.2 Let $r_l$ be the set of links between the nodes at level $l-1$ and the nodes at level $l$ of the k-AMT, and let $s_l$ be the set of the spare links (if any) between the nodes at level $l$. During the execution of a Collect Phase of the k-AMTS algorithm, the k-AMT remains connected if at most

$$S' = \frac{2k(d-k-1) + k(k-1)}{2}$$

faults occur at the k-AMT links and, out of these $S'$ faults:

1. no more than $k$ faults occur at $r_l \cup s_l$, for each level $l$ such that $k < l \le d-1$;
2. no more than $l-1$ faults occur at $r_l \cup s_l$, for each level $l$ such that $0 < l \le k$.
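The per-level conditions of Theorem 3.2 (and, identically, of Theorem 3.3) can be checked mechanically. The Python sketch below assumes that levels are numbered from 0 at the root to $d-1$ at the leaves, and that the faulty links of one phase have already been counted per level (a hypothetical input format); the function names are illustrative. For the architectures of Section 3.3.3 (k = 1) it gives $S' = 2$, 3 and 4 for d = 4, 5 and 6.

```python
def max_innocuous_faults(k, d):
    """S' (equivalently S''): (2k(d - k - 1) + k(k - 1)) / 2.
    k(k - 1) is always even, so integer division is exact."""
    return (2 * k * (d - k - 1) + k * (k - 1)) // 2


def level_bound(level, k):
    """Maximum number of faults allowed in r_l ∪ s_l at level l."""
    return k if level > k else level - 1


def is_innocuous(faults_per_level, k, d):
    """faults_per_level maps each level l (1 <= l <= d - 1) to the number
    of faulty links counted in r_l ∪ s_l during one phase."""
    for level, faults in faults_per_level.items():
        if not 1 <= level <= d - 1:
            raise ValueError(f"level {level} outside 1..{d - 1}")
        if faults > level_bound(level, k):
            return False
    # Total bound, checked to mirror the statement of the theorem.
    return sum(faults_per_level.values()) <= max_innocuous_faults(k, d)
```

For example, with k = 1 and d = 4, one faulty link at level 2 and one at level 3 form an innocuous set, whereas a single faulty link at level 1 does not satisfy condition 2 (its bound is $l - 1 = 0$).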

Definition 3.1 A set of at most $S'$ faults, occurring during a Collect Phase of the k-AMTS algorithm and satisfying the conditions of Theorem 3.2, is said to be innocuous.

Lemma 3.1 During the execution of a Collect Phase of the k-AMTS algorithm, the k-AMT remains connected, for any acceptable value of k, if at most one fault occurs at the k-AMT links.


Theorem 3.3 Let $r_l$ and $s_l$ be defined as in Theorem 3.2. During a Disseminate Phase of the k-AMTS algorithm, the k-AMT remains connected if at most

$$S'' = \frac{2k(d-k-1) + k(k-1)}{2}$$

faults occur at the k-AMT links and, out of these $S''$ faults:

1. no more than $k$ faults occur at $r_l \cup s_l$, for each level $l$ such that $k < l \le d-1$;
2. no more than $l-1$ faults occur at $r_l \cup s_l$, for each level $l$ such that $0 < l \le k$.

Definition 3.2 A set of at most $S''$ faults, occurring during a Disseminate Phase and satisfying the conditions of Theorem 3.3, is said to be innocuous.

Lemma 3.2 During the execution of a Disseminate Phase of the k-AMTS algorithm, the k-AMT remains connected, for any acceptable value of k, if at most one fault occurs at the k-AMT links.

Corollary 3.1 During the execution of the k-AMTS algorithm, the k-AMT communication structure remains connected if an innocuous set of faults occurs during the Collect Phase and an innocuous set of faults occurs during the Disseminate Phase.

3.3.2 Bandwidth Usage and Message Overhead

The following propositions show that the bandwidth [55] required to implement the k-AMTS algorithm depends only on the number k of redundant channels and on the number M of sons per k-AMT node. (Note that these requirements are independent of the number of source and destination leaves of the k-AMT.)

Proposition 3.3 The maximum bandwidth required for message reception at each k-AMT node during the Collect Phase is proportional to $k + M$.

Proposition 3.4 The maximum bandwidth required for message transmission at each k-AMT node during the Collect Phase is proportional to $k + 1$.

Proposition 3.5 The maximum bandwidth required for message reception at each k-AMT node during the Disseminate Phase is proportional to $k + 1$.

Proposition 3.6 The maximum bandwidth required for message transmission at each k-AMT node during the Disseminate Phase is proportional to $k + M$.

In conclusion, the following corollary holds.

Corollary 3.2 The maximum bandwidth required for message reception and transmission at each k-AMT node, during both the Collect Phase and the Disseminate Phase, is proportional to $2k + M + 1$.
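As a small worked illustration of Propositions 3.3 to 3.6 and Corollary 3.2, the Python sketch below tabulates the per-node figures. Reading "proportional to" as a count of stream-sized channels is an interpretation made here for illustration, and the function names are hypothetical; the point is that N, the number of leaves, never appears.

```python
def per_node_bandwidth(k, M):
    """Relative per-node bandwidth for each phase (Propositions 3.3-3.6);
    constants of proportionality are dropped."""
    return {
        "collect": {"receive": k + M, "transmit": k + 1},
        "disseminate": {"receive": k + 1, "transmit": k + M},
    }


def per_node_total(k, M):
    """Corollary 3.2: reception plus transmission within a phase.
    Both phases yield the same figure, 2k + M + 1."""
    phases = per_node_bandwidth(k, M)
    return max(sum(p.values()) for p in phases.values())
```

For the architectures evaluated in Section 3.3.3 (k = 1, M = 4), each node needs capacity proportional to 7 streams, regardless of whether the k-AMT has 64 or 1024 leaves.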

The synchronization policy we propose can tolerate a number of faulty links at the cost of message overhead. In the following we evaluate this overhead and show that it is sufficiently contained to justify deploying the policy in support of critical DMMAs. In addition, we show that the policy scales well with the number of k-AMT leaves.

Corollary 3.3 Assuming that no faults occur during the execution of the Collect Phase over a k-AMT with N leaves, the total number of messages sent throughout the entire k-AMT can be estimated as:

$$Z' = O(N(\log_M N + 1)).$$

Proposition 3.7 Assuming that no faults occur during an execution of the Disseminate Phase over a k-AMT with N leaves, the total number of messages sent throughout the entire k-AMT can be estimated as:

$$Z'' = O(N).$$

Lemma 3.3 Assuming that no faults occur during an execution of the k-AMTS algorithm, the total message overhead entailed by our synchronization policy, estimated in terms of the number of messages sent throughout the entire k-AMT, is:

$$Z = O(N(\log_M N + 1)).$$

In conclusion, the total message overhead of the k-AMTS algorithm, estimated in terms of number of messages, is $O(N(\log_M N + 1))$, where N is the number of k-AMT leaf nodes.

The bandwidth requirements and the message overhead of the k-AMTS algorithm are contrasted below with those of both a fully decentralized and a centralized synchronization policy. In a DMMA supporting N participant nodes, a (generic) fully decentralized synchronization policy can be implemented by requiring that each node send its multimedia data stream to each of the other $N-1$ nodes; stream synchronization is then performed independently by each of the N nodes. If we assume that this policy is deployed within a fully interconnected network by implementing N atomic reliable broadcasts [30], each participant node requires a bandwidth proportional to N, and the total message overhead is (in the worst case) $O(N^3)$. However, this policy can tolerate up to $N-2$ link failures. In contrast, if the DMMA implements a centralized synchronization policy, each participant node transmits its data stream directly to a centralized synchronizer, which synchronizes the data streams it receives and transmits the resulting CMDS directly to all the destination nodes. Consequently, the bandwidth required by this policy at each node is constant, and the message overhead is $O(N)$; however, this policy tolerates no link failures. In view of these observations, we claim that our algorithm scales well with the number of leaf nodes in the k-AMT, with respect to both bandwidth consumption and total message overhead.
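To make the growth rates above concrete, the Python sketch below evaluates the three asymptotic terms for a few values of N. These are growth terms with constants dropped, not message counts, so they only indicate how the three policies diverge as N grows; the function name is hypothetical.

```python
import math


def message_growth(N, M):
    """Asymptotic message-overhead terms (constants dropped) for N
    participant leaves and M sons per k-AMT node."""
    return {
        "k-AMTS": N * (math.log(N, M) + 1),  # O(N (log_M N + 1))
        "decentralized": N ** 3,             # N atomic reliable broadcasts
        "centralized": N,                    # single synchronizer
    }


if __name__ == "__main__":
    for n in (64, 256, 1024):
        g = message_growth(n, 4)
        print(n, round(g["k-AMTS"]), g["decentralized"], g["centralized"])
```

For N = 64 and M = 4 the k-AMTS term is 256, against 262144 for the decentralized policy; the centralized term stays linear, but that policy tolerates no link failures.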

3.3.3 Analytical Evaluation

In this Subsection we discuss an analytical evaluation of the k-AMTS algorithm, and assess the effectiveness of its operation when deployed in three different k-AMT architectures, introduced below. Since the end-to-end latency is considered the most revealing indicator of the performance of a multimedia system [28], we have evaluated the end-to-end latency experienced by audio and video frames that are exchanged over these three architectures and synchronized by the k-AMTS algorithm.

The first k-AMT architecture we have considered supports a total of 64 leaf nodes, and is characterized by the following parameters: M = 4, d = 4, and k = 1; that is, the k-AMT leaves are grouped in 16 clusters, each of which consists of 4 leaf nodes. The second k-AMT architecture supports a total of 256 leaf nodes, and is characterized by M = 4, d = 5, and k = 1, yielding 64 clusters of k-AMT leaves (yet again, each consisting of 4 leaf nodes). Finally, the third k-AMT architecture supports a total of 1024 leaf nodes, grouped in 256 clusters (consisting of 4 leaf nodes each); the resulting k-AMT is characterized by M = 4, d = 6, and k = 1.

Note that the principal differences among these three architectures are the number of leaf nodes and the depth of the corresponding k-AMT. For each of these architectures we have evaluated three different scenarios within which the multimedia data stream exchange may occur, in order to assess the scalability of the k-AMTS algorithm with respect to both the number of sources and destinations of multimedia data streams in a DMMA, and their geographical separation. In the first scenario, only two leaves of the k-AMT, sharing a grandparent node, act as sources of data streams. In the second scenario, three k-AMT leaf nodes act as sources of data streams; two of them share a grandparent node in the k-AMT, whereas the third is situated in a different subtree. In the third scenario, four leaves of the k-AMT act as data stream sources: two of them are situated in a given subtree and share a grandparent node, while the remaining two are situated in a different subtree and share a grandparent node as well.

For the purposes of our discussion, the following assumptions have been made. The data transfer rate over the k-AMT can assume one of three values: 2 Mega-bits per second (Mbps), 10 Mbps, or 16 Mbps (all the k-AMT links provide the same data transfer rate). Color video frames (e.g. at 240 × 256 resolution) are compressed to approximately 60 to 64 kbits, and combined with audio frames of 4 kbits; the continuous acquisition and display of the video frames requires 33 milliseconds [28]. Finally, we have evaluated the latency experienced by 1 Mbit non interactive video frames, assuming a typical ATM data transfer rate of 4.14 Mega-bytes per second (MBps) [21], over the three k-AMT architectures introduced above.

Tables 1, 2, and 3 summarize the results of our evaluation. Each Table refers to one of the three k-AMT architectures introduced above, and reports the end-to-end latency (in milliseconds) experienced by a frame transported over that architecture from its source nodes to its destination nodes, separately for the cases in which 2, 3, and 4 k-AMT leaf nodes act as data stream sources. The end-to-end latency values for the same k-AMT architectures without link redundancy (i.e. k = 0) are reported in brackets. Table 4 summarizes the percentage of additional latency due to a k = 1 redundancy, with respect to the same k-AMT architectures without link redundancy.

As pointed out in [21], the end-to-end latency in interactive audio applications should not exceed 300 milliseconds; in contrast, in non interactive video applications, a maximum delay of 1000 milliseconds can be tolerated. Thus, the results summarized in Tables 1, 2, and 3 show that a k-AMT based on 2 Mbit/s links would be inappropriate to support the synchronization strategy we propose, as all the latency values in such a k-AMT exceed 300 milliseconds. Instead, our approach appears to be feasible when the link data rates are in the range of 10 to 16 Mbit/s. For non interactive video frames transported by ATM based networks, our approach appears to introduce an intolerable latency only in the largest configurations (the 256-leaf k-AMT with 4 sources, and the 1024-leaf k-AMT with 3 or 4 sources); in the remaining cases, the end-to-end latency of a 1 Mbit frame does not exceed the required 1000 milliseconds. Moreover, Table 4 shows that the cost, in terms of latency, of the redundancy introduced in order to tolerate as many faults as those calculated in Section 3.3.1 does not exceed 81% of the latency experienced by a frame over the same k-AMT architectures without redundancy.

Finally, Table 5 shows the total number of messages transmitted by the k-AMTS algorithm over the three k-AMT architectures introduced above; the total numbers of messages for the same k-AMT architectures without link redundancy (i.e. k = 0) are reported in brackets. It is worth observing that these values are far below the theoretical upper bound of Lemma 3.3.

Frame size / data rate | 64 kbits / 2 Mbps | 64 kbits / 10 Mbps | 64 kbits / 16 Mbps | 1 Mbit / 4.14 MBps
2 sources | 530 (312) | 106 (62.5) | 66 (40) | 513 (301)
3 sources | 687 (406) | 137 (81.2) | 85 (50) | 664 (394)
4 sources | 840 (500) | 168 (100) | 105 (62.5) | 815 (480)

Table 1. Latency in milliseconds for a k-AMT with 64 leaves, M=4, d=4, k=1 (k=0).

Frame size / data rate | 64 kbits / 2 Mbps | 64 kbits / 10 Mbps | 64 kbits / 16 Mbps | 1 Mbit / 4.14 MBps
2 sources | 718 (406) | 143 (81) | 89 (50) | 694 (392)
3 sources | 935 (531) | 187 (106) | 117 (66) | 905 (514)
4 sources | 1155 (656) | 231 (131) | 144 (82) | 1117 (657)

Table 2. Latency in milliseconds for a k-AMT with 256 leaves, M=4, d=5, k=1 (k=0).

Frame size / data rate | 64 kbits / 2 Mbps | 64 kbits / 10 Mbps | 64 kbits / 16 Mbps | 1 Mbit / 4.14 MBps
2 sources | 905 (500) | 181 (100) | 113 (62.5) | 975 (483)
3 sources | 1187 (656) | 237 (131) | 148 (82) | 1147 (634)
4 sources | 1465 (812) | 293 (162) | 183 (101) | 1419 (784)

Table 3. Latency in milliseconds for a k-AMT with 1024 leaves, M=4, d=6, k=1 (k=0).

k-AMT size | 64 leaves | 256 leaves | 1024 leaves
2, 3, 4 sources | 68% | 76% | 81%

Table 4. Percentage of additional latency due to a redundancy k=1.
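The percentages in Table 4 and the feasibility discussion above can be reproduced from the table entries with a little arithmetic. The Python sketch below is illustrative: the budgets are the 300 ms and 1000 ms figures quoted from [21], the data are copied from the 4-source row of Table 1, and the function names are hypothetical. For that row the overhead is exactly the 68% reported in Table 4; the remaining entries of Table 1 give ratios between about 65% and 70%.

```python
INTERACTIVE_BUDGET_MS = 300       # interactive audio, as quoted from [21]
NON_INTERACTIVE_BUDGET_MS = 1000  # non interactive video, as quoted from [21]

# Table 1 (64 leaves), 4 sources: (latency with k = 1, latency with k = 0) in ms.
TABLE1_4_SOURCES = {
    "64 kbits / 2 Mbps": (840, 500),
    "64 kbits / 10 Mbps": (168, 100),
    "64 kbits / 16 Mbps": (105, 62.5),
    "1 Mbit / 4.14 MBps": (815, 480),
}


def redundancy_overhead_pct(latency_k1, latency_k0):
    """Additional latency of k = 1 over k = 0, as a percentage of k = 0."""
    return 100.0 * (latency_k1 - latency_k0) / latency_k0


def within_budget(latency_ms, interactive):
    """True if a latency value meets the applicable end-to-end budget."""
    budget = INTERACTIVE_BUDGET_MS if interactive else NON_INTERACTIVE_BUDGET_MS
    return latency_ms <= budget
```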

k-AMT size | 64 leaves | 256 leaves | 1024 leaves
2 sources | 157 (91) | 558 (348) | 2111 (1374)
3 sources | 178 (95) | 590 (353) | 2154 (1380)
4 sources | 190 (97) | 608 (356) | 2178 (1384)

Table 5. Number of messages for a k-AMT (M=4) with 64, 256, 1024 leaves and k=1 (k=0).

4 The Group-Membership Service

DMMAs such as those mentioned above require that the exchange of the data streams be coordinated among a group of application users. Hence, there is scope for investigating the use of the group communication paradigm [32, 59] in order to meet that requirement. That paradigm offers a further benefit, as it can be used for constructing adequate fault tolerance mechanisms [30] that cope effectively with the critical nature of those applications. The Group-Membership (GM) service we propose allows the source and destination sites of a given DMMA to join a group, in order to participate in that DMMA (a site not joined to a group is not allowed to participate in any activity of that group). In addition, the GM service allows sites in a group to leave that group voluntarily, and thus to be disconnected from the activities that group is carrying out. The disconnection of a site from its groups may also be caused by a fault of that site or of its communication links, resulting in omission or timing faults; any such case is dealt with by the GM service as a voluntary disconnection of that site. As defined in Subsection 3.2, a site represents the abstraction of a DMMA component (i.e. it can represent either a source, a destination, or a synchronizer), and is identified by a unique site-address (or site-identifier). Each source and destination site of a given DMMA is represented by a leaf node in the k-AMT associated with that DMMA. As the source and destination sites of a DMMA are joined in a group, there is a one-to-one correspondence between the group members and the leaf nodes of the k-AMT corresponding to that DMMA; hence, in the following, we will identify a given group with its corresponding k-AMT, and use the term “sites” to refer to the leaves of a k-AMT, and vice versa. Groups are maintained by the Group-Membership Service described below.
We introduce below a set of predicates that allow us to define five properties that are to be satisfied by that service. Let S be the set of all the site-identifiers that can take part in all the possible DMMAs. In order to identify all the different groups unambiguously, we assign to each group a unique group-identifier (defined in detail later). Let G denote the set of all the possible group-identifiers. We denote by $joined: S \to \{true, false\}$ the predicate that, for any site $s \in S$, is true when s is joined to some group, and false otherwise. Let $group: S \to \wp(G)$ be the mapping that returns the set of group-identifiers $S_g \subseteq G$ to which a site $s \in S$ is joined. Finally, let $view: S \times G \to \wp(S)$ denote the mapping that returns, for a site $s \in S$, the set of members of the group g to which s is joined. We term a site $s \in S$ live for a given group $g \in G$ at time T if s has joined g at some time instant $V < T$, and has never voluntarily left g since V. As a site failure is dealt with as a voluntary disconnection of that site from all the groups to which it is joined, in the following we will say that a site is surviving for a given group $g \in G$ at time T to denote that the site is both non faulty and live for g at time T.

We discuss below the problem of maintaining an accurate and consistent group-membership view at each different site in a group. To this end, we assume an evolving scenario in which sites not included in a group may request to join it (we call them “new sites” below). In addition, we assume that sites belonging to a group can voluntarily leave that group, or experience faults that cause their disconnection from that group [12]. Note that these sites may later request to join that group again (possibly after recovery, if failed); in such cases, they are dealt with as new sites. In order to govern the above scenario, we require that the group-membership service satisfy the following five properties.

• P1: Stability. After a non faulty site $s \in S$ joins a group $g \in G$, s remains joined to g until a failure or the leaving of s is detected.
• P2: Mutual Agreement. If two surviving sites $s \in S$ and $r \in S$ are joined to the same group g, that is $joined(s) \land joined(r) \land g \in group(s) \cap group(r)$, then the two sites have the same view of the membership of that group, i.e. $view(s, g) = view(r, g)$.
• P3: Reflexivity. A site $s \in S$ joined to a group $g \in G$ is required to be a member of that group, that is: if $joined(s)$ and $g \in group(s)$, then $s \in view(s, g)$.
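The predicates above can be mirrored by a small data structure, which makes the static properties P2 and P3 directly checkable on a snapshot of all the sites' states. The Python sketch below is illustrative only: `SiteState` and the checker functions are hypothetical names, and a global snapshot of every site's views is something no single site possesses, so such checks are useful for testing or simulating a GM implementation rather than inside it. P1, P4 and P5 involve time and are not covered.

```python
from dataclasses import dataclass, field


@dataclass
class SiteState:
    """Hypothetical local state of a site: group-identifier -> current view."""
    views: dict = field(default_factory=dict)


def joined(st: SiteState) -> bool:
    return bool(st.views)


def group(st: SiteState) -> set:
    return set(st.views)


def view(st: SiteState, g) -> frozenset:
    return frozenset(st.views[g])


def mutual_agreement(states: dict, surviving: set) -> bool:
    """P2: surviving sites joined to a common group hold the same view of it."""
    for s in surviving:
        for r in surviving:
            if not (joined(states[s]) and joined(states[r])):
                continue
            for g in group(states[s]) & group(states[r]):
                if view(states[s], g) != view(states[r], g):
                    return False
    return True


def reflexivity(states: dict) -> bool:
    """P3: a joined site belongs to its own view of each of its groups."""
    return all(
        s in view(st, g)
        for s, st in states.items()
        if joined(st)
        for g in group(st)
    )
```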



• P4: Time Boundedness on Join Delay. There exists a time constant J such that, if a site $s \in S$ issues a join-call to a group $g \in G$ at time T, and is surviving until time $T + J$, then s joins g by time $T + J$. Moreover, the group g is also joined by each other site $r \in S$ that requests to join g by time T and is surviving until time $T + J$.
• P5: Time Boundedness on Failure Detection Delay. There exists a time constant H such that, if a site $s \in S$ joined to a group $g \in G$ fails (or leaves) at time T, then each other member r of g, surviving in the time interval $[T, T + H]$, is informed that $s \notin view(r, g)$ by time $T + H$.

Note that the five properties above logically imply that the following correctness property holds:

• C1: Correctness. If a site $s \in S$ joins a group $g \in G$ to which another member $r \in S$ is also joined, and both s and r are surviving until time T, then s and r observe the same view changes, in the same order and by the same time.

In the following, we introduce the GM Service that satisfies the five properties introduced above. This Service can be thought of as consisting of three principal components; namely, the “group-initiation” component, the “join-handling” component, and the “leave-handling” component. The group-initiation component implements a Group Initiation Algorithm based on a “public announcement” policy, which allows sites to form new groups and to identify existing groups. The join-handling component implements a Join Algorithm that allows sites to become members of existing groups; finally, the leave-handling component implements a Leave Algorithm that allows sites to voluntarily disconnect from the groups to which they are joined. The implementation of the Join and Leave algorithms is based on a so-called Periodic Confirmation (PC) algorithm that operates over the k-AMT. In this Section, we first describe the Group Initiation Algorithm; secondly, we discuss the Join and Leave Algorithms; finally, we introduce the PC algorithm.

4.1 Group Initiation Algorithm

Let G be the set of all the possible group-identifiers. Each site $s \in S$ is allowed to form a new group; in addition, each site can form more than one group. In order to designate the different groups that might exist unambiguously, we identify each group by a unique group-identifier $g = \langle x, s \rangle$, where x is the time instant at which the site s starts up the activity of forming the new group g. We term s the “founder” site of the group g. The founder site s starts up the activity of initiating a new group by publicly announcing this event: s announces the new group g (and the initiation of the related activity) by submitting a “call for participation” announcement to a “publicity server” at a given time x. (A variety of techniques are possible for implementing a publicity service; for example, this service can be implemented by a set of publicity servers that are available for announcement purposes, and are located at some well known addresses.) The call for participation announcement includes both the subject and the description of the coordinated activity that the group g is going to execute. In addition, it contains: i) the group-identifier g, ii) a time value y, $y > x$, representing the “participation” deadline (see below), and iii) a time value T, $T > y$, representing the activation time of the group g (i.e. the time at which the members of g will start to carry out the activities of the group).

Any site t (other than s) that wishes to participate in g has to send to the publicity server a “participation” message that confirms t’s willingness to participate in g, and to act as source and/or destination site in the corresponding k-AMT. This participation message has to be received by the publicity server before the participation deadline y expires. Once the deadline y has expired, the publicity server holds the list of all the sites (including s) that wish to participate in g. Consequently, it activates the execution of a root election algorithm among all the sites in g, in order to designate the site that will act as the root of the k-AMT representing the group g. (As mentioned earlier, this algorithm is beyond the scope of this paper.) At the termination of the root election algorithm, the publicity server transmits to the root site both the set of identifiers of the sites wishing to participate in g, and the activation time T of g. Finally, the root site starts the execution of the k-AMTC algorithm that constructs the k-AMT communication infrastructure, and multicasts to the members of g their initial group membership view.

In order to honor possible requests from new sites to join existing groups, the publicity server maintains a mapping between each group identifier, its root site, and its related activity. Thus, a new site that wishes to join an existing group first has to request from the publicity server the address of the root node of the k-AMT corresponding to that group; it then initiates the Join Algorithm described below.

4.2 Join and Leave Algorithms

The operation of the Join Algorithm can be summarized as follows. A new site $s \in S$ that, at time X, wishes to join a group $g \in G$ has to send a “join” message, timestamped with time X, to the root of the k-AMT implementing that group. The join message contains the site-identifier of s, and indicates that s is requesting to join the group g, and to act as a source and/or a destination site. In order to transmit that message to the root of that k-AMT, the new site establishes with the root a virtual link over which the message is sent. In normal circumstances, under the fault hypotheses we have assumed, this join request message will be received by the root by time $U = X + D$. The root processes that request and, if it is accepted, (i) generates a new k-AMT that includes s as one of its leaves, (ii) constructs a new group membership view for the members of g that includes the site s, and (iii) multicasts that view to all the members of g.

The implementation of the Leave Algorithm is based on the use of so-called Confirmation Messages (CMs) that are periodically transmitted from each site in a group g to the root of the k-AMT corresponding to that group. (The reliable transmission of CMs to the root is implemented by means of the PC Algorithm described in the next Subsection.)
Essentially, if T is the activation time of a particular group g, each site s in g periodically schedules the transmission of a CM to the root of g’s k-AMT, i.e. at the fixed Membership Check Times $V = T + k \cdot t$, $k = 1, 2, 3, \ldots$, with $t > 0$. Thus, the voluntary leaving of a site from the group g can be implemented by simply suspending the transmission of CMs from that site; the root deals with this event as described below. The CMs allow the root of a k-AMT to maintain the group membership view of the sites in the group g represented by that k-AMT, and possibly to modify the k-AMT topology. In fact, if the root of that k-AMT does not receive the CM from one of the sites in g, it assumes that that site has either been affected by a fault or has voluntarily left the group g. As a consequence, the root (i) generates a new k-AMT that does not include that site, (ii) constructs a new group membership view for the surviving sites in g, and (iii) multicasts that view to those sites. A view is multicast from the root node of g’s k-AMT as part of a so-called Final Confirmation Message (FCM); FCMs are received by the members of the group g by a time instant that we term the Membership Confirmation Time. It is worth observing that the members of the group g receive a new group membership view only after modifications of their current view have taken place, e.g. after a site has executed the Join or Leave Algorithms, or after an omission or timing fault has disconnected a site from g’s k-AMT. Thus, the root node constructs and multicasts a new view only periodically, and only if modifications to the current view have occurred. Typically, as detailed in the following example, the root may construct and multicast a new view once it has received the CMs sent by the surviving sites in the group g at a Membership Check Time.
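The root-side bookkeeping of the Leave Algorithm can be sketched in Python as follows. This is a simplification under stated assumptions: CMs that miss their deadline are simply absent from `cm_senders`, join requests are assumed to have been accepted already, and the reconstruction of the k-AMT by the k-AMTC algorithm is not modelled. All names are hypothetical.

```python
def membership_check_times(T, t, count):
    """First `count` Membership Check Times V = T + k*t, k = 1, 2, ..."""
    if t <= 0:
        raise ValueError("the period t must be positive")
    return [T + k * t for k in range(1, count + 1)]


def root_round(current_view, cm_senders, accepted_joins):
    """One root-side round at a Membership Check Time V.

    current_view: sites in the group's current view
    cm_senders: sites whose CM stamped V reached the root in time
    accepted_joins: new sites whose join requests are handled in this round
    Returns (new_view, changed); an FCM carrying new_view is multicast
    only when changed is True.
    """
    # A missing CM is treated as a failure or a voluntary leave.
    survivors = set(current_view) & set(cm_senders)
    new_view = survivors | set(accepted_joins)
    return new_view, new_view != set(current_view)
```

For example, if the current view is {a, b, c}, only a and c send their CMs, and d's join request is accepted, the new view is {a, c, d} and an FCM is due; if nothing changes, no new view is multicast.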
(For the purposes of the example below, we use the notation introduced in Subsection 3.3 to indicate the worst case delays experienced by the CMs during their transmission over the k-AMT.) Suppose that g is a group whose members have their next membership check time set to $W = T + k \cdot t$, for some integer k, with $t > 0$. Their CMs are assumed to reach the root by time $W' = W + \Delta_1$. Subsequently, by time W'', the root will have constructed a possibly updated k-AMT communication infrastructure and a new view for the members of g, provided that modifications to the current view have occurred. That updated k-AMT and new view will not include failed and voluntarily disconnected sites, and will include the new sites whose join requests have been received by the root in the time interval $[W, W']$. Then, by time W''' (i.e. the membership confirmation time), each member of the group g will be provided with the new group-membership view. In contrast, let Z be the membership check time immediately following W, i.e. $Z = T + (k + 1) \cdot t$. The new sites whose join requests have been received by the root in the time interval $[W', Z]$ will be included both in the updated k-AMT and in the new view computed in the confirmation round that starts at Z.
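Which confirmation round picks up a given join request can be expressed as a small calculation. The Python sketch below assumes integer time values and treats `collect_delay` as a hypothetical parameter standing for the worst-case delay after a check time W by which the CMs have reached the root (so that $W' = W + $ `collect_delay`). Following the example above, a request arriving in $[W, W']$ is handled in round W, while one arriving after W' waits for the next check time Z; a request arriving exactly at W' is assigned to round W here, since the two intervals in the example share that endpoint.

```python
def join_round_check_time(arrival, T, t, collect_delay):
    """Membership check time W = T + k*t (k >= 1) whose confirmation round
    includes a join request reaching the root at time `arrival`, i.e. the
    smallest k with arrival <= T + k*t + collect_delay."""
    if t <= 0:
        raise ValueError("the period t must be positive")
    # Ceiling division on integers: ceil((arrival - T - collect_delay) / t).
    k = max(1, -((T + collect_delay - arrival) // t))
    return T + k * t
```

With T = 0, t = 100 and `collect_delay` = 20, a request arriving at time 110 is handled in the round of W = 100, whereas one arriving at 130 (after W' = 120) is handled in the round of Z = 200.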

4.3 Periodic Confirmation Algorithm

The PC algorithm implements the multicasting of the CMs over the k-AMT. This algorithm operates in two distinct phases, termed Ascending Phase and Descending Phase. In the Ascending Phase, CMs are transmitted from the surviving sites in a group g to the root of g’s k-AMT, i.e. from the leaf nodes of the k-AMT to its root node, via the k-AMT links. In the Descending Phase, the up-to-date information about the current group-membership view, as calculated by the root, is multicast from the root itself to all the surviving members of g, i.e. to the leaf nodes of g’s k-AMT, yet again via the k-AMT links. The Ascending and Descending Phases are described below in isolation; for each of them, we describe separately the actions carried out by the leaf nodes of a k-AMT and by its non-leaf nodes.

Ascending Phase

• As mentioned earlier, each leaf node i in a k-AMT (i.e. each surviving site in the group associated with that k-AMT) sends a CM to the root node at each scheduled membership check time V. The CM is transmitted from i to its immediate ancestor in the k-AMT, and includes the time value V as a timestamp, together with i’s site-identifier. In addition, i sends replicas of that message to each node with which it is directly linked via a spare link in the k-AMT (with the exception of the nodes that originated these replicas). Thus, i may also receive CM replicas from other leaf nodes. Upon reception of the first undelivered copy of a CM from any other leaf node j, node i compares the membership check time V timestamped on that CM with the value F of its own clock. If F exceeds the deadline associated with V (i.e. the CM has been received too late for retransmission), the CM is discarded; otherwise, it is delivered.
• Each non-leaf node i, upon receiving the first undelivered CM from a leaf node j, compares the timestamp V attached to that CM with the value F of its own clock. If F exceeds the deadline associated with V, the CM is discarded, as above; otherwise, it is delivered.
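The “first undelivered copy” rule and the deadline test that each node applies can be sketched in Python as below. This is an illustration under assumptions: `allowed_delay` is a hypothetical per-node bound (a message stamped V is accepted only if it arrives by local time V + `allowed_delay`), since the exact per-node deadlines are not restated here, and duplicates are recognized by the pair (timestamp, originating site-identifier), both of which a CM carries.

```python
class ConfirmationInbox:
    """Per-node handling of incoming CMs in the PC algorithm (sketch)."""

    def __init__(self, allowed_delay):
        # Hypothetical bound: accept a message stamped V only by V + allowed_delay.
        self.allowed_delay = allowed_delay
        self._seen = set()  # (timestamp V, originating site-identifier)

    def receive(self, timestamp, origin, local_clock):
        key = (timestamp, origin)
        if key in self._seen:
            return "duplicate"  # a copy that arrived over another route
        self._seen.add(key)
        if local_clock > timestamp + self.allowed_delay:
            return "discarded"  # received too late for retransmission
        return "delivered"
```

Because a node's clock only moves forward, a first copy that arrives too late implies that later copies of the same CM would be late too, so recording it as seen before the deadline test loses nothing.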
Note that copies of the same CM may reach a node i via multiple routes. As each CM is uniquely identified by its timestamp (i.e. the membership check time) together with the site-identifier of its originator, node i can detect and discard copies of the same CM (hence the use of the term “first” above).

Descending Phase

Under the assumed fault hypotheses, all the messages coming from the leaf nodes of the k-AMT reach the root by time $V + \Delta_1$. Hence, at that time, the root knows the surviving sites and the possible new sites at time V. Let us denote by $MEMBERS(g, U)$ the mapping that, given the group g and the time U, returns the set of the identifiers of all the surviving sites in g at time U. Thus, at the termination of the Ascending Phase, the root can compute the value of $MEMBERS(g, V)$ by using the CMs timestamped with the same timestamp V. Moreover, by running the k-AMTC algorithm, which consumes a known number of time units, it can compute a new k-AMT communication infrastructure for the group g; that new infrastructure will not include the sites that have failed or left by time V, and will include the possible new sites by that time. Needless to say, if no site has failed, left, or joined the group g by time V, both the value of $MEMBERS(g, V)$ and the k-AMT communication infrastructure remain unchanged. Then, the root multicasts to its sons the FCM containing the current value of $MEMBERS(g, V)$, timestamped with the value V representing the membership check time. This transmission procedure is executed by each non-leaf node of the new k-AMT, which forwards a copy of the FCM to each of its sons, down to the leaf destination nodes. The additional links at each node of the new k-AMT are used for tolerating communication faults, as described below.



Each nonleaf node i, upon receiving the first undelivered FCM, compares the value F of its own clock with the forwarding deadline determined by the timestamp V attached to that FCM. If F exceeds that deadline (i.e. the FCM has been received too late to be forwarded), the FCM is discarded; otherwise it is delivered by i, and then forwarded both to each son of i and to each node of the k-AMT linked with i through a spare link. (As mentioned before, i does not send an FCM to any node j from which i has received that FCM.) Each leaf node i, upon receiving the first undelivered FCM, compares the value F of its local clock with the deadline determined by the FCM's timestamp V. If F exceeds that deadline (i.e. the FCM has been received too late for updating i's view), the message is discarded; otherwise it is delivered. If an FCM timestamped with a given time value V is delivered by a node i, then that node can reconstruct its group membership view. To this end, node i reads from that FCM the transmitted value of MEMBERS(g, V) when its local clock displays the predefined time associated with V. This value represents i's current membership view of the group g that i has joined, and can be maintained at i in the local variable view(i, g). Finally, if an FCM is delivered by node i, i forwards it both to each son and to each node linked with i through a spare link; yet again, i does not send an FCM to any node j from which it has received that FCM. To conclude this Subsection, note that the timing faults that the PC algorithm deals with are those that violate the time constraints specified in the PC algorithm itself. A timing fault over a k-AMT link that does not violate those constraints can instead be handled at the application level, as in the k-AMTS algorithm.
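The per-node FCM rules above can be sketched as a single handler. This is a simplified model, not the prototype's code: the `deadline` argument is a hypothetical stand-in for the time bound derived from V, and the view is installed as soon as the FCM is delivered rather than at the predefined local time associated with V.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FCM:
    """Hypothetical Final Confirmation Message."""
    timestamp: int            # membership check time V
    members: frozenset        # MEMBERS(g, V)


@dataclass
class Node:
    """A k-AMT node; `sons` is empty for a leaf node."""
    name: str
    sons: list = field(default_factory=list)
    spare_peers: list = field(default_factory=list)
    view: frozenset = frozenset()                   # view(i, g)
    delivered: set = field(default_factory=set)     # timestamps of delivered FCMs

    def on_fcm(self, fcm, sender, clock, deadline):
        """Handle one received copy of an FCM; return the nodes to forward it to."""
        if fcm.timestamp in self.delivered:   # not the first undelivered copy
            return []
        if clock > deadline:                   # received too late: discard
            return []
        self.delivered.add(fcm.timestamp)
        # Simplification: install the view at delivery time.
        self.view = fcm.members
        # Forward to sons and spare-link peers, never back to the sender.
        return [n for n in self.sons + self.spare_peers if n != sender]
```

A leaf node uses the same handler; since it has no sons, it forwards only over its spare links.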

4.4 Remarks on the Correctness and the Performance of the GM Service

In order to examine the correctness properties of the GM Service, it is necessary to investigate the characteristics of the PC algorithm on which the GM Service is based. As defined in [13], a generic multicast algorithm is termed atomic if (1) each message generated at a fixed time by a surviving source is delivered by all the surviving destinations within some known time bound (termination property), (2) every message sent by a surviving source is delivered either by all the surviving destinations or by none of them (atomicity property), and (3) all messages generated by all the surviving sources are delivered in the same order to all the surviving destinations (order property). Formal proofs can be given. However, by simply considering that the PC algorithm and the k-AMTS algorithm (formally analyzed in the previous Subsection 3.3.1) follow the same message diffusion strategy, over the same k-AMT virtual interconnection infrastructure and under the same fault hypotheses, we can safely conclude the following: provided that the faults occurring at the nodes and the links of the original k-AMT during the execution of the Ascending and Descending Phases of the PC algorithm leave the communication architecture connected, if the FCM is delivered to a surviving member of the corresponding group, then it is also delivered to all the other surviving members of that group by a known predefined time. Hence, under the assumed hypothesis of accurate clock synchronization (i.e. a clock skew bound equal to 0), the PC algorithm possesses the termination, atomicity and order properties. It is worth noting that the PC algorithm can be shown to possess those three properties even under the hypothesis of quite accurate clock synchronization (i.e. a constant clock skew bound greater than 0).

As a consequence, if at membership confirmation time V a member s of g detects that some site $r \in view(s, g)$ is no longer in MEMBERS(g, V), then, by the atomicity and termination properties of the PC algorithm, every other surviving member q of g detects, at the same membership confirmation time, that $r \notin MEMBERS(g, V)$. The following Proposition 4.1 descends from the termination, atomicity and order properties of the PC algorithm.

Proposition 4.1 Provided that the accurate clock synchronization hypothesis is satisfied (i.e. a clock skew bound equal to 0), and that the faults occurring at the nodes and the links of the k-AMT during the execution of the two sequential phases of the PC algorithm leave the communication structures connected, the GM Service satisfies the P1, P2, P3, P4 and P5 properties previously introduced.
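The three properties can be checked mechanically on recorded delivery logs, which may help when testing an implementation; such a check is of course no substitute for the proofs mentioned above. The function below is a hypothetical test helper; for simplicity it identifies messages by id and records only their send times.

```python
def check_atomic_multicast(sent, logs, bound):
    """Check termination, atomicity, and order on recorded deliveries.

    sent:  message id -> send time, for messages of surviving sources.
    logs:  surviving destination -> list of (message id, delivery time),
           in delivery order.
    bound: the known time bound of the termination property.
    """
    delivered = {dest: {m for m, _ in log} for dest, log in logs.items()}
    # Termination: every sent message reaches every destination within the bound.
    termination = all(
        any(m == mid and t <= t_sent + bound for m, t in log)
        for mid, t_sent in sent.items()
        for log in logs.values()
    )
    # Atomicity: a message delivered anywhere is delivered everywhere.
    anywhere = set().union(*delivered.values())
    atomicity = all(mid in s for mid in anywhere for s in delivered.values())
    # Order: the messages delivered everywhere appear in the same relative order.
    common = set.intersection(*delivered.values()) if delivered else set()
    sequences = [[m for m, _ in log if m in common] for log in logs.values()]
    order = all(seq == sequences[0] for seq in sequences)
    return {"termination": termination, "atomicity": atomicity, "order": order}
```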

Link data rate (CM size 8 bytes)   2 Mbps   10 Mbps   16 Mbps   4.14 MBps
64 leaves                          11.25    2.25      1.4       0.7
256 leaves                         61       12.2      7.6       3.6
1024 leaves                        307      61.5      38.3      18.5

Table 6. Latency in milliseconds of the CMs.

k-AMT size   64 leaves   256 leaves   1024 leaves
2 sources    3.6%        15%          61%
3 sources    2.7%        11.5%        46%
4 sources    2.25%       9%           37%

Table 7. Percentage of additional latency introduced by the CMs.

Yet again, formal proofs can be given; however, by analogy with the k-AMTS algorithm, it is straightforward to conclude that the virtual communication architectures over which the Ascending and Descending Phases of the PC algorithm are executed remain connected if innocuous sets of faults occur. In addition, it is possible to conclude that, during an execution of the PC algorithm, a total number of messages $Z = O(N(\log_M N + 1))$ is sent. Hence, we can state that the PC algorithm scales well, and can tolerate a number of communication faults that depends on k, at the cost of a communication overhead, in terms of number of messages, proportional to $N(\log_M N + 1)$.
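To get a feel for the growth of this bound, the term $N(\log_M N + 1)$ can be evaluated with integer arithmetic, taking $\lceil \log_M N \rceil$ as the tree depth. This computes only the growth term: the constant factors hidden in the $O(\cdot)$ (for instance, those due to copies sent over spare links) are not modeled, and the function names are hypothetical.

```python
def tree_depth(n, m):
    """Smallest h with m**h >= n, i.e. ceil(log_m n), computed with integers only."""
    if n < 1 or m < 2:
        raise ValueError("need n >= 1 and m >= 2")
    h, capacity = 0, 1
    while capacity < n:
        capacity *= m
        h += 1
    return h


def message_growth_term(n, m):
    """N * (log_M N + 1), the term inside Z = O(N (log_M N + 1))."""
    return n * (tree_depth(n, m) + 1)
```

For example, with M = 4 the per-leaf term grows only from 4 messages at N = 64 to 6 at N = 1024, which is the sense in which the algorithm scales well.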

4.4.1 Analytical Evaluation

In this Subsection we discuss the impact of the group membership management service we have proposed on the performance of critical DMMAs. To this end, we provide below an analytical evaluation of the end-to-end latency experienced by the CMs that are periodically transmitted by the GM service. We have carried out this evaluation assuming the same three k-AMT architectures (supporting 64, 256, and 1024 leaf nodes, respectively) and the same network link data rates introduced in Section 3.3.3. In addition, we have assumed that each CM transmitted by a k-AMT leaf node consists of 8 bytes. Table 6 shows, for each of the three k-AMT architectures we have considered, the end-to-end latency experienced by the CMs. Note that these latency values represent an additional overhead on the end-to-end latency experienced by the audio and video frames that can be transmitted over those architectures concurrently with the CMs. Hence, the values reported in that Table indicate that, if a k-AMT architecture is characterized by a link data rate of 2 Mbps, the latency introduced by the transmission of the CMs adds to an already intolerable end-to-end latency of the audio and video frames (as reported in the previous Tables 1, 2, and 3). Instead, if the k-AMT architectures are based on links that provide 10 to 16 Mbps data transfer rates, the additional latency introduced by our group management strategy is acceptable, with the exception of the k-AMT architecture characterized by 1024 leaf nodes. For such an architecture, the end-to-end latency can be contained within acceptable bounds if the k-AMT architecture is based on 4.14 MBps ATM links. Table 7 summarizes the percentage of additional latency introduced by the transmission of CMs over the three k-AMT architectures we have considered.

This Table shows that the cost (in terms of latency) of transmitting the CMs over these architectures never exceeds 61% of the latency experienced by audio and video frames transmitted over the same architectures without redundancy (see the previous Tables 1, 2, and 3). In view of the added reliability that our group membership service offers, we regard this additional cost as acceptable.
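The entries of Table 6 are consistent, to within about 0.1 ms, with a CM latency that scales inversely with the link data rate. The check below reproduces this observation from the reported values; it assumes that "MBps" denotes megabytes per second (so 4.14 MBps is about 33.12 Mbps), and the inverse-rate model is our reading of the table rather than the analytical model used in the report.

```python
BASE_RATE_MBPS = 2.0
ATM_RATE_MBPS = 4.14 * 8            # 4.14 MBps read as megabytes/s, i.e. 33.12 Mbps

# Reported Table 6 latencies (ms): leaves -> {link rate in Mbps: latency}.
TABLE_6 = {
    64:   {2.0: 11.25, 10.0: 2.25, 16.0: 1.4,  ATM_RATE_MBPS: 0.7},
    256:  {2.0: 61.0,  10.0: 12.2, 16.0: 7.6,  ATM_RATE_MBPS: 3.6},
    1024: {2.0: 307.0, 10.0: 61.5, 16.0: 38.3, ATM_RATE_MBPS: 18.5},
}


def scaled_latency(leaves, rate_mbps):
    """Predict a Table 6 entry from its 2 Mbps entry, assuming latency ~ 1 / rate."""
    return TABLE_6[leaves][BASE_RATE_MBPS] * BASE_RATE_MBPS / rate_mbps


def max_deviation_ms():
    """Largest gap between a reported entry and its inverse-rate prediction."""
    return max(
        abs(reported - scaled_latency(leaves, rate))
        for leaves, row in TABLE_6.items()
        for rate, reported in row.items()
    )
```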

[Figure: layered protocol structure. From top to bottom: Interface 3 over Level 3 (GMMP); Interface 2 over Level 2 (ATP, ADP); Interface 1 over Level 1 (MVCP, HSP); IP Multicast at the bottom.]

Figure 4. Protocol Structuring

5 A Prototype Implementation

We have implemented a prototype k-AMT communication architecture using the C programming language and the development environment provided by the SunOS 4.3 (BSD Unix) operating system. Our prototype implementation can be thought of as structured in the following four principal levels of abstraction, as depicted in Figure 4. Level 0 consists of the Internet IP Multicast datagram protocol [14]. This protocol implements the transmission of IP datagrams from a source host to a destination Host Group, i.e. a set of hosts identified by a single IP destination address. An IP Multicast datagram is delivered to the members of its destination Host Group with the same “best effort” delivery semantics as a conventional unicast IP datagram. Thus, an IP Multicast datagram is not guaranteed to be delivered to all the members of its destination Host Group; moreover, consecutive IP Multicast datagrams are not guaranteed to be delivered in the order in which they were sent. The IP Multicast protocol interface, as available through the SunOS 4.3 socket interface, has been used as the low level transport service in the implementation of our architecture. Level 1 incorporates the so-called Host Synchronization Protocol (HSP) and the Multicast Virtual Circuit Protocol (MVCP). HSP uses the IP Multicast interface to implement a simple clock synchronization algorithm, described in [42], so as to maintain clock synchronization among all the k-AMT nodes. MVCP uses the IP Multicast interface to implement timely multicasting of real time data streams over a Multicast Virtual Circuit (MVC). An MVC extends the conventional virtual circuit abstraction by allowing a message source node to establish, maintain, and release a connection, consisting of multiple virtual channels, with a group of destination nodes. Messages transmitted over an established MVC will be delivered to the destination end-points of that MVC in the same order as they were sent.
The MVC abstraction has been implemented based on the IP Multicast Host Group abstraction. Level 2 uses the MVCP services to implement the Augmented Tree Protocol (ATP), which constructs the k-AMT, and the Assemblage-Diffusion Protocol (ADP), which provides communications over the k-AMT. Finally, Level 3 uses the services provided by the ATP and the ADP to implement the Group-Membership Management Protocol (GMMP). This protocol allows a site to initiate, join and leave a group, and to take part in the coordinated activity of that group. The implementation of the Level 1, 2, and 3 protocols of our architecture is introduced below, one protocol at a time.

Multicast Virtual Circuit Protocol

MVCP implements an interface consisting of the following six primitive operations: setMVC, releaseMVC, sendto, receivefrom, addmemb and dropmemb. These primitives allow one to establish, maintain, and release MVCs, and to connect to, and disconnect from, already established MVCs. In particular, the setMVC primitive allows its invoker (namely, a k-AMT node) to establish an MVC with a set of h distinct k-AMT destination nodes. The implementation of this primitive transmits an MVC establishment request to those h destination nodes. This request includes information concerning the maximum communication and processing delay bound d that can be tolerated in the communications with the setMVC invoker node. Each destination node that wishes to honor that request acknowledges it. Lack of acknowledgement from some destination node during the MVC establishment phase causes setMVC to raise an exception, termed MVCP type 1 exception. This exception indicates the number $w$ ($0 < w \le h$) of unacknowledged requests, and identifies the destination nodes whose acknowledgement is missing. It is the responsibility of the setMVC invoker to handle this exception appropriately, e.g. by either restricting the communications to the $h - w$ available destinations only, or releasing the MVC. An MVC can be released by its initiator by invoking the releaseMVC primitive. When an MVC has been established between a source node and a group of destination nodes, the multicasting of data streams from that source to those destinations can take place. The sendto primitive implements the multicasting of data streams.
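The setMVC establishment rule and the MVCP type 1 exception can be sketched as follows. The names set_mvc, set_mvc_or_restrict and send_request are hypothetical, acknowledgements are modeled as a synchronous yes/no answer, and restricting communications to the h - w acknowledging destinations is just one of the handling options mentioned above.

```python
class MVCPType1Exception(Exception):
    """w of the h destinations did not acknowledge the MVC establishment request."""

    def __init__(self, unacknowledged, h):
        self.unacknowledged = frozenset(unacknowledged)
        self.w = len(self.unacknowledged)
        self.h = h
        super().__init__(f"{self.w} of {h} destinations did not acknowledge")


def set_mvc(destinations, d, send_request):
    """Hypothetical setMVC: request an MVC with delay bound d from each destination.

    send_request(dest, d) returns True if dest acknowledges the request.
    Returns the set of destinations if all h of them acknowledge.
    """
    destinations = list(destinations)
    acked = {dest for dest in destinations if send_request(dest, d)}
    missing = set(destinations) - acked
    if missing:
        raise MVCPType1Exception(missing, len(destinations))
    return frozenset(acked)


def set_mvc_or_restrict(destinations, d, send_request):
    """One handling option: keep only the h - w acknowledging destinations."""
    destinations = list(destinations)
    try:
        return set_mvc(destinations, d, send_request)
    except MVCPType1Exception as exc:
        if exc.w == exc.h:          # h - w = 0: the invoker would be isolated
            raise
        return frozenset(destinations) - exc.unacknowledged
```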
Normal termination of sendto indicates that the data stream has been correctly multicast from the source node to the MVC destination end-points. Abnormal termination of this primitive can be caused by transmission errors occurring on some, or all, of the channels that form an MVC. In this case, an MVCP type 2 exception is raised that allows the sendto invoker node to identify the failed channels within that MVC. The receivefrom primitive is invoked by an MVC destination node in order to receive multicast data streams. Normal termination of this primitive indicates that a data stream has been delivered within the predefined delay d. Abnormal termination, instead, indicates that a data stream has been received but is to be discarded (e.g. that data stream might have been received with a delay greater than d). Finally, the addmemb and dropmemb primitives allow their invokers to connect to, and disconnect from, an existing MVC, respectively. These two primitives have been implemented using the JoinHostGroup and LeaveHostGroup primitives provided by the IP Multicast protocol interface.

Host Synchronization Protocol (HSP)

This protocol maintains clock synchronization among the processors that implement the k-AMT structure. HSP uses the primitives provided by the IP Multicast interface for implementing the distributed clock synchronization algorithm described in [42]. This algorithm guarantees that the measurable difference between the readings of the clocks of any two non-faulty processors is bounded by a known constant.

Augmented Tree Protocol

The ATP constructs the k-AMT abstraction by implementing the k-AMTC algorithm introduced earlier. The implementation of the ATP can be summarized as follows. After each node i has identified its M son nodes and the k peer nodes to which it is to be connected through the spare links, it establishes (using the MVCP interface primitives) an MVC with its parent node, its son nodes, and its k peer nodes.
In addition, each time a new source or destination site needs to join an existing group, the ATP first searches for an existing virtual leaf node of the corresponding k-AMT. If a virtual node exists, the ATP assigns that node to the new site; otherwise, the k-AMTC algorithm is executed to construct a new k-AMT. Conversely, if a source or destination site (implemented by a corresponding leaf node in the k-AMT) wishes to leave an existing k-AMT, that node (depending on specific application requirements) can either be considered as a virtual node of the k-AMT, or the k-AMTC algorithm can be executed to construct a new k-AMT that does not include that node. The ATP terminates successfully if the construction of the k-AMT has completed successfully.
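The ATP's join and leave handling can be modeled on the leaf level of a full M-ary tree, where unassigned leaf slots act as virtual leaf nodes. LeafPool is a hypothetical name, and its _rebuild method is only a stand-in for re-running the k-AMTC algorithm: it picks the smallest depth that fits the current sites and renumbers them, ignoring spare links and clustering.

```python
class LeafPool:
    """Hypothetical bookkeeping of the k-AMT leaf level for one group.

    A full M-ary tree of the given depth has M**depth leaf slots; slots with
    no site attached play the role of virtual leaf nodes.
    """

    def __init__(self, m, depth):
        self.m, self.depth = m, depth
        self.slots = {}                          # leaf slot -> site id

    def virtual_leaves(self):
        return [i for i in range(self.m ** self.depth) if i not in self.slots]

    def join(self, site):
        free = self.virtual_leaves()
        if not free:                             # no virtual leaf: rebuild
            self._rebuild(len(self.slots) + 1)
            free = self.virtual_leaves()
        self.slots[free[0]] = site
        return free[0]

    def leave(self, site, keep_as_virtual=True):
        slot = next((i for i, s in self.slots.items() if s == site), None)
        if slot is None:
            raise KeyError(site)
        del self.slots[slot]                     # the slot becomes a virtual leaf
        if not keep_as_virtual:
            self._rebuild(len(self.slots))

    def _rebuild(self, needed):
        """Stand-in for k-AMTC: smallest depth with at least `needed` leaves."""
        depth = 0
        while self.m ** depth < needed:
            depth += 1
        self.depth = depth
        self.slots = dict(enumerate(self.slots.values()))   # renumber leaves
```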


Successful termination of the ATP entails that each node in the k-AMT is ready to execute the message diffusion phases embodied in the k-AMTS and PC algorithms. However, it is worth observing that MVCP type 1 exceptions can be raised during the execution of the ATP. The ATP intercepts those exceptions, and terminates indicating that the construction of the required k-AMT has failed. In particular, if at least one of these MVCP exceptions is such that $h - w = 0$ (i.e. none of the destinations acknowledged the request), the ATP terminates with a failure exception (termed k-AMT isolated node exception) indicating that a node is completely isolated. Otherwise, it terminates with a failure exception (termed k-AMT partial construction exception) indicating that the k-AMT construction has failed. (We will not discuss any specific strategy for handling ATP exceptions in this paper.)

Assemblage-Diffusion Protocol (ADP)

This protocol supports the communications over a k-AMT by implementing the k-AMTS and the PC algorithms. It implements the two primitive operations assemble and diffuse. The assemble primitive implements the Collect phase of the k-AMTS algorithm, and the Ascending Phase of the PC algorithm. The diffuse primitive, instead, implements the Disseminate phase of the k-AMTS algorithm, and the Descending Phase of the PC algorithm. The assemble primitive is invoked by the root node of the k-AMT. The implementation of this primitive causes each k-AMT node i to invoke the MVCP interface primitives in order both to read from its son and peer nodes, and to write to its parent and peer nodes (via the MVCs previously established by the execution of the ATP). The termination of the assemble primitive is successful if the root node has constructed either the FSMDS, if the k-AMTS algorithm is executed, or the FCM, if the PC algorithm is executed. The assemble primitive raises an incomplete assemble exception if at least one MVCP type 2 exception has been intercepted by the ADP implementation.
(In this case, in fact, a node of the communication architecture turns out to be isolated during the execution of either the Collect phase of the k-AMTS algorithm or the Ascending Phase of the PC algorithm.) However, whether or not an incomplete assemble exception has been raised, the assemble primitive terminates successfully only if the root node has constructed either the FSMDS or the FCM within the predefined time bound. Thus, a failed assemble exception is raised only if the root of the k-AMT is not able to compose either the FSMDS or the FCM within the predefined time bound. The failed assemble exception indicates the identifiers of the failed sites whose data streams (or confirmation messages) have not reached the root. The termination of the diffuse primitive is successful if either the FSMDS or the FCM is delivered to all the destination leaf nodes within the predefined time bound. The implementation of this primitive can raise an incomplete diffuse exception if at least one nonleaf node in the k-AMT is detected to be isolated (i.e. an MVCP type 2 exception has been raised). Finally, a failed diffuse exception can be raised only if at least one destination node has not received either the FSMDS or the FCM within the predefined time bound (i.e. an MVCP type 2 exception has been raised at one of the leaf nodes). Note that the exceptions raised by the ADP are propagated back to the root via the k-AMT links.

Group-Membership Management Protocol

The GMMP provides the following four primitive operations: join, leave, initiate, and participate. The join primitive can be invoked by a site that wishes to participate in the multimedia data stream exchange activity of an already existing group. The implementation of the join primitive causes the requesting site to establish a connection with the root node of the k-AMT that implements that group, and to transmit a join request message to that root node through that connection.
After receiving a join request from a new site, the root of the k-AMT constructs the updated k-AMT abstraction that includes that new site as a leaf node. This operation is carried out within a predefined time interval from the most recent membership check time. The normal termination of the join primitive indicates that the invoking site has successfully joined the requested group, and can participate in the coordinated activity of exchanging CMDSs within that group. Abnormal termination of the join primitive returns a join failure exception if an MVCP type 1 exception has been raised in the connection establishment phase. A site remains joined to a group until either that site invokes a leave primitive, or it is affected by a fault that causes its disconnection from that group. The invocation of the leave primitive causes the invoking site to suspend the transmission of its periodic CM, starting from the first membership check time following that invocation. Thus, the root of the k-AMT will construct, after a fixed, known time from the invocation of the leave primitive, an updated view that does not include the leaving site. The initiate primitive can be invoked by a site that wishes to start up a new group. The implementation of this primitive causes the invoking site to transmit a “call for participation announcement” to a so-called publicity server (for more details about this server see [45]). This “announcement” includes a deadline by which the interested sites can confirm their willingness to participate in the new group. The participate primitive allows those interested sites to respond to a “call for participation announcement”. The implementation of this primitive causes the invoking site to transmit to the publicity server a “confirmation” message requesting to become a member of the announced group. This primitive terminates successfully if that message is delivered to the publicity server before the corresponding participation deadline expires; otherwise, the participate primitive terminates with a failed participate exception. Finally, it is worth noting that even if a site is unsuccessful in participating in the initiation of a new group, it can try to join that group later by invoking the join primitive. After the publicity server has collected the confirmation messages from the interested sites, the root node of the k-AMT that will represent the new group can be elected. Consequently, the ATP interface primitives are invoked by that root node in order to construct the k-AMT communication structure.
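The initiate/participate exchange with the publicity server can be sketched as follows. The class and method names are hypothetical, the publicity server of [45] is reduced to a deadline check, and a confirmation arriving exactly at the deadline is treated as on time (a boundary choice the text does not fix).

```python
class FailedParticipate(Exception):
    """Raised when a confirmation reaches the publicity server after the deadline."""


class PublicityServer:
    """Hypothetical model of the publicity server used by initiate/participate."""

    def __init__(self):
        self._calls = {}                     # group -> (deadline, confirmed sites)

    def announce(self, group, deadline):
        """Effect of initiate: record a call for participation with its deadline."""
        self._calls[group] = (deadline, set())

    def confirm(self, group, site, arrival_time):
        """Effect of participate: accept the confirmation only if it is on time."""
        deadline, confirmed = self._calls[group]
        if arrival_time > deadline:
            raise FailedParticipate(f"{site} missed the deadline for {group}")
        confirmed.add(site)

    def initial_members(self, group):
        """Sites among which the root of the new k-AMT would then be elected."""
        return frozenset(self._calls[group][1])
```

A site whose confirmation raises FailedParticipate is not lost for good: as noted above, it can later invoke join on the established group.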

6 Experimental Evaluation

The performance of the implementation of the synchronization and group communication protocols introduced above has been evaluated using a distributed infrastructure consisting of a 10 Mbps Ethernet interconnecting eight SPARCstation 5 workstations, running the (non real-time) SunOS 4.3 operating system. The results of this evaluation are discussed below. As pointed out in [28], the most important metrics that influence the users’ perception of multimedia data are: the end-to-end latency of the communications between the participants of a given multimedia application; the number of data discontinuities (i.e. the number of media units belonging to a given multimedia data stream that are never played out); and, finally, the deviations from exact synchronization of audio and video streams. The end-to-end latency is defined as the elapsed time between the acquisition of a data unit at its transmitter and the rendering of that unit at its receiver. This latency consists of the collection delay, the network delay, and the delivery delay [19]. The collection delay is the time needed for the transmitter to collect media units and prepare them for transmission (collection may originate, for example, directly from media recorders such as a video camera, or from a multimedia file server). The network delay is the time elapsed from the delivery of a media unit to the transport layer interface of the transmitter to the delivery of that unit to the transport layer interface of the receiver. Finally, the delivery delay is the time the receiver needs to process the media units, synchronize them, and play them out. The end-to-end latency is to be considered one of the most revealing indicators of the performance of a multimedia system; studies in the field of interactive audio and video demonstrate that it should be kept below 300 milliseconds.
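Since each media unit crosses the transmitter's and the receiver's transport layer interfaces, the three delays above partition the end-to-end latency. The sketch below assumes that four timestamps are recorded per media unit on a common (synchronized) clock; the class and field names are hypothetical.

```python
from dataclasses import dataclass

INTERACTIVE_BOUND_MS = 300.0  # bound quoted above for interactive audio/video


@dataclass(frozen=True)
class UnitTrace:
    """Hypothetical timestamps (ms) recorded for one media unit."""
    acquired: float        # acquisition at the transmitter
    sent: float            # handed to the transmitter's transport layer interface
    received: float        # handed to the receiver's transport layer interface
    rendered: float        # rendered (played out) at the receiver

    @property
    def collection_delay(self):
        return self.sent - self.acquired

    @property
    def network_delay(self):
        return self.received - self.sent

    @property
    def delivery_delay(self):
        return self.rendered - self.received

    @property
    def end_to_end(self):
        return self.collection_delay + self.network_delay + self.delivery_delay

    def acceptable_for_interaction(self):
        return self.end_to_end < INTERACTIVE_BOUND_MS
```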
Also related to the users’ perception of multimedia data streams are the discontinuities in the rendering of the data units that compose those streams. A discontinuity is said to occur when the data unit n + 1 is not displayed immediately after the data unit n [28]. This occurs typically when a data unit is lost, or is not delivered at the receiver in time to be rendered when the playback of the previous unit has completed. Finally, the synchronization among (possibly multiple) audio and video streams is to be considered fundamental in order to ensure that humans perceive the correct temporal ordering of events during a multimedia application. Studies in this field have demonstrated that audio and video should never be more than 100 milliseconds out of synchronization to guarantee the so-called “lip synchronization”.

In addition to end-to-end latency, number of data discontinuities, and deviation from exact synchronization, the critical DMMAs we consider require isochronous rendering of multimedia data streams. Hence, the isochrony in rendering those streams, at a number of possibly geographically separated destination sites, turns out to be an additional parameter of primary importance in our context. In view of these observations, our experimental assessment measures the performance of our protocols with respect to end-to-end latency, number of discontinuities, deviations from synchronization, and isochronous rendering of data streams at different destinations. To this end, we have developed three different series of experiments. Each series of experiments consisted of 10 runs of our test programs, executed over two different k-AMT structures (described below), so as to assess the effectiveness of our protocols over different communication topologies. Essentially, in each run, a very large number of data streams were generated by the k-AMT source nodes, transmitted to the k-AMT root through the k-AMT links, integrated to form CMSDSs (by the synchronizer nodes) and, eventually, FSMDSs (by the root node). The FSMDSs were then transmitted from the root to the destination nodes, and rendered by those nodes. In each experiment, we observed the behavior of our transport and synchronization protocols over a time interval of fixed length, and measured the total number of FSMDSs that were isochronously rendered at the scheduled playback time by all the destination nodes, out of the total number of data streams generated by the source nodes.
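The discontinuity and lip-synchronization criteria discussed above can be computed from play-out logs as shown below. The log formats (sequence numbers in play-out order; play-out times keyed by the index of corresponding audio and video units) and the function names are assumptions made for illustration.

```python
LIP_SYNC_BOUND_MS = 100.0   # maximum audio/video skew quoted above


def count_discontinuities(displayed):
    """Count positions where unit n + 1 is not displayed right after unit n.

    displayed: media unit sequence numbers in the order they were played out.
    """
    return sum(1 for a, b in zip(displayed, displayed[1:]) if b != a + 1)


def max_av_skew(audio_playout, video_playout):
    """Largest play-out time difference (ms) between corresponding audio and video units."""
    common = audio_playout.keys() & video_playout.keys()
    return max((abs(audio_playout[u] - video_playout[u]) for u in common), default=0.0)


def lip_synchronized(audio_playout, video_playout):
    return max_av_skew(audio_playout, video_playout) <= LIP_SYNC_BOUND_MS
```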
In each series of experiments we have assumed a particular value of the end-to-end maximum latency D that could be tolerated in the communications between the k-AMT source and destination nodes. Thus, given D, an FSMDS consisting of multimedia data units originated at time O that could not be delivered to its k-AMT destination nodes by the prescheduled playback time $F = O + D$ was considered affected by an unrecoverable timing fault, and hence discarded. The first series of experiments was carried out assuming a 500 milliseconds end-to-end maximum latency and a 75 milliseconds delay d (as defined in Assumption A3 in Section 2). The second series of experiments assumed a 300 milliseconds end-to-end maximum latency and a 50 milliseconds delay d. Finally, the third series of experiments assumed a 150 milliseconds end-to-end maximum latency and a 30 milliseconds delay d. Moreover, in order to implement the proposed Group Membership Management Protocol, every k-AMT leaf node in each experiment transmitted a Confirmation Message (consisting of eight bytes) with a period of 33 milliseconds; the corresponding group membership was calculated by the k-AMT root node, according to the group management policy described in [45], with a period equal to the maximum end-to-end latency associated with each series of experiments. As mentioned above, two different k-AMTs were used in each series of experiments. 50% of the experiments were carried out over a k-AMT of depth 3, which included 4 leaf nodes in total and was characterized by the parameters M = 2 and k = 1. Thus, the four k-AMT leaf nodes were grouped in two clusters of two leaf nodes each. Moreover, one additional link at each k-AMT leaf node was used for transmitting replicas of the data streams. The scenario within which the multimedia data stream exchange occurred in this first k-AMT structure was as follows. Two leaf nodes of the k-AMT, sharing a parent node, acted as sources of data streams, while the other two leaf nodes acted as data stream destinations (in essence, the source nodes simulated a video frame source and an audio frame source, respectively; the destination nodes simulated the corresponding output devices). The second k-AMT structure we used was characterized by 8 leaf nodes, depth 4, M = 2, and k = 2, yielding four clusters of two k-AMT leaves each, and two additional links per k-AMT node. The scenario within which the multimedia data stream exchange occurred was the following. Three leaf nodes of the k-AMT acted as data stream sources (namely, two video frame sources and one audio frame source). Two of these three source nodes shared a grandparent node in the k-AMT, whereas the third source node was situated in a different subtree. Yet again, the remaining five k-AMT leaf nodes acted as data stream destinations.
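The playback-deadline rule can be sketched as follows: an FSMDS counts as isochronously rendered only if every destination holds it by $F = O + D$, and as "lost" otherwise, which mirrors how lost FSMDSs are counted in the throughput statistics. The arrival-time bookkeeping and the function names are hypothetical; the (D, d) values are those of the three series.

```python
# (D, d) pairs, in ms, for the three series of experiments described above.
SERIES = {1: (500, 75), 2: (300, 50), 3: (150, 30)}


def playback_time(origin_ms, d_max_ms):
    """Prescheduled playback time F = O + D."""
    return origin_ms + d_max_ms


def isochronously_rendered(origin_ms, arrivals_ms, d_max_ms):
    """True if the FSMDS reached every destination by F.

    arrivals_ms: destination -> arrival time in ms, or None if it never arrived.
    Otherwise the FSMDS suffered an unrecoverable timing fault and counts as lost.
    """
    f = playback_time(origin_ms, d_max_ms)
    return all(t is not None and t <= f for t in arrivals_ms.values())


def lost_count(fsmds_list, d_max_ms):
    """fsmds_list: iterable of (origin_ms, arrivals_ms) pairs."""
    return sum(1 for o, arr in fsmds_list if not isochronously_rendered(o, arr, d_max_ms))
```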

Each experiment was carried out when the underlying Ethernet was lightly loaded. In addition, each k-AMT node (either source, or destination, or synchronizer) was implemented as a set of communicating processes running in one of the eight Sun SPARC 5 workstations, keeping each different k-AMT leaf node in a separate workstation. Finally, no specific hardware was used for the acquisition and the display of digital video and audio. Instead, we simulated the activities of both digitization and compression at the source nodes, and of decompression and display at the destination nodes. To this end, we assumed the use of color video frames (at a x resolution) compressed in approximately to kilobits, combined with audio frames of kilobits. For the purposes of this simulation, we also assumed that those activities of continuous acquisition and display of the video frames required an average period of 33 milliseconds. Tables 1, 2, 3, 4, 5, and 6 summarize the results of our experimentation. Each of those Tables is relative to one of the three values of the end-to-end maximum latency introduced above. For each experiment these tables report: i) the total number of data streams that were generated by the source nodes during the time interval in which the experiment was carried out, ii) the total number of FSMDSs that were isochronously played out by all the destination nodes at the same scheduled play out time, iii) the total number of FSMDSs that were either not played out isochronously, or never played out at all by the set of all the destination nodes, and finally, iv) the percentage of “lost” FSMDs. Tables 1, 2, and 3 are relative to the five experiments carried out over our first k-AMT structure, introduced above; instead, Tables 4, 5, and 6 describe the five experiments developed over the second k-AMT structure. As previously pointed out, the end-to-end latency in interactive audio applications should not exceed 300 milliseconds. 
In contrast, in non interactive video applications, a maximum delay of 1000 milliseconds can be tolerated [21]. Thus, the results summarized in Tables 1, 2, 4, and 5 show that our reliable transport mechanisms are extremely effective. In fact, Tables 1 and 4 show that, if a 500 milliseconds end-to-end maximum latency is chosen, then only a low average percentage of FSMDSs, (ranging from 0.9% to 2.5%, depending on the specific k-AMT architecture) was either not played out isochronously, or never played out by the destination nodes. Moreover, we observed that approximately the 90% of the total number of “lost” FSMDSs was either effectively lost, or not delivered in time for rendering, at all the different destinations. In other words, these experiments confirmed that our communication mechanisms meet satisfactorily the isochronous rendering requirement. Besides, almost no discontinuities were observed, and the audio and video streams were in perfect synchronization. Qualitatively, the results obtained when the end-to-end maximum latency was set to 300 milliseconds were also encouraging. In fact, only an average percentage of FSMDSs ranging from 2% to 4% (yet again, depending on the specific k-AMT architecture) was either not played out isochronously, or never played out at all, and the experienced video and audio discontinuities were sufficiently separate in time not to result noticeable. Moreover, the 88% of the lost FSMDSs were effectively lost at all the destination nodes. In contrast, Tables 3 and 6 show that, in our implementation, the constraint of 150 milliseconds end-to-end maximum latency causes that the synchronization between multiple audio and video data streams become very inaccurate. In fact, with this latency value, a large number of FSMDSs (ranging from 16.5% to 18%) were not delivered in time for isochronous rendering, at the kAMT destination nodes. 
Moreover, out of the ten experiments we carried out with this latency value, four failed and were interrupted, because a very high rate of packet loss was experienced at the system buffers of both the k-AMT synchronizer and the destination nodes. That packet loss was likely due to the fact that reducing both the end-to-end maximum latency D and the delay d forced the SUN OS 4.3 operating system to operate at an unsustainable rate. To conclude this Section, it is worth pointing out that, owing to the limitations imposed by our hardware and software infrastructure, we were able to carry out our experiments only with two k-AMT structures composed of a relatively small number of source and destination nodes. Within that range, however, the communication mechanisms we designed proved to be scalable with the number of k-AMT source and destination nodes. In fact, a comparison between the results reported in Tables 1 and 2 and those reported in Tables 4 and 5 indicates that there are no notable differences between the throughput statistics obtained using the two different k-AMT architectures.

experiment                 1           2           3           4           5
generated             10,000      10,000      10,000      16,000      16,000
displayed              9,908       9,918       9,929      15,842      15,866
lost                      92          82          71         158         134
lost percentage       0.92 %      0.82 %      0.71 %      0.99 %      0.84 %

Table 8. Throughput Statistics - k-AMT with 4 leaves and k=1 (D = 500 ms, d = 75 ms).

experiment                 6           7           8           9          10
generated             10,000      10,000      10,000      16,000      16,000
displayed              9,848       9,872       9,781      15,676      15,548
lost                     152         128         219         324         452
lost percentage       1.52 %      1.28 %      2.19 %      2.02 %      2.82 %

Table 9. Throughput Statistics - k-AMT with 4 leaves and k=1 (D = 300 ms, d = 50 ms).
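The average loss percentages quoted in the discussion can be recomputed from the rows of the throughput tables. The sketch below assumes that each quoted figure is the arithmetic mean of the per-experiment loss percentages, with interrupted experiments excluded; the text does not state its averaging rule, so this is an interpretation. The counts are copied from Table 8 and Table 10, while the names `TABLE_8`, `TABLE_10`, `loss_percentages`, and `average_loss` are illustrative only.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple

# (generated, displayed) FSMDS counts per experiment; None marks an interrupted run.
Run = Optional[Tuple[int, int]]

TABLE_8: Sequence[Run] = [(10_000, 9_908), (10_000, 9_918), (10_000, 9_929),
                          (16_000, 15_842), (16_000, 15_866)]
TABLE_10: Sequence[Run] = [None, (10_000, 8_522), None,
                           (16_000, 13_612), (16_000, 12_876)]


def loss_percentages(runs: Sequence[Run]) -> List[float]:
    """Per-experiment percentage of FSMDSs not displayed, skipping interrupted runs."""
    completed = (r for r in runs if r is not None)
    return [100.0 * (generated - displayed) / generated
            for generated, displayed in completed]


def average_loss(runs: Sequence[Run]) -> float:
    """Arithmetic mean of the per-experiment loss percentages (assumed averaging rule)."""
    return mean(loss_percentages(runs))


if __name__ == "__main__":
    print(f"Table 8 average loss:  {average_loss(TABLE_8):.3f} %")
    print(f"Table 10 average loss: {average_loss(TABLE_10):.3f} %")
```

Under this reading the mean is 0.855% for Table 8 and 16.41% for Table 10 (three completed runs), in line with the lower ends of the 0.9% and 16.5% figures quoted above.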

7 Related Work

Relevant results have been achieved in a number of problem areas concerning the design of distributed multimedia systems; however, there is relatively little work on the problem of supporting critical DMMAs that require the isochronous rendering of real-time data and may need recovery from the loss of video and audio [40]. For example, the ST family of protocols (namely, ST-II and ST2+) [17] provides connection-oriented, multicast-based, real-time mechanisms for receiver-initiated communications; these mechanisms allow receivers to join streams, specify their QOS requirements, and initiate stream establishment and resource reservation. However, neither ST-II nor ST2+ provides explicit support for group communications. Thus, the group abstraction must be supported by higher-level protocols; in addition, recovery from the loss of video and audio is outside the scope of these protocols. Finally, no support is provided for guaranteeing the isochronous rendering of multicast multimedia data at geographically separated destinations. The approach presented by Jeffay et al. in [28] has been proposed and validated principally for point-to-point real-time communications. In [28], a Forward Error Correction method is proposed that ameliorates the effects of audio frame losses but does not address recovery from video frame losses. In particular, audio frames are transmitted multiple times over the same audio channel. Following that policy, the authors report that approximately 80% of the lost audio frames can be recovered (at the cost of 10% additional bandwidth) by transmitting each audio frame twice.
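The idea of transmitting each audio frame twice can be illustrated with a small simulation. The sketch below assumes that packet k carries frame k together with a redundant copy of frame k-1, so that every frame except the last travels in two consecutive packets; this packet layout is an assumption made for illustration and is not necessarily the one used in [28]. A frame is recovered if at least one of the packets carrying it arrives. The function names are hypothetical.

```python
from typing import Dict, List, Set


def packetize(frames: List[str]) -> List[Dict[int, str]]:
    """Packet k carries frame k plus a redundant copy of frame k-1 (assumed layout)."""
    packets = []
    for k, frame in enumerate(frames):
        carried = {k: frame}
        if k > 0:
            carried[k - 1] = frames[k - 1]  # second transmission of the previous frame
        packets.append(carried)
    return packets


def receive(packets: List[Dict[int, str]], lost: Set[int]) -> Dict[int, str]:
    """Rebuild the frame sequence from every packet whose index is not in `lost`."""
    recovered: Dict[int, str] = {}
    for k, carried in enumerate(packets):
        if k in lost:
            continue
        for seq, frame in carried.items():
            recovered.setdefault(seq, frame)  # keep the first copy that arrives
    return recovered


def missing_frames(n_frames: int, recovered: Dict[int, str]) -> List[int]:
    """Sequence numbers for which no copy arrived."""
    return [seq for seq in range(n_frames) if seq not in recovered]


frames = [f"audio-{i}" for i in range(5)]
packets = packetize(frames)
print(missing_frames(len(frames), receive(packets, lost={2})))     # []
print(missing_frames(len(frames), receive(packets, lost={2, 3})))  # [2]
```

With this layout an isolated packet loss is always masked, because the lost frame also travels in the next packet, whereas a burst of two consecutive losses still drops the frame that both packets carried.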

experiment                11          12          13          14          15
generated        interrupted      10,000 interrupted      16,000      16,000
displayed        interrupted       8,522 interrupted      13,612      12,876
lost             interrupted       1,478 interrupted       2,388       3,124
lost percentage  interrupted     14.78 % interrupted     14.93 %     19.52 %

Table 10. Throughput Statistics - k-AMT with 4 leaves and k=1 (D = 150 ms, d = 30 ms).


experiment                16          17          18          19          20
generated             10,000      10,000      10,000      16,000      16,000
displayed              9,802       9,798       9,818      15,496      15,476
lost                     198         202         182         504         524
lost percentage       1.98 %      2.02 %      1.82 %      3.15 %      3.27 %

Table 11. Throughput Statistics - k-AMT with 8 leaves and k=2 (D = 500 ms, d = 75 ms).

experiment                21          22          23          24          25
generated             10,000      10,000      10,000      16,000      16,000
displayed              9,652       9,674       9,701      15,194      15,288
lost                     348         326         299         806         712
lost percentage       3.48 %      3.26 %      2.99 %      5.03 %      4.45 %

Table 12. Throughput Statistics - k-AMT with 8 leaves and k=2 (D = 300 ms, d = 50 ms).

In the Tenet Protocol Suite I [5], fault-handling and recovery mechanisms are proposed for simplex, unicast channels with performance guarantees. Techniques have been devised that reroute connections so as to bypass failed nodes and links while maintaining the negotiated performance guarantees. Essentially, when a node or a link fails, the system attempts to recover all the different channels that traverse the failed component. However, owing to the unicast nature of the communications in Suite I, no multicast group information is available that can be used to set up the recovery process. This results in parallel and separate attempts to recover the interrupted channels, yielding a very low rate of successfully rerouted connections. In contrast, in the Tenet Protocol Suite II [6], these failure-recovery techniques are improved by means of a set of new protocols that support multi-party, real-time communications. The notion of a real-time multicast group (termed Target Set) is introduced; this is a real-time analog of the IP Host Group abstraction (i.e. the list of all the destinations interested in “listening” to a common session). The Target Set abstraction decouples the senders and the receivers of a given DMMA, and provides support for the management of the connections between them. In addition, the Tenet Protocol Suite II provides the so-called Sharing Group abstraction, which allows some form of resource sharing among those network clients whose resource requests partially overlap. Policies have also been proposed for storing appropriate state information concerning both the Target Sets and the Sharing Groups in the network components, as well as mechanisms both for reestablishing the connectivity of failed channels and for permitting new connections [26]. These policies have been designed based on the so-called fate-sharing principle [9].
experiment                26          27          28          29          30
generated        interrupted      10,000      10,000 interrupted      16,000
displayed        interrupted       8,104       8,354 interrupted      13,102
lost             interrupted       1,896       1,646 interrupted       2,898
lost percentage  interrupted     18.96 %     16.46 % interrupted     18.11 %

Table 13. Throughput Statistics - k-AMT with 8 leaves and k=2 (D = 150 ms, d = 30 ms).

The fate-sharing principle states that it is acceptable to lose the state information associated with an application component, as long as that component is lost as well. The designers of Tenet Protocol Suite II exploited this principle to decide at which nodes in the network the appropriate Target Set and Sharing Group state information should be placed. For example, critical information associated with the establishment of a given channel is maintained at the channel source node, while the state information concerning a given Target Set is maintained at each destination node. Finally, backward error recovery mechanisms are implemented in the Tenet Protocol Suite II. These mechanisms use the state information distributed across the network in order to overcome the negative effects that may result from failures of network components. In particular, possible component failures are first detected, using time-out-based or monitoring-based techniques; then, based on the available state information, failed multicast trees are repaired or rebuilt, alternative routes computed, failed connections reestablished, and network resources reallocated, in the face of failures that can make part of the network inaccessible. However, backward error recovery policies, or, in general, automatic repeat request (ARQ) schemes, which retransmit corrupted packets according to receiver-generated feedback, may prove impractical in a time-critical multimedia environment, since the overhead caused by the failure detection and repair activities could be unacceptable in most applications. Besides, since ARQ schemes introduce additional delay and jitter into a media stream, additional buffering would be required at the receiver to smooth out their impact. As an alternative to ARQ schemes, it is possible to avoid using feedback and retransmissions to control error rates by employing error concealment techniques (e.g. computing an approximate value for missing pieces of the data through interpolation from neighboring values), or Forward Error Correction (FEC) techniques.
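The error concealment alternative mentioned above, which interpolates missing data from neighboring values, can be sketched as follows. The example assumes that each frame has been reduced to a single numeric sample and that lost frames are marked with `None`; a gap between two received samples is filled by linear interpolation, and a gap at either end of the stream is filled by repeating the nearest received sample. Both the scalar-sample model and the edge rule are simplifying assumptions, not a technique taken from the systems discussed here, and `conceal` is a hypothetical name.

```python
from bisect import bisect_left
from typing import List, Optional


def conceal(samples: List[Optional[float]]) -> List[float]:
    """Fill lost samples (None) from their nearest received neighbours."""
    known = [i for i, s in enumerate(samples) if s is not None]
    if not known:
        raise ValueError("no received samples to interpolate from")
    out: List[float] = []
    for i, s in enumerate(samples):
        if s is not None:
            out.append(s)
            continue
        pos = bisect_left(known, i)      # index in `known` of the first received sample after i
        if pos == 0:                     # gap at the start: repeat the first received sample
            out.append(samples[known[0]])
        elif pos == len(known):          # gap at the end: repeat the last received sample
            out.append(samples[known[-1]])
        else:                            # interior gap: linear interpolation
            left, right = known[pos - 1], known[pos]
            w = (i - left) / (right - left)
            out.append(samples[left] + w * (samples[right] - samples[left]))
    return out


print(conceal([0.0, None, None, 3.0, None]))
```

For the input shown, the two interior gaps become 1.0 and 2.0 (up to floating-point rounding) and the trailing gap repeats 3.0. No feedback channel or retransmission is involved, which is why such techniques add no round-trip delay, at the price of only approximating the lost data.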
Planning in advance for error control typically involves embedding redundant information in the transmitted streams, making packets self-contained, or transmitting critical packets more than once [48, 27]. Following these ideas, the Tenet Protocol Suite II designers suggest employing a Forward Error Correction (FEC) technique in conjunction with a multiple-channel reservation scheme for sending multiple redundant copies of the data to the set of destinations. Unfortunately, to the best of our knowledge, the Tenet Suite II designers have not made available enough details about this FEC-based policy to assess its effectiveness. Finally, the Tenet Suite II protocols provide no support for the isochronous rendering of real-time multimedia data at different destinations. Another interesting FEC-based policy is employed in the INRIA audio tool to recover from the loss of audio packets [7]. That tool adjusts the audio packet send rate to the current network conditions, adds redundant information to each packet (in the form of highly compressed versions of a number of previous packets) when the loss rate exceeds a certain threshold, and establishes a feedback channel to control the send rate and the redundant information. The whole process is controlled by a feedback loop that selects among different available compression schemes and determines the amount of redundancy needed. Thus, for example, if the network load and the packet loss rate are high, the amount of redundant information carried in each packet is increased by adding the compressed versions of the previous two to four packets. Every 5 seconds, the receiver returns quality of service reports to the sender in order to regulate and adapt the quantity of redundant information being sent.
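The adaptive redundancy scheme described for the INRIA audio tool can be sketched as a sender-side rule. The rule maps the loss rate carried by the latest receiver report to the number of previous packets piggybacked, in compressed form, on each new packet. The loss-rate thresholds, the level table, and the `compress` callback below are hypothetical: the text gives neither the actual values nor the codecs, and this is not the tool's code.

```python
from bisect import bisect_right
from typing import Callable, Dict, List

LOSS_THRESHOLDS = (0.02, 0.05, 0.10, 0.20)  # hypothetical loss-rate breakpoints
REDUNDANCY_LEVELS = (0, 1, 2, 3, 4)         # previous packets piggybacked (hypothetical)


def redundancy_level(loss_rate: float) -> int:
    """Number of earlier packets to piggyback, given the last reported loss rate."""
    return REDUNDANCY_LEVELS[bisect_right(LOSS_THRESHOLDS, loss_rate)]


def build_packet(frames: List[bytes], k: int, level: int,
                 compress: Callable[[bytes], bytes]) -> Dict[str, object]:
    """Packet k: the primary frame plus compressed copies of up to `level` predecessors."""
    redundant = {j: compress(frames[j]) for j in range(max(0, k - level), k)}
    return {"seq": k, "primary": frames[k], "redundant": redundant}


# Example: a report announcing 12% loss raises the level, and packet 5 then
# carries compressed copies of packets 2, 3 and 4. The "codec" that keeps
# only the first byte stands in for a real low-bit-rate encoder.
frames = [bytes([i]) * 4 for i in range(6)]
level = redundancy_level(0.12)
packet = build_packet(frames, 5, level, compress=lambda f: f[:1])
print(level, sorted(packet["redundant"]))
```

Below the first threshold no redundancy is added, matching the description that redundant information is sent only once the loss rate exceeds a threshold; the upper levels correspond to the two-to-four previous packets mentioned for high loss.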
With our approach to fault tolerance, the experiments we have carried out (described in the previous Section) have empirically shown that our protocols ensure that: i) only a low percentage (from 0.9% to 4%) of audio and video frames is not isochronously displayed at all the destinations of a given DMMA, and ii) the isochronous rendering of audio-video frames can be obtained with an acceptable end-to-end latency (i.e. within the range of 300 to 500 milliseconds), even on a poor computing and communication infrastructure. Support for recovering from the loss of video/audio frames is not the only important requirement to meet in the design of multimedia protocols; thus, we devote the remainder of this Section to discussing further important differences between our general approach to the design of real-time communication support for multimedia applications and other relevant approaches. To this end, we compare our approach below with the Receiver-initiated Stream Protocol Version II (Receiver-initiated ST II, for short) proposed in [16], the transport and display mechanisms for multimedia conferencing presented by Jeffay et al. in [28], and the media mixing strategy proposed by Rangan et al. in [58]. In particular, we compare and evaluate these approaches with reference to the following five features: i) communication model; ii) end-to-end latency; iii) jitter control mechanisms; iv) group-membership management mechanisms and their additional overheads; and v) satisfiability of the IR requirement.

Communication model. We have proposed a synchronization strategy for DMMAs that support many-to-many real-time communications. The Receiver-initiated ST II proposal and that by Rangan et al. have been explicitly designed and tested to support the one-to-many and the many-to-many real-time communication models, respectively. The approach proposed by Jeffay et al., instead, has mainly been tested and validated for point-to-point real-time communications.

End-to-end latency. Studies in the field of interactive audio show that the end-to-end latency should range from 100 to 300 milliseconds. Hierarchical synchronization architectures, such as those proposed by Rangan et al. and by us, may introduce additional transport delays that grow with the height of the hierarchy. However, they reduce both the bandwidth required for message reception and transmission at each single node of the communication infrastructure and the computational cost of the synchronization. In addition, experimental measures, such as those presented in [58], and our analysis of Section 3.3.3 show that, in several realistic cases, the end-to-end latency can be kept below 300 milliseconds.

Jitter control mechanism. In our approach, as in the Receiver-initiated ST II, we assume that it is possible to keep the delay jitter below a small upper bound by exploiting, for example, the resource-reservation-based scheme for delay jitter control proposed by Ferrari [20, 23]. In contrast, both Jeffay et al. and Rangan et al. approach the design of real-time communication mechanisms without using resource reservation; in principle, such an approach cannot provide guaranteed and predictable QOS. In particular, Jeffay et al. propose a “best effort” delivery protocol for digital audio and video that dynamically adapts the reception and transmission frame rates to the bandwidth available in the network.

Group management mechanisms. Neither the proposal by Jeffay et al. nor that by Rangan et al. explicitly supports group management. In the Receiver-initiated ST II approach, instead, mechanisms are provided for receiver-initiated communications that allow receivers to join streams, specify their QOS, and initiate stream establishment and resource reservation. The authors of the Receiver-initiated ST II have shown that there is a trade-off between the scalability of their protocol and the functions it provides. Maximum scalability is obtained if the origin of the multicast tree (i.e. the source of the DMMA) is unaware of the receivers; in this case, however, some global functions cannot be executed by the origin. Conversely, if the origin is aware of all the receivers, it has more control and may execute all the expected global functions, but the number of control messages increases accordingly. In addition, the Receiver-initiated ST II protocol makes no attempt to cope with possible communication faults while executing group management. In contrast, in our approach, multiple sources and receivers are allowed to initiate the operations for joining and leaving an existing DMMA, and the k-AMT root maintains full control of the group membership management. Moreover, confirmation messages flow periodically from the sources and destinations to the k-AMT root, and from the root back to the DMMA sources and destinations, which makes our group membership protocol robust in spite of communication faults. Our analysis in Section 4.4.1 has shown that, for a fixed available bandwidth, this periodic flow of control messages causes a periodic additional end-to-end latency ranging from 10% to 60%, depending on the particular k-AMT communication architecture.

IR requirement. In the “best effort” approach proposed by Jeffay et al., the audio and video streams were found never to be more than 100 milliseconds out of synchronization. The mixing algorithm proposed by Rangan et al. provides conditions to be met in order to obtain isochrony. Finally, our approach allows the DMMA designer to meet the IR requirement.
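To make the role of the maximum end-to-end latency D more concrete, the sketch below implements a simple fixed-deadline playout rule. It rests on two assumptions not spelled out in this Section: source and destination clocks are synchronized, and every frame carries its generation timestamp. Each destination plays a frame at its generation time plus D, so all destinations render it at the same instant, and a frame that arrives after that instant is discarded and counted as lost. This illustrates the isochronous rendering requirement only; it is not the k-AMT synchronization algorithm, and the names `Arrival` and `schedule_playout` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Arrival:
    seq: int             # frame sequence number
    generated_ms: float  # generation timestamp at the source (synchronized clocks assumed)
    arrived_ms: float    # arrival time at this destination


def schedule_playout(arrivals: Iterable[Arrival],
                     max_latency_ms: float) -> Tuple[Dict[int, float], List[int]]:
    """Return (seq -> playout time, sorted list of frames that missed their deadline)."""
    schedule: Dict[int, float] = {}
    late: List[int] = []
    for a in arrivals:
        deadline = a.generated_ms + max_latency_ms  # identical at every destination
        if a.arrived_ms <= deadline:
            schedule[a.seq] = deadline
        else:
            late.append(a.seq)
    return schedule, sorted(late)


arrivals = [Arrival(0, 0.0, 100.0), Arrival(1, 40.0, 200.0), Arrival(2, 80.0, 120.0)]
print(schedule_playout(arrivals, 150.0))  # frame 1 misses its 190 ms deadline
print(schedule_playout(arrivals, 300.0))  # all three frames are played out
```

Shrinking D from 300 ms to 150 ms turns frame 1 into a late frame, a small-scale version of the growth in undelivered FSMDSs observed with D = 150 ms in the experiments.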


8 Conclusions

In this paper we have introduced both the general design and the prototype implementation of a communication architecture developed to support critical DMMAs. We have shown that this architecture can effectively meet such DMMA requirements as synchronization and isochronous rendering of multimedia data streams, group management and communications, scalability, and dependability. The performance measurements of our prototype implementation indicate that a redundancy-based approach is adequate for providing critical DMMAs with reliable communications. In the near future, we expect to assess our architecture on a wider (and more suitable) distributed infrastructure than the one described in this paper, namely an ATM-based communication infrastructure. Finally, further design issues that we will investigate include the design of k-AMT reconfiguration strategies that deal with failures of the k-AMT root node, the design of QoS negotiation services, and the design of appropriate security mechanisms for critical DMMAs.

References

[1] T. Anderson, P. A. Lee, Fault Tolerance - Principles and Practice, London, Prentice-Hall International, 1981.
[2] D. P. Anderson, Y. Osawa, R. Govindan, A File System for Continuous Media, ACM Transactions on Computer Systems, Vol. 10, N. 4, November 1992, pp. 311-337.
[3] O. Babaoglu, R. Davoli, L. A. Giachini, M. G. Baker, RELACS: A Communications Infrastructure for Constructing Reliable Applications in Large-Scale Distributed Systems, Proc. of the 28th Hawaii International Conference on System Sciences, Maui, January 1995.
[4] J. P. Banatre et al., The Design and Building of Enchere, a Distributed Electronic Marketing System, Communications of the ACM, Vol. 29, N. 1, January 1986, pp. 19-29.
[5] A. Banerjea, D. Ferrari, B. A. Mah, M. Moran, D. C. Verma, H. Zhang, The Tenet Real Time Protocol Suite: Design, Implementation and Experiences, IEEE/ACM Transactions on Networking, Vol. 4, N. 1, February 1996, pp. 1-10.
[6] R. Bettati et al., Connection Establishment for Multi-Party Real-Time Communication, Proc. NOSSDAV 1995, Durham, New Hampshire, April 18-22, 1995.
[7] J. Bolot, A. Vega Garcia, The Case for FEC-based Error Control for Packet Audio in the Internet, to appear in ACM Multimedia Systems Journal.
[8] A. Campbell, G. Coulson, D. Hutchison, A Multimedia Enhanced Transport Service in a Quality of Service Architecture, Proc. 4th International Workshop on Network and Operating System Support for Digital Audio and Video, Lancaster (U.K.), November 1993, pp. 123-136.
[9] D. Clark, The Design Philosophy of the DARPA Internet Protocol, Proc. of ACM SIGCOMM'88, Stanford (CA), pp. 106-114.
[10] G. Coulson, F. Garcia, D. Hutchison, D. Shepherd, Protocol Support for Distributed Multimedia Applications, in Network and Operating System Support for Digital Audio and Video, R. G. Herrtwich (Ed.), LNCS 614, Springer-Verlag, Berlin Heidelberg, 1992, pp. 45-56.
[11] G. Coulson, F. Garcia, D. Hutchison, D. Shepherd, Meeting the Real-Time Synchronization Requirements of Multimedia in Open Distributed Processing, Distrib. Syst. Eng., Vol. 1, 1994, pp. 135-144.
[12] F. Cristian, Reaching Agreement on Processor-Group Membership in Synchronous Distributed Systems, Distributed Computing, Vol. 4, 1994, pp. 175-187.
[13] F. Cristian, H. Aghili, R. Strong, D. Dolev, Atomic Broadcast: from Simple Diffusion to Byzantine Agreement, Information and Computation, Vol. 118, 1995, pp. 158-179.
[14] S. Deering, Host Extension for IP Multicasting, RFC 1112, 1992.
[15] S. Deering, D. Estrin, D. Farinacci, V. Jacobson, C. G. Liu, L. Wei, An Architecture for Wide-Area Multicast Routing, Proc. of ACM SIGCOMM'94, London (UK), August 1994, pp. 126-136.
[16] L. Delgrossi, R. G. Herrtwich, F. O. Hoffmann, S. Schaller, Receiver-Initiated Communication with ST-II, ACM Multimedia Systems Journal, Vol. 2, N. 4, 1994, pp. 141-149.

[17] L. Delgrossi, C. Halstrick, D. Hehmann, R. G. Herrtwich, O. Krone, J. Sandvoss, C. Vogt, Media Scaling in a Multimedia Communication System, ACM Multimedia Systems Journal, Vol. 2, N. 4, 1994, pp. 172-181.
[18] H. Eriksson, MBONE: The Multicast Backbone, Communications of the ACM, Vol. 37, N. 8, 1994, pp. 54-60.
[19] J. Escobar, C. Partridge, D. Deutsch, Flow Synchronization Protocol, IEEE/ACM Transactions on Networking, Vol. 2, N. 2, 1994, pp. 111-121.
[20] D. Ferrari, D. C. Verma, A Scheme for Real-Time Channel Establishment in Wide Area Networks, IEEE Journal on Selected Areas in Communications, SAC-8, April 1990, pp. 368-379.
[21] D. Ferrari, Client Requirements for Real-time Communication Services, IEEE Commun. Mag., Vol. 28, N. 11, 1990, pp. 65-72.
[22] D. Ferrari, A. Gupta, M. Moran, B. Wolfinger, A Continuous Media Communication Service and its Implementation, Proc. Globecom'92, Orlando, Florida.
[23] D. Ferrari, Design and Application of a Delay Jitter Control Scheme for Packet-switching Internetworks, in Network and Operating System Support for Digital Audio and Video, R. G. Herrtwich (Ed.), LNCS 614, Springer-Verlag, Berlin Heidelberg, 1992, pp. 72-83.
[24] D. Ferrari, Multimedia Network Protocols: Where Are We?, ACM Multimedia Systems Journal, Vol. 4, 1996, pp. 299-304.
[25] I. Getting, The Global Positioning System, IEEE Spectrum, December 1993, pp. 36-47.
[26] A. Gupta, K. Rothermel, Fault Handling for Multi-party Real Time Communication, TR-95059, International Computer Science Institute, Berkeley (CA), October 1995.
[27] V. Hardman, M. A. Sasse, I. Kouvelas, Successful Multiparty Audio Communication over the Internet, Communications of the ACM, Vol. 41, N. 5, May 1998, pp. 74-80.
[28] K. Jeffay, D. L. Stone, F. Donelson Smith, Transport and Display Mechanisms for Multimedia Conferencing across Packet-Switched Networks, Computer Networks and ISDN Systems, Vol. 26, N. 10, 1994, pp. 1281-1304.
[29] K. Jeffay, D. L. Stone, F. Donelson Smith, Kernel Support for Digital Audio and Video, in Network and Operating System Support for Digital Audio and Video, R. G. Herrtwich (Ed.), LNCS 614, Springer-Verlag, Berlin Heidelberg, 1992, pp. 10-21.
[30] T. A. Joseph, K. P. Birman, Reliable Broadcast Protocols, in An Advanced Course on Distributed Systems, S. J. Mullender (Ed.), Addison-Wesley Publishing Co., 1989, pp. 313-338.
[31] L. Kleinotz, M. Ohly, Supporting Cooperative Medicine: The Bermed Project, IEEE Multimedia, Vol. 1, N. 4, 1994, pp. 44-53.
[32] H. Kopetz, G. Grunsteidl, J. Reisenger, Fault-Tolerant Membership Service in a Synchronous Distributed Real-Time System, Proc. Int. Conf. on Dependable Computing for Critical Applications, Santa Barbara (CA), August 23-25, 1989, pp. 167-174.
[33] H. Kopetz, G. Grunsteidl, TTP, A Time-Triggered Protocol for Fault-Tolerant Real-Time Systems, Research Report Nr. 12/92/2, Institut für Technische Informatik, Technische Universität Wien, September 1992.
[34] J. Kurose, Open Issues and Challenges in Providing Quality of Service Guarantees in High Speed Networks, ACM SIGCOMM Computer Communication Review, Vol. 23, N. 1, January 1993, pp. 6-15.
[35] P. Leydekkers, B. Teunissen, Synchronization of Multimedia Data Streams in Open Distributed Environments, in Network and Operating System Support for Digital Audio and Video, R. G. Herrtwich (Ed.), LNCS 614, Springer-Verlag, Berlin Heidelberg, 1992, pp. 94-104.
[36] T. D. C. Little, A. Ghafoor, Spatio-Temporal Composition of Distributed Multimedia Objects for Value-Added Networks, IEEE Computer, October 1991, pp. 43-50.
[37] T. D. C. Little, A. Ghafoor, Interval Based Conceptual Models for Time Dependent Multimedia Data, IEEE Transactions on Knowledge and Data Engineering, Vol. 5, N. 4, August 1993, pp. 551-563.
[38] P. M. Melliar-Smith, L. E. Moser, V. Agrawala, Broadcast Protocols for Distributed Systems, IEEE Trans. on Parallel and Distributed Systems, Vol. 1, N. 1, January 1990, pp. 17-25.
[39] L. E. Moser, Y. Amir, P. M. Melliar-Smith, V. Agrawala, Extended Virtual Synchrony, Proc. IEEE ICDCS-14, Poznan, Poland, June 1994, pp. 56-65.

[40] K. Naik, Exception Handling and Fault Tolerance in Multimedia Synchronization, IEEE Journal on Selected Areas in Communications, Vol. 14, N. 1, January 1996, pp. 196-211.
[41] P. G. Neumann, On Hierarchical Design of Computer Systems for Critical Applications, IEEE Trans. on Software Engineering, Vol. SE-12, N. 9, 1986, pp. 905-920.
[42] F. Panzieri, S. K. Shrivastava, The Design of a Reliable Remote Procedure Call Mechanism, IEEE Transactions on Computers, Vol. C-31, N. 7, July 1982, pp. 692-697.
[43] F. Panzieri, M. Roccetti, A Scalable Architecture for Reliable Distributed Multimedia Applications, Proc. IEEE ICDCS-14, Poznan, June 1994, pp. 284-293.
[44] F. Panzieri, S. K. Shrivastava, A View of Large Scale Distributed Computing, Broadcast Project Deliverable Report, October 1994.
[45] F. Panzieri, M. Roccetti, Synchronization Support and Group-Membership Services for Reliable Distributed Multimedia Applications, ACM Multimedia Systems Journal, Vol. 5, N. 1, January 1997, pp. 1-22.
[46] F. Panzieri, M. Roccetti, Communication Support for Critical Distributed Multimedia Applications: an Experimental Study, Proc. IEEE 30th Hawaii Conference on System Sciences, Maui, January 1997, pp. 34-43.
[47] J. C. Pasquale, G. C. Polyzos, G. Xylomenos, The Multimedia Multicasting Problem, ACM Multimedia Systems Journal, Vol. 6, 1998, pp. 43-59.
[48] M. Roccetti et al., Design and Experimental Evaluation of an Adaptive Playout Delay Control Mechanism for Packetized Audio for use over the Internet, UBLCS Technical Report n. 98-4, Laboratory for Computer Science, University of Bologna.
[49] K. Rothermel, G. Dermler, Synchronization in Joint-Viewing Environments, in Network and Operating System Support for Digital Audio and Video, P. Venkat Rangan (Ed.), LNCS 712, Springer-Verlag, Berlin Heidelberg, 1993, pp. 94-104.
[50] M. Satyanarayanan, The Influence of Scale on Distributed File System Design, IEEE Trans. on Software Engineering, Vol. 18, N. 1, January 1992, pp. 1-8.
[51] D. Shepherd, M. Salmony, Extending OSI to Support Synchronization Required by Multimedia Applications, Computer Communications, Vol. 13, N. 7, September 1990, pp. 399-406.
[52] A. Schiper, A. Ricciardi, Virtually Synchronous Communication Based on a Weak Failure Suspector, Proc. 23rd Int. Symp. on Fault-Tolerant Computing, June 1993, pp. 534-543.
[53] B. N. Schilit, M. M. Theimer, Disseminating Active Map Information to Mobile Hosts, IEEE Network, Vol. 8, N. 5, September/October 1994, pp. 22-32.
[54] C. Szyperski, G. Ventre, Efficient Multicasting for Interactive Multimedia Applications, Technical Report TR-93-017, International Computer Science Institute, Berkeley (CA), March 1993.
[55] A. S. Tanenbaum, Computer Networks, Prentice Hall, Englewood Cliffs (NJ), 1981.
[56] W. Tawbi, F. Horn, E. Horlait, J. B. Stefani, Video Compression Standards and Quality of Service, The Computer Journal, Vol. 36, N. 1, 1993, pp. 43-54.
[57] J. Turek, D. Shasha, The Many Faces of Consensus in Distributed Systems, IEEE Computer, Vol. 25, N. 6, June 1992, pp. 8-17.
[58] P. Venkat Rangan, H. M. Vin, S. Ramanathan, Communication Architectures and Algorithms for Media Mixing in Multimedia Conferences, IEEE/ACM Trans. on Networking, Vol. 1, N. 1, February 1993, pp. 20-30.
[59] P. Verissimo, L. Rodrigues, Group Orientation: a Paradigm for Distributed Systems of the Nineties, Proc. 3rd IEEE Workshop on Future Trends of Distributed Computing Systems, Taipei, Taiwan, April 1992.
[60] H. M. Vin, M. S. Chen, T. Barzilai, Collaboration Management in DiCe, The Computer Journal, Vol. 36, N. 1, 1993, pp. 87-96.
[61] A. Vogel, B. Kerherve, G. Von Bochmann, Distributed Multimedia and QOS: A Survey, IEEE Multimedia, Vol. 2, N. 2, 1995, pp. 10-19.
[62] B. Wolfinger, M. Moran, A Continuous Media Data Transport Service and Protocol for Real-time Communication in High Speed Networks, in Network and Operating System Support for Digital Audio and Video, R. G. Herrtwich (Ed.), LNCS 614, Springer-Verlag, Berlin Heidelberg, 1992, pp. 171-182.