High Performance Distributed Computing over ATM Networks: A Survey of Strategies

Joan Vila-Sallent, Josep Solé-Pareta
Department of Computer Architecture, Universitat Politècnica de Catalunya, Barcelona, Catalonia (Spain)
E-mail: {joanv, [email protected]}

This work has been supported by CICYT (Spanish Education Ministry) under contract TIC-95-0982-CO2-01.

Abstract

The goal of this paper is to discuss the current trends in High Performance Distributed Computing (HPDC) environments supported by ATM networks. The adoption of a high-speed network like ATM to overcome the bandwidth limitation imposed by traditional LAN technologies has moved the bottleneck to the host. The challenge for ATM-based HPDC systems is to minimize the latency inherent to networking, so that the aggregate performance is as close as possible to that of supercomputers. Since ATM will become the predominant networking technology in LANs and WANs, the achievement of this objective will bring high performance computing within reach of a greater number of users. For this purpose, as far as the protocol structure is concerned, three strategies are being considered: (i) replacing the network technology by ATM, leaving the rest unchanged; (ii) removing transport protocols and making applications and user-level libraries provide the HPDC-specific transport-level functions that ATM networks currently lack; and (iii) enhancing the structure of the HPDC environment with HPDC-specific mechanisms, which eventually may be supported by hardware. After introducing the problems associated with HPDC, we comment on the basic guidelines of each direction and discuss the principal contributions, including their most relevant results.

1. Introduction

Traditionally, high performance computing has been monopolized by vector and massively parallel supercomputers. As these systems have a rather poor price-to-performance ratio, the search for other options has become a matter of interest. The most promising proposal is the use of networked workstations, since the rapid improvement of workstation performance makes available an aggregate performance potential that is comparable to what supercomputers can achieve today. This fact, together with the lower cost of a workstation cluster, is the key to the current success of this approach. In a usual network of workstations, some of the machines remain idle during significant periods of time, and these periods can be used to let the workstations run parallel jobs. In a particular case presented in [1], a system based on workstations runs only 10 times slower than a massively parallel processor-based computer.

Networks of computers, either LAN or WAN, can for many purposes be seen as multicomputers, since both architectures share the structure of processing nodes linked by an interconnection network [5]. The characteristics of the interconnection networks of the two models are not comparable, however. In addition to reliability and security problems, the standard LAN and WAN technologies have become the hardest bottleneck precluding full benefit from the potential capabilities of today's workstations. The introduction of faster and more flexible network technology, such as ATM, will help minimize the performance degradation inherent to networking. The ATM technique was designed to provide the transport service for the B-ISDN (Broadband Integrated Services Digital Network), and thus its main feature is its flexibility in accommodating variable transmission speeds and high-burstiness traffic. These characteristics are also interesting for computer data communications, hence most current ATM environments are implemented in LANs. Workstation clusters experience high burstiness as well, so ATM may also prove to be an adequate high-speed networking technology for them.

There exist two main families of HPDC systems: message-passing and shared-memory. The latter are easier for the programmer to manage, but message-passing systems adapt better to the architecture of workstation clusters, hence most works have focused on this model, although there are also proposals for optimizing a shared-memory model over ATM [14]. Message-passing systems consist of three architectural levels: (1) the interface, usually in charge of a user-level library such as p4, MPI or PVM [6]; (2) the transport protocols, which map the library's communication primitives to the characteristics of the network service; and (3) the network service itself. In order to achieve the desired performance, this architecture has to be implemented in such a manner that the performance degradation it necessarily introduces is as low as possible.

In HPDC systems, the main challenge is to minimize the impact of the network on the global performance [4]. The increasing bandwidth provided by emerging networking technologies is an important step towards this goal; however, other factors contributing to communication time, like transport protocols and network interfaces, have not improved at the same rate [13]. Therefore, the network technology is no longer the main bottleneck, and other issues residing in the host, both hardware and software, have to be considered.

In the present paper we explore several approaches which have been proposed in the literature in order to alleviate the contribution of networking to performance degradation: (i) replacing the network technology by ATM, without any change in the protocol structure; (ii) removing transport protocols and making applications and user-level libraries provide the HPDC-specific transport-level functions that ATM currently lacks; and (iii) introducing new, HPDC-specific mechanisms in the HPDC environment, which eventually may be supported by hardware and affect all levels of the communications architecture.

This paper is organized as follows: Section 2 reviews the performance of legacy-network-based HPDC systems and introduces the potential advantages of ATM. In Section 3 the replacement of much of the protocol infrastructure by an ATM-specific API (Application Programming Interface) is discussed. New, HPDC-specific mechanisms for enhancing the structure of HPDC environments over ATM are described in Section 4. Finally, Section 5 summarizes the paper.

2. Replacing traditional LAN technologies

The attractiveness of parallel computing over LANs has led to studies of both the performance of workstation clusters over legacy networks and the limitations these networks impose, as in [4], where several cluster environments based on legacy LANs are compared to multicomputers. The achieved performance can potentially be improved with the introduction of a high-speed network technology such as ATM. In [10], the performance achieved with traditional LAN technologies has been measured and compared to that of an equivalent environment when the traditional technology is replaced by ATM.

2.1. Performance achieved with legacy LAN technologies

The measurements in [4] include both communication potential and benchmark performance. The communication tests show that for both Ethernet and FDDI the latency is between two and three milliseconds, much higher than the 700 µs experienced by a multicomputer switch. The benchmark measurements show that:

- With Ethernet, no speedup is experienced when increasing the number of processors, due to the high latency and the low bandwidth resulting from the fact that communications take place through a shared medium. Concurrent communication requests are serialized, since only one host can be transmitting at a given time.

- With FDDI, a reasonable speedup is shown up to eight processors. Beyond that, FDDI saturates and suffers from the same problems as Ethernet.

From these results, the most popular LAN technologies, like Ethernet and FDDI, cannot be expected to provide the connectivity, bandwidth, or low latency needed to carry out some of these applications. In contrast, switched networks such as those used in multicomputers can offer fully bidirectional communication over all the links, thereby facilitating simultaneous communications among different processors. ATM is potentially capable of overcoming this cause of performance degradation since it is a switched technology.

Table 1. Performance under sockets and TCP/IP.

Technology   Throughput (Mb/s)   Startup latency (s)
Ethernet     8.40                1.053 × 10^-3
FDDI         17.20               1.833 × 10^-3
ATM          16.72               1.960 × 10^-3
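The startup latencies in Table 1 (and in Table 3 below) come from socket-level echo tests. The following is a minimal sketch of such a round-trip measurement, not the actual benchmark code of [10]; the server address and port are hypothetical placeholders, and a real test would loop over many iterations and message sizes and average the results.

/* rtt.c -- minimal TCP round-trip timing sketch (illustrative only). */
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

#define SERVER_ADDR "192.168.1.10"   /* hypothetical echo server */
#define SERVER_PORT 7                /* classic echo port        */
#define MSG_SIZE    64               /* a "short" message        */

int main(void)
{
    char buf[MSG_SIZE] = {0};
    struct timeval t0, t1;
    struct sockaddr_in srv;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    memset(&srv, 0, sizeof srv);
    srv.sin_family = AF_INET;
    srv.sin_port = htons(SERVER_PORT);
    srv.sin_addr.s_addr = inet_addr(SERVER_ADDR);
    if (connect(s, (struct sockaddr *)&srv, sizeof srv) < 0) {
        perror("connect");
        return 1;
    }

    gettimeofday(&t0, NULL);
    write(s, buf, MSG_SIZE);    /* send the probe                        */
    read(s, buf, MSG_SIZE);     /* wait for the echo (a real test would  */
                                /* loop until all MSG_SIZE bytes arrive) */
    gettimeofday(&t1, NULL);

    printf("round trip: %ld us\n",
           (long)((t1.tv_sec - t0.tv_sec) * 1000000L
                  + (t1.tv_usec - t0.tv_usec)));
    close(s);
    return 0;
}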

2.2. Performance achieved with ATM

The performance achieved by replacing traditional LAN technologies with ATM is analyzed in [10, 13]. Table 1 shows the results presented in [10], where the performance over ATM is compared to Ethernet and FDDI; in the three cases, the messages are generated through a UNIX socket API. These results show that replacing legacy LAN technologies with ATM does not bring significant performance improvements and that, with respect to FDDI, performance is even worse.

Table 2. Performance of ATM and two traditional LAN technologies under a simple RPC protocol. RPC times are in microseconds; speedups are relative to Ethernet.

                      Ethernet        FDDI            ATM
Activity              Short   Long    Short   Long    Short   Long
System Calls          123     671     153     280     108     560
Interrupt Handling    51      51      112     126     34      37
Total Software        174     722     265     406     142     597
Controller Latency    51      52      97      164     16      88
Time on the wire      115     1278    9       127     6       91
Total Latency         340     2052    371     697     164     776

Software Speedup      -       -       0.6     1.8     1.2     1.2
Hardware Speedup      -       -       0.5     0.3     3.2     0.6
Network Speedup       -       -       12.7    10.0    19.1    14.1
Global Speedup        -       -       2.1     2.6     0.9     3.0

Table 2 shows the results presented in [13], where the latency experienced by a lightweight RPC (Remote Procedure Call) protocol is analyzed with Ethernet, FDDI and ATM, as before. In particular, the different contributions to latency are studied separately, and two message sizes have been considered, labeled "short" and "long". The speed improvements achieved by each contribution to latency, relative to the performance over Ethernet, are also included in Table 2. These results show that the 14/19-fold bandwidth increment of ATM results in an effective speed increase of a factor of only 2, highlighting the influence of both hardware and software concerns in addition to the networking technology itself. A consequence of this fact is that, analogously to the behavior shown in Table 1, FDDI can perform better than ATM in certain cases.
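The speedup rows are simply the ratio of the Ethernet time to the corresponding time for the other technology. For instance, for short messages over ATM the table's entries follow (up to rounding) from

$$ S_{\mathrm{software}} = \frac{t^{\mathrm{Eth}}_{\mathrm{sw}}}{t^{\mathrm{ATM}}_{\mathrm{sw}}} = \frac{174}{142} \approx 1.2, \qquad S_{\mathrm{hardware}} = \frac{51}{16} \approx 3.2, \qquad S_{\mathrm{global}} = \frac{340}{164} \approx 2.1 . $$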

2.3. Discussion

The reason for the worse behavior exhibited by ATM with respect to FDDI is the absence in FDDI of functions equivalent to segmentation and reassembly, which in ATM introduce an additional processing delay that increases with the length of messages. In general, the improvement in bandwidth introduced by ATM results in a higher sensitivity to the delays introduced in the host, which include hardware-related issues, like the bus design and controller implementation, and software-related issues, such as the protocol stack exploiting the network. A deeper study of the causes of performance degradation in ATM-based HPDC systems has been carried out in [16]. In particular, these causes are related to the network size and the network load, in addition to the delays introduced in the host, which have been illustrated in this section.

Full advantage of ATM technology will only be achieved in the immediate future by minimizing the bottlenecks revealed by the measurements in Tables 1 and 2. The following sections discuss several approaches published recently in the literature which aim at this objective, as far as software is concerned. In addition to these advances, progress on hardware issues is mandatory, so research on high-speed host-network interfaces is currently very active.

3. Introducing an ATM-specific API

The main conclusion from the previous section is that the increase of bandwidth provided by ATM moves the bottleneck from the network to both the host and the protocols. Although the absolute time required for protocol processing could be reduced by the increased processing capacity of workstations, the relative impact of latency on the overall time may increase. This fact demands strategies to reduce this impact. Intuitively, one solution to overcome the overhead introduced by the protocol structure is to bypass it, allowing applications to directly access ATM through an ATM-specific API. Several papers are devoted to the study of ATM API behavior [10, 21, 3]. These papers compare the performance of the ATM API against other interfaces and evaluate its integration with existing HPDC environments. In particular, two approaches for the integration of an ATM API appear: (1) leaving some transport-layer functionality to applications, without modifying any underlying system software such as the operating system and message-passing libraries, and (2) modifying message-passing libraries so as to obtain ATM API-specific implementations. Both approaches are discussed in the following.

3.1. Performance without an adapted message-passing library

In [10], a comparative study of four APIs has been carried out: PVM, RPC, BSD sockets and the ATM API provided by Fore Systems, a principal manufacturer of ATM devices. Figure 1(a) shows the protocol structure in each case. For a basic echo test program, these combinations provide the performance values shown in Table 3.
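For reference, the PVM entries in Tables 3 and 4 correspond to echo-style exchanges written against the level-1 library primitives. The following is a minimal PVM 3 sketch of such a ping-pong, not the actual benchmark code of [10]; the executable name and message tag are arbitrary, and error handling is omitted.

/* pingpong.c -- minimal PVM 3 round-trip sketch (illustrative only).
 * The master spawns one copy of itself; each side packs an integer,
 * sends it, and waits for the reply.  Compile against libpvm3,
 * e.g.: cc pingpong.c -lpvm3 -o pingpong */
#include <stdio.h>
#include "pvm3.h"

#define TAG 1                         /* arbitrary message tag */

int main(void)
{
    int mytid  = pvm_mytid();         /* enroll in the virtual machine */
    int parent = pvm_parent();
    int payload = 42, peer;

    if (parent == PvmNoParent) {
        /* Master: spawn one worker running this same executable. */
        pvm_spawn("pingpong", NULL, PvmTaskDefault, "", 1, &peer);
        pvm_initsend(PvmDataDefault); /* level-1 primitives:          */
        pvm_pkint(&payload, 1, 1);    /* pack the message ...         */
        pvm_send(peer, TAG);          /* ... send it ...              */
        pvm_recv(peer, TAG);          /* ... and block for the echo   */
        pvm_upkint(&payload, 1, 1);
        printf("task %x: round trip done, payload=%d\n", mytid, payload);
    } else {
        /* Worker: echo the message back to the master. */
        pvm_recv(parent, TAG);
        pvm_upkint(&payload, 1, 1);
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&payload, 1, 1);
        pvm_send(parent, TAG);
    }
    pvm_exit();                       /* leave the virtual machine */
    return 0;
}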

Figure 1. Two protocol architectures considering an ATM API. (a) Protocols considered by Lin et al. [10]: applications use PVM or RPC over BSD sockets and {TCP,UDP}/IP, or access Fore's ATM API directly, in both cases over {AAL3/4, AAL5} and ATM. (b) Protocols considered by Dowd et al. [3]: applications use PVM, which accesses either BSD sockets over {TCP,UDP}/IP or Fore's ATM API, over {AAL3/4, AAL5} and ATM.

Table 3. Performance of five protocol combinations from a ping-pong test.

Protocol structure      Throughput (Mb/s)   Startup latency (s)
RPC                     12.72               2.957 × 10^-3
PVM                     12.16               2.766 × 10^-3
BSD Sockets             16.72               1.960 × 10^-3
ATM API over AAL3/4     32.56               1.034 × 10^-3
ATM API over AAL5       31.68               0.869 × 10^-3

The main result from Table 3, obtained from [10], is that the best performance is achieved by the ATM API. It is very important to remark, however, that the functionalities provided by these APIs are not comparable. The ATM API in particular offers no additional features with respect to ATM, and therefore in many cases these have to be provided by upper-layer software or the application itself, which introduces a cause of performance degradation not considered in the measurements. The real-algorithm measurements performed in [10], displayed in Table 4, show that the ATM API achieves little speedup with respect to the other APIs, which confirms this trend.

Table 4. Execution time of matrix multiplication over several APIs.

Protocol structure                  32 × 32     128 × 128   256 × 256
Sequential                          0.0988 s    6.6205 s    64.0001 s
PVM over ATM                        0.0524 s    1.9493 s    16.4005 s
PVM over Ethernet (silent)          0.0134 s    1.9693 s    16.9130 s
PVM over Ethernet (30% loaded)      0.0341 s    2.0355 s    17.2416 s
Sockets over ATM                    0.0736 s    1.9177 s    16.4030 s
Sockets over Ethernet (silent)      0.0627 s    1.9136 s    16.7187 s
Sockets over Ethernet (30% loaded)  0.0714 s    1.9932 s    16.9256 s
ATM API                             0.0629 s    1.7758 s    16.2709 s

Another issue arising from Table 4 is the fact that PVM over ATM, i.e. PVM-TCP/IP-ATM, performs better than the ATM API for small matrices, which indicates that the advantage of the API over the traditional transport protocols grows with message length.

3.2. Performance with adapted message-passing libraries

In addition to performance issues, applications relying on low-level functionality such as the ATM API become highly complex. Therefore, it is wise to adopt a more user-friendly interface which manages the additional functionality required by applications, for instance a message-passing library like PVM. Papers [21, 3] deal with this case, and both are based on the architecture shown in Figure 1(b).

In [21], the PVM library has been modified in order to support direct access to the ATM API. In addition to some PVM implementation changes, a guaranteed transmission facility has been incorporated. The performance achieved by this enhancement has been compared to PVM-IP-ATM and PVM-IP-Ethernet. A particular implementation of these functions into the PVM library is discussed in [2]. This approach has the disadvantage that a specific version of PVM is required for every vendor's ATM API, but in the future this problem can be partially solved thanks to current API standardization efforts in the ATM Forum [12].

Table 5. Performance of PVM over several architectures.

Protocol structure      Bandwidth (Mb/s)   Latency (s)
PVM over AAL5 (API)     26.608             1.617 × 10^-3
PVM over AAL3/4 (API)   17.776             1.646 × 10^-3
PVM over IP-ATM         16.848             1.234 × 10^-3
PVM over IP-Ethernet    8.608              1.662 × 10^-3

The results in Table 5, obtained from a simple ping-pong test, suggest that (1) the introduction of ATM enables faster communications, as expected, and (2) direct access to the ATM API does not bring significant improvements with respect to access through IP. Although bandwidth increases (very little in the AAL3/4 case, due to its high amount of overhead), the latency of both direct APIs is clearly worse than that of the IP-ATM structure. The likely bottlenecks are the implementations of both the ATM API and the enhanced features added to PVM. As HPDC applications are sensitive to latency [11], it is very important to overcome these problems.

In [3], a similar study has been carried out. The PVM library has been modified to enable direct access to the ATM API, in order to determine the magnitude of the performance improvement compared to ATM over TCP/IP. Figure 2 depicts the latency and throughput achieved by running simple ping-pong tests in several configurations; both ATM-API and Ethernet-based environments have been considered.

Figure 2. Performance of several software structures obtained by Dowd et al. [3].

The results show that there is little benefit from direct API access with current implementations of ATM networks. The throughput when using ATM through UDP is even superior. This behavior is attributed to the fact that, while UDP is implemented directly in the kernel, the API is implemented in user space; communicating with the device driver, which runs in kernel space, causes a context switch, which introduces a significant amount of overhead.

3.3. Discussion

The replacement of the protocol structure by an ATM API, which does not add any functionality to that of ATM, implies that some functions, like flow control and error recovery, have to be performed elsewhere, especially when AAL5 is used. If these functions are left to applications, their implementation becomes more complex and the performance improvement is not very significant. If transport-layer functions are implemented in ATM-API-specific implementations of the message-passing library, applications do not suffer increased complexity, but the achieved performance does not improve significantly either. In both cases, the reasons for this behavior are twofold:

- The implementation of the transport-layer functions is not HPDC-specific. Therefore, some of the problems of generic transport protocols still remain.

- The sensitivity to the implementation of the ATM API is very high. In particular, Fore's ATM API has been shown not to be optimized for low latency, which is a requirement for HPDC applications.

As a result, in order to achieve the desired performance it will be necessary to replace current implementations of the API and the message-passing library with more efficient versions that really take into account the specificities of ATM and high performance distributed computing.

4. Enhancing the HPDC environment structure

In the previous section we have seen that the main software bottlenecks are located in the ATM API and the message-passing library. Several authors have proposed mechanisms to improve some of these issues. We classify these proposals into three groups, according to the primary element the enhancement impacts: (1) the ATM API; (2) the transport protocol; and (3) the application context. Examples of these approaches are [17], [7], and [20, 9], respectively.

4.1. API-level enhancement mechanisms

In Section 3, the implementation of the API adopted for the experiments was shown to be an important bottleneck. Therefore, research on alternative APIs becomes a key requirement for achieving enhanced performance in ATM-based HPDC environments. In [17], a multicomputer communication mechanism, namely Active Messages, is adapted to ATM. This mechanism is designed to provide a very fast, low-granularity RPC model [18]. Each transmitted message includes a pointer to a user-level routine called a handler, whose function is to extract the message from the network and to integrate it with the ongoing computation, much as a processor does when servicing an interrupt request. Since all of this occurs in user space, memory copying and buffering overheads are avoided. The ATM implementation consists of a device driver which is downloaded into the kernel and a user-level library containing most of the functionality of Active Messages; the two parts interact through traps. To overcome the unreliability inherent to ATM, a simple flow control mechanism with retransmission has been added. Table 6, from [17], shows the performance achieved on a micro-benchmark by several supercomputers and by a workstation cluster with Active Messages. The results indicate a significantly high performance for the workstations with Active Messages.
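The essential idea is that each message names its own handler. The following is a minimal, hypothetical sketch of that dispatch style, not the actual SSAM interface of [17]: the receive path extracts a handler index from the message header and invokes the corresponding user-level routine directly, with no intermediate buffering or scheduling step.

/* am_dispatch.c -- sketch of active-message-style dispatch (hypothetical,
 * not the SSAM API of [17]).  Each message carries the index of the
 * user-level handler that consumes it. */
#include <stdio.h>
#include <string.h>

#define AM_PAYLOAD 48

typedef struct {
    int  handler;               /* index into the handler table */
    char payload[AM_PAYLOAD];   /* argument data for the handler */
} am_msg_t;

typedef void (*am_handler_t)(const char *payload);

static void h_accumulate(const char *p) { printf("accumulate: %s\n", p); }
static void h_reply(const char *p)      { printf("reply: %s\n", p); }

/* Handler table registered by the application at startup. */
static am_handler_t handler_table[] = { h_accumulate, h_reply };

/* Receive path: run the named handler immediately in user space,
 * integrating the message with the ongoing computation. */
static void am_poll(const am_msg_t *m)
{
    handler_table[m->handler](m->payload);
}

int main(void)
{
    am_msg_t m = { 1, "" };             /* pretend this message just   */
    strcpy(m.payload, "42 from peer");  /* arrived on an ATM circuit   */
    am_poll(&m);
    return 0;
}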

Table 6. Performance of Active Messages over ATM [17].

Machine                     Peak BW      Round-Trip Latency
SP-1 + MPL/p                66.4 Mb/s    56 µs
Paragon + NX                584.0 Mb/s   44 µs
CM-5 + Active Messages      80.0 Mb/s    12 µs
SS-20 cluster + SSAM        60.0 Mb/s    52 µs

This good performance is achieved at the cost of greater complexity for the programmer exploiting this interface. Active Messages introduces several restrictions which are especially important when multithreading is involved [19]. In particular, since handlers cannot block, deadlock situations may occur without careful programming. Several application-level mechanisms have been designed to hide this complexity from the programmer [19, 15].

4.2. Transport-level mechanisms

Traditional transport protocols have regarded the communication bandwidth as a scarce resource and the communication medium as inherently unreliable, and were therefore designed to be very general in order to handle complex failure scenarios. These characteristics have led to complicated and therefore time-consuming protocol implementations. The high speed and high reliability of current networks allow for simpler protocols. In [7], a communication system specifically tailored for HPDC is presented. The architecture of this system is shown in Figure 3(a).

In this environment, the transport protocol is called HCP. Its high speed is achieved by (1) implementing the protocol in a special communication processor, thus offloading the host from protocol processing, and (2) making the protocol implement the basic functionality of message-passing libraries. The communication processor (HIP) is designed to offer rates close to the medium speed [8]. The protocol includes separate procedures for transmitting short and long messages, as well as simple error and flow control functions in order to ensure reliability. The protocol provides applications with HPDC-specific services: point-to-point communications, group communications, process synchronization, system control and management, and error handling. The implemented services are found to be common to the most popular message-passing libraries, and include synchronous and asynchronous data transfers, broadcast, barrier, and system configuration. The reliability not provided by ATM is obtained by a simple stop-and-wait flow control approach.
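Stop-and-wait reliability of the kind HCP adopts can be expressed in a few lines. The sketch below is a generic illustration, not the HCP implementation; the lossy "channel" is simulated so the example is self-contained.

/* saw.c -- generic stop-and-wait sender sketch (illustrative only;
 * the channel below simulates a lossy AAL5-style link). */
#include <stdio.h>
#include <stdlib.h>

#define MAX_RETRIES 8

/* Simulated unreliable channel: ~30% of acknowledgements are lost.
 * Returns the acked sequence number, or -1 to model a timeout. */
static int channel_send_and_ack(int seq)
{
    return (rand() % 10 < 7) ? seq : -1;
}

/* Send one message reliably: retransmit until the matching ack
 * arrives or MAX_RETRIES is exhausted. */
static int saw_send(int seq, const char *msg)
{
    int tries;
    for (tries = 1; tries <= MAX_RETRIES; tries++) {
        printf("send seq=%d try=%d (%s)\n", seq, tries, msg);
        if (channel_send_and_ack(seq) == seq)
            return 0;          /* acknowledged, done        */
        /* timeout: fall through and retransmit */
    }
    return -1;                 /* give up after MAX_RETRIES */
}

int main(void)
{
    int i;
    for (i = 0; i < 4; i++)    /* alternating sequence bit 0,1,0,1 */
        if (saw_send(i & 1, "payload") != 0)
            return 1;
    return 0;
}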

4.3. Application-level mechanisms

Despite the availability of enhanced mechanisms like those described above, a portion of latency may remain unavoidable. Several approaches intend to take advantage of this idle time for doing more computation. One common technique for this purpose is multithreading, which consists of allowing several concurrent execution flows per task, so-called threads, sharing all the resources allocated on a per-task basis. Thus, while one thread is waiting for a message to arrive, another thread can carry out computations that do not require the data in the expected message; a generic sketch of this overlap is given below. An example of an ATM-specific multithreading mechanism is discussed in [20], where a small multithreaded subsystem is implemented between the message-passing library and the transport protocol (Figure 3(b)). The transport protocol layer, called NCS_MPS, is similar to the HCP protocol described above. Table 7 shows the performance gains experienced by a matrix multiplication algorithm over such an environment.
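A minimal POSIX-threads sketch of the overlap follows; it is hypothetical and independent of NCS_MTS. One thread stands in for the blocked receive while the main thread keeps computing, synchronizing only when the message is actually needed.

/* overlap.c -- computation/communication overlap with POSIX threads
 * (hypothetical sketch, not the NCS_MTS implementation).
 * Compile with: cc overlap.c -lpthread -o overlap */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

static int msg_data;
static int msg_ready = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Stand-in for a blocking receive: sleeps to model network latency,
 * then "delivers" a value as a real message arrival would. */
static void *recv_thread(void *arg)
{
    sleep(1);
    pthread_mutex_lock(&lock);
    msg_data = 42;
    msg_ready = 1;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    long i, acc = 0;

    pthread_create(&t, NULL, recv_thread, NULL);

    /* Work that does not need the incoming message proceeds while
     * the receiver thread is blocked on the network. */
    for (i = 0; i < 100000000L; i++)
        acc += i & 7;

    /* Wait for the message only if it has not arrived yet. */
    pthread_mutex_lock(&lock);
    while (!msg_ready)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);

    printf("partial result %ld combined with message %d\n", acc, msg_data);
    pthread_join(t, NULL);
    return 0;
}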

Table 7. Execution time of matrix multiplication (seconds).

Nodes   p4      NCS_MTS/p4   % Improvement
1       24.89   25.03        -
2       14.4    11.51        20.06%
4       7.52    5.41         28.05%

Another work taking advantage of multithreading is [9]. For each task, separate threads are created for computation, data sending, and data/ack reception. Here, multithreading is exploited to configure efficient collective communications, such as gather and reduction, by constructing configurations where communications can occur concurrently; a sketch of this thread-per-peer idea follows.
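The sketch below is a hypothetical illustration of the thread-per-peer pattern, not the implementation of [9]: one thread per destination lets the transfers proceed concurrently instead of serially.

/* gather.c -- thread-per-peer collective sketch (hypothetical).
 * Compile with: cc gather.c -lpthread -o gather */
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#define NPEERS 4

/* Stand-in for a blocking point-to-point send to one peer. */
static void *send_to_peer(void *arg)
{
    long peer = (long)arg;
    usleep(10000);                 /* models transmission time */
    printf("sent slice to peer %ld\n", peer);
    return NULL;
}

int main(void)
{
    pthread_t t[NPEERS];
    long p;

    /* Launch all sends at once; over a switched network such as ATM
     * they can genuinely overlap on different virtual channels. */
    for (p = 0; p < NPEERS; p++)
        pthread_create(&t[p], NULL, send_to_peer, (void *)p);
    for (p = 0; p < NPEERS; p++)
        pthread_join(t[p], NULL);
    return 0;
}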

Figure 3. Mechanisms for enhancing HPDC performance. (a) HPDC-dedicated protocol environment [7]: high-speed applications use message-passing tools (e.g. EXPRESS, PVM, ISIS) over a runtime system and HCP on AAL3/4 and AAL5 over ATM, while usual networking applications keep standard transport/network protocols (e.g. TCP/IP) over ISDN, HiPPI, LAN or ATM-LAN. (b) Multithreading support for HPDC [20]: p4, PVM and NCS applications access message-passing filters, the Multithreaded Subsystem (NCS_MTS) and the Message-Passing Subsystem (NCS_MPS) over an ATM API.

4.4. Discussion

The adoption of HPDC-specific mechanisms enables better performance for HPDC applications. A promising framework for these mechanisms is an HPDC-specific subnetwork, associated with the specific mechanisms, which would share the ATM service with other applications running their own protocol structures. The adoption of such a model facilitates the rapid improvement of the mechanisms, since research could focus exclusively on HPDC applications.

5. Summary

Recent advances in network performance have triggered research in parallel computing using workstation clusters, which has become attractive for its excellent performance-to-cost ratio. In order to show the benefits of this approach to parallel computing, several studies have evaluated the performance of various distributed environments and networks. The introduction of a high-speed network, such as ATM, minimizes the bottleneck at the network, but new problems then arise, due especially to the characteristics of the network access from the host. The factors dominating latency in ATM-based HPDC systems are both hardware and software related: the characteristics of the host-network interface and the protocols supporting the network. In this paper we have considered the state of the art in software architectures and mechanisms supporting high performance distributed computing. Current research aims at reducing the impact, on the performance of HPDC environments, of the software accessing the ATM network. Three basic strategies are being followed for this purpose:

1. Replacing traditional LAN technologies. Traditional technologies such as Ethernet and FDDI are replaced by ATM, but the whole protocol structure remains unchanged. With this approach, the bottleneck moves from the network to the host, resulting in an unsatisfactory improvement. Some of the works reveal that the extra processing required by ATM (segmentation and reassembly) can in some cases make ATM perform worse than FDDI. Therefore, reducing other processing time is desirable in order to compensate for this performance degradation.

2. Introducing an ATM-specific API. Applications are allowed to directly access ATM through an ATM API, which saves the cost of transport protocols at the price of leaving some functions, like error recovery and flow control, to the applications or the message-passing library. The analyzed works show that this approach yields the best throughput, but the latency is worse than that of traditional protocols, due in part to the interaction of the API with the operating system. Therefore, alternative mechanisms with HPDC-dedicated implementations are required to increase performance.

3. Enhancing the HPDC environment structure. In order to solve the most significant problems discussed here, new solutions are being suggested at all levels of the architecture of these systems: efficient APIs, HPDC-specific transport protocols, and programming mechanisms such as multithreading.

The future predominance of ATM networks in both the local and the wide area will enable workstation networks to competitively support high performance computing applications, so it is worthwhile to take as much advantage as possible of the features of ATM. Optimizing the issues that currently compromise the performance of workstation clusters over ATM will lead to competitive high performance computing systems.

References

[1] T. E. Anderson, D. E. Culler, D. A. Patterson, et al. A Case for NOW (Networks of Workstations). IEEE Micro, 15(1):54-64, February 1995.
[2] S.-L. Chang, D. H. C. Du, J. Hsieh, M. Lin, and R. P. Tsang. Parallel Computing over a Cluster of Workstations Interconnected via a Local ATM Network. Technical report, University of Minnesota, 1994.
[3] P. W. Dowd, S. M. Srinidhi, E. Blade, and R. Claus. Issues in ATM Support of High Performance Geographically Distributed Computing. In Proceedings of IPPS'95 Workshop on High Speed Networks, pages 352-358, 1995.
[4] R. Fatoohi and S. Weeratunga. Performance Evaluation of Three Distributed Computing Environments for Scientific Applications. In Proceedings of Supercomputing'94, pages 400-409, 1994.
[5] I. Foster. Designing and Building Parallel Programs. Addison-Wesley, 1995.
[6] A. Geist et al. PVM 3 Users' Guide and Reference Manual. Oak Ridge National Laboratory, 1993.
[7] S. Hariri, J. Park, M. Parashar, and G. C. Fox. A Communication System for High-Performance Distributed Computing. Concurrency, Practice and Experience, Special Issue: High Performance Distributed Computing, June 1994.
[8] S. Hariri, J. Park, F.-K. Yu, M. Parashar, and G. C. Fox. A Message Passing Interface for Parallel and Distributed Computing. In Proceedings of the 2nd International Symposium on High Performance Distributed Computing, pages 84-91, 1993.
[9] C. Huang, Y. Huang, and P. K. McKinley. A Thread-Based Interface for Collective Communication on ATM Networks. In Proceedings of the 1995 International Conference on Distributed Computing Systems, 1995.
[10] M. Lin, D. H. C. Du, J. P. Thomas, and J. A. MacDonald. Distributed Network Computing over Local ATM Networks. IEEE Journal on Selected Areas in Communications, 13(4):733-748, May 1995.
[11] M. Medin, S. M. Srinidhi, and P. W. Dowd. Issues in ATM API Support for Distributed Computing. ATM Forum, Contribution ATMF 94-1153R1, November 1994.
[12] The ATM Forum Technical Committee. Native ATM Services: Semantic Description. Document ATMF 95-0008R3, August 1995.
[13] C. A. Thekkath and H. M. Levy. Limits to Low-Latency Communication on High-Speed Networks. ACM Transactions on Computer Systems, 11(2):179-203, May 1993.
[14] C. A. Thekkath, H. M. Levy, and E. D. Lazowska. Efficient Support for Multicomputing on ATM Networks. Technical Report TR 93-04-03, Department of Computer Science and Engineering, University of Washington, 1993.
[15] L. van Doorn and A. S. Tanenbaum. Using Active Messages to Support Shared Objects. In Proceedings of the 6th SIGOPS European Workshop, pages 112-116, 1994.
[16] J. Vila-Sallent, J. Solé-Pareta, T. Jové, and J. Torres. Potential Performance of Distributed Computing Systems over ATM Networks. Submitted to INFOCOM'97, June 1996.
[17] T. von Eicken, A. Basu, and V. Buch. Low-Latency Communication over ATM Networks Using Active Messages. In Proceedings of Hot Interconnects II, 1994.
[18] T. von Eicken, D. Culler, S. Goldstein, and K. Schauser. Active Messages: A Mechanism for Integrated Communication and Computation. In Proceedings of the 19th ACM International Symposium on Computer Architecture, pages 256-266, 1992.
[19] D. A. Wallach, W. C. Hsieh, K. L. Johnson, M. F. Kaashoek, and W. E. Weihl. Optimistic Active Messages: A Mechanism for Scheduling Communication with Computation. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'95), 1995.
[20] R. Yadav, R. Reddy, S. Hariri, and G. C. Fox. A Multithreaded Message Passing Environment for ATM LAN/WAN. In Proceedings of the 4th International Symposium on High Performance Distributed Computing, 1995.
[21] H. Zhou and A. Geist. Faster Message Passing in PVM. In Proceedings of IPPS'95 Workshop on High Speed Networks, 1995.
