A Retrospective Study of the Synthesis Kernel

Calton Pu and Jonathan Walpole

Dept. of Computer Science and Engineering, Oregon Graduate Institute, 19600 N.W. von Neumann Dr., Beaverton, OR 97006-1999

Henry Massalin

Microunity, Somewhere, Sunnyvale, CA

Abstract

In building the Synthesis kernel [2, 3, 4, 5, 6] we demonstrated some interesting implementation techniques for improving the performance of operating system kernels. In particular, we experimented with fine-grain modular kernel organization, dynamic code generation, and software feedback. In addition, and perhaps more importantly, we discovered that a careful and systematic combination of these ideas can be very powerful, even though each idea by itself may have serious limitations. This paper describes the lessons we learned from Synthesis and the interesting interactions we discovered between modularity, dynamic code generation, and software feedback. We also highlight the important common underpinnings of the Synthesis approach and present our ideas on future operating system design and implementation.

1 Introduction

Historically, measures of throughput have formed the primary basis for evaluating operating system performance. As a direct consequence, many operating system implementation techniques are geared towards optimizing throughput. Unfortunately, traditional approaches to improving throughput also tend to increase latency. Examples include the use of large buffers for data transfer and coarse-grain scheduling quanta. While this approach was appropriate for the batch processing model of early computer systems, today's interactive multimedia computing environments introduce a different processing model and require correspondingly different performance metrics and implementation techniques.

The new computing model is one in which data is transferred, in real time, between I/O devices along a pipeline of system- and application-level computation steps. In this interactive environment, applications are primarily concerned with "end-to-end" performance, which is determined not only by operating system throughput, but also by the magnitude and variance of the latency introduced at each step in the pipeline. Reducing and controlling end-to-end latency, while maintaining throughput, in this "pipelined" environment is a key goal for next-generation operating systems.

In contrast to the throughput-oriented implementation techniques of conventional operating systems, the Synthesis kernel sought to investigate implementation techniques that would provide lower and more predictable latency as well as improved throughput. In particular, Synthesis incorporates dynamic code generation to reduce the latency of critical kernel functions, and software feedback to control the variance in latency introduced by the operating system's resource scheduling policies. Our experience with Synthesis showed these techniques to be interesting and useful in their own right. However, the more important kernel design lessons we learned from the project relate to the interactions between these techniques, their relationship to other kernel design approaches, and the fundamental principles that underlie the ideas. This paper focuses on exactly these issues.

The paper is organized as follows. Section 2 outlines some key performance challenges for next-generation operating systems. The main Synthesis implementation techniques, dynamic code generation and software feedback, are summarized in Sections 3 and 4 respectively, together with their principal advantages and problems. Section 5 discusses the interaction between these techniques, explains their relationship to other kernel design approaches, and identifies the common principles on which they are based. Based on these principles, we also describe design directions for next-generation operating systems. Section 6 outlines related work. Finally, Section 7 concludes the paper.

2 Performance Challenges for Next-Generation Operating Systems

The advent of interactive multimedia computing imposes strict new requirements on operating system performance. In particular, next-generation operating systems must support the processing of real-time data types, such as digital audio and video, with low end-to-end latency and high

throughput. The emerging model of computation is one in which real-time data enters the system via an input device, passes through a number of kernel and application processing steps, and is finally presented, in real time, at an output device. In this environment, system performance is determined in large part by the throughput and total end-to-end latency of this pipeline. As multimedia applications and systems become more complex, the number of steps in the pipeline will increase. It is important that the addition of new steps to the pipeline does not cause significant increases in end-to-end latency. This problem is a key challenge for operating system designers.

If operating systems implement data movement by buffering large amounts of data at each pipeline stage, and process it using correspondingly large CPU scheduling quanta, then additional pipeline elements will lead to undesirable increases in end-to-end latency. An alternative approach is to implement data movement and processing steps at a fine granularity. This approach has traditionally not been taken because it does not allow operating system overhead, incurred during operations such as context switching, data transfer, system call invocation, and interrupt handling, to be amortized over large periods of useful work. Rather than focusing on amortizing these costs at the expense of end-to-end latency, we suggest that next-generation operating systems must resolve the problem directly by reducing the cost of these fundamental operations.

The need for new design approaches is exacerbated by the trend in operating system design towards microkernel-based operating systems. Such systems implement operating system functionality as a collection of coarse-grain server modules running above a minimal kernel. While this structuring approach has many well-publicized advantages, current implementations of it tend to increase the cost of invoking operating system functions and add to the number of steps in the pipeline. Finally, the popular practice of emulating existing monolithic operating systems above microkernel-based operating systems exacerbates the problem even further. Current implementation approaches for supporting emulation, such as redirecting traps to user-level emulation libraries before invoking operating system functions, introduce additional latency for kernel calls. Again, there are many important and well-accepted reasons for supporting emulation. What is needed are new implementation techniques that allow it to be supported efficiently.

In summary, next-generation operating systems must provide support for low-overhead data movement, control flow transfer, modularity, and emulation. They must also provide predictable real-time resource scheduling to support multimedia applications. Each of these areas has been well explored within the bounds of traditional kernel implementation approaches. The Synthesis kernel, however, departs from traditional approaches by making "extensive" use of the following two techniques:

- dynamic code generation, to reduce the latency of common kernel functions, particularly emulation, context switching, queue and buffer management, system call invocation, and interrupt handling.



- software feedback, for adaptive resource scheduling with predictable variance in latency.

Both of these techniques have been described in detail in our earlier papers [3, 4, 6]; therefore, rather than dwelling on a reintroduction of the ideas, the following sections briefly introduce them and then focus on the key lessons we learned from their application in Synthesis. The remainder of the paper then discusses the interaction between these ideas and their relation to other important concepts in kernel design.

3 Dynamic Code Generation

3.1 The Techniques, Uses, and Benefits

Briefly explain the principle at a conceptual and technical level. List the key places it is used in Synthesis and state/summarize the benefits obtained in each of these places. Refer to papers for more details.
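In the meantime, the template-filling idea can be sketched in miniature. The code below is our illustration rather than Synthesis code, and it assumes an x86-64 Unix-like system that permits an executable mmap: a tiny code template with a "hole" is instantiated once the value of a run-time invariant is known.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Instantiate a code template whose "hole" is a constant K that only
 * becomes known at run time.  The generated routine is equivalent to
 * "int f(void) { return K; }" with K folded in as an immediate. */
typedef int (*intfn)(void);

static intfn synthesize_const_fn(int32_t k)
{
    /* x86-64 template: mov eax, imm32 ; ret */
    uint8_t tmpl[] = { 0xB8, 0, 0, 0, 0, 0xC3 };
    memcpy(&tmpl[1], &k, sizeof k);            /* fill the hole */

    void *page = mmap(NULL, sizeof tmpl, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, tmpl, sizeof tmpl);           /* install the instance */
    return (intfn)page;
}

int main(void)
{
    intfn f = synthesize_const_fn(42);
    if (f)
        printf("%d\n", f());                   /* prints 42 */
    return 0;
}
```

Synthesis applies the same idea at kernel scale: for example, the routines behind a newly opened file descriptor are generated with invariants such as the device type folded in as constants [6].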

3.2 Interaction With Other Ideas

Explain that fine-grain modularity and abstract interfaces are an essential foundation for dynamic code generation. Explain what we mean by these terms, as opposed to the coarse-grain server-level concept of modularity and interfaces that is most familiar to the SOSP folks. Explain why these features are essential for dynamic code generation to be used successfully. Explain the quaject concept in Synthesis and show an example of how it helps us apply dynamic code generation. We learned the following key lessons:

- Lesson 1: without abstract interfaces it is not possible to gain much benefit from dynamic code generation. For example, the UNIX proctable limits how fast context switching can be (is this the best example we can think of?).



- Lesson 2: without fine-grain modularity the approach doesn't scale. Dynamic code generation can only be applied to programming-in-the-large if a rigorous approach to modularity is employed.

For wide applicability, dynamic code generation needs fine-grain modularity. However, dynamic code generation is also the key to removing the overhead that is conventionally thought to be associated with modularity. Hence there is a mutual dependency between these ideas. For similar reasons, dynamic code generation offers the key to efficient emulation: it allows the artificial boundaries introduced for emulation to be removed by collapsing layers. Give example to illustrate what we mean (one possible sketch follows).
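As one possible illustration (our sketch; the stream structure and function names are hypothetical, not Synthesis or quaject interfaces): in a layered path, every byte of an emulated write is pushed through an abstract dispatch, whereas a routine synthesized for a descriptor known to be bound to a tty can have the dispatch folded away.

```c
#include <stdio.h>

/* Layered path: an emulated write() indirects through an abstract
 * stream interface before reaching the device-specific code. */
struct stream {
    int (*put)(struct stream *, char);
};

static int tty_put(struct stream *s, char c)
{
    (void)s;
    return putchar(c);
}

static int emulated_write(struct stream *s, const char *buf, int n)
{
    for (int i = 0; i < n; i++)
        if (s->put(s, buf[i]) < 0)             /* one dispatch per byte */
            return -1;
    return n;
}

/* "Collapsed" path: once the kernel knows this descriptor is bound to
 * a tty, it can synthesize a routine with the dispatch folded away. */
static int collapsed_tty_write(const char *buf, int n)
{
    for (int i = 0; i < n; i++)
        putchar(buf[i]);                       /* direct call, no dispatch */
    return n;
}

int main(void)
{
    struct stream tty = { tty_put };
    emulated_write(&tty, "layered\n", 8);
    collapsed_tty_write("collapsed\n", 10);
    return 0;
}
```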

3.3 Important Problems

Cache management: explain the impact of dynamic code generation on cache management, particularly instruction/data cache consistency issues, and discuss hand-tuned cache management in the generated code.
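The core of the consistency issue can be sketched as follows (our illustration using the GCC/Clang cache-sync builtin; Synthesis itself hand-tuned this in assembly): newly generated instructions are written through the data cache, and on machines with split caches the instruction cache must be synchronized before the code is executed.

```c
#include <string.h>

/* After new instructions are written (through the data cache), the
 * instruction cache must be made consistent before the code runs.
 * On x86 this is largely automatic; on most RISC machines it is not. */
void install_code(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);                     /* writes land in D-cache */
    __builtin___clear_cache((char *)dst,
                            (char *)dst + len); /* sync the I-cache */
}
```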

Portability: discuss the impact of Synthesis' assembler-based approach to dynamic code generation. How does this affect portability? A target-independent assembler may give support for portability within a family of machines, but higher-level language support is needed for wider portability (discuss the issue of porting the compiler/optimizer/code-generator vs. porting the OS).

Debugging: explain the distinction between dynamic code generation and self-modifying code (append-only approach vs. update in place). What about generated code generating code? Is debugging really as big a problem as it sounds? What debugging support was available in Synthesis? Can Henry say something about the debugging support available in the kernel monitor? What would have been nicer?

4 Software Feedback

4.1 The Technique, Uses, and Benefits

Briefly explain how software feedback can be used to support adaptive real-time scheduling. State how it differs from more conventional real-time scheduling approaches. Explain where the approach was used in Synthesis (CPU scheduling). Refer to papers for more details.
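In the meantime, a toy version of the mechanism can be sketched as follows (our illustration; the setpoint, gain, and bounds are made up, not the kernel's actual parameters). The fill level of a thread's output queue is measured, and the deviation from a setpoint drives an adjustment to the thread's next CPU quantum.

```c
/* The measured fill level of a thread's output queue is compared
 * against a setpoint; the error adjusts the thread's next CPU quantum.
 * All constants here are illustrative. */
#define SETPOINT 32        /* desired queue occupancy (items) */
#define Q_MIN     1        /* quantum bounds, in timer ticks  */
#define Q_MAX    64

static int quantum = 8;    /* current quantum for this thread */

void adjust_quantum(int queue_len)
{
    int error = SETPOINT - queue_len;  /* consumer starving => positive */
    quantum += error / 4;              /* proportional adjustment only  */
    if (quantum < Q_MIN) quantum = Q_MIN;
    if (quantum > Q_MAX) quantum = Q_MAX;
}
```

Because the loop runs on every few scheduling decisions, it must itself be cheap; this is one place where the link to dynamic code generation, discussed next, matters.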

4.2 Interaction With Other Ideas

Explain the cost of adaptive scheduling without dynamic code generation. Give an example. Dynamic code generation makes the simple things fast (such as queue management, CPU re-allocation, etc.). Give short examples (one sketch follows the lessons below). This allows the adaptive feedback mechanism to be more responsive. Mention the need for/link with fine-grain modularity (Calton, I couldn't remember what this was getting at, so I left it in here?). Explain why multimedia systems need adaptive scheduling approaches in order to make efficient use of resources. We learned the following lessons:

- Lesson 3: fine-grain adaptation schemes need dynamic code generation to achieve simplicity and performance.



- Lesson 4: (can you think of another?)
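The promised sketch of "making the simple things fast" (ours; Synthesis generated such queue routines per queue, and the code below is only illustrative): a generic ring-buffer put pays for pointer indirection and a general modulus, while a put specialized for one particular queue has the base address and a power-of-two mask folded in as constants.

```c
/* Generic ring-buffer put: pointer indirection and a general modulus. */
struct queue {
    int      *buf;
    unsigned  head;
    unsigned  size;
};

void put_generic(struct queue *q, int v)
{
    q->buf[q->head] = v;
    q->head = (q->head + 1) % q->size;     /* general modulus */
}

/* Put specialized for one queue known to hold 64 ints at a fixed
 * address: base and mask are folded in as constants. */
static int      q0_buf[64];
static unsigned q0_head;

void put_q0(int v)
{
    q0_buf[q0_head] = v;
    q0_head = (q0_head + 1) & 63;          /* mask folded in */
}
```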

4.3 Important Problems

Out-of-range responsiveness: explain the tradeoff between the limitations in responsiveness and magnitude of error on the one hand, and excessive algorithmic complexity on the other.

Theoretical foundations: explain the limitations of the current experimental approach. Outline the work necessary in terms of theoretical analysis of filter properties and then experimental validation. Discuss the need for a better theoretical foundation for the fine-grain adaptive scheduling ideas. Integration of fine-grain adaptive scheduling with existing approaches to guaranteeing levels of service and managing overload, etc. (i.e., how does this relate to QoS-based interfaces that support resource reservation and admission testing rather than continuous degradation in the face of overload?).

5 Discussion

5.1 The Relationship Between the Ideas

Dynamic optimization: using information learned at runtime to drive dynamic optimization is a very valuable principle for kernel designers. Abstractly, this is the basis of both dynamic code generation and software feedback. Dynamic code generation gathers information on invariants at runtime and uses that information to complete the instantiation of code templates. Software feedback gathers performance information at runtime and uses it to adjust resource scheduling policy. In addition to this commonality at the conceptual level, there are also a number of important mutual dependencies between the techniques. These dependencies show that the effectiveness of the ideas depends on them being used in combination rather than being applied in isolation. The key dependencies are the following:

- Adaptive scheduling approaches are only useful if they are low-cost and responsive. To achieve this they need dynamic optimizations such as dynamic code generation.



- To be fully effective, dynamic code generation requires abstract interfaces and fine-grain modularity. Modular kernel design facilitates dynamic code synthesis, since the interactions are simplified and constrained.



- For dynamic code generation to be used widely throughout the system, fine-grain modularity must be applied widely, including a modularization of the code generator itself.



- Dynamic code generation holds the key to removing the performance overheads associated with modularity and emulation. This provides the foundation for a real scientific comparison of different kernels (Calton, what does this mean?) as well as paving the way for the acceptance of a new kernel (e.g., Mach).

These interdependencies have a potentially large impact on kernel structuring. Fine-grain modularity is the essential structuring concept... Explain the extent to which the Synthesis quaject concept was useful and appropriate.

While we have claimed that the application of our implementation techniques can greatly improve support for emulation, many existing system interfaces are difficult to emulate above a completely abstract kernel interface. The abstract part of any operating system interface can be supported using emulation. However, if the kernel exports data structures, such as the CVT in IBM MVS (at absolute location hex 10) or the proctable in Unix, the emulation problem becomes much harder. Our belief is that, ultimately, kernel interfaces should become completely encapsulated. Hence, we do not view our lack of support for this level of emulation as a major drawback.

5.2 A More Uniform Kernel Design Approach

Synthesis showed that a couple of dynamic optimization techniques can be usefully added to the kernel developer's tool-kit. However, Synthesis did not go far in describing or supporting a new kernel development methodology that embodies these approaches in a uniform way. The fundamental principles that underlie the dynamic optimization techniques in Synthesis are based on the concept of partial evaluation. Partial evaluation can be viewed as a process of specializing operations based on knowledge of invariants that hold at various stages in the program development cycle (including execution). In Synthesis, information relating to invariants was generally implicit in the kernel code. A more uniform kernel design and implementation approach would provide direct support for generalized partial evaluation by making knowledge and manipulation of invariants explicit.

Support for generalized partial evaluation differs from the Synthesis approach in the following ways. First, it uses invariant-related information, from many sources, to apply optimizations at many different stages in the system life-cycle. This approach can be thought of as incremental specialization. Synthesis focused primarily on runtime optimizations. Second, the programming methodology would include support for making invariant-related information explicit, e.g., the source of the information, the stage at which it can be used, and the target for optimization. This information would then be used by a suite of partial evaluation tools ranging from compile-time analyzers to runtime code generators.

The key to this kernel design and implementation approach is to integrate a multi-level programming approach that embodies this invariant information with fine-grain modularity. This will require new high-level programming language support. The language requires the following general features: ... support for modularity, well-defined semantics for automatic partial evaluation, support for multi-level programming, etc. In the long term, we hope that the integration of multi-level programming and fine-grain modularity will also allow the separation of machine dependencies at a fine granularity, hence improving the potential for code reuse and enhancing portability.
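A minimal sketch of the incremental specialization pattern (ours; the names and the single-reader invariant are hypothetical, chosen only to illustrate the replugging idea): a file's read entry point starts generic; once an invariant is established it is replugged with a specialized routine, and it reverts if the invariant is broken.

```c
/* A file's read entry point starts generic; once the "single
 * sequential reader" invariant is established it is replugged with a
 * specialized routine, and reverts if the invariant is broken. */
struct file;
typedef int (*read_fn)(struct file *, char *, int);

struct file {
    read_fn read;      /* current (possibly specialized) entry point */
    int     shared;    /* invariant flag: nonzero => multiple readers */
    long    offset;
};

static int read_generic(struct file *f, char *buf, int n)
{
    (void)buf;
    if (f->shared) {
        /* lock, revalidate the shared offset, ... */
    }
    /* full invariant checks on every call */
    f->offset += n;
    return n;
}

static int read_seq_one_reader(struct file *f, char *buf, int n)
{
    (void)buf;
    /* straight-line code: the checks are "compiled out" */
    f->offset += n;
    return n;
}

void establish_invariant(struct file *f)
{
    if (!f->shared)
        f->read = read_seq_one_reader;     /* specialize */
}

void break_invariant(struct file *f)
{
    f->shared = 1;
    f->read = read_generic;                /* unspecialize */
}
```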

6 Related Work

Dynamic code generation has been used in several ways. [List Deutsch on compiled Smalltalk, the Blit terminal, etc. The Microunity way?] Feedback systems have existed for a long time in the context of control theory. In network protocols, the idea has been applied in the design of specific protocols to do certain things [Cite Jain], and in some scheduling work [Cite IBM]. Cite Kevin Jeffay's adaptive buffer management work for multimedia OS. Relationship to other real-time scheduling approaches.

Emulation has been used in Mach (of BSD Unix), Windows (of DOS), NT (of Windows), and OS/2 (of DOS) (Chorus of SVR4). This is a well-accepted and practical idea, since it is needed to preserve the huge investment in software and allow a graceful transition. Without dynamic code generation it is considered expensive and a temporary solution. Ultimately, we would like to build an emulator

that is faster than the original implementation on the same hardware. Unlike the Unix emulator on Mach, the Synthesis emulator of UNIX carries very low overhead (because of dynamic code generation).

Modular kernels with abstract interfaces: ... Apertos and other kernels that have a completely abstract interface. Mention the implementation limitations of these kernels. Other modular kernels: Chorus [7] and Mach [1]. Contrast between a pure message-based implementation (Mach) and a multiple-implementation kernel (Chorus). If possible, show the cost of a modular kernel without dynamic code generation. Maybe talk to the Apple people (Finlayson). Discuss the use of dynamic optimization, such as dynamic linking of micro-protocols, in the x-kernel [cite Peterson].

Other related ideas from the Synthesis project include lock-free synchronization: explain the idea of reduced synchronization and lock-free synchronization. Explain how it is important for removing bottlenecks in multiprocessor systems. Explain the recent work by Herlihy and Moss [particularly Transactional Memory] that gives this idea a good theoretical foundation. Also, Synthesis is an experimental validation that the idea is powerful and applicable "in a big way".
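To fix ideas, here is a minimal compare-and-swap sketch in the spirit of the lock-free structures used inside Synthesis (our illustration in C11 atomics; the actual kernel used hand-written 68030 assembly, and this simple version ignores the ABA problem, which a real kernel must handle).

```c
#include <stdatomic.h>
#include <stddef.h>

/* A lock-free LIFO free list built on compare-and-swap. */
struct node {
    struct node *next;
};

static _Atomic(struct node *) top = NULL;

void push(struct node *n)
{
    struct node *old = atomic_load(&top);
    do {
        n->next = old;                     /* link to the current top  */
    } while (!atomic_compare_exchange_weak(&top, &old, n));
}

struct node *pop(void)
{
    struct node *old = atomic_load(&top);
    while (old && !atomic_compare_exchange_weak(&top, &old, old->next))
        ;                                  /* CAS failure refreshed old */
    return old;                            /* NULL if the list is empty */
}
```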

7 Conclusion

Synthesis has shown some OS kernel implementation techniques to be promising. These ideas have been studied, not only in isolation, but also in careful and systematic combination. We have learned that even though some of these techniques may not present a clear win by themselves, their combination is very powerful. Although the interaction of these ideas is subtle, our experience with Synthesis shows that they should be applied in future operating system kernels. For example, although the ideas of modular kernel design and emulation have been around for a long time, Synthesis provides new evidence on how they can be implemented efficiently by combining them with dynamic optimization techniques. Not only do the optimization techniques improve the performance of modular systems, they also require a high degree of modularity in order to be used effectively. Similarly, the advent of multimedia computing has made real-time adaptive resource management a key issue. Adaptive scheduling approaches also need dynamic optimization techniques, such as dynamic code generation, in order to be responsive.

In the Synthesis project, we discovered these important interactions between interesting techniques. However, for them to be applied more widely and integrated with other operating system efforts, we need a more unified approach to kernel implementation and optimization. This paper outlined the essential elements of this new approach: fine-grain kernel structure, and high-level language support for modularity and incremental specialization.

References

[1] M. Accetta, R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young. Mach: A new kernel foundation for Unix development. In Proceedings of the 1986 Usenix Conference, pages 93–112. Usenix Association, 1986.

[2] H. Massalin and C. Pu. Threads and input/output in the Synthesis kernel. In Proceedings of the Twelfth Symposium on Operating Systems Principles, pages 191–201, Arizona, December 1989.

[3] H. Massalin and C. Pu. Fine-grain adaptive scheduling using feedback. Computing Systems, 3(1):139–173, Winter 1990. Special issue on selected papers from the Workshop on Experiences in Building Distributed Systems, Florida, October 1989.

[4] H. Massalin and C. Pu. Reimplementing the Synthesis kernel. In Proceedings of the Workshop on Micro-kernels and Other Kernel Architectures, Seattle, April 1992. Usenix Association.

[5] C. Pu and H. Massalin. Quaject composition in the Synthesis kernel. In Proceedings of the International Workshop on Object Orientation in Operating Systems, Palo Alto, October 1991. IEEE Computer Society.

[6] C. Pu, H. Massalin, and J. Ioannidis. The Synthesis kernel. Computing Systems, 1(1):11–32, Winter 1988.

[7] H. Zimmermann, J.-S. Banino, A. Caristan, M. Guillemont, and G. Morisset. Basic concepts for the support of distributed systems: the Chorus approach. In Proceedings of the 2nd International Conference on Distributed Computing Systems, July 1981.
