
Adaptive Resource Management

Craig E. Wills
Surendar Chandra
Computer Science Department
Worcester Polytechnic Institute
Worcester, MA 01609

Abstract

A class of software traditionally not viewed as configurable is that of resource information managers in a distributed environment. These servers manage information about machines, users, files and other objects. The typical approach for managing such information is to implement a fixed policy based on historical access and update patterns along with developer experience. However, fixed solutions to these problems are complicated by the fact that computing environments are not all the same, and even within a single computing environment the characteristics of the information and its access patterns may change over the lifetime of the system. These changes motivate this work to investigate an adaptive resource management mechanism that minimizes the communication overhead on a network of machines while providing resource information to clients. The mechanism modifies its behavior according to the distribution and dynamics of the resource information, and the location and frequency of resource access. A second component of this approach is to exploit multicasting for dynamically mapping resources to logical addresses for efficient delivery of resource requests or updates. Our experience thus far indicates that a relatively simple adaptive mechanism works, with further refinements needed.



Current address is Department of Computer Science, Duke University, Durham, NC 27708.

1 Introduction

A class of software traditionally not viewed as configurable is that of resource information managers in a distributed environment. These servers manage information about machines, users, files and other objects. The typical approach for managing such information is to implement a fixed policy based on historical access and update patterns along with developer experience. However, fixed solutions to these problems are complicated by the fact that computing environments are not all the same, and even within a single computing environment the characteristics of the information and its access patterns may change over the lifetime of the system.

As one example, in a study done to analyze the growth of a medium-sized research laboratory's wide-area TCP connections [12], it was observed that the finger (remote user lookup) traffic showed a great deal of variation. On further analysis, it was discovered that the large number of connections was due, almost entirely, to a single user who ran background scripts every 20 seconds to query a remote site to see whether a colleague was logged in there. This access pattern is not compatible with the finger application, and so the system performs poorly. Ideally the system would adapt to such use by having the remote server update the client as the information changes, without the client having to periodically query the server.

This example illustrates the introduction of an automated information access script or application that disrupts the access patterns expected from users. Other applications, such as performance monitors, also periodically gather information. This type of information access is mixed with bursty accesses from users to create a diverse set of access patterns that may change on an hourly or daily basis over the lifetime of a system. The amount and location of information may also change, which can have an impact on efficiently managing the information.
These potential changes motivate this work to investigate an adaptive resource management mechanism that minimizes the communication overhead on a network of machines while providing resource information to clients [3]. The mechanism modifies its behavior according to the distribution and dynamics of the resource information, and the location and frequency of resource access. A second component of this approach is to exploit multicasting, which supports efficient delivery of messages to a subset of logically related machines [7]. Resources can be dynamically mapped to logical addresses for efficient delivery of resource requests or updates to only those machines containing the resource or interested in the update.

The approach we take is to develop an adaptive policy based on various fixed policies and compare its performance with the fixed policies. The system incurs overhead in maintaining state information for adapting to the execution environment. However, the system will provide better performance by adapting to changing resource access patterns, not only for dynamic information, but also for relatively static information. The system will be successful if the effort involved in maintaining state is recovered through more efficient resource management.

As part of the work, a model to describe information access and updates is proposed, which is used in a simulation study of the adaptive and fixed policies. Results are shown for how the policies perform in various resource access scenarios. We conclude with a summary of our work thus far and ideas for future work.

2 Related Work

Much work has been done in implementing systems that use fixed approaches to solve resource management problems for different problem domains. Many solutions have used centralized servers to manage information, where updates to and requests for the information are unicast to one of the central servers. Grapevine [2] and Clearinghouse [11] are two well-known systems using this approach.

A few solutions in a local area network environment have used broadcast, either to broadcast a request for information, or to broadcast information updates to all other machines. Clouds [5] uses the former approach for locating remote objects, and the rwhod protocol in the UNIX¹ operating system [4] uses the latter approach for managing user and machine status information. These systems work on a small scale, but broadcasting requests and updates can overwhelm machines as the network grows in size.

In terms of using multicasting, hardware support has long been available for broadcast bus technology, but its support in higher level software has been limited. However, support for multicasting at the network layer for the standard Internet Protocol (IP) has been defined and implementations are now available [6]. Our work assumes a local area network (LAN) environment that efficiently supports the delivery of multicast messages.

In addition, systems such as ISIS [1] and Amoeba [10] provide reliable multicasting to ensure that each participating distributed process receives all messages in the same order. These approaches allow group membership to change, but support a relatively small number of groups. Our approach is different in that not only is the group membership dynamic, but the set of multicast groups also changes dynamically depending on the resources currently available in the network.

One approach we have previously used with multicasting is to map resource information to multicast addresses so that a machine joins a multicast address group only if it has the corresponding resource information [13]. Servers on each machine dynamically join and leave multicast addresses as the resources on the machine change. Clients seeking information send a message to the corresponding address where, in the ideal case, a machine receives a request only when it contains the requested information about the resource.

Some work has been done on dynamic reconfiguration in a distributed system, such as Hailpern and Kaiser's work on an object-based programming language to support distributed applications [9]. Gouda and Herman have introduced a formalism and logic for looking at adaptive programs [8]. Many network routing algorithms use adaptive techniques to optimize network performance.

¹ UNIX is a registered trademark of AT&T.

3 Fixed Policies

As a starting point for our work we make the observation that the solutions for information location problems in distributed systems can be reduced to three classic approaches.

1. Centralized. Centralize the information among one or more servers, perhaps organized in a hierarchy, so that all updates of the information and requests for the information are directed to these servers.
2. Update on change. Distribute updates of information as it changes so that all requests for information can be satisfied locally by clients.
3. Query on request. Search for information as it is requested by probing selected server machines or broadcasting to all machines.

We use these fixed policies as a basis for our work. In addition to using broadcasting for distribution of updates and requests, the latter two policies can also be used with multicasting. As we previously described, the query-on-request approach can map the resource to a logical address for delivery. We can hypothesize using a similar approach for update-on-change, where updates are multicast to only those nodes interested in the changes. We can view these directed updates as replies to persistent queries that are answered whenever the information changes.

One problem with mapping each resource name to its own multicast address is that there is a practical limit on the number of multicast addresses that a single network interface may support. This limitation forces a two-tiered strategy for query-on-request. If a machine contains just a few resources then it maps each resource over the entire range of multicast addresses, while if it contains many resources then it switches strategies and maps to a fixed set of addresses. A client seeking information maps the requested resource to both addresses and sends the request to both multicast groups. However, because at any time a node will be listening to only one of the multicast groups, it is guaranteed that a node will receive no more than one request.

Thus, with the introduction of logical addresses, these fixed policies contain some adaptive elements, but they still do not perform well if the basic access pattern assumptions are not satisfied. For example, if the number of updates is higher than the number of resource requests, the update-on-change approach sends updates that are not going to be used. These updates are wasted network packets and should be avoided.
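The two-tiered mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the sizes of the two address ranges, the threshold at which a machine switches tiers, the use of CRC-32 as the hash, and all function names are hypothetical and do not come from the paper. The two tiers are placed in disjoint address ranges so that a node listening on one tier never receives the copy of a request sent to the other.

```python
import zlib

FULL_RANGE = 1024     # assumed size of the wide multicast address range
FIXED_SET = 16        # assumed size of the fixed address set
MANY_RESOURCES = 32   # assumed threshold at which a machine switches tiers


def _hash(name: str) -> int:
    """Deterministic hash of a resource name (CRC-32 is an arbitrary choice)."""
    return zlib.crc32(name.encode("utf-8"))


def wide_group(name: str) -> int:
    """Tier 1: spread resources over the entire address range."""
    return _hash(name) % FULL_RANGE


def fixed_group(name: str) -> int:
    """Tier 2: fold resources onto a small fixed set, disjoint from tier 1."""
    return FULL_RANGE + _hash(name) % FIXED_SET


def groups_to_join(resources: list[str]) -> set[int]:
    """Logical addresses a server listens on; exactly one tier is used at a time."""
    mapper = fixed_group if len(resources) >= MANY_RESOURCES else wide_group
    return {mapper(r) for r in resources}


def groups_to_query(name: str) -> tuple[int, int]:
    """A client does not know which tier a server uses, so it addresses both."""
    return wide_group(name), fixed_group(name)
```

A server holding few resources joins only wide-range groups, so of the two requests a client sends for a resource, that server sees at most the one addressed to the wide-range group.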

4 Adaptive Policy

To manage a system where the patterns of information use change, an adaptive policy is explored that monitors the current access and update characteristics of the information being managed in order to adjust how the information is managed. The aim is to minimize the amount of work needed to add new machines into the network while at the same time adjusting to the nature of the information being managed.

The mechanism will adjust in three ways: how resources are mapped to logical addresses for delivery of requests and updates, whether information is distributed across all machines or centralized on a few machines, and whether information is queried on request or updated on change. How the resource information is managed depends directly on the type and frequency of the requests that are made. Depending on the variability in how information is accessed, the system may never change, may change once and then remain stable, or may periodically adjust how the information is managed.

The basic adaptive strategy we have explored thus far uses a distributed voting algorithm to determine if changes to the current resource management policy should be made. This approach was chosen because it allows the decision process to be distributed and each node to base decisions on easy-to-obtain parameters at the node. Each node accumulates system parameters and periodically analyzes them for the current best information management strategy. These local strategy decisions are then used as a vote for the most appropriate information management policy for the entire system. The decision votes are weighted based on the amount of local system activity at the individual nodes.

The strategy requires the individual nodes to collect the following system parameters: modifications sent, modifications received, requests sent, requests received, replies sent, replies received, local modifications and local information accesses. For each type of information, the system measure that dominates all other measures is computed at each node. A local decision is made based on this performance parameter. For each node, if the dominating parameter is:

Local Accesses or Local Modifications. The system makes more local accesses than remote accesses. The system should not change strategies until there appears to be a need for change. Since the network accesses are low anyway, the node votes to continue operating with the current scheme.

Modifications Sent or Modifications Received. The system is handling more update packets than it is actually using the new information. It would be better to stop these wasteful updates and access information when there is an actual need for it. So the node votes to adapt to a query-on-request approach with requests mapped to logical addresses.

Requests Sent or Requests Received. The system is handling requests at a higher rate than information is modified. Thus it would be beneficial to have the information cached locally, with the information originator notifying nodes when there is a change. Thus the node votes to adapt to an update-on-change approach.

Replies Sent or Replies Received. The system is responding to many requests, implying that it would be a good node to act as a central server for this information. Thus the node votes to adapt to a centralized approach.
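The per-node decision rule above can be written down compactly. The following is a minimal sketch, not the authors' implementation: the counter names and the `Policy` enumeration are hypothetical, and resolving ties in favor of keeping the current policy is an assumption the paper does not state.

```python
from enum import Enum


class Policy(Enum):
    KEEP_CURRENT = "keep current policy"
    QUERY_ON_REQUEST = "query-on-request"
    UPDATE_ON_CHANGE = "update-on-change"
    CENTRALIZED = "centralized"


# Which policy a node votes for when a given counter dominates.
# Insertion order matters only for ties (assumption: ties favor the
# earliest entry, so a fully idle node votes to keep the current policy).
VOTE_FOR = {
    "local_accesses": Policy.KEEP_CURRENT,
    "local_modifications": Policy.KEEP_CURRENT,
    "modifications_sent": Policy.QUERY_ON_REQUEST,
    "modifications_received": Policy.QUERY_ON_REQUEST,
    "requests_sent": Policy.UPDATE_ON_CHANGE,
    "requests_received": Policy.UPDATE_ON_CHANGE,
    "replies_sent": Policy.CENTRALIZED,
    "replies_received": Policy.CENTRALIZED,
}


def local_vote(counters: dict[str, int]) -> Policy:
    """Return this node's vote, based on its dominating counter."""
    dominant = max(VOTE_FOR, key=lambda name: counters.get(name, 0))
    return VOTE_FOR[dominant]
```

For example, a node that received 50 modifications but sent only 5 requests in the last period would vote for query-on-request.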

Once an adaptive information management strategy decision is made locally, this local decision, along with the raw data that was used to arrive at it, is sent to a central collecting node. This central node is not constrained to be any particular node and can dynamically select itself to originate a voting session.

The raw data sent by the local nodes is used by the central node to weight the votes, based on the amount of machine message traffic at each individual node, to arrive at a final decision. This weighting of votes is done to prevent a node with little information activity from significantly changing the global policy decision. A policy change is effected if a policy different from the current one receives the largest number of votes and at least 50% of the total votes. The majority rule was added to prevent transient changes from affecting the system. Once a final decision is made at the central node, it is sent to the individual nodes.

In order to reliably effect a policy change, the system can use a simple protocol with timestamps. Once the central node decides on the global information management policy, it sends this new policy to all the individual nodes in the system along with an update timestamp. On receipt of the global decision, the local nodes make a policy change. As part of any request or update message, the current policy update timestamp known by the node is included, so any nodes without the most up-to-date timestamp can be updated. In addition, new nodes joining the system can query any other node for the current system state.

In our simulation study of the policy, the adaptive policy calls for a voting decision every hour. This time period is a compromise between a system that changes too fast and one that is lethargic to change. To further prevent an unstable system we explored a damping factor. The damping factor is applied by comparing the current values of information accesses and modifications at each node with the same measures collected during the previous time period. If the current parameters show a change of less than 10% for both measures, the node abstains from the decision process. The individual node sends this abstain decision to the central decision node. If more than 50% of the total weight of the system abstains then the management policy will not change for lack of a majority.

The principal costs of the adaptive policy are the overhead to make decisions and for individual nodes to adopt the new policy. The message overhead for making the decisions is accounted for in the simulation study. The cost for transition between the policies may involve a short period where both the previous and new policy are supported.
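The weighted vote with the majority rule and the damping factor can be sketched as below. This is an illustration under assumptions: policies are represented as plain strings, an abstention is represented as `None`, a node that votes to continue with the current scheme is assumed to submit the current policy's name, and the handling of a zero baseline in `relative_change` is a choice the paper does not specify. The 10% and 50% thresholds are the ones given in the text.

```python
from collections import defaultdict
from typing import Optional

DAMPING = 0.10   # a node abstains when both measures changed by less than 10%
MAJORITY = 0.50  # a new policy needs at least 50% of the total weight


def relative_change(current: float, previous: float) -> float:
    """Relative change between two periods (zero baseline: assumed full change)."""
    if previous == 0:
        return 0.0 if current == 0 else 1.0
    return abs(current - previous) / previous


def should_abstain(accesses: float, prev_accesses: float,
                   mods: float, prev_mods: float) -> bool:
    """Damping factor: abstain only if both measures are nearly unchanged."""
    return (relative_change(accesses, prev_accesses) < DAMPING
            and relative_change(mods, prev_mods) < DAMPING)


def tally(current_policy: str,
          ballots: list[tuple[Optional[str], float]]) -> str:
    """Decide the global policy from (vote, weight) pairs; None means abstain.

    Each weight stands for the node's machine message traffic.
    """
    total = sum(weight for _, weight in ballots)
    weights: dict[str, float] = defaultdict(float)
    for vote, weight in ballots:
        if vote is not None:
            weights[vote] += weight
    if total == 0 or not weights:
        return current_policy
    winner = max(weights, key=weights.get)
    # Abstaining weight counts toward the total, so heavy abstention
    # leaves no policy with the required majority.
    if winner != current_policy and weights[winner] >= MAJORITY * total:
        return winner
    return current_policy
```

Counting abstentions in the total is what makes the policy stay unchanged when more than half of the system's weight abstains.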

5 System and Information Model

The test of the proposed adaptive system is its implementation and use when subjected to an actual varying system workload. However, it is more difficult to gain insight into the factors that contribute to the effectiveness of a policy without controlled experiments and a well-understood workload. Thus, we use a simulation study where the system and the information access patterns are modeled.

The system model we use consists of a group of workstations connected by a high speed network. The parameter that is of prime importance to us is the number of "machine" messages sent and received by the machines, as the adaptive policy tries to keep this number to a minimum. Other system parameters, such as information packet size, network latency, and network acquiring latency (in a broadcast medium), are not of primary importance to our work and as such are not modeled. The various nodes communicate by sending messages on the network. The network supports unicast, multicast, and broadcast communication.

The system can consist of several types of information, each with a number of information instances. Each type of information has different activity rates and periodicity, where an activity can be an access for resource information, an addition of new information instances into the system, or existing information instances leaving the system. Each instance can have multiple occurrences in the system. For example, if user login information is being modeled then an instance would be an individual user, and that user may be logged in to multiple machines.

The number of information instances of a particular type in the simulation system is a system parameter. At the start of the simulation run, the initial information instances are distributed among the nodes. This static distribution is modified through information additions and deletions. Some information types are distributed equally among all the nodes, such as for modeling load information. Other information is concentrated among a few nodes, such as file servers that let other nodes share the local file system remotely and compute servers that share their local computing power with other nodes in the system. This natural concentration of information among nodes leads to server and client node classes, where server nodes have a higher concentration of information than client nodes.

The information instances are initially distributed in the system using a tuple giving the percentage of nodes and the percentage of the information that those nodes hold. A distribution of <100, 100> signifies that 100% of the information is distributed among 100% of the nodes, that is, the information is equally distributed among all the nodes. A distribution of <20, 90> signifies that 20% of all the nodes provide 90% of the information. These 20% of the nodes constitute the server class.

Information activity is characterized using another tuple < %server nodes, %activity >. The %server nodes gives the number of nodes that are part of this activity as a ratio of the total number of server class nodes in the entire system. The %activity gives the amount of information activity that originates from the %server nodes set of nodes. A separate tuple is used to describe each of the addition of information instances, the deletion of information instances, and the access of the instances. For example, a tuple with a %activity of 100 for the addition and deletion activities would be used to specify that all information updates come from the server nodes. Similarly, a tuple with a %activity of 20 would be used for information access to indicate that only 20% of the information requests originate from the server class nodes.

This information model is simple, yet allows for uneven distributions of information and accesses to be modeled. It can be used to loosely model realistic information and study the effect of new information access patterns.
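As an illustration of the distribution tuple, the sketch below assigns an initial number of information instances to each node. It is a simplification under assumptions: the function name and parameters are hypothetical, the tuple is taken in the order (percentage of nodes, percentage of information), server nodes are simply the first nodes in the list, and instances are spread round-robin within each class.

```python
def distribute(num_instances: int, num_nodes: int,
               pct_nodes: int, pct_info: int) -> list[int]:
    """Return how many instances each node initially holds (servers first)."""
    num_servers = max(1, round(num_nodes * pct_nodes / 100))
    server_share = round(num_instances * pct_info / 100)
    counts = [0] * num_nodes

    # The server class holds pct_info percent of the instances.
    for i in range(server_share):
        counts[i % num_servers] += 1

    # The remaining instances go to the client class; if every node is a
    # server, any remainder also goes to the servers.
    num_clients = num_nodes - num_servers
    for i in range(num_instances - server_share):
        if num_clients:
            counts[num_servers + (i % num_clients)] += 1
        else:
            counts[i % num_servers] += 1
    return counts
```

With 100 instances on 20 nodes, a (100, 100) distribution gives every node 5 instances, while (20, 90) places 90 instances on the 4 server nodes and spreads the other 10 over the clients.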


6 Simulation Results

The various fixed and adaptive policies are studied in the context of different resource access scenarios. The scenarios are created with a simulation where the information is modeled as previously described. The system parameter of prime importance for our study is the number of machine messages as seen by the individual nodes. Each simulation run covers a 24 hour period from 00:00 to 23:59.

Different workloads and information access patterns are configured using an input event script that describes the tuples for an information type. The simulation system generates accesses based on this event script. The same script is used for the different fixed policies as well as the adaptive system.

In processing an access scenario, the input event script is read and information generators or accessors are created, depending on whether the access type is an information addition/deletion or an information access. These processes generate random requests for service. A request specifies the type (access, add, delete), the information and the instance within this information, along with the node that requests this information. The simulation runs are repeated several times and the mean values of the simulation runs are analyzed.

The adaptive policy makes discrete policy change decisions in adapting to any of the fixed resource management policies. The system is expected to adapt to the scheme that results in the fewest machine messages.

6.1 Generic Model

Before the simulation program is run through a real life workload, the system is used to simulate a simple synthetic workload. In any information access scenario we note that the number of information modifications can be greater than, equal to, or less than the number of information accesses. The synthetic scenario successively generates accesses in all these modes. The results of this workload can be compared with the expected results for model verification.

For this scenario, we assume 20 nodes in our system and a small number of information instances at the start of the simulation run. All the nodes provide information equally and there are no server or client classes among the different nodes. The results from running this scenario for a 24 hour period are shown in Figure 1. The vertical axis indicates the total number of machine messages over the course of the time period. The scenario generates 10 times as many information modifications as information accesses for the first 8 hours (period I), followed by a period of equal accesses and modifications for 8 hours (period II), followed by a period of 10 times as many information accesses as information modifications (period III).

[Plot: number of messages (vertical axis, 0 to 300) against time of day in hours (horizontal axis, with periods I, II and III marked) for the Adaptive, Broadcast Request, Multicast Request, Broadcast Update, Multicast Update and Centralized policies.]

Figure 1: Machine Messages for Synthetic Information

During the initial time period, when there are more information modifications than information accesses, the query-on-request policy results in the smallest number of machine messages. By default, the adaptive system starts up by using the centralized information management strategy, but soon adapts to the query-on-request information access policy.

In the second phase, the adaptive policy oscillates between the query-on-request and update-on-change policies. Upon investigation, this thrashing occurs because the low system activity defeats the damping factor: when activity is small, even a slight change in access patterns affects the decision process. This result indicates that further work needs to be done on the damping factor strategy so the global effect is better controlled.

As the number of accesses increases, the update-on-change policy gives the best performance with respect to the number of machine messages. The adaptive system switches to this policy. However, because of the overhead involved with the adaptive policy, it performs with a slightly higher cost than the fixed information management scheme.

The centralized information management policy is a good fixed policy across the different phases. However, further investigation found that the adaptive policy tends to stay away from the centralized scheme because of the voting scheme. The criterion for voting for centralization is such that few of the nodes vote for it even though the policy would be best for the overall system. This aversion to centralization points to more work needing to be done on balancing individual decision making against the global decision within the adaptive policy.
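The three periods of the synthetic workload can be expressed as an hourly rate schedule. This is a small sketch for illustration only: the base rate of 10 events per hour and the function name are assumptions, since the paper gives only the ratios between modifications and accesses.

```python
BASE_RATE = 10  # assumed events per hour for the less frequent activity


def synthetic_rates(hour: int) -> tuple[int, int]:
    """Return (modifications per hour, accesses per hour) for hour 0..23."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    if hour < 8:
        # Period I: 10 times as many modifications as accesses.
        return 10 * BASE_RATE, BASE_RATE
    if hour < 16:
        # Period II: equal modifications and accesses.
        return BASE_RATE, BASE_RATE
    # Period III: 10 times as many accesses as modifications.
    return BASE_RATE, 10 * BASE_RATE
```

Under this schedule, the expected best fixed policy is query-on-request in period I and update-on-change in period III, which matches the behavior described above.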

6.2 Load Information

Overall the generic simulation model behaves as expected, with the adaptive policy generally switching policies when it should. A second access scenario was created to model the load information of a node, which may be used by active users in finding a lightly loaded node, by batch jobs to select the best node to run a job, or by monitoring tools to display the system load status for performance analysis. As a result, load information is one of the system measures that is used actively throughout the entire day and in different ways.

For this scenario, we assume 20 nodes in our system and the load average being computed every minute. All the nodes provide load information. The results from running this scenario for a 24 hour period are shown in Figure 2, where the total number of machine messages accumulated is shown for each hour.

[Plot: number of messages (vertical axis, 0 to 120000) against time of day in hours (horizontal axis) for the Adaptive, Broadcast Request, Multicast Request, Broadcast Update, Multicast Update and Centralized policies.]

Figure 2: Machine Messages for Load Information

In this scenario, we model load information being sampled every minute and thus changing steadily throughout the simulation interval. Generally load information is accessed at an average rate of once every minute by some process in the system. At 8am, a user starts a performance monitoring tool to display the system load averages, which queries the system load average every 30 seconds. This user remains active until 11:30 and continues to run this tool from 13:00 until 16:00. Later two of these tools are run from 22:00 to 22:30. In the meantime we have similar usage patterns from another user who works from 08:30 to 12:00 and later from 13:00 to 15:00. The system administrator runs a performance tool from 08:00 until 22:00. The system models similar usages of load information by different users at different time periods throughout the day, some of which overlap with one another.

Though the centralized approach operates with the fewest machine messages, the adaptive policy stays away from a centralized approach as discussed previously. Next to centralized, query-on-request using multicast groups provides the best performance. The adaptive strategy follows this policy most of the time with a small overhead. This result makes sense because the system generates more modifications than accesses, so querying on request results in fewer messages. In contrast, update-on-change using broadcast communication generates the most messages, as updates are sent to all nodes regardless of whether the information is needed.

We would expect the query-on-request policy to work even better with multicasting because, in cases where a lightly loaded machine is being sought, the request could be sent only to machines with a low load. However, the current information model is not flexible enough to quantify information in this manner.

6.3 Users logged in

A resource access scenario that shows wide variance in the resource usage patterns is user login information. The number of users logged into a system typically increases with the general working time for the day and tapers off in the wee hours. As more users log in, more requests are generated. Access patterns are affected dramatically when specific users automatically monitor other users.

For this scenario, we assume 20 nodes in our system with 100 individual users in the system. The results from running this scenario for a 24 hour period are shown in Figure 3. It shows the number of machine messages as seen by the individual nodes each hour. The user activity generally increases with the time of day. Until around 6am, it remains very low and then slowly increases to peak around noon. It then slowly drops off around 20:00 and drops off considerably around 22:00.

[Plot: number of messages (vertical axis, 0 to 5000) against time of day in hours (horizontal axis) for the Adaptive, Broadcast Request, Multicast Request, Broadcast Update, Multicast Update and Centralized policies.]

Figure 3: Machine Messages for User Information

The adaptive strategy generally follows query-on-request using multicasting as the management policy. For part of the time period, the policy adapts to the update-on-change policy using multicasting, until the system parameters change and the adaptive policy switches back to query-on-request. Although the centralized policy shows better performance, the adaptive policy again tends away from this approach.

We would expect the query-on-request with multicasting policy to yield even better results because users can be mapped to logical addresses for delivery. However, the distribution of the user information does not allow the policy to take advantage of mapping users to addresses.


7 Analysis and Future Work

These preliminary results show that an adaptive policy does generally work in finding a low-cost policy in both dynamic and static environments with a relatively low overhead. Although it does not always find the best policy, it finds a good policy that works in a range of environments. In addition, the information model allows us to study the policies in controlled environments.

As discussed, problems did arise with the adaptive policy used with the scenarios we have generated. We do not view these problems as inherent weaknesses in the approach, but rather as directions in which further work on refining the adaptive policy needs to be done.

One direction in which we need further work is how such a mechanism scales. This issue can be broken into two parts. The first is how the mechanism scales in a local area network environment with more nodes. We expect similar results, but need to test this assumption. The second issue is scaling beyond a local area network. We can look at scaling these results beyond a local area network environment through the use of proxy requests. As an example, consider requesting information about a user at another site. A request can be sent to a forwarding machine at that site, which serves as a proxy for the remote request: it uses the local resource location mechanism to look up the information and return a result. The local client machine does not need to know about machines at the other site or the mechanism used for satisfying the request. The external requests to a site are treated as if generated by the local machines, and if enough occur they could cause the management of the information to adapt.

The information scenarios did bring out the problem that the adaptive policy tended away from centralization. This is a problem in the adaptive algorithm, which needs to better combine local decision making with the resulting global effect. There was also a problem with oscillations between policies when the amount of system activity is low. This decision must take into account not only the relative change of activity, but also the amount.

We also need to do more work on modeling the information. Lack of control of the information instances reduced the expected effectiveness of using logical addresses for the resources.

Another aspect of information management that should be incorporated in the adaptive policy is persistent queries from one node. In this case a single node could request updates of information, and only when multiple nodes seek information would the mechanism need to adapt to an update-on-change policy.

8 Summary

We are continuing this effort to explore an adaptive resource management mechanism that modifies how it works to minimize the communication overhead on network machines while providing up-to-date resource information to any client machine. The mechanism customizes itself according to the distribution and dynamics of the resource information, and the location and frequency of resource access. Whether the resource information remains distributed or is centralized, and how a corresponding query for a resource is satisfied, is the responsibility of the resource management mechanism. An important component of this mechanism is the use of logical addresses: resources are not located by sending queries or updates to a particular machine, but by sending the query or update to a logical address that is received by one or more machines that may be able to satisfy the request or are interested in the information.

The current status of our approach and preliminary results have been discussed. Our experience thus far indicates that a relatively simple adaptive mechanism does work, with further refinements needed. We also need to implement the policy to validate our simulation results and to gain more experience with switching information management policies.


References

[1] Kenneth Birman, Andre Schiper, and Pat Stephenson. Lightweight causal and atomic group multicast. ACM Transactions on Computer Systems, 9(3):272–314, August 1991.

[2] A. Birrell, R. Levin, R. Needham, and M. Schroeder. Grapevine: An exercise in distributed computing. Communications of the ACM, 25(4):260–274, April 1982.

[3] Surendar Chandra. Adaptive resource management. Master's thesis, Computer Science Department, Worcester Polytechnic Institute, August 1993.

[4] Computer Science Division, University of California, Berkeley. UNIX Programmer's Manual, 4.3 Berkeley Software Distribution, Virtual VAX-11 Version, April 1986.

[5] Partha Dasgupta, Richard J. LeBlanc Jr., and William F. Appelbe. The Clouds distributed operating system: Functional description, implementation details and related work. In Proceedings of the 8th International Conference on Distributed Computing Systems, pages 2–9, June 1988.

[6] S. Deering. Host extensions for IP multicasting, August 1989. RFC 1112.

[7] Stephen E. Deering and David R. Cheriton. Multicast routing in datagram internetworks and extended LANs. ACM Transactions on Computer Systems, 8(2):85–110, May 1990.

[8] Mohamed G. Gouda and Ted Herman. Adaptive programming. IEEE Transactions on Software Engineering, SE-17(9):911–921, September 1991.

[9] Brent Hailpern and Gail E. Kaiser. Dynamic reconfiguration in an object-based programming language with distributed shared data. In Proceedings of the 11th International Conference on Distributed Computing Systems, pages 73–80, May 1991.

[10] M. Frans Kaashoek and Andrew S. Tanenbaum. Group communication in the Amoeba distributed operating system. In Proceedings of the 11th International Conference on Distributed Computing Systems, pages 222–230, May 1991.

[11] Derek C. Oppen and Yogen K. Dalal. The Clearinghouse: A decentralized agent for locating named objects in a distributed environment. ACM Transactions on Office Information Systems, 1(3):230–253, July 1983. Earlier, expanded version published as Technical Report OPD-T8103, Xerox Office Products Division, Systems Development Department, October 1981.

[12] Vern Paxson. Growth trends in wide-area TCP connections. Technical report, Lawrence Berkeley Laboratory and EECS Division, University of California, Berkeley, CA 94720, 1 Cyclotron Road, Berkeley, CA 94720, May 12 1993.

[13] Craig E. Wills and Shanti Suresh. Resource-driven resource location. In Proceedings of the 26th Hawaii International Conference on System Sciences, pages 80–89, January 1993.
