IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS


Strategyproof Mechanisms for Scheduling Divisible Loads in Bus-Networked Distributed Systems

Thomas E. Carroll, Student Member, IEEE, and Daniel Grosu, Member, IEEE

Abstract—The scheduling of arbitrarily divisible loads on a distributed system is studied by Divisible Load Theory (DLT). DLT has the underlying assumption that the processors will not cheat. In the real world this assumption is unrealistic as the processors are owned and operated by autonomous, rational organizations that have no a priori motivation for cooperation. Consequently, they will manipulate the algorithms if it benefits them to do so. In this work we propose strategyproof mechanisms for scheduling divisible loads on three types of bus connected distributed systems. These mechanisms provide incentives to the processors to obey the prescribed algorithms and to truthfully report their parameters, leading to an efficient load allocation and execution.

Index Terms—divisible load scheduling, mechanism design, incentive-based computing, strategyproof mechanism.



1 INTRODUCTION

Scheduling tasks in a distributed computing system is one of the most challenging problems that need to be solved when running applications. Inefficient scheduling decisions result in overheads and poor performance. The task scheduling problem takes many forms depending on the characteristics of the tasks to be scheduled, the characteristics of the machines composing the system, and the objective function. One type of scheduling problem is the one in which there are no dependencies between tasks and the tasks can be of arbitrary size. This is the case for several applications in science and engineering in which the total load can be split into an arbitrary number of independent loads. These loads require the same type of processing and can be assigned to any computer in the system. In practice this corresponds to the widely used master-slave model of parallel computation. The above scheduling problem can be characterized using the divisible load model, which was studied extensively in recent years, resulting in a cohesive theory called Divisible Load Theory (DLT) [7]. DLT provides analytical results and optimal algorithms for scheduling loads on various types of platforms such as bus, tree, star, and linear networks. The scheduling algorithms developed within DLT assume that the participants (in this case, processors) are obedient. Thus, they report to the scheduler the true parameters of their processing facilities (e.g., processing power). The scheduler makes the allocation decision according to the values reported by the processors or by the owners of these processors. This assumption is not valid in real-life situations where these participants have no a priori motivation for cooperation and

• Thomas E. Carroll and Daniel Grosu are with the Department of Computer Science, Wayne State University, 5143 Cass Avenue, Detroit, MI 48202. Email: [email protected], [email protected]. Manuscript received 7 Feb. 2007; revised 26 Oct. 2007; accepted 28 Nov. 2007.

they are tempted to manipulate the scheduling algorithm if it is beneficial to do so. This behavior may lead to poor system performance and inefficiency. Thus, we need to develop new algorithms and protocols that address the self-interest of the participants. Unlike the traditional DLT algorithms, the new protocols must deal with the possible manipulations. Also, the system must provide incentives to agents to participate in the given algorithm. The solution to these kinds of problems comes from economics, more precisely from mechanism design theory [27]. The aim of this theory is to provide tools and methods to design protocols for self-interested agents. Of interest are the so-called strategyproof mechanisms, in which the participants maximize their own utilities only if they report their true parameters and follow the given algorithm. In a general mechanism, each participant has a privately known function called valuation which quantifies the agent's benefit or loss. Payments are designed and used to motivate the participants to report their true valuations. The goal of each participant is to maximize the sum of her valuation and payment. As an example, consider several resource providers that offer computer services. We assume that each resource is characterized by its job processing rate. An allocation mechanism is strategyproof if a resource owner maximizes her utility only by reporting the true resource processing rate to the mechanism. The optimal utility is independent of the values reported by the other participating processors. In this paper we consider the design of strategyproof scheduling mechanisms in the context of divisible load theory. To our knowledge this is the first attempt to augment DLT with incentives. We develop strategyproof mechanisms for three classes of distributed systems interconnected by a bus network.
The strategyproof mechanisms provide incentives to the processors to participate and to report their true processing capacities to the scheduler. The processors gain the maximum profit by executing the load only if they are truthfully reporting the private values characterizing their processing capabilities.


The first mechanism we design is for bus networks with a control processor. The obedient control processor computes the outcome of the mechanism and distributes load to the remaining processors. For the other types of systems, the control processor does not exist; thus, a strategic processor must distribute the load to the others. This processor may or may not have a front-end processor that permits simultaneous communication and computation. We use the existence of the front-end processor to distinguish between the other two classes of bus networks. We design the mechanisms for these classes using cryptography and problem-partitioning techniques that allow us to catch and punish deviating agents.

1.1 Related Work

The divisible load scheduling problem was studied extensively in recent years, resulting in a cohesive theory called Divisible Load Theory. A reference book on DLT is [7]. Two recent surveys on DLT are [8] and [29]. This theory has been used for scheduling loads on heterogeneous distributed systems in the context of different applications such as image processing [21], databases [9], linear algebra [11], and multimedia broadcasting [6]. Scheduling divisible loads in grids has been investigated in [34]. Scheduling divisible loads in which network and processing parameters are not known in advance or are dynamic is examined in [16]. A multi-round divisible load scheduling algorithm is presented in [33]. An examination of divisible loads and return messages is performed in [4], [5]. New results and open research problems in DLT are presented in [3]. All these works assumed that the participants in the load scheduling algorithms are obedient and follow the algorithm. Recently, several researchers have applied mechanism design theory to computational problems that involve self-interested participants. These problems include resource allocation and task scheduling [25], [31], [32], routing [13], and multicast transmission [14]. In their seminal paper, Nisan and Ronen [26] considered for the first time the mechanism design problem in a computational setting. They proposed and studied a VCG (Vickrey-Clarke-Groves) [12], [19], [30] type mechanism for the shortest path in graphs where edges belong to self-interested agents. They also provided a mechanism for solving the task scheduling problem on unrelated machines. A general framework for designing strategyproof mechanisms for one-parameter agents was proposed by Archer and Tardos [1]. They developed a general method to design strategyproof mechanisms for optimization problems that have general objective functions and a restricted form of valuations.
In a subsequent paper [2] the same authors investigated the frugality of shortest path mechanisms. Grosu and Chronopoulos [18] derived a strategyproof mechanism that gives the overall optimal solution for the static load balancing problem in distributed systems. The results and the challenges of designing distributed mechanisms are surveyed in [15]. Mitchell and Teague [22] extended the distributed mechanism in [14] devising a new model where the agents themselves implement the mechanism, thus allowing them to deviate from the algorithm. The strategyproof computing paradigm proposed in [23] considers the self-interest and incentives


of participants in distributed computing systems. Ng et al. [24] proposed a strategyproof system for dynamic resource allocation in data staging.

1.2 Our Contributions

The main contribution of this paper is to augment the existing Divisible Load Theory with incentives. We develop strategyproof mechanisms for scheduling divisible loads in distributed systems assuming a bus-type interconnection and a linear cost model for the processors. We define the mechanisms and prove their properties. We simulate and study the implementation of the mechanisms on systems characterized by different parameters. To our knowledge this is the first work on augmenting DLT with incentives, initiating the development of a cohesive theory combining DLT with incentives.

1.3 Organization

The paper is structured as follows. In Section 2 we present a description of the divisible load scheduling problem for bus networks. In Section 3 we present the framework used to design our mechanisms. In Section 4 we present and discuss the proposed strategyproof mechanisms. In Section 5 we study by simulation the proposed scheduling mechanisms. In Section 6 we draw conclusions and present future directions.

2 DIVISIBLE LOAD SCHEDULING PROBLEM

We consider a distributed system interconnected with a bus network. There are two classes of bus-networked distributed systems, distinguished by the existence of the control processor. The control processor, P0, has no processing capability and it can only communicate with a single processor at any instant (i.e., we assume the one-port model). Each load-executing processor Pi, i = 1, 2, . . . , m, is characterized by wi, the time taken by Pi to process a unit load. The fraction of load assigned to Pi is αi. Processor Pi executes its assignment in time αi wi, which corresponds to a linear cost model. In each system we consider a load-originating processor, which is the processor with initial access to the load. If the system has a control processor, that processor is considered the load-originating processor; otherwise, one of the load-executing processors is designated as the load-originating processor. The load-originating processor transmits αi units of load to Pi in time αi z, where z is the time it takes to communicate a unit load from the load-originating processor to any other processor. We denote by α = (α1, α2, . . . , αm) the vector of load allocations. Processor Pi finishes executing its assignment in time Ti(α), which is the total time taken to receive and then process the assignment. Depending on the existence of a control processor we have two classes of systems: bus network with control processor (CP) and bus network without control processor (NCP). Furthermore, the bus network without control processor class can be divided into two subclasses depending on the existence of a front end: bus network without control processor, with front end (NCP-FE), and bus network without control processor, without front end (NCP-NFE). In the following we discuss these types of systems in the context of DLT.


Fig. 1. Execution on a bus network with control processor (CP)

Fig. 2. Execution on a bus network without control processor; load-originating processor with front end (NCP-FE)

Bus network with control processor (CP) [7]. There is an independent load-originating processor P0 which does not have any processing capacity and can only communicate with one processor at a time (i.e., we assume the one-port model). Figure 1 shows a diagram representing the execution on this system. From the diagram, it is apparent that the finishing time Ti(α) is given by

Ti(α) = z ∑_{j=1}^{i} αj + αi wi.   (1)

Bus network without control processor, load-originating processor with front end (NCP-FE) [7]. The load-originating processor is also a load-executing processor, i.e., the system lacks a control processor. The load-originating processor P1 has a front end permitting it to simultaneously communicate and compute. Again, we assume the one-port model. A diagram representing the execution on this system is shown in Figure 2. The finishing time Ti(α) is given by

Ti(α) = α1 w1                        if i = 1
Ti(α) = z ∑_{j=2}^{i} αj + αi wi     if i = 2, . . . , m   (2)

Notice that P1 does not experience any delay related to communicating the load.

Bus network without control processor, load-originating processor without front end (NCP-NFE) [7]. This is similar to the previous system in that a control processor is not present. But the load-originating processor Pm does not have a front end; thus, it cannot simultaneously compute and communicate. As usual, we assume the one-port model. Figure 3 illustrates an execution on this system. We define the finishing time Ti(α) as

Ti(α) = z ∑_{j=1}^{i} αj + αi wi     if i = 1, . . . , m − 1
Ti(α) = z ∑_{j=1}^{m−1} αj + αi wi   if i = m   (3)

Processor Pm does not start computing until it has communicated the loads to all the other processors.

With each of the three systems described above we associate a different scheduling problem. We call these problems BUS-LINEAR-CP, BUS-LINEAR-NCP-FE, and BUS-LINEAR-NCP-NFE, respectively. Each of these problems asks for the load allocation α which minimizes the total execution time (i.e., T(α) = max(T1(α), T2(α), . . . , Tm(α))) and is defined as follows:

min_α T(α)   (4)

such that αi ≥ 0, i = 1, 2, . . . , m and ∑_{i=1}^{m} αi = 1.

The following theorems proved in [7] characterize the optimal solution for all three problems defined above.

Theorem 2.1 (Participation): The optimal solution is obtained when all processors participate and they all finish executing their assigned load at the same time, i.e., T1(α) = T2(α) = · · · = Tm(α).

Theorem 2.2 (Ordering): Any load allocation order is optimal for the BUS-LINEAR-CP, BUS-LINEAR-NCP-FE, and BUS-LINEAR-NCP-NFE problems.

BUS-LINEAR-CP Problem. The allocations are computed solving

αi wi = αi+1 z + αi+1 wi+1,   i = 1, 2, . . . , m − 1.   (5)

The completion time for BUS-LINEAR-CP is then computed as T = α1(w1 + z). The following algorithm uses (5) to compute the optimal vector of allocations.

Algorithm 2.1: (BUS-LINEAR-CP Algorithm)
Input: Time to process a unit load: w1, w2, . . . , wm; Time to communicate a unit load: z;
Output: Load fractions: α1, α2, . . . , αm;
1. for j = 1, . . . , m − 1 do kj ← wj / (z + wj+1)
2. α1 ← 1 / (1 + ∑_{i=1}^{m−1} ∏_{j=1}^{i} kj)
3. for i = 2, . . . , m do αi ← α1 ∏_{j=1}^{i−1} kj
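To make Algorithm 2.1 concrete, here is a minimal Python sketch (ours, not from the paper; names such as `bus_linear_cp` are illustrative). It computes the fractions implied by (5) and can be checked against Theorem 2.1, since all processors must finish at the same time:

```python
def bus_linear_cp(w, z):
    """Algorithm 2.1: optimal load fractions for BUS-LINEAR-CP.

    w[i] is the time for processor P_{i+1} to process a unit load;
    z is the time to communicate a unit load over the bus.
    """
    m = len(w)
    # Step 1: k_j = w_j / (z + w_{j+1}), derived from eq. (5)
    k = [w[j] / (z + w[j + 1]) for j in range(m - 1)]
    # Step 2: alpha_1 = 1 / (1 + sum_{i=1}^{m-1} prod_{j=1}^{i} k_j)
    total, prod = 1.0, 1.0
    for kj in k:
        prod *= kj
        total += prod
    alpha = [1.0 / total]
    # Step 3: alpha_{i+1} = alpha_i * k_i
    for kj in k:
        alpha.append(alpha[-1] * kj)
    return alpha

def finish_times_cp(alpha, w, z):
    """T_i(alpha) = z * sum_{j<=i} alpha_j + alpha_i * w_i  (eq. (1))."""
    comm, times = 0.0, []
    for a, wi in zip(alpha, w):
        comm += a * z  # one-port bus: P_i waits for all earlier transfers
        times.append(comm + a * wi)
    return times
```

For example, with w = (1, 1) and z = 1 this yields α = (2/3, 1/3), and both processors finish at time 4/3, as Theorem 2.1 requires.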


Fig. 3. Execution on a bus network without control processor; load-originating processor without front end (NCP-NFE)

BUS-LINEAR-NCP-FE Problem. From Figure 2, the recursive equations

αi wi = αi+1 z + αi+1 wi+1,   i = 1, . . . , m − 1   (6)

compute the optimal allocation in the case of a bus network without control processor, load-originating processor with front end. Notice that (6) and (5) are identical. Thus, we use the BUS-LINEAR-CP algorithm to compute the allocations. The completion time for BUS-LINEAR-NCP-FE is then computed as T = α1 w1.

BUS-LINEAR-NCP-NFE Problem. This problem is characterized by a different set of recursive equations as follows:

αi wi = αi+1 z + αi+1 wi+1,   i = 1, . . . , m − 2   (7)
αm−1 wm−1 = αm wm   (8)

Using (7) and (8), the following algorithm that solves the BUS-LINEAR-NCP-NFE problem is derived.

Algorithm 2.2: (BUS-LINEAR-NCP-NFE Algorithm)
Input: Time to process a unit load: w1, w2, . . . , wm; Time to communicate a unit load: z;
Output: Load fractions: α1, α2, . . . , αm;
1. for j = 1, . . . , m − 2 do kj ← wj / (z + wj+1)
2. α1 ← 1 / (1 + ∑_{i=1}^{m−2} ∏_{j=1}^{i} kj + (wm−1/wm) ∏_{j=1}^{m−2} kj)
3. for i = 2, . . . , m − 1 do αi ← α1 ∏_{j=1}^{i−1} kj
4. αm ← (wm−1/wm) αm−1

These algorithms are executed by the load-originating processor whenever a load requires scheduling. In classical DLT, it is assumed that the load-originating processor faithfully executes the BUS-LINEAR algorithms and that the processors truthfully report their wi. If the processors are owned by autonomous, self-interested organizations, the load-originating processor may deviate from the algorithm or the processors may misreport their processing capacities in hope of gaining additional profit. In the next sections we present the design of mechanisms that compensate for strategic processors. The
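A Python sketch of Algorithm 2.2 follows (again illustrative; names are ours). Step 4 fixes αm through eq. (8), and the finish-time check mirrors eq. (3), where the load-originating processor Pm computes only after all communication is done:

```python
def bus_linear_ncp_nfe(w, z):
    """Algorithm 2.2: load fractions for BUS-LINEAR-NCP-NFE (m >= 2).

    P_m originates the load and has no front end, so it starts computing
    only after transmitting every other assignment.
    """
    m = len(w)
    # Step 1: k_j = w_j / (z + w_{j+1}) for j = 1, ..., m-2 (eq. (7))
    k = [w[j] / (z + w[j + 1]) for j in range(m - 2)]
    prods, prod = [], 1.0
    for kj in k:
        prod *= kj
        prods.append(prod)
    # Step 2: alpha_1, with the extra (w_{m-1}/w_m) * prod k_j term from eq. (8)
    tail = (w[m - 2] / w[m - 1]) * (prods[-1] if prods else 1.0)
    alpha1 = 1.0 / (1.0 + sum(prods) + tail)
    # Step 3: alpha_i = alpha_1 * prod_{j<i} k_j for i = 2, ..., m-1
    alpha = [alpha1] + [alpha1 * p for p in prods]
    # Step 4: alpha_m = (w_{m-1} / w_m) * alpha_{m-1}
    alpha.append((w[m - 2] / w[m - 1]) * alpha[-1])
    return alpha

def finish_times_ncp_nfe(alpha, w, z):
    """Finish times from eq. (3); the last processor is the originator."""
    comm, times = 0.0, []
    for i in range(len(w) - 1):
        comm += alpha[i] * z
        times.append(comm + alpha[i] * w[i])
    times.append(comm + alpha[-1] * w[-1])  # P_m waits for all transfers first
    return times
```

For w = (1, 1, 1) and z = 1 this gives α = (1/2, 1/4, 1/4), with all three processors finishing at time 1.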

mechanisms ensure that the processors report their true processing capacities and, due to the lack of a trusted control processor, that the algorithms are faithfully executed.

3 MECHANISM DESIGN FRAMEWORK

In this section, we introduce the main concepts of mechanism design theory. We limit our discussion to mechanisms for one-parameter agents. Each agent in this mechanism design problem is characterized by private data represented by a single real value [26]. We define the problem in the following.

A mechanism design problem for one-parameter agents is characterized by:
(i) A set A of allowed outputs. The output is a vector α(b) = (α1(b), α2(b), . . . , αm(b)) ∈ A, computed according to the agents' bids, b = (b1, b2, . . . , bm). Here, bi is the bid of agent i.
(ii) Each agent i, i = 1, 2, . . . , m, has a privately known value wi called the true value and a publicly known parameter w̃i called the actual value, where w̃i ≥ wi. The preferences of agent i are given by a function called valuation, Vi(α(b), w̃i).
(iii) Each agent's goal is to maximize its utility. The utility of agent i is Ui(b, w̃) = Qi(b, w̃) + Vi(α(b), w̃i), where Qi is the payment handed by the mechanism to agent i and w̃ is the vector of actual values. The payments are handed to the agents after the mechanism learns w̃.
(iv) The goal of the mechanism is to select an output α that optimizes a given cost function g(b, α).

Definition 3.1 (Mechanism with Verification): A mechanism with verification is characterized by two functions:
(i) The output function

α(b) = (α1(b), α2(b), . . . , αm(b)).   (9)

The input to this function is the vector of agents' bids b = (b1, b2, . . . , bm).
(ii) The payment function

Q(b, w̃) = (Q1(b, w̃), Q2(b, w̃), . . . , Qm(b, w̃)),   (10)

where Qi(b, w̃) is the payment handed by the mechanism to agent i.

Notation. In the rest of the paper, we denote by b−i the vector of bids excluding the bid of agent i. The vector of all bids is represented by b = (b−i, bi).

The following defines an important property in that an agent will maximize its utility when w̃i = bi = wi independent of the actions of the other agents.

Definition 3.2 (Strategyproof Mechanism): A mechanism is called strategyproof if for every agent i and for every bids b−i of the other agents, the agent's utility is maximized when it declares its true value wi (i.e., truth-telling is a dominant strategy).

The next property guarantees non-negative utility for truthful agents. This is important as agents willfully participate in hope of profits.

Definition 3.3 (Voluntary Participation Mechanism): We say that a mechanism satisfies the voluntary participation


condition if Ui((b−i, wi), (w̃−i, wi)) ≥ 0 for every agent i, true value wi, and other agents' bids b−i and execution values w̃−i (i.e., truthful agents never incur a loss).

There are two models for characterizing distributed mechanisms. They differ in the degree of control that the agents can exert. A mechanism is tamper-proof if the agents can only manipulate the inputs to the mechanism. In these types of mechanisms, an agent can only specify its inputs and thus the only method of cheating is altering its inputs. A more general model is the autonomous node model. A mechanism is an autonomous node mechanism if the agents control both the inputs and the algorithm that computes the output. An agent will implement an algorithm different from what is specified if it is beneficial for it to do so.

In the next section we design mechanisms for divisible load scheduling in bus networks. In our model, each processor is characterized by a valuation function which is equal to the cost of processing a given load. A processor wants to maximize its utility, which is the sum of its valuation and the payment given to it. Since the mechanisms are strategyproof, a processor maximizes its utility regardless of the others' actions by bidding truthfully and executing the load at its true processing rate.

4 MECHANISMS FOR BUS NETWORKS

In this section we propose strategyproof mechanisms for the three classes of bus-networked systems. We assume that the network is obedient and the network and communication protocols are tamper-proof. An entity is tamper-proof [22] if it never deviates from the protocol, even if such action would result in an increase in welfare. The distributed system is composed of m strategic processors. Each load-executing processor (agent) Pi, i = 1, 2, . . . , m, is characterized by its true value wi, which is equal to the time to process the unit load. We denote by w = (w1, w2, . . . , wm) the vector of true unit load processing times. Execution time wi is private to Pi. Processor Pi is rational, i.e., it will bid an execution time bi ≠ wi if it benefits it to do so. The mechanism computes the vector of load allocations α(b) = (α1(b), α2(b), . . . , αm(b)) satisfying ∑i αi(b) = 1, where b = (b1, b2, . . . , bm) is the vector of bids. A processor Pi may choose to execute its assignment at a different rate given by its actual time w̃i, where w̃i ≥ wi (i.e., Pi may execute the load slower than its true rate). The mechanism learns the actual execution time w̃i once Pi completes execution. As such, we assume that each processor has a tamper-proof meter that records this value. The valuation for processor Pi under load allocation α(b) is defined as

Vi(α(b), w̃i) = −αi w̃i.   (11)

This linear function is equivalent to the negation of the actual time required for Pi to execute αi load units. The greater the processing time, the smaller (more negative) the valuation. This can be considered to be the cost incurred by Pi in processing αi load units. The mechanism provides payment for executing its assignment. As such, each processor chooses its strategy to maximize its utility. The utility is defined

as the payment minus the cost incurred in processing the assignment. Pi's utility, Ui, is defined as

Ui(b, w̃) = Qi(b, w̃) + Vi(α(b), w̃i),   (12)

where Qi(b, w̃) is Pi's payment and w̃ = (w̃1, w̃2, . . . , w̃m) is the vector of actual execution times, with Pi characterized by w̃i. The objective of the following mechanisms is to minimize the makespan. Designing such mechanisms involves finding an allocation and a payment scheme that minimizes the makespan according to the processors' bids b and motivates all the processors to bid their true values wi and process the load at their full processing capacity (i.e., w̃i = wi). We begin our investigation by proposing a mechanism for a bus network with control processor.

4.1 Mechanism for bus networks with control processor

The mechanism for bus networks with control processor, DLS-BL-CP (Divisible Load Scheduling - Bus Linear - Control Processor), is a centralized mechanism. Control processor P0 executes the mechanism; thus, we assume that P0 is obedient and is trusted by all parties. Each load-executing processor Pi, i = 1, 2, . . . , m, reports its bid bi to P0. We can further classify this mechanism as a direct revelation mechanism, as the only strategy available to the agent is disclosing its type (the execution time defines the type of the processor). Once all bids are received, P0 computes the load assignments and then distributes load to the respective processors. We define the DLS-BL-CP mechanism in the following.

Definition 4.1 (DLS-BL-CP Mechanism): The DLS-BL-CP mechanism is defined by the following two functions:
(i) The allocation function given by the BUS-LINEAR algorithm.
(ii) The payment function given by

Qi(b, w̃) = Ci(b, w̃) + Bi(b, w̃)   (13)

where the function

Ci(b, w̃) = −Vi(α(b), w̃i)   (14)

is Pi's compensation and the function

Bi(b, w̃) = T−i(α(b−i), b−i) − T(α(b), (b−i, w̃i))   (15)

is Pi's bonus.

The function T−i(α(b−i), b−i) is the optimal total execution time (i.e., the makespan) when processor Pi is not used in the allocation. Thus, the bonus for a load-executing processor is equal to its contribution in reducing the total execution time. Compensation pays for the processor's time spent executing work. For each unit of time that a processor spends doing work, the mechanism gives one unit of payment. The amount that a processor is compensated is proportional to the amount of work assigned. The amount of work assigned depends on the processor's position in the load allocation order. A processor is assigned relatively greater amounts of work when it appears earlier in the order than when it appears later in
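Equations (13)-(15) can be sketched in Python as follows (illustrative only; the helper names are ours, and `bus_linear_cp` re-implements Algorithm 2.1 so the block is self-contained):

```python
def bus_linear_cp(w, z):
    # Algorithm 2.1: optimal fractions derived from eq. (5)
    k = [w[j] / (z + w[j + 1]) for j in range(len(w) - 1)]
    total, prod = 1.0, 1.0
    for kj in k:
        prod *= kj
        total += prod
    alpha = [1.0 / total]
    for kj in k:
        alpha.append(alpha[-1] * kj)
    return alpha

def makespan(alpha, w, z):
    # T(alpha) = max_i T_i(alpha), with T_i from eq. (1)
    t, comm = 0.0, 0.0
    for a, wi in zip(alpha, w):
        comm += a * z
        t = max(t, comm + a * wi)
    return t

def payment(i, bids, w_actual, z):
    """Q_i = C_i + B_i (eqs. (13)-(15)) for processor P_{i+1}."""
    alpha = bus_linear_cp(bids, z)
    compensation = alpha[i] * w_actual[i]            # C_i = -V_i = alpha_i * w~_i
    b_minus_i = bids[:i] + bids[i + 1:]
    t_minus_i = makespan(bus_linear_cp(b_minus_i, z), b_minus_i, z)
    exec_times = bids[:i] + [w_actual[i]] + bids[i + 1:]  # (b_-i, w~_i)
    bonus = t_minus_i - makespan(alpha, exec_times, z)    # eq. (15)
    return compensation + bonus
```

With two truthful processors, bids = actual times = (1, 1) and z = 1, processor P1 receives compensation 2/3 plus bonus 2/3, so Q1 = 4/3 and its utility equals the bonus, 2/3 ≥ 0.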


the order. The bonus, on the other hand, is independent of position, as any allocation sequence results in the optimal load allocation (Thm. 2.2). We now show that the DLS-BL-CP mechanism is strategyproof and that it satisfies the voluntary participation condition.

Theorem 4.1 (Strategyproofness): The DLS-BL-CP mechanism is strategyproof.
Proof: Assuming a vector of bids b, the utility of processor Pi is:

Ui(b, w̃) = Qi(b, w̃) + Vi(α(b), w̃i)
         = T−i(α(b−i), b−i) − T(α(b), (b−i, w̃i)) + αi(b)w̃i − αi(b)w̃i
         = T−i(α(b−i), b−i) − T(α(b), (b−i, w̃i)).   (16)

We consider two possible situations:
(i) w̃i = wi, i.e., processor Pi processes its assigned load using its full processing capability. If processor Pi bids its true value wi then its utility Ui^t is:

Ui^t = T−i(α(b−i), b−i) − T(α(b−i, wi), (b−i, w̃i)) = T−i(α(b−i), b−i) − Ti^t   (17)

If processor Pi bids lower (bi^l < wi) then its utility Ui^l is:

Ui^l = T−i(α(b−i), b−i) − T(α(b−i, bi^l), (b−i, w̃i)) = T−i(α(b−i), b−i) − Ti^l   (18)

We want to show that Ui^t ≥ Ui^l, which reduces to showing that Ti^l ≥ Ti^t. Because Ti^t is the minimum possible value for the processing time (from the optimality of the BUS-LINEAR algorithm), by bidding a lower value, processor Pi gets more load and the total execution time is increased; thus Ti^l ≥ Ti^t.
If processor Pi bids higher (bi^h > wi) then its utility Ui^h is:

Ui^h = T−i(α(b−i), b−i) − T(α(b−i, bi^h), (b−i, w̃i)) = T−i(α(b−i), b−i) − Ti^h   (19)

By bidding a higher value, processor Pi gets less load and thus more load will be assigned to the other processors. Due to the optimality of the allocation, the total execution time increases, i.e., Ti^h ≥ Ti^t, and thus we have Ui^t ≥ Ui^h.
(ii) w̃i > wi, i.e., processor Pi processes its assigned load at a slower rate, thus increasing the total execution time. A similar argument as in case (i) applies.

A desirable property of a mechanism is that the profit of a truthful agent is always non-negative. This means the agents can hope for a profit by participating in the mechanism.

Theorem 4.2 (Voluntary participation): The DLS-BL-CP mechanism satisfies the voluntary participation condition.
Proof: The utility of processor Pi when it bids its true value wi is

Ui^t = T−i(α(b−i), b−i) − T(α(b−i, wi), (b−i, w̃i)).   (20)

The total execution time T−i is obtained by using all the other processors except processor Pi. By allocating the same amount of load, we get a higher execution time T−i than in the case of using all the processors with processor Pi bidding its true value (from the optimality of the allocation). Thus Ui^t ≥ 0.
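The two theorems can be illustrated numerically. Below is a small sketch (our own helper names; the utility collapses to the bonus term, as in eq. (16)) showing that a processor working at its true speed maximizes its utility by bidding truthfully:

```python
def bus_linear_cp(w, z):
    # Algorithm 2.1: optimal fractions derived from eq. (5)
    k = [w[j] / (z + w[j + 1]) for j in range(len(w) - 1)]
    total, prod = 1.0, 1.0
    for kj in k:
        prod *= kj
        total += prod
    alpha = [1.0 / total]
    for kj in k:
        alpha.append(alpha[-1] * kj)
    return alpha

def makespan(alpha, w, z):
    # T(alpha) = max_i (z * sum_{j<=i} alpha_j + alpha_i * w_i)
    t, comm = 0.0, 0.0
    for a, wi in zip(alpha, w):
        comm += a * z
        t = max(t, comm + a * wi)
    return t

def utility(i, bids, w_true, z):
    """U_i for a processor that works at its true speed (w~_i = w_i).

    By eq. (16) this equals T_{-i} - T(alpha(b), (b_{-i}, w_i)).
    """
    alpha = bus_linear_cp(bids, z)
    b_minus_i = bids[:i] + bids[i + 1:]
    t_minus_i = makespan(bus_linear_cp(b_minus_i, z), b_minus_i, z)
    exec_times = bids[:i] + [w_true[i]] + bids[i + 1:]
    return t_minus_i - makespan(alpha, exec_times, z)
```

For true values w = (1, 2) and z = 0.5, P1's utility is highest when it bids 1 (about 1.43); underbidding 0.5 drops it to 1.25 and overbidding 2 drops it to about 1.11, while all three utilities stay non-negative, matching Theorems 4.1 and 4.2.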

We present the protocol that implements the DLS-BL-CP mechanism. The protocol assumes the existence of a payment infrastructure.

DLS-BL-CP Mechanism
Bidding: Load-executing processor Pi, i = 1, 2, . . . , m, reports its bid to P0.
Allocating Load: Control processor P0 computes the allocation using the BUS-LINEAR algorithm. It then transfers the ith assignment of size αi to Pi.
Processing Load: A processor Pi begins executing its assignment as soon as it receives the entire load. Processor Pi executes its assignment in time w̃i. The actual execution time w̃i may be higher than its true time (i.e., w̃i ≥ wi) and may not be equal to its bid time (i.e., w̃i ≠ bi). We cope with this situation by employing a strategyproof mechanism with verification. As we mentioned before, each processor Pi is outfitted with a meter that records the actual execution time w̃i. Control processor P0 learns the actual execution times w̃ from these meters.
Computing Payments: Control processor P0 computes Qi using equation (13). It submits Q1, Q2, . . . , Qm to the payment infrastructure, which dispenses Qi units of money to Pi. At this point, Pi evaluates its profit.

In the next section we examine the case in which the bus network does not have a designated control processor. The difficulty of designing a mechanism is greater as all participants are rational, including the load-originating processor.

4.2 Mechanism for bus networks without control processor

In this section we propose a mechanism that solves both the BUS-LINEAR-NCP-FE and BUS-LINEAR-NCP-NFE problems. We consider distributed systems interconnected with bus networks without control processors. The system comprises m processors, each of which is strategic, including the load-originating processor. Similar to above, we assume that the network is obedient and that the network and communication protocols are tamper-proof.
Additionally, we assume that the network has a reliable, atomic mechanism for broadcast communication. Since the transmission medium (i.e., the bus) is shared among all processors and the distance between any pair of processors is constant, we believe that this assumption is reasonable. If processor P1 has a front end, then it is the load-originating processor; otherwise, Pm is designated the load-originating processor. In a bus network without control processor the load-originating processor is also responsible for executing load. A third party, the referee, is involved with the mechanism. In the standard mechanism design model, the agents provide inputs to a central authority which faithfully executes the algorithm (a direct revelation mechanism). The agents are able to lie to the central authority, but they are unable to alter the algorithm. In our model, the agents themselves compute the mechanism output; thus, they will alter the algorithm if it is beneficial for them to do so. The main role


of the referee is to prosecute cheating processors. The referee is isolated and remains passive until signaled by a processor that presumes cheating. If sufficient proof is brought forth, the referee imposes fines and distributes the proceeds among the other processors. The referee is different from the control processor considered above. The control processor is a trusted central authority that possesses the processor parameters, computes the load allocation, and transmits the work units to the processors. In the DLS-BL-CP mechanism, the control processor is assumed to be obedient, i.e., it will not cheat or deviate from the prescribed protocol. The referee, on the other hand, is used to resolve conflicts and, if no conflicts arise, it does not possess any processor parameters.

Notation. We use the following notation in this section.
• The load-originating processor, Plo, is Plo = P1 if P1 has a front end; otherwise, Plo = Pm.
• Let SKβ be the private key of β. SIGβ(msg) is the secure digital signature of msg under SKβ. Let Sβ(msg) = msg||SIGβ(msg) be message msg concatenated with its digital signature under SKβ.

The description of DLS-BL-NCP (Divisible Load Scheduling - Bus Linear - No Control Processor) follows. This mechanism is designed for bus networks without control processor, both with and without front end. We assume the existence of a payment infrastructure and a public key infrastructure (PKI), to which the participants have access.

DLS-BL-NCP Mechanism
Initialization: Each participant has a public cryptographic key set. We do not dictate the specific cryptosystem, but it must minimally support digital signatures. The public key is registered under the participant's identity with the aforementioned PKI. The user prepares her data by dividing it into small, equal-sized blocks. Each block B has a unique identifier IB appended to it and then the aggregate is signed by the user, i.e., Suser(B||IB).
Bidding: An all-to-all broadcast occurs in which processor Pi, i = 1, 2, ..., m, communicates its digitally signed bid SPi(bi||Pi) to every Pj, j ≠ i. Commitments are not required, given our atomic broadcast assumption¹. If Pi does not wish to participate, it does not broadcast a bid and it receives a utility of 0. Without loss of generality, we assume that Pi participates. Each Pj (j = 1, ..., m) verifies the authenticity and integrity of SPi(bi||Pi). If the message fails verification, it is discarded. If Pj receives multiple authenticated messages from Pi, it signals the referee, providing the messages as evidence of cheating. If in fact cheating has occurred, the referee fines Pi an amount F. If the concerns are unfounded, Pj is penalized F. Fine F must be large to dissuade cheating and to induce finking. Furthermore, F must be larger than the sum of the compensations, i.e., F ≫ Σ_{j=1}^{m} αj wj. All parties are aware of the magnitude of F. Let Pk be the party that is fined. The referee rewards F/(m − 1) to each Pi (i = 1, 2, ..., m, i ≠ k), thus terminating the protocol.

¹Commitments are required when atomic broadcast facilities are not available. In that case, a sender transmits a message to each recipient separately, and may transmit different messages even though broadcasting by definition means sending the same message to all recipients. Before broadcasting, the sender publicizes a commitment computed for the message; each recipient checks the commitment to ensure that it has received the proper message.

Allocating Load: Every processor computes the allocation (using Algorithm 2.1 for the BUS-LINEAR-NCP-FE problem or Algorithm 2.2 for the BUS-LINEAR-NCP-NFE problem), obtaining the load allocation α(b) = (α1(b), α2(b), ..., αm(b)). Processor Plo transmits α̃i units of load to Pi (i ≠ lo). If α̃i ≠ αi (i.e., the assignment of Pi is incorrect), Pi signals the referee. Processors Plo and Pi submit their vectors of bids, b, to the referee, who verifies the authenticity of the bids and computes the allocation α(b). Both processors must submit their vectors of bids, as either processor may alter the bids in its own vector b. Let the cheater Pc be either Plo or Pi. Pc can alter the c-th component of b, resulting in (SP1(b1||P1), SP2(b2||P2), ..., S′Pc(b′c||Pc), ..., SPm(bm||Pm)). If the vector b submitted by Pc is inconsistent or fails authentication, Pc is fined. It is possible that both Plo and Pi are penalized. If Pi claims that α̃i > αi, the referee attempts to substantiate the claim by comparing the blocks that Pi possesses with the original data set. If the claim is true, Plo is fined. If the claim is unfounded, Pi is fined. The case in which α̃i < αi is more difficult to resolve, primarily due to the absence of credible evidence. There are three cases in which α̃i < αi may occur: (i) Plo communicated too few load units, (ii) the load unit integrity check failed, or (iii) Pi is lying.
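The fining rule in the Bidding phase (the cheater pays F; each other processor receives F/(m − 1)) can be sketched as a simple transfer computation; the function name and return format below are illustrative, not part of the mechanism's specification.

```python
def settle_fine(processors, cheater, F):
    # The cheater pays F; each of the remaining m - 1 processors
    # receives F / (m - 1), and the protocol then terminates.
    m = len(processors)
    reward = F / (m - 1)
    return {p: (-F if p == cheater else reward) for p in processors}

transfers = settle_fine(["P1", "P2", "P3", "P4"], cheater="P2", F=9.0)
# P2 pays 9.0 and P1, P3, P4 each receive 3.0; the transfers sum to zero.
```

Because the proceeds are redistributed in full, the fine is budget-balanced among the processors, which is what makes finking individually profitable.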
In all cases, the referee acts as an intermediary, receiving load units from Plo, verifying their integrity, and transmitting them to Pi. If Plo refuses to transmit the correct number of load units or the load unit integrity check fails, Plo is fined. If Pi nevertheless claims that it did not receive enough load units, Pi is fined. In all situations where fines are levied, the protocol is immediately terminated. The total fine collected is xF, where x is the number of penalized processors. The referee distributes αj w̃j to each of the i − 1 processors Pj that have commenced work; the procedure for determining w̃j is detailed in the following stages. The remainder is evenly distributed among the m − x compliant processors.

Processing Load: The processors execute their assignments. Processor Pi may process its load at a slower rate, which means a unit load is processed in time w̃i, where w̃i ≥ wi. As we did for the above centralized mechanism, we cope with this situation by employing a strategyproof mechanism with verification. As we stated before, each processor Pi is augmented with a tamper-proof meter that records the actual execution time w̃i. We further assume that the


value is verifiable, as it is recorded with the referee's digital signature, i.e., Sreferee(w̃i). Each processor Pi broadcasts Sreferee(w̃i). If processor Pi neglects to broadcast its value or broadcasts multiple, inconsistent values, then Pi is not issued payment.

Computing Payments: Each processor Pi, i = 1, 2, ..., m, computes Qj(b, w̃), j = 1, 2, ..., m, using (13). We denote by Q = (Q1, Q2, ..., Qm) the vector of payments, where Qi is Pi's payment. Processor Pi submits SPi(Pi||Q) to the referee. If there are multiple contradictory messages from Pi, the referee fines it. The referee verifies all vectors Q for equality. If there is inequality among the vectors, the bids are provided to the referee, which computes the payments. The referee fines F to each of the x processors who incorrectly computed the payments or who provided contradictory messages. The referee distributes xF/(m − x) to each of the m − x correct processors. The referee forwards Q to the payment infrastructure. The bill is presented to the user, who remits payment. More sophisticated methods such as quorums [20], [28] cannot be used to resolve payments, as these methods require a minimum number of obedient players and, in our mechanism, all the processors are strategic. This completes the mechanism description.

We now examine the penalties associated with the mechanism. There are three stages of the mechanism in which we inspect for algorithm deviation. The offenses are: (i) multiple, inconsistent bids broadcast in the Bidding phase; (ii) incorrect load assignments in the Allocating Load phase; (iii) incorrect payment computation in the Computing Payments phase; (iv) manipulated bid vectors transmitted to the referee; and (v) unsubstantiated claims. All of the above result in penalizing the cheating processors. The penalties are engineered so that processors that have already performed computation are compensated. We now examine the properties and the complexity of the DLS-BL-NCP mechanism.
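The payment-resolution rule of the Computing Payments phase can be sketched as follows, assuming the referee has already recomputed the correct vector Q from the authenticated bids; the helper name and data layout are illustrative only.

```python
def resolve_payments(submitted, correct_Q, F):
    # Processors whose submitted vector disagrees with the referee's Q
    # are fined F each; the pool xF is split among the m - x correct ones.
    wrong = [p for p, q in submitted.items() if q != correct_Q]
    x, m = len(wrong), len(submitted)
    bonus = (x * F) / (m - x) if 0 < x < m else 0.0
    return {p: (-F if p in wrong else bonus) for p in submitted}

Q = [1.0, 2.0, 3.0]
reports = {"P1": Q, "P2": Q, "P3": [9.9, 2.0, 3.0]}  # P3 miscomputes Q
fines = resolve_payments(reports, Q, F=10.0)
# P3 is fined 10.0; P1 and P2 each receive 10.0 / 2 = 5.0.
```

As in the Bidding phase, redistributing the collected fines is what gives compliant processors a strict incentive to compute and report the payment vector correctly.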
We remark that the presented mechanism remains central in nature, but it is computed by the strategic load-originating processor, Plo.

Lemma 4.1 (Utility maximization): A processor maximizes its utility by following DLS-BL-NCP.

Proof: A load-originating processor can cheat by assigning larger-than-optimal work loads to the load-executing processors. A load-executing processor can cheat by misreporting its capacity or by miscomputing payments. Incentives are provided for processors to monitor one another. If a load-originating processor mis-allocates load to a load-executing processor, the load-executing processor will report the behavior, as its share of the fine is greater than the utility it would have obtained otherwise. If a load-executing processor attempts to falsely report an otherwise obedient load-originating processor, the fine levied against the false accuser is so large that it receives negative utility. If a load-originating processor misreports its capacity, it receives a utility smaller than what it could possibly receive, due to the strategyproofness of the underlying DLS-BL-CP mechanism (Theorem 4.1). If the processor miscomputes the payments,


the attempt will be detected by the referee, who issues a fine.

Lemma 4.2 (Fines): A processor receives a fine only if it has deviated from DLS-BL-NCP.

Proof: Processor Pi is fined either for not complying with the protocol or because another processor Pj produces contradictory messages signed by Pi. In the first case, Pi clearly deviates from DLS-BL-NCP. In the second case, Pj sends the messages either by successfully forging signatures or by possessing the private key of Pi. We assume that forging signatures is impossible. Processor Pj obtains the private key either because Pi shared it or by stealing it from Pi. It is a violation of the mechanism for a second party to possess a private key. Thus, Pi is fined for the protocol deviation.

Theorem 4.3 (Compliance): The processors will comply with the mechanism protocol.

Proof: By Lemma 4.2, a processor is fined only when it deviates, so it is more profitable for a processor to report deviations than to deviate itself. Therefore, all deviations will be reported. Furthermore, Lemma 4.1 shows that a processor maximizes its utility by complying with the mechanism specification. Therefore, we can conclude that the processors will faithfully execute the prescribed mechanism.

The second property we investigate is strategyproofness. If a mechanism is strategyproof, an agent maximizes its utility by being truthful.

Theorem 4.4 (Strategyproofness): DLS-BL-NCP is strategyproof.

Proof: The allocation function α and the payment function Q are identical to the ones used in DLS-BL-CP. By Theorem 4.1, we know that DLS-BL-CP is a strategyproof mechanism. Theorem 4.3 ensures that the processors will not deviate from the mechanism. Therefore, DLS-BL-NCP is strategyproof.

We now investigate voluntary participation. A truthful agent never incurs a loss when partaking in a mechanism that satisfies the voluntary participation condition.

Theorem 4.5 (Voluntary participation): DLS-BL-NCP satisfies the voluntary participation condition.
Proof: DLS-BL-NCP uses the allocation function α and the payment function Q of DLS-BL-CP. By Theorem 4.2, the DLS-BL-CP mechanism satisfies the voluntary participation condition. Theorem 4.3 states that the processors will not deviate from the algorithm. Therefore, DLS-BL-NCP satisfies voluntary participation.

We now examine the communication complexity of DLS-BL-NCP. We define the communication cost as the product of the number of messages transmitted and the message size. We do not include the communication necessary for transferring the load units.

Theorem 4.6 (Communication Complexity): The communication complexity of the DLS-BL-NCP protocol, excluding load unit distribution, for m processors is Θ(m²).

Proof: The communication cost is dominated by the Computing Payments phase. Each of the m processors transmits a vector of size m to the referee. Therefore, the complexity is Θ(m²).


Fig. 4. Makespan when P1 cheats (fast system).

Fig. 5. Makespan when P1 cheats (slow system).

5 EXPERIMENTAL RESULTS

In this section we study by simulation the proposed strategyproof scheduling mechanisms. We first study the DLS-BL-CP mechanism. We consider two distributed systems, each comprising sixteen processors, P1, P2, ..., P16, and one control processor, P0. The first system, the 'fast' system, has a fast-processing P1, with w1 = 0.1. The second system, the 'slow' system, has a slow-processing P1, with w1 = 0.7. The times to process a unit load, wi, i = 1, 2, ..., 16, are presented in Table 1.

TABLE 1
Times to process a unit load.

      w1   w2   w3–w5  w6–w10  w11–w16
fast  0.1  0.1  0.2    0.5     1.0
slow  0.7  0.1  0.2    0.5     1.0

For both systems we assume that only P1 cheats, by reporting values different from its true processing time and by processing the load at a rate different from its true rate. The time to communicate a unit load from P0 to any other processor is z = 0.01 for both systems. The low communication latency ensures that the system is computationally bound. If z were equal in magnitude to or larger than the time to process a unit load at the fast processors, the system would become communication bound, resulting in fewer processors performing work and negligible bonuses.

For each system, we examine eight cases: (1) w1 = b1 = w̃1 (i.e., P1 bids truthfully and processes the load as reported); (2) w̃1 > w1 = b1 (i.e., P1 bids truthfully, but processes the load at a slower rate); (3) w1 < b1 = w̃1 (i.e., P1 bids a rate slower than its true rate, but processes the load at the reported rate); (4) w1 = w̃1 < b1 (i.e., P1 bids a rate slower than its true rate, but processes the load at its true rate); (5) w1 < w̃1 < b1 (i.e., P1 processes its load slower than its true rate and bids a rate slower than its execution rate); (6) w1 < b1 < w̃1 (i.e., P1 bids a rate slower than its true rate and processes the load slower than it bid); (7) b1 < w1 = w̃1 (i.e., P1 bids a rate faster than its true rate, but processes the load at its true rate); (8) b1 < w1 < w̃1 (i.e., P1 bids a rate faster than its true rate and processes the load slower than its true rate). The bid b1 and the execution value w̃1 for each case are presented in Table 2 (fast system) and Table 3 (slow system).

TABLE 2
Bids and execution values (fast system).

Case  1    2    3    4    5    6    7     8
b1    0.1  0.1  0.3  0.3  0.3  0.3  0.05  0.05
w̃1   0.1  0.3  0.3  0.1  0.2  0.4  0.1   0.2

TABLE 3
Bids and execution values (slow system).

Case  1    2    3    4    5    6    7    8
b1    0.7  0.7  0.9  0.9  0.9  0.8  0.5  0.5
w̃1   0.7  0.9  0.9  0.7  0.8  0.9  0.7  0.8

Figures 4 and 5 show the makespan (T) for the eight cases and the two types of systems, fast and slow. Notice that case (1) (w1 = w̃1 = b1) results in the minimum makespan, while all other cases result in a larger makespan. When w̃1 ≤ b1 (cases (3), (4), and (5)), the makespan is increased by a small amount (0.6% for the slow system and 11% for the fast system), as the load allocated to P1 is reduced and, correspondingly, the load allocated to the other processors is increased. The effect of cheating is dispersed throughout the remaining processors; with fewer processors, the effect of cheating would be greater. In the remaining cases (b1 < w̃1), the system performance degrades dramatically, as P1 is overloaded and the other processors are underutilized. In these cases, P1 is the processor slowing down the entire system. The increase in makespan is large (between 57% and 209%) and is due to the impact of w̃1.

Comparing the fast system with the slow system, we notice that the performance degradation is greater for the fast system, even though the relative rate change is similar to or smaller than that for the slow system. This is because the BUS-LINEAR algorithm allocates more work to faster processors than to slower processors.

We depict P1's utility and payment in each of the eight cases for the fast and slow systems in Figure 6 and Figure 7, respectively. As we expected, case (1) (w1 = w̃1 = b1) maximizes P1's utility. In all the other cases P1's utility is lessened. When b1 < w̃1 (cases (2), (6), (7), and (8)), the utility is negative due to the impact of w̃1 on the makespan. In these cases, B1 < 0, as T−1(α(b−1), b−1) < T(α(b), (b−1, w̃1)). In the remaining cases, B1, and thus U1, is reduced, as α1 is smaller than in the optimal case. As anticipated, the utility of P1 is much greater for the fast system than for the slow system, due to the larger allocations to faster processors.

Fig. 6. Payment and Utility of P1 (fast system).

Fig. 7. Payment and Utility of P1 (slow system).

Figures 8 and 9 show the utility of all processors in all cases for the considered systems. When b1 > w1 (cases (3), (4), (5), and (6)), αj (for j = 2, 3, ..., 16) is increased, resulting in greater Uj. For example, U2 is increased by 36% in the fast system and by 1% in the slow system. When b1 < w1 (cases (7) and (8)), the reduced αj results in decreased Uj. For example, U2 is decreased by 30% in the fast system and by 3% in the slow system. In the remaining cases, αj and Uj are unchanged, as b1 = w1. The impact of b1 ≠ w̃1 is felt unevenly among the processors: the effect of cheating diminishes as the processor index increases. For example, in the fast system, for cases (3), (4), (5), and (6), the increase in U2 is 36% while the increase in U16 is 31%. This behavior is due to the allocation computed by the scheduling algorithm.

Fig. 8. Utility of each processor when P1 cheats (fast system).

Fig. 9. Utility of each processor when P1 cheats (slow system).

The effect of cheating on the makespan is dependent on the

order in which the loads are assigned to processors. Figure 10 shows the ratio of the makespan to the optimal makespan as a function of the cheating processor's position in the assignment order. For example, position six corresponds to the case in which the cheating processor receives its load after five processors have received their loads. The cheating processor is characterized by w1 = 0.1, b1 = 0.05, and w̃1 = 0.2 (case 8 of Table 2). The figure clearly shows that the effect of cheating is greatest when the cheating processor is first in the assignment order and that it diminishes as the processor is moved later in the sequence. We do not present results for the other mechanism, DLS-BL-NCP, as it exhibits the same performance as DLS-BL-CP when a processor misreports its processing capacity. In the other case, when the load-originating processor, Plo, mis-allocates load to a processor, the protocol terminates and a large fine is imposed. The magnitude of the fine ensures that Plo experiences a nonpositive utility.
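The cheating experiment can be reproduced in outline. The sketch below uses the textbook single-installment DLT recursion for a bus network with front ends, in which consecutive processors finish simultaneously when αi·bi = αi+1·(z + bi+1); this is a standard DLT result and may differ in detail from Algorithm 2.1 used in the paper, so the numbers are indicative only. The fast-system rates come from Table 1 and the cheating values from case 8 of Table 2.

```python
def allocation(bids, z):
    # Solve alpha[i]*b[i] = alpha[i+1]*(z + b[i+1]) with sum(alpha) = 1.
    k = [1.0]
    for i in range(1, len(bids)):
        k.append(k[-1] * bids[i - 1] / (z + bids[i]))
    total = sum(k)
    return [v / total for v in k]

def makespan(alpha, w_actual, z):
    # Chunks are sent sequentially over the bus; P_i computes on arrival.
    finish = comm = 0.0
    for a, w in zip(alpha, w_actual):
        comm += z * a
        finish = max(finish, comm + a * w)
    return finish

z = 0.01
w = [0.1, 0.1] + [0.2] * 3 + [0.5] * 5 + [1.0] * 6  # Table 1, fast system
truthful = makespan(allocation(w, z), w, z)
# Case 8: P1 bids b1 = 0.05 but actually executes at rate 0.2.
cheating = makespan(allocation([0.05] + w[1:], z), [0.2] + w[1:], z)
assert cheating > truthful  # overbidding its speed inflates the makespan
```

Under these assumptions the truthful allocation equalizes all finish times, while the case-8 bid over-allocates to P1 and overloads it, mirroring the degradation pattern reported for Figure 4.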

Fig. 10. The ratio of makespan to optimal makespan as a function of P1's position.

6 CONCLUSION

In this work we proposed mechanisms for each class of bus-networked systems. The first mechanism we presented is for a bus network with a control processor. The control processor's role is strictly to distribute load to the load-executing processors, as it does not have any load-executing capabilities. We designed a direct revelation mechanism which is executed by the control processor. The load-executing processors report their bids to the control processor, which then computes the load allocation. We showed that this mechanism is strategyproof and satisfies the voluntary participation condition. In the other mechanism, all processors, including the load-originating processor, are strategic. In this situation, the processors compute the mechanism and thus can exert control over it. We model this situation using the autonomous node model. By using digital signatures, we were able to design a protocol in which cheating is detected. We showed that this mechanism and its corresponding protocol are strategyproof and satisfy the voluntary participation condition. In the future we plan to develop a cohesive theory that considers the use of incentives in divisible load scheduling.

ACKNOWLEDGMENT This paper is a revised and extended version of [17] presented at the 4th International Symposium on Parallel and Distributed Computing (ISPDC 2005) and [10] presented at the 20th IEEE International Parallel and Distributed Processing Symposium, 8th Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2006). The authors wish to express their thanks to the editor and the anonymous referees for their helpful and constructive suggestions, which considerably improved the quality of the paper. This research was supported, in part, by NSF grant DGE-0654014.

REFERENCES

[1] A. Archer and E. Tardos. Truthful mechanisms for one-parameter agents. In Proc. of the 42nd IEEE Symp. on Foundations of Computer Science, pages 482–491, Oct. 2001.
[2] A. Archer and E. Tardos. Frugal path mechanisms. In Proc. of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms, pages 991–999, Jan. 2002.
[3] O. Beaumont, H. Casanova, A. Legrand, Y. Robert, and Y. Yang. Scheduling divisible loads on star and tree networks: Results and open problems. IEEE Trans. Parallel and Distributed Syst., 16(3):207–218, Mar. 2005.
[4] O. Beaumont, L. Marchal, and Y. Robert. Scheduling divisible loads with return messages on heterogeneous master-worker platforms. In Proc. of the 12th International Conference on High Performance Computing, pages 123–132. Springer-Verlag, 2005.
[5] O. Beaumont, L. Marchal, V. Rehn, and Y. Robert. FIFO scheduling of divisible loads with return messages under the one-port model. In Proc. of the 20th IEEE International Parallel and Distributed Processing Symp., Apr. 2006.
[6] V. Bharadwaj and G. Barlas. Access time minimization for distributed multimedia applications. Multimedia Tools and Applications, 12(2/3):235–256, Nov. 2000.
[7] V. Bharadwaj, D. Ghose, V. Mani, and T. G. Robertazzi. Scheduling Divisible Loads in Parallel and Distributed Systems. IEEE Computer Society Press, Los Alamitos, CA, USA, 1996.
[8] V. Bharadwaj, D. Ghose, and T. G. Robertazzi. Divisible load theory: A new paradigm for load scheduling in distributed systems. Cluster Computing, 6(1):7–17, Jan. 2003.
[9] J. Blazewicz, M. Drozdowski, and M. Markiewicz. Divisible task scheduling - concept and verification. Parallel Computing, 25(1):87–98, Jan. 1999.
[10] T. E. Carroll and D. Grosu. A strategyproof mechanism for scheduling divisible loads in bus networks without control processors. In Proc. of the 20th IEEE International Parallel and Distributed Processing Symposium, 8th Workshop on Advances in Parallel and Distributed Computational Models. IEEE Computer Society, Apr. 2006.
[11] S. Chan, V. Bharadwaj, and D. Ghose. Large matrix-vector products on distributed bus networks with communication delays using the divisible load paradigm: Performance and simulation. Mathematics and Computers in Simulation, 58:71–92, 2001.
[12] E. Clarke. Multipart pricing of public goods. Public Choice, 8:17–33, 1971.
[13] J. Feigenbaum, C. Papadimitriou, R. Sami, and S. Shenker. A BGP-based mechanism for lowest-cost routing. In Proc. of the 21st ACM Symp. on Principles of Distributed Computing, pages 173–182, July 2002.
[14] J. Feigenbaum, C. H. Papadimitriou, and S. Shenker. Sharing the cost of multicast transmissions. Journal of Computer and System Sciences, 63(1):21–41, Aug. 2001.
[15] J. Feigenbaum and S. Shenker. Distributed algorithmic mechanism design: Recent results and future directions. In Proc. of the 6th ACM Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications, pages 1–13, Sept. 2002.
[16] D. Ghose, H. J. Kim, and T. H. Kim. Adaptive divisible load scheduling strategies for workstation clusters with unknown network resources. IEEE Trans. Parallel and Distributed Syst., 16(10):897–907, Oct. 2005.
[17] D. Grosu and T. E. Carroll. A strategyproof mechanism for scheduling divisible loads in distributed systems. In Proc. of the 4th International Symposium on Parallel and Distributed Computing, pages 83–90. IEEE Computer Society, July 2005.
[18] D. Grosu and A. T. Chronopoulos. Algorithmic mechanism design for load balancing in distributed systems. IEEE Trans. Systems, Man and Cybernetics - Part B: Cybernetics, 34(1):77–84, Feb. 2004.
[19] T. Groves. Incentives in teams. Econometrica, 41(4):617–631, 1973.
[20] L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, July 1982.
[21] X. Li, V. Bharadwaj, and C. Ko. Distributed image processing on a network of workstations. Intl. Journal of Computers and Their Applications, 25(2):1–10, 2003.
[22] J. C. Mitchell and V. Teague. Autonomous nodes and distributed mechanisms. In Proc. of the Mext-NSF-JSPS International Symp. on Software Security - Theories and Systems, pages 58–83, Nov. 2003.
[23] C. Ng, D. Parkes, and M. Seltzer. Strategyproof computing: Systems infrastructures for self-interested parties. In Proc. of the 1st Workshop on Economics of Peer-to-Peer Systems, June 2003.
[24] C. Ng, D. Parkes, and M. Seltzer. Virtual worlds: Fast and strategyproof auctions for dynamic resource allocation. In Proc. of the ACM Conference on Electronic Commerce, pages 238–239, June 2003.
[25] N. Nisan, S. London, O. Regev, and N. Camiel. Globally distributed computation over the Internet - The POPCORN project. In Proc. of the 18th IEEE International Conference on Distributed Computing Systems, pages 592–601, May 1998.
[26] N. Nisan and A. Ronen. Algorithmic mechanism design. Games and Economic Behaviour, 35(1/2):166–196, Apr. 2001.
[27] M. Osborne and A. Rubinstein. A Course in Game Theory. MIT Press, Cambridge, Mass., 1994.
[28] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. Journal of the ACM, 27(2):228–234, Apr. 1980.
[29] T. G. Robertazzi. Ten reasons to use divisible load theory. IEEE Computer, 36(5):63–68, May 2003.
[30] W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance, 16(1):8–37, Mar. 1961.
[31] W. E. Walsh, M. P. Wellman, P. R. Wurman, and J. K. MacKie-Mason. Some economics of market-based distributed scheduling. In Proc. of the 18th IEEE International Conference on Distributed Computing Systems, pages 612–621, May 1998.
[32] R. Wolski, J. S. Plank, T. Bryan, and J. Brevik. G-commerce: Market formulations controlling resource allocation on the computational grid. In Proc. of the 15th IEEE International Parallel and Distributed Processing Symposium, Apr. 2001.
[33] Y. Yang, K. van der Raadt, and H. Casanova. Multiround algorithms for scheduling divisible loads. IEEE Trans. Parallel and Distributed Syst., 16(11):1092–1102, Nov. 2005.
[34] D. Yu and T. G. Robertazzi. Divisible load scheduling for grid computing. In Proc. of the 15th International Conference on Parallel and Distributed Computing and Systems, Nov. 2003.

Thomas E. Carroll received his Bachelor of Science in Chemistry, Bachelor of Science in Computer Science, and Master of Science in Computer Science degrees from Wayne State University, Detroit, Michigan, USA in 2001 and 2006, respectively. Currently, he is a Ph.D. candidate in the Department of Computer Science at Wayne State University. His research focus is incentive-centered design for resource allocation in distributed computer systems. He is a student member of the IEEE.


Daniel Grosu received his Diploma in Engineering (Automatic Control and Industrial Informatics) from the Technical University of Iasi, Romania in 1994 and the M.Sc. and Ph.D. degrees in Computer Science from The University of Texas at San Antonio in 2002 and 2003, respectively. Currently, he is an assistant professor in the Department of Computer Science at Wayne State University, Detroit. His research interests include distributed systems and algorithms, resource allocation, computer security and topics at the border of computer science, game theory and economics. He has published more than 50 peer-reviewed papers in the above areas. He has served on the program and steering committees of several international meetings in parallel and distributed computing. He is a member of the IEEE and the ACM.