Deterministic Truthful Approximation Mechanisms for Scheduling Related Machines∗

Vincenzo Auletta

Roberto De Prisco

Paolo Penna

Pino Persiano

Abstract. We consider the problem of scheduling jobs on parallel related machines owned by selfish agents. We provide deterministic polynomial-time (2 + ǫ)-approximation algorithms, together with suitable payment functions, that yield truthful mechanisms for several NP-hard restrictions of this problem. To the best of our knowledge, this is the first non-trivial polynomial-time deterministic truthful mechanism for this NP-hard problem. Our result also yields a family of deterministic polynomial-time truthful (4 + ǫ)-approximation mechanisms for any fixed number of machines. The only previously known mechanism for this problem (proposed by Archer and Tardos [FOCS 2001]) is randomized and truthful under a weaker notion of truthfulness. To obtain our results we introduce a technique that transforms the PTAS by Graham into a deterministic truthful mechanism.

1 Introduction

The Internet is a complex distributed system in which a multitude of heterogeneous entities (e.g., providers, autonomous systems, universities, private companies, etc.) offer, use, and even compete with each other for resources. Resource allocation is a fundamental issue for the efficiency of such a complex system, and several efficient distributed protocols have been designed for it. The underlying assumption is that the entities running the protocol are trustworthy, that is, they behave as prescribed by the protocol. This assumption is unrealistic in some settings, as the entities owning the resources might try to manipulate the system in order to gain an advantage by reporting false information. For example, a router of an autonomous system can report false link status in order to redirect traffic through another autonomous system. At a different level, Internet users are free to change the parameters of their TCP/IP stack to get better throughput, and newly emerging Internet applications ignore TCP altogether and use UDP (or other ad-hoc transport protocols) so as not to be subject to the TCP congestion control mechanism that throttles the bandwidth when the Internet is jammed. With false information, even the most efficient protocol may lead to unreasonable solutions if it is not designed to cope with the selfish behavior of the single entities. Thus, new models are needed to analyze and redesign the protocols used to construct complex systems based on a multitude of selfish agents.

The field of mechanism design provides an elegant theory to deal with this kind of problem. The main idea of this theory is to pay the agents in order to convince them to perform strategies that help the system to optimize a global objective function. A mechanism M = (A, P) is a combination

∗Dipartimento di Informatica ed Applicazioni “R.M. Capocelli”, Università di Salerno, via S. Allende 2, I-84081 Baronissi (SA), Italy. E-mail: {auletta, robdep, penna, giuper}@dia.unisa.it.


of two elements: an algorithm A computing a solution and a payment function P specifying the amount of “money” the mechanism should pay to each entity. Informally speaking, each agent i has a valuation function that associates to each solution X some value vi(X), and the mechanism pays i an amount Pi(X, ri) based on the solution X and on the reported information ri. A truthful mechanism is a mechanism whose payments guarantee that, when X = X(ri) is the solution computed by the mechanism, ui := Pi(X, ri) + vi(X) is maximized for ri equal to the true information (see Sect. 2 for a formal definition). In other words, agent i has no incentive to lie.

Recently, mechanism design has been applied to several optimization problems arising in computer science, networking, and algorithmic questions related to the Internet (see [11] for a survey). Several basic problems (e.g., shortest path, minimum spanning tree, etc.) have been (re)considered in the context of selfish agents [9, 12]: a network graph is composed of a set of weighted edges, each of them owned by an agent that privately knows the corresponding weight; this weight measures the cost incurred by the agent when it is chosen in the solution (e.g., the cost of forwarding traffic over a link).

In the seminal papers by Nisan and Ronen [9, 10] (see also [12]) it was first pointed out that classical results in mechanism design theory, originating from microeconomics and game theory, do not completely fit a context where computational issues play a crucial role [10]. Moreover, optimization problems arise (job scheduling being one of them) for which classical results do not apply and new techniques are needed [9, 12]. The importance of mechanism design for job scheduling problems is twofold. On one hand, it is the first problem for which new techniques for designing truthful mechanisms have been introduced [2, 9, 12].
On the other hand, it is a basic problem which models important features of several allocation and routing problems in communication networks. The main purpose of this paper is to provide polynomial-time approximation truthful mechanisms for the problem of scheduling jobs on parallel related machines (Q||Cmax). The existence of efficient truthful mechanisms provides a better understanding of the loss of performance due to the interplay between the lack of cooperation (i.e., selfish agents) and the limited computational resources (i.e., the approximability of NP-hard problems). As we discuss below, these two issues have previously been addressed together only partially.

1.1 Previous Work

Truthful VCG mechanisms. The theory of mechanism design dates back to the seminal papers by Vickrey [13], Clarke [3], and Groves [6]. Their celebrated VCG mechanism is still the prominent technique for deriving truthful mechanisms for many problems (e.g., shortest path, minimum spanning tree, etc.). In particular, when applied to combinatorial optimization problems (see, e.g., [9, 12]), VCG mechanisms guarantee truthfulness under the hypothesis that the optimization function is utilitarian1 and the mechanism is able to compute the optimum. On the other hand, there is no restriction on the agents’ valuation functions.

1 A maximization problem is utilitarian if the optimization function can be written as the sum of the agents’ valuation functions.

Feasible mechanism design. Since for many optimization problems, like scheduling, it is not possible to compute the optimum in polynomial time unless P = NP, [10] focuses on the truthfulness of so-called VCG-based mechanisms, that is, mechanisms obtained by replacing the


exact algorithm with an approximation one. The authors show that sub-optimal solutions do not work because a false declaration may improve the computed solution, and therefore the utility of the agent.

Non-utilitarian problems and scheduling. The second limit of VCG-based mechanisms is that they only apply to utilitarian problems. Task scheduling is not utilitarian, since we aim at minimizing the maximum, over all machines, of their completion times. Nisan and Ronen [9, 12] first considered the unrelated-machines case and provided an n-approximation truthful mechanism for it (n is the number of machines). Rather surprisingly, this mechanism is optimal for the case n = 2. For the case n > 2, [9, 12] prove that a wide class of “natural” mechanisms cannot achieve a factor better than n if we require truthfulness. Finally, for n = 2, [9, 12] give a randomized 7/4-approximation mechanism.

A simpler variant of task scheduling has been tackled in [2]: the related-machines case (in short, Q||Cmax). This problem version makes it possible to express the agent valuations as the product of the work assigned to the corresponding machine times a parameter (namely, the inverse of the machine speed). The fundamental step made in [2] is to characterize those algorithms which can be turned into a truthful mechanism. Their beautiful result brings us back to “pure algorithmic problems”, as all we need is a good algorithm for the original problem which also satisfies an additional monotonicity requirement: increasing the speed of exactly one machine does not make the algorithm decrease the work assigned to that machine (see Sect. 2 for a formal definition). Indeed, they prove that:

• If A is monotone, then there exists a payment function P^A such that M = (A, P^A) is truthful.

• A mechanism M = (A, P) is truthful if and only if A is monotone.

Moreover, they provide a closed formula for P^A, which depends on A only (see Theorem 28 below).
The authors then showed that the algorithm computing the (lexicographically minimal) optimal solution is monotone, thus implying that this problem admits a truthful mechanism computing the optimum. As this mechanism is not feasible unless P = NP, they also provide a randomized 3-approximation mechanism for this task scheduling version. As we discuss in the next paragraph, in this case the mechanism is truthful in expectation, a weaker notion of truthfulness.

Weaker notions of truthfulness. There is a significant difference between the definition of truthfulness used in [9, 12] and the one used in [2]. Indeed, the randomized 7/4-approximation algorithm in [9, 12] yields a truthful dominant strategy for any possible random choice of the algorithm. A randomized mechanism M can be seen as a probability distribution over deterministic mechanisms: an element x is selected randomly and the corresponding mechanism Mx is used. So, the mechanism in [9, 12] is truthful for every fixed x. In [2], instead, the notion of utility is replaced by the expected utility: even though the expected utility is maximized when telling the truth, for some x there might exist a better (untruthful!) strategy. This idea is pushed further in [1], where one-parameter agents are considered for the problem of combinatorial auctions. In that work, truthfulness is achieved w.r.t. expected utility and with high probability; that is, the probability that an untruthful declaration improves the agent’s utility is arbitrarily small.


1.2 Our contribution

It is natural to ask whether some problems require some relaxation of the definition of truthfulness in order to achieve polynomial-time approximation mechanisms; in other words, whether it is necessary to make milder assumptions on the “selfishness” of the agents in order to attain good solutions in polynomial time. In this paper we investigate the existence of truthful polynomial-time approximation mechanisms for Q||Cmax, while maintaining the strongest definition of truthfulness: truth-telling is a dominant strategy over all possible strategies of an agent.

We first show that, for any fixed number of machines, Q||Cmax admits a deterministic truthful (2 + ǫ)-approximation mechanism if there exists a monotone allocation algorithm Gc whose cost is within an additive term O(tmax/s1) of the cost of Greedy, where tmax is the largest job weight and s1 is the smallest machine speed (see Sect. 4). Our result is a modification of the classical PTAS [5] based on the computation of an optimal schedule on the h largest jobs, with the remaining smaller jobs scheduled by the Greedy algorithm. Notice that this PTAS cannot be used to construct a truthful mechanism, because Greedy is not monotone and the allocation produced by the combination of the two algorithms (the optimal and the greedy one) is also not monotone. Our technical contribution here is the analysis of a new algorithm, obtained by combining the optimal algorithm and Gc, that preserves monotonicity and whose cost is within a factor of 2 of the cost of the PTAS.

Armed with this result, we turn our attention to the existence of such a monotone algorithm Gc. We provide a new greedy algorithm which achieves monotonicity and the above-mentioned bound w.r.t. Greedy for the following versions of the problem (see Sect. 5):

• the speeds are integer and the largest speed is bounded from above by a constant;

• the speeds are divisible, that is, they belong to a set C = {c1, c2, . . . , cp, . . .} such that, for each i, ci+1 is a multiple of ci.

Thus, for both these cases, we obtain a family of deterministic truthful (2 + ǫ)-approximation mechanisms (see Sect. 7). Observe that all such restrictions remain NP-hard even for two machines: indeed, the identical-speeds version is NP-hard by a reduction from Partition [4]. To the best of our knowledge, this is the first result in which approximate solutions yield truthful mechanisms, where truthfulness is defined in the strongest sense. The mechanism presented in [2] guarantees a 3-approximation but, as pointed out earlier, it is randomized and truthful only in expectation. Although our new algorithm is relatively simple, its analysis, in terms of monotonicity and approximability, is far from trivial and goes through several properties of greedy allocations on identical machines.

We emphasize that the importance of an approximation mechanism for the case of divisible speeds is both practical and theoretical. On one hand, in many practical applications “speeds” are not arbitrary but are taken from a pre-determined set of “types”, yielding values that are multiples of one another. Moreover, this result implies the existence, for any fixed number of machines, of deterministic truthful (4 + ǫ)-approximation mechanisms for the case of arbitrary speeds, for any ǫ > 0. The reader may wonder whether assuming divisible speeds makes the problem of designing truthful mechanisms much simpler. We show that this is not the case: also in this restriction, existing/natural approximation algorithms are not monotone, and thus not suitable for truthful mechanisms (see Appendix A).

Furthermore, our mechanisms compute the payments in polynomial time (see Sect. 7), a property that cannot be directly derived from the results in [2]. Finally, our mechanisms satisfy voluntary participation, meaning that a truthfully behaving agent never incurs a loss (i.e., a negative utility).

2 Preliminaries

We consider the problem of scheduling on related parallel machines (Q||Cmax). We are given the speed vector s = ⟨s1, s2, . . . , sn⟩, with s1 ≤ s2 ≤ · · · ≤ sn, of the n machines, and a job sequence with weights σ = (t1, t2, . . . , tm). In the sequel we simply denote the i-th job by its weight ti. The largest job weight in σ is denoted by tmax. A schedule is a mapping that associates each job with a machine. The amount of time needed to complete job j on machine i is tj/si. The work of machine i, denoted by wi, is the sum of the weights of the jobs assigned to i. The load (or finish time) of machine i is wi/si. The cost of a schedule is the maximum load over all machines, that is, its makespan. Given an algorithm A for Q||Cmax, A(σ, s) denotes the solution computed by this algorithm on input the job sequence σ and the speed vector s. The cost of the solution computed by algorithm A on input σ and s is denoted by cost(A, σ, s). We will also consider scheduling algorithms that take as a third input a parameter h. In this case we denote by A(σ, s, h) the schedule output and by cost(A, σ, s, h) its cost.

We consider Q||Cmax in the context of selfish agents, in which each machine is owned by an agent and the value of si is privately known to the agent. A mechanism for this problem is a pair M = (A, P), where A is an algorithm that constructs a solution and P is a payment function. In particular, the mechanism asks each agent i to report her speed and, based on the reported speeds, constructs a solution using A and pays the agents according to P = (P1, P2, . . . , Pn). The profit of agent i is defined as profit_i = Pi − wi/si, that is, the payment minus the cost incurred by the agent in being assigned work wi. A strategy for an agent i is to declare a value bi for her speed. Let b−i denote the sequence (b1, b2, . . . , bi−1, bi+1, . . . , bn). A strategy bi is a dominant strategy for agent i if bi maximizes profit_i for any possible b−i.
A mechanism is truthful if, for any agent i, declaring her true speed is a dominant strategy. A mechanism satisfies voluntary participation if, for any agent i, declaring her true speed yields a non-negative utility. An algorithm for the Q||Cmax problem is monotone if, given in input the machine speeds b1, b2, . . . , bn, for any i and any fixed b−i, the work wi is non-decreasing in bi.

Given a sequence σ of m jobs, we denote by σh the subsequence consisting of the first h jobs of σ, for any h ≤ m; moreover, σ \ σh denotes the sequence obtained by removing from σ the first h jobs. The Greedy algorithm (also known as the ListScheduling algorithm [5]) processes jobs in the order they appear in σ and assigns a job tj to the machine i minimizing (wi + tj)/si, where wi denotes the work of machine i before job tj is assigned; if more than one machine minimizes the above ratio, the one of smallest index is chosen. An optimal algorithm computes a solution of minimal cost opt(σ, s). Throughout the paper we assume that the optimal algorithm always produces the lexicographically minimal optimal assignment. As shown in [2], this algorithm is monotone. An algorithm A is a c-approximation algorithm if, for every instance (σ, s), cost(A, σ, s) ≤ c · opt(σ, s). A polynomial-time approximation scheme (PTAS) for a minimization problem is a family A of algorithms such that, for every ǫ > 0, there exists a (1 + ǫ)-approximation algorithm
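For concreteness, the Greedy (ListScheduling) rule just defined can be sketched in Python as follows. This is an illustrative sketch of ours, not code from the paper; the helper name `makespan` is our own.

```python
# Illustrative sketch of the Greedy (ListScheduling) rule of Sect. 2:
# each job goes to the machine minimizing the resulting finish time
# (w_i + t_j) / s_i; ties are broken in favor of the smallest index.

def greedy(jobs, speeds, initial_work=None):
    """Return (work vector, per-machine job lists) after greedy scheduling."""
    n = len(speeds)
    work = list(initial_work) if initial_work is not None else [0.0] * n
    assign = [[] for _ in range(n)]
    for t in jobs:
        # min() scans machines in index order, so ties pick the smallest index
        i = min(range(n), key=lambda i: (work[i] + t) / speeds[i])
        work[i] += t
        assign[i].append(t)
    return work, assign

def makespan(work, speeds):
    """Cost of a schedule: the maximum load w_i / s_i over all machines."""
    return max(w / s for w, s in zip(work, speeds))
```

For example, on σ = (3, 2, 2) and s = ⟨1, 2⟩ the rule sends the first job to the fast machine, the second to the slow one, and the third to the fast one again, ending with works (2, 5) and makespan 2.5.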


Aǫ ∈ A whose running time is polynomial in the size of the input. It is well-known [5] (and we review the algorithm in Section 4) that, for each fixed number of machines, there exists a PTAS for Q||Cmax .

3 A warm-up algorithm

In this section we present, as a warm-up, a simple monotone algorithm EZ for the case of two machines that has an approximation ratio equal to the golden ratio φ ≈ 1.618. As a consequence of the results of [2], this gives a polynomial-time φ-approximation mechanism for Q||Cmax.

Algorithm EZ
Input: a job sequence σ and a speed vector s = (s1, s2), s1 ≤ s2. The algorithm distinguishes two cases.
Case I. s2/s1 ≤ φ. We run the PTAS for two machines of identical speed and obtain a (1 + ǫ)-approximation. Let L1 and L2 be the loads of the two identical machines. We assign the larger of the two loads to the machine of speed s2 and the smaller to the machine of speed s1.
Case II. s2/s1 > φ. All jobs are assigned to the machine of speed s2.

We next show that the algorithm is monotone and then prove a bound on its approximation ratio.

Lemma 1 Algorithm EZ is monotone.

Proof. If the slower of the two machines decreases her speed (i.e., it becomes even slower), then the following cases are possible.
1. EZ stays in Case I. The machine gets the same load.
2. EZ stays in Case II. The machine gets the same load, that is, 0.
3. EZ switches from Case I to Case II. The load of the machine becomes 0 and thus does not increase.
Thus, in no case does the load of the machine increase. If instead the slower of the two machines increases her speed (i.e., becomes faster) while remaining the slower of the two, then there are two cases: either the relation of s2/s1 to φ does not change (in which case the load remains the same), or s2/s1 becomes smaller than φ (in which case the load of the slower machine does not decrease). If instead the machine becomes the faster of the two, then it will certainly be assigned more load. The case in which the faster of the two machines changes her speed can be argued similarly. □

Now we give a bound on the approximation ratio of the algorithm.

Lemma 2 Algorithm EZ has approximation ratio φ.

Proof. Assume without loss of generality that 1 = s1 ≤ s2. In Case I, the algorithm produces an s2-approximation with s2 ≤ φ. In Case II, instead, the algorithm outputs a (1 + 1/s2)-approximation; since s2 ≥ φ, we have that 1 + 1/s2 ≤ φ. □
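A sketch of EZ in Python (illustrative only, not the authors' code). For Case I we substitute the simple LPT rule for the identical-machines PTAS, so this stand-in is only a 7/6-approximation on two identical machines rather than (1 + ǫ); the case split on the golden ratio is as in the text.

```python
# Hedged sketch of the two-machine algorithm EZ of Sect. 3.
PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def ez(jobs, s1, s2):
    """Return (work on slow machine, work on fast machine), s1 <= s2."""
    assert s1 <= s2
    if s2 / s1 > PHI:
        # Case II: all jobs on the fast machine
        return 0.0, float(sum(jobs))
    # Case I: schedule as if the machines were identical (LPT stand-in
    # for the PTAS), then give the larger load to the faster machine.
    loads = [0.0, 0.0]
    for t in sorted(jobs, reverse=True):
        loads[loads.index(min(loads))] += t
    return min(loads), max(loads)
```

The monotonicity argument of Lemma 1 is visible here: the slow machine's work only changes when the case split flips, and flipping into Case II sets it to 0.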

4 Combining monotone algorithms with the optimum

In this section we show how to combine an optimal schedule on a subsequence of the jobs with the one produced by a monotone algorithm on the remaining jobs, in order to obtain a good monotone approximation algorithm. Our approach is inspired by the PTAS of Graham [5], which can be described as follows.

Algorithm PTAS
Input: a job sequence σ, a speed vector s, and a parameter h. Assume that the jobs in σh are the h largest jobs of σ.
A. Compute the lexicographically minimal schedule among those that have optimal makespan with respect to the job sequence σh and the speed vector s; let ai be the load assigned to machine i in this phase.
B. Run algorithm Greedy on the job sequence σ \ σh and the speed vector s, assuming that machine i has initial load ai, i = 1, . . . , n.
Output the schedule that assigns to machine i the jobs assigned to machine i in Phases A and B.

It is well known that, for any ǫ > 0 and for any number n of machines, it is possible to choose the value of the parameter h so that PTAS outputs a schedule of makespan at most (1 + ǫ) times the optimal makespan. Unfortunately, PTAS is not monotone. Indeed, even though the first phase is monotone, it is easy to see that Greedy is not monotone [2]. Moreover, even if we replace Greedy with a monotone algorithm, the resulting combination is not guaranteed to be monotone. We instead propose the following approach. Let Gc be any scheduling algorithm. By Opt-Gc we denote the following algorithm.

Algorithm Opt-Gc
Input: a job sequence σ, a speed vector s, and a parameter h. Assume that the jobs in σ are ordered by non-increasing weight.
A. Compute the lexicographically minimal schedule among those that have optimal makespan with respect to the job sequence σh and the speed vector s.
B. Run algorithm Gc on the job sequence σ \ σh and the speed vector s, assuming that machine i has initial load 0, i = 1, . . . , n.
Output the schedule that assigns to machine i the jobs assigned to machine i in Phases A and B.

We have the following lemma.

Lemma 3 If Gc is monotone then Opt-Gc is also monotone.

Proof. The work received by each machine is the sum of the works computed by two monotone algorithms running on two disjoint sets of jobs. These two sets are uniquely determined by the parameter h and do not depend on the machine speeds. □

In the next sections we show that, if Gc has an approximation factor close to that of the greedy algorithm, then, for each ǫ > 0 and for each number n of machines, it is possible to choose the value of the parameter h so that Opt-Gc outputs a schedule of makespan at most (2 + ǫ) times the optimal one. We start by defining the notion of a greedy-close algorithm.

Definition 4 (greedy-close algorithm) Let c be a constant. An algorithm Gc is c-greedy-close if, for any job sequence σ and any speed vector s = ⟨s1, s2, . . . , sn⟩,

cost(Gc, σ, s) ≤ cost(Greedy, σ, s) + c · tmax/s1.

An algorithm Gc is greedy-close if it is c-greedy-close for some constant c.
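Lemma 3's argument is visible in code: Opt-Gc just sums, machine by machine, the works produced by two algorithms on disjoint job sets, and the split depends only on h. A sketch of ours, where `opt_alg` and `gc_alg` are assumed black boxes mapping (jobs, speeds) to a per-machine work vector:

```python
# Sketch of the Opt-Gc composition of Sect. 4 (illustrative, not the
# authors' code). If both black boxes are monotone, so is the sum,
# because the two job sets do not depend on the reported speeds.

def opt_gc(jobs, speeds, h, opt_alg, gc_alg):
    jobs = sorted(jobs, reverse=True)      # non-increasing weights
    work_a = opt_alg(jobs[:h], speeds)     # Phase A: optimal on h largest
    work_b = gc_alg(jobs[h:], speeds)      # Phase B: Gc, initial loads 0
    return [a + b for a, b in zip(work_a, work_b)]
```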

4.1 Algorithm PTAS-Gc

In this section we show how to combine a greedy-close algorithm Gc with the optimal algorithm in order to obtain a new approximation scheme, PTAS-Gc. In the next section we relate the approximation factors of PTAS-Gc and Opt-Gc and show that the approximation factor of Opt-Gc is at most twice the approximation factor of PTAS-Gc. Intuitively, PTAS-Gc computes the optimal schedule on the h largest jobs and then combines it with a greedy-close solution computed using algorithm Gc. Moreover, in order to guarantee a “good” approximation, we add a balancing step in Phase B, where jobs are assigned to non-bottleneck machines to reduce the imbalance while preserving the optimality of the Phase A schedule.

Algorithm PTAS-Gc
Input: a job sequence σ, a speed vector s, and a parameter h. Assume that the jobs in σh are the h largest jobs of σ.
A. Compute the lexicographically minimal schedule among those that have optimal makespan with respect to the job sequence σh and the speed vector s; let opt(σh, s) be the makespan of the schedule produced in this phase.
B. Reduce the imbalance, without increasing the cost, by running algorithm Greedy as long as it is possible to add jobs without exceeding opt(σh, s); let h′ be the index of the last job considered in this phase.
C. Run algorithm Gc on the job sequence σ \ σh′ and the speed vector s, assuming that machine i has initial load 0, for i = 1, . . . , n.
Output the schedule that assigns to machine i the jobs assigned to machine i in Phases A, B and C.

The rest of this section is devoted to the analysis of the approximation guarantee of algorithm PTAS-Gc. We consider algorithm PTAS-Greedy, which is algorithm PTAS-Gc with Gc = Greedy. We define the quantity c̄ost(PTAS-Greedy, σ, s, h) as the sum of the cost of the schedule computed by PTAS-Greedy at the end of Phase B and the cost of the schedule computed by PTAS-Greedy in Phase C. More precisely, we have that

c̄ost(PTAS-Greedy, σ, s, h) = opt(σh, s) + cost(Greedy, σ \ σh′, s),

where h′ is the value computed in Phase B. It is obvious that c̄ost(PTAS-Greedy, σ, s, h) ≥ cost(PTAS-Greedy, σ, s, h).

We next introduce a modification of Greedy that never outputs a schedule worse than the one it would have obtained by ignoring the slowest machines.

Definition 5 (Greedy∗) Let Greedy∗ denote the algorithm that, on input σ and s = ⟨s1, . . . , sn⟩, returns as output the best schedule among those computed by Greedy on input σ and the speed vectors ⟨0, . . . , 0, sk, . . . , sn⟩, for k = 1, . . . , n.

Clearly, we have that cost(Greedy, σ, s) ≥ cost(Greedy∗, σ, s). The next lemma proves that cost(Greedy, σ, s) is not much bigger than cost(Greedy∗, σ, s).

Lemma 6 For any job sequence σ and for any speed vector s = ⟨s1, . . . , sn⟩, we have that

cost(Greedy, σ, s) ≤ cost(Greedy∗, σ, s) + tmax/s1.

Proof. We show, by contradiction, that cost(Greedy, σ, s) ≤ opt(σ, s) + tmax/s1, which implies the lemma since opt(σ, s) ≤ cost(Greedy∗, σ, s) ≤ cost(Greedy, σ, s). Assume that

cost(Greedy, σ, s) > opt(σ, s) + tmax/s1.   (1)

Let t∗ be the weight of the last job added by Greedy to one of the bottleneck machines, and let j be the index of the machine of minimal completion time. It is easy to see that the overall work is at least (wj/sj) · Σ_{i=1}^{n} si, and hence opt(σ, s) ≥ wj/sj. So, Equation (1) yields

(wj + t∗)/sj ≤ wj/sj + t∗/s1 ≤ opt(σ, s) + tmax/s1 < cost(Greedy, σ, s).

This contradicts the fact that the job of weight t∗ is added by Greedy to a bottleneck machine j (see the definition of Greedy in Sect. 2). □

To upper bound the cost of the solution computed by PTAS-Gc, for a c-greedy-close algorithm Gc, we use the quantity c̄ost(Gc, σ, s, α) defined as

c̄ost(Gc, σ, s, α) = cost(Greedy∗, σ, s) + (1 + c) · tmax/α.

Lemma 7 For a c-greedy-close algorithm Gc, and for all job sequences σ and speed vectors s = ⟨s1, . . . , sn⟩, it holds that

cost(Gc, σ, s) ≤ c̄ost(Gc, σ, s, s1)

and

c̄ost(Gc, σ, ⟨0, . . . , 0, sk, sk+1, . . . , sn⟩, s1) ≤ c̄ost(Gc, σ, ⟨0, . . . , 0, sk+1, . . . , sn⟩, s1).

Proof. The second part of the lemma is obvious. For the first part we have

c̄ost(Gc, σ, s, s1) = cost(Greedy∗, σ, s) + (1 + c) · tmax/s1
  ≥ cost(Greedy, σ, s) + c · tmax/s1   (by Lemma 6)
  ≥ cost(Gc, σ, s).   (by definition of greedy-close)
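Returning to the algorithm itself, Phase B of PTAS-Gc can be read as a capped greedy pass: keep assigning jobs greedily as long as no machine's finish time would exceed T = opt(σh, s). A sketch under that reading (ours, not the authors' code; `opt_alg` and `gc_alg` are assumed black boxes returning per-machine work vectors):

```python
# Sketch of PTAS-Gc (Sect. 4.1) with its balancing Phase B.

def ptas_gc(jobs, speeds, h, opt_alg, gc_alg):
    jobs = sorted(jobs, reverse=True)
    work = opt_alg(jobs[:h], speeds)               # Phase A
    T = max(w / s for w, s in zip(work, speeds))   # T = opt(sigma_h, s)
    k = h
    while k < len(jobs):
        t = jobs[k]
        # machines that can still take t without exceeding T
        ok = [i for i in range(len(speeds)) if (work[i] + t) / speeds[i] <= T]
        if not ok:
            break                                  # h' = k: Phase B ends
        i = min(ok, key=lambda i: (work[i] + t) / speeds[i])
        work[i] += t
        k += 1
    work_c = gc_alg(jobs[k:], speeds)              # Phase C, initial loads 0
    return [a + b for a, b in zip(work, work_c)]
```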

□

To analyze algorithm PTAS-Gc, we use the following quantity:

c̄ost(PTAS-Gc, σ, s, h) = opt(σh, s) + c̄ost(Gc, σ \ σh′, s, s1),

where h′ is the index of the last job considered in Phase B of PTAS-Gc. By the first part of Lemma 7, we have that c̄ost(PTAS-Gc, σ, s, h) ≥ cost(PTAS-Gc, σ, s, h). The next two lemmas provide an upper bound on c̄ost(PTAS-Gc, σ, s, h).

Lemma 8 For any job sequence σ, any h, and any speed vector s,

c̄ost(PTAS-Greedy, σ, s, h) ≤ cost(PTAS, σ, s, h) + (opt(σ, s)/(h · s1)) · (Σ_{i=1}^{n} si) · (n − 1),

where n is the length of the speed vector.

Proof. If h′ = m then PTAS-Gc computes an optimal schedule and thus the lemma holds. Let us now consider the case h′ < m. Let Li denote the load of machine i at the end of Phase B of algorithm PTAS-Greedy, and let δi = Lmax − Li, where Lmax = max_{1≤i≤n} Li. Let σ′ be the sequence obtained from σ by adding, in order of decreasing weight, n dummy jobs of weight δi · si, i = 1, . . . , n, just after the h′-th job (notice that some of the dummy jobs have weight zero). Observe that δi · si ≤ th, for 1 ≤ i ≤ n.

Let us now look at the execution of PTAS on input σ′, s and h. Since the h largest jobs of σ are still the h largest in σ′, they are processed in Phase A. In Phase B, PTAS considers first the jobs from th+1 to th′ and then the dummy jobs. When PTAS has processed the last dummy job, it ends up with a schedule where all machines have finish time Lmax. Then, PTAS is left with the jobs in σ \ σh′. Therefore, we have that

cost(PTAS, σ′, s, h) = Lmax + cost(Greedy, σ \ σh′, s) = opt(σh, s) + cost(Greedy, σ \ σh′, s) = c̄ost(PTAS-Greedy, σ, s, h).

Adding the dummy jobs to σ can increase the cost of any solution by at most (1/s1) · Σ_{i=1}^{n} δi · si. Since opt(σ, s) ≥ h · th/(Σ_{i=1}^{n} si), we have that δi · si ≤ th ≤ (opt(σ, s)/h) · Σ_{i=1}^{n} si, and

c̄ost(PTAS-Greedy, σ, s, h) = cost(PTAS, σ′, s, h)
  ≤ cost(PTAS, σ, s, h) + (1/s1) · Σ_{i=1}^{n} δi · si
  ≤ cost(PTAS, σ, s, h) + (opt(σ, s)/(h · s1)) · (Σ_{i=1}^{n} si) · (n − 1),

where the last inequality is due to the fact that δi · si ≤ (opt(σ, s)/h) · Σ_{i=1}^{n} si and that at least one dummy job must have weight (Lmax − Lmax) · si = 0. This completes the proof. □

The next lemma relates algorithm PTAS-Gc and algorithm PTAS-Greedy.

Lemma 9 If Gc is c-greedy-close, then for any job sequence σ, any h, and any speed vector s,

c̄ost(PTAS-Gc, σ, s, h) ≤ c̄ost(PTAS-Greedy, σ, s, h) + ((1 + c) · opt(σ, s)/(h · s1)) · Σ_{i=1}^{n} si,

where n is the length of the speed vector.

Proof. We have the following inequalities:

c̄ost(PTAS-Gc, σ, s, h) − c̄ost(PTAS-Greedy, σ, s, h)
  = c̄ost(Gc, σ \ σh′, s, s1) − cost(Greedy, σ \ σh′, s)
  ≤ cost(Greedy∗, σ \ σh′, s) + (1 + c) · th/s1 − cost(Greedy, σ \ σh′, s)
  ≤ (1 + c) · th/s1
  ≤ ((1 + c) · opt(σ, s)/(h · s1)) · Σ_{i=1}^{n} si,

where the last inequality follows from the fact that opt(σ, s) ≥ h · th/(Σ_{i=1}^{n} si). □

Lemma 10 [5] There exists a function f(·) such that, for any job sequence σ, any h, and any speed vector s,

cost(PTAS, σ, s, h) ≤ opt(σ, s) · (1 + f(n)/(h + 1)),

where n is the length of the speed vector.

We next provide a bound on the cost of PTAS-Gc in terms of opt(σ, s) and sn/s1.

Theorem 11 If Gc is c-greedy-close then, for any job sequence σ, any h, and any speed vector s,

c̄ost(PTAS-Gc, σ, s, h) ≤ opt(σ, s) · (1 + ((f(n) + n² + c · n)/h) · (sn/s1)),

where n is the length of the speed vector.

Proof. By using Lemmata 8–10 we obtain the following chain of inequalities:

c̄ost(PTAS-Gc, σ, s, h)
  ≤ c̄ost(PTAS-Greedy, σ, s, h) + ((1 + c) · opt(σ, s)/(h · s1)) · Σ_{i=1}^{n} si   (by Lemma 9)
  ≤ cost(PTAS, σ, s, h) + (opt(σ, s)/(h · s1)) · (Σ_{i=1}^{n} si) · (n − 1) + ((1 + c) · opt(σ, s)/(h · s1)) · Σ_{i=1}^{n} si   (by Lemma 8)
  ≤ opt(σ, s) · (1 + f(n)/(h + 1) + ((n + c)/(h · s1)) · Σ_{i=1}^{n} si)   (by Lemma 10)
  < opt(σ, s) · (1 + ((f(n) + n² + c · n)/h) · (sn/s1)),

where the last inequality follows from si ≤ sn. This completes the proof. □

The bound given by Theorem 11 is good for small values of sn/s1. When, instead, sn is much larger than s1, it is convenient to neglect the machine of speed s1 and run PTAS-Gc only on the remaining n − 1 machines. In the next theorem we prove that in this way we can obtain a (1 + ǫ)-approximation for any ǫ > 0. We start with the following technical lemma.

Lemma 12 If Gc is greedy-close, then for all σ, h and s = ⟨s1, s2, . . . , sn⟩,

c̄ost(PTAS-Gc, σ, ⟨s1, s2, . . . , sn⟩, h) ≤ c̄ost(PTAS-Gc, σ, ⟨0, s2, . . . , sn⟩, h).

Proof. The lemma follows from the definition of c̄ost(PTAS-Gc), from Lemma 7, and from the observation that opt(σh, ⟨s1, s2, . . . , sn⟩) ≤ opt(σh, ⟨0, s2, . . . , sn⟩). □

We are now in a position to prove the main result of this section.

Theorem 13 For any positive integer n and for any ǫ > 0, if Gc is a polynomial-time greedy-close algorithm, then there exists an h such that, for all σ and for all speed vectors s of length n,

c̄ost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ) · opt(σ, s);

since the actual cost never exceeds c̄ost, the same bound holds for cost(PTAS-Gc, σ, s, h). Moreover, the running time of PTAS-Gc is polynomial in m = |σ|.

Proof. We prove by induction on n that for any ǫ > 0 there exists an h, depending on ǫ and n only, such that c̄ost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ) · opt(σ, s). The base case n = 1 is trivial. If sn/s1 ≤ 1/ǫ, then by Theorem 11 it is possible to pick h = h(n, ǫ) so that c̄ost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ) · opt(σ, s). Otherwise, pick ǫ′ such that (1 + ǫ′)(1 + s1/sn) ≤ (1 + ǫ); such an ǫ′ > 0 exists because s1/sn < ǫ. Then, by the inductive hypothesis, it is possible to choose h′ = h′(n − 1, ǫ′) such that

c̄ost(PTAS-Gc, σ, ⟨0, s2, . . . , sn⟩, h′) ≤ (1 + ǫ′) · opt(σ, ⟨0, s2, . . . , sn⟩).


Thus we have the following inequalities:

cost(PTAS-Gc, σ, ⟨s1, s2, . . . , sn⟩, h′) ≤ cost(PTAS-Gc, σ, ⟨0, s2, . . . , sn⟩, h′)   (by Lemma 12)
≤ (1 + ǫ′)opt(σ, ⟨0, s2, . . . , sn⟩)
≤ (1 + ǫ′)(1 + s1/sn)opt(σ, ⟨s1, s2, . . . , sn⟩)
≤ (1 + ǫ)opt(σ, s).

Finally, the running time is O(n^(h+2) + m log m + poly(m)). 2

4.2 Approximation analysis of Opt-Gc

We are now ready to give a bound on the approximation ratio guaranteed by Opt-Gc.

Theorem 14 For any positive integer n and for any ǫ > 0, if Gc is greedy-close, then there exists an h = h(n, ǫ) such that, for all sequences of jobs σ and all speed vectors s of length n,

cost(Opt-Gc, σ, s) ≤ (2 + ǫ)opt(σ, s).

Proof. Fix ǫ > 0. Let h = h(n, ǫ) be such that cost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ/2)opt(σ, s) (such an h exists by Theorem 13) and let h′ be the index of the last job scheduled during phase B by algorithm PTAS-Gc on input σ, s, and h. Construct a new job sequence σ′ from σ by adding, just after th′, a copy of the jobs from th+1 to th′. We observe that the set of new jobs, considered independently from the rest of the sequence, can be scheduled in time opt(σ, s) (using the same schedule computed in phase B of PTAS-Gc), and thus opt(σ′, s) ≤ 2opt(σ, s). Moreover, we will show that PTAS-Gc on input σ′, s, and h computes a schedule of cost not less than the schedule computed by Opt-Gc on input σ, s, and h. Then we have

cost(Opt-Gc, σ, s, h) ≤ cost(PTAS-Gc, σ′, s, h) ≤ (1 + ǫ/2)opt(σ′, s) ≤ (2 + ǫ)opt(σ, s),

and the theorem follows.

It remains to prove that the cost of the schedule produced by PTAS-Gc on input σ′, s, and h is not less than the cost of the schedule produced by Opt-Gc on σ, s, and h. Let us start from PTAS-Gc: phase A processes the h largest jobs optimally, phase B processes the jobs th+1, . . . , th′ using Greedy, and phase C processes the jobs in σ′/σ′h′ = σ/σh using Gc. We remark that at the end of phase B no additional job of weight greater than or equal to th′ can be processed without increasing the makespan of the schedule. Since the (h′ + 1)-st job in σ′ has weight equal to th+1 ≥ th′, we have that phase B processes only the jobs th+1, . . . , th′. Algorithm Opt-Gc, instead, consists of only two phases: phase A processes the h largest jobs of σ optimally and phase B processes the jobs in σ/σh using Gc. Then both the first and the last phases of the two algorithms produce the same schedule, since they use the same algorithm to process the same sequence of jobs. Moreover, PTAS-Gc has to process an additional set of jobs during its re-balancing phase. Hence cost(Opt-Gc, σ, s, h) ≤ cost(PTAS-Gc, σ′, s, h). 2

5 A monotone greedy-close algorithm

In this section we describe a greedy-close algorithm that is monotone for the case of “divisible” speeds (see Def. 20 below). We present our algorithm for the case of integer divisible speeds; this is without loss of generality since, in case the divisible speeds are not integers, they can be scaled to be integers. Let us consider the following algorithm:

Algorithm uniform
Input: a job sequence σ and a speed vector s = ⟨s1, s2, · · · , sn⟩, with s1 ≤ s2 ≤ · · · ≤ sn.

1. run algorithm Greedy on job sequence σ and S = Σ_{i=1}^{n} si identical machines;

2. order the identical machines by nondecreasing load l1, . . . , lS;

3. let g := GCD(s1, s2, . . . , sn) and split the identical machines into g blocks B1, · · · , Bg, each consisting of S/g consecutive identical machines. For 1 ≤ i ≤ g and 1 ≤ k ≤ S/g, denote by B_i^k the k-th identical machine of the i-th block; thus identical machine B_i^k has load l_{(i−1)·S/g+k}.

4. for 1 ≤ j ≤ n let kj = Σ_{l=1}^{j−1} sl; then, in each block Bi (1 ≤ i ≤ g), machine j receives the load of the identical machines B_i^{kj/g+1}, · · · , B_i^{kj/g+sj/g}.

As described above, algorithm uniform does not run in polynomial time, as its running time depends on S which, in general, is not polynomially bounded in n and m. However, uniform can easily be modified so as to obtain the same allocation in polynomial time.

Theorem 15 The allocation of uniform can be computed in time O(n · m + m log m).

Proof. If S ≤ m then the theorem clearly holds. Otherwise, we have m identical machines with one job each, and all the other identical machines have no jobs. The solution can be computed by considering g′ = ⌈m/g⌉ blocks of g identical machines each, as follows: block g′ contains machines lS down to lS−g+1, block g′ − 1 contains machines lS−g down to lS−2g+1, and so on (the last such block may include some of the machines with no load). Therefore, the Greedy algorithm runs on O(m) identical machines; Steps 1-2 take O(m log m) time, while the remaining steps require time O(n · m). 2

Because of the above result, in the sequel we improperly denote by uniform the efficient version of the algorithm computing the same allocation within O(n · m + m log m) time.
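For concreteness, the block construction above can be sketched as follows. This is an illustrative reimplementation of ours (the function names, and the convention that Greedy breaks ties toward the lowest-indexed machine, are our own choices, not prescribed by the paper); for clarity it uses the non-polynomial formulation with S virtual machines rather than the efficient version of Theorem 15.

```python
from functools import reduce
from math import gcd

def greedy_identical(jobs, num_machines):
    """Graham's Greedy on identical machines: each job goes to a currently
    least loaded machine; returns the load vector sorted nondecreasingly."""
    loads = [0] * num_machines
    for t in jobs:
        loads[loads.index(min(loads))] += t
    return sorted(loads)

def uniform(jobs, speeds):
    """Sketch of algorithm uniform for divisible integer speeds (given
    nondecreasingly): run Greedy on S = sum(speeds) identical machines,
    split the sorted loads into g = GCD(speeds) blocks of S/g consecutive
    machines, and give machine j a slice of s_j/g machines, at offset
    (s_1 + ... + s_{j-1})/g, in every block."""
    n, S = len(speeds), sum(speeds)
    g = reduce(gcd, speeds)
    loads = greedy_identical(jobs, S)
    blocks = [loads[i * (S // g):(i + 1) * (S // g)] for i in range(g)]
    work = [0] * n
    for j in range(n):
        k = sum(speeds[:j]) // g  # offset of machine j inside each block
        for block in blocks:
            work[j] += sum(block[k:k + speeds[j] // g])
    return work
```

For instance, on jobs 3, 3, 2, 2, 1, 1 and speeds ⟨2, 2, 4⟩ this sketch returns the works ⟨2, 2, 8⟩; the whole load Σ ti is always partitioned among the n machines.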

5.1 Approximation analysis of uniform

Let us denote by w_j^i the work of the sj/g identical machines from block Bi whose loads are assigned to machine j. Then we have

w_j^i = Σ_{l=1}^{sj/g} l_{(i−1)·S/g + kj/g + l}.

Theorem 16 For any job sequence σ and any integer speed vector s = ⟨s1, s2, . . . , sn⟩ it holds that

cost(uniform, σ, s) ≤ opt(σ, s) + tmax/g,

where g = GCD(s1, s2, . . . , sn).

We next state some useful properties of the w_j^i.

Claim 17 For 1 ≤ i ≤ g and for all 1 ≤ j < k ≤ n, w_j^i/sj ≤ w_k^i/sk.

Proof. Since the identical machines are ordered by nondecreasing load and machine k receives identical machines of index greater than those assigned to machine j, the most loaded identical machine assigned to j has load not greater than the least loaded identical machine assigned to k. As w_j^i/sj is, up to the common factor g, the average load of the identical machines in block i assigned to j, the claim follows. 2

Claim 18 For any 1 ≤ i ≤ g − 1, w_j^{i+1}/sj ≥ w_n^i/sn.

Proof. We observe that the least loaded identical machine in block i + 1 has load not smaller than the most loaded identical machine in block i. The rest of the proof is similar to that of Claim 17. 2

Claim 19 For any 1 ≤ i ≤ g, w_j^1 ≥ (sj/sn)·w_n^i − (sj/g)·tmax.

Proof. Let us first observe that it must hold that lS − l1 ≤ tmax, since otherwise it would be possible to move one job from the most loaded identical machine to the least loaded one and reduce the maximum load; this contradicts the definition of the Greedy algorithm, which is used in uniform as a subroutine. Since machine j receives the load of sj/g identical machines, it holds that (g/sj)·w_j^1 ≥ l1. Similarly, for any block i, lS ≥ (g/sn)·w_n^i. Putting things together we obtain

w_j^1 ≥ (sj/g)·l1 ≥ (sj/g)·(lS − tmax) ≥ (sj/g)·(g/sn)·w_n^i − (sj/g)·tmax = (sj/sn)·w_n^i − (sj/g)·tmax.

2

Proof. (of Theorem 16) From Claims 18 and 19 we obtain the following two inequalities, respectively:

wj = Σ_{i=1}^{g} w_j^i = w_j^1 + Σ_{i=2}^{g} w_j^i ≥ w_j^1 + Σ_{i=1}^{g−1} (sj/sn)·w_n^i ≥ (sj/sn)·w_n^g − (sj/g)·tmax + Σ_{i=1}^{g−1} (sj/sn)·w_n^i = (sj/sn)·wn − (sj/g)·tmax. (2)

Since opt(σ) ≥ Σ_{j=1}^{n} wj/S, Eq. (2) implies

wn/sn − opt(σ) ≤ wn/sn + (tmax/g)·(Σ_{j=1}^{n} sj)/S − (wn/sn)·(Σ_{j=1}^{n} sj)/S = tmax/g.

From Claim 17 we obtain

wn/sn = Σ_{i=1}^{g} w_n^i/sn ≥ Σ_{i=1}^{g} w_j^i/sj = wj/sj,

thus implying that cost(uniform, σ, s) = wn/sn ≤ opt(σ) + tmax/g. This completes the proof. 2

When s1 divides all the si's, we have g = s1 and the uniform algorithm is greedy-close. We then define sequences of speeds that enjoy this property, which will be used below to prove the monotonicity of uniform.

Definition 20 (divisible speeds) Let C = {c1, c2, . . . , cp, . . .} be a set of speeds with the property that ci divides ci+1. Then a speed vector s = ⟨s1, s2, . . . , sn⟩ is divisible if s ∈ C^n. The restriction to divisible speeds denotes the problem version in which the set C is known to the algorithm and all declared speeds must be in C.

We remark that if the algorithm knows C then the set of agent strategies is restricted. Indeed, if an agent declares a speed s′ ∉ C, the algorithm can simply treat this speed as the largest ci ∈ C with ci ≤ s′. This is equivalent to the assumption that agents always declare a speed in C. We thus have the following theorem.

Theorem 21 Algorithm uniform is greedy-close when restricted to divisible speeds.

5.2 Algorithm uniform is monotone

In order to prove the monotonicity of algorithm uniform we first prove some technical results on greedy allocations on identical machines.

Lemma 22 Let Li (respectively, li) denote the load of the i-th least loaded machine when Greedy uses N (respectively, N + 1) identical machines. It then holds that l_{i+1} ≤ Li, for all 1 ≤ i ≤ N.

Proof. We show that the lemma holds at any iteration of Greedy, after reordering the loads of the identical machines. Let l1, l2, . . . , l_{N+1} and L1, L2, . . . , LN satisfy l_{k+1} ≤ Lk, for all 1 ≤ k ≤ N. When a new job of weight w occurs, it is assigned to the least loaded machine, thus yielding the sequences L2, L3, . . . , Li, L1 + w, L_{i+1}, . . . , LN and l2, l3, . . . , lj, l1 + w, l_{j+1}, . . . , l_{N+1}, for some i and j. We have to prove that these two new sequences also satisfy l′_{k+1} ≤ L′_k, where l′_k and L′_k denote the k-th element in the new sequence of length N + 1 and N, respectively. We distinguish three cases:

(j < i.) For 1 ≤ k ≤ j − 1, l′_k = l_{k+1} ≤ Lk = L′_{k−1}. Similarly, for i + 2 ≤ k ≤ N + 1, l′_k = lk ≤ L_{k−1} = L′_{k−1}. It thus remains to check the elements from l′_j = l1 + w up to l′_{i+1}. First observe that l′_j ≤ l′_{j+1} = l_{j+1} ≤ Lj = L′_{j−1}, where the last equality follows from j < i. Moreover, for j + 1 ≤ k ≤ i, we have that l′_k = lk ≤ L_{k−1} = L′_{k−2} ≤ L′_{k−1}, where the last inequality follows from k ≤ i. Finally, l′_{i+1} = l_{i+1} ≤ Li = L′_{i−1} ≤ L′_i.

(j > i.) For 1 ≤ k ≤ i, l′_k = l_{k+1} ≤ Lk = L′_{k−1}. Similarly, for j + 1 ≤ k ≤ N + 1, l′_k = lk ≤ L_{k−1} = L′_{k−1}. It remains to check the elements from l′_{i+1} up to l′_j = l1 + w. First observe that, since i + 1 ≤ j, l′_{i+1} ≤ l′_j = l1 + w ≤ L1 + w = L′_i. Similarly, for i + 2 ≤ k ≤ j, we have that l′_k ≤ l′_j ≤ L′_i ≤ L′_{k−1}.

(j = i.) For 1 ≤ k ≤ i − 1, l′_k = l_{k+1} ≤ Lk = L′_{k−1}. Similarly, for i + 1 ≤ k ≤ N + 1, l′_k = lk ≤ L_{k−1} = L′_{k−1}. Finally, l′_i = l1 + w ≤ l_{i+1} ≤ Li = L′_{i−1}.

This completes the proof.

2
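Lemma 22 is also easy to check empirically. The following sketch (our own illustration, not part of the paper) runs Greedy on N and N + 1 identical machines and verifies that l_{i+1} ≤ L_i:

```python
def greedy_loads(jobs, n):
    """Greedy on n identical machines; returns the loads sorted
    nondecreasingly. Ties go to the lowest-indexed machine (our choice)."""
    loads = [0] * n
    for t in jobs:
        loads[loads.index(min(loads))] += t
    return sorted(loads)

def check_lemma22(jobs, n):
    """Check l_{i+1} <= L_i for all 1 <= i <= N (1-based, as in Lemma 22),
    where L and l are the sorted loads on N and N + 1 machines."""
    L = greedy_loads(jobs, n)
    l = greedy_loads(jobs, n + 1)
    return all(l[i + 1] <= L[i] for i in range(n))
```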

Lemma 23 Let Li (respectively, li) denote the load of the i-th least loaded machine when Greedy uses N (respectively, N′ > N) identical machines. It holds that Li ≤ li + l_{i+1}, for 1 ≤ i ≤ N.

Proof. We prove the lemma for N′ = N + 1, since this implies the same result for any N′ > N. For any 1 ≤ i ≤ N, Lemma 22 yields Σ_{k=1}^{i−1} Lk ≥ Σ_{k=1}^{i−1} lk and Σ_{k=i+1}^{N} Lk ≥ Σ_{k=i+1}^{N} l_{k+1}. Since the two load sequences have the same total work, this implies Li ≤ li + l_{i+1}. 2

Lemma 24 Let Li (respectively, li) denote the load of the i-th least loaded machine when Greedy uses N (respectively, N′ > N) identical machines. For any a, b, b′ such that N − b ≤ N′ − b′ it holds that

W(a, b) := Σ_{i=a}^{b} Li ≥ W′(a + b′ − b, b′) := Σ_{i=a+b′−b}^{b′} li.

Proof. Let d = N′ − N. By repeatedly applying Lemma 22 we obtain Li ≥ l_{i+d}, for 1 ≤ i ≤ N. Since b′ − b ≤ N′ − N = d, it holds that Li ≥ l_{i+d} ≥ l_{i+b′−b}, for 1 ≤ i ≤ N. This easily implies the lemma. 2

We are now in a position to prove the monotonicity of uniform. The intuition is to show that, if an agent increases her speed, then the overall work assigned to the other agents cannot increase.

Theorem 25 Algorithm uniform is monotone when restricted to divisible speeds.

Proof. Let us consider some machine j and, fixing the speeds of all the other machines, let wj(x) denote the work assigned to her on declared speed x. Also let w_{−j}(x) := (Σ_{i=1}^{m} ti) − wj(x). We will show that, for all sj < s′j, w_{−j}(s′j) ≤ w_{−j}(sj), thus implying wj(s′j) ≥ wj(sj). Since speeds are divisible, the minimal increment for a speed sj ∈ C is s′j = 2sj. We prove the monotonicity for this case only, since this implies it for any increment from sj ∈ C to s′j ∈ C. First observe that GCD(s1, s2, . . . , sj, . . . , sn) = s1. We can then distinguish two cases:

(j ≠ 1.) We first consider the case in which s′j is still the j-th smallest speed, i.e., s′j ≤ s_{j+1}. Let us observe that, in each block, every machine other than j receives the work of the same number of identical machines as in the assignment with sj. In particular, machines 1, 2, . . . , j − 1 (respectively, j + 1, j + 2, . . . , n) receive the first k = Σ_{l=1}^{j−1} sl/g (respectively, the last q = Σ_{l=j+1}^{n} sl/g) identical machines from each block. Let us denote by B(i; x) the i-th block of the allocation with the speed of machine j equal to x. Let a (respectively, a′) denote the index of the first identical machine in block B(i; sj) (respectively, B(i; s′j)). Since, for every 1 ≤ i ≤ g, B(i; sj) and B(i; s′j) contain S/g and S′/g identical machines, respectively, we have a = (i − 1)S/g + 1 and a′ = (i − 1)S′/g + 1. So, a′ − a = (i − 1)S′/g − (i − 1)S/g ≤ S′ − S.
We can thus apply Lemma 24 with b = a + k − 1 and b′ = a′ + k − 1 to show that the work of the k least loaded machines of block B(i; sj) is at least the work of the k least loaded machines in block B(i; s′j):

W(a, a + k − 1) = Σ_{i=a}^{a+k−1} Li ≥ W′(a′, a′ + k − 1) = Σ_{i=a′}^{a′+k−1} li.

Similarly, let b (respectively, b′) denote the index of the last identical machine in block B(i; sj) (respectively, B(i; s′j)). Clearly b = iS/g and b′ = iS′/g, thus implying b′ − b = i(S′ − S)/g ≤ S′ − S. Lemma 24 implies that the overall work of the last q machines in B(i; sj) is not smaller than the overall work of the last q machines in B(i; s′j). In turn, in block B(i; s′j), the overall work of all the machines other than j is at most the corresponding work in block B(i; sj). Since this holds for all 1 ≤ i ≤ g, w_{−j}(s′j) ≤ w_{−j}(sj).

Finally, if s′j > s_{j+1}, then machine j, on speed s′j, receives (the load of) an interval of s′j/g identical machines in each block whose indexes are larger than the indexes it would receive were s′j still the j-th smallest speed. Since the latter case has been considered above, also in this case w_{−j}(s′j) ≤ w_{−j}(sj).

(j = 1.) If s′1 is still the smallest speed, then let g′ := GCD(s′1, s2, . . . , sn) = s′1. Machine 1 receives the load of the first identical machine from each block. For 1 ≤ i ≤ g, the index of the first machine in block i is b(i) := (i − 1)·S/g + 1 when considering speed s1, and b′(i) := (i − 1)·S′/g′ + 1 when considering speed s′1. Since s′1 = 2s1, we have S′ = S + s1 and S′/g′ = S/(2s1) + 1/2, thus implying

b′(2i − 1) − b(i) = 2(i − 1)·(S + s1)/(2s1) − (i − 1)·S/s1 = i − 1 ≥ 0.

From Lemma 23 we obtain L_{b(i)} ≤ l_{b(i)} + l_{b(i)+1} ≤ l_{b′(2i−1)} + l_{b′(2i)}. Finally,

w1(s1) = Σ_{i=1}^{g} L_{b(i)} ≤ Σ_{i=1}^{g} (l_{b′(2i−1)} + l_{b′(2i)}) = Σ_{i=1}^{2g} l_{b′(i)} = w1(s′1). (3)

This completes the proof if s′1 is the smallest speed. Otherwise, we consider the increase of speed from s1 to s′1 in two steps: (i) from s1 to s2, with machine 1 being the first machine also with the new speed, and (ii) increasing the speed of machine 2 from s2 up to s′1. Formally, let us fix the order of the machines according to the speeds si, that is, machine k corresponds to the machine of speed sk. Let wj(x1, x2, . . . , xn) denote the work of machine j w.r.t. machine speeds ⟨x1, x2, . . . , xn⟩, with xk being the speed of machine k. W.l.o.g. let us assume machine 1 is still considered the slowest machine by uniform (otherwise, we simply reason on the machine with speed equal to s2 that the algorithm considers the slowest). From Eq. (3) we obtain

w1(s1) ≤ w1(s2) = w1(s2, s2, s3, . . . , sn) ≤ w2(s2, s2, s3, . . . , sn) (4)
≤ w2(s2, s′1, s3, . . . , sn) = w1(s′1, s2, s3, . . . , sn) = w1(s′1), (5)

where the second-last inequality follows from the hypothesis that machine 1 is still considered the slowest machine when declaring speed s′′1 = s2, and the last inequality follows from Case (j ≠ 1) above. This completes the proof. 2

One might think that simpler algorithms can simultaneously achieve monotonicity and the approximation ratio required by Definition 4. Consider an algorithm Gc1 using S identical machines organized in a single block, i.e., machine j receives the work of sj contiguous identical machines. In this case, the cost of this solution is bounded above by cost(Greedy, σ, s) + tmax, which does not suffice when speeds are not bounded from above (see the proof of Theorem 11). The same negative result also applies to the monotone algorithm employing a single block of S/s1 identical machines.

6 Monotone greedy-close algorithm for constant speeds

In this section we consider a simpler algorithm, which we call single-block, that is monotone and greedy-close for the case in which the speeds of the machines are upper-bounded by a constant M.

Algorithm single-block
Input: a job sequence σ and a speed vector s = ⟨s1, s2, · · · , sn⟩, with s1 ≤ s2 ≤ · · · ≤ sn.

1. run algorithm Greedy on job sequence σ and S = Σ_{i=1}^{n} si identical machines;

2. order the identical machines by nondecreasing load l1, . . . , lS;

3. for j = 1 to n, set kj = Σ_{l=1}^{j−1} sl;

assign the jobs of the identical machines of indexes kj + 1, kj + 2, . . . , kj + sj to machine j.

We have the following theorem.

Theorem 26 If the speeds of the machines are upper-bounded by a constant M, then algorithm single-block is M-greedy-close.

Proof. Observe that (the allocation of) algorithm single-block coincides with (the allocation of) uniform when setting g := 1 instead of g := GCD(s1, . . . , sn). Therefore, by Theorem 16, we have that

cost(single-block, σ, s) ≤ opt(σ, s) + tmax ≤ opt(σ, s) + M·tmax/s1,

and thus single-block is M-greedy-close. 2

Let us now argue about the monotonicity of single-block.

Theorem 27 Algorithm single-block is monotone.

Proof. Let us consider some machine j and, fixing the speeds of all the other machines, let wj(x) denote the work assigned to machine j when she declares speed x. Also let w_{−j}(x) := (Σ_{i=1}^{m} ti) − wj(x). We will show that, for all sj < s′j, w_{−j}(s′j) ≤ w_{−j}(sj), thus implying wj(s′j) ≥ wj(sj). Let us observe that, in the assignment with sj, every machine other than j receives the work of the same number of identical machines as in the assignment with s′j. In particular, machines 1, 2, . . . , j − 1 (respectively, j + 1, j + 2, . . . , n) receive the first k = Σ_{l=1}^{j−1} sl (respectively, the last q = Σ_{l=j+1}^{n} sl) identical machines. We can thus apply Lemma 24 with a = 1 and b = b′ = k, obtaining

W(1, k) = Σ_{i=1}^{k} Li ≥ W′(1, k) = Σ_{i=1}^{k} li,

where W(·, ·) is defined as in Lemma 24. Let S = Σ_{i=1}^{n} si and S′ = S − sj + s′j. Similarly, from Lemma 24 we obtain

W(S − q + 1, S) = Σ_{i=S−q+1}^{S} Li ≥ W′(S′ − q + 1, S′) = Σ_{i=S′−q+1}^{S′} li.

The two above inequalities easily imply w_{−j}(s′j) ≤ w_{−j}(sj). 2
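A sketch of single-block, together with a small monotonicity check, follows; as before, this is our own illustration (names and the Greedy tie-breaking are our choices), assuming integer speeds.

```python
def greedy_loads(jobs, n):
    """Greedy on n identical machines; loads sorted nondecreasingly."""
    loads = [0] * n
    for t in jobs:
        loads[loads.index(min(loads))] += t
    return sorted(loads)

def single_block(jobs, speeds):
    """Sketch of algorithm single-block: run Greedy on S = sum(speeds)
    identical machines; machine j then takes the s_j consecutive
    virtual machines starting at offset k_j = s_1 + ... + s_{j-1}."""
    loads = greedy_loads(jobs, sum(speeds))
    work, k = [], 0
    for s in speeds:
        work.append(sum(loads[k:k + s]))
        k += s
    return work
```

On jobs 3, 3, 2, 2, 1, 1 with the second speed fixed to 3, raising the first machine's declared speed from 1 to 2 raises its assigned work from 3 to 4, in accordance with Theorem 27.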

7 Polynomial-time mechanisms

Computing the payments. We make use of the following result:

Theorem 28 ([2]) A decreasing output function admits a truthful payment scheme satisfying voluntary participation if and only if ∫_0^∞ wi(b−i, u) du < ∞ for all i, b−i. In this case, we can take the payments to be

Pi(b−i, bi) = bi·wi(b−i, bi) + ∫_{bi}^∞ wi(b−i, u) du. (6)
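When the work curve is piecewise constant with finitely many breakpoints (as in the settings considered below), the integral in Eq. (6) reduces to a finite sum. The following sketch is ours: the representation of the work curve as (lo, hi, w) intervals, with w = 0 from the last breakpoint on, is a choice we make for illustration and is not prescribed by [2].

```python
def at_payment(bid, pieces):
    """Payment of Eq. (6): P_i = b_i * w_i(b_i) + integral of w_i(u)
    over [b_i, oo), for a decreasing, piecewise-constant work curve.
    `pieces` lists disjoint (lo, hi, w) intervals covering the bids,
    with w = 0 beyond the last breakpoint, so the tail is finite."""
    w_bid = next(w for lo, hi, w in pieces if lo <= bid < hi)
    tail = sum(w * (hi - max(lo, bid)) for lo, hi, w in pieces if hi > bid)
    return bid * w_bid + tail
```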

We next show how to compute the payments in Eq. (6) in polynomial time when the work curve corresponds to the allocation of PTAS-uniform.

Theorem 29 Let A be a polynomial-time r-approximation algorithm. It is possible to compute the payment functions in Eq. (6) in time poly(n, m) when (i) all speeds are integers not greater than some constant M, and (ii) speeds are divisible.

Proof. Observe that, since A is an r-approximation algorithm, there exists a value S̄ ≤ r · S, where S = Σ_{i=1}^{n} si, such that on input (s−i, S̄) the algorithm assigns all the jobs to machine i. Then, in order to compute the work curve of machine i, we only have to consider speed values in the interval [0, S̄]. Since A runs in polynomial time, if speeds are integers it is always possible to compute the work curve within time O(S̄ · poly(n, m)). When all speeds are not larger than M, we have that S̄ ∈ O(n · M) and the first part of the theorem follows. Suppose now that speeds are divisible. In this case all the speeds belong to the interval [2^{−l}, 2^l], where l is the length in bits of the input. Then there are O(log 2^l) = O(l) distinct speed values that machine i can take. So the computation of the work curve takes O(l · poly(n, m)) = O(poly(n, m)). 2

Truthful approximation mechanisms.

Theorem 30 There exists a truthful polynomial-time (2 + ǫ)-approximation mechanism for Q||Cmax restricted to integer speeds bounded above by some constant M. Moreover, the mechanism satisfies voluntary participation and the payments can be computed in polynomial time.

Proof. Consider algorithm single-block. By Theorems 26 and 27, algorithm single-block is monotone and greedy-close when machines have constant speeds. Then, by Theorem 14, there exists a (2 + ǫ)-approximation mechanism, and by Theorem 29 it is possible to compute the payments in polynomial time. Moreover, by Theorem 28 it follows that the mechanism satisfies voluntary participation. 2

Theorem 31 There exists a truthful polynomial-time (2 + ǫ)-approximation mechanism for Q||Cmax restricted to divisible speeds. Moreover, the mechanism satisfies voluntary participation and the payments can be computed in polynomial time.

Proof. By Theorems 21 and 25 it follows that in this case uniform is greedy-close and monotone. Thus, by Theorem 14, there exists a (2 + ǫ)-approximation mechanism, and by Theorem 29 it is possible to compute the payments in polynomial time. Moreover, by Theorem 28 it follows that the mechanism satisfies voluntary participation. 2

Theorem 32 For every ǫ > 0, there exists a truthful polynomial-time (4 + ǫ)-approximation mechanism for Q||Cmax. Moreover, the mechanism satisfies voluntary participation and the payments can be computed in polynomial time.

Proof. Let us modify our PTAS-uniform by rounding every input speed si to c(si) = 2^{⌊log₂ si⌋}. Let s = (s1, s2, · · · , sn) be the vector of speeds declared by the machines and let c(s) = (c(s1), c(s2), · · · , c(sn)) be the vector of the rounded speeds. Observe that if s′i > si then c(s′i) ≥ c(si). Let wi(s) denote the work of machine i according to PTAS-uniform(σ, s). The work assigned to machine i according to the modified speeds is wi(c(s)). Since by Theorems 21 and 25 PTAS-uniform

is monotone on divisible speeds, wi(c(s1), · · · , c(si), · · · , c(sn)) ≤ wi(c(s1), · · · , c(s′i), · · · , c(sn)), for all s′i > si. So, if machine i increases her speed from si to s′i she cannot decrease her work, and the modified PTAS-uniform remains monotone. As for the approximability, we observe that c(si) > si/2 and thus opt(σ, c(s)) ≤ 2opt(σ, s). Theorem 31 implies

cost(PTAS-uniform, σ, s) = cost(PTAS-uniform, σ, c(s)) ≤ (2 + ǫ)opt(σ, c(s)) ≤ 2(2 + ǫ)opt(σ, s). (7)

Finally, the payment functions can be computed as in the case of speeds that are powers of 2. 2
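The rounding step used in the proof of Theorem 32 amounts to the following one-liner (the function name is ours). Note that 2·c(s) > s, which is exactly the property giving the extra factor of 2 in the approximation ratio, and that s′ > s implies c(s′) ≥ c(s), which preserves monotonicity.

```python
def round_speed(s):
    """Round a speed s >= 1 down to a power of two: c(s) = 2^floor(log2 s)."""
    p = 1
    while 2 * p <= s:
        p *= 2
    return p
```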

8 Conclusions and open problems

We have shown a general technique to transform the standard PTAS for Q||Cmax on a constant number of machines into a monotone approximation algorithm. By the results in [2], the existence of such a monotone algorithm implies the existence of a truthful mechanism that, for every ǫ > 0, computes in polynomial time a (2 + ǫ)-approximate solution. We apply our technique to obtain an approximation algorithm that is monotone for the case of (i) speeds bounded above by some constant M and (ii) divisible speeds. We have also proved that the latter result implies the existence of a (4 + ǫ)-approximate truthful mechanism for the general case. All of our mechanisms satisfy voluntary participation and compute the payments in polynomial time. The main problem left open by this paper is whether better truthful approximation mechanisms exist.

References

[1] A. Archer, C. Papadimitriou, K. Talwar, and E. Tardos. An approximate truthful mechanism for combinatorial auctions with single parameter agents. In Proc. of the 14th SODA, 2003.

[2] A. Archer and E. Tardos. Truthful mechanisms for one-parameter agents. In Proc. of FOCS, pages 482–491, 2001.

[3] E.H. Clarke. Multipart Pricing of Public Goods. Public Choice, pages 17–33, 1971.

[4] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.

[5] R.L. Graham. Bounds for certain multiprocessing anomalies. Bell System Tech. Journal, 45:1563–1581, 1966.

[6] T. Groves. Incentives in Teams. Econometrica, 41:617–631, 1973.

[7] D.S. Hochbaum and D.B. Shmoys. Using dual approximation algorithms for scheduling problems: theoretical and practical results. Journal of the ACM, 34:144–162, 1987.

[8] E. Horowitz and S. Sahni. Exact and approximate algorithms for scheduling nonidentical processors. Journal of the ACM, 23:317–327, 1976.

[9] N. Nisan and A. Ronen. Algorithmic Mechanism Design. In Proc. of the 31st STOC, pages 129–140, 1999.

[10] N. Nisan and A. Ronen. Computationally Feasible VCG Mechanisms. In Proc. of the 2nd ACM Conference on Electronic Commerce (EC), pages 242–252, 2000.

[11] C.H. Papadimitriou. Algorithms, Games, and the Internet. In Proc. of the 33rd STOC, 2001.


[12] A. Ronen. Solving Optimization Problems Among Selfish Agents. PhD thesis, Hebrew University of Jerusalem, 2000.

[13] W. Vickrey. Counterspeculation, Auctions and Competitive Sealed Tenders. Journal of Finance, pages 8–37, 1961.

A Existing approximation algorithms are not monotone even for divisible speeds

In this section we discuss why existing approximation algorithms for Q||Cmax cannot be used in the context of selfish agents, even if speeds are restricted to be divisible. In particular, we consider the case of two machines and show that, even in this restricted setting, (natural variants of) existing algorithms are not monotone. Some of these results are already mentioned in [2]; we present them here in more detail only for the sake of completeness. In particular, we extend the analysis of the Greedy algorithm and provide some evidence that monotonicity requires that jobs be processed in non-increasing order and machine speeds be divisible. Although we are not able to prove that these two conditions are sufficient to guarantee the monotonicity of Greedy, in Section 5 we describe algorithm uniform, which is monotone for this particular case.

Greedy algorithm [5]. Let us first consider the unordered sequence of jobs σ = 1, ǫ, 1, 2 − 3ǫ, for 0 < ǫ < 1/3, to be allocated on two machines. The following table shows how Greedy allocates the jobs with respect to different speeds.

speeds (s1, s2)   machine 1      machine 2
(1, 1)            ǫ, 1           1, 2 − 3ǫ
(1, 2)            ǫ, 2 − 3ǫ      1, 1

Clearly, machine 2 gets more work when declaring speed s2 = 1 than when declaring speed s2 = 2, thus implying the non-monotonicity of Greedy when jobs are processed without ordering them.

We now consider ordered jobs but arbitrary speeds. First, consider jobs 3, 2, 2 and two machines with speeds (1, s2). For any 1 < s2 < 5/4, Greedy assigns both jobs of weight 2 to the slower machine, and thus this machine gets more work than the faster one. As for the case of jobs processed in non-decreasing order, consider weights 1, 2, 5/2 and speeds (1, 4/3). It is easy to see that machine 1 gets the jobs of weight 1 and 5/2. Observe that in both cases we are using speeds whose ratio is smaller than 2.

Rounding the instance [8]. An FPTAS for Q||Cmax is based on the following simple idea: (i) truncate a certain number of less significant digits of the input t1, t2, . . . , tm; (ii) optimally solve the truncated instance using the pseudo-polynomial-time algorithm for the problem. The first problem with this approach is again the fact that, for some instances, the slowest machine may get more work than the fastest one. Indeed, consider three jobs of weight 19 and three jobs of weight 10. When truncating the least significant digit, all jobs have weight 10. Mapping the solution back to the original weights yields the following assignments:

speeds (s1, s2)   machine 1      machine 2
(1, 1)            10, 10, 10     19, 19, 19
(1, 2)            19, 19         10, 10, 10, 19

Randomized rounding [7]. Randomized rounding is not monotone and the modified randomized rounding in [2] guarantees only the expected work to be monotone. Both these negative results are already mentioned in [2] and apply to the case of identical speeds.
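The first Greedy counterexample above can be replayed mechanically. In the sketch below, ǫ is fixed to 0.1 and ties are broken toward the faster (higher-indexed) machine; both choices are ours, made so as to reproduce the table, and are not prescribed by the paper.

```python
def greedy_related(jobs, speeds):
    """Greedy for related machines: each job goes to the machine that
    would finish it earliest (completion time = (load + t) / speed),
    breaking ties toward the higher-indexed machine. Returns the works."""
    loads = [0.0] * len(speeds)
    for t in jobs:
        i = min(range(len(speeds)),
                key=lambda k: ((loads[k] + t) / speeds[k], -k))
        loads[i] += t
    return loads

eps = 0.1
jobs = [1, eps, 1, 2 - 3 * eps]
slow = greedy_related(jobs, [1, 1])  # machine 2 declares speed 1
fast = greedy_related(jobs, [1, 2])  # machine 2 declares speed 2
# machine 2's work drops when it declares the higher speed: not monotone
```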



Dipartimento di Informatica ed Applicazioni “R.M. Capocelli”, Universit` a di Salerno, via S. Allende 2, I-84081 Baronissi (SA), Italy. E-mail: {auletta, robdep, penna, giuper}@dia.unisa.it.


of two elements: an algorithm A computing a solution and a payment function P specifying the amount of “money” the mechanism should pay to each entity. Informally speaking, each agent i has a valuation function that associates to each solution X some value vi (X) and the mechanism pays i an amount P i (X, ri ) based on the solution X and on the reported information ri . A truthful mechanism is a mechanism such that the payments guarantee that, when X = X(ri ) is the solution computed by the mechanism, ui := pi (X, ri ) + vi (X) is maximized for ri equal to the true information (see Sect. 2 for a formal definition). In other words, agent i has no incentive to lie. Recently, mechanism design has been applied to several optimization problems arising in computer science, networking and algorithmic questions related to the Internet (see [11] for a survey). Several basic problems (e.g., shortest path, minimum spanning tree, etc.) have been (re-) considered in the context of selfish agents [9, 12]: a network graph is composed by a set of weighted edges, each of them owned by an agent that privately knows the corresponding weight; this weight measures the cost for the corresponding agent in being chosen in the solution (e.g., the cost of forwarding traffic over a link). In the seminal papers by Nisan and Ronen [9, 10] (see also [12]) it is first pointed out that classical results in mechanism design theory, originated from micro economics and game theory, do not completely fit in a context where computational issues play a crucial role [10]. Also, optimization problems arise (job scheduling being one of them), for which classical existing results do not apply and new techniques are needed [9, 12]. The importance of mechanism design for job scheduling problems is twofold. On one hand, it is the first problem for which new techniques for designing truthful mechanisms have been introduced [2, 9, 12]. 
On the other hand, it is a basic problem which models important features of several allocation and routing problems in communication networks. The main purpose of this paper is to provide polynomial-time approximation truthful mechanisms for the problem of scheduling jobs on parallel related machines (Q||Cmax). The existence of efficient truthful mechanisms provides a better understanding of the loss of performance due to the interplay between the lack of cooperation (i.e., selfish agents) and the limited computational resources (i.e., the approximability of NP-hard problems). As we discuss below, these two issues have previously been addressed together only partially.

1.1

Previous Work

Truthful VCG mechanisms. The theory of mechanism design dates back to the seminal papers by Vickrey [13], Clarke [3] and Groves [6]. Their celebrated VCG mechanism is still the prominent technique to derive truthful mechanisms for many problems (e.g., shortest path, minimum spanning tree, etc.). In particular, when applied to combinatorial optimization problems (see e.g., [9, 12]), the VCG mechanisms guarantee truthfulness under the hypothesis that the optimization function is utilitarian (a maximization problem is utilitarian if the objective function can be written as the sum of the agents’ valuation functions) and the mechanism is able to compute the optimum. On the other hand, there is no restriction on the agents’ valuation functions. Feasible mechanism design. Since for many optimization problems, like scheduling, it is not possible to compute the optimum in polynomial time unless P = NP, [10] focuses on the truthfulness of so-called VCG-based mechanisms, that is, mechanisms obtained by replacing the


exact algorithm with an approximation one. The authors show that sub-optimal solutions do not work because a false declaration may improve the computed solution and therefore the utility of the agent. Non-utilitarian problems and scheduling. The second limit of VCG-based mechanisms is the fact that they only apply to utilitarian problems. Task scheduling is not utilitarian since we aim at minimizing the maximum, over all machines, of their completion times. Nisan and Ronen [9, 12] first considered the unrelated machines case and provided an n-approximation truthful mechanism for it (n is the number of machines). Rather surprisingly, this mechanism is optimal for the case n = 2. For the case n > 2, [9, 12] prove that a wide class of “natural” mechanisms cannot achieve a factor better than n if we require truthfulness. Finally, for n = 2, [9, 12] give a randomized 7/4-approximation mechanism. A simpler variant of task scheduling has been tackled in [2]: the related machines case (in short, Q||Cmax). This problem version makes it possible to express the agent valuations as the product of the work assigned to the corresponding machine times a parameter (namely, the inverse of the speed). The fundamental step made in [2] is to characterize those algorithms which can be turned into a truthful mechanism. Their beautiful result brings us back to “pure algorithmic problems”, as all we need is a good algorithm for the original problem which also satisfies the additional monotonicity requirement: increasing the speed of exactly one machine does not make the algorithm decrease the work assigned to that machine (see Sect. 2 for a formal definition). Indeed, they prove that: • If A is monotone, then there exists a payment function P_A such that M = (A, P_A) is truthful. • A mechanism M = (A, P) is truthful if and only if A is monotone. Moreover, they provide a closed formula for P_A, which depends on A only (see Theorem 28 below).
The authors then showed that the algorithm computing the (lexicographically minimal) optimal solution is monotone, thus implying that this problem admits a truthful mechanism computing the optimum. As this mechanism is not feasible unless P = NP, they also provide a randomized 3-approximation mechanism for this task scheduling version. As we also discuss in the next paragraph, in this case the mechanism is truthful in expectation, a weaker notion of truthfulness. Weaker notions of truthfulness. There is a significant difference between the definition of truthfulness used in [9, 12] and the one used in [2]. Indeed, the randomized 7/4-approximation algorithm in [9, 12] yields truth-telling as a dominant strategy for any possible random choice of the algorithm. A randomized mechanism M can be seen as a probability distribution over deterministic mechanisms: an element x is selected randomly and the corresponding mechanism Mx is used. So, the mechanism in [9, 12] is truthful for every fixed x. In [2], instead, the notion of utility is replaced by that of expected utility: even though the expected utility is maximized when telling the truth, for some x there might exist a better (untruthful!) strategy. This idea is pushed further in [1], where one-parameter agents are considered for the problem of combinatorial auctions. In that work, truthfulness is achieved w.r.t. expected utility and with high probability, that is, the probability that an untruthful declaration improves the agent’s utility is arbitrarily small.


1.2

Our Contribution

It is natural to ask whether some problems require relaxing the definition of truthfulness in order to achieve polynomial-time approximation mechanisms. In other words, whether it is necessary to make milder assumptions on the “selfishness” of the agents in order to attain good solutions in polynomial time. In this paper we investigate the existence of truthful polynomial-time approximation mechanisms for Q||Cmax, while maintaining the strongest definition of truthfulness: truth-telling is a dominant strategy over all possible strategies of an agent. We first show that, for any fixed number of machines, Q||Cmax admits a deterministic truthful (2 + ǫ)-approximation mechanism if there exists a monotone allocation algorithm Gc whose cost is within an additive factor of O(tmax/s1) of the cost of Greedy, where tmax is the largest job weight and s1 is the smallest machine speed (see Sect. 4). Our result is a modification of the classical PTAS [5] based on the computation of an optimal schedule on the h largest jobs, with the remaining smaller jobs scheduled by the Greedy algorithm. Notice that this PTAS cannot be used directly to construct a truthful mechanism: Greedy is not monotone, and the allocation produced by the combination of the two algorithms (the optimal and the greedy one) is also not monotone. Our technical contribution here is the analysis of a new algorithm, obtained by combining the optimal algorithm with Gc, that preserves monotonicity and whose cost is within a factor of 2 of the cost of the PTAS. Armed with this result, we turn our attention to the existence of such a monotone algorithm Gc. We provide a new greedy algorithm which achieves monotonicity and the above-mentioned bound w.r.t. Greedy for the following versions of the problem (see Sect. 5): • speeds are integer and the largest speed is bounded from above by a constant; • speeds are divisible, that is, they belong to a set C = {c1, c2, ..., cp, ...} such that, for each i, ci+1 is a multiple of ci. Thus, for both these cases, we obtain a family of deterministic truthful (2 + ǫ)-approximation mechanisms (see Sect. 7). Observe that all such restrictions remain NP-hard even for two machines: indeed, the identical-speeds version is NP-hard by reduction from Partition [4]. To the best of our knowledge, this is the first result in which approximate solutions yield truthful mechanisms, where truthfulness is defined in the strongest sense. The mechanism presented in [2] guarantees a 3-approximation, but, as pointed out earlier, it is randomized and truthful only in expectation. Although our new algorithm is relatively simple, its analysis, in terms of monotonicity and approximability, is far from trivial and goes through several properties of greedy allocations on identical machines. We emphasize that the importance of an approximating mechanism for the case of divisible speeds is both practical and theoretical. On one hand, in many practical applications “speeds” are not arbitrary but are taken from a pre-determined set of “types”, yielding values that are multiples of one another. Moreover, this result implies the existence, for any fixed number of machines, of deterministic truthful (4 + ǫ)-approximate mechanisms for the case of arbitrary speeds, for any ǫ > 0. The reader may wonder whether assuming divisible speeds makes the problem of designing truthful mechanisms much simpler. We show that this is not the case: also under this restriction, existing/natural approximation algorithms are not monotone, and thus not suitable for truthful mechanisms (see Appendix A).

Furthermore, our mechanisms are able to compute the payments in polynomial time (see Sect. 7), a property that cannot be directly derived from the results in [2]. Finally, our mechanisms satisfy voluntary participation, meaning that a truthfully behaving agent never incurs a loss (i.e., a negative utility).

2

Preliminaries

We consider the problem of scheduling on related parallel machines (Q||Cmax). We are given the speed vector s = ⟨s1, s2, ..., sn⟩, with s1 ≤ s2 ≤ ··· ≤ sn, of the n machines and a job sequence with weights σ = (t1, t2, ..., tm). In the sequel we simply denote the i-th job by its weight ti. The largest job weight in σ is denoted by tmax. A schedule is a mapping that associates each job to a machine. The amount of time needed to complete job j on machine i is tj/si. The work of machine i, denoted by wi, is the sum of the weights of the jobs assigned to i. The load (or finish time) of machine i is wi/si. The cost of a schedule is the maximum load over all machines, that is, its makespan. Given an algorithm A for Q||Cmax, A(σ, s) denotes the solution computed by this algorithm on input the job sequence σ and the speed vector s. The cost of the solution computed by algorithm A on input σ and s is denoted by cost(A, σ, s). We will also consider scheduling algorithms that take a parameter h as a third input. In this case we denote by A(σ, s, h) the schedule output and by cost(A, σ, s, h) its cost. We consider Q||Cmax in the context of selfish agents, in which each machine is owned by an agent and the value of si is privately known to the agent. A mechanism for this problem is a pair M = (A, P), where A is an algorithm to construct a solution and P is a payment function. In particular, the mechanism asks each agent i to report her speed and, based on the reported speeds, constructs a solution using A and pays the agents according to P = (P1, P2, ..., Pn). The profit of agent i is defined as profiti = Pi − wi/si, that is, the payment minus the cost incurred by the agent in being assigned work wi. A strategy for agent i is to declare a value bi for her speed. Let b−i denote the sequence (b1, b2, ..., bi−1, bi+1, ..., bn). A strategy bi is a dominant strategy for agent i if bi maximizes profiti for any possible b−i.
A mechanism is truthful if, for any agent i, declaring her true speed is a dominant strategy. A mechanism satisfies voluntary participation if, for any agent i, declaring her true speed yields a non-negative utility. An algorithm for the Q||Cmax problem is monotone if, given in input the machine speeds b1, b2, ..., bn, for any i and any fixed b−i, the work wi is non-decreasing in bi. Given a sequence σ of m jobs, we denote by σh the subsequence consisting of the first h jobs in σ, for any h ≤ m; moreover, σ \ σh denotes the sequence obtained by removing from σ the first h jobs. The Greedy algorithm (also known as the ListScheduling algorithm [5]) processes the jobs in the order they appear in σ and assigns job tj to the machine i minimizing (wi + tj)/si, where wi denotes the work of machine i before job tj is assigned; if more than one machine minimizes the above ratio, the one of smallest index is chosen. An optimal algorithm computes a solution of minimal cost opt(σ, s). Throughout the paper we assume that the optimal algorithm always produces the lexicographically minimal optimal assignment. As shown in [2], this algorithm is monotone. An algorithm A is a c-approximation algorithm if, for every instance (σ, s), cost(A, σ, s) ≤ c · opt(σ, s). A polynomial-time approximation scheme (PTAS) for a minimization problem is a family A of algorithms such that, for every ǫ > 0, there exists a (1 + ǫ)-approximation algorithm


Aǫ ∈ A whose running time is polynomial in the size of the input. It is well-known [5] (and we review the algorithm in Section 4) that, for each fixed number of machines, there exists a PTAS for Q||Cmax .
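As a concrete illustration of the Greedy rule defined above, the following sketch (function and variable names are ours, not the paper’s) assigns each job to the machine minimizing (wi + tj)/si, breaking ties in favor of the smallest index:

```python
def greedy(jobs, speeds):
    """Return the works w_i assigned by Greedy (ListScheduling) and the makespan."""
    works = [0.0] * len(speeds)
    for t in jobs:
        # pick the machine minimizing the completion time after adding t;
        # comparing (ratio, index) pairs breaks ties by smallest index
        i = min(range(len(speeds)), key=lambda i: ((works[i] + t) / speeds[i], i))
        works[i] += t
    makespan = max(w / s for w, s in zip(works, speeds))
    return works, makespan
```

For example, on jobs (5, 3, 2) and speeds ⟨1, 2⟩, the fast machine takes the first job, the slow machine the second, and the fast machine the third.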

3

A warm-up algorithm

In this section we present, as a warm-up, a simple monotone algorithm EZ for the case of two machines that has an approximation ratio equal to the golden ratio φ ≈ 1.618. As a consequence of the results of [2], this gives a polynomial-time φ-approximation mechanism for Q||Cmax. Algorithm EZ Input: a job sequence σ and a speed vector s = (s1, s2), s1 ≤ s2. The algorithm distinguishes two cases. Case I. s2/s1 ≤ φ. In this case we run the PTAS for two machines of identical speed and obtain a (1 + ǫ)-approximation. Let L1 and L2 be the loads of the two identical machines. Then we assign the larger of the two loads to the machine of speed s2 and the smaller of the two to the machine of speed s1. Case II. s2/s1 > φ. In this case all the jobs are assigned to the machine of speed s2. We next show that the algorithm is monotone and then prove a bound on its approximation ratio. Lemma 1 Algorithm EZ is monotone. Proof. If the slower of the two machines decreases her speed (i.e., it becomes even slower) then the following cases are possible. 1. EZ stays in Case I. In this case the machine gets the same load. 2. EZ stays in Case II. In this case the machine gets the same load, that is, 0. 3. EZ switches from Case I to Case II. In this case the load of the machine becomes 0 and thus does not increase. Thus, in no case does the load of the machine increase. If instead the slower of the two machines increases her speed (i.e., becomes faster) while remaining the slower of the two, then we can have two cases: the relation of s2/s1 with φ does not change (in which case the load remains the same) or s2/s1 becomes smaller than φ (in which case the load of the slower machine does not decrease). If instead the machine becomes the faster of the two then it will certainly be assigned at least as much load. The case in which the faster of the two machines changes her speed can be argued similarly. □ Now we give a bound on the approximation ratio of the algorithm.

Lemma 2 Algorithm EZ is a φ-approximation algorithm. Proof. Assume without loss of generality that 1 = s1 ≤ s2. In Case I, the algorithm produces an s2-approximation with s2 ≤ φ. In Case II, the algorithm outputs a (1 + 1/s2)-approximation. However, since s2 > φ, we have that 1 + 1/s2 < 1 + 1/φ = φ. □
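A minimal sketch of EZ for two machines follows. Since Case I only needs some near-optimal schedule on two identical machines, we substitute an exact enumeration of the two-machine partitions for the identical-speed PTAS (feasible only for small inputs); that substitution and all names are ours.

```python
from itertools import product

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def ez(jobs, s1, s2):
    """Sketch of algorithm EZ for speeds s1 <= s2.
    Returns (work of slow machine, work of fast machine)."""
    assert s1 <= s2
    if s2 / s1 > PHI:
        # Case II: all jobs on the fast machine
        return 0.0, float(sum(jobs))
    # Case I: schedule on two *identical* machines; exact enumeration
    # stands in for the identical-speed PTAS on small inputs
    best = None
    for mask in product([0, 1], repeat=len(jobs)):
        w = [0.0, 0.0]
        for t, b in zip(jobs, mask):
            w[b] += t
        if best is None or max(w) < max(best):
            best = w
    lo, hi = sorted(best)
    return lo, hi  # smaller work to speed s1, larger to s2
```

On speeds (1, 2) the ratio exceeds φ and everything goes to the fast machine; on identical speeds the jobs are split to minimize the larger work.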

4

Combining monotone algorithms with the optimum

In this section we show how to combine an optimal schedule on a subsequence of the jobs with the one produced by a monotone algorithm on the remaining jobs in order to obtain a good monotone approximation algorithm. Our approach is inspired by the PTAS of Graham [5], which can be described as follows. Algorithm PTAS Input: a job sequence σ, a speed vector s, and a parameter h. Assume that the jobs in σh are the h largest jobs of σ. A. compute the lexicographically minimal schedule among those that have optimal makespan with respect to job sequence σh and speed vector s; let ai be the load assigned to machine i by this phase; B. run algorithm Greedy on job sequence σ \ σh and speed vector s assuming that machine i has initial load ai, i = 1, ..., n; output the schedule that assigns to machine i the jobs assigned to machine i in Phase A and Phase B. It is well known that, for any ǫ > 0 and for any number n of machines, it is possible to choose the value of the parameter h so that PTAS outputs a schedule of makespan at most (1 + ǫ) times the optimal makespan. Unfortunately, PTAS is not monotone. Indeed, even though the first phase is monotone, it is easy to see that Greedy is not monotone [2]. Moreover, even if we replace Greedy with a monotone algorithm, the resulting algorithm is not guaranteed to be monotone. We, instead, propose the following approach. Let Gc be any scheduling algorithm. By Opt-Gc we denote the following algorithm. Algorithm Opt-Gc Input: a job sequence σ, a speed vector s, and a parameter h. Assume that the jobs in σ are ordered in non-increasing order by weight. A. compute the lexicographically minimal schedule among those that have optimal makespan with respect to job sequence σh and speed vector s; B. run algorithm Gc on job sequence σ \ σh and speed vector s assuming that machine i has initial load 0, i = 1, ..., n; output the schedule that assigns to machine i the jobs assigned to machine i in Phase A and Phase B. We have the following lemma.

Lemma 3 If Gc is monotone then Opt-Gc is also monotone. Proof. Indeed, the work received by each machine is the sum of the works computed by two monotone algorithms running on two disjoint sets of jobs. These two sets are uniquely determined by the parameter h, and do not depend on the machine speeds. □ In the next sections we show that, if Gc has an approximation factor close to the one of the greedy algorithm, then, for each ǫ > 0 and for each number n of machines, it is possible to choose the value of the parameter h so that Opt-Gc outputs a schedule of makespan at most (2 + ǫ) times the optimal one. We start by defining the notion of a greedy-close algorithm. Definition 4 (greedy-close algorithm) Let c be a constant. An algorithm Gc is c-greedy-close if, for any job sequence σ and any machine speed vector s = ⟨s1, s2, ..., sn⟩, cost(Gc, σ, s) ≤ cost(Greedy, σ, s) + c · tmax/s1. An algorithm Gc is greedy-close if it is c-greedy-close for some constant c.
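The two-phase structure underlying Lemma 3 can be sketched concretely: the h largest jobs are scheduled optimally, the rest by Gc, and the two works are summed per machine. All names are ours; the brute-force optimum is only for tiny instances, and Greedy stands in for Gc purely to make the sketch runnable (it is not monotone, as noted above).

```python
from itertools import product

def greedy_works(jobs, speeds):
    # the Greedy rule from Sect. 2, used here as a stand-in for Gc
    works = [0.0] * len(speeds)
    for t in jobs:
        i = min(range(len(speeds)), key=lambda i: ((works[i] + t) / speeds[i], i))
        works[i] += t
    return works

def optimal_works(jobs, speeds):
    # brute-force optimal schedule (exponential; illustration only)
    best, best_cost = None, float("inf")
    for assign in product(range(len(speeds)), repeat=len(jobs)):
        w = [0.0] * len(speeds)
        for t, i in zip(jobs, assign):
            w[i] += t
        cost = max(wi / si for wi, si in zip(w, speeds))
        if cost < best_cost:
            best, best_cost = w, cost
    return best

def opt_gc(jobs, speeds, h, gc=greedy_works):
    # jobs assumed sorted in non-increasing order of weight
    w_opt = optimal_works(jobs[:h], speeds)   # Phase A: optimum on h largest
    w_gc = gc(jobs[h:], speeds)               # Phase B: Gc on the rest, fresh loads
    return [a + b for a, b in zip(w_opt, w_gc)]
```

The two job sets are fixed by h alone, which is exactly why monotonicity of the two phases transfers to their sum.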

4.1

Algorithm PTAS-Gc

In this section we show how to combine a greedy-close algorithm Gc with the optimal algorithm in order to obtain a new approximation scheme PTAS-Gc. In the next section we relate the approximation factors of PTAS-Gc and Opt-Gc and show that the approximation factor of Opt-Gc is at most twice the approximation factor of PTAS-Gc. Intuitively, PTAS-Gc computes the optimal schedule on the h largest jobs and then combines it with a greedy-close solution computed using algorithm Gc. Moreover, in order to guarantee a “good” approximation, we add a balancing step (Phase B) in which jobs are assigned to non-bottleneck machines to reduce the unbalancing, while preserving the optimality of the Phase A schedule.

Algorithm PTAS-Gc
Input: a job sequence σ, a speed vector s, and a parameter h. Assume that the jobs in σh are the h largest jobs of σ.
A. compute the lexicographically minimal schedule among those that have optimal makespan with respect to job sequence σh and speed vector s; let opt(σh, s) be the makespan of the schedule produced in this phase;
B. reduce unbalancing without increasing the cost by running algorithm Greedy as long as it is possible to add jobs without exceeding opt(σh, s); let h′ be the index of the last job considered in this phase;
C. run algorithm Gc on job sequence σ \ σh′ and speed vector s assuming that machine i has initial load 0, for i = 1, ..., n;
output the schedule that assigns to machine i the jobs assigned to machine i in Phases A, B and C.

The rest of this section is devoted to the analysis of the approximation guarantee of algorithm PTAS-Gc. We consider algorithm PTAS-Greedy, which is algorithm PTAS-Gc with Gc = Greedy. We define the quantity cost̄(PTAS-Greedy, σ, s, h) as the sum of the cost of the schedule computed by PTAS-Greedy at the end of Phase B and the cost of the schedule computed by PTAS-Greedy in Phase C. More precisely, we have that
cost̄(PTAS-Greedy, σ, s, h) = opt(σh, s) + cost(Greedy, σ \ σh′, s),
where h′ is the value computed in Phase B. It is obvious that cost̄(PTAS-Greedy, σ, s, h) ≥ cost(PTAS-Greedy, σ, s, h). We next introduce a modification of Greedy that never outputs a schedule worse than the one it would have obtained by ignoring the slowest machines.

Definition 5 (greedy∗) Let Greedy∗ denote the algorithm that, on input σ and s = ⟨s1, ..., sn⟩, returns as output the best schedule among those computed by Greedy on input σ and the speed vectors ⟨0, ..., 0, sk, ..., sn⟩, for k = 1, ..., n.

Clearly, we have that cost(Greedy, σ, s) ≥ cost(Greedy∗, σ, s). The next lemma proves that cost(Greedy, σ, s) is not much bigger than cost(Greedy∗, σ, s).

Lemma 6 For any job sequence σ and for any speed vector s = ⟨s1, ..., sn⟩, we have that cost(Greedy, σ, s) ≤ cost(Greedy∗, σ, s) + tmax/s1.

Proof. We show, by contradiction, that cost(Greedy, σ, s) ≤ opt(σ, s) + tmax/s1, which implies the lemma since opt(σ, s) ≤ cost(Greedy∗, σ, s). Let us assume that

cost(Greedy, σ, s) > opt(σ, s) + tmax/s1.    (1)

Let t∗ be the weight of the last job added by Greedy to one of the bottleneck machines, and let j be the index of the machine of minimal completion time. It can be easily seen that the overall work is at least (wj/sj) Σ_{i=1}^n si, and hence opt(σ, ⟨s1, s2, ..., sn⟩) ≥ wj/sj. So, Equation (1) yields

(wj + t∗)/sj ≤ wj/sj + t∗/s1 ≤ opt(σ, ⟨s1, s2, ..., sn⟩) + tmax/s1 < cost(Greedy, σ, ⟨s1, s2, ..., sn⟩).

This contradicts the hypothesis that the job of weight t∗ is added by Greedy to a bottleneck machine (see the definition of Greedy in Sect. 2). □

To upper bound the cost of the solution computed by PTAS-Gc, for a c-greedy-close algorithm Gc, we use the quantity cost̄(Gc, σ, s, α) defined as
cost̄(Gc, σ, s, α) = cost(Greedy∗, σ, s) + (1 + c)·tmax/α.
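Definition 5 can be sketched directly: since a zero-speed machine never receives a job, running Greedy on the speed vector ⟨0, ..., 0, sk, ..., sn⟩ amounts to running it on the suffix (sk, ..., sn). The names below are ours.

```python
def greedy_cost(jobs, speeds):
    # makespan of the Greedy schedule on the given machines
    works = [0.0] * len(speeds)
    for t in jobs:
        i = min(range(len(speeds)), key=lambda i: ((works[i] + t) / speeds[i], i))
        works[i] += t
    return max(w / s for w, s in zip(works, speeds))

def greedy_star_cost(jobs, speeds):
    # best Greedy cost over the speed vectors <0,...,0,s_k,...,s_n>,
    # i.e., over all suffixes of the (sorted) speed vector
    return min(greedy_cost(jobs, speeds[k:]) for k in range(len(speeds)))
```

For instance, on jobs (1, 1, 1, 1) and speeds ⟨1, 2⟩, the full vector already gives the best Greedy makespan, so Greedy∗ returns the same cost.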

Lemma 7 For a c-greedy-close algorithm Gc, and for all job sequences σ and speed vectors s = ⟨s1, ..., sn⟩, it holds that
cost(Gc, σ, s) ≤ cost̄(Gc, σ, s, s1)
and
cost̄(Gc, σ, ⟨0, 0, ..., 0, sk, sk+1, ..., sn⟩, s1) ≤ cost̄(Gc, σ, ⟨0, 0, ..., 0, sk+1, ..., sn⟩, s1).

Proof. The second part of the lemma is obvious. For the first part we have

cost̄(Gc, σ, s, s1) = cost(Greedy∗, σ, s) + (1 + c)·tmax/s1
≥ cost(Greedy, σ, s) + c·tmax/s1    (by Lemma 6)
≥ cost(Gc, σ, s).    (by definition of greedy-close)

□

To analyze algorithm PTAS-Gc, we use the following quantity
cost̄(PTAS-Gc, σ, s, h) = opt(σh, s) + cost̄(Gc, σ \ σh′, s, s1),
where h′ is the index of the last job considered in Phase B of PTAS-Gc. By the first part of Lemma 7, we have that cost̄(PTAS-Gc, σ, s, h) ≥ cost(PTAS-Gc, σ, s, h). The next two lemmas provide an upper bound on cost̄(PTAS-Gc, σ, s, h).

Lemma 8 For any job sequence σ, any h, and any speed vector s,
cost̄(PTAS-Greedy, σ, s, h) ≤ cost(PTAS, σ, s, h) + (opt(σ, s)/(h·s1)) · (Σ_{i=1}^n si) · (n − 1),
where n is the length of the speed vector.

Proof. If h′ = m then PTAS-Greedy computes an optimal schedule and thus the lemma holds. Let us now consider the case h′ < m. Let Li denote the load of machine i at the end of Phase B of algorithm PTAS-Greedy, and let δi = Lmax − Li, where Lmax = max_{1≤i≤n} Li. Let σ′ be the sequence obtained from σ by adding, in order of decreasing weight, n dummy jobs of weight δi·si, i = 1, ..., n, just after the h′-th job (notice that some of the dummy jobs may have weight zero). Observe that δi·si ≤ th, for 1 ≤ i ≤ n. Let us now look at the execution of PTAS on input σ′, s and h. Observe that, since the h largest jobs of σ are still the h largest in σ′, they are processed in Phase A. In Phase B, PTAS considers first the jobs from th+1 to th′ and then the dummy jobs. When PTAS has processed the last dummy job, it ends up with a schedule in which all machines have finish time Lmax. Then, PTAS is left with the jobs in σ \ σh′. Therefore, we have that
cost(PTAS, σ′, s, h) = Lmax + cost(Greedy, σ \ σh′, s) = opt(σh, s) + cost(Greedy, σ \ σh′, s) = cost̄(PTAS-Greedy, σ, s, h).

Adding the dummy jobs to σ can increase the cost of any solution by at most (1/s1) Σ_{i=1}^n δi·si. Since opt(σ, s) ≥ h·th/(Σ_{i=1}^n si), we have that δi·si ≤ (opt(σ, s)/h) Σ_{i=1}^n si, and

cost̄(PTAS-Greedy, σ, s, h) = cost(PTAS, σ′, s, h)
≤ cost(PTAS, σ, s, h) + (1/s1) Σ_{i=1}^n δi·si
≤ cost(PTAS, σ, s, h) + (opt(σ, s)/(h·s1)) · (Σ_{i=1}^n si) · (n − 1),

where the last inequality is due to the fact that δi·si ≤ (opt(σ, s)/h) Σ_{i=1}^n si and that at least one dummy job must have weight (Lmax − Lmax)·si = 0. This completes the proof. □

The next lemma relates algorithm PTAS-Gc and algorithm PTAS-Greedy.

Lemma 9 If Gc is c-greedy-close, then for any job sequence σ, any h, and any speed vector s,
cost̄(PTAS-Gc, σ, s, h) ≤ cost̄(PTAS-Greedy, σ, s, h) + ((1 + c)·opt(σ, s)/(h·s1)) Σ_{i=1}^n si,
where n is the length of the speed vector.

Proof. We have the following inequalities:

cost̄(PTAS-Gc, σ, s, h) − cost̄(PTAS-Greedy, σ, s, h)
= cost̄(Gc, σ \ σh′, s, s1) − cost(Greedy, σ \ σh′, s)
≤ cost(Greedy∗, σ \ σh′, s) + (1 + c)·th/s1 − cost(Greedy, σ \ σh′, s)
≤ (1 + c)·th/s1
≤ ((1 + c)·opt(σ, s)/(h·s1)) Σ_{i=1}^n si,

where the last inequality follows from the fact that opt(σ, s) ≥ h·th/(Σ_{i=1}^n si). □

Lemma 10 [5] There exists a function f(·) such that, for any job sequence σ, any h, and any speed vector s,
cost(PTAS, σ, s, h) ≤ opt(σ, s)·(1 + f(n)/(h + 1)),
where n is the length of the speed vector.

We next provide a bound on the cost of PTAS-Gc in terms of opt(σ, s) and sn/s1.

Theorem 11 If Gc is c-greedy-close then, for any job sequence σ, any h, and any speed vector s,
cost(PTAS-Gc, σ, s, h) ≤ opt(σ, s)·(1 + ((f(n) + n^2 + c·n)/h)·(sn/s1)),
where n is the length of the speed vector.

Proof. By using Lemmata 8–10 we obtain the following chain of inequalities:

cost(PTAS-Gc, σ, s, h) ≤ cost̄(PTAS-Gc, σ, s, h)
≤ cost̄(PTAS-Greedy, σ, s, h) + (1 + c)·(opt(σ, s)/(h·s1)) Σ_{i=1}^n si    (by Lemma 9)
≤ cost(PTAS, σ, s, h) + (opt(σ, s)/(h·s1)) · (Σ_{i=1}^n si) · (n − 1) + (1 + c)·(opt(σ, s)/(h·s1)) Σ_{i=1}^n si    (by Lemma 8)
≤ opt(σ, s)·(1 + f(n)/(h + 1) + ((n + c)/(h·s1)) Σ_{i=1}^n si)    (by Lemma 10)
< opt(σ, s)·(1 + ((f(n) + n^2 + c·n)/h)·(sn/s1)),

where the last inequality follows from si ≤ sn. This completes the proof. □

The bound given by Theorem 11 is good for small values of sn/s1. When, instead, sn is much larger than s1, it might be convenient to neglect the machine with speed s1 and run PTAS-Gc only on the remaining n − 1 machines. In the next theorem we prove that in this way we can obtain a (1 + ǫ)-approximation for any value of ǫ > 0. We start with the following technical lemma.

Lemma 12 If Gc is greedy-close, then for all σ, h and s = ⟨s1, s2, ..., sn⟩,
cost(PTAS-Gc, σ, ⟨s1, s2, ..., sn⟩, h) ≤ cost(PTAS-Gc, σ, ⟨0, s2, ..., sn⟩, h).

Proof. The lemma follows from the definition of cost̄(PTAS-Gc), from Lemma 7 and from the observation that opt(σh, ⟨s1, s2, ..., sn⟩) ≤ opt(σh, ⟨0, s2, ..., sn⟩). □

We are now in a position to prove the main result of this section.

Theorem 13 For any positive integer n and for any ǫ > 0, if Gc is a polynomial-time greedy-close algorithm, then there exists an h such that, for all σ and for all speed vectors s of length n,
cost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ)·opt(σ, s).
Moreover, the running time of PTAS-Gc is polynomial in m = |σ|.

Proof. We prove by induction on n that for any ǫ > 0 there exists an h, depending on ǫ and n only, such that cost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ)·opt(σ, s). The base case n = 1 is trivial. If sn/s1 ≤ ǫ, then by Theorem 11 it is possible to pick h = h(n, ǫ) so that cost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ)·opt(σ, s). Otherwise, pick ǫ′ such that (1 + ǫ′)(1 + s1/sn) ≤ (1 + ǫ). Then, by the inductive hypothesis, it is possible to choose h′ = h′(n − 1, ǫ′) such that
cost(PTAS-Gc, σ, ⟨0, s2, ..., sn⟩, h′) ≤ (1 + ǫ′)·opt(σ, ⟨0, s2, ..., sn⟩).


Thus we have the following inequalities:

cost(PTAS-Gc, σ, ⟨s1, s2, ..., sn⟩, h′) ≤ cost(PTAS-Gc, σ, ⟨0, s2, ..., sn⟩, h′)    (by Lemma 12)
≤ (1 + ǫ′)·opt(σ, ⟨0, s2, ..., sn⟩)
≤ (1 + ǫ′)(1 + s1/sn)·opt(σ, ⟨s1, s2, ..., sn⟩)
≤ (1 + ǫ)·opt(σ, s).

Finally, the running time is O(n^{h+2} + m log m + poly(m)). □

4.2


Approximation analysis of Opt-Gc

We are now ready to give a bound on the approximation ratio guaranteed by Opt-Gc.

Theorem 14 For any positive integer n and for any ǫ > 0, if Gc is greedy-close, then there exists an h = h(n, ǫ) such that, for all job sequences σ and all speed vectors s of length n,
cost(Opt-Gc, σ, s, h) ≤ (2 + ǫ)·opt(σ, s).

Proof. Fix ǫ > 0. Let h = h(n, ǫ) be such that cost(PTAS-Gc, σ, s, h) ≤ (1 + ǫ/2)·opt(σ, s) (such an h exists by Theorem 13) and let h′ be the index of the last job scheduled during Phase B by algorithm PTAS-Gc on input σ, s, and h. Construct a new job sequence σ′ from σ by adding, just after th′, a copy of the jobs from th+1 to th′. We observe that the set of new jobs, considered independently from the rest of the sequence, can be scheduled in time opt(σ, s) (using the same schedule computed in Phase B of PTAS-Gc) and thus opt(σ′, s) ≤ 2·opt(σ, s). Moreover, we will show that PTAS-Gc on input σ′, s and h computes a schedule of cost not less than the schedule computed by Opt-Gc on input σ, s, and h. Then, we have
cost(Opt-Gc, σ, s, h) ≤ cost(PTAS-Gc, σ′, s, h) ≤ (1 + ǫ/2)·opt(σ′, s) ≤ (2 + ǫ)·opt(σ, s),
and the theorem follows. It remains to prove that the cost of the schedule produced by PTAS-Gc on input σ′, s, and h is not less than the cost of the schedule produced by Opt-Gc on σ, s, and h. Let us start from PTAS-Gc: Phase A processes the h largest jobs optimally, Phase B processes the jobs th+1, ..., th′ using Greedy, and Phase C processes the remaining jobs of σ′, which as a sequence of weights coincide with σ \ σh, using Gc. We remark that at the end of Phase B no additional job of weight greater than or equal to th′ can be processed without increasing the makespan of the schedule. Since the (h′ + 1)-st job in σ′ has weight equal to th+1 ≥ th′, we have that Phase B processes only the jobs th+1, ..., th′. Algorithm Opt-Gc, instead, consists of only two phases: Phase A processes the h largest jobs of σ optimally and Phase B processes the jobs in σ \ σh using Gc. Then, both the first and the last phases of the two algorithms produce the same schedule, since they use the same algorithm to process the same sequence of jobs. Moreover, PTAS-Gc has to process an additional set of jobs during its re-balancing phase. Then we have that cost(Opt-Gc, σ, s, h) ≤ cost(PTAS-Gc, σ′, s, h). □

5

A monotone greedy-close algorithm

In this section we describe a greedy-close algorithm that is monotone for the case of “divisible” speeds (see Def. 20 below). We present our algorithm for the case of integer divisible speeds; this is without loss of generality since, in case the divisible speeds are not integers, they can be scaled to be integers. Let us consider the following algorithm:

Algorithm uniform
Input: a job sequence σ and a speed vector s = ⟨s1, s2, ..., sn⟩, with s1 ≤ s2 ≤ ··· ≤ sn.
1. run algorithm Greedy on job sequence σ and S = Σ_{i=1}^n si identical machines;
2. order the identical machines by non-decreasing load l1, ..., lS;
3. let g := GCD(s1, s2, ..., sn) and split the identical machines into g blocks B1, ..., Bg, each consisting of S/g consecutive identical machines. For 1 ≤ i ≤ g and 1 ≤ k ≤ S/g, denote by B_i^k the k-th identical machine of the i-th block; thus identical machine B_i^k has load l_{(i−1)·S/g+k};
4. for 1 ≤ j ≤ n let kj = Σ_{l=1}^{j−1} sl; then machine j receives the load of the identical machines B_i^{kj/g+1}, ..., B_i^{kj/g+sj/g}, for each block 1 ≤ i ≤ g.

As described above, algorithm uniform does not run in polynomial time, as its running time depends on S which, in general, is not polynomially bounded in n and m. However, uniform can easily be modified so as to obtain the same allocation in polynomial time.

Theorem 15 The allocation of uniform can be computed in time O(n·m + m log m).

Proof. If S ≤ m then the theorem clearly holds. Otherwise, we have m identical machines with one job each, and all the other machines have no jobs. The solution can be computed by considering g blocks of g′ = ⌈m/g⌉ identical machines each, as follows: block g contains the machines with loads lS down to lS−g′+1, block g − 1 contains the machines with loads lS−g′ down to lS−2g′+1, and so on (the last block may include some of the machines with no load). Therefore, the Greedy algorithm runs on O(m) identical machines, Steps 1–2 take O(m log m) time, and the remaining steps require time O(n·m). □

Because of the above result, in the sequel we (improperly) denote by uniform the efficient version of the algorithm computing the same allocation within O(n·m + m log m) time.
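Steps 1–4 of uniform can be sketched as follows (the naive, non-polynomial version with S identical machines; function names are ours, and integer speeds sorted in non-decreasing order are assumed):

```python
from math import gcd
from functools import reduce

def uniform(jobs, speeds):
    """Sketch of algorithm uniform: works w_1..w_n for integer speeds
    s_1 <= ... <= s_n, via Greedy on S identical unit-speed machines."""
    n, S = len(speeds), sum(speeds)
    g = reduce(gcd, speeds)
    # Step 1: Greedy on S identical machines (least loaded, smallest index on ties)
    loads = [0] * S
    for t in jobs:
        i = loads.index(min(loads))
        loads[i] += t
    # Step 2: order identical machines by non-decreasing load
    loads.sort()
    # Steps 3-4: split into g blocks of S/g machines each; machine j takes
    # s_j/g consecutive identical machines from every block
    works, block = [0] * n, S // g
    for i in range(g):  # block i occupies loads[i*block:(i+1)*block]
        offset = 0
        for j in range(n):
            take = speeds[j] // g
            works[j] += sum(loads[i * block + offset: i * block + offset + take])
            offset += take
    return works
```

For instance, with speeds ⟨2, 4⟩ (so g = 2 and S = 6) and six unit jobs, every identical machine gets one job and the works come out proportional to the speeds.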

5.1 Approximation analysis of uniform

Let us denote by w_j^i the work of the s_j/g identical machines from block B_i whose loads are assigned to machine j. Then we have

w_j^i = Σ_{l=1}^{s_j/g} l_{(i−1)·S/g + k_j/g + l}.

Theorem 16 For any job sequence σ and any integer speed vector s = ⟨s_1, s_2, ..., s_n⟩ it holds that cost(uniform, σ, s) ≤ opt(σ, s) + t_max/g, where g = GCD(s_1, s_2, ..., s_n).

We first state some useful properties of the w_j^i.

Claim 17 For 1 ≤ i ≤ g, and for all 1 ≤ j < k ≤ n, w_j^i/s_j ≤ w_k^i/s_k.

Proof. Since the identical machines are ordered by load and agent k receives machines of index greater than those assigned to agent j, the most loaded machine assigned to j has load not greater than the least loaded machine assigned to k. As g·w_j^i/s_j is the average load of the identical machines of block i assigned to j, the claim follows. □

Claim 18 For any 1 ≤ i ≤ g − 1, w_j^{i+1}/s_j ≥ w_n^i/s_n.

Proof. We observe that the least loaded machine of block i + 1 has load not smaller than the most loaded machine of block i. The rest of the proof is similar to that of Claim 17. □

Claim 19 For any 1 ≤ i ≤ g, w_j^1 ≥ (s_j/s_n)·w_n^i − (s_j/g)·t_max.

Proof. Let us first observe that it must hold that l_S − l_1 ≤ t_max, since otherwise it would be possible to move one job from the most loaded machine to the least loaded one and reduce the maximum load; this would contradict the definition of the Greedy algorithm, which is used in uniform as a subroutine. Since machine j receives the load of s_j/g identical machines, it holds that w_j^1 ≥ (s_j/g)·l_1. Similarly, for any block i, l_S ≥ (g/s_n)·w_n^i. Putting things together we obtain

w_j^1 ≥ (s_j/g)·l_1 ≥ (s_j/g)·(l_S − t_max) ≥ (s_j/g)·(g/s_n)·w_n^i − (s_j/g)·t_max = (s_j/s_n)·w_n^i − (s_j/g)·t_max. □

Proof. (of Theorem 16) From Claims 18 and 19 we obtain, respectively,

w_j = Σ_{i=1}^g w_j^i = w_j^1 + Σ_{i=2}^g w_j^i ≥ w_j^1 + Σ_{i=1}^{g−1} (s_j/s_n)·w_n^i ≥ (s_j/s_n)·w_n^g − (s_j/g)·t_max + Σ_{i=1}^{g−1} (s_j/s_n)·w_n^i = (s_j/s_n)·w_n − (s_j/g)·t_max.   (2)

Since opt(σ) ≥ Σ_{j=1}^n w_j / S, Eq. (2) implies

w_n/s_n − opt(σ) ≤ w_n/s_n − Σ_{j=1}^n w_j / S ≤ w_n/s_n + (t_max/g)·(Σ_{j=1}^n s_j)/S − (w_n/s_n)·(Σ_{j=1}^n s_j)/S = t_max/g.

From Claim 17 we obtain

w_n/s_n = Σ_{i=1}^g w_n^i/s_n ≥ Σ_{i=1}^g w_j^i/s_j = w_j/s_j,

thus implying that cost(uniform, σ, s) = w_n/s_n ≤ opt(σ) + t_max/g. This completes the proof. □

When s_1 divides all the s_i, we have g = s_1 and the uniform algorithm is greedy-close. We now define the sequences of speeds that enjoy this property, which will be used below to prove the monotonicity of uniform.

Definition 20 (divisible speeds) Let C = {c_1, c_2, ..., c_p, ...} be such that c_i divides c_{i+1}, for every i. A speed vector s = ⟨s_1, s_2, ..., s_n⟩ is divisible if s ∈ C^n. The restriction to divisible speeds denotes the problem version in which the set C is known to the algorithm and all declared speeds must belong to C.

We remark that if the algorithm knows C, then the set of agent strategies is effectively restricted. Indeed, if an agent declares a speed s′ ∉ C, the algorithm can simply treat this speed as the largest c_i ∈ C with c_i ≤ s′. This is equivalent to the assumption that agents always declare a speed in C. We thus have the following theorem.

Theorem 21 Algorithm uniform is greedy-close when restricted to divisible speeds.
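The mapping described in the remark above can be sketched as follows (the function and parameter names are ours):

```python
def clamp_to_chain(declared, C):
    """Map a declared speed onto the divisibility chain C = (c_1 | c_2 | ...):
    a speed s' not in C is treated as the largest c_i in C with c_i <= s'."""
    candidates = [c for c in C if c <= declared]
    if not candidates:
        raise ValueError("declared speed below the smallest admissible speed")
    return max(candidates)
```

For example, on the chain C = {1, 2, 4, 8} a declared speed of 5 is treated as 4, so declaring a speed outside C can never help an agent beyond declaring the corresponding element of C.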

5.2 Algorithm uniform is monotone

In order to prove the monotonicity of algorithm uniform we first prove some technical results on greedy allocations on identical machines.

Lemma 22 Let L_i (respectively, l_i) denote the load of the i-th least loaded machine when Greedy uses N (respectively, N + 1) identical machines. It then holds that l_{i+1} ≤ L_i, for all 1 ≤ i ≤ N.

Proof. We show that the lemma holds at any iteration of Greedy, after reordering the loads of the identical machines. Let l_1, l_2, ..., l_{N+1} and L_1, L_2, ..., L_N satisfy l_{k+1} ≤ L_k, for all 1 ≤ k ≤ N. When a new job of weight w arrives, it is assigned to the least loaded machine, thus yielding the (sorted) sequences L_2, L_3, ..., L_i, L_1 + w, L_{i+1}, ..., L_N and l_2, l_3, ..., l_j, l_1 + w, l_{j+1}, ..., l_{N+1}, for some i and j. We have to prove that these two new sequences also satisfy l′_{k+1} ≤ L′_k, where l′_k and L′_k denote the k-th element of the new sequence of length N + 1 and of length N, respectively. We distinguish three cases:

(j < i.) For 1 ≤ k ≤ j − 1, l′_k = l_{k+1} ≤ L_k = L′_{k−1}. Similarly, for i + 2 ≤ k ≤ N + 1, l′_k = l_k ≤ L_{k−1} = L′_{k−1}. It thus remains to check the elements from l′_j = l_1 + w up to l′_{i+1}. First observe that l′_j ≤ l′_{j+1} = l_{j+1} ≤ L_j = L′_{j−1}, where the last equality follows from j < i. Moreover, for j + 1 ≤ k ≤ i, we have that l′_k = l_k ≤ L_{k−1} = L′_{k−2} ≤ L′_{k−1}, where the last inequality follows from k ≤ i. Finally, l′_{i+1} = l_{i+1} ≤ L_i = L′_{i−1} ≤ L′_i.

(j > i.) For 1 ≤ k ≤ i, l′_k = l_{k+1} ≤ L_k = L′_{k−1}. Similarly, for j + 1 ≤ k ≤ N + 1, l′_k = l_k ≤ L_{k−1} = L′_{k−1}. It remains to check the elements from l′_{i+1} up to l′_j = l_1 + w. First observe that, since i + 1 ≤ j, l′_{i+1} ≤ l′_j = l_1 + w ≤ L_1 + w = L′_i. Similarly, for i + 2 ≤ k ≤ j, we have that l′_k ≤ l′_j ≤ L′_i ≤ L′_{k−1}.

(j = i.) For 1 ≤ k ≤ i − 1, l′_k = l_{k+1} ≤ L_k = L′_{k−1}. Similarly, for i + 1 ≤ k ≤ N + 1, l′_k = l_k ≤ L_{k−1} = L′_{k−1}. Finally, l′_i = l_1 + w ≤ l_{i+1} ≤ L_i = L′_{i−1}.

This completes the proof. □
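Lemma 22 can be spot-checked empirically on random inputs (this is of course not a proof; the helper names, the job distribution, and the tie-breaking rule inside Greedy are our own choices):

```python
import random

def greedy_loads(jobs, N):
    """Sorted machine loads after Graham's Greedy on N identical machines."""
    loads = [0] * N
    for t in jobs:
        i = min(range(N), key=lambda k: loads[k])
        loads[i] += t
    return sorted(loads)

def check_lemma22(trials=200, N=5, m=12, seed=0):
    """Empirically verify the interleaving l_{i+1} <= L_i between the sorted
    loads on N and on N + 1 identical machines (0-indexed below)."""
    rng = random.Random(seed)
    for _ in range(trials):
        jobs = [rng.randint(1, 10) for _ in range(m)]
        L = greedy_loads(jobs, N)        # N machines
        l = greedy_loads(jobs, N + 1)    # N + 1 machines
        assert all(l[i + 1] <= L[i] for i in range(N))
    return True
```

Running `check_lemma22()` exercises the inductive invariant of the proof above on a few hundred random job sequences.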

Lemma 23 Let L_i (respectively, l_i) denote the load of the i-th least loaded machine when Greedy uses N (respectively, N′ > N) identical machines. It holds that L_i ≤ l_i + l_{i+1}, for 1 ≤ i ≤ N.

Proof. We prove the lemma for N′ = N + 1, since this implies the same result for any N′ > N. For any 1 ≤ i ≤ N, Lemma 22 (together with the fact that the loads are sorted) yields Σ_{k=1}^{i−1} L_k ≥ Σ_{k=1}^{i−1} l_k and Σ_{k=i+1}^{N} L_k ≥ Σ_{k=i+1}^{N} l_{k+1}. Since the total work is the same, Σ_{k=1}^{N} L_k = Σ_{k=1}^{N+1} l_k, this implies L_i ≤ l_i + l_{i+1}. □

Lemma 24 Let L_i (respectively, l_i) denote the load of the i-th least loaded machine when Greedy uses N (respectively, N′ > N) identical machines. For any a, b, b′ such that N − b ≤ N′ − b′ it holds that

W(a, b) := Σ_{i=a}^{b} L_i ≥ W′(a + b′ − b, b′) := Σ_{i=a+b′−b}^{b′} l_i.

Proof. Let d = N′ − N. By repeatedly applying Lemma 22 we obtain L_i ≥ l_{i+d}, for 1 ≤ i ≤ N. Since b′ − b ≤ N′ − N = d, it holds that L_i ≥ l_{i+d} ≥ l_{i+b′−b}, for 1 ≤ i ≤ N. This easily implies the lemma. □

We are now in a position to prove the monotonicity of uniform. The intuition is to show that, if an agent increases her speed, then the overall work assigned to the other agents cannot increase.

Theorem 25 Algorithm uniform is monotone when restricted to divisible speeds.

Proof. Let us consider some machine j and, fixing the speeds of all the other machines, let w_j(x) denote the work assigned to her when she declares speed x. Also let w_{−j}(x) := (Σ_{i=1}^m t_i) − w_j(x). We will show that, for all s′_j > s_j, w_{−j}(s′_j) ≤ w_{−j}(s_j), thus implying w_j(s′_j) ≥ w_j(s_j). Since speeds are divisible, the minimal increment for a speed s_j ∈ C is s′_j = 2s_j. We prove the monotonicity for this case only, since this implies it for any increment from s_j ∈ C to s′_j ∈ C. First observe that GCD(s_1, s_2, ..., s_j, ..., s_n) = s_1. We can then distinguish two cases:

(j ≠ 1.) We first consider the case in which s′_j is still the j-th smallest speed, i.e., s′_j ≤ s_{j+1}. Let us observe that, in each block, every machine other than j receives the work of the same number of identical machines as in the assignment with s_j. In particular, machines 1, 2, ..., j − 1 (respectively, j + 1, j + 2, ..., n) receive the first k = Σ_{l=1}^{j−1} s_l/g (respectively, the last q = Σ_{l=j+1}^{n} s_l/g) identical machines of each block. Let us denote by B(i; x) the i-th block of the allocation in which machine j has speed x. Let a (respectively, a′) denote the index of the first identical machine of block B(i; s_j) (respectively, B(i; s′_j)). Since, for every 1 ≤ i ≤ g, B(i; s_j) and B(i; s′_j) contain S/g and S′/g identical machines, respectively, we have a = (i−1)·S/g + 1 and a′ = (i−1)·S′/g + 1. So, a′ − a = (i−1)·S′/g − (i−1)·S/g ≤ S′ − S. We can thus apply Lemma 24 with b = a + k − 1 and b′ = a′ + k − 1 to show that the work of the k least loaded machines of block B(i; s_j) is at least the work of the k least loaded machines of block B(i; s′_j):

W(a, a + k − 1) = Σ_{i=a}^{a+k−1} L_i ≥ W′(a′, a′ + k − 1) = Σ_{i=a′}^{a′+k−1} l_i.

Similarly, let b (respectively, b′ ) denote the index of the last identical machine in block B(i; sj ) (respectively, B(i; s′j )). Clearly b = iS/g and b′ = iS ′ /g, thus implying b′ − b = i(S ′ − S)/g ≤ S ′ − S. Lemma 24 implies that the overall work of the last q machines in B(i; sj ) is not smaller than the work of the last q machines in B(i; s′j ). In turn, in block B(i; s′j ), the overall work of all machines other than j is at most the corresponding work in block B(i; sj ). Since this holds for all 1 ≤ i ≤ g, w−j (s′j ) ≤ w−j (sj ). Finally, if s′j > sj , then machine j, on speed s′j , receives (the load of) an interval of s′j /g identical machines in block i which have indexes larger than the indexes it would receive assuming s′j being the j-th smallest speed. Since the latter case has been considered before, also in his case w−j (s′j ) ≤ w−j (sj ). (j = 1.) If s′1 is still the smallest speed, than let g ′ := GCD(s′1 , s2 , . . . , sn ) = s′1 . Machine 1 receives the load of the first identical machine from each block. For 1 ≤ i ≤ g, the index of the first machine in block i, is b(i) := (i − 1) Sg + 1 when considering speed s1 , and

17

′

b′ (i) := (i − 1) Sg′ + 1 when considering speed s′1 . Since s′1 = 2s1 , we have S ′ = S + s1 and S ′ /g ′ = S/2s1 + 1/2, thus implying b′ (2i − 1) − b(i) = 2(i − 1)

S + s1 S − (i − 1) = i − 1 ≥ 0. 2s1 s1

From Lemma 23 we obtain Lb(i) ≤ lb(i) + lb(i)+1 ≤ lb′ (2i−1) + lb′ (2i) . Finally, w1 (s1 ) =

g X i=1

Lb(i) ≤

g X

lb′ (2i−1) + lb′ (2i) =

2g X

lb′ (i) = w1 (s′1 ).

(3)

i=1

i=1

This completes the proof if s′_1 is still the smallest speed. Otherwise, we consider the increase of speed from s_1 to s′_1 in two steps: (i) increase from s_1 to s_2, with machine 1 remaining the first machine also with the new speed, and (ii) increase the speed of machine 2 from s_2 up to s′_1. Formally, let us fix the order of the machines according to the speeds s_i, that is, machine k corresponds to the machine of speed s_k. Let w_j(x_1, x_2, ..., x_n) denote the work of machine j w.r.t. the machine speeds ⟨x_1, x_2, ..., x_n⟩, with x_k being the speed of machine k. W.l.o.g. we assume that machine 1 is still considered the slowest machine by uniform (otherwise, we simply reason on the machine of speed s_2 that the algorithm considers the slowest). From Eq. (3) we obtain

w_1(s_1) ≤ w_1(s_2) = w_1(s_2, s_2, s_3, ..., s_n) ≤ w_2(s_2, s_2, s_3, ..., s_n)   (4)
≤ w_2(s_2, s′_1, s_3, ..., s_n) = w_1(s′_1, s_2, s_3, ..., s_n) = w_1(s′_1),   (5)

where the second-last inequality follows from the hypothesis that machine 1 is still considered the slowest machine when declaring speed s′′_1 = s_2, and the last inequality follows from Case (j ≠ 1) above. This completes the proof. □

One might think that simpler algorithms can simultaneously achieve monotonicity and the approximation ratio of Definition 4. Consider algorithm Gc1, which uses S identical machines organized in a single block, i.e., in which machine j receives the work of s_j contiguous identical machines. In this case, the cost of the solution is bounded above by cost(Greedy, σ, s) + t_max, which does not suffice when speeds are not bounded from above (see the proof of Theorem 11). The same negative result also applies to the monotone algorithm employing a single block of S/s_1 identical machines.

6 Monotone greedy-close algorithm for constant speeds

In this section we consider a simpler algorithm, which we call single-block, that is monotone and greedy-close when the speeds of the machines are upper-bounded by a constant M.

Algorithm single-block
Input: a job sequence σ and a speed vector s = ⟨s_1, s_2, ..., s_n⟩, with s_1 ≤ s_2 ≤ ... ≤ s_n.

1. run algorithm Greedy on job sequence σ and S = Σ_{i=1}^n s_i identical machines;

2. order the identical machines by nondecreasing load l_1, ..., l_S;

3. for j = 1 to n do: set k_j = Σ_{l=1}^{j−1} s_l and assign the jobs of the identical machines of indexes k_j + 1, k_j + 2, ..., k_j + s_j to machine j.

We have the following theorem.

Theorem 26 If the speeds of the machines are upper-bounded by a constant M, then algorithm single-block is M-greedy-close.

Proof. Observe that (the allocation of) algorithm single-block coincides with (the allocation of) uniform when setting g := 1 instead of g := GCD(s_1, ..., s_n). Therefore, by Theorem 16, we have that

cost(single-block, σ, s) ≤ opt(σ, s) + t_max ≤ opt(σ, s) + M·t_max/s_1,

and thus single-block is M-greedy-close. □

Let us now argue about the monotonicity of single-block.

Theorem 27 Algorithm single-block is monotone.

Proof. Let us consider some machine j and, fixing the speeds of all the other machines, let w_j(x) denote the work assigned to machine j when she declares speed x. Also let w_{−j}(x) := (Σ_{i=1}^m t_i) − w_j(x). We will show that, for all s_j < s′_j, w_{−j}(s′_j) ≤ w_{−j}(s_j), thus implying w_j(s′_j) ≥ w_j(s_j). Observe that, in the assignment with s_j, every machine other than j receives the work of the same number of identical machines as in the assignment with s′_j. In particular, machines 1, 2, ..., j − 1 (respectively, j + 1, j + 2, ..., n) receive the first k = Σ_{l=1}^{j−1} s_l (respectively, the last q = Σ_{l=j+1}^{n} s_l) identical machines. We can thus apply Lemma 24 with a = 1 and b = b′ = k, thus obtaining

W(1, k) = Σ_{i=1}^{k} L_i ≥ W′(1, k) = Σ_{i=1}^{k} l_i,

where W(·, ·) and W′(·, ·) are defined as in Lemma 24. Let S = Σ_{i=1}^n s_i and S′ = S − s_j + s′_j. Similarly, from Lemma 24 we obtain

W(S − q + 1, S) = Σ_{i=S−q+1}^{S} L_i ≥ W′(S′ − q + 1, S′) = Σ_{i=S′−q+1}^{S′} l_i.

The two above inequalities easily imply w_{−j}(s′_j) ≤ w_{−j}(s_j). □
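The single-block allocation above admits an even shorter sketch than uniform, since no block structure is needed (as before, the function names and the tie-breaking inside Greedy are our own assumptions):

```python
def single_block(jobs, speeds):
    """Sketch of algorithm single-block: run Greedy on S = sum(s_i) identical
    machines, sort the loads, and give machine j the s_j consecutive loads
    starting at position k_j = s_1 + ... + s_{j-1}."""
    S = sum(speeds)
    loads = [0] * S
    for t in jobs:                       # step 1: Greedy on S identical machines
        i = min(range(S), key=lambda k: loads[k])
        loads[i] += t
    loads.sort()                         # step 2: nondecreasing loads
    work, k = [], 0
    for s in speeds:                     # step 3: machine j gets loads k_j+1..k_j+s_j
        work.append(sum(loads[k:k + s]))
        k += s
    return work
```

Note that this coincides with `uniform` when g = 1, exactly as observed in the proof of Theorem 26.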

7 Polynomial-time mechanisms

Computing the payments. We make use of the following result:

Theorem 28 ([2]) A decreasing output function admits a truthful payment scheme satisfying voluntary participation if and only if ∫_0^∞ w_i(b_{−i}, u) du < ∞ for all i, b_{−i}. In this case, we can take the payments to be

P_i(b_{−i}, b_i) = b_i · w_i(b_{−i}, b_i) + ∫_{b_i}^∞ w_i(b_{−i}, u) du.   (6)
We next show how to compute the payments in Eq. (6) in polynomial time when the work curve corresponds to the allocation of PTAS-uniform.

Theorem 29 Let A be a polynomial-time r-approximation algorithm. It is possible to compute the payment functions in Eq. (6) in time poly(n, m) when (i) all speeds are integers not greater than some constant M, or (ii) speeds are divisible.

Proof. Observe that, since A is an r-approximation algorithm, there exists a value S̄ ≤ r·S, where S = Σ_{i=1}^n s_i, such that on input (s_{−i}, S̄) the algorithm assigns all jobs to machine i. Then, in order to compute the work curve of machine i, we only have to consider speed values in the interval [0, S̄]. Since A runs in polynomial time, if speeds are integers it is always possible to compute the work curve within time O(S̄ · poly(n, m)). When all speeds are not larger than M, we have that S̄ ∈ O(n · M), and the first part of the theorem follows. Suppose now that speeds are divisible. In this case all speeds belong to the interval [2^{−l}, 2^l], where l is the length in bits of the input. Then, there are O(l) distinct speed values that machine i can take, so the computation of the work curve takes O(l · poly(n, m)) = O(poly(n, m)) time. □

Truthful approximation mechanisms.

Theorem 30 There exists a truthful polynomial-time (2 + ǫ)-approximation mechanism for Q||C_max restricted to integer speeds bounded above by some constant M. Moreover, the mechanism satisfies voluntary participation and the payments can be computed in polynomial time.

Proof. Consider algorithm single-block. By Theorems 26 and 27, algorithm single-block is monotone and greedy-close when machines have constant speeds. Then, by Theorem 13 there exists a (2 + ǫ)-approximation mechanism, and by Theorem 29 it is possible to compute the payments in polynomial time. Moreover, by Theorem 28 the mechanism satisfies voluntary participation.
□

Theorem 31 There exists a truthful polynomial-time (2 + ǫ)-approximation mechanism for Q||C_max restricted to divisible speeds. Moreover, the mechanism satisfies voluntary participation and the payments can be computed in polynomial time.

Proof. By Theorems 21 and 25, in this case uniform is greedy-close and monotone. Thus, by Theorem 14 there exists a (2 + ǫ)-approximation mechanism, and by Theorem 29 it is possible to compute the payments in polynomial time. Moreover, by Theorem 28 the mechanism satisfies voluntary participation. □

Theorem 32 For every ǫ > 0, there exists a truthful polynomial-time (4 + ǫ)-approximation mechanism for Q||C_max. Moreover, the mechanism satisfies voluntary participation and the payments can be computed in polynomial time.

Proof. Let us modify our PTAS-uniform by rounding every input speed s_i to c(s_i) = 2^{⌊log_2 s_i⌋}. Let s = ⟨s_1, s_2, ..., s_n⟩ be the vector of speeds declared by the machines and let c(s) = ⟨c(s_1), c(s_2), ..., c(s_n)⟩ be the vector of the rounded speeds. Observe that if s′_i > s_i then c(s′_i) ≥ c(s_i). Let w_i(s) denote the work of machine i according to PTAS-uniform(σ, s). The work assigned to machine i according to the modified speeds is w_i(c(s)). Since, by Theorems 21 and 25, PTAS-uniform is monotone on divisible speeds, w_i(c(s_1), ..., c(s_i), ..., c(s_n)) ≤ w_i(c(s_1), ..., c(s′_i), ..., c(s_n)), for all s′_i > s_i. So, if machine i increases her speed from s_i to s′_i she cannot decrease her work, and the modified PTAS-uniform remains monotone. As for the approximation, we observe that c(s_i) > s_i/2 and therefore opt(σ, c(s)) ≤ 2·opt(σ, s). Theorem 31 implies

cost(PTAS-uniform, σ, s) = cost(PTAS-uniform, σ, c(s)) ≤ (2 + ǫ)·opt(σ, c(s)) ≤ 2(2 + ǫ)·opt(σ, s).   (7)

Finally, the payment functions can be computed as in the case of speeds that are powers of 2. □
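The rounding used in the proof of Theorem 32 can be sketched as follows (we assume integer speeds ≥ 1 for simplicity; the function name is ours):

```python
def round_speed(s):
    """Round a speed s >= 1 down to c(s) = 2**floor(log2(s)).
    The rounded vector is divisible (all powers of 2) and c(s) > s / 2,
    which gives the extra factor of 2 in the approximation ratio."""
    assert s >= 1
    p = 1
    while 2 * p <= s:
        p *= 2
    return p
```

Since c is nondecreasing, a machine cannot lower its rounded speed by declaring a higher true speed, which is exactly the property used to preserve monotonicity.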

8 Conclusions and open problems

We have shown a general technique to transform the standard PTAS for Q||C_max on a constant number of machines into a monotone approximation algorithm. By the results in [2], the existence of such a monotone algorithm implies the existence of a truthful mechanism that, for every ǫ > 0, computes in polynomial time a (2 + ǫ)-approximate solution. We have applied our technique to obtain an approximation algorithm that is monotone in the case of (i) speeds bounded above by some constant M and (ii) divisible speeds. We have also proved that the latter result implies the existence of a (4 + ǫ)-approximate truthful mechanism for the general case. All of our mechanisms satisfy voluntary participation and compute the payments in polynomial time. The main problem left open by this paper is whether better truthful approximation mechanisms exist.

References

[1] A. Archer, C. Papadimitriou, K. Talwar, and E. Tardos. An approximate truthful mechanism for combinatorial auctions with single parameter agents. In Proc. of the 14th SODA, 2003.
[2] A. Archer and E. Tardos. Truthful mechanisms for one-parameter agents. In Proc. of FOCS, pages 482–491, 2001.
[3] E.H. Clarke. Multipart pricing of public goods. Public Choice, pages 17–33, 1971.
[4] M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.
[5] R.L. Graham. Bounds for certain multiprocessing anomalies. Bell System Tech. Journal, 45:1563–1581, 1966.
[6] T. Groves. Incentives in teams. Econometrica, 41:617–631, 1973.
[7] D.S. Hochbaum and D.B. Shmoys. Using dual approximation algorithms for scheduling problems: theoretical and practical results. Journal of the ACM, 34:144–162, 1987.
[8] E. Horowitz and S. Sahni. Exact and approximate algorithms for scheduling nonidentical processors. Journal of the ACM, 23:317–327, 1976.
[9] N. Nisan and A. Ronen. Algorithmic mechanism design. In Proc. of the 31st STOC, pages 129–140, 1999.
[10] N. Nisan and A. Ronen. Computationally feasible VCG mechanisms. In Proc. of the 2nd ACM Conference on Electronic Commerce (EC), pages 242–252, 2000.
[11] C.H. Papadimitriou. Algorithms, games, and the Internet. In Proc. of the 33rd STOC, 2001.
[12] A. Ronen. Solving Optimization Problems Among Selfish Agents. PhD thesis, Hebrew University of Jerusalem, 2000.
[13] W. Vickrey. Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance, pages 8–37, 1961.

A Existing approximation algorithms are not monotone even for divisible speeds

In this section we discuss why existing approximation algorithms for Q||C_max cannot be used in the context of selfish agents, even if speeds are restricted to be divisible. In particular, we consider the case of two machines and show that, even in this restricted setting, (natural variants of) existing algorithms are not monotone. Some of these results are already mentioned in [2], and we present them here in more detail only for the sake of completeness. In particular, we extend the analysis of the Greedy algorithm and provide some evidence that monotonicity requires that jobs be processed in non-increasing order and machine speeds be divisible. Although we are not able to prove that these two conditions are sufficient to guarantee the monotonicity of Greedy, in Section 5 we describe algorithm uniform, which is monotone for this particular case.

Greedy algorithm [5]. Let us first consider the non-ordered sequence of jobs σ = 1, ǫ, 1, 2 − 3ǫ, for 0 < ǫ < 1/3, to be allocated on two machines. The following table shows how Greedy allocates the jobs with respect to different speeds.

speeds (s1, s2)   machine 1       machine 2
(1, 1)            ǫ, 1            1, 2 − 3ǫ
(1, 2)            ǫ, 2 − 3ǫ       1, 1
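The first table can be reproduced mechanically. The sketch below assumes one natural reading of Greedy on related machines (assign each job to the machine that would finish it earliest) together with a tie-breaking rule toward the higher-index machine, which matches the table; both the rule and the names are our assumptions:

```python
def greedy_related(jobs, speeds):
    """Greedy for related machines: each job goes to the machine whose
    completion time after receiving it would be smallest; ties are broken
    in favor of the higher-index (here: faster) machine."""
    loads = [0.0] * len(speeds)
    assign = [[] for _ in speeds]
    for t in jobs:
        # maximize (-finish_time, index): smallest finish time, ties -> later machine
        i = max(range(len(speeds)),
                key=lambda k: (-(loads[k] + t) / speeds[k], k))
        loads[i] += t
        assign[i].append(t)
    return assign
```

With ǫ = 0.1 (so the jobs are 1, 0.1, 1, 1.7), machine 2 receives work 2.7 when declaring speed 1 but only 2 when declaring speed 2, confirming the non-monotonicity discussed next.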

Clearly, machine 2 gets more work when declaring speed s_2 = 1 than when declaring speed s_2 = 2, thus implying the non-monotonicity of Greedy when jobs are processed without ordering them. We now consider ordered jobs but arbitrary speeds. First, consider jobs 3, 2, 2 and two machines with speeds (1, s_2). For any 1 < s_2 < 5/4, Greedy assigns both jobs of weight 2 to the slower machine, which thus gets more load than the faster one. As for the case of jobs processed in non-decreasing order, consider weights 1, 2, 5/2 and speeds (1, 4/3). It is easy to see that machine 1 gets the jobs of weight 1 and 5/2. Observe that, in both cases, we are using speeds whose ratio is smaller than 2.

Rounding the instance [8]. An FPTAS for Q||C_max is based on the following simple idea: (i) truncate a certain number of the less significant digits of the input t_1, t_2, ..., t_m; (ii) optimally solve the truncated instance using the pseudo-polynomial-time algorithm for the problem. The first problem with this approach is again the fact that, for some instances, the slowest machine may get more work than the fastest one. Indeed, consider three jobs of weight 19 and three jobs of weight 10. When truncating the least significant digit, all jobs have weight 10. Mapping the solution back to the original weights yields the following assignments:

speeds (s1, s2)   machine 1       machine 2
(1, 1)            10, 10, 10      19, 19, 19
(1, 2)            19, 19          10, 10, 10, 19

Randomized rounding [7]. Randomized rounding is not monotone and the modified randomized rounding in [2] guarantees only the expected work to be monotone. Both these negative results are already mentioned in [2] and apply to the case of identical speeds.
