Dynamic Power Management of Heterogeneous Systems

Jinwoo Suh, Dong-In Kang, and Stephen P. Crago
University of Southern California/Information Sciences Institute
3811 N. Fairfax Drive, Suite 200, Arlington, VA 22203
{jsuh, dkang, crago}@isi.edu

Area: Resource Management – Power Aware

Abstract

Power management is critical to power-constrained real-time systems. In this paper, we present a dynamic power management algorithm for real-time heterogeneous systems. Unlike other approaches that focus on the tradeoff between power and performance, our algorithm maximizes power utilization and performance. Our algorithm considers a dynamic environment, allowing for changes in the available energy and adapting system parameters such as operating voltage, frequency, and the number of processors. In our algorithm, the power management problem is divided into three subproblems: i) initial power allocation to minimize wasted energy and avoid the undersupplied power situation, ii) system parameter computation based on the allocated power that maximizes the performance for a given power budget, and iii) dynamic update of the power and system parameters at run time. Simulation results of the algorithm for a satellite system using three heterogeneous processors are presented.

Keywords: dynamic power management, power-aware computing, heterogeneous system, real-time system

1. Introduction

Many embedded systems in power-constrained environments, such as satellite systems, hand-held devices, and solar-powered systems, operate using rechargeable batteries. Even though technology advances in rechargeable batteries have improved the energy storage capability, the power requirements for many mission-critical systems still far exceed the storage capacity. Therefore, efficient power management techniques are critical for the operation of these systems.

In this paper, we consider a heterogeneous system that has a rechargeable battery to store energy and an external power source. In this system, low-power design is important, and so are high power utilization and performance. The power management algorithm needs to find an initial power management schedule, and, during run time, it needs to determine system parameters and update the schedule to accommodate differences between the planned schedule and the real schedule.

Recently, hardware and software interface support for efficient power management has become available (e.g., frequency scaling, voltage scaling, and OS-level industry standards). The StrongARM processor supports frequency scaling by providing a clock frequency scaling register, but does not support on-chip dynamic voltage scaling. Some efforts have added voltage scaling capability to StrongARM-based systems using external circuits [19][25]. Crusoe chips provide both voltage and frequency scaling [7]. The Power-Aware Multiprocessor Architecture (PAMA) effort implemented a multiprocessor system that provides frequency scaling [28]. Industrial standards include Advanced Configuration and Power Interface (ACPI) [13] and OnNow [18].

Many power management techniques using this hardware and software support have been proposed. The simplest and most widely used technique for dynamic power management is the time-out method, in which components are turned off after a fixed amount of idle time.
Advanced methods use more sophisticated techniques to estimate the expected idle time in order to minimize power usage. For example, the idling history can be used to estimate the future status of a system [11][17][27]. Low-power task scheduling algorithms for uniprocessors [1][3][15][24][26][30] and multiple devices [5][16] have been proposed. In these algorithms, the service requests are clustered to make the busy time and idle time long enough to minimize state transitions and hence reduce power usage. However, these algorithms cannot save power if the input data is fed into the system continuously, since the systems are never turned off.

Stochastic control approaches are used in [1][22][23]. In these approaches, the dynamic power management problem is formulated as a policy optimization. In [1], a system is modeled as a discrete-time Markov decision process. In [22], a dynamic power management algorithm based on continuous-time Markov decision processes is proposed. In [23], the system is extended to include multiple service providers. These algorithms calculate power management policies for system components that minimize power usage for given delay constraints. Thus, if power is oversupplied, i.e., the supplied energy is more than the rechargeable battery can store, then the energy is wasted. Other related areas such as instruction-level power optimization, variable voltage techniques, and approximate signal processing are extensively surveyed in [5].

Figure 1. Dynamic power management algorithm
[Block diagram with a compile-time part and a run-time part. Labeled blocks: expected event schedules for applications 0 to L-1; expected charging schedule; surplus function computation; allowable power estimation; power-parameter computation; power-parameter table; history keeping; compute difference between actual and estimated; compute system parameters (inputs Cmax, Cmin); system parameters; application scheduling; heterogeneous system with Processor 0 to Processor n-1.]

Previous work focuses mainly on minimizing the energy consumption or maximizing the performance when the energy or power budget is given. We presented a power-aware power management technique for homogeneous systems in [28], in which the performance and power utilization of homogeneous systems are considered together and both are maximized. Here, that algorithm is extended to heterogeneous systems. In addition, the algorithm in this paper targets discrete power systems, in which the possible voltages and frequencies form a set of discrete values, as is the case for most real systems. To maximize both power utilization and performance, we first maximize the power utilization, which results in an initial power allocation schedule. Then, the performance is maximized for the given schedule.

In our algorithm, we consider both the environment and the computing resources together. Environmental factors include the dynamic nature of the energy source, the status of the energy currently available from a rechargeable battery, and the dynamic nature of the events. The parameters of the computing resource include operating voltage, frequency, and the number of active processors.

Figure 1 shows our approach. It first estimates the initial power allocation. This estimate maximizes power utilization and performance by avoiding oversupplied or undersupplied power conditions: energy is used or saved before such conditions occur. Then, based on the power allocation, the system parameters are computed. The parameters are computed at compile time and the results are stored in a table. At run time, the parameters are read from the table based on the power at the given time. Also at run time, the deviations of the externally supplied energy and the power usage from their expected values are computed dynamically and are used to recalculate the power allowance and system parameters.

The rest of the paper is organized as follows. In Section 2, the dynamic power management problem is defined. Section 3 presents our algorithm. Simulation results are shown in Section 4. Section 5 concludes the paper.

2. Problem Definition

The system considered in this paper consists of N heterogeneous processors. Each processor can operate at one of the frequencies in a set of values between fmin and fmax and can be turned off independently. The voltage supplied to each processor is one of the voltages in a set of values between vmin and vmax. The power is supplied from a rechargeable battery that is charged by an external power source with a periodic power supply schedule. For example, a signal processing system on a satellite operates using energy from solar panels mounted on the satellite, and the charging is periodic because the orbit is periodic. In this paper, we assume that the power supplied to processors can be changed only in discrete steps rather than continuously. The minimum amount of power change is denoted as Punit.

We denote the period as T. For a given time t, 0 ≤ t < T, the following are defined:

Expected charging schedule c(t): c(t) is the estimated external power that will be supplied at time t. The schedule may be derived theoretically or empirically. For example, the recorded charging power for the previous period or a weighted average of several previous periods can be used.

Expected event rate schedule u(t): The event rate is the rate of the events that initiate computation on the system. As with the expected charging schedule, u(t) can be derived using any reasonable method, such as predicting from the data of previous periods or predicting mathematically. For example, if the events are related to weather conditions, weather forecast data can be used to estimate u(t).

Weight function w(t): The weight function is a user input that is used to emphasize some portion of the period. For example, if we want to process data more intensively during commute time in a traffic monitoring system, then that portion of the period is given a higher weight value.

Maximum charging capacity Cmax: Even if externally supplied energy is available for charging, once the energy stored in the battery is equal to Cmax, no more energy can be stored. Thus, the additional supplied energy is wasted.

Minimum charging capacity Cmin: A minimum charge, Cmin, that should be maintained on the battery at all times to allow for graceful emergency processing (e.g., shut-down).

Heterogeneous processors: There are n heterogeneous processors in the system. Each of them may have different power-performance pairs.

Energy utilization: (Energy used for computation)/(Energy available) for the period T.

The goal is to compute the following parameters of the multiprocessor system to maximize the energy utilization and to maximize the performance for the given inputs.

Frequency of processors (f0, f1, …, fn-1)t: The frequency of each processor at time t.

Voltage of processors (v0, v1, …, vn-1)t: The voltage of each active processor at time t.

3. Dynamic Power Management Algorithm

In this section, our dynamic power management algorithm is described. First, the initial power allocation computation is presented. Then, the system parameter computation algorithm based on the allocated power is shown. Finally, the dynamic update of the power allocation and system parameters during run time is described.

3.1. Initial Power Allocation Computation

To compute the initial power allocation, the weighted power usage function g(t) is computed first. g(t) is the desired power allocation based on the event rate schedule and the weight function:

$$g(t) = u(t)\,w(t) \quad \text{for all } 0 \le t \le T \tag{1}$$

The surplus power function is

$$c(t) - g(t) \tag{2}$$

Integrating the surplus power function gives the energy stored in the rechargeable battery (not including externally supplied power) at time t:

$$E_{original}(t) = \int_0^t \bigl(c(v) - g(v)\bigr)\,dv \tag{3}$$
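Equations (1) to (3) can be evaluated numerically once the schedules are sampled. The following is a minimal sketch, assuming c(t), u(t), and w(t) are given as equal-length lists sampled every `dt` seconds and that the integral is approximated with the rectangle rule. The function name `stored_energy` and the sample values are illustrative and do not come from the paper.

```python
def stored_energy(c, u, w, dt=1.0):
    """Discrete version of equations (1)-(3).

    c, u, w: equal-length lists sampled every dt seconds over one period.
    Returns (g, surplus, e_original); e_original[k] approximates the
    integral of c - g from 0 to the end of sample k.
    """
    g = [ui * wi for ui, wi in zip(u, w)]          # equation (1)
    surplus = [ci - gi for ci, gi in zip(c, g)]    # equation (2)
    e_original, total = [], 0.0
    for s in surplus:                              # equation (3), rectangle rule
        total += s * dt
        e_original.append(total)
    return g, surplus, e_original


# Illustrative schedules (made-up numbers).
c = [4.0, 4.0, 0.0, 0.0]
u = [1.0, 2.0, 3.0, 1.0]
w = [1.0, 1.0, 1.0, 2.0]
g, surplus, e_original = stored_energy(c, u, w)
print(e_original)
```

In this toy schedule the stored energy rises while the charger is on and falls back to its starting value by the end of the period, which is the periodic behavior the rest of the section assumes.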

This function may exceed Cmax or be less than Cmin. If it exceeds Cmax, then the externally supplied energy cannot be charged to the battery, which wastes available energy. If it is less than Cmin, then computation may not be performed until the battery is recharged. Thus, if the energy available at time t exceeds Cmax, it is desirable to dissipate some power on useful tasks before time t to obtain better energy utilization. If the energy available at time t is less than Cmin, then energy needs to be saved by using less of it before time t. The adjustment of the power allocation can be done using the following algorithm.

Algorithm 1: Adjust power dissipation schedule
1  Choose time $t_0$ where $E_{original}(t_0) > C_{max}$ is maximum. Set this point as the starting point of a period.
2  $E_{original}(t) = \int_0^t \bigl(c(v) - g(v)\bigr)\,dv + C_{max}$;
3  $S \leftarrow \{\, t \mid \frac{dE_{original}(t)}{dt} = 0 \text{ and } (E_{original}(t) < C_{min} \text{ or } E_{original}(t) > C_{max}) \,\}$;
4  Sort S based on t in ascending order;
5  for all elements in S
6    if two consecutive elements, $t_0$ and $t_1$, satisfy $E_{original}(t_0) < C_{min}$ and $E_{original}(t_1) < C_{min}$ then
7      Remove the element that has larger $E_{original}(t)$;
8    if two consecutive elements, $t_0$ and $t_1$, satisfy $E_{original}(t_0) > C_{max}$ and $E_{original}(t_1) > C_{max}$ then
9      Remove the element that has smaller $E_{original}(t)$;
10 $t_0 \leftarrow$ the first element of S;
11 $S \leftarrow S - \{t_0\}$;
12 while S is not empty
13   $t_1 \leftarrow$ the first element of S;
14   if $E_{original}(t_0) < C_{min}$ and $E_{original}(t_1) > C_{max}$ then
15     $E_{init}(t) = \dfrac{C_{max} - C_{min}}{E_{original}(t_1) - E_{original}(t_0)}\,\bigl(E_{original}(t) - E_{original}(t_0)\bigr) + C_{min}, \quad t_0 \le t \le t_1$;
16   else if $E_{original}(t_0) > C_{max}$ and $E_{original}(t_1) < C_{min}$ then
17     $E_{init}(t) = \dfrac{C_{max} - C_{min}}{E_{original}(t_0) - E_{original}(t_1)}\,\bigl(E_{original}(t) - E_{original}(t_1)\bigr) + C_{min}, \quad t_0 \le t \le t_1$;
18   $t_0 \leftarrow t_1$;
19   $S \leftarrow S - \{t_1\}$;
20 $t_1 \leftarrow T$;
21 if $E_{original}(t_0) < C_{min}$ then
22   $E_{init}(t) = \dfrac{C_{max} - C_{min}}{E_{original}(t_1) - E_{original}(t_0)}\,\bigl(E_{original}(t) - E_{original}(t_0)\bigr) + C_{min}, \quad t_0 \le t \le t_1$;
23 else
24   $E_{init}(t) = C_{max}, \quad t_0 \le t \le t_1$;
25 $P_{init}(t) = c(t) - \dfrac{dE_{init}(t)}{dt}, \quad 0 \le t < T$
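The two numerical steps of Algorithm 1, the linear rescaling in line 15 and the power recovery in line 25, can be sketched as follows. This is a simplified illustration rather than a full implementation of the algorithm: it assumes the energy curve is sampled at unit intervals, that the indices of one valley and one peak are already known, and that the derivative is approximated with a forward difference. The helper names `rescale_segment` and `power_from_energy` are hypothetical.

```python
def rescale_segment(e_orig, i0, i1, c_min, c_max):
    """Line 15 of Algorithm 1 on samples: map e_orig[i0..i1] linearly so
    that the valley at i0 lands on c_min and the peak at i1 on c_max."""
    lo, hi = e_orig[i0], e_orig[i1]
    scale = (c_max - c_min) / (hi - lo)
    return [scale * (e - lo) + c_min for e in e_orig[i0:i1 + 1]]


def power_from_energy(c, e_init, dt=1.0):
    """Line 25: P_init = c - dE_init/dt, using a forward difference.
    c[k] is the charging power over the interval from sample k to k+1."""
    return [c[k] - (e_init[k + 1] - e_init[k]) / dt
            for k in range(len(e_init) - 1)]


# Made-up curve that dips below c_min at index 0 and exceeds c_max at index 3.
e_orig = [-2.0, 0.0, 4.0, 10.0]
e_init = rescale_segment(e_orig, 0, 3, c_min=1.0, c_max=4.0)
p_init = power_from_energy([3.0, 3.0, 3.0], e_init)
print(e_init, p_init)
```

After rescaling, the stored energy stays inside the interval from c_min to c_max, and the allocated power is whatever part of the charging power is not needed to follow that energy curve.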

In line 1, the period is set to start when the available energy is Cmax, for simplicity. In line 2, the available energy is set to Cmax. This is based on the assumption that the system starts with a fully charged battery, which makes the presentation simpler without loss of generality. In lines 3 – 9, the algorithm identifies the times at which the original energy allocation reaches a maximum or minimum. These times are used to adjust the power allocation. In lines 10 – 19, the algorithm adjusts the energy allocation by allocating more or less power based on this information. In the algorithm, the amount of stored energy depends on the original power allocation. However, other ways of adjusting can be used. For example, the power can be evenly distributed. In lines 21 – 24, the power allocation is adjusted for the rest of the period. The energy available at the end of the period is Cmax, to ensure that the periodic operation continues. Using Einit(t), Pinit(t) is calculated in line 25. The complexity of the algorithm is O(T lg T), where T is the interval.

3.2. Initial Parameter Computation

For the given initial power allocation, the system parameters need to be computed. The status of the system at time t can be denoted using (f0, …, fn-1, v0, …, vn-1)t, where fi denotes the frequency of processor i and vi denotes the voltage of processor i. An inactive processor is denoted using zero voltage or frequency. The frequency and voltage are such that fi ∈ F and vi ∈ V, where F and V are the sets of frequency values and voltage values, respectively. The number of active processors can be easily derived by counting the processors that have non-zero frequency or voltage.

The parameter computation algorithm first generates the power-parameter table and the power-performance table for each processor before run time. In our implementation, we define performance as the inverse of the execution time of a task (see Section 4), but it can be defined as any measure depending on the application. The tables are generated using the following simple algorithm. Let vmin,p be the minimum voltage for processor p, and let $h_p^{-1}(f)$ be the minimum voltage that allows processor p to operate at frequency f.

Algorithm 2 Power-parameter table and power-performance table
1 for all 0 ≤ p ≤ n-1
2   for all possible frequencies, f, for processor p
3     voltage = max(vmin,p, $h_p^{-1}(f)$);
4     Apply the voltage to processor p, set the frequency to f, and measure the performance and power;
5     Put the power and performance in the power-performance table;
6     Put the power, voltage, and frequency in the power-parameter table;
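The table-building loop of Algorithm 2 could be sketched as below. The paper measures power and performance on real hardware; here the measurement is replaced by a caller-supplied function, and the dictionary layout, the name `build_tables`, and the toy power model are all assumptions made for illustration.

```python
def build_tables(processors, measure):
    """Sketch of Algorithm 2.

    processors: list of dicts with keys 'v_min', 'freqs', and
                'min_voltage_for' (a callable standing in for h_p^{-1}).
    measure:    callable (p, voltage, freq) -> (power, performance);
                on a real board this would be an actual measurement.
    """
    power_perf = []    # per processor: list of (power, performance)
    power_param = []   # per processor: list of (power, voltage, frequency)
    for p, proc in enumerate(processors):
        perf_rows, param_rows = [], []
        for f in proc["freqs"]:
            voltage = max(proc["v_min"], proc["min_voltage_for"](f))
            power, perf = measure(p, voltage, f)
            perf_rows.append((power, perf))
            param_rows.append((power, voltage, f))
        power_perf.append(perf_rows)
        power_param.append(param_rows)
    return power_perf, power_param


# Hypothetical processor and a toy model: power ~ V^2 * f, performance ~ f.
procs = [{"v_min": 1.0, "freqs": [50, 100, 200],
          "min_voltage_for": lambda f: f / 100}]
toy_measure = lambda p, v, f: (v * v * f / 100, float(f))
power_perf, power_param = build_tables(procs, toy_measure)
print(power_param[0])
```

At run time, a lookup by allocated power in `power_param` would then yield the voltage and frequency to apply, which is the role the paper gives to the power-parameter table.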

The algorithm sets each processor to each of its frequencies and measures the power and performance. The results are stored in the corresponding tables. Then, during run time, the system simply reads parameter values from the power-parameter table based on the power allocated to each processor. Thus, the next step is computing the power allocated to each processor, which is shown in Algorithm 3. In this algorithm, for a given total power to the system, the processors and their power usage are chosen to maximize the performance. The main idea of the algorithm is to use a performance per power measure. Using this measure, we choose the pair of processor and power amount that has the highest performance per power. Then, the next best pair is chosen, and so on until all power is allocated to the processors. After each selection, the performance per power measure is recalculated, since the choice changes its value.

Algorithm 3 Determining system parameters from allowable power
1  for all 0 ≤ p ≤ n-1
2    bp ← 0;
3  for all 1 ≤ s ≤ maximum power possible in the system
4    Pavail ← s;
5    S ← (p, j) s.t. j ≤ Pavail, where p is the processor and j is the power;
6    while S is not empty
7      for all 0 ≤ p ≤ n-1, 0 ≤ i ≤ np-1
8        pppt(p, i) = ppt(p, i) / i;
9      choose (p, j) ∈ S that has the maximum pppt(p, j);
10     bp = j;
11     Pavail ← Pavail - j;
12     np = np - j;
13     Z ← (p, j);
14     for j ≤ k
15       pppt(p, k-j) = (ppt(p, k+bp) - ppt(p, j+bp)) / (k-j);
16     S ← (p, j) s.t. j ≤ Pavail; // end of while loop
17   for all 0 ≤ p ≤ n-1
18     choose maximum value of j from (p, j) ∈ Z, 0 ≤ p ≤ n-1, and put in Tparam(s);
19 for all items in Tparam
20   if power used for two s values are the same
21     remove an item that is for greater s;

Lines 1 – 3 initialize variables for the algorithm. Line 3 computes the system parameters for each power level. Lines 5 – 8 compute the performance per power for each performance-power pair. Line 9 chooses the performance-power pair that has the best performance per power. After a pair is chosen, it is moved from S, which contains the candidate pairs, to Z, which contains the chosen pairs. In lines 12 – 15, pw(u,v) and pf(u,v) are updated. Lines 17 – 21 remove the item that has more power if more than one item has the same power value. The complexity of the algorithm is $O(N^2 S^2)$, where N is the number of processors and S is the number of power steps for a processor.

Example: Let us assume that there are four processors in a system and that the power-performance values are as shown in Table 1. For simplicity, let us assume that the power is an integer value and that the total power is 6 W. The values of performance per power are shown in Table 2. The solid line in Table 2 marks the possible choices whose power is less than the allocated power, 6 W. From the table, the maximum value is for the pair (Processor 2, 1 W), so Processor 2 is chosen. Then, the available power is reduced by the 1 W that was allocated to Processor 2. The performance per power is recalculated, and the result is shown in Table 3. Note that the solid line boundary changes because of the allocated power. From Table 3, the next best choice is allocating 1 W to Processor 0. The performance per power is then recalculated as in Table 4. This is repeated until the remaining available power is zero. The final power allocations computed for Processor 0, Processor 1, Processor 2, and Processor 3 are 1, 0, 3, and 2, respectively.

Table 1. An example of power-performance table

            Power (W)
Processor    0     1     2     3     4     5     6     7     8     9    10    11
0           0.0   9.0   0.0  13.0   0.0  15.0   0.0  16.0   0.0  17.0   0.0  18.0
1           0.0   0.0   8.0   0.0   0.0  16.0   0.0   0.0  24.0   0.0   0.0  32.0
2           0.0  10.0   0.0  20.0   0.0  23.0   0.0  25.0   0.0  26.0   0.0  27.0
3           0.0   0.0  18.0   0.0   0.0  21.0   0.0   0.0  22.0   0.0   0.0  23.0

Table 2. Performance per power table

            Power (W)
Processor    0     1     2     3     4     5     6     7     8     9    10    11
0            -    9.0   0.0   4.3   0.0   3.0   0.0   2.3   0.0   1.9   0.0   1.6
1            -    0.0   4.0   0.0   0.0   3.2   0.0   0.0   3.0   0.0   0.0   2.9
2            -   10.0   0.0   6.7   0.0   4.6   0.0   3.6   0.0   2.9   0.0   2.5
3            -    0.0   8.5   0.0   0.0   4.2   0.0   0.0   2.8   0.0   0.0   2.1

Table 3. Recomputed performance per power table

            Power (W)
Processor    0     1     2     3     4     5     6     7     8     9    10    11
0            -    9.0   0.0   4.3   0.0   3.0   0.0   2.3   0.0   1.9   0.0   1.6
1            -    0.0   4.0   0.0   0.0   3.2   0.0   0.0   3.0   0.0   0.0   2.9
2            -    0.0   5.0   0.0   3.3   0.0   2.5   0.0   2.0   0.0   1.7   0.0
3            -    0.0   8.5   0.0   0.0   4.2   0.0   0.0   2.8   0.0   0.0   2.1

Table 4. Recomputed performance per power table

            Power (W)
Processor    0     1     2     3     4     5     6     7     8     9    10    11
0            -    0.0   2.0   0.0   1.5   0.0   1.2   0.0   1.0   0.0   0.9   0.0
1            -    0.0   4.0   0.0   0.0   3.2   0.0   0.0   3.0   0.0   0.0   2.9
2            -    0.0   5.0   0.0   3.3   0.0   2.5   0.0   2.0   0.0   1.7   0.0
3            -    0.0   8.5   0.0   0.0   4.2   0.0   0.0   2.8   0.0   0.0   2.1
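The worked example can be reproduced with a short greedy loop. The sketch below is a simplification of Algorithm 3 for a single power budget: it does not build Tparam for every power level, and it assumes that a 0.0 entry in Table 1 marks a power level that is not a measured operating point. The function name `greedy_allocate` is illustrative. Note that Table 2 lists 8.5 for (Processor 3, 2 W), whereas the Table 1 entry gives 18.0 / 2 = 9.0; the sketch uses the Table 1 values as printed, and the final allocation is the same with either value.

```python
# Rows are processors 0-3, columns are power levels 0-11 W (Table 1).
TABLE_1 = [
    [0.0, 9.0, 0.0, 13.0, 0.0, 15.0, 0.0, 16.0, 0.0, 17.0, 0.0, 18.0],
    [0.0, 0.0, 8.0, 0.0, 0.0, 16.0, 0.0, 0.0, 24.0, 0.0, 0.0, 32.0],
    [0.0, 10.0, 0.0, 20.0, 0.0, 23.0, 0.0, 25.0, 0.0, 26.0, 0.0, 27.0],
    [0.0, 0.0, 18.0, 0.0, 0.0, 21.0, 0.0, 0.0, 22.0, 0.0, 0.0, 23.0],
]


def greedy_allocate(ppt, budget):
    """Repeatedly give power to the (processor, extra watts) pair with the
    highest performance gain per extra watt, until the budget is spent."""
    alloc = [0] * len(ppt)
    remaining = budget
    while remaining > 0:
        best = None  # (gain per watt, processor, extra watts)
        for p, row in enumerate(ppt):
            base = row[alloc[p]]
            for extra in range(1, remaining + 1):
                k = alloc[p] + extra
                if k >= len(row) or row[k] == 0.0:
                    continue  # assumed: 0.0 means no operating point here
                ratio = (row[k] - base) / extra
                if best is None or ratio > best[0]:
                    best = (ratio, p, extra)
        if best is None or best[0] <= 0:
            break  # no remaining choice improves performance
        _, p, extra = best
        alloc[p] += extra
        remaining -= extra
    return alloc


allocation = greedy_allocate(TABLE_1, 6)
print(allocation)  # [1, 0, 3, 2]
```

The selections happen in the same order as in the example: 1 W to Processor 2, 1 W to Processor 0, 2 W to Processor 3, and then 2 more watts to Processor 2.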

The times when the parameters of the system change are determined by the following Algorithm 4.

Algorithm 4 Calculation of the time to change system parameters
1  S ← all t s.t. Pinit(t) is the power in Tparam;
2  Sort S based on time;
3  t0, t1 ← the first two elements in S;
4  while (S is not empty)
5    if Pinit(t0) ≠ Pinit(t1)
6      calculate t2 s. t.
7      T ← (t2, Tparam(Pinit(t1)));
8      t0 ← t1;
9      t1 ← next element in S;
10   else if Pinit(t0) = Pinit(t1)
11     t1 ← next element in S;

Line 1 computes the times at which Pinit equals one of the power values in the set Tparam. The system parameters then need to be changed at a time between consecutive points; these times are computed in lines 4 – 11. The complexity of the algorithm is O(T lg T).
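The idea of turning a continuous power plan into a short list of parameter-change times can be illustrated on sampled data. The sketch below is not Algorithm 4 itself, since the condition in its line 6 is not recoverable from the text. Instead it assumes a simple rule: at every sample, use the largest table power level that does not exceed Pinit, and record a change whenever that level differs from the previous one. The name `change_times` and the sample values are hypothetical.

```python
import bisect


def change_times(p_init, levels, dt=1.0):
    """Return [(time, level)] for each point where the usable power level
    from the table changes. Assumed rule: largest level <= P_init."""
    levels = sorted(levels)
    schedule, current = [], None
    for k, p in enumerate(p_init):
        idx = bisect.bisect_right(levels, p) - 1
        level = levels[idx] if idx >= 0 else 0.0   # below all levels: off
        if level != current:
            schedule.append((k * dt, level))
            current = level
    return schedule


# Made-up plan and table levels.
schedule = change_times([2.0, 2.4, 3.1, 3.0, 1.2], levels=[1.0, 2.0, 3.0])
print(schedule)
```

Only three entries are produced for five samples, so the run-time system needs to reprogram the processors three times in this example.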

3.3. Dynamic Recomputation of the Power Allocation and System Parameters

Since the actual event rate and external energy supply can change during run time, the energy consumption can vary from the initial expectation. Algorithm 5 is used to recompute the power allocation during run time. In this case, Eavail in line 1 is the real energy available at time t.

Algorithm 5 Dynamic update of the power allocation
1 $E_{diff} \leftarrow E_{init}(t) - E_{avail}(t)$;
2 if $E_{diff} > 0$ then
3   Find time w such that $P_{init}(w) = C_{max}$;
4   for all v such that t ≤ v < w
5     $P_{init}(v) = \min\!\left( E_{diff}\,\dfrac{P_{init}(v)}{\int_t^w P_{init}(v)\,dv},\; C_{max} \right)$;
6 if $E_{diff} < 0$ then
7   Find time w such that $P_{init}(w) = C_{min}$;
8   for all v such that t ≤ v < w
9     $P_{init}(v) = E_{diff}\,\dfrac{P_{init}(v)}{\int_t^w P_{init}(v)\,dv}$;

Lines 2 –5 are executed when the available energy is less than the planned energy and Lines 6 – 9 are executed when the available energy is more than the planned energy. In each case, the energy difference between the planned and available energies is distributed over the time between the current time and the time when the Pinit is Cmax or Cmin. The complexity of the algorithm is O(T).
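The proportional distribution used in lines 5 and 9 of Algorithm 5 can be sketched on sampled data as follows. The first function computes each sample's share of the energy difference, which is what the fraction in the algorithm expresses. How that share is then applied to the plan is an assumption of this sketch: a deficit lowers the planned power and a surplus raises it, clamped at zero. The names `distribute_difference` and `adjusted_plan` are illustrative, and `p_init` stands for the samples between the current time t and the time w.

```python
def distribute_difference(p_init, e_diff, dt=1.0):
    """Split e_diff (J) across samples in proportion to planned power.
    The returned shares sum to e_diff."""
    total = sum(p * dt for p in p_init)
    if total == 0:
        return [0.0] * len(p_init)
    return [e_diff * (p * dt) / total for p in p_init]


def adjusted_plan(p_init, e_diff, dt=1.0):
    """Assumption: subtract each sample's share from its planned power
    (e_diff > 0 is a deficit, e_diff < 0 a surplus); never below zero."""
    shares = distribute_difference(p_init, e_diff, dt)
    return [max(p - s / dt, 0.0) for p, s in zip(p_init, shares)]


plan = [2.0, 4.0, 2.0]                        # made-up plan in W
shares = distribute_difference(plan, 4.0)     # 4 J less than planned
deficit_plan = adjusted_plan(plan, 4.0)
surplus_plan = adjusted_plan(plan, -4.0)
print(shares, deficit_plan, surplus_plan)
```

Because the shares are proportional to the planned power, samples that were going to use more power absorb more of the difference, and the shape of the plan is preserved.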

4. Simulation Results: An Example System

We implemented a simplified Fast On-Orbit Recording of Transient Events (FORTE) [20] signal processing application that detects radio frequency events from a satellite. As an example of an LEO satellite, the weather satellite NOAA10 tracked the regions shown in Figure 2 on January 26th, 2001 at around 11:00 am. We implemented a software trigger. The software trigger performs an FFT and computes the time of arrival when the signal from the sensor triggers an analog threshold circuit. The software trigger checks whether the signal has the characteristics of an interesting RF event.

Figure 2. A screenshot of Jtrack

The simulated system is based on the parameters of the second-generation PAMA (Power-Aware Multiprocessor Architecture) board [4]. The board consists of four Hitachi SH-4 processors and two FPGAs for interconnection. The board is shown in Figure 3. The clock speed of the processors varies between 1 MHz and 200 MHz. The Linux operating system has been ported and runs on each processor. The network in the FPGA is a bi-directional crossbar network. A power measurement board is used to measure real-time power consumption. The performance at various power levels is shown as Processor 1 in Figure 4. We defined performance as the inverse of the execution time, i.e., the number of software trigger executions per second. To simulate a heterogeneous system, we used two artificial processors that have different power-performance characteristics, shown as Processor 2 and Processor 3 in Figure 4.

Using Algorithm 3, we computed the parameters for each power level, as shown in Figure 5. Processor 1 and Processor 2 show an interesting result when the total power is changed from 6 W to 7 W: the power allocated to them is reduced. This is because of the discrete nature of the system. Processor 3 has a processor-power pair, (Processor 3, 1.5W), that has a high value of performance per power. However, that pair needs 1.5 W to be selected. When the total power is 6 W, after selecting the pairs that have the highest values of performance per power, the pair (Processor 3, 1.5W) was not selected. However, another pair that needs less power, e.g., (Processor 3, 0.5W), can be selected even though its performance per power is smaller. Thus, when the power is increased to 7 W, the pair (Processor 1, 1.5W) can now be selected first because of the increased power, and the pair (Processor 3, 0.5W) is not selected.

Figure 3. PAMA-2 board

Figure 4. Power-performance values for PAMA-2 board (Processor 1: SH-4, Processor 2 and 3: Artificial) [x-axis: Power (W); y-axis: Performance; series: Processor 1, Processor 2, Processor 3]

Figure 5. Power allocation to each processor for a heterogeneous system [x-axis: Total power (W); y-axis: Power to processors (W); series: Processor 1, Processor 2, Processor 3]

Figure 6. Power-performance values for a heterogeneous system [x-axis: Power; y-axis: Performance; series: Proposed, Exhaustive]

To assess the algorithm, we implemented an exhaustive search algorithm and compared the results. For this scenario, our algorithm provides the same performance for all power values except one: when the power is 2.5, the performance was 5.1 for the proposed algorithm versus 5.2 for the exhaustive search. In another scenario that has more power steps, it also provides the performance values for most power values, as shown in Figure 7.
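An exhaustive search of this kind is easy to write for small systems. The sketch below enumerates every per-processor power assignment within the budget and keeps the one with the highest total performance. Since the simulation data behind Figure 6 are not given in the text, the sketch is run on the Table 1 values from Section 3.2 instead, and the name `exhaustive_best` is an assumption.

```python
from itertools import product

# Rows are processors 0-3, columns are power levels 0-11 W (Table 1).
TABLE_1 = [
    [0.0, 9.0, 0.0, 13.0, 0.0, 15.0, 0.0, 16.0, 0.0, 17.0, 0.0, 18.0],
    [0.0, 0.0, 8.0, 0.0, 0.0, 16.0, 0.0, 0.0, 24.0, 0.0, 0.0, 32.0],
    [0.0, 10.0, 0.0, 20.0, 0.0, 23.0, 0.0, 25.0, 0.0, 26.0, 0.0, 27.0],
    [0.0, 0.0, 18.0, 0.0, 0.0, 21.0, 0.0, 0.0, 22.0, 0.0, 0.0, 23.0],
]


def exhaustive_best(ppt, budget):
    """Try every combination of per-processor power levels whose sum fits
    the budget; return (best total performance, best allocation)."""
    best_perf, best_alloc = 0.0, tuple(0 for _ in ppt)
    ranges = [range(min(budget, len(row) - 1) + 1) for row in ppt]
    for alloc in product(*ranges):
        if sum(alloc) > budget:
            continue
        perf = sum(row[j] for row, j in zip(ppt, alloc))
        if perf > best_perf:
            best_perf, best_alloc = perf, alloc
    return best_perf, best_alloc


best_perf, best_alloc = exhaustive_best(TABLE_1, 6)
print(best_perf, best_alloc)  # 47.0 (1, 0, 3, 2)
```

For the 6 W example, the exhaustive optimum coincides with the allocation of 1, 0, 3, and 2 W reported in Section 3.2. The search visits a number of combinations that grows exponentially with the number of processors, which is why it serves only as a reference.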

Figure 7. Power-performance values for a heterogeneous system: (a) power-performance for each processor [series: Processor 1, Processor 2, Processor 3]; (b) power-performance for a system [series: Proposed, Straightforward]. [x-axis: Power; y-axis: Performance]

The run-time scenario is shown in Figure 8 through Figure 12. We assumed Cmin = 240 J, Cmax = 1800 J, and Punit = 0.5 W. Figure 8 shows the charging schedule and the use schedule. The initial power allocation and the energy available in the rechargeable battery are shown in Figure 9 and Figure 10, respectively.

Figure 8. Charging schedule and use schedule [x-axis: Time (min.); y-axis: Power (W); series: Charging schedule, Use schedule]

Figure 9. Pinit [x-axis: Time (min.); y-axis: Pinit (W)]

Figure 10. Available energy in a rechargeable battery [x-axis: Time (min.); y-axis: Available Energy (KJ)]

Two metrics are computed for comparison: wasted energy and undersupplied energy. Wasted energy is energy that was not used for useful computation. This happens when the battery is fully charged while the external source still has energy to supply. Undersupplied energy is energy that is needed for computation but is not available at that time. As shown in Figure 10, if the power usage and charge are the same as expected, then there is no overcharge or undersupply. However, in real systems, deviation from the expected scenario is unavoidable. To simulate this situation, we shifted the use schedule by various values and computed the overcharge and undersupply energy. The results are shown in Figure 11 and Figure 12.

Figure 11. Wasted energy [x-axis: Shift of charge schedule (min.); y-axis: Wasted energy (KJ); series: Proposed, Straightforward]

Figure 12. Undersupplied energy [x-axis: Shift of charge schedule (min.); y-axis labeled "Wasted energy (KJ)" in the source; series: Proposed, Straightforward]

Since the algorithms proposed here are unique in that they do not simply minimize the energy used, they cannot be compared directly with existing algorithms. However, to provide a general idea of the advantages of the proposed algorithms, we implemented a modified time-out algorithm for comparison. In the modified time-out algorithm, the history data for the previous period is used to calculate the optimal delay time before shut-down. If the externally supplied energy is more than the usage, then the difference is charged to the rechargeable battery. If more energy is used than is supplied, then the difference is drawn from the battery.
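The two metrics can be computed with a simple battery model that follows the charging rule just described. The sketch below is an assumption-laden illustration, not the simulator used for Figures 11 and 12: it assumes one-minute samples, clamps the stored energy to the interval from Cmin to Cmax, and counts everything clipped at the top as wasted and everything missing at the bottom as undersupplied. The function name, the starting charge, and the sample schedules are made up; only Cmin = 240 J and Cmax = 1800 J come from the paper.

```python
def simulate_battery(charge, use, c_min, c_max, e_start, dt=60.0):
    """Return (wasted, undersupplied) energy in joules.

    charge[k], use[k]: power in W during sample k, each dt seconds long.
    """
    energy, wasted, short = e_start, 0.0, 0.0
    for c, u in zip(charge, use):
        energy += (c - u) * dt
        if energy > c_max:        # battery full: the excess supply is lost
            wasted += energy - c_max
            energy = c_max
        elif energy < c_min:      # reserve reached: demand cannot be met
            short += c_min - energy
            energy = c_min
    return wasted, short


wasted, undersupplied = simulate_battery(
    charge=[8.0, 8.0, 0.0, 0.0],
    use=[2.0, 2.0, 6.0, 30.0],
    c_min=240.0, c_max=1800.0, e_start=1500.0,
)
print(wasted, undersupplied)
```

Shifting the `use` list by a few samples relative to `charge` and rerunning the function gives the kind of sensitivity curve that the shifted-schedule experiment reports.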

The results show that the proposed algorithm reduces the wasted energy significantly compared with the modified time-out algorithm. Also, since it allocates less power when it anticipates a situation in which energy is undersupplied, it lowers the probability of the undersupplied situation. As seen in the figures, it reduces the wasted energy and undersupplied energy most when the expected charge schedule and use schedule match the actual charge and use. The closer the actual behavior is to the planned schedules, the smaller the wasted energy and undersupplied energy.

5. Conclusions We have described a dynamic power management algorithm for heterogeneous systems. By computing the initial power allocation, our technique can minimize the wasted energy. Also, system parameters are calculated to maximize the performance for given power. We have simulated the algorithm for a satellite signal processing system based on the Power-Aware Multiprocessor Architecture (PAMA). The results show that the dynamic algorithm reduces the wasted energy significantly. It also prevents a situation where the supplied power is not enough to maintain the minimum power requirement.

6. References [1] H. Aydin, R. Melhem, D. Moss? and Pedro Mejia Alvarez ``Dynamic and Aggressive Scheduling Techniques for Power-Aware Real-Time Systems'', RTSS'01 (Real-Time Systems Symposium), London, England, Dec 2001. [2] L. Benini, A. Bogliolo, G. Paleologo, and G. De Micheli, “Policy Optimization for Dynamic Power Management,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems,” Vol. 18, No. 6, pp. 813-833, June 1999. [3] J. J. Brow, D. Z. Chen, G. W. Greenwood, X. Hu, and R. W. Taylor, “Scheduling for Power Reduction in a Real-Time System,” International Symposium on Low Power Electronics and Design, 1997. [4] S. Crago, D.-I. Kang, C. Li, J. Suh, K. McCabe, and M. Gokhale, “Power Aware Multiprocessor Architecture,” DARPA PI meeting, Annapolis, MD, November, 2000. [5] D.-I. Kang, S. Crago, and J. Suh, “Power-Aware Design Synthesis Techniques for Distributed RealTime Systems,” ACM Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES) '01, Jun. 2001. [6] L. Benini and G. De Micheli, “System-Level Power Optimization: Techniques and Tools,” International Workshop on Low Power Electronics and Design, 1999. [7] DARPA, Power Aware Computation/Communication, http://www.darpa.mil/ito/research /pacc/index.html, 2001. [8] L. Geppert and T. S. Perry, “Transmeta’s Magic Show,” IEEE Spectrum, Vol. 37, No. 5, 2000. [9] J. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2nd Edition, Morgan Kaufmann Publishers, Inc., 1996. [10] I. Hong, G. Qu, M. Pokonjak, and M. B. Srivastava, “Synthetic Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors,” IEEE Real-Time Systems Symposium, December 1998. [11] C.-H. Hwang and A. Wu, “A Predictive System Shutdown Method for Energy Saving of Eventdriven Computation,” International Conference on Computer Aided Design,” November 1997. [12] Intel, “Intel StrongARM 1100 Microprocessor Developers Manual,” August 1999. 
[13] Intel, Microsoft, and Toshiba, “Advanced Configuration and Power Interface Specification,” http://www.teleport.com/~acpi/, 2000.
[14] JTRACK, NASA, http://liftoff.msfc.nasa.gov/realtime/Jttrack/, 2001.
[15] J. R. Lorch and A. J. Smith, “Scheduling Techniques for Reducing Processor Energy Use in MacOS,” Wireless Networks, Vol. 3, No. 5, pp. 311-324, 1997.
[16] Y.-H. Lu, L. Benini, and G. De Micheli, “Low-power Task Scheduling for Multiple Devices,” International Workshop on Hardware/Software Codesign, 2000.
[17] Y. Lu and G. De Micheli, “Adaptive Hard Disk Power Management on Personal Computers,” IEEE Great Lakes Symposium on VLSI, February 1999.
[18] Microsoft, “Benefits of OnNow,” http://msdn.microsoft.com/training/offers/WINVBO_BLD/Topics/winvb00325.htm, 2000.
[19] R. Min, M. Bhardwaj, S.-H. Cho, A. Sinha, E. Shih, A. Wang, and A. Chandrakasan, “An Architecture for a Power-aware Distributed Microsensor Node,” IEEE Workshop on Signal Processing Systems, 2000.
[20] Mitsubishi Microcomputers, M32000D4BFP-80 Data Book, http://www.mitsubishichips.com/data/datasheets/mcus/mcupdf/ds/e32r80.pdf.
[21] K. R. Moore, J. F. Wilkerson, D. Call, S. Johnson, T. Payne, W. Ford, K. Spencer, and C. Baumgart, “A Space-Based Classification System for RF Transients,” International Workshop on Artificial Intelligence in Solar-Terrestrial Physics, Lund, Sweden, 1993.
[22] Q. Qiu and M. Pedram, “Dynamic Power Management Based on Continuous-Time Markov Decision Processes,” Design Automation Conference, June 1999.
[23] Q. Qiu, Q. Wu, and M. Pedram, “Dynamic Power Management of Complex Systems Using Generalized Stochastic Petri Nets,” Design Automation Conference, June 2000.
[24] G. Qu and M. Potkonjak, “Power Minimization using System Level Partitioning of Applications with Quality of Service Requirement,” International Conference on Computer Aided Design, 1999.
[25] J. Pouwelse, K. Langendoen, and H. Sips, “Dynamic Voltage Scaling on a Low-Power Microprocessor,” UbiCom-Technical Report, April 2000.
[26] Y. Shin and K. Choi, “Power Conscious Fixed Priority Scheduling for Hard Real-Time Systems,” Design Automation Conference, 1999.
[27] M. Srivastava, A. Chandrakasan, and R. Brodersen, “Predictive System Shutdown and Other Architectural Techniques for Energy Efficient Programmable Computation,” IEEE Transactions on VLSI Systems, Vol. 4, pp. 42-55, March 1996.
[28] J. Suh, D. Kang, and S. P. Crago, “Dynamic Power Management of Multiprocessor Systems,” Tenth International Workshop on Parallel and Distributed Real-Time Systems (WPDRTS) in conjunction with IPDPS, Fort Lauderdale, FL, April 2002.
[29] J. Suh, C. Li, S. P. Crago, and R. Parker, “A PIM-Based Multiprocessor System,” International Parallel and Distributed Processing Symposium, San Francisco, CA, April 2001.
[30] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for Reduced CPU Energy,” Symposium on Operating Systems Design and Implementation, 1994.

7. Acknowledgment

The authors appreciate the proofreading by Jennifer Schmidt and the help of Maya Gokhale, Scott Briles, Patrick Shriver, and Kevin McCabe at Los Alamos National Laboratory in providing FORTE information. This effort was sponsored by the Defense Advanced Research Projects Agency (DARPA) through the Air Force Research Laboratory, USAF, under agreement number F30602-00-2-0548. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory, or the U.S. Government.
