Two Iterative Metaheuristic Approaches to ... - Semantic Scholar

0 downloads 0 Views 188KB Size Report
F-56321 Lorient Cedex, France [email protected]. Abstract. Electronic embedded systems designers aim at finding a trade- off between cost and power ...
Two Iterative Metaheuristic Approaches to Dynamic Memory Allocation for Embedded Systems Mar´ıa Soto, Andr´e Rossi, and Marc Sevaux Universit´e de Bretagne-Sud, Lab-STICC, CNRS Centre de recherche B.P. 92116 F-56321 Lorient Cedex, France [email protected]

Abstract. Electronic embedded systems designers aim at finding a tradeoff between cost and power consumption. As cache memory management has been shown to have a significant impact on power consumption, this paper addresses dynamic memory allocation for embedded systems with a special emphasis on time performance. In this work, time is split into time intervals, into which the application to be implemented by the embedded system requires accessing to data structures. The proposed iterative metaheuristics aim at determining which data structure should be stored in cache memory at each time interval in order to minimize reallocation and conflict costs. These approaches take advantage of metaheuristics previously designed for a static memory allocation problem. Keywords: Memory allocation, Electronics, Metaheuristics.

1

Introduction

Advances in nanotechonolgy have made possible the design of miniaturized electronic chips which have drastically extended the features supported by embedded systems. Smart phones that can surf the WEB and process HF images are a typical example. While technology offers more and more opportunities, the design of embedded systems becomes more and more complex. In addition to market pressure, this context has favored the development of Computer Assisted Design (CAD) software, which bring a deep change in the designers’ line of work. CAD tools as Gaut [1] can generate the architecture of a circuit from its specifications, but the designs produced by CAD software usually lack optimization, which results in high power consumption, and this is of course a major drawback. Thus, designers want to find a trade-off between architecture cost (i.e. the number of memory banks in the embedded system) and its power consumption [2]. To some extent, electronics practitioners consider that minimizing power consumption is equivalent to minimizing the running time of the application to be implemented by the embedded system [3]. Moreover, the power consumption of a given application can be estimated using an empiric model as in [4], and P. Merz and J.-K. Hao (Eds.): EvoCOP 2011, LNCS 6622, pp. 250–261, 2011. c Springer-Verlag Berlin Heidelberg 2011 

Metaheuristics for Dynamic Memory Allocation

251

parallelization of data access is viewed as the main action point for minimizing execution time, hence power consumption. We study dynamic memory allocation in embedded processors, as this issue has a significant impact on execution time and on power consumption as shown by Wuytack et al. in [5]. There exist recentl approaches of software and hardware techniques [6,7] to tackle this kind of problems. In this paper we address this problem from the point of view of operations research. It is organized as follows. Section 2 provides a more detailed presentation of the problem, and Section 3 gives an integer linear program formulation. Two iterative metaheuristics are then proposed for addressing larger problem instances in Section 4. Computational results are then shown and discussed in Section 5.

2

Modeling the Problem

The main objective in embedded system is very often to implement signal processing applications efficiently (i.e. MPEG decoding, digital filter, FFT etc.). The application to be implemented is assumed to be given as a C source code, whose data structures (i.e. variables, arrays, structures) have to be loaded in the cache memory of the processor that executes it. Time is split into T time intervals whose durations may be different, those durations are assumed to be given along with the application. During each time interval, the application requires accessing a given subset of its data structures for reading and/or writing. Unlike alternative problem versions like in [8] where a static data structure allocation is searched for, the problem addressed in this paper is to find a dynamic memory allocation, i.e. the memory allocation of a data structure may vary over time. Roughly speaking, one wants the right data structure to be present in cache memory at the right time, while minimizing the efforts for updating memory mapping at each time interval. The chosen memory architecture is similar to the one of a TI C6201 device, which is composed of m memory banks (i.e. cache memory) whose capacity is cj kilo octet (ko) for all j ∈ {1, . . . , m} and an external memory (i.e. RAM memory) whose capacity is supposed to be large enough to be considered as unlimited. The external memory is referred to as memory bank m + 1. The processor requires access to data structures in order to execute the operations (or instructions) of the application. The data structure access time is expressed in milliseconds, and depends on its current allocation. If the data structure is allocated to a memory bank, its access time is equal to its size in ko because the transfer rate from a memory bank to the processor is one ko per millisecond. If it is allocated to the external memory, its access time is p ms per ko. Initially (i.e. during time interval I0 ), all data structures are in the external memory and memory banks are empty. The time required for moving a data structure form the external memory to a memory bank (and vice-versa) is v ms/ko. The time required for moving a data structure from a memory bank to

252

M. Soto, A. Rossi, and M. Sevaux

another is l ms/ko. The memory management system is equipped with a DMA (Direct Memory Access) controller that allows for a direct access to data structures. The time performances of that controller are captured with the numerical values of v and l. Moreover, the time required for moving a data structure in memory is assumed to be less than its access cost: v < p. The TI C6201 device can access all its memory bank simultaneously, which allows for parallel data loading. Thus, both data structures (or variables) a and b can be loaded in parallel when the operation a+b is to be executed by the processor, provided that a and b are allocated to two different memory banks. If these variables share the same memory bank, the processor has to access them sequentially, which requires twice more time if a and b have the same size. Two data structures are said to be conflicting whenever they are involved in the same operation in the application. Each conflict has a cost, that is equal to the number of times a and b are involved in the same operation during the current time interval. This cost might be non-integer if the application source code has been analyzed by a code-profiling software [9,10] based on the stochastic analysis of the branching probability of conditional instructions. This happens when an operation is executed within a while loop or after a conditional instruction like if or else if. A conflict between two data structures is said to be closed if both data structures are allocated to two different memory banks. In any other case, the conflict is said to be open. A data structure can be conflicting with itself: this typically happens when the data structure is an array, and the application performs an operation like a[i] = a[i+1]. However, data structures can not be split and expand over different memory banks. This problem is denoted by Dy-MemExplorer, it is to allocate a memory bank or the external memory to any data structure of the application for each time interval, so as to minimize the time spent accessing and moving data structures while satisfying the memory banks’ capacity. In this paper, the application to be implemented and its data structures are assumed to be given. In practice, a software like SoftExplorer [11] can be used for collecting the data, but the code profiling is out of the scope of this work.

3

ILP Formulation for Dy-MemExplorer Problem

Dy-MemExplorer problem is intrinsically linear, and in this section we present its integer linear formulation. Let n be the number of data structures in the application. The size of a data structure is denoted by si , for all i in {1, . . . , n}. nt is the number of data structures that the application has to access during the time interval It , for all t in {1 . . . , T }. At ⊂ {1, . . . , n} denotes the set of data structure required in the time interval It for all t ∈ {1, . . . , T }. Thus ei,t denotes the number of times that i ∈ At is accessed in the interval It . The number of conflicts in It is denoted by ot , and dk,t is the cost of conflict (k, t) = (k1 , k2 ) during the time interval It for all k in {1, . . . , ot }, k1 and k2 in At , and t in {1, . . . , T }.

Metaheuristics for Dynamic Memory Allocation

253

The allocation of data structures to memory banks (and to the external memory) for each time interval are modeled as follows. For all (i, j, t) in {1, . . . , n}× {1, . . . , m + 1}× {1, . . . , T }, xi,j,t is set to one if and only if data structure i is allocated to memory bank j during time interval It , xi,j,t = 0 otherwise. The statuses of conflicts are represented as follows. For all k in {1, . . . , ot } and t ∈ {1, . . . , T }, yk,t is set to one if and only if conflict k is closed during time interval It , otherwise yk,t = 0. The allocation change for a data structure is represented with the two following sets of variables. For all i in {1, . . . , n} and t ∈ {1, . . . , T }, wi,t is set to one if and only if the data structure i has been moved from a memory bank j = m + 1 at It−1 to a different memory bank j  = m + 1 during time interval  is set to one if and only if the It . For all i in {1, . . . , n} and t ∈ {1, . . . , T }, wi,t data structure i has been moved from a memory bank j = m + 1 at It−1 to the external memory, or if it has been moved from the external memory at It−1 to a memory bank during time interval It . The cost of executing operations in the application can be written as follows: T    m   t=1

ot      yk,t · dk,t ei,t · xi,j,t + p ei,t · xi,m+1,t −

i∈At j=1

i∈At

(1)

k=1

The first term in (1) is the access cost of all the data structures that are in a memory bank, the second term is the access cost of all the data structures allocated to the external memory, and the last one accounts for closed conflict cost. The cost of moving data structures between the intervals can be written as: nt T   t=1

 si · (l · wi,t + v · wi,t )

 (2)

i=1

 m+1  The cost of a solution is the sum of these two costs. Since i∈At j=1 ei,t ·     xi,j,t = i∈At ei,t is a constant for all t in {1, . . . , T }. The cost function to minimize is equivalent to: f=

T  



(p − 1)

t=1

ot      yk,t · dk,t + si · (l · wi,t + v · wi,t ) (3) ei,t · xi,m+1,t −

i∈At

k=1

i∈At

The ILP formulation of Dy-MemExplorer is then Minimize

f m+1 

(4) xi,j,t

=1

∀i ∈ {1, . . . , n}, ∀t ∈ {1, . . . , T }

(5)

xi,j,t si

≤ cj

∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T }

(6)

j=1 n  i∈At

xk1 ,j,t + xk2 ,j,t ≤ 2 − yk,t

∀k1 , k2 ∈ At , ∀j ∈ {1, . . . , m + 1}, ∀k ∈ {1, . . . , ot }, ∀t ∈ {1, . . . , T }

(7)

254

M. Soto, A. Rossi, and M. Sevaux xi,j,t−1 + xi,g,t ≤ 1 + wi,t

∀i ∈ {1, .., n}, ∀j = g, (j, g) ∈ {1, .., m}2 , ∀t ∈ {1, .., T }

xi,m+1,t−1 + xi,j,t ≤ 1 +

 wi,t

xi,j,t−1 + xi,m+1,t ≤ 1 +

 wi,t

(8)

∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T }

(9)

∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T }

(10)

xi,j,0

=0

∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}

(11)

xi,m+1,0

=1

∀i ∈ {1, . . . , n}

(12)

xi,j,t ∈ {0, 1}

∀i ∈ {1, . . . , n}, ∀j ∈ {1, . . . , m}, ∀t ∈ {1, . . . , T }

(13)

wi,t ∈ {0, 1}

∀i ∈ {1, . . . , n}, ∀t ∈ {1, . . . , T }

(14)

 ∈ {0, 1} wi,t

∀i ∈ {1, . . . , n}, ∀t ∈ {1, . . . , T }

(15)

yk,t ∈ {0, 1}

∀k ∈ {1, . . . , ot }, ∀t ∈ {1, . . . , T }

(16)

Equation (5) enforces that any data structure is either allocated to a memory bank or to the external memory. (6) states that the total size of the data structures allocated to any memory bank must not exceed its capacity. For all conflicts (k, t) = (k1 , k2 ), (7) ensures that variable yk,t is set appropriately. Equations (8)  . The fact that to (10) enforce the same constraints for variables wi,t and wi,t initially, all the data structures are in the external memory is enforced by (11) and (12). Finally, binary requirements are enforced by (13) − (16). This ILP formulation has been integrated in SoftExplorer. It can be solved for modest size instances using an ILP solver like Xpress-MP [12]. A simplified static version of this problem is when the application time is not split in intervals and memory banks are not subject to capacity constraints. Indeed, in that case the external memory is no longer used and the size, as well as the access cost of data structures can be ignored. This simplified static version is similar to the k-weighted graph coloring problem (see [13]). In this problem, the vertices represent data structures and edges represent conflict between a pair of structures. Colors model the allocation of memory banks to data structures. It is well known that this problem is NP -hard, and so is Dy-MemExplorer. 3.1

Example

For the sake of illustration, Dy-MemExplorer is solved on an instance originating in the LMS (Least Mean Square) dual-channel filter [14] which is a well-known signal processing algorithm. This algorithm is written in C and is to be implemented on a TI C6201 target. The compilation and code profiling of the C file yields an eight data structure instance having the same size (16384 Bytes), there are 2 memory banks whose capacity is 32768 Bytes. On that target, p = 16 (ms), and l = v = 1 (ms/ko). The conflicts (1;5), (2;6), (3;5) and (4;6) with cost conflict of 4095 are involved at intervals 1, 2, 3 and 4 respectively. The cost conflict for the remaining conflicts is 1. The conflicts (1;5) and (1;7) are involved

Metaheuristics for Dynamic Memory Allocation

255

in the interval 5. The interval 6 involves the conflicts (2;8) and (2;6); and the last interval involves the conflicts (3;3) and (4;4). Data structure 3 is required by the application during time interval I3 . In an optimal solution found by Xpress-MP [12], the data structures 1 and 3 are swapped for avoiding to access data structure 3 from the external memory. Similarly, during time interval I4 , data structures 4 and 5 are swapped, then there is no allocation change during the last time intervals. The data structures 7 and 8 are in the external memory during all the application lifetime because their moving cost is larger than their access cost. The cost of this solution is 147537 milliseconds.

4 4.1

Iterative Metaheuristic Approaches Long-Term Approach

This approach takes into account the application’s requirements for the current and future time intervals. The Long-term approach relies on addressing a memory allocation sub-problem called MemExplorer. This sub-problem is to search for a static memory allocation of data structures that could remain valid from the current time interval to the end of the last one. In this subproblem, the fact that the allocation of data structures can change at each time interval is ignored. MemExplorer is addressed for all time intervals It , for all t ∈ {1, . . . , T }. The data and variables of this subproblem are the same as for Dy-MemExplorer, but index t is removed. MemExplorer is then to find a memory allocation for data structures such that the time spent accessing these data is minimized, for a given number of capacitated memory banks and an external memory. MemExplorer is addressed using a Variable Neighborhood Search-based approach hybridized with a Tabu Search-inspired method in [8]. This algorithm relies on two neighborhoods: the first one is generated by performing a feasible allocation (respecting the capacity constrains to change allocation of a single data structure); and the other neighborhood explores solutions that are beyond the first one by allowing for unfeasible solutions before repairing them. The tabu search for MemExplorer is based on TabuCol, an algorithm for graph coloring introduced in [15]. The main difference with a classic tabu search is that the size of the tabu list is not constant over time. This idea is introduced in [16] and also used in the work of Porumbel, Hao and Kuntz on the graph coloring problem [17]. In TabuMemex, the size of the tabu list N T is set to a + N T max × t every N T max iterations, where a is a fixed integer and t is a random number in [0, 2]. The Long-term approach builds a solution iteratively, i.e. from time interval I1 to time interval IT . At each time interval, it builds a preliminary solution called the parent solution. Then, the solution for the considered time interval is built as follows: the solution is initialized to the parent solution; then, the data structures that are not required until the current time interval are allocated to the external memory.

256

M. Soto, A. Rossi, and M. Sevaux

Algorithm 1. Long-term approach Data: for each time interval t ∈ {1, . . . , T } a set of data structures At involved, a set of size of data structures St , a set of conflicts between data structures Kt and a set of cost of conflicts Dt . Result: X1 , . . . , XT memory allocations for each time interval and the total cost of the application C. Initially all data structures are in the external memory X0 (a) = m + 1, for all a ∈ ∪Tα=1 Aα P0 ← X0 for t ← 1 to T do Updating data A = ∪Tα=t Aα , A = ∪tα=1 Aα , E = ∪Tα=t Eα , S = ∪Tα=t Sα , S  = ∪tα=1 Sα , K = ∪Tα=t Kα , D = ∪Tα=t Dα Solving MemExplorer problem with current data Mt ← MemExplorer(A, E, S, K, D) Computing the total cost. executing cost plus converting cost CMt ← Access Cost(Mt , A, E, K, D) + Change Cost(Xt−1 , Mt , A , S  ) CPt−1 ← Access Cost(Pt−1 , A, E, K, D) + Change Cost(Xt−1 , Pt−1 , A , S  ) Choosing the parent solution if CMt < CPt−1 then Pt ← Mt else Pt ← Pt−1 end Making the solution at time interval t Xt ← Pt for a ∈ / A do Xt (a) = m + 1 end Computing the total cost of application C ← C+ Access Cost(Xt , At , Et , Kt , Dt ) + Change Cost(Xt−1 , Xt , A , S  ) end

At each time interval, the parent solution is selected among two candidate solutions. The candidate solutions are the parent solution for the previous interval, and the solution to MemExplorer for the current interval. The total cost of both candidate solutions is then computed. This cost is the sum of two subcosts. The first sub-cost is the cost that we would have if the candidate solution was applied from the current time interval to the last one. The second sub-cost is the cost to be paid for changing the memory mapping from the solution of the previous time interval (which is known) to the candidate solution. Then, the candidate solution associated with the minimum total cost is selected as the parent solution. The Long-term approach is presented in Algorithm 1. A memory allocation is denoted by X, so X(a) = j means that data structure a is allocated to memory bank j in {1, . . . , m + 1}. The solution Xt is associated with time interval It for all t in {1, . . . , T }. The solution X0 consists in allocating all the data structures of the application to the external memory.

Metaheuristics for Dynamic Memory Allocation

257

The parent solution is denoted by Pt for the time interval It . The algorithm builds the solution Xt by initializing Xt to Pt , and the data structures that are not required until time interval It are moved to the external memory. In the algorithm, Mt is the memory allocation found by solving the instance of MemExplorer built from the data for the time interval It . Then, a new instance of MemExplorer is solved at each iteration. Algorithm 1 uses two functions to compute the total cost CX of a solution X. The first sub-cost is computed by the function Access Cost(X, . . . , ). That function returns the cost produced by a memory allocation X for a specified instances (data) of Memexplorer. The second sub-cost is computed by the function Change Cost(X1, X2 ). It computes the cost of changing solution X1 into solution X2 . At each time interval It the parent solution Pt is chosen between two candidate Pt−1 and Mt . It is the one which produces the minimum total cost (comparing both the total cost CPt−1 and CMt ). At each iteration, Algorithm 1 updates the data and uses the same process to generate the time interval solution Xt for all t in {1, . . . , t}. 4.2

Short-Term Approach

This approach relies on addressing a memory allocation sub-problem called MemExplorer-Prime. Given an initial memory allocation, this sub-problem is to search for a memory allocation of data structures that should be valid from the current time interval. This sub-problem takes into account the cost for changing the solution of the previous time interval. Algorithm 2. Short-term approach Data: Same as Algorithm 1. Result: Same as Algorithm 1. Initially all data structures are in the external memory X0 (a) = m + 1, for all a ∈ ∪Tα=1 Aα for t ← 1 to T do Solve MemExplorer-Prime problem with current data Xt ← MemExplorer-Prime(Xt−1 , At , Et , St , Kt , Dt ) end

MemExplorer-Prime is addressed for all time intervals. The data of this subproblem are the same as for MemExplorer. MemExplorer-Prime is stated as follows: for a given initial memory allocation for data structures, number of capacitated memory banks and an external memory, we search for a memory allocation such that the time spent accessing data and the cost of changing allocation of these data are minimized. In this paper, MemExplorer-Prime is addressed using a Tabu Search method similar to one used by Long-term approach. The Short-term approach iteratively builds a solution for time intervals. Each solution is computed by taking into account the conflicts and data structures involved in the current time interval, and also by considering the allocation in the previous time interval. The Short-term approach solves MemExplorer-Prime

258

M. Soto, A. Rossi, and M. Sevaux

considering as initial allocation the allocation of data structures of previous interval. Algorithm 2 presents this approach. A solution X is defined as above, and it uses a function MemExplorer-Prime(X0, . . .) for solving an instance of the problem MemExplorer-Prime where the initial solution is solution X0 . At each iteration the algorithm updates the data and the solution produced by the MemExplorer-Prime(X0, . . .) is taken as the time interval solution.

5

Computational Results

These approaches have been implemented in the c++ programming language and compiled with gcc 4.11 in Linux OS 10.04. They have been tested over a set of instances on an Intel Pentium iv processor system at 3 ghz with 1 Gbyte ram. The first eighteen instances of Table 1 are real life instances that come from electronic design problems addressed in the Lab-sticc laboratory. The remaining ones originate from dimacs [18], they have been enriched by generating some edge costs at random to represent conflicts, access costs and sizes for data structures, the number of memory banks with random capacities, and by dividing the conflicts and data structures into different time intervals. Also p = 16 (ms), and l = v = 1 (ms/ko) for all instances. Although real-life instances available today are relatively small, they will be larger and larger in the future as market pressure and technology tend to integrate more and more complex functionalities in embedded systems. Thus, we tested our approaches on current instances and on larger (but artificial) ones as well, for assessing their practical use for forthcoming needs. In Table 1, we compared performances of approaches with the ILP formulation solved by Xpress-MP, that is used as a heuristic when the time limit of one hour is reached: the best solution found so far is then returned by the solver. We presented the instances sorted by non decreasing sizes (i.e. by the number of conflicts and data structures). The first two columns of Table 1 show the main features of the instances: name, number of data structures, conflicts, memory banks and time intervals. The next two columns present the cost and the CPU time of Short-term approach. For the Long-term approach we present the best costs and its time reached in twelve experiments, the standard deviation and the ratio between the standard deviation and average cost. The following two columns report the cost and CPU time of the ILP approach. The column “gap” reports the gap between the Long-term approach and the ILP. The last columns indicates whether or not the solution returned by Xpress-MP is optimal. The optimal solution is known only for the smallest instances. Memory issues prevented Xpress-MP to address the nine largest instances. Bold figures in the table are the best known solutions reported by each method. When the optimal solution is known, only three instances resist to the Long-term approach with a gap of at most 3%. Over the 17 instances solved by Xpress but without guarantee of the optimal solution, the ILP method finds 6 best solutions whereas the Longterm approach improves 11 solutions, sometimes up to 48%.

Metaheuristics for Dynamic Memory Allocation

259

Table 1. Cost and CPU time for all the approaches proposed for Dy-MemExplorer Instances Name gsm newdy compressdy volterrady cjpegdy lmsbvdy adpcmdy lmsbdy lmsbv01dy lmsbvdyexp spectraldy gsmdy gsmdycorr lpcdy myciel3dy turbocodedy treillisdy mpegdy myciel4dy mug88 1dy mug88 25dy queen5 5dy mug100 1dy mug100 25dy r125.1dy myciel5dy mpeg2enc2dy queen6 6dy queen7 7dy queen8 8dy myciel6dy alidy myciel7dy zeroin i3dy zeroin i2dy r125.5dy mulsol i2dy mulsol i1dy mulsol i4dy mulsol i5dy zeroin i1dy r125.1cdy fpsol2i3dy fpsol2i2dy inithx i1dy

Short-term

Long-term

n\o\m

T

cost

(s)

cost

6\5\3 6\6\3 8\6\3 11\7\3 8\8\3 10\8\3 8\8\3 8\8\3 8\8\3 9\8\3 19\18\3 19\18\3 15\19\3 11\20\3 12\22\4 33\61\3 68\69\3 23\71\4 88\146\3 88\146\3 25\160\4 100\166\3 100\166\3 125\209\4 47\236\4 130\239\3 36\290\5 49\476\5 64\728\6 95\755\3 192\960\7 191\2360\5 206\3540\16 211\3541\16 125\3838\19 188\3885\17 197\3925\26 185\3946\17 186\3973\17 211\4100\26 125\7501\24 425\8688\16 451\8691\16 864\18707\28

2 3 2 4 3 3 3 4 4 3 5 5 4 4 4 6 8 7 6 6 5 7 7 6 6 12 10 16 24 11 48 24 35 35 38 39 39 39 40 41 75 87 87 187

7,808 571,968 192 4,466,800 4,323,294 49,120 54,470,706 4,399,847 5,511,967 44,912 1,355,420 494,134 31,849 6,947 3,835 1,867 11,108 16,277 27,521 24,641 22,927 30,677 29,463 37,486 26,218 10,248 31,710 47,988 73,091 70,133 135,682 176,921 219,189 215,950 379,162 238,724 229,157 240,439 243,237 236,435 413,261 528,049 521,923 1,058,645

< 0.01 < 0.01 < 0.01 < 0.01 < 0.01 < 0.01 0.01 < 0.01 0.01 < 0.01 < 0.01 < 0.01 0.01 < 0.01 < 0.01 < 0.01 < 0.01 < 0.01 0.02 0.16 0.02 0.23 0.03 0.14 0.03 0.09 0.04 0.05 0.13 0.16 0.58 0.42 1.11 1.16 1.12 0.86 1.51 0.96 0.98 1.59 2.06 2.50 2.83 12.76

Number of optimal solutions Number of best solutions Average CPU time and gap

3 4 0.72

(s) stand-dev ratio

ILP cost

(s)

7,808 342,592 178 4,466,800 4,323,294 44,192 7,409,669 4,350,640 4,367,024 15,476 1,355,404 494,118 26,888 3,890 3,246 1,806 10,630 8,847 25,543 24,310 15,358 30,488 28,890 36,484 24,162 9,812 23,489 37,599 54,214 65,716 64,696 163,676 212,138 210,464 238,443 232,537 222,410 232,315 236,332 231,170 475,593 516,549 509,834 1,038,331

< 0.01 0.00 0.00 7,808 < 0.01 59,284 0.17 342,592 < 0.01 0.00 0.00 178 0.01 0.00 0.00 4,466,800 < 0.01 1,352,052 0.31 4,323,294 0.01 0.00 0.00 44,192 0.29 1,146,369 0.23 7,409,669 < 0.01 388,819 0.09 4,350,640 < 0.01 1,787,414 0.41 4,367,024 0.01 4,393 0.25 15,472 0.01 0.00 0.00 1,355,390 0.04 0.00 0.00 494,118 0.02 0.00 0.00 26,888 0.01 457 0.11 3,792 0.13 158 0.05 3,195 0.03 1 0.00 1,806 0.13 110 0.01 10,614 0.94 121 0.01 8,611 5.17 126 0.00 25,307 5.87 178 0.01 24,181 0.11 572 0.04 15,522 5.80 253 0.01 29,852 5.89 203 0.01 28,448 2.93 24 0.00 36,489 0.11 336 0.01 23,118 0.75 1 0.00 9,887 0.35 219 0.01 24,678 0.90 564 0.01 46,721 2.10 195 0.00 86,270 11.21 670 0.01 61,831 1.46 2,124 0.03 65,882 215.93 2,026 0.01 276,542 19.15 93 0.00 404,270 19.74 72 0.00 368,212 561.98 1,297 0.01 430,900 20.69 160 0.00 0.00 21.11 19 0.00 0.00 17.67 149 0.00 0.00 19.24 171 0.00 0.00 22.72 34 0.00 0.00 1,488 5,329 0.01 0.00 189.39 398 0.00 0.00 133.50 395 0.00 0.00 1,559 201 0.00 0.00

0.02 0.22 0.06 0.16 0.11 0.11 0.48 0.38 0.27 0.27 0.69 0.77 0.32 1.44 23.09 1.56 6.21 3,600 3,600 1,197 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 3,600 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

14 33

18 24 98.5

gap opt. 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.03 0.02 0.00 0.00 0.03 0.01 0.01 −0.01 0.02 0.02 −0.00 0.05 −0.01 −0.05 −0.20 −0.37 0.06 −0.02 −0.41 −0.48 −0.43 −0.45 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes yes no no yes no no no no no no no no no no no no no no no no no no no no no no no no

1,783.80 −0.06

The practical difficult of one instance is related to its size (n, o), but it is not the only factor, for example the ratio between the total capacity of memory bank and the sum of sizes of data structures. We can see instances mug88 1dy and mug88 25dy, they have the same size but the performance of solver Xpress for the ILP formulation is different. The last three lines of the table summarize the results. The Short-term approach finds 4 optimal solutions and the Long-term approach finds 14 out of the

260

M. Soto, A. Rossi, and M. Sevaux

18 known optimal solutions. The Long-term approach is giving the largest number of best solutions with an average improvement of 6% over the ILP method. In most cases, the proposed metaheuristic approaches are significantly faster than Xpress-MP, the Short-term approach being the fastest one. The Shortterm approach is useful when the cost of reallocating data structures is small compared to conflicts costs. In such a case, it makes sense to focus on minimizing the cost of the current time interval without taking future needs into account, as the most important term in the total cost is due to open conflicts. The Longterm approach is useful in the opposite situation (i.e. moving data structure is costly compared to conflict costs). In that case, anticipating future needs makes sense as the solution is expected to undergo very few modification over time. Table 1 shows that the architecture used and the considered instances are such that the Long-term approach returns solution of higher quality than the Shorttern approach (except for r125.1cdy), and then emerges as the best method for today’s electronic applications, as well as for future needs.

6

Conclusion

This paper presents an exact approach and two iterative metaheuristics based on a static subproblem for addressing memory allocation in embedded systems. Numerical results show that the Long-term approaches returns good results in a reasonable amount of time, which makes this approach appropriate for today and tomorrow needs. However, the Long-term approach is outperformed by the Short-term approach on some instances, which suggests that taking the future requirements by aggregating the data structures and conflicts of the forthcoming time interval might not always be relevant. Indeed, the main drawback of this approach is that it ignores the potential for updating the solution at each iteration. Consequently, future work should concentrate on a Mid-term approach, for which future requirements are less and less weighted as they as far away from the current time interval. A second idea would be to design a global approach that builds a solution for all time intervals.

References 1. Coussy, P., Casseau, E., Bomel, P., Baganne, A., Martin, E.: A formal method for hardware IP design and integration under I/O and timing constraints. ACM Transactions on Embedded Computing System 5(1), 29–53 (2006) 2. Atienza, D., Mamagkakis, S., Poletti, F., Mendias, J., Catthoor, F., Benini, L., Soudris, D.: Efficient system-level prototyping of power-aware dynamic memory managers for embedded systems. Integration, the VLSI Journal 39(2), 113–130 (2006) 3. Chimientia, A., Fanucci, L., Locatellic, R., Saponarac, S.: VLSI architecture for a low-power video codec system. Microelectronics Journal 33(5), 417–427 (2002) 4. Julien, N., Laurent, J., Senn, E., Martin, E.: Power consumption modeling and characterization of the TI C6201. IEEE Micro. 23(5), 40–49 (2003)

Metaheuristics for Dynamic Memory Allocation

261

5. Wuytack, S., Catthoor, F., Nachtergaele, L., Man, H.D.: Power exploration for data dominated video application. In: Proc. IEEE International Symposium on Low Power Electronics and Design, Monterey, CA, USA, pp. 359–364 (1996) 6. Cho, D., Pasricha, S., Issenin, I., Dutt, N.D., Ahn, M., Paek, Y.: Adaptive scratch pad memory management for dynamic behavior of multimedia applications. Trans. Comp.-Aided Des. Integ. Cir. Sys. 28, 554–567 (2009) 7. Ozturk, O., Kandemir, M., Irwin, M.J.: Using data compression for increasing memory system utilization. Trans. Comp.-Aided Des. Integ. Cir. Sys. 28, 901–914 (2009) 8. Soto, M., Rossi, A., Sevaux, M.: Exact and metaheuristic approaches for a memory allocation problem. In: Proc. EU/MEeting, Workshop on the Metaheuristics Community, Lorient, France, pp. 25–29 (2010) 9. Iverson, M., Ozguner, F., Potter, L.: Statistical prediction of task execution times through analytic benchmarking for scheduling in a heterogeneous environment. IEEE Transactions on Computers 48(12), 1374–1379 (1999) 10. Lee, W., Chang, M.: A study of dynamic memory management in C++ programs. Comp. Languages Systems and Structures 28(3), 237–272 (2002) 11. Softexplorer (2006), [Online] http://www.softexplorer.fr/ 12. Xpress-mp, FICO (2009), [Online] http://www.dashoptimization.com/ 13. Carlson, R., Nemhauser, G.: Scheduling to minimize interation cost. Operations Research 14, 52–58 (1966) 14. Besbes, S.F.J.H.: A solution to reduce noise enhancement in pre-whitened lms-type algorithms: the double direction adaptation. In: Proc. Control, Communications and Signal Processing, 2004, pp. 717–720 (2004) 15. Herz, A., de Werra, D.: Using tabu search techniques for graph coloring. Computing 39(4), 345–351 (1987) 16. Battiti, R.: The reactive tabu search. ORSA Journal on Computing 6(2), 126–140 (1994) 17. Porumbel, D., Hao, J.-K., Kuntz, P.: Diversity control and multi-parent recombination for evolutionary graph coloring algorithms. In: Cotta, C., Cowling, P. (eds.) EvoCOP 2009. LNCS, vol. 5482, pp. 121–132. Springer, Heidelberg (2009) 18. Porumbel, D.: DIMACS graphs: Benchmark instances and best upper bound (2009), [Online] http://www.info.univ-angers.fr/pub/porumbel/graphs/