Network Partitioning and GA Heuristic Crossover for ... - IEEE Xplore

Network Partitioning and GA Heuristic Crossover for NoC Application Mapping

Yin Zhen Tei∗, M. N. Marsono∗, N. Shaikh-Husin∗, Yuan Wen Hau†
∗Faculty of Electrical Engineering
†Faculty of Health Science and Biomedical Engineering
Universiti Teknologi Malaysia, 81310 Skudai, Johor, Malaysia
Email: [email protected], {nadzir, nasirsh}@fke.utm.my, [email protected]

Abstract—Network-on-chip (NoC) has been introduced as a promising on-chip communication architecture to support many IP (intellectual property) cores on a single chip. Application mapping of IP cores onto a NoC topology is an NP-hard problem. The increasing number of IP cores makes it more challenging to obtain an optimum core-to-topology mapping. This paper proposes a genetic algorithm approach that incorporates network partitioning and heuristic crossover techniques to improve NoC application mapping. Our experiment on VOPD (video object plane decoder) shows that the proposed method results in only 0.2% to 0.8% communication cost difference compared to the global optimum mapping and 6% better communication cost compared to a technique using conventional GA.

Index Terms—Network-on-chip, application mapping, genetic algorithm, network partitioning

I. INTRODUCTION

Multi-processor system-on-chips (MPSoCs) are expected to contain hundreds of IP cores [1]. Network-on-chip (NoC) has emerged as a promising on-chip communication architecture providing modularity and scalability for MPSoCs. In order to design a NoC-based MPSoC with maximum performance and minimum cost, several design problems have been identified and categorized [2]. Optimum application mapping is one such problem. Application mapping determines the placement of IP cores onto routers in the network such that the performance or cost metrics of interest are optimized [2].

Application mapping is an NP-hard combinatorial optimization problem and thus inherently intractable [3]. As an example, the possible mapping space for 100 IP cores is 100! ≈ 9.33 × 10^157. Therefore, it is desirable to reduce the size of the search space while maintaining an efficient search for the optimum mapping.

A genetic algorithm (GA) may provide an optimum solution for application mapping, where only limited information about the problem is known [4]. In a conventional GA, the initial population is randomly generated and evaluated by a fitness function. Then, genetic operations such as crossover and mutation allow the fittest solutions to evolve towards a converged solution. For a large mapping space, GA needs to be improved in order to speed up convergence and obtain the optimum solution. It is widely accepted that the initial population affects the convergence speed of GA [4]. A good initial mapping may increase the probability of reaching the optimum mapping [5].
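The size of the mapping space quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `mapping_space` is hypothetical, and it assumes a one-to-one placement of cores onto routers, so that the count is a number of permutations.

```python
import math


def mapping_space(num_cores, num_routers=None):
    """Count one-to-one placements of num_cores cores onto num_routers routers."""
    if num_routers is None:
        num_routers = num_cores  # one router per core, as in the 100-core example
    # math.perm(n, k) = n! / (n - k)!, which reduces to n! when k == n.
    return math.perm(num_routers, num_cores)


# 100 cores on 100 routers: 100! placements, a 158-digit number.
print(f"{mapping_space(100):.2e}")
# 16 cores on a 4 x 4 mesh, the size of the VOPD case study.
print(mapping_space(16))
```

Even the 16-core case has more than 2 × 10^13 candidate mappings, which is why the paper looks for ways to shrink the space before running the GA.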

978-1-4673-5762-3/13/$31.00 ©2013 IEEE

This paper proposes incorporating GA with network partitioning (NP refers to network partitioning from here onwards) and heuristic crossover to improve convergence speed and minimize NoC communication cost. The NP technique has been proposed for NoC application mapping [6]. It decomposes a large system into smaller subsystems with heavily communicating cores in the same partition, and therefore minimizes the average inter-node communication cost. This prior NP information may provide a good initial mapping for GA. Besides initial mapping, crossover is the most dominant genetic operator for evolving generations to search the mapping space efficiently. Our result shows a 6% improvement in the NoC average communication cost compared to GA with random initial mapping and random crossover. Moreover, the result obtained is close to the global optimum mapping, with up to 0.8% difference.

The rest of this paper is organized as follows. Section II briefly reviews related work on GA and NP application mapping. Section III presents the definition of application mapping and the important parameters of GA. Section IV presents the proposed technique for improving GA application mapping. Section V shows the experimental results and, finally, Section VI concludes the paper.

II. RELATED WORK

Several techniques have been proposed for searching for the optimum NoC mapping, such as integer linear programming (ILP) [7], simulated annealing (SA) [3] and other heuristic techniques [5]. Multi-objective GA has been used to solve the application mapping problem [6], [8]. Random initial mapping in GA and heuristic crossover that remaps NoC hotspots within a single chromosome have been proposed [6], [8]. Reference [1] proposed a GA technique for NoC architecture design with random initial population and random crossover for invalid chromosomes.

Network partitioning has been proposed to solve application mapping [6]. A large SoC system can be divided into several clusters (partitions). Cluster-based application mapping has been proposed in [3], [7].
The author of [7] proposed a cluster-based relaxation of the ILP formulation for application mapping in order to reach the optimum result within tolerable time limits. This technique maps partitions and cores without improving cross-partition movement. Thus, it limits the search space for the global optimum and may be locked at a local minimum. Reference [3] proposed a cluster-based simulated annealing (CSA) to speed up the convergence to a near-optimal solution. This work shows an advantage in runtime without compromising the quality of solution compared to pure SA. However, CSA performs application mapping by swapping cores according to the annealing temperature, similar to the mutation operation in GA, and converges more slowly than GA with a crossover process.

III. GENETIC ALGORITHM FOR NOC APPLICATION MAPPING

Some definitions used in this paper are listed below.

Definition 1: The application characteristic graph APCG(V, E) is a directed graph, where each vertex v ∈ V represents an IP core and a directed edge $e_{S,D}$ ∈ E represents the communication bandwidth between a source core ($v_S$) and a destination core ($v_D$). We assume that the application tasks have been assigned and scheduled at the IP core level.

Definition 2: The NoC mesh topology T(R, Ch) is a labelled graph. R denotes a set of routers while Ch is a set of channels.

Given an APCG and a NoC topology, application mapping map(V → R) maps cores to routers such that the communication cost is minimized, with each router connected to at most one IP core, i.e., size(R) ≥ size(V).

A. Genetic Algorithm Application Mapping

The genetic algorithm shown in Fig. 1 mimics the processes of biological evolution [9]. GA applies three genetic operators, reproduction, crossover and mutation, to manipulate the population. Given an APCG and a NoC topology, GA starts with a randomly generated initial population which consists of a set of integer chromosomes. These integer chromosomes are evaluated based on a predefined fitness function. A set of the fittest chromosomes is selected and reproduced as parents of the next generation. These parent chromosomes are randomly selected to produce child chromosomes by means of crossover and mutation operations. GA continues to operate iteratively until a fixed number of iterations has been reached or a termination criterion has been met.

[Figure 1 omitted. Recoverable block labels: APCG, NoC Topology, Initial population, Selection, Crossover, Mutation, Optimization objective functions (Fitness function), Mapping Solution.]
Fig. 1: Genetic Algorithm concept for NoC application mapping.

The elements of the genetic algorithm are as follows:

i. Integer chromosome: It consists of a series of genes, with each gene corresponding to a router in the mesh topology. Each gene is assigned an integer that represents the IP core, defined in the APCG, that is attached to the corresponding router. A gene is assigned a zero if no IP core is assigned to the router. Fig. 2 shows an example of an integer chromosome in a 3 × 3 mesh topology.

[Figure 2 omitted. It shows a 3 × 3 mesh topology holding cores v1 to v8 and the corresponding nine-gene integer chromosome; gene values recovered from the figure, in extraction order: 1, 2, 8, 7, 3, 4, 0, 6, 5.]
Fig. 2: Mapping solution represented by integer chromosome.

ii. Crossover: Parent chromosomes are chosen randomly. For two selected chromosomes, a random crossover point is selected. The genes before the crossover point of the first parent are copied into a child chromosome. The genes after the crossover point of the second parent are copied into the same child chromosome. If an identical integer exists in two or more genes, the chromosome is invalid since a core should be assigned to only one router. After the crossover operation, some cores may also be left unmapped, i.e., not assigned to any router. Each invalid gene is replaced randomly with one of the unmapped cores.

iii. Mutation: One child chromosome is randomly selected from the population. For each gene, a random number is generated. If the random number is smaller than the mutation probability, the gene is exchanged with a random gene in the same chromosome. Otherwise the gene remains unchanged.

iv. Fitness function: The fitness function represents the desired optimization goal. The optimization goal of application mapping is to minimize the communication cost measured by

$$\mathrm{CommCost} = \sum_{\text{all } S,D} \left( hop_{S,D} \times BW_{S,D} \right) \tag{1}$$

where $hop_{S,D}$ is the number of hops on the path taken from the source core to the destination core (one hop is the distance between two adjacent routers) with XY deterministic routing, and $BW_{S,D}$ is the bandwidth requirement from the source core to the destination core. BW is the edge weight of an APCG and is defined by each application. Thus, optimizing hop cost means minimizing the overall communication cost.
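As a minimal sketch of the fitness evaluation in (1), the following Python function computes CommCost for an integer chromosome on a mesh. It rests on two assumptions that the paper does not spell out: routers are numbered row by row, and the hop count under XY routing on a mesh equals the Manhattan distance between the two routers. The names `comm_cost` and `apcg_edges` and the three-core example are illustrative only.

```python
def comm_cost(chromosome, apcg_edges, mesh_width):
    """CommCost = sum over APCG edges of hop(S, D) * BW(S, D).

    chromosome[r] is the core ID attached to router r (0 = empty router).
    Routers are assumed to be numbered row by row (an assumption).
    apcg_edges maps (source_core, dest_core) -> bandwidth.
    """
    # Invert the chromosome: core ID -> (row, column) of its router.
    position = {
        core: divmod(router, mesh_width)
        for router, core in enumerate(chromosome)
        if core != 0
    }
    total = 0
    for (src, dst), bandwidth in apcg_edges.items():
        (r1, c1), (r2, c2) = position[src], position[dst]
        # XY routing on a mesh follows a minimal path, so the hop count
        # is the Manhattan distance between the two routers.
        hops = abs(r1 - r2) + abs(c1 - c2)
        total += hops * bandwidth
    return total


# Hypothetical 3-core application on a 2 x 2 mesh (bandwidths are made up).
edges = {(1, 2): 100, (2, 3): 50, (1, 3): 10}
print(comm_cost([1, 2, 0, 3], edges, mesh_width=2))  # heavy edges kept adjacent
print(comm_cost([1, 3, 2, 0], edges, mesh_width=2))  # edge (2, 3) stretched to 2 hops
```

The first placement keeps both heavy edges at one hop and costs 170; the second stretches the 50-unit edge to two hops and costs 210, so a GA minimizing (1) would prefer the first.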

IV. PROPOSED GA WITH NP INITIAL MAPPING AND HEURISTIC CROSSOVER

This paper focuses only on single-objective GA for minimizing the NoC communication cost. We address application mapping by incorporating GA with NP to cut down the GA mapping search space. In addition, a heuristic crossover technique is proposed to speed up the convergence of the GA search.
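The NP-based initial mapping that Section IV-A describes can be sketched in Python as follows. The sketch takes the application partitions as given (the paper obtains them from Chaco, which is not reproduced here) and only illustrates the second stage: splitting the mesh into regions and placing each partition's cores randomly inside its region. The function names, the row-major router numbering, the quadrant split and the example partitions are all assumptions for illustration.

```python
import random


def quadrant_regions(width):
    """Split a width x width mesh (width even) into four quadrants of router indices."""
    half = width // 2
    regions = []
    for row0 in (0, half):
        for col0 in (0, half):
            regions.append([r * width + c
                            for r in range(row0, row0 + half)
                            for c in range(col0, col0 + half)])
    return regions


def np_initial_chromosome(partitions, regions, num_routers, rng=random):
    """Place the cores of each partition randomly within its assigned mesh region."""
    chromosome = [0] * num_routers  # 0 marks an empty router
    for cores, routers in zip(partitions, regions):
        if len(cores) > len(routers):
            raise ValueError("region is too small for its partition")
        # Pick distinct routers inside the region, one per core.
        for core, router in zip(cores, rng.sample(routers, len(cores))):
            chromosome[router] = core
    return chromosome


# Hypothetical partitioning of 16 cores into four groups (not the Chaco output).
partitions = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
regions = quadrant_regions(4)
population = [np_initial_chromosome(partitions, regions, 16) for _ in range(50)]
print(population[0])
```

Every chromosome produced this way keeps heavily communicating cores within the same quadrant, which is the property the paper relies on for a better starting population.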

Algorithm 1 Crossover Algorithm
Population is the population size
TotalParent is the total number of parent chromosomes
B is the length of the chromosome
for i = TotalParent + 1 to Population do
    Select random parent chromosomes, P1 and P2.
    Select random crossover point, C.
    Child(i) ← Crossover between P1 and P2.
    Check InvalidGene.
    Check UnmappedCores.
    NeighborCore = GetAdjacentCore(InvalidGene)
    CommunicatingCore = GetCommCore(NeighborCore, UnmappedCores)
    if CommunicatingCore == 0 then
        InvalidGene ← GetRandom(UnmappedCores)
    else
        InvalidGene ← GetRandom(CommunicatingCore)
    end if
end for
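One possible reading of the body of Algorithm 1 for a single child is sketched below in Python. It is not the authors' Matlab implementation. The function names are hypothetical, routers are assumed to be numbered row by row, "adjacent" is taken to mean the four mesh neighbours, and an APCG edge in either direction counts as communication. The sketch also makes one choice the paper does not discuss: if cores remain unmapped after all invalid genes are repaired (possible when the mesh has empty routers), they are placed in the remaining empty routers using the same rule.

```python
import random


def adjacent_routers(router, num_routers, mesh_width):
    """Return the up/down/left/right neighbours of a router in a row-major mesh."""
    row, col = divmod(router, mesh_width)
    rows = num_routers // mesh_width
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [r * mesh_width + c for r, c in candidates
            if 0 <= r < rows and 0 <= c < mesh_width]


def heuristic_crossover(p1, p2, apcg_edges, mesh_width, rng=random):
    """Single-point crossover followed by communication-aware repair."""
    n = len(p1)
    cut = rng.randrange(1, n)        # random crossover point C
    child = p1[:cut] + p2[cut:]

    # The later copy of a duplicated core is the InvalidGene; clear it.
    seen, invalid = set(), []
    for router, core in enumerate(child):
        if core != 0 and core in seen:
            invalid.append(router)
            child[router] = 0
        else:
            seen.add(core)
    unmapped = {c for c in p1 if c != 0} - seen   # UnmappedCores

    # Repair invalid genes first, then (assumption) any other empty routers.
    empty = [r for r, core in enumerate(child) if core == 0 and r not in invalid]
    for router in invalid + empty:
        if not unmapped:
            break
        neighbours = {child[r] for r in adjacent_routers(router, n, mesh_width)} - {0}
        communicating = [u for u in sorted(unmapped)
                         if any((u, v) in apcg_edges or (v, u) in apcg_edges
                                for v in neighbours)]
        # Prefer a core that talks to a neighbour; otherwise fall back to random.
        pick = rng.choice(communicating or sorted(unmapped))
        child[router] = pick
        unmapped.discard(pick)
    return child


# Hypothetical chain-shaped APCG on a 4 x 4 mesh (bandwidths are made up).
edges = {(i, i + 1): 10 for i in range(1, 16)}
parent1 = list(range(1, 17))
parent2 = parent1[::-1]
print(heuristic_crossover(parent1, parent2, edges, mesh_width=4))
```

The repair step always returns a valid chromosome in which every core appears exactly once, so the child can be passed directly to the fitness function.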

A. Initial GA mapping based on NP

NP decomposes a large system into a few smaller partitions. For NoC application mapping, NP is implemented in two stages: application partitioning and mesh topology partitioning. Multilevel-KL (Kernighan-Lin) partitioning implemented in the Chaco tool [10] is used for application partitioning. This algorithm is chosen because it produces high-quality partitions and scales to large problems [10]. The multilevel-KL algorithm divides the application into halves and refines the partitions at each level. The number of partitioning levels is user defined. In the second stage, the mesh topology is divided into a few smaller regions, each of which represents one partition. This two-stage NP is used to produce the initial population for GA. The IP cores from each partition are placed randomly within the assigned region of the mesh topology. NP initial mapping places communicating IP cores closer to each other. It results in a better initial mapping compared to random-based mapping and increases the probability for GA to reach an optimum solution.

B. GA Heuristic Crossover

Random crossover and mutation are used to search for fitter chromosomes [1]. For NoC application mapping, the fitness function is closely related to the distance between the source and destination cores. In order to get a fitter solution, heuristic crossover is proposed as in Algorithm 1. The inputs for crossover are the population size, the number of parent chromosomes and the length of the chromosome. Parent chromosomes and the crossover point are selected randomly, following the natural randomization behaviour of GA. Child chromosomes are generated from the selected parents. After crossover between two parents, if the same integer is assigned to two genes, the latter gene is named InvalidGene. Cores that are not assigned to any gene are named UnmappedCores. Each InvalidGene is remapped with one of the UnmappedCores that communicates with the core attached to an adjacent router. This approach helps GA explore the mapping space efficiently.

V. EXPERIMENTAL RESULTS

We verify the proposed GA approach on a multimedia benchmark application, VOPD, with 16 IP cores [11]. The VOPD application is fitted into a 4 × 4 mesh topology. The GA is first modelled in Matlab. NP is implemented using the Chaco partitioning tool [10]. The VOPD application is divided into four partitions using two-level partitioning. The mesh topology for the VOPD application is also recursively divided into four regions. The results are compared to conventional GA and to previous work on partition-based application mapping.

Considering the partitioned VOPD, the search space is reduced from 16! to 5! × 4!. As a result, a population size of 50 is sufficient for the partition-based search space for this application. Of this population, 15 parent chromosomes are sufficient for space exploration and convergence. The mutation probability is set to 0.05 and the crossover probability to 0.8. We execute 50 simulation runs to obtain the average communication cost. Each simulation run consists of 500 generations, considering the randomization effect of GA. The elitism selection method is chosen for this experiment. With the same parameters, tournament selection would give similar results for a small application but slower convergence for a large application. Only elitism selection is discussed in this paper.

Fig. 3 shows the results of the random-based (RB) and NP initial mappings with different crossover techniques. For the VOPD application with a 4 × 4 mesh topology, there is a significant communication cost difference in the early generations. Although both RB and NP initial mappings converge and almost reach an equilibrium state after 500 generations, the NP initial mapping converges faster to the optimum mapping than the RB initial mapping in the early generations. This indicates that the initial mapping significantly increases the quality of the final mapping. In terms of crossover technique, both initial mappings with heuristic crossover always offer better communication cost than with random crossover. Therefore, we can conclude that the proposed combination of NP and heuristic crossover always offers the best solution.

Table I shows the statistical analysis after 500 generations. NP initial mapping with heuristic crossover gives the best mapping among all configurations, with the lowest average communication cost and the smallest standard deviation. It results in 6% better communication cost compared to RB initial mapping with random crossover. Table II compares our proposed approach with previous works using the VOPD application [3], [7]. Our results show a near-optimum mapping compared to the other techniques. Compared to the optimum mapping obtained from exhaustive search, our results show a 0.2% to 0.8% communication cost difference.
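The experimental settings above (population of 50, 15 parents, mutation probability 0.05, crossover probability 0.8, 500 generations, elitism selection) can be tied together in a short GA loop. The Python sketch below is only one plausible arrangement and not the authors' Matlab model. In particular, applying mutation to every child and copying a parent when crossover is skipped are assumptions, and `evolve`, `swap_mutation` and the toy cost function are hypothetical names introduced for illustration.

```python
import random


def swap_mutation(chromosome, p_mutation=0.05, rng=random):
    """Swap each gene with a random gene of the same chromosome with probability p."""
    child = list(chromosome)
    for i in range(len(child)):
        if rng.random() < p_mutation:
            j = rng.randrange(len(child))
            child[i], child[j] = child[j], child[i]
    return child


def evolve(population, fitness, crossover, num_parents=15, generations=500,
           p_crossover=0.8, p_mutation=0.05, rng=random):
    """Minimize fitness with elitism: the fittest chromosomes survive unchanged."""
    size = len(population)
    for _ in range(generations):
        # Elitism: keep the num_parents lowest-cost chromosomes as parents.
        parents = sorted(population, key=fitness)[:num_parents]
        children = []
        while len(parents) + len(children) < size:
            p1, p2 = rng.sample(parents, 2)
            # Assumption: when crossover is skipped, the child is a copy of p1.
            child = crossover(p1, p2) if rng.random() < p_crossover else list(p1)
            children.append(swap_mutation(child, p_mutation, rng))
        population = parents + children
    return min(population, key=fitness)


# Toy problem: recover the identity permutation of 1..9 (not a NoC cost model).
rng = random.Random(0)
start = [rng.sample(range(1, 10), 9) for _ in range(50)]
toy_cost = lambda ch: sum(abs(g - (i + 1)) for i, g in enumerate(ch))
best = evolve(start, toy_cost, crossover=lambda a, b: list(a), generations=50, rng=rng)
print(best, toy_cost(best))
```

Because the parents are carried over unchanged, the best cost in the population never increases from one generation to the next, which matches the monotone curves one expects from elitism selection. In a real run, `fitness` would be the CommCost of (1) and `crossover` the random or heuristic operator being compared.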


[Figure 3 plots omitted. Each panel plots Average Communication Cost (Mbps) against Generation (0 to 500) for four configurations: RB initial mapping (Random crossover), RB initial mapping (Heuristic crossover), NP initial mapping (Random crossover) and NP initial mapping (Heuristic crossover).]
(a) VOPD application with 4 × 4 mesh topology
(b) TGFF application with 8 × 8 mesh topology

Fig. 3: Average communication cost versus generation for different initial mapping and crossover algorithm.

TABLE I: GA application mapping statistical analysis of communication cost.

Initial mapping    Random crossover       Heuristic crossover
                   Avg.     Stdev.        Avg.     Stdev.
RB                 4384     209.96        4199     82.26
NP                 4220     94.76         4132     7.48
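The 6% improvement quoted in the text appears to correspond to the Table I averages for RB initial mapping with random crossover (4384) and NP initial mapping with heuristic crossover (4132). As a quick check under that assumption:

```latex
\frac{4384 - 4132}{4384} = \frac{252}{4384} \approx 0.057
```

That is roughly 5.7%, which rounds to the 6% reported.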

TABLE II: VOPD application mapping result of different proposed techniques.

Application Mapping Technique         Communication Cost (Mbps)
SA [3]                                4231
CSA [3]                               4169
Clustered ILP [7]                     4205
NP GA with heuristic crossover        4125
Exhaustive search                     4119

To show the scalability of the proposed algorithm, we conducted an experiment on an 8 × 8 mesh topology with 60 IP cores generated using TGFF [12] and divided into 8 partitions. The simulation result of the application mapping in Fig. 3b shows exactly the same pattern as Fig. 3a. The combination of NP and GA heuristic crossover always offers the best solution in terms of faster convergence and lower communication cost. In fact, this advantage over conventional GA is even more significant when applied to a large application. This proves that our proposed technique is scalable and suitable for future large NoC application mapping.

VI. CONCLUSION

In this paper, we proposed an application mapping technique based on the combination of network partitioning and GA heuristic crossover, targeted at mesh-based NoC. While NP initial mapping efficiently cuts down the large NoC mapping space and converges faster to the optimum mapping, the heuristic crossover also speeds up GA convergence and offers lower communication cost compared to conventional GA. Therefore, the combination of these two techniques always offers the best solution in large NoC-based MPSoC application mapping.

However, the proposed technique currently considers only a static cost function in the fitness evaluation of application mapping. In the future, the proposed technique will be enhanced by considering a dynamic cost function to obtain a more accurate solution. In addition, the heuristic crossover technique may need to be improved to satisfy the optimization goal. We believe that NP can also be applied effectively to multi-objective application mapping, especially for large NoCs.

REFERENCES

[1] G. Leary, K. Srinivasan, K. Mehta, and K. Chatha, “Design of network-on-chip architectures with a genetic algorithm-based technique,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 5, pp. 674–687, May 2009.
[2] R. Marculescu, U. Ogras, L.-S. Peh, N. Jerger, and Y. Hoskote, “Outstanding research problems in NoC design: System, microarchitecture, and circuit perspectives,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21, January 2009.
[3] Z. Lu, L. Xia, and A. Jantsch, “Cluster-based simulated annealing for mapping cores onto 2d mesh networks on chip,” in Proceeding of the 11th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS 2008), April 2008, pp. 1–6.
[4] H. Maaranen, K. Miettinen, and A. Penttinen, “On initial populations of a genetic algorithm for continuous optimization problems,” J. of Global Optimization, vol. 37, no. 3, pp. 405–436, Mar. 2007.
[5] C. Marcon, E. Moreno, N. Calazans, and F. Moraes, “Comparison of network-on-chip mapping algorithms targeting low energy consumption,” IET Computers Digital Techniques, vol. 2, no. 6, pp. 471–482, November 2008.
[6] A. A. Morgan, “Networks-on-chip: Modeling, system-level abstraction, and application-specific architecture customization,” Ph.D. dissertation, University of Victoria, 2011.
[7] S. Tosun, “Cluster-based application mapping method for network-on-chip,” Advances in Engineering Software, vol. 42, no. 10, pp. 868–874, October 2011.
[8] G. Ascia, V. Catania, and M. Palesi, “Multi-objective mapping for mesh-based NoC architectures,” in Proceedings of the 2nd IEEE/ACM/IFIP International conference on Hardware/software codesign and system synthesis (CODES+ISSS ’04), 2004, pp. 182–187.
[9] M. Mitchell, “Genetic algorithm: An overview,” 1995. [Online]. Available: http://ohm.ecce.admu.edu.ph/wiki/pub/Main/ResearchProjects/mitchell GA tutorial.pdf
[10] B. Hendrickson and R. Leland, “The Chaco user’s guide version 2.0,” 1995. [Online]. Available: http://www.sandia.gov/~bahendr/papers/guide.ps
[11] E. B. Van Der Tol and E. G. T. Jaspers, “Mapping of MPEG-4 decoding on a flexible architecture platform,” in Media Processors 2002, 2002, pp. 1–13.
[12] R. Dick, D. Rhodes, and W. Wolf, “TGFF: task graphs for free,” in Proceedings of the Sixth International Workshop on Hardware/Software Codesign (CODES/CASHE ’98), March 1998, pp. 97–101.
