Understanding Resource Provisioning for ClimatePrediction.net

Malik Shahzad K. Awan and Stephen A. Jarvis
Department of Computer Science, University of Warwick, Coventry, UK
[email protected]

Abstract—Peer-to-peer computing, involving the participation of thousands of general-purpose, public computers, has established itself as a viable paradigm for executing loosely-coupled, complex scientific applications requiring significant computational resources. ClimatePrediction.net is an excellent demonstrator of this technology, having attracted hundreds of thousands of users from more than 200 countries for the efficient and effective execution of complex climate prediction models. This paper is concerned with understanding the underlying compute resources on which ClimatePrediction.net commonly executes. Such a study is advantageous to three different stakeholders, namely the application developers, the project participants and the project administrators. This is the first study of this kind, and while offering general conclusions with specific reference to ClimatePrediction.net, it also illustrates the benefits of such work to the scalability analysis of other peer-to-peer projects.

Keywords- Peer-to-Peer Computing; Performance Analysis; Climate Prediction

I. INTRODUCTION

Predicting climatic trends remains a computing Grand Challenge, despite significant advances in high-performance computing. The development of complex climate models, coupled with atmospheric/ocean global circulation models, requires considerable computing resources for calculating several hundred years of accurate model data. Historically, supercomputers have been used to perform the scientific computations involved in simulating complex climate models [1]. The peer-to-peer (P2P) computing paradigm, which involves contributions from (potentially) thousands of personal computers belonging to the general public, has changed this computing model. P2P computing provides cheap, general-purpose computing resources with computing power (FLOP/s) comparable to an otherwise expensive supercomputer; significant computing resources are therefore now much more readily available, making previously infeasible research possible [2].

The contribution of P2P computing has been furthered by the development of software support for general-purpose distributed computing. This includes Entropia, which we see underlying the Great Internet Mersenne Prime Search (GIMPS) project [2, 3], for example, and Distributed.net, which supports research in cryptography [2, 4]. Of significant value is the Berkeley Open Infrastructure for Network Computing (BOINC), which provides platform support for a variety of complex scientific applications in the P2P paradigm [2, 5].

Realizing that executing a non-linear climate model was beyond the computational capabilities of available supercomputers [6], the Intergovernmental Panel on Climate Change (IPCC) in 1999 launched the ClimatePrediction.net initiative to explore the possible contribution of P2P computing [7]. The ClimatePrediction.net project was ported to the BOINC platform in 2004 and has since attracted more than 220,000 users, deploying more than 429,000 hosts, from 217 different countries. Significantly, the project has achieved an average computational power of 129.694 tera floating-point operations per second (TeraFLOP/s) [8].

The ClimatePrediction.net project has involved host machines running 200 distinct microprocessors and 21 different operating systems. Despite this wide variety of supporting resources, there has been little work on understanding the implications of this for application development, the project participants or the project administration. This study goes some way to addressing this issue, and while the results relate specifically to ClimatePrediction.net, wider conclusions can be drawn with respect to other P2P projects. Four of the most widely deployed processors in the ClimatePrediction.net project have been used in our analysis. We also consider the impact of the choice of operating system on the participating computers.

The remainder of the paper is organized as follows: Section II details the ClimatePrediction.net project; a description of the publicly available performance data and the project statistics for ClimatePrediction.net is found in Section III; Section IV presents the results of our analysis of this performance data; we identify performance challenges in Section V, while Section VI contains the conclusions and a description of the future directions of our research.

II. CLIMATEPREDICTION.NET

ClimatePrediction.net is a distributed computing project producing predictions of the Earth’s climate up to the year 2100 and, at the same time, testing the accuracy of climate prediction models. Parametric approximations, e.g., for carbon dioxide, sulphur, etc., are analyzed to improve the understanding of, and confidence in, the impact of these parameters on the variety of climate prediction models available to scientists [13].

We highlight the characteristics of the ClimatePrediction.net project relevant to this work:

A. Scientific Goal
The main scientific goal of the ClimatePrediction.net project was to improve "methods to quantify uncertainties of climate projections and scenarios, including long-term ensemble simulations using complex models" [9].

B. ClimatePrediction Experiments
The project utilizes a number of different experiments. These include: a simplified single-layer slab 'ocean' atmospheric model for analyzing the suitability of the model for climate prediction; replicating the past climate from 1920-2000 using a full atmosphere-ocean model; and simulating the climate for 2000-2100 with varying values of solar, sulphate and greenhouse forcings.

C. ClimatePrediction Model
The ClimatePrediction.net project utilizes the 'Hadley Centre Slab Model 3' (HadSM3) version of the 'Unified Model' (UM) [10, 11] from the UK Met Office to investigate future climatic behavior [12]. The prediction model divides the world into grid boxes at a resolution of 2.75 by 3.75 degrees and generates 45-year simulations that require approximately 4-6 weeks of computation on a uniprocessor computer [12].

D. Source Code of the Project
The original UM code, written in FORTRAN 90, consists of approximately 1 million lines of code and was executed on single-processor Unix systems [12]. The source code was later ported to object-oriented C++, and wrapper code was developed to provide the necessary infrastructure for the ClimatePrediction.net project, enabling it to act as a 'volunteer computing' application [12]. The project deploys the MySQL database for storage [12].

E. Underlying OS Platform
The first phase of the ClimatePrediction.net project was based on Windows and took almost 2.5 years of development. Later, with the adoption of the BOINC platform in 2004, operating system support was added for Linux and Apple Macintosh OS X [12].

F. System Requirements
The minimum system requirements for running the simulations on the different versions of the Windows and Linux operating systems are: 1) a 1.6 GHz or faster microprocessor; 2) 256 MB of memory; and 3) at least 1 GB of available hard disk space. On the Macintosh platform, a 1.66 GHz microprocessor, 600 MB of available disk space and 192 MB of memory are required to execute the project simulations [14].

G. Simulation Results
The simulations executed on the host machines produce a detailed output of approximately 2 GB [2]; however, only 8 MB of the results (summarizing information concerning temperature, precipitation, clouds, humidity, etc.) are compressed and returned to the ClimatePrediction.net project server [12].

H. Performance Evaluation
The ClimatePrediction.net project uses the synthetic Dhrystone and Whetstone benchmarks to measure the performance of the participating computers. The BOINC platform has an accounting system built around the notion of 'credit', which is a weighted combination of the computation performed, the network transfer and the storage used for computation [2]. The publicly available benchmark results for each participating 'type' of computer capture the microprocessor, the operating system, the measured integer performance, the measured floating-point performance, the total disk space, the free disk space, the number of CPUs in the participating computer, the RAM size, the cache size and the swap memory size [8].
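As a rough illustration of the credit notion, the following Python sketch combines the three measured quantities with hypothetical weights; the actual weighting is defined by the BOINC accounting system [2] and is not reproduced here.

    # Hedged sketch: BOINC-style credit as a weighted combination of
    # computation, network transfer and storage. The weights below are
    # hypothetical placeholders, not the values used by BOINC.
    def credit(flops_done, bytes_transferred, bytes_stored,
               w_compute=1.0e-12, w_network=1.0e-9, w_storage=1.0e-10):
        """Combine the three measured quantities into a single score."""
        return (w_compute * flops_done +
                w_network * bytes_transferred +
                w_storage * bytes_stored)

    # Example: a host that performed 2.5e13 FLOPs, returned 8 MB of
    # results and stored 2 GB of output during a work unit.
    print(credit(2.5e13, 8e6, 2e9))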

I. General Project Statistics
Project statistics from ClimatePrediction.net reveal that more than 220,000 users, forming 6,961 teams, have participated with around 429,650 machines from 217 countries across the globe [8]. The top three participating countries of the project in terms of users are the United States (53,466 users), the United Kingdom (21,056 users) and Germany (19,478 users) [8]. A total of 200 different microprocessors have been involved in the ClimatePrediction.net project. The Intel Pentium platform (27,844 hosts) is reported as the most commonly used microprocessor, with the Intel Pentium 4 3.0 GHz (22,753 hosts), Intel Pentium 4 2.8 GHz (17,548 hosts), Intel Pentium 4 2.4 GHz (11,646 hosts) and Intel Pentium 4 3.2 GHz (11,364 hosts) forming four of the top five most widely used processors in the set of host machines [8]. Project participants have used 21 different operating systems, only one of which has an unknown (hidden) type and name. Microsoft Windows XP has reportedly been used on 273,869 host machines, Microsoft Windows Vista on 43,015 and Linux on 40,512 host machines. The use of FreeBSD and Microsoft Windows 95 is uncommon [8].
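Per-operating-system shares follow directly from these counts; a small illustrative Python sketch, using the figures quoted above:

    # Host counts reported by the project statistics [8].
    total_hosts = 429650
    os_hosts = {"Windows XP": 273869, "Windows Vista": 43015, "Linux": 40512}

    for os_name, count in os_hosts.items():
        print(f"{os_name}: {100.0 * count / total_hosts:.1f}% of hosts")
    # Windows XP alone accounts for roughly 64% of all reported hosts.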

III. PERFORMANCE ANALYSIS

In order to conduct a reliable performance analysis of the ClimatePrediction.net host resources, we appeal to the five-stage decomposition of this topic proposed in [15]. These stages are: 1) Instrumentation – providing access to the application data that is to be measured during execution; 2) Measurement – measuring and storing that application data during execution; 3) Analysis – identifying performance trends and potential bottlenecks; 4) Presentation – visually representing the measured data for manual or automated analysis; and 5) Optimization – the activities associated with removing the identified bottlenecks. This paper presents our preliminary findings using data supplied by the Dhrystone and Whetstone benchmarks.

A. Scope of the Study
The scope of this study is limited to the analysis and presentation stages of the performance analysis. We analyze a proportion of the readily available performance data from the ClimatePrediction.net project, collected from the publicly available data source for scientific applications running on the BOINC platform – see [8] for more details. Part of this study is to statistically assess this data to determine whether it provides a sufficient foundation for further research. If it does not, new resource benchmarking tools may be required in this context.

B. Significance of Analysis
We believe that the results of this analysis will be significant to three different stakeholders, namely: application developers, project participants and project administrators. For application developers, the results provide an insight into system performance when executing scientific applications such as those found in ClimatePrediction.net. The measured performance of the system, with different parametric values for the supporting hardware components, will provide application developers with supporting data so that future versions of the software can be improved. Moreover, the results will enable application designers to exploit operating system characteristics. The general public, interested in participating in P2P projects, will gain information on effective platforms for scientific computations involving floating-point and integer operations. While this may seem inconsequential, it may be significant to future multi-core resource allocation and sharing, which, if better understood, may increase the number of participants in the ClimatePrediction.net project. Finally, we believe that the analysis will assist project administrators in identifying suitable processor, operating system and hardware component configurations for best-case execution. The significance of this is that it may allow a new form of time-constrained experiment to be supported, in which the middleware may be instructed to postpone the allocation of work until a more effective architecture becomes available. This of course represents a trade-off, balancing the likely occurrence of a more suitable architecture against the increase in run-time of employing the first available system. Without supporting performance data (such as that seen here) such analysis is not possible.

C. Selected Processors
When selecting the processors for our analysis, we consider two factors: 1) the rank of the most widely used processors in the host machines, and 2) the credit earned per CPU, which represents the amount of work done by a system. We therefore consider four (of the top five) most widely used processors for this analysis, based on the average credit earned by CPUs of each type. These processors are: 1) the Intel Pentium 4 3.2 GHz (credit per CPU: 27,351.64); 2) the Intel Pentium 4 3.0 GHz (credit per CPU: 27,698.48); 3) the Intel Pentium 4 2.8 GHz (credit per CPU: 24,926.16); and 4) the Intel Pentium 4 2.4 GHz (credit per CPU: 16,867.05). All the processors in this study are from the Intel Pentium 4 family and support the SSE2 instruction set [20]. Such a performance analysis can be further extended to other candidate processors belonging to various manufacturers and representing different instruction set architectures.
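To make the selection criterion concrete, the sketch below ranks the candidates by deployment and reports the credit earned per CPU, using the figures quoted above; this illustrates the selection logic only and is not part of the project's tooling.

    candidates = [
        # (processor, hosts deployed [8], credit per CPU)
        ("Intel Pentium 4 3.0 GHz", 22753, 27698.48),
        ("Intel Pentium 4 2.8 GHz", 17548, 24926.16),
        ("Intel Pentium 4 2.4 GHz", 11646, 16867.05),
        ("Intel Pentium 4 3.2 GHz", 11364, 27351.64),
    ]

    # Rank by deployment first, then inspect the work actually done per CPU.
    for name, hosts, credit_per_cpu in sorted(candidates, key=lambda c: -c[1]):
        print(f"{name}: {hosts} hosts, {credit_per_cpu:,.2f} credit/CPU")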

IV. RESULTS

We analyze the performance of the processors, the operating systems and their combinations, both at an overall level and in terms of mean performance; we also examine the factors that affect FP and INT performance across platforms and their effect on the measured FP and INT values. The data used in the analysis was manually collected and concerns the performance of four Intel Pentium 4 microprocessors: 3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz. A total of 240 samples were collected from [8] – 60 data samples for each of the four processor types. Of the 60 data samples for each processor type, 30 represent the performance under Windows XP, while the remaining 30 capture the performance behaviour under the Linux operating system. Our analysis reveals some interesting performance trends among the considered processors under different operating systems. These results are discussed in the following subsections.

A. Processor Performance Analysis
When analyzing the overall performance we use quantile-plots, as this aids clarity of results, while bar-charts are used for analyzing the mean performance of the processors for FP and INT operations.

1) Overall Processor Performance for FP operations: The quantile-plots shown in figure 1 represent the overall processor performance for floating-point (FP) operations. These results suggest that the 3.2 GHz processor was better in terms of achieved FP performance. The highest achievable FP value with the 3.2 GHz processor was greater than that of the 3.0 GHz, 2.8 GHz and 2.4 GHz processors by 7.14%, 4.60% and 16.03% respectively. However, the lowest FP value measured on the 3.2 GHz processor was 4.21% lower than that achieved using a 3.0 GHz processor, demonstrating considerable variability in the results. The same processor produced a 404.76% improvement over the lowest value obtained using a 2.8 GHz processor, while its lowest value was 3.196% below the lowest FP value measured using a 2.4 GHz processor. 50% of the performance results were found in the inter-quartile range (IQR) of: 1) 1045-1475 for the 3.2 GHz processor; 2) 988-1309.67 for the 3.0 GHz processor; 3) 809-1276 for the 2.8 GHz processor; and 4) 670.60-1254.50 for the 2.4 GHz processor. A comparison of the IQRs reveals a significant overlap of the central 50% of FP values for all the processors; hence a clear distinction at this level is not possible. The median value, i.e. the 50th percentile, represents the central tendency of the data and identifies an interesting distribution trend. The performance instances within the IQR for the 3.2 GHz, 3.0 GHz and 2.4 GHz processors had highly skewed distributions, while the values for the 2.8 GHz processor show less of a trend. The data ranges for the 3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz processors were 636-1636, 664-1527, 126-1564 and 657-1410 respectively. While the maximum and minimum values for each of the processors lead to a fairly inconclusive comparison (the exception being the 2.8 GHz processor, which exhibits high variability), there is a clear trend at the 50th percentile. While the trend itself may be unsurprising, the 3.2 GHz processor achieves approximately twice the measured FP performance of the 2.4 GHz processor at this point, which clearly represents an exploitable difference.
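The quartile summaries used throughout this section can be reproduced from raw benchmark samples in a few lines; the sketch below (in Python, with illustrative sample values rather than the collected data) computes the IQR, median and data range reported above.

    import numpy as np

    # Illustrative FP benchmark samples for one processor type (not the
    # measured ClimatePrediction.net data).
    samples = np.array([636, 812, 1045, 1120, 1287, 1350, 1475, 1636])

    q1, median, q3 = np.percentile(samples, [25, 50, 75])
    iqr = q3 - q1  # the range containing the central 50% of the results
    print(f"IQR: {q1}-{q3} (width {iqr}), median: {median}")
    print(f"data range: {samples.min()}-{samples.max()}")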

Figure 1. Comparison of Processor Performance on Measured FP Count

2) Overall Processor Performance for INT operations: Figure 2 represents the overall processor performance for INT operations. The quantile-plots reveal that the performance of the 3.0 GHz processor is marginally better in terms of achieving high INT values. The highest achievable INT value with the 3.0 GHz processor showed a relative increase of 28.72%, 31.97% and 36.24% over the 3.2 GHz, 2.8 GHz and 2.4 GHz processors respectively. Similarly, the lowest INT value measured on the 3.0 GHz processor was better than that of the remaining processors: it was 29.22% higher than that achieved using a 3.2 GHz processor, 256.50% more than that obtained using a 2.8 GHz processor and 7.98% better than that measured on a 2.4 GHz processor. A comparison of the IQR values for the 3.2 GHz (1611-2500.50), 3.0 GHz (1561.75-2242), 2.8 GHz (1239-2264.50) and 2.4 GHz (1838-2422) processors reveals a significant overlap of the values of all the processors based on the inter-quartile ranges. The median value, i.e. the 50th percentile, representing the central tendency of the data, is less conclusive than in the case of floating-point arithmetic. The performance instances within the IQR for the 3.2 GHz, 3.0 GHz and 2.8 GHz processors have slightly skewed distributions; however, the quantile-plot suggests a highly skewed distribution for the performance samples of the 2.4 GHz processor lying within the IQR. All the data instances for each of the processors, except for the 3.0 GHz processor, were present within the upper and lower whiskers. Several of the INT values for the 3.0 GHz processor exceeded the upper whisker boundary with respect to the distribution of the majority of data instances and were classified as 'near outliers', represented by circles. The data ranges for the 3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz processors were 869-3011, 1123-3876, 315-2937 and 1040-2845 respectively.

Figure 2. Comparison of Processor Performance on Measured INT Count
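The 'near outlier' classification visible in figure 2 follows the usual box-plot whisker convention; we assume the conventional 1.5 x IQR fences here, since the plotting convention is not stated explicitly. A minimal sketch:

    import numpy as np

    def classify_outliers(samples, k=1.5):
        """Flag points beyond the whiskers at Q1 - k*IQR and Q3 + k*IQR.

        k=1.5 gives the conventional 'near outlier' fences; this is an
        assumption about the plotting convention, not a documented fact.
        """
        q1, q3 = np.percentile(samples, [25, 75])
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        return [x for x in samples if x < lower or x > upper]

    # Illustrative INT samples; values above the upper whisker would be
    # the circled points in figure 2.
    print(classify_outliers(np.array([1123, 1561, 1800, 2242, 2500, 3876])))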

Figure 3. Comparison of Processor Performance on Mean INT and FP Count

3) Average Processor Performance for INT and FP: Due to the apparent overlap in the quantile-plots representing the INT and FP performance, we further compare the average performance values for INT and FP operations to gain a better insight into the overall processor performance for this application. Figure 3 presents the comparison of the average performance for both INT and FP operations. For INT operations, the 2.4 GHz processor gave higher results than the 3.2 GHz, 3.0 GHz and 2.8 GHz processors by 9.65%, 9.83% and 29.20% respectively. Perhaps unsurprisingly, the 3.2 GHz processor performs better for FP operations (on average presenting a 24.03%, 32.87% and 31.63% improvement over the 3.0 GHz, 2.8 GHz and 2.4 GHz processors). As previously stated, knowing that the 3.2 GHz processor performs better is not surprising; however, knowing the percentage improvement over the other processors is a feature that we can later exploit.

B. Operating System Performance Analysis
The performance of the operating systems (OS) for FP and INT operations on the four processors has been analyzed at an overall level using quantile-plots and at the mean performance level using bar-charts. Further details of the analysis are discussed in the following subsections.

1) Overall OS Performance for FP operations: The performance of Windows XP and Linux on the 3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz processors for floating-point operations is shown in figure 4. For the 3.2 GHz processor, in some cases Windows XP provides better results than Linux, with a peak FP value 6.72% higher and a lowest value 87.89% higher. Despite the overlap of the IQRs of Windows XP (1366-1589.75) and Linux (757-1469.50), the quantile-plots reveal that Windows XP provides overall better performance than Linux on a 3.2 GHz processor. The median value (at the 50th percentile) shows the skewed distribution of the performance data present in the IQRs. The data ranges for the Windows XP and Linux operating systems were 1195-1636 and 636-1533 respectively.

For the 3.0 GHz processor, the non-overlapping IQRs in the quantile-plots reveal that Windows XP (1285-1324.75) clearly performed better than Linux (985.75-1083.50) for FP operations, apart from a few values classified as: 1) near outliers, represented by circles; and 2) significant outliers, represented by stars. Windows XP yields a peak value and a lowest value that are 8.91% and 53.01% higher than Linux respectively. The median value indicates an approximately normal distribution of the performance data within the IQR for Windows XP, and a skewed distribution for Linux. The performance ranges for FP operations using the Windows XP and Linux operating systems on the 3.0 GHz processor were 1016-1527 and 664-1402 respectively.

For the 2.8 GHz processor, the quantile-plots reveal that Windows XP clearly performs better than Linux. When Windows XP is used with a 2.8 GHz processor, a peak FP value up to 50.67% higher was reportedly achieved. Similarly, for the lowest value, Windows XP yields values up to 810.301% better than Linux. The median values for Windows XP and Linux suggest a skewed distribution of performance data for both operating systems within their respective IQRs. The data ranges for the Windows XP and Linux operating systems were 1147-1564 and 126-1038 respectively.

For the 2.4 GHz processor, the quantile-plots reveal that Windows XP clearly performs better than Linux. With Windows XP, a peak FP value up to 106.44% higher than Linux is achieved. Similarly, for the lowest value, Windows XP yielded values up to 59.81% better than Linux. For Linux running on a 2.4 GHz processor, the performance values for FP operations were found to be very predictable. The data ranges for the Windows XP and Linux operating systems were 1050-1410 and 657-683 respectively.

Figure 4. Comparison of OS Performance on Measured FP Count

2) Average OS Performance for FP operations: A comparison of Windows XP and Linux in terms of average FP performance, shown in figure 5, suggests a clear performance advantage for Windows XP, which provides a 27.76%, 28.53%, 103.33% and 84.42% average performance improvement over Linux on the 3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz processors respectively.

Figure 5. Comparison of Average OS Performance for FP operations
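The average improvement figures in this and the preceding subsections are, we assume, simple relative differences of sample means; a minimal sketch of the computation:

    def mean_improvement_pct(xp_values, linux_values):
        """Relative improvement of the Windows XP mean over the Linux mean."""
        xp_mean = sum(xp_values) / len(xp_values)
        linux_mean = sum(linux_values) / len(linux_values)
        return 100.0 * (xp_mean - linux_mean) / linux_mean

    # Illustrative values only; with 30 samples per OS per processor this
    # is how figures such as the 27.76% improvement would be derived.
    print(f"{mean_improvement_pct([1400, 1500], [1100, 1170]):.2f}%")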

3) Overall OS Performance for INT operations: Figure 6 presents the quantile-plots describing the performance of Windows XP and Linux on the 3.2 GHz, 3.0 GHz, 2.8 GHz and 2.4 GHz processors for INT operations. For the 3.2 GHz processor, the top 25% of the data instances (above the 75th percentile) for Windows XP were higher than those for Linux, except in a few cases. The performance of both Windows XP and Linux can be largely characterized by the outliers. Windows XP shows a peak value 8.73% lower and a lowest value 4.20% lower than Linux for INT operations on a 3.2 GHz processor. The median values suggest a skewed distribution of performance data for both operating systems. The data ranges for the Windows XP and Linux operating systems were 869-2748 and 907-3011 respectively.

Overall, for the 3.0 GHz processor, Linux performance is significantly better than that of Windows XP, apart from a few outlier values. Windows XP shows a peak INT value up to 28.38% lower, and a lowest INT value up to 4.43% lower, than Linux. The median values suggest a skewed distribution of performance data for both Windows XP and Linux. The data ranges for the Windows XP and Linux operating systems were 1123-2776 and 1175-3876 respectively.

Windows XP performed relatively better than Linux for the 2.8 GHz processor. When Windows XP is used with a 2.8 GHz processor, a peak INT value up to 25.67% higher was reportedly achieved. Similarly, for the lowest value, Windows XP yielded values up to 232.38% better than Linux. For at least 50% of the performance instances, Windows XP gave higher results than Linux. The 50th percentiles for Windows XP and Linux revealed a skewed distribution of performance data. The data ranges for the Windows XP and Linux operating systems were 1047-2937 and 315-2337 respectively.

The performance of Windows XP was found to be clearly better than Linux on the 2.4 GHz processor, apart from a few outlier values. When Windows XP was used with a 2.4 GHz processor, a peak INT value up to 50.77% higher was reportedly achieved. However, for the lowest value, Windows XP gave a value up to 40.64% worse than Linux, due to the presence of outliers in the Windows XP performance data. For Linux running on a 2.4 GHz processor, the performance values for INT operations were found to be very predictable, with few outliers. The data ranges for the Windows XP and Linux operating systems were 1040-2845 and 1752-1887 respectively.

Figure 6. Comparison of OS Performance on Measured INT Count

4) Average OS Performance for INT operations: A comparison of Windows XP and Linux in terms of average INT performance is shown in figure 7. Windows XP gave better average performance than Linux for INT operations on the 3.2 GHz, 2.8 GHz and 2.4 GHz processors, by 31.17%, 58.75% and 25.67% respectively. However, the average performance of Windows XP was 24.08% worse than Linux on the 3.0 GHz processor.

Figure 7. Comparison of Average OS Performance for INT operations

C. Factors affecting the FP and INT performance
The performance analysis reveals that the 3.2 GHz processor with Windows XP is the most suitable for floating-point intensive computations, while for integer intensive computations the 2.4 GHz processor with Windows XP is the most suitable amongst those considered. We further investigate the factors affecting the FP and INT performance on these platforms, and the effect that changes in these factors have on the FP and INT performance.

1) Factors affecting the FP performance: We perform correlation analysis to identify the factors that influence the FP performance of the 3.2 GHz processor with Windows XP. The Pearson coefficient of correlation is calculated using PASW Statistics 18 to identify the influence of RAM, Cache, Swap Memory, Total Disk Size and Free Disk Size on the measured FP performance. The analysis, summarized in Table I, identified Cache as the only factor influencing the FP performance of the system, with a moderate positive strength of 0.363 and a significance value of 0.048.

TABLE I. CORRELATION ANALYSIS FOR FP PERFORMANCE

                      RAM    Cache    Swap    Total Disk    Free Disk
    Pearson          .073     .363    -.002        -.043        -.023
    Sig. (2-tailed)  .700     .048     .991         .822         .903
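The coefficients in Table I were produced with PASW Statistics 18; for readers without access to PASW, an equivalent computation (Pearson r with a two-tailed significance value) can be sketched in Python using scipy, with illustrative data:

    import numpy as np
    from scipy.stats import pearsonr

    # Illustrative cache sizes (KB) and measured FP values; not the
    # collected ClimatePrediction.net samples.
    cache = np.array([512, 512, 1024, 1024, 2048, 2048])
    fp = np.array([1180, 1225, 1300, 1340, 1390, 1460])

    r, p = pearsonr(cache, fp)  # Pearson r and two-tailed p-value
    print(f"Pearson = {r:.3f}, Sig. (2-tailed) = {p:.3f}")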

To determine the effect of a change in Cache size on the measured FP value for the 3.2 GHz processor with Windows XP, we performed regression analysis with several different regression models – 1) Linear; 2) Logarithmic; 3) Inverse; 4) Quadratic; 5) Cubic; 6) Compound; 7) Power; 8) S; 9) Growth; and 10) Exponential – to identify the best-suited functional relationship between the Cache size and the measured FP value. The R2 values for the Linear (0.132), Logarithmic (0.133), Inverse (0.134), Quadratic (0.145), Cubic (0.145), Compound (0.143), Power (0.144), S (0.144), Growth (0.143) and Exponential (0.143) models suggest that no single model can accurately represent the complex relationship between the Cache size and the FP performance of the system. Figure 8 shows the scatter plot and the trendlines using these models.

Figure 8. Relationship between Cache Size and FP performance
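Several of the model families listed above are linearizable, so their R2 values can be reproduced with ordinary least squares. The sketch below fits the linear, logarithmic, power and exponential forms on illustrative data; the remaining families follow the same pattern.

    import numpy as np

    def r_squared(y, y_hat):
        """Coefficient of determination, computed on the original scale."""
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - np.mean(y)) ** 2)
        return 1.0 - ss_res / ss_tot

    # Illustrative cache sizes (KB) and measured FP values; not the
    # collected samples.
    cache = np.array([256.0, 512.0, 1024.0, 2048.0, 2048.0, 4096.0])
    fp = np.array([1100.0, 1180.0, 1270.0, 1335.0, 1420.0, 1475.0])

    # Linear: y = a + b*x
    b, a = np.polyfit(cache, fp, 1)
    print("linear R2:", r_squared(fp, a + b * cache))

    # Logarithmic: y = a + b*ln(x)
    b, a = np.polyfit(np.log(cache), fp, 1)
    print("logarithmic R2:", r_squared(fp, a + b * np.log(cache)))

    # Power: ln(y) = ln(a) + b*ln(x), fitted in log-log space
    b, log_a = np.polyfit(np.log(cache), np.log(fp), 1)
    print("power R2:", r_squared(fp, np.exp(log_a) * cache ** b))

    # Exponential: ln(y) = ln(a) + b*x, fitted in semi-log space
    b, log_a = np.polyfit(cache, np.log(fp), 1)
    print("exponential R2:", r_squared(fp, np.exp(log_a) * np.exp(b * cache)))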

2) Factors affecting the INT performance: We performed correlation analysis to identify the factors influencing the INT performance of the 2.4 GHz processor with Windows XP. The Pearson coefficient of correlation is calculated using PASW Statistics 18 to find the impact of RAM, Cache, Swap Memory, Total Disk Size and Free Disk Size on the measured INT performance. Table II contains the results of the correlation analysis. The analysis again identifies Cache as the only factor influencing the INT performance of the system, with a moderate negative strength of -0.599 and a highly significant value of 0.001.

TABLE II. CORRELATION ANALYSIS FOR INT PERFORMANCE

                      RAM    Cache    Swap    Total Disk    Free Disk
    Pearson          .235    -.599     .273         .381         .358
    Sig. (2-tailed)  .247     .001     .177         .060         .072

To determine the effect of a change in the Cache size on the measured INT value for the 2.4 GHz processor with Windows XP, we again perform regression analysis, shown in Figure 9, to best characterize the functional relationship between the Cache size and the measured INT value. The R2 values for the Linear (0.359), Logarithmic (0.359), Inverse (0.359), Quadratic (0.359), Cubic (0.359), Compound (0.337), Power (0.337), S (0.337), Growth (0.337) and Exponential (0.337) models suggest once again that none of the models can accurately represent the complex relationship between the Cache size and the INT performance of the system.

Figure 9. Relationship between Cache Size and INT performance

V. CHALLENGES ASSOCIATED WITH PERFORMANCE ANALYSIS

While analyzing the performance data returned by ClimatePrediction.net, we have identified two main challenges. These are briefly discussed in the following subsections.

A. Benchmarks
The Dhrystone and Whetstone benchmarks used for measuring the performance of the systems participating in the ClimatePrediction.net project are traditional synthetic benchmarks, often described in the literature as 'toy benchmarks' [16]. As such, the conclusions that we can draw about the underlying resources are limited. The performance data obtained using these synthetic benchmarks cannot be compared with a standard performance benchmark suite, e.g. SPEC 2000. The performance data collected for the ClimatePrediction.net project is expressed in millions of instructions executed per second. The SPEC 2000 suite, on the other hand, has two components: CINT2000 and CFP2000. CINT2000 measures the integer operation speed of a system by running 12 different programs, each an odd number of times greater than or equal to 3; CFP2000 measures the floating-point operation speed by running 14 different programs in the same way. The results are then converted to ratios with respect to a reference machine (the Sun Ultra 10); the geometric mean of the individual results is calculated and published [17].

B. Limitations of BOINC
There are also reported limitations of the BOINC benchmarking system [18]. These include: 1) instability of benchmarking results across successive runs, yielding significant variations in performance results; 2) credit claims based on the reported benchmarking results for the same work units reportedly varying by up to 2X across virtually every BOINC-powered project; 3) no provision to prevent 'cheating' or 'cooking' the benchmarking results; 4) the BOINC benchmarks not providing a good characterization of the system; and 5) the synthetic benchmarks used not being good representatives of the real-world scientific applications running on BOINC. Realizing the limitations of BOINC benchmarking, efforts are underway to improve the benchmarking system employed [18-19].
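For contrast with the raw Dhrystone/Whetstone counts, the SPEC-style scoring described in subsection A reduces per-program ratios against the reference machine to a geometric mean; a minimal sketch with made-up run times:

    import math

    # Made-up run times (seconds) for a handful of benchmark programs on
    # the reference machine (Sun Ultra 10) and on the system under test.
    reference = [900.0, 1400.0, 650.0]
    measured = [300.0, 500.0, 260.0]

    ratios = [ref / t for ref, t in zip(reference, measured)]
    score = math.prod(ratios) ** (1.0 / len(ratios))  # geometric mean
    print(f"SPEC-style score: {score:.1f}")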

VI. CONCLUSIONS & FUTURE WORK

The aim of this study is to characterize the underlying resources on which the ClimatePrediction.net project, and similar projects based on the BOINC platform, execute. By understanding more about the participating resources we hope to: (i) support application developers by providing insights into system performance – the measured performance of the system, with different parametric values for the supporting hardware components, will provide application developers with supporting data so that future versions of the software can be improved, and the results will also enable application designers to exploit operating system characteristics; (ii) support the participants in peer-to-peer projects, by informing them of the amount of RAM, swap memory and main memory consumed during execution – while this may at first appear inconsequential, it may be significant to future multi-core resource allocation and sharing, which, if better understood, may increase the number of participants in these P2P projects; and (iii) assist project administrators in identifying suitable processor, operating system and hardware component configurations for best-case execution – the significance of this is that it may allow a new form of time-constrained experiment to be supported, in which the middleware may be instructed to postpone the allocation of work until a more effective architecture becomes available.

When selecting the processors for our analysis, we considered two factors: 1) the rank of the most widely used processors in the host machines, and 2) the credit earned per CPU, which represents the amount of work done by a system. We therefore considered four (of the top five) most widely used processors for ClimatePrediction.net in this analysis. A total of 240 data samples were collected, with 60 data samples representing each of the four processors in question. We also evaluated the impact of the operating system on floating-point and integer performance.

While trends are apparent in many cases, which may themselves be exploited in later work, the limitations of the Dhrystone and Whetstone benchmarks from which the data is collected are a limiting factor. Further work is needed to strengthen the data collection process. This in itself presents some difficulties, not least the introduction of further overhead on the contributing machines. We are in the process of developing light-weight resource profiling tools that we hope to contrast with those detailed in [18].

As well as improving the accuracy of the accounting system found in BOINC, there are several other benefits to collecting accurate resource information. We envisage scenarios in which middleware may select resources on the basis of this resource data (semi-real-time analysis, QoS-managed services, etc.), where it may be advantageous to wait for better resources as opposed to selecting resources as and when they become available. This may in itself allow a richer variety of P2P services, and thus broaden the appeal (and application) of this paradigm.

REFERENCES

[1] Stainforth, D., et al., "Distributed Computing for Public Interest Climate Modeling Research", Computing in Science and Engineering, Vol. 4, Issue 3, May 2002, pp. 82-89.
[2] Anderson, D. P., "BOINC: A System for Public-Resource Computing and Storage", Proc. of the 5th IEEE/ACM International Workshop on Grid Computing, 2004, pp. 4-10.
[3] Entropia PC Grid Computing: http://www.entropia.com/
[4] Distributed.net: http://www.distributed.net/
[5] BOINC: http://boinc.berkeley.edu/
[6] Allen, M. R., "Possible or probable?", Nature, 425:242, 2003.
[7] Allen, M. R., "Do-it-yourself climate prediction", Nature, 401:642, 1999.
[8] ClimatePrediction.net Statistics: http://www.allprojectstats.com/po.php?projekt=21
[9] Working Group I, "Climate Change 2001: The Scientific Basis", Third Assessment Report of the Intergovernmental Panel on Climate Change, Cambridge Univ. Press, Cambridge, UK, 2001.
[10] Cullen, M., "The Unified Forecast/Climate Model", Meteor. Mag., 122:81-94, 1993.
[11] Williams, K. D., et al., "Transient climate change in the Hadley Centre models: The role of physical processes", Journal of Climate, 14(12):2659-2674, 2001.
[12] Christensen, C., et al., "The Challenge of Volunteer Computing with Lengthy Climate Model Simulations", Proc. of the 1st International Conference on e-Science and Grid Computing (e-Science '05), 2005.
[13] ClimatePrediction.net: http://climateprediction.net/
[14] ClimatePrediction.net Project System Requirements: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/tech_faq_boinc.php
[15] Koehler, C., Curreri, J., and George, A. D., "Performance Analysis Challenges and Framework for High-Performance Reconfigurable Computing", Parallel Computing, Vol. 34, Issue 4-5, May 2008, pp. 217-230.
[16] Hennessy, J. L., and Patterson, D. A., Computer Architecture – A Quantitative Approach, Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.
[17] Standard Performance Evaluation Corporation (SPEC): http://www.spec.org
[18] Improved Benchmarking System Using Calibration Concepts: http://www.boincwiki.info/Improved_Benchmarking_System_Using_Calibration_Concepts
[19] New Credit System Design: http://boinc.berkeley.edu/trac/wiki/CreditNew
[20] Intel Pentium 4 Processor: http://pixel01.cps.intel.com/products/processor/pentium4/index.htm