ACM SIGSOFT Software Engineering Notes


January 2015 Volume 40 Number 1

Calling Network: A New Method for Modeling Software Runtime Behaviors

Yu Qu [email protected]

Ting Liu [email protected]

Xiaohong Guan [email protected]

Qinghua Zheng [email protected]

Jianliang Zhou [email protected]

Jian Li [email protected]

MOE Key Lab for Intelligent Networks and Network Security, Xi’an Jiaotong University Xi’an, China

ABSTRACT

Modern software systems’ structures and behaviors are becoming increasingly complicated. Existing models either lack systematic consideration of the whole system’s behavior patterns or are inefficient in runtime monitoring. In this paper, the Calling Network (CN) model is proposed to provide new perspectives for analyzing the dynamic execution process of a software system. A CN consists of one or a series of Calling Graphs (CGs); a CG is a dynamic version of the Call Graph that encodes method call frequencies. New perspectives such as Growing Network and Network (Graph) Sequence are also embodied in the CN model. Based on a data set of 10 real-world Java programs, we show that CN presents several interesting features, such as Power-law degree distribution, the Densification Power Law, and the stability of an entropy value, Local Entropy. Experiments have been conducted to show the applications of CN in identifying significant software modules and in runtime failure diagnosis.

Categories and Subject Descriptors H.4 [Information Systems Applications]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—complexity measures, software science

General Terms Theory, Measurement

Keywords Software Model, Software Behavior, Method Call

1. INTRODUCTION

Software systems may be among the most complex systems made by humans. Many models have been proposed to understand and describe the structures and behaviors of software systems. These models can be summarized from the following two perspectives.

On one hand, software models in the form of Finite State Automata [19] or Program Invariants [9] have been used intensively in the software engineering domain. But the practical application of these models is often hindered by the enormous state space of software. For example, Model Checking techniques [6] often face the state space explosion problem, and Dynamic Symbolic Execution techniques [12] suffer from the path explosion problem. In short, these models and techniques are very useful in software engineering, but they need the help provided by systematic consideration of the behavior patterns and dynamics of the whole system.

On the other hand, with the progress of Complex Network (also referred to as Complex System) theory and graph mining algorithms in recent years, these theories have been applied intensively to the quantitative analysis of software systems. This is an interdisciplinary research area. Existing methods in this area have successfully applied Complex System theory and graph algorithms to software evolution prediction [4], software community, controllability and vulnerability analysis [21, 25, 26], software structure interpretation and evaluation [3, 20, 31], etc.

However, existing methods suffer from the following problem: most of the networks [3, 4, 20, 21, 25, 26, 27, 30] are constructed by statically analyzing the source code of a software system. Thus, these networks are not applicable in the testing, monitoring and maintenance processes, which play important roles in a software system’s life cycle. Moreover, modern software design patterns and practices sometimes make it impossible to capture all the prerequisite information for constructing a complete network by static analysis alone. Take the Inversion of Control (IoC) [13] mechanism of Spring, the popular J2EE framework, for example. Figure 1 is an excerpt from the “applicationContext.xml” file of JPetStore¹, a J2EE benchmark web application; this is a standard configuration file of the Spring framework. The excerpt specifies that every reference to transactionManager in the source files of JPetStore is actually bound to the implementation class DataSourceTransactionManager. It can be bound to another implementation class just by changing this configuration file. With this flexible mechanism, the Spring framework decouples the interconnections between classes, making it impossible to construct the network by analyzing the system’s source code alone.

Figure 1: An excerpt from the “applicationContext.xml” file of JPetStore 6.

Network and graph models are inherently suitable for software systems. Different methods/functions², classes, modules and components are the basic ingredients that are combined to form a complete system. As shown in Figure 2, during the execution of a software system, its method call behaviors are recorded using instrumentation frameworks. The Calling Network (CN) is constructed to model these method call behaviors and relationships. A CN consists of one or a series of Calling Graphs (CGs). Three computing schemes are proposed to generate CN.

Figure 2: Calling Network model. The Calling Network of Makagiga, a personal information management system, is depicted in this figure. [Panels: Software Method Call Process over Time (t1, t2, t3, ..., tn); Scheme 1: Raw Calling Network (CG∞); Scheme 2: Growing Calling Network; Scheme 3: Partitioned Calling Network (CG1, CG40, CG80, CG160). Node color encodes Lines of Code: 0–45, 46–90, 91–135, 136–180, 271–315.]

CN has the following characteristics and advantages:

Firstly, CN has a Power-law degree distribution and is a typical Small-world and Scale-free network (Scheme 1 in Figure 2). These findings lay a good foundation for further applying Complex Network theory to evaluating software dynamic structure [31] and behavior. Moreover, as CN is constructed dynamically, it is directly related to a certain functionality of a system. In experiments, we show that node centrality metrics from Complex Network theory are useful in identifying significant modules for a web forum system’s posting functionality.

Secondly, this paper proposes to model the dynamic method calls from a Growing Network perspective, as shown in Scheme 2 in Figure 2. A very interesting phenomenon is observed: the CNs of all the studied programs obey the same growth pattern, the Densification Power Law. This is the first time such a phenomenon has been reported in the software domain.

Thirdly, we propose to partition the original CN into a sequence of CGs, as shown in Scheme 3 in Figure 2. It is discovered that the Local Entropy, a quantification method from Complex Network theory, is very stable across such a Calling Graph sequence. Local Entropy is helpful in software runtime fault diagnosis and localization.

Finally, all the aforementioned methods and perspectives are included in a single, integrated model, which is complementary to existing models.

In summary, this paper makes the following contributions:
1) A new and systematic model, CN, of software runtime method calls is proposed.
2) Three schemes are proposed to generate CN, and several interesting features of CN are observed, such as Power-law degree distribution, the Densification Power Law and the stability of Local Entropy in CN.
3) Experiments have been conducted to show the application of CN in performance optimization and runtime failure diagnosis.

The rest of this paper is organized as follows. In Section 2, the definition of CG is introduced, a formal definition of CN is given, and three CN generation schemes are discussed. In Section 3, based on 10 real-world Java programs, the measurements, analysis and applications of the CN model and its generation schemes are presented. Section 4 gives the conclusions and future work.

∗Corresponding author.
¹ http://code.google.com/p/mybatis/
² In this paper, we use software interchangeably with program, and method interchangeably with function.

DOI:10.1145/2693208.2693223
http://doi.acm.org/2693208.2693223

2. CALLING NETWORK MODEL

2.1 Calling Graphs and Other Networks

A CN consists of one or a series of CGs. In this section, an illustrative example is given to describe the differences between CG and the networks proposed in previous related work.

Figure 3 (a) is a simple example code snippet in Java syntax. Most of the previously proposed networks are constructed by statically analyzing the source code of the target system. For the given code snippet, the networks in Figure 3 (b) and Figure 3 (c) are constructed statically.

The network in Figure 3 (b) is the Class Dependency Network [25, 26]. Although other works [3, 20, 30] usually only describe methods for constructing their networks, the intrinsic nature of those networks is the same as that of the Class Dependency Network. In these networks, nodes are classes and edges represent relationships between classes. These relationships include aggregation (A→B, A→C, and B→D), inheritance (D→C), interface implementation, and return / parameter types. The Class Collaboration Graph [21] is similar to the Class Dependency Network except that the return / parameter type relationships are not considered. The network in Figure 3 (c) is the Class Graph [27], which only considers the inheritance relationships and is a simplified version of the Class Dependency Network.

Figure 3 (d) shows the Object Graph [24], which is constructed dynamically. Each node in the Object Graph represents an object created during program execution, and the edges are usage relationships between objects (b:B→d:D). The nodes c:C and c2:C are isolated because they are used by the static main() method rather than by an object.


Figure 3: Example code snippet and different networks.

Figure 3 (e) illustrates the Calling Graph (CG), which is used in this paper. CG is based on the dynamic execution of the software. The weights in Figure 3 (e) represent method call frequencies during the whole execution process or during a certain period (explained later). The weights can also represent other statistical information. CG is a processed version of a well-known notion in software engineering, the Call Graph [14]. It is most similar to the reduced (dynamic) call graphs introduced in [7], which reduce the original rooted ordered tree of the Call Graph to a weighted graph whose edge weights encode method call frequencies.
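The reduction described above, from a rooted ordered call tree to an edge-weighted graph, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple-based tree encoding, the function name `reduce_call_tree` and the toy call tree are hypothetical and are not taken from [7] or from the instrumentation used in this paper.

```python
from collections import Counter

def reduce_call_tree(tree, weights=None):
    """Collapse a rooted call tree into caller->callee edge counts.

    `tree` is a (method_name, [child_trees]) pair; this encoding is an
    assumption made for the sketch.
    """
    if weights is None:
        weights = Counter()
    caller, children = tree
    for child in children:
        # Every parent/child pair in the tree is one observed invocation.
        weights[(caller, child[0])] += 1
        reduce_call_tree(child, weights)
    return weights

# Hypothetical execution: main() calls a() twice and b() once,
# and each a() invocation calls b() once.
tree = ("main", [("a", [("b", [])]), ("a", [("b", [])]), ("b", [])])
cg_weights = reduce_call_tree(tree)
```

Repeated subtrees collapse into a single edge whose weight is the call frequency, which is the property that distinguishes a CG from the raw call tree.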

2.2 Calling Network Model

Formal definitions of CN are given as follows:

Definition 1. A software method Calling Behavior cb is a record of one method invocation, represented as a four-tuple: $cb_k = (t_k, Caller_k, Callee_k, Param_k)$, where $t_k$ is the time when the method invocation happened, $Caller_k$ and $Callee_k$ are self-descriptive, and $Param_k$ is the parameter set of $Callee_k$.

Definition 2. A Calling Behavior Set CB is an ordered set of cb: $CB = \{cb_k \mid k \in \mathbb{N}\}$, where k is the sequence number of cb.

Definition 3. A Calling Graph is a directed weighted network: $CG = (V, E)$, where the node set V represents the method set, and the edge set E represents the method invocation relations. CG has a weighting function $w : E \to \mathbb{N}$. The weighting function can represent different computation schemes. In this paper, the edge weight mainly represents method call frequency.

Definition 4. The Calling Graph generation function is $f_{CG\text{-}Gen} : CB \to CG$.

Definition 5. A Calling Network is an ordered set of CG: $CN = \{CG_i \mid i \in \mathbb{N}\}$, where $CG_i = f_{CG\text{-}Gen}(CB_i)$ and $CB_i \subseteq CB$.

CB is partitioned into the subsets $CB_i$ using some strategy. The following 2 strategies are discussed in this paper. Other strategies can also be included in the model.

Strategy 1 uses a fixed interval and a fixed quantity of cbs to generate each CG. Two parameters are set in this strategy: $N_{Itv}$ and $N_{CG}$, which represent the interval between two consecutive $CB_i$s and the number of cbs in each $CB_i$, respectively. Then $CB_i$ is expressed as: $CB_i = \{cb_k \mid (i-1) \cdot N_{Itv} \le k \le (i-1) \cdot N_{Itv} + N_{CG}\}$.

Strategy 2 uses a time step to partition CB. Two parameters are set in this strategy: $T_i$ and $\Delta t$. $T_i$ is the i-th time point, and $\Delta t$ is the time window used to select the cbs that form $CB_i$. Then $CB_i$ is expressed as: $CB_i = \{cb_k \mid T_i - \Delta t \le t_k \le T_i\}$.

In summary, the Calling Network model is formalized as follows (with Strategy 1 included):

$$
\begin{cases}
CN = \{CG_i \mid i \in \mathbb{N}\}, \\
CG_i = f_{CG\text{-}Gen}(CB_i), \quad CB_i \subseteq CB \ \text{and} \\
CB_i = \{cb_k \mid (i-1) \cdot N_{Itv} \le k \le (i-1) \cdot N_{Itv} + N_{CG}\}, \\
CG = (V, E), \quad w : E \to \mathbb{N}, \\
CB = \{cb_k \mid k \in \mathbb{N}\}, \\
cb_k = (t_k, Caller_k, Callee_k, Param_k).
\end{cases}
\tag{1}
$$
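As a rough illustration of Definitions 1 to 4 and of the time-window selection in Strategy 2, the model can be written down as plain Python data structures. The names `CallingBehavior`, `f_cg_gen` and `time_window`, as well as the sample records, are assumptions introduced for this sketch; they do not come from the authors' tooling.

```python
from collections import Counter
from typing import NamedTuple, Tuple

class CallingBehavior(NamedTuple):
    """Definition 1: cb_k = (t_k, Caller_k, Callee_k, Param_k)."""
    t: float
    caller: str
    callee: str
    params: Tuple = ()

def f_cg_gen(cb_subset):
    """Definition 4 (sketch): map calling behaviors to a weighted digraph.

    Returns (V, w), where w maps each (caller, callee) edge to its
    call frequency, as in Definition 3.
    """
    w = Counter((cb.caller, cb.callee) for cb in cb_subset)
    v = {method for edge in w for method in edge}
    return v, dict(w)

def time_window(cb_set, t_i, delta_t):
    """Strategy 2: CB_i = {cb_k | T_i - delta_t <= t_k <= T_i}."""
    return [cb for cb in cb_set if t_i - delta_t <= cb.t <= t_i]

# Hypothetical trace, ordered by sequence number (Definition 2).
CB = [
    CallingBehavior(0.0, "main", "load"),
    CallingBehavior(0.5, "load", "parse"),
    CallingBehavior(1.2, "main", "render"),
    CallingBehavior(1.9, "load", "parse"),
]
V, w = f_cg_gen(time_window(CB, t_i=2.0, delta_t=1.6))
```

Here the first record falls outside the window and is ignored, so the resulting CG contains only the edges observed between t = 0.4 and t = 2.0.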

2.3 CN Generation Schemes

Based on Equation 1, three CN generation schemes are proposed by setting different values of NItv and NCG :

2.3.1 Raw Calling Network (Raw CN)

Setting $N_{Itv} = 0$ and $N_{CG} = |CB|$ in the CN model yields a CN that contains only one CG, denoted as CG∞. Scheme 1 in Figure 2 depicts the Raw CN of Makagiga³, a personal information management system. Its CG∞ is constructed from a CB whose cardinality is 324928 ($|CB| = 324928$), and it contains 2358 nodes and 5540 edges.

2.3.2 Growing Calling Network (Growing CN)

Assume $N_{Itv} = 0$ and $N_{CG} = i \cdot N_{Const}$, where $N_{Const}$ is a constant, so that $CB_i = \{cb_k \mid 0 \le k \le i \cdot N_{Const}\}$. This sequence of CGs represents the growth of the Calling Graph over time. Scheme 2 in Figure 2 shows the Growing CN of Makagiga.

³ http://sourceforge.net/projects/makagiga/

2.3.3 Partitioned Calling Network (Partitioned CN)

Letting $N_{Itv}$ and $N_{CG}$ take non-zero values yields the Partitioned CN. Scheme 3 in Figure 2 illustrates the Partitioned CN of Makagiga with $N_{Itv} = N_{CG} = 2000$, which means that $CB_i = \{cb_k \mid (i-1) \cdot 2000 \le k \le i \cdot 2000\}$. Given $|CB| = 324928$, a CN containing 163 CGs is derived ($|CN| = 163$). The 1st, 40th, 80th and 160th CGs in this CN are shown.
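The three schemes differ only in how Strategy 1 is parameterized, which a short sketch can make concrete. The function names below are hypothetical, and one deliberate assumption is made: slices are half-open, so neighboring subsets do not share a boundary record, whereas the set notation above uses an inclusive upper bound. The list of integers merely stands in for a CB of the size reported for Makagiga.

```python
import math

def strategy1(cb, n_itv, n_cg, i):
    """CB_i for 1-based i, as the slice [(i-1)*n_itv, (i-1)*n_itv + n_cg)."""
    start = (i - 1) * n_itv
    return cb[start:start + n_cg]

def raw_cn(cb):
    """Scheme 1: N_Itv = 0, N_CG = |CB| gives a single subset (CG_inf)."""
    return [strategy1(cb, 0, len(cb), 1)]

def growing_cn(cb, n_const):
    """Scheme 2: N_Itv = 0, N_CG = i * N_Const gives growing prefixes."""
    steps = math.ceil(len(cb) / n_const)
    return [strategy1(cb, 0, i * n_const, i) for i in range(1, steps + 1)]

def partitioned_cn(cb, n):
    """Scheme 3 with N_Itv = N_CG = n gives consecutive windows."""
    steps = math.ceil(len(cb) / n)
    return [strategy1(cb, n, n, i) for i in range(1, steps + 1)]

cb = list(range(324928))      # stand-in for Makagiga's calling behavior set
cn = partitioned_cn(cb, 2000)
print(len(cn))                # 163, matching |CN| in the text
```

Each returned subset would then be passed to a CG generation function to obtain the corresponding CG; that step is omitted here to keep the sketch focused on partitioning.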

3. EXPERIMENTS, MEASUREMENTS, ANALYSIS AND APPLICATIONS OF CN

3.1 Data Set and Experiment Implementation

A data set of 10 real-world open-source Java programs was collected, as shown in Table 1. Corendal Wiki⁴ is an intranet wiki application. DLOG4J⁵ is a web blog system based on JSP and Servlet. Endeavour⁶ is a software project management system. FreeMind⁷ is a mind-mapping application. JForum⁸ is a discussion board system. JPetStore is the J2EE demo system introduced in Section 1. Kunagi⁹ is a software project management system based on the agile framework Scrum¹⁰. LogicalDOC¹¹ is a document management system. Makagiga is the personal information management system introduced in Section 2. OpenKM¹² is a multi-platform application for document management.

These programs are strongly heterogeneous in their sizes, design principles and functionalities. Most of them are popular and widely used in practice (except for JPetStore, which is a standard demo system developed by MyBatis¹³ and has also been widely used in research as a benchmark system [15]). For example, Endeavour has been downloaded 73,637 times from SourceForge¹⁴ since April 2nd, 2009.

Table 1 shows basic information about these programs. For each program, the second column shows the program’s architecture (Arch), Web or desktop, and the third column provides the version number. The 4th to 7th columns give the Static Lines of Code (SLoC) and the numbers of packages, classes and methods, respectively. The last column gives the cardinality of CB, which is explained later.

To construct CGs and CN, dynamic execution traces of the target programs have to be collected. The Kieker framework¹⁵ [28], an open-source framework for monitoring software dynamic behavior based on AspectJ [16], was used and extended as the instrumentation framework. The open-source Python package NetworkX¹⁶, which supports the computation of the structure, dynamics and functions of complex networks, was extended and used as the main tool for network data analysis.

⁴ http://sourceforge.net/projects/corendalwiki
⁵ http://sourceforge.net/projects/dlog4j/
⁶ http://sourceforge.net/projects/endeavour-mgmt/
⁷ http://sourceforge.net/projects/freemind/
⁸ http://jforum.net/
⁹ http://kunagi.org/
¹⁰ http://www.scrumalliance.org/
¹¹ http://www.logicaldoc.com/
¹² http://www.openkm.com/
¹³ http://blog.mybatis.org/
¹⁴ http://sourceforge.net/
¹⁵ http://kieker-monitoring.net/
¹⁶ http://networkx.github.io/

Figure 4: Degree distribution of Corendal Wiki’s CG∞. [Log-log plot titled “Corendal Wiki’s CDDF”; x-axis: Node Degree; y-axis: Cumulative Degree Distribution Function; series: Node Degree, In-degree, Out-degree; fitted line: 1408.7126 k^(−1.6089), r2 = −0.97642.]

To generate the CBs of these programs, different usage scenarios, user inputs and test cases were designed according to the programs’ functionalities and documents. These test cases were then used to drive the programs and generate the CBs. The last column of Table 1 lists the cardinality of each program’s CB. In the rest of this section, unless specifically noted, all measurements and analyses are based on these Calling Behavior sets. Experiments were done on an 8-core Intel Xeon server with 16 GB of RAM.

3.2 Raw CN: Statistics and Measurement of CG∞

Table 2 shows the statistical data of the CG∞s of the 10 programs. The second and third columns give the numbers of nodes (n) and edges (m) of the corresponding CG∞. The statistics in the 4th to 6th columns correspond to the undirected version of CG∞.

Real-world complex networks often exhibit a Power-law degree distribution, expressed as $p(k) \sim k^{-\gamma}$, where k is a node’s degree, p(k) is the probability distribution of k, and γ is the scaling exponent. We have found that the CG∞ of every program has an approximately Power-law degree distribution. Furthermore, the scaling exponents (γ) lie in the interval (2, 3) for all programs except JPetStore, whose exponent (1.98) falls marginally below it, which means that CG∞ is Scale-free [2]. Figure 4 shows the Cumulative Degree Distribution function of Corendal Wiki’s CG∞. Because the cumulative distribution of a power law with exponent γ decays as $k^{-(\gamma - 1)}$, the fitted slope of −1.6089 in Figure 4 corresponds to γ ≈ 2.61, the value listed in Table 2.

The 5th column shows the average path length of CG∞, which is defined as:

$$ l = \frac{1}{n(n-1)} \sum_{i \neq j} d_{ij} $$

where $d_{ij}$ is the shortest path length between nodes i and j. The 6th column gives the average clustering coefficient of CG∞. The clustering coefficient of a node i is defined as:

$$ C(i) = \frac{e_i}{k_i (k_i - 1)/2} $$

where $k_i$ is node i’s degree and $e_i$ is the number of edges between node i’s neighbors.

The last 2 columns in Table 2 give the average clustering coefficient and the average path length of the corresponding Erdős-Rényi (ER) random graph [8], which has the same number of nodes and the same connection probability as CG∞. The ER random graph is one of the most commonly used random graph models; it is generated by connecting pairs of nodes randomly with a given connection probability p [8]. Based on Table 2, it can be concluded that $C \gg C_{ER}$ and $l \approx l_{ER}$, which means that CG∞ is a typical Small-world network [29].

To sum up, CG∞ usually has a Power-law degree distribution and is a typical Scale-free and Small-world complex network.

Table 1: Experiment subject programs

Programs       Arch     Version  SLoC     # Package  # Class  # Method  |CB|
Corendal Wiki  Web      3.0.2    49,083   21         513      3,147     6,584
DLOG4J         Web      1.4.2    85,112   38         540      5,384     8,327
Endeavour      Web      1.21     18,312   6          216      1,706     39,322
FreeMind       Desktop  0.9.0    53,669   34         960      5,974     192,694
JForum         Web      2.1.9    65,040   42         397      2,991     42,516
JPetStore      Web      6.0      1,893    4          24       289       2,099
Kunagi         Web      0.23     176,486  130        2,792    18,021    198,259
LogicalDOC     Web      6.7.0    131,888  103        2,058    8,692     160,685
Makagiga       Desktop  3.8.2    156,906  65         2,020    10,356    324,928
OpenKM         Web      6.2.2    N/A      N/A        N/A      N/A       249,990

Table 2: Basic statistics of the 10 programs’ CG∞s in Raw CNs

Programs       n      m      γ     l     C     C_ER   l_ER
Corendal Wiki  475    1,030  2.61  4.27  0.13  0.007  4.30
DLOG4J         517    748    2.49  4.46  0.02  0.002  5.74
Endeavour      630    1,799  2.43  3.97  0.07  0.011  3.95
FreeMind       299    865    2.34  3.02  0.25  0.014  3.54
JForum         716    1,506  2.63  4.44  0.07  0.008  4.58
JPetStore      222    328    1.98  3.40  0.04  0.003  5.19
Kunagi         781    1,352  2.54  5.31  0.10  0.004  5.32
LogicalDOC     892    3,684  2.51  3.87  0.06  0.010  3.44
Makagiga       2,358  5,540  2.81  4.75  0.05  0.001  5.27
OpenKM         1,390  2,525  2.03  4.08  0.09  0.002  5.72

The preceding results lay a good foundation for further applying Complex Network theory to evaluating software dynamic structure and behavior. To show an application of CG∞, an experiment was designed and conducted. The CG∞ of JForum was constructed based on JForum’s posting operation. Then 5 faulty versions of JForum were constructed. For the 1st version, we randomly selected 5 methods and injected a 50 ms delay into each of them. Then, based on the CG∞, each node’s Betweenness Centrality [5], Closeness Centrality [11], Communicability Centrality [10] and Load Centrality [22] were computed. The 2nd to 5th faulty versions of JForum were constructed by injecting a 50 ms delay into the 5 methods with the largest values of each of these centrality metrics, respectively. We then used Apache JMeter¹⁷ to simulate 50 concurrent users performing the same operations as those used to generate CG∞. For each version, the experiment was repeated 10 times. The average bandwidths of these experiments are illustrated in Figure 5. “Normal” in Figure 5 represents the normal version of JForum, while “Random” to “Load” represent the respective faulty versions. The error bars in Figure 5 depict standard deviations.

This experiment shows that the centrality values of nodes in CG∞ are useful in identifying the significant modules of a software system with respect to the system’s performance. These centrality values are also helpful in a software system’s performance optimization process, as the methods with large centrality values should be optimized preferentially.

¹⁷ http://jmeter.apache.org/

Figure 5: Fault injection experiment on JForum. [Bar chart titled “JForum Fault Injection Experiment”; y-axis: Average User Bandwidth (KB/s), 0 to 600; x-axis: Normal and Faulty Versions (Normal, Random, Between, Closeness, Commun, Load).]

The performance degradation introduced by nodes with large centrality values is much more significant than in the random case. In Complex Network theory, the importance of a node is quantified by its centrality values [2]. The results of this experiment are in accordance with the perceptions of Complex Network theory. This experiment shows the potential applications of CG∞. Many theoretical results of Complex Network theory and graph algorithms can be further applied to the analysis of CG∞.
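The two Small-world quantities of Section 3.2 (average path length l and clustering coefficient C(i)) and one of the four centralities used in the experiment (Closeness Centrality) can all be derived from breadth-first search on the undirected graph. The following is a minimal standard-library sketch, not the NetworkX-based tooling used in the paper. The function names and the toy graph are hypothetical, the graph is assumed to be connected, C(i) is taken as 0 for nodes with fewer than two neighbors, and closeness uses the common normalization (n − 1) divided by the sum of distances.

```python
from collections import deque

def undirected(edges):
    """Adjacency sets for the undirected version of a graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj):
    """l = 1/(n(n-1)) * sum of d_ij over ordered pairs i != j."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, s).values()) for s in adj)
    return total / (n * (n - 1))

def clustering(adj, i):
    """C(i) = e_i / (k_i (k_i - 1) / 2); 0 when k_i < 2 (assumption)."""
    nbrs = list(adj[i])
    k = len(nbrs)
    if k < 2:
        return 0.0
    e = sum(1 for a in range(k) for b in range(a + 1, k)
            if nbrs[b] in adj[nbrs[a]])
    return e / (k * (k - 1) / 2)

def closeness(adj, i):
    """(reachable nodes - 1) / sum of distances from i."""
    dist = bfs_distances(adj, i)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Toy graph: triangle a-b-c with a pendant node d attached to c.
adj = undirected([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
avg_c = sum(clustering(adj, v) for v in adj) / len(adj)
ranked = sorted(adj, key=lambda v: closeness(adj, v), reverse=True)
```

In this toy graph the node c is ranked first by closeness; in the spirit of the experiment above, the top-ranked methods would be the candidates for delay injection or for preferential optimization.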

3.3 Growing CN: Densification Power Law

In recent years, it has been discovered that real-world networks often exhibit a consistent tendency in their evolution: they usually become denser over time, which is not in accordance with some widely used models, e.g., the preferential attachment model [1, 23]. Furthermore, the densification process exhibits a consistent relation, often expressed as [17, 18]:

$$ e(t) \propto n(t)^{a} \tag{2} $$

where e(t) and n(t) are the numbers of edges and nodes of the network at time t, and a is an exponent between 1 and 2. This relation is named the Densification Power Law [17, 18]. It means that a network usually becomes denser during its evolution, with the number of edges growing superlinearly in the number of nodes as defined in Equation 2.

During the execution of a software system, its CN also evolves. Software starts from an entry point, e.g., the main() method of a Java program, and more methods join the CN as the software executes and performs various computation jobs (as shown in Scheme 2 in Figure 2). Growing CN can represent this evolution process. Figure 6 shows the results for the Growing CNs of 4 programs.
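The exponent a in Equation 2 is commonly estimated as the slope of a least-squares line through the points (log n(t), log e(t)). The sketch below shows one way to collect these points from a growing call sequence and fit the slope with the standard library only. The function names, the sampling step and the toy data are assumptions made for illustration; the synthetic series is constructed to follow e = n^1.5 exactly and is not a measurement.

```python
import math

def growth_series(calls, step):
    """(n(t), e(t)) sampled after every `step` calls of a growing sequence."""
    nodes, edges, series = set(), set(), []
    for k, (caller, callee) in enumerate(calls, start=1):
        nodes.update((caller, callee))
        edges.add((caller, callee))
        if k % step == 0:
            series.append((len(nodes), len(edges)))
    return series

def fit_exponent(series):
    """Least-squares slope of log e against log n, i.e. a in e ∝ n^a."""
    xs = [math.log(n) for n, _ in series]
    ys = [math.log(e) for _, e in series]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical call sequence of (caller, callee) pairs.
calls = [("main", "a"), ("a", "b"), ("main", "b"), ("a", "b")]
samples = growth_series(calls, step=2)

# Synthetic points lying exactly on e = n^1.5, used to check the fit.
synthetic = [(n, n ** 1.5) for n in (10, 100, 1000, 10000)]
a = fit_exponent(synthetic)
```

A fitted slope between 1 and 2 on real traces would be consistent with the densification behavior described above; the fit needs at least two samples with different node counts.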

In Figure 6, the number of edges is plotted against the number of nodes on a log-log scale. The straight line is the linear regression fit; its slope and the correlation coefficient are also shown. These results are quite surprising: the Growing CNs of all the experiment programs seem to strictly obey the Densification Power Law. We believe that the Densification Power Law nature of Growing CNs is related to essential mechanisms of software systems’ dynamics. Is the exponent a quality indicator, or a parameter related to the functionalities of a software system? This interesting phenomenon needs further research.

3.4 Partitioned CN: Stability of Entropy and Entropy-based Applications

3.4.1 Entropy Based Method

In CN, the weights of edges can represent the execution frequencies of method calls. Since these frequencies encode the basic dynamics and method call patterns of a software system, a method from Complex Network theory is used to quantify the method call patterns. For a node i in CN, the Local Entropy [2] of i is:

$$ LE(i) = -\frac{1}{\ln k_i} \sum_{j \in V(i)} \frac{w_{ij}}{s(i)} \ln \frac{w_{ij}}{s(i)} \tag{3} $$

where $k_i$ is node i’s degree and $s(i)$ is node i’s vertex strength, defined as $s(i) = \sum_{j \in V(i)} w_{ij}$, where $V(i)$ is the set of nodes adjacent to node i and $w_{ij}$ is the weight of the edge between node i and node j. Local Entropy quantifies the heterogeneity of the weights of node i’s edges. It ranges from 0, when all invocations involving i are concentrated on a single link, to the maximal value 1, when i’s strength is distributed evenly across its links (homogeneous method calls).

From the perspective of software engineering practice, a method is usually called in a relatively fixed pattern, such as “in what program state, it should be called by what method for how many times”. This intuition suggests that the Local Entropy of a method may not change drastically in Partitioned CN, which would make it a stable metric for quantifying a method’s behavior pattern. The following experiment was conducted to verify this hypothesis.

Based on the previously introduced data, we used different values of $N_{CG}$ in Equation 1 (with $N_{Itv} = N_{CG}$) and constructed the Partitioned CNs of 7 programs (Corendal Wiki, DLOG4J and JPetStore were not analyzed in this experiment because their CBs are too small). Basic statistics of the Partitioned CNs are shown in Table 3. The values of $N_{CG}$ were varied across programs for the sake of generality.

Table 3: The size of Partitioned CNs of 7 systems

Programs    |CB|    N_CG  |CN|
Endeavour   39322   2000  19
FreeMind    192694  3500  56
JForum      42516   1500  29
Kunagi      198259  2500  80
LogicalDOC  160685  2500  65
Makagiga    324928  2000  163
OpenKM      249990  4000  63

Figure 8 shows the values of several metrics in the Partitioned CNs of 2 programs. Most of the metrics vary significantly in Partitioned CN. On the other hand, the Local Entropy of each method does not exhibit significant change. It can also be noticed that a certain method may not appear in every CG of the Partitioned CN, but its Local Entropy values differ little among the CGs in which the method appears.

Table 4: Local Entropy’s Sample Variance of 7 programs (each entry: percentage of methods, with the number of methods in parentheses)

Programs    Methods  0              0–0.001       0.001–0.1      0.1–0.3
Endeavour   629      57.07% (359)   13.35% (84)   26.07% (164)   3.50% (22)
FreeMind    298      52.35% (156)   11.41% (34)   27.85% (83)    8.39% (25)
JForum      715      76.08% (544)   10.63% (76)   11.19% (80)    2.10% (15)
Kunagi      780      70.38% (549)   8.33% (65)    19.23% (150)   2.05% (16)
LogicalDOC  891      63.30% (564)   10.77% (96)   25.36% (226)   0.56% (5)
Makagiga    2357     69.54% (1639)  10.99% (259)  17.86% (403)   2.38% (56)
OpenKM      1389     77.97% (1083)  11.45% (159)  7.27% (101)    3.31% (46)
TOTAL       7059     69.33% (4894)  10.95% (773)  17.10% (1207)  2.62% (185)

Table 4 shows the Sample Variance of the methods’ Local Entropy values in these 7 programs’ Partitioned CNs. Each entry gives the percentage of methods followed by the number of the corresponding methods. For 69.33% of the methods of these 7 programs, the Local Entropy does not change at all in Partitioned CN. For these methods, Local Entropy can be used in runtime failure diagnosis and localization; the following part of this section gives the details. The Local Entropy values of most of the other methods change only slightly. This experiment confirms the hypothesis that the Local Entropy of a method usually does not change drastically in Partitioned CN.
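Equation 3 and the per-method Sample Variance reported in Table 4 can be sketched as follows. This is an illustrative reading of the formula and not the authors' implementation. The function names and the dictionary-based slice format are hypothetical, the Local Entropy of a node with a single link is taken to be 0 (Equation 3 is undefined there because ln 1 = 0), and all weights are assumed to be positive call counts.

```python
import math
from statistics import variance

def local_entropy(weights):
    """LE(i) from Equation 3, given the weights w_ij of node i's links."""
    k = len(weights)
    if k < 2:
        return 0.0  # assumption: a single link counts as fully concentrated
    s = sum(weights)  # vertex strength s(i)
    return -sum((w / s) * math.log(w / s) for w in weights) / math.log(k)

def le_sample_variance(slices, method):
    """Sample variance of LE over the CGs in which `method` appears.

    `slices` is a list of dicts mapping a method to {neighbor: weight};
    this format is an assumption made for the sketch.
    """
    values = [local_entropy(list(s[method].values()))
              for s in slices if method in s]
    return variance(values) if len(values) > 1 else 0.0

# Hypothetical Partitioned CN with three slices. Method "m" is called more
# often in the second slice but in the same proportions, and it does not
# appear in the third slice at all.
slices = [
    {"m": {"a": 3, "b": 3}},
    {"m": {"a": 10, "b": 10}},
    {"other": {"a": 1}},
]
var_m = le_sample_variance(slices, "m")
```

Because Local Entropy depends only on the proportions of the weights, scaling all call counts of a method by the same factor leaves it unchanged, which is one intuition for why it can remain stable across slices whose absolute call counts differ.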

3.4.2

Runtime Fault Diagnosis and Localization

In this section, an experiment aiming to show the application of Local Entropy was conducted. In this experiment, we manually injected a runtime fault, which could cause an “Array Index Out of Bounds” exception with 40% probability if the current time of the server exceeds a pre-defined time, into a method of JPetStore. Then we used Apache JMeter to simulate 40 concurrent users: 50% of them were browsing users, and the others had purchasing behaviors.

http://doi.acm.org/2693208.2693223

ACM SIGSOFT Software Engineering Notes

Page 7

January 2015 Volume 40 Number 1

Figure 6: Number of nodes versus number of edges in Growing CN s Law, which may reveal the underlying mechanism of a software system’s dynamics. We have proposed using the Local Entropy in Partitioned CN to quantify a method’s behavior pattern. Experiment has been conducted to prove the stability of Local Entropy. We also have conducted experiment to prove that Partitioned CN and the notion of Local Entropy are helpful in software runtime fault diagnosis and localization. In the future, we plan to use more advanced data mining algorithms in the analysis of CN.

5.

Figure 7: Local Entropy of different methods. The result of this experiment is shown in Figure 7. The “faulty method” in Figure 7 represents the method contained the injected fault; the “called method” is one of the methods called by the faulty method, while m1 and m2 are two methods which don’t have direct invocation relations with the faulty method. As shown in Figure 7, before the injected fault was triggered, Local Entropies of all the 4 methods didn’t have drastic changes (the faulty method and m1’s Local Entropies didn’t change in Partitioned CN ). After the 25th slice of Partitioned CN (when the fault was triggered), the Local Entropy of the faulty method exhibited drastic changes, while the Local Entropies of the other 3 methods, including the called method, remained in the same trend as before. By analyzing Local Entropies in Partitioned CN, one can immediately identify the abnormal deviation of the faulty method’s behavior pattern. The partitioned and dynamic nature of Partitioned CN makes it possible to dynamically update and re-evaluate the Local Entropy value, which is the key factor in this experiment.

4.

CONCLUSIONS

In this paper, we have proposed a new model – Calling Network, to describe a software system’s runtime method call behaviors. Three CN generation schemes are proposed: Raw CN, Growing CN and Partitioned CN. It has been shown that like other previously proposed networks, Raw CN also has Power-Law degree distribution, and is a typical Small-world and Scale-free network. Experiment shows that centrality values in Raw CN are useful in significant module identification and performance optimization for a software system. It has been discovered that Growing CN evolves strictly obey the newly discovered Densification Power

DOI:10.1145/2693208.2693223

ACKNOWLEDGMENTS

This paper was supported by the National Natural Science Foundation of China (91118005, 91218301, 61221063, 61203174 and 61202392), Key Projects in the National Science & Technology Pillar Program (2011BAK08B02), Doctoral Fund of Ministry of Education of China (20110201120010) and the Fundamental Research Funds for the Central Universities. We would also like to thank the anonymous reviewers for their valuable comments and suggestions for improving this paper.

6.

REFERENCES

[1] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[2] A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical processes on complex networks. Cambridge University Press, 2008.
[3] G. Baxter, M. Frean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, and E. Tempero. Understanding the shape of Java software. In ACM SIGPLAN Notices, volume 41, pages 397–412. ACM, 2006.
[4] P. Bhattacharya, M. Iliofotou, I. Neamtiu, and M. Faloutsos. Graph-based analysis and prediction for software evolution. In Proceedings of the 2012 International Conference on Software Engineering, pages 419–429. IEEE Press, 2012.
[5] U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163–177, 2001.
[6] E. M. Clarke, O. Grumberg, and D. E. Long. Model checking and abstraction. ACM Transactions on Programming Languages and Systems (TOPLAS), 16(5):1512–1542, 1994.
[7] F. Eichinger, K. Böhm, and M. Huber. Mining edge-weighted call graphs to localise software bugs. In Machine Learning and Knowledge Discovery in Databases, pages 333–348. Springer, 2008.
[8] P. Erdős and A. Rényi. On random graphs I. Publ. Math. Debrecen, 6:290–297, 1959.
[9] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. Software Engineering, IEEE Transactions on, 27(2):99–123, 2001.
[10] E. Estrada and N. Hatano. Communicability in complex networks. Physical Review E, 77(3):036111, 2008.
[11] L. C. Freeman. Centrality in social networks conceptual clarification. Social Networks, 1(3):215–239, 1979.
[12] P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing. In ACM SIGPLAN Notices, volume 40, pages 213–223. ACM, 2005.
[13] GoPivotal incorporation. http://static.springsource.org/spring/docs/3.2.x/springframework-reference/html/overview.html#overviewdependency-injection, 2013.
[14] S. L. Graham, P. B. Kessler, and M. K. McKusick. Gprof: A call graph execution profiler. ACM SIGPLAN Notices, 17(6):120–126, 1982.
[15] M. Grechanik, C. Fu, and Q. Xie. Automatically finding performance problems with feedback-directed learning software testing. In Software Engineering (ICSE), 2012 34th International Conference on, pages 156–166. IEEE, 2012.
[16] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten, J. Palm, and W. G. Griswold. An overview of AspectJ. In ECOOP 2001 – Object-Oriented Programming, pages 327–354. Springer, 2001.
[17] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 177–187. ACM, 2005.
[18] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.
[19] D. Lorenzoli, L. Mariani, and M. Pezzè. Automatic generation of software behavioral models. In Proceedings of the 30th international conference on Software engineering, pages 501–510, Leipzig, Germany, 2008. ACM.
[20] P. Louridas, D. Spinellis, and V. Vlachos. Power laws in software. ACM Transactions on Software Engineering and Methodology (TOSEM), 18(1):2, 2008.
[21] C. R. Myers. Software systems as complex networks: Structure, function, and evolvability of software collaboration graphs. Physical Review E, 68(4):046116, 2003.
[22] M. E. Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132, 2001.
[23] M. E. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.
[24] A. Potanin, J. Noble, M. Frean, and R. Biddle. Scale-free geometry in OO programs. Communications of the ACM, 48(5):99–103, 2005.
[25] L. Šubelj and M. Bajec. Community structure of complex software systems: Analysis and applications. Physica A: Statistical Mechanics and its Applications, 390(16):2968–2975, 2011.
[26] L. Šubelj and M. Bajec. Software systems through complex networks science: Review, analysis and applications. In Proceedings of the First International Workshop on Software Mining, pages 9–16. ACM, 2012.
[27] S. Valverde and R. V. Solé. Hierarchical small worlds in software architecture. arXiv preprint cond-mat/0307278, 2003.
[28] A. van Hoorn, J. Waller, and W. Hasselbring. Kieker: A framework for application performance monitoring and dynamic software analysis. In Proceedings of the third joint WOSP/SIPEW international conference on Performance Engineering, pages 247–248. ACM, 2012.
[29] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, 1998.
[30] R. Wheeldon and S. Counsell. Power law distributions in class relationships. In Source Code Analysis and Manipulation, 2003. Proceedings. Third IEEE International Workshop on, pages 45–54. IEEE, 2003.
[31] Q. Zheng, Z. Ou, T. Liu, Z. Yang, Y. Hou, and C. Zheng. Software structure evaluation based on the interaction and encapsulation of methods. Science China Information Sciences, 55(12):2816–2825, 2012.

Figure 8: Different metrics in Calling Graph sequence of Partitioned CN. EN represents number of edges, AD represents average degree, ACC means the average clustering coefficient, and LE represents Local Entropy.

http://doi.acm.org/2693208.2693223