CVSSA: Cross-architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph

Hong Lin, Dongdong Zhao, Linjun Ran, Mushuai Han, Jing Tian, Jianwen Xiang*
Key Laboratory of Transportation of Internet of Things, School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
{linhonghong, zdd, linjun_ran, hanmushuai, jtian, jwxiang}@whut.edu.cn

Xian Ma*, Yingshou Zhong
The State Grid Qinghai Electric Power Company, Xining, Qinghai, 810008, China

Abstract: Nowadays, an increasing number of IoT vendors compile and deploy third-party code bases across different architectures. To keep firmware from being affected by the same known vulnerabilities, searching for known vulnerabilities in binary firmware across different architectures is therefore more crucial than ever. However, most existing vulnerability search methods are limited to a single architecture, and the few methods that address the cross-architecture case have limited accuracy. In this paper, to improve the accuracy of cross-architecture vulnerability search, we propose a new approach based on a Support Vector Machine (SVM) and the Attributed Control Flow Graph (ACFG) that searches for known vulnerabilities in firmware across different architectures at the function level. Given a known vulnerable function, we recognize suspicious functions in other binary firmware. First, considering both the internal and external characteristics of functions, we extract function-level and basic-block-level features of the functions to be inspected. Second, we use an SVM on the function-level features to select a small set of suspicious functions. After this preliminary screening, we compute the graph similarity between the vulnerable function and each suspicious function based on their ACFGs. We have implemented our approach, CVSSA, and trained the model with training samples that encode prior knowledge, which improves accuracy. We also searched for several vulnerabilities in real-world firmware images; the experimental results show that CVSSA can be applied to realistic scenarios.

Keywords: firmware security; cross-architecture; SVM; bipartite matching

I. INTRODUCTION

In general, firmware refers not only to the interface between hardware and software but also to the software residing in the hardware. Firmware is an important part of IoT systems: the BIOS in computer systems, the programs in extension ROMs, and the executable programs of common network devices such as routers, switches and webcams are typical firmware. However, like ordinary software, firmware may contain vulnerabilities that expose IoT systems to risk [1]. ZoomEye's statistical report showed that, in the 2013 incident of backdoors found in D-Link routers, more than 10 router models were affected, as were 23% of the more than 60,000 publicly searchable routers. According to the 2014 OWASP report, insecure software and firmware in embedded devices ranked ninth among the top 10 IoT vulnerabilities [2]. As more and more security incidents are caused by malicious firmware [3], the importance of vulnerability search in firmware has become widely recognized. However, even when we know of vulnerabilities in a specific firmware, it is difficult to find the same vulnerabilities in other firmware across different architectures, because vendors may compile and deploy the same code for various architectures without our knowledge, and we know little about the internal connections among these builds. Since the source code of most firmware is hard to obtain, known vulnerability search mainly works on binary code. There are many approaches to searching for known vulnerabilities at the binary level. However, most existing approaches either rely on dynamic analysis or are limited to a single architecture. Dynamic analysis of firmware generally requires the specific device or an emulation environment to run the target binary, and it imposes strict requirements on the target code, so it is laborious to apply to cross-architecture vulnerability search. Other approaches, such as k-gram comparison and sequence alignment of instructions, depend closely on the specific architecture to obtain opcodes or instructions for analysis, so they are difficult to apply directly to known vulnerability search across different architectures. After Pewny et al.
[4] presented the pioneering work on similarity comparison of known binary vulnerabilities across different architectures, only limited research has addressed cross-architecture known vulnerability search. One recent advanced method, proposed by Eschweiler et al. [5], employs a pre-screening step to screen out most of the dissimilar functions in a binary and then uses a maximum common subgraph (MCS) algorithm to find the true matching function among the few remaining suspicious functions. Although this method is effective in cross-architecture cases, its pre-screening stage appears unreliable and lowers accuracy. In this paper, to improve accuracy, we adopt a staged strategy similar to Eschweiler's to recognize known vulnerabilities in firmware across ARM, MIPS and x86, and we obtain higher accuracy in the pre-screening stage. First, considering the call relationships, basic information and structure of functions, we extract typical function features that vary little across architectures. Second, in the pre-screening stage, instead of obtaining suspicious functions with kNN as in [5], we use an SVM trained with prior knowledge, which we show to be more accurate than kNN. Third, inspired by the work of Feng et al. [6], we use bipartite matching to pick out the true matching function from the suspicious functions left by the pre-screening stage; the experimental results show that CVSSA achieves good accuracy. To sum up, our major contributions are as follows:
• We present CVSSA, a staged approach to searching for vulnerabilities in binary firmware across different architectures that takes advantage of what is already known about the vulnerabilities.
• We demonstrate the performance of CVSSA on two datasets. The results show that CVSSA improves the accuracy of the pre-screening stage of vulnerability search and can be applied to realistic scenarios.

II. APPROACH OVERVIEW

In this paper, we divide the framework of CVSSA into three stages, as shown in Fig. 1. In the first stage, we use IDA Pro [7] to pre-process the firmware. After disassembly, the firmware consists of several binary files, each containing many functions, so to recognize a known vulnerability in the firmware we inspect the suspicious functions in these binary files. To prepare for similarity computation in the next two stages, we extract function-level and basic-block-level features that vary little across architectures. Function-level features describe basic properties of a function, such as its set of strings and its number of function calls. Basic-block-level features annotate the nodes of the function's ACFG and support a finer-grained, more accurate comparison. In the second stage, we compute the similarity vector of each pair of compared functions from their function-level features and feed it to the trained SVM model, whose output label tells us whether the two functions are similar. In this way, we screen out most of the dissimilar functions and leave a small set of suspicious functions to be inspected in the next stage. In the third stage, we use the ACFG of each function to identify the real matching function among the suspicious functions. An ACFG is a directed graph whose nodes are basic blocks annotated with features and whose edges link associated basic blocks [6]. To exploit these features, we use bipartite graph matching to inspect the suspicious functions based on their ACFGs, which achieves good results.

Fig. 1. The framework of CVSSA (data pre-processing of binary files F and G into function-level and basic-block-level features; function pre-screening by SVM on the similarity vector of the vulnerability function f and a target function g; graph similarity of their ACFGs by bipartite matching; result analysis)

III. IMPLEMENTATION
In this section, we discuss the implementation of CVSSA.
A. Data preparation
Dataset I - Baseline assessment. For the baseline assessment, we compiled OpenSSL v1.0.1f and BusyBox v1.21 for three architectures, MIPS, x86 and ARM, as [4, 5] did, which generated about 36,000 functions. The three chosen architectures are popular in IoT devices, whose firmware is usually distributed without source code. All binaries we compiled are unstripped, so we can use symbolic information, such as function names, as the ground truth for the assessment.
Dataset II - Real-world firmware dataset. To evaluate the practicability of our approach, we applied it to real-world vulnerable functions, including five vulnerable functions in OpenSSL and the heartbeat function in DD-WRT (r21676) [10], and to a firmware image dataset including four D-Link router firmware images: DIR-815, DIR-300, DIR-600 and DIR-645. These vulnerable functions and firmware images come from widely deployed software and devices, which makes them useful for validating our method.

TABLE I. TWO LEVEL FEATURES USED IN CVSSA
Type | Feature Name
Function level | No. of Function Calls; No. of Logic Instructions; No. of Redirection Instructions; No. of Transfer Instructions; Size of Local Variable; No. of Basic Blocks; No. of Edges; No. of Incoming Calls; No. of Instructions
Basic-block level | No. of String Constants; No. of Function Calls; No. of Control Transfer Instructions; No. of Arithmetic Instructions; No. of Incoming Calls; No. of Instructions; No. of offspring; Betweenness

B. Data preprocessing
Because IDA Pro can handle binary files for a wide range of architectures, we used it to disassemble the firmware and used its API to write plugins that extract the features of functions. We extracted features at two levels, function level and basic-block level, using the same types of features as [5], as shown in TABLE I. The function-level features are used for the preliminary screening, and the basic-block-level features serve the graph matching of the ACFGs.
C. Functions pre-screening based on SVM
To screen out most of the dissimilar functions and accelerate the vulnerability search, we use an SVM on the function-level features to select a small set of suspicious functions to be inspected in the next stage.
1) Computing similarity vectors
Given compared functions $f$ and $g$, each represented by the 9-dimensional function-level feature vector of TABLE I, we calculated the similarity of each pair of corresponding dimensions to obtain a similarity vector. This similarity vector is fed into the SVM trained on the training set, which outputs the label "1" or "0" to indicate whether the two functions are similar. Following [8], we computed each dimension of the similarity vector as:

$$ \mathrm{sim} = \begin{cases} c, & v_f = 0 \text{ and } v_g = 0 \\ 1 - \dfrac{|v_f - v_g|}{\max(v_f, v_g)}, & \text{otherwise} \end{cases} \tag{1} $$

In (1), $c$ is a constant between 0 and 1, and $v_f$ and $v_g$ are the values of the corresponding dimension of the feature vectors of $f$ and $g$.
2) Construction of training samples
Since the architecture of the known vulnerable function and the architectures of the suspicious functions are both known, we can take full advantage of this prior knowledge to improve the accuracy of searching for a known vulnerability from one specific architecture in another. We therefore constructed the training samples for the SVM model as follows. Taking ARM to MIPS as an example, we compiled the source code of BusyBox v1.21 for ARM and MIPS separately and obtained two unstripped binary files A and M. If $f$ is a function in A, there is a function $f'$ with the same name in M. We took the similarity vector between $f$ and $f'$ as a positive training sample with label "1". We then randomly selected ten functions $g_i$ in M whose names differ from that of $f$, and took the similarity vector between $f$ and each $g_i$ as a negative training sample with label "0". The ratio of positive to negative training samples is thus 1:10. Although we only present the ARM-to-MIPS case here, training samples were constructed in the same way for the other conditions, such as ARM to x86 and MIPS to x86.

TABLE II. EVALUATION WITH OPENSSL ON SVM AND KNN
From→To | Avg. No. of Candidates (kNN) | Avg. No. of Candidates (SVM) | Percent Correct (%) (kNN) | Percent Correct (%) (SVM)
ARM→MIPS | 128 | 110 | 99.1 | 99.7
MIPS→ARM | 128 | 110 | 98.9 | 99.7
ARM→x86 | 128 | 109 | 57.1 | 99.1
x86→ARM | 128 | 109 | 88.9 | 99.1
MIPS→x86 | 128 | 77 | 58.6 | 99.4
x86→MIPS | 128 | 77 | 95.3 | 99.4

3) Evaluation of the functions pre-screening based on SVM
To evaluate the trained SVM model, we used OpenSSL v1.0.1f as the test samples. Taking ARM to MIPS as an example again, we compiled the source code of OpenSSL v1.0.1f for ARM and MIPS and obtained two unstripped binary files A and M. Given a function $f$ in A, we computed the similarity vector between $f$ and every function in M and fed the resulting set of similarity vectors into the trained SVM model. From its output we obtained a small set of suspicious functions in M that the model marked as similar to $f$. Our goal is to pick out, from these suspicious functions, the function in M that truly matches $f$. However, in some cases the true matching function is not among the suspicious functions.
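The three steps above (Eq. (1), the 1:10 training sample construction, and screening a target binary with the trained model), together with the correctness measure used for TABLE II (the share of query functions whose true match survives screening), can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' MATLAB implementation: the value c = 0.5, all function names, and the `is_similar` predicate (a hypothetical stand-in for the trained SVM's prediction) are illustrative, and feature vectors are assumed to be already extracted into name-to-vector dictionaries.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]


def feature_similarity(vf: float, vg: float, c: float = 0.5) -> float:
    """Similarity of one feature dimension, Eq. (1).

    c (0 < c < 1) is used when both values are zero; 0.5 is a placeholder.
    Feature values are assumed to be non-negative counts or sizes.
    """
    if vf == 0 and vg == 0:
        return c
    return 1 - abs(vf - vg) / max(vf, vg)


def similarity_vector(ff: Features, fg: Features, c: float = 0.5) -> List[float]:
    """Apply Eq. (1) dimension by dimension to two function-level vectors."""
    return [feature_similarity(a, b, c) for a, b in zip(ff, fg)]


def build_training_samples(src: Dict[str, Features], dst: Dict[str, Features],
                           negatives: int = 10, c: float = 0.5,
                           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Positive/negative samples (1:10) from two unstripped builds.

    src and dst map function names to feature vectors (e.g. BusyBox built
    for ARM and MIPS). A same-name pair gives label 1; randomly chosen,
    differently named functions in dst give label 0.
    """
    rng = random.Random(seed)
    X: List[List[float]] = []
    y: List[int] = []
    for name, feats in src.items():
        if name not in dst:
            continue  # no same-name counterpart on the other architecture
        X.append(similarity_vector(feats, dst[name], c))
        y.append(1)
        others = sorted(n for n in dst if n != name)
        for other in rng.sample(others, min(negatives, len(others))):
            X.append(similarity_vector(feats, dst[other], c))
            y.append(0)
    return X, y


def prescreen(query: Features, pool: Dict[str, Features],
              is_similar: Callable[[List[float]], bool],
              c: float = 0.5) -> List[str]:
    """Keep the functions in pool whose similarity vector is labeled similar."""
    return [name for name, feats in pool.items()
            if is_similar(similarity_vector(query, feats, c))]


def true_match_ratio(queries: Dict[str, Features], pool: Dict[str, Features],
                     is_similar: Callable[[List[float]], bool],
                     c: float = 0.5) -> float:
    """Share of queries whose same-name counterpart survives pre-screening."""
    hits = sum(name in prescreen(feats, pool, is_similar, c)
               for name, feats in queries.items())
    return hits / len(queries)
```

Any SVM implementation can be trained on the returned `(X, y)` pair; its prediction, wrapped as a boolean function of one similarity vector, is then passed as `is_similar`.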
If the true matching function is not among the suspicious functions obtained in the pre-screening stage, the work in the next stage is meaningless; we therefore assessed the correctness of the pre-screening by the ratio of true matching functions retained. TABLE II presents the results of the evaluation with OpenSSL for SVM and kNN. In [5], Eschweiler et al. used kNN for pre-screening, which simply keeps the functions closest to the vulnerable function in Euclidean distance. Although kNN is easy to implement, it is not very reliable. Unlike kNN, an SVM is trained on a training set and learns from known knowledge, which helps it handle new conditions. Once the SVM model is trained, new data can be processed easily and accurately.

Fig. 2. The ratio of the true matching functions for different k values in the condition of from x86 to MIPS
Fig. 3. Bipartite graph matching (ACFG G1 with nodes v11, ..., v14 and ACFG G2 with nodes v21, ..., v24 as the two parts V1 and V2)

In TABLE II, "From→To" means that we search for a known vulnerability from one architecture in another architecture, as in [4]. The table presents the ratio of true matching functions under each condition and the average number of suspicious functions per query. The accuracy of SVM exceeds 99% for every architecture combination, whereas the accuracy of kNN is unstable, with a minimum of only 57.1%. In addition, the average number of suspicious functions for SVM (77 to 110) is smaller than k, which is set to 128 in [5]. As shown in Fig. 2, the ratio of true matching functions from x86 to ARM changes with k: the smaller k is, the lower the accuracy, and vice versa. However, even when k is large enough, the accuracy of kNN only approaches that of SVM. Therefore, SVM achieves higher accuracy than kNN and does not require choosing a value of k.
D. Graph similarity based on bipartite matching
In this stage, our task is to pick out the true matching functions from the suspicious functions based on their ACFGs. An ACFG is essentially a CFG whose basic blocks carry features that help the graph matching process, so we transform the function matching problem into a graph matching problem. Several methods can handle this problem, such as the MCS used in [5] and the bipartite matching used in [6]. Since [6] argued that bipartite matching is a better choice than MCS, we chose bipartite matching to compute the graph similarity of the suspicious functions' ACFGs, which helps us find the true matching functions.
1) Bipartite graph matching based on ACFGs
As shown in Fig. 3, following [6], we take ACFG $G_1$ and ACFG $G_2$ as the two parts of a bipartite graph.
In [6], the combined bipartite graph is $G = (V, E)$, where $V$ is the set of all nodes in $G_1$ and $G_2$ and $E$ is the set of edges. As already mentioned, the nodes are the basic blocks of the ACFGs; each edge links a pair of associated basic blocks and is assigned a cost that measures the matching distance between them. There are many possible matchings from the nodes in $G_1$ to the nodes in $G_2$; however, we only need the matching with the minimum total distance over all matched basic blocks. We used the Kuhn-Munkres algorithm [9] on the combined bipartite graph to obtain this minimum matching distance. In our approach, the matching distance between two basic blocks is calculated from their feature vectors. Following [6], we define it as:

$$ d(v_1, v_2) = \frac{\sum_i \omega_i \, |\alpha_{1i} - \alpha_{2i}|}{\sum_i \omega_i \max(\alpha_{1i}, \alpha_{2i})} \tag{2} $$

In (2), $\alpha_{1i}$ is the value of the $i$-th dimension of the feature vector of block $v_1$, $\alpha_{2i}$ is the value of the corresponding dimension of the feature vector of block $v_2$, and $\omega_i$ is the weight assigned to that dimension. The weights are shown in TABLE III.

TABLE III. THE WEIGHT OF THE BASIC BLOCK LEVEL FEATURES
Feature Name | Weighting (ω)
No. of String Constants | 11
No. of Function Calls | 66
No. of Control Transfer Instructions | 150
No. of Arithmetic Instructions | 56
No. of Incoming Calls | 66
No. of Instructions | 42
No. of offspring | 199
Betweenness centrality | 31

Using the Kuhn-Munkres algorithm, we obtain the minimum matching distance of the compared graphs. We then calculated the graph similarity of the compared graphs $g_1$ and $g_2$ as:

$$ \psi(g_1, g_2) = 1 - \frac{d(g_1, g_2) - p \cdot N}{\min\big(d(g_1, \emptyset),\, d(g_2, \emptyset)\big)} \tag{3} $$

In (3), $d(g_1, g_2)$ is the minimum matching distance of the two graphs, and $\emptyset$ is an ACFG whose feature values are all zero and which has the same number of nodes as the graph it is matched against. $p$ is the penalty coefficient, and $N$ is the difference between the numbers of basic blocks of the two compared ACFGs. If $N$ exceeds a certain threshold, we directly conclude that the two ACFGs do not match.
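Eq. (2) and the minimum matching distance $d(g_1, g_2)$ can be sketched in Python as follows, assuming each ACFG is given as a list of 8-dimensional basic-block feature vectors in the order of TABLE III. The assignment step is the textbook potential-based Hungarian (Kuhn-Munkres) algorithm, applied to a rectangular cost matrix by transposing it when needed; this is one common way to handle unequal node counts, not necessarily the exact variant of [9]. Treating two all-zero blocks as distance 0 and all function names are assumptions for illustration. The penalty term of Eq. (3) is omitted because the values of $p$ and of the threshold on $N$ are not given.

```python
from typing import List, Sequence

# Basic-block feature weights from TABLE III, in this order:
# string constants, function calls, control transfer instructions,
# arithmetic instructions, incoming calls, instructions, offspring,
# betweenness centrality.
WEIGHTS = (11, 66, 150, 56, 66, 42, 199, 31)


def block_distance(a1: Sequence[float], a2: Sequence[float],
                   weights: Sequence[float] = WEIGHTS) -> float:
    """Weighted distance d(v1, v2) between two basic blocks, Eq. (2)."""
    num = sum(w * abs(x - y) for w, x, y in zip(weights, a1, a2))
    den = sum(w * max(x, y) for w, x, y in zip(weights, a1, a2))
    # Assumption: two all-zero blocks are identical (avoids 0/0).
    return num / den if den else 0.0


def min_assignment_cost(cost: List[List[float]]) -> float:
    """Minimum-cost assignment via the Hungarian (Kuhn-Munkres) method.

    A matrix with more rows than columns is transposed first, so every row
    is matched to a distinct column and surplus columns stay unmatched.
    """
    if not cost or not cost[0]:
        return 0.0
    if len(cost) > len(cost[0]):
        cost = [list(col) for col in zip(*cost)]
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)    # row potentials (index 0 unused)
    v = [0.0] * (m + 1)    # column potentials (index 0 is a virtual column)
    match = [0] * (m + 1)  # match[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)    # predecessor column on the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = match[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    return sum(cost[match[j] - 1][j - 1] for j in range(1, m + 1) if match[j])


def graph_distance(g1: List[Sequence[float]], g2: List[Sequence[float]]) -> float:
    """Minimum total matching distance d(g1, g2) between two ACFGs,
    each given as a list of per-block feature vectors."""
    cost = [[block_distance(a, b) for b in g2] for a in g1]
    return min_assignment_cost(cost)
```

Under the same assumptions, $d(g, \emptyset)$ in Eq. (3) can be obtained by calling `graph_distance(g, [[0] * 8] * len(g))`, i.e. matching $g$ against the same number of all-zero blocks.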

TABLE IV. TOP X OF OPENSSL ON DIFFERENT ARCHITECTURES
From→To | Top 1 (%) | Top 10 (%) | Top 100 (%)
ARM→MIPS | 73.93 | 93.60 | 98.69
MIPS→ARM | 80.17 | 95.46 | 98.66
x86→ARM | 82.05 | 97.57 | 99.33
ARM→x86 | 91.18 | 97.99 | 99.15
MIPS→x86 | 61.56 | 89.64 | 99.01
x86→MIPS | 64.34 | 91.65 | 98.48

2) Evaluation of graph similarity based on bipartite matching
We again used OpenSSL v1.0.1f as the test data to assess the graph similarity based on bipartite matching. Compiling the source code of OpenSSL for the three chosen architectures yielded three unstripped binaries A, M and X (ARM, MIPS and x86). Taking "ARM→x86" as an example, given a function in A, we calculated the graph similarity between it and each of its suspicious functions in X, ranked the suspicious functions by similarity, and recorded the rank of the true matching function in X, which was generally first. We used Top 1, Top 10 and Top 100 [4] as the assessment criteria; the results are shown in TABLE IV. The ARM-to-x86 condition performs best, with Top 1, Top 10 and Top 100 values of 91.18%, 97.99% and 99.15%, respectively. Whenever MIPS is involved, the results are less good. A possible reason is that the ACFGs of MIPS functions have more nodes than those of the other two architectures. Even so, the Top 1 indicator in the MIPS conditions is still at least 61.56%, and the Top 10 indicator is at least 89.64%. In general, our approach achieves good accuracy.

IV. EXPERIMENTAL EVALUATION

In this section, we evaluate CVSSA experimentally. We mainly apply the approach to real-world vulnerable functions and firmware images under realistic conditions; the experimental results show that CVSSA can be applied to realistic scenarios.
A. Experiment Setup
We used IDA Pro v6.8 to disassemble the binaries and, through its API, Python scripts to extract the two levels of features and the ACFGs of the functions in the binaries. The SVM was implemented with a MATLAB R2016a toolbox.
B. Experiments on real-world vulnerable functions and firmware images
1) Vulnerabilities in OpenSSL
We gathered five real-world vulnerable functions in OpenSSL, listed with IDs 1 to 5 in TABLE V, and used CVSSA to search for them across different architectures. The results are shown in TABLE VI. Across the three chosen architectures, CVSSA ranks each of the five vulnerable functions first. In most cases the number of suspicious functions for a given vulnerable function is below 100, with a minimum of only 7.
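The ranks in TABLE VI and the Top 1/10/100 indicators in TABLE IV both come from sorting a query's suspicious functions by graph similarity. A minimal Python sketch, with hypothetical function names, assuming the similarity scores (e.g. from Eq. (3)) are already computed and that a true match discarded during pre-screening counts as never ranked:

```python
import math
from typing import Dict, Iterable, Sequence


def rank_of_true_match(scores: Dict[str, float], true_name: str) -> float:
    """1-based rank of the true match among the suspicious functions sorted
    by descending similarity; math.inf if pre-screening discarded it."""
    if true_name not in scores:
        return math.inf
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ordered.index(true_name) + 1


def top_k_rates(ranks: Sequence[float],
                ks: Iterable[int] = (1, 10, 100)) -> Dict[int, float]:
    """Percentage of queries whose true match is ranked within the top K."""
    return {k: 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}
```

A rank of 1, as reported for every entry of TABLE VI, means the vulnerable function was the most similar suspicious function.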

TABLE V. FIVE VULNERABILITY FUNCTIONS IN OPENSSL
ID | Vulnerability Function | CVE Number
1 | c2i_ASN1_OBJECT | CVE-2014-3508
2 | EVP_DecodeUpdate | CVE-2015-0292
3 | X509_cmp_time | CVE-2014-3567
4 | tls1_process_heartbeat | CVE-2014-0160
5 | tls_decrypt_ticket | CVE-2014-1959

TABLE VI. SEARCH RESULTS OF FIVE VULNERABILITY FUNCTIONS (CVSSA)
From→To | No. of candidates (ID 1, 2, 3, 4, 5) | Rank (ID 1, 2, 3, 4, 5)
ARM→MIPS | 10, 84, 98, 22, 114 | 1, 1, 1, 1, 1
MIPS→ARM | 7, 29, 69, 67, 91 | 1, 1, 1, 1, 1
ARM→x86 | 36, 75, 77, 71, 109 | 1, 1, 1, 1, 1
x86→ARM | 11, 36, 80, 26, 83 | 1, 1, 1, 1, 1
MIPS→x86 | 57, 44, 56, 78, 43 | 1, 1, 1, 1, 1
x86→MIPS | 13, 26, 59, 18, 80 | 1, 1, 1, 1, 1

TABLE VII. HEDWIGCGI_MAIN RANKS IN FIRMWARE FOR CVSSA
From \ To | DIR-815 | DIR-300 | DIR-600 | DIR-645
DIR-815 | - | 1 | 1 | 1
DIR-300 | 1 | - | 1 | 1
DIR-600 | 1 | 1 | - | 1
DIR-645 | 1 | 1 | 1 | -

In addition, we searched for the vulnerable function with ID 4 from each of the three chosen architectures in the MIPS-based firmware DD-WRT (r21676) [10]. Starting from the vulnerable function compiled for any of the three architectures, we always found the true matching function in the firmware.
2) Overflow vulnerabilities in firmware images
We also evaluated our approach on known overflow vulnerabilities, which are common in IoT devices. We used D-Link router firmware, a widely used router brand, as the test dataset and chose four D-Link router firmware images, DIR-815, DIR-300, DIR-600 and DIR-645, which share the same overflow vulnerability (SAP10008) officially announced by D-Link. The function that causes the overflow is hedwigcgi_main(). As shown in TABLE VII, we searched for hedwigcgi_main() from each router firmware image in every other one, and hedwigcgi_main() was ranked first in each case, which demonstrates that the vulnerability is recognized correctly. In these experiments, our approach identified the vulnerable function in the given firmware correctly every time, which indicates that it can be applied to realistic scenarios. It is worth mentioning that all four D-Link firmware images are MIPS-based: although our approach targets known vulnerabilities across different architectures, it also performs very well within the same architecture.

V. RELATED WORK
Unlike approaches that aim to find undiscovered vulnerabilities, our work aims to search for known vulnerabilities across different architectures by function matching. We therefore focus on related work that recognizes known vulnerabilities; works that discover unknown vulnerabilities, such as Rozzle [11] and Driller [12], are outside the scope of this discussion.
A. Vulnerability search at source code level
Using code similarity to search for known vulnerabilities is a common approach, and several works measure code similarity at the source level. Kamiya et al. [13] proposed a token-based tool to recognize similar source code fragments. Similarly, CP-Miner [14] used token sequences to find similar source code in large-scale software. Other approaches, such as Deckard [15] and the approach proposed by Yamaguchi et al. [16], use abstract syntax trees to find copy-pasted code. Bellon et al. [17] evaluated six clone detection approaches in various respects. There are also systems specifically designed to detect unpatched code clones in source code, such as ReDeBug [18].
B. Vulnerability search at binary level
Because the source code of most firmware is difficult to obtain, known vulnerability search at the binary level is important. However, the lack of symbolic information in binaries makes searching for vulnerabilities in binary code considerably more complicated. Early works focused on comparing sequences of instructions [19] or bitstreams [20]. Later, some approaches tried to capture the semantics of binary code. BinHunt [21] and its successor iBinHunt [22] used symbolic execution and a theorem prover to capture the semantics of basic blocks. Because they rely on semantic equivalence, both approaches are sensitive to small changes in code, which makes vulnerability search difficult for them. Moreover, they support only a single architecture and cannot carry a known vulnerability across different architectures. BINJUICE [23] and BINHASH [24] also operate at the basic-block level and likewise do not support multiple architectures. TEDEM [25] used expression trees of basic blocks to find vulnerabilities in binaries.
A few methods search for known vulnerabilities across various architectures at the binary level. Pewny et al. [4] presented the pioneering work on recognizing known vulnerabilities across architectures. They used VEX [26] to lessen the influence of architectural differences and then used I/O behavior to capture semantics when computing code similarity. They also used MinHash to speed up the search; however, the strategy still has efficiency problems [5]. To address efficiency, DiscovRE [5] used a staged strategy to screen out most of the dissimilar functions and then inspected the small remaining set of suspicious functions by graph matching. However, the pre-screening strategy was shown to be unreliable by [4]. Instead of performing the vulnerability search directly on CFGs, Feng et al. [6] transformed ACFGs into numerical vectors for similarity computation, inspired by search methods in other areas.
C. Vulnerability search based on dynamic analysis
Unlike static analysis methods, dynamic analysis methods run the target code on specific devices or in emulated environments to search for vulnerabilities. Avatar [29] combined real embedded devices with an emulated environment to execute suspicious code dynamically. Similar to our work, Egele et al. [27] used BLEX to recognize known vulnerable functions at the binary level. However, rather than analyzing function code statically as we do, they executed each function in an emulated environment to obtain its runtime features. Because it requires a specific environment, BLEX is not well suited to searching for known vulnerabilities across different architectures. Moreover, dynamic analysis of firmware images is still at an early stage [28, 29].
VI. RESTRICTIONS OF CVSSA
Although our approach has been shown to recognize vulnerable functions across different architectures to some extent, it still has several restrictions. First, our approach assumes that each vulnerability is caused by a specific function; that is, CVSSA operates at the function level. However, some vulnerabilities arise not from a single function but from several functions working together, and others arise from only a fragment of code, sometimes even a single string constant within a function. Our approach cannot handle these cases, but they are a minority, and the function level is sufficient for most application scenarios. Second, our approach has no criterion for assessing the false alarm rate.
In all of our experiments, the test data contained the true matching function for the known vulnerability. In practice, however, some of the functions we report as carrying the known vulnerability may not be true matches. We will address this problem in future work.
VII. CONCLUSION AND FUTURE WORK
In this paper, we presented and implemented CVSSA, a staged method to recognize known vulnerabilities in firmware across different architectures, mainly ARM, MIPS and x86. Compared to previous work, our approach uses the knowledge already available about the vulnerability to achieve higher accuracy in the pre-screening stage. We also applied the approach to real-world firmware images, where it correctly recognized vulnerable functions, such as one with an overflow vulnerability, across the three chosen architectures. In the future, besides addressing the false alarm rate, we will compare other machine learning algorithms, such as neural networks and decision trees, with SVM for vulnerability search in firmware. By analyzing their respective advantages and disadvantages, we can choose the most appropriate approach for different application scenarios. We also plan to study vulnerability search across different compilers and optimization options. Although they are not addressed in this paper, they have an important influence on the accuracy of vulnerability search; if the compiler and even the optimization option of the binary firmware were known in addition to its architecture before inspection, we expect the accuracy could be improved further. In addition, we plan to study dynamic link libraries further. In this paper, we treated the similarity of functions in dynamic link libraries as a constant. By analyzing the library files in which such functions reside, we could extract their features and achieve higher accuracy. We also plan to apply our approach to a large firmware image dataset.
ACKNOWLEDGMENT
This work was partially supported by the National Natural Science Foundation of China (Grant No. 61672398), the Key Natural Science Foundation of Hubei Province of China (Grant No. 2015CFA069), and the Applied Fundamental Research of Wuhan (Grant No. 20160101010004).
REFERENCES
[1] A. Costin, J. Zaddach, A. Francillon, and D. Balzarotti, "A Large-Scale Analysis of the Security of Embedded Firmwares," in USENIX Security Symposium, 2014, pp. 95-110.
[2] Open Web Application Security Project, "OWASP Internet of Things Project (Top 10 IoT Vulnerabilities (2014) Project)" [EB/OL]. (2016-02-05) [2016-03-10]. https://www.owasp.org/index.php/OWASP_Internet_of_Things_Top_Ten_Project#tab=Top_10_IoT_Vulnerabilities_282014_29
[3] F. Adelstein, M. Stillerman, and D. Kozen, "Malicious code detection for open firmware," in Proceedings of the 18th Annual Computer Security Applications Conference, IEEE, 2002, pp. 403-412.
[4] J. Pewny, B. Garmany, R. Gawlik, C. Rossow, and T. Holz, "Cross-Architecture Bug Search in Binary Executables," in Proceedings of the 36th IEEE Symposium on Security and Privacy (S&P), 2015, pp. 709-724.
[5] S. Eschweiler, K. Yakdan, and E. Gerhards-Padilla, "discovRE: Efficient cross-architecture identification of bugs in binary code," in Proceedings of the 23rd Symposium on Network and Distributed System Security (NDSS), 2016, pp. 381-396.
[6] Q. Feng, R. Zhou, C. Xu, et al., "Scalable Graph-based Bug Search for Firmware Images," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2016, pp. 480-491.
[7] IDA Pro - Interactive Disassembler. Available at: http://www.hexrays.com/idapro/.
[8] Q. Chang, Z. Liu, M. Wang, et al., "VDNS: An Algorithm for Cross-Platform Vulnerability Searching in Binary Firmware," Journal of Computer Research and Development, vol. 53, no. 10, pp. 2288-1198, 2016.
[9] F. Bourgeois and J. C. Lassalle, "An extension of the Munkres algorithm for the assignment problem to rectangular matrices," Communications of the ACM, vol. 12, no. 12, pp. 802-804, 1971.
[10] DD-WRT. r21676, May 2013. Available: http://tinyurl.com/ddwrt21676.
[11] C. Kolbitsch, B. Livshits, B. Zorn, and C. Seifert. “Rozzle: De-cloaking internet malware,” in 2012 IEEE Symposium on Security and Privacy (SP). IEEE, 2012, pp. 443-457.
[12] N. Stephens, J. Grosen, C. Salls, A. Dutcher, and R. Wang. “Driller: Augmenting Fuzzing Through Selective Symbolic Execution,” in NDSS, 2016, pp. 1-16.
[13] T. Kamiya, S. Kusumoto, and K. Inoue. “CCFinder: a multilinguistic token-based code clone detection system for large scale source code,” IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 654-670, 2002.
[14] Z. Li, S. Lu, S. Myagmar, and Y. Zhou. “CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code,” in OSDI, 2004, pp. 289-302.
[15] L. Jiang, G. Misherghi, Z. Su, and S. Glondu. “Deckard: Scalable and accurate tree-based detection of code clones,” in Proceedings of the 29th International Conference on Software Engineering. IEEE Computer Society, 2007, pp. 96-105.
[16] F. Yamaguchi, N. Golde, D. Arp, and K. Rieck. “Modeling and discovering vulnerabilities with code property graphs,” in 2014 IEEE Symposium on Security and Privacy (SP). IEEE, 2014, pp. 590-604.
[17] S. Bellon, R. Koschke, G. Antoniol, et al. “Comparison and evaluation of clone detection tools,” IEEE Transactions on Software Engineering, vol. 33, no. 9, pp. 577-591, 2007.
[18] J. Jang, A. Agrawal, and D. Brumley. “ReDeBug: finding unpatched code clones in entire OS distributions,” in 2012 IEEE Symposium on Security and Privacy (SP). IEEE, 2012, pp. 48-62.
[19] Y. David and E. Yahav. “Tracelet-based code search in executables,” ACM SIGPLAN Notices, vol. 49, no. 6, pp. 349-360, 2014.
[20] G. Myles and C. Collberg. “K-gram based software birthmarks,” in Proceedings of the 2005 ACM Symposium on Applied Computing. ACM, 2005, pp. 314-318.
[21] D. Gao, M. K. Reiter, and D. Song. “BinHunt: Automatically finding semantic differences in binary programs,” in International Conference on Information and Communications Security. Springer Berlin Heidelberg, 2008, pp. 238-255.
[22] J. Ming, M. Pan, and D. Gao. “iBinHunt: Binary hunting with inter-procedural control flow,” in International Conference on Information Security and Cryptology. Springer Berlin Heidelberg, 2012, pp. 92-109.
[23] A. Lakhotia, M. D. Preda, and R. Giacobazzi. “Fast location of similar code fragments using semantic ‘juice’,” in Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop. ACM, 2013, pp. 1-6.
[24] W. Jin, S. Chaki, C. Cohen, et al. “Binary function clustering using semantic hashes,” in 2012 11th International Conference on Machine Learning and Applications (ICMLA). IEEE, 2012, pp. 386-391.
[25] J. Pewny, F. Schuster, L. Bernhard, T. Holz, and C. Rossow. “Leveraging semantic signatures for bug search in binary programs,” in Proceedings of the 30th Annual Computer Security Applications Conference. ACM, 2014, pp. 406-415.
[26] Valgrind Documentation. Available: http://valgrind.org/docs/manual/index.html
[27] M. Egele, M. Woo, P. Chapman, and D. Brumley. “Blanket execution: Dynamic similarity testing for program binaries and components,” in USENIX Security Symposium, 2014, pp. 303-307.
[28] D. D. Chen, M. Egele, M. Woo, and D. Brumley. “Towards Automated Dynamic Analysis for Linux-based Embedded Firmware,” in NDSS, 2016, pp. 1-16.
[29] J. Zaddach, L. Bruno, A. Francillon, and D. Balzarotti. “AVATAR: A Framework to Support Dynamic Security Analysis of Embedded Systems' Firmwares,” in NDSS, 2014, pp. 1-16.