
Manipulating the Attacker's View of a System's Attack Surface

Massimiliano Albanese∗, Ermanno Battista†, Sushil Jajodia∗, and Valentina Casola†

∗ Center for Secure Information Systems, George Mason University, Fairfax, VA 22030, USA. Email: {malbanes,jajodia}@gmu.edu
† Department of Electrical Engineering and Information Technology, University of Naples Federico II, Naples, NA 80125, Italy. Email: {ermanno.battista,casolav}@unina.it

Abstract—Cyber attacks are typically preceded by a reconnaissance phase in which attackers aim at collecting valuable information about the target system, including network topology, service dependencies, and unpatched vulnerabilities. Unfortunately, when system configurations are static, attackers will always be able, given enough time, to acquire accurate knowledge about the target system and engineer effective exploits. To address this important problem, many adaptive techniques have been devised to dynamically change some aspects of a system's configuration in order to introduce uncertainty for the attacker. In this paper, we advance the state of the art in adaptive defense by looking at the problem from a control perspective and proposing a graph-based approach to manipulate the attacker's view of a system's attack surface. To achieve this objective, we formalize the notions of system view and of distance between views. We then define a principled approach to manipulate responses to the attacker's probes so as to induce an external view of the system that satisfies certain desirable properties. In particular, we propose efficient algorithmic solutions to different classes of problems, namely (i) inducing an external view that is at a minimum distance from the internal view while minimizing the cost for the defender, and (ii) inducing an external view that maximizes the distance from the internal view, given an upper bound on the admissible cost for the defender. Experiments conducted on a prototype implementation of the proposed algorithms confirm that our approach is efficient and effective in steering attackers away from critical resources.

This work was partially supported by the Army Research Office under grants W911NF-13-1-0421, W911NF-09-1-0525, and W911NF-13-1-0317, and by the Office of Naval Research under grants N00014-12-1-0461 and N00014-13-1-0703.

I. INTRODUCTION

Today's approach to cyber defense is governed by slow and deliberative processes such as security patch deployment, testing, episodic penetration exercises, and human-in-the-loop monitoring of security events. Adversaries can greatly benefit from this situation, and can continuously and systematically probe target networks with the confidence that those networks will change slowly, if at all. In fact, cyber attacks are typically preceded by a reconnaissance phase in which adversaries aim at collecting valuable information about the target system, including network topology, service dependencies, and unpatched vulnerabilities. As most system configurations are static – hosts, networks, software, and services do not reconfigure, adapt, or regenerate except in deterministic ways to support maintenance and uptime requirements – it is only a matter of time before attackers acquire accurate knowledge about the target system, which will eventually enable them to engineer reliable exploits and pre-plan their attacks.

In order to address this important problem, significant work has been done in the area of Adaptive Cyber Defense (ACD), which includes concepts such as Moving Target Defense (MTD), as well as artificial diversity and bio-inspired defenses, to the extent that they involve system adaptation for security and resiliency purposes. Essentially, a number of techniques have been proposed to dynamically change a system's attack surface by periodically reconfiguring some aspects of the system. In [1], a system's attack surface has been defined as the "subset of the system's resources (methods, channels, and data) that can be potentially used by an attacker to launch an attack". Intuitively, dynamically reconfiguring a system is expected to introduce uncertainty for the attacker and increase the cost of the reconnaissance effort. However, one of the major drawbacks of current approaches is that they force the defender to periodically reconfigure the system, which may introduce a costly overhead for legitimate users, as well as the potential for denial of service conditions. Additionally, most of the existing techniques are purely proactive in nature or do not adequately consider the attacker's behavior.

In this paper, we aim at advancing the state of the art in adaptive cyber defense by looking at the problem from a control perspective and proposing a graph-based approach to manipulate the attacker's perception of a system's attack surface. To achieve this objective, we formalize the notion of system view as well as the notion of distance between views. We refer to the attacker's view of the system as the external view and to the defender's view as the internal view. A system's attack surface can then be thought of as the subset of the internal view that would be exposed to potential attackers when no deceptive strategy is adopted. Starting from these definitions, we develop a principled yet practical approach to manipulate responses to the attacker's probes so as to induce an external view of the system that satisfies certain desirable properties. In particular, we propose efficient algorithmic solutions to different classes of problems, namely (i) inducing an external view that is at a minimum distance from the internal view while minimizing the cost for the defender, and (ii) inducing an external view that maximizes the distance from the internal one, given an upper bound on the admissible cost for the defender.


Fig. 1. Control perspective (the system's observable behavior and observable variables feed both a standard controller and the proposed ACD controller; the ACD controller answers the attacker's probes with crafted probe responses and control signals)

Figure 1 shows how standard and ACD-based controllers can be adopted to alter the attacker's perception of the attack surface. State-of-the-art solutions aim at dynamically reconfiguring the system in order to create a moving target for the attacker: the controller uses internal and external system variables to monitor the system's state and realize a moving target objective. Our approach, on the other hand, aims at transparently altering the attacker's perception. The attacker's knowledge of the system is inferred from the system's observable variables (e.g., ports, packets, web page content, DNS records). In order to gather additional information, the attacker probes the system. The proposed ACD controller identifies such probes and controls the information exposed through the system's observable variables by crafting adequate probe responses to bias the attacker's view of the system.

Compared to traditional approaches, our approach presents two key advantages: (i) we can introduce uncertainty for attackers without actually changing the system's configuration, thus minimizing the overhead; and (ii) we can deceive attackers and steer them away from critical resources, rather than simply introducing uncertainty and forcing them to adopt a random strategy. Experiments conducted on a prototype implementation of the proposed algorithms confirm that our approach is efficient and effective in steering attackers away from critical resources.

The remainder of the paper is organized as follows. Section II discusses related work. Section III discusses the threat model, whereas Section IV presents a motivating example. Then, Section V provides a detailed description of our approach and presents the problem statement as well as the proposed algorithms. Finally, Section VI reports the results of our experiments, and Section VII gives concluding remarks.

II. RELATED WORK

Moving Target Defense (MTD) defines mechanisms and strategies to increase complexity and cost for attackers [2]. MTD approaches that aim at selectively altering a system's attack surface usually involve reconfiguring the system in order to make the preconditions of attacks impossible or unstable. Dunlop et al. [3] propose a mechanism to dynamically hide the addresses of IPv6 packets to achieve anonymity; this is done by adding virtual network interface controllers and sharing a secret among all the hosts in the network. In [4], Duan et al. present a proactive Random Route Mutation technique to randomly change the route of network flows to defend against eavesdropping and DoS attacks; in their implementation, they use OpenFlow switches and a centralized controller to define the route of each flow. Jafarian et al. [5] use an IP virtualization scheme based on virtual DNS entries and Software Defined Networks. Their goal is to hide network assets from scanners: using OpenFlow, each host is associated with a range of virtual IP addresses and mutates its IP address within its pool. A similar identity virtualization approach is presented in [6]. In Chapter 8 of [7], an approach based on diverse virtual servers is presented: each server is configured with a set of software stacks, and a rotational scheme is employed to substitute different software stacks for any given request, creating a dynamic and uncertain attack surface. Casola et al. [8], [9] propose an MTD approach for protecting resource-constrained distributed devices through fine-grained reconfiguration at different architectural layers.

These solutions tend to change the system in order to modify its external attack surface. The external view of the system, however, is usually inferred by attackers from the results of probing and scanning tools. Starting from this observation, our approach consists in modifying system responses to probes in order to expose an external view of the system that is significantly different from the actual attack surface, without altering the system itself.

Reconnaissance tools, such as nmap or Xprobe2, are able to identify a service or an operating system by analyzing packets that can reveal implementation-specific details about the host [10], [11]. Network protocol fingerprinting refers to the process of identifying specific features of a network protocol implementation by analyzing its input/output behavior [12]. These features may reveal specific information such as protocol version, vendor information, and configuration. Reconnaissance tools store known system features and compare them against scan responses in order to match a fingerprint. Watson et al. [11] adopted protocol scrubbers to avoid revealing implementation-specific information and to restrict an attacker's ability to determine the operating system of a protected host. Moreover, some proof-of-concept software and kernel patches have been proposed to alter a system's fingerprint [13], such as IP Personality and Stealth Patch.

Honeypots have traditionally been used to try to divert attackers away from critical resources. Although our approach and honeypots share a common goal, they are significantly different. Our approach does not alter the system, whereas honeypot-based solutions introduce vulnerable machines in order to either capture the attacker [14] or collect information for forensic purposes [15]. Instead, we aim at diverting attackers by manipulating their view of the target system and forcing them to plan attacks based on incorrect or inaccurate knowledge. As such, these attacks will likely fail. To the best of our knowledge, we are the first to propose an adaptive mechanism for changing the attacker's view of a system's attack surface without reconfiguring the system itself.

III. THREAT MODEL

We assume an external adversary who is attempting to infer a detailed view of the target network. The threat model assumes that the attacker will use reconnaissance tools such as nmap [10] to discover active hosts in the network. The information attackers aim at discovering includes operating systems, exposed services and their versions, network layout, and routing information. They will then leverage this knowledge to plan and execute attacks aimed at exploiting exposed services. We also assume that the attacker will use an OS fingerprinting technique based on sending valid and invalid IP packets and studying the respective responses. We do not explicitly address techniques based on timing and data analysis. Moreover, we limit service fingerprinting to the case of TCP probes, as is the case for common probing tools.
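For concreteness, reconnaissance probes of the kind assumed by this threat model can be issued with the paper's own example tool, nmap; the commands below are only illustrative (the target addresses and port range are hypothetical and not taken from the paper):

nmap -sn 192.168.0.0/24            # host discovery (ping scan)
nmap -O -sV -p 1-1024 192.168.0.4  # OS detection and TCP service/version fingerprinting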

Fig. 2. Attacker's strategy flowchart (scan the target; analyze the resulting port/service list; identify a service to exploit and search for a matching exploit; if one is found, configure and run it to gain access and a remote shell; if the goal has not been reached, set up a proxy tunnel and scan the network from the exploited machine)

The attacker's strategy is illustrated in Figure 2 through a flowchart. The attacker's goal is to launch an exploit against one of the hosts in the target network. Through this exploitation, the attacker gains sufficient privileges to proceed with further analysis of the network. Multiple stages of this attack strategy (marked with a red cross in the figure) can be defeated using our approach. For instance, we may expose services with no publicly available exploits. Alternatively, for a given service, we may expose exploitable vulnerabilities that do not correspond to the actual vulnerabilities of that service.

IV. MOTIVATING EXAMPLE

As a reference example, we consider the distributed system of Figure 3, representing the IT infrastructure of an e-commerce company. Customers access publicly available services through a public website hosted in the DMZ. The business logic and critical services are implemented in the Intranet, and some of them must be accessible through the Internet in order to allow company branches to process bulk orders and query the inventory status.

Fig. 3. Topology and configuration of the reference system (the Internet is separated by a firewall from a DMZ hosting the public web server, DNS server, and mail server, and by a second firewall from an Intranet hosting the order processing server, stock management server, database server, mail server, and management servers; all machines run the same operating system, A)

Our goal is to modify the attacker's view of the system and its attack surface. In order to do so, we only modify system-dependent information exposed by system-specific protocol implementations. By adopting protocol and fingerprint scrubbers [11], [13], we alter and filter all outgoing traffic. Using these techniques and a graph-based strategy that generates different views of the system by repeatedly applying view manipulation primitives, we can force attackers to infer a different view of the system, such as the one in Figure 4.

Fig. 4. Topology and configuration presented to the attacker (the same topology, but the machines now expose different operating system fingerprints, such as B, C, D, E, F, and I, as well as altered service fingerprints; the public web server appears as two load-balanced web servers)

In Figure 4, we depict a manipulated service with a different color/texture – compared to Figure 3 – on the machine where it is deployed. A change of operating system is represented by a different letter in the top right corner of each machine. In this example, by applying several manipulation steps to the original view, we move from a scenario where all the servers have the same operating system to a scenario in which each operating system is different from the real one and from the others. In the same way, all the services under our control are altered. For instance, we alter the database server fingerprint so that it will be recognized as an implementation from a different vendor. As for the public web server, we want it to act like two web servers in a load-balancing configuration. To do so, we mutate, with a certain frequency, both the OS and service fingerprints, and we also modify packet-level parameters. In this way, we can force the attacker to believe that multiple servers need to be compromised in order to disrupt the service.

V. OUR APPROACH

In order to achieve our goal of inducing an attacker's view of the system's attack surface that is measurably different from the internal view, we first need to formalize the notion of view, as well as the notions of manipulation primitive and distance between views (Section V-A).

A. View Model

In the following, we assume a system is a set S = {s1, s2, ..., sn} of devices (e.g., hosts, firewalls), and use Ψ to denote the set of services that can be offered by hosts in S. The defender's and attacker's knowledge of the system is represented by views, as defined below.

Definition 1 (System's View): Given a system S, a view of S is a triple V = (So, C, ν), where So ⊆ S is a set of observable devices, C ⊆ So × So represents connectivity between elements of So, and ν : So → 2^Ψ is a function that maps each host in So to the set of services it offers.

Intuitively, a view represents knowledge of a subset of the system's devices and includes information about the topology as well as information about the services offered by reachable hosts (a more complete definition of view could incorporate information about service dependencies and vulnerabilities, similarly to what is proposed in [16]).
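As a minimal illustration of Definition 1 (our own sketch, not taken from the paper; host and service names are hypothetical), a view can be encoded as a plain data structure holding the observable hosts So, the connectivity relation C, and the service map ν:

# A view V = (So, C, nu): observable hosts, connectivity, and exposed fingerprints per host
internal_view = {
    "hosts": {"web", "dns", "db"},                    # So: observable devices
    "links": {("web", "db"), ("web", "dns")},         # C, a subset of So x So
    "services": {                                     # nu: host -> exposed fingerprints
        "web": {"os/A", "http/Apache-1.3.23"},
        "dns": {"os/A", "dns/bind"},
        "db":  {"os/A", "sql/vendor-X"},
    },
}

The same dictionary encoding is reused in the sketches that follow.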

Definition 2 (Manipulation Primitive): Given a system S and a set V of views of S, a manipulation primitive is a function π : V → V that transforms a view V′ ∈ V into a view V″ ∈ V. Let Π denote a family of such functions. For each π ∈ Π, the following properties must hold:

(∀V′, V″ ∈ V) V″ = π(V′) ≠ V′

(∀V ∈ V) (∄ ⟨π1, π2, ..., πm⟩ ∈ Π^m | π1(π2(... πm(V) ...)) = π(V))

Intuitively, a manipulation primitive is an atomic transformation that can be applied to a view to obtain a different view.

Example 1: A possible manipulation primitive is πOSB(V′). This primitive transforms a view V′ into a view V″ by changing the operating system fingerprint of a selected host (each primitive may have a set of specific parameters, which we omit for the sake of simplicity in the presentation). Figure 5 is a graphical representation of the effect of this primitive on the system's view. This primitive can be implemented as defined in [11], [13].

Fig. 5. Example of manipulation primitive (πOSB redefines the OS fingerprint of a DMZ host from A to B, transforming view V′ into view V″)
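To make Example 1 concrete, the sketch below (ours, using the dictionary encoding introduced earlier) implements a primitive in the spirit of πOSB: it returns a new view in which the selected host advertises operating system B and leaves everything else untouched, so that V″ = π(V′) ≠ V′. The function name and the "os/" label prefix are our own conventions, not the paper's.

import copy

def pi_os_b(view, host):
    """Manipulation primitive in the spirit of πOSB: expose OS fingerprint 'B' on `host`."""
    new_view = copy.deepcopy(view)
    fingerprints = new_view["services"][host]
    # Drop the current OS label and advertise OS "B" instead; services are unchanged
    new_view["services"][host] = {f for f in fingerprints if not f.startswith("os/")} | {"os/B"}
    return new_view

Applying pi_os_b(internal_view, "web") yields a view that differs from the original one, matching the first property of Definition 2.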

Definition 3 (View Manipulation Graph): Given a system S, a set V of views of S, and a family Π of manipulation primitives, a view manipulation graph for S is a directed graph G = (V′, E, ℓ), where:

• V′ ⊆ V is a set of views of S;
• E ⊆ V′ × V′ is a set of edges;
• ℓ : E → Π is a function that associates with each edge (V′, V″) ∈ E a manipulation primitive π ∈ Π such that V″ = π(V′).

The node representing the internal view has no incoming edges. All other nodes represent possible external views.

Example 2: Figure 6 shows an example of view manipulation graph. As one can notice, starting from the internal view V, multiple external views can be derived.

Fig. 6. Example of View Manipulation Graph (the internal view V, corresponding to the internal attack surface, leads to external views V1 through V5 via the primitives πOSA, πMAILv1.5, πOSB, πWEBv2.2, and πFTPv3.5)

After applying any π ∈ Π, a new view is generated. By analyzing the graph, one can enumerate all the possible ways to generate external views starting from the internal view V of a system S.

Definition 4 (Distance): Given a system S and a set V of views of S, a distance over V is a function δ : V × V → R such that, ∀ V′, V″, V‴ ∈ V, the following properties hold:

δ(V′, V″) ≥ 0
δ(V′, V″) = 0 ⟺ V′ = V″
δ(V′, V″) = δ(V″, V′)
δ(V′, V‴) ≤ δ(V′, V″) + δ(V″, V‴)

Example 3: In the simplest case, the distance can be measured by evaluating the number of elements that change between views. To do so, we can consider the difference in the number of hosts between the two views; then, for each host that is present in both views, we add one if the OS fingerprints differ and we add one if the service fingerprints differ. More sophisticated distances can be defined, but this is beyond the scope of this paper.
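The simple distance of Example 3 can be computed directly from the dictionary encoding used in the earlier sketches; the following function (ours, a sketch under that encoding) counts the difference in the number of hosts plus, for each common host, one unit if the OS fingerprints differ and one unit if the remaining service fingerprints differ:

def distance(v1, v2):
    """Simple view distance as described in Example 3."""
    d = abs(len(v1["hosts"]) - len(v2["hosts"]))
    for h in v1["hosts"] & v2["hosts"]:
        fp1 = v1["services"].get(h, set())
        fp2 = v2["services"].get(h, set())
        os1 = {f for f in fp1 if f.startswith("os/")}
        os2 = {f for f in fp2 if f.startswith("os/")}
        if os1 != os2:
            d += 1                     # OS fingerprints differ
        if fp1 - os1 != fp2 - os2:
            d += 1                     # service fingerprints differ
    return d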

Definition 5 (Path Set): Given a view manipulation graph G, the path set PG for G is the set of all possible paths (sequences of edges) in G. We denote the k-th path, of length mk, in PG as pk = ⟨πi1, πi2, ..., πimk⟩.

Example 4: Consider the view manipulation graph G in Figure 6. In this case, the set of all possible paths is:

PG = {p1 = ⟨πOSA⟩, p2 = ⟨πMAILv1.5⟩, p3 = ⟨πOSA, πMAILv1.5⟩, p4 = ⟨πOSB⟩, p5 = ⟨πWEBv2.2⟩, p6 = ⟨πFTPv3.5⟩, p7 = ⟨πOSB, πWEBv2.2⟩, p8 = ⟨πWEBv2.2, πFTPv3.5⟩, p9 = ⟨πOSB, πWEBv2.2, πFTPv3.5⟩}

We will use the notation Va →pk Vb to refer to a path pk originating from Va and ending in Vb. For instance, in Figure 6, the path which goes from V to V5 is V →p9 V5 = ⟨πOSB, πWEBv2.2, πFTPv3.5⟩.

Definition 6 (Cost Function): Given a path set PG, a cost function fc is a function fc : PG → R that associates a cost to each path in PG. The following properties must hold:

fc(⟨πi⟩) ≥ 0    (1)
fc(⟨πij, πij+1⟩) ≥ min(fc(⟨πij⟩), fc(⟨πij+1⟩))    (2)
fc(⟨πij, πij+1⟩) ≤ fc(⟨πij⟩) + fc(⟨πij+1⟩)    (3)

If Equation 3 holds strictly, then fc is said to be additive.

B. Problem Statement

We can now formalize the two related problems we are addressing in this paper.

Problem 1: Given a view manipulation graph G, the internal view Vi, and a distance threshold d ∈ R, find an external view Vd and a path Vi →pd Vd which minimizes fc(pd) subject to δ(Vi, Vd) ≥ d.

Problem 2: Given a view manipulation graph G, the internal view Vi, and a budget b ∈ R, find an external view Vb and a path Vi →pb Vb which maximizes δ(Vi, Vb) subject to fc(pb) ≤ b.
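On small graphs, Problem 1 can be solved by brute force, which also clarifies the roles of the cost function and of the distance threshold. The sketch below is ours and only illustrative: the toy graph loosely follows Figure 6, the edge costs and per-edge distance contributions are made-up numbers, and treating distance as additive along a path is a simplification.

# edges: view -> list of (primitive, next_view, cost, distance_contribution)
graph = {
    "V":  [("pi_OSB", "V3", 2, 1), ("pi_WEBv2.2", "V4", 1, 1)],
    "V3": [("pi_WEBv2.2", "V4", 1, 1)],
    "V4": [("pi_FTPv3.5", "V5", 3, 2)],
    "V5": [],
}

def all_paths(node, cost=0, dist=0, path=()):
    """Enumerate all loop-free paths starting from `node`, with cumulative cost and distance."""
    yield path, cost, dist
    for prim, nxt, c, delta in graph[node]:
        if all(nxt != step[1] for step in path):          # keep paths loop-free
            yield from all_paths(nxt, cost + c, dist + delta, path + ((prim, nxt),))

def solve_problem_1(d):
    """Cheapest path from the internal view V whose cumulative distance is at least d."""
    feasible = [(cost, path) for path, cost, dist in all_paths("V") if dist >= d]
    return min(feasible, default=None)

print(solve_problem_1(d=2))   # e.g., the minimum-cost path reaching distance >= 2

The algorithms presented next avoid this exhaustive enumeration, which quickly becomes intractable as the number of hosts and admissible configurations grows.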


C. Algorithms

In this section, we present the algorithms used to solve the problems defined in Section V-B. For both problems we adopted a top-k heuristic approach. The algorithms start from the internal view and proceed with the state space exploration by iteratively traversing the most promising outgoing edges until a termination condition is reached. To limit the exponential explosion of the search space, only the k most promising edges are traversed. To quantify the benefit of traversing a given edge, we define a benefit score as the ratio of the distance between the corresponding views to the cost for achieving that distance. For each node in the graph, we only traverse the k outgoing edges with the highest values of the benefit score.

1) Algorithm TopKDistance: To solve Problem 1, we first generate the view manipulation graph and then process it with the top-k algorithm. For efficiency purposes, we only generate a sub-graph Gd of the complete view manipulation graph G, such that generation along a given path stops as soon as the distance from the internal view becomes equal to or larger than the minimum required distance d. We can limit the graph generation up to this point because any additional edge in the graph would increase the cost of the solution, and thus would not be included in the optimal path.

Algorithm 1 describes how we generate the sub-graph up to a distance d from the internal view. The algorithm uses a queue Q to store the vertices to be processed. At each iteration of the while loop (Line 3), a vertex v is popped from the queue and the maximum distance from the internal view O is updated (Line 4). The constant MAX_INDEGREE (Line 5) is used to test whether a node has been fully processed: when the in-degree is equal to MAX_INDEGREE, the node has been linked to all the nodes that differ by only one element in the configuration, and there are no new vertices to generate from this node. Given the set C(s) of all admissible configurations for each device s ∈ S, MAX_INDEGREE can be computed as MAX_INDEGREE = Σ_{s∈S} |C(s)| − 1. On Line 6 the set of v's predecessors is retrieved. The function createCombinations (Line 7) generates all possible combinations of configurations that can be obtained starting from v and that have only one configuration element that differs from v's configuration; both v and its predecessors are excluded from the returned array. The function getOneChangeVertices (Line 11) returns an array of all the vertices whose configuration differs by just one element from the vertex given as input; those vertices need to be linked with the vertex v (Lines 12-14). Lines 15-17 check whether the maximum distance d has been reached; if this is not the case, the newly generated vertex is pushed into the queue to be examined.

Algorithm 1 GenerateGraph(O, C, d)
Input: The internal view vo, the set of admissible configurations per host C, the minimum required distance d.
Output: A subgraph Gd of the complete view manipulation graph with vertices within distance d of vo.
1: // Initialization: Q is the queue of vertices to be processed (initially empty); Gd is also initially empty.
2: maxDistance ← 0; Gd.addVertex(vo); Q.push(vo);
3: while Q ≠ ∅ do
4:   v ← Q.pop(); maxDistance ← max(maxDistance, δ(vo, v));
5:   if v.indegree < MAX_INDEGREE then
6:     predecessors ← Gd.getPredecessors(v)
7:     newVertices ← createCombinations(v, predecessors, C)
8:     for all vn ∈ newVertices do
9:       Gd.addVertex(vn)
10:      Gd.addDirectedEdge(v, vn, cost(v, vn), δ(v, vn)); // Add an edge from v to vn
11:      verticesToLink ← getOneChangeVertices(Gd, vn)
12:      for all vtl ∈ verticesToLink do
13:        Gd.addBidirectionalEdge(vn, vtl, cost(vn, vtl), δ(vn, vtl)) // Add a bidirectional edge between vn and vtl
14:      end for
15:      if maxDistance ≤ d then
16:        Q.push(vn)
17:      end if
18:    end for
19:  end if
20: end while

Once the subgraph has been generated, we can run the top-k analysis using Algorithm 2, which recursively traverses the subgraph to find a solution. We use v to denote the vertex under evaluation in each recursive call. Line 1 creates an empty list to store all the paths discovered from v. Line 2 is one of the two termination conditions: it checks whether the current distance is greater than or equal to d or no other nodes can be reached. The second term of the termination condition takes into account both the case of a node with no outgoing edges and the case of a node whose successors are also its predecessors; we do not consider edges directed to predecessors, in order to construct loop-free paths. If the termination condition is satisfied, then a solution has been found and a path from v to vo can be constructed by closing the recursion stack. Line 3 creates an empty path and adds v to this path; then, the path list is updated and returned. On Line 6 all the outgoing edges originating from v are sorted by decreasing benefit score. Then, Lines 7-13 perform the top-k analysis: for each of the best k destinations, TopKDistance is recursively invoked. The result of this evaluation is a list of paths, and the vertex v is then set as the origin of each of these paths (Lines 10-12). On Line 14, the algorithm terminates and the updated path list is returned.

Algorithm 2 TopKDistance(G, v, k, d, cc, cd)
Input: A graph G, a vertex v, an integer k, a minimum distance d, the current cost cc, and the current distance cd.
Output: List of paths pathList.
1: pathList ← ∅
2: if cd ≥ d ∨ v.successors \ (v.successors ∩ v.predecessors) = ∅ then
3:   p ← emptyPath; p.addVertex(v); pathList ← {p};
4:   return pathList
5: end if
6: sort(v.outgoingEdges); numToEval ← min(k, |v.outgoingEdges|);
7: for all i ∈ [1, numToEval] do
8:   vn ← v.outgoingEdges[i].getDestination()
9:   eval ← TopKDistance(G, vn, k, d, update(cc), update(cd))
10:  for all p ∈ eval do
11:    p.addVertex(v); p.addDirectedEdge(v, vn, δ(v, vn), cost(v, vn)); // Add an edge from v to vn
12:  end for
13: end for
14: return pathList
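Both algorithms rank outgoing edges by the benefit score defined above (distance gained divided by cost) and only expand the k best ones. The following sketch is ours and assumes edges are encoded as small dictionaries with cost and distance fields; it is not the paper's implementation, only an illustration of that selection step.

def top_k_edges(outgoing_edges, k):
    """Keep the k outgoing edges with the highest benefit score = distance / cost."""
    ranked = sorted(outgoing_edges, key=lambda e: e["distance"] / e["cost"], reverse=True)
    return ranked[:min(k, len(ranked))]

edges = [{"to": "V1", "cost": 2.0, "distance": 1.0},
         {"to": "V2", "cost": 1.0, "distance": 1.0},
         {"to": "V3", "cost": 4.0, "distance": 3.0}]
print([e["to"] for e in top_k_edges(edges, k=2)])   # ['V2', 'V3']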

2) Algorithm TopKBudget: Algorithm 3 implements both graph generation and exploration in order to improve time efficiency in the resolution of Problem 2. Since the defender can be expected to invest a relatively large budget, a solution is likely to lie deep in the graph. Our approach is therefore to generate the graph only in the k most profitable directions, so as to limit graph generation as much as possible. The algorithm uses a queue to store the examined paths that may constitute a solution.

Algorithm 3 TopKBudget(vo, C, b)
Input: The internal view vo, the set C of admissible configurations per host, a budget b ∈ R.
Output: A list of paths.
1: // Initialization: Q is the queue of paths to process (initially empty); solutions is initially empty as well; G is a graph and contains only vo
2: p ← emptyPath; p.addVertex(vo); Q.add(p)
3: while Q ≠ ∅ do
4:   toExploreDataHolder ← ∅
5:   p ← Q.pop(); v ← p.getLast()
6:   if v.indegree < MAX_INDEGREE then
7:     predecessors ← G.getPredecessors(v)
8:     newVertices ← createCombinations(v, predecessors, C)
9:     for all vn ∈ newVertices do
10:      G.addVertex(vn)
11:      G.addDirectedEdge(v, vn, cost(v, vn), δ(v, vn)) // Add an edge from v to vn
12:      verticesToLink ← getOneChangeVertices(G, vn)
13:      for all vtl ∈ verticesToLink do
14:        G.addBidirectionalEdge(vn, vtl, cost(vn, vtl), δ(vn, vtl)) // Add a bidirectional edge between vn and vtl
15:      end for
16:    end for
17:  end if
18:  for all vn ∈ v.getDirectSuccessors() do
19:    if ¬v.isPredecessor(vn) ∧ ¬p.contains(vn) ∧ (p.totalCost + cost(v, vn) ≤ b) then
20:      importance ← δ(v, vn)/cost(v, vn) + estimate(vn, p, C)
21:      toExploreDataHolder.add([vn, p, importance])
22:    end if
23:  end for
24:  sort(toExploreDataHolder)
25:  numToEval ← min(k, toExploreDataHolder.size)
26:  if numToEval = 0 then
27:    solutions ← solutions ∪ {p}
28:  end if
29:  for all i ∈ [1, numToEval] do
30:    newPath ← toExploreDataHolder[i].p
31:    newPath.addAsLeaf(toExploreDataHolder[i].vn)
32:    Q.push(newPath)
33:  end for
34: end while
35: return solutions

Line 5 retrieves from the queue the path under examination p and, from this, its last vertex v. Lines 6-17 contain the graph generation step starting from v; the generation process is similar to the one described in Algorithm 1. On Line 18, all the successors of v (generated or linked at this stage) have been computed. The successor nodes are used to compute the importance level of v: we sum v's benefit (distance/cost) and an importance estimation of v's successors. The estimation provides some knowledge about the solutions we may discover by exploring further through v; it is computed by the function estimate, which returns the maximum benefit of the vertices reachable from v's successor vn. The importance level, the successor vn, and the path under examination p are then added to a list (Lines 20-21). On Line 24 the list is sorted by decreasing importance level. Line 26 checks whether there is no further exploration to perform; in this case, the path under examination is added to the solutions list. Otherwise (Lines 29-32), new paths are generated from p: each of them is p plus a new node chosen among the top k successors of v. All the newly generated paths are then pushed into the queue for further examination. When the queue becomes empty, the complete list of solutions is returned.

VI. EXPERIMENTAL EVALUATION

In this section, we report the results of experiments conducted to validate the proposed algorithmic approach to the problems formulated in Section V-B. We evaluate the performance of the algorithms in terms of processing time and approximation ratio for different numbers of hosts and different numbers of admissible configurations per host. All the results reported in this section are averaged over multiple graphs with the same number of hosts and configurations per node, but different costs and distances among the vertices.

A. Evaluation of TopKDistance

First, we show that, as expected, the processing time increases as the number of configurations per node increases.

Fig. 7. Processing time vs. number of hosts (3 and 5 configurations per host)

Figure 7 shows the processing time trends in the case of k = 7 and a required minimum distance d = 5. The processing time is practically linear in the number of hosts in the case of three configurations per host but, as soon as the number of configurations increases, it becomes polynomial, as shown in the case of five configurations per host.

Fig. 8. Processing time vs. graph size (for k = 3, 4, 5, and 7)

Figure 8 shows the processing time vs. graph size for different values of k, where the graph size is the number of nodes whose distance from the internal view is less than or equal to d. Comparing the trends for k = 3, 4, 5, we can notice that the algorithm is polynomial for k = 3 and linear for k = 4, 5. This can be explained by the fact that for k = 3 it is necessary to explore the graph more deeply than in the case of k = 4, 5.

Moreover, if we consider values of k larger than five, the trend is again polynomial, due to the fact that the algorithm starts to explore the graph more broadly. Indeed, as we will show shortly, relatively small values of k provide a good trade-off between approximation ratio and processing time, so this result is extremely valuable. To better visualize the relationship between processing time and k, we plotted, in Figure 9, the average time needed to process different families of graphs against k.

Fig. 9. Average processing time vs. k

The trend can be approximated by a polynomial function, and the minimum is between k = 2 and k = 3. For k greater than 4, the average time to process the graph increases almost linearly.

Moreover, we evaluated the approximation ratio achieved by the algorithm. To compute the approximation ratio, we divided the cost of the algorithm's solution by the optimal cost. In order to compute the optimal solution, we exhaustively measured the shortest path (in terms of cost) from the internal view to all the solutions with a distance greater than the minimum required distance d, and sorted those results by increasing cost; the optimal solution has the maximum distance and the minimum cost. When the algorithm could not find a solution (i.e., none of the discovered paths has a distance greater than the minimum required d), we considered the approximation ratio to be infinite. Figure 10 shows how the ratio changes when k increases in the case of a fixed number of configurations per node (5 configurations per node) and for increasing numbers of hosts. It is clear that the approximation ratio improves when k increases, and relatively low values of k (between 3 and 6) are sufficient to achieve a reasonably good approximation ratio in a time-efficient manner.

Fig. 10. Approximation ratio vs. k

Finally, we measured the number of solutions found by the algorithm for increasing values of k. Figure 11 shows how the number of solutions increases when k increases, for different numbers of hosts. The chart refers to the case of a fixed number of configurations per node (5 configurations per node).

Fig. 11. Number of solutions vs. k

B. Evaluation of TopKBudget

As done for the previous case, we show that, as expected, the processing time increases when the number of configurations per node increases.

Fig. 12. Processing time vs. number of hosts (3 and 5 configurations per host)

Figure 12 shows the processing time trends in the case of k = 2 and a budget b = 18. The processing time is practically linear in the number of hosts in the case of three configurations per host; in this case, the minimum time (six hosts) is ∼150 ms and the maximum time (10 hosts) is ∼3,500 ms. When the number of configurations increases, the time increases rapidly due to the time spent generating the graph. Figure 13 shows a scatter plot of average processing times against increasing graph sizes. This chart suggests that, in practice, processing time is linear in the size of the graph for small values of k.

Fig. 13. Processing time vs. graph size

Similarly, Figure 14 shows how processing time increases when k increases, for a fixed budget b = 18 and different graph families. The trend is approximated by a polynomial function and tends to saturate for values of k ≥ 6. This can be explained by the fact that, for higher values of k, most of the time is spent in the graph generation phase, and starting from k = 6 the graph is generated almost completely. Even in this case, the important result is that low values of k achieve linear time.

Fig. 14. Average processing time vs. k

Moreover, for these values, the algorithm can achieve a good approximation ratio. To compute the approximation ratio, we divided the optimal distance by the distance returned by the algorithm. In order to compute the optimal solution, we exhaustively measured the shortest path (in terms of distance) from the internal view to all the solutions in a given graph. Since it would be unfeasible to generate an exhaustive graph, we generated a sub-graph up to a maximum number of nodes; we then ordered the paths by decreasing distance and saved the cost needed to reach each solution, and we started the algorithm with a budget equal to the saved cost. Figure 15 shows how the ratio changes when k increases in the case of a fixed number of configurations per node (5 configurations per node) and for increasing numbers of hosts. The approximation ratio is good even for k = 1, but to obtain a more accurate solution it is better to use k = 2 or 3; greater values of k are less ideal in terms of time efficiency.

Fig. 15. Approximation ratio vs. k

Finally, we measured the number of solutions found by the algorithm for increasing values of k.

Fig. 16. Number of solutions vs. k

The important result is that for values of k greater than 4, the number of solutions increases linearly.

VII. CONCLUSIONS

Cyber attacks are typically preceded by a reconnaissance phase in which attackers aim at collecting valuable information about the target system, including network topology, service dependencies, and unpatched vulnerabilities. Unfortunately, when system configurations are static, attackers will always be able, given enough time, to acquire accurate knowledge about the target system, which in turn enables them to engineer effective attacks. To address this problem, many adaptive techniques have been devised to dynamically change some aspects of a system's configuration in order to introduce uncertainty for the attacker.

In this paper, we advance the state of the art in adaptive defense by looking at the problem from a control perspective and proposing a graph-based approach to manipulate the attacker's view of a system's attack surface. To achieve this objective, we formalize the notions of system view and of distance between views. We then define a principled approach to manipulate responses to the attacker's probes so as to induce an external view of the system that satisfies certain properties. In particular, we propose efficient algorithmic solutions to different classes of problems, namely (i) inducing an external view that is at a minimum distance from the internal view while minimizing the cost for the defender, and (ii) inducing an external view that maximizes the distance from the internal view, given an upper bound on the admissible cost for the defender. Experiments conducted on a prototype implementation of the proposed algorithms confirm that our approach is efficient and effective in steering attackers away from critical resources.

APPENDIX: TRANSPARENT SERVICE BANNER MANIPULATION

As a proof of concept to demonstrate the feasibility of fingerprinting deception, along with the solutions presented in Section II, we report the details of a possible implementation using iptables and NFQUEUE to deceive the fingerprinting of an Apache web server. iptables is a command line tool to manage Netfilter, a packet handling engine introduced in Linux kernel 2.4. It enforces ordered chains of rules on network packets and specific kinds of traffic. A rule contains a match section and a target section. The match section defines a filter on the traffic (e.g., -i eth0 -p any, that is, all the traffic on the eth0 interface). The target section tells what to do with a packet that matches the match section (e.g., ACCEPT, DROP). A more sophisticated target is NFQUEUE, which delegates the decision on packets to userspace software by passing the packets through a queue. A userspace program must issue a verdict (ACCEPT, DROP, ACCEPT_MODIFIED) on all the packets it receives from the kernel. For instance, one can append an NFQUEUE target to the host's OUTPUT chain and start a daemon to modify the content of packets containing information used by scanners to fingerprint services. In this example, we modify packets generated by the Apache server on TCP port 80:

iptables -A OUTPUT -p tcp --sport 80 -j NFQUEUE --queue-num 1

For instance, a Python implementation of the daemon requires starting the NFQUEUE, attaching a callback function implementing the verdict policy (see Algorithm 4), and binding the daemon to the actual queue:

q = nfqueue.queue(); q.set_callback(callback); q.fast_open(1, AF_INET); q.try_run();

Algorithm 4 callback(payload)
1: pkt ← IP(payload.get_data())
2: if pkt[TCP].flags & 0x10 ∧ len(pkt[TCP].payload) > 0 then
3:   header ← pkt[TCP].payload
4:   substitute(header, 'Server: Apache/1.3.23', 'Microsoft-IIS/5.0')
5:   pkt.len ← len(pkt); del pkt[TCP].chksum; del pkt.chksum
6:   payload.set_verdict_modified(nfqueue.NF_ACCEPT, str(pkt), len(pkt))
7: else payload.set_verdict(nfqueue.NF_ACCEPT)
8: end if

On Line 1 a packet object is created from the packet payload. If the packet contains header information (Line 2), the service banner is substituted through a proper function. On Line 5, the packet length is updated and the checksums are deleted so that the library can recompute them; subsequently, the verdict is set to ACCEPT_MODIFIED. If the packet must not be altered, its verdict is set to ACCEPT (Line 7).
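For readers who want to reproduce this proof of concept with currently maintained libraries, the following sketch (ours, not the authors' code) performs the same banner substitution as Algorithm 4 using scapy and the NetfilterQueue Python bindings. It assumes the iptables rule above is already in place and that both packages are installed; unlike Algorithm 4, it simply keys on the banner string rather than on the ACK flag.

from netfilterqueue import NetfilterQueue
from scapy.all import IP, TCP, Raw

def callback(nfpacket):
    pkt = IP(nfpacket.get_payload())
    # Rewrite TCP segments whose payload carries the Apache banner
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        data = bytes(pkt[Raw].load)
        if b"Server: Apache/1.3.23" in data:
            pkt[Raw].load = data.replace(b"Server: Apache/1.3.23",
                                         b"Server: Microsoft-IIS/5.0")
            # Let scapy recompute length and checksums for the modified packet
            del pkt[IP].len
            del pkt[IP].chksum
            del pkt[TCP].chksum
            nfpacket.set_payload(bytes(pkt))
    nfpacket.accept()

nfq = NetfilterQueue()
nfq.bind(1, callback)   # queue number 1, matching the iptables rule above
try:
    nfq.run()
finally:
    nfq.unbind()

Note that replacing the banner with a string of different length changes the TCP payload size, which would desynchronize sequence numbers on a long-lived connection; a production-grade scrubber would have to compensate for this, as discussed in [11].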

The result of an nmap scan (nmap -sV -p 80 192.168.0.4) without enabling iptables is:

80/tcp open http Apache httpd 1.3.23

Once the filter is enabled, nmap identifies an IIS web server and infers that the target machine has a Windows OS:

80/tcp open http Microsoft IIS httpd 5.0
Service Info: OS: Windows; CPE: cpe:/o:microsoft:windows

REFERENCES

[1] P. K. Manadhata and J. M. Wing, "An attack surface metric," IEEE Transactions on Software Engineering, vol. 37, no. 3, pp. 371–386, May 2011.
[2] Executive Office of the President, National Science and Technology Council, "Trustworthy cyberspace: Strategic plan for the federal cybersecurity research and development program," http://www.whitehouse.gov/, December 2011.
[3] M. Dunlop, S. Groat, R. Marchany, and J. Tront, "Implementing an IPv6 moving target defense on a live network," in Proceedings of the National Moving Target Research Symposium, Annapolis, MD, USA, June 2012.
[4] Q. Duan, E. Al-Shaer, and H. Jafarian, "Efficient random route mutation considering flow and network constraints," in Proceedings of the IEEE Conference on Communications and Network Security (IEEE CNS 2013). Washington, DC, USA: IEEE, October 2013, pp. 260–268.
[5] J. H. Jafarian, E. Al-Shaer, and Q. Duan, "OpenFlow random host mutation: Transparent moving target defense using software defined networking," in Proceedings of the 1st Workshop on Hot Topics in Software Defined Networks (HotSDN 2012). Helsinki, Finland: ACM, August 2012, pp. 127–132.
[6] M. Albanese, A. De Benedictis, S. Jajodia, and K. Sun, "A moving target defense mechanism for MANETs based on identity virtualization," in Proceedings of the IEEE Conference on Communications and Network Security (IEEE CNS 2013). Washington, DC, USA: IEEE, October 2013, pp. 278–286.
[7] S. Jajodia, A. K. Ghosh, V. Swarup, C. Wang, and X. S. Wang, Eds., Moving Target Defense: Creating Asymmetric Uncertainty for Cyber Threats, 1st ed., ser. Advances in Information Security. Springer, 2011, vol. 54.
[8] V. Casola, A. De Benedictis, and M. Albanese, "A moving target defense approach for protecting resource-constrained distributed devices," in Proceedings of the 14th International Conference on Information Reuse and Integration (IEEE IRI 2013), San Francisco, CA, USA, August 2013, pp. 22–29.
[9] ——, Integration of Reusable Systems, ser. Advances in Intelligent and Soft Computing. Springer, 2013, ch. A Multi-Layer Moving Target Defense Approach for Protecting Resource-Constrained Distributed Devices.
[10] G. F. Lyon, Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning. Insecure, 2009.
[11] D. Watson, M. Smart, G. R. Malan, and F. Jahanian, "Protocol scrubbing: Network security through transparent flow modification," IEEE/ACM Transactions on Networking, vol. 12, no. 2, pp. 261–273, April 2004.
[12] G. Shu and D. Lee, "Network protocol system fingerprinting - a formal approach," in Proceedings of the 25th IEEE International Conference on Computer Communications (INFOCOM 2006). IEEE, April 2006.
[13] D. Barroso Berrueta, "A practical approach for defeating Nmap OS-Fingerprinting," http://nmap.org/misc/defeat-nmap-osdetect.html, January 2003.
[14] F. H. Abbasi, R. J. Harris, G. Moretti, A. Haider, and N. Anwar, "Classification of malicious network streams using honeynets," in Proceedings of the IEEE Conference on Global Communications (GLOBECOM 2012). Anaheim, CA, USA: IEEE, December 2012, pp. 891–897.
[15] C.-M. Chen, S.-T. Cheng, and R.-Y. Zeng, "A proactive approach to intrusion detection and malware collection," Security and Communication Networks, vol. 6, no. 7, pp. 844–853, July 2013.
[16] M. Albanese, S. Jajodia, A. Pugliese, and V. S. Subrahmanian, "Scalable analysis of attack scenarios," in Proceedings of the 16th European Symposium on Research in Computer Security (ESORICS 2011). Leuven, Belgium: Springer, September 2011, pp. 416–433.