Detecting Stepping Stones - Usenix

3 downloads 220 Views 103KB Size Report
cient algorithm for detecting stepping stones by monitor- ing a site's Internet ... hence can be used to find stepping stones even when the traffic is encrypted.
USENIX Association

Proceedings of the 9th USENIX Security Symposium Denver, Colorado, USA August 14 –17, 2000

THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

© 2000 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

Detecting Stepping Stones Yin Zhang Department of Computer Science Cornell University Ithaca, NY 14853 [email protected]

Vern Paxson AT&T Center for Internet Research at ICSI International Computer Science Institute Berkeley, CA 94704 [email protected]

Abstract One widely-used technique by which network attackers attain anonymity and complicate their apprehension is by employing stepping stones: they launch attacks not from their own computer but from intermediary hosts that they previously compromised. We develop an efficient algorithm for detecting stepping stones by monitoring a site’s Internet access link. The algorithm is based on the distinctive characteristics (packet size, timing) of interactive traffic, and not on connection contents, and hence can be used to find stepping stones even when the traffic is encrypted. We evaluate the algorithm on large Internet access traces and find that it performs quite well. However, the success of the algorithm is tempered by the discovery that large sites have many users who routinely traverse stepping stones for a variety of legitimate reasons. Hence, stepping-stone detection also requires a significant policy component for separating allowable stepping-stone pairs from surreptitious access.

1 Introduction A major problem with apprehending Internet attackers is the ease with which attackers can hide their identity. Consequently, attackers run little risk of detection. One widely-used technique for attaining anonymity is for an attacker to use stepping stones: launching attacks not from their own computer but from intermediary hosts that they previously compromised. Intruders often assemble a collection of accounts on compromised hosts, and then when conducting a new attack they log-in through a series of these hosts before finally assaulting the target. Since stepping stones are generally heterogeneous, diversely-administered hosts, it is very difficult to trace an attack back through them to its actual origin.

 Also with the Lawrence Berkeley National Laboratory.

There are a number of benefits to detecting stepping stones: to flag suspicious activity; to maintain logs in case a break-in is subsequently detected as having come from the local site; to detect inside attackers laundering their connections through external hosts; to enforce policies regarding transit traffic; and to detect insecure combinations of legitimate connections, such as a clear-text Telnet session that exposes an SSH passphrase. The problem of detecting stepping stones was first addressed in a ground-breaking paper by Staniford-Chen and Heberlein [SH95]. To our knowledge, other than that work, the topic has gone unaddressed in the literature. In this paper, we endeavor to systematically analyze the stepping stone detection problem and devise accurate and efficient detection algorithms. While, as with most forms of intrusion detection, with enough diligence attackers can generally evade detection [PN98], our ideal goal is to make it painfully difficult for them to do so. The rest of the paper is organized as follows. We first examine the different tradeoffs that come up when designing a stepping stone algorithm ( 3). We then in 4 develop a timing-based algorithm that works surprisingly well, per the evaluation in 5, and also evaluate two cheap context-based techniques. We conclude in 6 with some of the remaining challenges: in particular, the need for rich monitoring policies, given our discovery that legitimate stepping stones are in fact very common; and the possibility of detecting non-interactive relays and slaves.









2 Terminology and Notation We begin with terminology. When a person (or a program) logs into one computer, from there logs into another, and perhaps a number still more, we refer to the

sequence of logins as a connection chain [SH95]. Any intermediate host on a connection chain is called a stepping stone. We call a pair of network connections a stepping stone connection pair if both connections are part of a connection chain.

be taken, though, because we will not find a perfect match between two stepping stone connections. They may differ due to translations of characters such as escape sequences, or the varying presence of Telnet options [PR83b].

Sometimes we will differentiate between flow and connection. A bidirectional connection consists of two unidirectional flows. We term the series of flows along each direction of a connection chain a flow chain.

In addition, suppose we are monitoring connections and , where is the stepping stone the attacker is using to access from . If we adopt a notion of “binning” in order to group activity into different time regions (for example to compute character frequencies as done in [SH95]) then due to the lag between activity on and activity on , the contents falling into each bin will match imperfectly. Furthermore, if the attacker is concurrently attacking via , then the traffic on will be a mixture of that from and that from , and neither of the latter connections’ contents will show up exactly in .

We use the following additional notation:

  %

 & '

  %

('





 %('

   : a bi-directional network connection be*) ( &( tween  and  . We also use  , , ... to denote  +% '  ,& ) network connections.    : a unidirectional flow from   to  .  %   is a binary relation defined over all conThese considerations complicate content-based detecnections as follows:  !"#$  if and only if techniques. A more fundamental limitation is that

 and form a stepping stone connection pair. tion content-based techniques cannot, unfortunately, work when the content is encrypted, such as due to use of SecureShell (SSH; [YKSRL99]).

3 Design Space In this section we discuss the tradeoffs of different highlevel design considerations when devising algorithms to detect stepping stones. Some of the choices relate to the following observation about stepping-stone detection: intuitively, the difference between a stepping stone connection pair and a randomly picked pair of connections is that the connections in the stepping stone pair are much more likely to have some correlated traffic characteristics. Hence, a general approach for detecting stepping stones is to identify traffic characteristics that are invariant or at least highly correlated across stepping stone connection pairs, but not so for arbitrary pairs of connection. Some potential candidates for such invariants are the connection contents, inter-packet spacing, ON/OFF patterns of activity, traffic volume or rate, or specific combinations of these. We examine these as they arise in the subsequent discussion.

3.1 Whether to analyze connection contents A natural approach for stepping-stone detection is to examine the contents of different connections to find those that are highly similar. Such an approach is adopted in [SH95] and proves effective. Considerable care must

The goal of our work was to see how far we could get in detecting stepping stones without relying on packet contents, because by doing so we can potentially attain algorithms that are more robust. Not relying on packet contents also yields a potentially major performance advantage, which is that we then do not need to capture entire packet contents with the packet filter, but only packet headers, considerably reducing the packet capture load. However, we also devised two cheap content-based techniques for purposes of comparison ( 5.3), neither of which is robust, but both of which have the virtue of being very simple.



3.2 Direct vs. indirect stepping stones

 .-  /- ('

 0+% ( +% '  -  -1121$-  ' -  ) &(  ' & )  31211 ('   '

Suppose is a connection chain. The direct stepping stone detection problem is to detect that is a stepping stone if we are observing network traffic that includes the packets belonging to and . If, however, the connection chain is , then the indirect stepping stone detection problem is to detect that connections and form a stepping stone pair, given that we can observe their traffic but not the traffic belonging to (and hence there is no obvious connection between and ).

Detecting direct stepping stones can be simpler than detecting indirect ones because for direct ones we can often greatly reduce the number of candidates for connection pairs. On the other hand, it is much easier for attackers to elude direct stepping stone detection by simply introducing an additional hop in the stepping stone chain. Furthermore, if we can detect indirect stepping stones then we will have a considerably more flexible and robust algorithm, one which can, for example, be applied to traffic traces gathered at different places (see below).

Here we focus on passive monitoring, both because of its operational simplicity, and because if we can detect stepping stones using only passive techniques, then we will have a more broadly applicable algorithm, one that works without requiring the ability to manipulate incidental traffic.

In this paper we focus on the more general problem of detecting indirect stepping stones.

Tracing traffic at multiple points could potentially provide more information about traffic characteristics. On the other hand, doing so complicates the problem of comparing the traffic traces, as now we must account for varying network delays and clock synchronization. In this paper, we confine ourselves to the single measurement point case, with our usual presumption being that that measurement point is on the access link between a site and the rest of the Internet.

3.3 Real-time detection vs. off-line analysis We would like to be able to detect stepping stones in real-time, so we can respond to their detection before the activity completes. Another advantage of real-time detection is that we don’t have to store the data for all of the traffic, which can be voluminous. For instance, a day’s worth of interactive traffic (Telnet/Rlogin) at the University of California in Berkeley on average comprises about 1 GB of storage for 20,000 connections. Algorithms that only work using off-line analysis are still valuable, however, for situations in which retrospective detection is needed, such as when an attacked site contacts the site from which they were immediately attacked. This latter site could then consult its traffic logs and run an off-line stepping stone detection algorithm to determine from where the attacker came into their own site to launch the attack. Since real-time algorithms generally can also be applied to off-line analysis, we focus here on the former.

3.4 Passive monitoring vs. active perturbation Another design question is whether the monitor can only perform passive monitoring or if it can actively inject perturbing traffic to the network. Passive monitoring has the advantage that it doesn’t generate additional traffic, and consequently can’t disturb the normal operation of the network. On the other hand, an active monitor can be more powerful in detecting stepping stones: after the monitor finds a stepping-stone candidate, it could perturb one connection in the pair by inducing loss or delay, and then look to see whether the perturbation is echoed in the other connection. If so, then the connections are very likely correlated.

3.5 Single vs. multiple measurement points

3.6 Filtering An important factor for the success of some forms of real-time stepping-stone detection is filtering. The more traffic that can be discarded on a per-packet basis due to patterns in the TCP/IP headers, the better, as this can greatly reduce the processing load on the monitor. However, there is clearly a tradeoff between reduced system load and lost information. First, if a monitor detects suspicious activity in a filtered stream, often the filtering has removed sufficient accompanying context that it becomes quite difficult determining if the activity is indeed an attack. In addition, the existence of filtering criteria makes it easier for the attackers to evade detection by manipulating their traffic so that it no longer matches the filtering criteria. For example, an evasion against filtering based on packet size (see below) is to use a Telnet client modified to send a large number of do-nothing Telnet options along with each keystroke or line of input. The main likely filtering criteria for stepping-stone detection is packet size. Keystroke packets are quite small. Even when entire lines of input are transferred using “line mode” [Bo90], packet payloads tend to be much smaller than those used for bulk-transfer protocols. Therefore, by filtering packets to only capture small packets, the monitor can significantly reduce its packet capture load (for example, by weeding out heavy bulk-transfer SSH sessions while keeping interactive ones).

3.7 Minimizing state for connection pairs Since potentially there can be a large number of active connections seen by the monitor, it is often infeasible to keep stepping-stone state for all possible pairs of connections due to the memory requirements. Therefore we need mechanisms that allow us to only keep state for a small subset of the possible connection pairs.

4

One approach is to limit our analysis to only detecting direct stepping stones, but for the reasons discussed in 3.2 above, this is unappealing. There are, however, other mechanisms that work well:





&(

 +& '

Remove connection pairs sharing the same port on and both use the same host. If port on host , then most likely the two connections are merely using the same server on , accessing a server on and then rather than from that server running a client on to access a server on . Removing such connection pairs is particularly helpful when there are a large number of connections connecting to the same popular server—without such filtering, when connections connect to the same server, we need to keep state for connection pairs!

5