Fast Classification, Calibration, and Visualization of Network Attacks on Backbone Links

Hyogon Kim¹, Jin-Ho Kim², Saewoong Bahk², and Inhye Kang³

¹ Korea University
² Seoul National University
³ University of Seoul

Abstract. This paper presents a novel approach that can simultaneously detect, classify, calibrate, and visualize attack traffic at high speed, in real time. In particular, upon a packet arrival, this approach makes it possible to immediately determine whether the packet constitutes an attack and, if so, what type of attack it is. In this approach, a flow is defined by a 3-tuple composed of source address, destination address, and destination port. The core idea starts from the observation that only DoS attacks, hostscans, and portscans appear as regular geometric shapes in the hyperspace defined by the 3-tuple. Instead of employing complex pattern recognition techniques to identify these regular shapes, we apply an original algorithm called RADAR that captures the "pivoted movement" in one or more of the 3 coordinates. From the geometric perspective, such movement forms the aforementioned regular pattern along the axis of the pivoted dimension. Through real execution on a Gigabit link, we demonstrate that the algorithm is both fast and precise. It needs only 3 to 4 memory lookups per packet to detect and classify an attack packet; as a result, even with 2 copies of the algorithm running simultaneously on a Pentium-4 PC, it incurred no packet loss on 330Mbps of live traffic. The memory requirement is also low: at most 200MB suffices even for Gigabit pipes. Finally, the method is general enough to detect both DoS attacks and scans, but the focus of the paper is on its capability to identify the latter on backbone links, in light of recent global worm epidemics.

1 Introduction

Detecting attacks on backbone-speed links, let alone performing attack classification and other more involved tasks, is hard. The formidable speed forbids any in-line algorithm that requires more than a few memory lookups and computation steps per packet. The traditional anomaly-based approach [1, 2] is clearly not usable in this environment: first, it requires traffic accumulation to characterize normal traffic, and second, it usually requires complex computation. In this paper, we discuss an approach that simultaneously detects, classifies, and calibrates attack traffic at backbone speed, in real time. Better yet, it easily lends itself to the visualization of ongoing attacks. More specifically, it has the following desirable properties:

a) real-time detection and classification: done with O(1) per-packet processing, immediately upon packet arrival,
b) low memory requirement: less than 200MB for gigabit pipes,
c) ease of calibration: attack source/victim, duration, intensity, and dimensions are identified without off-line post-mortem analysis,
d) minimal false positives/negatives,
e) no requirement for support from the Internet infrastructure in any form: neither protocol modification, protocol addition, nor coordination between networks/routers,
f) simultaneous tracking of DoS, hostscans, and portscans, and finally,
g) immunity from asymmetric routing.

This paper is organized as follows. Section 2 presents our real-time classification method: a novel representation of attacks, their particular signatures, and the implementation of the signature generator. In Section 3, we show the results of applying the algorithm to a backbone trace and to live network traffic on a campus backbone. The paper is concluded in Section 4. Due to space constraints, we omit the discussion of the statistical nature of the method, its analysis, and the performance evaluation of the scheme in terms of speed, memory requirement, sensitivity, estimation error, and false positive rate. Interested readers are referred to [3] for these details and for related work.

H.-K. Kahng and S. Goto (Eds.): ICOIN 2004, LNCS 3090, pp. 837–846, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Real-Time Attack Classification

On each packet arrival, we want to judge whether the packet is (highly likely) part of an attack or not. If it is, we want to classify the type of attack: DoS, hostscan, or portscan. Furthermore, we want to identify the victim (DoS), the perpetrator and the scanned ports (hostscan, portscan), and the intensity of the attack. In this section, we discuss our approach to achieving these goals. First, we define a flow to be a 3-tuple <s, d, p>, composed of the source address (s), destination address (d), and destination port (p). Our novel idea starts from the observation that only DoS attacks, hostscans, and portscans appear as regular geometric entities in the hyperspace defined by the 3-tuple. For instance, source-spoofed DoS packets maintain a fixed destination address, and thus appear as a straight line parallel to the s axis (if the destination port is fixed), or as a rectangle parallel to the s-p plane (if the destination port is randomly varied). Legitimate flows, on the other hand, appear as random points scattered across the hyperspace. Figure 1 shows the flows observed at 9:35 and 9:36 a.m. on December 14th, 2001 on two trans-Pacific T-3 links connecting the U.S. and a Korean Internet Exchange. The three axes are the source IP address, destination IP address, and destination port, as in the flow definition above. (The source and destination addresses are plotted on a decimal scale.) Each dot in the 3-dimensional hyperspace represents a single flow (not a packet). A total of 2.22 million packets were mapped to the hyperspace in the figure, where packets of the same flow fall on the same position. We can easily recognize regular geometric formations, such as a large rectangle and a leaner rectangle lying parallel to the s-axis, lines parallel to the d-axis, and numerous vertical lines. These regular formations are (destination-port-varied) DoS attacks, hostscans, and portscans, respectively. Although they far outnumber these formations, legitimate flows do not form any regular shape and are less conspicuous.

Fig. 1. Flows at around 9:35 a.m., Dec. 14th, 2001

Instead of employing complex pattern recognition techniques such as 3-dimensional edge detection, we apply an original algorithm that captures the "pivoted movement" in one or more of the 3 coordinates. This is because, from the graphical perspective, such movement forms the aforementioned regular pattern along the axis of the pivoted dimension. In a hostscan, the source IP address and the destination port are fixed, while the destination IP address pivots on them [5]. In a portscan, the destination port pivots on the source and destination IP addresses. In a source-spoofed DoS, the destination IP address is fixed, while either the source IP address alone or both the source IP address and the destination port pivot on it [9].

In order to detect the presence of pivoting in the traffic stream, our scheme first generates a signature for each incoming packet. The signature is simply a tuple of 3 binary values, <Ks, Kd, Kp>, whose coordinates correspond one-to-one to the flow coordinates. Each coordinate value in the signature tells us whether the corresponding value in the flow (to which the packet in hand belongs) was seen "recently" or not. (The degree of recentness could vary across coordinates; we deal with this later.) For example, suppose the two flows

Arrival time   Flow                      Flow ID
t              <3.4.5.6, 5.6.7.8, 90>    (1)
t + 1          <1.2.3.4, 5.6.7.8, 80>    (2)


pass through the monitor that executes our scheme. For convenience, throughout the paper we call the monitor the RADAR monitor (for Real-time Attack Detection And Report), and the algorithm it executes the RADAR algorithm. Unless we explicitly mention the algorithm, "RADAR" alone refers to the monitor (which includes the algorithm). RADAR remembers these two flows for a finite time duration L. For the sake of explanation, let us assume for now that the duration is the same for every coordinate, say, L = 2. When a packet with source IP = 1.2.3.4, destination IP = 3.4.5.6, and destination port = 90 appears at time t + 2, RADAR reports that the packet's signature is <Ks, Kd, Kp> = <1, 0, 1>. This is because source IP address 1.2.3.4 appeared in flow (2) and port 90 in flow (1), but 3.4.5.6 was not used as the destination address in either (1) or (2), so Kd = 0. If L = 1, flow (1) would have been purged from RADAR by the time of the packet arrival, and the signature would be <1, 0, 0>. In principle, this per-packet signature determines whether the packet is part of a "pivoted movement" and, if so, what type it is. Note that when pivoting occurs, the value of the pivoted coordinate changes constantly from packet to packet within the attack stream. From the perspective of the RADAR algorithm, the pivoted coordinate persistently presents recently unobserved values. In Fig. 2, for instance, the pivoted coordinate is the destination address, and each packet presents a new value: 72.142.101.184 → 72.142.101.197 → 197.14.58.120 → and so on. So RADAR keeps generating <1, 0, 1> signatures for the hostscan. In this way, RADAR yields the signatures <1, 0, 1>, <1, 1, 0>, or <0, 1, *> rather frequently in the presence of a hostscan, portscan, or source-spoofed DoS, respectively ('*' is a wildcard, i.e., '0' or '1'). These signatures are what we call attack signatures, and the corresponding flows go through further examination.
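The signature computation in the example above can be sketched in Python. This is a minimal illustration, assuming one dictionary per coordinate that maps each value to its last-accessed time; the names `RecencyTable`, `SignatureGenerator`, and `attack_type` are hypothetical, and aged-out entries are simply treated as unseen rather than purged. It reproduces the worked example with t = 0.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Optional, Tuple

Signature = Tuple[int, int, int]


@dataclass
class RecencyTable:
    """Hypothetical last-seen table for one flow coordinate (s, d, or p)."""
    lifetime: float
    last_seen: Dict[Hashable, float] = field(default_factory=dict)

    def check_and_update(self, value: Hashable, now: float) -> int:
        """Return 1 if `value` was seen within the lifetime, else 0, then record it."""
        t_l = self.last_seen.get(value)
        # An entry has aged out when now - lifetime > t_l; treat it as unseen.
        recent = t_l is not None and not (now - self.lifetime > t_l)
        self.last_seen[value] = now  # new sighting or refreshed last-access time
        return int(recent)


class SignatureGenerator:
    """One independent recency table per coordinate of the <s, d, p> flow."""

    def __init__(self, host_lifetime: float, port_lifetime: float):
        self.src = RecencyTable(host_lifetime)   # L_s = L_H
        self.dst = RecencyTable(host_lifetime)   # L_d = L_H
        self.port = RecencyTable(port_lifetime)  # L_p

    def signature(self, s: str, d: str, p: int, now: float) -> Signature:
        return (self.src.check_and_update(s, now),
                self.dst.check_and_update(d, now),
                self.port.check_and_update(p, now))


def attack_type(sig: Signature) -> Optional[str]:
    """Map the three attack signatures named in the text to an attack type."""
    if sig == (1, 0, 1):
        return "hostscan"
    if sig == (1, 1, 0):
        return "portscan"
    if sig[0] == 0 and sig[1] == 1:  # <0, 1, *>
        return "source-spoofed DoS"
    return None


if __name__ == "__main__":
    # The worked example from the text, with t = 0 and L = 2 for every coordinate.
    gen = SignatureGenerator(host_lifetime=2, port_lifetime=2)
    gen.signature("3.4.5.6", "5.6.7.8", 90, now=0)  # flow (1)
    gen.signature("1.2.3.4", "5.6.7.8", 80, now=1)  # flow (2)
    sig = gen.signature("1.2.3.4", "3.4.5.6", 90, now=2)
    print(sig, attack_type(sig))  # (1, 0, 1) hostscan
```

Re-running the same three calls with L = 1 for every coordinate yields <1, 0, 0> for the third packet, as stated in the text, because the port 90 sighting at t has aged out. The sketch also hands flow (2) itself the signature <0, 1, 0>: a legitimate flow can pick up an attack signature when it happens to share a coordinate with earlier traffic.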
Sometimes legitimate traffic can get attack signatures, and vice versa, or one attack might be mistaken for another, all due to hapless modification of one or more coordinates in the signature. Some refinement is therefore required in back-end processing, which is much less time-pressed. The accuracy of the proposed algorithm thus depends on how likely these unwanted changes in the signature are; the analysis of this statistical aspect of our algorithm can be found in [3].

2.1 Attack Signatures

In this section, we explore possible signatures and their semantics. There are attack signatures and signatures of legitimate traffic; we start the discussion with the former. Figure 3 exhaustively enumerates all signatures and the attack types they conceivably imply. As described earlier, '0' in a signature means that the monitor has not recently seen the value in the given coordinate. Thus, if a packet belongs to an attack stream, a '0' in a coordinate most probably means that the coordinate is pivoting. The leftmost column is the number of pivoting dimensions. The second column shows how the attacks might manifest themselves geometrically when mapped onto the 3-d hyperspace à la Figure 1.

Time              Source address  Source port  Destination address  Destination port
...
09:35:23.955222   x.x.x.x         64218        72.142.101.184       111
09:35:23.958716   x.x.x.x         64232        72.142.101.197       111
09:35:23.965132   x.x.x.x         64310        197.14.58.120        111
09:35:23.965443   x.x.x.x         64311        197.14.58.121        111
09:35:23.966412   x.x.x.x         64316        197.14.58.126        111
09:35:23.974520   x.x.x.x         64322        197.14.58.132        111
09:35:23.976617   x.x.x.x         64331        197.14.58.141        111
09:35:24.091332   x.x.x.x         64424        19.231.216.127       111
09:35:24.093271   x.x.x.x         64423        19.231.216.126       111
09:35:24.093317   x.x.x.x         64422        19.231.216.125       111
...
09:35:24.104956   x.x.x.x         64438        19.231.216.141       111
09:35:24.105238   x.x.x.x         64437        19.231.216.140       111
09:35:24.106191   x.x.x.x         64433        19.231.216.136       111
09:35:24.107471   x.x.x.x         64429        19.231.216.132       111
09:35:24.125654   x.x.x.x         64466        85.114.173.117       111
09:35:24.126519   x.x.x.x         64464        85.114.173.115       111
...

Fig. 2. Real-life pivoting example: hostscan

Dim.  Graphical manifestation  Signature   Implied attack
0     Dot                      <1, 1, 1>   Single-source-spoofed DoS [shaded]
1     Straight line            <1, 1, 0>   Portscan
                               <1, 0, 1>   Hostscan
                               <0, 1, 1>   Source-spoofed DoS (destination port fixed)
2     Rectangle                <1, 0, 0>   Kamikaze
                               <0, 1, 0>   Source-spoofed DoS (destination port varied)
                               <0, 0, 1>   Distributed hostscan [shaded]
3     Hexahedron               <0, 0, 0>   Network-directed DoS [shaded]

Fig. 3. Attack signatures

An important note here is that the signatures listed in Fig. 3 are self-induced. Namely, the values in a signature are those caused by the corresponding attack itself, not by other traffic. To wit, these are the signatures an attack would obtain in the absence of any cross traffic (legitimate traffic plus other types of attack). But, as discussed earlier, cross traffic might overlap in one or more coordinates, so these signatures are not always the ones detected while the corresponding attack is under way. For <0, 0, 0>, one or more coordinates can be flipped to 1 by cross traffic that happens to coincide on IP addresses or


port numbers. Suppose a flow <4.4.4.4, 2.2.2.2, 5555> is initiated after a flow <1.1.1.1, 2.2.2.2, 3333> has been registered by RADAR. Then the former will receive the signature <0, 1, 0>, which RADAR recognizes as a port-varied DoS attack. Since the signatures in Fig. 3 are those obtained before the attack traffic is subject to possible overlap, we call them original signatures. In contrast, if an original signature does get modified by overlap, we call the result a transformed signature. For instance, in the transformation <0, 0, 0> → <1, 1, 1>, <0, 0, 0> is the original signature and <1, 1, 1> is the transformed signature. So when RADAR detects an attack signature, it might be a transformed signature or an original signature kept intact. Most signatures in Fig. 3 are fairly straightforward, but a few call for some explanation. First, even if nothing is pivoting (signature <1, 1, 1>), the traffic can theoretically still constitute an attack: one may use a single spoofed source IP address and a fixed destination port number in a DoS attack. But this is impractical from the attacker's perspective. Once the attack is identified as DoS, simply filtering on the single (spoofed) source address eliminates the attack completely. "Worse" yet, the collateral damage of the filtering is limited to the spoofed host alone (it is denied access to the victim). Therefore, we assume in this paper that this type of attack is not employed in reality. Second, we assume that a distributed hostscan (signature <0, 0, 1>) will be detected as what it effectively is: multiple hostscans (signature <1, 0, 1>). Third, the network-directed DoS (signature <0, 0, 0>) is an attack on the ingress pipe rather than on any particular host in the victim network. The only rationale might be that the attacker wants to evade detection, because the attack intensity toward each individual destination IP address in the pivoting range is proportionally reduced.
But then the attacker is assuming a (micro)flow-based detector as its potential opponent, which is ineffective given the whole gamut of other existing detection/filtering methods [6, 7]. So in this paper, we also reject this type of attack as dubious. In sum, we reject three of the eight listed signatures as original attack signatures: <0, 0, 0>, <0, 0, 1>, and <1, 1, 1> (shaded in Fig. 3). Finally, distributed DoS (DDoS) does not appear in Fig. 3. We can consider two cases. If DDoS sources spoof source IP addresses, they will collectively be detected as a single DoS attack, <0, 1, *>. If spoofing is not used, the individual DoS streams look like legitimate flows from our monitor's viewpoint, so they will not be detected as attacks. Usually, however, DDoS mobilizes a large network of agent hosts to maximize the impact; e.g., more than 359,000 machines were made agents by Code-Red version 2 [4] in an attempt to bombard the White House web site, and the Sapphire worm infected more than 70,000 hosts [5]. Therefore, when the attack commences, RADAR will suddenly begin to see a great many source IP addresses. This will produce a noticeable amount of <0, 1, *> signatures at a fast pace and draw RADAR's attention. Provided the intensity exceeds the tolerable threshold, which is low enough to catch even a spoofed DoS attack from a single attacker (see Section V), RADAR will raise an alarm. The remaining five cases are of interest in this paper. First of all, "Kamikaze" is special: a single source spews packets at a high rate toward random destination hosts at random ports.


Apparently, it cannot be an effective attack; rather, it seems suicidal. The origin of this type of "attack" is not clear, but it does appear in our traces [3]. One explanation could be a bug in the DoS attack code: pivoting the destination address instead of the source. But a more plausible theory is that it is the backscatter [8] from a DoS victim toward the spoofed attack sources. Fig. 3 also lists two source-spoofed DoS types (destination port fixed and varied), but the distinction is only for the convenience of analysis; it bears no practical significance. The signatures of legitimate traffic can be analyzed similarly, but we omit that discussion due to space constraints. Interested readers can find it in [3].

2.2 Signature Generation

Fig. 4 shows the construction of the main filter in the attack monitor. This is what we have called the "front-end" thus far. It is composed of 3 hash tables, which collectively generate the signature for each incoming packet. The network/transport packet header is mirrored to the filter, where a separate lookup is made against the source IP address table, the destination IP address table, and the destination port table. When a value (address or port) is not found, i.e., has not been observed recently, it is registered in the corresponding hash table as a new sighting. Any hash function can be used as long as it has a good distributional property and can be calculated quickly. Of these two properties, however, speed weighs more for the front-end. For instance, MD5 and SHA-1 may have good distributional properties, but they require too complicated a computation to fit our environment. Our experience shows that using the least significant 24 bits of the IP address suffices for casual operation. Against our backbone trace, it resulted in 1.0072 comparisons on average (most lookups take 0 or 1 comparisons, where 0 means an empty hash bucket), with only a few reaching up to 8 comparisons. For the port hash table, the hash function is the identity function, i.e., we use the port number itself as the index, because there are only 64K (65,536) port number values. Since hash lookups are used, the complexity of the main filter can be engineered to be O(1). Stored with each entry is its last-accessed time t_l. We maintain a moving time window L beyond which registered IP addresses or port numbers age out. Namely, if t_now − L > t_l, we remove the entry from the corresponding hash table. We call this time window the lifetime, and we define two lifetimes as follows:

– L_H (= L_s = L_d): [source/destination] host lifetime
– L_p: destination port lifetime

The reason that we perform a separate lookup for each coordinate is clear.
If we maintained each flow entry indexed by <s, d, p> collectively, we would not know which coordinate is responsible for a failed flow lookup. That is, we would not know immediately which coordinate is being pivoted, i.e., what type of attack is being mounted, and some additional processing would be necessary on these new flows to achieve classification. Therefore, for real-time classification, separate hash lookups are essential.

[Fig. 4 diagram: the packet header fields s, d, and p enter the main filter, where they are looked up in the source, destination, and port hash tables, which output Ks, Kd, and Kp, respectively.]
Fig. 4. Signature generation by the main filter

Earlier we mentioned the possibility of signature transformation. In particular, when the signature of the first packet of a legitimate flow gets transformed, the packet may be identified as an attack. For packets of an ongoing flow, which should receive <1, 1, 1>, the cause of misinterpretation is instead an inadequately set lifetime. If it is set too low, RADAR forgets too fast (i.e., before the flow ends) and returns 0 where it should return 1. Likewise, attack packets can get non-attack or incorrect attack signatures depending on the number and location of the flipped bit(s). So any coordinate can always suffer such unwanted bit flips. In [3], we analyze the false positive and false negative probabilities of the proposed algorithm caused by bit flips.
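The main filter of Fig. 4 can be sketched in Python under the following assumptions, which are ours rather than the paper's: chained buckets indexed by the least significant 24 bits of the address (the full address is compared to resolve collisions), an identity-indexed port table with 65,536 slots, and aged entries (t_now − L > t_l) unlinked lazily while a bucket is scanned; the text does not say when expired entries are reclaimed. All class names are hypothetical, and `bits` is a parameter only so the demo can use a smaller table. Because each coordinate has its own table, the position of a 0 in the result identifies the unseen coordinate directly.

```python
import ipaddress
from typing import List, Optional, Tuple

IP_HASH_BITS = 24     # least significant 24 bits of the address, as in the text
PORT_SPACE = 1 << 16  # 65,536 destination port values


class _Entry:
    """One registered address in a bucket chain (hypothetical layout)."""
    __slots__ = ("value", "t_l", "nxt")

    def __init__(self, value: int, t_l: float, nxt: Optional["_Entry"]):
        self.value = value  # full 32-bit address, compared to resolve collisions
        self.t_l = t_l      # last-accessed time
        self.nxt = nxt      # pointer to the next entry in the same bucket


class IpTable:
    """Chained hash table indexed by the low `bits` bits of an IPv4 address."""

    def __init__(self, lifetime: float, bits: int = IP_HASH_BITS):
        self.lifetime = lifetime
        self.mask = (1 << bits) - 1
        self.buckets: List[Optional[_Entry]] = [None] * (1 << bits)

    def lookup(self, addr: str, now: float) -> int:
        """Return 1 if addr was seen within the lifetime, else 0; register it either way."""
        value = int(ipaddress.IPv4Address(addr))
        idx = value & self.mask
        prev: Optional[_Entry] = None
        node = self.buckets[idx]
        while node is not None:
            if now - self.lifetime > node.t_l:
                # Aged out (t_now - L > t_l): unlink the entry and keep scanning.
                if prev is None:
                    self.buckets[idx] = node.nxt
                else:
                    prev.nxt = node.nxt
                node = node.nxt
                continue
            if node.value == value:
                node.t_l = now  # recently seen: refresh the last-accessed time
                return 1
            prev, node = node, node.nxt
        # Not found: register a new sighting at the head of the bucket chain.
        self.buckets[idx] = _Entry(value, now, self.buckets[idx])
        return 0


class PortTable:
    """Direct-indexed table: the port number itself is the index (identity hash)."""

    def __init__(self, lifetime: float):
        self.lifetime = lifetime
        self.t_l: List[Optional[float]] = [None] * PORT_SPACE

    def lookup(self, port: int, now: float) -> int:
        t_l = self.t_l[port]
        recent = t_l is not None and not (now - self.lifetime > t_l)
        self.t_l[port] = now
        return int(recent)


class MainFilter:
    """Three independent lookups, one per coordinate, yielding <Ks, Kd, Kp>."""

    def __init__(self, host_lifetime: float, port_lifetime: float,
                 bits: int = IP_HASH_BITS):
        self.src = IpTable(host_lifetime, bits)  # L_s = L_H
        self.dst = IpTable(host_lifetime, bits)  # L_d = L_H
        self.port = PortTable(port_lifetime)     # L_p

    def signature(self, s: str, d: str, p: int, now: float) -> Tuple[int, int, int]:
        # Separate tables mean a 0 pinpoints which coordinate was unseen.
        return (self.src.lookup(s, now), self.dst.lookup(d, now),
                self.port.lookup(p, now))


if __name__ == "__main__":
    # First hostscan packets of Fig. 2; the anonymized source is replaced by a
    # hypothetical documentation address, and bits=16 keeps the demo small.
    mf = MainFilter(host_lifetime=10.0, port_lifetime=10.0, bits=16)
    for t, dst in [(0.000, "72.142.101.184"), (0.003, "72.142.101.197"),
                   (0.010, "197.14.58.120")]:
        print(mf.signature("192.0.2.1", dst, 111, now=t))
    # (0, 0, 0), then (1, 0, 1) twice
```

With host and port lifetimes of 10 s (the text gives L_H = 10s as the default; the port lifetime here is our assumption), the scanner's source and port stay registered while every destination is new, so the stream settles into <1, 0, 1>. A lookup arriving more than L after the last sighting finds the entry gone and returns 0, which is how an overly short lifetime produces the misinterpretation described above.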

3 Implementation

We implemented a prototype of the RADAR system. Figure 5 shows the result of applying RADAR to the 8-hour trace (Dec. 14th, 2001) of about 612 million packets. RADAR processed the trace in just 2.5 hours on a Pentium-3 966MHz PC, and the figure clearly shows that it successfully extracts the attacks. Interested readers can find and compare animations of the attacks and their processed results in [3]. We also plugged RADAR into a campus network gateway. The incoming packets were optically tapped from the gateway router on two Gigabit Ethernet interfaces [3]. A Pentium-4 2.4GHz machine with 512MB Rambus memory, an Intel PRO/1000MF dual-port LAN card, and a PCI 2.2 (32-bit) bus simultaneously ran a separate instance of the RADAR algorithm on each Ethernet port. The total traffic rate was roughly 330Mbps (65Kpps) at the time of the experiments [3]. The most important result is that RADAR processing caused no packet loss at the kernel [3]. This is remarkable considering that we simultaneously ran 2 instances of the algorithm.

Fig. 5. Graphical output from the post-filter, a real RADAR-processed result of Figure 1

The memory requirement of the hash tables in the main filter and the post-filter [3] is moderate. Assuming we use a 24-bit hash for the source and destination IP tables, we need at least 2^25 (= 2 × 2^24) hash buckets, each headed by a pointer (usually 4 octets). This alone amounts to 128MB. Over and above this, we need to store each flow in these tables, where a flow has at least 2 IP addresses, 1 port number, and a timestamp, and each entry also needs a pointer to the next entry. So each flow entry requires at least 17B. Assuming 1 million flows are tracked simultaneously, 34MB is needed. Based on our flow arrival rate constant, 1 million flows in the main filter IP table translate to approximately 10Gbps (OC-192), since we have by default L_H = 10s. In addition, the main filter has the port table, but it has only 64K entries, so it adds little to the memory requirement. The post-filter has no large tables, since there should be only a handful of concurrent attacks: we do not expect to see, say, 64,000 attacks simultaneously under way, even on a backbone link. Therefore, we use a 16-bit hash for all post-filter tables. Again, the memory requirement is insignificant, most likely less than 2MB. In sum, more than half of RADAR's memory is used for the IP tables in the main filter. If memory is a critical resource, we could use a 23-bit hash, halving the bucket requirement, then a 22-bit hash, and so forth.
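The bucket and entry figures above follow from simple arithmetic, restated here in Python under the text's own assumptions (4-byte bucket pointers, at least 17 B per flow entry, 1 million tracked flows). The factor of two on the entry total reflects our reading that each flow is held once in each of the two IP tables, which is what makes 17 B per entry come out to 34MB; the function name `main_filter_memory` is hypothetical. The text's "128MB" is a binary quantity (2^27 bytes), so it is reported in MiB here.

```python
from typing import Tuple


def main_filter_memory(hash_bits: int = 24, flows: int = 1_000_000,
                       entry_bytes: int = 17, pointer_bytes: int = 4,
                       ip_tables: int = 2) -> Tuple[int, int]:
    """Return (bucket_bytes, entry_bytes_total) for the main-filter IP tables."""
    # Bucket heads: one pointer per bucket, 2**hash_bits buckets per IP table.
    bucket_bytes = ip_tables * (1 << hash_bits) * pointer_bytes
    # Flow entries: assumed to be stored once in each IP table.
    entry_total = ip_tables * flows * entry_bytes
    return bucket_bytes, entry_total


if __name__ == "__main__":
    buckets, entries = main_filter_memory()
    print(buckets / 2**20, "MiB of bucket heads")   # 128.0 MiB of bucket heads
    print(entries / 10**6, "MB of flow entries")    # 34.0 MB of flow entries
    half, _ = main_filter_memory(hash_bits=23)
    print(half / 2**20, "MiB with a 23-bit hash")   # 64.0 MiB with a 23-bit hash
```

Together the two terms come to roughly 168 million bytes, leaving room within the 200MB bound quoted in the introduction for the port table and the post-filter, both of which the text estimates as small.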

4 Conclusion

This paper proposes a novel approach that determines, for each packet arriving on a high-speed link, whether it constitutes an attack and, if so, what type of attack it is, in real time. The approach is based on the simple observation that only network attacks such as DoS and scans manifest themselves as regular geometric entities in a 3-dimensional hyperspace whose dimensions are source IP address, destination IP address, and destination port number. Instead of employing complex pattern recognition algorithms to detect such regular patterns, we propose a novel algorithm, RADAR, that captures the "pivoting" behavior, which directly translates into the abovementioned regular geometry in the 3-d hyperspace. The RADAR algorithm requires only a few memory lookups per packet, yet the classification error is minimal. The algorithm singles out only the suspicious packets matching the pivoting behavior, and so buys enough time for more sophisticated back-end processing that removes the false positives from the suspicious packets. We analyze the performance of the RADAR algorithm in terms of speed, sensitivity, relative error, and false positive rate in [3]. The simulation and real implementation experiments demonstrate that the algorithm indeed performs up to our expectations on high-speed links, and that it could be a useful building block for an early warning and reaction framework against fast global attacks of the future.

References

[1] R. B. Blazek et al., "A novel approach to detection of denial-of-service attacks via adaptive sequential and batch-sequential change-point detection methods," IEEE Systems, Man, and Cybernetics Information Assurance Workshop, June 2001.
[2] C. C. Zou, "Using Hidden Markov Model in Anomaly Intrusion Detection," http://tennis.ecs.umass.edu/~czou/research/HMM/index.htm.
[3] H. Kim, "Fast Classification, Calibration, and Visualization of DoS and Scan Attacks for Backbone Links," Technical Report, June 2003, http://net.korea.ac.kr/papers/RADAR.html.
[4] CAIDA, "CAIDA analysis of Code Red," http://www.caida.org/analysis/security/code-red/coderedv2_analysis.xml, July 2001.
[5] CAIDA, "Analysis of the Sapphire Worm," http://www.caida.org/analysis/security/sapphire/, Jan. 30, 2003.
[6] M. Poletto, "Practical Approaches to Dealing with DDoS Attacks," NANOG presentation, May 2001, http://www.nanog.org/mtg-0105/poletto.html.
[7] R. Mahajan, S. M. Bellovin, S. Floyd, J. Ioannidis, V. Paxson, and S. Shenker, "Controlling High Bandwidth Aggregates in the Network," ACM CCR, Vol. 32, No. 3, July 2002.
[8] D. Moore, G. Voelker, and S. Savage, "Inferring Internet Denial-of-Service Activity," in Proceedings of the 2001 USENIX Security Symposium.
[9] K. Houle and J. Weaver, "Trends in Denial of Service Attack Technology," CERT Coordination Center, Oct. 2001.