Chonka, Ashley, Zhou, Wanlei, Knapp, Keith and Xiang, Yang 2008, Protecting information systems from DDoS attack using multicore methodology, in CIT 2008 : Proceedings of IEEE 8th International Conference on Computer and Information Technology, IEEE, Piscataway, N.J., pp. 270-275.

©2008 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

IEEE 8th International Conference on Computer and Information Technology Workshops

Protecting Information Systems from DDoS Attack Using Multicore Methodology

Ashley Chonka, Wanlei Zhou, Keith Knapp
School of Engineering & Information Technology
Deakin University
Geelong, 3220, Australia
{ashley, wanlei, kdk}@deakin.edu.au

Yang Xiang
School of Management and Information Systems
Central Queensland University
Rockhampton, 4702, Australia
[email protected]

One of the many problems that we are seeing with the industry push towards multicore systems is that software lags behind the hardware. A majority of today’s security software programmers still design and build their programs for single-core systems. Even so, running these applications on a multicore system has produced performance gains. The main reason for the speedup is the chip design, not the efficiency of the application modelling or design [13]. Security software programmers could see a number of security problems answered by the introduction of a multicore framework and methodology. For example, security applications could be run in isolated environments. Alternatively, parallel intrusion detection and attack packet filtering could be carried out in real time, in conjunction with other applications. Network activities could be monitored and visualized in real time along with security software. Lastly, the efficiency of handling common application errors and troubleshooting would be greatly enhanced [16]. Our contribution to the field of Information Security is a multicore defence framework called bodyguard. Using this framework, we developed a bodyguard called Farmer (named after the Kevin Costner character in the movie The Bodyguard). The basic hypothesis of the bodyguard framework is to separate all security processes from other processes (email, browser, etc.) and assign them to a set of cores. The remaining cores within the system are assigned to the applications that require security. The bodyguard framework is made up of a Forward Bodyguard (FB) and a Side Bodyguard (SB). For example, in our Farmer bodyguard, the SB is responsible for providing a fast decision on whether to filter out any attack traffic.
Upon detecting an attack, the FB will then move in front of the application in order to protect it and initiate a filtering procedure. This type of system configuration acts like a real bodyguard in order to protect other applications, while at the same time minimising the performance issues that would otherwise burden the bodyguard system if it were installed on a single core

Abstract
Previous work in the area of defense systems has focused on developing a firewall-like structure in order to protect applications from attacks. The major drawback of implementing security in general is that it affects the performance of the applications it is trying to protect. In fact, most developers avoid implementing security at all. With the coming of new multicore systems, we might at last be able to minimize the performance issues that security places on applications. In our bodyguard framework we propose a new kind of defense that acts alongside, not in front of, applications. This means that the performance issues that affect system applications are kept to a minimum, while high-grade security is still provided. Our experimental results demonstrate that a ten to fifteen percent speedup in performance is possible, with the potential for greater speedup.

Index Terms: Multicore, Bodyguard, Neural Network, Non-linear Dynamic System and DDoS

1. Introduction
Over the last few years, we have seen a rapid growth in processing power with the implementation of multicore systems [1][2][3][4]. The push towards these designs is due to the capping of clock speeds at around 4 GHz. Therefore, to increase processing speed, the information industry is now pushing towards multicore systems. The release of processors with several cores on a single chip has been hailed as a solution to the problems that come with single-core design. A multicore system can be defined as two or more processing cores that are integrated into a single CPU. These cores share computer resources, for example the L2 cache and the front-side bus [5][6].

978-0-7695-3242-4/08 $25.00 © 2008 IEEE DOI 10.1109/CIT.2008.Workshops.88


system. There are many advantages that come with the use of the bodyguard framework, including efficient use of resources, performance increases, and real-time detection and filtering. The rest of this paper is organized as follows. Section Two covers the related work on multicore. The details of the bodyguard framework and the Farmer bodyguard are discussed in Section Three. Section Four presents the experiments and evaluation that were conducted on the system. Lastly, Section Five covers the conclusion and future work.

Figure 2. Multi-Tasking on a Multicore System

usually provide similar results, except that multicore incurs less overhead. Little research has been conducted on security frameworks that use multicore methodology, in comparison with multiprocessing methodology [14][15]. On the other hand, research has been conducted on converting sequential programs into multicore programs. For example, Bader et al. [6] designed an open-source parallel programming framework called SWARM (SoftWare and Algorithms for Running on Multi-core). SWARM is a library that provides the functionality to convert a sequential program into a multithreaded program. SWARM also encompasses synchronization, memory control and collective operations. The programming framework for SWARM is a descendant of the symmetric multiprocessor (SMP) library component of SIMPLE [10]. SWARM is built using POSIX threads, which allows the user to employ multithreading principles. Once the programmer has inserted the SWARM code into their sequential program, SWARM will allocate and de-allocate shared memory, control threads, construct parallelization and synchronize communication primitives. Another area of multicore programming is the impact of using large-scale platforms for real-time scheduling [5]. Research in this area has only just started, with a comparison of partitioning, global scheduling and system overheads [11]. Calandrino et al. [11] developed a hybrid approach for scheduling real-time tasks on a large-scale multicore platform using hierarchical shared caches. This method allowed them to partition a multicore platform into clusters. These were statically

2. Related Work
This section surveys the current methods in the area of multicore.

2.1. Multicore
Multicore systems have two or more processing cores integrated into a single chip [1][2][3][8][9]. Figures 1 and 2 show the difference between a single core and a multicore system. In such a design, each processing core has its own private cache (L1), and the cores share a common cache (L2). The bandwidth of the shared cache and main memory is shared between all the processing cores. Industry is now pushing towards multicore systems to handle large amounts of soft real-time transactions. For example, a company called Azul built a system that contained 48-core chips [7]. This system was used for developing a variety of large-scale multicore platforms in order to handle large-scale transactions. Multicore methodology is used to initiate multiple tasks simultaneously (multitasking) while using the same common resource (e.g. one core processor). For example, the operating system can switch between applications more quickly while only using one core processor. Multicore and multiprocessor systems have one main difference between them. A multicore system has a single physical processor that contains two or more cores, whereas a multiprocessor system includes two or more physical processors. Another difference between the two is that multicore and multiprocessor systems

[Figures 1 and 3 are diagrams whose nodes are labelled Attacker, Authorized User and Farmer.]

Figure 1. Single Core Defender/Victim

Figure 3. System Architecture of Farmer


router (this provides better performance by separating the security data from the application data); and to monitor each other's performance (so that if a successful attack brings down a bodyguard, the next-hop router is prepared to handle the security). For each individual router in the network, the bodyguard framework incorporates multicore methodology. By multicore methodology, we mean separating the security processes and placing them on one or more cores (Figure 4). Using the bodyguard framework, we developed a bodyguard called Farmer. Farmer includes the two parts of the bodyguard framework, the side bodyguard (SB) and the front bodyguard (FB). The side bodyguard is the main component of the framework, and has the following objectives:
1. To protect the system, while allowing applications their full performance potential.
2. If an attack is discovered, to initiate the front bodyguard sub-system. This will affect the performance ratio of the application under attack, but will not affect the other applications on the host. The effect on the affected application's performance will be kept to a minimum while the security issue is resolved.
3. To ensure that all security processes generated by the side and front bodyguard sub-systems are handled by the Security Cores.
The front bodyguard’s objective is to remain in a constant state of hibernation until the side bodyguard initiates its start-up.

Figure 4. Bodyguard Architecture on each router

assigned tasks, and scheduled using a pre-emptive global EDF scheduling algorithm. So, although their results mimic a multicore system, they are more in line with multiprocessing results than multicore ones.

2.2. Distributed Denial of Service
One of the most deadly threats to an information infrastructure is a Distributed Denial of Service (DDoS) attack. Rogers [17] from the Computer Emergency Response Team (CERT) characterizes a DDoS attack as an explicit attempt by an attacker to prevent legitimate users from accessing their resources. According to the Prolexic Zombie Report 2007, over 4000 DDoS attacks happen daily [18]. Some of the most recent DDoS incursions brought down the CGold Chat Forum website [19] and SE-NSE Forums [20]. Attackers use DDoS for various reasons. Some use the attacks for competitive advantage [21], extortion of online businesses [22] or employee vilification [23]. Another goal of the attackers in a DDoS attack is to hide their identity. This is done by mimicking legitimate Web Service traffic in order to create one large group of agents to launch an attack [23][24]. The approaches used to defend against DDoS attacks include congestion control [25][26], replication [13][14], filtering [15][27] and traceback [28][29].

3.2. Side Bodyguard
The focus of this paper is not on our non-linear dynamic neural network, so we only cover it briefly here. The SB’s main component is a non-linear dynamic neural network. The reason for training a neural network to learn ‘chaos’ is that traffic has a deterministic characteristic. Therefore, any abnormal traffic that perturbs the initial conditions can be used to detect a DDoS attack. Once we have trained our neural network, we implement it in the SB.

3. Farmer System Design
3.1. Body Guard Framework
The bodyguard framework is distributed on each router in the network in order to provide overall protection (Figure 3). Each bodyguard is both a source-end protector (providing security before traffic leaves the router) and a destination-end protector (providing security as the traffic enters the network). Another feature shown in Figure 3 is that the bodyguards are connected to each other. There are three main reasons for this: to allow bodyguards to send updated security information to each other (new attacks that each has encountered, for example); to send security information down to the next hop for checking application data as it comes into the

3.3. Front Bodyguard
The FB, in Figure Four, is shown with transparent lines. The reason for this is that the FB remains in

System    T1    T2    T3    T4    T5    Total
S-Core    191   190   185   183   180
M-Core    199   195   201   170   177
Speedup   -8    -5    -16   +13   +3    -13

Table 1. LMBenchmark Results


Figure 5. CPU usage performances of single core and multicore
[Plot: CPU usage (%) against Test (1 to 5); series: Single Core, Multicore, Multicore (Serial)]

Figure 6. Resource Sharing delay time for shared memory (latency)
[Plot: latency against Test (1 to 5); series: Multicore (Serial), Multicore]

Figure 7. Resource Sharing delay time for assigned memory to core (latency)
[Plot: latency against Test (1 to 5); series: Multicore (Parallel), Multicore (Serial)]

Figure 8. CPU efficiency of the Security and application that need security
[Plot: CPU % against Time (1 to 60); series: Core0, Core1, Core2, Core3]

, such as memory bandwidth and system calls. We also developed our own measurements by watching CPU core processes on both systems, and compared CPU core efficiency and CPU core speed. We applied these measurements only to our multicore system, in order to see the difference between serialized security programs and multicore security programs. To measure the Linux kernel performance, we used LMbenchmark and our own multicore CPU evaluations. For the latter, we wrote three simple Fibonacci functions to represent the applications that need security. We assigned them to three cores within our multicore system by using affinity methods. The security function is simulated by assigning a Chaos Neural Network (CNN) to run in training mode on Core 0. To differentiate between O/S-assigned cores and ours, we use two terms, Serial and Multicore. Security and Fibonacci applications whose cores are assigned by the O/S are Serial. Multicore, on the other hand, means that we assigned the core(s) ourselves through the use of the affinity method.
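The affinity-based assignment described here can be illustrated with a minimal sketch. It is not the authors' code: the names `fib` and `run_on_core` are hypothetical, the Fibonacci routine is a stand-in for their test programs, and the pinning relies on `os.sched_setaffinity`, which Python exposes on Linux only. If the requested core is unavailable, the sketch simply runs unpinned, which corresponds to the "Serial" (O/S-assigned) case.

```python
import os
import time


def fib(n):
    """Iterative Fibonacci; a stand-in for an application that needs security."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def run_on_core(core, func, *args):
    """Run func(*args) with this process pinned to one core, and time it.

    Returns (result, elapsed_ms, pinned). Pinning is skipped when the
    platform has no sched_setaffinity or the core is not available.
    """
    can_pin = hasattr(os, "sched_setaffinity")  # Linux only
    original = os.sched_getaffinity(0) if can_pin else None
    pinned = False
    if can_pin and core in original:
        os.sched_setaffinity(0, {core})  # restrict this process to one core
        pinned = True
    try:
        start = time.perf_counter()  # the "stopwatch" around the workload
        result = func(*args)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
    finally:
        if pinned:
            os.sched_setaffinity(0, original)  # restore the original mask
    return result, elapsed_ms, pinned


if __name__ == "__main__":
    # Cores 1 to 3 for the applications, mirroring the setup in the text
    # where Core 0 is reserved for the security process.
    for core in (1, 2, 3):
        value, ms, pinned = run_on_core(core, fib, 20000)
        print(f"core {core}: pinned={pinned}, {ms:.3f} ms")
```

Running the same workload once through `run_on_core` and once directly gives the two timings that a Serial versus Multicore comparison needs.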

hibernation until the SB system has detected an attack. If an attack is detected, the front bodyguard is initiated and filtering of the attack begins. Placing the FB into hibernation mode gives other applications the ability to speed up their performance.
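The interplay between the two bodyguards can be sketched as a small state machine. This is an illustration only: the class names, the `handle` method and the string-prefix detector are hypothetical, and the detector stands in for the neural network, which the paper does not specify in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FrontBodyguard:
    """Filters traffic in front of the application, but only once woken."""
    awake: bool = False  # starts in hibernation
    dropped: List[str] = field(default_factory=list)

    def filter(self, packet: str, is_attack: Callable[[str], bool]) -> bool:
        """Return True if the packet may reach the application."""
        if is_attack(packet):
            self.dropped.append(packet)
            return False
        return True


@dataclass
class SideBodyguard:
    """Watches traffic alongside the application and wakes the FB on attack."""
    is_attack: Callable[[str], bool]  # hypothetical stand-in for the detector
    front: FrontBodyguard

    def handle(self, packet: str) -> bool:
        """Return True if the packet is delivered to the application."""
        if not self.front.awake:
            if not self.is_attack(packet):
                return True  # FB hibernating: deliver with no filtering cost
            self.front.awake = True  # attack detected: initiate the FB
        return self.front.filter(packet, self.is_attack)


if __name__ == "__main__":
    front = FrontBodyguard()
    side = SideBodyguard(is_attack=lambda p: p.startswith("flood"), front=front)
    for packet in ["GET /index", "flood-1", "GET /mail"]:
        print(packet, "->", "delivered" if side.handle(packet) else "dropped")
```

While the FB hibernates, legitimate packets bypass it entirely, which is the property the text credits for the performance gain.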

4. Performance Evaluation
In order to test and evaluate our system, we split our simulations into two areas. The first area evaluated the single core and multicore systems. The second area covered resource and speedup performance.

4.1 Multicore Performance Analysis
To assess the performance of our multicore system, we compared kernel benchmark results from two systems. The first system (Single Core) was a Dell Dimension DM501 with an Intel Pentium single-core CPU at 3.0 GHz, 2 GB of RAM and two 300 GB SATA hard drives. The second system (Multicore) had an Intel Core 2 Quad Q6600 2.4 GHz quad-core processor, 2 GB of RAM and two 300 GB SATA hard drives. The kernel under measurement was 2.6.22.14.72 fc6. To gather our data, we monitored the CPU usage by running the top command and pressing 1 to show per-core usage. We set CPU 0 as the core to handle the security processes. Further, we conducted experiments on the multicore system alone, in order to compare serial security applications with our multicore security application. For measurements, we used the LMbenchmark [12] to run various APIs

4.1. Simulation Setup
4.1.1 LMbenchmark
LMbenchmark is a micro-benchmark suite of latency and bandwidth measurements. The kernel components that LMbenchmark exercises are process management, memory mapping and the scheduler. These low-level kernel primitives provide a good indicator of the underlying hardware capabilities and performance. To study the effects of the multicore system, we conducted five


rounds of testing, each focusing on the latency measurements on the single core and multicore with serialization.

System    T1    T2    T3    T4    T5    Total
S-Core    150   153   150   151   151   150
M-Core    130   133   129   133   132   130
Speedup   20    20    21    18    19    20

Table 2. Speedup Comparison between Multicore Serial and Multicore

4.1.2 MCBM (Multicore benchmark)

The tests that we conducted on our multicore system were as follows:
1. We conducted performance tests on the multicore system by running the serialization programs five times, as well as the affinity programs, and collecting CPU usage data. The reason for five test runs was to give us a clear picture of CPU usage and to see the differences between serialisation and multicore.
2. From the above tests, we conducted performance efficiency tests on the CPU usage. The greater the CPU usage, the better the performance.
3. To simulate the sharing of resources, we allowed serial and multicore programs to access the resources that they needed. We recorded the delay time (latency) in order to see how long the programs took to obtain those resources.
4. Another performance test was conducted to see the CPU usage among the security and non-security programs.
5. The last set of tests recorded the performance speed of serial and parallel programs on our multicore system.
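Per-core CPU usage of the kind collected in tests 1, 2 and 4 can also be gathered programmatically. The sketch below is an assumption about how one might do it, not the procedure used in the paper (which relied on top): it parses the per-core `cpuN` lines of the Linux `/proc/stat` file and derives a busy percentage from two samples. The function names are hypothetical.

```python
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        ticks = [int(x) for x in parts[1:]]
        # Fields: user nice system idle iowait irq softirq steal ...
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks[:8])  # guest time is already included in user
        cores[parts[0]] = (total - idle, total)
    return cores


def usage_percent(before, after):
    """Busy percentage per core between two parse_proc_stat() samples."""
    usage = {}
    for name, (busy_after, total_after) in after.items():
        if name not in before:
            continue
        busy = busy_after - before[name][0]
        total = total_after - before[name][1]
        usage[name] = 100.0 * busy / total if total else 0.0
    return usage


if __name__ == "__main__":
    with open("/proc/stat") as f:  # Linux only
        first = parse_proc_stat(f.read())
    time.sleep(1.0)
    with open("/proc/stat") as f:
        second = parse_proc_stat(f.read())
    for core, pct in sorted(usage_percent(first, second).items()):
        print(f"{core}: {pct:.1f}%")
```

Sampling once per interval and logging the result per core would produce series comparable in shape to those plotted in Figure 8.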

cumulative performance will be of great benefit in the future. We tested resource sharing among serial and parallel programs. Serial programs did a little better than parallel ones according to Figure 6. This type of resource sharing is not recommended, since it runs into the false sharing problem. It also leads to race conditions and bottleneck problems, as seen in test 4, which we believe was the result of a bottleneck. Assigning resources on a separate core was our next test. We found that multicore did much better than multicore (Serial). Comparing Figure 7 with Figure 6 shows that latency delays were a little longer on the separate cores. One reason for this could be that the assignment of resources was not sufficient. Another reason could be that memory delay was involved. Figure 8 displays the CPU performance of security and non-security programs over the course of an hour. The results show that the security application assigned to core 0 averaged around 90% efficiency, while the non-security applications (Fibonacci programs) were around 30%. The low CPU efficiency of the non-security applications is due to the programs assigned. We would predict that if programs like an email client or browser were used, the CPU usage would be a lot higher. Lastly, the results in Table 2 compare multicore (serial) and multicore. By assigning programs to specific cores, we got an average speedup of about 20 ms (10%). These results were generated by placing stopwatches in our code and measuring how quickly the programs executed. We believe that this speedup is fairly good, and that if it is maintained over long periods the overhead costs would be greatly reduced.
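As a worked check of the Table 2 figures, the short sketch below recomputes the per-test time saved and expresses the mean saving as a fraction of the serial time. It assumes that the S-Core row holds the Multicore (Serial) timings, that the M-Core row holds the affinity-assigned timings, and that both are in milliseconds; the function name is hypothetical.

```python
def table2_speedup():
    """Recompute the Table 2 speedups from the five test runs."""
    serial_ms = [150, 153, 150, 151, 151]  # S-Core row, T1 to T5
    pinned_ms = [130, 133, 129, 133, 132]  # M-Core row, T1 to T5
    saved = [s - p for s, p in zip(serial_ms, pinned_ms)]
    mean_saved = sum(saved) / len(saved)
    mean_serial = sum(serial_ms) / len(serial_ms)
    percent = 100.0 * mean_saved / mean_serial
    return saved, mean_saved, percent


if __name__ == "__main__":
    saved, mean_saved, percent = table2_speedup()
    print(saved)                        # per-test saving in ms
    print(f"{mean_saved:.1f} ms mean")  # mean saving across the five tests
    print(f"{percent:.1f}% of serial")  # saving relative to serial time
```

The per-test savings match the Speedup row, the mean is 19.6 ms, and relative to the mean serial time of 151 ms this is roughly 13%, which sits inside the ten to fifteen percent range quoted in the abstract.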

4.2. Evaluation
4.2.1 LMBenchmark
The results from LMbenchmark are displayed in Table 1. As we can see, there was an overall degradation of -13. This means that the single core system is far more efficient than the multicore system when we split the processors up into two nearly identical copies. These results were not unexpected, since single core machines do not have to share memory as multicore machines do [13]. We predict that the results will shift towards the multicore system in the future.
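The overall figure of -13 can be reproduced from the rows of Table 1. The sketch below is only an arithmetic check under two assumptions: that the values are latencies (so lower is better), and that the Speedup row is the single-core value minus the multicore value. The table gives no units, and the function name is hypothetical.

```python
def table1_summary():
    """Recompute the Speedup row and overall total of Table 1."""
    single_core = [191, 190, 185, 183, 180]  # S-Core row, T1 to T5
    multicore = [199, 195, 201, 170, 177]    # M-Core row, T1 to T5
    diffs = [s - m for s, m in zip(single_core, multicore)]
    total = sum(diffs)
    relative = 100.0 * total / sum(single_core)  # as % of single-core total
    return diffs, total, relative


if __name__ == "__main__":
    diffs, total, relative = table1_summary()
    print(diffs, total, f"{relative:.1f}%")
```

The differences come out as -8, -5, -16, +13 and +3, summing to -13, which is about 1.4% of the single-core total of 929. Tests 4 and 5 already favour the multicore system.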

5. Conclusion and Future Work

4.2.2 Multicore Bench

In this paper, we introduced a multicore defence framework called bodyguard. From this framework, we designed and built a bodyguard system called Farmer. Farmer is built upon a side bodyguard (containing a non-linear dynamic neural network) and a forward bodyguard (used to filter the attack traffic detected by our neural network). The goal of such a security system is to use the new multicore machines that are coming out, in order to deal with the performance issues that security introduces when

Figure 5 shows the results from our observation of CPU usage, with an impressive result for the multicore system both with and without serialization. The results were generated from cumulative usage across all the programs, in order to see how efficient our system could be. The conclusion we can draw from Figure 5 is that multicore and multicore serialisation differ only slightly. But in the long term, processing this


service’, Proc. of the 3rd Information Survivability Workshop (ISW2000).
[14] Sangpachatanaruk, C., Khattab, S.M., Znati, T., Melhem, R. and Mosse, D., (2003), ‘A simulation study of the proactive server roaming for mitigating denial of service attacks’, Proc. of the 36th Annual Simulation Symposium 2003, pp. 7.
[15] Park, K. and Lee, H., (2001), ‘On the effectiveness of route-based packet filtering for distributed DoS attack prevention in power-law Internet’, ACM SIGCOMM 2001, pp. 15-21.
[16] Zhou, Wanlei, (2008), ‘Using Multi-core To Support Security-Sensitive Applications’, Presentation in NPC2007 Panel: Multi-core - how can parallel computing become mainstream?, September 18-21, 2007, Dalian, China.
[17] Rogers, L., (2008), ‘What is a Distributed Denial of Service (DDoS) Attack and What Can I Do About It?’, Computer Emergency Response Team, http://www.cert.org/homeusers/ddos.html
[18] Prolexic Technologies, (2007), ‘Prolexic Technology Report’, http://www.prolexic.com/zr/zombie_july_2007.pdf
[19] Poulsen, K., (2004), ‘FBI Busts Alleged DDoS Mafia’, 2004. http://www.securityfocus.com/news/9411
[20] Pappalardo, D. and Messmer, E., (2005), ‘Extortion via DDoS on the rise’, NetworkWorld, May 2005. http://www.networkworld.com/news/2005/051605-ddos-extortion.html
[21] Bhaskaran, M., Natarajan, A.M. and Sivanandam, S.N., (2007), ‘Tracebacking the Spoofed IP Packets in Multi ISP Domains with Secured Communication’, IEEE - ICSCN 2007, MIT Campus, Anna University, Chennai, India, Feb. 22-24, 2007, pp. 579-584.
[22] Digital Money, (2008), ‘C-Gold Chat Forum Crash’, http://www.digitalmoneyworld.com/, 11 January, 2008.
[23] SE-NSE Forums, (2008), http://forums.sense.net/index.php, 10 January, 2008.
[24] Trostle, J., (2006), ‘Protecting Against Distributed Denial of Service (DDoS) Attacks Using Distributed Filtering’, Securecomm and Workshops, 2006, Aug. 28 2006 - Sept. 1 2006, pp. 1–11.
[25] Floyd, S. and Jacobson, V., (1993), ‘Random early detection gateways for congestion avoidance’, IEEE/ACM Transactions on Networking, Vol. 1, No. 4, pp. 397-413.
[26] Gevros, P., Crowcroft, J., Kirsten, P. and Bhatti, S., (2001), ‘Congestion control mechanism and the best efforts service model’, IEEE Network, Vol. 15, No. 3, pp. 16-26.
[27] Xiang, Y. and Zhou, W., (2006), ‘Protecting information infrastructure from DDoS attacks by MADF’, Int. J. High Performance Computing and Networking, Vol. 4, Nos. 3/4, 2006.
[28] Xiang, Y., Zhou, W. and Rough, J., (2004), ‘Tracing IP packets by Flexible Deterministic Packet Marking (FDPM)’, IEEE International Workshop on IP Operations & Management.
[29] Aljifri, M., (2003), ‘IP Traceback: A New Denial-of-Service Deterrent?’, IEEE Computer Society, 1540-7993/03, 2003.

developers use it. Our experimental results show a speedup of 10%, with an average of 90% CPU efficiency for security programs. In the future, we will move Farmer onto the enterprise grid system at Deakin University in order to fine-tune the bodyguard system.

6. Acknowledgements This research was supported by the ARC Linkage grant (Project number LP0562156).

References
[1] Multi-Core from Intel – Products and Platforms. http://www.intel.com/multi-core/products.htm, 2006.
[2] AMD Multi-Core Products. http://multicore.amd.com/en/Products/, 2006.
[3] Kongetira, P., Aingaran, K. and Olukotun, K., (2005), ‘Niagara: A 32-way multithreaded Sparc processor’, IEEE Micro, 25(2):21–29, 2005.
[4] Gorder, P.M., (2007), ‘Multicore processors for science and engineering’, IEEE CS and the AIP, 1521-9615/07, March/April 2007.
[5] Calandrino, J.M., Anderson, J.H. and Baumberger, D.P., (2007), ‘A Hybrid Real-Time Scheduling Approach for Large-Scale Multicore Platforms’, 19th Euromicro Conference on Real-Time Systems (ECRTS'07), IEEE, 2007.
[6] Bader, D.A., Kanade, V. and Madduri, K., (2007), ‘SWARM: A Parallel Programming Framework for Multicore Processors’, Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, 26-30 March 2007, pp. 1–8.
[7] Gevros, P., Crowcroft, J., Kirsten, P. and Bhatti, S., (2001), ‘Congestion control mechanism and the best efforts service model’, IEEE Network, Vol. 15, No. 3, pp. 16-26.
[8] Barroso, L., Gharachorloo, K., McNamara, R., Nowatzyk, A., Qadeer, S., Sano, B., Smith, S., Stets, R. and Verghese, B., (2000), ‘Piranha: A scalable architecture based on single-chip multiprocessing’, SIGARCH Comput. Archit. News, 28(2):282–293, 2000.
[9] Kahle, J., Day, M., Hofstee, H., Johns, C., Maeurer, T. and Shippy, D., (2005), ‘Introduction to the Cell multiprocessor’, IBM J. Res. Dev., 49(4/5):589–604, 2005.
[10] Bader, D.A. and JáJá, J., (1999), ‘SIMPLE: A methodology for programming high performance algorithms on clusters of symmetric multiprocessors (SMPs)’, Journal of Parallel and Distributed Computing, 58(1):92–108, 1999.
[11] Calandrino, J., Leontyev, H., Block, A., Devi, U. and Anderson, J., (2006), ‘LITMUSRT: A testbed for empirically comparing real-time multiprocessor schedulers’, Proceedings of the 27th IEEE Real-Time Systems Symposium, 2006.
[12] McVoy, L. and Staelin, C., (1996), ‘lmbench: Portable Tools for Performance Analysis’, Proceedings of the USENIX 1996 Annual Technical Conference, San Diego, California, January 1996.
[13] Yan, J., Early, S. and Anderson, R., (2000), ‘The XenoService a distributed defeat for distributed denial of
