
International Journal of Cyber-Security and Digital Forensics (IJCSDF) Published by The Society of Digital Information and Wireless Communications Miramar Tower, 132 Nathan Road, Tsim Sha Tsui, Kowloon, Hong Kong

Volume 3, Issue No. 1 - 2014

Email: [email protected]

Journal Website: http://www.sdiwc.net/security-journal/ Publisher Paper URL: http://sdiwc.net/digital-library/browse/66

ISSN: 2305-0012

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(1): 30-37
The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Detecting Hidden Encrypted Volume Files via Statistical Analysis

Mario Piccinelli and Paolo Gubian
University of Brescia, Via Branze 38, 25123 Brescia (Italy)
email: [email protected]

Abstract: Nowadays various software tools have been developed for the purpose of creating encrypted volume files. Many of those tools are open source and freely available on the internet. Because of that, the probability of finding encrypted files which could contain forensically useful information has dramatically increased. While decoding these files without the key is still a major challenge, the simple fact of being able to recognize their existence is now a top priority for every digital forensics investigation. In this paper we will present a statistical approach to find elements of a seized filesystem which have a reasonable chance of containing encrypted data.

Keywords: Forensics, Anti-anti-forensics, Encryption, Detection.

I. INTRODUCTION

DATA ENCRYPTION is here understood as the process of transforming information using an algorithm (called a cypher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key. In computer forensics the encrypted data is usually a part of a filesystem (one or more files) cyphered into a single binary blob, which in turn can be saved as a single file in the main filesystem of the machine it is hosted on, or as a whole partition (or a part of it). When a single encrypted file hosts a filesystem it is called a volume file, because it mimics a logical volume of a disk. The data produced by modern encryption software like TrueCrypt or PGP Virtual Disk is usually indistinguishable from uniform random data, and has no recognizable header. This means that it is impossible to link an encrypted volume file to the methodology used to encrypt it, nor is it possible to even prove that it is in fact an encrypted file. This goes under the principle of "Plausible Deniability", by which it is not possible to prove in a court the existence of hidden data. Nonetheless, experimental evidence shows that there is a very low probability for a normal file to have a random distribution of data; under this assumption it can be said that proving a file to be random means proving it has a high chance of being an encrypted volume. A major problem in recognizing encrypted data by its randomness is that random data can be produced by other means, the most important of these being data wiping algorithms. Data wiping tools in fact destroy data by overwriting the area of interest with a random sequence [1], which is, as stated before, indistinguishable from encrypted data. This is still an open challenge in the analysis of entire disks, entire volumes of a disk, or apparently unused disk space [2]; in these cases it is impossible to prove that these areas contain encrypted data (according to the principle of "Plausible Deniability" described before). For encrypted files, instead, it is impossible to hide their existence as single entities on the disk, and this can lead to the conclusion that they could contain useful data.

II. NIST STATISTICAL TEST SUITE

The NIST Statistical Test Suite (described as "a statistical test suite for random and pseudorandom number generators for cryptographic applications") is a software tool written in ANSI C, developed by the National Institute of Standards and Technology (U.S. Department of Commerce). It includes 10 pseudo-random number generation algorithms and 15 algorithms for testing the randomness of a given data stream. It has been made available in the public domain under an open source license, and can be downloaded from the NIST website with exhaustive documentation. For the experiments mentioned in this paper we used the latest release available at the time of writing, version 2.1.1 dated April 2010. For the purposes of this research only a subset of the testing algorithms is used. The subset is chosen according to the size of the file to analyze, because each test has a recommended minimum length n in bits for each run (each sequence is split into a chosen number of runs to be analyzed individually, and each run is n bits long).

A. How the suite is used

Once the package has been compiled, a single executable named assess is created. It accepts one integer parameter, the bit stream length n:

$ ./assess 32000

Once launched, the software asks the user to choose the sequence generator among all the available pseudo-random number generators. For our tests we select the option:

[0] Input File

Then, the software asks for the name of the data file to analyze:

User Prescribed Input File: _


In the next screen the user is asked which statistical tests to run on the selected file. The user can choose between running every test (option 1) or being brought to a further screen in which a specific subset can be selected (option 0). Whether the user selects all the tests or just a subset, another menu is shown to present the default test-specific parameters and let the user modify them. The parameters depend mainly on the size n of the streams, and are well described in the NIST manual; they are not described here because they are outside the scope of this paper. To continue, the user chooses option 0. Finally the user is asked to enter the number of runs of n bits to extract from the selected data source, and then how the input file is built:

Input File Format:
[0] ASCII - A sequence of ASCII 0's and 1's
[1] Binary - Each byte in data file contains 8 bits of data

For our tests the second option is chosen. Then the test begins.

B. How the suite was modified

To validate the detection algorithm described in this paper, many test runs had to be performed. To make the process faster, the NIST suite was modified to perform a test without user interaction, by passing all the needed parameters from the command line. The needed parameters are the size in bits of a run (which is already provided on the command line), the number of runs to perform and the input file. With the modified NIST suite an entire test is performed by calling:

./assess 32000 250 testfile.ext

The line above tests the file testfile.ext with 250 runs of 32000 bits each.

C. How the tests are performed

This section has been extracted from the NIST Statistical Test Suite manual [3]. Each test is formulated to test a specific null hypothesis (H0), which states that the analyzed sequence is random. Associated with the null hypothesis is the alternative hypothesis (Ha), which is that the sequence is not random. For each test a decision is derived that accepts or rejects the null hypothesis. For each test, a relevant randomness statistic must be chosen and used to determine the acceptance or rejection of the null hypothesis. Under an assumption of randomness, such a statistic has a distribution of possible values. A theoretical reference distribution of this statistic under the null hypothesis is determined by mathematical methods. From this reference distribution, a critical value is determined (typically, this value is "far out" in the tails of the distribution, say out at the 99% point). During a test, a test statistic value is computed on the data (the sequence being tested). This test statistic value is compared to the critical

value. If the test statistic value exceeds the critical value, the null hypothesis for randomness is rejected. Otherwise, the null hypothesis (the randomness hypothesis) is not rejected (i.e., the hypothesis is accepted). Each test is based on a calculated test statistic value, which is a function of the data. The test statistic is used to calculate a P-value that summarizes the strength of the evidence against the null hypothesis. For these tests, each P-value is the probability that a perfect random number generator would have produced a sequence less random than the sequence that was tested, given the kind of non-randomness assessed by the test. If a P-value for a test is determined to be equal to 1, then the sequence appears to have perfect randomness. A P-value of zero indicates that the sequence appears to be completely non-random. A significance level (α) can be chosen for the tests. If P-value ≥ α, then the null hypothesis is accepted; i.e., the sequence appears to be random. If P-value < α, then the null hypothesis is rejected; i.e., the sequence appears to be non-random. The parameter α denotes the probability of a Type I error (i.e. the probability of rejecting a random sequence), and its default value (which will be used during the following tests) is 0.01, which means that one would expect one sequence in 100 to be rejected by the test even if the sequence is random.

D. How the tests are interpreted

The output data from the tests is made up of ASCII text files saved in the directory experiments/AlgorithmTesting/. This directory contains several subdirectories (one for each test) and two general files. Each test-specific subdirectory contains two test-specific files.

Test-specific files:
- results.txt contains the p-values of the single runs.
- stats.txt contains test-specific computational information for each run.

General files:
- freq.txt contains the count of 0s and 1s in each run.
- finalAnalysisReport.txt is the main result file.

For further analysis the main result file finalAnalysisReport.txt will be used. The file has the structure shown in Listing 1. This file contains a row for each test performed, and shows the results as follows:

- Columns C1-C10 show the distribution of the p-values. The p-value range (0-1) is split into 10 subranges, and the software counts the number of runs with a p-value included in each (i.e., column C1 contains the number of runs with a p-value between 0.0 and 0.1).
- Column P-value contains the P-value that arises from the application of a chi-square test, used to assess the uniformity of the P-values for each test performed.
- Column Proportion shows the proportion of single runs which passed the test.



Listing 1 Example of NIST final analysis report

At the end of the output file the minimum pass rate for each statistical test is shown, determined using the confidence interval defined as:

p̂ ± 3 * sqrt( p̂ (1 − p̂) / n )

where p̂ = 1 − α and n is the sample size. For example, if α = 0.01 and n = 1000, the confidence interval is:

0.99 ± 3 * sqrt( 0.99 * 0.01 / 1000 ) = 0.99 ± 0.0094392

which means the proportion should lie above 0.98056. When a test fails this condition (or the uniformity condition) a star (*) symbol is inserted next to the failing value.
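To make the pass-rate criterion concrete, the following Python fragment (not part of the original paper) computes the minimum pass rate for a given α and number of runs, and scans a finalAnalysisReport.txt for rows whose proportion falls below it. The row layout assumed by the regular expression (count columns, a P-value, a proportion such as 248/250, then the test name) and the 0.0001 uniformity cut-off are assumptions of this sketch, not guarantees about the report format.

import math
import re

def minimum_pass_rate(alpha=0.01, n=250):
    # NIST minimum pass proportion: p_hat - 3*sqrt(p_hat*(1 - p_hat)/n), with p_hat = 1 - alpha
    p_hat = 1.0 - alpha
    return p_hat - 3.0 * math.sqrt(p_hat * (1.0 - p_hat) / n)

def parse_final_report(path):
    # Yield (test_name, p_value, passed_runs, total_runs) for each result row.
    # Rows that do not match the assumed layout ("... P-VALUE  passed/total  TestName") are skipped.
    row = re.compile(r"([\d.]+)\s+(\d+)/(\d+)\s+(\S+)\s*$")
    with open(path) as fh:
        for line in fh:
            m = row.search(line)
            if m:
                yield m.group(4), float(m.group(1)), int(m.group(2)), int(m.group(3))

if __name__ == "__main__":
    threshold = minimum_pass_rate(alpha=0.01, n=250)
    for name, p_value, passed, total in parse_final_report("finalAnalysisReport.txt"):
        ok = (passed / total) >= threshold and p_value >= 0.0001   # assumed uniformity cut-off
        print(f"{name:30s} proportion={passed}/{total} p={p_value:.6f} {'PASS' if ok else 'FAIL'}")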

III. DETECTING ENCRYPTED FILES

As stated before, there is no simple means of detecting files encrypted using modern cyphering tools such as TrueCrypt. The files have no useful header section and no telltale extension. Moreover, the extension of an encrypted file can easily be modified to make the file appear like a normal system file. It is not uncommon for TrueCrypt files to have their name changed to something system-like, such as system.dll (on Windows systems), and to be placed in unusual positions on the filesystem (e.g. in the windows/system directory among many other .dll files). In such cases it is impossible to detect these files at first glance or with a superficial analysis of the disk, and deeper methods must be employed.

A. Detecting suspicious files

A statistical analysis of each file on a computer could take a huge amount of time because of the high number of files in a normal system. For a first analysis it is therefore useful to detect suspicious files, i.e. files which appear to be something different from what they are supposed to be. The next paragraphs outline some simple methods for the identification of the most interesting target files for a first analysis.

File size: If a file is used to hide an encrypted volume it has to be big enough to contain the data. Under this assumption, it is unlikely to find a small file used to hide an encrypted volume. The first files to be checked on an acquired filesystem should be the largest ones.

File extension: In normal conditions the extension of a file identifies the file type and thus the software which should be able to handle it. For example, a .jpg file is supposed to be an image, and so it should be readable by any software able to handle that kind of image. A big file with a known extension which cannot be opened by the corresponding software is suspicious. This means the first thing to do on a copy of a seized filesystem is to try to open all the files with known extensions, to assess whether they are what they look like or not.

File type via header: In normal conditions files are identified by their header, the first part of a file which contains information about the file itself. Known file types are identified by their header data, which should match their extension (a .jpg file should present the .jpg header data) [4] [5]. A mismatch between file header and file extension (or a known extension on a headerless file) is suspicious.
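The following Python fragment is a minimal sketch of the size, extension and header checks just described; it is not the authors' tool. The magic-number table, the 5 MB size cut-off and the function name are illustrative assumptions.

import os

# A few well-known magic numbers; a deliberately short, illustrative list.
KNOWN_HEADERS = {
    ".jpg": b"\xff\xd8\xff",
    ".png": b"\x89PNG",
    ".pdf": b"%PDF",
    ".gz":  b"\x1f\x8b",
    ".zip": b"PK\x03\x04",
}

MIN_SIZE = 5 * 1024 * 1024  # assumed cut-off: ignore files smaller than 5 MB

def is_suspicious(path):
    # Flag large files whose header does not match their extension,
    # or whose header matches no known type at all.
    if os.path.getsize(path) < MIN_SIZE:
        return False
    ext = os.path.splitext(path)[1].lower()
    with open(path, "rb") as fh:
        head = fh.read(8)
    expected = KNOWN_HEADERS.get(ext)
    if expected is None:
        return not any(head.startswith(magic) for magic in KNOWN_HEADERS.values())
    return not head.startswith(expected)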


It should be noted that this methodology could be defeated using software tools able to change the header information of a file, or to add a known header to a headerless file, such as the Transmogrify tool [6].

File content via known hash: Known operating system and program installation files can be deemed unimportant (and thus not in need of further analysis) if it can be proved that their content does not differ from what is expected in a non-manipulated case. To verify this, it is possible to check them (for example by their MD5 hash) against the same files in a reference system. Another way is to check their MD5 hashes against a database of known hashes, such as the one provided by the HashKeeper tool [7], provided free of charge by the National Drug Intelligence Center, a component of the Department of Justice of the United States.

B. Analysing suspicious files

Once a file has been flagged for further analysis (or, at least, for each file which is not clearly recognized as forensically useless), the statistical analysis with the NIST Statistical Test Suite can be performed to assess whether the file data is random, and thus whether it can be recognized with a certain probability as an encrypted volume file. The tests to be performed on the data must be chosen according to the size of the data itself, following the minimum n values recommended for each test. Some tests require a large value of n, such as the Random Excursions tests (see appendices A-N and A-O), which require at least one million bits. Testing sequences with a high value of n can require a large amount of time, so it is preferable to select lower values and choose the test list accordingly. Each file is split into 1 MB blocks, and for each block the NIST test is performed with 250 runs of 4 KB each (n = 32,000 bits). These values have been chosen after extensive testing as a good trade-off between run size and test duration. Once a block has been tested, a report like the one in Listing 1 is produced for further interpretation. For the results shown in the following sections all tests have been considered except for the Binary Matrix Rank Test, the Random Excursions Test and the Random Excursions Variant Test, which require a value of n greater than 32000 to be reliable.

C. Interpretation of analysis results

After the analysis is completed by the NIST tool, the results in the output file have to be interpreted to discriminate whether the sequence can be considered random or not. To decide whether the file is random we used the following algorithm:

For each block (1 MB):
  - For each kind of test performed on the selected block:
    - if the test is run only once: the test kind is passed if both the p-value and the proportion are deemed passed by the NIST suite;
    - if the test is run many times: the test kind is passed if at least 90% of its runs are deemed passed by the NIST suite.
  - The block is deemed passed if at least 70% of the test kinds performed are passed.
The whole file is deemed passed (and thus random) if at least 70% of the blocks are passed. (A code sketch of this decision rule is given below.)
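A minimal Python sketch of this decision rule follows. It assumes the individual run results have already been parsed from the NIST reports into (test kind, pass/fail) pairs, and it hard-codes the 90%/70%/70% thresholds stated above; it is a simplified reading of the rule, not the authors' script.

from collections import defaultdict

def block_passed(test_results, per_kind=0.90, per_block=0.70):
    # test_results: list of (test_kind, passed_bool), one entry per individual run result.
    # A kind run once passes only if its single result passed; a kind run many times
    # passes if at least per_kind of its occurrences passed. The block passes if at
    # least per_block of the test kinds pass.
    per_test = defaultdict(lambda: [0, 0])          # kind -> [passed, total]
    for kind, ok in test_results:
        per_test[kind][1] += 1
        if ok:
            per_test[kind][0] += 1
    kinds_passed = sum(1 for passed, total in per_test.values()
                       if passed / total >= (1.0 if total == 1 else per_kind))
    return kinds_passed / len(per_test) >= per_block

def file_is_random(blocks, per_file=0.70):
    # blocks: one test_results list per 1 MB block; the file is deemed random
    # if at least per_file of its blocks pass.
    passed = sum(1 for b in blocks if block_passed(b))
    return passed / len(blocks) >= per_file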

To achieve this result we defined three thresholds. These thresholds have been determined by experiments with various samples of "normal" data and data produced by cyphering algorithms. The algorithm has been implemented as a Python script which receives as input the finalAnalysisReport.txt files of all the blocks of a single file, concatenated.

D. Example of interpretation

As an example to illustrate the algorithm, suppose we test a 50 MB file. The file is split into 50 blocks of 1 MB each, and on each block the NIST test is performed with 250 runs of 4 KB each. We assume the first block gives an output like this:

- Test: 49, lines: 188, passed: 160
- test: Frequency, passed: 1/1 (PASS)
- test: BlockFrequency, passed: 1/1 (PASS)
- test: CumulativeSums, passed: 2/2 (PASS)
- test: Runs, passed: 1/1 (PASS)
- test: LongestRun, passed: 1/1 (PASS)
- test: Rank, passed: 1/1 (PASS)
- test: FFT, passed: 1/1 (PASS)
- test: NonOverlappingTemplate, passed: 148/148 (PASS)
- test: OverlappingTemplate, passed: 1/1 (PASS)
- test: Universal, passed: 0/1 (FAIL)
- test: ApproximateEntropy, passed: 0/1 (FAIL)
- test: Serial, passed: 2/2 (PASS)
- test: LinearComplexity, passed: 1/1 (PASS)
Tests passed: 11/13 (84%) PASSED

The listing shows that 188 tests of 13 different kinds have been performed on the block. The reason for this discrepancy is that some tests are performed many times (such as the non-overlapping template test, which is performed 148 times, each time with a different test template). To keep all the tests on the same level of importance it has been decided not to count every occurrence of them, but to deem the kind "passed" if at least 90% of its occurrences are passed. This way the non-overlapping template test, run 148 times, still counts as one "passed". After all the test results are analyzed, the percentage of passed test kinds is calculated. In the example this percentage is 84%, above the 70% threshold, so the whole block is deemed passed. After testing all the 1 MB blocks in the file, a final statistic is calculated:


Final results: 49/50 blocks passed (PASSED)

If at least 70% of the blocks are deemed passed, then the whole file can be considered random.

IV. TEST CASES

To test the proposed detection algorithm we chose the most widely used open source encryption software, TrueCrypt, to create some test cases. The results are validated by testing TrueCrypt volume files and volumes against standard files from a reference repository.

A. TrueCrypt files

Using TrueCrypt version 7.0a on Mac OS X we created three encrypted volume files of 50 MB each: one with no data, one half full and one full. Results are shown in Table I. TrueCrypt files have no header and are recognized as random data regardless of how full they are (both encrypted data and empty space are equally random).

TABLE I. TEST RESULTS FOR TRUECRYPT VOLUMES

File               Blocks   Random blocks   Nonrandom blocks   Perc.
Empty volume       50       49              1                  98%
Half full volume   50       50              0                  100%
Full volume        50       49              1                  98%

B. TrueCrypt partitions

A TrueCrypt encrypted partition was created on a USB mass storage device of 256 MB using TrueCrypt version 7.0a on Mac OS X. This partition was then dumped to a file using EnCase v. 4.20 for Windows. The file was then split into chunks of 50 MB each, and the analysis described above was performed on each part. The results are shown in Table II; all the parts are correctly recognized as random data.

TABLE II. TEST RESULTS FOR TRUECRYPT VOLUME FILES

File      Blocks   Random blocks   Nonrandom blocks   Perc.
Chunk 1   50       48              2                  96%
Chunk 2   50       49              1                  98%
Chunk 3   50       47              3                  94%
Chunk 4   50       50              0                  100%
Chunk 5   50       47              3                  94%

C. Standard files

In order to test the classifier we looked on the Internet for a publicly available repository of standard files, here intended as files likely to be found on a computer (which should be classified as nonrandom). We found a repository named Digital Forensics Corpora [i], which was set up to provide a database that can be used for research purposes. From that repository we downloaded directory 000 and chose some of the files larger than 5 MB to test. Results are shown in Table III: all files but one are clearly recognized as nonrandom.

TABLE III. TEST RESULTS FOR STANDARD FILES

File         Blocks   Random blocks   Nonrandom blocks   Perc.
000033.xls   6        0               6                  0%
000030.xls   8        0               8                  0%
000113.doc   14       0               14                 0%
000134.ppt   9        0               9                  0%
000143.pdf   5        0               5                  0%
000187.pdf   9        6               3                  66%
000208.pdf   6        0               6                  0%
000559.ppt   17       0               17                 0%
000564.csv   8        0               8                  0%
000736.gz    6        0               6                  0%
000766.ps    5        0               5                  0%
000801.doc   6        0               6                  0%
000938.txt   29       0               29                 0%

V. SIMILAR TOOLS FOUND IN LITERATURE

In the literature we found a small number of tools which claim to be able to detect cyphered files. The most interesting tools are:
- FI Tools from Forensics Innovations
- TCHunt from 16 Systems
Interestingly, their developers agree that it is impossible to accurately distinguish encrypted files from random files, because they appear identical under every analysis. They remark that the only method to provide some sort of detection is to identify files containing random data [8] [9]. TCHunt uses three additional file attributes to try to detect TrueCrypt files:
- The suspect file size modulo 512 must equal zero, because TrueCrypt files are built from 512-byte blocks.
- The suspect file is at least 19 KB in size, because this is the minimum size of a TrueCrypt volume file.
- The suspect file must not contain a common file header.
After performing some tests it appears that these tools have almost the same success rate as the methodology explained here, because they work on the same hypothesis.

VI. CONCLUSIONS

This paper was motivated by the lack of open source forensically sound tools to provide some sort of detection of encrypted volume files. While it was known from the beginning that a true detection of this sort of archive is not possible, a methodology was developed to identify filesystem elements which with high probability contain encrypted data.


The methodology was tested against a number of test cases which proved it to be reliable for identifying data encrypted with a popular encryption tool. The next step in this research work will be to provide an integrated software tool which can be used by both researchers and practitioners in digital forensics to easily scan a filesystem and identify realistic candidates for further cryptographic examination.

APPENDIX A
TESTS IN THE NIST SUITE

This appendix reports a brief description of the statistical tests used in the NIST Suite. The descriptions are taken from the NIST Statistical Test Suite manual [3].

A. Frequency (Monobit) Test
The purpose of this test is to determine whether the numbers of ones and zeros in a sequence are approximately the same as would be expected for a truly random sequence. The test assesses the closeness of the fraction of ones to 0.5; that is, the number of ones and zeroes in a sequence should be about the same. It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100).

B. Frequency Test within a Block
The purpose of this test is to determine whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption of randomness. For block size M = 1, this test degenerates to test 1, the Frequency (Monobit) test. It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100). Note that n ≥ MN. The block size M should be selected such that M ≥ 20, M > 0.01n and N < 100.

C. Runs Test
The purpose of the runs test is to determine whether the number of runs of ones and zeros of various lengths is as expected for a random sequence. A run of length k consists of exactly k identical bits and is bounded before and after by a bit of the opposite value. In particular, this test determines whether the oscillation between zeros and ones is too fast or too slow. It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100).

D. Test for the Longest Run of Ones in a Block
The purpose of this test is to determine whether the length of the longest run of ones within the tested sequence is consistent with the length of the longest run of ones that would be expected in a random sequence. The recommended length of the sequence to be tested is n ≥ 128. According to this length, the block length M is chosen as follows:

Minimum n    M
128          8
6272         128
750,000      10^4
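As an illustration of the simplest of these tests, the short Python sketch below implements the Frequency (Monobit) test following the formula in the NIST manual [3] (the P-value is erfc(|S_n| / (sqrt(n) * sqrt(2))), where S_n is the sum of the bits mapped to ±1); the example input is arbitrary and the function name is ours.

import math

def monobit_p_value(bits):
    # NIST Frequency (Monobit) test; bits is a sequence of 0/1 values, n >= 100 recommended.
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)           # map 0 -> -1, 1 -> +1 and sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))          # P-value < 0.01 => sequence deemed non-random

# Example: test the bits of an arbitrary byte string
data = bytes.fromhex("a5f31c")
bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
print(monobit_p_value(bits))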

E. Binary Matrix Rank Test
The purpose of this test is to check for linear dependence among fixed length substrings of the original sequence, by calculating the rank of disjoint sub-matrices of the entire sequence. The probabilities for M = Q = 32 (where M is the number of rows in each matrix, and Q the number of columns) have been calculated and inserted into the code. Other choices of M and Q may be selected, but the probabilities would need to be calculated. The minimum number of bits to be tested must be such that n ≥ 38MQ (i.e., at least 38 matrices are created). For M = Q = 32, each sequence to be tested should consist of a minimum of 38,912 bits.

F. Discrete Fourier Transform (Spectral) Test
The purpose of this test is to detect periodic features (i.e., repetitive patterns that are close to each other) in the tested sequence that would indicate a deviation from the assumption of randomness. The intention is to detect whether the number of peaks exceeding the 95% threshold is significantly different from 5%. It is recommended that each sequence to be tested consist of a minimum of 1000 bits (i.e., n ≥ 1000).

G. Non-overlapping Template Matching Test
The purpose of this test is to detect sequences with too many occurrences of a given non-periodic (aperiodic) pattern. An m-bit window is used to search for a specific m-bit pattern. If the pattern is not found, the window slides one bit position. If the pattern is found, the window is reset to the bit after the found pattern, and the search resumes. The test code has been written to provide templates for m = 2, 3, ..., 10. It is recommended that m = 9 or m = 10 be specified to obtain meaningful results. Although N = 8 has been specified in the test code, the code may be altered to other sizes. However, N should be chosen such that N ≥ 100 to be assured that the P-values are valid. Additionally, be sure that M > 0.01n and N = floor(n/M).

H. Overlapping Template Matching Test
Both this test and the Non-overlapping Template Matching test (section A-G) use an m-bit window to search for a specific m-bit pattern. As with the test in A-G, if the pattern is not found, the window slides one bit position. The difference between this test and the test in section A-G is that when the pattern is found, the window slides only one bit before resuming the search. The values of K, M and N have been chosen such that each sequence to be tested consists of a minimum of 10^6 bits (i.e., n ≥ 10^6). Various values of m may be selected, but for the time being, NIST recommends m = 9 or m = 10.

I. Maurer's Universal Statistical Test


The purpose of the test is to detect whether or not the sequence can be significantly compressed without loss of information, by evaluating the number of bits between matching patterns. A significantly compressible sequence is considered to be non-random. This test requires a long sequence of bits (n ≥ (Q+K)L), which is divided into two segments of L-bit blocks, where L should be chosen so that 6 ≤ L ≤ 16. The first segment consists of Q initialization blocks, where Q should be chosen so that Q = 10 * 2^L. The second segment consists of K test blocks, where K = ceil(n/L) − Q. The values of L, Q and n should be chosen as follows:

L    Q       n
6    640     ≥ 387,840
7    1,280   ≥ 904,960
8    2,560   ≥ 2,068,480
9    5,120   ≥ 4,654,080
...  ...     ...

J. Linear Complexity Test
The purpose of this test is to determine whether or not the sequence is complex enough to be considered random, by evaluating the length of a linear feedback shift register (LFSR). Random sequences are characterized by longer LFSRs. An LFSR that is too short implies non-randomness. It is recommended that n ≥ 10^6. The value of M must be in the range 500 ≤ M ≤ 5000, and N ≥ 200.

K. Serial Test
The purpose of this test is to determine whether the number of occurrences of the 2^m m-bit overlapping patterns is approximately the same as would be expected for a random sequence. Random sequences have uniformity; that is, every m-bit pattern has the same chance of appearing as every other m-bit pattern. Note that for m = 1, the Serial test is equivalent to the Frequency test. It is recommended to choose m and n such that m < floor(log2 n) − 2.

L. Approximate Entropy Test
The purpose of the test is to compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence. It is recommended to choose m and n such that m < floor(log2 n) − 5.

M. Cumulative Sums (Cusum) Test
The focus of this test is the maximal excursion (from zero) of the random walk defined by the cumulative sum of adjusted (-1, +1) digits in the sequence. The purpose of the test is to determine whether the cumulative sum of the partial sequences occurring in the tested sequence is too large or too small relative to the expected behavior of that cumulative sum for random sequences. This cumulative sum may be considered as a random walk. For a random sequence, the excursions of the random walk should be near zero. It is recommended that each sequence to be tested consist of a minimum of 100 bits (i.e., n ≥ 100).

N. Random Excursions Test
The focus of this test is the number of cycles having exactly K visits in a cumulative sum random walk. The cumulative sum random walk is derived from partial sums after the (0, 1) sequence is transferred to the appropriate (-1, +1) sequence. A cycle of a random walk consists of a sequence of steps of unit length taken at random that begin at and return to the origin. The purpose of this test is to determine if the number of visits to a particular state within a cycle deviates from what one would expect for a random sequence. This test is actually a series of eight tests (and conclusions), one test and conclusion for each of the states -4, -3, -2, -1 and +1, +2, +3, +4. It is recommended that each sequence to be tested consist of a minimum of 1,000,000 bits (i.e., n ≥ 10^6).

O. Random Excursions Variant Test
The focus of this test is the total number of times that a particular state is visited (i.e., occurs) in a cumulative sum random walk. The purpose of this test is to detect deviations from the expected number of visits to various states in the random walk. This test is actually a series of eighteen tests (and conclusions), one test and conclusion for each of the states -9, -8, ..., -1 and +1, +2, ..., +9. It is recommended that each sequence to be tested consist of a minimum of 1,000,000 bits (i.e., n ≥ 10^6).

REFERENCES

[1] A. Savoldi, M. Piccinelli, and P. Gubian. A statistical method for detecting on-disk wiped areas. Digital Investigation, Elsevier, Volume 8, 2012.
[2] A. Czeskis, D. J. St. Hilaire, T. Kohno, K. Koscher, S. D. Gribble, and B. Schneier. Defeating Encrypted and Deniable File Systems: TrueCrypt v5.1a and the Case of the Tattling OS and Applications. Retrieved January 2011, from http://pdos.csail.mit.edu/6.858/2010/readings/truecrypt.pdf.
[3] D. Banks, E. Barker, J. Dray, A. Heckert, S. Leigh, M. Levenson, J. Nechvatal, A. Rukhin, M. Smid, J. Soto, M. Vangel, and S. Vo. NIST Statistical Test Suite, 2008. Retrieved January 2011, from http://csrc.nist.gov/groups/ST/toolkit/rng/documents/sts-2.1.zip.
[4] D. J. Hickok, D. R. Lesniak, and M. C. Rowe. File Type Detection Technology, 2005. Retrieved January 2011, from http://www.uwplatt.edu/csse/courses/prev/csse411materials/StudentConferencePublications/MICS2005 File Type Detection Technology.pdf.
[5] C. Sadowski and G. Levin. SimHash: Hash-based Similarity Detection, 2007. Retrieved January 2011, from http://simhash.googlecode.com/svn/trunk/paper/SimHashWithBib.pdf.
[6] B. Blunden. Anti-Forensics: The Rootkit Connection. In Black Hat USA 2009 Conference Proceedings, 2009. Retrieved January 2011, from http://www.blackhat.com/presentations/bh-usa09/BLUNDEN/BHUSA09-Blunden-AntiForensics-PAPER.pdf.
[7] HashKeeper web site. Retrieved February 2011, from http://www.justice.gov/ndic/domex/hashkeeper.htm.
[8] Comments from the president of Forensic Innovations, Inc., Rob Zirnstein, on the FI blog post "TrueCrypt is now detectable". Retrieved January 2011, from http://www.forensicinnovations.com/blog/?p=7.
[9] TCHunt FAQs from the 16 Systems website. Retrieved February 14th, 2011, from http://16s.us/TCHunt/faq/.

[i] Digital Forensics Corpora repository: http://domex.nps.edu/corp/files/govdocs1/


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(1): 38-48
The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Detection and Protection Against Intrusions on Smart Grid Systems

Ata Arvani and Vittal S. Rao
Texas Tech University, Electrical and Computer Engineering Department
Box 43102, Lubbock, Texas 79409, USA
[email protected], [email protected]

ABSTRACT
The wide area monitoring of power systems is implemented at a central control center to coordinate the actions of local controllers. Phasor measurement units (PMUs) are used for the collection of data in real time for smart grid energy systems. Intrusion detection and cyber security of the network are important requirements for maintaining the integrity of wide area monitoring systems. Intrusion detection methods analyze the measurement data to detect any possible cyber attacks on the operation of smart grid systems. In this paper, model-based and signal-based intrusion detection methods are investigated to detect the presence of malicious data. The chi-square test and the discrete wavelet transform (DWT) have been used for anomaly-based detection. A false data injection attack (FDIA) can be detected using the measurement residual: if the measurement residual is larger than the expected detection threshold, an alarm is triggered and the bad data can be identified. Avoiding such alarms in the residual test is referred to as a stealth attack. There are two protection strategies against stealth attacks: (1) select a subset of meters to be protected from the attacker, and (2) place secure phasor measurement units in the power grid. An IEEE 14-bus system is simulated using a real time digital simulator (RTDS) hardware platform for implementing the attack and detection schemes.

KEYWORDS Cyber security, stealth attack, wide area monitoring, smart grid, anomaly-based detection methods, discrete wavelet transform.

1 INTRODUCTION

The generation, transmission, and distribution of electric power systems embedded with real time measurements make the smart grid the most dependable critical infrastructure in the world. Present monitoring systems depend on state estimation, which is based on supervisory control and data acquisition (SCADA) systems for the collection of data from field devices such as remote terminal units (RTUs), from which the data is sent up to the central control center [1]. In future smart grid systems, wide area monitoring will be accomplished by collecting system level information in real time using phasor measurement units (PMUs) and phasor data concentrators (PDCs). The data obtained from PMUs will be used for state estimation and for the implementation of control strategies for optimal control of smart grid systems [2-4]. The PMUs, which are also called synchrophasors, provide accurate measurements of active power, reactive power, voltage and current, along with phasor angles, in real time. The data from various remote locations is synchronized with a common time source using global positioning systems (GPS). In a typical smart grid energy network, synchrophasors are used along with PDCs, where the data is collected. The synchrophasors can increase the reliability of power systems embedded with renewable energy sources, such as solar and wind power, by triggering corrective actions to account for unpredictable power generation. The synchrophasors hold the key to the future power

systems by increasing the reliability, operational efficiency and quality of power distribution [5]. Early power system networks used communication standards such as the DNP3 protocol. These protocols have limitations in handling real-time data and in synchronization with geographically dispersed synchrophasor devices. Current PMUs use the IEEE C37.118 protocol for communication, which defines the message and communication standards for synchronized networks in real time. In future electrical power systems the wide use of PMUs is inevitable, and this raises the importance of cyber security [6]. There are different methods to detect malicious data. The main objective of this paper is to investigate model- and signal-based intrusion detection methods to detect anomalies in measurement data. The main feature of the model-based method lies in the development of dynamic models of the power system and the use of the chi-square test along with the largest normalized residual to detect and identify the malicious data. The signal-based method exploits the statistical properties of the signal, and the discrete wavelet transform is used to detect and identify the malicious data at different levels [7].

Figure 1. Schematic diagram of IEEE 14-bus system

These files are converted to RSCAD for implementation on the RTDS system. An experimental smart grid test bed with hardware-in-the-loop (HIL) simulation capabilities is available at Texas Tech University; a schematic is shown in Figure 2. These facilities were used to implement the attack and intrusion detection methods.

2 MODELLING OF IEEE 14-BUS SYSTEM

The benchmark IEEE 14-bus system has been investigated by a number of researchers for the analysis of dynamic system stability, power flow analysis and state estimation problems [8]. The power system simulator for engineering (PSS/E) is a commercially available software package for simulating, analyzing, and optimizing power systems. This package has been used to build the PSS/E files for the IEEE 14-bus system shown in Figure 1.

Figure 2. Schematic of smart grid test bed at Texas Tech University

3 MODEL-BASED INTRUSION DETECTION METHODS

Due to the presence of malicious data in the power system measurements, the operation of the power system can be compromised. Hence we need an intrusion detection method for detecting malicious data in the measurements [10]. In this

section we present an intrusion detection method using static state estimation algorithms. The chi-square distribution test and the largest normalized residual test are used to detect and identify the malicious data [11].

The linear measurement equation is given by:

z = H x + e        (1)

where z is the measurement vector, H is the Jacobian coefficient matrix, and e is the error vector with zero mean and covariance matrix R. The weighted least squares (WLS) estimate of the linear state vector can be obtained as follows:

x̂ = (H^T R^-1 H)^-1 H^T R^-1 z        (2)

and the estimated value of z is:

ẑ = H x̂        (3)

The intrusion detection method consists of two steps: 1) malicious data detection and 2) identification of bad data. The chi-square test is used to detect the malicious data and the largest normalized residual test is then used to identify the bad data. The objective function can be obtained from the corresponding measurements:

J(x̂) = Σ_i (z_i − ẑ_i)^2 / R_ii        (4)

The chi-square value corresponding to a detection confidence with probability p and k degrees of freedom can be obtained from the chi-square distribution table as:

χ²_(k, p)        (5)

If J(x̂) ≥ χ²_(k, p), bad data will be suspected.

The largest normalized residual test can be used to identify the bad data. A gain matrix is defined as:

G = H^T R^-1 H        (6)

and the hat matrix is:

K = H G^-1 H^T R^-1        (7)

The hat matrix K is used to find the residual sensitivity matrix S, where I is the identity matrix:

S = I − K        (8)

S multiplied by the error vector e gives the measurement residuals, r = S e. The measurement residual vector is divided by the square root of the residual covariance matrix Ω, which is defined as:

Ω = S R        (9)

Thus, the normalized value of each residual can be obtained as follows:

r_i^N = |r_i| / sqrt(Ω_ii)        (10)

The measurement with the largest normalized residual will be suspected as bad data.
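A compact NumPy sketch of the detection procedure in equations (1)-(10) is given below. It is not the authors' implementation: the significance level and the use of SciPy's chi2.ppf to obtain the threshold are assumptions made for illustration.

import numpy as np
from scipy.stats import chi2

def detect_bad_data(z, H, R, alpha=0.01):
    # WLS state estimation with the chi-square and largest-normalized-residual tests.
    # z: m-vector of measurements, H: m x n Jacobian, R: m x m error covariance.
    W = np.linalg.inv(R)
    G = H.T @ W @ H                                   # gain matrix, eq. (6)
    x_hat = np.linalg.solve(G, H.T @ W @ z)           # WLS estimate, eq. (2)
    r = z - H @ x_hat                                 # measurement residuals
    J = float(r @ W @ r)                              # objective function, eq. (4)
    dof = len(z) - H.shape[1]
    threshold = chi2.ppf(1 - alpha, dof)              # chi-square detection threshold, eq. (5)
    if J < threshold:
        return None                                   # no bad data suspected
    K = H @ np.linalg.solve(G, H.T @ W)               # hat matrix, eq. (7)
    S = np.eye(len(z)) - K                            # residual sensitivity matrix, eq. (8)
    omega = S @ R                                     # residual covariance, eq. (9)
    r_norm = np.abs(r) / np.sqrt(np.diag(omega))      # normalized residuals, eq. (10)
    return int(np.argmax(r_norm))                     # index of the suspected bad measurement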

We have simulated the IEEE 14-bus system and its measurement configuration for the demonstration of the intrusion detection methods [8]. The number of state variables for this system is 27, made up of 14 bus voltage magnitudes and 13 bus voltage phase angles, the slack bus phase angle being excluded from the state list. There are altogether 41 measurements, i.e., 1 voltage magnitude measurement, 8 pairs of real/reactive power injections, and 12 pairs of real/reactive power flows. The degrees of freedom for the approximate chi-square distribution of the objective function will therefore be 41 − 27 = 14. The real power injection at bus 2 is manipulated by the man-in-the-middle intentionally, to simulate bad data as shown in Table 1.

Table 1. Real power manipulation at bus 2

Measurement type   Value
No bad data        0.183
One bad data       0.483

Tables 2 and 3 illustrate the state estimation of the IEEE 14-bus system without malicious data and with malicious data, respectively.

Table 2. IEEE 14-bus system without malicious data (estimated state)

Bus number   Voltage magnitude   Phase angle
1            1                   0.00
2            1.0068              0.00
3            0.9899              -5.5265
4            0.9518              -14.2039
5            0.9579              -11.4146
6            0.9615              -9.7583
7            1.0185              -16.0798
8            0.9919              -14.7510
9            1.0287              -14.7500
10           0.9763              -16.5125
11           0.9758              -16.7476
12           0.9932              -16.5397
13           1.0009              -17.0203
14           0.9940              -17.0583

Table 3. IEEE 14-bus system with malicious data (estimated state)

Bus number   Voltage magnitude   Phase angle
1            1                   0.00
2            0.9897              0.00
3            0.9731              -5.5304
4            0.9329              -14.9925
5            0.9370              -12.3482
6            0.9407              -10.6143
7            0.9992              -17.2033
8            0.9717              -15.8285
9            1.0094              -15.8269
10           0.9559              -17.6649
11           0.9554              -17.9071
12           0.9733              -17.6846
13           0.9812              -18.1813
14           0.9742              -18.2210

The test threshold at the chosen confidence level is obtained with the MATLAB inverse chi-square distribution function (chi2inv). For the first case (no malicious data) the objective function is below the threshold, so bad data is not suspected. For the second case (with malicious data in the real power injection at bus 2) the objective function exceeds the threshold, so bad data is suspected. Figure 3 shows the active power at bus number 2 for the IEEE 14-bus system.

Figure 3. Active power at bus No 2

The normalized residual test is used to detect and eliminate the bad data for this measurement set. The weighted least squares (WLS) state estimator results for the significant measurement residuals show that the power injection at bus 2 is detected as bad data and is removed from the measurement set. We verified the efficiency of the model-based algorithm, using the chi-square test and the largest normalized residual, for detecting the malicious data.

4 STEALTH ATTACK

In this section, we investigate the stealthy false data injection attack (SFDIA) on the state estimation of the power system. Bad data detection can be accomplished by calculating the measurement residual as follows:

r = z − H x̂        (11)

If the measurement residual is larger than the expected detection threshold, then an alarm is triggered and the bad data can be identified. Avoiding such alarms in the residual test is referred to as a stealth attack. The basic principle of the stealthy false data injection attack can be represented by:

z_a = z + a        (12)

where z_a represents the maliciously modified measurement vector, and a is referred to as the attack vector. If the attack vector is a linear combination of the column vectors of H, that is a = Hc, the residual test can be bypassed by the attacker; here c is an arbitrary nonzero vector. The jth element of the attack vector being nonzero means that the attacker manipulates the jth measurement. The new state estimate can be calculated as follows:

x̂_bad = x̂ + c        (13)

where

c = (H^T R^-1 H)^-1 H^T R^-1 a        (14)

Considering a = Hc, the residual test can be computed as:

||z_a − H x̂_bad|| = ||z + a − H(x̂ + c)|| = ||z − H x̂||

Hence, the measurement residual with the bad data is not larger than the detection threshold, and the attack can bypass the residual test. In general, there are three different scenarios: (1) protected meters, (2) verifiable states, and (3) a combined scenario. For protected meters we assume that the attacker can access only particular meters and modify them. Let the attacker have access to a set of m particular meters; for the protected meters, which cannot be accessed, the corresponding elements of the attack vector must be zero.

1) Targeted attack: In a targeted FDIA, the attacker aims to inject errors into the estimates of some particular state variables.

a) Constrained case: The error injected into the state estimation can be calculated as follows:

(16)

Let there be a set of state variables which can be verified independently. The attacker can substitute the intended errors into the attack construction and verify that the independently verifiable state variables remain unchanged; if so, the attack vector can be generated.

b) Unconstrained case: For this case, the attack vector must satisfy three conditions: it must be a linear combination of the column vectors of H; its elements corresponding to the meters the attacker cannot access must be zero; and the error injected into each targeted state variable must equal the particular value chosen by the attacker.

2) Random attack: In a random attack, the attacker aims to inject an error into the state estimation regardless of any particular state variables. Theorem 1 gives a necessary and sufficient condition for an attack vector to bypass the residual test, and Theorem 2 states that if the attacker can access a sufficient number m of particular meters, an attack vector restricted to those meters and satisfying the bypass condition can always be found.

Demonstration of the stealth attack using the IEEE 14-bus system: In this section, we investigate the targeted attack on the IEEE 14-bus system for the first nineteen measurements. The linear measurement equation of the IEEE 14-bus system can be expressed as follows:

z = H x + e        (17)

where z is the measurement vector of the first nineteen measurements, H is the corresponding Jacobian coefficient matrix, and x is the state vector. The attack vector can be represented as follows:

(18)

where the attack vector and the Jacobian matrix are partitioned according to the accessible and protected meters. As mentioned earlier, the elements of the attack vector associated with the protected meters are zero. Hence we have:

(19)

and

(20)

The numerical values are given in the Appendix. Choosing c arbitrarily, the attack vector can be obtained as:

(21)

There are two protection strategies against the stealth attack: (1) select a subset of meters to be protected from the attacker, and (2) place secure phasor measurement units in the power grid. For the first strategy, let P be the minimum number of meters that the attacker needs in order to satisfy the detection-evading condition:

(22)

The error injected into the jth state can be expressed as:

(23)

where the expression involves the measurement matrix after deleting the jth column, and the best possible attack vector modifying at least the jth state is obtained by minimizing the number of nonzero elements of the attack vector, subject to the constraint:

(24)

In the second strategy, by adding further phasor measurement units to the power system, the Jacobian coefficient matrix is modified. Consider the Jacobian matrix associated with a secure PMU placed at bus P. The attacker should then satisfy the following condition:

(25)

Given this, the goal for the grid designer is to find a bus at which to place a secure PMU such that:

(26)

As a result, the attacker has to find another solution for c.
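The following toy Python check (not from the paper) illustrates numerically why the condition a = Hc evades the residual test: the attacked measurements produce exactly the same residual as the clean ones while shifting the state estimate by c. The dimensions (19 measurements, 5 states), the random H and the unweighted least-squares estimator are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

m, n = 19, 5                              # stand-ins for the 19-measurement demonstration
H = rng.normal(size=(m, n))               # the actual H of the IEEE 14-bus case is in the Appendix
x_true = rng.normal(size=n)
z = H @ x_true + 0.01 * rng.normal(size=m)

def wls(z, H):
    return np.linalg.lstsq(H, z, rcond=None)[0]

def residual_norm(z, H):
    x_hat = wls(z, H)
    return np.linalg.norm(z - H @ x_hat)

c = rng.normal(size=n)                    # arbitrary nonzero error the attacker wants to inject
a = H @ c                                 # stealth attack vector, a = Hc
z_attacked = z + a

print(residual_norm(z, H))                # residual of the clean measurements
print(residual_norm(z_attacked, H))       # identical residual: invisible to the residual test
x_clean, x_bad = wls(z, H), wls(z_attacked, H)
print(np.allclose(x_bad - x_clean, c))    # the injected state error equals c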

5 SIGNAL-BASED INTRUSION DETECTION METHODS

A brief review of the discrete wavelet transform (DWT) is presented in this section [12]. The DWT is a mathematical tool used to decompose signals and to extract information at different resolution levels. The wavelet transform breaks the signal into wavelets, which are scaled and shifted versions of a signal waveform known as the mother wavelet. Wavelet analysis is suitable for revealing the scaling properties of the temporal and frequency dynamics simultaneously. The irregularity in shape and the compactly supported nature of wavelets make wavelet analysis an ideal tool for analyzing signals of a non-stationary nature. Their fractional nature allows them to analyze signals with discontinuities or sharp changes, while their compactly supported nature enables temporal localization of a signal's features. A one-dimensional discrete wavelet transform is composed of decomposition (analysis) and reconstruction (synthesis). The discrete wavelet transform produces two sets of coefficients, termed approximation and detail coefficients. The approximation coefficients are the high-scale, low-frequency components and the detail coefficients are the low-scale, high-frequency components. The signal is passed through a series of high-pass and low-pass filters to analyze the respective components at each level. Wavelet analysis starts by selecting a basic wavelet function, called the mother wavelet ψ(t). The wavelet representation of a function f(t), defined for all t, can be given by:

f(t) = Σ_k c_k φ(t − k) + Σ_{j≥0} Σ_k d_{j,k} ψ(2^j t − k)        (28)

By considering the Haar wavelet, the scaling function and the wavelet function are defined as:

φ(t) = 1 for 0 ≤ t < 1, and 0 otherwise
ψ(t) = 1 for 0 ≤ t < 1/2, −1 for 1/2 ≤ t < 1, and 0 otherwise        (29)

For a given signal, the approximation and detail coefficients can be obtained by convolving the signal with the low-pass filter and the high-pass filter, respectively, each followed by downsampling:

a[n] = Σ_k x[k] L[2n − k]        (30)
d[n] = Σ_k x[k] H[2n − k]        (31)

The low-pass filters are represented by L, and the high-pass filters are represented by H. Anomaly detection of malicious data consists of three parts, as shown in Figure 4. The first part is the PMU signal from the power system. The second part consists of the discrete wavelet transformation used to analyze the signal [13-15]. In the third part, the threshold values are compared for the determination of the anomalies in the signal.

Figure 4. Anomaly-based intrusion detector

The benchmark and corrupted data of voltage and current are shown in Figures 5 and 6, respectively. The discrete wavelet transform is used to analyze the measured signal by calculating the statistical properties of the signal.

Figure 5. Original and corrupted data of voltage signal (voltage magnitude vs. time stamp)

Figure 6. Original and corrupted data of current signal (current magnitude vs. time stamp)

We employ the Haar filter and compute the one-dimensional discrete wavelet transform up to 5 levels. In order to obtain the thresholds for anomaly-based intrusion detection, the distribution of the wavelet reconstructed signal without anomalies should be analyzed. Normality is then verified by the Lilliefors test for goodness of fit to the normal distribution [16-18]; the data has a normal distribution at the 5% significance level. We can detect anomalous intrusions by choosing some of the levels through selective reconstruction. Table 4 and Table 5 show some statistical properties of the original and corrupted data of the voltage and current signals. It should be noted that the original data can be considered as Gaussian white noise, and the anomaly can be considered as a random signal. For any random variable, choosing a 3σ confidence interval yields:

P( μ − 3σ ≤ X ≤ μ + 3σ ) ≈ 0.997        (32)
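A minimal sketch of this threshold comparison is shown below. It uses the PyWavelets package (pywt), which the paper does not mention, and it sets the per-level thresholds to k = 3 standard deviations of the anomaly-free detail coefficients, matching the 3σ (99.7%) interval discussed here.

import numpy as np
import pywt

def detect_anomalies(signal, reference, wavelet="haar", level=5, k=3.0):
    # Compare the DWT detail coefficients of `signal` against thresholds of
    # k standard deviations learned from an anomaly-free `reference` signal.
    ref_details = pywt.wavedec(reference, wavelet, level=level)[1:]   # [cD_level, ..., cD1]
    sig_details = pywt.wavedec(signal, wavelet, level=level)[1:]
    alarms = {}
    for lvl, (ref_d, sig_d) in enumerate(zip(reversed(ref_details), reversed(sig_details)), start=1):
        threshold = k * np.std(ref_d)                 # e.g. 3*sigma, ~99.7% confidence
        alarms[lvl] = np.where(np.abs(sig_d) > threshold)[0]   # indices exceeding the threshold
    return alarms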

This interval corresponds to a 99.7% confidence level, which means that we can detect anomalies with a 0.3% error rate.

Figure 7. Wavelet decomposition of original voltage signal

Figure 8. Wavelet decomposition of corrupted voltage signal

The PMU signals are analyzed at different resolution levels. Figures 7 and 8 show the approximation and detail coefficients of the original and corrupted voltage signal up to level 5. By comparing the analyzed information with the thresholds it is possible to detect the anomalies and alert the operator regarding the presence of anomalies in the data. In order to detect shorter anomalies we have analyzed the signal at the higher-resolution levels such as 1 and 2. For example, by selecting the thresholds at level 1 as -0.2832 and 0.2832 respectively, which is equivalent to ±3σ, we can detect the anomalies with an error rate of 0.3%. Table 4 shows the statistical parameters of the voltage signal, such as the standard deviation, for the original and corrupted data.

Table 4. Statistical properties of voltage signal

         Original data of voltage magnitude      Corrupted data of voltage magnitude
Level    Standard deviation    Threshold         Standard deviation
1        0.0944                0.2832            5.121
2        0.1265                0.3795            4.854
3        20.67                 62.01             21.64
4        47.13                 141.39            48.11
5        102.2                 306.60            101.4

Figure 9. Threshold values and detail coefficients at different levels of voltage signal

We can set the thresholds for each level, which are equivalent to the 99.7% confidence level, to detect the anomalies. We have repeated the procedure for the current signals. The detail and approximation coefficients of the original and corrupted current signals are shown in Figures 10 and 11, respectively.

Figure 10. Wavelet decomposition of original current signal

45

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(1): 38-48 The Society of Digital Information and Wireless Communications, 2014 2013 (ISSN: 2305-0012)

Figure 11. Wavelet decomposition of corrupted current signal

Table 5 shows the statistical parameters of current signal like standard deviation for original and corrupted data.

Figures 9 and 12 show the detail coefficients and corresponding thresholds for original and corrupted signal at different levels up to 5. The values located on the top and bottom of the thresholds indicate that intrusion has been occurred in the network. For the corrupted voltage and current signals, Figures 9 and 12, the detail coefficients at level 1, and level 2 are greater than the corresponding thresholds and the malicious data has been detected. The results show that the use of proposed method successfully detected the anomalies in the data. 6 CONCLUSIONS

Table 5. Statistical properties of current signal

Level   Original data: standard deviation   Threshold   Corrupted data: standard deviation
1       5.122                               15.36       13.57
2       4.84                                14.52       14.94
3       17.86                               53.58       19.47
4       42.86                               128.58      43.44
5       111.4                               334.2       110

6 CONCLUSIONS

Wide-area monitoring and control, which coordinates the various devices of the power system to improve system-wide dynamic performance and stability, is being implemented in smart grids. These critical devices usually have the most significant impact on power system oscillation, damping, performance and stability. Cyber security and data integrity are very important for the successful integration of phasor measurement units for automatic control of electric power systems. In this paper a cyber security tool for intrusion detection is developed and presented. We have simulated an IEEE benchmark 14-bus system using an RTDS system. The benchmark and malicious data have been generated in our laboratory. The proposed cyber security tool for intrusion detection has been successfully employed on this data, and the results are very satisfactory. The detection method depends on the selection of threshold values. In the future we will compare this method with methods based on measurement residual detection.

Figure 12. Threshold values and detail coefficients at different levels of current signal

7 ACKNOWLEDGMENTS

The authors gratefully acknowledge the support of the National Science Foundation through grant ECCS-1040161 for acquiring the research instrumentation used in this research work.

8 APPENDIX

0 0 0 0 0 0 0 0 1.8658 -0.3964 0 0 0 0 0 0 0 0 0 0 0];

[ 0 0 0 0 0; 29.3120 -4.6148 -4.9889 -5.0489 0; -4.2921 8.8195 -4.5274 0 0; 0 0 -4.6375 0 0; 0 0 0 0 0; 0 0 0 0 0; 0 0 0 0 -4.1249; 0 0 0 0 -3.2113; 0 0 0 0 0; -9.1431 0.3773 1.0925 1.2497 0; 1.7368 3.7701 2.0333 0 0; 0 0 0.2704 0 0; 0 0 0 0 0; 0 0 0 0 0; 0 0 0 0 2.0106; 0 0 0 0 1.6083; 0 0 0 0 0; 15.6191 0 0 0 0; 4.6148 -4.6148 0 0 0];

[0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 -6.3806 9.6128 -0.3964 -1.1406 -1.2999 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 -1.7546 1.9797 2.1227 0 0 0 0 0 0 0 0 0 0;19.2277 -5.7921 8.7981 0 0 0 0 0 0 0 0 -0.2823 0 0 0.0001 -0.0001 0.2772 0 0 0 0 0; -5.7921 5.7921 0 0 0 0 0 0 0 0 0 0 0 0 0.0001 0.0001 0 0 0 0 0 0; 0 0 -9.8589 14.1190 -4.2601 0 0 0 0 0 0 0 0 0 0 0 -3.8489 5.5507 -1.8509 0 0 0;0 0 0 -4.2733 8.3982 0 0 0 0 0 0 0 0 -1.9742 0 0 0 -1.8521 3.7749 0 0 0;0 0 0 0 0 5.4531 -2.2418 0 0 0 0 0 0 -1.5792 0 0 0 0 0 3.9574 -2.4896 0;0 0 -2.8194 0 0 0 -2.2033 5.0226 0 0 0 0 0 0 0 0 -1.4438 0 0 0 -1.1293 2.3159;0 0 0 0 0 0 0 0 -14.5614 30.3178 -4.8487 -5.2085 5.2515 0 0 0 0 0 0 0 0 0;0 0 0 0 0 0 0 0 0 -4.3361 9.4306 -4.7266 0 0 0 0 0 0 0 0 0 0;0.0001 0.0001 0.2706 0 0 0 0 0 0 0 0 -4.8415 0 0 19.3948 5.6308 -9.0119 0 0 0 0 0; -0.0001 0.0001 0 0 0 0 0 0 0 0 0 0 0 0 -5.8395 6.0483 0 0 0 0 0 0;0 0 3.7577 -5.5958 1.8382 0 0 0 0 0 0 0 0 0 0 0 -10.0985 14.3511 -4.2895 0 0 0;0 0 0 1.8072 -3.8177 0 0 0 0 0 0 0 0 -4.0502 0 0 0 -4.3794 8.4210 0 0 0; 0 0 0 0 0 -4.0828 2.4745 0 0 0 0 0 0 -3.1532 0 0 0 0 0 5.4166 -2.2555 0;0 0 1.4096 0 0 0 1.1224 -2.5320 0 0 0 0 0 0 0 0 -2.8879 0 0 0 -2.2168 5.1031;0 0 0 0 0 0 0 0 6.5953 -3.5294 0 0 0 0 0 0 0 0 0 0 0 0; 0

[-4.6609 0 4.6609 0 0; 0 0 4.6375 0 0; 0 0 1.7283 0 0; -4.8100 0 0 4.8100 0; 0 0 -20.0455 20.0455 0; 0 0 0 4.1433 -4.1433; 0 0 0 0 6.2300; 0 0 0 0 0; 0 0 0 0 -4.1249; 0 0 0 0 0;3.4936 0 0 0 0; -0.3773 0.3773 0 0 0; 2.0878 0 -2.0878 0 0; 0 0 0.2704 0 0; 0 0 0.1542 0 0; 1.9793 0 0 -1.9793 0; 0 0 5.7228 -5.7228 0; 0 0 0 0.4591 -0.4591; 0 0 0 0 3.0309; 0 0 0 0 0; 0 0 0 0 2.0106; 0 0 0 0 0];

[0 0 0 0 0 0 0 0 0 -2.1092 0 1.0503 0 0 0 0 0 0 0 0 0 0;-4.6375 0 0 0 0 0 0 0 0 0 0 0 0.2823 0 0 0.2726 0 0 0 0 0 0 0; 0 0 -1.7283 0 0 0 0 0 0 0 0 0.1610 0 0 0 0 0.1579 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 -1.9996 0 0 1.2123 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 0 0 5.9746 7.2017 0 0 0 0 0 0 0 0 0;0 0 0 0 0 0 0 0 0 0 0 0 0.4775 0.4508 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 6.2300 0 0 0 0 0 0 3.3361 0 0 0 0 0 -3.0495 0; 8.7981 0 -8.7981 0 0 0 0 0 0 0 0 0 0 0 0.2728 0 0.2772 0 0 0 0 0;0 0 0 0 4.1249 0 0 0 0 0 0 0 0 1.9742 0 0 0 0 1.8587 0 0 0;0 0 0 0 0 2.2418 2.2418 0 0 0 0 0 0 0 0 0 0 0 0 2.5099 -2.4896 0;0 0 0 0 0 0 0 0 15.1638 -15.7791 0 0 0 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 4.7613 -4.8487 0 0 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 -4.7087 0 4.9019 0 0 0 0 0 0 0 0 0 0; -0.2704 0 0 0 0 0 0 0 0 0 0 4.5253 0 0 4.6755 0 0 0 0 0 0 0; 0 0 -0.1542 0 0 0 0 0 0 0 0 1.7503 0 0 0 0 -1.7703 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 -4.8592 0 0 4.9509 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 0 0 -20.9276 20.6424 0 0 0 0 0 0 0 0 0; 0 0 0 0 0 0 0 0 0 0 0 0 3.8768 -4.0683 0 0 0 0 0 0 6.2683 0;0 0 0 0 0 0 3.0309 0 0 0 0 0 0 6.3133 0 0 0 0 0 0 0 0; 0.2706 0 -0.2706 0 0 0 0 0 0 0 0 0 0 0 9.1621 0 -9.0119 0 0 0 0 0;0 0 0 0 -2.0106 0 0 0 0 0 0 0 0 -4.0502 0 0 0 0 3.9784 0 0 0;0 0 0 0 0 2.4745 2.4745 0 0 0 0 0 0 0 0 0 0 0 0 2.2679 2.2555 0];


[0.1; -2.5; 3; 1; 1.5];

[17.8938; -4.8379; -71.5305; 29.7217; -29.7013; 168.4239; 40.4534; -0.0244; -66.8719; -5.5095; 13.2331; 89.5476; -101.3154; -224.7588; 73.0451; -0.0801; -77.6946; 55.6844; 37.5025];

9 REFERENCES

[1] Leirbukt, A.; Breidablik, O.; Gjerde, J.O.; Korba, P.; Uhlen, K.; Vormedal, L.K., "Deployment of a SCADA integrated wide area monitoring system," Transmission and Distribution Conference and Exposition: Latin America, 2008 IEEE/PES, pp. 1-6, Aug. 2008.
[2] Hong Li; Weiguo Li, "A new method of power system state estimation based on wide-area measurement system," Industrial Electronics and Applications, 2009, ICIEA 2009, 4th IEEE Conference, pp. 2065-2069, 25-27 May 2009.
[3] A. Monticelli, "Electric Power System State Estimation," Proceedings of the IEEE, Vol. 88, No. 2, pp. 262-282, Feb. 2000.
[4] L. Zhao, A. Abur, "Multi Area State Estimation Using Synchronized Phasor Measurements," IEEE Transactions on Power Systems, Vol. 20, No. 2, pp. 611-617, May 2005.
[5] XiaoYun Chen; DongMei Zhao; Xu Zhang, "A Novel Voltage Stability Prediction Index Based On Wide Area Measurement," Power and Energy Engineering Conference (APPEEC), 2010 Asia-Pacific, pp. 1-4, 28-31 March 2010.
[6] Luitel, B.; Venayagamoorthy, G.K.; Johnson, C.E., "Enhanced wide area monitoring systems," Innovative Smart Grid Technologies, pp. 1-7, Jan. 2010.
[7] Seong Soo Kim; Reddy, A.L.N., "Statistical Techniques for Detecting Traffic Anomalies Through Packet Header Data," IEEE/ACM Transactions on Networking, Vol. 16, No. 3, pp. 562-575, June 2008.
[8] L.L. Freris, A.M. Sasson, "Investigation of the Load-Flow Problem," Proceedings of IEE, Vol. 115, No. 10, pp. 1459-1470, 1968.
[9] Meikang Qiu; Wenzhong Gao; Min Chen; Jian-Wei Niu; Lei Zhang, "Energy Efficient Security Algorithm for Power Grid Wide Area Monitoring System," IEEE Transactions on Smart Grid, Vol. 2, No. 4, pp. 715-723, Dec. 2011.
[10] Denning, D.E., "An Intrusion-Detection Model," IEEE Transactions on Software Engineering, Vol. SE-13, No. 2, pp. 222-232, Feb. 1987.
[11] A. Abur and A. G. Expósito, "Power System State Estimation: Theory and Implementation," Boca Raton, FL: CRC, 2004.
[12] S. Mallat, "A Wavelet Tour of Signal Processing," Academic Press, 1998.
[13] C. T. Huang, S. Thareja, and Y. J. Shin, "Wavelet based real time detection of network traffic anomalies," Securecomm and Workshops, 2006, pp. 1-7, 2006.
[14] J. Gao, G. Hu, X. Yao, and R. K. C. Chang, "Anomaly detection of network traffic based on wavelet packet," Proceedings of the Asia-Pacific Conference on Communications (APCC '06), pp. 1-5, Busan, Korea, August 2006.
[15] Seong Soo Kim, A. L. Narasimha Reddy, Marina Vannucci, "Detecting traffic anomalies using discrete wavelet transforms," Proceedings of the International Conference on Information Networking (ICOIN), Busan, Korea.
[16] Kosut, O.; Liyan Jia; Thomas, R.J.; Lang Tong, "Malicious Data Attacks on Smart Grid State Estimation: Attack Strategies and Countermeasures," Smart Grid Communications (SmartGridComm), 2010 First IEEE International Conference on, pp. 220-225, 4-6 Oct. 2010.
[17] A. Monticelli, F. F. Wu, and M. Yen, "Multiple bad data identification for state estimation by combinatorial optimization," IEEE Transactions on Power Delivery, Vol. 1, No. 3, pp. 361-369, July 1986.
[18] Y. Liu, P. Ning, and M. K. Reiter, "False Data Injection Attacks against State Estimation in Electric Power Grids," Proc. of the 16th ACM Conference on Computer and Communications Security, Nov. 2009.


CC-Case as an Integrated Method of Security Analysis and Assurance over Life-cycle Process

Tomoko Kaneko 1, Shuichiro Yamamoto 2, Hidehiko Tanaka 3
1 NTTDATA Corporation, 3-3-9 Toyosu, Koto-ku, Tokyo 135-8671, Japan
2 Nagoya University, Furo-cho, Chikusa-ku, Nagoya, Aichi 464-8601, Japan
3 Institute of Information Security, 2-14-1 Tsuruya-cho, Kanagawa-ku, Yokohama, Kanagawa 221-0835, Japan
1 [email protected], 2 [email protected], 3 [email protected]

ABSTRACT
Secure system design faces many risks such as information leakage and denial of service. We propose a method named CC-Case to describe security assurance cases based on security structures and threat analysis. CC-Case uses the Common Criteria (ISO/IEC15408). While the scope of CC-Case mainly focuses on the requirement stage, CC-Case can handle the life-cycle process of system design, which contains the requirement, design, implementation, test and maintenance stages. It also makes it easy to take countermeasures against situations in which invisible attackers incessantly produce unexpected new threats.

KEYWORDS
Secure System Design Methodology, Security Requirement, Security Assurance, Assurance Case, Common Criteria, GSN, Risk Management, Life-cycle Process, ISO/IEC15026, ISO/IEC15408

1 INTRODUCTION
Customers expect that IT products and systems satisfy the conditions necessary to avoid falling into dangerous situations. Developers must build IT products and systems while avoiding many risks, so risk management is important to support development. We face many risks at every stage of system development, such as the requirement analysis, design, development, test and service provision stages. It is really important to assure the countermeasures against risks in each process of system development. Handling security risks is especially important because security accidents have serious consequences. However, there are no established methods to assure the validity of security risks up to this time. In this paper, we propose a security assurance case method against risks. This assurance case makes clear the elements needed to assure against system risks, and the process to argue with a customer and reach an agreement. It gives a way to make systems trustworthy through effective arguments with stakeholders. In Chapter 2, we explain assurance cases, security assurance cases and risk management. In Chapter 3, we show the significance of assurance cases, current system risks and their handling by assurance cases, and the difficulty of treating security risks. In Chapter 4, we show the concept of the proposed method and its provision over the life-cycle process. In Chapter 5, we explain an example of the concrete model and the merits of CC-Case. In Chapter 6, we give a summary and future tasks.

2 RELATED STUDIES

2.1 Assurance Case
An assurance case, which is defined in ISO/IEC15026 part 2, is a method for describing a system's critical security level. Standards are proposed by ISO/IEC15026 [1], OMG's Argument Metamodel (ARM) [2] and the Software


Assurance Evidence Metamodel (SAEM) [3]. ISO/IEC 15026 specifies scopes, adaptability, application, assurance case structure and contents, and deliverables. The minimum requirements for an assurance case's structure and contents are: to describe claims about IT product and system properties, systematic argumentations of the claims, and the evidence and explicit assumptions of the argumentations; and to structurally associate evidence and assumptions with the highest-level claims by introducing supplementary claims in the middle of a discussion. One common notation is the Goal Structuring Notation (GSN) [4], which has been widely used in Europe for about ten years to verify system risk and validity after identifying risk requirements. The contents of GSN are shown below.

Table 1. Contents of GSN

The minimum requirements for an assurance case's structure and contents are: to describe claims of system and product properties, systematic argumentations of the claims, and the evidence and explicit assumptions of the argumentations; and to structurally associate evidence and assumptions with the highest-level claims by introducing supplementary claims in the middle of a discussion.

2.2 Security Assurance Case
Goodenough, Lipson and others proposed a method to create security assurance cases [5]. They described that the Common Criteria provides catalogs of standard Security Functional Requirements and Security Assurance Requirements. They decomposed the security assurance case by focusing on the process, such as requirements, design, coding, and operation. The approach did not use the Security Target structure of the CC to describe the security assurance case. Alexander, Hawkins and Kelly overviewed the state of the art on security assurance cases [6]. They showed the practical aspects and benefits of describing a security assurance case in relation to security target documents. However, they did not provide any patterns to describe a security assurance case using CC. Kaneko, Yamamoto and Tanaka recently proposed a security countermeasure decision method using assurance cases and CC [7-8]. Their method is based on a goal-oriented security requirements analysis [9]. Although the method showed a way to describe a security assurance case, it did not provide security assurance case graphical notations or the seamless relationship between security structure and security functional requirements.

2.3 Common Criteria

Common Criteria (CC: equivalent to ISO/IEC15408) [10] specifies a framework for evaluating the reliability of the security assurance level defined by a system developer. In Japan, the Japan Information Technology Security Evaluation and Certification Scheme (JISEC) is implemented to evaluate and authenticate IT products (software and hardware) and information systems. In addition, based on the CC Recognition Arrangement (CCRA), which recognizes certifications granted by other countries' evaluation and authorization schemes, CC-accredited IT products are recognized and distributed internationally. As an international standard, CC is used to evaluate the reliability of security requirements of functions built using IT components (including security functions). CC establishes a precise model of the Target of Evaluation (TOE) and its operational environment. Based on the security concept and the relationship of assets, threats, and objectives, CC defines the ST (Security Target) as a framework for evaluating the TOE's Security Functional Requirements (SFR) and Security

Assurance Requirements (SAR). The ST is a document that accurately and properly defines the security functions implemented in the target system and prescribes the targets of security assurance. The ST is required for security evaluation and shows the levels of adequacy of the TOE's security functions and security assurance.

2.4 Risk Management
The goal of risk management is to increase the impact and probability of positive risks and to decrease them for negative risks. The point is not only avoiding failure, but also bringing about opportunities. Time and energy can be spent avoiding, transferring to a third party, and mitigating potential failures. They can similarly be spent on accepting, sharing with third parties and enhancing opportunities. It is the task of risk management to determine how much time and energy should be spent on avoiding failures and promoting opportunities. Risk management includes six main processes in the theory of PMBOK [11]: risk management planning, risk identification, qualitative risk analysis, quantitative risk analysis, risk response planning, and risk monitoring and control.

3 ASSURANCE CASE FOR CURRENT SYSTEM RISK
3.1 Significance of Assurance Case
The assurance case has mainly been applied to the safety field [12]. The largest benefit of using an assurance case is that stakeholders can argue sufficiently about requirements and reach an agreement, and that the process of reasoning and argument that led to the agreed result or conclusion can be recorded. By describing the following four points, an assurance case offers a framework for building an argument in more depth.
•Claim
•Argumentation
•Evidence

•Explicit Assumption
We show the significance of the assurance case. An assurance case makes it easy to confirm requirements through structured documentation. The requirements of the systems or services that are the target of evaluation are verified and confirmed by the assurance case. In addition, an assurance case clarifies, through evidence, the basis on which a goal can be achieved. You can confirm the basis for a judgment when a problem occurs, and if the basis for the judgment was validated by the customer, you can use the evidence as a legal basis. The true difficulty of risk management in developing software comes from the invisibility of software. Developers must build software systems while avoiding many risks. Although software is invisible, developers must show that the software system works correctly as the customer needs. It is important that the customer's needs are recorded correctly as an agreement. An assurance case records evidence verified with stakeholders. Therefore, an assurance case is useful as a consensus-building tool for risk management.

3.2 Current System Risks and Handling by Assurance Case
Risks in system development are categorized into three types: customer agreement risk, business continuity risk, and system risk (Table 2). Customer agreement risk carries the risk of lawsuits. Business continuity risk carries the risk of communication. System risk carries the explanation risk of validation in development activity. However, in conventional development, only system risks are considered and the others are not. A brand suffers a big economic loss when the three types of risks are combined. Therefore appropriate treatment of negative risks (avoid, transfer, mitigate) is important. The assurance case for risk management is a method to avoid current risks in system development.


Table 2. Risk treatment by assurance case

3.3 Difficulty of Treatment for Security Risks
It is especially important to handle security risks. If any security accident happened, the brand would suffer a big economic and reputational loss. It is also difficult to treat security risks. The customer agreement risk of security in Table 2 corresponds to the risk of lawsuits when security accidents occur. To treat this risk, it is necessary to have evidence showing authentication of the contractual customer's agreements. The business continuity risk of security is mainly caused by service damage by attackers. To counter this risk, it is necessary to have evidence showing authentication of risk monitoring and control. The system risk of security is the risk which occurs within the development activities of IT products or the security functions of systems. To counter this risk, it is necessary to have evidence showing authentication of risk identification, analysis and counter planning. In this paper, methods to handle the business continuity risk and system risk of security are the focus. It is important to show objective evidence that the customer recognizes that his request, such as "The system is acceptably secure," is satisfied.

4 CC-CASE
4.1 Concept of CC-Case
We propose a methodology of security analysis and assurance, named CC-Case, using assurance cases (ISO/IEC15026) and CC (Common Criteria: ISO/IEC15408). The purpose of CC-Case is to solve several problems which we face in the development of secure systems that must handle increasingly sophisticated threats. CC-Case provides not only a security requirement analysis method but also assurance according to the CC standard. CC-Case contains a process which can clarify the scope of assurance for threats. It also contains a process which can verify the security specification based on CC using an assurance case, and obtain the customer's agreement on the assurance. The procedures of CC-Case have a dual layer. The upper layer is named the logical model; the lower layer is named the concrete model. The logical model and concrete model are shown in Figure 1. The logical model shows the process structure, developed in as much detail as possible independently of a specific system. The logical model has the life-cycle process and each stage's process. The concrete model contains real cases corresponding to the specific system. The concrete model is decomposed logically until it describes evidence at the bottom layer. It builds up evidence as real cases and the approval results of customers. This evidence, recorded in sequence, can be used for verification. Customer requirements may change frequently, so it is necessary to keep evidence that reflects the changes. CC-Case supports changes by storing all evidence in a database. The targets of CC-Case are IT products or systems. Although CC-Case is a method to make agreements between customers and developers, if there is no fixed customer, the customer is replaced by the person concerned with deciding requirements.


4.2 Life-Cycle Support of CC-Case

Figure 1. Logical model and concrete model

We show the support of the life-cycle process by CC-Case. The life-cycle process of CC-Case contains the whole process of the requirement, design, implementation, test and maintenance stages. However, in this research, we focus on the process of the requirement stage. The life-cycle process of CC-Case should handle all security risks, including the business continuity risk of security. Figure 3 shows the life-cycle process of CC-Case. CC-Case uses GSN, which is one common notation for assurance cases. Using this assurance case, we explain the concept of the life-cycle process of CC-Case. In this case, the top goal of the assurance case is "IT products and systems using CC-Case are secure." The explicit assumption of the argumentation is "CC". The strategy is to verify the process of life-cycle development.

Figure 3. The life-cycle process of CC-Case



The strategy can be divided into four second-level goals: "Requirements using CC-Case is secure.", "Designs using CC-Case is secure.", "Implementations using CC-Case is secure," and "Tests and deliveries using CC-Case is secure". Each of these goals needs evidence that can verify it. The second goal, "Requirements using CC-Case is secure.", can be divided into two third-level goals, "Security specification using CC-Case is secure." and "Definition of development environment using CC-Case is secure.", through the strategy "Verify security functions of requirements." The third goal, "Security specification using CC-Case is secure.", is equivalent to the top goal of CC-Case at the requirement stage.

4.3 The Requirement Stage of CC-Case

(1) Assurance case of security specification
In the requirement stage, the procedures to make the security specification are defined, and the documents which are necessary for the ST (Security Target) are made. These procedures are defined as an assurance case and produce evidence which gives grounds for conformity with CC and agreement with customers. The assurance case of the security specification can be classified into the stages of defining the security concept, making countermeasures, and making the security specification. Each stage's logical model is shown. Also, the purpose, player, confirmation method for the output, and process (input, procedure, output) are clarified for every goal at the bottom layer of each stage. Figure 4 shows the relationship between the procedures to make the secure specification, their inputs, and the evidence which gives grounds.

Figure 4. Whole model of the requirement stage of CC-Case



Figure 5. Assurance case of Security Specification

Figure 6. Defining security concept stage



Figure 7. Stage of making security countermeasures

Figure 8. Stage of security specification


Figure 5 is the top assurance case of the requirement stage of CC-Case. By branching its sub-goals, more detailed tasks are decided. The leftmost sub-goal in Figure 5 is the top goal of Figure 6, the center sub-goal in Figure 5 is the top goal of Figure 7, and the rightmost sub-goal in Figure 5 is the top goal of Figure 8. The logical model consists of the procedures of the tasks defined in Figure 5, Figure 6, Figure 7, and Figure 8; Figure 4 shows them in detail. The concrete model shows the evidence of the ST contents and the logical relationship of the real cases to the ST contents under all the bottom goals. The evidence in Figure 4 is equivalent to the evidence of the ST contents of the concrete model. An applied case of the concrete model is shown in Figure 9. The method of the assurance case is applied to these procedures; therefore we insist that the secure specification can be made by keeping to the procedure. Upper goals need to satisfy all lower goals, and at the same time these goals are sufficient for all the requirements. These are the unique points which differ from previous methods like simple flow charts of processes, and they show this method's integrity. The more detailed processes of the 3 sub-goals in Figure 5 are shown below.

(2) Defining security concept stage
The assurance case of the defining security concept stage is shown in Figure 6. In this stage security requirements are extracted; taking into account the needs of customers, market trends and security requirements, the security concept is defined. In this stage the process makes not the ST but the security concept. The security concept is examined repeatedly many times and decided between customers and developers. In this stage, security requirements for products from the viewpoint of users are collected and arranged. After sufficient extraction of security requirements, the goal "the target of evaluation is clear by setting the scope of analysis" and the goal "Security requirements or issues are clear" are verified.

Then the defined security concept is validated with the agreement of customers. CC-Case considers this validity confirmation as assurance.

(3) The stage of making security countermeasures
In this stage (Figure 7), with the security concept as an assumption, CC-Case defines evaluation criteria and analyzes threats. It drafts, evaluates, and selects countermeasures. Then all the processes are verified to be secure. There are 3 steps in this stage. In each step, the analysis and rationale of the relationships between security requirements and the relationships of logic are shown to customers. Each step defines a validation goal that obtains the agreement of customers.
Step 1: At first the scope of the target of evaluation (TOE) is defined. The EAL is selected as the level of assurance. The definiteness of the evaluation criteria is confirmed. It is necessary to get the approval of a customer.
Step 2: Define the threat models of the information assets which are the objects of protection and perform an analysis of the threats to the assets.
Step 3: Select the countermeasures to carry out from the countermeasure plan. Countermeasures which are not adopted are managed as remaining risks. Next, proof that proper selections were made is confirmed. It is necessary to get the approval of a customer for the selection of security measures.
These steps of making countermeasures define the systematic procedure:
*1 Evaluation of criteria is validated.
*2 Evaluation of countermeasures is validated.
*3 Selection of countermeasures is validated.
These procedures intend that security measures are agreed upon with customers and that evidence is kept. There are seven argument patterns of resolution and applied patterns [12] that Bloomfield showed for making assurance cases. For the stage of making security countermeasures mentioned above, we used the comparison pattern of the substitute plan, which was one of the applied patterns [12], as a reference model.


The stage of making security countermeasures is equivalent to the process of making the ST after having examined the threat analysis and evaluation criteria. CC-Case does not just use the ST items as a merely constituted process; we consider the following 5 points.
*1 Application of the type pattern of the assurance case to secure the validity of the evaluation of measures.
*2 A validity confirmation process is set in 3 steps to correspond to the customer agreement risk.
*3 Making ST items to guarantee CC conformity.
*4 Selecting enforcement measures.
*5 Taking into account remaining risks.

(4) TOE summary specification stage
In this stage (Figure 8), the extended component definition, security functional requirements, security assurance requirements and summary specification are verified to be secure. It is necessary to get customer agreement. The security functional requirements are made by selecting functional requirements in CC part 2 in order to establish the security objectives for the TOE as technical countermeasures. The security assurance requirements are made by reference to CC part 3. When it is difficult to make the security functional requirements and the security assurance requirements using only CC, extended components are defined, and the extended functional requirements and assurance requirements are used. The summary specification shows the method to implement the security functions in the real system. Customer agreement is validated for this security specification.

4.4 Logical model detailed
The bottom goals of Figures 6-8 correspond to the top goals of the concrete model. Since the bottom goals are important, we examine their meanings closely.

We define the purpose, player, confirmation method for the output, input, procedure, and output for each bottom goal. For example, we show the process for the bottom goal "Threat analysis is secure.", which is the fifth sub-goal from the left in the stage of making security countermeasures.
"Threat analysis is secure."
Purpose: We define the characteristics that the TOE is going to deal with using a formal description technique, and the range of security.
Player: developer
Confirmation method for the output: verification
Input: assets, security functions
Procedure: A threat consists of an adverse action performed by a threat agent on an asset. Therefore the developer analyzes a threat with an asset and a security function and extracts the expected threats. Threat agents may be described as individual entities, but in some cases it may be better to describe them as types of entities, groups of entities, etc. Examples of threat agents are hackers, users, computer processes, and accidents. A threat is expressed by a name with the prefix "T.". Example: T.ACCESS: An unauthorized user carries out access and operations on resources.
Output: a result of threat analysis

5 CONSIDERATION
5.1 Example of Concrete Model
Figure 9 and Figure 10 show an example of the concrete model. CC-Case describes a real case by the concrete model. As an example of a real case, we show the ST of IPA [13] using CC-Case. As a result, we confirm that we can write the whole example as an assurance case.
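For illustration only, the bottom-goal template above (purpose, player, confirmation method, input, procedure, output) and the "T."-prefixed threat entries used in Figure 9 could be recorded as structured evidence along the following lines; the class and field names are hypothetical and are not part of CC or of the authors' tooling:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Threat:
    name: str            # e.g. "T.ACCESS"
    agent: str           # hacker, user, computer process, accident, ...
    asset: str
    adverse_action: str

@dataclass
class BottomGoalProcess:
    goal: str                     # e.g. "Threat analysis is secure."
    purpose: str
    player: str                   # e.g. "developer"
    confirmation_method: str      # e.g. "verification"
    inputs: List[str]
    procedure: str
    outputs: List[Threat] = field(default_factory=list)

threat_analysis = BottomGoalProcess(
    goal="Threat analysis is secure.",
    purpose="Define the characteristics the TOE deals with and the range of security.",
    player="developer",
    confirmation_method="verification",
    inputs=["assets", "security functions"],
    procedure="Analyze each asset against the security functions and extract the expected threats.",
)
threat_analysis.outputs.append(
    Threat("T.ACCESS", agent="unauthorized user", asset="resources",
           adverse_action="carries out access and operations on resources"))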


Though security cases have commonality, the process of making them is not clear. Therefore, there is the problem that individually produced security cases are of uneven quality and low efficiency. In contrast, CC-Case clarifies the commonality as a logical model, and a security case can be made naturally by adding the concrete model. It also has the good point of being able to leave the result as evidence. Figure 9 is the concrete model for the example "the threat analysis is secure" in the stage of making security countermeasures of Figure 7. In this case, user data, TSF data and backup data are equivalent to the assets that need to be protected. This verifies that the proper extraction of the threats is made and described. In this case the threats extracted are illegal logon, unauthorized access, misuse, injustice, spoofing, disclosure of network data, removable medium, and unexpected accident.

Knowledge assets are made from the know-how obtained using CC-Case. The extraction of threats such as the illegal logon becomes easy when we use the pattern of the threat catalogued in the knowledge assets. Each verification result is shown as evidence of the threat analysis, such as T.ILLEGAL_LOGON and T.UNAUTHORIZED_ACCESS. Figure 9 is equivalent to the example of the attached document of A.6.2 "threat" in CC Part 1. It includes the verification means of the evidence in conformity with the specifications of the ST. The contents of the evidence of "T.ILLEGAL_LOGON" in the leftmost part of Figure 9 are described as "An attacker may destroy, manipulate, and disclose user data by pretending to be a fair user of the TOE." Figure 10 is the concrete model for the example "The security assurance requirements are secure." in the stage of security specification of Figure 8. Figure 11 shows the contents of the evidence, namely "6.2 Security assurance requirements".

Figure 9. Example of threat analysis



Figure 10. Example of security assurance requirements

6.2 TOE Security Assurance Requirements
The security assurance requirements of the TOE are shown below. This TOE's Evaluation Assurance Level is EAL3. All security assurance requirements directly use the security assurance components defined in CC part 3.
(1) Development (ADV)
ADV_ARC.1: security architecture description
ADV_FSP.3: function specification with complete summary
ADV_TDS.2: architecture design
(2) Guidance documents (AGD)
AGD_OPE.1: user operation guidance
AGD_PRE.1: preparation procedure
(3) Life-cycle support (ALC)
ALC_CMC.3: management of permission
ALC_CMS.3: CM scope of implementation representation
ALC_DEL.1: delivery procedure
ALC_DVS.1: identification of security method
ALC_LCD.1: definition of life-cycle model by developer
(4) Security Target Evaluation (ASE)
ASE_CCL.1: conformance claim
ASE_ECD.1: extended component definition
ASE_INT.1: ST introduction
ASE_OBJ.2: security objectives
ASE_REQ.2: derived security requirements
ASE_SPD.1: security problem definition
ASE_TSS.1: TOE summary specifications
(5) Test (ATE)
ATE_COV.2: coverage analysis
ATE_DPT.1: test: basic design
ATE_FUN.1: function test
ATE_IND.2: independent test - sample
(6) Vulnerability appraisal (AVA)
AVA_VAN.2: vulnerability analysis

Figure 11. Evidence of security assurance requirements



5.2 The Merits of CC-Case
CC-Case has many merits for solving several problems which we face in the development of secure systems. In this paper, we show the merits focusing on its life-cycle process. The life-cycle process of CC-Case can be expected to establish discipline and control in the processes of refinement of IT products and systems during development and maintenance. It strengthens the handling of system risk and business continuity risk through life-cycle support. The merits of handling system risk are mentioned below. CC-Case makes it possible to improve the development method. The life-cycle process using CC-Case establishes rules and control in the development and assurance of the target system and product by implementing its requirements in the system and product correctly. For example, at the design stage CC-Case makes it easier to accept specification changes by using its logical traceability and evidence. Improvements in development method, reuse and productivity, which are known problems of the assurance case, can be expected. By defining development processes at each stage, CC-Case would be improved into a development method with assurance over the life-cycle. It can keep assurance based on CC throughout the life-cycle process. If its scope were only the requirement stage, it would provide assurance only at that time, as a mere expectation. If its scope is the whole life-cycle process, it is possible to assure the real products over a long span. In other words, CC-Case with life-cycle support has a different quality of assurance from CC-Case limited to the requirement stage. The merits of handling business continuity risk are mentioned below. Security risks change incessantly because invisible attackers exist and unexpected new threats occur. The life-cycle support of requirement, design, test, and maintenance makes it easy to take countermeasures against situations in which an unexpected new threat is produced.

The life-cycle process of CC-Case can handle the business continuity risk by its monitoring and control process, which makes it easier to handle changes of risks. If you make a system or a product with CC-Case, it is easy to cope with modification after a security accident, because CC-Case has evidence of the argumentations that clarifies the modification point for the essential cause of the security accident. At the maintenance stage, CC-Case can be expected to improve reusability and productivity by reusing evidence stored according to the defined process.

6 CONCLUSION
6.1 Summary
We proposed a methodology of security analysis and assurance, named CC-Case, using assurance cases (ISO/IEC15026) and CC (Common Criteria: ISO/IEC15408), and explained its general concept. CC-Case has a dual layer: the upper layer is named the logical model and the lower layer is named the concrete model. The logical model shows the process structure, developed in as much detail as possible independently of a specific system; it has the life-cycle process and each stage's process. The concrete model contains real cases corresponding to the specific system, and builds up evidence as real cases and the approval results of customers. We showed the detailed explanation of the logical model, an example of the concrete model, and the merits of the life-cycle process of CC-Case. CC-Case strengthens the handling of system risk and business continuity risk through life-cycle support.

6.2 Future Tasks
There are some unsolved issues in the CC-Case presented in this paper.


(1) We need to show the detailed processes of the life-cycle stages other than the requirement stage.
(2) A detailed selection process for remaining risks is important as a future task, because it implies specific assurance. In addition, describing in detail the measures applied when unknown threats actually occur is important as a future task. We need to define the remaining threats and the reaction procedures for events caused by the remaining risks.

7 REFERENCES
1. ISO/IEC 15026-2:2011, Systems and software engineering - Part 2: Assurance case
2. OMG, ARM, http://www.omg.org/spec/ARM/1.0/Beta1/
3. OMG, SAEM, http://www.omg.org/spec/SAEM/1.0/Beta1/
4. T. Kelly and R. Weaver, "The Goal Structuring Notation - A Safety Argument Notation," Proceedings of the Dependable Systems and Networks 2004 Workshop on Assurance Cases, July 2004
5. J. Goodenough, H. Lipson, and C. Weinstock, "Arguing Security - Creating Security Assurance Cases," https://buildsecurityin.us-cert.gov/bsi/articles/knowledge/assurance/643-BSI.html, 2007
6. R. Alexander, R. Hawkins, T. Kelly, "Security Assurance Cases: Motivation and the State of the Art," CESG/TR/2011, 2011
7. T. Kaneko, S. Yamamoto, H. Tanaka, "Proposal on Countermeasure Decision Method Using Assurance Case And Common Criteria," ProMAC 2012, 2012
8. T. Kaneko, S. Yamamoto, H. Tanaka, "An Integrated Method of Security Analysis and Assurance using Common Criteria-based Assurance Case Using life-cycle-model," ComSec2014, 2014
9. S. Yamamoto, T. Kaneko, and H. Tanaka, "A Proposal on Security Case based on Common Criteria," Asia ARES 2013, 2013
10. Common Criteria for Information Technology Security Evaluation, http://www.commoncriteriaportal.org/cc/
11. PMBOK, http://www.projectmanagement.net.au/pmbok-risk-management
12. T. Kelly, J. A. McDermid, "Safety Case Construction and Reuse using Patterns," in Proceedings of the 16th International Conference on Computer Safety, Reliability and Security (SAFECOMP'97), Springer-Verlag, September 1997
13. IPA: Security Target of A company individual information processing system application, in Japanese, https://www.ipa.go.jp/security/jisec/index.html



An Adaptive Steganographic Method in Frequency Domain Based on Statistical Metrics of Image

Seyyed Amin Seyyedi 1, Nick Ivanov 2
1 Department of Computer, Maku Branch, I.A.U, Maku, Iran
2 Department of Electronic Computing Machines, Belarusian State University of Informatics and Radioelectronics, 6 Brovki St, 220013, Minsk, Belarus
[email protected], [email protected]

ABSTRACT
Steganography is a branch of information hiding. The tradeoff between the hiding payload and the quality of the stego-image is a major challenge for steganographic methods. An adaptive steganographic method for embedding a secret message into gray scale images is proposed. Before embedding the secret message, the cover image is transformed into the frequency domain by an integer wavelet. The middle frequency band of the cover image is partitioned into 4×4 non-overlapping blocks. Using deviation and entropy metrics, the blocks are classified into three categories: smooth, edge, and texture regions. The number of bits which can be embedded in a block is defined by the block features. Moreover, the RC4 encryption method is used to increase secrecy protection. Experimental results demonstrate the feasibility of the proposed method. Statistical tests were conducted to collect related data to verify the security of the method.

KEYWORDS Steganography, Wavelet, Steganalysis, Image Quality Metrics.

1 INTRODUCTION
Nowadays digital communication channels and the Internet play an important role in data transmission and sharing; hence there is a great need to provide information security to prevent unauthorized access. This leads to new trends in confidential data transmission research. One of the methods for increasing the privacy of data transmission is steganography. Steganography is a technique of hiding confidential data in any form of media in such a way that no one, except the

intended recipient, knows of the existence of the secret insertion [1, 2]. The main difference between steganography and cryptography is in the suspicion factor; combining cryptography with steganography ensures better private communication. Digital images, videos, audio and other digital files can be used as a carrier for information embedding. Steganographic methods can be classified into two broad categories, namely spatial-domain techniques and frequency-domain techniques. In spatial-domain techniques, the secret messages are embedded directly into the cover image; the simplest spatial-domain method is the LSB (Least Significant Bit) approach. In frequency-domain methods, the cover image is converted into frequency ranges and then the secret message is embedded into one of them. Frequency-domain methods, especially wavelet methods, are more secure than the others [3]. Steganalysis is the art and science of challenging the security of steganographic methods. The first problem in steganalysis is detecting the existence of a secret message in the carrier [4]. The ability of a steganalysis method depends on the payload of the hidden message. Hence, this fact imposes an upper bound for embedding data, such that if the size of the hidden data is less than this upper bound, one may assert that the carrier is safe and the known statistical analysis methods cannot detect it [4, 5]. Therefore, the tradeoff between the hiding payload of a cover image and the detectability and quality of the stego-image is the main problem in steganographic schemes. For this reason an adaptive steganographic method based on the integer wavelet transform is proposed to make the best tradeoff between payload and the other criteria.


After preprocessing the cover image, the middle frequency band is partitioned into 4×4 non-overlapping blocks. The amount of payload is determined based on the characteristics of each block. In order to achieve higher security and authentication, the RC4 encryption method with a 40-bit key is applied to the secret message in advance.

2 BACKGROUNDS
This section briefly explains some techniques utilized in this article.

2.1 Cover Image Adjustment
During the embedding process in the frequency domain, some coefficients will suffer underflow/overflow after embedding the secret message into them (in a gray scale image, underflow means the pixel value is smaller than 0 and overflow means that the pixel exceeds the maximum value 255). In this case, during the inverse wavelet transform the lower/higher values are clipped and the secret message bits will eventually be lost. To overcome the underflow/overflow difficulty, a preprocessing step needs to be applied to the cover image before embedding. Hence, the cover image pixels C(i,j) are adjusted as follows [6, 7]:

C'(i,j) = C(i,j) - N/2  if C(i,j) > 255 - N/2,
C'(i,j) = C(i,j) + N/2  if C(i,j) < N/2,
C'(i,j) = C(i,j)        otherwise,        (1)

where C'(i,j) denotes the adjusted pixel at spatial coordinates i, j, and N is the argument used to modify the histogram of the image. The value of N is set to 30.
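A minimal sketch of the adjustment in formula (1), assuming an 8-bit grayscale cover image held in a NumPy array:

import numpy as np

def adjust_cover(image: np.ndarray, N: int = 30) -> np.ndarray:
    img = image.astype(np.int16).copy()
    img[img > 255 - N // 2] -= N // 2    # pull bright pixels away from 255 (overflow margin)
    img[img < N // 2] += N // 2          # push dark pixels away from 0 (underflow margin)
    return img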

2.2 Integer Lifting Wavelet Transform
Multiresolution analysis is the main theory in wavelets for analyzing a signal in the frequency domain. A one-level 2D wavelet transform of an image decomposes it into four bands, namely LL, HL, LH, and HH. The LL band contains the low-pass coefficients and corresponds to a soft approximation of the image. The other three bands contain the high-pass coefficients of the image, which include the horizontal, vertical and diagonal features of the image respectively. The same decomposition can be repeated on the LL band. Basically, a digital image consists of integer samples. Unfortunately, wavelet filters return floating point values as wavelet coefficients. When one hides data in the coefficients, any truncation of the floating point values causes corruption of the hidden information. To overcome this difficulty one can apply the Integer Lifting Wavelet Transform (IntLWT) [8]. The lifting scheme is a technique both for designing wavelets and for performing the discrete wavelet transform. The lifting scheme decomposes the wavelet transform into three phases: split, predict and update. Figure 1 represents the generic scheme. An advantage of the lifting scheme is that it does not require temporary storage in the calculation steps, and the inverse transform has exactly the same complexity as the forward one. In this paper the biorthogonal Cohen-Daubechies-Feauveau (CDF 2.2) lifting scheme is chosen as a case study. The integer forward transform formulas of CDF 2.2 are as follows [9, 10]:

Splitting:  s_i = x_{2i},  d_i = x_{2i+1},                   (2)
Predict:    d_i = d_i - ⌊(s_i + s_{i+1})/2 + 1/2⌋,           (3)
Update:     s_i = s_i + ⌊(d_{i-1} + d_i)/4 + 1/2⌋,           (4)

where x denotes the original signal, and the inverse transform formulas are:

Inverse update:  s_i = s_i - ⌊(d_{i-1} + d_i)/4 + 1/2⌋,      (5)
Inverse predict: d_i = d_i + ⌊(s_i + s_{i+1})/2 + 1/2⌋,      (6)
Merging:         x_{2i} = s_i,  x_{2i+1} = d_i.              (7)

Figure 1 The lifting scheme
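The lifting steps (2)-(7) can be sketched directly in code. The one-dimensional illustration below is written from those formulas (not a library call); it assumes an even-length integer signal and replicates the neighbouring sample at the boundaries, which is an assumption not specified in the text:

import math

def cdf22_forward(x):
    s = list(x[0::2])                                      # split: even samples, formula (2)
    d = list(x[1::2])                                      # split: odd samples
    n = len(d)
    for i in range(n):
        s_next = s[i + 1] if i + 1 < n else s[i]           # replicate at the boundary
        d[i] -= math.floor((s[i] + s_next) / 2 + 0.5)      # predict, formula (3)
    for i in range(n):
        d_prev = d[i - 1] if i > 0 else d[i]
        s[i] += math.floor((d_prev + d[i]) / 4 + 0.5)      # update, formula (4)
    return s, d

def cdf22_inverse(s, d):
    s, d = list(s), list(d)
    n = len(d)
    for i in range(n):
        d_prev = d[i - 1] if i > 0 else d[i]
        s[i] -= math.floor((d_prev + d[i]) / 4 + 0.5)      # inverse update, formula (5)
    for i in range(n):
        s_next = s[i + 1] if i + 1 < n else s[i]
        d[i] += math.floor((s[i] + s_next) / 2 + 0.5)      # inverse predict, formula (6)
    x = [0] * (2 * n)
    x[0::2], x[1::2] = s, d                                # merge, formula (7)
    return x

signal = [10, 12, 11, 9, 200, 7, 8, 6]
assert cdf22_inverse(*cdf22_forward(signal)) == signal     # integer-to-integer and perfectly invertible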

2.3 Rounding Method
The rounding method is one of the ways of embedding secret message bits into the cover image. The pixel value is modified to the nearest integer whose last LSB bits are equal to the input bits. For example, assume that the data payload of the current pixel is found to be 3 bits, the current pixel is equal to 102 or (01100110)2 and the input bits are equal to (100)2. According to the rule described above, the pixel value is altered to 100 or (01100100)2. The mathematical representation of the rounding method is [11]:

y = x + A·[A ≤ B] - B·[B < A],        (8)
A = mod(m - x, 2^c),                  (9)
B = mod(x - m, 2^c),                  (10)

where y, x, m, and c denote the output value, the input value, the secret message bits and the payload, respectively, and [·] equals 1 when the enclosed condition holds and 0 otherwise. For extracting the data the receiver can use the following formula:

m = mod(y, 2^c).                      (11)

2.4 Pixel Mapping Method
The Pixel Mapping Method (PMM) is a method for embedding two or four bits of the secret message into the cover image. Data embedding is performed by mapping the secret message bits onto each pixel based on some features of the pixel. The state machine of the pixel mapping method for embedding two bits is shown in Figure 2. For example, assume that the secret message bits are equal to (11)2 and the current pixel is equal to 34 or (00100010)2. According to the rule described in Figure 2, the value of the pixel is changed into 35 or (00100011)2 [12].

2.5 Encryption
One of the approaches to satisfying the security of a steganographic system is cryptography. A symmetric encryption method is recommended for steganographic methods. Symmetric encryption is a method that uses the identical key to encrypt and decrypt a secret message. For secure transmission of confidential data between parties, each party must agree on a shared secret key. The security of the encrypted data depends on the secrecy of the key. If an attacker gains knowledge of the secret key, he can use the key to decrypt all the data. In this paper the symmetric encryption method RC4 with a 40-bit key is utilized to encrypt the secret message [6, 13].

3. THE PROPOSED METHOD
The payload of the LSB method can be greatly improved by increasing the number of embedded bits. However, the more LSBs are used for embedding, the more quality loss of the stego-image is obtained, because the pixels of an image cannot undergo equal amounts of change. The human eye is very sensitive to changes of the gray value of pixels in smooth regions. Proper locations for hiding the secret message in digital images are regions with high contrast, texture and high variations in gray levels (edges), because these regions are noisy and variations introduced there to hide the secret message are difficult to detect. An adaptive steganographic method based on the Integer Lifting Wavelet Transform (IntLWT) is proposed in this article.
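A minimal sketch of the rounding method of Section 2.3 (formulas (8)-(11)), reproducing the numeric example given there:

def round_embed(x: int, m: int, c: int) -> int:
    A = (m - x) % (2 ** c)          # distance upwards to the next value whose c LSBs equal m
    B = (x - m) % (2 ** c)          # distance downwards
    return x + A if A <= B else x - B

def round_extract(y: int, c: int) -> int:
    return y % (2 ** c)             # formula (11)

# Example from the text: pixel 102 = (01100110)b, payload c = 3, message (100)b = 4.
assert round_embed(102, 0b100, 3) == 100     # pixel becomes 100 = (01100100)b
assert round_extract(100, 3) == 0b100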


Figure 2 State machine for embedding two bits

Figure 3 Block diagram of proposed method

After preprocessing the cover image, the IntLWT is performed. The middle frequency band of the cover image is partitioned into 4×4 non-overlapping blocks. Each block is categorized into different regions according to statistical metrics. The secret message bits are embedded in blocks that contain edges and texture. The block diagram of the proposed method is shown in Figure 3.

3.1 Embedding Region
The cover images used in the proposed method are 256-level gray-valued ones. After preprocessing of the cover image, the IntLWT is applied to it. The middle frequency band is partitioned into 4×4 non-overlapping blocks. The Maximum Deviation (MD) and Entropy (En) are calculated for each block, respectively defined as:

X̄ = (1/16) Σ_{i=1}^{4} Σ_{j=1}^{4} w(i,j),        (12)

MD(k) = max{ |w(i,j) - X̄| : i, j = 1, 2, 3, 4 },        (13)

where w denotes the wavelet coefficients within a block of dimension 4×4 and k is the block number.

En(k) = - Σ_{i=1}^{16} P_i log2 P_i,                     (14)

where P_i is the probability of the wavelet coefficients in each 4×4 block and k is the block number. MD and En are vectors that comprise the maximum deviation and entropy of each block. The counter k corresponds to blocks and its length is equal to the number of blocks; for example, for a 512×512 cover image, the number of 4×4 blocks in a middle frequency band (HL or LH) is 4096. Each block of the cover image is classified as a smooth, edge or texture region. The blocks for which MD(k) is greater than the threshold T1 belong to non-smooth regions and the others belong to smooth areas.


The non-smooth regions for which En(k) is greater than the threshold T2 are specified as edge regions; the others belong to texture. The thresholds T1 and T2 are defined as:

T1 = α · mean(MD),        (15)
T2 = mean(En),            (16)

where α (0 < α ≤ 1) is a tradeoff factor that balances the payload and fidelity requirements.

3.2 Embedding Algorithm
The secret message embedding scheme, designed to obtain a reasonable tradeoff between hiding payload and stego-image quality, comprises the following steps:
Input: cover image C of size M × N and a secret message SE.
Output: stego-image S.
Step 1: Read the cover image C.
Step 2: Read the secret message SE and apply the RC4 encryption method to SE.
Step 3: Apply the cover image adjustment of formula (1) to image C.
Step 4: Perform a one-level IntLWT on the cover image.
Step 5: Divide the middle frequency band into 4×4 blocks.
Step 6: Calculate the MD and En values by formulas (13) and (14).
Step 7: Compute the threshold values T1 and T2 by formulas (15) and (16).
Step 8: Apply the coefficient replacement process for block k as:
IF MD(k) > T1
  IF En(k) > T2
    Embed 3 bits of the secret message into block k by the rounding method.
  Else
    Embed 2 bits of the secret message into block k by PMM.
  End
End
IF all bits of the secret message have been embedded
  Go to Step 9
Else
  k = k + 1, go to Step 8
End.
Step 9: Assemble the middle frequency band from the blocks.
Step 10: Perform the inverse wavelet transform to obtain the stego-image S.
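A sketch of the block-classification part of this algorithm (Steps 5-8), assuming the middle-frequency subband is available as a NumPy array. How the coefficient probabilities P_i of formula (14) are estimated is not fully specified in the text, so they are taken here as the empirical frequencies of the coefficient values inside each block; the actual coefficient replacement (rounding method / PMM) is omitted:

import numpy as np

def block_stats(band: np.ndarray):
    h, w = band.shape
    blocks = (band[:h - h % 4, :w - w % 4]
              .reshape(h // 4, 4, w // 4, 4).swapaxes(1, 2).reshape(-1, 4, 4))
    md = np.array([np.max(np.abs(b - b.mean())) for b in blocks])     # formula (13)
    en = []
    for b in blocks:
        _, counts = np.unique(b, return_counts=True)
        p = counts / counts.sum()
        en.append(-(p * np.log2(p)).sum())                            # formula (14)
    return md, np.array(en)

def bits_per_block(band: np.ndarray, alpha: float = 0.7):
    md, en = block_stats(band)
    t1 = alpha * md.mean()                                            # formula (15)
    t2 = en.mean()                                                    # formula (16)
    capacity = np.zeros(len(md), dtype=int)
    capacity[(md > t1) & (en > t2)] = 3     # edge blocks: 3 bits via the rounding method
    capacity[(md > t1) & (en <= t2)] = 2    # texture blocks: 2 bits via PMM
    return capacity                         # smooth blocks carry no payload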

4. EXPERIMENTAL RESULTS
In this section, some experiments are carried out to assess the efficiency of the proposed method based on the data payload and fidelity benchmarks [3]. The method has been simulated using MATLAB 8.1 (R2013a) on the Windows 7 version 6.1 platform. The secret message is generated randomly. All experiments were conducted on the BOSSBase (v0.92) image database [14]. Fundamentally, the data payload of a steganographic method is one of the evaluation criteria. Data payload can be defined as the amount of information that can be hidden in the cover image. The embedding rate is usually given as an absolute measurement, such as the size of the secret message, or in bits per pixel, etc. In the proposed method, the tradeoff factor α acts as the regulator of the threshold value T1, as shown in formula (15), so the payload is linked directly to the tradeoff factor. Figure 4 shows the amount of payload for several values of α: as the factor α goes to zero, the data payload increases. Usually, the fidelity (invisibility) of a steganographic method is measured by various image similarity metrics such as the Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR) and Cross Correlation (CC).

The MSE between the cover image and the stego-image is defined as:

MSE = (1/(M × N)) Σ_{i=1}^{M} Σ_{j=1}^{N} (C(i, j) − S(i, j))².   (17)

The PSNR is computed using the following formula:

PSNR = 10 log10( Max² / MSE ) dB,   (18)

where Max denotes the maximum pixel value of the image. A higher PSNR value indicates better quality of the stego algorithm. Cross Correlation (CC) is a measure of the similarity of two images, computed as:


CC = [ Σ_{i=1}^{M} Σ_{j=1}^{N} (C(i, j) − μ1)(S(i, j) − μ2) ] / sqrt[ Σ_{i=1}^{M} Σ_{j=1}^{N} (C(i, j) − μ1)² · Σ_{i=1}^{M} Σ_{j=1}^{N} (S(i, j) − μ2)² ],   (19)

where μ1 and μ2 are the mean pixel values of the cover image and the stego-image, respectively.
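A minimal NumPy sketch of these three fidelity measures, formulas (17)-(19), is given below; it assumes the cover and stego images are arrays of the same size with a maximum pixel value of 255, and the function name is illustrative.

import numpy as np

def similarity_metrics(cover, stego, max_val=255.0):
    # MSE, PSNR and cross correlation between a cover image and a stego-image,
    # following formulas (17)-(19).
    c = cover.astype(np.float64)
    s = stego.astype(np.float64)
    mse = np.mean((c - s) ** 2)                                              # formula (17)
    psnr = 10.0 * np.log10((max_val ** 2) / mse) if mse > 0 else float('inf')  # formula (18)
    mu1, mu2 = c.mean(), s.mean()
    num = np.sum((c - mu1) * (s - mu2))
    den = np.sqrt(np.sum((c - mu1) ** 2) * np.sum((s - mu2) ** 2))
    cc = num / den                                                           # formula (19)
    return mse, psnr, cc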

Table 1 presents the image similarity metrics for different message sizes. According to the results shown in Table 1, decreasing the α rate conflicts with the similarity metrics, because in that case the selected regions are not completely non-smooth. Figure 5 shows the cover images Barbara and Airplane and the corresponding stego-images, together with their histograms, after embedding 6500 bytes with α = 0.7.

Figure 4 Amount of payload for several values of α

Figure 5 (a-d) Cover images Barbara and Airplane with their corresponding histograms; (e-h) stego-images and their corresponding histograms


Table 1 Image similarity metrics for the middle frequency HL band versus the length of the embedded message (bytes)

α = 0.3, 5000 bytes:  PSNR mean 42.49, st. dev. 2.091; MSE mean 4.159, st. dev. 2.558; CC mean 0.9996, st. dev. 0.0001
α = 0.3, 10000 bytes: PSNR mean 39.43, st. dev. 1.566; MSE mean 7.979, st. dev. 3.755; CC mean 0.9994, st. dev. 0.0003
α = 0.7, 5000 bytes:  PSNR mean 43.37, st. dev. 3.775; MSE mean 4.294, st. dev. 4.264; CC mean 0.9997, st. dev. 0.0002
α = 0.7, 6500 bytes:  PSNR mean 42.43, st. dev. 3.347; MSE mean 5.018, st. dev. 4.599; CC mean 0.9996, st. dev. 0.0002

4.1 Steganalysis of the proposed method through IQMs

A steganographic method is said to be undetectable, or secure, if existing statistical tests cannot distinguish between the cover image and the stego-image. During the embedding process some statistical variations arise in the cover image: the stego-image is perceptually identical to, but statistically different from, the cover image. An attacker uses these statistical differences to detect the secret message. I. Avcibas et al. [15], [16] showed that embedding a secret message leaves unique artifacts, which can be detected using Image Quality Metrics (IQMs). There are twenty-six different measures, categorized into six groups: pixel difference, correlation, edge, spectral, context, and human visual system. Avcibas et al. [17] developed a discriminator for cover images and stego-images using a proper set of IQMs; to select the appropriate set of IQMs, they used analysis of variance techniques. The IQMs selected for steganalysis are the Minkowsky measures M1 and M2, the mean of the angle difference M4, the spectral magnitude distance M7, the median block spectral phase distance M8, the median block weighted spectral distance M9, and the normalized mean square HVS error M10. The IQM scores are computed from the images and their Gaussian filtered versions with σ = 0.5 and a 3×3 mask for the selected IQMs [17], [18], as shown in figure 6.

Figure 6 Calculation of the IQM scores
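As a rough illustration of how one such score can be obtained, the sketch below computes a Minkowski measure between an image and its Gaussian filtered version (σ = 0.5, 3×3 mask), in the spirit of the M1 (γ = 1) and M2 (γ = 2) metrics; the exact definitions of the 26 IQMs follow [15]-[17], so this is only an assumed simplification, and the SciPy call and truncate value are implementation choices rather than details from the paper.

import numpy as np
from scipy.ndimage import gaussian_filter

def minkowski_iqm(image, gamma=1, sigma=0.5):
    # Minkowski measure between an image and its Gaussian filtered version.
    img = image.astype(np.float64)
    # truncate=2.0 limits the kernel radius to one pixel, i.e. a 3x3 mask for sigma=0.5.
    blurred = gaussian_filter(img, sigma=sigma, truncate=2.0)
    return np.mean(np.abs(img - blurred) ** gamma) ** (1.0 / gamma)

# A consistent shift of such scores between cover images and stego-images is the
# statistical trace that an IQM-based steganalyser looks for.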

The variations in the IQMs for the proposed method with different rates of α (0.3 and 0.7), after embedding 9000 bytes into the cover images, were considered. From the experimental results it can be seen that the difference in IQMs between cover images and stego-images for the proposed method with α = 0.7 is smaller than for α = 0.3. Therefore the proposed method with α = 0.7 is more secure than with α = 0.3; for α = 0.7 the warden cannot distinguish the stego-image from the cover image. The variations in the IQMs M1, M7 and M9 are shown in figure 7 (a-c).

Figure 7 Variations in IQMs: (a) M1, (b) M7, (c) M9

5. CONCLUSION AND FUTURE WORK

The main goal of image steganographic techniques is to maximize the embedding payload while minimizing the distortion rate and the detectability of the stego-image. The proposed adaptive method utilizes the characteristic sensitivity of human vision to gray value variation: the secret message is embedded into the HL middle frequency band of the cover image by recognizing the edge and texture regions. Using the integer wavelet transform and RC4 encryption can enhance the reliability and improve the resistance of the scheme. The tradeoff factor α also affects the requirements of the proposed method; this parameter balances the amount of data payload against the fidelity of the stego-image, and the sender can make the best tradeoff between the requirements through an appropriate selection of α. As shown in figure 4, different cover images give different results in terms of data payload and fidelity of the stego-image; a new approach for future work is the selection of a suitable cover image for steganographic methods.

6. REFERENCES

1. Johnson, N.F., Jajodia, S.: Exploring Steganography: Seeing the Unseen, IEEE Computer, vol.31, no.2, pp.26--34 (1998).
2. Lu, S.: Steganography and Digital Watermarking Techniques for Protection of Intellectual Property, Idea Group Publishing (2005).
3. Cheddad, A., Condell, J., Curran, K., Kevitt, P.M.: Digital Image Steganography: Survey and Analysis of Current Methods, Digital Signal Processing, vol.90, no.3, pp.727--752 (2010).
4. Nissar, A., Mir, A.H.: Classification of Steganalysis Techniques, Digital Signal Processing, vol.90, no.6, pp.1758--1770 (2010).
5. Chandramouli, R., Memon, N.D.: Steganography Capacity: A Steganalysis Perspective, SPIE Security and Watermarking of Multimedia Contents, vol.5020, pp.173--177 (2003).
6. Al-Ataby, A., Al-Naima, F.: A Modified High Capacity Image Steganography Technique Based on Wavelet Transforms, International Arab Journal of Information Technology, vol.7, no.4, pp.358--364 (2010).
7. Raja, K.B., Sindhu, S., Mahalakshmi, T.D., Akshatha, S., Nithin, B.K., Sarvajith, M., Venugopal, K.R., Patnaik, L.M.: Robust Image Adaptive Steganography Using Integer Wavelets. In: Proc. Communication Systems Software and Middleware and Workshops (COMSWARE), pp.614--621, India (2008).
8. Walker, S.: A Primer of Wavelets and Their Scientific Applications, CRC Press (1999).
9. Sweden, W.: The Lifting Scheme: A Construction of Second Generation Wavelets, SIAM J. Math. Anal., vol.29, no.2, pp.511--546 (1997).
10. Uytterhoeven, G., Roose, D., Bultheel, A.: Wavelet Transforms Using the Lifting Scheme. In: Proc. International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC), pp.6251--6253, Japan (1997).
11. Sarreshtedari, S., Ghobi, M., Ghaemmeghami, S.: High Capacity Image Steganography in Wavelet Domain. In: Proc. 7th Annual IEEE Consumer Communications and Networking Conference, pp.1--5, USA (2010).
12. Bhattacharyya, S., Sanyal, G.: Data Hiding in Images in Discrete Wavelet Domain Using PMM, International Journal of Electrical and Computer Engineering, vol.5, no.6, pp.597--605 (2010).
13. Smart, N.: Cryptography: An Introduction, McGraw-Hill College (2004).
14. Image database of BOSSBase v0.92, http://exile.felk.cvut.cz/boss/BOSSFinal/index.php
15. Avcibas, I., Memon, N., Sankur, B.: Steganalysis Using Image Quality Metrics, IEEE Transactions on Image Processing, vol.12, no.3, pp.221--229 (2003).
16. Avcibas, I., Memon, N., Kharrazi, M., Sankur, B.: Image Steganalysis with Binary Similarity Measures, EURASIP Journal on Advances in Signal Processing, vol.2005, no.1, pp.2749--2757 (2005).
17. Avcibas, I., Sankur, B., Sayood, Kh.: Statistical Evaluation of Image Quality Measures, Journal of Electronic Imaging, vol.11, no.2, pp.206--223 (2002).
18. Mali, S.N., Patil, P.M., Jaluekar, R.M.: Robust and Secure Image Adaptive Data Hiding, Digital Signal Processing, vol.22, no.2, pp.314--323 (2012).


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(1): 72-83 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Behind Identity Theft and Fraud in Cyberspace: The Current Landscape of Phishing Vectors Thomas Nagunwa Department of Computer Science Institute of Finance Management Tanzania [email protected]

ABSTRACT

Increased consumer anti-phishing awareness and improved anti-spam technologies have gradually reduced the impact of traditional phishing spams in recent years. To keep up their game in a multi-billion dollar cybercrime industry, hackers have been continuously innovative in developing polymorphic phishing vectors. Spear phishing, malware, search engine poisoning, use of rogue Secure Socket Layer (SSL) certificates, and mobile and social media attacks are among the modern and prominent vectors for phishing attacks today. This paper examines today's most adopted phishing vectors as observed by security vendors, security analysts and anti-phishing campaigners. Learning the current landscape of these vectors is a key step in developing effective technological, social and legal devices to mitigate the impact of these threats across the globe. The paper concludes that almost all of today's phishing attacks begin with spear phishing, that phishers focus their attacks more towards small and medium enterprises, that malware toolkits are the key player in all major attacks, and that mobile and social media attacks have grown rapidly in recent years and promise to be the future of phishing.

KEYWORDS Phishing, vector, website, URL, email, spam, online fraud, identity theft, malware, online credentials.

1 INTRODUCTION

The history of phishing goes back as far as 1996; by then the word referred to the fishing of online users' identities from the sea of internet users, to be used for various malicious intents including fraud [1], [2]. Phishing involves different enticing skills

including technology and social engineering, to trick online users into giving up their online credentials, which are then used to steal money or make unauthorized online purchases [3]. These credentials may include usernames, passwords, bank account numbers, credit card details, social security numbers, ATM PINs, birthdates, addresses and others [2], [4]. Over the years, phishing activities leading to fraud have caused much economic and social damage to online communities. In 2012, for instance, global online consumers experienced a total loss of US $110 billion when 556 million consumers were victimized in more than 30 million hacking activities including phishing [5], [6]. About 187.2 million online identities were stolen in 2011 for online fraud [12]. Some business brands have lost reputation and confidence in sections of their online markets, leading to loss of business [4]. Despite global efforts in imposing technological solutions, anti-phishing educational campaigns and anti-cybercrime legal frameworks, phishing has kept growing over the years. Phishers have frequently evolved their techniques to elude advanced security technologies and successfully lure even phishing-aware online users [7]. From the traditional phishing approach of generic email scams, phishers are shifting towards targeted email scams, rogue websites, and the deployment of sophisticated malware to infect legitimate websites and hosts to steal identities. The recent popularity of mobile devices and social media among online communities has opened new and effective phishing vectors, as the security solutions deployed are still at an immature

level, while their users think they are in trust zones [6]. Cloud computing has shown signs of being the future target of phishing as more businesses converge their services and data to the cloud.

Figure 1 Global growth of phishing attacks between 2010 and 2012 as detected by RSA [8].

This paper aims at exposing the phishing vectors deployed in recent years. Learning the trends of these vectors is a key step in educating online consumers about the existing online threats and risks and the need to adopt safe internet access practices. Online businesses, anti-phishing campaigners, security communities and legal societies should find this information a basis for developing adaptive and effective strategies to educate and protect both online consumers and businesses.

2 PHISHING SPAMS

A phishing email is a category of spam, an unsolicited email message, sent to multiple users to lure them into providing their online identities for impersonation. Phishing spam is the earliest phishing vector and is still being deployed by phishers, though at a reduced rate due to the diversification of attacking methods in recent years [6]. In 2012, 1 in every 414 legitimate emails was observed to be a phishing one, while the ratio was 1:299 in 2011 [6]. Up to 5% of all phishing emails sent globally succeed in luring recipients [9].

Phishing spams are sent with a 'call for action' oriented message, convincing a user to respond in order to obtain a certain advantage [10]. These messages are crafted in such a way that they seem to come from legitimate businesses, such as banks, online retailers or payment processing providers, that the customers have an association with. Some often used 'calls for action' crafted by phishers are:
• The business has opened a new service that is offered free only to the first few confirming members; the rest will be required to pay for the service. The member is urged to respond for confirmation with personal details.
• An unusual number of log-in attempts, a high number of transactions or changes in settings have been observed on the member's account. To prevent fraud, the business has closed the account temporarily; to activate the account, the member is required to log in.
• The business's system has experienced a breakdown and all accounts were shut down during servicing. The system is up now; to activate the account, the member is required to submit log-in details.
• A failure to process a bill due to unverified payment details. The customer is required to verify the payment details through a given link [1], [10].

Figure 2 A typical phishing email purporting to be from AOL [11].



To respond to the 'call for action', the user is given a URL link which directs the user to a malicious fake website or a deformed legitimate website, where the user is either prompted to submit credentials or malware is installed to spy on and steal the credentials.

2.1 URL Obfuscations

To facilitate the phishing email vector, phishers craft the URLs of email links to hide the actual URLs directing users to phishing websites. One of the ways is to design a link in HTML form, in which the real URL redirecting the user is hidden underneath the visible HTML-crafted URL to avoid the user's suspicion. For instance, in the link below:

https://oib.westpac.com.au/ib/default.asp

the user sees the innocent URL shown but is actually directed to a different, malicious URL hidden in the link [10]. Another widely used trick is the bad domain, where a URL's domain is slightly changed to look close to the real one. For example, in

https://www.paypal.com/cgi-bin/webscr?cmd=_login-run

the domain can be modified from .com to .com.cn as shown below, and the user can hardly notice the difference:

https://www.paypal.com.cn/cgi-bin/webscr?cmd=_login-run

The real domain can also be placed as a path of a fake domain:

http://2iphoto.cn/https://www.paypal.com/cgi-bin/webscr?cmd=_login-run

The user may think he is accessing a PayPal website but is redirected to the 2iphoto.cn site [13]. IP addresses of the phishing sites are used to hide their suspicious domain names [1], [13]:

http://217.80.34.66/https://www.paypal.com/cgi-bin/webscr?cmd=_login-run

The use of third-party shortened URLs is one of the popular methods used by phishers [14]. Most of the phishing URLs are very long, which makes it easy to raise suspicion. To evade this, phishers convert their URLs to short forms using free services offered by companies such as tinyurl and smallurl, as legitimate websites do. For instance, the URL http://0322.0206.0241.0043/http://signin.ebay.com/ebayisapidllsignin.html can take a new short form of http://tinyurl.com/4 [1]. The latest trend in this approach is the use of phishers' own fake URL shortening services [15]. In this case, the email link is crafted using a legitimate short URL which, when clicked, links to a fake short URL and then to the phisher's website [15].

2.2 Malicious Domain Name Uses

To spoof brands, phishers often ensure that phishing websites' URLs contain domain information of the brands; 94.6% of all phishing URLs use brands' compromised domains [16]. This is achieved by hacking web hosts and then creating rogue subdomains and their URLs. In other cases, phishers register their own fake domains with domain registrars using fake company identities. The top level domains (TLDs) popular with phishers are .COM, which records 55.9% of all malicious domains, .NET (6.2%), .ORG (5.0%), .BR (2.5%) and .INFO (2.3%) [16]. One domain is often used to launch up to 20,000 unique phishing attacks through unique subdomains and URLs [16]. Of the 89,748 domains used for phishing in the second half of 2012, 6.5% were the phishers' own domains [16].
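The obfuscation tricks above suggest some simple heuristics that a mail filter or an awareness tool could apply to a link before the user follows it. The sketch below is illustrative only: the checks, thresholds and function name are assumptions, not an implementation from any of the cited tools.

import re
from urllib.parse import urlparse

def suspicious_url(url, expected_domain=None):
    # Flag URL patterns commonly used to obfuscate phishing links.
    parsed = urlparse(url)
    host = parsed.hostname or ""
    reasons = []
    if re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host):
        reasons.append("host is a raw IP address")
    if expected_domain:
        if expected_domain not in host and expected_domain in parsed.path:
            reasons.append("expected domain appears only in the path")
        if host != expected_domain and host.endswith(expected_domain + ".cn"):
            reasons.append("look-alike top level domain")
    if len(url) > 100:
        reasons.append("unusually long URL")
    return reasons

# Example from the text: the PayPal domain placed in the path of another host.
print(suspicious_url(
    "http://2iphoto.cn/https://www.paypal.com/cgi-bin/webscr?cmd=_login-run",
    expected_domain="paypal.com"))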

3 SPEAR PHISHING

This is a fast emerging and widely practiced form of phishing email in which the targets are a specific group. The target can be specific customers of a particular service, bank or retailer, employees or members of an organization, a government agency or a social group [2]. About 91% of all cyber attacks today begin with spear phishing [17].


In this vector, the phisher crafts an email with specific and relevant content for the targeted group in order to gain the readers' trust, and then directs them either to click a provided malicious link or to download a malicious attachment [18]. By responding to the email, the user installs vulnerability-exploiting malware injected into the download, or is redirected to a phishing website that launches the attack. Once the attack succeeds, the hacker intrudes into the user's host or network and steals intellectual property and financial assets for fraud [18], [19]. 94% of spear phishing attacks deploy malicious file attachments while 6% use links to malicious websites [18]. Common file formats used for malicious attachments are RTF, XLS, ZIP, RAR, DOC, DOCX, PPT and PDF [7], [19]. To craft relevant email content, the phisher spies on a target before the attack, through a previous intrusion or from online information about the organization or its specific key members. The easy availability of detailed information, such as email addresses and corporate reports, from corporate websites and social media has been the key driving factor for the growth of this form of attack [17]. 10% of all spear phishing attacks are directed at the financial sector for online fraud [7]. Other major victims are governments (65%), activists (35%) and manufacturing industries (22%) [7]. Since 2006, one hacking group was found to have attacked more than 100 companies and organizations using the spear phishing vector [20].

4 VISHING

Vishing is the use of the telephone to lure users into revealing their personal details to phishers. There are two forms of vishing: by phishing email or by telephone [2], [21]. By email, the phisher crafts an email pretending to be from a legitimate business, bank or law enforcement agency, explaining, for instance, that the customer's account has been compromised. To help the investigation or to activate the account, the customer is required to call a toll-free telephone number provided as a link and is asked to provide account details [21]. Voice over Internet Protocol (VoIP) has been used for calls in this form of vishing [2].

In vishing by telephone, the phisher deploys VoIP software to call a customer, pretending to be an employee of a legitimate company the customer has a business relationship with [21]. The software is so well crafted that the number or caller ID appears to be a legitimate one, while a professional sounding automated service line, such as those used in large firms, is also employed [2]. When called, the customer is prompted for account details, either to activate or reset his account or for maintenance purposes. Vishing has been growing over the years as users become more educated about phishing spams yet fail to tell legitimate calls from fake ones. In the UK, for instance, 23% of people have experienced telephone vishing, while in 2012 alone customers lost about £7m to vishing activities across the country [22]. 39% of the UK population admits to a challenge in differentiating genuine calls from fake ones [22].

5 MALWARE

The use of malicious programs for identity theft is one of the most preferred and effective vectors for phishers today. Malware can be deployed by phishers to install other malware, spy on and steal the host user's data, change host configurations and block the host's access to the operating system (OS) or to some applications [6], [23]. Malware is distributed and installed on hosts through drive-by downloads, compromised websites or malvertising [3], [6], [23]. In the drive-by download channel, malware is injected into an email attachment file and distributed through the phishing spam or spear phishing vectors. When the file is downloaded, the malware exploits a particular vulnerability, usually of the OS, Microsoft Office, Acrobat Reader, the Java platform, web browsers, web plug-ins or other common applications, to infect the host. Peer-to-peer file sharing also plays a major role in this channel. A legitimate website compromised through the injection of malicious code is another common channel. The hacker attacks a web host and then

injects JavaScript or a piece of code into a website that dynamically downloads a vulnerability-exploiting malware payload when a particular web page is visited [6]. 1 in every 532 websites was found to be compromised by malware in 2012 [6]. Malvertising is online advertising with a malware-embedded advertisement (ad). In this approach, the phisher designs an ad with injected malware and then pays for it to be hosted and displayed by legitimate ad websites [6]. When the ad is clicked, malware is dynamically installed by exploiting a specific vulnerability of the host. Trojans, malware kits and bots are the major forms of malware that have been deployed by phishers in recent years.

Trojans

Trojans are the most deployed malware by phishers, contributing 77% of all malware attacks [3]. These are malware installed for malicious operations, including stealing credentials or linking the host to a botnet [14]. Trojans are used for phishing mainly through the following channels: as key and screen loggers, as rogue websites, as web proxies, as scareware, or as spam relays [14], [23], [24], [25], [26]. Once installed, a trojan can be designed to spy on a user's access to particular websites, mostly those of banks. When a specified website's login page is accessed, it takes screenshots of the page or logs the typed data and then sends them to the phisher's server [14], [26]. The Zeus family of trojans, one of the dominant trojan families in 2012, used this approach [14]. In another form, a trojan launches a fake log-in web page looking similar to a legitimate one [25]; the user's data is then submitted and sent to the phisher's server. PWSteal.Bancos was one of the trojans to deploy this method. In other cases, a legitimate log-in page is launched but with a hacker-injected log-in window [14]. A trojan can also act as a web proxy, redirecting web traffic to the phisher's DNS and then connecting to a phishing server [25]. Log-in credentials can all be trapped and then sent to the phisher. Alternatively, the proxy can be used to intercept a transaction, such as a bank transfer, and falsify some of the information to the phisher's advantage before committing the data to the legitimate server [26]. A bank trojan, SilentBanker.D, was used to falsify online bank statements [14]. Some variants of trojans have been designed as scareware/ransomware: they usually block the host OS or some of its applications/files and, while posing as law enforcement or anti-malware software, force users to pay money or submit credit card details as a way to regain access [23]. The PGPCoder trojan used a typical approach by encrypting users' data files and forcing the user to pay a ransom to re-access them [23]. About 250,000 new and unique ransomware samples were observed globally in the first quarter of 2013 [27]. 3% of the victims agree to pay the money [14].

Figure 3 A typical ransomware in action [14].

Trojans have also been observed being deployed as spam relays. They are connected to a botnet, receive spams from the botnet's spam generators and then distribute them to recipients' mail servers [25]. Through this, they play a key role in hiding the true identities of the spam generators.

Malware Toolkits

These are commercially developed, advanced malware kits that allow cybercriminals with little or no programming skill to undertake hacking

operations [12]. Toolkits are designed to perform multiple phishing operations using different techniques; these operations may include creating backdoors, key and screen logging, generation of phishing websites, ransomware and others [12], [28]. The use of malware kits has been growing each year due to the growth of online black markets selling these products [8]. 61% of all attacks through malicious websites use toolkits [12]. The toolkits take advantage of vulnerabilities in the OS, web browsers, web databases, web plug-ins and applications to infect hosts [28]. The Lizamoon toolkit took advantage of a web database vulnerability to attack 4 million websites through SQL injection [29]. Malware toolkits remain effective over a long time because they are usually designed to dynamically generate unique signatures in each attack [28]. They are also often updated into new versions, and for each version there may be many variants used in different phishing attacks [6], [12].

Figure 4 Top web-attacks malware toolkits in 2012 [6].

Bots

Bots are programs installed on multiple hosts, controlled and commanded by a remote hacker's servers to coordinate malicious activities [30], [31]. Bots are often distributed by spams, worms or malicious websites and are installed through backdoors or the exploitation of vulnerabilities [31]. Bots coordinated by the same servers and serving the same purposes form a botnet.

Due to their ability to infect up to millions of hosts in a short time, leading to massive impact, botnets of malware toolkits have become the most attractive phishing approach. Bots are the main generators of spams, contributing more than 81.2% of all global spams [12]. They can also be used as backdoors, as stealers of users' online credentials for particular websites, or as ransomware [31]. Zbot, a botnet of the Zeus trojan, is one of the most notorious botnets observed in recent years, with 3.6 million infected hosts in the US alone [14]. The bots use various phishing methods to steal users' online banking credentials and then send them to remote phishing servers.

6 PHARMING

Pharming is the hijacking of domain name services to deviate DNS requests to spoofed websites for malicious operations [32]. This is done by altering DNS server/cache IP entries or adding new bogus entries. Potential DNS attacking points are the hosts file (the local DNS cache), the LAN's DNS server, the ISP's DNS server and the home wireless router [32]. Pharming can be achieved through infection by malware, injected web-based malicious code, sniffing of traffic between hosts and DNS servers, or hijacking of DNS server administrators' accounts [32]. Home or small office users with wireless routers are the most vulnerable to this attack, as more than 50% of them were observed to use routers with default settings or without passwords, while 95% of them allow JavaScript in their browsers [32]. Through phishing spams, users can be lured to click attached links which lead to the downloading of malware or the injection of poisonous web browser scripts. The infections then access the router to edit specific DNS entries so that they point to phishing sites. By hijacking administrative privileges of DNS servers in numerous ways, hackers can malconfigure the entries unnoticed. Sniffing the traffic between hosts and DNS servers can allow a hacker to learn the sequence of request IDs generated, and then be

able to spoof DNS server responses by directing users to fake IPs [32]. A new form of pharming was observed in the wild by Symantec in 2012, using the free DNS services offered by afraidDNS [33]. Attackers exploited a weakness of the company's services by creating spoofed webpages as subdomains of legitimate domains using afraidDNS's free DNS services [33]. Users thought they were directed towards subdomains of the original websites but fell into phishing traps.
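One way a host-side tool might look for this kind of tampering is to compare what the local resolver returns (which honours a modified hosts file or a poisoned local cache) with the answer of an independent, trusted resolver. The sketch below is only a heuristic under stated assumptions: it relies on the third-party dnspython package, the resolver address 8.8.8.8 and the function name are arbitrary choices, and disagreement can also be caused by benign factors such as CDNs and load balancing.

import socket
import dns.resolver  # third-party package "dnspython", assumed to be installed

def resolver_mismatch(domain, trusted_ns="8.8.8.8"):
    # Addresses seen through the system resolver (hosts file, local cache, router).
    local = {info[4][0] for info in
             socket.getaddrinfo(domain, 80, socket.AF_INET, socket.SOCK_STREAM)}
    # Addresses returned by an external resolver queried directly.
    res = dns.resolver.Resolver()
    res.nameservers = [trusted_ns]
    remote = {rr.address for rr in res.resolve(domain, "A")}
    # Disjoint answers may indicate pharming, but can also be a benign CDN effect.
    return local.isdisjoint(remote), local, remote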

7 SQL INJECTION

SQL injection attack is the incorporation of SQL statements with genuine user’s SQL queries into SQL based applications to perform malicious activities [34], [35]. The attack takes advantage of inability of the applications to validate user inputs [36]. Once the attack is successfully, SQL injected codes can expose user credentials in the database, display database administrator’s log in credentials, add a new administrator or allow hacker to gain access to a host OS [34], [35], [36]. Web based SQL applications have been the most victims of the attacks due to their global audiences and therefore difficult to trace the attackers [11]. Also most of dynamic websites use SQL databases management systems such as MySQL, MS SQL and Oracle SQL to host millions of customers’ information potential to lucrative frauds [11]. Many of these sites lack strong input validation implementations making them vulnerable to this attack, without web masters’ knowledge. In 2011, about four million websites were attacked by the biggest ever SQL injection attack known as Lizamoon [37]. Users of the infected websites were infected with scareware which report that their machines are infected with viruses. To clean up, they had to purchase hacker’s rogue antivirus thus their monies were stolen [36]. Typical SQL injection can be executed in two ways, by adding a code within the original SQL query or by adding a new malicious query to form multiple queries [11], [36]. For instance, in

SELECT * FROM Users WHERE UserID='$ID' AND Password='$pwd';

the hacker can enter the UserID as ' OR 1=1 -- instead of a genuine username, forming the new query

SELECT * FROM Users WHERE UserID='' OR 1=1 -- AND Password='$pwd';

Since 1=1 is always true and -- comments out the rest of the query, the table of users is returned without knowing any password, exposing all accounts [34], [35], [36].
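The standard defence against this class of attack is to keep user input strictly separated from the SQL text, which is what parameterized queries do. The following sketch contrasts the vulnerable string-built form with a parameterized one; it uses Python's built-in sqlite3 module as a stand-in for the MySQL, MS SQL or Oracle back ends mentioned above, and the table and column names simply follow the example in the text.

import sqlite3

def login_unsafe(conn, user_id, pwd):
    # Vulnerable: the inputs are concatenated into the statement, so entering
    # ' OR 1=1 --  as the UserID returns every row of the Users table.
    query = ("SELECT * FROM Users WHERE UserID='" + user_id +
             "' AND Password='" + pwd + "'")
    return conn.execute(query).fetchall()

def login_safe(conn, user_id, pwd):
    # Parameterized: the driver treats both inputs purely as data, so the
    # injected quote and comment sequence cannot change the statement.
    query = "SELECT * FROM Users WHERE UserID=? AND Password=?"
    return conn.execute(query, (user_id, pwd)).fetchall()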

8 CROSS SITE SCRIPTING

A cross site scripting (XSS) attack occurs when an attacker uses a web application to send malicious code, generally in the form of a browser-side script, to a different end user [38]. The attack takes advantage of a lack of validation of the user's inputs and of the content retrieved from the application's server [34]. On a successful attack, the hacker may be able to capture users' session cookies and thus compromise their accounts, install malware into the web browser or host, monitor web browsing habits or redirect users to a phishing website [38], [39]. The XSS malicious code is injected in the form of scripting languages, most notably JavaScript; HTML tags with attributes such as onload, onmouseover or onerror are also used to inject the code [38]. There are two forms of XSS: reflected XSS and stored XSS. In reflected XSS, the user is enticed to click on the URL link provided in the phishing spam. The linked rogue website, with XSS code injected, prompts the user to provide inputs and then forwards them to the vulnerable website's server along with the XSS code [38], [39]. The server returns the content with the XSS code, which in turn is executed in the browser and then hijacks the user's session. In stored XSS, the user's inputs are stored in the application's database along with the XSS code. When the same data set is retrieved into the browser by any other user, the code is also invoked and executed to hijack that user's session [34], [38], [39]. The Samy worm, in 2005, deployed this technique to infect more than one million MySpace user accounts in just 24 hours [40].
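Because both forms of the attack depend on untrusted input being written into a page without encoding, the usual first line of defence is to escape that input for the HTML context before it is echoed back. The minimal sketch below uses Python's standard html module; it illustrates only output encoding and not the complementary defences (context-aware encoding, Content Security Policy, HttpOnly cookies) that a real application would also need.

import html

def render_greeting(name):
    # Escape untrusted input before inserting it into an HTML response, so a
    # payload such as "<script>alert(document.cookie)</script>" is rendered
    # as inert text instead of being executed in the victim's browser.
    return "<p>Hello, " + html.escape(name, quote=True) + "</p>"

print(render_greeting("<script>alert(document.cookie)</script>"))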

9 BLACKHAT SEARCH ENGINE OPTIMIZATION (SEO)

This is a technique used by hackers to poison a search engine such as Google, Yahoo or Bing by injecting their malicious links and ranking them highly when popular trends or events are searched [41], [42]. Hackers design their malicious websites and then inject keywords of popular events, such as Halloween and the Olympics, or link their sites to blogs/websites with articles about those events [42]. When search results are presented, the malicious links are usually ranked highly compared to legitimate links due to their richness in the keywords. When the links are followed by users, they inject malware into the web browsers or hosts, leading to phishing activities. 92% of the poisoned search results contain images of the related events, while 8% are text based results [41]. Bing was found to be the most victimized engine, with 65% of all poisoned results, Google being second with 30% [41].

Figure 5 A typical BlackHat SEO with an image based malicious link ranked second in a search result [43].

10 ROGUE SSL CERTIFICATES

There has been a tremendous increase in the use of SSL encrypted channels by websites to protect their communications from man-in-the-middle attacks [6], [12]. Some websites, like Facebook, Google, Twitter and others, have extended the use of secured channels from log-in pages only to other, non-transactional pages [12]. Hackers have also started to adopt this authentication approach to legitimize their malware or phishing websites by signing them with genuine looking SSL certificate keys. One way for hackers to acquire the certificates is to purchase them from Certificate Authorities (CAs) using fake company identities; CAs with improper validation procedures for certificate requests have fallen into this trap in many cases. For instance, in May 2012, authors of banking trojans in Brazil were able to purchase certificates from the CA Comodo, using false identities, to sign their trojans and then launch massive phishing attacks [6], [44]. In other cases, hackers break into CAs' networks, generate fraudulent certificates and use them in multiple attacks [12]. In 2011, DigiNotar, a Dutch CA, was attacked by a single hacker, leading to the generation of 500 certificates which were then used for Google-related hacking activities [44].

11 MOBILE PHONE PHISHING

The popularity of smartphones among global phone users has encouraged businesses to innovate mobile services such as mobile payments and bank transfers, making mobile phones one of the most attractive arenas for phishers. Phishers have developed variants of malware to hijack smartphones, using mobile OS exploits, to steal users' data or send premium rated content [6], [7]. With a market share of 72%, the Android OS is the main target of mobile threats, followed by Apple's iOS at 14% [6]. More than 1,300,000 Android malware samples, belonging to a few malware families and their variants, were observed in the wild in 2012 alone, a rise of 58%

compared to 2011 [6], [7]. 40.6% were premium service abusers, 24.9% data stealers, 22.8% malicious downloaders, while the rest were adware, click fraudsters and rooters [7]. Publicly reported mobile OS vulnerabilities have also been increasing over the years, reaching 415 in 2012 alone, suggesting that as more exploits are found, phishers keep developing new malware variants to use them [6].

Figure 6 Growth of android malware between 2011 and 2012 as observed by Trend Micro [7].

Malware is distributed through official app stores, such as Google Play, or third-party stores by means of trojanized legitimate apps [6], [12]. Once an infected app is downloaded, it gets installed by exploiting a particular OS vulnerability. The malware can scan and steal personal data from the phone, capture data entered on a particular mobile website, or monitor the web browsing habits of the user, and then send the results to the phisher's server [7]. In some cases, malware can access and purchase apps from mobile app stores on behalf of the phone owner [23]. Malware can silently generate and frequently send premium content, such as SMS messages to premium rate numbers, while the bill is charged to the phone's owner [6], [7], [12]. A phishing SMS can also be generated and sent by malware, containing a link to a phishing website from which, when it is accessed by the user, personal data can be stolen [45].

Large botnets of mobile malware have been observed in the wild, controlling hundreds of thousands of mobile phones [6].

12 SOCIAL MEDIA AND WEB 2.0 PHISHING

Scams through social networking sites such as Facebook, Twitter, Instagram, Tumblr and others appear to be a popular attacking trend today, Facebook being the leading victim [6], [23]. Phishing via social media grew from 8.3% of all phishing attacks in 2010 to 84.5% in 2011 [8]. These attacks are preferred by phishers because users often trust messages from their friends, and users' messages can quickly propagate to large communities of friends [6]. The media are also an easy source of personal data, as users expose many of their credentials such as birthdates, mobile numbers, friends' contacts, addresses, driving license details and even credit card particulars [6], [23], [46]. Phishers use these data to break user accounts' passwords, collect email address lists to send phishing spams, spread phishing messages in the social media or make illegal online purchases [6], [23]. Obtained mobile numbers have been observed being used for smishing by sending premium rated content [23]. Phishing messages are spread with text and/or images, often about product offers, accompanied by phishing links [6], [23]. To complete processing of the offers, users are supposed to follow the links, which direct them to submit their personal data. In other cases, the links lead to the installation of phishing malware. Fake "likes" can be embedded in an image based Facebook post which, when clicked, leads a user to a phishing website or to the installation of phishing malware [23].


Figure 7 A typical twitter scam message with a phishing link [6].

Web 2.0 sharing media sites such as YouTube and Digg have also experienced phishing attacks of their own forms [42]. Phishers, using bogus site accounts, spread enticing comments on articles/videos of popular events [42]. These comments contain malicious links which lead to phishing websites or to the downloading of phishing malware. Hackers have also been able to inject rogue web browser add-ons to allow the display of their phishing advertisements on web 2.0 sites. When these ads are clicked, they lead to phishing websites or to the downloading of phishing malware. Wikipedia has experienced this form of attack through a fake Google Chrome add-on [23].

13 CLOUD COMPUTING ATTACKS

37% of global businesses had already adopted cloud services by 2011, and this increased by 20% in 2012 [6], [12]. From emails and financial data to advertisements, companies today go to the cloud to reduce their ICT operational costs as well as data security risks. Though cloud providers have invested heavily in security measures, the consolidation of massive data at one point is a lucrative attraction for hackers [6]. An intrusion can start from a cloud client or through the cloud provider, possibly through spear phishing, to hijack the provider's infrastructure. Few incidents of cloud attacks have been reported so far, but it promises to be one of the biggest vectors in the near future. For instance, in 2012, the Dropbox file sharing cloud service was hacked, leading to a breach of thousands of usernames and passwords of users [23].

14 CONCLUSION

Spear phishing and the use of malware are the leading vectors in all phishing attacks today. Though the traditional phishing spam rate is decreasing every year, it still contributes significantly to the number of global phishing attacks. New phishing trends have emerged in recent times, such as malvertising, blackhat search engine optimization, mobile phishing and social media attacks, and they promise to be the leading vectors in the near future. The new corporate trend towards the cloud is already attracting phishers, as massive harvests of data are centralized at fewer points. SQL injection, XSS and pharming vectors have persisted over the years and appear unlikely to vanish in the near future. Phishing has been successful mainly due to ignorance and lack of awareness among users and businesses of the best practices of internet and computer safety. Failure to frequently patch operating systems and applications exposes many software vulnerabilities that are exploited by malware. Users still find it hard to distinguish phishing emails and calls from legitimate ones. The majority of home users use the default settings of their wireless routers. Some web developers do not validate the inputs of their web applications. Many businesses today have mobile websites and social media accounts, which increase the availability of their information for spear phishing while exposing their customers to mobile and social media attacks. Businesses need to invest more in securing their applications, limiting their online information and running specific programs to train their employees and customers in anti-phishing practices. Specific security training software such as PhishMe can be used by firms to practically engage their staff in learning about spear phishing. Firms must also have specific email gateway solutions with spear phishing protection capabilities, such as Proofpoint. Websites of online businesses should be equipped with engaging programs to educate their customers on ways to evade phishing traps.


Cloud service providers must invest heavily in security measures, from a multi-layer secured infrastructure approach to strong authentication procedures for access to clients' applications and data. Zero-day vulnerabilities should be addressed immediately by software vendors, who should initiate prompt public awareness of the patches. The online business community should abandon standard SSL certificates in favour of Extended Validation (EV) SSL certificates to prevent phishers from buying rogue certificates from certificate authorities (CAs).

15 REFERENCES

1. Ollman, G., (2004), "The Phishing Guide: Understanding and Preventing Phishing Attacks", The Next Generation Security Software. 2. Banday, M.T., Qadri, J.A., (2007). "Phishing - A Growing Threat to E-Commerce," The Business Review, 12(2): 76-83. 3. Anti-Phishing Working Group (APWG), (2012), Phishing Activity Trends Report 4th Quarter 2012, APWG. 4. Lynch J., (2005), "Identity theft in cyberspace. Crime control methods and their effectiveness in combating phishing attacks", Berkeley Technology Law Journal, 20: 259-300. 5. Symantec Corporation, (2012), "2012 Norton Study: Consumer Cybercrime Estimated at $110 Billion Annually", Symantec Press Release. Available at: http://www.symantec.com/about/news/release/article.jsp?prid=20120905_02 [Accessed September 2013]. 6. Symantec Corporation, (2013), Symantec Internet Security Threat Report 2013, Symantec Corporation. 7. TrendLabs, (2012a), Evolve threats in a 'post-pc' world, TrendLabs Annual Security Roundup, Trend Micro. 8. RSA, (2013), The year in phishing 2012, EMC. 9. Dhamija, R., Tygar, J., Hearst, M., (2006), "Why Phishing Works?", Proceedings of the conference on Human factors in Computing Systems (CHI-2006), pp 581-590. Available at: ACM Digital Library [Accessed September 2013]. 10. Emigh, A., (2005), "Online Identity Theft: Phishing Technology, Chokepoints and Countermeasures", Radix Labs. 11. Nagunwa, T. (2008), Investigation of data privacy threats in online retail industry and assessment of strategies used in mitigating their impacts, Msc Thesis, Dublin Institute of Technology.

12. Symantec Corporation, (2011b), May 2011 Intelligence Report, Symantec Corporation. 13. Garera, S., Provos, N., Chew, M., Rubin, A. (2007), “A Framework for Detection and Measurement of Phishing Attacks”, Proceedings of the 2007 ACM Workshop on Recurring Malcode, pp 1-8. Available at: ACM Digital Library [Accessed September 2013]. 14. Symantec Corporation, (2011a), Defend your institution against Trojan-aided fraud: Banking Trojan, Symantec Corporation. 15. Symantec Corporation, (2011c), Internet security threat report: 2011 trends, Symantec Corporation. 16. Aaron, G., Rasmussen, R., (2013), Global Phishing Survey: Trends and Domain Name Use in 2H 2012, Anti-Phishing Working Group (APWG). 17. Savvas, A., (2012), “91% of cyberattacks begin with spear phishing email”, Tech World. Available at: http://news.techworld.com/security/3413574/91-ofcyberattacks-begin-with-spear-phishing-email/ [Accessed September 2013]. 18. Ashford, W., (2013), “FBI warns of increased spear phishing attacks”, Computer Weekly. Available at: http://www.computerweekly.com/news/2240187487/FB I-warns-of-increased-spear-phishing-attacks [Accessed September 2013]. 19. TrendLabs, (2012b), Spear-phishing email. The most favored APT attack bait, Trend Micro. 20. Cinstantin, L., (2013), “Mandiant report on Chinese cyberespionage used as bait in spear-phishing attacks”, Computer World. Available at: http://www.computerworld.com/s/article/9237036/Man diant_report_on_Chinese _cyberespionage_used_as_bait_in_spear_phishing_attac ks [Accessed September 2013]. 21. Hicks, D., (n.d), “Vishing: Another Internet Fraud Scam”, Federal Reserve Bank of Boston, Available at: http://www.bos.frb.org/consumer/spotlight/vishing.htm [Accessed September 2013]. 22. The Guardian, (2013), “'Vishing' scams net fraudsters £7m in one year”, The Guardian. Available at: http://www.theguardian.com/money/2013/aug/28/vishin g-scams-fraudsters-seven-million-pounds [Accessed September 2013]. 23. PandaLabs, (2013), PandaLabs Annual Report 2012, Panda Security. 24. Stewart, J. (2003), “Reverse Proxy Spam Trojan, Migmaf”, http://www.secureworks.com/research/threats/migmaf/ [Accessed July 2008]. 25. Levy, E., Arce, A., (2004), “Criminals Become Tech Savvy, Security and Privacy”, IEEE Communications Society, 2 (2): 65-68.


26. Westervelt, R., (2010), "Security report finds rise in banking Trojans, adware, fewer viruses", Search Security, Available at: http://searchsecurity.techtarget.com/news/1378277/Security-report-finds-rise-in-banking-Trojans-adware-fewer-viruses [Accessed September 2013]. 27. McAfee Labs, (2013), "McAfee Threats Report - First Quarter 2013", McAfee. 28. Sophos, (2013), Security threat report 2013, Sophos. 29. PC Tools, (2011), "Lizamoon: A Serious SQL Injection Attack", PC Tools, Available at: http://www.pctools.com/security-news/lizamoon-a-serious-sql-injection-attack/ [Accessed September 2013]. 30. Berinato, S., (2006), "Attacks of the Bots", Wired Magazine, 14:11. 31. Schiller, C., Binkley, J., Harley, D., Vron, G., Bradley, T., Willems, C., Cross, M., (2007), "Botnets the Killer Web App", Syngress Publications. 32. Stamm, S., Ramzan, Z., Jakobsson, M., (2006), "Drive-by Pharming", Indiana University School of Informatics and Computing, Technical Report TR641. Retrieved from: http://www.cs.indiana.edu/cgi-bin/techreports/TRNNN.cgi?trnum=TR641 [Accessed September 2013]. 33. Sharf, E., (2012), "Christmas-Themed Facebook Scams: How Cybercrooks Kick it up a Notch and Piggyback on Big Brands", Websense Security Labs Blog, Available at: http://community.websense.com/blogs/securitylabs/archive/2012/12/06/merry-xmas-on-facebook.aspx [Accessed September 2013]. 34. Kals, S., Kirda, E., Kruegel, C., Jovanovic, N., (2006), "Secubat: A Web Vulnerability Scanner", Proceedings of the 15th international conference on World Wide Web, pp 247-256. Available at: ACM Digital Library [Accessed September 2013]. 35. Rietta, F., (2006), "Application Layer Intrusion Detection for SQL Injection", Proceedings of the 44th Annual Southeast Regional Conference, pp 531-536. Available at: ACM Digital Library [Accessed September 2013]. 36. The PHP Group, (2013), "PHP SQL Injection", The PHP Group, Available at: http://php.net/manual/en/security.database.sql-injection.php [Accessed September 2013]. 37. Jacob, J., (2011), "What is Lizamoon, the viral scareware that infected four million websites", International Business Times, Available at: http://www.ibtimes.com/what-lizamoon-viral-scareware-infected-four-million-websites-278289 [Accessed September 2013]. 38. OWASP, (2013), "Cross-site Scripting (XSS)", OWASP, Available at: https://www.owasp.org/index.php/Cross-site_Scripting_(XSS) [Accessed September 2013]. 39. Glynn, F., (n.d), "XSS Cheat Sheet: Prevent Cross Site Scripting Attacks, Injections", Veracode, Available at: http://www.veracode.com/security/xss [Accessed September 2013]. 40. Auger, R., (2011), "Cross Site Scripting", Web Application Security Consortium, Available at: http://projects.webappsec.org/w/page/13246920/Cross%20Site%20Scripting [Accessed September 2013]. 41. Leyden, J., (2012), "Bing is the most heavily poisoned search engine, study says", The Register, Available at: http://www.theregister.co.uk/2012/10/08/bing_worst_search_poisoning/ [Accessed September 2013]. 42. PandaLabs, (2009), Annual Report PandaLabs 2009, Panda Security. 43. Arfunnis, (2010), "Searching for 'Ileana Tacconelli' leads to Fake Adobe Flash Update and TDSS", PC Tools, Available at: http://www.pctools.com/security-news/ileana-tacconelli-fake-adobe-flash-update-tdss/ [Accessed September 2013]. 44. Ragan, S., (2012), "Comodo Certificates Used to Sign Banking Trojans in Brazil", Security Week, Available at: http://www.securityweek.com/comodo-certificates-used-sign-banking-trojans-brazil [Accessed September 2013]. 45. Tufts, A. (2012), "How to protect your Android against 'smishing'", One Click Root, Available at: http://www.oneclickroot.com/how-to/how-to-protect-your-android-against-smishing/ [Accessed September 2013]. 46. Kharouni, L., (2012), "The Dangers of Posting Credit Cards, IDs on Instagram and Twitter", TrendLabs Security Intelligence Blog, Available at: http://blog.trendmicro.com/trendlabs-security-intelligence/the-dangers-of-posting-credit-cards-ids-on-instagram-and-twitter/ [Accessed September 2013].


International Journal of Cyber-Security and Digital Forensics (IJCSDF) Published by The Society of Digital Information and Wireless Communications Miramar Tower, 132 Nathan Road, Tsim Sha Tsui, Kowloon, Hong Kong

Volume 3, Issue No. 2 - 2014

Email: [email protected] Journal Website: http://www.sdiwc.net/security-journal/ Publisher Paper URL: http://sdiwc.net/digital-library/browse/66

ISSN: 2305-0012

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 84-92 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Comprehensive Solution to Mitigate the Cyber-attacks in Cloud Computing Jitendra Singh Computer Science Department PGDAV College, University of Delhi-110065 [email protected]

ABSTRACT

Cloud computing is a web based utility model. Cybercrimes affecting web based systems are equally applicable to cloud computing. Considering the sensitivity of the data involved and the damage that can be caused by cybercrimes, this work is an effort to study the various cyber threats and the methods to mitigate them. Firstly, it highlights cloud usage across the various service models (IaaS, PaaS and SaaS) and the potential threats that are applicable to these service models. Further, it highlights the industries that are most widely attacked by adversaries. Finally, we propose a comprehensive five-phase model to mitigate cybercrime, and the action needed in each phase is discussed in detail. The proposed model suggests the implementation of security in all spheres of the cloud: it starts with the sanitizing of the user's computer, continues with the implementation of security during transmission, and ends with data storage in the cloud. By adhering to the suggested model, cyber-attacks can be mitigated to a great extent.

KEYWORDS

Cyber security, cloud attack, cybercrime, resource protection, cloud utilization

I. INTRODUCTION

Cloud computing is a utility based model that can be accessed through the web. In this paradigm, the user does not need to maintain the infrastructure, which is

managed and maintained by the cloud provider. The cloud paradigm offers comprehensive solutions related to software services, development environments and infrastructure offerings. Consequently, its offered services are categorized into Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [15]. All these services can be offered via its different cloud deployment models, including the private, public, hybrid and community models [1]. Cloud computing is profoundly appealing due to its agility and the self-provisioning of resources. It enables users to provision and de-provision resources on their own in minimal time, with no or minimal support from the provider. Provisioning of extra resources is a huge challenge in legacy systems, as it requires extra investment and a cumbersome approval procedure that takes weeks to months, depending upon the organization's purchase policies. In addition, cloud computing is also appealing due to the minimal upfront cost of IT resource procurement and its increased reliability, throughput and availability [10]. Consequently, it is equally appreciated by, and applicable to, new as well as established enterprises. Resource provisioning is facilitated by the data center. In the majority of cases, data centers are located outside the data owner's country and are managed by the cloud provider. The non-maintenance of resources by the owner and the location of data centers outside the owner's country are posing


significant cyber threats. Recent cyber-attacks on this paradigm have shaken users' confidence [11-14]. Cloud attacks are not confined to a single type of industry; instead, they affect various industries including manufacturing, finance, public sector companies, etc. [2]. Considering the significance of the cloud and recent cyber-attacks, it is imperative to review the threats applicable to this paradigm, the nature of the adversaries attacking it and the potential damage that can be caused to this evolving paradigm.

II. UTILIZATION OF CLOUD RESOURCES

To cater to the wide variety of users, the cloud offers numerous service types from its different service models. Software as a service, storage as a service, recovery services, etc. are some of the prominent cloud services. The storage service is a key requirement for users who intend to save the high infrastructure cost of a data center, data center security cost, cooling cost, etc. It is a boon for small and medium enterprises (SMEs), given their inability to purchase the costly and modern equipment needed to safeguard the data. Amazon Simple Storage Service [3] and Microsoft cloud storage are some of the major offerings under this category. Recently, to cater to the needs of individual users, services like Google Drive, SkyDrive, Dropbox, etc. have also emerged. They are widely adopted due to their anytime, anywhere access. However, due care must be observed before storing data in a cloud drive, since a number of incidents have already occurred that caused data loss [11], [12].

Recently, the Aberdeen Group [4] carried out a study to determine the usage of the public cloud. According to its findings, SaaS is the preferred choice of subscribers and is ranked first, whereas cloud storage and the IaaS model ranked second and third, respectively. Among the public models, hybrid is the least used. This is illustrated in Figure 1.

Figure 1: Public cloud usage (approximate utilization from the chart: SaaS 80%, cloud storage 52%, IaaS 36%, cloud recovery 26%, cloud hybrid 21%)

III. CLOUD COMPUTING AND SECURITY

Security has remained a matter of concern since legacy systems, despite the fact that they use an on-premise model of resources and outside access is not permitted. However, security is of much greater concern in the cloud environment due to remote access and the data owner's lack of control over the resources. Because of the remote location of the user and the resources, data is transmitted between the user's PC and the cloud service provider. Therefore, the security issues that need to be managed are as follows:
• Security of the data during transmission from the user's node to the service provider's node.
• Sharing of resources among many subscribers.
• No control of resources.
• Many times, data is stored outside the owner's own country.

In addition, security in the cloud also depends on the type of cloud service model opted for by the user: the security challenges in IaaS differ from those in the PaaS and SaaS environments. Considering this dependency, the security threats applicable to each of these models are discussed in the following sub-sections.


A. Threats in IaaS

In the IaaS model, users subscribe to infrastructure that is under the control of the cloud service provider. The type of security offered by the cloud provider cannot be physically ascertained by the subscriber. The prominent threats in the IaaS model are as follows:

• Physical security of the infrastructure.
• Safeguarding the resources from external attacks such as DDoS.
• Establishing the identity of users and granting resources as per the user's authorization.
• Protecting the accessibility of resources among users.
• Protecting the bandwidth from mis-utilization by attackers.

In the IaaS environment, all of the above threats need to be managed. From an economic point of view, the last threat is extremely critical, since mis-utilization of the resources by adversaries will deprive legitimate users of access and may also leave them with a huge bill.

B. Threats in PaaS

In the PaaS environment of cloud computing, users (mainly developers) subscribe to the development resources needed to build software. Windows Azure and Google App Engine (GAE) are prominent examples in this category. The potential threats in the PaaS environment are as follows:

• Safeguarding the code of a developer from any external entity.
• Security of the code during access, transmission, and while working on the code.
• Protecting the intellectual property.
• Transparency in the software development process, particularly in the secure software development life cycle (SSDLC).

C. Threats in the SaaS environment

In the SaaS environment of cloud computing, users access the resources using a web browser or a thin client [10]. To establish a connection to the remote resources/application, the user needs to pass his credentials; if the supplied data is correct, the connection is established. Once the connection is established, the user can exchange information for the whole session. However, the SaaS environment is not without threats. The major threats in the SaaS environment are as follows:

i. Weak credentials
ii. Insecure protocols
iii. Web-based application flaws

In cloud computing, the username/password method is the most widely used method of authentication. Although other methods such as token- and smartcard-based authentication exist, the username/password is still the most widely used for login. The selection of the username and password does not depend on the cloud provider; it depends entirely on the user to select a strong password. Similarly, protocols such as HTTP, FTP, Telnet, etc. are widely used on the internet; however, using these protocols is not safe in a cloud environment and poses a tremendous threat to the user's data. Finally, the other threat is web-based application flaws, which are attributed to the extension of applications to the cloud environment. Applications that were designed for the web are not exactly suitable for the cloud environment; therefore, they may not justify their utilization and billing.
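A minimal, illustrative sketch of two client-side checks suggested by the SaaS threats above: rejecting insecure transport protocols and screening obviously weak passwords. The function names and thresholds are hypothetical and are not part of the surveyed schemes.

```python
import re
from urllib.parse import urlparse

def uses_secure_transport(url: str) -> bool:
    """Reject plain HTTP/FTP/Telnet endpoints; require a TLS/SSH-protected scheme."""
    scheme = urlparse(url).scheme.lower()
    return scheme in {"https", "ftps", "sftp", "ssh"}

def is_strong_password(password: str, min_length: int = 12) -> bool:
    """Very rough strength check: minimum length plus character-class diversity."""
    classes = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]"]
    return (len(password) >= min_length
            and all(re.search(c, password) for c in classes))

if __name__ == "__main__":
    print(uses_secure_transport("http://cloud.example.com/login"))   # False
    print(uses_secure_transport("https://cloud.example.com/login"))  # True
    print(is_strong_password("password123"))                         # False
    print(is_strong_password("T7#qVp!x9Lr@2m"))                      # True
```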


IV. STUDY ON CYBER-ATTACKS IN CLOUD COMPUTING

Cyber-attacks have emerged as an attractive business for adversaries. In 2012, the UN Office on Drugs and Crime estimated the cost of transnational organized crime at $870 billion, or 1.2% of global GDP [5], and estimated that $600 billion of this figure came from illegal drug trafficking. If cybercrime losses reach a comparable figure, their cost could also exceed $600 billion. Consequently, cybercrime is attracting attackers from various fields who commit more fraud to increase their profits.

A. Prominent types of cyber-attacks in cloud

Presently, clouds are considered more secure than the on-premises model [6]. The better security of the cloud paradigm is attributed to the availability of modern devices at the cloud data center, and cloud providers are committed to offering better security than the on-premises model to avoid the bad name that a cyber-attack would bring. Despite this better security, various attacks have already taken place in the cloud environment. Brute force, malware/botnet, and web application attacks are the major cyber-attacks witnessed in the cloud environment. However, on-premises systems are more prone to malware/botnet attacks: in the cloud, malware/botnet accounts for 11.3% of attacks, whereas in the on-premises model it accounts for 31.6% [6].

B. Industry targeted and intention of target

As per the IBM security services cyber security intelligence index, finance and insurance, ICT, health and social services are attracting cybercriminals in great numbers. Of the total cyber-attacks that took place, 20.9% were directed towards the finance and insurance domain. The various industries targeted and the percentage of attacks are illustrated in Figure 2.

Figure 2: Attack incident rates as per the industry

The majority of the attacks are caused by outsiders, who account for 50% of the total attackers [7]. However, malicious insiders also cause cyber-attacks, accounting for 20% of the overall attacks. The key objective of these attacks is to harm ICT users in one way or another. The categories of attackers are shown in Figure 3.

Figure 3: Categories of attackers

V. EXISTING SOLUTION

Considering the significance of security in cloud computing, a number of measures have already been initiated. The Cloud Security Alliance, NIST, ENISA, etc. are some of the prominent cloud security groups. These organizations identify security threats and recommend measures to strengthen security. A number of researchers are already working in this area; some of the major work is discussed in the following sub-sections.

A. Literature review

Considering the significance of cloud security, numerous initiatives have been taken by cloud providers and by governments in various countries. The work carried out in [8] is one of the major recommendations and has similarities to ours; our work extends [8] and suggests a more comprehensive solution. It is therefore worth discussing in detail, which is done in the following sub-section.

B. Existing recommendations to counter cybercrime and security risks

To identify the various cybercrime and security risks, we reviewed [8]. That work categorizes them into three groups: crime and security risks involving the cloud service provider, crime and security risks targeting cloud tenants, and attacks targeting the transmission of data. These are listed in Table 1 [8].

Table 1: Cyber crime types and their categorization [8]

1. Crime and security risks involving cloud service providers:
• Authentication issues
• Denial of service attacks and botnets
• Use of cloud computing for criminal activities
• Illegal activity by cloud service providers
• Attacks on physical security
• Insider abuse of access
• Access management issues

2. Crime and security risks targeting cloud tenants:
• Phishing of cloud computing tenants
• Domain name system attacks
• Compromising the device accessing the cloud
• Malware
• Vulnerabilities in software applications
• SQL injection

3. Attacks targeting the transmission of data:
• Session hijacking and session riding
• Man-in-the-middle attacks
• Network/packet sniffing
• Side channel attacks
• Cryptanalysis of insecure or obsolete encryption

To minimize the cyber-crimes and security threats mentioned in Table 1, the authors of [8] have proposed three main methods:

a) Technical prevention measures
b) Physical security
c) Organizational policies, awareness and training

In the proposed solution of [8], encrypting data stored in the cloud, intrusion detection and prevention systems with network monitoring, and implementing multifactor authentication are considered the methods that can address the majority of security threats and cybercrime [8].

VI. PROPOSED SOLUTION

The work in [8] recommends factors that need to be applied to mitigate cyber-attacks. In our work, we propose a comprehensive model, based on awareness, to enhance security in cloud computing. The phases to counter the cyber threats are depicted in Figure 4. To secure the cloud and mitigate the risk, the model considers selecting a secure country, subscribing to an accredited cloud provider, establishing a secure connection, physical security by safeguarding the resources, and third-party based security to address cyber-crime and security risks in cloud computing. Each phase is illustrated in Figure 4 and discussed in the upcoming sub-sections.

Figure 4: Cloud based model to counter cloud cyber attacks (phases: selecting a secure country, subscribing to an accredited cloud provider, establishing a secure connection, safeguarding the workstation, and third party security)

A. Selecting a secure country

The security of the data depends heavily on the location of the data center. For instance, if a country is affected by terrorist activities, data security in such places will be extremely low. Considering the importance of a country's location for cloud computing, a survey has been conducted by the BSA [9]. This survey ranks countries by grading them on a scale of 10, where more points indicate that a country is more secure. Using the survey's results, the level of security in a specific country can be estimated. The report revealed that South Korea scored the minimum marks; therefore, it is highly risky to subscribe to data centers located in South Korea. It is followed by Canada, which has also been considered less secure. In contrast, countries like Japan, Germany, and France are projected as the most secure in the survey, having secured 10 marks.



Figure 5: Ranking of the countries as per the cloud security (scores on a 10-point scale, ranging from 4.8 to 10)

Based on the results illustrated in Figure 5, subscribers who need the highest level of security should subscribe to a data center located in a country that secured 10 marks. The selection of a country should also depend on:

• Vicinity to the user's location.
• Compliance requirements that the subscriber needs to adhere to.

Greater vicinity to the cloud data center results in lower latency, whereas compliance with the regulatory acts will avoid legal conflicts in the future.

B. Subscribing to an accredited cloud provider

The majority of cloud providers claim to offer maximum security. However, such claims cannot be trusted by the subscriber until they have been certified by a third party. To substantiate its security offering, a cloud provider, whether minor or major, needs to be accredited to the relevant standards, which reflect the level of security offered. Selecting a reputed and accredited cloud provider plays a significant role in countering attacks. Users should therefore verify the cloud provider's accreditation to standards applicable to cloud providers, such as ISO-9001 and ISO-27001. The accreditation should also be cited on the provider's website for the user's reference. Security will be stronger with cloud providers that hold such accreditations. Once a reputed and established cloud provider has been chosen, data center safety is managed by that provider; for instance, users can subscribe to Desktop as a Service (DaaS) where the resources are stored in a UK-based data center that also provides enterprise-grade security equipped with firewalls. In this case, security is managed by the UK-based provider offering DaaS.

C. Virtual private cloud

In line with a virtual private network, the cloud can also be accessed using a virtual private network; such clouds are known as virtual private clouds (VPC), and their popularity is continuously increasing. A virtual private cloud offers capabilities similar to those of a virtual private network; as a result, security during cloud usage improves manifold. Many cloud providers, for instance Amazon, offer a VPC with the flexibility to assign IP addresses as decided by the cloud user. Resources that need to be shared and resources needed for exclusive use can be assigned different IP addresses, which further improves security. Cloud providers also offer security groups and security policies to improve security for individual users accessing the cloud. By subscribing to a VPC, users gain security during connectivity from the client end to the cloud provider's end.

D. Avoid downloading and installing unnecessary software on the client used for cloud access

Cloud offers wide network access, facilitating access from various devices including desktops, laptops, and mobile devices. Considering the malware used to attack the cloud, it is important that the client devices used to access the cloud be safeguarded from malware. To secure the client from malware, the following measures can be helpful:

• Not installing software that is not needed.
• Using a reliable browser.
• Updating the browser from time to time.
• Periodically checking devices for malware infection.

E. Third party based security

To minimize upfront costs in the cloud, powerful resources such as processors, memory, and hard disks are made available on the cloud provider's side, and the user accesses the cloud with the help of a less powerful client. However, client devices need to be secured from various types of threats, including viruses and malware, in order to safeguard the cloud resources. Purchasing up-to-date and sophisticated security software for the client end may be costly, particularly for SMEs. Such users can therefore opt for cloud-based security offered by a third party. These third parties have the updated software, skilled personnel, modern devices, etc. that are deployed to counter security attacks; deploying comparable devices and hiring such skills in-house may not be feasible for SMEs and individuals.
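A minimal sketch that pulls the five phases of the proposed model together as a pre-subscription checklist. The ProviderProfile fields, the score threshold, and the standards list are hypothetical illustrations, not part of the model itself.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical provider profile; field names are illustrative, not a real API.
@dataclass
class ProviderProfile:
    country_score: float                 # BSA-style score on a 10-point scale
    accreditations: List[str] = field(default_factory=list)
    offers_vpc: bool = False
    client_hardened: bool = False        # workstation free of unnecessary software/malware
    third_party_security: bool = False

def five_phase_check(p: ProviderProfile, min_country_score: float = 9.0) -> List[str]:
    """Return the phases of the proposed model that are not yet satisfied."""
    gaps = []
    if p.country_score < min_country_score:
        gaps.append("select a more secure data-center country")
    if not {"ISO-9001", "ISO-27001"} & set(p.accreditations):
        gaps.append("subscribe to an accredited cloud provider")
    if not p.offers_vpc:
        gaps.append("establish a secure (virtual private cloud) connection")
    if not p.client_hardened:
        gaps.append("safeguard the client workstation")
    if not p.third_party_security:
        gaps.append("arrange third-party based security")
    return gaps

if __name__ == "__main__":
    candidate = ProviderProfile(country_score=8.8,
                                accreditations=["ISO-27001"],
                                offers_vpc=True)
    print(five_phase_check(candidate))
```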

VII. CONCLUSION

Cloud computing is proliferating at an amazing pace and is perceived as a substitute for legacy systems. However, it faces tremendous challenges from cyber threats, which differ across the service models. These cyber threats and attacks affect various types of industries, including finance, manufacturing, web hosting, etc., and their intensity varies with the benefit the attackers can realize. Numerous methods exist to counter cyber threats; however, they do not consider all aspects of cloud threats. To mitigate these threats, we have suggested a comprehensive five-phase model. This model considers the various aspects of security in the cloud paradigm, such as the selection of the provider, the data center location, and the software installed on the client. The suggested model can manage security at the client side, during transmission, and at cloud storage. If potential cloud subscribers adhere to the suggested model, cyber-attacks can be mitigated to a great extent and probable data loss can be avoided.

REFERENCES

[1] P. Mell and T. Grance, "The NIST definition of cloud computing", Special Publication 800-145, National Institute of Standards and Technology, 2011, available at http://csrc.nist.gov/publications/PubsSPs.html#800-145.

[2] Verizon, "Data Breach Investigations Report", 2013, available at www.verizonenterprise.com/DBIR/2013 (accessed 25 Feb. 2014).

[4] Aberdeen Group, "The emerging industry of cloud computing", Analyst Insight, 2013.

[5] UNODC, http://www.unodc.org/unodc/en/frontpage/2012/October/transnationalcrime-proceeds-inbillions-victims-in-millions-says-unodcchief.html.

[6] AlertLogic, "Targeted attacks and opportunistic hacks", State of Cloud Security Report, 2013.

[7] IBM, "IBM security services cyber security intelligence index", IBM Global Technology Services, Security Services, 2011.

[8] A. Hutchings, R. G. Smith and L. James, "Cloud computing for small business: Criminal and security threats and prevention measures", Trends & Issues in Crime and Criminal Justice, No. 456, May 2013.

[9] BSA, "BSA Global Cloud Computing Scorecard: A Blueprint for Economic Opportunity", 2013, available at http://cloudscorecard.bsa.org/2013/assets/PDFs/BSAGlobalCloudScorecard2013.pdf (accessed 20 Feb. 2014).

[10] B. Hayes, "Cloud computing", Communications of the ACM, Vol. 51, No. 7, pp. 9-11, 2008.

[11] A. Hall, "Recent phishing attack targets select Microsoft employees", 24 Jan. 2014, available at https://blogs.technet.com/b/trustworthycomputing/archive/2014/01/24/post.aspx (accessed 1 Feb. 2014).

[12] C. Green, "Dropbox hit by Zeus phishing attack", Oct. 2013, available at http://www.information-age.com/technology/security/123457411/-dropbox-hit-by-zeus-phishingattack.

[13] R. Westervelt, "Phishing attack, stolen credentials sparked South Carolina breach", available at http://searchsecurity.techtarget.com/news/2240172466/Phishing-attackstolen-credentials-sparked-South-Carolinabreach?asrc=EM_NLN_19698566&track=NL-102&ad=883490.

[14] C. Metz, "DDoS attack rains down on Amazon cloud", Oct. 2009, available at http://www.theregister.co.uk/2009/10/05/amazon_bitbucket_outage/ (accessed 1 Feb. 2014).

[15] G. Boss, P. Malladi, D. Quan, L. Legregni and H. Hall, "Cloud computing", 2009, http://www.ibm.com/developerswork/websphere/zones/hipods/library.html.


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 93-105 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Security and Integrity of Data in Cloud Computing Based on Feature Extraction of Handwriting Signature Ali A. Yassin1, Hikmat Z. Neima2, and Haider Sh. Hashim1 1 Computer Science Dept., Education College for Pure Science, Basrah University, Basrah, 61004, Iraq, 2 Computer Science Dept., Science College, Basrah University, Basrah, 61004, Iraq, [email protected]

ABSTRACT Cloud computing allows users to store their data in the cloud remotely, relieving them of the burden of local data storage and maintenance. In doing so, the user loses direct control over his remotely located data, which raises security challenges such as the authority and integrity of the data. One significant concern that needs to be addressed is assuring the user of the integrity, i.e. the correctness, of his data in the cloud. Since the user cannot continuously access the cloud's data directly, the cloud must provide a technique for the user to check whether the integrity of his data is preserved or has been compromised. In this paper, we propose encrypted data-integrity protection based on feature extraction of the handwritten signature in a modern encryption scheme that preserves the integrity of data on the cloud server. Any prohibited data modification, removal, or addition can be detected by the cloud user. Additionally, our proposed scheme presents a proof of data integrity in the cloud by which the user can learn the true state of his data on the cloud server. We employ the user's handwritten signature to secure and verify his data on the cloud server. Extensive security and performance analyses show that our proposed scheme is highly efficient and provably secure. In addition, the processing time decreases and the compensation ratio of data integrity increases.

KEYWORDS Cloud computing, handwriting, Feature Extraction, data integrity, Security

1 INTRODUCTION Recently, we are witnessing an increasing interest in cloud computing: many Internet vendors, including Amazon, Google, and Microsoft, have

introduced various cloud solutions to provide computing resources, programming environments and software as a services in a Pay-As-You-Go manner. For example, Amazon introduces Amazon Elastic Compute Cloud (EC2) which provides computing cycle as a services and Google introduces Google App Engine (GAE) to provide programming environments as a service [1, 2]. This interest of cloud computing is due to its significant features which can be summarized according to the National Institute of Standards and Technology (NIST) as follows [3] (1) Ondemand self-service: A user can be unilaterally supplied with computing facilities; (2) Wide network access: All the services can be obtained through Internet; (3) Resource pooling: Service provider’s computing resources are available ondemand and for multiple users. Large number of physical and virtual resources can be automatically assigned and reassigned according to the user’s demand; (4) Rapid flexibility: Services and resources can be dynamically scaled up and down; (5) Measured service [3, 4]. In particular, cloud components are gradually more popular even though a doubting security and privacy problems are slowing down their acceptance and success. Indeed, saving of user data in a cloud server regardless of its benefits has several interesting security concerns which require to be extensively studied for making it a trustworthy solution to the issue of averting local storage of data. Several issues such as data authority and integrity i.e., how to proficiently and securely guarantee that the cloud storage 93

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 93-105 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) server side preserves truthful and complete results in reply to its user’s queries [4, 5]. Data integrity considers one of the most serious components in any system. It is easily accomplished with a standalone system, when data integrity deals with a single database. In this case, data integrity is responsible for maintaining database through a series of constraints and transactions. The situation is different in the distributed system; there are several databases and many applications. With a view to maintain data integrity in this system, transactions via several data sources need to be taken exactly in the fail safe manner. This state needs to use manger of a central global transaction. At the same time, each application in the distributed system must have ability to take part in the global transaction across a resource manager. Data integrity in cloud computing, it refers in the same meaning by ensuring the integrity of remote data saved at the un-trusted cloud servers. In this case, we deal with the issue of implementing a protocol for getting a proof of data ownership in the cloud. This issue attempts to get and validate a proof that the data that is saved by a real user at remote data storage in the cloud side is not updated by the archive and thus the integrity of data is confident. This verification system does not allow the cloud storage archives from changing the data stored in it without the permission of the data owner by using a multi-test on the storage archives. Furthermore, the cloud server could defraud the cloud users in two manners: 1. The cloud server calculates some functions and sends back a random number, but the claims remain needed some computations to complete transaction. 2. The cloud server selects some miss data which does not require highest computational cost and claims to use the valid data while the original data is wrong. In this paper we focus on the important issue of implementing a protocol for getting a proof of data ownership in the cloud sometimes denoted to as Proof of retrievability (POR). This issue tries to get and validate a proof the data that is saved by real user at a remote data storage which is called

cloud storage archives or simply archives. This type of storage has a good feature which does not allow modifying it by the archive and in this manner the integrity of the data is confidential. Cheating, in this environment, refers that the storage archive has ability to delete some of the data or perform some modifications on data. It must be distinguished that the storage server has immune against malicious attacks; as a substitute, it might be simply untrustworthy and lose the hosted data. Here, the data integrity schemes must to detect any modifications that may happen to data users in cloud storage servers. Any like these proofs of data ownership schemes do not perform by themselves, preserve the data from fraud by the archive. It just gives permission to reveal of tampering or modifying of a remotely sited file on an untrustworthy cloud storage server. While enhancing proofs for data ownership at untrustworthy cloud storage servers we are often restricted by the number of resources on the cloud server in addition to the client. Additionally, we propose an efficient and secure data integrity scheme based on Merkle hash tree and feature extraction from user’s handwriting. Additionally, our scheme does not require extra device or software compared with previous works in biometric field. Also, we provide a scheme which gives a proof of data integrity in the cloud which the customer can employ to check the correctness of his data in the cloud. Our proposed scheme was enhanced to minimize the computational and storage operating cost of the client side as well as to reduce the computational fixed cost of the cloud storage server. The encryption function of data commonly requires a large computational power. In our proposed scheme the operation of encryption is not there and hence preserving cost and time of computation in the client side. Additionally, we developed Merkle hash tree by making a user’s query to work one-time which leads to prevent an adversary from applying his malicious attacks such as Man-in-the-Middle (MITM) attack, insider attack, and replay attack. In addition, our proposed scheme provides many pivotal merits: more functions for security and effectiveness, mutual verification, key agreement, dynamic data support, recoverability when some data blocks are lost, 94

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 93-105 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) unlimited number preserving.

of

queries,

and

privacy

The rest of this paper is organized as follows. The necessary primitives and requirements of our scheme exist in section 2. An overview of related work is displayed in section 3. The proposed scheme is addressed in section 4. Security analysis and experimental results are existed in section 5. Conclusions are presented in section 6. Fig. 1. The architecture of cloud data storage services

2 DESIGN ISSUES

2.1 Problem Definition

We assume a cloud data storage service consisting of three different components, as shown in Fig. 1. The first is the cloud user (CU), who possesses many data files to be saved in the cloud; the second is the cloud server (CS), which is controlled by the third component, the cloud service provider (CSP), which provides the data storage service and has significant storage space and computation resources. The CSP must ensure that all significant data are protected and that only authorized users have access to the data in its entirety. It must also be able to ensure that applications made available as a service over the cloud are secure from adversaries. We assume a general cloud computing model consisting of n cloud servers S1, S2, ..., Sn, which may be monitored by one or more CSPs. The CU delegates his data to the cloud servers, employs them as data storage, and submits some functions for computation. The cloud service provider can deceive the user in two ways:

1. The CSP can remove some seldom-accessed data files to decrease the storage cost, or update the stored data of users, violating data integrity. This is called storage misuse.
2. The cloud server selects some incorrect data that carries the lowest computational cost and claims to have used the valid data, while the original data of the user is lost. This is known as compromising computation.

Fig. 2. The structure of our proposed scheme

Additionally, we must refer to significant component that is called the third party auditor (TPA); this component has skill and capability that is trusted to evaluate the security of cloud storage service instead of the user upon request. Users depend on the CS for saving and preserving their data. They may also automatically cooperate with the CS to arrive and update their stored data for different application purposes. Sometimes, the users rely on TPA to guarantee the storage security of their data, while wishing to preserve their data private from TPA. Fig. 2 shows the basic architecture of our proposed data integrity scheme. Our scheme comprises of three components that have previously mentioned. The overall work can be divided into two phases: Configuration Phase and Verification Phase. The first phase consists of two steps: 1) Generation of meta-data; 2) Encrypting the Meta-data. In the generation of meta-data stage, each user registers his identity information (username, password, and signature handwriting handF) and his data file (F) in CA. Then, CA extracts features from user’s signature handwriting 95

and then splits it into m bytes. Then, CA divides the data file F into n data blocks, and each data block is split into m bytes. The meta-data is encrypted by embedding each of the m bytes of a data block of F with the corresponding m bytes of handF. Fig. 3 shows the mechanism of this phase. Finally, the original data and the secure meta-data are stored on the cloud server. In the verification phase, assume the verifier V wishes to verify the integrity of the original data file F. It sends a challenge to the cloud server and requires it to respond. The challenge and the response are compared, and V reports the result, accepting or rejecting the integrity proof, by using the feature extraction of the signature handwriting and the Merkle hash tree. We also note that our proposed scheme does not require a TPA, which gives it more privacy, performance, and efficiency (see Fig. 2).

2.2 Merkle Hash Tree

In cryptography, a Merkle tree is a binary tree consisting of many nodes, in which each non-leaf node is labelled with the hash of the labels of its child nodes. Hash trees are useful because they provide flexible and secure verification of the contents of many data structures: to show that a leaf node is part of a given hash tree, one only needs to offer an amount of data proportional to the logarithm of the number of nodes in the tree. The mechanism of the hash tree is demonstrated in Fig. 4. The leaves are generated by hashing data blocks of, for example, a file or components of files; hash 0 is then the hash of the concatenation of its children (hash 0-0 and hash 0-1), where || denotes the concatenation function.

Definition (Merkle tree). A Merkle tree is a binary tree with a string value attached to each node such that

value(parent) = h(value(left child) || value(right child)),   (1)

where h is a one-way hash function (see equation 1).
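A minimal sketch of a Merkle hash tree with proof generation and verification, illustrating the definition above. It is not the authors' exact construction: SHA-256 and the convention of duplicating the last node when a level has an odd number of entries are assumptions made purely for illustration.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle_tree(blocks):
    """Return the tree as a list of levels; level 0 holds the leaf hashes."""
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node if the level is odd
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def merkle_proof(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1                     # sibling of node i is i XOR 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify_proof(leaf_block, proof, root):
    """Recompute the root from a leaf block and its sibling path."""
    node = h(leaf_block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

if __name__ == "__main__":
    blocks = [b"datablock1", b"datablock2", b"datablock3", b"datablock4"]
    levels = build_merkle_tree(blocks)
    root = levels[-1][0]
    proof = merkle_proof(levels, 0)
    print(verify_proof(b"datablock1", proof, root))   # True
    print(verify_proof(b"tampered", proof, root))     # False
```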

Fig. 3. The mechanism of computing secure meta-data

The root value of this tree have denoted to public value, while all the values connected with leaf preimages are identified by the “tree owner” alone. 2.3 Feature extraction of digital handwriting Signature recognition represents one of the oldest and most significant biometric authentication schemes, with wide-spread official acceptance. Handwritten signatures are usually used to approbate the components of a document or to verify a financial transaction [6]. An important benefit biometrics signature as the human tinge for biometric authentication via other features is their long standing convention in several regularly encountered verification jobs. In other side, the signature verification process is already recommendable by the general public. Furthermore, it is also comparatively less expensive than the other biometric schemes [6, 7]. The difficulties connected with biometric signature verification systems due to the wide intra-class variations, make biometric signature verification a complex pattern recognition issue. This scheme does not need additions cost (such as digitizing tablet, the pressure sensitive pen) like online methods, just requires a pen and a paper, and are therefore less persistent and more users friendly. In off-line biometric signature verification, the signature exists in a paper which is scanned to acquire its digital image. There are many types to extract features from biometric digital such basic functions, geometric normalization, extended functions, time derivatives, and signal normalization. In this paper, we focus on the basic function to extract main features for each user’s signature and then employed them to work as the main factor to generate meta- data. 96


Fig. 4. Merkle hash tree structure

In the first type, the biometric signature representation depends on the following five elements: the horizontal and vertical position trajectories, the azimuth and altitude of the pen relative to the tablet, and the pen's pressure signal. The value n = 1, ..., N denotes the discrete time index given by the acquisition device, and N is the duration of the biometric signature in sampling units. Consequently, the basic function set consists of the position trajectories, the azimuth and altitude angles, the pressure signal, and a synthetic timestamp marking pen-ups.

2.4 Features of remote data integrity testing protocols

Any remote data integrity checking scheme should satisfy the following main conditions:

1. Privacy preservation: the TPA has no ability to obtain knowledge of the real user data during the auditing process.
2. Unlimited number of queries: the verifier is allowed to issue an unbounded number of queries in the challenge-response process for data verification.
3. Data dynamics: clients can perform operations on data files, such as add, delete, and update, while maintaining data correctness.
4. Public verifiability: anyone must be permitted to confirm the integrity of the data.
5. Blockless verification: challenged file blocks must not be retrieved by the verifier during the verification phase.
6. Recoverability: for checking the correct ownership of data, some scheme to retrieve lost data is required.
7. With the help of a TPA.
8. Untrusted server.

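A minimal sketch of the basic function set described in Section 2.3. The sample values and the byte quantisation used to feed the meta-data generation are hypothetical; a real system would acquire the samples from a digitizing tablet or a scanned signature image.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SignatureSample:
    x: float          # horizontal position
    y: float          # vertical position
    azimuth: float    # pen azimuth angle
    altitude: float   # pen altitude angle
    pressure: float   # pen pressure (0 = pen up)

def basic_function_features(samples: List[SignatureSample]) -> List[List[float]]:
    """Basic function set per discrete time index n = 1..N:
    (x_n, y_n, azimuth_n, altitude_n, pressure_n, pen_up_n)."""
    return [[s.x, s.y, s.azimuth, s.altitude, s.pressure,
             1.0 if s.pressure == 0 else 0.0]
            for s in samples]

def feature_bytes(features: List[List[float]]) -> bytes:
    """Quantise the feature vectors to bytes so they can drive meta-data generation."""
    flat = [v for row in features for v in row]
    return bytes(int(abs(v)) % 256 for v in flat)

if __name__ == "__main__":
    samples = [SignatureSample(12.0, 30.5, 45.0, 60.0, 0.8),
               SignatureSample(13.2, 31.0, 44.0, 59.0, 0.0)]
    feats = basic_function_features(samples)
    print(feats)
    print(feature_bytes(feats).hex())
```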

3 RELATED WORK Simply, the Proof of retrivability (POR) method can be generated by using the keyed hash function . In this approach the verifier, before archiving the data of original file in the cloud storage side uses the cryptographic hash of by using at a first process and then, saves this result with the secret key as a second process. To ensure if the integrity of the original file is missing, the verifier sends the secret key to the cloud archive side and requires it to calculates and sends back the value of . By saving multiple hash values for variant keys the verifier can test for the integrity of the original file for several times, each one being a self-determining proof. Though this scheme considers very simple and effortless implementable, requires high resource costs for the implementation. On the verifier side this includes saving as several keys to check the integrity of the original file. Additionally, computing the hash function for many data files can be heavy for some clients such as mobile phones. In the archive side, each call of the protocol needs the archive to process the full file . This can be computationally troublesome for the archive even for using a simple operation like hashing [6]. Juels and Burton presented a scheme called Proof of retrieve-ability for huge files using ”sentinels” [8]. This scheme is different from the key-hash scheme, only the single key can be employed regardless of the size of the file or the amounts of files that retrieve-ability it wishes to verify. Additionally, the archive requires contacting only a small section of the file F . This small segment of the file is autonomous of the length of the file. In this scheme, the cloud user must to note these segments of the sentinel values as well as the number of times that a cloud user side challenging the cloud server side is more limited. Ateniese et al. [7] proposed “Provable Data Possession” model for checking possession of files on the unconfident storages. In their scheme, they demonstrate RSA- relying on homomorphic tags for auditing outsourced data. In this scheme the 97

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 93-105 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) cloud user should pre-compute the tags at first and save all the tags at the second. These tags require a lot of computation and space of storage. Shacham and Waters [10] employed the homomorphic features for ensuring from the integrity of data. Chang and Xu [11] assisted the MAC and reed solomon code for testing the remote integrity. So, the homomorrphic features, MAC and the reed Solomon code cannot be used to check the validity of computations. Sravan Kumar R and Ashutosh Saxena presented a scheme which includes the partial encryption of the whole data file. They encrypted only little bits of data for each data block thus decreasing the computational operating cost on the clients. In their scheme the verifier requires to save only a single cryptographic key regardless of the size of the file. The verifier will save each original data file and meta- data in the archive. In the verification phase, the verifier uses this Meta data to verify block data in the original data file. Their work is considered a good for the soft clients but when it performs to giants then there will be required a lot of computational overhead. Scheme [12] is depended exclusively on symmetric-key encryption. The essential process is that, before outsourcing, data owner precomputes several verification tokens, each one presenting some set of data blocks. The real data is then given over to the server. Consequently, when the data owner wants to achieve a proof of data ownership, he sends his challenges values to the server. In the server side, he computes a short integrity test over the specified blocks and comebacks it to the owner. This scheme does not support the public verifiability, privacy preservation, and the number of quires is limited. Wan et al. [13] proposed a scheme allowed a third party auditor (TPA) to validate the integrity of the dynamic data saved in the cloud servers. This scheme describes by many features such as no privacy preservation, fully dynamic data operation, and block less verification. Hao et al. [14] presented a new remote integrity checking scheme depended on homomorphic verifiable tags. This scheme has the procedures SetUp, TagGen, Challenge, GenProof and CheckProof, in addition to functions for data dynamically. The drawbacks of this scheme do not

have ability to recover the lost or corrupted data. Table 1 describes a comparison of security properties between our proposed scheme and previous works.

Table 1: Comparison of security properties between our proposed scheme and previous works

Property | Our proposed scheme | Ateniese et al. [12] | Wan et al. [13] | Hao et al. [14]
C1 | Yes | No | No | Yes
C2 | Yes | No | Yes | Yes
C3 | Yes | Yes (not fully dynamic) | Yes | Yes
C4 | Yes | No | Yes | Yes
C5 | Yes | Yes | Yes | Yes
C6 | Yes | No | No | No
C7 | No | No | Yes | No
C8 | Yes | Yes | Yes | Yes

Our proposed scheme will minimize the computational and storage operating cost of the client and reduces the computational overhead in the cloud storage server side. It also decreases the size of the proof of data integrity so as to minimize the network bandwidth burning up. In our data integrity scheme the verifier requires to save only a feature extraction of user’s handwriting that used for generating encrypted Meta-data and then appended to original data file before storing the file at the archive side. At the time of verification, the verifier employed this meta-data to validate the integrity of the data. It is significant to know that our proof of data integrity just ensures the integrity of data (i.e. if the original data has been criminally modified or omitted. It does not avoid the archive from updating the data. In Merkle hash function, Wang et al. [15] proposed a scheme which gains to third party auditor to have the ability to confirm the rightness of the stored data on demand. Additionally, this scheme uses Merkle hash tree to allow the clients to process block-level operations on the original data files while processing the same level of data truth assurance. In this scheme, the third party verifier has the capability to misuse the data while they are performing the verification operation. Lifei et al. [14] presented a technique for ensuring the rightness of computations done by a cloud service provider. In this scheme, they have employed the Merkle hash tree to validate the 98

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 93-105 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) accuracy of the computation. The weakness in this scheme refers to the number of cloud user’s computations who submits to the service provider must be in the power of 2, so the Merkle hash tree can be generated for the nodes of power 2. Our proposed scheme enhances the existing proof of storage techniques by manipulating the classic Merkle Hash Tree structure for preserving authentication protocol. Additionally, we improved Merkle hash tree by making a user’s query to work one-time which leads to prevent an adversary from applying his malicious attacks such as Man-in-the-Middle (MITM) attack, insider attack, and replay attack. 4 OUR PROPOSED SCHEME The common notations in Table 2 will be used throughout this scheme. The client must be performed some processes to its original data file F before saving its data in cloud servers. The client extracts features from his signature and creates appropriate meta-data which is employed in the later phase of verification which ensues from the data integrity in the cloud storage. When the verifier wishes to validate the integrity of the file F, a user presents a challenge to the target server and requires the server to respond. The challenge detects the block number and the position of the byte number in the data block that possess to be verified. The server replies with two values (i) the first value of meta-data and (ii) the second value of the original data. The verifier uses feature extraction of signature handwriting to decrypt the metadata and ensures if the decrypted value is equal to the value of the original data. If the result is true then integrity is assured. The main structure for checking integrity between the cloud server and user is described in the Fig 5.

Table 2 Notations of our proposed scheme Symbol Definition Cloud server. Verifier. Two random numbers are used by CS and V to generate shared key between them. The shared key is between V and CS. It refers to the j’th byte in the i’th block of meta-data file. It refers to the j’th byte in the feature extraction of signature’s handwriting file . The challenges parameters that are sent from V to CS. The challenges parameters that are sent from CS to V. refers leaf that is selected by V to send as challenge to CS. Other miscellaneous values which are used in the verification.

4.1 Configuration Phase

This phase is divided into three stages. In the first stage, the Verifier (V) and the Cloud Server (CS) agree on a shared key. In the second stage, the Verifier prepares the meta-data to be used in the next phase. The third stage ensures the correctness of the computations done by the cloud server, for which we use the Merkle hash tree; this is necessary for ensuring the authenticity and integrity of the outsourced data.

First stage:
1. The verifier selects a random number, computes the corresponding public value, and sends it to the cloud server;
2. The cloud server chooses its own random number, computes its public value and the shared key, and sends its public value to the verifier;
3. The verifier computes the shared key.
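The specific quantities exchanged in the first stage are not recoverable from this copy; the description reads like a Diffie-Hellman-style agreement, so the sketch below assumes exactly that, with deliberately tiny (insecure) toy parameters for illustration only.

```python
import secrets
import hashlib

# Toy (insecure) public parameters; a real deployment would use a standard
# 2048-bit group or an elliptic-curve exchange instead.
P = 0xFFFFFFFB   # small prime, illustration only
G = 5

def dh_keypair():
    """Pick a random private exponent and derive the public value to send."""
    private = secrets.randbelow(P - 2) + 1
    public = pow(G, private, P)
    return private, public

def shared_key(own_private, peer_public) -> bytes:
    """Derive the shared key SK from the peer's public value."""
    secret = pow(peer_public, own_private, P)
    return hashlib.sha256(str(secret).encode()).digest()

if __name__ == "__main__":
    # Verifier (V) and cloud server (CS) each pick a random number and
    # exchange only the derived public values.
    v_priv, v_pub = dh_keypair()
    cs_priv, cs_pub = dh_keypair()
    sk_v = shared_key(v_priv, cs_pub)
    sk_cs = shared_key(cs_priv, v_pub)
    assert sk_v == sk_cs
    print("shared key:", sk_v.hex())
```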

Second stage: The verifier V performs the following steps:
1. Extract the features from the signature handwriting and then divide them into m bytes.
2. Split the original file F into n data blocks and split each data block into m bytes.


depending on the integrity proof. The main steps of this phase are described as follows:

1. The verifier challenges the cloud storage server by forming the challenge value with the following computations:
• The verifier selects the block number and the byte position.
• The verifier generates a random number.
• The verifier encrypts the significant parameters and sends them to the cloud server.

2. The cloud server performs the following operations:

Fig. 5. Our proposed scheme diagram

3. Compute the meta-data by combining each byte of the data block of F with the corresponding byte of handF, as given by Eq. (2). Hence, the presence of the feature extraction in the meta-data keeps the meta-data secure.
4. Add the meta-data to the original data file.
5. Save the inserted meta-data and the original data on the cloud server.
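Eq. (2) itself is not recoverable from this copy, so the sketch below assumes a byte-wise XOR embedding purely for illustration, and then shows the challenge-response comparison used in the verification phase: the server returns the challenged data byte and meta-data byte, and the verifier recomputes the expected meta-data from its locally held signature features.

```python
import secrets

def generate_metadata(file_blocks, signature_bytes):
    """Meta-data byte m[i][j] derived from the file byte and a signature-feature byte.
    The paper's exact embedding (Eq. 2) is not reproduced here; XOR is assumed."""
    return [bytes(b ^ signature_bytes[j % len(signature_bytes)]
                  for j, b in enumerate(block))
            for block in file_blocks]

def server_respond(stored_blocks, stored_metadata, i, j):
    """Cloud server returns the challenged data byte and meta-data byte."""
    return stored_blocks[i][j], stored_metadata[i][j]

def verifier_check(data_byte, meta_byte, signature_bytes, j) -> bool:
    """Verifier recomputes the expected meta-data byte and compares."""
    return meta_byte == (data_byte ^ signature_bytes[j % len(signature_bytes)])

if __name__ == "__main__":
    sig = bytes.fromhex("1f8a33")                 # toy signature-feature bytes
    blocks = [b"block-one-data", b"block-two-data"]
    metadata = generate_metadata(blocks, sig)

    i = secrets.randbelow(len(blocks))
    j = secrets.randbelow(len(blocks[i]))
    d, m = server_respond(blocks, metadata, i, j)
    print(verifier_check(d, m, sig, j))           # True: data intact

    tampered = [bytearray(b) for b in blocks]
    tampered[i][j] ^= 0xFF
    d2, _ = server_respond([bytes(b) for b in tampered], metadata, i, j)
    print(verifier_check(d2, m, sig, j))          # False: modification detected
```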

Third stage: the cloud user selects a vector consisted n elements that have been derived from the feature extraction signature randomly. Then, he submits this vector to the cloud server for constructing Merkle hash tree which can be organized for the number of leaves in power of 2. 4.2 Verification Phase Let the verifier wishes to verify the integrity of the original file . It sends a challenge to the archive and requires it to respond. The two values of challenge and the response are compared and the verifier decides to accept or reject by

Retrieve the significant parameters by using ). Compute . Compute one-time shared key as follows: . Send to the verifier.

3. . The verifier computes the shared key to decrypt by using encryption function After that, Verifier performs some computations.  The verifier performs the inverse function of Eq. 3 as follow: …(3) 

He computes and checks whether if the result is true, then the data is not modified, Verifier selects a value of one leaf that exist at the end level of the tree, encrypts this value If

the result is false, the data is modified, Verifier returns original data block to the cloud server for recovering the original data block which is lost or modified, selects , computes and then sends it to CS. From above steps (2, 3), the data integrity has been detected by using biometric signature thereby calming the data integrity. 4. . The cloud server retrieves the by decryption function and discovers in the Merkle 100

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 93-105 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) hash tree a path from the leaf to the root by depending on . For example, in figure 4, the challenge on needs to calculate a path with the vertices {datablock1, hash0-1, hash 0, Root hash}. The cloud server computes the hash value of root and sends the sibling sets of nodes in path (from Xm to the root) to the cloud user with . 5. The verifier obtains the values from the cloud server and generates the hashed root during the result and the sibling value set submitted by the verifier. If the matches with , the verifier authenticates that the computations are done acceptably. Otherwise, the verifier returns original sibling sets to for restoring the lost sibling sets. 6. Cloud server decrypts by computing and then restitutes the lost data as a main step for recovering lost data. 4.3 Data Dynamics The proposed scheme provides data dynamics at the block level, which contains a block modification, block insertion, and block deletion. In cloud data storage, we notice several potential scenarios where data saved in the cloud is dynamic such as e-documents, video. Therefore, it is essential to consider the dynamic state, where a client may wish to execute the above operations while maintaining the storage exactness assurance. For performing any data dynamic operation, the client must first create the corresponding produced file blocks and sends challenges to CS for ensuring from his validity. Next we view how our scheme supports these operations. Data Modification: We begin from data modification process, which is considered one of the most commonly used operations in the cloud data storage. An essential data modification operation denotes to the replacement of determining blocks with new ones. Assume the client wishes to modify the block number i and the byte position j. On the first, depend on the new block , the client computes meta-data of a new block

computes encryption function and then sends it to the , where refers to the modification operation. Upon receiving the request, the cloud server executes update operation by decrypting for retrieving . Constantly, the cloud server: (i) replaces the block data with and outputs ; (ii) replaces the with ; (iii) replaces with , in the Merkle hash tree structure and creates the new root (see the example in Fig. 6). Finally, the cloud server responses the client with a proof for this operation by computing . After receiving the proof for updating operation of the cloud server, the client first creates root based on a new data block and authenticates the cloud server by comparing with . If it is true, the update operation has successfully. Data Insertion: Compared to data modification operation, which does not update the logic organization of client’s data file, data insertion, denotes to append new data block after some specified locations in the original data file F. Assume the client wishes to add block after the i'th block . The mechanisms of processing are similar to the data updating state. At begin, based on the client constructs meta-data of new block , computes encryption function and then sends it to the , where refers to the insertion operation. Upon receiving the request, the cloud server executes insert operation by decrypting for retrieving . Continually, the cloud server (i) saves , adds and “after” and , respectively. Then, he adds a leaf “after” leaf in the Merkle hash tree and outputs F ; (ii) he creates the new root . Finally, the cloud server checks the validity of client by computing . After receiving the proof for updating operation of the cloud server, the client first creates root based on a new data block and authenticates the

101

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 93-105 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) cloud server by comparing with . If it is true, the insert operation has successfully.

Fig. 6 Views example of the block modification operation
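A hedged sketch of the block-modification flow described above: replace the block, recompute its meta-data, and rebuild the Merkle root that the server reports back to the client. It reuses the assumed XOR meta-data and SHA-256 Merkle construction from the earlier sketches, not the authors' exact formulas.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    """Recompute the Merkle root over the current set of data blocks."""
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def modify_block(blocks, metadata, signature_bytes, i, new_block):
    """Block modification: replace block i, recompute its meta-data (XOR assumed)
    and return the new Merkle root the server should report back."""
    blocks[i] = new_block
    metadata[i] = bytes(b ^ signature_bytes[j % len(signature_bytes)]
                        for j, b in enumerate(new_block))
    return merkle_root(blocks)

if __name__ == "__main__":
    sig = bytes.fromhex("a17c")
    blocks = [b"b1", b"b2", b"b3", b"b4"]
    metadata = [bytes(b ^ sig[j % len(sig)] for j, b in enumerate(blk)) for blk in blocks]
    old_root = merkle_root(blocks)
    new_root = modify_block(blocks, metadata, sig, 1, b"b2-updated")
    # The client recomputes the expected root from the new block and compares:
    print(old_root != new_root, new_root == merkle_root(blocks))   # True True
```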

Data Deletion: We can describe this operation as the opposite operation of data insertion. For deleting any data block, it denotes to deleting the detected block and shifting all the latter data blocks one block forward. Assume the cloud server receives the update request for deleting block , where refers to the deletion operation, he will delete each of from its storage space. Then, he omits the leaf node in the Merkle hash tree and constructs the new root . The details of this operation are similar to that of data updating and insertion, which are thus deleted here. 5 SECURITY ANALYSES AND EXPERIMENTAL RESULTS In this section, we analyze the security features of our proposed scheme and view the comparison of file sizes for the original data and metadata by using the signature of handwriting. 5.1 Security Analysis Proposition 1. Our proposed scheme can supply mutual verification. Proof. This security feature means that an adversary cannot impersonate the legal V to CS, and vice versa. Only the genuine verifier who possesses the secret factors can successfully bring the factors to the cloud server. In this state, CS can decrypt each of , and

computes . If so, a verifier is genuine. At the same time, the verifier can compute that must be decrypted by using shared key SK. So, SK generates once for each verifier’s request. Also, the verifier can detect the authority of CS by comparing each of and with and . Furthermore, it depends on features extraction of verifier’s signature handwriting. Therefore, our proposed scheme achieves mutual verification between the two entities (see Fig. 5). Proposition 2. Our proposed scheme can forward Secrecy. Proof. Our proposed scheme protects of the password even when the shared key is disclosed or leaked. If the secret key SK is revealed by the adversary, the authentication of the system is not impressed, and he cannot use this key in the next verification phase. At the same time, it is extremely hard that an adversary can derive the secret key which consists of and random number . Also, the attribute of the crypto one-way hash function and an adversary still cannot obtain shared SK which is used to encrypt and then sends to in communication channel. has been benefited from to generate shared key for each verification phase. Hence, our work maintains the forward secrecy. Proposition 3. Our proposed scheme can supply security of the digital handwriting (biometric agreement). Proof. In the proposed scheme, we notice that the communication messages only include information about . They do not include any information related to the signature handwriting . Therefore, the messages of mutual verification stage are generated once for each verifier’s request, denoting feature extraction of signature and verification messages are completely individualistic. Also, the cloud server does not contain file signature handwriting that helps him to increase time processing of our proposed scheme or exposes to malicious attacks. Thus, our work supports security of the digital handwriting. 102


Proposition 4. The proposed scheme can provide known-key security.
Proof. Known-key security means that the compromise of one session shared key will not lead to the compromise of further session shared keys. If a session key becomes exposed to an attacker, he fails to derive other session keys, since they are constructed from random numbers following the key exchange procedure, which is started by the cloud server and finished by the verifier for each verification phase. Therefore, the proposed scheme provides known-key security.

Proposition 5. The proposed scheme can provide recoverability.
Proof. The verification of the proof-of-retrievability (POR) takes place when our proposed scheme detects illegal updating or loss of an original data block. We notice this state in two places:
1. The verifier computes and checks the root values; if the result is false, the data has been modified illegally, and the verifier returns the original data block to the cloud server for recovering the data block which was lost or modified, sending it to CS.
2. When the verifier compares the sibling sets and they do not match, the verifier returns the original sibling sets to CS for restoring the lost sibling sets.
As a result, the proposed scheme provides recoverability.

Proposition 6. Our proposed scheme can withstand a replay attack.
Proof. The verifier's login request message of our proposed scheme employs a random value instead of a timestamp to resist replay attacks. Hypothetically, even if the attacker obtains old secret authentication keys, he still cannot perform a replay attack on the next authentication session, since he fails to obtain the values needed to generate the new request. Obviously, the adversary fails to mount a replay attack.

Proposition 7. Our proposed scheme can withstand a reflection attack.
Proof. In this attack, when a legitimate user sends a login request to the server, the adversary tries to eavesdrop on the user's request and replies to it. In our proposed scheme, the adversary fails to fool the service provider, since he has to know the shared key and the signature handwriting. These keys are employed to compute the value used to decrypt the ciphertext sent to CS by the verifier. In addition, the adversary does not possess the secret factors for computing the values which are used to verify both entities. Obviously, the proposed scheme can resist this attack.

Proposition 8. Our proposed scheme can withstand a Man-In-The-Middle (MITM) attack.
Proof. In this type of attack, an attacker has the ability to intercept the messages between the verifier and the cloud server and then to reuse a message after the verifier signs out of the cloud server. In our proposed scheme, the factors are securely encrypted and sent to the service provider. The random value is generated through the creation of sensitive data by the verifier as a challenge to CS. This sensitive data becomes useless when V signs off from the cloud server. Therefore, an attacker spotting the communication between V and CS can learn only a value which is used once; he is unable to compute the secret values. Moreover, when V signs out of the cloud server, an attacker cannot compute the values needed to impersonate the genuine verifier or the cloud server. As a result, the proposed scheme can resist the MITM attack.

5.2 Efficiency Analysis
The client constructs the meta-data, encrypts the meta-data, adds it to the original data and saves the data at the cloud server. This requires some additional computation cost on the client side. After the computation phase, the size of the file is doubled, so the client needs twice the file size of storage space, plus the signature handwriting of the file. The comparison of file sizes for the original data and the meta-data is shown in Fig. 7. Regarding the time processing of the verification phase, the signature handwriting achieves high performance and security and does not affect the performance of the system. The efficiency of our work has been tested in terms of measuring the response time of CS. Our proposed


scheme has been executed and tested on a base of many signatures. These signatures were acquired using the Biometrics Ideal Test supporting biometric database. Additionally, our experimental results are based on the UC Irvine Machine Learning Repository database. Now we study the performance of our work. The evaluation parameters are declared in Table 3, and the time requirements of our proposed scheme are given in Table 4. We use the computational overhead as the metric to evaluate the performance of our proposed scheme.

Fig. 7 Comparing the data size of files in three cases: meta-data, biometric meta-data, and without meta-data

Table 3. Evaluation Parameters

Symbol | Definition
       | Time processing of a hash function
       | Time processing of the mathematical operations
       | Time processing of a symmetric encryption operation
       | Time processing of a symmetric decryption operation
       | Time processing of an XOR operation

Table 4. Performance of Our Proposed Scheme

Phase         | Client | Cloud Server
Configuration |        |
Verification  |        |
Total         |        |

6 CONCLUSIONS

In this paper, we presented a scheme for data integrity over cloud computing, employing feature extraction of the handwriting signature and a Merkle hash tree to achieve the integrity

principle, in such a way as to aid the user in verifying and protecting the data from unauthorized users who work with the cloud data server. Additionally, we have used a different algorithm compared to previous related work on cloud data management and biometrics. With this kind of data preservation in the cloud, a user can have strong confidence in his uploaded data for any future work. The key idea of our proposed scheme is to provide integrity for the cloud storage area with sturdy reliability, so that a user does not worry about uploading his data to his allocated area. In encrypted processing, a user updates his/her sensitive data with a remote cloud separately from the other components of the system. Furthermore, our proposed scheme is immune to replay attacks, MITM attacks and reflection attacks. Our work supports several security features, such as mutual verification, forward secrecy, known-key security, revocation and biometric agreement, via separate processes executed in the cloud environment in which the data is protected. In terms of performance, our presented scheme has been shown to achieve sturdy security with low cost compared with previous schemes.





International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 106-110 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Improvising the Input Process of Traceability Model for Digital Forensic Investigation Azizah Bt Abdul Manaf Advanced Informatics School Universiti Teknologi Malaysia Kuala Lumpur, Malaysia [email protected]

Iman Ahmeid Mohamed Advanced Informatics School Universiti Teknologi Malaysia Kuala Lumpur, Malaysia [email protected]

Abstract— In this paper, we present an enhancement of the input process in the traceability model of digital forensic investigation. In addition, we present a literature review of existing traceability models. The outcome of this model is expected to help improve the traceability model with theoretically proven justifications.
Keywords—Forensic, traceability, Scenario, Evidence

I. INTRODUCTION

Today's world has seen a large growth of technology, with increasing numbers of computers, portable devices and networks. Due to such growth, criminals are becoming smarter and trickier to trace, and it is easier for them to commit crimes and fulfill an illegal purpose. Digital forensics tries to handle the increasing amount of digital crime by carrying out certain procedures and processes to track the source of the crime, finding evidence to uncover the identity of the offender. Therefore, a lot of people are exploring this field worldwide and such procedures continuously evolve. According to the RSA 2012 cybercrime trends report, cybercrime displays no signs of slowing down. In fact, 2011 was notable as a year of new advanced risks and threats, as an increasing level of sophistication in attacks was witnessed around the world. Moving into 2012, cybercrime diverged in a different direction as new financial malware and virus variants emerged, cybercriminals found new ways to monetize non-financial data, and the rise of hacktivism breathed new life into an old adversary [1]. On the whole, in this research paper we are focusing on tracing the evidence using a traceability model. The input process of the traceability model will be enhanced by adding a significant phase to make the model work effectively.

II. Issues in Digital Evidence

In [2] it is stated that, because of the large amount of information that can be obtained by a forensics investigator, ethical considerations need to be examined: in order to use this information to prosecute a legal act and to prevent it from being rejected during the trial, evidence must be gathered effectively and lawfully. It is particularly important to be conscious of the privacy rights of suspects, victims and uninvolved third parties. Any person attempting to examine such a case should be acquainted with the primary technology involved in collecting the information, with how to effectively gather the data, and with how to ensure that the information will be legitimate as evidence during trial. There are many reasons for unsolved digital crime cases; one of them could be not conducting the forensic investigation steps needed to find the evidence. Gathering the evidence requires the proper tools to complete the process, as well as the skillful investigators needed to collect it. Another point is that the prosecution may find that the case is not solved because the necessary steps to prove the collected evidence were not followed. Criminals are highly knowledgeable about forensics, and they strive to hide themselves in a very professional way. In addition to that, the organization may lack the tools or the skillful people to gather the evidence [3]. Thus, the issues highlighted in [4] can be summarized in the following points:
- Disorganization of the evidence: for instance, a hard drive platter contains many pieces of information mixed together and layered on each other over time.
- The abstraction of digital incidents, which gives only a partial view of what occurred in each crime.
- Intentional alteration or manipulation of the digital evidence without leaving any traces.
- The complexity of computerized evidence compared with traditional evidence: traditional evidence is generated and retrieved as a single record, while computerized evidence is generated or retrieved from different records and sources.


III. DIGITAL FORENSIC

Digital forensics can be described as the process of using forensically approved techniques and methods to collect, preserve, analyze and document digital evidence in order to present the original source in court [5]. Digital forensics has existed as long as computers and digital devices have stored data that could be used as evidence. Initially digital forensics was performed mainly by government agencies, but it has become common in the commercial and other sectors over the past several years.

IV. TRACEABILITY IN DIGITAL FORENSIC INVESTIGATION PROCESS

Generally, traceability is a very significant phase during the digital evidence investigation process, as it is the key used in forming and identifying the chain of evidence. Traceability is a broad instrument that attains many different objectives in order to complete the tracing process, which is very complex. Moreover, [5] defined traceability as the approach that traces and maps the evidence to find the source of the incident. Traceability of the digital evidence process becomes very necessary in investigations, as it makes it possible to track the different sources of the evidence. Cyber crimes, or digital crimes, are now serious, extensive, competitive, increasing and progressively innovative, with major effects nationwide. However, there are challenges that investigators face while acquiring and tracing the digital evidence, as follows:
- A lack of traceability models for the digital forensic process, as well as a limited amount of research about them.
- Difficulty in dealing with the function of the traceability model; that is to say, it is difficult to comprehend and to identify the process of tracing where it is used as a theoretical base.

However, traceability in digital forensic investigation process has been addressed in [5] as the process of finding the original source that caused the incident. The traces can be found in electronic devices as well as digitally examining the activities such as in network, whereby it needs to identify the open ports, protocols numbers and the IP addresses associated to the source of the incident. Besides that, it can be on database documents or internet activities, to be able to construct trace pattern that enables the investigator to find the source of the incident during the digital investigation process [6].

V. TRACEABILITY MODELS

This section provides a summary of the research that proposes traceability models in various areas.

A. A Conceptual Model of Traceability of Safety Systems
The model is needed as an initial solution that generates a traceability approach for the traceability management of safety systems. It gives an overview of the data produced during development and the safety analysis process, and of the relations between those data. Traceability has a direct effect on the success of system implementation, yet good methods and tools are lacking and additional personnel resources are required, which is why the necessity of developing a traceability model for safety systems has been declared. The approach was developed to capture and maintain traceability as well as to perform impact analysis. The two main reasons for developing the traceability approach were to trace hazards and safety requirements to the design and implementation. Moreover, it is stated that two sets of traces need to be developed and resolved in the safety system: firstly, development, maintenance and testing (requirements); secondly, safety analysis and certification, which is supported by providing a proof-of-conformance method [7].

B. Model-Based Traceability
Model-based traceability proposes an approach based on establishing traceability matrices that help project stakeholders in the organization to prepare, produce and perform traces in a graphical modeling environment. The model aims to manage traceability strategies and queries, and thus introduces a four-layered model for defining a traceability metagraph. It could help the stakeholders to plan, generate and execute trace strategies in a modeling environment [8].

C. Research for Traceability Model of Material Supply Quality in Construction Project
This traceability model is designed for the quality of material content. It builds the content supply quality for project development: it traces the quality of the material content and then validates and proves the project development. The system achieves the tracking and tracing of material quality in the supply chain. Then, through an instance of application, the validity and feasibility of the model are proved [9].

D. Traceability between Software Architecture Models
The framework records and traces the data between software architecture models using a traceability model, to help developers comprehend the software system lifecycle as well as to understand the reasons for modification and renovation when designing the traceability model. With the traceability approach or model, one can trace back to find out the information, due to the relative simplicity of high-level software architecture models [10].



E. A Model for Requirements Traceability in a Heterogeneous Model-Based Design Process: Application to Automotive Embedded Systems
The model is used in real-time design for requirement traceability; it includes validation and verification activities on the model to ensure the coordination of the preliminary requirements of a particular product. The model establishes a link between these flows and affords full traceability of requirements, including those set for heterogeneous models, with an application to automotive embedded systems [11].

F. Adapting Traceability in Digital Forensics Process
The proposed traceability model is used in the digital forensic investigation process to trace the pattern of the original source of an incident. The purpose of the model is to ensure the accuracy and completeness of the traces found and of the relationships between them. The model illustrates the relationships in the digital forensic investigation process by integrating traceability features: it traces and maps the evidence to its source and shows the links between the evidence, the entities and the sources involved in the process [6].

VI. SUMMARY OF TRACEABILITY MODELS

As mentioned, traceability is a broad approach and it can be applied to many fields and purposes, such as software system development. Reviewing the traceability models above, we can notice that traceability research has concentrated on tracing design and the implementation of requirements, such as architecture design. Each model serves a different purpose, but they all fall under the traceability approach: most of the models are used to trace the requirements of a development or software system to ensure its functionality, correct implementation and accurate design. On the other hand, a traceability model for the digital forensic investigation process has only been proposed by Siti Rahayu Selamat, Robiah Yusof, Shahrin Sahib, Irda Roslan, Mohd Faizal Abdollah and Zaki Mas'ud (2011) in the paper "Adapting traceability in digital forensics process" [6]. It is considered the only, and the most recent, paper that uses traceability in digital forensics investigation.

VII. TRACEABILITY MODEL OF DIGITAL FORENSIC INVESTIGATION PROCESS

The model describes what kind of information is traced: the object represents the characteristics of the information. The object is shown with a link named "traces to" in the traceability model, to describe that the stakeholder has a role in tracing the object. The stakeholder describes the persons involved in the traceability process with different roles, such as investigator, complainer or administrator. The source describes the original location of the traced information, and the stakeholders obtain the source that documents the traced object [6].

Fig1. Traceability Model [6]

The traceability model assists the investigator in identifying the relationship between the original sources of the evidence and the digital evidence, along with all the individuals involved in the digital investigation process. In addition, it provides precise and complete evidence of the case incident [6].

VIII. THE ENHANCEMENT OF THE TRACEABILITY MODEL BASED ON SCENARIO FOR DIGITAL FORENSIC INVESTIGATION PROCESS

The traceability model consists of three phases: stakeholders, object and source. The three phases are connected and contribute to each other in finding the evidence and the relationships between the evidence found. As a result, we added the scenario phase at the top of the model; it is considered the first phase, providing the initial planning of who the stakeholders will be and which objects and sources need to be found, based on the scenario reported.

In [6], scenarios were used at various stages of the development life cycle to describe the traceability requirements. Thus, it is important to have the scenario as the first phase in the model, in order to have a clear understanding of the requirements and the related components of the traceability model. Moreover, the original model is undefined in terms of usage: it only shows the three components of the traceability model and the relationships between them, whereas a scenario describes the intended use or purpose of the model [6]. The Oxford English Dictionary defines a scenario as the script or outline of a film, with details of an imagined sequence of future events or with a detailed description of scenes [12].


Furthermore, scenarios have been part of models that represent a single example of a sequence of events [10]. Scenarios come from real-world stories and descriptions of requirements or models. Therefore, scenarios are considered examples of real-world descriptions of experience, expressed in many forms such as pictures, natural language or other media [13, 14]. Similarly, in requirements engineering, scenarios can assist in testing the requirement specifications and the models during verification and validation of the requirements. The advantage of using scenarios is when a supporting ground for argument and reasoning with specific details is needed [9]. Scenarios can help in creating a model by looking for patterns in the details of the real world, gathering stories and descriptions from users [12]. To expand on that, there are two requirements engineering methods that use scenarios as one of the phases of the method, namely the ScenIC method and SCRAM, while many other requirements engineering methods use scenarios as one of the elements of the method [12]. Additionally, the scenario phase is used to trace any existing system behavior; thus, a scenario is understood by all stakeholders in requirements engineering [15]. Some methods, such as the Inquiry Cycle of Potts, used scenarios to identify the problems and issues in requirements analysis [16][17]. In fact, a scenario is involved in the design phase and at many levels of specific detail. Some researchers have worked on identifying trace dependencies using scenarios [18], while others have been concerned with the tasks users can carry out in the design process, without involving the lower-level details that describe how the system will provide the functionality for the tasks to be carried out by the users [19][20].

IX. TRACEABILITY MODEL 0.2


Fig2. Traceability Model 0.2

Traceability Model 0.2 is the second version of the original traceability model of the digital forensic investigation process. It differs from the previous version in that it has the scenario as the first phase, whereby scenarios define the event cases, the needed resources, and the requirements and tools necessary to start the investigation, and give a clear idea of who the stakeholders will be, as well as of the objects and the sources. The scenario could be any case whose original traces must be found, such as a digital crime case or a system failure. Subsequently, the scenario describes the stakeholder, who can be the administrator, forensic investigator or complainer. Then, the stakeholders determine the object to trace, its events and attributes, while managing the devices and the logs.

X. CONCLUSION


In this paper, we have reviewed the traceability models and discussed the enhancement of the input process of the traceability model based on scenarios in digital forensic investigation. We can say that Traceability Model 0.2 simplifies the process of tracing the source of the incident for the investigators in a digital forensic investigation. The achievement gained in this research paper can be further improved by expanding the knowledge of the model, enhancing some of its elements, and proving and specifying the forensic tools that can be used in the majority of scenarios.

ACKNOWLEDGMENT
This work is a part of research that has been done at the Advanced Informatics School, with support from Universiti Teknologi Malaysia.

REFERENCES

[1] RSA 2012 cybercrime trends report. http://www.rsa.com/products/consumer/whitepapers/11634_CYBRC12_WP_0112.pdf
[2] Bui, S., Enyeart, M., and Luong, J. Issues in Computer Forensics. Santa Clara University Computer Engineering, USA, 2003.
[3] National Institute of Standards and Technology (NIST), and United States of America. Forensic Examination of Digital Evidence: A Guide for Law Enforcement. 2004.
[4] Siti, R. S., Shahrin, S., Nor, H., Robiah, Y., and Mohd, F. A. A Forensic Traceability Index in Digital Forensic Investigation. Journal of Information Security, (4), 19-32; 2013.
[5] Gary L. Palmer. A Road Map for Digital Forensic Research. Technical Report DTR-T0010-01, DFRWS. Report for the First Digital Forensic Research Workshop (DFRWS), 2001.
[6] Selamat, S. R., Yusof, R., Sahib, S., Roslan, I., Abdollah, M. F., and Mas'ud, M. Z. Adapting Traceability in Digital Forensic Investigation Process. Malaysian Technical Universities International Conference on Engineering & Technology (MUiCET 2011), 2-3; 2011.
[7] Katta, Vikash, and T. Stalhane. A conceptual model of traceability for safety systems. CSDM-Poster Presentation, 2010.
[8] Cleland-Huang, J., Hayes, J. H., and Domel, J. M. Model-based traceability. In Proceedings of the 2009 ICSE Workshop on Traceability in Emerging Forms of Software Engineering, 6-10; 2009. IEEE Computer Society.
[9] Wang, S., Shi, J., Jiang, D., and Qi, Z. Research for Traceability Model of Material Supply Quality in Construction Project. In Computational Intelligence and Design (ISCID), 2012 Fifth International Symposium on, Vol. 2, 398-401; 2012. IEEE.
[10] Feng, Y., Huang, G., Yang, J., and Mei, H. Traceability between software architecture models. In Computer Software and Applications Conference, 2006. COMPSAC'06. 30th Annual International, Vol. 2, 41-44; 2006. IEEE.
[11] Dubois, H., Peraldi-Frati, M., and Lakhal, F. A model for requirements traceability in a heterogeneous model-based design process: Application to automotive embedded systems. In Engineering of Complex Computer Systems (ICECCS), 2010 15th IEEE International Conference on, 233-242; 2010. IEEE.
[12] Sutcliffe, A. Scenario-based requirements engineering. In Requirements Engineering Conference, 2003. Proceedings. 11th IEEE International, 320-329; 2003. IEEE.
[13] Antón, A. I., and Potts, C. The Use of Goals to Surface Requirements for Evolving Systems. 1998 International Conference on Software Engineering: Forging New Links, IEEE Computer Society Press, 157-166; 1998.
[14] Gough, P. A., Fodemski, F. T., Higgins, S. A., and Ray, S. J. Scenarios - an industrial case study and hypermedia enhancements. In Requirements Engineering, 1995. Proceedings of the Second IEEE International Symposium on, IEEE, 10-17; 1995.
[15] Carroll, J. M. Making Use: Scenario-Based Design of Human-Computer Interactions. The MIT Press, 2000.
[16] Alexander, I., and Maiden, N. (Eds.). Scenarios, Stories, Use Cases. John Wiley, 2004.
[17] Potts, C., Takahashi, K., and Anton, A. I. Inquiry-based requirements analysis. Software, IEEE, 11(2), 21-32; 1994.
[18] Sutcliffe, A. G., Maiden, N. A., Minocha, S., and Manuel, D. Supporting scenario-based requirements engineering. Software Engineering, IEEE Transactions on, 24(12), 1072-1088; 1998.
[19] Carroll, J. M. Scenario-based design. International Encyclopedia of Ergonomics and Human Factors, 3 Volume Set, 2010.
[20] Egyed, A. A Scenario-Driven Approach to Traceability. Proceedings of the 23rd International Conference on Software Engineering (ICSE), Toronto, Canada, IEEE Computer Society, 123-132; 2001.


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 111-121 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Software Stream Cipher based on pGSSG Generator

Antoniya Tasheva1, Zhaneta Tasheva2, Ognyan Nakov1

1 Technical University of Sofia, Bulgaria, 8 Kliment Ohridski blvd., Sofia 1000, Bulgaria
2 National Military University "Vasil Levski", Faculty of Artillery, Air Defense and Communication and Information Systems, Bulgaria, 1 Karel Shkorpil Str., Shumen 9700, Bulgaria
[email protected], [email protected], [email protected]

ABSTRACT
The secrecy of a software stream cipher based on the p-ary Generalized Self-Shrinking Generator (pGSSG) is examined in this paper. Background information on the generator's algorithm is provided. The software architecture and the key management for the cipher initialization are explained. The Galois Field GF(257^32) and feedback polynomials are chosen for initialization of the generator. In order to examine the secrecy, a mathematical model of the software system is made. It is proved that the cipher is not perfect, but the empirical tests result in less than 0,0125% deviation of the encrypted files' entropy from perfect secrecy. Finally, the proposed cipher is compared to four eSTREAM finalists by key length and period.

KEYWORDS
PRNG, pGSSG, Security, eSTREAM, Stream Cipher, Entropy, Encryption.

1 INTRODUCTION Recently, computer technologies have started to play a huge role in everyday life as well as at the workplace. As the Internet gains more and more popularity and becomes a major means of communication, the term information security becomes more and more important. Encryption has been studied for centuries and the need to find new and better solutions is still present to date. When transmitting large amounts of data over communication channels such as mobile and wireless networks, and when high speed, low error propagation and resistance to attacks are needed, the use of stream ciphers is recommended. They

encrypt each symbol of the transmitted message with a keystream, which is usually generated by a Pseudo Random Number Generator (PRNG) producing binary Pseudo Random Sequences (PRSs). The elements of PRNGs most used in stream ciphers are Linear Feedback Shift Registers (LFSRs), because an LFSR of length n can generate a PRS of maximum length T = 2^n - 1. Since 1969, when the Berlekamp-Massey algorithm [1] was discovered, researchers have been looking for new methods of generating non-linear sequences. Researchers use two basic methods to generate nonlinear sequences [2]: structures based on LFSR registers, such as filter generators, combinatorial generators and clock-controlled generators, and generators in finite fields such as GMW (Gordon, Mills and Welch) sequences and Bent function sequences. Recently some clock-controlled generators which use a p-ary PRS [3] to create nonlinear sequences have been proposed [4 - 7]. They generalize the work of the Shrinking Generator proposed by D. Coppersmith, H. Krawczyk and Y. Mansour at Eurocrypt'93 [8], and the Self-Shrinking Generator (SSG) proposed by W. Meier and O. Staffelbach at Eurocrypt'94 [9]. Such a generator that uses an LFSR to produce its output PRSs is the recently proposed p-ary Generalized Self-Shrinking Generator [10]. It has been proved that it has a long period, is well balanced, has good statistical characteristics and is resistant against exhaustive search and entropy attacks. As most of the properties of the sequences generated by the pGSSG generator have been studied and proved to give good results, it was decided to build a software encryption system based on it.

Its encryption properties are tested using the entropy measure [11, 12], which is the subject of this paper. An entropy measure is usually defined in terms of a probability distribution. The entropy H(X) of a random variable X is a measure of its average uncertainty; it is the minimum number of bits required on average to describe the value x of the random variable X [11]. In this study the entropy of the pGSSG PRSs is considered, as well as their influence on the symbol distribution in the source and encrypted files. The paper is organized as follows. First, the working algorithm of the p-ary Generalized Self-Shrinking Generator over the Galois Field GF(p^n) is given. Then the architecture of the software encryption system based on a pGSSG stream cipher is described, and the key management is explained. In Section 4 the mathematical model of the system is designed and the entropy is both evaluated and calculated. Section 5 gives a comparison of the proposed stream cipher with four eSTREAM finalists.

2 ALGORITHM OVERVIEW

The proposed p-ary Generalized Self-Shrinking Generator [10], shown in Figure 1, consists of a pLFSR register A, whose length will be denoted by L. It generates a sequence (a_i), i ≥ 0, of p-ary digits (0 ≤ a_i ≤ p - 1). The multipliers of the feedbacks are given by the coefficients q_1, q_2, ..., q_L, with q_i in {0, 1, ..., p - 1}, of a primitive polynomial in GF(p^L). Every register element can store one p-ary number. The register is initialized by the p-ary sequence (a_0, a_1, ..., a_{L-1}).


The pGSSG selects a portion of the output p-ary LFSR sequence by controlling the p-ary LFSR itself, using the following algorithm (see Fig. 1):
1. The p-ary LFSR A is clocked with a clock sequence with period T.
2. The output pLFSR sequence is split into p-tuples (a_pi, a_pi+1, a_pi+2, ..., a_pi+(p-1)), i = 0, 1, ...
3. If a_pi = 0, the whole p-tuple is discarded from the pGSSG output, i.e. the output is shrunken.
4. When a_pi ≠ 0, the digit a_pi+a_pi of the p-tuple forms the output of the pGSSG. For example, if a_pi = 1, then a_pi+1 is output and the other digits a_pi, a_pi+2, ..., a_pi+(p-1) are discarded. If a_pi = 2, then a_pi+2 is output and the other digits a_pi, a_pi+1, a_pi+3, ..., a_pi+(p-1) are discarded, and so on. If a_pi = p - 1, then a_pi+(p-1) is output and the other digits a_pi, a_pi+1, ..., a_pi+(p-2) are discarded.
5. The shrunken p-ary GSSG output sequence is transformed into a binary sequence in which every p-ary number is represented with ⌈log2(p - 1)⌉ binary digits, where ⌈x⌉ is the smallest integer greater than or equal to x.
6. Every output number i from 1 to p - 1 of the p-ary GSSG sequence is represented by the binary expansion of the number

   (i - 1) + (2^⌈log2(p - 1)⌉ - (p - 1)) / 2.    (1)

7. Every p-ary zero, in its i-th appearance (i = 1, 2, 3, ...) in the generated p-ary sequence, is represented binary by the number d_i, where

   d_i = (d_{i-1} + 1) mod p, if d_{i-1} ≠ p - 1;   d_i = 1, if d_{i-1} = p - 1,    (2)

   with initial condition d_0 = 0.


Figure 1. p-ary Generalized Self-Shrinking Generator
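To make the selection rule concrete, here is a minimal Python sketch of steps 2-4 above for a toy prime; the feedback coefficients, initial state and output length are assumptions chosen only for illustration (the paper itself works with p = 257 and L = 32, and the binary mapping of steps 5-7 is omitted here).

p = 5                      # small prime for illustration
L = 3
coeffs = [1, 0, 2]         # assumed feedback coefficients over GF(p)
state = [1, 3, 2]          # assumed non-zero initial state a_0..a_{L-1}

def plfsr_digit():
    """Output one p-ary digit and advance the toy LFSR state."""
    global state
    out = state[0]
    fb = sum(c * s for c, s in zip(coeffs, state)) % p
    state = state[1:] + [fb]
    return out

def pgssg_digits(n):
    """Produce n shrunken pGSSG digits: from each p-tuple whose leading digit
    a_pi is non-zero, keep the digit at offset a_pi; discard everything else."""
    out = []
    while len(out) < n:
        tup = [plfsr_digit() for _ in range(p)]
        if tup[0] != 0:
            out.append(tup[tup[0]])
    return out

print(pgssg_digits(16))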


3 SOFTWARE STREAM CIPHER

A software encryption system has been built. Its main task is to encrypt the transmitted data in advance using a keystream generated by the output of the pGSSG generator. When needed, the encrypted data can be put under a second encryption with standard methods, such as DES in CBC mode or AES in CCM mode, which are used in contemporary wireless networks.

Table 1. Feedback polynomials used in pGSSG.

№   Feedback Polynomial
1   x^32 + x + 10
2   x^32 + 75x^2 + 174x + 33
3   x^32 + 188x^2 + 200x + 107

3.1 Architecture

The software encryption system is based on a symmetric stream encryption algorithm which is initialized by the value of the secret key K. It encrypts the input data stream (the plain text) as a simple XOR operation between each byte of the plain text and the keystream received from the pGSSG generator (see Fig. 2).


The result of the encryption with this software system is that the amount of data remains the same before and after sending it into the communication channel. If the communication network provides the option for additional data encryption, it is applied as a second level of encryption through a standard block cipher, AES or DES. For example, in WiMAX WLAN networks, where data encryption is mandatory, the confidentiality and the security will increase by using two levels of encryption (Fig. 3).




Figure 2. Architecture of the Software Encryption System based on pGSSG Generator.
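A hedged sketch of the outer encryption loop described above: each plaintext byte is XORed with one keystream byte, and applying the same operation again restores the plaintext. The keystream function below is a stand-in placeholder for illustration, not the pGSSG generator itself.

def keystream(seed: int):
    """Placeholder byte generator; in the real system this is the pGSSG output."""
    x = seed & 0xFFFFFFFF
    while True:
        x = (1103515245 * x + 12345) & 0xFFFFFFFF   # simple LCG, demo only
        yield (x >> 16) & 0xFF

def xor_stream(data: bytes, seed: int) -> bytes:
    return bytes(b ^ k for b, k in zip(data, keystream(seed)))

plain = b"example plain text"
cipher = xor_stream(plain, seed=0xC0FFEE)
assert xor_stream(cipher, seed=0xC0FFEE) == plain   # XOR encryption is symmetric

Because encryption and decryption are the same XOR operation, the ciphertext length equals the plaintext length, which matches the observation above that the amount of data does not change.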

The software encryption system uses class libraries for the software representation of the LFSR register and the pGSSG generator. Although they are designed to be universal, the Galois Field GF(257^32) was chosen for the implementation because of the ease of byte representation and therefore the possibility of a faster implementation. Three of the primitive feedback polynomials used to create a pLFSR register with prime p = 257 and length L = 32 are shown in Table 1.


Figure 3. Use of pGSSG Software Encryption System in WiMAX WLAN Networks.

3.2 Key management

The architecture of the pGSSG generator allows building a symmetric stream cipher with flexible key management. The key is multi-component and consists of several elements:
1. L p-ary digit components, giving the initial state (a_0, a_1, ..., a_j, ..., a_{L-1}), where a_j = 0, 1, ..., p - 1, j = 0, 1, ..., L - 1, of the inner LFSR register.
2. L + 1 p-ary number components, giving the coefficients of the feedback polynomial (q_0, q_1, ..., q_j, ..., q_L), where q_j = 0, 1, ..., p - 1, j = 0, 1, ..., L, of the inner LFSR register.
3. The last component is the initial value of the 'zero' in the output and is a random p-ary number.

If we consider only the initial state as the key of the system, it will have length L in p-ary digits. To present each p-ary digit, ⌈log2 p⌉ bits are needed; therefore the key length is LK,InSt = L·⌈log2 p⌉. It can be seen that as p grows, the key length also grows. Table 2 shows the minimum length Lmin of the pLFSR register that ensures a key length LKmin of 256, 512 or 1024 bits, as required by contemporary cryptographic applications. For the key, L different p-ary digit components for the initial state of the pLFSR register and one component for the initial state d0 of the p-ary zero are used. The choice of a feedback polynomial is made among preliminarily calculated primitive polynomials in GF(p^L). Their count Cpoly is calculated by the formula [13]:

Cpoly = φ(p^L - 1) / L,    (3)

where φ(x) = x·(1 - 1/q1)···(1 - 1/qk) is the Euler phi function and x is a positive integer with factorization x = q1^e1 ··· qk^ek.

The length of the secret key K increases both with the growth of the inner register and with the value of the prime p. The large number of components and the great length of the key K make it more difficult, and dramatically slow down, the process of searching through all different keys when malicious users try to decrypt an intercepted message. There is a negative side to increasing the prime p: it decreases the speed of the software encryption system. This is due to the imperfect software registers and the sequential manner of the processor's calculations, in contrast to a hardware implementation. For this particular software implementation a tradeoff is made in order to obtain maximum security with minimum slowdown. A prime p = 257 is chosen, so that all feedback coefficients, the initial state and the initialization zero can be saved in 9 bits and the output of a 257GSSG is a single byte. The register length may vary from L = 8 to L = 34. When these elements are set, the secret key K consists of the following components (Fig. 4):
- Initial state: a_{L-1} a_{L-2} ... a_1 a_0, a_i = 0, 1, ..., p - 1; i = 0, 1, ..., L - 1 (Lmin = 8, Lmax = 34);
- Feedbacks: q_L q_{L-1} ... q_1 q_0, q_i = 0, 1, ..., p - 1; i = 0, 1, ..., L (Lmin = 8, Lmax = 34);
- Initial zero: z = 0, 1, ..., p - 1.
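As an illustration of how such a multi-component key could be serialized, the sketch below packs 257-ary digits into 9-bit fields, since ⌈log2 257⌉ = 9; the packing order and the example values are assumptions, not the cipher's specified key format.

def pack_digits(digits, width=9):
    """Pack a list of p-ary digits into a single integer, 'width' bits each."""
    value = 0
    for d in digits:
        value = (value << width) | d
    return value

def unpack_digits(value, count, width=9):
    digits = []
    for _ in range(count):
        digits.append(value & ((1 << width) - 1))
        value >>= width
    return list(reversed(digits))

L = 8
state = [12, 0, 256, 45, 7, 199, 3, 88]     # example initial state digits
feedback = [1, 0, 0, 0, 0, 0, 1, 10, 4]     # example L+1 feedback coefficients
zero = 5                                    # example initial value of the p-ary zero

key_int = pack_digits(state + feedback + [zero])
assert unpack_digits(key_int, len(state) + len(feedback) + 1) == state + feedback + [zero]
print(key_int.bit_length(), "bits")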

Table 2. Minimal length of the pLFSR register to ensure key length LK InSt of 256, 512 or 1024 bits

p                                                                | Lmin (256 bits) | Lmin (512 bits) | Lmin (1024 bits)
2                                                                | 256             | 512             | 1024
3                                                                | 128             | 256             | 512
5, 7                                                             | 85              | 171             | 342
11, 13                                                           | 64              | 128             | 256
17, 19, 23, 29, 31                                               | 52              | 103             | 205
37, 41, 43, 47, 53, 59, 61                                       | 43              | 86              | 171
67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127         | 37              | 74              | 147
131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251 | 32 | 64 | 128

The count of primitive polynomials for different bases p and register lengths L is shown in Table 3. In order to determine which polynomial from the list should be used in the cryptographic system, ⌈log2 Cpoly⌉ bits are needed, as shown in the last column of Table 3. Using the fact that ⌈log2 p⌉ bits are needed for the representation of the GF(p) elements, the total length LK of the secret key can be calculated as:

LK = (L + 1)·⌈log2 p⌉ + ⌈log2 Cpoly⌉,    (4)

where L is the length of the inner pLFSR register.
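A small sketch of formulas (3) and (4), using toy parameters so that the factorization of p^L - 1 stays cheap; the sympy dependency and the example values (p = 5, L = 3) are assumptions, and for p = 257 one would rely on the precomputed counts from Table 3 instead.

from math import ceil, log2
from sympy import factorint   # used only to get the prime factorization of p^L - 1

def euler_phi(x: int) -> int:
    phi = x
    for q in factorint(x):
        phi = phi // q * (q - 1)
    return phi

def primitive_poly_count(p: int, L: int) -> int:      # formula (3)
    return euler_phi(p**L - 1) // L

def key_length_bits(p: int, L: int) -> int:           # formula (4)
    c_poly = primitive_poly_count(p, L)
    return (L + 1) * ceil(log2(p)) + ceil(log2(c_poly))

p, L = 5, 3
print(primitive_poly_count(p, L))   # 20 primitive polynomials of degree 3 over GF(5)
print(key_length_bits(p, L))        # (L + 1)*ceil(log2 p) + ceil(log2 Cpoly)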

Table 3. Count of primitive polynomials with different powers and bases

p    | L  | Count of primitive polynomials                                                    | Bit count
17   | 32 | 18976458037657197225461286734659584000                                            | 124
61   | 32 | 10244854334997082026962604386814833429237905489920000000                          | 183
127  | 32 | 137889418900977450570842659182954039334599366308071077551957606400                | 217
167  | 32 | 858649018806544985160210752748590507259234088656177980910291910656000             | 229
257  | 8  | 582729142999449600                                                                | 59
257  | 10 | 37204727422640848896000                                                           | 75
257  | 16 | 5511978689381920798272782377943040000                                             | 122
257  | 24 | 54468265154915537177576109136107838060351092424704000000                          | 185
257  | 32 | 987678858655547790117327259001143239967117047735833824574876556711690240000       | 250
257  | 34 | 82943263199168780322247079972233538359404412641258708571048665491964779770675200  | 266



Figure 4. Main components of the secret key K in a software encryption system based on pGSSG generator

The minimum key length, calculated for the above initialization values for base p = 257, is LKmin = 9·9 + 59 = 140 bits, and the maximum key length is LKmax = 35·9 + 266 = 581 bits. Contemporary symmetric encryption applications and systems in wireless communication networks most often use a secret key with length up to 256 bits, like 3DES with a key length of 168 bits (3 times a 56-bit DES key) and AES, working with key sizes of 128, 192 or 256 bits. As can be seen, the pGSSG encryption system can have a secret key more than twice as long. In order to calculate the time for a brute force attack, the count of all possible keys should be determined; the speed of conducting each test is also needed. The pGSSG based software encryption system is able to use all possible bit combinations of a certain length, and therefore the count NK of all possible keys is:

NK = 2^LK,    (5)

where LK is the length of the secret key. Table 4 shows the count NK of all possible keys for different lengths.

Table 4. Count of possible keys of length LK

Key length LK [bits] | Count of possible different keys NK
80                   | 1208925819614629174706176
96                   | 79228162514264337593543950336
112                  | 5,1922968585348276285304963292201×10^33
128                  | 3,4028236692093846346337460743177×10^38
140                  | 1,3937965749081639463459823920405×10^42
256                  | 1,1579208923731619542357098500869×10^77
512                  | 1,3407807929942597099574024998206×10^154
546                  | 2,3034438628061165479989957159352×10^164
581                  | 7,9145728471393450899360806726287×10^174

The speed at which each key is tested is a secondary factor; thus it can be assumed that all keys are tested independently and for an equal amount of time.

This parameter is closely tied to the budget of the organization conducting the attack. The attack is faster when it is run on parallel processors, due to the assumptions made so far. Each parallel processor checks part of the keys and no interaction between them is needed, except for a stop signal when the correct key is found.

4 EXAMINING THE CIPHER SECRECY

Determining the theoretical secrecy of a cryptographic system is a very complex mathematical task. The use of different extended Galois Fields GF(257^L) makes the mathematical cryptanalysis of the pGSSG generator more difficult. To see if this system is usable, many questions need to be answered. They are related to its robustness and security when the attacker is not limited in time and has access to all possible means to analyze encrypted messages. Another question is whether a single solution can be found and what amount of data should be intercepted to get this solution. Due to the nonlinearity of modern encryption algorithms, a comprehensive, fully mathematically justified answer cannot be given. However, the entropy, suggested by Claude Shannon in "A mathematical theory of communication" [14], has found wide application in analyzing these issues since 1948.

4.1 Mathematical Model

As can be seen in Figure 3, the pGSSG based software encryption system ciphers the entire plain text: both the data and the headers of the files. If we consider the byte organization of stored data, the mathematical model of the pGSSG encryption system can be presented as follows. The system (see Fig. 5) works with 256 (1-byte) different symbols a0, a1, ..., ai, ..., a255 with corresponding probabilities of appearance P(ai), i = 0, 1, ..., 255. As a result of the encryption, these symbols are mapped into cryptograms b0, b1, ..., bj, ..., b255 with probabilities of appearance P(bj), j = 0, 1, ..., 255. The keys K0, K1, ..., Kn are equally probable and their maximum count NK depends on the key length. It is possible

that one input symbol is converted to a different output symbol when using different keys. The necessary and sufficient condition for the system to be completely secret [14, 15] is:

P(bj) = P(bj / ai), i, j = 0, 1, ..., 255,    (6)

i.e. P(bj / ai) should not depend on the input symbol ai.


Figure 5. Mathematical model of the pGSSG software encryption system.

Here P(bj/ai) is the conditional probability of the encrypted symbol bj, given that the input symbol is ai; it is the sum of the probabilities of all keys that transform the symbol ai into bj. It is known that a perfect encryption system is achieved when the following three conditions hold [Sha49]: 1. Each message is associated with only one cryptogram. 2. The number of keys is equal to the number of messages. 3. All keys are equally probable. Under these conditions, the entropy H of the system is

H(A) = - Σ_{i=0}^{n-1} P(ai)·log2 P(ai) = - Σ_{i=0}^{n-1} (1/n)·log2 (1/n) = log2 n.    (7)

4.2 Experiments using Shannon Entropy As evident from the mathematical model of the pGSSG software encryption system, it is not absolutely secret because it does not fulfill

conditions 1 and 2. Studies were made to answer the question of how close the proposed system is to the perfect case in (7). For this purpose over 100 different files were tested. They are distributed equally among the main types: text documents (.doc, .docx, .txt), images (.bmp, .jpg, .png), executable files (.exe), audio files (.wav, .mp3) and archives (.zip, .rar). The frequencies of occurrence of all characters in the input and encrypted files were studied and their entropy calculated. Furthermore, for image files the three color components R, G and B were analyzed separately. In Table 5 the Shannon entropy of the input and output files for four files from each group is shown. Figure 6 demonstrates the distribution histograms of the plain text and encrypted text for different types of files. The results for the three color components R, G and B of the images are shown in Table 7 and Figure 7, respectively. More than 100 sequences of length 1 000 000 bits were generated for each password in order to check how the password transforms into a keystream. They were then tested via the NIST test suite [16], [17] to obtain their properties. It is known that crypto-analysts can use the existing dependencies in the occurrence of characters in various types of information and the model of the standard headers in different file types. That information can help them decrypt the data: they have blocks of plain text and, when capturing the encrypted message, they can map them to the corresponding encrypted text. The use of an additional level of encryption with the software encryption system eliminates these possibilities, as it uniformly changes the values of the symbols over the whole length of the file, including the header part. This conclusion is confirmed by the results shown in Tables 5 and 7. Analysis shows that the entropy of encrypted files differs from the entropy of a perfect encryption system by less than 0,001 bit, which is a 0,0125 % deviation from perfect secrecy. The entropy of a perfect secrecy system with 256 equally probable symbols is:

H(A) = - Σ_{i=0}^{255} (1/256)·log2 (1/256) = 8 bits.    (8)
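The entropy measurements reported below can be reproduced with a few lines of Python; this sketch simply counts byte frequencies and evaluates H = -Σ p_i·log2 p_i, the quantity used in Tables 5 and 7 (the example inputs are assumptions, not the files from the experiments).

from collections import Counter
from math import log2

def byte_entropy(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

data = bytes(range(256)) * 4             # perfectly uniform example -> 8.0 bits
print(byte_entropy(data))
print(byte_entropy(b"aaaaabbb"))         # skewed distribution -> lower entropy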


Table 5. Entropy of input and pGSSG encrypted files.

№   | File                | Password       | Entropy of Input File | Entropy of Encrypted File
1.  | explorer.exe        | 2345explorer   | 5.872596706508290     | 7.999598134607586
2.  | encryptPhD.exe      | Fast2Enc%(!    | 7.631900643959939     | 7.998833667709484
3.  | notepad++.exe       | ++нотпад++     | 6.091405544356415     | 7.999648675667926
4.  | WinRAR.exe          | KoМпReСиQ      | 6.469161170026363     | 7.999644709803738
5.  | CompleteSet.docx    | 2013BCPC       | 7.948642291913700     | 7.998595024152175
6.  | DiplomnaRabota.docx | Info4диплом    | 7.524045264552306     | 7.989391747378668
7.  | document.docx       | Просто*Tekst   | 7.910834642297916     | 7.998354864955056
8.  | TrainingTasks.docx  | Passw0rd       | 7.948642291913700     | 7.998405935750952
9.  | presentation.pptx   | Imagine8№      | 7.937660319385949     | 7.999896904729666
10. | IFRToolLog.txt      | лог$а@к§n&     | 4.714971677469938     | 7.999566854515114
11. | leMouse.rar         | el^gAto-te     | 7.986186549835893     | 7.998523780587349
12. | Raboti.rar          | lANg25cop18%   | 7.999329046994613     | 7.999986138240808
13. | Zadaniq.rar         | worry35ePIc^   | 7.999026651474129     | 7.999558585297283
14. | Zadaniq.zip         | rAts24mOd=     | 7.997750400646980     | 7.999661066662499
15. | ALARM.WAV           | VarY39eRns&    | 6.871613007127728     | 7.997961560964973
16. | cat8.wav            | NiX^pEn=Dry1   | 4.572460198612183     | 7.998654459803242
17. | TU.bmp              | myUniversit1   | 7.735134585817764     | 7.999598750116132
18. | lenna.bmp           | vAN83scoUR@    | 5.682224748742369     | 7.999696018375432
19. | snowman.jpg         | faLL!kOR?jiltS | 7.927340915095799     | 7.995697933926553
20. | sozopol.jpg         | idS?OcHRy65    | 7.966401975045295     | 7.997800281093063

As seen in Figure 6, the distribution of symbols in most file types differs radically from the uniform distribution. Exceptions are the archive files, whose distribution is largely similar to the uniform one, and this is no coincidence, because archives apply different algorithms for compressing data. This feature of the archives determines their entropy, which, depending on the compression algorithm, can reach up to 7.998. However, the distribution of the symbols usually shows detectable peaks for the symbols having a value of 0 or 255 (see Figure 6.c). These anomalies in the histograms are eliminated by the use of encryption with the pGSSG generator.

5 COMPARISON WITH OTHER CIPHERS

In this section we compare our 257GSSG software stream cipher with four eSTREAM finalists from Profile 1 which use a large LFSR and a nonlinear filter with memory. The eSTREAM project was launched in 2004 as part of the EU-sponsored ECRYPT Framework VI Network of Excellence [18]. The primary goal of eSTREAM was to help developers analyze and design stream ciphers. To promote research in stream ciphers, a call for new proposals was made. Two specific stream cipher profiles were identified: Profile 1: Stream ciphers for software applications with high throughput, and Profile 2:

Stream ciphers for hardware applications with highly restricted resources [19]. In addition, to emphasize the importance of providing an authentication method along with encryption, two further profiles were proposed: Profile 1A: Stream ciphers satisfying Profile 1 with an associated authentication method, and Profile 2A: Stream ciphers satisfying Profile 2 with an associated authentication method. The original call provoked significant interest and 34 stream ciphers were submitted by the deadline of April 29, 2005. All candidates were evaluated against criteria such as security, performance compared to the AES and to other submissions, justification and supporting analysis, simplicity and flexibility, and completeness and clarity over the three phases of eSTREAM. Only 16 algorithms advanced to the final phase of eSTREAM, eight in each of Profile 1 and Profile 2. Four of the final eight Profile 1 ciphers use a NonLinear FSR (NLFSR). They are CryptMT v3 [20], DRAGON [21], NLS v2 [22] and SOSEMANUK [23]. CryptMT version 3 is a stream cipher obtained by combining a large LFSR and a nonlinear filter with memory using integer multiplication. Its period is proved to be no less than 2^19937 − 1. The key size can be flexibly chosen from 128 bits to 2048 bits, as can the IV size. The authors claim that the security level equals the minimum of the key size and the IV size. Dragon is a word-based stream cipher whose state is initialized with 128- or 256-bit key-IV pairs. Dragon uses a single 1024-bit word-based NLFSR and a 64-bit memory M, which give a state size of 1088 bits. The period of the sequence produced by the 1024-bit NLFSR is 2^512 and, since the counter M has a period of 2^64, the expected Dragon period is 2^576. NLSv2 is a synchronous stream cipher designed for a secret key that may be up to 128 bits in length. NLSv2's stream generator is constructed from an NLFSR and a non-linear filter. NLSv2 is intended to provide security under the condition that no nonce is ever reused with a single key, that no more than 2^80 words of data are processed with one key, and that no more than 2^48 words of data are processed with one key/nonce pair.

Sosemanuk is a synchronous software-oriented stream cipher with variable key length between 128 and 256 bits. Any key length is claimed to achieve 128-bit security. It uses a non-singular LFSR which operates over elements of GF(2^32). The output Sosemanuk sequence of 32-bit words is periodic and has maximal period 2^320 − 1.

Table 6. Comparison of Key length, IV length and Period of Stream Ciphers.

Stream Cipher     Key Length, bits     IV Length, bits      Period
CryptMT v3        from 128 to 2048     from 128 to 2048     ≥ 2^19937 − 1
DRAGON            128 or 256           128 or 256           2^576
NLS v2            up to 128            128                  2^80 words
SOSEMANUK         from 128 to 256      128                  2^320 − 1
pGSSG, p = 257    from 297 to 547      144                  256·257^31·8 > 2^259

Here we consider only the software version of pGSSG, constructed using a single 257-ary LFSR of length L = 32. For one feedback polynomial the initial state of pGSSG is populated using the key K in conjunction with the Initial Vector IV. The initial filling of the 257LFSR is done in 32 clock cycles as follows:

a_i = K_i,                           0 ≤ i ≤ 15
a_i = (K_i + IV_{i−15}) mod 257,     16 ≤ i ≤ 31        (9)

For a single feedback polynomial the key K consists of 32 257-ary digits, representing the initial state, and one digit for the initial value of the 257-ary zero in the binary pGSSG sequence. Since a 257-ary digit can be represented in binary with ⌈log2 257⌉ = 9 bits, the key length for one feedback polynomial is 33·9 = 297 bits. The IV length is 16 257-ary digits, which is 16·9 = 144 bits. Any key length is claimed to achieve 297-bit security. The other 250 bits of the key define which of the feedback polynomials is used to construct the pGSSG. In this case the maximum key length can be calculated as 297 + 250 = 547 bits. The period of the sequence of 257-ary digits produced by a 257LFSR of length L = 32 is 257^32 − 1. Due to the self-shrinking procedure of pGSSG, the output pGSSG sequence is non-linear and its expected period is T = (p − 1)·p^(L−1) = 256·257^31, assuming the pGSSG sequence of 257-ary digits is pseudo-random [10, 24]. Because every 257-ary digit in the pGSSG sequence is transformed into log2(257 − 1) = 8 bits, the period of the pGSSG output binary sequence is greater than 256·256^31·8 = 2^8·2^(8·31)·2^3 = 2^259. To make the design of pGSSG more robust against cryptanalytic attacks we change the primitive feedback polynomial of the 257LFSR before the pGSSG period is exhausted. The number of distinct primitive polynomials in GF(257^32) is shown in Table 3. The comparison shows that 257GSSG provides 297-bit security, which is more than three of the eSTREAM finalists shown in Table 6. Only CryptMT v3 offers a variable key length from 128 to 2048 bits, which may provide more than 297-bit security. The period of the 257LFSR is less than the periods of CryptMT v3 and DRAGON, more than the period of NLS v2 and similar to the period of SOSEMANUK.
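As an illustration of (9) and of the key-length arithmetic above, the sketch below loads the 32-stage initial state in Python. It is a simplified reading of the construction, not the authors' implementation: the combining operation is assumed to be addition modulo 257 (the operator symbol is not legible in the source), the IV digits are assumed to be numbered 1..16 so that IV_{i−15} corresponds to a 0-based index i − 16, and the digit-to-bit packing is omitted.

```python
import math

DIGIT_BITS = math.ceil(math.log2(257))        # 9 bits per 257-ary digit
print(33 * DIGIT_BITS, 16 * DIGIT_BITS)       # key: 297 bits, IV: 144 bits (one polynomial)

def pgssg_initial_state(K, IV):
    """Fill the 32-stage 257-ary LFSR according to (9):
    a_i = K_i for 0 <= i <= 15, a_i = (K_i + IV_{i-15}) mod 257 for 16 <= i <= 31."""
    assert len(K) >= 32 and len(IV) >= 16
    state = [K[i] % 257 for i in range(16)]
    # IV digits are assumed to be numbered 1..16 in the paper, hence IV[i - 16] here.
    state += [(K[i] + IV[i - 16]) % 257 for i in range(16, 32)]
    return state

# Lower bound on the binary output period quoted in the text: 256 * 256^31 * 8 = 2^259.
assert 256 * (256 ** 31) * 8 == 2 ** 259
```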

Table 7. Entropy of the colour components R, G and B of the input and pGSSG encrypted images.

№   Image          Password     Input Image R / G / B        Encrypted Image R / G / B
1.  medal.bmp      *&^%^%sG     6.0038 / 5.9726 / 6.0043     7.99928 / 7.99943 / 7.99937
2.  banner.bmp     *&*B”}@E     7.3162 / 6.9769 / 6.8664     7.99977 / 7.99973 / 7.99974
3.  DarkDoor.bmp   Poiuytr^B&   7.2023 / 7.0943 / 7.0060     7.99934 / 7.99935 / 7.99924
4.  pepper.bmp     M*b%V)JY     7.3388 / 7.4962 / 7.0583     7.99930 / 7.99924 / 7.99935
5.  lenna.bmp      12345678     5.0465 / 5.4576 / 4.8001     7.99914 / 7.99918 / 7.99904


Figure 6. Distribution histograms for symbols in input and pGSSG encrypted files: a) text file IFRToolLog.txt, b) executable file notepad++.exe, c) archive file leMouse.rar.

Figure 7. Input and encrypted image and the corresponding distribution histograms for the R, G and B values.



6 CONCLUSIONS AND FUTURE WORK

The secrecy of the software pGSSG encryption system is tested with the aid of its mathematical model, using the notion of entropy. It is proved that the system does not have perfect secrecy, but it transforms the data into data with a uniform distribution of characters. The analysis shows that the entropy of the encrypted files differs from that of a perfect encryption system by less than 0.001 bit, which is a 0.0125 % deviation from perfect secrecy. The comparison with the eSTREAM finalists from Profile 1 which use a large NLFSR shows that the software version of 257GSSG with length L = 32 offers 297-bit security, which is more than the security of DRAGON, NLS v2 and SOSEMANUK. The period of the pGSSG stream cipher is compared to those of the eSTREAM finalists: it is longer than that of NLS v2, similar to that of SOSEMANUK, but shorter than those of the other two. This shorter period is compensated by simply changing the primitive polynomial of the generator. The task of decrypting the captured data is made more complicated for cryptanalysts who rely on the known dependencies in the occurrence of characters in different types of information. However, there are some practical issues that need to be addressed. First, to measure the degree of randomness of sequences generated by pGSSG, some statistical experiments using approximate entropy [25] must be done. Second, to analyze the problem of finding the secret key in the pGSSG software system, min-entropy can be used [11], which determines the probability of guessing the correct value at the first attempt. Moreover, it is necessary to find the average number of guesses needed to determine the key, which is given by the guessing entropy [11].

7 REFERENCES

1. Massey J., Shift-register synthesis and BCH decoding, IEEE Transactions on Information Theory, vol. 15, no. 1, pp. 122–127, (1969)
2. Gong G., Sequence Analysis, University of Waterloo, p. 137, (1999) http://calliope.uwaterloo.ca/~ggong
3. Golomb S., Shift Register Sequences, Aegean Park Press, Laguna Hills, Calif, USA, revised edition, (1982)
4. Kanso, A., Clock-controlled generators, University of London, (1999)
5. Tashev, T., Bedzhev, B., Tasheva, Zh., The Generalized Shrinking-Multiplexing Generator, ACM International Conference Proceeding Series 285, Article number 48, Proceedings of the 2007 international conference on Computer systems and technologies CompSysTech '07, (2007)
6. Tasheva, Z., Bedzhev, B., Stoyanov, B., P-adic shrinking-multiplexing generator, Proceedings of the Third Workshop - 2005 IEEE Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS 2005, pp. 443–448, (2005)
7. Tasheva, Zh. N., Design and Analysis of 3-ary Generalized Shrinking Multiplexing Generator, International Journal of Advance in Communication Engineering 4 (2), pp. 129-140, (2012)
8. Coppersmith D., Krawczyk H., Mansour Y., The shrinking generator, Advances in Cryptology – EUROCRYPT '93, vol. 773 of LNCS, Berlin, Springer-Verlag, pp. 22-39, (1993)
9. Meier W., Staffelbach O., The self-shrinking generator. In A. De Santis, editor, Advances in Cryptology – EUROCRYPT '94, vol. 950 of LNCS, Berlin, Springer-Verlag, pp. 205-214, (1995)
10. Tasheva A. T., Tasheva Zh. N., Milev A. P., Generalization of the Self-Shrinking Generator in the Galois Field GF(p^n), Advances in Artificial Intelligence, vol. 2011, Article ID 464971, 10 pages, (2011) doi:10.1155/2011/464971
11. Cachin, C., Entropy measures and unconditional security in cryptography, Konstanz: Hartung-Gorre, (1997)
12. Gray R., Entropy and Information Theory, Springer, Second Edition, (2011)
13. Lidl, Rudolf, Finite fields, Vol. 20, Cambridge University Press, (1997)
14. Shannon, C. E., “A mathematical theory of communication.” ACM SIGMOBILE Mobile Computing and Communications Review 5, no. 1, pp. 3-55, (2001)
15. Shannon, C. E., “Communication theory of secrecy systems.” Bell System Technical Journal 28, no. 4, pp. 656-715, (1949)
16. NIST Statistical Test Suite, Version 2.1.1., August 11, (2010), http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
17. Rukhin A., Soto J., et al., NIST Special Publication 800-22rev1a: A Statistical Test Suite for the Validation of Random Number Generators and Pseudo Random Number Generators for Cryptographic Applications, (2010), http://csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf


18. Babbage, Steve, et al., "The eSTREAM portfolio." eSTREAM, ECRYPT Stream Cipher Project, April 15, (2008), http://www.ecrypt.eu.org/stream/portfolio.pdf
19. ECRYPT, The eSTREAM project, http://www.ecrypt.eu.org/stream/
20. Zhang, Haina, and Xiaoyun Wang, “On the Security of Stream Cipher CryptMT v3.” IACR Cryptology ePrint Archive 2009, pp. 110, (2009)
21. Chen, Kevin, et al., "Dragon: A fast word based stream cipher." Information Security and Cryptology – ICISC 2004, Springer Berlin Heidelberg, Seoul, Korea, pp. 33-50, (2005)
22. Hawkes, Philip, et al., "Specification for NLSv2." New Stream Cipher Designs, Springer Berlin Heidelberg, pp. 57-68, (2008)
23. Cho, Joo Yeon, and Miia Hermelin, “Improved linear cryptanalysis of SOSEMANUK.” Information, Security and Cryptology – ICISC 2009, Springer Berlin Heidelberg, pp. 101-117, (2010)
24. Tasheva, A., Nakov O., and Tasheva, Zh., About balance property of the p-ary generalized self-shrinking generator sequence. In Proceedings of the 14th International Conference on Computer Systems and Technologies (CompSysTech '13), ACM, New York, NY, USA, pp. 299-306, (2013)
25. Pincus S., Singer B. H., Randomness and degrees of irregularity, Proc. Natl. Acad. Sci. USA, Vol. 93, pp. 2083-2088, (1996)


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(2): 122-129 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

The Consequences of State-Level Intrusions: A Risk Worth Taking?

Murdoch Watney
Department of Public Law, University of Johannesburg, Johannesburg, South Africa
[email protected]

This article is based on research supported in part by the National Research Foundation of South Africa (UID85384). Opinions expressed are those of the author and not the NRF.

ABSTRACT

Over the years states have intruded into the cyberspace of other states. Does the offending state consider the consequences it may face for the intrusion under international law and/or at the international level? Is the position at present so uncertain that a state may decide the risk is worth taking, given the problems experienced in establishing attribution, or because the injured state's reaction can be second-guessed? This discussion explores, with reference to examples of state-level intrusions, whether such intrusions are prohibited under international law and the possible consequences the offending state may face. The danger exists that as countries develop and/or improve their cyber capabilities, they may follow the example of countries that have already intruded into the cyberspace of other countries, and it will be difficult for the latter to preach restraint from the moral high ground. Past state behaviour illustrates that states will put national interests and aspirations above trust, openness and transparency in cyberspace. Cyberspace is becoming a crowded place where state behaviour necessitates governance; otherwise cyberspace will become lawless, to the detriment of all states, including those that in the past may have decided that state-level intrusion is a risk worth taking.

KEYWORDS International law, cyberspace, state-level intrusion, consequences, state accountability, attribution, state behaviour.

1 INTRODUCTION

In 2007 Israeli aircraft were able to enter the Syrian airspace undetected. This was made possible after Israel had manipulated the Syrian computerized air defense radar system into not displaying aircraft entering its airspace. As a result of this cyber manipulation, Israel successfully bombed a facility in Syria which was allegedly being built for the development of a nuclear weapon [1]. Although Syria and Israel are neighbouring countries, their relationship can be described as hostile and it is therefore not surprising that Israel saw the possible development of a nuclear weapon as a serious threat to its national security [1]. In the given example Israel sent out a clear message to Syria and other countries that it would take matters into its own hands and defend its national security interest where international obligations under international law are ignored. It may also have paved the way for the 2010 usage of the Stuxnet worm against Iran. The above-mentioned cyber intrusion, which can be classified as air defense radar system manipulation, should have been a wake-up call to the international community that some countries not only have the cyber capabilities to achieve such an intrusion, but will also use them. The report that as many as 13 planes flying over Europe vanished from radar screens in June 2014, during an unprecedented series of blackouts that lasted 25 minutes, is worrying [2]. Air-traffic control centres in Austria, southern Germany, the Czech Republic and Slovakia all reported the same incident. Unsubstantiated claims were made that the air traffic control system may have been hacked. Playing devil's advocate and assuming that a


country may have been testing its cyber capabilities, it may be asked: which type of intrusion would it constitute; what consequences may such a country face; and would a country even take such a risk? The core discussion focuses on the possible consequences a state may face for intruding into the cyberspace of another state and whether it is a risk worth taking. The discussion does not deal with state-level cyber intrusions where countries are engaged in physical combat (armed conflict, also referred to as ius in bello); it deals with the non-combat (peaceful) state of affairs where a state intrudes into the cyberspace of another country to achieve a specific objective. Interestingly enough, although the topic under discussion is one that warrants attention, it has been neglected, some may say even avoided, as it has the potential of evoking controversy since it touches on many complex inter-related issues, such as international law, international politics and relations between states. A clear understanding of the possible consequences a state may face for its role in state-level intrusion calls for a brief discussion of whether state-level intrusion is prohibited under international law, and if not, whether a state may face consequences at the international level.

2 AN OUTLINE OF THE INTERNATIONAL LAW GOVERNING STATE-LEVEL INTRUSIONS

2.1 Introduction

A clear distinction must be drawn between state-on-state cyber intrusions at the national and international level:
• At national level states are sovereign and may implement their own national laws which are applicable within their territory. State-level intrusion may constitute a crime within the ambit of the victim country's national laws, but as the intrusion constitutes a state-on-state intrusion, it will have to be dealt with at international level.
• At international level it will be determined whether the intrusion is prohibited in terms of international law or not. The consequences a state may face for its role in the intrusion will depend on

whether the intrusion constitutes a prohibited intrusion under international law.

2.2 Prohibited state-level cyber intrusions under international law

The following state-on-state cyber intrusions are prohibited under international law:
1. An intervention is prohibited [3]. It is not expressly set out in the United Nations Charter, but the prohibition of intervention is implicit in the principle of the sovereign equality of states laid out in article 2(1) of the United Nations Charter.
2. The threat of or the use of force is prohibited [3]. Article 2(4) of the United Nations Charter provides that "All Members of the United Nations shall refrain in their international relations from the threat or use of force against the territorial integrity or political independence of any state or in any other manner inconsistent with the purposes of the United Nations." The prohibition is also a norm of customary international law.
3. Use of force that constitutes an armed attack is prohibited [3]. Article 51 of the United Nations Charter provides that "nothing in the present Charter shall impair the inherent right of individual or collective self-defence if an armed attack occurs against a member of the United Nations until the Security Council has taken the measures necessary to maintain international peace and security." The article reflects the customary right to self-defence.

2.3 A void under existing international law

Although international law provides for the above-mentioned prohibited intrusions, it does not indicate which intrusions fall within a prohibited category and/or when the threshold of a category is exceeded. International lawyers must interpret an intrusion and establish the category and threshold of the intrusion, but this is not easy and the matter is open to different interpretations. Over the years state-level intrusions have escalated and have incorrectly been referred to as "cyber war", which is an inadequate term as it does not accurately describe intrusions between states [4]. Countries require clarity pertaining to state-level intrusions. Against the above-mentioned background, the NATO Cooperative Cyber Defence Centre of


Excellence (CCD CoE), an international military organisation based in Tallinn, Estonia, invited an independent international group of experts (hereafter referred to as experts) in 2009 to produce a manual on the law governing cyber warfare. In 2013 the Tallinn Manual on International Law applicable to Cyber Warfare (referred to as the Tallinn Manual) was published under the directorship of Michael Schmitt, a US professor of international law [3].

2.4 Tallinn Manual as a source of international law?

There is no treaty on cyber warfare. The Tallinn Manual is not a source of international law. Schmitt, as the editor of the Tallinn Manual, stated: "We wrote it as an aid to legal advisers, to governments and militaries, almost a textbook. We wanted to create a product that would be useful to states to help them decide what their position is. We were not making recommendations, we did not define best practice, we did not want to get into policy" [5]. Kono [6] states that the Tallinn Manual may in future be adopted into the practice of states, but as will be illustrated hereafter, it is doubtful whether such a level of consensus regarding its application will be reached amongst all states:
• The drafting process of the Tallinn Manual has been criticized for not being representative of the international community. Mälksoo [7] observes that the experts came from American and so-called old European backgrounds. He questions the absence of experts from other countries such as China, Russia, Poland or Hungary. In defense of the Tallinn Manual it may be observed that it was not drafted under the auspices of the UN but of the NATO CCD CoE.
• Unfortunately some non-western countries may perceive the Tallinn Manual as a product conceptualising the law derived from the practices of western countries, especially after the unfortunate observation that the main conclusions in the Tallinn Manual are "largely congruous with those of the United States government" [7]. Some countries may feel uncomfortable with such an observation as there exists a perception, whether justifiable or not is not under discussion, that the US wishes to dominate cyberspace by enforcing its

opinions and policies onto cyberspace. Edward Snowden’s revelations in 2013 of US mass and unrestrained espionage practices did the US no favours as it confirmed some countries’ allegations that the US is vying for state superiority in cyberspace [7]. In the light of the aforesaid, many non-western countries will most probably not support the Tallinn Manual as a so-called textbook in establishing the legal position to cyber intrusions under international law. A similar problem is experienced with the Council of Europe Convention on Cybercrime of 2001 which some countries also perceive as a European instrument. It is regrettable that the study on the application of international law to cyber intrusions was not conducted under the auspices of the UN with representation from all countries. Be that as it may, the Tallinn Manual cannot be excluded from a discussion on state-level cyber intrusions. It is the only guide on cyber warfare and therefore a most useful and valuable guide [8]. Unfortunately the Tallinn Manual is not without shortcomings which is understandable taking into account the complexities of cyber intrusion. Some of the shortcomings will be highlighted within the context of the topic under discussion. Whether an intrusion will constitute “use of force” (rule 11 of the Tallinn Manual) or an “armed attack” (rule 13 of the Tallinn Manual) is not easily established. Acts that injure or kill persons or cause physical damage or destroy objects are unambiguous use of force. The use of force is a prerequisite for an armed attack to exist [6]. A gap exists between use of force and armed attack, but it is not easy to determine when the threshold was exceeded and use of force escalated to an armed attack [3] [6]. The Tallinn Manual focuses on a “scale and effects” approach in determining when an intrusion amounts to the use of force. The experts offer a non-exhaustive list of eight indicative criteria a state may take into account when assessing whether the intrusion has reached the use of force threshold. An armed attack constitutes a higher threshold than use of force and would be the gravest form of use of force. The experts could not agree whether the use of force that does not cause physical damage, but an adverse


effect, for example financial loss, would constitute an armed attack [3] [8]. When determining whether an intrusion falls within the ambit of international law or not, cognizance must be taken of the legal position as outlined in the Tallinn Manual as well as the interpretations given by commentators on international law.

3 APPLICATION OF THE INTERNATIONAL LAW TO PRACTICAL EXAMPLES OF STATE-LEVEL INTRUSIONS

Much has been written on this aspect [9]. The purpose of briefly discussing the application of international law with reference to examples is to lay the groundwork for the discussion hereafter, at paragraph 4, of the consequences a state may face for intruding into the cyberspace of another.

Example 1: Use of a cyber intrusion in combination with conventional weapons while countries are not engaged in physical combat

The Tallinn Manual is only applicable to cyber-to-cyber operations and does not address whether the cyber intrusion in the Israeli-Syrian example falls under international law. Kono [6] is of the opinion that such an operation qualifies as part of an armed attack when it is an integral part of the whole attack, even if it does not cause physical harm. However, it may be argued that Israel's intrusion was justifiable as anticipatory self-defence to protect its national security. It may be asked: what would the position under international law have been had Israel only disabled the radar system but then decided not to proceed with the physical attack?

Example 2: Distributed Denial of Service attacks

In 2007 Estonia decided to relocate a Soviet war memorial, namely a statue of a Russian soldier who had fought during World War II, to a military cemetery [1]. The decision resulted in an outcry from Estonians of Russian descent and from Russia, which saw the removal as an affront to Russia's national interests. The consequence of Estonia's decision was the first cyber intrusion of its kind launched against a state's information infrastructure as a whole. The DDoS attacks targeted, for example, websites of the government,

political parties and banks [10]. It lasted three weeks and caused a great deal of inconvenience and disruption, resulting in significant economic damage since virtually all online business transactions could not be processed for several days. Any business that earned revenue through online advertisements on its website lost income while the website was down. Although Russia was accused of being behind the attacks, which it denies, not everyone believed the denial [10]. The experts of the Tallinn Manual are of the opinion that the intrusion never reached the threshold of use of force and therefore did not constitute an armed attack, although at the time of the intrusions many referred to it as cyber warfare [3].

Example 3: Usage of malware

Israel and the US were concerned about Iran's continued insistence on developing a nuclear weapon, which these countries saw as a threat to national and global security [10]. Taking into consideration the successful 2007 Israel-Syria intrusion, these countries (unofficially confirmed in 2012 [11]) used the Stuxnet worm in 2010 to sabotage uranium enrichment centrifuges controlled by high-frequency converter drivers at the uranium enrichment facility at Natanz. The malware had been injected into the network not by means of the internet, but by means of infected removable media [10]. The experts of the Tallinn Manual were divided as to whether the usage of Stuxnet constituted an armed attack, but the majority agreed that it was use of force [3]. The experts of the Tallinn Manual are of the opinion that although an armed attack, and therefore cyber warfare, is a distinct possibility, it has not occurred. Commentators such as Iasiello [10] are of the opinion that it was an armed attack and the first example of cyber warfare. He substantiates his argument with reference to the sophistication of the malware, its functionality, the intent behind its deployment and its clandestine appearance on a non-internet-connected industrial control system network. Interestingly enough, Fidler [12] is of the opinion that it did not reach the threshold of use of force because of state practice. The Stuxnet example clearly illustrates how


different interpretations may be given to the same intrusion.

Example 4: Espionage

State-level espionage is not new. The experts of the Tallinn Manual indicated that state-level espionage is not prohibited under international law [3]. It may be seen as interference in the sovereignty of a state, but it is not a prohibited intrusion. In 2013 Snowden, a former US National Security Agency contractor, revealed that the US had employed unrestrained espionage practices to collect mass information on various heads of state and citizens in other countries [13]. The US defended itself by indicating that the motivation for the surveillance was the protection of national security [14]. The US, however, accused China of employing unrestrained industrial espionage with the purpose of stealing US trade secrets to the detriment of US economic security [1]. Countries such as Taiwan also accused China of espionage [15].

4 CONSEQUENCES FACING A STATE FOR INTRUDING INTO THE CYBERSPACE OF ANOTHER STATE

4.1 Introduction

A state entering the cyberspace of another state faces consequences, many of which are uncertain. An offending state cannot foresee or predict how the victim (injured) state will react to the intrusion within the international arena. In the light of this uncertainty, a state may decide that the risk of state-level intrusion is worth taking. Most states will not openly acknowledge that they have intruded into the cyberspace of another. The purpose of the intrusion is to secretly, quickly and anonymously enter the cyberspace of another state to achieve a specific objective. The characteristics of cyberspace make attribution of the intrusion to a state difficult. It is possible for a state to hide the origin of its intrusion. In the absence of conclusive proof of the offending state's intrusion, it is easy for a state to deny the intrusion and escape accountability. However, in some instances the evidence confirming attribution is so conclusive that a state cannot deny the intrusion. A good example is the 2013 Snowden revelations of US unrestrained espionage practices

[13]. In other instances there may not be conclusive evidence, but there may be strong circumstantial evidence as well as suspicions linking the state to the intrusion, for example the Estonian DDoS attacks and the allegations that Russia supported the patriotic hacktivists [7] [10]. However, attribution to a state is not possible without conclusive proof. Rule 7 of the Tallinn Manual [3] states that the mere fact that a cyber operation has been launched from governmental cyber infrastructure is not sufficient evidence for attributing the operation to that state. This is exactly the claim the Russian government made (as did China in 2001) when the US networks were intruded into [1]. A country may at a later stage unofficially acknowledge the intrusion, such as the US in respect of the Stuxnet worm [11]. Countries may also deduce from international politics and the relations between states which state(s) may have been responsible for the intrusion.

4.2 Legal consequences under international law

The international legal system has no central authority to enforce compliance with international law. In these circumstances states claim the right to enforce compliance with the rules of international law by responding to an illegal act with a reciprocal illegal act designed to compel compliance [16]. If an intrusion is prohibited under international law and there is conclusive evidence confirming that the intrusion is attributable to a state, then the offending state may face the following possible legal consequences:
• The offending state may be held responsible under the law of state responsibility for its wrongful act. Dugard [16] indicates that where a state commits an international wrong against another state, it incurs international responsibility. In such a case the offending state is obliged to make reparation. The experts of the Tallinn Manual [3] indicated that the injured state may only use countermeasures to induce compliance with international law by the offending state. The majority of experts agreed that if the international wrongful act in question has ceased, the victim state is not entitled to initiate or


to persist in countermeasures [3]. Cyber countermeasures may not involve a threat of or use of force (rule 11 of the Tallinn Manual). The experts also distinguished between countermeasures and acts of retorsion. Acts of retorsion are so-called unfriendly, although lawful, measures a state takes against another; for example, during the 2007 Estonian cyber intrusions Estonia suspended some services to Internet Protocol (IP) addresses from Russia [3]. Other examples of retorsion are the limitation of normal diplomatic relations, a trade embargo not in violation of a treaty obligation or the termination of an aid programme [16].
• Where the offending state was responsible for an armed attack, the victim state may resort to use of force under article 51 of the UN Charter. A victim state may only act in self-defence if it was the victim of the most grave form of use of force constituting an armed attack [6]. The victim state does not have an unlimited right to use force and it must adhere to the conditions of necessity and proportionality based on customary international law. Kono [6] indicates that the issue of anticipatory self-defence against a cyber attack will have to be discussed in future. At present the Tallinn Manual allows a victim state to take anticipatory actions even before an armed attack has been launched [3].
Taking into consideration that determining the category of intrusion and/or establishing conclusive proof of attribution may be stumbling blocks to state accountability, the offending state may decide that the risk is worth taking.

4.3 Consequences at international level

A state may not incur consequences under international law, either because the intrusion is not prohibited, such as espionage, or because the injured (victim) state may have decided not to take action, whether because of the challenges relating to establishing accountability or because the injured state does not want to damage its relations with the offending state. At the international level the offending state runs the risk of not achieving the objective of the intrusion and at the same time damaging its international relations with the injured state and other states that may have come out in support of the victim country. Russia may have thought that Estonia

would give in to its demand not to remove the statue, but Estonia did not change its decision [10]. Estonia publicly acknowledged it had been the victim of a massive cyber intrusion and it accused Russia of orchestrating the attacks, which Russia denied. Many NATO countries came out in support of Estonia and, interestingly enough, the attacks firmly placed Estonia on the world map. NATO took the attacks so seriously that in 2008 the CCD CoE was established in Estonia. The attacks may have had a positive spin-off for Estonia, but Iasiello [10] indicates that if the state behind the attacks was indeed Russia, as circumstantial evidence suggests, then it was an unqualified failure as an instrument of public policy, as it was unsuccessful in enforcing Russian policy onto another country. Stuxnet clearly illustrates that the intrusion may not have been worth the risk. Although the Stuxnet worm delayed the development of a nuclear bomb, it is now doubtful whether it delayed the development permanently. Stuxnet was unfortunately not contained at the Natanz nuclear energy facility, but spread beyond Iran and may be used against other countries [1]. Although Iran may not have publicly accused the US of the state-level intrusion, it employed state-level intrusions in retaliation. In 2012 Iran attacked Aramco, a Saudi Arabian oil plant, and left behind a "calling card" of a burning US flag [10]. Iran then went on to launch a series of sequential attacks against the US financial industry, including JPMorgan and Wells Fargo, which resulted in the slowing down of overwhelmed servers and denied customers access to bank services [10] [15]. Iran may have acted in retaliation to indicate to the US that it does have the cyber capabilities to intrude into US cyberspace and that it will not accept intrusions into its own cyberspace. Other countries may not have voiced their reservations about the usage of Stuxnet, but the use of what may be referred to as a cyber weapon has created a militarized environment where such intrusions may be seen as acceptable. Knake [1], a former US security advisor, indicated that the US had crossed a Rubicon and that the reversal of the consequences that came with the Stuxnet usage might not easily be accomplished. He warned that "the US has also launched what is likely to be a


cyber boomerang, a weapon that will someday be used to attack some of America's own defenseless networks." [1] The 2013 Snowden espionage revelations illustrate how a country may come to be seen as a so-called rogue country within the international community [17]. Countries are suspicious of the US motives for spying on them. The US has indicated that it gathered information for national security purposes, but some countries see the intrusions as enforcing its superiority onto cyberspace and advancing only its own national interest, to the detriment of trust, openness and transparency in cyberspace [17]. Practicing restraint would have given the US a leg to stand on when complaining about other countries' state-level intrusions and also in respect of cyberspace governance.

4.4 Legal consequences at national level of a state

Addressing state-level espionage at international level by means of diplomatic discussions may not be successful. The US has on numerous occasions accused China of industrial espionage, allegations which China has vehemently denied [1]. In May 2014 the US took an unprecedented step, and one that may have taken China and other countries by surprise, when it instituted criminal charges against five Chinese military personnel who had allegedly committed industrial espionage on behalf of the Chinese government by gathering US trade secrets. The US decision to institute criminal charges against Chinese nationals conveyed a symbolic message to the offending state, China, that it would not tolerate such state-level intrusions [18]. The US must have weighed the Chinese-US espionage practices against a subsequently strained US-China relationship and must have decided that US economic security outweighs smooth international relations. Only time will tell whether such state actions will have the desired effect on the offending state.

5 CONCLUSION

Although cyberspace allows for state-on-state intrusions, it does not imply a state should intrude into another state's cyberspace merely because it is able to. As illustrated, in some instances the offending state did not achieve the objective of the intrusion. A state may feel the risk outweighs the consequences it may face, especially as it is not clear which consequences a state may face. In the light of past intrusions, states may have a perception that state-level intrusions are acceptable and that anything goes in cyberspace. It is time that acceptable state behaviour is established under international law, otherwise cyberspace may become a place where states pry on other states at will to advance their own interests and power, or retaliate against intrusion. Clarke [1] is correct when he states: "…if you are going to throw cyber rocks, you had better be sure that the house you live in has less glass than the other guy's, or that yours has bulletproof windows." But do states wish to inhabit such a world? I would hope not.

6 REFERENCES

1. Clarke, R.A., Knake, R.K.: Cyber War. HarperCollins Publishers, New York (2012).
2. Day, M.: “13 Planes vanish from radars over Europe,” http://www.telegraph.co.uk/news/worldnews/Europe/Austria/10898385/13-planes-vani...
3. Schmitt, M.N.: Tallinn Manual on the International Law applicable to Cyber Warfare. Cambridge University Press, New York (2013).
4. Smith, P.: “How seriously should the threat of cyber warfare be taken?” http://www.eir.info/2014/01/17how-seriously-should-the-threat-ofcyber-warfare-be..
5. Zetter, K.: “Legal experts: Stuxnet attack on Iran was illegal act of force,” http://www.wired.com/threatlevel/2013/03/stuxnet-actfor...
6. Kono, K.: “Briefing Memo: Cyber Security and the Tallinn Manual,” www.nids.go.jp/english/pubicatoon/pdf/.../briefing_e180.pdf
7. Mälksoo, L.: “The Tallinn Manual as an international event,” http://www.diplomaatia.ee/en/article/the-tallinn-manual-as-an-international-event/
8. Vihul, L.: “The Tallinn Manual on the International Law applicable to cyber Warfare,” http://www.ejiltalk.org/the-tallinn-manual-on-theinternation...
9. Watney, M.M.: Challenges pertaining to cyber war under the International Law. In: The Third international conference on Cyber Security, Cyber Warfare and Digital Forensics, pp. 1-5 (2014).
10. Iasiello, E.: “Cyber Attack: A Dull Tool to Shape Foreign Policy,” http://www.ccdcoe.org/publicatoins/2013/proceedings/d3r1s3_Iasiello.pdf


11. Leyden, J.: “Cyberwarfare playbook says Stuxnet may have been ‘armed’ attack,” http://www.theregister.co.uk/2013/03/27/stuxnet_cyverwar_r...
12. Fidler, D.P.: “Was Stuxnet an Act of War? Decoding a Cyberattack,” http://ieeeexplore.ieee.org
13. Leigh, D., Harding, L.: WikiLeaks. Guardian Books, UK (2013).
14. Lucas, E.: “Edward Snowden: Did the American whistleblower act alone?” http://www.telegraph.co.uk/news/worldnews/northamerica/usa/10595021/Edward-Sn
15. France-Presse, A.: “Taiwan sets up internet shield to tackle ‘hacking’,” http://www.scmp.com/news/china/article/1195632/taiwan-set...
16. Dugard, J.: International Law: A South African Perspective. Juta, Cape Town, South Africa (2011).
17. Matthew, J.: “Edward Snowden NSA Scandal: China calls on international community to form cyberspace code of conduct,” http://www.ibtimes.co.uk/edwardsnowden-nsa-scandal-china-merkel-obama-517736
18. Beauchamp, Z.: “How the US indictment of Chinese military hackers will change cyberespionage,” http://www.vox.com/2014/5/19/5731696/chinesehackers-cyberespionage-theft-cyber...


International Journal of Cyber-Security and Digital Forensics (IJCSDF) Published by The Society of Digital Information and Wireless Communications Miramar Tower, 132 Nathan Road, Tsim Sha Tsui, Kowloon, Hong Kong

Volume 3, Issue No. 3 - 2014

Email: [email protected] Journal Website: http://www.sdiwc.net/security-journal/ Publisher Paper URL: http://sdiwc.net/digital-library/browse/66

ISSN: 2305-0012

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(3): 130-140 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Performance Measurement for Mobile Forensic Data Acquisition in Firefox OS

Mohd Najwadi Yusoff, Ramlan Mahmod, Mohd Taufik Abdullah, Ali Dehghantanha
Faculty of Computer Science & Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia.
[email protected], {ramlan,taufik,alid}@upm.edu.my

ABSTRACT

Mozilla Corporation has recently released a Linux-based open source operating system, namely Firefox OS. The arrival of Firefox OS has created new challenges, considerations and opportunities for digital investigators. Currently, Firefox OS is still not fully supported by most of the existing mobile forensic tools. Even when the phone is detected as Android, only pictures from removable memory can be captured, and internal data acquisition is still not working. Therefore, there are huge opportunities to explore Firefox OS at every stage of the mobile forensic procedure. This paper presents an approach for mobile forensic data acquisition in a forensically sound manner from a device running Firefox OS. The approach largely uses the UNIX dd command to create a forensic image of the device. In addition, performance measurements are made to find the best block size for the acquisition process in Firefox OS.

KEYWORDS Mobile forensic, data acquisition, forensic image, dd command, Firefox OS.

1 INTRODUCTION

The advancement of smartphone technology has attracted many companies to develop their own mobile operating systems. The recently released Firefox OS is an open source mobile operating system which is purely based on Linux and Mozilla's Gecko technology [1]. Firefox OS boots into a Gecko-based runtime engine and thus allows users to run applications developed exclusively using HTML5, JavaScript, and other open web application APIs. According to the Mozilla Developer Network, Firefox OS is free from proprietary

technology, but still a powerful platform; it offers application developers an opportunity to create tremendous products [1]. Mozilla introduced WebAPI to bridge the capability gap between native frameworks and web applications. WebAPI enables developers to build applications and run them in any standards-compliant browser without the need to rewrite the application for each platform. In addition, since the software stack is entirely HTML5, a large developer base was already established, and users can embrace the freedom of pure HTML5 [2]. To our knowledge, none of the existing mobile forensic tools work perfectly with Firefox OS. For example, MOBILedit! is able to detect a phone running Firefox OS, but lists it as an Android device. When we tried to perform data acquisition, MOBILedit! was only capable of acquiring some of the pictures from removable memory; the rest were left undetected. In addition, we also tried Paraben Device Seizure, Oxygen Forensic Suite, Cellebrite Mobile Forensics as well as Micro Systemation XRY, and the results were even worse. The acquisition process for a phone running Firefox OS becomes more interesting because the phone itself is detected as Android. This may be due to the similarity of Android and Firefox OS in their base kernel. For that reason, this paper will demonstrate the use of the Android Debug Bridge (ADB) to connect the phone with the host machine and acquire the phone image using the UNIX dd command (an illustrative sketch of such a dd invocation is shown below). There are three types of acquisition for mobile devices: manual, logical and physical [3]. Manual acquisition is defined as the capability of acquiring data by interacting with the device itself.
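As a rough, hedged illustration of the dd-based idea and of the block-size measurements reported later in the paper, the sketch below times dd for several block sizes from Python. It is not the authors' procedure: the block-device path, image size and output names are placeholders, and the adb plumbing needed to reach the phone's internal flash is omitted.

```python
import subprocess
import time

SOURCE = "/dev/block/mmcblk0"      # placeholder block device, not a real path from the paper
TOTAL_BYTES = 512 * 1024 * 1024    # hypothetical 512 MiB image for the timing comparison

for bs in (512, 4096, 65536, 1048576):
    count = TOTAL_BYTES // bs
    start = time.perf_counter()
    # Standard dd options: if= input, of= output, bs= block size, count= number of blocks.
    subprocess.run(["dd", f"if={SOURCE}", f"of=image_{bs}.dd",
                    f"bs={bs}", f"count={count}"], check=True)
    print(f"block size {bs}: {time.perf_counter() - start:.2f} s")
```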

Logical acquisition is the recovery of a bitwise copy of entities that reside in logical storage, and lastly, physical acquisition relates solely to the physical storage medium. In most cases, manual acquisition takes place simultaneously with the other two acquisition methods. On the other hand, each type of acquisition has strengths and weaknesses. Grispos stated that logical acquisition is more efficient for recovering user data, whereas physical acquisition can retrieve deleted files [4]; but the latter procedure can damage the device while it is being dismantled. For that reason, this paper performs only a combination of the manual and logical types of acquisition. During this process, we make a bitwise copy of all partitions and keep a log of the actions taken. After that, we run several acquisition tests with different block sizes to find the best block size for the acquisition process in Firefox OS. Furthermore, the results are presented in graph form and we recommend the best block size to be used for future investigations in Firefox OS. The objective of this paper is to present the detailed steps of how we managed to acquire a mobile forensic image from Firefox OS using the UNIX dd command without making any changes to the phone during acquisition. This paper also presents performance measurements for the block size during the acquisition process. The paper is organized as follows: Section 2 reviews the state of the art; Section 3 presents the acquisition methodology and the detailed steps; Section 4 is about performance measurement; and Section 5 gives a brief conclusion and the future work to be considered. Acknowledgement and references are presented at the end of this paper.

2 STATE OF THE ART

2.1 Early Mobile Investigation

Data acquisition is the procedure of imaging and obtaining evidence from a mobile device and its peripheral equipment [5]. In the earliest mobile forensic investigations, most of the digital evidence in mobile phones was stored in SIM cards. Research by Goode stated that it is vital to

acquire data such as contacts and SMSs stored in SIM cards [6]. In addition, mobile phone memory and SIM cards also hold phone contacts which may contain critical evidence for investigators. According to Goode, there are three evidence locations in a mobile phone: the SIM card, identification information from the mobile phone (IMEI) and core network provider information. Similar work carried out by Willassen explored SIM card and core network data in GSM phones [7]. According to Willassen, SIM cards can provide the network provider name together with a unique identification number. The subscriber's name, phone number and address are usually associated with the SIM card. Consequently, phone records can also be retrieved from network providers. Furthermore, the contents of a SIM card are binary data that can be extracted, provided that the user is authenticated with either a PIN or a PUK code. Programs or tools such as Cards4Labs and SIM-Surf Profi were used to decode the binary format into readable form. Willassen was also able to recover evidence such as phone logs, phone contacts, SMS and the phone IMEI from both SIM cards and mobile phones. In a similar attempt, Casadei used open source tools, both in Windows and Linux, for digital extraction from SIM cards [8]. As a result, Casadei was able to acquire the raw data in binary format from the SIM cards. Casadei also presented an interpretation of the binary raw data at a higher level of abstraction and used an open source tool named SIMbrush to examine the raw data. SIMbrush was designed to acquire digital evidence from any SIM card in a GSM network but was not tested for modules under D-AMPS, CDMA and PDC. Additionally, SIMbrush focused more on the GSM network because GSM was the biggest mobile network in the world at that time and penetration in this network was increasing rapidly. Marturana extended the acquisition process for SIM cards by comparing data in SIM cards and smartphones [9]. According to Marturana, acquisition from a smartphone is much more complicated because evidence may also be stored in many other places, such as internal and flash memory.

2.2 The Revolution of Smartphones

With the emergence of smartphones, the focus moved to the Windows Mobile OS due to its similarity in nature to the desktop environment. Windows Mobile OS is a simplified version of the Windows OS developed by Microsoft, mainly for mobile devices. Research by Chen was able to extract SMS, phone book, call recordings, scheduling, and documents from Windows Mobile OS via Bluetooth, Infrared and USB using Microsoft ActiveSync [10]. Microsoft ActiveSync used the Remote API (RAPI) to manage, control and interact with the connected equipment from the desktop computer. The acquired data came from the mobile phone's internal memory, SIM card as well as removable memory. Similar research was continued by Irwin and Hunt by extracting evidence over wireless connections. They used their own developed forensic tools, called DataGrabber, CTASms and SDCap. DataGrabber was used to retrieve information from both the internal memory and any external storage card, CTASms to extract information from the mobile device's Personal Information Manager (PIM), while SDCap was used to extract all information from the external storage card. They successfully mapped the internal and external phone memory and transferred all files and folders to desktop computers [11]. When using RAPI functions, the acquisition process is only capable of capturing active data, and capturing deleted data is not possible with this method. According to Klaver, a physical acquisition method is able to obtain non-active data in Windows Mobile OS [12]. Klaver proposed a versatile method to investigate an isolated volume of Windows Mobile OS database files for both active and deleted data. Klaver used freely available forensic tools and explained the known methods of physical acquisition. Deleted data can be recovered by using advanced acquisition methods like chip extraction, and this method is able to bypass password protection. Casey extended these findings by describing various methods of acquiring and examining data on Windows Mobile devices. Casey was also able to capture text messages, multimedia, e-mail, Web

browsing, and Registry entries [13]. Some of the data captured by Casey were locked by the OS itself and required XACT from Micro Systemation and ItsUtils to work together with Microsoft ActiveSync. These tools help to unlock certain files and convert the ASCII format in the cemail.vol structure into readable SMS. This research also focused on potentially useful sources of evidence in Windows Mobile OS and addressed the potential evidence found in the "\temp" folder. In more recent work, Kaart investigated the pim.vol volume files in Windows Mobile OS by reverse-engineering them [12]. pim.vol is a Microsoft Embedded Database (EDB) volume that contains information related to phone contacts, calendars, appointments, call history, speed-dial settings and tasks [12]. Kaart successfully reverse-engineered important parts of the EDB volume format, which allowed them to recover unallocated records. Kaart also delivered the mapping from internal column identifiers into a readable format for some familiar databases in pim.vol volumes and created a parser that can automatically extract all allocated records existing in a volume.

2.3 Diversity of Mobile OS

The proliferation of mobile technology has led many companies to produce their own mobile OS. Forensic approaches for Windows Mobile OS might not be applicable to other mobile platforms. Therefore, Savoldi made a brief survey and comparison of mobile forensics for Windows Mobile OS and Symbian S60 [14]. In his work, Savoldi acquired the evidence using both logical and physical methods. Savoldi also illustrated the differences and identified a possible common methodology for future forensic exploration. Mohtasebi studied four mobile forensic tools, namely Paraben Device Seizure, Oxygen Forensic Suite, MIAT, and MOBILedit!, to extract evidence from a Nokia E5-00 Symbian phone [15]. The comparison checked the ability to extract evidence and to examine information types such as call logs, map history, and user data files. Casey, in turn, proposed a methodology for acquiring and examining forensic duplicates of user and system

partitions from a device running webOS [16]. The captured data is in .db3 format and can be analysed using an SQL viewer. Some information, such as the date column in the database, is stored in UNIX time format.

2.4 Bundled Software Packages

Most mobile manufacturers provide a software package to communicate with their own products. One example is Microsoft ActiveSync, which is used for Windows Mobile OS. For Apple iOS, Husain used iTunes to force a backup of the iPhone, and a logical copy of the backup can be found on the computer hard drive [17]. This method is able to capture the entire data set from the iPhone without jailbreaking the device. Husain used MobileSyncBrowser to convert the backup file, which is in binary format, into lists and databases. Furthermore, SQLite Database Browser was used to analyse the database files, and Plist Editor was used to analyse Apple Property List files. Husain was able to obtain data such as voice communication, text communication, audio-visual material, location information, user activity and online activity related evidence. Similarly, Chun and Park used Samsung Kies to extract SMS, photos and mobile images from a Samsung Galaxy S [18]. This analysis also emphasised the vulnerability of using free public Wi-Fi. However, most of the bundled software packages used to acquire evidence place an agent on the mobile device. This action may alter the stored data on the device, such as the last synchronisation date and time, or the name of the last computer synchronised with the device. For this reason, Rehault proposed a method based on a boot-loader concept, which is non-rewritable and able to protect the evidence from being altered [19]. Rehault also proposed an analysis method to process specific files with specific formats. The main focus of this research was to obtain the registry hives and the cemail.vol file, which contains deleted data. By reconstructing the registry hives, digital evidence such as SMS, MMS and e-mail can be retrieved from cemail.vol. An extended work on the boot-loader concept by Rehault was published by

Chen for Android [20]. The concept of evidence acquisition is similar, but this time a Secure Digital (SD) card is used. This method is claimed to effectively recover any deleted data. Besides that, Kumar proposed an agent-based tool developed for forensically acquiring and analysing data in Windows Mobile OS [21]. The tool is built on a client-server approach, whereby the client is installed on a desktop PC and the server agent is injected into the mobile device before the acquisition process. For the analysis process, the tool is able to display and decode the image created during acquisition. This research also compared the tool with Paraben's Device Seizure, Oxygen's Forensics Tool and Cellebrite UFED, and claimed that it performs better on Windows Mobile OS.

2.5 Other Acquisition Techniques

Apart from using the bundled software packages, another way to obtain mobile images is via an SSH connection. To use SSH, the phone must have SSH installed, and the data has to be transferred to a remote host over the network, in most cases using a wireless connection. However, this is a lengthy and very time-consuming process; it may require up to 20 hours depending on the image size. Alternatively, Gómez-Miralles and Arnedo-Moreno presented a novel approach using an iPad camera connection kit attached via a USB connection [22]. This approach greatly reduces the transfer time needed to acquire an iPad image. On the downside, a jailbreak is required to gain full access, which is not considered forensically sound for a forensic investigation. For that reason, Iqbal carried out research to obtain an Apple iOS image without jailbreaking the device, running the acquisition process at the RAM level [23]. The Apple iOS device needs to be rebooted into recovery mode before being connected to their own developed tools. The imaging process took less than 30 minutes, and they successfully developed an acquisition method that protects the integrity of the collected evidence. Work by Jonkers used flasher boxes to acquire data, but some limitations were

observed, such as issues in verifying data integrity and limited practicality in normal investigations [24]. Consequently, removable memory has become a popular alternative for the physical acquisition process. Rossi demonstrated internal forensic acquisition on mobile devices using removable memory [25], and this work became a stepping stone for the boot-loader concept. A tool used to obtain the data is stored on removable memory and the acquisition process is performed locally. In addition, this tool not only performs the acquisition but also compiles a log and marks the data with a one-way hash algorithm to provide data integrity. The test results were divided into three conditions of the mobile device, and the results obtained differ. For that reason, Rossi suggests maintaining the device in a state as close as possible to its original status. However, some approaches do not work for volatile memory. As a result, Sylve presented the first methodology and toolset for the acquisition of volatile physical memory from Android devices [26]. This method relies on a new kernel module for dumping memory, and Sylve further developed a tool to acquire and analyse the data. Sylve also presented an analysis of kernel structures using newly developed Volatility functionality. A similar method was proposed by Dezfouli using a forced backup to an isolated folder [27], but this has yet to be implemented. On the contrary, Vidas proposed a general acquisition method for Android using boot modes [28]. This technique reconstructs the recovery partition and the associated recovery mode of an Android device for acquisition purposes. The acquired data has to be in recovery image format. The custom boot-loader method has become popular in Android because the user is able to obtain root permission and acquire an image of the flash memory. Another study using the boot-loader method was conducted by Park [29]. This research focused mainly on fragmented flash memory due to the increasing deployment of flash memory in mobile phones. A newer acquisition method is live acquisition. Thing proposed an automated system for acquiring evidence and claimed that this method consistently achieved a 100% evidence

acquisition rate for outgoing messages and a 75.6% to 100% evidence acquisition rate for incoming messages [30]. Thing used Android as the test platform, with Message Script Generator, UI/Application Exerciser Monkey, Chat Bot, memgrab and Memory Dump Analyzer (MDA) as the forensic tools. Although the acquisition rate is high, this method was only tested using their own chat bot and has yet to be tested with commercial instant messaging applications. Another live acquisition study is by Lai, who proposed data acquisition on Android that delivers the data to the Google cloud server in real time [31]. This method can deliver the intended data, but the integrity of the data is questionable. On the other hand, Canlar proposed LiveSD Forensics to obtain digital evidence from both the Random-Access Memory (RAM) and the Electronically Erasable Programmable Read-Only Memory (EEPROM) of Windows Mobile OS. This work is claimed to generate the smallest memory alteration, so the integrity of the evidence is well preserved [32].

3 ACQUISITION METHODOLOGY

The goal of this paper is to propose a methodology for acquiring a mobile forensic image from a Firefox OS running phone. In general, the Firefox OS architecture consists of three layers [33]. The first layer is an application layer called Gaia, which works as the user interface of the smartphone. The second layer is an open web platform interface that uses the Gecko engine and provides support for HTML5, JavaScript and CSS. All the targeted evidence is stored in this layer. The third layer, called Gonk, is an infrastructure layer based on the Linux kernel. There are two types of storage in a Firefox OS running phone: internal storage and an additional micro SD card. Acquiring data from the micro SD card is relatively easy; the phone only needs to be connected to the host machine and the micro SD card can be mounted as a removable drive. However, acquiring data from the internal storage and other user partitions is quite a challenging task. The subsections below elaborate on the experimental setup and imaging process for the Firefox OS running phone.

3.1 Firefox OS Running Phone

For the acquisition process, we use a Firefox OS running phone released by Geeksphone, model name Peak. It was released in April 2013.

This phone is powered by a dual-core Qualcomm Snapdragon S4 processor and is based on the ARMv7 instruction set. Table 1 shows the detailed specification of this phone.

Figure 1. Geeksphone Peak

This phone is equipped with Firefox OS version 1.1.1, as shown in Figure 2. Mozilla releases updates for the OS regularly, and any stable build can be updated over the air.

Table 1. Geeksphone Peak Specification

Hardware            Detail
Processor           1.2 GHz Qualcomm Snapdragon S4 8225 processor (ARMv7)
Memory              512 MB RAM
Storage             Internal 4 GB; micro SD up to 16 GB
Battery             1800 mAh; micro-USB charging
Data Inputs         Capacitive multi-touch
Display             IPS 540 × 960 px (qHD) capacitive touchscreen, 4.3"
Sensor              Ambient light sensor; proximity sensor; accelerometer
Camera              8 MP (rear), 2 MP (front)
Connectivity        WLAN IEEE 802.11 a/b/g/n; Bluetooth 2.1 + EDR; micro-USB 2.0; GPS; mini-SIM card; FM receiver
Compatible Network  GSM 850 / 900 / 1800 / 1900; HSPA (tri-band); HSPA/UMTS 850 / 1900 / 2100
Dimension           Width 133.6 mm (5.26 in); Height 66 mm (2.6 in); Thickness 8.9 mm (0.35 in)

3.2 Forensic Requirement Setup

Figure 2. Geeksphone Peak Information Detail

To begin the forensic setup, an additional driver for the Geeksphone Peak needs to be installed on the host machine. We use Windows 8 as the operating system on the host machine. Once the phone is connected through the micro-USB 2.0 port, Windows 8 asks for the driver. The supported USB driver can be downloaded from the Geeksphone web site. Once the installation is finished, the Geeksphone Peak appears in the Device Manager as shown in Figure 3, and the micro SD card is mounted on the host machine as a Linux File-CD Gadget USB Device.



Figure 3. Windows 8 detects the phone as an unknown device

Subsequently, we need to make a connection between the phone and the host machine. This connection is important because we need to acquire the data as a full image so that we can avoid any alteration of possible evidence. Firefox OS is based on the Linux kernel and its design is broadly similar to Android. For that reason, we can easily access the phone using the Android Debug Bridge (ADB). ADB is a toolkit integrated into the Android SDK package and consists of both client-side and server-side components that communicate with one another. The bridge connection is created between the phone and the host machine over a specific port. Since Firefox OS is a Linux-based open-source mobile OS, no rooting procedure is required. To have ADB installed on the host machine, we need to download the Android SDK from the Android developer page. The SDK file is about 480 MB and needs to be unzipped. After that, the SDK Manager is launched and we install the Android SDK Tools, Android SDK Platform-tools and Android SDK Build-tools, as shown in Figure 4. These three tools are required to run ADB. Now we are ready to start the acquisition process.

Figure 4. ADB Required Tools
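Before imaging, the ADB setup can be quickly verified from the host. The following is a minimal check, run from the Windows command prompt inside the platform-tools folder; the exact SDK path depends on where the SDK package was unzipped.

rem confirm that the ADB client is installed and report its version
adb version
rem list attached devices; the Geeksphone Peak should appear with its serial number
adb devices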

3.3 Device Imaging Process

First of all, we need to know what data is to be acquired and where it is stored. Defining the targeted data is crucial so that we know what we should acquire and where to proceed. From a forensic standpoint, imaging the whole disk preserves its contents from any changes. For Firefox OS, the targeted phone image consists of several partitions. It covers the entire system, user data and installed applications. It also covers several unknown partitions, which need to be examined further. It is necessary to have the phone battery charged to at least 20% to avoid power loss during the acquisition process; any power loss might lead to device failure, and the recovery process would then become much harder. Furthermore, it is advisable to turn off Bluetooth, 3G data, push notifications, push e-mail and location services to minimise external interference. Last but not least, we need to unmount the micro SD card from the host machine. To do this, we go to phone Settings > Storage and disable the phone storage as shown in Figure 5.



Figure 5. Disable USB Storage

In order to acquire the phone image, we use the UNIX dd command in the ADB environment. To start ADB, we run the command prompt (CMD) and change to the %Android SDK%\sdk\platform-tools folder. After that, we check for the connected phone by typing:

adb devices

Figure 6. Attached Devices

During this check, ADB opens the connection port and lists all attached devices, as shown in Figure 6. Next, we type:

adb shell

This command establishes the connection between the phone and the host machine, and the root@android:/ # prompt appears in the CMD window.

Figure 7. Root Access

In order to check the partition locations, we type the following:

cd dev
cd block
ls

Figure 7. List of Partitions

These commands display all the existing partitions in the phone. The partitions of the target phone are named mmcblk0p1 through mmcblk0p21. We have two options for creating the phone image. The first option is to run the UNIX dd command for each partition, one by one and bit by bit, starting with mmcblk0p1 and ending with mmcblk0p21.

The second option is to run the UNIX dd command on the parent block device of these partitions, mmcblk0, which combines all the partitions into a single image. We chose the second option and typed the following:

dd if=/dev/block/mmcblk0 of=/mnt/emmc/evidence.img

The output image is written to the micro SD card. We did not set any block size; by default it is 512 KB. The process takes up to 10 minutes depending on the block size chosen, and the acquired image is about 3.64 GB (the internal storage size). Figure 8 shows the resulting log.
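For completeness, the same acquisition can also be scripted with an explicit block size and a simple integrity check. This is a minimal sketch only, run inside the adb shell under our setup; it assumes the micro SD card is mounted at /mnt/emmc as above, and that a hashing applet such as md5sum is present on the device (for example via busybox), which is not guaranteed on every build. If no such applet is available, the hash can be computed on the host after transfer instead.

# image the whole eMMC with an explicit 16 MB block size
# (the block size is given in bytes in case this dd build does not accept size suffixes)
dd if=/dev/block/mmcblk0 of=/mnt/emmc/evidence.img bs=16777216
# record a hash of the image for later integrity verification (md5sum availability is an assumption)
md5sum /mnt/emmc/evidence.img > /mnt/emmc/evidence.img.md5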

Figure 8. Log for Acquired Image

According to Figure 8, the capture time was 465.298 seconds and the speed was 8.012 MB/s. The acquired image covers all partitions in the internal memory, including the unallocated partitions. In order to transfer the acquired image to the host machine, we remount the micro SD card by reversing the earlier step shown in Figure 5 (phone Settings > Storage, enabling the phone storage again). The host machine detects the removable drive once more, and the acquired image appears as shown in Figure 9.

Figure 9. Acquired image from the Firefox OS running phone
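As an alternative to remounting the micro SD card, the image can also be pulled over the existing ADB connection and verified on the host. The sketch below is illustrative only: C:\evidence is a hypothetical destination folder, and the hash computed with certutil on the Windows 8 host should match any hash recorded on the device.

rem pull the acquired image from the micro SD card over the ADB connection
adb pull /mnt/emmc/evidence.img C:\evidence\evidence.img
rem compute an MD5 hash on the host for integrity verification
certutil -hashfile C:\evidence\evidence.img MD5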

4 PERFORMANCE MEASUREMENT

The next step is a performance measurement of the acquisition process. The test parameters in our measurement are the block size (bs) and the time taken for each acquisition. In order to obtain an optimal value, we ran a series of tests for each value while keeping phone activity to a minimum. This is very important for obtaining a reliable average result without external interference. Under these circumstances, we ran three complete 3.64 GB dumps for each of 12 different block sizes. The speed obtained in each test is shown in Table 2.

Table 2. Throughput obtained with different block sizes

bs       S1 (MB/s)   S2 (MB/s)   S3 (MB/s)   S (MB/s)
32 KB    2.861       2.861       2.861       2.861
64 KB    4.722       4.723       4.723       4.723
128 KB   6.845       6.846       6.845       6.845
256 KB   7.512       7.512       7.512       7.512
512 KB   8.011       8.012       8.012       8.012
1 MB     8.162       8.163       8.162       8.162
2 MB     8.464       8.464       8.465       8.464
4 MB     8.565       8.565       8.565       8.565
8 MB     8.598       8.597       8.598       8.598
16 MB    8.636       8.636       8.635       8.636
32 MB    8.544       8.543       8.544       8.544
64 MB    8.480       8.479       8.480       8.480
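The measurements above can be reproduced by repeating the dump for each block size under test. The following is a minimal sketch of such a loop, run inside the adb shell; the block sizes are given in bytes and mirror Table 2, the output file name bench.img is our own choice, and we rely on the dd build on this device reporting elapsed time and throughput when it completes (as in the log of Figure 8).

# repeat the whole-device dump for each block size under test (values in bytes, 32 KB to 64 MB)
for bs in 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864
do
  echo "block size: $bs bytes"
  # each run overwrites the previous test image on the micro SD card
  dd if=/dev/block/mmcblk0 of=/mnt/emmc/bench.img bs=$bs
done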


Based on these results, there is a steep increase in speed from the 32 KB to the 512 KB block size. However, the differences in speed become smaller as the block size grows. Figure 10 shows the throughput analysis for the different block sizes.

Figure 10. Throughput analysis with different block sizes

For this method we found that the best acquisition speed is achieved with a 16 MB block size. The speed decreases when a larger block size is used, in this case 32 MB and 64 MB. On the other hand, the differences between the 2 MB and 64 MB block sizes are small and barely noticeable.

5 CONCLUSION AND FUTURE WORK

The arrival of Firefox OS has created new challenges, concerns and opportunities for digital investigators. It is an exciting task to explore a new mobile OS in which all the apps are built purely on HTML5. Of concern, even though Firefox OS is a Linux-kernel-based mobile OS, it does not follow that existing mobile forensic tools will work well with this release. We have shown that only certain data can be read or captured from a Firefox OS running phone using existing mobile forensic tools. This may be due to differences in the existing user data and the partition arrangement. As for the performance measurement, we found that the optimum speed is obtained with a 16 MB block size and that the speed is reduced at larger block sizes. As an outcome, we suggest any block size between 2 MB and 16 MB for optimum acquisition speed. There are many more aspects to be explored. Our next focus will be on the analysis stage, where we will go deeper into the system files, user data and application logs.

Acknowledgments. Special thanks to the academic staff of Universiti Putra Malaysia for providing continuous guidance and support, and also to the Ministry of Education Malaysia and Universiti Sains Malaysia for granting the scholarship to me.

6 REFERENCES

1. Mozilla Developer Network, Firefox OS, https://developer.mozilla.org/en-US/docs/Mozilla/Firefox_OS
2. Mozilla's Boot 2 Gecko and why it could change the world, http://www.knowyourmobile.com/products/16409/mozillas-boot-2-gecko-and-why-it-could-change-world
3. Barmpatsalou, K., Damopoulos, D., Kambourakis, G.: A critical review of 7 years of Mobile Device Forensics, In: Digit. Investig., vol. 10, no. 4, pp. 323--349 (2013)
4. Grispos, G., Storer, T., Glisson, W. B.: A comparison of forensic evidence recovery techniques for a windows mobile smart phone, In: Digit. Investig., vol. 8, no. 1, pp. 23--36, Jul. (2011)
5. Jansen, W., Ayers, R.: Guidelines on Cell Phone Forensics - Recommendations of the National Institute of Standards and Technology (2007)
6. Goode, A. J.: Forensic extraction of electronic evidence from GSM mobile phones, In: IEE Seminar on Secure GSM and Beyond: End to End Security for Mobile Communications, pp. 9/1--9/6 (2003)
7. Willassen, S. Y.: Forensics and the GSM mobile telephone system, In: Int. J. Digit. Evid., vol. 2, no. 1, pp. 1--17 (2003)
8. Casadei, F., Savoldi, A., Gubian, P.: SIMbrush: an open source tool for GSM and UMTS forensics analysis, In: First International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE'05), pp. 105--119 (2005)
9. Marturana, P., Me, G., Berte, R., Tacconi, S.: A Quantitative Approach to Triaging in Mobile Forensics, In: 10th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 582--588 (2011)


10. Chen, S., Hao, X., Luo, M.: Research of Mobile Forensic Software System Based on Windows Mobile, In: 2009 International Conference on Wireless Networks and Information Systems, pp. 366--369 (2009)
11. Irwin, D., Hunt, R.: Forensic information acquisition in mobile networks, In: 2009 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 163--168 (2009)
12. Kaart, M., Klaver, C., van Baar, R. B.: Forensic access to Windows Mobile pim.vol and other Embedded Database (EDB) volumes, In: Digit. Investig., vol. 9, no. 3--4, pp. 170--192, Feb. (2013)
13. Casey, E., Bann, M., Doyle, J.: Introduction to Windows Mobile Forensics, In: Digit. Investig., vol. 6, no. 3--4, pp. 136--146, May (2010)
14. Savoldi, A., Gubian, P., Echizen, I.: A Comparison between Windows Mobile and Symbian S60 Embedded Forensics, In: Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 546--550 (2009)
15. Mohtasebi, S., Dehghantanha, A., Broujerdi, H. G.: Smartphone Forensics: A Case Study with Nokia E5-00 Mobile Phone, In: Int. J. Digit. Inf. Wirel. Commun., vol. 1, no. 3, pp. 651--655 (2012)
16. Casey, E., Cheval, A., Lee, J. Y., Oxley, D., Song, Y. J.: Forensic acquisition and analysis of palm webOS on mobile devices, In: Digit. Investig., vol. 8, no. 1, pp. 37--47, Jul. (2011)
17. Husain, M. I., Baggili, I., Sridhar, R.: A Simple Cost-Effective Framework for iPhone, In: Lect. Notes Inst. Comput. Sci. Soc. Informatics Telecommun. Eng. Digit. Forensics Cyber Crime, vol. 53, pp. 27--37 (2011)
18. Chun, W., Park, D.: A Study on the Forensic Data Extraction Method for SMS, Photo and Mobile Image of Google Android and Windows Mobile Smart Phone, In: Commun. Comput. Inf. Sci. - Converg. Hybrid Inf. Technol., vol. 310, pp. 654--663 (2012)
19. Rehault, F.: Windows mobile advanced forensics: An alternative to existing tools, In: Digit. Investig., vol. 7, no. 1--2, pp. 38--47, Oct. (2010)
20. Chen, S.-W., Yang, C.-H., Liu, C.-T.: Design and Implementation of Live SD Acquisition Tool in Android Smart Phone, In: Fifth International Conference on Genetic and Evolutionary Computing, pp. 157--162 (2011)
21. Kumar, S. S., Thomas, B., Thomas, K. L.: An Agent Based Tool for Windows Mobile Forensics, In: Lect. Notes Inst. Comput. Sci. Soc. Informatics Telecommun. Eng., vol. 88, pp. 77--88 (2012)
22. Gómez-Miralles, L., Arnedo-Moreno, J.: Versatile iPad forensic acquisition using the Apple Camera Connection Kit, In: Comput. Math. with Appl., vol. 63, no. 2, pp. 544--553, Jan. (2012)

23. Iqbal, B., Iqbal, A., Al Obaidli, H.: A novel method of iDevice (iPhone, iPad, iPod) forensics without jailbreaking, In: 2012 International Conference on Innovations in Information Technology (IIT), pp. 238-243 (2012) 24. Jonkers, K.: The forensic use of mobile phone flasher boxes, In: Digit. Investig., vol. 6, no. 3--4, pp. 168--178, May (2010) 25. Rossi M., Me, G.: Internal forensic acquisition for mobile equipments, In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1--7 (2008) 26. Sylve, J., Case, A., Marziale, L., Richard, G. G.: Acquisition and analysis of volatile memory from android devices, In: Digit. Investig., vol. 8, no. 3--4, pp. 175--184, Feb. (2012) 27. Dezfouli, F. N., Dehghantanha, A., Mahmoud, R., Binti Mohd Sani, N. F., Bin Shamsuddin, S.: Volatile memory acquisition using backup for forensic investigation, In: 2012 International Conference on Cyber Security, Cyber Warfare and Digital Forensic (CyberSec2012), pp. 186--189 (2012) 28. Vidas, T., Zhang, C., Christin, N.: Toward a general collection methodology for Android devices, In: Digit. Investig., vol. 8, pp. S14--S24, Aug. (2011) 29. Park, J., Chung, H., Lee, S.: Forensic analysis techniques for fragmented flash memory pages in smartphones, In: Digit. Investig., vol. 9, no. 2, pp. 109-118, Nov. (2012) 30. Thing, V. L. L., Ng, K.-Y., Chang, E.-C.: Live memory forensics of mobile phones, In: Digit. Investig., vol. 7, pp. S74--S82, Aug. (2010) 31. Lai, Y., Yang, C., Lin, C., Ahn, T.: Design and Implementation of Mobile Forensic Tool for Android Smart Phone through Cloud Computing, In: Commun. Comput. Inf. Sci. - Converg. Hybrid Inf. Technol., vol. 206, pp. 196--203 (2011) 32. Canlar, E. S., Conti, M., Crispo, B., Di Pietro, R.: Windows Mobile LiveSD Forensics, In: J. Netw. Comput. Appl., vol. 36, no. 2, pp. 677--684, Mar. (2013) 33. Mozilla Developer Network, Firefox OS architecture, https://developer.mozilla.org/enUS/docs/Mozilla/Firefox_OS/Platform/Architecture


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(3): 141 - 152 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

The Significance of Mandatory Data Breach Warnings to Identity Crime Eric Holm

Geraldine Mackenzie

Faculty of Business. Federation University Australia PhD Student, Bond University Ballarat, Australia [email protected]

Faculty of Law. Bond University Robina, Australia [email protected]

Abstract— The relationship between data breaches and identity crime has been scarcely explored in current literature. However, there is an important relationship between the misuse of personal identification information and identity crime, as the former is in many respects the catalyst for the latter. Data breaches are one of the ways in which this personal identification information is obtained by identity criminals, and thereby any response to data breaches is likely to impact the incidence of identity crime. Initiatives around data breach notification have become increasingly prevalent and are now seen in many State legislatures in the United States and overseas. The Australian Government is currently in the process of introducing mandatory data breach notification laws. This paper explores the introduction of mandatory data breach notification in Australia, and lessons learned from the experience in the US, particularly noting the link between data breaches and identity crime. The paper proposes that through the introduction of such laws, identity crimes are likely to be reduced.

Keywords—identity crime; data breaches; mandatory breach reporting; privacy

I. INTRODUCTION

The Australian Government has generated considerable debate around the introduction of data breach notification as a way of protecting personal data in the event of a breach [1]. Public instances of data breaches have raised awareness of the need for greater protection of personal data [1]. The heightened public awareness of these issues has prompted the Australian Government to introduce laws in Australia to deal with data breach

notification [2], the fundamental purpose of which is that the person whose data has been breached has a "right to know" of the breach [3]. Mandatory data breach notification is a legal requirement on the holder of data to notify those affected in the event of a data breach [1]; thus, through being notified of a breach, the person whose data has been misused may take appropriate action to prevent further harm [4]. There is scant information linking the notification of data breaches and identity crime; however, based on reports from the United States, it is argued that a discernible reduction in the incidence of identity crime can occur through the introduction of mandatory breach notification laws [5]. Australia is in the process of introducing mandatory breach notification laws, which are in their final stages of legislative enactment, and these will provide new opportunities to consider the relationship between mandatory breach notification and crime, in particular identity crime. Data breaches are likely to become more important given the trends toward cloud computing and large data sets.


II. THE VULNERABILITIES OF PERSONAL IDENTIFICATION INFORMATION

Advances in technology have resulted in increased volumes of information being stored electronically, particularly on the Internet [5], including personal identification information [6]. Personal identification information is information that is unique to the person, for example their name, address and credit card number [7]. In the event of a data breach, it is this


information that can present vulnerabilities for identity crime to the individual [8]. Identity crime can also affect corporations, where personal details are used to defraud them [9]. While corporations may be responsible for this crime, they can also be the victim. Nonetheless, 33% of the information exposed through data breaches includes personal identification information such as names, addresses and credit card numbers [10].

In many countries there is no compulsion to record breaches of data. Further, in Australia, at present, there is no formal requirement regarding such breaches [1]; however, Australia will introduce data breach notification requirements in 2014 [11]. Prior to the enactment of the new legislation, privacy has been regulated by the Privacy Act 1988 (Cth), under which organisations and agencies storing personal information are subject to requirements to provide adequate security protection [12]. The Privacy Act applies to the use of information which is sensitive, including personal identification information [12]. There are criminal sanctions for the possession of personal identification information with the purpose of committing an offence in many States, for example Queensland [13]. However, when it comes to data breaches, the responsibility for the protection of private information is less clear in Australia, where an organisation or agency is not compelled to report data breaches; notification is therefore voluntary and falls back on the organisation concerned. This lack of mandatory reporting can have ramifications for identity theft.

In the existing regulatory environment in Australia, an organisation or agency involved in a data breach often exercises some discretion in dealing with that breach. As a consequence, many breaches may not be reported and appropriately actioned through notifying those involved or the Privacy Commissioner [6]. However, despite the largely voluntary nature of reporting, examples exist that demonstrate that some organisations and agencies, irrespective of the lack of compulsion to do so, actively take responsibility for data breaches and act accordingly by notifying those affected [14]. However, because it is discretionary, this does not always occur.

III. EXPLORING THE LINKS BETWEEN DATA BREACHES AND IDENTITY CRIME

The cost of data breaches has steadily increased [14]. At the same time, the number of identity crimes has increased, with, for example, the United States having over 15 million people falling victim to this crime each year [15]. A proportion of identity crime can be definitively traced back to data breaches; hence, the causal relationship between data breaches and identity crime is significant. Noteworthy is the statement by the United States Federal Trade Commission that the introduction of data disclosure laws has reduced identity theft by about 6 percent [7]. This points toward a reduction, albeit a small one, in the incidence of identity crime as a direct result of regulatory implementation. Although it is early to predict, it is probable that the same trend will be evident with the introduction of mandatory breach notification in Australia.

IV. THE DRIVERS FOR MANDATORY BREACH NOTIFICATION LAWS IN AUSTRALIA

For localities such as Australia (at present) that do not have mechanisms to compel data breach notification, it is difficult to obtain accurate data on the extent of data breaches. Symantec estimated the cost of data breaches in Australia at $2.16 million in 2011 [14]. Nonetheless, examples of significant data breaches are still reported to the Privacy Commissioner, from companies such as Sony [16] and Lego [17], with other notable breaches relating to Telstra [18] and Vodafone [19]. Likewise, significant instances of data breaches are also evident in other countries [20]. In this respect, for Australia, an advantage of having mandatory notification is that it may highlight a hidden societal problem [21]. Data breach costs continue to increase; according to research undertaken by the Ponemon Institute and IBM, the cost of data breaches in Australia increased by $4 per record, up to a total of $145 per record, in 2014 [22]. Similarly, the average organisational cost of data breaches in Australia increased from 2.72 million dollars to 2.8


million dollars in 2014 [22]. A recent example of a data breach involving Australia is the eBay breach, where a computer hacking attack resulted in data breaches affecting as many as 145 million customers [23]. The eBay breach was said to have occurred through the compromise of a small number of employee credentials [24]. These credentials were used to facilitate access to the personal identification information of customers. eBay admitted that personal information had been taken but otherwise denied any risk of loss relating to financial information, because eBay account details are kept separate from the financial details held in PayPal. The particular personal information that was stolen included email addresses as well as physical addresses, phone numbers and dates of birth [24]. While this breach did not involve financial information, it did involve personal identification information, which has implications for identity crime. As a consequence of this breach, eBay took corrective action by notifying customers of the breach and of the need for users to change their login details to prevent any further risk [25]. However, a criticism that arose following this breach was the delay taken by eBay to announce the breach and to make efforts to notify users. Further to this breach, investigations are taking place into eBay's liability, particularly by U.S. bodies in Connecticut, Florida and Illinois [25]. These may result in further legal action and will be followed with interest. The eBay breach came only months after a significant data breach involving the retailer Target in the United States, where 100 million users were purported to have been impacted [26]. In this breach, up to 30 million financial details were stolen over a two-month period, along with up to 70 million personal details. The Target breach was disclosed in December 2013 [26]. The personal details breached included the names, addresses and phone numbers of Target customers [26]. As a consequence of this breach, Target reported a drop in profit of around 46 per cent on the year before, a loss directly attributable to this data breach. Arising from this breach, financial institutions also suffered losses through the reissuance of cards and

the upgrading of payment systems. These losses were estimated at 200 million dollars [26]. For corporations there are potentially significant costs attached to data breaches, including losses of future revenue through lost consumer confidence [22]. In Target's case, in addition to a financial sanction of 17 million dollars ($US), the breach also damages the goodwill of the organisation, which directly results in financial harm [27]. Research undertaken by Symantec in 2013 found that Australia is prominent among the countries affected by data breaches [28]. Furthermore, Australian companies had the largest number of records compromised, particularly when contrasted with data breaches in other countries like Italy and Japan, which were comparatively minimal [28]. Associated with these losses is an abnormally high rate of customer churn, which means that the consumer loss associated with this crime is significant, as is evident from the Target example described previously [28]. Interestingly, the United States expended the most resources on notification, which perhaps explains its lower levels of customer loss; it will be interesting to observe whether the introduction of mandatory breach notification laws in Australia changes the extent of churn there [28].

V. THE MARKET FOR INFORMATION

Personal identification information has a value and can be equated to an asset that can be traded and sold [29], although the value of this information is often based on the benefit that comes from it [29]. The risk to individuals is now far more profound, with information being shared online as well as traded like a commodity [30]. There are likely to be many costs associated with the misuse of information arising from data breaches that are not readily estimated. It is difficult to know how and in what way information will be used in the future. For instance, with identity crime, personal identification information can be warehoused by an identity criminal and used some time later to perpetrate crime, which makes it difficult to know when the crime will be committed and from what source the details were obtained. This crime is pervasive, as many victims will not find out they


have become victims until they are contacted by debt collectors [30]. At that time, they will need to establish how this took place and from where, to prevent it occurring again. This becomes difficult with such a time lapse between the data acquisition and the crime. Personal identification information has value and has created a market in its trade [31]. Many individuals born after 2012 will have a profound digital footprint on the Internet [32]. They may develop an identity on the Internet from birth that will extend throughout their lives. Personal information is far easier to aggregate because of the ways it is shared, and social networking is partly responsible for this [32]. The capacity for information dissemination is made more dramatic by the emergence of 'big data', which can be described as complex and extremely large data sets. Some of this data incorporates personal identification information and financial information, as well as various other types [32]. Another emerging and potential risk of identity crime is the risk of data breaches through clouds [33]. The vulnerability of data in clouds, like that of big data, stems from the extent of the information stored, which is likely to be significant [33]. The potential for loss of personal identification information is substantial given the number of organisations storing data online and the reliance on this emerging technology for information storage [34]. Romanosky, Telang and Acquisti suggest that, while a person has a right to know of a data breach involving their personal information, another driver for compelling such notification is information dissemination more broadly, to increase communal knowledge of the occurrence [7]. Such dissemination of information means that the practices undertaken by organisations and agencies relating to the management of information may become more transparent [35], meaning that individuals are better aware of where the greater and lesser risks arise.

Anecdotally, in Australia there is mixed support for the introduction of mandatory breach notification. In 2012, a representative sample of 700 Australians surveyed by eBay found that 80% of respondents supported the introduction of laws requiring notification of breaches in Australia [36]. This statistic is likely to be different today following the eBay data breach in 2014. Interestingly, many of those surveyed were most concerned about identity theft and the loss of financial data resulting from data breaches, and it was these considerations that drove their support [36]. Therefore, a driver for regulatory change is the very real threat of identity crime [36]. This sentiment is similarly reflected in other jurisdictions [37]. According to the Australian Privacy Breach Notification Discussion Paper, earlier recommendations by the Australian Law Reform Commission (ALRC) in relation to mandatory breach notification were subject to criticism [1]. However, it was clear that some response to dealing with data breaches through the available privacy regulatory mechanisms was still needed. The Office of the Australian Information Commissioner introduced a guide to handling security breaches as a measure to mitigate the impacts of data breaches [1], providing practical steps to handle data breaches, among other things [12]. However, a limitation of this guide is that it does not compel any notification in respect of a breach or prescribe penalties [1]. Nonetheless, the guide is helpful in assisting organisations and agencies with data breaches until more formalised rules and regulations take effect in Australia [1]. An important driver for the introduction of laws around mandatory data breach notification is that they could serve as a deterrent to poor information management practices [1]. This is based on the consequences that flow to an organisation or agency that does not deal with personal identification information in an appropriate manner [1]. According to the Australian Privacy Breach Notification Discussion Paper, there is merit in identifying bodies that do not take appropriate steps in responding to data breaches [1]. In addition, a side benefit is that it may increase broader community confidence in the approaches taken to manage information [1]. These are


powerful motivators for having such laws introduced in Australia. Laws relating to mandatory notification of data breaches have been implemented in a number of countries including the United States (mentioned above), Germany, Norway and Japan [37]. In the United States, for instance, such approaches to dealing with data breaches go back as far as 2003 in California [38]. There are lessons to be learnt from those who have developed similar laws in the past, such as moderating the number of warnings to avoid fatigue [1]. Beyond the obvious improvement to policy and practice relating to the overall improvement of privacy that such a regulatory change makes, there are other benefits that relate to the relationship between privacy and other crimes.

VI. THE LINK BETWEEN DATA BREACHES AND IDENTITY CRIME

Romanosky, Telang and Acquisti suggest that the link between identity theft and data breaches is tenuous due to the lack of data available to conclusively support this relationship [7], and that, further, the data around identity crime is questionable. Needles also suggests that the relationship is not significant due to the lack of data available to support such a link [39]. However, Cate, Abrams, Bruening and Swindle aver that the nexus between data breaches and identity theft is understated because the true extent of identity crime is not known [40]. Moreover, Romanosky, Telang and Acquisti state that over 30% of identity thefts are caused by data breaches at corporations [7], and Burdon notes that an important link exists between data breach notification and the mitigation of identity theft [41]. Likewise, Regan highlights that mandatory breach notifications can positively reduce identity crime through increasing awareness [42]. Therefore, despite the difficulties in drawing direct linkages between data breaches and identity crime, there is a relationship [43]. The ALRC notes that in the United States a key rationale for the introduction of mandatory breach notification laws was mitigating the

potential for identity theft [44]. Accordingly, the ALRC suggests that in Australia, without regulatory oversight of data breaches and the appropriate notifications stemming from these, the risks associated with identity theft will only increase [44]. This is aside from whether there are consistent criminal sanctions relating to identity crimes. Therefore, the ALRC argues that through regulating the reporting of data breaches it may be possible to mitigate the damage arising from identity crime [44]. Consequently, more work is needed to explore the relationship between these variables.

VII. HOW MIGHT THE REGULATORY RESPONSE LOOK?

In Australia, the approach proposed to mandate the reporting of data breaches is based on the privacy frameworks and related technologies [45]. Australian governmental agencies, as well as private sector organisations [46], are guided by principles outlining how information can be used, disclosed and stored [47]. This is different from the approach in the United States, which emphasises specific uses of personal information such as health information [48] and driver's licences [49]. However, there are often many forms of personal identification information that can become susceptible through data breaches. The principles in the United States are not founded on privacy principles in the same way as the Australian approach; rather, they represent disparate approaches driven by state-specific regulatory needs rather than an overarching approach, which means the approach in Australia is likely to be different from that of the United States.

An issue with dealing with identity crime is that it is not regulated internationally and instead falls to various domestic regulatory mechanisms for effect [50]. The international agreement that deals with identity crime is the European Convention on Cybercrime [51], which promotes cooperation and coordination in relation to cyber-crimes. However, despite this convention, there are arguably inadequate and inconsistent national responses to this crime [51]. This has implications for the prevention of identity crime as well as for the measures of redress


[51]. Discrepancies in the regulatory responses to identity crime are evident from research undertaken through the European Union, which found that most countries do not have specific laws that deal with identity theft [51]. To provide a contrast, Latvia has sanctions attached to identity crime under which the crime carries a penalty of up to 15 years imprisonment [52]. In Romania the penalties for this crime are up to 20 years imprisonment [53]. In contrast, other European countries impose lower penalties, with Finland providing for up to 4 years [54] and Denmark up to 6 years imprisonment [55]. Hence, among European countries there are significant differences in the penalties for this crime.

VIII. LESSONS LEARNED FROM THE US: WHAT SHOULD BE REPORTED?

The answer to the question of what should be reported depends on the stringency applied to the reporting requirement. A broader policy question that needs consideration is which breaches should be reported. This has been referred to as the 'trigger' for the notification of a breach in parts of the United States [56]. Reporting data breaches has a cost associated with it, and the more stringent the reporting requirement, the more costly it becomes [1]. However, not adequately reporting a breach renders the reporting process unworkable, and any remedial responses arising from the breach unattainable. Therefore, there is a delicate balance in the reporting process in terms of identifying the incidence of data breach that should be reported; in this respect, the response needs to be substantive enough to make it worthwhile. In the United States, for instance, a negative consequence has been observed through the overuse of notification mechanisms, which can result in complacency due to fatigue [1]. It is hoped that this will not be the outcome of regulatory reform in Australia, as the focus is rather on an assessment, and notification where there is a breach that places a person at risk of harm [57].

The ALRC suggested models for data breach notification in Australia which were largely based on the United States approach [6]. Jurisdictions in the United States tend to have stringent triggers for

reporting data breaches, but these vary between States. For instance, Indiana requires a database owner who knows or should know that the data has been breached to report such an incident [58]. This places a responsibility on the organisation concerned to report instances where there might be a suspicion of data loss. This may be a reasonable approach for Australia; however, there are diverging views about what information should qualify for mandatory breach notification, and the organisation or agency involved may not be in the best position to determine what should be reported. Therefore, deferring such responsibility to another authority such as the Privacy Commissioner may be preferable [1]. This is what is being proposed in Australia under the current bill [57]. Ideally, the data that is likely to have an adverse effect on the individual is the data that should be reported, and this can be construed broadly [1]. Key to the effectiveness of any such notification of breach will be the speed with which the notification takes place, so as to mitigate any possible consequences flowing from that breach [1]. Further, the way in which this communication takes place will invariably affect the speed of such a response [1]. In California, in the United States, the requirement is to notify in the most expedient manner and without unreasonable delay [56]. How this will be applied in the Australian context is not yet clear; however, it certainly appears to be as soon as practicable following the breach [57]. Regardless, any notification of breach must be timely to be effective, particularly given the speed with which misuse of data can take place [46].

IX. PROPOSALS FOR CHANGE IN AUSTRALIA

A Federal approach to mandatory data breach notification could provide uniformity in the approach taken and potentially avoid issues arising from a mixture of responses based on State laws or similar [42]. As in the United States, this may overcome the issue of variability and inconsistency in State-based approaches to mandatory breach notification [58]. Another advantage of a Federally regulated approach is that it would provide consistent remedies for the victims


of such data breaches [58]. This is the proposed approach to be adopted within Australia. The ALRC has suggested that market-based incentives remain an important tool for improving information security measures [44]. The threat to reputation is a market-driven force that is important in mitigating data breaches [59]. Arising from this threat, the damage to reputation for such bodies can be extensive [41]. The ALRC recognises reputational damage as an incentive for organisations to improve information security, but ultimately it also takes the view that this alone is not an adequate measure and needs to be accompanied by a regulatory response prompting action [6]. Nonetheless, it is important to acknowledge the role played by these other factors in mitigating the effect of data breaches. There has been debate around the sanctions that should apply to organisations or agencies that fail to notify of a data breach [1]. If the penalties are civil and monetary, what should the sanction amounts be? If they are non-monetary, how should they be framed? The proposed changes would allow the Privacy Commissioner to investigate and make determinations, as well as provide remedies for non-compliance through the Privacy Act [57]. The United Kingdom, for instance, applies a fixed penalty of a thousand pounds to data breaches [60]. Alternative options to civil penalties include administrative penalties, in addition to naming organisations and agencies that do not report data breaches [1], which has the reputational implications identified above. Perhaps an appropriate penalty involves a mixture of these options. The governmental approach remains unclear until the bill becomes law, and future actions in this regard will become clearer in time. An important part of mitigating data loss is to encourage organisations and agencies to engage in better data management practices, including acknowledging the steps that have been taken by organisations and agencies to mitigate the risk attached to potential breaches. This might include, for

instance, the emphasis placed on encryption or similar steps to mitigate data breaches, which is recognised in other jurisdictions [61]. The ALRC considered this important for reducing liability in instances of data breaches [6]. The implementation of preventative measures needs to be recognised as a deterrent to data breaches, but this recognition needs to be applied consistently. In Australia, those against mandatory data breach reporting argue that such reporting can impose unreasonable financial burdens on organisations [62]. The cost stems from making contact with the person whose data has been breached [7]. This is reflected in Australia, at present, by the voluntary reporting requirements and expectations under the Office of the Australian Information Commissioner [1]. However, at present, there is little incentive for organisations and agencies to comply with the reporting of data breaches [63], and the ALRC similarly asserts that these poor market incentives result in low levels of reporting [64]. If the status quo in dealing with data breaches were working effectively, it is unlikely that such debate would be occurring regarding the need for mandatory breach notification laws in Australia or elsewhere. In Australia, the regulatory change is set to commence in 2014. The new regulatory requirements will oblige organisations to provide notifications where the data breach will result in harm that is 'serious' [65], including injury to feelings, reputation, or financial or economic harm [66]. This requirement provides, among other things, for the entity concerned to notify, as soon as practicable, the Commissioner and each individual significantly affected by the data breach [67]. The regulatory change modifies the Privacy Act 1988 (Cth) by establishing mandatory notification through the changes proposed in the Privacy Amendment (Privacy Alerts) Bill 2013 [68]. This amendment provides the right for the Privacy Commissioner to take enforcement action, investigate complaints and obtain undertakings from organisations about compliance. Similarly, civil penalties will also be available under this regime [68].


The trend towards recognising legislative requirements for the notification of data breaches has certainly been a factor in the introduction of such rules into Australia [66]. In particular, in this context, strong reference is made to the United States, which has adopted some regulatory approaches to dealing with data breaches. Further, reference has also been made to the European Union, which similarly requires telecommunication and internet service providers to disclose certain data breaches to national authorities [69]. Importantly, within this jurisdiction, a further draft data protection regulation broadens these obligations [70]. A concern expressed by the ALRC was the wish to reduce the burden of compliance [1], a sentiment that is similarly reflected in other jurisdictions and certainly identified as an area of concern in the European Union [1]. Thereby, care has been taken to identify breaches as 'serious data breaches' for the purposes of the legislation as the trigger upon which action must be taken [68]. What is interesting about the developments in the regulatory responses to data breach disclosure is that there is increasingly an awareness of the need for an appropriate regulatory response, and the government is tentatively approaching this. It will be interesting to observe the developments in this regard.

THE EVOLUTION OF THE LAW AND THE LESSON LEARNED

In 2014, amendments relating to privacy took effect in Australia through the Privacy Amendment (Enhancing Privacy Protection) Act 2012, which introduced a range of changes to the privacy principles regulating the handling of personal information [71]. This applies to the ways in which personal information is dealt with by Australian Government agencies and some private sector organisations. The provisions enhanced the powers of the Office of the Australian Information Commissioner, changed the credit reporting laws, and provided greater recognition of external dispute resolution schemes and privacy codes [71]. The recent changes to the privacy laws represent the largest changes to these laws in 25 years in Australia [71]. Specific to this article are the changes that have been made to Australian Privacy Principles 6, 7 and 8, which all relate to the disclosure of personal information: the use or disclosure of personal information, the use of information for direct marketing, and the cross-border disclosure of personal information. However, the specific Bill to provide for mandatory breach notification unfortunately lapsed in the Australian Parliament, resulting in a delay in it becoming law. The significant aspect of this Bill is that it would allow the investigation of data breaches, but this is delayed until the Bill becomes law [71].

XI. THE FUTURE PROSPECTS OF A SUCCESSFUL IMPLEMENTATION – REGULATORY IMPACT ASSESSMENT

It can be difficult to assess the extent to which regulatory efforts will influence identity crime. More research is needed to gauge the impact of changes to privacy law on the incidence of identity crime. In studies by the European Union it was recognised that a coordinated mechanism to report this crime internationally is desirable [51]. The centralisation of data collection functions is important for the collection of data related to this crime, as well as potentially providing support mechanisms for victims [51]. In addition, a centralised data collection function improves the common understanding of this crime [51]. The focus on victims is important because they bear the loss yet are seldom the focus of regulation, which tends to remain on the offender rather than the victim. This might provide a useful way forward in dealing with this crime and perhaps also in better understanding the relationship between data breaches and identity crime.

XII. SIGNIFICANCE

This paper brings together existing literature on the relationship between identity crime and mandatory breach notification laws. Given the increased prevalence of both data breaches and of


identity crime, it is important to acknowledge the existence of the relationship between these variables. Further, where laws are introduced to deal with mandatory breach disclosure, as they are at present in Australia, it is vital to consider the implications such laws will have on the reduction of identity crime. It is also important to recognise that, while the link between data breaches and identity crime is somewhat tenuous, the relationship is nonetheless significant. This will only be measurable post-implementation, and can form the basis of future discussion.

XIII. CONCLUSION

Both the literature and the issues currently being experienced in practice suggest that there is a need to be able to mitigate data breaches, which will in turn assist in the prevention of identity crime. It is not clear whether regulating the notification of data breaches is going to have a discernible impact on identity crime, and only time will reveal the true extent of this. However, what is clear from this conceptual research is that there is a relationship between data breaches and identity crime, such that a reduction in one (data breaches) is likely to result in a reduction in the other. Hence, it is reasonable to suggest that the introduction of mandatory breach notification laws in Australia will have a direct impact on the incidence of identity crime in that country.

XIV. ACKNOWLEDGMENT

We would like to acknowledge the support of the Faculty of Business at Federation University Australia and the support of the Law Faculty at Bond University. We would also like to thank the reviewers of the paper presented to Cybersec2014 for their helpful comments and feedback.

XV. REFERENCES

1. Australia. Australian Government, Discussion Paper: Australian Privacy Breach Notification. Barton: Attorney-General's Department; 2012. [Online]. Available: http://www.ag.gov.au/Consultations/Documents/AustralianPrivacyBreachNotification/AustralianPrivacyBreachNotificationDiscussionPaper.doc. [Accessed: Jan. 17, 2013].
2. Australia. Australian Government, Media Release: Business warned to be ready for data breaches. Sydney: Office of the Australian Privacy Commissioner; 2012. [Online]. Available: http://www.oaic.gov.au/news-and-events/media-releases/privacy-media-releases/businesswarned-to-be-ready-for-data-breaches. [Accessed: Nov. 5, 2012].
3. L. C. Rode, "Database Security Breach Notification Statutes: Does Placing the Responsibility on the True Victim Increase Data Security?," Houston Law Review, vol. 43, no. 5, pp. 1597-1621, Spring 2007.
4. S. Romanosky and A. Acquisti, "Privacy Costs and Personal Data Protection: Economic and Legal Perspectives," Berkeley Technology Law Journal, vol. 24, no. 3, pp. 1072-1074, December 2009.
5. Canadian Internet Policy and Public Interest Clinic, "Approaches to Security Breach Notification: A White Paper." [Online]. Available: https://www.cippic.ca/sites/default/files/BreachNotification_9jan07-print.pdf. [Accessed: Nov. 5, 2012].
6. Australia. Australian Government, For Your Information: Australian Privacy Law and Practice (ALRC Report 108) – Data Breach Notification. Sydney: Australian Law Reform Commission; 2008. [Online]. Available: http://www.alrc.gov.au/publications/51.%20Data%20Breach%20Notification/rationale-data-breach-notification. [Accessed: Nov. 5, 2012].
7. S. Romanosky, R. Telang and A. Acquisti, "Do Data Breach Disclosure Laws Reduce Identity Theft?," Journal of Policy Analysis and Management, vol. 30, no. 2, pp. 256-286, March 2011.
8. M. Turner (2006, Jun. 21). "Towards a Rational Personal Data Breach Notification Regime," Information Policy Institute. [Online]. Available: http://www.perc.net/wp-content/uploads/2013/09/data_breach.pdf. [Accessed: Jan. 17, 2013].
9. C. Bunton, "Corporate ID theft – is your company vulnerable?," Strategic Direction, vol. 21, no. 2, pp. 3-4, February 2005.
10. Symantec, "Threat Activity Trends – Data Breaches that Could Lead to Identity Theft." [Online]. Available: http://www.symantec.com/threatreport/topic.jsp?aid=data_breaches_that_could_lead&id=threat_activity_trends. [Accessed: Jan. 17, 2013].
11. Privacy Amendment (Privacy Alerts) Bill 2013.
12. Australia. Australian Government, National Privacy Principles. Sydney: Office of the Australian Information Commissioner; 2001. [Online]. Available: http://www.privacy.gov.au/materials/types/download/8774/6582. [Accessed: Jan. 17, 2013].
13. Criminal Code 1899 (Qld) s 408D(1).
14. Symantec, "2011 Cost of Data Breach Study: Australia." [Online]. Available: http://www.symantec.com/content/en/us/about/media/pdfs/b-ponemon-2011-cost-of-data-breach-australiaus.pdf. [Accessed: Jan. 17, 2013].
15. Identitytheft.info, "Identity Theft Victim Statistics." [Online]. Available: http://www.identitytheft.info/victims.aspx. [Accessed: Feb. 6, 2014].
16. I. Paul (2011, Apr. 29). "Sony Hackers Claim to Have Credit Cards," PC World. [Online]. Available: http://www.pcworld.com/article/226692/sony_hackers_claim_to_have_credit_cards.html. [Accessed: Jan. 17, 2013].
17. NBCNews, "Data Breach Topples Australian Lego Fans," NBCNEWS.com. [Online]. Available: http://www.msnbc.msn.com/id/47621717/ns/technology_and_science-security/#.T81Lt8U0K1c. [Accessed: Jan. 17, 2013].
18. L. Battersby, "Telstra Red-Faced after Email Error," Sydney Morning Herald, p. 7, Dec. 7, 2010.
19. A. Langmaid (2011, Jan. 9). "Vodafone Mobile Records Leaked onto the Internet," Herald Sun. [Online]. Available: http://www.heraldsun.com.au/news/victoria/vodafone-mobile-records-leaked-onto-the-internet/story-e6frf7l6-1225984469970. [Accessed: Jan. 17, 2013].
20. IT Security Training, "IT Security Training Australia: Data Breach Notification in Australia." [Online]. Available: http://www.itsecuritytraining.com.au/content/data-breach-notification-australia-whitepaper-available. [Accessed: Jan. 17, 2013].
21. F. J. Garcia, "Data Protection, Breach Notification, and the Interplay between State and Federal Law: The Experiments Need More Time," Fordham Intellectual Property, Media and Entertainment Law Journal, vol. 17, no. 3, pp. 693-727, March 2007.
22. Ponemon Institute, "2014 Cost of Data Breach Study: Australia," IBM. [Online]. Available: http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=WH&infotype=SA&appname=GTSE_SE_SE_USEN&htmlfid=SEL03021USEN&attachment=SEL03021USEN.PDF#loaded. [Accessed: Jun. 9, 2014].
23. SCMagazine, "European data authorities to probe eBay data breach." [Online]. Available: http://www.scmagazine.com/european-data-authorities-to-probe-ebay-data-breach/article/348676/. [Accessed: Jun. 2, 2014].
24. Yahoo!, "Massive breach at eBay, which urges password change." [Online]. Available: https://au.news.yahoo.com/thewest/business/technology/a/23722507/massive-breach-at-ebay-which-urges-password-change/. [Accessed: May 28, 2014].
25. InTheCapital, "eBay Faces Legal Consequences for Data Breach." [Online]. Available: http://inthecapital.streetwise.co/2014/05/26/ebay-data-breach-consequences/. [Accessed: May 26, 2014].
26. B. Krebs, "What a major data breach costs: Target by the numbers," The Sydney Morning Herald, May 6, 2014.
27. InformationWeek, "The Cost of Data Loss Rises." [Online]. Available: http://www.informationweek.com/news/internet/showArticle.jhtml?articleID=204204152. [Accessed: Jun. 1, 2014].
28. Symantec, "2013 Cost of Data Breach Study: Australia." [Online]. Available: https://www4.symantec.com/mktginfo/whitepaper/053013_GL_NA_WP_Ponemon-2013-Cost-of-a-Data-Breach-Report_daiNA_cta72382.pdf. [Accessed: Jun. 1, 2014].
29. T. Hemphill, "Identity Theft: A Cost of Business?," Business and Society Review, vol. 106, no. 1, pp. 51-63, December 2001.
30. Yahoo! 7, "Child Identity Theft." [Online]. Available: http://au.tv.yahoo.com/sunrise/video/-/watch/26825601/child-identity-theft/. [Accessed: Apr. 30, 2012].
31. CBSNews, "As Target Fallout Continues, Incidents of Fraud Emerge." [Online]. Available: http://www.cbsnews.com/news/as-target-fallout-continues-incidents-of-fraud-emerge/. [Accessed: Jun. 1, 2014].
32. PCWorld, "Data Snatchers! The Booming Market for Your Online Identity." [Online]. Available: http://www.pcworld.com/article/258034/data_snatchers_the_booming_market_for_your_online_identity.html. [Accessed: May 25, 2014].
33. TopTechNews, "Cloud Could Triple Odds of $20M Data Breach." [Online]. Available: http://www.toptechnews.com/article/index.php?story_id=012000EWI0XC. [Accessed: May 15, 2014].
34. Tripwire, "Cloud Services Triple Likelihood and Cost of Data Breaches." [Online]. Available: http://www.tripwire.com/state-of-security/top-security-stories/cloud-services-triple-likelihood-and-cost-of-data-breaches/. [Accessed: May 15, 2014].
35. P. M. Schwartz and E. J. Janger, "Notification of Data Security Breaches," Michigan Law Review, vol. 105, no. 5, pp. 913-984, March 2007.
36. A. MacGibbon and N. Phair (2012, Apr.). "Privacy and the Internet: Australian Attitudes Towards Privacy in the Online Environment," Centre for Internet Safety. [Online]. Available: http://www.canberra.edu.au/cis/storage/Australian%20Attitutdes%20Towards%20Privacy%20Online.pdf. [Accessed: Jan. 17, 2013].
37. A. Moses (2011, Jul. 27). "Thousands of Privacy Breaches Going Unreported," The Age. [Online]. Available: http://www.theage.com.au/technology/technology-news/thousands-of-privacy-breaches-going-unreported-20110727-1hzes.html. [Accessed: Jan. 17, 2013].
38. California Civil Code § 1729.98(a) 2003.
39. S. A. Needles, "The Data Game: Learning to Love the State-Based Approach to Data Breach Notification Law," North Carolina Law Review, vol. 88, no. 1, pp. 267-272, December 2009.
40. F. Cate, M. Abrams, P. Bruening and O. Swindle (2009, Mar. 16). "Dos and Don'ts of Data Breach and Information Security Policy." [Online]. Available: http://www.huntonfiles.com/files/webupload/CIPL_Dos_and_Donts_White_Paper.pdf. [Accessed: Jan. 17, 2004].
41. M. Burdon, The Conceptual and Operational Compatibility of Data Breach Notification and Information Privacy Laws. Ph.D. dissertation, Faculty of Law, Queensland University of Technology, Brisbane, 2011.
42. P. M. Regan, "Federal Security Breach Notifications: Politics and Approaches," Berkeley Technology Law Journal, vol. 24, no. 3, pp. 1103-1126, June 2009.
43. J. K. Winn, "Are Better Security Breach Notification Laws Possible?," Berkeley Technology Law Journal, vol. 24, no. 3, pp. 1133-1166, June 2009.
44. Australia. Australian Government, Review of Australian Privacy Law. Sydney: Australian Law Reform Commission; 2007. [Online]. Available: http://www.austlii.edu.au/au/other/alrc/publications/dp/72/DP72.pdf. [Accessed: Jan. 19, 2013].
45. Privacy Act 1988 (Cth). Parliament of Australia.
46. Australia. Australian Government, Inquiry into Cyber Crime and its Impact on Australian Consumers. Sydney: Office of the Australian Information Commissioner; 2009. [Online]. Available: http://www.oaic.gov.au/images/documents/migrated/2009-08-05053022/HoR_Comms_Cte_Cyber_Crime.pdf. [Accessed: Jan. 18, 2013].
47. A. Langmaid (2011, Jan. 9). "Vodafone Mobile Records Leaked onto the Internet," Herald Sun. [Online]. Available: http://www.heraldsun.com.au/news/victoria/vodafone-mobile-records-leaked-onto-the-internet/story-e6frf7l6-1225984469970. [Accessed: Jan. 17, 2013].
48. Health Insurance Portability and Accountability Act of 1996, Pub. L. 104-191. United States of America.
49. Drivers Privacy Protection Act of 1994, 18 U.S.C. § 2721. United States of America.
50. Council of Europe (2011, Jul.). Convention on Cybercrime: Member States of the Council of Europe – Article 12. [Online]. Available: http://conventions.coe.int/Treaty/Commun/ChercheSig.asp?NT=185&CM=&DF=&CL=ENG. [Accessed: January 2014].
51. N. Robinson, H. Graux, D. A. Parrilli, L. Klautzer and L. Valeri, "Comparative Study on Legislative and Non Legislative Measures to Combat Identity Theft and Identity Related Crime: Final Report," RAND Europe. [Online]. Available: http://ec.europa.eu/dgs/home-affairs/e-library/documents/policies/organized-crime-and-human-trafficking/cybercrime/docs/rand_study_tr-982-ec_en.pdf. [Accessed: May 15, 2014].
52. Latvia Criminal Code s 177(1).
53. Romania Criminal Code Art. 215.
54. Finland Criminal Code s 2 of Ch. 33.
55. Denmark Criminal Code Art. 171.
56. California Civil Code § 1798.29(a) of 2003; Indiana Code § 24-4.9-3-1(1)(a). United States of America.
57. Australia. Parliament of Australia, Privacy Amendment (Privacy Alerts) Bill 2013. Canberra: Parliament of Australia; 2014. [Online]. Available: http://parlinfo.aph.gov.au/parlInfo/search/display/display.w3p;query=Id%3A%22legislation%2Fems%2Fr5059_ems_96334aed-bdfb-4d27-809d-8c6085ac7f40%22. [Accessed: Jun. 13, 2014].
58. B. Faulkner, "Hacking into Data Breach Notification Laws," Florida Law Review, vol. 59, no. 5, pp. 1097-1108, March 2007.
59. T. M. Lenard and P. H. Rubin, "An Economic Analysis of Notification Requirements for Data Security Breaches," Emory Law and Economics Research Paper No. 05-12. [Online]. Available: http://dx.doi.org/10.2139/ssrn.765845. [Accessed: Jan. 18, 2013].
60. Privacy and Electronic Communications (EC Directive) (Amendment) Regulations 2011, United Kingdom.
61. Directive on privacy and electronic communications (Directive 2002/58/EC).
62. T. M. Lenard and P. H. Rubin, "Much Ado about Notification," Regulation, vol. 29, no. 1, pp. 44-50, April 2006.
63. B. G. Arnold, "Losing It: Corporate Reporting on Data Theft," Privacy Law Bulletin, vol. 3, no. 8, pp. 101-102, March 2007.
64. M. Turner, Towards a Rational Personal Data Breach Notification Regime. PERC Information Policy Institute; Emory Law and Economics Research Paper No. 05-12. [Online]. Available: http://perc.net/files/downloads/data_breach.pdf. [Accessed: Jan. 18, 2013].
65. Australia. Australian Government, Australians better protected with mandatory data breach notification. Sydney: Office of the Australian Information Commissioner; 2013. [Online]. Available: http://www.oaic.gov.au/news-and-events/media-releases/privacy-media-releases/australians-better-protected-with-mandatory-data-breach-notification. [Accessed: Jan. 18, 2013].
66. Mondaq, "Australia: Mandatory data breach reporting bill introduced into parliament." [Online]. Available: http://www.mondaq.com/australia/x/247692/data+protection/Mandatory+Data+Breach+Reporting+Bill+Introduced+Into+Parliament. [Accessed: Jan. 18, 2013].
67. Privacy Amendment (Privacy Alerts) Bill 2013, Div. 2 s 6ZB(f)(g). Canberra: Parliament of Australia.
68. Privacy Amendment (Privacy Alerts) Bill 2013, Explanatory Memorandum. Canberra: Parliament of Australia.
69. Directive 2009/136/EC of the European Parliament and of the Council of 25 November 2009 amending Directive 2002/22/EC, Directive 2002/58/EC and Regulation (EC) No 2006/2004.
70. European Commission, "Proposal of the European Parliament and the Council on the protection of individuals with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation)." [Online]. Available: http://ec.europa.eu/justice/data-protection/document/review2012/com_2012_11_en.pdf. [Accessed: Jan. 18, 2013].
71. Australia. Australian Government, Privacy law reform. Sydney: Office of the Australian Information Commissioner; 2014. [Online]. Available: http://www.oaic.gov.au/privacy/privacy-act/privacy-law-reform#whatschanged. [Accessed: May 5, 2014].



Composite Heuristic Algorithm for Clustering Text Data Sets

Nikita Nikitinsky, Tamara Sokolova and Ekaterina Pshehotskaya
InfoWatch
[email protected], [email protected], [email protected]

ABSTRACT Document clustering has become a frequent task in business. Current topic modeling and clustering algorithms can handle this task, but there are some ways to improve the quality of cluster analysis, for example, by introducing some combined algorithms. In this paper, we will conduct some experiments to define the best clustering algorithm among LSI, LDA and LDA+GS combined with GMM and find heuristics to improve the performance of the best algorithm.

KEYWORDS Clustering, cluster analysis, topic modeling, LDA, LSI, GMM, Silhouette Coefficient

1 INTRODUCTION

One of the most frequent applications of clustering in business is exploratory data analysis during marketing research, for example a customer satisfaction survey. But there is more than one way to use clustering techniques: cluster analysis of document sets of up to 50,000 documents is a task that may be essential in business. It might be necessary to cluster, for example, the weekly document stream for DLP (Data Leakage Prevention) purposes (e.g. easier categorization of documents). To cluster small sets of documents we primarily need high clustering quality and may pay little attention to the speed or computational complexity of a clustering algorithm – obviously because modern computer hardware allows the user to perform complex computations in a short time, so small data sets are clustered quickly even if an algorithm with high computational complexity is used. That is why we decided to conduct some experiments on algorithms with high

computational complexity in order to combine them in a way that allows us to maximize the quality of clustering.

2 METHODS

Cluster analysis or clustering is a convenient method for identifying homogeneous groups of objects called clusters. Objects in a specific cluster share many characteristics, but are very dissimilar to objects not belonging to that cluster. Further in this paper we will discuss clustering algorithms where every object can belong only to one cluster, including cases where an object may belong to no cluster at all. In such cases we will create a so-called «garbage» cluster and put there all objects not classified by an algorithm. We will use the following clustering and topic modeling algorithms to create a combination showing the highest performance:

LSI (Latent Semantic Indexing) is an unsupervised machine learning method which is mostly used for dimensionality reduction. It is an indexing and retrieval method that uses a mathematical technique called singular value decomposition (SVD) to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts. [1]

LDA (Latent Dirichlet Allocation) is also an unsupervised machine learning method, which is mostly used for object clustering. It is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics. Computational complexity of LDA+GS is O(NKW) where N is the number of documents, K is the number of clusters and W is the number of words in the vocabulary. [2]

Although the methods mentioned above can be used alone, we will conduct experiments in which we combine them with the following algorithms:

GMM Classifier (Gaussian Mixture Model), which is an unsupervised machine learning method, is a probabilistic model that assumes all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. Gaussian mixture models are often used for data clustering. Clusters are assigned by selecting the component that maximizes the posterior probability. Like k-means clustering, Gaussian mixture modeling uses an iterative algorithm that converges to a local optimum. Gaussian mixture modeling may be more appropriate than k-means clustering when clusters have different sizes and correlation within them. Clustering using Gaussian mixture models is sometimes considered a soft clustering method. The posterior probabilities for each point indicate that each data point has some probability of belonging to each cluster. Gaussian mixture distributions can be used for clustering data by realizing that the multivariate normal components of the fitted model can represent clusters. Computational complexity of GMM (using the EM algorithm for convergence) is O(tkmn^3) where k is the number of clusters, n is the number of dimensions in a sample, m is the number of samples and t is the number of iterations. [3] Our choice fell on GMM because we considered it more robust and faster for cluster analysis compared to, for example, k-means. When applying GMM we arrange every object only to one cluster (thus, we make it

easier to estimate overall performance). GS (Gibbs Sampling) is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximated from a specified multivariate probability distribution, when direct sampling is difficult. GS is widely used to enhance quality of topic modeling algorithms; it is a good algorithm for processing when the dimension of data is not very high. With high dimensional data it may be better to use Variational EM algorithm. [4] In our experiments we applied faster version of GS algorithm named Collapsed Gibbs Sampling algorithm. 3 EVALUATION METRICS To evaluate algorithm performance we used two types of metrics often utilized for cluster analysis purposes: 3.1

External Evaluation Metrics

In external evaluation, clustering results are evaluated based on data that was not used for clustering, such as known class labels and external benchmarks. Such benchmarks consist of a set of pre-classified items, and these sets are often created by humans (experts). Thus, the benchmark sets can be thought of as a gold standard for evaluation. [5] We used the following external measurements:

Jaccard index – also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets [6]:

J(A, B) = |A ∩ B| / |A ∪ B|    (1)

V-measure score – is an entropy-based measure which explicitly measures how successfully the criteria of homogeneity and completeness have been satisfied. V-measure is computed as the harmonic mean of distinct homogeneity and completeness scores, just as precision and recall are commonly combined into F-measure:

V = 2 · (H · C) / (H + C)    (2)

where H is homogeneity and C is completeness.

This metric is independent of the absolute values of the labels: a permutation of the class or cluster label values won't change the score value in any way. [7] Adjusted Rand score - is a measure of the similarity between two data clusterings. [8] Adjusted mutual information score - a variation of mutual information (which is a measure of the variables' mutual dependence) may be used for comparing clusterings. [9][10]
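For illustration, the external measures above can be computed with scikit-learn. The snippet below is a minimal sketch (not taken from the paper): the gold-standard and predicted label lists are hypothetical, and the Jaccard index is implemented in its pair-counting form commonly used for comparing clusterings.

```python
# Minimal sketch (not from the paper): external evaluation of a clustering,
# given ground-truth labels and predicted cluster labels.
from sklearn.metrics import (v_measure_score, adjusted_rand_score,
                             adjusted_mutual_info_score)

def pairwise_jaccard(true_labels, pred_labels):
    """Pair-counting Jaccard index over pairs of items placed in the same group."""
    n = len(true_labels)
    same_true = {(i, j) for i in range(n) for j in range(i + 1, n)
                 if true_labels[i] == true_labels[j]}
    same_pred = {(i, j) for i in range(n) for j in range(i + 1, n)
                 if pred_labels[i] == pred_labels[j]}
    union = same_true | same_pred
    return len(same_true & same_pred) / len(union) if union else 1.0

true_labels = [0, 0, 1, 1, 2, 2]   # hypothetical gold standard
pred_labels = [1, 1, 0, 0, 2, 2]   # hypothetical clustering result

print("Jaccard:", pairwise_jaccard(true_labels, pred_labels))
print("V-measure:", v_measure_score(true_labels, pred_labels))
print("Adjusted Rand:", adjusted_rand_score(true_labels, pred_labels))
print("Adjusted mutual information:", adjusted_mutual_info_score(true_labels, pred_labels))
```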

3.2 Internal Evaluation Metrics

In internal evaluation the clustering result is evaluated based on the data that was clustered itself. These methods usually assign the best score to the algorithm that produces clusters with high similarity within a cluster and low similarity between clusters. [11]

We used the following internal measurement:

Silhouette Coefficient — is a measure of how appropriately the data has been clustered and how well each object lies within its cluster. The Silhouette Coefficient is defined for each sample and is composed of two scores: 1. The mean distance between a sample and all other points in the same class. 2. The mean distance between a sample and all other points in the next nearest cluster. This can be written as:

s(i) = (b(i) − a(i)) / max(a(i), b(i))    (3)

where i is the sample, a(i) is the average dissimilarity of i with all other data within the same cluster (i.e. the mean distance between a sample and all other points in the same class), and b(i) is the lowest average dissimilarity of i to any other cluster of which i is not a member (i.e. the mean distance between a sample and all other points in the next nearest cluster).

We used the cosine metric as the most common for measuring the distances for the Silhouette Coefficient. When we have a higher value of the Silhouette Coefficient, it means that we have a better distribution of documents to topics. [12] Based on the Silhouette Coefficient measurements we apply the Elbow method to define the number of clusters. This method assumes a choice of a number of clusters such that adding another cluster doesn't give much better modeling of the data (the so-called "knee of a curve"). This method was originally designed to make predictions based on the percentage of variance explained and in some cases may appear unsuitable; in such cases we will choose the number of clusters where the Silhouette Coefficient reaches its maximum value. [13]
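The following is a minimal sketch (not from the paper) of this selection rule: it assumes a dense documents-topic matrix X, fits a GMM for each candidate number of clusters, scores the partition with the cosine-distance Silhouette Coefficient, and keeps the best-scoring value.

```python
# Minimal sketch: choose the number of clusters by the cosine-distance
# Silhouette Coefficient, as described above. X is assumed to be a dense
# documents-topic matrix (e.g. topic proportions per document).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

def best_k_by_silhouette(X, k_range=range(2, 16), random_state=0):
    scores = {}
    for k in k_range:
        gmm = GaussianMixture(n_components=k, random_state=random_state).fit(X)
        labels = gmm.predict(X)
        if len(set(labels)) < 2:          # degenerate solution, skip
            continue
        scores[k] = silhouette_score(X, labels, metric="cosine")
    best_k = max(scores, key=scores.get)  # simplest rule: maximum silhouette
    return best_k, scores

X = np.random.rand(200, 6)                # stand-in for a documents-topic matrix
k, scores = best_k_by_silhouette(X)
print(k, scores)
```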

Since, in real conditions, we are unable to use external metrics for the evaluation of algorithms (because we usually don't know the true number of clusters), we will evaluate the quality of our models based mostly on the Silhouette Coefficient, applying external metrics as supplementary. The main external metric will be the V-measure score as the most appropriate, because it is based upon two main criteria for clustering usefulness, homogeneity and completeness, which capture a clustering solution's success in including all and only data-points from a given class in a given cluster. For some cases we will utilize the Jaccard index to let the reader better understand the situation.

4 DATA SETS

We used several different data sets to check and validate the results:
1. Data set containing 600 documents, distributed to 5 topics – a «good» collection (distribution of documents: 83 to 163 documents per topic). Topics are easily distinguishable by a human expert.
2. Data set containing 157 documents, distributed to 14 topics – a «bad» collection (distribution of documents: 3 to 21 documents per topic). Topics are not distinguishable by a human expert.
3. Data set containing 1000 documents, randomly assigned from the real document stream of the company; topic distribution is not predetermined; human experts considered the number of topics to be between 3 and 5 (inclusive).
4. Data set containing 35000 documents, randomly assigned from the real document stream of the company; topic distribution is not predetermined. Human experts then estimated the quality of the best algorithm's performance on this data set.

5 EXPERIMENTS

We tested all these algorithms on the «good» collection to find out the best one and then evaluated the best algorithm's performance on the other collections.

5.1 Choosing the Best Algorithm

5.1.1 LSI+GMM

Data preprocessing: all words with length less than 3 symbols were deleted, as well as all non-alphabetic characters. To obtain better results we preprocessed the input data with the TF-IDF algorithm. In this algorithm we may vary two main parameters: the number of LSI topics and the number of GMM clusters. The LSI algorithm takes the collection of documents as input, processes it, and returns a documents-topic matrix. This matrix is then given to the input of the GMM classifier, which processes the input matrix, assembling documents into final categories (this is likely to increase the quality of clustering); a sketch of this pipeline is given below. We tested two heuristics:
1. The number of LSI topics is equal to the number of output GMM clusters.
2. The number of LSI topics is equal to the number of output GMM clusters plus one, i.e. the number of LSI topics is n+1 while the number of GMM clusters is n (one of the topics becomes a so-called «garbage» topic — it accumulates objects which could not be unambiguously arranged into other «real» topics).
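A minimal sketch of this LSI+GMM pipeline is shown below, using scikit-learn's TfidfVectorizer, TruncatedSVD (as the LSI step) and GaussianMixture; the toy documents, token pattern and parameter values are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the LSI+GMM pipeline described above (Heuristic 2:
# number of LSI topics = number of GMM clusters + 1). Documents are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.mixture import GaussianMixture

docs = [
    "service contract between supplier and customer",
    "contract amendment for consulting services",
    "invoice for payment of outstanding balance",
    "invoice and payment reminder for last month",
    "application form for a new bank account",
    "completed application form with customer details",
]
n_clusters = 3
n_topics = n_clusters + 1                            # Heuristic 2: one extra "garbage" topic

# TF-IDF weighting; tokens shorter than 3 characters and non-alphabetic tokens are dropped
tfidf = TfidfVectorizer(token_pattern=r"(?u)\b[^\W\d_]{3,}\b")
X = tfidf.fit_transform(docs)

lsi = TruncatedSVD(n_components=n_topics, random_state=0)   # LSI via truncated SVD
doc_topic = lsi.fit_transform(X)                            # documents-topic matrix

gmm = GaussianMixture(n_components=n_clusters, covariance_type="diag",
                      random_state=0).fit(doc_topic)
print(gmm.predict(doc_topic))                               # final cluster assignments
```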

Table 1 contains evaluation metrics estimated on the «good» collection for the LSI+GMM algorithm with 5 output categories:

Table 1.

Metric | Heuristic 1 | Heuristic 2
Jaccard index | 0.575 | 0.57
Adjusted mutual information score | 0.75 | 0.735
Adjusted Rand score | 0.66 | 0.66
V-measure score | 0.74 | 0.74
Silhouette Coefficient | 0.61 | 0.5

We can see that both heuristics showed comparable results when tested on the real number of categories; Heuristic 2 showed a decrease in the Silhouette Coefficient value. But, more generally, if we vary the number of output categories and estimate the Silhouette Coefficient and V-measure for them, we will get the following results (Figures 1, 2, where the green (upper) line is V-measure and the blue (lower) line is the Silhouette score):


Figure 1. LSI+GMM, Heuristic 1

Figure 2. LSI+GMM, Heuristic 2

According to the results, the Silhouette Coefficient reached higher levels when we implemented Heuristic 2 (Figure 2). Nevertheless, both peaks indicated an incorrect number of output clusters (6 and 8 respectively); V-measure showed comparable and also incorrect results (peaks at 6 and 8 respectively).

5.1.2 LDA+GMM

Data preprocessing: all words with length less than 3 symbols were deleted, as well as all non-alphabetic characters; words occurring only once (hapax legomena) were deleted.

The LDA algorithm takes the collection of documents as input, processes it, and returns a documents-topic matrix. This matrix is then given to the input of the GMM classifier, which processes the input matrix, assembling documents into final clusters (this must increase the quality of clustering). We tested the same two heuristics. Table 2 contains metrics estimated on the «good» collection for the LDA+GMM algorithm with 5 output categories:

Table 2.

Metric | Heuristic 1 | Heuristic 2
Jaccard index | 0.51 | 0.85
Adjusted mutual information score | 0.57 | 0.83
Adjusted Rand score | 0.53 | 0.76
V-measure score | 0.6 | 0.84
Silhouette Coefficient | 0.45 | 0.52

We can see that Heuristic 2 showed far better results for the external metrics, but only an insignificantly better result for the Silhouette Coefficient. If we vary the number of output categories and estimate both the Silhouette Coefficient score and the V-measure score for them, we will get the following results (Figures 3, 4, where the green (upper) line is V-measure and the blue (lower) line is the Silhouette score):

Figure 3. LDA+GMM, Heuristic 1



Figure 4. LDA+GMM, Heuristic 2

According to the results, the Silhouette Coefficient reached slightly higher levels when we implemented Heuristic 2 (Figure 4). Nevertheless, both peaks indicated an incorrect number of output clusters (4 and 7 respectively). V-measure, on the contrary, reached higher levels in Figure 3, however indicating a wrong number of categories (7 clusters), while in Figure 4 this measure correctly identified the number of clusters (5 clusters), although reaching lower levels.

5.1.3 LDA+GS+GMM

Data preprocessing: all words with length less than 3 symbols were deleted, as well as all non-alphabetic characters; words occurring only once (hapax legomena) were deleted.

In this algorithm we may vary three main parameters: the number of LDA topics, the number of Gibbs Samples and the number of GMM clusters. For a given quantity of LDA topics there are n iterations of Gibbs Sampling (where n is the number of Gibbs Samples), and then a documents-topic matrix is returned. This matrix is then given to the input of the GMM classifier, which processes the input matrix, assembling documents into final clusters.

Choosing a proper number of Gibbs Samples: knowing the real quantity of output categories, we iteratively start the algorithm, changing the number of samples and keeping the other parameters the same. The best number of Gibbs Samples is considered to be the number of samples at which the metric (e.g. the Silhouette Coefficient) reaches its highest values and then doesn't fluctuate much.

Figure 5.

Figure 6.

We selected the best number of GS samples on the «good» collection. The unchanged parameters were the number of LDA topics and the number of GMM clusters (as in Heuristic 2). As we can see from the picture (Figure 5), the plotted line reaches its highest values at 50 samples and then doesn't fluctuate much, so we assume that we can choose any quantity of samples above 50. Figure 6 verifies this assumption: the plotted lines of the Jaccard index and V-measure also reach their highest levels and don't fluctuate much at approximately 50 samples. Thus we will then use 100 samples as the optimal

and versatile number of samples. We tested the same two heuristics. Table 3 contains metrics estimated on the «good» collection for LDA+GS+GMM algorithm with 5 output categories:

Table 3.

Metric | Heuristic 1 | Heuristic 2
Jaccard index | 0.66 | 0.99
Adjusted mutual information score | 0.77 | 0.99
Adjusted Rand score | 0.72 | 0.99
V-measure score | 0.79 | 0.99
Silhouette Coefficient | 0.82 | 0.98

We can see that Heuristic 2 showed far better results for all metrics. It means that documents are better distributed to the said number of output categories with Heuristic 2 implemented for this algorithm. If we vary the number of output categories and estimate both the Silhouette Coefficient score and the V-measure score for them, we will get the following results (Figure 7, where the green (upper) line is V-measure and the blue (lower) line is the Silhouette score, and Figure 8, where the green (lower) line is V-measure and the blue (upper) line is the Silhouette score):

Figure 7. LDA+GS+GMM, Heuristic 1

Figure 8. LDA+GS+GMM, Heuristic 2

According to the results, while both peaks indicated the same true number of clusters, the Silhouette Coefficient reached higher levels when we implemented Heuristic 2 (Figure 8). V-measure also reached higher levels in Figure 7 while indicating the true number of categories in both figures. We can suggest that Heuristic 2 improves the performance of LDA+GS+GMM and intensifies the results, making it easier to determine the number of output categories.

5.2 Estimating the Best Algorithm on Other Data Sets

We tested the LDA+GS+GMM algorithm on the other collections using the parameters that we considered the best when testing the algorithm on the «good» collection:
- the number of GS samples is equal to 100;
- the number of LDA topics is equal to the number of GMM clusters plus one (e.g. while the number of GMM clusters is 5, the number of LDA topics is 6).

Data preprocessing: all words with length less than 3 symbols were deleted, as well as all non-alphabetic characters; words occurring only once (hapax legomena) were deleted.

5.2.1 Data Set №2

We tested the LDA+GS+GMM algorithm on the «bad» collection, estimated the Silhouette Coefficient score, V-measure score and Jaccard coefficient on it, and obtained the following results (see the Silhouette score in Figure 9, and the Jaccard score and V-measure score in Figure 10, where the green (upper) line is the V-measure score and the blue (lower) line is the Jaccard coefficient):

Figure 9.

Figure 10.

Assuming that we selected the optimal parameters and using the Elbow method based on the Silhouette Coefficient plot, we found it impossible to define (even approximately) the best number of output categories. The Jaccard coefficient and V-measure score also showed contradictory results. We obtained such results because of two main factors:
1. The distribution of documents to topics is conventional (in such cases there is either not much difference in vocabulary between documents of different categories, or the difference between all documents is too high to group at least some of them into one definite cluster).
2. The number of documents is small. For a collection with such a conventional distribution of documents to topics, a decently large number of documents is needed. But for a collection where the difference in vocabulary between documents of different categories is significant, this is likely not to be an issue. For example, a group of articles about cars and another group of articles about vegetables will be easily clustered even if each group contains fewer than 50 items.

5.2.2 Data Set №3

We tested the LDA+GS+GMM algorithm on data set №3 containing 1000 documents and obtained the following results (Figure 11):

Figure 11.

Since we can't use external evaluation metrics to estimate the quality of clustering on this data set (because we don't know the true number of clusters), here we will utilize only the Silhouette Coefficient. Based on the Silhouette Coefficient plot, we decided that 4 categories is the best number of clusters for this data set. Human experts considered the result of the algorithm good. Documents in the four categories could easily be defined as contracts, financial documents, application forms, and information letters + instructions.

5.2.3 Data Set №4



Figure 12.

Since we can't use external evaluation metrics to estimate the quality of clustering on this data set (because we don't know the true number of clusters), here we will utilize only the Silhouette Coefficient. Based on the Silhouette Coefficient plot (Figure 12), we decided that 8 categories was the best quantity for this data set. Human experts defined the documents in the 8 categories as contracts, financial documents, documents in other languages, information letters, instructions, application forms and other internal documents.

6 AUTOMATING NUMBER OF CLUSTERS DETECTION

The question about the most versatile algorithm to define the proper number of clusters is still open-ended, but there are some general methods that may help to find the so-called knee of a curve. They are:
1. The largest magnitude difference between two points.
2. The largest ratio difference between two points.
3. The first data point with a second derivative above some threshold value.
4. The data point with the largest second derivative.
5. The point on the curve that is furthest from a line fitted to the entire curve.
This list is ordered from the methods that make a decision about the knee locally, to the

methods that locate the knee globally by considering more points of the curve. The first two methods use only single pairs of adjacent points to determine where the knee is located. The third and fourth methods use more than one pair of points, but still only consider local trends in the graph. The last method considers all data points at the same time. Local methods may work well for smooth, monotonically increasing/decreasing curves. However, they are very sensitive to outliers and local trends, which may not be globally significant. The fifth method takes every point into account, but only works well for continuous functions, and not for curves where the knee is a sharp jump. [14] When we use other evaluation metrics – not the Silhouette Coefficient score – these simple number-of-clusters detection methods may help. Nevertheless, in most cases it may be enough simply to choose the number of clusters where the Silhouette Coefficient score reaches its highest value.
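A minimal sketch of method 5 (not taken from [14]) is given below: the knee is taken to be the point on the score curve furthest from the straight line joining its endpoints; the example curve values are hypothetical.

```python
# Minimal sketch of knee detection as "the point furthest from a line fitted
# to the entire curve" (here, the line joining the curve's endpoints).
import numpy as np

def knee_point(ks, scores):
    ks, scores = np.asarray(ks, float), np.asarray(scores, float)
    p1, p2 = np.array([ks[0], scores[0]]), np.array([ks[-1], scores[-1]])
    line = (p2 - p1) / np.linalg.norm(p2 - p1)       # unit vector along the endpoints line
    rel = np.column_stack([ks, scores]) - p1
    proj = np.outer(rel @ line, line)                # projection of each point onto the line
    dist = np.linalg.norm(rel - proj, axis=1)        # perpendicular distance to the line
    return int(ks[int(np.argmax(dist))])

ks = [2, 3, 4, 5, 6, 7, 8]
scores = [0.21, 0.45, 0.62, 0.66, 0.67, 0.68, 0.68]  # hypothetical silhouette values
print(knee_point(ks, scores))                        # -> 4 for this toy curve
```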

7 CONCLUSION

According to the experiments we conducted, the best algorithm for processing a relatively small set of documents (up to 50,000) with a relatively small quantity of topics (up to 20) is LDA+GS+GMM. Heuristic 2 may help to improve the quality of LDA+GS+GMM and make it easier to determine the number of output categories; a sketch of this combination is given below. Usage of the Silhouette Coefficient is considered appropriate for determining the best number of output clusters. The data set should not be too small, in order to provide the clustering algorithm with processable data: data sets containing fewer than 500 documents are likely to be incorrectly classified, provided the data set contains documents with little difference in vocabulary between them. In cases when we have a data set with significant differences in vocabulary between its items, this is not an issue.
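The sketch below illustrates the winning combination under Heuristic 2 using scikit-learn; note that scikit-learn's LatentDirichletAllocation uses variational inference rather than the collapsed Gibbs sampling used in the paper, and the vectorizer settings only approximate the preprocessing described above.

```python
# Minimal sketch of the winning combination under Heuristic 2 (LDA topics =
# GMM clusters + 1). The paper's LDA uses collapsed Gibbs sampling; scikit-learn's
# LatentDirichletAllocation uses variational inference, so this only approximates
# the LDA+GS+GMM pipeline.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.mixture import GaussianMixture

def cluster_documents(docs, n_clusters):
    # min_df=2 drops terms seen in fewer than 2 documents (approximates
    # removing hapax legomena); short and non-alphabetic tokens are excluded.
    counts = CountVectorizer(token_pattern=r"(?u)\b[^\W\d_]{3,}\b",
                             min_df=2).fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_clusters + 1,   # Heuristic 2
                                    max_iter=100, random_state=0)
    doc_topic = lda.fit_transform(counts)          # documents-topic matrix
    gmm = GaussianMixture(n_components=n_clusters, random_state=0).fit(doc_topic)
    return gmm.predict(doc_topic)

# Usage: labels = cluster_documents(list_of_document_strings, n_clusters=5)
```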

8 FURTHER READING

There are some papers on automated number of clusters detection algorithms, such as [14],


proposing state-of-the-art algorithms that may be useful for cluster analysis. The Hierarchical Dirichlet Process (HDP) is also a generative probabilistic topic modeling algorithm for text clustering, showing performance comparable to the Latent Dirichlet Allocation topic modeling algorithm [15]. Although Latent Dirichlet Allocation works well for topic modeling, multiple studies are now being conducted on more advanced topic modeling algorithms such as Higher-order Latent Dirichlet Allocation and other higher-order topic modeling algorithms [16]. For processing large collections of documents, different algorithms will be helpful, where the data is partitioned across separate processors and inference is done in a parallel, distributed fashion. These algorithms are Approximate Distributed Latent Dirichlet Allocation (AD-LDA), Hierarchical Distributed Latent Dirichlet Allocation (HD-LDA) and Approximate Distributed Hierarchical Dirichlet Processes (AD-HDP). The easiest to implement among these three is AD-LDA, but it has no formal convergence guarantee. HD-LDA is more complicated than AD-LDA, but it inherits the usual convergence properties of Markov chain Monte Carlo (MCMC). The AD-HDP algorithm follows the same approach as AD-LDA, but with an additional step to merge newly instantiated topics. [17]

9 REFERENCES

[1] Deerwester, S., et al., "Improving Information Retrieval with Latent Semantic Indexing," Proceedings of the 51st Annual Meeting of the American Society for Information Science 25, 1988, pp. 36–40.
[2] Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (January 2003). "Latent Dirichlet allocation." In Lafferty, John. Journal of Machine Learning Research 3 (4–5): pp. 993–1022.
[3] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer, Berlin.
[4] Casella, George; George, Edward I. (1992). "Explaining the Gibbs sampler." The American Statistician 46 (3): 167–174.
[5] Kaufman, L.; Rousseeuw, P. J. (2005). Finding Groups in Data. An Introduction to Cluster Analysis. Wiley, Hoboken, NY.
[6] Tan, Pang-Ning; Steinbach, Michael; Kumar, Vipin (2005). Introduction to Data Mining.
[7] Rosenberg, Andrew and Hirschberg, Julia. V-Measure: A conditional entropy-based external cluster evaluation measure. Columbia University, New York.
[8] Rand, W. M. (1971). "Objective criteria for the evaluation of clustering methods." Journal of the American Statistical Association (American Statistical Association) 66 (336): 846–850.
[9] Meila, M. (2007). "Comparing clusterings—an information based distance." Journal of Multivariate Analysis 98 (5): 873–895.
[10] Vinh, N. X.; Epps, J.; Bailey, J. (2009). "Information theoretic measures for clusterings comparison." Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09. p. 1.
[11] Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich. Introduction to Information Retrieval. Cambridge University Press.
[12] Rousseeuw, Peter J. (1987). "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis." Computational and Applied Mathematics 20: 53–65.
[13] Ketchen, David J., Jr; Shook, Christopher L. (1996). "The application of cluster analysis in Strategic Management Research: An analysis and critique." Strategic Management Journal 17 (6): 441–458.
[14] Salvador, Stan and Chan, Philip. Determining the Number of Clusters/Segments in Hierarchical Clustering/Segmentation Algorithms. Dept. of Computer Sciences, Florida Institute of Technology, Melbourne.
[15] Teh, Yee Whye; Jordan, Michael I.; Beal, Matthew J.; Blei, David M. "Hierarchical Dirichlet Processes," Journal of the American Statistical Association, Vol. 101, No. 476, Dec. 2006, pp. 1566-1581.
[16] Nelson, Christie; Pottenger, William M.; Keiler, Hannah; Grinberg, Nir. "Nuclear Detection Using Higher-Order Topic Modeling." 2012 IEEE International Conference on Technologies for Homeland Security. Waltham, MA. 13-15 Nov 2012.
[17] Newman, David; Asuncion, Arthur; Smyth, Padhraic; Welling, Max. "Distributed Algorithms for Topic Models," Journal of Machine Learning Research 10 (2009), pp. 1801-1828.



Enhanced Mobile Authentication Techniques

Zakaria Zakaria Hassan, Communication Engineering Department, Higher Technological Institute, Cairo, Egypt, [email protected]
Talaat A. Elgarf, Communication Engineering Department, Higher Technological Institute, Cairo, Egypt, [email protected]
Abdelhalim Zekry, Communication Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt, [email protected]

Abstract—The Milenage algorithm applies the block cipher Rijndael (AES) with a 128-bit key and 128-bit block size. This algorithm is used in the 3GPP authentication and key generation functions (f1, f1*, f2, f3, f4, f5 and f5*) for mobile communication systems (GSM/UMTS/LTE). In this paper a modification of the Milenage algorithm is proposed through a dynamic change of the S-box in AES depending on the secret key. To get a new secret key for every authentication process we add the random number (RAND) transmitted from the authentication center (AUC) to the contents of the fixed stored secret key (Ki), and thus the initialization of the AES will be different for each new authentication process. For every change in the secret key a new S-box is derived from the standard one by permuting its rows and columns with the help of a newly designed PN sequence generator. A complete simulation of the modified Milenage and the PN sequence generator is done using a microcontroller (PIC18F452). Security analysis is applied using the Avalanche test to compare the original and modified Milenage. Tests proved that the modified algorithm is more secure than the original one due to the dynamic behavior of the S-box with every change of the secret key and its immunity against linear and differential cryptanalysis, as shown by the Avalanche tests. This makes the modified Milenage more suitable for the applications of authentication techniques, especially for mobile communication systems.

Keywords—Authentication vector (AV), Modified MILENAGE Algorithm for AKA Functions (F1, F1*, F2, F3, F4, F5, F5*), AES, Dynamic S-BOX and PN Sequence Generator (LFSR).

I. INTRODUCTION

Authentication covers the authenticity of the subscriber as well as of the network. Authentication of mobile subscribers and network operators is a challenge for future researchers due to increasing security threats and attacks together with the growing volume of wireless traffic. Authentication schemes in mobile communication systems are initiated during international mobile subscriber identity attach,


location registration, location update with serving network change, call setup, activation of connectionless supplementary services and short message services (SMS). The Milenage algorithm is used for generating the authentication and key agreement cryptographic functions (MAC, XRES, CK and IK). The main core of the Milenage algorithm is the Advanced Encryption Standard (AES) [1], which was launched as a symmetric cryptographic standard algorithm by the National Institute of Standards and Technology (NIST) in October 2000, after a four-year effort to replace the aging DES. The Rijndael proposal for AES defined a cipher in which the key length can be independently specified to be 128, 192 or 256 bits, but the input and output block length is 128 bits [2], [3]. Four different stages are used in AES: SubByte transformation, ShiftRows, MixColumns and AddRoundKey. For both encryption and decryption, the cipher begins with an AddRoundKey stage, followed by nine rounds that each include all four stages, followed by a tenth round of three stages [4].

This paper is organized as follows: In Section II, authentication schemes in mobile communications are described. In Section III, a proposed authentication scheme is presented depending on the dynamic change of the S-box in AES, the new secret key for every authentication process and the new PN sequence generator. In Section IV, a complete simulation of the modified Milenage algorithm and the Avalanche test results are introduced. Discussions and conclusions are presented in Section V.



II. AUTHENTICATION SCHEMES IN MOBILE COMMUNICATIONS

(i) Global System for Mobile Communication (GSM) / General Packet Radio Service (GPRS) Authentication and Key Agreement vectors. There exists a permanent, shared secret key Ki for each subscriber. This permanent key is stored in two locations: in the subscriber's SIM card and in the Authentication Centre (AuC). The key Ki is never moved from either of these two locations. Authentication of the subscriber is done by checking that the subscriber has access to Ki. This can be achieved by challenging the subscriber by sending a 128-bit random challenge (RAND) to the terminal. The terminal has to respond by computing a one-way function with inputs of RAND and the key Ki, and returning the 32-bit output Signed Response (SRES) to the network. Inside the terminal, the computation of this one-way function, denoted by A3, happens in the Subscriber Identity Module (SIM) card. During the authentication procedure, a temporary session key Kc is generated as the output of another one-way function, A8. The input parameters for A8 are the same as for A3: Ki and RAND. The session key Kc is subsequently used to encrypt communication on the radio interface. The serving network does not have direct access to the permanent key Ki, so it cannot perform the authentication alone. Instead, all relevant parameters, the so-called authentication triplet (RAND, SRES and Kc), are sent from the authentication centre (AuC) to the serving network element Mobile Switching Centre/Visitor Location Register (MSC/VLR), or Serving GPRS Support Node (SGSN) in the case of General Packet Radio Service (GPRS) [5], [6].
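The sketch below only illustrates how the (RAND, SRES, Kc) triplet is assembled and sized; A3 and A8 are operator-chosen algorithms (e.g. COMP128 or a Milenage-based variant), so a keyed hash is used here purely as a stand-in, not as the real algorithms.

```python
# Illustrative sketch only: a keyed hash stands in for the operator-specific
# A3/A8 algorithms, purely to show how the GSM (RAND, SRES, Kc) triplet is built.
import hmac, hashlib, os

def gsm_triplet(ki, rand=None):
    rand = rand or os.urandom(16)                         # 128-bit random challenge
    digest = hmac.new(ki, rand, hashlib.sha256).digest()  # placeholder for A3/A8
    sres = digest[:4]                                     # 32-bit Signed Response (A3)
    kc = digest[4:12]                                     # 64-bit session key Kc (A8)
    return rand, sres, kc

rand, sres, kc = gsm_triplet(os.urandom(16))
print(sres.hex(), kc.hex())
```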

(ii) Universal Mobile Telecommunications System (UMTS) / Long Term Evolution (LTE) / Advanced LTE Authentication and Key Agreement Vectors.

Generation of authentication vectors (quintets) in the Authentication Centre (AuC): upon receipt of the authentication data request from the Visitor Location Register (VLR) / Serving GPRS Support Node (SGSN), the Home Location Register (HLR) / Authentication Centre (AuC) sends an authentication response back to the VLR/SGSN that contains an ordered array of n authentication vectors AV(1...n). The HLR/AuC starts by generating a fresh sequence number SQN and an unpredictable challenge

RAND. The authentication vectors are ordered based on the sequence number [5]. There are eight cryptographic functions used in UMTS/LTE/Advanced LTE Authentication and Key Agreement to generate the authentication vector (AV). f0 is the random challenge-generating function; it should be a pseudo-random number-generating function that maps the internal state of the generator to the challenge value RAND, the length of RAND being 128 bits. f1 is the network authentication function; f1* is the re-synchronization message authentication function, used to provide data origin authentication for synchronization failure information sent by the USIM to the AuC; f2 is the user authentication function; f3 is the cipher key derivation function; f4 is the integrity key derivation function; f5 is the anonymity key derivation function for normal operation; and f5* is the anonymity key derivation function for resynchronization, used only to provide user identity confidentiality during resynchronization. K is the subscriber authentication key stored in the USIM and at the AuC; the length of K is 128 bits [5], [7], [8]. To generate an authentication quintuple, the HLR/AuC computes a message authentication code for authentication MAC-A = f1k(SQN || RAND || AMF), the length of MAC-A being 64 bits; an expected response XRES = f2k(RAND), the length of XRES being 64 bits; a cipher key CK = f3k(RAND), the length of CK being 128 bits; an integrity key IK = f4k(RAND), the length of IK being 128 bits; and an anonymity key AK = f5k(RAND), the length of AK being 48 bits, which is used to conceal the 48-bit sequence number SQN as SQN ⊕ AK. The HLR/AuC aggregates the authentication token AUTN = SQN [⊕ AK] || AMF (16 bits) || MAC-A, the length of AUTN being 128 bits, which forms the quintet Q = AV = (RAND, XRES, CK, IK, AUTN) [7], [8], [9].
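The following sketch shows the field widths and the assembly of AUTN and the quintet; the real f1–f5 are the AES-based MILENAGE functions, so the keyed-hash prf below is only a placeholder for illustration.

```python
# Illustrative sketch only: a keyed hash (prf) stands in for the AES-based
# MILENAGE functions f1-f5, to show the field widths and the assembly of the
# AUTN token and the quintet (RAND, XRES, CK, IK, AUTN).
import hmac, hashlib, os

def prf(k, label, data, nbytes):
    return hmac.new(k, label + data, hashlib.sha256).digest()[:nbytes]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def generate_quintet(k, sqn, amf):
    rand = os.urandom(16)                          # 128-bit challenge
    mac_a = prf(k, b"f1", sqn + rand + amf, 8)     # 64-bit MAC-A
    xres  = prf(k, b"f2", rand, 8)                 # 64-bit XRES
    ck    = prf(k, b"f3", rand, 16)                # 128-bit cipher key
    ik    = prf(k, b"f4", rand, 16)                # 128-bit integrity key
    ak    = prf(k, b"f5", rand, 6)                 # 48-bit anonymity key
    autn  = xor(sqn, ak) + amf + mac_a             # 6 + 2 + 8 bytes = 128 bits
    return rand, xres, ck, ik, autn

rand, xres, ck, ik, autn = generate_quintet(os.urandom(16), sqn=os.urandom(6), amf=b"\x80\x00")
```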


Authentication and key derivation in the Universal Subscriber Identity Module (USIM). Upon receipt of (RAND, AUTN), the USIM computes the anonymity key AK = f5K(RAND), retrieves the unconcealed sequence number SQN = (SQN ⊕ AK) ⊕ AK, and computes XMAC-A = f1K(SQN || RAND || AMF), the response RES = f2K(RAND), the cipher key CK = f3K(RAND) and the integrity key IK = f4K(RAND), as shown in Fig. 1 [5], [6].
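The USIM-side check mirrors the AuC computation. The sketch below reuses the same placeholder helpers as the previous sketch (again, not the real MILENAGE functions): it un-conceals SQN, recomputes XMAC and compares it against the MAC carried in AUTN.

```python
# USIM-side verification, mirroring the AuC sketch above (placeholder f1..f5).
import hmac, hashlib

def f(k, tag, data, nbytes):
    return hmac.new(k, tag + data, hashlib.sha256).digest()[:nbytes]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def usim_authenticate(k: bytes, rand: bytes, autn: bytes):
    concealed_sqn, amf, mac = autn[:6], autn[6:8], autn[8:16]
    ak   = f(k, b"f5", rand, 6)
    sqn  = xor(concealed_sqn, ak)                 # SQN = (SQN xor AK) xor AK
    xmac = f(k, b"f1", sqn + rand + amf, 8)
    if xmac != mac:                               # network authentication failed
        raise ValueError("MAC mismatch")
    res = f(k, b"f2", rand, 8)
    ck  = f(k, b"f3", rand, 16)
    ik  = f(k, b"f4", rand, 16)
    return res, ck, ik, sqn                       # SQN freshness is checked separately
```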



Figure 1. Authentication and key derivation in the Universal Subscriber Identity Module: RAND and AUTN are the inputs, f5 yields AK to unconceal SQN, f1–f4 yield XMAC, RES, CK and IK, and the USIM verifies that MAC = XMAC and that SQN is in the correct range [7].

(iii) Long Term Evolution (LTE) / Advanced LTE Generation of Authentication Vectors in the Home Subscriber Server (HSS). The LTE architecture is built on the existing UMTS architecture, and the LTE standards reuse the authentication and key agreement of UMTS. The LTE/Advanced LTE Authentication and Key Agreement (AKA) protocol is also known as the Evolved Packet System (EPS) AKA protocol. The EPS AKA protocol is executed between the UE and the MME instead of between the USIM and the VLR/SGSN. The AuC generates UMTS AVs for EPS AKA in exactly the same format as for UMTS AKA; the part of the Home Subscriber Server (HSS) outside the AuC derives the local master key in EPS (KASME) from CK and IK. An EPS AV consists of [RAND, XRES, the local master key KASME and AUTN], as shown in Fig. 2 [10], [11], [12].
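Since the text only states that KASME is derived from CK and IK, the following minimal sketch follows the key derivation structure given in 3GPP TS 33.401 (HMAC-SHA-256 over an input string built from the serving network identity and SQN ⊕ AK); the exact field coding shown here is indicative rather than normative.

```python
# Sketch of KASME derivation in the HSS (structure per 3GPP TS 33.401 Annex A;
# FC value and field layout are indicative, not a drop-in implementation).
import hmac, hashlib

def derive_kasme(ck: bytes, ik: bytes, sn_id: bytes, sqn_xor_ak: bytes) -> bytes:
    # S = FC || P0 (serving network identity) || L0 || P1 (SQN xor AK) || L1
    s = (bytes([0x10])
         + sn_id + len(sn_id).to_bytes(2, "big")
         + sqn_xor_ak + len(sqn_xor_ak).to_bytes(2, "big"))
    return hmac.new(ck + ik, s, hashlib.sha256).digest()   # 256-bit KASME
```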

III. PROPOSED AUTHENTICATION SCHEME IN MOBILE COMMUNICATION
A modification of the MILENAGE algorithm is proposed through a dynamic change of the S-box in AES depending on a new secret key. To obtain a new secret key for every authentication process, the random number (RAND) transmitted from the authentication centre (AuC) is combined (XORed) with the fixed stored secret key (Ki), so the initialization of AES is different for each authentication process. For every change of the secret key a new S-box is derived from the standard one by permuting its columns and rows with the help of a newly designed PN sequence generator. The result is a strengthened MILENAGE algorithm generating all functions f1, f1*, f2, f3, f4, f5 and f5*, whose outputs are used for user authentication, network authentication, data integrity checking and ciphering. The outputs of the various functions are defined as shown in Fig. 3:
Output of f1 = MAC-A, where MAC-A[0] .. MAC-A[63] = OUT1[0] .. OUT1[63]
Output of f1* = MAC-S, where MAC-S[0] .. MAC-S[63] = OUT1[64] .. OUT1[127]
Output of f2 = RES, where RES[0] .. RES[63] = OUT2[64] .. OUT2[127]
Output of f3 = CK, where CK[0] .. CK[127] = OUT3[0] .. OUT3[127]
Output of f4 = IK, where IK[0] .. IK[127] = OUT4[0] .. OUT4[127]
Output of f5 = AK, where AK[0] .. AK[47] = OUT2[0] .. OUT2[47]
Output of f5* = AK, where AK[0] .. AK[47] = OUT5[0] .. OUT5[47]

Figure 2. Generation of UMTS and EPS authentication vectors: UMTS (AV) = [RAND || XRES || CK || IK || AUTN], EPS (AV) = [RAND || XRES || KASME || AUTN], and AUTN = [SQN ⊕ AK || AMF || MAC] [6].

Figure 3. Computation of the MILENAGE functions f1, f1*, f2, f3, f4, f5 and f5* (OPC derived from OP and EK, with constants c1–c5 and rotations r1–r5) [13].



Upgrading of the S-box (dynamic S-box) depends on the new secret key (Ki ⊕ RAND) for every authentication process and on the new PN random sequence generator [14]. The suggested generator consists of three maximal-length Linear Feedback Shift Registers (LFSRs) of thirty-two, seventeen and fifteen stages, so the period of the PN sequence is (2^32 − 1)(2^17 − 1)(2^15 − 1). The first 128 bits of the PN sequence are taken as the secret key to upgrade the S-box: the first 64 bits rearrange the columns and the second 64 bits rearrange the rows of the original S-box. The feedback functions of the LFSRs are [15]:

LFSR 1: F1 = x^15 + x^14 + 1
LFSR 2: F2 = x^32 + x^22 + x^2 + x + 1
LFSR 3: F3 = x^17 + x^14 + 1

To initialize the PN sequence generator shown in Fig. 4, the new secret key is divided into two 64-bit vectors that are XORed to produce the 64-bit initial state of the generator. Let the fixed stored authentication key be Ki = [6C 38 A1 16 AC 28 0C 45 4F 59 33 2E E3 5C 8C 4F] and RAND = [EE 64 66 BC 96 20 2C 5A 55 7A BB EF F8 BA BF 63]; then the new secret key is Ki ⊕ RAND = [82 5C C7 AA 3A 08 20 1F 1A 23 88 C1 1B E6 33 2C] and the initialization vector of the PN sequence generator (the reshaped new secret key) is [98 7F 4F 6B 21 EE 13 33]. The first 64 bits of the PN sequence, [324E8C5160BAD97F], are used to rearrange the columns of the S-box, and the second 64 bits, [F3A1597682CEBD40], to rearrange the rows, giving the final modified form.

TABLE 2. COLUMNS OF THE DYNAMIC S-BOX AFTER ARRANGEMENT = [324E8C5160BAD97F].

TABLE 3. FINAL S-BOX ROWS AFTER ARRANGEMENT = [F3A1597682CEBD40], USED IN THE MODIFIED MILENAGE ALGORITHM UNDER THE NEW SECRET KEY TO GENERATE A NEW S-BOX, THE SO-CALLED DYNAMIC KEY (S-BOX).
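A compact sketch of the permutation step is given below. The LFSR-based PN generator itself is omitted; the two 64-bit arrangement words are taken as inputs (the example values are the ones derived in the text), and the assumption that each word is read as 16 nibbles giving the new column and row order follows Tables 2 and 3. The direction of the permutation (which index refers to the old table) is an interpretation, not specified in the text.

```python
# Sketch of the row/column permutation of a 16x16 AES S-box driven by two
# 64-bit words from the PN generator (LFSR stage omitted).
def permute_sbox(sbox: list, col_word: int, row_word: int) -> list:
    """sbox: flat list of 256 values; each 64-bit word is read as 16 nibbles
    giving the (assumed) new column / row ordering of the 16x16 table."""
    cols = [(col_word >> (4 * (15 - i))) & 0xF for i in range(16)]
    rows = [(row_word >> (4 * (15 - i))) & 0xF for i in range(16)]
    out = [0] * 256
    for r in range(16):
        for c in range(16):
            out[16 * r + c] = sbox[16 * rows[r] + cols[c]]
    return out

# Example arrangement words from the worked example in the text:
# cols = 0x324E8C5160BAD97F, rows = 0xF3A1597682CEBD40
```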

IV. SIMULATION AND RESULTS
A complete simulation of the modified MILENAGE algorithm was carried out on a microcontroller (PIC18F452). Avalanche tests are used to compare the original and the modified MILENAGE.
(i) For standard AES-128

Figure 4. PN random sequence generator.

Table 1. AES standard S-box.

Plain text = [CF 57 47 10 27 73 65 1A 6E 23 88 18 A2 7C B9 EF], Secret Key = [88 5C 36 49 B8 40 D9 E0 06 D0 61 F5 F6 FC 60 46] and Cipher Text = [B2 18 A5 8F A1 8E B4 B7 64 73 7D 51 83 37 8B 4E].
(ii) For modified AES-128 (dynamic S-box) using the PN random sequence generator: reshaped 64-bit secret key = [8E8C57BC4EBCB9A6], columns of the dynamic S-box after arrangement = [6093B714F2AEDC58] and final dynamic S-box rows after arrangement = [A14FC8D65B09E372].


TABLE 4. MODIFIED AES (DYNAMIC S-BOX) – 128.
Plain text  = CF 57 47 10 27 73 65 1A 6E 23 88 18 A2 7C B9 EF
Secret Key  = 88 5C 36 49 B8 40 D9 E0 06 D0 61 F5 F6 FC 60 46
Cipher Text = 38 B1 4D 2A 56 81 2F 13 FF EE 38 69 FA A4 77 40

(iii) Avalanche test

TABLE 5. SAMPLES OF AVALANCHE TEST DUE TO CHANGE OF ONE BIT IN THE PLAINTEXT OF THE STANDARD AES-128 ALGORITHM.

TABLE 6. SAMPLE RESULTS OF CIPHER TEXT AND AVALANCHE TEST DUE TO CHANGE OF ONE BIT IN THE PLAINTEXT OF THE MODIFIED AES-128 ALGORITHM.

TABLE 7. SAMPLES OF AVALANCHE TEST DUE TO CHANGE OF ONE BIT IN THE SECRET KEY OF THE STANDARD AES-128 ALGORITHM.

TABLE 8. SAMPLES OF AVALANCHE TEST DUE TO CHANGE OF ONE BIT IN THE SECRET KEY OF THE MODIFIED AES-128 ALGORITHM.

Figure 5. Avalanche effect ratio of standard AES, per bit position (1–128), due to changing one bit in the secret key.

Figure 6. Avalanche effect ratio of modified AES, per bit position (1–128), due to changing one bit in the secret key.
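The avalanche figures reported in Tables 5–8, Figures 5–6 and the averages in Table 10 can be reproduced with a generic measurement routine of the following form; `encrypt` stands for either the standard or the modified AES-128 and is supplied by the caller.

```python
# Generic avalanche-ratio measurement: flip a single input bit and count how
# many of the 128 ciphertext bits change.
def avalanche_ratio(encrypt, block: bytes, bit: int) -> float:
    """encrypt: callable mapping a 16-byte block to a 16-byte ciphertext."""
    flipped = bytearray(block)
    flipped[bit // 8] ^= 1 << (7 - bit % 8)          # flip one plaintext (or key) bit
    c1, c2 = encrypt(block), encrypt(bytes(flipped))
    diff = sum(bin(a ^ b).count("1") for a, b in zip(c1, c2))
    return diff / 128.0                              # fraction of changed output bits

# Averaging avalanche_ratio over all 128 bit positions yields values such as
# those in Table 10.
```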

TABLE 9. OUTPUTS OF THE MODIFIED MILENAGE ALGORITHM, DERIVING A STRONGER AUTHENTICATION VECTOR (AV) THAN THE OUTPUT OF THE STANDARD MILENAGE ALGORITHM IN 3GPP [16], [17].
Key         = 6C 38 A1 16 AC 28 0C 45 4F 59 33 2E E3 5C 8C 4F
RAND        = EE 64 66 BC 96 20 2C 5A 55 7A BB EF F8 BA BF 63
Dynamic Key = 82 5C C7 AA 3A 08 20 1F 1A 23 88 C1 1B E6 33 2C
SQN         = 41 4B 98 22 21 81
AMF         = 44 64
OP          = 1B A0 0A 1A 7C 67 00 AC 8C 3F F3 E9 6A D0 87 25
OPC         = 0A 3B 6E 4F 0C 94 36 9D 78 77 5A 2B 4D 46 42 A2
TEMP        = 73 53 4E 81 30 59 7F D6 CC 0A 37 49 64 AF FB 19
OUT1        = 7E 9E 92 1E 91 4B 06 C1 8F 77 84 C9 04 72 0D 25
OUT2        = 6D 1E D4 7C 6B 80 9A BB 98 B9 2A 6C EA 33 D1 8B
OUT3        = A2 33 69 6D 78 E0 3B D0 2B 20 0F CB 64 93 BD 95
OUT4        = 4B 84 A8 0E 4C 44 F1 30 C6 1D D1 CF AC 52 63 ED
OUT5        = 22 13 C7 4D F3 E2 89 AB 7A BC 96 1D B3 CD 88 C3
f1 (MAC-A)  = 7E9E921E914B06C1
f1* (MAC-S) = 8F7784C904720D25
f2 (RES)    = 98B92A6CEA33D18B
f3 (CK)     = A233696D78E03BD02B200FCB6493BD95
f4 (IK)     = 4B84A80E4C44F130C61DD1CFAC5263ED
f5 (AK)     = 6D1ED47C6B80
f5* (AK)    = 2213C74DF3E2
AUTN        = 2C554C5E4A0144647E9E921E914B06C1
AV          = RAND || XRES || CK || IK || AUTN
            = EE6466BC96202C5A557ABBEFF8BABF63 98B92A6CEA33D18B A233696D78E03BD02B200FCB6493BD95 4B84A80E4C44F130C61DD1CFAC5263ED 2C554C5E4A0144647E9E921E914B06C1

V. DISCUSSION AND CONCLUSIONS
(i) The main weakness in MILENAGE, as stated by cryptanalysts, is the use of bit rotations and constant XORs in its middle part. In particular, if the kernel block cipher used in MILENAGE is susceptible to differential cryptanalysis, an attacker can mount a variety of attacks on the algorithm; if the kernel block cipher is strong and secure, an attacker cannot extract any useful information. This paper modifies the standard MILENAGE authentication algorithm through the dynamic

change of the kernel block cipher AES. For every authentication process a new S-box is generated using a combination of the received random number (RAND), the stored authentication key (Ki) and the PN sequence generator to rearrange the columns and rows of the standard AES S-box. Tests showed that the modified AES is more secure than the standard one due to its dynamic structure, and its immunity to linear and differential cryptanalysis is increased, as indicated by the avalanche test results in Table 10.

TABLE 10. AVERAGE VALUE OF AVALANCHE TESTS FOR (PLAIN TEXT – SECRET KEY) IN AES AND MODIFIED AES.
Input type of data | Type of algorithm | Avalanche average value
Plaintext          | Modified AES      | 50.15%
Plaintext          | AES               | 49.71%
Secret key         | Modified AES      | 49.86%
Secret key         | AES               | 49.84%

(ii) Execution time can be reduced as follows. The MILENAGE algorithm with all the functions f1 to f5* has been designed and implemented on an IC card with an 8-bit microprocessor running at 3.25 MHz with 8 Kbyte ROM and 300 byte RAM, producing AK, XMAC-A, RES, CK and IK in less than 500 ms of execution time [19]. The modified MILENAGE algorithm with all the functions f1 to f5* was designed and implemented on a microcontroller (PIC18F452) with an 8-bit microprocessor running at 11.0592 MHz with 32 Kbyte ROM [6450 program bytes used from a possible 32768 (19.68%)] and 1536 byte RAM [1232 variable bytes used from a possible 1536 (80.21%)], producing AK, XMAC-A, RES, CK and IK in 50.333 ms of execution time.

REFERENCES
[1] P. Kitsos, N. Sklavos and O. Koufopavlou, "UMTS security: system architecture and hardware implementation," Wireless Communications and Mobile Computing, vol. 7, no. 4, pp. 483-494, May 2007.
[2] Federal Information Processing Standards Publication (FIPS 197), "Advanced Encryption Standard (AES)," 26 Nov. 2001.
[3] J. Daemen and V. Rijmen, "The block cipher Rijndael," Smart Card Research and Applications, LNCS 1820, Springer-Verlag, pp. 288-296.
[4] R. Nadaf and V. Desai, "Hardware Implementation of Modified AES with Key Dependent Dynamic S-Box," IEEE ICARET 2012.


[5] V. Niemi and K. Nyberg, UMTS Security. Chichester, West Sussex, England: John Wiley & Sons Ltd, ISBN 0-470-84794-8, 2003.
[6] D. Forsberg, G. Horn, W.-D. Moeller and V. Niemi, LTE Security. Chichester, West Sussex, United Kingdom: John Wiley & Sons Ltd, 2013.
[7] 3GPP TS 33.102 V11.5.1 (2013-06), Technical Specification; Third Generation Partnership Project; Technical Specification Group Services and System Aspects; 3G Security; Security architecture (Release 11).
[8] 3GPP TS 33.105 V11.0.0 (2012-09), Technical Specification; Third Generation Partnership Project; Technical Specification Group Services and System Aspects; 3G Security; Cryptographic algorithm requirements (Release 11).
[9] S. Pütz, R. Schmitz and T. Martin, "Security Mechanisms in UMTS," DBLP: journals/dud/PutzSM01, vol. 25, no. 6, June 2001.
[10] 3GPP TS 33.401 V12.9.0 (2013-09), Technical Specification; Third Generation Partnership Project; Technical Specification Group Services and System Aspects; 3GPP System Architecture Evolution (SAE); Security architecture (Release 12).
[11] S. Banescu and S. Posea, "Security of 3G and LTE," Faculty of Computer Science, Eindhoven University of Technology.
[12] H. Mun, K. Han and K. Kim, "3G-WLAN interworking: security analysis and new authentication and key agreement based on EAP-AKA," Wireless Telecommunications Symposium (WTS 2009), pp. 1-8, IEEE, 2009.
[13] 3GPP TS 35.206 V11.0.0 (2012-09), Technical Specification; Third Generation Partnership Project; Technical Specification Group Services and System Aspects; 3G Security; Specification of the MILENAGE Algorithm Set: An example algorithm set for the 3GPP authentication and key generation functions f1, f1*, f2, f3, f4, f5 and f5*; Document 2: Algorithm Specification (Release 11).
[14] K. Suwais and A. Samsudin, "New Classification of Existing Stream Ciphers," INTECH Journal, 1 Feb. 2010.
[15] S. Kiyomoto, T. Tanaka and K. Sakurai, "K2: A Stream Cipher Algorithm using Dynamic Feedback Control," Communications in Computer and Information Science, Springer, vol. 23, pp. 214-226, 2009.
[16] 3GPP TS 35.207 V11.0.0 (2012-09), Technical Specification; Third Generation Partnership Project; Technical Specification Group Services and System Aspects; 3G Security; Specification of the MILENAGE Algorithm Set: An example algorithm set for the 3GPP authentication and key generation functions f1, f1*, f2, f3, f4, f5 and f5*; Document 3: Implementors' Test Data (Release 11).
[17] 3GPP TS 35.208 V11.0.0 (2012-09), Technical Specification; Third Generation Partnership Project; Technical Specification Group Services and System Aspects; 3G Security; Specification of the MILENAGE Algorithm Set: An example algorithm set for the 3GPP authentication and key generation functions f1, f1*, f2, f3, f4, f5 and f5*; Document 4: Design Conformance Test Data (Release 11).

[18] S3-010014, 3GPP TSG SA WG3 Security, "Analysis of the Milenage Algorithm Set," QUALCOMM International, Gothenburg, Sweden, 27 February - 02 March 2001.
[19] 3GPP TS 35.909 V10.0.0 (2011-03), Technical Report; Third Generation Partnership Project; Technical Specification Group Services and System Aspects; 3G Security; Specification of the MILENAGE Algorithm Set: Document 5: Summary and results of design and evaluation (Release 10).



Forensic Evidence of Copyright Infringement by Digital Audio Sampling: Analysis – Identification – Marking

Stefan K. Braun
PhD Student, Faculty of Management, Comenius University Bratislava, Slovak Republic
[email protected]

ABSTRACT


In recent years, the number of attempts to use digital audio and video evidence in litigation in civil and criminal proceedings has increased. Technical progress makes editing and changing music, film and picture recordings much easier, faster and better. The methods of digital sampling differ from the conventional pirated copy in that using a sample involves extensive changes and editing of the original work. Different digital sampling methods make the technical analysis and the legal classification more difficult. Targeted analysis methods can clearly identify a case of sampling and belong to the main field of forensic analysis. If persuasive evidence of an unauthorized use of sampling cannot be produced, the proof is useless in the legal process. Labelling technologies that are applied correctly make an important contribution to the effective detection of unauthorised sound sampling. There are hardly any holistic approaches that integrate the problem of sound sampling into the fields of analysis, identification, and labelling. In combination with specific technical protective mechanisms against sampling, an unauthorised use of samples protected by copyrights can be prevented or reduced. Using and sampling somebody else’s piece of music or video can be a copyright infringement. The copyright and the neighbouring rights of performing artists and the neighbouring rights of phonogram producers are affected by the consequences of illegal sampling. Part 1 of the article introduces the problems of digital audio sampling, Part 2 describes the typical manifestations of sampling, Part 3 illustrates various analytical procedures for the detection of audio sampling and Part 4 shows the identification by labelling strategies.

KEYWORDS

audio · authentication · bootlegging · digital techniques · single sound sampling · ENF · Electric Network Frequency · forensics · forensic audiology · real-time frequency analysis · cryptography · neighbouring rights · melody · mash-up · mix production · multi-sampling · phase inversion · remix · sample medley · sound sampling · sound separation · spectrogram · spectrometer measurement · sound collage · sound sequence sampling · copyright · watermarking

1 INTRODUCTION

1.1 The Problems and Classification of Digital Sound-Sampling

The word “sample” in this context stems from the piece of equipment known as a “sampler”. The sampler is supplied with sound information by integrating sound or microphone recordings. From the fed-in oscillation curves, samples are taken and stored. With the use of modern software, removed samples can be, for example, transposed in pitch and tempo, mutilated, transformed, tampered or mixed as desired [1, 2]. From the sample source voices, instruments, rhythms and parts of melody can be removed (“sampled out”) and incorporated into a new production. The purpose of sampling is the simple and inexpensive way of adopting desired sounds, instruments, or voices without having to invest in studio production costs, time and


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(3): 170-182 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) effort. Furthermore, the sound characteristics of performers can be imitated and used as inspiration without their knowledge or consent. Users of samplers not only utilize notes but also sound from a specific production. The arrangement of individual sounds and timbres can be created, on the one hand, in the studio and, on the other, directly on the digital recording computer [1]. “Sound”, “timbre” and “tone” are used more or less synonymously in literature. The limiting factor is that from a physical point of view, timbre is only one of the many components of sound [3]. Sounds and melodies can be generally adopted from both existing music productions and recordings. In contrast to this, there are sound databases that can be downloaded from the internet and also physical data carriers such as sound libraries. In addition to shorter sound excerpts of a few bars or seconds, smaller melody parts, the socalled “licks” and smaller sequences are sampled. A specific sampled music sample therefore includes also the generated sound [4]. If there are, in addition to a certain sound, enough of these samples available to the user, he can put these together like a “mosaic” to create a “new” work. A very common form of sampling is taking foreign compositions from actual recordings into new music and film productions. Often pitches and characteristics are changed to differing degrees when adopting single tones or tone sequences in the sampling process. Processing. The processing of a musical work is always associated with a transformation. When composing, the melodic, harmonic and rhythmic form is changed. When this is text, it is reworked, modified, supplemented, replaced completely or translated into another language, for example. The result of such a major rearrangement is a newly created work. The cover

version shows the necessary individuality in the form of intellectual and approval-requiring creation [5]. The prerequisite is that the transformation in turn has the appropriate quality of “work”. It should be determined which musical design elements cause the creative peculiarity of the work. To be considered in this context in particular are the tonal system, the duration of the tone, timbre, volume, rhythm and melody. Processing eligible for protection. Processed work which is eligible for protection requires a recognizable creative performance of the editor, so that resulting from the compositional change or expansion of the musical substance of the original, a new, independent work is created. In contrast to such works which are eligible for protection are those which use an original work and take the musical substance of the original essentially unchanged and transfer the musical text of the original faithfully (e.g. editorial services) [6]. Works that have been created using other works or foreign melodies must be marked with the appropriate copyright information. For free works no permission for processing has to be sought from the originator. Protected works require this permission. Processing is the key feature when considering whether the original is eligible for protection [5]. It is crucial that the new work distinguishes itself from the old one and not only repeats an already existing one; the aesthetic overall impression of the new piece must not be present in the original work [7]. Melody. The melody is, in occidental music, the most important parameter and main information carrier. Together with the harmony it is the most important forming structure in music. The term melody includes three elements: Harmony (harmonizing of tones), rhythmos [sic] (temporal structure) and logos (text). Melodies are differentiated in their function and their classification as a vocal melody (range, phrase length) or instrumental melody [8]. The melody forms a self-contained tone system


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(3): 170-182 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) (characteristic). It retains its own character even when accompaniment (rhythm) is eliminated or the sounds replaced (transposed). In music for easy listening and pop music, the vocal parts of the melody are considered to be the characteristic that can be assigned to the relevant song. 1.2

Services not Eligible for Protection

Typical techniques and thus ineligible for protection include mere conversions of sentences or sentence parts of a multi-part musical work, slight changes of melody, harmony and rhythm, or individual noise elements if the basic character of the original work remains the same [9]. Certain recurring basic repeats or patterns, such as chord sequences, classic song structures or common elements of music, are not eligible for protection [10]. Insignificant tonal variations and slight shortenings or extensions taking into account the compositional or textual original work are permitted in this context [10]. Exceptions are to be seen under certain circumstances with regard to fingering in music course books when this characteristic forms the tone. The transposition of the pitch of the original is also one of the criteria ineligible for protection and does not change the melody.

Criteria for Activities not Eligible for Protection.
─ Lack of originality.
─ Insignificant, minor changes.
─ Use of an original work, borrowing of partial works.
─ Transposition to a different key or pitch for technical artistic reasons.
─ Instrumentation and timbre of individual instruments, merely replacing an instrument.
─ Adaptation of the melody to the vocal abilities of the singer.
─ Making changes to the rhythm, replacement with another standard rhythm.
─ Note-for-note transcription of existing voices to another instrument.
─ Supplementing of performance indications, elaboration, fingering, applying punctuation.
─ Addition or change of phrasing.
─ Tempo and volume adjustments.
─ Doubling of voices.
─ Addition of accompanying voices in parallel motion (e.g. in the third or sixth).
─ Reduction of existing parts in the score of a piano movement.
─ Editorial services (publication of a pre-existing musical work).
─ Digitization or compression into an MP3 file, for example.

2 TYPICAL MANIFESTATIONS OF SAMPLING

Cooper [11] divides audio editing into three levels: 1) Editing / tampering on a basic level, directly in the original material, during or after the recording; 2) Editing / tampering on an intermediate level, containing several fields copied from one or more original sources for a new recording; 3) Editing / tampering at a high level by means of appropriate editing and sound processing software. The edited version will then function as a “new original”. According to their type of use, the sampling techniques can be divided into single-tone sampling and melody sampling. Single-tone sampling distinguishes again between the actual sampling of a single-tone and a variant called “Multi-Sampling”, one of the economically most important and technically difficult to detect sampling forms. It is referred to colloquially as “sound sampling”. The parties involved in each sampling are always the originator or author, the performing artist and, in the case of indirect sampling, the record producer. If a digital sample is used, there is inevitably always a reproduction of works or parts of works.


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(3): 170-182 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) 2.1

Origin of the Sound Material

Sampling of the Artists' Own Sound Material. Sound material can be recorded by the artists themselves or recorded and then sampled. This is usually done where there are certain fragments repeated in a musical work. Sampling is also carried out when certain figures of a piece have a repetitive character and do not differ in dynamics, articulation and rhythm. With this approach, difficult figures and phrases have to be recorded only once [12]. Sampling of Foreign Sound Material. Much more sampling material comes from external sources [12] such as sound recordings or individual tracks from multi-track tapes. Furthermore, so-called “factory sounds” and sound archives exist, for example, on CD or as downloads from internet archives. Natural Sounds. These are divided into signals produced by oneself and others as well as natural sounds, meaning sounds not shaped by humans including animal sounds, machinery and everyday sounds [13] and meteorological noises [12]. 2.2

Single-Tone Sampling

Direct Single-Tone Sampling. Under direct single tone sampling, sampling of individual instrumental sounds is understood. Here, a certain characteristic sound, for example, an instrument, a voice or a sound is taken in isolation, digitized, fragmented, and then imported into the sampling computer [12]. Using the keys of keyboards, the sound can be allotted to a button and then played. If there are sounds in different pitches, volumes and articulations, music can be played and modelled with specific musical characteristics. This process provides unrestricted access to the original sound of a music production. Indirect Single-Tone Sampling. Single-tone indirect sampling is the term used to refer to the

acquisition of sampled sounds from existing recordings, mostly audio recordings. A single tone can thus be isolated and the obtained sound then processed. The acquisition of single tones from a ready-mixed multi-track production by frequency superpositions of the singletones and instrumental tracks later mixed together is not quite so simple. A single tone from single tracks of a recording, however, is very easy to remove and include and of high quality [1]. Multi-Sampling. The term multi-sampling is used when several individual notes with different pitch intervals and volumes are distributed on a sampler keyboard. The distribution usually takes place according to the original pitch. Often tones of mixed productions are extracted which have superimposed frequencies of other instruments. If only one sound as in the singletone sampling is extracted, this would have to be transposed to a different pitch, which would lead to frequency distortions in any existing secondary frequencies. Therefore, different sounds according to their pitch ranges are extracted from different points of a piece in order to avoid this negative effect. An additional optimization is achieved by the blending (positional crossfading) of the samples with each other [1]. 2.3

Melody Sampling

Contrary to the sound use of the single tone sampling, tone sequences sampling is about the (partial) adoption of melodies, harmonies and rhythms and the subsequent collage-like composition of new musical works. In general, a sequence of sampled parts from well-known music productions is used to maintain the recognition effect [12]. A variety of procedures can be distinguished. Mixed Productions (Sample Medley). In mixed production consecutive characteristic music parts of a few seconds or bars are sam-


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(3): 170-182 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) pled and successively linked together in a newly created mixed production. Here, the new mixed production either contains parts of samples [1] or, in extreme cases, consists entirely of such. By using adjustment of the tempo the individual samples must be adapted, where necessary, before the mixing takes place. The purpose of this approach is the recognition effect of the sampled work parts. The more clearly the recognition of parts of the originator’s work, the more successful the goal of the mixed production was implemented. Such mixed productions are created in the pop and dance genres by disc jockeys. Such productions were used before digital sampling technology existed, carried out by hand and the much more complicated and time-consuming tape cutting. Sound Collages. Unlike mixed productions, sound collages disguise their origin [14]. Instead of stringing together sound samples, into sound collages these are layered over each other (“batch processing”). It is not unusual for several layers of samples to be superimposed. For example, a melody sequence can be taken as a sample from work 1, a rhythm from work 2 and a guitar sequence from work 3. In general, the individual samples must then be adjusted with regards to volume, tempo, pitch and timbre, so that they fit together in a new production, often cut as a “loop”. As with mixed productions, the sound collages may consist either in part or entirely of samples. Cover Versions. The sampling technique with cover versions and remixes is understood as “hit-recycling”. Either the whole work or parts thereof, for example, the refrain, are taken from the original and backed with new rhythms and sounds. The purpose is the audible sound adaptation to new listening habits. Cover versions (interpretations of an earlier original) can be made without using the sampling technique. The sampling technique is still used consciously and for economic reasons, however, to maintain the successful part of the original. As

with the mixed productions, sampled parts should be recognized [1]. If the artist leaves the limited scope for interpretation set for cover versions and moves towards a processing with independent creative input into the piece, this change is subject to approval. Remixes. The remix follows the same rules as processing. Successful hits are frequently rereleased as a remix. Individual tracks of a multi-track tape are often completely “broken down into pieces” and recomposed and remixed along with new recordings. There are also mixed sound effects, new recordings of instruments and a far-reaching change in the sound of the material. The remix, however, can take place with the extraction of a sample [12]. Mash-Up. Mash-ups (also known as bootlegging, bastard pop or collage) have been enjoying increasing popularity for years. At the beginning of the 1990s, it was usually only 2 different pop songs whose vocal and instrument tracks were mixed with each other to form a remix [12]; today there are multi-mash-ups with several dozen mixed and sampled songs, artists, video sequences and effects. It is a challenge to mix this combination of different styles to new danceable tracks. The mash-up is a mix of sound collage and mixed productions. Usually known sequences of two or more (multi-mash-up) existing works are mixed to create a “new” work. The samples used are layered over each other (sound collage), as well as in series (mixed production). The incorporation of large parts of the original in the mash-up is the rule. In sampling, however, it is rather the exception [12].


3 ANALYSIS METHODS FOR DETERMINING THE USE OF A SAMPLE

Evidence of sampled parts in a musical work can be achieved by means of different methods of analysis.

3.1 Musical Aspects

Under certain circumstances, a simple listening test is sufficient. As a rule, a direct comparison of the musical notation is carried out. Since most samples are changed in speed and pitch, it can be helpful to adapt them in pitch and tempo to the original before starting the analysis. Pitch changes and temporal stretching have qualitative limits if a realistic overall impression is to remain: deviations of about 15-20% produce audible artefacts and alienate the original, which can be desirable for creative reasons. Often sampled parts are superimposed with other instrument and vocal tracks, and a simple separation is then no longer possible.

3.2 Physical Aspects

Analysis and measurement methods provide evidence of the use of sampling. Electric Network Frequency Analysis (ENF). For the validation of digital audio and video recordings, a method commonly recognized by forensic experts is Electric Network Frequency (ENF) analysis. Every mains power supply leaves a characteristic frequency, a so-called "mains hum". It may not be audible, but the oscillations can be detected in an audio file [15]. If digital recording devices such as cameras or audio recorders are used to record voice, music or film, they store, in addition to the actual content, the network frequency of 50 or 60 Hz; this happens with battery-powered devices in the same way [16].

The frequency of the mains carrier never has exactly the same value; the random fluctuations in the power supply result from the difference between produced and consumed current. The actual ENF signal can be extracted using band-pass filters that isolate, for a 50 Hz supply, the range 49-51 Hz. Interruptions or irregularities in the phase response can be an indication of tampering: the network frequency behaves, in effect, as a temporal digital watermark. The effectiveness of this method, however, depends on such a network signal existing at all [15]. In most situations, a visual comparison of the spectrogram with the frequencies stored in an ENF database is sufficient; more detailed studies require measurement and analysis of short time slots which are compared with each other. With this method it is even possible to determine the exact location and time of production of a recording, provided that reference samples prepared by continuous recording of the network frequencies in power grids (such as the German or European electricity grids) are available [16]. Microphones also leave a particular frequency spectrum in the audio material; should several different spectra show up in one recording, this can likewise indicate tampering [15]. The evaluation of the digital audio recording by exact measurement, comparison and mapping of the individual frequencies against the reference database is therefore of great importance. Besides ENF analysis, the spectrogram representation, "re-sampling", real-time frequency analysis (spectrometer measurement) and phase inversion are appropriate forensic methods.
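A minimal sketch of the band-pass extraction step described above is shown below, assuming a 50 Hz grid and using SciPy; the resulting instantaneous-frequency trace would then be matched against an ENF reference database.

```python
# Hedged sketch: isolate the ENF component around the nominal mains frequency
# and track its instantaneous frequency over time.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def extract_enf(signal: np.ndarray, fs: float, nominal: float = 50.0) -> np.ndarray:
    sos = butter(4, [nominal - 1.0, nominal + 1.0], btype="bandpass", fs=fs, output="sos")
    narrow = sosfiltfilt(sos, signal)                 # 49-51 Hz band for a 50 Hz grid
    phase = np.unwrap(np.angle(hilbert(narrow)))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)   # instantaneous ENF trace in Hz
    return inst_freq
```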


displayed over time. Figure 1 shows as A the recording A (an original music recording from 1990) and as B the recording B (an unauthorized edit, created in 2007 by sampling material removed from the original A and mingling it with new instrumental tracks). The recordings were first equalized in tempo and pitch before the examination by means of a spectrogram representation and then compared directly (see "A" (left channel) as recording A and "B" (right channel) as recording B). A time section of 6 seconds is depicted, corresponding to about 4 synchronous parallel bars from the two recordings. The Y-axis shows the frequency spectrum from 0 Hz (Hertz) to about 20 kHz and thus depicts approximately the hearing range of the human ear¹. The lower frequencies are shown in the lower area and the higher frequencies at the top. The horizontal X-axis is the time axis. Frequency is the number of periods that are run through in one second; the unit of frequency is the Hertz (Hz). An oscillation is composed of a positive and a negative half-wave, i.e. the to-and-fro swing of the electrons is called an oscillation, wave or period [17]. With this representation the audio material can be visualized. The representation in the frequency spectrum is used to gain direct access both to specific frequency ranges and to certain time ranges, in contrast to standard waveform processing (see Figure 1, area C), which is always performed over the entire frequency domain. These frequency ranges can be shown in colour by means of analysis software: high and low frequencies are represented by different colours, and the intensity and level of the frequencies are displayed in a colour spectrum that extends from blue and white (the highest intensity) to purple and black (the lowest intensity). In simple terms, a bell sound in a piece of music can, for example, be reduced, replaced or removed by using the "Copy & Paste" software function to copy a part without

¹ The hearing range (auditory sensation area) of the human ear is from about 16-21 Hz to 16-19 kHz.

a bell and insert it over the desired place. In spectral processing, there are diverse modes that can be used. For example, it is possible to reduce levels by means of band, low and high pass filters (“damping”) - the peak level is blurred by mixing the frequencies and thus they “disappear” or are covered up. Furthermore, it is possible to transform the dynamics without changing the actual frequency content (“dispersion”).
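The spectrogram comparison can be sketched as follows; the function assumes that the two recordings have already been equalized in tempo and pitch, as described above, and simply displays their spectrograms side by side for visual inspection.

```python
# Minimal sketch of a side-by-side spectrogram comparison of recordings A and B.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import spectrogram

def compare_spectrograms(a: np.ndarray, b: np.ndarray, fs: float) -> None:
    fig, axes = plt.subplots(1, 2, sharey=True)
    for ax, sig, title in zip(axes, (a, b), ("Recording A", "Recording B")):
        f, t, sxx = spectrogram(sig, fs=fs, nperseg=2048)
        ax.pcolormesh(t, f, 10 * np.log10(sxx + 1e-12))  # spectral density in dB
        ax.set(title=title, xlabel="time [s]")
    axes[0].set_ylabel("frequency [Hz]")
    plt.show()
```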

Figure 1. Comparison of the recordings A and B in the spectrogram representation (Source: Stefan K. Braun).

In Figure 1 a bell can be clearly seen in the sampled recordings A and B (red arrow). It lies in the frequency range of approximately 5400 Hz. In the AB-comparison it is very clear to see that all the different frequencies correspond, the patterns of both images are identical. The frequency spectrum of recording B is much richer. This is due to the mixed instrumental tracks added to the recording. The temporal distribution of the frequency phase and the significant characteristic features such as the existing bell in the recording have not been changed by the sampling. The area C represents the waveform processing. Visual procedures such as the spectrogram representation are important methods for aiding detection of manipulation. “Re-Sampling”. Under certain circumstances, a sampling procedure can be carried out via a so-called “re-sampling”. Here, in simple terms, the numerical values of the digital samples are compared with those of the original. This pre-


supposes, however, that identical comparison pieces exist. Usually the samples used do not exist in isolation but are mixed inseparably into the final product together with other audio and instrument tracks, distorted with effects and changed in tempo and pitch; a direct comparison is then no longer possible.
Spectrometer Measuring. In sampling, a digital copying process cannot always be verified purely by listening. With spectrometer measurement a coherent frequency diagram can be displayed and a very accurate and detailed real-time frequency analysis performed. In this case, the frequency spectrum is represented as a linear graph. Peak levels are depicted as short horizontal lines showing the last reached maximum values (see Figure 2). Spectrometer measurements are also used in forensic analyses, e.g. vocal comparisons in the field of criminology.

Figure 2. A frequency spectrum in a real-time frequency analysis with a linear graph at a randomly selected point in time of the investigated sample (Source: Stefan K. Braun).

In Figures 2 and 3, the vertical Y-axis shows the signal level from 0 dB (decibels) down to -96 dB, and the horizontal X-axis indicates the frequency band from 0 Hz to about 16 kHz. As described above, the recordings A and B are compared directly. In the authenticity analysis, the determination of the originality and continuity of the recordings and the detection of changes are of particular importance [18]. For the real-time frequency analysis, a random location in the samples to be examined was selected and fixed as a linear graph. The graphs show very similar, almost matching curves; their frequency forms correspond in the typical manifestations at the characteristic points (e.g. shallow rise, steep climb, strong peaks between 500 Hz and 2700 Hz, falling off from 7500 Hz). Within the investigated samples of 6 seconds duration, all investigated linear graphs of the frequency spectrum show a relatively similar curve in terms of characteristics and patterns. Figures 4 and 5 show two or more overlapping linear graphs; the relatively similar curves from randomly selected positions in time in the same samples show clear similarities between the original A and the sample-processed B.

Figure 3. A frequency spectrum in a real-time frequency analysis with a linear graph at a different point in time of the same sample (Source: Stefan K. Braun).




Figure 4. A frequency spectrum in a real-time frequency analysis with two overlapping linear graphs at a randomly selected point in time of the same sample (Source: Stefan K. Braun).

Figure 5. A frequency spectrum in a real-time frequency analysis with three overlapping linear graphs at a randomly selected point in time of the same sample (Source: Stefan K. Braun).

The problem may be verification when a sampling was not created by copying, but by an extensive technical sound remake. Here there is a difference in the technical and legal view. While in terms of law, a remake “sample” can still be considered as such, it is technically a different object. If a sample is taken from an original, it can be determined relatively easily due to whether the frequency plot of the linear graphs is the same or different in the analysed sample. For example, physical characteristics of the same or different audio tracks of vocals can be represented by this method. Adopted or re-

made instrument passages can be revealed and checked for sameness with this method. Even non-audible differences of different blowing techniques for brass instruments or different striking techniques with keyboard instruments can be seen in the graph representation [13]. It is not possible to achieve congruent sound and frequency structures by imitating ways of playing and singing. If they are identical, everything points to a sampled adoption of the original. The limits of an identical representation of the linear graphs are reached when the samples in one object which are being compared are changed dramatically with respect to sound and are superimposed with other vocal and instrument tracks. Phase Inversion. In recording studio technology, phase reversal (phase inversion) is often used to correct wrongly polarized audio signals in the phase. In order to achieve certain effects, phases with correct polarity can also be reversed deliberately. Using this, undesired and reverse-poled phases can be added/mixed with the phases of the original signal, so that they cancel each other out, in whole or in part. For example, in a piece of music with vocals, the vocals are “filtered out” by phase inversion in order to obtain an instrumental or karaoke version. In the forensic evidence of phase reversal, a destructive interference is sought; the matching points (oscillations) of the samples cancel each other out. An oscillation is composed of a positive and a negative half-wave, and thus corresponds to a full circle of 360 degrees [17]. If two sine-phases in the fundamental frequency are shifted 180 degrees of the phase, they are opposed (mirrored or inverted) and so cancel each other out completely. If two or more waves are added, their amplitudes are reinforced; this is referred to as constructive interference. If the waves cancel each other out, destructive (complete) interference is the term used. Theoretically, both recordings must be completely identical in this experimental arrange-


ment, i.e. tempo, pitch, volume and the course of the waveform match completely. If a phase inversion is performed on one recording and this phase is mixed together with the other, identical recording without phase inversion, the result is a complete cancellation of the part concerned. In this study of phase reversal, destructive interference was sought in order to mutually cancel the corresponding parts of the samples. Under practical conditions, the physical alignment of both recordings to exactly the same pitch is very difficult; the more accurate this process is, the greater the cancellation in the end. In the next step, recording A is inverted in phase and levelled to the pitch of recording B. Then both phases are superimposed. The result is shown in Figure 6. While the phases do not cancel each other out completely, they clearly correlate with each other. This correlation is particularly evident in the direct comparison with the unprocessed recording B. Comparison objects are seldom completely identical in practice, so a phase cancellation is usually only partially possible, and the affected sample part shows partial cancellations. What can be heard after a partial phase inversion is a clear "flanging" effect. This effect is caused by artificial zeroes which result from the cancellation of the audio signal in the frequency spectrum. At the same time, the preceding phase reversal introduces a phase shift, which causes a delay. Both the (partially) erased places and the shifting of the phases against each other are then audible. "Flanging"-altered audio signals produce a kind of "floating" effect; often the effect is described as being like a jet ("jet effect") moving through the music [19]. In simplified terms, the "flanging" effect is similar to that of a tape and tape recorder: if a spool is "braked" by hand, it accelerates again when released, creating the effect of "flanging". Complete cancellation cannot be achieved even with perfect alignment of pitch, as the examined recordings A and B differ in their character-

istics. This is mainly due to, as mentioned above, the superposed instrumental and rhythm tracks in recording B.

Figure 6. Phase inversion. A (input A, phase inverted), B (B recording, normal phase), P (mixed phases from the recordings A and B and audible "flanging"). Recording B is "shorter" at the parts with the included sampling at the end of the sample, i.e. it stops earlier than at the corresponding part in recording A. Due to this, the flanging effect stops at the end of this part. At this point the previous partial cancellation is particularly apparent (Source: Stefan K. Braun).
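A minimal sketch of the null test described above is given below, assuming recordings that have already been aligned in tempo, pitch and level; the residual energy (in dB relative to recording B) quantifies how far the phases cancel, with strongly negative values indicating matching, i.e. likely sampled, material.

```python
# Sketch of the phase-inversion (null) test: invert recording A, mix it with
# recording B and measure the residual energy relative to B.
import numpy as np

def null_test(a: np.ndarray, b: np.ndarray) -> float:
    n = min(len(a), len(b))
    residual = b[:n] + (-a[:n])          # phase-inverted A mixed with B
    cancellation_db = 10 * np.log10(np.sum(residual**2) / (np.sum(b[:n]**2) + 1e-12))
    return cancellation_db               # strongly negative => strong cancellation
```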

4 IDENTIFICATION BY LABELLING STRATEGIES

There is almost no effective protection that prevents unauthorized copying. In the last 20 years or so, the affected industries have developed and used the most diverse digital copy protection and labelling systems. Known systems include Digital Rights Management (DRM), the Content Scrambling System (CSS), different types of holograms, signatures such as RIFD, Serial Copy Management System (SCMS) or, for example, digital watermarks. For novice users there might be restrictions in use as not all the playing devices are able to deal with the copy protection mechanisms such as the DRM restrictions. The technically versed professional is, regardless of the legal regulations, capable of getting round these precautions more or less easily. Although overall markings such as holograms, bar codes [20] or ISRC codes (International Standard Recording Code) [21] identify the product (recorded music, digital file) in terms of its originality, they do not protect or prevent a possible further illegal use. Of


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(3): 170-182 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) importance is a modular approach between the requirements of sound sampling, in conjunction with a proper identification method: Protection and recognition of very small clippings that are superimposed with other signals in foreign productions reappear. All procedures which can be used have a main problem in common: the more they cost, the less value these are in a practical use. Fundamentally it must be distinguished between “data hiding” and “watermarking”. While data hiding conceals information in the medium or in a channel, the watermarking binds the information into the medium. Data hiding is used interchangeably with "information hiding", although the latter is more likely to be used for the cryptic method [22]. The following procedures seem appropriate for marking, identification and authentication of sound samples for further use: 4.1

Cryptographic Processes

Cryptographic processes can be divided into asymmetric, symmetric and hybrid², as well as strong and weak, methods. According to Lynch / Lundquist, a cryptographically secure data exchange must meet the following system requirements: identification, authentication, verification, non-repudiation and privacy. If all five demands are met, this is referred to as a secure data exchange [23]. Asymmetric cryptographic processes are characterized by the fact that digital signatures have a private and a public code. The use of the private code ensures that only the owner of the product rights can assign an individual signature [20]; a check of the encryption is provided by the public code. Signatures, e.g. in the form of identification numbers ("identification keys") in connection with a verification database, allow the tracking of marked objects ("tracking & tracing").

² https://www.datenschutz.rlp.de/downloads/oh/ak_oh_kryptographie_version1.pdf

4.2 "Watermarking"

Watermarking is a promising technology for the protection and prosecution of copyright infringement. The basic technique and main focus of research in digital watermarking is an integral, invisible [22] "interweaving" of identification data (copyright information, names, logos, etc.) with the main channel without interfering with or impairing it. Audio signals (music and speech), images, movies, software, e-books and texts can be provided with individual markings in this way [24]. There are two important main groups of watermarking use:
─ Piracy-resistant use, which withstands an attack on the watermark; applications are copy-protection measures, "fingerprint" techniques and other preventive measures (e.g. hash functions).
─ Use that is weak in terms of piracy resistance, where the watermark is dissolved or minimally changed by a piracy attack; when the watermark has been changed or is absent, copies of the original are no longer recognized as originals [25].
There are important requirements for the labelling:
─ easy readability of the watermark in retrospect,
─ resistance to destruction,
─ recovery of the signal when only very small excerpts of the original file are used [25], and
─ the additional information must not be perceptible to the human ear [22].
Several, often conflicting, properties are the focus of watermarking: the amount of hidden or inserted information, the robustness and security of the data, the invisibility, and the reading of the introduced data [22]. Labelling and identification systems which are based on authentication, and so distinguish the copy from the original, can be used independently or with a database [20]. A check on the authenticity of the watermark and the control of the authentication is done, for example, using database systems. For audio files, for example, a watermark can be set as an "inaudible" frequency over the actual audio frequency


band. To read the information, the same algorithm, a "Watermark Key", that was used for the earlier interweaving of the information is needed. The recognition of copyright infringement takes place via a verification comparison on the database server. Disadvantages of such systems are a not quite closed security chain, as markers are not created directly at the premises of the copyright owner but in the sales shop. If only digital files using the watermark process are detected, a direct use of recorded music media and trade on exchange platforms cannot be prevented. Piracy resistance has limits with the use of watermarking technology: frequent copying and transforming creates a "fuzzy", unreadable watermark. A significant advantage, within the aforementioned limitations, is the preservation of the watermark even under format changes, compression, filtering, re-sampling and re-quantization, as well as the recognition of violations from even the smallest excerpts, as they occur with sound sampling [24].
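As an illustration of the keyed, inaudible-marking idea sketched above, the following toy spread-spectrum scheme embeds a key-dependent pseudo-random pattern at very low level and detects it by correlation; it is an expository sketch, not any specific commercial watermarking system, and a robust scheme would additionally use psychoacoustic shaping and synchronization.

```python
# Toy spread-spectrum style watermark: embed a keyed pseudo-random pattern at
# low amplitude, detect it by correlating a suspect excerpt with the pattern.
import numpy as np

def watermark_pattern(key: int, length: int) -> np.ndarray:
    return np.random.default_rng(key).choice([-1.0, 1.0], size=length)

def embed(signal: np.ndarray, key: int, strength: float = 0.002) -> np.ndarray:
    return signal + strength * watermark_pattern(key, len(signal))

def detect(excerpt: np.ndarray, key: int) -> float:
    pattern = watermark_pattern(key, len(excerpt))
    return float(np.dot(excerpt, pattern) / len(excerpt))   # large positive => mark present
```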

5 Conclusion

In principle, only the adoption of free or lawfully licensed works is allowed for processing as a sample. If it is unclear whether sampling may be carried out, a sample clearing with the respective rights holders and collecting societies can help. With regard to the validation of digital audio and video recordings, a common method recognized by forensic experts is Electric Network Frequency (ENF) analysis. With this method it is even possible to determine the exact location and the exact time of the production of a recording. Visual procedures such as the spectrogram representation are important methods for aiding the detection of manipulation. In the authenticity analysis, the determination of the originality and continuity of the recordings and the detection of changes are of particular importance. Sound sampling will continue to gain in importance, and new extraction methods (sound separation) which can extract the whole melody will exacerbate the problem of piracy. On the other hand, improved analysis and marking processes such as watermarking technology offer more possibilities for the protection, detection and prosecution of copyright violations.

References

[1] M. Häuser, Sound und Sampling: Der Schutz der Urheber, ausübenden Künstler und Tonträgerhersteller gegen digitales Soundsampling nach deutschem und US-amerikanischem Recht. Dissertation. München: Beck, 2002.
[2] E. Adeney, "The sampling and remix dilemma: What is the role of moral rights in the encouragement and regulation of derivative creativity," Deakin Law Review, vol. 17, no. 2, pp. 335-348, 2012.
[3] T. M. Jörger, Das Plagiat in der Popularmusik. Dissertation, 1st ed. Baden-Baden: Nomos Verlagsgesellschaft, 1992.
[4] B. Wessling, Der zivilrechtliche Schutz gegen digitales Sound-Sampling: Zum Schutz gegen Übernahme kleinster musikalischer Einheiten nach Urheber-, Leistungsschutz-, Wettbewerbs- und allgemeinem Persönlichkeitsrecht. Dissertation. Baden-Baden: Nomos Verlagsgesellschaft, 1995.
[5] M. Pendzich, Von der Coverversion zum Hit-Recycling: Historische, ökonomische und rechtliche Aspekte eines zentralen Phänomens der Pop- und Rockmusik. Dissertation. Münster: LIT, 2004.
[6] Ohne Verfasser, GEMA: Schutzfähige Bearbeitungen freier Werke. Journal article.
[7] U. Loewenheim and B. von Becker, Handbuch des Urheberrechts. Textbook, 2nd ed. München: Beck, 2010.
[8] R. Amon, Lexikon der Harmonielehre: Nachschlagewerk zur durmolltonalen Harmonik mit Analysechiffren für Funktionen, Stufen und Jazz-Akkorde. Textbook. Wien, Stuttgart: Doblinger; Metzler, 2005.
[9] R. Moser, Handbuch der Musikwirtschaft. Textbook, 6th ed. Starnberg u.a.: Keller, 2003.
[10] G. Berndorff, B. Berndorff, and K. Eigler, Musikrecht: Die häufigsten Fragen des Musikgeschäfts; die Antworten. Textbook, 6th ed. Bergkirchen: PPV-Medien, 2010.
[11] A. J. Cooper, Detection of Copies of Digital Audio Recordings for Forensic Purposes. Milton Keynes: Open University, 2006.
[12] P. Wegener, Sound Sampling: Der Schutz von Werk- und Darbietungsteilen der Musik nach schweizerischem Urheberrechtsgesetz. Dissertation. Basel: Helbing Lichtenhahn, 2007.
[13] R. Münker, Urheberrechtliche Zustimmungserfordernisse beim Digital Sampling. Dissertation. Frankfurt am Main, New York: P. Lang, 1995.
[14] T. Meschede, Der Schutz digitaler Musik- und Filmwerke vor privater Vervielfältigung nach den zwei Gesetzen zur Regelung des Urheberrechts in der Informationsgesellschaft. Dissertation. Frankfurt/Main, New York: P. Lang, 2007.
[15] R. Korycki, "Time and spectral analysis methods with machine learning for the authentication of digital audio recordings," Forensic Science International, vol. 230, no. 1-3, pp. 117-126, 2013.
[16] C. Grigoras, "Applications of ENF criterion in forensic audio, video, computer and telecommunication analysis," Forensic Science International, vol. 167, no. 2-3, pp. 136-145, 2007.
[17] H. Meister, Elektronik: Mit Versuchsanleitungen und Rechenbeispielen, 8th ed. Würzburg: Vogel-Buchverlag, 1986.
[18] B. E. Koenig and D. S. Lacey, "An Inconclusive Digital Audio Authenticity Examination: A Unique Case," Journal of Forensic Sciences, vol. 57, no. 1, pp. 239-245, 2012.
[19] J. Webers, Tonstudiotechnik: Handbuch der Schallaufnahme und -wiedergabe bei Rundfunk, Fernsehen, Film und Schallplatte, 4th ed. München: Franzis, 1985.
[20] M. Abramovici, Kennzeichnungstechnologien zum wirksamen Schutz gegen Produktpiraterie: Mit Ergebnissen aus Projekten MobilAuthent, O-Pur, EZ-Pharm. Research project. Frankfurt am Main: VDMA, 2010.
[21] M. Schäfer and T. Hansen, ISRC - International Standard Recording Code: Das ISRC-Handbuch. Available: http://www.musikindustrie.de/fileadmin/news/publikationen/vb_isrc_handbuch.pdf (2014, Feb. 17).
[22] M. Faundez-Zanuy, J. J. Lucena-Molina, and M. Hagmüller, "Speech Watermarking: An Approach for the Forensic Analysis of Digital Telephonic Recordings," Journal of Forensic Sciences, vol. 55, no. 4, pp. 1080-1087, 2010.
[23] D. C. Lynch and L. Lundquist, Digital Money: The New Era of Internet Commerce. New York: Wiley, 1996.
[24] S. V. Dhavale, "Lossless Audio Watermarking Based on the Alpha Statistic Modulation," IJMA, vol. 4, no. 4, pp. 109-119, 2012.
[25] M. A. Nematollahi and S. A. R. Al-Haddad, "An overview of digital speech watermarking," Int J Speech Technol, vol. 16, no. 4, pp. 471-488, 2013.


International Journal of Cyber-Security and Digital Forensics (IJCSDF) Published by The Society of Digital Information and Wireless Communications Miramar Tower, 132 Nathan Road, Tsim Sha Tsui, Kowloon, Hong Kong

Volume 3, Issue No. 4 - 2014

Email: [email protected] Journal Website: http://www.sdiwc.net/security-journal/ Publisher Paper URL: http://sdiwc.net/digital-library/browse/66

ISSN: 2305-0012

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Advances of Mobile Forensic Procedures in Firefox OS Mohd Najwadi Yusoff, Ramlan Mahmod, Ali Dehghantanha, Mohd Taufik Abdullah Faculty of Computer Science & Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia. [email protected],{ramlan,alid,taufik}@upm.edu.my

ABSTRACT

The advancement of smartphone technology has attracted many companies to develop mobile operating systems (OS). Mozilla Corporation recently released a Linux-based open source mobile OS named Firefox OS. The emergence of Firefox OS has created new challenges, concerns and opportunities for digital investigators. In general, Firefox OS is designed to allow smartphones to communicate directly with HTML5 applications using JavaScript and the newly introduced WebAPI. However, the use of JavaScript in HTML5 applications, together with the near absence of OS restrictions, may lead to security issues and potential exploits. Therefore, forensic analysis for Firefox OS is urgently needed in order to investigate any criminal intentions. This paper presents an overview and methodology of mobile forensic procedures, in a forensically sound manner, for Firefox OS.

KEYWORDS

Forensic framework, mobile forensics, forensic investigation, forensic methodology, forensic procedures, Firefox OS.

1 INTRODUCTION

Mobile devices are relatively small, portable and widely used by people of all ages in daily life, business, entertainment, medicine and education. Mobile devices include mobile phones, smartphones, tablets and personal digital assistants (PDA). The usage of mobile devices has gradually increased over time, especially for smartphones and tablets. This increasing trend is due to their useful capabilities and numerous functions, which allow many tasks that used to require personal computers and high processing power to be executed on mobile devices. Figure 1 shows the world-wide smartphone sales (thousands of units) classified by mobile OS from Q1 2007 to Q4 2013 [1]. The sales numbers are not limited to smartphones but also include other mobile devices such as tablets and PDAs, because tablets and PDAs use the same mobile OSs as smartphones.

Figure 1. World-Wide Smartphone Sales

The latest analysis by Gartner shows that about 282 million smartphones were sold in Q4 2013, compared with about 24 million in Q1 2007 [2-3]. In just seven years, quarterly smartphone sales grew roughly twelvefold. The highest sales growth belongs to Android, followed by Apple iOS, with a large gap between first and second place. The remaining mobile OSs show inconsistent sales and sales growth. Figure 2 shows the world-wide smartphone sales by percentage [1].


Figure 2. World-Wide Smartphone Sales Percentage

The growth of mobile devices has led numerous companies to join the market. In Q1 2007, smartphone sales were dominated by Symbian OS, followed by Windows Mobile and Research In Motion (RIM). However, with the arrival of Apple iOS in Q2 2007 and Android in Q3 2008, the domination of Symbian OS slowly declined. At present, Windows Mobile no longer exists, as it was replaced by Windows Phone in Q4 2010, and Samsung Bada joined the race in Q2 2010. In 2014, the mobile OS market share is dominated by Android, followed by Apple iOS, Windows Phone, RIM, Symbian OS and Bada respectively. In Q1 2012, Mozilla Corporation joined the battle by releasing its own mobile OS, named Firefox OS [4]. The OS is able to run on selected Android-compatible smartphones. The first ever Firefox OS phone was released by ZTE in Q3 2013, followed by Alcatel, LG and Geeksphone [5-6]. Firefox OS is an open source mobile OS which is purely based on the Linux kernel and Mozilla's Gecko technology [7]. Firefox OS boots into a Gecko-based runtime engine and thus allows users to run applications developed exclusively with HTML5, JavaScript and other open web application APIs. According to the Mozilla Developer Network, Firefox OS is free from proprietary technology but is still a powerful platform; it offers application developers an opportunity to create tremendous products [7]. Mozilla introduced WebAPI to bridge the capability gap between native frameworks and web applications. WebAPI enables developers to build applications and run them in any standards-compliant browser without the need to rewrite the application for each platform. In addition, since the software stack is entirely HTML5, a large developer base is already established, and users can embrace the freedom of pure HTML5 [8].

Unlike Apple iOS, Windows Phone, RIM and Android, which are full of manufacturer restrictions, Firefox OS is based solely on HTML5, JavaScript and CSS, all of which are completely open. Without such restrictions, security issues and potential exploits come into question. According to the Mozilla Developer Network, Firefox OS has designed and implemented a multi-layered security model which delivers the best protection against security exploits [9]. In general, Firefox OS uses a four-layer security model, consisting of the mobile device itself and the Gonk, Gecko and Gaia layers, in order to mitigate exploitation risks at every level. The mobile device is the phone running Firefox OS, while Gonk consists of the Linux kernel, system libraries, firmware and device drivers. Gonk delivers features of the underlying mobile phone hardware directly to the Gecko layer. Gecko is the application runtime layer that provides the framework for application execution and implements the WebAPIs used to access features of the mobile device. Gecko operates as a gatekeeper that enforces the security policies designed to protect the mobile device from exploitation; it also enforces permissions and prevents unauthorized requests. Finally, Gaia is the suite of web applications that delivers the user experience [9]. The objective of this paper is to present an overview and methodology of mobile forensic procedures for forensic investigation in Firefox OS. The paper is organized as follows: Section 2 explains related work to date, Section 3 presents the proposed methodology and the detailed steps of the forensic procedure, and Section 4 gives a brief conclusion and the future work to be considered. Acknowledgements and references are presented at the end of the paper.


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) 2 RELATED WORKS 2.1 SIM Cards Investigation In the earliest mobile forensic investigation, most of the digital evidences in mobile phone were stored in SIM cards. Research by Goode stated that, it is vital to acquire the data such as contacts and SMSs stored in SIM cards [10]. In addition, mobile phone memory and SIM cards also hold phone contacts which may contain critical evidences for an investigators. According to Goode, there are three evidence locations in mobile phone which are from SIM cards, identification information from a mobile phone (IMEI) and core network provider information. Similar work carried out by Willassen was by exploring SIM card and core network data in GSM phones [11]. According to Willassen, the SIM cards can provide information of network provider name with a unique identification number. The subscriber's name, phone number and address usually associated with the SIM cards. Consequently, phone records also can be retrieved from network providers. Furthermore, the contents of a SIM cards are binary data that can be taken, provided that the user has authentication either with a PIN or a PUK code. Programs or tools such as Cards4Labs and SIM-Surf Profi were used to decode the binary format into readable form. In addition, Willasen also able to recover the evidence such as phone logs, phone contacts, SMS, and phone IMEI obtained from both SIM cards and mobile phones. On the other hand, Casadei was used open source tools, both in Windows and Linux for digital extraction from SIM cards [12]. As the result, Casadei was able to acquire the raw data in Binary format from the SIM cards. Casadei also presented an interpretation of binary raw data at a higher level of abstraction and used an open source tool named SIMbrush to examine the raw data. SIMbrush was designed to acquire digital evidence from any SIM cards in GSM network but have not tested for any modules under D-AMPS, CDMA and PDC. Additionally, SIMbrush focus more towards GSM network because GSM is the biggest mobile network in the world at that time

and penetration in this network is rapidly increased. Marturana was extended the acquisition process in SIM cards by comparing data in SIM cards and smartphones [13]. According to Marturana, acquisition in the smartphone is much more complicated; this is due to the possibility of evidences are also stored in many places such as internal and flash memory. 2.2 Windows Mobile With the arrival of smartphones, focuses are more on the Windows Mobile OS due to its similarity in nature with desktop environment. Windows Mobile OS is a simplified version of Windows OS developed by Microsoft; mainly for mobile devices. Research by Chen was able to extract SMS, phone contacts, call recording, scheduling, and documents from Windows Mobile OS via Bluetooth, Infrared and USB mode using Microsoft ActiveSync [14]. Microsoft ActiveSync used Remote API (RAPI) to manage, control, and interact with the connection equipment from the desktop computer. The acquired data were came from mobile phone internal memory, SIM card as well as removable memory. Similar research was continued by Irwin and Hunt by extracting evidences over wireless connections. They used their own developed forensic tools called as DataGrabber, CTASms and SDCap. DataGrabber was used to retrieve information from both the internal memory and any external storage card, CTASms to extract information from the mobile device’s Personal Information Manager (PIM) while SDCap was used to extract all information from external storage card. They were successfully mapping internal and external phone’s memory and transfer all files and folder to desktop computers [15]. By using RAPI function, acquisition process only capable to capture active data and capturing deleted data is not possible using this method. According to Klaver, physical acquisition method will be able to obtain non-active data in Windows Mobile OS [16]. Klaver was proposed a versatile method to investigate isolated volume of Windows Mobile OS database files for both active and deleted data. Klaver was used freely available 185

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) tools of forensic application and explained the known methods of physical acquisition. Deleted data can be recovered by using advanced acquisition methods like chip extraction and this method was able to bypass password protection. Casey was extended the finding by describing various methods of acquiring and examining data on Windows Mobile devices. Casey was also able to capture text messages, multimedia, e-mail, web browsing, and registry entries [17]. Some of the captured data by Casey were locked by the OS itself, and require XACT from Micro Systemation and ItsUtils to work together with Microsoft ActiveSync. These tools will help to unlock certain files and convert the ASCII format in cemail.vol structure to a readable SMS. This research was also focused on potentially useful sources of evidences in Windows Mobile OS and addressed the potential evidences found in \temp folder. In the recent work, Kaart made an investigation by reverse-engineering the pim.vol volume files in Windows Mobile OS [18]. pim.vol is a Microsoft’s Embedded Database (EDB) volume that consists of information related to phone contacts, calendars, appointments, call history, speed-dial settings and tasks [18]. Kaart was successfully reverse-engineering important parts of EDB volume format which allow them to recover unallocated records. Kaart was also delivered the mapping from internal column identifiers into readable format for some familiar databases in pim.vol volumes and created a parser that can automatically extract all allocated records exist in a volume.

back registry hives, digital evidence such as SMS, MMS as well as email can be obtained and retrieved from cemail.vol. On the other hand, Research by Grispos make a comparison of forensic tools using different approaches for acquisition and decode process [20]. According to Grispos, there are strengths and weaknesses of each types of acquisition. Grispos stated that, logical acquisition is more efficient for recovering user data, whereas physical acquisition can retrieve deleted files [20]; but this procedure can damage the device while it is being dismantled. Besides that, Kumar was proposed an agent based tool developed for forensically acquiring and analyzing in Windows Mobile OS [21]. This tool is develop based on client server approach, whereby client is installed on desktop PC and server agent is inject into mobile devices before acquisition process. As for analyzing process, this tool was able to display and decode the image created during acquisition. This research also make a comparison between Paraben's Device Seizure, Oxygen's Forensics Tool as well as Cellebrite UFED and claimed to perform better in Windows Mobile OS. On the other hand, Canlar was proposed LiveSD Forensics to obtain digital evidence from both the Random-Access Memory (RAM) and the Electronically Erasable Programmable Read Only Memory (EEPROM) of Windows Mobile OS. This research was claimed to generate the smallest memory alteration, thus the integrity of evidences is well preserved [22].

Microsoft ActiveSync in Windows Mobile OS placing an agent into mobile devices. This action may alter the stored data in mobile devices such as the last synchronization date and time; or the name of the last computer synchronize with the devices. For this reason, Rehault was proposed a method of using boot-loader concept; which is nonrewritable and able to protect the evidences from being altered [19]. Rehault was also proposed an analysis method to process specific files with specific format. The main focus in this research was to obtain registry hives and the cemail.vol which contain deleted data. By reconstructing

Symbian OS is one of the famous mobile OS in the golden age of smartphone. Forensic work by Mokhonoana and Olivier was discussed about the development of an on-phone forensic logical acquisition tool for the Symbian OS [23]. Mokhonoana and Olivier was performed UNIX dd command to acquire the image from the phone. They also discussed different methods of retrieving data from a Symbian OS consists of manual acquisition, using forensic tools, logical acquisition, physical acquisition and data acquired from service providers. Additionally, Breeuwsma was proposed a low level approach for the forensic examination of flash memories and describes three

2.3 Symbian OS


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) low-level data acquisition methods for making full memory copies of flash memory devices [24]. Their work has been identified as pioneer in physical acquisition techniques using chip-off, JTAG and pseudo-physical. The work has been tested on Symbian OS devices.

be executed on the Symbian OS and created linear bitwise copies of the internal flash memory [29]. SMIT able to acquire the image of the internal flash memory and copies the images to a removable memory. SMIT has been tested on a Nokia E65, E70 and N78 model.

On the other hand, Rossi and Me was proposed a new methodology and a tool to obtain the data by using the removable memory [25]. A tool to obtain the data is stored in a removable memory and the acquisition process is performed locally. In addition, this tool not only performs acquisition process, but also compiles log and marks the data with some one-way hash algorithm to provide data integrity. Test result is divided into three condition of mobile devices and result obtained are different. Therefore, Rossi and Me suggest to maintain the device in the most possible original status. Besides that, Dellutri was proposed a novel methodology by applying data reverse-engineering on Symbian devices [26]. The proposed methodology also shows full potential when running on mobile operating systems which data formats are not open or not public. The investigation used Mobile Internal Acquisition Tool (MIAT) and run into more than 50 Symbian OS devices. Deluttri was able to capture personal data, Symbian OS personal data files format, obsolete data and hidden information during forensic process.

Thing and Tan was proposed a method to acquire privacy-protected data from smartphones running the latest Symbian OS v9.4 and smartphones running the prior Symbian OS v9.3 [30]. They also presented reverse-engineering analysis work on the active and deleted SMS recovery from the onphone memory of Symbian OS. In addition, Thing and Chua was proposed a forensics evidentiary acquisition tool for Symbian OS [31]. Their acquisition tool was built to support a low-level bit-by-bit acquisition of the phone’s internal flash memory, including the unallocated space. They also conducted an analysis to perform a detail of the fragmentation scenarios in Symbian OS. Apart from that, Savoldi made a brief survey and comparison between mobile forensic investigation in Windows Mobile OS and Symbian S60 [32]. In his work, Savoldi acquired the evidences using both logical and physical methods. Savoldi was also illustrated the differences and identified possible common methodology for future forensic exploration. Conversely, Mohtasebi was studied four mobile forensic tools; namely Paraben Device Seizure, Oxygen Forensic Suite, MIAT, and MOBILedit! to extract evidences from Nokia E500 Symbian OS phone [33]. The comparison was to check the ability to extract evidence and to examine information types such as call logs, map history, and user data files.

Moreover, Yu was proposed a process model for forensic analysis in Symbian OS. The process model consists of five stages which are Preparation and Version Identification; Remote Evidence Acquisition; Internal Evidence Acquisition; Analysis and Presentation; and Review. According to Yu, this model can overcome some problems of forensic investigation in Symbian OS [27]. Savoldi and Gubain was presented an assessment about specifically forensic acquisition focusing on security features in Symbian OS [28]. They gave an overview about OS and file system architecture. According to Savoldi and Gubain, only physical acquisition was able to recover a bitwise copy of the flash memory without bypassing any security mechanisms. On the other hand, Pooters was created a forensic tool called Symbian Memory Imaging Tool (SMIT) to

2.4 Apple iOS Forensic investigation and examination in Apple iOS was practically started by Bader and Baggili. They performed investigation and examination on logical backup of the iPhone 3GS by using the Apple iTunes backup utility [34]. Significant forensic evidences such as e-mail messages, SMS, MMS, calendar events, browsing history, locations services, phone contacts, call history as well as voicemail recording was found and can be retrieved using this iPhone acquisition method. 187

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) Husain later extended the finding by proposed a simple and cost effective framework for iPhone forensic analysis [35]. Followed the same approached by Bader and Baggili, iTunes was used to force backup the iPhone and logical copy of backup data can be found in computer hard drive. This method can captured the entire data from iPhone without Jailbreak the devices. Jailbreak is a method to release the security features of the iDevice and permits a direct connect to the data inside. Husain was used MobileSyncBrowser to analyse the backup file which is in binary format; converting them into lists and databases. Furthermore, SQLite Database Browser was used to analyse database file and Plist Editor was used to analyse Apple Property List file. In the contrary, Husain and Sridhar was focused on Instant Messaging (IM) data in Apple iOS [36]. This research made an analysis to forecast the potential use of IMs that can lead to cyber bully and cyber stalking. Once the data captured, Paraben Device Seizure, Aesco Radio Tactics and Wolf Sixth Legion were used to analyse the data. The output of this analysis were included username, password, buddy list, last login time and conversation together with timestamp. Similarly, Jung was reviewed the types of Social Network Services (SNS) that available in Apple iPhone and made further studied on acquisition process in Apple iOS platform [37]. Jung made an analysis on eight SNS in Apple iOS which are Cyworld, Me2Day, Daum Yozm, Twitter, Facebook, NateOn UC., KakaoTalk and MyPeople. However, this method required root access to the devices. Therefore, Jung Jailbreak the device to get full access permission. The examination and analysis of SNS continued by Tso [38]. Tso was discussed five most popular mobile SNS applications in Apple iOS usages such as Facebook, WhatsApp Messenger, Skype, Windows Live Messenger and Viber. Tso was followed methods by Husain by forcing Apple iTunes to make a logical copy of backup data. Tso ran two experiment, the first data acquisition was after applications installation, while the second data acquisition was after applications deletion.

Moreover, Said was carried a research by comparing Facebook and Twitter in Apple iOS, Windows Mobile OS and RIM BlackBerry [39]. The aim of this research is to investigate the different types of logical backup and encryption techniques used in three mobile OS. In the same way, Mutawa later on made SNS comparison between Facebook, Twitter and MySpace in three different OS which are Apple iOS, Android and RIM BlackBerry [40]. The examination and analysis was started by uploading picture and post a comment from mobile devices using each SNS from different platform. The aimed of this analysis is to determine whether activities performed earlier are stored and can be captured from internal mobile devices memory. Similar research was published by Lee and Park by capturing the SNS data using different forensic tools [41]. On the other hand, Levinson explore an investigation for third party applications in iOS platform [42]. According to Levinson, most mobile forensic work emphasis on typical mobile telephony data such as contact information, SMS, and voicemail messages. However, there are heaps of third party application might leave forensically relevant artifacts in mobile devices. For investigation purpose, Levinson acquired data from 8 third party application in Apple iOS platform. As a result, Levinson found information about user accounts, timestamps, geo-locational references, additional contact information, native files, and various media files. All these data typically stored in plaintext format and can provide relevant information to a forensic investigators. Additionally, in order to create Apple iOS image, we need to install SSH server and transfer the Apple iOS internal storage via Wi-Fi into desktop computer. However, this approach may require up to 20 hours. Therefore, Gómez-Miralles and Arnedo-Moreno were presented a novel approach by using an iPad’s camera connection kit attached via USB connection [43]. This approach was greatly reduces acquisition time. On the bad side, Jailbreak is required in order to gain full access and it is not considered as a forensically sound manner for forensic investigation. For that reason, Iqbal made a research to obtain an Apple iOS image without Jailbreak the device and run the 188

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) acquisition process on the RAM level [44]. Apple iOS devices need to reboot and enter the recovery mode before connected to their own developed tools. The imaging process was less than 30 minutes and they were successfully developed an acquisition method that protects the integrity of the collected evidences. Recently, Ariffin was proposed an operational technique that allows digital forensic investigators to recover deleted image files by referring to Apple iOS journaling file system [45]. The proposed method was implemented on an iDevice that has been Jailbreak and used a customized RAM disk that was loaded into the device RAM. This technique was successfully recover deleted image files from the journal file and was tested on iPhone 3GS and iPhone 4. 2.5 Google Android Android becomes a new mobile forensic investigation emphasis due to its strong hold in the market share and currently have the biggest active user using Android platform. Android was built based on Linux-Kernel and user has the ability to rewrite the firmware, boot-loader and other root activities. Research by Lessard and Kessler were among the first in Android platform [46]. They performed logical acquisition on HTC Hero to acquire physical image. UNIX dd command was performed to get an image from \dev\mtd directory. The examination was done on flash memory and removable memory by AccessData Forensic Toolkit (FTK) Imager v2.5.1. Apart from that, Anti-forensic among the major consideration in Android investigation. Research by Distefano was preserved the data from being damaged by using anti-forensics approach through a local paradigm [47]. This approach was exported some critical data which need to be identified in XML format and save it into private directory and it is unreachable from any other applications. Albano also worked on anti-forensics to modify and erase, securely and selectively, the digital evidence in Android [48]. This technique do not use any cryptographic primitives or make any changes to the file system. Moreover, Azadegan introduce the design and implementation of three novel anti-

forensic approaches for data deletion and manipulation on three forensics tools [49]. This research also identify the limitation of current forensic tools flow design. Quick and Alzaabi was performed logical and physical acquisition on Sony Xperia 10i [50]. Logical acquisition was not able to acquire the full size of the file system, while physical acquisition achieved a bitwise acquisition of the flash memory. They also claimed that they has successfully generated a complete copy of the internal NAND memory. On the other hand, Sylve was presented the first methodology and toolset for acquisition of volatile physical memory from Android devices [51]. This method was created a new kernel module for dumping memory and Sylve has further develop a tool to acquire and analyse the data. Sylve was also presented an analysis of kernel structures using newly developed volatility functionality. On the contrary, Vidas was proposed a general method of acquiring process for Android by using boot modes [52]. This technique reconstructed the recovery partition and associated recovery mode of an Android for acquisition purposes. The acquired data has to be in recovery image format. Custom boot-loader method has become popular in Android because the user is able to get root permission; and able to acquire an image of the flash memory. Another research using boot-loader method was conducted by Park [53]. This research was mainly focused on fragmented flash memory due to the increase of flash memory deployment in mobile phones. In addition, Son was conducted an evaluation on Android Recovery Mode method in terms of data integrity preservation [54]. Son developed an Android Extractor to ensure the integrity of acquired data and presented a case study to test their tool’s ability. The test was conducted on seven Samsung smartphones with rooted Android capability and emphasis on YAFFS2 and Ext4 files. The results from the use of JTAG method was served as a comparison vector to the Recovery Mode. In the different research perspective, Racioppo and Murthy made a case study about physical and logical forensic acquisition in HTC Incredible 189

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) [55]. AccessData Forensic Toolkit (FTK) Imager v3.0.1 was used to make an image from removable SD memory. After that, they root the device and gaining access to the root directory. They were able to create a bitwise copy of the seven MTDs presented in \dev\mtd directory. Examination was done using Ubuntu’s scalpel and claimed that the physical acquisition much more effective to discover corrupted or destroyed files. Moreover, Andriotis was proposed a method for acquiring forensic evidences from Android smartphones using open source tools [56]. The investigation was able to obtain information regarding the use of the Bluetooth technology and Wi-Fi networks on four Android running devices. They specified the files and folders to be targeted and highlighted some security problems that might occur by exposing user’s passwords in certain cases by the Android system. In addition, Mylonas was extended the acquisition work and focusing on phone’s sensors as critical evidence sources [57]. They studied the involvement of sensor like accelerometer, proximity sensor, GPS, compasses, etc in forensic investigation procedures which can be used to infer the user’s context. However, they found that the sensor data are volatile and not available in post-mortem analysis. For that reason, they developed the tool named Themis to collect suspect’s data. Themis consists of two major parts, workstation version and the mobile agent in Android. They presented promising result but yet to prove its effectiveness in practice due to the need of bypass Android security mechanism and need to place mobile agent in each phone. The newly acquisition method is the live acquisition. Thing was proposed an automated system in acquiring evidences and claimed that this method consistently achieved 100% evidence acquisition rate for outgoing message and 75.6% to 100% evidence acquisition rate for incoming message [58]. Thing was used Android as the test platform and Message Script Generator, UI/Application Exerciser Monkey, Chat Bot, memgrab and Memory Dump Analyzer (MDA) as the forensic tools. Although the acquisition rate is high, this method was only tested by using their own developed chat bot and yet to be tested using

commercial IM. Another live acquisition research is by Lai. Lai was proposed data acquisition in Android; and deliver the data to the Google cloud server in real time [59]. This method really can delivered the intended data, but the integrity of the data is questionable. As for monitoring, Guido used live forensic to remotely monitor and audit malware activity on Android devices [60]. It consists of five elements, each one detecting changes in specific parts of the OS such as bootloader, recovery, file system, deleted files and APK files. Even many successful detections, they discovered some malware false positive result and inability to detect some deleted entries. 2.6 RIM BlackBerry Another major player in mobile market share is BlackBerry. As for the BlackBerry, Fairbanks was presented a framework for an open source BlackBerry IPD file forensics tool [61]. Fairbanks was executed a python tool that parses the IPD file upon user request for specific resources. That tool is able to capture the messages, contacts, SMS records, memos, call logs, and the task list from an IPD file. This work was tested on BlackBerry 7290. The result was good but they did not provide enough data to the potential readers. On the other work, Sasidharan and Thomas generated BlackBerry image from Blackberry Acquisition and Analysis Tool (BAAT), which is injected on the device before acquisition process [62]. The tool also analyse the forensic image and shows phone contents in different file viewers. However, they unable to read and parse the SMS databases because the version of the BlackBerry JDE at that time does not supports API to access SMS databases from the device. In the recent BlackBerry research, Marzougy was presented the logical acquisition of the .bbb file produced by a Blackberry Playbook tablet [63]. They used the BlackBerry Desktop Management (BDM) software to perform the logical acquisition. The extracted information were varying from system information and user data and also failed to retrieve deleted entries. For future work, they plan to run more tests on a wide range of BlackBerry models.


3 PROPOSED WORK

In order to run a forensic investigation and analysis, we propose the following methodology to be conducted during the investigation process. It is based on the Smith and Petreski approach [64]. Our methodology consists of three procedures. It is a basic approach purely designed for Firefox OS. Since there will be many types of files and analyses, the methodology is built around a specific targeted-data checklist. This checklist is used to identify the relevant data aligned with a specific analysis, and it can be updated from time to time.

3.1 Preparation and Preservation Procedure

Figure 3. Preparation and preservation procedure

The first procedure is Preparation and Preservation, shown in Figure 3. This procedure starts with a knowledge and information check. If knowledge about Firefox OS is not sufficient, knowledge gathering is required. It is vital to understand the Firefox OS architecture before the forensic investigation is started. In general, the Firefox OS architecture consists of three layers [65]. The first layer, called Gaia, is the application layer and works as the user interface for smartphones. The second layer is the open web platform interface; it uses the Gecko engine and provides all support for HTML5, JavaScript and CSS. All the targeted evidence is stored in this layer. The third layer, called Gonk, is the infrastructure layer and consists of the Linux kernel. There are two types of storage in a phone running Firefox OS: the internal SD storage and an additional micro-SD card. Figure 4 shows an illustration of the Firefox OS architecture.

Figure 4. Firefox OS Architecture [65]

The second step is to create the system configuration and set up the forensic software and related hardware. In this step, we need a smartphone preinstalled with Firefox OS, the forensic tools, and physical and logical connectivity.

We will use the Geeksphone Peak as our test Firefox OS phone, as shown in Figure 5.

Figure 5. Geeksphone Peak

The Geeksphone Peak is among the first phones released running Firefox OS and was marketed as a developer preview model. It was released in April 2013. This phone is equipped with Firefox OS version 1.1.1, as shown in Figure 6.

Figure 6. Geeksphone Peak information detail

Mozilla releases updates for the OS regularly, and any stable build can be updated over the air. Table 1 below shows the detailed specification of this phone.

Table 1. Geeksphone Peak specification

Processor: 1.2 GHz Qualcomm Snapdragon S4 8225 processor (ARMv7)
Memory: 512 MB RAM
Storage: Internal 4 GB; micro-SD up to 16 GB
Battery: 1800 mAh; micro-USB charging
Data Inputs: Capacitive multi-touch IPS display
Display: 540 x 960 px (qHD) capacitive touchscreen, 4.3"
Sensor: Ambient light sensor; proximity sensor; accelerometer
Camera: 8 MP (rear), 2 MP (front)
Connectivity: WLAN IEEE 802.11 a/b/g/n; Bluetooth 2.1 + EDR; micro-USB 2.0; GPS; mini-SIM card; FM receiver
Compatible Network: GSM 850/900/1800/1900; HSPA (tri-band); HSPA/UMTS 850/1900/2100
Dimension: Width 133.6 millimetres (5.26 in); Height 66 millimetres (2.6 in); Thickness 8.9 millimetres (0.35 in)

For phone connectivity, we use the micro-USB 2.0 port. Subsequently, we need to make a connection between the phone and the host machine. Firefox OS is based on the Linux kernel, and its design is more or less similar to Android. For that reason, we can easily access the phone using the Android Debug Bridge (ADB). ADB is a toolkit integrated into the Android SDK package and consists of both client-side and server-side code, which communicate with one another. We will run the UNIX dd command from ADB to acquire the phone image. After that, we will use AccessData Forensic Toolkit and the HxD Hex Editor to further examine the acquired image during the examination and analysis procedure.
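As a rough illustration of this connectivity step (a sketch only; adb devices and adb shell are standard Android SDK platform-tools commands, and the folder path reuses the %AndroidSDK% placeholder that appears later in this paper), the connection can be checked from the Windows command prompt along these lines:

rem change to the platform-tools folder of the Android SDK
cd %AndroidSDK%\sdk\platform-tools
rem list attached devices; the Geeksphone Peak should appear here
adb devices
rem open an interactive shell on the phone to confirm access; type exit to leave
adb shell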


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) Next we need to prepare a list of relevant and nonrelevant data. This list is very important to identify relevant data to be captured later. For example, if we conduct log analysis, relevant data will be chat log, call log, system log and other related information. The next step is to verify the integrity of data and device. Only if the integrity of data and device is confirmed, then we can proceed to Acquisition procedure. Else, the package need to be returned to requestor. 3.2 Acquisition Procedure START

3.2 Acquisition Procedure

Figure 7. Acquisition Procedure

The second procedure in our proposed methodology is the Acquisition procedure, shown in Figure 7. This procedure is mainly for acquiring and imaging the targeted data. Here, we use several forensic tools, physical or logical, depending on the targeted data to be obtained. The process starts with acquiring and imaging the targeted forensic data that falls under the relevant list. From our observation, we found that the evidence can be captured from the internal SD storage, the micro-SD card and other user partitions. Acquiring data from some parts of the internal SD storage and the micro-SD card is relatively easy; the phone only needs to be connected to the host machine, and the internal SD storage as well as the micro-SD card can be mounted as removable drives. However, acquiring data from the other user partitions in internal storage is quite a challenging task. An additional driver for the Geeksphone Peak needs to be installed on the host machine. We use Windows 8 as the operating system on the host machine. Once the phone is connected using the micro-USB 2.0 port, Windows 8 will ask for the driver. The supported USB driver can be downloaded from the Geeksphone website. Once the installation is finished, the Geeksphone Peak will appear in the Device Manager as shown in Figure 8, and the internal SD storage as well as the micro-SD card will be mounted on the host machine as a Linux File-CD Gadget USB Device.

Figure 8. Mounted phone storage
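As a small illustration of this mounting step, the newly mounted volumes can be listed from the Windows command prompt; the wmic query below is an assumption about the Windows 8 workstation and is not part of the original procedure:

rem list logical drives so the two phone volumes (internal SD and micro-SD)
rem can be identified by drive letter and size
wmic logicaldisk get deviceid,volumename,size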

In order to access the storage from the host machine, we need to enable USB storage in the phone settings. To enable the storage, we go to phone Settings > Storage and enable USB storage, as shown in Figure 9.



Figure 9. Enable USB storage

After that, we need to go to Media storage > Internal Storage and enable Share using USB, as shown in Figure 10. This option makes the internal SD storage appear in the host machine.

Figure 10. Share internal storage using USB

To access the micro-SD card, we repeat the same process by going to Media storage > SD Card Storage and enabling it, as shown in Figure 11.

Figure 11. Share micro-SD card storage using USB

Now we can see both the internal SD storage and the micro-SD card appear in the host machine, as shown in Figure 12.

Figure 12. Phone storage mounted as removable storage

In this way, we can access both the internal SD storage and the micro-SD card. However, only around 1 GB of the internal storage can be accessed; the remaining 3 GB of internal storage still cannot be accessed. Figure 13 shows all the files in the internal SD storage.
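The files that are visible on these mounted volumes can be copied to the examination machine with their timestamps preserved. The sketch below shows one possible way to do this on the Windows 8 host; the drive letter E: and the case folder C:\case01 are hypothetical:

rem copy the visible files from the mounted internal SD storage, preserving
rem data, attributes and timestamps, and keep a log for the case file
robocopy E:\ C:\case01\internal_sd /E /COPY:DAT /DCOPY:T /LOG:C:\case01\copy_log.txt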


Figure 13. Phone storage

At the time we conducted this experiment, not a single file was stored on the micro-SD card. As for the internal SD storage, we can simply copy all of these files to the host machine and analyse them there. For the remaining part of the internal storage, we need to run the UNIX dd command to acquire a whole image of the internal storage. In order to acquire the phone image, we use the UNIX dd command in the ADB environment. Before we start ADB, we must disable USB storage support and unmount the micro-SD card from the host machine. After that, we run the command prompt (CMD) and point it to the %AndroidSDK%\sdk\platform-tools folder. To start ADB, we type the following:

adb shell

Figure 14. Root Access

This command establishes the connection between the phone and the host machine, and the root@android:/ # prompt appears in the CMD, as shown in Figure 14. After that, we type the following to run the dd command:

dd if=/dev/block/mmcblk0 of=/mnt/emmc/evidence.img

The output image is written to the micro-SD card, and since we did not specify a block size, the default of 512 bytes is used. The process takes up to 10 minutes depending on the block size chosen, and the acquired image is around 3.64 GB (the internal storage size). This acquired image covers all partitions in the internal storage, including the unallocated partition. In order to transfer the acquired image to the host machine, we need to mount the micro-SD card again, following the earlier step shown in Figure 9 (phone Settings > Storage) and re-enabling the phone storage. The host machine will again detect the removable drive, and the acquired image will appear as shown in Figure 15.

Figure 15. Acquired image from the Firefox OS running phone
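Putting these steps together, an illustrative end-to-end sequence might look as follows. The explicit block size, the exit step and the hash verification with certutil (which ships with Windows) are assumptions added for clarity; the drive letter F: and the case folder C:\case01 are hypothetical:

rem on the host: open a shell on the phone
adb shell
# on the phone: image the internal flash to the micro-SD card; a larger block
# size than the 512-byte default usually shortens the run
dd if=/dev/block/mmcblk0 of=/mnt/emmc/evidence.img bs=4096
exit

rem back on the host, after re-enabling USB storage, hash the image for the case file
certutil -hashfile F:\evidence.img SHA256 > C:\case01\evidence_sha256.txt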


After the image is obtained, the relevant data list is marked and updated. This step only involves the specific files in the relevant data list; for example, if we want to run a log analysis, only log files will be acquired. The second step is to verify the remaining data: if the remaining data in the smartphone is still relevant, the first process is repeated and the details are written down in the relevant data list. After all relevant data has been obtained, we gather all initial findings and present them to the requester. The Analysis procedure starts after all targeted data has been acquired.
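For the log-analysis example mentioned above, the system log can be captured in addition to the log files stored under the SD storage. The command below is a sketch only; it assumes adb logcat works on the device because the Gonk layer reuses the Android logging infrastructure, and the case folder is hypothetical:

rem dump the current system log to the case folder; -d makes logcat exit after dumping
adb logcat -d > C:\case01\system_log.txt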

3.3 Examination and Analysis Procedure

Figure 16. Examination and Analysis procedure

The last procedure in our methodology is the Examination and Analysis procedure, shown in Figure 16. This procedure focuses on the deeper attributes of the acquired data. All the information is analysed using additional tools such as an SQLite viewer to open database files, the HxD Hex Editor to open files of unspecified format, or AccessData Forensic Toolkit to examine the image file. In this procedure, the acquired data is examined and analysed to find the application involved in creating, editing, modifying, sending and receiving the targeted file. Next, we investigate the origin of the data and the directory it came from. The third sub-step is to find when the file was created, accessed, modified, received, sent, viewed, deleted and launched. The fourth sub-step is to analyse how it was created, transmitted, modified and used. Registry entries and system logs need to be checked, and any other relevant information identified and rectified again. Last but not least, we repeat all the steps to ensure that no data is missed during the analysis. Once completed, we can start documenting the results. For testing purposes, we try to find the tree directory of the files in the internal SD storage. Table 2 shows the tree directory in the internal SD storage.

Table 2. Tree directory in internal SD storage

Evidence: Tree Directory
Thumbnail in photo: %Internal%\.gallery\previews\DCIM\100MZLLA
Alarms setting: %Internal%\Alarms
Installed apps: %Internal%\Android\data
Photo gallery: %Internal%\DCIM\100MZLLA
Downloads: %Internal%\Download
Third party apps: %Internal%\external_sd\Android\data
Event logs: %Internal%\logs
Recovery folder: %Internal%\LOST.DIR
Video files: %Internal%\Movies
Audio files: %Internal%\Music
Notifications log: %Internal%\Notifications
Received picture: %Internal%\Picture
Podcasts info: %Internal%\Podcasts
Ringtones: %Internal%\Ringtones
Screenshots: %Internal%\screenshots
Temp folder: %Internal%\tmp
Downloaded updates: %Internal%\updates
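Where the relevant-data list points at application databases (for example under the installed-apps path in Table 2), they can also be examined with the sqlite3 command-line shell instead of a GUI SQLite viewer. This is a sketch only: it assumes sqlite3 is installed on the workstation, and the database file, table name and case folder are hypothetical:

rem list the tables in an application database copied from the internal SD storage
sqlite3 C:\case01\internal_sd\Android\data\example_app.sqlite ".tables"
rem export one table of interest for the report
sqlite3 C:\case01\internal_sd\Android\data\example_app.sqlite "SELECT * FROM messages;" > C:\case01\messages.txt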

This file analysis only covers the internal SD storage. For the remaining part of the internal storage, we need to examine the phone image further. For that reason, we use AccessData Forensic Toolkit to open the captured image. Once we open it, we can see that the image consists of 21 partitions, as shown in Figure 17.

Figure 17. The list of the partitions in the phone image

The internal SD storage belongs to partition no. 18, named Internal SD-FAT32. Other than partition no. 3, which is NONAME-FAT16, and partition no. 18, the remaining partitions were detected as unknown by AccessData Forensic Toolkit. From our observation, we believe that these partitions cover the recovery image, boot partition, system files, local files, cache, user data and other information useful to forensic investigators.
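As a cross-check on the partition layout reported by AccessData Forensic Toolkit, the same image can be inspected with open-source tools. The sketch below assumes The Sleuth Kit is installed on the workstation and is not part of the original procedure; the sector offset shown is a placeholder for the value mmls actually reports:

rem print the partition table of the acquired image
mmls C:\case01\evidence.img
rem list the files in one partition, using the starting sector offset reported by mmls
fls -o 123456 C:\case01\evidence.img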

4 CONCLUSION AND FUTURE WORK

This paper explains an overview and methodology of mobile forensic procedures in Firefox OS. To our knowledge, there are few restrictions in Firefox OS, and it is designed to have the freedom of HTML5. Therefore, we do not include any step for rooting the device; hence the integrity of the data can be preserved. This new approach might be applicable to other mobile platforms, but it is purely designed for Firefox OS. A checklist is used to classify the targeted data during acquisition. Later on, we will work on file, log, system, memory and full data analysis; the checklist is therefore very important during each procedure. In this paper we have only demonstrated selected steps to show that our approach works for Firefox OS. Every step in each procedure has been explained to show how further experiments can be conducted.

Acknowledgments. Special thanks to the academic staff of Universiti Putra Malaysia for providing continuous guidance and support, and also to the Ministry of Education Malaysia and Universiti Sains Malaysia for granting the scholarship to me.

5 REFERENCES

1.

Mobile Operating System Market Share, http://en.wikipedia.org/wiki/Mobile_operating_system# Market_share 2. Gartner says worldwide smartphone sales reached its lowest growth rate with 3.7 per cent increase in fourth quarter of 2008, http://www.gartner.com/it/page.jsp?id=910112 3. Gartner Says Annual Smartphone Sales Surpassed Sales of Feature Phones for the First Time in 2013, http://www.gartner.com/newsroom/id/2665715 4. First Look at Mozilla’s Web Platform for Phones: Boot to Gecko, TechHive, http://www.techhive.com/article/250879/first_look_at_ mozilla_s_web_platform_for_phones_boot_to_gecko.ht ml 5. First Firefox OS Smartphone Has Arrived: Telefonica Prices ZTE Open At $90 In Spain, Latin American Markets Coming Soon, TechCrunch, http://techcrunch.com/2013/07/01/first-firefox-osphone/ 6. Mozilla Corporation, Mozilla Announces Global Expansion for Firefox OS, http://blog.mozilla.org/press/2013/02/firefox-osexpansion/ 7. Mozilla Developer Network, Firefox OS, https://developer.mozilla.org/enUS/docs/Mozilla/Firefox_OS 8. Mozilla’s Boot 2 Gecko and why it could change the world, http://www.knowyourmobile.com/products/16409/mozil las-boot-2-gecko-and-why-it-could-change-world 9. Mozilla Developer Network, Firefox OS security overview, https://developer.mozilla.org/enUS/docs/Mozilla/Firefox_OS/Security/Security_model 10. Goode, A. J.: Forensic extraction of electronic evidence from GSM mobile phones, In: IEE Seminar on Secure


GSM and Beyond: End to End Security for Mobile Communications, pp. 9/1--9/6 (2003)
11. Willassen, S. Y.: Forensics and the GSM mobile telephone system, In: Int. J. Digit. Evid., vol. 2, no. 1, pp. 1--17 (2003)
12. Casadei, F., Savoldi, A., Gubian, P.: SIMbrush: an open source tool for GSM and UMTS forensics analysis, In: First International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE'05), pp. 105--119 (2005)
13. Marturana, P., Me, G., Berte, R., Tacconi, S.: A Quantitative Approach to Triaging in Mobile Forensics, In: 10th International Conference on Trust, Security and Privacy in Computing and Communications, pp. 582--588 (2011)
14. Chen, S., Hao, X., Luo, M.: Research of Mobile Forensic Software System Based on Windows Mobile, In: 2009 International Conference on Wireless Networks and Information Systems, pp. 366--369 (2009)
15. Irwin, D., Hunt, R.: Forensic information acquisition in mobile networks, In: 2009 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pp. 163--168 (2009)
16. Klaver, C.: Windows Mobile advanced forensics, In: Digit. Investig., vol. 6, no. 3--4, pp. 147--167, May (2010)
17. Casey, E., Bann, M., Doyle, J.: Introduction to Windows Mobile Forensics, In: Digit. Investig., vol. 6, no. 3--4, pp. 136--146, May (2010)
18. Kaart, M., Klaver, C., van Baar, R. B.: Forensic access to Windows Mobile pim.vol and other Embedded Database (EDB) volumes, In: Digit. Investig., vol. 9, no. 3--4, pp. 170--192, Feb. (2013)
19. Rehault, F.: Windows mobile advanced forensics: An alternative to existing tools, In: Digit. Investig., vol. 7, no. 1--2, pp. 38--47, Oct. (2010)
20. Grispos, G., Storer, T., Glisson, W. B.: A comparison of forensic evidence recovery techniques for a windows mobile smart phone, In: Digit. Investig., vol. 8, no. 1, pp. 23--36, Jul. (2011)
21. Kumar, S. S., Thomas, B., Thomas, K. L.: An Agent Based Tool for Windows Mobile Forensics, In: Lect. Notes Inst. Comput. Sci. Soc. Informatics Telecommun. Eng., vol. 88, pp. 77--88 (2012)
22. Canlar, E. S., Conti, M., Crispo, B., Di Pietro, R.: Windows Mobile LiveSD Forensics, In: J. Netw. Comput. Appl., vol. 36, no. 2, pp. 677--684, Mar. (2013)
23. Mokhonoana, P. M., Olivier, M. S.: Acquisition of a Symbian Smart phone's Content with an On-Phone Forensic Tool, In: Southern African Telecommunication Networks and Applications Conference, pp. 1--7 (2007)
24. Breeuwsma, M., Jongh, M. De, Klaver, C., Knijff, R. Van Der, Roeloffs, M.: Forensic Data Recovery from Flash Memory, In: Small Scale Digit. Device Forensics J., vol. 1, no. 1, pp. 1--7 (2007)

25. Rossi, M., Me, G.: Internal forensic acquisition for mobile equipments, In: 2008 IEEE International Symposium on Parallel and Distributed Processing, pp. 1–7, (2008) 26. Dellutri, F., Ottaviani, V., Bocci, D., Italiano, G. F., Me, G.: Data reverse engineering on a smartphone, In: 2009 International Conference on Ultra Modern Telecommunications & Workshops, pp. 1–8, (2009) 27. Yu, X., Jiang, L., Shu, H., Yin, Q., Liu, T.: A Process Model for Forensic Analysis of Symbian, In: Commun. Comput. Inf. Sci. - Adv. Softw. Eng., vol. 59, pp. 86-93, (2009) 28. Savoldi, P. G. Antonio: Issues in Symbian S60 platform forensics, In: J. Commun. Comput., vol. 6, no. 3, (2009) 29. Pooters, I.: Full user data acquisition from Symbian smart phones, In: Digit. Investig., vol. 6, no. 3--4, pp. 125--135, (2010) 30. Thing, V. L. L., Tan, D. J. J.: Symbian Smartphone Forensics and Security: Recovery of Privacy-Protected Deleted Data, In: Lect. Notes Comput. Sci. - Inf. Commun. Secur., vol. 7618, pp. 240--251, (2012) 31. Thing, V. L. L., Chua, T.: Symbian Smartphone Forensics: Linear Bitwise Data Acquisition and Fragmentation Analysis, In: Commun. Comput. Inf. Sci. - Comput. Appl. Secur. Control Syst. Eng., vol. 339, pp. 62--69, (2012) 32. Savoldi, A., Gubian, P., Echizen, I.: A Comparison between Windows Mobile and Symbian S60 Embedded Forensics, In: Fifth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 546--550 (2009) 33. Mohtasebi, S., Dehghantanha, A., Broujerdi, H. G.: Smartphone Forensics : A Case Study with Nokia E5-00 Mobile Phone, In: Int. J. Digit. Inf. Wirel. Commun., vol. 1, no. 3, pp. 651--655 (2012) 34. Bader, M., Baggili, I.: iPhone 3GS Forensics : Logical Analysis Using Apple iTunes Backup Utility, In: Small Scale Digit. Device Forensics J., vol. 4, no. 1, pp. 1--15, (2010) 35. Husain, M. I., Baggili, I., Sridhar, R.: A Simple CostEffective Framework for iPhone, In: Lect. Notes Inst. Comput. Sci. Soc. Informatics Telecommun. Eng. Digit. Forensics Cyber Crime, vol. 53, pp. 27--37 (2011) 36. Husain, M. I. Sridhar, R.: iForensics : Forensic Analysis of Instant Messaging on, In: Lect. Notes Inst. Comput. Sci. Soc. Informatics Telecommun. Eng. - Digit. Forensics Cyber Crime, vol. 31, pp. 9--18, (2010) 37. Jung, J., Jeong, C., Byun, K., Lee, S.: Sensitive Privacy Data Acquisition in the iPhone for Digital Forensic Analysis, In: Commun. Comput. Inf. Sci. - Secur. Trust Comput. Data Manag. Appl., vol. 186, pp. 172--186, (2011) 38. Tso, Y.-C., Wang, S.-J., Huang, C.-T., Wang, W.-J.: iPhone social networking for evidence investigations using iTunes forensics, In: Proceedings of the 6th International Conference on Ubiquitous Information Management & Communication - ICUIMC ’12, (2012)

198

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 183-199 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012) 39. Said, H., Yousif, A., Humaid, H.: IPhone forensics techniques and crime investigation, In: The 2011 International Conference and Workshop on Current Trends in Information Technology (CTIT 11), pp. 120-125 (2011) 40. Mutawa, N. Al , Baggili, I., Marrington, A.: Forensic analysis of social networking applications on mobile devices, In: Digit. Investig., vol. 9, pp. S24--S33, Aug. (2012) 41. Lee, J., Park, D.: A Study on Evidence Data Collection through iPhone Forensic, In: Commun. Comput. Inf. Sci. - Converg. Hybrid Inf. Technol., vol. 310, pp. 268-276, (2012) 42. Levinson, A., Stackpole, B., Johnson, D.: Third Party Application Forensics on Apple Mobile Devices, 44th Hawaii Int. Conf. Syst. Sci., pp. 1--9, Jan. (2011) 43. Gómez-Miralles L., Arnedo-Moreno, J.: Versatile iPad forensic acquisition using the Apple Camera Connection Kit, In: Comput. Math. with Appl., vol. 63, no. 2, pp. 544--553, Jan. (2012) 44. Iqbal, B., Iqbal, A., Al Obaidli, H.: A novel method of iDevice (iPhone, iPad, iPod) forensics without jailbreaking, In: 2012 International Conference on Innovations in Information Technology (IIT), pp. 238-243 (2012) 45. Ariffin, A., Orazio, C. D., Choo, K. R., Slay, J.: iOS Forensics : How can we recover deleted image files with timestamp in a forensically sound manner ?, In: Eighth International Conference on Availability, Reliability and Security (ARES), pp. 375–382, (2013) 46. Lessard, J., Kessler, G. C.: Android Forensics : Simplifying Cell Phone Examinations, In: Small Scale Digit. Device Forensics J., vol. 4, no. 1, pp. 1--12, (2010) 47. Distefano, A., Me, G., Pace, F.: Android anti-forensics through a local paradigm, In: Digit. Investig., vol. 7, pp. S83--S94, Aug. (2010) 48. Albano, P., Castiglione, A., Cattaneo, G., Santis, A. De: A Novel Anti-forensics Technique for the Android OS, In: 2011 International Conference on Broadband and Wireless Computing, Communication and Applications, pp. 380–385, (2011) 49. Azadegan, S., Yu, W., Liu, H., Sistani, M., Acharya, S.: Novel Anti-forensics Approaches for Smart Phones, In: 45th Hawaii International Conference on System Sciences, pp. 5424–5431, (2012) 50. Quick, D., Alzaabi, M.: Forensic analysis of the android file system YAFFS2, In: Proceedings of the 9th Australian Digital Forensics Conference, pp. 100–109, (2011) 51. Sylve, J., Case, A., Marziale, L., Richard, G. G.: Acquisition and analysis of volatile memory from android devices, In: Digit. Investig., vol. 8, no. 3--4, pp. 175--184, Feb. (2012) 52. Vidas, T., Zhang, C., Christin, N.: Toward a general collection methodology for Android devices, In: Digit. Investig., vol. 8, pp. S14--S24, Aug. (2011)

53. Park, J., Chung, H., Lee, S.: Forensic analysis techniques for fragmented flash memory pages in smartphones, In: Digit. Investig., vol. 9, no. 2, pp. 109-118, Nov. (2012) 54. Son, N., Lee, Y., Kim, D., James, J. I., Lee, S., Lee, K.: A study of user data integrity during acquisition of Android devices, In: Digit. Investig., vol. 10, pp. S3-S11, Aug. (2013) 55. Racioppo, C., Murthy, N.: Android Forensics : A Case Study of the ‘ HTC Incredible ’ Phone, In: Proceedings of Student-Faculty Research Day, pp. 1--8, (2012) 56. Andriotis, P., Oikonomou, G., Tryfonas, T.: Forensic analysis of wireless networking evidence of Android smartphones, In: 2012 IEEE Int. Work. Inf. Forensics Secur., pp. 109--114, Dec. (2012) 57. Mylonas, A., Meletiadis, V., Mitrou, L., Gritzalis, D.: Smartphone sensor data as digital evidence, In: Comput. Secur., vol. 38, no. 2012, pp. 51--75, Oct. (2013) 58. Thing, V. L. L., Ng, K.-Y., Chang, E.-C.: Live memory forensics of mobile phones, In: Digit. Investig., vol. 7, pp. S74--S82, Aug. (2010) 59. Lai, Y., Yang, C., Lin, C., Ahn, T.: Design and Implementation of Mobile Forensic Tool for Android Smart Phone through Cloud Computing, In: Commun. Comput. Inf. Sci. - Converg. Hybrid Inf. Technol., vol. 206, pp. 196--203 (2011) 60. Guido, M., Ondricek, J., Grover, J., Wilburn, D., Nguyen, T., Hunt, A.: Automated identification of installed malicious Android applications, In: Digit. Investig., vol. 10, pp. S96--S104, Aug. (2013) 61. Fairbanks, K., Atreya, K., Owen, H.: BlackBerry IPD parsing for open source forensics, In: IEEE Southeastcon 2009, vol. 01, pp. 195--199, (2009) 62. Sasidharan, S. K., Thomas, K. L.: BlackBerry Forensics : An Agent Based Approach for Database Acquisition, In: Commun. Comput. Inf. Sci. - Adv. Comput. Commun., vol. 190, pp. 552--561, (2011) 63. Marzougy, M. Al, Baggili, I., Marrington, A.: LNICST 114 - BlackBerry PlayBook Backup Forensic Analysis, In: Lect. Notes Inst. Comput. Sci. Soc. Informatics Telecommun. Eng. - Digit. Forensics Cyber Crime, vol. 114, pp. 239--252, (2013) 64. Smith, D. C., Petreski, S.: A New Approach to Digital Forensic Methodology, In: DEFCON, 2007 https://www.defcon.org/images/defcon-18/dc-18presentations/DSmith/DEFCON-18-Smith-SPMDigital-Forensic-Methodlogy.pdf 65. Mozilla Developer Network, Firefox OS architecture, https://developer.mozilla.org/enUS/docs/Mozilla/Firefox_OS/Platform/Architecture

199

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 200-208 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Alarming! Security Aspects of the Wireless Vehicle: Review

Sabarathinam Chockalingam
1, Kailash Nagar, First Street, Near Police Colony, Karaikudi – 630002, India.
[email protected]

Harjinder Singh Lallie
University of Warwick, WMG, Coventry, United Kingdom, CV4 7AL.
[email protected]

ABSTRACT
The automobile industry has grown to become an integral part of our day-to-day lives. The introduction of wireless vehicles must be preceded by an analysis of potential security threats and vulnerabilities, and a robust security architecture should be designed that can cope with these threats and vulnerabilities. In this work, we identify the main categories of research on the cyber security of the wireless vehicle and focus on the in-vehicle network in order to identify potential security threats and vulnerabilities as well as suitable security solutions. In addition to providing a survey of related academic efforts, we also outline several key issues and open research questions.

KEYWORDS
In-Vehicle Network, Security, Threats, Vulnerabilities, Wireless Vehicle.

1 INTRODUCTION
Vehicle manufacturers have started incorporating many technological advancements that replace mechanical solutions for vehicle control with software and electronic solutions ([1] and [2]). Nowadays people demand wireless internet access even in the automobile. They prefer to access the internet even while driving on the highway, and they expect high data bit rates, similar to those of wired communication, to surf the internet, download files and hold real-time video conference calls over wireless communication [1]. Most importantly, manufacturers have implemented Vehicular Ad-Hoc Network (VANET) technology, which creates a mobile network by using moving vehicles as nodes [1]. These technological advancements in wireless vehicles also bring many possibilities for cyber attacks, so we focus on the cyber security of the wireless vehicle in this work.

2 LITERATURE REVIEW
In recent times, wires are being replaced by wireless technology in the automobile. Removing the wires within a vehicle and implementing wireless communication brings several benefits:
1. It helps to avoid collisions in the vehicle network by issuing automatic warnings, which ensures safety ([1] and [2]).
2. It enables users to obtain directions and weather reports. Users can also check e-mail and social media and download files, increasing the comfort of passengers while travelling ([1] and [2]).
3. The installation cost of wireless technology in the automobile is lower than that of wired technology.
4. Rapid deployment and mobility [3].
Replacing wires with wireless communication in the automobile also brings many security challenges. The cyber security of the wireless vehicle has become a major concern in recent times and is discussed in this section by reviewing the research conducted in the area.


2.1 Firmware updates Over The Air (FOTA) and Wireless Diagnostics
Over the past decade a considerable amount of research has been conducted on FOTA and wireless diagnostics. FOTA helps to save consumers' time and to reduce manufacturers' labour costs at service stations, making it simpler for manufacturers to fix bugs quickly. In 2005, Mahmud et al. proposed an architecture for uploading software to a vehicle under a few assumptions: all vehicles would be equipped with wireless interface units, the company would need to upload software to the vehicles it manufactured, and a set of keys would be installed in the vehicle at the time of manufacturing [4]. Those keys would ensure authentic communication between the manufacturer and/or software supplier and the vehicle. They also recommended that software suppliers send at least two copies of the software with a message digest to the vehicle in order to improve security [4]. Their work is limited, however, in that it only supports uploading software to one vehicle at a time, which means it could be used only for wireless diagnostics where the manufacturer needs to fix a particular vehicle with problems; it also did not cover key management.

In 2008, Nilsson et al. proposed a protocol for FOTA that ensured data integrity, authentication, confidentiality, and data freshness [5]. They analysed its security aspects by conducting various experiments, but their work did not address a few major issues such as privacy and key management. Also in 2008, Nilsson et al. assessed the risks involved with the wireless infrastructure and derived a set of guidelines for creating a secure infrastructure for wireless diagnostics and software updates [6]. They identified portal security risks such as impersonation and intrusion, communication link security risks such as traffic manipulation, and vehicle security risks such as impersonation and intrusion, as well as the consequences of these risks, such as execution of arbitrary code, disclosure of information, and denial of service [6]. However, they did not analyse the risks involved with the Engine Control Unit (ECU). They suggested exploring the use of Intrusion Detection Systems (IDS) and firewalls in wireless vehicles to improve security [6].

In 2011, Idrees et al. proposed a protocol that guarantees secure FOTA in wireless vehicles [7]. They mainly focused on hardware security mechanisms, which helped to raise the standard of security compared to other systems. The key issues that still need to be addressed in FOTA are key management and securely uploading software to multiple vehicles at the same time.

2.2 Digital Forensic Investigation
Digital forensic investigation is important in order to identify the criminal in case of a successful cyber attack, but to date only a little research has been conducted on digital forensic investigation in wireless vehicles. In 2004, Carrier et al. proposed an event-based digital forensic investigation framework [8], which Nilsson et al. used as the base for their work. Nilsson et al. derived a list of requirements for detection, data collection, and event reconstruction based on the attacker model and digital forensic investigation principles [9]. They also recommended using an event data recorder, which would play a major role in digital forensic investigation, a method to detect events in the vehicle, and an alert about security violations that would help investigators to initiate the investigation [9]. Storing the current vehicle state information in a secured location proves to be important during a digital forensic investigation [9]. The major limitation of this work is that it did not explore detection techniques which would support digital forensic investigation.
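The event-data-recorder idea above depends on storing vehicle state and security events in a way that lets later tampering be detected. The Python sketch below is purely illustrative; it is not taken from Nilsson et al.'s work, and all field names are invented. It hash-chains event records so that altering any stored entry invalidates the chain.

```python
import hashlib
import json
import time

class EventLog:
    """Append-only, hash-chained log of in-vehicle events (illustrative only)."""

    def __init__(self):
        self.entries = []
        self.prev_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> dict:
        record = {
            "timestamp": time.time(),
            "event": event,
            "prev_hash": self.prev_hash,
        }
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        record["hash"] = digest
        self.entries.append(record)
        self.prev_hash = digest
        return record

    def verify(self) -> bool:
        """Recompute the chain; returns False if any stored entry was altered."""
        prev = "0" * 64
        for record in self.entries:
            body = {k: record[k] for k in ("timestamp", "event", "prev_hash")}
            if record["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != record["hash"]:
                return False
            prev = record["hash"]
        return True

log = EventLog()
log.append({"ecu": "brake", "code": "unexpected_diagnostic_session"})
log.append({"ecu": "telematics", "code": "firmware_update_started"})
assert log.verify()
```

In a real vehicle the chain head would additionally have to be anchored in tamper-resistant storage, otherwise the whole chain could simply be rewritten.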

201

2.3 In-Vehicle Network
The in-vehicle network plays an important role in wireless vehicles, and a considerable amount of research has been conducted in this area over the years. In 2003, Mahmud et al. analysed Bluetooth and its security issues in wireless vehicles [10]. In 2008, Larson et al. proposed specification-based attack detection techniques within the in-vehicle network [11]. In 2008, Verendel et al. proposed a system that makes use of a honeypot in order to gather attackers' information [12]. In 2008, Nilsson et al. categorised ECUs based on their safety and security characteristics [13]. In 2009, Nilsson et al. analysed the FlexRay protocol by simulating attacks [14]. In 2010, Rouf et al. evaluated the security and privacy of wireless tire pressure monitoring systems [15]. In 2010, Koscher et al. summarised the potential risks involved with wireless vehicles after conducting various experiments [16]. In 2011, Kleberger et al. categorised the research areas concerning security aspects of the in-vehicle network in wireless vehicles [17]. In 2012, Schweppe et al. proposed an architecture that incorporates data flow tracking into the in-vehicle network, which would ensure security and privacy [18]. In 2012, Onishi analysed new risks in wireless vehicles caused by carry-in devices and suggested suitable countermeasures [19]. A detailed analysis of these works is carried out in Section 3.

2.4 Vehicle-Vehicle Communication
Much research has been conducted on vehicle-vehicle communication, which is one of the important aspects of the wireless vehicle. In 2004, Mahmud et al. proposed a technique to exchange messages between vehicles securely and also analysed the creation of secure communication links between vehicles, concluding that this would be possible with present technology [20]. The technique ensured authentication, authorisation, and data integrity, but did not address privacy, which is one of the important aspects of vehicle-vehicle communication [20].

In 2004, Hu et al. analysed wormhole attacks and proposed how to detect them using directional antennas [21]. In 2006, Raya et al. analysed the vulnerabilities that exist in vehicular communication, such as jamming, forgery, in-transit traffic tampering, impersonation, privacy violation, and on-board tampering [22]. After analysing the hardware modules, they recommended using an Event Data Recorder (EDR) and a Tamper Proof Device (TPD) to improve security. They concluded their work by listing open research problems in vehicular communication, such as secure positioning, data verification, and Denial of Service (DoS) resilience [22]. In 2006, Moustafa et al. proposed an Authentication, Authorization, and Accounting (AAA) mechanism to authenticate vehicles on highways, which would ensure secure data transfer between wireless vehicles. They used the Optimized Link State Routing (OLSR) protocol as the base for their reliable routing approach [23]. In 2007, Gerlach et al. proposed a security architecture for vehicular communication using functional layer, organizational/component, reference model, and information-centric views [24]. They suggested that this architecture could be used as a base for a prototype implementation, and that the security level could be analysed by conducting various practical experiments [24]. In 2008, Larson et al. analysed the security issues of vehicle-vehicle communication. They used the anti-intrusion taxonomy introduced by Halme et al. [25] as the base for discussing the layers of the defence-in-depth paradigm, and suggested that vehicle manufacturers adopt a defence-in-depth approach in the future to improve the security level of wireless vehicles [26]. In 2008, Anurag et al. introduced a collision avoidance system using the Global Positioning System (GPS), which would ensure safety in wireless vehicles [27].
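Several of the works above recommend digital signatures to authenticate inter-vehicle messages. The fragment below is a minimal illustration of that general idea using Ed25519 signatures from the third-party `cryptography` package; it is not the scheme proposed in any of the cited papers, the message format is invented, and key distribution (e.g., through a vehicular PKI) is deliberately left out.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Each vehicle holds a private key; peers hold the matching public key.
sender_key = Ed25519PrivateKey.generate()
sender_pub = sender_key.public_key()

message = b"collision_warning;lat=52.38;lon=-1.56;speed=0"
signature = sender_key.sign(message)

# The receiving vehicle verifies the signature before acting on the warning.
try:
    sender_pub.verify(signature, message)
    print("message authentic - trigger warning")
except InvalidSignature:
    print("forged or altered message - discard")
```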


In 2010, Tripathi analysed problems that exist in Vehicular Ad-Hoc Networks, focusing mainly on basic attacks such as cheating with sensor information, ID disclosure of other vehicles in order to track their location, Denial of Service (DoS) and masquerading, as well as more sophisticated attacks such as hidden vehicles, tunnel, wormhole, and bush telegraph [28]. This work lacks practical analysis. In 2010, Amirtahmasebi et al. discussed various attacks in vehicular communication, such as the sybil attack, bogus information, Denial of Service, impersonation, alteration, replay, and illusion attacks, as well as various securing techniques such as digital signatures, tamper proof devices, data correlation, and WAVE (Wireless Access in Vehicular Environments, IEEE 1609.2) [29].

After reviewing the research conducted on the cyber security of the wireless vehicle, we have identified four important categories of research in this area:
1. Firmware updates Over The Air (FOTA) and Wireless Diagnostics
2. Digital Forensic Investigation
3. In-Vehicle Network
4. Vehicle-Vehicle Communication

3 IN-VEHICLE NETWORK
The in-vehicle network is the combination of Engine Control Units (ECUs) and buses. The most common networks are the Controller Area Network (CAN), the Local Interconnect Network (LIN), Media Oriented Systems Transport (MOST), and FlexRay ([9], [11], and [13]). CAN plays a vital role in the communication of safety-critical applications such as the anti-lock braking system and engine management systems [11]. LIN plays a major role in the communication of non-safety-critical sensors and actuator systems [13]. MOST is the high-speed technology used to carry audio and video data [13]. In recent years CAN is being replaced by FlexRay. Data is transferred from one network to another using wireless gateways ([9], [11] and [13]). The in-vehicle network is illustrated in Figure 1 [12].


Figure 1. In-Vehicle Network [12]
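To make the structure in Figure 1 concrete, the toy Python model below (an illustrative assumption of ours, not taken from the cited papers) captures two properties that matter for security: every ECU on a CAN-style bus sees every frame (broadcast), and a gateway decides which frames cross between network segments.

```python
class Bus:
    """A CAN-like broadcast bus: every attached ECU receives every frame."""
    def __init__(self, name):
        self.name = name
        self.ecus = []

    def attach(self, ecu):
        self.ecus.append(ecu)

    def send(self, frame):
        for ecu in self.ecus:
            ecu.receive(self.name, frame)

class ECU:
    def __init__(self, name):
        self.name = name

    def receive(self, bus_name, frame):
        print(f"{self.name} saw frame {frame} on {bus_name}")

class Gateway(ECU):
    """Forwards selected frame identifiers from one segment to another."""
    def __init__(self, name, target_bus, allowed_ids):
        super().__init__(name)
        self.target_bus = target_bus
        self.allowed_ids = allowed_ids

    def receive(self, bus_name, frame):
        if frame["id"] in self.allowed_ids:
            self.target_bus.send(frame)

powertrain = Bus("powertrain-CAN")
infotainment = Bus("infotainment-segment")
powertrain.attach(ECU("engine"))
powertrain.attach(ECU("brakes"))
infotainment.attach(ECU("head-unit"))
infotainment.attach(Gateway("gateway", powertrain, allowed_ids={0x1A0}))

# A frame injected on the infotainment side reaches the powertrain bus
# only because the gateway's filter allows its identifier.
infotainment.send({"id": 0x1A0, "data": b"\x01"})
```

The sketch also makes the later discussion of gateway security tangible: whoever controls the gateway's filter controls what can be injected into the safety-critical segment.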

As discussed in Section 2, a considerable amount of research has been conducted on the in-vehicle network. In 2003, Mahmud et al. proposed a technique to secure wireless in-vehicle Bluetooth networks. Short-range communication between nearby vehicles helps to avoid collisions, and this can be achieved using Bluetooth technology, which covers 10–100 metres in wireless communication [10]. They also proposed a security architecture that makes use of a password-protected Network Device Monitor (NDM) [10]. The NDM activates devices that want to take part in the wireless communication. It makes use of two different PINs: one for secured communication, which should be changed after each session to counter brute-force attacks, and another for non-secured communication, which need not be changed after each session [10]. The NDM plays a major role in distributing PINs to devices, which is a suitable countermeasure against man-in-the-middle attacks [10]. This architecture can also be implemented at very low cost. They concluded their work by addressing an important question: what happens if an activated device gets stolen or lost? They suggested that in that case the owner would be able to deactivate the device manually using the NDM [10]. However, they failed to address what happens if the NDM itself malfunctions; as there is no alternative mechanism, the entire system would be compromised in that case.

In 2008, Larson et al. proposed a technique that helps to detect cyber attacks within the in-vehicle network, and they suggested where the attack detector could be placed. Their work mainly focused on CAN protocol version 2.0 and the CANopen draft standard 3.01, which were used to create protocol-level security specifications [11]. Abnormal messages can be detected with the help of the communication protocol security specifications, and illegal attempts to transmit or receive messages can be detected with the help of the ECU communication parameters [11]. They recommended placing a detector on each ECU, because a detector placed on the CAN bus cannot determine the source and destination of a message, as CAN does not support unique ECU identifiers [11]; placing the detector on each ECU is helpful because the object directory of the ECU knows which messages it may transmit and receive. They evaluated the attack detector using different attacker actions such as Flood, Read, Replay, Spoof, and Modify [11], and concluded that some attacks remain possible even when the attack detector is in place. To make the system effective and complete, they suggested implementing complementary approaches such as firewalls alongside the attack detector [11].

In 2008, Verendel et al. proposed a technique to gather attackers' information using a honeypot, a simulated in-vehicle network as shown in Figure 2 [12]. This allows the attackers' behaviour to be analysed so that cyber attacks can be prevented. They suggested that the honeypot should be placed in the vehicle and that the gathered information should be processed at a central location. Larson et al. had identified the major attacks on the in-vehicle network, and Verendel et al. used this as the base for detecting attacks early, which would help to ensure safety. However, they did not address the security of the gathered data that is analysed at the processing centre; this data could be tampered with.


Figure 2. Vehicle Honeypot [12]
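The specification-based detection idea discussed above, a detector co-located with each ECU that checks traffic against that ECU's allowed communication parameters, can be sketched as follows. This is a hedged illustration only: the identifiers, rate threshold and class names are invented for the example and do not come from Larson et al.

```python
import time
from collections import deque

class SpecDetector:
    """Per-ECU, specification-based checks against allowed CAN IDs and frame rates."""
    def __init__(self, ecu_name, tx_ids, rx_ids, max_frames_per_sec=100):
        self.ecu_name = ecu_name
        self.tx_ids = tx_ids              # IDs this ECU is specified to transmit
        self.rx_ids = rx_ids              # IDs this ECU is specified to receive
        self.max_rate = max_frames_per_sec
        self.recent = deque()             # timestamps of recently seen frames

    def _rate_exceeded(self, now):
        self.recent.append(now)
        while self.recent and now - self.recent[0] > 1.0:
            self.recent.popleft()
        return len(self.recent) > self.max_rate

    def check(self, direction, can_id, now=None):
        now = time.time() if now is None else now
        alerts = []
        allowed = self.tx_ids if direction == "tx" else self.rx_ids
        if can_id not in allowed:
            alerts.append(
                f"{self.ecu_name}: unspecified {direction} ID {hex(can_id)} (possible spoof)")
        if self._rate_exceeded(now):
            alerts.append(
                f"{self.ecu_name}: frame rate above specification (possible flood)")
        return alerts

brake_detector = SpecDetector("brake-ECU", tx_ids={0x220}, rx_ids={0x100, 0x101})
print(brake_detector.check("rx", 0x7DF))   # a diagnostic request not in the spec
```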

In 2008, Nilsson et al. classified ECUs into five categories based on their safety and security characteristics: Powertrain, Vehicle Safety, Comfort, Infotainment, and Telematics [13]. After analysing the attacker model, they concluded that the communication link is the main target for attackers, where cyber attacks such as eavesdropping, intercepting, modifying, and injecting messages are possible [13]. They discussed the process of assigning Safety Integrity Levels (SIL), ranging from the highest level 4 to the lowest level 1, based on controllability after failure. They assigned the highest SIL of 4 to the powertrain and vehicle safety ECUs. The powertrain category includes the brake system, which is highly important for safety: in case of failure, the driver would not be able to control the vehicle. The vehicle safety category includes tire pressure monitoring, the air bag, and the collision avoidance system, which are also highly safety critical and were likewise assigned the highest safety integrity level, since in case of failure the driver would not be able to control the vehicle [13]. They assigned level 2 to the comfort category, as it would not affect safety immediately, and level 1 to both infotainment and telematics: the infotainment category consists of audio and video systems, and mobile communication is provided by the ECUs of the telematics category, neither of which is highly safety critical [13]. This work helps to prioritise the categories that need more protection to ensure safety in wireless vehicles.

In 2009, Nilsson et al. focused on the FlexRay protocol. Their work answered the question of why CAN is gradually being replaced by FlexRay: FlexRay differs from CAN in several respects, supporting different topologies, higher data rates, and continuous communication [14]. They considered security properties such as data confidentiality, data integrity, data availability, data authentication, and data freshness to evaluate the security of the FlexRay protocol [14]. They used the Nilsson-Larson attacker model as their base and focused on the attacker actions read and spoof; the read attack is possible due to the lack of confidentiality, and the spoof attack is possible due to the lack of authentication in FlexRay. They concluded their work by simulating these attacker actions. The major limitation of this work is that it did not provide any prevention techniques for these attacks [14]. It could be further expanded by identifying more attacker actions and providing detection and prevention techniques for them, which would help to secure the FlexRay protocol [14].

In 2010, Rouf et al. evaluated the wireless Tire Pressure Monitoring System (TPMS). The air pressure inside the tires is measured continuously using TPMS, which helps to alert the driver in case of under-inflated tires [15]. They discussed the security risks involved, such as tracking automobiles and spoofing, and analysed TPMS experimentally, finding that the messages could be received up to 40 m away from the car with the help of a low-noise amplifier [15]. They also described the TPMS architecture, shown in Figure 3 [15], which consists of TPM sensors fitted in each tire, the TPM ECU/receiver, a TPM warning light on the dashboard, and one or four antennas connected to the receiver. They recommended following reliable software design practices for the software that runs in the TPMS ECU, which would help to prevent the display of false readings, and encrypting packets; the packet format should be improved in order to limit eavesdropping and spoofing attacks [15]. Eavesdropping is the major issue that needs to be addressed in future research to make this system effective. In future, the system could also be shielded to ensure that there is no chance of eavesdropping and spoofing attacks.
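One countermeasure suggested above is to encrypt and authenticate TPMS packets so that eavesdropped readings are useless and spoofed readings are rejected. The sketch below shows the general idea using AES-GCM from the third-party `cryptography` package; the packet layout, the counter-based replay defence and the key handling are assumptions made for illustration and are not part of the cited study.

```python
import os
import struct
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)   # assumed shared between sensor and TPM ECU
aead = AESGCM(key)

def sensor_transmit(sensor_id: int, pressure_kpa: float, counter: int) -> bytes:
    """Encrypt and authenticate one reading; the counter deters simple replay."""
    payload = struct.pack("<If", counter, pressure_kpa)
    nonce = os.urandom(12)
    ciphertext = aead.encrypt(nonce, payload, str(sensor_id).encode())
    return nonce + ciphertext

def receiver_accept(sensor_id: int, packet: bytes):
    """Decrypt a packet; decryption raises an exception if it was tampered with."""
    nonce, ciphertext = packet[:12], packet[12:]
    payload = aead.decrypt(nonce, ciphertext, str(sensor_id).encode())
    counter, pressure = struct.unpack("<If", payload)
    return counter, pressure

pkt = sensor_transmit(sensor_id=3, pressure_kpa=220.0, counter=42)
print(receiver_accept(3, pkt))
```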

Figure 3. TPMS Architecture [15]

In 2010, Koscher et al. used two wireless vehicles of the same make and model, manufactured in 2009, to evaluate the issues of the wireless vehicle by conducting various experiments [16]. The experiments were conducted in three different settings [16]:
1. They extracted the hardware components and analysed them in the lab [16].
2. They elevated the car on jack stands and conducted various experiments to ensure safety [16].
3. They drove the wireless vehicle on a decommissioned airport runway and conducted various experiments to ensure safety [16].


They summarised the results of the various experiments and discussed the key security challenges in CAN, such as its broadcast nature, fragility to denial of service, lack of authenticator fields, and weak access controls [16]. In future, this work could be used as the base for designing prevention techniques that would help to improve the security of wireless vehicles.

In 2011, Kleberger et al. reviewed the research related to the in-vehicle network and identified five categories of research: problems in the in-vehicle network, architectural security features, Intrusion Detection Systems (IDS), honeypots, and threats and attacks [17]. They identified problems in the in-vehicle network such as the lack of sufficient bus protection, weak authentication, misuse of protocols, poor protocol implementation, and information leakage. They suggested investigating IDS for FlexRay, because both specification-based and anomaly-based IDS have already been suggested for CAN [17]. This research did not provide any security solutions, but it can be used as the base for designing security solutions for the in-vehicle network.

In 2012, Schweppe and Roudier proposed a system with taint-tracking tools that monitors the data flowing between ECUs in the in-vehicle network; using taint-tracking tools in the automobile supports privacy and security [18]. This system could be combined with Rouf et al.'s wireless Tire Pressure Monitoring System (TPMS) study to prevent spoof attacks, thereby ensuring the security and safety of that system.

In 2012, Onishi focused on the potential risks involved with wireless vehicles and assessed their severity using the Common Vulnerability Scoring System (CVSS) [19]. He identified that carry-in devices (CID) create major risks in wireless vehicles, because viruses and malware could invade the system through them [19]. He suggested using a certification authority to verify the content of a CID and issue certificates for CIDs without any malicious content [19]. He also summarised the limitations of the ECU, such as low computational power, low memory, and online software update issues, and suggested that it is difficult to monitor CIDs continuously due to the low computational power. However, Nilsson et al. had earlier prioritized ECU categories based on their Safety Integrity Level (SIL), which could be combined with this research: protecting the powertrain and vehicle safety ECUs from viruses and malware keeps the vehicle in a controllable state even if malware invades the infotainment ECUs. Onishi suggested sending warning alerts to the driver if a virus or malware invades highly safety-critical ECUs, which would help to prevent major accidents [19].

By exploring the research conducted on the in-vehicle network, we have identified several key issues:
➔ Securing the gateway ECU, as it is the entry point for attackers; a successful attack would give attackers the opportunity to gain full control of the vehicle.
➔ Securing the communication links, which could help to prevent attacks such as eavesdropping, interception, and message modification.
➔ Ensuring confidentiality and privacy in the system.
➔ Securing attackers' information at the processing centre, as the information gathered using a honeypot and analysed at the processing centre lacks security.
➔ Continuously monitoring carry-in devices to ensure the safety and security of the wireless vehicle.

We have also identified several open research questions:
➔ How to configure a firewall in the wireless gateway?
➔ How to improve the security of CAN and FlexRay?
➔ How to implement an Intrusion Detection System in the in-vehicle network?
➔ How to continuously monitor carry-in devices in the wireless vehicle?
➔ How to shield the communication link from attacks?


4 CONCLUSION
After reviewing the research conducted on the cyber security of the wireless vehicle, we identified four categories: Firmware updates Over The Air (FOTA) and Wireless Diagnostics, Digital Forensic Investigation, In-Vehicle Network, and Vehicle-Vehicle Communication. We focused on the in-vehicle network and identified several key issues by reviewing the research conducted in that area. From this work it is evident that the in-vehicle network lacks security. We also identified several research problems that have not been adequately addressed. As it highlights these security problems, this work can be used as a starting point for addressing the identified open research questions and improving the security of the in-vehicle network.

5 REFERENCES
1. C. Ribeiro, "Bringing Wireless Access to the Automobile: A Comparison of Wi-Fi, WiMAX, MBWA, and 3G," 21st Computer Science Seminar, pp. 1–7, 2005.
2. A. Gandhi and B.T. Jadhav, "Role of Wireless Technology for Vehicular Network," International Journal of Computer Science & Information Technologies (IJCSIT), Vol. 3, No. 4, pp. 4823–4828, 2012.
3. P. Parikh, M.G. Kanabar, and T.S. Sidhu, "Opportunities and Challenges of Wireless Communication Technologies for Smart Grid Applications," IEEE Power and Energy Society General Meeting, pp. 1–7, July 2010.
4. S.M. Mahmud, S. Shanker, and I. Hossain, "Secure Software Upload in an Intelligent Vehicle via Wireless Communication Links," Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 588–593, 2005.
5. D.K. Nilsson and U.E. Larson, "Secure Firmware Updates over the Air in Intelligent Vehicles," IEEE International Conference on Communications Workshops, pp. 380–384, May 2008.
6. D.K. Nilsson, U.E. Larson, and E. Jonsson, "Creating a Secure Infrastructure for Wireless Diagnostics and Software Updates in Vehicles," Proceedings of the 27th International Conference on Computer Safety, Reliability, and Security, Vol. 5219/2008, pp. 207–220, 2008.
7. M.S. Idrees, H. Schweppe, Y. Roudier, M. Wolf, D. Scheuermann, and O. Henniger, "Secure Automotive On-Board Protocols: A Case of Over-the-Air Firmware Updates," Proceedings of the Third International Workshop on Communication Technologies for Vehicles, Vol. 6596, pp. 224–238, 2011.
8. B.D. Carrier and E.H. Spafford, "An Event-Based Digital Forensic Investigation Framework," Digital Forensic Research Workshop, pp. 1–12, 2004.
9. D.K. Nilsson and U.E. Larson, "Conducting Forensic Investigations of Cyber Attacks on Automobile In-Vehicle Networks," International Journal of Digital Crime and Forensics, Vol. 1, No. 2, pp. 28–41, 2009.
10. S.M. Mahmud and S. Shanker, "Security of Wireless Networks in Intelligent Vehicle Systems," Proceedings of the 3rd Annual Intelligent Vehicle Systems Symposium, NDIA, pp. 83–86, 2003.
11. U.E. Larson, D.K. Nilsson, and E. Jonsson, "An Approach to Specification-Based Attack Detection for In-Vehicle Networks," IEEE Intelligent Vehicles Symposium, pp. 220–225, June 2008.
12. V. Verendel, D.K. Nilsson, U.E. Larson, and E. Jonsson, "An Approach to Using Honeypots in In-Vehicle Networks," 68th IEEE Vehicular Technology Conference, pp. 1207–1211, 2008.
13. D.K. Nilsson, P.H. Phung, and U.E. Larson, "Vehicle ECU Classification Based on Safety-Security Characteristics," Proceedings of the 13th International Conference on Road Transport and Information Control (RTIC), pp. 1–7, 2008.
14. D.K. Nilsson, U.E. Larson, F. Picasso, and E. Jonsson, "A First Simulation of Attacks in the Automotive Network Communication Protocol FlexRay," Proceedings of the International Workshop on Computational Intelligence in Security for Information Systems (CISIS'08), Vol. 53, pp. 84–91, 2009.
15. I. Rouf, R. Miller, H. Mustafa, T. Taylor, S. Oh, W. Xu, M. Gruteser, W. Trappe, and I. Seskar, "Security and Privacy Vulnerabilities of In-Car Wireless Networks: A Tire Pressure Monitoring System Case Study," Proceedings of the 19th USENIX Security Symposium, pp. 323–338, 2010.
16. K. Koscher, A. Czeskis, F. Roesner, S. Patel, T. Kohno, S. Checkoway, D. McCoy, B. Kantor, D. Anderson, H. Shacham, and S. Savage, "Experimental Security Analysis of a Modern Automobile," IEEE Symposium on Security and Privacy, pp. 447–462, 2010.
17. P. Kleberger, T. Olovsson, and E. Jonsson, "Security Aspects of the In-Vehicle Network in the Connected Car," IEEE Intelligent Vehicles Symposium (IV), pp. 528–533, June 2011.
18. H. Schweppe and Y. Roudier, "Security and Privacy for In-Vehicle Networks," 1st IEEE International Workshop on Vehicular Communications, Sensing, and Computing (VCSC), pp. 12–17, June 2012.
19. H. Onishi, "Paradigm Change of Vehicle Cyber Security," Proceedings of the 4th International Conference on Cyber Conflict (CYCON), pp. 1–11, 2012.
20. S.M. Mahmud, S. Shanker, and S.R. Mosra, "Secure Inter-Vehicle Communications," Proceedings of the SAE World Congress, pp. 8–11, 2004.
21. L. Hu and D. Evans, "Using Directional Antennas to Prevent Wormhole Attacks," Proceedings of the Network and Distributed System Security Symposium (NDSS), pp. 1–11, 2004.
22. M. Raya, P. Papadimitratos, and J. Hubaux, "Inter-vehicular Communications – Securing Vehicular Communications," IEEE Wireless Communications, Vol. 13, No. 5, pp. 8–15, 2006.
23. H. Moustafa, G. Bourdon, and Y. Gourhant, "Providing Authentication and Access Control in Vehicular Network Environment," Proceedings of the IFIP TC-11 21st International Information Security Conference (SEC), Vol. 201, pp. 62–73, 2006.
24. M. Gerlach, A. Festag, T. Leinmuller, G. Goldacker, and C. Harsch, "Security Architecture for Vehicular Communication," Proceedings of the International Workshop on Intelligent Transportation (WIT), pp. 1–6, 2007.
25. L.R. Halme and K.R. Bauer, "AINT Misbehaving: A Taxonomy of Anti-Intrusion Techniques," Proceedings of the 18th National Information Systems Security Conference, pp. 163–172, 1995.
26. U.E. Larson and D.K. Nilsson, "Securing Vehicles against Cyber Attacks," Proceedings of the 4th Annual Workshop on Cyber Security and Information Intelligence Research (CSIIRW), pp. 1–3, 2008.
27. D. Anurag, S. Ghosh, and S. Bandyopadhyay, "GPS Based Vehicular Collision Warning System Using IEEE 802.15.4 MAC/PHY Standard," 8th International Conference on ITS Telecommunications (ITST), pp. 154–159, 2008.
28. K.P. Tripathi, "An Essential of Security in Vehicular Ad hoc Network," International Journal of Computer Applications, Vol. 10, No. 2, pp. 11–16, 2010.
29. K. Amirtahmasebi and R.S. Jalalinia, "Vehicular Networks – Security, Vulnerabilities and Countermeasures," Master of Science Thesis, Networks and Distributed Systems, Chalmers University of Technology, pp. 1–55, June 2010.

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 209-234 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

A Survey on Digital Forensics Trends

1Mohsen Damshenas, 2*Ali Dehghantanha, 3Ramlan Mahmoud
1, 2, 3 Faculty of Computer Science and Information Technology, University Putra Malaysia
*Corresponding Author
[email protected], {alid, ramlan}@upm.edu.my

1. ABSTRACT
Digital forensics has evolved from addressing minor computer crimes to the investigation of complex international cases with a massive effect on the world. This paper studies the evolution of digital forensics: its origins, its current position and its future directions. It sets the scene by exploring past literature on digital forensic approaches, followed by an assessment and analysis of the current state of the art in both industrial and academic digital forensics research. The obtained results are compared and analysed to provide a comprehensive view of the current digital forensics landscape. Furthermore, this paper highlights critical digital forensic issues that are being overlooked and not addressed as they deserve. The paper concludes by offering future research directions in this area.

Keywords
Digital Forensic, Mobile Device Forensic, Forensic Framework, Network Forensic, String Analysis

2. INTRODUCTION
The term computer crime, first used in 1976 in a book by Donn Parker titled "Crime by Computer" [1],[2], entered the legal system through the Florida Computer Crimes Act of 1978, which dealt with unauthorized deletion or modification of data in a computer system [3]. However, the first actual computer analysis and response team was established by the FBI in 1984 to conduct advanced digital forensic investigation of crime scenes [4]. One of the first complicated digital forensic investigations was performed in 1986, pursuing a hacker named Markus Hess [5]. Hess had gained unauthorized access to Lawrence Berkeley National Laboratory (LBL) and was detected and investigated by Mr. Clifford Stoll.

At the time of the incident there was no standard digital forensic investigation framework in place, so Clifford had to conduct the investigation on his own. As Clifford's objective was to discover the identity of the hacker, he did not change anything in the system and only collected the possible traces. By tracking the hacker for months using so-called alarms, which sent a notification whenever the attacker was active, he finally managed to discover the identity and location of the attacker in cooperation with the FBI and the telephone company. Since the case involved different military, academic and individual bodies in the U.S. and Germany, the jurisdiction of the case became a big issue [6].

The day-by-day improvement of digital devices makes digital crime far more complicated than it was back in 1986. Nowadays crimes are happening over the cloud, which mandates cross-national forensic investigation. It is therefore essential for security experts to realize their strengths in the investigation of complex digital crimes by studying the history and current trends in the field. Specialists need to understand that digital forensics is not about looking at the past because of an attack history, nor about looking at the present in fear of being attacked, nor about looking at the future with uncertainty about what might befall us, but about being ready at all times for a moving target. This survey offers a critical review and investigation of:
• the origins of digital forensics,
• its evolution to its current position,
• and its future trends and directions.
Vividly, using a map without knowing the current position is difficult; realizing the future is only possible if the current position and the past are clear. Hence, Section 3 of this paper recounts the history of events and research in the field in the period of 2002 to 2007, while Section 4 concentrates only on recent research works in the period of 2008 to 2013 and offers a categorization of the major fields and a statistical comparison between industrial and academic work in the field. Finally, we look into overlooked issues and offer several future research directions.

3. DIGITAL FORENSICS: THEN
The formal beginning of academic community research in the area of digital forensic investigation came in 2002 with an article called "Network Forensics Analysis" authored by Corey et al. [7], who studied the Network Forensic Analysis Tool (NFAT) and highlighted its benefits with regard to traffic capture, traffic analysis and security issues. In 2004, Stevens [8] illuminated the issue of comparing and correlating time stamps between different time sources and proposed a clock model to address these timing issues by simulating the behaviour of each independent time stamp (i.e., each independent clock). This model can be used to remove the expected clock errors from time stamps and develop a more accurate timeline analysis of events. Forte [9] studied tools and techniques widely used for digital forensic investigation, namely GLIMPSE and GREP, and explained their syntax and applications; GREP, as one of the most popular text search tools, was covered comprehensively in this article. Carrier and Grand [10] studied the essential requirements for volatile memory acquisition and proposed a hardware-based method to acquire memory data with the least possible changes. This method employed a hardware expansion card (a PCI slot card) to create a forensic image of volatile memory at the push of a button; however, the technique requires pre-installation of the card on the victim machine. Mocas [11] identified basic properties and abstractions of the digital forensic investigation process and proposed a comprehensive framework for unifying all characteristics of digital forensics. Vaughan [12] presented a methodology for evaluating the evidential value of an Xbox game console system, and also proposed a methodology for evidence extraction and examination from a suspect Xbox system.
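To illustrate the kind of correction a clock model such as Stevens' [8] enables, the following hedged sketch maps a timestamp from a skewed source clock onto a reference timeline; the linear offset-plus-drift model and the numbers are our own simplification, not his model.

```python
from datetime import datetime, timedelta

def correct_timestamp(observed: datetime, offset_seconds: float,
                      drift_ppm: float, epoch: datetime) -> datetime:
    """Map a timestamp from a skewed clock onto the reference timeline.

    offset_seconds: estimated clock offset at `epoch`.
    drift_ppm: estimated drift in parts per million since `epoch`.
    """
    elapsed = (observed - epoch).total_seconds()
    correction = offset_seconds + elapsed * drift_ppm / 1_000_000
    return observed - timedelta(seconds=correction)

epoch = datetime(2004, 1, 1)
logged = datetime(2004, 3, 15, 10, 30, 12)
print(correct_timestamp(logged, offset_seconds=95.0, drift_ppm=40.0, epoch=epoch))
```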

Nikkel [13] outlined the forensic investigation and analysis of IP networks and domain names. The article described an approach that automatically collects evidence related to Internet presences, time-stamps this evidence, stores it in an orderly manner, generates an integrity hash checksum of the evidence, and finally produces an official report of the discovered information. Buchholz and Spafford [14] studied the effects of metadata on digital forensics to find out which information can be useful in a computer forensic investigation; they demonstrated the potential of metadata in computer forensic investigation and analysed the issues in obtaining and storing such data. In 2005, Francia and Clinton [15] outlined the resources and procedures required to establish a forensic lab and presented a cost-efficient Computer Security and Forensic Analysis (CSFA) experimental lab design and implementation. Nikkel [16] discussed the forensic investigation challenges in file and content recovery from magnetic tape data acquisition and analysis, and proposed a methodology for determining tape contents. Jansen and Ayers [17] provided an overview of the available forensic investigation tools for Personal Digital Assistant (PDA) devices and PDA software and hardware. They used basic scenarios to evaluate those tools in a simulated environment with simulated evidentiary situations and offered a snapshot of how these tools work in those situations. Bedford [18] explained the forensic investigation challenges of the Host Protected Area (HPA) of IDE drives. The HPA is a part of a hard drive that is hidden from the operating system and the user, is often used to hide sensitive data, and as such is a good source of evidence. In 2006, Harrison [19] described a project for explaining real-life issues in digital forensic investigation and utilized a group-based project for practicing computer investigation in academic environments. Laurie [20] studied well-known Bluetooth security flaws and the available techniques and tools to exploit those flaws, and then discussed the effects of these attacks on common digital forensic investigation practices. Nikkel [21] described concepts of distributed network-based evidence, forensic



techniques for evidence acquisition over the network, and the weaknesses of these techniques. In addition, he suggested some improvements to overcome challenges in network data collection. Ieong [22] highlighted the major principles of digital forensic investigation and outlined eight roles, with their responsibilities, in common digital investigation frameworks; the paper proposed a framework named FORZA, which is based on Zachman's framework [23]. Harris [24] studied and analysed several anti-forensic techniques and classified them to arrive at a standard method for addressing anti-forensics issues, and then outlined general strategies to preserve forensic integrity during an investigation. Garfinkel [25] proposed Forensic Feature Extraction (FFE) and Cross Drive Analysis (CDA) techniques to analyse large data sets of disk images and other types of forensic data. These two methods were used for prioritizing and systematically identifying social network usage on the suspect's system. Schuster [26] explored the structure of memory containing process data and then proposed a search pattern to scan the whole memory dump for traces of physical memory objects, independent of the kernel's list of memory objects. As a proof of concept, an implementation of the technique successfully revealed some hidden and terminated processes and threads, even after rebooting the system. Jeyaraman and Atallah [27] proposed an approach to examine the accuracy and effectiveness of automated event reconstruction techniques based on their ability to identify relations between case entities; they then quantified the rate of false positives and false negatives and the scalability in terms of both computational burden and memory usage. Alink et al. [28] introduced a novel XML-based approach for storing and retrieving forensic traces derived from digital evidence, implemented in a prototype system called XIRAF. This system runs the available forensic analysis tools against the evidence file and then exports the results to an XML database. Johnston and Reust [29] outlined procedures for investigating an attack that included the compromise of several systems holding sensitive data, and discussed the challenges involved in this process. Mead [30] studied the National Software Reference Library (NSRL) repository

of known software, file profiles, and file signatures by examining the uniqueness of the signatures produced in the NSRL repository. This repository is vital for digital forensic investigators, as it improves the speed of data analysis by omitting the analysis of trusted files according to the hash checksums retrieved from the NSRL. Nikkel [31] presented a forensic evidence collector tool using promiscuous packet capture on embedded hardware running open source software. The tool operates as a standalone device in different modes and can be preconfigured by the investigator. In 2007, Wang et al. [32] analysed the methods and applications of cryptography in digital forensic investigation and highlighted the differences between these methods. The authors then discussed the weaknesses of SHA-1 and approaches to cracking SHA-1, in order to highlight the issue of potential clashes in checksum verification and the possible effects on related applications. Peisert et al. [33] presented the importance of examining the order of function calls for forensic analysis and showed its usefulness in isolating the causes and effects of an attack through intrusion detection systems; the analysis detects not only unexpected events in the order of function calls but also the absence of expected events. Castiglione et al. [34] investigated the issue of hidden metadata in compound documents that use an opaque format, which could be exploited by any third party. The authors proposed a steganography system for Microsoft Office documents and introduced FTA and StegOle as tools to improve the forensic analysis of Microsoft Office documents. Murphey [35] proposed a method to automatically recover, repair and analyse Windows NT5 (XP and 2003) event logs, and implemented proof-of-concept code to repair common corruptions of multiple event logs in one simple step without any manual user intervention. Spruill and Pavan [36] studied the U3 technology for portable applications and illustrated the different artifacts left behind by a crime committed through a portable application; this research also investigated some of the common applications used on U3 drives. Turner [37] questioned the usefulness of common digital forensic investigation approaches in live incident



investigation and demonstrated the application of the Digital Evidence Bag (DEB) storage format in dynamic environments. Richard et al. [38] explained the Data Evidence Container (DEC) format for bundling arbitrary metadata associated with evidence along with the actual evidence. The authors then explored the challenges of utilizing this container and proposed the Forensic Discovery Auditing Module (FDAM) as a complementary mechanism. Nikkel [39] demonstrated digital forensic investigation principles for IPv6 networks and discussed IPv6 addressing, packet structure, supported protocols and information collection from public registrars such as WHOIS. He finally presented some IPv6 tools for the collection and analysis of network traffic and the investigation of remote IPv6 nodes. Masters and Turner [40] demonstrated techniques of magnetic swipe card data manipulation in different types of devices and proposed the Digital Evidence Bag (DEB) as a suitable format for packaging the evidence obtained from a magnetic swipe card. Lyle and Wozar [41] studied the challenges of making a forensic image of a hard disk and addressed practical problems such as handling faulty sectors that cause difficulties in the imaging process. Arasteh et al. [42] proposed a formal model checking technique for the forensic analysis of system logs and provided a proof-of-concept system using a tableau-based proof

checking algorithm. Arasteh and Debbabi [43] presented a forensic analysis method to extract thread history from thread stacks: the technique uses a process model to retrieve data from a thread stack, which is then compared against an assembly model to verify the extracted properties. Schatz [44] proposed a technique for obtaining volatile memory from arbitrary operating systems that provides a "point in time" snapshot of the volatile memory; Body Snatcher, an implementation of this method, was used as the proof of concept of the proposed technique. Barik et al. [45] studied the issues of the Ext2 file system in terms of the authenticity of the OS-created timestamps and proposed a solution for preserving authentic date and time stamps using Loadable Kernel Modules (LKMs).

An overview of the above-mentioned research, as indicated in Figure 1, shows a smooth growth in the number of digital forensic investigation research articles published in the main relevant journals, namely "Digital Investigation", "Information Forensics and Security, IEEE Transactions on", "Computers & Security", "Computer Law & Security Review", "Multimedia Tools and Applications", "Computer Standards & Interfaces", "Computers & Mathematics with Applications", "Information Sciences", and "Selected Areas in Communications, IEEE Journal on".

Figure 1 – Number of articles in digital forensics in the period 2002–2008 (no articles matched our search criteria in 2003)

All the studied articles were extracted by searching for the term "forensic" in the abstracts of articles indexed by four main computer science indexing services, namely Science Direct, Springer, IEEE and ACM; the search was limited to articles published in computer science during the period 2002–2008.

4. NOW

From the very first days of digital forensic investigation up to now, there have been vast improvements in digital forensics techniques, ranging from recovering deleted evidence and searching megabyte-scale storage devices to dealing with petabyte-scale storage devices [46], cloud based investigations [47], mobile device examinations [48], wireless network investigations [49], and database forensics [50]. Generally, the current perspective of digital forensic investigation can be categorized into four main types, namely computer forensics, smart device forensics, network forensics and database forensics. Among the mentioned categories, computer forensics has attracted the most attention of academics and professionals. On the other hand, digital criminals and intruders are trying to minimize the footprints of their actions by utilizing anti-forensic techniques. Some of the common approaches of anti-forensics are the use of cryptography [51], steganography [52], metadata tampering [53], program packing [54], generic data-hiding [52], and even disk sanitizing [55], [56].

In spite of all the studies in the field of digital forensic investigation, we are yet to have a comprehensive, reliable study which offers an analysis of scientific research trends in the field. To address this issue, we started this survey by searching computer science research journals indexed by four main scientific databases, namely Science Direct, Springer, IEEE and ACM, for papers published in the period of Jan 2008 – Mar 2013 that include "forensic" as an author keyword or in the paper abstract. The result of the search contained articles published in the main journals in the field, namely "Digital Investigation", "Information Forensics and Security, IEEE Transactions on", "Computers & Security", "Computer Law & Security Review", "Multimedia Tools and Applications", "Computer Standards & Interfaces", "Computers & Mathematics with Applications", "Information Sciences", and "Selected Areas in Communications, IEEE Journal on". After several brainstorming sessions to identify the main journals in the field, it was decided to keep our focus only on journals which published 2 or more papers in our search period, as shown in Table 1.

Table 1 – Final list of participating journals with the number of related papers in each

Digital Investigation: 117
IEEE Transactions on Information Forensics and Security: 20
Computers & Security: 9
Computer Law & Security Review: 4
Multimedia Tools and Applications: 3
ACM Computing Surveys: 2
Computer Standards & Interfaces: 2
Computers & Mathematics with Applications: 2
Information Sciences: 2
Personal and Ubiquitous Computing: 2
IEEE Journal on Selected Areas in Communications: 2

Figure 2 – Number of forensics-relevant papers from Jan 2008 to Mar 2013

Figure 2 shows the relative share of each journal: the "Digital Investigation" journal (indexed in "Science Direct") with 117 published articles (66% of the total articles in the studied period) has the highest number of published articles, followed by the "IEEE Transactions on Information Forensics and Security" journal (indexed in "IEEE") with 20 published articles (12% of the total) and the "Computers & Security" journal (indexed in "Science Direct") with 9 articles (5% of the total). Not being an exception, this research faced some limitations. The main limitation of this study was to include all perspectives of digital forensics using trustworthy sources. The most important selection factor for us was the credibility of our sources, so that we could provide a trustworthy review of the current forensics landscape. As such, we only selected papers from reputable sources, which implies that we may have ignored relevant papers that were not published in reputable sources. Selecting journals was not only based on the mere number of articles published (2 or more) in the specific period (2008–2013), but also on making sure that only papers of very relevant journals reflecting a realistic view of the field were selected. Moreover, to improve the reliability of the given perspective, in addition to reputable academic sources, some articles from the SANS reading room were included as representatives of industrial research.

4.1. Results Obtained

The data collection process began with brainstorming among the authors to identify the major types of digital forensic investigation, in order to classify and rank research papers appropriately. For example, all topics related to smartphone investigation, iPad data acquisition, GPS data analysis and similar subjects were categorized under mobile forensics. ISO9660 analysis, metadata analysis, cluster analysis and all investigations relevant to the science of file systems were grouped as file system forensics. The same approach was taken for all other categories, as shown in Figure 3. Finally, papers on topics such as digital audio forensics, malware forensics [57], database forensics and cloud forensics which did not fall into the previously mentioned categories, and for which there were 5 or fewer papers, were grouped as "Others", as shown in Figure 4. Identification of general categories was more difficult than expected, as there were difficulties in categorizing inter-disciplinary papers or papers with a wide range of focus. For example, papers on collecting volatile
memory of mobile phones could be categorized both under volatile memory acquisition and under mobile phone investigation. Obviously, we could not count a paper multiple times, and in such cases the team carefully read the paper thoroughly to find out its major focus, with specific attention to the abstract, keywords and conclusion. Afterwards, we categorized the paper in the most relevant category. It is notable that the classification of this study is exclusively the opinion of the authors; although all relevant scientific and statistical techniques were employed to ensure the correctness and comprehensiveness of the study, all conclusions remain subject to the authors' viewpoints. Moreover, there were several publications containing news sections or discussions that could not qualify as full articles. Each of these papers was analyzed carefully and included in this survey if it offered valuable knowledge or results about the current state of the art in digital forensics. When investigating each of the journals separately, it is interesting to note the different topics emphasized in each journal. Table 2 lists the number of publications in each journal for each topic. The most notable result is that the "Digital Image Forensic" topic took the lead with 20 articles, followed by "Mobile Device Forensic" with 14 articles, "forensic framework" at 12, and "forensic tools" and "file system forensic" both at 9, which constitute the leading topics. In "IEEE Transactions on Information Forensics and Security", articles on anti-forensics outnumbered all other categories (3 articles), followed by "digital image forensic", "forensic tools" and "string analysis" with only 1 each.

Figure 3 – Digital forensic investigation categories in the period 2008–2013

Figure 4 – Digital forensic investigation topics with fewer than 5 papers (the "Other" category)

Table 2 – List of journals and the number of published articles, grouped by category

4.1.1. Topics covered by SANS

This subsection of the study analyzes articles published in the SANS reading room, as SANS is a well-established organization known as a pioneer in non-academic digital forensic investigation research. SANS papers are considered a reliable source for studying industrial digital forensic research trends. However, the quantity of SANS forensics articles is small compared to the academic journals, as publication is not among the main duties of forensics practitioners.

We have identified 24 papers in the SANS reading room and categorized them in the same way as the academic papers. The outcome includes the "File Forensic", "File System investigation", "Forensic Framework Development", "Forensic Tools Investigation", "Legal Aspects of Digital Forensics", "Mobile Device Investigation", "Operating System Investigation", "Database Investigation", "Network Investigation", "Steganography" and "String Analysis" categories, as shown in Figure 5.

Figure 5 – Categories of SANS articles

Further analysis of the SANS articles indicates the differences between academic and industrial research trends. The results show that the shares of forensic frameworks, network forensics and forensic tools among the SANS articles are 21%, 17% and 17% respectively, while they account for only 8%, 5% and 8% of the academic journal articles. On the other hand, in the academic environment, digital image forensics, mobile device forensics and forensic tools were the leading topics. Figure 6 compares the common topics between SANS and academic journals, noting that the SANS articles lack some of the topics covered in academic venues. Figure 7 clearly indicates the results by providing the share of each topic in SANS and academic journals.

Figure 6 – Comparison of common categories in academic and SANS publications in terms of the number of articles

Figure 7 – SANS vs. academic journals (the "Other" category in academic journals indicates topics which were not covered in SANS articles)

4.2. Discussion and Analysis

In this section, we offer a detailed analysis of each of the 10 largest categories identified in the previous sections, namely digital image forensics, mobile device forensics, forensic tools, volatile memory forensics, network forensics, anti-forensics, data recovery, application forensics, file system forensics and forensic frameworks. We critically analyze the trends and current state of the art in each category and identify possible future trends.

4.2.1. Digital Image Forensics

With the sharp increase in the usage of digital images, crimes involving digital images, such as image forgery, have increased accordingly, and the detection of image tampering has brought many difficulties to forensic investigators and researchers. The major identifiable research topics in digital image forensics are image authenticity, image correction, steganography, image processing and source detection. Image authenticity, as the most popular topic in the digital image forensics category, concerns the reliability of the evidence. Amerini et al. [58] proposed a SIFT-based algorithm to detect multiple copy-move attacks on an image. This technique extracts and matches image features to identify similar local regions in the image and then builds a hierarchical cluster of the extracted features to identify the cloned areas. In case the image is classified as unauthentic, the geometrical transformation is identified to discover the original area and the copy-moved area. Gou et al. [59] introduced a source identification technique with the ability to recognize various post-processing operations on a scanned image. Noise analysis through wavelet analysis, neighborhood detection and image de-noising makes this approach capable of detecting the model of scanner used to scan the image and the type of the image source (scanner, digital camera or computer generated). Chen et al. [60] introduced a source digital camera identification technique with integrity verification capability based on the Photo Response Non-Uniformity (PRNU) imaging sensor fingerprint.
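To make the keypoint-matching idea behind copy-move screening concrete, the following is a minimal Python sketch using OpenCV. It is illustrative only and is not the algorithm of [58]: the input file name and the two thresholds are arbitrary placeholders, and a real detector would add the hierarchical clustering and transformation estimation described above.

```python
import cv2
import numpy as np

# Minimal sketch of keypoint-based copy-move screening (illustrative only).
img = cv2.imread("questioned.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
if descriptors is None:
    raise SystemExit("no keypoints found")

# Match every descriptor against all descriptors of the same image.
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(descriptors, descriptors, k=3)

suspicious = []
for m in matches:
    if len(m) < 3:
        continue
    # m[0] is the keypoint matched with itself; m[1] is the nearest distinct match.
    nearest, second = m[1], m[2]
    if nearest.distance < 0.6 * second.distance:            # ratio test (arbitrary threshold)
        p1 = np.array(keypoints[nearest.queryIdx].pt)
        p2 = np.array(keypoints[nearest.trainIdx].pt)
        if np.linalg.norm(p1 - p2) > 40:                    # ignore matches at nearly the same position
            suspicious.append((tuple(p1), tuple(p2)))

print(f"{len(suspicious)} keypoint pairs with near-duplicate descriptors")
```

Pairs of keypoints with nearly identical descriptors but distinct positions are the raw material that clustering-based copy-move detectors group into candidate cloned regions.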

The PRNU fingerprint is generated through the maximum likelihood principle derived from a normalized model of the sensor output and is then compared to a pre-generated experimental dataset. Yuan [61] introduced a novel method for the detection of median filtering in digital images. The key points of this method are its capability to detect median filtering in low-resolution, JPEG-compressed images, and to detect tampering in cases where a median-filtered part is inserted into a non-median-filtered image and vice versa. Mahdian and Saic [62] studied the addition of locally random noise to tampered image regions for anti-forensic purposes and introduced a segmentation technique for dividing the digital image into partitions with homogeneous noise levels. Their approach tiles the high-pass wavelet coefficients at the highest resolution with non-overlapping blocks to estimate the local noise level, and uses a median-based method to estimate the standard noise level of the image. Farid and Bravo [63] introduced a novel methodology for the computer-aided differentiation of photorealistic computer-generated images from photographic images of people, using images with different resolutions, JPEG compression levels, and color mixtures. Kornblum [64] studied the quantization tables used in JPEG compression and explained how they can be used to differentiate between images processed by software and intact images. The author utilized other factors, such as the presence or absence of EXIF data, signatures of known programs, and color signatures of real skin, to increase the success rate of the detection. Mahalakshmi et al. [65] proposed an approach for the detection of image manipulations through an interpolation-related spectral signature method. This method can detect common forgeries like re-sampling (rotation, rescaling), contrast enhancement and histogram equalization. Moreover, a set of techniques was introduced for the detection of global and local contrast enhancement and histogram equalization. Source detection explicitly tries to identify the source device with which the image was taken or edited. In Tsai et al. [66], the authors employed support vector machines and decision fusion to identify the camera source model of an image. The evaluation of the proposed model indicates 91.66% accuracy for images taken by 26 cameras.
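As an illustration of the quantization-table inspection mentioned for [64], the short Python sketch below reads the tables embedded in a JPEG file using Pillow. The file name is a placeholder, and the comparison against reference tables and the other correlating factors (EXIF data, program signatures, skin color signatures) are deliberately left out.

```python
from PIL import Image

# Minimal sketch: read the JPEG quantization tables that can serve as a
# fingerprint of the last software that saved the image (illustrative only).
img = Image.open("evidence.jpg")            # hypothetical input file
if img.format != "JPEG":
    raise SystemExit("not a JPEG file")

# Pillow exposes the tables as a dict: table id -> 64 coefficients.
for table_id, coefficients in img.quantization.items():
    print(f"table {table_id}: first coefficients {list(coefficients)[:8]}")

# Tables matching a standard encoder (scaled by a single quality factor) are
# unremarkable; unusual tables can point to specific software having re-saved
# the image.
```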

Choi et al. [67] proposed a new approach for identifying the color laser printer that produced an unknown printed image, based on noise texture analysis and a support vector machine. This method identifies the invisible noise of the printed image using a Wiener filter and a 2D Discrete Wavelet Transform (DWT) filter, and the texture of the noise is then analyzed via a gray level co-occurrence matrix (GLCM). Image processing involves the analysis of an image to find specific patterns for a variety of purposes. Islam et al. [68] proposed a framework for the detection of children's skin in images, which utilizes a vision model based on a Markov Random Fields (MRF) prior, employing a skin model and a human affine-invariant geometric descriptor to identify skin regions containing pornographic content. Steel and Lu [69] proposed a system called the Automated Impersonator Image Identification System (AIIIS), which tracks an image used in an impersonation attack back to its original source by employing a digital watermarking technique; the technique encodes access details of the image (such as the IP address, server and download date-time) and embeds them into the image. In [70], the author utilized demosaicing artifacts to identify the digital camera model. The process includes estimating demosaicing parameters, extracting periodicity features of the image for the purpose of detecting simple forms of demosaicing, and finally defining a set of image characteristics as features for designing classifiers that distinguish between digital cameras. Image correction, another active topic in image forensics, mainly intends to ease the analysis of the image. Tang et al. [71] developed a Knowledge Based (KB) approach to remove JPEG blocking artifacts in skin images. The approach utilizes a Markov model based algorithm and a one-pass algorithm to implement the inference, plus a block synthesis algorithm for handling cases with no prior record in the dataset. Moreover, in order to reduce the search time in the dataset, the authors proposed a multi-dimensional indexing algorithm. Lin et al. [72] introduced a method for detecting the image source encoder, alongside the estimation of all coding parameters, based on the intrinsic fingerprints of the image source encoder. The evaluation of this approach indicated a success rate of more than 90% for
transform-based encoders, sub-band encoders, and DPCM encoders when the PSNR is higher than 36 dB. Steganography, as one of the anti-forensic techniques applied to digital images, plays a significant role in the identification of evidence. Huang [73] provided a solution for the detection of double JPEG compression using a "proper" randomly perturbed ratio. This approach is highly dependent on finding the correct ratio; thus a novel random perturbation strategy is applied to the JPEG coefficients of the recompressed image. Kirchner and Bohme [74] challenged the current image tampering detection techniques by presenting types of image transformation operations that cannot be detected using the available resampling detection tools. Among these attacks, resampling with edge-modulated geometric distortion and the dual-path approach are nearly impossible to detect.

4.2.2. Mobile Device Forensics

Due to the sharp increase in the usage of mobile devices such as smart phones, tablets and GPS devices, the investigation of such devices has become vital; thus obtaining and analyzing evidence from mobile devices is of great value. In the following, the mobile live memory, mobile forensic framework, mobile forensic tools and mobile on-scene triage topics are discussed. Mobile live memory concerns the imaging and analysis of mobile devices' volatile memory. In [75] the authors presented a novel approach to obtain a forensic image of an Apple iPad through the iPad camera connection kit and a USB storage device. The results of the evaluation indicated that this method can obtain an image up to 30 times faster than the usual method of acquiring an image over a Wi-Fi channel. Thing et al. [76] proposed a system for real-time digital forensic analysis of mobile memory, focusing on the dynamic properties of memory. The system evaluation consists of an investigation of different communication variables (i.e., message length, message interval, dump interval, keypress interval), which resulted in rates of 95.6% and 97.8% for dump intervals of 40 and 60 seconds respectively. Sylve et al. [77] illustrated the process of live memory collection on Android devices, the new memory dump module (named DMD or LiME, the Linux Memory Extractor) and the challenges of device-independent memory
acquisition; they then proposed a memory collection method which can dump memory to an SD card or to network storage. The proposed method operates by "rooting" the Android device with methods like "Rage Against the Cage" and other privilege escalation techniques. Depending on the type of device, utilizing the right mobile forensic tool always plays a noteworthy role. In [78] the authors enriched their previously published methodology for smartphone evidence acquisition (the Mobile Internal Acquisition Tool) with improvements and assessments. MIAT can be executed from a memory card such as an MMC; it recursively explores the file system tree and copies each entry to a backup volume. The behavior of MIAT was also evaluated by comparing it to Paraben Device Seizure, a well-known tool. Vidas et al. [79] illustrated a digital forensic collection method for Android devices via a modified recovery boot image. The advantage of this method is mainly that it only slightly changes the recovery partition, and no user or system partition is affected during the collection stage of the investigation, unlike normal approaches which "root" the Android device for image acquisition. The mobile forensic framework topic covers different procedures and frameworks for mobile device forensics. In [80], the authors investigated the use of mobile phone flash boxes in a forensic framework and proposed a validation procedure to ensure the integrity of the evidence acquired with this new method. Unfortunately, using mobile phone flash boxes does not provide forensically sound evidence; even though the proposed method increased the reliability of the evidence, it still lacks sufficient proof of evidence integrity and demands further research. Owen and Thomas [81] studied the available frameworks for mobile forensic investigations and highlighted the weaknesses and strengths of each in comparison with the common digital forensic investigation practices for hard disk drives. Grispos et al. [82] compared the available mobile forensic toolkits to discover to what extent these tools can operate and how reliable they are. The result of this comparison indicated the limitations of using file carvers for information recovery, conflicts between diverse
methods of information recovery, several differences between the documented capabilities of the CUFED and its actual performance, and the fragility of existing mobile forensic toolkits when recovering data from partially corrupted file system images. Mobile on-scene triage determines the priority of evidence collection on mobile devices. In [48], the authors presented a methodology for obtaining and analyzing evidence in the webOS system partition; the article provides solutions for obtaining different types of evidence (e.g., call logs, contacts, calendar events, etc.). The article also provided approaches for recovering deleted files from a webOS operating system. Eijk and Roeloffs [83] discussed the challenges of obtaining evidence from a TomTom GPS device and approaches to acquiring the volatile memory of the device via either JTAG signals or loading a small Linux distribution into the GPS memory. The article described the structure of data on the GPS device and techniques for analyzing the obtained data. In [84] the author studied the available frameworks for the forensic investigation of Windows Mobile phones and showed the similarities of Windows Mobile investigation with normal Windows investigation on PC systems. This work demonstrated that Windows Mobile phone investigation follows the common digital forensic investigation framework, and the analysis of the collected data is very similar to that of the Windows operating system. Mislan et al. [85] studied the on-scene triage of mobile devices and compared its requirements with common digital forensic investigation requirements; they formalized the on-scene triage process and provided guidelines for its standardization. In the end, they defined the basic requirements of an automated on-scene triage.

4.2.3. Forensic Tools

Employing different tools in the digital forensic process is inevitable for the investigator, yet choosing the most suitable tool can affect the result of the investigation. The identified research topics of this category are forensic formats, new forensic tools, and forensic tools examination. The new forensic tools topic reviews novel tools discussed in forensic articles. In [86] the authors presented a forensic investigation tool
named Cyber Forensic Time Lab, which helps the investigator sort all evidence according to its time variables in order to generate a temporal analysis of the crime. Klaver [87] introduced the forensic application of available tools on Windows CE (Windows Mobile) after studying the typical hardware information and software components. The author explained the usage of current tools for the forensic investigation of Windows CE mobile phones. Joyce et al. [88] discussed Mac OS X forensic investigation approaches and introduced MEGA, a comprehensive tool for analyzing Mac OS X files from an image. MEGA offers easy access to Spotlight metadata, content search and FileVault-encrypted home directories. In [89], the authors described their novel techniques for developing a memory analysis tool for a variety of operating systems. Inoue et al. [90] provided a new tool for volatile memory imaging of Mac OS X and then evaluated it using four metrics: completeness, correctness, speed and interference. Moreover, a visualization method called the density plot is introduced for indicating the density of repeated pages in an image. There is always a demand for testing approaches to examine the performance and accuracy of the tools. In [91] the first generation of computer forensic tools is challenged by the authors, as these tools have limitations and flaws in processing speed, data design, auditability, task analysis, automation and data abstraction; the requirements for second generation tools are then discussed and some available tools are suggested for handling each deficiency. Pan and Batten [92] presented a performance testing approach for digital forensic tools that offers good-quality testing via a limited number of observations. The proposed method is specially designed for forensic tool testing and is claimed to be the best available among forensic tool performance testers. Choosing the forensic format of the evidence plays an important role in the forensic investigation process; in the following, four novel forensic formats proposed by different authors are reviewed. In [46], the author introduced a redesign of the Advanced Forensic Format (AFF) based on the ZIP file format specification. This file format supports full compatibility with previous versions of AFF
while it offers the capability of storing multiple types of evidence from various devices in one archive and an improved separation between the underlying storage mechanism and the forensic software. Garfinkel [93] studied the Digital Forensic XML (DFXML) language and discussed the motivations, design and utilization of this language in processing forensic investigation information. Levine and Liberatore [94] introduced DEX, an XML format for recording digital evidence provenance, which enables investigators to use the raw image file and reproduce the evidence with other tools of the same functionality. Conti et al. [95] studied automated tools for mapping large binary objects, such as physical memory, disk images and hibernation files, by classifying regions using a multi-dimensional, information-theoretic technique.

4.2.4. Volatile Memory Forensics

Extracting potential evidence from volatile memory is another challenge for forensic investigators, as the basic requirement of knowing the structure of memory is not satisfied. The identifiable volatile memory forensic topics are data extraction from volatile memory, volatile memory mapping and forensic imaging of volatile memory. Data extraction from volatile memory involves utilizing different tools and techniques to analyze and obtain available evidence. In [96], the authors studied the tools and techniques for extracting Windows registry data directly from a physical memory dump and then presented an attack against on-disk registry analysis techniques that modifies the cached registry values in physical memory. Maartmann-Moe et al. [51] proposed a novel technique for the identification of cryptographic keys stored in volatile memory and supported the method with a proof of concept tool, Interrogate, which can identify AES, Serpent and Twofish cryptographic keys. Baar et al. [97] described a novel approach for identifying and recovering files mapped in physical memory, in order to recognize the source of the data and its usage, via three different algorithms (carving allocated file-mapping structures, unallocated file-mapping structures and unidentified file pages).
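Much of the artifact extraction summarized above starts from locating human-readable fragments in an otherwise opaque dump before structure-aware parsing is applied. The following is a minimal, illustrative Python sketch of printable-string carving with offsets; the dump file name is a placeholder and none of the cited tools are reproduced here.

```python
import re

# Minimal sketch: carve printable ASCII runs (with offsets) from a raw memory dump.
DUMP = "memory.raw"        # hypothetical dump file
MIN_LEN = 8                # shortest run worth reporting

pattern = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)

with open(DUMP, "rb") as f:
    data = f.read()        # a real tool would read in chunks for large dumps

for match in pattern.finditer(data):
    offset = match.start()
    text = match.group().decode("ascii", errors="replace")
    print(f"0x{offset:08x}  {text}")
```

Structured artifacts such as registry hives, key schedules or file-mapping structures require dedicated parsers, but the offsets reported by even this naive pass are often the starting point for locating them.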

Schuster [98] studied the nonpaged pool area of physical memory and its potential for containing massive amounts of information about running and even closed processes. The author then demonstrated the pool allocation technique of the Microsoft Windows operating system. Volatile memory mapping is an essential research area, as it is the primary requirement for volatile memory analysis and data extraction. In [99], the author described an algorithm for detecting the paging structures of processes in a physical memory dump; this method is based on the hardware layer and works on both Windows and Linux with minor tweaking. Stevens and Casey [100] introduced a methodology for recognizing and extracting user command line history in the Microsoft Windows operating system after studying the structure of command line data in physical memory dumps. In the end, the authors provided a proof of concept tool for Microsoft Windows XP. Hejazi et al. [101] introduced a technique for collecting sensitive, case-relevant information from extracted memory content based on analyzing the call stack and security-sensitive APIs. The method is limited to Microsoft Windows XP (SP1, SP2). In [102], the authors explained the debugging structures in physical memory and Microsoft's Program Database (PDB) format, used to extract configurations, network activities and process information from any Windows NT family operating system. Okolica and Peterson [103] demonstrated the significance of clipboard information in digital forensic investigation and described the structure and procedure for retrieving copy/paste information from Microsoft Windows XP, Vista, and 7. This technique can obtain information about the software from which the data was copied (e.g., Notepad or WordPad). Okolica and Peterson [104] proposed a novel, efficient methodology for reverse engineering Windows drivers and dynamic link libraries. In addition, the authors revealed the network connection information structure in physical memory, which eases the analysis part of the investigation. Even though there is a lack of scientifically sound forensic imaging techniques for volatile memory, only one study on this topic was found within the scope of this survey. Rabaiotti and Hargreaves [105] presented a novel method for obtaining a forensic image of an embedded device by exploiting a buffer overflow vulnerability and executing
customized code that creates an image of the console memory. The current work only covered the Microsoft Xbox gaming console, but the idea is applicable to any type of embedded device.

4.2.5. Network Forensics

Obtaining evidence from network traffic is a great challenge for investigators, mainly due to the live nature of network packets. Beverly et al. [106] showed that IP packets, Ethernet frames and other network-related data are present in physical memory and that these data can be retrieved from hibernation files or memory images. Additionally, the authors proposed a network carving algorithm, techniques and the essential tools to identify and extract this information. Thonnard and Dacier [107] presented an analysis framework for extracting known attack patterns from massive honeynet data sets. This method allows the analyst to select different feature vectors and appropriate similarity metrics for creating clusters. Shebaro and Crandall [108] described the privacy-related challenges of network flow recording for different purposes and then introduced a network flow recording technique that utilizes identity-based cryptography and privacy-preserving semantics. Zhu [109] introduced an iterative algorithm to discover network-based attack patterns from network traffic logs; this method uses a unique feedback mechanism to propagate the chance of being under attack, or a suspicion score, and then passes it to the next iteration without any dependency on human supervision (e.g., defining thresholds). Tracking the source of traffic in networks is another challenge when network address translation and other types of network services interfere in the communication. In [110], the author studied the forensic investigation challenges of networks with the Network Address Translation (NAT) service enabled and proposed a model and algorithm for tracing observed traffic back to the source behind the NAT; the model uses specific correlations of leftover NAT gateway artifacts to identify the source. Takahashi et al. [49] focused on reviewing IEEE 802.11 wireless network characteristics based on the traffic patterns of nodes, in order to recognize user fingerprints, rogue access points and media access control protocol misuses.
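The fingerprinting and clustering work cited here builds on basic traffic summarization. The following is a minimal Python sketch of grouping packets from a capture file into conversations, assuming Scapy is available; the capture file name is a placeholder and the rich feature extraction used in [49] and [107] is not reproduced.

```python
from collections import Counter
from scapy.all import rdpcap, IP, TCP, UDP

# Minimal sketch: summarize conversations in a capture file (illustrative only).
packets = rdpcap("capture.pcap")            # hypothetical capture file
flows = Counter()

for pkt in packets:
    if IP not in pkt:
        continue
    proto, sport, dport = "other", None, None
    if TCP in pkt:
        proto, sport, dport = "tcp", pkt[TCP].sport, pkt[TCP].dport
    elif UDP in pkt:
        proto, sport, dport = "udp", pkt[UDP].sport, pkt[UDP].dport
    flows[(pkt[IP].src, sport, pkt[IP].dst, dport, proto)] += 1

for flow, count in flows.most_common(10):
    src, sport, dst, dport, proto = flow
    print(f"{proto} {src}:{sport} -> {dst}:{dport}  {count} packets")
```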

Hsu et al. [111] proposed a forensic approach for identifying the source of a voice over IP (VoIP) call via network operators (NWO) and service providers (SvP) without depending on router logs.

4.2.6. Anti-Forensics

The procedure of forensic investigation can be affected by criminals via anti-forensic actions; researchers challenge the integrity of the forensic process by demonstrating new types of anti-forensic methods. Evidence collection and common attacks are the two identified topics in this area. Distefano et al. [112] analyzed anti-forensic attacks on mobile devices and presented some fully automated examples of these attacks. The authors then examined the effectiveness and strength of the implementations by using them against a cursory examination of the device and against tools for obtaining the internal memory image. Sun et al. [113] proposed two anti-forensic steganography approaches based on the exploiting modification direction (EMD) technique for storing and extracting data in images. The highlight of EMD (HoEMD) and the adaptive EMD (AdEMD) are the highly efficient, high-quality methods introduced in this work. Stamm and Liu [114] introduced anti-forensic approaches for removing forensically important indicators of compression, such as the image compression history, from an image. This technique utilizes a novel approach to identify and remove the compression fingerprints of an image's transform coefficients. Khan et al. [115] presented a novel approach to hide data in a deniable way (i.e., with plausible deniability) by storing sensitive data on a cluster-based filesystem; to do so, a covert channel encodes the data by altering the fragmentation pattern in the cluster distribution of the secret file. Rekhis and Boudriga [116] studied digital forensic investigation techniques for anti-forensic attacks and then characterized secure evidence, provable and non-provable attacks. As the main contribution of this research, the authors developed an anti-forensic-attack-aware forensic approach using a state-based logic. Casey et al. [117] explained the challenges of full disk encryption (FDE) for digital forensic investigators and provided guidance on taking the necessary items from the crime scene to ease access to encrypted data or the live forensic
acquisition of the running system. The provided measures can help in obtaining evidence in an unencrypted state or in finding the encryption key or passphrase.

4.2.7. Data Recovery

As one of the most essential stages of evidence identification and collection, data recovery can be affected by different variables, such as the file system type; below, some of the data recovery techniques are reviewed. In [118], the author analyzed the main properties of the Firefox SQLite database and its use in forensic investigation. Moreover, a novel algorithm is proposed to recover deleted records from unallocated space, based on the fact that Firefox utilizes temporary transaction files. Jang et al. (2012) described the integrity issue of forensic image corruption after acquisition and provided a novel algorithm for recovering and protecting the evidence image using recovery blocks. The recovery process is applicable when data blocks (covered by the generated recovery blocks) are damaged. Yoo et al. [119] highlighted the challenges of multimedia file carving and NTFS compressed file carving, and proposed solutions for these challenges. The solution for multimedia file carving is based on the characteristics of AVI, WAV and MP3 files. In [120], the author discussed the structure of Windows registry data and the behavior of Windows when deleting registry records. A technique is then developed for identifying deleted Windows registry records and recovering deleted keys, values and other attributes of the records. Garfinkel et al. [121] studied the usage of cryptographic hashes of data blocks for finding data in sectors and introduced the concept of the "distinct disk sector". Moreover, the work provided some approaches for better detection of JPEG, MPEG and other compressed data files. Chivers and Hargreaves [122] explored the structure of the Windows search database and the challenge of recovering deleted records after a file is deleted; they then proposed a novel record carving approach for identifying and recovering deleted database records from the database's unused space or from the filesystem.
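To illustrate the block-hash lookup idea behind the "distinct disk sector" concept of [121], the following is a minimal Python sketch that hashes fixed-size blocks of an image and checks them against block hashes of a known file. The block size and file names are placeholders, and this is not the authors' implementation.

```python
import hashlib

# Minimal sketch of block-hash lookup: hash each fixed-size block of an image
# and check it against hashes of blocks from a known file (illustrative only).
BLOCK = 4096                                   # illustrative block size

def block_hashes(path, block=BLOCK):
    with open(path, "rb") as f:
        offset = 0
        while True:
            chunk = f.read(block)
            if len(chunk) < block:
                break
            yield offset, hashlib.sha256(chunk).hexdigest()
            offset += block

known = {h for _, h in block_hashes("known_file.bin")}      # hypothetical known file
for offset, digest in block_hashes("disk_image.dd"):        # hypothetical disk image
    if digest in known:
        print(f"block at offset {offset} matches a known-file block")
```

A practical implementation additionally has to deal with block alignment and must filter out blocks that are common to many files, i.e. blocks that are not actually distinct.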

King and Vidas [123] compared solid state drives (SSDs) with normal HDDs and showed that the TRIM command can affect the investigation process. The analysis of 16 different disks shows that TRIM-enabled disks have almost 0% recoverable data, while SSDs with TRIM disabled have an average success rate of 100%.

4.2.8. Application Forensics

The forensic investigation of applications is quite advantageous, as these applications usually store specific evidence. Identifying and collecting this evidence demands prior research on the application's behavior. In [124], the authors studied the activities recorded by Internet Download Manager (IDM) and its effects on different files, such as log files, the Windows registry and the history, from an artifacts point of view. This study demonstrated approaches to detect different attributes of download requests, such as the URL, download time and login credentials. Garfinkel (2012b) shared the experience of constructing a Korean Reference Data Set (KRDS) based on the National Software Reference Library RDS (NSRL RDS) and developed a model for both effective importing of NSRL data sets and adding Korean-specific data sets. Lallie and Briggs [125] explored three well-known peer-to-peer network clients (BitTorrent, µTorrent and Vuze) and analyzed their artifacts in the Windows registry using the effects created by installing and working with these clients. In [126], the authors outlined the significance of web browsers in forensic investigation and proposed a methodology for evidence collection and analysis from web browser log files. Lewthwaite and Smith [127] looked into the LimeWire artifacts remaining in the Windows registry and other log files. They also developed a tool, AScan, to identify and recover evidence from the unallocated and slack spaces of hard disk drives. Fellows [128] described the significance of recovering WinRAR temporary files and studied the behavior of WinRAR in creating these temporary files. The results of this research indicated that there is a chance to detect and recover the evidence file from deleted temporary folders while the original file is protected by cryptographic solutions.
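Browser databases are a recurring source of the application artifacts discussed here, and also the starting point of the deleted-record recovery work in [118]. The following is a minimal Python sketch that lists visit history from a copy of Firefox's places.sqlite using the standard sqlite3 module; the path is a placeholder and the table and column names follow the commonly documented schema, which may differ between browser versions. Recovering deleted records, as in [118], instead requires parsing the raw database pages.

```python
import sqlite3
from datetime import datetime, timezone

# Minimal sketch: list visited URLs from a *copy* of Firefox's places.sqlite.
# Work on a copy so the original evidence file is never opened for writing.
DB = "places_copy.sqlite"      # hypothetical copy of the seized database

con = sqlite3.connect(DB)
rows = con.execute(
    "SELECT url, title, visit_count, last_visit_date "
    "FROM moz_places WHERE last_visit_date IS NOT NULL "
    "ORDER BY last_visit_date DESC LIMIT 20"
)
for url, title, visits, last_visit in rows:
    # last_visit_date is stored as microseconds since the Unix epoch.
    when = datetime.fromtimestamp(last_visit / 1_000_000, tz=timezone.utc)
    print(f"{when:%Y-%m-%d %H:%M}  {visits:4d}  {url}")
con.close()
```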

4.2.9. File System Forensics

Conducting research on file system structures and attributes is extremely important, mainly because evidence must be extracted from these file systems, which can fail in the case of unknown file systems. In [53], the authors studied the structure of file time attributes in the NTFS file system and analyzed the modifications to the access, modification and creation time attributes of files and folders made by the user under different operating systems. Grier [129] proposed a methodology for examining filesystems and detecting emergent patterns unique to copying, by stochastically modeling filesystem behavior through routine activity and emergent patterns in MAC timestamps. Beebe et al. [130] explained the functionality, architecture and disk layout of the ZFS file system and then discussed the forensic investigation methods available for ZFS. This work also brought some of the forensic challenges of ZFS to light. Carrier [131] investigated the credibility of the ISO9660 file system in forensic investigation and the fact that this file system can be used for data hiding. The details of the data hiding process are studied and then used to create an image with hidden data for examining the available forensic toolkits. In [132], the authors presented a novel approach to identify the disk cluster size without relying on the file system's metadata; instead, it detects the difference between the entropy-difference distributions at non-cluster boundaries and at cluster boundaries. Kavallaris and Katos [133] introduced a technique for the identification of past pod slurping attacks using information stored in filesystem timestamps. This technique infers a file's transfer rate from its access time attribute and correlates it with the common rate of the suspicious USB device (obtained from the Windows registry) to identify the victim USB device.
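To give a flavor of the entropy computation underlying the cluster-size idea of [132], the following is a minimal Python sketch that compares byte entropy on either side of candidate boundaries for a few candidate cluster sizes. The image path, window size and candidate sizes are placeholders, and the statistical decision procedure of the original work is not reproduced.

```python
import math
from collections import Counter

# Minimal sketch: Shannon entropy of the bytes just before/after candidate
# cluster boundaries, for a few candidate cluster sizes (illustrative only).
IMAGE = "partition.dd"          # hypothetical raw image
WINDOW = 512                    # bytes examined on each side of a boundary

def entropy(buf):
    counts = Counter(buf)
    total = len(buf)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

with open(IMAGE, "rb") as f:
    data = f.read()             # a real tool would process the image in chunks

for cluster in (2048, 4096, 8192):
    diffs = []
    for boundary in range(cluster, len(data) - WINDOW, cluster):
        before = data[boundary - WINDOW:boundary]
        after = data[boundary:boundary + WINDOW]
        diffs.append(abs(entropy(before) - entropy(after)))
    if diffs:
        avg = sum(diffs) / len(diffs)
        print(f"candidate cluster size {cluster}: mean entropy difference {avg:.3f}")
```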

4.2.10. Forensic Frameworks

Forensic frameworks are the basis of the forensic process, and developing new frameworks can guarantee the adaptation of forensic science to new technologies. In [134], the authors made an extensive survey of available network forensic framework implementations and proposed a generic digital forensic investigation model, with a comparison to the available models. Finally, the implementation techniques of the proposed model are discussed. Beckett and Slay [135] highlighted the demand for adapting new technologies to the science of forensic investigation so that digital forensics becomes a true forensic science. This article examined the roots of scientific methods to establish the principles of digital forensics as a science. In [136], the authors discussed the lack of standardized data sets, or corpora, as a necessary requirement of forensic research, and presented a taxonomy for defining several available corpora. Guo et al. [137] studied the scientific description of computer forensic characteristics by identifying the basic functionalities of the computer forensic investigation procedure, and introduced a functionality-oriented validation and verification framework for digital forensic tools using a function mapping approach. Shields et al. [138] proposed PROOFS, a continuous forensic evidence collection tool which utilizes data retrieval methods from file system forensics. PROOFS generates and saves signatures for files that are copied, deleted or altered over the network. In [139], the authors analyzed the possible effects of a fictitious P2P model on committing a copyright violation crime while dealing with the illegal distribution of digital contents. Cohen et al. [140] introduced an open-source enterprise forensic investigation tool that provides remote access to memory and raw disks on multiple platforms. They then described the architecture of the tool and how it performs enterprise forensic investigations on a regular basis. Kahvedžić and Kechadi [141] presented a novel digital investigation ontology (DIALOG) for the management, reuse and analysis of digital investigation knowledge, creating a general vocabulary capable of describing an investigation at any level. Case et al. [142] explained the issue of correlation between digital investigation tools and then proposed a framework for automatic evidence identification with support for different targets. The implemented prototype presents an integrated analysis of configuration log files, memory images, disk images and network traffic captures. In [143], the author shared the experience of the forensic investigation of a commercial closed-circuit television digital video recorder using raw disk analysis of the storage disk. The result of the investigation showed that, with extra effort in processing the raw video data, it might be possible to extract the evidence in cases where the recording device is not functional. Kao et al.
[144] proposed three analytical tools for clarifying forensic investigation issues of cybercrime using three strategies: Multi-faceted Digital Forensics Analysis (MDFA), Ideal Log and the M-N model; in the same order, these cover the basic elements of cybercrime using new definitions of the forensic investigation, traceable elements of the evidence, and finally the ISP log records. Serrano et al. [145] designed a multi-agent system (MAS) debugging framework for developing a forensic analysis (in cases where MASs exhibit complex tissues of connections among agents) by utilizing pathfinder networks (PFNETs).

4.3. Critical overlooked issues

Privacy issues caused by digital forensic investigation are one of the topics which deserve more research in the future, as the issue arises wherever the investigation can threaten the secrecy of unrelated data. The same challenge becomes even more complicated when cloud computing and massively shared resources are involved. Neglecting to conduct effective research may lead to a direct conflict with citizens' right to privacy and can cause digital forensics to face a deadlock where law enforcers cannot differentiate between potential evidence and other private data. Studying the available work on users' right to privacy shows that the solution may lie in the successful identification of related evidence objects based on existing privacy policies. The currently feasible solution is to use formal methods to tag pieces of data according to the privacy policy and only then start collecting evidence [146], [147]. The cloud computing concept has introduced many conflicts into digital forensic investigation [148]–[150]. The jurisdiction of the data is one of the most challenging topics and one which seems to be overlooked. The digital forensic community should realize that this conflict will cause massive obstacles in the legal aspects of the investigation. Developing a suitable cross-national law is one of the solutions which has been worked on in the past couple of years, but it demands much more effort to become feasible [151], [152]. Moreover, only a few articles discuss digital forensic investigation science and digital forensic awareness. It is utterly vital to
understand that even the best forensic frameworks may conflict with the true nature of forensic science, and the majority of these conflicts end up affecting the integrity of precious evidence. As an instance, imaging physical memory in a forensically sound manner is one of the challenges which has not attracted much of the researchers' attention. It is for this reason that forensic science has emerged as a significant aspect of digital forensics. A well-conducted awareness campaign can help teach digital investigators and forensic researchers and make them aware of these challenges. This may also help to update investigators about the latest technologies and their new conflicts with forensic investigation disciplines on a regular basis, not only as a once-off exercise.

5. CONCLUSION AND FUTURE WORKS

Digital crime is a moving target, from the era of telephone hackers up to the current state of complex malware intrusions. With new developments and innovations, new types of crime came along. This survey has shown that, as we entered the twenty-first century, the scope of digital forensic investigation has widened and its focus is fast shifting toward mobile device and cloud based investigations. Digital forensics now requires a more coordinated and focused effort from the national and international society, governments and the private sector. It is no coincidence that the study shows a shift towards mobile device and cloud forensics while the true nature of forensic science becomes the essence of investigation frameworks. The survey results have also shown that most of today's forensic challenges are, to a greater extent, in direct conflict with common digital forensic practices. All indicators point to a scientific approach in the future development of the digital forensic discipline. However, as we move forward to address the new challenges, it is also critical that we continue strengthening the technologies. Finally, new research efforts are required to minimize the gap between regulatory issues and technical implementations.

6. REFERENCES

[1] M. Pollitt, "A History of Digital Forensics," in Advances in Digital Forensics VI, vol. 337, K.-P. Chow and S. Shenoi, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 3–15.

[2] D. B. Parker, Crime by computer. Scribner, 1976.
[3] E. Casey, Digital Evidence and Computer Crime. Academic Press, 2004.
[4] P. Sommer, "The future for the policing of cybercrime," Computer Fraud & Security, vol. 2004, no. 1, pp. 8–12, Jan. 2004.
[5] S. L. Garfinkel, "Digital forensics research: The next 10 years," Digital Investigation, vol. 7, Supplement, pp. S64–S73, 2010.
[6] C. Stoll, "Stalking the wily hacker," Communications of the ACM, vol. 31, no. 5, pp. 484–497, May 1988.
[7] V. Corey, C. Peterman, S. Shearin, M. S. Greenberg, and J. Van Bokkelen, "Network forensics analysis," IEEE Internet Computing, vol. 6, no. 6, pp. 60–66, Dec. 2002.
[8] M. W. Stevens, "Unification of relative time frames for digital forensics," Digital Investigation, vol. 1, no. 3, pp. 225–239, Sep. 2004.
[9] D. Forte, "The importance of text searches in digital forensics," Network Security, vol. 2004, no. 4, pp. 13–15, Apr. 2004.
[10] B. D. Carrier and J. Grand, "A hardware-based memory acquisition procedure for digital investigations," Digital Investigation, vol. 1, no. 1, pp. 50–60, Feb. 2004.
[11] S. Mocas, "Building theoretical underpinnings for digital forensics research," Digital Investigation, vol. 1, no. 1, pp. 61–68, Feb. 2004.
[12] C. Vaughan, "Xbox security issues and forensic recovery methodology (utilising Linux)," Digital Investigation, vol. 1, no. 3, pp. 165–172, Sep. 2004.
[13] B. J. Nikkel, "Domain name forensics: a systematic approach to investigating an internet presence," Digital Investigation, vol. 1, no. 4, pp. 247–255, Dec. 2004.
[14] F. Buchholz and E. Spafford, "On the role of file system metadata in digital forensics," Digital Investigation, vol. 1, no. 4, pp. 298–309, Dec. 2004.
[15] G. A. Francia and K. Clinton, "Computer forensics laboratory and tools," Journal of Computing Sciences in Colleges, vol. 20, no. 6, pp. 143–150, Jun. 2005.
[16] B. J. Nikkel, "Forensic acquisition and analysis of magnetic tapes," Digital Investigation, vol. 2, no. 1, pp. 8–18, Feb. 2005.
[17] W. Jansen and R. Ayers, "An overview and analysis of PDA forensic tools," Digital Investigation, vol. 2, no. 2, pp. 120–132, Jun. 2005.
[18] M. Bedford, "Methods of discovery and exploitation of Host Protected Areas on IDE storage devices that conform to ATAPI-4," Digital Investigation, vol. 2, no. 4, pp. 268–275, Dec. 2005.
[19] W. Harrison, "A term project for a course on computer forensics," Journal on Educational Resources in Computing, vol. 6, no. 3, p. 6–es, Sep. 2006.
[20] A. Laurie, "Digital detective – Bluetooth," Digital Investigation, vol. 3, no. 1, pp. 17–19, Mar. 2006.
[21] B. J. Nikkel, "Improving evidence acquisition from live network sources," Digital Investigation, vol. 3, no. 2, pp. 89–96, Jun. 2006.
[22] R. S. C. Ieong, "FORZA – Digital forensics investigation framework that incorporate legal issues," Digital Investigation, vol. 3, pp. 29–36, 2006.
[23] J. P. Zachman, "The Zachman Framework™ Evolution," EA Articles, 2009.
[24] R. Harris, "Arriving at an anti-forensics consensus: Examining how to define and control the anti-forensics problem," Digital Investigation, vol. 3, Supplement, pp. 44–49, Sep. 2006.
[25] S. L. Garfinkel, "Forensic feature extraction and cross-drive analysis," Digital Investigation, vol. 3, Supplement, pp. 71–81, Sep. 2006.
[26] A. Schuster, "Searching for processes and threads in Microsoft Windows memory dumps," Digital Investigation, vol. 3, Supplement, pp. 10–16, Sep. 2006.
[27] S. Jeyaraman and M. J. Atallah, "An empirical study of automatic event reconstruction systems," Digital Investigation, vol. 3, Supplement, pp. 108–115, Sep. 2006.
[28] W. Alink, R. A. F. Bhoedjang, P. A. Boncz, and A. P. de Vries, "XIRAF – XML-based indexing and querying for digital forensics," Digital Investigation, vol. 3, Supplement, pp. 50–58, Sep. 2006.
[29] A. Johnston and J. Reust, "Network intrusion investigation – Preparation and challenges," Digital Investigation, vol. 3, no. 3, pp. 118–126, Sep. 2006.
[30] S. Mead, "Unique file identification in the National Software Reference Library," Digital Investigation, vol. 3, no. 3, pp. 138–150, Sep. 2006.
[31] B. J. Nikkel, "A portable network forensic evidence collector," Digital Investigation, vol. 3, no. 3, pp. 127–135, Sep. 2006.
[32] S.-J. Wang, H.-J. Ke, J.-H. Huang, and C.-L. Chan, "Concerns about Hash Cracking Aftereffect on Authentication Procedures in Applications of Cyberspace," IEEE Aerospace and Electronic Systems Magazine, vol. 22, no. 1, pp. 3–7, Jan. 2007.
[33] S. Peisert, M. Bishop, S. Karin, and K. Marzullo, "Analysis of Computer Intrusions Using Sequences of Function Calls," IEEE Transactions on Dependable and Secure Computing, vol. 4, no. 2, pp. 137–150, Jun. 2007.
[34] A. Castiglione, A. De Santis, and C. Soriente, "Taking advantages of a disadvantage: Digital forensics and steganography using document metadata," Journal of Systems and Software, vol. 80, no. 5, pp. 750–764, May 2007.
[35] R. Murphey, "Automated Windows event log forensics," Digital Investigation, vol. 4, Supplement, pp. 92–100, Sep. 2007.
[36] A. Spruill and C. Pavan, "Tackling the U3 trend with computer forensics," Digital Investigation, vol. 4, no. 1, pp. 7–12, Mar. 2007.
[37] P. Turner, "Applying a forensic approach to incident response, network investigation and system administration using Digital Evidence Bags," Digital Investigation, vol. 4, no. 1, pp. 30–35, Mar. 2007.
[38] G. G. Richard, V. Roussev, and L. Marziale, "Forensic discovery auditing of digital evidence containers," Digital Investigation, vol. 4, no. 2, pp. 88–97, Jun. 2007.
[39] B. J. Nikkel, "An introduction to investigating IPv6 networks," Digital Investigation, vol. 4, no. 2, pp. 59–67, Jun. 2007.
[40] G. Masters and P. Turner, "Forensic data recovery and examination of magnetic swipe card cloning devices," Digital Investigation, vol. 4, Supplement, pp. 16–22, Sep. 2007.
[41] J. R. Lyle and M. Wozar, "Issues with imaging drives containing faulty sectors," Digital Investigation, vol. 4, Supplement, pp. 13–15, Sep. 2007.
[42] A. R. Arasteh, M. Debbabi, A. Sakha, and M. Saleh, "Analyzing multiple logs for forensic evidence," Digital Investigation, vol. 4, Supplement, pp. 82–91, Sep. 2007.
[43] A. R. Arasteh and M. Debbabi, "Forensic memory analysis: From stack and code to execution history," Digital Investigation, vol. 4, Supplement, pp. 114–125, Sep. 2007.
[44] B. Schatz, "BodySnatcher: Towards reliable volatile memory acquisition by software," Digital Investigation, vol. 4, Supplement, pp. 126–134, Sep. 2007.
[45] M. S. Barik, G. Gupta, S. Sinha, A. Mishra, and C. Mazumdar, "An efficient technique for enhancing forensic capabilities of Ext2 file system," Digital Investigation, vol. 4, Supplement, pp. 55–61, Sep. 2007.
[46] M. Cohen, S. Garfinkel, and B. Schatz, "Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow," Digital Investigation, vol. 6, Supplement, pp. S57–S68, 2009.
[47] M. Taylor, J. Haggerty, D. Gresty, and D. Lamb, "Forensic investigation of cloud computing systems," Network Security, vol. 2011, no. 3, pp. 4–10, 2011.
[48] E. Casey, A. Cheval, J. Y. Lee, D. Oxley, and Y. J. Song, "Forensic acquisition and analysis of palm webOS on mobile devices," Digital Investigation, vol. 8, no. 1, pp. 37–47, 2011.
[49] D. Takahashi, Y. Xiao, Y. Zhang, P. Chatzimisios, and H.-H. Chen, "IEEE 802.11 user fingerprinting and its applications for intrusion detection," Computers & Mathematics with Applications, vol. 60, no. 2, pp. 307–318, 2010.
[50] M. S. Olivier, "On metadata context in Database Forensics," Digital Investigation, vol. 5, no. 3–4, pp. 115–123, 2009.
[51] C. Maartmann-Moe, S. E. Thorkildsen, and A. Årnes, "The persistence of memory: Forensic identification and extraction of cryptographic keys," Digital Investigation, vol. 6, Supplement, pp. S132–S140, 2009.
[52] B. R. Mallio, "Message hiding using steganography, and forensic approaches for discovery," Journal of Computing Sciences in Colleges, vol. 23, no. 3, pp. 6–6, Jan. 2008.
[53] J. Bang, B. Yoo, and S. Lee, "Analysis of changes in file time attributes with file manipulation," Digital Investigation, vol. 7, no. 3–4, pp. 135–144, 2011.
[54] V. G. Cerf, "Defense against the Dark Arts," IEEE Internet Computing, vol. 16, no. 1, p. 96, Feb. 2012.
[55] A. Savoldi, M. Piccinelli, and P. Gubian, "A statistical method for detecting on-disk wiped areas," Digital Investigation, vol. 8, no. 3–4, pp. 194–214, 2012.
[56] F. N. Dezfoli, A. Dehghantanha, R. Mahmoud, N. F. B. M. Sani, and F. Daryabar, "Digital forensic trends and future," International Journal of Cyber-Security and Digital Forensics (IJCSDF), vol. 2, no. 2, pp. 48–76, 2013.
[57] M. Damshenas, A. Dehghantanha, and R. Mahmoud, "A survey on malware propagation, analysis, and detection," International Journal of Cyber-Security and Digital Forensics (IJCSDF), vol. 2, no. 4, pp. 10–29, 2013.
[58] I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, and G. Serra, "A SIFT-Based Forensic Method for Copy–Move Attack Detection and Transformation Recovery," Information Forensics and Security, IEEE Transactions on, vol. 6, no. 3, pp. 1099–1110, 2011.
[59] H. Gou, A. Swaminathan, and M. Wu, "Intrinsic sensor noise features for forensic analysis on scanners and scanned images," Information Forensics and Security, IEEE Transactions on, vol. 4, no. 3, pp. 476–491, 2009.
[60] M. Chen, J. Fridrich, M. Goljan, and J. Lukás, "Determining image origin and integrity using sensor noise," Information Forensics and Security, IEEE Transactions on, vol. 3, no. 1, pp. 74–90, 2008.
[61] H. D. Yuan, "Blind forensics of median filtering in digital images," Information Forensics and Security, IEEE Transactions on, vol. 6, no. 4, pp. 1335–1345, 2011.
[62] B. Mahdian and S. Saic, "Using noise inconsistencies for blind image forensics," Image and Vision Computing, vol. 27, no. 10, pp. 1497–1503, 2009.
[63] H. Farid and M. J. Bravo, "Perceptual discrimination of computer generated and photographic faces," Digital Investigation, vol. 8, no. 3–4, pp. 226–235, 2012.
[64] J. D. Kornblum, "Using JPEG quantization tables to identify imagery processed by software," Digital Investigation, vol. 5, Supplement, pp. S21–S25, 2008.
[65] S. D. Mahalakshmi, K. Vijayalakshmi, and S. Priyadharsini, "Digital image forgery detection and estimation by exploring basic image manipulations," Digital Investigation, vol. 8, no. 3–4, pp. 215–225, 2012.
[66] M.-J. Tsai, C.-S. Wang, J. Liu, and J.-S. Yin, "Using decision fusion of feature selection in digital forensics for camera

[67]

[68]

[69]

[70]

[71]

[72]

[73]

[74]

[75]

source model identification,” Computer Standards & Interfaces, vol. 34, no. 3, pp. 292 – 304, 2012. J. H. Choi, H. Y. Lee, and H. K. Lee, “Color laser printer forensic based on noisy feature and support vector machine classifier,” Multimedia Tools and Applications, pp. 1–20, 2011. M. Islam, P. A. Watters, and J. Yearwood, “Real-time detection of children’s skin on social networking sites using Markov random field modelling,” Information Security Technical Report, vol. 16, no. 2, pp. 51 – 58, 2011. C. M. S. Steel and C.-T. Lu, “Impersonator identification through dynamic fingerprinting,” Digital Investigation, vol. 5, no. 1–2, pp. 60 – 70, 2008. S. Bayram, H. T. Sencar, and N. Memon, “Classification of digital camera-models based on demosaicing artifacts,” Digital Investigation, vol. 5, no. 1–2, pp. 49 – 59, 2008. C. Tang, A. W. K. Kong, and N. Craft, “Using a Knowledge-Based Approach to Remove Blocking Artifacts in Skin Images for Forensic Analysis,” Information Forensics and Security, IEEE Transactions on, vol. 6, no. 3, pp. 1038–1049, 2011. W. S. Lin, S. K. Tjoa, H. V. Zhao, and K. J. R. Liu, “Digital image source coder forensics via intrinsic fingerprints,” Information Forensics and Security, IEEE Transactions on, vol. 4, no. 3, pp. 460–475, 2009. F. Huang, J. Huang, and Y. Q. Shi, “Detecting double JPEG compression with the same quantization matrix,” Information Forensics and Security, IEEE Transactions on, vol. 5, no. 4, pp. 848–856, 2010. M. Kirchner and R. Bohme, “Hiding traces of resampling in digital images,” Information Forensics and Security, IEEE Transactions on, vol. 3, no. 4, pp. 582–592, 2008. L. Miralles and J. Moreno, “Versatile iPad forensic acquisition using the Apple Camera Connection Kit,” Computers

230

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 209-234 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

[76]

[77]

[78]

[79]

[80]

[81]

[82]

[83]

[84]

[85]

& Mathematics with Applications, vol. 63, no. 2, pp. 544 – 553, 2012. V. L. L. Thing, K.-Y. Ng, and E.-C. Chang, “Live memory forensics of mobile phones,” Digital Investigation, vol. 7, Supplement, pp. S74 – S82, 2010. J. Sylve, A. Case, L. Marziale, and G. G. Richard, “Acquisition and analysis of volatile memory from android devices,” Digital Investigation, vol. 8, no. 3–4, pp. 175 – 184, 2012. A. Distefano and G. Me, “An overall assessment of Mobile Internal Acquisition Tool,” Digital Investigation, vol. 5, Supplement, pp. S121 – S127, 2008. T. Vidas, C. Zhang, and N. Christin, “Toward a general collection methodology for Android devices,” Digital Investigation, vol. 8, Supplement, pp. S14 – S24, 2011. K. Jonkers, “The forensic use of mobile phone flasher boxes,” Digital Investigation, vol. 6, no. 3–4, pp. 168 – 178, 2010. P. Owen and P. Thomas, “An analysis of digital forensic examinations: Mobile devices versus hard disk drives utilising ACPO & NIST guidelines,” Digital Investigation, vol. 8, no. 2, pp. 135 – 140, 2011. G. Grispos, T. Storer, and W. B. Glisson, “A comparison of forensic evidence recovery techniques for a windows mobile smart phone,” Digital Investigation, vol. 8, no. 1, pp. 23 – 36, 2011. O. van Eijk and M. Roeloffs, “Forensic acquisition and analysis of the Random Access Memory of TomTom GPS navigation systems,” Digital Investigation, vol. 6, no. 3–4, pp. 179 – 188, 2010. F. Rehault, “Windows mobile advanced forensics: An alternative to existing tools,” Digital Investigation, vol. 7, no. 1–2, pp. 38 – 47, 2010. R. P. Mislan, E. Casey, and G. C. Kessler, “The growing need for on-scene triage of mobile devices,” Digital

[86]

[87]

[88]

[89]

[90]

[91]

[92]

[93]

[94]

[95]

[96]

Investigation, vol. 6, no. 3–4, pp. 112 – 124, 2010. J. Olsson and M. Boldt, “Computer forensic timeline visualization tool,” Digital Investigation, vol. 6, Supplement, pp. S78 – S87, 2009. C. Klaver, “Windows Mobile advanced forensics,” Digital Investigation, vol. 6, no. 3–4, pp. 147 – 167, 2010. R. A. Joyce, J. Powers, and F. Adelstein, “MEGA: A tool for Mac OS X operating system and application forensics,” Digital Investigation, vol. 5, Supplement, pp. S83 – S90, 2008. A. Case, L. Marziale, and G. G. R. III, “Dynamic recreation of kernel data structures for live forensics,” Digital Investigation, vol. 7, Supplement, pp. S32 – S40, 2010. H. Inoue, F. Adelstein, and R. A. Joyce, “Visualization in testing a volatile memory forensic tool,” Digital Investigation, vol. 8, Supplement, pp. S42 – S51, 2011. D. Ayers, “A second generation computer forensic analysis system,” Digital Investigation, vol. 6, Supplement, pp. S34 – S42, 2009. L. Pan and L. M. Batten, “Robust performance testing for digital forensic tools,” Digital Investigation, vol. 6, no. 1–2, pp. 71 – 81, 2009. S. Garfinkel, “Digital forensics XML and the DFXML toolset,” Digital Investigation, vol. 8, no. 3–4, pp. 161 – 174, 2012. B. N. Levine and M. Liberatore, “DEX: Digital evidence provenance supporting reproducibility and comparison,” Digital Investigation, vol. 6, Supplement, pp. S48 – S56, 2009. G. Conti, S. Bratus, A. Shubina, B. Sangster, R. Ragsdale, M. Supan, A. Lichtenberg, and R. Perez-Alemany, “Automated mapping of large binary objects using primitive fragment type classification,” Digital Investigation, vol. 7, Supplement, pp. S3 – S12, 2010. B. Dolan-Gavitt, “Forensic analysis of the Windows registry in memory,”

231

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 209-234 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

[97]

[98]

[99]

[100]

[101]

[102]

[103]

[104]

[105]

[106]

[107]

Digital Investigation, vol. 5, Supplement, pp. S26 – S32, 2008. R. B. van Baar, W. Alink, and A. R. van Ballegooij, “Forensic memory analysis: Files mapped in memory,” Digital Investigation, vol. 5, Supplement, pp. S52 – S57, 2008. A. Schuster, “The impact of Microsoft Windows pool allocation strategies on memory forensics,” Digital Investigation, vol. 5, Supplement, pp. S58 – S64, 2008. K. Saur and J. B. Grizzard, “Locating ×86 paging structures in memory images,” Digital Investigation, vol. 7, no. 1–2, pp. 28 – 37, 2010. R. M. Stevens and E. Casey, “Extracting Windows command line details from physical memory,” Digital Investigation, vol. 7, Supplement, pp. S57 – S63, 2010. S. M. Hejazi, C. Talhi, and M. Debbabi, “Extraction of forensically sensitive information from windows physical memory,” Digital Investigation, vol. 6, Supplement, pp. S121 – S131, 2009. J. Okolica and G. L. Peterson, “Windows operating systems agnostic memory analysis,” Digital Investigation, vol. 7, Supplement, pp. S48 – S56, 2010. J. Okolica and G. L. Peterson, “Extracting the windows clipboard from physical memory,” Digital Investigation, vol. 8, Supplement, pp. S118 – S124, 2011. J. S. Okolica and G. L. Peterson, “Windows driver memory analysis: A reverse engineering methodology,” Computers & Security, vol. 30, no. 8, pp. 770 – 779, 2011. J. R. Rabaiotti and C. J. Hargreaves, “Using a software exploit to image RAM on an embedded system,” Digital Investigation, vol. 6, no. 3–4, pp. 95 – 103, 2010. R. Beverly, S. Garfinkel, and G. Cardwell, “Forensic carving of network packets and associated data structures,” Digital Investigation, vol. 8, Supplement, pp. S78 – S89, 2011. O. Thonnard and M. Dacier, “A framework for attack patterns’ discovery in honeynet data,” Digital Investigation,

[108]

[109]

[110]

[111]

[112]

[113]

[114]

[115]

[116]

[117]

vol. 5, Supplement, pp. S128 – S139, 2008. B. Shebaro and J. R. Crandall, “Privacypreserving network flow recording,” Digital Investigation, vol. 8, Supplement, pp. S90 – S100, 2011. Y. Zhu, “Attack Pattern Discovery in Forensic Investigation of Network Attacks,” Selected Areas in Communications, IEEE Journal on, vol. 29, no. 7, pp. 1349–1357, 2011. M. I. Cohen, “Source attribution for network address translated forensic captures,” Digital Investigation, vol. 5, no. 3–4, pp. 138 – 145, 2009. H.-M. Hsu, Y. S. Sun, and M. C. Chen, “Collaborative scheme for VoIP traceback,” Digital Investigation, vol. 7, no. 3–4, pp. 185 – 195, 2011. A. Distefano, G. Me, and F. Pace, “Android anti-forensics through a local paradigm,” Digital Investigation, vol. 7, Supplement, pp. S83 – S94, 2010. H. M. Sun, C. Y. Weng, C. F. Lee, and C. H. Yang, “Anti-forensics with steganographic data embedding in digital images,” Selected Areas in Communications, IEEE Journal on, vol. 29, no. 7, pp. 1392–1403, 2011. M. C. Stamm and K. J. R. Liu, “Antiforensics of digital image compression,” Information Forensics and Security, IEEE Transactions on, vol. 6, no. 3, pp. 1050–1065, 2011. H. Khan, M. Javed, S. A. Khayam, and F. Mirza, “Designing a cluster-based covert channel to evade disk investigation and forensics,” Computers & Security, vol. 30, no. 1, pp. 35 – 49, 2011. S. Rekhis and N. Boudriga, “A System for Formal Digital Forensic Investigation Aware of Anti-Forensic Attacks,” Information Forensics and Security, IEEE Transactions on, vol. 7, no. 2, pp. 635–650, 2012. E. Casey, G. Fellows, M. Geiger, and G. Stellatos, “The growing impact of full disk encryption on digital forensics,” Digital Investigation, vol. 8, no. 2, pp. 129 – 134, 2011.

232

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 209-234 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

[118] M. T. Pereira, “Forensic analysis of the Firefox 3 Internet history and recovery of deleted SQLite records,” Digital Investigation, vol. 5, no. 3–4, pp. 93 – 103, 2009. [119] B. Yoo, J. Park, S. Lim, J. Bang, and S. Lee, “A study on multimedia file carving method,” Multimedia Tools and Applications, pp. 1–19, 2012. [120] T. D. Morgan, “Recovering deleted data from the Windows registry,” Digital Investigation, vol. 5, Supplement, pp. S33 – S41, 2008. [121] S. Garfinkel, A. Nelson, D. White, and V. Roussev, “Using purpose-built functions and block hashes to enable small block and sub-file forensics,” Digital Investigation, vol. 7, Supplement, pp. S13 – S23, 2010. [122] H. Chivers and C. Hargreaves, “Forensic data recovery from the Windows Search Database,” Digital Investigation, vol. 7, no. 3–4, pp. 114 – 126, 2011. [123] C. King and T. Vidas, “Empirical analysis of solid state disk data retention when used with contemporary operating systems,” Digital Investigation, vol. 8, Supplement, pp. S111 – S117, 2011. [124] M. Yasin, A. R. Cheema, and F. Kausar, “Analysis of Internet Download Manager for collection of digital forensic artefacts,” Digital Investigation, vol. 7, no. 1–2, pp. 90 – 94, 2010. [125] H. S. Lallie and P. J. Briggs, “Windows 7 registry forensic evidence created by three popular BitTorrent clients,” Digital Investigation, vol. 7, no. 3–4, pp. 127 – 134, 2011. [126] J. Oh, S. Lee, and S. Lee, “Advanced evidence collection and analysis of web browser activity,” Digital Investigation, vol. 8, Supplement, pp. S62 – S70, 2011. [127] J. Lewthwaite and V. Smith, “Limewire examinations,” Digital Investigation, vol. 5, Supplement, pp. S96 – S104, 2008. [128] G. Fellows, “WinRAR temporary folder artefacts,” Digital Investigation, vol. 7, no. 1–2, pp. 9 – 13, 2010. [129] J. Grier, “Detecting data theft using stochastic forensics,” Digital

[130]

[131]

[132]

[133]

[134]

[135]

[136]

[137]

[138]

[139]

Investigation, vol. 8, Supplement, pp. S71 – S77, 2011. N. L. Beebe, S. D. Stacy, and D. Stuckey, “Digital forensic implications of ZFS,” Digital Investigation, vol. 6, Supplement, pp. S99 – S107, 2009. B. D. Carrier, “Different interpretations of ISO9660 file systems,” Digital Investigation, vol. 7, Supplement, pp. S129 – S134, 2010. M. Xu, H.-R. Yang, J. Xu, Y. Xu, and N. Zheng, “An adaptive method to identify disk cluster size based on block content,” Digital Investigation, vol. 7, no. 1–2, pp. 48 – 55, 2010. T. Kavallaris and V. Katos, “On the detection of pod slurping attacks,” Computers & Security, vol. 29, no. 6, pp. 680 – 685, 2010. E. S. Pilli, R. C. Joshi, and R. Niyogi, “Network forensic frameworks: Survey and research challenges,” Digital Investigation, vol. 7, no. 1–2, pp. 14 – 27, 2010. J. Beckett and J. Slay, “Scientific underpinnings and background to standards and accreditation in digital forensics,” Digital Investigation, vol. 8, no. 2, pp. 114 – 121, 2011. S. Garfinkel, P. Farrell, V. Roussev, and G. Dinolt, “Bringing science to digital forensics with standardized forensic corpora,” Digital Investigation, vol. 6, Supplement, pp. S2 – S11, 2009. Y. Guo, J. Slay, and J. Beckett, “Validation and verification of computer forensic software tools—Searching Function,” Digital Investigation, vol. 6, Supplement, pp. S12 – S22, 2009. C. Shields, O. Frieder, and M. Maloof, “A system for the proactive, continuous, and efficient collection of digital forensic evidence,” Digital Investigation, vol. 8, Supplement, pp. S3 – S13, 2011. S.-J. Wang, D.-Y. Kao, and F. F.-Y. Huang, “Procedure guidance for Internet forensics coping with copyright arguments of client-server-based P2P models,” Computer Standards & Interfaces, vol. 31, no. 4, pp. 795–800, Jun. 2009.

233

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 209-234 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

[140] M. I. Cohen, D. Bilby, and G. Caronni, “Distributed forensics and incident response in the enterprise,” Digital Investigation, vol. 8, Supplement, pp. S101 – S110, 2011. [141] D. Kahvedžić and T. Kechadi, “DIALOG: A framework for modeling, analysis and reuse of digital forensic knowledge,” Digital Investigation, vol. 6, Supplement, pp. S23 – S33, 2009. [142] A. Case, A. Cristina, L. Marziale, G. G. Richard, and V. Roussev, “FACE: Automated digital evidence discovery and correlation,” Digital Investigation, vol. 5, Supplement, pp. S65 – S75, 2008. [143] N. R. Poole, Q. Zhou, and P. Abatis, “Analysis of CCTV digital video recorder hard disk storage system,” Digital Investigation, vol. 5, no. 3–4, pp. 85 – 92, 2009. [144] D.-Y. Kao, S.-J. Wang, and F. F.-Y. Huang, “SoTE: Strategy of Triple-E on solving Trojan defense in Cyber-crime cases,” Computer Law & Security Review, vol. 26, no. 1, pp. 52 – 60, 2010. [145] E. Serrano, A. Quirin, J. Botia, and O. Cordón, “Debugging complex software systems by means of pathfinder networks,” Information Sciences, vol. 180, no. 5, pp. 561 – 583, 2010. [146] A. Dehghantanha, N. Udzir, and R. Mahmod, “Evaluating user-centered privacy model (UPM) in pervasive computing systems,” Computational Intelligence in Security for Information Systems, pp. 272–284, 2011. [147] C. Sagaran, A. Dehghantanha, and R. Ramli, “A User-Centered Contextsensitive Privacy Model in Pervasive Systems,” in Communication Software and Networks, 2010. ICCSN’10. Second International Conference on, 2010, pp. 78–82. [148] S. Biggs and S. Vidalis, “Cloud Computing: The impact on digital forensic investigations,” in Internet Technology and Secured Transactions, 2009. ICITST 2009. International Conference for, 2009, pp. 1 –6. [149] M. Damshenas and A. Dehghantanha, “Forensics Investigation Challenges in

Cloud Computing Environments,” presented at the The International Conference on Cyber Security, Cyber Warfare and Digital Forensic, Kuala Lumpur, Malaysia, 2012, pp. 190–194. [150] F. Daryabar, A. Dehghantanha, N. I. Udzir, N. F. binti M. Sani, and S. bin Shamsuddin, “A REVIEW ON IMPACTS OF CLOUD COMPUTING ON DIGITAL FORENSICS,” International Journal of Cyber-Security and Digital Forensics (IJCSDF), vol. 2, no. 2, pp. 77–94, 2013. [151] T. Hasani and A. Dehghantanha, “A Guideline to Enforce Data Protection and Privacy Digital Laws in Iran,” Proceedings of International Conference on Software and Computer Applications (ICSCA 2011), pp. 3–6, 2011. [152] P. Hunton, “The stages of cybercrime investigations: Bridging the gap between technology examination and law enforcement investigation,” Computer Law & Security Review, vol. 27, no. 1, pp. 61 – 67, 2011.

234

International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 235-245 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

e-Fraud Forensics Investigation Techniques with Formal Concept Analysis

WAZIRI, Victor Onomza (1); Abdullahi Umar (2); MORUFU Olalere (3)
(1, 3) Cyber Security Science Department; (2) Department of Computer Science
School of Information and Communication Technology
Federal University of Technology, Minna, Nigeria

Abstract

One of the cardinal impacts of Cyber Security (CS) is the combating of financial e-fraud and other related crimes. Formal Concept Analysis (FCA) can serve as a useful weapon in the detection and retrieval of various criminal activities perpetrated in cyberspace. This paper proposes a CS-based investigation process through visualization and data analysis of mobile communication devices using Formal Concept Analysis, a data analysis technique based on lattice theory and propositional calculus. The method employed visualizes the lattice, which may be conceived as a set of common and distinct attributes of data, in such a way that classifications are done based on related data with respect to the time and events of the crime within some Internet geographical space. The lattices were then built using the Galicia 3.2 lattice-building software from data on criminal activities collected from mobile phones, laptops, etc. The results obtained were used in building a more defined and conceptualized system for the analysis of e-fraud data in cyberspace that can be easily visualized and intelligently analysed by cyberspace computational system processes.

Keywords: Financial crime, lattice theory, formal concept analysis, data analysis and visualization

1. INTRODUCTION

Radim (2008) propounded Formal Concept Analysis (FCA) as a method of data analysis that describes the relationship between a particular set of objects and a particular set of attributes. FCA produces two kinds of output from the input data. The first is the concept lattice, which is a collection of formal concepts in the data that are hierarchically ordered by a sub-concept/super-concept relationship. The formal concepts are particular clusters which represent natural human-like concepts, such as may be visualized in an international network of botnet syndicates. The second output of FCA consists of attribute implications, which describe particular dependencies that are valid in the data in the syndicate's cycle of operations. The most important feature of FCA is the integration of three components (a, b, c) (where "a" signifies discovery, "b" reasoning and "c" output) of the conceptual processing of data and knowledge. These FCA components, Discovery and Reasoning, underline the formation of concepts in data lattices or a network syndicate. Discovery and Reasoning involve dependencies in the database: visualizing the database and picturing the conceptualization of the dependencies with the current folding or unfolding capability outputs. This could be visualized conceptually as in Figure 1.

Figure 1: Concept lattice that shows a visualized construe of FCA. This could be construed as a botnet syndicate over some specified geographical space.

Virtually all societies in the modern world are troubled by related fraud activities such as e-fraud in some abstracted cybercrime environment in this modern cyberspace advancement. Most of these cybercrimes are carried out by individuals or organized groups in the form of syndicates over large geographical territories. While the cybercrime rate may vary from one country to another and from one region to another, black-hat hacker behaviour remains a cause for concern amongst most members of the public, commercial and governmental institutions; it thus hinders e-Governance and e-commerce in the efficient utilization of administrative secret information and in gaining substantial financial benefits, respectively. Cybercrime prevention and control have been a government prerogative through various enacted law enforcement agencies. With the increasing use of computerized systems to track cybercrime penetrations and malware detections, computer data analysis has begun to assist law enforcement agencies and detectives in speeding up the process of preventing crimes in various ramifications, especially in digital forensics analysis.

The most efficient and effective way of fighting cybercrimes today cannot be construed without geographical profiling. Black-hat activities have become so complex that rapid monitoring can only be achieved by using intelligent sensors within geographical components (Kester, 2013) and analyzing them with efficient analytical tools. Digital forensic investigative approaches are needed to analyse connected series of crime data on the Internet to determine the most probable area of offender residence (though this could be difficult due to the proxies used by botnet masters). By incorporating both qualitative and quantitative methods, we shall assist in understanding the spatial behaviour of an offender and in focusing the investigation on a smaller area of the local network. Typically used in cases of serial murder or theft, the technique could help police detectives to prioritize information in large-scale major crime investigations that often involve hundreds of interconnected nodes and informants (Wikipedia, 2013). A criminal pattern of network computer analysis at some given location on the Internet is very crucial in cybercrime detection. Discovering the relationships between computer systems on the local networks has to be engaged in order to gather and interpret intelligence so as to control the network interrelationship connectivity as well as influence effective decision making, as conceptualized diagrammatically in Figure 2 below.

Figure 2: Pattern analysis theory

Figure 2 describes some unary binary relationship that exists between some crime pivotal spot as the root for discovery through intelligence analysis being applied to decipher the crime; it therefore deduces some intelligence decision that would make an impact on the crime's interoperable scene and the root.

This paper proposes a CS-based digital forensic investigation for an abstractive e-fraud environment by visualizing and analyzing data from mobile communication devices using formal contexts and Galois lattices, a data analysis technique based on lattice theory and propositional calculus. This method considers the set of common and distinct attributes of data in such a way that categorizations are done based on related data found on evidence nodes with respect to time and events. All these deductions are done without a concrete adduced dataset. This will help in building more defined and efficient conceptualized systems for the analysis of digital forensics data that can easily be intelligently analyzed and visualized by computer systems in the mobile computing space.

The rest of the paper is structured as follows: Section 2 reviews some of the literature; Section 3 gives an in-depth theoretical methodology of FCA; Section 4 depicts some experimental evidence for Section 3 and its interpretation, giving three basic visualization figures for an intelligent abstraction; and Section 5 summarizes the research through some short statements and some vital recommendations for future research work.

2.0 RELATED WORKS

Cybercrime activities are geospatial phenomena on the Internet and as such are geospatially, thematically and temporally correlated. Thus, cybercrime datasets must be interpreted and analysed in conjunction with the various factors that can contribute to the formulation of a specific crime on the Internet. Discovering these correlations allows a deeper insight into the complex nature of cyber-criminal behavioural patterns (McGarrell & Schlegel, 1993). There are challenges faced in today's world in terms of cybercrime analysis when it comes to graphical visualization of crime patterns. Geographical representation of cybercrime scenes and crime types has become very important in gathering intelligence about crimes. This provides a very dynamic and easy way of monitoring cyber-criminal activities and analysing them, as well as producing effective countermeasures and preventive measures for solving them. Kester (ibid.) proposed a new method of visualizing and analysing crime patterns based on geographical crime data. In the United States, the FBI collaborates with local law enforcement and prosecutors to share intelligence, and such efforts through teamwork have demonstrated effectiveness in addressing traditional crime involving drugs, weapons, gangs, and violence (McGarrell & Schlegel, 1993; Russell-Einhorn, 2004). By extension, many scholars and practitioners have asserted the importance of forming comparable teams to combat digital forensics crime, with the hope of similar positive outcomes (Conly & McEwen, 1990).

Rogerson and Sun (2001) described a new procedure for testing changes over time in the spatial pattern of point events, combining the nearest-neighbour statistic and cumulative sum methods. The method results in the rapid detection of deviations from expected geographical patterns, and it was illustrated using 1996 arson data from the Buffalo, NY, Police Department. The appearance of patterns can be found in different modalities of a domain, where the different modalities refer to the data sources that constitute different aspects of the domain. In particular, the domain here refers to crime, and the different modalities refer to different data sources within the crime domain, such as offender data, weapon data, etc. In addition, patterns also exist at different levels of granularity for each modality. In order to have a thorough understanding of a domain, it is important to reveal hidden patterns through data exploration at different levels of granularity and for each modality. Therefore, Yee Ling Boo and Alahakoon presented a new model for identifying patterns that exist at different levels of granularity for different modes of crime data. A hierarchical clustering approach, growing self-organizing maps (GSOM), was deployed, and the model was further enhanced with experiments that exhibit the significance of exploring data at different granularities (Boo and Alahakoon, 2008).

Formal Concept Analysis (FCA) is a method of data analysis with growing popularity across various domains that can easily be extended to cyberspace. FCA analyses data that describe the relationship between a particular set of objects and a particular set of attributes. Such data commonly appear in many areas of human activity. FCA produces two kinds of output from the input data. The first is a concept lattice: a collection of formal concepts in the data which is hierarchically ordered by a sub-concept/super-concept relation. Formal concepts are particular clusters which represent natural human-like concepts such as "organism living in the water", "car with all-wheel-drive system", "number divisible by 3 and 4", etc. The second output of FCA is a collection of so-called attribute implications. An attribute implication describes a particular dependency which is valid in the data, such as "every number divisible by 3 and 4 is divisible by 6", "every respondent with age over 60 is retired", etc. (Radim, 2008). Modern police organizations and intelligence services are adopting the use of FCA in crime pattern analysis for tracking down criminal suspects through the integration of heterogeneous data sources and by visualising this information so that a human expert can gain insight into the data (Russell-Einhorn, 2004).

3.0 METHODOLOGY

In this section, we give a detailed mathematical representation of the FCA analysis [12]. FCA can also be used in data analysis, information retrieval (a useful tool for digital forensics and Internet network security analyses) and knowledge discovery. It is also useful as a conceptual clustering method, which simultaneously clusters objects and gives formative clearance for their descriptions, and it is applicable for efficiently computing association rules. Thus, FCA treats concepts as units of thought that consist of two main parts:

a. The extension, which consists of all objects belonging to the concept; and
b. The intension, which consists of all attributes common to all those objects.

Thus, as outlined in [12, 13], we present the FCA formalism as follows.

A formal context is a triple (G, M, I), where:
• G is a set of objects;
• M is a set of attributes; and
• I ⊆ G × M is a binary relation between G and M; (g, m) ∈ I is read as "object g has attribute m".

For A ⊆ G, we define
A′ := {m ∈ M | ∀g ∈ A : (g, m) ∈ I},
and for B ⊆ M, we define dually
B′ := {g ∈ G | ∀m ∈ B : (g, m) ∈ I}.
For A, A1, A2 ⊆ G it holds that A1 ⊆ A2 implies A2′ ⊆ A1′, and A ⊆ A″ (and dually for subsets of M).

A formal concept is a pair (A, B), where A ⊆ G and B ⊆ M, with A′ = B and B′ = A. Every object in A has every attribute in B; for every object in G that is not in A, there is an attribute in B that the object does not have; and for every attribute in M that is not in B, there is an object in A that does not have that attribute. In particular, if g ∈ A and m ∈ B then (g, m) ∈ I, also written g I m.
• A is called the extent of the concept; and
• B is called the intent of the concept.

The concept lattice of a formal context (G, M, I) is the set of all formal concepts of (G, M, I), together with the partial order
(A1, B1) ≤ (A2, B2) :⇔ A1 ⊆ A2 (equivalently, B2 ⊆ B1).
The concept lattice is denoted by B(G, M, I).

Theorem: The concept lattice is a lattice, i.e. for any two concepts (A1, B1) and (A2, B2) there is always
• a greatest common sub-concept: (A1 ⋂ A2, (B1 ⋃ B2)″); and
• a least common super-concept: ((A1 ⋃ A2)″, B1 ⋂ B2).
More generally, it is even a complete lattice, i.e. the greatest common sub-concept and the least common super-concept exist for all (finite and infinite) sets of concepts.

Corollary: The set of all concept intents of a formal context is a closure system. The corresponding closure operator is h(X) := X″. An implication X → Y holds in a context if every object having all attributes in X also has all attributes in Y.

Def.: Let X ⊆ M. The attributes in X are independent if there are no trivial dependencies between them.
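As a concrete illustration of the derivation operators and the concept definition above, the short Python sketch below enumerates all formal concepts of a small context by brute force. It is an illustrative sketch only: the object and attribute names are hypothetical, and this is not the lattice-building software used in the paper (Galicia 3.2).

# Illustrative sketch (hypothetical data, not the paper's dataset): brute-force
# enumeration of all formal concepts (A, B) of a small context (G, M, I),
# using the derivation operators A' and B' defined above.
from itertools import combinations

G = {"event1", "event2", "event3"}            # objects, e.g. e-fraud events
M = {"P1", "P2", "P3"}                        # attributes, e.g. suspects
I = {("event1", "P1"), ("event1", "P2"),      # binary relation, a subset of G x M
     ("event2", "P2"), ("event2", "P3"),
     ("event3", "P1"), ("event3", "P2"), ("event3", "P3")}

def up(A):
    """A' : the attributes shared by every object in A."""
    return {m for m in M if all((g, m) in I for g in A)}

def down(B):
    """B' : the objects that have every attribute in B."""
    return {g for g in G if all((g, m) in I for m in B)}

def concepts():
    """All pairs (A, B) with A' = B and B' = A, i.e. A is a closed extent."""
    found = []
    for r in range(len(G) + 1):
        for A in map(set, combinations(sorted(G), r)):
            B = up(A)
            if down(B) == A:
                found.append((A, B))
    return found

for extent, intent in concepts():
    print(sorted(extent), "|", sorted(intent))

Running the sketch prints each concept as an (extent, intent) pair; ordering these pairs by inclusion of extents yields the concept lattice described above.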

4.0 EXPERIMENTAL RESULTS

Let us consider e-fraud events such that F is the set of events and E1, E2, …, En are subsets of F. Let E = [E1, E2, …, En] = (Financial fraud, Money laundering, ATM bombing, ...). Let the objects be the events in some sub-geographical jurisdiction regions (Internet network) for the e-fraud (in case the e-fraud is covered by syndicates involving different geographical regions through a hacking connectivity arrangement such as a botnet), in which the occurrence of events is a set depicted by the zombie nodes (A, B, C, D, E, F, G, H, I, J and K) as indicated on the Figure 3 map. Note that each cluster in Figure 3 may consist of zombies: a set of compromised computational nodes. Let P be the set of suspects intended to be investigated, and let P1, P2, …, Pn be the elements that belong to P. The concept lattice of the context, as shown in Figure 4, is the set of all formal concepts, which are aggregated to form clusters.
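To make the closure operator h(X) = X″ and the implication test from the Corollary concrete, the sketch below checks whether an attribute implication X → Y holds in a small context. The events, suspects and locations are hypothetical placeholders, not the actual data of Table 1.

# Illustrative sketch: checking an attribute implication X -> Y in a formal
# context via the closure operator h(X) = X''. All data below is hypothetical.
G = {"E1", "E2", "E3"}                        # objects: e-fraud events
M = {"P1", "P2", "a", "b"}                    # attributes: suspects and locations
I = {("E1", "P1"), ("E1", "a"),
     ("E2", "P1"), ("E2", "P2"), ("E2", "a"),
     ("E3", "P2"), ("E3", "b")}

def up(A):      # A' : attributes common to all objects in A
    return {m for m in M if all((g, m) in I for g in A)}

def down(B):    # B' : objects having all attributes in B
    return {g for g in G if all((g, m) in I for m in B)}

def closure(X):
    """h(X) = X'' : the smallest concept intent containing X."""
    return up(down(X))

def implication_holds(X, Y):
    """X -> Y holds iff every object with all attributes in X also has all
    attributes in Y, i.e. iff Y is contained in the closure of X."""
    return Y <= closure(X)

print(implication_holds({"P1"}, {"a"}))       # True: every event involving P1 is at location a
print(implication_holds({"P2"}, {"b"}))       # False: E2 involves P2 but is not at location b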

Figure 3: e-Fraud Syndicate Network Connectivity

Table 1: Events x (Geographical locations and Persons)



Figure 4: Galois lattices of Intent and Extent



Figure 5: Galois lattice

INTERPRETATION

The Galois lattice (graph) is generated from the input data table (formal context) in Figure 2. The objects (events or crimes) are represented in the rows and the attributes (suspects/persons) are represented in the columns. The symbol X indicates a binary relation between an object and an attribute. The numbers 1, 2, 3, …, n represent the objects (crimes); the letters a, b, c, …, k represent the geographical locations where the offences are committed, as can be seen from Figure 1; and P1, P2, …, Pn represent the suspects involved in the various crimes.

The graph (Galois lattice) is made up of the following features:
• the concepts, in hierarchical order, represented by values in small ellipses;
• the line diagram showing the relationships;
• the objects (events or crimes), otherwise called the "Extent" (E);
• the attributes (people involved in the crime), otherwise called the "Intent" (I); and
• the geographical locations a, b, e, f, …, k.

Note: a concept represents the set of events sharing the same values for a certain set of attributes. In this paper, we consider the attributes of concepts 24, 25, 26 and 49 in relation to the attributes of concept 28 from Figure 3, as indicated in the second hierarchy of the concepts. Thus:



Figure 6: Extraction of Figure 3

Formal Concept Analysis (FCA) was used to analyze crime data based on information gathered from suspects' mobile communication devices such as mobile phones, tablets, etc. Visualization of relationships between the occurrences of various crime events within different geographical areas was achieved successfully. This method considered the set of common and distinct attributes of the data in such a way that categorization was done based on relationships between concepts. The results from the approach will help in building more defined and conceptual systems that make data relationships easy to visualize and to analyze intelligently with ICT-based investigation systems.

5.0 RECOMMENDATION/CONCLUSION

Nowadays, cybercrimes over the Internet are increasing by leaps. Cybercrimes require more intelligent law enforcement departments for successfully identifying and retrieving details from hard disks for scientific analysis and preparing them for comprehensive presentation in the courts of law. Nonetheless, retrieving the crime details needs the development of efficient software through intensive research work. FCA tools need further improvement, since those used today were designed in the early 2000s. It is hoped that past knowledge can be assimilated with current observations of computer-related criminality to inform and guide the science of police investigations in the future.

References

[1] Hinduja, S. (2007): Computer crime investigations in the United States: leveraging knowledge from the past to address the future. International Journal of Cyber Criminology, 1(1), 1-26.
[2] Kester, Q. A. (2013): Visualization and analysis of geographical crime patterns using formal concept analysis. International Journal of Remote Sensing and Geoscience (IJRSG), 2(1), 30-35.
[3] Geographic profiling, retrieved from http://en.wikipedia.org/wiki/Geographic_profiling
[4] Russell-Einhorn, M. L. (2004): Federal-Local Law Enforcement Collaboration in Investigating and Prosecuting Urban Crime, 1982-1999: Drugs, Weapons, and Gangs (No. NCJ201782). National Institute of Justice.
[5] McGarrell, E. F., & Schlegel, K. (1993): The implementation of federally funded multi-jurisdictional task forces: Organizational structure and inter-agency relationships. Journal of Criminal Justice, 21(3), 231-244.
[6] Conley, C. H., & McEwen, J. T. (1990): Computer crime. U.S. Department of Justice, National Institute of Justice.
[7] Rogerson, P., & Sun, Y. (2001): Spatial monitoring of geographic patterns: an application to crime analysis. Computers, Environment and Urban Systems, 25(6), 539-556.
[8] Boo, Y. L., & Alahakoon, D. (2008): Mining multi-modal crime patterns at different levels of granularity using hierarchical clustering. In Computational Intelligence for Modelling, Control & Automation, 2008 International Conference on, pp. 1268-1273, 10-12 Dec. 2008.
[9] Bělohlávek, R. (2008): Introduction to Formal Concept Analysis. Olomouc.
[10] Security Informatics, Special issue on Formal Concept Analysis in Intelligence and Security Informatics. Springer, 2012.
[11] Formal Concept Analysis homepage, from: http://www.upriss.org.uk/tea/tea.html
[12] Stumme, G. (2002): A Tutorial on Formal Concept Analysis.
[13] Belohlavek, R. (2008): Introduction to Concept Analysis. Department of Computer Science, Palacky University, Olomouc.

Bibliography

Dr. Victor O. Waziri is an Associate Professor in the Department of Cyber Security, Federal University of Technology, Minna, Nigeria. His computational research is based on computational intelligence with applications to cyber-security-related problems. In most cases, Matlab, Maple and Mathematica are the basis for his modeling and simulations in modern cryptographic analyses. His research also extends into computational optimization and zero-day malware detection. He has published many papers in reputable journals at both international and local levels. He lectures on various courses in the Department, including Cryptography, Network Security, Cloud Security, Data Mining, Computational Theory, Automata and Programming Languages.

Morufu Olalere was born in Ode-omu, Osun State, Nigeria, in 1980. He holds a B.Tech in Industrial Mathematics/Computer Science and an M.Sc. in Computer Science (Biometrics) from the Federal University of Technology, Akure, and the University of Ilorin, Nigeria, in 2005 and 2011 respectively. He is a lecturer in the Department of Cyber Security Science, Federal University of Technology, Minna, Nigeria, and is currently on study fellowship as a PhD student in the Faculty of Computer Science and Information Technology, University Putra Malaysia. His research interests include biometrics, network security, cryptography and security in Bring Your Own Device. Mr. Olalere is a member of the Computer Professionals Registration Council of Nigeria (CPN), a member of the Nigeria Computer Society (NCS) and a member of the Association for Information Systems (AIS). He is a certified ethical hacker.



Abdullahi Bn Umar is a lecturer in the Department of Computer Science, Federal College of Education, Kano. He holds a PGDE, TRCN certification and a B.Tech in Computer Physics from the Federal University of Technology, Minna, and is currently undergoing an M.Tech degree program in computer science at the same university (FUTMIN). Abdullahi Bn Umar is a member of the African Educational Research Network (AERN) and has contributed to the literature through article publications in reputable journals.


International Journal of Cyber-Security and Digital Forensics (IJCSDF) 3(4): 246-261 The Society of Digital Information and Wireless Communications, 2014 (ISSN: 2305-0012)

Security Breaches, Network Exploits and Vulnerabilities: A Conundrum and an Analysis

Oredola A. Soluade
Iona College, Hogan School of Business
New Rochelle, New York 10801
[email protected]

Emmanuel U. Opara
College of Business, Prairie View A&M University
Prairie View, Texas, U.S.A.
[email protected]

ABSTRACT

Enterprise systems are continuously under cyber-attack while the struggle for solutions goes on. Today, these organizations spend over $70 billion on IT security, but they are unable to protect themselves, since cyber criminals routinely discover exploits and breach their defenses with zero-day attacks that bypass traditional technologies. These attacks occur during the vulnerability window that exists between the time a loophole is first exploited and the time software developers develop and publish a counter to that threat. Many organizations have been breached by the zero-day attack concept, which means that at least one attacker has bypassed all layers of the organization's defense-in-depth architecture. Using data from our survey of 202 participants, this study analyses how attacks are changing the cyber platform and why traditional and legacy defenses are functioning below par and below expectations.

KEYWORDS: Security Breaches, Vulnerabilities, Threats, Malware, Adware, Exploits.

1 INTRODUCTION

Enterprise systems function in a globally-connected world which is constantly witnessing globally-distributed cyber threats. Studies have indicated that these threats are not restricted by geographical boundaries, but are targeted at all technologies, hardware/software/service providers, end-users, consumers, and the private and public sectors alike. The cost and frequency of these incidents are on the rise. These threats have evolved into a new dimension of interest and concern, gaining political and societal attention. Understanding the exponential growth and magnitude of cyber threats, and the future tasks and responsibilities associated with cybersecurity, is critical to enterprise systems' competitiveness, survival and profitability. Cyber-attacks are now facts of organizational concern because their rates of occurrence are growing exponentially [1]. Today's security defenses are failing because organizations' legacy platforms leverage technologies that depend on signatures as security authentication mechanisms. These platforms may be good at blocking basic malware that is known and documented; however, legacy platform technologies do not stand a chance against today's sophisticated, dynamic cyber-attacks that occur across multiple vectors and stages of the cyber platform. Examples are biological weaponry, nukes, climate change, zero-day attacks, and transactional crimes. An example for bio-threats is the public health community's dream of making the arrival and spread of communicable diseases as easy to predict and track as the weather. This study found that potential exploits and threat levels have escalated to the point that, with time, motivation and funding, a resolute criminal will likely be able to penetrate any system that is accessible directly from the Internet.

In 2013, hackers concentrated on placing malware content on enterprise systems and organizations' websites as a means to gain entrance into the network. These criminals employ "drive-by downloads" from fake websites as well as "water hole" attacks to exploit potential vulnerabilities. Also in the same year, a press report described pages on an official U.S. Government website that contained content promoting third-party products enriched with malicious payloads. As these threats evolve, they can infiltrate the interior of the network: the core, the distribution layer and the user access edge, where defense and visibility are minimal. Once inside a system, the payload quietly targets specific assets and individuals in the enterprise system. Most of the objectives of such attacks are to collect and exfiltrate intellectual property or state/trade secrets for competitive advantage within an industry, and for economic and sociopolitical ends [2].

The fact remains that current antivirus solutions are not capable of eradicating targeted attacks. Studies have shown that organizations' Industrial Control System (ICS) platforms are targets for cyber attackers. These include automation, process control, access control devices, system accounts and asset information that are considered valuable to attackers. Cyber criminals can leverage the lack of corporate security policies, procurement language and asset inventory that is present in many Industrial Control Systems (ICS). A coordinated and known security exploit can be combated by a structured and adequately designed security infrastructure that includes preventive mechanisms such as intrusion prevention, antivirus, content security and internal and external firewalls. However, targeted threats designed for specific exploits pose a tougher challenge. Further, attackers could come from nation states, insiders and other trusted parties such as contractors or vendors. Hacktivists, or politically motivated attackers, and script kiddies are also included in this group. Based on these facts, the potential risk and challenges are very high. The key in this study is to identify what is at stake and the key challenges in gaining visibility into customized threats, and also to create threat awareness in areas where systems could be vulnerable. These include, but are not limited to, the following areas:

[I] Network Access: Malware can spread vertically through the network via trusted system-to-system connections or VPN, because it is easy for it to maneuver undetected through a controlled platform.
[II] System Management: When there are extensive delays by security professionals in patching and operating system upgrades, attackers can exploit such systems. Further, criminals can leverage default usernames and


passwords or weak authentication mechanisms.
[III] Supply Chain: Criminals can attack third-party vendors, contractors or integrators in an attempt to exploit an ICS environment asset owner or multiple enterprise systems.
[IV] Interconnects: Criminals can attack ICS systems by exploiting applications that communicate through network segmentation, because many ICS platforms are susceptible to network-based man-in-the-middle attacks.

This study will provide actionable intelligence to ensure that enterprise systems discover breaches and mitigate exploits as they occur in the organization. In survey questions 8-22 [Appendix 1], we asked respondents who the top cyber-threats facing their organization are. This question was raised because most members of security teams do not agree on what constitutes the significant threats to their organizations. We also asked survey respondents what type of proactive tools they use to counter zero-day attacks and Advanced Persistent Threats [APT], a commonly used term for remote attacks employed by sophisticated threat actors. These actors could be nation states or their intelligence services. Some of these proactive tools include the following:

• Malware
• Outbound Traffic
• Rogue Device
• Geolocation of IP Traffic
• Distribution intrusion detection systems (DIDS)
• Deep Packet Inspection [DPI]
• External Footprint
• Co-operating security managers (CSM)
• Watermarking/tagging

The findings from this study will articulate the current cyber security measures that enterprise systems will have to deploy to counter vulnerabilities, potential breaches and threats.

2 LITERATURE REVIEW

In 2013, a study showed that at least ten major banks were attacked by "hacktivists". These attacks were motivated by individuals whose goal is to destroy the reputation of the firm or its principals by employing techniques ranging from defacing websites to leaking enterprise private content. JPMorgan Chase, Bank of America, Citigroup, Wells Fargo and others have faced such attacks [3]. Baldor [4] noted that a data security breach of Montana state health records compromised the social security numbers and other important information of about 1.5 million people. These cyber-criminals gained access to a computer server tied to the Montana Department of Public Health and Human Services, exposing sensitive or confidential information of current and former medical patients, health agency employees and contractors. Oyemade [5], among others, cited that information technology faces a constant flood of alerts from every system, challenging organizations' security experts to find the critical threats in a sea of false alerts or insignificant events. They noted that ultimate protection means the near elimination of false positives and the remediation of infected endpoints. Tester [6] reported that more zero-day vulnerabilities were discovered in 2013 than in any other year. The 26 zero-day



vulnerabilities discovered represent an 81 percent increase over 2012 and are more than in the three previous years combined. Zero-day vulnerabilities are coveted because they give attackers the means to silently infect their victims without depending on social engineering. Target Corp. and Neiman Marcus, as one study summarized, are not the only U.S. retailers whose networks were breached over the holiday shopping season last year; it reported that smaller breaches of at least three other well-known U.S. retailers took place and were conducted using techniques similar to the one used on Target [13]. A recent study found that the restaurant chain P.F. Chang's China Bistro had a breach of its card processing systems that could have resulted in the theft of customer payment card information at 33 of its 210 U.S. locations. The potentially stolen data includes card numbers, the cardholder's name and/or the card's expiration date and other relevant information [7]. In another report [8], it was stated that the 9/11 Commission, in its 10th anniversary report, cautions Americans and the U.S. government to treat cyber threats more seriously than they did terrorist threats in the days and weeks leading up to Sept. 11, 2001. Steinbart [13] also indicated that baby monitors, security cameras and routers were famously hacked in 2013. Furthermore, security researchers demonstrated attacks against smart televisions, automobiles and medical equipment. This gives this study a preview of the security challenge presented by the rapid adoption of the Internet of Things (IoT).

Later studies [9], [10], among others, summarized that the data show that companies are learning from past cyber-attacks and breaches. There is evidence that companies are becoming better at managing the costs incurred to resolve a data breach incident, and for the first time in seven years both the organizational cost of a data breach and the cost per lost or stolen record declined in 2012. According to a study by [12], citing a rising number of high-profile cyber-attacks, most recently at Twitter, LinkedIn and Yahoo, governmental agencies are stepping up their scrutiny of cyber security. This is leading to increased calls for legislation and regulation, placing the burden on companies to demonstrate that the information provided by customers and clients is properly safeguarded online. Another study [10] also summarized that, despite the fact that cyber risks and cyber security are widely acknowledged to be a serious threat, a majority of companies today still do not purchase cyber risk insurance, though this is changing. That study suggests that more companies are now purchasing cyber coverage and that insurance has a key role to play as companies and individuals look to better manage and reduce their potential financial losses from cyber risks in the future.

3 METHODOLOGY

In order to pilot test the cyber-security concerns, the authors constructed, distributed and collected responses to survey questionnaires at a cyber-security business professional conference in May 2013 in San Antonio, Texas. The survey population comprises professionals who publish research findings and work in their respective fields. These are


These are experts with extensive histories in teaching and in the business world. The survey was distributed to senior IT professionals from midmarket (100 to 999 employees) and enterprise-class (1,000 employees or more) organizations. The questionnaires were distributed to 320 attendees; 202 were completed and returned, a response rate of roughly 63 percent. Overall, we consider the responses a reasonably representative sample. Most of the survey items were Likert-scale, yes/no, or categorical/ordinal items (e.g., gender, personnel rank). The survey comprised 24 questions covering a range of security issues that are of importance and concern to IT and security administrators in small and medium-sized businesses (SMBs). The questions were designed to obtain a snapshot of the state of security issues in SMBs and to confirm issues that have been raised in other security studies.
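As a rough illustration of how such responses could be prepared for the analyses reported below (the column names follow the questionnaire in Appendix I, but the file name and processing steps are assumptions, not the authors' actual pipeline), the Likert items can be recoded to a 1-5 numeric scale:

    import pandas as pd

    # Hypothetical input: one row per returned questionnaire (202 rows).
    df = pd.read_csv("survey_responses.csv")

    # Five-point wording used for items Var006..Var024 (see Appendix I).
    likert = {"Strongly Disagree": 1, "Disagree": 2, "Not Sure": 3,
              "Agree": 4, "Strongly Agree": 5}

    likert_items = [f"Var{i:03d}" for i in range(6, 25)]   # Var006..Var024
    df[likert_items] = df[likert_items].apply(lambda col: col.map(likert))

    # Figures reported in the text: 202 returned out of 320 distributed.
    print(f"Response rate: {202 / 320:.1%}")               # -> 63.1%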

4 FINDINGS/RESULTS
A series of hypotheses was tested to determine the extent of respondents' awareness of potential threats to data at rest and their attitude to security in their organization. The first hypothesis is to determine the extent to which respondents' attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging affects their feeling of security in their organization.
H0: There is no dependence between feeling secure in an organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: The sense of feeling secure in an organization depends on a number of factors, including their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging.
Conclusion: Attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging does not significantly impact how secure respondents feel in their organization.
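Each hypothesis in this section is framed as a test of independence between categorical survey items. The paper does not spell out the exact test procedure used (the tables that follow report regression output), but purely as an illustrative alternative, a dependence of this kind could be checked with a chi-square test of independence; the variable names below follow Appendix I and the data file is the hypothetical one sketched in the methodology section:

    import pandas as pd
    from scipy.stats import chi2_contingency

    # Hypothetical data file, as in the earlier sketch.
    df = pd.read_csv("survey_responses.csv")

    # Var003: perceived security of the company network ("feeling secure").
    # Var012: "Hackers pose the greatest cybersecurity threat" (attitude item).
    contingency = pd.crosstab(df["Var003"], df["Var012"])

    chi2, p_value, dof, _ = chi2_contingency(contingency)
    print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")

    # Reject H0 (independence) at the 5% level only if p < 0.05.
    print("dependence detected" if p_value < 0.05 else "no significant dependence")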

The second hypothesis is to determine the extent to which male respondents' attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging affects their feeling of security in their organization.
H0: For male respondents, there is no dependence between feeling secure in an organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: For male respondents, the sense of feeling secure in an organization depends on a number of factors, including their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging.



Conclusion: When this analysis is controlled for gender, it turns out that male respondents who have confidence in their company network also believe that Rogue Device Scanning is the most proactive activity/technique to counter persistent threats to their organization. This dependency is significant at the 1% significance level, with a p-value of 0.008.
The third hypothesis is to determine the extent to which female respondents' attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging affects their feeling of security in their organization.
H0: For female respondents, there is no dependence between feeling secure in an organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: For female respondents, the sense of feeling secure in an organization depends on a number of factors, including their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging.
Conclusion: When this analysis is controlled for gender, it turns out that, for female respondents, confidence in the company network is independent of their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging.
In order to determine whether there is dependence between assessments of the effectiveness of an organization's network system on the one hand, and respondents' attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other, the following hypothesis was tested.
H0: There is no dependence between the assessment of the effectiveness of an organization's network system on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between the assessment of the effectiveness of an organization's network system on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: Attitude of respondents to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging does not significantly impact the assessment of the effectiveness of an organization's network system or their feelings about the security in their organization.



The hypothesis was then tested for a subset of male respondents. The result is as follows:
H0: There is no dependence between the assessment by male respondents of the effectiveness of an organization's network system on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between the assessment by male respondents of the effectiveness of an organization's network system on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: Attitude of male respondents to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging does not significantly impact their assessment of the effectiveness of an organization's network system or their feelings about the security in their organization.
When the hypothesis was tested for a subset of female respondents, the result is as follows:
H0: There is no dependence between the assessment by female respondents of the effectiveness of an organization's network system on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between the assessment by female respondents of the effectiveness of an organization's network system on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: Attitude of female respondents to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging does not significantly impact their assessment of the effectiveness of an organization's network system or their feelings about the security in their organization.
Another hypothesis that was tested is to determine whether there is dependence between Investment in cybersecurity as the best solution for cyber-attacks on the one hand, and respondents' attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other. The resulting test is as follows:
H0: There is no dependence between Investment in cybersecurity as the best solution for cyber-attacks on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.



H1: There is dependence between Investment in cybersecurity as the best solution for cyber-attacks on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: Attitude of respondents to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging does not significantly impact Investment in cybersecurity as the best solution for cyber-attacks.
The hypothesis was then tested for a subset of male respondents. The result is as follows:

H0: There is no dependence between male respondents' Investment in cybersecurity as the best solution for cyber-attacks on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between male respondents' Investment in cybersecurity as the best solution for cyber-attacks on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: At the 5% significance level, it is determined that the attitude of male respondents to foreign nation-states does significantly impact male respondents' Investment in cybersecurity as the best solution for cyber-attacks.

Table 1. Regression output of attitude of male respondents to Foreign Nation-States

Var014: Foreign Nation-states are perceived as the group that poses the greatest cybersecurity threat to the organizations of male respondents: B = -0.398, Std. Error = 0.173, Beta = -0.230, t = -2.301, Sig. = 0.024

When the hypothesis was tested for a subset of female respondents, the result is as follows:
H0: There is no dependence between female respondents' Investment in cybersecurity as the best solution for cyber-attacks on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between female respondents' Investment in cybersecurity as the best solution for cyber-attacks on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: At the 5% significance level, it is determined that the attitude of female respondents to organized crime does significantly impact Investment in cybersecurity as the best solution for cyber-attacks in their organization.

Table 2. Regression output of attitude of female respondents to Organized Crime groups

(Constant): B = -1.426, Std. Error = 3.282, t = -0.434, Sig. = 0.665
Var015: Organized crime groups are perceived as the group that poses the greatest cybersecurity threat to the organizations of female respondents: B = 0.384, Std. Error = 0.128, Beta = 0.345, t = 3.003, Sig. = 0.004
a. Dependent Variable: Var005: Do you agree that investment in cybersecurity in 2013-2014 will provide the best systems solutions to thwart cyber-attacks?
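The coefficient tables in this section follow the layout of a standard ordinary least squares (OLS) output. The paper does not state which statistical package was used; purely as a sketch, a comparable model for Table 2 could be fit as follows (the file name and the assumption that items are already numerically coded are hypothetical):

    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical input in which the questionnaire items are already coded 1..5.
    df = pd.read_csv("survey_responses_coded.csv")
    females = df[df["Var001"] == "Female"]        # gender subset, as in Table 2

    # Regress Var005 (investment in cybersecurity as the best solution) on
    # Var015 (organized crime poses the greatest threat), with an intercept.
    X = sm.add_constant(females[["Var015"]])
    model = sm.OLS(females["Var005"], X, missing="drop").fit()
    print(model.summary())                        # reports B, std. error, t, Sig.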

Another hypothesis that was tested is to determine if there is dependence between Rating Downtime as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other. The result of that test is as follows: H0: There is no dependence between Rating Downtime as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External

footprints, and Document Watermarking and Tagging on the other. H1: There is dependence between Rating Downtime as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other. Conclusion: At the 5% significance level, it is determined that the attitudes of respondents to hackers (p-value 0.047) and to organized crime (p-value 0.037) do significantly impact Rating Downtime as the greatest IT concern of their organization.



Table 3. Regression output of attitude of respondents to Hackers and Organized Crime groups

(Constant): B = 4.219, Std. Error = 0.815, t = 5.175, Sig. = 0.000
Var012: Hackers are the groups that pose the greatest cybersecurity threat to your organization: B = 0.098, Std. Error = 0.049, Beta = 0.141, t = 2.002, Sig. = 0.047
Var015: Organized crime are the groups that pose the greatest cybersecurity threat to your organization: B = -0.076, Std. Error = 0.036, Beta = -0.153, t = -2.097, Sig. = 0.037
a. Dependent Variable: Var006: Rate your company's IT concerns with regard to Downtime

The hypothesis was then tested for a subset of male respondents. The result is as follows:
H0: There is no dependence between male respondents' Rating Downtime as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between male respondents' Rating Downtime as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: At the 5% significance level, it is determined that the attitude of male respondents to hackers (p-value 0.029) does significantly impact Rating Downtime as the greatest IT concern of their organization.



Table 4. Regression output of attitude of male respondents to Hackers

(Constant): B = 3.421, Std. Error = 1.344, t = 2.545, Sig. = 0.013
Var012: Hackers are perceived as the group that poses the greatest cybersecurity threat to the organizations of male respondents: B = 0.164, Std. Error = 0.074, Beta = 0.235, t = 2.214, Sig. = 0.029
a. Dependent Variable: Var006: Rate your company's IT concerns with regard to Downtime
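For readers unfamiliar with the "Beta" column in these tables: it is the standardized coefficient, i.e. the unstandardized B rescaled by the ratio of the predictor's standard deviation to the outcome's standard deviation. The check below uses made-up standard deviations (the paper does not report them) purely to illustrate the relationship for the Table 4 row:

    # Standardized beta = B * (sd of predictor / sd of outcome).
    B = 0.164             # unstandardized coefficient for Var012 in Table 4
    sd_predictor = 1.10   # assumed SD of Var012 among male respondents (illustrative)
    sd_outcome = 0.77     # assumed SD of Var006 among male respondents (illustrative)

    beta = B * sd_predictor / sd_outcome
    print(round(beta, 3))  # ~0.234, close to the 0.235 reported in Table 4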

When the hypothesis was tested for a subset of female respondents, the result is as follows:
H0: There is no dependence between female respondents' Rating Downtime as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between female respondents' Rating Downtime as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: At the 5% significance level, it is determined that the attitude of female respondents to organized crime (p-value 0.020) does significantly impact Rating Downtime as the greatest IT concern of their organization.

Table 5. Regression output of attitude of female respondents to Organized Crime groups

(Constant): B = 4.393, Std. Error = 1.169, t = 3.758, Sig. = 0.000
Var015: Organized crime is perceived as the group that poses the greatest cybersecurity threat to the organizations of female respondents: B = -0.108, Std. Error = 0.046, Beta = -0.279, t = -2.378, Sig. = 0.020
a. Dependent Variable: Var006: Rate your company's IT concerns with regard to Downtime



Another hypothesis that was tested is to determine whether there is dependence between Rating of Compliance as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other. The result is as follows:
H0: There is no dependence between Rating of Compliance as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between Rating of Compliance as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: Attitude of respondents to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging does not significantly impact Rating Compliance as the greatest IT concern of their organization.
The hypothesis was then tested for a subset of male respondents. The result is as follows:
H0: There is no dependence between male respondents' Rating of Compliance as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between male respondents' Rating of Compliance as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
Conclusion: Attitude of male respondents to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging does not significantly impact male respondents' Rating Compliance as the greatest IT concern of their organization.
When the hypothesis was tested for a subset of female respondents, the result is as follows:
H0: There is no dependence between female respondents' Rating of Compliance as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.
H1: There is dependence between female respondents' Rating of Compliance as the greatest IT concern of their organization on the one hand, and their attitude to hackers, current and former employees, foreign nation-states, organized crime, malware analysis, Geolocation of IP traffic, Subscription Services, External footprints, and Document Watermarking and Tagging on the other.

Conclusion: At the 5% significance level, it is determined that the attitudes of female respondents to Penetration Testing (p-value 0.021) and Examination of External Footprints (p-value 0.050) as the most proactive activities/techniques used to counter persistent threats to their organization significantly impact Rating Compliance as the greatest IT concern of their organization.

Table 6. Regression output of attitude of female respondents to Penetration Testing & External Footprints

(Constant): B = 4.705, Std. Error = 1.037, t = 4.536, Sig. = 0.000
Var017: Penetration Testing is the most proactive activity/technique used to counter persistent threats to your organization: B = 0.219, Std. Error = 0.093, Beta = 0.253, t = 2.357, Sig. = 0.021
Var022: Examining External Footprint is the most proactive activity/technique used to counter persistent threats to your organization: B = -0.144, Std. Error = 0.072, Beta = -0.230, t = -1.992, Sig. = 0.050
a. Dependent Variable: Var007: Rate your company's IT concerns with regard to Compliance

5 OVERALL CONCLUSION
Despite the tremendous amount of money organizations pour into traditional security measures every year, attackers are able to penetrate security defenses and compromise networks at will. As this study shows, it does not matter which vendor or combination of defense-in-depth tools an organization employs; hackers will continue to compromise a system as long as they can exploit vulnerabilities in its network. At a minimum, organizations should reduce spending on redundant, backward-looking technologies and redeploy those resources on defenses designed to find and stop today's zero-day and advanced attacks.

6 IMPLICATIONS FOR PRACTITIONERS AND RESEARCHERS
As this study has indicated, today's attacks often involve malware tailored to compromise a single vector. Once such an attack commences, it does not stop until the objective is achieved.



A typical defense-in-depth architecture, comprising a framework of discrete layers that include anti-virus software, intrusion-prevention systems, next-generation firewalls and Web gateways, is poorly equipped to combat today's advanced attacks. Moreover, without a comprehensive and cohesive analysis across all attack vectors, the current defense mechanisms can miss the signs that an attacker has breached the organization's defenses.

7 CHALLENGES
A bigger challenge is foundational: the core of the typical architecture relies on a mix of binary signatures, blacklists and reputation to identify threats. As the study showed, signatures are ineffective because anti-virus (AV) vendors cannot keep up with the speed at which new malware binaries appear; the malware is custom-made for a specific target, so AV vendors may never see it in time to detect it and create a signature for its defense. Other concerns include the many attacks that exploit zero-day vulnerabilities, and application blacklists that are blind to attacks that use encrypted binaries or hijack legitimate applications and processes. Further, reputation-based defenses such as Web gateways and intrusion-prevention systems (IPS) are not poised to stop attacks launched from newly set-up uniform resource locators (URLs) or from compromised websites that are used for drive-by downloads.


8 SUMMARY AND CONCLUSIONS
The study reveals that IT professionals were generally optimistic about the level of their policies, technical controls and mitigation implementation strategies; however, our findings show that organizations could better align their investments and resources in order to cope with advanced exposures and attacks as they develop. Phishing attacks, compliance policy violations, unsanctioned device and application use, and unauthorized data access comprise the top issues of concern. Network device intelligence and system integrity, core components of all compliance frameworks and security best practices, should be strengthened. The study concludes by noting that, without a security policy, the availability of a network can be compromised. An adequate policy comprises assessing the risk to the network, organizing a response team, implementing a security change-management practice, monitoring the network for security violations, and maintaining a review process that modifies the existing policy and adapts to lessons learned.



9 REFERENCES
1. Anderson, Kerry A.; "A Case for a Partnership Between Information Security and Records Information Management," ISACA Journal, vol. 2, 2012, www.isaca.org/archives
2. Ashford, Warwick; "Why Has DLP Never Taken Off?," Computer Weekly, 22 January 2013, www.computerweekly.com/news/2240176414/Why-has-DLP-never-taken-off
3. Steinbart, Paul John; Robyn L. Raschke; Graham Gal; William N. Dilla; "Information Security Professionals' Perceptions about the Relationship between the Information Security and Internal Audit Functions," forthcoming in the Journal of Information Systems, 2013
4. Baldor, Lolita C.; "US Ready to Strike Back Against China Cyberattacks," Yahoo News, 19 February 2013, http://news.yahoo.com/us-ready-strike-back-against-china-cyberattacks-225730552-finance.html
5. Oyemade, Ronke; "Effective IT Governance Through the Three Lines of Defense, Risk IT and COBIT," ISACA Journal, vol. 1, 2012, www.isaca.org/archives
6. Tester, Darlene; "Is the TJ Hooper Case Relevant for Today's Information Security Environment?," ISACA Journal, vol. 2, 2013
7. Constantin, Lucian; "South Carolina Reveals Massive Data Breach," PC World, 27 October 2012, www.pcworld.com/article/2013186/south-carolina-reveals-massive-data-breach.html
8. Forrester; "Rethinking DLP: Introducing the Forrester DLP Maturity Grid," September 2012, www.forrester.com/Rethinking+DLP+Introducing+The+Forrester+DLP+Maturity+Grid/fulltext/-/E-RES61231
9. Gelbstein, Ed; "Strengthening Information Security Governance," ISACA Journal, vol. 2, 2012, www.isaca.org/archives
10. Mashable; "How Much Does Identity Theft Cost?," 28 January 2011, http://mashable.com/2011/01/28/identity-theft-infographic
11. Romanosky, Sasha; David Hoffman; Alessandro Acquisti; "Empirical Analysis of Data Breach Litigation," Temple University Beasley School of Law, Legal Studies research paper no. 2012-29, 2012, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1986461
12. Goldman, C.; FreeWave Technologies, www.elp.com/articles/powergrid_international/print/volume-17/, 2012
13. Steinbart, Paul John; Robyn L. Raschke; Graham Gal; William N. Dilla; "The Influence of Internal Audit on Information Security Effectiveness: Perceptions of Internal Auditors," working paper, 2013


Appendix I: Cybersecurity Survey Questionnaire

Var001: Gender. Response options: (1) Male, (2) Female.
Var002: Executive or Senior IT Administrator? Response options: (1) Exec, (2) Snr. IT.
Var003: How secure is the company network? Response options: (1) Very Secure, (2) Secure, (3) Somewhat Secure, (4) Not Very Secure, (5) Don't Know.
Var004: How effective is the network security system of your organization? Response options: (1) Extremely Effective, (2) Moderately Effective, (3) Effective, (4) Not Effective, (5) Don't Know.
Var005: Do you agree that investment in cybersecurity in 2013-2014 will provide the best systems solutions to thwart cyber-attacks? Response options: (1) Extremely Agree, (2) Moderately Agree, (3) Agree, (4) Disagree, (5) Don't Know.

Items Var006 through Var024 are rated on a five-point scale: (1) Strongly Disagree, (2) Disagree, (3) Not Sure, (4) Agree, (5) Strongly Agree.

Var006: Downtime is the greatest IT concern of my organization.
Var007: Compliance is the greatest IT concern of my organization.
Var008: eDiscovery is the greatest IT concern of my organization.
Var009: Security Issues is the greatest IT concern of my organization.
Var010: Network Growth is the greatest IT concern of my organization.
Var011: User support is the greatest IT concern of my organization.
Var012: Hackers are the groups that pose the greatest cybersecurity threat to your organization.
Var013: Current & former employees are the groups that pose the greatest cybersecurity threat to your organization.
Var014: Foreign Nation-states are the groups that pose the greatest cybersecurity threat to your organization.
Var015: Organized crime are the groups that pose the greatest cybersecurity threat to your organization.
Var016: Malware Analysis is the most proactive activity/technique used to counter persistent threats to your organization.
Var017: Penetration Testing is the most proactive activity/technique used to counter persistent threats to your organization.
Var018: Rogue Device Scanning is the most proactive activity/technique used to counter persistent threats to your organization.
Var019: Analysis & Geolocation of IP Traffic is the most proactive activity/technique used to counter persistent threats to your organization.
Var020: Subscription Services is the most proactive activity/technique used to counter persistent threats to your organization.
Var021: Deep Packet Inspection is the most proactive activity/technique used to counter persistent threats to your organization.
Var022: Examining External Footprint is the most proactive activity/technique used to counter persistent threats to your organization.
Var023: Don't Know/Not Sure of the most proactive activity/technique used to counter persistent threats to your organization.
Var024: Document Watermarking & Tagging is the most proactive activity/technique used to counter persistent threats to your organization.