2014 IEEE 17th International Conference on Computational Science and Engineering

Improving Performance of Forensics Investigation with Parallel Coordinates Visual Analytics

Wen Bo Wang

Mao Lin Huang

Faculty of Engineering & IT, University of Technology, Sydney, Sydney, Australia

School of Computer Software, Tianjin University, Tianjin 300072, China; School of Software, University of Technology, Sydney, Sydney, Australia. [email protected]

Liang Fu Lu

Jinson Zhang

Faculty of Engineering & IT, University of Technology, Sydney, Sydney, Australia

Abstract: Computer forensics investigators aim to analyse and present facts through the examination of digital evidence within a short time. As the volume of suspicious data grows, catching digital evidence within a legally acceptable time becomes increasingly difficult. This paper proposes an effective method for reducing investigation time redundancy by normalizing the data on hard disk drives (HDDs) for computer forensics. We use a visualization technique, parallel coordinates, to analyse data instead of relying on data analysis algorithms alone, and we choose a Red-Black tree structure to de-duplicate data. This reduces the time complexity of searching, adding and deleting data. We show the advantages of our approach and demonstrate how this method can enhance the efficiency and quality of computer forensics tasks.

Keywords: Computer Forensics; Digital Evidence; Visualization Techniques; Parallel Coordinates; Red-Black Tree

I. INTRODUCTION

Computers have become an integral part of the daily lives of citizens around the world. As the FBI has stated, fifty percent of the cases the FBI now opens involve a computer [1]. Crimes involving computers have been named computer crimes. Computer crime was recognized in the early 1980s, and it covers different scenarios: one takes the computer as the target of criminal activity; another uses computers as repositories of evidence or instruments of crime [2, 3]. For example, a criminal stores illegally copied video files on a personal computer; people modify or delete information (files or packets) [4] in order to gain economic benefits from others; a hacker accesses private information (account details, IDs, logs, etc.) by logging into someone's personal computer, and then reads or rewrites it without permission. As these activities have raised concern both for individuals and in communities, a new research field, computer forensics, has been proposed, which aims to protect, recover and analyze digital evidence for legal use in court.

Forensic researchers have tried to improve investigative capability in various ways over the past decades, including consistently revising the guiding steps of finding evidence, which is also called building a forensic model [5, 6]. Just as designers must finish a blueprint before constructing a building and then proceed step by step under the guidance of the whole framework, forensic investigation follows the technical requirements of each phase; accordingly, the development of forensic tools over the past years has also been much anticipated. Examples include EnCase [7], considered one of the best computer forensics packages available because of its various component programs, and FTK (Forensic Tool Kit) [7], which allows the user to examine an imaged disk through a gallery view and a hexadecimal view and to search the disk for keywords. However, although forensic investigation techniques have improved gradually, the number of challenges has also increased. One of the major problems is the efficiency and accuracy of analyzing growing volumes of data. Therefore, in this paper, we mainly focus on the analysis of large-volume data in computer forensics investigation.

The remainder of the paper is structured as follows. Section II introduces the existing methods of modeling forensic processes and discusses their advantages and disadvantages. In Section III, we propose a new investigation model that adds visual techniques to improve investigation efficiency. Section IV certifies the effectiveness of our new model with the added functions, taking the hard disk drive as an investigated example. Finally, our conclusion and future work are presented in Section V.

978-1-4799-7981-3/14 $31.00 © 2014 IEEE. DOI 10.1109/CSE.2014.337

II. RELATED WORK

Models aim to establish a clear guideline for a complex problem, and forensic investigators have proposed various frameworks to guide a forensic investigation. A general model was proposed in [8] in the early days of computer forensics; it separated the process into 4 parts: preparation, forensic analysis, forensic report and admissibility. Later, in 2002, Reith [5] extended the model to 9 parts (identification, preparation, approach strategy, preservation, collection, examination, analysis, presentation, and returning evidence), which shows more processing detail. Carrier [9] found an interaction function between physical and digital investigation and added a digital crime scene to the model in order to implement it. Considering the problem of user-community applicability, N.L. Beebe and J.G. Clark [10] introduced a multi-tier, hierarchical framework, which benefits the logical analysis of an investigation. Ricci S.C. Ieong [11] focused on the availability of real investigation and also proposed a hierarchical, objectives-based model. Generally, these models aim to provide an abstract reference framework that is independent of any particular technology or organizational environment, and to help develop and apply methodologies to new technologies. Although these models have optimized forensic investigations over the past years, few people have considered improving investigative capability through visualization techniques, which have been applied, and have given great help, in many areas such as aerospace, biology and geology. Therefore, we focus on combining visualization techniques with forensic models to improve the efficiency and accuracy of forensic investigation.

III. A NEW FORENSIC INVESTIGATION MODEL

Though many models concentrating on parts of the investigative process have been proposed, a general model is needed that can incorporate other aspects. Our new investigation model therefore not only gives a general view of the whole investigation process, but also pays more attention to the protection and accuracy of digital evidence by applying security certification before and after the analysis process, as shown in Fig.1. Our new model divides forensic investigation [12, 13] into 5 parts: evidence preparation, data protection, analysis, data certification, and report. The first stage is evidence preparation, including data collection, imaging data and duplication of digital storage media; for example, investigators use a series of text-based commands to duplicate files. The second stage mainly protects the integrity of evidence: investigators begin the process of ensuring that the data are safe and minimized; for example, a 'fingerprint' is one of the methods used to keep the original information secure. The third stage is the forensic analysis, which is always considered the most tedious and complicated part of a computer crime investigation. In our new model, we use visualization [21] techniques to process the whole data set instead of text searching or image searching, as visualizing information

can extract [22, 23] essential information and knowledge from large and complex data sets, and can represent them clearly and comprehensively in graphic format for a better understanding of the patterns hidden within. Specifically, the analysis part is implemented in 5 steps: visualizing the original data, correlation analysis, data classification, multiple linear regression, and data filtering.
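The 'fingerprint' mentioned for the data-protection stage can be sketched as a cryptographic hash taken before analysis and re-checked afterwards to certify that the evidence image was not altered. This is only an illustrative sketch: the choice of SHA-256 and the placeholder byte string are our assumptions, not the paper's specification.

```python
# Hypothetical sketch of the data-protection stage: hash the evidence
# image before analysis, re-hash afterwards, and require a match.
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of an evidence image."""
    return hashlib.sha256(data).hexdigest()

# Placeholder standing in for an acquired HDD image.
evidence_image = b"...raw bytes of an acquired HDD image..."

before = fingerprint(evidence_image)
# ... read-only analysis would happen here ...
after = fingerprint(evidence_image)

# Any modification of the image would change the digest.
assert before == after, "evidence integrity violated"
```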

Fig.1. New Computer-Related Forensics Investigation Model

Specifically, once we have collected the original data, we use a data-viewing technique to visualize the original data first so that users can get an overview of the whole dataset. We choose parallel coordinates as the main visualization tool, analyze data relationships by correlation analysis, and use a colour scheme to reflect the strength of the relationships among data. The related data also need to be classified, so the next step is to cluster the data and show the classification with parallel coordinates. In order to obtain more accurate and deeper knowledge from the datasets, we also perform multiple linear regression analysis and, finally, use a data filtering algorithm, for example a collaborative filtering algorithm, to select the required information. Because of the particularity of digital evidence, the fourth stage is information authentication, which keeps the investigation results stable and legitimate. Following this step, the forensic analyst prepares a report formatted as an easy-to-read document that includes all evidence recovered throughout the investigation and analysis. Generally, our model provides a view for understanding the process of investigations, and it considers the security of the original data, the processed data and the results, because they will be placed in a law-enforcement environment. Just as some researchers have worked on forensic models focusing on the issues of different phases to optimize the corresponding forensic efficiency, our model concentrates on one important process, data analysis, using information visualization techniques to improve the efficiency of the forensic investigation.
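The correlation-analysis step above uses Spearman's rank correlation coefficient (the statistic named later in Section IV). A minimal pure-Python version, on made-up attribute columns, might look as follows; it is a sketch, not the paper's implementation.

```python
# Illustrative Spearman rank correlation: rank both attribute columns
# (ties get their average rank), then take the Pearson correlation of
# the two rank vectors.
def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean rank of the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Monotonically related attributes yield a coefficient of 1.0.
assert abs(spearman([1, 4, 9, 16], [2, 3, 5, 7]) - 1.0) < 1e-9
```

In the model, such coefficients would drive the colour scheme that shows relationship strength on the parallel-coordinates plot.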

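The multiple linear regression step of the model can be sketched with ordinary least squares via the normal equations. The data below are synthetic, and the two-predictor setup is our own illustrative assumption.

```python
# Hedged sketch of multiple linear regression: fit y = w0 + w1*x1 + w2*x2
# by solving the normal equations (X^T X) w = X^T y with Gauss-Jordan
# elimination on the resulting 3x3 system.
def solve3(A, b):
    """Gauss-Jordan elimination with partial pivoting for a 3x3 system."""
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * c for a, c in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def ols(xs1, xs2, ys):
    """Least-squares weights [w0, w1, w2] for intercept + two predictors."""
    X = [[1.0, a, b] for a, b in zip(xs1, xs2)]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(3)]
    return solve3(XtX, Xty)

# Data generated exactly by y = 1 + 2*x1 + 3*x2, so OLS recovers it.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
w = ols(x1, x2, y)
assert all(abs(wi - t) < 1e-6 for wi, t in zip(w, [1, 2, 3]))
```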

IV. CERTIFICATION

A. Data Storage Medium

In our implementation, we choose data on hard disk drives for our experiment, as it is estimated that over 90% of all new information produced in the world is stored on hard disk drives, and they have been a main forensic target over the past years. It is therefore important for investigators to deeply understand the data stored on computer hard disk drives [14, 15]. The hard disk drive is one kind of auxiliary memory, which holds permanent or semi-permanent data on an external magnetic or optical medium. It usually costs less per byte than primary memory does. Moreover, auxiliary memory does not lose data when the computer is powered off, which means that secondary memory is a non-volatile storage device. Primary and secondary memory also differ in how they communicate with the CPU: the latter uses the input and output channels, as shown in Fig.2, which describes the details of auxiliary memory communicating with the CPU.

Fig.2. Memory communicates with the central processing unit via the address bus and data bus.

It is also important to understand how an HDD is constructed, and how data are imaged, recovered or searched in specific places: the allocated, unallocated or unused space. Files are distributed in the memory by a controller. Importantly, not all the space on a hard disk drive is available for user files. In addition, an HDD may be divided logically into one or more partitions [16], which provides a way for users to flexibly create different virtual disks for different purposes. When a system stores data on disk, it first reads/writes the data on all the sectors of the first track in the first cylinder, then moves to the next magnetic head within the same cylinder, and so on until all contents are written to disk. It is much easier for us to locate and analyze data once we know how data are stored on the tracks and sectors of the disk.

B. Data distribution

Given the data arrangement of a hard disk drive, it can be seen as a tree structure: the hard disk drive is the root, and the sub-folders are the children. Therefore, we can improve investigation efficiency from a structural point of view. A tree's logical structure [17] can be defined recursively as a collection of nodes (starting at a root node), where each node is a data structure consisting of a value together with a list of references to nodes (the "children"), with the constraints that no reference is duplicated and none points to the root.

Mathematically [18, 19], a tree is a structure with a finite set of n nodes (n >= 0):

1) If n = 0, the structure is called an "empty tree".
2) If n > 0, the structure satisfies two conditions:
   a) exactly one particular node is the root node;
   b) if n > 1, the remaining nodes (all except the root) can be divided into m (m > 0) disjoint finite sets, marked T1, T2, ..., Tm, each of which is itself a tree, called a sub-tree.

For example, a hard disk drive containing 4 folders A, B, C and D has a root with 4 children, and each folder contains some files and sub-folders. In particular, some files are identical, and they will reduce the efficiency of the investigation if the data set is large. When representing the structure of the hard disk drive, four different cases arise:

First: the duplicated files are stored in the same partition. Then either copy may be removed by any of the processing methods.

Second: the duplicated files are stored at different locations; one file is a leaf node without any children, while the other has children. In this situation, we choose to delete the node without sub-nodes.

Third: the duplicated files are stored at different locations, and both of them have children: one has one sub-node, while the other has two sub-nodes. In this situation, we choose to delete the node with one sub-node.

Fourth: the duplicated files are stored at different locations, and both of them have two sub-nodes. In this situation, our criterion is to choose the approach with fewer processing operations (right rotations, left rotations); for example, deleting the file in Fig.5 takes more operations than in Fig.4. Therefore, a key problem is to choose the better node to delete among all the identical files.

In particular, an R-B (2-3-4) tree structure has some additional properties:

Property 1: The colour of a node can only be black or red.
Property 2: The colour of the root node is black.
Property 3: All leaves (nodes) are black.
Property 4: If a node is red, both of its sub-nodes are black. (No path from the root to a leaf can contain two consecutive red nodes.)
Property 5: Every path from any node to its leaves contains the same number of black nodes.

Because of the design of the R-B tree, any datum can be searched, inserted, or deleted within a time complexity of O(log2(n)); data structured in this way enjoy the best possible guarantee of insertion and search time in the worst case, which makes the structure valuable in time-sensitive applications, such as real-time applications. Moreover, although it is complicated, the R-B tree has good running time and is very effective in practice; it can search within a time complexity [20] of O(log2(n)). Refer to Table I.
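The O(log2(n)) search bound can be illustrated with a simple stand-in: a perfectly balanced plain binary search tree built from sorted keys (not a real red-black tree, as the rebalancing logic on insert and delete is omitted here). A search walks one root-to-leaf path, so it touches at most about log2(n) nodes.

```python
# Illustrative sketch (not the paper's implementation): a balanced BST
# demonstrating that search cost is bounded by the tree height ~ log2(n),
# the same asymptotic guarantee an R-B tree maintains under updates.
import math

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def build_balanced(keys):
    """Recursively pick the middle of a sorted list as root -> height ~ log2(n)."""
    if not keys:
        return None
    mid = len(keys) // 2
    node = Node(keys[mid])
    node.left = build_balanced(keys[:mid])
    node.right = build_balanced(keys[mid + 1:])
    return node

def search(root, key):
    """Return (found, comparisons); one comparison per visited node."""
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1
        if key == node.key:
            return True, comparisons
        node = node.left if key < node.key else node.right
    return False, comparisons

n = 4096
root = build_balanced(list(range(n)))
found, steps = search(root, 2025)
# Any search visits at most ceil(log2(n + 1)) nodes on a balanced tree.
assert found and steps <= math.ceil(math.log2(n + 1))
```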

TABLE I. TIME COMPLEXITY OF THE R-B TREE

Operation    Average        Worst
Search       O(log2(n))     O(log2(n))
Insert       O(log2(n))     O(log2(n))
Delete       O(log2(n))     O(log2(n))

In general, data trees [22, 23] are a valuable tool for describing, defining, cataloguing and generalizing data, which enhances the strategies and practices of knowledge management within an organization. Arranging data in this way provides a powerful mechanism for groups in society to classify information from single or multiple disciplines into layers of facts and figures. Providing information to organizations in this way allows the examination of data from different perspectives to solve problems, identify issues and make predictions.
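The duplicated-file cases discussed in this section ultimately reduce to finding files with identical content in the folder tree. A hedged sketch of that detection step, using a content hash over a toy in-memory tree (the folder names and file contents are invented for illustration), might be:

```python
# Hypothetical duplicate detection: traverse the folder tree depth-first,
# hash every file's content, and group paths that share a digest. Later
# copies in each group are the deletion candidates.
import hashlib

def file_digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

# A toy HDD: folders map names to sub-folders (dict) or file bytes.
hdd = {
    "A": {"report.txt": b"case notes", "img.bin": b"\x00\x01"},
    "B": {"copy.txt": b"case notes"},          # duplicate of A/report.txt
    "C": {"sub": {"img2.bin": b"\x00\x01"}},   # duplicate of A/img.bin
    "D": {"log.txt": b"unique"},
}

def find_duplicates(tree):
    """Return {digest: [paths]} for every digest seen more than once."""
    seen = {}
    def walk(node, prefix):
        for name, child in node.items():
            path = f"{prefix}/{name}"
            if isinstance(child, dict):
                walk(child, path)              # recurse into a sub-folder
            else:
                seen.setdefault(file_digest(child), []).append(path)
    walk(hdd if tree is None else tree, "")
    return {d: ps for d, ps in seen.items() if len(ps) > 1}

dupes = find_duplicates(hdd)
# Two duplicated contents, each stored in two places.
assert sorted(len(v) for v in dupes.values()) == [2, 2]
```

Which copy in each group to delete is then decided by the tree-shape cases above (fewest rotations when repairing the R-B tree).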

Therefore, in our model, we choose to transform the data structure into an R-B tree structure and then operate on the data in the R-B tree. The processing covers two situations [21]:

Case 1: The target file is the root node. In this case, the whole tree will be deleted.

Case 2: The target is a child node. The most complicated part here is how to repair the R-B tree after the target has been deleted, so that it still satisfies the 5 properties. The repair can be classified as follows. When the target's parent node is red, two different cases arise, depending on whether the sibling of the target is red (see Fig.3) or black. Specifically, if the sibling is black, there are two further situations based on Property 3 of the R-B tree: either the sibling's two children have the same colour (see Fig.6), or they have different colours, which itself contains two cases: the right son is black and the left son is red (see Fig.4), or the right son is red and the left son is black (see Fig.5).

Fig.3. Sibling of target is red.

Fig.4. Right son is black.

Fig.5. Left son is red.

Fig.6. Left and right sons are black.

When the parent's colour changes to black, most of the cases use the same rotation, except when the children of the target's sibling have the same colours; Fig.7 gives the details.

Fig.7. Parent's colour changes to black.

C. Redundancy reduction

Usually, a computer HDD contains a large number of files, but some of them might be duplicates. Fig.8 shows the main theory of reducing redundant files. First, the whole original data set is displayed; then redundant data are found through data-searching methods, such as a wildcard matching algorithm, and the proper nodes to delete are chosen based on the ease of deleting from and reorganizing the R-B tree; the last step is to reorganize the layout, which is similar to the traditional implementation, as shown in Fig.8.

Fig.8. Non-Redundancy Visualization Model

Because of the duplicated files in HDDs, we propose to reduce data redundancy before analyzing the datasets more deeply. Specifically, seeing the whole HDD as a tree, discovering a target in the tree becomes a process of traversing its nodes. The data redundancy then lies in files that have the same contents in different partitions, and processing them first does not affect the final evidence. Such redundancy not only wastes HDD space, but also wastes time when searching for clues. Using the R-B tree structure also makes it easier to delete duplicated files; refer to part B of Section IV.

D. Visualization: Parallel Coordinates

In our model, we use parallel coordinates [24, 25] as the main visual technique; it is one of the most popular information visualization technologies. The following describes how parallel coordinates work.

Points in the plane are represented by lines, and any two points determine a line. Take points in two dimensions as an example: there are four points,


(a_1, b_1), (a_2, b_2), (a_3, b_3), (a_4, b_4), with the values a_1, ..., a_4 marked on axis X_1 and b_1, ..., b_4 on axis X_2, respectively. Each point thus becomes the segment joining its two axis values, and two points determine a line. If the four points lie on a common line l, all four segments pass through a single point p in the parallel-coordinates plane; this point p therefore represents not a segment but a whole line:

    l : y = ax + b   <->   p : ( d/(1-a), b/(1-a) )                 (1)

where d is the distance between the two parallel axes. The transforming process [26] is shown in Fig.9.

If a = 1, then p tends to infinity; therefore

    p : ( d/(1-a), b/(1-a) ),   a != 1.

Generally, an N-dimensional line l can be described by the N-1 linear equations

    l_{1,2}:    x_2 = a_2 x_1 + b_2
    l_{2,3}:    x_3 = a_3 x_2 + b_3
    l_{3,4}:    x_4 = a_4 x_3 + b_4
    ...
    l_{i-1,i}:  x_i = a_i x_{i-1} + b_i
    ...
    l_{N-1,N}:  x_N = a_N x_{N-1} + b_N

with l_{i-1,i}, i = 2, ..., N. In the X_{i-1} X_i plane, the relation labelled l_{i-1,i} can be represented as a set of points P_{1,2}, P_{2,3}, P_{3,4}, ..., P_{i-1,i}, ..., P_{N-1,N}: the pair (P_1, P_2) satisfies the l_{1,2} relation, and (P_2, P_3), (P_3, P_4), ..., (P_{i-1}, P_i) correspond to l_{2,3}, l_{3,4}, ..., l_{i-1,i}, respectively. Therefore, in parallel coordinates a line l in R^N is represented by N-1 points P_{i-1,i}, which can be computed by the following equations (with the axes spaced one unit apart):

    p_{1,2}:    x = (1 + 0(1 - a_2)) / (1 - a_2),        y = b_2 / (1 - a_2)
    p_{2,3}:    x = (1 + 1(1 - a_3)) / (1 - a_3),        y = b_3 / (1 - a_3)
    p_{3,4}:    x = (1 + 2(1 - a_4)) / (1 - a_4),        y = b_4 / (1 - a_4)
    ...
    p_{i-1,i}:  x = (1 + (i-2)(1 - a_i)) / (1 - a_i),    y = b_i / (1 - a_i)
    ...
    p_{N-1,N}:  x = (1 + (N-2)(1 - a_N)) / (1 - a_N),    y = b_N / (1 - a_N)
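The 2-D duality can be checked numerically: with axis X1 at x = 0 and X2 at x = d, every point (u, v) with v = au + b maps to the segment from (0, u) to (d, v), and all such segments meet at the dual point p = (d/(1-a), b/(1-a)). The values below are arbitrary test inputs, not data from the paper.

```python
# Worked check of the point-line duality: two parallel-coordinate
# segments for two points of the line y = 0.5x + 2 intersect exactly
# at the predicted dual point (d/(1-a), b/(1-a)).
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines through (p1,p2) and (p3,p4)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / denom
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / denom
    return px, py

a, b, d = 0.5, 2.0, 1.0            # line y = 0.5x + 2, axis spacing d = 1
u1, u2 = -1.0, 3.0                 # two sample x-values on the line
v1, v2 = a * u1 + b, a * u2 + b

# Parallel-coordinate segments for the two points (u, v):
ix, iy = line_intersection((0, u1), (d, v1), (0, u2), (d, v2))
assert abs(ix - d / (1 - a)) < 1e-9 and abs(iy - b / (1 - a)) < 1e-9
```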

Fig.9. Geometry Theory of Parallel Coordinates in 2D

Note that parallel coordinates are not used only to display the original data in a graph [27, 28]; they support visual interaction among data, human and algorithm, dynamically showing the operations between user and algorithm, algorithm and data, and data and user. There are many advantages [29] to applying parallel coordinates to forensics, owing to their characteristics. Parallel coordinates not only help to identify files easily and visually, but can display the relationships among files directly [30, 31]. They map multidimensional data onto a 2D space and help to analyze the relationships of data with similar attributes. They have low representational complexity and are mathematically rigorous. They provide a manipulable medium that enables the exploration of a space of parameter values, give an overview of all the information, and are user-friendly and understandable, allowing users to read the result of clustering data. Each file attribute value is shown as a point, one line represents one file, and parallel coordinates can also display the final results.

In addition, this visual technique amplifies human cognitive capabilities by increasing cognitive resources, reducing search, enhancing the recognition of patterns, supporting the easy perceptual inference of relationships, and permitting the perceptual monitoring of a large number of potential events. Moreover, in this model we mainly focus on 4 functions to discover data interactively and visually: data correlation, data classification, data filtering and regression analysis. Specifically, correlation data mining predicts the strength of the relationship between pairs of variables to facilitate the estimation of one variable based on what is known about another; we therefore choose Spearman's rank correlation coefficient to measure the statistical dependence between variables, apply parallel coordinates to visualize the context and details, and use a colour scheme to identify the strength of the relationships. To implement data classification, our model contains four methods based on the ArcGIS software: Defined Intervals, Quantile, Standard Deviation, and Natural Breaks. For data filtering, for example filtering out data unrelated to the crime scene, we apply a collaborative filtering algorithm. Finally, we add multiple linear regression analysis to optimize the analysis function during the whole process; it identifies the most


relevant relationships in a particular data set; in our model, this goes through exploratory analysis, executes multiple linear regression, and updates statistical graphical indicators for display.

V. CONCLUSION

The traditional digital forensics approach [32] of seizing a system or media, transporting it to the lab, making a forensic image, and then searching the entire system for potential evidence is no longer appropriate in some circumstances. In cases such as child abductions or missing or exploited persons, time is of the essence. In these types of cases, investigators dealing with the suspect or crime scene need investigative leads quickly; in some cases it is the difference between life and death for the victim. So the need for timely identification, analysis and interpretation of digital evidence is becoming ever more crucial.

In this paper, we introduce the current models for forensic investigations and identify the existing challenges, and then propose a new model that uses visualization techniques instead of analysis algorithms alone to improve the quality of the investigation. The main reason is that visual analysis can not only attack certain problems whose size, complexity, and need for closely coupled human and machine analysis may make them otherwise intractable, but also advance science and technology developments in analytical reasoning, interaction, data transformations and representations for computation and visualization, analytic reporting, and technology transition.

In addition, a storage device may contain many copies of repeated data. These files can waste a lot of time when accessing the target data. Therefore, data de-duplication can be used to improve storage utilization and the efficiency of the investigation. So we add a single step in the visual analysis to process the duplicated files, and use the hard disk drive as an example to certify the efficiency of our approach.

In the future, we plan to optimize this model with intelligent visual analytics and to implement an investigation tool mainly for analyzing digital information. As data sizes grow day by day, the big data visual analytics [33] challenge is also considered as future work.

REFERENCES

[1] D. Hayes. Quoting Scott C. Williams, supervisory special agent for the FBI's computer analysis and response team in Kansas City. Page A1, April 26, 2002.
[2] J.R. Vacca. "Computer forensics: computer crime scene investigation", Cengage Learning, 2005.
[3] M. Rogers. "The role of criminal profiling in the computer forensics process", Computers & Security, 2003, 22, (4), pp. 292-298.
[4] B. Schneier and J. Kelsey. "Secure audit logs to support computer forensics", ACM Transactions on Information and System Security (TISSEC), 1999, 2, (2), pp. 159-176.
[5] M. Reith, C. Carr, G. Gunsch. "An examination of digital forensic models", International Journal of Digital Evidence, 2002, Vol. 1, Issue 3.
[6] B. Carrier, E.H. Spafford. "Getting physical with the digital investigation process", International Journal of Digital Evidence, 2003, Vol. 2, Issue 2.
[7] R.W. Taylor, E.J. Fritsch, J. Liederbach. Digital Crime and Digital Terrorism. Pearson Publishing, Third edition, 2013.
[8] L. Garber. "EnCase: A Case Study in Computer-Forensic Technology", IEEE Computer Magazine, January 2001.
[9] V. Baryamureeba, F. Tushabe. "The enhanced digital investigation process model", Proceedings of the Fourth Digital Forensic Research Conference, 2004.
[10] R.S.C. Ieong. "FORZA: Digital forensics investigation framework that incorporate legal issues", Digital Investigation, 2006, Vol. 3, pp. 29-36.
[11] R. Moore. CyberCrime: Investigating High-Technology Computer Crime. Anderson Publishing, Second edition.
[12] C.A. Steed, P.J. Fitzpatrick, J.E. Swan II, T.J. Jankun-Kelly. "A Visual Analytics Approach for Correlation, Classification and Regression Analysis", Innovative Approaches of Data Visualization and Visual Analytics, 2012.
[13] S. Teerlink, R.F. Erbacher. "Foundations for visual forensic analysis", IEEE, 2006, pp. 192-199.
[14] E. Grochowski, R.F. Hoyt. "Future trends in hard disk drives", IEEE Transactions on Magnetics, 1996, 32, (3), pp. 1850-1854.
[15] Liu Huaihui. "The research on the tactic of computer static forensics", 2012 International Conference on Artificial Intelligence and Soft Computing, Lecture Notes in Information Technology, Vol. 12, 2012, pp. 207-212.
[16] M.K. Rogers, K. Seigfried. "The future of computer forensics: a needs analysis survey", Computers & Security, 2004, 23, (1), pp. 12-16.
[17] K.L. Norman, J.P. Chin. "The effect of tree structure on search in a hierarchical menu selection system", Behaviour & Information Technology, 1988, 7, (1), pp. 51-65.
[18] J. Stasko, R. Catrambone, M. Guzdial, K. McDonald. "An evaluation of space-filling information visualizations for depicting hierarchical structures", International Journal of Human-Computer Studies, 2000, 53, (5), pp. 663-694.
[19] S. Hanke. "The performance of concurrent red-black tree algorithms", Springer, 1999.
[20] S.J. Nasuto, M.J. Bishop. "Time Complexity Analysis of the Stochastic Diffusion Search".
[21] J. Han, M. Kamber, J. Pei. "Data mining: concepts and techniques", Morgan Kaufmann, 2006.
[22] R.E. Tarjan. "Updating a balanced search tree in O(1) rotations", Information Processing Letters, 1983, 16, (5), pp. 253-257.
[23] C.S. Leung, Q.I. Khan. "DSTree: a tree structure for the mining of frequent sets from data streams", IEEE, 2006, pp. 928-932.
[24] http://en.wikipedia.org/wiki/Visual_analytics
[25] A. Inselberg. "Parallel coordinates: visual multidimensional geometry and its applications", Springer, 2009.
[26] A. Inselberg, B. Dimsdale. "Parallel coordinates: Human-Machine Interactive Systems", Springer, 1991, pp. 199-233.
[27] M.L. Huang, Q.V. Nguyen. "A space efficient clustered visualization of large graphs", IEEE, 2007, pp. 920-927.
[28] R.M. Edsall. "The parallel coordinate plot in action: design and use for geographic visualization", Computational Statistics & Data Analysis, 2003, 43, (4), pp. 605-619.
[29] R. Finsterwalder. "A Parallel Coordinate Editor as a Visual Decision Aid in a Multi-Objective Concurrent Control Engineering Environment", IFAC CAD Control Systems, Swansea, UK, 1991, pp. 119-122.
[30] D.A. Keim. "Designing pixel-oriented visualization techniques: Theory and applications", IEEE Transactions on Visualization and Computer Graphics, 2000, 6, (1), pp. 59-78.
[31] E.J. Wegman. "Hyper-dimensional data analysis using parallel coordinates", Journal of the American Statistical Association, 1990, 85, (411), pp. 664-675.
[32] R. Moore. CyberCrime: Investigating High-Technology Computer Crime. Anderson Publishing, Second edition, 2011.
[33] http://sloanreview.mit.edu/article/big-data-analytics-and-the-path-from-insights-to-value
"Updating a balanced search tree in O (1) rotations", Information Processing Letters, 1983, 16, (5), pp. 253-257. C.S.Leung, and Q.I.Khan. " DSTree: a tree structure for the mining of frequent sets from data streams", in Editor: "Book DSTree: a tree structure for the mining of frequent sets from data streams", IEEE, 2006, edn., pp. 928-932. http://en.wikipedia.org/wiki/Visual_analytics. A. Inselberg. "Parallel coordinates: visual multidimensional geometry and its applications", Springer, 2009. A.Inselberg, B.Dimsdale. "Parallel coordinates: Human-Machine Interactive Systems", Springer, 1991, pp. 199-233. M.L.Huang, Q.V. Nguyen. "A space efficient clustered visualization of large graphs", in Editor: "Book A space efficient clustered visualization of large graphs" , IEEE, 2007, pp. 920-927. R.M.Edsall. "The parallel coordinate plot in action: design and use for geographic visualization", Computational Statistics & Data Analysis, 2003, 43, (4), pp. 605-619. R.Finsterwalder, "A Parallel Coordinate Editor as a Visual Decision Aid in Multi-Objective Concurrent Control Engineering Environment IFAC CAD Contr", Sys., Swansea, UK, 1991, 119, pp. 122. D.A. Keim. "Designing pixel-oriented visualization techniques: Theory and applications", Visualization and Computer Graphics, IEEE Transactions on, 2000, 6, (1), pp. 59-78. E.J.Wegman."Hyper-dimensional data analysis using parallel coordinates", Journal of the American Statistical Association, 1990, 85, (411), pp. 664-675. M.Robert. CyberCrime Investigating High-technology Computer Crime. Anderson Publishing. Second Edition.2011. http://sloanreview.mit.edu/article/big-data-analytics-and-the-path-frominsights-to-value