
Semantic knowledge representation in Terrorist Threat Analysis for Crisis Management Systems

Mariusz Chmielewski1, Andrzej Gałka1, Piotr Jarema1, Kamil Krasowski1, Artur Kosiński1

1 Computer Science Department, Cybernetics Faculty, Military University of Technology, Kaliskiego 2, 00-908 Warsaw, Poland

Abstract. In recent years the problem of identifying terrorist threats has become a priority topic for government and military organizations. This paper summarizes our ideas on new concepts of indirect association analysis for extracting information useful for terrorist threat indication. The method introduces an original approach to knowledge representation as a set of ontologies and a semantic network, which are then processed by inference algorithms and structural graph analysis. The described models draw on experience gathered from intelligence experts and on several open Internet knowledge systems, such as the Global Terrorism Database [13] and the Profiles in Terror knowledge base. We extracted core information from several ontologies and fused it into one domain model intended to provide a basis for the indirect association identification method.

Keywords: counter-terrorism, decision support, ontology, GIS, crisis management, GTD

1 Introduction

Emerging terrorist threats have been extremely difficult to analyze. In recent years this problem has become a priority topic for government and military organizations, which often need sophisticated methods to detect criminal networks and the individuals associated with them. Conclusions obtained through such multi-analysis make it possible to monitor the activities of criminal organizations effectively and help to identify the phases of actions being prepared or in progress. Opposing irregular groups requires the ability to analyze large data sets that may come from unreliable sources or be incomplete. Research shows that analysis of the activities of terrorist organizations is possible, but requires interdisciplinary knowledge. In recent years the rapid development of information technology has highlighted the need for methods, algorithms and systems that give analysts precise tools supporting the concept of the Semantic Web. This idea is associated with what is being called the next stage of development of the Internet, centered on description languages and ontologies [12] with a strong theoretical basis for inference mechanisms [3]. A semantic network offers one of the most important advantages for representing data on terrorist activities: scalable and flexible knowledge representation. The presented method of semantic network analysis and association acquisition aims at:

• Providing a tool for operating on large information resources,
• Eliminating unreliable and unwanted information within the semantic network (an essential requirement due to algorithm complexity),
• Selecting significant nodes and the relations between them (for the analysis) [8],
• Searching for indirect relations between nodes based on knowledge already stored in the semantic network (building new knowledge in the system).

The presented tool is based on a method of identifying hidden semantic associations on the web using an ontology description generation methodology, which provides a dedicated approach to modeling and data processing [11]. The main assumption of the presented method is that a semantic network generated from registered atomic events provides data in which the proposed algorithms will find indirect links between the vertices of the network. This process can be seen as the building of new knowledge [5], and thus provides specific inference algorithms for semantic networks. The designed tool has a scalable architecture that can be developed towards complex network reasoning by expanding the number of data sources used and by dynamically assigning quantitative characteristics to the semantic graph. Analysis of associations, for the purpose of this work, is understood as searching the semantic network for links between vertices and modifying its structure by adding new links or new vertices (facts) or by reclassifying vertices or arcs of the network [7]. The result of this process is the acquisition of new knowledge in the semantic network, which makes it possible to detect information about potential crises. The developed method uses a rich set of algorithms (filtering and quantitative analysis) to eliminate unnecessary information while seeking new associations.
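The search for indirect links between vertices can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a plain breadth-first search over a small set of hypothetical facts, and the names `TRIPLES`, `build_adjacency`, `indirect_association` and `max_hops` are introduced here only for illustration.

```python
from collections import deque

# Hypothetical facts (subject, relation, object); not taken from the paper's knowledge base.
TRIPLES = [
    ("person_A", "memberOf", "group_X"),
    ("group_X", "responsibleFor", "event_1"),
    ("event_1", "locatedIn", "city_K"),
    ("person_B", "knows", "person_A"),
]


def build_adjacency(triples):
    """Index the semantic network so that every relation can be followed in both directions."""
    adjacency = {}
    for subject, relation, obj in triples:
        adjacency.setdefault(subject, []).append((relation, obj))
        adjacency.setdefault(obj, []).append(("inverse:" + relation, subject))
    return adjacency


def indirect_association(triples, source, target, max_hops=4):
    """Return the shortest chain of relations linking source to target, or None."""
    adjacency = build_adjacency(triples)
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) >= max_hops:
            continue  # do not expand chains longer than the hop limit
        for relation, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, path + [(node, relation, neighbour)]))
    return None


chain = indirect_association(TRIPLES, "person_B", "city_K")
for step in chain:
    print(step)
```

The hop limit plays the role of the filtering step mentioned above: it bounds the search so that only short, more plausible chains are reported.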
To extend the application of the developed method, the process of analyzing the semantic model was additionally extended towards multi-criteria decision-making [9]. Depending on the task, the analyst may request an extended answer that takes into account the strength of the connection type. One of the main problems addressed in this work is the acquisition of data and its transformation into the semantic model. We rely on unclassified data sources found on the Internet, both in semantic form (RDF and OWL models, http://profilesinterror.mindswap.org/) and in purely relational form (the Global Terrorism Database [13]). One of the stages of constructing the tool was the transformation of the relational model into semantic data and its further integration, in order to provide a uniform database of terrorist organizations, incidents and events merged with spatial services. One of the main requirements defined for the method and the designed tool is the ability to integrate with GIS tools supporting the visualization of geographic features. In addition to the ontological mechanisms, two separate sub-libraries based on BBN OpenMap™ and NASA WorldWind™ were developed. The chosen GIS tools typically operate on relational or object-oriented data. The implemented modifications made it possible to extend their mechanisms towards semantic data representation and a querying subsystem using SPARQL.

Fig. 1. The concept of collecting data in the semantic model and its further processing for the analysis of terrorist threats

2 Spatial data subsystem as important functionality of decision support systems

Analysis, detection and prevention of terrorist threats are strongly associated with various types of geographic data. Presenting these data in the correct form may improve many aspects of identifying terrorism. GIS systems may contribute significantly to the identification of risks by [9]:
• accelerating problem-solving and decision-making by presenting the situation on maps
• improving communication: faster and more effective exchange of information
• increasing the effectiveness of learning and training of analysts
• increasing control over the area of operations: visualizing data from different sources at the same time introduces a synergy compared with analyzing the data separately
• broadening the perception of the problem: analyzing the GIS display often allows for a broader look at the problem

To determine the requirements for the terrorism threat identification system we must define what data, and which dependencies between them, can be used. Spatial regularities that may be important in preventing attacks may have the following background:
• historical: analysis of the historical political geography of the area, e.g. no longer existing boundaries of administrative divisions
• environmental: analysis of the diversity of natural conditions
• cultural heritage: an indication of the diversity of cultural regions with their politics
• structural: the level of urbanization, demographic indicators, the level of education of inhabitants, the structure of their employment and their level of income

This information makes it possible to present the overall situation, to estimate the likelihood of threats occurring in the territories, and to track tension and aggression between them. Based on these data, it is possible to designate areas potentially at risk and to determine the risk profile. A full analysis of the problem of terrorism needs more precise data, which not only help to define areas of danger but also help to detect specific attacks in a particular place and at a particular time. These should include the following information about terrorist groups:
• areas of action,
• the purpose and type of their distribution,
• method of operation,
• travel paths,
• weapons used,
• the location of recent activity

An important element is the tracking of communication between groups of suspects. A geographical representation of message tracking can be helpful in analyzing the potential attack site. Increased communication can mean that an attack is being planned; its sudden rupture can mean that the attack is already close.

To identify the locations of probable attacks it is necessary to define the locations of strategic sites: places of mass gatherings, government institutions, embassies, sports facilities, military units, bridges, airports, stations, etc. To make more detailed hypotheses connecting terrorist groups with specific locations we must define dependencies. For example, certain places may be marked as particularly susceptible to attacks of a particular type because of their strategic importance for a terrorist group.

3 Data Fusion, analysis and reasoning

For the purpose of this work, data fusion is defined as the integration of data gathered from different sources into one common form, allowing the data to be managed in a uniform manner. The data sources may include not only new digital maps and satellite imagery but also numerical data such as statistical yearbooks and, particularly for identifying terrorist attacks, data from all kinds of institutions that can in some way help to monitor and detect threats. Modern standards for developing systems, such as SOA (Service Oriented Architecture), head towards the creation of platform-independent interfaces that provide specific services, for example a map exchange standard such as Web Map Service (WMS). The problem arises at the stage of merging the data in the target system. One of the most flexible and versatile ways of storing data is semantic models and, in consequence, ontologies. Such a representation makes it possible to create self-describing, graph-structured data, ready for classification and reasoning. Modern tools such as the Jena Semantic Web Framework can manage knowledge stored in ontologies not only at the level of viewing and filtering, but also allow hidden dependencies to be found by reasoners. An important value of the method is the ability to dynamically accept new information inserted by the analyst. Definitions of new spatial elements, shared globally in the system, speed up communication and broaden other users' perception of the problem. Since verifying the completeness of data provided by external sources is a major problem, the system should also allow analysts to verify the incoming data.

4 Process of collecting facts – knowledge base completion

The easiest way to build a semantic model is to feed previously collected data into a model whose structure is identical or similar to the structure of the data. In practice, however, such situations rarely occur. Models usually have many independent sources. There is no need for automation when the data set is small, but for large collections of facts manual entry is almost impossible. Placing thousands of records in the semantic model requires automation and additional software for syntactic analysis of text. Typically, because the sets feeding a semantic model have different structures, an analysis is performed and an individual solution is developed for each set. While designing the core ontology the authors identified the main differences between designing ontologies and designing data models. The key difference is not the language or the intended use, but the ability to share the model and its concepts among multiple users and applications. Data models usually live in a relatively small closed world, identified mostly within one system; ontologies are meant for an open, distributed world of systems. The acquisition of facts about terrorist incidents, groups and profiles of terrorists for the ABox is a separate challenge. Terrorism is studied mostly by government institutions, which do not provide data or heavily restrict access to them. Tools constructed specially for this project (a multi-agent system monitoring RSS feeds) and databases found on the network containing historical data on terrorist incidents were used to populate the model. The core of the incident data is a collection from GTD and GTD2. The Global Terrorism Database is a set of historical data, set up under a project at the University of Maryland under the supervision of the United States Department of Justice.
The set contains a collection of 59 503 incidents of terrorism that took place between 1970 and 1997, with no data for 1993. The data are based on a database generated by PGIS (Pinkerton Global Intelligence Services), which contains 67 165 records. Data collection was completed in December 2005. This part of the collection was made public only in 2007 [10]. Due to the nature of the data, the user has access to only a limited number of records. The authors of the collection made it available in SAS, SPSS and text formats, which were the basis for migration mechanisms constructed using XML parsers.

Fig. 2. The process of migrating data to the semantic model

GTD2 is a continuation of the GTD work, created at the same university, which adds incidents registered in the period 1998 - 2004. The database structure has changed and has been extended with new information. It contains 126 variables, a large proportion of which are confidential. GTD and GTD2 provide basic information about the date and place of an incident, the terrorist groups involved and the weapons used. Facts from GTD and GTD2 were migrated to the semantic model in the same way, but the structure of the syntactic text analyzer had to be revised because of the differences between the sets. Supplying the semantic model involved the following stages:
1. Transforming the SAS datasets to an XML-based format or CSV files;
2. Implementing appropriate tools, built in Java using the JDOM package, for spatial transformation of an intermediate model to the semantic model;
3. Managing the semantic model with the open source JENA Semantic Framework library.

For the purposes of the semantic model, a transitional model was created in a MySQL relational database and loaded with the data published on the above website. The data were then loaded from the database into the semantic model. Until recently, access to the semantic model service was based on the Java Servlet mechanism. Because of the limited flexibility of this solution, access to the model is being redesigned towards SOA, using Web Services.
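The migration from relational records to semantic statements can be sketched as follows. This is only an illustration in Python, not the Java/JDOM tooling described above. The CSV columns, the sample rows and the helper names `row_to_triples` and `migrate` are assumptions, and the `terrorism:` prefix is borrowed from the SPARQL examples in Section 6.

```python
import csv
import io

# Hypothetical CSV excerpt; the column names are illustrative, not the real GTD schema.
SAMPLE_CSV = """incident_id,date,country,group,weapon
1,1975-03-02,Poland,Group X,Explosives
2,1980-11-15,Poland,,Firearms
"""

NAMESPACE = "terrorism:"

# Assumed mapping from relational columns to predicates of the semantic model.
FIELD_TO_PREDICATE = {
    "date": "Date",
    "country": "Country",
    "group": "Group",
    "weapon": "Weapon",
}


def row_to_triples(row):
    """Map one relational record to subject-predicate-object triples, skipping empty fields."""
    subject = f"{NAMESPACE}incident_{row['incident_id']}"
    triples = [(subject, "rdf:type", f"{NAMESPACE}Event")]
    for field, predicate in FIELD_TO_PREDICATE.items():
        value = (row.get(field) or "").strip()
        if value:
            triples.append((subject, NAMESPACE + predicate, value))
    return triples


def migrate(csv_text):
    """Read every CSV record and collect the triples generated for it."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_triples(row))
    return triples


triples = migrate(SAMPLE_CSV)
for triple in triples:
    print(triple)
```

Skipping empty fields rather than emitting empty literals keeps incomplete source records from producing misleading statements, which matters when the sources are unreliable or partial.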

5 Semantic model of terrorist threat

For the purpose of this work we define the Terrorist Threat Model as:

$$T = \langle O_i, I_j, A_k^{p}, Cr_a, Ev_l \rangle \qquad (1)$$

where:
$O_i$ - is the i-th terrorist organisation and its description concerning all known leaders, active members, registered events, attacks and statements, with respect to time, location and media
$I_j$ - is the j-th subject of potential attack (critical infrastructures, people)
$A_k^{p}$ - is the k-th stored pattern of attack (a series of events that are significant for the p-th attack type or one of its phases)
$Cr_a$ - is the a-th crisis situation
$Ev_l$ - is the l-th vector of significant events that had been or have been registered and are connected or are suspected of an indirect dependency.

A critical infrastructure can be an institution, building or facility which is important from the desired point of view and connected directly with the domain of the attack target.

$$Cr_a = \langle O_i, I_j, T_m, W_a, state^{p}, \phi_a \rangle \qquad (2)$$

where:
$T_m$ - is the m-th type of attack, modeled using the taxonomy of attacks and the availability of designed attack scenarios
$W_a$ - is a collection of weapon types used directly in the attack, including conventional and ABC weapons, their classification and their destruction factors
$state^{p}$ - is the phase of the terrorist act preparation, $St = \{state^{p}\}$
$\phi_a$ - is a vector of quantitative factors calculated for the analyzed crisis situation.

The vector consists of two types of factors, deterministic and probabilistic, defining:
• the number of registered suspected events in a given time unit,
• the probability of a given type of attack,
• the probability distribution of:
  o time to attack,
  o human losses,
  o critical infrastructure destruction.
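The structure of a crisis situation can be made concrete with a small data-structure sketch. It is one possible reading of the definition, not the authors' implementation: the field types, the class names `CrisisSituation` and `QuantityFactors`, the helper `events_per_day`, the sample dates and the placeholder probability are all assumptions. Only the first deterministic factor (registered events per time unit) is computed.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class QuantityFactors:
    """Subset of the factor vector phi_a, for illustration only."""
    events_per_day: float       # deterministic factor
    attack_probability: float   # probabilistic factor


@dataclass
class CrisisSituation:
    """Crisis situation Cr_a; all field types are assumptions."""
    organisation: str           # O_i
    target: str                 # I_j
    attack_type: str            # T_m
    weapons: List[str]          # W_a
    phase: str                  # state^p
    factors: QuantityFactors    # phi_a


def events_per_day(event_dates, start, end):
    """Count registered events inside [start, end] and divide by the window length in days."""
    days = (end - start).days + 1
    count = sum(1 for d in event_dates if start <= d <= end)
    return count / days


# Hypothetical registered suspected events.
suspected_events = [date(2009, 1, 1), date(2009, 1, 3), date(2009, 1, 10), date(2009, 2, 1)]
rate = events_per_day(suspected_events, date(2009, 1, 1), date(2009, 1, 10))

crisis = CrisisSituation(
    organisation="Group X",
    target="Bridge K",
    attack_type="BOMB ATTACK",
    weapons=["explosives"],
    phase="preparation",
    factors=QuantityFactors(events_per_day=rate, attack_probability=0.1),  # 0.1 is a placeholder
)
print(crisis)
```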

Fig. 3. Semantic crisis model presenting core elements using ontology relations.

Using the model we designed a knowledge base of terrorist threats, which was additionally extended with domain models of:
• Biological weapons (diseases, agents, viruses, bacteria, fungus, and means of their transportation)
• Chemical weapons (chemicals, toxins, venoms, cure, drugs, remedies)
• Conventional warfare (taxonomy, weapons, munitions, explosives, means of deployment)
• Critical infrastructures, urbanised area structures, government and authorities facilities
• Spatial elements and their characteristics
• Terrorist organisations and their description including group members
• Crime taxonomy including criminal registry model

The prepared terminology and model create a kind of dictionary which aggregates data about terrorist organizations, critical infrastructures and attack patterns. In search of missing domain descriptions, the prepared semantic model has been compared with the available dictionary provided by Taxonomy Warehouse (http://www.taxonomywarehouse.com/).

6 Semantic searching – SPARQL Queries

The SPARQL query language is used to search data in RDF format. RDF (Resource Description Framework) is a metadata specification model, as set out by the W3C, typically serialized in XML. The purpose of RDF is to enable automatic machine processing of abstract descriptions of resources. It can be used both to search for data and to track information on a chosen subject. RDF describes a resource using a statement consisting of three elements: a subject, a predicate and an object.

Fig. 4. RDF elements in a semantic statement presenting the Subject-Predicate-Object triple.

SPARQL queries are executed on collections of triples stored in RDF format. Sample SPARQL query:

    SELECT ?groupName
    WHERE {
      ?y terrorism:Country "Poland" .
      ?y terrorism:Group ?groupName .
    }

The query compares the triple patterns in the WHERE clause with the triples contained in the RDF graph. In the first pattern the predicate and object are given, so it matches only triples with that predicate and object; the second pattern then retrieves the group linked to each matching subject. The result is the collection of values bound to the variable listed just after the SELECT keyword.

In real conditions more complicated queries are needed. The following query shows the syntax of a query with more than one condition, matching a whole subgraph:

    SELECT ?suspect
    WHERE {
      ?event a terrorism:Event .
      ?event terrorism:Date ?date .
      ?event terrorism:TypeOfEvent ?eventType .
      ?eventType a terrorism:EventType .
      { ?eventType terrorism:Name "BOMB ATTACK" . }
      UNION
      { ?eventType terrorism:Name "ASSASSINATION" . }
      ?group terrorism:ResponsibleGroup ?event .
      ?groupMember terrorism:MemberOf ?group .
      ?suspect terrorism:Knows ?groupMember .
      ?suspect a terrorism:Person .
      ?suspect terrorism:Live ?city .
      ?city terrorism:Situated "POLAND" .
      FILTER ( ?date > "2005-01-01"^^xsd:date && ?date < "2007-01-01"^^xsd:date )
    }
    ORDER BY ?city

The aim is to find all suspected people living in Poland who had been in contact with members of terrorist groups involved in bombings and assassinations between 1 January 2005 and 1 January 2007, the date range set in the FILTER clause.
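How triple patterns are matched against a graph can be shown with a toy evaluator for the first sample query. This is a simplified sketch, not how Jena or any SPARQL engine is implemented: it handles only basic graph patterns (no UNION, FILTER or ORDER BY), and the data in `GRAPH` as well as the names `match_pattern` and `evaluate` are hypothetical.

```python
# Hypothetical data; the predicates mirror the first sample query.
GRAPH = [
    ("event_1", "terrorism:Country", "Poland"),
    ("event_1", "terrorism:Group", "Group X"),
    ("event_2", "terrorism:Country", "Germany"),
    ("event_2", "terrorism:Group", "Group Y"),
]


def match_pattern(pattern, triple, bindings):
    """Extend the variable bindings so that pattern equals triple; return None on mismatch."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate(patterns, graph):
    """Join the patterns one by one, keeping only consistent variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Equivalent of: SELECT ?groupName WHERE { ?y terrorism:Country "Poland" . ?y terrorism:Group ?groupName . }
QUERY = [
    ("?y", "terrorism:Country", "Poland"),
    ("?y", "terrorism:Group", "?groupName"),
]

group_names = [solution["?groupName"] for solution in evaluate(QUERY, GRAPH)]
print(group_names)
```

Because the variable `?y` appears in both patterns, the second pattern can only match triples whose subject is an event already selected by the first one.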

7 Acquiring data for knowledge base

The knowledge base of the designed system operates in a distributed environment. The nature of terrorist incidents, and of the methods of acquiring information related to them, requires searching suitable sites on the web. These arguments favour building the system as an agent system, that is, a system consisting of autonomous software components. In contrast to object-oriented projects, in multi-agent systems information may not always be correct and, more importantly, may not always reach its target. It is therefore recommended that a single operation be performed by several agents simultaneously in order to minimize errors, given that an individual operation may not be completed. Multi-agent systems are used in large environments: in computer networks, which can include the Internet. The information provided in such a network may not be fully in line with reality, and devices in the network (links, hubs or servers) may fail. Agent systems do not need to be precisely synchronized and are characterized by high tolerance to communication delays. Using the visualization tool, the analyst can easily determine the location of a selected class instance in the hierarchy. For example, “America” is a subclass of World and is higher in the hierarchy than North_America and South_America. Visualization based on an interactive graph provides invaluable assistance to the analyst when designing ontologies. It allows the analyst to check the current validity of the defined relations and presents a hierarchy of dependencies between classes. This allows the chosen part of reality to be represented in the model as precisely as possible. It also assists other users: a user who is not familiar with the analyst's concepts can, by looking at the graph representation of the knowledge base, read the structure and hierarchy of classes in the model.
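The class hierarchy from the example above can be traversed with a few lines of code. The sketch assumes that each class has a single direct superclass, which is a simplification (OWL allows several), and the names `SUBCLASS_OF`, `ancestors` and `descendants` are illustrative only.

```python
# Direct subclass relations taken from the example in the text (child -> parent).
SUBCLASS_OF = {
    "America": "World",
    "North_America": "America",
    "South_America": "America",
}


def ancestors(cls, subclass_of=SUBCLASS_OF):
    """List the superclasses of cls, from the nearest one up to the root."""
    chain = []
    seen = {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in seen:  # guard against an accidental cycle in the data
            break
        seen.add(cls)
        chain.append(cls)
    return chain


def descendants(cls, subclass_of=SUBCLASS_OF):
    """Collect every class located below cls in the hierarchy (assumes no cycles)."""
    found = set()
    for child, parent in subclass_of.items():
        if parent == cls:
            found.add(child)
            found |= descendants(child, subclass_of)
    return found


print(ancestors("North_America"))
print(sorted(descendants("America")))
```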

Fig. 5. Semantic data visualisation using developed environment for crisis management tools presenting gathered data from GTD and composed knowledge base

8 Conclusions

The presented approach to modelling and terrorist threat assessment has been applied in the work of the NATO Modelling and Simulation Group and in the decision support module of the Crisis Management System for the Warsaw Agglomeration. So far we have developed reliable means of data migration and its instantiation in the form of a knowledge base, extending available methods dedicated to ontology reasoning. The research allowed us to verify available mechanisms for semantic inferencing and their application in the domain of crisis management systems and the integration of heterogeneous data sources. The method developed so far and its implementation have shown that this field of research offers interesting results for huge knowledge bases, especially in the form of indirect association analysis, which can be further extended towards complex and Bayesian network algorithms indicating quantitative characteristics of a terrorist threat. Many of the presented ideas have already been applied in real-world applications, used to demonstrate the decision support methods developed at the Cybernetics Faculty in the domain of asymmetric threat management systems. Further research is aimed at extending ontology models in the domains of weapons, critical infrastructures and their sensitivity, based on the destruction factor and a specific environment description.

Acknowledgments. This work was partially supported by grants “Research Project No PBZ-MIN/011/013/2004” and “Research Project No PBZ-MNISW-DDO-01/1/2007”.

References

1. D. Roman, U. Keller, H. Lausen, J. Bruijn, R. Lara, M. Stollberg, A. Polleres, C. Feier, C. Bussler, D. Fensel: Web Service Modeling Ontology, Applied Ontology, 2005
2. M. Mohammadian, Intelligent Agents for Data Mining and Information Retrieval, Idea Group Publishing, 2004
3. J. Davies, D. Fensel, F. Harmelen, Towards the Semantic Web: Ontology-driven Knowledge Management, HPL-2003-173, John Wiley & Sons, Ltd, 2003
4. M. Herrmann, O. Dalferth, M. A. Aslam, Applying Semantics (WSDL, WSDL-S, OWL) in Service Oriented Architectures (SOA), 10th Intl. Protégé Conference, 2007
5. E. Currie, M. Parmelee, Towards Knowledge-Based Solution for Information Discovery in Complex and Dynamic Domains, 7th Intl. Protégé Conference, 2004
6. Krebs V., Mapping Networks of Terrorist Cells, Connections, 24(3): 43-52, 2005
7. Mannes A., Golbeck J., Building a Terrorism Ontology, University of Maryland, College Park
8. Barthelemy M., Chow E., Eliassi-Rad T., Knowledge Representation Issues in Semantic Graphs for Relationship Detection, UCRL-CONF-209845
9. Najgebauer A., Decision support systems in conflict situations. Models, methods and the interactive simulation environments, Military University of Technology, Warsaw 1999, ISBN 83-908620-6-9
10. Najgebauer A. and others, M&S Tool for the Early Warning Identification of Terrorist Activities, Final report in the project NATO MSG – 026 – TG – 19, 2007
11. Najgebauer A., Antkiewicz R., Chmielewski M., Kasprzyk R., The prediction of terrorist threat on the basis of semantic association acquisition and complex network evolution, Journal of Telecommunications and Information Technology 2/2008, ISSN 1509-4553
12. http://protege.stanford.edu/publications/ontology_development/ontology101-noymcguinness.html
13. National Consortium for the Study of Terrorism and Responses to Terrorism, Global Terrorism Database - http://www.start.umd.edu/start/