Third International Conference on Informatics and Systems, Mar 19-22, 2005 Cairo University, Faculty of Computers and Information, Giza, Egypt

Building Web Ontology for Crop-Pests Domain to Allow Better Search on the Web

A. M. Hafez
Teaching Assistant, Department of Computer & Information Sciences, Institute of Statistical Studies and Research, Cairo University

M.B. Riad
Professor, Department of Information System, Faculty of Computers & Information, Cairo University

M. Nour El_dein
Assistant Professor, Department of Information System, Faculty of Computers & Information, Cairo University

Abstract

Ontology is the backbone of the semantic web. To apply the semantic web concept, one must build an ontology for the domain of interest. Building an ontology requires cooperation among domain experts to classify the items in the domain, specify their properties, and determine rules for inferring new information from the ontology. Once an ontology has been built, stored in a knowledge base system, and published on the web, an intelligent search engine or an inference engine can query it and draw inferences from it, and other ontology developers can reuse and extend it. In this paper we show how we built an ontology for crop pests in the agriculture domain, explain the benefits of this ontology, and show how its query results are more accurate than those obtained by searching the traditional web. We used the Web Ontology Language (OWL) [1], because it is the most recent ontology language proposed by the World Wide Web Consortium (W3C, http://www.w3c.org) and because of its powerful features. We used Protégé-2000 as the ontology editor, as it supports OWL.

Arabic abstract (English translation): Building an ontology is the backbone of the semantic web concept. To apply the semantic web concept, we must build an ontology for a specific domain. This process requires cooperation among experts in the field to identify the existing concepts, classify them, and specify their properties, and then to define rules that allow implicit information to be inferred from explicit information. Once the ontology has been built, stored in a knowledge system, and published on the web, semantic search engines can answer questions about the stored information and can also infer new, unstored information from predefined inference rules. Ontology developers in the same field or in other fields can also reuse it. In this paper we propose an ontology for the agricultural pests that affect crops and explain the benefits it offers to agricultural experts and others interested in the field. In doing so, we present the steps for building the ontology using OWL. A set of queries is posed both to this system and to ordinary web search engines, and the two systems are compared in terms of their responses.

1 Introduction

Most web users now have difficulty searching for information. Finding the right piece of information on the Web is often troublesome: a user searching for specific information gets lost in huge amounts of irrelevant material and may miss the relevant material altogether. Searches are imprecise, often returning pointers to many thousands of pages, and this situation worsens as the Web grows. In addition, the user must read through the retrieved documents to extract the desired information, so even once a truly relevant page is found, the information may be hard to locate or obscured. Moreover, the same piece of knowledge often has to be presented in different contexts on the same Web page and adapted to different users' needs and queries, yet the Web lacks automated tools that can transform this information among different representation formats and contexts. The semantic web is an extension of the current web, conceived by Tim Berners-Lee [2], inventor of the World Wide Web (WWW), Uniform Resource Identifiers (URIs), the Hypertext Transfer Protocol (HTTP), and the Hypertext Markup Language (HTML) [4]. It aims to provide the methods that enable machines to interpret web content for its meaning, not just for display and layout. To do so, it combines AI techniques for knowledge representation and acquisition with web technologies and hypertext, providing a web of meaning that enables more powerful web applications [5]. The semantic web approach is to annotate web resources with semantic markup [3] whose meaning machines can interpret. The semantic web uses the eXtensible Markup Language (XML) [8] to create the required elements, and the uniqueness of these elements is ensured by Uniform Resource Identifiers (URIs). The semantic web languages all use an XML syntax for their representation. The semantic web uses RDF [12] as a flexible data model, which gives it the power to represent semi-structured web data. RDF Schema (RDFS) [9] provides an extended vocabulary with formally defined semantics for use with RDF. Ontology languages extend the basic semantics of RDFS with additional vocabulary; they also build on formal languages such as Description Logics (DL), a family of knowledge representation languages designed for encoding knowledge about concepts and concept hierarchies, to represent knowledge semantics more powerfully. Ontologies [11] are key players in the semantic web because they provide an explicit, formal representation of shared domain knowledge that machines can interpret.
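To make the phrase "flexible data model" concrete, the sketch below (our illustration, not part of the described system) stores RDF statements as subject-predicate-object triples in a plain Python set and answers triple patterns, with None acting as a wildcard. The http://example.org/croppests# namespace and the English resource names are hypothetical stand-ins; a real application would use an RDF library and the ontology's own URIs.

```python
# Minimal sketch of the RDF data model: every statement is a
# (subject, predicate, object) triple. The EX namespace is hypothetical.
EX = "http://example.org/croppests#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

graph = {
    (EX + "FungalDisease", RDFS_SUBCLASS, EX + "CropPest"),
    (EX + "WhiteMould", RDF_TYPE, EX + "FungalDisease"),
    (EX + "WhiteMould", EX + "affectsCrop", EX + "Bean"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    )


# A new kind of fact is just one more triple: no table layout has to change.
graph.add((EX + "WhiteMould", EX + "otherName", "watery rot"))

for triple in match(graph, s=EX + "WhiteMould"):
    print(triple)
```

Adding the otherName statement needed no change to any predefined structure, which is the kind of flexibility that suits semi-structured web data.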

2 What are ontologies? General idea

Ontologies were developed in artificial intelligence to facilitate knowledge sharing and reuse. Since the beginning of the 1990s, ontologies have become a popular research topic in artificial intelligence communities, including knowledge engineering, natural language processing, and knowledge representation. More recently, the notion of ontology has also spread to fields such as intelligent information integration, cooperative information systems, information retrieval, electronic commerce, and knowledge management. Ontologies are popular largely because of what they promise: a shared and common understanding of a domain that can be communicated among people and application systems. Because ontologies aim at consensual domain knowledge, their development is often a cooperative process involving different people, possibly at different locations. People who agree to accept an ontology are said to "commit" themselves to it. Many definitions of ontology have been offered in the last decade, but the one that, in our opinion, best characterizes its essence is based on the related definitions by Gruber [13]: an ontology is a formal, explicit specification of a shared conceptualization. A "conceptualization" refers to an abstract model of some phenomenon in the world that identifies the relevant concepts of that phenomenon. "Explicit" means that the types of concepts used and the constraints on their use are explicitly defined. "Formal" refers to the fact that the ontology should be machine understandable. Different degrees of formality are possible: large ontologies such as WordNet [25] provide a thesaurus of over 100,000 terms explained in natural language, while at the other end of the spectrum CYC [26] provides formal axiomatic theories for many aspects of commonsense knowledge. "Shared" reflects the notion that an ontology captures consensual knowledge, that is, knowledge that is not restricted to some individual but accepted by a group.

2.1 Ontologies vs. knowledge bases

Ontologies are closely related to knowledge bases. The distinction between them lies in the different roles played by the represented knowledge. Ontologies tend to represent knowledge that is more or less consensual within a community of people, whereas knowledge bases represent knowledge that is specific to the particular problem that the knowledge-based system solves. Ontologies are concerned with static domain knowledge: the knowledge they represent does not change with inference, whereas a knowledge base usually includes knowledge that changes as inferences are made. For instance, an ontology for enterprise modelling contains concepts such as activity, process, and resource. In a knowledge base, by contrast, one would represent the particular activities performed by a particular enterprise, the particular processes that take place in it, the actual processes, activities, costs, and resources used to build or produce a particular product, and an estimate of the resources inferred to be needed to satisfy a newly arrived order. Knowledge in ontologies is therefore better suited to being reused and shared across applications [16].

2.2 Ontologies vs. schemas

There is sometimes confusion between ontologies and schemas, especially XML Schema [7] documents. It is important to understand the differences and similarities between ontologies and the different kinds of schemas, and what each can provide. The most popular schemas in use today are database schemas, XML Schemas, and RDF Schemas. What schemas and ontologies have in common is that they represent the structure and semantics of data; they differ in the level and form of the semantics and structure they provide. Most schemas are concerned with the structure of information, whereas the main concern of ontologies is to represent semantics in a machine-interpretable way. The relational database schema, which underlies most databases in use today, is the most common example of a database schema: data are presented as a set of tables with defined relations that conform to a set of defined integrity constraints [10]. XML Schema is a way to define restrictions on the structure of the XML documents in a domain, each of which is validated against one schema. The main goal of XML Schema is to unify the structure and syntax of a set of XML documents so that these documents can be shared between specific applications, whereas the goal of ontologies is to provide shared formal semantics of a domain so that information can be shared, reused, and reasoned over by any set of heterogeneous applications that support the language in which the ontologies are written. Ontologies and XML Schema play different roles in representing information on the World Wide Web, yet they can be integrated: some ontology languages, such as DAML+OIL [15] and OWL, support the use of XML Schema datatypes [7] within ontologies. RDF Schemas are very close to ontologies, as they share almost the same goal. However, ontologies are much richer than RDF Schemas, since they provide better-defined formal semantics and much better support for reasoning systems. The main differences between ontologies and schemas are identified by Fensel [16] as follows:
1. An ontology language is often syntactically and semantically richer than schema languages.
2. An ontology language is usually more flexible, which allows it to describe diverse types of data models.
3. Ontologies are best used for describing semi-structured information written in natural language (as available on the web).
4. An ontology must represent shared formal semantics of the domain, since it is mainly used for sharing and exchanging information.
5. An ontology provides a domain theory, not just the structure of a data container.
6. Schemas are more powerful than ontologies in defining the structure of information.
7. Schemas provide a larger variety of datatypes than ontologies.

2.3 Web of ontologies

The idea of the web of ontologies is to integrate a number of ontologies, each representing part of a larger domain, in order to build new ontologies in the same or related domains. Including other ontologies that represent the same domain can be very helpful: extending and refining existing ontologies saves a lot of time and guarantees a minimum level of compatibility between ontologies, since they share part of their vocabularies. Building ontologies on a modular basis helps considerably with this approach; for example, an ontology that represents a data model of humans and their relationships can be used by many other ontologies that involve people. Many ontologies are already available on the web for reuse; a list of ontologies in different domains is available in [27]. On the web, anyone may build and publish an ontology, which may be good, bad, or even totally wrong or fake. The success of an ontology will depend on how many users use or extend it. A rating mechanism could be created to help users (humans or machines) choose which ontologies to use and which to ignore; a web of trust could support such rating based on the source of the ontology. Ontologies could be rated according to their author and the number of ontologies that use or extend them. A new layer specialised in finding ontologies and concepts could be added to search engines, as in Swoogle (a semantic search engine: http://www.swoogle.com); such a layer can help developers find ontologies related to their work and help web users (humans or agents) restrict their search to a specific domain.
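The rating idea above can be sketched as a simple score, under assumptions of our own: an ontology earns one point for each ontology that imports or extends it, plus a bonus if its source is trusted (a crude stand-in for a web of trust). The URIs, the import graph, and the bonus weight are all hypothetical; no rating formula is specified here.

```python
# Hypothetical import/extension graph: ontology URI -> ontologies it reuses.
IMPORTS = {
    "http://example.org/people.owl": [],
    "http://example.org/crops.owl": ["http://example.org/people.owl"],
    "http://example.org/croppests.owl": [
        "http://example.org/crops.owl",
        "http://example.org/people.owl",
    ],
    "http://example.org/fake.owl": [],
}
# Sources judged trustworthy, e.g. by a web of trust (hypothetical).
TRUSTED = {"http://example.org/people.owl", "http://example.org/crops.owl"}


def rate(imports, trusted, trust_bonus=1):
    """Rank ontologies by how often they are reused, plus a bonus for trusted sources."""
    scores = {uri: 0 for uri in imports}
    for used_list in imports.values():
        for target in used_list:
            scores[target] = scores.get(target, 0) + 1
    for uri in scores:
        if uri in trusted:
            scores[uri] += trust_bonus
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for uri, score in rate(IMPORTS, TRUSTED):
    print(score, uri)
```

Counting reuse favours general, modular ontologies such as the people ontology, which matches the modular approach described above; the trust bonus keeps an unused but reputable source from being ranked alongside an unknown one.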

3 The role of ontologies in the semantic web

Ontologies play a key role in the semantic web, as they currently represent the most practical way to realise the idea of a "web of meaning" for humans and machines. One of the main concerns of the semantic web community is to develop ontologies for a wide set of domains and to enable integration between them. An ontology should describe the common, shared formal semantics of the domain it represents; these shared semantics are then used to add meaning to web resources. However, it is almost impossible to get people across the world to agree on one unified representation of the concepts in any domain. This may be due to cultural, religious, linguistic, or even geographic differences; it is human nature. Even people within the same domain who share the same beliefs may hold different points of view on some issues. Another issue is that the facts themselves can change over time. That is why any system intended to use ontologies on the web must be able to deal with a wide set of ontologies that may contradict one another, and must support multiple versions of the same ontology so that required updates can be added.

4 Benefits of web ontologies in the crop pests domain

The crop pests domain is one of many domains that can benefit from ontologies that represent the key concepts, relations, and structure of pests and all related entities (crops, lands, crop parts, seasons, etc.). Such ontologies can help simplify and automate:
• Archiving crop pests in the agriculture domain, in many countries and languages, by domain experts.
• Finding and locating the information about crop pests that interested users and domain experts need (using metadata such as the crop name, pest name, pest category, and pest image).
• Research related to the domain (for example, discovering new pests or new control methods).
• Integrating information about crop pests so that it can be accessed easily.


All of these fields can benefit from a crop pests ontology. It will also help make all kinds of crop pests knowledge available and accessible over the web. There are already many web sites with large amounts of data about crop pests. However, it is very hard to query, find, and access these data, especially when the goal is to find information about a pest whose name you do not know and whose appearance is all you know. Currently there is very little metadata for multimedia resources, which makes them very difficult to locate, especially since most current search engines use keyword-based search as their main approach.

5 Ontology building process

Building ontologies is usually done by domain experts; however, if ontologies are to be used on a large scale on the WWW, web developers should also be able to build or extend them. Our guidelines for building an ontology are as follows.
• First, determine the domain to which your content belongs. For example, if we are going to publish a web site about the pests that affect agricultural crops, our domain will be crops, which is a subdomain of agriculture.
• Then use a semantic search engine (http://pear.cs.umbc.edu/swoogle/) to search for ontologies already built in this domain.
• If such ontologies exist, read their descriptions and consider whether they can be reused or extended.
• If not, build your ontology from scratch. In that case you should learn an ontology language such as OIL, DAML+OIL, or OWL.
• Then learn an ontology editor that is compatible with the language you have learned, such as Protégé or OntoEdit.
• Design your ontology by answering these questions:
  o What are the domain and goals of the ontology?
  o How general should the ontology be?
  o What types of queries is the ontology supposed to answer?
  o What are the concepts (classes) of the domain?
  o What is the structure of these classes (the subclasses and superclasses of each class)?
  o What are the relations (properties) between classes?
  o Which classes can have specific properties (domain)?
  o What are the valid values (instances or literals) for these properties (range)?
  o What are the valid instances for each class?
• After designing the ontology, use the language and the tool to create your classes, properties, restrictions, and inference rules.
• If you need a general ontology, try to include an existing one from your domain instead of building it.
• Check the consistency of your constructs using an automated reasoning service such as FaCT or RACER.
• Publish your ontology on the web, where semantic search engines such as Swoogle can find it, so that other developers can reuse it.
• Fill in your data instances and generate an HTML view of them for the web.

These questions are not easy to answer, especially defining the structure of classes and the relations between them. Some current tools with built-in reasoners (such as OilEd with the FaCT reasoner and Protégé with the RACER [17] reasoner) can help check the consistency of ontologies and detect some flaws.
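FaCT and RACER are full Description Logic reasoners. As a rough illustration only, the sketch below detects one kind of flaw such a consistency check reports: an individual that, directly or through the subclass hierarchy, belongs to two classes declared disjoint. The English class names stand in for the ontology's Arabic ones, the disjointness axiom and the contradictory BadEntry individual are hypothetical, and this set-based check is not how those reasoners work internally (they use tableau algorithms).

```python
# Toy consistency check: find individuals that fall into two disjoint classes.
SUBCLASS_OF = {  # class -> its direct superclasses (hypothetical fragment)
    "FungalDiseases": {"CropPests"},
    "InsectPests": {"CropPests"},
}
DISJOINT = [("FungalDiseases", "InsectPests")]  # assumed disjointness axiom
ASSERTED_TYPES = {
    "WhiteMould": {"FungalDiseases"},
    "Aphid": {"InsectPests"},
    "BadEntry": {"FungalDiseases", "InsectPests"},  # deliberately contradictory
}


def all_types(direct):
    """Close a set of classes under the subclass-of relation."""
    result, stack = set(), list(direct)
    while stack:
        cls = stack.pop()
        if cls not in result:
            result.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return result


def inconsistencies():
    """Return (individual, class_a, class_b) for every violated disjointness axiom."""
    problems = []
    for individual, direct in sorted(ASSERTED_TYPES.items()):
        types = all_types(direct)
        for a, b in DISJOINT:
            if a in types and b in types:
                problems.append((individual, a, b))
    return problems


print(inconsistencies())  # [('BadEntry', 'FungalDiseases', 'InsectPests')]
```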

6 Designing the crop pests ontology

Domain of the ontology:
• Crop pests, a subdomain of crops, which is in turn a subdomain of agriculture.

Goals of the ontology:
• To be used in semantically enabled web sites related to the crops domain.
• To be used by semantically enabled agents (specifically search agents).
• To be used by experts in agriculture labs.

General characteristics:
• It is a domain-specific ontology.
• It is based on the agriculture vocabulary and taxonomy of the AGROVOC thesaurus [28], the classifications of crops in the WordNet thesaurus [29], and the vercon web site [30].
• It is concerned with crop pests in Egypt.
• It is encoded in Arabic so that users in Egypt can use it.

Language and tools used:
• OWL as the representation language.
• Protégé-2000 [20], version 3.0 beta, as the construction and editing tool.

Protégé-2000: Protégé is an ontology editor and knowledge acquisition tool developed at Stanford University by the Stanford Medical Informatics team to create and modify reusable ontologies. It has been used in knowledge representation applications, especially in medicine. The advantage of Protégé is that it is highly customisable, so it can be extended through plug-ins to support different semantic web languages; many plug-ins available on the Protégé web site support additional languages such as DAML+OIL [15] and OWL.

6.1 Choosing the OWL language

Ontologies are used to capture knowledge about a domain of interest. An ontology describes the concepts in the domain and the relationships that hold between those concepts. Different ontology languages provide different facilities. The most recent development in standard ontology languages is OWL from the World Wide Web Consortium (W3C). Like Protégé, OWL makes it possible to describe concepts, but it also provides new facilities. It has a richer set of operators, e.g. and, or, and negation. It is based on a different logical model, which makes it possible for concepts to be defined as well as described. Complex concepts can therefore be built up in definitions out of simpler concepts. Furthermore, the logical model allows the use of a reasoner, which can check whether all of the statements and definitions in the ontology are mutually consistent and can also recognize which concepts fit under which definitions. The reasoner can therefore help to maintain the hierarchy correctly. This is particularly useful when classes can have more than one parent.

6.2 Components of OWL ontologies

OWL ontologies have components similar to those of Protégé frame-based ontologies, although the terminology used to describe them is slightly different. An OWL ontology consists of Individuals, Properties, and Classes, which roughly correspond to Protégé Instances, Slots, and Classes.
1. Individuals: individuals represent the objects in the domain that we are interested in. An important difference between Protégé and OWL is that OWL does not use the Unique Name Assumption (UNA). This means that two different names could actually refer to the same individual. For example, “العفن_الأبيض” (white mould) and “العفن_المائي” (watery rot) might both refer to the same individual. Individuals are also known as instances, and can be referred to as "instances of classes".
2. Properties: properties are binary relations on individuals, i.e. a property links two individuals together. For example, the property “النباتات_المعرضة” (susceptible plants) might link the individual “الذبول_البكتيري” (bacterial wilt) to the individual “الفاصوليا” (bean).
3. Classes: OWL classes are interpreted as sets that contain individuals. They are described using formal (mathematical) descriptions that state precisely the requirements for membership of the class. For example, the class “الأفات_المحصولية” (crop pests) would contain all the individuals that are pests in our domain of interest. Classes may be organised into a superclass-subclass hierarchy, also known as a taxonomy. Subclasses specialise ("are subsumed by") their superclasses. For example, consider the classes “الأفات_المحصولية” and “أمراض_فطرية” (fungal diseases): “أمراض_فطرية” might be a subclass of “الأفات_المحصولية”, so “الأفات_المحصولية” is the superclass of “أمراض_فطرية”. This says that all fungal diseases are crop pests: every member of the class “أمراض_فطرية” is a member of the class “الأفات_المحصولية”, being a fungal disease implies being a crop pest, and “أمراض_فطرية” is subsumed by “الأفات_المحصولية”. One of the key features of OWL-DL is that these superclass-subclass (subsumption) relationships can be computed automatically by a reasoner.

7 Crop pests ontology class hierarchy design

The top classes (under owl:Thing) of the ontology, as shown in Figure 2, are:
• الأفات_المحصولية (crop pests)
• المحاصيل (crops)
• أجزاء_النبات (plant parts)
• مواسم_الزراعة (planting seasons)
• أنواع_التربة (soil types)
There are many subclasses, to make the ontology as general as possible and to support extensions. The main class of the ontology, which defines the main concepts of the crop pests domain and is expected to be the topic of most queries, is “الأفات_المحصولية”. Its path in the ontology is owl:Thing -> “الأفات_المحصولية”. The class has many properties, as shown in Figure 1, and is subdivided into many subclasses, as shown in Figure 2; an individual of this class is shown in Figure 3. The details of the ontology implementation and code can be found in [14].
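The class hierarchy of Section 7 can be explored with a small sketch that takes direct subclass assertions, allows a class more than one parent, and computes every subsuming class and every path from owl:Thing. It only closes explicitly asserted links; an OWL-DL reasoner additionally derives subsumptions from class definitions. The English names stand in for the Arabic identifiers, and the two soil-borne classes are hypothetical additions used to show multiple parents.

```python
# Direct parents of each class; "owl:Thing" is the root. A class may have
# several parents. English stand-ins for the Arabic class names.
DIRECT_PARENTS = {
    "CropPests": ["owl:Thing"],        # الأفات_المحصولية
    "Crops": ["owl:Thing"],            # المحاصيل
    "PlantParts": ["owl:Thing"],       # أجزاء_النبات
    "FungalDiseases": ["CropPests"],   # أمراض_فطرية
    "InsectPests": ["CropPests"],      # حشرية
    "SoilBornePests": ["CropPests"],   # hypothetical
    "SoilBorneFungalDiseases": ["FungalDiseases", "SoilBornePests"],  # hypothetical
}


def ancestors(cls):
    """All classes that subsume `cls` (transitive closure of the parent links)."""
    seen, stack = set(), list(DIRECT_PARENTS.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(DIRECT_PARENTS.get(parent, []))
    return seen


def paths_to_root(cls):
    """Every path from owl:Thing down to `cls`, as lists of class names."""
    if cls == "owl:Thing":
        return [["owl:Thing"]]
    return [
        path + [cls]
        for parent in DIRECT_PARENTS.get(cls, [])
        for path in paths_to_root(parent)
    ]


for path in paths_to_root("SoilBorneFungalDiseases"):
    print(" -> ".join(path))
# owl:Thing -> CropPests -> FungalDiseases -> SoilBorneFungalDiseases
# owl:Thing -> CropPests -> SoilBornePests -> SoilBorneFungalDiseases
```

With two parents, the class appears under both branches, which is the situation where letting a reasoner maintain the hierarchy is most useful.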

Slot | Value range
:Name (inherited from owl:Thing) | XMLSchema:string
:Documentation (inherited from owl:Thing) | XMLSchema:string
اسم-الأفة (pest name) | XMLSchema:string
أسماء-أخري (other names) | XMLSchema:string
شكل-الإصابة (appearance of the infection) | Image widget
أعراض-الإصابة (symptoms of the infection) | XMLSchema:string
أسباب-الإصابة (causes of the infection) | XMLSchema:string
الأهمية-الإقتصادية (economic importance) | XMLSchema:string
المحصول-المصاب (affected crop) | Owl:المحاصيل (crops)
الجزء-المصاب (affected part) | Owl:أجزاء-النبات (plant parts)
نوع-الأفة (pest type) | Owl:الأفات-الزراعية (agricultural pests)
طرق-المكافحة (control methods) | XMLSchema:string

Figure 1: Properties of class “الأفات_المحصولية” (crop pests)
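In Protégé's frame view, the value range of a slot constrains its values, as Figure 1 shows (strings, an image, or individuals of a given class). The sketch below performs such a frame-style check on a hypothetical individual, using English stand-ins for four of the slots. Note that in OWL proper an rdfs:range axiom is used for inference rather than validation: a reasoner would conclude that the value belongs to the range class, and would report a problem only if that conclusion contradicted another axiom, such as a disjointness declaration.

```python
# Frame-style value-type check for four slots of Figure 1 (English stand-ins).
RANGES = {
    "pestName": str,               # اسم-الأفة : XMLSchema:string
    "otherNames": str,             # أسماء-أخري : XMLSchema:string
    "affectedCrop": "Crops",       # المحصول-المصاب : class المحاصيل
    "affectedPart": "PlantParts",  # الجزء-المصاب : class أجزاء-النبات
}
CLASS_OF = {"Bean": "Crops", "Leaves": "PlantParts"}  # hypothetical individuals


def range_violations(individual):
    """List (slot, value) pairs whose value does not fit the slot's value range."""
    bad = []
    for slot, values in individual.items():
        expected = RANGES.get(slot)
        for value in values:
            if expected is None:
                ok = True  # slot not covered by this sketch
            elif expected is str:
                ok = isinstance(value, str)
            else:
                ok = CLASS_OF.get(value) == expected
            if not ok:
                bad.append((slot, value))
    return bad


white_mould = {  # hypothetical instance data
    "pestName": ["White mould"],
    "otherNames": ["Watery rot"],
    "affectedCrop": ["Bean"],
    "affectedPart": ["Bean"],  # deliberate error: Bean is a crop, not a plant part
}
print(range_violations(white_mould))  # [('affectedPart', 'Bean')]
```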

Figure 2: Subclasses, properties and restrictions of class “الأفات_المحصولية”, as displayed by Protégé-2000 [20]


Figure 3: Screenshot of an instance of class “الأفات_المحصولية” in Protégé-2000 [20]

8 Testing queries via the Algernon inference engine

The aim of the tests is to demonstrate that inference engines can be used on the semantic web to provide more accurate and powerful searches. Performance could not be evaluated, because the data set the inference engine works on is very small compared with the data handled by any web search engine. We expect, however, that most of the anticipated performance problems can be solved by further development of more powerful inference engines.

Protégé provides a default query system; however, it is very limited and makes no inferences, since it mostly queries the ontology at the structural level. The main strength of ontologies is that they allow inferences to be made. We therefore used Algernon, as an example of an inference engine running on top of Protégé, to demonstrate some simple and useful inferences that can be made over the ontology.

8.1 The sample data set

The data set used in the ontology was collected from several web sites, mainly the vercon web site [30]. The ontology contains 117 classes, 87 slots, 274 frames, 2 facets, and 56 instances: 30 crop pests (الأفات_المحصولية), 14 crops (المحاصيل), 7 plant parts (أجزاء_النبات), 3 soil types (أنواع_التربة), and 2 agricultural regions (المناطق-الزراعية).

8.2 Scenario 1

Pests often have more than one name. For example, “العفن_الأبيض” (white mould) is also called “العفن_المائي” (watery rot); since the two names refer to the same pest, a search for either of them should return the same results.

Query 1.1: Syntax: ((:Instance أمراض_فطرية ?a) (owl:sameAs ?a العفن_الأبيض)). Explanation: this query asks for any instance of the class “أمراض_فطرية” (fungal diseases) that has العفن_الأبيض as one of its names. The same query is used for “العفن_المائي”: ((:Instance أمراض_فطرية ?a) (owl:sameAs ?a العفن_المائي)). Results: both queries return the same result, a single record for the pest known as both “العفن_الأبيض” and “العفن_المائي”, as shown in Figure 4.

Query 1.2: Syntax: ((:Instance الأفات_المحصولية ?a) (الأسم ?a العفن_الأبيض)). Explanation: this query is more general, as it uses the class “الأفات_المحصولية”, which is a superclass of “أمراض_فطرية”; the inference engine can infer that any instance of “أمراض_فطرية” is also an instance of each of its superclasses. Results: this query returns the same result as Query 1.1, as shown in Figure 5.
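The two inferences behind Scenario 1, treating names linked by owl:sameAs as one individual and treating an instance of a subclass as an instance of its superclasses, can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not Algernon's implementation: the English names stand in for the Arabic identifiers, the toy data are hypothetical, and for simplicity both queries match names through owl:sameAs, whereas Query 1.2 as written matches the name slot.

```python
# Toy re-creation of the Scenario 1 inferences (hypothetical data).
SUBCLASS_OF = {  # child class -> parent class (assumed acyclic, single parent)
    "FungalDiseases": "CropPests",
    "BacterialDiseases": "CropPests",
}
INSTANCE_OF = {  # individual -> directly asserted class
    "WhiteMould": "FungalDiseases",
    "BacterialWilt": "BacterialDiseases",
}
SAME_AS = [("WhiteMould", "WateryRot")]  # owl:sameAs assertions


def same_group(name):
    """All names linked to `name` through owl:sameAs (symmetric and transitive)."""
    group = {name}
    changed = True
    while changed:
        changed = False
        for a, b in SAME_AS:
            if (a in group) != (b in group):
                group |= {a, b}
                changed = True
    return group


def superclasses(cls):
    """The class itself plus every ancestor up the parent chain."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain


def query(cls, name):
    """Individuals of `cls` (directly or via a subclass) that are the same as `name`."""
    group = same_group(name)
    return sorted(
        ind for ind, direct in INSTANCE_OF.items()
        if ind in group and cls in superclasses(direct)
    )


print(query("FungalDiseases", "WhiteMould"))  # ['WhiteMould'] (cf. Query 1.1)
print(query("FungalDiseases", "WateryRot"))   # ['WhiteMould'] via owl:sameAs
print(query("CropPests", "WhiteMould"))       # ['WhiteMould'] via the superclass (cf. Query 1.2)
```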


8.3 Scenario 2

Metadata are important for enabling inference engines to answer specialised queries. A search for "pests that affect the bean plant" should differ from a search for "pests caused by the bean plant", and both should differ from a search for "pests that affect the leaves of the bean plant". The crop pests ontology provides three important slots for any crop pest: “النباتات_المعرضة”, which refers to the crop itself; “أقات_المحصول”, which refers to the pests that affect the crop; and “أجزاء_النبات_المعرضة”, which refers to the parts of the crop affected by the pest (e.g. roots, stems, leaves, pods, and flowers).

Query 2.1: Syntax: ((:instance حشرية ?a) (النباتات_المعرضة ?a الفاصوليا)). Explanation: this query asks for any instance of insects (class “حشرية”) that affect the bean crop “الفاصوليا”. Results: as shown in Figure 6, the inference engine returns eight insects that affect the bean plant. The names of the insects are on the right of the figure and the details of each insect on the left.
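The ontology keeps both a pest-to-crop slot (النباتات_المعرضة) and a crop-to-pest slot (أقات_المحصول). If the two were declared inverses with owl:inverseOf (an assumption on our part; how they are linked is not stated here), a reasoner could fill in one direction from the other. A minimal sketch of that derivation, with hypothetical English names and data:

```python
# Hypothetical assertions: pest -> crops it affects (cf. النباتات_المعرضة).
AFFECTS = {
    "WhiteMould": {"Bean"},
    "BacterialWilt": {"Bean", "Tomato"},
}


def invert(relation):
    """Derive the inverse relation: crop -> pests that affect it (cf. أقات_المحصول)."""
    inverse = {}
    for pest, crops in relation.items():
        for crop in crops:
            inverse.setdefault(crop, set()).add(pest)
    return inverse


PESTS_OF = invert(AFFECTS)
print(sorted(PESTS_OF["Bean"]))  # ['BacterialWilt', 'WhiteMould']
```

Deriving one direction means the data entry for the other cannot drift out of sync, which matters when "pests of a crop" and "crops of a pest" are asked as different questions.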

Figure 4: Results of query 1.1


Figure 5: Results of query 1.2

Figure 6: Results of query 2.1

Query 2.2: Syntax: ((:instance حشرية ?a) (النباتات_المعرضة ?a الفاصوليا) (الأجزاء_المعرضة_للإصابة ?a الأوراق)). Explanation: this query is more specialised: it asks for any instance of insects (class “حشرية”) that affect the bean crop “الفاصوليا”, but only its leaves “الأوراق”. Results: as shown in Figure 7, the query returns a subset of the results of Query 2.1 (four insects), since it is restricted to insects that affect the leaves rather than any part of the crop.
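Queries 2.1 and 2.2 are conjunctions: every clause must hold for the same ?a, so adding a clause can only shrink the answer set. The sketch below evaluates such conjunctions over (individual, slot, value) facts. It is our illustration, not Algernon's engine; the insects, slot names, and facts are hypothetical stand-ins, not the eight insects in the data set.

```python
# Hypothetical facts as (individual, slot, value) triples, English stand-ins.
FACTS = {
    ("Aphid", "type", "InsectPests"),
    ("Aphid", "affectsCrop", "Bean"),
    ("Aphid", "affectsPart", "Leaves"),
    ("PodBorer", "type", "InsectPests"),
    ("PodBorer", "affectsCrop", "Bean"),
    ("PodBorer", "affectsPart", "Pods"),
    ("WhiteMould", "type", "FungalDiseases"),
    ("WhiteMould", "affectsCrop", "Bean"),
}


def run(query):
    """Evaluate a conjunction of (slot, value) clauses on one variable ?a.

    Each clause must hold for the same individual, like the clauses of the
    Algernon queries above.
    """
    candidates = {subject for (subject, _, _) in FACTS}
    for slot, value in query:
        candidates = {a for a in candidates if (a, slot, value) in FACTS}
    return sorted(candidates)


q21 = [("type", "InsectPests"), ("affectsCrop", "Bean")]  # like Query 2.1
q22 = q21 + [("affectsPart", "Leaves")]                  # like Query 2.2
print(run(q21))  # ['Aphid', 'PodBorer']
print(run(q22))  # ['Aphid']
```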

8.4 Scenario 3

A string search is also supported: a search for a word such as “اللفحة” (blight) retrieves all instances that contain this word, as shown in Figure 8.
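Unlike Scenarios 1 and 2, this search involves no inference: it is plain substring matching over string slot values. A minimal sketch, with hypothetical instances and English values standing in for the Arabic ones:

```python
# Hypothetical string values stored on instances (English stand-ins).
STRING_VALUES = {
    "EarlyBlight": ["Early blight"],
    "LateBlight": ["Late blight"],
    "WhiteMould": ["White mould", "Watery rot"],
}


def string_search(word):
    """Return the instances that have `word` inside any string value.

    Case-insensitive substring matching only; no ontology reasoning.
    """
    needle = word.casefold()
    return sorted(
        name
        for name, values in STRING_VALUES.items()
        if any(needle in value.casefold() for value in values)
    )


print(string_search("blight"))  # ['EarlyBlight', 'LateBlight']
```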

Figure 7: Results of query 2.2

Figure 8: Results of query 3.1


9 Queries on WWW search engines

Some of the previous queries were also run over the WWW using the well-known Google search engine, with both web search and image search. The following subsection gives an overview of the results. The relevance of the results to the search criteria was checked over the top 10 results, as these should be the most relevant ones.

9.1 Google Web Search

Keywords and results:
• Keyword: “العفن الأبيض قي الفاصوليا” (white mould in beans). Result: no results found, as shown in Figure 9.
• Keyword: “العفن الأبيض” (white mould). Result: 732 results, none of the checked results relevant, as shown in Figure 10.
• Keyword: “العفن المائي” (watery rot). Result: 54 results, none of the checked results relevant, as shown in Figure 11.

Figure 9: Results of Google search for “العفن الأبيض قي الفاصوليا”

Figure 10: Results of Google search for “العفن الأبيض”


Figure 11: Results of Google search for “العفن المائي”

These sample queries show that the traditional search engine either retrieves many irrelevant results or returns no results at all. The first case arises because the search engine does not understand the intended meaning or context of the query: it only matches each word of the query separately against the content of each document on the web. The second case arises because the concept may be expressed on the web in terms different from those in the query, and the search engine has no way of knowing that the two terms are equivalent, since it has no semantics.
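Both failure modes can be reproduced with a toy comparison, under our own simplifying assumptions: a keyword matcher that requires every query word to occur somewhere in a document, versus a matcher that first expands the query with names an ontology declares equivalent. The documents and the equivalence table are hypothetical, and real search engines rank results in far more elaborate ways.

```python
# Hypothetical mini-corpus: document id -> text.
DOCS = {
    "d1": "watery rot of beans: symptoms and control",
    "d2": "white paint and mould removal at home",
    "d3": "bean varieties for export",
}
# Names an ontology would declare to denote the same pest (cf. owl:sameAs).
SAME_AS = {"white mould": {"white mould", "watery rot"}}


def keyword_search(query):
    """Match documents containing every query word, each word checked separately."""
    words = query.lower().split()
    return sorted(d for d, text in DOCS.items() if all(w in text.split() for w in words))


def semantic_search(query):
    """Expand the query with equivalent names, then match each whole phrase."""
    names = SAME_AS.get(query.lower(), {query.lower()})
    return sorted(d for d, text in DOCS.items() if any(n in text for n in names))


print(keyword_search("white mould"))        # ['d2']: words match, meaning does not
print(keyword_search("white mould beans"))  # []: no document has all the words
print(semantic_search("white mould"))       # ['d1']: found under its other name
```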

The previous scenarios are very simple ones; the Crop Pests community may require much more complicated queries. For example, a researcher who wants to compare two pests should be able to view any similar comparisons made by others, together with fully documented information about the two pests and their effects. With current search engines, such a task takes a long time and much effort, because it must be divided into simpler subqueries: a query for each pest on its own, a query for articles about each one, and a query for comparisons made by others. Each of these queries takes time, since its results must be filtered to extract the relevant links, and the required information must then be extracted from each link.

10 Related work

Structuring knowledge of the agriculture domain on the web is the main interest of the Food and Agriculture Organization (FAO) of the United Nations [31]. In particular, the Agriculture Ontology Service (AOS) project [32] is intended to convert all information in the agricultural sector, for which many well-established and authoritative controlled vocabularies already exist, such as FAO's AGROVOC Multilingual Thesaurus [33], the CAB Thesaurus [22], and AgNIC [35] (the thesaurus of the National Agricultural Library in the United States).

11 Conclusion

Implementing the semantic web on a universal scale is still not easy to achieve because of many limitations, such as the extra cost of annotating web resources. In addition, most current semantic web tools, such as inference engines and methods for storing raw RDF data, are still not powerful enough to scale up to the huge amount of information available on the web. The best approach for the semantic web at present is to apply it for domain-specific purposes, as tested in this paper. Limiting the semantic web to a specific domain can provide many benefits and can satisfy most of the semantic web's goals, such as better and more efficient methods for finding, accessing, and integrating information. This approach can eventually lead to the universal scale the semantic web seeks. The results of the given scenarios imply that:


1. There are many points of weakness in current search engines.
2. The semantic web approach can enable much more powerful search engines.
3. Any further development of search engines will always be limited by the lack of formal semantics on the web, because the data on the web is targeted at human consumption only.

One of the important challenges facing the semantic web now is to provide an efficient solution for storing and querying huge amounts of RDF data with the use of ontologies. Sesame [18] can be a basis for such a solution. It can be argued that the cost of the semantic web may be high. Yet the expected benefits justify this cost, as implementing the semantic web can open the way for the development of much better web applications.

Future Work

Building ontologies on a large scale is not easy, so much research is needed in the following areas to turn this vision into reality:
1. Making the annotation of semantic web resources easy for web developers.
2. Providing methods for efficient integration and translation between ontologies.
3. Developing semantically enabled search engines.
4. Developing methodologies to apply the semantic web on a large scale.
5. Making the building, storage, reuse, integration, and querying of ontologies readily available to developers.

References

[1] Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D., Patel-Schneider, P. F. and Stein, L. A. OWL Web Ontology Language Reference [online]. W3C Candidate Recommendation, 18 August 2003.
[2] Berners-Lee, T. Semantic Web Road Map [online]. September 1998. Available via: http://www.w3.org/DesignIssues/Semantic.html [16 July 2003].
[3] Berners-Lee, T. Foreword. In: Fensel, D., Hendler, J., Lieberman, H. and Wahlster, W., eds. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. The MIT Press, 2003, pp. xi-xxiii.
[4] Berners-Lee, T. and Connolly, D. Hypertext Markup Language - 2.0. IETF HTML Working Group, November 1995. At: http://www.ics.uci.edu/pub/ietf/html/rfc1866.txt.
[5] Berners-Lee, T. and Fischetti, M. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by its Inventor. Harper, San Francisco, 1999.
[6] Berners-Lee, T., Hendler, J. and Lassila, O. "The Semantic Web". Scientific American, May 2001.
[7] Biron, P. and Malhotra, A. XML Schema Part 2: Datatypes. W3C, May 2001.
[8] Bray, T., Paoli, J. and Sperberg-McQueen, C. XML. W3C (World Wide Web Consortium), February 1998. At: http://www.w3.org/TR/1998/REC-xml9980210.html.
[9] Brickley, D. and Guha, R. V. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Working Draft, 23 January 2003. Available at http://www.w3.org/TR/2003/WD-rdf-schema-20030123/.
[10] Elmasri, R. and Navathe, S. B. Fundamentals of Database Systems, 3rd ed. Addison-Wesley, 2000.
[11] Fensel, D., van Harmelen, F., Horrocks, I., McGuinness, D. L. and Patel-Schneider, P. F. OIL: An Ontology Infrastructure for the Semantic Web. IEEE Intelligent Systems, 2001, 16(2), pp. 38-44.
[12] Klyne, G. and Carroll, J. J. Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Working Draft, 2003. Available at http://www.w3.org/TR/2003/WD-rdf-concepts-20030123.
[13] Gruber, T. R. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5:199-220, 1993.
[14] Hafez, A. M. Enhancing to Semantic Web for Better Search on the Web. M.Sc. thesis (in preparation), Information Systems Department, Faculty of Computers & Information, Cairo University.
[15] Horrocks, I. DAML+OIL: A Reason-able Web Ontology Language. In: Extending Database Technology, 2002, pp. 213.
[16] Klein, M., Broekstra, J., Fensel, D., van Harmelen, F. and Horrocks, I. Ontologies and Schema Languages on the Web. In: Fensel, D., Hendler, J., Lieberman, H. and Wahlster, W., eds. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. The MIT Press, 2003, pp. 95-139.
[17] http://www.sts.tu-harburg.de/~r.f.moeller/racer/
[18] http://www.openrdf.org/about.jsp
[19] http://www.w3.org/TR/soap/
[20] http://www.protege.stanford.edu/
[21] http://www.protege.stanford.edu/plugins/owl/
[22] http://194.203.77.66/
[23] http://www.fao.org/agris/
[24] http://www.kaon/semanticweb.org
[25] http://www.cogsci.princeton.edu/~wn
[26] http://www.cyc.com
[27] http://www.mindswap.org/~rreck/onts
[28] http://www.fao.org/agris/aos/Applications/intro.htm
[29] http://www.wordreference.com
[30] http://www.vercon.sci.eg/vercon.asp#
[31] http://www.fao.org/index_ar.htm
[32] http://www.fao.org/agris/aos/Applications/intro.htm
[33] http://www.fao.org/agrovoc/
[34] http://www.kaon.semanticweb.org/
[35] http://www.agnic.org
