Proceedings of the 10th International Conference on Computer Supported Cooperative Work in Design

Building a Personal Knowledge Recommendation System using Agents, Learning Ontologies and Web Mining

Juliana Lucas de Rezende (1), Vinicios Batista Pereira (1), Geraldo Xexeo (2), Jano Moreira de Souza (1, 2)
(1) COPPE/UFRJ - Graduate School of Computer Science
(2) DCC/IM - Institute of Mathematics
Federal University of Rio de Janeiro, PO Box 68.513, ZIP Code 21.945-970, Cidade Universitaria - Ilha do Fundao, Rio de Janeiro, RJ, Brazil
{juliana, vinicios, xexeo, jano}@cos.ufrj.br

Abstract

In this paper we consider a process that complements the learning process by building personal knowledge through the exchange of knowledge chains. The approach consists of partially automating the creation of knowledge chains using agents, ontologies and data mining. The agents monitor all media used by the learner and classify their content using an ontology. From this, we create a chain and recommend it to the learner. This became important when we observed that learners weren't motivated to create their own chains, which normally takes a lot of effort.

Keywords: CSCW, Collaborative Knowledge Design, Recommendation System, Learning ontology, Web Mining.

1. Introduction

Today people need to acquire new knowledge faster and in much greater volume than in the past. To complement the learning process, there are communities of practice focused on learning, i.e., learning communities. These communities act both as a method to complement teaching in the traditional classroom and as a way to acquire knowledge in evolution [1]. Pawlowski [2] defined a learning community as an informal group of individuals engaged in a common interest, which is, in this case, the improvement of the learner's performance using computer networks.

One of Wenger's principles [3] for cultivating communities of practice is the sharing of knowledge to improve personal knowledge. Another requirement for a successful community is intense communication between its members. Finally, a community should assist its members in building up their personal knowledge [4].

To complement the learning process, we considered a process to promote knowledge building, dissemination and exchange in learning communities. The need for a number of individuals to work together (on knowledge design) raises problems in the CSCW domain [5]. Knowledge design [6] is defined as the science of selecting, organizing and presenting the knowledge in a huge knowledge space in a proper way, so that it can be sensed, digested and utilized by human beings efficiently and effectively. It aims to offer the right knowledge to the right person in the right manner at the right point in time.

According to Xexeo [7], the design activity has been described as belonging to a class of problems that have no optimal solution, only satisfactory ones. Such problems are complex, usually interdisciplinary in nature, and require a group of people to solve them. Designing knowledge is similar in principle to designing computer software. It takes time, careful thought and creativity to do it well. The biggest difference is that you can't just load the knowledge into someone's brain as you can load software into a computer; you need an implementation procedure to build the knowledge in the learner's mind [6].

1-4244-0165-8/06/$20.00 © 2006 IEEE.

1.1. Motivation

To complement the learning process, a system has been developed to promote knowledge building, dissemination, and exchange in learning communities. This system is called Knowledge Chains Editor (KCE) and is based on a process for building personal knowledge through the exchange of knowledge chains (KCs) [1]. It is implemented on top of COPPEER (footnote 1). What distinguishes this process is that it adds "how to use" the available knowledge to the commonly used triad of "authors" (who), "localization" (where), and "content" (what). The KC (shown in Figure 1) is a structure created to organize knowledge. A KC is made up of a header (which contains basic information related to the chain) and a list of knowledge units (KUs).

Footnote 1: COPPEER [7] is a framework for creating very flexible collaborative peer-to-peer (P2P) applications. It provides non-specific collaboration tools as plug-ins.


Figure 1. Knowledge organization: a) Knowledge Chain; b) Knowledge Composition

Conceptually, knowledge can be decomposed into smaller units of knowledge (recursive decomposition). For the sake of simplicity, we assume that there is a basic unit, which can be represented as a KU (a structure formed by a set of attributes). To build his KC, the learner can use the KCE. When he has a question, he must create a KU whose state is "question". At this moment the system starts a search: it sends messages to other peers and waits for an answer. Each peer performs an internal search, which consists of checking whether it holds any KUs similar to the one being searched for. All KUs found are returned to the requesting party, as shown in Figure 2.

Figure 2. KCE architecture

The creation of a KU of type "question" is obviously motivated by the learner's need to obtain that knowledge. So far, we have considered two motivating factors for the creation of available KCs. The first is recognition by the community, since each KU created has a registered author. The second is the case where the professor makes KCs available "as a job", with the intention of guiding his students' studies.

However, we were aware that the learner needs more motivation to create new KCs. To address this problem, in this work we propose an evolution of the KCE. The main goal is to recommend potential KCs that can be accepted, modified or even discarded by the learner. These KCs will be created from the data collected by a software agent (footnote 2) that monitors the learner's navigation.

Footnote 2: A Software Agent [8] can be defined as a complex object with attitude.

1.2. Related Work

Apart from KCE, there are other tools that stimulate knowledge sharing in communities. These include WebWatcher [16], a search tool in which the learner specifies his interests and receives the related pages navigated by the other community members; OntoShare [17], which uses software agents that allow the user to share relevant pages; and MILK [18], which allows communities to manage knowledge produced from metadata. The main difference between these tools and the KCE is that they focus on sharing "where" and/or "with whom" the knowledge can be found. KCE adds the sharing of "what" and "how to use" this knowledge.

The remainder of this paper is organized as follows. The main concepts of web mining and learning ontologies are presented in the next two sections. Section 4 presents the proposed idea and the prototype developed. Conclusions are given in Section 5.

2. Web Mining

In simplified terms, web mining can be used to determine the path taken by the user while navigating the web (Web Usage Mining) and to classify the navigated pages (Web Content Mining) [9, 10]. However, one problem cannot be solved using web mining alone: the difficulty of computing the information hierarchy. This problem can be solved with the use of ontologies (footnote 3).

Besides the fact that the text has little (or no) structure, there are other reasons why text mining is so difficult. The concepts in a text are usually rather abstract and can hardly be modeled using conventional knowledge representation structures. Furthermore, the occurrence of synonyms (different words with the same meaning) and homonyms (words with the same spelling but distinct meanings) makes it difficult to detect valid relationships between different parts of the text [12].

Footnote 3: Ontology [11] is a formal specification of concepts and their relationships. By defining a common vocabulary, ontologies reduce concept definition mistakes, allowing for shared understanding, improved communications, and a more detailed description of resources.

2.1. Web Usage Mining

We use web usage mining when the data relates to user navigation, that is, when we store and analyze the order in which pages are navigated, the visit length for each page and the exit page. This information is important for verifying, respectively, the order of the navigated concepts (after page classification) and which pages are relevant when the user doesn't follow the structure of a site and goes to a new site on the same subject, or stops studying the subject [9].
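As a rough illustration of the usage data described in this section, the following Python sketch derives the navigation order, the total visit length per page and the exit page from a list of timestamped page visits. The `Visit` record, the `summarize_session` function and the use of timestamps in seconds are assumptions made for this example; they are not taken from the KCE prototype.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Visit:
    """One page view; the field names are hypothetical."""
    url: str
    entered_at: float  # seconds since the session started


def summarize_session(visits, session_end):
    """Return (navigation order, total visit length per page, exit page)."""
    ordered = sorted(visits, key=lambda v: v.entered_at)
    order = [v.url for v in ordered]
    lengths = defaultdict(float)
    # A visit lasts until the next page is entered...
    for current, following in zip(ordered, ordered[1:]):
        lengths[current.url] += following.entered_at - current.entered_at
    # ...and the last visit lasts until the session ends.
    if ordered:
        lengths[ordered[-1].url] += session_end - ordered[-1].entered_at
    exit_page = order[-1] if order else None
    return order, dict(lengths), exit_page


session = [Visit("a.html", 0), Visit("b.html", 30), Visit("a.html", 50)]
order, lengths, exit_page = summarize_session(session, session_end=60)
```

Keeping the raw order separate from the aggregated visit lengths reflects the two uses named in the text: the order feeds the sequence of navigated concepts, while visit length and exit page help judge page relevance.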

2.2. Web Content Mining

Once the relevant pages are selected using web usage mining, web content mining can be used to analyze and classify the page content [9]. In this kind of mining the input data is the HTML code of the page and the output data is one or more possible classifications of the page according to the ontology in use.

To simplify page classification we used an automatic summarization technique (AST) to extract the most relevant sentences from the page [12]. First, the AST applies several preprocessing methods to the input page, namely case folding, stemming and removal of stop words. The next step is to separate the sentences. The end of a sentence can be defined as a "." (full stop), a "!" (exclamation mark), a "?" (question mark), etc. In HTML texts we can also consider the tags of the language.

Once all the sentences of the page are identified, each remaining word is given a "weight" based on its HTML tag (Table 1), and a TF-ISF (term frequency - inverse sentence frequency) measure is computed for each word. For each sentence s, the average TF-ISF weight of the sentence, denoted Avg-TF-ISF(s), is the arithmetic average of the TF-ISF(w,s) weights over all the words w in the sentence.

The final step is to select the most relevant sentences, i.e. the ones with the largest values of Avg-TF-ISF(s). In the current version of our system this is done as follows: the system finds the sentence with the largest Avg-TF-ISF(s) value, called the Max-Avg-TF-ISF value, and the user specifies a threshold as a percentage of this value, denoted percentage-threshold. Sentences whose Avg-TF-ISF(s) value reaches this threshold are selected to produce a summary of the source text.

According to Larocca [12], this technique has been evaluated on real-world documents, and the results are satisfactory.
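The summarization steps above can be sketched in Python as follows. This is a minimal illustration, not the system's implementation: it assumes TF-ISF(w,s) = TF(w,s) * log(|S| / SF(w)), where |S| is the number of sentences and SF(w) the number of sentences containing w, it averages over the distinct words of each sentence, and it omits stemming and the HTML tag weights. The stop-word list and all function names are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # illustrative list


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Case folding and stop-word removal (stemming is omitted here)."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def avg_tf_isf(tokenized):
    """Average TF-ISF(w, s) over the distinct words w of each sentence s."""
    n = len(tokenized)
    sentence_freq = Counter()
    for tokens in tokenized:
        sentence_freq.update(set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        if not tf:
            scores.append(0.0)
            continue
        total = sum(count * math.log(n / sentence_freq[w]) for w, count in tf.items())
        scores.append(total / len(tf))
    return scores


def summarize(text, percentage_threshold=0.5):
    """Keep sentences scoring at least percentage_threshold * Max-Avg-TF-ISF."""
    sentences = split_sentences(text)
    scores = avg_tf_isf([tokenize(s) for s in sentences])
    if not scores:
        return []
    cutoff = percentage_threshold * max(scores)
    return [s for s, score in zip(sentences, scores) if score >= cutoff]


text = "Java has classes. Java has objects. Vector stores elements dynamically."
summary = summarize(text, percentage_threshold=0.9)
```

With a high threshold only the sentence made up of words that occur nowhere else is kept; lowering the threshold admits sentences that share vocabulary with the rest of the page.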

3. Building and Using Ontologies

According to Guarino [13], ontologies can be categorized into four types: top-level, domain, task and application. Top-level ontologies describe very general concepts like space, time, object, etc., which are independent of a particular problem or domain. Domain ontologies and task ontologies describe, respectively, the vocabulary related to a generic domain (like medicine, or automobiles) or a generic task or activity (like diagnosing or selling), by specializing the terms introduced in the top-level ontology. Application ontologies describe concepts that depend on both a particular domain and a particular task, and are often specializations of both related ontologies.

A more generic ontology can be made more specific without great effort, as needed. However, transforming a specific ontology into a more generic one can be a difficult task. Therefore, in this work we first created a domain ontology and, from it, a more specific ontology better suited to our needs.

The prototype was developed for the Java learning community, and the first ontology created was a domain ontology describing object-oriented (OO) language concepts. Specific properties were then added to this ontology to incorporate thesaurus functionality. This way, the software agent can search the ontology for words found in the text and correlate web pages with ontology concepts, transforming the domain ontology. All classes that represent concepts from an OO language inherit from a superclass called Concept. In our case, this superclass contains a property named keyword, which is used in page classification; if new properties related to classification are needed, it is enough to add them to the Concept class. To transform the new ontology back into the domain ontology, it is enough to remove the Concept class.

The OO language ontology was instantiated for Java to be used as a specific knowledge base by the application. With the concepts and relations instantiated, it is possible to compare the keywords found in the page mining process with the ontology keywords. Assigning weights to the page keywords makes it possible to classify the page probabilistically according to the ontology concepts. The relationships between ontology concepts can be used to support decisions about the concept represented by a page. When a page contains keywords that are concepts related to the same concept, the page can be classified as a representation of that common concept.

Figure 3. Example of ontology (diagram labels: Concept, ClassLibrary, DataType, Class, InnerClass, Superclass, subclass, contains, instance)


For example, in Figure 3, we have an ontology in which the concept Package is related to the concept Class, and the package java.util is related to the classes Vector and HashTable. If a page has keywords with the same weight referring to the Java classes Vector and HashTable, the system can consider that both are related to the package java.util and can classify the page as a reference to Package.
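The decision rule in this example can be sketched in Python as follows. The toy ontology, the keyword table and the `classify_page` function are assumptions made for illustration; the tie-breaking rule (fall back to the alphabetically first concept when tied concepts share no related concept) is also an assumption, not something stated in the paper.

```python
from collections import defaultdict

# Toy ontology for this sketch: concept -> the concept it is related to.
RELATED_TO = {
    "Vector": "Package java.util",
    "HashTable": "Package java.util",
    "Package java.util": "Package",
}
# Keyword (lower case) -> ontology concept it denotes.
KEYWORD_TO_CONCEPT = {"vector": "Vector", "hashtable": "HashTable"}


def classify_page(keyword_weights):
    """Return the concept a page most likely represents, or None."""
    scores = defaultdict(float)
    for keyword, weight in keyword_weights.items():
        concept = KEYWORD_TO_CONCEPT.get(keyword.lower())
        if concept is not None:
            scores[concept] += weight
    if not scores:
        return None
    best = max(scores.values())
    top = sorted(c for c, s in scores.items() if s == best)
    if len(top) == 1:
        return top[0]
    # Several concepts tie: use the concept they are all related to, if any.
    related = {RELATED_TO.get(c) for c in top}
    if len(related) == 1 and None not in related:
        return related.pop()
    return top[0]  # assumed fallback: alphabetically first tied concept


tied = classify_page({"Vector": 5, "HashTable": 5})
dominant = classify_page({"Vector": 8, "HashTable": 5})
```

Here the tied page is assigned to the shared concept "Package java.util", from which the more general concept Package can be reached by following `RELATED_TO` once more.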

3.1. Learning Ontologies

As stated before, adding the information collected during web mining to the existing ontology makes it possible to create a learning ontology. A collaborative learning ontology [14] is a system of concepts for modeling the collaborative learning process, such as 'learning goal', 'learning group type' and 'learning scenario'.

When the ontologies are in use, they are usually arranged in three layers. The top layer is the negotiation level, which corresponds to the negotiation ontology and represents the information that is important for negotiation at an abstract level. The intermediate layer corresponds to the collaborative learning ontology; here, only the abstractions from the agent level that matter for negotiation remain. The bottom layer is the agent level, which corresponds to the individual learning ontology.

This work covers only the two lower layers of a collaborative learning ontology: it captures the learner's personal learning process, which supports the lowest layer, and allows the exchange of learning processes, creating the information needed by the highest layer.

4. Automatic Building of Personal Knowledge Chains

The main target of this work is to automatically build knowledge chains to be recommended to the learners. As previously stated, the learner can accept, modify or even discard these KCs. To make this possible, the proposal is to extend the Knowledge Chains Editor (KCE) [1] to build personal KCs automatically. This requires an ontology of the domain under consideration. The goal is to determine the sub-groups of navigated concepts (concepts found in the navigated pages) and relate them to the pages.

The software agent observes the learner's navigation through web pages (of the domain under consideration) and stores the page content and the time spent on each page (as shown in Figure 4). After this, another agent is responsible for mining the navigation and the page content to determine the sub-group of ontology concepts related to the navigated pages and to create a graph from it. With all this information, the agent can build a potential KC that will be recommended to the learner.

Figure 4. KCE personal knowledge recommendation architecture (diagram; legible step labels: 1. Monitor navigated pages; 2. Select pages and store them; 3. Get stored pages; 4. Classify pages with ontology concepts; 6. Recommend knowledge tree to learner)

It is necessary to point out that the new KC is recommended to the same learner who is navigating the web. He decides whether or not to add the recommended KC to his personal knowledge. From this point onwards, if the learner accepts the KC, it can be exchanged between the community members using the KCE.

4.1. Knowledge Chains Recommendation System

The learner's software agent is responsible for observing the learner's navigation and storing the content of each navigated page, the visit length and the number of times the page has been accessed. In this first stage the agent only creates a database of web pages and access information (Web Usage Mining).

At a later stage, with a frequency determined by the user, another software agent selects, from the stored pages, those related to the subject discussed by the community (the subject must be known, because the community needs an ontology for it). This is done by comparing the content of each web page with a set of keywords (ontology concepts) related to the subject in question. In this way the stored pages are filtered, and only those that are in fact of interest to the community remain. This also addresses user privacy, since pages unrelated to a community subject are discarded.

With this set of stored pages the system has a directed graph, because the navigation order has been stored. As the system's objective is to build a KC from the concepts studied by the learner, text mining techniques are needed to classify the pages according to the concepts described in the ontology. This classification is based on the proposal of Desmontils and Jacquin [15]. However, instead of using a thesaurus together with an ontology, we extended our ontology by adding to every concept a vector of attributes holding the keywords related to that concept. Thus, the mining and the classification can be done using only the ontology.

At this point, the system removes all the stop words from the text of a page. Each remaining word is then given a "weight" based on its HTML tag. The weights follow the values given in Table 1.
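The weighting step could look roughly like the Python sketch below, which uses the standard library's `html.parser` to accumulate a weight for every word occurrence according to the heaviest enclosing tag. The mapping from concrete HTML tags to Table 1 weights (`title`, `a`, `h1`, `u`, `i`, `b`), the default weight of 1 for unmarked text, the stop-word list and the function names are all assumptions for this example; the paper does not list the tag names, and the font-size, keyword and image-title rows are not modeled here.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumed tag-to-weight mapping, loosely following Table 1.
TAG_WEIGHTS = {"title": 10, "a": 8, "h1": 3, "u": 2, "i": 2, "b": 2}
DEFAULT_WEIGHT = 1  # assumption: plain text counts once
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # illustrative


class WeightedWordCollector(HTMLParser):
    """Sums, per word, the weight of the heaviest tag enclosing each occurrence."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.weights = defaultdict(int)

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the most recent matching tag and anything opened after it.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx:]

    def handle_data(self, data):
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for word in data.lower().split():
            word = word.strip(".,;:!?()\"'")
            if word and word not in STOP_WORDS:
                self.weights[word] += weight


def weighted_words(html):
    collector = WeightedWordCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.weights)


page = ("<html><head><title>Java Vector</title></head>"
        "<body><p>The <b>Vector</b> class is a class.</p></body></html>")
word_weights = weighted_words(page)
```

Summing the weight per occurrence combines frequency and tag weight in a single number per word, which is one simple way to feed the comparison with ontology keywords.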

Table 1. Higher coefficients associated with HTML markers [15, 16]

HTML marker description    Weight
Document title             10
Keyword                    9
Hyper-link                 8
Font size 7                5
Font size +4               5
Heading level 1            3
Image title                2
Underline font             2
Italic font                2
Bold font                  2

Once the frequency and the weight of the keywords on a page are compared with the ontology concepts, the page receives degrees of relevance. With this relationship between pages and ontology concepts, the graph of pages can be transformed into a knowledge chain. This KC is recommended to the learner, who can decide what to do with it.

As there are many software agents "working" for the learners, a lot of KCs will be created. Therefore, it is possible to identify concepts that are absent from one learner's navigation but have already been studied by another, and to recommend KUs, concepts, pages and even the users who know the concepts the learner doesn't know.

4.2. Example

The following example shows how a KC is created from the learner's navigation through web pages. Figure 5 shows the web pages navigated by the learner and Figure 6 shows the ontology of the community, where arrows represent a non-hierarchical relationship. In the first stage, web mining is performed and, according to the keywords found on a web page, the page may partly match one ontology concept and partly another.

Figure 5. Web page navigation

In this case, there is a relevance degree for each concept relating to the page. Therefore, for each page, the result is:

Page 1: a 60%; b 10%; c 30%
Page 2: a 0%; b 0%; c 100%

Figure 6. Community ontology
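One plausible way to obtain relevance degrees like those listed for Page 1 and Page 2 is to sum the weights of the page words that match each concept's keyword vector and normalize the sums to percentages. The Python sketch below does exactly that; the normalization itself, the `relevance_degrees` function and the sample keyword vectors are assumptions, since the paper does not state how the percentages are derived.

```python
def relevance_degrees(word_weights, concept_keywords):
    """Map each concept to its share (in percent) of the matched keyword weight."""
    scores = {
        concept: sum(word_weights.get(keyword, 0) for keyword in keywords)
        for concept, keywords in concept_keywords.items()
    }
    total = sum(scores.values())
    if total == 0:
        return {concept: 0.0 for concept in scores}
    return {concept: 100.0 * score / total for concept, score in scores.items()}


# Hypothetical keyword vectors for three ontology concepts a, b and c.
concepts = {"a": ["attribute"], "b": ["class"], "c": ["object"]}
page1 = relevance_degrees({"attribute": 6, "class": 1, "object": 3}, concepts)
page2 = relevance_degrees({"object": 4}, concepts)
```

With these inputs the first page is split 60/10/30 across the three concepts and the second belongs entirely to concept c, mirroring the shape of the result in the example.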

After relating the web pages to the most relevant ontology concepts, the software agent creates a learning path in the ontology, which is a learning ontology, and the creation of the KUs is initiated by mapping the web pages onto the learning ontology.

At this point the KUs are created using the learning ontology and all the information on the learner's navigation through web pages. In this example, the learner must study "attribute", then "class", in order to study "object".
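A simplified view of how a learning path could be derived from the navigation is sketched below in Python: each page is reduced to its most relevant concept, and concepts are kept in the order in which they were first visited. Both simplifications, as well as the `learning_path` function and the sample data, are assumptions; the approach described in the paper also relies on the relationships stored in the ontology.

```python
def learning_path(navigation, page_relevance):
    """Return concepts in first-visit order, one (the most relevant) per page."""
    path = []
    for page in navigation:
        degrees = page_relevance.get(page)
        if not degrees:
            continue  # page was filtered out or could not be classified
        concept = max(degrees, key=degrees.get)
        if concept not in path:
            path.append(concept)
    return path


# Hypothetical navigation log and relevance degrees (in percent).
navigation = ["p1", "p2", "p3", "p2"]
page_relevance = {
    "p1": {"attribute": 60, "class": 10, "object": 30},
    "p2": {"class": 100},
    "p3": {"object": 80, "class": 20},
}
path = learning_path(navigation, page_relevance)
```

Each concept on the resulting path would then become one KU of the recommended chain, with the pages that contributed to it attached as its sources.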

As stated before, a KU is a structure formed by a set of attributes. These attributes are grouped into categories: General (name, description, keywords, author, creation date, last use date, etc.), Life Cycle (history, current state, and contributors), Rights (intellectual property rights and conditions of use), Relation (the relationship between knowledge resources), Classification (the KU in relation to a classification system) and Annotation (comments and evaluations of the KUs and their creators). Many of these attributes can be filled in automatically, which facilitates the creation of new KCs.
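To make the attribute categories concrete, the following Python sketch models a KU as a dataclass and shows how some attributes might be filled in automatically from a classified page. The field names, the default state "draft" and the `ku_from_page` helper are hypothetical; only the category names come from the text.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class KnowledgeUnit:
    # General
    name: str
    description: str = ""
    keywords: list = field(default_factory=list)
    author: str = ""
    creation_date: date = field(default_factory=date.today)
    # Life Cycle
    state: str = "draft"  # assumed default; the paper mentions a "question" state
    history: list = field(default_factory=list)
    contributors: list = field(default_factory=list)
    # Rights
    rights: str = ""
    # Relation
    related_resources: list = field(default_factory=list)
    # Classification
    concept: str = ""
    # Annotation
    annotations: list = field(default_factory=list)


def ku_from_page(url, concept, keywords, learner):
    """Fill in the attributes that can be derived from a classified page."""
    return KnowledgeUnit(
        name=concept,
        description=f"Studied through {url}",
        keywords=list(keywords),
        author=learner,
        related_resources=[url],
        concept=concept,
    )


ku = ku_from_page("http://example.org/vector.html", "Vector", ["vector"], "learner-1")
```

Attributes such as rights and annotations are left empty here because they cannot be inferred from navigation data and would need input from the learner.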


5. Conclusions and Future Work

The growing number of learning communities that communicate online makes it possible to exchange and use chains of explicit knowledge as a strategy for creating personal knowledge. Today, we have the WWW (who, what, where) triad, where "who" is the people who have the knowledge, "what" is the knowledge itself, and "where" is its location (in our case, the peer in which it is located). Using knowledge chains, we hope to add "how to use" the available knowledge to the existing triad.

As previously stated, to motivate the learner to create new KCs, we propose a personal knowledge recommendation system that uses software agent technology to monitor learner navigation; uses web mining to trace the path taken by the user while navigating the web and to classify the navigated pages; and uses learning ontologies, together with all the information collected, for the creation of new KCs.

Experimental use of the extended KCE shows evidence supporting the hypothesis that a learner who uses it to build personal KCs creates more new KCs, spends less time studying a specific subject, and gains a more comprehensive knowledge of the subject studied. To evaluate whether the KCE's objective has been reached, experiments that yield qualitative and quantitative data for verifying this hypothesis must still be carried out.

It is necessary to emphasize that it is not the goal of this work to ensure that the learner has assimilated everything in his KCs. Our goal is to stimulate the creation of new KCs, so that the knowledge network can expand and better assist the community members.

Because this work is still in progress, several future projects are expected. The most important are improving the algorithm used to map web pages onto the ontology nodes, and extending the monitored domain to consider any media manipulated by the learner, instead of only the navigated web pages.

Acknowledgement

This work was partially supported by CAPES and CNPq.

References

[1] J.L. Rezende et al., "Building personal knowledge through exchanging knowledge chains", Proc. of IADIS Int. Conf. on WBC, Algarve, Portugal, 2005, pp. 87-94.
[2] S. Pawlowski et al., "Supporting shared information systems: boundary objects, communities, and brokering", Proc. of the 21st Int. Conf. on Information Systems, Brisbane, Australia, 2000, pp. 329-338.
[3] E. Wenger et al., Cultivating Communities of Practice: A Guide to Managing Knowledge, Harvard Business School Press, 2002.
[4] J.M. Souza, A. Tornaghi and A. Vivacqua, "Creating educator communities", Int. Journal of Web Based Communities, Great Britain, 2005, pp. 1-15.
[5] J.L. Rezende, J.M. Souza, J.F. Souza and G.B. Xexeo, "Peer-to-Peer collaborative integration of dynamic ontologies", Proc. of the 9th Int. Conf. on CSCWD, Coventry, UK, 2005.
[6] M. Leitch, Human Knowledge Design, An undergraduate project, February 1986. (Published on the web on 31 July 2002)
[7] J.M. Souza et al., "COE: A Collaborative Ontology Editor based on a Peer-to-Peer framework", Int. Journal of Advanced Engineering Informatics, Germany, pp. 1-15.
[8] J.M. Bradshaw, "An introduction to software agents", in J.M. Bradshaw (ed.), Software Agents, MIT Press, 1997.
[9] O.R. Zaïane, "Web Mining: Concepts, Practices and Research", Conference Tutorial Notes, XIV SBBD, Joao Pessoa, Paraiba, Brazil, Oct 2000.
[10] R. Cooley, B. Mobasher and J. Srivastava, "Web Mining: Information and Pattern Discovery on the World Wide Web", Proc. of the 9th IEEE Int. Conf. on Tools with Artificial Intelligence, Newport Beach, CA, USA, Nov 1997.
[11] T.R. Gruber, "Toward principles for the design of ontologies used for knowledge sharing", Int. Workshop on Formal Ontology, 1993.
[12] J. Larocca Neto et al., "Document clustering and text summarization", Proc. of the 4th Int. Conf. on Practical Applications of Knowledge Discovery and Data Mining, 2000.
[13] N. Guarino, "Formal ontology in information systems", Proc. of FOIS'98, Trento, Italy, IOS Press, June 1998.
[14] T. Supnithi et al., "Learning goal ontology supported by learning theories for opportunistic group formation", in S.P. Lajoie and M. Vivet (eds), Artificial Intelligence in Education, IOS Press, 1999.
[15] E. Desmontils and C. Jacquin, "Indexing a web site with a terminology oriented ontology", SWWS, 2001, pp. 549-565.
[16] T. Joachims, D. Freitag and T. Mitchell, "Webwatcher: A tour guide for the world wide web", Proc. of the 15th Int. Joint Conf. on Artificial Intelligence (IJCAI97), Nagoya, Japan, Aug 1997, pp. 770-775.
[17] J. Davies, A. Duke and Y. Sure, "OntoShare - A knowledge management environment for virtual communities of practice", Proc. of the Int. Conf. on Knowledge Capture (K-CAP03), Sanibel Island, Florida, USA, 2003.
[18] A. Agostini et al., "Stimulating knowledge discovery and sharing", Proc. of Int. ACM Conf. on Supporting Group Work, Sanibel, Florida, 2003, pp. 248-257.