Semantic Web Graph Implied by User Preferred Activities∗

Jie Wu, Karl Aberer

School of Computer and Communication Sciences
Swiss Federal Institute of Technology (EPF), Lausanne
1015 Lausanne, Switzerland
{jie.wu, karl.aberer}@epfl.ch

The web can be viewed as a directed graph if we take entities in the web as the vertices and connections that imply some relationship between pairs of entities as the directed edges. Several such graphs can be set up to model the World Wide Web, and they can be represented, stored and processed in a proper mathematical way, e.g., by applying the theory of matrix computation. When we use web documents as the entities and the links from one page to another as the directed edges, the graph becomes the usual one that is widely used in algorithms like PageRank [1] and its descendants. In this family of algorithms, hyperlinks are intuitively considered to embed information about page importance from the viewpoint of page authors. This information is then used to form a characteristic matrix of the web and to compute the rankings of web documents in Internet search engines.

However, none of the interests and preferences of web surfers, the real end consumers of the Internet, is included in this simplest model of the web graph, even though in reality visitors’ browsing activities indicate a lot about the importance of a document. In the traditional static models, the information on document importance conveyed by interactive browsing is neglected. Furthermore, if we take the visiting trails of the surfers as the edges between document vertices, the new model of the web graph is quite different from the previous one. One could argue that this second model is nothing more than the first model plus the random jump mechanism described in PageRank. But this is not true, since the random jump mechanism used in PageRank does not take any advantage of user interests and preferences, which are actually more valuable than static hyperlinks in this situation.
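The contrast between the two graph models can be illustrated with a minimal sketch. It is not the authors' implementation: the page names, the sample links and trails, and the helper names `link_graph`, `trail_graph` and `rank` are all hypothetical, and the ranking routine is a plain PageRank-style power iteration with a uniform random jump, applied once to hyperlink edges and once to edges weighted by observed transitions.

```python
from collections import defaultdict

def link_graph(links):
    """Edge weights from static hyperlinks: every (src, dst) link has weight 1."""
    return {(src, dst): 1.0 for src, dst in links}

def trail_graph(trails):
    """Edge weights from visiting trails: count each observed transition."""
    weights = defaultdict(float)
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            weights[(src, dst)] += 1.0
    return dict(weights)

def rank(nodes, weights, damping=0.85, iterations=100):
    """Power iteration on a weighted directed graph with a uniform random jump."""
    n = len(nodes)
    out_total = defaultdict(float)
    for (src, _), w in weights.items():
        out_total[src] += w
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is spread uniformly.
        dangling = sum(scores[v] for v in nodes if out_total[v] == 0)
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in weights.items():
            nxt[dst] += damping * scores[src] * w / out_total[src]
        scores = nxt
    return scores

# Hypothetical three-page site: hyperlinks treat "docs" and "blog" alike,
# while the observed trails favour "docs".
nodes = ["home", "docs", "blog"]
links = [("home", "docs"), ("home", "blog"), ("docs", "home"), ("blog", "home")]
trails = [["home", "docs", "home", "docs"], ["home", "docs"], ["home", "blog"]]

static_scores = rank(nodes, link_graph(links))
trail_scores = rank(nodes, trail_graph(trails))
```

In this toy example the static graph cannot distinguish "docs" from "blog" because they have identical link structure, whereas the trail-based graph ranks "docs" higher because surfers move there more often. The uniform random jump is the same in both runs, which is why it alone does not account for the difference.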
Here user interests and preferences include, but are not limited to: pause time on a document, frequency of random jumps, habits of revisiting the same page(s), the starting page of surfing (which is usually the home page of some web site rather than a uniformly random web document in the Internet), language preference, topic preference, geographical preference, etc. Most of these semantic¹ aspects also have the properties of being dynamic and time-related. None of these semantic issues can be represented in the first simple model of the web graph.

∗ The work presented in this paper was supported (in part) by the National Competence Center in Research on Mobile Information and Communication Systems (NCCR-MICS), a center supported by the Swiss National Science Foundation under grant number 5005-67322.
¹ The term semantic as used here has a meaning different from that in the notion of the semantic web, which is more related to techniques that enable automatic comprehension of the web by machines.
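As an illustration of how a few of the signals listed above might be recorded on a web server, the following sketch aggregates pause time, revisits and starting pages from a visit log. The `Visit` record, its fields and the `summarize` helper are assumptions made for this example; the text does not specify a data format or an aggregation rule.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Visit:
    """One page view (hypothetical record layout)."""
    user: str
    page: str
    timestamp: float      # seconds since the start of the log
    pause_seconds: float  # time the user stayed on the page

def summarize(visits):
    """Return per-page total pause time, revisit counts and start-page counts."""
    pause = defaultdict(float)
    seen = Counter()       # (user, page) -> number of visits
    starts = Counter()     # page -> number of users whose surfing began there
    first_by_user = {}
    for v in sorted(visits, key=lambda v: v.timestamp):
        pause[v.page] += v.pause_seconds
        seen[(v.user, v.page)] += 1
        if v.user not in first_by_user:
            first_by_user[v.user] = v.page
            starts[v.page] += 1
    revisits = Counter()
    for (_, page), count in seen.items():
        revisits[page] += count - 1  # visits beyond the first by the same user
    return dict(pause), dict(revisits), dict(starts)

# Hypothetical log: user u1 returns to "home" once; both users start at "home".
visits = [
    Visit("u1", "home", 0.0, 5.0),
    Visit("u1", "docs", 5.0, 40.0),
    Visit("u1", "home", 45.0, 3.0),
    Visit("u2", "home", 0.0, 2.0),
]
pause, revisits, starts = summarize(visits)
```

Because every record carries a timestamp, such a log also preserves the dynamic, time-related character of the signals, for example by restricting the summary to a recent time window.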


The dynamic semantic aspects of user interests and preferences listed above cannot be managed simply by matrix computation as in the PageRank algorithm. Rather, we think theories of social intelligence can be applied. In an ongoing research effort, we are developing a module [2] for web servers with a mechanism inspired by swarm intelligence. It makes it possible for web servers to interact with web surfers, to maintain the dynamic semantic information of user surfing, and thus to obtain a proper local ranking of web documents. The proof-of-concept implementation of our idea demonstrates the potential of our model in two respects: first, it shows that in a self-organized system collective intelligence is feasible to obtain and is better than that of any single individual; second, it obtains a meaningful ranking of the web documents on a web server. The local rankings can also be used as input for the generation of global web rankings in a decentralized way [3].

This semantic view of the web graph can be further extended to a higher level of abstraction, in which web sites instead of documents are taken as the vertices of the graph. For the same reasons as before, static hyperlinks cannot reflect the wealth of semantic information carried by the semantic trails implied by user preferred activities when they pass from one site or region to another. Thus we can have the semantic web graphs implied by user interests and preferences at two different levels of abstraction: the document level and a higher level, primarily the site level. At the former level, the graph is intended to capture the multiplicity of semantic aspects local to web sites or small-scale regions. A local ranking of documents can be computed from this local semantic information. At the higher level, the focus is on the general importance of a site or region from the users’ viewpoint. A relative ranking among sites or regions can be derived from the semantic web graph at this level. Finally, decentralized architectures [4] can be used to put the two levels together and arrive at an integrated global ranking of web documents.
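The text does not state how the site-level and document-level rankings are integrated. Purely as an illustration, the sketch below assumes one simple rule: the global score of a document is the score of its site multiplied by the document's local score within that site. The `combine` function, the site names and all numbers are hypothetical.

```python
def combine(site_scores, local_scores):
    """Assumed rule: global score = site score * local document score."""
    combined = {}
    for site, docs in local_scores.items():
        for doc, local in docs.items():
            combined[(site, doc)] = site_scores[site] * local
    # Highest global score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)

# Hypothetical site-level ranking and per-site local rankings.
site_scores = {"a.example": 0.7, "b.example": 0.3}
local_scores = {
    "a.example": {"index": 0.6, "about": 0.4},
    "b.example": {"index": 0.9, "faq": 0.1},
}
global_ranking = combine(site_scores, local_scores)
```

Under this assumed rule each site only needs to publish its own local ranking, and the site-level scores can be computed separately, which fits the decentralized setting. Other combination rules are equally conceivable.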

References

[1] S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. WWW7 Conference, Computer Networks 30(1-7), 1998.
[2] Jie Wu and Karl Aberer. Swarm intelligent surfing in the web. Third International Conference on Web Engineering, ICWE’03, Oviedo, Asturias, Spain. July 14-18, 2003.
[3] Karl Aberer and Jie Wu. A framework for decentralized ranking in web information retrieval. The Fifth Asia Pacific Web Conference, APWeb 2003, Xi’an, China. Sept. 27-29, 2003.
[4] Jie Wu. Towards a decentralized search architecture for the web and P2P systems. Workshop on Adaptive Hypermedia and Adaptive Web-Based Systems (AH2003) of the Fourteenth Conference on Hypertext and Hypermedia, HyperText 2003, Nottingham, U.K. Aug. 26-30, 2003.
