A Web Interface for Visualizing Web Search Engine Results∗

Hsi-Chin Yang, Min-Chi Tzeng, and Cheng-Zen Yang
Department of Computer Engineering and Science
Yuan Ze University
135 Yuantung Road, Chungli, Taiwan, R.O.C.
{edgar,angeline,czyang}@syslab.cse.yzu.edu.tw

Abstract

Web search engines play an important role in helping users find information of interest among a large number of Web pages. However, the procedure of finding information is still tedious because of the limitations of traditional list-based representations. In this paper, we present a Web interface design called VISE that provides visualizations of Web search results. A prototype has been implemented in Java and used in empirical studies that compare VISE’s visualization interface with Google’s traditional list-based interface. Though the results are preliminary, they show that the VISE interface improves both the searching efficiency and the user perception effect.

∗ This research was supported by the National Science Council under grant No. NSC 90-2213-E-155-018.

1. Introduction

As the amount of information on the Web grows explosively, manually following hyperlinks to find information of interest is time-consuming and thus infeasible. Web search engines have therefore emerged to help people find information effectively. Search engines not only return search results but also rank them according to a ranking algorithm such as HITS [13] or PageRank [6]. However, although the ranking is a good indication of query relevance, we have observed that the information-finding procedure is still tedious. The main reason is that most existing search engines present their results in a list-based style. Therefore, although the search engines maintain different kinds of relevance information about the search results, little of it is presented [18]. For example, because of its linear nature, the list-based representation can hardly show the cross-reference relationships between result entries. If there is a reference relationship between two search results, it cannot be found without time-consuming manual tracking.

Furthermore, the linear ranking of Web pages is synthesized from different attributes. Therefore, Web pages of the same Web site that differ only slightly may be spread widely across the list and interleaved with other result entries. Careless users will overlook the interleaved entries and miss some important Web pages. In addition, these scattered entries will frustrate users because they usually contain little new information.

A visual interface design can relieve the problems of the list-based representation. Different interfaces have been proposed recently [2, 4, 5, 7, 8, 9, 10, 12, 15, 16, 18, 19, 20, 21]. However, most previous research focuses on how to visualize the relationship between the query terms and the search results rather than on how to visualize more implicit information in the query results. TopicShop [2], Envision [20], and GRIDL [21] are examples in which the visualization is query-centered. Only a few systems address this problem and take other information, such as connectivity relevance, into account in their visualization. For example, WebQuery visualizes the neighbor graph with the VANISH visualization tool [7, 8]. Mapuccino [12] and Fetuccino [5], developed at IBM, visualize the search results and their link reference information. CardVis [19] uses a pack of cards, showing each connected component in a card. However, WebQuery may introduce many irrelevant nodes into the neighbor graph because it lacks relevance checking. Mapuccino and Fetuccino avoid this problem by using search algorithms to tailor the site map, but their searching and ranking system differs from that of most general search engines. CardVis suffers from the problem that the top card hides the content of the other cards.
In this paper, we present a visual interface design called VISE (Visual Interface for Search Engines) that concentrates on visualizing more implicit information in addition to the information provided in the scrolled-list approach. In contrast to previous work, VISE directly uses the ranks from existing search engines but additionally visualizes extra information such as the reference relationships in the search results. VISE has three major design features:

1. VISE provides a graphical user interface in which the Web page entries of the same Web site are clustered and the reference relationships between search result entries are visualized. Therefore, users can learn more information without tedious sifting.

2. VISE facilitates the searching procedure with information visualization techniques. The results of our empirical studies show that the subjects can find information of interest more effectively.

3. VISE is an intermediate visualization interface that is independent of the back-end search engines. VISE exploits the ranks returned from the back-end search engine and provides add-on visualization services.

To visualize the reference relationships in a clear view, VISE adopts a CGD-based algorithm [17] because of its efficient space utilization and its small number of edge crossings. In addition, VISE encapsulates the result entries of the same Web site into a single delegate node. Therefore, users get a clear structural view of the search results.

We have implemented a prototype system in Java and performed empirical studies to compare VISE with Google’s list-based interface. Though the experimental results are preliminary, they show that VISE makes three contributions. First, users can find the information of interest more effectively with the VISE visualization. Second, users can easily recognize closely related pages through the delegate nodes. Third, since VISE is a stand-alone design, it can be tailored to different search engines without modifying them.

The rest of the paper is organized as follows. The next section gives an overview of related research work. Section 3 elaborates the interface architecture. Section 4 presents the empirical experiments. Finally, Section 5 discusses future research directions and concludes the paper.

2. Related Work

Visualization techniques for Web search results have received attention for years. WebQuery is a well-known system designed to visualize the results of a search query [8]. The query results are processed by expanding a neighbor graph from nodes that have reference relationships with the original query result set. Then, the graph is visualized [8].

Compared with WebQuery, VISE is distinctive in three aspects. First, VISE’s interface is independent of any search engine, whereas WebQuery’s interface is integrated with a search engine. Therefore, VISE is highly adaptable to different search engines. Second, VISE does not consider neighbor nodes that are not in the query results because such nodes are considered irrelevant to the query terms. Third, WebQuery adds some artificial nodes so that all nodes are connected, but VISE does not add these extra nodes. In VISE, each subgroup of the search results is displayed explicitly so that users can easily distinguish each subgroup from the others.

Mapuccino [12] and its descendant Fetuccino [5] were developed at IBM to visualize the search results and their link reference information. Like WebQuery, they employ several search algorithms to dynamically crawl the relevant pages, and a visual interface to visualize the search results. The visualization interface is therefore specific to their unique ranking and searching mechanisms.

CardVis uses a pack of cards, showing each connected component in a card [19]. When a user wants to explore the search space, the most relevant card is moved to the top and given the focus. The user can then switch to other cards by clicking on other keywords. However, the top card hides the content of the other cards except for their indexes. Therefore, users still need to sift through the pack to find the highly valuable entries.

TopicShop provides a visualization interface for the task of topic management [2]. In TopicShop, users view site profiles, sort sites by their properties, and organize sites into categories through an Explorer interface. However, it provides limited information because of the constraints of its list-based and icon-based nature.

Envision [20] and GRIDL [21] visualize the search results on an axis basis. Envision can explicitly show different kinds of relevance ranks along the x-axis. However, it does not provide visualizations of reference relationships.
GRIDL provides a graphical user interface through which the search results are visualized with categorical and hierarchical axes. However, GRIDL suffers from two problems. First, users may get lost after using GRIDL for a while because category expansion changes the axes and the display without an explicit history trace. Second, GRIDL does not visualize the reference relationships.

INSYDER [15, 16] incorporates several information visualization techniques such as document vectors and TileBars. However, it focuses on clearly displaying the relevance ranks and showing an overview of the search results; the hyperlink reference relationships are not visualized.

HyperSpace [4] provides a 3D spatial display for visualizing relationships of search results. However, the visualization focuses on query sequences. Though users can understand the relevance between the search results of two consecutive queries, they cannot recognize the reference relationships within the search results.

WebCiao [10] provides a graphical interface to visualize the results of queries. The graphical interface is based on Ciao [9]. However, Ciao-based visualization is mainly intended for graphs with an architectural style, not for graphs that may be full of back-reference links. WebCiao is designed mainly for managing Web site structures.

3. The System Architecture

VISE is an intermediate interface that accepts user requests, formulates the query terms, forwards the formulated requests to a designated back-end search engine, and visualizes the search results. In addition, VISE analyzes the reference relationships and proximity of the result entries and lays out the analysis results in a graphical user interface. From the interface, users can recognize connected node groups in which the entries have reference relationships. The VISE architecture is similar to the model proposed in [1].

The system architecture is illustrated in Figure 1. VISE consists of four major modules: a graphical user interface (GUI), a query formulator (QF), a crawling analyzer (CA), and a visualization engine (VE). The GUI and VE modules are responsible for visual representation and are implemented as Java applets. The QF and CA modules are responsible for processing query requests and search results and are implemented as a Java server process at the VISE server.

[Figure 1 diagram. Components: Users, GUI, Visualization Engine, Query Formulator, Crawling Analyzer, Search Engine, Web Sites. Flows: Query Terms, Query, Query Results, Pages, Hyperlink Structures, Visual Results. Groupings: VISE Interface, Query Processing, Graph Processing.]
Figure 1. The VISE system architecture.

3.1. The Graphical User Interface

The VISE GUI consists of three areas: a user input area, a visualization area, and a preview area. In the user input area, users can enter the query terms, set the maximum number of nodes to display, and control the reaction to a double mouse click. The query terms are reformulated in QF and sent to the back-end search engine. After the results are returned and analyzed, the visualization area shows the connectivity graph, limited to the maximum node number. By double-clicking, users can then either expand a collapsed delegate node or browse the Web page of a node in another window. If a node is only single-clicked, the preview area previews part of the page.

Figure 2 shows an example in which the query string is “thesis writing”, the node expansion option is checked, and the maximum node number is 100. Figure 2(a) shows the initial screen. Figure 2(b) depicts the generated visualization. Each node in the graph represents a search result entry. The nodes are placed horizontally according to the ranks of the search results. If there are reference relationships in a group of nodes, a connected graph is drawn and placed in front of the single nodes. The rank score of each node is also labeled to help users understand the ranking.

[Figure 2 panels: (a) Initial screen snapshot. (b) After querying and expanding.]
Figure 2. A visualization example of querying “thesis writing”.

By default, the nodes of the same Web site are encapsulated in a delegate node rather than expanded into a connected graph. In Figure 2(b), node 4 remains in the default encapsulated mode. A color scheme shows the weight of a delegate node. For example, node 4 is brown, meaning that it is the delegate for three or four nodes. If a delegate node is double-clicked, it is expanded into a connected graph. The connected graphs are placed in front of the other unexpanded nodes. In Figure 2(b), the two delegate nodes of ranks 1 and 2 are expanded. The arrowheaded lines between nodes are also colored to help users understand whether the connected nodes belong to the same Web site. In Figure 2, for example, the arrowheaded line between node 2 and node 3 is gray, meaning that nodes 2 and 3 belong to the same Web site.

Figure 3 demonstrates another example, in which the query term is “data mining”. Node 0 has been clicked and is marked with a cross; its content is also previewed. This figure shows that the search results can be clustered into several connected graphs according to the reference relationships. Nodes 1 and 12 seem to have higher page authority according to HITS [13]. Users can decide to browse these two pages first.
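The default encapsulation of same-site entries into one delegate node can be sketched as follows. This is a minimal illustration written for a current JDK, not the code of the VISE prototype. It assumes that two entries belong to the same Web site when their URL hosts are equal, and the names DelegateGrouping and groupByHost as well as the example URLs are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DelegateGrouping {
    /**
     * Groups result URLs, given in rank order, by host.
     * Assumption: "same Web site" means "same host" (case-insensitive).
     * The LinkedHashMap keeps each group at the position of its
     * best-ranked member, so the groups stay in rank order.
     */
    public static Map<String, List<String>> groupByHost(List<String> rankedUrls) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String url : rankedUrls) {
            String host = URI.create(url).getHost();
            // Fall back to the full URL when no host can be parsed.
            String key = (host == null) ? url : host.toLowerCase(Locale.ROOT);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(url);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList(
            "http://www.example.org/thesis/index.html",
            "http://writing.example.com/guide.html",
            "http://www.example.org/thesis/format.html");
        // Each map entry corresponds to one delegate node; the list size is its weight.
        groupByHost(results).forEach((site, pages) ->
            System.out.println(site + " -> " + pages.size() + " page(s)"));
    }
}
```

In this sketch the size of each group plays the role of the delegate node's weight, which the interface conveys with its color scheme.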

    InputGraph() {
        NodeSelected = number of selected nodes;
        NodetoShow   = MAX number of nodes to show;
        rCount       = number of results;
        i = 0;

        /* insert nodes into graph */
        While (NodeSelected < NodetoShow && i < rCount) {
            If (result[i] is not contained by other results) {
                insert result[i] into graph G;
                NodeSelected++;
            }
            i++;
        }

        /* insert edges into graph */
        For each node Node[i] that is inserted
            For each node Node[j] linked from Node[i], where Node[j] was inserted {
                E = edge(Node[i], Node[j]);
                If (E does not cause a loop path and E is not a transitive edge)
                    insert E into graph G;
            }
    }

Figure 4. The algorithm for constructing hyperlink graphs.
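The construction in Figure 4 can be sketched in Java as shown below. This is an illustrative variation rather than the authors' code. It first inserts only links that do not close a cycle and then removes transitive edges in a second pass, so that the result is the transitive reduction of the acyclic graph. The class name, the index-based input format, and the `contained` set (standing in for "contained by other results") are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HyperlinkGraphSketch {
    /** Successor sets of the acyclic graph, keyed by result index in rank order. */
    public final Map<Integer, Set<Integer>> edges = new LinkedHashMap<>();
    /** Links skipped because they would close a cycle; kept so they can be restored for display. */
    public final List<int[]> backLinks = new ArrayList<>();

    /** Depth-first search: is there a path from 'from' to 'to' along the current edges? */
    boolean reachable(int from, int to) {
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        stack.push(from);
        while (!stack.isEmpty()) {
            int cur = stack.pop();
            if (cur == to) return true;
            if (seen.add(cur)) stack.addAll(edges.get(cur));
        }
        return false;
    }

    /**
     * links.get(i) holds the indices of the results that result i links to.
     * 'contained' lists results that are encapsulated elsewhere (an assumption).
     */
    public static HyperlinkGraphSketch build(List<Set<Integer>> links,
                                             Set<Integer> contained, int maxNodes) {
        HyperlinkGraphSketch g = new HyperlinkGraphSketch();
        // Node insertion: take results in rank order until the display limit is reached.
        for (int i = 0; i < links.size() && g.edges.size() < maxNodes; i++) {
            if (!contained.contains(i)) g.edges.put(i, new LinkedHashSet<>());
        }
        // Pass 1: insert links between inserted nodes unless they would close a cycle.
        for (int i : g.edges.keySet()) {
            for (int j : links.get(i)) {
                if (i == j || !g.edges.containsKey(j)) continue;
                if (g.reachable(j, i)) g.backLinks.add(new int[] {i, j});
                else g.edges.get(i).add(j);
            }
        }
        // Pass 2: drop every edge whose target stays reachable without it (transitive edge).
        for (int u : g.edges.keySet()) {
            for (int v : new ArrayList<>(g.edges.get(u))) {
                g.edges.get(u).remove(v);
                if (!g.reachable(u, v)) g.edges.get(u).add(v);
            }
        }
        return g;
    }

    /** Small helper for building example link sets. */
    public static Set<Integer> set(Integer... xs) {
        return new LinkedHashSet<>(Arrays.asList(xs));
    }

    public static void main(String[] args) {
        // Result 0 links to 1 and 2, result 1 links to 2, result 2 links back to 0.
        List<Set<Integer>> links = Arrays.asList(set(1, 2), set(2), set(0));
        HyperlinkGraphSketch g = build(links, new HashSet<Integer>(), 100);
        System.out.println(g.edges);            // {0=[1], 1=[2], 2=[]}
        System.out.println(g.backLinks.size()); // 1
    }
}
```

In the example, the link from result 2 back to result 0 is set aside as a back link, and the edge from 0 to 2 is removed as transitive because 2 remains reachable through 1.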

Figure 3. Another example of querying “data mining”.

3.2. Query Processing

The query formulator (QF) rewrites the query strings into an appropriate format. After the search results are returned, the crawling analyzer (CA) parses the results and analyzes the hyperlink structures of the result entries. Entries of the same Web site are encapsulated into a delegate node to reduce the display complexity.

To visualize the reference relationships, CA first translates the hyperlink structures into Hasse graphs [17]. A Hasse graph is a directed graph containing no transitive edges. More details about the Hasse graph can be found in [3, 17]. When constructing the graphs, CA temporarily removes the circular paths for two reasons. First, the produced Hasse graphs are unique, and the construction complexity is thus reduced. Second, the aesthetics of the visual effect improve if the number of upward arcs is reduced [11]. After the graphs are constructed, the circular paths are restored. Figure 4 shows the algorithm used to construct Hasse graphs.

3.3. The Visualization Engine

The visualization engine (VE) employs a CGD-based algorithm [17] to visualize the Hasse graphs. There are three benefits to using CGD. First, the node layout is analyzed concurrently in two dimensions, and the resulting graph is more balanced. Second, the resulting graph has relatively few unnecessary edge crossings. Third, node expansion can be performed without cumbersome computation. More details about the CGD-based algorithm can be found in [17].

4. Empirical Studies

A prototype has been implemented in Java with JDK 1.2.2, running on a Microsoft Personal Web Server. The GUI and VE modules are implemented in a Java applet, and QF and CA are designed as a Java server process. Google [22] was selected as the back-end search engine for its effective ranking performance.

To assess the effectiveness of the VISE visualization, two experiments were conducted to compare VISE’s visual interface with Google’s list-based interface. Because the crawling process in the VISE prototype could take a long time, we did not count this cost. To address this problem, a caching scheme is now under development to improve the system efficiency.

Ten graduate students from our department, aged 24 to 26, volunteered to participate in the experiments. They were familiar with the conventional list-based search interface. Therefore, only a basic tutorial on VISE usage was provided. In the experiments, they rated the following four items: (1) their satisfaction with the searching efficiency in finding information of interest; (2) whether they could find high-quality information; (3) whether the results were clearly represented; and (4) the degree to which they understood the reference relationships between result entries.

4.1. Methods

In the first experiment, the subjects performed Web searching tasks with several predefined strings such as “NBA” and “thesis writing”, whose results had been shown to have many reference relationships. In the second experiment, the subjects performed Web searching tasks with several freely chosen strings. This experiment simulates real use, in which the search results are not guaranteed to have many cross-reference relationships. At the end of each experiment, the subjects completed a questionnaire giving their subjective ratings of the two interfaces on a scale of 1 (lowest score) to 10 (highest score). The subjects also provided comments about VISE.

4.2. Results and Discussion

Table 1 summarizes the results of the first experiment. The results show that VISE improves the search efficiency and that more high-quality information can be found with VISE. One subject reported that he could effectively recognize some highly referenced Web pages with VISE, whereas these pages had not attracted his notice with Google. However, VISE does not lead by much in clarity, for two reasons. First, Google shows the preview information of all entries on the same page simultaneously, whereas VISE previews only one entry at a time. Some subjects thus ranked Google higher. Second, some subjects reported that they were distracted by the large number of edge crossings in some connected graphs.

Table 2 summarizes the results of the second experiment. VISE’s interface still outperformed Google’s interface. Moreover, the subjects ranked VISE much higher on the first three questions. The reason is that most of the subjects had become more familiar with the VISE visualization interface and preferred the visual displays. The score difference is smaller for the last question because the visualization techniques cannot fully support user cognition for some queries.

The subjects made many valuable suggestions. One is that the focus+context technique [14] should be provided so that users can more effectively understand the whole picture of the search results. Another is that VISE should lay out the disjoint graphs more appropriately to facilitate browsing. Overall, the subjects expressed high satisfaction with VISE and a strong desire for more powerful visualization techniques. Although the VISE prototype is experimental, the subjects showed high interest in using VISE to find information on the Web.
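To make the comparison between the two experiments concrete, the short program below computes the difference between the VISE and Google average ratings for each question. The averages are copied from Tables 1 and 2; the class and method names are hypothetical, and the program is only a worked-arithmetic aid, not part of the original study.

```java
import java.util.Locale;

public class RatingGaps {
    static final String[] QUESTIONS = {
        "Searching efficiency", "High-quality info.", "Clarity", "Reference relationships"};

    // Average ratings per question as {VISE, Google}, copied from Table 1 and Table 2.
    public static final double[][] EXPERIMENT_1 = {{8.05, 7.2}, {8.3, 7.6}, {7.9, 7.4}, {8.5, 6.0}};
    public static final double[][] EXPERIMENT_2 = {{8.1, 7.2}, {8.4, 7.5}, {8.2, 7.4}, {8.4, 6.4}};

    /** Returns the VISE average minus the Google average for each question. */
    public static double[] gaps(double[][] averages) {
        double[] result = new double[averages.length];
        for (int q = 0; q < averages.length; q++) {
            result[q] = averages[q][0] - averages[q][1];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] g1 = gaps(EXPERIMENT_1);
        double[] g2 = gaps(EXPERIMENT_2);
        for (int q = 0; q < QUESTIONS.length; q++) {
            System.out.printf(Locale.ROOT, "%-24s exp. 1: %+.2f   exp. 2: %+.2f%n",
                QUESTIONS[q], g1[q], g2[q]);
        }
    }
}
```

With the reported averages, the gap on the first three questions grows from 0.85, 0.70, and 0.50 in the first experiment to 0.90, 0.90, and 0.80 in the second, while the gap on reference relationships shrinks from 2.50 to 2.00.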

5. Conclusions and Future Work

Web search engines play an important role in helping users find information of interest among a large number of Web pages. However, few search engines have considered the interface design issue. The information-finding procedure is still tedious because of the traditional list-based representation.

This paper presents the VISE interface design, in which information visualization techniques are incorporated to visualize Web search results. VISE has three major design features. First, VISE visualizes reference relationships and proximity in search results. Second, VISE facilitates the information searching procedure. Third, VISE is an intermediate visualization interface that is independent of the back-end search engine design.

We have compared the VISE interface with the traditional list-based interface by using the same set of search tasks, search engine, and search results. The results of the empirical studies show that VISE helps users quickly identify highly referenced entries and thus find the information of greatest interest. Though the results are preliminary, they show that VISE improves both the searching efficiency and the user perception effect.

However, many possible improvements remain. One important issue is how to visualize the reference structures quickly. To date, the VISE prototype is inefficient in crawling remote Web pages because it is designed to gather and analyze the reference information on the fly. If the back-end search engine embedded the reference information in the result lists, this problem could be relieved. However, this approach is complicated because the search engine design must be changed. Another feasible approach is to employ a cache architecture, which is now under development in the VISE project. However, caching introduces an inconsistency problem, which is now being studied.

Visual effect is another important issue. The subjects expressed a desire for more information visualization techniques. For example, the visualized display should be clearer for complicated graphs, the layout should be redesigned to express more information, and the focus+context technique should be adopted. These suggestions show that related visualization techniques need to be investigated further. In addition, a more comprehensive survey needs to be conducted.
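As a rough illustration of the caching idea mentioned above, the sketch below keeps the outgoing links of recently crawled pages in a small least-recently-used cache so that a repeated query does not crawl the same page again. It is not the cache architecture of the VISE project: the class name LinkCache, the crawler function, and the capacity are assumptions made for the example, and the sketch does nothing about the inconsistency problem (stale entries).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class LinkCache {
    private final Map<String, List<String>> cache;
    private final Function<String, List<String>> crawler;

    /**
     * @param capacity maximum number of pages whose links are remembered
     * @param crawler  hypothetical function that fetches a page and returns its outgoing links
     */
    public LinkCache(final int capacity, Function<String, List<String>> crawler) {
        this.crawler = crawler;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > capacity; // evict the least recently used page
            }
        };
    }

    /** Returns the outgoing links of a page, crawling it only on a cache miss. */
    public synchronized List<String> linksOf(String url) {
        List<String> links = cache.get(url);
        if (links == null) {
            links = crawler.apply(url);
            cache.put(url, links);
        }
        return links;
    }

    public static void main(String[] args) {
        final int[] fetches = {0};
        // Stand-in crawler that only counts how often it is called.
        LinkCache cache = new LinkCache(2, url -> {
            fetches[0]++;
            return Collections.singletonList(url + "/linked.html");
        });
        cache.linksOf("http://www.example.org/a.html");
        cache.linksOf("http://www.example.org/a.html"); // served from the cache
        System.out.println("crawls performed: " + fetches[0]); // crawls performed: 1
    }
}
```

A bounded cache of this kind trades memory for crawling time; handling pages that change between queries would need an additional expiry or validation policy.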

Table 1. The summarized results of the first experiment.

                                        VISE                               Google
                           Average  σ     Highest  Lowest     Average  σ     Highest  Lowest
Searching efficiency       8.05     0.72  9        7.5        7.2      0.87  9        6
High-quality info.         8.3      0.64  9        7          7.6      1.2   9        5
Clarity                    7.9      0.94  9        7          7.4      1.28  9        5
Reference relationships    8.5      0.67  10       8          6        1.55  9        4

Table 2. The summarized results of the second experiment.

                                        VISE                               Google
                           Average  σ     Highest  Lowest     Average  σ     Highest  Lowest
Searching efficiency       8.1      0.83  9        6          7.2      0.87  9        6
High-quality info.         8.4      0.66  9        7          7.5      0.81  9        6
Clarity                    8.2      0.75  9        7          7.4      0.92  9        6
Reference relationships    8.4      0.49  9        8          6.4      1.43  9        4

References

[1] O. Alonso and R. Baeza-Yates. A Model and Software Architecture for Search Results Visualization on the WWW. In Proc. of the 7th Intl. Symposium on String Processing and Information Retrieval (SPIRE 2000), pages 8–16, 2000.
[2] B. Amento, W. Hill, L. Terveen, D. Hix, and P. Ju. An Empirical Evaluation of User Interfaces for Topic Management of Web Sites. In Proc. of the ACM CHI’99, pages 552–559, May 1999.
[3] G. D. Battista, P. Eades, R. Tamassia, and I. G. Tollis. Graph Drawing – Algorithms for the Visualization of Graphs. Prentice Hall Inc., 1999.
[4] R. Beale, R. J. McNab, and I. H. Witten. Visualizing Sequences of Queries: A New Tool for Information Retrieval. In Proc. of 1997 IEEE Conf. on Information Visualization, pages 57–62, Aug. 1997.
[5] I. Ben-Shaul, M. Herscovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalhaim, V. Soroka, and S. Ur. Adding Support for Dynamic and Focused Search with Fetuccino. In Proc. of the 8th Intl. WWW Conference, 1999.
[6] S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. In Proc. of the 7th Intl. WWW Conference, Apr. 1998.
[7] J. Carrière and R. Kazman. Research Report: Interacting with Huge Hierarchies: Beyond Cone Trees. In Proc. of the 1995 IEEE Symposium on Information Visualization, pages 77–81. IEEE, October 1995.
[8] J. Carrière and R. Kazman. WebQuery: Searching and Visualizing the Web through Connectivity. In Proc. of the 6th Intl. WWW Conference, pages 701–711, Apr. 1997.
[9] Y.-F. Chen, G. S. Fowler, E. Koutsofios, and R. S. Wallach. Ciao: A Graphical Navigator for Software and Document Repositories. In Proc. of the Intl. Conf. on Software Maintenance, pages 66–75, 1995.
[10] Y.-F. Chen and E. Koutsofios. WebCiao: A Website Visualization and Tracking System. In Proc. of WebNet97, 1997.
[11] P. Eades and L. Xuemin. How to Draw a Directed Graph. In Proc. of the IEEE Workshop on Visual Languages, pages 13–17, Oct. 1989.
[12] M. Hersovici, M. Jacovi, Y. S. Maarek, D. Pelleg, M. Shtalheim, and S. Ur. The Shark-Search Algorithm – An Application: Tailored Web Site Mapping. In Proc. of the 7th Intl. WWW Conference, Apr. 1998.
[13] J. M. Kleinberg. Authoritative Sources in a Hyperlinked Environment. In Proc. of the 9th Annual ACM SIAM Symposium on Discrete Algorithms, pages 668–677, Jan. 1998.
[14] J. Lamping, R. Rao, and P. Pirolli. A Focus+Context Technique Based on Hyperbolic Geometry for Visualizing Large Hierarchies. In Proc. of the ACM SIGCHI’95, pages 401–408, May 1995.
[15] T. M. Mann. Visualization of WWW-Search Results. In Proc. of the IEEE 10th Intl. Workshop on Database and Expert Systems Applications, pages 264–268, Sept. 1999.
[16] T. M. Mann and H. Reiterer. Evaluation of Different Visualizations of Web Search Results. In Proc. of the IEEE 11th Intl. Workshop on Database and Expert Systems Applications, pages 586–590, Sept. 2000.
[17] C. L. McCreary, R. O. Chapman, and F.-S. Shieh. Using Graph Parsing for Automatic Graph Drawing. IEEE Transactions on Systems, Man and Cybernetics, Part A, 28(5):545–561, Sept. 1998.
[18] D. S. McCrickard and C. M. Kehoe. Visualizing Search Results Using SQWID. In Proc. of the 6th Intl. WWW Conference, Apr. 1997.
[19] S. Mukherjea and Y. Hara. Visualizing World-Wide Web Search Engine Results. In Proc. of 1999 IEEE Intl. Conf. on Information Visualization, pages 400–405, July 1999.
[20] L. T. Nowell, R. K. France, D. Hix, L. S. Heath, and E. A. Fox. Visualizing Search Results: Some Alternatives to Query-Document Similarity. In Proc. of the 19th ACM SIGIR, pages 67–75, Aug. 1996.
[21] B. Shneiderman, D. Feldman, A. Rose, and X. F. Grau. Visualizing Digital Library Search Results with Categorical and Hierarchical Axes. In Proc. of the 5th ACM Intl. Conf. on Digital Libraries, pages 57–65, June 2000.
[22] The Google search engine. http://www.google.com/.