A Query Construction Service for large-scale Web Search Engines

Ioannis Papadakis, Department of Archives and Library Sciences, Ionian University, Greece
Sofia Stamou, Computer Engineering and Informatics Dept., Patras University, Greece
Michalis Stefanidakis, Computer Science Department, Ionian University, Greece
Ioannis Andreou, Digital Systems Department, University of Piraeus, Greece

Abstract

The most popular way of finding information on the web is to go to a large-scale search engine and submit a query. Despite their wide usage, large-scale search engines are not always effective in tracing the best possible information for the user’s needs. There are times when web searchers spend too much time searching over a large-scale search engine. When (if) they eventually succeed in getting back the anticipated results, they often realize that their successful queries are significantly different from their initial one. In this paper, we introduce a query construction service that assists web information seekers in specifying precise and unambiguous queries over large-scale search engines. The proposed service leverages the collective knowledge encapsulated mainly in the Wikipedia corpus and provides an intuitive GUI through which web users can determine the semantic orientation of their searches before these are executed by the desired engine.

1. Introduction

Currently, large-scale web search engines are the predominant means of accessing the flourishing data that is available on the web. One thing that makes search engines so popular is that they enable users to query the web in an intuitive yet simple manner, i.e. by submitting a few keywords to the engine’s search box. Despite the intended simplicity associated with querying the web via a large-scale search engine, there are times when web searchers spend too much time reformulating queries, without being able to satisfy their information needs. Search engines provide little help to users with vague knowledge of the terminology employed within relevant documents. Even if searchers succeed in locating the information sought, they often realize that their successful queries differ significantly from their initial query.

In this paper, we propose a query construction service that resides on top of large-scale web search engines and aims at assisting information seekers in formulating search queries that are expressive of their search intentions. This way, we implicitly help search engines better understand user information needs and serve their queries accordingly. The motive of this work is to bridge the semantic gap between the initial query and the query that would have been addressed to the large-scale web search engine if the user knew the terminology employed by the documents that actually pertain to his information needs.

The proposed query construction service extends the functionality of the traditional search box and acts as an intermediate layer between searchers and large-scale web search engines. Specifically, the service’s search box incorporates auto-suggest functionality that offers query suggestions based on the semantic information of an underlying ontology. Upon selecting a suggested query, the user is provided with information about the semantics of his selected query. Semantic information is visualized as a conceptual ontology whose nodes represent concepts and whose labeled links represent the semantic relations that connect concepts together. By traversing the ontology, the user can improve his query and submit it for search. The ontology contains information deriving mainly from Wikipedia¹ and is exposed to the searcher through an interactive, ontology-browsing GUI.

The rest of the paper is organized as follows. We begin our discussion with an overview of the different search modes that web users employ, we present the main difficulties associated with querying the web and we discuss a number of methods that have been proposed for addressing such difficulties.
In Section 3, we introduce our query construction service. In Section 4, we give several snapshots of our service’s GUI in order to illustrate the functionality and intuitiveness of our proposed method. Section 5 discusses the proposed approach and Section 6 provides an evaluation for assessing the usefulness of the proposed service in the web search process. In Section 7, we outline the main differences between our study and related works and we conclude the paper in Section 8.

¹ www.wikipedia.org/

2. Preliminaries and Related Work

In this section, we describe the different search strategies that users employ when querying the web, in order to illustrate how the different querying behaviors affect both the engines’ retrieval performance and the users’ search experience. Then, we outline the current search paradigm that search engines support in order to address the corresponding difficulties. Finally, we discuss the different methods that have been proposed for helping web information seekers specify good queries.

2.1 Web Querying Behaviors

It is common knowledge that searchers do not employ a standard behavior when querying the web. This is essentially because people have different backgrounds and varying needs and thus make their query selections based on different criteria and underlying knowledge. Currently, there exist a number of studies (e.g. [6], [14]) that try to elucidate the different search modes that web users employ. In this direction, [13] identified four intersecting information seeking modes: (i) the known-item mode, (ii) the exploratory mode, (iii) the don’t know what I need to know mode and (iv) the re-finding mode. Given that the aim of our study is to help users formulate good queries in various information seeking modes, we rely upon the research suggested in [13].

In particular, the known-item search mode applies when the user has a specific information need and is capable of picking the right keywords for specifying his query. Under this mode, any difficulties that search engines encounter in answering known-item queries emerge from the intrinsic nature of natural languages, as we discuss next.

The exploratory search mode is employed when the user has a specific information need but is not sure how to express it in a set of keywords. Under this search mode, the challenge that search engines encounter is how to assist users in formulating intention-descriptive queries.

The don’t know what I need to know mode refers to the situation where a user submits a query without a specific goal in mind. Such searches might occur in complex or unknown domains (e.g. legal, medical) as well as when the user’s need is to get an update of what is on the web about his query. The paradox of this search mode is that neither the user nor the engine is able to resolve the intention of the query without the assistance of some external resource, e.g. pages retrieved for an initial query [10]. Thus, the greatest challenge is how to help users crystallize their search goals at query time.

Finally, the re-finding mode is encountered when the user queries the engine in order to find information that he has already seen in a previous search. Generally speaking, this fourth information seeking mode is usually addressed by personalized search, where users have to sign in to the personalized search engine. There is also the option of addressing this mode outside the search engine (e.g. through the web browser’s bookmarking functionality), but this will not concern us further in this paper. Having described the main information seeking modes that web users employ while interacting with large-scale web search engines, we now present the way in which search engines perceive and interpret search queries.

2.2 Search Engines’ Query Handling

Although information seekers experience different modes when querying the web, the main concern of large-scale search engines is to find the most efficient way to rank search results. However, without any help from the search engine at query construction time, searchers who do not know exactly what they are looking for and/or how to express it as a search query are most likely to issue a misleading query that will eventually result in the retrieval of perfectly ranked, irrelevant documents. To make things worse, queries might be ambiguous or polysemous, using identical terms to represent distinct information needs, which makes the retrieval of relevant data arduous. Evidently, as users become more dependent on web data to find information about a subject of interest, there is an ever-increasing need to equip search engines with modules that can assist information seekers in selecting queries that express their varying search intentions in a manner that the engine can distinguish.

2.3 Query Selection for Improved Searches

To address the difficulties that web users encounter when searching for information about a topic of interest, a number of techniques have emerged, such as search personalization, query refinement and relevance feedback. Search personalization is the process of incorporating information about the user’s needs in the query processing phase. One approach to personalization is to have users describe their general search interests, which are stored as personal profiles [12]. Another approach employs relevance feedback. Relevance feedback dictates that queries are reformulated based on previously retrieved relevant and non-relevant information resources [7]. This technique provides a controlled query alteration process that is designed to emphasize some terms and to deemphasize others, as required in particular search environments. However, this approach cannot be easily applied to large-scale web search engines, where authentication is difficult to impose and diversity prevails. Moreover, as noted in [9], it is overambitious to expect searchers to voluntarily provide feedback to the overall information seeking process without proper motivation. Even in the case of automatic (blind) relevance feedback, where terms from the top few information resources returned are automatically fed back into the query [21], success is by no means self-evident.

In an effort to incorporate personalization functionality within their popular, large-scale web search engines, Google’s search wiki² and Yahoo’s SearchMonkey³ approaches take advantage of their email services for authenticating their users and consequently log their personal search tactics (e.g. based on the analysis of past clickthrough data [15]). They both employ auto-suggest functionality within the search box and they both anticipate explicit feedback from the searchers during their searching process. Upon addressing a query to the search engine, Google’s approach presents a list of pairs. Each pair contains a suggested query constructed by the search engine together with the corresponding first result. This way, users have the chance to disambiguate their initial query by choosing the suggestion that best matches their information needs. In contrast, Yahoo’s approach requires explicit feedback from the searchers during their searching process. Whether these approaches give web searchers enough motivation to spend extra time providing such feedback remains to be seen.

Another approach in this direction dictates the refinement of user queries with semantically related terms [4]. Most of the efforts in this direction concentrate on the disambiguation of the query terms based on either local (i.e. result sets) or global (usually ontologies expressed as thesauri [16]) document analysis. However, when it comes to large-scale web search engines, the utilization of ontologies in query construction methods is difficult for three reasons [2]: (i) integration is extremely hard, (ii) the web imposes scalability and performance restrictions and (iii) there is a cultural divide between the semantic web and information retrieval disciplines. Having the above remarks in mind, we introduce an ontology-based query construction service suitable for large-scale web search engines. The ontology models machine-readable, semantic information provided by DBpedia [1]. The details of the proposed service are given next.

3. Query Construction Service

The approach we adopt towards building a query construction service makes use of the collaborative knowledge accumulated mainly in Wikipedia and made readily available through DBpedia. Serving as an application for the semantic web, the proposed approach provides an interactive GUI that seamlessly integrates the knowledge provided by web users with large-scale web search engines. Such knowledge is modeled in a carefully designed, low-complexity and extensible ontology that supports basic reasoning against a large volume of asserted and/or inferred facts. Basic reasoning refers to the situation where the possible inferences that are required from the ontology are known in advance, since they refer to answers to a finite set of GUI-driven questions. This way, the underlying ontology is serialized in a way that promotes performance and scalability. On the other hand, expressivity is reduced. Nevertheless, it is our belief that this is a fair tradeoff, considering that the proposed approach depends heavily on performance and scalability.

3.1 Harvesting Data from Wikipedia

There exist several studies that rely upon the DBpedia datasets for building highly expressive ontologies via the combination of Wikipedia and WordNet⁴. The two most widely known resources that have emerged from such efforts are the Kylin Ontology Generator (KOG) [19] and the YAGO ontology [17]. In addition, [18] studied how to transform Wikipedia into a giant knowledge base of semantic triples that can be used for faceted browsing. Based on the findings of the above studies, we encapsulate the knowledge available in DBpedia into a query construction service that acts as a mediator between searchers and large-scale web search engines. Our service aims at assisting both users and engines in performing successful web retrieval tasks.

Our service consists of a client and a server. The client utilizes Ajax technology and augments user-typed keywords with information from DBpedia. The server is written in Python and uses the Twisted HTTP server framework⁵ to fulfill automated client-side requests by retrieving data related to the user’s needs from the underlying DBpedia-based ontology. Retrieval queries are eventually transformed into relational SQL SELECT statements, since the ontology is stored in a MySQL database.

² Google’s search wiki, http://www.google.com/psearch
³ SearchMonkey, http://developer.yahoo.com/searchmonkey
⁴ http://wordnet.princeton.edu/
⁵ http://twistedmatrix.com
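The statement in Section 3.1 that retrieval requests end up as SQL SELECT statements can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the table and column names, the helper function names and the toy rows are hypothetical, and SQLite (from the Python standard library) stands in for the MySQL database that the service actually uses.

```python
import sqlite3

# Hypothetical relational layout for the ontology. The paper only states that
# the ontology is stored in MySQL and queried through SELECT statements.
SCHEMA = """
CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE in_category (article_id INTEGER, category_id INTEGER);
CREATE TABLE disambiguated_by (article_id INTEGER, target_id INTEGER);
"""


def build_demo_db():
    """Create an in-memory database filled with invented toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [(1, "Jaguar"), (2, "Jaguar Cars"), (3, "Jaguar (animal)")],
    )
    conn.executemany(
        "INSERT INTO categories VALUES (?, ?)",
        [(1, "Car manufacturers"), (2, "Felines")],
    )
    conn.executemany("INSERT INTO in_category VALUES (?, ?)", [(2, 1), (3, 2)])
    conn.executemany("INSERT INTO disambiguated_by VALUES (?, ?)", [(1, 2), (1, 3)])
    return conn


def disambiguations_of(conn, title):
    """Titles of the articles that disambiguate the given article title."""
    rows = conn.execute(
        """SELECT t.title
           FROM articles a
           JOIN disambiguated_by d ON d.article_id = a.id
           JOIN articles t ON t.id = d.target_id
           WHERE a.title = ?
           ORDER BY t.title""",
        (title,),
    )
    return [row[0] for row in rows]


def categories_of(conn, title):
    """Names of the categories that the given article belongs to."""
    rows = conn.execute(
        """SELECT c.name
           FROM articles a
           JOIN in_category ic ON ic.article_id = a.id
           JOIN categories c ON c.id = ic.category_id
           WHERE a.title = ?
           ORDER BY c.name""",
        (title,),
    )
    return [row[0] for row in rows]
```

Because the set of GUI-driven questions is finite, each question can map to one parameterized statement of this kind, which is one plausible reading of the "basic reasoning" tradeoff described in Section 3.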

Table 1. Statistics on the Wikipedia harvested data (collection period: January 2009)

Dataset                               Value
Wikipedia articles                    2,866,994
Disambiguation entries                226,978
Categories                            339,112
WordNet classes                       124
Articles linked to WordNet classes    497,797
Infobox records                       19,230,789

So far, our service utilizes the following DBpedia datasets⁶: (i) the Wikipedia articles, (ii) the list of disambiguations that Wikipedia encodes for connecting generic articles to their specific interpretations, (iii) the categories under which the Wikipedia articles are classified, (iv) the WordNet classes to which Wikipedia articles correspond and which mainly pertain to the synset name that represents the entities’ corresponding properties and features and (v) the articles’ infobox datasets that contain semantically rich key-value pairs of properties about the considered articles. Table 1 summarizes the DBpedia data that we explored in our work. We organized these datasets into an ontological scheme as shown in Fig. 1.

Figure 1. Ontology Schema. (Diagram: the Articles class is linked to Categories via in_category, to Wordnets via of_wordnet and to itself via disambiguated_by; infobox properties 1…n point to infobox values.)

The class Articles contains all the Wikipedia article references organized as class instances. The class Categories is employed to host the respective categories of the Wikipedia articles. WordNet classes store all the possible types of the article entities. The article disambiguations are expressed as a reflexive ‘disambiguated_by’ relation, which has the Articles class as both domain and range. In a similar manner, the ‘in_category’ relation is employed to link articles to the categories they belong to. According to DBpedia, there exist 7,721,468 in_category relations that connect articles with their respective categories. Furthermore, the articles that are associated with WordNet classes are linked to their respective entity type encoded in WordNet via the ‘of_wordnet’ relation. Finally, the Wikipedia infoboxes of name-value pairs of properties are ontologically expressed as datatype properties of their corresponding article instances (sketched as dotted arrows in Fig. 1). Based on the above elements, we developed an ontology that we incorporated into our query construction service in the hope of helping users decipher the semantic orientations of their candidate queries before these are actually addressed to the search engine. Nevertheless, our ontology construction process is not limited to the DBpedia data, but rather it can be fruitfully employed for any other semantic resource that is available in machine-readable format.

⁶ http://wiki.dbpedia.org/Downloads32

4. GUI for Structuring Queries

In this section we turn our discussion to the description of the GUI, which extends the traditional search box by providing the opportunity to view the semantics of the typed keywords and/or alternative formulations of the intended searches. The principle behind the GUI design is that it should be interactive, inductive, easy to use and fast to execute. Having such requirements in mind and based on the work of [5] on ontology visualization, we built the query formulation box illustrated in Fig. 2. This box enables users to type their search terms and receive in response a set of alternative query wordings. In particular, upon typing a few characters of a search query, the box suggests a number of strings that can be attached to the typed tokens in order to complete them. The autocomplete suggestions are drawn from the titles of the Wikipedia articles that our service encapsulates.

Figure 2. Auto-suggestions for query construction.
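The auto-suggest step, in which typed characters are completed with Wikipedia article titles, can be sketched as a prefix lookup over a sorted list of titles. This is only an illustration under stated assumptions: the paper does not describe how the lookup is implemented, and the function names, the case-insensitive matching and the `limit` parameter are hypothetical.

```python
from bisect import bisect_left


def build_index(titles):
    """Sort (case-folded key, original title) pairs once for binary search."""
    return sorted((title.casefold(), title) for title in set(titles))


def suggest(index, typed, limit=10):
    """Return up to `limit` titles that start with the typed characters."""
    prefix = typed.casefold()
    if not prefix:
        return []
    # A 1-tuple sorts before every 2-tuple that shares its first element,
    # so this finds the first key that is >= the prefix.
    start = bisect_left(index, (prefix,))
    matches = []
    for key, title in index[start:]:
        if not key.startswith(prefix) or len(matches) == limit:
            break
        matches.append(title)
    return matches
```

For example, with an index built from the invented titles "Jaguar", "Jaguar Cars" and "Java", typing "jag" would return the first two titles. Keeping the titles sorted means each keystroke costs a binary search plus a short scan, which fits the requirement that the GUI be fast to execute.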

If the user does not wish to employ any of the suggested query alternatives, he can ignore the suggestions and search with his self-selected keywords. On the other hand, if the user selects a suggested alternative, an HTTP GET request is addressed to the server in order to extract semantic information for the selected query suggestion. Semantic information pertains to (i) the query disambiguations (possibly) grouped by the WordNet classes to which disambiguations belong, (ii) the Wikipedia categories associated with each of the suggestions and (iii) key-value pairs harvested from the Wikipedia infoboxes.

Query Disambiguation. Query disambiguation is performed in one or two steps. At first, upon selecting the ‘disambiguated by’ box, the user receives a list of all the corresponding disambiguations that match his selected suggestion (Fig. 3a). Such disambiguations may be grouped by a WordNet class, provided they share a common WordNet meaning. In that case, upon selecting the corresponding WordNet label, a second-level disambiguation list appears.
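The one- or two-step disambiguation list can be sketched as a grouping step over the disambiguations returned by the server. The sketch rests on assumptions that go beyond the text: the function name and input shapes are hypothetical, and the rule that a WordNet class needs at least two members before it forms a second-level list is one interpretation of "share a common WordNet meaning".

```python
from collections import Counter


def group_disambiguations(disambiguations, wordnet_class_of):
    """Split disambiguations into a first-level list and second-level groups.

    `disambiguations` is a list of article titles and `wordnet_class_of`
    maps some of those titles to a WordNet class label.
    """
    counts = Counter(
        wordnet_class_of[title]
        for title in disambiguations
        if title in wordnet_class_of
    )
    first_level, groups = [], {}
    for title in disambiguations:
        cls = wordnet_class_of.get(title)
        if cls is not None and counts[cls] > 1:
            # Shared class: the title appears under the WordNet label.
            groups.setdefault(cls, []).append(title)
        else:
            # No class, or a class of its own: shown directly at first level.
            first_level.append(title)
    return first_level, groups
```

In the GUI, each key of `groups` would correspond to a WordNet label in the first-level list, and selecting that label would open the second-level list holding its members.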

Figure 3a. Query Disambiguations.

By selecting either one of the first- or second-level disambiguations, a new box containing the disambiguated entity is sketched at the right (Fig. 3b), which is connected to the previous box with a line labeled ‘disambiguated by’. Simultaneously, a search query that consists of keywords deriving from the two box titles (elimination of duplicates is applied) is addressed to the underlying large-scale search engine.

Figure 3b. Selected Query Disambiguations.

Categories. In case the selected suggestion (from the search box) is associated with Wikipedia categories, the label ‘in category’ appears on the interface and a similar process is initiated. The searcher is prompted to select the ‘in category’ relation in order to find the category that best matches his intended query semantics. Upon category selection, a new box containing the selected category is sketched at the right of the interface and is connected to the previous box with a line labeled ‘in category’ (Fig. 4).

Figure 4. Selected Category for the Disambiguated Query.

Simultaneously, a search query that consists of keywords deriving from the two box titles (elimination of duplicates is applied) is addressed to the underlying search engine.

Infoboxes. Finally, if the selected suggestion (from the search box) is associated with Wikipedia infoboxes, these are displayed as labels beneath the query inside the box. The user is prompted to select a key in order to obtain the corresponding values (Fig. 5). Upon the selection of an infobox value, a new box containing the selection is sketched at the right and connected to the previously selected box using a line labeled with the key name of the selection. At the same time, a search query consisting of the keywords deriving from the two box titles is addressed to the underlying search engine.

Figure 5. Infobox for the Disambiguated Query.

Each sketched box corresponds through its title to a part of the resulting query. The user decides which boxes will eventually participate in the search query by clicking on the checkboxes that reside on top of each box (see Fig. 6).

Figure 6. Selecting query terms.

Based on the above query construction steps, we provide the user with information for determining and accordingly expressing the semantic orientation of his queries, before these are submitted for search.
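The way box titles turn into a search query can be sketched as follows. The sketch assumes details that the paper leaves open: the `Box` class and `build_query` function are hypothetical names, and duplicate elimination is taken to mean case-insensitive, word-level deduplication that keeps the first occurrence.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """One sketched box: its title and the state of its checkbox."""
    title: str
    checked: bool = True


def build_query(boxes):
    """Join the words of all checked box titles, dropping repeated words."""
    seen = set()
    words = []
    for box in boxes:
        if not box.checked:
            continue
        for word in box.title.split():
            key = word.casefold()
            if key not in seen:
                seen.add(key)
                words.append(word)
    return " ".join(words)
```

With the invented boxes "Jaguar" and "Jaguar Cars" both checked, the resulting query would be "Jaguar Cars"; unchecking a box simply removes its words from the query.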

5. Discussion

As final notes, we should firstly underline the fact that the searcher is always in control of the query construction process. If he has difficulty in using the service, the overall search process does not break down; the user may proceed as usual and address the query to the search engine. Moreover, by employing the proposed service, the searcher is instantly acquainted with query terms that would otherwise take him a lot of time to gather by exhaustively running through the search results of an initially vague query. Additionally, the provided functionality is smoothly integrated into the traditional search engine’s GUI, since it occupies just a small portion of the screen on top of the search box, thus leaving plenty of room for the display of search results. Furthermore, the simplicity of the underlying architecture not only renders the proposed service scalable to future enhancements with more semantically rich datasets, but also guarantees its rapid execution time. The above features are very important for large-scale web search engines, where time and space play a crucial role in their prosperity. The employment of common web widgets such as the auto-suggest box and clickable divisions (<div>), as well as the absence of semantic web terminology from the GUI, renders the proposed service fast to learn and easy to use.

Finally, we should mention that if there is no information in Wikipedia about the user-typed terms, the query is transparently forwarded to the search engine that our system integrates. Therefore, the worst-case scenario in the search process is that the user does not get any help from the service, but his query is still automatically submitted to the underlying engine for search. Our query construction service has so far been integrated with two major web search engines (Google and Yahoo) and can be accessed online⁷. Thus, we believe that the integration is feasible for any search engine that gives programmable access to its search box.
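The worst-case behavior described above, where a query with no Wikipedia match is forwarded unchanged, can be sketched as a small dispatch step. All names here are hypothetical, the `lookup_suggestions` callable stands for whatever ontology lookup the server performs, and `https://search.example/search` with a `q` parameter is a placeholder rather than the address of any real engine.

```python
from urllib.parse import urlencode


def search_url(query, base="https://search.example/search"):
    """Build the URL that submits `query` to a (placeholder) search engine."""
    return f"{base}?{urlencode({'q': query})}"


def dispatch(typed, lookup_suggestions):
    """Offer assistance when the ontology knows the terms, else forward."""
    suggestions = lookup_suggestions(typed)
    if not suggestions:
        # Nothing found in the ontology: the user gets no help, but the
        # query still reaches the underlying engine unchanged.
        return {"action": "forward", "url": search_url(typed)}
    return {"action": "assist", "suggestions": suggestions}
```

The point of the sketch is that the service never blocks the search path: the "forward" branch is always available, so the service can only add to what the plain search box already offers.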

6. Evaluation

In this section, we discuss the details of an evaluation we carried out in order to validate the impact of our service on the web searching process. For this purpose, we developed a simple, fast-to-answer online questionnaire and distributed it (through social networking sites and mailing lists) to people who spend considerable time online. Their specific demographic features (e.g. age, sex, occupation) were not recorded during the evaluation process. Eventually, we received 72 responses, allowing us to obtain a concrete picture of the impact of the service on a specific group of users, consisting of web searchers who employ large-scale web search engines as part of their everyday habits. The overall impact of the proposed service on the entire web population can only be assessed through the adoption of the service by a large-scale web search engine.

6.1 Evaluation process

Each participant was asked to initiate a search session with a large-scale search engine through the employment of the proposed service. Afterwards, he was asked to fill in the provided questionnaire, which consisted of seven closed-type questions. The questionnaire together with the statistics of the survey are available online⁸.

⁷ http://195.251.111.53/snh/entry/index.html for Google and http://195.251.111.53/snh/entry/index2.html for Yahoo.
⁸ http://195.251.111.53/snh/entry/survey_results.html

We asked the participants to answer the questionnaire for each of their searches, in order to record (i) the type of searches performed, (ii) the query refinements, (iii) our service’s contribution to the query refinement process, (iv) the user satisfaction with the search results and (v) our service’s usability.

6.2 Evaluation results

Based on the statistics of our data, we observe that in 87.3% of the searches users modified their initial queries with alternative keyword formulations and that the majority of the query modifications (i.e. 77.4%) were assisted by our service. The first question attempts to identify the information seeking mode (defined by Spencer [13]) of the searcher prior to the initiation of the searching process. According to the results of the survey, 18 of the searches belong to the ‘known-item’ mode, 36 belong to the ‘exploratory’ mode and 18 belong to the ‘don’t know what I need to know’ mode. In order to assess the impact of the service on each mode, we grouped together the remaining questions per mode.

Known-item searches. Of the 18 searches adhering to this mode, the initial query was altered in half of them (9). Of these, the proposed service was employed in 5 searches with balanced satisfaction: on a five-point scale, 2 participants answered that their information needs were not satisfied, 1 participant answered that his satisfaction was neutral and 2 participants answered that their information needs were satisfied. Satisfaction was also balanced in the remaining searches where the proposed service was not employed: 3 participants answered that their satisfaction was neutral and 1 participant answered that his information needs were satisfied. From the above results, we can conclude that the employment of the proposed service for the known-item information seeking mode did not significantly influence user satisfaction.

Exploratory searches. Of the 36 searches adhering to this mode, the initial query was altered in all of them. Only one participant did not employ the service to refine his query. The proposed service was exclusively employed in 30 searches with increased satisfaction: on a five-point scale, 3 participants answered that their information needs were not satisfied, 7 participants answered that their satisfaction was neutral, 15 participants answered that their information needs were satisfied and 5 participants answered that their information needs were greatly satisfied. From the above results, we can conclude that the employment of the proposed service greatly influenced user satisfaction. Finally, the most popular option to refine the initial query through the employment of the proposed service was the ‘in category’ choice (26 participants) followed by the ‘clarified by’ choice (6 participants).

‘Don’t know what I need to know’ searches. Of the 18 searches adhering to this mode, the initial query was altered in nearly all of them (17). The traditional way of refining the initial query (i.e. by exhaustively running through search results) was employed in only 2 searches. The proposed service was exclusively employed in 13 searches with increased satisfaction: on a five-point scale, 1 participant answered that his information needs were not satisfied at all, 1 participant answered that his satisfaction was neutral, 9 participants answered that their information needs were satisfied and 2 participants answered that their information needs were greatly satisfied. From the above results, we can conclude that the employment of the proposed service greatly influenced user satisfaction. Finally, the most popular option to refine the initial query through the employment of the proposed service was the ‘in category’ choice (8 participants) followed by the ‘clarified by’ choice (5 participants).

Usability. With respect to usability, the answers (on a five-point scale) that our participants gave to the question “Do you think that the provided service was easy to use?” are quite encouraging: 4.17% said that they completely disagree, 11.11% said that they were neutral, 51.39% said that they agree and 18.06% said that they completely agree. Results demonstrate that our service has significant potential in helping web users make informed query selections and thus improve their web search experience. Our service’s impact is especially pronounced for searches pertaining to vague information needs for which the user is unable to pick the best keywords for verbalizing his search intentions.
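To make the per-mode comparison easier to follow, the satisfaction counts reported in Section 6.2 for searches that employed the service can be tallied as shares of positive answers. This is a descriptive sketch only, not part of the original evaluation: the counts are copied from the text, while the grouping of "satisfied" and "greatly satisfied" into one positive share and the names `RESPONSES` and `positive_share` are assumptions made here. No statistical test is implied, and the samples are small.

```python
# Answer counts per mode, copied from Section 6.2 (searches using the service).
RESPONSES = {
    "known-item": {"not satisfied": 2, "neutral": 1, "satisfied": 2},
    "exploratory": {
        "not satisfied": 3,
        "neutral": 7,
        "satisfied": 15,
        "greatly satisfied": 5,
    },
    "don't know what I need to know": {
        "not satisfied at all": 1,
        "neutral": 1,
        "satisfied": 9,
        "greatly satisfied": 2,
    },
}

# Assumption: these two answers are counted together as a positive outcome.
POSITIVE = {"satisfied", "greatly satisfied"}


def positive_share(counts):
    """Return (positive answers, total answers, percentage of positives)."""
    total = sum(counts.values())
    positive = sum(n for label, n in counts.items() if label in POSITIVE)
    return positive, total, round(100 * positive / total, 1)


for mode, counts in RESPONSES.items():
    positive, total, pct = positive_share(counts)
    print(f"{mode}: {positive}/{total} positive ({pct}%)")
```

Under this grouping the shares are 2 of 5 (40.0%) for known-item, 20 of 30 (66.7%) for exploratory and 11 of 13 (84.6%) for ‘don’t know what I need to know’ searches, which matches the qualitative reading given in the text.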

7. Comparison with Relevant Works

The idea of using external resources for improving retrieval effectiveness is not new. As a matter of fact, many works suggest the utilization of large corpora [8] or ontologies [3] for making rich connections between user queries and document collections in the hope of improving the accuracy of search results. Recently, researchers suggested that Wikipedia may serve as the external resource to support query formulation. In this respect, Wikipedia-based querying methods have been reported in [10] and [11]. Here, we briefly discuss those approaches in order to underline their commonalities with and differences from our work.

Specifically, in [11] a query expansion method based on Wikipedia articles is introduced, which aims at improving retrieval effectiveness for weak queries for which pseudo-relevance feedback is not helpful. Despite the commonality of using Wikipedia for empowering web information retrieval, our work differs from [11] in the following respects. First and foremost, our aim is not to automatically expand the user queries with semantic information harvested from Wikipedia, but rather to inform users about the different interpretations of their search keywords and thus enable them to make informed query selections. Hence, under our model the user maintains control throughout his searches by picking the query terms himself, rather than letting the engine take over after issuing his initial query. Moreover, the work in [15] is intended for use in controlled repositories, not for integration with large-scale web search engines.

In the work of [10], the Koru interface is introduced, which offers WikiSauri, i.e. thesauri extracted from the Wikipedia articles, upon which users rely to find alternative formulations for their queries in order to satisfy their information needs. Although Koru shares both a common objective and approach with our query construction service, it differs from our work in four significant ways. Firstly, it is addressed to a digital library’s search engine, not a large-scale web search engine. Moreover, the enormous size of the information resources within large-scale web search engines, together with their highly dynamic and complex nature, demands that the query interface be completely decoupled from the underlying information resources in order to avoid scalability and performance problems [2]. This is not the case in Koru, where query terms are ranked according to their relevance to the document collection. Additionally, our proposed GUI employs commonly used widgets and thus entails a reduced learning curve compared to Koru. Finally, the contribution of our service was evaluated against a real web dataset originating from the Google search engine. Conversely, Koru was evaluated against a TREC document collection in which the query-relevant documents were already known.

8. Conclusions In this paper, we introduced a novel query construction service that has been seamlessly integrated with large-scale web search engines in an attempt to assist information seekers in specifying precise and unambiguous queries. The proposed service relies upon the semantic information stored in DBpedia, which it organizes into an ontology, and enables users to understand the semantic orientation of their search keywords before these are actually issued for search. To demonstrate the provided functionality, we have implemented an interactive, non-intrusive and easy-to-use query construction service via which users obtain information about the semantics of their search terms as well as alternative wordings for verbalizing their search intentions.

Semantic information is gradually provided to the users upon request and helps them crystallize their search pursuits progressively. The evaluation of the proposed service produced highly encouraging results, considering the fact that participants were not specifically instructed how to use the service. Instead, they were provided with a URL pointing to the service. Both the how-to instructions and the questionnaire were available directly from the homepage of the proposed service. Moreover, the fact that only 11.11% of the participants did not make use of the provided suggestions for refining their initial query indicates that, although Wikipedia contains only about 3,000,000 articles, this number is sufficient to address the majority of everyday queries to large-scale search engines. Of course, this holds only for the part of the web population represented by the specific group of users that participated in the survey. Future work is underway in the direction of extending the underlying ontology with more datasets from DBpedia, thus providing more options to web searchers wishing to address semantically rich queries to large-scale web search engines.

9. References
[1] Auer, S., Bizer, C., Lehmann, J., Kobilarov, G., Cyganiak, R. and Ives, Z. 2007. DBpedia: a nucleus for a web of open data. In Proc. of the 6th Intl. Semantic Web Conference.
[2] Baeza-Yates, R., Ciaramita, M., Mika, P. and Zaragoza, H. 2008. Towards semantic search, natural language and information systems. In Springer Lecture Notes in Computer Science, vol. 5039, pp. 4-11.
[3] Bhogal, J., Macfarlane, A. and Smith, P. 2007. A review of ontology based query expansion. In Information Processing and Management, vol. 43, no. 4, pp. 866-886.
[4] Billerbeck, B., Scholer, F., Williams, H.E. and Zobel, J. 2003. Query expansion using associated queries. In Proceedings of the ACM CIKM Conference.
[5] Papadakis, I. and Stefanidakis, M. 2008. Visualizing ontologies on the web. In New Directions in Intelligent Interactive Multimedia, Studies in Computational Intelligence, vol. 142, Springer Verlag, pp. 303-311.
[6] Broder, A. 2002. A taxonomy of web search. In SIGIR Forum, 36(2).
[7] Salton, G. and Buckley, C. 1990. Improving retrieval performance by relevance feedback. In Journal of the American Society for Information Science, 41(4), pp. 288-297.
[8] Diaz, F. and Metzler, D. 2006. Improving the estimation of relevance models using large external corpora. In Proceedings of the SIGIR Conference, pp. 154-161.
[9] Bradley, P. 2008. Human-powered search engines: an overview and roundup. In Ariadne, 54. Available at: http://www.ariadne.ac.uk/issue54/searchengines/
[10] Milne, D.N., Witten, I.H. and Nichols, D. 2007. A knowledge-based search engine powered by Wikipedia. In Proc. of the 16th Conf. on Knowledge Management, pp. 445-454.
[11] Li, Y., Luk, R.W.P., Ho, E.K.S. and Chung, F.L. 2007. Improving weak ad-hoc queries using Wikipedia as external corpus. In Proceedings of the SIGIR Conference, pp. 797-798.
[12] Pazzani, M., Muramatsu, J. and Billsus, D. 1996. Syskill & Webert: identifying interesting web sites. In Proceedings of the 13th National Conference on Artificial Intelligence, pp. 54-61.
[13] Spencer, D. 2006. Four modes of seeking information and how to design for them. Boxes and Arrows. Available at: http://www.boxesandarrows.com/view/four_modes_of_seeking_information_and_how_to_design_for_them
[14] Spink, A., Park, M., Jansen, B. and Pedersen, O. 2006. Multitasking during web search session. In Information Processing and Management, 42(5), pp. 1366-1378.
[15] Stamou, S. and Ntoulas, A. 2009. Search personalization through query and page topical analysis. In User Modeling and User Adapted Interaction, vol. 19, no.12, pp. 5-33.
[16] Stamou, S., Kozanidis, L., Tzekou, P. and Zotos, N. 2009. Ontology-driven personalized query refinement. In the Journal of Web Engineering, vol. 8, no. 2, pp. 113-153.
[17] Suchanek, F.M., Kasneci, G. and Weikum, G. 2007. YAGO: a core of semantic knowledge-unifying WordNet and Wikipedia. In Proc. of the WWW Conference, pp. 697-706.
[18] Weld, D.S., Wu, F., Adar, E., Amershi, S., Fogarty, J., Hoffmann, R., Patel, K. and Skinner, M. 2008. Intelligence in Wikipedia. In Proc. of the 23rd AAAI Conference.
[19] Wu, F. and Weld, D.S. 2008. Automatically refining the Wikipedia infobox ontology. In Proceedings of the 17th Intl. World Wide Web Conference, pp. 635-644.
[20] Sun, J.T. 2005. CubeSVD: a novel approach to personalized web search. In Proceedings of WWW '05, pp. 382-390.
[21] Billerbeck, B. and Zobel, J. 2004. Questioning query expansion: an examination of behaviour and parameters. In Proc. Australian Database Conf., pp. 69-76.