Exploring Exploratory Search: A User Study with Linked Semantic Data
Vania Dimitrova, Lydia Lau, Dhavalkumar Thakker, Fan Yang-Turner, Dimoklis Despotakis
School of Computing, University of Leeds, UK
{V.G.Dimitrova, L.M.S.Lau, D.Thakker, F.Yang-Turner, D.Despotakis}@leeds.ac.uk

ABSTRACT
The maturation of semantic technologies and the growing popularity of the Linked Open Data (LOD) cloud make it possible to expose linked semantic data sets to end users in order to empower a range of analytical tasks that take advantage of knowledge integration and semantic linking. Linked semantic data appears to offer great potential for exploratory search, which is open-ended, multi-faceted, and iterative in nature. However, there is limited insight into how browsing through linked semantic data sets can support exploratory search. This paper presents a user study with a uni-focal semantic browsing interface for exploratory search through several data sets linked via domain ontologies. The study, which is qualitative and exploratory in nature and uses music as an illustrative domain, examines (i) obstacles and challenges related to user exploratory search in LOD and (ii) the serendipitous learning effect and the role semantics plays in that. The approach and lessons learnt can benefit future human factor studies that evaluate interactive exploration of linked semantic data, and can make technology developers aware of issues that have to be addressed in order to facilitate exploratory search with LOD.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Metadata; Search Process; H.5.4 [Hypertext/Hypermedia]: Architectures; User Issues; Navigation

General Terms
Design, Experimentation, Human Factors

Keywords
Semantic Data Exploration, User Interaction, Linked Open Data, Exploratory Search

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Hypertext 2013, May 1–3, 2013, Paris, France. Copyright © 2013 ACM 978-1-4503-2006-1... $15.00.

1. INTRODUCTION
The Linking Open Data initiative has engaged various communities to share their data for sustainable usage based on semantic web technologies [8]. Increasingly, applications have been developed to make use of these available resources. Although the motivation behind them is to enable automated integration of data from different sources, it is humans who are the consumers of this Linked Data. Human usage is the ultimate judgment for the benefit of information integration and linking [3]. There are arguments that Linked Data can be utilised to enable user-oriented exploratory search systems for the future internet, e.g. [25]. Building on such arguments, the paper aims to examine user exploratory search behaviour when interacting with a semantic data browser over linked semantic data.

In contrast to regular search, exploratory search gives a more complete overview of a topic. Exploratory search is open-ended, multi-faceted, and iterative in nature, and is commonly used in scientific discovery, learning, and sense making [18, 27]. Exploration demands more time, effort and creativity from the user, but rewards the user with deeper knowledge [18]. Exploratory search is particularly beneficial for ill-structured problems and more open-ended goals, with persistent, opportunistic, iterative query processes. Exploratory tasks inherently have uncertainty, ambiguity and discovery as common aspects [18].

Linked semantic data appears to offer great potential for exploratory search. Earlier studies suggested that tags [13] or some form of presentation of the knowledge space structure [23] could benefit browsing and learning. The work presented here starts from these claims, and examines the role of semantic tags and their effect on browsing and learning in a class of applications called semantic data browsers. The paper presents a case study with a conventional semantic browsing interface in the music domain, providing a multi-faceted evaluation approach to examine exploratory search aspects.

The work presented here is conducted within two EU projects that study the application of interactive exploration of semantic linked data to facilitate sensemaking (Dicode¹) and learning (ImREAL²). We follow linked open data tenets and exploit available linked datasets in the application domains we experiment with.
A semantic data browser shell, called Pinta³, which provides uni-focal faceted exploration of linked semantic data, was developed. This paper uses an instantiation of Pinta in the music domain, MusicPinta, which links music datasets from the Web of data and social content from Amazon reviews about musical instruments. The main contribution of this paper to interactive exploration of semantic data is the multi-faceted evaluation study, which follows methodologies from the research community on exploratory search. The findings indicate that semantic facets support exploratory search and facilitate serendipitous learning, and point to issues that need further attention to enable exploratory search with LOD (e.g. avoiding ‘empty links’ and sensing the quality of information the user will see from a link).

¹ www.dicode-project.eu
² www.imreal-project.eu
³ Using an analogy with Christopher Columbus’ ship ‘La Pinta’; in our case, Pinta is a browser shell providing a means to explore through a vast amount of data.

2. RELATED WORK
Semantic data browsers have emerged from a collective effort in the semantic research community. Such browsers operate on semantically augmented data (e.g. tagged content) and lay out browsing trajectories using relationships in the underpinning ontologies. Tabulator [2] can be considered the first semantic data browser that enables users to browse data by following semantic links to resources. Two types of semantic data browsers have since emerged: (i) pivoting (or set-oriented browsing) and (ii) multi-pivoting. In a pivoting browser, a many-to-many graph browsing technique is used to help a user navigate from a set of instances in the graph through common links [22]. Exploration is often restricted to a single start point in the data and uses 'a resource at a time' to navigate anywhere in a dataset [1]. This form of browsing is also referred to as uni-focal browsing. A second type of browser supports multi-pivoting, which allows a user to start from multiple points of interest. For example, PolyZoom enables multi-focus exploration of maps, letting a user zoom into multiple parts of the map at the same time [12]. Different interfaces have also been proposed in recent work. Notable attempts are Parallax [11], VisiNav [7], FacetGraphs [9] and ICAW [24]. The paper presents a fairly traditional semantic data browser which provides a uni-focal interface for browsing through several linked semantic datasets. Our contribution to semantic data browsers is considering user behaviour and examining how semantic data browsers can support exploratory search.

Exploratory search has been trialed in different domains. Nguyen et al. [20] provided a web tool with a graphical interface for interactive knowledge exploration within the biomedical domain (which integrates the experimental data with the knowledge extracted from the PubMed articles).
An empirical study was conducted to examine the impact of browsing an ontology-driven information system on users’ ability to learn domain knowledge [10]. While the expected learning effect did not occur among differently trained participant groups, previous online search experience showed a positive correlation with participant performance. A mixed approach supporting exploratory search through a simple keyword-based search interface over diverse sources such as linked data, Wikipedia, Twitter, and Google News was provided in [19]. An application of a semantic data browser in learning was developed recently in the mEducator project [4]. Despite these initial prototypes, more case studies and experimentation are needed to identify which features in a semantic data browser may benefit or hinder the exploratory process. For this line of investigation, the experimental and methodological toolset developed by information retrieval and human-computer interaction research on exploratory search [21] can provide useful guidelines. Metrics that emerged from brainstorming and breakout sessions at a workshop on exploratory search [26] indicate how to assess the performance of exploratory search systems, considering: engagement and enjoyment, information novelty, task success, task time, learning and cognition. However, as raised by an evaluation of an exploratory search tool (CoSen) [23], when people want to change or grow their current knowledge structure, they may not be able to specify precisely what is needed. Not only is this structural-information need hard to express, there is also a lack of effective search mechanisms for finding appropriate structures. This supports the assumption that semantics may be able to fill this gap, and underpins the approach and the findings in this paper.

A key component of exploration is human learning, a topic studied extensively by cognitive psychologists [16]. An evaluation study of a faceted search interface [15] showed that facets played a major role in the browsing process, accounting for about half of the time spent looking at actual results. This underscores the importance of facets, which is adopted in the design of the semantic data browser presented in this paper. We further examine the role of semantic facets (facts, related terms, and content) in exploratory search tasks by looking into how users complete exploratory search tasks in an unfamiliar domain.

3. CASE STUDY: MUSICPINTA
We have developed a uni-focal semantic browsing interface, Pinta, for exploratory search through several data sets linked via domain ontologies. Pinta uses existing ontologies and performs semantic augmentation of text and metadata utilizing the General Architecture for Text Engineering (GATE⁴). The semantic augmentation produces annotated sets of extracted entities with offset, ontology URI and type information. The annotated sets are converted to RDF triples and stored in a semantic repository (using OWLIM⁵) which combines the functionality of an RDF-based DBMS and an inference engine. SPARQL queries (using the Sesame API⁶) over the semantic repository provide concept/content lookup functionalities to find related and relevant concept(s) or content(s). The main goal of Pinta is to enable users to easily tap into resources built from the Web and, in particular, to explore the use of the Linked Data paradigm.

The music domain has been selected for an instantiation of the browser, called MusicPinta. The Web of data is rich in music-related content: as of 2011, there were at least 13 datasets identified, with a diverse range of concepts covering instruments, performances, artists, and music genres. The data sets used in MusicPinta (see Figure 1) are:

• DBpedia: this includes the part about musical instruments and artists. This dataset is extracted from dbpedia.org/sparql using CONSTRUCT queries.
• DBTune: includes music-related structured data made available by DBTune.org in linked data fashion. Among the datasets on DBTune.org we utilise: (i) Jamendo - a large repository of Creative Commons licensed music; (ii) Megatune - an independent music label; and (iii) MusicBrainz - a community-maintained open source encyclopedia of music information.
• Amazon reviews of musical instruments shown in Pinta.
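The augmentation step described above (annotations carrying an offset, an ontology URI and a type, converted to RDF triples) can be illustrated with a minimal Python sketch. This is not Pinta's implementation: the `Annotation` class, the `http://example.org/pinta/` vocabulary and the `mentions`/`inDocument` predicates are hypothetical names chosen for illustration, and the output is restricted to URI-only N-Triples.

```python
from dataclasses import dataclass

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical vocabulary for illustration only; not Pinta's actual schema.
EX = "http://example.org/pinta/"


@dataclass(frozen=True)
class Annotation:
    """One extracted entity: character offsets, ontology URI and type."""
    start: int
    end: int
    entity_uri: str
    type_uri: str


def annotation_to_triples(doc_uri, ann):
    """Turn one annotation into (subject, predicate, object) triples."""
    # Identify the text span by its character offsets within the document.
    mention = f"{doc_uri}#char={ann.start},{ann.end}"
    return [
        (ann.entity_uri, RDF_TYPE, ann.type_uri),
        (mention, EX + "mentions", ann.entity_uri),
        (mention, EX + "inDocument", doc_uri),
    ]


def to_ntriples(triples):
    """Serialise URI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

For example, an annotation marking characters 10 to 18 of a review as the DBpedia entity for bouzouki yields three triples: one typing the entity and two tying the text span to the entity and to its source document. A repository with inference could then answer lookups from either the concept or the content side.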

All datasets, except the reviews, were available as RDF datasets and utilise the Music Ontology as schema. The Amazon reviews were converted to RDF using Pinta’s semantic augmentation. Table 1 presents the datasets that support exploration centred on instruments, artists, albums, tracks and records.

Table 1: Main topics and data sets in MusicPinta
Topic | Supported in Dataset(s)
Musical Instruments | MusicBrainz, DBpedia
Artists | MusicBrainz, DBpedia, Jamendo, Megatune
Albums/Tracks/Records | MusicBrainz, Jamendo, Megatune
Customer reviews | Amazon Reviews

⁴ http://gate.ac.uk/
⁵ http://www.ontotext.com/owlim
⁶ http://www.openrdf.org/doc/sesame2/api/

The datasets coming from DBTune.org (such as MusicBrainz, Jamendo and Megatune) already contain “sameAs” links between them for linking identical entities. We utilise the “sameAs” links provided by DBpedia to link the MusicBrainz and DBpedia datasets. This way, DBpedia is linked to the rest of the datasets from DBTune.org. This allows MusicPinta to benefit from the DBpedia abstract descriptions of entities and their images.
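The effect of chaining “sameAs” links across datasets can be sketched as follows. This is an illustrative union-find grouping, not a description of how OWLIM or MusicPinta resolves equivalences internally; the function name and the shortened identifiers in the usage note are assumptions made for the example.

```python
def same_as_clusters(links):
    """Group URIs connected by sameAs links into equivalence classes.

    `links` is an iterable of (uri_a, uri_b) pairs; sameAs is treated as
    symmetric and transitive, so chains of links merge into one cluster.
    """
    parent = {}

    def find(x):
        # Register unseen URIs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())
```

Given a link between a MusicBrainz and a Jamendo identifier, plus a DBpedia link to the same MusicBrainz identifier, all three end up in one cluster. This is why a single set of DBpedia-to-MusicBrainz links is enough to connect DBpedia abstracts and images to entities in the other DBTune.org datasets.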

4. EXPERIMENTAL STUDY
To gain insight into how MusicPinta can support exploratory search through linked semantic data, we conducted an experimental study following methodological recommendations for evaluating exploratory search systems [27]. The experimental study followed a within-subjects design with two exploratory search tasks as the within-subjects factor. To avoid bias from the ordering of tasks, participants were divided equally into two groups; one group performed task 1 then task 2, and the other group performed task 2 then task 1.
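The counterbalanced ordering can be sketched in a few lines of Python. The source states only that participants were divided equally between the two task orders; the random shuffle, the `seed` parameter and the participant labels below are assumptions added for illustration.

```python
import random


def counterbalance(participants, orders=(("task1", "task2"), ("task2", "task1")), seed=None):
    """Assign each participant a task order so that orders are balanced.

    Participants are shuffled (an assumption; the paper does not say how
    groups were formed) and then dealt alternately to the available orders.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}
```

With 12 participants and two orders, six participants receive each order, so any effect of doing a task first or second is spread evenly across both tasks.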

Figure 1: Linked data semantic datasets used in MusicPinta

DBpedia is also very rich in terms of categorisation of musical instruments. For example, the categories separate instruments according to their country of origin/use. The MusicPinta dataset has 2.4M entities and 19M triple statements, taking 2GB of physical space, including 876 musical instrument entities, 71k performances (albums, records, tracks) and 188k music artists. Figures 2 and 3 show examples of the user interface in MusicPinta.

Figure 2: Extract from the information about bouzouki - entry point for task 1 in the experimental study (Section 4)

Figure 3: Extract from the information about electric guitar - entry point for task 2 in the experimental study (Section 4)

Participants. The study involved 12 participants recruited on a voluntary basis (a compensation of £15 in Amazon vouchers was paid). Half of the participants were native English speakers and the other half spoke and communicated in English fluently. All participants had an IT background, good experience in web search and some experience in data analysis. Half of the participants visited sites with music information regularly, while the others did this only occasionally; 4 participants listened to online music sites daily. Half of the participants indicated that they currently practiced musical instruments (none of these were instruments they had to research in the study, see below).

Method. Each participant attended an individual session, conducted and observed by an experimenter and lasting about an hour, with:

• Pre-study questionnaire [5 min] - collecting information about the user profile and testing his/her domain awareness.
• Introduction to MusicPinta [10 min] - the participants followed a script which introduced the main features of the system using the instrument tenor saxophone as an example.
• Task 1 [20 min] - identify distinctive characteristics of the musical instrument bouzouki [15 min], after which complete a task difficulty questionnaire [5 min].
• Task 2 [20 min] - usage and features of the musical instrument electric guitar [15 min], after which complete a task difficulty questionnaire [5 min].
• Post-study questionnaire [10 min] - test again the participant’s domain awareness and gather usability feedback.
• Brief interview [5 min] - overall impression of MusicPinta.

Domain awareness test. Testing domain awareness is crucial for identifying whether there was any learning effect from the browsing behaviour. A free word association test, seen as a reliable measure of prior knowledge in reading comprehension studies [29], was used. We asked participants to write any words they associated with each of a list of 12 musical instruments: 4 instruments that participants could explore/learn about in task 1, 4 instruments that participants could explore/learn about in task 2, and 4 instruments unrelated to either task. This technique does not add noise to the experiment and does not influence the browsing behaviour.

Tasks. To design the study tasks, we have followed the main characteristics of exploratory search tasks summarised in [28]: the main goal is learning and/or investigation of a musical instrument; there is a low level of specificity about the information needed and how to find it; search is open ended, requires finding several items and involves a degree of uncertainty; tasks are ‘not too easy’ and include multiple facets. The study required participants to complete two tasks related to researching musical instruments and positioned within an advertisement scenario of a hypothetical UK music shop (see tables 2 and 3). In both tasks, the participants were given an entry point to the browser and asked to fill in their answers in a provided template (see tables 4 and 6).

Table 2: Task 1 Characteristics of a musical instrument
The music shop is extending its collection of instruments with international musical instruments. You work in an advertising agency which has been asked to prepare an advertisement script for some of the new instruments that will appear in the shop. A key part of the preparation of the advertisement script is the research of the product. You have been asked to conduct a research of one of the new instruments, called bouzouki, using the information available in MusicPinta. You have to identify:
• the main characteristics of bouzouki;
• up to five similar instruments to bouzouki;
• features that make bouzouki distinctive from the similar ones you have chosen.
Go to ‘Semantic Search’ in MusicPinta and type bouzouki. Browse the content and follow links. Complete the provided form.

The completion of Task 1 required mainly browsing through the musical instrument classification (in both DBTune and DBpedia) and reading descriptions provided from DBpedia. The task was analytical in nature, as users had to perform comparison and identification of distinctive features. In contrast, Task 2, which required browsing through content about music albums and artists, and reading through Amazon reviews, was more ambiguous and involved some creative thinking and imagination.

Table 3: Usage and features of a musical instrument
The music shop wants to increase the sales of its traditional musical instruments, such as electrical guitars. It intends to do this by adding links to creative commons album recordings with electric guitars, together with some interesting information about these albums to inspire customers to play/buy electric guitars or other musical instruments. Furthermore, when displaying its electric guitar items, the shop wants to highlight key features people look for when purchasing electric guitars. You are asked to conduct the research to address the above requirements by using information provided in MusicPinta. You have to review the information about electric guitar and identify:
• three interesting album recordings that include electric guitars and specify what is interesting;
• key features that people look for when purchasing an electric guitar.
Go to ‘Semantic Search’ in MusicPinta and type electric guitar. Browse the content and follow links. Complete the provided form.

Task difficulty. After each task, the users were asked to fill out a short questionnaire to rate their subjective level of cognitive load using a modified version of the NASA-TLX questionnaire [6]. In addition, the participants were asked to think aloud; the experimenter kept notes of any interesting comments made.

Data collected. The data collected in the study includes: (i) the forms with the participants’ outputs for tasks 1 and 2; (ii) the pre- and post-experiment questionnaires and word association tests; (iii) system log data; (iv) experimenter notes. The data was analysed using qualitative and quantitative methods (including non-parametric statistical tests); the results are presented below.

5. RESULTS
5.1 Task Success
Two musical instrument experts (one for Bouzouki, one for Electric guitar) have marked the outcome of participants for the two tasks. The marking is to measure how successful the participants have been in completing the tasks using MusicPinta.

Task 1: Bouzouki characteristics

Table 4: Sample answer to task 1 from a participant
Characteristics of bouzouki: Greek musical instrument, lute family, string instrument, nice looking, sharp sound, mandolin
Instruments similar to bouzouki:
Instrument | Similarity | Difference
Mandolin | Similarly looking | Bouzouki
Tambura | Kind of bouzouki | No information about tambura
Banjo | Plucked string instrument, lute | Banjo has 4 pairs of strings, Bouzouki can be with 3 or 4 pairs of strings, shape is different – bouzouki looks more elegant
Ukulele | Plucked string instrument, 4 pairs of strings | Ukulele is more like a guitar (looks a small guitar)
Balalaika | Plucked string instrument, folk instrument | Balalaika - is in Russia, shape is different, 3 strings
Sitar | Plucked string instrument, folk instrument | Indian, Pakistan, has many strings
Summary of distinctive features of bouzouki: Greek, plucked string instrument, 3 or 4 pairs of strings, elegant looking.

For task 1, participants produced a form with 3 sections: (i) characteristics of bouzouki; (ii) instruments similar to bouzouki; and (iii) summary of distinctive features of bouzouki (see Table 4 for a sample answer by a participant). The expert examined the information about bouzouki (using both the description and the semantics presented in MusicPinta) and identified its main characteristics. The participants’ answers were compared with the expert’s answer and scored according to the overlap. For similar instruments, the expert considered whether the instruments identified by the users were appropriate, taking into account the information shown in MusicPinta. For example, all similar instruments listed in Table 4, except ukulele, are marked as appropriate (ukulele is seen as inappropriate because its connection with bouzouki is via string instrument, which is seen as a rather generic link for this task). The similarity and difference specified by the user were scored for appropriateness, as was the summary of distinctive characteristics. The answer in Table 4 does not provide a sufficient description of the bouzouki’s difference from mandolin and tambura. The average score of similarity-difference was 70% (st dev 14). The percentage achieved by all participants on the different components in task 1 is given in Table 5. Altogether, the participants identified 44 characteristics (70%, individual score median 4) from the description section of bouzouki (including the picture), and 19 descriptions (30%, individual score median 1.5) from the semantic tags. The difference between the two sets is significant (Wilcoxon test, W=-60, p