Ontology-based keyword search evaluation

Continuing with the DBpedia use case, we have focused on giving a proof of concept of the main entry point to our system: the keyword-based search on Linked Data. To do so, we use DBpedia’s SPARQL endpoint as the Linked Data endpoint, and three different article taxonomies to focus the searches on different domains. Please note that we are evaluating only the keyword search, not the refinement step: we do not thoroughly define the objects of the search (the properties that allow our system to further refine the results as recommendations); instead, we assume for these examples that the user might be interested in anything. This does not restrict the flexibility of our system; it simply shows how it performs over the whole set of DBpedia’s articles.

In order to quantify the improvement of our ontology-guided keyword search with respect to Wikipedia's keyword search, we have prepared three taxonomies that focus on different knowledge domains. In addition, for each of these taxonomies we have a second version that we have pruned to avoid noise coming from unrelated data. In this way, our approach can focus only on the search domain we want to offer.
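As an illustration of how a taxonomy can restrict a keyword search, the following minimal sketch builds a SPARQL query that only matches articles belonging to a given set of categories. It is not the query used by our system: the function name `build_keyword_query`, the matching of keywords against `rdfs:label`, and the English-label filter are assumptions made for this example, and the keywords are assumed to be plain words without quotes. The sketch only builds the query string; it does not contact the endpoint.

```python
def build_keyword_query(keywords, category_uris, limit=10):
    """Build a SPARQL query (hypothetical helper) that matches every keyword
    in the label of articles belonging to one of the given categories."""
    # The taxonomy is passed as a list of category URIs (the search domain).
    values = " ".join(f"<{uri}>" for uri in category_uris)
    # One FILTER per keyword: all keywords must occur in the label.
    filters = "\n".join(
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{kw.lower()}"))'
        for kw in keywords
    )
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT ?article ?label WHERE {{
  VALUES ?category {{ {values} }}
  ?article dct:subject ?category ;
           rdfs:label ?label .
  FILTER(LANG(?label) = "en")
{filters}
}}
LIMIT {limit}"""


# Example with an illustrative category URI (not taken from our taxonomies).
query = build_keyword_query(
    ["web", "build"],
    ["http://dbpedia.org/resource/Category:Textile_industry"],
)
print(query)
```

Restricting `?category` to the taxonomy is what narrows the search domain here; a pruned taxonomy simply yields a shorter list of category URIs.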

We have performed a keyword search for each taxonomy and its pruned version, and have analyzed the results. The first table shows which categories have been pruned. On the one hand, we have pruned a set of categories, chosen by URI, together with their subcategories (the subcategories are omitted from the table for space reasons). On the other hand, we have applied an extra filtering by keyword: if a category URI contained any of the substrings detailed in the table, we pruned it. The resulting categories can be found at http://sid.cps.unizar.es/HybridKeywordSearch-data/categories/.
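The two pruning criteria described above can be sketched as follows. This is a minimal illustration under stated assumptions, not our actual pruning code: the taxonomy is assumed to be available as a mapping from each category to its direct subcategories, the function name `prune_taxonomy` is hypothetical, substring matching is assumed to be case-sensitive, and the category names in the example are invented.

```python
from collections import deque


def prune_taxonomy(children, root, pruned_uris, pruned_substrings):
    """Return the categories still reachable from `root` after pruning.

    `children` maps a category URI to its direct subcategories (assumed
    representation). A category is pruned if its URI is listed in
    `pruned_uris` or contains any of `pruned_substrings`.
    """
    def is_pruned(uri):
        return uri in pruned_uris or any(s in uri for s in pruned_substrings)

    kept = set()
    queue = deque([root])
    while queue:
        uri = queue.popleft()
        if uri in kept or is_pruned(uri):
            continue  # not expanding a pruned category drops its subtree too
        kept.add(uri)
        queue.extend(children.get(uri, []))
    return kept


# Invented example data, for illustration only.
children = {
    "Category:Industry": [
        "Category:Textile_industry",
        "Category:Companies",
        "Category:Software_industry",
    ],
    "Category:Textile_industry": ["Category:Weaving"],
    "Category:Companies": ["Category:Textile_companies"],
    "Category:Software_industry": ["Category:Web_development"],
}
kept = prune_taxonomy(
    children,
    root="Category:Industry",
    pruned_uris={"Category:Companies"},
    pruned_substrings=["Software"],
)
print(sorted(kept))
```

Note that in this sketch a subcategory of a pruned category is still kept if it can also be reached through another, non-pruned parent; whether that is desirable depends on how the taxonomy is meant to be used.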

For each result set, we have selected the first ten results provided by both the Wikipedia and the ontology-based keyword searches. We have compared them with each other taking the user’s intended meaning into account, highlighting the closest results for each search, as shown in the following table:

For the first search, the user inputs the keywords “web build” looking for information about the textile industry: how web is manufactured and how it can be used to make clothes. The chosen taxonomy is therefore based on Industry. We can see that Wikipedia provides only IT-related content, as it retrieves the results from the whole domain, where IT-related articles are the most popular. Our approach using the base taxonomy gives more accurate results. These results improve further when the search domain is defined more precisely in the pruned taxonomy search: most companies, persons and IT-related resources are no longer in the search domain.

For the second search, the user inputs “flame extinguisher” looking for flame extinguishing techniques and devices. The chosen search domain is Chemistry. The results retrieved by our system are related to the specified search domain and form a good set of results, while Wikipedia provides only the “Fire extinguisher” article; its other results are not really related to chemistry or to the given keywords.

Finally, the third example behaves similarly to the second one. The user, as in the paper, inputs the keywords “fish movement” looking for articles related to the mechanics of fish movement. The results retrieved by our system match the user’s needs more accurately, as our system searches only mechanics-related articles, while the Wikipedia results would fall outside his/her interests. Note that the highlighted results are not affected by using the pruned version: in this case, the benefits of using a smaller domain relate to efficiency (the fewer subjects have to be searched, the faster the search is). However, these results also suggest that our approach performs very well even with automatically extracted taxonomies, which facilitates the definition of search domains.

Looking at these results, the difference between the two search methods is clear. The improvement that our approach achieves is directly due to the definition of the search domain: it provides extra information for the query transparently to the user. Moreover, the effort of defining the search domain is worthwhile given the search precision that is achieved: even the automated version, that is, the version of the taxonomy that can be built automatically, obtains more focused results than the Wikipedia keyword search.