VTT INFORMATION TECHNOLOGY RESEARCH REPORT TTE4-2005-14

RISE

Technology Report Version 1.2 (public)

9.8.2005 Asta Bäck, Tommo Reti, Janne Saarela, Risto Sarvas, Marko Turpeinen, Sari Vainikainen


Version history

Version  Date       Author(s)                Reviewer  Description
1.0      18.5.2005  AB, SV, MT, RS, TR, JS   CS        First version made available to the project steering group and partners.
1.1      20.5.2005  JS                       AB        Chapter 3.2 updated.
1.2      9.8.2005   RS                       AB        Chapter 2 updated.
                    AB                                 Abstract added, Preface updated. Public version of the report.

Contact information

Asta Bäck
VTT Information Technology
P.O. Box 1200, FIN-02044 VTT, Finland
Street address: Tekniikantie 4 B, Espoo
Tel. +358 9 4561, fax +358 9 456 7024
Email: [email protected]
Web: http://www.vtt.fi/tte/


Copyright © VTT Information Technology 2001. All rights reserved. The information in this document is subject to change without notice and does not represent a commitment on the part of VTT Information Technology. No part of this document may be reproduced without the permission of VTT Information Technology.


Key Words

media, metadata, metadata vocabularies, semantics, semantic web, ontologies

Abstract

This report presents the current state and future development potential of technologies that are relevant in developing new media products and services based on content that is described with semantically rich metadata. It has been written within the project "Rich semantic media for private and professional users". The project addresses two important trends and opportunities: utilising semantics in media products and combining commercial media content with user-created material.

The report starts with a presentation of the framework for the project and describes three key demonstration ideas. The demonstrations explore combining professional and private content and how this process benefits from semantic support. The main focus of the project and this report is on semantically supported media applications, but the report also briefly looks at the status of the Semantic Web in general and at what kinds of semantically supported applications are already available.

The big challenge in creating semantically supported media applications is creating the necessary infrastructure, i.e. ontologies and content with metadata. A formal language is needed for specifying the ontology, and there the recently approved Web Ontology Language (OWL) is an important step. However, there are still only a few practical applications, so it is still unclear how well this language can meet the requirements. A query language is another important component, and there SPARQL, a query language for RDF, is currently being defined.

Creating metadata manually is very laborious. Therefore, different ways of creating metadata must be utilised: metadata may be captured as a by-product of the production and consumption processes, it may be created explicitly, or it may be inferred from the content itself. Metadata creation is not in the focus of the project, but the main principles are briefly explained.

Several existing media-related vocabularies are presented in the report, as well as some publicly available ontologies. There is a lack of publicly available vocabularies and ontologies, and those that exist, such as IPTC, are often utilised only to a limited extent. The report also lists several projects that deal with related issues.


Preface

This report has been produced as part of the project "Rich Semantic Media for Private and Professional Users" (RISE). The project addresses two important trends and opportunities: utilising semantics in media products and combining commercial media content with user-created material. The role of this report is to gather and assess the status of the relevant technologies and trends. The report will be utilised in making decisions relating to building the prototypes, and it will be processed further into roadmaps.

The project is financed by TEKES, Alma Media Oyj, SanomaWSOY Oyj, YLE, Profium Oy, and VTT. The project steering group consists of the following persons: Keith Bonnici (TEKES), Eskoensio Pipatti (SanomaWSOY Oyj), Hannele Vihermaa (Alma Media Oyj), Jouni Siren (YLE), Janne Saarela (Profium Oy), Caj Södergård (VTT) and Marko Turpeinen (HIIT).

The report has been written by Asta Bäck and Sari Vainikainen from VTT Information Technology, Marko Turpeinen, Risto Sarvas and Tommo Reti from HIIT, and Janne Saarela from Profium. Chapters 1, 2.2.1, 3 (3.2 together with Janne Saarela), 4 (excluding 4.3.1), 6, 9.1, 9.2, 9.3, 9.4, 9.5, 10.4, 11.1, 11.2 and 12 (excluding 12.5 and 12.6) were written by VTT research scientists. Chapters 2.2.2, 4.3.1, 5, 8, 9.6, 9.7, 9.8, 10 (excluding 10.4), 11.3, 12.5 and 12.6 were written by HIIT research scientists. Chapter 7, as well as most of chapter 3.2, was written by Janne Saarela, Profium. All partners, HIIT, Profium and VTT, participated in writing chapters 2.1 and 13. The authors want to thank the RISE steering group for their support and comments.


Contents

1 Introduction
2 Framework and research questions
  2.1 Framework
  2.2 Demonstration descriptions
    2.2.1 StorySlotMachine
    2.2.2 My Own Hockey Clip - Professional Personal Memorabilia
    2.2.3 DiMaS
3 Introduction to semantic metadata
  3.1 Metadata and ontologies
  3.2 Vision and status of Semantic Web
4 Semantic applications and folksonomies
  4.1 Semantic web technologies in scientific publishing
  4.2 Networks
  4.3 Private media creation and folksonomies
    4.3.1 Personal Photography
5 Creating and utilising ontologies
  5.1 Why and when ontologies are needed?
  5.2 Challenges in creating and maintaining ontologies
  5.3 Ontology development lifecycle
  5.4 Ontology mapping
  5.5 Ontology languages
    5.5.1 Web Ontology Language (OWL)
6 Adding semantics to the content
  6.1 Main principles
  6.2 Automatic / semiautomatic methods
  6.3 Annotation tools
7 Semantic information retrieval
  7.1 Querying
  7.2 Reasoning
    7.2.1 Rule languages
8 Service-Oriented Computing and Web Services
9 Metadata vocabularies and taxonomies
  9.1 Dublin Core (DC)
  9.2 IPTC
    9.2.1 IPTC NewsCodes
    9.2.2 IPTC NewsML and NITF
  9.3 PRISM
    9.3.1 Application area
    9.3.2 Relations between the PRISM and other specifications
  9.4 Book metadata
  9.5 Learning Object Metadata (LOM)
  9.6 YSA (Yleinen Suomalainen Asiasanasto)
  9.7 Open Directory Project (i.e. dmoz.org)
  9.8 WordNet
10 Standards and recommendations
  10.1 MPEG-7
    10.1.1 Structure
    10.1.2 Description Tools
  10.2 MPEG-21
  10.3 SMIL (Synchronized Multimedia Integration Language)
  10.4 Syndication
  10.5 Licence and Rights Metadata
    10.5.1 XrML
    10.5.2 ODRL
    10.5.3 Creative Commons
11 Ontologies
  11.1 Tourism / Travel ontologies
  11.2 Cultural heritage
  11.3 Common sense ontologies
12 Relevant projects
  12.1 Automatic generation of multimedia presentations
  12.2 NEWS-project
  12.3 Neptuno
  12.4 Mobile services related projects
    12.4.1 Rotuaari
    12.4.2 Kontti
  12.5 Mobile Media Metadata (MMM-1)
    12.5.1 Memorabilia and metadata
  12.6 ARKive
  12.7 Ontology support in eLearning area
13 Conclusions


Terminology

Concept
An abstract idea generalised from particular instances. Involves the idea of the existence of objects, processes, or relations of objects, e.g., table, cell, man, raining, family. [http://para.unl.edu/para/SpedProg/Glossary.HTML]

Faceted taxonomies
Facets can be thought of as different axes along which documents can be described, and each faceted taxonomy contains a number of terms.

Media content
Content in any format that can be used in news, entertainment, education or other types of publishing. It may be text, photos, graphs, audio, video, or other formats or their combination. This term is used at a general level without reference to a particular object.

Media object, media resource, media asset
A specific identifiable piece of media content.

Metadata
Data about data, or information about information. In practice, metadata comprises a structured set of descriptive elements to describe an information or media resource. [http://www.ktweb.org/rgloss.cfm]

Ontology
In philosophy, ontology is the study of what there is, an inventory of what exists. In this report, ontology refers to a model of entities and interactions in some particular domain of knowledge or application. In other words, an ontology captures knowledge of its domain. According to Tom Gruber, an AI specialist at Stanford University, "ontology is the specification of conceptualisations, used to help programs and humans share knowledge."

Semantic media, semantically rich media
Media that is described with ontology-supported metadata. The meanings defined in the ontology may relate to the actual content that is discussed or presented in the media object, or to its role as a component in the media product or service. The latter meaning is specific to the media sector and content.

Semantics
The study of meanings: the historical and psychological study and the classification of changes in the signification of words or forms viewed as factors in linguistic development. [http://www.m-w.com/]

Taxonomy
A hierarchical classification of things, as, for example, in the classification of biological organisms.


Term
A word or phrase used to label a concept. Terms in a thesaurus can be either preferred terms or non-preferred terms. [http://www.willpowerinfo.co.uk/glossary.htm]

Thesaurus
A controlled vocabulary in which concepts are represented by preferred terms, formally organised so that paradigmatic relationships between the concepts are made explicit, and the preferred terms are accompanied by lead-in entries for synonyms or quasi-synonyms. The purpose of a thesaurus is to guide both the indexer and the searcher to select the same preferred term or combination of preferred terms to represent a given subject. [http://www.willpowerinfo.co.uk/glossary.htm] The following relations are often defined for the terms in a thesaurus: broader term, narrower term, and related term.

Vocabulary (Controlled vocabulary)
A set of selected words that are to be used for some purpose, e.g. to describe the content of a document.


1 Introduction

This report presents the current state and future development potential of technologies that are relevant in developing new media products and services based on content that is described with semantically rich metadata. It has been written within the project "Rich semantic media for private and professional users". The project addresses two important trends and opportunities: utilising semantics in media products and combining commercial media content with user-created material. The report has been created concurrently with defining and developing demonstration applications, and the topics included in the report were chosen and emphasised based on the needs of the application development work. The scope of the report is wide, which makes it impossible to cover these issues in depth. Several references and web links to additional material are included in the report.

The RISE project defines the concept of rich semantic media as media content in any format for which there is descriptive metadata. The metadata is not only a stand-alone vocabulary: there is an ontology (or several ontologies) that gives more meaning to the metadata. Ontologies typically describe the relations between the terms, and they make it possible to draw inferences utilising the knowledge that has been captured in the ontology. This means that some tasks may be completely automated, or, at least, people may be offered more advanced support in their tasks.

The most typical example of a semantically supported application is searching. When a search utilises semantic knowledge of concepts and their relations, it is possible to find more relevant material than, for example, with a free text search, where the search term must be found within the document. The underlying ontology may be utilised to expand or focus the search. Semantics can also be utilised to suggest additional related sources of information or content. At the most advanced level, Semantic Web technologies can be utilised to extract knowledge out of a document pool, and this, for example, can be used to build or update ontologies. This could even be used in areas like technology forecasting: when some terms emerge in a new context, this may be a sign of a new breakthrough or innovation.

Another important area for ontology-driven applications is context-sensitive information services. There, ontologies have been utilised to offer mobile users relevant and context-sensitive information adapted to their terminals. For example, the background or interest profile of the user combined with location information is utilised to offer content that is assumed to be of interest to this kind of user, even though she or he has not made any explicit search. In these applications, the content most often consists of small content objects.

In media, the aim is to create and offer interesting and relevant media products and services, so content must in most cases be processed further rather than only presented as a search result list. This leads to an important sector-specific application area: generating media presentations. Here the vision is to make it possible to automatically compile presentations that meet individual needs and expectations. As an intermediate step, editors


could benefit from semantically generated suggestions as to which components could be utilised in their story or presentation, and the editors could take these automatically generated suggestions as starting points for their work.

This idea can also be combined with the emerging trend where end users take a more active role in modifying the content and how it is presented. Instead of offering end users exactly predefined presentations, they could be offered automatically generated presentations that they can modify to better meet their specific needs. Depending on the application, users could also be offered the opportunity to combine the media content with their own content. This way the user would be able to make a really personalised version of the content. For example, a travel guide could be complemented with the user's own comments on the sites, and his or her own photos could be added.

In order to build new, interesting media applications, there are several challenges to address. We must understand what people may want to do with content, and then we must define the ontology and metadata vocabulary that describe the content in a way that supports this. This is not as simple as it may first seem, because there are many conceivable tasks and uses for the content, with different or overlapping requirements. It is crucial that the needs are anticipated correctly; otherwise all work put into creating and utilising the semantic metadata is wasted. Also, different media formats, such as text, images, audio and video, need to some extent media-specific metadata descriptions. However, interoperability across media formats is needed, because electronic platforms make it possible to utilise and combine different formats.

The report is organised as follows: Chapter 2 gives more information on the research agenda. It explains the research questions of the project and describes the demonstrations that are to be built in the project. The demonstrations give this technology overview its focus. Chapter 3 gives some perspective on the utilisation of metadata in relation to media content and describes the vision of the Semantic Web. Chapter 4 explains the new opportunities that are emerging as more and more content with metadata becomes available on the Internet, and also the phenomena and opportunities that emerge when large numbers of people may contribute content and metadata. Chapter 5 gives an introduction and overview to ontologies, which are one of the key building blocks in creating semantic applications. Creating, maintaining and utilising ontologies is a huge area, where a lot of development work is being done and many issues remain to be solved. Chapter 6 looks at the different ways that metadata can be created, and chapters 7 and 8 look at how semantic metadata can be manipulated and utilised. Chapter 9 presents the most relevant metadata vocabularies, and chapter 10 some metadata-related standards and recommendations. Chapter 11 presents some ontologies that were assessed for utilisation in the project, and chapter 12 some projects that are interesting and relevant from the RISE project's point of view. The main conclusions are presented in chapter 13.


2 Framework and research questions

2.1 Framework

The goal of the RISE project can be summarised into the following task: to create a framework where individuals, media companies, 3rd parties and service providers can create and utilise semantically enriched media content to produce appealing new media services and content. These parties have different interests and needs:

• Individuals' interests
  • How to enrich their own digital collections?
  • How to share media in their own communities and networks?
• Media companies' interests
  • How to turn existing media assets, and the investments relating to them, into new business?
  • How to create completely new media products utilising rich semantic metadata?
• 3rd parties' interests (e.g. advertisers, promoters, media production companies, community hosts)
  • How to make non-commercial content available for commercial services?
  • How to utilise active users and amateurs in creating and maintaining commercial services?
  • How can web-based communities disseminate their information to other sites, link it dynamically to other services, and combine it with other media content?
• Service providers' interests
  • How to earn money from individuals, media companies and 3rd parties (e.g. other information owners, advertisers)?

Figures 1-3 describe different viewpoints on this area: the data flows, the added value, and the technologies needed to manage these data flows and support these services.

Figure 1. Data flows in offering semantically rich media content. (Individuals and media companies exchange digital assets, metadata and rights with service providers; 3rd parties supply digital assets, rights and ads.)


Individuals, or end users, are not seen only as recipients of content; they may also produce and supply digital content with added metadata and rights information. In Figure 1, the word metadata refers to any kind of descriptive metadata that describes the content; rights information is also metadata, but it is mentioned separately here because of its special character and importance.

Figure 2. Added value of semantically rich media services and products. (Individuals: enriched personal collections. Service providers: increased revenues via targeted advertising, low operational cost structure via automated match-making. Media companies: increased revenues via re-use of archived content, lower operational cost structure via ad-hoc freelance content. 3rd parties: information delivery for targeted user groups.)

Semantically rich media is beneficial to different parties in various ways (Figure 2). Individuals can use these methods to enrich their personal media collections. Service providers may get increased revenues from targeted advertising and reduce costs by automated matching of content to the right customers. Media companies get additional revenues from re-using their stored media assets, and the 3rd parties can more efficiently target their information to the desired user groups.

Figure 3. Technologies for expressing semantic metadata. (Individuals, media companies and 3rd parties: semantic metadata with RDF, RDFS and OWL, licenses with Creative Commons. Service providers: semantic queries and rules with SPARQL and SWRL, licenses with Creative Commons.)


Figure 3 provides some examples of technologies used in encoding, searching and transferring semantically rich media. Creative Commons (CC) licenses (http://www.creativecommons.org/) are an example of a new model for expressing licensing terms in a practical and user-friendly way.

2.2 Demonstration descriptions

2.2.1 StorySlotMachine

This demonstration application combines the main themes of the RISE project: utilising metadata and ontologies, letting the end user take an active role, if he or she is willing to do so, and combining commercial media content with consumer-related content (fusion media). It also aims at making an application that is fun to use. This application gives the opportunity to study the following aspects that are central in making semantically supported media applications:
• describing and utilising different media objects and media formats to be used in the same application,
• utilising ontologies to promote reusing media objects,
• adding automatic features to generate media presentations, and
• getting feedback from consumers with a new type of application.

As the name of the demonstration suggests, the application is based on a slot machine metaphor: the user tries his or her luck with the content, and may partially or completely redo the try by "locking" the best results, as the sketch below illustrates. The RISE demonstration targets travel and location-related information. The idea is that when planning a trip to a certain location, the user may look for background and orientation information about the destination. Instead of just listing the found resources, a presentation is created out of the components according to the target device that the user has specified. The result can be utilised as a guidebook during the trip, and after the trip the user may add his or her own photos and comments, as well as search for additional information on things that have come up during the trip. The application does not include detailed information about opening times or other such "real-time" information. Another way to use the application is to start with the user's own images, and to make an interesting presentation of one's own trip with some enriching media content. The application does not aim at being an exhaustive search tool but relies on randomness within the user-defined limits.
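The following minimal sketch illustrates this lock-and-respin selection logic. The slot names, content pool and function are hypothetical illustrations, not part of the actual demonstration software:

    import random

    # Hypothetical content slots and pools for a travel presentation; the
    # real application would draw these from metadata-described media.
    SLOTS = ["history", "sights", "food", "photos"]
    POOL = {
        "history": ["article-12", "article-47"],
        "sights": ["video-3", "article-9", "image-21"],
        "food": ["article-33", "image-8"],
        "photos": ["user-photo-1", "user-photo-2"],
    }

    def spin(current=None, locked=frozenset()):
        """Fill every slot randomly, keeping the slots the user has locked."""
        current = current or {}
        return {
            slot: current[slot] if slot in locked else random.choice(POOL[slot])
            for slot in SLOTS
        }

    first = spin()
    # The user keeps the "sights" and "photos" picks and re-spins the rest.
    second = spin(first, locked={"sights", "photos"})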


Ontologies are utilised to produce some of the basic information that is shown to the user, and also to select content that deals with the topic. The ontology needs to be built to capture the basic knowledge relating to the target area, and then content can be added utilising the metadata. This way, new media content can be added and utilised as the pool from which content is presented to the user. Passive users may be directly happy with the presented media content, whereas active users may refine the searches and look for additional content. They are also given the opportunity to add their own content, such as images, to the collection. This gives quite a new dimension to personalisation.

The demonstration application targets two main user groups: normal private travellers, and school children who make a group trip to a destination and do some information searching and processing before, during and after the trip. By targeting these two application areas, we can also address the challenges of utilising content for purposes other than those it was initially created for.

The media content for the first application includes
• newspaper articles and images,
• encyclopaedia articles and some image and multimedia elements,
• literature excerpts, and
• video clips of TV programmes.

This selection of media content requires that we take the different media types into consideration, which requires addressing the interoperability of metadata vocabularies across media types.

The software development is made in two phases. In the first phase, the concept is defined and tested with potential users with the help of paper-based layouts and mock user interfaces. This gives the opportunity to get feedback both from potential users and from publishers. The technical feasibility of the concept is tested with the help of a step-by-step implementation. Media content is gathered from different sources and in different formats, and an off-line application is made in the first phase. First, the necessary metadata is added manually to the content. The results of this phase can be utilised to evaluate which pieces of metadata are important and useful, and this information can then be utilised later when more content is added. At this later phase, the need for and possibilities of utilising automatic methods for creating the necessary metadata can be assessed.

2.2.2 My Own Hockey Clip - Professional Personal Memorabilia

Professional Personal Memorabilia is a concept about mass-customisation of digital media products. With available semantic metadata and digital media technology, it is possible to automate the customisation of media products, such as videos about ice hockey games. The video creation system behind this concept combines the user's own personal media (e.g., photos, video clips, text messages created during the game) with professional media (e.g., goal shots and other game footage) and advertisements (e.g., video clips and logos). The mixing of these different media objects is done based on a semantic metadata template that describes the rules and options for combining the media. The created video clip is at the same time digital memorabilia for the end user and an advertisement for the service provider and advertisers. Unlike in traditional broadcasting, where a single video is broadcast to millions, this system makes it possible to send a million personalised videos to a million individuals. Also, unlike traditional broadcast media, personal media is most often further explicitly shared with friends and family (e.g., personal photos). This enables novel ways of viral marketing and of gathering social network information about customers, using media objects and mass-customisation.

Figure 4 gives an overview of the basic idea. In the middle is the video template, which is a script or an outline of the video clip with information and rules on what kind of material is included and where. There can be different templates for different kinds of clips; for example, a video template for creating a video clip from a car show is different from the template described here, i.e., a hockey game. Also, within the genre there can be


variations depending on the audience: the template can be designed to be more family-oriented or more adult-oriented. This can be achieved, for example, by varying the beer commercials and team mascots. Also, as input to the template, extra information about the user and the game can be taken into account: did the user attend the game or watch it on TV, what was the game (time, date and teams), where did the user sit in the arena, were any of the user's friends or family there, and so on.

Figure 4. Conceptual illustration of the Semantic Metadata Template for creating a personal hockey video clip. Above the template is professional media; below it is personal and group media. In the lower left section are examples of preferences for selecting the template.
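As a concrete illustration, the sketch below shows how such a template could be represented as data and filled from a pool of annotated media. The metadata fields, slot names and preference resolution are hypothetical simplifications, not the project's actual template format:

    # Hypothetical template: one metadata rule per slot in the clip timeline.
    TEMPLATE = [
        ("opening", {"source": "professional", "type": "arena-exterior"}),
        ("goal", {"source": "professional", "type": "goal", "team": "favourite"}),
        ("personal", {"source": "personal", "event": "game-2005-02-27"}),
        ("sponsor", {"source": "advertiser", "audience": "family"}),
    ]

    def matches(media, required):
        """True if the media object's metadata satisfies every required field."""
        return all(media.get(key) == value for key, value in required.items())

    def fill_template(template, media_pool, preferences):
        """Pick the first media object per slot whose metadata satisfies the rule."""
        clip = []
        for slot, required in template:
            # Resolve user-dependent values, e.g. "favourite" -> "HIFK".
            resolved = {k: preferences.get(v, v) for k, v in required.items()}
            candidates = [m for m in media_pool if matches(m, resolved)]
            if candidates:
                clip.append((slot, candidates[0]))
        return clip

    pool = [
        {"id": "pro-17", "source": "professional", "type": "goal", "team": "HIFK"},
        {"id": "me-02", "source": "personal", "event": "game-2005-02-27"},
    ]
    clip = fill_template(TEMPLATE, pool, {"favourite": "HIFK"})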

The Role of Semantic Metadata and Video Templates

The main challenge of this scenario is the availability of semantically rich metadata for the media objects. From the user's perspective, the annotation of media objects can be done as a side product of uploading. For example, the user can be asked to upload an image of himself at the hockey game on the 27th of February. The uploaded image can then be complemented with metadata about who is in the picture, as well as when and where it was taken. The motivation for the user to do this is the video being created and the anticipation of how the uploaded image will look in the final video. For the end user there is also the motivation that if he decides to share the picture within his group, it will more probably be used in other people's clips if it is richly annotated, i.e., uploaded according to the directions.


For the commercial stakeholders, the motivation for annotation is different. Of course, if the creation of these kinds of video clips is profitable, that may be motivation enough to provide the required metadata. Nevertheless, the more customisable the videos are, the more annotated media they need. Therefore, the cost of annotation should be taken into account. We will not go into the details of this, but want to emphasise that the amount of annotated media is proportional to the versatility of the video template, which relates again to the total cost of having a system of this kind. Less versatility means less annotated media, which means lower costs.

Video Rendering

The actual technical implementation of the video compilation and rendering program is out of the scope of this description at this stage of the project. However, the objective is that the end result, the video file itself, would include all the metadata used. In other words, the video itself would be richly annotated for future use in other videos made by the user or his group. Not only would the system create a database of richly annotated media, but the metadata would also provide information about user preferences: for example, which player is most often selected as the favourite, or how many HIFK fans have Apple computers. In other words, there is the potential of using the metadata as information on consumer preferences. It is good to bear in mind that the inclusion of rich metadata with the video brings up issues of privacy in sharing. For example, the metadata of a shared video clip contains information about where the user was at what time, and perhaps with whom. Also, using this information for commercial purposes requires acceptance from the end user.

2.2.3 DiMaS

The Digital Content Distribution Management System (DiMaS) is being developed at the Helsinki Institute for Information Technology HIIT. DiMaS demonstrates, as a concept and a technical prototype, that it is possible to make a system for amateur and professional multimedia producers to publish their work on existing, highly popular peer-to-peer (P2P) file sharing networks; importantly, the system enables producers to insert content descriptions, to manage intellectual property and usage rights, and to charge for the consumption. DiMaS enables versatile licensing, rights description and enforcement, and content encryption scenarios without introducing yet another multimedia file format and a respective player.

Distributed P2P networks offer high availability and bandwidth through many users' wideband connections, and good scalability with no central servers as bottlenecks. P2P multimedia file sharing networks are widely popular, creating a huge content sharing base and a complex value network. This offers new business opportunities to various seasoned actors, and also to new ones, like multimedia-producing amateur communities that are looking for alternatives to the traditional media publishing channels. These multimedia-producing communities want to distribute their creations to a large audience, which P2P file sharing networks can provide. However, the P2P networks hardly support the use of metadata for digital rights management, charging mechanisms, or content descriptions. To solve this issue, and to avoid disputes over on-line contracts, often another new file format and a respective supporting player are introduced, hence adding to an already large pool of multimedia formats and players.


Before the popular P2P file sharing networks can really be harnessed and leveraged by multimedia producers with solid contract practices, the following problems must be addressed:
• How to include the different content, rights, and charging descriptions in standardised metadata?
• How to bundle the metadata, e.g. several rights and license descriptions, and the actual content into one package, rather than having them in separate files?
• How to make this package easily accessible to the end user without creating a new file format and another player application?

There is no single metadata language that provides universal and standard descriptions for content, various rights, and pricing information for all content types. DiMaS encodes the descriptions into XML metadata using specific standards for each separate set of information (e.g., MPEG-7 for content, RDF for license text, and ODRL for user rights description and enforcement). This approach creates several different, very informative and browsable metadata files, while the actual content file is optionally encrypted against unwanted usage. DiMaS introduces the concept of a Distribution Package to bundle the metadata files with the actual content into one executable file with built-in logic and a user interface for browsing all metadata, viewing an optional preview file, and purchasing the actual content. The package can contain multiple licenses, for example in human-readable and machine-readable formats, multiple expression languages, like both ODRL and XrML, for the same content, and separate license variations for different legal regimes. The system requires no dedicated client software for the end user other than the Java 2 Standard Edition Runtime Environment.
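To make the Distribution Package idea concrete, the sketch below bundles the separate metadata files with the (optionally encrypted) content into a single archive. The file names are hypothetical, and the real DiMaS package is an executable file with its own browsing and purchasing logic, not a plain archive:

    import zipfile

    def build_package(package_path, content_file, metadata_files):
        """Bundle the content with its separate metadata files (e.g. MPEG-7
        content description, license texts, ODRL rights) into one archive."""
        with zipfile.ZipFile(package_path, "w") as package:
            package.write(content_file, arcname="content/" + content_file)
            for name in metadata_files:
                package.write(name, arcname="metadata/" + name)

    build_package(
        "my-film.pkg",
        "my-film.mpg.enc",  # content, optionally encrypted against unwanted usage
        ["description.mpeg7.xml", "license.rdf", "rights.odrl.xml"],
    )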


3 Introduction to semantic metadata

3.1 Metadata and ontologies

Metadata is not a new thing. People have a natural tendency to group and classify things, and this has been applied to information resources for a long time. For example, the Universal Decimal Classification system was initially developed more than 100 years ago. When the information is in books or other non-digital formats, separate metadata records are needed for managing and searching information. This early metadata was mostly centrally created, utilising predefined vocabularies and classifications. When more and more information became available in digital format, other types of information retrieval also became possible. Free text search became the most typical way of searching, and for most users it is the default and preferred way of searching.

Ontologies belong originally to the field of philosophy, where they relate to the question of what exists. In computer science, the term is used for knowledge representation within a specific area. An ontology captures the knowledge of a domain and is expressed so that it can be utilised in computing for making inferences. Knowledge representation and ontologies were very much in focus when artificial intelligence applications were built in the 1980s, and they are coming back now in a web-enabled fashion.

3.2 Vision and status of Semantic Web

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation." (Berners-Lee, Hendler, and Lassila, 2001). There are two main building blocks for the semantic web:
• Formal languages: RDF (Resource Description Framework), DAML+OIL, and OWL (Web Ontology Language), which was developed by the Web Ontology Working Group of the W3C.
• Ontologies: communities will use the formal languages to define both domain-specific ontologies and top-level ontologies, so that relationships between ontologies can be determined for cross-domain searching, exchange and information integration.

In other words, the vision of the Semantic Web is that of a more intelligent and automatic web, where web resources can be utilised more effectively and many routine tasks can be performed and questions answered automatically based on the information in the ontologies and in the Internet documents. The semantic infrastructure makes it possible to utilise available web resources more effectively, flexibly and automatically than before.

The Semantic Web is still very much a vision, and whether it will come true on the public Internet is still an open question. If we look at the steps in Figure 5, we can state that the use of URI addressing and Unicode character sets is widespread.

Figure 5. The building blocks of the Semantic Web according to Tim Berners-Lee (Semantic Web - XML2000; http://www.w3.org/2000/Talks/1206-xml2k-tbl/)

In addition, the use of XML for document management and data exchange with XML Schema technology has reached a level of maturity, as these technologies are widely supported in open source and commercial software offerings. RDF metadata technology and its schema technology were developed at the W3C and brought to final standardisation level in early 2004. Following this, many software vendors have introduced support for these technologies. However, they are not as widely supported as the underlying XML, Unicode and URI technologies. The ontology vocabulary layer today has a name: Web Ontology Language (OWL), which was standardised in early 2004 by the W3C, and there are some software offerings that provide support for it. One should note that OWL is not an industry-specific vocabulary addressing specific use cases but a tool for constructing such vocabularies when they require expressivity beyond the capabilities of RDF Schema technology.

The layers of Logic, Proof and Trust have not yet reached the status of an open standard, but there is growing interest towards specifying a rule language on top of the ontology layers. For example, the W3C recently (April 2005) organised a workshop on rule technologies. Logic, Proof and Trust are seen as necessary tools for any software component to be able to deduce implicit knowledge from the explicit metadata descriptions. Without digital signatures, the public Internet will suffer from a lack of trust. Thus digital signatures are a necessary ingredient that needs to be in place before, for example, search engines will index formal metadata descriptions such as RDF.


Trust is ultimately the user experience and not a technology to be standardised. Given a set of metadata, ontologies and rules encoded with sound mathematical logic, a proof can show the user how a deduction came into existence.

The RISE project mainly targets the RDF + RDF Schema, ontology vocabulary and logic levels. In other words: how can media content be described (metadata, RDF + RDF Schema), and what kind of ontologies are needed to make it possible to make logical decisions relating to utilising media objects?
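As a small illustration of the metadata and query layers discussed above, the following sketch uses the open-source rdflib Python library to describe a media object in RDF and retrieve it with a SPARQL query of the kind discussed in chapter 7. The example vocabulary and resource URIs are invented for illustration; only Dublin Core is a real standard here:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC, RDF

    # A hypothetical RISE vocabulary namespace for illustration.
    EX = Namespace("http://example.org/rise/")

    g = Graph()
    article = URIRef("http://example.org/content/article-123")
    g.add((article, RDF.type, EX.NewsArticle))
    g.add((article, DC.title, Literal("Suomenlinna sea fortress")))
    g.add((article, EX.about, EX.Helsinki))

    # A SPARQL query over the metadata: titles of content about Helsinki.
    results = g.query("""
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX ex: <http://example.org/rise/>
        SELECT ?title WHERE {
            ?item ex:about ex:Helsinki ;
                  dc:title ?title .
        }
    """)
    for row in results:
        print(row.title)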


4 Semantic applications and folksonomies

4.1 Semantic web technologies in scientific publishing

Little information is available about how Semantic Web technologies are being used in any field of publishing. The W3C organised a workshop on the Semantic Web for Life Sciences in October 2004 [http://www.w3.org/2004/10/swls-workshop-report.html]. There, two application examples were given: at Nature Publishing Group, applications are focused on RSS and RDF, whereas Elsevier reported focusing on using RDF and OWL internally to manage ontologies and store the outputs of text mining. Methodologies and issues relating to allowing authors to self-publish semantics were also discussed. Elsevier has an "author gateway", which can be used to input basic Dublin Core type metadata, but it does not allow authors to add "assertional" metadata or create ontologies. Before this can be done, the copyright issue must also be addressed, i.e. who owns the ontology and how it can be utilised. When dealing with ontologies in applications where the consequences of mistakes are serious, as in the medical field, the liability and responsibility issues are particularly important.

4.2 Networks

Few consumer applications are currently presented or marketed as Semantic Web applications, whereas social computing applications have caught a lot of attention, and they have a semantic dimension. Social computing applications ("6th degree applications") make it possible to store information about which people are connected to each other. The first such applications were closed, centralised systems, and this was quickly seen as a hindrance, because the information was locked into the different systems. With a common vocabulary, information could be gathered and combined from different sources, which has huge potential. The FOAF (Friend of a Friend) vocabulary [http://xmlns.com/foaf/0.1/] was defined as a tool to do this. It is currently one of the best-known semantic vocabularies. This vocabulary makes it possible to describe things about people, such as which people know each other, which organisations these people belong to, and which papers they have published.

Blogging is another phenomenon that has caught a lot of attention recently, for several reasons. Here we mention it as an example of networks of related websites and persons. Most blogs have little explicit semantics, but bloggers link to other bloggers' websites, as well as to other published material they find interesting and relevant to their topic. This creates a hub for related resources. This has been made possible in practice by a number of tools, such as RSS and Atom feeds for syndication and permalinks for making stable links.

4.3 Private media creation and folksonomies

We can also see some other "grass roots" approaches to the semantic web in consumer applications. A new term, folksonomy, has been coined; it refers to the practice of collaborative categorisation using freely chosen keywords (definition in Wikipedia).


There are several popular services currently attracting a lot of interest [http://www.guardian.co.uk/online/story/0,3605,1403974,00.html].

A site for storing and sharing bookmarks [http://del.icio.us/] lets users group their bookmarks utilising self-defined tags. Flickr [http://www.flickr.com/] is another of these sites. It lets its users upload and share their photos and, most importantly, attach tags to the photos. Users may choose to whom their photos are visible, and public images may be searched with the help of these tags. The most popular keywords are shown to the users, as can be seen in Figure 6, and the size of a keyword indicates its popularity. Some typical problems can be seen here. For example, there are words that mean practically the same thing (tree, trees; newyork, newyorkcity). The term "cameraphone" probably does not mean that the picture shows a cameraphone, but that it was taken with one. However, by showing the most popular terms, users are given the opportunity to see which tags other users have created, and by making it easy to change the tag, users may choose to utilise these tags if they wish to do so. We can also see that the terminology is simpler here than in official vocabularies. For example, instead of using the tag "self-portrait", many users have chosen to tag a photo of themselves with the tag "me".

Figure 6. The most popular user-created tags in Flickr in May 2005 (http://www.flickr.com/photos/tags/)

Flickr also analyses which tags often co-occur and suggests additional tags as links to related photos. These tags are called 'Related' and 'See also'. There is no explicit explanation as to what the relation is, but as we can see in Figure 7, the suggested tags have a clear connection to the initial tag of this example (Helsinki).
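A minimal sketch of how such 'related tags' can be derived from co-occurrence counts follows. The photo data and threshold are invented for illustration, and Flickr's actual algorithm is not public:

    from collections import Counter
    from itertools import combinations

    # Toy set of tagged photos.
    photos = [
        {"helsinki", "suomenlinna", "sea"},
        {"helsinki", "cathedral", "snow"},
        {"helsinki", "suomenlinna", "fortress"},
        {"stockholm", "sea"},
    ]

    # Count how often each pair of tags appears on the same photo.
    pair_counts = Counter()
    for tags in photos:
        for a, b in combinations(sorted(tags), 2):
            pair_counts[(a, b)] += 1

    def related(tag, min_count=2):
        """Tags that co-occur with the given tag at least min_count times."""
        out = set()
        for (a, b), n in pair_counts.items():
            if n >= min_count and tag in (a, b):
                out.add(b if a == tag else a)
        return sorted(out)

    print(related("helsinki"))  # ['suomenlinna']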


With this large number of images, some of which are also published under a Creative Commons licence, new ideas for utilising them will emerge. The images could be used as a photographic travel guide, or as a source of learning material.

Figure 7. Flickr suggests more tags in relation to the tag that was used in the search, grouped as 'Related' and 'See also'.

Wikipedia is another example of large-scale voluntary co-operation. The main innovations in Wikipedia relate to the process of how the content is created and modified. Anyone is allowed to make modifications; if the content is vandalised, an earlier version can be restored. The Wikimedia Foundation has other initiatives in addition to its flagship project Wikipedia. Wikimedia Commons can be mentioned as an example of a new initiative. There, the aim is to create a pool of free media objects, such as images and video clips, that can be used in other Wikimedia projects. [http://commons.wikimedia.org/wiki/Commons:About]

4.3.1 Personal Photography

The organisation and management of personal media, mainly photographs, is currently a popular area of product development. Building strongly on the use of metadata for archiving, organising and managing media for professional use (e.g., media production, news production, etc.), there are some commercial products that leverage the use of semantic metadata (e.g., Adobe Photoshop Album, Apple iPhoto). However, the main issue with using semantic metadata for personal use is that, unlike, for example, in the news business, the information required for personal use is not standard among people: the kind of information one person considers important might have no use for someone else, or the same information used by two people may actually mean different things. For example, a parent who photographs his daughter's soccer games might use a soccer-game-specific metadata structure, and, on the other hand, information such as "my sister" is very different for different people. The Flickr example described above is one approach to sharing metadata, but as one can quickly see by searching the Flickr pictures, the critical problems of common metadata are inherent.


The personal nature of the metadata used for personal pictures creates challenges for automatic metadata creation. Therefore, many of the commercial digital image management systems rely on a basic metadata framework that has to be filled in and extended by the user. Often the framework consists of three main categories: people, places and events. The idea is, for example, that users fill out the "people" category with names, or extend it with categories such as "relatives, friends, colleagues". This requires quite an effort from the user of the system, especially if, in addition to creating the metadata structure, there are thousands of pictures to annotate.

To summarise, the main issue in using semantic metadata in personal media management is that the metadata has to be created in co-operation with the end user. Without the end user, the metadata will remain on a general level, not providing the advantages that would be possible. The involvement required from the end user presumes that the end user understands both the concept of metadata and the advantages of having semantically rich metadata. These issues are a hot topic in both commercial applications and academic research.
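As an illustration of the people/places/events framework mentioned above, a single annotation could look like the following sketch. The field values, sub-categories and the find helper are hypothetical, not taken from any of the products named:

    # One photo's annotation under a people/places/events framework.
    annotation = {
        "people": {"Anna (daughter)", "coach Mikko"},
        "places": {"Töölö sports park"},
        "events": {"soccer game, spring 2005"},
    }

    # The base framework extended by the user with personal sub-categories.
    user_extensions = {"people": ["relatives", "friends", "colleagues"]}

    def find(photos, category, term):
        """Return the photos whose annotation contains the term in the category."""
        return [p for p in photos if term in p.get(category, set())]

    matches = find([annotation], "people", "Anna (daughter)")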


5 Creating and utilising ontologies

In many application areas, such as e-commerce, e-government, content standardisation, and legal information systems, the modelling of content, regulatory, rights usage, and/or pricing knowledge is critical. For example, reasoning methods, application scenarios, end-user terminals, the weight and order of rules, and the parsing of license texts require special semantic patterns, often separately for each content type. Computational agents require machine-readable descriptions of the content and capabilities on the network. In the case of existing data, the challenging task of automated extraction of semantic information from media content documents is often required. Metadata descriptions must then be bundled with the human-readable versions of the information.

5.1 Why and when ontologies are needed?

Even the highly popular World Wide Web, as it is currently constructed, resembles a poorly laid out map. Our access to the media is normally based on free text searches, lacking document connectivity and usage patterns. This makes it difficult to find truly relevant material in the mass of data. The sheer magnitude of the data is unmanageable for humans, at least without powerful tools. To serve an end user in a world of an ever-increasing mass of information, it is necessary to go beyond keywords and specify the meaning of the resources available through various networks. This additional layer of interpretation attempts to capture the semantics of the data using formalised and machine-readable conceptual models, i.e. ontologies.

Both ontologies, for automatically processable, knowledge-rich content characterisation of information sources, and metadata, for information source description, evaluation, and access, have an important role in scenarios for intelligent information access. For example, in almost all news services the content is organised according to some form of categorisation. The value of good-quality, detailed categorisation increases as the news is used in an asynchronous and customised manner.

Formally, in the context of knowledge representation, an ontology is a partial specification of a conceptual vocabulary to be used for formulating knowledge-level theories about a domain of discourse; i.e., ontologies provide a way to formalise a common vocabulary for describing some area of interest. When agents (human or artificial) agree on such a vocabulary, they can share and reuse knowledge.
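A toy illustration of going beyond keywords: with even a small term hierarchy, a search can be expanded from a general term to everything the vocabulary knows to be narrower. The terms and hierarchy below are invented for illustration:

    # Each term maps to its broader term in a hypothetical travel vocabulary.
    BROADER = {
        "suomenlinna": "fortress",
        "fortress": "sight",
        "cathedral": "sight",
        "sight": "attraction",
    }

    # Invert the hierarchy: broader term -> set of narrower terms.
    NARROWER = {}
    for child, parent in BROADER.items():
        NARROWER.setdefault(parent, set()).add(child)

    def expand(term):
        """Return the term plus all narrower terms, so a search for 'sight'
        also matches content tagged 'fortress', 'cathedral' or 'suomenlinna'."""
        matched = {term}
        frontier = [term]
        while frontier:
            current = frontier.pop()
            for child in NARROWER.get(current, ()):
                if child not in matched:
                    matched.add(child)
                    frontier.append(child)
        return matched

    print(expand("sight"))  # {'sight', 'fortress', 'cathedral', 'suomenlinna'}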

5.2 Challenges in creating and maintaining ontologies

One should not underestimate the difficulties related to building a good ontology. The persons developing an ontology must understand the content domain and be able to concretise this knowledge into an ontology. Ontology creation usually occurs when the content provider decides to extend its domain coverage. If the ontology for the domain is already defined, the development of new content products typically requires only ontology modifications, not a totally new ontology.
Even if the same ontology cannot be directly used, the content provider should use existing knowledge as a basis to avoid duplicated work.

The nature of the required metadata depends on the purpose it is used for. For example, content formatting requires mostly structural metadata, whereas content selection and personalisation are based on semantic metadata. The requirements of content production are relatively easy to reflect in the ontology, because most of the production normally follows well-defined production guidelines. The more challenging task is to estimate how the finished content is used and how those operations affect the ontology. One way to approach this challenge is to examine all supported media as well as the ways and reasons customers consume the content on these media. From this analysis it is possible to derive requirements for content products.

Content producers consider themselves experts in modelling their content domain, but they often forget that different customers and customer communities may have different views on the same subject matter. Customers have varying interests and expertise levels, their terminology differs, and they interpret things differently. All these variations should be taken into account as much as possible when the semantic metadata is produced. The goal of the producer should be to create semantic metadata that covers most of the needs of an imaginary customer.

The contents and reusability of current and planned information sources define the structure of the ontologies. If the same content can be used multiple times, the semantic metadata must fulfil all the requirements of the resulting content products. It is also important to consider how much metadata the incoming feeds already contain and whether this metadata is usable. The key issue here is how much of the conversion from the information feed ontology to the provider ontology can be automated. For example, if 95 percent of incoming information is produced by a single source, it might be advisable to develop the provider ontology so that the manual conversion effort is minimised. Whatever the approach, the content provider must understand what kind of information it has access to and what its key characteristics are.

However, customer needs ultimately define why the ontology exists. If existing or future users do not need a certain piece of metadata, there is no point in producing it. Even though it is possible to produce vast amounts of metadata, it should be produced only if it is valuable to the production process or to the customers. The identification of these needs is a difficult but important task and requires a joint effort of different departments, including management, marketing and editorial staff. Typical methods to identify customer needs include traditional business planning, customer segmentation, and marketing activities.

Dimensions, detail level, and domain coverage all affect the complexity of the ontology. If the concept model for a certain dimension is simple, or if the semantic metadata can be created automatically, the ontology may contain multiple dimensions. Adding a dimension that requires a lot of manual work needs to be carefully considered. If the content provider wants to produce highly detailed semantic metadata, the number of dimensions and the domain coverage must be limited, or the metadata publishing process cannot be managed. Likewise, if the content provider wants to cover a wider domain, the detail level suffers.
If the production of a certain piece of information is too expensive in relation to its perceived value, there is no point in including it in the ontology. This in turn is related to how much of the metadata publishing process can be automated and how much of the information can be derived from the metadata in the incoming feeds.
If customers place a high value on the promptness of the information, the ontology and tools must allow high throughput rates. Even if fast processing times are not a necessity, the tools must support easy browsing of the ontology without the need to memorise its internal structures. One way to measure this readiness is simply to test the ontology in practice by producing semantic metadata with it. If customers experience no difference between a number of documents even though they all have different metadata descriptions, the ontology may be too detailed. On the other hand, if customers feel that two documents should be differentiated although they have identical semantic metadata, there might be a need to deepen the ontology.

5.3 Ontology development lifecycle

Semantic content metadata descriptions are dynamic by nature. Metadata structures change over time, which raises a number of questions related to managing already existing metadata. Moreover, not only the structures change; the overall correctness and value of information also change over time. We must be able to produce and manage multiple versions of both the metadata and the underlying ontologies. It is thus important to understand the dynamic nature of the domain, how this degradation affects the conceptualisations, and what to do with existing categorised information when the underlying conceptual models for categorisation change. For example, information on the Web is continuously changing, but there are no effective means to enforce any synchronisation of the ontologies used by different services.

Ontology versioning refers to the ability to manage ontology changes and their effects by creating and maintaining different variants of the ontology. There are three general requirements for a versioning methodology:

• provide an unambiguous reference to the intended definition of concepts (identification),
• make the relations between several versions of constructs explicit (change tracking), and
• provide methods to give a valid interpretation to as much data as possible (transparent evolution).
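OWL (discussed in Section 5.5) offers built-in properties that address the identification and change-tracking requirements directly. The sketch below, written in Python with the rdflib library, shows one possible way to record them; the ontology URIs are hypothetical.

    # Sketch: recording ontology versions with OWL's versioning vocabulary,
    # using the rdflib Python library. The ontology URIs are hypothetical.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import OWL, RDF

    v1 = URIRef("http://example.org/news-ontology/1.0")
    v2 = URIRef("http://example.org/news-ontology/2.0")

    g = Graph()
    g.add((v2, RDF.type, OWL.Ontology))
    g.add((v2, OWL.versionInfo, Literal("2.0")))   # identification
    g.add((v2, OWL.priorVersion, v1))              # change tracking
    # Declaring backward compatibility supports transparent evolution:
    # data described with version 1.0 remains interpretable.
    g.add((v2, OWL.backwardCompatibleWith, v1))

    print(g.serialize(format="turtle"))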

5.4 Ontology mapping

There are very few real-life applications where only one stable vocabulary or ontology can be used. The big challenge in making the Semantic Web work is that different applications must be able to know when they are talking about the same things. As the number of ontologies that are made publicly available and accessible on the Web increases steadily, so does the need for applications to use them. A single ontology is no longer enough to support the tasks envisaged by a distributed environment like the Semantic Web. Multiple ontologies need to be accessed from several applications. Ontology mapping can provide a common layer through which several ontologies can be accessed and can hence exchange information in a semantically sound manner.
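As a concrete illustration, OWL and RDF Schema already provide simple alignment properties that can express such mappings. The sketch below, using the rdflib Python library, aligns two hypothetical ontologies; real mapping frameworks add much more machinery, so this only shows the basic idea.

    # Sketch: expressing mappings between two hypothetical ontologies with
    # OWL/RDFS alignment properties, using the rdflib Python library.
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDFS

    NEWS = Namespace("http://example.org/news-ontology#")
    ARCH = Namespace("http://example.org/archive-ontology#")

    mapping = Graph()
    # The two vocabularies use different names for the same concept.
    mapping.add((NEWS.Article, OWL.equivalentClass, ARCH.TextDocument))
    # A more specific property in one ontology maps onto a broader one.
    mapping.add((NEWS.byline, RDFS.subPropertyOf, ARCH.creator))

    print(mapping.serialize(format="turtle"))

An application that loads both data sets together with this mapping graph, and whose RDF store applies RDFS/OWL inference, can then answer queries phrased in either vocabulary.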


5.5 Ontology languages

5.5.1 Web Ontology Language (OWL)

The Web Ontology Language (OWL), designed by the W3C Web Ontology Working Group, is intended to provide a language that can be used to describe content classes and the relations between them. OWL is a language for defining and instantiating content ontologies. Ontology is a term borrowed from philosophy, where it refers to the science of describing the kinds of entities in the world and how they are related. In addition to content descriptions, ontologies aim to reflect rules and regulations using task models for social activities from the surrounding social world. This involves a multitude of multilingual and terminological aspects of ontology, metadata, and content standardisation. The engineering of ontologies includes, e.g., conceptual analysis, representation, modularisation and layering, reusability, evolution, and dynamics. For example, regulatory ontologies map property rights, persons and organisations, legal procedures, contracts, and legal causality, to name a few.

The OWL language
1. formalises a domain by defining classes and properties of those classes,
2. defines individuals and asserts properties about them, and
3. reasons about these classes and individuals to the degree permitted by the formal semantics of the OWL language.

Where earlier ontology languages, e.g. the DARPA Agent Markup Language combined with the Ontology Inference Layer (DAML+OIL), have been used to develop tools and ontologies for specific user communities, particularly in the sciences and in company-specific e-commerce applications, they were not defined to be compatible with the architecture of the World Wide Web in general, and the Semantic Web in particular. OWL uses URIs for naming and the description framework for the Web provided by RDF to add the following capabilities to ontologies:

• ability to be distributed across many systems,
• scalability to Web needs,
• compatibility with Web standards for accessibility and internationalisation, and
• openness and extensibility.

OWL builds on the Resource Description Framework (RDF) and RDF Schema and adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
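The three activities listed above can be made concrete with a small example. The sketch below, written in Python with the rdflib library, builds a fragment of a hypothetical media ontology; the actual reasoning step would be performed by a separate OWL inference engine.

    # Sketch: a fragment of a hypothetical media ontology in OWL, built
    # with the rdflib Python library.
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("http://example.org/media#")
    g = Graph()

    # 1. Formalise the domain: classes and a property between them.
    g.add((EX.MediaObject, RDF.type, OWL.Class))
    g.add((EX.Image, RDF.type, OWL.Class))
    g.add((EX.Image, RDFS.subClassOf, EX.MediaObject))
    g.add((EX.Text, RDF.type, OWL.Class))
    g.add((EX.Text, OWL.disjointWith, EX.Image))   # OWL-added vocabulary: disjointness
    g.add((EX.depicts, RDF.type, OWL.ObjectProperty))
    g.add((EX.depicts, RDFS.domain, EX.Image))

    # 2. Define an individual and assert properties about it.
    g.add((EX.img42, RDF.type, EX.Image))
    g.add((EX.img42, EX.depicts, EX.EiffelTower))

    # 3. Reasoning is delegated to an OWL inference engine: for example, a
    # resource typed as both EX.Text and EX.Image would be flagged as
    # inconsistent because of the disjointness axiom above.
    print(g.serialize(format="turtle"))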


6 Adding semantics to the content

6.1 Main principles

Metadata can be created manually, as a by-product of the content creation and utilisation processes, automatically, or as a combination of these alternatives. There is a large number of automatic methods, depending on the media type and on how the media content is being or has been created and utilised.

Manual metadata creation refers to creating explicit metadata by hand. This is often done by people who are responsible for archiving content. Content creators or providers are also often required to supply some metadata. Both these potential metadata creator groups have their strengths and weaknesses. The creators have good knowledge of what the content is about and what it is like, but they may not be very familiar with the purpose of the metadata or with how to utilise the metadata vocabulary correctly. The opposite is true of people who specialise in creating metadata. Even with metadata specialists it may be difficult to obtain consistent results over time when several people make the descriptions. Another option is to utilise active users for creating semantic metadata; media houses could consider this opportunity.

Semantic annotation refers to structuring and marking up the content semantically (e.g. with links to URIs). This allows semantic processing of the information. Often the term annotation refers to comments, notes, questions or other forms of external remarks by users, but we use the term in a wider sense that includes not only these personal notes but also other semantic metadata about the content. Semantic annotation can be produced manually by users, or it can be provided by semi-automatic or automatic indexing tools. Following the original vision of the Web as a space for collaboration and not just a one-way publishing medium, community annotation can be a powerful method for adding value to the content. The most critical issues in this kind of approach are the quality, accuracy and relevance of the annotations: how can it be ensured that an annotation is accurate and contains relevant information? Systems for managing the annotations are also needed: it is necessary to have metadata about the annotations in order to track and maintain them. Another requirement for an annotation system is extensibility; it should be possible to add new terms in a coherent and meaningful way. One problem is to ensure the consistency of the terms used for indexing metadata, which is why controlled vocabularies are often used. Because most semantic annotation tools are too complicated for anyone but IT experts, the new trend is to utilise grass-roots methods in which users define the tags themselves (as in Flickr).

Automatic methods are based on analysing the content and creating the metadata based on what can be inferred from the content. This has been applied most successfully to text, but methods are being developed to automatically analyse other media formats as well. These methods are needed in particular when metadata needs to be created for existing material, where the chance to utilise content creators and the production process to generate metadata has been lost.


If the metadata needs are taken into consideration when setting up the content creation and modification processes, a lot of metadata can be produced nearly automatically. It is possible to collect information such as who imported or created the media object and when, which actions have been taken on it, and where the object has been used. It is often possible to infer a lot of metadata based on where the object has been used. Here the challenge is to understand correctly which pieces of metadata are valuable; too much metadata also has a price.

McCalla (2004) argues that content usage is a reliable way to produce metadata. According to him, if we knew the profile of the person who has used and found some content useful, we could deduce a lot of valuable metadata about the resource object. McCalla's primary focus is on learning objects, where describing user features and linking them to content features is somewhat easier than for more general types of content.

The requirements on the amount and precision of the metadata depend on how the metadata is to be utilised. If the aim is to replace viewing the actual content, which would be the idea when the metadata is utilised to create ready-made presentations, the requirements are very high: the metadata must reflect the suitability of the content component correctly, so that relevant components are included and redundant, irrelevant and inappropriate ones are excluded (Haase, 2004).
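A sketch of such by-product metadata capture is given below, in Python; the workflow steps and field names are hypothetical illustrations rather than any particular system's schema.

    # Sketch: capturing metadata as a by-product of a content workflow.
    # The field names and workflow steps are hypothetical.
    import datetime
    import getpass
    import os

    def import_media_object(path):
        """Register a file and record process metadata as a side effect."""
        return {
            "file": os.path.basename(path),
            "imported_by": getpass.getuser(),                    # who
            "imported_at": datetime.datetime.now().isoformat(),  # when
            "actions": [],                                       # later edits and uses
        }

    def log_usage(record, action, context):
        """Each use of the object adds metadata that can later be mined."""
        record["actions"].append({"action": action, "context": context})

    obj = import_media_object("/archive/photos/IMG_0042.jpg")
    log_usage(obj, "published", "weekly newsletter, travel section")
    # Usage in the travel section lets us infer the object is travel-related.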

6.2 Automatic / semiautomatic methods

Because of the high cost and subjectivity associated with human-generated metadata, there is a need for technologies and tools that enable the automatic, or at least semiautomatic, classification and segmentation of different digital resources (text, images, audio and video). This means utilising natural language processing, image processing, speech recognition and video segmentation technologies. Hunter (2003) gives a good overview of automatic metadata extraction methods and related projects.

Natural language technologies make it possible to automatically analyse the semantic structure of textual documents. The most common method is to scan every word in a document and analyse the frequencies of patterns of words. Meaningful units can be extracted automatically (information extraction) and organised through clustering or classification (text mining). This is important both for knowledge mark-up and for ontology development. Another method of auto-categorisation is noun phrase extraction: the extracted list of noun phrases can be used to generate a catalogue of entities covered by the collection. Nevertheless, it should be remembered that the accuracy of automatic categorisation is not as good as that of human categorisation. One common solution is to use automatic categorisation or metadata extraction as a starting point. The result can then be shown to a human editor who checks the relevance of the automatically generated metadata and modifies it if needed. This is faster than completely manual metadata enrichment. Some automatic information extraction systems first require some manual annotation in order to train the system; after the training period, the system can be used to annotate the content automatically.
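The core of the frequency-based method is simple enough to sketch in a few lines of Python; real systems add stemming, noun-phrase detection and trained classifiers on top of this, so the snippet below only shows the basic idea.

    # Sketch: extracting candidate keywords as the most frequent
    # non-stopword terms in a document.
    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

    def candidate_keywords(text, n=5):
        words = re.findall(r"[a-zA-Z]+", text.lower())
        counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
        return [word for word, _ in counts.most_common(n)]

    doc = ("The ontology defines classes and properties. The ontology is "
           "used to annotate news content with semantic metadata.")
    print(candidate_keywords(doc))  # e.g. ['ontology', 'classes', ...]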


Image retrieval is mainly based on searching verbal image descriptions or on image content such as colour, texture, and simple shape properties. Automatic object recognition techniques are being developed and used in image indexing to recognise generic classes of objects and concepts (such as buildings, flowers, and animals). Another research topic is the automatic linguistic indexing of pictures. A framework for classifying images is presented in (Jaimes & Chang, 2000). It organises image descriptions into two main levels: the syntax/percept level and the semantic level. Syntax-level descriptions can be produced by looking at the image without specific knowledge. The semantic level requires knowing details about the image, such as where the image was taken and who the persons in it are. The abstract level is determined by cultural concepts and is open to different interpretations; for example, colours may have different symbolic or abstract meanings in different cultures. Automatic methods can only address the syntactic level.

Figure 8. Image description framework according to (Jaimes & Chang, 2000). The topmost levels describe features that may be produced automatically and/or without specific knowledge; as we proceed to the more specific and abstract levels, more information is needed and the description may also become culturally dependent.

Speech recognition systems can generate searchable text that is linked to timestamps on the recorded media, so users can both call up text and jump right to the correct position in the audio clip. The accuracy of speech recognition has been a problem, although the technology is developing rapidly.

The challenge for video indexing systems is to parse hours of video, segment it and add semantics in order to make videos easy to find and reuse. Several technologies have been used for video indexing. Embedded textual data like timestamps can be utilised, and scene change detection works especially well for programmes with a clear structure, such as news programmes. Speech recognition is also used to convert words into text, which can then be handled with natural language processing technologies.


6.3 Annotation tools

Semantic annotation tools and systems have emerged in recent years. A number of systems have been developed to demonstrate different capabilities. The approaches adopted by different systems do not necessarily compete against each other but rather address different issues and can be used for different purposes.

There are different annotation tools for different media types: text, video, image and audio. Most of the available annotation tools are prototypes, and most of them are still meant for annotating textual documents, although automatic tools for video and image content have also been developed. Annotation tools also differ in what kind of pages can be annotated: most of the tools are for static web pages, while one example of a solution for annotating dynamic web pages is the so-called deep annotation used in the CREAM framework and described later in this chapter. Tool support for different ontologies varies, as does integration with other tools and products. The level of automation ranges from manual to semi-automatic and automatic annotation; some solutions require an initial period during which the system is taught to make annotations, after which annotations can be made automatically. Annotation solutions also vary in whether the annotations are meant to be made by professionals or by ordinary users, and in whether they can be made collaboratively.

Shared annotations on the web offer an important means to support collaboration between co-workers, students or any other group wanting to write and read each other's comments in the context of Web documents. One example of a framework that supports collaborative annotation is Annotea. In Annotea, annotation means attaching users' comments, notes or explanations to web pages. Annotea supports annotation types such as advice, change, comment, example, explanation, question and SeeAlso. The annotations are modelled as RDF metadata, and XPointer and XPath are used to attach the annotations to the document. Annotations can be shared by storing them on annotation servers, and they can be queried using annotation-capable clients such as the Amaya editor/browser. Shared annotations can be used, for example, for basic collaboration, shared bookmarks or presenting the evaluation results of documents. The first client implementation of Annotea is W3C's Amaya editor/browser, a tool used to create and update documents directly on the Web. Annotea is based on a document-centric approach, where users browse documents and examine annotations related to them. Annotea is a powerful tool for joint authoring within a small group of collaborating agents sharing a common goal. However, it might be problematic to use it for annotation sharing in open user communities, where there is no guarantee that a freely articulated annotation conveys the same meaning to different users.
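To give an impression of what such an annotation looks like on the wire, the sketch below builds an Annotea-style annotation as RDF with the rdflib Python library. The property names follow the published Annotea schema as we understand it; the exact URIs and the XPointer expression should be treated as illustrative.

    # Sketch: an Annotea-style annotation as RDF, built with the rdflib
    # Python library. Treat the schema URIs and property names as
    # illustrative; consult the Annotea specification for the details.
    from rdflib import BNode, Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC, RDF

    ANN = Namespace("http://www.w3.org/2000/10/annotation-ns#")
    TYPES = Namespace("http://www.w3.org/2000/10/annotationType#")

    g = Graph()
    note = BNode()
    g.add((note, RDF.type, ANN.Annotation))
    g.add((note, RDF.type, TYPES.Comment))  # one of the Annotea annotation types
    g.add((note, ANN.annotates, URIRef("http://example.org/article/42")))
    g.add((note, ANN.context, Literal("xpointer(/html/body/p[3])")))  # attachment point
    g.add((note, ANN.body, Literal("This paragraph needs a source.")))
    g.add((note, DC.creator, Literal("reviewer1")))

    # The resulting graph would be posted to an annotation server and
    # fetched by annotation-capable clients such as Amaya.
    print(g.serialize(format="turtle"))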
CREAM is an annotation and content authoring framework meant for the easy creation of relational metadata (e.g. relations between instances). The difference between CREAM and Annotea is that in CREAM annotations are intended to be made while authoring the web page. In accordance with RDF principles, URIs are used as annotation values. S-CREAM extends the CREAM framework with an information extraction component for the semi-automatic generation of annotations. Automatic information extraction is supported by Amilcare, an adaptive information extraction system (IES). CREAM has evolved over several life-cycles and can nowadays be used for the annotation of static as well as dynamic pages. CREAM supports the manual and semi-automatic annotation of static web pages, the authoring of new web pages with simultaneous creation of metadata, and the deep annotation of web pages defined dynamically by database queries (Handschuh & Staab, 2003).

The idea of "deep annotation" is to create mapping rules between the database and the client ontology. For dynamic web pages (e.g. ones generated from a database containing a catalogue of books) it does not seem useful to annotate every single page. Rather, one wants to "annotate the database" in order to reuse it for one's own Semantic Web purposes. This solution requires that web sites are willing to describe the structure of their information in order to share information on the Semantic Web. If they do, a user may create mappings into his or her own information structures (e.g. an ontology). Deep annotation is defined as an annotation process that utilises the information proper, information structures and information context in order to derive mappings between information structures. The mappings may then be exploited by the same or another user to query the database underlying a web site and retrieve semantic data. (Handschuh & Staab, 2003)

The CREAM-based Ont-O-Mat/Annotizer is a tool integrating ontologies and information extraction tools. OntoMat-Annotizer supports the user in creating and maintaining ontology-based OWL markup, i.e. creating OWL instances, attributes and relationships. It includes an ontology browser for the exploration of the ontology and instances, and an HTML browser that displays the annotated parts of the text. It is Java-based and provides a plug-in interface for extensions. The intended user is the individual annotator, i.e. people who want to enrich their web pages with OWL metadata. Instead of manually annotating the page, OntoMat allows the annotator to highlight relevant parts of the web page and create new instances via drag-and-drop interactions. [http://annotation.semanticweb.org/Members/cobu/AnnotationTool.2004-07-28.1138]

MnM and Melita are examples of semi-automatic annotation tools. Both use Amilcare as their information extraction system. Amilcare learns information extraction rules from manually marked-up input. The process supported by these systems comprises several activities, including manually annotating web pages, training the IES using the annotated pages, tuning the performance of the trained system, and running the IES to automatically annotate a set of pages.

AeroDAML is a web service which automatically generates DAML annotations from a given web page using a given ontology with the help of WordNet. A demonstration of AeroSWARM, which works on the same principle but creates OWL markup from the submitted web page, is also available on the web [http://ubot.lockheedmartin.com/ubot/hotdaml/aeroswarm.html]. The service analyses the text of the document and extracts generic information such as people, places, organisations, times and nationalities, based on the selected ontology. The available ontologies are the SUMO ontology, OpenCyc and the AeroSWARM ontology.

Magpie and KIM are both plug-ins for Internet Explorer. KIM is a Knowledge and Information Management platform for automatic semantic annotation, web page indexing and retrieval. It offers a server, a web user interface, and an Internet Explorer plug-in. The user requests annotation from the browser plug-in, which highlights the entities in the current web page and generates a hyperlink used for further exploring the available knowledge about the entity.
The recognition of named entities is based on the upper-level KIM ontology (KIMO). KIM also performs ontology population. As a baseline, KIM analyses texts and recognises references to entities (such as persons, organisations, locations and dates). It then tries to match the reference with a known entity that has a unique URI and description in the knowledge base. Alternatively, a new URI and entity description are generated automatically. Finally, the reference in the document is annotated with the URI of the entity. [http://annotation.semanticweb.org/Members/cobu/annotationtool.2004-1004.2595486734]

Magpie is a semantic filter with which users can apply their own ontology in order to recognise and annotate content based on that ontology. Annotation is made automatically on the client side. The goal of Magpie is to support interpretation and information gathering.

Several solutions have been developed for the manual annotation of images. One example is a tool developed for semantic annotation and search in a collection of art images. The underlying ontologies are represented in RDF Schema and are based on existing thesauri such as the AAT (Art and Architecture Thesaurus), WordNet, and ICONCLASS (a classification system which provides a hierarchically organised set of concepts for describing the content of visual resources). An annotation template is linked to the ontologies, and the annotation tool generates a user interface for annotating and searching images. The annotations are stored in an RDF file. (Hollink et al., 2003)

One example of the automatic annotation of images is presented by Simone Santini of the University of California. The idea is based on two representations for images: a textual representation based on terms extracted from the page that links to the image, and a visual representation based on suitable image features. The images derive their meaning from both of these. Text features are extracted using standard information retrieval techniques, while visual features are derived from a spectral decomposition of the region adjacency graph of the image. For querying and navigating this kind of information, visual similarity, textual similarity and relations induced by links were used. (Handschuh & Staab, 2003)

A number of tools have been developed to enable semantic descriptions to be manually attached to video. Examples of these tools are Ricoh MovieTool and the IBM MPEG-7 Annotation Tool. A short comparison of different video annotation systems can be found in [http://metadata.net/filmed/pub/MMM04_FilmEd.pdf].

The IBM MPEG-7 Annotation Tool assists in annotating video sequences with MPEG-7 metadata. Each shot in the video sequence can be annotated with static scene descriptions, key object descriptions, event descriptions, and other lexicon sets. The annotated descriptions are associated with each video shot and are output and stored as MPEG-7 descriptions in an XML file. The tool can also open MPEG-7 files in order to display the annotations for the corresponding video sequence, and it allows customised lexicons to be created, saved, downloaded, and updated. [http://www.alphaworks.ibm.com/tech/videoannex]

Ricoh MovieTool is a tool for interactively creating video content descriptions that conform to MPEG-7 syntax. It is intended for use by researchers and designers of MPEG-7 applications.
By using MovieTool, the user can create the structure while watching the video. A major advantage of using MovieTool is that the user can quickly and easily see the correspondence between the MPEG-7 descriptions and the video structure of each scene. Explanatory descriptions can be added for each scene and become part of the MPEG-7 file. The MPEG-7 file can then be used, for example, to search for or jump directly to specific scenes. [http://www.ricoh.co.jp/src/multimedia/MovieTool/]

The FilmEd prototype application is an example of a collaborative video indexing, annotation and discussion tool. It enables real-time collaborative indexing, browsing, annotation and discussion of video content between multiple groups at remote locations. Annotations can be associated with segments, keyframes or still regions within frames. The annotation component of the software uses Annotea. The system was developed over the Australian GrangeNet broadband research network. (Schroeter, Hunter & Kosovic, 2004)

Further reading

http://annotation.semanticweb.org/
Annotea: http://www.w3.org/2001/Annotea/
Amaya (based on the Annotea framework): http://www.w3.org/Amaya/
S-CREAM: http://www.aifb.uni-karlsruhe.de/~sst/Research/Publications/ekaw2002scream-sub.pdf
OntoMat-Annotizer: http://annotation.semanticweb.org/ontomat/index.html
MnM: http://kmi.open.ac.uk/projects/akt/MnM/index.html
Melita: http://nlp.shef.ac.uk/melita/
KIM Platform, semantic annotation: http://www.ontotext.com/kim/semanticannotation.html
Magpie: http://kmi.open.ac.uk/projects/magpie/main.html
AeroDAML: Paul Kogut and William Holmes, AeroDAML: Applying Information Extraction to Generate DAML Annotations from Web Pages [http://ubot.lockheedmartin.com/ubot/papers/publication/AeroDAML2.pdf], First International Conference on Knowledge Capture (K-CAP 2001) Workshop on Knowledge Markup and Semantic Annotation, Victoria, B.C., October 21, 2001


7 Semantic information retrieval

7.1 Querying

Once semantic systems are implemented, the ability to find relevant information based on these semantics becomes crucial. End-users need to be provided with user interfaces which support them in this information retrieval task. These user interfaces typically take usability aspects into account while hiding the technical implementation details. Those involved in creating the user interfaces, or the search systems in general, and those who implement the semantic repositories are faced with the same question: what technologies should be used to expose the semantics to other software systems and to the developers building an end-user application?

Semantic systems based on Semantic Web technologies consider the RDF data model the sound basis for encoding metadata. However, the market is fragmented when it comes to exposing this metadata for querying. In fact, many RDF repository vendors have developed different query languages for the very same purpose: expressing a query based on the RDF data model using a declarative syntax, so that access to the repositories is universal and not bound to one single programming paradigm such as the Java programming language.

The World Wide Web Consortium (W3C) is addressing this market fragmentation by working to create an open standard for querying RDF repositories. This upcoming technology is called SPARQL (pronounced 'sparkle') and is analogous to SQL in many ways. Both relational algebra and RDF have a comprehensive body of academic research to back up their use. Both relational databases and RDF repositories have had a history of different query languages and access layers. RDF repositories are now in the process of getting a declarative query language to make it easier for developers to build RDF-based applications.
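The sketch below shows what such a declarative query looks like in practice, using the rdflib Python library's query support over a small hypothetical graph. Since SPARQL is still being defined at the time of writing, the exact syntax may change.

    # Sketch: a SPARQL query over a small RDF graph, using the rdflib
    # Python library. The data is hypothetical, and SPARQL is still a
    # working draft, so details may change.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import DC

    EX = Namespace("http://example.org/news#")
    g = Graph()
    g.add((EX.item1, DC.creator, Literal("John Adams")))
    g.add((EX.item1, DC.title, Literal("Harbour renovation begins")))
    g.add((EX.item2, DC.creator, Literal("Jane Doe")))

    query = """
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?item ?title
    WHERE {
        ?item dc:creator "John Adams" .
        ?item dc:title ?title .
    }
    """
    for item, title in g.query(query):
        print(item, title)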

7.2 Reasoning

Despite the analogy between querying RDF repositories and querying relational databases, there are differences which add value to the way information is processed today. Reasoning, or inferencing, in general the ability to deduce implicit knowledge from explicit knowledge, is something RDF repositories are capable of by nature. This capability means more dynamic and adaptive information handling, where information encoded yesterday may no longer need migration in order to be used by another application; the ability to treat the concept 'composer' as the concept 'creator' may be as simple as updating the set of rules used by the RDF repository.

7.2.1 Rule languages

The set of rules used by an RDF repository is something that can change over time. Traditionally, the information model (or a schema in a relational database) has first been modelled and an application then built using this information model.


Instance data and the view an application sees over this data can be decoupled with Semantic Web technologies. An application may have been programmed to look for a 'creator' by the name 'John Adams'. As new concepts such as 'composer' can be introduced to the system via the rules, the application may start returning audio files composed by 'John Adams' without migration, development or testing work on the existing application.

The rule languages are also being standardised. The Web Ontology Language (OWL) is a W3C technology that provides three layers of expressivity for expressing constraints and dependencies between concepts (such as 'creator' and 'composer'). OWL has a formal background that enables different software vendors to provide inference engines which all give the same answer for the same set of instance data and the same ontology definition encoded in an OWL document.

The Semantic Web Rule Language (SWRL) is a candidate technology for expressing user-defined rules, such as stating that if a document has two creators, those two creators are friends with each other. Such inferencing of implicit knowledge (friend) from explicit knowledge (creator) enables further intelligence to be added to RDF repositories.
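The 'composer'-as-'creator' example can be reduced to a single RDFS statement plus a closure rule. Production systems delegate this to the repository's inference engine; the hand-written closure in the Python sketch below (using the rdflib library, with hypothetical URIs) only illustrates the principle.

    # Sketch: inferring implicit 'creator' triples from explicit 'composer'
    # triples via rdfs:subPropertyOf, using the rdflib Python library.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDFS

    EX = Namespace("http://example.org/music#")
    g = Graph()
    # The rule: composer is a kind of creator.
    g.add((EX.composer, RDFS.subPropertyOf, EX.creator))
    # Explicit knowledge: a work and its composer.
    g.add((EX.opus1, EX.composer, Literal("John Adams")))

    # Apply the subPropertyOf rule until no new triples appear.
    changed = True
    while changed:
        changed = False
        for sub, _, sup in list(g.triples((None, RDFS.subPropertyOf, None))):
            for s, _, o in list(g.triples((None, sub, None))):
                if (s, sup, o) not in g:
                    g.add((s, sup, o))
                    changed = True

    # An application asking only for ex:creator now also finds the composer.
    print((EX.opus1, EX.creator, Literal("John Adams")) in g)  # True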


8 Service-Oriented Computing and Web Services

The Semantic Web is part of the current effort in academia and industry aimed at transforming the World Wide Web from a network of computers, i.e. loggable network addresses, that makes information available and facilitates transactions, toward a ubiquitous environment that provides resources and services. This new kind of approach seeks to facilitate the emergence of dynamic virtual organisations and communities, and other collaborative working co-operatives utilising service-oriented information systems over the Internet. This so-called service-oriented computing is an emerging cross-disciplinary paradigm that has risen to offer solutions to various challenges in distributed computing, typically on large-scale computer networks such as the Internet. It is drawing growing attention thanks to the popularity of peer-to-peer networks, which keep finding new application areas.

The service-oriented system architecture changes the way software applications are designed, delivered, and consumed. Services offer novel, often autonomous, platform-independent computational elements that can be described, published, and discovered using standard protocols. They offer means to build networks of collaborating applications distributed within and across organisational boundaries, or closer to consumers.

Web Services are currently the most promising technology based on the idea of service-oriented computing. They provide the basis for the development and execution of business processes that are distributed over the network, even on uncontrolled computers. Web Services have the potential to offer interoperability across platforms and systems. They are neutral to built-in description languages, utilising standardised messages over system interfaces. This makes them suitable for accessing various connecting systems over heterogeneous environments. Basically, Web Services technology aims at providing standard mechanisms for describing the interface and the services available, as well as protocols for locating such services and invoking them. The technology consists of a set of formal descriptions and language standards, e.g. the Web Services Description Language (WSDL), Universal Description, Discovery and Integration (UDDI), and the Simple Object Access Protocol (SOAP).

As with all emerging technologies, developments are taking Web Services in various directions. On one hand, new definitions are on the way to support the specification of more complex services out of simpler ones, e.g. so-called Web Service orchestration and choreography; this includes proposals like BPML, XLANG and BizTalk, WSFL, WS-BPEL, and WS-CDL. The second approach consists of the design of a new sort of meta-Web Services that can be exploited at run-time by other Web Services, e.g. managing the cooperation of Web Services or acting as dynamic registry services.

Combined with recent developments in such areas as the Semantic Web, agent technology, business languages, and formal methods for representing and analysing the communication behaviour of concurrent and distributed systems, services can provide the automated network-based support needed for e-business collaboration and integration at the data, semantic, and business logic levels. Even if applications for such a network may be complex, distributed logic could also offer new ways of implementing novel trust and verification systems without the vulnerability of a single point of failure.


However, service-oriented computing poses a number of research challenges that are fundamentally cross-disciplinary and transversal to more established research fields. From the viewpoint of computer science, research challenges include, e.g., the composition, discovery, integration, and monitoring of services, their quality and security, methodologies supporting their development, evolution, and adaptation, as well as their lifecycle management. Moreover, distributed service-based systems call for innovations in user interface design, data verification, trust management, business and pricing models, digital rights management, and data structures and ontologies, for example. The area is attracting the interest of researchers in very diverse communities, such as database software and artificial intelligence engineers, IPR lawyers, economists, user experience researchers, and sociologists, to name a few. No doubt, breakthroughs will involve crossing the boundaries of the existing communities and attracting top-level scientific contributions from each scientific community.

Fundamental steps toward truly service-oriented computing include reasoning about Web Service semantics, i.e. the behaviour and equivalence of a service. This means realising registry services where retrieval is based on the meaning of a service and not just a Web Service name. From a different viewpoint, the emergence of virtual organisations and communities also calls for novel methods for legally conducting transactions between parties. This means the inclusion of digital licenses and contracts between net parties, a system feature often underestimated by otherwise competent system engineers. In many cases, contracts are still treated as external legal documents, not related to user rights or to the information system. Mechanisms for distributed e-business may lack both support for user rights and licenses and a system to manage these cross-organisational collaboration or consumer contracts.

The very first generations of peer-to-peer networks are notorious for lacking many important features that need to be solved before the popular file sharing networks can really be harnessed and leveraged for business. Furthermore, they are merely the first stepping-stones on the way to the so-called third wave in computing. The dawn of computing was the era of massive mainframes, each shared by many people, followed by the modern personal computing era with its close one-person-to-one-machine interaction. The era of ubiquitous service-oriented computing seeks to push technology into the background of our lives and release the power of many networked computers to the reach of one mobile user. The modern file sharing programs actually share only the first of the many resources of a computer, the hard drive. Developers and researchers are already working on the idea of a so-called consumer grid enabling truly distributed computing over the public Internet. In this future scenario, a user would not need to know any computer or net address to log onto, but would simply ask for a service or media through advanced user interfaces that are as readily available as electricity. This paradigm shift is from a computer network to a service network.


9 Metadata vocabularies and taxonomies

This chapter gives an overview of the vocabularies and taxonomies that are most relevant to the demonstrations being built in the RISE project. Table 1 lists the chosen vocabularies and explains why they were included.

Table 1. Brief description of the vocabularies that are presented in this report.

Dublin Core
Dublin Core is a metadata vocabulary for describing web resources. It has its origin in the library community.
Status: Widely used on the Internet.
Relevance to the RISE project: This is a widely used, general-purpose metadata vocabulary that needs to be taken into account in any general-purpose web application.

IPTC
The International Press Telecommunications Council (IPTC) is a consortium of the world's major news agencies and news industry vendors. It develops and maintains technical standards for improved news exchange that are used by the major news organisations of the world.
Status: There are several standards with different levels of market acceptance. Some new metadata codes have been introduced recently.
Relevance to the RISE project: The VTT demonstration has news-related material.

PRISM
The Publishing Requirements for Industry Standard Metadata (PRISM) has been created to support the reuse of content, such as magazine articles.
Status: It has been difficult to get information on how much the PRISM specification is used, which can be regarded as an indication that it is not very widely used. The group that develops it consists of American companies.
Relevance to the RISE project: The aim of the PRISM vocabulary is relevant for the VTT demonstration.

Book metadata
Book metadata has mostly been created for libraries and book shops. The more widespread use of online shopping has increased the importance of book metadata.
Status: Libraries use different vocabularies, such as the Dewey Decimal Classification and the Universal Decimal Classification. ONIX is gaining importance in online selling; additional vocabularies are needed for describing the subjects.
Relevance to the RISE project: The VTT demonstration includes material from books, so metadata vocabularies used in describing books are relevant.

Learning Object Metadata (LOM)
LOM is a standard defined by the IEEE Learning Technology Standards Committee. This vocabulary has been defined for describing any kind of learning materials.
Status: This is the best-known vocabulary for describing learning materials, and it is also fairly widely used.
Relevance to the RISE project: The VTT demonstration also includes the learning aspect.

Yleinen Suomalainen Asiasanasto (YSA)
Status: Many applications utilise this vocabulary for metadata descriptions, and it is the largest Finnish thesaurus available on the net.
Relevance to the RISE project: The VTT application needs terminology to describe content, and YSA is a relevant alternative there.

Open Directory Project (ODP)
The Open Directory Project (ODP), or dmoz.org, is the most comprehensive human-edited directory on the web.
Status: Widely used on the Internet as a "de facto open ontology" for classifying Web site content; for example, the Google directory uses dmoz. A mature resource that has been constantly updated for more than ten years. Translated into many languages, but still mostly in academic use.
Relevance to the RISE project: An example of the open collaborative work that is currently being done on the web.

WordNet
WordNet, an electronic lexical database, is considered to be the most important resource available to researchers in computational linguistics, text analysis, and many related areas.
Status: This is an extensive database, which is available for free. Many relevant research projects utilise it. Unfortunately it has not been translated into Finnish.

9.1 Dublin Core (DC)

The Dublin Core Metadata Initiative established a set of metadata to describe electronic resources in a manner similar to a library card catalogue. The Dublin Core includes 15 elements designed to characterise resources (http://dublincore.org/documents/dcmi-terms/, http://dublincore.org/documents/dces/):

1. Title
2. Creator
3. Subject and Keywords
4. Description
5. Publisher
6. Contributor
7. Date
8. Resource Type
9. Format
10. Resource Identifier
11. Source
12. Language
13. Relation
14. Coverage
15. Rights

The metadata description may be made more generally applicable by utilising different controlled vocabularies to describe what the content is about. The Nordic Metadata Project has created a web-based tool for creating Dublin Core records (see the Finnish version at
http://kkweb.lib.helsinki.fi/cgi-bin/dc.pl). That page also includes links to some controlled vocabularies such as the Finnish YSA (Yleinen suomalainen asiasanasto).

The Dublin Core Metadata Initiative also maintains additional terms, including elements, element refinements, encoding schemes, and vocabulary terms (the DCMI Type Vocabulary) [http://dublincore.org/documents/dcmi-terms/]. The DCMI Type Vocabulary provides a general, cross-domain list of approved terms that may be used as values for the Resource Type element to identify the genre of a resource [http://dublincore.org/documents/dcmi-type-vocabulary/]. The type terms are:

• collection,
• dataset,
• event,
• image,
• interactive resource,
• moving image,
• physical object,
• service,
• software,
• sound,
• still image, and
• text.

This is a very general classification, but worth paying attention to, because this information may be available for a large number of resources that have been described with this vocabulary.
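A Dublin Core description is straightforward to produce as RDF. The sketch below, using the rdflib Python library, describes a hypothetical web resource with a few of the 15 elements listed above.

    # Sketch: a Dublin Core description of a hypothetical web resource,
    # built with the rdflib Python library.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DC

    doc = URIRef("http://example.org/reports/technology-report")
    g = Graph()
    g.add((doc, DC.title, Literal("Technology Report")))
    g.add((doc, DC.creator, Literal("VTT Information Technology")))
    g.add((doc, DC.subject, Literal("semantic web; metadata")))  # e.g. YSA terms
    g.add((doc, DC.date, Literal("2005-08-09")))
    g.add((doc, DC.type, Literal("Text")))      # a DCMI Type Vocabulary term
    g.add((doc, DC.language, Literal("en")))

    print(g.serialize(format="turtle"))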

9.2 IPTC

The International Press Telecommunications Council (IPTC) is a consortium of the world's major news agencies and news industry vendors. It develops and maintains technical standards for improved news exchange that are used by the major news organisations of the world. The IPTC provides news exchange formats to the news industry (NewsML and NITF) and has for many years created and maintained sets of topics to be assigned as metadata values to news objects such as text, photographs, graphics, audio and video files and streams. This allows for a consistent coding of news metadata over the course of time. The metadata vocabularies relating to news are called IPTC NewsCodes. The main characteristics of the IPTC standards are the following:

• NewsML is a method for packaging, relating, and managing diverse pieces of media. It offers a universal metadata approach to all sorts of specialised content formats.
• NITF is a specialised format for marking up news stories. It helps a publisher to differentiate a headline from a byline or paragraph, and also helps a publisher mark up inline text entities, such as organisations and people.
• SportsML is a specialised format for sports scores, schedules, standings, and statistics.
• ProgramGuideML is a specialised format for listings for programme guides on television and radio.
• Metadata vocabularies are now called NewsCodes. They are controlled vocabularies of terms of significance to publishers. They include a taxonomy of subject codes, listings of roles and genres of news components, and ratings for relevance, priority, urgency, and other characteristics.

IPTC NewsCodes and NewsML are presented briefly in the next two chapters.

9.2.1 IPTC NewsCodes

The universe of NewsCodes is currently split into 28 individual sets for increased manageability, as topics usually relate to a specific area and are likely to be used exclusively in a specific metadata element of a news exchange format (see Appendix 1). The earlier vocabularies for genre, media type, news item type, subject code and subject qualifier are now included in these NewsCodes. The subject codes and genre definitions are currently (2005) being localised to Finland, as is the scene vocabulary that describes photographs. Potentially useful new metadata sets are confidence, which could be used to mark the reliability of the content, and role, which indicates the position or importance of a single item in a collection of news items.

The scene vocabulary is used to classify images. The vocabulary items may be grouped into six main categories according to what they describe:

1. person related (9 terms),
2. scenery related (9 terms),
3. activity related (3 terms; these also typically include persons, so they could also be regarded as person related),
4. symbolic (1 term),
5. unusual or amusing images (1 term) and
6. images from making a movie or TV programme (1 term).

In Finland, the adoption of the following classification for news-related images is under discussion:

• news related images (uutiskuvat)
• subject matter images (aihekuvat)
• person related images (henkilökuvat) and
• symbol images.

9.2.2 IPTC NewsML and NITF

The News Markup Language (NewsML) and the News Industry Text Format (NITF) are the most important new content-related vocabularies. NewsML is designed to provide a media-independent, structural framework for multimedia news, and defines a way to group related news components into a package. The News Industry Text Format (NITF) defines a vocabulary for tagging the actual news story. It also includes some metadata, which overlaps with NewsML, which includes many metadata elements.

VTT Information Technology

35

Modified on 19.08.05

RISE

Technology Report

NewsML is meant to support managing news throughout its whole life cycle:

• in and between editorial systems,
• between news agencies and their customers,
• between publishers and news aggregators, and
• between news service providers and end users.

At the heart of NewsML is the concept of a news item, which can contain different media (text, photos, graphics, video) together with all the meta-information that enables the recipient to understand the relationships between components and the roles of each component. Everything the recipient might need to know about the content of the news provided can be included in NewsML's structure. For example, NewsML enables publishers to provide the same text in different languages, a video clip in different formats, or different resolutions of the same photograph. NewsML's rich metadata concept can help with things like revision levels that make it easy to track the evolution of a NewsItem over time, status details (publishable, embargoed, etc.) and administrative details, such as acknowledgements or copyright details.

NewsML has default metadata vocabularies to ease implementations, but it does not dictate which metadata vocabulary is to be used (IPTC Subject Codes, ISO country codes etc.); providers just have to indicate which vocabulary they are using. Multiple vocabularies can be utilised within the same NewsItem. For text objects in a NewsItem, the IPTC's News Industry Text Format (NITF) is recommended.
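As an illustration of NITF's separation of headline, byline and body text, the Python sketch below generates the skeleton of an NITF-tagged story with the standard library's xml.etree.ElementTree. The element names follow the commonly documented NITF tag set, but the result is a simplified illustration, not a complete, valid NITF document.

    # Sketch: the skeleton of an NITF-tagged news story, generated with
    # Python's xml.etree.ElementTree. Simplified for illustration; real
    # NITF documents carry a DTD reference and many more attributes.
    import xml.etree.ElementTree as ET

    nitf = ET.Element("nitf")
    head = ET.SubElement(nitf, "head")
    ET.SubElement(head, "title").text = "Harbour renovation begins"

    body = ET.SubElement(nitf, "body")
    body_head = ET.SubElement(body, "body.head")
    hedline = ET.SubElement(body_head, "hedline")
    ET.SubElement(hedline, "hl1").text = "Harbour renovation begins"  # headline
    ET.SubElement(body_head, "byline").text = "By Jane Doe"           # byline

    content = ET.SubElement(body, "body.content")
    ET.SubElement(content, "p").text = "The city council approved the plan."

    print(ET.tostring(nitf, encoding="unicode"))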

Figure 9. A news item in a NewsML package may consist of alternative versions for different channels, and there may be complementary elements, such as images and graphs.


This model permits including different versions of one news item in one package. The possibilities for managing different versions, such as translations, are being improved in the coming version. The initial version of NewsML was approved in October 2000. Since then it has gone through minor revisions: version 1.1 was approved in October 2002 and version 1.2 in October 2003. The IPTC is currently working on a new version of NewsML (2.0). The IPTC web site has a list of companies that have reported to the IPTC that they use NewsML. The list currently (February 2005) includes 14 news content providers and 8 system suppliers.

9.3 PRISM

9.3.1 Application area

The Publishing Requirements for Industry Standard Metadata (PRISM) specification defines an XML metadata vocabulary for managing, aggregating, post-processing and multi-purposing magazine, news, catalogue, book, and mainstream journal content. PRISM recommends the use of certain existing standards, such as XML, RDF, the Dublin Core, and various ISO specifications for locations, languages, and date/time formats. In addition, PRISM provides a framework for the interchange and preservation of content and metadata, a collection of elements to describe that content, and a set of controlled vocabularies listing the values for those elements. PRISM focuses on metadata for:

• general-purpose description of resources as a whole,
• specification of a resource's relationships to other resources,
• definition of intellectual property rights and permissions, and
• expressing inline metadata (that is, markup within the resource itself).

Today PRISM consists of two specifications. The PRISM Specification itself defines the overall PRISM framework. A second specification, the PRISM Aggregator DTD, is a new standard format for publishers to use for delivering content to web sites and to aggregators and syndicators. It is an XML DTD that provides a simple, flexible model for transmitting content and PRISM metadata.

The PRISM Working Group was established in 1999 by a group of American companies primarily involved in the production of serial and web-based editorial content. This group includes publishers, other rights holders, systems integrators, software developers and content aggregators who face common content application challenges, such as reuse of content in multiple media types, rights and contract management, better access to content archives, and faster, less expensive exchange and integration of disparate sets of content across the enterprise and with outside business partners. The representatives of these companies believe that developing and adopting a standard set of XML metadata will assist them in managing and automating their labour-intensive content workflow processes. The result of this collaboration is the PRISM specification.


The Working Group released Version 1.0 of the PRISM specification in April 2001. Version 1.1 was released a year later. As of November 2004, the released version of the PRISM specification is 1.2. The PRISM specification is built on existing standards such as XML, RDF, the Dublin Core, and various ISO specifications for locations, languages, and date/time formats. On top of this base, it defines a small number of XML namespaces and controlled vocabularies in order to meet the goals of interoperability, interchange, and reuse.

Figure 10. Overview of PRISM 1.2 metadata elements (http://www.prismstandard.org/resources/PRISM%20and%20PAM.pdf).

As can be seen, the vocabulary (the pcv elements, PRISM Controlled Vocabularies) also includes the means to define a vocabulary or a taxonomy. An RSS (RDF Site Summary) 1.0 module for PRISM 1.2 has been developed by Nature Publishing Group [http://npg.nature.com], and it can be found at [http://www.prismstandard.org/resources/mod_prism.html]. It augments the RSS core and Dublin Core modules with channel- and item-level elements taken from the basic PRISM vocabulary. An EU project called NEWS is currently developing a core NEWS ontology that utilises the taxonomies and terms included in the IPTC and PRISM vocabularies (see Chapter 12.2 for more information).
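As a concrete illustration, the sketch below builds an item description that mixes Dublin Core and PRISM elements, in the spirit of the mod_prism RSS module. The namespace URIs and element choices are our reading of PRISM 1.2 and should be checked against the specification.

import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
PRISM = "http://prismstandard.org/namespaces/basic/1.2/"  # assumed PRISM 1.2 basic namespace
ET.register_namespace("dc", DC)
ET.register_namespace("prism", PRISM)

# An item carrying both general-purpose (DC) and magazine-specific (PRISM) metadata.
item = ET.Element("item")
ET.SubElement(item, f"{{{DC}}}title").text = "Example article"
ET.SubElement(item, f"{{{DC}}}creator").text = "A. Writer"
ET.SubElement(item, f"{{{PRISM}}}publicationName").text = "Example Magazine"
ET.SubElement(item, f"{{{PRISM}}}coverDate").text = "2005-05-01"
ET.SubElement(item, f"{{{PRISM}}}section").text = "Technology"

print(ET.tostring(item, encoding="unicode"))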


9.3.2 Relations between PRISM and other specifications

The PRISM specification says the following about its relation to NewsML: "NewsML [IPTC-NEWSML] is a specification from the International Press Telecommunications Council (IPTC) aimed at the transmission of news stories and the automation of newswire services. PRISM focuses on describing content and how it may be reused." [http://www.prismstandard.org/specifications/Prism1%5B1%5D.2.pdf]

The same document says that PRISM is mostly complementary to the two syndication specifications RSS and ICE. The PRISM Rights Language (PRL, see section 5.4) is the part of the PRISM specification that is closest to the eXtensible Rights Markup Language (XrML). However, the two have different goals. PRL assumes that the sender and receiver of a PRISM communication already have a business arrangement that is specified in a contract. PRISM’s focus is on lowering the costs of complying with that agreement. Thus, it provides a standard means of expressing common terms and conditions. PRISM specifies as little as possible about the internal behaviour of systems.

XTM is an XML representation of ISO Topic Maps [ISO-13250], an approach for representing topics, their occurrences in documents, and the associations between topics. This is very similar to PRISM’s use of controlled vocabularies. XTM documents require that topics use a URI as a unique identifier. PRISM descriptions can directly cite XTM topics when there is a need to use them where PRISM allows values from controlled vocabularies. There is also a simple mapping between the XTM format and the PRISM group’s simple XML format for controlled vocabularies.

9.4 Book metadata

There is a long history of vocabularies and classifications relating to books and their topics. The Universal Decimal Classification (UDC) is a system of library classification developed by the Belgian bibliographers Paul Otlet and Henri la Fontaine at the end of the 19th century. It is based on the Dewey Decimal Classification, but is more powerful: it uses auxiliary signs to indicate various special aspects of a subject and relationships between subjects. UDC has been modified and extended over many years to cope with the increasing output in all disciplines of human knowledge, and is still under continuous review to take account of new developments. [http://www.udcc.org/about.htm]. In Finland, this classification is utilised in scientific libraries. There is another related classification scheme for general libraries, which is called in Finnish Yleisten kirjastojen luokitusjärjestelmä (http://ykl.kirjastot.fi/). Some general libraries, for example the Helsinki City Library, have developed their own versions of it. This general classification scheme is also utilised in marketing books: Kirjavälitys, a company specialising in book logistics, gathers information about books on sale in Finland, and there this general classification scheme is utilised [http://www.kirjavalitys.fi/Uudet_sivut/Tietopalvelut/uutuustietolomake.HTML]. Additional information about the contents can be given with the help of keywords.

It has become important to provide good descriptions of books as more and more books are being sold and bought online. ONIX (ONline Information eXchange) is a vocabulary for describing book related information. It is currently being developed further to make it possible to describe other content types that can be sold in a similar manner. A subject description must be included in the ONIX message (MainSubject). Here, national vocabularies are being used: most US retailers and wholesalers require a BISAC subject heading, and for UK users a BIC subject heading is mandatory in order to meet BIC Basic criteria. Additional subject fields may be added, and there other identification schemes may be used. (ONIX for Books, Product Information Message. Product Record Format. Release 2.1, revision 02 July 2004. Documentation revised February 2005. 182 p. Editeur)
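The sketch below outlines the subject part of an ONIX-style product record: a mandatory main subject plus an optional additional subject. The element names follow ONIX 2.1 reference tags as we understand them; the scheme identifier codes are placeholders, not verified ONIX code-list values.

import xml.etree.ElementTree as ET

product = ET.Element("Product")
ET.SubElement(product, "RecordReference").text = "example-book-001"

# Mandatory main subject; the scheme identifier names BISAC, BIC etc.
main = ET.SubElement(product, "MainSubject")
ET.SubElement(main, "MainSubjectSchemeIdentifier").text = "XX"  # placeholder scheme code
ET.SubElement(main, "SubjectCode").text = "FIC000000"  # placeholder BISAC-style code

# Optional additional subject, here free keywords from another scheme.
extra = ET.SubElement(product, "Subject")
ET.SubElement(extra, "SubjectSchemeIdentifier").text = "YY"  # placeholder scheme code
ET.SubElement(extra, "SubjectHeadingText").text = "semantic web; metadata"

print(ET.tostring(product, encoding="unicode"))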

9.5 Learning Object Metadata (LOM)

Learning Object Metadata (LOM) is a standard defined by the IEEE Learning Technology Standards Committee. This vocabulary consists of nine metadata element categories:

The General category groups general information that describes the learning object (LO) as a whole. This category includes elements that indicate an identifier for the learning object, its title, the language it uses to communicate to the end user, a textual description, keywords, etc.

The Lifecycle category comprises the features related to the history and current state of the LO. It includes information on the status and version of the LO, as well as on contributions of individuals and organisations.

The Meta-Metadata category groups information about the descriptive metadata itself. This category mirrors the lifecycle one in the context of the metadata: for instance, the origin of the metadata description, as well as its potential validator, etc. can be identified.

The Technical category specifies the technical requirements and characteristics of the learning object. This includes data elements that cover its format, size, location, as well as technical requirements for using the LO.

The Educational category is devoted to the educational and pedagogical characteristics of the learning object. These data elements indicate:
• interactivity type, i.e. whether the LO is more suited for active or expository learning
• resource type, like for instance exercise, simulation, questionnaire, etc.
• interactivity level (on a scale from low to high)
• semantic density
• intended end user role (teacher, author, learner or manager)
• context (school, higher education, training or other)
• typical age range
• difficulty level (again on a scale from low to high)
• a description of how the learning object is to be used in education or training
• the language of the intended end user (which may be different from the language of the learning object itself, for instance in the context of language learning).

The Rights category groups the intellectual property rights and conditions of use for the learning object. This includes information on whether or not any costs are involved with the use of the learning object, and whether or not any copyright restrictions apply.

The Relation category combines features that define the relationship between the learning object and other related learning objects. This category includes information on the nature of the relationship (e.g. ‘is based on’, ‘is part of’, etc.).

The Annotation category provides comments on the educational use of the learning object and provides information on when and by whom the comments were created.


The Classification category describes the LO in relation to a particular classification system.

This vocabulary is fairly widely used, but it is also often criticised. For example, many of the fields in the Educational category are difficult to define objectively and consistently. The vocabulary also gives little support to describing the actual content, so additional vocabularies need to be used for that purpose. There is also little structure-related metadata: the emphasis is on describing existing learning material units, not the components, in a way that would help in putting learning objects together into usable aggregations. Much of the metadata in the LOM vocabulary is similar to what is also included in other metadata vocabularies, with the exception of the educational category.

An RDF Site Summary 1.0 module (RDF Site Summary 1.0 Modules: Learning Object Metadata, [http://www.downes.ca/xml/RSS_LOM.htm]) has been defined out of this vocabulary in a similar way as for PRISM. Also an ontological presentation has been made [http://web.syr.edu/%7Ejqin/LO/LOM_html/index.html].
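A minimal sketch of a LOM description as a nested structure, showing a few of the nine categories; the field names mirror the LOM data elements informally and this is not the normative IEEE XML binding.

# Illustrative LOM-style record as a nested Python structure.
lom_record = {
    "general": {
        "title": "Photosynthesis explained",
        "language": "en",
        "keyword": ["biology", "plants"],
    },
    "educational": {
        "interactivityType": "expository",
        "learningResourceType": "simulation",
        "intendedEndUserRole": "learner",
        "context": "school",
        "typicalAgeRange": "12-15",
        "difficulty": "medium",
    },
    "rights": {"cost": "no", "copyrightAndOtherRestrictions": "yes"},
    "relation": [{"kind": "is part of", "resource": "biology-course-1"}],
}
print(lom_record["educational"]["intendedEndUserRole"])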

9.6 YSA (Yleinen Suomalainen Asiasanasto)

YSA (Yleinen Suomalainen Asiasanasto, Finnish General Thesaurus, http://vesa.lib.helsinki.fi/ysa/index.html) is a Finnish-language general-purpose thesaurus. The thesaurus is maintained by the National Bibliographic Services in Finland. YSA has been used for indexing Finnish publications since 1987, mainly for the purposes of libraries, museums, and archives of various kinds, both public and industrial. It contains some 23,000 general terms, which are divided into 61 domain groups, such as Physics, History etc.

YSO (Yleinen Suomalainen Ontologia, Finnish General Ontology) is a national upper ontology that is being created to conform to the indexing practices of various content providers. The work concentrates on enriching the semantic information of YSA and on providing better disambiguation of the concepts and terms in the thesaurus.

9.7 Open Directory Project (i.e. dmoz.org)

The Open Directory Project (ODP), also known as dmoz.org, is the most comprehensive human-edited directory on the web. It is maintained by a global community of volunteer editors. ODP uses a hierarchical ontology scheme, i.e. a taxonomy, for organising site listings. The goal in developing a category structure is to create a system that allows people to easily find material. Listings on a similar topic are grouped into categories, which can then include smaller categories. The ODP follows a peer-review process, so no individual editor owns his or her category. In May 2005, there were more than 590,000 categories in the multi-lingual ODP ontology, the ODP database had over 4 million sites listed, and it was maintained by over 68,000 human editors. The Open Directory data is made available for free to anyone who agrees to comply with the free use license. It is also used in many commercial services, such as Google and Hotbot, for site category information.

Further reading
Open Directory Editing Guidelines: http://dmoz.org/guidelines/subcategories.html

9.8 WordNet

WordNet, an electronic lexical database, is widely considered the most important resource available to researchers in computational linguistics, text analysis, and many related areas. Its design is inspired by current psycholinguistic and computational theories of human lexical memory. English nouns, verbs, adjectives, and adverbs are organised into synonym sets, each representing one underlying lexicalised concept. Different relations link the synonym sets. WordNet originated at Princeton University and is available for free. It has also been translated to several languages (see for example http://multiwordnet.itc.it/english/home.php), but is not yet available in Finnish.
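The sketch below shows how the synonym sets and relations can be browsed programmatically through the NLTK interface, one of several freely available WordNet APIs (the WordNet data package must be installed separately, e.g. with nltk.download("wordnet")).

from nltk.corpus import wordnet as wn

# Each synset is one lexicalised concept; a word may belong to several.
for synset in wn.synsets("tree"):
    print(synset.name(), "-", synset.definition())

# Synsets are linked by relations such as hypernymy ("is a kind of").
oak = wn.synsets("oak")[0]
print(oak.hypernyms())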


10 Standards and recommendations

This chapter gives an overview of the standards and W3C recommendations that are most relevant to the demonstrations that are being built in the RISE project. Table 2 lists the chosen standards and why they were included.

Table 2. Brief description of the standards and W3C recommendations that are presented in this report.

Standard / Recommendation: MPEG-7 aims to be a generic multimedia description language that enables descriptions ranging from very low-level information to high-level semantic information.
Status: One of the main standards for describing multimedia contents. However, it has not reached critical mass yet and its future adoption remains to be seen.
Relevance to the RISE project: One example of worldwide standardisation of content descriptions. Also an example of the difficulties of creating a description standard for multimedia content. The project should have a critical eye on the standard and not take it as a given.

Standard / Recommendation: MPEG-21 identifies and defines the mechanisms and elements needed to support the multimedia delivery chain, consisting of Users interacting with Digital Items.
Status: Standard partly approved by 2005, rarely used.
Relevance to the RISE project: This standard will probably have a major impact on multimedia production and distribution in five years’ time. It should be considered in research projects.

Standard / Recommendation: SMIL is designed to be the standard markup language for timing and controlling streaming media clips.
Status: The current version is 2.1. A rather new concept, not very popular, few players. Popularity in the future is hard to estimate, but it will gain popularity if more device manufacturers implement it as their storage format.
Relevance to the RISE project: The most relevant aspect might be how to attach metadata information to SMIL presentations. If there are any plans to use Flash, SMIL should be considered instead.

Standard / Recommendation: RSS and Atom have made low-end content syndication possible and popular. ICE is the high-end syndication specification that also supports commercial aspects.
Status: RSS was first utilised on a larger scale to share information about weblog updates but is increasingly utilised to share information in various fields, for example about news or product updates.
Relevance to the RISE project: Future web applications are often distributed, and these low-end syndication solutions can often be utilised in applications that allow user participation.

Standard / Recommendation: XrML and ODRL are XML vocabularies for describing usage rights; the aim is to be able to convey usage information between applications automatically.
Status: No common standard that addresses all issues.
Relevance to the RISE project: Relevant to know and acknowledge.

Standard / Recommendation: Creative Commons is a model that makes it possible to give different kinds of licenses to media resources over the Internet. Rights may vary from different levels of non-commercial use and modification even to commercial utilisation opportunities.
Status: CC is quickly gaining more and more acceptance and use. Some services (Flickr, Yahoo) even make it possible to search for resources with a certain CC license.
Relevance to the RISE project: Creative Commons is a somewhat popular model for licensing content, but not a metadata recommendation as such.

10.1 MPEG-7

MPEG-7 aims to be a generic multimedia description standard that enables descriptions ranging from very low-level information to high-level semantic information. It is an open standard developed by the same group (Moving Picture Experts Group, MPEG) as the other MPEGs. The objective of MPEG-7 is to have a multimedia description language that supports current needs for multimedia descriptions as well as unknown future needs and applications for multimedia metadata. The standard does not include any feature extraction algorithms or any other means for acquiring or generating the metadata.

10.1.1 Structure

The four basic elements of the standard are Descriptors, Description Schemes, the Description Definition Language, and Systems Tools. A Descriptor is a representation of a feature of multimedia. A feature is a distinctive characteristic of the described multimedia (i.e., data) that signifies something to somebody. A feature can be low-level, such as the resolution of a video, or high-level, such as licensing rights. A Description Scheme specifies the structure and semantics of the relationships between Descriptors or other Description Schemes; in other words, a Description Scheme is a set of components, which can be Descriptors or other Description Schemes. The Description Definition Language (DDL) is the standard language that allows the creation and modification of Description Schemes and Descriptors. The DDL is built on top of XML Schema, adding to XML Schema support for vectors, matrices, and typed references. Systems Tools are for preparing the MPEG-7 descriptions for transport and storage in binary form. The binarisation also allows synchronisation between the descriptions and the actual content.

10.1.2 Description Tools

The best way to get an overview of what kind of things can be described with MPEG-7 is to go through the Description Tools. The Descriptors and Description Schemes together form a set of predefined description tools. The tools are grouped together according to their functionality.
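As an illustration of how these pieces fit together, the sketch below emits the skeleton of an MPEG-7 description: a content entity with creation metadata. The namespace is the MPEG-7 schema URN, but the element and type names follow the standard only informally and the output is not validated against the DDL schema.

import xml.etree.ElementTree as ET

MPEG7 = "urn:mpeg:mpeg7:schema:2001"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("", MPEG7)
ET.register_namespace("xsi", XSI)

# Skeletal description: a video content entity with a creation title.
root = ET.Element(f"{{{MPEG7}}}Mpeg7")
desc = ET.SubElement(root, f"{{{MPEG7}}}Description",
                     {f"{{{XSI}}}type": "ContentEntityType"})
content = ET.SubElement(desc, f"{{{MPEG7}}}MultimediaContent",
                        {f"{{{XSI}}}type": "VideoType"})
video = ET.SubElement(content, f"{{{MPEG7}}}Video")
creation_info = ET.SubElement(video, f"{{{MPEG7}}}CreationInformation")
creation = ET.SubElement(creation_info, f"{{{MPEG7}}}Creation")
ET.SubElement(creation, f"{{{MPEG7}}}Title").text = "Example clip"

print(ET.tostring(root, encoding="unicode"))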


Figure 11. Description Tools of MPEG-7.

Basic Elements. These are the generic entities used as building blocks by various description tools. They include basic data types (numbers, matrices, vectors and country), links and locators (time, media locators, and referencing tools), and other basic description tools for places, people, textual annotations, controlled vocabularies, and so on.

Schema Tools. The Schema Tools are the tools for wrapping description tools for use by applications. This group also includes the package tools for organising related description tools into groups with personalised labels, easing the use by specific applications.

Content Description Tools. Content Description Tools represent perceptible information, including structural aspects (structure description tools), audio and visual features, and conceptual aspects (semantic description tools). The structure description tools describe content in terms of spatio-temporal segments organised in a hierarchical structure. Audio, visual, annotation, and content management description tools can be attached to the segments to describe them in detail. Visual description tools include the visual basic structures (such as description tools for grid layout, time series, and spatial co-ordinates) and visual description tools that describe colour, texture, shape, motion, localisation, and faces. Audio description tools comprise the audio description framework and high-level audio description tools that describe musical instrument timbre, sound recognition, spoken content, and melody. The semantic description tools describe the content with real-world semantics and conceptual notions: objects, events, abstract concepts, and relationships. The semantic and structure description tools can be cross-linked.

Content Management Tools. The Content Management Tools specify information about media features, creation, and usage of multimedia content. The media description tools describe the storage media, coding format, quality, and transcoding hints for adapting content to different networks and terminals. The creation description tools describe the creation process (for example, title, agents, materials, places, and dates), classification (for example, genre, subject, parental rating, and languages), and related materials. The usage description tools describe the conditions for use (for example, rights and availability) and the history of use (for example, financial results and audience).

Content Organization Tools. Content Organization Tools create and model collections of multimedia content and descriptions. Each collection can be described as a whole by attribute values characterised by models and statistics.

Navigation and Access Tools. Navigation and Access Tools specify summaries, partitions and decompositions, and variations of multimedia content for facilitating browsing and retrieval. Summary description tools provide both hierarchical and sequential navigation modes for efficient preview access to the multimedia material. Partition and decomposition description tools allow multiresolution and progressive access in time, space, and frequency. Variation description tools describe pre-existing views of multimedia content: summaries, different media modalities (for example, image and text), scaled versions, and so on.

User Interaction Tools. User Interaction Tools describe user preferences (for personalised filtering) and usage history pertaining to the consumption of the multimedia content.

Further reading
Martinez, J.M., Koenen, R., and Pereira, F. MPEG-7: The Generic Multimedia Content Description Interface, Part 1. IEEE Multimedia, 9 (2), pp. 78-87.
Martinez, J.M. Overview of MPEG-7 Description Tools, Part 2. IEEE Multimedia, 9 (3), pp. 13-15.
Manjunath, B.S., Salembier, P., Sikora, T. (editors) Introduction to MPEG-7: Multimedia Content Description Interface, Wiley, New York, 2002.

10.2 MPEG-21

The MPEG-21 Multimedia Framework initiative aims to enable the transparent and augmented use of multimedia resources across a wide range of networks and devices. Work on MPEG-21 started in June 2000, and by 2005 the standard is partly approved. MPEG-21 defines a normative open framework for multimedia delivery and consumption for use by all the players in the delivery and consumption chain. This open framework will provide content creators, producers, distributors and service providers with equal opportunities in the MPEG-21 enabled open market. This will also benefit content consumers by providing them with access to a large variety of content in an interoperable manner.

MPEG-21 is based on two essential concepts: the definition of a fundamental unit of distribution and transaction (the Digital Item) and the concept of Users interacting with Digital Items. The Digital Items can be considered the “what” of the Multimedia Framework (e.g., a video collection, a music album) and the Users can be considered the “who” of the Multimedia Framework. The goal of MPEG-21 can thus be rephrased as: defining the technology needed to support Users to exchange, access, consume, trade and otherwise manipulate Digital Items in an efficient, transparent and interoperable way.

MPEG-21 identifies and defines the mechanisms and elements needed to support the multimedia delivery chain as described above, as well as the relationships between them and the operations supported by them. Within the parts of MPEG-21, these elements are elaborated by defining the syntax and semantics of their characteristics, such as interfaces to the elements. The MPEG-21 Multimedia Framework recognises that achieving true end-to-end interoperability for digital exchange of content requires more than an interoperable terminal architecture. MPEG-21’s goal is to describe a ‘big picture’ of how the different elements that build an infrastructure for the delivery and consumption of multimedia content relate to each other. In setting the vision and starting the work, MPEG-21 has drawn much new blood to MPEG, including representatives from major music labels, the film industry and technology providers; both the IDF and the indecs consortium are now active participants.

For more information see [http://www.chiariglione.org/mpeg/standards/mpeg-21/mpeg21.htm]

10.3 SMIL (Synchronized Multimedia Integration Language)

Recommended by the World Wide Web Consortium (W3C), SMIL is designed to be the standard markup language for timing and controlling streaming media clips. SMIL is based on the eXtensible Markup Language (XML). Rather than defining the actual formats used to represent multimedia data, it defines the commands that specify whether the various multimedia components should be played together or in sequence. SMIL works for a media player in a similar way as HTML works for a Web browser, and just as HTML markup displays in any browser, the standardised SMIL language fosters interoperability between media players. The official SMIL 2.0 specification is available at the W3C Web site: [http://www.w3.org/TR/smil20/]

Basically, SMIL enables Web developers to divide multimedia content into separate files and streams (audio, video, text, and images), send them to a user's computer individually, and then have them displayed together as if they were a single multimedia stream. The ability to separate out the static text and images should make the multimedia content much smaller, so that it does not take as long to travel over the Internet. This is an improvement over monolithic file formats such as Macromedia Flash. It also opens possibilities to author multimedia presentations from bits and pieces distributed over a network without actually making a single package out of them. Moreover, a QuickTime video on a web site, say, goes unnoticed by search engines like Google, but an authored SMIL presentation shows up in search results.
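A minimal SMIL 2.0 presentation illustrating the timing model: an audio track and a video clip play in parallel, followed in sequence by a closing text slide. The media file names are made up; the document is written out from Python for illustration.

# Write a minimal SMIL 2.0 presentation to disk.
smil_doc = """<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <layout>
      <root-layout width="320" height="240"/>
    </layout>
  </head>
  <body>
    <seq>
      <par>
        <video src="clip.mpg"/>
        <audio src="narration.mp3"/>
      </par>
      <text src="credits.txt" dur="5s"/>
    </seq>
  </body>
</smil>"""

with open("presentation.smil", "w") as f:
    f.write(smil_doc)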


10.4 Syndication

There are currently multiple versions of RSS in use, including RSS 1.0, RSS 2.0 and many deprecated versions. Both RSS 1.0 and RSS 2.0 are being developed separately and independently – RSS 2.0 is not a progression of RSS 1.0, despite what the version numbers might suggest. [http://www.eevl.ac.uk/rss_primer/]

RSS 1.0 stands for 'RDF Site Summary'. RSS 1.0 utilises the Resource Description Framework (RDF), which is the W3C recommendation for metadata. This format makes it possible to describe the content in a more flexible way: for example, existing metadata vocabularies can be expressed with this vocabulary. The specification can be found at [http://web.resource.org/rss/1.0/spec]. A minimal RSS 1.0 feed is sketched at the end of this section.

RSS 2.0 stands for 'Really Simple Syndication' and the emphasis is clearly on simplicity. RSS 2.0 follows on from the various RSS 0.9x specifications (RSS 0.90, RSS 0.91, RSS 0.92, RSS 0.93). The specification can be found at [http://blogs.law.harvard.edu/tech/rss]

The Atom Publishing Format and Protocol is currently being developed by the IETF. It is still in draft status, but it is already widely used. The feed format enables syndication, that is, provision of a channel of information by representing multiple resources in a single document. The working group will use experience gained with RSS as the basis for a standards-track document specifying the model, syntax, and feed format. The feed format and HTTP will be used as the basis of work on a standards-track document specifying the editing protocol. The goal for the working group is to produce a single feed format and a single editing protocol. The working group home page is at [http://www.ietf.org/html.charters/atompub-charter.HTML].

Information and Content Exchange (ICE) is an XML-based Web protocol for content syndication. The latest version, ICE 2.0, was released in June 2004. It is Web Services compliant. Dianne Kennedy, Vice President of Publishing Technologies for IDEAlliance and Editor of the ICE 2.0 Specification, has described the difference between RSS and ICE 2.0 as follows: “Unlike RSS and other light-weight syndication protocols, ICE 2.0 is designed to support industrial-strength content syndication. It provides for subscription management, verification of delivery, and scheduled delivery in both push and pull modes. ICE is the protocol for syndicators who are distributing ‘valued content’ that generates a revenue stream or requires guaranteed delivery in a secure environment.” [http://www.icestandard.org/about/] ICE was designed to be RSS compatible and may be used to extend RSS capabilities: ICE can carry RSS feeds and add to them specified delivery (delivery policy), guaranteed receipt, and push delivery.

According to [http://www.icestandard.org/resources/ice_chapter_062204.pdf], the main ICE 2.0 features are:
• Syndicators can describe business rules, such as usage constraints and intellectual property rights.
• Syndicators can create and manage catalogues of subscription offers. These can be accessed by content type, source, and other criteria.
• ICE utilises XML to represent the messages that syndicators and subscribers exchange. The ICE message structure keeps the content independent of the protocol itself, so virtually any data can be exchanged – from text to streaming video.
• Subscribers can specify a variety of “push” or “pull” delivery modes, as well as delivery times and frequency. Subscribers can also specify content update parameters, such as incremental or full updates.
• ICE-based tools allow content to be obtained from, and delivered to, a wide variety of content repository types. These include databases, content management systems, file directories, Web servers, PDAs, wireless devices, and Internet appliances.
• ICE 2.0 was defined to function as a web service, which means that ICE 2.0 relies on other standards and specifications such as HTTP, XML and SOAP to provide standardised functionality, while ICE defines the high-level business rules of the web services technology stack.
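The sketch below shows the minimal RSS 1.0 feed promised above. Because RSS 1.0 is RDF, the channel and items are full RDF resources that modules such as Dublin Core, PRISM or LOM can annotate further; the URLs are made up.

# A minimal RSS 1.0 (RDF Site Summary) feed as a string.
rss10 = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news">
    <title>Example channel</title>
    <link>http://example.org/news</link>
    <description>Example feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/news/1"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/news/1">
    <title>First story</title>
    <link>http://example.org/news/1</link>
  </item>
</rdf:RDF>"""
print(rss10)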

10.5 Licence and Rights Metadata

One domain of use for semantic metadata descriptions is describing the licensing terms and intellectual property rights (IPRs) of digital products or services. Rights expression languages (RELs) are meant to describe the rights involved in a specific product in a detailed manner, so that all entities involved can act accordingly. For example, using a rights expression language, an entity can state that it gives another entity a non-exclusive license to complete specific operations on particular information a certain number of times in a specified period of time if the other entity pays certain fees. Such information is typically included in the rights description part of an information product’s metadata. For further reading see MobileIPR Final Report, HIIT Publications 2003-3.

It is quite demanding to define a formal language that can be used to correctly express all the necessary rights in different jurisdictions. There is some interesting work going on for defining such a language. The two most prominent emerging rights expression languages proposed as common standards are the Extensible Rights Markup Language (XrML) from ContentGuard, Inc. and the Open Digital Rights Language (ODRL) from IPR Systems Ltd.

10.5.1 XrML

The growth engine for XrML is Microsoft, which has incorporated either the full language or a subset of it in all of its DRM solutions, including Media Rights Manager for Windows Media Format (audio and video) and Windows Rights Management Services for Windows Server 2003. Another big name that has licensed XrML is Sony, but the consumer electronics giant has yet to implement any technology based on XrML. See more at [http://www.xrml.org/]

10.5.2 ODRL

ODRL, meanwhile, is the selection of the Open Mobile Alliance (OMA), headed by Nokia. Nokia has already released an SDK for implementing OMA compatible download applications with DRM, and it has implemented the specification in its 3595 phone. ODRL is also supported in OpenIPMP, an open-source DRM package for the emerging MPEG-4 multimedia format. When comparing the languages, ODRL has the advantage of being more concise, meaning that rights descriptions in ODRL tend to be more compact than their equivalents in XrML and that ODRL interpreters can be smaller (in memory footprint) than XrML interpreters. The latter factor is especially important in the mobile device space, where memory is at a premium. The research group Datamonitor predicts that the market for digital content over mobile phones will reach 38 billion dollars in three years. ODRL also has some media-specific constructs that XrML does not share, including the ability to specify attributes of media objects such as file formats, resolutions and encoding rates. See more at http://www.odrl.net/

10.5.3 Creative Commons

Although not a rights expression language or a metadata scheme as such, the Creative Commons project is a notable approach to dealing with rights and licensing of digital products. The idea of Creative Commons (CC) is to let people configure licences for their digital content. Rather than applying the traditional two opposite options for content licensing (public domain vs. all rights reserved), CC licences take advantage of the whole spectrum between the two extremes. For example, one can license one's content free to use as long as the original author is mentioned and the use is non-commercial. Creative Commons approaches the description of the licences from three angles. Each of the configured licences has a layman description for regular people to understand, a legal description for juridical validity (localised for specific legal regimes), and a technical RDF metadata description for systems and applications to automate the use of these licences. In other words, the CC model uses RDF to reference the textual licences located on the CC servers. See more at [http://www.creativecommons.org/]
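The sketch below shows Creative Commons licence metadata in RDF, linking a (made-up) work to one of the licence URIs; the cc namespace shown is the one we understand to have been in use at the time of writing.

# Creative Commons RDF metadata for a work, as a string.
cc_rdf = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cc="http://web.resource.org/cc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <cc:Work rdf:about="http://example.org/photo.jpg">
    <dc:title>Example photo</dc:title>
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0/"/>
  </cc:Work>
</rdf:RDF>"""
print(cc_rdf)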


11 Ontologies

One of the RISE demonstrations focuses on a travelling and learning context, where several topics like history, nature, culture and news can be combined. There are several ontologies relating to these areas. Ontologies for describing time, events and places are also needed.

11.1 Tourism / Travel ontologies

Most tourism ontologies are about actual travelling information and services like accommodation, restaurants, things to see, activities etc. Different projects try to gather and combine tourist information from different web sites. Links to tourism ontologies and other related ontologies (like geographical ontologies) can be found in an ontology collection document made by the E-Tourism portal project [http://www.deri.at/research/projects/e-tourism/2004/d10/v0.2/20041005/]. Most of those ontologies seem irrelevant or inappropriate for our project: the RISE project demonstration does not aim at offering actual tourism information, but focuses on background information about places and historical, cultural and natural attractions. However, our ontology should be able to describe attractions and perhaps also some annual events in order to be able to connect relevant published content to these places. The most interesting ontology in the above-mentioned document is the Mondeca tourism ontology, which is described as follows: "Mondeca´s tourism ontology includes tourism concepts from the WTO thesaurus [see www.world-tourism.org], terms describing the various dimensions used for tourism object profiling, tourism and cultural objects (place, museum, restaurant, housing, transportation, events...), tourism packages and tourism multimedia content." Also the richness of the semantic relationships between those objects was considered. For more see [www.mondeca.com].

11.2 Cultural heritage

The history of attractions and places is one important perspective on travelling. In order to combine commercially published historical information with places and attractions, we need an ontology that describes the most important historical periods, places, events and persons in Finnish history, as well as information about the Finnish cultural heritage. Ontologies for describing cultural heritage are already being used in museums.

CIDOC Conceptual Reference Model (CRM)

The CIDOC CRM is intended to promote a shared understanding of cultural heritage information by providing a common and extensible semantic framework that any cultural heritage information can be mapped to. It is mainly used by museums, libraries and archives, which integrate and interchange heterogeneous cultural heritage information. It is a top-level ontology which can also be used to describe the cultural heritage of tourist attractions, places, buildings, persons etc. Published historical content (like the content of historical books, articles and learning materials) can be combined with cultural heritage information.


The CIDOC CRM was developed by the ICOM/CIDOC Documentation Standards Group and the CIDOC CRM SIG over a 10-year period. It is currently being elaborated by ISO as Committee Draft ISO/CD 21127. The CIDOC CRM is presented as an object-oriented extensible data model, but an RDF Schema encoding is also provided. The class hierarchy of the CIDOC CRM (v. 3.4.9) can be seen in the following figures. The detailed specification of the CIDOC CRM, which includes detailed descriptions of the class and property definitions, hierarchies and relationships, is available from the CIDOC CRM web site [http://cidoc.ics.forth.gr/].

Figure 12. The main classes of the CIDOC CRM ontology.


Figure 13. The Information_Object class of the CIDOC CRM ontology comprises identifiable immaterial items, such as poems, jokes, data sets, images, texts and multimedia objects.

A combination of CIDOC CRM and MPEG-7 can also be found. It has been developed in order to combine the multimedia content of museums with descriptions of museum objects [http://metadata.net/harmony/MW2002_paper.pdf]. It can possibly be utilised to combine any multimedia objects with different cultural heritage information.

Finnish cultural heritage

The Finnish cultural ontology MAO is being developed by HIIT in the MuseumFinland project, on the basis of MASA, the Finnish thesaurus for museums. Most of the terms of MASA relate to the cultural history of objects, which is not so relevant for our project, but some of the terms might be useful for automatic content enrichment of published historical content. Some domain ontologies created in the MuseumFinland project may be relevant for our project (Eero Hyvönen & al., 2004):
• The Actors ontology defines persons, companies, organisations and other active agents.
• The Locations ontology defines areas and places on the earth, and in Finland in particular.
• The Times ontology defines a taxonomy of different time eras and periods by time intervals.
• The Events ontology defines situations, events and processes that take place in society, such as wars.
These ontologies are subsets of the MAO cultural ontology. Also encyclopaedias like the WSOY Facta include a lot of information about historical periods, persons and events as well as cultural information. This could be used as a starting point in creating and populating an ontology with instances of, for example, historical persons of a certain time period.


11.3 Common sense ontologies

Cyc is an artificial intelligence project which attempts to assemble a comprehensive ontology and database of everyday common-sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. Cyc was started in 1984 by Doug Lenat and is owned by Cycorp, Inc. in Austin, Texas. Typical pieces of knowledge represented in the Cyc database are "Every tree is a plant" and "Plants die eventually". When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly. The Knowledge Base (KB) contains over a million human-defined assertions, rules and common sense ideas. These are formulated in the language CycL, which is based on predicate calculus and has a syntax similar to that of the Lisp programming language.

The original knowledge base is proprietary, but a smaller version of the knowledge base was released as OpenCyc (http://www.opencyc.org/) under an open source license. The latest version (OpenCyc 0.9) was released in February 2005. The knowledge base contains 47,000 concepts and 306,000 facts and can be browsed on the OpenCyc website. The knowledge base is released under the LGPL.

Open Mind (http://www.openmind.org), which originated at the MIT Media Lab, is a similar attempt at collecting common sense data. However, this knowledge base is collected from the contributions of thousands of people across the Web. Since the Open Mind project started in 1999, it has accumulated more than 700,000 English facts from over 15,000 contributors.
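The toy sketch below illustrates the kind of inheritance-based inference described above ("every tree is a plant", "plants die eventually", therefore trees die). Cyc itself uses the far richer CycL language and a general inference engine; this is only a minimal illustration.

# Toy taxonomy-based inference: a property holds for a thing if it holds
# for the thing itself or for any of its ancestors in the "is-a" hierarchy.
isa = {"tree": "plant"}            # "Every tree is a plant"
properties = {"plant": {"dies"}}   # "Plants die eventually"

def holds(thing, prop):
    while thing is not None:
        if prop in properties.get(thing, set()):
            return True
        thing = isa.get(thing)  # climb to the parent class
    return False

print(holds("tree", "dies"))  # True: inherited from "plant"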


12 Relevant projects

12.1 Automatic generation of multimedia presentations

One central idea in the RISE project is the possibility to generate presentations out of media objects, so that instead of giving the user a list of potentially interesting media objects, the resources are organised and presented in a way that the resulting presentation is experienced as more valuable than a list. This chapter presents some other projects that have similar goals.

The Artequakt project aims at producing biographies of artists by utilising web resources, generating stories from the extracted information and presenting them according to a story template (Kim & al., 2002). In other words, this project aims at producing text documents by extracting the relevant content from existing web documents.


Figure 14. The Artequakt architecture (knowledge extraction, narrative generation and information management): biographies are generated utilising information available on web pages.

The quality of the story may not be as good as one produced manually, but it may contain information that would otherwise be difficult to find. (Kim & al., 2002)

A good example of the required solutions and the challenges in creating multimedia presentations is given in (Little, Geurts & Hunter, 2002). The work utilises OAI metadata, which consists of Dublin Core records of media objects, and multimedia presentations are generated through the following steps:
1. Interactive search by users
2. Semantic inferencing utilising the DC metadata
3. Mapping the inferred semantic relationships to spatial and temporal relationships or multimedia formatting objects
4. Presentation generation utilising the previously found media objects, semantic relations, mappings to MFOs and other constraints
5. User-directed presentation generation, with the help of which the user can modify the presentation by taking any of the presented media objects as a base set for a new starting point.

The MPEG-7 Semantic Relations Description Scheme was taken as the base for mapping semantic relations to spatial/temporal relations, and it was utilised as the top level. For example, the semantic relation X created Y is mapped to the MPEG-7 relation X result Y, and this can be mapped to a temporal/spatial relation spatialLeft(X,Y). These temporal/spatial rules can then be utilised to create the presentation order or the position on a page. The authors suggest that there should be more than one temporal/spatial relation for one MPEG-7 relation, so that the presentations are not too repetitive.

The presentations are generated by using the Cuypers presentation generation software, which was developed further in this project. Cuypers processes the presentation with the help of the following layers:
1. The semantic structure layer, which completely abstracts from the presentation's layout and navigational structure; in the reported work, this level was extended by the semantic inferencing and semantic spatio-temporal mapping process.
2. The communicative device level, which determines the main presentation structure. Communicative devices can be based on generally known concepts or metaphors; a bookshelf is mentioned as an example.
3. The qualitative constraint level, which includes rules for placing the objects in relation to each other, for example where a caption should be placed in relation to an image.
4. The quantitative constraint level, which manages the output format independent issues and conflicts.
5. The final-form presentation level, which manages the issues relating to producing the final output.

The authors conclude that the media content they had via the OAI records was not very interesting and considerably limited the choice of topics. The DC metadata records also caused many problems. Only the unqualified Dublin Core vocabulary was utilised, and it includes only very basic and limited information and gives few opportunities for inferencing semantic relationships. The information in the records is also unreliable: when different kinds of media objects are described with this vocabulary, the elements often seem to be used inconsistently. Furthermore, the vocabulary lends itself to high-level descriptions of resources, whereas for generating multimedia presentations it would be more useful to have access to fine-grained details and excerpts.

Celentano and Gaggi propose a multimedia presentation generator with the following steps:
1. Data selection, where the selected items carry information about their content and technical features
2. Definition of the presentation schema, and


3. Filling the schema with the retrieved data.
Their work, however, focuses only on the technical aspects of creating the presentation and does not address utilising semantic information about the content.

Jourdan and Bes also propose an architecture for creating multimedia presentations automatically. They identify four main steps in the process:
1. Content selection
2. Temporal ordering
3. Layout definition
4. Hyperlink definition
In their proposed architecture, managing constraints and defining parameters for the presentation are emphasised. Parameters relate to the elements to be included (e.g. background music) or to how the topic should be covered (e.g. in a sports report, whether all games should be reported in a similar brief manner, or whether the selected ones should be covered in as much detail as possible).

Vizipen (Mariappan & Aslandogan, 2004) is a multimedia generator aimed at eLearning applications. This application also includes what the authors call a concept dictionary, which can be regarded as an ontology.

These project examples show that there are activities relating to automatic content presentation. None of these projects addressed the issue of combining private and commercial content. The Artequakt project is closely related to knowledge extraction, and it aims at creating stories out of text materials – perhaps revealing new facts about the subject. The projects give examples of how to proceed in the task of creating presentations, and these can be taken into consideration when planning the RISE project demonstrations. They also indicate that we are in the early stages in this area; for example, none of the projects reported any significant user tests. The project that used Dublin Core metadata indicated the potential of utilising metadata, but it also concluded that Dublin Core metadata alone does not offer any major opportunities in this area because of its limited content.
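The sketch below illustrates the relation-mapping step reported by Little, Geurts & Hunter: an inferred semantic relation is first mapped to an MPEG-7 relation and then to one of several spatial/temporal layout rules, so that generated presentations do not become too repetitive. The mapping entries here are illustrative, not taken from the standard.

import random

# Semantic relation -> MPEG-7 relation (illustrative entries only).
SEMANTIC_TO_MPEG7 = {"created": "result", "depicts": "depictedBy"}

# MPEG-7 relation -> candidate spatial/temporal layout rules; offering
# several options per relation keeps the output from being repetitive.
MPEG7_TO_LAYOUT = {
    "result": ["spatialLeft", "temporalBefore"],
    "depictedBy": ["spatialAbove"],
}

def layout_rule(semantic_relation):
    mpeg7_rel = SEMANTIC_TO_MPEG7[semantic_relation]
    return random.choice(MPEG7_TO_LAYOUT[mpeg7_rel])

print(layout_rule("created"))  # e.g. spatialLeft, read as spatialLeft(X, Y)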

12.2 NEWS project

The NEWS engine web services (NEWS) project is funded in the EU 6th Framework Programme by IST – semantic-based knowledge services. It was launched on April 1, 2004 and will run until March 31, 2006. [http://www.dfki.uni-kl.de/~bernardi/News/]

The goal of the NEWS project is to develop News Intelligence Technology for the Semantic Web. Its main purpose is to extend the reach and delivery capabilities of online content provision and syndication services by supporting advanced personalised news discovery, analysis and presentation, and by fostering interoperability across the news content provision and fruition lifecycle.

To this end, NEWS develops a configurable composition of services that will allow users to access, select and personalise the delivery of multimedia and multilingual news content. The services comprise the collection and ontological annotation of news material, the processing and presentation of content via headline generation and trend analysis, and a user interface for comprehensive information access that can be configured to meet individual user needs.


To this end, NEWS develops a configurable composition of services that will allow users to access, select and personalise delivery of multimedia and multilingual news content. The services comprise the collection and ontological annotation of news material, the processing and presentation of content via headline generation and trend analysis, and a user interface for comprehensive information access that can be configured to meet individual user needs NEWS addresses aims at -

Using Semantic Web standards to define ontologies for the news industry;

-

Implementing an ontological annotation component which automatically applies Semantic Web standards for the news industry to newswires;

-

Developing news intelligence components with multilingual and multimedia capabilities, which use automatic ontological annotation to support semantic-based analysis, personalisation and delivery of knowledge from newswires, and

-

Integrating ontological annotation and news intelligence components as Web services into a standard interoperable platform that enables end users and applications alike to find and utilise service components dynamically.

Figure 15. The NEWS ontology and its links to existing vocabularies and ontologies.


12.3 Neptuno

In the Neptuno project, semantic web technologies are used to improve a digital newspaper archive. This means improving the process of creation, maintenance and exploitation of the digital archive of a newspaper. The project is being conducted by two universities (Universidad Autónoma de Madrid and Universitat de Lleida), a news media company (Diari SEGRE), and a technology provider (iSOCO, S. A.), and it is funded by the Spanish Ministry of Science and Technology. The goal of the project is to develop a high-quality semantic archive for the Diari SEGRE newspaper where a) reporters and archivers have more expressive means to describe and annotate news materials, b) reporters and readers are provided with better search and browsing capabilities than those currently available, and c) the archive system is open to integration in an electronic news marketplace. (P. Castells, F. Perdrix, E. Pulido, M. Rico, R. Benjamins, J. Contreras, J. Lorés, 2004)

According to the website of the project [http://nets.ii.uam.es/neptuno], the main components of the platform being developed are:
• An ontology for archive news, based on journalists' and archivers' expertise and practice. The system uses the IPTC Subject Reference System as a thematic classification system for news archive content. The project has converted the IPTC topic hierarchy to an RDF class hierarchy and then established a mapping between the classification system (thesaurus) previously used at Diari SEGRE and the IPTC. The project has also built its own ontology for representing the actual archive contents. (A sketch of this kind of conversion is given after this list.)
• A knowledge base, where the archived materials are described using the ontology. A DB-to-ontology conversion module has been developed for the automatic integration of the existing legacy archive materials into the knowledge base. Some concepts of the ontology correspond to the tables of the existing database, such as News, Photographs, Graphics and Page; these are subclasses of the Contents concept. Content classification can be made in three different ways: by subject following the IPTC subject classification, according to the genre of the content (like breaking news, summary, interview, opinion, survey, forecast etc.), or by keywords that describe the content. One of the concepts used in the ontology is the NewsRelation concept, which allows establishing relationships among news items such as "extention", "previous", "comment" etc.
• A semantic search module, where meaningful information needs can be expressed in terms of the ontology, and more accurate answers are supplied. The search module combines direct search by content classes and class fields with the possibility to browse the IPTC taxonomy.
• A visualization and navigation module to a) display individual archive items, parts or combinations of items, and groups of items, and b) provide semantic navigation facilities based on automatically inferred links between materials (news threads, paths, dynamic clusters). A visualisation ontology was created for organising the concepts and attributes to be published in the portal.
• A personalisation module to improve searches and adapt navigation to user profiles.
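The sketch promised above: converting a small topic hierarchy into an RDFS class hierarchy, here with the rdflib toolkit. The topic names and namespace URI are made up; the real project maps the full IPTC Subject Reference System.

from rdflib import Graph, Namespace, Literal, RDF, RDFS

NEWS = Namespace("http://example.org/news-ontology#")
g = Graph()

# Each topic becomes an RDFS class; the parent link becomes rdfs:subClassOf.
topics = {"Sport": None, "Football": "Sport", "IceHockey": "Sport"}
for topic, parent in topics.items():
    cls = NEWS[topic]
    g.add((cls, RDF.type, RDFS.Class))
    g.add((cls, RDFS.label, Literal(topic)))
    if parent:
        g.add((cls, RDFS.subClassOf, NEWS[parent]))

print(g.serialize(format="xml"))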

At the first stage of the project, reporters are the primary users of the archive exploitation functionalities. The project web site includes a demo of the system.

Both the NEWS and Neptuno projects are focused on news. VTT's StorySlotMachine demonstration includes news materials but also other types of media content, so the settings of these projects are quite different from ours; nevertheless, their results and experiences are interesting for our project as well.

12.4 Mobile services related projects

12.4.1 Rotuaari

The focus of the Rotuaari project, which was carried out at the University of Oulu, was on mobile multimedia services of the future (http://www.rotuaari.net/servicesystem.html?lang=en&id=14). The project included different mobile services and field trials. From the RISE project point of view, there were two interesting field trials: the Digital Oulu Cultural Database and MobileKärppä. A summary of the results of the different field trials can be found at http://www.rotuaari.net/downloads/smartrotuaari2-yhteenveto.pdf (in Finnish).

Digital Oulu Cultural Database

The Digital Oulu Cultural Database mobile service made it possible to look for more information about historically valuable attractions and persons in Oulu. The database included 48 cultural objects and approximately 230 presentations related to them. The presentations contained media files in many media formats (text, image, voice and video). The content was based on the archives of Kaleva (a local newspaper), video recordings of Oulu Guides' guided city tours, and texts produced by the historian Markus H. Korhonen. New cultural objects could be added to the database with a ready-made tool. One of the problems of the trial was that the amount of content was too small. The cultural objects were described with Dublin Core metadata, and the ontology used is the Finnish museum ontology MAO. The content of the cultural database can be browsed by registering as a test user at http://www.rotuaari.net/home.

The difference between the Digital Oulu Cultural Database and the RISE StorySlotMachine demonstration is that the content in the Digital Oulu solution is ready-made cultural content for certain sightseeing attractions. In StorySlotMachine, "the story" about a sight can always be different, because it is produced on the fly from different content objects based on the selection of the "story template" and the users. In StorySlotMachine it is also possible to connect the users' own content with commercial media content. The Digital Oulu Cultural Database was developed as a mobile service, whereas the focus in the first phase of the development of the RISE demonstration has been on a PC web solution, although some mobile features are also planned.

MobileKärppä [http://www.rotuaari.net/downloads/publication-32.pdf]


12.4.2 Kontti

The KONTTI project [http://www.vtt.fi/tte/projects/kontti/] designed and implemented a context-aware service platform and services. The platform enables the management and sharing of contexts, presence information and contextual content, and it provides context adaptation and context-aware messaging. Personalisation and context tools were also implemented. User profiles are created and maintained with the help of ontologies: the information contained in the profile is structurally based on several modular ontologies, such as user, context, service and device ontologies. For example, the user can select a device she wants to use from the device ontology, a set of services she wants to use from the service ontology, and information about time and place from the context ontology. She can then incorporate the information together as her contextual profile. (Kolari & al., 2004)

Several field trials were made: a context study, everyday use, a historical route, a theatre festival and a social circle. Especially the historical route trial is interesting from the RISE demonstration point of view. The aim of the historical route trial was to study the concept of a context-aware tourist route. The service identified the user's current location and offered historical information (pictures, text and video) about the location. The user accessed the content through a web-based mobile portal which was used with a PDA device. The service automatically displayed the nearest points of interest with thumbnail images. After clicking a thumbnail the user could view more information and a larger image, and from there the user could return to the "Nearest sights". If there were messages to the user in the current spot, an envelope appeared at the top of the screen. A finding of the trial was that users liked the idea of the context-aware historical route. The pictures and descriptions of the sights were regarded as interesting. One feature that the users would have liked to have was the option of listening to information about the sights with headphones.

The project took place between January 2002 and December 2003. The project partners were VTT, Tekes, Nokia, Radiolinja and Teamware. The context-sensitive mobile services developed in the Kontti project give a lot of interesting opportunities for travelling related applications. If the mobile properties of the StorySlotMachine demonstration are developed further, including the ideas of the Kontti project should be considered.
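As a minimal illustration of the profile idea, the sketch below composes a contextual profile from values that would be picked from the modular ontologies; all names and values are invented.

from dataclasses import dataclass

@dataclass
class ContextualProfile:
    device: str      # chosen from the device ontology
    services: list   # chosen from the service ontology
    place: str       # from the context ontology
    time: str        # from the context ontology

profile = ContextualProfile(
    device="PDA",
    services=["historical-route", "messaging"],
    place="city-centre",
    time="weekday-afternoon",
)
print(profile)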

12.5 Mobile Media Metadata (MMM-1)

A mobile image metadata system, MMM (Mobile Media Metadata), was developed between August 2002 and December 2003 in co-operation between the Helsinki Institute for Information Technology HIIT and the School of Information Management and Systems (SIMS) at the University of California, Berkeley. The system was designed in the spring of 2003, implemented in the summer, and tested with users and evaluated in the fall. We provide here a brief overview; a more detailed description of the research is best obtained by reading the actual papers. Here we discuss the first version of the system, MMM-1. The second version, MMM-2, is currently being developed at UC Berkeley.

12.5.1 Memorabilia and metadata

One of the backgrounds for building the MMM system was the recording of gaming and game-related experiences for personal and public use. This kind of digital memorabilia can be any kind of media (e.g. images, video, sound clips, text). Both creating and handling such memorabilia is a tedious task that can be automated by annotating the items with information about their contents: who is the gamer in the picture, where was it taken, what is the game, how does the picture relate to the game, does it relate to other people, when was it taken, and so on. This kind of content-describing metadata enables searching, organising and sharing of the memorabilia. Such metadata can also facilitate the re-use of media to create more sophisticated memorabilia, for example the automatic composition of real-life images, screenshots and advertisements into a gameplay video or slideshow.

The MMM research looks into the problem of how to obtain content-describing metadata at the time of media capture or generation. The system takes advantage of the contextual information on the mobile phone, the processing power and shared metadata of a remote server, and the semantic knowledge of the user via user interaction. A toy sketch of this capture-time annotation idea follows the reading list below.

For further reading, see:

Risto Sarvas, Erick Herrarte, Anita Wilhelm, and Marc Davis. Metadata Creation System for Mobile Images. MobiSys 2004, Boston, MA, USA. ACM Press 2004.

Anita Wilhelm, Yuri Takhteyev, Risto Sarvas, Nancy Van House, and Marc Davis. Photo Annotation on a Camera Phone. Extended Abstracts in the Proceedings of CHI 2004, Vienna, Austria. ACM Press 2004.

Marc Davis and Risto Sarvas. Mobile Media Metadata for Mobile Imaging. ICME 2004 Special Session on Mobile Imaging, Taipei, Taiwan, IEEE Computer Society Press, 2004.
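The following Python sketch illustrates the capture-time annotation idea described above: context available on the phone is recorded immediately, and a server-side guess based on shared metadata is kept only if the user confirms it. The function, the property names and the rdflib-based data model are all invented for illustration; MMM's actual interfaces are described in the papers above.

    from datetime import datetime

    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/mmm#")  # hypothetical vocabulary

    def annotate_at_capture(photo_uri, cell_id, user_uri, guess_confirmed):
        """Attach contextual metadata at the moment of capture and refine
        it with a server-side guess that the user confirms or rejects."""
        g = Graph()
        photo = URIRef(photo_uri)
        # Context that the phone knows at capture time.
        g.add((photo, EX.capturedBy, URIRef(user_uri)))
        g.add((photo, EX.cellId, Literal(cell_id)))
        g.add((photo, EX.capturedAt, Literal(datetime.now().isoformat())))
        # A shared-metadata guess, e.g. "earlier photos from this GSM cell
        # depicted the market square", kept only on user confirmation.
        if guess_confirmed:
            g.add((photo, EX.depicts, EX.MarketSquare))
        return g

    g = annotate_at_capture("http://example.org/photos/123",
                            cell_id="24405-91-2014",
                            user_uri="http://example.org/users/anna",
                            guess_confirmed=True)
    print(g.serialize(format="turtle"))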

12.6 ARKive

ARKive [http://www.arkive.org/] is a large multimedia database containing film, stills, audio and text about globally endangered and native UK animal and plant species as well as their habitats. It aims at offering a wide range of user-customised access to the core multimedia data, and full integration of the core data with external educational resources. Storing, querying and publishing the ARKive content are based on RDF and XSLT. There are currently three ARKive websites: the main ARKive website, Planet ARKive and ARKive Education. These repurpose ARKive's core information for different user groups: adults, children and educators.


ARKive is an initiative of The Wildscreen Trust (www.wildscreen.org.uk). HP Laboratories supports ARKive by funding a research team to develop the technical infrastructure, including the content re-purposing architecture (HP Labs, ARKive, http://www.hpl.hp.com/research/ssrc/services/publishing/arkive/).

The content can be delivered taking into account the requested format and the target audience. Typical formats might be a species page, a video of a behavioural ethogram, a habitat description, or an arbitrary collection, e.g. "Big Cats of Africa". The target audience is characterised by learning context, content-target age, "reading age" and preferred language (Dingley & Shabajee, 2001). PICS ratings are used to indicate if the content may be too gory or sexually explicit for some age groups; a default PICS rule is implied by each targeted age group. A minimal sketch of such audience-based selection is given at the end of this section.

A future direction for ARKive is the use of SMIL (Synchronised Multimedia Integration Language). SMIL renders a presentation from video, audio, text and other media types. SMIL also opens up the possibility of custom-assembled video as a response to dynamic queries, serving up a composite presentation of the clips matching the search.

The basic idea of the ARKive project is similar to ours: repurposing multimedia content for different user groups, including the educational aspect. ARKive is, however, a solution for the Wildscreen Trust multimedia database, whereas the RISE StorySlotMachine aims at a more general concept that could be used in different applications with different media sources.
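The sketch below shows, in Python with the rdflib library, the kind of audience-based selection described above: content items carry metadata about species, media type and target age, and a SPARQL query picks out only the items suitable for a particular site. The vocabulary is invented for illustration; ARKive's real RDF schema is not reproduced in this report.

    from rdflib import Graph, Literal, Namespace, URIRef

    EX = Namespace("http://example.org/arkive#")  # hypothetical schema

    g = Graph()
    clip = URIRef("http://example.org/arkive/clips/lion-hunt")
    g.add((clip, EX.species, Literal("Panthera leo")))
    g.add((clip, EX.mediaType, Literal("video")))
    g.add((clip, EX.targetAge, Literal("adult")))

    page = URIRef("http://example.org/arkive/text/lions-for-children")
    g.add((page, EX.species, Literal("Panthera leo")))
    g.add((page, EX.mediaType, Literal("text")))
    g.add((page, EX.targetAge, Literal("child")))

    # Select only the lion resources suitable for the children's site.
    query = """
        PREFIX ex: <http://example.org/arkive#>
        SELECT ?r WHERE {
            ?r ex:species "Panthera leo" ;
               ex:targetAge "child" .
        }
    """
    for row in g.query(query):
        print(row.r)  # -> http://example.org/arkive/text/lions-for-children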

12.7 Ontology support in the eLearning area

Quite a large number of ontology-based applications have been published for eLearning. The applications typically utilise several ontologies; examples are a domain ontology that captures knowledge of the subject being taught, an ontology that describes the learners and their goals and features, and an ontology that describes pedagogical approaches and how content should be presented. One such example is presented in (Zhuge & Li, 2004).

Another recent example of an eLearning-related application can be found in (Tane & al., 2004). The application is built on a domain ontology that captures the users' understanding of the topic. This ontology can be utilised when new material is searched for, and it can be updated and modified as the user gains more understanding of the subject.


Figure 16. A general architecture for constructivist learning (Zhuge & Li, 2004). Semi-automatic metadata specification is based on a course ontology that captures the structure, the content (what the resource is about) and the context (when to present it).

The Courseware Watchdog application of Tane & al. has the following main functionalities (a toy sketch of the second one is given after this list):

1. Visualisation and interactive browsing techniques allow browsing of the ontology and the knowledge base in order to improve the interaction of the user with the content.

2. A focused crawler finds related web sites and documents that match the user's interests. The crawl can be focused by checking new documents against the user's preferences as specified in terms of the ontology.

3. An Edutella peer enables querying for metadata on learning objects with an expressive query language, and allows publishing local resources in the P2P network.

4. A subjective clustering component is used to generate subjective views onto the documents.

5. An ontology evolution component comprises ontology learning methods which discover changes and trends within the field of interest.
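The following toy sketch in Python illustrates the spirit of the focused crawling step: a fetched page is kept only if it mentions one of the user's preferred ontology concepts or their subconcepts. The concept hierarchy and the term lists are invented for illustration; the actual Courseware Watchdog crawler is described in (Tane & al., 2004).

    # Hypothetical fragment of a domain ontology: each concept maps to
    # itself and its subconcepts.
    SUBCONCEPTS = {
        "machine_learning": {"machine_learning", "neural_networks", "clustering"},
        "databases": {"databases", "sql", "query_optimisation"},
    }

    def is_relevant(document_text, preferred_concepts, threshold=1):
        """Count how many preferred concepts, expanded through the
        ontology's subconcept relation, occur in the document."""
        text = document_text.lower()
        hits = 0
        for concept in preferred_concepts:
            for term in SUBCONCEPTS.get(concept, {concept}):
                if term.replace("_", " ") in text:
                    hits += 1
                    break
        return hits >= threshold

    print(is_relevant("An introduction to neural networks",
                      {"machine_learning"}))  # -> True

The point of the ontology here is the expansion step: a document about neural networks is accepted even though the user's preference was expressed at the more general level of machine learning.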


Figure 17. The architecture of the Courseware Watchdog application (Tane & al., 2004). The ontology has an important role in supporting document clustering, which is done not only by utilising the text in the documents but by combining it with the knowledge and concepts expressed in the ontology.


13 Conclusions

This report covered the main development trends, phenomena and technical issues relating to creating semantically supported media products and services. Semantically supported media products can be seen as part of the wider effort of creating the Semantic Web: the effort and vision of transforming the World Wide Web from a network of computers into a ubiquitous environment that provides resources and services. This approach facilitates the emergence of dynamic virtual organisations and communities, and other collaborative working co-operatives utilising service-oriented information systems over the Internet.

The sheer magnitude of the data on the Internet is unmanageable for humans, at least without powerful tools. To serve an end-user in a world with an increasing mass of information, it is necessary to go beyond words and specify the meaning of the resources available through various networks. This additional layer of interpretation attempts to capture the semantics of the data using formalised and machine-readable conceptual models, i.e. ontologies.

The vision of the Semantic Web is based on a huge number of resources that can be utilised intelligently. The mass effect can be, and is being, utilised also in relation to media resources. The web makes it possible for large numbers of unknown people to share resources (like images in Flickr), to construct knowledge (like Wikipedia and dmoz), and to produce valuable and meaningful resources. The challenge is to find business models that can utilise these efforts, and to connect media contents in a commercially sensible way.

Semantic applications offer many interesting opportunities, but there are also several challenges in building such solutions. Creating an ontology is a challenging task, because it requires that several people with different backgrounds understand the ontology in the same way. There are several ontology editing tools on the market, but their support for the total ontology lifecycle is poor. Tools for collaborative editing, testing, deployment and maintenance of ontologies are needed.

There is a lack of publicly available vocabularies and ontologies, and those that exist, such as IPTC, are often utilised only to a limited extent. Public vocabularies are important because media companies may receive content that is described with them. Public vocabularies are, however, often on a very general level, which limits their applicability and utilisation opportunities. IPTC is good for describing news, but not other types of content; the same is true of LOM. The more specific the information needed, the more unlikely it is that a public vocabulary contains it. The eLearning examples presented in this report showed that a real application needs support from several ontologies, and domain ontologies are in a key position there. In Finland, there is no common thesaurus like WordNet, which has been utilised in many projects. The Finnish YSA contains some terminology, but needs much work to become really useful, and its quality varies between sections. In any case, there are many possible ways to classify things, and it is obvious that one vocabulary or ontology can never meet all needs.


Creating metadata is very laborious if done manually. Therefore, another challenge is to develop methods and processes where metadata is created automatically, or as a by-product of the normal processes. Natural language technologies have been successfully used to analyse textual content. Automatic annotation of large image and video databases is still a problem, although automatic methods have been developed. The improvement in speech recognition techniques seems quite promising for the automatic annotation of video content. The opportunities to utilise users as metadata creators should also be explored, as well as capturing metadata along the resource life cycle.

Querying information from semantic systems has been problematic, because almost every vendor of a repository system has implemented their own query language. Fortunately, the World Wide Web Consortium (W3C) is now working to create an open standard, SPARQL, for querying RDF repositories. RDF repositories are thus in the process of getting a declarative query language, which will make it easier for developers to build RDF-based applications; a minimal query example is sketched at the end of this chapter.

Web Services are currently the most promising technology based on the idea of service-oriented computing. They provide the basis for the development and execution of business processes that are distributed over the network, even on uncontrolled computers. Web Services have the potential to offer interoperability across platforms and systems; they are neutral to built-in description languages, utilising standardised messages over system interfaces. Fundamental steps towards truly service-oriented computing include reasoning about Web Service semantics, i.e. the behaviour and equivalence of a service. This means realising registry services where retrieval is based on the meaning of a service and not just its name. From a different viewpoint, the emergence of virtual organisations and communities also calls for novel methods on how transactions between two parties are to be legally conducted. This means the inclusion of digital licenses and contracts between net parties, a system feature often underestimated by otherwise competent system engineers.

An important question in the RISE project is to find out how much and what kind of metadata is needed to support the reuse of media content by active future consumers. For example, is it necessary to know what an image contains, or is it enough to know that it was used together with a certain article? Due to the technical problems of standardising metadata languages and descriptions, and because of the difficulties in automatically creating metadata, there is a need to find the minimum amount of metadata needed. To minimise the metadata to what is most valuable and usable, there has to be an understanding of how the metadata will be applied. To use metadata efficiently, it is therefore critical to design metadata ontologies, descriptions etc. with the same kind of engineering principles that underlie any software design.
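The following minimal sketch shows what such a declarative query looks like in practice. It assumes the Python rdflib library and the SPARQL syntax as eventually standardised; the data is invented for illustration.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix dc: <http://purl.org/dc/elements/1.1/> .
        <http://example.org/articles/1> dc:title "Flood in Oulu" ;
                                        dc:subject "weather" .
        <http://example.org/articles/2> dc:title "Ice hockey final" ;
                                        dc:subject "sport" .
    """, format="turtle")

    # One declarative query replaces vendor-specific retrieval code.
    query = """
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        SELECT ?title WHERE {
            ?article dc:subject "weather" ;
                     dc:title ?title .
        }
    """
    for row in g.query(query):
        print(row.title)  # -> Flood in Oulu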


References

Castells, Perdrix, Pulido, Rico, Benjamins, Contreras & Lorés (2004). Neptuno: Semantic Web Technologies for a Digital Newspaper Archive. 1st European Semantic Web Symposium (ESWS 2004), Heraklion, Greece, May 2004. http://nets.ii.uam.es/neptuno/publications/neptuno-esws04.pdf

Dingley, A. & Shabajee, P. (2001). Use of RDF for content re-purposing on the ARKive project. Advanced Learning Technologies 2001, Proceedings, IEEE International Conference, 6-8 Aug. 2001, pp. 199-202.

Haase, Kenneth (2004). Context for Semantic Metadata. MM'04, October 10-16, 2004, New York, New York, USA.

Handschuh, S. & Staab, S. (2003). Annotation for the Semantic Web. Frontiers in Artificial Intelligence and Applications, Volume 96, 2003. 240 pp. ISBN 1-58603-345-X.

Hollink, Laura; Schreiber, Guus; Wielemaker, Jan & Wielinga, Bob (2003). Semantic Annotation of Image Collections. [http://www.cs.vu.nl/~guus/papers/Hollink03b.pdf] In S. Handschuh, M. Koivunen, R. Dieng and S. Staab (eds.): Knowledge Capture 2003, Proceedings of the Knowledge Markup and Semantic Annotation Workshop, October 2003.

Hunter (2003). Working towards MetaUtopia - A Survey of Current Metadata Research. DSTC Pty Ltd, University of Queensland, Australia. http://archive.dstc.edu.au/RDU/staff/janehunter/LibTrends_paper.pdf

Hyvönen, Junnila, Kettula, Mäkelä, Saarela, Salminen, Syreeni, Valo & Viljanen (2004). Finnish Museums on the Semantic Web: The User's Perspective on MuseumFinland. [http://www.archimuse.com/mw2004/papers/hyvonen/hyvonen.html] Museums and the Web Conference (MW 2004), March 31 - April 1, 2004, Arlington, USA.

Jaimes & Chang (2000). A Conceptual Framework for Indexing Visual Information at Multiple Levels. IS&T/SPIE Internet Imaging, Vol. 3964, San Jose, CA, USA, Jan. 2000. 14 p.

Kolari, Juha; Laakko, Timo; Hiltunen, Tapio; Ikonen, Veikko; Kulju, Minna; Suihkonen, Raisa; Toivonen, Santtu & Virtanen, Tytti (2004). Context-Aware Services for Mobile Users - Technology and User Experiences. [http://www.vtt.fi/inf/pdf/publications/2004/P539.pdf] VTT Information Technology, Espoo, 2004. 167 p. VTT Publications 539. ISBN 951-38-6396-4; 951-38-6397-2.

Little, Geurts & Hunter (2002). Dynamic Generation of Intelligent Multimedia Presentations through Semantic Inferencing. Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries, 2002, pp. 158-175. ISBN 3-540-44178-6.


Rutledge, Lloyd; Alberink, Martin; Brussee, Rogier; Pokraev, Stanislav; van Dieten, William & Veenstra, Mettina (2003). Finding the Story - Broader Applicability of Semantics and Discourse for Hypermedia Generation. HT'03, August 26-30, 2003, Nottingham, United Kingdom.

Mariappan & Aslandogan (2004). Vizipen: A System for Automatic Generation of Multimedia Concept Presentations. E-Learn 2004: World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, Washington, DC, Nov. 1-5, 2004. 8 p. [http://ranger.uta.edu/~alp/publications/E-Learn2004.pdf]

McCalla, G. (2004). The Ecological Approach to the Design of E-Learning Environments: Purpose-based Capture and Use of Information About Learners. Journal of Interactive Media in Education, 2004 (7). Special Issue on the Educational Semantic Web. ISSN 1365-893X. [www-jime.open.ac.uk/2004/7]

PRISM: Publishing Requirements for Industry Standard Metadata, Version 1.2. 2004-08-06. http://www.prismstandard.org/specifications/Prism1%5B1%5D.2RFC.pdf

Kim, Sanghee; Alani, Harith; Hall, Wendy; Lewis, Paul; Millard, David; Shadbolt, Nigel & Weal, Mark (2002). Artequakt: Generating tailored biographies with automatically annotated fragments from the web. In Proceedings of the Semantic Authoring, Annotation and Knowledge Markup Workshop at the 15th European Conference on Artificial Intelligence, Lyon, France, 2002.

Schroeter, Hunter & Kosovic (2004). "FilmEd - Collaborative Video Indexing, Annotation and Discussion Tools Over Broadband Networks". [http://metadata.net/filmed/pub/MMM04_FilmEd.pdf] International Conference on Multimedia Modelling, Brisbane, Australia, January 2004.

Tane, J., Schmitz, J. & Stumme, G. (2004). Semantic Resource Management for the Web: An E-Learning Application. WWW2004, May 17-22, 2004, New York, New York, USA. ACM 1-58113-912-8/04/0005.

Zhuge & Li (2004). Active e-course for constructivist learning. Proceedings of the 13th International World Wide Web Conference, Alternate Track Papers & Posters, 2004, pp. 246-247. ISBN 1-58113-912-8.


Appendix 1. IPTC NewsCodes, their main purpose and brief comments relating to applying them in the RISE project.

Each entry below gives the name of the NewsCodes set, the brief description by IPTC and, where applicable, a comment on its applicability in the RISE demo. A small sketch of attaching such codes to a content item follows the list.

Audiocoders: Vocabulary for various types of software based audio en/decoders currently in use.

Characteristics: List of names (not values!) to describe physical characteristics of content like "width" and "height" for photos, or "sampling rate" for audio.

Colorspace: Vocabulary to define colour space like RGB, YUV or CMY.

Confidence: Describes the degree of certainty that data assigned are correct. RISE comment: This information could be utilised to describe the confidence of user-created content.

Encoding: Vocabulary of popular encoding schemes used to transform data.

Format: Describes the technical format of a content like JPG for a picture, MP3 for audio, or NITF or PDF for text.

Genre *): Describes the nature, journalistic or intellectual characteristic of a news object, not specifically its content. RISE comment: This list includes genre descriptions that might be useful in reusing content in new aggregations. Some examples: advice, analysis, background, forecast, interview, profile, and retrospective.

How Present: Describes the way (e.g. prominent, or in passing) in which a topic occurs in the content of a news object.

Importance: Describes the relative significance of the metadata applied to a news object. RISE comment: This is used to tell how relevant or descriptive some metadata description is. It could be applied when there are several metadata descriptions relating to an object.

Labeltype: Describes the type of a label attached to a news object. (Labels are portions of human readable text, unlike most other metadata, which are considered to be primarily machine readable only.)

Location: List of identifiers used to describe regions of the world where events take place.

Media Type *): Describes the type of media in a very general way, like text, photo etc.

MIME Type: Describes the type of media in a more specific way by using IANA registered MIME types.

Newsitem Type *): Describes the type of content that a news item carries in a very general way.

Notation: Describes the technical notation of a piece of content.

Of Interest To: Describes the target audience for a NewsItem, based for example on demographics, geography or other groupings (see also "Relevance" below). RISE comment: Includes a rough age grouping as well as parents and teenagers.

Priority: Describes the relative importance of a NewsItem for distribution.

Property: NewsML specific: Describes the type of a NewsML Property element.

Provider: A unique ID assigned by the IPTC to a company, publication or service provider.

Relevance: Describes the extent to which a news object is relevant to the target audience specified by "Of Interest To" (see above).

Role: Describes the role of a news object within a package of several news objects, e.g. "Main" (content), "Supporting", or "Caption". RISE comment: This element is meant to describe the main components that can be included in a news story, not the internal structural elements.

Scene: Describes the scene of what is covered by the content. RISE comment: A vocabulary for describing image settings, for example profile, group, night scene, ...

Status: NewsML specific: The current usability of a NewsItem within NewsML.

Subject Code *): A three-level system for describing content by a well defined set of terms. Topics of level Subject provide a description of the editorial content of a news item at a high level, a SubjectMatter provides a description at a more precise level, and a SubjectDetail at a rather specific level. Currently about 1300 terms are available, and several of them can be assigned to a single news object, enabling a very narrow description of the content.

Subject Qualifier *): Subject Qualifiers provide a narrower, attribute-like context for e.g. a sports-related subject code.

Topic Type: NewsML specific: The kind of thing that the individual thing represented by the topic can be characterised as.

Urgency: Describes the relative importance of a news object for editorial examination.

Videocoder: Vocabulary for various types of software based video en/decoders currently in use.

NewsCodes sets marked with an asterisk *) are members of the older "Subject Reference System".
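As a sketch of how such codes might be attached to a content item in the RISE context, the following Python fragment uses the rdflib library with an invented property namespace. IPTC publishes NewsCodes as controlled vocabularies, not as an RDF schema, so all URIs and values below are hypothetical.

    from rdflib import Graph, Literal, Namespace, URIRef

    NC = Namespace("http://example.org/newscodes#")  # hypothetical

    g = Graph()
    item = URIRef("http://example.org/news/2005/flood-in-oulu")
    g.add((item, NC.mediaType, Literal("text")))
    g.add((item, NC.genre, Literal("analysis")))
    g.add((item, NC.subjectCode, Literal("06000000")))  # example code value
    g.add((item, NC.urgency, Literal("5")))

    print(g.serialize(format="turtle"))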
