Towards Multimedia Digital Libraries

Cláudio de Souza Baptista
Computer Science Department, University of Campina Grande
Av. Aprígio Veloso, 882, Bodocongó, Bloco CN, Sala 210
Campina Grande – PB, 58109-970, Brazil
Tel.: 55 83 3310 1122; Fax: 55 83 3310 1122

Ulrich Schiel
Computer Science Department, University of Campina Grande
Av. Aprígio Veloso, 882, Bodocongó, Bloco CN, Sala 210
Campina Grande – PB, 58109-970, Brazil
Tel.: 55 83 3310 1122; Fax: 55 83 3310 1122

Abstract
A multimedia digital library copes with the storage and retrieval of resources in different media, such as video, audio, maps, images and text documents. The main improvement over textual digital libraries is the possibility of retrieving documents in different media by combining metadata and content analysis. Content-based indexing and retrieval is a complex and ongoing research field, with specific problems for each medium. A prototype of a multimedia digital library is presented.

INTRODUCTION
Digital libraries are a combination of available resources, coupled with services which provide access to them. Although most of the resources are in digital form, and can therefore be retrieved from a client machine, some may be available only in hard copy. In such cases, indexing and searching services are provided, enabling end-users to discover which resources are available and where they can be located. Recent advances in multimedia technology, however, have radically changed information systems. Multimedia involves the manipulation not only of alphanumeric data, but also of other data types such as audio, video, images, maps and text. These data types are known as multimedia data, and the development of information systems that cope with them has become a highly attractive research area. Multimedia digital libraries are one such kind of information system. A multimedia digital library copes with the storage and retrieval of resources in different media, such as video, audio, maps, images and text documents. Previously, searching and indexing procedures were restricted to alphanumeric data types. For textual resources this is acceptable and efficient, but it is not for multimedia data types, where interpretation of their semantics is required for effective indexing and searching.
Furthermore, there are specific domains, such as spatial and temporal applications, which require tailored searching, browsing and indexing mechanisms. Multimedia digital libraries have some characteristics that make them different from other digital libraries. The main ones are presented below:



• Data model: due to the high complexity of multimedia data, it is imperative to provide a model with a high level of abstraction that can use a hierarchical approach to represent the content, relationships, structure, behavior and dynamics of objects.
• Large objects: multimedia digital libraries need to cope with large volumes of data. Instead of a few kilobytes to store a record in a conventional system, megabytes or even gigabytes of storage are required for multimedia objects.
• Indexing: multimedia digital libraries must provide new indexing techniques, such as content-based information retrieval, which supports not only exact-match queries but also similarity queries. In the latter, fuzzy operators may be needed, and a ranked list of approximate matches is returned as the result. Because of the large size of objects and special features such as continuous playback, new indexing and buffering techniques that meet real-time and synchronization constraints are necessary.
• Interface: a multi-modal interface is required, with facilities such as visual query, browsing, audio-visual interaction and virtual reality.
• Pre-processing: multimedia data must be treated before they are used; such procedures include compression, data quality enhancement, and the addition of metadata in order to give more semantic information to raw data.

This paper describes the main issues in designing a multimedia digital library. We discuss background issues on digital libraries and highlight the relevance of multimedia metadata. The next section focuses on a new query paradigm based on content-based retrieval for images, video and audio. Finally, future trends and a conclusion are presented.

BACKGROUND

Digital Libraries Evolution
Digital libraries have evolved from the concepts associated with traditional paper-based libraries. They include mechanisms that support electronic documents in different formats and media, which raise new issues and challenges.

The first generation is characterized by defining the role of a library and the services it provides, without making use of information systems. The collections and resources are indexed and searched via manual index cards. Users are allocated cards with their personal details, plus loan and reservation information. A library can be viewed as a collection of resources and services. Resources include books, journals, magazines, games, maps, videos and audio material. Services include loan, reservation, searching, and facilities to physically access the collections. There is specific copyright legislation, in which ownership and authorship are clearly defined. Rules for accessing the collection are specified in advance, and there is a community of users who obtain authorization in order to use the library services and resources. As a rule, the resources do not change; for instance, an individual book will never change its contents and authorship, although new editions of the same book might appear. Various people, including staff members and users, interact with the library. Users are the consumers of the library resources and they use its services. It is also important to mention that, as the resources are physical, the loan and reservation services are very important.
When users borrow a copy of a specific resource, they are allowed to keep it for a pre-specified period of time, established according to the library's policies. Furthermore, when users need resources that have already been lent to other users, the reservation service is used. Lastly, libraries need a physical location to store the collections, and they have fixed opening hours.

The second generation involves the computerization of the library system, which transforms the manual card system into an electronic one. In this case, collections and resources are indexed and searched via special-purpose software, and other services, such as loan and reservation, are also computerized. Initially, each library developed an individual system which could be accessed locally. The Online Public Access Catalogue software, commonly known as OPAC, was widely adopted as the library system. Although it is still used in libraries, OPAC has several limitations, such as a poor user interface, and it often provides a centralized solution implemented on expensive mainframes. Apart from these limitations, OPAC provides information about user borrowing details; searches based on different attributes such as title, author, subject, ISBN and classmark; Boolean search with stop-lists (lists of words that are ignored by the search); searches based on the type of resource, such as book or periodical; and browsing based on the attributes mentioned previously.
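To make the OPAC-style search described above more concrete, here is a minimal sketch of an attribute-restricted Boolean search (AND and NOT only) that ignores stop-list words. The record fields, the stop list, the sample catalogue and the placeholder ISBNs are illustrative assumptions; this is not how any particular OPAC product is implemented.

```python
from dataclasses import dataclass

# Hypothetical stop list: common words that the search ignores.
STOP_LIST = {"a", "an", "and", "in", "of", "on", "the", "to"}


@dataclass
class Record:
    title: str
    author: str
    subject: str
    isbn: str
    resource_type: str  # e.g. "book" or "periodical"


def keywords(text: str) -> set[str]:
    """Lower-case the text, split it on whitespace and drop stop-list words."""
    return {w for w in text.lower().split() if w not in STOP_LIST}


def boolean_search(records, all_of=(), none_of=(), resource_type=None):
    """Return the records whose title and subject words contain every term in
    `all_of` (AND) and none of the terms in `none_of` (NOT), optionally
    restricted to one type of resource."""
    required = {t.lower() for t in all_of} - STOP_LIST
    excluded = {t.lower() for t in none_of} - STOP_LIST
    hits = []
    for rec in records:
        if resource_type is not None and rec.resource_type != resource_type:
            continue
        words = keywords(f"{rec.title} {rec.subject}")
        if required <= words and not (excluded & words):
            hits.append(rec)
    return hits


# Illustrative catalogue; the ISBNs are placeholders.
catalogue = [
    Record("Digital Libraries", "Arms, W.", "information retrieval", "isbn-0001", "book"),
    Record("Journal of Digital Libraries", "various", "digital libraries", "isbn-0002", "periodical"),
    Record("Database Systems", "Garcia-Molina, H.", "databases", "isbn-0003", "book"),
]

for rec in boolean_search(catalogue, all_of=["the", "digital", "libraries"], none_of=["journal"]):
    print(rec.title)  # prints "Digital Libraries"
```

Because "the" is on the stop list, it is silently dropped from the query, which is the behavior a stop-list is meant to provide.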

Another major problem not addressed by OPAC was the ability to interconnect and interoperate across a network of library catalogue servers. With the advent and acceptance of the Internet as the infrastructure upon which interconnectivity and interoperability can be built, it became clear how important and feasible it was to provide interlibrary communication. A user could therefore pose a query that traverses and retrieves information from different library catalogues distributed across different locations. This required the adoption of standards enabling interoperability between geographically distributed libraries. To fulfill that requirement, the ANSI/NISO Z39.50 standard was adopted (Z39.50, 1995). This standard defines an application protocol which enables access to heterogeneous and distributed resources through a single interface. Z39.50 is based on a client-server architecture which functions as a seamless gateway to remote database systems. The Z39.50 facilities include connection, searching, retrieval, handling of error messages, and access to information about content, such as a database schema.

The main advance in the third generation is that library information systems now provide not only indexing and search services, but also retrieval, as resources move from a hard-copy, paper-based format to a mainly digital one. Like traditional libraries, digital libraries are a combination of available resources, coupled with services which provide access to them. Although most of the resources are in digital form, and can therefore be retrieved from a client machine, some may be available only in hard copy. In such cases searching services are provided, enabling end-users to discover which resources are available and where they can be located. For digital documents the indexing service can be completely automatic, which is not the case for hard-copy ones.
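The Z39.50 idea of reaching several heterogeneous catalogue servers through one interface can be pictured with a small federated-search sketch. It does not implement the Z39.50 protocol: the `Catalogue` interface, the `InMemoryCatalogue` stand-in and `federated_search` are hypothetical names chosen for illustration, and a real adapter would translate the query into each server's own protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class Catalogue(Protocol):
    """Common interface that every (hypothetical) catalogue adapter exposes."""
    name: str

    def search(self, title_word: str) -> list[dict]: ...


class InMemoryCatalogue:
    """Stand-in for a remote library server; a real adapter would translate
    the query into that server's own protocol and record format."""

    def __init__(self, name: str, records: list[dict]):
        self.name = name
        self._records = records

    def search(self, title_word: str) -> list[dict]:
        word = title_word.lower()
        return [r for r in self._records if word in r["title"].lower()]


def federated_search(catalogues: list[Catalogue], title_word: str) -> list[dict]:
    """Send the same query to every catalogue concurrently and merge the hits,
    tagging each one with the catalogue it came from."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda c: (c.name, c.search(title_word)), catalogues))
    return [dict(hit, source=name) for name, hits in answers for hit in hits]


lib_a = InMemoryCatalogue("lib-a", [{"title": "Digital Libraries"}])
lib_b = InMemoryCatalogue("lib-b", [{"title": "Multimedia Digital Libraries"}, {"title": "Databases"}])
print(federated_search([lib_a, lib_b], "digital"))
```

The user poses one query; the gateway hides where the matching records live, which is the "single interface" property the standard aims for.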
Some researchers argue that digital libraries will deal with both digital and traditional resources for many years to come. Digital libraries involve actors, who interact with the system, and components, which execute the different services provided. These actors can be categorized according to the role they play in the system; they include data-providers, data-consumers, and librarians (or data managers). Data-providers are responsible for creating and collecting the data sets in a way that makes them interesting for the other classes of users. This is accomplished by providing a rich semantic description of the data sets, usually using metadata. Data-consumers are the digital library end-users who use its services in order to discover a particular data set that meets their requirements in a particular application domain. Finally, librarians are responsible for the administration of the digital library services and resources. Their role includes organizing the classification of the collections, including new documents, defining the policies and rules of use, maintaining a catalogue of users and data-providers, and deciding which other digital libraries to intercommunicate with.

The fourth generation introduces the retrieval of resources in different media, such as video, audio, maps, images and text documents. Previously, searching and indexing procedures were restricted to alphanumeric data types. For textual resources this is acceptable and efficient, but it is not for multimedia data types, where interpretation of their content is required for effective indexing and searching. Furthermore, documents with spatial and temporal information require tailored searching, browsing and indexing mechanisms.
This generation is still evolving; while it is feasible to think in terms of a general digital library that deals with all the complexities of those different data types, it is likely that specific type-dependent data repositories, such as video, image, geo-referenced and textual digital libraries, will emerge. It is imperative that these libraries interoperate so that queries across different digital libraries supporting different data types can be accomplished.

Important Issues on Digital Libraries
The main innovation in the field of digital libraries is that most resources are in electronic format. In this format, there is no need for physical resources linked to loan, access and reservation. Resources are ideally held in a distributed database that can be accessed over the Internet. Instead of borrowing a hard copy of the document from the library, the user downloads a new copy. The quality of the data can become more difficult to assess if the Internet is used not only for client access to the library but also as a repository of information. If, following some researchers' definitions of the term digital library, the Internet is viewed as a huge digital library, the problem of data quality becomes acute (Arms, 2001). Consequently, data accuracy, origin and integrity become concerns, since they are not as easily assessed as in traditional libraries. There are difficulties in determining how to charge for library services and, especially, how to guarantee copyright on the data downloaded by users. Moreover, security is a major issue given the increasing number of new viruses and hacker attacks. This issue is addressed later in this chapter. Social and psychological aspects must be taken into consideration in the move to a digital format, given that everything is now accessible via a computer system and less human interaction is therefore required. This can make it difficult to use the library effectively, so usability issues must be thoroughly addressed. Further, multi-language interfaces and facilities, such as thesauri and translators, should also be provided.
On the other hand, a reservation service becomes useless, since an unlimited number of soft copies of a document can be made. Security is a major problem in digital libraries, particularly with respect to unauthorized use of library resources. The usual security approach is to establish access control over the library resources. Under this arrangement, data-consumers should have a registration record with their contact information, and should be given a login name for identification and a password for authentication. A security log recording every access should be kept in order to enable effective auditing. Ethical policies should be explained to all users to make sure they use the library appropriately.

Copyright is another important issue in digital libraries, as governments have not yet agreed on a method by which to effectively establish copyright laws for digital data (Onsrud & Lopez, 1998). The problem of copyright legislation is more evident now that data can be downloaded, and each country may have its own specific legislation. Guaranteeing that users will not alter data and resell them is a high priority. Spatial data are usually very expensive to capture and generate, so it is highly important that intellectual property rights be imposed and obeyed. Moreover, users are usually interested only in a specific part of a spatial data set. Copyright is related to the use, replication and update of data, and usually lasts for a certain period of time. Aslesen (1998) classifies the first as usage rights and the latter two as marketing rights (which include selling and distribution processes).
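The access-control arrangement described above (registration with contact information, login name plus password, and a security log for auditing) can be sketched with the Python standard library. The class name, the in-memory storage and the PBKDF2 iteration count are assumptions for illustration; per-resource authorization rules are deliberately left out.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone

ITERATIONS = 100_000  # PBKDF2 work factor (an assumed value)


class AccessControl:
    """Registration, password authentication and an audit log of every access."""

    def __init__(self):
        self._users = {}     # login -> (salt, password hash, contact info)
        self.audit_log = []  # (UTC timestamp, login, resource, granted)

    @staticmethod
    def _hash(password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

    def register(self, login: str, password: str, contact: str) -> None:
        salt = os.urandom(16)  # a fresh random salt per user
        self._users[login] = (salt, self._hash(password, salt), contact)

    def _authenticate(self, login: str, password: str) -> bool:
        entry = self._users.get(login)
        if entry is None:
            return False
        salt, stored, _contact = entry
        # Constant-time comparison avoids leaking how many bytes matched.
        return hmac.compare_digest(self._hash(password, salt), stored)

    def access(self, login: str, password: str, resource: str) -> bool:
        """Grant access only to authenticated users and log every attempt."""
        granted = self._authenticate(login, password)
        self.audit_log.append((datetime.now(timezone.utc), login, resource, granted))
        return granted


acl = AccessControl()
acl.register("maria", "s3cret", "maria@example.org")
print(acl.access("maria", "s3cret", "map-042"))  # True
print(acl.access("maria", "wrong", "map-042"))   # False
print(acl.access("joao", "s3cret", "map-042"))   # False (not registered)
```

Only a salted hash of each password is stored, and failed attempts are logged as well as successful ones, so the log can support the auditing the text calls for.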

MULTIMEDIA METADATA
Metadata have been defined in the computing literature as data about data, information needed to make data useful, or information that describes the content, quality, condition and other characteristics of data (Sheth & Klas, 1998). Metadata aid the search, semantic interpretation and retrieval of data. The metadata concept is not new and has been used transparently in several applications for many years. However, the wide dissemination of information on the Internet and the large data sets available mean that metadata now play a crucial role by enabling more accurate searches, better data quality and interoperability across these huge data sets. Sometimes it is very difficult to draw a clear boundary between data and metadata. For instance, one can ask whether a thumbnail of an image should be considered data or metadata. The same applies to an abstract of a document, or to a video or audio clip. One solution is to treat metadata as data themselves, integrating metadata into the data model. This approach provides transparency and simplicity in the way metadata are handled (Bohm & Rakow, 1994). Textual metadata for multimedia documents have been investigated for many years and, as a result, standards such as SGML and Dublin Core have been proposed. Bohm and Rakow present a classification of metadata for multimedia resources in which SGML is emphasized (Bohm & Rakow, 1994). Some examples of multimedia metadata are:
• Audio: number of samples per second, number of channels, audio class, the coding in which it has been recorded, and speaker identification;
• Video: duration, number of frames per second, compression technique, color, texture, lighting, video class and keyframes;
• Image: resolution (dpi), format, compression technique, color histogram, image brightness, and object name, location and composition;
• Text: indices on word tokens, author name, date, publication, keywords and publisher.
Dublin Core is a proposal for a metadata element set used for the discovery and cataloguing of resources stored on the Web.
The Dublin Core metadata set was initially proposed at a Metadata Workshop in 1995 (Weibel et al., 1995). Dublin Core was designed to be extensible (it permits the addition of new metadata elements), interoperable (it can be used by several resource discovery tools) and simple to use (it can be used by novice authors without much background in structuring documents for library cataloguing). The Dublin Core element set has fifteen elements: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage and rights. Furthermore, Dublin Core is syntax-independent, which means that it can be encoded in languages such as HTML and XML. Also, all elements are repeatable, modifiable and optional. The elements can be refined using qualifiers.
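As an illustration of the XML encoding and of the optional, repeatable elements described above, the following sketch serializes a Dublin Core description with Python's `xml.etree.ElementTree`. The namespace URI is the standard DCMI one for the fifteen elements; the `record` wrapper element and the `dublin_core_xml` function are assumptions made for this example, not part of the Dublin Core specification.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)
ET.register_namespace("dc", DC_NS)  # serialize elements as dc:title, dc:creator, ...


def dublin_core_xml(metadata: dict[str, list[str]]) -> str:
    """Serialize a Dublin Core description to XML. Every element is optional
    and repeatable, so each element name maps to a list of values."""
    unknown = set(metadata) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    record = ET.Element("record")  # wrapper element name is an assumption
    for name in DC_ELEMENTS:  # emit elements in the canonical order
        for value in metadata.get(name, []):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(record, encoding="unicode")


print(dublin_core_xml({
    "title": ["Towards Multimedia Digital Libraries"],
    "creator": ["Baptista, C. S.", "Schiel, U."],
    "type": ["Text"],
}))
```

Two `creator` values produce two `dc:creator` elements, and elements that are absent from the dictionary are simply omitted.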

QUERYING DIGITAL LIBRARIES USING CONTENT-BASED RETRIEVAL
A classic way of querying a digital library is by object attributes. For example, a user may find the authors of a book by accessing the author attribute of a Books table. Exact match is used to retrieve the objects which satisfy the query constraints. This paradigm, however, is not applicable to multimedia data, and a new one has been developed for querying multimedia databases. This new paradigm is known as content-based retrieval (Gupta & Jain, 1997; Hsu, Chu & Taira, 1996; Smoliar & Zhang, 1994). Although queries are usually posed in textual form, some isolated efforts have been made to express the query itself in the medium to be retrieved. For instance, a system for music retrieval by humming has been developed by Ghias, Logan, Chamberlin and Smith (1995). In content-based retrieval, users may pose a query for something that is similar to the information provided. Basically, the user queries the database by object content. This concept is in fact not new; it is derived from the domain relational calculus (Garcia-Molina, Ullman & Widom, 2001). One of the best-known implementations of this calculus is the query language QBE (Query by Example) proposed by IBM (Zloof, 1977). The novelty of content-based retrieval is that queries are submitted to a multimedia database, so the system not only compares strings but also tries to determine degrees of similarity among images, sounds and videos. Therefore, the system must provide a large collection of image, video and audio processing tools, such as analysis, classification and pattern recognition for these data types.

The concept of similarity is fundamental in content-based retrieval. In conventional database systems, data are stored and retrieved by keywords or numbers, which enables the system to manipulate data efficiently. Usually, when a query is submitted to the system, an exact-match response is returned, as, for example, with the query 'retrieve all employees whose salaries are 1,500.00 pounds'. In image databases, exact matching is replaced by similarity matching. For instance, with a facial database in a police station, a police officer could ask the database for pictures similar to a suspect's picture. In this case, if more than one picture approximates that of the suspect, the system creates an ordered list in which a rank expresses to what extent each picture in the database is similar to the suspect's picture.
Query by content is usually based on the following features:
• Color: the user asks for images similar to a particular one with respect to color distribution. The system uses a color histogram to match the images that satisfy the user's query. The user can provide more details, such as foreground and background colors, as in the query 'retrieve the images that have green in the foreground and blue in the background'. Special words such as 'mostly' and 'some' can be used to give more semantics to the user's query, as in 'retrieve images that are mostly pink' or 'show me the images that have some yellow'.
• Texture: using mathematical representations, images can be classified according to the scale of their texture. An example of a texture query is 'retrieve images that have a texture similar to this sample'.
• Shape: information such as area and circularity can be used to retrieve objects by their shapes, as in the query 'retrieve images that have a circle inside a square'.
• Sound: using audio recognition algorithms, the system reads samples provided by the user and looks for similar sample sequences in an audio database, for example 'retrieve all songs that have a tune similar to this one'.
• Spatial: using spatial operators such as right-of, left-of, in-front-of, behind, beside, inside and far-from, the system can run spatial queries such as 'retrieve images that have a house with a hospital as its right-hand neighbor'.

• Temporal: using temporal topological operators, queries such as 'retrieve videos that show this image after that one' are possible.
• Video: in this case the system must provide algorithms for indexing and retrieving video segments. Using a high-level language, the user can ask about a given scene in a movie or about objects contained in it, for example 'retrieve clips that contain a flying bird'.
• Text: using text pattern recognition algorithms and image similarity retrieval, the system is able to match text against a sample. For instance, in a text database, users may want to know all books that mention a given quotation. Another example is a bank system that implements automatic signature recognition.
Several content-based retrieval products have been proposed so far, including IBM QBIC (Flickner et al., 1995), VIRAGE (Bach et al., 1996), Excalibur (Informix, 1998), Blobworld (Carson et al., 1999) and VisualSEEk (Smith & Chang, 1996).

A MULTIMEDIA DIGITAL LIBRARY PROTOTYPE
We have developed a multimedia digital library prototype called Freebie (Jamesson et al., 2006), an open-source multimedia digital library which copes with image, video, audio and text. Freebie was designed as a four-tier architecture using the Model-View-Controller design pattern. The tiers are described below. The Viewer tier is responsible for the user interface. It was developed using HTML, JSP, JavaScript and Struts. It is composed of the following pages: data insertion, data maintenance, searching, metadata result presentation, document details, and an error page. The Controller tier is responsible for receiving requests from the users, through the viewer, and transforming them into actions which call the Freebie functions in the Model tier. These functions, which make up the Freebie business logic, are: object inclusion, metadata display, metadata update, object deletion and object searching.
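The role of the Controller tier described above, turning a request from the view into a call on a Model-tier function, can be sketched as a simple dispatch table. Freebie itself is written in Java with Struts; the Python classes, the action names and the in-memory store below are hypothetical and only mirror the five business functions listed in the text.

```python
from itertools import count


class Model:
    """Hypothetical in-memory stand-in for the Model and Persistence tiers."""

    def __init__(self):
        self._store = {}      # object id -> Dublin Core metadata
        self._ids = count(1)  # sequential object ids

    def include_object(self, metadata: dict) -> int:
        oid = next(self._ids)
        self._store[oid] = dict(metadata)
        return oid

    def show_metadata(self, oid: int) -> dict:
        return self._store[oid]

    def update_metadata(self, oid: int, **changes) -> dict:
        self._store[oid].update(changes)
        return self._store[oid]

    def delete_object(self, oid: int) -> dict:
        return self._store.pop(oid)

    def search_objects(self, **criteria) -> list[int]:
        return [oid for oid, md in self._store.items()
                if all(md.get(k) == v for k, v in criteria.items())]


class Controller:
    """Translates an action named by the view into a call on the model."""

    def __init__(self, model: Model):
        self._actions = {
            "include": model.include_object,
            "show": model.show_metadata,
            "update": model.update_metadata,
            "delete": model.delete_object,
            "search": model.search_objects,
        }

    def handle(self, action: str, *args, **kwargs):
        handler = self._actions.get(action)
        if handler is None:
            raise ValueError(f"unknown action: {action}")
        return handler(*args, **kwargs)


controller = Controller(Model())
oid = controller.handle("include", {"title": "City map", "type": "Image"})
controller.handle("update", oid, title="Campina Grande city map")
print(controller.handle("search", type="Image"))  # [1]
print(controller.handle("show", oid)["title"])    # Campina Grande city map
```

Keeping the action table in the controller means the view never calls model functions directly, which is the separation the MVC pattern is meant to enforce.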
The Persistence tier is implemented in the PostgreSQL object-relational database system. Communication between the Java classes and the data stored in the database server is done via the JDBC API. When a multimedia object is entered into the system, metadata are extracted automatically, depending on the object's type, and inserted into the database using the Dublin Core metadata set. For text documents, we use the Jakarta POI Java API to handle files in the Microsoft OLE2 format. We also extract text from PDF files using the PDFBox Java API. Text indexing is done using Tsearch2 (Full Text Search), a PostgreSQL package which parses a text, removes the stop words, indexes the remaining terms using GiST (Generalized Search Trees) and stores these terms in the database server. Users may then query the indexed document using any word of the text, in addition to the traditional query on metadata.

Concerning images, Freebie extracts metadata from JPEG images through EXIF (Exchangeable Image File Format), a format used by many digital cameras. Also, when an image in any format is inserted into the digital library, a thumbnail is automatically generated and stored as metadata for that image, so that users may view the thumbnail in the metadata result set before downloading the full image. EXIF may contain data about the camera manufacturer and model, date and time, resolution, exposure, flash, geographic coordinates, title, comments (description), author, copyright and so on. Freebie uses the Metadata Extraction Java API to extract EXIF metadata. Video and audio are stored at a URL, and their Dublin Core metadata must be entered manually into the digital library. We are working on the integration of Freebie with VideoLib, so that the query capabilities of the latter may be used in the former (Rego et al., 2007).

Users may query the multimedia objects in Freebie using any of the Dublin Core elements. Moreover, three kinds of spatial coordinates are associated with the documents: those concerning the content, the authors and the place of publication. This information can also be used in queries. Queries on documents are posed in a separate page, where users input the terms they would like to search for. After the query is submitted, a result set metadata page is displayed, as shown in Figure 1. Users may select one of the returned objects to retrieve its full metadata description and full content.

Figure 1. Freebie result set metadata.
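The text-indexing step performed by Tsearch2 in Freebie (parse the text, drop stop words, index the remaining terms so any word can be queried) can be pictured with a plain inverted index. This is not Tsearch2's GiST-based implementation or its API; the stop-word list, the tokenizer and the class name are simplifying assumptions.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}


def tokenize(text):
    """Lower-case word tokens (Unicode letters, digits, underscore), minus stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


class FullTextIndex:
    def __init__(self):
        self._postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text):
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def query(self, *words):
        """Ids of documents containing all query words (AND semantics)."""
        terms = tokenize(" ".join(words))
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = FullTextIndex()
index.add(1, "Freebie is a digital library with textual and spatial search.")
index.add(2, "VideoLib: a video digital library with spatial and temporal dimensions.")
print(sorted(index.query("digital", "library")))  # [1, 2]
print(index.query("spatial", "video"))            # {2}
```

A database-backed index such as Tsearch2 keeps the same term-to-document mapping inside the database server, so text queries can be combined with ordinary metadata predicates.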

FUTURE TRENDS AND CONCLUSION
Multimedia digital libraries are definitely on the digital library research agenda, and there are many issues which could be explored to improve the services provided by such libraries. The first issue is to address multimodal user interfaces, so that users may not only query the underlying multimedia documents effectively using an appropriate search tool, but also have an appropriate environment for the ingestion process and management activities.

The second issue is to explore new indexing techniques for audio and video, and to extend content-based query processing to these media. Furthermore, many multimedia data types have spatio-temporal characteristics, which should also be adequately indexed and retrieved. The third issue is the integration of mobile computing with digital libraries, so that users may access multimedia content from mobile devices anywhere and at any time. There are many concerns, such as device limitations, context awareness and low bandwidth. The fourth issue is the extension of well-known digital library standards, such as Z39.50 and OAI, to cope with multimedia content. Lastly, the use of service-oriented architectures enhanced with multimedia capabilities is a new trend in distributed multimedia digital libraries.

REFERENCES
ANSI/NISO Z39.50 (1995). Information Retrieval (Z39.50): Application Service Definition and Protocol Specification. Technical report, Z39.50 Maintenance Agency.
Arms, W. (2001). Digital Libraries (2nd ed.). MIT Press.
Aslesen, L. (1998). Intellectual Property and Mapping: a European Perspective. In P. Burrough & I. Masser (Eds.), European Geographic Infrastructures: Opportunities and Pitfalls, GISDATA 5, pp. 127-135. Taylor & Francis.
Bach, J., Fuller, C., Gupta, A., Hampapur, A., Horowitz, B., Humphrey, R., Jain, R. & Shu, C. (1996). The Virage Image Search Engine: an Open Framework for Image Management. SPIE Conference on Storage and Retrieval for Still Image and Video Databases IV.
Baldonado, M., Chang, C., Gravano, L. & Paepcke, A. (1997). The Stanford Digital Library Metadata Architecture. International Journal of Digital Libraries, 1:108-121.
Bohm, K. & Rakow, T. (1994). Metadata for Multimedia Documents. ACM SIGMOD Record, 23(4):21-26.
Carson, C., Thomas, M., Belongie, S., Hellerstein, J. M. & Malik, J. (1999). Blobworld: A System for Region-based Image Indexing and Retrieval. In Third International Conference on Visual Information Systems. Springer-Verlag.
Flickner, M., Sawhney, H., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D. & Yanker, P. (1995). Query by Image and Video Content: the QBIC System. IEEE Computer, 28(9):23-32.
Garcia-Molina, H., Ullman, J. D. & Widom, J. (2001). Database Systems: The Complete Book (1st ed.). Prentice Hall.
Ghias, A., Logan, J., Chamberlin, D. & Smith, B. (1995). Query By Humming: Musical Information Retrieval in an Audio Database. ACM Multimedia.
Gupta, A. & Jain, R. (1997). Visual Information Retrieval. Communications of the ACM, 40(5):71-79.
Hsu, C.-C., Chu, W. & Taira, R. (1996). A Knowledge-Based Approach for Retrieving Images by Content. IEEE Transactions on Knowledge and Data Engineering, 8(6):522-532.
Informix Inc. (1998). Informix Answers Online. Version 1.91 (CD-ROM).
Jain, R. & Hampapur, A. (1994). Metadata in Video Databases. ACM SIGMOD Record, 23(4):27-33.
Jamesson, W., Baptista, C. S., Schiel, U., Silva, E. R., Menezes, L. C. & Fernandes, R. M. (2006). Freebie: Uma Biblioteca Digital Baseada em Software Livre com Suporte a Buscas Textual e Espacial [Freebie: a digital library based on free software with support for textual and spatial search]. Proceedings of the 12th Brazilian Symposium on Multimedia and the Web (WebMedia 2006), pp. 155-164.
Onsrud, H. & Lopez, X. (1998). Intellectual Property Rights in Disseminating Digital Geographic Data, Products and Services: Conflicts and Commonalities among EU and US Approaches. In P. Burrough & I. Masser (Eds.), European Geographic Infrastructures: Opportunities and Pitfalls, GISDATA 5, pp. 127-135. Taylor & Francis.
Rego, A. S., Baptista, C. S., Silva, E. R., Schiel, U. & Figueirêdo, H. F. (2007). VideoLib: a Video Digital Library with Support to Spatial and Temporal Dimensions. Proceedings of the 22nd Annual ACM Symposium on Applied Computing, Seoul, Korea.
Sheth, A. & Klas, W. (Eds.) (1998). Multimedia Data Management: Using Metadata to Integrate and Apply Digital Media. McGraw-Hill.
Smith, J. & Chang, S. (1996). VisualSEEk: a Fully Automated Content-based Image Query System. In Proceedings of the Fourth ACM Multimedia Conference (MULTIMEDIA '96), pp. 87-98. New York, NY, USA: ACM Press.
Smoliar, S. & Zhang, H. (1994). Content-Based Indexing and Retrieval. IEEE Multimedia, 1(2):62-72.
Weibel, S., Godby, J., Miller, E. & Daniel, R. (1995). OCLC/NCSA Metadata Workshop Report. Technical report, Office of Research, OCLC Online Computer Library Center, Inc.
Zloof, M. (1977). Query-By-Example: a Database Language. IBM Systems Journal, 16(4):324-343.
KEY TERMS
Content-based extraction: the process of extracting information from non-text multimedia data.
Document indexing: the process of extracting the information contained in a document and creating an index. While the index is always a set of terms, the source document may be text or other media, such as video, images or sound.
Digital Library: an environment for the retrieval of digital documents. In contrast to a conventional library, documents of interest are not taken away from the library but are downloaded or displayed on the user's host.
Information retrieval: the process of finding information in a set of documents by means of a computer.
Multimedia data: data in the form of text, video, images, maps or sound.
Multimedia document: a document that contains more than one medium. Whereas hard-copy documents can combine only text, images and maps, electronic documents can contain any combination of multimedia data.