Encryption and Authentication for Scalable Multimedia - CiteSeerX

Encryption and Authentication for Scalable Multimedia: Current State of the Art and Challenges

Bin B. Zhu1, Mitchell D. Swanson2, Shipeng Li1

1 Microsoft Research Asia, Beijing, 100080, China
2 General Dynamics Advanced Information Systems, Bloomington, MN 55431, USA

ABSTRACT Scalable coding is a technology that encodes a multimedia signal in a scalable manner where various representations can be extracted from a single codestream to fit a wide range of applications. Many new scalable coders such as JPEG 2000 and MPEG-4 FGS offer fine granularity scalability to provide near continuous optimal tradeoff between quality and rates in a large range. This fine granularity scalability poses great new challenges to the design of encryption and authentication systems for scalable media in Digital Rights Management (DRM) and other applications. It may be desirable or even mandatory to maintain a certain level of scalability in the encrypted or signed codestream so that no decryption or re-signing is needed when legitimate adaptations are applied. In other words, the encryption and authentication should be scalable, i.e., adaptation friendly. Otherwise secrets have to be shared with every intermediate stage along the content delivery system which performs adaptation manipulations. Sharing secrets with many parties would jeopardize the overall security of a system since the security depends on the weakest component of the system. In this paper, we first describe general requirements and desirable features for an encryption or authentication system for scalable media, esp. those not encountered with the non-scalable case. Then we present an overview of the current state of the art of technologies in scalable encryption and authentication. These technologies include full and selective encryption schemes that maintain the original or coarser granularity of scalability offered by an unencrypted scalable codestream, layered access control and block level authentication that reduce the fine granularity of scalability to a block level, among others. Finally, we summarize existing challenges and propose future research directions. 
Keywords: Multimedia encryption, multimedia authentication, selective encryption, scalable encryption, scalable authentication, security, digital rights management, DRM, layered access control, JPEG 2000, MPEG-4 FGS, fine granularity scalability

1. INTRODUCTION Recent advances in wired and wireless communications and increasing hardware capability have led to a phenomenal growth of multimedia applications in our daily lives. Many efforts have been pursued to enable users to access multimedia anywhere, anytime, with any devices. Networks may have different capacities and characteristics. Devices may have different display sizes and computing capabilities. To realize access to multimedia anywhere, anytime, and with any devices, a traditional approach is to compress a single multimedia signal into multiple copies, with each copy targeted at a specific application scenario such as a PC with wideband Internet access, a 3G cellular phone, etc. These multiple copies are all stored in a server to make them available for each individual user to select a copy that best fits his or her need. Another approach is to apply a transcoder at some node of the multimedia delivery path to generate a lower resolution or quality bitstream to fit the targeted network condition or device capability. Both approaches may need to be combined since multiple copies of a multimedia signal can only address a given number of preset client capabilities and network conditions.

Author contact information: Bin Zhu (contact author): [email protected], Mitchell D. Swanson: [email protected], Shipeng Li: [email protected]

A more elegant solution is to encode a multimedia signal with a scalable codec. Scalable codecs have been attracting increased interest and attention in recent years in both industry and academia due to their flexibility and easy adaptation to a wide range of applications. A scalable codec encodes a signal into a single codestream which is partitioned and organized according to certain scalable parameters or importance. Based on scalabilities offered by a codestream, each individual user can extract from the same codestream the best representation that fits his or her application. Different scalable codecs offer different scalabilities. Possible scalabilities include quality, rate, spatial, temporal, color space, etc. Early scalable codecs provide layered scalability [1][2]. Later scalable codecs provide fine granularity scalability (FGS) [3][4]. An FGS scalable codec offers near continuous optimal tradeoff between quality and rates in a large range. Unlike the traditional approaches, a single scalable codestream is stored and used for all different applications, with possible simple adaptation manipulations such as truncation on the codestream. Scalable coding saves a lot of storage space and transmission bandwidth. This capability of “one compression to meet the needs of all applications” is very desirable in many multimedia applications. Several DCT- or wavelet-based scalable codecs have been developed or are under active development. The Moving Picture Experts Group (MPEG) has recently adopted a new scalable video coding format called Fine Granularity Scalability (FGS) to its MPEG-4 standard [3]. The Joint Photographic Experts Group (JPEG) has also adopted a wavelet-based scalable image coding format called JPEG 2000 [4]. MPEG is also actively developing a new scalable video coding format [5]. All these scalable codecs offer fine granularity scalability.
In MPEG-4 FGS, a video sequence is compressed into a single stream consisting of two layers: a base layer and an enhancement layer. The base layer is a nonscalable coding of a video sequence at the lower bound of a bitrate range. The enhancement layer encodes the difference between the original sequence and the reconstructed sequence from the base layer in a scalable manner to offer a range of bitrates for the sequence. In JPEG 2000, a codestream is organized hierarchically into different structural elements: tiles, components, resolution levels, precincts, layers, and packets. Packets are the most fundamental building block in a JPEG 2000 codestream. Video packets are the corresponding counterparts in an MPEG-4 FGS codestream. We would like to point out that although a scalable granularity at the fundamental building block level may be enough for many applications, some applications may need to exploit the finest scalable granularity offered in a scalable codestream. Both MPEG-4 FGS and JPEG 2000 offer much finer scalable granularity than their respective fundamental building blocks. For example, an MPEG-4 FGS codestream can be truncated at any (RUN, EOP) symbol of a bit-plane of an 8 by 8 DCT block. The scalable granularity to be supported in scalable media encryption or authentication is an important factor to consider in designing an encryption or authentication scheme for a scalable codec. Multimedia Digital Rights Management (DRM) manages all rights for multimedia from creation to consumption [6][7]. DRM technologies are widely used in protecting valuable digital assets. Standardization of DRM systems has been actively pursued in recent years. MPEG has recently adopted a DRM framework, eXtensions to the Intellectual Property Management and Protection (IPMP-X), for both MPEG-2 and 4 [8][9]. The Open Mobile Alliance (OMA) has recently adopted a DRM system for mobile environments [10]. There are also several proprietary DRM systems available on the market. 
Typical commercial DRM systems include the Windows Media Rights Manager (WMRM) from Microsoft [11], Commerce and Rights|System from InterTrust [12], Electronic Media Management System (EMMS) from IBM [13], and Helix DRM from RealNetworks [14]. A typical DRM system encrypts multimedia content which is distributed to consumers with distribution channels such as superdistribution. Superdistribution is a powerful distribution mechanism that treats ease of replication of digital content as an asset rather than a liability. Superdistribution actively encourages free distribution of digital content via any distribution mechanism imaginable to reach the maximum number of potential consumers. A DRM system controls consumption of multimedia content through a license which contains the decryption key along with specifications on how the content can be used by a user. A license is usually individualized, typically encrypted with a key that binds to the hardware of a user’s player, so the license cannot be illegally shared with others. Control of content consumption rather than distribution is much more efficient in protecting digital assets in the digital world since modern networks, storage, and compression technologies have made it trivial to transfer digital content from one device or person to another. Figure 1 shows flow of protected content and license in a DRM system. As shown in the figure, the content distributor is in general different from the content provider and the license server (or clearing house), and does not know the encryption or decryption keys. Like in conventional secure communications where message confidentiality and integrity have to be ensured, the authenticity and integrity of received multimedia need to be guaranteed in many multimedia applications. With modern computing power and editing tools, it is not difficult to modify multimedia content to convey a very different semantic meaning yet appear genuine perceptually. 
Multimedia authentication is used to check integrity of multimedia content. In addition to integrity verification, a multimedia authentication system may also enable origination verification (the content is really created or sent by the specified party), non-repudiation (a content creator or sender cannot deny having generated or sent specified content), and tamper localization so untampered parts can still be used.

Figure 1: Flow of protected content and license in a DRM system

Multimedia encryption [15] and authentication [16][17] have been widely studied in the past decade. A natural question is: can we apply those developed technologies to newly developed scalable coding such as MPEG-4 FGS or JPEG 2000? An encryption or authentication technology developed for traditional non-scalable coding may not be appropriate for scalable coding. A scalable codec is designed to compress multimedia content once, and the single resulting codestream can be easily adapted to fit different applications. Encryption of a scalable codestream may reduce or destroy the scalability of the original codestream. Similarly, a scalable codestream may drop some less important data in its delivery to consumers, which is desirable or mandatory in many applications. Such a drop of data may render the resulting codestream as being tampered and rejected if an authentication scheme designed for non-scalable coding is used for a scalable coding. In DRM and many multimedia applications, content is encrypted or signed by the content provider which is different from content distributors. During its delivery to consumers, content may be processed by untrusted intermediate stages to best fit the needs of targeted consumers. If encryption or authentication of a scalable codestream destroys the scalability of the scalable coding, these intermediate processing stages are not able to perform simple scalable adaptation to select the best fitting codestream. Therefore the advantages and benefits of scalable coding are neutralized. In this case, a processing stage has to first decrypt or verify the received content, apply adaptations to the decrypted scalable codestream, and then re-encrypt or re-sign the resulting codestream. More computing power and resources are needed to perform the assigned task. More importantly, secrets such as the encryption key have to be shared with a processing stage. 
There might be many processing stages from creation to consumption of scalable content. Security of the system is greatly lowered since a successful attack on any processing stage would jeopardize the security of the whole system. These processing stages may not be protected equally well. Some may be poorly protected or even malicious. Scalability, especially fine granularity scalability, offered by scalable coding poses great new challenges to the design of encryption and authentication systems for a scalable codestream. Encryption of a scalable codestream should minimize the impact on the scalability of the underlying scalable coding so that intermediate stages can process the encrypted codestream directly without decryption. Authentication should be able to authenticate all possible resulting codestreams after legitimate adaptation manipulations on a scalable codestream. These design goals may be difficult to achieve, especially if the full original scalability is required to be preserved for FGS scalable coding. Scalability of encrypted or authenticated scalable codestreams may also enable new services that cannot be offered by non-scalable coding.

In this paper, we provide an overview of scalable encryption and authentication technologies for DRM and other multimedia applications. We first introduce MPEG-4 FGS and JPEG 2000 and describe general requirements and desirable features for a scalable encryption or authentication system in Section 2. Scalable encryption technologies are reviewed in Section 3, and scalable authentication in Section 4. In the concluding Section 5, we summarize existing challenges and propose future research directions.

Before we conclude this section, we would like to mention differences between a DRM system and access control in multicast and broadcast applications. In both cases, encryption is used to prevent unauthorized usage of protected content. Unlike multicast and broadcast, where end-to-end communication and multimedia transfer are the focus to ensure unsubscribed consumers cannot play protected content, a DRM system provides persistent protection against unauthorized access or usage of protected content throughout the whole life of the content. Sophisticated rights can be supported in a DRM system to meet a wide range of requirements in different applications, for example, the number of times or days a video may be played, or whether image content may be printed. The two applications have different threat models and different concerns in the system design. For example, how to deal with the dynamic nature of subscribers is one of the major concerns in the design of an access control system for multicast and broadcast. Encryption keys are usually changed frequently to prevent an expired subscriber from continuing to play protected content. Keys may be changed as often as every couple of seconds [18]. A consequence of frequent rekeying is that a less secure algorithm may be used to encrypt multimedia content since a hacker has to redo attacks after the encryption key is changed. Encryption keys have to be used for a much longer period in DRM applications since frequent rekeying in an encrypted codestream adds burden to key management and distribution. This implies that DRM applications may require higher security in multimedia encryption than multicast and broadcast applications. Another important difference is that, in multicast and broadcast applications, multimedia encryption and authentication can be dynamically performed on the fly by a streaming server so that network characteristics such as transport packet size, transmission error properties, etc., can be exploited in the encryption and authentication. Encryption and digital signature generation in a DRM system, on the other hand, are performed by the content provider, who does not know how protected content will be distributed.
Network characteristics may not be exploited at encryption or authentication time so that the same protected content can be used by the widest possible range of potential consumers. This paper focuses on encryption and authentication of scalable codestreams for DRM applications, but some technologies designed for multicast and broadcast are also briefly mentioned. Readers interested in encryption for broadcast are referred to an early paper by Macq and Quisquater [19].
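The point made earlier in this section, that an authentication scheme designed for non-scalable coding rejects a legitimately truncated codestream, can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: HMAC-SHA256 stands in for a digital signature, the byte strings stand in for codestream blocks, and the function names are hypothetical. It is not a scheme from the literature reviewed here.

```python
import hashlib
import hmac


def tag(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 tag; a stand-in for a digital signature in this sketch."""
    return hmac.new(key, data, hashlib.sha256).digest()


def sign_whole(key, blocks):
    """Non-scalable style: one tag over the entire codestream."""
    return tag(key, b"".join(blocks))


def sign_blocks(key, blocks):
    """Block-level style: one tag per block, bound to the block index."""
    return [tag(key, i.to_bytes(4, "big") + b) for i, b in enumerate(blocks)]


def verify_whole(key, blocks, whole_tag):
    return hmac.compare_digest(sign_whole(key, blocks), whole_tag)


def verify_blocks(key, blocks, tags):
    if len(blocks) != len(tags):
        return False
    return all(
        hmac.compare_digest(tag(key, i.to_bytes(4, "big") + b), t)
        for i, (b, t) in enumerate(zip(blocks, tags))
    )


key = b"demo authentication key"  # hypothetical key
blocks = [b"base layer", b"enhancement block 1", b"enhancement block 2"]
whole_tag = sign_whole(key, blocks)
block_tags = sign_blocks(key, blocks)

# An intermediate stage drops the last block without knowing the key.
adapted, adapted_tags = blocks[:2], block_tags[:2]
print("whole-stream check:", verify_whole(key, adapted, whole_tag))
print("block-level check: ", verify_blocks(key, adapted, adapted_tags))
```

With these inputs the whole-stream check fails after truncation while the block-level check passes. Per-block tags alone do not tell a legitimate truncation from a malicious removal of blocks; a practical scheme would need additional protected information for that.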

2. SCALABLE CODING AND DESIRED FEATURES IN ITS ENCRYPTION AND AUTHENTICATION

2.1. Scalable Coding with Fine Granularity Scalability

Many scalable coding technologies have been proposed in the literature. Two recently adopted scalable coding technologies by standard bodies are the MPEG-4 Fine Granularity Scalability [3] for scalable video coding and JPEG 2000 [4] for scalable image coding. Both coding technologies offer fine granularity scalability (FGS) which is superior to old scalable coding technologies with layer scalability [1][2]. Encryption and authentication for FGS scalable coding is more challenging than layer-level scalable coding since the fine granularity scalability may need to be preserved in an encrypted or signed scalable codestream. In the next subsections, we shall briefly describe the two FGS coding standards. 2.1.1. MPEG-4 Fine Granularity Scalability Video Coding This subsection gives a brief introduction to MPEG-4 FGS so readers can better understand the encryption and authentication technologies for MPEG-4 FGS described in this paper. More details can be found in [3]. In MPEG-4, the Video Object (VO) corresponds to entities in the codestream that can be accessed and manipulated. An instance of VO at a given time is called a Video Object Plane (VOP) [20]. MPEG-4 FGS encodes a video sequence into two layers: a nonscalable base layer which offers the lowest quality and bitrate for the scalable codestream and a scalable enhancement layer which offers enhancement in a large range of SNRs and bitrates to the base layer. The MPEG-4 Advanced Simple Profile (ASP) provides a subset of non-scalable video coding tools to achieve high efficiency coding for the base layer. The base layer is typically encoded at a very low bitrate. The FGS profile is used to obtain the enhancement layer to achieve optimized video quality with a single stream for a wide range of bitrates. 
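As a rough illustration of the enhancement layer's bit-plane coding, the sketch below forms (RUN, EOP) symbols for a zigzag-ordered block and shows that dropping the least significant bit-planes still yields a coarser reconstruction. The assumptions are loud ones: only coefficient magnitudes are handled (sign coding and the variable-length coding stage are omitted), EOP = 1 is taken to mark the last nonzero bit of a plane, an all-zero plane is represented by an empty list, and an 8-coefficient block replaces the 64 coefficients of a real 8 by 8 block. The function names are hypothetical.

```python
def bitplane_symbols(coeffs, plane):
    """(RUN, EOP) symbols for one bit-plane of zigzag-ordered magnitudes."""
    bits = [(abs(c) >> plane) & 1 for c in coeffs]
    ones = [i for i, b in enumerate(bits) if b]
    symbols, prev = [], -1
    for k, pos in enumerate(ones):
        run = pos - prev - 1                   # zeros before this nonzero bit
        eop = 1 if k == len(ones) - 1 else 0   # 1 if no nonzero bits remain
        symbols.append((run, eop))
        prev = pos
    return symbols


def encode_block(coeffs):
    """Return (index of top plane, symbol lists from MSB plane to LSB plane)."""
    top = max(abs(c) for c in coeffs).bit_length() - 1
    return top, [bitplane_symbols(coeffs, p) for p in range(top, -1, -1)]


def decode_planes(planes, top, n):
    """Rebuild magnitudes from however many leading planes were received."""
    mags = [0] * n
    for k, symbols in enumerate(planes):
        pos = -1
        for run, _eop in symbols:
            pos += run + 1
            mags[pos] |= 1 << (top - k)
    return mags


block = [10, 0, 6, 0, 0, 3, 0, 0]   # toy zigzag-ordered magnitudes
top, planes = encode_block(block)
full = decode_planes(planes, top, len(block))
coarse = decode_planes(planes[:2], top, len(block))  # keep two MSB planes only
print(planes)
print(full, coarse)
```

Truncating the list of planes, or the symbols within a plane, only lowers the precision of the reconstructed magnitudes, which is the property an encryption or authentication scheme would have to preserve to keep this granularity.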
More precisely, each frame’s residue, i.e., the difference between the original frame and the corresponding frame reconstructed from the base layer, is encoded for the enhancement layer in a scalable manner: DCT coefficients of the residue are compressed bit-plane wise from the most significant bit to the least significant bit. DCT coefficients in each 8 by 8 block are zigzag ordered. For each bit-plane of each block, (RUN, EOP) symbols are formed and variable-length coded to produce the output enhancement layer codestream, where RUN is the number of consecutive zeros before a nonzero value and EOP indicates whether any nonzero values are left on the current bit-plane for the block. For a temporal enhancement frame which does not have a corresponding frame in the base layer, the bit-plane coding is applied to the entire DCT coefficients of the frame. This is called FGS temporal scalability (FGST). FGST can be encoded using either forward or bi-directional prediction from the base layer. MPEG-4 FGS provides very fine grain scalability to allow near rate-distortion (RD)-optimal bitrate reduction. In MPEG-4 FGS, video data is grouped into video packets, which are separated by the resynchronization marker. The bit-plane start code, fgs_bp_start_code, in the enhancement layer also serves as a resynchronization marker for error resilience purposes. Video packets are aligned with macroblocks. In MPEG-4 FGS, video packets are determined at the time of compression, but can be changed later by modifying resynchronization marker positions. Due to the different roles they play, the base layer and the enhancement layer are typically unequally protected against network imperfection in transmission in practical applications. The base layer is usually well protected against bit errors or packet losses, and is virtually lossless in transmission. The enhancement layer, on the other hand, is lightly protected or unprotected against network imperfection. We would like to emphasize that MPEG-4 FGS scalability is offered by the enhancement layer. The base layer offers no scalability. This implies that encryption and authentication of the base layer do not affect the scalability of the resulting codestream. In other words, traditional encryption and authentication technologies developed for non-scalable coding can be directly applied to the base layer.

2.1.2. JPEG 2000 Image Coding

JPEG 2000 is a wavelet-based image coding standard [4]. In JPEG 2000, an image can be partitioned into smaller rectangular regions called tiles. Each tile is encoded independently. Data in a tile is divided into one or more components in a color space. A wavelet transform is applied to each tile-component to decompose it into different resolution levels.
The wavelet coefficients are quantized by a scalar quantization to reduce the precision of the coefficients except in the case of lossless compression. Each subband is partitioned into smaller non-overlapping rectangular blocks called code-blocks. Each code-block is independently entropy-encoded. The coefficients in a code-block are encoded from the most significant bit-plane to the least significant bit-plane to generate an embedded bitstream. Each bit-plane is encoded within three sub-bitplane passes. In each coding pass, the bit-plane data and the contextual information are sent to an adaptive arithmetic encoder for encoding. The arithmetic coding is terminated at the end of the last bit-plane encoding for a code-block. For error resilience, JPEG 2000 also allows for the termination of the arithmetic coded bitstream as well as the re-initialization of the context probabilities at each coding pass boundary to enable independent decoding of the bitstream from each coding pass. The compressed bitstream from each code-block is distributed across one or more layers in the codestream. Each layer represents a quality increment. A layer consists of a number of consecutive bit-plane coding passes from each code-block in the tile, including all subbands of all components for that tile. The fundamental building block in a JPEG 2000 codestream is called a packet. A packet is simply a continuous segment in the compressed codestream that consists of a number of bit-plane coding passes for each code-block in the precinct. Each packet in a tile can be uniquely identified by the four parameters: component, resolution level, layer, and precinct. All packets for a tile can be ordered with different hierarchical ordering in a JPEG 2000 codestream by varying the ordering of these parameters in nested “for loops”, where each “for loop” is for one parameter from the above list. Details on JPEG 2000 can be found in [4].

2.2. Requirements and Desired Features in Scalable Encryption and Authentication

There are some general requirements and desirable features to be supported in scalable multimedia encryption and authentication. Different applications may have different subsets of the requirements and features. These requirements are usually related to each other, and some are mutually competitive. Careful balancing of contradicting requirements and a tradeoff are always necessary in designing a practical encryption or authentication system for scalably encoded multimedia content.

1) Encrypted content leakage (perceptibility): Audiovisual data usually has a very high data rate. Selective encryption of important portions of data is a popular approach in multimedia encryption. The unencrypted portion of data would inevitably leak out some content information. Applications may allow different levels of content leakage. For example, an encrypted video with distorted objects for unauthorized users is usually acceptable for home movie entertainment since the entertainment value is destroyed after encryption. Military and financial applications, on the other hand, may not allow leakage of any content. Acceptable content leakage plays a critical role in selecting an appropriate encryption scheme in an application.

2) Security: This is an essential requirement for any encryption or authentication system. Multimedia encryption and authentication technologies have several unique features that differ from conventional data encryption and authentication. Multimedia has strong statistical and perceptual redundancy, which may be exploited to attack an authentication system. An encrypted codestream cannot entirely remove perceptual redundancy which can also be exploited to attack the encryption. As we mentioned in 1), different applications may also have different content leakage requirements. Therefore criteria of a successful attack also differ from application to application. In some high security applications, partial recovery of encrypted content may be considered as a successful breach. Multiple levels of security, i.e., security scalability, may be desirable to meet security requirements in a variety of applications. We would like to mention here in particular that video or audio encryption should be robust to known-plaintext attacks in DRM and other applications where content encryption keys are not changed frequently, since commercial audiovisual content may contain short well-known clips, e.g., a company logo clip. Many proposed multimedia encryption and authentication algorithms are based on some “global” secrets which can be easily deduced with a short known clip and its ciphertext and therefore are vulnerable to known-plaintext attacks.

3) Scalability: Due to high redundancy in multimedia data, minor modifications to a codestream may still be considered as authentic in multimedia authentication. For example, dropping of less important data blocks of a scalable codestream without much degradation of the resulting perceptual quality is usually acceptable in multimedia authentication. A consequence is that multimedia authentication should also allow intermediate stages to perform perceptual-quality preserving adaptations such as rate reduction. As we mentioned in Section 1, protected content may be necessarily processed by many intermediate stages from creation to consumption. This is particularly true for a scalable codestream since scalable coding is designed to use a single compressed codestream to meet all different application scenarios when proper adaptation manipulations are applied on the scalable codestream. An encrypted or signed codestream should preserve a certain level of scalability of the underlying scalable coding, either the full original scalability or a coarser scalability, depending on applications, so that intermediate stages can perform adaptation manipulations on the codestream directly without decryption or re-signing. Otherwise decryption/encryption and authentication have to be performed by an intermediate stage before and after manipulating the codestream, which incurs computational overhead, and, more importantly, lowers the system security since secrets have to be shared with these intermediate stages, and some of these stages may be untrusted or poorly protected.

4) Usability and flexible accessibility: A multimedia encryption or authentication system should be able to address different requirements in levels of security and ways to access and play multimedia content in a wide range of applications. This is particularly desirable for scalably encoded content since scalable coding is designed to achieve compression once and decompression in many ways so that a single scalable codestream can be used in many different scenarios when proper adaptations are applied.

5) Error resilience: Errors do occur in multimedia storage and transmission. Wireless networks are notorious for transmission errors. Network packets may be lost in transmission due to congestion, buffer overflow, and other network imperfections. Encryption, with a block cipher for example, may expand a single bit error in a ciphertext to many bit errors in the decrypted plaintext. In other words, encryption may introduce error propagation when errors occur. A well-designed encryption system should confine the encryption-incurred error propagation to minimize perceptual degradation, and enable quick recovery from bit errors and fast resynchronization from packet losses. Error resilience is also needed in multimedia authentication, especially for scalably coded content, since a loss of partial data due to network imperfection may render the received codestream unable to be authenticated even though it can be decompressed to a good quality. Many proposed multimedia encryption and authentication algorithms were designed under a perfect transmission environment. These encryption algorithms may suffer a great perceptual degradation for an extensive period, and authentication algorithms may reject received content, should bit errors or packet losses occur during multimedia transmission.

6) Compression overhead: Compression overhead due to encryption or authentication manifests in several ways. Compression efficiency may be lowered directly by modifying well-designed compression parameters in encryption, or modifying statistical properties of the data to be encrypted in encryption or embedding-based authentication. Additional headers may be added to a compressed codestream for decryption parameters or authentication data, boundary indicators of encrypted or signed segments, etc. This compression overhead should be minimized.

7) Complexity: Encryption/decryption or authentication incurs computational overhead. Many applications require real-time playback of multimedia on inexpensive devices such as portable devices where low complexity is an essential requirement. Complexity and security are typically mutually competitive.

8) Format Agnostic: A complex system such as Microsoft Windows Media Player has to support many different types of encoded codestreams simultaneously. To simplify the complexity of such a system, it is desirable that multimedia encryption and authentication algorithms are agnostic to encoding formats so a simple encryption or authentication module can be used to process all differently compressed codestreams. This would also enhance interoperability. Popular commercial DRM systems such as Microsoft’s WMRM [11] and OMA’s DRM [10] use format agnostic approaches. This requirement specifically rules out any format compliant schemes. 9) Format Compliance: A popular coding technology may have a large installation base. Many multimedia systems were designed without much consideration of encryption or authentication so that a later add-on encryption or authentication may not be supported by installed devices. For example, an MPEG-2 IPMP-X protected codestream may be rejected by a DVD player as unrecognized or incorrect format. To address this “backward” compatibility problem, it is often desired that the encrypted or authenticated codestream is still format compliant. This requirement is in direct contradiction to the format agnostic requirement. 10) Tamper localization in multimedia authentication: Multimedia generally consists of large amounts of data. When tamper occurs, it is desirable to locate tampered regions so that the untampered data can still be used as long as the perceptual quality or the semantic meaning of the content has been preserved. In designing a security system for an application, a threat model is among the first to be set up. There are many ways to attack a multimedia encryption or authentication system. Almost all attacks in conventional encryption and authentication can be equally applied to the multimedia case. New attacks exploiting multimedia characteristics can also be developed. 
One such attack exploits the perceptual redundancy that cannot be completely removed from an encrypted codestream. This is particularly true for selective encryption of multimedia. Error concealment technologies developed to hide errors due to transmission imperfection can be used to launch such an attack, as studied in [21][22].

3. SCALABLE ENCRYPTION

Many encryption algorithms have been proposed for multimedia encryption. The most straightforward approach is the naive algorithm, a name borrowed from [23], which encrypts a compressed video stream with a conventional cipher such as the Data Encryption Standard (DES) [24] in the same way as encrypting text. A naive algorithm usually has a large computational overhead and the worst error resilience performance. It is inappropriate for encryption of a scalable codestream since scalability is completely removed by this encryption. It is impossible to perform any adaptation manipulation directly on the encrypted scalable codestream generated with a naive algorithm. Having said that, it is still possible to apply a conventional cipher to the multimedia data (header excluded) inside each block in the same way as text encryption after a scalable codestream is carefully partitioned into data blocks, and yet preserve block-level scalability [25][26][27][28][29]. This is particularly attractive if format agnostic encryption is required. With a fast yet secure cipher, such as the Chain & Sum (C&S) cipher proposed in [30], it is possible to dramatically reduce the computational complexity of this “full encryption” approach to meet the processing speed requirement in practical applications, as described in [28][29].

Another approach is the selective algorithm, which exploits compression and perceptual characteristics and encrypts only important parts of the compressed multimedia data. Selective encryption offers a wide range of tradeoffs among security levels, computational complexity, and content leakage. Many selective encryption schemes have been proposed. A partial list of representative selective encryption schemes is given in [31].
Many selective encryption schemes developed with non-scalable compression in mind can be applied equally well to scalable coding and preserve the original or coarser granularity of scalability in the encrypted codestream. One example is the format compliant encryption schemes such as those proposed in [22][32]. Selective encryption usually leaks some information about the encrypted content and is often less secure than a full encryption scheme, in exchange for finer granularity scalability and higher processing speed. We shall focus in the following on proposed technologies that were specially designed for encrypting scalable codestreams with certain levels of scalability in the encrypted codestream. This type of multimedia encryption is referred to as scalable encryption in this paper. Some of the proposed scalable encryption schemes have been overviewed in [33].

Early scalable encryption schemes [34][35][36][37][38] are based on partitioning transform coefficients in a non-scalable coding, such as DCT coefficients in MPEG-2 video coding, into multiple layers according to the importance of the coefficients. For example, the scheme proposed in [36] partitions DCT coefficients into three layers: base, middle and enhancement layers. This rough data partition provides layered scalability in the resulting codestream. The base layer is encrypted to provide minimum protection to the codestream. If higher protection is desired, the middle layer or even the enhancement layer can be encrypted. Each layer is encrypted independently with the same key or a different key [38]. The encrypted codestream provides layered scalability in that higher layers can be truncated if necessary.

FGS scalable coding such as MPEG-4 FGS and JPEG 2000 offers much finer granularity scalability than the scalability offered by the aforementioned schemes, which partition transform coefficients into layers of importance. Researchers have proposed encryption schemes designed specifically to work with FGS scalable coding [25][26][27][28][29][39][40][41][42][43][46]. Grosbois et al. proposed in [39] encryption schemes for JPEG 2000 to provide access control on resolutions or on layers. To provide access control on resolutions, the signs of the wavelet coefficients in high frequency subbands are pseudo-randomly flipped. The output of a pseudo-random sequence generator is used to determine whether the sign of a coefficient is inverted or not. A different seed to the generator is used for each code-block. The seed for a code-block is encrypted and inserted into the codestream right after the last termination marker of the code-block, exploiting the fact that any byte appearing behind a termination marker is skipped by a JPEG 2000 standard compliant decoder. The resulting encrypted codestream is JPEG 2000 format compliant. To provide access control on JPEG 2000 layers, the bits in the coding-pass codewords belonging to the last layers are pseudo-randomly flipped in the same way as that used for image resolution scrambling. Since a seed has to be inserted into the codestream for each code-block, the encrypted codestream is expected to have noticeable overhead. A possible problem is that an intermediate stage unaware of the proposed schemes may remove the inserted seed for a code-block in performing some adaptation manipulations such as truncating bitstreams corresponding to lower bit-planes of a code-block.
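The sign-scrambling step described above for [39] can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of [39]: Python's `random.Random` stands in for the pseudo-random sequence generator (it is not cryptographically secure), the coefficients are plain integers, and the names `flip_signs` and `seed` are hypothetical. Encrypting the seed and placing it after the termination marker are omitted.

```python
import random

def flip_signs(coeffs, seed):
    """Pseudo-randomly invert coefficient signs for one code-block.

    One generator output bit per coefficient decides whether its sign is
    inverted. random.Random is only a stand-in for the generator of the
    scheme and is NOT cryptographically secure.
    """
    prng = random.Random(seed)  # a different seed is used for each code-block
    return [-c if prng.getrandbits(1) else c for c in coeffs]

# Toy high-frequency subband coefficients of one code-block.
block = [5, -3, 7, 1, -8, 2, -4, 6, 9, -1, 3, -7, 2, 8, -5, 4]

scrambled = flip_signs(block, seed=1234)
restored = flip_signs(scrambled, seed=1234)  # same seed: the flips cancel out
```

Because flipping a sign twice restores it, the same function both scrambles and descrambles a code-block. Only the signs change, so the magnitudes (and hence the coded length) are untouched, but the seed must survive adaptation for the code-block to be recoverable.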
When a seed is removed, the corresponding code-block cannot be decrypted.

Unlike a JPEG 2000 codestream, which is entirely scalable, an MPEG-4 FGS codestream contains a non-scalable base layer and a scalable enhancement layer. Since an enhancement layer VOP depends on the base layer VOP(s) in decompression, it is natural to encrypt the base layer to protect the content. Any encryption scheme designed to encrypt non-scalable MPEG video can be used to encrypt the base layer. For example, the base layer can be fully encrypted, as in the scheme proposed in [40] where video data inside each video packet is independently encrypted with the C&S cipher [30], or selectively encrypted, as in the scheme also proposed in [40] where the DC values with known number of bits (i.e., intra_dc_coefficient and dct_dc_differential), the sign bits of DCT coefficients, and the Motion Vector (MV) sign bits (i.e., the sign bits of horizontal_mv_data and vertical_mv_data), as well as the MV residues, horizontal_mv_residual and vertical_mv_residual, are encrypted to provide format-compliant encryption. The C&S cipher was selected in [40] since it has low complexity and, more importantly, any difference as small as a single bit in the plaintext results in completely different ciphertext when encrypted with the C&S cipher. This property is highly desirable in multimedia encryption because extra encryption parameters such as the initialization vector (IV) are not needed in independently encrypting each data segment. Extra encryption parameters, if used, have to be embedded into the codestream and sent to the decryption module to properly decrypt the encrypted data, which lowers the compression efficiency. This advantage is clearly shown by the scalable encryption schemes proposed in [28][29][40], which use the C&S cipher and whose encrypted codestream has negligible degradation in compression efficiency, as compared to other schemes which have noticeable compression overhead.
In addition, these scalable encryption schemes are also robust to known-plaintext attacks [29].

Encryption of the base layer alone in MPEG-4 FGS seems to render the resulting video frames random, since an enhancement layer VOP is the difference between the original VOP and the corresponding VOP reconstructed from the base layer. This may be true if each frame is viewed individually. When these frames are viewed together as a video sequence, the outlines and trajectories of moving objects are readily visible. The nature of these objects and their actions can be easily identified [29][40]. Leakage of such content information may not be acceptable in some applications. If that is the case, the enhancement layer should also be encrypted.

We have proposed a light-weight, format-compliant selective encryption scheme called Scalable Single Layer FGS Encryption (SSLFE) to encrypt an MPEG-4 FGS codestream [29][40]. Different encryption schemes are used to encrypt the base layer and the enhancement layer so that each scheme can be designed to fully exploit the features of each layer. The base layer can be either fully or selectively encrypted, as described above. To encrypt an enhancement VOP, a hashed version of the base layer VOP that the enhancement VOP depends on is used to generate a random bitmap of fixed size that matches the frame size. Each random bit in the bitmap is XORed with the sign bit of the DCT coefficient at the corresponding position, which ensures correct recovery of all the sign bits of any received enhancement data even if some packets are lost. The MV sign bits and the MV residues in an FGST VOP are scrambled in a similar manner. The codestream encrypted with this scheme preserves the full scalability of, and has exactly the same error resilience performance as, the original MPEG-4 FGS coding. Figure 2 shows the visual effect of SSLFE when the base layer is either selectively or fully encrypted.
There exists content leakage when the base layer is selectively encrypted, esp. when motion is small. The encrypted MPEG-4 FGS codestream in this mode is format compliant. When the base layer is fully encrypted, the resulting video appears very random. In this mode, the encrypted codestream as a whole is not format compliant but its encrypted enhancement layer is format compliant. The security with the base layer fully encrypted is also much higher than with the base layer selectively encrypted. In both cases, the aforementioned content leakage problem associated with encryption of the base layer alone disappears.

Figure 2: Visual effect for video encrypted with the scheme SSLFE proposed in [29][40]. Left column: original Akiyo (top) and Foreman (bottom). Middle: the base layer is selectively encrypted. Right: the base layer is fully encrypted.
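The enhancement-layer sign scrambling of SSLFE can be illustrated with a short sketch. Several details here are assumptions made only for illustration: SHA-256 is used as the hash, the bitmap is expanded from the hash by hashing a counter, coefficients are stored in a dictionary keyed by position, and the names `sign_bitmap` and `xor_signs` are hypothetical. Which representation of the base layer VOP is hashed, and whether the hash is keyed, is not specified in this section, so the sketch simply hashes a placeholder byte string.

```python
import hashlib

def sign_bitmap(base_vop: bytes, num_positions: int) -> list:
    """Expand a hash of the base layer VOP into one bit per coefficient position.

    The counter-based expansion with SHA-256 is an assumption of this sketch.
    """
    seed = hashlib.sha256(base_vop).digest()
    bits = []
    counter = 0
    while len(bits) < num_positions:
        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        for byte in block:
            bits.extend((byte >> k) & 1 for k in range(8))
        counter += 1
    return bits[:num_positions]

def xor_signs(coeffs: dict, bitmap: list) -> dict:
    """XOR each sign bit with the bitmap bit at the same position.

    Flipping a sign is the integer equivalent of XORing the sign bit with 1.
    """
    return {pos: (-value if bitmap[pos] else value) for pos, value in coeffs.items()}

base_vop = b"bytes of the base layer VOP (placeholder)"
bitmap = sign_bitmap(base_vop, num_positions=64)

# Nonzero enhancement coefficients of one toy 8x8 block, keyed by position.
enhancement = {0: 12, 5: -3, 17: 4, 40: -9, 63: 1}
encrypted = xor_signs(enhancement, bitmap)

# Simulate packet loss: only some positions arrive at the receiver.
received = {pos: v for pos, v in encrypted.items() if pos in (0, 17, 63)}
recovered = xor_signs(received, bitmap)
```

Since each bit of the bitmap is tied to a coefficient position rather than to the order in which data arrives, whatever part of the enhancement data is received can be descrambled on its own, which is the property the text attributes to the scheme.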

Wu and Mao apply the scrambling scheme for variable-length codewords (VLCs) proposed in [22] to scramble each (RUN, EOP) symbol in the MPEG-4 FGS enhancement layer [41]. More specifically, each possible (RUN, EOP) symbol is assigned a fixed-length index. The (RUN, EOP) symbols are first converted into the corresponding indexes. These indexes are then encrypted with a conventional cipher. The encrypted result is partitioned back into indexes, and each index is mapped back to the codeword domain. The compression overhead is about 7.0% for the QCIF “Foreman” sequence [41]. The DCT transform compacts energy into low frequency coefficients. This means that the (RUN, EOP) symbols in an 8 by 8 block of an enhancement VOP may show a heavily skewed distribution, esp. in high bit-planes. In other words, some symbols appear more frequently than others. This fact can be exploited to break the encryption just described. This scheme is also vulnerable to known-plaintext attacks when the mapping of (RUN, EOP) symbols to indexes is not changed frequently. Frequent change of the mapping will increase the compression overhead. We conclude that the security of the scheme is unlikely to be very high.

A similar encryption scheme is proposed in [42][43] to encrypt the MPEG-4 FGS enhancement layer, where the (RUN, EOP) symbols are pseudo-randomly shuffled on a bit-plane or sub bit-plane level and the sign bits of DCT coefficients are scrambled for each enhancement VOP. This scheme is MPEG-4 FGS format compliant. The scalable granularity of an encrypted enhancement layer is reduced to a bit-plane or sub bit-plane level, depending on the level at which the (RUN, EOP) symbols are scrambled, since (RUN, EOP) symbols can no longer be identified for each block after encryption.
The encryption also introduces error propagation, since a wrong bit in a VLC encoded (RUN, EOP) symbol causes wrong VLC decoding for the current and subsequent (RUN, EOP) symbols, which in turn makes the scrambling of (RUN, EOP) symbols for a whole bit-plane irreversible.

An encryption scheme called Secure Scalable Streaming (SSS) is proposed to work with scalable coding such as MPEG-4 FGS [25][26] and JPEG 2000 [27]. The scheme partitions a scalable codestream into packets. All data except header fields in each packet is encrypted with a block cipher such as DES in the Cipher Block Chaining (CBC) mode. Hints for RD-optimal cutoff points are inserted into unencrypted header fields to allow near RD-optimal truncations. The scheme exploits the fact that decryption of a block cipher in CBC mode is causal, i.e., decryption of the current block does not depend on any later blocks, so that truncation of trailing data in a packet does not affect decryption of the earlier blocks in the packet. The scheme adds compression overhead since the IVs used with each independent encryption are inserted into the codestream. It also introduces error propagation, since a bit error in the ciphertext corrupts the decryption of both the block containing the error and the next block. The scalable granularity is also reduced to a progressive packet level. The supported adaptations in SSS are to drop an entire packet or to truncate trailing data in a packet. Adaptation manipulations that result in non-ending “holes” in a packet are not supported.

Most scalable encryption schemes use a single key to encrypt an entire scalable codestream (if rekeying is not considered). This may be undesirable in many applications. A scalable codestream usually covers a large range of quality and rates. It may not make much sense to pay the same price to play multimedia content on a cellular phone and on a personal computer (PC), since a cellular phone plays at a much lower quality. A business model in which a consumer pays according to the quality of service he or she receives makes more sense. This means that encryption of a scalable codestream should be partitioned into layers of different quality, with access to each layer controlled. This can be easily achieved for early scalable coding that offers layered scalability. For a layered scalable codestream, each layer can be encrypted independently with a different key, as described in [38]. Current scalable coding such as MPEG-4 FGS and JPEG 2000 offers fine granularity scalability with multiple types of access. For example, JPEG 2000 allows access based on resolution, layers (i.e., quality), components, spatial locations, etc. The JPEG 2000 encryption proposed by Grosbois et al. [39], described earlier in this paper, supports access on both resolutions and JPEG 2000 layers. Different encryption schemes are used for different access types, so an encrypted codestream can support only the access type that it is encrypted for.
We would prefer that a single encrypted scalable codestream supports multiple access types. This is exactly what we want to achieve with the schemes proposed in [28][29][44]. A layered access control scheme called Scalable Multi-Layer FGS Encryption (SMLFE) for MPEG-4 FGS is proposed in [28][29], which supports access to both PSNR layers and bitrate layers simultaneously with a single encrypted codestream. PSNR layers are usually used for quality-based access such as local play, while bitrate layers are used for streaming and other applications where bandwidth constraints are a major concern. PSNR layers partition MPEG-4 FGS video data into multiple layers according to PSNR values. A PSNR layer consists of adjacent bitplanes of each enhancement VOP. Bitrate layers partition MPEG-4 FGS video data into multiple layers according to bitrates. A bitrate layer is a group of adjacent video packets. Each layer of either access type is therefore aligned with video packets. The two types of partitions group video packets into segments. Each segment is uniquely identified by a PSNR layer and a bitrate layer. Video data inside each video packet is encrypted independently with the C&S cipher, using the key of the segment to which the video packet belongs. For enhancement VOPs partitioned into T PSNR layers and M bitrate layers, there are T × M segment keys used to encrypt an MPEG-4 FGS codestream (again, if rekeying is not used). In SMLFE, lower quality layers are accessed and reused by a higher quality layer of the same type, but not vice versa. The protection of the two different layer types is orthogonal, i.e., the right to access a layer of one type does not make the layers of the other type also accessible. The base layer in SMLFE may be left unencrypted so that an SMLFE encrypted codestream allows free preview of the content at low quality and content-based search of a video database without decryption. Multiple keys are used for encrypting a scalable codestream with multiple layered accesses.
Key management is therefore an important issue. It is desirable to minimize the number of keys to be maintained either on the license server or at the consumer side. An efficient key scheme is proposed in [45] to work with SMLFE. The scheme generates two independent keys, with one key assigned to the highest PSNR layer and the other to the highest bitrate layer. A cryptographic hash function is used to generate the keys for the remaining layers. More specifically, the layer key for a lower layer is the hash value of the layer key of the next higher layer of the same type. A segment key is generated with the Diffie-Hellman (DH) key agreement [24] from the two layer keys of the PSNR layer and the bitrate layer that uniquely specify the segment. The exponentials of all layer keys for both access types are packaged with the protected content. When a consumer gets a right to access a layer, the key of that layer is generated by the license server and delivered in a license to the consumer, along with the specification of the right to use the protected content. The layer key contained in a license is used to generate the layer keys of the lower layers of the same access type. Those layer keys are used with the exponentials contained inside the protected content to generate the segment keys to decrypt the segments that the consumer has the right to access. This key scheme and SMLFE are extended in [44] to support an arbitrary number of access types and layers for a general scalable coding including JPEG 2000. A key scheme in which a single encryption key is used with multiple decryption keys for multiple layers of one access type is proposed in [46].
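The hash-chained layer keys and DH-derived segment keys described above can be sketched as follows. This is an illustrative reading of the description, not the exact construction of [45]. The assumptions are: SHA-256 as the hash function, a toy DH group with modulus 2^127 - 1 and base 3 (far too small for real use, and the base is not checked to be a generator), and layer keys mapped to exponents by reduction modulo p - 1. All function names are hypothetical.

```python
import hashlib

P = 2**127 - 1  # toy prime modulus; an assumption, not secure for real use
G = 3           # toy base; an assumption

def layer_keys(top_key: bytes, num_layers: int) -> list:
    """Derive all layer keys of one access type from the key of the highest layer.

    keys[i] is the key of layer i (0 = lowest); each lower key is the hash
    of the key of the next higher layer.
    """
    keys = [top_key]
    for _ in range(num_layers - 1):
        keys.append(hashlib.sha256(keys[-1]).digest())
    keys.reverse()
    return keys

def exponent(key: bytes) -> int:
    """Map a layer key to a DH exponent (sketch-level choice)."""
    return int.from_bytes(key, "big") % (P - 1)

def exponential(key: bytes) -> int:
    """Value G^key mod P that is packaged with the protected content."""
    return pow(G, exponent(key), P)

def segment_key(own_layer_key: bytes, other_exponential: int) -> int:
    """DH agreement between a held layer key and the other type's exponential."""
    return pow(other_exponential, exponent(own_layer_key), P)

# License server side: two independent top keys, T = 3 PSNR and M = 4 bitrate layers.
psnr_keys = layer_keys(b"top PSNR layer key", 3)
rate_keys = layer_keys(b"top bitrate layer key", 4)
rate_exponentials = [exponential(k) for k in rate_keys]  # shipped with the content
psnr_exponentials = [exponential(k) for k in psnr_keys]  # shipped with the content

# Consumer side: the license carries only the key of PSNR layer 1.
licensed = psnr_keys[1]
derived_lower = hashlib.sha256(licensed).digest()  # key of PSNR layer 0

# Segment (PSNR layer 0, bitrate layer 2) computed from either side of the agreement.
from_psnr_side = segment_key(derived_lower, rate_exponentials[2])
from_rate_side = segment_key(rate_keys[2], psnr_exponentials[0])
```

With this arrangement a license needs to carry a single layer key: hashing yields the keys of all lower layers of the same type, while recovering a higher-layer key would require inverting the hash function.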

4. SCALABLE AUTHENTICATION

As we mentioned in Section 1, scalable coding is designed to use a single compressed codestream to fit a large range of applications with easy adaptation. Authentication of a scalable codestream should be able to verify and accept the resulting codestream after legitimate adaptation manipulations are applied to a scalable codestream to extract a representation that best fits an application. In other words, we want to be able to sign once and verify many ways for a scalable codestream. This requirement poses new challenges to the design of multimedia authentication. The requirement can be met with the multimedia authentication technologies called soft and content-based authentication [17], which allow manipulations that preserve the perceptual quality and the semantic meaning of the content, respectively. However, soft and content-based authentication also accept manipulations other than adaptation manipulations on a scalable codestream. In fact, there is in general no clear boundary between manipulations that should be rejected and manipulations that should be accepted in those two types of authentication. In many applications, we may want to accept only those adaptation manipulations that scalable coding is designed for and reject all other manipulations. This type of authentication is called scalable authentication in this paper. In what follows, we give an overview of scalable authentication. Readers interested in soft and content-based authentication are referred to [16][17], which review the technologies for soft and content-based authentication.

A scalable authentication scheme for JPEG 2000 is proposed in [39]. The scheme applies a cryptographic hash function such as the Secure Hash Algorithm (SHA) [24] to the bitstream of each code-block to calculate a hash value for each code-block. These hash values are encrypted and inserted after the last termination marker of their corresponding code-blocks. To check the authenticity of a received JPEG 2000 codestream, the same hash function is applied to the bitstream of each code-block and the result is compared with the embedded hash value inserted right after the last termination marker of the code-block. Authenticity is claimed if they agree with each other. Otherwise the code-block is considered tampered. This scheme adds 160 bits to each code-block in a JPEG 2000 codestream.
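The per-code-block hashing of [39] can be sketched in a few lines. The sketch makes simplifying assumptions: SHA-1 is used because its 160-bit output matches the stated per-code-block cost, code-block bitstreams are held in a dictionary keyed by a hypothetical code-block id, and both the encryption of the hash values and their placement after the termination marker are omitted.

```python
import hashlib

def tag_codeblocks(codeblocks: dict) -> dict:
    """Compute one 160-bit SHA-1 tag per code-block bitstream."""
    return {cb_id: hashlib.sha1(data).digest() for cb_id, data in codeblocks.items()}

def verify_codeblocks(received: dict, tags: dict) -> dict:
    """Recompute the hash of each received code-block and compare with its tag."""
    return {cb_id: hashlib.sha1(data).digest() == tags[cb_id]
            for cb_id, data in received.items()}

# Toy code-block bitstreams (placeholders, not real JPEG 2000 data).
codeblocks = {
    0: b"code-block 0: all coding passes",
    1: b"code-block 1: all coding passes",
    2: b"code-block 2: all coding passes",
}
tags = tag_codeblocks(codeblocks)

# Adaptation: code-block 2 is dropped entirely and code-block 1 is truncated.
adapted = {0: codeblocks[0], 1: codeblocks[1][:12]}
result = verify_codeblocks(adapted, tags)
```

Here `result` marks code-block 0 as authentic and code-block 1 as failing, because the hash input of the truncated code-block has changed. The dropped code-block simply does not appear, so whole-block removal does not disturb verification of what remains.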
The authentication tags are inserted into the codestream to be authenticated in a format compliant manner. The scheme allows adaptation manipulations on the entirety of a code-block, i.e., a code-block is either entirely dropped from or completely preserved in the codestream. Any truncation of a code-block, for example, truncating trailing code-passes of a code-block, makes the remaining data of the code-block impossible to authenticate, since any change in the input to a hash function results in a very different hash value. This is not acceptable in many applications, since the code-passes of a code-block may be partitioned into different layers. Any quality-based truncation, such as dropping some less important layers, makes the resulting codestream impossible to authenticate. In addition, if the same encryption key is used to encrypt hash values for two images, the vector quantization attack can be successfully launched, i.e., code-blocks (encoding data and the corresponding encrypted hash values) at the same location can be swapped between the two images and still pass the proposed authentication scheme.

A Merkle hash tree is used for JPEG 2000 authentication in [47]. The scheme first constructs a Merkle hash tree for a JPEG 2000 codestream. Then a signature on the root value of the tree is generated, which is used to authenticate a received codestream. The Merkle tree can be constructed according to any ordering of the parameters tile, component, resolution level, precinct, and layer; the usual choice is the ordering layer, resolution, component, and precinct, which is optimized for the progression of the same ordering. At the leaves of the tree are the hash values of packets, which can be uniquely identified by the five parameters. The value at each node is the hash value of its children.
When an adaptation manipulation drops some packets, every maximal subtree whose leaves all correspond to removed packets (i.e., one that is not contained in another subtree with the same property) is found, and the root values of these subtrees are sent together with the resulting codestream and the encrypted root value of the Merkle tree for authentication. The scheme can authenticate the resulting codestream after any adaptation manipulation on the packet level. Partial packets after adaptation cannot be verified. Any data loss or error in transmission or storage, for example, a packet or hash value lost in transmission, would make the whole received codestream impossible to verify, even though the received codestream may be decompressed to a very good quality image. The node values of the Merkle tree take a large storage space, and may have to be calculated on the fly, which may not be acceptable in some applications.

Researchers have also proposed authentication schemes for scalable codestreams that are robust against packet loss in transmission [48][49][50]. These schemes belong to a large category of streaming data authentication over lossy networks, which is beyond the scope of this paper. The reference [50] contains a brief description of proposed schemes in this field.
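The Merkle-tree verification after packet dropping can be illustrated with a small sketch. It is not the construction of [47]: the tree here is a plain binary tree over a flat list of packets rather than one organized by tile, component, resolution, precinct, and layer, SHA-256 is an assumed hash, the signature over the root is replaced by a direct comparison of root values, and all names are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def subtree_root(packets, lo, hi):
    """Root value of the subtree covering packets[lo:hi]."""
    if hi - lo == 1:
        return h(packets[lo])  # a leaf holds the hash value of one packet
    mid = (lo + hi) // 2
    return h(subtree_root(packets, lo, mid) + subtree_root(packets, mid, hi))

def adapt(packets, keep, lo=0, hi=None):
    """Drop the packets not in `keep`, replacing every maximal fully dropped
    subtree by its root value so that the root can still be recomputed."""
    if hi is None:
        hi = len(packets)
    if not any(i in keep for i in range(lo, hi)):
        return ("digest", subtree_root(packets, lo, hi))
    if hi - lo == 1:
        return ("packet", packets[lo])
    mid = (lo + hi) // 2
    return ("node", adapt(packets, keep, lo, mid), adapt(packets, keep, mid, hi))

def recompute_root(node):
    """Verifier side: rebuild the root from kept packets and supplied digests."""
    kind = node[0]
    if kind == "digest":
        return node[1]
    if kind == "packet":
        return h(node[1])
    return h(recompute_root(node[1]) + recompute_root(node[2]))

packets = [b"packet-%d" % i for i in range(8)]
signed_root = subtree_root(packets, 0, len(packets))  # the value that would be signed

adapted = adapt(packets, keep={0, 1, 2})              # packets 3..7 are dropped
tampered = ("node", ("packet", b"forged"), adapted[2])
```

For this example the adapted stream carries three packets plus two subtree root values (one for packet 3 and one for packets 4 to 7). `recompute_root(adapted)` reproduces `signed_root`, whereas the tampered structure does not.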

5. REMAINING CHALLENGES AND FUTURE RESEARCH DIRECTIONS

We have discussed general requirements and features in designing a scalable encryption or authentication system and presented an overview of the current scalable encryption and authentication technologies. These technologies have solved many challenges in multimedia applications. As this is a new research area, many issues need to be further studied and optimized. There are also many remaining challenges that need to be addressed in future research endeavors. For example, many requirements and features listed in Section 2.2 are not well addressed by the current technologies and still remain as challenges. In the following, we list some additional challenges that are particularly interesting:

1. Scalabilities: Modern scalable coding offers many different scalabilities in a single scalable codestream: quality, rate, spatial, temporal, complexity, etc. This “one compression to meet the needs of all applications” is also the goal for scalable encryption and authentication. In addition to the scalabilities shared with scalable coding, scalabilities in security levels, content leakage of the encrypted codestream, granularity, etc., should be pursued, so that a single encrypted or signed codestream can be easily adapted to fit all possible application scenarios. This may remain a dream in the long run. This “one-fitting-all” ideology may result in fitting no one well. The tradeoff between flexibility and performance should be carefully balanced.

2. Key management: Key management is a critical component in DRM and other security systems. Multiple layer access control requires multiple keys. Reducing complexity in key management and distribution while maintaining security remains a research issue.

3. Individualization and usability: To make it more secure, every copy of encrypted content and every security system should be individualized to avoid illegal sharing and minimize the impact of a breach. This may cause inconvenience to customers who want to move protected content from one system to another. A system should be as transparent to users as possible so that it is easy for customers to use. This is a big challenge to researchers.

4. Renewability, interoperability and protection expiration: A system should be upgradeable to new technologies and renewable once security loopholes are found. A system should also be interoperable so that components or devices made by different manufacturers can work together transparently. In many countries, a copyrighted asset should be publicly available after a certain protection period. A content protection system should retire the protection once the IP protection on the content expires. None of the current systems can retire a protection by itself. These issues will be addressed in the future.

5. Enabling new services and business models: With the advances of technologies, new services and business models should be developed. Scalable coding, encryption, and authentication offer many new features which should be exploited.

REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15]

R. Aravind, M. R. Civanlar, and A. R. Reibman, “Packet Loss Resilience of MPEG-2 Scalable Video Coding Algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 10, pp. 426 - 435, 1996. H. Gharavi and H. Partovi, “Multilevel Video Coding and Distribution Architectures for Emerging Broadband Digital Networks,” IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 5, pp. 459 - 469, 1996. W. Li, “Overview of Fine Granularity Scalability in MPEG-4 Video Standard,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301 – 317, 2001. M. Rabbani and R. Joshi, “An Overview of the JPEG 2000 Still Image Compression Standard,” Signal Processing: Image Communications, vol. 17, no. 1, pp. 3 - 48, 2002. Call for Proposals on Scalable Video Coding Technology, ISO/IEC JTC1/SC29/WG11, N6193, Dec. 2003. R. Iannella, “Digital Rights Management (DRM) Architectures,” D-Lib Magazine, vol. 7, no. 6, June 2001. A. M. Eskicioglu, J. Town, and E. J. Delp, “Security of Digital Entertainment Content from Creation to Consumption,” Signal Processing: Image Communication, Special Issue on Image Security, vol. 18, no. 4, pp. 237 – 262, 2003. ISO/IEC JTC1/SC29/WG11 13818-11:2003(E), Information Technology – Generic Coding of Moving Pictures and Associated Audio Information – Part 11: IPMP on MPEG-2 Systems, 2003. ISO/IEC JTC1/SC29/WG11 14496-13:2004(E), Information Technology – Coding of Audio-Visual Object – Part 13: Intellectual Property Management and Protection (IPMP) Extensions, 2004. Open Mobile Alliance, OMA DRM Specification Draft Version 2.0, March 2004. http://www.openmobilealliance.org Microsoft, Architecture of Windows Media Rights Manager, http://www.microsoft.com/windows/windowsmedia/wm7/drm/architecture.aspx http://www.intertrust.com/main/overview/drm.html IBM: Electronic Media Management System, http://www-306.ibm.com/software/data/emms/ RealNetworks: Helix DRM, http://www.realnetworks.com/products/drm/index.html B. Furht and D. Kirovski, Eds. 
Multimedia Security Handbook, CRC Press, to be published in 2004.

[16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] [32] [33] [34] [35]

[36] [37] [38] [39]

B. B. Zhu, M. D. Swanson, and A. H. Tewfik, “When Seeing Isn't Believing,” IEEE Signal Processing Magazine, vol. 21, no. 2, pp. 40-49, 2004.
B. B. Zhu and M. D. Swanson, “Multimedia Authentication and Watermarking,” Multimedia Information Retrieval and Management, D. Feng, W. C. Siu, and H. Zhang, Eds. Springer-Verlag, Ch. 7, pp. 148-177, 2003.
A. M. Eskicioglu, “A Key Transport Protocol for Conditional Access Systems,” Proc. SPIE Security & Watermarking of Multimedia Content III, vol. 4314, pp. 139-148, San Jose, CA, January 22-25, 2001.
B. M. Macq and J. Quisquater, “Cryptology for Digital TV Broadcasting,” Proc. IEEE, vol. 83, no. 6, pp. 944-957, 1995.
MPEG-4 Video Verification Model Version 17.0, ISO/IEC JTC1/SC29/WG11 N3515, Beijing, July 2000.
L. Qiao and K. Nahrstedt, “Comparison of MPEG Encryption Algorithms,” Int. J. Computers & Graphics, Special Issue: Data Security in Image Communication and Network, vol. 22, no. 3, 1998.
J. Wen, M. Severa, W. Zeng, M. H. Luttrell, and W. Jin, “A Format-compliant Configurable Encryption Framework for Access Control of Video,” IEEE Trans. Circuits & Systems for Video Technology, vol. 12, no. 6, pp. 545-557, June 2002.
I. Agi and L. Gong, “An Empirical Study of Secure MPEG Video Transmissions,” Proc. Internet Soc. Symp. Network & Distributed System Security, San Diego, California, pp. 137-144, Feb. 1996.
B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed., John Wiley & Sons, Inc., 1996.
S. J. Wee and J. G. Apostolopoulos, “Secure Scalable Streaming Enabling Transcoding Without Decryption,” IEEE Int. Conf. Image Processing, Thessaloniki, Greece, vol. 1, pp. 437-440, Oct. 2001.
S. J. Wee and J. G. Apostolopoulos, “Secure Scalable Video Streaming for Wireless Networks,” IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 4, pp. 2049-2052, May 7-11, 2001.
S. J. Wee and J. G. Apostolopoulos, “Secure Scalable Streaming and Secure Transcoding with JPEG-2000,” IEEE Int. Conf. Image Processing, vol. 1, pp. I-205-208, Sept. 14-17, 2003.
C. Yuan, B. B. Zhu, M. Su, X. Wang, S. Li, and Y. Zhong, “Layered Access Control for MPEG-4 FGS Video,” IEEE Int. Conf. Image Processing, Barcelona, Spain, vol. 1, pp. 517-520, Sept. 2003.
B. B. Zhu, C. Yuan, Y. Wang, and S. Li, “Scalable Protection for MPEG-4 Fine Granularity Scalability,” to appear in IEEE Trans. Multimedia.
M. H. Jakubowski and R. Venkatesan, “The Chain & Sum Primitive and Its Applications to MACs and Stream Ciphers,” EUROCRYPT’98, pp. 281-293, 1998. (Also available from http://www.research.microsoft.com/~mariuszj/pubs.htm.)
X. Liu and A. M. Eskicioglu, “Selective Encryption of Multimedia Content in Distribution Networks: Challenges and New Directions,” Proc. 2nd IASTED Int. Conf. on Comm., Internet, and Info. Technol., pp. 527-533, Scottsdale, AZ, November 17-19, 2003.
J. Wen, M. Severa, W. Zeng, M. H. Luttrell, and W. Jin, “A Format-compliant Configurable Encryption Framework for Access Control of Multimedia,” IEEE Workshop Multimedia Signal Processing, Cannes, France, pp. 435-440, Oct. 2001.
H. H. Yu, “An Overview on Scalable Encryption for Wireless Multimedia Access,” Proc. SPIE, vol. 5245: Internet Quality of Service, pp. 24-34, August 2003.
T. Kunkelmann and R. Reinema, “A Scalable Security Architecture for Multimedia Communication Standards,” IEEE Int. Conf. on Multimedia Computing and Systems, pp. 660-661, June 3-6, 1997.
T. Kunkelmann and U. Horn, “Video Encryption Based on Data Partitioning and Scalable Coding – A Comparison,” Lecture Notes in Computer Science vol. 1483/1998, Proc. 5th Int. Workshop Interactive Distributed Multimedia Systems and Telecommunication Services, IDMS'98, Springer-Verlag Heidelberg, pp. 660-661, Oslo, Norway, September 1998.
A. S. Tosun and W.-C. Feng, “Efficient Multi-layer Coding and Encryption of MPEG Video Streams,” IEEE Int. Conf. Multimedia and Expo, vol. 1, pp. 119-122, July 30-Aug. 2, 2000.
A. S. Tosun and W.-C. Feng, “Lightweight Security Mechanisms for Wireless Video Transmission,” IEEE Int. Conf. on Info. Technol.: Coding and Computing, pp. 157-161, April 2001.
A. M. Eskicioglu and E. J. Delp, “An Integrated Approach to Encrypting Scalable Video,” IEEE Int. Conf. on Multimedia and Expo, vol. 1, pp. 573-576, Aug. 26-29, 2002.
R. Grosbois, P. Gerbelot, and T. Ebrahimi, “Authentication and Access Control in the JPEG 2000 Compressed Domain,” Proc. SPIE 46th Annual Meeting, Applications of Digital Image Processing XXIV, San Diego, California, 2001.

[40] C. Yuan, B. B. Zhu, Y. Wang, S. Li, and Y. Zhong, “Efficient and Fully Scalable Encryption for MPEG-4 FGS,” IEEE Int. Symp. Circuits and Systems, Bangkok, Thailand, vol. 2, pp. 620-623, May 2003.
[41] M. Wu and Y. Mao, “Communication-friendly Encryption of Multimedia,” IEEE Workshop on Multimedia Signal Processing, pp. 292-295, Dec. 9-11, 2002.
[42] H. H. Yu, “Scalable Encryption for Multimedia Content Access Control,” IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 2, pp. II-417-420, April 6-10, 2003.
[43] H. H. Yu and X. Yu, “Progressive and Scalable Encryption for Multimedia Content Access Control,” IEEE Int. Conf. on Communications, vol. 1, pp. 547-551, May 11-15, 2003.
[44] B. B. Zhu, M. Feng, and S. Li, “SLAC: A Novel Scalable Layered Access Control Framework for DRM of Multimedia,” submitted for publication.
[45] B. B. Zhu, M. Feng, and S. Li, “An Efficient Key Scheme for Layered Access Control of MPEG-4 FGS Video,” IEEE Int. Conf. on Multimedia and Expo, Taiwan, June 27-30, 2004.
[46] H. H. Yu, “On Scalable Encryption for Mobile Consumer Multimedia Applications,” IEEE Int. Conf. on Communications, vol. 1, pp. 63-67, June 20-24, 2004.
[47] C. Peng, R. H. Deng, Y. Wu, and W. Shao, “A Flexible and Scalable Authentication Scheme for JPEG2000 Image Codestreams,” Proc. ACM Int. Conf. on Multimedia, pp. 433-441, Nov. 2003.
[48] H. H. Yu, “Scalable Multimedia Authentication,” Proc. Joint Conf. on Info., Comm. and Signal Proc., 2003 and 4th Pacific Rim Conf. on Multimedia (ICICS-PCM 2003), pp. 443-447, Singapore, Dec. 15-18, 2003.
[49] H. H. Yu, “Scalable Streaming Media Authentication,” IEEE Int. Conf. on Communications (ICC’04), 2004.
[50] Y. Wu and R. H. Deng, “Content-aware Authentication of Motion JPEG2000 Stream in Lossy Networks,” IEEE Trans. Consumer Electronics, vol. 49, no. 4, pp. 792-801, 2003.