
Trends in Medical Image Compression

Gloria Menegaz*

Dept. of Information Engineering, University of Siena, Via Roma 56, I-53100 Siena, Italy

Abstract: This paper presents an overview of the state of the art in the field of medical image coding. After a summary of the most representative 2D and 3D compression algorithms, a versatile model-based coding scheme for three-dimensional medical data is introduced. The potential of the proposed system lies in the fact that it copes with many of the requirements characteristic of the medical imaging field without sacrificing compression efficiency. Among the most interesting features are progressively refinable up-to-lossless quality of the decoded information, object-based functionalities and the possibility to decode a single 2D image of the dataset. Furthermore, such features can be combined, enabling swift access to any two-dimensional object of any image of interest with refinable quality. The price to pay is a slight degradation of the compression performance. However, the possibility to focus the decoding process on a specific region of a certain 2D image allows very efficient access to the information of interest, which can be recovered at the desired up-to-lossless quality. This is believed to be an important feature for a coding system meant to be used for medical applications, and it largely compensates for the possible loss in compression.

Keywords: Medical image compression, object-based, wavelets, 3D.

INTRODUCTION

It is a fact that medical images are increasingly acquired in digital format. The major imaging modalities include Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasonography (US), Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT), Nuclear Medicine (NM), Digital Subtraction Angiography (DSA) and Digital Fluorography (DF). All these techniques have made cross-sections of the human body viewable, made it possible to navigate inside it, and enabled the design of novel minimally invasive techniques to investigate many pathologies. The numeric representation enables new functionalities for both data archiving and processing that improve health care.

The exploitation of image processing techniques in the field of medical imaging represented a breakthrough, making it possible to manipulate the visual diagnostic information from a novel perspective. Among the many examples are image analysis and rendering. Feature extraction and pattern recognition are the basis for the automatic identification and classification of different types of lesions (like melanomas, tumors, stenoses), in view of the definition of automatic systems supporting the formulation of the diagnosis by a human expert. Volume rendering and computer vision techniques have permitted the development of computer-assisted surgery, stereotaxis and many other image-based surgical and radiological procedures.

The need to manipulate large volumes of data daily raised the issue of compression. The last two decades have seen an increasing interest in medical image coding. The objective is to reduce the amount of data to be stored and/or transmitted while preserving the features the diagnosis is based on.

*Address correspondence to this author at the Dept. of Information Engineering, University of Siena, Via Roma 56, I-53100 Siena, Italy; Tel: (+39 0577) 234719; Fax: (+39 0577) 233609; E-mail: [email protected]


The intrinsic difficulty of this issue when approaching the problem from a wide perspective, namely without referring to a specific imaging modality and disease, led to the general agreement that only compression techniques allowing the original data to be recovered without loss (lossless compression) would be suitable. However, the increasing demand for storage space and bandwidth within the clinical environment has also encouraged the development of lossy techniques providing higher efficiency. Another push came from the Picture Archiving and Communication Systems (PACS) community, which envisions an all-digital radiological environment in hospitals including acquisition, storage, communication and display. Image compression enables fast retrieval and transmission over the PACS network. PACS aims at providing a system integration solution that also facilitates activities beyond computer-aided diagnosis, like teaching, reviewing of the patient's records and the mining of clinical records. It is worth mentioning that in this case the preservation of the original information is less critical, and a loss due to compression could in principle be tolerated. Last but not least, applications like teleradiology would be impossible without compression: transmission over wide area networks including low-bandwidth channels like telephone lines or Integrated Services Digital Networks (ISDN) could otherwise hinder the diffusion of this kind of application. As was easy to expect, this raised a call for regulation setting the framework for the actual exploitation of such techniques in the clinical environment. The manipulation of medical information indeed involves many complex legal and regulatory issues, at least as far as the processed data are supposed to be used for formulating a diagnosis. Furthermore, the need to exchange data and share resources calls for the definition of standards establishing a common syntax.


In the multimedia framework, this led to the development of the JPEG2000 [1] and MPEG4 [2] standards for encoding still images and video, respectively, as well as MPEG7 [3] and MPEG21 [4] for semantic-based tasks, such as indexing via content-based description. Even though these standards address the problem for natural images, some attention has also been devoted to the particular case of medical images: JPEG2000 Part 10 addresses medical data compression, with a focus on three-dimensional data distributions [5]. However, no agreement has yet been reached on the subject, which is still under investigation.

On top of this, with the introduction of digital diagnostic imaging and the increasing use of computers for clinical applications, the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA) recognized, already in the 1970s, the need for a standard method for transferring images and the associated information among devices from different vendors. The ACR and NEMA formed a joint committee in 1983 to address the problem. This led to the development of a standard, currently designated as the Digital Imaging and Communications in Medicine (DICOM) standard [6], meant to facilitate the interoperability of medical imaging equipment. More specifically, it sets forth the protocols to be followed by devices claiming conformance with the standard. This includes the syntax and semantics of the commands and of the associated information that can be exchanged and, for media communication, a file format and a medical directory structure that facilitate access to the images and related information stored on the interchange media. Of particular interest here is that Part PS 3.5 describes how DICOM applications construct and encode the data, as well as the possibility to integrate a number of standard image compression techniques, among them JPEG-LS and JPEG2000.

As a concluding remark, the field of medical data compression is challenging and gathers the interest and efforts of both the signal processing and the medical communities. This paper is organized as follows. Section II summarizes the features and functionalities required for a coding system to be suitable for integration in a modern PACS. Section III provides an overview of the state of the art; more specifically, Sect. III-A focuses on 2D data (i.e. still images) and Sect. III-B is devoted to 3D systems (i.e. volumetric data). Section IV describes the proposed 3D-Encoding/2D-Decoding object-based MLZC architecture. Section V describes the strategies employed to reach independent object processing while avoiding border artifacts. Section VI describes the coding techniques used in the proposed system as well as their generalization for region-based processing. Performance is discussed in Section VII, and Section VIII draws conclusions.

REQUIREMENTS

In the last decade, new requirements have emerged in the field of compression that go beyond the maximization of the coding gain. Among the most important are progressively refinable up-to-lossless quality and Region of Interest (ROI)-based coding. Fast inspection of large volumes of data requires compression schemes able to provide swift access to a low-quality version of the images and/or to a given portion of them corresponding to the diagnostically relevant segment. Progressiveness (or scalability) makes it possible to improve the quality of the recovered information by incrementally decoding the bitstream. In teleradiology, this enables the medical staff to start the diagnosis at a very early stage of transmission and, if needed, to delineate the region of interest and switch to a ROI-based mode.

Two scalability options are possible: by quality and by resolution. In the scalability-by-quality mode, the resolution of the image does not change during the decoding process: when decoding starts, the image is recovered at full size but with low quality, i.e. different types of artifacts (depending on the coding algorithm) degrade its appearance. In the scalability-by-resolution mode, the encoded information is organized such that a reduced-size version of the image is recovered at full quality just after decoding starts; the resolution (size) then increases with the amount of decoded information.

Another basic requirement concerns the rate/distortion performance. While both the lossless and lossy representations should be available in the same bitstream, the system should be designed such that an optimal rate/distortion behavior is reached at any decoding rate. These features were not supported by the old JPEG standard, nor was ROI-based processing, which has made it obsolete. Besides these general requirements, other domain-specific constraints come into play when focusing on medical imaging. In this case, lossless capability becomes a must, at least as far as the data are to be used for diagnosis. Of particular interest in the medical imaging field are therefore those systems able to provide lossless performance and fast access to the information of interest, which translates into ROI-based capability and low computational complexity.

Historically, medical image compression has been investigated by researchers working in the wider field of image and video coding. As a natural consequence, the technological growth in this field is in some sense a by-product of the progress made in the more general framework of natural image and video coding. As is reasonable to expect, there is no golden rule: different coding algorithms fit different types of images and scenarios. Still, a few global guidelines can be retained. One is the fact that exploiting the full correlation of the data in general improves compression. Accordingly, three-dimensional coding systems have been developed, which are more suitable for application to volumetric data. Furthermore, the integration of some kind of "motion compensation" could lead to better results for time-varying data, including both image sequences (2D+time) and volume sequences (3D+time). In general, the same engines that proved to be the most effective for images and videos provide the best performance when applied to medical images. Still, depending on the imaging modality and application, some are more suitable than others for a certain target, either in terms of compression factor or, more generally, with respect to a desired functionality.

A particularly challenging requirement concerns the so-called visually lossless mode.


The goal is to design a system that allows some form of data compaction without affecting the diagnostic accuracy. This implies the investigation of issues that go beyond the frontiers of classical signal processing and involves different fields, like vision science (to model the sensitivity of the visual system) and artificial intelligence (to exploit a priori knowledge of the image content). Even though the investigation of these issues is among the most promising paths in current image processing research, it is beyond the scope of this contribution and will not be discussed.

STATE-OF-THE-ART

In what follows, some of the most widespread classical coding methods that have been used to compress multi-dimensional medical data are reviewed. Some of them respond better to the requirements summarized in the previous section, while others fit best in specific domains or applications. The choice of a radiological compression scheme results from a complex trade-off between systemic and clinical requirements. Among the most important are the image characteristics (resolution, signal-to-noise ratio, contrast, sharpness, entropy); the image use (telemedicine, archiving, teaching, diagnosis); the type of degradation introduced by lossy coding; and practical issues like user-friendliness, real-time operation and the cost of implementation and maintenance.

2D Systems

A first review of radiological image compression appeared in 1995 [7]. In their paper, the authors reviewed some of the techniques for lossless and lossy compression that had been applied to medical images up to then. Among the lossless ones were Differential Pulse Code Modulation (DPCM) [8], Hierarchical Interpolation (HINT) [9], Bit-Plane Encoding (BPE) [10], Multiplicative Autoregression (MAR) [11, 12] and Difference Pyramids (DP) [12]. The potential advantages of lossy techniques are nonetheless clear, justifying the efforts of many researchers. Among the ones summarized in [7] were the 2D Discrete Cosine Transform (DCT), implemented either block-wise or on the entire image (full-frame DCT) [13, 14], the Lapped Orthogonal Transform (LOT) [15], as well as other classical methods like vector quantization [16] and adaptive predictive coding [17]. In what follows, some of the most interesting contributions are reviewed.

Lossless Techniques

DPCM – DPCM is a simple coding method based on prediction in the image domain. The value of the current pixel is approximated by a linear combination of neighboring pixels according to a set of weighting factors, and the prediction error is entropy coded and transmitted. According to [7], the compression factors that can be obtained with this technique were in the range 1.5-3, mostly depending on the entropy coder. The main disadvantage of DPCM is that progressiveness is not supported, because the image is reconstructed pixel by pixel.
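To make the predict-and-code principle concrete, the following Python sketch implements a toy DPCM encoder/decoder. The mean-of-causal-neighbors predictor and the function names are illustrative assumptions, not the predictors of [8]; in a real codec the residuals would be fed to an entropy coder.

```python
import numpy as np

def dpcm_encode(img):
    """Toy DPCM: predict each pixel as the mean of its left and upper
    neighbors and return the integer prediction residuals, which would
    then be entropy coded."""
    img = img.astype(np.int64)
    res = np.empty_like(img)
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            up = img[i - 1, j] if i > 0 else 0
            left = img[i, j - 1] if j > 0 else 0
            res[i, j] = img[i, j] - (up + left) // 2
    return res

def dpcm_decode(res):
    """Inverse DPCM: the image must be rebuilt pixel by pixel in raster
    order, which is why plain DPCM offers no progressive decoding."""
    out = np.empty_like(res)
    rows, cols = res.shape
    for i in range(rows):
        for j in range(cols):
            up = out[i - 1, j] if i > 0 else 0
            left = out[i, j - 1] if j > 0 else 0
            out[i, j] = res[i, j] + (up + left) // 2
    return out
```

Because the same causal predictor is used on both sides and the residuals are kept as integers, the reconstruction is exact, i.e. the scheme is lossless.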


Pyramids – Pyramids have been extensively exploited for data compression in different guises. Basically, sub-sampling is iterated to produce progressively lower-resolution versions of the original image up to a predefined level. The lowest resolution is encoded and transmitted, and the lossless representation is obtained by successively encoding and transmitting the interpolation residuals between subsequent pyramid levels. This technique has been tested by different researchers. To give an example, in the HINT implementation [18] the resulting compression ratios ranged from about 1.4 for 12-bit 512×512 MR images to 3.4 for 9-bit 512×512 angiographic images. This system was also generalized to 2D image sequences to investigate the usefulness of exploiting the temporal dimension. For inter-frame decorrelation, different approaches were considered, including extrapolation- and interpolation-based methods, methods based on local motion estimation, block motion estimation and unregistered decorrelation. The test set consisted of sequences of coronary X-ray angiograms, ventricle angiograms and liver scintigrams, as well as a (non-medical) videoconferencing image sequence. For the medical image sequences, the authors concluded that the interpolation-based methods were superior to the extrapolation-based methods and that the estimation of the inter-frame motion was, in general, not advantageous [19].

Bit-Plane Encoding – Bit-plane encoding in the image domain can be seen as a successive approximation quantization (SAQ) and can lead to a lossless representation. When it is implemented in the image domain, the subsequent bit-planes of the grey-level original image are successively entropy coded and transmitted. Even though it does not outperform the other methods [20], the main interest of this technique is that it enables progressiveness by quality. This method has since been successfully applied to encode the subband coefficients in wavelet-based coding, enabling scalability functionalities (a minimal sketch is given below).

Multiplicative Autoregression (MAR) Models – MAR is based on the assumption that images are locally stationary and, as such, can be approximated by a 2D linear stochastic model [21]. The basic blocks of a MAR encoder are a parameter estimator, a 2D MAR predictor and an entropy coder. A multi-resolution version (MMAR) has also been elaborated [11]. MAR and MMAR techniques have been shown to outperform other methods on some datasets, the main disadvantage being the implementation complexity.

More recently, different solutions have been proposed for lossless compression. Among these, the Context-based Adaptive Lossless Image Codec (CALIC) [22] has proved to be the most effective and, as such, is often taken as the benchmark for performance in lossless medical image compression.
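Picking up the bit-plane encoding method recalled above, the sketch below (illustrative code, no entropy coding) shows how a non-negative integer image, e.g. a 12-bit CT slice, is split into bit-planes and progressively reconstructed, which is where the quality scalability comes from.

```python
import numpy as np

def to_bitplanes(img, nbits=12):
    """Split a non-negative integer image into bit-planes, from the most
    significant (coarse approximation) to the least significant (detail).
    Each plane would then be entropy coded and transmitted in turn."""
    img = img.astype(np.uint16)
    return [((img >> b) & 1).astype(np.uint8) for b in range(nbits - 1, -1, -1)]

def from_bitplanes(planes, nbits=12):
    """Progressive reconstruction: using only the first k planes gives an
    approximation whose quality improves as more planes are decoded."""
    img = np.zeros(planes[0].shape, dtype=np.uint16)
    for k, plane in enumerate(planes):
        img |= plane.astype(np.uint16) << (nbits - 1 - k)
    return img
```

Decoding, say, only the six most significant planes of a 12-bit image already yields a usable preview, and appending the remaining planes refines it up to the lossless original.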


CALIC – The basic principle of CALIC is to use a large number of modeling contexts for both (non-linear) adaptive prediction and entropy coding. The problem of context dilution is avoided by decoupling the prediction and coding phases: CALIC only estimates the expectation of the prediction errors conditioned on a large number of compound contexts, instead of estimating a large number of conditional error probabilities. These expectation values are used to correct the prediction of the current pixel value obtained from spatial prediction, and the resulting residual is entropy encoded using only eight "primary" contexts. These contexts are formed by quantizing a local error energy estimator, which depends on the local gradients as well as on the uncorrected prediction error. CALIC was proposed to ISO/JPEG as a candidate algorithm for the standard [23], and it provided the lowest lossless rate on six out of the seven image classes used as the test set (medical, aerial, prepress, scanned, video and compound documents). However, the LOw COmplexity LOssless COmpression for Images (LOCO-I) algorithm [24] was chosen instead, due to its lower computational cost and competitive performance. The superiority of CALIC with respect to the other state-of-the-art techniques was further proven in another set of tests described in [25]. In particular, it was run on a set of 3679 images including CT, MR, NM, US, IO, CR and Digitized X-rays (DX) and compared, among others, with JPEG-LS [26] and JPEG2000 in lossless mode. Again, the results showed that CALIC equipped with an arithmetic coder provided the highest compression rate, except for one modality for which JPEG-LS did better. These results are also consistent with those presented in [27, 28].

Lossy Techniques

Lossy methods implicitly rely on the assumption that a visually lossless regime can be reached, which makes it possible to obtain high coding gains while preserving the features that determine diagnostic accuracy. However, this is still a quite ambitious challenge, which encloses many open issues going beyond the boundaries of the field of signal processing. The difficulty of identifying the features that are relevant for the formulation of the diagnosis of a given pathology, based on a given imaging modality, is what makes human intervention in the decision process unavoidable. Systems for automatic diagnosis are still at an early stage and are the subject of extensive investigation. If automatically extracting the relevant features is difficult, quantifying the amount of degradation that is acceptable while preserving the diagnostically relevant information is even more ambitious. It is worth noting that vision-related issues of both low level (related to stimulus encoding) and higher levels (concerning perception and even cognitive processes) come into play, which depend on vision mechanisms that are still far from being understood. Last but not least, the validation of a lossy technique based on the subjective evaluation by medical experts is quite complex: the number of variables is large and they are probably not mutually independent, which makes it difficult to design an ad hoc psychophysical experiment in which a large number of parameters must be controlled. Both the discrete cosine transform (DCT) and the wavelet transform (WT) have been exploited for lossy compression.


DCT – A revisitation of the DCT for the compression of different types of medical images can be found in [14]. A block-based DCT is applied to the image (the block size is 8×8 pixels). The resulting coefficients are then reordered by gathering all the samples at the same spatial frequency into disjoint sets. In this way, a kind of subband decomposition is obtained, where the blocks representing the high-frequency components have, in general, low energy and are mostly set to zero after quantization. For the remaining bands, a frequency-domain block-wise prediction is implemented. The residual error is further quantized and finally entropy coded, using the same entropy coding method as JPEG. The performance was evaluated on thirty medical images (US, angiograms and X-ray) and compared to that of JPEG. The best results were obtained on the angiographic images, for which the proposed algorithm clearly outperforms JPEG, while the improvement was more modest for the others.

Multiresolution Decompositions – Multiresolution decompositions have been extensively exploited for coding in different guises. In [29], an adaptive subband decomposition is used for coding ultrasound images of the liver. The rationale of this approach lies in the fact that ultrasound images are characterized by a pronounced speckle pattern, which holds diagnostic relevance as representative of the tissue and, as such, should in general be preserved. Low-rate compression of this type of image by the most widespread coding algorithm, the Set Partitioning In Hierarchical Trees (SPIHT) algorithm [30] (discussed later and taken as the benchmark for performance), generally produces artifacts that smooth the speckle pattern and are particularly visible in areas of low contrast. Instead, the authors proposed an image-adaptive scheme that chooses the best representation for each image region according to a predefined cost function. Accordingly, the image is split either in space (image domain) or in frequency, or both, such that the "best" basis functions are selected for each partition. The Space Frequency Segmentation (SFS) starts with a large set of bases and an associated set of quantizers, and then uses a fast tree-pruning algorithm to select the best combination according to a given rate-distortion criterion. In this work, though, only the lossy regime was considered and tested. The proposed SFS scheme outperformed SPIHT over a wide range of bitrates and, importantly in this framework, this held true for both objective (PSNR) and subjective quality assessment. Even though the set of subjective experiments performed by the authors to investigate the degradation of the diagnostically relevant information can only be considered as an indication, its relevance lies in the fact that it strengthened the point that the amount and type of distortion that can be tolerated strongly depends on the image features, and thus on both the imaging modality and the disease under investigation. This was also clearly pointed out in [16]: the amount of degradation that can be tolerated in medical images depends on the degree to which it affects the formulation of the diagnosis. Noteworthy, in their paper the authors also point out that neither pixel-wise measures (like the PSNR) nor Receiver Operating Characteristic (ROC) curves would be suitable; the latter because ROC analysis is only amenable to binary tasks, while a much more complex decisional process is, in general, involved in the examination of a medical image.
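Since the PSNR is the objective measure referred to throughout this review, its definition is worth recalling: for a B-bit image, PSNR = 10·log10((2^B − 1)² / MSE), expressed in dB. A minimal implementation (the 12-bit default is just an example):

```python
import numpy as np

def psnr(original, decoded, bits=12):
    """Peak signal-to-noise ratio in dB for a B-bit image.
    Purely pixel-wise, hence the caveats raised above about its adequacy
    for judging diagnostic quality."""
    err = original.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(err ** 2)
    if mse == 0:
        return float("inf")  # lossless reconstruction
    peak = (1 << bits) - 1
    return 10.0 * np.log10(peak * peak / mse)
```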


The key for combining high coding efficiency with lossless performance is scalability. By allowing a progressively refinable, up-to-lossless representation, systems featuring scalability are flexible tools able to adapt to the user's requests. The critical point is that, for an embedded coding algorithm to be competitive with a non-embedded one in terms of rate/distortion, a quite effective coding strategy for the significance map must be devised. This has determined the success of the wavelet transform in the field of compression, since the resulting subband structure is such that both intra-band and inter-band relationships among coefficients can be profitably exploited for this purpose. More generally, wavelets are particularly advantageous because they are able to respond to all the requirements summarized in Section II.

Wavelet-Based Techniques

As mentioned in Section II, the demand of the current multimedia clinical framework goes beyond the maximization of the compression ratio, calling for systems featuring specific functionalities, the most important being progressiveness and ROI-based processing. This has determined the success of wavelet-based techniques, which can be implemented so as to preserve the specificity of the input data (mapping integer to integer values), thus enabling lossless coding, while being well suited for region-based processing. Furthermore, the non-linear approximation properties of wavelets inspired the design of an effective coding strategy that has become the most widespread in the field of data compression: the Embedded Zerotree Wavelet (EZW) coding algorithm [31].

The discrete wavelet transform of a signal performs a space-frequency multi-resolution decomposition. In the classical implementation, it is computed by a fast algorithm that successively filters and down-samples the signal along all the spatial dimensions. The decomposition is iterated on the approximation (low-pass) band, which contains most of the energy. The forward transform uses two analysis filters, h̃ (low pass) and g̃ (band pass), followed by sub-sampling, while the inverse transform first up-samples the subband coefficients and then applies two synthesis filters, h (low pass) and g (band pass). Figure (1) shows a two-level DWT of a synthetic (a-b) and a natural (c-d) image. The approximation subband is a coarser version of the original, while the other subbands represent the high frequencies (details) in the horizontal, vertical and diagonal directions, respectively. The spatial similarities among the wavelet coefficients in subbands with the same orientation at different resolution levels have been profitably exploited for coding through the so-called quad-trees. These derive from the definition of a parent-children relation linking the coefficients across subbands [31].

Fig. (1). Discrete Wavelet Transform. (a)-(b): synthetic image; (c)-(d): natural image.
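The subband structure of Fig. (1) and the parent-children relation exploited by zerotree coders can be illustrated in a few lines of Python, assuming the PyWavelets (pywt) package; the Haar filter and the image size are arbitrary choices that make the dyadic index mapping exact.

```python
import numpy as np
import pywt  # PyWavelets, assumed to be installed

# Two-level 2D DWT of a (synthetic) 12-bit image.
img = np.random.randint(0, 4096, size=(256, 256)).astype(np.float64)
approx, details_lvl2, details_lvl1 = pywt.wavedec2(img, wavelet="haar", level=2)

# Parent-children relation: a coefficient at position (i, j) in a level-2
# detail subband has a 2x2 block of children at rows 2i..2i+1 and columns
# 2j..2j+1 of the level-1 subband with the same orientation.
i, j = 3, 5                                   # an arbitrary parent position
for parent_band, child_band in zip(details_lvl2, details_lvl1):
    parent = parent_band[i, j]
    children = child_band[2 * i:2 * i + 2, 2 * j:2 * j + 2]
    # Zerotree hypothesis: if |parent| < T, the children (and their
    # descendants) are likely to be insignificant with respect to T too.
    print(abs(parent), np.abs(children).max())
```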

Embedded Zerotree Wavelet Based Coding – The EZW is the ancestor of a large family of algorithms that try to improve its performance and/or adapt it to particular data structures and applications. The basic idea is to exploit the correlation among the wavelet coefficients in different subbands with the same orientation through the definition of parent-children relationships. The core hypothesis (the zerotree hypothesis) is that if a wavelet coefficient w at a certain scale is below a given threshold T, then all its descendants (the coefficients at the corresponding positions and with the same orientation in the finer-resolution subbands) are also insignificant with respect to the same threshold. Scalability is obtained by following a Successive Approximation Quantization (SAQ) scheme, which translates into a bit-plane encoding of the subbands. The significance of the wavelet coefficients with respect to a monotonically decreasing set of thresholds is encoded into a corresponding set of significance maps in a two-step process, and the generated set of symbols is entropy coded by an arithmetic coder. The compression efficiency comes from gathering the (in)significance information of a whole tree of wavelet coefficients into a single symbol attached to its root (the ancestor). Among the most relevant evolutions of the EZW is the Set Partitioning In Hierarchical Trees (SPIHT) algorithm [32].
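A toy version of the SAQ mechanism underlying EZW-type coders is sketched below. It is hypothetical code that only tracks the reconstructed values: the zerotree symbols, the scanning order and the arithmetic coder are omitted.

```python
import numpy as np

def saq_reconstruction(coeffs, n_passes=4):
    """Successive approximation of wavelet coefficients against a halving
    threshold (assumes at least one non-zero coefficient)."""
    c = np.asarray(coeffs, dtype=np.float64)
    T = 2.0 ** np.floor(np.log2(np.max(np.abs(c))))   # initial threshold
    recon = np.zeros_like(c)
    for _ in range(n_passes):
        # significance pass: coefficients crossing T are placed at +/-1.5T
        new_sig = (np.abs(c) >= T) & (recon == 0)
        recon[new_sig] = np.sign(c[new_sig]) * 1.5 * T
        # refinement pass: previously significant coefficients move by T/2
        refine = (recon != 0) & ~new_sig
        recon[refine] += np.sign(c[refine] - recon[refine]) * (T / 2)
        T /= 2.0
    return recon
```

Each pass corresponds to one more bit-plane of precision, which is exactly what yields the embedded, quality-scalable bitstream.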


Set Partitioning In Hierarchical Trees – Grounded on the same underlying stochastic model, SPIHT relies on a different policy for partitioning and sorting the trees for encoding. The basic steps of the SPIHT algorithm are partial ordering of the wavelet coefficients by magnitude; set partitioning into hierarchical trees (according to their significance); and ordered bit-plane encoding of the refinement bits. During the sorting pass, the data are organized into hierarchical sets based on the significance of the ancestor, the immediate offspring nodes and the remaining nodes. Accordingly, three lists are defined and progressively updated during the "sorting pass": the List of Insignificant Pixels (LIP), the List of Insignificant Sets (LIS) and the List of Significant Pixels (LSP). During the "refinement pass", the value of the significant coefficients (i.e. those belonging to the LSP) is updated to the current quantization level. The SPIHT algorithm, in general, outperforms EZW at the expense of an increased complexity. Both are often taken as the benchmark for compression performance, in both lossless and lossy modes.

The proven efficiency of these coding methods has pushed some researchers to use them in combination with subband decompositions resulting from the application of different families of linear or non-linear filters, leading to a large set of algorithms. It is worth pointing out here that the lack of a common database for testing the performance of the different algorithms is one of the main bottlenecks for their comparison and classification. Some efforts in this direction are nowadays spontaneously attempted by researchers in the data compression community; this is extremely important in view of the definition of a standard for multidimensional medical data compression. In what follows, some of the more relevant contributions on other forms of subband coding are summarized, in order to draw a picture of the state-of-the-art scenario.

Rank-Order Polynomial Subband Decomposition – In [33], the EZW coding principle is applied to a subband structure issued from a so-called Rank-Order Polynomial subband Decomposition (ROPD), applied to X-ray and heart US images. Basically, the images are decomposed into a "lower" and a "higher" frequency subband. The "approximation" subband is obtained by simply sub-sampling the image, while the "detail" subband represents the residual after rank-order polynomial prediction of the original samples. By an ad hoc definition of the prediction polynomials in vector form, both numerical and morphological (non-linear) filters can be modeled. The ROPD algorithm was tested in both the lossless and lossy modes and compared to SPIHT, lossless JPEG, a previous version of the algorithm using only morphological filters [34] (Morphological Subband Decomposition, MSD), and a codec based on wavelet/trellis-coded quantization (WTCQ) [35]. ROPD slightly outperformed SPIHT in terms of compression ratio, whereas a more substantial improvement was observed with respect to the other algorithms. This held true for all the images of the test set (two MRI images of a head scan and one X-ray of the pelvis). The interest of this work lies mostly in the adaptation of the algorithm for ROI-based coding: a shape-adaptive non-linear decomposition [34] was applied to the region of interest of a typical heart ultrasound image, which was then losslessly coded.


Block-Based Wavelet Coding – A solution for coding coronary angiographic images is proposed in [36]. This wavelet-based algorithm provides lossless as well as scalability-by-quality functionalities. The wavelet transform is implemented via the lifting scheme in its integer version [37] to enable lossless performance. The coding strategy exploits only intra-band dependencies. The authors prove that, if the zerotree hypothesis holds, the number of symbols used to code the zero regions with a fixed-size block-based method is lower than the number of zerotree symbols that would be required by an EZW-like approach, for block sizes confined within certain theoretical bounds. Basically, the wavelet image is partitioned into a lattice of squares of width v (v = 4 in their implementation). A starting threshold is assigned to every square of every subband as the largest power of two below the maximum absolute value of the coefficients in the square, and the maximum Tmax of these starting thresholds is retained. The subbands are then scanned in raster order, as are the squares within the subbands, to record the significance of the coefficients with respect to a decreasing set of thresholds ranging from Tmax to zero. The generated set of symbols is encoded using a high-order arithmetic coder. This coding strategy implies an ordering of the coefficients according to their magnitude, enabling scalability functionalities. Such an order can be modified to provide the algorithm with region-based processing: after establishing the correspondence between a spatial region and the corresponding set of squares, ROI-based coding is achieved by grouping those squares into a logical entity and giving it the highest priority during the coding process (see the sketch below). The algorithm was also tested in the case v = 1 for the sake of comparison. Performance was compared to that of a number of other methods, including CALIC, lossless JPEG, EZW and SPIHT, the latter two being implemented over a subband structure issued by integer lifting [37]. The results show that the proposed algorithm outperforms the others, with the exception of CALIC, which gives very close lossless rates.

ROI-Based Coding

The most effective wavelet-based coding methods have also been generalized for region-of-interest processing. In [38-40], an extension of the EZW algorithm for object processing is proposed. Among the main features of this approach are the finely graded up-to-lossless representation of any object and the absence of discontinuities along the object's borders at any decoding quality. A detailed analysis of the performance of this algorithm is provided in Sect. VII. Solutions for the ROI-based versions of SPIHT and of a similar algorithm, the Set Partitioning Embedded Block (SPECK) coder [41], respectively referred to as OB-SPIHT and OB-SPECK, are described in [42]. The dataset consists in this case of mammography images digitized at 12 bits/pixel over 4096x5120 pixels. The region of interest is first segmented and then transformed by a region-based DWT [43]; the shape of the region of interest is encoded by a two-link shape coding method [44]. OB-SPIHT and OB-SPECK are obtained by simply pruning the tree branches falling outside the objects. The performance was compared to that of SPIHT and JPEG2000 applied to the entire image, in order to quantify the rate saving provided by object-based processing. A pseudo object-based mode was also obtained by running these two reference algorithms on the images with the background pixels set to zero.
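Returning to the fixed-size block scheme of [36], the sketch below illustrates how per-square starting thresholds could be derived (hypothetical code, not the authors' implementation; block sizes are assumed to divide the subband dimensions).

```python
import numpy as np

def square_start_thresholds(subband, v=4):
    """For each v-by-v square of a subband, return the largest power of two
    not exceeding the maximum absolute coefficient in the square
    (0 for an all-zero square)."""
    rows, cols = subband.shape
    thresholds = np.zeros((rows // v, cols // v))
    for bi in range(rows // v):
        for bj in range(cols // v):
            block = subband[bi * v:(bi + 1) * v, bj * v:(bj + 1) * v]
            m = np.max(np.abs(block))
            thresholds[bi, bj] = 0.0 if m < 1 else 2.0 ** np.floor(np.log2(m))
    return thresholds

# The maximum of these values over all subbands gives Tmax, which drives the
# outermost significance pass; squares mapped to the ROI can simply be
# visited first to give the region the highest priority.
```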


Results showed that OB-SPIHT and OB-SPECK perform quite similarly and that focusing on the region of interest improves the efficiency of the coding system, as was reasonable to expect.

To conclude this section, the main insights that can be drawn from the results of the research effort in this field are summarized. First, the lack of a common reference database of images representing different imaging modalities and/or pathologies impedes a clear classification of the many algorithms that have been proposed. Second, only algorithms featuring lossless functionalities are suitable in a wide sense, allowing the original information to be recovered without loss; lossy techniques could be adopted for specific applications, like education or post-diagnosis archiving. Whether lossy compression affects the diagnostically important image features is a difficult issue that remains open.

3D Systems

Most of the current medical imaging techniques produce three-dimensional data. Some of them are intrinsically volumetric, like MRI, CT, PET and 3D ultrasound, while others describe the temporal evolution of a dynamic phenomenon as a sequence of 2D images or 3D volumes. The huge amount of data generated every day in the clinical environment has triggered considerable research in the field of volumetric data compression for efficient storage and transmission. The basic idea is to take advantage of the correlation among the data samples in the multi-dimensional space (3D or 4D) to improve efficiency. The most widespread approach combines a multi-dimensional transform with some generalization of a coding algorithm that has proved to be effective in 2D. Here we restrict the discussion to still images and volumes.

The design of a coding scheme results from a trade-off among the cost functions derived from a set of requirements. Among these are optimal rate-distortion performance over the entire range of bit-rates as well as progressiveness, either by quality or by resolution. Besides these general requirements, which apply to any coding framework, some domain-specific constraints must be fulfilled. In the case of medical imaging, lossless functionality is a must; it is thus desirable that the chosen type of scalability end up with a lossless representation. Many solutions have been proposed so far. As was the case in 2D, both the DCT and the discrete WT (DWT) have been used for data decorrelation. The main problem with non-DWT-based schemes is that they hardly cope with the requirements mentioned above, which makes them unsuitable, even though in some cases they provide good rate-distortion performance, sometimes outperforming the DWT-based ones. Among the wavelet-based methods are the following.

3D SPIHT – In [45], the 3D version of the SPIHT [32] algorithm is applied to volumetric medical data. The wavelet transform is implemented in its integer version and different filters are tested. The decomposition is performed first along the z axis, and the spatial dimensions are then processed. The system was tested on five datasets including MR (chest, liver and head) and CT (skull).


The results were compared to those provided by EZW-3D, as well as to those of some two-dimensional techniques, including CALIC for the lossless mode. The lossless rates provided by 3D SPIHT improve on those of the 2D methods by about 30-38%, and 3D SPIHT slightly outperforms EZW-3D on almost all the test sets. A similar approach was followed in [46], where the authors also address the problem of context modeling for efficient entropy coding; this algorithm provides a slightly higher coding gain than 3D SPIHT on the MR chest set. Noteworthy, since it has often been used as the benchmark for performance evaluation of 3D systems, the EZW-3D algorithm has been tested by many researchers; among the numerous contributions are those described in [47] and [38, 39, 48-50].

An extended study of the possible architectures is presented in [51], where Schelkens et al. provide a comparative analysis of different 3D wavelet coding systems. After a brief overview of the more interesting state-of-the-art techniques, the authors propose and compare different architectures, designed by combining in different ways the tools that have proved to be the most effective, namely quadtrees, block-based entropy coding, layered zero coding and context-adaptive arithmetic coding. The proposed wavelet-based coding systems are the 3D extensions of the Square Partitioning (SQP) [52], Quad Tree Limited (QT-L) and Embedded Block Coding with Optimized Truncation (EBCOT) [53] methods, respectively. More specifically, the Cube-Splitting (CS) algorithm is based on quadtrees [54]; the 3D Quadtree Limited (3D QT-L) combines the use of quadtrees with block-based coding of the significance map [36]; and 3D CS-EBCOT [55] integrates both the CS and the layered zero coding strategies. All the wavelet-based systems share the same implementation of the DWT via integer lifting, and different filters and/or decomposition depths are allowed along the three axes. The benchmarks for performance were JPEG 3D and JPEG2K 3D (3D versions of JPEG and JPEG2000, respectively), 3D SPIHT and another 3D subband-based set partitioning block coding (SB-SPECK) method [56].
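The separable 3D subband structure shared by these coders can be visualized with PyWavelets' n-dimensional transform; this is an assumption of the sketch, since the codecs discussed here rely on integer lifting and may use different filters or decomposition depths along each axis.

```python
import numpy as np
import pywt  # PyWavelets, assumed to be installed

# Separable 3D DWT of a CT-like volume stored as (z, y, x).
volume = np.random.randint(0, 4096, size=(64, 256, 256)).astype(np.float64)
coeffs = pywt.wavedecn(volume, wavelet="haar", level=2)

# coeffs[0] is the 3D approximation subband; coeffs[1] and coeffs[2] are
# dictionaries of detail subbands keyed by the filter ('a' or 'd') applied
# along each axis, e.g. 'add' = approximation along z, details along y and x.
print(coeffs[0].shape)           # (16, 64, 64) for this 2-level Haar transform
print(sorted(coeffs[1].keys()))  # ['aad', 'ada', 'add', 'daa', 'dad', 'dda', 'ddd']
```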


Cube Splitting (CS) – The Cube Splitting algorithm was derived as the generalization to 3D data of the Square Partitioning (SQP) method [52]. As is the case for the Nested Quadtree Splitting (NQS) [57] and SB-SPECK, the same coding principle as EZW and SPIHT, consisting of a "primary" or "significance" pass and a "refinement" pass, is applied to a subband structure. Intra-band instead of inter-band relationships are exploited for coding: instead of using quadtrees, the subbands are split into squares. Following the SAQ policy, the significance of each square with respect to a set of decreasing thresholds is progressively established by an ad hoc operator. If a block is significant with respect to the current threshold, it is further split into four squares over which the test for significance is repeated, and the procedure is iterated until the significant leaf nodes are isolated. Thus, the significance pass selects all the leaves that have become significant in the current pass; the refinement pass for the significant leaf nodes is then performed; and the significance pass is restarted to update the entire quad-tree structure by identifying the new significant leaf nodes. In the SQP coder, the significance, refinement and sign information were encoded by adaptive arithmetic coding. The generalization to 3D is straightforward: the resulting CS coder aims at isolating small subcubes possibly containing significant wavelet coefficients, and the same coding algorithm as for SQP is applied to oct-trees. A context-adaptive arithmetic coder is used for entropy coding, with four and two context models used in the significance and refinement passes, respectively. Moreover, the 3D data are organized into Groups of Frames (GOFs) consisting of 8, 16 or 32 slices to improve accessibility.

3D Quad Tree Limited (QT-L) – This method is similar to the SQP and CS coders, the main difference being that the block partitioning is limited: when the block area reaches a pre-defined minimum value, the partitioning is stopped and the subband samples are entropy coded. A second difference is that the order of encoding of the coefficients that are not significant in the current pass is partially altered by introducing another coding stage called the "insignificance pass" [51]. The coefficients classified as non-significant during the significance pass are appended to a list named the List of Non-significant Coefficients (LNC) and are coded first at the beginning of the next significance step. The authors motivate this choice by the consideration that the coefficients in the LNC lie in the neighborhood of others that were already found to be significant during one of the previous passes (including the current one), and as such have a high probability of becoming significant in the next coding step. The significance and refinement passes for the other coefficients follow. Finally, an extended set of contexts is used for both conditioning and entropy coding.

CS Embedded Block Coding (CS-EBCOT) – The EBCOT coder proposed in [53] is a block-based coding strategy that has become quite popular as the one adopted in the JPEG2000 standard. The coding of the subband coefficients consists of two steps. During the first, usually referred to as Tier 1 (T1) coding, the subbands are partitioned into blocks and each block is entropy coded according to the Layered Zero Coding technique [30]. In this way, each block is associated with an embedded bitstream featuring scalability by quality. The second coding step, Tier 2 (T2) coding, aims at identifying the set of truncation points of each block such that a given global rate/distortion function is optimized. The T1 pass is articulated in "significance propagation", "magnitude refinement" and "normalization" passes. During the first, the coefficients that have been classified as non-significant in all the previous passes and have at least one significant coefficient in a predefined preferred neighborhood are encoded. The second refines the quantization of the coefficients that were found significant in one of the previous passes. Finally, the third pass processes all the remaining coefficients, regardless of their preferred neighborhood. Otherwise stated, T2 performs the Post Compression Rate Distortion (PCRD) optimization by searching for the truncation point of every block-wise bitstream that minimizes the distortion for the given rate; the reader is referred to [53] for further details. The 3D CS-EBCOT combines the principles utilized in the CS coder with a 3D version of EBCOT.
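The cube-splitting significance test can be illustrated by a toy recursion; this is hypothetical code, since the real coder emits (and arithmetic-codes) one symbol per test and interleaves the refinement passes.

```python
import numpy as np

def cube_split(block, T, origin=(0, 0, 0), min_size=2):
    """Recursively split a 3D subband block until the coefficients that are
    significant with respect to T (|c| >= T) are isolated in small subcubes.
    Returns the (origin, shape) of each significant leaf subcube."""
    if np.max(np.abs(block)) < T:
        return []                       # insignificant cube: one symbol
    if min(block.shape) <= min_size:
        return [(origin, block.shape)]  # significant leaf: refine its samples
    found = []
    hz, hy, hx = (s // 2 for s in block.shape)
    for oz, sz in ((0, hz), (hz, block.shape[0] - hz)):
        for oy, sy in ((0, hy), (hy, block.shape[1] - hy)):
            for ox, sx in ((0, hx), (hx, block.shape[2] - hx)):
                sub = block[oz:oz + sz, oy:oy + sy, ox:ox + sx]
                found += cube_split(
                    sub, T,
                    (origin[0] + oz, origin[1] + oy, origin[2] + ox),
                    min_size)
    return found
```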


3D DCT – The 3D DCT coding scheme represents, in some sense, the 3D version of the JPEG standard. The three-dimensional DCT is performed on cubic blocks of 8×8×8 voxels. The resulting coefficients are uniformly quantized and scanned for coding along a space-filling 3D curve. Entropy coding is implemented by a combination of run-length and arithmetic coding. The DCT-based coder cannot provide lossless performance.

Lossy and lossless compression performance was evaluated over five datasets including CT (CT1 with 512x512x100x12 bpp and CT2 with 512x512x44x12 bpp), MRI (256x256x200x12 bpp), PET (128x128x39x15 bpp) and ultrasound images. The conclusions can be summarized as follows. First, the use of the wavelet transform boosts the performance: 3D systems are advantageous when the inter-slice spacing is such that significant correlation exists between adjacent slices. Second, for lossless coding, CS-EBCOT and 3D QT-L provide the best results on all the datasets. Third, in lossy coding, 3D QT-L tends to deliver the best performance when using the 5/3 kernel [58]; at low rates, CS-EBCOT competes with JPEG2000-3D. Conversely, 3D SPIHT would be the best choice in combination with the 9/7 filter, followed closely by CS and CS-EBCOT. The results show that the three proposed algorithms provide excellent performance in lossless mode, and rate-distortion results that are competitive with the reference techniques (3D SB-SPECK and JPEG2K-3D, namely the JPEG2000 encoder equipped with a 3D wavelet transform).

In summary, the research in the field of medical image compression has led to the common consensus that 3D wavelet-based architectures are fruitful for compression as long as the correlation along the third dimension is sufficiently pronounced. On top of this, the availability of ad hoc functionalities, like fast access to the data and ROI-based capabilities, is critical for the suitability of a given system for PACS. In the medical imaging field, users tend to prefer systems that are better tailored to their needs, possibly sacrificing some gain in compression. Probably the major drawbacks of 3D systems are decoding delay and computational complexity. A possible shortcut to the solution of this problem is ROI-based coding, which is particularly appealing for 3D data due to the high demand in terms of transmission and/or storage resources.

Different solutions have been proposed so far for 3D ROI-based coding. Of particular interest are those described in [39, 59, 60]. Besides the strategy used for entropy coding, the main difference lies in the way the region-based transform is implemented; the different solutions are application-driven and satisfy different sets of constraints. The system presented in [59] combines a shape-adaptive DWT with scaling-based ROI coding and 3D SPIHT for entropy coding. The dataset is partitioned into Groups of Pictures (GOPs), which are encoded independently to save run-time memory. The test set consists of a volumetric MR of the chest (256x256x64x8 bpp). Performance is compared to that obtained with a conventional region-based DWT implementation.


The advantage of this algorithm is that the number of coefficients to be encoded for each region is equal to the number of pixels corresponding to the object in the image domain.

A particularly interesting solution is provided in [39, 60]. In this case, ROI-based functionalities are enabled on both 3D objects and user-defined 2D images, allowing fast access to the information of interest with finely graded granularity. Furthermore, no border artifacts appear in the image at any decoding rate. The 3D ROI-based Multidimensional Layered Zero Coding (MLZC) technique [60] is an application-driven architecture. It was developed on the ground that, despite the availability of advanced rendering techniques, it is still common practice for doctors to analyze 3D data distributions one 2D image at a time. Accordingly, in order to be suitable within a PACS, a coding system must provide fast access to single 2D images. On top of this, the availability of ROI-based functionalities speeds up access to the portion of the image that is crucial for diagnosis, permitting a prompt response by the experts in applications like teleradiology. The 2D decoding capability is accomplished by independently encoding each subband image and making the corresponding information accessible through the introduction of special characters (markers) into the bitstream. In this way, once the user has specified the position of the image of interest along the axis, the set of subband images needed for its reconstruction is automatically determined and the corresponding information is decoded; the inverse DWT is performed locally. ROI-based functionality is integrated by assigning subsequent segments of the bitstream to the different objects, according to their priority [39, 40]. This leads to a versatile and highly efficient coding engine that allows any object of any 2D image of the dataset to be swiftly recovered at a finely graded, up-to-lossless quality. Besides competitive compression rates and novel application-driven functionalities, the proposed system enables a pseudo-lossless mode, in which the diagnostically relevant parts of the image are represented without loss, while a lower quality is assumed to be acceptable for the others. Accordingly, the potential of this architecture lies in the combination of a 3D transformation, providing a concise description of the data, with the possibility to recover single 2D images by decoding only the part of the bitstream holding the necessary information. Besides the improved efficiency in accessing the information of interest, ROI-based processing enables the parallelization of the encoding/decoding of the different objects. Together with the use of the integer version of the wavelet transform, which is necessary for reaching the lossless mode, this makes the algorithm particularly suitable for implementation on a device. Last but not least, the analysis of the compression performance shows that the proposed system is competitive with the other state-of-the-art techniques. It is thus a good compromise between the compression efficiency gain provided by 3D systems and the fast data access of 2D ones. The remaining part of this paper is devoted to the description of such a system.
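As an illustration of how markers make random access possible, the following simplified sketch (not the actual MLZC bitstream syntax) indexes independently coded subband-image segments by byte offset, so that a decoder can fetch only the segments a requested 2D slice depends on.

```python
def build_index(segments):
    """segments: iterable of (subband_id, z_index, payload_bytes) in coding
    order. Returns the concatenated bitstream and a table mapping
    (subband_id, z_index) to (offset, length)."""
    index, stream, offset = {}, bytearray(), 0
    for sb, z, payload in segments:
        index[(sb, z)] = (offset, len(payload))
        stream += payload
        offset += len(payload)
    return bytes(stream), index

def extract(stream, index, needed):
    """Return the segments for the (subband_id, z_index) pairs required by
    the requested slice. Which pairs are needed depends on the filter
    support along z and is not modeled here."""
    return {key: stream[off:off + ln]
            for key, (off, ln) in index.items() if key in needed}
```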


To conclude this section, the emerging trend in data compression, the model-based approach, is briefly discussed. In the perspective of model-based coding, high efficiency can be achieved on the basis of a different notion of redundancy, grounded in semantics. This opens a new perspective in the compression philosophy, based on a redefinition of the notion of relevance. If, in general, the goal of a coding system is to represent the information by reducing the mathematical and/or statistical redundancy, the availability of a priori information about the imaged data suggests a more general approach, where the ultimate goal is to eliminate all kinds of redundancy, whether mathematical or semantic. Following a model-based approach means focusing on semantics instead of just considering the data as multidimensional matrices of numbers. This philosophy traces back to the so-called second-generation coding techniques [17], and it has inspired many of the last-generation coding systems, like JPEG2000 [1] for still image coding, MPEG4 [2] for video sequences, and MPEG7 [3] for the content-based description of images and database retrieval.

Since semantics comes into play, such an approach must be preceded by an image analysis step. In the general case, the problem of object identification and categorization is ill-posed. However, in the particular case of medical imaging, the availability of a priori information about both the image content and its use simplifies the task, making the problem tractable. To get a picture of this, just consider an MRI head scan performed in an oncology division. The object of interest will most probably be the brain, while the surrounding tissues and the skull could be considered non-informative for the investigation under way. The shape of the region of interest could then be progressively updated during the investigation and, for instance, collapse onto the tumor and the region of the brain surrounding it, where the major changes due to the lesion are expected. In both cases, it is known a priori that the object of interest is the brain (or possibly a part of it), that it is located at the center of the image, and what its shape and textural properties are (the latter depending on the type of scan). In most state-of-the-art implementations, the general idea of model-based coding specializes into region-of-interest coding. Such regions often correspond to physical objects, like human organs or specific types of lesions, and are extracted by an ad hoc segmentation of the raw data. Here it is assumed that the object of interest is given; the reader is referred to [61] for a survey on medical image analysis. According to the guidelines described above, model-based coding consists in combining classical coding techniques in the regions where the original data must be preserved (at least up to a given extent) with generative models reproducing the visual appearance of the image in the regions where the lossless constraint can be relaxed.

3D/2D ROI-BASED MLZC: A 3D ENCODING/2D DECODING OBJECT-BASED ARCHITECTURE

The coding algorithm described in this section combines the improvement in compression performance resulting from a fully three-dimensional architecture with swift access to single imaged objects. In this framework, the term object identifies the part of the data which is of interest for the user. Accordingly, it indicates a 3D set of voxels in the 3D ROI-based working mode, a single 2D image in the 3D/2D working mode, and a region of a 2D image in the ROI-based 3D/2D working mode, as illustrated in Fig. (2).


Fig. (2). Volumetric data. We call z the third dimension, and assume that the images are the intersections of the volume with a plane orthogonal to the z axis.

The data are first decorrelated by a 3D-DWT and subsequently encoded by an ad hoc coding strategy. The implementation via the lifting steps scheme [62] is particularly advantageous in this framework. First, it provides a very simple way of constructing non-linear wavelet transforms mapping integers to integers [37]. Second, perfect reconstruction is guaranteed by construction for any type of signal extension along the borders. This greatly simplifies the management of the boundary conditions underlying independent object processing with respect to the classical filter-bank implementation. Third, it is computationally efficient: it can be shown that the lifting steps implementation asymptotically reduces the computational complexity by a factor 4 with respect to the classical filter-bank implementation [63]. Finally, the transformation can be implemented in-place, namely by progressively updating the values of the original samples without allocating auxiliary memory, which is quite important when dealing with large volumes.

The 3D-DWT is followed by successive approximation quantization (SAQ), bit-plane encoding and context-adaptive arithmetic coding. Some markers are placed in the bitstream to enable random access. Tuning the coding parameters leads to different working modalities. In particular, 3D encoding/2D decoding capabilities are gained at the expense of a slight degradation in coding gain, due to the extra information needed to enable random access to selected segments of the bitstream. The object-based functionality is reached by independently encoding the different objects, which can then be decoded at the desired bitrate. Finally, the working mode featuring both 3D encoding/2D decoding and object-based capabilities is obtained by concatenating, for each object, one segment of bitstream built according to the rules enabling 2D decoding. The price to pay is an additional overhead, which slightly degrades the compression performance. However, the possibility to focus the decoding process on a specific region of a certain 2D image allows a very efficient access to the information of interest, which can be recovered at the desired up-to-lossless quality. This is believed to be an important feature for a coding system meant to be used for medical applications, and it largely compensates for the possible loss in compression that may be implied.

The next Sections describe in more detail how the different functionalities are obtained. In particular, Sect. IV.A illustrates the object-based processing leading to the desired functionalities, while Sect. IV.B comments on the 3D analysis/2D reconstruction working modality. Section V illustrates the chosen coding strategy as well as its adaptation to the different working modes.

Object-Based Processing

In ROI-based compression systems, the management of objects concerns both the transformation and the coding steps. From the transformation perspective, it raises a boundary problem. As discrete signals are nothing but sets of samples, it is straightforward to associate the idea of an object with a subset of samples. The issue of boundary conditions is greatly simplified when the DWT is implemented by the lifting steps scheme [62]: in this case, perfect reconstruction is ensured by construction for any kind of signal extension at the region borders. Nevertheless, perfect reconstruction is not the only issue when dealing with a complete coding system. The goal was to avoid artifacts along borders in all working modalities, in order to make object-based processing completely transparent with respect to the unconstrained general case where the signal is processed as a whole. Stated otherwise, the intention was to have images decoded at a given level of approximation (e.g. quantization) be exactly the same in the following conditions: (a) the signal is encoded/decoded as a whole, and (b) each object is independently encoded and decoded at the same quality (e.g. quantization level). The perfect reconstruction condition is not enough to ensure the absence of artifacts in terms of discontinuities at the borders. Since quantized coefficients are approximations of the true values, any signal extension used to reconstruct two adjacent samples belonging to different objects (e.g. lying on opposite sides of a boundary) would generate a discontinuity. To avoid this, the inverse transform must be performed as if the whole set of true coefficients were available.


The use of the lifting scheme simplifies this task. The idea is to determine which samples are needed at the input of the synthesis chain to reconstruct a given sample at its output. The resulting set of subband coefficients is called the Generalized Projection (GP) of the object in a given subband. The corresponding GP operator [64], which automatically selects such samples, is easily generalized to the 3D case. Let GP_η denote the generalized projection operator applied along the direction η = x, y, z. The separability of the transform leads to the following composition rule:

GP_{zyx}{ . } = GP_z{ GP_y{ GP_x{ . } } }     (1)

The set of wavelet coefficients to be encoded for each object consists of those belonging to its generalized projection.
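As an illustration (not part of the original paper), the reversible 5/3 lifting step referred to above can be sketched in a few lines of Python; the function names and the border handling are illustrative assumptions. The integer rounding inside the predict and update steps is what makes the transform map integers to integers, and the inverse undoes the two steps in reverse order, so reconstruction is exact for any consistent border extension:

```python
import numpy as np

def lift_53_forward(x):
    """Reversible 5/3 lifting on a 1D integer signal (even length assumed).
    Returns (approximation, detail); borders use a simple symmetric extension."""
    x = np.asarray(x, dtype=np.int64)
    xe, xo = x[0::2].copy(), x[1::2].copy()            # even / odd samples
    right = np.concatenate([xe[1:], xe[-1:]])          # extension for the predict step
    d = xo - (xe + right) // 2                         # predict: detail coefficients
    left_d = np.concatenate([d[:1], d[:-1]])           # extension for the update step
    s = xe + (left_d + d + 2) // 4                     # update: approximation coefficients
    return s, d

def lift_53_inverse(s, d):
    """Inverse lifting: undo the update, then the predict, exactly."""
    left_d = np.concatenate([d[:1], d[:-1]])
    xe = s - (left_d + d + 2) // 4
    right = np.concatenate([xe[1:], xe[-1:]])
    xo = d + (xe + right) // 2
    x = np.empty(xe.size + xo.size, dtype=np.int64)
    x[0::2], x[1::2] = xe, xo
    return x

x = np.random.randint(0, 4096, size=128)               # e.g. one line of a 12-bit CT image
s, d = lift_53_forward(x)
assert np.array_equal(lift_53_inverse(s, d), x)        # lossless by construction
```

One level of the separable 3D transform amounts to applying such a 1D step along x, then y, then z, in line with the composition rule of Eq. (1).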

Table 1.    Total Number N_k of Subband Images to Decode for Reconstructing Image k, for L = 3

k        0     1     2     3     4     5     6     7
5/3     34    46    49    46    42    46    42    46
9/7     58   139   111   161    82   147   111   139

3D Analysis/2D Reconstruction

In the 3D system, filtering is successively performed along the x, y and z directions. It is assumed that the 2D images are stacked along the z axis. Then, the positions of the wavelet coefficients corresponding to GP(l,j) in the subband at level l and orientation j in the 1D case map to the positions of the subband images along the z axis in the 3D case. More precisely, GP(l,j)_k identifies the z-coordinates of all the images in subband (l,j) that are necessary to recover the image with index k. In this case, the index j selects either low-pass (j = a) or high-pass (j = d) filtering along z. The total number N_k of subband images needed for the reconstruction of the image of index k is given by [38]

N_k = 4 [ |GP(L,a)_k| + |GP(L,d)_k| ] + \sum_{l=1}^{L-1} [ 3 |GP(l,a)_k| + 4 |GP(l,d)_k| ]     (2)

where |GP(l,j)_k| denotes the number of subband images in the corresponding set. The weights follow from the structure of the separable 3D decomposition: at each level, four of the subbands are high-pass filtered along z and three are low-pass filtered along z, while at the coarsest level L the approximation subband adds a fourth low-pass one. Table 1 shows the values of N_k for the 5/3 and the 9/7 filters. When using the 5/3 filter, the number of subband images that are needed is between one half and one third of that required with the 9/7 filter, depending on k. Accordingly, the 5/3 filter allows a faster reconstruction of the image of interest.

MULTIDIMENSIONAL LAYERED ZERO CODING

MLZC is based on the Layered Zero Coding (LZC) algorithm [30]. The main differences between LZC and the proposed MLZC algorithm concern the underlying subband structure and the definition of the conditioning terms. In the LZC approach, each subband is quantized and encoded in a sequence of N quantization layers, following the SAQ policy, which confers quality scalability. The LZC method is based on the observation that the most frequent symbol produced by the quantizers is the zero symbol, and it achieves high efficiency by splitting the encoding phase into two successive steps: zero coding, which encodes a symbol representing the significance of the considered coefficient with respect to the current quantizer, and magnitude refinement, which generates and encodes a symbol refining the value of each non-zero symbol. Zero coding exploits spatial and other dependencies among subband samples by providing such information to a context-adaptive arithmetic coder [65]. Different solutions are possible for the definition of the conditioning terms, accounting for both local and wide-scale neighborhoods; the reader is referred to [30] for more details.

MLZC Coding Principle

MLZC applies the same quantization and entropy coding policy as LZC to a 3D subband structure. In order to illustrate how the spatial and inter-band relationships are exploited, the concepts of generalized neighborhood and significance state of a given coefficient are used. The generalized neighborhood of a subband sample c(l,j,k), in subband j at level l and position k, is defined as the set GN(l,j,k) consisting of both the coefficients in a given spatial neighborhood N(l,j,k) and the parent coefficient c(l+1,j,k′) in the same subband at the next coarser scale, where k′ = ⌊k/2⌋:

GN(l, j, k) = N(l, j, k) ∪ { c(l+1, j, k′) }     (3)

The MLZC scheme uses the significance state of the samples belonging to the generalized neighborhood of the coefficient to be coded for conditioning the arithmetic coding [66]. For each quantization level Q, the significance state of each coefficient is determined by scanning the subbands starting from the lowest resolution. For the resulting symbol, two coding modes are possible: the significance mode and the refinement mode. The significance mode is used for samples that were non-significant during all the previous scans, whether or not they are significant with respect to the current threshold. For the other coefficients, the refinement mode is used. The significance mode encodes the significance map. The underlying model assumes that if a coefficient is lower than a certain threshold, it is reasonable to expect that both its spatial neighbors and its descendants are below a corresponding threshold as well. The significance state of the samples in the generalized neighborhood of c(l,j,k) is represented by a conditioning term χ(⋅). The local-scale conditioning terms χ′(⋅) concern spatial neighborhoods, while the inter-band terms account for inter-band dependencies. The 3D local-scale conditioning terms were obtained by extending to the third dimension the set of the M most effective 2D contexts.
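The coding principle just described can be made concrete with the following schematic sketch (illustrative Python, not the code of the paper; the causal 2D neighborhood, the toy context and the coder interface are assumptions). It performs a layered, bit-plane pass over one subband image, switching between the significance and the refinement modes and forming a conditioning term from the already-significant neighbors; the context value would select the probability model of a context-adaptive arithmetic coder such as [65]:

```python
import numpy as np

def spatial_context(sig, y, x):
    """Toy local-scale conditioning term: number of already-significant
    causal neighbours (the actual MLZC contexts are richer and may be 3D)."""
    h, w = sig.shape
    neigh = [(y - 1, x - 1), (y - 1, x), (y - 1, x + 1), (y, x - 1)]
    return sum(1 for ny, nx in neigh if 0 <= ny < h and 0 <= nx < w and sig[ny, nx])

def encode_subband_bitplanes(coeffs, encode_bit, n_planes):
    """Schematic layered (LZC-style) encoder for one 2D subband image.

    `encode_bit(bit, context)` stands in for a context-adaptive binary
    arithmetic coder; here it simply receives the (bit, context) pairs."""
    magnitude = np.abs(coeffs)
    sig = np.zeros(coeffs.shape, dtype=bool)           # significance map
    for p in range(n_planes - 1, -1, -1):              # most significant plane first
        threshold = 1 << p
        for y in range(coeffs.shape[0]):
            for x in range(coeffs.shape[1]):
                ctx = spatial_context(sig, y, x)       # conditioning term chi(.)
                if not sig[y, x]:                      # significance (zero-coding) mode
                    significant = magnitude[y, x] >= threshold
                    encode_bit(int(significant), ctx)
                    if significant:
                        sig[y, x] = True
                        encode_bit(int(coeffs[y, x] < 0), ctx)   # sign bit
                else:                                  # refinement mode
                    encode_bit((int(magnitude[y, x]) >> p) & 1, ctx)

# toy usage: collect the (bit, context) stream produced for a tiny subband
bits = []
encode_subband_bitplanes(np.array([[5, -3], [0, 9]]),
                         lambda b, c: bits.append((b, c)), n_planes=4)
```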


Bitstream Syntax

The ability to access any 2D image of the dataset constrains the bitstream structure. In all the modes (G-PROG, LPL-PROG and LPL), the subbands are scanned starting from the coarsest resolution. The signal approximation is encoded first, and all the subbands at level (l+1) are processed before any subband at the next finer level l. What differentiates the working modalities is the order in which the subband images are encoded and the placement of the markers during encoding. An illustration of the bitstream structure in the different modes is given in Fig. (3). In the figure, H is the bitstream header, and L_k^i is the i-th bitplane of the subband image at position k in a given 3D subband.

Fig. (3). Structure of the bitstream for the MLZC coding scheme in the different working modes.

Global Progressive (G-PROG) Mode

Each quantizer is applied to the whole set of subband images before passing to the next one. The scanning order follows the decomposition level: all subbands at level l are scanned before passing to level (l-1). In other words, during step i the quantizer Q_i is applied to each image of each subband. This enables scalability on the whole volume: decoding can be stopped at any point in the bitstream. In this mode, the compression ratio is maximized, but the 3D encoding/2D decoding functionality is disabled.

Layer-Per-Layer Progressive (LPL-PROG) Mode

This scheme is derived from the G-PROG mode by adding a marker to the bitstream after encoding every quantization layer of every subband image (see Fig. (3)). Since the quantizers are successively applied - as in the G-PROG mode - subband-by-subband and, within each subband, image-by-image, progressiveness by quality is allowed on both the whole volume and any 2D image, provided that 2D local-scale conditioning is used. The drawback of this solution is the overhead added to the encoded information.

Layer-Per-Layer Mode (LPL)

One way of reducing the overhead implied by the LPL-PROG mode is to apply the whole set of quantizers to each subband image before switching to the next one. The progressive quality functionalities are then sub-optimal on both the single images and the whole volume. This degrades the performance in the lossy mode with respect to the G-PROG mode. Quality scalability could be improved by an ad hoc procedure for rate allocation; this subject is left for future investigation. As previously mentioned, all these configurations have been tested in conjunction with both the 2D and 3D contexts. However, the desired 3D encoding/2D decoding capabilities constrain the choice to two-dimensional contexts without inter-band conditioning.

3D Object-Based Coding

The analysis is restricted to the case of two disjoint regions. For simplicity, the same terminology as in JPEG2000 is adopted: the object of interest is called ROI and the rest of the volume background. In this implementation, the ROI is identified by a color code in a three-dimensional mask, which is assumed to be available at both the encoder and the decoder. The problem of shape representation and coding is not addressed in this work.

Independent object coding has two major advantages. First, it is suitable for parallelization: different units can be devoted to the processing of the different objects simultaneously. Second, it is expected to improve coding efficiency when the objects correspond to statistically distinguishable sources. In what follows, the generalization for region-based processing of the EZW-3D, which has been taken as the benchmark for the object-based performance, and of the MLZC coding system is detailed.

Embedded Zerotree Wavelet Based Coding

The generalization of the classical EZW technique [31] for independent processing of 3D objects is performed by applying the 3D extension of the coding algorithm to the different objects independently.

The definition of the parent-children relationship is slightly modified with respect to the general case where the entire volume is encoded, in order to emphasize the semantics of the voxels as belonging to a particular region. Accordingly, the set of descendants of a wavelet coefficient is identified by restricting the corresponding oct-tree to the domain of its generalized object projection GP at all the finer levels. Based on this, a semantically constrained definition of a zerotree root is derived:

Definition 1. A subband sample is a zerotree root if all the coefficients belonging to the oct-tree afferent to it are non-significant with respect to the current threshold.

Fig. (4). Semantic oct-tree.

Fig. (4) illustrates the semantically constrained oct-tree. Given a zerotree candidate point, since the significance of the descendants lying outside the generalized projection is not relevant to the classification of the root as a zerotree, and only the descendants within GP are required to be non-significant, the number of successful candidates is expected to increase with respect to the general case. This potentially augments the coding efficiency. The inherent embedding resulting from the quantization strategy allows PSNR scalability for any object. Accordingly, each object can be reconstructed with increasing quality by progressively decoding the concerned portion of the bitstream.

Multidimensional Layered Zero Coding

Very few modifications are needed to adapt the MLZC system for object-based processing. As for the EZW, each coefficient is encoded if and only if it belongs to the generalized projection of the considered object. The definition of the conditioning terms is generalized for this case by assuming that the significance state of any sample outside the generalized projection is zero.

3D/2D MLZC

This Section illustrates how the MLZC algorithm can be adapted for 3D encoding/2D decoding functionalities. In order to be able to access any 2D image with scalable quality, it is necessary to independently encode each bitplane of each subband image (LPL-PROG mode). This implies the choice of an intra-band, two-dimensional context for spatial conditioning, to avoid inter-layer dependencies among the coefficients. This is a necessary condition for the independent decoding of the subband images. Quality scalability on a given 2D image k is obtained by successively decoding the bitplanes of the subband images that are necessary for its reconstruction. Figure (3) illustrates the corresponding bitstream structure. In the LPL-PROG mode, bitplane i is encoded for each subband image before switching to the next bitplane (i+1). Markers separate the information concerning the different subband images and the successive bitplanes. Given the index of the image of interest, the concerned portions of the bitstream are automatically identified, accessed and decoded. The required number of subband images depends on the wavelet filter (see Table 1). Accordingly, the 5/3 filter provides a significant saving in decoding time compared to the 9/7. Moreover, while the two filters perform similarly in terms of lossless rate, the 5/3 minimizes the power of the rounding noise implied by the integer lifting. All this makes such a filter particularly suitable for this application.

A better trade-off between overhead and scalability could be reached by removing the markers between the bitplanes and only keeping the random access to the subband images (LPL mode). In this case, the coding order would be subband-image-by-subband-image, each being represented with its entire set of bitplanes. However, this would correspond to a sub-optimal embedding and would lead to a degradation of the performance in the lossy regime for a given number of decoded bits.

3D/2D Object-Based MLZC

The object-based 3D/2D MLZC system permits random access to any object of any 2D image of the dataset with scalable, up-to-lossless quality. As was the case for the 3D ROI-based system, each object is assigned a portion of the bitstream, which can be independently accessed and decoded. To preserve quality scalability, the LPL-PROG mode must be chosen for both the ROI and the background, in order to obtain the appropriate granularity for the encoded information. Quality scalability on a 2D object of image k is obtained by successively decoding the bitplanes of its generalized projection in the subband images necessary for the reconstruction of the image it belongs to. Such a functionality is obtained by associating to each object one segment of the bitstream having the structure shown in Fig. (3).
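A rough illustration of how such per-object, per-image segments could be addressed at the decoder is given below (a toy directory sketched in Python, not the actual MLZC syntax; all names are hypothetical). Each marker-delimited coding unit is registered with its byte range, so that decoding one object of one 2D image reduces to selecting the units of the required subband images, most significant bitplanes first:

```python
class BitstreamIndex:
    """Toy directory of marker-delimited coding units: one entry per
    (object, subband level, orientation, z position, bitplane)."""

    def __init__(self):
        self.units = []   # tuples (obj, level, orient, z, bitplane, offset, length)

    def add_unit(self, obj, level, orient, z, bitplane, offset, length):
        self.units.append((obj, level, orient, z, bitplane, offset, length))

    def select(self, obj, needed_images, min_bitplane=0):
        """Byte ranges needed to decode one object of one 2D image.

        `needed_images` is the set of (level, orient, z) subband images given
        by the generalized projection along z; raising `min_bitplane` lowers
        the decoded quality (and the number of bytes to fetch)."""
        picked = [u for u in self.units
                  if u[0] == obj and (u[1], u[2], u[3]) in needed_images
                  and u[4] >= min_bitplane]
        picked.sort(key=lambda u: -u[4])   # most significant bitplanes first
        return [(u[5], u[6]) for u in picked]
```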


The global bitstream thus consists of the concatenation of segments of this type, one for every object. Even though the analysis is restricted here to the case where only two objects (the ROI and the background) are present, this is not a constraint, and the proposed system is able to handle any number of objects. Given the index of the image of interest, the respective portions of the bitstream are automatically identified, accessed and decoded with the help of the mask. Besides providing competitive compression performance and allowing a fast access to the ROI at a progressively refinable quality, such a fine granularity also permits an easy reorganization of the encoded information to obtain different functionalities. For instance, by changing the order of the coding units, progressiveness by resolution could be obtained. In this case, the bitplanes of the deepest subbands must be decoded first, i.e. the corresponding coding units must be at the beginning of the bitstream.

RESULTS AND DISCUSSION

This Section provides an overview of the results obtained in the different system configurations and working modes. The datasets that were used are presented in Sec. VI.A. The performance of the 3D/2D MLZC system is compared to that of the fully three-dimensional MLZC system in Sec. VI.B. Section VI.C focuses on the performance of the ROI-based system. It is assumed that the data consist of one ROI surrounded by the background. The ROI is segmented by a mask and the objects are encoded independently. In this configuration, the entire set of conditioning terms is allowed, including both 2D and 3D contexts. The benchmark for performance is ROI-based JPEG2000, implemented as described in [53]. The results are also compared with those provided by the extension of the EZW-3D for object processing. Finally, Sec. VI.D is devoted to the 3D/2D ROI-MLZC system, integrating both ROI-based functionalities and 2D decoding capabilities. In this case, only 2D spatial conditioning is possible: inter-band as well as 3D spatial conditioning would indeed introduce the same type of dependencies among subband coefficients, impeding the independent decoding of the subband images.

Datasets

The performance of the MLZC 3D encoding/2D decoding system was evaluated on the four datasets illustrated in Fig. (5):

• Dynamic Spatial Reconstructor (DSR). The complete DSR set consists of a 4D (3D+time) sequence of 16 3D cardiac CT volumes. The imaging device is a unique ultrafast multi-slice scanning system built and managed by the Mayo Foundation. Each acquisition corresponds to one phase of the cardiac cycle of a canine heart and is composed of 107 images of size 128x128. A voxel represents approximately 0.9 mm3 of tissue.

• MRI Head Scan. This volume consists of 128 images of size 256x256 pixels representing the sagittal view of a human head.

• MR-MRI Head Scan. This volume was obtained at the Mallinckrodt Institute of Radiology (Washington University) [47]. It consists of 58 images of size 256x256 pixels of a sagittal view of the head. Since this dataset has also been used as a test set by other authors [46, 47, 67], it allows a comparison of the compression performance of MLZC with that of the other 3D systems.

• Ophthalmological angiographic sequence (ANGIO). The ANGIO set is a 3D sequence (2D+time) of angiography images of a human retina, consisting of 52 images of 256x256 pixels.

The different features of the considered datasets make the resulting test set heterogeneous enough to be suitable for characterizing the system. The DSR volume is very smooth and exhibits high correlation among adjacent voxels along all three spatial dimensions. This makes it very easy to code and particularly suitable for the proposed coding system. It represents the "best case" test set, for which the coding gain of 3D over 2D systems is expected to be the highest. Conversely, the ANGIO dataset can be considered the "worst case" for a wavelet-based coding system. The images are highly contrasted: very sharp edges are juxtaposed to a smooth background. Wavelet-based coding techniques are not suitable for this kind of data: the edges spread out over the whole subband structure, generating a distribution of non-zero coefficients whose spatial arrangement cannot be profitably exploited for coding. This is due to the fact that wavelets are not suitable descriptors of images with sharp edges. The MR-MRI set has been included for the sake of comparison with the results provided by other authors [47]. Nevertheless, it is not considered representative of a real situation, because it went through some pre-processing: in particular, it has been interpolated, scaled to isotropic 8-bit resolution and thresholded. Finally, the characteristics of the MRI set lie in between. Notably, the structure and semantics of the MRI images make the volume suitable for an object-based approach to coding.

3D/2D MLZC

The performance of the 3D/2D MLZC system was compared to that of different 2D and 3D coding algorithms. The EZW-3D was taken as the benchmark for the 3D case. The MLZC system was characterized by the lossless rates resulting from the complete set of contexts, in each working mode. As expected, the best performance in terms of lossless rate was obtained in the G-PROG mode. However, because of the inter-band conditioning, the G-PROG mode does not allow 2D decoding. In the LPL and LPL-PROG modes, such functionality is enabled at the expense of coding efficiency, which decreases because of the additional information to be encoded to enable random access. One of the constraints posed by 2D decoding is that no inter-band conditioning can be used. Even though the exploitation of the information about the significance of the parent within the subband hierarchy can be fruitful in some cases, results show that the compression performance is not much affected. In the G-PROG mode, inter-band conditioning slightly improves the lossless rate when combined with 2D spatial conditioning, while it leaves it basically unchanged for 3D contexts. Conversely, in the LPL-PROG mode, performance is slightly degraded.


Fig. (5). Samples of the 3D dataset. First line: DSR images. The brightest region in the middle represents the left ventricle of a canine heart; Second and third lines: MRI of the human head, sagittal view; Bottom line: angiography of a human retina.

This is related to the sparseness of the samples in the conditioning space. Due to the smoothness along the z axis, the exploitation of the significance state of the neighboring voxels is fruitful for entropy coding, up to the limit where the dimension of the conditioning space becomes so large that the available samples are no longer sufficient to be representative of the statistics of the symbols. The point where such a critical condition is reached depends on the characteristics of the dataset and, in particular, on its size. In general, larger volumes take advantage of wider spatial supports and of inter-band conditioning. The observed dependency of the lossless rate on the design parameters of the conditioning terms (i.e. the spatial support and the use of inter-band conditioning) also applies to the two-dimensional version of the MLZC algorithm. Again, the efficiency of the entropy coding increases with the size of the spatial support, up to the limit where the sparseness of the conditioning space does not allow an adequate representation of the statistics of the symbols to be encoded.

The benchmark for 2D systems is the new coding standard for still images, JPEG2000. For the old JPEG standard (JPEG-LS), all of the seven available prediction modes were tested and the one providing the best performance (corresponding to k=7 for all the datasets) was retained. Table 2 summarizes the performance of the different algorithms and working modes. As expected, the coding gain provided by the 3D over the 2D systems depends on the amount of correlation and smoothness along the z axis. Accordingly, it is quite pronounced for DSR (16.3%) and MR-MRI (33.06%), for which the LPL mode leads to a considerable rate saving over JPEG2000, while it is lower for both MRI (2.2%) and ANGIO (5.2%). The best compression performance for ANGIO is obtained with JPEG-LS. As mentioned above, such a dataset is not suitable for wavelet-based coding, so that other algorithms can easily be more effective.


Table 2.    Lossless Performance (bpp) with the 5/3 Filter. No Inter-Band Conditioning is Used

            G-PROG        LPL-PROG      LPL           EZW-3D   MLZC-2D   JJ2K    JPEG-LS
DSR         2.99 / 2.93   3.11 / 3.08   3.03 / 3.06   2.88     3.56      3.62    3.90
MRI         4.58 / 4.52   4.63 / 4.60   4.55 / 4.52   4.46     4.62      4.65    5.10
MR-MRI      2.24 / 2.19   2.28 / 2.23   2.24 / 2.22   2.271    2.92      2.95    3.437
ANGIO       4.19          4.23          4.20          4.18     4.41      4.43    3.87

Nevertheless, the LPL method provides an improvement of about 5% over JPEG2000. The 3D encoding/2D decoding approach can thus be considered a good trade-off between compression efficiency and fast access to the data, besides providing new and/or improved functionalities with respect to other state-of-the-art coding algorithms.

The other parameter to be considered in the evaluation of the 3D/2D MLZC system is the decoding delay, which entails the analysis of the complexity. Here the problem of computational efficiency is not addressed: no optimization was performed. Consequently, the decoding time is sub-optimal and, as such, it is neither meaningful nor representative of what it would be after optimization. As a general comment, even though a more detailed analysis of the complexity is required for the evaluation of the global performance of the system, there is clearly a trade-off between the improvement in compression efficiency and the increase in complexity when switching from 2D to 3D systems. Nevertheless, this does not compromise their usefulness. What matters is the absolute decoding time, namely the time the user has to wait to access the decoded image, rather than the relative increase with respect to the 2D counterpart. It is expected that the system can reach a decoding time of less than one second per image after appropriate restructuring. Last but not least, large PACS can easily incorporate high processing power (e.g. a multiprocessor architecture) at a price that is negligible with respect to the whole cost of a PACS system. Therefore, the complexity of the method is not considered a major issue for its exploitability. In our opinion, the proposed method has a high potential, especially in combination with ROI-based functionalities.

3D Object-Based MLZC

In medical images, the background often encloses the majority of the voxels. For a typical MRI dataset, for instance, about 90% of the voxels belong to the background. A substantial rate saving can thus be achieved via ROI-based coding. In the framework of ROI-based coding, the weight assigned to a voxel depends on its semantics. This is assumed as the ground for the judicious allocation of the available resources (e.g. bit-budget, bandwidth).


The efficiency improvement is thus to be understood in the sense of prioritization of the information to be transmitted. Coding efficiency results from the trade-off between the improvement due to the separation of sources with different statistics and the degradation due to the overhead implied by the border voxels.

Fig. (6). Sagittal view of the MR of a brain: Top left: original image; Top right: object of interest; Bottom left: background; Bottom right: mask.

The performance of the EZW-3D and MLZC was analyzed by comparison with the 2D counterparts, namely EZW-2D and MLZC-2D, as well as with JPEG and JPEG2000. For the head MRI dataset, performance tends to improve when extending the generalized neighborhood used for conditional arithmetic coding, in both the 2D and 3D cases. The same set of experiments was run on the MR-MRI dataset. In general, the trend is the same as for the MRI one.


Table 3 compares the average lossless rates of each of the considered 2D algorithms to those provided by MLZC and EZW-3D, for both datasets. Among the 2D algorithms, MLZC-2D outperforms the others. JPEG2000 results in a lossless rate slightly lower than EZW-2D for MRI. All 2D schemes provide a considerable improvement over JPEG-LS. For MRI, the lowest lossless rate corresponds to the EZW-3D scheme, which in this case slightly outperforms MLZC. Nevertheless, the MLZC method is faster and less computationally demanding than EZW-3D.

Object-Based Performance

In the proposed system, the object of interest and the background are encoded independently. Each of them generates a self-contained segment of the bitstream. This implies that the information concerning the object's border is encoded twice, as side information for both the object and the background. In this way, each of them can be accessed and reconstructed as if the whole set of wavelet coefficients were available, avoiding artifacts along the contours for any quantization of the decoded coefficients.

ROI-based EZW-2D was assumed as the benchmark for object-based functionalities. Despite the availability of ROI-based functionalities, JPEG2000 in the implementation [53] was not able to meet all the requirements. In JPEG2000, ROI-based coding is implemented by the MAXSHIFT method in combination with a predefined rate allocation policy. For the sake of performance comparison, the ROI-based coding was performed both following the JPEG2000 MAXSHIFT method, labeled as the JPEG2000-ROI mode, and by encoding the ROI and the background independently, labeled as the JPEG2000-IO mode. The corresponding lossless rates were compared to those obtained by our EZW-2D object-based system. This set-up emphasizes the implicit encoding of the ROI mask by JPEG2000: even though the mask does not need to be separately coded, its encoding is implied by the exhaustive scanning of the subbands.

Results are given in Fig. (7). The global lossless rate in the different conditions is shown as a function of the image index. In particular, the dash-dot line represents JPEG2000-ROI, the dashed line corresponds to JPEG2000-IO and the continuous line is for EZW-2D with independent object coding (IO). The curve represents the sum of the lossless rates concerning the ROI and the background. Due to the rate allocation policy, JPEG2000 outperforms EZW-2D in compression efficiency. The drawback is that, as previously mentioned, the codeblocks of the ROI and the background are interlaced in such a way that the ROI-based functionalities are not always achieved. The dashed line represents the total rate needed for independently encoding the ROI and the background with JPEG2000. The gap between the corresponding curve and the one for EZW-2D IO emphasizes the performance degradation due to the implicit coding of the mask. Figure (7) points out that the EZW-2D coding scheme represents a good compromise in the trade-off between coding efficiency and random access to the objects. This example outlines the importance of the ROI-based approach: for this dataset, only 19% - on average - of the bitstream corresponding to the entire volume is needed to represent the ROI. The random access to the objects allows fast access to the important information, with a considerable improvement in compression efficiency.

Table 4 quantifies the degradation in compression efficiency due to independent object coding. The second and third columns (OBJ and BGND) show the lossless rates for the ROI and the background; the fourth column (WHOLE) is the bitrate obtained when encoding the entire volume, and the last one shows the percentage increase of the lossless rate for independent encoding of the objects (OBJ+BGND) with respect to that corresponding to the entire volume (WHOLE). The difference between the compression results for the WHOLE and OBJ+BGND cases is due to two causes. First, the entropy coder performs differently in the two cases because of the different sources. Second, the total number of coefficients to be encoded is larger for OBJ+BGND because of the generalized projections of both the object and the background. The size of the bitstream increases by about 7% for L=4 in the case of separate object handling. According to Table 4, the gain in compression efficiency due to the exploitation of the full correlation among the data is about 4-5%. The improvement in compression efficiency provided by MLZC over JPEG2000 depends on the working mode: taking the OBJ+BGND rate as reference, the corresponding rate reduction is about 2.2% and 6.3% for JPEG2000-ROI and JPEG2000-IO, respectively.
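For concreteness (this restatement is not in the original text, but it follows directly from the table), the last column of Table 4 is simply the relative overhead of independent object coding:

\Delta\% = 100 \cdot \frac{(\mathrm{OBJ}+\mathrm{BGND}) - \mathrm{WHOLE}}{\mathrm{WHOLE}}, \qquad \text{e.g. } 100 \cdot \frac{4.8057 - 4.4598}{4.4598} \approx +7.8\% \text{ for EZW-3D,}

consistent, up to rounding, with the +7.75 reported in the first row of Table 4.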

The prioritization of the information inherent to the different objects leads to a significant improvement in coding efficiency when the lossless constraint is relaxed in the background region. In this case, the BGND can be encoded/decoded at a lower quality and combined with the object of interest, which is encoded/decoded without loss, in the final composed image. Figure (8) gives an example. Both the object and the background were compressed by the MLZC scheme, with context (271) and inter-band conditioning. The OBJ was decoded at full quality (i.e. losslessly), while the BGND corresponds to a rate of 0.1 bit/voxel for the image on the left and 0.5 bit/voxel for that on the right. The corresponding PSNR values are 27.76 and 33.53 dB, respectively. Reconstructed images respecting the lossless constraint in the ROI and preserving a good visual appearance in the background can thus be obtained by decoding only 20% of the information that would be required for a lossless representation of the whole volume.
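The composition step of the pseudo-lossless mode is straightforward once the two parts are decoded; a minimal sketch (not from the paper, with hypothetical inputs) is:

```python
import numpy as np

def compose_pseudo_lossless(roi_img, bgnd_img, mask):
    """Combine the losslessly decoded ROI with the lossy background,
    using the segmentation mask (True inside the ROI)."""
    return np.where(mask, roi_img, bgnd_img)

# hypothetical usage: obj_img decoded without loss, bgnd_img decoded at 0.1 bpv
# final = compose_pseudo_lossless(obj_img, bgnd_img, mask)
```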

Table 3.    Lossless Rates (bpp) for the MRI and MR-MRI Datasets

          EZW-2D   MLZC-2D   JPEG2K   JPEG-LS   EZW-3D   MLZC
MRI       4.698    4.597     4.651    5.101     4.456    4.457
MR-MRI    2.878    2.848     2.954    3.437     2.271    2.143


Fig. (7). Lossless rates as a function of the position of the 2D images along the axis. Continuous line: EZW-2D; dashed line: JPEG2000 IO (Independent Object); dash-dot line: JPEG2000 ROI.

Table 4.    Lossless Rates (LR) for Head MRI. The Filter is 5/3, L = 4. Global Conditioning has been Used in the MLZC Mode

LR [bpp]       OBJ      BGND     WHOLE    OBJ+BGND   ∆%
EZW-3D         0.9045   3.9012   4.4598   4.8057     +7.75
MLZC (271)     0.9188   3.8868   4.4566   4.8056     +7.83
EZW-2D         0.9327   4.0835   4.6977   5.0162     +6.78
JPEG2K-IO      1.0641   4.0656   4.6511   5.1297     +10.29
JPEG2K-ROI     -        -        4.6511   4.9099     +5.56

3D/2D Object-Based MLZC

The performance of the 3D/2D object-based MLZC coding system was analyzed in the LPL-PROG mode with 2D spatial conditioning. JPEG2000 in the ROI-based mode was chosen for comparison, and both the lossless and the lossy regimes were considered. The test set consists of the same two MRI head scans used to characterize the object-based functionalities discussed in the previous Section. Figure (9) shows the images for the decoding rates of 0.1, 0.5 and 1.6 bit/pixel; the last rate is the lossless rate of the ROI as provided by the JPEG2000 encoder. In order to allow the comparison with our system, the ROI and the background were also independently compressed with JPEG2000, and the respective bitrates were compared to those provided by ROI-based JPEG2000 (see Fig. (7)).

The average lossless rates are compared to those provided by MLZC in the LPL-PROG mode in Table 5. In particular, the OBJ column gives the lossless rate for the object of interest when it is encoded independently by the different algorithms, while the WHOLE and OBJ+BGND columns provide the lossless rates obtained for the entire volume and for the independent coding of the object and the background, respectively. The minimum lossless rate for OBJ is obtained by MLZC for both datasets. For MRI, JPEG2000 ROI outperforms the proposed system for OBJ+BGND, reducing the lossless rate by 2.7%. However, in this working mode, the ROI-based functionalities are not completely fulfilled. Conversely, MLZC OBJ+BGND outperforms JPEG2000 IO by about 1.6%, while preserving random access to every object. The comparison of these data with the lossless rates given in Table 4 reveals a slight performance degradation. This is due to the different choice of the conditioning term, which makes the EZW-3D outperform MLZC in terms of compression.


Fig. (8). Pseudo-lossless regime for a sample MRI image. The OBJ has been recovered without loss, while the BGND has been decoded at 0.1 bpv (left) and 0.5 bpv (right). The corresponding PSNR values are 27.76 and 33.53 dB, respectively.

Table 5.    Lossless Rates (LR) for MRI and MR-MRI. The Filter is 5/3, L = 4. Pure Spatial Conditioning has been Used for the MLZC LPL-PROG Mode

Volume     LR [bpp]       OBJ      WHOLE    OBJ+BGND
MRI        MLZC (030)     0.959    4.666    5.048
           JPEG2K IO      1.064    4.651    5.130
           JPEG2K ROI     1.259    4.651    4.910
MR-MRI     MLZC (030)     0.788    2.310    2.933
           JPEG2K IO      1.062    2.950    3.300
           JPEG2K ROI     1.289    2.950    3.267

However, the interest of the 3D/2D coding system lies in the functionalities which are not allowed by the EZW-3D. For the MR-MRI dataset, MLZC OBJ+BGND provides the best performance: the corresponding rate saving is about 11% and 12.5% over JPEG2000 ROI and IO, respectively. In summary, ROI-based processing leads to a significant improvement in coding efficiency when it is possible to relax the lossless constraint in the background region. In this case, the background can be encoded/decoded at a lower quality, leading to the pseudo-lossless regime discussed in the previous Sections.

CONCLUSION

Fig. (9). ROI-based JPEG2000. Images decoded at different rates. Top-left: 0.1 bpp; Top-right: 0.5 bpp; Bottom-left: 1.0 bpp; Bottomright: Reference rate for lossless ROI.

Coding systems for medical imagery must focus on multi-dimensional data and cope with specific requirements. This constrains the design while facilitating the task through the availability of a priori knowledge about the image content and its semantics. Besides good rate-distortion performance in any working mode, medical image coding systems must feature lossless capabilities as well as scalability (by quality or by resolution). Wavelets have proved to be well suited to the purpose, leading to architectures responding to the majority of the requirements.


The semantics of the different objects present in the images pushes towards the definition of object-based techniques. Different solutions have been proposed by different researchers so far, each responding to some domain-specific requirements. On the same line, building on the fact that it is still common practice for medical doctors to analyze the volumes image-by-image, a coding system was proposed providing a fast access to any 2D image without sacrificing compression performance. The integration of ROI-based functionalities leads to a versatile and highly efficient engine allowing the swift recovery of any object of any 2D image of the dataset at a finely graded, up-to-lossless quality. Furthermore, independent object processing and the use of the integer version of the wavelet transform make the algorithm particularly suitable for the implementation on a dedicated device. It is believed that the set of functionalities of the 3D/2D object-based MLZC system makes it well suited for integration in PACS and largely compensates for the possible loss in compression efficiency. We would like to conclude by mentioning one of the most promising research directions. ROI-based capabilities are the first step along the path of model-based coding. The basic idea is to improve the compression performance by combining real and synthetic data: real data would be classically encoded, while synthetic data would be generated according to an ad hoc recipe. It is believed that such a philosophy opens the way to a novel approach to image representation, leading to the next generation of intelligent coding systems.

REFERENCES
[1] Information technology - JPEG2000 image coding system: ISO/IEC International Standard 15444-1, ITU Recommendation T.800. ISO/IEC JTC 1/SC 29/WG1, 2000.
[2] Working Document N3156, ISO/IEC JTC 1/SC 29/WG11, 1999.
[3] Working Document N2861, ISO/IEC JTC 1/SC 29/WG11, 1999.
[4] Working Document N5231, ISO/IEC JTC 1/SC 29/WG11, 2002.
[5] Shelkens AMP: An overview of volumetric coding techniques: ISO/IEC JTC 1/SC29 WG1. Brussel, Vrije Universiteit, 2002.
[6] ACR-NEMA: DICOM Digital Imaging and Communications in Medicine, 2003.
[7] Wong S, Zaremba L, Gooden D, Huang HK: Radiologic image compression: a review. Proceedings of the IEEE 1993; 83.
[8] Ishimitsu Y, Tateno H, Doi K, MacMahon H, Akune J, Yonekawa H: Development of a data compression module for digital radiography. Med Biolog Eng Comp 1991.
[9] Roos P, Viergever MA: Reversible interframe compression of medical images: a comparison of decorrelation methods. IEEE Trans on Medical Imaging 1991; 10: 538-547.
[10] Engelhorn MW, Liu J: Improvement in the locally optimum runlength compression of CT images. J of Biomed Eng 1990; 12.
[11] Rangayyan RM, Kuduvalli GR: Performance analysis of reversible image compression techniques for high-resolution digital teleradiology. Proceedings of the IEEE 1993; 11.
[12] Goldberg L, Wang M: Comparative performance of pyramid data structures for progressive image transmission. 1991; 39: 540-548.
[13] Wu CS, Huang HK, Ho B, Chao J: Full frame cosine transform image compression for medical and industrial applications. Machine Vision Applications 1991; 3.
[14] Tai SC, Wu YJ: Medical image compression by discrete cosine transform spectral similarity strategy. IEEE Trans on Information Technology in Biomedicine 2001; 5: 236-243.
[15] Rao KR, Hwang C, Wenaktraman S: Human visual system weighted progressive image transmission using lapped orthogonal transform/classified vector quantization. Opt Engineering 1993; 32.
[16] Cosman PC, Tseng C, Gray RM, Olshen RA, Moses LE, Davidson HC, Bergin CJ, Riskin EA: Tree-structured vector quantization of CT chest scans: image quality and diagnostic accuracy. IEEE Trans on Medical Imaging 1993; 12.
[17] Kocher M, Kunt M, Ikonomopoulos A: Second-generation image coding techniques. Proceedings of the IEEE 1985; 73.
[18] Dikje MCAV, Peters H, Roos P, Viergever MA: Reversible intraframe compression of medical images. IEEE Trans on Medical Imaging 1988; 7.
[19] Viergever MA, Roos P: Reversible 3-D decorrelation of medical images. IEEE Trans on Medical Imaging 1993; 12.
[20] Jones PW, Rabbani M: Image compression techniques for medical diagnostic imaging systems. J of Digital Imaging 1991; 2.
[21] Das M, Burgett R: Lossless compression of medical images using 2D multiplicative autoregressive models. IEEE Trans on Medical Imaging 1993; 12.
[22] Memon N, Wu X: CALIC - A context based adaptive lossless image codec: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 1996, vol 4, pp 7-10.
[23] Sayood K, Wu D, Memon ND: A context-based, adaptive, lossless/nearly-lossless coding scheme for continuous tone images, ISO working document n. 256: ISO/IEC/SC29/WG1. 1995.
[24] Weinberger MJ, Seroussi G, Sapiro G: LOCO-I: a low complexity, context-based, lossless image compression algorithm; in: Int Conference on Data Compression. 1996.
[25] Clunie DA: Lossless compression of greyscale medical images - effectiveness of traditional and state of the art approaches. Proc of SPIE 2000.
[26] Lossless and near-lossless coding of continuous tone still images - baseline, ISO/IEC 14495-1, 2000.
[27] Kivijarvi J: A comparison of lossless compression methods for medical images. Computerized Medical Imaging and Graphics 1998; 22.
[28] Lemahieu I, Deneker K, VanOverloop J: An experimental comparison of several lossless image coders for medical images: IEEE Data Compression Conference. 1997.
[29] Atkins MS, Chiu E, Vaisey J: Wavelet-based space-frequency compression of ultrasound images. IEEE Trans on Medical Imaging 2001; 5.
[30] Taubman D, Zakhor A: Multirate 3-D subband coding of video. IEEE Trans on Image Processing 1994; 3.
[31] Shapiro JM: Embedded image coding using zerotrees of wavelet coefficients. IEEE Transactions on Signal Processing 1993; 41.
[32] Said A, Pearlman WA: A new, fast, and efficient image codec based on set partitioning in hierarchical trees. IEEE Trans on Circuits and Systems for Video Technology 1996; 6.
[33] Vesin J-M, Kunt M, Gruter R, Egger O: Rank-order polynomial subband decomposition for medical image compression. IEEE Trans on Medical Imaging 2000; 2.
[34] Egger O, Kunt M: Embedded zerotree based lossless image coding: IEEE International Conference on Image Processing (ICIP). Washington DC, 1995.
[35] Marcellin M: Wavelet/TCQ questionnaire, ISO/IEC JTC 1/SC29/WG1 X/275, 1997.
[36] Munteanu A, Cornelis J, Cristea P: Wavelet-based lossless compression of coronary angiographic images. IEEE Trans on Medical Imaging 1999; 18.
[37] Calderbank AR, Daubechies I, Sweldens W, Yeo B-L: Wavelet transforms that map integers to integers. Appl Comput Harmon Analysis 1998.
[38] Menegaz G, Vaerman V, Thiran J-P: Object-based coding of volumetric medical data; in: Proc of the International Conference on Image Processing (ICIP). 1999, vol 3, pp 920-924.
[39] Menegaz G, Thiran J-P: Lossy to lossless object-based coding of 3D MRI data. IEEE Trans on Image Processing 2002; 11: 1053-1061.
[40] Menegaz G, Grewe L: 3D/2D object-based coding of head MRI data; in: International Conference on Image Processing (ICIP). Rochester (NY), 2002, vol 1, pp 181-184.
[41] Pearlman WA, Islam A: An embedded and efficient low-complexity hierarchical image coder. Proc of SPIE 1999; 3653: 294-305.
[42] Tahoces PG, Souto M, Vidal JJ, Penedo M, Pearlman W: Region-based wavelet coding methods for digital mammography. IEEE Trans on Medical Imaging 2003; 22.
[43] Barnard HJ: Image and video coding using a wavelet decomposition. PhD thesis, Delft University, Delft, The Netherlands, 1994.
[44] Pearlman WA, Lu Z: Wavelet video coding of video objects by object-based SPECK algorithm: Picture Coding Symposium. 2001.
[45] Kim Y, Pearlman WA: Volumetric medical image compression; in: Applications of Digital Image Processing XXII. Proc of SPIE 1999, vol 3808.
[46] Xiong Z, Wu X, Yun DY, Pearlman WA: Progressive coding of medical volumetric data using three-dimensional integer wavelet packet transform; in: Second IEEE Workshop on Multimedia Signal Processing. Piscataway, NJ, USA, 1998.
[47] Bilgin A, Zweig G, Marcellin MV: Three-dimensional image compression with integer wavelet transform. Applied Optics 2000; 39.
[48] Chan KK, Lau CC, Chuang KS, Morioca CA: Visualization and volumetric compression. Proc of SPIE 1991; 1444.
[49] Tai SC, Wu YG, Lin CW: An adaptive 3D discrete cosine transform coder for medical image compression. IEEE Transactions on Information Technology in Biomedicine 2000; 4.
[50] Wu YG: Medical image compression by sampling DCT coefficients. IEEE Trans on Information Technology in Biomedicine 2002; 6.
[51] Shelkens P, Munteanu A, Barbarien J, Galca M, Giro-Nieto X, Cornelis J: Wavelet coding of volumetric medical datasets. IEEE Trans on Medical Imaging 2003; 22.
[52] Van der Auwera G, Cristea P, Munteanu A, Cornelis J: Wavelet-based lossless compression scheme with progressive transmission capability. Int J of Imaging Science and Technology 1999; 10.
[53] Taubman D: High performance scalable image compression with EBCOT. IEEE Trans on Image Processing 2000; 9.
[54] Shelkens P, Barbarien J, Cornelis J: Compression of volumetric medical data based on cube splitting. Proc of SPIE 2000; 4115.
[55] Shelkens P, Giro X, Barbarien J, Cornelis J: 3-D compression of medical data based on cube splitting and embedded block coding: ProRISC/IEEE Workshop. 2000.
[56] Wheeler FW: Trellis source coding and memory constrained image coding. PhD thesis, Dept Elect Comp Syst Eng, Polytech. Inst., Troy, NY, 2000.
[57] Yi R, Chui CK: System and method for nested split coding of sparse datasets. California, 1998.
[58] Calderbank AR, Daubechies I, Sweldens W, Yeo BL: Lossless image compression using integer to integer wavelet transforms; in: International Conference on Image Processing (ICIP). 1997.
[59] Pearlman WA, Ueno I: Region of interest coding in volumetric images with shape-adaptive wavelet transform; in: Image and Video Communication and Processing. Proc of SPIE 2003, vol 5022.
[60] Menegaz G, Thiran J-P: 3D encoding/2D decoding of medical data. IEEE Trans on Medical Imaging 2003; 22: 424-440.
[61] Duncan JS, Ayache N: Medical image analysis: progress over two decades and the challenges ahead. IEEE Trans on Pattern Analysis and Machine Intelligence 2000.
[62] Daubechies I, Sweldens W: Factoring wavelet transforms into lifting steps. J Fourier Anal Appl 1998; 247-269.
[63] Reichel J, Menegaz G, Nadenau M, Kunt M: Wavelet transform for embedded lossy to lossless image compression. IEEE Trans on Image Processing 2001; 10: 383-392.
[64] Menegaz G: Model-based coding of multi-dimensional data with applications to medical imaging. PhD thesis, Signal Processing Institute (ITS), Swiss Federal Institute of Technology (EPFL), Lausanne, 2000.
[65] Pennebacker W, Mitchell J, Langdon G, Arps R: An overview of the basic principles of the Q-coder adaptive binary arithmetic coder. IBM Journal of Research and Development 1988; 32: 717-726.
[66] Triantafyllidis GA, Strinzis MG: A context based adaptive arithmetic coding technique for lossless image compression. IEEE Signal Processing Letters 1999; 6: 168-170.
[67] Kim Y, Pearlman WA: Stripe-based SPIHT lossy compression of volumetric medical images for low-memory usage and uniform reconstruction quality; in: Proc of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP). 2000, vol 4, pp 2031-2034.