Noise reduction in high dynamic range images

Tae-Hong Min · Rae-Hong Park · SoonKeun Chang

Abstract Multiple images with different exposures are used to produce a high dynamic range (HDR) image. Sometimes a high sensitivity setting is needed for capturing images in low light conditions, as in an indoor room. However, current digital cameras do not produce a high-quality HDR image when noise occurs in low light conditions or at high sensitivity settings. In this paper, we propose a noise reduction method for generating HDR images using a set of low dynamic range (LDR) images with different exposures, where ghost artifacts are effectively removed by image registration and local motion information. At high sensitivity settings, motion information is used in generating an HDR image. We analyze the characteristics of the proposed method and compare the performance of the proposed and existing HDR image generation methods, in which Reinhard et al.'s global tone mapping method is used for displaying the final HDR images. Experiments with several sets of test LDR images with different exposures show that the proposed method gives better performance than existing methods in terms of visual quality and computation time.

Keywords Ghost removal · High dynamic range image · Image registration · Motion detection · Noise reduction · Radiance map generation

1 Introduction

People can perceive a real-world scene with a high dynamic range (HDR) of illumination. However, conventional image sensors and display devices have a limited dynamic range and cannot capture and display the full range of illumination [1, 2].

───────────────────────────
T.-H. Min
Department of Electronic Engineering, Sogang University, Sinsu-dong 1, Mapo-gu, Seoul 121-742, Korea
e-mail: [email protected]

R.-H. Park
Department of Electronic Engineering and Interdisciplinary Program of Integrated Biotechnology, Sogang University, Sinsu-dong 1, Mapo-gu, Seoul 121-742, Korea
e-mail: [email protected]

S. Chang
Algorithm Lab., Digital Imaging Business, Samsung Electronics, Co., Ltd., Suwon, Gyeonggi-do, 443-742, Korea
e-mail: [email protected]

If bright and dark regions co-exist in the same scene, these regions tend to be under- or over-saturated. Because picture quality is one of the critical factors that customers consider when they compare different models, HDR image acquisition and display can be attractive to customers. To overcome the limitation on the dynamic range, HDR imaging was introduced [1, 2], meeting the requirements of current and future high-end image capture and display devices. In conventional methods, the HDR radiance map is generated by combining multiple images of the same scene taken with different exposures. Tone mapping is used to convert the HDR radiance map into a display format for typical display devices with a limited dynamic range [2]. Because multiple images of the same scene with different exposures are used in generating an HDR image (HDRI), several problems arise due to global and local motions of an object or a camera. When a set of images is captured by a hand-held camera or camcorder, the HDRI appears doubled or blurred due to global motion of the camera. To reduce this artifact, a tripod is used while capturing multiple images, or an alignment algorithm that is invariant to illumination is applied to the captured images [3–6]. Hardware approaches to capturing the HDR radiance map need only a single shot. However, such capture systems are limited in image resolution and are difficult for common users to adopt because of their large size and high cost [2]. In the HDR imaging process, artifacts and noise arise from global and local motion and from high sensitivity settings. Global motion of a hand-held camera and local motion of an object in the scene can cause ghost artifacts. Image registration and local motion detection are needed for preventing and removing the ghost effect in HDRI generation. If the LDRIs are captured in low light conditions, a high sensitivity setting, which causes noise, is used to prevent blurring.
Image registration is needed to compensate for camera motion when a user captures low dynamic range images (LDRIs) using a hand-held camera. Because exposure values (EVs) differ among the LDRIs used in HDR imaging, conventional image registration methods are not appropriate. For aligning the LDRIs used in HDR imaging, a median threshold bitmap (MTB) based alignment method [4], a scale invariant feature transform (SIFT) based registration method [5], and a contrast invariant feature transform (CIFT) based registration method [6] were presented. To compensate for varying exposures, Ward [4] proposed a binary image, called the MTB, obtained by thresholding the LDRI at its grayscale median value. Alignment using an MTB is insensitive to EVs. The translational offset is computed by comparing the bitmaps extracted from the LDRIs with an exclusive-OR operator. For reducing the computation time, Ward used an image pyramid method. The MTB based alignment method shows good performance for translational camera motion with a low computation time. However, this method cannot compensate for rotational and affine camera motion. Tomaszewska and Mantiuk [5] used the SIFT for searching feature points and estimating the correspondence between LDRIs. Based on the estimated correspondence, a transformation matrix between LDRIs is calculated using the direct linear transform (DLT) method [7]. Then, the LDRIs are aligned with respect to the reference image. The SIFT is invariant to rotation and scale as well as insensitive to illumination change [5, 8], so it is appropriate for registration between LDRIs with different EVs. However, this method is slower than the MTB method. Gevrekci and Gunturk [6] proposed the CIFT for obtaining a contrast signature of a feature detector. They reduced the influence of scene occlusion using the expectation maximization method. The CIFT, which is invariant to exposure change and scene occlusion, improves the registration performance. However, this method is also slower than the MTB method. Even if the artifact due to global camera motion is eliminated, the ghost effect can still be caused by local object motion and background change [2, 9–11]. So, motion detection between LDRIs is needed to prevent ghost artifacts. Motion detection based on variance was used for removing the ghost effect [2], where the weighted variance of pixel values over multiple images is calculated and object motion is then detected by thresholding the weighted variance map. However, it has some drawbacks. A reference image is selected among the LDRIs for constructing the radiance map in the object motion region. This region can be degraded by noise, and if it is under- or over-saturated, a wrong radiance value will be obtained.
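As an illustration of Ward's MTB alignment discussed above, the following is a minimal NumPy sketch, without the image pyramid, assuming 8-bit grayscale frames. The function names and the exhaustive search are our illustrative choices, not the exact implementation of [4].

```python
import numpy as np

def mtb(gray):
    """Median threshold bitmap: 1 where a pixel exceeds the frame median, else 0."""
    return (gray > np.median(gray)).astype(np.uint8)

def mtb_shift_error(ref, other, dx, dy):
    """XOR disagreement between the reference MTB and a translated MTB."""
    shifted = np.roll(np.roll(other, dy, axis=0), dx, axis=1)
    return int(np.count_nonzero(mtb(ref) ^ mtb(shifted)))

def best_translation(ref, other, search=2):
    """Exhaustive +/- search-pixel translation minimizing the XOR error."""
    best = min((mtb_shift_error(ref, other, dx, dy), dx, dy)
               for dy in range(-search, search + 1)
               for dx in range(-search, search + 1))
    return best[1], best[2]
```

In the full method the search is run coarse-to-fine over an image pyramid, so large offsets stay cheap.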
Moreover, when the contrast between an object and the background is small or a moving object contains a textureless region, motion detection may fail. Jacobs et al. detected object motion based on entropy [9]. They extracted the local entropy of the radiance of each LDRI, and then added the differences of local entropy values with weighting factors computed between LDRIs. This method is insensitive to the contrast between a moving object and the background; however, it is hard to detect the motion of a textureless object. This method also uses the reference image for constructing the radiance map in the object motion region. The performance of motion detection depends on threshold selection, where the threshold is used to cluster motion regions. Khan et al. iteratively removed the ghost effect using the kernel density estimator [10]. As a weighting factor for constructing the radiance map, this method uses the probability that a pixel belongs to the background. It is effective for noise reduction; however, it requires a higher computational load than the other methods. It also needs the constraint that the number of background pixels is larger than the number of object pixels.

Jinno and Okuda estimated the displacements and detected occlusion and saturation regions using a Markov random field model [11]. This method compensates for the effect of occlusion as well as object motion, so the ghost effect caused by small object motion can be reduced effectively. However, it requires a high computational load. Sometimes a high sensitivity setting is used for capturing the LDRIs in low light conditions, as in an indoor room. However, current digital cameras are sensitive to noise in low light conditions or at high sensitivity settings. Because LDRIs with short exposures are especially affected by photon shot noise, these images are the main factor degrading the signal to noise ratio (SNR) of an HDRI. The weighted sum of radiance values reconstructed from a set of LDRIs is not sufficient for noise reduction. Akyuz and Reinhard [12] proposed a noise reduction method for HDR imaging when LDRIs are captured at a high sensitivity setting. They first estimated the radiance map of each LDRI using an inverse camera response curve, and computed the weighted average to reduce the noise. The weighting functions for noise reduction are defined based on pixel values (by considering over- and under-saturated pixels) and the exposure time of the LDRIs (because an LDRI with a longer exposure has a higher SNR than one with a shorter exposure). Then, each radiance map is converted back to the original domain using the camera response curve. Finally, the denoised LDRIs converted back to the original domain are weighted and summed to generate the HDR radiance map. Mitsunaga and Nayar [13] proposed a weighting function that is calculated based on the SNR of the radiance value, under the assumption that the standard deviation of the noise is independent of the pixel value. Bell et al. [14] used a uniform weighting function that shows good performance in terms of the SNR. However, they did not consider the reliability of the pixel values.
In this paper, we propose a noise reduction method for HDR imaging, where noise occurs during LDRI capture. We analyze the characteristics of the proposed method and compare the performance of the proposed and existing noise reduction methods. The rest of the paper is organized as follows. Section 2 describes preprocessing steps consisting of image registration and ghost removal. Section 3 presents the proposed noise reduction method for HDR radiance map generation using an adaptive spatio-temporal smoothing filter. In Section 4, experimental results of tone mapped HDRIs of the proposed and existing methods are compared and discussed in terms of visual quality and computation time. Finally, conclusions are given in Section 5.

2 Preprocessing

Before HDR radiance map generation, preprocessing steps, image registration and motion detection, are needed to reduce ghost artifacts.

2.1 Image registration

If a user does not use a tripod when capturing the LDRIs, then the ghost effect would occur in an HDRI. So, image registration is needed to prevent ghost artifacts prior to radiance map generation. In HDR imaging, LDRIs have different EVs. For reducing the influence of the exposure change, illumination invariant features are needed for image registration. In our HDR imaging system, image registration between LDRIs with different exposures consists of four steps: corner detection, correspondence estimation, transformation matrix computation, and image warping. Figure 1 shows the block diagram of the proposed Harris corner based image registration technique. The dotted boxes are optional steps for reducing the computation time when the LDRIs are of high resolution. First, we extract interest points using the Harris corner detector [15]. For finding corners in an image, the intensity structure of the local neighborhood C(x, y) is computed as (1), where Ix and Iy represent the gradients of intensity I at pixel (x, y) along the x and y directions, respectively, Gρ signifies a Gaussian kernel with standard deviation ρ, and WC denotes the size (assumed to be odd) of the square window centered at (x, y). The symbol * signifies the convolution operator. By analyzing the eigenvalues (λ1, λ2) of the intensity structure matrix C(x, y), it is decided whether pixel (x, y) is a corner or not. If λ1 and λ2 are both large positive eigenvalues, then the pixel is considered a corner. To reduce the computational load of computing eigenvalues, the corner strength function Rc is computed as [15]

Rc(x, y) = λ1λ2 − k(λ1 + λ2)²
         = det(C(x, y)) − k · tr²(C(x, y))        (2)

where det(C(x, y)) and tr(C(x, y)) denote the determinant and trace of the matrix C(x, y), respectively, and k is a sensitivity parameter. If Rc(x, y) is larger than a threshold value, pixel (x, y) is regarded as a corner.
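As a concrete illustration of (1) and (2), the sketch below computes the corner strength with NumPy only, using a separable Gaussian in place of the Gρ-smoothed window sums of (1); the parameter values and function names are illustrative, not those of the paper.

```python
import numpy as np

def gauss1d(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    return g / g.sum()

def smooth(a, sigma):
    """Separable Gaussian smoothing, standing in for the windowed sums in Eq. (1)."""
    g = gauss1d(sigma)
    a = np.apply_along_axis(lambda m: np.convolve(m, g, mode='same'), 0, a)
    return np.apply_along_axis(lambda m: np.convolve(m, g, mode='same'), 1, a)

def harris_strength(img, rho=1.0, k=0.04):
    """Corner strength Rc = det(C) - k*tr(C)^2 from the smoothed structure matrix."""
    Iy, Ix = np.gradient(img.astype(float))
    Sxx = smooth(Ix * Ix, rho)
    Syy = smooth(Iy * Iy, rho)
    Sxy = smooth(Ix * Iy, rho)
    return (Sxx * Syy - Sxy * Sxy) - k * (Sxx + Syy) ** 2

def harris_corners(img, thresh, rho=1.0, k=0.04):
    """Pixels whose strength exceeds the threshold are declared corners."""
    return np.argwhere(harris_strength(img, rho, k) > thresh)
```

On a flat image the strength is zero everywhere; along a straight edge it is negative, and only true corners give large positive values, which is what makes the threshold test meaningful.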

Fig. 1 Block diagram of the proposed image registration method

Next, we establish correspondence between feature points to be aligned in the reference image and the other image using the zero mean normalized sum of squared differences (ZNSSD) SD(i, j), which is computed as (3) [16], where IR(xi, yi) and Ik(x′j, y′j) represent intensities of the reference image and the other image (the kth LDRI, k ≠ R), respectively. (xi, yi) is the ith feature point in the reference image, whereas (x′j, y′j) is the jth feature point in the kth LDRI. K1 and K2 denote the numbers of feature points in the reference and kth images, respectively. μR,i and μk,j represent the local means of the reference and kth LDRIs, respectively, used to compensate for the exposure

             ⎡ Σu Σv Ix²(x+u, y+v)                  Σu Σv Ix(x+u, y+v) Iy(x+u, y+v) ⎤
C(x, y) = Gρ * ⎢                                                                      ⎥        (1)
             ⎣ Σu Σv Ix(x+u, y+v) Iy(x+u, y+v)      Σu Σv Iy²(x+u, y+v)              ⎦

where the sums in (1) run over u, v = −(WC−1)/2, …, (WC−1)/2, and

            Σu Σv [{IR(xi+u, yi+v) − μR,i} − {Ik(x′j+u, y′j+v) − μk,j}]²
SD(i, j) = ──────────────────────────────────────────────────────────────── ,   1 ≤ i ≤ K1, 1 ≤ j ≤ K2        (3)
           √(Σu Σv [IR(xi+u, yi+v) − μR,i]²) √(Σu Σv [Ik(x′j+u, y′j+v) − μk,j]²)

where the sums in (3) run over u, v = −(WS−1)/2, …, (WS−1)/2.

difference between LDRIs. WS is the size of a square window, assumed to be odd, for computing the ZNSSD. Outliers in the correspondences are removed by the random sample consensus (RANSAC) method [17]. Using the selected correspondences between interest points, a transformation matrix can be computed by the DLT method [7]. The transformation matrix H is defined as

            ⎡ h11  h12  h13 ⎤
x′m = Hxm = ⎢ h21  h22  h23 ⎥ xm        (4)
            ⎣ h31  h32  h33 ⎦

where xm = [xm ym 1]T and x′m = [x′m y′m 1]T are the mth corresponding points between the reference and the other LDRIs, respectively. From (4), we can get a pair of equations

h11 xm + h12 ym + h13 − h31 xm x′m − h32 ym x′m − h33 x′m = 0
h21 xm + h22 ym + h23 − h31 xm y′m − h32 ym y′m − h33 y′m = 0.        (5)

Using K corresponding points that satisfy (5), a homogeneous equation in vector-matrix form can be generated as (6), where K (≤ min(K1, K2)) denotes the number of inlier correspondences. Then, the transformation matrix H can be computed as the least squares solution based on the singular value decomposition of A. Using the transformation matrix and bilinear interpolation, the LDRIs to be aligned are warped [18]. Warped image coordinates are computed using the transformation matrix. Then, to define the pixel values at integer coordinates, gray level interpolation is needed. For gray level interpolation, we use bilinear interpolation over the gray levels of the four nearest neighboring pixels. Figure 2 shows an example of correspondence estimation between the reference LDRI (IR) and the other LDRI (Ik, k = 1). Figure 2(a) shows the corners in IR and I1 detected by the Harris corner detector. Figure 2(b) shows the correspondence estimated by ZNSSD between the two LDRIs after removing outliers using RANSAC. In Fig. 2(b), green and red crosses are corners in IR and I1, respectively, and blue lines show the correspondence between corners detected in the two images. For reducing the computation time in image registration

for online processing of a high-definition camera, we perform the corner detection and correspondence estimation on resized LDRIs. First, we apply a Gaussian smoothing filter to the LDRIs, and then decimate the smoothed images by a factor of n. The influence of noise is reduced by applying the Gaussian filter. By reducing the size of the LDRIs, the computation time can be reduced, because the numbers of search regions and corners are decreased. After correspondence estimation, we multiply the correspondence coordinates by the size reduction factor n to compute the transformation matrix and to perform image warping, and then obtain the aligned LDRIs.

     ⎡ x1  y1  1   0   0   0   −x1x′1   −y1x′1   −x′1 ⎤ ⎡ h11 ⎤
     ⎢ 0   0   0   x1  y1  1   −x1y′1   −y1y′1   −y′1 ⎥ ⎢ h12 ⎥
Ah = ⎢ ⋮   ⋮   ⋮   ⋮   ⋮   ⋮      ⋮        ⋮       ⋮  ⎥ ⎢  ⋮  ⎥ = 0        (6)
     ⎢ xK  yK  1   0   0   0   −xKx′K   −yKx′K   −x′K ⎥ ⎢ h32 ⎥
     ⎣ 0   0   0   xK  yK  1   −xKy′K   −yKy′K   −y′K ⎦ ⎣ h33 ⎦

Fig. 2 Corner detection and correspondence estimation (Warehouse). a Corner detection using the Harris corner detector. b Correspondence estimation using ZNSSD and RANSAC
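The least squares solution of (6) is the right singular vector of A associated with the smallest singular value. A minimal NumPy sketch of this DLT step (the function name is ours, and outlier rejection via RANSAC is assumed to have happened beforehand):

```python
import numpy as np

def dlt_homography(src, dst):
    """Estimate H (dst ~ H src) from >= 4 point pairs by stacking Eq. (5)
    into Ah = 0 (Eq. (6)) and taking the SVD null vector."""
    rows = []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp, -xp])
        rows.append([0, 0, 0, x, y, 1, -x * yp, -y * yp, -yp])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)   # least squares solution = last right singular vector
    return H / H[2, 2]         # normalize so h33 = 1
```

When the correspondences are found on images decimated by a factor n (as above), the point coordinates are multiplied by n before this routine is called, so that H maps full-resolution coordinates.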

2.2 Motion detection between LDRIs

To remove the ghost effect in an HDRI, we modify the multi-level threshold maps that were proposed in our previous method [19] to compensate for the exposure difference between LDRIs. Multi-level threshold maps are generated as follows. First, threshold values are computed by dividing the intensity range of each LDRI into N non-overlapping intervals, with each interval containing almost the same number of pixels. Multi-level threshold maps Lj(x, y), 1 ≤ Lj(x, y) ≤ N, at pixel (x, y) with the jth exposure, are computed by dividing the intensity values of the LDRIs into N intensity intervals based on the threshold values Tj,k, 1 ≤ k < N. However, our previous method has a drawback: false motion detection can occur in regions where the intensity values are similar to the threshold values [19]. In this paper, we detect the small motion regions that are connected to large motion regions, and adaptively select the final motion map to overcome false detection. In our previous work [19], we find the motion region as follows. We use a medium-exposure LDRI as the reference image (i.e., R = 2); then the object motion maps Mj(x, y) are computed as

            ⎧ 0, for j = R
Mj(x, y) =  ⎨ 1, for |LR(x, y) − Lj(x, y)| ≥ 1, j ≠ R        (7)
            ⎩ 0, otherwise

where the binary map Mj(x, y) represents the object motion map at pixel (x, y) with exposure j. A pixel belonging to a moving object or occlusion region has the binary value Mj(x, y) = 1 (white); otherwise Mj(x, y) = 0 (black). Lj(x, y) is the multi-level threshold map defined previously, and the subscript R represents the reference. The object motion map Mj(x, y) is extracted from each LDRI with respect to the reference LDRI (M1(x, y) is obtained using L1(x, y) and LR(x, y), whereas M3(x, y) is obtained using L3(x, y) and LR(x, y)). It is observed that the object motion regions are due to small motion of an object in the foreground as well as objects in the background. Saturation regions appear in the motion map M3(x, y) between the reference and long-exposure images. However, pixels having intensities similar to the threshold values Tj,k may be classified into different labels due to different exposures. Thus, the possibility of false detection is high if the difference (between multi-level threshold maps) is small. The small difference region Bj(x, y) is defined as

            ⎧ 0, for j = R
Bj(x, y) =  ⎨ 1, for |LR(x, y) − Lj(x, y)| = 1, j ≠ R        (8)
            ⎩ 0, otherwise.

To overcome this drawback, in this paper, we select the motion region adaptively in the region Bj(x, y). Motion regions in Bj(x, y) are commonly connected to the large difference region Dj(x, y) (between multi-level threshold maps), which is defined as

            ⎧ 0, for j = R
Dj(x, y) =  ⎨ 1, for |LR(x, y) − Lj(x, y)| ≥ 2, j ≠ R        (9)
            ⎩ 0, otherwise.

Based on this observation, we separate the region B′j(x, y), which is connected to the region Dj(x, y) = 1, from the region Bj(x, y). The adaptively detected motion maps M′j(x, y) are computed as

             ⎧ 1, for Dj(x, y) = 1
M′j(x, y) =  ⎨ 1, for B′j(x, y) = 1        (10)
             ⎩ 0, otherwise

where the binary map M′j(x, y) represents the adaptively detected object motion map at pixel (x, y) with exposure j. It is observed that false detection is reduced while correct detection is maintained. False detections can also arise for other reasons: the motion of a textureless object is hard to detect. To overcome this drawback, we apply binary morphological operations to the object motion maps M′j(x, y) and group pixels with similar motions. First, we remove isolated pixels, link disconnected pixels (that is, set 0-valued pixels to 1 if they are connected to two different motion regions), and finally fill the holes that are surrounded by 0-valued pixels. The final object motion maps M″j(x, y) are obtained by applying these binary morphological operations to M′j(x, y). It can be seen that false detections are reduced and textureless objects are clustered into adjacent motion regions.
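The map construction in (7)–(9) can be sketched as follows. This NumPy fragment builds quantile-based label maps and the large/small difference regions Dj and Bj; growing Bj into B′j via connected components, and the morphological cleanup yielding M″j, are omitted. The function names and the per-image quantile labeling are our illustrative choices.

```python
import numpy as np

def multilevel_map(img, n_levels=4):
    """Label each pixel by its intensity interval; interval edges are chosen
    so each interval holds roughly the same number of pixels (labels 1..N)."""
    qs = np.quantile(img, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.digitize(img, qs) + 1

def motion_maps(ldris, ref=1, n_levels=4):
    """Large-difference maps D_j (Eq. (9)) and small-difference maps B_j (Eq. (8))
    of each LDRI against the reference exposure."""
    L = [multilevel_map(im, n_levels) for im in ldris]
    D, B = [], []
    for j, Lj in enumerate(L):
        if j == ref:                       # reference exposure: no motion by definition
            D.append(np.zeros_like(Lj, dtype=bool))
            B.append(np.zeros_like(Lj, dtype=bool))
            continue
        diff = np.abs(L[ref] - Lj)
        D.append(diff >= 2)                # confident motion
        B.append(diff == 1)                # ambiguous, kept only if connected to D
    return D, B
```

Pixels in B are promoted to motion only when their connected component touches a D region, which is what suppresses the threshold-boundary false detections described above.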

3 HDR radiance map generation with noise reduction

Our radiance map generation method is based on Debevec and Malik's method [1], in which the object motion map and a noise reduction filter are used. The proposed radiance map generation process differs according to the ISO setting, as follows.

3.1 Case 1: low sensitivity setting

In this section, construction of the HDR radiance map with ghost removal at a low sensitivity setting is described. We modify Debevec and Malik's method [1] for radiance map generation, in which the LDRIs with multiple exposures are used together with the motion map M″j(x, y). The modified radiance map generation is expressed as

ln E(x, y, j) = ln f⁻¹(zj(x, y)) − ln Δtj        (11)

             ΣJj=1 w(zj(x, y)) MEj(x, y) ln E(x, y, j)
ln Ê(x, y) = ─────────────────────────────────────────        (12)
             ΣJj=1 w(zj(x, y)) MEj(x, y)

where E(x, y, j) represents the radiance value of the jth exposure at point (x, y), and Ê(x, y) represents the final radiance value generated by weighted averaging of the radiance values E(x, y, j) in the logarithm domain. f(·) signifies the estimated camera response curve that describes the relationship between image intensity and scene irradiance. We estimate the camera response curve using Debevec and Malik's method [1]. zj(x, y) denotes the pixel value (R, G, B) at point (x, y) in the image with the jth exposure, and Δtj is the jth exposure time. J is the total number of exposures. To reduce the influence of saturated pixels, the weighting function w(·) gives smaller weights to extreme pixel values of the response curve, and is defined as

        ⎧ z − zmin, for z ≤ (zmin + zmax)/2
w(z) =  ⎨                                        (13)
        ⎩ zmax − z, otherwise

where zmin and zmax are the minimum and maximum pixel values of each channel, respectively. The weighting factor for ghost removal at pixel (x, y) with the jth exposure, MEj(x, y), is defined as follows:

             ⎧ 1, for j = R
MEj(x, y) =  ⎨ 1, for M″j(x, y) = 0, j ≠ R        (14)
             ⎩ 0, for M″j(x, y) = 1, j ≠ R

where M″j(x, y) is the final motion map. Using this weighting factor, we exclude the object motion regions that produce the ghost effect.
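Equations (11)–(14) amount to a per-pixel masked weighted average in the log domain. A compact NumPy sketch follows; array shapes and function names are our illustrative choices, and the reference exposure's motion mask is assumed all-zero, per (14).

```python
import numpy as np

def hat_weight(z, zmin=0.0, zmax=255.0):
    """Eq. (13): triangular weight favoring mid-range pixel values."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0.5 * (zmin + zmax), z - zmin, zmax - z)

def merge_radiance(ln_E, z, motion, w=hat_weight):
    """Eq. (12): weighted log-radiance average over J exposures,
    with ME_j = 0 masking out detected motion pixels.
    ln_E, z, motion: arrays of shape (J, H, W); motion is boolean."""
    ME = (~motion).astype(float)               # Eq. (14) as a per-pixel mask
    wt = w(z) * ME
    return (wt * ln_E).sum(axis=0) / np.maximum(wt.sum(axis=0), 1e-12)
```

For a pixel flagged as motion in every non-reference exposure, only the reference term survives, which is exactly how the ghost is excluded from the radiance map.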

3.2 Case 2: high sensitivity setting

If LDRIs are captured at a high sensitivity setting in low light conditions, these images can be degraded mainly by photon shot noise and readout noise [20]. Because an HDRI generated using noisy LDRIs will have a low SNR, noise reduction is necessary for reliable radiance map generation. In HDR imaging, the weighted average of LDRIs used to generate the HDR radiance map has the effect of reducing the noise in the LDRIs [1, 12–14]. In previous work, weighting functions such as w(·) in (13), used in computing the weighted average of LDRIs, were proposed mainly for reducing the noise [1, 14]. The weighted average is similar to smoothing over temporally neighboring pixels in video processing. However, the weighted average of radiance values reconstructed from a set of LDRIs is not sufficient for noise reduction in low light conditions at a high sensitivity setting.

To effectively reduce noise due to high sensitivity and low light, we consider spatially neighboring pixels as well as temporally neighboring pixels in computing the weighted average of LDRIs. Because the test set of five LDRIs is captured with an auto-bracketing setting in our case, noise reduction methods using a spatio-temporal filter for video sequences can be used. The frame rate of a still camera with an auto-bracketing setting is lower than that of a video camera, so the object motion between LDRIs captured by the still camera is large compared to a video sequence. If a spatio-temporal filter is used in this case, the filter size needs to be large to compensate for the large object motion. For reducing the computational load, we use different types of filters in motion and static regions. Conventional noise reduction methods with edge preservation include wavelet based methods [21–23], total variation based methods [24–27], spatial domain based methods [28, 29], and patch based methods [30–32]. In the proposed method, to preserve details in spatial smoothing, a structure adaptive anisotropic filter [33–35] is used. This filter is effective for low light noise reduction with edge preservation, and its computational load is comparably low. In the proposed method, different types of filters are selectively used in motion regions and static regions. In motion regions, a structure adaptive spatio-temporal smoothing filter is applied to reduce the ghost effect. In static regions, a structure adaptive spatial smoothing filter is used for each LDRI and then the weighted averaging is performed.

3.2.1 Motion region

In motion regions, a spatio-temporal smoothing filter is applied to the radiance values E(x, y, j) that are estimated from the LDRIs. The HDR radiance map Ê(x, y) is computed by applying an adaptive smoothing filter and weighting function to the radiance values E(x, y, j):

          ΣJl=1 Σu Σv w(zl(x, y)) kx,y(u, v, l) E(x + u, y + v, l)
Ê(x, y) = ────────────────────────────────────────────────────────        (15)
          ΣJl=1 Σu Σv w(zl(x, y)) kx,y(u, v, l)

where the sums run over u = −(Wx−1)/2, …, (Wx−1)/2 and v = −(Wy−1)/2, …, (Wy−1)/2

and kx,y(u, v, l) is the kernel at point (x, y, R), with R denoting the index of the reference LDRI. The weighting function w(·) is the same as in (13). Wx and Wy are the sizes of the kernel along the x and y directions, respectively, which are assumed to be odd. The size of the kernel along the j direction is the same as the total number of LDRIs J. To effectively reduce the noise while preserving edges, the kernel kx,y(u, v, l) should be wide along directions of homogeneous radiance values and narrow across the direction of edges. Thus, the kernel kx,y(u, v, l) is calculated separately at each point (x, y, R) based on the structure of the radiance values. The structure of the radiance values is analyzed by a structure tensor SM defined as

SM(∇E(x, y, j)) = GM ∗ (∇E(x, y, j) ∇E(x, y, j)T)        (16)

where GM is the Gaussian kernel function and ∇E(x, y, j) represents the gradients of the radiance values E(x, y, j) at pixel (x, y, j), computed as

∇E(x, y, j) = [∂E/∂x  ∂E/∂y  ∂E/∂j]T.        (17)

The kernel kx,y(u, v, l) is computed based on the eigenvalue analysis of the structure tensor SM [33] as

kx,y(u, v, l) = exp(−(1/2) [x−u  y−v  R−l] VM ΣM² VMT [x−u  y−v  R−l]T)        (18)

where VM = [v1 v2 v3] is the rotation matrix whose columns are the eigenvectors, and

     ⎡ 1/σ(λ1)     0         0     ⎤
ΣM = ⎢    0     1/σ(λ2)      0     ⎥        (19)
     ⎣    0        0      1/σ(λ3)  ⎦

is the scaling matrix whose components are computed based on the eigenvalues of the structure tensor SM. The standard deviation σ(λi) along each spatio-temporal direction is defined as [34]

         ⎧ (σmax − σmin) e^(−(λi − 2d/5)/d) + σmin, λi > 2d/5
σ(λi) =  ⎨                                                        (20)
         ⎩ σmax,                                    λi ≤ 2d/5

where σmax and σmin are the maximum and minimum standard deviations of the Gaussian kernel, respectively. The parameter d is the scale factor that determines the width of the Gaussian kernel.

3.2.2 Static region

In static regions, a spatial filter is applied to each LDRI and then HDR radiance map generation with the weighted average is performed. The denoised LDRI z′j(x, y) is computed as

            Σu Σv kx,y(u, v) zj(x + u, y + v)
z′j(x, y) = ─────────────────────────────────        (21)
            Σu Σv kx,y(u, v)

where the sums run over u = −(Wx−1)/2, …, (Wx−1)/2 and v = −(Wy−1)/2, …, (Wy−1)/2, zj(x, y) denotes the pixel value (R, G, B) at point (x, y) in the image with the jth exposure, and kx,y(u, v) is the kernel at spatial point (x, y). The kernel kx,y(u, v) is computed based on the eigenvalue analysis of the structure tensor SS as

kx,y(u, v) = exp(−(1/2) [x−u  y−v] VS ΣS² VST [x−u  y−v]T)        (22)

where VS = [v1 v2] is the rotation matrix that consists of the eigenvectors of the structure tensor, and

     ⎡ 1/σ(λ1)     0    ⎤
ΣS = ⎢                  ⎥        (23)
     ⎣    0     1/σ(λ2) ⎦

is the scaling matrix that consists of the eigenvalues of the structure tensor SS. The structure tensor of intensity SS is defined as

SS(∇zj(x, y)) = GS ∗ (∇zj(x, y) ∇zj(x, y)T)        (24)

where GS is the Gaussian kernel function and ∇zj(x, y) represents the gradient of the intensity zj(x, y) at pixel (x, y), computed as

∇zj(x, y) = [∂zj/∂x  ∂zj/∂y]T.        (25)

The HDR radiance map Ê(x, y) is computed by combining the denoised LDRIs z′j(x, y) based on Debevec and Malik's method [1]. The modified radiance map generation is expressed as

             ΣJj=1 w(z′j(x, y)) (Δtj) (ln f⁻¹(z′j(x, y)) − ln Δtj)
ln Ê(x, y) = ─────────────────────────────────────────────────────        (26)
             ΣJj=1 w(z′j(x, y)) (Δtj)

where J is the total number of exposures, f(·) signifies the camera response curve estimated from the multiple exposures, and Δtj is the jth LDRI's exposure time. Temporal smoothing is conducted by weighted averaging of the spatially smoothed LDRIs z′j(x, y). At a high sensitivity setting, photon shot noise is dominant, so an LDRI with a longer exposure time has a higher SNR than one with a shorter exposure time. For reducing the influence of the photon shot noise, we multiply the weighting function by Δtj.

Fig. 3 Aligned LDRI results (Warehouse). a Input LDRIs (2 EV spacing, J = 3). b LDRIs aligned by the MTB method. c LDRIs aligned by the SIFT based registration method. d LDRIs aligned by the proposed registration method
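The spatial kernel of (22)–(23), with the eigenvalue-dependent standard deviations of (20), can be sketched as follows. This NumPy fragment builds the anisotropic Gaussian weights for a single pixel given its 2×2 structure tensor; all parameter values (d, σmin, σmax, kernel half-width) are illustrative, not the paper's.

```python
import numpy as np

def sigma_of_lambda(lam, d=1.0, s_min=0.5, s_max=2.0):
    """Eq. (20): wide smoothing (s_max) along weak-gradient directions,
    narrow smoothing (toward s_min) across strong edges."""
    return np.where(lam > 2 * d / 5,
                    (s_max - s_min) * np.exp(-(lam - 2 * d / 5) / d) + s_min,
                    s_max)

def adaptive_kernel(tensor, half=2):
    """Eqs. (22)-(23): anisotropic Gaussian aligned with the eigenvectors
    of the 2x2 structure tensor S_S evaluated at one pixel."""
    lam, V = np.linalg.eigh(tensor)                     # eigen-decomposition of S_S
    M = V @ np.diag(1.0 / sigma_of_lambda(lam) ** 2) @ V.T
    offsets = np.arange(-half, half + 1, dtype=float)
    k = np.empty((offsets.size, offsets.size))
    for a, du in enumerate(offsets):
        for b, dv in enumerate(offsets):
            p = np.array([du, dv])
            k[a, b] = np.exp(-0.5 * p @ M @ p)          # exponent of Eq. (22)
    return k / k.sum()                                  # normalized weights
```

With a zero tensor (flat region) the kernel degenerates to an isotropic Gaussian of width σmax; with one dominant eigenvalue it elongates along the edge and shrinks across it, which is the edge-preserving behavior the text describes.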

4 Experimental results and discussions

We analyze the characteristics of the proposed methods and compare the performance of the proposed and existing noise reduction methods in terms of visual quality and computation time.

4.1 Preprocessing: image registration and ghost removal

We test several sets of LDRIs, with each set consisting of three images (J = 3) with different exposures, which were taken by a hand-held Samsung GX-20 camera with an exposure bracketing setting. For performance comparison, two existing methods are considered: the MTB method [4] and the SIFT based registration method [5]. For unbiased comparison, Reinhard et al.'s tone mapping method [36] is applied to the result images of these three registration methods. We compare the visual quality of the HDRIs of the existing and proposed methods. We will show results of HDRIs generated from a set of LDRIs with rotational motion as well as translational motion. Figure 3 shows aligned LDRI results produced by the existing and proposed registration methods. Figure 3(a) shows the input LDRIs, with 2 EV spacing between images. The LDRI size is 2336×1552. The exposure times of the LDRIs are, from left to right, 1/4000, 1/1500, and 1/350 second, respectively. We can see misalignment between the LDRIs, in spite of the exposure bracketing setting. Camera motion between the LDRIs includes rotation as well as translation. Figures 3(b)–(d) show the LDRIs aligned by the MTB, SIFT based, and proposed methods, respectively. The medium-exposure LDRI is used as the reference image (i.e., R = 2) for estimating the camera motion. Left images are the aligned short-exposure LDRIs with respect to the reference LDRIs, whereas right images are the aligned long-exposure LDRIs. After the image registration, the regions having no information are replaced by 0-valued pixels (black regions at the image boundaries in Figs. 3(b)–(d)). Based on these black regions, we can confirm that the MTB method estimates only the translation parameters, whereas the SIFT based method and the proposed method estimate rotation parameters between two images as well as translation parameters. Figures 4(a)–(c) show the HDRIs using the LDRIs aligned by the MTB (Fig. 3(b)), SIFT based (Fig. 3(c)), and proposed (Fig. 3(d)) methods, respectively. Left images are tone mapped HDRIs using the aligned LDRIs in Fig. 3, whereas right images are cropped regions from the left images, where the cropped regions are shown in the left images with red boxes. Reinhard et al.'s method [36] is used for tone mapping the radiance map.
Because the MTB method cannot compensate for camera rotation, ghost artifacts appear in Fig. 4(a). The SIFT based and proposed methods align the LDRIs properly, so the ghosting caused by misalignment is effectively removed, as shown in Figs. 4(b) and 4(c), respectively. Compared with the SIFT based registration [5], the proposed method is sensitive to scale differences among LDRIs; however, with the exposure bracketing setting used for our test images, the scale difference between LDRIs is small. Thus, the SIFT based and proposed methods show similar registration accuracy, while the computational overhead incurred by compensating for the scale difference in the SIFT based method can be reduced, as described in Section 2. Figure 5 shows the HDRIs obtained with different values of the resize parameter n. Ghost artifacts remain unnoticeable when the images are reduced by a factor of up to eight (n = 8) at the resolution of the LDRIs in Fig. 3(a). Table 1 lists the computation times of the existing and proposed registration methods, all implemented in MATLAB on an AMD computer (2.8 GHz, 2 GB RAM). The proposed method with n = 8 is faster than the SIFT based method and comparable in speed to the MTB method.
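The speed-up from the resize parameter n comes from estimating the camera motion on n-times-downscaled images and rescaling the result. The following is a minimal illustrative sketch of that idea, not the paper's implementation: a brute-force integer translation search stands in for the Harris corner correspondence step, and rotation is ignored.

```python
import numpy as np

def downscale(img, n):
    # Average-pool by a factor of n (crop to a multiple of n first).
    h, w = img.shape[0] // n * n, img.shape[1] // n * n
    return img[:h, :w].reshape(h // n, n, w // n, n).mean(axis=(1, 3))

def estimate_translation(ref, tgt, search=4):
    # Exhaustive search for the integer shift of tgt that minimizes
    # the mean absolute difference against ref.
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            shifted = np.roll(np.roll(tgt, dy, axis=0), dx, axis=1)
            err = np.abs(ref - shifted).mean()
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

def coarse_register(ref, tgt, n=8):
    # Estimate the shift at 1/n resolution, then scale the
    # translation parameters back to full resolution.
    dy, dx = estimate_translation(downscale(ref, n), downscale(tgt, n))
    return dy * n, dx * n
```

Because the search runs on an image with n² times fewer pixels, the cost drops accordingly, at the price of quantizing the recovered shift to multiples of n.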

Table 1 Comparison of the computation time (image registration)

  Method       MTB [4]   SIFT based [5]   Proposed (n = 8)
  Time (sec)   7.35      67.19            7.84

Fig. 4 Tone mapped HDRIs using aligned LDRIs (Warehouse). a Aligned by the MTB method. b Aligned by the SIFT based registration method. c Aligned by the proposed registration method

Fig. 5 HDRIs aligned by the proposed method with different resize parameter n (n = 1, 2, 4, 8, 16) (Warehouse)

For performance comparison of ghost removal, three existing methods are considered: motion detection based on variance [2], Jacobs et al.'s method [9], and Khan et al.'s method [10]. For unbiased comparison, Debevec and Malik's camera response recovery method [1] and Reinhard et al.'s tone mapping method [36] are applied to the result images of all ghost removal methods, and we compare the visual quality of the resulting HDRIs on two sets of LDRIs. Figure 6 shows the HDRIs after ghost removal by the existing and proposed methods, generated from the three differently exposed LDRIs (short, medium, and long) shown in Fig. 6(a). Figure 6(b) shows the HDRI generated from them without ghost removal, and Fig. 6(c) shows two cropped (enlarged) regions degraded by the ghost effect, whose locations are marked in Fig. 6(b) by red and blue boxes. In Fig. 6(c), both object motion and background change cause the ghost effect: if objects move while the LDRI set is captured, they appear at different locations in different LDRIs, and because the HDR radiance map is generated by merging the LDRIs, this motion produces ghosting; background change causes ghosting in the same way. Compared with the HDRI in Fig. 6(b) (before ghost removal), the ghost effect caused by the small motion of a man in the foreground and by walking people in the background is greatly reduced. Figure 6(d) shows the cropped regions (red boxes in Fig. 6(b)) containing the small motion of the man in the foreground after ghost removal by motion detection based on variance, Jacobs et al.'s method, Khan et al.'s method with 10 iterations, and the proposed method. Figure 6(e) shows the cropped regions (blue boxes in Fig. 6(b)) containing the walking people in the background after ghost removal by the existing and proposed methods. Figures 7(a)–(e) show the HDRIs generated from another set of LDRIs (Bench) by the existing and proposed methods. Left images are tone mapped HDRIs of the Bench LDRIs; top right and bottom right images are regions cropped from the left images (red and blue boxes, respectively). The ghost effect caused by the small motion of an object is greatly reduced. Because the existing methods cannot reliably detect the small motion of a textureless object such as the man's face in the foreground, their results are poor, whereas the proposed method handles this case by combining coarse object segmentation via multi-level thresholding with a morphological hole filling operation. From Figs. 6 and 7, we conclude that the proposed method removes the ghost effect more effectively than the existing methods.
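The segmentation step used for textureless moving objects can be illustrated with the following sketch. It is a simplification under stated assumptions: quantile-based thresholds stand in for the paper's multi-level threshold map, and a border flood fill stands in for its morphological hole filling operation.

```python
import numpy as np

def multilevel_map(img, levels=4):
    # Quantize intensities into `levels` bins using quantile thresholds,
    # so maps of differently exposed images remain roughly comparable.
    thresholds = np.quantile(img, np.linspace(0, 1, levels + 1)[1:-1])
    return np.digitize(img, thresholds)

def fill_holes(mask):
    # Morphological hole filling: flood-fill the background from the
    # image border; pixels never reached are holes inside the mask.
    h, w = mask.shape
    outside = np.zeros((h, w), dtype=bool)
    stack = [(y, x) for y in range(h) for x in range(w)
             if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x]]
    for p in stack:
        outside[p] = True
    while stack:
        y, x = stack.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and not outside[ny, nx]:
                outside[ny, nx] = True
                stack.append((ny, nx))
    return ~outside  # the mask together with its filled interior holes

def motion_mask(img_a, img_b, levels=4):
    # Pixels whose quantization labels disagree are motion candidates;
    # hole filling turns scattered detections into solid object regions.
    differ = multilevel_map(img_a, levels) != multilevel_map(img_b, levels)
    return fill_holes(differ)
```

Hole filling is what makes low-texture interiors (such as a face) end up inside the motion region even when only their boundaries change between exposures.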


Fig. 6 Ghost removal in HDRI (Building). a LDRI sequence with three different exposures (J = 3). b Tone mapping results with three LDRIs in a. c Examples of cropped regions degraded by the ghost effect. d Cropped regions of HDRIs after ghost removal that contain small motion. e Cropped regions of HDRIs that contain walking people

Table 2 shows the computation times of the existing and proposed ghost removal methods. The proposed method uses no multiplication, division, or floating-point operations for object motion detection, and is therefore faster than Jacobs et al.'s and Khan et al.'s methods.
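The arithmetic behind that timing claim can be sketched as follows; this is a hypothetical illustration, deciding per-block motion flags from quantization-label disagreements using only comparisons and integer additions.

```python
def motion_blocks(map_a, map_b, block=16, min_count=8):
    # Flag a block as "motion" when at least min_count labels disagree.
    # Only comparisons and increments are used: no multiplies,
    # divides, or floating-point operations.
    h, w = len(map_a), len(map_a[0])
    flags = []
    for y0 in range(0, h, block):
        row = []
        for x0 in range(0, w, block):
            count = 0
            y = y0
            while y < y0 + block and y < h:
                x = x0
                while x < x0 + block and x < w:
                    if map_a[y][x] != map_b[y][x]:
                        count += 1
                    x += 1
                y += 1
            row.append(count >= min_count)
        flags.append(row)
    return flags
```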

4.2 Noise reduction


For performance comparison of noise reduction in HDR imaging, four existing methods are considered: Debevec and Malik's method [1], Akyuz and Reinhard's method [12], Mitsunaga and Nayar's method [13], and Bell et al.'s method [14]. We compare the visual quality of the HDRIs produced by the existing and proposed noise reduction methods on two sets of LDRIs. Figure 8 shows the tone mapped HDRI results with noise reduction performed by the existing and proposed methods. Figure 8(a) shows the aligned LDRIs, a set of five images (J = 5) with different exposures (1 EV spacing) taken by a Samsung NV100HD camera; each LDRI is 2192×1644 pixels, and the exposure times are, from left to right and top to bottom, 1/640, 1/320, 1/160, 1/80, and 1/40 second. These LDRIs were captured in low light at ISO 400. Figure 8(b) shows regions cropped from the LDRIs in Fig. 8(a) (red boxes); the LDRIs with shorter exposures are especially affected by noise. Figure 9 shows the HDR imaging results with noise reduction using the LDRIs in Fig. 8(a). Figures 9(a)–(e) show the HDRIs obtained by Debevec and Malik's method [1],

Table 2 Comparison of the computation time (ghost removal)

  Method       Variance based [2]   Jacobs et al. [9]   Khan et al. [10]   Proposed
  Time (sec)   2.54                 7.85                277.66             4.56

Akyuz and Reinhard's method [12], Mitsunaga and Nayar's method [13], Bell et al.'s method [14], and the proposed method, respectively. Left images are tone mapped HDRIs using the five LDRIs in Fig. 8(a), whereas right images are regions cropped from the left images (red boxes). The proposed method outperforms the existing methods: the simple weighted average functions used by the four existing methods are not effective in low light at high sensitivity settings, whereas the proposed method exploits the information of spatially neighboring pixels as well as temporally neighboring ones without loss of detail. Figure 10 shows the HDR imaging results with noise reduction using another set of LDRIs (Construction site). Left images are tone mapped HDRIs; top right and bottom right images are regions cropped from the left images (red and blue boxes, respectively). This LDRI test set consists of five images (J = 5) with


Fig. 7 Ghost removal in HDRI (Bench, J = 3). a Ghost effect (without ghost removal). b Variance based method. c Jacobs et al.'s method. d Khan et al.'s method. e Proposed method


Fig. 8 Noisy LDRIs (Window). a Input LDRIs (1-stop exposure difference, J = 5). b Cropped regions of LDRIs


Fig. 9 HDR image results with noise reduction process (Window). a Debevec and Malik’s method. b Akyuz and Reinhard’s method. c Mitsunaga and Nayar’s method. d Bell et al.’s method. e Proposed method

Fig. 10 HDR image results with noise reduction process (Construction site). a Debevec and Malik’s method. b Akyuz and Reinhard’s method. c Mitsunaga and Nayar’s method. d Bell et al.’s method. e Proposed method

Acknowledgements This work was supported in part by Samsung Electronics Co., Ltd.


Fig. 11 Motion and static regions of the LDRI set (Construction site)


different exposures (1 EV spacing), taken by a Samsung GX-20 camera; each LDRI is 2336×1552 pixels, and the exposure times are 1/250, 1/125, 1/60, 1/30, and 1/15 second. These LDRIs were captured in low light at ISO 1600. The proposed method gives better performance than the existing methods in terms of ghost removal as well as noise reduction. We apply different types of filters in the motion and static regions. Figure 11 shows the motion and static regions of the LDRI set (Construction site), where white denotes motion regions and black denotes static regions. In motion regions, a structure adaptive spatio-temporal smoothing filter is used to reduce the ghost effect, which is therefore effectively removed in the HDRI generated by the proposed method.
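The region-adaptive merge can be sketched as follows. This is an illustrative simplification, not the paper's filter: it assumes a linear camera response, uses a Debevec–Malik style hat weighting for the temporal merge, and substitutes a plain box blur for the structure adaptive spatio-temporal smoothing applied in motion regions.

```python
import numpy as np

def hat_weight(z):
    # Hat weighting over [0, 255]: trust mid-range pixel values most.
    return np.minimum(z, 255.0 - z).astype(np.float64)

def merge_radiance(ldr_stack, exposure_times):
    # Weighted temporal average of exposure-normalized values
    # (linear camera response assumed for simplicity).
    num = np.zeros(ldr_stack[0].shape)
    den = np.zeros(ldr_stack[0].shape)
    for z, t in zip(ldr_stack, exposure_times):
        w = hat_weight(z)
        num += w * z / t
        den += w
    return num / np.maximum(den, 1e-6)

def box_blur(img, k=3):
    # Cheap spatial smoothing standing in for the structure
    # adaptive spatio-temporal filter.
    pad = k // 2
    p = np.pad(img, pad, mode='edge')
    out = np.zeros_like(img, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def denoised_hdr(ldr_stack, times, motion_mask):
    # Static regions keep the temporal average; motion regions get
    # additional spatial smoothing to suppress noise and ghosting.
    radiance = merge_radiance(ldr_stack, times)
    return np.where(motion_mask, box_blur(radiance), radiance)
```

In static regions the temporal average over all exposures already reduces noise without blurring detail; the spatial term is only paid where temporal information is unreliable because of motion.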


5 Conclusions

This paper proposes a noise reduction method for HDR imaging. Image registration is performed based on the Harris corner detector, and the computation time is reduced by limiting the number of corners and the search ranges in the correspondence estimation. A multi-level threshold map is modified to detect the local motion between LDRIs and to remove ghost artifacts in HDRIs. To reduce noise in low light conditions with high sensitivity settings, we propose an HDR imaging method based on adaptive filtering, in which the information of spatially neighboring pixels as well as temporally neighboring ones is used without loss of detail, whereas existing noise reduction methods in HDR imaging use only the information of temporally neighboring pixels. Experiments with various test image sets show that the proposed method gives better performance than existing methods in terms of visual quality and computation time.


References

1. Debevec, P., Malik, J.: Recovering high dynamic range radiance maps from photographs. In: Proc. ACM SIGGRAPH, pp. 369–378, Los Angeles, CA, USA (1997)
2. Reinhard, E., Ward, G., Debevec, P., Pattanaik, S.: High Dynamic Range Imaging: Acquisition, Display, and Image-Based Lighting. Morgan Kaufmann, San Francisco, CA, USA (2005)
3. Zitova, B., Flusser, J.: Image registration methods: A survey. Image and Vision Computing 21(11), 977–1000 (2003)
4. Ward, G.: Fast, robust image registration for compositing high dynamic range photographs from hand-held exposures. J. Graphics Tools 8(2), 17–30 (2003)
5. Tomaszewska, A., Mantiuk, R.: Image registration for multi-exposure high dynamic range image acquisition. In: Proc. 15th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, pp. 49–56, Plzen-Bory, Czech Republic (2007)
6. Gevrekci, M., Gunturk, B.K.: On geometric and photometric registration of images. In: Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, pp. 1261–1264, Honolulu, HI, USA (2007)
7. Hartley, R., Zisserman, A.: Multiple View Geometry. Cambridge University Press, Cambridge, UK (2003)
8. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
9. Jacobs, K., Loscos, C., Ward, G.: Automatic high dynamic range image generation for dynamic scenes. IEEE Computer Graphics and Applications 28(2), 84–93 (2008)
10. Khan, E.A., Akyuz, A.O., Reinhard, E.: Robust generation of high dynamic range images. In: Proc. IEEE Int. Conf. Image Processing, pp. 2005–2008, Atlanta, GA, USA (2006)
11. Jinno, T., Okuda, M.: Motion blur free HDR image acquisition using multiple exposures. In: Proc. IEEE Int. Conf. Image Processing, pp. 1304–1307, San Diego, CA, USA (2008)
12. Akyuz, A.O., Reinhard, E.: Noise reduction in high dynamic range imaging. J. Vis. Commun. Image Representation 18(5), 366–376 (2007)
13. Mitsunaga, T., Nayar, S.K.: Radiometric self calibration. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 374–380, Ft. Collins, CO, USA (1999)
14. Bell, A.A., Seiler, C., Kaftan, J.N., Aach, T.: Noise in high dynamic range imaging. In: Proc. IEEE Int. Conf. Image Processing, pp. 561–564, San Diego, CA, USA (2008)
15. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proc. Alvey Vision Conf., pp. 147–152, Manchester, UK (1988)
16. Aschwanden, P., Guggenbuhl, W.: Experimental results from a comparative study on correlation-type registration algorithms. In: Forstner, W., Ruwiedel, S. (eds.) Robust Computer Vision, pp. 268–289. Wickmann, Karlsruhe, Germany (1992)
17. Fischler, M.A., Bolles, R.C.: Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM 24(6), 381–395 (1981)
18. Wolberg, G.: Digital Image Warping. IEEE Computer Society Press, Los Alamitos, CA, USA (1992)
19. Min, T.-H., Park, R.-H., Chang, S.: Histogram based ghost removal in high dynamic range images. In: Proc. IEEE Int. Conf. Multimedia and Expo, pp. 530–533, New York, NY, USA (2009)
20. Reibel, Y., Jung, M., Bouhifd, M., Cunin, B., Draman, C.: CCD or CMOS camera noise characteristics. Euro. Physical Journal of Applied Physics 21, 75–80 (2003)
21. Chambolle, A., DeVore, R.A., Lee, N.-Y., Lucier, B.J.: Nonlinear wavelet image processing: Variational problems, compression, and noise removal through wavelet shrinkage. IEEE Trans. Image Processing 7(3), 319–335 (1998)
22. Chang, S.G., Yu, B., Vetterli, M.: Spatially adaptive wavelet thresholding with context modeling for image denoising. IEEE Trans. Image Processing 9(9), 1522–1531 (2000)
23. Goossens, B., Pizurica, A., Philips, W.: Wavelet domain image denoising for non-stationary noise and signal-dependent noise. In: Proc. IEEE Int. Conf. Image Processing, pp. 1425–1428, Atlanta, GA, USA (2006)
24. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60, 259–268 (1992)
25. Chambolle, A., Lions, P.-L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
26. Chambolle, A.: An algorithm for total variation minimization and applications. J. Mathematical Imaging and Vision 20(1/2), 89–97 (2004)
27. Ishii, Y., Saito, T., Komatsu, T.: Denoising via nonlinear image decomposition for a digital color camera. In: Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 309–312, San Antonio, TX, USA (2007)
28. Smith, S.M., Brady, J.M.: SUSAN—A new approach to low level image processing. Int. J. Comput. Vis. 23(1), 45–78 (1997)
29. Tomasi, C., Manduchi, R.: Bilateral filtering for gray and color images. In: Proc. IEEE Int. Conf. Computer Vision, pp. 839–846, Bombay, India (1998)
30. Muresan, D.D., Parks, T.W.: Adaptive principal components and image denoising. In: Proc. IEEE Int. Conf. Image Processing, vol. 1, pp. 101–104, Barcelona, Spain (2003)
31. Hirakawa, K., Parks, T.W.: Image denoising using total least squares. IEEE Trans. Image Processing 15(9), 2730–2742 (2006)
32. Buades, A., Morel, J.-M.: A non-local algorithm for image denoising. In: Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 2, pp. 60–65, San Diego, CA, USA (2005)
33. Yang, G.Z., Burger, P., Firmin, D.N., Underwood, S.R.: Structure adaptive anisotropic image filtering. Image and Vision Computing 14(2), 135–145 (1996)
34. Malm, H., Oskarsson, M., Warrant, E., Clarberg, P., Hasselgren, J., Lejdfors, C.: Adaptive enhancement and noise reduction in very low light-level video. In: Proc. IEEE Int. Conf. Computer Vision, pp. 1–8, Rio de Janeiro, Brazil (2007)
35. Malm, H., Warrant, E.: Motion dependent spatiotemporal smoothing for noise reduction in very dim light image sequences. In: Proc. IEEE Int. Conf. Pattern Recognition, pp. 135–145, Hong Kong, China (2006)
36. Reinhard, E., Stark, M., Shirley, P., Ferwerda, J.: Photographic tone reproduction for digital images. ACM Trans. Graphics 21(3), 267–276 (2002)