2009 Conference for Visual Media Production

Fusion of Bracketing Pictures

Marcelo Bertalmío¹ and Stacey Levine²
¹ Departamento de Tecnologías de la Información y las Comunicaciones, Universitat Pompeu Fabra, Tànger 122-140, 08018 Barcelona, Spain. [email protected]
² Department of Mathematics and Computer Science, Duquesne University, Pittsburgh, PA 15282, USA. [email protected]

978-0-7695-3893-8/09 $26.00 © 2009 IEEE. DOI 10.1109/CVMP.2009.13

Abstract

When taking pictures in a night scene with artificial lighting, a very common situation, for most cameras the light coming from the scene is not enough and we are posed with the following problem: we can use the flash, which yields a bright, sharp image but with colors quite different from the ones present in the real scene; we can increase the exposure setting so as to let the camera absorb more light; or we can keep the normal (short) exposure setting and take a dark picture. If we discard the flash as an option and instead use the exposure bracketing feature available in most photo cameras, we obtain a series of pictures taken in rapid succession with different exposure times, with the implicit idea that the user picks the best-looking image from this set. But quite commonly none of these images is good enough: in general, good color information is retained with longer exposure settings, while sharp details are obtained with shorter ones. In this work we propose a technique for automatically recombining a bracketed pair of images into a single picture that reflects the optimal properties of each one. The proposed technique uses stereo matching to extract the optimal color information from the overexposed image and Non-local means denoising to suppress noise while retaining sharp details in the luminance of the underexposed image.

Keywords: Exposure bracketing, Image fusion, Denoising, Stereo matching.

1 Introduction

Often there are challenging lighting conditions that prevent an ideal photograph, with both clear detail and accurate color, from being obtained. In particular, if the lighting is too poor, objects might still be distinguishable at a standard shutter speed, but the true colors are almost completely lost. The only mechanism for obtaining proper color information would be to use a very slow shutter speed, but this would inevitably blur the main subjects. This situation is far from uncommon. For instance, for most photo cameras an indoor night scene with (domestic) artificial lighting would present the problem just described, which is why the camera usually suggests using the flash, which yields a bright, sharp image but with colors quite different from the ones present in the real scene.

Many photo cameras allow the user to perform exposure bracketing: taking in rapid succession a series of pictures of the same field of view but with varying shutter speeds, with the idea that the user picks from this set the image with the best compromise between color information and sharp details. But quite commonly none of these images is good enough: in general, good color information is retained with longer exposure settings, while sharp details are obtained with shorter ones (see figure 1). In this work we propose a technique for automatically recombining a bracketed pair of images into a single picture that reflects the optimal properties of each one. The proposed technique uses stereo matching to extract the optimal color information from the overexposed image and Non-local means denoising [3] to suppress noise while retaining sharp details in the luminance of the underexposed image. While the method proposed in this paper is intended for still images, in the final section we suggest how to extend it to deal with motion pictures, which suffer from the same limitations, as pointed out in [12, 9]: night shooting on location requires that everything be lit artificially, which is always very time consuming and may also be a big problem if the location covers a wide area or is difficult to access.

There are related works on image fusion that deal with, for example, fusing a pair of pictures taken with and without a flash, or deblurring a long-exposure image using a blurry and non-blurry image pair (see e.g. [14, 6, 7, 15, 16] and references therein). However, these existing methods cannot be applied to the most common scenarios. The current literature on fusing flash/no-flash image sets assumes there is no motion, which is generally not the case whenever the picture features human beings, unless they keep absolutely still (see figure 1). Current work on picture deblurring assumes the blur comes only from camera motion and the scene is static, which again prevents these methods from being applied to most pictures featuring people. In this work we aim to handle both static and non-static scenes, with camera and/or subject motion. In [11]


Figure 1: (a) Long-exposure image. (b) Short-exposure image.

the authors propose a fusion method that combines histogram matching with spatial color matching, with very good results, but they require the underexposed image to be noise-free, and the spatial color matching of small regions is problematic.

The paper is laid out as follows. In section 2 we describe the proposed algorithm. In section 3 we present numerical results that demonstrate its effectiveness. In section 4 we draw some conclusions and discuss future work.

2 Proposed Algorithm

The proposed algorithm works on a bracketed pair of images, one underexposed and one overexposed. A fast shutter speed typically yields the underexposed image, which we will call Iu. This image has sharp details but very little color information. A slower shutter speed yields an overexposed image, Io, which retains good color information but whose details are often blurred. Our goal is to automatically adjust the color in Iu to match that of Io while retaining sharp details.

The main steps of the algorithm are:

1. Perform stereo matching of both the rows and columns of Iu to those of Io to obtain a new image Ĩu^HV.
2. Denoise and equalize the grayscale luminance Lu of Iu to obtain a new luminance L̃u.
3. Replace the luminance of Ĩu^HV with L̃u to obtain the final result O.

Each of these steps requires both pre- and post-processing, as well as some deviations from the standard approaches, all of which are outlined below.

2.1 Transfer the color from Io via stereo matching

The purpose of this first step is to extract the color information from Io and transfer it to Iu. The stereo matching performs much better if the histograms of Iu and its luminance Lu are first modified to match those of Io and its luminance Lo, respectively. Both histogram modifications are performed globally, and in the case of Iu the matching is applied to each of the RGB color channels separately (we have used the histogram matching algorithm described in [10], section 3.2). This yields the new equalized images Iu^h and Lu^h. Then Iu^h is modified so that its luminance is exactly Lu^h; that is, for each pixel (i,j) we set

Ĩu(i,j) = Iu^h(i,j) · Lu^h(i,j) / luminance(Iu^h(i,j)).   (1)

Now stereo matching can be used to transfer the bright color contained in the overexposed Io onto the preprocessed Ĩu. The matching is performed line by line via dynamic programming, using an approach adapted from the one proposed by Cox et al. [4] in the context of dense stereo matching and already used for several other image restoration and enhancement tasks such as deinterlacing [1], denoising [2] and demosaicking [8]. A dense matching of the k-th row of Ĩu to the set of rows {k−d, k−d+1, ..., k+d−1, k+d} in Io is computed via dynamic programming. With this procedure we find, for each pixel (k,p) in the k-th row of Ĩu, a match in each of the rows {k−d, ..., k+d} of Io. Then we create the value Ĩu^H(k,p) in the image Ĩu^H by taking the median of the matches of Ĩu(k,p):

Ĩu^H(k,p) = median{matches of Ĩu(k,p) in rows k−d, ..., k+d of Io}.

To avoid vertical or horizontal artifacts (see figure 2), the matching is performed both horizontally (row by row), producing Ĩu^H, and vertically (column by column), yielding Ĩu^V.

Figure 2: Top: Ĩu^H, result of matching the rows of fig. 1b to the rows of fig. 1a. Middle: Ĩu^V, result of matching the columns of fig. 1b to the columns of fig. 1a. Bottom: Ĩu^HV, combination of the previous two images according to equation (2).

Both the horizontal and vertical matchings in the dynamic programming have an associated cost, which is the sum of absolute differences between the neighborhoods of a pixel and its match; so the image Ĩu^H has an associated cost image C^H, and Ĩu^V has the cost image C^V. These costs are used to combine the images Ĩu^H and Ĩu^V into Ĩu^HV. In particular, for each pixel (i,j), the combined image Ĩu^HV is computed using the formula

Ĩu^HV(i,j) = wH(i,j) Ĩu^H(i,j) + wV(i,j) Ĩu^V(i,j),   (2)

where

wH(i,j) = eH(i,j) / (eH(i,j) + eV(i,j)),   wV(i,j) = eV(i,j) / (eH(i,j) + eV(i,j)).

If C^H(i,j) ≤ C^V(i,j), then

eH(i,j) = 1.0,   eV(i,j) = exp(−|C^H(i,j) − C^V(i,j)| / ρ);

otherwise,

eV(i,j) = 1.0,   eH(i,j) = exp(−|C^H(i,j) − C^V(i,j)| / ρ).

The value ρ is just a positive constant. In other words, Ĩu^HV is an average in which more weight is given to the match (either horizontal or vertical) with less error. This is the same procedure used in [2] to combine spatial and temporal deinterlacing images.

Since the purpose of this first step was to extract the color information from Io and transfer it to Iu, one might wonder why it is necessary to use the rather involved technique just presented instead of a straightforward approach like histogram matching. True, the simplest approach would be to match, channel by channel, the histogram of Iu to that of Io, but this does not give optimal results. An example can be found in figure 3 (right), where we see that the problems with histogram matching are twofold: the noise becomes magnified, and shifted colors are transferred (notice the wrong hue of yellow and blue in the boy's shirt).

2.2 Re-adjust the luminance

The second step consists of computing an optimal grayscale luminance component, L̃u, that will then be transferred to the image Ĩu^HV generated above. This new luminance should retain the sharp details of Lu while keeping the brightness distribution of Lo. A direct contrast enhancement applied to Lu would magnify the noise (see figure 4). Therefore we first remove 'extrema' (edges, textures, and noise), then enhance the contrast, and finally add back the extrema. This is based on a similar procedure proposed in [13] (section 8.4) and is performed as follows.

Non-local means [3] is used to denoise Lu, obtaining Lu^NL. Non-local means is a denoising methodology that exploits the natural redundancy in images by averaging similar 'patches' within the image. This non-local technique has been found to preserve edges and details better than typical local smoothing approaches.

The smoothed image Lu^NL is now used in two ways. First, a 'grain image' G^NL = Lu − Lu^NL is obtained, which consists of the information lost by smoothing with NL-means; this is also called the 'method noise' in [3]. Second, the histogram of Lu^NL is modified so that it matches the histogram of Lo, obtaining Lu^NL,h.

The information in the grain image G^NL is added back to the smoothed, equalized luminance Lu^NL,h to obtain the luminance that will be used in the final image:

L̃u = Lu^NL,h + G^NL.   (3)
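The luminance pipeline of section 2.2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: a box filter stands in for the Non-local means denoiser, and the histogram matching is the generic sorted-value remap rather than the specific algorithm of [10].

```python
import numpy as np

def match_histogram(src, ref):
    """Monotone remap of src's values so its histogram matches ref's."""
    order = np.argsort(src.ravel())
    out = np.empty(src.size)
    out[order] = np.sort(ref.ravel())  # give src's rank order ref's sorted values
    return out.reshape(src.shape)

def box_denoise(img, r=1):
    """(2r+1)x(2r+1) box filter, a crude stand-in for NL-means denoising."""
    pad = np.pad(img, r, mode='edge')
    out = np.zeros(img.shape)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out += pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (2 * r + 1) ** 2

def readjust_luminance(Lu, Lo):
    """Equation (3): denoise Lu, keep the removed grain, equalize, add grain back."""
    Lnl = box_denoise(Lu)             # Lu^NL (denoised luminance)
    grain = Lu - Lnl                  # G^NL, the 'method noise'
    Lnl_h = match_histogram(Lnl, Lo)  # Lu^NL,h: histogram matched to Lo
    return Lnl_h + grain              # L~u = Lu^NL,h + G^NL
```

The key point the sketch preserves is that the grain is extracted before equalization and added back unchanged afterwards, so the noise and fine detail are not amplified by the contrast change.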

Figure 3: Left: overexposed image (detail). Middle: our final result. Right: the result we would obtain with global histogram matching (see text).
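The per-row color transfer of section 2.1 can be sketched as below, on a grayscale image for brevity. This is a simplified stand-in, not the paper's method: an exhaustive minimum sum-of-absolute-differences search replaces the dynamic-programming matcher of Cox et al. [4]; only the gathering of one match per row k−d..k+d and the median over the D = 2d+1 matches follows the text.

```python
import numpy as np

def row_transfer(Iu, Io, d=1, w=2):
    """For each pixel (k, p) of Iu, find one match in each of the rows
    k-d..k+d of Io (brute-force minimum-SAD search over a 1-D window,
    standing in for the dynamic-programming matcher), then output the
    median of the D = 2d+1 matched values."""
    H, W = Iu.shape
    pu = np.pad(Iu, ((0, 0), (w, w)), mode='edge')  # pad rows for windows
    po = np.pad(Io, ((0, 0), (w, w)), mode='edge')
    out = np.zeros((H, W))
    for k in range(H):
        for p in range(W):
            patch = pu[k, p:p + 2 * w + 1]
            matches = []
            for r in range(max(0, k - d), min(H, k + d + 1)):
                costs = [np.abs(po[r, q:q + 2 * w + 1] - patch).sum()
                         for q in range(W)]
                matches.append(Io[r, int(np.argmin(costs))])
            out[k, p] = np.median(matches)
    return out
```

The brute-force search is quadratic in the row width and only meant to make the data flow explicit; the dynamic-programming matcher of [4] additionally enforces ordering along the scanline and handles occlusions.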

Figure 4: Left: Lo, luminance of overexposed image (detail). Middle: L̃u. Right: if we use histogram matching to modify Lu directly, without denoising it first, the noise is enhanced visibly (see text).
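The cost-weighted combination of equation (2) in section 2.1 reduces to a few array operations. The sketch below assumes the two matched images and their cost images are already given as NumPy arrays of the same shape.

```python
import numpy as np

def combine_matches(IH, IV, CH, CV, rho=1.0):
    """Equation (2): blend the horizontal and vertical matching results,
    giving more weight to whichever match has the lower cost."""
    diff = np.abs(CH - CV)
    # the lower-cost direction gets e = 1.0, the other exp(-|CH - CV| / rho)
    eH = np.where(CH <= CV, 1.0, np.exp(-diff / rho))
    eV = np.where(CH <= CV, np.exp(-diff / rho), 1.0)
    wH = eH / (eH + eV)
    wV = eV / (eH + eV)
    return wH * IH + wV * IV
```

When the two costs are equal, the matches are averaged with weight 1/2 each; as the cost gap grows, the weight of the worse match decays exponentially at a rate set by ρ.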

Note that the noise in L̃u is not significantly magnified, while the details are still preserved (see figure 5).

2.3 Transfer the luminance to the final image

The final output of the algorithm, O, is generated by replacing the luminance of Ĩu^HV from equation (2) with L̃u from equation (3), but with a slight twist. For each pixel (i,j), we set

O(i,j) = Ĩu^HV(i,j) · (L̃u(i,j) − (g ∗ L̃u)(i,j) + (g ∗ L^HV)(i,j)) / L^HV(i,j),   (4)

where L^HV is the luminance of Ĩu^HV and g is a fixed Gaussian kernel of effective radius σg.

Note that equation (4) consists of the subtraction and addition of local averages of the luminance. If both averages are equal, then this is no different from the earlier computation in equation (1). However, if the averages do differ, this ensures that the output O will have a luminance whose local average equals that of the luminance of the combined matched image Ĩu^HV. This will happen even if the luminance L̃u that we would like to impose is different from the luminance of Ĩu^HV.

This slight modification is in fact crucial for getting plausible results. Otherwise, the luminance of O is simply the imposed L̃u, and if this is very different from L^HV (which often happens, since L̃u was obtained globally with histogram equalization) then the colors in O will look very different from the colors in Io, which is precisely the opposite of what we want.

2.4 Parameters

There are three parameters that affect the quality of the final output, and they are related to the amount of noise and motion present in the image.
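The luminance transfer of equation (4) in section 2.3 can be sketched as follows. The separable Gaussian filter is a generic implementation, and the small eps guarding against division by zero is an added assumption not discussed in the text.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian convolution, i.e. the local average g * img."""
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    blur1d = lambda a: np.convolve(np.pad(a, r, mode='edge'), k, mode='valid')
    tmp = np.apply_along_axis(blur1d, 1, img)   # filter rows
    return np.apply_along_axis(blur1d, 0, tmp)  # then columns

def transfer_luminance(IHV, LHV, Ltilde, sigma_g=5.0, eps=1e-6):
    """Equation (4): impose L~u on Iu^HV while keeping the local average
    luminance of the combined matched image."""
    adj = Ltilde - gaussian_blur(Ltilde, sigma_g) + gaussian_blur(LHV, sigma_g)
    ratio = adj / (LHV + eps)
    # broadcast over color channels when IHV is (H, W, 3)
    return IHV * (ratio[..., None] if IHV.ndim == 3 else ratio)
```

If L̃u already equals L^HV, the two blurred terms cancel and the output reduces to Ĩu^HV, which is the sanity check implied by the text.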

The stereo matching procedure described in section 2.1 has two key parameters. The first is the number of matches that are averaged, D = 2 ∗ d + 1. If, for instance, there is significant vertical motion (either global or just local) between Iu and Io , 28

Figure 5: (a) Lu, luminance image of fig. 1b. (b) Lu^NL, Non-local means [3] denoising of Lu. (c) Grain image G^NL = Lu − Lu^NL (scaled for visualization purposes). (d) Lo, luminance image of fig. 1a. (e) Lu^NL,h, obtained by matching the histogram of Lu^NL to that of Lo. (f) L̃u = Lu^NL,h + G^NL, the luminance that will be used in the final output of our algorithm.

the value of D should be large when performing matching of horizontal lines. Otherwise the matching will be poor, since for many pixels the dynamic programming procedure will not find acceptable matches. Of course the same can be said for horizontal motion and matching of vertical lines. In our set-up we use the same value D for horizontal and vertical matching, so larger values of D are needed for any significant motion. But increasing D implies also a significant increase in the computational cost, so a compromise must be reached. The second parameter is σ 2 , which in the original context of stereo matching by Cox et al. [4] was the estimated variance of the

additive Gaussian noise present in the stereo pair. But in our setting σ² is related to both noise and motion, because it sets the threshold for the maximum acceptable difference that the neighborhoods of two pixels can have and still be matched¹. Therefore both strong noise and significant motion blur require a larger value of σ². But increasing σ² implies a greater tolerance regarding the similarity of the matched pixels, and therefore the overall matching results might be poor.

Finally, the third parameter of our algorithm is the effective radius h of the neighborhood used for averaging in the Non-local means denoising method [3]: noisier images require a larger h, but too large a value of h may cause over-smoothing, a problem that is not solved by adding back the grain image G^NL.

¹In [4], differences above this threshold implied an occlusion, i.e. the pixel is visible in one image of the pair but not in the other.

3 Numerical Results

We have run our algorithm on several pairs of exposure bracketing images taken with a consumer camera. Using non-optimized code on a 3GHz, 1GB PC, the computational cost for each 770 × 430 image is 170 seconds for the Non-local means denoising, 30 seconds (D = 3) to 180 seconds (D = 23) for the stereo matching, and negligible for the other steps (histogram matching, luminance transfer, etc.). The results can be found in figures 6-10.

Note that these image sets have some challenging characteristics that our algorithm is able to handle. For example, the images in figure 7 feature people whose mouths are open in one image and closed in the other. Moreover, although the subjects in figure 7 move very little, this small motion causes noticeable motion blur due to the exposure time required to properly capture the colors. The physical motion in figure 10 is a bit more problematic, but the proposed algorithm still produces reasonable results. Again, these are very common scenarios, and methods that require static scenes and still subjects are not practical.

The values for the parameters h and D were fairly consistent, but the images benefitted from a more adaptive σ². A Non-local means parameter of h = 1 worked well for the images in figures 6 and 7 and the left image in figure 9. The image on the right in figure 9 worked better with h = 2, while the noisier images in figures 8 and 10 did better with the higher value h = 4. As for the stereo matching parameters, D = 23 was used for all of the images. Values for the parameter σ² ranged from 0.3 to 2, and thus were a bit more image dependent. The result in figure 6 was generated using σ² = 0.5; figure 7 used values of (from left to right) σ² = 2, 1; figure 8 used σ² = 1, 1.5; figure 9 used σ² = 0.5, 0.75; and figure 10 used σ² = 0.3, 1.

However, too much motion and/or occlusion may cause problems that cannot be solved simply by increasing σ² or D. Figure 10 demonstrates two of these problems. The front of the guitar in the overexposed image of the first set is completely occluded. The overexposed image of the second set is quite blurred, and the main subject changes position much more than in the previous image sets. This causes the colors on the guitar to be inaccurate, as well as severe discoloration of the guitar player's shirt.

Figure 6: Top: overexposed image. Bottom: underexposed image. Middle: our final result, obtained by transferring the luminance image L̃u in fig. 5f to the combination image Ĩu^HV in fig. 2 (bottom).

Figure 7: Top: overexposed image. Bottom: underexposed image. Middle: our fusion result.

4 Conclusion

In this work we proposed a novel methodology for automatically combining a bracketed pair of images (one underexposed, one overexposed) in a way that yields an image retaining the optimal qualities of each one. In particular, the bright color of the overexposed image is superimposed onto the sharp details contained in the underexposed one. The algorithm can handle moving subjects as well as minor occlusions, which was not possible in previous works.

There are still improvements that can be made to the algorithm. A mechanism for automatically estimating the parameters would be highly desirable, especially given the variability of some of them. The Non-local means parameter h can be directly tied to the noise level, but the parameter σ² is more difficult to estimate. Furthermore, although combining the horizontal and vertical stereo matching reduces line artifacts, the approach is still a one-dimensional method trying to match two-dimensional objects. A two-dimensional matching algorithm would be very beneficial in this setting, although far from trivial. We also note that large occlusions pose a problem, as demonstrated in figure 10.

This work has an interesting potential extension to video processing. In particular, if the first frame F0 of a video is obtained with a long exposure time and the remaining frames Fi, i = 1, ..., T, with short exposure, we can apply the proposed method in the following way: first apply the algorithm described in section 2 to the pair (F0, F1), obtaining F̃1, then successively to each pair (F̃n−1, Fn), obtaining F̃n. In theory, the color from F0 should be transferred to all remaining frames while they still retain a good level of detail. But occlusion can become a problem as soon as the frames are too far apart, and as the geometric configuration of the scene changes over time, so will the global illumination. This will be the subject of further work.

Finally, we are currently investigating the application of the method proposed in this paper to the formation of High Dynamic Range (HDR) images from a sequence of different-exposure photographs [5], a setting where camera and subject motion are also very relevant and hard to deal with.

Acknowledgements

Thanks to Lisandro Cilento and Luis Sánchez for their help. The first author acknowledges partial support by PNPGC project, reference MTM2006-14836. The second author is funded in part by NSF-DMS #0505729.

References

[1] C. Ballester, M. Bertalmío, V. Caselles, L. Garrido, A. Marques, and F. Ranchin. An Inpainting-Based Deinterlacing Method. IEEE Transactions on Image Processing, 16(10):2476-2491, 2007.


Figure 8: Top: overexposed image. Bottom: underexposed image. Middle: our fusion result.

[2] Marcelo Bertalmío, Vicent Caselles, and Álvaro Pardo. Movie denoising by average of warped lines. IEEE Transactions on Image Processing, 16(9):2333-2347, 2007.

[7] R. Fattal, M. Agrawala, and S. Rusinkiewicz. Multiscale shape and detail enhancement from multi-light image collections. ACM Transactions on Graphics, 26(3):51, 2007.

[3] A. Buades, B. Coll, and J. M. Morel. A review of image denoising algorithms, with a new one. Multiscale Model. Simul., 4(2):490–530, 2005.

[8] S. Ferradans, M. Bertalmío, and V. Caselles. Geometry-Based Demosaicking. IEEE Transactions on Image Processing, 18(3):665-670, 2009.

[4] I.J. Cox, S.L. Hingorani, S.B. Rao, and B.M. Maggs. A maximum likelihood stereo algorithm. Computer vision and image understanding, 63(3):542–567, 1996.

[9] G. Haro, M. Bertalmío, and V. Caselles. Visual acuity in day for night. International Journal of Computer Vision, 69(1):109-117, 2006.

[5] P. Debevec and J. Malik. Recovering high dynamic range radiance maps from photographs. In Proc. of the 24th annual conf. on Computer graphics, pages 369–378, 1997.

[10] D.J. Heeger and J.R. Bergen. Pyramid-based texture analysis/synthesis. In Proceedings of the 22nd annual conference on Computer graphics and interactive techniques, pages 229–238. ACM New York, NY, USA, 1995.

[6] E. Eisemann and F. Durand. Flash photography enhancement via intrinsic relighting. ACM Transactions on Graphics (TOG), 23(3):673–678, 2004.

Figure 9: Top: overexposed image. Bottom: underexposed image. Middle: our fusion result.

[11] J. Jia, J. Sun, C.-K. Tang, and H. Shum. Bayesian correction of image intensity with spatial consideration. In Proceedings of ECCV 2004, number 3023 in LNCS, 2004.

[12] S. Lumet. Making movies. Alfred A. Knopf, 1995.

[13] R. Palma-Amestoy, E. Provenzi, M. Bertalmío, and V. Caselles. A perceptually inspired variational framework for color enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(3):458-474, 2009.

[14] G. Petschnigg, M. Agrawala, H. Hoppe, R. Szeliski, M. Cohen, and K. Toyama. Digital photography with flash and no-flash image pairs. ACM Transactions on Graphics, 23(3):664-672, 2004.

[15] Lu Yuan, Jian Sun, Long Quan, and Heung-Yeung Shum. Image deblurring with blurred/noisy image pairs. ACM Transactions on Graphics, 26(3):1-10, 2007.

[16] Lu Yuan, Jian Sun, Long Quan, and Heung-Yeung Shum. Progressive inter-scale and intra-scale non-blind image deconvolution. In SIGGRAPH '08: ACM SIGGRAPH 2008 papers, pages 1-10, New York, NY, USA, 2008. ACM.

Figure 10: When there is significant occlusion (left) or motion (right) from the overexposed to the underexposed image, the color matching presents problems (e.g. see the singer's guitar). Top: overexposed image. Bottom: underexposed image. Middle: our fusion result.
