Efficient Implementation of Neural Network Deinterlacing

Guiwon Seo, Hyunsoo Choi and Chulhee Lee
Dept. Electrical and Electronic Engineering, Yonsei University
134 Shinchon-dong Seodeamun-gu, Seoul 120-749, Korea Rep.

ABSTRACT

Interlaced scanning has been widely used in most broadcasting systems. However, it produces undesirable artifacts such as jagged patterns, flickering, and line twitters. Moreover, most recent TV monitors utilize flat panel display technologies such as LCD or PDP, and these monitors require progressive formats. Consequently, the conversion of interlaced video into progressive video is required in many applications, and a number of deinterlacing methods have been proposed. Recently, deinterlacing methods based on neural networks have been proposed with good results. On the other hand, with high resolution video contents such as HDTV, the amount of video data to be processed is very large. As a result, the processing time and hardware complexity become important issues. In this paper, we propose an efficient implementation of neural network deinterlacing using polynomial approximation of the sigmoid function. Experimental results show that these approximations provide equivalent performance with a considerable reduction of complexity. This implementation of neural network deinterlacing can be efficiently incorporated in HW implementation.

Keywords: neural networks, deinterlacing, HW implementation, polynomial approximation

1. INTRODUCTION

Most broadcasting systems employ interlaced scanning, which makes it possible to reduce bandwidth while doubling the frame rate [1]. However, interlaced scanning causes undesirable artifacts such as jagged patterns, flickering, and line twitters [9]. These artifacts may degrade video quality. Furthermore, most recent TV monitors utilize flat panel technologies such as LCD or PDP, which require progressive formats. Consequently, the conversion of interlaced video into progressive video is required in many applications. Due to its importance, a number of deinterlacing methods have been proposed [1-6].

These techniques can be roughly classified into two categories: intra-field methods and inter-field methods. Intra-field deinterlacing methods use only the pixel values of the current field. Although their performance might not be optimal, they have been widely used since their requirements for memory and computing power are manageable, which is very important when large amounts of video data must be processed in real time. Inter-field deinterlacing algorithms utilize the information from adjacent fields to fill in missing lines. Although they provide better performance than intra-field deinterlacing, they have high computational complexity. Moreover, in motion-compensated deinterlacing methods, inaccurate motion estimation may cause artifacts which degrade perceptual video quality, even though the overall PSNR is good.

Recently, several neural network deinterlacing methods have been proposed with promising results [3-6]. If a neural network deinterlacing method uses a number of fields as input, it can be classified as inter-field deinterlacing. However, since motion-compensated deinterlacing methods are computationally expensive, the neural network deinterlacing method emerges as a promising solution.

With the advancement of display and transmission technologies, high resolution video programs such as HDTV have become widely available. With such high resolution video contents, the amount of video data to be processed in real time is very large. Consequently, the processing time and HW complexity are important issues when deinterlacing is performed at display monitors. A problem with neural network deinterlacing methods is that they require sigmoid functions, which are expensive to implement. In this paper, we propose an efficient implementation of neural network deinterlacing methods using polynomial approximation of the sigmoid function. We tested a number of polynomial functions and analyzed their performance. Experimental results show that these approximations provide equivalent performance with a considerable reduction of complexity. This implementation of neural network deinterlacing can be efficiently incorporated in HW implementation.

Image Processing: Algorithms and Systems VII, edited by Jaakko T. Astola, Karen O. Egiazarian Nasser M. Nasrabadi, Syed A. Rizvi, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 7245, 724519 © 2009 SPIE-IS&T · CCC code: 0277-786X/09/$18 · doi: 10.1117/12.810571 SPIE-IS&T/ Vol. 7245 724519-1

2. NEURAL NETWORKS

Figure 1 shows a diagram of a multilayer neural network. Although there is one hidden layer in Figure 1, there can be multiple hidden layers. At each node, we compute the following summation:

$$net_j = \sum_{i=1}^{d} x_i w_{ji} + w_{j0} = \sum_{i=0}^{d} x_i w_{ji} = \mathbf{w}_j^t \mathbf{x} \tag{1}$$

$$y_j = f(net_j) \tag{2}$$

where $\mathbf{x}$ is an input vector (with $x_0 = 1$, so that $w_{j0}$ acts as the bias) and $y_j$ is the output of the node. As can be seen in equation (2), to compute $y_j$ we have to evaluate the activation function $f(x)$. Different types of activation functions can be used, but there are some constraints on the activation function. First, the activation function must be nonlinear. If the activation function is linear, the neural network would be a linear function. Second, the activation function should have a saturation property. In other words, it has maximum and minimum output values. Third, the activation function needs to be continuous and smooth. It is desirable that the activation function and its derivatives are continuous. The following sigmoid function, which satisfies all of these properties, is widely used as an activation function:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{3}$$

Fig. 1. Diagram of multilayer neural networks (input layer, hidden layer, output layer)
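As an illustration of equations (1) to (3), the computation at one node and at one layer can be sketched in Python. This is a minimal sketch, not the authors' implementation; the function names `node_output` and `layer_output` and the convention that `w[0]` holds the bias weight are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Sigmoid activation, equation (3)."""
    return 1.0 / (1.0 + math.exp(-x))


def node_output(x, w, activation=sigmoid):
    """Output y_j of one node.

    x: inputs x_1..x_d
    w: weights w_j0..w_jd, where w[0] is the bias weight (assumed layout)
    """
    net = w[0] + sum(xi * wi for xi, wi in zip(x, w[1:]))  # equation (1)
    return activation(net)                                 # equation (2)


def layer_output(x, weights, activation=sigmoid):
    """Outputs of all nodes in a layer; weights holds one weight list per node."""
    return [node_output(x, w, activation) for w in weights]


# Example: net = 0.5 - 1.0 * 1.0 + 0.25 * 2.0 = 0, so the output is f(0).
y = node_output([1.0, 2.0], [0.5, -1.0, 0.25])
```

Passing the activation as a parameter makes it possible to swap the sigmoid for a cheaper approximation without changing the rest of the network code.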

2.1 Back-propagation algorithm

The back-propagation algorithm is the most widely used method for training neural networks [7]. It is used to find optimal weight vectors for a given data set. In the back-propagation algorithm, we first compute the error between target values and output values as follows:

$$E = \frac{1}{2} \sum_k (t_k - o_k)^2 \tag{4}$$

where $t_k$ is a target value and $o_k$ is an output value. A weight vector is modified so that the error is reduced as follows:


$$\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}} \tag{5}$$

where η is a learning rate.
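The update rule of equation (5) can be illustrated for the simplest case, a single sigmoid output node with one training sample. The sketch below is an assumption-laden example rather than the training code used in the paper: the function name `train_step` and the bias-first weight layout are hypothetical, and the gradient uses the standard identity f'(net) = o(1 - o) for the sigmoid.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(w, x, t, eta):
    """One gradient-descent update for a single sigmoid output node.

    w[0] is the bias weight (assumed layout).
    Returns (new_weights, error_before_update).
    """
    xb = [1.0] + list(x)                        # prepend x_0 = 1 for the bias
    o = sigmoid(sum(wi * xi for wi, xi in zip(w, xb)))
    error = 0.5 * (t - o) ** 2                  # equation (4) with one output
    # dE/dw_i = -(t - o) * o * (1 - o) * x_i for a sigmoid node
    grad = [-(t - o) * o * (1.0 - o) * xi for xi in xb]
    # equation (5): delta_w_i = -eta * dE/dw_i
    return [wi - eta * gi for wi, gi in zip(w, grad)], error


w = [0.0, 0.0, 0.0]
for _ in range(50):
    w, err = train_step(w, [1.0, 2.0], t=1.0, eta=0.5)
```

In a multilayer network the same rule is applied to every weight, with the gradient for hidden-layer weights obtained by propagating the output error backwards through the layers.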

2.2 Sigmoid Approximation

In the previous section, we explained the activation function and the sigmoid function. However, the sigmoid function includes an exponential function, which is an expensive operation when an algorithm is implemented in hardware. In this paper, we aim to approximate the sigmoid function with a polynomial function. In particular, we approximate the sigmoid function with a quadratic function, which can be expressed as follows:

$$y = a_0 + a_1 x + a_2 x^2 \tag{6}$$

To obtain the optimal coefficients, we find the polynomial coefficients that maximize the correlation coefficient between the sigmoid function and the quadratic function. We uniformly sampled the interval (0-4) and generated 100,000 points ($x$ values). Using the generated data points, we constructed two vectors: $x$ and $x^2$. Since constant terms cannot affect the correlation, we can ignore the constant coefficient $a_0$ when maximizing the correlation. Then, we computed the output values of the quadratic function and constructed an output vector $y'$:

$$y' = w_1 x + w_2 x^2 = [w_1 \;\; w_2]\,[x \;\; x^2]^T = W^T D \tag{7}$$

We also computed the corresponding output values of the sigmoid function and constructed another vector, $y_{sgm}$. Finally, we maximize the correlation between $y_{sgm}$ and $y'$. Using the optimization method in [8], we can formulate the optimal weight vector which maximizes the correlation coefficient as follows:

$$\Sigma_D^{-1} \Sigma_Q W = \rho^2 W \tag{8}$$

where $Q = E(y_{sgm} D)$, $\Sigma_Q = QQ^T$ and $\Sigma_D = DD^T$. After finding the optimal weight vector, we apply linear fitting to minimize the MSE as follows:

$$y = a_0 + k y' = a_0 + k w_1 x + k w_2 x^2 \tag{9}$$

where $a_1 = k w_1$ and $a_2 = k w_2$. Using linear fitting, we can find $a_0$ and $k$.
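The two-step procedure above (find the direction W that maximizes the correlation, then fit a0 and k) can be sketched as follows. This is a hedged illustration, not the authors' code: because the matrix in equation (8) is built from a single vector Q, the sketch assumes the maximizing direction is proportional to the inverse covariance of D times the covariance of D with y_sgm, and computes that directly from centered sample covariances instead of solving the eigenproblem. The helper names (`cov`, `fit_quadratic`) are hypothetical, and the intercept a0 is fitted freely.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance of two equal-length sequences (divides by n)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def fit_quadratic(n=100_000, lo=0.0, hi=4.0):
    """Return (a0, a1, a2) of a quadratic approximating the sigmoid on [lo, hi]."""
    # Uniform samples of the interval and the two regressors x and x^2.
    xs = [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]
    x2 = [x * x for x in xs]
    ys = [sigmoid(x) for x in xs]

    # Covariance of D = [x, x^2] and its covariance with y_sgm.
    s11, s12, s22 = cov(xs, xs), cov(xs, x2), cov(x2, x2)
    q1, q2 = cov(xs, ys), cov(x2, ys)

    # Direction W proportional to inv(S) q, normalized to unit length;
    # the correlation does not depend on the scale of W.
    det = s11 * s22 - s12 * s12
    w1 = (s22 * q1 - s12 * q2) / det
    w2 = (s11 * q2 - s12 * q1) / det
    norm = math.hypot(w1, w2)
    w1, w2 = w1 / norm, w2 / norm

    # Linear fit y = a0 + k * y' that minimizes the MSE, equation (9).
    yp = [w1 * a + w2 * b for a, b in zip(xs, x2)]
    k = cov(yp, ys) / cov(yp, yp)
    a0 = mean(ys) - k * mean(yp)
    return a0, k * w1, k * w2


a0, a1, a2 = fit_quadratic(n=10_000)
```

Under these assumptions the result coincides with an ordinary least-squares quadratic fit, which is one way to sanity-check the coefficients.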

Since the quadratic function cannot provide the saturation property of the sigmoid function, we approximate the sigmoid function with a linear function when $x$ is larger than a threshold $x_{th}$ (Fig. 2). The coefficients of the linear function were selected so that the overall function is continuous and differentiable. The approximation function for $x \ge 0$ is given by

$$y = \begin{cases} a_0 + a_1 x + a_2 x^2 & \text{for } 0 \le x < x_{th} \\ b_0 + b_1 x & \text{for } x \ge x_{th} \end{cases} \tag{10}$$

Fig. 2. Threshold of quadratic functions and its substituting linear function
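The continuity and differentiability conditions determine the linear piece once the quadratic coefficients and the threshold are fixed: matching slopes gives b1 = a1 + 2 a2 x_th, and matching values then gives b0 = a0 - a2 x_th^2. The sketch below applies these two conditions for x >= 0. The coefficient values and the threshold are hypothetical numbers chosen only to exercise the code; they are not taken from the paper.

```python
def make_approximation(a0, a1, a2, x_th):
    """Return f(x) for x >= 0: quadratic below x_th, tangent line from x_th on."""
    b1 = a1 + 2.0 * a2 * x_th      # slope of the quadratic at x_th
    b0 = a0 - a2 * x_th * x_th     # makes both pieces meet at x_th

    def f(x):
        if x < 0.0:
            raise ValueError("this sketch covers x >= 0 only")
        if x < x_th:
            return a0 + a1 * x + a2 * x * x
        return b0 + b1 * x

    return f


# Hypothetical coefficients and threshold, for illustration only.
f = make_approximation(a0=0.5, a1=0.26, a2=-0.035, x_th=3.0)
```

If the threshold is placed at the vertex of the parabola, x_th = -a1 / (2 a2), the slope b1 becomes zero and the linear piece is a constant, which gives a flat, saturated tail. Whether the paper chooses the threshold this way is not stated in the text.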

For $x < 0$, [...] $a_1 > 0$, $a_2 < 0$ and $a_0 = 0.5$. The sigmoid and the proposed function are shown in Figure 3. As can be seen, the sigmoid and approximation functions are very similar and the error between the two functions is very small.

Fig. 3. Sigmoid function, proposed function and error between two functions (curves: Sigmoid, Approximation, 10 x Error; x from -6 to 6)


3. NEURAL NETWORK DEINTERLACING

Deinterlacing is the process of filling in the missing lines of interlaced video sequences. Due to its importance, a number of deinterlacing methods have been proposed by many researchers. In particular, there are several deinterlacing methods based on neural networks [3-6]. In neural network deinterlacing methods, the input neurons are obtained from field pixels and the outputs of the neural network are the desired pixel values. There are two categories of neural network deinterlacing methods: intra-field methods and inter-field methods. Intra-field methods use only the pixels of the current field, while inter-field methods may use pixels from several fields, including the current, previous and next fields.

In [3], Plaziac proposed an intra-field deinterlacing algorithm using neural networks. The inputs and outputs of the method are shown in Figure 4. As can be seen in Figure 4, it computes 3 output neurons using 30 input neurons. The method used a 3-layer neural network with 16 hidden neurons.

Fig. 4. Deinterlacing method using single field (legend: input of neural network, output of neural network)

Fig. 5. Deinterlacing method using previous and next field (panels: previous frame, current frame, next frame; legend: input of neural network, output of neural network)

In [5], Choi proposed an inter-field deinterlacing method. The inputs and outputs of the method are shown in Figure 5. The inputs are extracted from the previous, current and next fields. From each of the previous and next fields, 5 reference pixels are taken. From the current field, it takes 10 pixels. The network has three layers with 16 hidden neurons. The input and output neurons can be expressed using vectors as follows:

$$A = [a_1, a_2, \ldots, a_{20}]^T, \quad B = [b] \tag{12}$$

where A is the input vector and B is the output vector. It is reported that inter-field deinterlacing methods show better performance than intra-field deinterlacing methods [5]. Thus, in this paper, we selected the inter-field deinterlacing method [5] to verify the proposed activation function that is used instead of the sigmoid function.
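The 20-input, 16-hidden, 1-output structure can be sketched as follows. This is only an illustration under stated assumptions: the exact pixel positions are defined by Figure 5, which is not reproduced in the text, so the sketch assumes 5 horizontally adjacent pixels on the missing line in each of the previous and next fields and 5 pixels on each of the lines directly above and below in the current field. The normalization to [0, 1], the sigmoid on the output node, the random weights, and all function names are likewise hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def input_vector(prev, cur, nxt, r, c):
    """Build A = [a_1..a_20] for the missing pixel at row r, column c.

    prev, cur, nxt are frame-sized 2D lists; only the lines that exist in
    each field are read (row r in prev/nxt, rows r-1 and r+1 in cur).
    The pixel layout is an assumption, not taken from Figure 5.
    """
    cols = range(c - 2, c + 3)
    a = [prev[r][j] for j in cols]        # 5 pixels, previous field
    a += [cur[r - 1][j] for j in cols]    # 5 pixels, line above
    a += [cur[r + 1][j] for j in cols]    # 5 pixels, line below
    a += [nxt[r][j] for j in cols]        # 5 pixels, next field
    return [v / 255.0 for v in a]         # assumed 8-bit normalization


def forward(a, w_hidden, w_out, act=sigmoid):
    """Three-layer forward pass; each weight list starts with its bias."""
    hidden = [act(w[0] + sum(wi * ai for wi, ai in zip(w[1:], a)))
              for w in w_hidden]
    return act(w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], hidden)))


rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(21)] for _ in range(16)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(17)]

frame = [[128] * 7 for _ in range(5)]     # toy data: flat gray frame
A = input_vector(frame, frame, frame, r=2, c=3)
b = forward(A, w_hidden, w_out)           # B = [b], a value in (0, 1)
```

The `act` parameter is where a piecewise polynomial approximation could be substituted for the sigmoid; each output pixel then requires 17 activation evaluations (16 hidden nodes plus the output node).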

4. EXPERIMENTAL RESULTS

Experiments were performed to test the proposed function. In the experiments, we implemented the neural network deinterlacing method in [5]. Along with the proposed activation function, we tested the sigmoid function (3) and the hyperbolic tangent function, which are frequently used as activation functions. These functions satisfy the nonlinearity and saturation properties. They have been widely used in many applications and their performance has been satisfactory. We tested the neural network deinterlacing algorithm using nine QCIF video sequences. To evaluate the performance of the activation functions, we first converted the progressive video sequences to interlaced video sequences. After applying the neural network deinterlacing algorithm, we computed PSNRs against the original progressive video sequences. Table 1 shows the performance comparison.

Table 1. Average result PSNRs (dB)

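| Format | Video | Sigmoid | Hyperbolic Tangent | Proposed |
|--------|-------|---------|--------------------|----------|
| QCIF | Coastguard | 33.95 | 33.94 | 34.02 |
| QCIF | Container | 38.17 | 40.06 | 38.98 |
| QCIF | Foreman | 34.21 | 34.07 | 34.42 |
| QCIF | Hall & Monitor | 37.17 | 38.83 | 37.55 |
| QCIF | Mobile | 32.93 | 33.99 | 33.34 |
| QCIF | Mother & Daughter | 41.67 | 40.26 | 41.94 |
| QCIF | Silent | 38.27 | 36.95 | 38.53 |
| QCIF | Stefan | 25.52 | 25.03 | 25.48 |
| QCIF | Table | 32.51 | 32.25 | 32.65 |
| | Average | 34.93 | 35.04 | 35.21 |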

Table 1 is obtained by averaging 10 trials with different initial weights. We used the first 100 frames of four video sequences (Coastguard, Foreman, Mobile and Hall & Monitor). The experimental results show that the proposed function provides performance comparable to the other activation functions.
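The evaluation procedure described in this section (discard alternate lines, deinterlace, compare with the original progressive frame) relies on two simple operations that can be sketched as follows. The sketch assumes 8-bit pixel values (peak 255) and the usual PSNR definition; the text does not specify how field parity was alternated, so the `parity` argument and both function names are assumptions.

```python
import math


def keep_field(frame, parity):
    """Keep only the lines of one field (parity 0: even rows, 1: odd rows)."""
    return [row for i, row in enumerate(frame) if i % 2 == parity]


def psnr(reference, test, peak=255.0):
    """PSNR in dB between two equally sized frames given as 2D lists."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for a, b in zip(ref_row, test_row):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf                   # identical frames
    return 10.0 * math.log10(peak * peak / mse)


frame = [[10, 20], [30, 40], [50, 60], [70, 80]]
top_field = keep_field(frame, 0)          # lines 0 and 2 of the frame
```

A sequence-level figure such as those in Table 1 would then be an average of per-frame values over the sequence and over the training trials.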


Table 2 shows the average processing time of the three activation functions (Core™2 Quad CPU, 2.4 GHz). The processing time was measured for the test (deinterlacing) phase only; the training time was excluded. The proposed function is the fastest of the three activation functions: it takes 73% of the processing time of the sigmoid function and 53% of that of the hyperbolic tangent function. The processing time of the proposed activation function can be reduced further when it is implemented in hardware.

Table 2. Average computation time of deinterlacing test

| | Sigmoid | Hyperbolic Tangent | Proposed |
|---|---------|--------------------|----------|
| Computation time (sec) | 11.58 | 15.83 | 8.41 |
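As a check on the quoted percentages, the ratios follow directly from the times in Table 2:

$$\frac{8.41}{11.58} \approx 0.726 \approx 73\%, \qquad \frac{8.41}{15.83} \approx 0.531 \approx 53\%$$

In other words, on this software implementation the proposed function reduces the processing time by roughly 27% relative to the sigmoid and by roughly 47% relative to the hyperbolic tangent.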

Figure 6 shows a frame-by-frame PSNR comparison of the sigmoid and proposed activation functions for the Coastguard video sequence. It shows that the proposed activation function provides almost identical performance to the sigmoid function.

Fig. 6. Frame by frame comparison for coastguard sequence (100 frames) (x-axis: frame number, 0 to 100; y-axis: PSNR (dB); curves: Sigmoid, Proposed)

5. CONCLUSIONS

In this paper, we proposed a new activation function for neural networks. Although existing activation functions ensure good neural network performance, evaluating them is time-consuming because they are used at every iteration. The proposed function is built from quadratic polynomial functions and linear functions. It is defined piecewise and is continuous and smooth. We applied this function to neural network deinterlacing. Experimental results show that the proposed function runs in a shorter time with similar quality. Replacing the activation function with the proposed polynomial function appears to be very useful in hardware implementations.


ACKNOWLEDGEMENT This research was supported by the MKE (Ministry of Knowledge Economy) under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA2008-(C1090-0801-0011)), Korea

REFERENCES

[1] Renxiang Li, Bing Zeng and Ming L. Liou, “Reliable motion detection/compensation for interlaced sequences and its applications to deinterlacing,” IEEE Trans. Circuits and Systems for Video Technology, 10(1), 23-29 (2000).
[2] Taeuk Jeong, Younghie Kim, Kwanghoon Sohn and Chulhee Lee, “Deinterlacing with selective motion compensation,” Optical Engineering, 45(7), 077001 (2006).
[3] Nathalie Plaziac, “Image interpolation using neural networks,” IEEE Trans. Image Processing, 8(11), 1647-1651 (1999).
[4] Xianglin Wang and Yeong Taeg Kim, “An edge direction based neural network interpolator for video deinterlacing,” IEEE Int. Conf. Neural Networks & Signal Processing, 1225-1228 (2003).
[5] Hyunsoo Choi, Eunjae Lee and Chulhee Lee, “Neural Network Deinterlacing Using Multiple Fields,” LNCIS, 345, 970-975 (2006).
[6] Hyunsoo Choi and Chulhee Lee, “Neural Network Deinterlacing Using Multiple Fields and Field-MESs,” IJCNN 2007, International Joint Conference on, 869-872 (2007).
[7] Richard O. Duda, Peter E. Hart and David G. Stork, [Pattern Classification Second Edition], Wiley Interscience, pp. 282-335 (2000).
[8] Document 6Q/42: A new method for objective measure of video quality using wavelet transform, ITU-R/SG 6/WP 6Q, Republic of Korea (Sep. 13, 2001).
[9] Ohjae Kwon, Kwanghoon Sohn and Chulhee Lee, “Deinterlacing using directional interpolation and motion compensation,” IEEE Trans. Consumer Electronics, 49(1), 198-203 (2003).
