Jul 7, 2000 - SUMMARY. In this paper, we propose a fast algorithm to realize parallel median filter for processing 1-D and 2-D signal. In the proposed ...
IEICE TRANS. FUNDAMENTALS, VOL.E83–A, NO.7 JULY 2000

1396

PAPER

A Parallel Median Filter with Pipelined Scheduling for Real-Time 1D and 2D Signal Processing∗

Shih-Chang HSIA† and Wei-Chih HSU†, Nonmembers

SUMMARY In this paper, we propose a fast algorithm that realizes a parallel median filter for 1-D and 2-D signals. In the proposed pipelined architecture, m passes are used to filter a signal whose word resolution is m bits. Each pass uses one processing element (PE), and the number of PEs is independent of the number of samples, so only m PEs are needed for real-time operation. With 8-bit resolution, the system gate count is less than 5 k. Moreover, the architecture can easily be made programmable, so that a more suitable number of samples can be chosen for filtering. The proposed processing flow is also progressive, which makes it well suited to bandwidth-limited channel applications.

key words: median filter, processing element, programmable, progressive

1. Introduction

To realize a median filter [1]–[11], word-level and bit-level processing are the main research approaches. These approaches express the hardware complexity in terms of the number of samples n and the word resolution m. When n and m become large, the hardware becomes more complex and the searching time increases considerably. To find the median of the sampled data, a sorting procedure may be applied in a practical implementation. If the number of samples is n, the bubble sort algorithm [14] requires up to n(n − 1)/2 comparisons. Since this computational complexity is high, real-time operation is difficult, especially for time-critical tasks [15].

Without sorting, deleting/inserting schemes are used to reduce the complexity. Assuming the samples have already been ordered, n comparisons are required to find the desired position. When a new sample arrives, the oldest sample is deleted from the data sequence and the new sample is inserted into it, so only 2n comparisons are needed. In 2-D signal processing, the window is treated as the basic processing unit. If the window size is w × w, moving the window requires deleting the w oldest samples and inserting the w newest ones. However, the conventional insert/delete approaches [12], [13] deal only with the case of one input sample.

In this paper, we present a novel median filter algorithm and VLSI architecture that need neither sorting nor deleting/inserting procedures. First, we define some parameters for the algorithm. The median value can then be found with a two-step procedure. For real-time operation, a parallel median filter is developed from this algorithm. Using a pipelined and parallel architecture, the structure can filter 1-D and 2-D signals in one clock time. Only m PEs are needed to realize the parallel median filter for real-time operation when the word resolution is m bits.

The rest of the paper is organized as follows. The fast median algorithm is described in Sect. 2, and the proposed parallel structure with pipelined scheduling is illustrated in Sect. 3. The processing element is designed in Sect. 4. Sect. 5 gives the complexity evaluation and the comparison with other approaches. Finally, conclusions are drawn in Sect. 6.

Manuscript received October 12, 1999.
Manuscript revised January 31, 2000.
† The authors are with the Department of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology, Kaohsiung 824, Taiwan, R.O.C.
∗ This work was supported by the National Science Council, Republic of China, under Grant NSC89-2213-E327-010.

2. Fast Median Algorithm

Let $X_n$ be the set of sampled data $s_1, s_2, \ldots, s_n$, given by

$$X_n = \{s_1, s_2, \ldots, s_n\}. \tag{1}$$

With the bit-level approach, each sample can be represented as

$$s_n = \{b_m, b_{m-1}, \cdots, b_1\}, \tag{2}$$

where $b_m, b_{m-1}, \ldots, b_1$ are binary bits, $b_m$ is the MSB (Most Significant Bit), $b_1$ is the LSB (Least Significant Bit), and each sample has a resolution of m bits. From (1) and (2), we obtain the matrix

$$X_{nm} = \begin{bmatrix} b_{1m} & b_{1,m-1} & \cdots & b_{12} & b_{11} \\ b_{2m} & b_{2,m-1} & \cdots & b_{22} & b_{21} \\ b_{3m} & b_{3,m-1} & \cdots & b_{32} & b_{31} \\ \vdots & & & & \vdots \\ b_{nm} & b_{n,m-1} & \cdots & b_{n2} & b_{n1} \end{bmatrix}_{n \times m}, \tag{3}$$

where $b_{ij}$ denotes the jth bit of the ith word. To process the matrix $X_{nm}$, we employ column-based computation. Let

$$C_j = [b_{1j}\ b_{2j}\ \cdots\ b_{nj}]^T, \tag{4}$$

where $C_j$ is the jth column of the matrix $X_{nm}$. The processing flow is split into m passes, with one pass handling one column. In each pass, the words are classified into a high-level group and a low-level group. The high-level group is constructed according to the rule

$$G(H)_j = \{\cup(C_i),\ \text{if } b_{ij} = 1,\ i = 1 \text{ to } n\} \cap G(X)_{j-1}, \quad \text{for } j = 1 \text{ to } m, \tag{5}$$

where $G(H)_j$ denotes the set of words whose bit $b_{ij}$ is high in the jth pass, and $G(X)_{j-1}$ is the group selected in the (j − 1)th pass (X is H or L). The number of high-level words is recorded as

$$N(H)_j = |G(H)_j| \tag{6}$$

for the jth pass. In the same way, the low-level group is classified by

$$G(L)_j = \{\cup(C_i),\ \text{if } b_{ij} = 0,\ i = 1 \text{ to } n\} \cap G(X)_{j-1}, \tag{7}$$

and

$$N(L)_j = |G(L)_j|. \tag{8}$$

Table 1 Searching median data among {36, 76, 146, 36, 71, 152, 24, 66, 54}.

Table 2 The relative parameters from Table 1.
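The grouping of Eqs. (5)–(8) for a single pass can be sketched in a few lines of Python. This is only an illustration, not the authors' circuit: the function name `split_by_bit` and the use of 1-based sample numbers are assumptions made here, and the input column is the $C_1 = [1011011]^T$ used as the example in the text that follows.

```python
def split_by_bit(column, previous_group):
    """Split the previously selected group by the bit each word has in this column.

    column         -- list of bits b_1j .. b_nj (one per word), as in Eq. (4)
    previous_group -- set of 1-based word numbers, i.e. G(X)_{j-1}
    Returns (G(H)_j, N(H)_j, G(L)_j, N(L)_j) following Eqs. (5)-(8).
    """
    g_high = {i for i in previous_group if column[i - 1] == 1}
    g_low = {i for i in previous_group if column[i - 1] == 0}
    return g_high, len(g_high), g_low, len(g_low)


# First pass: every word is still a candidate.
column = [1, 0, 1, 1, 0, 1, 1]
g_high, n_high, g_low, n_low = split_by_bit(column, set(range(1, 8)))
print(sorted(g_high), n_high)  # [1, 3, 4, 6, 7] 5
print(sorted(g_low), n_low)    # [2, 5] 2
```

Intersecting with the previous group is what lets later passes ignore words that have already been rejected.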

For example, let $C_1 = [1011011]^T$. Then $G(H)_1 = \{no1, no3, no4, no6, no7\}$ with $N(H)_1 = 5$, and $G(L)_1 = \{no2, no5\}$ with $N(L)_1 = 2$, for the high-level and low-level groups, respectively. Based on this grouping, the search proceeds in two steps.

Step-1: In the first pass, the larger of the groups $G(H)_1$ and $G(L)_1$ is selected by comparing the member counts of Eqs. (6) and (8). If $N(X)_1 \ge Med$, the median must lie in this group, so $G(X)_1$ is selected, where

$$Med = (n + 1)/2 \tag{9}$$

and n is the number of samples. The search continues with the next pass (j = j + 1) until $N(X)_j < Med$, at which point it enters the second step. To search for the median, we accumulate the sizes of the unselected groups over the passes, which can be expressed as

$$S(H)_j = \sum_{k=1}^{j} N(H)_k, \quad G(H)_k \subset \text{unselected group} \tag{10}$$

and

$$S(L)_j = \sum_{k=1}^{j} N(L)_k, \quad G(L)_k \subset \text{unselected group}. \tag{11}$$

Step-2: If $S(X)_{j-1} + N(X)_j \ge Med$, then $G(X)_j$ is selected, and the median must be in $G(X)_j$. The search continues with the next pass until $S(X)_{j-1} + N(X)_j = Med$ and $N(X)_j = 1$, at which point the single member of $G(X)_j$ is the median.

When more than one sample has the median value, the condition $S(X)_{j-1} + N(X)_j = Med$ and $N(X)_j = 1$ never occurs. There are two ways to handle this. In the first, we check the members of the selected $G(X)_j$; if they all have the same value, the median has been found and the search can stop at the jth pass. In the second, processing runs to the last pass, where all members of the selected $G(X)_j$ have the same value, and any one of them can be output as the median.

We now give an example of the procedure. Table 1 shows the passes used to search for the median of the samples {36, 76, 146, 36, 71, 152, 24, 66, 54}, and Table 2 traces the parameters of Table 1 under the algorithm above. In the first pass, $N(L)_1 \ge Med\ (= 5)$, so $G(L)_1$ is selected. $S(H)_1$ becomes 2, because the unselected group $G(H)_1$, with members no3 and no6, cannot contain the median. In the second pass, $S(H)_1 + N(H)_2 \ge Med$, so $G(H)_2$ is selected. $S(L)_2$ becomes 4 because no1, no4, no7 and no9 are rejected. In pass-3 and pass-4, $G(L)_3$ and $G(L)_4$ are selected; S(H) and S(L) stay unchanged because bits b6 and b5 are all zero in the selected samples no2, no5 and no8. In pass-5, since $S(L)_4 + N(L)_5 \ge Med$, $G(L)_5$ is selected; no2 is rejected and $S(H)_5$ becomes 3. In pass-6, $S(L)_5 + N(L)_6 = Med$ and $N(L)_6 = 1$, so no8 must be the median. Sorting the sequence gives {24, 36, 36, 54, 66, 71, 76, 146, 152}, whose middle element is 66, the value of sample no8, so the algorithm has found the median.

This fast algorithm searches for the median directly, without a sorting procedure or inserting/deleting operations. The searching time is short: it takes at most m passes and is independent of the number of samples, where m is the word resolution. Therefore, when the number of samples is very large, the algorithm can achieve higher performance than other approaches.

To evaluate the complexity, note that step-1 and step-2 each use only one comparison. The comparator width depends on the value of Med; for example, if Med = 5, a 3-bit comparator is enough. The other overhead is a few adders that record the parameters S(H), N(H), S(L) and N(L). The maximum adder resolution is set by the number of samples n, so we need $2^p \ge n$ for N(H) and N(L), where p is the number of adder bits. Hence the minimum p must satisfy $p_{min} \ge \log_2 n$, and p − 1 bits suffice for S(H) and S(L), since these sums count only rejected samples and therefore stay below Med. The adder complexity thus grows only logarithmically with n, and the overhead is not large.
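The two-step search of this section can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `bit_serial_median`, the trace format, and the use of the first stopping method (stop as soon as all selected members are equal) are choices made here, and an odd number of samples is assumed so that Eq. (9) gives an integer.

```python
def bit_serial_median(samples, m=8):
    """Search the median of m-bit samples pass by pass, MSB first.

    Returns (median, trace); each trace entry is
    (pass number, selected group "H" or "L", S(H), S(L)).
    """
    n = len(samples)
    med = (n + 1) // 2            # Eq. (9), n assumed odd
    selected = list(range(n))     # indices of the words in G(X)_{j-1}
    s_high = 0                    # S(H): rejected high-level members so far
    s_low = 0                     # S(L): rejected low-level members so far
    trace = []
    for bit in range(m - 1, -1, -1):
        g_high = [i for i in selected if (samples[i] >> bit) & 1 == 1]
        g_low = [i for i in selected if (samples[i] >> bit) & 1 == 0]
        # Exactly one of the two groups satisfies S(X) + N(X) >= Med.
        if s_high + len(g_high) >= med:
            selected, choice = g_high, "H"
            s_low += len(g_low)
        else:
            selected, choice = g_low, "L"
            s_high += len(g_high)
        trace.append((m - bit, choice, s_high, s_low))
        # First stopping method: all remaining members share one value.
        if len({samples[i] for i in selected}) == 1:
            break
    return samples[selected[0]], trace


data = [36, 76, 146, 36, 71, 152, 24, 66, 54]
median, trace = bit_serial_median(data)
print(median)        # 66
print(len(trace))    # 6 passes, as in the worked example
```

For the nine samples of Table 1 the trace selects L, H, L, L, L, L, which is the sequence of groups described in the worked example above.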

3. Parallel Architecture with Pipelined Scheduling

Although the presented algorithm has a short searching time and reasonable complexity, how to apply it in a real-time system still needs to be developed. Based on the algorithm above, this section explores a real-time architecture for filtering 1-D and 2-D signals.

3.1 Structure for 1-D Median

In the 1-D structure, the input samples must be stored in registers because of the parallel bit-level approach, and the architecture is pipelined to increase the operating speed, as shown in Fig. 1(a). One sample enters the shift register in each clock. If there are n samples, the register data is ready to be read for processing after n clocks. In pass-1, we take the MSBs from registers R1 to Rn and calculate the parameters according to Eqs. (5)–(8). The detailed design of the processing element for each pass is discussed in the next section. In the next clock, pass-2 accepts the parameters of pass-1 and takes the second MSBs from registers R2 to Rn+1. At the same time, pass-1 processes the next input sample. In the same way, the parameters of the jth pass come from the (j − 1)th pass, and the final result is obtained at the last pass. The register bits are tied directly to each pass; for example, the MSBs of R1 to Rn are connected to pass-1, the second MSBs of R2 to Rn+1 are connected to pass-2, and so on.

Table 3 shows the detailed processing schedule of the 1-D median filter. We assume a sample resolution of 8 bits, so 8 passes are required, and S1 to Sn denote the input samples. In clocks 1 to 8, the samples are stored in the shift register and nothing can be processed, since there are not yet enough samples. At the 9th clock, pass-1 can deal with S1 to S9 while the other passes are still idle. In the next clock, the results of pass-1 are sent to pass-2 together with S1 to S9, and pass-1 moves on to S2 to S10. The pipelined flow continues until the last pass, pass-8. Pass-8 handles S1 to S9 in the 16th clock, and we obtain the final median. In the same way, the median of S2 to S10 is found at the 17th clock. Therefore, the latency of this pipelined structure is 16 clocks.

3.2 Structure for 2-D Median

Windows are the basic processing units of a 2-D signal such as an image. If the window size is w × w, w samples must enter and leave the processing sequence at each clock for real-time operation, so a parallel structure is needed to handle 2-D signals. For a 3 × 3 window, the parallel 2-D structure is shown in Fig. 1(b). Three samples are loaded into registers R1 to R3 in parallel at the first clock. At the next clock, the data shift to the next register group, R4–R6. Once the samples S1–S9 have been stored in R1–R9 over three clocks, the data can be sent to pass-1 to find the first median bit. As in the 1-D case, pass-2 can process S1–S9 at the 4th clock. The final median is obtained from pass-8 at the 12th clock.

Table 4 shows the operation schedule for 2-D signal filtering. At the first clock, the samples S1–S3 are stored in R1–R3, respectively. They are shifted to R4–R6 at the next clock, while the registers R1–R3 are updated with the new samples S4–S6 in the same clock.
At the third clock, the registers R1–R9 are filled with S1–S9, respectively, so pass-1 can process S1–S9. At the next clock, S1–S9 enter pass-2. At the same time, the newer samples S10–S12 enter pass-1 and S1–S3 are removed from it, just like a moving window. Therefore pass-1 handles samples S4–S12 in the 4th clock. Continuing this procedure, the samples S1–S9 enter pass-8 at the 12th clock, and we obtain the median of S1–S9. At the next clock, we also get the median of S4–S12 from this pipelined processing flow. With this operation schedule the latency is 12 clocks, which is shorter than in the 1-D case.

Often, 1-D and 2-D processors are designed for different applications, such as speech and image processing, respectively, and two projects are needed to implement an individual chip for each application. If the 1-D and 2-D processors can be merged into a single chip, the implementation cost can be reduced. From the description above, the operation schedules of the 1-D and 2-D structures are similar, so a hybrid system can be implemented with only some modification. The hybrid structure for 1-D and 2-D processing is drawn in Fig. 1(c), which is modified from Fig. 1(b). This architecture adds the overhead of multiplexers to select 1-D or 2-D processing. Moreover, a flow controller is required to manage the operation schedule of each pass.

Fig. 1 (a) The 1-D median filter processing with pipeline structure. (b) The 2-D median filter processing with 3 × 3 size. (c) The basic structure for hybrid 1-D and 2-D processor.

Table 3 The processing flow for 1-D median filter.

Table 4 The processing flow for 2-D median filter.
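The 1-D schedule of Sect. 3.1 can be reproduced with a short timing model. This is a sketch under assumptions made here: the function name `one_d_schedule` is hypothetical, one sample is taken to arrive per clock, and each pass is taken to lag the previous one by exactly one clock, as the prose describes. It models only the 1-D case; the 2-D timing of Table 4 is not modelled.

```python
def one_d_schedule(n=9, m=8, clocks=17):
    """Map each clock to the windows handled by each pass.

    Returns {clock: {pass_number: (first_sample, last_sample)}};
    a pass that is still idle at a clock has no entry.
    """
    schedule = {}
    for clock in range(1, clocks + 1):
        active = {}
        for p in range(1, m + 1):
            # Window k (samples S_k .. S_{k+n-1}) is complete at clock
            # k + n - 1 and reaches pass p after p - 1 further clocks.
            window = clock - (n - 1) - (p - 1)
            if window >= 1:
                active[p] = (window, window + n - 1)
        schedule[clock] = active
    return schedule


schedule = one_d_schedule()
print(schedule[8])       # {}  (still filling the shift register)
print(schedule[9])       # {1: (1, 9)}
print(schedule[10])      # {1: (2, 10), 2: (1, 9)}
print(schedule[16][8])   # (1, 9)   first median after 16 clocks
print(schedule[17][8])   # (2, 10)  one new median per clock afterwards
```

The model makes the two properties of the pipeline explicit: a fixed latency of n + m − 1 clocks for the first result, and one result per clock once the pipeline is full.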

4. Processing Element for Each Pass

In the proposed architecture, the processing element for each pass mainly consists of grouping control, adder operation and median decision. The processing flow is shown in Fig. 2. The grouping control selects the possible median bit and determines the members of the group in each pass. The adder operation sums the members whose bit is 1 in each pass, to calculate the parameters N(H) and N(L) of Eqs. (6) and (8). Finally, the median decision takes the result of the adder operation and calculates the parameters S(H) and S(L) with Eqs. (10) and (11), respectively. The selected group is then determined from these parameters, which gives the jth median bit in the jth pass. This median bit is fed back to the grouping control to select the usable members for the next pass. As filtering goes on, one median bit is obtained in each pass, and the complete median value is available when processing reaches the last pass.

Fig. 2 System processing flow.

Note that the passes run from MSB to LSB, and the MSB carries more weight than the LSB. Therefore an approximate median value is available during processing. Pass by pass, the median value becomes more accurate; this progressive feature is useful for band-limited transmission.

4.1 Grouping Control

In the median algorithm, we need to count the high-level and low-level bits in each pass to decide which group is selected. To reduce the hardware complexity, the algorithm can be simplified further. In a practical implementation, only the number of high-level bits has to be computed; the number of low-level bits can be neglected, since it can be derived from the number of high-level bits. With this concept, the structure of the grouping control unit is illustrated in Fig. 3. The mask function determines which bits are selected to find the next median bit. If the input of a mask module does not match, its output is reset to zero to denote the unselected status. In pass-1, we obtain the first median bit Med1, and Med1 controls the next bit in pass-2. The second-column bits b11, b21, ..., b81 are controlled by the mask1 module. If the inputs of the mask module are equal, a match is indicated and these bits pass through the AND gate for computing Med2; otherwise, the bit is masked to zero by the AND gate. If a bit is masked in the jth pass, it must also be masked in the (j + 1)th pass, so the masking bit is transmitted to the next pass: when the jth bit of the ith word is masked, the (j + 1)th bit of the ith word is masked as well. For pass-3, mask2 receives the result of mask1, Med2 and the third-column bits to determine its output state. When a bit of mask1 is zero, mask2–mask7 output zero too. The mask1–mask7 modules each contain a one-bit comparator; in addition, mask2–mask7 require an extra AND function to take in the previous mask information. With the pipeline scheduling, the jth masked bit is carried to the (j + 1)th pass; the final unmasked bit is a median bit, and its word is the median value.

Fig. 3 The grouping control unit.

4.2 Processing Element Design

In this median algorithm, we need to find the number of high-level bits in each pass. To achieve this, an adder module is employed, whose basic architecture is shown in Fig. 4. For real-time operation, the addition should finish within one clock. With 9 samples, all bits enter full adders in the first stage, which produce carry and sum results. In the second stage, the carry bits and the sum results are sent to separate full adders. Adding the carry bits generates the MSB bits, and adding the sum bits produces the LSB bits. In the last stage, the carry of the LSB part must be transmitted to the MSB part, so an increase-one operation is required to obtain the final result A3–A0 for N(H).

Fig. 4 Addition of high level number using full adder.

In this system, two kinds of processing element (PE) are designed, for step-1 and step-2. In pass-1, step-1 is used. The other passes could operate in step-1 or step-2 depending on the actual sample values. For regular control, the step-1 PE is used only for pass-1 and the step-2 PE is used for pass-2 through the final pass.

Figure 5(a) shows the PE for step-1 processing. Its internal modules are grouping control, N(H) calculation, and a checker of the current status that produces the median bit and the parameter S(H). If $N(H) \ge Med$, the group of high-level bits is selected and the median bit of the first pass is 1; otherwise the median bit is 0. At the same time, the median bit controls the accumulation of N(H) into S(H). Since S(H) records the rejected count of N(H), N(H) is accumulated into S(H) only when the median bit is low; if the median bit is high, the accumulated content S(H) is unchanged. The masked bits and the accumulated S(H) are then output to the next pass. The median bit is sent to the median value generator, which also outputs to the next pass.

Figure 5(b) shows the step-2 PE, which receives the information of the previous pass: the masked bits, the accumulated result S(H) and the median bit. The (j − 1)th masked bits are sent to the adder operation to calculate $N(H)_j + S(H)_{j-1}$. If $N(H)_j + S(H)_{j-1} \ge Med$, the median bit is set high in the jth pass; otherwise it is zero. The median bit controls the jth masked bits for the (j + 1)th pass, and also determines whether the accumulator adds $N(H)_j$ to $S(H)_j$. The median bit generator collects the median bit of every pass, and the median value is available in the last pass. In this pipelined architecture, m passes are employed for m bits per word. Each pass uses one processing element and the number of PEs is independent of the number of samples, so only m PEs are needed for real-time operation.

Fig. 5 (a) The step-1 processing element. (b) The step-2 processing element.

The median checker modules are shown in Figs. 6(a) and 6(b) for the step-1 and step-2 PEs, respectively. With 9 samples, the median is located at position 5. If the input value is greater than or equal to 5, the median bit is set high. The circuit in Fig. 6(a) uses only one AND gate and one OR gate, where A2–A0 are binary bits of N(H). For the step-2 PE, a 3-bit adder is required for the pre-sum of $N(H)_j$ and $S(H)_{j-1}$. The accumulator for S(H) is built in the conventional way from a 3-bit adder and a register. The median value generator is shown in Fig. 7. The median bit of pass-1 is loaded into the MSB of the median value, and that of the final pass determines the LSB. With pipeline scheduling, the jth pass needs j registers to store the temporary median bits in each PE, so that the median value is found in the final pass.
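The behaviour of one processing element, and of the chain of m PEs, can be modelled in software as below. This is a behavioural sketch only, not a description of the gate-level design: the names `pe_pass` and `median_by_pes` are hypothetical, the mask is modelled as a list of 0/1 flags per sample, and an odd number of samples is assumed. Only N(H) and S(H) are computed, as in Sect. 4.1, and the list of partial values illustrates the progressive property.

```python
def pe_pass(bits, mask, s_high, med):
    """Model one PE: return (median bit, new mask, new S(H)).

    bits -- the current bit of every sample (0/1)
    mask -- 1 for samples still selected, 0 for masked samples
    """
    n_high = sum(b & k for b, k in zip(bits, mask))        # adder operation: N(H)
    med_bit = 1 if n_high + s_high >= med else 0           # median decision
    if med_bit == 0:
        s_high += n_high                                    # rejected high-level group
    # Grouping control: keep a sample only if it was selected and its bit matches.
    new_mask = [k if b == med_bit else 0 for b, k in zip(bits, mask)]
    return med_bit, new_mask, s_high


def median_by_pes(samples, m=8):
    """Chain m PEs, MSB first; return (median, partial value after each pass)."""
    med = (len(samples) + 1) // 2
    mask = [1] * len(samples)
    s_high = 0
    value = 0
    partial = []
    for j in range(m):
        shift = m - 1 - j
        bits = [(s >> shift) & 1 for s in samples]
        med_bit, mask, s_high = pe_pass(bits, mask, s_high, med)
        value |= med_bit << shift          # median value generator
        partial.append(value)
    return value, partial


median, partial = median_by_pes([36, 76, 146, 36, 71, 152, 24, 66, 54])
print(median)    # 66
print(partial)   # [0, 64, 64, 64, 64, 64, 66, 66]
```

In this example the value after the second pass (64) is already close to the final median (66), which is the kind of coarse-to-fine refinement the progressive feature refers to.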

5. Hardware Evaluation and Comparison

To estimate complexity, the equivalent gate count is based on typical circuit components [16]. In the proposed architecture, the main modules are the pipelined registers and the processing elements; Table 5 shows the size of each module. With 8-bit resolution, the pipelined registers that store the samples require 15 and 30 stages, with 960 and 1920 gates, for the 1-D and 2-D structures, respectively. The number of processing elements (PEs) is 8 for 8-bit words, and each PE consists of grouping control, adder operation, checker, accumulator and median register. The grouping control requires 256 control gates and 512 registers to record the masked information for the next pass, since pipelined processing is used. The adder operation uses five full adders and one increase-one operator, which need 463 gates. For checking the median bit, only 2 gates are required in the first pass, while the other passes need an extra 3-bit adder, so the checker uses 247 gates to determine the median bits. A 3-bit accumulator is required to store S(H), which takes 456 gates. To collect the median information in each pass, 368 gates are used. In total, 1-D and 2-D processing require only 3262 and 4222 gates, respectively, in the proposed architecture.

Table 5 Complexity evaluation for the proposed median architecture.

Fig. 6 (a) The modular of median check for step-1 PE. (b) The modular of median check for step-2 PE.

Fig. 7 Median value generator for collecting median bit in each pass.

Now we compare the performance of the proposed architecture with other approaches. The results are shown in Table 6. Our design is realized at gate level with conventional standard cells. The input samples do not need to be pre-sorted in our method, whereas the other approaches need pre-sorting before processing because they employ deleting/inserting operations. In terms of complexity, only about 5 k gates are used in our architecture, which is fewer than in the other approaches. The number of PEs is decided by the word resolution and is independent of the number of samples. With 9 samples, the architecture uses on average 362 and 469 gates per sample for 1-D and 2-D, respectively. The critical path of our architecture is the accumulation of N(H), as shown in Fig. 5(b), and the processing speed can reach 70 MHz with 0.6 µm CMOS technology.

Table 6 Comparison of the proposed architecture and other approaches.

A programmable median filter is useful in practical applications. When the number of samples exceeds what is really required, distortion may occur after filtering. On the other hand, the filtering result is poor if the number of samples is too small. If the chip is programmable, the user can select the number of samples to obtain a better result. In our architecture, the programmable feature is achieved easily, simply by increasing the number of masked bits in the first pass to reduce the number of samples and by controlling the median value with an extra input.

For a bandwidth-limited communication channel, the progressive feature can provide temporary information that shows a rough result before the data is completely transferred. In the proposed architecture, the processing flow runs from MSB to LSB, so a coarse resolution is available once the MSB bits are finished, and the resolution becomes finer and finer as more passes are processed. For high-speed applications, the progressive output can be taken from an intermediate pass. In other words, processing can stop at the jth pass to accommodate the high-speed requirement.
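The gate totals quoted in this section can be checked with a few lines of arithmetic. The dictionary keys below are labels chosen here for illustration; the numbers are those given in the text, and the 512 registers of the grouping control are taken as a gate-equivalent figure, which is the reading under which the stated totals add up.

```python
# Gate-equivalent counts for the processing elements, as quoted in the text.
pe_modules = {
    "grouping control gates": 256,
    "grouping control registers": 512,   # assumed to be a gate-equivalent figure
    "adder operation": 463,
    "median checker": 247,
    "S(H) accumulator": 456,
    "median value generator": 368,
}
pipeline_registers = {"1-D": 960, "2-D": 1920}

pe_total = sum(pe_modules.values())
totals = {kind: gates + pe_total for kind, gates in pipeline_registers.items()}
per_sample = {kind: total // 9 for kind, total in totals.items()}  # 9-sample window

print(pe_total)     # 2302
print(totals)       # {'1-D': 3262, '2-D': 4222}
print(per_sample)   # {'1-D': 362, '2-D': 469}
```

Both totals stay below the 5 k gate figure quoted in the summary, and the register stages account for the whole difference between the 1-D and 2-D designs.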

6. Conclusions

In this paper, a fast algorithm is first presented with a bit-level approach. With this algorithm, 1-D and 2-D median structures are proposed using a parallel architecture combined with pipeline scheduling. With multiplexers, the 1-D and 2-D processors can be merged into one chip. The number of PEs depends on the word resolution and is independent of the number of samples; only 8 PEs are required in our system for a word length of 8. The hardware complexity is less than 5 k gates for 1-D and 2-D processing in real-time operation. Moreover, users can program the number of samples to obtain a better result, and the progressive feature is well suited to high-speed transmission applications.

References

[1] H. Rantanen, M. Karisson, P. Pohjala, and S. Kalli, “Color video signal processing with median filters,” IEEE Trans. Consumer Electronics, vol.38, no.3, pp.157–161, Nov. 1992.
[2] A.K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989.
[3] D.S. Richards, “VLSI median filters,” IEEE Trans. Acoust., Speech & Signal Process., vol.38, no.1, pp.145–153, Jan. 1990.
[4] T.A. Nodes and N.C. Gallagher, Jr., “Median filters: Some modification and their properties,” IEEE Trans. Acoust., Speech & Signal Process., vol.ASSP-30, no.5, Oct. 1982.
[5] S.B. Leeb, A. Ortis, and J.L. Kirtley, “Real-time median filter with a fast hardware sorter,” Sixth Annual Applied Power Electronics Conference and Exposition, pp.254–260, 1991.
[6] K. Chen, “Bit-serial realization of a class of non-linear filters based on positive Boolean functions,” IEEE Trans. Circuits & Syst., vol.36, no.6, pp.785–794, June 1989.
[7] E. Ataman, V.K. Aatre, and K.M. Wong, “A fast method for real-time median filtering,” IEEE Trans. Acoust., Speech & Signal Process., vol.ASSP-28, no.4, Oct. 1980.
[8] J. Siu, J. Li, and S. Luthi, “A real-time median based filter for video signals,” IEEE Trans. Consumer Electron., vol.39, no.2, pp.115–121, May 1993.
[9] P.R. Burfiled, “A VLSI implementation study of a 10 Mbit/s video decoder,” Signal Processing, Image Communication, pp.59–74, May 1993.
[10] R. Roncella, R. Saletti, and P. Terreni, “70 MHz 2 µm CMOS bit-level systolic array median filter,” IEEE J. Solid State Circuits, vol.28, no.5, pp.530–536, 1993.
[11] C. Chakrabarti, “High samples rate systolic architecture for median filter,” IEEE Trans. Signal Process., vol.42, no.3, pp.707–712, 1994.
[12] C.Y. Lee, P.W. Hsieh, and J.M. Tasi, “High speed median filter designs using shiftable content-addressable memory,” IEEE Trans. Circuits Syst. Video Technol., vol.4, no.6, pp.544–549, Dec. 1994.
[13] C.T. Chen, L.G. Chen, and J.H. Hsiao, “VLSI implementation of a selective median filter,” IEEE Trans. Consumer Electronics, vol.42, no.1, pp.33–42, Feb. 1996.
[14] E. Horowitz, S. Sahni, and S. Anderson-Freed, Fundamentals of Data Structures in C, Computer Science Press, New York, 1992.
[15] R. Hopkins, “Digital terrestrial HDTV for North America: The grand alliance HDTV system,” IEEE Trans. Consumer Electron., vol.40, no.3, pp.185–198, Aug. 1994.
[16] H.-M. Jong, L.-G. Chen, and T.-D. Chiueh, “Parallel architectures for 3-step hierarchical search block-matching algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol.4, no.4, pp.407–416, Aug. 1994.

Shih-Chang Hsia received the Ph.D. degree from the Department of Electrical Engineering, National Cheng Kung University, Tainan, Taiwan, ROC, in 1997. From 1986 to 1989, he was an engineer in the R&D department of Microtek International Inc., Hsin-Chu. He was an instructor and associate professor in the Department of Electronic Engineering, Chung Chou Institute of Technology, from 1991 to 1997. Since 1997, he has been an associate professor in the Department of Computer and Communication Engineering, National Kaohsiung First University of Science and Technology, Kaohsiung. His research interests include VLSI design for HDTV systems, video coding and processing, communication, and data hiding systems.

Wei-Chih Hsu was born in NanTo, Taiwan, on February 21, 1963. He received the B.E.E. degree from National Taiwan University in 1984 and the Ph.D. degree, also from National Taiwan University, in 1993. From 1994 to 1996, he served as an associate researcher at Chu-Hua Telecommunication Cooperation. Since 1997, he has been an Associate Professor in the Department of Computer and Communication Engineering at National Kaohsiung First University of Science and Technology. His main research interests are speech processing, computer networking, and digital signal processing.