
16th IEEE International Conference on Control Applications, Part of the IEEE Multi-conference on Systems and Control, Singapore, 1-3 October 2007

MoC03.6

Pearson's Correlation Coefficient for Discarding Redundant Information in Real Time Autonomous Navigation System

A. Miranda Neto, Member, IEEE, L. Rittner, Member, IEEE, N. Leite, D. E. Zampieri, R. Lotufo and A. Mendeleck

Abstract—Lately, many applications for the control of autonomous vehicles have been developed, and one important issue is the excess of information, frequently redundant, that imposes a high computational cost on data processing. Based on the fact that real-time navigation systems can have their performance compromised by the need to process all this redundant information (all images acquired by a vision system, for example), this work proposes an automatic image discarding method using Pearson's Correlation Coefficient (PCC). The proposed algorithm uses the PCC as the criterion to decide whether the current image is similar to the reference image and can be ignored, or whether it contains new information and should be considered in the next step of the process (identification of the navigation area by an image segmentation method). If the PCC indicates a high correlation, the image is discarded without being segmented. Otherwise, the image is segmented and becomes the new reference frame for the subsequent frames. The technique was tested on video sequences and showed that more than 90% of the images can be discarded without loss of information, leading to a significant reduction of the computational time needed to identify the navigation area.

I. INTRODUCTION

Lately, several applications for the control of autonomous vehicles have been developed, and in most cases machine vision is an important part of the set of sensors used for navigation. Although extremely complex and highly demanding, thanks to the great deal of information it can deliver, machine vision is a powerful means for sensing the environment and has been widely employed to deal with a large number of tasks in the automotive field [1]. However, complex computer vision systems may perform unsatisfactorily because of their processing time. Considering the relation between a real-time decision system and an image reading system that operates at a specific acquisition/reading rate, one can ask: how many images can the image processing system discard while still guaranteeing acceptable real-time navigation of an autonomous vehicle?

Manuscript received January 10, 2007. This work was supported in part by CNPq. Arthur de Miranda Neto, Douglas Eduardo Zampieri and André Mendeleck are with the Mechanical Engineering Faculty, University of Campinas, SP, 13083-970, Brazil, phone: 55-19-3521-3386 (e-mails: [email protected], [email protected], [email protected]). Leticia Rittner and Roberto Lotufo are with the Electrical Engineering Faculty, University of Campinas, SP, 13083-970, Brazil (e-mails: [email protected], [email protected]). Neucimar J. Leite is with the Institute of Computing, University of Campinas, SP, 13084-971, Brazil (e-mail: [email protected]).

1-4244-0443-6/07/$20.00 ©2007 IEEE.

Hence, opting for a more complex machine vision system may lead to a system that is too slow for a stand-alone real-time application. Although the system could keep a database of acquired images and submit it, for example, to a neural network for decision making, this large amount of information would not necessarily lead to better decisions and could even harm the performance of the system by overloading it.

Taking into account that humans are estimated to perceive visually about 90% of the environment information required for driving [1], it is reasonable to reduce the information acquired by a navigation system in order to reduce processing time. However, defining an automatic image discarding criterion that leads to a minimum loss of information is not a trivial task for computational systems, especially real-time ones. In this work we propose an automatic image discarding method based on Pearson's Correlation Coefficient (PCC), a low-complexity and easily implemented solution.

With the purpose of studying machine vision techniques applied to the navigation of autonomous vehicles, the worldwide competition organized by DARPA (Defense Advanced Research Projects Agency) [2], [3], known as the Grand Challenge, was one of our motivations for this work. Nevertheless, it is important to point out that the purpose of this work is to propose a solution for a specific task: identifying the navigation area. In general, the vehicles participating in this competition are equipped with all kinds of sensors to provide the interface between the navigation system and the environment to be explored. The vision system usually used in these vehicles is composed of two or more cameras, as seen in [4] and [5]. Our work is based on monocular vision, that is, a single camera acquires images of the environment and feeds them to the computer system.

In Section 2 we present an overview of the Grand Challenge, as the context for the studied problem. Section 3 presents the structure of our navigation system. In Section 4 the steps of the image processing chain for identification of the navigation area are described. In Section 5 the Pearson's Correlation Coefficient is introduced and the discarding method is explained. Implementation details and obtained results are presented in Section 6, and the conclusions and possible extensions can be found in Section 7.



II. GRAND CHALLENGE

A. Overview

With the intention of stimulating the development of autonomous off-road vehicles, DARPA (Defense Advanced Research Projects Agency) organized the first DARPA Grand Challenge in 2004. The competition is open to USA high schools and to universities and companies inside and outside the USA, and vehicles of all kinds are allowed [2], [3].

In 2004 there were 106 registered competitors. The race took place on a desert course stretching from Barstow, California to Primm, Nevada, but did not produce a finisher. In 2005, the route to be followed by the robots was supplied to the teams two hours before the start. Once the race had started, the robots were not allowed to contact humans in any way. By the end, 18 robots had been disabled and five robots finished the course. The winner of the 2005 DARPA Grand Challenge was Stanley [3], with a course time of 6 hours, 53 minutes and 8 seconds and an average speed of 19.1 MPH.

B. Basic Rules [2]

Basic rules of the DARPA Grand Challenge: the vehicle must be entirely autonomous, using only GPS and the information it detects with its sensors; it must complete the route by driving between specified checkpoints; it must operate in rain and fog, with GPS blocked; and it must avoid collision with other vehicles and objects such as carts, bicycles, traffic barrels, and objects in the environment such as utility poles.

C. Future of the Championship

The 2007 Grand Challenge, also known as the DARPA Urban Challenge, will take place on November 3, 2007. The course will involve a 60-mile mock urban area, to be completed in fewer than 6 hours. The rules will include obeying traffic laws while negotiating other traffic and obstacles and merging into traffic [2], [3].

III. NAVIGATION SYSTEM

The navigation system abstracted in this work is presented in Fig. 1. After the image acquisition by a physical device, the computer vision layer fulfills its role through sublayers (reading and pattern recognition sublayers). The generated information is then made available to the next layer, the navigation layer, which converts it into movement commands for the vehicle. The PCC-based image discarding method proposed in this work is part of the tasks performed by the computer vision layer.

It is important to point out that our application was developed based on the object-oriented paradigm and organized in layers (multithreading), where each layer is treated as a service (thread). When a layer is composed of sublayers, they are also structured as services. According to [6] and [7], the use of the object-oriented paradigm facilitates the migration of the software project to the code, and the layered structure helps applications run on multiprocessor computers.

Fig. 1. Navigation System
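To make the layered organization concrete, the following is a minimal sketch in Java (the platform used later in Section VI) of layers running as services that communicate through a shared queue. The class, queue and thread names are illustrative assumptions, not the authors' code.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public final class LayeredNavigationSketch {
        public static void main(String[] args) {
            // Shared queue carrying the result produced by the vision layer.
            BlockingQueue<int[]> visionOutput = new LinkedBlockingQueue<>();

            Thread visionLayer = new Thread(() -> {
                // reading sublayer + pattern recognition sublayer would run here,
                // publishing, e.g., the maximization vector of the current frame.
            }, "computer-vision");

            Thread navigationLayer = new Thread(() -> {
                // consumes the vision output and converts it into movement commands.
            }, "navigation");

            visionLayer.start();
            navigationLayer.start();
        }
    }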

IV. COMPUTER VISION LAYER

One of the functions of the computer vision layer in embedded autonomous vehicle systems is to segment the image obtained by the camera. The purpose of segmentation is to distinguish objects in an image [8], [9]. In our case, the object to be detected is the navigation area for the autonomous vehicle. This task can be extremely sophisticated and complex, and with the use of well-elaborated filters it is possible to get very satisfactory results. However, high-quality segmentation usually comes at a higher price, because robust segmentation algorithms are very complex.

One way to segment an image is to use thresholds. This type of segmentation is computationally very simple and fast; however, identifying the ideal threshold can be quite complicated. The best approach in this case is to use techniques and algorithms that search for the threshold automatically. In previous work [10] we presented a segmentation technique called TH Finder, based on a method proposed by Otsu [11]. This algorithm searches for an ideal threshold calculated only on the parts of the image related to the navigation area (thus discarding the horizon region).

In the proposed method, we divide the image into two parts. The division is not necessarily into equal parts, but into two complementary sub-images. Fig. 2 shows an example of this division: Up (above) and Down (below), corresponding respectively to the horizon vision (which contributes to future decisions) and the close vision (which supplies information for immediate displacements). The question is: which would be the ideal percentage to attribute to each part (Up and Down) of the image?
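As a reference for the kind of automatic threshold search that TH Finder builds upon, the following is a minimal sketch of Otsu's method [11] for an 8-bit grayscale histogram. It is a generic textbook implementation under our own naming, not the authors' TH Finder code.

    final class OtsuSketch {
        // Choose the gray level that maximizes the between-class variance.
        static int threshold(int[] histogram, int totalPixels) {
            double weightedSum = 0;
            for (int t = 0; t < 256; t++) weightedSum += t * histogram[t];

            double sumBackground = 0, bestVariance = -1;
            int weightBackground = 0, bestThreshold = 0;
            for (int t = 0; t < 256; t++) {
                weightBackground += histogram[t];                 // pixels <= t
                if (weightBackground == 0) continue;
                int weightForeground = totalPixels - weightBackground;
                if (weightForeground == 0) break;
                sumBackground += t * histogram[t];
                double meanBackground = sumBackground / weightBackground;
                double meanForeground = (weightedSum - sumBackground) / weightForeground;
                double betweenVariance = (double) weightBackground * weightForeground
                        * (meanBackground - meanForeground) * (meanBackground - meanForeground);
                if (betweenVariance > bestVariance) {
                    bestVariance = betweenVariance;
                    bestThreshold = t;
                }
            }
            return bestThreshold;
        }
    }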


Fig. 2 – Image cut (Up and Down).


Initially, we create cuts that divide the image into ten slices of equal height. The algorithm then starts by analyzing the slice closest to the vehicle (the lowest slice of the image, going from the bottom edge of the image to the first imaginary cut). This first sub-image therefore contains 10% of the total height of the original image. The second sub-image to be analyzed is the one that goes from the bottom edge of the original image to the second cut, totaling 20% of the height of the original image. All sub-images are then submitted to the segmentation algorithm, and the output of this process is a vector of percentages, containing the percentage of navigation points (white points) found in each sub-image.

The purpose is to analyze how much the inclusion of one additional upper slice in the segmentation process contributes to the increase or reduction of navigation points in the first slice of the image. In other words, since a global segmentation method is used, analyzing a larger portion of the original image does not always lead to a better result in the most critical region (the region closest to the vehicle), where obstacles should be detected and avoided as fast as possible. On the contrary, by discarding the upper portion of the original image (horizon vision), we are able to obtain a more efficient segmentation and to distinguish the obstacles from the navigation area with greater precision.

After creating the percentage vector from the sub-image analyses, the next stage uses these values to decide where the image should be cut. This is done by using the standard deviation of the percentage vector (VP), defined in Eq. (1):

s = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} ( VP_i - \overline{VP} )^2 }    (1)
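A minimal sketch of Eq. (1), assuming the percentage vector VP has already been filled by segmenting the ten cumulative sub-images; the class and method names are illustrative.

    final class PercentageVectorSketch {
        // Sample standard deviation of the percentage vector VP, as in Eq. (1).
        static double standardDeviation(double[] vp) {
            int n = vp.length;
            double mean = 0;
            for (double p : vp) mean += p;
            mean /= n;

            double sumOfSquares = 0;
            for (double p : vp) sumOfSquares += (p - mean) * (p - mean);
            return Math.sqrt(sumOfSquares / (n - 1));
        }
    }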

To find the cut point in the image through the analysis of the percentage vector, we subtract the standard deviation from the percentage of the first sub-image (stored in the first position of the percentage vector). We then search the percentage vector for the last position whose value is greater than the calculated difference. Once this index is found, we decrement it by one, that is, we select the previous index. In this way two images are obtained: the bottom part (Down), using the percentage corresponding to the decremented index, and the upper part (Up), which is what remains of the image; a sketch of this search is given below.

It is known that a secure navigation cannot rely on the machine vision model as the only source of information, unless it takes place in a controlled environment. With this in mind, the purpose of the segmentation and the identification of the navigation area is not to create algorithms with perfect results, but to find the largest possible number of white pixels, which in our case represent the navigation area, so that the rules system can indicate the route corrections needed to avoid the existing obstacles.
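Returning to the cut-point search described above, the following is a minimal sketch of one possible reading of it, assuming VP and the standard deviation s from Eq. (1) are already available; the exact index handling is our interpretation of the text.

    final class CutPointSketch {
        static int cutIndex(double[] vp, double standardDeviation) {
            double reference = vp[0] - standardDeviation;   // first percentage minus s
            int lastAbove = 0;
            for (int i = 0; i < vp.length; i++) {
                if (vp[i] > reference) lastAbove = i;       // last position still above the difference
            }
            return Math.max(lastAbove - 1, 0);              // select the previous index
        }
    }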

For each point of the sub-image, the value assigned to the maximization matrix is calculated by the maximization function (Eq. (2)). The purpose of this function is to define the influence of the obstacle points in the analyzed image.

f(x, y) = y \cdot \begin{cases} 0, & \text{white} \\ 1, & \text{black} \end{cases}    (2)

At this point, a maximization vector (VM) is created, and each position of this vector is filled with the sum of all points of the corresponding column of the maximization matrix. This step is described by Eq. (3). For example, the first position of the maximization vector receives the sum of all values of the first column of the maximization matrix (M):

VM_i = \sum_{j=1}^{n} M(x_i, y_j)    (3)
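A minimal sketch of Eqs. (2) and (3), assuming the segmented sub-image is available as a boolean obstacle mask and that the row index y grows toward the bottom of the image (this orientation is our assumption).

    final class MaximizationSketch {
        // obstacle[y][x] == true for a black (obstacle) pixel, false for a white one.
        static int[] maximizationVector(boolean[][] obstacle) {
            int height = obstacle.length, width = obstacle[0].length;
            int[] vm = new int[width];
            for (int x = 0; x < width; x++) {
                for (int y = 0; y < height; y++) {
                    int m = obstacle[y][x] ? y : 0;   // Eq. (2): f(x, y) = y * {0 white, 1 black}
                    vm[x] += m;                       // Eq. (3): sum over the column of M
                }
            }
            return vm;
        }
    }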

Thus, each position of the maximization vector contains an integer representing the influence of the obstacle points (in black) on the corresponding column of the image. The maximization vector will be used later by the navigation layer, which will be able to decide on a specific route, depending on the obstacles indicated by this vector.

Fig. 3 shows the result of the work developed by Thrun et al. in [5], at Stanford University, in partnership with companies. It is part of the machine vision system used by the vehicle "Stanley" in the 2005 Grand Challenge; however, it was assisted by a laser (an excellent tool for detecting cracks in the navigation area). In Fig. 4 we present our results using TH Finder on the same original image used by Stanford. Fig. 4(b) shows the conversion of the original image to a gray-level image. Fig. 4(c) shows the result of the filtering operation. Finally, Fig. 4(d) presents the result of the segmentation algorithm.

Fig. 3 – Results from Thrun et al. [5]: (a) original image [5]; (b) processed image with a quadrilateral laser and pixel classification; (c) pixel classification before thresholding; (d) horizon detection for sky removal.



Fig. 4 – Results from our algorithm: (a) original image; (b) gray-level image; (c) filtered image (smoothed); (d) navigation area identified after segmentation.

V. AUTOMATIC IMAGE DISCARDING BASED ON PEARSON'S CORRELATION COEFFICIENT (PCC)

A. Pearson's Correlation Coefficient

According to Eugene and Johnston [12], Pearson's Correlation Coefficient, r, is widely used in statistical analysis, pattern recognition, and image processing. Applications in the latter include comparing two images for image registration purposes, object recognition, and disparity measurement. For monochrome digital images, Pearson's Correlation Coefficient is given by Eq. (4):

r = \frac{\sum_i (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_i (x_i - x_m)^2 \sum_i (y_i - y_m)^2}}    (4)

where x_i is the intensity of the ith pixel in image 1, y_i is the intensity of the ith pixel in image 2, x_m is the mean intensity of image 1, and y_m is the mean intensity of image 2.

The correlation coefficient has value 1 if the two images are identical, 0 if they are completely uncorrelated, and -1 if they are completely anti-correlated, for example, if one image is the negative of the other. In theory, r would be 1 if the object is intact and less than 1 if alteration or movement has occurred. In practice, distortions in the imaging system, pixel noise, slight variations in the object's position relative to the camera, and other factors produce an r value less than 1, even if the object has not been moved or physically altered in any manner. For security applications, typical r values for two digital images of the same scene, one recorded immediately after the other using the same imaging system and illumination, range from 0.95 to 0.98 [12].

B. Discarding Method

Before submitting an image to the segmentation process, the PCC is computed between this image and the reference image. At the beginning, the reference image is the first image. A high PCC indicates that there are almost no changes, while a low PCC means that there are significant changes between the images. Therefore, if the PCC is above a given threshold, the image is discarded, since it is very similar to the previous image. For this reason, there is no need to segment it, because it would not bring additional information to the navigation layer; the system simply repeats the last navigation command. Otherwise (PCC lower than the threshold), the image cannot be discarded and is segmented. In this case, the image is adopted as the new reference image, and from this moment on the incoming images are compared to this new reference, until the reference changes again.

Fig. 5 presents the flowchart of the computer vision layer tasks. After image acquisition, the first step is to verify whether it is the first image. If so, the image is submitted to segmentation (TH Finder algorithm) and the result feeds the next layer (navigation). Otherwise, the acquired image is compared to the reference image and the PCC is computed. If the PCC is higher than the threshold, the image is discarded; if not, the image is segmented to feed the navigation layer and becomes the new reference image.

Fig. 5 – Computer Vision Layer
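A minimal sketch of Eq. (4) and of the discarding rule described above, assuming both frames have already been converted to gray-level arrays of the same length; the class and method names are illustrative.

    final class PccSketch {
        // Pearson's Correlation Coefficient between two gray-level images, Eq. (4).
        static double pearson(int[] reference, int[] current) {
            int n = reference.length;
            double xm = 0, ym = 0;
            for (int i = 0; i < n; i++) { xm += reference[i]; ym += current[i]; }
            xm /= n;
            ym /= n;

            double covariance = 0, varX = 0, varY = 0;
            for (int i = 0; i < n; i++) {
                double dx = reference[i] - xm, dy = current[i] - ym;
                covariance += dx * dy;
                varX += dx * dx;
                varY += dy * dy;
            }
            return covariance / Math.sqrt(varX * varY);   // undefined for a uniform image
        }

        // Discard the current frame when it is highly correlated with the reference.
        static boolean discard(int[] reference, int[] current, double threshold) {
            return pearson(reference, current) > threshold;
        }
    }

For example, with the threshold of 0.8861 used in the experiments of Section VI, and with Frame 4440 as the current reference, Frame 4441 (r = 0.9949) would be discarded, while Frame 4450 (r = 0.8673) would be segmented and become the new reference.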

VI. IMPLEMENTATION AND RESULTS

A. Experiments

It is important to point out that our hardware and software were not embedded in a vehicle or robot. Our results were obtained by submitting images extracted from video clips to our navigation software (laptop computer with an Intel Core Duo T2300E processor, 2 MB L2 cache, 667 MHz FSB). Because it supports image processing and multithreaded development, among other advantages, we used the Java 2 Platform for the software development.

The threshold used in our tests was 0.8861, based on some preliminary experiments. This choice should take into account the amount of obstacles or details in the images and the desired security degree for the navigation system. In other words, if one needs a very secure system or if the images are full of obstacles, the adopted threshold should be very high, meaning that only images with almost no changes will be



discarded. But if the images have no obstacles at all, one could choose a lower threshold, thus increasing the percentage of discarded images.

The first set of experiments used videos of different sizes and containing different road conditions that were made available by DARPA [2]. The second set of experiments was based on a prototype built from a regular remote control car, a notebook computer, a parallel computer interface (adapted to the remote control), a mobile phone with a digital camera and JAVA support, and a Bluetooth adapter. The video sequences used in these experiments were the ones obtained by the mobile phone camera embedded on the remote control car.

B. Results

The first experiment was based on a video sequence from DARPA (RaceDay) [2]. From this video sequence, different sequences of 300 frames were used. In Fig. 6 we present four frames extracted from this video sequence: Frame 4440 in Fig. 6(a), Frame 4441 in Fig. 6(b), Frame 4450 in Fig. 6(c) and Frame 4460 in Fig. 6(d). The computed PCCs were:

1. Between Frame 4440 and Frame 4441 = 0.9949;
2. Between Frame 4440 and Frame 4450 = 0.8673;
3. Between Frame 4440 and Frame 4460 = 0.7464.

Fig. 7 shows four more frames of the same video sequence: Frame 4020 in Fig. 7(a), Frame 4120 in Fig. 7(b), Frame 4220 in Fig. 7(c) and Frame 4320 in Fig. 7(d). The computed PCCs for these four frames were:

4. Between Frame 4020 and Frame 4120 = 0.9591;
5. Between Frame 4020 and Frame 4220 = 0.9407;
6. Between Frame 4020 and Frame 4320 = 0.8861.

According to the PCC values obtained for the frames in Fig. 6 and Fig. 7, one can conclude that when an obstacle occupies a large portion of the scene (Fig. 6), the PCC tends to be low and only a few frames can be discarded; segmentation has to be performed on almost all images to determine the ideal movements of the vehicle. Conversely, if the obstacle occupies a small portion of the frame, it is far from the vehicle (Fig. 7) and there is enough time to react. In this case the PCC stays above the threshold and segmentation is not necessary for a significant number of frames. From frame 4020 to frame 4320, for example, the commands to the autonomous vehicle obtained from the segmentation of the first frame (Frame 4020, the reference frame) could be used for the whole sequence. In other words, because the correlation among the 300 frames remained high, due to only small changes, segmentation could be bypassed 300 times.

It is important to notice that the computational time of our discarding process was 18 ms, using 160x120 frames. The segmentation process using the same frame dimensions takes 67 ms. In addition, the mean pre-processing time for each frame before segmentation is 56 ms. That means that, although the system spends some time and

computational effort computing the PCC, it is much less than it would spend segmenting every video frame.

In another experiment, a video sequence of ten thousand frames, corresponding to a 6.6-minute video at a rate of 25 frames per second, was submitted to our discarding process. Only 639 of the 10000 frames were submitted to the segmentation process, while the other 9361 frames were automatically discarded by the system, based on the PCC computation. The chosen threshold for the PCC was 0.8861.

Fig. 6 – Frames extracted from the first video sequence: (a) Frame 4440; (b) Frame 4441; (c) Frame 4450; (d) Frame 4460.

Fig. 7 – Frames extracted from the first video sequence: (a) Frame 4020; (b) Frame 4120; (c) Frame 4220; (d) Frame 4320.

This result shows that only 6.39% of the frames from the video sequence were needed to generate the commands for the mobile robot. Similar tests were performed with smaller video sequences, and the percentage of used frames stayed between 8% and 10%. The usage percentage does not depend on the size of the video sequence, but on the proportion (influence) of the obstacles.

In the prototype experiments (camera embedded on a mobile robot), the results of the first set of experiments were confirmed. The percentage of images submitted to the segmentation process was between 6% and 10%, depending on the proximity of the obstacles and the selected threshold.



Fig. 8(a) shows the prototype parts: remote control car, notebook computer, a parallel computer interface adapted to the remote control, a mobile phone with digital camera and JAVA support, and a Bluetooth adapter. Fig. 8(b) shows the car with the embedded camera. In Fig. 8(c) we present the image acquired by the embedded camera, and in Fig. 8(d) the resulting image after segmentation using the TH Finder algorithm.

Fig. 8 – (a) Prototype parts; (b) embedded camera; (c) acquired image; (d) segmented image using TH Finder.

One should also keep in mind that the proposed system can be made as secure as needed. First, this algorithm is intended to be used in a mobile robot together with other sensors; a single camera is not expected to provide all the information the navigation system needs to make route decisions. Second, the choice of the threshold value should take into account the amount of obstacles or details in the images and the desired security degree for the navigation system: if one needs a very secure system, or if the images are full of obstacles, the adopted threshold can be very high, meaning that only images with almost no changes will be discarded; if the images have no obstacles, one can choose a lower threshold, thus increasing the percentage of discarded images. Finally, it is possible to introduce redundancy to guarantee that the command system will react in time if an obstacle suddenly crosses the navigation area. One possible way to do this is to interrupt the PCC computation every 30 frames (for example) and force a segmentation, adopting the new frame as the reference frame, as sketched below. Doing so reduces the discarding rate and the associated computational gain; on the other hand, the introduced redundancy allows the system to meet higher security requirements.
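A minimal sketch of this redundancy mechanism; the 30-frame period and all names are illustrative assumptions.

    final class ForcedRefreshSketch {
        static final int REFRESH_PERIOD = 30;   // force a segmentation every 30 frames

        // The frame must be segmented (and become the new reference) when the
        // periodic refresh fires or when the PCC falls below the threshold.
        static boolean mustSegment(long frameIndex, double pcc, double threshold) {
            boolean forcedRefresh = frameIndex % REFRESH_PERIOD == 0;
            return forcedRefresh || pcc < threshold;
        }
    }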

VII. CONCLUSION

The machine vision research area is still evolving, and the challenge of constructing robust methods of image processing and analysis is far from being met. This can be observed in the great number of studies published in the last few years.

In this work we presented a simple solution to improve the performance of a real-time navigation system by choosing, in an automatic way, which images to discard and which ones to segment in order to identify the navigation area. Our experiments showed that the inclusion of an automatic image discarding method based on the PCC did result in a reduction of the processing time. Although the system spends some milliseconds computing the PCC, it gains much more time by discarding more than 90% of the images without segmenting them. The mean computational time of our discarding process (PCC) was 85% smaller than the segmentation time added to the pre-processing time (18 ms versus 67 ms + 56 ms = 123 ms per frame). It is important to notice that our algorithm is not optimized yet, and we are confident that the processing time can be reduced even further, taking into account that it could, for example, be implemented in hardware.

Another remarkable characteristic of the computer vision layer presented in this work is its independence from the image acquisition system and from the robot itself. There is no need to calibrate the camera, and no previous knowledge of system parameters is required to adjust our method. The same implementation works on different mobile robots, with different embedded vision systems, without the need to adjust parameters.

ACKNOWLEDGMENT

The authors wish to thank Mr. Eduardo Fritzen for his support in the prototype construction. This work was supported in part by CNPq.

REFERENCES

[1] M. Bertozzi, A. Broggi and A. Fascioli, "Vision-based intelligent vehicles: state of the art and perspectives", Robotics and Autonomous Systems, vol. 32, pp. 1-16, 2000.
[2] DARPA, "DARPA Grand Challenge Rulebook", 2004, http://www.darpa.mil/grandchallenge05/
[3] Stanford Racing Team's Entry in the 2005 DARPA Grand Challenge, 2006, http://www.stanfordracing.org
[4] H. Dahlkamp, A. Kaehler, D. Stavens, S. Thrun and G. Bradski, "Self-Supervised Monocular Road Detection in Desert Terrain", in Proceedings of Robotics: Science and Systems 2006 (RSS06), Philadelphia, USA.
[5] S. Thrun, M. Montemerlo, D. Stavens, H. Dahlkamp et al., "Stanford Racing Team's Entry in the 2005 DARPA Grand Challenge", Technical Report, DARPA Grand Challenge 2005.
[6] R. S. Pressman, "Engenharia de Software", MAKRON Books, 1995.
[7] W. Boogs and M. Boogs, "Mastering UML com Rational Rose 2002", ALTA Books, 2002.
[8] R. C. Gonzalez and R. E. Woods, "Processamento de Imagens Digitais", Edgard Blücher, Brazil, 2000.
[9] H. A. Abutaleb, "Automatic Thresholding of Gray-Level Pictures Using Two-Dimensional Entropy", Computer Vision, Graphics, and Image Processing, 1989.
[10] A. Miranda Neto and L. Rittner, "A Simple and Efficient Road Detection Algorithm for Real Time Autonomous Navigation based on Monocular Vision", in Proceedings of the 2006 IEEE 3rd Latin American Robotics Symposium (LARS), 2006.
[11] N. Otsu, "A threshold selection method from gray-level histograms", IEEE Transactions on Systems, Man, and Cybernetics, 1978.
[12] Y. K. Eugene and R. G. Johnston, "The Ineffectiveness of the Correlation Coefficient for Image Comparisons", Technical Report LA-UR96-2474, Los Alamos, 1996.
