VIDEO-BASED VEHICLE AND PEDESTRIAN TRACKING AND MOTION MODELLING

P T Blythe, University of Newcastle upon Tyne, UK

INTRODUCTION

The ANDROID Partnership

ANDROID (Automated Numeration of Data Realised by Optimised Image Detection) is a collaborative, LINK-funded project between the University of Newcastle upon Tyne, the Ross Silcock Partnership (RSL) and the Defence Evaluation and Research Agency (DERA), and is co-ordinated by RSL. The motivating idea for the project was to build upon research begun under the DETR-funded Pedestrian Behaviour and Exposure to Risk (PEDEX) project (R.A. Allsopp, 1997; C.R. Williams and M.G.H. Bell, 1997), in which image processing of traffic videos was used to assess the exposure of pedestrians to risk. That project illustrated the shortcomings of existing systems for monitoring, classifying and counting vehicles and pedestrians automatically. These shortcomings were addressed, and to some extent resolved, in the ANDROID project, which specifically developed techniques to characterise objects and application software to discriminate, track and count pedestrians and vehicles automatically from video footage. In parallel with the technical research and development work, ANDROID also developed a validation scheme for the system and performed a significant market analysis to determine which exploitation path the partnership should pursue.

Overview

The front-end processing for the ANDROID system is based on ASSET (A Scene Segmenter Establishing Tracking), which uses image flow determined from the motion of two-dimensional image features (known as "corners") to identify moving objects within a scene. ASSET processes a sequence of video images, taken by a possibly moving camera, and performs point-based feature matching and motion segmentation in the image plane. Knowledge of the camera position allows the estimation of the real-world trajectories of objects moving within the scene.
Vehicle and pedestrian activity within different areas of the scene is quantified by detecting when tracked objects cross user-designated trip-wires and recording attributes such as size, shape and motion. Discrimination between vehicles and pedestrians is currently based on real-world motion, since segmented pedestrian groups and vehicles are often similar in size and shape, whilst the number of pedestrians in a group is estimated from its real-world size.

The computationally intensive corner detection stage was previously performed using custom hardware developed by Roke Manor Research, and frame-rate processing of 256x256 pixel images was achieved by running the ASSET corner matching and optic-flow segmentation on a PowerPC-based system. However, the increasing performance of PC processors, combined with the ability to perform robust corner detection in software using the SUSAN (Smallest Univalue Segment Assimilating Nucleus) algorithm, is now sufficient to allow a PC-based implementation to process eight frames a second. Earlier work within the PEDEX project showed that processing every third frame of an image sequence, recorded at normal frame rate, led to more reliable detection of slow-moving pedestrians.

Occlusion resulting from the overlap of vehicles in a queue, or of pedestrians in a crowd, presents a particular problem for motion-based techniques, both for discrimination and through loss of tracking. Results from the ANDROID project have demonstrated the close correlation which exists between manual and automatic vehicle counts under steady-state conditions, with vehicle count errors of less than ten per cent in the absence of occlusion. Improved discrimination of vehicles may be provided by other techniques, such as the wire-frame model matching developed at Reading University, or by analysis of an object's track history. Analysis of pedestrian activity is more difficult since movement is generally unconstrained and may be contra-directional; for this reason much of the related work has focused on temporal differencing and on correlating localised image features with global crowd parameters.
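The SUSAN corner test mentioned above can be illustrated with a minimal sketch: each pixel (the "nucleus") is compared with the pixels in a surrounding circular mask, the number with similar brightness (the USAN area) is counted, and a corner is declared where this area falls below a geometric threshold. This is a simplified illustration of the principle, not the optimised production detector; the brightness threshold and mask radius are illustrative values.

```python
import numpy as np

def susan_corners(img, brightness_thresh=27, radius=3):
    """Simplified SUSAN corner response: for each pixel (the nucleus),
    count mask pixels of similar brightness (the USAN area); a small
    USAN area marks a corner."""
    h, w = img.shape
    # offsets of a roughly circular mask around the nucleus
    offs = [(dy, dx) for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)
            if dy * dy + dx * dx <= radius * radius]
    g = len(offs) // 2                 # geometric threshold: half the mask
    corners = []
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            nucleus = float(img[y, x])
            usan = sum(1 for dy, dx in offs
                       if abs(float(img[y + dy, x + dx]) - nucleus)
                       < brightness_thresh)
            if usan < g:
                corners.append((y, x, g - usan))   # response = g - USAN
    return corners
```

Run on a synthetic bright square against a dark background, the detector fires at the square's corners but not along its edges or in uniform regions, which is the behaviour the tracker relies on.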
Within the ANDROID project, work has included the development of object track modelling and matching, using quadratic representations of object position and velocity in the real-world co-ordinate system, both to improve the discrimination of objects through an analysis of the observed track history and to overcome the problems associated with loss of tracking due to occlusion.

Why Track Modelling?

Object track modelling leads to improved discrimination and, in conjunction with trip-wire data, allows the derivation of origin-destination matrices of vehicle manoeuvres at junctions. Cumulative observations of object tracks may be used to construct models of "normal" vehicle or pedestrian behaviour, such as expected traffic flow patterns within the scene; this might be used in traffic monitoring applications to detect the presence of a stopped vehicle on a busy carriageway. Whilst video-based systems are unlikely to outperform loop-based counting systems in terms of vehicle count and speed accuracy, this should be weighed against their ease of installation for short-term traffic surveys. In addition, object tracking enables the identification of specific vehicle paths through junctions, whilst trajectory modelling can be used to detect incidents through an analysis of observed changes in traffic flow patterns.
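The trip-wire mechanism described above amounts to a segment-intersection test between successive tracked positions and the wire. A minimal sketch, assuming ground-plane (x, y) co-ordinates; the "forward"/"backward" direction labels are arbitrary:

```python
def _side(p, a, b):
    """Sign of the cross product: which side of segment a-b point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def count_crossings(track, wire_a, wire_b):
    """Count how often a track (a list of (x, y) ground positions) crosses
    the trip-wire segment wire_a-wire_b, split by crossing direction."""
    forward = backward = 0
    for p, q in zip(track, track[1:]):
        s1, s2 = _side(p, wire_a, wire_b), _side(q, wire_a, wire_b)
        if s1 * s2 >= 0:            # both points on the same side: no crossing
            continue
        # the step p->q straddles the wire's line; check the crossing point
        # lies between the wire's endpoints by testing them against p->q
        t1, t2 = _side(wire_a, p, q), _side(wire_b, p, q)
        if t1 * t2 < 0:
            if s1 < 0:
                forward += 1
            else:
                backward += 1
    return forward, backward
```

A track that passes the wire's line beyond its endpoints is correctly ignored, which is what lets several wires partition one scene into separate count regions.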

THE ANDROID PROJECT

Background

Earlier work within the DETR-funded PEDEX project provided a basis for the development of pedestrian and vehicle counters. This work used the ASSET image processing system, which required custom hardware to achieve real-time performance. The ASSET system, developed at DERA Chertsey, determines image optic flow from the motion of two-dimensional image features (referred to as "corners") and uses this to identify moving objects within a scene. ASSET processes a sequence of video images, performs point-based feature matching in the image plane, and uses knowledge of the camera position to derive estimates of the real-world trajectories of moving objects within the scene. Pedestrian and vehicle counts can be obtained by recording the numbers of objects, and their attributes such as position, velocity and shape, as they cross designated trip-wires.

This simple approach suffers from several problems because of the small image size of the objects being tracked. Corners detected on pedestrians and vehicles can easily be lost amongst the large number of corners detected in the background, particularly if the contrast in the scene is low or the motion of the object is slow, causing the ASSET software to lose the object. Scene segmentation is based on image motion, which means that several adjacent pedestrians or vehicles moving with similar motion can be grouped together into a single object.

Objectives

The ANDROID project had two major technical aims. The first was to improve the scene segmentation of the system by extending the functionality of the ASSET code. The second was to write software capable of interpreting the motion detected in traffic videos as being caused by vehicles or pedestrians, and of calculating statistical information about traffic flows. This software was to classify objects as vehicles or pedestrians, to estimate the number of pedestrians present in a detected group, and to count the vehicle or pedestrian flow over predefined trip-wires within the scene.

ASSET System Characterisation and Improvement

Development work on the ASSET software fell into three main categories. The software was enhanced by the addition of several high-level control routines to improve the stability and reliability of the code. Factors affecting the accuracy of the system were investigated so that its performance could be characterised. Finally, a PC-based demonstrator application was constructed to show that the code could be implemented without the need for expensive custom hardware.

Determine Accuracy Envelope

Analysis of system performance requires an assessment of the effects of both physical and scene constraints on detection probabilities and count accuracies. Physical constraints include the format of the recording medium and the camera set-up, which affects both the size of objects in the image and the degree to which objects occlude one another. Scene constraints take account of traffic density and flow patterns, which can cause problems in certain circumstances.

The probability of detection of an object is a function of its image size, its motion, the motion of other nearby objects and the level of inherent image noise. The dependency of the detection probability on image size, motion and background noise was investigated by running ASSET on a range of video sequences showing pedestrian activity at various distances and speeds, with varying levels of noise added. The results allowed lower bounds to be placed on object image size and motion, and upper bounds on noise levels, for reliable detection. This information can be used to generate image overlays which delineate those scene regions for which the detection probability exceeds some specified figure, and to identify optimum trip-wire positions and orientations.

Work was also done to assess the effects of video format and speed on counter performance. The aim of this task was to identify the differences in counter performance between VHS and S-VHS recordings which arise solely as a result of the recording format. It was found that variations in vehicle and pedestrian trip-wire counts between runs, which were of the order of 5%, were greater than any inherent variations due to differences between VHS and S-VHS recordings.

ASSET Enhancements

In order to improve the performance of the ASSET routines, several aspects of the system were investigated.
The 32 set-up parameters of the ASSET system were investigated, and guidelines were produced to aid the setting of these values. One parameter, the corner-finding threshold, is particularly important in determining the performance of the system. Code was written to set this parameter automatically and to alter it as the program runs, adapting to changing conditions. A method was also devised to filter out movements of the camera automatically, and the method of tracking objects was improved.

Development of Demonstrator Platform

Previous implementations of the ASSET system have required custom hardware to provide enough processing power to operate in real time. Since the cost of the hardware will be a major factor in determining the market for the software, it was decided to implement ANDROID so that all of the image processing could be performed on the CPU of a Pentium PC. This section of the work sought to configure the hardware components of a PC to demonstrate that such an implementation could work in real time. The target specification for a near real-time traffic monitoring system is to capture and process a 256x256 pixel image at 8 Hz, equivalent to a frame processing time of 125 milliseconds. A version of the ASSET code has been implemented entirely in software; running on a 200 MHz Pentium MMX PC, it can attain a frame rate close to 8 Hz.
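The automatic threshold adaptation described above can be sketched as a simple feedback loop that nudges the corner-finding threshold towards a target number of corners per frame. The control scheme and all parameter values here are hypothetical; the project report does not specify the mechanism used.

```python
def adapt_threshold(t, n_corners, target=500, band=0.2, gain=0.05,
                    t_min=5, t_max=100):
    """One step of a simple proportional controller: nudge the corner
    brightness threshold so the detector yields roughly `target`
    corners per frame. All parameter values are illustrative."""
    lo, hi = target * (1 - band), target * (1 + band)
    if n_corners > hi:        # too many corners detected: raise the threshold
        t *= 1 + gain
    elif n_corners < lo:      # too few corners detected: lower the threshold
        t *= 1 - gain
    return min(max(t, t_min), t_max)   # keep within sensible limits
```

Called once per processed frame, such a loop keeps the corner count, and hence the per-frame processing load, roughly constant as lighting and scene content change.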

CLASSIFICATION AND OBJECT RECOGNITION

The ASSET software extracts from video images a list of the moving objects that have been detected; no information is provided about the nature of those objects. The aim of this part of the work was to develop algorithms capable of analysing the motion of objects detected by ASSET and identifying them as vehicles or pedestrians. Because groups of pedestrians are often merged by ASSET into one moving object, an important aspect of this interpretation is the determination of the number of pedestrians in a group.

The output of the ASSET system was used as the input to the ANDROID software, which was run as a post-processor after the ASSET code had finished. The ASSET software itself was unchanged, except for the addition of an extra module to calculate the real-world size and the average red, green and blue luminance values for each pixel within an object, and to write all the object information to disk. The information provided for each frame comprised a list of shapes identified as moving through the scene, giving the position, size and colour of each shape; these shapes could be tracked through subsequent frames.

The advantage of this post-processing approach is that the ANDROID system has access to all the information about moving objects before any processing is done. Each object can then be treated as a point moving along the ground as time passes, rather than as a snapshot of the moving object, which allowed the motion of each object to be modelled as a quadratic function of time. Because the ASSET software can lose track of objects and find them again later, it was necessary to compare each detected object with the others to determine whether they had been formed by the same real-world object; where this was judged to be the case, the models of the two objects were combined into a single object.

The distinction between cars and pedestrians was made by examining the maximum velocity of the object as it crossed the scene. Although cars often move at speeds comparable to those of pedestrians, it is unlikely that they will cross the scene without speeding up at all. In the scenes analysed during the development of the software, even vehicles that stopped within the frame could be distinguished from pedestrians on the basis of their maximum velocity.

Object Modelling

To allow the motion of objects to be extrapolated beyond what had been picked up by ASSET, the position of each one was modelled as a function of time. The X and Y components of the object's position in the image plane were treated independently, each being modelled as a quadratic function of time, as illustrated in Figures 1a and 1b. The coefficients for each model were calculated by least-squares fitting, so that the object's position can be calculated for any time. It was found, however, that extrapolation far beyond the times at which the object had been sighted was unreliable. It was therefore decided to restrict the time over which a model could be used to half of the lifetime of the object before the first sighting, and the same time after the last sighting, and to fix the gradient of the model for these extrapolated positions. In this way a more robust model was produced.
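The quadratic modelling, and the restricted extrapolation window with a fixed gradient, can be sketched as follows. This is a minimal illustration using standard least-squares polynomial fitting; the class name and interface are invented for the sketch.

```python
import numpy as np

class TrackModel:
    """Quadratic model of an object's position against time, with clamped
    extrapolation: beyond half a lifetime before the first sighting or
    after the last, continue with a fixed gradient from the window edge."""

    def __init__(self, times, xs, ys):
        t = np.asarray(times, dtype=float)
        self.cx = np.polyfit(t, xs, 2)       # least-squares quadratic in X
        self.cy = np.polyfit(t, ys, 2)       # ... and independently in Y
        life = t[-1] - t[0]                  # observed lifetime of the object
        self.lo, self.hi = t[0] - life / 2, t[-1] + life / 2

    def _eval(self, c, t):
        tc = min(max(t, self.lo), self.hi)   # clamp into the valid window
        pos = np.polyval(c, tc)
        if tc != t:                          # fixed-gradient extrapolation
            pos += np.polyval(np.polyder(c), tc) * (t - tc)
        return float(pos)

    def position(self, t):
        return self._eval(self.cx, t), self._eval(self.cy, t)
```

Within the window the model interpolates the fitted quadratic; outside it the position continues linearly with the gradient frozen at the window edge, which is the robustness measure described above.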

Track Matching

The matching of different shapes was performed by a neural network. This algorithm was designed to process information about two shapes and produce a decision about whether or not they were created by the same object. The information provided to the network was:

• the time difference between the end of the earlier shape and the start of the later shape;
• the distance between the two shapes, calculated at the middle of the gap between the end of the earlier shape and the start of the later shape;
• the difference between the velocities of the two shapes, each calculated as the derivative of the shape's model at the same time as the distance;
• the difference between the average sizes of the shapes;
• the differences between the average red, green and blue brightness values of the shapes (for monochrome images, these three variables were replaced by one: the difference between the average luminance values).

For the network to function it must first be trained, which requires the system to be told whether various pairs of shapes are matches or not. Suitable pairs were picked from those available by a simple matching algorithm based on the distance between the objects and the difference in their velocities; these pairs were then shown to the user, who decided whether or not they were created by the same object. The values of the input variables were then averaged, and their variances calculated, separately for the matching and the non-matching pairs.

In the main ANDROID routine, the network was run on each pair of shapes in the system to decide whether they matched. The decision was made by comparing the values of the variables for the pair with the averages calculated during training: if the variables were closer to the average of the matches, the decision was that the two tracks matched; if closer to the average of the non-matches, that they did not. The distance between two sets of variables was calculated as the sum of the squares of the differences between each variable, scaled according to the standard deviation of that variable.

Pedestrian Numeration

Pedestrians moving closely together are often grouped by the ASSET software into one object, so in order to calculate accurate counts of pedestrians crossing trip-wires it was necessary to estimate the number of pedestrians making up a particular group. This task proved particularly difficult. The size of pedestrian objects, as provided by the ASSET system, fluctuated considerably as groups moved through a scene, as did the number of corner features detected within a group, and neither variable showed a strong correlation with the number of people present. The final numeration algorithm was therefore based on the size of the group, measured as a real-world area calibrated according to the position of the camera, with a series of size boundaries set according to the average sizes of different groups observed during the development of the algorithm.

EVALUATION OF ANDROID

A selection of the evaluation results obtained is presented below; a more detailed analysis is provided in ANDROID partnership, 1998.

Analysis

The analysis involved a comparison between automatic counts and manual count data, provided by survey firms, for a number of sites. These ranged from congested urban junctions with traffic queues to free-flowing traffic on a link road. The ability of the system to detect specific manoeuvres was assessed using origin-destination data, where available. Graphical methods were used to assess the correlation between automatic and manual counts, and the effects which environmental factors such as camera set-up, trip-wire positions and traffic conditions had on absolute counts.

It was generally found that there was a strong linear relationship between automatic and manual counts. Under good viewing conditions, where vehicles were well separated, count accuracies were within a few per cent of the manual counts. Where there was occlusion between vehicles, as a result of, for example, trip-wire position or high traffic density, this linear relationship was found to persist under steady traffic conditions. Although the automatic count could be reduced by several tailgating vehicles being tracked as a single object, this could be compensated for by using the slope of a least-squares fit against the manual data. Deviations from this linear behaviour could generally be identified with changes in traffic flow. For example, an increase in the traffic using the major road through a junction was found to reduce the detection performance for vehicles approaching on the minor road, as they were forced to queue or stop at the junction, thereby reducing the performance of the motion-based segmentation technique used for vehicle detection.

Pedestrian Counts

Table 1 shows a comparison between manual and automatic counts for pedestrians crossing Claremont Road, Newcastle. Automatic counts were calculated for three different trip-wire positions across the road, and show a close correlation with the manual pedestrian counts. When setting up pedestrian trip-wires it should be remembered that detection is more reliable where object movement is largest. At this site, for example, the central-reservation trip-wire provides more reliable detection of pedestrians crossing the road, since they slow down as they approach the pavement.

Detection of Vehicle Manoeuvres

One of the strengths of a tracking-based system is the ability to generate an origin-destination matrix of identifiable vehicle manoeuvres for a given junction, as shown in Figure 2. Table 2 shows the manual and the compensated automatic counts of the various manoeuvres over a two-hour period for video of the Bebside interchange, Wansbeck. The agreement between manual and system counts is generally good. The only exception is the small number of vehicles which join the roundabout from origin 2 and leave at destination 2 (Figure 3).
Motion detection is least reliable in this area because vehicles need to stop in order to give way to traffic on the roundabout.

CONCLUSIONS

The results described above clearly demonstrate that the ANDROID software is capable of interpreting the motion of both vehicles and pedestrians in videos of traffic. Under optimum viewing conditions, with careful placement of trip-wires, vehicle count accuracies in excess of 98% can be achieved, with pedestrian count accuracies of around 90%. The greater the degree of occlusion and the smaller the image size, the more difficult the object detection and discrimination process becomes. Nevertheless, under steady-state traffic conditions, the linear relationship between ANDROID and manual vehicle counts can be used to derive reliable estimates of actual vehicle counts by applying a multiplicative correction factor.

The sensitivity of counter performance to environmental conditions such as camera set-up, traffic conditions and trip-wire positions is a common feature of video-based traffic monitoring systems currently on the market. Work within the ANDROID project has made the underlying ASSET software more robust to camera movement and variations in lighting, and the object tracking and discrimination module offers the potential of improved vehicle and pedestrian counting through trajectory interpolation. The development of a real-time version would increase the flexibility and the number of applications of the system. The application of the ANDROID system to the detection and logging of vehicle manoeuvres at junctions has been demonstrated, and potentially offers a commercially viable alternative to manually intensive methods.

The limitations of the ANDROID system arise because it is based on ASSET, which is a generalised motion detector rather than specifically a traffic monitoring system. It would be possible to represent both vehicles and pedestrians by three-dimensional models and to compare these models with the moving objects detected by ASSET. By finding which model best fits a particular object, that object could be classified as a particular type, and this classification could aid the detection of the object in following frames of the video by providing constraints on the search process. In the case of groups of pedestrians, it would be necessary to apply several different versions of the model and track the movement of each version over time to determine which ones actually represent people. A model-based approach would, however, have implications for the speed at which the system could run: additional low-level image processing would be required for the matching process, and in complex scenes this could become a major overhead.
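The multiplicative correction factor mentioned above can be obtained as the slope of a least-squares fit of manual against automatic counts, here constrained through the origin. This is a sketch; the project report does not specify whether an intercept was also fitted.

```python
def correction_factor(manual, automatic):
    """Least-squares slope (through the origin) of manual against
    automatic counts. Multiplying automatic counts by this factor
    compensates for systematic undercounting, e.g. where tailgating
    vehicles have been merged into a single tracked object."""
    num = sum(m * a for m, a in zip(manual, automatic))
    den = sum(a * a for a in automatic)
    return num / den
```

The factor is calibrated once against a short manually counted period and then applied to the remaining automatic counts at the same site.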
It seems unlikely, however, that pedestrian count accuracy can be significantly improved without an attempt to identify and track individual pedestrians. Video-based image processing systems are unlikely to outperform loop-based systems in terms of vehicle count accuracy, but their ease of installation makes them ideal for short-term traffic surveys. In addition, the application of tracking technology allows the identification of specific vehicle paths across roundabouts and junctions, and incident detection based on the analysis of changes in traffic flow patterns. The ANDROID system is also capable of monitoring pedestrian activity, giving it a large advantage over other systems currently available.

REFERENCES

R.A. Allsopp, 1997. Image Processing for the Analysis of Pedestrian Behaviour. Defence Evaluation and Research Agency Report for the Department of Transport.

C.R. Williams and M.G.H. Bell, 1997. Pedestrian Behaviour and Exposure to Risk. Proc. IEE Colloquium on Incident Detection and Management.

ANDROID partnership, 1998. LINK IST Project ANDROID Final Report. Ross Silcock Ltd.

S.M. Smith, 1998. Real-time motion segmentation and object tracking. Real-Time Imaging, Vol. 4, No. 1, pp. 21-40.

ACKNOWLEDGEMENTS

As with all research projects, ANDROID was a team effort and thus the achievements described in this paper cannot be attributed to a single person. The author wishes to gratefully acknowledge the significant contributions made to this research by the staff at DERA, in particular Dr Richard Allsopp; Dr Chris Jenner and Chris Williams of TORG at Newcastle University; and also the excellent project management and support provided by David Silcock and Richard Walker at the Ross Silcock Partnership. Furthermore, the support of EPSRC and the DTLR/DTI through the LINK initiative is gratefully acknowledged.

Figure 1a: Object Models

Figure 1b: Pedestrian and Vehicle Counts

Figure 2: Frame from video, and ...


Figure 3 Schematic of manoeuvres for Bebside interchange, Wansbeck.

Period   ANDROID (wire 1)   ANDROID (wire 2)   ANDROID (wire 3)   Manual
1000            38                 37                 43            39
1015            20                 25                 21            37
1030            25                 26                 31            24
1045            96                135                110           142
Total          179                223                205           242

Table 1: Automatic and manual pedestrian counts from Claremont Road, Newcastle (automatic counts for three trip-wire positions; the manual count applies to all three).

                 Origin 1                      Origin 2
          Dest. 1       Dest. 2         Dest. 1       Dest. 2
Period   Man.  AND.    Man.  AND.      Man.  AND.    Man.  AND.*
0700      36    39      30    33        21    22       5     0
0715      74    86      71    77        36    26       4    10
0730     116   120      88    84        62    63      12    15
0745     124    97      96    89        65    81       5    11
0800      76    79      70    77        40    40       7     4
0815      90    89     104   110        78    76       3    -1
0830      70    78      89    86        70    63       3     7
0845      78    90      88    84        49    39       4     3
Total    664   678     636   640       421   410      43    49

Table 2: Manual and ANDROID counts of vehicle manoeuvres at Bebside interchange, Wansbeck. * counts derived from the difference of the other counts.