Tracking using colour information

Simon A. Brock-Gunn, Geoff R. Dowling and Tim J. Ellis
Department of Computer Science, City University, London EC1V 0HB, U.K.

email: [email protected]

Abstract

This paper presents new results of work that has been carried out in tracking objects, notably people and vehicles, using simply the colour information available. It includes details of the algorithm that has been used to extract and maintain the background information for the scene dynamically, and shows how a hierarchy of colour templates can be used to store information about targets and track them as they move into the field of view. The paper then describes how the system handled a typical task, that of monitoring vehicles at a traffic light controlled road intersection, and discusses its performance and procedures for maintaining consistency in the database after occlusions have occurred.

1 Colour Templates

[Figure 1, panels (a) and (b). Labels in (a): R, F, Differ, D, Split, O1 ... On, Make, T, T1 ... Tm, Comp., Select; stages: Object Extraction, Template Calculation, Template Tracking, Results. Axes in (b): rg (red to green) and by (blue to yellow), bins 0 to 15 on each axis.]

When considering a general list of properties that the human visual system may use for recognising and tracking objects, it might seem obvious that colour would rank highly, alongside other factors such as shape, size and texture. In the work presented here, we aim to show what can be done with colour information alone in the task of tracking objects in cluttered scenes. Envisaged tasks include tracking personnel and vehicles around aircraft at airports. Security is the motivation here, and a composite system would use knowledge of which personnel and vehicles ought to be present, and their relevant timings, so that intruders could be detected. Tracking vehicles at road intersections could be used in highway planning [5] [8] and surveillance [4], and following specific people in shopping centres could be used for crime prevention/detection.

An earlier paper introduced the theory of hierarchical colour templates [2]. Figure 1(a) illustrates the behaviour of the system. A static camera is used to capture the background colour reference image R, which is dynamically updated to handle changing lighting conditions. The objects in the scene can then be found by simple differencing; colour information is stored in templates, and objects are identified with reference to a database of current objects. We employ a dynamic background updating strategy to cope with changes in the global illumination, and template replacement to address the problem of colour constancy.

Figure 1: (a) Complete tracking process. (b) rg-by plane. (c) Polar distribution. (d) Pyramid structure (levels n = 0 to n = 4).

1.1 Dynamic background extraction

At any frame F in the sequence, for each point in the image, reference must be made to the preceding frames to determine whether it is a background point. A suitable algorithm may then be one which takes into account not only a difference threshold to determine allowable ambient changes in background, but also a constancy value. This constancy value would be the number of frames for which a point would have to maintain its value to within a difference threshold, in order to be considered a background point. For any given frame, each point within the frame is analysed:

• If the difference between the sum of the red, green and blue components at this point in the scene and the respective point in the background image is more than the difference threshold, then it is a foreground point: make no change to the background image and set the constancy counter for this point to zero.

• If the difference is less than the difference threshold, then increment the constancy counter and:
  - if the constancy counter has just reached the constancy value, then this point is determined to have become a background point after a period of being a foreground point: copy this point into the background image;
  - otherwise this point has been, and still remains, a background point: update the respective point in the background image by using a weighted average.

One of the most effective, and easiest to implement, ways of achieving this is to use an exponential decay of preceding values, so that for frame n of the sequence, background point $B^{n}_{x,y}$ may be calculated from the previous background point $B^{n-1}_{x,y}$ and the current image point $I^{n}_{x,y}$ as

$$B^{n}_{x,y} = w I^{n}_{x,y} + (1 - w) B^{n-1}_{x,y}$$

where $0 < w < 1$ and $w$ refers to the weighting to be applied to the current frame as compared to previous frames.
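The per-point rules and the exponential update can be sketched as follows. This is an illustrative Python reading of the description, not the original implementation. The function name `update_background`, the nested-list image layout and the handling of a difference exactly equal to the threshold (treated here as background) are assumptions. The rules are followed literally, so any counter value other than the constancy value leads to the weighted-average update.

```python
def update_background(frame, background, counters,
                      diff_threshold, constancy_value, w):
    """Update `background` and `counters` in place from one RGB frame.

    frame, background: 2-D lists of (r, g, b) tuples.
    counters: 2-D list of per-point constancy counters.
    Returns a 2-D list of booleans, True where the point is foreground.
    """
    foreground = []
    for y, row in enumerate(frame):
        fg_row = []
        for x, pixel in enumerate(row):
            bg = background[y][x]
            # Difference between the summed colour components.
            diff = abs(sum(pixel) - sum(bg))
            if diff > diff_threshold:
                # Foreground: leave the background alone, reset the counter.
                counters[y][x] = 0
                fg_row.append(True)
                continue
            fg_row.append(False)
            counters[y][x] += 1
            if counters[y][x] == constancy_value:
                # Just reached constancy: copy the point into the background.
                background[y][x] = tuple(float(c) for c in pixel)
            else:
                # Exponential decay: B_n = w * I_n + (1 - w) * B_(n-1).
                background[y][x] = tuple(
                    w * i + (1 - w) * b for i, b in zip(pixel, bg))
        foreground.append(fg_row)
    return foreground


# One-pixel example with hypothetical parameter values.
background = [[(100.0, 100.0, 100.0)]]
counters = [[5]]
mask = update_background([[(110, 100, 100)]], background, counters,
                         diff_threshold=30, constancy_value=3, w=0.5)
```

A small `w` makes the background adapt slowly to lighting changes, while a large `w` lets the most recent frame dominate.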

1.2 Tracking

When in tracking mode, each incoming colour image frame F is compared with the background, and a binary array element is set if there is sufficient difference between the pixels in the incoming and background frames. From this binary information D, the object sub-images $O_i$ greater than a suitable size threshold can be extracted. For each sub-image $O_i$, we transform the raw red, green and blue $(r, g, b)$ values into the two opponent colours $(rg, by)$, which represent colours along the red to green and blue to yellow axes, using the transformation [1]

$$T_{O}^{rg,by} = \left( r - g,\ \frac{2b - r - g}{2} \right)$$
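As a small worked example of this transformation, the sketch below applies it to a few 8-bit colours. The function name `opponent_colours` and the sample colours are chosen here for illustration only.

```python
def opponent_colours(r, g, b):
    """Map raw (r, g, b) values to the opponent pair (rg, by)."""
    rg = r - g                  # red to green axis
    by = (2 * b - r - g) / 2    # blue to yellow axis
    return rg, by


samples = {
    "red": (255, 0, 0),          # -> (255, -127.5)
    "blue": (0, 0, 255),         # -> (0, 255.0)
    "yellow": (255, 255, 0),     # -> (0, -255.0)
    "mid grey": (128, 128, 128), # -> (0, 0.0)
    "white": (255, 255, 255),    # -> (0, 0.0)
}
for name, rgb in samples.items():
    print(name, opponent_colours(*rgb))
```

Grey and white map to the same point, which shows that intensity is discarded by the mapping to two dimensions.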

This type of transformation has been used to provide object detection cues [7] for the recognition of static objects. The three-dimensional colour information has been mapped to two dimensions, and for each object a two-dimensional colour histogram can be drawn up as a template. This 2-d template needs less storage than a 3-d one, and so is faster to handle, at the expense of the intensity information [6].

Figure 1(b) shows how a template in the rg and by dimensions might look for a person wearing a red shirt and blue trousers. For this template n = 16, that is, there are sixteen bins on each axis, and regions of intensity (represented by darker areas) are more prevalent in the red and blue parts of the template.

Note that with these two axes the template is invariant to rotation, translation and reflection of the object in the image plane. However, for the tracking tasks we have considered, the objects are unlikely to be subjected to rotation or reflection, and the spatial distribution of the colours can be an important factor in resolving similarly coloured objects. To this end, we add two further axes to the template which implement this spatial weighting: $r$ and $\theta$, which are analogous to the polar coordinates of a given point o in an object O. As shown in Figure 1(c), if c is the centroid of the object then $\theta = \angle vco$ and $r = l/R$, where v is the point of intersection between the circle, centre c and radius R, which circumscribes the object and a vertical line through c, and l is the distance between o and c.

Thus we are left with a template T of four dimensions, rg, by, $r$ and $\theta$, and the need to match each object with each template $T_i$ $(i = 1, \ldots, M)$ stored in a database of templates that have been tracked in our image sequence. Intuitively some tracked objects will be radically different from others, and they ought to be declared different with an efficient search and reject test.
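A four-dimensional template of this kind might be built as in the sketch below. Several details are assumptions rather than statements of the original method: v is taken as the upper intersection, so $\theta$ is measured clockwise from the upward vertical through c; R is taken as the distance from c to the farthest object pixel; the bin ranges assume 8-bit colour; the histogram is normalised by pixel count; and the names `bin_index` and `make_template` are hypothetical.

```python
import math
from collections import Counter


def bin_index(value, lo, hi, n):
    """Index (0 .. n-1) of the bin holding `value` on the range [lo, hi]."""
    i = int((value - lo) / (hi - lo) * n)
    return min(max(i, 0), n - 1)


def make_template(pixels, n=16):
    """Build a sparse 4-D template from object pixels.

    pixels: list of (x, y, (r, g, b)) with image y increasing downwards.
    Returns {(rg_bin, by_bin, r_bin, theta_bin): fraction of pixels}.
    """
    cx = sum(p[0] for p in pixels) / len(pixels)   # centroid c
    cy = sum(p[1] for p in pixels) / len(pixels)
    # Assumed radius R: distance from c to the farthest object pixel.
    big_r = max(math.hypot(x - cx, y - cy) for x, y, _ in pixels)
    counts = Counter()
    for x, y, (r, g, b) in pixels:
        rg = r - g
        by = (2 * b - r - g) / 2
        dx, dy = x - cx, y - cy
        radial = math.hypot(dx, dy) / big_r if big_r > 0 else 0.0  # r = l/R
        # Angle from the upward vertical, clockwise, in [0, 2*pi).
        theta = math.atan2(dx, -dy) % (2 * math.pi)
        key = (bin_index(rg, -255, 255, n),
               bin_index(by, -255, 255, n),
               bin_index(radial, 0.0, 1.0, n),
               bin_index(theta, 0.0, 2 * math.pi, n))
        counts[key] += 1
    # Normalise so the template does not depend on object size.
    return {k: c / len(pixels) for k, c in counts.items()}


# Toy "person": red shirt (rows 0-3) above blue trousers (rows 4-7).
person = [(x, y, (255, 0, 0) if y < 4 else (0, 0, 255))
          for x in range(4) for y in range(8)]
template = make_template(person)
```

In this toy example the red pixels fall in the bins with $\theta$ near the upward vertical and the blue pixels in those near the downward vertical, so two objects with the same colours in swapped positions would produce different templates.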
Such a search and reject test is achieved by having a hierarchy of templates at different resolutions, and by trying to match initially at the coarsest resolution, only going to higher resolution for objects considered similar. Figure 1(d) illustrates 2-d templates at different resolutions. 4-d templates with 16 bins along each axis need $16^4$ storage locations per template. By averaging this information and maintaining extra templates with 8, 4 and 2 bins along each axis, storage increases by less than 10 percent, but searching time may be just one fiftieth, depending on how similar the objects are.

We decide which template in a library the current target template most resembles by comparing it with each in turn. If the match coefficient is less than a minimum threshold, the target has been identified, and the library template is updated. If it is greater than a maximum threshold for all library templates, a new object has been found and is put into the library; otherwise the match is inconclusive. Even when an object is occluded, or disappears totally from view, it may be re-located as soon as it can be clearly seen again. Since work is carried out on a moving sequence, the problems of changing object shape and maintaining colour constancy between frames are handled by allowing small changes to each object's template.
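The storage figure and the coarse-to-fine decision can be illustrated with the sketch below. It checks the storage arithmetic for 4-d templates, then uses 2-d templates (as in Figure 1(d)) to keep the code short. The match coefficient is not defined in this section, so a mean absolute difference per bin is assumed, following this paragraph's convention that a lower coefficient means a closer match. All function names and threshold values are hypothetical.

```python
def storage_locations(bins, dims=4):
    """Number of histogram cells for `bins` bins on each of `dims` axes."""
    return bins ** dims


full = storage_locations(16)                           # 16**4 = 65536
extra = sum(storage_locations(b) for b in (8, 4, 2))   # 4096 + 256 + 16
overhead_percent = 100 * extra / full                  # about 6.7 percent


def halve(template):
    """Average each 2 x 2 block of a square 2-D template (list of lists)."""
    n = len(template) // 2
    return [[(template[2 * i][2 * j] + template[2 * i][2 * j + 1] +
              template[2 * i + 1][2 * j] + template[2 * i + 1][2 * j + 1]) / 4
             for j in range(n)] for i in range(n)]


def pyramid(template):
    """Levels ordered from coarsest (2 bins per axis) to full resolution."""
    levels = [template]
    while len(levels[-1]) > 2:
        levels.append(halve(levels[-1]))
    return levels[::-1]


def mean_abs_diff(a, b):
    """Assumed match coefficient: mean absolute difference per bin."""
    n = len(a)
    return sum(abs(a[i][j] - b[i][j])
               for i in range(n) for j in range(n)) / (n * n)


def compare(target_levels, library_levels, min_threshold, max_threshold):
    """Return 'match', 'mismatch' or 'inconclusive' for one library entry."""
    coefficient = 0.0
    for t, l in zip(target_levels, library_levels):
        coefficient = mean_abs_diff(t, l)
        if coefficient > max_threshold:
            return "mismatch"          # rejected without finer levels
    return "match" if coefficient < min_threshold else "inconclusive"


def identify(target, library, min_threshold, max_threshold):
    """`library` is a list of precomputed pyramids, one per known object."""
    levels = pyramid(target)
    results = [compare(levels, entry, min_threshold, max_threshold)
               for entry in library]
    if "match" in results:
        return "match", results.index("match")
    if all(r == "mismatch" for r in results):
        return "new object", None      # also the case for an empty library
    return "inconclusive", None
```

With block averaging, the assumed coefficient at a coarse level can never exceed its value at a finer level, so a rejection at a coarse level would also have been a rejection at full resolution.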

2 Traffic monitoring

The hardware consists of a Sun SparcStation and a Parallax video frame grabber board, capturing images of 384 × 288 24-bit colour pixels. With a frame rate of 15 frames per second, and JPEG image compression, 20 seconds of data may be saved in around 6 Mbytes.

Figure 2 shows four frames from the middle of a sequence. The entire sequence lasts for about forty seconds, and depicts traffic passing through a signal-controlled road intersection. The frames shown here cover about one and a half seconds of that sequence when three vehicles are being tracked, and are chosen to be a good representation of the sort of scene which may typically be analysed and of the type of problems encountered. The whole sequence of 618 frames was recorded at a rate of fifteen frames per second, and the images shown here are of every fifth frame of the sub-sequence, which starts at frame 225.

Figure 2: Four frames from the road intersection sub-sequence.

In the first frame, two objects are detected, and as the database is currently empty, the two objects' colour descriptions are inserted as new entries. The black taxi is associated with database entry number one and the red van with entry number two.

The two vehicles are tracked in the second frame. Even though more of the van is now visible, it matches well with database entry number two, since matches are not influenced by objects' size. Both of the objects' entries in the database are updated to take into account any small changes which may be occurring in terms of lighting changes, aspect, etc.

In the third frame the van turns across the front of the taxi, causing a slight occlusion of the taxi. Only one object is detected (consisting of the van and taxi joined together), and this object contains some elements which match with each of the two entries but a sufficient number which similarly cause mismatch. The composite object's description is entered as a new database entry, number three.

The next frame shows the van continuing its turn across the front of the taxi, causing the taxi to become more occluded, and a blue lorry which enters the scene from the left. Two objects are detected in this scene: the combined taxi and van, and the front of the lorry. Although the van is causing the taxi to become more occluded, this process is slow and the continual updating of entry number three in the database reflects these small changes, ensuring that the combined object still matches within the pre-determined margin of error. The front end of the lorry matches with none of the three entries in the database, and so its description is inserted as entry number four.
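As a quick check on the capture figures quoted at the start of this section, the sketch below works out the uncompressed data volume and the compression ratio it implies. The ratio is derived here, not stated in the text, and assumes 1 Mbyte = 10^6 bytes and 3 bytes per 24-bit pixel.

```python
WIDTH, HEIGHT = 384, 288
BYTES_PER_PIXEL = 3      # 24-bit colour
FRAME_RATE = 15          # frames per second

# Uncompressed size of 20 seconds of video.
raw_bytes_20s = WIDTH * HEIGHT * BYTES_PER_PIXEL * FRAME_RATE * 20

# Implied JPEG compression ratio if this fits in about 6 Mbytes.
compression_ratio = raw_bytes_20s / 6e6

# Duration of the full 618-frame sequence ("about forty seconds").
sequence_seconds = 618 / FRAME_RATE

print(raw_bytes_20s, round(compression_ratio, 1), sequence_seconds)
```

The raw volume is close to 100 Mbytes, so the quoted 6 Mbytes corresponds to a compression ratio of roughly 16:1 under these assumptions.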

3 Experimental results

Using experimental data, it is possible to study the rate of success or failure of database matching using true and false positives and negatives. It was found that using opponent colours with both the distance and angle information performed better than using the opponent colours alone, or in combination with either distance or angle. There are three main sets of events which can be counted:

• Insertion of a new entry into the database. An object is located in the scene and compared with all the descriptions currently residing in the database. The match coefficients calculated are sufficiently low to be able to determine that the object definitely does not match with any of the descriptions in the database. The conclusion is therefore that the object has not been previously encountered and its description is inserted as a new entry in the database. $I$ indicates that this event has occurred correctly, a true positive, whilst $I'$ indicates that the insertion is incorrect, a false positive, since the object does already exist in the database.

• Match of an object with a description in the database. An object located in the scene is compared with one of the descriptions in the database and the match coefficient is sufficiently high to be able to determine that the object in the scene relates to that description in the database. $M$ indicates that the match was correct. $M'$ indicates that the match was incorrect.

• Mismatch of an object with a description in the database. An object located in the scene is compared with one description in the database and the match coefficient is sufficiently low to be able to determine that the object in the scene does not correlate to that description in the database. $S$ indicates that the event occurred correctly. $S'$ indicates that the mismatch occurred incorrectly: the two descriptions were in fact of the same object and a match should have taken place, causing the description to be updated in the database.

Between the certainty of a match and the certainty of mismatch, there is an area of doubt where a given match is deemed to be inconclusive. This is expressed as a gap between the thresholds for match and mismatch, where no conclusions are drawn and no action is taken as a result. Since such events convey no conclusive knowledge, their occurrences are excluded from the results.

Over a 64 frame sub-sequence, all methods correctly identified and inserted eight separate vehicles which were shown. However, "extra" objects have also been detected. This generally arises from one of two sets of circumstances:

• As an object enters from one edge of the scene, its composition changes so quickly that it appears to be totally different between two successive frames. Incidence of this can to some extent be reduced by increasing the threshold size for objects, so that partial object images are not processed. This would be at the expense of being able to process genuinely small objects.

• Although, from time to time, uncertainty in the match coefficients may be expected, occasionally the match coefficient is so low that it passes beyond the area of uncertainty and falsely becomes a mismatch. Since the object does not (and should not) match with any others in the database, the object is inserted as a new description. The result is that there are now two descriptions for the same object in the database. The description with which the object will match in the future then becomes unpredictable.

As would be expected, the number of correct matches which takes place is high. This figure is perhaps the best indicator of the relative merits of each system when applied to a "typical" object.

The number of mismatches which occurs is dependent on the number of matches: since the database is sorted, a positive match terminates processing for the particular object. This means that more correct matches will naturally result in fewer correct mismatches. However, although the absolute figure may not be of use, comparison between correct mismatches and incorrect mismatches yields some useful information. Incorrect mismatches are generally not fatal to the process: as long as there is no incorrect match elsewhere in the database, and as long as there is at least some uncertainty in a match, the object will simply be ignored for the frame.

By taking these counts it was observed that the correct maintenance of descriptions within the database, $I/(I + I')$, was 89%, the correct matching of objects with descriptions already in the database, $M/(M + M')$, was 95%, and the correct mismatching of objects with descriptions already in the database, $S/(S + S')$, was 98%. These figures are useful as a summary of the merits of the scheme, but remember it may only take one incorrect match in a sequence of 100 frames to lose track of an object.
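The three ratios can be computed from a log of counted events as in the following sketch. The event labels follow the symbols used above. The function name `rates` and the sample log are hypothetical and are not the experimental data behind the quoted percentages.

```python
from collections import Counter


def rates(events):
    """Compute I/(I+I'), M/(M+M') and S/(S+S') from a list of event labels.

    events: iterable of "I", "I'", "M", "M'", "S" or "S'". Inconclusive
    comparisons are assumed not to be logged, as they are excluded.
    """
    counts = Counter(events)

    def rate(correct, incorrect):
        total = counts[correct] + counts[incorrect]
        return counts[correct] / total if total else None

    return {
        "insertion": rate("I", "I'"),
        "match": rate("M", "M'"),
        "mismatch": rate("S", "S'"),
    }


# Hypothetical log: 3 correct insertions, 1 incorrect, and so on.
log = ["I", "I", "I", "I'", "M", "M", "M", "M'", "S", "S"]
summary = rates(log)
```

A rate is reported as `None` when no events of that kind were counted, which avoids a division by zero on short sequences.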

4 Keeping the database up-to-date

One of the most important aspects of the object matching process is striving to maintain a database of relevant descriptions. The experimental results show that many instances of incorrect matching occur because there are superfluous descriptions in the database which coincidentally match with the particular object in question. Identifiable classes of such objects include:

• Noise objects, such as reflections of moving objects, specularities and general image noise;

• Previously-occluding objects: when two or more objects occlude, their description is still tracked through the database, even though the results are not required; when the objects become separate again, their last known joint description remains in the database;

• Partial objects: particularly when entering the scene from one of the image edges, the shape and colours of an object may appear to change rapidly as more of it becomes visible; if the change is sufficient, matching cannot take place successfully and a description of a small part of the object will remain in the database.

One possible aid to ensuring the integrity of the descriptions in the database is to devise a set of rules describing the circumstances in which a template may be removed from the database. These rules will have regard to the conditions which cause redundant descriptions, as well as reducing the possibility of removing valid entries. The likelihood of such removal would:

• increase with the length of time an object has remained in the database without being matched;

• be higher for a smaller object whose size makes fluctuations in appearance relatively greater;

• be higher for an object which has only ever been matched for a short while during its existence.

5 Discussion

Although the template colour descriptions are compact ways of storing the colour information associated with an object (certainly when compared to a standard full-colour image representation), the problems associated with storing and processing large amounts of data are still important issues. The desire to perform better always outstrips the resources available. However, using just the colour information held in an image, objects have been tracked successfully in sequences of hundreds of frames.

Working out of doors has presented problems when, for example, the lighting suddenly changes due to a cloud going in front of the sun, or a vehicle passes into the shade of a tall building. By using other trackable features we can remind the system that the change is due to the lighting, as the object, identified by other features, is in its expected position. They would also help disambiguate objects with the same colour and colour distribution, two red Honda Civics, for example. It will be exciting to integrate this colour work into another tracking system using, for example, Kalman filtering [3], but it was always intended here to show just what can be achieved using colour. To that end, a robust tracking mechanism has been demonstrated.

References

[1] Dana H. Ballard and Christopher M. Brown. Computer Vision. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1982.

[2] Simon A. Brock-Gunn and Tim J. Ellis. Using colour templates for target identification and tracking. In Proceedings of the British Machine Vision Conference, pages 207-216. British Machine Vision Association, 1992.

[3] T.J. Ellis, M. Mirmehdi, and G.R. Dowling. Tracking image features using a parallel computational model. In Proceedings of the SPIE, volume 1708, pages 172-183. SPIE, 1992.

[4] T.J. Ellis, P.L. Rosin, and P. Golton. Model-based vision for automatic alarm interpretation. Aerospace and Electronic Systems Magazine, 6:14-20, March 1991.

[5] D. H. Mott, K. D. Baker, G. D. Sullivan, and D. C. Hogg. An initial study into bus detection in London traffic. In IEE Colloquium on UK Developments in Road Traffic Signalling, pages 9/1-9/5. IEE, 1984.

[6] Milan Sonka, Vaclav Hlavac, and Robert Boyle. Image Processing, Analysis and Machine Vision. Chapman and Hall, London, 1993.

[7] Michael J. Swain. Color indexing. Ph.D. thesis, University of Rochester, New York, November 1990.

[8] T.N. Tan, G.D. Sullivan, and K.D. Baker. Recognising objects on the ground-plane. In Proceedings of the British Machine Vision Conference, pages 85-94. British Machine Vision Association, 1993.