Proceedings of the 11th World Congress on Intelligent Control and Automation, Shenyang, China, June 29 - July 4, 2014

Aerial Refueling Drogue Detection Based on a Sliding-Window Object Detector and Hybrid Features

Mingran Bai, XinGang Wang, Yingjie Yin and De Xu
Research Center of Precision Sensing and Control, Institute of Automation, Chinese Academy of Sciences, 100190, Beijing, China
{mingran.bai & xingang.wang & yingjie.yin & de.xu}@ia.ac.cn

Abstract - We present an aerial refueling drogue detector that uses a sliding-window object detection framework together with hybrid features of the sub-image in each detection window. Image processing and wavelet filtering techniques are used in feature extraction to form the hybrid feature set, and AdaBoost-based feature selection is used to choose the feature subset. The detection of the drogue's black center is combined with the verification of the external umbrella area. Our aerial refueling drogue detector works well in a variety of illumination environments and with high computational efficiency.

Index Terms - Object detection; Computer vision; Machine learning; AdaBoost; Precise control

Fig. 1 CAD model of the drogue and different appearances of the same drogue

I. INTRODUCTION

A computer vision (CV) system that can detect the drogue, track it and measure the distance visually during aerial refueling is required for the precise control of air refueling, especially for unmanned aerial vehicles. Our visual object detection system is designed for the air-to-air refueling (AAR) process using a probe-and-drogue refueling system. The drogue is a part of the aircraft providing the fuel, while the probe, together with the visual camera, is mounted on the receiver aircraft (see Fig. 1). The goal of our system is to detect the existence, location and size of the drogue in the images captured by the visual camera on the receiver aircraft. Our work is a vitally important part of the overall CV system and the basis of object tracking and visual measurement.

Detecting an aerial refueling drogue may seem no more challenging than vehicle or face detection. However, a closer look at Fig. 1 shows that the orientation of one drogue may be 180 degrees off another, even though the drogues have the same appearance. The most challenging task is extracting a robust feature set that describes the object discriminatively under different illumination conditions. We segment the sub-image in each detection window into 2 sections: the black center of the drogue and the external drogue's umbrella. For the black center section, we extract grey-scale based features and Gabor-wavelet based features. For the external umbrella ribs section, a classifier-based external umbrella model is built: first we split the external umbrella ribs section into regions that share almost the same area size, then we compute the rectangular-feature responses of each region and use their histogram as the region's feature vector. The second requirement of the detection is to select a reasonable subset of the feature space to speed up the classification. We make a detailed study of the selection of these hybrid features and evaluate it by the performance of the classifier trained on the resulting feature subset. We created a drogue test set and obtained excellent results on it, especially on drogues under a large range of illumination conditions.

We briefly describe previous work on object detection in II, give an overview of our method in III, describe our data sets in IV and discuss the experimental evaluation in V. The conclusions are summarized in VI.

978-1-4799-5825-2/14/$31.00 ©2014 IEEE
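The detection scheme just outlined — slide a window over the image, test the window's center for the drogue's dark pipe, then verify the external umbrella region — can be illustrated as follows. This is a minimal sketch: `classify_center` and `verify_umbrella` are hypothetical stand-ins for the trained classifiers described in III, and the window size and stride are illustrative.

```python
import numpy as np

def sliding_window_detect(image, win=64, stride=8,
                          classify_center=None, verify_umbrella=None):
    """Scan the image with a fixed-size window; a window is reported as a
    drogue only if the center test AND the umbrella test both pass
    (the two-stage scheme of this paper)."""
    H, W = image.shape
    detections = []
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            sub = image[y:y + win, x:x + win]
            # the inner quarter of the window plays the role of the
            # black-center region; the full window carries the umbrella
            center = sub[win // 4: 3 * win // 4, win // 4: 3 * win // 4]
            if classify_center(center) and verify_umbrella(sub):
                detections.append((x, y, win))
    return detections
```

Any callables returning booleans can be plugged in, so the same loop accommodates the AdaBoost- or SVM-based classifiers trained later in the paper.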

II. PREVIOUS WORKS

There are many object detection methods nowadays; here we just list a few relevant papers on object detection [1, 2, 3, 4] and refer to [5] for a survey. Papageorgiou et al. [1] describe a pedestrian detector based on a polynomial SVM using rectangle features as input descriptors, with a parts (sub-window) based variant in [2]. Depoortere et al. give an optimized version of this [6]. Gavrila & Philomin [11] take a more direct approach, extracting edge images and matching them to a set of learned exemplars using the chamfer distance; this has been used in a practical real-time pedestrian detection system [7]. Viola et al. [3] built an efficient detector using progressively more complex region rejection rules based on Haar-like wavelets and space-time differences. Ronfard et al. [8] built an articulated body detector by incorporating SVM-based limb classifiers over Gaussian filters in a dynamic programming


framework similar to that of Felzenszwalb & Huttenlocher [9]. Mikolajczyk et al. [4] use combinations of orientation-position histograms with binary-thresholded gradient magnitudes to build a parts-based method containing detectors for faces. Nowadays, detection algorithms go hand in hand with object tracking algorithms: Grabner & Bischof [13] use an on-line boosting method, at the forefront of on-line object detection, and a specially designed object detection framework called stixels was used by Benenson et al. [14] to reach 100 frames per second for pedestrian detection.

III. OVERVIEW OF THE METHOD

A. Drogue Detection Process
In this section we give an overview of the drogue object detection process, which is summarized in Fig. 2. Some implementation details are given in IV and V. The bright region and the shadow on the drogue's external umbrella can appear anywhere, due to the different illumination directions; the drogue's black center, however, does not change much in appearance. When the light is too strong, the bright region on the drogue's umbrella can be overexposed, and the texture and the ribs of the overexposed region may disappear. Taking this stability of appearance into consideration, we segment the sub-image in each detection window into 2 sections: the black center section and the external umbrella section. For the black center section we train the classifier C1, and for the external umbrella section we build a classifier-based model. Our work can be summarized in these 4 steps:
S1, collect and calibrate the drogue and background training sets;
S2, use their central regions to train the classifier C1 for the black center of the drogue;
S3, build the classifier-based model of the external umbrella region;
S4, use C1 and the model in the sliding-window detection framework.

B. Black Center Classifier
Like most sliding-window detectors [2, 3], the black center classifier C1's basic idea is that a feature vector of the sub-image in the sliding window is extracted from a large set of potential visual features, and the object determination is made by the classifier C1. These potential visual features, also called hybrid features, consist of grey-scale based features and Gabor-wavelet based features. The classifier C1 is used to determine whether the center region of a sliding-window image has a black "hole" that looks like the drogue's pipe. The grey-scale based features can be described as

φ_{0,(x,y)} = I(x, y)

where x and y give the location in the center region of a sliding window, and I(x, y) is the grey-scale value.

The Gabor-wavelet based features are defined via the filter

g(x, y, θ, λ, σ_x, σ_y) = exp[ −(1/2)( x′²/σ_x² + y′²/σ_y² ) ] · cos( 2π x′ / λ )

where λ is the wavelength of the cosine function, which determines the center frequency of the filter, σ_x and σ_y define the scale in both directions, and θ is the orientation factor. We choose 6 different orientations:

θ_k ∈ { π(k − 1)/6 } ,  k = 1, 2, ..., 6

We discuss the miss rate as σ_x, σ_y and λ change and find the most appropriate values of them in V. The Gabor-wavelet based features can then be described as

φ_{k,(x,y)} = g(x, y, θ_k) ,  k = 1, 2, ..., 6
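The Gabor filter bank can be generated directly from the formula above. A small sketch, assuming the standard rotated coordinates x′ = x cosθ + y sinθ and y′ = −x sinθ + y cosθ (the paper does not spell x′, y′ out) and the 6 orientations θ_k = π(k − 1)/6:

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma_x, sigma_y):
    """Real (cosine) Gabor kernel g(x, y, theta, lam, sigma_x, sigma_y)
    following the paper's formula; x', y' are the coordinates rotated
    by the orientation theta (standard convention, assumed here)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = x * np.cos(theta) + y * np.sin(theta)
    yp = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xp**2 / sigma_x**2 + yp**2 / sigma_y**2))
    return envelope * np.cos(2.0 * np.pi * xp / lam)

def gabor_bank(size=31, lam=4 * np.sqrt(2)):
    """Bank of 6 orientations, theta_k = pi*(k-1)/6, k = 1..6, with the
    sigma_x = sigma_y = 0.5*lam setting the paper ends up choosing."""
    thetas = [np.pi * k / 6 for k in range(6)]
    return [gabor_kernel(size, t, lam, 0.5 * lam, 0.5 * lam) for t in thetas]
```

The kernel size here is illustrative; at the kernel center x′ = y′ = 0, so the response is exactly 1.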

After feature extraction, for a single image region p with center p_c, we obtain the classifier C1's potential visual features

Φ^p = [ φ_0^p , φ_1^p , φ_2^p , φ_3^p , φ_4^p , φ_5^p , φ_6^p ] ,   Φ_i^p = { Φ_{i,(x,y)} | (x, y) ∈ p_c }
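The per-pixel hybrid feature vector can be assembled by stacking the grey value φ_0 with the filter responses. A minimal sketch (border pixels and the loop over the region p_c are omitted, and any kernel list — such as the 6 Gabor kernels — can be passed in):

```python
import numpy as np

def pixel_features(image, x, y, kernels):
    """Hybrid feature vector at pixel (x, y): phi_0 is the grey value
    I(x, y); the remaining entries are the responses of the given
    kernels centred on that pixel."""
    feats = [float(image[y, x])]                # phi_0 = I(x, y)
    for k in kernels:
        half = k.shape[0] // 2
        patch = image[y - half:y + half + 1, x - half:x + half + 1]
        feats.append(float(np.sum(patch * k)))  # filter response at (x, y)
    return np.array(feats)
```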

The use of feature selection has many precursors [12]. For these potential features, we use the CART algorithm to form a weak classifier for each feature and use AdaBoost to select a small number of critical and discriminative visual features. Recall that each center region of an image p in the training set has a label L^p. We have a set of n images, which can be seen as data points of a two-class classification problem, and each data point is N-dimensional. The training set of n data points in the N-dimensional feature space can be described as

(Φ_1, L_1), (Φ_2, L_2), ..., (Φ_n, L_n) ,   Φ_i ∈ ℝ^N ,  L_i ∈ {±1}

Fig. 2 An overview of our aerial refueling drogue detection system
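The AdaBoost-based selection over per-feature weak classifiers can be sketched as follows; simple threshold stumps stand in for the CART weak learners, and the indices of the features picked across the rounds form the selected sub-set.

```python
import numpy as np

def adaboost_select(X, y, T):
    """AdaBoost over single-feature decision stumps.  X: (n, N) data,
    y: labels in {-1, +1}.  Returns the feature index chosen in each of
    the T rounds (a sketch of AdaBoost-as-feature-selection; threshold
    stumps replace the CART weak learners of the paper)."""
    n, N = X.shape
    m, l = np.sum(y == -1), np.sum(y == +1)
    w = np.where(y == -1, 1.0 / (2 * m), 1.0 / (2 * l))  # initial weights
    chosen = []
    for _ in range(T):
        w = w / w.sum()                                   # normalize
        best = None
        for j in range(N):                                # best stump per feature
            for thr in np.unique(X[:, j]):
                for sign in (+1, -1):
                    pred = np.where(sign * (X[:, j] - thr) >= 0, 1, -1)
                    err = np.sum(w[pred != y])            # weighted error
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)             # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(sign * (X[:, j] - thr) >= 0, 1, -1)
        # multiply by e^{-alpha} on correct points, e^{+alpha} on errors
        w *= np.exp(np.where(pred == y, -alpha, alpha))
        chosen.append(j)
    return chosen
```

On toy data where only one feature separates the classes, the first round reliably picks that feature.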

If a (Φ_i, L_i) pair is a positive instance, then L_i = 1; L_i = −1 when it is a negative instance. Φ_i contains 7 different types of feature sets, which can be attributed to the grey-scale based features and the Gabor-wavelet based features. A type of feature set Φ_i^j contains N_j individual features, so it is N_j dimensional. AdaBoost can select those individual features that best discriminate among the classes, so we select an n_j-dimensional sub-feature set from the N_j-dimensional feature space Φ_i^j. The CART algorithm can form a weak classifier h_{j,(x,y)} for a single feature φ_{j,(x,y)}. An AdaBoost iteration picks the most discriminating information, allowing a correction of the classification errors resulting from the previous steps. The outline of AdaBoost can be described as follows:

1. Given a type of feature set φ_j and its labels L_j in the training set (Φ_1, L_1), (Φ_2, L_2), ..., (Φ_{n_j}, L_{n_j}), Φ_i ∈ ℝ^{N_j}, L_i ∈ {±1}.
2. Initialize the weights w_{1,j,(x,y)} = 1/(2m) for the negative and 1/(2l) for the positive data points, where m and l are the numbers of negative and positive data points respectively (m + l = n_j).
3. For iterations t = 1, ..., T:
   a. Normalize the weights:  w_{t,j,(x,y)} ← w_{t,j,(x,y)} / Σ_{∀(x,y)} w_{t,j,(x,y)}
   b. Select the best weak classifier h_{j,(x,y)}, i.e. the one with the minimum weighted error  ε_{(x,y)} = Σ w_{t,j,(x,y)} |h_{j,(x,y)} − L_{j,(x,y)}|
   c. Set  α_{(x,y)} = (1/2) ln( (1 − ε_{(x,y)}) / ε_{(x,y)} )  and update the weights:  w_{t+1,j,(x,y)} = w_{t,j,(x,y)} · e^{−α_{(x,y)}} if h_{j,(x,y)} = L_{j,(x,y)}, and w_{t,j,(x,y)} · e^{α_{(x,y)}} otherwise.
4. The strong classifier  H(Φ_j) = sign( Σ_{t=1}^{T} w_t h_t(Φ_j) )  is a linear combination of the T weak classifiers and contains the critical and discriminative visual features.

The potential visual feature set can be described as Φ = [φ_0, φ_1, φ_2, φ_3, φ_4, φ_5, φ_6], and from each type of features Φ_j we select an n_j-dimensional feature sub-set Φ_j'. After feature selection, we get a feature vector of Σ_{j=1}^{7} n_j dimensions. We discuss the appropriate values of n_j in V. With these selected feature vectors, we train the C1 classifier.

C. Classifier-based External Umbrella Model
When the classifier C1 makes a positive determination, the center region of the sub-image in the sliding window is similar to the drogue's black pipe. Because of visual difficulties there may be other dark round spots, so the classifier-based model is used to determine whether the external region of the sub-image is qualified. The visual characteristics of the external drogue umbrella vary considerably; in particular, the bright region and the shadow on the drogue's external umbrella can appear anywhere, due to the different illumination directions. We split the external drogue's umbrella into 8 similar regions, as shown in Fig. 3. Every region contains an approximately equal number of pixels, and we train a classifier C2' to recognize whether a region is a normal part of the external drogue's umbrella. Taking the over-exposed regions in some images into consideration, we integrate the 8 classifiers C2' into the classifier C2: if the number of negative determinations from the classifiers C2' is larger than 3, the classifier C2 gives a negative determination.

Fig. 3 For the external area, we split it into 8 regions. For each region, we use its respective rectangular feature to build the histogram model.

We use rectangular features, whose direction is perpendicular to the radius of the region, to extract the visual features of each region. The rectangular feature (also called the Haar-like feature [3]) filter's response at location (x, y) is represented by ψ(x, y). The responses of this filter are discretized into b intervals that define the histogram. Let

b_min^k and b_max^k be the lower and upper limits of the k-th bin. The feature vector Φ of this region is then b-dimensional and is defined as

φ^k = | { ∀(x, y) : b_min^k ≤ ψ(x, y) < b_max^k } | ,   Φ = [ φ^1, φ^2, ..., φ^b ]

With these feature vectors, we train the classifiers C2' and integrate them into the classifier C2. Even though a single classifier C2' is not precise enough, the combined classifier C2, i.e. the classifier-based model, is able to verify the external umbrella regions. Since the rectangular feature is quite efficient to compute, building the model and verifying against it is also quite efficient.

IV. DATA SET AND METHODOLOGY

We create an image set of drogues and backgrounds by manual calibration from a clip of aerial refueling video. Our image set contains 140 drogue and 700 background sub-images. We take 3/4 of the image set as the training set and the rest as the testing set. The testing set of the classifier C2 consists of background sub-images whose centers look darker than the rest, so we test our classifier's performance on these hard examples. After the feature extraction and feature selection described in III, an SVM is used as the final classifier, since SVMs give good results for computer vision problems. After the detection system starts to work, we use a retraining process to improve the performance, which is an important technique in the engineering field. To quantify our system's performance we use the miss rate (1 − Recall) and the ROC curve of a binary classifier.

V. OVERVIEW OF RESULTS

To quantify the parameter tuning of the Gabor wavelet, we use the miss rate on the testing dataset. We measure the miss rate as σ_x, σ_y and λ change and find the most appropriate values of them. As mentioned in III, we use 3 sets of values of λ and 4 sets of values of σ_x, σ_y. Applying them on the testing central sub-image set gives the miss rates in Table I.

TABLE I  MISS RATE OF A SET OF PARAMETERS

  λ      σ_x=0.5λ, σ_y=0.5λ   σ_x=0.7λ, σ_y=0.5λ   σ_x=0.5λ, σ_y=0.7λ   σ_x=0.7λ, σ_y=0.7λ
  2√2          12.0%                11.8%                12.1%                12.3%
  4√2           7.5%                 7.9%                 8.9%                 9.2%
  6√2           8.3%                 9.7%                10.1%                13.1%

The overall finding of the Gabor wavelet parameter tuning is that the Gabor wavelet features are well suited to the feature extraction. The fact that the Gabor wavelet at the adequate values of λ = 4√2, σ_x = 0.5λ and σ_y = 0.5λ for the 64 × 64 sub-image can generate good performance after just one iteration of classifier training suggests that this type of 2D Gabor wavelet may have the same size as the dark pipe object.

For the feature selection process, we evaluate 2 settings of the selected counts (n_0, n_1, ..., n_6) for Φ = [φ_0, φ_1, φ_2, φ_3, φ_4, φ_5, φ_6]: one is [20, 10, 10, 10, 10, 10, 10] and the other is [10, 20, 20, 20, 20, 20, 20]. The ROC curves of the classifiers trained with the two settings are plotted below; this ROC comparison also reflects that the top-n most important features play the most important role in training the classifier, while the contribution of the rest is very small.

Fig. 3 The ROC curve for the classifier whose m = 20 and n = 10
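The two metrics used in this evaluation — the miss rate (1 − Recall) and the ROC curve of a binary classifier — can be computed as in the following sketch; the label and score arrays in the test are toy data, not the paper's.

```python
import numpy as np

def miss_rate(y_true, y_pred):
    """Miss rate = 1 - Recall: the fraction of true positives that the
    detector fails to report.  Labels are in {-1, +1}."""
    pos = y_true == 1
    return 1.0 - np.sum((y_pred == 1) & pos) / np.sum(pos)

def roc_points(y_true, scores):
    """(false-positive rate, true-positive rate) pairs obtained by
    sweeping the decision threshold over the classifier scores, from
    the strictest threshold to the loosest."""
    pts = []
    for thr in sorted(set(scores), reverse=True):
        pred = np.where(scores >= thr, 1, -1)
        tpr = np.sum((pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
        fpr = np.sum((pred == 1) & (y_true == -1)) / np.sum(y_true == -1)
        pts.append((fpr, tpr))
    return pts
```

The loosest threshold always yields the (1, 1) corner of the ROC curve, which is a convenient sanity check.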

Fig. 4 The ROC curve for the classifier whose m = 10 and n = 20

We plot these two settings of Φ, and the classifiers trained with the two feature sets both work well. Taking the performance and the computational efficiency into consideration, the second set of feature selection results is used as our final feature set to train the classifier C1.

VI. SUMMARY AND CONCLUSION

Aerial refuelling drogue detection is a challenging task because of the object's variable appearance and wide range of poses, in addition to the accuracy requirements of the detection. Taking these into consideration, we built the classifier C1 and the classifier-based model, working within the sliding-window object detection framework, to meet these requirements. In the feature extraction process, we took many common image feature and descriptor techniques into consideration. In the feature selection process, we used a common AdaBoost-based feature selection method to obtain a subset of the redundant features, improving the working efficiency of our aerial refuelling drogue detection system. We have shown that using the Gabor wavelet features with λ = 4√2, σ_x = 0.5λ and σ_y = 0.5λ for the 64 × 64 sub-image in the detection window gives very good results for drogue detection. For the feature selection, our study has shown that the top 80 most discriminant features calculated by AdaBoost over 200 iterations are important for good performance of the classifier C1. The integrated classifier for the external drogue's umbrella also performs well in the final determination of the drogue object.

REFERENCES

[1] Papageorgiou, Constantine, and Tomaso Poggio. "A trainable system for object detection." International Journal of Computer Vision 38.1 (2000): 15-33.
[2] Mohan, Anuj, Constantine Papageorgiou, and Tomaso Poggio. "Example-based object detection in images by components." IEEE Transactions on Pattern Analysis and Machine Intelligence 23.4 (2001): 349-361.
[3] Viola, Paul, Michael J. Jones, and Daniel Snow. "Detecting pedestrians using patterns of motion and appearance." Proc. Ninth IEEE International Conference on Computer Vision, 2003.
[4] Mikolajczyk, Krystian, Cordelia Schmid, and Andrew Zisserman. "Human detection based on a probabilistic assembly of robust part detectors." Computer Vision - ECCV 2004. Springer Berlin Heidelberg, 2004. 69-82.
[5] Gavrila, Dariu M. "The visual analysis of human movement: a survey." Computer Vision and Image Understanding 73.1 (1999): 82-98.
[6] De Poortere, Vincent, et al. "Efficient pedestrian detection: a test case for SVM based categorization." Workshop on Cognitive Vision, 2002.
[7] Gavrila, D. M., Jan Giebel, and Stefan Munder. "Vision-based pedestrian detection: the PROTECTOR system." IEEE Intelligent Vehicles Symposium, 2004.
[8] Ronfard, Rémi, Cordelia Schmid, and Bill Triggs. "Learning to parse pictures of people." Computer Vision - ECCV 2002. Springer Berlin Heidelberg, 2002. 700-714.
[9] Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Efficient matching of pictorial structures." Proc. IEEE Conference on Computer Vision and Pattern Recognition, Vol. 2, 2000.
[10] Wang, Liang, Weiming Hu, and Tieniu Tan. "A survey of visual analysis of human motion." Chinese Journal of Computers 25.3 (2002): 225-237.
[11] Gavrila, Dariu M., and Vasanth Philomin. "Real-time object detection for 'smart' vehicles." Proc. Seventh IEEE International Conference on Computer Vision, Vol. 1, 1999.
[12] Ravela, S., and Allen R. Hanson. On Multi-Scale Differential Features for Face Recognition. University of Massachusetts Amherst, Dept. of Computer Science, 2005.
[13] Grabner, Helmut, and Horst Bischof. "On-line boosting and vision." Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, 2006.
[14] Benenson, Rodrigo, et al. "Pedestrian detection at 100 frames per second." Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
MASSACHUSETTS UNIV AMHERST DEPT OF COMPUTER SCIENCE, 2005. Grabner, Helmut, and Horst Bischof. "On-line boosting and vision." Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. Vol. 1. IEEE, 2006. Benenson, Rodrigo, et al. "Pedestrian detection at 100 frames per second." Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.