IEEE JOURNAL OF OCEANIC ENGINEERING, VOL. 29, NO. 3, JULY 2004


A Recurrent Neural Network for Detecting Objects in Sequences of Sector-Scan Sonar Images

Stuart W. Perry, Member, IEEE, and Ling Guan, Senior Member, IEEE

Abstract—This paper presents a system for detecting small man-made objects in sequences of sector-scan images formed using a medium-range sector-scan sonar. The detection of such objects is considered out to ranges of 200 m from the vessel and while the vessel is in motion. This paper extends previous work by making use of temporal information present in the data to improve performance. The system begins by cleaning the imagery, which is done by tracking objects on the sea bed in the imagery and using this information to obtain an improved estimate of the motion of the vessel. Once the vessel's motion is accurately known, the imagery is cleaned by temporally averaging the images after motion compensation. The detector consists of two stages. After the first detection stage has identified possible objects of interest, a bank of Kalman filters is used to track objects in the imagery and to supply sequences of feature vectors to the final detection stage. A recurrent neural network is used for the final detection stage. The feedback loops within the recurrent network allow the incorporation of temporal information into the detection process. The performance of the proposed system is shown to exceed the performances of other models for the final detection stage, including nonrecurrent networks that make use of temporal information supplied in the form of temporal feature vectors. The proposed detection system attains a probability of detection of 77.0% at a mean false-alarm rate of 0.4 per image.

Index Terms—Recurrent neural networks, sector-scan sonar, temporal features, underwater object detection, underwater object tracking.

I. INTRODUCTION

The ability to accurately detect and identify objects underwater is of great interest to commercial, government, environmental, and military authorities. Man-made objects placed on the sea floor, whether by accident or deliberate act, can pose a threat to shipping and the marine environment; hence, finding this type of object is of particular concern to many organizations. Sonar systems are often used to search for such objects because sound propagates through water with far less loss than light [2]. It is desirable to detect man-made underwater objects as far from the vessel as possible. This enables a large area to be surveyed efficiently and avoids the vessel passing too close to dangerous objects.

Manuscript received September 11, 2003; revised February 3, 2004. This work was supported by the Capability Technology Demonstrator Program, Australian Defence Organization, for an advanced mine-hunting sonar system (project SEA 1436). S. W. Perry was with the Maritime Operations Division, Defence Science and Technology Organization, Australia. He is now with Canon Information Systems Research Australia, North Ryde, NSW 2113, Australia (e-mail: [email protected]). L. Guan is with the Department of Electrical and Computer Engineering, Ryerson Polytechnic University, Toronto, ON M5B 2K3, Canada. Digital Object Identifier 10.1109/JOE.2004.831616

One solution to this problem is to use a forward-looking sector-scan sonar system mounted (or suspended) under the hull of a vessel. Sector-scan sonar systems usually project a pulse of acoustic energy onto a wedge-shaped region of the sea floor. The energy reflected from objects on the sea floor is then collected by a hydrophone array. The received sonar returns are beamformed to create an image of the insonified region of the sea floor. Sector-scan sonars typically transmit a sonar pulse every second; hence, the operator is able to view an image of the sea floor that updates every second. Even when the vessel is moving rapidly, a single object on the sea bed may be insonified many times. These multiple "looks" at an object can be used to improve the detection performance of a sector-scan sonar system.

In [1], a system was presented for detecting small man-made objects using a sector-scan sonar capable of imaging the sea floor out to 800 m from the vessel. That work focused on developing detection systems able to operate under different sonar pulse-length settings and image resolutions without readjusting the system parameters. Statistical detectors were compared with neural-network-based detectors, and it was found that neural networks worked better for the data set under consideration. The system in [1] attained a probability of detection of 92.4% at a false-alarm rate of ten per sonar image. The false-alarm rate was so high mainly because the temporal nature of the object returns was ignored: although the data was cleaned temporally, the detection system examined each sonar image independently (without considering the preceding sonar images in the sequence) and, hence, did not explicitly use the temporal information present in the sonar data during detection.

Temporal information is a key feature of how operators detect objects in sector-scan sonar imagery. In a cluttered environment, a single image is very difficult to interpret; many faint objects can look like clutter in a single image. The true identity of such an object can only be discerned by the fact that the object's acoustic return will appear in a predictable location from image to image as time progresses, whereas the returns due to clutter will appear at random locations in the imagery. In this paper, the system described in [1] is extended to include temporal information of the object returns in the detection process.

Detection systems making use of temporal information present in sector-scan sonar imagery have been examined in the past [3]-[13]. An algorithm for segmenting moving and static objects in sector-scan sonar imagery was presented in [4]. This algorithm was based on filtering the data in the temporal domain. Lane et al. used optical-flow motion estimation to track objects in sector-scan imagery [7]. The information extracted with optical flow was used to estimate an object's position from image to image in a sequence of sector-scan images.



Search-area constraints and the optical flow values were merged to compute a "confidence measure" that was then used to create a tracking tree to track objects in the sequence. Chantler and Stoner [6] developed an algorithm for classifying objects in short-range (less than 10 m), high-resolution sector-scan sonar imagery that takes temporal information into account. They considered a set of 11 geometric features from objects segmented in each sonar image and called these features "static features." From each static feature they extracted six statistical measures of how the value of that feature changed within the last ten images; they called these "temporal features." From the total of 66 temporal features, the best set of 20 features was determined by sequential forward selection. For the final classifier, a linear discriminant function was used with success. The tracking and detection research mentioned above was later merged into a complete system described in [8]. This system used the tracking algorithm presented in [7] to facilitate the computation of the temporal features required by the detection algorithm presented in [6].

Carpenter used decoupled extended Kalman filters to track objects on the sea bed for the purpose of estimating the position of an autonomous underwater vehicle (AUV) [9]. The sonar system used in that work was a two-dimensional (2-D) planar array capable of imaging the sea bed out to 400 m away from the sonar. A number of geometrical features were extracted from objects in the imagery; these features were used for object association from image to image and not for object classification. Ruiz et al. described the application of the Kalman-filter approach to tracking objects in sector-scan imagery [10]. They found that using a Kalman filter to solve the tracking problem was more computationally efficient than the optical-flow-based methods considered previously [7], [8]. In [11], the robustness of combinations of static and temporal features to varying segmentation and image-quality levels was examined. In that work, nine spatial geometric features were considered; four temporal measures were derived from each static measure to produce a total of 36 features. Similar to previous work, the best set of features was selected by sequential forward selection and a linear discriminant function was used for the final object classification. The sensor considered was a high-resolution sonar with a maximum range of 10 m. Petillot et al. developed a sonar image-segmentation and tracking system for the purpose of obstacle avoidance and navigation [12]. Their system was designed for a high-resolution sonar with a maximum range of 40 m and used Kalman-filter-based tracking. Static features were extracted from objects of interest to help with tracking, but classification of the objects was not considered. Carpenter and Medeiros used a Kalman-filter approach to identify objects for map matching [13]. Once again, that work was not concerned with classifying the objects encountered.

The previous work on using temporal information to help object classification has been limited to high-resolution short-range imagery; often, the objects to be classified are within 10 m of the sonar array. This type of sonar system would be fitted to an AUV and is primarily designed to help the AUV navigate and identify objects it encounters at close range. In this paper, we consider a lower resolution sonar system, which is capable of imaging objects out to 200 m from the array. This type of sonar is designed to detect objects at a greater distance and survey large areas rapidly.
The objects would be expected to occupy fewer pixels on the display when using this type of sonar and, hence, are more difficult to detect by a human or machine. Static and temporal features extracted from objects imaged using a low-resolution sonar would be expected to be noisier than the features extracted from the same objects imaged with a high-resolution sonar. Instead of extracting temporal features explicitly, in this paper we show that a recurrent neural network processing only the static features can, through delayed feedback connections, make use of the temporal information present in the sequence to greatly improve detection performance. This method has the advantage of not requiring the storage of previous feature values or the additional system load needed to compute temporal features.

This paper presents a method for detecting small man-made objects in sequences of sector-scan sonar imagery collected using a sonar system mounted under the hull of a vessel. The detection performance of the system presented in [1] is improved by taking into account the temporal information present in the image sequence. As in [1], we begin by estimating the motion of the vessel by tracking objects on the sea floor. Using this estimate, the imagery is temporally cleaned by averaging multiple pings after compensation for the vessel's motion. In the resultant image sequence, the appearance of objects lying on the sea floor is enhanced, while clutter noise is reduced. An adaptive threshold level is then used to segment each image into roughly 100-200 objects of interest.

The detection system consists of two stages. At the first stage, a neural-network detector similar to the one described in [1] is used to detect potential man-made objects in each sonar image. The neural network judges each candidate based on the 15 geometrical, statistical, and texture-based features presented in [1]. After the first stage, a bank of Kalman filters is used to track up to 20 objects determined to be potentially man-made by the first-stage detector. Using a Kalman filter to track an object through the image sequence allows a sequence of feature vectors describing only the object being tracked to be extracted. Temporal information encoded in this sequence can then be accessed. Instead of explicitly extracting another set of temporal features from the static features, we feed the sequence of static features into a recurrent neural network at the second stage. The time-delay feedback within the recurrent neural network enables the network to access temporal information present in the feature sequence without the need to calculate temporal features.

In this paper, we compare the results obtained using recurrent neural networks with those obtained using nonrecurrent neural networks. For both types of network, we compare the performances using static features only and using a combination of static and extra temporal features. We will show that although the addition of temporal features improves the performance of nonrecurrent neural networks, recurrent neural networks using only the sequence of static features still attain superior performance. Recurrent neural networks given access to temporal features as well as static features do not perform any better than recurrent networks using only the static feature set for the data in this investigation.

This paper is divided into a number of sections. Section I is the introduction, Section II describes the data set used for this investigation, and Section III details the way in which the data was preprocessed.
Section IV describes the first-stage nontemporal detector, and the object-tracking methodology is described in Section V. The various options for the second (final) detection stage considered in this paper are described in Section VI, while Section VII gives experimental results. The conclusion is presented in Section VIII.

II. DESCRIPTION OF DATA

An experimental sector-scan sonar system is mounted on the same trials vessel as used in [1]. The sonar system transmits a frequency-modulated (FM) pulse with a center frequency of approximately 100 kHz, a bandwidth of approximately 10 kHz, and a pulse length of approximately 100 μs. The sonar system insonifies (or pings) the underwater environment at a rate of one ping every 1.3 s. The sonar returns from each ping are received by the sensors of a linear hydrophone array and beamformed. The magnitudes of the beam signals are squared and integrated over short time intervals so that the range resolution is slightly larger than the pulse length. This data is then presented to the operator in the form of an image on a display. The operator sees a new image of the environment shortly after each sonar ping and is, hence, viewing a sequence of images of the sea floor that updates every 1.3 s. For this particular system, the sonar's pulse-length setting determines the range resolution of the images and how much of the sea floor the operator is shown in a single image. In this investigation, the term image refers to the image formed by the system using data from a single ping. The term sequence refers to a collection of consecutive images formed when the sonar is allowed to ping repeatedly. When the sonar is set up to transmit signals with pulse lengths of approximately 100-μs duration, the sea floor is imaged only out to 200 m from the sonar.

Fig. 1. Typical sonar image used in this investigation.

The data used in this work are collected during an experiment performed at Jervis Bay, NSW, Australia. Three small man-made test objects are placed on the sea floor roughly 400 m apart from each other, along a straight line. With the sonar operating, the vessel repeatedly passes over the objects, following the line along which they were laid. The sea state is moderate, with waves of approximately 0.1-0.5 m in height present. The water depth is 20 m. Due to the way in which the vessel is moving at the time the data is recorded, objects appear on the sonar display and apparently move toward the vessel before passing out of the sonar's field of view. The bearing and range at which each object first appears on the operator's display depend on a number of factors, including the vessel's position, the underwater acoustic environment, and the aspect angle of the object. For this reason, where and when each object will appear on the operator's display during each run of the experiment cannot be accurately predicted. The strength and shape of the object acoustic returns are quite variable from image to image due to environmental effects such as multipath propagation and changes in aspect angle as the vessel moves. Fig. 1 shows an image typical of that obtained by the sonar during this experiment in polar coordinates. One of the test objects can be seen in the middle left of the image at a range of approximately 110 m. Note that a number of natural objects are visible in Fig. 1, as well as clutter artifacts.

In this work, the term run will be used to describe a single passage of the vessel through the test field. During a run, the vessel passes within sonar range of each of the three test objects exactly once. Once the vessel has reached the end of a run, it turns around and begins a new run by passing through the test field in the opposite direction. In this experiment, for half the runs the vessel travels along a bearing of approximately 140° (relative to north) and for the other half a bearing of approximately 320°. In this investigation, 12 runs are considered, containing a total of 6970 pings.


Fig. 2. Sonar image following the cleaning operation.

Ground-truth data on the positions and times at which objects appear in the imagery are determined by a human analyst after the experiment. Despite careful examination of the data (images) from each run, human analysts are unable to spot all three objects in some runs. It is likely that the sonar returns from some objects are lower than the reverberation in certain runs. We only consider those data in which a human analyst is able to spot an object for training and testing the algorithms presented. When an object can be discerned by a human analyst, we will say that the object was encountered by the sonar. In the data set used in this work, there are 30 encounters present. The average duration of an encounter was about 50 pings; some encounters lasted for as little as ten pings and others for as long as 100 pings.

III. PREPROCESSING

A number of preprocessing operations are required to reduce the effects of clutter noise, enhance the appearance of objects on the sea bed, and produce a segmentation of each image in preparation for the detection procedure. The preprocessing operations used in this paper are almost identical to those discussed in [1]. First, the vessel's motion from ping to ping is estimated by tracking a bright object detected on the sea bed using a simple threshold detector. In [1], a bright object found in the current ping is matched with an object found in the next ping with the aid of rough vessel-motion information obtained from Doppler sonar sensors on the vessel's hull. A small region of each image containing the object is converted into Cartesian coordinates, followed by a cross-correlation of the two regions. The position of the peak in the cross-correlation becomes an estimate of how far the object (and, hence, the vessel, since it is assumed that all objects are stationary on the sea floor) has apparently moved in the time interval between the images. In this paper, the bright object found in the current ping is tracked using a Kalman filter.

This filter, which is used to refine the motion estimate, is identical in implementation to the bank of Kalman filters used for object tracking after the first stage of detection. For that reason, we defer the full description of the Kalman filter until Section V. The crude vessel-motion information becomes an input to the Kalman filter, in this case to help predict the position of the currently tracked object in the next image. The improved motion estimate is obtained from the state vector of the Kalman filter.

The tracking system is fairly simple at this stage. The currently tracked object is abandoned if a match cannot be found within the vicinity of the object's predicted position in the next image or if another object is found that appears more than twice as bright as the current object being tracked. In the former case, a new bright object will be searched for and tracked. In the latter case, the brighter object is likely a different object and will be tracked. Note that, for motion estimation, the Kalman filter always tracks the brightest object. The resultant motion estimate (which we will denote the new motion information) is given either by the rough motion information obtained from the Doppler sonar sensors or by the more accurate information obtained from the Kalman filter. When an object is being tracked successfully, which happens when the image sequence shows a bright object apparently moving in a path consistent with the vessel's motion, the new motion information is provided by the Kalman filter. When no bright objects can be discerned or when a track is abandoned, the new motion information is provided by the Doppler sonar sensors.

The rest of the preprocessing proceeds as in [1]. Each image is averaged with the previous five images in the sequence after suitable alignment based on the new motion information. Fig. 2 shows the result of the cleaning operation for the image in Fig. 1. Note that the object of interest has been contrast enhanced while clutter has been suppressed.
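As an illustration, the correlation and averaging steps can be sketched as follows. This is a minimal NumPy sketch under our own conventions; the FFT-based correlation, the function names, and the patch handling are illustrative assumptions, not the implementation used in this work:

```python
import numpy as np

def estimate_shift(patch_prev, patch_curr):
    # Cross-correlate the two Cartesian patches in the frequency domain;
    # the correlation peak gives the apparent (dy, dx) shift in pixels.
    spec = np.fft.fft2(patch_prev) * np.conj(np.fft.fft2(patch_curr))
    xcorr = np.fft.ifft2(spec).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    # Indices past the midpoint correspond to negative shifts.
    return tuple(p if p <= n // 2 else p - n for p, n in zip(peak, xcorr.shape))

def clean_image(history, shifts):
    # Average the newest image with its five predecessors after shifting
    # each predecessor by its cumulative motion estimate (dy, dx).
    aligned = [np.roll(img, shift, axis=(0, 1))
               for img, shift in zip(history, shifts)]
    return np.mean(aligned, axis=0)
```

Because the objects are assumed stationary on the sea floor, the measured patch shift is attributed entirely to vessel motion and can be fed to the Kalman filter as a measurement.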



Fig. 3. Segmentation of the sonar image.

The next stage is to segment and label objects in the imagery. Segmentation is performed by examining each pixel in the image and computing the mean and standard deviation of its local neighborhood. A threshold value for each pixel is computed as

$$T(x,y) = \mu(x,y) + c\,\sigma(x,y) \qquad (1)$$

where $\mu(x,y)$ is the mean of the local neighborhood of the pixel at $(x,y)$ and $\sigma(x,y)$ is the standard deviation of the local neighborhood. The constant $c$ controls the quality of the segmentation. If the intensity of the pixel at position $(x,y)$ exceeds $T(x,y)$, the pixel is set to unity in the segmentation; otherwise, it is set to zero. The segmentation is then examined to remove objects smaller than a certain size. Fig. 3 shows a typical segmentation result; the same value of $c$ is used for all the data in this paper. Note that in [1] a different data set was used and the vessel was tied to a wharf during the experiment. These differences change the properties of the data; for this reason, some of the preprocessing parameters are different in this paper. In [1], the length of the temporal filter was six images and the value of $c$ used in the segmentation was 2.5.
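A minimal sketch of this segmentation, assuming a square local neighborhood; the window size, minimum object size, and the value of c below are illustrative, not the settings used in this work:

```python
import numpy as np
from scipy.ndimage import uniform_filter, label

def segment(image, c=3.0, window=15, min_size=5):
    # Local mean and standard deviation over a square neighborhood.
    mean = uniform_filter(image, size=window)
    mean_sq = uniform_filter(image * image, size=window)
    std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
    # Equation (1): keep pixels brighter than the adaptive threshold.
    mask = image > mean + c * std
    # Remove connected components smaller than min_size pixels.
    labels, _ = label(mask)
    counts = np.bincount(labels.ravel())
    keep = counts >= min_size
    keep[0] = False  # label 0 is the background
    return keep[labels]
```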

TABLE I SET OF STATIC FEATURES

IV. FIRST STAGE OF DETECTION

The nontemporal neural-network-based detection system described in [1] is used for the first stage of detection. Following preprocessing, the detection of objects of interest is made by a multilayer perceptron (MLP) [14]. Each candidate object identified by the segmentation procedure has a set of 15 geometric, statistical, and texture-based features extracted. The features used in this work are listed in Table I. These features were selected from a larger group of 31 features using a sequential backward selection approach. The exact equations for these features, as well as the details of their selection, were described in [1].



The nontemporal detector developed in [1] was designed to function under a number of different transmitted signals. To facilitate this, an extra feature was used to denote the mode of operation of the sonar. This additional feature is not used in this work, since we are dealing with only one type of transmitted signal. The neural network used for the first stage of detection in this paper has three layers: the input, hidden, and output layers, which consist of 15, 20, and 1 neuron, respectively. For training purposes, the neural weights are initialized using the Nguyen-Widrow method [15] and training is performed by the Levenberg-Marquardt algorithm [16]. These algorithms will also be used to train the neural networks used for the final stage of detection.
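The forward pass of this detector can be sketched as follows. Random weights stand in for the trained values here, and the initialization and training algorithms cited above are omitted; the function names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

# 15-20-1 architecture of the first-stage detector.  Nguyen-Widrow
# initialization and Levenberg-Marquardt training are omitted; random
# weights stand in for trained values in this sketch.
W1, b1 = rng.normal(scale=0.5, size=(20, 15)), np.zeros(20)
W2, b2 = rng.normal(scale=0.5, size=20), 0.0

def confidence(features):
    # Forward pass: 15 static features -> detection confidence in (0, 1).
    hidden = np.tanh(W1 @ features + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ hidden + b2)))

def first_stage_detect(features, threshold=0.5):
    # Candidates whose confidence exceeds the judgment threshold are
    # passed on to the tracking stage.
    return confidence(features) > threshold
```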

V. OBJECT TRACKING

The detected objects from the first stage of detection are passed to the tracking stage. The tracking stage consists of 20 independent Kalman filters. Each filter has the same design, but does not exchange any information about its tracked object with any of the other filters. This is a valid approach, since the objects being tracked are located on the sea floor and can be expected not to have intersecting tracks. Proper multiobject tracking algorithms are described in [17].

The Kalman filter makes a number of assumptions about the objects in the imagery. It assumes that all errors are Gaussian and that each object in the data at time $k$ is completely represented by a four-element state vector

$$\mathbf{s}_k = \begin{bmatrix} x_k & y_k & \dot{x}_k & \dot{y}_k \end{bmatrix}^T \qquad (2)$$

where $x_k$ is the $x$ coordinate of the object's centroid in the sonar image and $y_k$ is the $y$ coordinate of the object's centroid in the sonar image. The transmitter is located at the center of the hydrophone array, which lies along the $x$ axis. The apparent motion of objects in the imagery is assumed to consist of two components. The first is known and is given by the motion estimate computed in Section III, which is used as an input to the filter. The second, residual, motion component is unknown and accounts for the error in the motion estimate from Section III and other effects, such as angular motion of the vessel (yaw, pitch, and roll) and the changing environment. The variables $\dot{x}_k$ and $\dot{y}_k$ are the components of the object's velocity due to residual motion along the $x$ and $y$ axes, respectively. The evolution of the object's state from time $k$ to time $k+1$ can be described by the following state-space equation:

$$\mathbf{s}_{k+1} = \mathbf{F}\mathbf{s}_k + \mathbf{u}_k + \mathbf{v}_k \qquad (3)$$

where the state-transition matrix for the ping interval $T$ is

$$\mathbf{F} = \begin{bmatrix} 1 & 0 & T & 0 \\ 0 & 1 & 0 & T \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \qquad (4)$$

The vector $\mathbf{u}_k$ is the effect that the estimate of the vessel's velocity (computed in Section III) would have on the apparent motion of objects in the image along both axes at time $k$; this is the first (known) motion component described above. The vector $\mathbf{v}_k$ represents Gaussian process noise with covariance matrix $\mathbf{Q}$. A full description of the Kalman filter can be found in [17] and [18] and, hence, will not be given here. The measurement equation for this system is given by

$$\mathbf{z}_k = \mathbf{H}\mathbf{s}_k + \mathbf{w}_k \qquad (5)$$

where

$$\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \qquad (6)$$

and $\mathbf{w}_k$ is the additive measurement noise vector with covariance $\mathbf{R}$ in the Cartesian domain. The system in (5) and (6) indicates that the only quantities of the object's state we can directly measure from the data are the $x$ and $y$ coordinates of the object. These measurements are also expected to have errors associated with them. In reality, we measure the range and bearing of the objects and convert these measurements into Cartesian coordinates. A method to perform this conversion without introducing biases into the measurements is described in [19], which also describes how to calculate $\mathbf{R}$, which is not a constant but varies with the measurement.

Following the first stage of detection, most of the false alarms will have been rejected. The detections that survive have a confidence value associated with them; the confidence value is simply the output of the neural network used for the first stage of detection. The tracking system starts by assigning the 20 objects with the highest confidence values to initialize the available Kalman filters. Usually, fewer than 20 objects will be passed by the first stage of detection, so some of the available Kalman filters will not be used. For each Kalman filter, the state vector is initialized by setting the $x$ and $y$ elements of the vector to the coordinates of the object found, and the initial estimate of the object's residual velocity is set to zero. This is reasonable, since the error in the motion estimate from Section III (i.e., the residual motion) is usually small. Each Kalman filter has a track-hits counter variable associated with it. The track-hits counter is set to three upon initialization, since new objects appearing on the display are likely to be faint and so are given a "grace period" of two pings to establish a strong track. An initial estimate of the state covariance matrix is computed using the method described in [17].
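A sketch of the resulting predict/update cycle follows. The transition and measurement matrices follow the constant-velocity forms in (3)-(6); the numerical value of Q and the function interface are illustrative assumptions:

```python
import numpy as np

T = 1.3  # ping interval (s)

# Constant-velocity model matrices per (3)-(4); Q is illustrative.
F = np.array([[1., 0., T, 0.],
              [0., 1., 0., T],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.]])
H = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.]])  # only (x, y) is measured, per (5)-(6)
Q = 0.01 * np.eye(4)              # process-noise covariance

def predict(s, P, u):
    # u is the known displacement input from the Section III motion
    # estimate; the filter itself models only the residual motion.
    s_pred = F @ s + u
    P_pred = F @ P @ F.T + Q
    return s_pred, P_pred

def update(s_pred, P_pred, z, R):
    # Fold one (x, y) measurement with covariance R into the state.
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    s_new = s_pred + K @ (z - H @ s_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return s_new, P_new, S
```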


In the next image, for each active Kalman filter, the state of the currently tracked object is predicted using (3). Based on this new state, the next measurement is predicted using (5). The measurement prediction error covariance matrix is used to form a validation region (refer to [18] for details). Measurements separated from the predicted measurement by more than four standard deviations of the measurement prediction error have a very low probability of association with the currently tracked object and can be ignored as clutter. Hence, all points within four standard deviations of the measurement prediction error from the predicted measurement define the validation region. If a tracked object is not seen (i.e., no measurement falls inside the validation region) for a couple of pings, the size of the validation region will increase. The system described in (3) does not take into account the fact that the velocity components of the object's state vector represent only residual motion and, hence, cannot be much larger than zero; consequently, the validation region can grow too large when a currently tracked object is not seen for a few pings. To remedy this situation, the size of the validation region is restricted so that it does not include points more than 5 m from the measurement prediction.

If a candidate object is found within the validation region, the tracking system has found a "match." If multiple objects are found within the validation region, the tracking system uses the measurement closest to the predicted measurement. This is called the nearest-neighbor algorithm and works well in situations in which the clutter is not too severe [17]. Each time a match is found, the Kalman filter updates the estimate of the state to include information from the new measurement and also updates the estimate of the state covariance matrix [17]. In addition, the track-hits counter is incremented. If no match is found, the track-hits counter is decremented. A Kalman filter can be freed from tracking an object if that object passes outside the sonar field of view or if the track-hits counter drops to zero.

At this stage, any unmatched candidates are examined as potential new objects for tracking. These candidates are sorted in order of confidence to ensure that the strongest candidates are considered first. If there are free Kalman filters, the candidates with the highest confidence values will be assigned to these filters. When a Kalman filter becomes active, it will be initialized with the information from the associated candidate. The value of the relevant track-hits counter is set to three. The initial estimates of the tracked object's state and state covariance are computed as above. If insufficient Kalman filters are available to accommodate all the unmatched candidates, the candidates left behind can take over currently tracked objects with confidences lower than their own. Any candidate still not assigned a Kalman filter at this point is discarded.
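The gating and association logic can be sketched as follows. This is a minimal version; using the Mahalanobis distance for the nearest-neighbor choice is our assumption:

```python
import numpy as np

def associate(z_pred, S, candidates, n_sigma=4.0, max_dist=5.0):
    # Validation gate: accept only measurements within four standard
    # deviations (Mahalanobis) of the predicted measurement, and never
    # farther than 5 m from it; return the nearest survivor or None.
    S_inv = np.linalg.inv(S)
    best, best_d2 = None, np.inf
    for z in candidates:
        innovation = z - z_pred
        d2 = innovation @ S_inv @ innovation
        if d2 > n_sigma ** 2 or np.linalg.norm(innovation) > max_dist:
            continue
        if d2 < best_d2:                     # nearest-neighbor rule
            best, best_d2 = z, d2
    return best
```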


VI. FINAL STAGE OF DETECTION

In this section, we discuss the various options for the second, or final, detection stage of the proposed system.

A. Nonrecurrent Final-Stage Detector Without the Tracking Stage

The simplest option for the final detection stage of this system is to feed the candidate objects detected by the first-stage system into another nonrecurrent neural network without tracking the objects to extract temporal information. In this paper, we consider a second-stage neural network with the same architecture as the first-stage detector. However, the final-stage neural network is trained only on candidate objects detected by the first-stage detector. This is a well-known way to boost the performance of a detection system and has been used very successfully to help detect objects in side-scan sonar imagery [20]. In [20], it was noted that dominant cluster centers in the distribution of the features in feature space can tend to bias training away from the class boundaries. In effect, the many easy-to-classify samples hinder the detector from concentrating on the hard-to-classify samples near the decision boundary in feature space. Mitigating this effect is part of the rationale behind support vector machines [21]. Cascading neural-network detectors can, therefore, be expected to improve performance, since the rejection of false alarms in earlier stages helps later stages by deemphasizing the dominant cluster centers associated with the false alarms. In the case of sector-scan sonar, using this method alone still has the disadvantage of ignoring the temporal information present in the data.

B. Nonrecurrent Final-Stage Detector With the Tracking Stage

A step up from the system described in Section VI-A is to add the tracking stage described in Section V. After the first detection stage, the surviving candidates are tracked by the bank of Kalman filters; only objects that are tracked successfully are considered as potential man-made objects, and their feature vectors are fed into another nonrecurrent neural network. The fact that there are only 20 Kalman filters available to track objects is a simple way to help reject false alarms (assuming that there are no more than 20 man-made objects in the sequence at any one time). When no object is found in the validation region, the tracking system simply returns the feature vector from the last ping. Although this may cause false alarms when very bright clutter events occur, it would be expected to help matters in the more common case of objects of interest fading temporarily due to environmental effects. However, like the method in Section VI-A, this method still does not make good use of the temporal information present, as it examines the feature vectors of a tracked object individually.

C. Recurrent Final-Stage Detector With the Tracking Stage

A nonrecurrent neural network has no memory of its previous output. That is, when supplied with a vector of features, the network will arrive at the same judgment regardless of the previous samples encountered; there is no form of memory allowed for in the network architecture. In most cases, this is desirable, as in many classification problems the probability that the current sample belongs to a certain class is independent of the probability that the previous sample belonged to the same class. In the problem of concern to this paper, this is not so. The probability that an object of interest is in the current image is much higher if that object has been observed in images in the immediate past. Human operators often identify such objects in the imagery by watching how the objects persist over time. Hence, time is a key feature of this problem.




Fig. 4. Example of a recurrent neural network.

Sometimes, temporal information can be presented to a classifier by creating large feature vectors that contain the features from the current sample as well as features from a number of previous samples [21]. This can, however, be unwieldy and result in networks that contain a large number of neurons and are difficult to train. Recurrent neural networks are specifically designed to learn and recognize sequential or time-varying patterns in data [22]. In a nonrecurrent neural network, neurons receive their inputs only from the outputs of neurons in lower layers; the data flows in only one direction. A recurrent neural network is any neural network whose neurons can receive inputs from neurons within the same layer or upper layers (as well as from neurons in the lower layers in most cases). In particular, feedback connections from events and neuron states in the past give certain types of recurrent neural network the ability to learn time-varying patterns.

There are many types of recurrent neural networks [22], [23]. For this investigation, a basic type of recurrent network is used, which has an architecture similar to the MLP. The network has three layers: input, hidden, and output. Like the system in Section IV, the output layer has a single neuron, and the network's judgment about a given object is obtained by thresholding the value of the output neuron. However, the previous output-layer states of the network are stored for a number of time cycles and fed back into the network as new features. If the three previous output values are stored, then the network will in effect be using 18 features (the 15 static features and the three previous output values). Because of the temporal information used by the network, training becomes more computationally demanding. However, the potential benefits of this type of network can outweigh this disadvantage. Fig. 4 shows an example of a recurrent neural network with the previous five output-layer states stored in a tapped delay line. The symbol "D" in the tapped delay line represents a time delay of one sample.
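A minimal sketch of such a network follows; the hidden-layer size and delay count match those found best in Section VII, but the class interface is our own and the random weights merely stand in for trained values:

```python
import numpy as np
from collections import deque

class RecurrentDetector:
    # An MLP whose previous five outputs are stored in a tapped delay
    # line (the "D" blocks in Fig. 4) and fed back as extra inputs:
    # 15 static features + 5 delayed outputs = 20 network inputs.
    def __init__(self, n_static=15, n_hidden=30, n_delays=5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.3, size=(n_hidden, n_static + n_delays))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.3, size=n_hidden)
        self.b2 = 0.0
        self.delays = deque([0.0] * n_delays, maxlen=n_delays)

    def step(self, static_features):
        x = np.concatenate([static_features, list(self.delays)])
        hidden = np.tanh(self.W1 @ x + self.b1)
        y = 1.0 / (1.0 + np.exp(-(self.W2 @ hidden + self.b2)))
        self.delays.appendleft(y)  # newest output enters the delay line
        return y                   # thresholded to give the judgment
```

Calling step once per ping with the tracked object's static features lets the network's past judgments influence its current one, which is the mechanism that carries the temporal information.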

D. Detectors Augmented With Temporal Features

Another approach to including temporal information in the detection process is the computation of additional temporal features. The temporal features are formed by storing the static features for a number of consecutive images and computing statistical quantities such as the mean, standard deviation, rate of change, etc., of each feature over time. This is only made possible by the presence of an object-tracking stage, which ensures that time sequences of features from the same object can be extracted. Each statistical measure produces another feature vector of equal size to the vector of static features. The number of features available quickly grows, and a method of feature-set reduction must be used to avoid the "curse of dimensionality" [14]. In [6] and [11], this approach was shown to improve detection performance; however, a number of problems remain. The temporal features require the storage of past static features, and their computation must be done at the same time as that of the static features; hence, they place an increased computational load on the detection system while it is operating. Note that a recurrent neural network requires a larger computational load during training, but does not put as much of a computational load on the system while it is operating. Also, if the recurrent network is powerful enough, it may be able to accurately model the temporal nature of the data without the computation of temporal features.

For comparison with the proposed recurrent neural-network approach, two other detectors that make use of temporal features were created. In this paper, two temporal measures are computed from each static feature. The feature vectors of an object being tracked by a Kalman filter are added to the storage area assigned to that object. Each feature vector consists of 15 static features. When a Kalman filter starts to track an object, the associated storage area begins to fill with the feature vectors extracted from that object. When the storage area contains ten feature vectors, new feature vectors begin to displace the oldest stored vectors. The mean and standard deviation of each of the 15 static features over time are computed using the stored feature vectors. This occurs as soon as the storage area contains feature vectors, not just when the storage area is full. The addition of these temporal features brings the total number of features to 45. We call this feature set the augmented feature set. Not all of these features are guaranteed to be useful, so, in this paper, we use three different methods to reduce the set of features to a useful size. The feature-set reduction techniques we consider are principal component analysis [21], sequential backward selection, and sequential forward selection [1]. The optimal subset of the augmented feature set is then fed into recurrent and nonrecurrent neural networks.
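The augmented-feature computation can be sketched as follows; the class name and interface are illustrative:

```python
import numpy as np
from collections import deque

class AugmentedFeatures:
    # Per-object sliding store of the last ten static feature vectors;
    # augments each new vector with the per-feature mean and standard
    # deviation over the stored history (15 + 15 + 15 = 45 features).
    def __init__(self, depth=10):
        self.history = deque(maxlen=depth)

    def __call__(self, static_features):
        vec = np.asarray(static_features, dtype=float)
        self.history.append(vec)
        stacked = np.vstack(self.history)
        return np.concatenate([vec, stacked.mean(axis=0), stacked.std(axis=0)])
```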



Fig. 5. Performance of the first detection stage.

VII. EXPERIMENTAL RESULTS

Each of the runs described in Section II is preprocessed to remove clutter and segmented in the manner described in Section III. As described in Section II, a human operator determines which segmented objects in each sequence correspond to objects of interest and which correspond to clutter events or natural structures on the sea floor not of interest to this investigation. Denote the segmented structures in the imagery as structure examples or just examples. The segmented structures corresponding to objects of interest are, therefore, positive examples and any other segmented structures in the imagery are negative examples. It is important to note that there are many more negative examples in the data set than positive examples, which has an important effect on how the detectors are trained and how their performance is measured.

The data is divided into two primary sets. Set A contains seven runs and a total of 4450 pings. This is the set from which all training data will be extracted in this investigation. The runs in Set A were all collected during a single day of the trial. Set B contains five runs and a total of 2520 pings. The data in this set will not be used in training any detector, but will be used to evaluate the performance of the various detection systems. Four of the runs in Set B were collected on a different day than the runs in Set A; the remaining run was collected on the same day as those in Set A.

A. First Stage of Detection

The first step is to train the first detection stage as described in Section IV. A master training set is created from the data in Set A. This set consists of all positive examples in Set A and a random sampling of the negative examples. The number of negative examples used in the master training set is equal to the number of positive examples. If the detector were trained using all the data from Set A, the overwhelming number of negative examples would bias the training away from the desired result. However, since some of the data in Set A are not used in training, it is important that the detector's performance is tested on the entirety of Set A. The master training set is further divided into two subsets. The training set consists of 75% of the examples in the master training set, randomly selected. The remaining 25% of the examples form a validation set.

The examples in the validation set are not used to directly adjust the network weights; however, after each training epoch, the performance against the validation set is measured. If the error computed on the validation set begins to rise during training, the training is halted. This is a common approach to avoid overtraining neural networks.

Fig. 5 shows the performance of the first detection stage for Sets A and B, as well as the overall performance on the entire data set. The horizontal axis is the average number of false alarms per image, while the vertical axis is the probability of detecting the objects of interest. The curves in Fig. 5 were created by varying the output judgment threshold of the neural network and measuring the number of detections and false alarms at each threshold level. Low thresholds favor the detection of the positive examples, whereas high thresholds favor the rejection of the negative examples. The final detection stage needs the first detection stage to be set at a particular operating point on the curves in Fig. 5. An operating point was chosen so that the first detection stage has a probability of detection of 87.8% over the entire data set with an average of 21 false alarms per image.

This performance is poorer than the performance reported for a similar system in [1]. The experimental conditions under which data was collected were more benign in [1] than in this paper: in [1], the vessel was tied to a wharf and the sea conditions were calmer, whereas in this work the vessel is in motion at approximately 3 kn, the water depth is greater, and the sea conditions are rougher. These factors cause a much greater variability in the strength and form of the acoustic returns from the test objects and make objects of interest much harder to detect.

It should be noted that the above probability of detection of 87.8% does not mean that 87.8% of the test objects are found by the first detection stage. Rather, it means that for each image in which a test object appears, the detector has an 87.8% probability of finding the object in that image. Alternatively, if an object of interest appears on the operator's display for 100 pings in a run, the first detection stage will correctly mark the object on the display in roughly 88 pings.
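The threshold sweep used to generate such curves can be sketched as follows; the threshold grid and function interface are assumptions:

```python
import numpy as np

def roc_points(confidences, is_object, n_images):
    # Sweep the output judgment threshold; at each setting record the
    # probability of detection and the mean false-alarm rate per image.
    confidences = np.asarray(confidences)
    is_object = np.asarray(is_object, dtype=bool)
    points = []
    for t in np.linspace(0.0, 1.0, 101):
        flagged = confidences > t
        pd = flagged[is_object].mean()
        fa_rate = np.count_nonzero(flagged & ~is_object) / n_images
        points.append((fa_rate, pd))
    return points
```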


Fig. 6. Performance of the complete two-stage detection systems using only static features in the second stage.

This convention is used for all the detectors in this paper. If the goal of the detection system is to indicate the presence of objects in the imagery, then a very high probability of detection for objects in a single image is not required for sector-scan sonar (the case is very different for side-scan sonar systems). This is because an object will appear on the sonar display for a number of pings, so it is sufficient that the object is detected for most of those pings.

B. Final Stage of Detection Using Only Static Features

There are two types of system for the final detection stage using only static features. The first type is described in Section VI-A and does not make use of the object-tracking system. The second type includes those systems described in Sections VI-B and VI-C that do make use of the tracking system.

For the first type of system, a master training set is created consisting only of positive and negative examples from Set A accepted (as potential man-made objects) by the first detection stage in Section VII-A. The training set is created from the master training set using all of the positive examples in five of the seven runs and a randomly sampled subset of the negative examples in the same five runs. Once again, the numbers of negative and positive examples are made equal. The first-stage detector rejects some of the negative examples; however, the number of negative examples in the master training set is still much greater than the number of positive examples. The validation set is created in a similar way from the remaining two runs in the master training set not used to create the training set. The architecture of this system is identical to that of the first detection stage, and the number of neurons in the hidden layer was set to 20.

For the second type of system, the candidate examples from Set A accepted by the first detection stage are passed to the tracking system (described in Section V). The tracking system performs well and is able to robustly track almost all objects of interest as they are encountered by the sonar. A human analyst identified the pings during which the objects of interest were being tracked and which of the 20 Kalman filters were tracking the object. This information was then used to extract post-tracking positive and negative training examples from Set A to create a master training set. Once again, the training examples contain many more negative examples than positive examples. If we consider a single Kalman-filter tracker out of the 20 in the tracking system, we see that for most pings the filter will be tracking nothing, a clutter event, or an uninteresting feature on the sea floor. The output of a single Kalman filter during a run will be a time sequence of feature vectors (if we exclude the pings when the filter was inactive), with an occasional contiguous block of feature vectors from an object of interest.

A training set is created by taking five of the seven runs in the master set and including the outputs of specific Kalman filters in the tracking system such that most positive examples are represented and there are approximately as many feature vectors from negative examples as from positive examples. The training set then consists of a single long sequence of feature vectors derived from joining together contiguous blocks of feature vectors from negative and positive examples alternately. The validation set was created in the same way, but using only data from the two runs not used to create the training set. The reason for this complicated method of forming the training and validation sets is to preserve the temporal information obtained by the tracking algorithm.

Nonrecurrent and recurrent networks were trained using this data.



Fig. 7. Closer look at Fig. 6 in the region of low false-alarm rates.

TABLE II PROBABILITY OF DETECTION AT A MEAN FALSE-ALARM RATE OF 0.4 PER IMAGE USING STATIC FEATURES ONLY

For the nonrecurrent network, after experimentation, it was found that the optimal performance was obtained with a hidden layer of 20 neurons. For the recurrent neural network, a hidden layer of 30 neurons, with the previous five network outputs fed back into the system, was found to work best. Hence, the number of neurons in the recurrent network is greater than in the competing methods. However, the performance of the nonrecurrent networks was not found to improve with more than 20 neurons in the hidden layer, despite numerous attempts.

Fig. 6 shows the overall performance of the complete two-stage detection system for each of the three different implementations of the final detection stage that use only static features. The overall performance is defined as the performance of the system on the entire data set. The overall performance of the first detection stage has been included in the plot for reference. It can be readily seen that, for low mean false-alarm rates, the addition of a second detection stage has greatly increased the probability of detecting the objects of interest. The second notable aspect of Fig. 6 is that, as the mean false-alarm rate rises toward ten false alarms per image, the performance of the detector that does not use tracking converges to the performance of the first detection stage. This is expected since, without the extra information made available by tracking the objects in the imagery, the complete

system can never attain a probability of detection greater than that of the operating point of the first detection stage (87.8% in this case). However, both the nonrecurrent and recurrent networks that make use of the tracking system eventually reach probabilities of detection greater than that of the first detection stage. All of the detectors in this work have their performances measured in the same manner: the detectors that use tracking have their performances measured against all objects, including those that the tracking stage did not acquire or track well, to make fair comparisons between the different detectors.

Fig. 7 shows the overall performances of the three different systems for the final detection stage that use only static features in the region of low false-alarm rates. The recurrent neural network outperforms the other approaches. Table II gives the performances of the various systems at a mean false-alarm rate of 0.4 per image. Fig. 8 gives a more detailed view of the performance of the recurrent neural-network approach by explicitly showing the performance of the system on Sets A and B, as well as the overall performance on both data sets. Despite the fact that no training examples were drawn from Set B, the performance on Set B is no worse than that on Set A, indicating that the network has not been overtrained.



Fig. 8. Performance of the complete two-stage detection system using the recurrent neural network and only static features for the second stage.

Fig. 9. Performance of the complete two-stage detection systems using static and temporal features processed by PCA.

C. Final Stage of Detection Using the Augmented Feature Set

In Section VI-D, the methodology for the extraction of temporal features for a tracked object was discussed. The inclusion of temporal features raises the total number of features used to characterize an object in the imagery to 45. This set of 45 features is referred to as the augmented feature set.

TABLE III PROBABILITY OF DETECTION AT A MEAN FALSE-ALARM RATE OF 0.4 PER IMAGE USING PCA APPLIED TO THE AUGMENTED FEATURE SET



Fig. 10. Mean-square error versus feature set size for SFS and SBS.

The number of features is excessive, and three different selection methods are used to reduce the dimensionality of the feature set to something more manageable.

The first feature-selection technique is principal component analysis (PCA) [21], which can be very useful when the problem is linear in nature; when the problem is nonlinear, this technique may not produce the best results [21]. For this work, we found the top 15 principal components of the augmented features extracted from the examples in Set A. These principal components accounted for 98.8% of the variation in the data set. Training and validation sets are formed in the same way as in Section VII-B. After some experimentation, the optimal numbers of neurons in the hidden layers of the recurrent and nonrecurrent networks are found to be 30 and 20, respectively. Once again, the optimal number of feedback connections into the recurrent network is found to be five. Fig. 9 shows the overall performances of the two-stage detection systems using the recurrent and nonrecurrent networks with the PCA-derived features for the second stage. Table III gives the measured probabilities of detection at a mean false-alarm rate of 0.4 per image for the two detectors. Both systems perform similarly. Comparing them with Table II, the nonrecurrent neural network benefits from the temporal features and has gained an improvement in performance. The recurrent network, however, has suffered a degradation in its performance and appears not to benefit from the inclusion of the temporal features.
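A minimal SVD-based sketch of this reduction; the interface and variable names are our own:

```python
import numpy as np

def fit_pca(features, n_components=15):
    # features: one row per training example, 45 columns.  The top
    # principal axes come from the SVD of the centered data.
    mu = features.mean(axis=0)
    _, sing, vt = np.linalg.svd(features - mu, full_matrices=False)
    explained = (sing[:n_components] ** 2).sum() / (sing ** 2).sum()
    basis = vt[:n_components]
    return mu, basis, explained

# Project any feature vector x onto the reduced space with
# (x - mu) @ basis.T before feeding it to the second-stage network.
```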

The next feature-selection techniques are sequential forward selection (SFS) and sequential backward selection (SBS). These techniques are implemented in the same manner as described in [1]. Fig. 10 shows the mean-square detection errors for different sizes of feature sets. Note that as the number of features included in the detection process increases, the detection error drops. However, after about 20-25 features, adding features to the set produces very little gain. In this case, the SFS technique performs slightly better than the SBS technique. Both techniques suggest using a feature vector of approximately 25 features. The feature vector chosen by the SFS technique is used to train the detectors. Once again, the training and validation sets are formed in the same way as in Section VII-B. Due to the larger size of the feature vector, it is found that the nonrecurrent neural network performs best when the number of neurons in its hidden layer is increased to 30. However, the recurrent network performs best when its hidden layer remains at 30 neurons with the previous five outputs fed back into the input layer.

Fig. 11 shows the overall performances of the two-stage detectors using the recurrent and nonrecurrent networks with the SFS-derived features for the second stage. Table IV gives the measured probabilities of detection at a mean false-alarm rate of 0.4 per image. In this case, the nonrecurrent neural network outperforms the recurrent network. Comparing it with Table II, the nonrecurrent neural network benefits from the temporal features and has gained an improvement in performance. The recurrent network, however, has suffered a degradation in its performance; it does not perform as well as the nonrecurrent network, although the difference is small.

Among the different implementations of the final detection stage, the best performance is obtained by the recurrent neural network trained using only sequences of static features. This network attains a probability of detection of 77.0% at a mean false-alarm rate of 0.4 per image. This performance is achieved without the extra storage or system load caused by the computation of the temporal features.
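For reference, the greedy SFS procedure used above to choose the roughly 25-feature subset can be sketched as follows; this is a generic sketch operating on feature indices, and the evaluate function is a caller-supplied stand-in for the detection-error measurement:

```python
def sequential_forward_selection(all_features, evaluate, target_size=25):
    # Greedy SFS: repeatedly add the single feature whose inclusion
    # yields the lowest detection error, as scored by the caller-
    # supplied evaluate(subset) function (e.g., validation-set MSE).
    selected, remaining = [], list(all_features)
    while remaining and len(selected) < target_size:
        err, best = min((evaluate(selected + [f]), f) for f in remaining)
        selected.append(best)
        remaining.remove(best)
    return selected
```

SBS works the same way in reverse, starting from the full set and greedily removing the least useful feature at each step.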

Fig. 11. Performance of the complete two-stage detection systems using static and temporal features selected by SFS.

TABLE IV
PROBABILITY OF DETECTION AT A MEAN FALSE-ALARM RATE OF 0.4 PER IMAGE USING SFS APPLIED TO THE AUGMENTED FEATURE SET

When the nonrecurrent networks are supplied with the temporal features in addition to the static features, their performances improve; however, they are still unable to reach the performance level of the recurrent neural network using only static features. When the recurrent neural network is trained with the temporal features in addition to the static features, no improvement is observed. From this we conclude that the temporal information obtained by the recurrent network through the feedback loops in its architecture is sufficient for the detector to perform the task, and that the temporal features supply no useful additional information in this case. Instead, their inclusion appears to add noise and unnecessary complexity that degrade the performance of the recurrent network.

VIII. CONCLUSION

In this paper, we have presented a system for detecting small man-made objects in sequences of sector-scan sonar images. The system begins by cleaning the imagery using temporal averaging with motion compensation (sketched below). Following this operation, the imagery is segmented and static features are extracted from the objects found. A first detection stage then rejects obvious false alarms while maintaining a reasonable probability of detection.

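As a minimal sketch of the image-cleaning step summarized above, assuming per-frame translational offsets have already been estimated from the vessel's motion, one might average the motion-compensated frames as follows. The use of scipy.ndimage.shift and a simple running sum is our illustrative choice, not the original implementation.

    import numpy as np
    from scipy.ndimage import shift

    def clean_image_sequence(frames, offsets):
        """Average a sequence of sonar frames after motion compensation.

        frames:  list of 2-D arrays (one sector-scan image per ping).
        offsets: per-frame (row, col) displacements relative to a reference
                 frame, as estimated from the vessel's motion.
        """
        accumulator = None
        for frame, offset in zip(frames, offsets):
            # Shift the frame so that stationary sea-floor returns align.
            aligned = shift(frame, offset, order=1, mode='nearest')
            accumulator = aligned if accumulator is None else accumulator + aligned
        # Temporal averaging suppresses noise that is uncorrelated between pings.
        return accumulator / len(offsets)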
The first detection stage is limited by its inability to extract temporal information from the data, and attains a probability of detection of only 87.8% at a mean false-alarm rate of 21 per image.

A bank of Kalman filters is used to track objects in the imagery. Tracking allows sequences of feature vectors to be extracted, and such sequences contain information about how the acoustic returns from objects vary over time. There are several ways to exploit this information. Previous work on this subject analyzed the feature sequences to compute extra temporal features. In this paper, we proposed instead using a recurrent neural network, whose feedback connections give it a form of memory and allow it to extract temporal information from the data without the computation of temporal features. Recurrent neural networks are well suited to analyzing temporal data, and we showed that the proposed approach performs better than a nonrecurrent neural network augmented with temporal features. This is especially encouraging because computing temporal features adds load and complexity to the system during operation. In fact, although the temporal features were found to improve the performance of the nonrecurrent detectors, they did not improve the performance of the recurrent network, from which we may conclude that the additional temporal information they encode was not needed by the recurrent network.

The complete two-stage detection system using the recurrent neural network for the second stage achieves a probability of detection of 77.0% at a mean false-alarm rate of 0.4 per image. Hence, the second detection stage decreases the probability of detection by only about 11 percentage points relative to the first stage (from 87.8% to 77.0%) while reducing the mean false-alarm rate by a factor of more than 50 (from 21 to 0.4 per image).
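For completeness, one member of the Kalman-filter bank mentioned above might resemble the constant-velocity tracker sketched below; the state layout and noise parameters are illustrative assumptions, and [17], [18] cover the underlying theory.

    import numpy as np

    class ConstantVelocityKalman:
        """Track one object's (x, y) position with a constant-velocity model.
        State vector: [x, y, vx, vy]."""

        def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
            self.x = np.array([x0, y0, 0.0, 0.0])
            self.P = np.eye(4)
            self.F = np.array([[1, 0, dt, 0],
                               [0, 1, 0, dt],
                               [0, 0, 1, 0],
                               [0, 0, 0, 1]], dtype=float)
            self.H = np.array([[1, 0, 0, 0],
                               [0, 1, 0, 0]], dtype=float)
            self.Q = q * np.eye(4)   # process noise covariance
            self.R = r * np.eye(2)   # measurement noise covariance

        def predict(self):
            """Propagate the state one frame ahead of time."""
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.H @ self.x   # predicted measurement

        def update(self, z):
            """Fold in a measured (x, y) detection position."""
            y = z - self.H @ self.x                    # innovation
            S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
            K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
            self.x = self.x + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P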


ACKNOWLEDGMENT

This work was conducted while S. W. Perry was with the Maritime Operations Division, Defence Science and Technology Organization, Australia. The authors would like to acknowledge the contributions to this paper by K. Lo, B. Ferguson, B. B. Shen, and the Maritime Operations Division's High Frequency Sonar Trials Team: J. Shaw, G. Speechley, C. Halliday, J. Cleary, R. Susic, N. Tavener, and A. Head.

REFERENCES

[1] S. W. Perry and L. Guan, "Pulse-length-tolerant features and detectors for sector-scan sonar imagery," IEEE J. Oceanic Eng., vol. 29, pp. 138–155, Jan. 2004.
[2] R. J. Urick, Principles of Underwater Sound. New York: McGraw-Hill, 1975.
[3] M. J. Chantler and J. P. Stoner, "Robust classification of sector scan sonar image sequences," in Proc. OCEANS'94, vol. 2, Brest, France, 1994, pp. 591–596.
[4] D. Dai, M. J. Chantler, D. M. Lane, and N. Williams, "A spatial-temporal approach for segmentation of moving and static objects in sector scan sonar image sequences," presented at the Inst. Elect. Eng. Conf. Image Processing Applications, Edinburgh, U.K., July 1995, pp. 163–167.
[5] M. J. Chantler, D. M. Lane, D. Dai, and N. Williams, "Detection and tracking of returns in sector-scan sonar image sequences," Proc. Radar, Sonar Navig., vol. 143, no. 3, pp. 157–162, 1996.
[6] M. J. Chantler and J. P. Stoner, "Automatic interpretation of sonar image sequences using temporal feature measures," IEEE J. Oceanic Eng., vol. 22, pp. 47–56, Jan. 1997.
[7] D. M. Lane, M. J. Chantler, and D. Dai, "Robust tracking of multiple objects in sector scan sonar image sequences using optical flow motion estimation," IEEE J. Oceanic Eng., vol. 23, pp. 31–46, Jan. 1998.
[8] D. M. Lane, M. Chantler, D. Y. Dai, and I. T. Ruiz, "Tracking and classification of multiple objects in multi-beam sector scan sonar image sequences," in Proc. Underwater Technology, Tokyo, Japan, Apr. 1998, pp. 269–273.
[9] R. N. Carpenter, "Concurrent mapping and localization with FLS," in Proc. Workshop Autonomous Underwater Vehicles (AUV'98), Cambridge, MA, Aug. 1998, pp. 133–148.
[10] I. T. Ruiz, Y. Petillot, D. Lane, and J. Bell, "Tracking objects in underwater multibeam sonar images," in Proc. IEE Colloq. Motion Analysis Tracking, London, U.K., May 1999, pp. 11/1–11/7.
[11] I. T. Ruiz, D. M. Lane, and M. J. Chantler, "A comparison of interframe feature measures for robust object classification in sector scan sonar image sequences," IEEE J. Oceanic Eng., vol. 24, pp. 458–469, Oct. 1999.
[12] Y. Petillot, I. T. Ruiz, and D. M. Lane, "Underwater vehicle obstacle avoidance and path planning using a multi-beam forward looking sonar," IEEE J. Oceanic Eng., vol. 26, pp. 240–251, Apr. 2001.
[13] R. N. Carpenter and M. R. Medeiros, "Concurrent mapping and localization and map matching on autonomous underwater vehicles," in Proc. OCEANS'01, Honolulu, HI, Nov. 2001, pp. 380–389.
[14] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. Reading, MA: Addison-Wesley, 1991.
[15] D. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural networks by choosing initial values of the adaptive weights," in Proc. Int. Joint Conf. Neural Networks, vol. 3, San Diego, CA, June 1990, pp. 21–26.
[16] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. Neural Networks, vol. 5, pp. 989–993, Nov. 1994.
[17] Y. Bar-Shalom and T. E. Fortmann, Eds., Tracking and Data Association. Orlando, FL: Academic, 1988.
[18] Y. Bar-Shalom and X. R. Li, Estimation and Tracking: Principles, Techniques and Software. Norwood, MA: Artech House, 1993.
[19] D. Lerro and Y. Bar-Shalom, "Tracking with debiased consistent converted measurements versus EKF," IEEE Trans. Aerosp. Electron. Syst., vol. 29, pp. 1015–1022, July 1993.


[20] G. J. Dobeck, "Algorithm fusion for automated sea mine detection and classification," in Proc. OCEANS'01, vol. 1, Honolulu, HI, Nov. 2001, pp. 130–134.
[21] Y. H. Hu and J.-N. Hwang, Eds., Handbook of Neural Network Signal Processing. Boca Raton, FL: CRC, 2002.
[22] L. R. Medsker and L. C. Jain, Eds., Recurrent Neural Networks: Design and Applications. Boca Raton, FL: CRC, 2000.
[23] J. L. Elman, "Finding structure in time," Cogn. Sci., vol. 14, pp. 179–211, 1990.

Stuart W. Perry (S'96–M'99) received the B.S. degree with first-class honors in electrical engineering and the Ph.D. degree from the University of Sydney, Sydney, Australia, in 1995 and 1999, respectively.
From 1993 to 1994, he was a Research Assistant with the Division of Wool Technology, Commonwealth Scientific and Industrial Research Organization (CSIRO), Ryde, Australia. From 1998 to 2003, he was a Research Scientist with the Maritime Operations Division, Defence Science and Technology Organization (DSTO), Pyrmont, Australia. He is currently a Senior Research Engineer with Canon Information Systems Research Australia (CISRA), North Ryde, Australia. He is the author of technical articles in the fields of neural-network-based image restoration, acoustical image processing, and object detection. He coauthored Adaptive Image Processing: A Computational Intelligence Perspective (Boca Raton, FL: CRC, 2001) and has been involved in organizing international conferences. His research interests include neural networks, pattern recognition, image processing, and acoustical imaging.

Ling Guan (S'88–M'90–SM'96) received the B.Sc. degree in electronics engineering from Tianjin University, Tianjin, China, in 1982, the M.A.Sc. degree in systems design engineering from the University of Waterloo, Waterloo, ON, Canada, in 1985, and the Ph.D. degree in electrical engineering from the University of British Columbia, Vancouver, BC, Canada, in 1989.
From 1989 to 1992, he was a Research Engineer with Array Systems Computing Inc., Toronto, ON, Canada, in machine vision and signal processing. From October 1992 to April 2001, he was a Faculty Member with the University of Sydney, Sydney, Australia. Since May 2001, he has been a Professor in the Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada, where he was subsequently appointed to the position of Canada Research Chair. He has held visiting positions with British Telecom, Ipswich, U.K. (1994), Tokyo Institute of Technology, Tokyo, Japan (1999), Princeton University, Princeton, NJ (2000), and Microsoft Research Asia, Beijing, China (2002). He has published more than 180 scientific papers and edited/authored two books, Multimedia Image and Video Processing (Boca Raton, FL: CRC, 2000) and Adaptive Image Processing: A Computational Intelligence Perspective (Boca Raton, FL: CRC, 2001). He is an Associate Editor of several international journals. His research interests include multimedia processing and communications, human-centered computing, machine learning, and adaptive image and signal processing.
Dr. Guan is an Associate Editor of IEEE TRANSACTIONS ON NEURAL NETWORKS. In 1999, he was Co-Guest Editor of the Special Issues on computational intelligence for PROCEEDINGS OF THE IEEE. He is on the Editorial Board of CRC Press's book series on image processing. He has been involved in organizing numerous international conferences and was the Founding General Chair of the IEEE Pacific-Rim Conference on Multimedia. He is a Member of the International Association for Pattern Recognition and is on the Advisory Board of the International Computational Intelligence Society. He is currently serving on the IEEE Signal Processing Society Technical Committee on Multimedia Signal Processing and was a Member of the IEEE Signal Processing Society Technical Committee on Neural Networks for Signal Processing (1997–2000).