IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 59, NO. 12, DECEMBER 2012


Automatic Segmentation of Episodes Containing Epileptic Clonic Seizures in Video Sequences

Stiliyan Kalitzin∗, George Petkov, Demetrios Velis, Ben Vledder, and Fernando Lopes da Silva

Abstract—Epilepsy is a neurological disorder characterized by sudden, often unexpected transitions from normal to pathological behavioral states called epileptic seizures. Some of these seizures are accompanied by uncontrolled, often rhythmic movements of body parts when seizure activity propagates to brain areas responsible for the initiation and control of movement. The dynamics of these transitions is, in general, unknown. As a consequence, individuals have to be monitored for long periods in order to obtain sufficient data for adequate diagnosis and to plan a therapeutic strategy. Some people may require long-term care in special units to allow for timely intervention in case seizures get out of control. Our goal is to present a method by which a subset of motor seizures can be detected using only remote sensing devices (i.e., not in contact with the subject) such as video cameras. These major motor seizures (MMS) consist of clonic movements and are often precursors of generalized tonic–clonic (convulsive) seizures, sometimes leading to a condition known as status epilepticus, which is an acute life-threatening event. We propose an algorithm based on optical flow, extraction of global group transformation velocities, and band-pass temporal filtering to identify the occurrence of clonic movements in video sequences. We show that for a validation set of 72 prerecorded epileptic seizures in 50 people, our method is highly sensitive and specific in detecting video segments containing MMS with clonic movements.

Index Terms—Epilepsy, motor seizures, signal processing, video sequence analysis.

Manuscript received March 19, 2012; revised June 7, 2012; accepted August 20, 2012. Date of publication August 27, 2012; date of current version November 22, 2012. This work was supported in part by the ZonMw agency, The Netherlands, under Grant 300040003. Asterisk indicates corresponding author. ∗S. Kalitzin is with the Foundation Epilepsy Institute of The Netherlands (SEIN), 2103 SW Heemstede, The Netherlands. G. Petkov, D. Velis, and B. Vledder are with the Foundation Epilepsy Institute of The Netherlands (SEIN), 2103 SW Heemstede, The Netherlands. F. Lopes da Silva is with the Swammerdam Institute for Life Sciences, Center of Neuroscience, University of Amsterdam, 1012 ZA Amsterdam, The Netherlands, and also with the Department of Bioengineering, Instituto Superior Técnico, Lisbon Technical University, 1169 Lisbon, Portugal. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TBME.2012.2215609

I. INTRODUCTION

Epilepsy is a clinical condition of the central nervous system that can be described as "dynamic" [1], [2]. That is, most of the time people with epilepsy show no apparent abnormal symptoms, but they may suddenly have attacks or seizures that partially or entirely impair their normal functions. Of the different types of seizures, motor seizures perhaps have the most dramatic appearance, and they may pose a hazard to the individual's health. For example, major motor seizures (MMS), including clonic and tonic–clonic seizures [3], can last for long periods and may cause severe physical injuries if the individual is not attended to and helped soon after seizure onset. A considerable body of evidence demonstrates that (semi)automatic detection of epileptic seizures is both clinically feasible and practically useful [4]–[8]. Nonetheless, such detection is primarily based on the analysis of electroencephalographic (EEG) records obtained during long-term video-EEG monitoring in specialized institutions [9]. Very few people, however, remain under constant surveillance by trained operators. The vast majority of people with epilepsy lead normal lives at home rather than being admitted to specialized institutions, and even those who are admitted to hospital are not continuously monitored by trained personnel. Therefore, automated systems have been developed and implemented to detect MMS. Most of the systems commercially available to people with epilepsy and their caregivers are based either on accelerometers, i.e., sensors that measure instantaneous acceleration or "g-force" [10], [11], or on other motion sensors [12]. These sensors can be attached either to the individual's bed or to the person with epilepsy [13] (typically with wireless transmission of real-time data). Bed-mounted systems are often of limited use, since they detect events only when the individual is in bed. Body-attached sensors, on the other hand, may be uncomfortable and, if they rely on wireless transmission, have to be recharged at regular intervals. Furthermore, in all cases trained personnel or family members have to be involved both in the care of the individual and in keeping such devices in adequate working order. In clinical practice, it is therefore important to develop other methods that overcome these limitations.
One possibility is to use video signals to obtain information about the individual's behavioral seizures. Closed-circuit television (CCTV) observation, including digital video recording, is commonplace, especially in inpatient populations, and is often combined with recording of the scalp electroencephalogram (EEG/CCTV). The video images are generally used only for visual display [14], with the EEG preferably time-locked to the video signal. Earlier work, particularly in infants, showed that seizures in video sequences can be detected automatically by analyzing motion-strength signals extracted from the recordings, targeting specific seizure events with dedicated image processing and feature classification [6], [14]–[18]. These studies are based on optical flow analysis.

0018-9294/$31.00 © 2012 IEEE

Using automated selection of the region of interest by velocity-field clustering in [15], Karayiannis et al. analyzed a variety of temporal features of the signal. The same authors applied neural network classification in [6] to improve detection performance. Cuppens et al. used the average of a percentage of the highest movement vectors in a video sequence and subsequently derived appropriate thresholds from a training set. A common feature of these works is that they deal with nocturnal video sequences of pediatric [14] or neonatal [6] patients, in which the subject is well positioned within the camera's field of view and other interference with the video signal is minimal. In all cases, however, the basic assumption is that a given seizure may have a specific motor signature that can be identified in this way. Chen et al. [19], using registration with dedicated visual markers on the patient's body, analyzed the movement patterns associated with frontal lobe epilepsy. Movement quantification has been analyzed at a fundamental level by Li et al. [20], and recently Remi et al. [21] showed that such quantification can differentiate focal seizures characterized by various automatisms. Defining the region of interest is a major challenge in these approaches, as they still deal with optical flow at the pixel level. We present here a novel remote-sensing paradigm as part of a device designed to detect MMS using signals recorded from standard video cameras, preferably infrared-sensitive ones for observation during the night. This device can be implemented using the computing power of a current desktop computer alone, with the potential to scale up to online seizure detection already at the prototype stage. The main aim is, as in previous works, to use the image sequences provided by the video cameras and to analyze changes in these images by applying the optical flow method.
Using this method, the velocities of recorded objects can be reconstructed from image intensity changes. A novel feature of our approach is the extraction of global properties of the velocity field, namely the rates of translation, rotation, dilatation, and shear. This significantly reduces the data volume compared with earlier works and increases the robustness of the method. It also bypasses the problem of local feature extraction and subsequent spatial grouping. By further processing the data with a nonstationary wavelet technique, we show that sequences containing movements characteristic of clonic seizures can be detected. We introduce two additional novel concepts: the spectrum of the wavelet-amplitude cross-covariance, and the spectral contrast feature, which provides a single measure of "seizureness" as the output of the algorithm. Our algorithms are constructed as feed-forward processing, which allows them to be implemented in real-time systems. Here, we apply this technique only to prerecorded MMS video sequences of people monitored by EEG/CCTV for diagnostic purposes at our institution. Our objective is to test the sensitivity and specificity of our method before considering a real-time seizure warning application. In addition, the method proposed here can be used for offline screening of long video sequences, including those showing epileptic seizures, which is a labor-intensive and time-consuming process in epilepsy diagnostic monitoring.

To validate the automated algorithms, we took as a "gold standard" the MMS events identified by two human experts (D.V. and B.V.). In this study, we do not attempt to validate or quantify the performance of the system as an early MMS detector. The objective here is to discriminate epochs (in this case, frame sequences) that contain paroxysmal motor events from those that are free of MMS. Because of the potential practical application, a detailed analysis of the system's alarm performance requires different validation tests. This will be the subject of a forthcoming publication, with additional data provided by a Dutch multicenter investigation program dedicated to MMS detection in a home environment.

II. METHODS

A. Video Acquisition

The video recordings were made as part of the routine long-term EEG/CCTV monitoring protocol in our clinical facility. Daytime recordings used color-enabled Bosch Dinion-LTC 0610 cameras (Bosch Security Systems, B.V.), PAL, with external electrical-grid vertical synchronization at 50 Hz, interlaced frames, and horizontal synchronization at 15.636 kHz. Nighttime recordings were made with infrared-light-assisted B/W video cameras, Ikegami B/W CCD ICD-42 E-type (Ikegami Tsushinki Co., Ltd., Ohta-ku, Tokyo). All digitized recorded images were in MPEG-2 format with a resolution of 352(H)×288(V) pixels, "YUY2" color encoding, and a fixed frame rate of 25 fps. In this study, only the signal intensity is used for processing, in both B/W and color sequences. The camera positions were fixed, but an operator could adjust the PTZ (pan, tilt, and zoom) to achieve the best registration of the patient. In some of the sequences, mainly during daytime observations, several subjects were in the recorded frame. The sequences were used retrospectively, without any selection or attempt to facilitate automated seizure detection.

B. Optical Flow

Optical flow is a well-established technique for approximate reconstruction of spatial movements as recorded in sequences of optical images [22]. We aim to reconstruct the vector field of velocities from the luminance changes that might be generated by moving objects recorded by the video camera, by calculating the velocity field as follows:

$$L(x, y, t) \rightarrow \{V_x(x, y, t),\ V_y(x, y, t)\} \tag{1}$$

where L(x, y, t) is the intensity field contained in the video recording as a function of the 2-D spatial coordinates (x, y) and time (or frame number) t. We used a standard implementation provided by The MathWorks Inc., Natick, MA, release 7.13 (2011b). The method applied was "Horn–Schunck" with a single iteration step and a single frame delay. Tests with various numbers of iteration steps and frame delays gave practically identical results after the subsequent processing. The remaining parameters of the optical flow object were smoothness 1, temporal gradient filter (−1, 1), image smoothing standard deviation 15, gradient smoothing standard deviation 1, and nearest-neighbor rounding.
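For readers without MATLAB, the core Horn–Schunck update can be sketched in NumPy. This is a minimal sketch, not the MathWorks implementation. The following choices are assumptions: the gradient estimator (central differences averaged over the two frames), the neighborhood-averaging weights, and the use of `smoothness` directly as the Horn–Schunck regularization weight. The image and gradient smoothing steps listed above are omitted, and the function names are hypothetical.

```python
import numpy as np


def _neighbour_mean(f):
    """Horn-Schunck local average: weight 1/6 for edge neighbours, 1/12 for corners."""
    p = np.pad(f, 1, mode="edge")
    edges = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 6.0
    corners = (p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]) / 12.0
    return edges + corners


def horn_schunck(frame0, frame1, smoothness=1.0, n_iter=1):
    """Return (Vx, Vy) in pixels/frame between two grayscale frames.

    Assumption: `smoothness` is the Horn-Schunck regularisation weight;
    no image or gradient smoothing is applied in this sketch.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    # np.gradient returns (d/drow, d/dcol) = (d/dy, d/dx); average over both frames.
    gy0, gx0 = np.gradient(f0)
    gy1, gx1 = np.gradient(f1)
    ix, iy = 0.5 * (gx0 + gx1), 0.5 * (gy0 + gy1)
    it = f1 - f0  # temporal difference filter (-1, 1)
    u = np.zeros_like(f0)
    v = np.zeros_like(f0)
    denom = smoothness ** 2 + ix ** 2 + iy ** 2
    for _ in range(n_iter):
        u_bar, v_bar = _neighbour_mean(u), _neighbour_mean(v)
        # Project the averaged flow onto the constraint Ix*u + Iy*v + It = 0.
        resid = (ix * u_bar + iy * v_bar + it) / denom
        u = u_bar - ix * resid
        v = v_bar - iy * resid
    return u, v
```

For two frames of a smooth blob shifted one pixel to the right, the returned field has a positive mean V_x and a vertical component that cancels by symmetry. That field is the input the next step reduces to global motion rates.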


C. Reconstruction of Group Motion Parameters

Once the velocity fields are reconstructed, we reduce the data by extracting only the rates of global motion parameters. Accordingly, we first introduce complex coordinates and velocities

$$W(z, t) = V_x(z, t) + iV_y(z, t); \quad z = x + iy; \quad i = \sqrt{-1}. \tag{2}$$

The group of nonhomogeneous linear transformations is then defined by the linear decomposition

$$W(z, t) = T(t) + R(t)\,z + S(t)\,\bar{z}; \quad \bar{z} \equiv x - iy. \tag{3}$$

Here, T(t), R(t), and S(t) are complex scalars representing the rates of affine transformations [23]: T gives the translational rates (velocities) along the two image axes (its real and imaginary parts), R the rotational and dilatational rates, and S the shear rates. These quantities are uniquely defined for a coordinate system centered in the middle of the image, with axes along the image dimensions. T, R, and S are estimated directly from the optical flow output W as follows, where ⟨·⟩_z denotes the average over all image pixels:

$$T(t) \cong \langle W(z, t)\rangle_z; \quad R(t) \cong \langle \bar{z}\,W(z, t)\rangle_z; \quad S(t) \cong \langle z\,W(z, t)\rangle_z. \tag{4}$$

In (4) we used a normalized parameterization of the image coordinates such that

$$\langle z\rangle_z = \langle z^2\rangle_z = 0;\ \ \langle z\bar{z}\rangle_z = 1, \quad \text{or, in real notation,} \quad \langle x\rangle_{x,y} = \langle y\rangle_{x,y} = \langle xy\rangle_{x,y} = 0;\ \ \langle x^2\rangle_{x,y} = \langle y^2\rangle_{x,y} = 1. \tag{5}$$

In summary, from the original video image sequence we derive three complex, or equivalently six real [the real and imaginary parts of (4)], time series representing the rates of linear spatial transformations. We denote these features as

$$F_c(t), \quad c = 1, \ldots, 6 \tag{6}$$

and use them as the primary features for subsequently detecting MMS-type events.

D. Extracting Principal Components Using a Gabor Wavelet Technique

Our next strategy is to reduce the six degrees of freedom in (6) and, at the same time, to extract the spectral content of the signals in a form suitable for filtering. To this end, we employ Gabor aperture functions [24], [25], which give the optimal compromise between temporal and spectral resolution and are given as

$$g(t - t', \nu) = \left(e^{-\pi^2\alpha^2\nu^2(t - t')^2 - i2\pi\nu(t - t')} - O_\nu\right) N_\nu \tag{7}$$

where ν is the central frequency and the product αν is the bandwidth of the filter. The normalization factor N_ν and the offset factor O_ν are chosen so that the functions have zero mean and unit 1-norm, i.e.,

$$\sum_{t=-\infty}^{\infty} |g(t, \nu)| = 1.$$
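The pixel averages in (4) that produce the six features of (6) can be illustrated with a short NumPy sketch operating on a single optical-flow frame (V_x, V_y). This is a sketch under an explicit assumption. The pixel coordinates are centered, and each axis is scaled to variance 1/2 so that ⟨z⟩ = ⟨z²⟩ = 0 and ⟨z z̄⟩ = 1 hold exactly on the grid. Scaling each axis to unit variance instead only rescales R and S by a constant factor of 2. The function names are hypothetical.

```python
import numpy as np


def normalised_coords(height, width):
    """Complex pixel coordinates z = x + iy centred on the image.

    Assumption: each axis is scaled to variance 1/2, so that
    <z> = <z^2> = 0 and <z zbar> = 1 over the pixel grid.
    """
    y, x = np.mgrid[0:height, 0:width].astype(float)
    x -= x.mean()
    y -= y.mean()
    x /= np.sqrt(2.0 * np.mean(x ** 2))
    y /= np.sqrt(2.0 * np.mean(y ** 2))
    return x + 1j * y


def group_rates(vx, vy):
    """Estimate T, R, S of W = T + R z + S zbar by the pixel averages of (4)."""
    z = normalised_coords(*vx.shape)
    w = vx + 1j * vy
    t = np.mean(w)                # translation rates
    r = np.mean(np.conj(z) * w)   # real part: dilatation, imaginary part: rotation
    s = np.mean(z * w)            # shear rates
    return t, r, s


def features(vx, vy):
    """Six real features F_c, c = 1..6, for one optical-flow frame."""
    t, r, s = group_rates(vx, vy)
    return np.array([t.real, t.imag, r.real, r.imag, s.real, s.imag])
```

Because the coordinate moments vanish, a purely affine flow field is recovered exactly. Applied frame by frame, `features` yields the six time series F_c(t).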

We also chose a constant value α = 0.1. With this choice, the Gabor set (7) is a sequence of scale-transformed filters whose bandwidths are 10% of the corresponding central frequencies. The central frequencies ν_1, ν_2, …, ν_200 were chosen in the range [0.5 Hz, 12 Hz] such that (ν_k − ν_{k−1})/(ν_k + ν_{k−1}) = 0.1, k = 2, …, 200. For each of the traces (channels) F_c(t), we define its time- and frequency-dependent Gabor amplitude as

$$G_c(t, \nu) \equiv \int_{t'} dt'\, g(t - t', \nu)\, F_c(t'). \tag{8}$$
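A discrete version of (7) and (8) can be sketched in NumPy. This sketch makes several assumptions that are not in the text. The kernel is truncated at three Gaussian widths, and O_ν and N_ν are computed on the truncated kernel. The integral in (8) becomes a sum over frames at the video rate. The central frequencies lie on a geometric grid, which keeps (ν_k − ν_{k−1})/(ν_k + ν_{k−1}) constant, and the grid size is left as a parameter. The function names are hypothetical.

```python
import numpy as np


def gabor_kernel(nu, fs, alpha=0.1, n_widths=3.0):
    """Sampled kernel g(tau, nu) of (7) at sampling rate fs (Hz).

    Assumption: the Gaussian envelope exp(-(pi*alpha*nu*tau)^2) is truncated
    where pi*alpha*nu*|tau| exceeds n_widths.
    """
    half = int(np.ceil(n_widths / (np.pi * alpha * nu) * fs))
    tau = np.arange(-half, half + 1) / fs
    e = np.exp(-(np.pi * alpha * nu * tau) ** 2 - 2j * np.pi * nu * tau)
    e = e - e.mean()               # offset O_nu: zero mean
    return e / np.abs(e).sum()     # factor N_nu: unit 1-norm


def gabor_amplitudes(F, fs, freqs, alpha=0.1):
    """Complex G_c(t, nu) of (8) for features F with shape (n_channels, n_t)."""
    F = np.atleast_2d(np.asarray(F, dtype=float))
    n_chan, n_t = F.shape
    G = np.empty((n_chan, len(freqs), n_t), dtype=complex)
    for k, nu in enumerate(freqs):
        g = gabor_kernel(nu, fs, alpha)
        half = (len(g) - 1) // 2
        for c in range(n_chan):
            # Full convolution sum_t' g(t - t') F(t'), re-centred on the kernel middle,
            # so it also works when the kernel is longer than the signal.
            G[c, k] = np.convolve(F[c], g)[half:half + n_t]
    return G


# Geometric grid of central frequencies over [0.5, 12] Hz (grid size is a free choice here).
freqs = np.geomspace(0.5, 12.0, 40)
```

With the unit 1-norm normalization, the peak response to a unit sinusoid is roughly the same at every central frequency. A 3-Hz test signal therefore peaks at the grid point nearest 3 Hz.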

Next, a complex covariance matrix can be obtained from the Gabor amplitudes as

$$K_{ab}(\nu, \Omega) \equiv \int_{t \in \Omega} dt\, G_a(t, \nu)\, \bar{G}_b(t, \nu). \tag{9}$$

Here, Ω is the time window on which we intend to apply the detection method. In this study, we chose a sequence of time windows Ω_n of 4 s with 75% overlap. In other words, we ran the algorithm every second on the optical flow data from the preceding 4 s. The reason for using the quantity (9) is that it contains information about the mutual correlations of the primary optical flow features (6). Such correlations arise because our 2-D video registration is a projection of 3-D objects, so the movement quantifiers (4) are not independent. We are interested in the dominant component of the projected velocities. We therefore define our "seizureness spectrum" as the maximal eigenvalue of the Hermitian matrix (9):

$$Q(\nu, n) \equiv \max\left(\operatorname{eig}\left(K(\nu, \Omega_n)\right)\right). \tag{10}$$
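The windowed covariance (9) and its largest eigenvalue (10) can be sketched by applying one stacked Hermitian eigen-decomposition per window. In this sketch the integral over Ω_n is approximated by a sum over frames divided by the sampling rate. The 4-s window and 1-s step follow the text, and the function name is hypothetical.

```python
import numpy as np


def seizureness_spectrum(G, fs, win_s=4.0, step_s=1.0):
    """Q(nu, n) of (10) from Gabor amplitudes G with shape (n_channels, n_freq, n_t).

    Windows Omega_n of win_s seconds are advanced by step_s seconds
    (4 s and 1 s give the 75% overlap described in the text).
    """
    n_chan, n_freq, n_t = G.shape
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    starts = range(0, n_t - win + 1, step)
    Q = np.empty((len(starts), n_freq))
    for n, s0 in enumerate(starts):
        Gw = G[:, :, s0:s0 + win]
        # K[nu, a, b] = sum_t G_a(t, nu) * conj(G_b(t, nu)) / fs, Hermitian for each nu.
        K = np.einsum("aft,bft->fab", Gw, np.conj(Gw)) / fs
        # eigvalsh returns eigenvalues in ascending order; keep the largest one.
        Q[n] = np.linalg.eigvalsh(K)[:, -1]
    return Q
```

Suppose all channels share one time course at a given frequency, so that G_c(t, ν) = w_c s(t). Then K has rank one, and Q equals ‖w‖² times the window energy of s. This is the "dominant component" that the definition is meant to pick out.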

This quantity is illustrated for a case of MMS in Fig. 1, top frame.

E. Filtering and Detection

We compared the time-averaged eigenvalue spectra defined by (10) (Fig. 1, bottom frame) from epochs containing clonic events with those from epochs at least 15 s away from any identified MMS. On this basis, we selected the spectral "footprint" of MMS as the range 2–6 Hz. To extract a normalized quantity from the spectral weights, we use a novel technique, which we call spectral contrast, defined as follows:

$$C(n) \equiv \frac{\sum_{\nu} \kappa(\nu)\, Q(\nu, n)}{\sum_{\nu} |\kappa(\nu)|\, Q(\nu, n)}. \tag{11}$$
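Once the weights κ(ν) are fixed, (11) reduces to two dot products per window. The sketch below takes κ(ν) from the heuristic weights postulated in (12) and returns 0 for windows whose weighted denominator vanishes. That zero return is an assumption added to avoid division by zero. The function names are hypothetical.

```python
import numpy as np


def kappa(freqs):
    """Spectral weights of (12): +1 on [2, 6] Hz, -1 on [1, 2) and (6, 10] Hz, 0 elsewhere."""
    f = np.asarray(freqs, dtype=float)
    k = np.zeros_like(f)
    k[(f >= 1.0) & (f <= 10.0)] = -1.0
    k[(f >= 2.0) & (f <= 6.0)] = 1.0  # overwrite the 2-6 Hz band with +1
    return k


def spectral_contrast(Q, weights):
    """C(n) of (11) for Q with shape (n_windows, n_freq); 0 where the denominator vanishes."""
    num = Q @ weights
    den = Q @ np.abs(weights)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)
```

Power concentrated in the 2–6 Hz band gives C = +1, and power only in the flanking bands gives C = −1. Equal power on both sides gives C = 0, mirroring the relative difference (a − b)/(a + b).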

The spectral weight function κ(ν) can in general take both positive and negative values, so that the result highlights certain spectral components against the background of the others. Definition (11) can be seen as a generalization of the relative difference between two positive numbers, say a and b, which is (a − b)/(a + b) and takes values between −1 and +1. One strategy for selecting the weight function is to use a training set and optimize κ(ν) so that the quantity (11) produces maximal separation between its values during MMS and during interictal measurements. Such a technique, however, may depend on the quality and variability of the set used, and it requires an extended cross-validation approach. We therefore applied a more heuristic approach, postulating the weights as

$$\kappa(\nu) = \begin{cases} 1, & \nu \in [2, 6] \\ -1, & \nu \in [1, 2) \cup (6, 10] \\ 0, & \nu \notin [1, 10]. \end{cases} \tag{12}$$

The result for the example in the top frame of Fig. 1 is shown in the middle frame of the same figure. The quantity (11) is the final quantifier of MMS used in this study. It was applied to all cases without any further parameter changes. For nonnegative spectral weights Q(ν, n) ≥ 0, the spectral contrast C(n) ranges between −1, interpreted as a certain absence of MMS during the measurement period, and +1, indicating a certain MMS. Intermediate values are interpreted as different levels of certainty about the presence of MMS during the nth 4-s window.

Fig. 1. Top frame: 2-D pseudocolor plot of the largest eigenvalue of the Gabor wavelet amplitudes' cross-covariance matrix. It is computed from the video recording of a patient who had an MMS between 100 and 170 s (time on the horizontal axis) from the beginning of the video sequence. The vertical axis represents the central wavelet frequencies, ranging from 0.5 to 12 Hz. The MMS event is clearly visible in the 1.5–6 Hz region. Middle frame: for the same case, trace of the spectral contrast defined in (11) (red line) generated from the optical flow data, compared with a surrogate spectral contrast trace generated from random noise (black line). The horizontal axis represents the same time span in seconds as in the top frame, and the vertical axis is the spectral modulation. Bottom frame: average eigenvalue spectra defined in (10), normalized to unit sum. The red line represents the average over 72 MMS with a total length of 60 min. The blue line is the average over approximately 590 min of sequences not containing MMS and at least 15 s away from any identified MMS. The central wavelet frequencies are shown on the horizontal axis in hertz.

F. Validation Datasets and Patients

We used data from 50 people with known MMS-containing segments of synchronized EEG and video sequences, ranging from 12 to 56 min. One or more seizures per individual were identified in the video sequences by human experts (D.V. and B.V., independently), while the end of each MMS was marked on the corresponding video and EEG records (D.V. only). A total of 72 seizure periods were identified, with a total duration of 60 min. The total length of the analyzed records was 746 min. We compared the performance of our algorithm with that of a random detector. For each case, we generated a surrogate set of six signals drawn from a normal distribution with zero mean and unit variance, used in place of the features (6). We then performed all the steps described above and obtained the quantity (11) for the surrogate signals. In Fig. 1, middle frame, the output of the surrogate feature (11) is superimposed on the actual signal derived from optical flow. To test the specificity of our method, we used an additional 37 h of video recording from one patient over four consecutive nights with no observed seizure activity.

G. Statistical Validation of the Detectors

In this study, we use a simple statistical separation test to determine whether the distribution of the quantity (11) during MMS periods differs from that during nonictal periods. Because the quantity is bounded between −1 and +1, we used nonparametric Kolmogorov–Smirnov (K–S) tests, both single-tailed and two-tailed. To quantify the detection power of the spectral contrast quantity (11) and to compare it with the surrogate simulations, we adopted the following definition of sensitivity:

$$\mathrm{Sens}(\theta) \equiv \frac{\#\left(k : \max_{n \in T_k} C(n) \geq \theta\right)}{\#(k)}. \tag{13}$$

Here, T_k is the time interval of the kth MMS (out of 72 in total) as determined by the clinical experts, and the # symbol denotes the number of elements in the corresponding set. For all practical purposes, devices involved in clinical event detection customarily quantify specificity as the number of false alarms per 24 h. We therefore define the general detector property of a false-positive rate (FPR) as

$$\mathrm{FPR}(\theta) \equiv \frac{\Gamma\,\#(n : C(n) \geq \theta)}{\#(n)}; \quad n \notin \mathrm{MMS}. \tag{14}$$
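Definitions (13) and (14) amount to simple counting over the per-window spectral contrast values. The sketch below assumes that C is an array indexed by window n, that each MMS interval T_k is given as an array of window indices, and that a boolean mask marks the windows outside MMS. Γ defaults to 1 Hz, the measurement rate stated in the next paragraph, and the function names are hypothetical.

```python
import numpy as np


def sensitivity(C, seizure_windows, theta):
    """Sens(theta) of (13): fraction of seizures whose maximal C(n) reaches theta.

    seizure_windows: one integer index array per MMS interval T_k.
    """
    hits = [np.max(C[idx]) >= theta for idx in seizure_windows]
    return float(np.mean(hits))


def false_positive_rate(C, nonseizure_mask, theta, gamma=1.0):
    """FPR(theta) of (14) in alarms per second; gamma is the measurement rate in Hz."""
    c_out = C[nonseizure_mask]
    return gamma * np.count_nonzero(c_out >= theta) / c_out.size


def false_alarms_per_day(fpr):
    """Expected number of false alarms in a 24-h period: 24 * 3600 * FPR."""
    return 24 * 3600 * fpr
```

Sweeping θ over [−1, 1] and plotting `sensitivity` against `false_alarms_per_day(false_positive_rate(...))` traces the detector's operating curve. The same sweep can be run for the surrogate signals.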

Here, Γ is the rate of measurements (1 Hz throughout this study). Therefore, the quantity 24 × 3600 × FPR(θ) represents the expected number of false alarms in a standard 24-h period. All computations were performed on a PC with a dual-processor AMD Opteron 250 at 2.39 GHz, running Windows XP x64. Processing prerecorded MPEG sequences was 10% slower than real time, whereas processing a live camera feed was roughly 5% faster than real time.

Fig. 2. Spectral contrast output for all 50 patient sequences. The vertical axis represents the case number, and the horizontal axis represents time in seconds. Each pixel represents a calculation over a 4-s time window performed every second (75% overlap). The pseudocolor scale is the spectral contrast. The black rectangles mark the onset and duration of the clonic MMS as identified by the human experts.

Fig. 3. Histograms (normalized to unit sum) of the spectral contrast quantity (11) calculated for the MMS epochs (red bars) and for the interictal epochs (blue bars). Interictal epochs are defined as periods with no observed MMS that are not closer than 15 s to either the beginning or the end of any registered MMS.

III. RESULTS

The detection results for all individuals are presented in Fig. 2. The expert findings are depicted as rectangles outlining the duration of each MMS. The distributions of the spectral contrast quantity (11) are shown as histograms in Fig. 3. Higher values of the spectral contrast occur predominantly during seizures. The K–S tests show significantly (p