Corrosion Time Series Classification using the Haar Wavelet ...

2 downloads 1009 Views 1MB Size Report
power spectral density (PSD) are used. Thirdly, a Bayesian classifier is .... wavelet as a template. One can consider the Haar ..... 3746.html. [13] Z. Michalewicz ...
INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE 1(1) 2004 1-12

Corrosion Time Series Classification using the Haar Wavelet Transform and MML Density Estimation G. Van Dijck, M. Wevers, and M. Van Hulle upon different actions to be taken by the service engineers of the chemical process installation. Therefore it is important to develop a pattern recognition system that is able to detect and classify the different corrosion processes automatically.

Abstract—Corrosion causes many failures in chemical process installations. These failures generate high costs, therefore an effective corrosion monitoring system obtrudes. This paper focuses on the classification of the most important corrosion processes: pitting, stress corrosion cracking (SCC) and general corrosion. The computations and algorithms involved in the classification of the corrosion time series are presented. A technique for trend removal of the time series is proposed. Features to extract the characteristics of the corrosion time series are designed. A genetic algorithm for the selection of features is described in which the correlations between features are exploited. The combination of the proposed techniques leads to a new high performing pattern recognition system for corrosion time series classification. Keywords—Genetic algorithm, Haar wavelet minimum message length, subspace decomposition.

I

II. ELECTROCHEMICAL NOISE (ECN) TIME SERIES Time series used in the analysis for this paper are acquired from electrochemical noise measurements on a reference probe of stainless steel in laboratory conditions [4]. In the final application the probe will be inserted in chemical plant installations. The corrosion behavior of the installation is then monitored from the corrosion of the reference probe. Time series are acquired at a sampling rate of 10Hz. A detailed description of the experimental set-up and different environmental conditions to induce the different types of corrosion can be found in [4]. For the analysis, 207 time series in absence of corrosion processes are selected, 207 time series with general corrosion, 75 time series with SCC and 90 time series with pitting. Fig. 1-Fig. 4 show examples for electrochemical noise time series in absence of corrosion, for general corrosion, for pitting and for SCC time series, respectively.

transform,

I. INTRODUCTION

N the literature [1]-[3] it is mentioned that about 3 to 5% of the Gross National Product in the United States is lost due to the costs caused by corrosion. Furthermore it is estimated [1], [2] that 35% of these costs can be avoided by the application of existing technologies (coatings, inhibitors, cathodic protection…) and by appropriate services (consulting, testing, monitoring…). Pitting, stress corrosion cracking (SCC) and general corrosion cause most of the corrosion damage in chemical process installations [4]. Pitting causes pits that lead to leaks, stress corrosion cracking causes cracks and general corrosion leads to a uniform decrease of thickness of the material. Each of the corrosion processes calls

-6

No corrosion

Amplitude (V)

x 10 2 0 -2 0

100

200

400 500 600 700 800 time (s) Fig. 1. Example of electrochemical voltage time series in absence of

Amplitude (V)

-3

Manuscript received November 4, 2004. This work was supported by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen). G. Van Dijck is with the Computational Neuroscience Research Group of the Laboratory of Neurophysiology and the Department of Metallurgy and Materials Engineering (MTM), K.U. Leuven, 3000 Leuven, Belgium (phone: +32 16 32 12 47; fax: +32 16 32 19 90; e-mail: [email protected]). M. Wevers is with the Department of Metallurgy and Materials Engineering (MTM), K.U. Leuven, 3000 Leuven, Belgium, (e-mail: [email protected]). M. Van Hulle is with the Computational Neuroscience Research Group of the Laboratory of Neurophysiology, K.U. Leuven, 3000 Leuven, Belgium, (email: [email protected]).

2

x 10

300

General corrosion

0 -2 0

100

200

300

400 500 time (s)

600

700

800

corrosion. Fig. 2. Example of electrochemical voltage time series for general corrosion.

1

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE 1(1) 2004 1-12

Amplitude (V)

-3

4 2 0 -2 0

100

where L = M + N – 1. Next, this matrix is decomposed by singular value decomposition:

Pitting

x 10

200

300

400 500 time (s)

600

700

H = ( UT

800

Fig. 3. Example of electrochemical voltage time series for pitting. -4

Amplitude (V)

x 10 2 0 -2 -4 0

US

400 500 600 700 800 time (s) Fig. 4. Example of electrochemical voltage time series for SCC. Observe the characteristic voltage transients for SCC at about 200 s and 500 s.

Different stages in the design of the pattern recognition system will be presented in this paragraph. Firstly, a trend removal technique is proposed to facilitate the identification of time series in absence of corrosion. Secondly, features are derived from the time series. In the extraction of features the Haar wavelet transform and the power spectral density (PSD) are used. Thirdly, a Bayesian classifier is designed using the minimum message length (MML) criterion for density estimation. Finally, a genetic algorithm is described to search for a subset of features that leads to a high classification performance.

x3

"

x4 #

" #

" xL − 2

xN −1   #  #   xL − 2  xL −1 

Σ 0 ∈ \ ( N − K − P )×( N − K − P ) ,

relevant signal part is not considered. Observation of the singular values in (2) from the time series shows one large singular value and several smaller singular values. According to the model this singular value is identified as belonging to the trend. Therefore it is further assumed that K equals 1. Matrix HT is not of Hankel structure. Averaging the diagonal elements of HT for which i+j is constant, where i indicates the row index and j the column index, repairs the Hankel structure. The trend for the particular frame is then found on the main diagonal. The complete trend for the whole time series is found by adding the 50% overlapping trends from each frame. The use of the Hanning window in the time series decomposition guarantees a smooth reconstruction. Fig. 5 shows the estimated trend superposed on an electrochemical noise time series, with L set to 1500.

A. Trend Removal From Time Series The observed time series contain voltage drifts, both linear and nonlinear, which are not related to the active corrosion process. These drifts are related to changes in system influences: change in temperature, change in flow of the liquid… [4]. These trends cause the different corrosion processes to have similar autocorrelations. This makes the separation between different corrosion processes more difficult. The proposed technique for trend removal operates in time domain by subtracting the estimated trend from the original time series. A description of the subspace decomposition based trend removal algorithm follows. First the signal is decomposed into 50% overlapping frames of length L. All samples within a frame are then multiplied by the corresponding coefficients of a Hanning window of size L. The resulting data samples are arranged in an MxN Hankel matrix structure:

"

Σ S ∈ \ P× P ,

models the electrochemical voltage signal as a trend (referred to by subscript T) on which the relevant part of the signal (referred to by subscript S) is superposed. Furthermore it is assumed that noise, not related to a corrosion process (referred to by subscript 0), is present and superposed on the relevant signal part and the trend. In this paper only the subtraction of HT form H is addressed. The separation of noise from the

III. DESIGN OF THE PATTERN RECOGNITION SYSTEM

x2

(2)

VT ∈ \ N × K , VS ∈ \ N ×P , V0 ∈ \ N ×( N − K − P ) . Equation (2)

300

x1  x0  x2  x1 H ( xn ) =  x2 x3  #  # x  M −1 "

0   VT'    0   VS'  = Σ 0   V0' 

U T ∈ \ M × K , U S ∈ \ M × P , U 0 ∈ \ M ×( N − K − P ) ,

ΣT ∈ \ K × K , 200

0 ΣS 0

UT ΣT VT' +(U S Σ S VS' + U 0 Σ 0 V0' ) = HT + H S + 0

Stress corrosion cracking (SCC)

where

100

 ΣT  U0 )  0  0 

Estimated trend

Amplitude (V)

0.0887

0.0887

0.0887 0

200

400 time (s)

600

800

Fig. 5. Estimated trend. The curve in white is the estimated trend from the subspace decomposition technique. Fig. 1 is obtained from Fig. 5 after subtracting the trend from the signal.

Fig. 1-Fig.4 are obtained by removing the estimated trend from the original time series. Fig. 6a and Fig. 6b show the effect of the detrending technique on the separation of electrochemical noise time series in absence of corrosion and electrochemical noise time series with general corrosion.

(1)

2

2

Before detrending

1 0 -1 -2 -2 0 2 4 Projections on 1st PCA axis

Fig 6a. Result before detrending. Each data point indicated with ‘+’ is a noise time series represented by 2 parameters. Data points indicated with ‘.’ are the general corrosion time series represented by 2 parameters. These parameters are derived from the PSD parameters to be defined in paragraph B.

Projections on 2nd PCA axis

Projections on 2nd PCA axis

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE 1(1) 2004 1-12 After detrending

1

ψ (t ) = 1, ∀t ∈ [0, 0.5[; ψ (t ) = −1, ∀t ∈ [0.5,1[; ψ (t ) = 0, ∀t ∉ [0,1[

0

The continuous wavelet transform (CWT) is defined by [10]:

2



-1

F ( a, b) =

-2

where ψ a ,b (t ) =

(5)

1 t −b ψ( ), with a ≠ 0 and a, b ∈ \ a a

Parameter ‘a’ represents a scale parameter and parameter ‘b’ represents a time shift parameter. The CWT is applied in this case as a correlator with the Haar wavelet as a template. One can consider the Haar wavelet as a mathematical simplification of a corrosion transient occurring in the electrochemical noise time series. At positions where transients occur, large wavelet coefficients (5) are generated. By making comparisons in standard deviations, energy… of positive and negative wavelet coefficients one can quantize the ‘asymmetry in transient behaviour’ in parameters. With ‘asymmetry in transient behaviour’ the tendency to contain positive voltage transients, as observed in pitting, vs. negative voltage transients, as observed in SCC, is meant. One set of parameters compares standard deviations of positive vs. negative accumulated wavelet coefficients:

Fig. 6b. Result after trend removal. Noise time series and general corrosion time series clearly fall apart in 2 separate clusters. The same PSD parameters were used as in Fig. 6a. Principal component analysis (PCA) was used in Fig. 6a and Fig. 6b to extract 2 parameters from the set of PSD parameters.

B. Feature design from power spectral density (PSD) and Haar wavelet transform Features are extracted from the time series to characterize each corrosion process in a few parameters. In this paragraph the choice of features is motivated. Three sets of features will be computed: one set derived from the PSD and two sets from the Haar wavelet transform. A GA selects among these sets of features the features that are relevant to distinghuish between the different corrosion processes. 1) Features from the power spectral density It is common practice to consider the power spectral density in analysis of electrochemical noise corrosion time series [5], [6]. Electrochemical noise voltages show power law behavior [7]. This means the power spectrum can be interpolated by an f -β function: 2

f (t )ψ a ,b (t )dt

−∞

-3 -4 -2 0 2 Projections on 1st PCA axis

A( f ) = c. f − β .



(4)

x(k ) =

std ( F (b) acc ,k )

F ( b )acc ,k < 0

std ( F (b) acc ,k )

(6)

F ( b )acc ,k > 0

where F (b) acc , k =

k

∑ F (a, b) , std is the standard deviation a =1

operator. Accumulation of wavelet parameters over different resolution levels ‘a’ leads to extreme values at the position of the transients. Another set of parameters is computed by extracting the maxima, F (b) acc , k ,max , and minima,

(3)

When power law behavior is observed for at least two decades, the time series are considered fractal [8]. However power law behavior is not present in all time series for the complete range of two decades. Therefore the frequency scale is divided in different decades. The 1st decade ranges from 0.012Hz-0.12Hz, the second from 0.12 Hz-1.2Hz and the last part of a decade from 1.2Hz-5Hz (half the sampling rate). To each frequency range and all combinations of ranges a straight line is fitted to the log. PSD vs. log. frequency scale using the least squares criterion. The PSD is estimated by means of the periodogram method [9]. The slopes of the lines are considered as parameters. By making all combinations of frequency ranges, what leads to 7 parameters, potentially irrelevant parameters are derived. The selection of relevant parameters is dealt with in a feature subset selection algorithm. 2) Features from the Haar wavelet transform The choice of the Haar wavelet transform is motivated by observation of transients in pitting and SCC data.

F (b) acc ,k ,min , from F (b) acc ,k and comparing the power in the extremes:

y (k ) =

mean( F (b) 2acc ,k ,max ) mean( F (b) 2 acc ,k ,min )

(7)

Note that x(k) and y(k) represent sets of parameters, which depend upon the upper limit of accumulation. It is not a prior known which k values lead to good features. This is related to the duration of transients. Selection of good k values is addressed in the feature subset selection algorithm. Note that features from x(k) are strongly correlated, as well as the features from y(k). This is due to the fact that a small change in k generates very similar features. C. Classifier In this paper a Bayesian classifier is opted for. In a Bayesian classifier feature x is assigned to the class with maximum posterior class probability:

The mother wavelet [10] is defined by: 3

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE 1(1) 2004 1-12

P(x | θ, Ci ) P(Ci ) j = arg max P(Ci | x) = P ( x) i

best combination of clusters a 2nd local search with a GA with fixed length of chromosomes is performed in which the features within the clusters are considered. As fitness function, a cosh function on the classification performance is chosen, averaged over 5 runs of 10-fold crossvalidation. The Bayesian classifier is used in the prediction of the class labels.

(8)

where Ci represents the class. The class conditional densities P (x | θ, Ci ) are estimated by means of a Gaussian mixture model (GMM): k

P (x | θ, Ci ) = ∑ α m p (x | θm , Ci )

IV. RESULTS

(9)

From 4 different runs of the GA algorithm the best subset contained 5 features. 2 features are selected from set x(k), 2 from set y(k) and 1 from the set of PSD slopes. For this selection of features a very high classification performance is obtained. The classification performance averaged over 10 runs of 10fold crossvalidation equals 96.41%. Observation from confusion matrices shows that most confusion occurs between: general corrosion – SCC and general corrosion – pitting. Fig. 7 shows the decision boundaries computed by the Bayesian classifier for a good selection of 2 parameters.

m =1

where θ = {θ1 ,..., θ k , α1 ,...α k } . θ m are the parameters of the Gaussian probability function p ( x | θ m , Ci ) . α1, …, αk are the mixing probabilities that must satisfy: k

α m ≥ 0, m = 1, ..., k , and ∑ α m = 1. m=1

The parameters in (9): k, αm, θm, are optimized by minimization of the MML criterion for GMM’s [11].

L(θ , Y) =

N 2

∑ α

m:

m >0

log(

nα m k n ) + nz log( ) 12 2 12

k ( N + 1) + nz − log p ( X | θ) 2

(10)

where knz denotes the number of nonzero probability components, N is function of the dimensionality of the data, n the number of data points. Log p(X|θ) is the log-likelihood of the data: n

log p( X | θ) = log(∏ p(x( i ) | θ)) = i =1

  log  ∑ α m p(x( i ) | θ m )  . ∑ i =1  m =1  n

k

(11)

Fig. 7. Decision boundaries computed from Bayesian classifier. The data topmiddle indicated by ‘○’ are the noise time series represented by 2 parameters. The data on the left indicated by ‘+’ are SCC data. Data indicated by ‘□’ are pitting data. Data indicated by ‘.’ are general corrosion data. On the X-axis the value of a wavelet parameter is shown. On the Y-axis the slope from a line fit on the log PSD over all 3 decades is shown. From this figure, one can conclude: the PSD parameter is important for the identification of noise. The wavelet parameter, selected from x(k) with k = 24, is important for the identification of different corrosion processes. The PSD-slope for noise is centered on 0. This indicates that the noise is approximately white. The classification performance averaged over 5 runs of 10-fold crossvalidation for this selection of parameters is: 92.8%.

Minimization of (10) is performed by means of the component-wise expectation maximization (CEM) algorithm [11], [12]. While the maximum-likelihood (ML) criterion has been extensively used in literature it cannot be used to estimate the number of components k. D. Feature subset selection A two stage genetic algorithm (GA) [13] with variable chromosome length is designed to search for a subset of features with high classification performance. In the GA the correlations between features are exploited. It is shown from different simulations that this leads to subsets with higher classification performance compared to a standard GA without exploitation of the correlations [14]. A detailed description of the GA in combination with the classifier can be found in [14]. A short description follows. In a first step the features are clustered with the correlations as a measure of distance between features. An agglomerative hierarchical clustering method [15] is used. For each cluster of features a representative feature is chosen. For this the feature closest to the cluster center is chosen. Next a GA with variable length of chromosomes searches for a combination of clusters that leads to a high classification performance using the representative feature from each cluster. Starting from the

V. CONCLUSION A new high performing pattern recognition system is designed for electrochemical noise time series classification of different types of corrosion of stainless steel. A detrending technique based on subspace decomposition is proposed. It was shown that the detrending technique leads to an easier identification of noise. It was shown that a combination of features based on the PSD and the Haar wavelet transform leads to a high classification performance by means of a Bayesian classifier.

4

INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE 1(1) 2004 1-12 REFERENCES [1] [2] [3]

[4]

[5] [6] [7] [8]

[9] [10] [11] [12]

[13] [14]

[15]

L.H. Bennett, J. Kruger, R.L. Parker, E. Passaglia, C. Reimann, A.W. Ruff, and H. Yakowitz, “Effects of metallic corrosion in the United States,” NBS, Special Publication 511-1, Washington D.C., 1978. “Economic effects of metallic corrosion in the United States: a 1995 update,” Battellle Report to Specialty Steel Industry of North America, April 1995. G.H. Koch, M.P.H. Brongers, N.G. Thompson, Y.P. Virmani, J.H. Prayer, “Corrosion costs and preventive strategies in the United States,” CC Technologies Laboratories, Report FHWA-RD-01-156, Washington D.C., 2001 M. Winkelmans, “Fusion of non-destructive testing techniques for corrosion monitoring in chemical process installations,” Ph.D. dissertation, Dept. Metallurgy and Materials Engineering, K.U. Leuven, Leuven, 2004. P.R. Roberge, “Analysis of electrochemical noise by the stochastic process detector,” Corrosion, vol.50, no.7, pp. 502-512, July 1994. Electrochemical Noise Measurements for Corrosion Applications, ASTM, STP 1277, Philadelphia, PA, 1996, pp. 114-128. A. Legat, V.Dole…ek, “Chaotic analysis of electrochemical noise measured on stainless steel,” J. Electrochem. Soc. vol. 142, no. 6, June 1995. A.Eke, P. Hermán, J.B. Bassingthwaughte, G.M. Raymond, D.B. Percival, M.Cannon, I.Balla, C. Ikrényi, “ Physiological time series: distinguishing fractal noises from motions,” Eur. J. Physiol. 2000, pp. 403-415. A. Schuster, “On the investigation of hidden periodicities with application to supposed 26 day period of meteorological phenomena,” J. Geophys. Res. 3:13-41 I. Daubechies, Ten Lectures On Wavelets, CBMS Regional Conference Series in Applied Mathematics # 61, SIAM, 1992. M. A.T. Figueirido, A.K. Jain, “Unsupervised learning of finite mixture models,” IEEE Trans. On Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp.381-396, March 2002. G. Celeux, S. Chrétien, F. Forbes, and A. Mkhardi, “A component wise EM algorithm for Mixtures,” Technical report 3746, INRIA, RhôneAlpes, France, 1999. Available: http://www.inria.fr/RRRT/RR3746.html. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 3rd Edition, 1999. G.Van Dijck, M.Wevers, M. Van Hulle, “Genetic algorithm for feature subset selection with exploitation of feature correlations from wavelet transform: a real-case application,” submitted for publication, International Conference on Computational Intelligence, ICCI 2004. R. O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2000, pp. 550-559.

5