Wavelet-RKHS-based functional statistical classification

Adv Data Anal Classif (2012) 6:201–217 DOI 10.1007/s11634-012-0112-4 REGULAR ARTICLE

Wavelet-RKHS-based functional statistical classification M. Rincón · M. D. Ruiz-Medina

Received: 7 January 2012 / Revised: 6 June 2012 / Accepted: 9 July 2012 / Published online: 19 August 2012 © Springer-Verlag 2012

Abstract A functional classification methodology, based on Reproducing Kernel Hilbert Space (RKHS) theory, is proposed for the discrimination of gene expression profiles. The parameter function involved in the definition of the functional logistic regression model is univocally and consistently estimated from the minimization of the penalized negative log-likelihood over a RKHS generated by a suitable wavelet basis. An iterative descent method, the gradient method, is applied to solve the corresponding minimization problem, i.e., to compute the functional estimate. Temporal gene expression data involved in the yeast cell cycle are classified with the wavelet-RKHS-based discrimination methodology considered. A simulation study is developed to test the performance of this statistical classification methodology in comparison with other statistical discrimination procedures.

Keywords Functional data analysis · Gene expression profiles · Penalized logistic regression · Reproducing kernel Hilbert space · Wavelet decomposition · Yeast cell cycle

Mathematics Subject Classification 62J12 · 60G15

M. Rincón · M. D. Ruiz-Medina (corresponding author)
Department of Statistics and Operations Research, University of Granada, Campus de Fuente Nueva s/n, 18071 Granada, Spain
e-mail: [email protected]

1 Introduction

In the last two decades, Functional Statistics has emerged as a new developing area of research (see Ferraty and Vieu 2006; Ruiz-Medina and Salmerón 2010, among others). Special attention has been paid to the formulation of linear models in functional spaces (see, for example, Abramovich and Angelini 2006; Cardot and Sarda 2005). In the non-parametric functional framework, the non-linear case has also been addressed (see Ferraty and Vieu 2006; Rachdi and Vieu 2007, among others). Recently, Reproducing Kernel Hilbert Space (RKHS) theory has revealed itself as a useful framework for the analysis of the non-linear functional case (see, for example, Lian 2007; Mendelson 2002; Preda 2007). The mentioned results have attracted interest from several disciplines, among which we highlight Biomedicine and Bioinformatics. In this paper, we are motivated by recent research developments in these areas in relation to the statistical classification of gene expression profiles, since it plays a crucial role in yeast cell cycle control (see Spellman et al. 1998). In particular, in the functional statistical framework, different methodologies have been adopted (see Araki et al. 2009a; Leng and Müller 2006; Ferraty and Vieu 2006; Hall et al. 2001; Müller and Stadtmüller 2005; Rincón and Ruiz-Medina 2012, among others).

In this paper, we consider the context of kernel-based statistical learning methods (see, for example, Schölkopf and Smola 2002). In this framework, the unknown function is estimated by considering its optimal approximation, under some prescribed criterion, in a functional class given by a RKHS. Preda (2007) applies this methodology to the regression problem with functional predictors, while Lian (2007) considers functional responses in RKHSs, both covering the non-linear case. Here, we adopt the RKHS approach for the estimation of the parameter function involved in the functional logistic regression model, considering the penalized negative log-likelihood criterion. The RKHS studied is generated by an orthonormal wavelet basis, since these functions allow the processing of singular signals, covering a wider family of parameter functions (e.g. functions displaying singularities) than the ones approximated in terms of global spline, local spline or polynomial estimators.
The penalty term, which reflects the balance between bias (model fitting) and variance, is then defined in terms of the wavelet norm of the associated RKHS. Such a RKHS coincides with a fractional Sobolev space endowed with the associated wavelet norm, under suitable conditions on the wavelet basis, i.e., in the case of a ([r]+1)-regular multiresolution analysis with r > 1/2, where r denotes the continuity order of the scaling function, as well as the number of derivatives of such a function having fast decay (see Angelini et al. 2003).

Summarizing, we present here an application of the results of Preda (2007) to the context of Reproducing Kernel Hilbert Spaces (RKHSs) generated by wavelet bases, for the estimation of the parameter function involved in the functional logistic regression model. These results provide an alternative estimation methodology with respect to the ones derived in Leng and Müller (2006) and Rincón and Ruiz-Medina (2012), based on generalized linear model fitting in terms of Functional Principal Component Analysis (FPCA) and local wavelet-vaguelette decomposition, respectively. Alternatively, chaos game representation and multifractal analysis can be considered in the clustering and classification of functional protein sequences displaying singular features (see, for example, Yang et al. 2009a,b).

To test the proposed wavelet-RKHS-based functional statistical classification methodology, the gene expression profile data set by Spellman et al. (1998) is analyzed. A simulation study is also developed, where the proposed functional statistical classification procedure is compared with the classification methodology established in Araki et al. (2009a,b), in terms of Gaussian basis expansions. The regularization parameter is chosen by leave-one-out cross-validation in the real data example, and by applying the generalized information criterion, GIC (see Konishi and Kitagawa 1996),


and the generalized Bayesian information criterion, GBIC (see Konishi et al. 2004), in the simulation study. A final discussion is provided on the applicability of the estimation and discrimination methodology proposed.

2 The model

Generalized Linear Models (GLMs) (see, for example, McCullagh and Nelder 1989) constitute a flexible extension of classical linear models where the response variables Y_1,...,Y_N are independent and identically distributed (i.i.d.), with probability distribution in the exponential family. The logistic regression model is defined in terms of a set of parameters β_1,...,β_H, the explanatory variables {x_{i1},...,x_{iH}}, i = 1,...,N, and the logit link function g(x) = log{x/(1 − x)}, such that E(Y_i | x_{i1},...,x_{iH}) = μ_i = g^{−1}(η_i), with η_i = Σ_{j=1}^{H} x_{ij} β_j, the response variables Y_i, i = 1,...,N, being conditionally distributed as a Bernoulli with mean μ_i and variance μ_i(1 − μ_i).

In the functional formulation of the above-described logistic regression model (see, for example, Müller and Stadtmüller 2005), the parameter function β is a square integrable function, and the explanatory variables X_i(t), t ∈ [0,T], i = 1,...,N, are functional variables with values in a separable Hilbert space H, e.g., H = L²([0,T]), with [0,T] a real interval, and

η_i = α + ∫_0^T β(t) X_i(t) dt,    (2.1)
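As a concrete illustration, the logit link and its inverse can be evaluated directly. The following minimal Python sketch computes the linear predictor η_i, the conditional mean μ_i and the Bernoulli variance μ_i(1 − μ_i); the covariates x_i and parameters β_j are hypothetical, not taken from the paper:

```python
# Minimal sketch (hypothetical data): the logit link g and its inverse,
# mapping a linear predictor eta_i to the conditional mean mu_i = g^{-1}(eta_i).
import math

def logit(x):
    """Link function g(x) = log(x / (1 - x)), for x in (0, 1)."""
    return math.log(x / (1.0 - x))

def inv_logit(eta):
    """Inverse link g^{-1}(eta) = 1 / (1 + exp(-eta)), the Bernoulli mean."""
    return 1.0 / (1.0 + math.exp(-eta))

# Linear predictor eta_i = sum_j x_ij * beta_j for one observation.
x_i = [1.0, -0.5, 2.0]      # hypothetical covariates
beta = [0.3, 1.2, -0.4]     # hypothetical parameters
eta_i = sum(x * b for x, b in zip(x_i, beta))
mu_i = inv_logit(eta_i)     # conditional mean E(Y_i | x_i)
var_i = mu_i * (1.0 - mu_i) # conditional variance of the Bernoulli response
```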

where α is a constant, and the integral is understood in the mean-square sense, and in the sample-path sense in the Gaussian case actually assumed for the random curves X_i(t), t ∈ [0,T], i = 1,...,N. The linear predictors η_i, i = 1,...,N, define the conditional mean and variance, E(Y_i | X_i(t)) = μ_i = g^{−1}(η_i), Var(Y_i | X_i(t)) = σ²(μ_i) = μ_i(1 − μ_i), for a binary response variable Y_i, through the logit function g.

The following wavelet-based approximation of the functional explanatory variables X_i(t), i = 1,...,N, and of the parameter function β(t) is considered in the L²-sense:

X_i(t) = Σ_{k∈Λ_0} c^i_{j_0,k} φ_{j_0,k}(t) + Σ_{j≥j_0} Σ_{k∈Λ_j} d^i_{j,k} ψ_{j,k}(t),    (2.2)

β(t) = Σ_{k∈Λ_0} c*_{j_0,k} φ_{j_0,k}(t) + Σ_{j≥j_0} Σ_{k∈Λ_j} d*_{j,k} ψ_{j,k}(t),    (2.3)

in terms of an orthonormal scaling basis {φ_{j_0,k}, k ∈ Λ_0} and a wavelet basis {ψ_{j,k}, k ∈ Λ_j, j ≥ j_0}, providing a multiresolution analysis of L²([0,T]) and the orthogonal decomposition

L²([0,T]) = V_0 ⊕ ( ⊕_{j≥j_0} W_j ),


for given mother and father wavelet functions, ψ and φ (see, for instance, Vidakovic 2006). Here, V_0 denotes the subspace of L²([0,T]) generated by {φ_{j_0,k}, k ∈ Λ_0}, and W_j denotes the subspace of L²([0,T]) generated by {ψ_{j,k}, k ∈ Λ_j}, for each j ≥ j_0. Moreover, Λ_0 is such that the supports of the functions φ_{j_0,k}, k ∈ Λ_0, cover the original compact support of the father wavelet φ. Similarly, Λ_j denotes the set of integers k considered in the translation of the argument of the function ψ_{j,·} for covering the original compact support of the mother wavelet ψ. Here, the random wavelet coefficients in Eq. (2.2) define a series convergent in the mean-square sense, and in the sample-path sense, since, as commented before, X_i(t), t ∈ [0,T], are assumed to be Gaussian, for i = 1,...,N. Then, from the orthonormality of the selected wavelet basis, considering Eqs. (2.2) and (2.3), the functional formulation of the logistic regression model can be equivalently given in terms of the infinite-dimensional generalized linear model:

Y_i = g^{−1}(η_i) + e_i
    = g^{−1}( α + ∫ β(t) X_i(t) dt ) + e_i
    = g^{−1}( α + Σ_{k∈Λ_0} c^i_{j_0,k} c*_{j_0,k} + Σ_{j≥j_0} Σ_{k∈Λ_j} d^i_{j,k} d*_{j,k} ) + e_i.    (2.4)
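The reduction of the functional inner product in (2.4) to a Euclidean inner product of wavelet coefficients rests on the orthonormality of the basis (Parseval's identity). A minimal Python sketch, assuming an orthonormal Haar basis, dyadic sample length and hypothetical sampled curves (and ignoring the quadrature scaling between the integral and the discrete sum), is:

```python
# Illustrative sketch (assumptions: orthonormal Haar basis, length-2^J samples).
# It shows how the L2 inner product in (2.4) reduces, by Parseval's identity,
# to the Euclidean inner product of the wavelet coefficients.
import math

def haar_dwt(x):
    """Full orthonormal Haar transform of a length-2^J sequence.
    Returns (scaling_coeff, detail_levels) with details ordered coarse to fine."""
    details = []
    a = list(x)
    while len(a) > 1:
        s = [(a[2 * k] + a[2 * k + 1]) / math.sqrt(2) for k in range(len(a) // 2)]
        d = [(a[2 * k] - a[2 * k + 1]) / math.sqrt(2) for k in range(len(a) // 2)]
        details.insert(0, d)  # coarser levels first
        a = s
    return a[0], details

def coeff_inner(x, y):
    """Inner product of two sampled curves computed in the wavelet domain."""
    cx, dx = haar_dwt(x)
    cy, dy = haar_dwt(y)
    return cx * cy + sum(u * v for lev_x, lev_y in zip(dx, dy)
                         for u, v in zip(lev_x, lev_y))

# Hypothetical sampled curve and sampled parameter function.
x = [1.0, 3.0, -2.0, 0.5, 4.0, 1.0, 0.0, -1.0]
beta_samples = [0.2, -0.1, 0.4, 0.0, 0.3, 0.3, -0.2, 0.1]
alpha = 0.5
# Linear predictor of (2.4), up to the quadrature scaling of the integral.
eta = alpha + coeff_inner(beta_samples, x)
```

Since the transform is orthogonal, the coefficient inner product coincides with the time-domain inner product, which is the discrete analogue of the identity used to pass from (2.1) to (2.4).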

The errors e_i are independent with mean 0 and variance σ²(μ).

Remark 1 In the case where the observed times are not equally spaced, interpolation must be performed to obtain the values of the data curves on a regular grid of the temporal interval considered, before applying the discrete wavelet transform to the data.

3 Penalized logistic regression through wavelets with a RKHS approach

As commented in Sect. 1, we adopt the kernel-based statistical learning approach to derive a functional estimate of the parameter function β. That is, the estimate of β is searched over a class of functions having the structure of a Hilbert space H with reproducing kernel K, denoted as H_K. The RKHS H_K considered here is generated by an orthonormal wavelet basis. Specifically, H_K = H_0 ⊕ H_1, where H_0 and H_1 are the spaces respectively characterized by the following reproducing kernels on [0,T]², for a fixed j_0 ∈ Z:

K_0(s, t) = Σ_{k=0}^{2^{j_0}−1} φ_{j_0,k}(s) φ_{j_0,k}(t),
K_1(s, t) = Σ_{j≥j_0} Σ_{k=0}^{2^j−1} λ_j ψ_{j,k}(s) ψ_{j,k}(t),    (3.1)

where, for j ≥ j_0, λ_j ∈ R_+, and Σ_{j≥j_0} 2^j λ_j < ∞. Here, we have considered a compactly supported wavelet basis inducing a dyadic regular discretization of the interval [0,T]. Thus, Λ_0 = {0, 1,..., 2^{j_0} − 1}, and Λ_j = {0, 1,..., 2^j − 1}, for j ≥ j_0. Note that H_0 = V_0, the closure in L²([0,T]) of the span of {φ_{j_0,k}, k ∈ Λ_0}, and K = K_0 + K_1.

Remark 2 From Proposition 2.1 of Angelini et al. (2003), when the wavelet basis involved in the kernel construction defines a ([r]+1)-regular multiresolution analysis, with r > 1/2, H_K coincides with the classical Sobolev space H_2^r([0,T]), for λ_j = 2^{−2jr}.

In the computation of a wavelet-RKHS-based estimate of β, we consider the loss function defined by the penalized negative log-likelihood. The penalty term controls the local variability of the functional estimate of β at the micro-scale level. That is, our functional penalized loss function is

L(X, Y, β(X)) = −l(X, Y, β(X)) + J(β),    (3.2)

where X represents the stochastic process underlying our observed explanatory random variables, and Y denotes the random variable defining the response. Here, the log-likelihood function l is given by

l(X, Y, β(X)) = Y η(β(X)) − log(1 + exp(η(β(X)))),    (3.3)

with

β(X) = ∫_0^T β(t) X(t) dt,  ∀X ∈ L²([0,T]),

and the penalty term J(β) is defined as

J(β) = ||P_1 β||²_{H_1} = Σ_{j≥j_0} Σ_{k=0}^{2^j−1} [d*_{j,k}]² / λ_j,    (3.4)

where the norm of the projection of the function β onto the spaces generated by the wavelet basis, providing its micro-scale local variation, is computed in terms of the corresponding Fourier coefficients by applying Parseval's identity. Thus, to estimate the parameter function β, we proceed to minimize the loss function L(β) in Eq. (3.2). Since H_0 = {f ∈ H_K | J(f) = 0}, the smoothing parameter provides the balance between model fitting and local variability. In the derivation of the existence and uniqueness of the solution to the minimization problem, given in terms of the penalized loss function L(β), as well as of the consistency of the estimator defined by the minimizer, we apply the results formulated in Preda (2007). To this aim we choose an appropriate domain for our convex loss function. Specifically, we consider the Hilbert space H defined as the closure in L²([0,T]) of

{g ∈ L²([0,T]) : ||g||_{L∞} ≤ C},
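The penalty (3.4), truncated at a highest resolution level M, can be evaluated directly from the detail coefficients of a candidate β. A sketch under the assumptions of Remark 2 (λ_j = 2^{−2jr}), with j_0 = 0 and hypothetical coefficients:

```python
# Illustrative sketch (assumptions: lambda_j = 2^(-2jr) as in Remark 2, j0 = 0,
# and hypothetical detail coefficients of beta). It evaluates the truncated
# penalty J(beta) = sum_j sum_k (d*_{j,k})^2 / lambda_j of Eq. (3.4).

def penalty(detail, r, j0=0):
    """J(beta) for detail[j] = list of detail coefficients at level j0 + j."""
    total = 0.0
    for j_off, level in enumerate(detail):
        j = j0 + j_off
        lam_j = 2.0 ** (-2.0 * j * r)          # lambda_j = 2^(-2jr)
        total += sum(d * d for d in level) / lam_j
    return total

# Hypothetical wavelet detail coefficients of a candidate beta (levels j = 0, 1, 2).
detail = [[0.8], [0.3, -0.2], [0.05, 0.0, -0.1, 0.02]]
J = penalty(detail, r=0.75)
```

Larger r penalizes the fine-scale coefficients more heavily, which matches the role of the smoothing parameter as a bias-variance balance described above.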


for a certain finite positive constant C, where

||g||_{L∞} = sup_{x∈[0,T]} |g(x)|.

Assume now that the following condition is satisfied by the stochastic process model underlying our observed functional explanatory random variables:

A.1  ||X||_{L∞} ≤ C, a.s.

The following results are then considered.

Theorem 1 Under A.1, the solution β̂_N ∈ H_K to the problem of minimizing

(1/N) Σ_{i=1}^N (−l(X_i, Y_i, β(X_i))) + J(β)

exists, is unique as a functional on H, and admits the following weak-sense representation:

∫_0^T β̂_N(t) g(t) dt = Σ_{i=1}^N a_i ∫_0^T ∫_0^T g(t) K(t, s) X_i(s) ds dt,    (3.5)

for all g ∈ H.

Remark 3 In the case where the selected wavelet basis provides a ([r]+1)-regular multiresolution analysis, with r > 1/2, and λ_j = 2^{−2jr}, from the embedding theorems of fractional Sobolev spaces into Hölder spaces (see, for example, Triebel 1978), the above weak-sense definition of the functional estimate β̂_N becomes strong-sense; i.e., there exists a unique function β̂_N admitting the above kernel-based representation, which can be univocally (pointwise) determined from identity (3.5).

Proof The proof follows directly from the application of Theorem 2 (Representer Theorem) of Preda (2007) to our wavelet-based RKHS functional context.

Proposition 1 Under A.1, the following inequality holds for the minimizer β̂_N derived in Theorem 1:

P( | (1/N) Σ_{i=1}^N (−l(X_i, Y_i, β̂_N(X_i))) − E[−l(X, Y, β̂_N(X))] | > ε + κ²/N ) ≤ 2 exp( −2Nε² / (9κ⁴) ),    (3.6)

where κ is such that, for g ∈ H,

| ∫_0^T ∫_0^T K(t, s) g(t) g(s) ds dt | ≤ κ.


Proof The integral operator K defined from the kernel K in Eq. (3.1) is symmetric and positive. From Lemmas 1.1 and 1.2 in Vakhania et al. (1987), the operator K then admits a factorization in terms of a linear continuous operator A : H̃* → L²([0,T]), with A(H̃*) = L²([0,T]), such that K = A*A, where H̃ denotes the Hilbert space defining the domain of K. In particular, in our case, from the positiveness of K, considering the condition Σ_{j≥j_0} 2^j λ_j < ∞, and from the orthonormality of the wavelet basis generating L²([0,T]), we can consider A = K^{1/2}, with

K(g)(g) = ⟨K^{1/2}(g), K^{1/2}(g)⟩_{L²([0,T])} = ||K^{1/2}(g)||²_{L²([0,T])} ≤ L ||g||²_{L²([0,T])},

for all g ∈ L²([0,T]), and for a certain finite positive constant L. Under Assumption A.1, we then have

| ∫_0^T ∫_0^T K(t, s) X(t) X(s) ds dt | ≤ √L ||X||_{L²([0,T])} ≤ √L C √T < ∞, a.s.    (3.7)

The same upper bound √L C √T is obtained for every function in H. Thus, from Eq. (3.7) and the definition of H, Eq. (3.6) holds from the application of Theorem 5 and Proposition 6 of Preda (2007), considering the Hilbert space H and κ = √L C √T.

Corollary 2 Let β^{opt} ∈ H_K be the minimizer of the expected risk E[−l(X, Y, β(X))]. The following limits hold:

(1/N) Σ_{i=1}^N [ (−l(X_i, Y_i, β^{opt}(X_i))) − (−l(X_i, Y_i, β̂_N(X_i))) ] →_P 0,

E[−l(X, Y, β̂_N(X))] − E[−l(X, Y, β^{opt}(X))] →_P 0,    (3.8)

where, as before, X represents the stochastic process underlying our observed explanatory random variables, and Y denotes the random variable defining the response. The proof is as given in Corollary 8 of Preda (2007).

4 Computation

The first step in the computation of the above-derived functional estimate of β is to select the highest resolution level M for a suitable description of the local regularity properties of the function β. Note that large values of M are needed when β displays high local singularity, i.e., high local variability, that must be collected at the micro-scale level. The regularity properties of the wavelet functions considered also affect the selection of M. Without prior information on the local variability features


of β, M can be chosen by cross-validation. Additionally, in practice, usually j_0 = 0, and λ_j, j ≥ j_0, are selected under the condition Σ_{j≥j_0} 2^j λ_j < ∞, according to the regularity properties of the wavelet functions considered (see Remarks 2 and 3). Note that the empirical Hölder spectra of the observed functional explanatory variables X_i, i = 1,...,N, define, by duality, the desirable local regularity properties of the wavelet functions in the basis to be selected.

After selection of M, in the computation of the functional estimate of β on [0,T], we consider its approximation by a function β̂ with wavelet transform β̃ = (c*, d*), with c* = (c*_{j_0,k})_{k∈Λ_0}, and d* = (d*_{j,k})_{k∈Λ_j, j∈{j_0,...,M}}. The penalty term is then approximated by

J(β̂) = ||P_1 β̂||²_{H_1} = Σ_{j=j_0}^{M} Σ_{k=0}^{2^j−1} [d*_{j,k}]²/λ_j = β̃ᵀ Λ β̃,    (4.1)

where Λ is a diagonal matrix whose entries σ_ii are defined as follows:

σ_ii = 1/λ_{j_0},    i = 1,..., 2^{j_0} − 1,
σ_ii = 1/λ_{j_0+1},  i = 2^{j_0},..., 2^{j_0} + 2^{j_0+1} − 1,
...
σ_ii = 1/λ_M,        i = Σ_{k=j_0}^{M−1} 2^k,..., Σ_{k=j_0}^{M−1} 2^k + 2^M − 1.

A single Newton–Raphson iteration is given by

β^{new} = β^{old} − (D²_β l(β))^{−1} ∇_β l(β) = (XᵀWX + Λ̃)^{−1} (XᵀW z(β^{old})),    (4.2)

where the initial value of β is computed by applying weighted penalized least squares, and

z(β^{old}) = Xβ^{old} + W^{−1}(Y − μ).

Here, Y is the vector of responses, W is a diagonal matrix with entries μ_i(1 − μ_i), X is the matrix with rows (1, X_i), and Λ̃ is an extended version of the matrix Λ, with an additional first row and column of zeros. Note that X_i = (c^i, d^i) is the truncated empirical wavelet transform of the functional explanatory variable X_i, with c^i = (c^i_{j_0,k})_{k∈Λ_0}, and d^i = (d^i_{j,k})_{k∈Λ_j, j∈{j_0,...,M}}, for i = 1,...,N, as defined in (2.2).

Remark 4 In Sects. 5 and 6, the discrete wavelet transform (c^i, d^i) of the observed values of X_i, i = 1,...,N, is computed. Note that in these sections the discrete Haar, Symlet, Coiflet and Gaussian wavelet transforms of the data are considered.
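A sketch of the update (4.2) for a toy finite-dimensional design may read as follows; the data (standing in for truncated wavelet coefficients) and the ridge-type diagonal penalty with an unpenalized intercept are hypothetical, and this is not the authors' code:

```python
# Illustrative sketch: one penalized IRLS / Newton-Raphson pass of Eq. (4.2),
# beta_new = (X^T W X + Lambda~)^(-1) X^T W z(beta_old), iterated to convergence.
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def irls_step(X, y, beta, lam_diag):
    """One Newton-Raphson update for penalized logistic regression."""
    mu = [1.0 / (1.0 + math.exp(-sum(x * b for x, b in zip(row, beta)))) for row in X]
    w = [m * (1.0 - m) for m in mu]
    n, p = len(X), len(beta)
    # Normal-equation matrix X^T W X + Lambda~ (diagonal penalty).
    A = [[sum(w[i] * X[i][a] * X[i][c] for i in range(n)) for c in range(p)] for a in range(p)]
    for a in range(p):
        A[a][a] += lam_diag[a]
    # X^T W z with z = X beta + W^{-1}(y - mu), i.e. X^T (W X beta + (y - mu)).
    rhs = [sum(X[i][a] * (w[i] * sum(x * b for x, b in zip(X[i], beta)) + y[i] - mu[i])
               for i in range(n)) for a in range(p)]
    return solve(A, rhs)

# Toy data: intercept column plus two "wavelet coefficients" per curve (hypothetical).
X = [[1.0, 0.5, 1.2], [1.0, -0.3, 0.4], [1.0, 1.5, -0.7], [1.0, -1.1, -0.2],
     [1.0, 0.9, 0.8], [1.0, -0.6, -1.0]]
y = [1, 0, 1, 0, 1, 0]
beta = [0.0, 0.0, 0.0]
for _ in range(20):
    beta = irls_step(X, y, beta, lam_diag=[0.0, 0.5, 0.5])  # intercept unpenalized
```

The penalty entries play the role of the 1/λ_j weights of the matrix Λ̃ above; here they are simply constants for illustration.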


In this paper, functional logistic regression based classification is considered (see Rincón and Ruiz-Medina 2012, for an alternative implementation of this classification procedure). Therefore, once the estimate of the wavelet transform of β has been computed, to distinguish between the two considered groups A and B, respectively associated with the values zero and one of the response, a prior probability p_0 is considered for A-class membership, and similarly, a prior probability p_1 is considered for B-class membership. Thus, for each centered sample curve X(t), if P̂(Y = 1|X) ≈ g^{−1}(η̂) ≥ p_1, with η̂ computed from the truncated empirical wavelet transform of X(t) as in Eq. (2.4), the curve X(t) is classified as a member of class B; otherwise, it is assigned to class A.

In the next section, the results derived are applied to a real functional data set. In this application, the smoothing parameter is selected by minimization of the leave-one-out cross-validation error rate (see Picard and Cook 1984). To illustrate the performance of alternative model selection criteria in the choice of the smoothing parameter, in Sect. 6, the generalized information criterion (GIC) and the generalized Bayesian information criterion (GBIC), proposed by Konishi et al. (2004), and extended to the context of the nonlinear logistic model by Kawano and Konishi (2009) and Araki et al. (2009a,b), are applied to the functional data sets generated in the simulation study. In the latter case, the results are compared with those obtained by Araki et al. (2009a,b) within the same class of functional data models.

5 Analysis of yeast cell cycle gene expression profiles

We use the temporal gene expression data (α-factor synchronized) for the N = 90 genes involved in the yeast cell cycle obtained by Spellman et al. (1998) as sample curves.
The gene expression is measured every 7 min between 0 and T = 119 min (both endpoints included); thus, n = 18 observations are available for each gene. It is known that 44 of these genes are related to G1-phase regulation, and 46 to the S, S/G2, G2/M and M/G1 phases. The temporal Haar wavelet transform is applied to each gene expression curve, with M = 5 resolution levels, selected by cross-validation. Then, β is reconstructed in H from its truncated wavelet expansion (see Remark 3). Figure 1 displays the original data (left) together with their approximation in terms of the Haar wavelet decomposition (right).

We choose the smoothing parameter by minimizing the cross-validation classification error rate CVE (see Picard and Cook 1984). Supposing that the ith gene is removed, the estimate of the conditional mean is computed from a functional sample of 89 gene expression curves; that is, an approximation η^{(−i)} of η is obtained from the sample information provided by the 89 gene expression curves remaining after removal of the ith gene. This procedure is repeated for every gene: if g^{−1}(η^{(−i)}) ≥ p_1, the ith gene is classified into group G1; otherwise, it is assigned to group G0. The cross-validation classification error rate CVE is computed as the quotient between the total number of genes misclassified under cross-validation and the total number of genes. The minimum CVE obtained was 0.1, attained at the smoothing parameter value 2 × 10^{−16}.

In the cross-validation procedure, our approach misclassifies three G1 genes: YDL055W, YDL227C, and YJL092W, while the approach presented in Leng and Müller (2006) misclassifies five: YDL055W, YDR113C, YDR356W, YJL092W
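The leave-one-out CVE computation just described can be sketched generically; the classifier below (nearest class mean, on hypothetical scalar summaries) is a deliberately simple stand-in for the penalized functional logistic fit of the previous sections:

```python
# Illustrative sketch (not the authors' pipeline): leave-one-out cross-validation
# error rate (CVE), with a simple stand-in classifier on hypothetical data.

def nearest_mean_classify(train_x, train_y, x):
    """Assign x to the class whose training mean is closest (hypothetical stand-in)."""
    means = {}
    for label in set(train_y):
        pts = [v for v, l in zip(train_x, train_y) if l == label]
        means[label] = sum(pts) / len(pts)
    return min(means, key=lambda l: abs(x - means[l]))

def loo_cve(xs, ys, classify):
    """CVE = (# misclassified under leave-one-out) / (total #)."""
    errors = 0
    for i in range(len(xs)):
        tx = xs[:i] + xs[i + 1:]
        ty = ys[:i] + ys[i + 1:]
        if classify(tx, ty, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)

# Hypothetical one-dimensional summaries; the last class-0 point is borderline.
xs = [0.1, 0.3, 0.2, 0.9, 1.1, 1.0, 0.7]
ys = [0, 0, 0, 1, 1, 1, 0]
cve = loo_cve(xs, ys, nearest_mean_classify)   # one point misclassified: CVE = 1/7
```

In the paper, the same loop is run over the 90 gene expression curves, refitting the penalized functional logistic model on the remaining 89 curves at each step.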


[Figure 1: two panels, gene expression level vs. time (minutes), 0–120 min.]
Fig. 1 Left panel: temporal gene expression profiles of the yeast cell cycle. Right panel: reconstruction of the temporal gene expression profiles in the left panel from their orthogonal expansion in terms of the Haar wavelet transform

[Figure 2: two panels, gene expression level vs. time (minutes).]
Fig. 2 Profiles of misclassified G1 genes. Left panel: G1 genes represented by gray solid lines, S genes by dashed lines, misclassified gene SRS2 (YJL092W) by a black solid line. Right panel: G1 genes given by gray solid lines, M/G1 genes by dashed lines, misclassified gene KAR4 (YDL055W) by a black solid line

and YDL055W. Regarding the genes misclassified by both approaches, YDL055W (KAR4) and YJL092W (SRS2), Spellman et al. (1998) comment that the first one is induced by the α factor, and so it is very strongly expressed at the beginning of the α-factor experiment. Both genes are also misclassified by the functional data analysis method given in Leng and Müller (2006), who suggest that SRS2 is close to the trajectories of the S genes, and KAR4 is close to the trajectories of the M/G1 genes, as can be seen in Fig. 2. More precisely, it can be observed from this figure that SRS2 oscillates between the two groups.


6 Simulation study

To investigate the general performance of the proposed methodology, a simulation study has been carried out considering eight cases. Cases 1 and 2 are generated as in Araki et al. (2009a,b), who consider these cases for testing functional logistic discrimination based on Gaussian basis expansions and regularization. The results obtained in terms of the wavelet-based RKHS approach are then compared with those obtained in Araki et al. (2009a,b). Additionally, Cases 3a–3c and 4a–4c are defined from the models generated in Cases 3 and 4 of Araki et al. (2009a,b), involving three groups, which are here adapted to the two-group case in terms of the possible combinations of the three categories into pairs. Specifically, the wavelet-RKHS-based penalized functional logistic regression has been applied to a training sample and tested on a sample of the same size with known class memberships. The regularization parameter was chosen through the GIC and GBIC criteria.

Random samples of the curves {x_α(t_i); i = 1,...,N, α = 0, 1} are generated, where the functional explanatory variables x_α(·) are assumed to be defined as x_α(t_i) = h_α(t_i) + ε_α(t_i), i = 1,...,N, α = 0, 1, with α representing the group parameter. The models considered are:

Case 1
h_α(t_i) = sin(c_α t_i π) u_α,  ε_α(t_i) ∼ N(0, 0.1),  t_i = (2i − 2)/(N − 1),    (6.1)
for i = 1,...,N. Here,
c_0 = 1, c_1 = 1.02, u_0 ∼ U[0.3, 1.3], u_1 ∼ U[0.1, 0.6].    (6.2)

Case 2
h_α(t_i) = c_α + exp(a_1 t_i) + a_2 t_i,  ε_α(t_i) ∼ N(0, 0.05²),  t_i = (2i − 2)/(N − 1) − 1,    (6.3)
for i = 1,...,N. Here,
c_0 = 0, c_1 = 0.2, a_1 ∼ N(2, 0.5²), a_2 ∼ N(−3, 1).    (6.4)
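The generation of, e.g., the Case 1 sample curves in (6.1)–(6.2) can be sketched as follows (assuming the N(0, 0.1) noise is parameterized by its variance; this reading is not stated explicitly in the paper):

```python
# Illustrative sketch (not the authors' code): Case 1 sample curves
# x_alpha(t_i) = h_alpha(t_i) + eps_alpha(t_i) of Eqs. (6.1)-(6.2).
import math
import random

def case1_curve(alpha, N, rng):
    """One Case 1 curve on the grid t_i = (2i - 2)/(N - 1), i = 1..N."""
    c = 1.0 if alpha == 0 else 1.02
    u = rng.uniform(0.3, 1.3) if alpha == 0 else rng.uniform(0.1, 0.6)
    curve = []
    for i in range(1, N + 1):
        t = (2 * i - 2) / (N - 1)
        eps = rng.gauss(0.0, math.sqrt(0.1))  # assuming N(0, 0.1) gives the variance
        curve.append(math.sin(c * t * math.pi) * u + eps)
    return curve

rng = random.Random(0)  # fixed seed for reproducibility
sample0 = [case1_curve(0, 50, rng) for _ in range(25)]
sample1 = [case1_curve(1, 50, rng) for _ in range(25)]
```

The other cases follow the same pattern, with their respective mean functions h_α and parameter distributions.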

Case 3
h_α(t_i) = a_{1α} sin(a_{2α} π (t_i − a_{3α})),  ε_α(t_i) ∼ N(0, 0.2),  t_i = (2i − 2)/(N − 1),    (6.5)


Fig. 3 At the left, function h_α, and at the right, function x_α, for Case 1, evaluated in 50 subjects. Red lines represent group 0 and blue lines group 1 (color figure online)

for i = 1,...,N. Here,
a_{11} ∼ U[0.7, 1.1], a_{21} ∼ U[0.7, 1.2], a_{31} ∼ U[0.8, 1.4],
a_{12} ∼ U[0.6, 0.9], a_{22} ∼ U[0.6, 0.9], a_{32} ∼ U[0.7, 1],
a_{13} ∼ U[0.6, 0.9], a_{23} ∼ U[0.8, 1.2], a_{33} ∼ U[0.5, 0.8].    (6.6)

Case 4
h_α(t_i) = a_1 t_i³ + a_2 (t_i − c_α)²,  ε_α(t_i) ∼ N(0, 0.1),  t_i = (2i − 2)/(N − 1) − 1,    (6.7)
for i = 1,...,N. Here,
c_0 = 0, c_1 = 0.3, c_2 = −0.3, a_1 ∼ N(2, 0.5²), a_2 ∼ N(−3, 1).    (6.8)

Figures 3, 4, 5 and 6 show a comparison between the true curves {h_α(t_i), i = 1,...,N} and the sample curves {x_α(t_i), i = 1,...,N}, for Cases 1–4, respectively. The mean misclassified-curve rates are shown in Table 1 for the eight cases studied, considering the data sizes 50 (A), 100 (B), 200 (C) and 400 (D). Cases 1 and 2 are given as in Araki et al. (2009a,b), while, as commented before, Cases 3a–3c and 4a–4c are generated by defining six models of binary categorical responses from Cases 3 and 4 in Araki et al. (2009a,b).

For Cases 2 and 4a–4c, it can be appreciated that the RKHS-based projection in terms of Haar wavelet functions outperforms the projection onto Gaussian bases for most of the data sizes studied (see Araki et al. 2009a,b), since the curve data are defined as realizations of linear combinations of random exponential and polynomial type functions, or only random polynomial functions. Note that the Gaussian basis functions are given by


Fig. 4 At the left, function h_α, and at the right, function x_α, for Case 2, evaluated in 50 subjects. Red lines represent group 0 and blue lines group 1 (color figure online)

Fig. 5 At the left, function h_α, and at the right, function x_α, for Case 3, evaluated in 50 subjects. Red lines represent group 0, blue lines group 1 and green lines group 2 (color figure online)



φ_k(t; μ_k, η_k², ν) = exp( −(t − μ_k)² / (2νη_k²) ),  k = 1,...,m,

with μ_k defining the position of the centres, η_k being the dispersion parameter, and ν representing a hyperparameter which adjusts the amount of overlapping among the basis functions. Moreover, we note that Araki et al. (2009a,b) showed that Gaussian bases also outperform other classical bases such as Fourier series or B-spline bases. The values displayed in Table 1 also support the fact that Gaussian wavelet bases (see the fourth column of the table, under the title PFLDA) outperform the Haar wavelet basis (see the second and third columns) in Cases 1, 3a and 3c. Note that these cases correspond to curve data generated from realizations of random sinusoidal functions, which are better described in terms of radial functions, like the ones involved in Gaussian bases. We also remark that, in general (see Table 1-C, D), for data size equal
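For illustration, these Gaussian basis functions can be evaluated directly; the centres, dispersion and overlap values below are hypothetical:

```python
# Illustrative sketch: evaluating the Gaussian basis functions
# phi_k(t; mu_k, eta_k^2, nu) = exp(-(t - mu_k)^2 / (2 nu eta_k^2)),
# with hypothetical centres mu_k, dispersion eta and overlap hyperparameter nu.
import math

def gaussian_basis(t, centres, eta, nu):
    """Values of the m basis functions at time t (equal dispersion eta assumed)."""
    return [math.exp(-((t - mu) ** 2) / (2.0 * nu * eta ** 2)) for mu in centres]

centres = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical equally spaced centres on [0, 1]
row = gaussian_basis(0.5, centres, eta=0.25, nu=1.0)
```

Increasing ν widens each bump, increasing the overlap among neighbouring basis functions.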


Fig. 6 At the left, function h_α, and at the right, function x_α, for Case 4, evaluated in 50 subjects. Red lines represent group 0, blue lines group 1 and green lines group 2 (color figure online)

to 400 (D), which has only been considered here (it was not studied in Araki et al. 2009a,b), the mean misclassified-curve rate decreases with respect to the previous smaller data sizes, in all the cases studied. Also, in all these cases, similar results are obtained from the application of the GIC and GBIC criteria in the selection of the regularization parameter.

On the other hand, in the resolution of the selection problem associated with the choice of a specific wavelet basis, moment conditions must also be taken into account, as shown in Preda (2007), where the following simulated example is considered in terms of the curve data (see Fig. 7):

h_α(t_i) = a_1 H_1(t_i) + (1 − a_1) H_1(t_i + c_α),  ε_α(t_i) ∼ N(0, 1),  t_i = 1 + 20(i − 1)/63,    (6.9)

for i = 1,...,64. Here,
c_0 = −4, c_1 = 4, a_1 ∼ U(0, 1),    (6.10)

where H_1(t) = max(6 − |t − 11|, 0) denotes the triangular waveform. Figure 7 shows a comparison between the true curves {h_α(t_i), i = 1,...,64} and the sample curves {x_α(t_i), i = 1,...,64} for this case. We consider 100 simulated samples of size 1,000, with 500 observations in each class. Each sample is randomly divided into a training sample of size 800 and a test sample of size 200, each class having the same number of observations in both samples. For this example, Gaussian wavelet bases outperform the Haar wavelet basis, due to their suitability in terms of the moment conditions satisfied. Specifically, the mean misclassified-curve rate obtained with the Haar wavelet basis is 0.042050 under the GIC criterion, and 0.044150 under GBIC, whereas the order of magnitude of the averaged error rate is 2 × 10^{−2} when Gaussian bases are considered (see Preda 2007).
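The waveform curves of (6.9)–(6.10) can be generated as follows (illustrative sketch, not the authors' code):

```python
# Illustrative sketch: triangular-waveform curves of Eqs. (6.9)-(6.10),
# h_alpha(t_i) = a1 * H1(t_i) + (1 - a1) * H1(t_i + c_alpha), plus N(0, 1) noise.
import random

def H1(t):
    """Triangular waveform H1(t) = max(6 - |t - 11|, 0)."""
    return max(6.0 - abs(t - 11.0), 0.0)

def waveform_curve(alpha, rng):
    """One noisy curve on the grid t_i = 1 + 20(i - 1)/63, i = 1..64."""
    c = -4.0 if alpha == 0 else 4.0
    a1 = rng.uniform(0.0, 1.0)
    return [a1 * H1(t) + (1.0 - a1) * H1(t + c) + rng.gauss(0.0, 1.0)
            for t in (1.0 + 20.0 * (i - 1) / 63.0 for i in range(1, 65))]

rng = random.Random(1)
curve0 = waveform_curve(0, rng)
curve1 = waveform_curve(1, rng)
```

Each curve mixes the waveform with a shifted copy, the shift direction distinguishing the two groups; the piecewise-linear kinks are the singular features on which the choice of wavelet family matters.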


Table 1 Comparison of test error rates of the Wavelet-based Penalized Functional Logistic Discrimination Analysis (WPFLDA), with the penalization parameter selected through the GIC and GBIC criteria, and of the PFLDA proposed in Araki et al. (2009a), for simulated data over 100 repetitions

           WPFLDA (GIC)   WPFLDA (GBIC)   PFLDA
(A)
Case 1     0.247          0.243           0.195
Case 2     0.053          0.011           0.068
Case 3a    0.192          0.192           0.18
Case 3b    0.065          0.065           –
Case 3c    0.211          0.211           0.118
Case 4a    0.0047         0.0047          –
Case 4b    0.011          0.011           –
Case 4c    0.0036         0.0036          –
(B)
Case 1     0.2541         0.2547          0.187
Case 2     0.588          0.588           0.059
Case 3a    0.200          0.2047          0.153
Case 3b    0.0629         0.0629          –
Case 3c    0.2110         0.2111          0.101
Case 4a    0.0028         0.0028          –
Case 4b    0.0040         0.0040          –
Case 4c    0.0029         0.0029          –
(C)
Case 1     0.2350         0.2477          0.175
Case 2     0.0037         0.0040          0.052
Case 3a    0.1559         0.1697          0.147
Case 3b    0.0556         0.0556          –
Case 3c    0.17           0.1803          0.102
Case 4a    0.0059         0.0059          –
Case 4b    0.0031         0.0031          –
Case 4c    0.0032         0.0032          –
(D)
Case 1     0.2150         0.2151          –
Case 2     0.00224        0.00224         –
Case 3a    0.1362         0.1387          –
Case 3b    0.040          0.046           –
Case 3c    0.1476         0.1500          –
Case 4a    0.0026         0.0026          –
Case 4b    0.0050         0.0050          –
Case 4c    0.0033         0.0033          –

(A), (B), (C) and (D) show 50, 100, 200 and 400 data sizes for each class, respectively. A dash indicates that no PFLDA value is reported for that case.

A better performance is obtained when the wavelet-RKHS-based methodology is applied in terms of Symlet wavelet functions of order N ≥ 4, or in terms of the Coiflet wavelet family of order N ≥ 3. Specifically, for this range of values of N in both wavelet families, the mean of the misclassified curve rates is always below 2 × 10⁻².
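The role of the moment conditions can be checked numerically: a wavelet with N vanishing moments has a high-pass filter whose discrete moments vanish up to order N − 1 (Symlets of order N have N vanishing moments; Coiflets of order N have 2N). The sketch below is illustrative only; it compares the Haar filter (one vanishing moment) with the Daubechies-2 filter, used here as a compact stand-in for a higher-order family because its coefficients have a closed form:

```python
import numpy as np

# High-pass (wavelet) filters: Haar has one vanishing moment,
# Daubechies-2 (db2) has two; its coefficients are known in closed form.
haar_hp = np.array([1.0, -1.0]) / np.sqrt(2.0)
s3 = np.sqrt(3.0)
db2_lp = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
db2_hp = db2_lp[::-1] * np.array([1.0, -1.0, 1.0, -1.0])  # quadrature mirror filter

def filter_moment(hp, k):
    """k-th discrete moment sum_n n^k g[n] of a high-pass filter g;
    it vanishes for k below the number of vanishing moments."""
    n = np.arange(len(hp))
    return float(np.sum(hp * n ** k))

# Haar annihilates constants (k = 0) but not linear trends (k = 1);
# db2 annihilates both, so it represents smoother curves more sparsely.
```

This is the mechanism behind the comparison above: families with more vanishing moments compress smooth curve data into fewer significant coefficients than the Haar basis.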


Fig. 7 At the left, the function h_α, and at the right, the function x_α, for the waveform data, evaluated for 50 subjects. Red lines represent group 0 and blue lines group 1 (color figure online)
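The fitting step behind the discrimination results reported above can be sketched in simplified form: gradient descent on a ridge-penalized negative log-likelihood over basis coefficients. This is a minimal stand-in for the paper's wavelet-RKHS penalty, not the authors' implementation; the function name, the plain ridge penalty and the synthetic coefficient data are our illustrative assumptions.

```python
import numpy as np

def fit_penalized_logistic(Z, y, lam=0.1, lr=0.05, n_iter=2000):
    """Gradient descent on the ridge-penalized negative log-likelihood.

    Z: (n, p) matrix of basis (e.g. wavelet) coefficients of the curves;
    y: (n,) labels in {0, 1}; lam plays the role of the regularization
    parameter (selected through GIC/GBIC in the setting above)."""
    n, p = Z.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        p_hat = 1.0 / (1.0 + np.exp(-(Z @ beta)))    # logistic link
        grad = Z.T @ (p_hat - y) / n + lam * beta    # penalized gradient
        beta -= lr * grad                            # descent step
    return beta

# Toy usage on synthetic coefficient data (two shifted Gaussian classes)
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(-1.0, 1.0, (50, 3)), rng.normal(1.0, 1.0, (50, 3))])
y = np.repeat([0.0, 1.0], 50)
beta_hat = fit_penalized_logistic(Z, y)
accuracy = float(np.mean((Z @ beta_hat > 0) == (y == 1)))
```

The ridge term keeps the iterates bounded and the fixed step size makes the descent stable; in the RKHS setting, the Euclidean penalty on beta would be replaced by the squared RKHS norm of the wavelet expansion.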

7 Discussion

The wavelet-based RKHS approach presented provides a flexible framework in which a wider class of predictors can be formulated, in a consistent way, for the resolution of the classification problem of gene expression profiles. In this approach, the penalty appearing in the regularized loss function (the negative log-likelihood) reflects the local variability at the micro-scale level. The usual regular bases considered in the framework of functional logistic discrimination do not allow the search for a solution to the associated minimization problem in function spaces as large as those considered with our approach. For example, fractal models for the parameter function β can be considered in our setting.

As shown in the simulation study, the choice of a suitable wavelet basis generating our RKHS space for projection is crucial, since in that case our approach outperforms the classical ones when suitable data sizes are available. Note also that the duality between the Hilbert spaces where the parameter function β and the functional explanatory variable X lie ensures a good performance of our approach (see the comments, in the simulation study section, about the influence of the choice of a suitable wavelet family on the performance of our approach, according to the curve data model considered). This is a key aspect of the proposed methodology, since it is well known that smooth approximations are usually constructed from large-dimensional data for their processing, as curve data, in a functional statistical framework (see, for instance, Ramsay and Silverman 2005).

Acknowledgments This work has been supported in part by projects MTM2009-13393 of the DGI, MEC, and P09-FQM-5052 of the Andalusian CICE, Spain.

References

Abramovich F, Angelini C (2006) Testing in mixed-effects FANOVA models. J Stat Plan Inference 136:4326–4348


Angelini C, De Canditiis D, Leblanc F (2003) Wavelet regression estimation in nonparametric mixed effect models. J Multivar Anal 85:267–291
Araki Y, Konishi S, Kawano S, Matsui H (2009a) Functional regression modeling via regularized Gaussian basis expansions. Ann Inst Stat Math 61:811–833
Araki Y, Konishi S, Kawano S, Matsui H (2009b) Functional logistic discrimination via regularized basis expansions. Commun Stat Theory Methods 38:2944–2957
Cardot H, Sarda P (2005) Estimation in generalized linear models for functional data via penalized likelihood. J Multivar Anal 92:24–41
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Hall P, Poskitt D, Presnell B (2001) A functional data-analytic approach to signal discrimination. Technometrics 43:1–9
Kawano S, Konishi S (2009) Nonlinear logistic discrimination via regularized Gaussian basis expansions. Commun Stat Simul Comput 38:1414–1425
Konishi S, Kitagawa G (1996) Generalised information criteria in model selection. Biometrika 83:875–890
Konishi S, Ando T, Imoto S (2004) Bayesian information criteria and smoothing parameter selection in radial basis function networks. Biometrika 91:27–43
Leng X, Müller HG (2006) Classification using functional data analysis for temporal gene expression data. Bioinformatics 22:68–76
Lian H (2007) Nonlinear functional models for functional responses in reproducing kernel Hilbert spaces. Canad J Stat 35:597–606
McCullagh P, Nelder JA (1989) Generalized linear models. Chapman & Hall, London
Mendelson S (2002) Learnability in Hilbert spaces with reproducing kernels. J Complexity 18:152–170
Müller HG (2005) Functional modelling and classification of longitudinal data. Scand J Stat 32:223–240
Müller HG, Stadtmüller U (2005) Generalized functional linear models. Ann Stat 33:774–805
Picard R, Cook D (1984) Cross-validation of regression models. J Am Stat Assoc 79:575–583
Preda C (2007) Regression models for functional data by reproducing kernel Hilbert spaces methods. J Stat Plan Inference 137:829–840
Rachdi M, Vieu P (2007) Nonparametric regression for functional data: automatic smoothing parameter selection. J Stat Plan Inference 137:2784–2801
Ramsay J, Silverman BW (2005) Functional data analysis. Springer series in statistics. Springer, New York
Rincón M, Ruiz-Medina MD (2012) Local wavelet-vaguelette-based functional classification of gene expression data. Biometrical J 54:75–93
Ruiz-Medina MD, Salmerón R (2010) Functional maximum-likelihood estimation of ARH(p) models. Stoch Environ Res Risk Assess 24:131–146
Schölkopf B, Smola A (2002) Learning with kernels. MIT Press, Cambridge
Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B (1998) Comprehensive identification of cell-cycle regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell 9:3273–3297
Triebel H (1978) Interpolation theory, function spaces, differential operators. North-Holland, Amsterdam
Vakhania NN, Tarieladze VI, Chobanyan SA (1987) Probability distributions on Banach spaces. D. Reidel Publishing Company, Dordrecht
Vidakovic B (2006) Statistical modelling by wavelets. Wiley, New York
Yang JY, Peng ZL, Yu Z, Zhang R-J, Anh V, Wang D (2009) Prediction of protein structural classes by recurrence quantification analysis based on chaos game representation. J Theor Biol 257:618–626
Yang J-Y, Yu Z, Anh V (2009) Clustering structures of large proteins using multifractal analyses based on a 6-letter model and hydrophobicity scale of amino acids. Chaos Solitons Fractals 40:607–620
