SFB 823 - Eldorado - TU Dortmund

1 downloads 0 Views 516KB Size Report
SFB. 823. Numerical algebraic fan of a design for statistical model building ... experimental settings is expressed as solution of a system of polynomials. Thereby.
SFB 823

Numerical algebraic fan of a design for statistical model building

Discussion Paper

Nikolaus Rudak, Sonja Kuhnt, Eva Riccomagno

Nr. 4/2013

Contents 1 Introduction

3

2 Direct, indirect and composite model

4

2.1

Three model strategies . . . . . . . . . . . . . . . . . . . . . . . . . .

4

2.2

Main effect linear models from Y to Z . . . . . . . . . . . . . . . . .

5

3 Computational polynomial algebra and designed experiments

8

3.1

Term ordering, matrices and fans . . . . . . . . . . . . . . . . . . . . 10

3.2

Algebraic analysis of the direct case . . . . . . . . . . . . . . . . . . . 14

3.3

Motivations for a numerical fan of a design . . . . . . . . . . . . . . . 14

3.4

Numerical BM-algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.5

Numerical fan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4 Application: Thermal Spraying Process

19

4.1

Statistical analysis of the direct case . . . . . . . . . . . . . . . . . . 20

4.2

Possible designs for the second stage . . . . . . . . . . . . . . . . . . 21

4.3

Approximated vanishing ideals for the Y -designs . . . . . . . . . . . . 23

4.4

Computation of the Algebraic Fan . . . . . . . . . . . . . . . . . . . . 27

4.5

Composite vs Direct Approach . . . . . . . . . . . . . . . . . . . . . . 33

5 Conclusion

35

6 Appendix

39

6.1

Direct Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6.2

First Stage Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

1 Introduction In this article we develop methods for the analysis of non-standard experimental designs by using techniques from algebraic statistics. Our work is motivated by a thermal spraying process used to produce a particle coating on a surface, e.g. for wear protection or durable medical instruments. In this application non-standard designs occur as intermediate results from initial standard designs in a two-stage production process. We investigate algebraic methods to derive better identifiable models with particular emphasis on the second stage of two-stage processes. Ideas from algebraic statistics are explored where the design as finite set of distinct experimental settings is expressed as solution of a system of polynomials. Thereby the design is identified by a polynomial ideal and features and properties of the ideal are explored and provide inside into the structures of models identifiable by the design [Pistone et al., 2001, Riccomagno, 2009]. Holliday et al. [1999] apply these ideas to a problem from the automotive industry with an incomplete standard factorial design, Bates et al. [2003] to the question of finding good polynomial metamodels for computer experiments. In our thermal spraying application, designs for the controllable process parameters are run and properties of particles in flight measured as intermediate responses. The final output describes the coating properties, which are very time-consuming and expensive to measure as the specimen has to be destroyed. It is desirable to predict coating properties either on the basis of process parameters and/or from particle properties. Rudak et al. [2012] provides a first comparison of different modeling approaches. There are still open questions: which models are identifiable with the different choices of input (process parameters, particle properties, or both)? Is it better to base the second model between particle and coating properties on estimated expected values or the observations themselves? The present article is a contribution in this direction. Especially in the second stage particle properties as input variables are observed values from the originally chosen design for the controllable factors. The resulting design on the particle property level can be tackled with algebraic statistics to determine identifiable models. However, it turns out that resulting models contain elements which are only identifiable due to small deviations of the design from more regular points, hence leading to unwanted unstable model

3

results. We tackle this problem with tools from algebraic statistics. Because of the fact that data in the second stage are very noisy, we extend existing theory by switching from symbolic, exact computations to numerical computations in the calculation of the design ideal and of its fan. Specifically, instead of polynomials whose solution are the design points, we identify a design with a set of polynomials which "almost vanish" at the design points using results and algorithms from Fassino [2010]. The paper is organized as follows. In Section 2 three different approaches towards the modeling of a final output in a two-stage process are introduced and compared. The algebraic treatment and reasoning is the same whatever the approach. Section 3 contains the theoretical background of algebraic statistics for experimental design, always exemplified for the special application. Section 4 is the case study itself.

2 Direct, indirect and composite model Aiming at a prediction model of the final response Z in a two stage model, we consider three different approaches where the prediction model is either based on the initial input X, the intermediate outcomes Y or a prediction Yˆ of them. After introducing the different approaches in general we discuss them in more detail for main effect models from Y to Z.

2.1 Three model strategies To fix notation assume X has q components, Y has p components, and Z has m components. Model building is based on an initial design Dx and we have observed values Dy and Dz . A first model building strategy, which we name direct model, assumes Z = h(X) + δ with E(δ|X) = E(δ) = 0 and given V ar(δ|X) = V ar(δ) and hence E(Z|X = x) = h(x). Our composite model is based on the assumptions that Z = g(f (X)) + η and Y = f (X) + , thus E(Z|X = x) = g(E(Y |X = x))

4

Composite E(Y|X)

Direct

X

Z

f (X) +  Indirect

Figure 1: Modeling strategies and the indirect model takes Z = g(f (X) + ) + η˜ and Y = f (X) + , hence E(Z|X = x) = E(g(f (X) + )|X = x).

(1)

We assume throughout E(|X) = E() = 0 and V ar(|X) = V ar() given E(η|Y ) = E(η|X) = 0 and V ar(η|Y ) = V ar(η|X) = V ar(η) given E(˜ η |Y ) = E(˜ η |X) = 0 and V ar(˜ η |Y ) = V ar(˜ η |X) = V ar(˜ η ) given. Figure 1 illustrates these three model approaches. If g is a linear function then Equation (1) becomes E(Z|X) = g(f (x)) by linearity of expectation and the indirect and composite model coincide.

2.2 Main effect linear models from Y to Z We next compare the different approaches on the model level for the special case of linear models and main effects in going from Y to Z. Note that we still allow models beyond main effects in the direct strategy as well as from X to Y for the other two strategies. Without loss of generality we set m = 1, hence Z ∈ R. Under these restrictions the direct model becomes Z = h(X) + δ

= |{z}

linear model

5

XzT γ ∗ + δ

with γ ∗ an unknown parameter vector and Xz a vector of monomials of the original X-variables, to model intercept, main effects, interactions, quadratic terms and so on, as required. Thereby it follows from the assumptions in Section 2.1 that E(Z|X = x) = XzT γ ∗ .

(2)

We next introduce a notation to represent polynomial models that will be expediα

ent in this section and later on. The symbol xα stands for the monomial xα1 1 . . . xq q where for i = 1, . . . , q, αi is a non negative integer number and α = (α1 , . . . , αq ) ∈ Zq≥0 . For example, the intercept is given by the zero vector (0, . . . , 0) = 0q , and a P main effect model by α∈L θα xα with L = {0q , (1, 0, . . . , 0), (01, 0, . . . , 0), . . . , (0, . . . , 0, 1)} and θα real numbers, while a generic linear model is of the form X

θα xα

α∈L

with θα ∈ Rq and L a finite subset of Zq≥0 . In this notation model (2) above becomes E(Z|X = x) = xTz γ ∗ =

X

γα∗ xα

α∈L

Note that the support of the model xz = [xα ]α∈L is identified with the exponents of the monomials in the model and the parameter vector is γ ∗ = [γ α ]α∈L . For the composite model when we assume a main effect linear model between Y and Z, equation (1) simplifies to E(Z|X = x) = E(γ0 + f (X)T γ + η|X = x) = γ0 + f (x)T γ

(3)

with γ0 ∈ R and γ ∈ Rp unknown parameters, for some suitable p ∈ Z≥1 . Similarly when g gives a main effect linear model the indirect model gives E(Z|X = x) = E(γ˜0 + (f (X) + )T γ˜ + η˜|X) = γ˜0 + f (x)T γ˜

(4)

From (3) and (4) we can conclude that the indirect and composite strategies are structurally the same if and only if γ = γ˜ and γ0 = γ˜0 . Next, we replace each component of f (x) in (3) and (4) by a multivariate linear model. So for i = 1, . . . , p let the i-th component of f be written as f (x)i =

X

xα βαi = xTy,i β i

α∈Li

6

where Li identifies the support vector xy,i = [xα ]α∈Li for the X to Yi regression model and β i = [βαi ]α∈Li gives the unknown parameter vector. Hence equation (3) becomes #T

" X

E(Z|X = x) = γ0 + f (x)T γ = γ0 +

xα βαi

α∈Li

γ i=1,...,p

p

= γ0 +

XX

xα βαi γi .

(5)

i=1 α∈Li

By assuming equality of E(Z|X = x) in all modeling approaches, from (2) and (5) we obtain an equality of polynomials X

γα∗ xα

= γ0 +

p X X

xα βαi γi

i=1 α∈Li

α∈L

This holds true if and only if coefficients of the same monomial on the left hand side and right hand side are equal. To expand on this we further assume that all X-to-Y models admit intercept, so that 0q ∈ Li for all i from 1 to p, and define L∗i = Li \ {0q }. The above become γ0∗q +

X

γα∗ xα = γ0 +

α∈L∗

p X

β0i q γi +

p X X

xα βαi γi

i=1 α∈Li

i=1

Equating coefficients of the intercept gives γ0∗q

= γ0 +

p X

β0i q γi

i=1

Similarly for each α ∈ L∗ we have γα∗

=

p X

βαi γi

i=1

where βαi is zero if α is not in Li . Finally for α 6∈ L∗ we have 0=

p X

βαi γi

i=1

The above can be intended as theoretical aliasing relationships among the parameters for the indirect/composite case and the direct case. For a generalization to a multivariate linear model with higher order terms for the Y -variables, further assumptions on the structure of the error terms are necessary.

7

Fitting above models to real data sets, e.g. by common estimating and model selection procedures, indicates that the obtained models have different monomials in the x variables. This prompts us to adopt new ways to compare the three approaches by algebraic statistics. Besides, in any approach it is of interest to know which models may be identified from the given design on the X, Y or Yˆ ’s. One aim is to find out if information is lost or models are missed by considering any of the three possible input types. For the model selection procedure the knowledge of possible maximal models is extremely useful as an all-subset selection is usually unfeasible.

3 Computational polynomial algebra and designed experiments A design or a set of observations can be seen as the zeros of polynomial equations. This simple observation is the entry key for algebraic geometry to the design and analysis of experiments. In the case study we analyse in Section 4 we consider a full factorial design with central point in four factors, Dx . In total we have 17 points at which four different outputs Y = (Y1 , . . . , Y4 ) are measured. The observed or the estimated values of Y at Dx are the input points for the next stage of the modeling process from Y to Z (see Figure 1). To start with we ignore the output, concentrate on the input and consider the two dimensional analogue of Dx . Example 1. The design Dx = {(±1, ±1), (0, 0)} is the solution set of     p1 = x31 − x1 = 0    p2 = x32 − x2 = 0      p3 = (x1 − x2 )(x1 + x2 ) = 0. From classical theory we know that only two saturated models, with the hierarchical (or order ideal) property, are identifiable by this design.1 The order ideal property states that any lower order term of an interaction term in the model is in 1

Here a model with support [f1 (x), . . . , fr (x)] is identified by the design with distinct points d1 , . . . , ds if the design/model matrix [fj (di )]i=1,...,s;j=1,...,r is full rank. It is saturated if the rank is r = s.

8

the model as well. Peixoto [1990] among many authors advocates the desirability of the hierarchical property for a statistical model. In practice the final model will have less terms than there are design points and, very often, its terms are chosen from a larger set satisfying the order ideal property. The two models for Example 1 are {1, x1 , x2 , x1 , x1 , x2 , x21 } and {1, x1 , x2 , x1 , x1 , x2 , x22 }. The corresponding design/model  1   1   1 X=  1   1  1

matrices coincide and are  x1 x2 x1 x2 x11 /x22   1 1 1 1    1 −1 −1 1    −1 1 −1 1    −1 −1 1 1   0 0 0 0

(6)

Clearly X is invertible and the two models are identifiable. In this example it is evident that there are no other saturated identifiable hierarchical models. These two models give the so-called statistical fan of Dx . Note that x21 and x22 cannot be part of the same models because they are aliased. Algebraically this follows from the fact that p3 = (x1 − x2 )(x1 + x2 ) = 0 is equivalent to x21 = x22 . Statistically this also means that both effects are not distinguishable by data observed from this design. The notion of a statistical fan goes back to [Pistone et al., 2001, Def. 35]. Definition 1. The statistical fan of a design is the set of hierarchical (support vectors for polynomial) models identified by the design with as many terms as distinct design points. Main properties of the statistical fan are: • it is finite; • each of its elements, called leaves, is formed by as many monomials as there are points in the design; • each leaf is an order ideal and hence it contains 1, the constant term; • the design/model matrix for each leaf is invertible.

9

In designs with a less regular structure than Dx , the statistical fan might not be as easy to determine as in Example 1. Many authors advocate the importance of hierarchical models (see e.g. Cox and Reid [2000], p.104). (Subsets of) fans provide lists of saturated hierarchical models each of which can be be input to a selection procedure for determination of a well-fitting parsimonious submodel. Furthermore, if we have different hierarchical models in the fan which differ only by a few terms this gives an indication of confounding within these terms. However, the space of hierarchical models is often too large for an exhaustive search of saturated and identifiable models by the design, namely of the statistical fan. Still it is useful to have a large selection of saturated hierarchical models from which to select a submodel. A systematic method to investigate at least an “interesting” part of that space is provided by algebraic methods. The obtained subset of the statistical fan is called the algebraic fan of a design, or of the design ideal. For the analysis of the relation between these two fans see Maruri-Aguilar [2007] and Section 3.1 below. The technical tool at the basis of the computation is a termordering on the set of monomials. The technique from computational commutative algebra that allows this, also provides a theory that develops the observation about aliasing written up before Definition 1 for general designs. The key notion is the design ideal discussed below in Subsection 3.1.

3.1 Term ordering, matrices and fans A good reference for this section is Cox et al. [1996]. A mathematical reference for the connection between matrices and term ordering is Robbiano [1985] and for the algebraic fan of a polynomial ideal see Mora and Robbiano [1988]. The set of polynomials in the variables x1 , . . . , xn and with real coefficients is indicated with R[x1 , . . . , xn ] and the set of monomials with T n . More generally instead of real coefficients we might consider coefficients in an algebraic field and in the applications we often have rational coefficients. Polynomials are a linear combination of monomials and in turn, monomials are special polynomials formed by just one power product with coefficient equal to one. Note that a monomial xα = xα1 1 . . . xαnn is represented by its exponent vector α = (α1 , . . . , αn ) ∈ Zn≥0 with αi non-negative integers for all i = 1, . . . , n. Hence ordering monomials correspond

10

to order non-negative integer vectors, more precisely a term order τ on T n is a wellorder relation on Zn≥0 . This can be extended to Zn but we do not need to consider this generalisation here. Definition 2. A term order τ is a total order on T n such that 1. 1 is the smallest term (i.e. 1