Knowledge structures and latent class models

About the connection between knowledge structures and latent class models
Martin Schrepp
29.04.2005

Correspondence address: Dr. Martin Schrepp, Schwetzinger Strasse 86, 68766 Hockenheim, Germany

Original article available under DOI: 10.1027/1614-2241.1.3.93 Methodology, Volume 1 © 2005 by Hogrefe. This article may not exactly replicate the final version published in Methodology. It is not the version of record and is therefore not suitable for citation.

Abstract
This paper establishes a connection between knowledge structures and latent class models. We show that knowledge structures can be interpreted as a special type of constrained latent class model. Latent class models offer a well-founded theoretical framework for investigating the connection between a given latent class model and observed data. By establishing a connection between latent class models and knowledge structures, we can use this framework in knowledge structure theory as well. We show that the connection to latent class models makes it possible to construct a knowledge structure from observed response patterns by exploratory data analysis. Other possible applications are the empirical comparison of hypothetical knowledge structures and the statistical test of a given knowledge structure.


Introduction
Knowledge structures and latent class models are two approaches to describing the response behavior of subjects that have gained considerable interest in psychology. Both approaches share a number of fundamental assumptions. A central assumption in both is that the population can be split into a finite number of groups. The members of each group are assumed to be indistinguishable concerning their response behavior, i.e. the groups are homogeneous with respect to this property. Such a group is called a latent class in latent class models and a knowledge state in knowledge structure theory. Latent class models describe the structure of the data by a set of latent classes together with a set of parameters which specify the frequencies of the latent classes in the population. In knowledge structure theory the set of possible knowledge states describes the structure of the knowledge domain. Both models assume that each subject is in exactly one latent class or knowledge state, respectively, at each point in time. In both cases the latent class or knowledge state is a hypothetical, not directly observable (latent) attribute of a subject.
The goal of this paper is to show that knowledge structures can be considered a special type of constrained latent class model. The transformation of a knowledge structure into a latent class model requires only some small and straightforward extensions of knowledge structure theory. The dependency between latent class models and knowledge structures has not been well investigated. The main reason for this lack of interest seems to be that latent class models are mainly seen as a method of data analysis, while this aspect traditionally plays only a minor role in knowledge structure theory.
In the past, the main focus of knowledge structure theory was to describe the possible knowledge states of subjects in a given domain and to use this description for an efficient knowledge diagnosis. The number of data patterns necessary to create a knowledge structure by data analysis, or to validate a given knowledge structure empirically, increases exponentially with the size of the problem set on which the knowledge structure is defined. Since realistic applications of knowledge structures in knowledge diagnosis operate on huge problem sets, an empirical construction or validation of a knowledge structure was considered impossible. Therefore, the theoretical investigation of the connection between a knowledge structure and observed data was widely ignored in the past¹. This situation has changed recently. First, it was shown that it is possible to use knowledge structures for the validation of psychological assumptions. Here it is often possible to validate the assumptions using only a small number of well-constructed problems. Second, knowledge structures have a natural connection to methods of Boolean analysis of data. Boolean analysis tries to uncover hidden deterministic dependencies between problems by exploratory data analysis. The result of such an analysis can then be described by a knowledge structure. For these two applications of knowledge structures the connection between a hypothetical knowledge structure and observed data naturally plays a very important role. But within knowledge structure theory, at the moment mainly heuristic approaches for the comparison between a knowledge structure and observed data are available. On the other hand, a well-developed methodology to answer the corresponding questions exists in latent class analysis. Thus, if we link knowledge structures to latent class models, we can profit from this methodology.

¹ An exception is an extension of knowledge structures described by Falmagne (1989), which will be discussed later in this paper.


Basic elements of knowledge structure theory
We now give a short introduction to knowledge structure theory (Doignon & Falmagne, 1985). For a detailed introduction see the books of Doignon and Falmagne (1998) or Albert and Lukas (1999). In knowledge structure theory a knowledge domain is represented as a finite set Q of problems. The subset K of problems from Q a subject is capable of solving is called the knowledge state of the subject². A set 𝒦 of subsets of Q is called a knowledge structure. A knowledge structure represents the set of all possible knowledge states. Assume, for example, that we have a set Q of 3 problems a, b, c. Assume that each subject who is able to solve a is also able to solve b and that each subject who is able to solve b is also able to solve c. Under these assumptions ∅, {c}, {b, c}, Q are all possible knowledge states, i.e. only these subsets of Q are compliant with the assumptions concerning the dependencies between the problems. Thus, the knowledge structure 𝒦 is given by {∅, {c}, {b, c}, Q}. A knowledge structure 𝒦 describes all possible knowledge states of subjects. Thus, we assume that for each K ∈ 𝒦 there is a subject who is able to solve exactly the problems in K and that for each K ∉ 𝒦 there is no such subject. Due to the influence of random errors, the knowledge state K of a subject can differ from the observed response pattern D of the subject. It is, for example, possible that a subject fails, due to a lack of concentration, a problem which he or she can in principle solve with the available domain knowledge. Thus, the knowledge state of a subject is an unobservable (latent) attribute of the subject. It describes which problems a subject can solve according to his or her actual domain knowledge. There are three methods to construct knowledge structures. The first method is querying experts. In this method an expert in the knowledge domain constructs the knowledge structure with the help of a formalized procedure.
Such querying procedures are described, for example, in Dowling (1993), Koppen and Doignon (1990), Koppen (1993), Kambouri, Koppen, Villano and Falmagne (1994), or Düntsch and Gediga (1996). The querying procedure presents the expert with a number of statements of the form "If all questions in the set P are answered incorrectly by a subject, then the question q can be assumed to be answered incorrectly." The expert can accept or reject such a statement. The accepted statements are then used to construct the knowledge structure. To minimize the number of statements the expert has to judge, an adaptive procedure is used. This procedure presents only those statements to the expert for which the expert's response cannot be inferred from his or her previous responses. Potential problems with such an adaptive procedure are described in Schrepp and Held (1995). A second method is to derive knowledge structures from psychological models of the problem solving behavior (Schrepp, 1995) or from assumptions concerning the skills necessary in the domain (see Albert, Schrepp & Held, 1994; Doignon, 1994; Korossy, 1996 or Schrepp, Held & Albert, 1999). The first of these two approaches requires a detailed psychological model of the problem solving processes in the knowledge domain Q. This model is then used to derive the set of all knowledge states which are compatible with the model. The second approach relies on an analysis of the skills or competencies which are necessary to solve the problems from Q. These skills are then linked to the problems by a skill assignment which defines for each problem the

² We describe knowledge structure theory using the example of problems which can be solved or failed by subjects. This is the typical application area of knowledge structures. But please note that knowledge structures can be used to describe the structure of arbitrary dichotomous data. Another possible application area is questionnaires where the items represent statements to which subjects can agree or disagree (see, for example, Wiley & Martin, 1999 or Schrepp, 2002).


subsets of skills which are necessary to solve the problem. The knowledge structure consists of all subsets of Q which are compatible with the assumptions from this skill assignment. The third method is an analysis of observed response patterns. This approach uses methods of Boolean data analysis. See, for example, van Leeuwe (1974), Flament (1976), Theuns (1994, 1998), or Schrepp (1999a, 1999b, 2003). These algorithms try to construct a number of logical formulas, for example a → b or a ∧ b → c, from a data set. These formulas determine the knowledge structure as the set of all subsets of Q which are compatible with all formulas. Knowledge structures can be used in two different areas. The most prominent application of knowledge structures is adaptive testing. Since a knowledge structure in general contains only a small number of elements of the power set of Q, it can be used to infer the answers of subjects to particular problems from the answers they have already given. This property can be used to define an adaptive procedure for the assessment of knowledge. See Falmagne and Doignon (1988a, 1988b) for details. Knowledge structures which are derived from psychological models of problem solving or from skill assignments are often used for an empirical test of the underlying psychological assumptions. Here the knowledge structure 𝒦 which was derived from the model is compared to a set 𝒟 of observed response patterns. The extent to which 𝒦 is able to explain the data is used as a test of the assumptions from which the knowledge structure 𝒦 was derived. If many response patterns are not contained in 𝒦 and differ in many problems from each of the states in 𝒦, then the underlying assumptions must be rejected. To make the connection to latent class models evident, we use at some points in this paper a special notation for a knowledge structure. We represent a knowledge state K by a mapping S: Q → {0,1}, where S(q) = 1 ⟺ q ∈ K. Thus, for a set Q of m problems a knowledge state is written as an m-tuple of 0's and 1's. A knowledge structure is simply a set of such m-tuples. The knowledge structure from our example can thus be written as {(0,0,0), (0,0,1), (0,1,1), (1,1,1)}.
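The m-tuple notation can be generated mechanically. The following sketch (illustrative, not from the paper; the function name and the prerequisite encoding are our own) enumerates all 0/1 tuples over the example set Q = {a, b, c} that respect the dependencies "solving a requires b" and "solving b requires c":

```python
from itertools import product

def knowledge_structure(m, prerequisites):
    """All 0/1 m-tuples K such that K[i] = 1 implies K[j] = 1
    for every pair (i, j) meaning 'solving problem i requires problem j'."""
    states = []
    for K in product((0, 1), repeat=m):
        if all(K[j] == 1 for i, j in prerequisites if K[i] == 1):
            states.append(K)
    return states

# Problems a, b, c are indexed 0, 1, 2; a requires b, and b requires c.
states = knowledge_structure(3, [(0, 1), (1, 2)])
print(states)  # → [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
```

The output reproduces exactly the four states {∅, {c}, {b, c}, Q} from the example above.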

Basic elements of latent class models
We now describe the basic concepts of latent class analysis. For a more detailed introduction into latent class analysis see, for example, the books of Goodman (1978), Clogg (1995), McCutcheon (1987), or Rost and Langeheine (1997). Assume that Q is a set of m dichotomous problems³ and that 𝒟 is a data set containing n response patterns for the problems in Q. The data set 𝒟 can be described by a function f: Pow(Q) → {0, 1, ...} which assigns to each S ⊆ Q the number of subjects who showed response pattern S. A latent class analysis of a given data set 𝒟 requires the following basic assumptions on the data:
- The population from which the data was taken can be split into c groups which are called latent classes. These classes are mutually exclusive and exhaustive, i.e. each subject of the population is in exactly one class.
- Within each latent class x each problem j has a specific solution probability πxj. Thus, each latent class x is described by an m-tuple (πx1, ..., πxm), where πxj ∈ [0,1] for each j = 1, ..., m. The interpretation of (πx1, ..., πxm) is: "A subject in latent class x answers problem j positively with probability πxj."
- Within each latent class x the responses to the m problems are independent (local independence).
- Each latent class x occurs with probability ρx in the population.
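The assumptions above describe a generative model. As a hedged illustration (the numbers and function name are our own, not from the paper), data satisfying them can be simulated by first drawing a latent class with probabilities ρx and then, by local independence, drawing each answer separately with probability πxj:

```python
import random

def draw_pattern(rho, pi, rng):
    # choose the latent class of the subject with probabilities rho
    x = rng.choices(range(len(rho)), weights=rho)[0]
    # given the class, answers are drawn independently (local independence)
    return tuple(1 if rng.random() < p else 0 for p in pi[x])

rng = random.Random(0)
rho = [0.6, 0.4]                                  # class probabilities
pi = [(0.9, 0.8, 0.1), (0.1, 0.2, 0.9)]          # per-class solution probabilities
sample = [draw_pattern(rho, pi, rng) for _ in range(1000)]
```

Each element of `sample` is one observed response pattern S; counting equal patterns yields the frequency function f.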

³ We introduce latent class analysis for the special case of dichotomous data, since our main interest is a comparison to knowledge structure theory. But please note that latent class analysis is not restricted to dichotomous data.


The result of a latent class analysis of a binary data set 𝒟 is thus given by:

ρ1 (π11, ..., π1m)
...
ρc (πc1, ..., πcm)

where c is the number of latent classes. A latent class analysis (for c latent classes) of a given data set 𝒟 thus requires an estimation of the c parameters ρ1, ..., ρc and the c · m parameters πij. These parameters can be estimated according to the maximum likelihood method (Goodman, 1974a, 1974b). Given a latent class model we can easily calculate for each S ⊆ Q the expected frequency fexp(S) of S under the assumption that the model is correct. For S ⊆ Q and a latent class (πx1, ..., πxm) we can calculate the probability px,S that a subject in latent class x will produce response pattern S by px,S = Π { γ(x, S, i) | i = 1, ..., m }, where γ(x, S, i) = πxi if S(i) = 1 and γ(x, S, i) = 1 − πxi otherwise. The value fexp(S) is then simply given by fexp(S) = n · Σ { ρx · px,S | x = 1, ..., c }. The fit of a latent class model to the data can be measured, for example, by the Chi-square statistic or by the likelihood-ratio statistic (L²-statistic) defined by:

L² = 2 · Σ { f(S) ln( f(S) / fexp(S) ) | S ⊆ Q }

with the usual convention 0 · ln(0) = 0. The L²-statistic has a theoretical Chi-square distribution with r − (c + 1) · m degrees of freedom, where r is the number of different observed response patterns, i.e. r = |{ S ⊆ Q | f(S) ≠ 0 }|. The estimation of the parameters ρ1, ..., ρc and πij for i = 1, ..., c and j = 1, ..., m requires that the number c of latent classes is known. But if we use latent class analysis in a purely exploratory context, this number should be determined by the analysis and is unknown beforehand. This problem is solved in latent class analysis by the following procedure. A latent class analysis is performed for c = 1, 2, .... Each analysis results in a latent class model Mc of the data with c latent classes. The higher the number c of latent classes, the better in general the fit of the model Mc to the data according to the L²-statistic.
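The formulas for px,S, fexp(S) and L² translate directly into code. A minimal sketch (helper names are our own; f is assumed to be a dict mapping observed patterns to counts):

```python
from itertools import product
from math import log, prod

def p_pattern(pi_x, S):
    # p_{x,S}: probability that a subject in class x produces pattern S
    return prod(p if s == 1 else 1 - p for p, s in zip(pi_x, S))

def f_exp(rho, pi, S, n):
    # expected frequency of pattern S among n subjects under the model
    return n * sum(r * p_pattern(pi_x, S) for r, pi_x in zip(rho, pi))

def likelihood_ratio(f, rho, pi, n, m):
    # L^2 = 2 * sum f(S) ln(f(S)/fexp(S)); patterns with f(S) = 0 contribute 0
    return 2 * sum(f[S] * log(f[S] / f_exp(rho, pi, S, n))
                   for S in product((0, 1), repeat=m) if f.get(S, 0) > 0)

# tiny made-up data set with two problems and a two-class model
f = {(0, 0): 50, (1, 0): 10, (1, 1): 40}
rho = [0.5, 0.5]
pi = [(0.9, 0.9), (0.1, 0.1)]
L2 = likelihood_ratio(f, rho, pi, n=100, m=2)
```

A smaller L² means a better fit; a perfect-fitting model would give L² = 0.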
One method to determine the number of latent classes c is to assess the fit of each model to the data set statistically. We then choose the model with the smallest possible number of latent classes which fits the data concerning the L²-statistic for a predefined test level. Another method is to use information statistics, like the Bayesian Information Criterion BIC (Schwarz, 1978; Raftery, 1985) or the Akaike Information Criterion AIC (Akaike, 1974), to assess the fit of the models Mc to the data⁴. These information statistics offer a trade-off between the fit of the model to the data according to the L²-statistic and the number of parameters of the model. The BIC statistic can be used to compare non-nested models. The model which shows the lowest BIC value is considered to be the best representation of the observed data, even if the model does not fit the data statistically. The BIC criterion strongly favors parsimonious models with fewer parameters. In some situations knowledge about constraints on the parameters of a latent class model is available. Such constraints can, for example, restrict the values of some parameters to a certain

⁴ AIC is defined by L² − 2 · df and BIC is defined by L² − df · log(n), where L² is the value of the likelihood-ratio statistic, df is the degrees of freedom of the model and n is the size of the data set.


fixed value or can restrict several parameters to have equal values. Such models are called constrained latent class models.
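The model-selection step described above can be sketched in a few lines. This is an illustration under the footnote's definitions (AIC = L² − 2 · df, BIC = L² − df · log(n)); the list of fitted models and all numbers are invented for the example:

```python
from math import log

def best_by_bic(fits, n):
    """fits: list of (c, L2, df) triples, one per fitted model Mc.
    Returns the class number c of the model with the lowest BIC."""
    return min(fits, key=lambda t: t[1] - t[2] * log(n))[0]

# hypothetical results for c = 1, 2, 3 classes (df shrinks as c grows)
fits = [(1, 120.0, 40), (2, 35.0, 33), (3, 30.0, 26)]
print(best_by_bic(fits, n=200))  # → 2
```

Here the two-class model wins: the three-class model fits slightly better by L² but is penalized for its extra parameters.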

Knowledge structures as constrained latent class models
We now describe a straightforward extension of knowledge structures which allows us to see them as constrained latent class models. A knowledge structure describes only the possible states of knowledge. To connect these possible states to observed data we have to specify the frequency of the states in the population under investigation and a model for random errors during the measurement process. The description of the frequencies of the states is trivial. Assume that 𝒦 is a knowledge structure on a problem set Q. Let h: 𝒦 → [0,1] be a function with Σ { h(K) | K ∈ 𝒦 } = 1. We interpret h(K) as the probability that a subject from the investigated population is in state K. Now we have to deal with the problem that we sometimes observe response patterns which are not contained in 𝒦. To explain this situation we assume that a subject in state K can, due to the influence of random errors, show a response pattern D which is not equal to K. Such random errors can result, for example, from a lack of concentration, loss of motivation during the test, or time pressure. In knowledge structure theory usually two types of random errors are distinguished. Assume that a subject is in knowledge state K. Then it is possible that the subject fails a problem q in K, i.e. q ∈ K and q ∉ D. This is called a careless error. It is also possible that the subject solves a problem q not contained in K, i.e. q ∉ K and q ∈ D. This type of error is called a lucky guess. There are in principle two possibilities to model such lucky guesses and careless errors. The first approach assumes that the probability of a careless error respectively lucky guess is the same for all problems. Thus, we have to specify a probability β for careless errors and a probability η for lucky guesses. This approach is used, for example, in Schrepp (1999b). It requires two additional parameters to describe the influence of random errors.
The second approach assumes that the probability of a careless error respectively lucky guess depends on the problem. Thus, we have to specify for each problem q a probability βq for a careless error in problem q and a probability ηq for a lucky guess in problem q. This approach is used, for example, in Falmagne, Koppen, Villano, Doignon and Johannesen (1990). It requires 2 · |Q| additional parameters to describe the influence of random errors. Which of these alternatives should be used depends on the problems in Q. If we can assume that all problems in Q are more or less homogeneous concerning the probability of random errors, we should clearly choose the first alternative, since it requires fewer free parameters. If we cannot make that assumption, then we have to use the second alternative. As an example assume that Q is a set of multiple-choice problems and that these problems have different numbers of answer categories. Since we can assume that the probability of a lucky guess in a multiple-choice problem is approximately given by one divided by the number of answer categories, we have in this case to use the second alternative. Since the first alternative is a special case of the second one, we restrict our formulation of the model in the following to the second case. For a description of the first case it is sufficient to replace the error probabilities βq and ηq in this description by β and η. Our generalized model thus consists of a knowledge structure 𝒦, probabilities βq, ηq for each q in Q and a frequency function h: 𝒦 → [0,1] with Σ { h(K) | K ∈ 𝒦 } = 1.
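Under this extended model, the probability of an observed pattern D is a mixture over the states, weighted by h(K). A minimal sketch (our own function names; the states, h, β and η values are invented for illustration):

```python
def p_response(D, K, beta, eta):
    # probability that a subject in state K shows response pattern D
    p = 1.0
    for d, k, b, e in zip(D, K, beta, eta):
        if k == 1:
            p *= (1 - b) if d == 1 else b       # mastered: failing = careless error
        else:
            p *= e if d == 1 else (1 - e)       # not mastered: solving = lucky guess
    return p

def p_pattern(D, states, h, beta, eta):
    # marginal probability of D: sum over all states, weighted by h(K)
    return sum(h[K] * p_response(D, K, beta, eta) for K in states)

states = [(0, 0), (0, 1), (1, 1)]
h = {(0, 0): 0.3, (0, 1): 0.4, (1, 1): 0.3}
beta = (0.1, 0.1)   # careless-error probabilities per problem
eta = (0.2, 0.2)    # lucky-guess probabilities per problem
print(p_pattern((1, 1), states, h, beta, eta))
```

Note that the pattern (1, 1) gets positive probability even though (1, 0) is not among the states: the error model explains patterns outside 𝒦.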


We can write a knowledge structure 𝒦 = {K1, ..., Kc} with the extensions described above as

h(K1) (π11, ..., π1m)
...
h(Kc) (πc1, ..., πcm)

where ij = 1-j if Ki(j) = 1 and ij = j if Ki(j) = 0. This shows directly that we can interpret a knowledge structure with this parameterization as a special type of constrained latent class model. But we have to mention here that the interpretation of a knowledge structure is quite different from the standard interpretation of a latent class model. A knowledge structure is a set of states  = { K1, ..., Kc } which describe all possible states of knowledge for a knowledge domain Q. It is a deterministic model of the knowledge of subjects which is enhanced by an error model to explain deviations between the observed data and the model. Thus, if we write for given error parameters i and i the knowledge structure  as a latent class model (see above) then each ij is either close to 1 (if subjects in this class master problem i with the exception of careless errors) or close to 0 (if subjects in this latent class do not master problem i with the exception of lucky guesses). For example the latent class model 0.6 (0.3, 0.7, 0.5) 0.4 (0.2, 0.8, 0.6)

can hardly be interpreted as a knowledge structure with two states, since error probabilities higher than 0.4 are of course not compatible with the interpretation of a deterministic state. Again it depends on the problems how big acceptable error probabilities are. Assume that Q consists of a number of mathematical problems which require that certain equations be solved. If these problems are presented in the form of multiple-choice problems with 4 answer categories, then we expect the probabilities for lucky guesses somewhere around 0.25. If the problems are presented in an open form (the subjects have to write down the solution directly), then it is unlikely that a subject gets the correct solution by guessing or due to a wrong procedure. Thus, the probabilities for lucky guesses will be close to 0. Therefore, we have to add two additional restrictions depending on the type of the problems in Q. We have to specify for each problem q in Q two intervals [βl, βu]q and [ηl, ηu]q which specify the upper (βu and ηu) and lower limits (βl and ηl) for a careless error respectively lucky guess. These intervals define the range in which the parameters βq and ηq must be located to be compatible with the interpretation of random response errors for problem q. In most cases 0 will be a natural choice for βl and ηl. But, for example, for multiple-choice items with 4 answer alternatives it makes sense to set ηl to 0.25, i.e. to the chance to select the correct answer by guessing. We define [βl, βu]q and [ηl, ηu]q to make sure that βq and ηq can be interpreted as random response errors. This is necessary to guarantee that the latent class model can be interpreted as a knowledge structure. Thus, βu and ηu should in general be set to the maximal value which is still interpretable as a careless error or lucky guess probability. The concrete value which should be used for βu and ηu obviously depends on the type of problems.
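The mapping from an extended knowledge structure to the parameter matrix of the constrained latent class model is mechanical. A sketch (illustrative names and numbers; the rule πij = 1 − βj for mastered problems and πij = ηj otherwise is the one stated above):

```python
def to_latent_class_model(states, h, beta, eta):
    # each state K becomes a latent class with weight h(K) and
    # item probabilities 1 - beta[j] (mastered) or eta[j] (not mastered)
    rows = []
    for K in states:
        pis = tuple((1 - beta[j]) if K[j] == 1 else eta[j] for j in range(len(K)))
        rows.append((h[K], pis))
    return rows

states = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
h = {(0, 0, 0): 0.2, (0, 0, 1): 0.3, (0, 1, 1): 0.3, (1, 1, 1): 0.2}
beta = (0.05, 0.05, 0.1)   # careless-error probabilities per problem
eta = (0.1, 0.1, 0.2)      # lucky-guess probabilities per problem
model = to_latent_class_model(states, h, beta, eta)
```

Every column of the resulting matrix contains only the two values 1 − βj and ηj, which is exactly the constraint that distinguishes these models from unconstrained latent class models.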
Another important difference between latent class models and knowledge structures is the number of latent classes necessary to explain the data. A knowledge structure is a quasi-deterministic model. Thus, it is not unusual that the number of states is quite high compared to the size of the power set of Q. It would, for example, also be a valid description of a knowledge domain if all elements of the

power set of Q were states. This would simply mean that there are no deterministic dependencies between the problems in Q and that thus each element of the power set describes a possible state of knowledge of a subject. For a latent class model, in contrast, it is important that the set of latent classes is small compared to the power set of Q. The number of parameters in the model must be smaller than the degrees of freedom in the data to ensure that the model is still meaningful. Given the arguments above, a latent class model

ρ1 (π11, ..., π1m)
...
ρc (πc1, ..., πcm)

can be interpreted as an extended knowledge structure if the following constraints hold:
- For each j = 1, ..., m there are constants βj and ηj such that πij = 1 − βj or πij = ηj for all i = 1, ..., c (thus only the two values 1 − βj and ηj appear in column j of the matrix).
- For each j = 1, ..., m the constants βj and ηj are contained in an interval [βjl, βju] respectively [ηjl, ηju]. Here βjl and ηjl describe the lower limits for a careless error respectively lucky guess in problem j, and βju and ηju describe the upper limits.
Our constrained latent class model has 2 · m free parameters βq, ηq for the careless error and lucky guess probabilities of the m problems and, per state K, one free parameter h(K) for the frequency of the state. If we estimate the model parameters from data we have to make sure that the knowledge structure does not have too many states, to keep the model identifiable. Let r again be the number of different observed response patterns. For a knowledge structure with c states we have r − (2 · m + c + 1) degrees of freedom. Thus, the model is only meaningful if the knowledge structure contains fewer than r − 2 · m states.
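The two constraints can be checked programmatically for a given parameter matrix. A sketch (our own function and variable names; the interval bounds are illustrative choices, not prescriptions from the paper):

```python
def interpretable_as_knowledge_structure(pi, beta_bounds, eta_bounds):
    # pi: list of per-class item-probability tuples
    # beta_bounds / eta_bounds: per-problem (lower, upper) intervals
    m = len(pi[0])
    for j in range(m):
        column = sorted({row[j] for row in pi})
        if len(column) > 2:
            return False              # more than two distinct values in column j
        eta_j = column[0]             # smaller value is the lucky-guess rate
        beta_j = 1 - column[-1]       # larger value is 1 - careless-error rate
        if not (beta_bounds[j][0] <= beta_j <= beta_bounds[j][1]):
            return False
        if not (eta_bounds[j][0] <= eta_j <= eta_bounds[j][1]):
            return False
    return True

beta_bounds = [(0.0, 0.2)] * 3
eta_bounds = [(0.0, 0.25)] * 3
ks_model = [(0.1, 0.1, 0.2), (0.1, 0.1, 0.9), (0.1, 0.9, 0.9), (0.95, 0.9, 0.9)]
bad_model = [(0.3, 0.7, 0.5), (0.2, 0.8, 0.6)]   # the counter-example from the text
print(interpretable_as_knowledge_structure(ks_model, beta_bounds, eta_bounds))   # → True
print(interpretable_as_knowledge_structure(bad_model, beta_bounds, eta_bounds))  # → False
```

The second matrix is rejected because, for example, the first column would imply a careless-error probability of 0.7.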

Possible applications
The parameters of the extended knowledge structure can be estimated, as in latent class theory, by an optimization procedure which respects the restrictions stated above. This directly shows that we can construct a knowledge structure from data using the standard techniques of latent class models. As described in the introduction to latent class models, this can be done by performing the estimation for c = 2, 3, ... latent classes respectively knowledge states. Each estimation results in a latent class model Mc with c latent classes respectively states. We can then choose the model which shows the best fit according to the BIC criterion. This procedure of comparing knowledge structures via the BIC value is also used in a method of Wiley and Martin (1999) which will be discussed in more detail later in this paper. In addition, we can use our extension of knowledge structures to compare several hypothetical knowledge structures empirically. To do so, we simply estimate for each of these knowledge structures the parameters βq and ηq for each q in Q and h(K) for each knowledge state K. The knowledge structure which fits the data best can then be determined as the one for which the corresponding latent class model shows the smallest BIC value. Another possible application is the validation of a given knowledge structure 𝒦. Assume that 𝒦 = {K1, ..., Kc} is a knowledge structure constructed from a psychological model of problem solving. We then transform this knowledge structure into a latent class model and estimate the parameters h(Ki) for i = 1, ..., c, and βq respectively ηq for each q in Q. The fit of the model can then be assessed using the L²-statistic.

Estimation of the parameters
We use a simulated annealing algorithm (see, for example, Metropolis, Rosenbluth, Rosenbluth, Teller & Teller, 1953; Cerny, 1985 or Kirkpatrick, Gelatt & Vecchi, 1983) for the estimation of the model parameters. Simulated annealing is a technique to find solutions to an optimization problem by trying random variations of a starting solution. Assume that we have a model with k parameters p1, ..., pk and that V(p1, ..., pk) is a real-valued function which describes the quality or fit of the model. The goal is to determine a parameter combination with an optimal fit. Assume that a higher V(p1, ..., pk) indicates a better fit. A simulated annealing algorithm usually consists of two steps. The first step sets (random) starting values for the model parameters which should be optimized. The second step contains a loop which repeats the following three sub-steps:
1. The actual parameter values p1, ..., pk are changed randomly to new values p'1, ..., p'k.
2. V(p'1, ..., p'k) is computed.
3. If V(p'1, ..., p'k) > V(p1, ..., pk), then p1, ..., pk are replaced by p'1, ..., p'k.
After sub-step 3 the computation proceeds with sub-step 1. This loop is repeated until a stopping criterion is reached, for example, until no improvement is observed for a given number of repetitions. The influence of the random changes applied in sub-step 1 decreases as the computation proceeds⁵. Thus, in the beginning the parameter values can change heavily, and towards the end of the loop only small random changes to the actual parameter values are possible. This allows the procedure to jump out of local minima early in the computation. To reduce the chance of getting trapped in a local minimum, this procedure is usually repeated several times with different starting values. For our specific optimization problem we have the difficulty that not only the parameters βq, ηq, h(K) but also the knowledge structure 𝒦 is subject to change.
The application of the simulated annealing principle to our optimization problem (for c knowledge states) looks like this. The selection of the starting values consists of the following two steps:
- A knowledge structure with c states is determined randomly. To do so, we choose c different states from the set of observed response patterns. Each response pattern S is chosen with a probability f(S)/n reflecting its frequency in the data.
- The starting values for the parameters βq, ηq, h(K) for K ∈ 𝒦 are determined randomly in the corresponding intervals.
The optimization loop repeats the following three sub-steps:
1. The parameter values βq, ηq, h(K) are changed randomly inside the corresponding intervals⁶. The amount of the random change slowly decreases as computation proceeds.
2. The expected frequencies fexp(S) for all S ⊆ Q are computed from the model with the changed parameter values. These expected frequencies are then used to compute the fit (likelihood) of the model with the changed parameters to the data.
3. If the extended knowledge structure with the changed parameter values shows a better fit than the extended knowledge structure with the old parameter values, then the old parameter values are replaced by the new values.

⁵ Some simulated annealing procedures use a slightly different approach. Here the influence of the random changes is held constant, but a worse result is accepted with a probability which decreases as the computation proceeds.
⁶ The values h(K) are readjusted after the change so that the condition Σ { h(K) | K ∈ 𝒦 } = 1 is always true.


Then the computation proceeds with sub-step 1, until no improvement of the model fit is observed for a given number of repetitions. This procedure is repeated with a huge number⁷ of random starting values for the knowledge structure and the parameter values to reduce the chance that the algorithm gets trapped in a local minimum and to make sure that a huge number of different knowledge structures are tried in the first step. The result of the algorithm is the best solution over all repetitions⁸.
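The procedure can be compressed into a short sketch. This is an illustration, not the author's implementation: the interval bounds, step sizes, stopping rule and all names are our own simplifications, and, as in the described algorithm, the randomly drawn states are not changed inside the optimization loop:

```python
import random
from math import log

def p_resp(D, K, beta, eta):
    # response probability under the careless-error / lucky-guess model
    p = 1.0
    for d, k, b, e in zip(D, K, beta, eta):
        p *= ((1 - b) if d else b) if k else (e if d else (1 - e))
    return p

def log_likelihood(f, states, h, beta, eta):
    # log-likelihood of the observed pattern counts f under the model
    ll = 0.0
    for D, count in f.items():
        pD = sum(hk * p_resp(D, K, beta, eta) for hk, K in zip(h, states))
        ll += count * log(max(pD, 1e-300))
    return ll

def estimate(f, c, m, repetitions=50, steps=200, seed=1):
    # assumes at least c distinct observed response patterns
    rng = random.Random(seed)
    patterns, weights = list(f), list(f.values())
    best = None
    for _ in range(repetitions):
        # starting solution: c distinct observed patterns, drawn by frequency
        states = []
        while len(states) < c:
            S = rng.choices(patterns, weights=weights)[0]
            if S not in states:
                states.append(S)
        beta = [rng.uniform(0.0, 0.2) for _ in range(m)]
        eta = [rng.uniform(0.0, 0.2) for _ in range(m)]
        h = [1.0 / c] * c
        fit = log_likelihood(f, states, h, beta, eta)
        for step in range(steps):
            scale = 0.1 * (1 - step / steps)   # shrinking random changes
            nb = [min(0.2, max(0.0, b + rng.uniform(-scale, scale))) for b in beta]
            ne = [min(0.2, max(0.0, e + rng.uniform(-scale, scale))) for e in eta]
            nh = [max(1e-6, x + rng.uniform(-scale, scale)) for x in h]
            total = sum(nh)
            nh = [x / total for x in nh]       # keep the sum of h(K) at 1 (footnote 6)
            new_fit = log_likelihood(f, states, nh, nb, ne)
            if new_fit > fit:                  # keep improvements only
                beta, eta, h, fit = nb, ne, nh, new_fit
        if best is None or fit > best[0]:
            best = (fit, states, h, beta, eta)
    return best

f = {(1, 1, 1): 40, (0, 1, 1): 25, (0, 0, 1): 20, (0, 0, 0): 15}
fit, states, h, beta, eta = estimate(f, c=2, m=3)
```

The outer loop over repetitions corresponds to the repeated random starting values described above; the inner loop implements sub-steps 1 to 3.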

Relations to other models
Our approach of interpreting a knowledge structure as a latent class model is in some sense a generalization of the Proctor model (Proctor, 1970). The Proctor model is a probabilistic version of Guttman scaling which can be formulated as a constrained latent class model. Assume that we have m problems which can be ordered into a linear sequence concerning their difficulty. The Proctor model consists of m+1 latent classes which represent the m+1 Guttman types. The model assumes that there are error probabilities which describe the probability of an incorrect response given the scale type a subject belongs to. The simplest version of the model assumes that these error probabilities are identical across the problems and scale types. A more general version of the model assumes that the error probabilities are problem dependent but independent of the scale type. Falmagne (1989) and Falmagne, Koppen, Villano, Doignon and Johannesen (1990) describe a representation of a knowledge structure as a constrained latent class model. In some sense our model can be seen as a special case of their general model. The original model described by Falmagne et al. (1990) had to face the problem that the number of latent classes (respectively knowledge states) was too large to be used in practice. To handle this problem the authors introduced several additional assumptions concerning the knowledge structure in order to restrict the number of free parameters in the model. First, it is assumed that the knowledge structure is a knowledge space, i.e. that it contains ∅ and Q and is closed under union. In addition it is assumed that the knowledge space 𝒦 is well-graded. Well-graded means that each state in 𝒦 is contained in at least one chain ∅ ⊂ K1 ⊂ ... ⊂ Km−1 ⊂ Q of m+1 states in which each pair of states Ki, Ki+1 differs in exactly one problem. Such a chain is called a learning path.
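Both properties, closure under union and well-gradedness, are decidable by direct search for small structures. A sketch (our own helper names; the learning-path check follows the definition just given, testing whether a state can be extended one problem at a time down to ∅ and up to Q):

```python
def is_knowledge_space(states, Q):
    # contains the empty set and Q, and is closed under union
    sets = {frozenset(K) for K in states}
    if frozenset() not in sets or frozenset(Q) not in sets:
        return False
    return all((a | b) in sets for a in sets for b in sets)

def on_learning_path(K, sets, Q):
    # K lies on a chain from the empty set to Q moving one problem at a time
    def reaches(current, target_size, step):
        if len(current) == target_size:
            return True
        if step == 1:
            moves = (current | {q} for q in Q - current)   # add one problem
        else:
            moves = (current - {q} for q in current)       # remove one problem
        return any(m in sets and reaches(m, target_size, step) for m in moves)
    return reaches(K, len(Q), 1) and reaches(K, 0, -1)

def well_graded(states, Q):
    sets = {frozenset(K) for K in states}
    return all(on_learning_path(K, sets, Q) for K in sets)

Q = frozenset({'a', 'b', 'c'})
chain = [set(), {'c'}, {'b', 'c'}, {'a', 'b', 'c'}]
gappy = [set(), {'b', 'c'}, {'a', 'b', 'c'}]
print(well_graded(chain, Q))  # → True
print(well_graded(gappy, Q))  # → False
```

The second structure is union-closed but not well-graded: there is no way to reach {b, c} from ∅ in single-problem steps.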
The basic idea is now to assume that all subjects start in state ∅ and proceed along a learning path. The chance to switch from one state in the learning path to the next is described by a learning parameter τ. The model contains in addition parameters for careless errors and lucky guesses per problem. Thus, the state of each subject at each point in time can be seen as a latent state which results from a mixture of the states in the learning path of the subject, governed by the learning parameter τ, and from the influence of the error parameters. The additional assumptions concerning the closure of the knowledge structure and the existence of learning paths can to some extent be motivated as long as the items in Q are problems

[7] In the empirical examples described later in the paper we use 10000 repetitions. We need this large number of repetitions since the knowledge structure determined in the first step of the algorithm is not changed inside the optimization loop. A possible alternative implementation of the simulated annealing procedure would be to allow random changes of the states inside the optimization loop. This alternative implementation would require fewer repetitions, but would on the other hand increase the number of necessary steps inside the optimization loop.

[8] Please note that the described simulated annealing algorithm is just one of several possible methods to estimate the model parameters. Especially for bigger problem sets the performance of this algorithm can be problematic, since a huge number of repetitions with different starting values is required to produce valid results. More research is necessary to investigate the quality of the estimated solution of the simulated annealing algorithm and to determine which estimation method provides the best results.


which can be solved or failed by subjects (this is of course the original domain for which the knowledge structure approach was developed). But when we try to apply knowledge structures to the description of response behavior in questionnaires these assumptions can hardly be justified. In contrast to the model of Falmagne et al. (1990) our approach does not require any additional assumptions concerning the closure or other internal properties of the knowledge structure.

Our method is very closely related to a method presented by Wiley and Martin (1999) and Martin and Wiley (2000). They analyze data from questionnaires with special latent class models which can be interpreted as (quasi-ordinal) knowledge structures. Their method starts from a partial order on the set of questionnaire items. This partial order corresponds to a knowledge structure which contains ∅ and Q and is closed under union and intersection (Birkhoff, 1937), i.e. a quasi-ordinal knowledge structure. The states in this knowledge structure are called belief states, since the items represent, in their application context, beliefs which a subject can hold or not hold. The belief states are linked to latent classes. The connection between the deterministic belief states and the corresponding latent classes is made by incorporating an error parameter per item. This error parameter describes the probability that a subject in a belief state does not answer the item according to his or her belief, for example due to a misunderstanding of the item formulation. Thus, the model does not distinguish between lucky guesses and careless errors. The values for the error probabilities of the items are not restricted directly [9]. Wiley and Martin (1999) describe in addition how it is possible to detect the correct belief states by a latent class analysis and how different models, i.e. partial orders on the item set, can be compared by methods of latent class analysis.
They also use the BIC to compare different models. The main difference between our approach and the method described in Wiley and Martin (1999) is that they rely to some extent on the assumption that the knowledge structure can be derived from a quasi-order on the items. The problem with this assumption is that the resulting set of belief states can be too big, because it may contain belief states which have no real empirical evidence and are only included to satisfy the closure assumption. Another important difference is that we require that the researcher restricts the values for the error parameters in advance. Thus, the method will always produce results which can be interpreted as a representation of a deterministic set of states. In the procedure of Wiley and Martin (1999) latent class models can result from the analysis which cannot be interpreted as knowledge structures, since the values for the error parameters are too high.
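A model comparison along these lines could, for instance, use the BIC as follows. The parameter count shown assumes that the free parameters are the independent class frequencies plus the error probabilities; this matches the model sketched in this paper but is stated here only as an illustration.

```python
import math

def bic(log_lik, k, n_obs):
    """Bayesian Information Criterion: -2 log L + k ln(N).
    Smaller values indicate a better trade-off between fit and complexity."""
    return -2.0 * log_lik + k * math.log(n_obs)

def n_free_params(n_states, n_items, item_dependent_errors=True):
    """Free parameters of the constrained latent class model sketched here:
    n_states - 1 independent class frequencies plus the error probabilities
    (careless error and lucky guess), either one pair per item or one
    shared pair for all items."""
    return (n_states - 1) + (2 * n_items if item_dependent_errors else 2)
```

Given two candidate structures fitted to the same data, the one with the smaller BIC would be preferred; e.g. `bic(ll1, n_free_params(20, 6), n)` versus `bic(ll2, n_free_params(17, 6), n)`.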

Example 1

In this example we analyse a data set which is based on the June 1987 New York State Regents Competency Test in Mathematics. This is a 60-problem test covering skills from high-school mathematics. The data set [10] analysed in this example consists of 6 of the 60 problems from the original test. It contains data from 60000 students who participated in the test. This data set was already analysed in the context of knowledge structure theory by Villano (1991) and Theuns (1998). These publications also contain a detailed description of the data set.


[9] Wiley and Martin (1999) interpret these parameters clearly as error parameters which are properties of the items and not properties of the model. Thus, they assume that the error parameters show small values. For example, on page 131 we find the statement: "We consider probabilities of correct classifications of under 0.75 as suspect; more importantly, we believe that the degree of misclassification should be related to factors that are known to produce variations in survey responses, such as interviewer effects or cognitive complexity."

[10] The data set used for the analysis was taken from Theuns (1998), page 189, Table 11.2.


The 6 problems in the problem set are open-ended mathematical problems, which are given by:

Problem a: Add: 546 and 1248 and 26
Problem b: Multiply: 507 with 56
Problem c: Subtract: 1.25 from 4.5
Problem d: Divide: 6.8 by 7.48
Problem e: A class of 117 students is planning a bus trip. What is the least number of busses that must be reserved if each bus carries a maximum of 47 passengers?
Problem f: In a triangle ABC the measure of angle A is 30° and the measure of angle B is 50°. What is the number of degrees in the measure of angle C?

The solution probabilities for the problems a to f are 90%, 79%, 56%, 64%, 47%, and 47%. We analyse the data now with our method. The problems a to f are quite different in their structure. Thus, we cannot assume that the probabilities of lucky guesses and careless errors are the same for all problems. For example, the problems e and f require a transformation of the textual information into a proper mathematical procedure, while this is not necessary for the problems a, b, c, and d. Since the problems are open-ended mathematical problems the chances for careless errors and lucky guesses should not be too high. Thus, we restrict the values for βq and ηq to the interval [0, 0.1]. Using the BIC criterion the best fitting solution was found for 20 latent classes respectively states. This knowledge structure is given by: 𝒦 =

{∅, {a}, {a, b}, {a, d}, {a, e}, {a, b, c}, {a, b, d}, {a, b, f}, {a, b, e}, {a, b, c, d}, {a, b, c, f}, {a, b, e, f}, {a, b, c, e}, {a, b, d, e}, {a, b, d, f}, {a, b, c, e, f}, {a, b, d, e, f}, {a, b, c, d, e}, {a, b, c, d, f}, Q}

The calculated error probabilities are βa = 0.087, ηa = 0.028, βb = 0.1, ηb = 0.073, βc = 0.029, ηc = 0.062, βd = 0.023, ηd = 0.059, βe = 0.089, ηe = 0.083, βf = 0.06, ηf = 0.074. So the error probabilities are in fact quite different for the different problems. When we compare this solution with the knowledge structure determined in Theuns (1998) with the help of Boolean analysis, we see that it contains all 8 states from that solution. Each state in 𝒦, with the exception of ∅, contains problem a. Thus, if a subject solves one of the problems b to f then this subject also solves a, i.e. the implications b, c, d, e, f → a hold. Other implications which can be derived from 𝒦 are c → b and f → b.
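These implications can be read off the state set mechanically: an implication q → p holds iff every state containing q also contains p. A small check over the structure 𝒦 reported above (the helper function and its name are ours):

```python
# The 20 states of the best fitting structure from this example.
K = [set(), {'a'}, {'a', 'b'}, {'a', 'd'}, {'a', 'e'}, {'a', 'b', 'c'},
     {'a', 'b', 'd'}, {'a', 'b', 'f'}, {'a', 'b', 'e'}, {'a', 'b', 'c', 'd'},
     {'a', 'b', 'c', 'f'}, {'a', 'b', 'e', 'f'}, {'a', 'b', 'c', 'e'},
     {'a', 'b', 'd', 'e'}, {'a', 'b', 'd', 'f'}, {'a', 'b', 'c', 'e', 'f'},
     {'a', 'b', 'd', 'e', 'f'}, {'a', 'b', 'c', 'd', 'e'},
     {'a', 'b', 'c', 'd', 'f'}, {'a', 'b', 'c', 'd', 'e', 'f'}]

def implies(structure, q, p):
    """q -> p holds iff every state containing q also contains p."""
    return all(p in state for state in structure if q in state)

# The implications reported in the text all hold in this structure:
assert all(implies(K, q, 'a') for q in 'bcdef')
assert implies(K, 'c', 'b') and implies(K, 'f', 'b')
```

Note that, for example, d → b does not hold, since the state {a, d} contains d but not b.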

Example 2

In our second example we show an application of the method to an exploratory analysis of data from a questionnaire. We use our analysis method to search for states in data sets from the International Social Survey Programme (ISSP) for the year 1995. The ISSP is a continuing annual program of cross-national collaboration on surveys covering topics important for social science research. There are 29 nations participating in the program. Each year the program conducts one survey with comparable questions in each of the participating nations, and the theme of the survey changes each year. The theme of the ISSP 1995 was National Identity. We analyse the results for Question 4 of this survey for Western Germany and Eastern Germany.


__________________________________________________________________________________________

Question 4: Some people say the following things are important for being truly German. Others say they are not important. How important do you think each of the following is ...

a  to have been born in Germany
b  to have German citizenship
c  to have lived in Germany for most of one's life
d  to be able to speak German
e  to be a Christian
f  to respect Germany's political institutions and laws
g  to feel German

Response categories for each item: Very important / Important / Not very important / Not important at all / Can't choose

__________________________________________________________________________________________

To apply our analysis method to these data we have to recode the answer categories. The answers Very important and Important are coded as 1, the answers Not very important and Not important at all are coded as 0, and the answer Can't choose is coded as –. All response patterns which contain at least one – were removed from the data set for the analysis [11]. We have to note here that the interpretations of the basic parts of knowledge structure theory change in this research context. A knowledge state represents the answers given to the questions above and thus represents the opinion or belief of a subject concerning the aspects of national identity described by the statements of Question 4. We adopt the terminology of Wiley and Martin (1999) and use in the following the terms belief state and belief structure instead of knowledge state and knowledge structure. It seems plausible to assume that the probability of an erroneous response does not depend on the item [12]. So we use the assumption that the error probabilities are the same for all items in the analysis. The best fit according to the BIC criterion was found with 17 belief states for both Germany West and Germany East. Table 1 shows the determined belief states and their estimated relative frequencies.

Insert Table 1 around here

An interesting result is that the belief structures from Germany West and Germany East are quite similar with respect to the existing belief states. They share 14 of their 17 belief states. But the estimated frequencies of the constructed belief states are quite different for some of these states (see for example the estimated relative frequencies for the belief states 1111111 and 1111011).
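The recoding step described above can be sketched as follows; the function name and data layout are our own illustration, not the actual preprocessing code.

```python
def dichotomize(raw_patterns):
    """Recode the 5-category answers to 1/0 and drop every response
    pattern containing an answer that is neither positive nor negative
    (i.e. a "Can't choose", coded as '-' in the text)."""
    positive = {"Very important", "Important"}
    negative = {"Not very important", "Not important at all"}
    coded = []
    for pattern in raw_patterns:
        if any(a not in positive and a not in negative for a in pattern):
            continue                      # pattern contains a "Can't choose"
        coded.append(tuple(1 if a in positive else 0 for a in pattern))
    return coded
```

Applied to the ISSP data this reduces, as footnote 11 reports, 1282 patterns to 1126 for Germany West and 612 to 510 for Germany East.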

[11] The complete data sets contained 1282 response patterns for Germany West and 612 response patterns for Germany East. After the patterns which contained at least one – were removed, the reduced data sets contained 1126 response patterns for Germany West and 510 response patterns for Germany East.

[12] This assumption was checked by an analysis of the data. Therefore, we ran the analysis with and without this assumption. Let M1 be the best fitting knowledge structure according to the BIC criterion when we assume that the error probabilities do not depend on the items. Let M2 be the best fitting knowledge structure according to the BIC criterion when we assume that the error probabilities depend on the items. M2 naturally shows a better fit to the data concerning the L2 statistic, but the BIC value of M2 was higher than the BIC value of M1. Thus, according to the BIC criterion, the decrease in the L2 statistic does not compensate for the increase in the number of parameters.


Figure 1 presents the belief states for Western Germany as a Hasse diagram.

Insert Figure 1 around here

The belief structure is not closed under union or intersection. Thus, this belief structure could not be detected by the method described by Wiley and Martin (1999). We can easily derive the following elementary implications from Figure 1:

a, b, c, d, e, g → f
a → b, d
b, c, g → d

An example for a complex implication is c → b ∨ g. Thus, the derived belief structure can be used to determine deterministic dependencies between items. Some of these dependencies follow more or less directly from the formulation of the items. This is, for example, true for the implication a → c. Assume a subject who believes that it is important to be born in Germany in order to be truly German. Such a subject will also find it important to have lived most of one's life in Germany. For other implications the reason for the dependency may not be as obvious. It is an interesting question for further research to clarify which common beliefs of subjects cause these implications.
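Such implications can again be checked mechanically against the belief states. The following sketch (all names ours) verifies the complex implication c → b ∨ g and the elementary implications into f over the Western German belief states from Table 1:

```python
# The 17 belief states for Western Germany (item order a b c d e f g),
# taken from the West column of Table 1.
west = ["1111111", "1111011", "0111011", "0101011", "1101011", "0011011",
        "0001011", "0101010", "0111111", "0001010", "0111010", "1111010",
        "0000000", "0000010", "1101010", "0000110", "0001110"]

items = "abcdefg"
states = [{items[i] for i, bit in enumerate(s) if bit == "1"} for s in west]

def implies_or(structure, q, alternatives):
    """q -> p1 v ... v pk: every state containing q contains some p_i."""
    return all(any(p in state for p in alternatives)
               for state in structure if q in state)

assert implies_or(states, "c", {"b", "g"})                  # c -> b v g
assert all(implies_or(states, q, {"f"}) for q in "abcdeg")  # a,b,c,d,e,g -> f
```

Note that neither c → b nor c → g holds alone (the state {c, d, f, g} lacks b, and {a, b, c, d, f} lacks g), so the disjunction is genuinely needed.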

Summary

We have shown that we can represent a knowledge structure as a constrained latent class model. For this transformation we only have to enhance the knowledge structure with parameters which describe the probabilities of lucky guesses and careless errors per problem and parameters which describe the frequency of occurrence of each knowledge state in the investigated population. These parameters can be estimated, as in latent class models, by optimization procedures. To guarantee that the resulting latent class model can still be interpreted as a knowledge structure we additionally have to restrict the values for the error parameters to certain intervals which depend on the problems under investigation.

The integration of knowledge structures into latent class models has in our opinion the following advantages. First, with the described enhancement it is possible to determine a knowledge structure from a set of observed response data. Thus, the integration allows us, with the help of the methods of latent class analysis, to construct knowledge structures by exploratory data analysis. Since the size of the power set of a problem set, and thus the number of necessary response patterns for such an analysis, increases exponentially with the size of the problem set, the application of this method is of course restricted to small problem sets. Second, we can use the connection to latent class models to validate knowledge structures empirically. The empirical validation of knowledge structures has two aspects. The first aspect is that we need empirical methods to compare different knowledge structures concerning their ability to represent the response behavior of subjects. The second aspect is that we need empirical methods to test the fit of a knowledge structure to observed data statistically. Latent class analysis offers a well developed methodology to deal with these two aspects.
The integration of knowledge structure theory into latent class models allows us to use this methodology to answer the corresponding questions in the theory of knowledge structures.

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, Vol. 19, 716-723.


Albert, D.; Schrepp, M. & Held, T. (1994). Construction of knowledge spaces for problem solving in chess. In: G. Fischer & D. Laming (Eds.), Contributions to Mathematical Psychology, Psychometrics, and Methodology. New York: Springer, Ch. 9, 123-135.
Albert, D. & Lukas, J. (1999). Knowledge Spaces: Theories, Empirical Research and Applications. Mahwah, NJ: Erlbaum.
Birkhoff, G. (1937). Rings of sets. Duke Mathematical Journal, Vol. 3, 443-454.
Cerny, V. (1985). Thermodynamic approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, Vol. 45/1, 41-51.
Clogg, C.C. (1995). Latent class models. In: G. Arminger, C.C. Clogg & M.E. Sobel (Eds.), Handbook of Statistical Modeling for the Social and Behavioral Sciences, Ch. 6, 311-359.
Doignon, J.P. & Falmagne, J.C. (1985). Spaces for the assessment of knowledge. International Journal of Man-Machine Studies, Vol. 23, 175-196.
Doignon, J.P. (1994). Knowledge spaces and skill assignments. In: G.H. Fischer & D. Laming (Eds.), Contributions to Mathematical Psychology, Psychometrics and Methodology. Berlin, Heidelberg, New York: Springer.
Doignon, J.P. & Falmagne, J.C. (1998). Knowledge Spaces. Berlin: Springer.
Dowling, C. (1993). Applying the basis of a knowledge space for controlling the questioning of an expert. Journal of Mathematical Psychology, Vol. 37, 21-48.
Düntsch, I. & Gediga, G. (1996). On query procedures to build knowledge structures. Journal of Mathematical Psychology, Vol. 40, 160-168.
Falmagne, J.C. & Doignon, J.P. (1988a). A class of stochastic procedures for assessing the state of a system. British Journal of Mathematical and Statistical Psychology, Vol. 41, 1-23.
Falmagne, J.C. & Doignon, J.P. (1988b). A Markovian procedure for assessing the state of a system. Journal of Mathematical Psychology, Vol. 32/2, 232-258.
Falmagne, J.C. (1989). A latent trait theory via a stochastic learning theory for a knowledge space. Psychometrika, Vol. 54, 283-303.
Falmagne, J.C.; Koppen, M.; Villano, M.; Doignon, J.P. & Johannesen, L. (1990). Introduction to knowledge spaces: How to build, test and search them. Psychological Review, Vol. 97/2, 201-224.
Flament, C. (1976). L'analyse booléenne de questionnaire. Paris: Mouton.
Goodman, L.A. (1974a). The analysis of systems of qualitative variables when some of the variables are unobservable. Part I – A modified latent structure approach. American Journal of Sociology, Vol. 79, 1179-1259.
Goodman, L.A. (1974b). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, Vol. 61, 215-231.
Goodman, L.A. (1978). Analysing qualitative/categorical variables: loglinear models and latent structure analysis. Cambridge.
Kambouri, M.; Koppen, M.; Villano, M. & Falmagne, J.C. (1994). Knowledge assessment: Tapping human expertise by the QUERY routine. International Journal of Human-Computer Studies, Vol. 40, 119-151.
Kirkpatrick, S.; Gelatt, C.D. & Vecchi, M.P. (1983). Optimization by simulated annealing. Science, Vol. 220/4598, 671-680.
Koppen, M. (1993). Extracting human expertise for constructing knowledge spaces: An algorithm. Journal of Mathematical Psychology, Vol. 37, 1-20.

Koppen, M. & Doignon, J.P. (1990). How to build a knowledge space by querying an expert. Journal of Mathematical Psychology, Vol. 34, 311-331.
Korossy, K. (1996). Kompetenz und Performanz beim Lösen von Geometrie-Aufgaben. Zeitschrift für Experimentelle Psychologie, Vol. 43, 279-318.
Leeuwe, J.F.J. van (1974). Item tree analysis. Nederlands Tijdschrift voor de Psychologie, Vol. 29, 475-484.
Martin, J.L. & Wiley, J.A. (2000). Algebraic representations of beliefs and attitudes II: Microbelief models for dichotomous belief data. Sociological Methodology, Vol. 30/1, 123-164.
McCutcheon, A.L. (1987). Latent Class Analysis. Newbury Park: Sage Publications.
Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H. & Teller, E. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, Vol. 21/6, 1087-1092.
Proctor, C.H. (1970). A probabilistic formulation and statistical analysis of Guttman scaling. Psychometrika, Vol. 35, 73-78.
Raftery, A. (1985). A note on Bayes factors for log-linear contingency table models with vague prior information. Journal of the Royal Statistical Society, Series B, Vol. 48, 249-250.
Rost, J. & Langeheine, R. (1997). Applications of Latent Trait and Latent Class Models in the Social Sciences. New York: Waxmann.
Schrepp, M. (1995). Modeling interindividual differences in solving letter series completion problems. Zeitschrift für Psychologie, Vol. 203, 173-188.
Schrepp, M. (1999a). On the empirical construction of implications on bi-valued test items. Mathematical Social Sciences, Vol. 38/3, 361-375.
Schrepp, M. (1999b). Extracting knowledge structures from observed data. British Journal of Mathematical and Statistical Psychology, Vol. 52/2, 213-224.
Schrepp, M. (2002). Explorative analysis of empirical data by boolean analysis of questionnaires. Zeitschrift für Psychologie, Vol. 210/2, 99-109.
Schrepp, M. (2003). A method for the analysis of hierarchical dependencies between items of a questionnaire. Methods of Psychological Research, Vol. 8/1, 43-79.
Schrepp, M. & Held, T. (1995). A simulation study concerning the effect of errors on the establishment of knowledge spaces by querying experts. Journal of Mathematical Psychology, Vol. 39, 376-382.
Schrepp, M.; Held, T. & Albert, D. (1999). Component based construction of surmise relations for chess problems. In: D. Albert & J. Lukas (Eds.), Knowledge Spaces: Theories, Empirical Research, and Applications. Mahwah, NJ: Erlbaum, Ch. 3, 41-66.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, Vol. 6/2, 461-464.
Theuns, P. (1994). A dichotomization method for boolean analysis of quantifiable co-occurrence data. In: G. Fischer & D. Laming (Eds.), Contributions to Mathematical Psychology, Psychometrics and Methodology. New York: Springer.
Theuns, P. (1998). Building a knowledge space via boolean analysis of co-occurrence data. In: C.E. Dowling, F.S. Roberts & P. Theuns (Eds.), Recent Progress in Mathematical Psychology. Hillsdale, NJ: Erlbaum.
Villano, M. (1991). Computerized knowledge assessment: Building the knowledge structure and calibrating the assessment routine. Unpublished doctoral dissertation, New York University.

Wiley, J.A. & Martin, J.L. (1999). Algebraic representations of beliefs and attitudes: Partial order models for item responses. Sociological Methodology, Vol. 29/1, 113-146.


Table 1: Belief states and estimated relative frequencies for question 4 of ISSP 1995 for Germany West and Germany East. An empty cell indicates that the corresponding belief state is not contained in the corresponding belief structure.

Belief state   Germany West   Germany East
1111111        0.225          0.160
1111011        0.173          0.257
0111011        0.102          0.128
0101011        0.092          0.067
1101011        0.047          0.038
0011011        0.046          0.025
0001011        0.036          0.021
0101010        0.039          0.032
0111111        0.036          0.034
0001010        0.038          0.026
0111010        0.027          0.035
1111010        0.027          0.044
0000000        0.027          0.033
0000010        0.029          0.030
1101010        0.018
0000110        0.016
0001110        0.020
1111001                       0.025
0000011                       0.022
0100010                       0.022


Figure 1: Hasse-Diagram of the belief structure for Western Germany concerning the subset relation.

[Figure: Hasse diagram of the 17 belief states ∅, {f}, {e, f}, {d, f}, {d, e, f}, {d, f, g}, {b, d, f}, {c, d, f, g}, {b, d, f, g}, {b, c, d, f}, {a, b, d, f}, {b, c, d, f, g}, {a, b, d, f, g}, {b, c, d, e, f, g}, {a, b, c, d, f}, {a, b, c, d, f, g}, and Q, ordered by set inclusion.]
