PSYCHOMETRIKA, VOL. 77, NO. 2, APRIL 2012, DOI: 10.1007/s11336-012-9256-6
A NEW HETEROGENEOUS MULTIDIMENSIONAL UNFOLDING PROCEDURE
JOONWOOK PARK AND PRIYALI RAJAGOPAL, SOUTHERN METHODIST UNIVERSITY
WAYNE S. DESARBO, PENNSYLVANIA STATE UNIVERSITY

A variety of joint space multidimensional scaling (MDS) methods have been utilized for the spatial analysis of two- or three-way dominance data involving subjects’ preferences, choices, considerations, intentions, etc., so as to provide a parsimonious spatial depiction of the underlying relevant dimensions, attributes, stimuli, and/or subjects’ utility structures in the same joint space representation. We demonstrate that care must be taken with respect to a key assumption in existing joint space MDS models, namely that all estimated dimensions are utilized by each and every subject in the sample, as this assumption can lead to serious distortions in the derived joint spaces. We develop a new Bayesian dimension selection methodology for the multidimensional unfolding model which accommodates heterogeneity in such dimensional utilization at the individual subject level for the analysis of two- or three-way dominance data. A consumer psychology application regarding the preference for Over-the-Counter (OTC) analgesics is provided. We conclude by discussing the practical implications of the results, as well as directions for future research.

Key words: multidimensional unfolding, dimension selection, Bayesian multidimensional scaling, consumer psychology, heterogeneity.
Requests for reprints should be sent to Joonwook Park, Cox Business School, Southern Methodist University, Dallas, TX 75275, USA. E-mail: [email protected]
© 2012 The Psychometric Society

1. Introduction

Carroll and Arabie (1980) broadly defined multidimensional scaling (MDS) as a family of geometric models for the multidimensional representation of the structure in data, together with the corresponding set of methods for fitting such spatial models. They developed a taxonomy of MDS based on the properties of the input measurement data (e.g., number of modes, number of ways, power of the mode, scale type, conditionality, completeness of the data, replications, etc.) and the properties of the underlying multidimensional measurement model (e.g., type of geometric model, number of sets of points in the derived space, number of derived spaces, degree of constraints on model parameters, etc.). Their definition thus extends classical MDS, which typically deals only with spatial models for proximity data (e.g., similarities), to various other types of continuous and discrete representations, as well as to other data types. Our focus will be on the major types of spatial models utilized for the analysis of dominance data (i.e., preference, profile, consideration to buy, importance, effectiveness, choice, etc.) as typically collected in the social sciences.

MDS procedures that can provide joint space representations (row and column entities) for general two-way dominance data abound, in the form of multidimensional unfolding representations (Borg & Groenen, 2005; Carroll, 1972; DeSarbo & Carroll, 1985; DeSarbo & Hoffman, 1986; DeSarbo & Rao, 1984, 1986; DeSarbo, Young, & Rangaswamy, 1997; Greenacre & Browne, 1986; Gifi, 1990; Heiser, 1981; Kruskal, 1964a, 1964b; Kruskal & Carroll, 1969; Lingoes, 1972, 1973; Roskam, 1973; Schönemann, 1970; Takane, Young, & DeLeeuw, 1977; Young & Torgerson, 1967), vector or scalar product representations (Borg & Groenen, 2005; Carroll, 1980; Gifi, 1990; Slater, 1960; Tucker, 1960), or correspondence analysis and related optimal scaling approaches (Benzécri, 1973, 1992; Cox & Cox, 2001; Gifi, 1990; Greenacre, 1984; Nishisato, 1980). Readers interested in a more comprehensive examination of this area of MDS are encouraged to consult Borg and Groenen (2005) for an in-depth treatment of these and other existing MDS approaches for the analysis of such data.

An important assumption of such joint space MDS models is that the derived latent dimensions are shared by all subjects. Previous MDS research has ignored the possibility that subjects may selectively utilize latent dimensions to form the ideal points or vectors that best represent their own preference structures. However, a host of behavioral research suggests that individuals may utilize different dimensions of an object to form opinions, attitudes, and/or evaluations, owing to factors such as differing goals (Payne, Bettman, & Johnson, 1993), individual differences such as level of expertise (Alba & Hutchinson, 1987), and contextual factors such as knowledge accessibility (Feldman & Lynch, 1988), time pressure (Wright & Weitz, 1977), involvement (Petty & Cacioppo, 1986), and mood states (Isen, 1993). For example, research on goals has found that decision makers with accuracy goals undertake more extensive processing of attribute information than decision makers with effort minimization goals (Payne et al., 1993); thus, individuals with the goal of effort minimization are likely to utilize fewer dimensions than individuals with the goal of accurate decision making. Individuals with justification goals, on the other hand, have been shown to focus on attributes that help them justify their final choices (Kunda, 1990).
Hence, the type of goal is an important determinant of the number and type of dimensions utilized during decision making. Research on expertise has shown that experts possess greater and more detailed knowledge structures about categories than novices (Johnson & Mervis, 1997) and are able to recall more dimensions/attributes of alternatives than novices (Vicente & Wang, 1998). This suggests that when decision making is memory based, experts will utilize more dimensions than novices and are more likely to focus on important and relevant dimensions, whereas novices are likely to rely on more salient and prototypical attributes/dimensions (Alba & Hutchinson, 1987). Apart from goals and other individual differences, contextual variables such as time pressure, mood states, and involvement can also affect dimension selection. For example, under moderate time pressure, individuals are likely to process each stimulus alternative separately, whereas under severe time pressure, individuals have been shown to switch to selecting a few important dimensions and evaluating alternatives on the basis of this restricted set (Payne, Bettman, & Luce, 1996). Houston and Sherman (1995) found that the starting alternative in a choice process determined the type of dimension that received greater weight during choice: features shared by the choice alternatives were canceled, and greater weight was placed on the unique features of the alternative that served as the starting point for comparison. Since the starting alternative in a choice set is likely to differ across individuals, different dimensions would emerge as unique versus common, which in turn results in different weights over dimensions during the choice process. Involvement with the stimulus category or the decision has also been shown to have a significant effect on decision making and information processing.
Individuals with higher levels of involvement with the decision or the object have been shown to pay greater attention to the decision task and to process more information than individuals with lower levels of involvement. Highly involved decision makers have also been shown to focus on relevant aspects of the choice task, whereas subjects with low involvement focus on peripheral aspects of the task (Petty & Cacioppo, 1986; Petty, Cacioppo, & Goldman, 1981). Finally, research on positive affect has shown that people in a positive mood are cognitively more flexible than people in negative or neutral moods and are able to utilize more, and broader, dimensions during decision making (e.g., Isen, 1993; Isen, Daubman, & Nowicki, 1987).
JOONWOOK PARK, PRIYALI RAJAGOPAL, AND WAYNE S. DESARBO
Research has also shown that respondents differ in their usage of response scales, and these differences in response styles may bias the interpretation of results. For example, Baumgartner and Steenkamp (2001) found that several different types of response style (e.g., extreme response style, midpoint responding, etc.) could bias results. Similarly, poorly constructed scales with biased or ambiguous scale labels and/or insufficient scale points (Churchill & Peter, 1984; Friedman, Friedman, & Gluck, 1988; Schwarz, Strack, Muller, & Chassein, 1988; Sterngold, Warland, & Hermann, 1994) may also result in errors in the interpretation of data, since such scales can lead respondents to select options that do not correspond to their true judgments. It is not clear what effect the assumption of homogeneous dimensionality has on accurately portraying the true joint space, in terms of the relationships both between and within the row and column entities.

The major contributions of this manuscript are twofold. First, we examine the issue of dimensional heterogeneity (i.e., different subjects utilize potentially different subsets of dimensions in evoking their dominance judgments) and illustrate with synthetic data the potential distortions in the derived joint space representations, and the associated implications, when traditional joint space MDS methods are used in the presence of such dimensional heterogeneity. Second, we develop a new Bayesian joint space multidimensional unfolding methodology that accommodates both dimensional selection heterogeneity and individual level preference heterogeneity in a unified framework.
We adapt a Bayesian variable selection approach (George & McCulloch, 1993, 1997; Gilbride, Allenby, & Brazell, 2006; Kuo & Mallick, 1998) to estimate individual level dimension selection for deriving joint space structures from the analysis of two- or three-way dominance data. We apply our procedure to synthetic data with dimensional selection and compare its performance in recovering the true underlying joint space with that of traditional methods. Then, we present an empirical consumer psychology application using Over-the-Counter (OTC) analgesic dominance data. Finally, the paper concludes with a discussion of the results and implications, as well as avenues for future research.
2. A Synthetic Data Illustration Using Traditional Joint Space MDS Methods

As an illustration of the potential difficulties in applying traditional joint space MDS models to dominance data with heterogeneous dimensional selection, we created simulated data for I = 100 subjects, J = 8 stimuli, and R = 1 situation/replication in T = 2 dimensions, based on a simple unfolding model with dimensional selection. We first randomly generated stimulus positions from a standard normal distribution, subject to the identification constraints discussed in Section 3.1. Then, we randomly generated dimensional selection indicators from a Bernoulli distribution with probability 0.6, independently for each dimension. Next, we randomly generated the ideal points of these 100 subjects from a normal distribution and computed squared distances¹ over the utilized dimensions for each subject. Finally, we generated individual specific additive constants from a normal distribution and added N(0, 1) error. These simulated data included 14 subjects without any dimension utilization, 52 subjects using only one of the two dimensions, and 31 subjects using both dimensions. These dispreferences were then submitted as input to three traditional multidimensional unfolding procedures: (a) the weighted unfolding model in GENFOLD2 (DeSarbo & Rao, 1984), (b) PREFSCAL (Busing, Groenen, & Heiser, 2005), and (c) ALSCAL (Takane et al., 1977).

¹ Note that PREFSCAL assumes Euclidean distance, while all other models, including the proposed model, assume squared Euclidean distance. As such, we input Euclidean distances and added the same amount of error for the PREFSCAL analysis.
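The data-generating process described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: `simulate_hdmdu` is a hypothetical helper, the N(0, 1) distributions for ideal points and additive constants and the seed are our own choices, and only the centering constraint from Section 3.1 is applied to the stimuli.

```python
import random


def simulate_hdmdu(I=100, J=8, T=2, p_use=0.6, seed=0):
    """Simulate one replication (R = 1) of dispreference data with dimension selection.

    Assumed (not specified in the text): N(0, 1) ideal points and additive
    constants, and centering as the only identification step applied.
    """
    rng = random.Random(seed)
    # Stimulus coordinates x[j][t] ~ N(0, 1), then centered so sum_j x[j][t] = 0.
    x = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(J)]
    for t in range(T):
        mean_t = sum(x[j][t] for j in range(J)) / J
        for j in range(J):
            x[j][t] -= mean_t
    # gamma[i][t] ~ Bernoulli(p_use), drawn independently for each dimension.
    gamma = [[1 if rng.random() < p_use else 0 for _ in range(T)] for _ in range(I)]
    w = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(I)]  # ideal points
    d = [rng.gauss(0.0, 1.0) for _ in range(I)]                       # additive constants
    # y[i][j] = sum_t gamma_it (x_jt - w_it)^2 + d_i + N(0, 1) error.
    y = [[sum(gamma[i][t] * (x[j][t] - w[i][t]) ** 2 for t in range(T))
          + d[i] + rng.gauss(0.0, 1.0)
          for j in range(J)]
         for i in range(I)]
    return x, w, gamma, d, y


x, w, gamma, d, y = simulate_hdmdu()
usage = [sum(g) for g in gamma]
print({k: usage.count(k) for k in range(3)})  # subjects using 0, 1, or 2 dimensions
```

With p_use = 0.6 per dimension, about 16%, 48%, and 36% of subjects are expected to use zero, one, and two dimensions, respectively; any single draw will scatter around those shares.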
FIGURE 1. True configuration of the simulated data.
These analyses were performed treating the data as metric and row conditional. We then applied the appropriate configuration matching procedure (depending upon each solution’s particular set of parameter indeterminacies) to maximize the congruence of each recovered solution with the true joint space. Figure 1 shows the true joint space used to create the input data, and Figure 2 shows the derived joint spaces from (a) the proposed Heterogeneous Dimensional Multidimensional Unfolding (HDMDU) model (discussed further once this new procedure is introduced), (b) the GENFOLD2 weighted unfolding model, (c) PREFSCAL, and (d) ALSCAL. Here, the stimulus positions are displayed as black squares with alphabetical labels (A–H). The ideal points of subjects who utilize both dimensions are displayed as grey dots and those of subjects who use only one dimension with plus signs (‘+’); for models (b)–(d), the ideal points of subjects who do not utilize any dimension are displayed with asterisks (‘*’). Figure 2(a) depicts the two-dimensional joint space configuration estimated from the proposed HDMDU model; here we portray only those subjects estimated to be using at least one dimension. The ideal points of subjects who utilize only one dimension are portrayed with plus signs (‘+’) on the line above the corresponding dimension.² As shown, this configuration is nearly identical to Figure 1. However, the three traditional unfolding models estimate rather distorted joint spaces with radically different implications from what should be recovered in Figure 1. First, ALSCAL in Figure 2(d) shows a type of degenerate solution in which most of the ideal points are located near the origin and the stimuli are located far from most of the ideal points.³ The PREFSCAL solution displayed in Figure 2(c) shows a somewhat better intermixed joint space.
However, the ideal points of subjects who utilize only one dimension or no dimension are located far away from all stimulus locations, thereby distorting the derived joint space.⁴ The GENFOLD2 weighted unfolding model in Figure 2(b) shows a seemingly much better recovery of the true joint space, particularly regarding the stimulus locations. However, the estimated weight matrix contains a large number of sizable negative and positive weights, with very few of small magnitude. Here, we find 42% saddle points, 21% anti-ideal points, and 37% ideal points. In other words, the weighted unfolding model appears to accommodate dimensional heterogeneity via saddle points and anti-ideal points. In sum, this simple example illustrates how one can derive highly distorted representations of joint space structures by ignoring dimensional selection heterogeneity when using such traditional joint space MDS methods. Stimulus distinction appears to collapse in these solutions, and the intermixing of stimuli and subject ideal points is not accurately recovered. We will return to this illustration later in the manuscript and provide additional details.

FIGURE 2. Comparison of derived joint spaces for the simulated data. Note: ideal points of subjects using two dimensions are displayed with dots, and ideal points of subjects using one dimension are displayed with plus signs. Furthermore, ideal points of subjects with no dimension usage are displayed with asterisks in (b)–(d).

² We thank an anonymous reviewer for this point.
³ These derived joint spaces are usually referred to as degenerate solutions, meaning that the derived joint space is extremely uninformative despite good fit to the data (Heiser, 1989). In MDU models, this usually takes the form of a wide separation between row and column points. Note that we use 0.00001 as the convergence criterion for ALSCAL, as the default convergence option would keep the final solution close to the initial solution.
⁴ One reviewer suggested a different specification of PREFSCAL to mimic the proposed model, using a three-way dimension weighting option combined with almost missing values. Here, one can replicate the two-way data I = 100 times, where row i of the i-th replication has a weight of one and very small numbers (i.e., 1e–06) elsewhere. We tried this approach but did not find any significant improvement over the more traditional PREFSCAL approach.
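The ideal/anti-ideal/saddle tally reported for GENFOLD2 follows from the sign pattern of each subject's weights in the weighted unfolding model: all positive weights give an ideal point, all negative weights an anti-ideal point, and mixed signs a saddle point. A small sketch of that tally, assuming the estimated weights are available as one tuple per subject (`classify_weights` and `label_shares` are hypothetical helpers, and the handling of exact zeros is our own choice):

```python
def classify_weights(weights):
    """Label each subject's weight vector c_i = (c_i1, ..., c_iT) by its sign pattern.

    All positive -> ideal point; all negative -> anti-ideal point;
    mixed signs -> saddle point. Vectors containing exact zeros without a
    sign change are labeled 'degenerate' (an illustrative choice only).
    """
    labels = []
    for c in weights:
        if all(v > 0 for v in c):
            labels.append("ideal")
        elif all(v < 0 for v in c):
            labels.append("anti-ideal")
        elif any(v > 0 for v in c) and any(v < 0 for v in c):
            labels.append("saddle")
        else:
            labels.append("degenerate")
    return labels


def label_shares(labels):
    """Proportion of subjects receiving each label."""
    n = len(labels)
    return {lab: labels.count(lab) / n for lab in sorted(set(labels))}


# Hypothetical weights for four subjects in T = 2 dimensions.
demo = [(1.2, 0.4), (-0.8, -0.3), (2.0, -1.5), (-0.1, 0.9)]
print(classify_weights(demo))  # ['ideal', 'anti-ideal', 'saddle', 'saddle']
print(label_shares(classify_weights(demo)))
```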
3. The Proposed Heterogeneous Multidimensional Unfolding Procedure

Carroll (1972, 1980) introduced a hierarchy of nested preference models involving four structural models with somewhat different underlying utility structures: the vector, the simple unfolding, the weighted unfolding, and the general unfolding models. In the simple unfolding model, latent utility is typically modeled as an inverse function of the squared Euclidean distance between a stimulus location and a subject’s ideal point, such that utility decreases for stimulus locations farther away from the subject’s ideal point in any direction (Coombs, 1960). This implies that, in a two-dimensional joint space, the subject is indifferent among stimuli located on any circle centered at his/her ideal point. An extension of the simple unfolding model is the weighted unfolding model (Carroll, 1972, 1980; DeSarbo & Rao, 1984; Wedel & DeSarbo, 1996), in which the squared Euclidean distance is weighted by dimension via individual level scale parameters. As shown by Carroll (1980), the weighted unfolding model is flexible for estimation purposes, as it nests the simple unfolding and vector models as special cases. However, this flexibility often hinders the interpretation of the joint space map, since the simple distance between a stimulus position and an ideal point in the derived map does not represent the recovered preference of a particular subject; one has to apply the estimated weights. As noted by Carroll (1972, 1980) and above, these weights can be positive or negative, which further complicates interpretation. The corresponding stochastic latent disutility functions of the simple and weighted unfolding MDS models can be represented, respectively, as

$$DU_{ij} = c_i \sum_{t=1}^{T} (x_{jt} - w_{it})^2 + d_i + e_{ij}, \tag{1}$$

$$DU_{ij} = \sum_{t=1}^{T} c_{it} (x_{jt} - w_{it})^2 + d_i + e_{ij}, \tag{2}$$
where i indexes subjects (i = 1, ..., I) who make dominance judgments on stimuli j = 1, ..., J using latent dimensions t = 1, ..., T. In Equations (1) and (2), $x_{jt}$ refers to stimulus j’s position on dimension t; $w_{it}$ refers to subject i’s ideal point on dimension t; $d_i$ is the individual specific additive constant; $c_{it}$ is the individual and dimension specific weight for the weighted unfolding MDS model; $c_i$ is the individual specific multiplicative constant for the simple unfolding MDS model; and $e_{ij}$ is error assumed to follow a normal distribution, $e_{ij} \sim N(0, \sigma^2)$. These simple and weighted unfolding MDS models are typically applied to two-way dominance data, with subjects as the rows and stimuli as the columns. Assuming positive weights in Equation (2), a subject’s latent utility decreases for stimulus locations farther away from the subject’s ideal point in any direction; for this interpretation, the dimensional scale parameter $c_{it}$ has to be constrained to be positive in the weighted multidimensional unfolding model. In two dimensions, the iso-utility contours are then ellipses centered on the ideal points. If we let $c_{it} = c_i = 1$ for all subjects and dimensions, the weighted unfolding MDS model reduces to the simple unfolding MDS model, whose iso-utility contours in two dimensions are concentric circles centered on the ideal points.
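To make the nesting concrete, the deterministic parts of Equations (1) and (2) can be written as two short functions. This is a sketch with hypothetical function names and the error term omitted; it shows that equal weights across dimensions recover the simple model and that a negative weight changes the geometry around the ideal point.

```python
def simple_unfolding(x_j, w_i, c_i, d_i):
    """Deterministic part of Equation (1): c_i * sum_t (x_jt - w_it)^2 + d_i."""
    return c_i * sum((xt - wt) ** 2 for xt, wt in zip(x_j, w_i)) + d_i


def weighted_unfolding(x_j, w_i, c_i, d_i):
    """Deterministic part of Equation (2): sum_t c_it (x_jt - w_it)^2 + d_i."""
    return sum(c * (xt - wt) ** 2 for c, xt, wt in zip(c_i, x_j, w_i)) + d_i


x_j, w_i = (1.0, -0.5), (0.2, 0.3)
# Equal weights on every dimension reproduce the simple unfolding model.
print(simple_unfolding(x_j, w_i, 2.0, 0.5), weighted_unfolding(x_j, w_i, (2.0, 2.0), 0.5))
# A negative weight on dimension 2 turns the ideal point into a saddle point:
# disutility now falls, rather than rises, with distance along that dimension.
print(weighted_unfolding(x_j, w_i, (2.0, -2.0), 0.5))
```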
As mentioned above, a problem with both the traditional simple and weighted unfolding MDS models is that they implicitly assume that the same dimensions underlie all subjects’ judgments, i.e., that all subjects share the same dimensions in making their judgments. However, as discussed in the introduction, a plethora of behavioral research has documented significant heterogeneity among subjects in their selection of dimensions during decision making. As such, ignoring subjects’ heterogeneity in information selection can lead to biased parameter estimates and distorted joint spaces, as shown in our simple illustration earlier.⁵ We address this issue and provide a parsimonious way to represent subjects’ heterogeneity in selective information usage in such spatial models via a new Bayesian dimension selection procedure akin to Bayesian variable selection in regression settings (Carlin & Chib, 1995; George & McCulloch, 1993, 1997; George, McCulloch, & Tsay, 1995; Geweke, 1996; Mitchell & Beauchamp, 1988). Note that Bayesian variable selection procedures in a regression context can be seen as deciding which of the regression parameters are equal to zero and which are not. For instance, George and McCulloch (1993) proposed a stochastic search variable selection (SSVS) procedure with a hierarchical mixture normal prior on the regression coefficients, where one component is concentrated around zero and the other is relatively diffuse. Gilbride et al. (2006) applied Bayesian variable selection to a brand choice context by assuming subjects’ selective processing of product attribute information; specifically, they extended existing Bayesian variable selection to the individual level via a random effects distribution.
A direct extension of existing Bayesian variable selection procedures to the multidimensional unfolding model (MDU model hereafter), however, is not straightforward, because of the interpretation of a zero parameter value in MDU models. Unlike in regression, an ideal point at zero (i.e., $w_{it} = 0$) or a stimulus coordinate at zero (i.e., $x_{jt} = 0$) in an MDU model still affects the latent disutility through the squared Euclidean distance. We therefore extend the Bayesian dimension selection procedure to the MDU model by introducing a latent variable $\gamma_{it}$ such that $P(\gamma_{it} = 1) = \phi_t$ and $P(\gamma_{it} = 0) = 1 - \phi_t$. This latent variable indicates whether subject i utilizes dimension t ($\gamma_{it} = 1$, with probability $\phi_t$) or not ($\gamma_{it} = 0$, with probability $1 - \phi_t$) by acting as a switch on the corresponding squared Euclidean distance term. If $\gamma_{it} = 0$, subject i does not utilize dimension t: neither the stimulus coordinates nor subject i’s ideal point on that dimension affects the latent disutility of subject i. Likewise, if $\gamma_{it} = 1$, subject i uses dimension t. Equation (3) gives the latent disutility of the proposed Heterogeneous Dimensional multidimensional unfolding model (the HDMDU model hereafter):

$$DU_{ijr} = \sum_{t=1}^{T} (x_{jt} - w_{it})^2 \gamma_{it} + d_{ir} + e_{ijr}, \tag{3}$$
where $e_{ijr} \sim N(0, \sigma^2)$ and, without loss of generality, we set $\sigma^2 = 1$. Note that, unlike the traditional simple and weighted MDU models in Equations (1) and (2), the HDMDU model in Equation (3) accommodates three-way dominance data. In addition to the stimulus locations $x_{jt}$ and ideal points $w_{it}$, we introduce individual and situation specific additive parameters $d_{ir}$ for situations r = 1, ..., R (e.g., time, experimental treatments, occasions, replications, etc.). Hence, the HDMDU model extends Bayesian variable selection to individual and dimension level selection in multidimensional unfolding models. The proportion of subjects utilizing dimension t is $\phi_t$, and the latent variable $\gamma_{it}$ is drawn from a Bernoulli distribution via the standard data augmentation procedure, as shown below in our technical description of the estimation procedure (see Diebolt & Robert, 1994; Tanner & Wong, 1987). The latent variable $\gamma_{it}$ and the squared Euclidean distance $(x_{jt} - w_{it})^2$ are assumed to be independent a priori, and the latent variables $\gamma_{it}$ are assumed to be mutually independent. Hence, in a hypothetical two-dimensional solution, the indicators $(\gamma_{i1}, \gamma_{i2})$ can take the values (0, 0), (1, 0), (0, 1), and (1, 1). If both $\gamma_{i1}$ and $\gamma_{i2}$ equal one, subject i uses both dimensions. If $\gamma_{i1} = 0$ and $\gamma_{i2} = 1$, subject i selectively uses Dimension 2. However, if both indicators equal zero, the estimated joint space structure (i.e., the stimulus locations) does not fit subject i’s utility structure. This could happen, for example, when the dispreference ratings of subject i show very little variation across stimuli. In the extreme, a respondent may give all stimuli the same dispreference rating due to lack of interest, fatigue, lack of stimulus familiarity, etc., or due to poorly constructed scales (e.g., Friedman et al., 1988), as is commonly observed in many survey studies. In such cases, the individual level additive constant $d_{ir}$ serves as the prediction of this respondent’s dispreferences.⁶

We now turn to the detailed MCMC procedure and to issues related to the estimation and identification of the proposed HDMDU model. Let $Y_{ijr}$ denote the dispreference rating for stimulus j by subject i in the rth replication, with t = 1, ..., T unknown dimensions. Equation (4) presents the joint posterior density of the unknown parameters of the proposed HDMDU model:

$$P(x_{jt}, w_{it}, d_{ir}, \phi_t, \gamma_{it} \mid Y_{ijr}) \propto \prod_{i=1}^{I}\prod_{r=1}^{R}\prod_{j=1}^{J} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left[Y_{ijr} - \sum_{t=1}^{T}(x_{jt} - w_{it})^2\gamma_{it} - d_{ir}\right]^2\right\} \times P(x_{jt} \mid \tau_x^2)\, P(w_{it} \mid \tau_w^2)\, P(d_{ir} \mid d_i, \sigma_{d_i}^2)\, P(d_i \mid \sigma_d^2)\, P(\phi_t)\, P(\gamma_{it})\, P(\tau_x^2)\, P(\tau_w^2)\, P(\sigma_{d_i}^2)\, P(\sigma_d^2). \tag{4}$$

To estimate the HDMDU model, we employ a hierarchical Bayesian approach.

⁵ Theoretically, the weighted unfolding model should be able to account for such dimensional heterogeneity, with dimensions not utilized by a particular subject receiving an associated weight of zero. However, we observed in the small synthetic example presented earlier that this does not always occur in practice.
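As a sketch of the likelihood part of Equation (4), the functions below evaluate the deterministic part of Equation (3) and one subject's Gaussian log-likelihood with $\sigma^2 = 1$. The function names and the nested-list data layout (`Y_i[r][j]` for ratings, `d_i[r]` for $d_{ir}$) are illustrative assumptions, not part of the original procedure.

```python
import math


def hdmdu_mean(x_j, w_i, gamma_i, d_ir):
    """Deterministic part of Equation (3): sum_t gamma_it (x_jt - w_it)^2 + d_ir."""
    return sum(g * (xt - wt) ** 2 for g, xt, wt in zip(gamma_i, x_j, w_i)) + d_ir


def log_lik_subject(Y_i, x, w_i, gamma_i, d_i):
    """Log of subject i's Gaussian likelihood factor in Equation (4), sigma^2 = 1.

    Y_i[r][j] is the rating of stimulus j in replication r; d_i[r] is d_ir.
    """
    total = 0.0
    for r, row in enumerate(Y_i):
        for j, y in enumerate(row):
            resid = y - hdmdu_mean(x[j], w_i, gamma_i, d_i[r])
            total += -0.5 * math.log(2.0 * math.pi) - 0.5 * resid ** 2
    return total


x = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
w_i, d_i = (0.5, 0.5), [0.0]
# gamma_i = (0, 0): every stimulus gets the same predicted disutility d_ir.
print([hdmdu_mean(x_j, w_i, (0, 0), d_i[0]) for x_j in x])  # [0.0, 0.0, 0.0]
```

The all-zero case mirrors the text's point that a subject using no dimension is predicted entirely by the additive constant.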
We use Markov chain Monte Carlo (MCMC) methods to generate random deviates from the posterior distributions without requiring analytic integration (Chib, 2002; Gelfand & Smith, 1990; Tanner & Wong, 1987). Regarding the prior distributions, we assume that the dimensional heterogeneity parameter $\phi_k$ follows a Beta prior with parameters $a_k$ and $b_k$ for k = 1, ..., T, where $f(\phi_k) = \frac{1}{B(a_k, b_k)} \phi_k^{a_k - 1} (1 - \phi_k)^{b_k - 1}$ if $0 < \phi_k < 1$, and 0 otherwise. Here, we set $a_k = b_k = 1$ (a uniform prior on (0, 1)) to reflect non-informative prior information. We then assign univariate normal priors to the other parameters, such that $x_{jt} \sim N(0, \tau_x^2)$, $w_{it} \sim N(0, \tau_w^2)$, and $d_{ir} \sim N(d_i, \sigma_{d_i}^2)$. Note that $d_{ir}$ is an individual and situation specific parameter that can capture additional variation across situations not fully explained by the squared Euclidean distance between a stimulus location and a subject’s ideal point. Therefore, we add a further hierarchical layer, $d_i \sim N(0, \sigma_d^2)$, and set $\sigma_d^2 = 10000$ as a vague prior. One can use the posterior distribution of $d_i$ in place of $d_{ir}$ for predictive purposes. For the variance hyperpriors, we use standard conjugate priors, $\tau_x^{-2} \sim G(k_x, u_x)$, $\sigma_{d_i}^2 \sim G(k_d, u_d)$, and $\tau_w^{-2} \sim G(k_w, u_w)$, where G denotes the Gamma distribution, and we set $k_x = u_x = k_w = u_w = k_d = u_d = 0.5$ to reflect vague prior information.
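The full conditional for $\phi_t$ is not among the steps reproduced in this section. Assuming the Beta($a_t$, $b_t$) prior above and independent Bernoulli($\phi_t$) indicators, standard Beta-Bernoulli conjugacy gives $\phi_t \mid \gamma \sim \text{Beta}(a_t + \sum_i \gamma_{it},\; b_t + I - \sum_i \gamma_{it})$. A minimal sketch of that draw, with hypothetical helper names:

```python
import random


def draw_phi(gamma, a=1.0, b=1.0, rng=random):
    """Draw phi_t for each dimension from its Beta full conditional.

    gamma is an I x T list of 0/1 indicators. Under a Beta(a, b) prior and
    independent Bernoulli(phi_t) indicators,
    phi_t | gamma ~ Beta(a + sum_i gamma_it, b + I - sum_i gamma_it).
    """
    I, T = len(gamma), len(gamma[0])
    phis = []
    for t in range(T):
        used = sum(gamma[i][t] for i in range(I))
        phis.append(rng.betavariate(a + used, b + I - used))
    return phis


def phi_posterior_mean(gamma, t, a=1.0, b=1.0):
    """Posterior mean (a + used) / (a + b + I) of phi_t."""
    used = sum(row[t] for row in gamma)
    return (a + used) / (a + b + len(gamma))


gamma = [[1, 0], [1, 1], [0, 1], [1, 1]]
print(phi_posterior_mean(gamma, 0))  # (1 + 3) / (2 + 4)
print(draw_phi(gamma, rng=random.Random(1)))
```

With $a_t = b_t = 1$ as in the text, the posterior mean is simply (number of users + 1) / (I + 2).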
⁶ As noted by one reviewer, many nonmetric unfolding models can yield degenerate solutions in which the transformation becomes a constant, combined with a solution in which all between-set distances equal the same constant (see, e.g., Borg & Groenen, 2005). The proposed model avoids this type of degeneracy because there is no slope parameter on the utility $U_{ir}$.
3.1. Issues Concerning the Identification of the HDMDU Model

When specifying the proposed HDMDU model, one assumes that it is identified. However, unfolding MDS models are typically “under-identified”: they have an infinite number of solutions that yield the same likelihood/objective function value. Thus, some parameters need to be constrained to obtain unique solutions (see DeSarbo & Rao, 1984, 1986; Wedel & DeSarbo, 1996; Wedel & Kamakura, 2000; Young & Hamer, 1987). These “parameter indeterminacies” (in the two-way simple unfolding model) stem from rotation, reflection, permutation, and origin indeterminacies of the joint space, and without proper constraints they can seriously affect the operation of the proposed MCMC procedure. First, a simple ideal point model solution can be rotated, permuted, or reflected without any change in the resulting likelihood or objective function value. The solution of any simple MDU model is invariant under an orthogonal transformation, applied to both stimulus and ideal point coordinates, by a T × T matrix L such that $L'L = I$. It follows that $\det(L) = \pm 1$: if $\det(L) = 1$, L represents an orthogonal rotation; if $\det(L) = -1$, L represents a reflection (Erdem, 1996). Second, the origin indeterminacy, sometimes called translational, additive, or centering indeterminacy (Wedel & DeSarbo, 1996), refers to the fact that adding the same constant vector to all stimulus and ideal point locations does not change the likelihood. To remove this origin indeterminacy, a restriction can be placed on the stimulus coordinates $x_{jt}$ such that $\sum_{j=1}^{J} x_{jt} = 0$ for each dimension t (Erdem, 1996). Alternatively, it can be removed by fixing some stimulus coordinates at arbitrary constants (Chintagunta, 1994; Elrod, 1988; Elrod & Keane, 1995).
Finally, permutation indeterminacy refers to the reordering of the dimensions by applying a permutation matrix P to the matrix of stimulus coordinates, where P is a T × T binary matrix with a single 1 in each row and column (Young & Hamer, 1987). In sum, these indeterminacies contribute to the under-identification of simple unfolding models without any effect on the likelihood or objective function values, and executing the MCMC procedure would be unnecessarily complicated without explicit constraints. In Bayesian MDS analysis, three different ways have been suggested to resolve such identification issues. One approach is to impose strong or informative priors on the stimulus coordinates (DeSarbo et al., 1998, 1999; Fong et al., 2010). For example, Fong et al. (2010) combined Bayesian factor analysis and a vector MDS model by employing the posterior of the Bayesian factor analysis as part of the prior distribution of the stimulus coordinates in a vector joint space MDS model. Another approach post-processes the resulting coordinates (Oh & Raftery, 2001; Okada & Shigemasu, 2009). Oh and Raftery (2001) used the Bayesian posterior mode of the stimulus coordinates that minimizes the sum of squared residuals between the actual data and the recovered squared Euclidean distances, and post-processed the MCMC sample of stimulus locations so that the transformed stimulus locations have mean zero. Note that Oh and Raftery’s approach does not impose any constraints on the stimulus locations. A third method is to impose direct constraints on the stimulus coordinates (Bradlow & Schmittlein, 2000; Park, DeSarbo, & Liechty, 2008). We adopt the Park et al. (2008) approach by imposing strict identification constraints on the stimulus locations.
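The indeterminacies described above can be checked numerically: applying the same orthogonal map (rotation, reflection, or dimension swap) and the same translation to stimuli and ideal points leaves every squared stimulus-to-ideal-point distance, and hence the likelihood, unchanged. A small two-dimensional sketch with made-up coordinates (`sq_dist` and `transform` are illustrative helpers):

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def transform(points, L, shift):
    """Map each 2-D point p to L p + shift (L is a 2 x 2 nested list)."""
    return [(L[0][0] * p[0] + L[0][1] * p[1] + shift[0],
             L[1][0] * p[0] + L[1][1] * p[1] + shift[1]) for p in points]


theta = 0.7
rotation = [[math.cos(theta), -math.sin(theta)], [math.sin(theta), math.cos(theta)]]  # det = +1
reflection = [[1.0, 0.0], [0.0, -1.0]]                                                  # det = -1
permutation = [[0.0, 1.0], [1.0, 0.0]]                                                  # swaps dimensions
stimuli = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
ideal = (0.3, -0.2)
shift = (2.0, -3.0)  # the same translation is applied to stimuli and the ideal point

before = [sq_dist(s, ideal) for s in stimuli]
for name, L in (("rotation", rotation), ("reflection", reflection), ("permutation", permutation)):
    xs = transform(stimuli, L, shift)
    (w,) = transform([ideal], L, shift)
    after = [sq_dist(s, w) for s in xs]
    print(name, all(abs(b - a) < 1e-12 for b, a in zip(before, after)))
```

Each line prints `True`, which is exactly why constraints on the stimulus coordinates are needed to pin down a unique solution.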
Specifically, one can remove the rotational, permutation, and reflection indeterminacies by imposing 2T constraints of which T stimulus coordinates are fixed at zero, and T other coordinates are constrained to lie in the positive orthant of the derived space. For example, let x = (x11 , x21 , . . . , xj t ) = (xc , xnc ) , where x c are constrained and x nc are unconstrained stimulus coordinates in a two-dimensional solution. In our proposed model, we use a Gamma prior for x11 and x22 so that these parameters are constrained on the positive real line, while fixing x12 and x21 at zero in the two-dimensional space for permutation, reflection, and rotational indeterminacies; a similar approach can be taken in a higher dimensional space. Finally, for the origin indeterminacy, we impose a restriction on the stimulus locations xj t such
that $\sum_{j=1}^{J} x_{jt} = 0$ for each dimension t (Erdem, 1996). In the next section, we briefly discuss a new Markov chain Monte Carlo algorithm for the proposed MDU model.

3.2. An MCMC Algorithm for the Proposed HDMDU Model

The estimation of the model parameters proceeds by recursively sampling from the full conditional distributions in Equations (5)–(23). Because of space limitations, we show only the full conditional distributions for the HDMDU model. Draws of the model parameters can be obtained by recursively iterating the following seven steps:

1. Generate the individual- and dimension-specific heterogeneity latent variable $\gamma_{it}$:
$$I(\gamma_{it} = 1) \sim \text{Bin}\big(1, P(\gamma_{it} = 1 \mid \sim)\big),$$
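where the full conditional probability is
$$P(\gamma_{it} = 1 \mid \sim) = \frac{L_i^{(1)}\,\phi_t}{L_i^{(1)}\,\phi_t + L_i^{(0)}\,(1 - \phi_t)},$$
with $L_i^{(1)}$ and $L_i^{(0)}$ denoting $L_i$ evaluated at $\gamma_{it} = 1$ and $\gamma_{it} = 0$, respectively (all other parameters held at their current values), and
$$L_i = \prod_{r=1}^{R} \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(Y_{ijr} - \sum_{t=1}^{T} (x_{jt} - w_{it})^2 \gamma_{it} - d_{ir}\right)^2\right\}.$$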
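2. Generate the individual- and situation-specific additive parameter $d_{ir}$: $P(d_{ir} \mid \sim) \sim N(\bar d_{ir}, \tau_{d_i}^2)$, where
$$\tau_{d_i}^2 = \left(J + \frac{1}{\sigma_{d_i}^2}\right)^{-1} \quad\text{and}\quad \bar d_{ir} = \left[\sum_{j=1}^{J} \left(Y_{ijr} - \sum_{t=1}^{T} (x_{jt} - w_{it})^2 I(\gamma_{it} = 1)\right) + \frac{d_i}{\sigma_{d_i}^2}\right] \tau_{d_i}^2.$$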
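3. Update the prior distribution parameter $d_i$: $P(d_i \mid \sim) \sim N(\bar d_i, V_{d_i})$, where
$$V_{d_i} = \left(\frac{R}{\sigma_{d_i}^2} + \frac{1}{\sigma_d^2}\right)^{-1} \quad\text{and}\quad \bar d_i = \frac{\sum_{r=1}^{R} d_{ir}}{\sigma_{d_i}^2}\, V_{d_i}.$$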
4. Generate the ideal point parameters $w_{it}$: A random-walk Metropolis–Hastings algorithm is used to generate the ideal point parameters $w_{it}$. Let $w_{it}^{(n)}$ denote a new candidate and $w_{it}^{(o)}$ the old value from the previous iteration of the chain. Draw a random vector (scalar) $w_{it}^{(n)} = w_{it}^{(o)} + \kappa e$, where $\kappa e$ is a draw from a candidate-generating density $N(0, \kappa^2)$. Accept the new value $w_{it}^{(n)}$ with probability
$$\alpha_w\big(w_{it}^{(n)}, w_{it}^{(o)}\big) = \min\left\{\frac{P(w_{it}^{(n)})}{P(w_{it}^{(o)})}, 1\right\},$$
$$\frac{P(w_{it}^{(n)})}{P(w_{it}^{(o)})} = \frac{\prod_{r=1}^{R} L_{ir}\big(Y_{ijr} \mid w_{it}^{(n)}, \text{rest}\big)\, e^{-\frac{1}{2\tau_w^2}(w_{it}^{(n)})^2}}{\prod_{r=1}^{R} L_{ir}\big(Y_{ijr} \mid w_{it}^{(o)}, \text{rest}\big)\, e^{-\frac{1}{2\tau_w^2}(w_{it}^{(o)})^2}},$$
where rest indicates the other remaining parameters in the likelihood, and
$$L_{ir} = \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\left(Y_{ijr} - \sum_{t=1}^{T} (x_{jt} - w_{it})^2 \gamma_{it} - d_{ir}\right)^2\right\}.$$
5. Generate the stimulus parameters $x_{jt}$: As discussed above, the stimulus coordinates can be separated into two parts: one with constraints, whose parameters must be confined to the non-negative space, and the other without these constraints. Let $x_{jt(nc)}$ be the unconstrained stimulus coordinates and $x_{jt(c)}$ the constrained stimulus coordinates. For the proposed model, $x_{jt(nc)}$ and $x_{jt(c)}$ are drawn iteratively and recursively from their respective posterior distributions as follows. First, a random-walk Metropolis–Hastings algorithm with a normal prior $P(x_{jt(nc)}) \sim N(0, \tau_x^2)$ is used to generate the unconstrained parameter $x_{jt(nc)}$. Let $x_{jt(nc)}^{(n)}$ denote a new candidate of the unconstrained parameter, and $x_{jt(nc)}^{(o)}$ the previous draw of $x_{jt(nc)}$. A new candidate is given by $x_{jt(nc)}^{(n)} = x_{jt(nc)}^{(o)} + \omega e$, where $\omega e$ is a draw from a candidate-generating density $N(0, \omega^2)$. Here, we calibrate $\omega$ so that the acceptance rate is around 30%, which yields acceptable mixing, as suggested by Gelman, Gilks, and Roberts (1996). Accept the new candidate $x_{jt(nc)}^{(n)}$ with probability
$$\alpha_{x_{nc}}\big(x_{jt(nc)}^{(n)}, x_{jt(nc)}^{(o)}\big) = \min\left\{\frac{P(x_{jt(nc)}^{(n)})}{P(x_{jt(nc)}^{(o)})}, 1\right\},$$
$$\frac{P(x_{jt(nc)}^{(n)})}{P(x_{jt(nc)}^{(o)})} = \frac{\prod_{i=1}^{I} L_i\big(Y_{ijr} \mid x_{jt(nc)}^{(n)}, \text{rest}\big)\, e^{-\frac{1}{2\tau_x^2}(x_{jt(nc)}^{(n)})^2}}{\prod_{i=1}^{I} L_i\big(Y_{ijr} \mid x_{jt(nc)}^{(o)}, \text{rest}\big)\, e^{-\frac{1}{2\tau_x^2}(x_{jt(nc)}^{(o)})^2}},$$
where rest indicates the other remaining parameters in the likelihood. Next, a random-walk Metropolis–Hastings algorithm with a Gamma prior $P(x_{jt(c)}) \sim G(sh_0, sc_0)$ and a Gamma proposal is used to generate the constrained parameters $x_{jt(c)}$. Let $x_{jt(c)}^{(n)}$ denote a new candidate of the constrained parameter and $x_{jt(c)}^{(o)}$ the previous draw of $x_{jt(c)}$. For the Gamma proposal, we reparameterize the shape parameter of the Gamma distribution kernel as $k(x_{jt(c)}^{(o)})^2$ and the scale parameter as $1/(k x_{jt(c)}^{(o)})$, so that the new candidate $x_{jt(c)}^{(n)}$ has mean equal to the previous draw $x_{jt(c)}^{(o)}$ and variance $1/k$ (Bradlow & Schmittlein, 2000). Therefore, a new candidate $x_{jt(c)}^{(n)}$ is generated from $G\big(k(x_{jt(c)}^{(o)})^2, 1/(k x_{jt(c)}^{(o)})\big)$, and $k$ is tuned to obtain an adequate acceptance rate. Accept the new candidate $x_{jt(c)}^{(n)}$ with probability
$$\alpha_{x_c}\big(x_{jt(c)}^{(n)}, x_{jt(c)}^{(o)}\big) = \min\left\{\frac{P(x_{jt(c)}^{(n)})}{P(x_{jt(c)}^{(o)})}, 1\right\},$$
$$\frac{P(x_{jt(c)}^{(n)})}{P(x_{jt(c)}^{(o)})} = \frac{\prod_{i=1}^{I} L_i\big(Y_{ijr} \mid x_{jt(c)}^{(n)}, \text{rest}\big)}{\prod_{i=1}^{I} L_i\big(Y_{ijr} \mid x_{jt(c)}^{(o)}, \text{rest}\big)} \times \frac{\dfrac{1}{\Gamma(sh_2)\, sc_2^{sh_2}} \big(x_{jt(c)}^{(o)}\big)^{sh_2 - sh_0} e^{-x_{jt(c)}^{(o)}\left(\frac{1}{sc_2} - \frac{1}{sc_0}\right)}}{\dfrac{1}{\Gamma(sh_1)\, sc_1^{sh_1}} \big(x_{jt(c)}^{(n)}\big)^{sh_1 - sh_0} e^{-x_{jt(c)}^{(n)}\left(\frac{1}{sc_1} - \frac{1}{sc_0}\right)}},$$
where
$$sh_1 = k\big(x_{jt(c)}^{(o)}\big)^2,\quad sc_1 = \frac{1}{k x_{jt(c)}^{(o)}},\quad sh_2 = k\big(x_{jt(c)}^{(n)}\big)^2,\quad sc_2 = \frac{1}{k x_{jt(c)}^{(n)}}.$$
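6. Update the hyperparameters $\tau_x^{-2}$, $\sigma_{d_i}^{-2}$, and $\tau_w^{-2}$:
$$P(\tau_x^{-2} \mid \sim) \sim G\left(\frac{JT - 2T}{2} + k_x,\ \left[\frac{1}{2}\sum_{x_{jt} \in \mathbf{x}_{nc}} x_{jt}^2 + u_x^{-1}\right]^{-1}\right),$$
$$P(\sigma_{d_i}^{-2} \mid \sim) \sim G\left(\frac{R}{2} + k_d,\ \left[\frac{1}{2}\sum_{r=1}^{R} (d_{ir} - d_i)^2 + \frac{1}{u_d}\right]^{-1}\right),$$
$$P(\tau_w^{-2} \mid \sim) \sim G\left(\frac{IT}{2} + k_w,\ \left[\frac{1}{2}\sum_{i=1}^{I}\sum_{t=1}^{T} w_{it}^2 + u_w^{-1}\right]^{-1}\right).$$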
7. Update the hyperparameter for dimension selection:
$$P(\phi_t \mid \sim) \sim \text{Beta}\left(a_t + \sum_{i=1}^{I} I(\gamma_{it} = 1),\ b_t + I - \sum_{i=1}^{I} I(\gamma_{it} = 1)\right).$$
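To make steps 1, 2, and 7 concrete, here is a minimal Python sketch of these discrete and conjugate updates for a single subject. It assumes the unit-variance normal likelihood written in step 1, with mean $\sum_t \gamma_{it}(x_{jt}-w_{it})^2 + d_{ir}$; the function names and the list-based data layout (`Y_i[r][j]` for subject i, `X[j][t]`) are hypothetical, not the authors' code. The Bernoulli probability for $\gamma_{it}$ compares the likelihood with $\gamma_{it} = 1$ against the likelihood with $\gamma_{it} = 0$, and it is computed from log-likelihoods because the products over all j and r would underflow if formed directly.

```python
import math
import random

def mean_pref(x_j, w_i, gamma_i, d_ir):
    """Model mean of Y_ijr: dimension-selected squared distance plus the constant d_ir."""
    return sum(g * (a - b) ** 2 for g, a, b in zip(gamma_i, x_j, w_i)) + d_ir

def log_lik_i(Y_i, X, w_i, gamma_i, d_i_r):
    """log L_i: sum over situations r and stimuli j of log N(Y_ijr | mean, 1)."""
    ll = 0.0
    for r, Y_ir in enumerate(Y_i):
        for j, x_j in enumerate(X):
            resid = Y_ir[j] - mean_pref(x_j, w_i, gamma_i, d_i_r[r])
            ll += -0.5 * math.log(2.0 * math.pi) - 0.5 * resid ** 2
    return ll

def prob_gamma_one(Y_i, X, w_i, gamma_i, d_i_r, t, phi_t):
    """Step 1: P(gamma_it = 1 | rest), with L at gamma_it = 1 and 0, in log space."""
    g1, g0 = list(gamma_i), list(gamma_i)
    g1[t], g0[t] = 1, 0
    a = log_lik_i(Y_i, X, w_i, g1, d_i_r) + math.log(phi_t)
    b = log_lik_i(Y_i, X, w_i, g0, d_i_r) + math.log(1.0 - phi_t)
    # Numerically stable logistic function of (a - b).
    if a >= b:
        return 1.0 / (1.0 + math.exp(b - a))
    e = math.exp(a - b)
    return e / (1.0 + e)

def draw_d_ir(Y_ir, X, w_i, gamma_i, d_i, sigma2_di, rng):
    """Step 2: d_ir ~ N(dbar_ir, tau2) with tau2 = (J + 1/sigma2_di)^-1."""
    J = len(X)
    tau2 = 1.0 / (J + 1.0 / sigma2_di)
    s = sum(Y_ir[j] - mean_pref(X[j], w_i, gamma_i, 0.0) for j in range(J))
    dbar = (s + d_i / sigma2_di) * tau2
    return rng.gauss(dbar, math.sqrt(tau2)), dbar, tau2

def draw_phi_t(gamma_column, a_t, b_t, rng):
    """Step 7: phi_t ~ Beta(a_t + #selected, b_t + I - #selected)."""
    n1 = sum(gamma_column)
    return rng.betavariate(a_t + n1, b_t + len(gamma_column) - n1)

# Hypothetical noise-free example: the subject uses both dimensions and d_ir = 0.
rng = random.Random(1)
X = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
w_i, gamma_true = (0.5, 0.5), [1, 1]
Y_i = [[mean_pref(x, w_i, gamma_true, 0.0) for x in X] for _ in range(3)]

p1 = prob_gamma_one(Y_i, X, w_i, [1, 1], [0.0, 0.0, 0.0], t=0, phi_t=0.5)
d_draw, dbar, tau2 = draw_d_ir(Y_i[0], X, w_i, gamma_true, 0.0, 1.0, rng)
phi = draw_phi_t([1, 0, 1, 1], 1.0, 1.0, rng)
print(round(p1, 4), dbar, tau2)
```

Because the toy data were generated with Dimension 1 switched on, dropping it leaves large residuals, so the probability that γ for that dimension equals 1 is close to one.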
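The Gamma random-walk proposal in step 5 is not symmetric, so the acceptance ratio must include the forward and reverse proposal densities. The following Python sketch assumes a caller-supplied function returning the log of the product of the $L_i$ as a function of the single constrained coordinate (the hypothetical `loglik` argument); it proposes from $G(k x^2, 1/(kx))$, which has mean $x$ and variance $1/k$, and accepts with the ratio given in step 5, evaluating the prior and both proposal densities in log space. Setting `loglik` to a constant makes the target equal to the $G(sh_0, sc_0)$ prior, which gives a simple check that the Hastings correction is right.

```python
import math
import random

def log_gamma_pdf(x, shape, scale):
    """Log density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def propose(x_old, k, rng):
    """Gamma proposal G(k*x_old^2, 1/(k*x_old)): mean x_old, variance 1/k."""
    return rng.gammavariate(k * x_old ** 2, 1.0 / (k * x_old))

def log_accept_ratio(x_new, x_old, loglik, k, sh0, sc0):
    """log of [L(new) pi(new) q(old | new)] / [L(old) pi(old) q(new | old)]."""
    log_lik = loglik(x_new) - loglik(x_old)
    log_prior = log_gamma_pdf(x_new, sh0, sc0) - log_gamma_pdf(x_old, sh0, sc0)
    # Reverse move uses sh2 = k*x_new^2, sc2 = 1/(k*x_new); forward uses sh1, sc1 from x_old.
    log_q_rev = log_gamma_pdf(x_old, k * x_new ** 2, 1.0 / (k * x_new))
    log_q_fwd = log_gamma_pdf(x_new, k * x_old ** 2, 1.0 / (k * x_old))
    return log_lik + log_prior + log_q_rev - log_q_fwd

def mh_step(x_old, loglik, k, sh0, sc0, rng):
    """One Metropolis-Hastings update of a positivity-constrained coordinate."""
    x_new = propose(x_old, k, rng)
    if x_new <= 0.0 or k * x_new ** 2 == 0.0:  # guard against floating-point underflow
        return x_old
    log_r = log_accept_ratio(x_new, x_old, loglik, k, sh0, sc0)
    return x_new if rng.random() < math.exp(min(0.0, log_r)) else x_old

# Check: with a flat likelihood the chain should sample the G(sh0, sc0) prior,
# whose mean is sh0 * sc0 = 1.5 for the hypothetical values below.
rng = random.Random(42)
x, draws = 1.0, []
for it in range(30000):
    x = mh_step(x, lambda v: 0.0, k=2.0, sh0=3.0, sc0=0.5, rng=rng)
    if it >= 5000:
        draws.append(x)
chain_mean = sum(draws) / len(draws)
print(round(chain_mean, 2))
```

Omitting the two proposal terms would make the chain settle on a different distribution, which is why the source's acceptance probability contains the $sh_1, sc_1, sh_2, sc_2$ factors.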
For model selection, we use Newton and Raftery's weighted estimate of the log marginal likelihood, $\hat p_4(D)$ (Newton & Raftery, 1994). Let $H_k$ index the model specifications ($k = 1, \ldots, K$). The marginal likelihood, $\Pr(Y \mid H_k)$, is obtained by integrating the likelihood times the prior over the parameter space:
$$\Pr(Y \mid H_k) = \int \Pr(Y \mid \Theta_k, H_k)\, \Pr(\Theta_k \mid H_k)\, d\Theta_k, \qquad (24)$$
where $\Pr(Y \mid \Theta_k, H_k)$ is the likelihood function given specification $H_k$, $Y$ denotes the data, $\Theta_k$ denotes the parameters, and $\Pr(\Theta_k \mid H_k)$ is the prior density. The quantity $\Pr(Y \mid H_k)$ is called the marginal likelihood of the data, and its logarithm can be used as input to the Bayes factor7 (Kass & Raftery, 1995; Newton & Raftery, 1994) for model selection. Because the Bayes factor requires pairwise comparisons among competing models, we report the log marginal likelihoods instead. In the next section, we discuss our results with the synthetic data described earlier to investigate the efficacy of the proposed MCMC algorithm for the proposed Bayesian MDU model. In addition, we devise two alternative Bayesian MDU models and present another Monte Carlo analysis with other three-way synthetic data (akin to the data in the application).

7 Also note that the BIC (Bayesian information criterion) can serve as a rough approximation to the logarithm of the Bayes factor (see Kass & Raftery, 1995, for a complete review of the Bayes factor).

3.3. Simulated Data Analysis

Earlier, we illustrated how distorted joint space structures can be when derived from traditional MDU procedures in the presence of dimension selection heterogeneity. In this section, we first compare the performance of the proposed HDMDU procedure with that of the traditional two-way MDU models presented earlier on the first simulated data set. Next,
TABLE 1. Goodness-of-fit results for the simulated data.
Overall HDMDU Weighted unfolding PREFSCAL ALSCAL
Configuration matching goodness-of-fit
RMSE Stimuli Ideal points Joint space
1.000 0.999 0.994 0.973 0.995 0.987 0.030 0.996 0.894 0.061 0.006 0.087 0.035 0.698
0.893 0.172 0.338 0.024 0.337 0.071 0.764 0.989 0.724 0.974 0.919 0.938 0.916 0.956
we discuss another simulated data set contrasting the HDMDU model with other Bayesian MDU models for two- and three-way analysis. Returning to the synthetic data illustration presented earlier, we now demonstrate how well the proposed HDMDU model recovers the true configuration shown in Figure 1. Figure 2(a) depicts the recovered joint space plot for the HDMDU model, which shows excellent congruence with the true configuration in Figure 1. Table 1 displays the corresponding goodness-of-fit statistics for the four models employed in the analysis of this synthetic data set. As seen, the HDMDU model clearly fits better than the other three traditional MDU methods. Table 1 also presents the recovery measures for the stimulus coordinates across all four competing MDU procedures (after appropriate transformation to optimal congruence with the true configuration for the three traditional methods).8 Here too, we see the superiority of the HDMDU method in recovering the true positioning of the stimuli and the respective joint space structure. Thus, there appears to be a cost to using traditional joint space MDU models in the presence of dimensional heterogeneity, in the form of potentially serious distortions in the resulting joint spaces. One might argue that the comparison between the proposed HDMDU model and traditional MDU models in this simulated data example is not fair, since the data were generated directly from the HDMDU model. In addition, traditional MDU models are mostly deterministic, while the proposed HDMDU model is a parametric stochastic model. As such, we devised two alternative Bayesian MDU models similar to the weighted unfolding and simple unfolding models expressed in Equations (1) and (2).
8 Note that the proposed HDMDU model is uniquely determined, as discussed in Section 3.1. As such, no transformation was applied to the results of the HDMDU model.

Note that the Bayesian weighted multidimensional unfolding model (BWMDU hereafter) and the Bayesian simple multidimensional unfolding model (BSMDU hereafter) also incorporate individual- and situation-specific additive constants $d_{ir}$, similar to the HDMDU model, and can be estimated by similar MCMC procedures. Here, the scale parameter of the BWMDU model, $c_{it}$, is constrained to be positive via a left-truncated normal distribution (i.e., $P(c_{it}) \sim N(0, \tau_c^2)\, I(c_{it} > 0)$) in order to prevent anti-ideal and saddle points. The purpose of this next simulation study is to examine the comparative performance of these three Bayesian MDU models on data generated from each of the three stochastic unfolding models. Three simulated data sets were generated, each consisting of I = 100 respondents evaluating J = 8 stimuli in R = 8 situations/replications; the data from the first seven situations are used for model calibration, and the data from the last situation are used for predictive validation (i.e., holdout data). The joint space is set to T = 2 dimensions. The first simulated data set was generated according to the HDMDU model, the second according to the BWMDU model, and the third according to the BSMDU model. The data generation procedures are similar to the one described earlier in the manuscript, differing
TABLE 2. Log marginal likelihood of simulated data.

(a) Data with Dimensional Selection Heterogeneity (HDMDU)
Sample        Dimension   HDMDU      BWMDU      BSMDU
Calibration   1           −8304.2    −7385.1    −8453.1
Calibration   2           −1113.9    −1220.5    −2523.6
Calibration   3           −1238.6    −1349.7    −1610.0
Holdout       1           −1176.6    −1056.7    −1208.9
Holdout       2           −151.9     −174.8     −339.0
Holdout       3           −171.7     −193.9     −223.3

(b) Data with Scale Heterogeneity (BWMDU)
Sample        Dimension   HDMDU      BWMDU      BSMDU
Calibration   1           −14010.0   −9296.4    −13929.0
Calibration   2           −3352.8    −3014.9    −4215.9
Calibration   3           −3425.5    −3054.9    −3438.9
Holdout       1           −2294.4    −1601.6    −2298.8
Holdout       2           −775.6     −745.8     −895.6
Holdout       3           −794.7     −753.5     −812.3

(c) Data with Simple Euclidean Distance (BSMDU)
Sample        Dimension   HDMDU      BWMDU      BSMDU
Calibration   1           −13500.0   −8599.0    −13462.0
Calibration   2           −1104.7    −1230.2    −1103.3
Calibration   3           −1166.5    −1295.9    −1140.5
Holdout       1           −1981.2    −1243.9    −1979.7
Holdout       2           −157.4     −179.8     −156.8
Holdout       3           −167.0     −193.2     −169.8
only with respect to the weight parameters (i.e., $c_{it}$ and $\gamma_{it}$). Here, 41 subjects use both dimensions and the remaining 59 subjects use either Dimension 1 or Dimension 2. Next, we estimated the three models (HDMDU, BWMDU, and BSMDU) for t = 1, 2, 3 dimensions. We executed the MCMC procedure for a total of 20,000 iterations, where the first 10,000 iterations were used for burn-in and the last 10,000 iterations were used for inference. Convergence was checked by starting the chain from multiple starting points and inspecting the iteration trace plots. For model and dimensionality selection, we select the best model by computing the log marginal likelihoods of the different model specifications for the observed data. That is, we increase t from 1 to T in steps of one and compare the log marginal likelihoods of these specifications. Note that T is the maximum number of dimensions a subject can utilize. For instance, if we fix T = 3, the resulting stimulus maps are three-dimensional, while each subject can utilize one, two, or three of these dimensions. The simulation results show that all three models are properly identified, the chains converge, and the true parameters can be recovered within sampling error. In addition, we calculate a "hit rate" for the dimensional heterogeneity parameter $\gamma_{it}$ of the HDMDU model, recording whether the sampled $\gamma_{it}$ equals the true value used in the simulation. The hit rate is 99.49% for the first dimension and 100% for the second dimension. Table 2 reports the log marginal likelihoods for the three data sets and three models for 1–3 dimensions. Table 2 shows some rather interesting results. First, as expected, all three models recover the true dimensionality (t = 2) for the data set generated from that model.
Second, the log marginal likelihoods suggest that even more general models cannot outperform a simpler model when the simulated data structure differs from their underlying utility function. For instance, Table 2(c) shows the log marginal likelihoods of the three models for the data set where the BSMDU model is the true structure. Even though both the BWMDU and HDMDU models can be regarded as more general MDU models, the log marginal likelihoods of the BWMDU model (−1230.2) and of the HDMDU
model (−1104.7) are lower than the log marginal likelihood of the BSMDU model (−1103.3).9 The same pattern holds across all three data sets for both calibration and prediction of the holdout sample. As shown in Table 2(a), when the data include dimension selection heterogeneity, the BWMDU model outperforms the HDMDU model at the lower dimensionality (i.e., t = 1), but its performance at the true dimensionality (i.e., t = 2) and at the higher dimensionality is inferior to that of the HDMDU model. This demonstrates that dimension selection heterogeneity is a distinct type of heterogeneity that cannot be well approximated by the weighted multidimensional unfolding model. Third, the BSMDU model generally requires more dimensions than the true number if the simulated data are generated from either the BWMDU or the HDMDU model. That is, Table 2(a) and (b) show that the BSMDU model's log marginal likelihood keeps increasing as the number of dimensions increases. This makes intuitive sense: the BSMDU model assumes the simplest form of all the models, and thus it requires additional dimensions to attempt to accommodate either scale or dimension selection heterogeneity. In sum, these modest simulated data analyses reveal that the generalized versions (i.e., the BWMDU or HDMDU model) do not necessarily outperform the simpler model if the true data structure differs from their underlying utility function, whereas the simpler model does seem to require additional dimensions to attempt to accommodate such heterogeneity. In addition, the weighted version of the Bayesian MDU model does not appear to perform well in recovering dimensional selection heterogeneity.
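The log marginal likelihoods compared above come from Newton and Raftery's weighted estimator $\hat p_4$. As a rough Python sketch (not the authors' code), it can be computed from the log-likelihood values $\ell_m = \log \Pr(Y \mid \Theta^{(m)}, H_k)$ of the $M$ retained posterior draws. In the form usually quoted, $\hat p_4$ solves
$$\hat p_4 = \frac{\frac{\delta M}{1-\delta} + \sum_m \frac{f_m}{\delta \hat p_4 + (1-\delta) f_m}}{\frac{\delta M}{(1-\delta)\hat p_4} + \sum_m \frac{1}{\delta \hat p_4 + (1-\delta) f_m}}, \qquad f_m = e^{\ell_m},$$
and multiplying out shows that its solution satisfies $\sum_m (p - f_m)/(\delta p + (1-\delta) f_m) = 0$. The left side increases in $p$, so this sketch solves that equation by bisection on $\log p$ instead of by fixed-point iteration, working entirely in log space because likelihoods of the size reported in Table 2 underflow double precision. The mixing weight `delta` is left to the caller (small values bring the estimate close to the harmonic mean estimator); the function names are hypothetical.

```python
import math

def _term(z, delta):
    """(e^z - 1) / (delta e^z + 1 - delta) without overflow; z = log p - l_m."""
    if z > 0.0:
        ez = math.exp(-z)
        return (1.0 - ez) / (delta + (1.0 - delta) * ez)
    ez = math.exp(z)
    return (ez - 1.0) / (delta * ez + 1.0 - delta)

def log_p4(loglik_draws, delta, tol=1e-10):
    """Log of Newton-Raftery's weighted marginal-likelihood estimate p4.

    Solves sum_m (p - f_m) / (delta p + (1 - delta) f_m) = 0 for log p by bisection,
    where f_m = exp(loglik_draws[m]); each term is rewritten in terms of
    log p - loglik_draws[m], so no likelihood is ever exponentiated directly.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    lo, hi = min(loglik_draws), max(loglik_draws)  # p is a weighted mean of the f_m
    for _ in range(200):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if sum(_term(mid - l, delta) for l in loglik_draws) > 0.0:
            hi = mid  # root lies below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def log_harmonic_mean(loglik_draws):
    """Harmonic-mean estimate of log p, the delta -> 0 limit of p4."""
    neg = [-l for l in loglik_draws]
    m = max(neg)
    return math.log(len(loglik_draws)) - (m + math.log(sum(math.exp(v - m) for v in neg)))

# Hypothetical log-likelihood values of a few posterior draws.
draws = [-5270.2, -5268.9, -5271.5, -5269.4, -5272.0]
print(log_p4(draws, delta=0.01), log_harmonic_mean(draws))
```

In practice `draws` would hold one log-likelihood value per retained MCMC iteration (10,000 in the analyses above).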
4. The Consumer Psychology Application

Here, we illustrate the three Bayesian MDU models with an empirical study of Over-the-Counter (denoted OTC hereafter) internal analgesics (for pain relief). The OTC analgesic market was projected at nearly $3 billion in 2007, with Tylenol as the market share leader (24.5%), followed by Advil (16.4%), Aleve (8.1%), Excedrin (7.1%), Bayer (6.5%), and Motrin (4.8%) (Mintel Report, 2007). We surveyed 140 respondents, including students (99 respondents) and non-student adults (41 respondents), in a large southern metropolitan city in the US. Respondents rated their preferences toward 11 brands (for sale in this area), including Tylenol, Advil, Aleve, Bayer, Motrin, Excedrin, Anacin, Bufferin, CVS Naproxen, Equate Acetaminophen, and Midol, for resolving various maladies (consumption situations): (a) Cold, Flu, and Fever reduction, (b) Arthritis or Joint pain, (c) Heart attack or Stroke prevention, (d) Muscle or Back ache, (e) Pre-menstrual or Period pain, (f) Regular or Migraine headache, and (g) Overall effectiveness, using 7-point Likert scales (scaled from −3 to +3). The list of maladies was finalized based on a pretest with 26 respondents, who reported all the different situations for which they used OTC analgesics, as well as on the different uses listed on the packaging of the different brands. Thus, the final data collected were three-way dominance data. We initially preprocessed these preference data by double mean centering to minimize the predominance of row (i.e., usage volume) and column (i.e., brand share) main effects in the resulting spatial solution (see Harshman & Lundy, 1984, 1985). The data for the first six maladies were used for model calibration, and the overall preference (i.e., the last replication/malady) was used for predictive model validation as a holdout sample.
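Double mean centering can be sketched as follows. This minimal Python illustration assumes that centering is applied separately to each situation's respondents × brands slice (subtracting the row and column means and adding back the grand mean), a detail the source does not spell out; the data layout `Y[r][i][j]`, the toy ratings, and the function names are hypothetical.

```python
def double_center(slice_):
    """Remove row (respondent) and column (brand) main effects from one I x J slice."""
    I, J = len(slice_), len(slice_[0])
    row_means = [sum(row) / J for row in slice_]
    col_means = [sum(slice_[i][j] for i in range(I)) / I for j in range(J)]
    grand = sum(row_means) / I
    return [[slice_[i][j] - row_means[i] - col_means[j] + grand for j in range(J)]
            for i in range(I)]

def double_center_3way(Y):
    """Apply double mean centering to every situation r of Y[r][i][j]."""
    return [double_center(s) for s in Y]

# Hypothetical ratings on the -3..+3 scale: 2 situations, 3 respondents, 4 brands.
Y = [
    [[3, 1, -2, 0], [2, 2, -1, -3], [1, 0, 0, -1]],
    [[2, -1, -3, 1], [0, 1, 1, -2], [3, 3, -1, 0]],
]
Yc = double_center_3way(Y)
for s in Yc:
    rows_ok = all(abs(sum(r)) < 1e-12 for r in s)
    cols_ok = all(abs(sum(s[i][j] for i in range(len(s)))) < 1e-12
                  for j in range(len(s[0])))
    print(rows_ok, cols_ok)  # True True
```

After centering, every respondent's ratings and every brand's ratings within a situation average to zero, so overall usage level and overall brand share no longer dominate the solution.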
Similar to the previous simulated data analysis, we ran 20,000 MCMC iterations, where the first 10,000 iterations were used for burn-in and the last 10,000 iterations were used for inference. We find that the chains converge relatively quickly for all three newly presented Bayesian MDU models,10 and we assess convergence by

9 The Bayes factor between the HDMDU and BSMDU models would not be significant (see Kass & Raftery, 1995, for a review of the Bayes factor).
10 The computing time for these data is approximately 3 hours on a 3.3 GHz computer running a Windows operating system.
TABLE 3. Log marginal likelihoods for the OTC analgesics data.

(a) Log marginal likelihood
Sample        Dimension   HDMDU     BWMDU     BSMDU
Calibration   1           −5942.7   −5879.0   −6012.7
Calibration   2           −5604.6   −5651.8   −5717.6
Calibration   3           −5268.6   −5369.9   −5444.6
Calibration   4           −5273.4   −5536.6   −5314.4
Holdout       1           −620.0    −590.7    −586.8
Holdout       2           −560.8    −583.1    −559.9
Holdout       3           −539.5    −545.0    −559.7
Holdout       4           −525.5    −563.2    −543.9

(b) RMSE
Sample        Dimension   HDMDU     BWMDU     BSMDU
Calibration   1           1.095     1.091     1.102
Calibration   2           1.064     1.072     1.078
Calibration   3           1.035     1.045     1.050
Calibration   4           1.035     1.057     1.038
Holdout       1           0.813     0.793     0.794
Holdout       2           0.774     0.782     0.770
Holdout       3           0.761     0.766     0.768
Holdout       4           0.752     0.773     0.759
starting the chains from multiple starting points and inspecting time series plots of the model parameters. Table 3 presents the log marginal likelihood (Newton & Raftery, 1994) and the root mean squared error (RMSE) for the three models by dimensionality. As shown, the three-dimensional solution appears most parsimonious for the HDMDU and BWMDU models, whereas the BSMDU model requires more dimensions, since its log marginal likelihood keeps increasing with the number of dimensions. This finding mirrors the result of our second simulated data analysis presented in the last section, and it indicates that the OTC analgesic preference data contain a structure that the HDMDU or BWMDU models explain more parsimoniously than the BSMDU model. Furthermore, we find evidence that the market structure for the OTC analgesic data is better represented by the HDMDU model, as it shows the best calibration data fit (−5268.6 for the HDMDU model vs. −5369.9 for the BWMDU model). Note that the four-dimensional solution of the HDMDU model, rather than the three-dimensional solution, shows the best holdout sample fit (−525.5). Nonetheless, in the three-dimensional solution the HDMDU model outperforms both the BWMDU and BSMDU models for both the calibration sample and the holdout sample. As such, our results suggest that respondents use different numbers of dimensions in forming their preferences, and that the model which accommodates this phenomenon better depicts the underlying market structure. We devote the remainder of this section to the results of the three-dimensional HDMDU solution. As is common in most MDU methods, the parameters of focal interest (e.g., brand and ideal point locations) can be better understood visually. Figure 3 shows the derived joint space for the three-dimensional HDMDU model as three two-dimensional joint spaces, one for each pair of the three extracted dimensions.
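The dimensionality comparison in Table 3 can be reproduced from the reported calibration log marginal likelihoods. The Python sketch below picks, for each model, the dimensionality with the largest log marginal likelihood, and converts the three-dimensional HDMDU vs. BWMDU difference into $2\log B$, the scale on which Kass and Raftery (1995) grade evidence (0 to 2 not worth more than a bare mention, 2 to 6 positive, 6 to 10 strong, above 10 very strong). Natural logarithms are assumed, and the helper names are hypothetical.

```python
# Calibration log marginal likelihoods from Table 3(a), dimensions 1-4.
log_ml = {
    "HDMDU": [-5942.7, -5604.6, -5268.6, -5273.4],
    "BWMDU": [-5879.0, -5651.8, -5369.9, -5536.6],
    "BSMDU": [-6012.7, -5717.6, -5444.6, -5314.4],
}

def best_dimension(values):
    """1-based dimensionality with the largest log marginal likelihood."""
    return values.index(max(values)) + 1

def two_log_bayes_factor(log_ml_a, log_ml_b):
    """2 log B_ab for model a against model b, from natural-log marginal likelihoods."""
    return 2.0 * (log_ml_a - log_ml_b)

def kass_raftery_label(two_log_b):
    """Evidence category for 2 log B on the Kass and Raftery (1995) scale."""
    if two_log_b < 0:
        return "favors the other model"
    if two_log_b <= 2:
        return "not worth more than a bare mention"
    if two_log_b <= 6:
        return "positive"
    if two_log_b <= 10:
        return "strong"
    return "very strong"

best = {model: best_dimension(v) for model, v in log_ml.items()}
print(best)  # {'HDMDU': 3, 'BWMDU': 3, 'BSMDU': 4}

b = two_log_bayes_factor(log_ml["HDMDU"][2], log_ml["BWMDU"][2])
print(round(b, 1), kass_raftery_label(b))  # 202.6 very strong
```

The BSMDU maximum at four dimensions is simply the largest dimensionality tried, consistent with the observation that its log marginal likelihood keeps rising.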
The derived joint space maps show that the 11 brands are well distributed throughout the four quadrants of each plot. To interpret each of the three extracted dimensions, we related them to attribute and market share information available on the brand packaging and in the Mintel report. On this basis, Dimension 1 appears to represent Market Share/Popularity, as the three most popular brands (Advil, Tylenol, and Motrin) are located on the right side of the joint space. Dimension 2 seems to separate the Acetaminophen brands (Tylenol, Excedrin, and Equate) from the rest and relates to Active Ingredients. Finally, Dimension 3 separates the Generic Brands from the National Brands. Note that the ideal points displayed as dots in Figure 3 are subjects who use two of the three estimated dimensions, and that the ideal points displayed as plus signs along each dimension are those who use only one dimension out
FIGURE 3. Derived joint space for the three-dimensional HDMDU model.
of the three estimated dimensions. We now focus on the dimensional selection variable $\gamma_{it}$, as it provides information about how subjects utilize the various dimensions in evaluating brands. Figure 4 presents histograms of the dimensional selection variable $\gamma_{it}$ by dimension. As one can see, Dimension 2 is the most utilized/selected dimension, followed by Dimension 1 and Dimension 3. Specifically, using the mode of the dimension selection indicator $I(\gamma_{it} = 1)$ over the last 10,000 iterations, we find that Dimension 2 is utilized/selected by 87% of the respondents, followed by Dimension 1 (40%) and Dimension 3 (34%). As such, Dimension 2 (Active Ingredients) seems to be the primary attribute driver of the OTC analgesic market structure for these respondents across these various maladies. This makes eminent sense given that the effectiveness and side effects of these active ingredients (aspirin, acetaminophen, ibuprofen, and sodium naproxen) have been well communicated to the public. In addition, individuals vary in their tolerance of, and the effectiveness they derive from, each of these active ingredients (e.g., some individuals are allergic to aspirin; others experience stomach discomfort or bleeding with ibuprofen or sodium naproxen analgesics). As mentioned earlier, the dimensional selection variable $\gamma_{it}$ is assumed to be independent a priori, and, unlike in a deterministic joint space model such as MDPREF, the first dimension need not explain the most variance in the data. Figure 5 shows the ideal points of respondents who utilize only one dimension (Dimension 2), where the one-dimensional ideal points are displayed as dots and brand positions are shown as squares. Figure 5 also exhibits the distribution of ideal points for Dimension 2.
These respondents comprise approximately 40% of the 140 respondents, and their ideal points are somewhat correlated with their involvement (r = 0.2498, p = 1.0564), which is not the case for subjects who utilize more than one dimension. Past research has shown that greater involvement is correlated with greater knowledge (Brucks, 1985; Sujan, 1985), and that greater knowledge results in the utilization of important and relevant attributes (Hutchinson & Alba, 1991). Consistent with these findings, respondents who utilize only Dimension 2 are likely to be highly involved respondents who rely on information about the active ingredients of the OTC formulation, since this is arguably the most important and relevant attribute for brand choice, as compared to Market Share and National Brand Status. Figure 5 also shows that brands with negative loadings are the leading brands (e.g., Tylenol), while brands on the right-hand side are less popular brands (e.g., Anacin), implying that subjects who utilize only Dimension 2 show greater preference for lower share brands. Since involved subjects possess greater product knowledge and utilize relevant attributes, they are more likely to use their knowledge about active ingredients when forming brand preferences, instead of relying on more superficial heuristics such as Popularity/Market Share. Therefore, they exhibit greater preference for the cheaper generic, low share brands whose ingredients are often identical to those of their national brand counterparts. In contrast to Dimension 2, the $\gamma_{it}$ for both Dimensions 1 and 3 show U-shaped distributions (Figure 4), which suggests that some subjects who utilize either Dimension 1 or Dimension 3 rely on peripheral cues such as National Brands vs. Generic Brands and/or High Market Share Brands vs. Low Market Share Brands.
This pattern of results is consistent with the notion that respondents who utilize Dimension 1 or 3 are likely to be low involvement subjects who have limited knowledge about the OTC category, and who therefore rely on peripheral and salient cues like market share and brand names while constructing their preferences (e.g., Hutchinson & Alba, 1991; Petty, Cacioppo, & Schumann, 1983). Next, we examine the subject characteristics associated with selective dimension usage. Figure 6 shows the individual-level scatter plot of the overall variance of brand evaluations across situations against the sum of the estimated dimensional selection variables $\gamma_{it}$ for each subject. As one can see, there is a significant positive correlation between the two variables (r = 0.3638, p < 0.001). This suggests that subjects who show a relatively extreme pattern of preference tend to utilize more dimensions.
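The summaries of $\gamma_{it}$ used in this section (the posterior mode of $I(\gamma_{it}=1)$, the share of respondents selecting each dimension, and the correlation between a subject's preference variance and the number of dimensions selected) can be sketched as below, together with the hit rate used in the simulation study. This is a minimal Python illustration with hypothetical toy draws and variances; it assumes that ties in the mode (exactly half of the draws equal to 1) are resolved toward 0, a detail the source does not specify, and the function names are hypothetical.

```python
import math

def mode_indicator(draws):
    """Posterior mode of a 0/1 indicator from its retained MCMC draws (ties -> 0)."""
    return 1 if sum(draws) / len(draws) > 0.5 else 0

def selection_modes(gamma_draws):
    """gamma_draws[i][t] is the list of draws of gamma_it; returns modes[i][t]."""
    return [[mode_indicator(d) for d in subject] for subject in gamma_draws]

def share_selecting(modes, t):
    """Fraction of subjects whose posterior mode selects dimension t."""
    return sum(row[t] for row in modes) / len(modes)

def hit_rate(modes, truth, t):
    """Fraction of subjects whose estimated indicator for dimension t matches the truth."""
    return sum(m[t] == s[t] for m, s in zip(modes, truth)) / len(modes)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical draws for 3 subjects and T = 2 dimensions (4 retained draws each).
gamma_draws = [
    [[1, 1, 1, 0], [0, 0, 1, 0]],
    [[1, 1, 1, 1], [1, 1, 0, 1]],
    [[0, 0, 0, 1], [1, 1, 1, 1]],
]
modes = selection_modes(gamma_draws)             # [[1, 0], [1, 1], [0, 1]]
shares = [share_selecting(modes, t) for t in range(2)]
n_dims = [sum(row) for row in modes]             # dimensions used per subject
pref_variance = [0.5, 2.0, 0.4]                  # hypothetical per-subject variances
r = pearson_r(pref_variance, n_dims)
print(shares, n_dims, round(r, 3))
```

With real output, `gamma_draws` would hold the last 10,000 draws for each of the 140 respondents and three dimensions, and `pref_variance` the variance of each respondent's brand evaluations across situations.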
FIGURE 4. Distribution of the dimensional selection variable γit.
FIGURE 5. One-dimensional plot for Dimension 2 and ideal points.
FIGURE 6. Overall variance vs. Dimensional selection variable.
Finally, we discuss the relative contribution of the individual- and situation-specific additive constants $d_{ir}$. As shown in Figure 7, the additive constant $d_{ir}$ has a unimodal distribution with a positive skew. We expected that the situation-specific parameter $d_{ir}$ would capture additional variation across situations not explained by the squared Euclidean distance term. As seen in Figure 8, respondents with high situation-specific constants $d_{ir}$ exhibit relatively extreme preference structures, as the variance across brands is high among these respondents. Thus, subjects who attempt to match brands to specific consumption situations are likely to show greater variance in their preferences and brand choices than subjects who do not. This finding is consistent with past research on the effects of consumption situations on brand preference and choice (e.g., Miller & Ginter, 1979; Yang, Allenby, & Fennell, 2002).

FIGURE 7. Distribution of the situation specific additive constant dir.
FIGURE 8. Variance across brands vs. dir.
5. Discussion

Over the past five decades, various types of multidimensional unfolding methods have been developed for the analysis of dominance data in the social sciences. Little attention has been given to the possibility that subjects may show heterogeneity in information usage (e.g., Bettman, Luce, & Payne, 1998), and to the possible impact this may have on the corresponding estimated joint space maps. We address this important issue and develop a new Bayesian MDU model for the analysis of two- or three-way preference data. This new Bayesian model explicitly accommodates dimensional selection and preference heterogeneity in a unified framework. Specifically, we treat dimensional selection at the individual level and show (through synthetic data analyses and comparisons with three traditional MDU models) that one may encounter biased parameter estimates and distorted joint space structures if such dimensional selection heterogeneity is ignored. We have presented the details of the priors, likelihood, and joint posterior density, as well as the full conditional distributions for the new Bayesian MDU model, and have described a new MCMC estimation procedure. A small Monte Carlo analysis was presented in which the three newly presented Bayesian MDU models were fitted to data sets generated from each of the models and their recovery was compared. This simulation study shows that dimensional selection heterogeneity cannot be well approximated by a weighted multidimensional unfolding model. We then applied the proposed Bayesian multidimensional unfolding models to a consumer psychology application concerning subjects' preferences toward Over-the-Counter analgesics across a number of typical maladies. The results demonstrate that a model that incorporates dimensional selection heterogeneity outperforms models without it. The proposed HDMDU model provides information regarding how individual subjects utilize each dimension, and thus helps to identify the major dimension(s) along which a change of stimulus/brand position would have the greatest effect in subjects' minds. Our HDMDU model also shows that subjects with extreme preference structures tend to utilize more dimensions. Opportunities for future research merit final discussion.
The focus of the proposed model has been to represent subjects' dominance judgments in a derived joint space. Applying the proposed model to binary choice data would be an interesting extension. The explicit incorporation of stimulus attributes and individual subject background characteristics into the model, as in GENFOLD2 (DeSarbo & Rao, 1986), would also merit further consideration. Further, more extensive Monte Carlo simulations are needed to examine more thoroughly the performance of the proposed Bayesian MDU model in comparison with traditional MDU models and the other two proposed Bayesian MDU models. Finally, further empirical applications across a wider variety of domains are needed.

References

Alba, J.W., & Hutchinson, J.W. (1987). Dimensions of consumer expertise. Journal of Consumer Research, 13, 411–454.
Baumgartner, H., & Steenkamp, J.E.M. (2001). Response styles in marketing research: A cross-national investigation. Journal of Marketing Research, 38, 143–156.
Benzécri, J.P. (1973). L'analyse des données: Tome II. Analyse de correspondances. Paris: Dunod.
Benzécri, J.P. (1992). Correspondence analysis handbook. New York: Dekker.
Bettman, J.R., Luce, M.F., & Payne, J.W. (1998). Constructive consumer choice processes. Journal of Consumer Research, 25, 187–217.
Borg, I., & Groenen, P.J.F. (2005). Modern multidimensional scaling: Theory & application (2nd ed.). New York: Springer.
Bradlow, E.T., & Schmittlein, D.C. (2000). The little engines that could: Modeling the performance of world wide web search engines. Marketing Science, 19(1), 43–62.
Brucks, M. (1985). The effects of product class knowledge on information search behavior. Journal of Consumer Research, 12, 1–16.
Busing, F.M.T.A., Groenen, P.J.F., & Heiser, W.J. (2005). Avoiding degeneracy in multidimensional unfolding by penalizing on the coefficient of variation. Psychometrika, 70, 71–98.
Carlin, B.P., & Chib, S. (1995). Bayesian model choice via Markov chain Monte Carlo. Journal of the Royal Statistical Society, Series B, 57(3), 473–484.
Carroll, J.D. (1972). Individual differences and multidimensional scaling. In R.N. Shepard, A.K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory & applications in the behavioral sciences: Theory (Vol. I, pp. 105–155). New York: Seminar Press.
Carroll, J.D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In E.D. Lantermann & H. Feger (Eds.), Similarity and choice (pp. 234–289). Vienna: Hans Huber Publishers.
JOONWOOK PARK, PRIYALI RAJAGOPAL, AND WAYNE S. DESARBO
Carroll, J.D., & Arabie, P. (1980). Multidimensional scaling. Annual Review of Psychology, 31, 607–649.
Chib, S. (2002). Markov chain Monte Carlo methods. In S.J. Press (Ed.), Subjective and objective Bayesian statistics (2nd ed., pp. 119–171). New York: Wiley.
Chintagunta, P.K. (1994). Heterogeneous logit model implications for brand positioning. Journal of Marketing Research, 31(2), 304–311.
Churchill, G.A., & Peter, J.P. (1984). Research design effects on the reliability of rating scales: A meta analysis. Journal of Marketing Research, 21(4), 360–375.
Coombs, C.H. (1960). A theory of data. Psychological Review, 67(3), 143–159.
Cox, T.F., & Cox, M.A. (2001). Multidimensional scaling (2nd ed.). London: Chapman & Hall.
DeSarbo, W.S., & Carroll, J.D. (1985). Three-way metric unfolding via weighted least-squares. Psychometrika, 50, 275–300.
DeSarbo, W.S., & Hoffman, D. (1986). Simple and weighted unfolding MDS threshold models for the spatial analysis of binary data. Applied Psychological Measurement, 10, 247–264.
DeSarbo, W.S., Kim, Y., & Fong, D.K.H. (1999). A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data. Journal of Econometrics, 89(1–2), 79–108.
DeSarbo, W.S., Kim, Y., Wedel, M., & Fong, D.K.H. (1998). A Bayesian approach to the spatial representation of market structure from consumer choice data. European Journal of Operational Research, 111(2), 285–305.
DeSarbo, W.S., & Rao, V.R. (1984). GENFOLD2: A set of models and algorithms for the GENeral UnFOLDing analysis of preference/dominance data. Journal of Classification, 1, 147–168.
DeSarbo, W.S., & Rao, V.R. (1986). A constrained unfolding methodology for product positioning. Marketing Science, 5(1), 1–19.
DeSarbo, W.S., Young, M.R., & Rangaswamy, A. (1997). A parametric multidimensional unfolding procedure for incomplete nonmetric preference/choice set data in marketing research. Journal of Marketing Research, 34, 499–516.
Diebolt, J., & Robert, C.P. (1994). Estimation of finite mixture distributions through Bayesian sampling. Journal of the Royal Statistical Society, Series B, 56(2), 363–375.
Elrod, T. (1988). Choice map: Inferring a product-market map from panel data. Marketing Science, 7(1), 21–40.
Elrod, T., & Keane, M.P. (1995). A factor-analytic model for representing the market structure in panel data. Journal of Marketing Research, 32, 1–16.
Erdem, T. (1996). A dynamic analysis of market structure based on panel data. Marketing Science, 15(4), 359–378.
Feldman, J.M., & Lynch, J.G. (1988). Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. Journal of Applied Psychology, 73(3), 421–435.
Fong, D.K.H., DeSarbo, W.S., Park, J., & Scott, C.J. (2010). A Bayesian vector multidimensional scaling procedure for the analysis of ordered preference data. Journal of the American Statistical Association, 105(490), 482–492.
Friedman, H.H., Friedman, L.W., & Gluck, B. (1988). The effects of scale-checking styles on responses to a semantic differential scale. Journal of the Market Research Society, 30(4), 477–481.
Gelfand, A.E., & Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
Gelman, A., Gilks, W.R., & Roberts, G.O. (1996). Efficient Metropolis jumping rules. In Bayesian statistics (Vol. 5). Oxford: Oxford University Press.
George, E.I., & McCulloch, R.E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881–889.
George, E.I., & McCulloch, R.E. (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7(2), 339–373.
George, E.I., McCulloch, R.E., & Tsay, R.S. (1995). Two approaches to Bayesian model selection with applications. In D. Berry, K. Chaloner, & J. Geweke (Eds.), Bayesian statistics and econometrics: Essays in honor of Arnold Zellner (pp. 339–348). New York: Wiley.
Geweke, J. (1996). Monte Carlo simulation and numerical integration. In H.M. Amman, D.A. Kendrick, & J. Rust (Eds.), Handbook of computational economics (Vol. 1, pp. 731–800). New York: Elsevier.
Gifi, A. (1990). Nonlinear multivariate analysis. New York: Wiley.
Gilbride, T.J., Allenby, G.M., & Brazell, J.D. (2006). Models for heterogeneous variable selection. Journal of Marketing Research, 43(3), 420–430.
Greenacre, M.J. (1984). Theory and applications of correspondence analysis. New York: Academic Press.
Greenacre, M.J., & Browne, M.W. (1986). An efficient alternating least-squares algorithm to perform multidimensional unfolding. Psychometrika, 51, 241–250.
Harshman, R.A., & Lundy, M.E. (1984). Data preprocessing and the extended PARAFAC model. In H.G. Law, C.W. Snyder, J.A. Hattie, & R.P. McDonald (Eds.), Research methods for multimode data analysis (pp. 216–284). New York: Praeger.
Harshman, R.A., & Lundy, M.E. (1985). The preprocessing controversy: An exchange of papers between Kroonenberg, Harshman and Lundy (Technical Report). London, Ontario: University of Western Ontario, Department of Psychology.
Heiser, W.J. (1981). Unfolding analysis of proximity data (Unpublished doctoral dissertation). University of Leiden.
Heiser, W.J. (1989). The city-block model for three-way multidimensional scaling. In R. Coppi & S. Bolasco (Eds.), Multiway data analysis (pp. 395–404). Amsterdam: North-Holland.
Houston, D.A., & Sherman, S.J. (1995). Cancellation and focus: The role of shared and unique features in the choice process. Journal of Experimental Social Psychology, 31(4), 357–378.
Hutchinson, J.W., & Alba, J.W. (1991). Ignoring irrelevant information: Situational determinants of consumer learning. Journal of Consumer Research, 18, 325–345.
Isen, A.M. (1993). Positive affect and decision making. In M. Lewis & J. Haviland (Eds.), Handbook of emotions (pp. 261–273). New York: Guilford.
Isen, A.M., Daubman, K.A., & Nowicki, G.P. (1987). Positive affect facilitates creative problem solving. Journal of Personality and Social Psychology, 52(6), 1122–1131.
Johnson, K.E., & Mervis, C.B. (1997). Effects of varying levels of expertise on the basic level of categorization. Journal of Experimental Psychology: General, 126, 248–277.
Kass, R.E., & Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.
Kruskal, J.B. (1964a). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29, 1–27.
Kruskal, J.B. (1964b). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 115–129.
Kruskal, J.B., & Carroll, J.D. (1969). Geometric models and badness of fit functions. In P.R. Krishnaiah (Ed.), Multivariate analysis (Vol. II). New York: Academic Press.
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108, 480–498.
Kuo, L., & Mallick, B. (1998). Variable selection for regression models. Sankhya: The Indian Journal of Statistics, Series B, 60, 65–81.
Lingoes, J.C. (1972). A general survey of the Guttman–Lingoes nonmetric program series. In R.N. Shepard, A.K. Romney, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the behavioral sciences (Vol. I: Theory, pp. 49–68). New York: Seminar Press.
Lingoes, J.C. (1973). The Guttman–Lingoes nonmetric program series. Ann Arbor: Mathesis Press.
Miller, K.E., & Ginter, J.L. (1979). An investigation of situational variation in brand choice behavior and attitude. Journal of Marketing Research, 16, 111–123.
Mitchell, T.J., & Beauchamp, J.J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83, 1023–1032.
Newton, M.A., & Raftery, A.E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B, 56, 3–48.
Nishisato, S. (1980). Analysis of categorical data: Dual scaling and its applications. Toronto: University of Toronto Press.
Oh, M., & Raftery, A.E. (2001). Bayesian multidimensional scaling and choice of dimension. Journal of the American Statistical Association, 96(455), 1031–1044.
Okada, K., & Shigemasu, K. (2009). BMDS: A collection of R functions for Bayesian multidimensional scaling. Applied Psychological Measurement, 33(7), 570–571.
Park, J., DeSarbo, W.S., & Liechty, J. (2008). A hierarchical Bayesian multidimensional scaling methodology for accommodating both structural and preference heterogeneity. Psychometrika, 73, 451–472.
Payne, J.W., Bettman, J.R., & Johnson, E. (1993). The adaptive decision maker. Cambridge: Cambridge University Press.
Payne, J.W., Bettman, J.R., & Luce, M.F. (1996). When time is money: Decision behavior under opportunity-cost time pressure. Organizational Behavior and Human Decision Processes, 66, 131–152.
Petty, R.E., & Cacioppo, J.T. (1986). Communication and persuasion: Central and peripheral routes to attitude change. New York: Springer.
Petty, R.E., Cacioppo, J.T., & Goldman, R. (1981). Personal involvement as a determinant of argument based persuasion. Journal of Personality and Social Psychology, 41, 847–855.
Petty, R.E., Cacioppo, J.T., & Schumann, D. (1983). Central and peripheral routes for advertising effectiveness: The moderating role of involvement. Journal of Consumer Research, 10, 135–144.
Roskam, E.E. (1973). Fitting ordinal relational data to a hypothesized structure (Technical Report No. 73MA06). Nijmegen: Catholic University.
Schönemann, P.H. (1970). On metric multidimensional unfolding. Psychometrika, 35, 349–366.
Schwarz, N., Strack, F., Muller, G., & Chassein, B. (1988). The range of response alternatives may determine the meaning of the question: Further evidence on informative functions of response alternatives. Social Cognition, 6, 107–117.
Slater, P. (1960). Inconsistencies in a schedule of paired comparisons. Biometrika, 48, 303–312.
Sterngold, A., Warland, R.H., & Herrmann, R. (1994). Do surveys overstate public concerns? Public Opinion Quarterly, 58, 255–263.
Sujan, M. (1985). Consumer knowledge: Effects on evaluation strategies mediating consumer judgments. Journal of Consumer Research, 12, 31–46.
Takane, Y., Young, F.W., & de Leeuw, J. (1977). Nonmetric individual differences multidimensional scaling: An alternating least-squares method with optimal scaling features. Psychometrika, 42(1), 7–67.
Tanner, M.A., & Wong, W.H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–540.
Tucker, L.R. (1960). Intra-individual and inter-individual multidimensionality. In H. Gulliksen & S. Messick (Eds.), Psychological scaling: Theory and applications (pp. 110–123). New York: Wiley.
Vicente, K.J., & Wang, J.H. (1998). An ecological theory of expertise effects in memory recall. Psychological Review, 105, 35–57.
Wedel, M., & DeSarbo, W.S. (1996). An exponential-family multidimensional scaling mixture methodology. Journal of Business and Economic Statistics, 14(4), 447–459.
Wedel, M., & Kamakura, W. (2000). Market segmentation: Conceptual and methodological foundations. Boston: Kluwer Academic.
Wright, P., & Weitz, B. (1977). Time horizon effects on product evaluation strategies. Journal of Marketing Research, 14, 429–443.
Yang, S., Allenby, G., & Fennell, G. (2002). Modeling variation in brand preference: The roles of objective environment and motivating conditions. Marketing Science, 21, 14–31.
Young, F.W., & Hamer, R.M. (1987). Multidimensional scaling: History, theory and applications. Hillsdale: Lawrence Erlbaum Associates.
Young, F.W., & Torgerson, W.S. (1967). TORSCA, a FORTRAN IV program for Shepard-Kruskal multidimensional scaling analysis. Behavioral Science, 12(6), 498.

Manuscript Received: 31 AUG 2010
Final Version Received: 18 JUN 2011
Published Online Date: 24 FEB 2012