Data Preparation

Data Preparation Mrunal Joshi

B.R.C.M. College of Business Administration Mrunal Joshi

Data Preparation
• Before the raw data contained in the questionnaires can be subjected to statistical analysis, they must be converted into a form suitable for analysis.
• The quality of statistical results depends on the care exercised in the data-preparation phase.
• Paying inadequate attention to data preparation can seriously compromise statistical results, leading to biased findings and incorrect interpretation.


Data-Preparation Process
1. Preparing Preliminary Plan of Data Analysis
2. Questionnaire Checking
3. Editing
4. Coding
5. Transcribing
6. Data Cleaning
7. Statistically Adjusting the Data
8. Selecting the Data Analysis Strategy

1. Preparing Preliminary Plan of Data Analysis
• The entire process is guided by the preliminary plan of data analysis that was formulated in the research design phase.


2. Questionnaire Checking
• Check all questionnaires for completeness and interviewing quality. Problems to look for:
1. Parts of the questionnaire may be incomplete.
2. The pattern of responses may indicate that the respondent did not understand or follow the instructions. For example, skip patterns may not have been followed.
3. The responses show little variance. For example, a respondent has checked only 4s on a series of 7-point rating scales.
4. The returned questionnaire is physically incomplete: one or more pages are missing.
5. The questionnaire is received after the pre-established cutoff date.
6. The questionnaire is answered by someone who does not qualify for participation.
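Once responses are available electronically, several of the checks above can be partly automated. The following is a minimal sketch, assuming each questionnaire is a dictionary with `ratings` (answers on 7-point scales, `None` for an unanswered item), a `received` date, and a `qualified` flag. The field names, the cutoff date, and the sample record are hypothetical and do not come from the source.

```python
from datetime import date

CUTOFF = date(2024, 3, 31)  # assumed cutoff date, for illustration only


def check_questionnaire(q):
    """Return a list of problems found in one questionnaire record."""
    problems = []
    ratings = q["ratings"]
    answered = [r for r in ratings if r is not None]
    if len(answered) < len(ratings):
        problems.append("incomplete")
    # Little variance: every answered item carries the same rating.
    if len(answered) > 1 and len(set(answered)) == 1:
        problems.append("little variance")
    if q["received"] > CUTOFF:
        problems.append("received after cutoff")
    if not q["qualified"]:
        problems.append("respondent does not qualify")
    return problems


# Hypothetical record: all 4s on five scales, returned two days late.
sample = {
    "ratings": [4, 4, 4, 4, 4],
    "received": date(2024, 4, 2),
    "qualified": True,
}
print(check_questionnaire(sample))
```

Checks such as missing pages or broken skip patterns depend on the layout of the specific questionnaire and are not covered by this sketch.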

3. Editing
• Editing is the review of the questionnaires with the objective of increasing accuracy and precision.
• Identify illegible (highly varied answers), incomplete, inconsistent (e.g. mismatch between income and expenditure), or ambiguous responses (multiple answers in structured questions and problems with unstructured questions).
• Treatment of Unsatisfactory Responses
– Returning to the field
– Assigning missing values (when few responses are unsatisfactory or the variable is not a key variable)
– Discarding unsatisfactory respondents (when the required sample size is not affected or information on a key variable is missing; follow a set procedure and report the number of respondents discarded)

4. Coding
• Coding means assigning a code to each possible response to each question.
• Structured, close-ended questions are easy to code, and the codes are assigned before fieldwork is conducted.
• Use a standard code for missing values, e.g. 9 or 99.
• Single response: one column per question, e.g. gender, occupation.
• Multiple response: one column for each possible response, e.g. sources of information, accounts with banks.
• For unstructured, open-ended questions, codes are assigned after fieldwork:
– more complex
– consider previous work or a theoretical framework
– identify categories based on the initial 50-100 responses
– categories should be mutually exclusive and collectively exhaustive
– add a code for “others” or “none of the above”, which should cover less than 10% of responses
– if categories are predefined, the analysis will be limited by those categories.
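The coding rules for structured questions can be illustrated with a short sketch. The code scheme, the list of information sources, and the decision to map unrecognised answers to the missing-value code are all assumptions made for this example; only the missing-value code 9 and the single-column versus column-per-option layout come from the slide.

```python
MISSING = 9  # standard missing-value code, as on the slide

# Hypothetical code scheme for a single-response question.
GENDER_CODES = {"male": 1, "female": 2}

# Hypothetical options for a multiple-response question;
# each option gets its own 0/1 column.
INFO_SOURCES = ["newspaper", "television", "internet", "friends"]


def code_single(answer, codes):
    """Return the numeric code for one answer, or MISSING if unanswered or unknown."""
    if answer is None:
        return MISSING
    return codes.get(answer.strip().lower(), MISSING)


def code_multiple(answers, options):
    """Return one 0/1 value per option (1 = the option was selected)."""
    chosen = {a.strip().lower() for a in answers}
    return [1 if opt in chosen else 0 for opt in options]


print(code_single("Female", GENDER_CODES))
print(code_single(None, GENDER_CODES))
print(code_multiple(["Internet", "friends"], INFO_SOURCES))
```

In practice an unrecognised answer would more likely be sent back for editing than silently coded as missing; the fallback here only keeps the sketch short.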

• Develop a data file
– A data file may be viewed as an n x m matrix of numbers or values, where n is the number of respondents and m is the number of variables or fields.
– Prepare a codebook containing the coding instructions and the necessary information about the variables in the data set.
– Use a spreadsheet program such as Excel, where each row is a case or respondent and each column is a variable or response.
• Table 14.1 gives the data from a pretest sample of 20 respondents on preferences for restaurants.
• Each respondent was asked to rate preference to eat in a familiar restaurant (1 = Weak Preference, 7 = Strong Preference), and to rate the restaurant in terms of quality of food, quantity of portions, value, and service (1 = Poor, 7 = Excellent). Annual household income was also obtained and coded as: 1 = Less than $20,000; 2 = $20,000 to 34,999; 3 = $35,000 to 49,999; 4 = $50,000 to 74,999; 5 = $75,000 to 99,999; 6 = $100,000 or more. The codebook for coding these data is given in Figure 14.2.
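The data file and codebook described above can be sketched in code. The variable names and the income codes follow the slide; the three data rows are invented for illustration and are not the values of Table 14.1, and the dictionary layout is only one possible way to hold a codebook.

```python
# Codebook: variable name -> label and value codes (codes taken from the slide).
RATING = "1 = Poor, 7 = Excellent"
CODEBOOK = {
    "id": {"label": "Respondent ID"},
    "preference": {"label": "Preference to eat in a familiar restaurant",
                   "codes": "1 = Weak Preference, 7 = Strong Preference"},
    "quality": {"label": "Quality of food", "codes": RATING},
    "quantity": {"label": "Quantity of portions", "codes": RATING},
    "value": {"label": "Value", "codes": RATING},
    "service": {"label": "Service", "codes": RATING},
    "income": {"label": "Annual household income",
               "codes": {1: "Less than $20,000", 2: "$20,000 to 34,999",
                         3: "$35,000 to 49,999", 4: "$50,000 to 74,999",
                         5: "$75,000 to 99,999", 6: "$100,000 or more"}},
}
VARIABLES = list(CODEBOOK)  # column order of the data file

# Data file: one row per respondent, one column per variable (invented values).
data = [
    [1, 2, 2, 3, 1, 3, 6],
    [2, 6, 5, 6, 5, 7, 2],
    [3, 4, 4, 3, 4, 5, 3],
]
n, m = len(data), len(VARIABLES)  # n respondents x m variables


def income_label(row):
    """Decode the income code of one row using the codebook."""
    code = row[VARIABLES.index("income")]
    return CODEBOOK["income"]["codes"][code]


print(n, m)
print(income_label(data[0]))
```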


Code Book


5. Transcribing
• Transcribing data involves transferring the coded data from the questionnaires or coding sheets onto disks or directly into computers by keypunching or other means.
• Transcribing is unnecessary when computer-assisted telephone interviewing (CATI), computer-assisted personal interviewing (CAPI), or Internet surveys are used, although the data can still be verified.
• Optical recognition: optical scanners (OMR, OCR, QRC), digital technologies, bar codes, or other technologies
• Computerized sensory analysis systems, which automate the data-collection process
• Voice recognition and voice response systems


6. Data Cleaning
• The checks at this stage are more thorough and extensive, because they are made by computer.
• Consistency checks identify data that are out of range, logically inconsistent, or have extreme values.
• Out-of-range data values are inadmissible and must be corrected. Identify out-of-range values for each variable and print out the respondent code, variable code, variable name, record number, column number, and out-of-range value.
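An out-of-range check of the kind described above might look like the following sketch. The admissible ranges are assumed from the 7-point rating scales and the six income codes of the restaurant example; the records and the choice to skip the missing-value code 9 are illustrative assumptions.

```python
MISSING = 9  # missing-value code; missing responses are treated separately

# Admissible range per variable (assumed: 7-point scales, 6 income codes).
VALID_RANGES = {"preference": (1, 7), "quality": (1, 7), "income": (1, 6)}


def out_of_range(records):
    """Return (respondent id, variable, value) for every inadmissible value."""
    errors = []
    for rec in records:
        for var, (low, high) in VALID_RANGES.items():
            value = rec[var]
            if value == MISSING:
                continue  # not an error here, handled as a missing response
            if not low <= value <= high:
                errors.append((rec["id"], var, value))
    return errors


# Invented records for illustration.
records = [
    {"id": 1, "preference": 5, "quality": 6, "income": 3},
    {"id": 2, "preference": 7, "quality": 8, "income": 2},  # quality too high
    {"id": 3, "preference": 9, "quality": 4, "income": 0},  # income too low
]
for respondent, variable, value in out_of_range(records):
    print(respondent, variable, value)
```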


• Treatment of missing responses:
1. Substitute a neutral value, typically the mean response to the variable.
2. Substitute an imputed response, estimated statistically from the relationship between the variable and the respondent's other responses.
3. Casewise deletion: cases or respondents with any missing responses are discarded from the analysis.
4. Pairwise deletion: uses only the cases or respondents with complete responses for the variable(s) involved in each calculation. It is appropriate when (1) the sample size is large, (2) there are few missing responses, and (3) the variables are not highly related.
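The difference between these treatments is easiest to see on a small example. The sketch below uses invented ratings with `None` marking a missing response; the function names are hypothetical, and imputation from related variables is left out because it needs a statistical model.

```python
from statistics import mean

# Invented ratings; None marks a missing response.
rows = [
    {"quality": 6, "value": 5, "service": 4},
    {"quality": None, "value": 3, "service": 2},
    {"quality": 2, "value": None, "service": 6},
    {"quality": 4, "value": 4, "service": 4},
]


def substitute_mean(rows, var):
    """Neutral-value substitution: replace missing values with the variable mean."""
    observed = [r[var] for r in rows if r[var] is not None]
    fill = mean(observed)
    return [fill if r[var] is None else r[var] for r in rows]


def casewise(rows):
    """Casewise deletion: keep only respondents with no missing response at all."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise(rows, var_a, var_b):
    """Pairwise deletion: keep respondents complete on the two variables in use."""
    return [(r[var_a], r[var_b]) for r in rows
            if r[var_a] is not None and r[var_b] is not None]


print(substitute_mean(rows, "quality"))
print(len(casewise(rows)))
print(pairwise(rows, "value", "service"))
```

Casewise deletion keeps two of the four respondents here, while pairwise deletion keeps three for the value/service calculation, which is why pairwise deletion retains more data when missing responses are scattered.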


7. Statistically Adjusting the Data
• Weighting: each case or respondent in the database is assigned a weight to reflect its importance relative to other cases or respondents, e.g. more weight to the responses of heavy users.
• Variable Specification: involves the transformation of data to create new variables or modify existing variables.
– e.g. reduce the number of categories of a variable, or create a new variable by calculation, such as average marks or a ratio (Expense / Income)
– use of dummy variables, also known as binary, dichotomous, instrumental, or qualitative variables, whose values are 0 or 1
– for a variable with K categories, K - 1 dummy variables are needed
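A short sketch of variable specification follows. The usage categories are hypothetical, and treating the last category as the reference (coded as all zeros) is an assumption of this example; any one category can serve as the reference.

```python
def dummy_code(value, categories):
    """Code one categorical value as K - 1 dummy (0/1) variables.

    The last category is the reference and is coded as all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == cat else 0 for cat in categories[:-1]]


def expense_ratio(expense, income):
    """New variable computed from two existing ones (Expense / Income)."""
    return expense / income


USAGE = ["heavy", "medium", "light"]  # hypothetical variable with K = 3 categories

print(dummy_code("heavy", USAGE))
print(dummy_code("medium", USAGE))
print(dummy_code("light", USAGE))
print(expense_ratio(300, 1200))
```

With K = 3 categories each respondent gets two dummy variables, and the reference category is identified by both being 0.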

• Scale Transformation: involves a manipulation of scale values to ensure comparability with other scales or otherwise make the data suitable for analysis.
– Makes comparisons possible across measurement scales (e.g. 5-point and 7-point).
– Corrects for differences in how respondents use the upper or lower end of a scale through an appropriate transformation.
– Standardisation allows the researcher to compare variables that have been measured using different types of scales: $z_i = (X_i - \bar{X})/s$, where $\bar{X}$ is the mean and $s$ the standard deviation of the variable.
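The standardisation formula can be applied directly, as in the minimal sketch below. It assumes that s is the sample standard deviation (n - 1 in the denominator), which is what `statistics.stdev` computes; the two sets of ratings are invented.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: z_i = (X_i - mean) / s, with s the sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


# Invented ratings on a 5-point and a 7-point scale.
five_point = [1, 2, 3, 4, 5]
seven_point = [1, 3, 4, 5, 7]

z5 = standardize(five_point)
z7 = standardize(seven_point)
print([round(z, 2) for z in z5])
print([round(z, 2) for z in z7])
```

After standardisation both variables have mean 0 and standard deviation 1, so values from the two scales can be compared directly.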

8. Selecting a Data Analysis Strategy


Statistical Techniques
• Univariate techniques are appropriate when there is a single measurement of each element in the sample, or there are several measurements of each element but each variable is analyzed in isolation.
• Multivariate techniques, on the other hand, are suitable for analyzing data when there are two or more measurements of each element and the variables are analyzed simultaneously. Multivariate techniques are concerned with the simultaneous relationships among two or more phenomena.


• Metric data are measured on an interval or ratio scale.
• Nonmetric data are measured on a nominal or ordinal scale.
• The samples are independent if they are drawn randomly from different populations.
• The samples are paired when the data for the two samples relate to the same group of respondents.
• Dependence techniques are appropriate when one or more variables can be identified as dependent variables and the remaining as independent variables.
• In interdependence techniques, the variables are not classified as dependent or independent; rather, the whole set of interdependent relationships is examined.

Bibliography
Malhotra, Naresh. Marketing Research: An Applied Orientation, 6th Edition. Pearson Publications.
