COMPREHENSIVE PSYCHOLOGY 2014, Volume 3, Article 1. ISSN 2165-2228. DOI 10.2466/11.IT.3.1. © Dave S. Kerby 2014. Attribution-NonCommercial-NoDerivs (CC BY-NC-ND).

NOTICE

Received May 27, 2014 Accepted December 27, 2014 Published February 14, 2014 Re-licensed December 9, 2015

This Open Access article originally appeared in Innovative Teaching, published by Ammons Scientific LTD. It is reproduced in the following pages in that form. CITATION: Kerby, D. S. (2014). The simple difference formula: an approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 1.

Innovative Teaching was sold to SAGE Publishing, Inc., and will not be published after December 31, 2015. With their permission, the authors are hereby issued a new Creative Commons license for the article to be included in Comprehensive Psychology. In this manner, the article can continue to be accessed and cited in an active Open Access journal, now operated by SAGE Publishing, Inc. This article should be cited as part of Comprehensive Psychology in the format listed in the sidebar of this cover page. The original DOI has not changed.

Ammons Scientific www.AmSci.com

The simple difference formula: an approach to teaching nonparametric correlation1

INNOVATIVE TEACHING 2014, Volume 3, Article 1 ISSN 2165-2236

Dave S. Kerby Oklahoma Department of Corrections

DOI 10.2466/11.IT.3.1. © Dave S. Kerby 2014. Attribution-NonCommercial-NoDerivs (CC BY-NC-ND).

Abstract

Although teaching effect sizes is important, many statistics texts omit the topic for the Mann-Whitney U test and the Wilcoxon signed-rank test. To address this omission, this paper introduces the simple difference formula. The formula states that the correlation equals the simple difference between the proportion of favorable and unfavorable evidence; in symbols, this is r = f – u. For the Mann-Whitney U, the evidence consists of pairs. For the signed-rank test, the evidence consists of rank sums. Also, the formula applies to the Binomial Effect Size Display. The formula r = f – u means that a correlation r can yield a prediction so that the proportion correct is f and the proportion incorrect is u.


CITATION: Kerby, D. S. (2014). The simple difference formula: an approach to teaching nonparametric correlation. Innovative Teaching, 3, 1.

Effect sizes are a key issue in teaching statistics in psychology. An important early statement of this fact was made by Jacob Cohen in 1962. Surveying published studies, he found that researchers had little chance of rejecting the null hypothesis "unless the effect they sought was large" (Cohen, 1962, p. 151). He concluded that the low power seen in much published research was due to a lack of awareness of effect sizes.

The issue of effect sizes and power was part of the debate in the 1990s over the value of the null hypothesis decision strategy, and the American Psychological Association addressed the debate with a Task Force on Statistical Inference. When the Task Force issued its report in 1999, one key recommendation echoed Cohen's concern about the importance of effect sizes: "Always present effect sizes for primary outcomes" (Wilkinson & APA Task Force, 1999, p. 599). In the wake of this recommendation, textbooks on statistics in psychology have paid more attention to the topic of effect sizes; however, these textbooks still have gaps. For example, many popular texts do not mention effect sizes for two common nonparametric procedures: the Mann-Whitney U test and the Wilcoxon signed-rank test (e.g., Kirk, 2008; Nolan & Heinzen, 2008; Kiess & Green, 2010; Aron, Aron, & Coups, 2011; Spatz, 2011; Howell, 2012; Privitera, 2012; Gravetter & Wallnau, 2013). Thus, teachers face a challenge in teaching effect sizes for at least two simple methods usually covered in the introductory statistics course.

The goal of this paper is to present an effect size formula for two commonly used nonparametric methods. The simplicity of the formula provides insight into the meaning of rank correlation, and it permits its use in other psychology classes to convey a meaningful sense of the size of an effect.

Effect Size for the Mann-Whitney U

Frank Wilcoxon (1945) developed a now widely used nonparametric test called the rank-sum test. The test assigns ranks to all the scores considered as one group, then sums the ranks of each group. The null hypothesis is that the two samples come from the same population, so any difference in the two rank sums comes only from sampling error. The rank-sum test is often described as the nonparametric version of the t test for two independent groups. A mathematically equivalent version of the rank-sum test is the Mann-Whitney U test (Mann & Whitney, 1947). In their original paper, Mann and Whitney defined U as a count of the number of times that a y score (a score from Group 1) precedes in rank order an x score (a score from Group 2). They illustrated the idea with an example of a drug study, with rats assigned to either a treatment group or to a control group; the


1 Address correspondence to Dave S. Kerby, Kerby Behavioral Development, 2205 Cardinal Lane, McAlester OK 74464 or e-mail ([email protected]).


hypothesis was that the rats given the drug would live longer. U is computed twice, once for each group, and the test statistic is the smaller of the two. The value of each U can be computed from the rank sums. For each group, U is equal to the observed rank sum, ΣR, minus the minimum value that the rank sum could be for the group size, ΣRmin. The formulas for U can be expressed most simply as follows:

U1 = ΣR1 − ΣRmin1  (1)

U2 = ΣR2 − ΣRmin2  (2)
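As a brief sketch in Python (our own illustration, not code from the article), Equations 1 and 2 can be applied directly: U is the observed rank sum minus the minimum possible rank sum for a group of that size.

```python
def u_from_ranks(ranks):
    """U for one group: observed rank sum minus the minimum possible sum."""
    n = len(ranks)
    min_sum = n * (n + 1) // 2   # ranks 1..n give the smallest possible sum
    return sum(ranks) - min_sum

# Mann and Whitney's rat example: the five control rats with the shortest
# lives hold ranks 1-5, and the five treatment rats hold ranks 6-10.
control = [1, 2, 3, 4, 5]
treatment = [6, 7, 8, 9, 10]

print(u_from_ranks(control))    # 0: no control rat outlived a treatment rat
print(u_from_ranks(treatment))  # 25: every treatment rat outlived every control rat
```

The smaller of the two values (here, 0) is the test statistic U.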

Replacing ΣRmin with an expression using only the number in each group, another still quite simple way to compute U is with the formulas below:

U1 = ΣR1 − n1(n1 + 1)/2  (3)

U2 = ΣR2 − n2(n2 + 1)/2  (4)

Teachers of statistics can impart to students some insight into the meaning of this formula by noting that U is zero when one group has all the smallest ranks. To illustrate, consider the rat example used by Mann and Whitney (1947), and imagine five rats in the control group and five in the treatment group. Suppose that the five rats in the control group have the shortest lives, and so have ranks one (shortest life) through five (fifth shortest life). The actual sum of ranks for the control group is ΣR = 1 + 2 + 3 + 4 + 5 = 15. This is also the minimum value, so U for this group is 15 minus 15, which equals zero, meaning that none of the rats in the control group lived longer than a rat in the treatment group.

Favorable and Unfavorable Pairs

A helpful term for teaching students the meaning of U is the word pair. Consider a study with human participants on learning vocabulary: a pair exists when a person in the experimental group is compared to a person in the control group. When a pair is formed, there are three possible outcomes. One outcome is that the pair may support the hypothesis; using the vocabulary example, the student in the experimental group learned more words. The teaching approach introduced in this paper describes such a pair with the term favorable, because the data are favorable to the hypothesis. The second outcome is that the pair may not support the hypothesis; using the vocabulary example, the student in the experimental group learned fewer words. The teaching approach introduced in this paper describes such a pair with the term unfavorable, because the data are unfavorable to the hypothesis. Finally, the two members of a pair may have the same score, in which case they are tied.

There are two reasons to use the terms favorable and unfavorable when teaching the Mann-Whitney test to students. First, the terms are informative. To say that an x score precedes in rank order a y score is accurate, but it is more informative to say that the pair is favorable to the hypothesis. The second reason is that the terms are general, and they can be used in other contexts, such as with the Wilcoxon signed-rank test, as will be shown below. To use the Mann and Whitney (1947) example, if a rat in the treatment group does live longer than a rat in the control group, then these two rats are said to form a favorable pair.

The Definition of U

Finding the test statistic U requires two steps. First, compute the number of favorable and unfavorable pairs; or, what is the same thing, compute U1 and U2, as defined in Equations 1 and 2. Second, select the smaller of the two numbers; this smaller number is the test statistic U. An easy way to teach these steps to students is with a structured data table, as shown in Table 1. The first column lists the name or ID of each participant, as a reminder that each person has one score. The second column lists the scores, and an important part of the structure of the table is that the scores are in rank order; the scores can be in descending or ascending order, whichever seems most convenient for the problem at hand. The last two columns list the ranks, with one column for the treatment group and one column for the control group. The hypothetical data in Table 1 concern running times for an 800-meter run. The hypothesis is that wind-sprint training will yield faster runners than a control method of training. With the data structured in this manner, students may now examine the pairs. An easy way to teach pair formation is to draw a line from a rank in the experimental group to a rank in the control group. First, consider the favorable pairs. In the current example, a favorable pair exists when the runner in

TABLE 1
Mann-Whitney U Test Sample Data

Person   Score   Rank (Sprints)   Rank (Control)
Art      2:20    1
Bill     2:21    2
Carl     2:23    3
Dan      2:25                     4
Ed       2:26    5
Frank    2:28    6
Gary     2:29                     7
Hal      2:30                     8
Ira      2:32                     9

Note.—The hypothesis is that the runners in the sprint group will run faster. Of the total of 20 pairs, 18 (90%) are favorable to the hypothesis and 2 (10%) are unfavorable; hence, the rank-biserial correlation is r = .90 – .10 = .80.


the experimental group is faster than the runner in the control group. For example, Art in the treatment group came in first place, so he has a rank of one. Dan in the control group came in fourth place, so he has a rank of four. A line that connects the two ranks will slope down to the right, a direction that indicates a favorable pair. Art can be paired in this way with three more runners, so Art is a member of four favorable pairs. Counting in this way for each person in the treatment group, the result is a total of 18 favorable pairs. Next, consider the unfavorable pairs. In the current example, a pair is unfavorable when the runner in the control group is faster; when this is the case, the line that connects the two ranks slopes up to the right. For example, Ed in the treatment group has a rank of five, and Dan in the control group has a rank of four. The two runners form an unfavorable pair, so the line between the two ranks slopes up to the right. Counting in this way for the entire table, the result is only two unfavorable pairs. Table 1 contains no tied pairs, but tied pairs are counted as one half favorable and one half unfavorable. By counting pairs in this simple manner, students obtain the same result as applying the formulas for U1 and U2. Applying the formulas, the result is 18 and 2; counting the pairs with connecting lines, the result is 18 favorable pairs and 2 unfavorable pairs. In this way, the meaning of the formulas can be made clear. While on the topic of ties, it should be noted that a few ties cause little concern with the Mann-Whitney U. However, when there are numerous ties, "it may now be misleading to use tabulated critical values" (Sprent & Smeeton, 2001, p. 152). The misleading results would also apply to the effect sizes discussed in this paper, so caution should be used when applying the Mann-Whitney U to data with numerous ties (see also Conover, 1999).

U is Non-Directional

A key feature of U as a statistic is that it is non-directional. For example, consider a study on the treatment of depression. Suppose that five people in the treatment group have the five lowest scores on depression, so the ranks are 1, 2, 3, 4, and 5. The four people in the control group have the highest scores on depression, so the ranks are 6, 7, 8, and 9. In such a case, there are 20 favorable pairs and 0 unfavorable pairs, so U = 0. On the other hand, suppose the treatment backfired, so that the five people in the treatment group had the five highest scores on depression. In such a case, there are 0 favorable pairs and 20 unfavorable pairs, so U still equals zero. The point here is that in both cases, U equals zero. That is, if one only knows that U is zero, then one of two states is true: either the data in the form of ranks are as good as they can possibly be, or they are as bad as they can possibly be. In short, U is non-directional.

Though U is non-directional, it is always possible to state a directional hypothesis. In fact, in the field of psychology a directional hypothesis is common. Researchers prefer that depressed patients become less depressed, that students make higher grades, and that healthy behaviors lead to a longer life. In cases where there is no preferred direction, one can select an arbitrary direction as the hypothesis. When a direction is stated, and when the results are in the expected direction, then U is easy to interpret: it is the number of pairs that are unfavorable to the hypothesis. For example, consider a study with 10 people in the control group and 10 in the treatment group (for a total of 100 pairs). If there are 70 favorable pairs and 30 unfavorable pairs, then U is 30. Dividing U by the total number of pairs yields the proportion of unfavorable pairs, a proportion that can be represented by u. In the same way, the proportion of favorable pairs can be represented by f, which can be used as a measure of effect size with the Mann-Whitney U test.

The Common Language Effect Size

McGraw and Wong (1992) discussed using the proportion of favorable pairs as a measure of effect size, which they called the common language effect size. Given assumptions such as normality and equal variances, effect sizes expressed as Cohen's d or the Pearson r can be converted into the common language effect size. The effect size also works well with the Mann-Whitney U test. The purpose of the common language effect size, as the name implies, is to express the meaning of an effect size in the everyday language of a percent. McGraw and Wong give the example of the height difference between men and women. While a simple way of reporting the effect size is to note that the average man is 5.4 inches taller than the average woman, the common language effect size is also easily interpreted, because one can say that "in 92 out of 100 blind dates among young adults, the male will be taller than the female" (p. 361). Thus, one advantage of the common language effect size is that it is easily interpreted. A second advantage of reporting the common language effect size is that it allows for a clear statement of the null hypothesis: if the null is true, then in the long run the amount of favorable evidence is equal to the amount of unfavorable evidence. In other words, when the null is true, the expected value of the common language effect size is fifty percent. With the Mann-Whitney U test, this amounts to saying that the sampling distribution has a mean of 50% favorable pairs. A third advantage of the common language effect size is that it allows easy interpretation when the results are against the prediction. For example, suppose that a study backfires, so that the analysis yields 30 favorable pairs and 70 unfavorable pairs. The value of U is still 30, just as it was in the previous example when the data were in the predicted direction. However, by stating the effect size as 30% favorable pairs, one knows that the data are not in the predicted direction, because most of the evidence is not in accord with the hypothesis.
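To make the pair counting concrete, here is a short Python sketch (our own illustration, not code from the article) that counts favorable and unfavorable pairs for the Table 1 ranks and recovers U and the common language effect size:

```python
from itertools import product

# Finishing ranks from Table 1 (1 = fastest runner).
sprint = [1, 2, 3, 5, 6]    # Art, Bill, Carl, Ed, Frank
control = [4, 7, 8, 9]      # Dan, Gary, Hal, Ira

# A pair is favorable when the sprint runner holds the better (lower) rank.
favorable = sum(s < c for s, c in product(sprint, control))
unfavorable = sum(s > c for s, c in product(sprint, control))

total = len(sprint) * len(control)   # 20 possible pairs
U = min(favorable, unfavorable)      # Mann-Whitney test statistic
f = favorable / total                # common language effect size
u = unfavorable / total              # proportion of unfavorable pairs

print(favorable, unfavorable, U)     # 18 2 2
print(f, u)                          # 0.9 0.1
```

Because the hypothesis was stated in advance, f above 0.5 shows at a glance that the data favor the sprint group.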

Rank-Biserial Correlation

While the common language effect size is useful, a more widely used measure in statistics is the correlation. A correlational effect size exists for the Mann-Whitney U test, and it is known as the rank-biserial correlation. Three formulas have been proposed for computing this correlation. Edward Cureton (1956) introduced and named the rank-biserial correlation. To compute the correlation, Cureton stated a direction; that is, one group was hypothesized to have higher ranks. Then he used the concepts of an agreement and an inversion, his terms for what in this paper are called a favorable pair and an unfavorable pair. Cureton denoted the number of agreements (favorable pairs) with P, and he denoted the number of inversions (unfavorable pairs) with Q. He then computed the correlation as the difference between P and Q, divided by the maximum value. In symbols, r = (P – Q)/Pmax.

A second formula for the rank-biserial correlation was developed by Gene Glass (1965) in the course of his work on item analysis for scale development. His goal was to derive a formula to estimate the Spearman correlation, in the same way that the biserial r estimates the Pearson r. As Glass worded it, "One can derive a coefficient defined on X, the dichotomous variable, and Y, the ranking variable, which estimates Spearman's rho between X and Y in the same way that biserial r estimates Pearson's r between two normal variables" (p. 91). He ended with a formula that was mathematically equivalent to Cureton's formula. The Glass formula is convenient for item analysis, because a high correlation occurs when test takers who correctly answer an individual item have a higher rank on the total score than those who answer incorrectly. The Glass formula is equal to twice the difference in the mean ranks of those people who answered correctly (Ȳ1) and those who missed the item (Ȳ0), divided by the total number of people who are ranked: r = 2(Ȳ1 − Ȳ0)/N. Among those few textbooks that present a formula, the Glass formula is used (e.g., Cohen, 2008; Jaccard & Becker, 2009).

Hans Wendt (1972) presented a third formula, one based on U. Wendt was motivated to develop his formula because he observed in published research a "neglect of correlation in favor of significance statistics" (p. 463). His goal was to derive an easy-to-use formula that would promote the reporting of effect sizes with the Mann-Whitney U test. The Wendt formula computes the rank-biserial correlation from U and from the sample sizes (n1 and n2) of the two groups: r = 1 − 2U/(n1 × n2). One can see that the correlation is at a maximum of r = 1 when U is zero. Because U is by definition non-directional, the rank-biserial as computed by the Wendt formula is also non-directional and is always positive.

The Simple Difference Formula

Though the three formulas mentioned above are useful, they were introduced before the common language effect size. Also, two decades of teaching statistics at the college level has convinced me that none of the three formulas convey much meaning to the typical student. In this paper, I introduce a fourth formula for computing the rank-biserial correlation, one that is based on the common language effect size. To do this, begin with the formula presented by Cureton (1956), then break the ratio in two. This yields a formula that is the simple difference between two ratios: r = (P/Pmax) – (Q/Pmax). The first ratio is the proportion of favorable pairs, which is the common language effect size; the second ratio is the proportion of unfavorable pairs. If the proportion of favorable pairs is represented as f, and the proportion of unfavorable pairs is represented as u, then the formula can be written as the simple difference between the two proportions: r = f – u. This is the simple difference formula. In words, the formula states that the nonparametric correlation equals the simple difference between the proportion of favorable and unfavorable evidence; in the case of the Mann-Whitney U test, the evidence consists of pairs.

An advantage of the simple difference formula is that it expresses one effect size (the rank-biserial correlation) in terms of another easily understood measure of effect size (the common language effect size). The other formulas for the rank-biserial express it in terms that are less easily interpreted. A second advantage of the simple difference formula is that it gives meaning to the direction of the sign. A positive correlation means that the data were in the predicted direction; a negative correlation, that they were against the predicted direction. A third advantage of the simple difference formula is that it is readily understood by students, for an analogy is easily made to weighing information in a balance. If the favorable data outweigh the unfavorable data, then the scales are tipped in favor of the hypothesis, and the correlation is positive. Using the data in Table 1 as an example, the favorable evidence outweighs the unfavorable 90% to 10%, so the overall balance is 0.90 minus 0.10, yielding a rank-biserial correlation of r = .80. If the data are all favorable, then the scales are tipped as much as possible, and the correlation is a perfect one. If the amount of favorable data equals the amount of unfavorable data, then the scales are not tipped either way, and the correlation is zero.

In summary, because many introductory texts omit a discussion of a correlational effect size for the Mann-Whitney U, teachers of introductory statistics may find the simple difference formula a useful way to address


this omission. The simple difference formula allows students to compute an effect size for the Mann-Whitney U, and it provides insight into the meaning of the correlation.
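The agreement among the formulas can also be checked numerically. The sketch below is our own (not code from the article) and assumes the favorable direction was stated in advance; it applies the Cureton, Wendt, and simple difference formulas to the Table 1 counts:

```python
# Counts from Table 1: 18 favorable pairs, 2 unfavorable, group sizes 5 and 4.
P, Q = 18, 2          # Cureton's agreements (favorable) and inversions (unfavorable)
n1, n2 = 5, 4
Pmax = n1 * n2        # 20 possible pairs
U = min(P, Q)         # Mann-Whitney test statistic

r_cureton = (P - Q) / Pmax         # Cureton (1956): (P - Q) / Pmax
r_wendt = 1 - 2 * U / (n1 * n2)    # Wendt (1972): 1 - 2U / (n1 * n2)
f, u = P / Pmax, Q / Pmax
r_simple = f - u                   # simple difference formula: r = f - u

print(r_cureton, r_wendt, r_simple)  # 0.8 0.8 0.8
```

All three formulas yield r = .80 here; they differ only in that the simple difference form makes the direction of the effect, and its common language interpretation, immediately visible.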

The Wilcoxon Signed-rank Test

Another popular method taught in introductory courses is the Wilcoxon signed-rank test. While the U test compares two independent groups, the signed-rank test compares two matched groups (Wilcoxon, 1945). The signed-rank test is often described as the nonparametric version of the paired t test. Various symbols have been used for the test statistic. In his original paper, Wilcoxon used r to refer to the smaller sum of like-signed ranks; however, the letter r is used for the correlation coefficient, so this symbol has rarely been adopted. The popular book by Sidney Siegel (1956) introduced many behavioral researchers to nonparametric methods, and Siegel referred to the smaller of the like-signed rank sums with the letter T, which has become popular in the social sciences. Another approach is to add both the negative and positive sums to produce a test statistic called W (e.g., Glantz, 2005).

As mentioned above, it is common for introductory texts to omit a discussion of an effect size for the signed-rank test. One exception to this state of affairs is King, Rosopa, and Minium (2011), who note that an appropriate effect size is the matched-pairs rank-biserial correlation. Though they do not provide a citation, King et al. present a formula for the correlation in terms of the smaller of the like-signed rank sums (T), the sum of the positive ranks (R+), the sum of the negative ranks (R−), and the sample size (N). Using r for the correlation, the formula is as follows:

r = 4 × |T − ((R+ + R−)/2)| / (N(N + 1))  (5)

The formula can be daunting for students in their first course of statistics, so it would be convenient if a simpler form were available. In fact, a simpler form is possible, because this formula can be converted into the simple difference formula. To simplify, first change the four in the numerator to two divided by one half; this change makes the value in the denominator equal to the total sum of ranks, which can be symbolized as S. Next, the value in parentheses is merely the expected value when the null hypothesis is true, so the value in parentheses can be replaced with the letter E. The absolute value sign allows a change from T − E to E − T, so the formula becomes r = 2|E − T|/S. To express the correlation in a directional manner, eliminate the absolute value sign in the numerator. Also, when a direction is stated, T is equal to the sum of the unfavorable ranks, which can be expressed as SU, yielding the formula r = (2E − 2SU)/S. If SU represents the sum of the unfavorable ranks, then SF represents the sum of the favorable ranks. Then the expected sum E can be expressed as (SF + SU)/2, and this reduces the numerator to W, which is sometimes used as the test statistic (e.g., Glantz, 2005). Thus, the formula is now r = W/S. And of course, for a directional hypothesis, W can be stated as the difference between the favorable and unfavorable sums. The result is that the matched-pairs rank-biserial correlation can be expressed as r = (SF/S) − (SU/S), a difference between two proportions. One can note that the rank-biserial as defined by Cureton (1956) can be stated in a similar form, namely r = (P/Pmax) − (Q/Pmax). Therefore, the formula for the matched-pairs rank-biserial correlation also reduces to r = f − u. Here again is the simple difference formula. In words, the formula states that the nonparametric correlation is the simple difference between the proportion of favorable and unfavorable evidence; in the case of the Wilcoxon signed-rank test, the evidence consists of rank sums.

Sample Problem

An easy way to teach the signed-rank test is to place the data in a structured table. Table 2 displays such a data table. The first column lists the names or IDs of the participants. The next two columns contain the pretest scores and posttest scores. The fourth column lists the change score, computed as the posttest score minus the pretest score. A directional hypothesis is stated, either that scores are predicted to increase or to decrease. A favorable rank is one that is in accord with this prediction; an unfavorable rank, one that is not in accord with it. The last two columns contain the ranks for the absolute value of the change scores, with one column for favorable ranks and one column for unfavorable ranks. An important part of the structure of such a table is that the ranks are placed in order; they can be ascending or descending, whichever seems more convenient for the problem at hand.

For the hypothetical study in Table 2, eight people participate in a program to increase marital happiness, and a scale of marital happiness is given before and after the program. Because there are eight people in the study, the total rank sum is 36 (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36). Given the hypothesis that the program will increase the happiness score, a person's data are favorable when the score increases after the program; in the same way, the data are unfavorable when the score decreases after the program. The data in Table 2 illustrate how to apply the simple difference formula, and these particular data yield a correlation of 0.50. The major advantage of the simple difference formula is its simplicity, especially as compared with the formula presented in the text by King, et al. (2011). A second advantage is that students do not have to learn a


Suppose that the count of students at or above grade level is 70 in the treatment group, but only 30 in the control group. The BESD is the simple difference between the two proportions, r = .70 – .30 = .40. Thus, for those teachers who cover the BESD in class, the simple difference formula which is used for the BESD can be carried over to the Mann-Whitney U test and the Wilcoxon signed-rank test.

TABLE 2 Wilcoxon Signed-rank Test Sample Data Person A

Before 20

B

22

D

20

C E F

G

H

Score

After 38

18

33

14

37

19

29

22 18 20

15 9

14

–8

20

–4

12

24

Change

22 Rank Sums =

Rank of Change

Favorable (+) Unfavorable (–) 8 7 6 5

–6 2

Three-valued Logic for the Null Hypothesis

4

Another possible use of the simple difference formula is in teaching three-valued logic for the null hypothesis. A long tradition exists in introductory texts for teaching a two-valued testing strategy: reject the null, or fail to reject. Despite this tradition, Wainer and Robinson (2003) note that Ronald Fisher, the founder of null hypothesis testing, in fact used a three-valued approach. “When p … is small (less than .05), he declared that an effect has been demonstrated. When it is large (p is greater than .2), he declared that, if there is an effect, it is too small to be detected with an experiment this size. When it lies between these extremes, he discussed how to design the next experiment to estimate the effect better” (p. 23). Harris (1997) provides a good discussion on threevalued logic in testing the null hypothesis, and recommends its use. He noted that two-valued logic leads to “such absurdities as stating whether or not results are statistically significant, but not in what direction” (p. 8). Given that the simple difference formula is easy to use and readily indicates a direction, the formula may prove useful to those who wish to apply three-valued logic in testing the null.

3 1 27

2 9

Note  The hypothesis is that the scores will increase. Of the total rank sum of 36, the favorable rank sum is 27 (75%), and the unfavorable rank sum is 9 (25%), so the matched-pairs rank-biserial correlation is r = .75–.25 = .50.

new formula; they can instead apply the same rank formula used for the Mann-Whitney U test. A third advantage is that the simple difference formula is directional and gives meaning to the direction of the sign; when the data are in the predicted direction, the correlation is positive; when the data are not in the predicted direction, the correlation is negative. A fourth advantage is again the ease of understanding that comes from making an analogy to weighing the evidence in a balance. Using the data in Table 2, the favorable evidence outweighs the unfavorable 75% to 25%, so the overall balance is 0.75 minus 0.25, yielding a matched-pairs rank-biserial correlation of r = .50. One would expect that students would report little insight into the meaning of the involved formula used by King, et al. (2011); by contrast, they find the analogy to weighing evidence for and against a hypothesis to be intuitively meaningful.

The Binomial Effect Size Display

An additional benefit of teaching the simple difference formula is that it applies to another topic in psychological statistics – the Binomial Effect Size Display (BESD). Based on a 2 × 2 chi-square, the BESD was developed to provide insight into the meaning of the size of a correlation (Rosenthal & Rubin, 1982). The effect size in this case is a version of the Pearson r sometimes referred to as phi. The concepts of favorable and unfavorable evidence still apply. But whereas the evidence for the Mann-Whitney U consists of pairs, and whereas the evidence for the Wilcoxon signed-rank test consists of rank sums, the evidence for the BESD consists of counts. To illustrate, consider a study of reading ability in fifth-grade students. The 100 students in the control group receive teaching as usual, and the 100 students in the treatment group receive a new method; the outcome measure is reading at or above grade level.

Teaching Effect Sizes in Other Psychology Classes

The simple difference formula is so readily understood that it can be used to convey the meaning of research results in psychology classes other than statistics. As an example, consider a course in personality, for which the topic of effect sizes is important. In the 1960s, many researchers attacked the field of personality. Two researchers describe the mood of the decade: "During the 1960s when we were graduate students, we frequently heard predictions from experimental psychologists and experimental social psychologists that in 20 or so years differential psychology would be a dead field" (Schmidt & Hunter, 2004, p. 162). The dismissal of personality traits rested on the supposedly small effect sizes between traits and relevant outcomes, as detailed in Mischel (1968). He noted that much research found that traits correlated with social outcomes at about r = .30, which he said was too small to be important, and which he mockingly called the personality coefficient. Nisbett (1980) later had to admit that many well-designed personality studies regularly obtained effect sizes near r = .40, but the criticism remained that this was too small to be important. However, Nisbett and other social psychologists continued to claim that effect sizes in social psychology were large. This line of argument based on effect sizes was shown to be unfounded in a seminal paper by Funder and Ozer (1983). Selecting high-profile social psychology studies widely agreed to show large effects, Funder and Ozer computed the effect sizes. The result was that the median effect size of these large effects was r = .38. If this effect is large in social psychology, then it must also be large in personality psychology, refuting the claim that the effects of traits are too small to be important. A later meta-analysis of 25,000 studies in social psychology over the past century found that the average effect size was r = .21, with a standard deviation of .15 (Richard, Bond, & Stokes-Zoota, 2003). These results soundly reject the claim that social psychology has larger effects than personality psychology.

Because this debate looms large in the history of personality, teachers of personality may wish to convey to students the meaning of a correlation of r = .40. An easy way to do this is to use a simple rank method, such as the simple difference formula. Table 3 presents a sample of data that can be used for this purpose. The data consist of scores on a trait, which are ranked in the table from high to low, and an outcome variable scored as success or failure. Thus, the data could illustrate using emotional stability to predict a given level of marital happiness, extraversion to predict meeting a sales target, or core self-evaluations to predict a satisfactory job evaluation by a work supervisor. The data in Table 3 consist of 40 people, of whom 20 succeed on the outcome and 20 fail. The ranks of the trait scores are in the last two columns, with the ranks of successful people in one column, and the ranks of unsuccessful people in the other. The rank-biserial r for these data is r = .40.
To reject the null hypothesis at the .05 level with the Mann-Whitney U requires a U of 127 or less, so the obtained U of 120 allows one to reject the null hypothesis. A good way to teach students the meaning of a correlation is in terms of a prediction. For the rank-biserial r, the prediction is with pairs. When all possible pairs are formed between a person with success and a person with failure, the prediction is that the person with the higher rank on the trait score is the one rated successful. In Table 3 there are 400 pairs, and the prediction is correct in 70% of cases (n = 280 pairs) and incorrect in 30% of cases (n = 120 pairs); in other words, the odds of being correct are seven to three.

A line in Table 3 divides the upper half from the lower half; this line shows the close association between the rank-biserial r and the BESD. For the BESD, the prediction is with counts of people: the prediction is that those people in the upper half of trait scores will have success. Note that this prediction is correct in 70% of cases (n = 14 people) and wrong in 30% (n = 6 people); put another way, the odds of being correct are seven to three. Of course, this means those people in the lower half of trait scores are predicted to fail; this prediction is also correct in 70% of cases (n = 14 people) and wrong in 30% (n = 6 people), with odds of being correct also at seven to three. Notice that these figures of 70% and 30% are those of the simple difference formula: r = .70 – .30 = .40. In general terms, the simple difference formula r = f – u means that a given correlation r can yield a prediction for two independent groups, so that the proportion correct is f and the proportion incorrect is u. For the data in Table 3, this holds true whether using the 400 pairs to compute the rank-biserial r, or whether using the 40 people to compute the BESD. An example such as this can provide students with insight into the meaning of r = .40.

TABLE 3
Example to Illustrate a Rank Correlation of .40

                                  Ranks
Person ID   Trait Score   Success     Failure
A1              70           40
A2              67           39
A3              60           38
A4              59           37
A5              58           36
A6              57           35
A7              56           34
A8              55           33
A9              54           32
A10             52           31
A11             51           30
A12             50           29
A13             48           28
A14             47           27
B1              46                       26
B2              43                       25
B3              42                       24
B4              41                       23
B5              40                       22
B6              35                       21
---------------------------------------------
B7              32                       20
B8              31                       19
B9              30                       18
B10             25                       17
B11             24                       16
B12             23                       15
B13             21                       14
B14             20                       13
B15             19                       12
B16             18                       11
B17             15                       10
B18             14                        9
B19             13                        8
B20             12                        7
A15             11            6
A16             10            5
A17              8            4
A18              6            3
A19              5            2
A20              4            1

Note.—The hypothesis is that the people with higher trait scores will be more likely to show success on the outcome. The horizontal line divides the upper half of trait scores from the lower half. Of the total of 400 pairs, 280 (70%) are favorable to the hypothesis and 120 (30%) are unfavorable; hence, the rank-biserial correlation is r = .70 – .30 = .40.
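The closing comparison of the 400 pairs versus the 40 people can be verified in a few lines of code. The sketch below (in Python; the function names are mine) applies the simple difference formula both ways to the trait scores of the two groups in Table 3.

```python
# Trait scores from Table 3: 20 people who succeeded on the outcome
# (group A) and 20 who failed (group B).
success = [70, 67, 60, 59, 58, 57, 56, 55, 54, 52, 51, 50, 48, 47,
           11, 10, 8, 6, 5, 4]
failure = [46, 43, 42, 41, 40, 35, 32, 31, 30, 25, 24, 23, 21, 20,
           19, 18, 15, 14, 13, 12]

def rank_biserial_from_pairs(a, b):
    """Simple difference formula over all pairs: r = f - u."""
    favorable = sum(1 for x in a for y in b if x > y)    # 280 pairs
    unfavorable = sum(1 for x in a for y in b if x < y)  # 120 pairs; this
    # count is also the Mann-Whitney U of 120 for the failure group.
    return (favorable - unfavorable) / (len(a) * len(b))

def besd_from_counts(a, b):
    """BESD version: split everyone at the median trait score and take
    the difference between the success rates of the two halves."""
    half = len(a + b) // 2
    cutoff = sorted(a + b, reverse=True)[half - 1]       # 20th highest score
    upper_success = sum(1 for x in a if x >= cutoff)     # 14 of 20 correct
    lower_success = sum(1 for x in a if x < cutoff)      #  6 of 20 wrong
    return (upper_success - lower_success) / half        # .70 - .30

r_pairs = rank_biserial_from_pairs(success, failure)     # 0.40
r_counts = besd_from_counts(success, failure)            # 0.40
```

Both routes return r = .40: the pair count gives .70 − .30 over 400 pairs, and the BESD count gives .70 − .30 over the 40 people.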

Discussion

Though the importance of teaching effect sizes in introductory courses is widely recognized, many introductory textbooks do not provide an effect size measure for two nonparametric methods commonly covered in the course – the Mann-Whitney U test and the Wilcoxon signed-rank test. When a correlational effect size is discussed in textbooks, two different formulas are used – the Glass (1965) formula for the Mann-Whitney U, and a different formula for the Wilcoxon signed-rank test (Cohen, 2008; King et al., 2011). This paper has shown that the simple difference formula can be used for both inferential tests. The simple difference formula states that the correlation is equal to the difference between the proportion of favorable and unfavorable evidence; in symbols, r = f – u. When expressed in terms of favorable pairs, the formula computes the rank-biserial correlation for the Mann-Whitney U. When expressed in terms of favorable rank sums, the simple difference formula computes the matched-pairs rank-biserial correlation for the Wilcoxon signed-rank test. Its ease of use and its generality make the simple difference formula a useful concept to teach in the introductory course in psychological statistics.

In addition, another advantage of adopting the formula for class is that it is related to other concepts in statistics. When expressed in terms of favorable counts, the simple difference formula is an easy introduction to the BESD, a method designed to instruct students on how to interpret the size of a correlation. Also, the formula can be used to introduce students to the common language effect size, which in the Mann-Whitney U test is equal to the proportion of favorable pairs. In summary, teachers of psychological statistics may find the simple difference formula a useful addition to their class, and teachers of other classes in psychology may find the formula an easy way to convey the meaning of a correlation.

References

Aron, A., Aron, E. N., & Coups, E. J. (2011) Statistics for psychology (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Cohen, B. (2008) Explaining psychological statistics (3rd ed.). Hoboken, NJ: John Wiley.
Cohen, J. (1962) The statistical power of abnormal-social psychological research. Journal of Abnormal Psychology, 65(3), 145-153. DOI: 10.1037/h0045186
Conover, W. J. (1999) Practical nonparametric statistics (3rd ed.). New York, NY: John Wiley.
Cureton, E. E. (1956) Rank-biserial correlation. Psychometrika, 21(3), 287-290. DOI: 10.1007/BF02289138
Funder, D. C., & Ozer, D. J. (1983) Behavior as a function of the situation. Journal of Personality and Social Psychology, 44(1), 107-112. DOI: 10.1037/0022-3514.44.1.107
Glantz, S. A. (2005) A primer of biostatistics (6th ed.). New York, NY: McGraw-Hill Medical.
Glass, G. V. (1965) A ranking variable analogue of biserial correlation: implications for short-cut item analysis. Journal of Educational Measurement, 2(1), 91-95. DOI: 10.1111/j.1745-3984.1965.tb00396.x
Gravetter, F. J., & Wallnau, L. B. (2013) Statistics for the behavioral sciences (8th ed.). Belmont, CA: Wadsworth Cengage Learning.
Harris, R. J. (1997) Significance tests have their place. Psychological Science, 8(1), 8-11.
Howell, D. C. (2012) Statistical methods for psychology (8th ed.). Belmont, CA: Thomson Wadsworth.
Jaccard, J., & Becker, M. A. (2009) Statistics for the behavioral sciences (5th ed.). Pacific Grove, CA: Wadsworth Publishing.
Kiess, H. O., & Green, B. A. (2010) Statistical concepts for the behavioral sciences (4th ed.). Boston, MA: Allyn & Bacon.
King, B. M., Rosopa, P. J., & Minium, E. W. (2011) Statistical reasoning in the behavioral sciences (6th ed.). Hoboken, NJ: John Wiley.
Kirk, R. E. (2008) Statistics: an introduction (5th ed.). Belmont, CA: Thomson Wadsworth.
Mann, H. B., & Whitney, D. R. (1947) On a test of whether one of two random variables is stochastically larger than the other. Annals of Mathematical Statistics, 18(1), 50-60. DOI: 10.1214/aoms/1177730491
McGraw, K. O., & Wong, S. P. (1992) A common language effect size statistic. Psychological Bulletin, 111(2), 361-365. DOI: 10.1037/0033-2909.111.2.361
Mischel, W. (1968) Personality and assessment. New York, NY: Wiley.
Nisbett, R. E. (1980) The trait construct in lay and professional psychology. In L. Festinger (Ed.), Retrospections on social psychology. New York, NY: Oxford University Press. Pp. 109-130.
Nolan, S. A., & Heinzen, T. E. (2008) Statistics for the behavioral sciences. New York, NY: Worth Publishers.
Privitera, G. J. (2012) Statistics for the behavioral sciences. Thousand Oaks, CA: Sage Publications.
Richard, F. D., Bond, C. F., Jr., & Stokes-Zoota, J. J. (2003) One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331-363. DOI: 10.1037/1089-2680.7.4.331
Rosenthal, R., & Rubin, D. B. (1982) A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74(2), 166-169. DOI: 10.1037/0022-0663.74.2.166
Schmidt, F. L., & Hunter, J. (2004) General mental ability in the world of work: occupational attainment and job performance. Journal of Personality and Social Psychology, 86(1), 162-173. DOI: 10.1037/0022-3514.86.1.162
Siegel, S. (1956) Nonparametric statistics for the behavioral sciences. New York, NY: McGraw-Hill Book Company.
Spatz, C. (2011) Basic statistics: tales of distributions (10th ed.). Belmont, CA: Wadsworth Cengage Learning.
Sprent, P., & Smeeton, N. C. (2001) Applied nonparametric statistical methods. Boca Raton, FL: Chapman & Hall/CRC.
Wainer, H., & Robinson, D. H. (2003) Shaping up the practice of null hypothesis significance testing. Educational Researcher, 32(7), 22-30.
Wendt, H. W. (1972) Dealing with a common problem in social science: a simplified rank-biserial coefficient of correlation based on the U statistic. European Journal of Social Psychology, 2(4), 463-465. DOI: 10.1002/ejsp.2420020412
Wilcoxon, F. (1945) Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83. DOI: 10.2307/3001968
Wilkinson, L., & APA Task Force on Statistical Inference. (1999) Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54(8), 594-604. DOI: 10.1037/0003-066X.54.8.594