Elementary Statistics


Dr. Ghamsary

Chapter 2

Elementary Statistics M. Ghamsary, Ph.D. Chapter 02


Descriptive Statistics

Grouped vs Ungrouped Data

• Ungrouped data: data that have not been summarized in any way; also called raw data.

• Grouped data: data that have been organized into a frequency distribution.

Raw Data: When data are collected in original form, they are called raw data. The following are the scores on the first test of the statistics class in fall of 2004.

76 62 68 69 79 90 79 86 52 97
78 55 96 89 73 66 88 92 94 50
71 89 78 88 58 76 59 92 93 88
86 66 81 85 85 70 55 62 80 60
80 72 82 86 99 63 75 83 78 61

Table 2.1: Data from Test #1 of fall 2007

Stem-and-Leaf: One method of displaying a set of data is with a stem-and-leaf plot.

Stem | Leaf
5    | 025589
6    | 012236689
7    | 012356688899
8    | 001235566688899
9    | 02234679

Grouped Data: raw data that have been organized into a frequency distribution.

Frequency Distribution: the organization of raw data in table form, using classes and frequencies.

Class    Frequency
50-59    6
60-69    9
70-79    12
80-89    15
90-99    8

• Classes: The number of classes in the above table is 5.

• Class Limits: represent the smallest and largest data values in each class.

• Lower Class Limit: the lowest number in each class. In the above table 50 is the lower class limit of the first class, 60 is the lower class limit of the 2nd class, etc.

• Upper Class Limit: the highest number in each class. In the above table 59 is the upper class limit of the first class, 69 is the upper class limit of the 2nd class, etc.

• Class Width: found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. In the above table the class width is 60 - 50 = 10.

Class Boundaries are used to separate the classes so that there are no gaps in the frequency distribution.

Class    Class Boundaries    Frequency
50-59    49.5-59.5           6
60-69    59.5-69.5           9
70-79    69.5-79.5           12
80-89    79.5-89.5           15
90-99    89.5-99.5           8

Cumulative Frequency: the running total of the frequencies up to and including a given class.

Relative Frequency: the frequency of a class divided by the total number of observations, n.

Class    Frequency    Cumulative Frequency    Relative Frequency
50-59    6            6                       6/50=0.12
60-69    9            9+6=15                  9/50=0.18
70-79    12           12+15=27                12/50=0.24
80-89    15           15+27=42                15/50=0.30
90-99    8            8+42=50                 8/50=0.16
         n=50
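The frequency, cumulative frequency, and relative frequency columns can be reproduced from the raw scores with a few lines of Python. This is only a sketch: the function name `frequency_table` and its parameters `start`, `width`, and `classes` are illustrative choices, not part of the notes.

```python
# Test 1 scores (the 50 raw values listed in this chapter)
scores = [
    76, 62, 68, 69, 79, 90, 79, 86, 52, 97,
    78, 55, 96, 89, 73, 66, 88, 92, 94, 50,
    71, 89, 78, 88, 58, 76, 59, 92, 93, 88,
    86, 66, 81, 85, 85, 70, 55, 62, 80, 60,
    80, 72, 82, 86, 99, 63, 75, 83, 78, 61,
]

def frequency_table(data, start=50, width=10, classes=5):
    """Return rows of (class label, frequency, cumulative, relative)."""
    n = len(data)
    rows, cumulative = [], 0
    for k in range(classes):
        lower = start + k * width       # lower class limit, e.g. 50
        upper = lower + width - 1       # upper class limit, e.g. 59
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq              # running total of frequencies
        rows.append((f"{lower}-{upper}", freq, cumulative, freq / n))
    return rows

table = frequency_table(scores)
for label, freq, cum, rel in table:
    print(f"{label}  {freq:2d}  {cum:2d}  {rel:.2f}")
```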

Most Popular Graphs in Statistics

The most commonly used graphs in statistics are:

1. The Histogram
2. The Frequency Polygon
3. The Cumulative Frequency Graph
4. The Bar Chart
5. Pie Chart
6. Pareto Charts
7. Dot Plot
8. Stem-Leaf
9. Time Series Graph

1. The Histogram

o Making decisions about a process, product, or procedure that could be improved after examining the variation (example: Should the school invest in a computer-based tutoring program for low achieving students in Algebra I after examining the grade distribution? Are more shafts being produced out of specifications that are too big rather than too small?)

o Displaying easily the variation in the process (example: Which units are causing the most difficulty for students? Is the variation in a process due to parts that are too long or parts that are too short?)

[Figure: Histogram of Test1 with fitted normal curve. Mean = 76.8, StDev = 12.98, N = 50.]

[Figure: Normal probability plot of Test1 with 95% CI. Mean = 76.8, StDev = 12.98, N = 50, AD = 0.537, P-Value = 0.161.]

2. The Frequency Polygon

o Making decisions about a process, product, or procedure that could be improved (example: a frequency polygon for 642 psychology test scores, shown below to the right.)

Midpoint (X)    Frequency
54.5            6
64.5            9
74.5            12
84.5            15
94.5            8

[Figure: Scatterplot of f vs x. Frequency f plotted against class midpoints x.]

3. The Cumulative Frequency Graph (Ogive)

Cumulative frequency is used to determine the number of observations that lie above (or below) a particular value.

Upper Class Boundaries    Cumulative Frequency
59.5                      6
69.5                      15
79.5                      27
89.5                      42
99.5                      50

[Figure: Scatterplot of Cumulative f vs x. Cumulative frequency plotted against upper class boundaries.]

4. The Bar Chart

Bar charts are useful for comparing classes or groups of data. A class or group can have a single category of data, or it can be broken down further into multiple categories for greater depth of analysis.

Class    Grade    Frequency
50-59    F        6
60-69    D        9
70-79    C        12
80-89    B        15
90-99    A        8

[Figure: Bar chart of Frequency by Grade (F, D, C, B, A).]

5. Pie Chart

o A pie chart is a way of summarizing a set of categorical data or displaying the different values of a given variable (example: percentage distribution).

o Pie charts usually show the component parts of a whole. Often you will see a segment of the drawing separated from the rest of the pie in order to emphasize an important piece of information.

[Figure: Pie chart of grades. A: 8 (16.0%), F: 6 (12.0%), D: 9 (18.0%), B: 15 (30.0%), C: 12 (24.0%).]

6. Pareto Charts

A Pareto chart is used to graphically summarize and display the relative importance of the differences between groups of data.

[Figure: Pareto chart of Frequency by Grade, with bars in the order B, C, D, A, F.]

7. Dot Plot

A dot plot displays each data value as a dot above its position on a number line; repeated values are stacked on top of one another.

[Figure: Dotplot of Test1, with the axis running from 49 to 98.]

8. Stem-Leaf

o The Stem-and-Leaf Plot summarizes the shape of a set of data (the distribution) and provides extra detail regarding individual values.

o They are usually used when there are large amounts of numbers to analyze. Series of scores on sports teams, series of temperatures or rainfall over a period of time, and series of classroom test scores are examples of when Stem-and-Leaf Plots could be used.

Stem | Leaf
5    | 025589
6    | 012236689
7    | 012356688899
8    | 001235566688899
9    | 02234679

9. Time Series Graph

Month    Price of AOL    Price of MSFT
Jan      65              110
Feb      60              115
Mar      58              120
Apr      62              100
May      55              95
Jun      50              90
Jul      48              85
Aug      55              75
Sep      57              80
Oct      50              60
Nov      48              50
Dec      40              40

[Figure: Time Series Plot of AOL, MSFT. Price by Month, one line per variable.]

Types of Distributions: There are several different kinds of distributions, but the following are the most commonly used in statistics.

• Symmetric, normal, or bell shape

• Positively skewed, right tail, or skewed to the right side

• Negatively skewed, left tail, or skewed to the left side

• Uniform

• Symmetric, Bell Shape, or Normal Distribution

[Figure: Histogram of a symmetric, bell-shaped distribution.]

• Positively skewed

[Figure: Histogram of a positively skewed distribution, with the long tail to the right.]

• Negatively skewed

[Figure: Histogram of a negatively skewed distribution, with the long tail to the left.]

• Uniform

[Figure: Histogram of a uniform distribution.]

Test1 data with Sex (1=Female, 0=Male) and Grade:

Observations 1-25
Test1: 76 62 68 69 79 90 79 86 52 97 78 55 96 89 73 66 88 92 94 50 71 89 78 88 58
Sex:   1 1 1 1 0 0 1 1 0 1 1 1 1 1 0 0 1 0 1 1 0 0 1 0 1
Grade: C D D D C A C B F A C F A B C D B A A F C B C B F

Observations 26-50
Test1: 76 59 92 93 88 86 66 81 85 85 70 55 62 80 60 80 72 82 86 99 63 75 83 78 61
Sex:   1 1 1 1 0 0 0 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1
Grade: C F A A B B D B B B C F D B D B C B B A D C B C D

[Figure: Bar chart of Count by Sex (Male, Female), grouped by Grade (F, D, C, B, A).]

[Figure: Bar chart of Count by Grade (F, D, C, B, A), grouped by Sex (Male, Female).]

[Figure: Bar chart of Percent by Grade (F, D, C, B, A), grouped by Sex (Male, Female).]

[Figure: Boxplot of Test1 vs Sex (Female, Male).]

Numerical measurements:

• Statistic: any value(s) or measure(s) obtained from a sample.

• Parameter: any value(s) or measure(s) obtained from a specific population.

Measures of central tendency: the Mean, the Median, and the Mode.

Mean: is defined to be the sum of the scores in the data set divided by the total number of scores.

o Sample Mean: is denoted by $\bar{x}$, and it is defined by:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \text{or simply} \quad \bar{x} = \frac{\sum x}{n}.$$

o Population Mean: is denoted by $\mu$, and it is defined by:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \quad \text{or simply} \quad \mu = \frac{\sum x}{N}.$$

Note: The sample mean, $\bar{x}$, is an unbiased estimate of the population mean, $\mu$.

Example1: Find the mean of 10, 7, 3, 12, 18.

$$\bar{x} = \frac{10 + 7 + 3 + 12 + 18}{5} = \frac{50}{5} = 10.$$

Example2: Find the mean of 10, 7, 3, 12, 18, 13, 17, 15, 25, 30.

$$\bar{x} = \frac{10 + 7 + 3 + 12 + 18 + 13 + 17 + 15 + 25 + 30}{10} = \frac{150}{10} = 15.$$

Example3: Find the mean of the scores on test #1, 2004, in the data set in this chapter.

$$\bar{x} = \frac{76 + 62 + \cdots + 78 + 61}{50} = \frac{3840}{50} = 76.8.$$
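As a quick check of Examples 1 and 2, the mean can be computed directly from its definition. The helper name `mean` is just an illustrative choice; Example 2 is entered with 30 as its last value, which is what the worked sum of 150 uses.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

example1 = [10, 7, 3, 12, 18]
example2 = [10, 7, 3, 12, 18, 13, 17, 15, 25, 30]  # last value 30, as in the worked sum

print(mean(example1))  # 50 / 5
print(mean(example2))  # 150 / 10
```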


Median: is defined to be the midpoint of the data set that is arranged from smallest to largest.

Example4: Find the median of 10, 7, 3, 12, 15.
Solution: First we must sort the data set as follows: 3, 7, 10, 12, 15. The median is 10.

Example5: Find the median of 10, 7, 3, 12, 15, 20.
Solution: After we sort we get: 3, 7, 10, 12, 15, 20. As we observe, there are 2 middle observations. So to find the median we average these 2 values, namely: Median = (10+12)/2 = 11.

Example6: The median of the scores on test #1, 2004, in the data set is Median = 78.50.

Mode: is defined to be the value in the data set that occurs most frequently.

Example7A: Find the mode of 10, 7, 3, 12, 15, 3. The mode is 3.

Example7B: Find the mode of 10, 7, 3, 10, 15, 3. The modes are 3 and 10.

Example7C: Find the mode of 10, 7, 3, 10, 10, 3. The mode is 10.

Example7D: Find the mode of 10, 7, 3, 10, 7, 3. There is no mode, since all values occur with the same frequency.

Example7E: Find the mode of 10, 7, 3, 12, 15, 18. There is no mode, since no value occurs more than once.
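The median and mode rules above can be sketched in Python as follows. The function names `median` and `modes` are illustrative. `modes` follows the convention used in Examples 7D and 7E: when two or more distinct values all occur equally often it reports no mode, which differs from some library functions that return every value in that case.

```python
from collections import Counter

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """Most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    # Several distinct values, all equally frequent: no mode (Examples 7D, 7E)
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(median([10, 7, 3, 12, 15]))       # Example 4
print(median([10, 7, 3, 12, 15, 20]))   # Example 5
print(modes([10, 7, 3, 10, 15, 3]))     # Example 7B
```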

Example 8: Find the mean, the median, and the mode of the data set:
10, 17, 13, 12, 15, 18, 10, 17, 14, 16, 35, 28, 22, 17, 23, 12, 15, 28, 10, 20

Solution: First we must sort the data set:
10, 10, 10, 12, 12, 13, 14, 15, 15, 16, 17, 17, 17, 18, 20, 22, 23, 28, 28, 35

o Mean: $\bar{x} = \dfrac{10 + 10 + 10 + 12 + \cdots + 28 + 28 + 35}{20} = \dfrac{352}{20} = 17.6$

o Median: $\dfrac{16 + 17}{2} = 16.5$, since there are 2 middle observations

o Mode: 10, 17

Example 9: Find the mean, the median, and the mode of the data set:
25, 42, 18, 37, 25, 18, 40, 57, 64, 66, 85, 86, 92,
85, 88, 92, 67, 33, 75, 85, 48, 60, 80, 60, 50

Example10: Find the mean, the median, and the mode of the data set: 12.37, 13.33, 32.67, 12.37, 26.45

Example11A: Find the mean for the following grouped data.

Class    Frequency
50-59    6
60-69    9
70-79    12
80-89    15
90-99    8

Solution: First we need to find the class marks (midpoints), and then we use the following formula:

$$\bar{x} = \frac{\sum [x \cdot f]}{n},$$

where
x : is the midpoint or class mark,
f : is the frequency,
n : is the number of data points.

Class    Frequency f    Class mark x    x·f
50-59    6              54.5            327
60-69    9              64.5            580.5
70-79    12             74.5            894
80-89    15             84.5            1267.5
90-99    8              94.5            756
         n = ∑f = 50                    ∑ x·f = 3825

So the mean is

$$\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$$
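The grouped-mean computation can be sketched in Python as below. The names `classes` and `grouped_mean` are illustrative; each class is stored as (lower limit, upper limit, frequency), and the class mark is taken as the average of the two limits.

```python
# (lower class limit, upper class limit, frequency) for each class
classes = [(50, 59, 6), (60, 69, 9), (70, 79, 12), (80, 89, 15), (90, 99, 8)]

def grouped_mean(rows):
    """Mean of grouped data: sum of (class mark * frequency) divided by n."""
    n = sum(f for _, _, f in rows)
    total = sum((lower + upper) / 2 * f for lower, upper, f in rows)
    return total / n

print(grouped_mean(classes))  # 3825 / 50
```

Note that the grouped mean (76.5) is close to, but not the same as, the mean of the raw scores (76.8), because every score in a class is treated as if it sat at the class mark.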

Example11B: Find the mean for the following grouped data.

Class    Frequency
00-04    4
05-09    10
10-14    12
15-19    20
20-24    8
25-29    6

Weighted Average (Mean): The formula above is also called the weighted average or weighted mean. It can also be written as follows:

$$\bar{x} = \frac{\sum [w \cdot x]}{\sum w},$$

where w is the weight and x is the score.

Example12: Find the GPA of John, who has the following courses with the corresponding units and grades.

English    5 units with the grade of A
Math       3 units with the grade of F
Spanish    2 units with the grade of D

Solution: In this problem, x will be the value of the grades and w is the number of units.

$$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{[5 \cdot 4] + [3 \cdot 0] + [2 \cdot 1]}{5 + 3 + 2} = \frac{20 + 0 + 2}{10} = \frac{22}{10} = 2.2.$$

Example13: A teacher is teaching 3 classes. There are 30 students in the first class with an average of 70 on the final exam. The second class has 40 students with an average of 60 on the final exam. The 3rd class has 20 students with an average of 80 on the final exam. Find the weighted average of the three classes combined.

Solution: Let x be the class average and w be the number of students.

$$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{70(30) + 60(40) + 80(20)}{30 + 40 + 20} = \frac{2100 + 2400 + 1600}{90} = \frac{6100}{90} \approx 67.8$$
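Both weighted-mean examples follow the same pattern, which the short sketch below makes explicit. The function name `weighted_mean` is illustrative, and the grade points A=4, D=1, F=0 are the values used in the solution to Example 12.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 12: grade points (A=4, F=0, D=1) weighted by units
gpa = weighted_mean([4, 0, 1], [5, 3, 2])

# Example 13: class averages weighted by class sizes
combined = weighted_mean([70, 60, 80], [30, 40, 20])

print(gpa, round(combined, 1))
```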

Measures of Variation

• Range
• Variance
• Standard Deviation

The Range: is defined to be the highest value minus the lowest value in the data set.

The Variance: is defined by the following:

Sample:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \quad \text{or} \quad s^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1}$$

(the second form is the shortcut formula for the sample variance).

Population:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{N}}{N}$$

(the second form is the shortcut formula for the population variance).

Standard deviation: is the positive square root of the variance.

$$\text{Standard deviation} = \sqrt{\text{Variance}}$$

Sample: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$, and

Population: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Example14A: Find the range, variance, and the standard deviation of the following data set: 3, 0, 7, 5, 15.

Solution:

o Range: Largest - Smallest = 15 - 0 = 15

o Variance: If we use the formula

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1},$$

first we need to find the sample mean $\bar{x}$. So

$$\bar{x} = \frac{3 + 0 + 7 + 5 + 15}{5} = \frac{30}{5} = 6,$$

then we substitute in the above formula and we get

$$s^2 = \frac{(3-6)^2 + (0-6)^2 + (7-6)^2 + (5-6)^2 + (15-6)^2}{5-1} = \frac{(-3)^2 + (-6)^2 + (1)^2 + (-1)^2 + (9)^2}{5-1} = \frac{9 + 36 + 1 + 1 + 81}{5-1} = \frac{128}{4} = 32.$$

So the variance is $s^2 = 32$.

The same computation in table form:

x     $x - \bar{x}$    $(x - \bar{x})^2$
3     3-6=-3           9
0     0-6=-6           36
7     7-6=1            1
5     5-6=-1           1
15    15-6=9           81
      $\sum (x - \bar{x}) = 0$    $\sum (x - \bar{x})^2 = 128$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{128}{5-1} = \frac{128}{4} = 32$$

o Standard deviation: As we know, the standard deviation is the positive square root of the variance.

$$\text{standard deviation} = \sqrt{\text{Variance}} = \sqrt{32} \approx 5.66$$

But if we use the shortcut formula

$$s^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n-1},$$

first we need to find their sum, $\sum x$, and their sum of squares, $\sum x^2$.

$$\sum x = 3 + 0 + 7 + 5 + 15 = 30$$

$$\sum x^2 = 3^2 + 0^2 + 7^2 + 5^2 + 15^2 = 9 + 0 + 49 + 25 + 225 = 308$$

Then we have

$$s^2 = \frac{308 - \dfrac{(30)^2}{5}}{5-1} = \frac{308 - \dfrac{900}{5}}{4} = \frac{308 - 180}{4} = \frac{128}{4} = 32,$$

which is exactly the same as above.

----------------------------------------------------------------------------------
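The agreement between the definition and the shortcut formula can be checked numerically. The sketch below uses the data of Example 14A; the function names are illustrative.

```python
import math

def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Shortcut: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [3, 0, 7, 5, 15]
variance = sample_variance(data)
print(max(data) - min(data))                 # range
print(variance, sample_variance_shortcut(data))
print(round(math.sqrt(variance), 2))         # standard deviation
```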

Example14B: Find the range, variance, and the standard deviation of the following data set.
10, 17, 13, 12, 15, 18, 10, 17, 14, 16,
28, 22, 17, 23, 12, 15, 28, 10, 20, 35

Solution:

Example15A: Find the standard deviation for the following grouped data.

Class    Frequency
50-59    6
60-69    9
70-79    12
80-89    15
90-99    8

Solution: First we modify the above formula for the variance. We need to find the class marks (midpoints), and then we use the following formula:

$$s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n-1} \quad \text{or} \quad s^2 = \frac{\sum [x^2 \cdot f] - \dfrac{(\sum x f)^2}{n}}{n-1},$$

where
x : is the midpoint or class mark,
f : is the frequency,
n : is the number of data points.

We already know the mean:

$$\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$$

Class    f     x       x·f       $(x_i - \bar{x})^2$     $(x_i - \bar{x})^2 \cdot f$
50-59    6     54.5    327       (54.5-76.5)²=484        2904
60-69    9     64.5    580.5     (64.5-76.5)²=144        1296
70-79    12    74.5    894       (74.5-76.5)²=4          48
80-89    15    84.5    1267.5    (84.5-76.5)²=64         960
90-99    8     94.5    756       (94.5-76.5)²=324        2592
         n = ∑f = 50   ∑ x·f = 3825                      $\sum (x_i - \bar{x})^2 \cdot f = 7800$

After substitution in $s^2 = \dfrac{\sum (x_i - \bar{x})^2 \cdot f}{n-1}$ we get

$$s^2 = \frac{7800}{50-1} = 159.18,$$

and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.6$.

If we use the shortcut formula

$$s^2 = \frac{\sum [x^2 \cdot f] - \dfrac{(\sum x f)^2}{n}}{n-1},$$

we need the following table.

Class    f     x       x·f       x²·f
50-59    6     54.5    327       (54.5)²·6 = 17821.5
60-69    9     64.5    580.5     (64.5)²·9 = 37442.25
70-79    12    74.5    894       (74.5)²·12 = 66603
80-89    15    84.5    1267.5    (84.5)²·15 = 107103.75
90-99    8     94.5    756       (94.5)²·8 = 71442
         n = ∑f = 50   ∑ x·f = 3825    ∑ x²·f = 300412.5

$$s^2 = \frac{300412.5 - \dfrac{(3825)^2}{50}}{50-1} = \frac{300412.5 - \dfrac{14630625}{50}}{49} = \frac{300412.5 - 292612.5}{49} = \frac{7800}{49} = 159.18,$$

and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.6$, which is the same as the above result.
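Both grouped-data variance formulas can be checked with a short Python sketch. The names `rows`, `grouped_variance`, and `grouped_variance_shortcut` are illustrative; each row holds a class mark and its frequency.

```python
import math

# (class mark x, frequency f) for each class
rows = [(54.5, 6), (64.5, 9), (74.5, 12), (84.5, 15), (94.5, 8)]

def grouped_variance(rows):
    """Definition: sum of (x - mean)^2 * f, divided by n - 1."""
    n = sum(f for _, f in rows)
    mean = sum(x * f for x, f in rows) / n
    return sum((x - mean) ** 2 * f for x, f in rows) / (n - 1)

def grouped_variance_shortcut(rows):
    """Shortcut: (sum of x^2 * f - (sum of x * f)^2 / n) divided by n - 1."""
    n = sum(f for _, f in rows)
    sum_xf = sum(x * f for x, f in rows)
    sum_x2f = sum(x * x * f for x, f in rows)
    return (sum_x2f - sum_xf ** 2 / n) / (n - 1)

s2 = grouped_variance(rows)
print(round(s2, 2), round(grouped_variance_shortcut(rows), 2))
print(round(math.sqrt(s2), 1))  # standard deviation
```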

Example15B: Find the standard deviation for the following grouped data.

Class    Frequency
00-04    4
05-09    10
10-14    12
15-19    20
20-24    8
25-29    6

Question 1. What will happen to the mean, median, mode, range, and standard deviation if we add a fixed number, c, to all values in the data set?

Answer. The mean, median, and mode will increase by c units, but the range and standard deviation will not change.

Question 2. What will happen to the mean, median, mode, range, and standard deviation if we subtract a fixed number, c, from all values in the data set?

Answer. The mean, median, and mode will decrease by c units, but the range and standard deviation will not change.

Question 3. What will happen to the mean, median, mode, range, and standard deviation if we multiply all values in the data set by a fixed number, c?

Answer. The mean, median, and mode will be multiplied by c, and the range and standard deviation will be multiplied by |c|.

Example 16:

          X       X+7        X-7        X*7
          15      15+7=22    15-7=8     15*7=105
          13      13+7=20    13-7=6     13*7=91
          15      15+7=22    15-7=8     15*7=105
          15      15+7=22    15-7=8     15*7=105
          22      22+7=29    22-7=15    22*7=154
Mean      16      16+7=23    16-7=9     16*7=112
Median    15      15+7=22    15-7=8     15*7=105
Mode      15      15+7=22    15-7=8     15*7=105
Range     9       9          9          9*7=63
Sd        3.46    3.46       3.46       3.46*7=24.22

In general, if Y = aX + b, then we have

• Mean of Y = a·[Mean of X] + b, or $\bar{y} = a\bar{x} + b$

• Standard deviation of Y = |a|·[standard deviation of X], or $S_Y = |a| S_X$
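The rule for Y = aX + b can be verified on the data of Example 16 with Python's standard `statistics` module. The helper `linear` is an illustrative name, and the last pair (a = -2, b = 5) is an extra case, not from the notes, added to show why the absolute value |a| is needed.

```python
import statistics

x = [15, 13, 15, 15, 22]  # the X column of Example 16

def linear(values, a, b):
    """Apply y = a*x + b to every value."""
    return [a * v + b for v in values]

# (a, b) pairs: add 7, subtract 7, multiply by 7, and a negative multiplier
for a, b in [(1, 7), (1, -7), (7, 0), (-2, 5)]:
    y = linear(x, a, b)
    print(a, b, statistics.mean(y), round(statistics.stdev(y), 2))
```

With a negative multiplier the mean changes sign along with a, but the standard deviation stays positive, which is exactly what $S_Y = |a| S_X$ says.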

Empirical Rule

If the distribution of a data set is bell shaped or normal, then

• Approximately 68% of scores are within one standard deviation of the mean. They fall in the interval $(\bar{x} - 1s, \ \bar{x} + 1s)$.

• Approximately 95% of scores are within two standard deviations of the mean. They fall in the interval $(\bar{x} - 2s, \ \bar{x} + 2s)$.

• Approximately 99.7% of scores are within three standard deviations of the mean. They fall in the interval $(\bar{x} - 3s, \ \bar{x} + 3s)$.

Example17. Suppose the IQ scores are normally distributed with a mean of $\mu = 100$ and a standard deviation of $\sigma = 15$. Then by the empirical rule

• Approximately 68% of scores are in the interval 100-15 to 100+15, or 85 to 115.

• Approximately 95% of scores are in the interval 100-2(15) to 100+2(15), or 70 to 130.

• Approximately 99.7% of scores are in the interval 100-3(15) to 100+3(15), or 55 to 145.
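The three percentages of the empirical rule can be checked against an exact normal model. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names `iq` and `coverage` are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)  # the IQ model of Example 17

coverage = {}
for k in (1, 2, 3):
    low = iq.mean - k * iq.stdev
    high = iq.mean + k * iq.stdev
    # Share of the distribution lying between low and high
    coverage[k] = iq.cdf(high) - iq.cdf(low)
    print(f"{low:.0f} to {high:.0f}: {coverage[k]:.1%}")
```

The exact shares are slightly different from the rounded 68%, 95%, and 99.7%, which is why the rule says "approximately".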

Coefficient of Variation

The coefficient of variation is defined to be the standard deviation divided by the mean.

$$\text{Coefficient of variation (CV)} = \frac{s}{\bar{x}}$$

If $\bar{x}$ is 0 or close to 0, then this measure should not be used.

Normally this measure is used when we have 2 or more groups of data with different units.

Example18.

Class A    Mean = 129, standard deviation = 11    CV = 11/129 = .085 or 8.5%
Class B    Mean = 150, standard deviation = 25    CV = 25/150 = .167 or 16.7%
Class C    Mean = 60, standard deviation = 15     CV = 15/60 = .25 or 25.0%

Class C has the greatest relative variation.
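The comparison in Example 18 can be sketched as follows. The function name `coefficient_of_variation` and the dictionary layout are illustrative; the function refuses a zero mean, in line with the caution above.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation divided by the mean; undefined when the mean is 0."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean

# class name -> (mean, standard deviation)
classes = {"A": (129, 11), "B": (150, 25), "C": (60, 15)}
cv = {name: coefficient_of_variation(m, s) for name, (m, s) in classes.items()}

for name, value in cv.items():
    print(f"Class {name}: {value:.1%}")

most_variable = max(cv, key=cv.get)  # class with the largest relative variation
print(most_variable)
```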

Measures of Position

• Standard Scores

$$z = \frac{x - \bar{x}}{s} \quad \text{or} \quad z = \frac{x - \mu}{\sigma},$$

where $\bar{x}$ or $\mu$ is the mean and $s$ or $\sigma$ is the standard deviation. This value, z, measures the deviation from the mean in number of standard deviations, and it has no unit.

Example19. Suppose John is taking 3 classes with the following scores. In which class does he have the better score?

Class A    English test score = 145      Mean = 129, standard deviation = 11    Z = (145-129)/11 = 1.45
Class B    Physics test score = 190      Mean = 150, standard deviation = 25    Z = (190-150)/25 = 1.60
Class C    Statistics test score = 88    Mean = 60, standard deviation = 15     Z = (88-60)/15 = 1.87

So his score in class C is relatively the highest.
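The z-score comparison of Example 19 can be sketched in Python. The names `z_score`, `tests`, and `best` are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

# subject -> (John's score, class mean, class standard deviation)
tests = {
    "English": (145, 129, 11),
    "Physics": (190, 150, 25),
    "Statistics": (88, 60, 15),
}
z = {subject: z_score(*values) for subject, values in tests.items()}

for subject, value in z.items():
    print(f"{subject}: z = {value:.2f}")

best = max(z, key=z.get)  # subject with the highest relative score
print(best)
```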

Percentiles

The percentile corresponding to a given score (X) is denoted by P, and it is given by the following formula:

$$P = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100$$

Example20. John has a score of 88 in a class of 20 students. Find the percentile rank of his score.
81, 65, 75, 76, 78, 62, 63, 65, 70, 90, 61, 75, 76, 79, 58, 88, 82, 95, 90, 67.

Solution: In any problem of finding a percentile, we must sort the data set from smallest to largest.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95.

$$P = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100 = \frac{16}{20} \cdot 100 = 80$$

So John's score is at the 80th percentile, which means 80% of all scores are below 88.
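The percentile formula translates directly into code. In this sketch the function name `percentile_rank` is illustrative; sorting is not needed for the count itself, only for reading the answer off by hand.

```python
scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]

def percentile_rank(data, x):
    """Percentage of scores in data that are strictly less than x."""
    below = sum(v < x for v in data)
    return 100 * below / len(data)

print(percentile_rank(scores, 88))  # John's score
```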

Finding the Score Corresponding to a Given Percentile

Example21. In the data set of Example 20, find the score corresponding to the 12th percentile.

Solution:

Step1: Make sure the data is sorted.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95

Step2: Compute L = p% of n, where L is the location of the score. In this example L = 12% of 20 = 0.12(20) = 2.4, which rounds up to 3.

Step3: Go to the data set and pick the score at the 3rd position, which is 62. It is usually written as P12 = 62.

Note:
• If L is not a whole number, round up to the next whole number.
• If L is a whole number, use the average of the scores at the Lth and (L+1)th locations.

Example22. In the data set of Example 20, find the score corresponding to the 40th percentile.

Step1: As before.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95

Step2: L = 40% of 20 = 0.40(20) = 8, which is a whole number, so we are going to pick the average of the 8th and 9th scores.

Step3: The 8th score is 70, the 9th score is 75, and their average is (70+75)/2 = 72.5. So P40 = 72.5.
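The three steps and the rounding note can be collected into one small function. This is a sketch of the location rule used in these notes; the name `percentile_score` is illustrative, and statistical software often uses other interpolation rules, so its percentiles may differ slightly. The location is computed as p·n/100 with whole numbers to avoid floating-point surprises such as 0.12 × 20.

```python
import math

def percentile_score(data, p):
    """Score at the p-th percentile (0 < p < 100) by the location rule of these notes."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)                 # Step 1: sort the data
    location = p * len(ordered) / 100      # Step 2: L = p% of n
    if location != int(location):
        # L is not whole: round up and take the score at that (1-based) position
        return ordered[math.ceil(location) - 1]
    k = int(location)
    # L is whole: average the scores at positions L and L + 1
    return (ordered[k - 1] + ordered[k]) / 2

scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]

print(percentile_score(scores, 12))  # Example 21
print(percentile_score(scores, 40))  # Example 22
```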

Deciles: divide the data set into 10 groups.
D1 = 10th percentile, which is the same as P10
D2 = 20th percentile, which is the same as P20
...
D9 = 90th percentile, which is the same as P90

Quartiles: divide the data set into 4 groups.
Q1 = first quartile or 25th percentile, which is the same as P25
Q2 = second quartile or 50th percentile, which is the same as P50. This is also the median.
Q3 = third quartile or 75th percentile, which is the same as P75

Inter-Quartile Range (IQR): the difference between the 3rd and 1st quartiles, defined by IQR = Q3 – Q1.

Example23. In the data set of Example 20, find the score corresponding to
• D2
• Q1
• Q3
• IQR

Outlier: An outlier is an extremely high or an extremely low data value. To check for outliers we compute Q1-1.5(IQR) and Q3+1.5(IQR). Then, if

• the suspected score is below Q1-1.5(IQR), or
• the suspected score is above Q3+1.5(IQR),

the score is said to be an outlier.

Example24. Is there any outlier in the following data set?

55 55 36 52 51
46 49 41 47 61
46 51 86 44 51
41 41 53 51 48

Sorted Data

36 41 41 41 44 46 46 47 48 49
51 51 51 51 52 53 55 55 61 86
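One way to carry out the outlier check on this data set is sketched below. It assumes the quartiles are found with the percentile location rule from these notes (round L up, or average positions L and L+1 when L is whole); the helper names `percentile_score` and `find_outliers` are illustrative, and software that uses a different quartile rule may give slightly different fences.

```python
import math

def percentile_score(ordered, p):
    """Percentile by the location rule of these notes; `ordered` must be sorted."""
    location = p * len(ordered) / 100
    if location != int(location):
        return ordered[math.ceil(location) - 1]
    k = int(location)
    return (ordered[k - 1] + ordered[k]) / 2

def find_outliers(data):
    """Return (lower fence, upper fence, list of outliers)."""
    ordered = sorted(data)
    q1 = percentile_score(ordered, 25)
    q3 = percentile_score(ordered, 75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return lower, upper, [x for x in ordered if x < lower or x > upper]

data = [55, 55, 36, 52, 51, 46, 49, 41, 47, 61,
        46, 51, 86, 44, 51, 41, 41, 53, 51, 48]

lower, upper, outliers = find_outliers(data)
print(lower, upper, outliers)
```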


Five commonly used Statistics: The five numbers in any data set that are used frequently are

Minimum, Q1, Q2, Q3, Maximum

Box plot or box-and-whisker plot: is another graphical representation of any data set. We use the five commonly used statistics to graph the box plot. The box plot can provide answers to the following questions:

o Is a factor significant?
o Does the location differ between subgroups?
o Does the variation differ between subgroups?
o Are there any outliers?

Example25. In the data set of Example 20, find the 5 common statistics.
58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95

1. Minimum: is 58.

2. Q1: L = 25% of 20 = 0.25(20) = 5. Since this is a whole number, we use the average of the 5th and 6th observations. In the above ordered data set the 5th score is 65 and the 6th score is 65, so their average is also 65. So Q1 = 65.

3. Q2: L = 50% of 20 = 0.50(20) = 10. Again, since this is a whole number, we use the average of the 10th and 11th observations. In the above ordered data set the 10th score is 75 and the 11th score is 76, so their average is (75+76)/2 = 75.5. So Q2 = 75.5.

4. Q3: L = 75% of 20 = 0.75(20) = 15. This is a whole number, so we use the average of the 15th and 16th observations. In the above ordered data set the 15th score is 81 and the 16th score is 82, so their average is (81+82)/2 = 81.5. So Q3 = 81.5.

5. Maximum: is 95.

So the five statistics are 58, 65, 75.5, 81.5, and 95.

[Figure: Boxplot of C1.]
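The five statistics of Example 25 can be computed with a short sketch that reuses the percentile location rule of these notes. The helper names `percentile_score` and `five_number_summary` are illustrative; library routines that interpolate quartiles differently may return slightly different values of Q1 and Q3.

```python
import math

def percentile_score(ordered, p):
    """Percentile by the location rule of these notes; `ordered` must be sorted."""
    location = p * len(ordered) / 100
    if location != int(location):
        return ordered[math.ceil(location) - 1]
    k = int(location)
    return (ordered[k - 1] + ordered[k]) / 2

def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    ordered = sorted(data)
    q1, q2, q3 = (percentile_score(ordered, p) for p in (25, 50, 75))
    return ordered[0], q1, q2, q3, ordered[-1]

scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]

summary = five_number_summary(scores)
print(summary)
```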

Example26 In data set of example 24, find the 5 common statistics.


Example27. For the data set below, use a computer to find the descriptive statistics and plot all appropriate charts discussed so far for all variables.

Observations 1-25
Test1: 76 62 68 69 79 90 79 86 52 97 78 55 96 89 73 66 88 92 94 50 71 89 78 88 58
Sex:   1 1 1 1 0 0 1 1 0 1 1 1 1 1 0 0 1 0 1 1 0 0 1 0 1
Grade: C D D D C A C B F A C F A B C D B A A F C B C B F

Observations 26-50
Test1: 76 59 92 93 88 86 66 81 85 85 70 55 62 80 60 80 72 82 86 99 63 75 83 78 61
Sex:   1 1 1 1 0 0 0 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1
Grade: C F A A B B D B B B C F D B D B C B B A D C B C D

Descriptive Statistics: Test1

Variable  Sex     N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Test1     Female  34  0   75.59  2.36     13.76  50.00    62.00  77.00   86.50  99.00
          Male    16  0   79.38  2.77     11.10  52.00    71.50  83.50   88.00  92.00

[Figure: Boxplot of Test1 vs Sex (Female, Male).]