5. Hypothesis Testing

April 12, 2017 | Author: Mae Hodge | Category: N/A

A hypothesis is a theory or statement which has yet to be proven or disproven. Statistical methods will never prove or disprove a hypothesis with certainty.

5.1 Examples of hypotheses

1. Smoking causes lung cancer.
2. Sales in a shop are greater on a Friday than on other weekdays.
3. 60% of the population are dissatisfied with the government.

A hypothesis always relates to populations, but statistical inference is based on results from a sample (or samples). For example, two samples are used to test Hypothesis 2 (the value of sales on a set of Fridays and the value of sales on a set of other weekdays), but the conclusions of the test relate to the whole population of "prospective weekdays".

Examples of hypotheses

A hypothesis can very often be presented in terms of population means or proportions.

1. The proportion of the population of smokers affected by lung cancer differs from the proportion in the population of non-smokers.
2. The mean value of sales on Fridays is greater than the mean value of sales on other weekdays.
3. 60% of the population are dissatisfied with the government.

Intuition behind hypothesis testing

We always choose between two hypotheses.

1. The null hypothesis, denoted H0, which always includes some equality, e.g. 60% of the population are dissatisfied with the government (H0: p = 0.60).
2. The alternative hypothesis, denoted HA or H1.

Possible Conclusions of a test

Initially, we assume H0 is correct. There are 2 possible conclusions of a test.

1. The evidence is not strong enough to reject H0. In this case the conclusion is 'Do not reject H0'.
2. There is strong enough evidence to reject H0 and accept HA. In this case the conclusion is 'Accept HA'.

5.2 Errors in statistical inference

Since our conclusions are never 100% certain, we should consider the possible errors. A type I error is made when H0 is true, but we reject it. A type II error is made when H0 is false, but we do not reject it. α (the probability of rejecting H0 when H0 is true) is called the significance level of a test.

                  Reality: H0 true    Reality: HA true
Conclusion: H0    OK                  Type II error
Conclusion: HA    Type I error        OK
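The meaning of the significance level can be checked by simulation. The sketch below is an illustration only: it assumes normally distributed data, a true mean equal to the hypothesised mean (so H0 holds), and the large-sample Z test with the 5% critical value 1.96. The function names, the sample size, and the parameter values are hypothetical choices and are not taken from the slides.

```python
import math
import random
import statistics

def z_test_rejects(sample, mu0, critical=1.96):
    """Two-sided large-sample Z test: True if H0: mu = mu0 is rejected."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation
    t = math.sqrt(n) * (xbar - mu0) / s   # realisation of the test statistic
    return abs(t) > critical

def type_i_error_rate(trials=2000, n=100, mu0=170.0, sigma=12.0, seed=1):
    """Fraction of samples generated under H0 (true mean = mu0) that reject H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if z_test_rejects(sample, mu0):
            rejections += 1               # H0 is true here, so this is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")  # should be close to alpha = 0.05
```

Every rejection counted here is a Type I error, because the data are generated with H0 true, so the estimated rate should lie near the chosen significance level.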

Relation between the probabilities of Type I and Type II errors

The significance level α is chosen. Obviously, we wish α to be small. However, decreasing α makes it harder to reject the null hypothesis even when the null hypothesis is false. Hence, decreasing α increases the probability β of a Type II error (not rejecting H0 when H0 is false). In order for both α and β to be small, we must have a large sample.
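The trade-off described above can be made concrete with a small calculation. This is a sketch under simplifying assumptions that go beyond the slides: a two-sided Z test, a known population standard deviation sigma, and a true mean that differs from µ0 by a fixed amount delta. The function name and the numerical values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def type_ii_error(alpha, n, delta, sigma):
    """Approximate beta for a two-sided Z test when the true mean is mu0 + delta.

    Assumes sigma is known and the sample mean is normally distributed.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta * n ** 0.5 / sigma   # mean of the test statistic when HA is true
    # H0 is not rejected when the test statistic falls in [-z_crit, z_crit].
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)

for alpha in (0.05, 0.01, 0.001):
    for n in (25, 100, 400):
        beta = type_ii_error(alpha, n, delta=2.0, sigma=12.0)
        print(f"alpha = {alpha:<5}  n = {n:<3}  beta = {beta:.3f}")
```

For a fixed sample size, beta grows as alpha shrinks; for a fixed alpha, beta shrinks as n grows, which is why a large sample is needed to keep both small.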

Commonly used significance levels

The significance level is normally taken to be 0.05 (5%). If for some reason we require strong evidence against H0 (e.g. the costs associated with wrongly accepting HA may be very high), we may test at a significance level of 0.01 or even 0.001 (1% or 0.1% respectively).

Example 5.1

Suppose we were given data on the salaries of males and females in comparable positions (say secondary school teachers). We may wish to test the hypothesis that the mean salary of all male secondary school teachers is equal to the mean salary of all female secondary school teachers. Since this hypothesis contains an equality, it must be H0 .

Example 5.1 (continued)

HA is the complement of H0, i.e. the hypothesis that the mean salaries of male and female secondary school teachers are not equal. Given our data, we would reject H0 if the mean salary of sampled males is "significantly different" from the mean salary of sampled females (the concept of "significantly different" is defined by the critical value for the given significance level; see later).

Suppose H0 is true (i.e. salary levels do not depend on sex). If we reject H0 in favour of HA (i.e. that salary levels depend on sex), we commit a Type I error.

Suppose HA is true (i.e. salary levels do depend on sex). If we do not reject H0 (that salaries do not depend on sex), then we commit a Type II error.

5.3 The procedure for testing

Hypothesis testing is based on a sample (or samples) of data. The standard procedure is as follows:

1. State H0 and HA.
2. Choose a test statistic (this is a random variable, a measure of the distance between the data and H0). Since the samples are random there will always be some variation, but if H0 is true then realisations of this distance will tend to be small.
3. Calculate the realisation of the test statistic (based on the sample).
4. Either
   a) compare the realisation of the test statistic with the appropriate critical value. If the distance of the data from H0 is greater than the distance given by this critical value, then we reject H0; or
   b) calculate the p-value of the test. This is a measure of the credibility of H0 (it is not the probability that H0 is true). If p < α (the significance level), then we reject H0.
5. State the conclusion clearly.

5.4 Interpretation of the p-value

SPSS gives the p-value for tests (in the SIG. column). If p < α, then we reject H0. More precisely, if

1. p < 0.05, there is evidence against H0.
2. p < 0.01, there is strong evidence against H0.
3. p < 0.001, there is very strong evidence against H0.

5.5 Testing a hypothesis about a population mean

The testing procedure depends on whether the sample is small (n < 30) or large. We wish to test between the two hypotheses

H0: µ = µ0
HA: µ ≠ µ0

5.5.2 Procedure for large samples

This test is called the Z test. The test statistic is

$$Z = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s} = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}$$

Note that this is a measure of distance from the null hypothesis µ = µ0. If the null hypothesis is correct, then realisations of this test statistic should be close to 0.

The larger the sample size n, the more significant a given difference $\bar{X} - \mu_0$.

The larger the dispersion of the data s (in which case we expect the distribution of $\bar{X}$ to be more dispersed), the less significant a given difference $\bar{X} - \mu_0$.

Distribution of the test statistic

For large samples, the sample mean will be approximately normally distributed. It follows that, when H0 is true,

$$\frac{\sqrt{n}\,(\bar{X} - \mu_0)}{\sigma} \sim N(0, 1).$$

Since s will be a good approximation of σ, the test statistic

$$Z = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s} = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}$$

is approximately N(0, 1) [i.e. standard normal].

Calculation of the p-value

Denote the realisation of the test statistic by t (this is obtained by substituting in the appropriate values for the sample mean and standard deviation). |t| is the absolute value of this realisation,

$$|t| = \begin{cases} t, & \text{if } t \ge 0 \\ -t, & \text{if } t < 0. \end{cases}$$

The p-value of such a test is given by 2P(Z > |t|), where Z ∼ N(0, 1).

Critical values for t-test

Instead of using the p-value, we can use the critical value for a given significance level. We reject H0 when t deviates strongly from 0.

1. At a 100α% significance level, we reject H0 iff |t| > t∞,α/2. In particular:
2. At a 5% significance level, we reject H0 iff |t| > t∞,0.025 = 1.96.
3. At a 1% significance level, we reject H0 iff |t| > t∞,0.005 = 2.576.
4. At a 0.1% significance level, we reject H0 iff |t| > t∞,0.0005 = 3.291.

Critical values and p-values

It should be noted that t∞,α/2 satisfies 2P(Z > t∞,α/2) = α, e.g. 2P(Z > 1.96) = 2 × 0.025 = 0.05. Hence, if |t| = t∞,α/2, then the p-value is α. Clearly, if |t| > t∞,α/2, then the p-value is less than α.

Hence, when the sample size is large

1. when |t| > 1.96, p < 0.05, i.e. we have evidence against H0.
2. when |t| > 2.576, p < 0.01, i.e. we have strong evidence against H0.
3. when |t| > 3.291, p < 0.001, i.e. we have very strong evidence against H0.
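The link between critical values and p-values for the large-sample test can be reproduced with Python's standard library. This is a minimal sketch: `critical_value` and `two_sided_p_value` are illustrative names, and `statistics.NormalDist` stands in for the printed normal tables.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def critical_value(alpha):
    """Large-sample critical value c satisfying 2 * P(Z > c) = alpha."""
    return Z.inv_cdf(1 - alpha / 2)

def two_sided_p_value(t):
    """p = 2 * P(Z > |t|) for a realisation t of the test statistic."""
    return 2 * (1 - Z.cdf(abs(t)))

for alpha in (0.05, 0.01, 0.001):
    c = critical_value(alpha)
    print(f"alpha = {alpha}: critical value = {c:.3f}, "
          f"p-value at the critical value = {two_sided_p_value(c):.4f}")
```

A realisation exactly at the critical value has a p-value equal to α, and any realisation further from 0 has a smaller p-value, so the two decision rules agree.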

Example 5.2

The heights of a sample of 100 Irish people were measured. Their mean height was 168cm and the sample standard deviation was 12cm. Test the hypothesis that the average height of Irish people is 170cm at a significance level of 5%.

Solution to Example 5.2- Hypotheses

i) The hypotheses are

H0: µ = 170
HA: µ ≠ 170

Solution to Example 5.2-Test Statistic

ii) Since the sample size is large, we use the test statistic

$$Z = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s} \sim N(0, 1),$$

where µ0 is the mean from the null hypothesis.

Solution to Example 5.2-Realisation of the Test Statistic

iii) Now we calculate the realisation of the test statistic (distance from H0).

$$t = \frac{\sqrt{100}\,(168 - 170)}{12} = -1.667.$$

Solution to Example 5.2-Comparison with Critical Value

iv) Now we compare this realisation with the appropriate critical value:

|t| = 1.667 < t∞,0.025 = 1.96.

Solution to Example 5.2-Conclusion

v) Finally, we make our conclusion. Since |t| < t∞,0.025 = 1.96, there is no evidence to reject H0 . In conclusion, we do not reject the hypothesis that the mean height of Irish people is 170cm.

Solution to Example 5.2-Calculating the p-value

Note that at stage iv) for large samples we can calculate the p-value. p = 2P(Z > |t|) ≈ 2P(Z > 1.67) = 2×0.0475 = 0.095 > α = 0.05. Since the p-value (measure of the credibility of H0 ) is greater than the significance level, we do not reject H0 .
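The whole of Example 5.2 can be checked in a few lines. The sketch below works from the summary statistics given in the example. The function name `one_sample_z_test` is illustrative, and the p-value uses the unrounded realisation, so it differs slightly in the third decimal place from the value obtained from tables.

```python
import math
from statistics import NormalDist

def one_sample_z_test(n, sample_mean, sample_sd, mu0):
    """Large-sample Z test of H0: mu = mu0 against HA: mu != mu0."""
    t = math.sqrt(n) * (sample_mean - mu0) / sample_sd   # realisation of Z
    p = 2 * (1 - NormalDist().cdf(abs(t)))                # two-sided p-value
    return t, p

alpha = 0.05
t, p = one_sample_z_test(n=100, sample_mean=168, sample_sd=12, mu0=170)
print(f"t = {t:.3f}, p = {p:.3f}")
print("Reject H0" if p < alpha else "Do not reject H0")
```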

5.5.3 Procedure for small samples

Suppose there are fewer than 30 observations. In this case the test statistic is

$$T = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s} = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$

In this case, since s will not be a very good estimator of σ, the approximation to the standard normal distribution is not appropriate. This test is called the Student t-test. Provided the data come from a normal distribution, this test statistic will have a Student distribution with n − 1 degrees of freedom (T ∼ tn−1).

The student t-distribution

As mentioned before, the student distribution is very similar to the standard normal distribution. The student distribution is also symmetric about 0. Its variation is greater than the variation of the N(0,1) distribution. This reflects the uncertainty involved in the estimation of σ. As the number of observations n increases, the estimation of σ improves and for large n the tn−1 -distribution is very similar to the N(0,1) distribution. Hence, for n > 30 the critical values for the student distribution are approximated by the critical values of the normal distribution.

Critical values for the Student t-test

In this case the critical value for a test at a significance level of α is given by tn−1,α/2. These critical values can be read from Table 7. We reject H0 if |t| > tn−1,α/2. The critical value for a Z (large sample) test is given by t∞,α/2.
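When tables are not to hand, the critical values can be approximated numerically. Python's standard library has no Student distribution, so the sketch below integrates the t density with Simpson's rule and finds the critical value by bisection. It is a rough illustration, not a replacement for tables or a statistics package. All function names are hypothetical, and the search assumes the critical value is below 50 (true for the degrees of freedom and significance levels used in these notes).

```python
import math

def t_pdf(x, df):
    """Density of the Student t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=1000):
    """P(T > x) for x >= 0: Simpson's rule on [0, x] plus symmetry about 0."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(df, alpha):
    """Two-sided critical value c with 2 * P(T > c) = alpha, by bisection."""
    lo, hi = 0.0, 50.0   # assumption: the critical value lies in this interval
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * t_upper_tail(mid, df) > alpha:
            lo = mid     # tail probability still too large: move outwards
        else:
            hi = mid
    return (lo + hi) / 2

for df in (6, 15, 1000):
    print(f"df = {df}: 5% critical value = {t_critical(df, 0.05):.3f}")
```

For a large number of degrees of freedom the result approaches the normal critical value 1.96, in line with the remark that the Student distribution is then very similar to N(0,1).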

Assumption of normality

When a sample is small, if the data do not fit the normal distribution, then the student distribution will not be appropriate. In SPSS one may use the Kolmogorov-Smirnov test for normality (under non-parametric tests). In this test H0 states that the data come from a normal distribution. HA states that the data do not come from a normal distribution. Q-Q plots and histograms may also be used to graphically see how the data fit the normal distribution (under graphs in SPSS).

Example 5.3

The salaries of 16 adult Irish people were observed. The average salary was 2000 Euro and the sample standard deviation was 1000 Euro. Test the null hypothesis that the mean salary of all Irish adults is 1400 Euro at significance levels of 5% and 1%.

Solution to Example 5.3-Hypotheses

i) First we state the hypotheses

H0: µ = 1400
HA: µ ≠ 1400

Solution to Example 5.3-Test Statistic

ii) Since the sample size is small, the test statistic is

$$T = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})} = \frac{\sqrt{n}\,(\bar{X} - \mu_0)}{s} \sim t_{n-1}$$

under the assumptions of the test.

Since the distribution of salaries is certainly not normal, the assumption that the test statistic has a Student distribution is not appropriate.

Solution to Example 5.3-Realisation of Test Statistic

iii) We calculate the realisation of the test statistic

$$t = \frac{\sqrt{16}\,(2000 - 1400)}{1000} = 2.4$$

Solution to Example 5.3-Comparison with Critical Values

iv) We now compare this realisation with the appropriate critical values.

At a significance level of 5% this is t16−1,0.05/2 = t15,0.025 = 2.131.
At a significance level of 1% this is t16−1,0.01/2 = t15,0.005 = 2.947.

Solution to Example 5.3-Conclusions

v) Now we draw our conclusions.

At a significance level of 5%, |t| = 2.4 > t15,0.025 = 2.131. Hence, we reject H0 at a significance level of 5%.

At a significance level of 1%, |t| = 2.4 < t15,0.005 = 2.947. Hence, we do not reject H0 at a significance level of 1%.
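The arithmetic of Example 5.3 can be verified with a short script. This sketch works from the summary statistics in the example and takes the two critical values for 15 degrees of freedom as quoted from the tables. The function and variable names are illustrative.

```python
import math

def one_sample_t_statistic(n, sample_mean, sample_sd, mu0):
    """Realisation of T = sqrt(n) * (mean - mu0) / s."""
    return math.sqrt(n) * (sample_mean - mu0) / sample_sd

# Critical values t_{15, alpha/2}, as quoted in the example (read from tables).
CRITICAL_T15 = {0.05: 2.131, 0.01: 2.947}

t = one_sample_t_statistic(n=16, sample_mean=2000, sample_sd=1000, mu0=1400)
decisions = {alpha: abs(t) > c for alpha, c in CRITICAL_T15.items()}

print(f"t = {t:.1f}")
for alpha, reject in decisions.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"At the {alpha:.0%} level: {verdict}")
```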

Solution to Example 5.3-Final Conclusion

It follows that we have evidence against the hypothesis that the mean salary of all Irish adults is 1400 Euro (since we reject at the 5% level). However, this evidence is not strong (since we do not reject at a significance level of 1%). Also, we cannot make a strong conclusion, since the assumption of the test (i.e. that the data come from a normal distribution) is not appropriate. We should also look at the data set, since one extreme result may have a huge effect on the realisation of the test statistic.

5.7 Testing for a difference between two population means

In such tests we have 2 samples. In order to apply the appropriate test, we must first decide whether the samples are dependent or independent.

5.7.1 Dependent samples

Samples are dependent if a pair of observations is made on each member of one group of n individuals from a population under differing conditions, e.g.

1. The weight of one group of people before and after a diet.
2. The time one group of runners take to run 400m at sea level and at altitude.
3. The blood pressure of one group of patients before and after treatment for high blood pressure.

Dependent Samples

In such cases we must have n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn), where (xi, yi) denotes the pair of observations made on the i-th individual (1 ≤ i ≤ n).

We wish to test the hypothesis that µX − µY = k, where k is given, µX is the mean for the population under the conditions in which the first sample is taken, and µY is the mean for the population under the conditions in which the second sample is taken.

Define D = X − Y. Hence di = xi − yi (i.e. we calculate the differences between the two observations made on an individual).

Testing procedure

When the two samples are dependent, this test reduces to the following one-sample test:

H0: µD = k
HA: µD ≠ k

If the sample size is small, then this test assumes that these differences come from a normal distribution.

Example 5.4

A drug for reducing blood pressure is tested on a group of 7 patients. The maximum resting blood pressure of these patients was measured before and after treatment. Test the hypothesis that the drug has an effect on blood pressure.

Patient    Blood Pressure - Before    Blood Pressure - After
1          170                        160
2          190                        180
3          160                        150
4          180                        160
5          170                        160
6          160                        170
7          170                        150

Solution to Example 5.4

First we note that these observations were made on one group of patients. Hence, the samples are dependent (clearly, the blood pressure of an individual after treatment depends on his/her blood pressure before treatment).

One of the hypotheses must be that the drug affects blood pressure, i.e. µX ≠ µY, where µX is the mean blood pressure of all patients before treatment and µY is the mean blood pressure of all patients after treatment. Since this hypothesis does not contain an equality it must be the alternative HA. The null hypothesis is the complement of this hypothesis, i.e. µX = µY (µX − µY = 0).

Solution to Example 5.4-Hypotheses

i) Hence, we have

H0: µD = µX − µY = 0 (= µ0)
HA: µD = µX − µY ≠ 0

Solution to Example 5.4-Calculation of Differences

The first stage is to calculate the differences between the blood pressure before and after treatment (the di).

Patient    Before (X)    After (Y)    Difference (X − Y)
1          170           160          10
2          190           180          10
3          160           150          10
4          180           160          20
5          170           160          10
6          160           170          -10
7          170           150          20

Solution to Example 5.4-Test statistic

ii) We treat the test as a 1-sample test. Since the sample is small (n = 7),

$$T = \frac{\bar{D} - \mu_0}{\mathrm{S.E.}(\bar{D})} = \frac{\sqrt{n}\,(\bar{D} - \mu_0)}{s_D} \sim t_{n-1}$$

We have

$$\bar{D} = \frac{10 + 10 + \ldots + 20}{7} = 10$$

$$s_D^2 = \frac{(10 - 10)^2 + (10 - 10)^2 + \ldots + (20 - 10)^2}{6} = 100$$

Solution to Example 5.4-Realisation of the test statistic

iii) The realisation of the test statistic is

$$t = \frac{\sqrt{n}\,(\bar{D} - \mu_0)}{s_D} = \frac{\sqrt{7}\,(10 - 0)}{\sqrt{100}} \approx 2.6458$$

Solution to Example 5.4-Comparison with critical values

iv) We read the appropriate critical values from the tables.

At a significance level of 5% the critical value is tn−1,α/2 = t6,0.025 = 2.447. Since |t| = 2.6458 > t6,0.025 = 2.447, we reject H0 at a significance level of 5%.

At a significance level of 1% the critical value is tn−1,α/2 = t6,0.005 = 3.707. Since |t| = 2.6458 < t6,0.005 = 3.707, we do not reject H0 at a significance level of 1%.
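The paired test of Example 5.4 can be reproduced from the raw data. This is a minimal sketch using only the standard library. The variable names are illustrative, and the critical values for 6 degrees of freedom are those quoted from the tables.

```python
import math
import statistics

# Blood pressure of the 7 patients before and after treatment (Example 5.4).
before = [170, 190, 160, 180, 170, 160, 170]
after = [160, 180, 150, 160, 160, 170, 150]

# Dependent samples: reduce to a one-sample test on the differences d_i = x_i - y_i.
differences = [x - y for x, y in zip(before, after)]
n = len(differences)
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)        # sample standard deviation (divisor n - 1)
t = math.sqrt(n) * (d_bar - 0) / s_d       # H0: mu_D = 0

# Critical values t_{6, alpha/2}, as quoted in the example (read from tables).
CRITICAL_T6 = {0.05: 2.447, 0.01: 3.707}
decisions = {alpha: abs(t) > c for alpha, c in CRITICAL_T6.items()}

print(f"mean difference = {d_bar}, s_D = {s_d:.1f}, t = {t:.4f}")
for alpha, reject in decisions.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"At the {alpha:.0%} level: {verdict}")
```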

Solution to Example 5.4-Conclusion

v) In conclusion, we reject H0 at a significance level of 5% , but not at a significance level of 1%. We have evidence that the drug affects blood pressure, but this evidence is not strong (especially since the sample size is small and it is difficult to say whether the change in the blood pressure should follow a normal distribution).

5.7.2 Test for a difference between 2 population means given 2 independent samples

In this case the 2 samples are observations made on 2 different groups of subjects, e.g.

1. The weight of Americans and the weight of Japanese.
2. The sales of a shop on Saturdays and on weekdays.
3. The blood pressure of smokers and non-smokers.

Independent Samples

Note: Compare these samples with the examples given for dependent samples. Understanding the difference between dependent and independent samples is fundamental. The samples will be denoted by X1 , X2 , . . . , Xm and Y1 , Y2 , . . . , Yn .

We consider tests of the form

H0: µX − µY = 0
HA: µX − µY ≠ 0

i.e. we are interested in whether two population means differ or not.

5.7.2.1 Large Samples

When both samples are large, the test statistic is

$$Z = \frac{\bar{X} - \bar{Y}}{\mathrm{S.E.}(\bar{X} - \bar{Y})},$$

where $\bar{X}$, $\bar{Y}$ are the sample means, $s_X^2$, $s_Y^2$ are the sample variances, m, n are the sample sizes and

$$\mathrm{S.E.}(\bar{X} - \bar{Y}) = \sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}.$$

When H0 is true, we have approximately Z ∼ N(0, 1).

Example 5.5

The average earnings of 100 male teachers are 37 600 Euro with standard deviation 12 000. The average earnings of 50 female teachers are 36 900 Euro and the standard deviation is 10 000. Test the hypothesis that the average salaries of male teachers and female teachers are equal.

Solution to Example 5.5-Hypothesis

i) We have

H0: µX − µY = 0
HA: µX − µY ≠ 0

where µX is the mean salary of all male teachers and µY the mean salary of all female teachers.

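Solution to Example 5.5-Test Statistic

ii) Since both samples are large we use the test statistic

$$Z = \frac{\bar{X} - \bar{Y}}{\mathrm{S.E.}(\bar{X} - \bar{Y})} = \frac{\bar{X} - \bar{Y}}{\sqrt{(s_X^2/m) + (s_Y^2/n)}}.$$

We have

$$\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}} = \sqrt{\frac{12000^2}{100} + \frac{10000^2}{50}} \approx 1855$$

iii) Hence,

$$t = \frac{37600 - 36900}{1855} \approx 0.3774$$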

Solution to Example 5.5-Conclusion

iv) Since the test statistic has a standard normal distribution, we use the same approach for drawing our conclusion as used in the Z test. At a 5% significance level, reject H0 iff |t| > t∞,0.025 = 1.96. In this case, since |t| < 1.96 we do not reject H0 at a significance level of 5%. There is no evidence of a difference between the earnings of male and female teachers.
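Example 5.5 can be checked from its summary statistics. The sketch below assumes the large-sample formula for the standard error given earlier. The function name `two_sample_z_test` is illustrative, and the p-value, which the example does not report, is added only as a cross-check on the critical-value decision.

```python
import math
from statistics import NormalDist

def two_sample_z_test(mean_x, sd_x, m, mean_y, sd_y, n):
    """Large-sample Z test of H0: mu_X - mu_Y = 0 for two independent samples."""
    se = math.sqrt(sd_x ** 2 / m + sd_y ** 2 / n)   # S.E. of the difference in means
    t = (mean_x - mean_y) / se                      # realisation of Z
    p = 2 * (1 - NormalDist().cdf(abs(t)))          # two-sided p-value
    return se, t, p

se, t, p = two_sample_z_test(mean_x=37600, sd_x=12000, m=100,
                             mean_y=36900, sd_y=10000, n=50)
print(f"S.E. = {se:.0f}, t = {t:.4f}, p = {p:.3f}")
print("Reject H0" if abs(t) > 1.96 else "Do not reject H0")
```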

Small, Independent Samples

When at least one of the sample sizes is small the test assumes that:

1. The samples come from normal distributions.
2. The distributions from which the samples come have equal variance.

Checking Assumptions of the test when a sample is small

In order to check the second assumption, we should test

H0: σX² = σY²
HA: σX² ≠ σY²

where σX² and σY² are the variances of the two populations.

Levene's test for equality of population variances

The test statistic is given by F, where

$$F = \frac{s_{\max}^2}{s_{\min}^2},$$

$s_{\max}^2$ is the maximum of the two sample variances and $s_{\min}^2$ is the minimum of the two sample variances.

If the null hypothesis is correct, then this ratio should be close to 1. Values of the ratio significantly greater than 1 indicate that the hypothesis of the equality of the variances is false. We will only carry out such tests using SPSS. As usual, SPSS gives the p-value for such a test. By default, we test at the 5% level of significance.

Test for the equality of two population means (at least one small sample)

If we do not reject the hypothesis regarding the equality of the population variances, we can carry out the test for the equality of the two population means:

H0: µX − µY = 0
HA: µX − µY ≠ 0

The test statistic

The test statistic is

$$T = \frac{\bar{X} - \bar{Y}}{\mathrm{S.E.}(\bar{X} - \bar{Y})},$$

where

$$\mathrm{S.E.}(\bar{X} - \bar{Y}) = \sqrt{s_p^2 \left(\frac{1}{m} + \frac{1}{n}\right)}$$

and $s_p^2$ is the "pooled" variance:

$$s_p^2 = \frac{(m - 1)s_X^2 + (n - 1)s_Y^2}{m + n - 2}.$$

This statistic has a Student distribution with m + n − 2 degrees of freedom, T ∼ tm+n−2. We should use this procedure whenever at least one of the sample sizes is less than 30. However, if m + n − 2 > 30, then we may use the approximation tm+n−2,α/2 ≈ t∞,α/2.

Example 5.6

The table below gives statistics regarding the weights of Americans and Japanese.

             Sample size    mean    std. dev.
Americans    15             86      12
Japanese     10             72      10

Test the hypothesis that the average weights of Japanese and Americans are the same. Note: Levene's test for the equality of variances shows that there is no significant difference between the variances.

Solution to Example 5.6-Hypotheses

The hypotheses are

H0: µX = µY
HA: µX ≠ µY

where µX and µY are the mean weights of Americans and Japanese, respectively.

Solution to Example 5.6-Calculation of pooled variance

To calculate the realisation of the test statistic, we first calculate the pooled variance

$$s_p^2 = \frac{(m - 1)s_X^2 + (n - 1)s_Y^2}{m + n - 2} = \frac{14 \times 12^2 + 9 \times 10^2}{15 + 10 - 2} \approx 126.78$$

Solution to Example 5.6-Realisation of the test statistic

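Now we calculate the realisation of the test statistic.

$$T = \frac{\bar{X} - \bar{Y}}{\mathrm{S.E.}(\bar{X} - \bar{Y})} = \frac{\bar{X} - \bar{Y}}{\sqrt{s_p^2\,(1/m + 1/n)}}$$

$$t = \frac{86 - 72}{\sqrt{126.78\,(1/15 + 1/10)}} = \frac{14}{4.597} \approx 3.05$$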

Solution to Example 5.6-Critical values

The distribution of the test statistic is tm+n−2 = t23. The critical values are t23,α/2:

a. 5% level: t23,0.025 = 2.069 < |t|
b. 1% level: t23,0.005 = 2.807 < |t|
c. 0.1% level: t23,0.0005 = 3.767 > |t|
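The pooled-variance calculation of Example 5.6 can be checked as follows. This is a sketch from the summary statistics in the example. The function name is illustrative, and the critical values for 23 degrees of freedom are entered by hand from standard t tables, since the standard library has no Student distribution.

```python
import math

def pooled_t_statistic(mean_x, sd_x, m, mean_y, sd_y, n):
    """Pooled variance and t statistic for two independent samples,
    assuming equal population variances."""
    sp2 = ((m - 1) * sd_x ** 2 + (n - 1) * sd_y ** 2) / (m + n - 2)
    se = math.sqrt(sp2 * (1 / m + 1 / n))
    return sp2, (mean_x - mean_y) / se

# Americans: m = 15, mean 86, s.d. 12. Japanese: n = 10, mean 72, s.d. 10.
sp2, t = pooled_t_statistic(mean_x=86, sd_x=12, m=15, mean_y=72, sd_y=10, n=10)

# Critical values t_{23, alpha/2} from standard t tables (entered by hand).
CRITICAL_T23 = {0.05: 2.069, 0.01: 2.807, 0.001: 3.767}
decisions = {alpha: abs(t) > c for alpha, c in CRITICAL_T23.items()}

print(f"pooled variance = {sp2:.2f}, t = {t:.2f}")
for alpha, reject in decisions.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"At the {alpha:.1%} level: {verdict}")
```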

Solution to Example 5.6-Conclusion

We reject at the 1% level, but not at the 0.1% level. We have strong evidence that the average mass of Americans differs from the average mass of Japanese (Americans weigh more on average). Note: If Levene’s test indicates that the population variances are different, then this test is inappropriate. SPSS also carries out such tests under the assumption that the variances are different. Such tests will only be considered in the computer laboratories.

Copyright © 2017 SILO Inc.