Statistics for the Social Sciences (English edition), Lecture 4: Chi-Square Test (W)

Lecture 4: Chi-Square Test

This lecture covers
• Crosstabulation
• Chi-Square Test

Crosstabulation
• Crosstabulations are also called contingency tables or two-way frequency tables. In their simplest form they are the count of the categories of one variable for each category of another variable.
• For example, we might like to examine a crosstabulation of women's age with whether they have ever had a child.
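The definition above (counting the categories of one variable within each category of another) can be sketched in a few lines of Python. The records below are hypothetical and only illustrate the counting; they are not the survey data used in this lecture, and the helper name `crosstab` is an assumption.

```python
from collections import Counter

# Hypothetical case-level records: (5-year age group, ever had a child).
# These six cases are made up for illustration only.
records = [
    ("15-19", "no"), ("15-19", "no"), ("15-19", "yes"),
    ("20-24", "no"), ("20-24", "yes"), ("20-24", "yes"),
]

def crosstab(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = crosstab(records)
for row, cells in table.items():
    print(row, cells)
```

Each inner dictionary is one row of the crosstabulation, and each value is the count in one cell.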

• We call this a 7 × 2 table because it has 7 rows and 2 columns. A table with r rows and c columns is an r × c table. A two-way table shows the relationship between two categorical variables. In the cocaine study of Example 1 later in this lecture, the explanatory variable is the treatment (the drugs) and the response variable is success (no relapse) or failure (relapse); its two-way table gives the counts for all 6 combinations of values of these variables. Each of the counts occupies a cell of the table.
• We can also calculate row or column percentages. The following table shows column percentages. It presents the percentage age distribution for each category of WCEB (whether have any child). We can see straight away that the NO category is younger than the YES category.

5-year age group * whether have any child Crosstabulation
% within whether have any child

5-year age group    no        yes       Total
15-19               51.9%     0.0%      10.3%
20-24               36.0%     7.3%      13.0%
25-29                7.9%     22.1%     19.3%
30-34                1.5%     22.4%     18.3%
35-39                1.3%     14.4%     11.8%
40-44                0.6%     18.2%     14.7%
45-49                0.9%     15.5%     12.6%
Total               100.0%    100.0%    100.0%

• When creating crosstabulations it is standard practice to use the dependent variable as the rows and the independent variable as the columns.
• We can create a crosstabulation with three variables. For example, we may want to see the age distribution for WCEB for urban and rural areas separately. This is shown in the following tables.

Rural and urban age distribution for WCEB

5-year age group * whether have any child * place of residence Crosstabulation
% within whether have any child

Place of residence   5-year age group   no        yes       Total
rural                15-19              57.3%     0.1%      10.7%
                     20-24              33.9%     8.4%      13.2%
                     25-29               5.0%     22.9%     19.6%
                     30-34               1.4%     22.8%     18.8%
                     35-39               1.4%     12.9%     10.8%
                     40-44               0.3%     17.9%     14.6%
                     45-49               0.8%     15.0%     12.3%
                     Total              100.0%    100.0%    100.0%
urban                15-19              37.4%                8.9%
                     20-24              41.5%     3.3%      12.4%
                     25-29              15.6%     19.3%     18.4%
                     30-34               1.9%     21.1%     16.5%
                     35-39               1.1%     19.9%     15.4%
                     40-44               1.5%     19.1%     14.9%
                     45-49               1.1%     17.3%     13.5%
                     Total              100.0%    100.0%    100.0%

5-year age group * whether have any child * place of residence Crosstabulation
% within 5-year age group

Place of residence   5-year age group   no        yes       Total
rural                15-19              99.5%     0.5%      100.0%
                     20-24              48.0%     52.0%     100.0%
                     25-29               4.8%     95.2%     100.0%
                     30-34               1.4%     98.6%     100.0%
                     35-39               2.4%     97.6%     100.0%
                     40-44               0.4%     99.6%     100.0%
                     45-49               1.3%     98.7%     100.0%
                     Total              18.7%     81.3%     100.0%
urban                15-19              100.0%              100.0%
                     20-24              79.4%     20.6%     100.0%
                     25-29              20.1%     79.9%     100.0%
                     30-34               2.7%     97.3%     100.0%
                     35-39               1.7%     98.3%     100.0%
                     40-44               2.4%     97.6%     100.0%
                     45-49               2.0%     98.0%     100.0%
                     Total              23.8%     76.2%     100.0%

• The question is: Is there a significant relationship between a woman's age and having ever had a child?

[Figures: mean of "whether have any child" (vertical axis, 0.0 to 1.2) plotted against 5-year age group (15-19 to 45-49) and against single year of age (15 to 49).]

Please examine this question on your own after class. I will discuss some other examples.

Example 1: Treating cocaine addiction
• This is a three-year study of medication to help cocaine addicts stay off cocaine. There were three treatments: D, L, and P. Each treatment was randomly assigned to 24 subjects. The counts and proportions of subjects who avoided relapse into cocaine use during the study are:

Group   Treatment   Subjects   No relapse   Proportion
1       D           24         14           0.583
2       L           24          6           0.250
3       P           24          4           0.167

• The sample proportions of subjects who stayed off cocaine are quite different. Are these data good evidence that the proportions of successes for the three treatments differ in the population of all cocaine addicts? Does success differ significantly between the treatments? Is there a significant relationship between treatment and success?

Here is the two-way table of the cocaine addiction data:

              Relapse
Treatment     No     Yes
D             14     10
L              6     18
P              4     20

• We want to test the null hypothesis that there are no differences among the proportions of successes for addicts given the three treatments:

$$H_0: p_1 = p_2 = p_3$$

• The alternative hypothesis is that there is some difference, that not all three proportions are equal:

$$H_1: \text{not all of } p_1, p_2, \text{ and } p_3 \text{ are equal}$$
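As a quick check on the summary table, the sample proportions and the relapse counts can be recomputed from the number of subjects and the number with no relapse. This is a minimal sketch; the variable names are illustrative and not part of the lecture.

```python
# Counts taken from the summary table: subjects per treatment and
# the number of subjects with no relapse.
subjects = {"D": 24, "L": 24, "P": 24}
no_relapse = {"D": 14, "L": 6, "P": 4}

# Two-way table: each treatment (row) split into success and failure counts.
two_way = {
    t: {"no relapse": no_relapse[t], "relapse": subjects[t] - no_relapse[t]}
    for t in subjects
}

# Sample proportion of successes in each treatment group.
proportions = {t: no_relapse[t] / subjects[t] for t in subjects}

for t in subjects:
    print(t, two_way[t], round(proportions[t], 3))
```

The rounded proportions reproduce the 0.583, 0.250, and 0.167 shown in the table, and the relapse counts (10, 18, 20) fill in the second column of the two-way table.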

• To test $H_0$, we compare the observed counts in a two-way table with the expected counts, the counts we would expect if $H_0$ were true. If the observed counts are far from the expected counts, that is evidence against $H_0$.

Expected counts
• The expected count in any cell of a two-way table when $H_0$ is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

• In more formal language, if we have n independent tries and the probability of a success on each try is p, we expect np successes. If we draw an SRS of n individuals from a population in which the proportion of successes is p, we expect np successes in the sample. That's the fact behind the formula for expected counts in a two-way table.
• Let's apply this fact to the cocaine study. The two-way table with row and column totals is

              Relapse
Treatment     No     Yes    Total
D             14     10     24
L              6     18     24
P              4     20     24
Total         24     48     72

• We will find the expected count for the cell in row 1 and column 1. The proportion of all 72 subjects who succeed in avoiding a relapse is

$$\frac{\text{count of successes}}{\text{table total}} = \frac{\text{column 1 total}}{\text{table total}} = \frac{24}{72} = \frac{1}{3}$$

Think of this as p, the overall proportion of successes. If $H_0$ is true, we expect this same proportion of successes in all three groups.
• So the expected count of successes among the 24 subjects who took D is

$$np = 24 \times \frac{1}{3} = 8$$

This expected count has the form:

$$\frac{\text{row 1 total} \times \text{column 1 total}}{\text{table total}} = \frac{24 \times 24}{72} = 8$$

Observed versus expected counts

              No relapse              Relapse
Treatment     Observed   Expected     Observed   Expected
D             14         8            10         16
L              6         8            18         16
P              4         8            20         16

• Because 1/3 of all subjects succeed, we expect 1/3 of the 24 subjects in each group to avoid a relapse if there are no differences among the treatments. In fact, D has more successes (14) and fewer failures (10) than expected. P has fewer successes (4) and more relapses (20). D does much better than P, with L in between.
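The same calculation can be repeated for every cell. The sketch below applies the formula expected count = row total × column total / table total to the cocaine counts; the function name `expected_counts` is an illustrative choice, not something from the lecture.

```python
# Observed counts from the cocaine study: [no relapse, relapse] per treatment.
observed = {"D": [14, 10], "L": [6, 18], "P": [4, 20]}

def expected_counts(table):
    """Expected count per cell under H0: row total * column total / table total."""
    row_totals = {r: sum(cells) for r, cells in table.items()}
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(cells[j] for cells in table.values()) for j in range(n_cols)]
    total = sum(row_totals.values())
    return {
        r: [row_totals[r] * col_totals[j] / total for j in range(n_cols)]
        for r in table
    }

expected = expected_counts(observed)
for treatment in observed:
    print(treatment, "observed", observed[treatment], "expected", expected[treatment])
```

Because every treatment group has 24 subjects, each row gets the same expected counts: 8 successes and 16 relapses.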

The Chi-Square Test
• The statistical test that tells us whether those differences are statistically significant compares the observed and expected counts. The test statistic that makes the comparison is the chi-square statistic.

Chi-square statistic
• The chi-square statistic is a measure of how far the observed counts in a two-way table are from the expected counts. The formula for the statistic is

$$\chi^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

• The chi-square statistic is a sum of terms, one for each cell in the table. In the cocaine example, 14 of the D group succeeded in avoiding a relapse. The expected count for this cell is 8. So the component of the chi-square statistic from this cell is

$$\frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(14 - 8)^2}{8} = \frac{36}{8} = 4.5$$

• Think of the chi-square statistic $\chi^2$ as a measure of the distance of the observed counts from the expected counts. Like any distance, it is always zero or positive, and it is zero only when the observed counts are exactly equal to the expected counts. Large values of $\chi^2$ are evidence against $H_0$ because they say that the observed counts are far from what we would expect if $H_0$ were true.
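To see how the statistic is built up cell by cell, here is a minimal sketch for the cocaine table. It assumes expected counts of 8 (no relapse) and 16 (relapse) in every group, which follows from 1/3 of each group of 24 being expected to succeed; the variable names are illustrative.

```python
# Observed and expected counts, one row per treatment (D, L, P),
# columns are [no relapse, relapse].
observed = [[14, 10], [6, 18], [4, 20]]
expected = [[8, 16], [8, 16], [8, 16]]

# One component per cell: (observed - expected)^2 / expected.
components = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The chi-square statistic is the sum of all cell components.
chi_square = sum(sum(row) for row in components)

print(components)
print(chi_square)
```

The first component is the 4.5 worked out above for the D, no-relapse cell. Every component is a squared difference divided by a positive count, so none can be negative.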

The chi-square distribution
• The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by giving its degrees of freedom.
• The chi-square test for a two-way table with r rows and c columns uses critical values from the chi-square distribution with (r-1)(c-1) degrees of freedom. The P-value is the area to the right of $\chi^2$ under the chi-square density curve.

Figure 1 shows the density curves for three members of the chi-square family of distributions.

There are three major properties of a chi-square distribution
• Chi-square is either 0 or positive, never negative.
• A chi-square distribution is not symmetrical. Its skewness is positive. As the number of degrees of freedom increases, chi-square approaches a symmetric distribution.
• There is a particular distribution for each degree of freedom.

Table E gives critical values for chi-square distributions. Use Table E to find the P-value for a chi-square test.
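Two of these points are easy to check numerically. The sketch below computes the degrees of freedom for an r × c table and uses the standard result, not stated in the lecture, that a chi-square distribution with k degrees of freedom has skewness sqrt(8/k). The function names are illustrative.

```python
import math

def degrees_of_freedom(r, c):
    """Degrees of freedom for the chi-square test on an r x c two-way table."""
    return (r - 1) * (c - 1)

def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)

# A table with 3 rows and 2 columns has (3-1)(2-1) = 2 degrees of freedom.
print(degrees_of_freedom(3, 2))

# The skewness stays positive but shrinks toward 0 as df grows.
for df in (1, 2, 4, 8, 100):
    print(df, round(chi_square_skewness(df), 3))
```

The skewness is positive for every df and decreases steadily, which matches the statement that the distribution approaches symmetry as the degrees of freedom increase.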

• We use the formula to calculate the chi-square statistic:

$$
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} \\
&= \frac{(14-8)^2}{8} + \frac{(10-16)^2}{16} + \frac{(6-8)^2}{8} + \frac{(18-16)^2}{16} + \frac{(4-8)^2}{8} + \frac{(20-16)^2}{16} \\
&= 4.50 + 2.25 + 0.50 + 0.25 + 2.00 + 1.00 \\
&= 10.50
\end{aligned}
$$

• The two-way table has 3 rows and 2 columns. That is, r=3, c=2. The chi-square statistic therefore has degrees of freedom (r-1)(c-1)=(3-1)(2-1)=(2)(1)=2.
• Look in the df=2 row of Table E. The chi-square statistic $\chi^2$=10.5 falls between the 0.01 and 0.005 critical values. Remember that the chi-square test is always one-sided. So the P-value of $\chi^2$=10.5 is between 0.005 and 0.01. The P-value is equal to 0.005 when rounded to three decimal places.

Using SPSS we can easily find the P-value, for example with the syntax COMPUTE prob = 1 - CDF.CHISQ(10.5, 2).

• If we want our significance level to be 0.05, the critical value for df=2 is 5.99. To reject the null hypothesis at the 0.05 level, the value of chi-square needs to be greater than 5.99. If it were less than 5.99, we would fail to reject the null hypothesis.
• In this example, the value of chi-square is 10.5, which is greater than 5.99, so we reject the null hypothesis. We can conclude that there is a statistically significant difference in the effects of the treatments (drugs).
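The P-value and the 5.99 critical value can also be checked without a table. The sketch below is not the SPSS procedure; it relies on the closed form for the right-tail area of a chi-square distribution with an even number of degrees of freedom, which for df = 2 reduces to exp(-x/2). The function name is an illustrative assumption.

```python
import math

def chi_square_p_value_even_df(x, df):
    """Right-tail area of a chi-square distribution, valid for even df only.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2
    k = df // 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

chi_square, df, alpha = 10.5, 2, 0.05

p_value = chi_square_p_value_even_df(chi_square, df)
# For df = 2 only, the critical value solves exp(-x/2) = alpha.
critical_value = -2 * math.log(alpha)

print(round(p_value, 3))         # 0.005
print(round(critical_value, 2))  # 5.99
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Comparing the P-value with 0.05 and comparing 10.5 with the critical value 5.99 are two views of the same decision, so they always agree.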

Using Crosstabs in SPSS
• Calculating the expected counts and then the chi-square statistic by hand is a bit time-consuming. We can avoid this trouble by using SPSS's Crosstabs. But you need to arrange the data in the following format:

[Screenshots: SPSS Data Editor and the Analyze menu.]