Task brief:

You need to prepare a group analysis report (using Microsoft Word), which requires an amount of computer work and written comment.

In this report, you will analyse a randomised subset of a student survey. It is an in-class survey of statistics students over several years. We only consider the First Year data. Also, this subset only has four variables, namely Sex [F and M denote female and male student respectively], HigherSAT [Which SAT is higher? Math or Verbal], Height [in inches] and Weight [in pounds].

Task 1: Identify data type (4 Marks)

Identify whether each variable (i.e., Sex, HigherSAT, Height and Weight) in the subset is categorical nominal, categorical ordinal, quantitative discrete or quantitative continuous.

Task 2: Use the Chi-square test at the 5% level of significance to determine whether Sex and HigherSAT are independent. (10 Marks)

a. Create the contingency table to show student’s Sex frequencies by the HigherSAT variable (i.e., Math or Verbal). (3 Marks)

b. State the null and alternative hypotheses. (1 Mark)

c. Calculate expected values. (2 Marks)

d. Calculate the test statistic. (2 Marks)

e. State the rule of decision. (1 Mark)

f. State the conclusion. (1 Mark)

Task 3: Perform the regression analysis of the Height variable on the Weight variable. (10 marks)

a. Use the Excel Regression function in the Data Analysis Tools Add-ins to create a simple regression model. The straight-line equation is y = a + bx where y and x are the Weight and Height variables respectively. (3 Marks)

b. Determine and interpret the Coefficient of Determination, i.e., R square. (2 Marks)

c. Determine the regression equation. Interpret the slope. (2 Marks)

d. Perform the hypothesis test to determine if the slope is significantly different from zero at the 5% significance level. (3 Marks)

Conclusion: (6 marks)

From the Chi-square test, you would be able to conclude whether Sex and HigherSAT are independent. Also, the regression result would show whether Height is a significant predictor for Weight. Write a short report summarising your main findings.

Answer:

**DATA ANALYSIS**

Data analysis is an essential component of project activities. Used efficiently, data gives a deeper insight into respondents’ opinions and helps identify the project’s future scope of work. Data analysis is commonly divided into descriptive, diagnostic, predictive, and prescriptive analysis. The present study examines survey data collected from statistics students during the year 2018 and analyses four variables: sex, Higher SAT, height, and weight. For analysis purposes, two statistical tools are applied: the chi-square test and a regression model. The students’ data is stated below.

| SL.NO | SEX | SAT | SAT VALUE | HEIGHT (feet'inches") | WEIGHT (pounds) |
|---|---|---|---|---|---|
| 1 | Female | Math | Higher | 5'1" | 127.868 |
| 2 | Male | Math | Higher | 5'0" | 132.277 |
| 3 | Male | Verbal | Lower | 4'10" | 99.208 |
| 4 | Male | Math | Higher | 5'0" | 105.822 |
| 5 | Female | Verbal | Lower | 5'5" | 114.64 |
| 6 | Female | Math | Higher | 5'2" | 108.027 |
| 7 | Male | Math | Lower | 5'3" | 123.459 |
| 8 | Male | Verbal | Lower | 4'9" | 136.687 |
| 9 | Female | Verbal | Lower | 5'8" | 114.64 |
| 10 | Male | Verbal | Higher | 4'8" | 99.208 |
| 11 | Female | Math | Lower | 6'0" | 136.687 |
| 12 | Female | Verbal | Higher | 5'10" | 123.459 |
| 13 | Male | Math | Higher | 4'11" | 101.413 |
| 14 | Female | Math | Lower | 5'7" | 143.3 |
| 15 | Female | Verbal | Higher | 4'8" | 94.799 |
| 16 | Male | Math | Higher | 5'3" | 119.05 |
| 17 | Female | Math | Higher | 5'0" | 121.254 |
| 18 | Male | Verbal | Lower | 4'9" | 112.436 |
| 19 | Male | Math | Higher | 5'7" | 138.891 |
| 20 | Female | Verbal | Higher | 5'1" | 105.822 |
| 21 | Female | Verbal | Lower | 6'0" | 143.3 |
| 22 | Male | Math | Higher | 5'5" | 127.868 |
| 23 | Male | Verbal | Lower | 5'2" | 116.845 |
| 24 | Female | Math | Higher | 5'8" | 125.663 |
| 25 | Male | Verbal | Higher | 6'0" | 152.119 |
| 26 | Female | Verbal | Higher | 4'9" | 119.05 |
| 27 | Female | Math | Lower | 5'3" | 127.868 |
| 28 | Male | Verbal | Lower | 5'7" | 105.822 |
| 29 | Male | Math | Higher | 5'9" | 110.231 |
| 30 | Female | Verbal | Lower | 6'0" | 147.71 |
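
The heights in the table are written as feet and inches, so they need to be converted to a single number before any regression can be run. A minimal sketch of that conversion follows; the helper name `height_to_inches` is hypothetical, and the accepted spellings (straight or curly quote marks) are an assumption about how the values were typed.

```python
import re


def height_to_inches(text: str) -> int:
    """Convert a height such as 5'1" (feet and inches) to total inches.

    Hypothetical helper: accepts straight or curly quote marks and an
    optional trailing inch mark.
    """
    # Normalise curly quotes, then treat a doubled apostrophe as an inch mark.
    cleaned = text.strip().replace("’", "'").replace("”", '"').replace("''", '"')
    match = re.fullmatch(r"(\d+)'(\d+)\"?", cleaned)
    if match is None:
        raise ValueError(f"unrecognised height: {text!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    if inches >= 12:
        raise ValueError(f"inches part must be below 12: {text!r}")
    return feet * 12 + inches


# Student 1 in the table is 5 feet 1 inch tall.
print(height_to_inches("5'1\""))  # 61
```

Applying the helper to the whole height column gives a numeric variable in inches that can be paired with the weight column.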

**CLASSIFICATION OF DATA**

In the data analysis, the study classifies the data into different types such as nominal data, ordinal data, discrete and continuous. Nominal and Ordinal data types come into the category of qualitative or categorical data types. Discrete and continuous data are included in the quantitative data type.

**QUALITATIVE DATA ANALYSIS** – Qualitative, or categorical, data places each subject under study into one of a finite number of categories. Such data cannot be measured on a numerical scale; it can only be classified and counted by category. Qualitative data is made up of words, visuals, and concepts rather than numbers, and anything that describes flavor, feel, appearance, or a viewpoint is regarded as qualitative data. This information is typically gathered through focus groups, direct qualitative interviews, or open-ended survey questions. Nominal and ordinal data are the two qualitative data types.

**Nominal Data Type –** Nominal data, often known as “named or labeled data” or a “nominal scale,” is any information that describes something without assigning a rank or score to it. Researchers can use nominal data to look for statistically significant differences between groups, to build multiple-choice questionnaire answers, or to characterize participants. Frequencies, proportions, percentages, and the mode can be computed from nominal data. In the present data analysis, Sex, SAT, and SAT Value are classified as nominal data.

**Ordinal Data Type –** Ordinal data is qualitative data whose categories have a specific order or degree. With ordinal data, the ordering of the categories matters more than the size of the differences between them. Survey results that range from satisfied through neutral to unsatisfied are ordinal. Researchers can use ordinal data to create graphs or to sort subjects into ranked categories. Ordinal data resembles nominal data except that its classes can be arranged as 1st, 2nd, and so on; however, the intervals between neighbouring categories are not necessarily equal.

**QUANTITATIVE DATA TYPES** – Quantitative data is any information represented in numerical form. This kind of information can be counted, measured, or rated, presented in graphs and charts, and analysed with statistical methods. Outcome measurement questions in assessments are an important source of numerical data. Because the data is measurable, it can be used for arithmetic operations and statistical analysis to guide real-world judgments. Quantitative data is either discrete (counts) or continuous (measurements). In the present study, the Height and Weight of students are quantitative data; both are measurements, so they are continuous.

**HYPOTHESIS TESTING**

**CHI-SQUARE TEST**

A Chi-Square test is a statistical tool used to compare observed data with expected data. It can also be applied to test whether two categorical variables are related: it helps determine whether a discrepancy between the observed and expected counts is the result of chance or of an association between the variables. The test requires raw count data drawn from a random sample, with independent observations and mutually exclusive categories. Karl Pearson developed this test for categorical data analysis in 1900, so it is also known as ‘Pearson’s Chi-Squared Test’. The degrees of freedom determine which chi-square distribution the test statistic is compared against. The test is widely used to compare observed data with the data that would be expected if a certain hypothesis were correct.

This present study applies the chi-square test at a 5% level of significance. The level of significance is the probability of rejecting the null hypothesis when it is in fact true. The study states null and alternative hypotheses, and the decision rule of the chi-square test is based on the level of significance.

**CALCULATION OF TEST**

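$\chi^{2} = \sum \dfrac{(O - E)^{2}}{E}$

Here O is the observed count and E is the expected count in each cell of the contingency table, with E = (row total × column total) / N.

**H0:** Students’ Sex and Higher SAT are independent (there is no significant association between them).

**H1:** Students’ Sex and Higher SAT are not independent (there is a significant association between them).

Degrees of freedom = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1

Decision rule: at the 5% level of significance with 1 degree of freedom, reject H0 if the calculated $\chi^{2}$ exceeds the critical value of 3.841 (equivalently, if the p-value is below 0.05).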

**SEX * SAT VALUE CROSSTABULATION**

**TABLE 1**

| SEX | | HIGHER SAT | LOWER SAT | TOTAL |
|---|---|---|---|---|
| MALE | Count | 9 | 6 | 15 |
| | Expected Count | 8.5 | 6.5 | 15.0 |
| | Residual | .5 | -.5 | |
| FEMALE | Count | 8 | 7 | 15 |
| | Expected Count | 8.5 | 6.5 | 15.0 |
| | Residual | -.5 | .5 | |
| TOTAL | Count | 17 | 13 | 30 |
| | Expected Count | 17.0 | 13.0 | 30.0 |
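
The expected counts in Table 1 follow from the row and column totals: each expected count is the row total times the column total, divided by the grand total. A minimal sketch of that calculation, using the observed counts from Table 1 (the function name `expected_counts` is illustrative, not part of the original analysis):

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Table 1.
# Rows: Male, Female. Columns: Higher SAT, Lower SAT.
observed = [[9, 6], [8, 7]]
expected = expected_counts(observed)
print(expected)  # [[8.5, 6.5], [8.5, 6.5]]
```

Both rows have the same expected counts because the sample contains 15 male and 15 female students.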

**CHI-SQUARE TABLE**

**TABLE 2**

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | .136ᵃ | 1 | .713 |
| Continuity Correctionᵇ | .000 | 1 | 1.000 |
| Likelihood Ratio | .136 | 1 | .712 |
| Linear-by-Linear Association | .131 | 1 | .717 |
| N of Valid Cases | 30 | | |
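
The Pearson chi-square value and its significance in Table 2 can be checked by hand from the observed counts in Table 1. The sketch below is one way to do so in plain Python. The p-value shortcut is valid only for 1 degree of freedom, and 3.841 is the standard 5% critical value for that case. The function names are illustrative.

```python
import math


def pearson_chi_square(observed):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df.

    A chi-square variable with 1 df is the square of a standard normal,
    so the tail probability can be written with the complementary
    error function.
    """
    return math.erfc(math.sqrt(statistic / 2))


CRITICAL_5PCT_DF1 = 3.841  # 5% critical value of chi-square with 1 df

statistic, df = pearson_chi_square([[9, 6], [8, 7]])
p_value = p_value_df1(statistic)
reject_h0 = statistic > CRITICAL_5PCT_DF1
print(round(statistic, 3), df, round(p_value, 3), reject_h0)
```

With these counts the statistic falls far below the critical value, so the null hypothesis of independence is not rejected.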

**INTERPRETATION** – The calculated value of the chi-square test statistic is 0.136 with 1 degree of freedom, which is lower than the table (critical) value of 3.841 at the 5% level of significance; the p-value of 0.713 is likewise greater than 0.05. The null hypothesis is therefore not rejected, and there is no significant association between Sex and Higher SAT. Sex and Higher SAT can be treated as independent in this study, which shows that whether a student scores higher in Math or Verbal is not associated with their sex. A student's higher SAT score may instead reflect their numerical, reasoning, and logical abilities, although the test itself establishes only the lack of association with sex.

**REGRESSION ANALYSIS**

Regression analysis is a collection of statistical techniques used to estimate the relationship between a dependent variable and one or more independent variables. It may be used to analyze the relationship that exists among variables and to predict one variable from another. There are various types of regression analysis, including linear, multiple linear, and nonlinear; simple linear and multiple linear models are the most frequent. This present study uses simple linear regression to examine the relation between the height and weight of students in the statistical survey. By fitting a linear equation to the observed data, linear regression seeks to explain the link between two variables. One variable is regarded as the independent (explanatory) variable, while the other is regarded as the dependent variable. In this analysis, as set up in Table 4, the dependent variable is height and the independent variable is weight.

**Calculation of Linear Regression**

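H0: The slope is zero; there is no significant linear relationship between students’ height and students’ weight.

H1: The slope is not zero; there is a significant linear relationship between students’ height and students’ weight.

Regression equation: y = a + bx, where a is the intercept (constant) and b is the slope.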
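
The least-squares fit and the slope test can be sketched in plain Python as follows. This is an illustrative implementation, not the Excel or SPSS procedure used for the tables in this report. The toy numbers are hypothetical values chosen for easy arithmetic, not the survey data.

```python
import math


def simple_linear_regression(x, y):
    """Fit y = a + b*x by ordinary least squares and test H0: b = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = syy - slope * sxy                      # residual sum of squares
    r_squared = 1 - sse / syy                    # coefficient of determination
    se_slope = math.sqrt(sse / (n - 2) / sxx)    # standard error of the slope
    t_stat = slope / se_slope                    # compare with t on n - 2 df
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": r_squared,
        "se_slope": se_slope,
        "t_stat": t_stat,
    }


# Hypothetical toy data, NOT the survey values.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
fit = simple_linear_regression(toy_x, toy_y)
print(fit)
```

The t statistic is compared with the critical value of the t distribution on n - 2 degrees of freedom; a statistics package reports the matching p-value in its Sig. or P-value column.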

**REGRESSION ANALYSIS**

**TABLE 3**

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .699ᵃ | .489 | .471 | .44182 |

**REGRESSION TABLE**

**TABLE 4**

| Model | | Unstandardized Coefficients: B | Unstandardized Coefficients: Std. Error | Standardized Coefficients: Beta | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | .649 | .255 | | 2.548 | .017 |
| | Weight | .494 | .095 | .699 | 5.178 | .000 |

**INTERPRETATION** – The model summary table (Table 3) shows the correlation between height and weight. The correlation coefficient R is 0.699, which indicates a fairly strong positive linear relationship. The coefficient of determination, R Square, is the proportion of the variation in the dependent variable that is explained by the predictor; here it is 0.489, so 48.9% of the variation is explained by the model. The adjusted R-squared is a slight modification of the R-squared that penalises factors in a regression model that are not relevant, so it indicates whether or not introducing more factors improves a regression analysis. The adjusted R Square of 0.471 indicates that about 47.1% of the variation is explained after this adjustment. Table 4 lists Weight as the predictor, so the fitted equation is Height = 0.649 + 0.494 × Weight, in the units used for the analysis: each one-unit increase in weight is associated with an average increase of 0.494 units in height. The level of significance of the present study is set at 5%, and the table shows a t value of 5.178 for the slope with a significance value of .000 (p < 0.001), which is less than 0.05, so the null hypothesis of a zero slope is rejected. In simple linear regression, R Square and the t-test of the slope are the same whichever of the two variables is treated as dependent, so the same result shows that students’ height is a significant predictor of students’ weight.
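
As a consistency check on Tables 3 and 4, the reported values can be reproduced from one another. The lines below assume n = 30 students and k = 1 predictor; the small gap between 5.2 and the reported t of 5.178 is presumably due to rounding of the displayed coefficients.

```latex
\begin{align*}
R^{2} &= r^{2} = 0.699^{2} \approx 0.489 \\
\bar{R}^{2} &= 1 - (1 - R^{2})\,\frac{n - 1}{n - k - 1}
             = 1 - (1 - 0.489) \times \frac{29}{28} \approx 0.471 \\
t &= \frac{b}{SE(b)} = \frac{0.494}{0.095} \approx 5.2
\end{align*}
```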

**CONCLUSION** – Data analysis is an important part of the study, and it examines the variables the study identifies: Sex, Higher SAT, Height, and Weight. These variables are used for the chi-square analysis and the regression analysis. The Chi-Square test shows that Sex and Higher SAT are independent: there is no significant association between students’ sex and whether their Math or Verbal SAT score is higher. The regression analysis shows a significant linear relationship between students’ height and students’ weight, with weight tending to increase with height. Data analysis is essential for identifying such patterns in the study, and it supports the further scope of the study.