Fundamentals Of Data Science
Using the same dataset as last week or a different one, select two qualitative variables and two quantitative variables. Explain why you selected these variables.
Analysis:
For your qualitative variables, create a contingency table and calculate the association between them.
For your quantitative variables, calculate the correlation between them. Include a scatter plot to visually represent this relationship.
Interpretation: Explain your findings. What does the association or correlation say about the relationship between your variables? Is the relationship strong, weak, positive, negative, or nonexistent?
Reflection: Reflect on the importance of understanding associations and correlations in data analysis and how they can guide further data investigation.
Building on the theme of education, let's analyze a hypothetical dataset on student sleep habits and academic performance.
Selected Variables:
Calculation of Association (Chi-Square Test):
A Chi-Square test of independence would be performed to assess whether there is a statistically significant association between sleep quality and course difficulty. The test compares the observed distribution of sleep quality across difficulty levels with the distribution that would be expected if the two variables were independent.
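The test described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the analysis itself: the counts in `observed` are invented placeholders (no real data is available here), and the helper names `chi_square_independence`, `chi2_sf_even_dof`, and `cramers_v` are hypothetical. The p-value helper uses a closed form that holds only for an even number of degrees of freedom, which happens to cover a 3x3 table (4 degrees of freedom).

```python
import math

# Hypothetical counts, invented purely for illustration (NOT survey results).
# Rows: course difficulty (Easy, Moderate, Hard)
# Columns: sleep quality (Good, Fair, Poor)
observed = [
    [30, 20, 10],
    [25, 25, 20],
    [10, 20, 40],
]


def chi_square_independence(table):
    """Return (chi2, dof, expected) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


def chi2_sf_even_dof(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Closed form valid only when dof is a positive even integer.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form requires a positive even dof")
    half = x / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(dof // 2)
    )


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: 0 means no association, 1 means perfect association."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


chi2, dof, expected = chi_square_independence(observed)
p_value = chi2_sf_even_dof(chi2, dof)
n_total = sum(sum(row) for row in observed)
v = cramers_v(chi2, n_total, len(observed), len(observed[0]))

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2g}, Cramér's V = {v:.2f}")
```

With real survey data, the same statistic and p-value are usually obtained from a statistics library, which also handles odd degrees of freedom. Cramér's V is included because the p-value only indicates whether an association is detectable, not how strong it is.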
Quantitative Variables:
- Qualitative:
- Sleep Quality (Good, Fair, Poor): This variable captures the subjective perception of sleep quality reported by students.
- Course Difficulty (Easy, Moderate, Hard): This variable categorizes courses based on student perception of the workload and difficulty level.
- Quantitative:
- Sleep Duration (hours): This variable measures the number of hours students typically sleep per night.
- GPA (Grade Point Average): This variable represents a student's overall academic performance on a numerical scale.
- Contingency Table (Sleep Quality vs. Course Difficulty):
| Course Difficulty | Good Sleep | Fair Sleep | Poor Sleep | Total |
|---|---|---|---|---|
| Easy | | | | |
| Moderate | | | | |
| Hard | | | | |
| Total | | | | |
- Correlation between Sleep Duration and GPA:
- Scatter Plot:
- Contingency Table: If the Chi-Square test yields a p-value below the chosen significance level (conventionally 0.05), the result suggests an association between sleep quality and course difficulty. The test alone does not describe the nature of that association; comparing the observed and expected cell counts would be needed to understand it (e.g., students with poor sleep quality might be more likely to enroll in easier courses).
- Correlation and Scatter Plot: The correlation coefficient, which ranges from -1 to 1, would indicate the strength and direction of the linear relationship between sleep duration and GPA. A positive value would suggest that students who sleep more tend to have higher GPAs, a negative value would suggest the opposite, and a value near 0 would suggest little or no linear relationship. The scatter plot would depict this trend visually, with points clustered more tightly around a straight line indicating a stronger correlation.
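
To make the correlation step concrete, here is a small standard-library sketch of the Pearson coefficient. The `sleep_hours` and `gpa` values are invented placeholders rather than collected data, and the names `pearson_r`, `describe`, and `save_scatter` are hypothetical. The cut-offs in `describe` are a rough rule of thumb assumed for illustration, not a standard. The plotting helper assumes matplotlib is installed and is left uncalled so the rest of the block runs without it.

```python
import math

# Hypothetical paired observations, invented for illustration only.
sleep_hours = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
gpa = [2.4, 2.9, 2.7, 3.1, 3.0, 3.5, 3.4, 3.8]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Label r in words; the 0.3 / 0.7 cut-offs are an assumed rule of thumb."""
    size = abs(r)
    if size < 0.3:
        strength = "weak or nonexistent"
    elif size < 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "no direction"
    return f"{strength}, {direction}"


def save_scatter(xs, ys, path):
    """Save a scatter plot to `path` (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported lazily; optional dependency

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Sleep duration (hours)")
    ax.set_ylabel("GPA")
    ax.set_title("Sleep duration vs. GPA (hypothetical data)")
    fig.savefig(path)


r = pearson_r(sleep_hours, gpa)
print(f"r = {r:.2f} ({describe(r)})")
```

Calling `save_scatter(sleep_hours, gpa, "sleep_vs_gpa.png")` would produce the accompanying plot. Even a strong coefficient describes only a linear association and does not by itself show that more sleep causes a higher GPA.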