The COVID Tracking Project
Data Preparation
- Create a new column for % positive: In column AR, enter the formula =AD1/AP1 and copy it down for all rows. This calculates the share of tests that were positive for each date.
- Format the column: Format column AR as a percentage.
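As a cross-check outside the spreadsheet, the same row-by-row division can be sketched in Python. This is only a minimal sketch: the function name percent_positive and the sample numbers are made up for illustration, and the two input lists stand in for whatever values your columns AD and AP hold.

```python
def percent_positive(numerators, denominators):
    """Divide row by row; return None where a value is missing or the denominator is 0."""
    ratios = []
    for num, den in zip(numerators, denominators):
        if num is None or den in (None, 0):
            ratios.append(None)  # avoids a divide-by-zero, like a blank cell
        else:
            ratios.append(num / den)
    return ratios


# Made-up illustrative values, not real data.
col_ad = [5, 12, 30, None]   # stand-in for column AD
col_ap = [100, 150, 0, 200]  # stand-in for column AP

ratios = percent_positive(col_ad, col_ap)
for r in ratios:
    # The ".1%" format mirrors formatting the column as a percentage.
    print("n/a" if r is None else f"{r:.1%}")
```

Rows with a missing value or a zero denominator are reported as "n/a" rather than raising an error, which is a design choice for this sketch and not something the spreadsheet does automatically.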
Linear Regression Analysis
Null Hypothesis (H0): There is no linear relationship between the independent variable (time) and the dependent variable (deaths, % tested, or % positive).
Alternative Hypothesis (H1): There is a linear relationship between the independent variable (time) and the dependent variable.
Analysis and Interpretation:
- Conduct linear regression analysis: Use statistical software such as R or Python to fit a linear regression for each of the following variables:
  - Deaths (column D) vs. time
  - % Tested (column AP) vs. time
  - % Positive (column AR) vs. time
- Evaluate the r-squared value: The r-squared value measures the proportion of variance in the dependent variable that is explained by the independent variable. It ranges from 0 to 1, and a value closer to 1 indicates a stronger linear relationship.
- Evaluate the p-value: The p-value is the probability of observing a slope at least this far from zero if the null hypothesis were true. A p-value less than 0.05 is conventionally treated as statistically significant, in which case H0 is rejected.
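The steps above can be sketched in Python using only the standard library. This is an illustrative sketch, not the workshop's prescribed method: the function names, the dictionary keys, and the sample numbers are all assumptions, and time is assumed to be coded as a day index (0, 1, 2, ...). The p-value comes from a t-test on the slope with n - 2 degrees of freedom, approximated here by numerically integrating the t density.

```python
import math


def t_two_sided_p(t, df):
    """Two-sided p-value for a t statistic with df degrees of freedom.

    Approximates P(|T| > |t|) by integrating the Student t density
    from 0 to |t| with Simpson's rule.
    """
    if math.isinf(t):
        return 0.0
    upper = abs(t)
    # Log of the density's normalising constant; lgamma avoids overflow.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    steps = 2 * max(1000, math.ceil(upper * 50))  # even count, step width <= 0.01
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3  # approximates P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    df = n - 2
    # Magnitude of the t statistic for H0: slope = 0.
    t = math.inf if r_squared >= 1 else math.sqrt(r_squared * df / (1 - r_squared))
    return {"slope": slope, "intercept": intercept,
            "r_squared": r_squared, "p_value": t_two_sided_p(t, df)}


# Made-up illustrative values, not real data.
days = [0, 1, 2, 3, 4, 5, 6, 7]
deaths = [3, 4, 4, 6, 5, 8, 9, 9]
fit = simple_linear_regression(days, deaths)
print(f"slope={fit['slope']:.3f}  r_squared={fit['r_squared']:.3f}  p_value={fit['p_value']:.4f}")
```

Run the same function once per variable (deaths, % tested, % positive). A statistics package should report the same quantities; for example, SciPy's scipy.stats.linregress returns the slope, intercept, correlation coefficient, and p-value directly.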
Interpretation:
- If the r-squared value is high and the p-value is low: There is a strong linear relationship between the independent variable and the dependent variable, and H0 is rejected. This suggests that the dependent variable changes steadily over time.
- If the r-squared value is low and the p-value is high: There is little or no evidence of a linear relationship between the independent variable and the dependent variable, and H0 is not rejected. This suggests that time does not significantly explain changes in the dependent variable.
- If the r-squared value is low but the p-value is low: There is a statistically significant trend, but time explains only a small share of the variation, so predictions from the model will be imprecise.
Predicting Future Values:
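For statistically significant relationships, you can use the fitted regression line to predict the value of the dependent variable for a future date, in this case seven days after the end of the workshop. Because this extrapolates beyond the observed data, treat the prediction as a rough estimate.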
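A prediction of this kind amounts to evaluating the fitted line at the future date. The sketch below assumes time was coded as the number of days since the first data row; the function name, the dates, and the slope and intercept are hypothetical placeholders to be replaced with your own values.

```python
from datetime import date, timedelta


def predict_for_date(target, first_date, slope, intercept):
    """Evaluate the fitted line at `target`, with x = days since `first_date`."""
    day_index = (target - first_date).days
    return intercept + slope * day_index


# Placeholder values for illustration only; substitute your own fit and dates.
first_date = date(2021, 1, 1)     # date of the first data row (hypothetical)
workshop_end = date(2021, 1, 15)  # workshop end date (hypothetical)
slope, intercept = 2.0, 10.0      # regression output (hypothetical)

target = workshop_end + timedelta(days=7)
prediction = predict_for_date(target, first_date, slope, intercept)
print(target.isoformat(), prediction)
```

If your regression used a different time coding (for example, spreadsheet serial dates), the future date must be converted to that same coding before applying the slope and intercept.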
Report
[Insert your state name]
COVID-19 Data Analysis
This report presents the results of a linear regression analysis on COVID-19 data for [your state]. The analysis examines the relationship between time and three variables: deaths, % tested, and % positive.
Results:
- Deaths: The analysis found a [significant/non-significant] linear relationship between time and deaths. The r-squared value was [value], indicating that [percentage] of the variation in deaths can be explained by time. The p-value was [value], suggesting that the relationship is [significant/not significant].
- % Tested: [Similar analysis and interpretation for % tested]
- % Positive: [Similar analysis and interpretation for % positive]
Discussion:
[Discuss the implications of your findings, considering factors such as public health policies, vaccination rates, and emerging variants.]
Conclusions:
[Draw conclusions based on your analysis, considering the limitations of the data and the potential implications for future COVID-19 prevention and control efforts.]