Since the COVID Tracking Project has been discontinued, there is no single federal database with all COVID data. However, the Centers for Disease Control and Prevention (CDC) COVID Data Tracker links out to COVID data for states and counties.
Use the links provided by the CDC COVID Data Tracker to find the COVID data for your state:
Sort the data by state and then by submission date. Copy all of the data for your state to a new sheet.
Create a new column AR in the database labeled % positive. The formula to calculate % positive should be =AD1/AP1. Format the cell as a percentage. Copy the cell down the column for all dates for your state.
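For readers working in Python rather than a spreadsheet, the sort, filter, and % positive steps above can be sketched with pandas. The column names (`state`, `submission_date`, `positive_tests`, `total_tests`) and the sample rows are hypothetical; this sketch assumes spreadsheet column AD holds positive tests and AP holds total tests, so substitute whatever your state's download actually uses.

```python
import pandas as pd

# Hypothetical sample rows; real data comes from your state's link on the CDC COVID Data Tracker.
data = pd.DataFrame({
    "state": ["TX", "CA", "TX", "CA"],
    "submission_date": ["2021-01-02", "2021-01-01", "2021-01-01", "2021-01-02"],
    "positive_tests": [60, 30, 40, 45],   # assumed to correspond to column AD
    "total_tests": [500, 600, 400, 900],  # assumed to correspond to column AP
})
data["submission_date"] = pd.to_datetime(data["submission_date"])

# Sort by state, then by submission date.
data = data.sort_values(["state", "submission_date"])

# Copy one state's rows to a new table (the equivalent of a new sheet).
state_data = data[data["state"] == "TX"].copy()

# % positive = positive tests / total tests, stored as a fraction.
state_data["pct_positive"] = state_data["positive_tests"] / state_data["total_tests"]

# Display the fraction formatted as a percentage.
print(state_data[["submission_date", "pct_positive"]].to_string(
    index=False, formatters={"pct_positive": "{:.1%}".format}))
```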
Using the data, develop a linear regression time series analysis for deaths (column D), % tested, and % positive. Answer the following questions:
What are the null and alternative hypotheses for each variable?
What was the r-squared for each variable? What does this mean?
What was the p-value of each test? What does this mean?
For those tests which were significant, use the model to predict the value of the variable seven days after the end of the workshop.
Write a short report (1 to 2 pages for each variable) that includes the results of your analysis. Present the results and discuss the implications of your findings. Include any graphs or statistical output you generated in answering these questions, along with a short explanation of your analysis. What conclusions concerning COVID-19 can you draw from your analysis?
= AD1/AP1
and copy it down for all rows. This will calculate the percentage of positive tests for each date.
Null Hypothesis (H0): There is no linear relationship between the independent variable (time) and the dependent variable (deaths, % tested, or % positive).
Alternative Hypothesis (H1): There is a linear relationship between the independent variable (time) and the dependent variable.
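In terms of the simple linear regression model, the two hypotheses are statements about the slope. The notation here is an assumption for illustration and does not come from the assignment: t is the day index, y_t is the variable being modeled (deaths, % tested, or % positive), and beta_1 is the slope on time.

```latex
y_t = \beta_0 + \beta_1 t + \varepsilon_t, \qquad
H_0:\ \beta_1 = 0, \qquad
H_1:\ \beta_1 \neq 0
```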
Analysis and Interpretation:
Conduct linear regression analysis: Use a statistical software package like R or Python to perform a linear regression on each of the three dependent variables (deaths, % tested, and % positive), with time as the independent variable.
Evaluate the r-squared value: The r-squared value measures the proportion of variance in the dependent variable that is explained by the independent variable. A higher r-squared value indicates a stronger linear relationship.
Evaluate the p-value: The p-value is the probability of observing a slope at least as far from zero as the estimated one if the null hypothesis were true. A p-value less than 0.05 is conventionally taken to indicate a statistically significant relationship.
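As one possible way to obtain the r-squared and p-value in Python, the sketch below uses `scipy.stats.linregress`. The death counts are made-up placeholder values, not real data; replace them with column D for your state, and repeat the same call for % tested and % positive.

```python
import numpy as np
from scipy import stats

# Hypothetical daily death counts; replace with column D for your state.
deaths = np.array([12, 15, 14, 18, 21, 19, 24, 26, 25, 30], dtype=float)

# Time is the independent variable: day index 0, 1, 2, ...
day = np.arange(len(deaths))

# Ordinary least-squares fit of deaths on day.
result = stats.linregress(day, deaths)

# linregress reports the correlation coefficient r; square it for r-squared.
r_squared = result.rvalue ** 2

print(f"slope     = {result.slope:.3f} deaths per day")
print(f"intercept = {result.intercept:.3f}")
print(f"r-squared = {r_squared:.3f}")
print(f"p-value   = {result.pvalue:.4g}")

# Reject H0 (slope = 0) at the 5% level when the p-value is below 0.05.
significant = result.pvalue < 0.05
print("significant at 0.05:", significant)
```

The p-value returned by `linregress` is for a two-sided test of a zero slope, which matches the hypotheses stated above.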
Interpretation:
Predicting Future Values:
For significant relationships, you can use the regression model to predict the value of the dependent variable for a future date (seven days after the end of the workshop).
[Insert your state name]
COVID-19 Data Analysis
This report presents the results of a linear regression analysis on COVID-19 data for [your state]. The analysis examines the relationship between time and three variables: deaths, % tested, and % positive.
Results:
Discussion:
[Discuss the implications of your findings, considering factors such as public health policies, vaccination rates, and emerging variants.]
Conclusions:
[Draw conclusions based on your analysis, considering the limitations of the data and the potential implications for future COVID-19 prevention and control efforts.]