The COVID tracking project

 

Since the COVID Tracking Project has been discontinued, there is no single federal database with all COVID data. However, the Centers for Disease Control and Prevention (CDC) COVID Data Tracker links out to COVID data for states and counties.
Use the links provided by the CDC COVID Data Tracker to find the COVID data for your state.
Sort the data by state and then by submission date. Copy all of the data for your state to a new sheet.
Create a new column AR in the database labeled % positive. The formula to calculate % positive should be = AD1/AP1. Format the cell as a percentage. Copy the cell down the column for all dates for your state.
Using the data, develop a linear regression time series analysis for deaths (column D), % tested, and % positive. Answer the following questions:
What are the null and alternative hypotheses for each variable?
What was the r-squared for each variable? What does this mean?
What was the p-value of each test? What does this mean?
For those tests that were significant, use the model to predict the value of the variable seven days after the end of the workshop.
Write a short report (1 to 2 pages for each variable) that includes the results of your analysis. Present the results and discuss the implications of your findings. Include whatever graphs or statistical output you may have generated in answering these questions, along with a short explanation of your analysis. What conclusions concerning COVID-19 can you draw from your analysis?

Sample Solution

Data Preparation

  1. Create a new column for % positive: In column AR, enter the formula = AD1/AP1 and copy it down for all rows. This will calculate the percentage of positive tests for each date.
  2. Format the column: Format column AR as a percentage.
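
For readers working in Python rather than a spreadsheet, the same preparation can be sketched with pandas. This is only an illustration under stated assumptions: the column names "state", "date", "positive" and "totalTestResults" are hypothetical stand-ins for the spreadsheet columns (AD and AP in particular), and the sample rows are made up.

```python
import pandas as pd

# Hypothetical column names; the real download may label these differently.
POSITIVE_COL = "positive"              # assumed to correspond to column AD
TOTAL_TESTS_COL = "totalTestResults"   # assumed to correspond to column AP


def add_percent_positive(df, state):
    """Return one state's rows, sorted by date, with a pct_positive column."""
    out = df[df["state"] == state].copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values("date").reset_index(drop=True)
    # Treat a zero denominator as missing instead of dividing by zero.
    denom = out[TOTAL_TESTS_COL].where(out[TOTAL_TESTS_COL] != 0)
    out["pct_positive"] = out[POSITIVE_COL] / denom
    return out


# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "state": ["NY", "CA", "NY", "NY"],
    "date": ["2021-01-03", "2021-01-01", "2021-01-01", "2021-01-02"],
    "positive": [30, 5, 10, 20],
    "totalTestResults": [300, 50, 200, 0],
})
ny = add_percent_positive(sample, "NY")
print(ny[["date", "pct_positive"]])
```

Rows with no reported tests come out as missing values here, so they can be dropped before fitting a model.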

Linear Regression Analysis

Null Hypothesis (H0): There is no linear relationship between the independent variable (time) and the dependent variable (deaths, % tested, or % positive).

Alternative Hypothesis (H1): There is a linear relationship between the independent variable (time) and the dependent variable.
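
In symbols, the two hypotheses are statements about the slope of a simple linear regression on time. The notation below is introduced here for illustration and is not part of the assignment: y_t is the variable on day t, n is the number of days, and the t distribution for the test statistic relies on the usual assumption of independent, normally distributed errors.

```latex
\begin{align*}
y_t &= \beta_0 + \beta_1 t + \varepsilon_t, \qquad t = 1, \dots, n \\
H_0 &: \beta_1 = 0 \qquad \text{versus} \qquad H_1 : \beta_1 \neq 0 \\
T &= \frac{\hat{\beta}_1}{\operatorname{SE}(\hat{\beta}_1)} \sim t_{n-2} \quad \text{under } H_0
\end{align*}
```

The same pair of hypotheses is stated separately for deaths, % tested, and % positive.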

Analysis and Interpretation:

  1. Conduct linear regression analysis: Use a statistical software package like R or Python to perform linear regression analysis on the following variables:

    • Deaths (column D) vs. time
    • % Tested (column AP) vs. time
    • % Positive (column AR) vs. time
  2. Evaluate the r-squared value: The r-squared value measures the proportion of variance in the dependent variable that is explained by the independent variable. It ranges from 0 to 1, and a value closer to 1 indicates a stronger linear relationship.

  3. Evaluate the p-value: The p-value is the probability of seeing a slope at least as extreme as the estimated one if the null hypothesis were true. A p-value less than 0.05 is conventionally treated as a statistically significant relationship.
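
The three steps above can be sketched in Python with scipy.stats.linregress, which reports the slope, the correlation coefficient, and the two-sided p-value for the slope. The helper name trend_test, the 0.05 default, and the death counts are illustrative assumptions, not values from the data set.

```python
import numpy as np
from scipy import stats


def trend_test(values, alpha=0.05):
    """Regress a daily series on time and summarize the fit."""
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y), dtype=float)   # days since the first observation
    keep = ~np.isnan(y)                  # skip days with missing values
    fit = stats.linregress(t[keep], y[keep])
    return {
        "slope": fit.slope,
        "intercept": fit.intercept,
        "r_squared": fit.rvalue ** 2,    # share of variance explained by time
        "p_value": fit.pvalue,           # two-sided test of slope = 0
        "significant": bool(fit.pvalue < alpha),
    }


# Made-up daily death counts, for illustration only.
deaths = [3, 5, 4, 8, 9, 11, 10, 14, 15, 17]
result = trend_test(deaths)
print(result)
```

Running the same function once per variable gives the r-squared and p-value that the questions ask about.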

Interpretation:

  • If the r-squared value is high and the p-value is low: There is a strong linear relationship between time and the dependent variable. This suggests that the variable has followed a consistent upward or downward trend over the period.
  • If the r-squared value is low and the p-value is high: The data show little or no evidence of a linear relationship between time and the dependent variable. This suggests that time does not significantly explain changes in the variable.

Predicting Future Values:

For significant relationships, you can use the regression model to predict the value of the dependent variable for a future date (seven days after the end of the workshop).
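
A minimal sketch of that prediction follows, again with scipy.stats.linregress. The workshop end date, the function name predict_on, and the % positive values are hypothetical and only show the arithmetic: fit the line, then evaluate it at the number of days between the first observation and the target date.

```python
from datetime import date, timedelta

import numpy as np
from scipy import stats


def predict_on(dates, values, target):
    """Fit value ~ days since first date, then evaluate the line at target."""
    start = min(dates)
    t = np.array([(d - start).days for d in dates], dtype=float)
    y = np.asarray(values, dtype=float)
    fit = stats.linregress(t, y)
    return fit.intercept + fit.slope * (target - start).days


# Made-up % positive values, for illustration only.
dates = [date(2021, 3, 1) + timedelta(days=i) for i in range(5)]
pct_positive = [0.100, 0.095, 0.090, 0.085, 0.080]

workshop_end = date(2021, 3, 5)               # hypothetical end date
target = workshop_end + timedelta(days=7)     # seven days after the workshop
forecast = predict_on(dates, pct_positive, target)
print(target, forecast)
```

A straight line extended past the data can leave the valid range (a percentage below 0, for example), so the forecast is worth a sanity check before it goes into the report.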

Report

[Insert your state name]

COVID-19 Data Analysis

This report presents the results of a linear regression analysis on COVID-19 data for [your state]. The analysis examines the relationship between time and three variables: deaths, % tested, and % positive.

Results:

  • Deaths: The analysis found a [significant/non-significant] linear relationship between time and deaths. The r-squared value was [value], indicating that [percentage] of the variation in deaths can be explained by time. The p-value was [value], suggesting that the relationship is [significant/not significant].
  • % Tested: [Similar analysis and interpretation for % tested]
  • % Positive: [Similar analysis and interpretation for % positive]

Discussion:

[Discuss the implications of your findings, considering factors such as public health policies, vaccination rates, and emerging variants.]

Conclusions:

[Draw conclusions based on your analysis, considering the limitations of the data and the potential implications for future COVID-19 prevention and control efforts.]
