Create a histogram of the variable selling price. Does it appear normally distributed? Justify your answer.
Create a correlation matrix to explore the linear correlation coefficient of each independent variable with selling price. Rank the correlation coefficients from strongest to weakest linear association.
Create scatterplots of the selling price vs each of the independent variables. Do all of the relationships appear linear? Does the variance appear constant? Are there any outliers?
Add a regression line to each of the scatterplots. Which one demonstrates the highest R-squared? Does this make sense given the visual appearance of the scatterplots? Why or why not?
Run a simple linear regression model for each independent variable. Complete the table below.
Examine the residual plot for each independent variable. Do the residuals appear random around zero?
Are all of the variables significant predictors of selling price? Select the model that you think best explains the variation around the mean selling price. Justify your choice with information from the comparison chart in 4.
List the assumptions of linear regression. Does your model violate any of these assumptions? Justify your answer (i.e., what information helped you evaluate each assumption).
Data Exploration:
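Histogram: Create a histogram of the selling price. Analyze the shape of the distribution: a roughly symmetric, bell-shaped histogram is consistent with normality, while a long tail on one side indicates skew.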
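The histogram step can be sketched as follows. The dataset itself is not shown here, so the `selling_price` column and the simulated values are hypothetical stand-ins; the skewness and mean-versus-median comparison are optional numeric companions to the visual check, not part of the assignment.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: replace with the real dataset.
rng = np.random.default_rng(0)
df = pd.DataFrame({"selling_price": rng.lognormal(mean=12.0, sigma=0.4, size=200)})
price = df["selling_price"]

# Draw the histogram and keep the bin counts for inspection.
fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(price, bins=20, edgecolor="black")
ax.set_xlabel("Selling price")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of selling price")
fig.savefig("selling_price_hist.png")

# Numeric companions to the visual check.
skewness = price.skew()  # close to 0 for a symmetric distribution
mean_price, median_price = price.mean(), price.median()
print(f"skewness={skewness:.2f}, mean={mean_price:.0f}, median={median_price:.0f}")
```

A mean well above the median together with a clearly positive skewness would support describing the distribution as right-skewed rather than normal.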
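Correlation Matrix: Calculate a correlation matrix to explore the linear correlation coefficient between each independent variable and the selling price. Rank the coefficients from strongest to weakest by absolute value, since a strongly negative coefficient indicates as strong a linear association as a strongly positive one.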
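A minimal sketch of the correlation step, assuming the data sits in a pandas DataFrame. The predictor names `x1`, `x2`, `x3` and the simulated values are placeholders, since the actual variables are not listed here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; x1..x3 are placeholder predictor names.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "x3": x3,
    "selling_price": 100 + 8 * x1 - 3 * x2 + 0.5 * x3 + rng.normal(scale=4, size=n),
})

# Pearson correlation matrix (the pandas default).
corr_matrix = df.corr()

# Keep only each predictor's correlation with selling price.
r_with_price = corr_matrix["selling_price"].drop("selling_price")

# Rank by absolute value so that sign does not affect the ordering.
order = r_with_price.abs().sort_values(ascending=False).index
ranked = r_with_price.loc[order]
print(ranked.round(3))
```

The ranking keeps the signed coefficients, so the direction of each association stays visible in the ranked output.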
Scatterplots: Create scatterplots of the selling price vs each independent variable. Analyze each plot for linearity, constant variance, and outliers.
Regression Lines: Add a regression line to each scatterplot. Calculate the R-squared (coefficient of determination) for each model. R-squared represents the proportion of variance in the selling price explained by the independent variable.
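The regression-line step might look like the sketch below, which fits a least-squares line with `numpy.polyfit`, computes R-squared from the residual and total sums of squares, and draws the line over the scatterplot. The predictor `x` and response `y` are simulated placeholders. In simple linear regression R-squared equals the square of the Pearson correlation coefficient, which the sketch also computes for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: one predictor and the selling price.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=150)
y = 50 + 6 * x + rng.normal(scale=8, size=150)

# Least-squares line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

# R-squared = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Pearson correlation, for comparison with R-squared.
r = np.corrcoef(x, y)[0, 1]

# Scatterplot with the fitted line on top.
fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6)
xs = np.linspace(x.min(), x.max(), 100)
ax.plot(xs, intercept + slope * xs, color="red",
        label=f"R-squared = {r_squared:.3f}")
ax.set_xlabel("Predictor (placeholder)")
ax.set_ylabel("Selling price")
ax.legend()
fig.savefig("scatter_with_fit.png")

print(f"R-squared={r_squared:.3f}, r squared={r ** 2:.3f}")
```

Because R-squared is the squared correlation in this setting, the ranking by R-squared should match the ranking of the correlation coefficients by absolute value.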
Simple Linear Regression: Run a simple linear regression model for each independent variable. Fill out a table including:
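The table's columns are not reproduced here, so the sketch below assumes a commonly reported set: intercept, slope, R-squared, the slope's standard error, and the slope's p-value. It uses `scipy.stats.linregress` on simulated placeholder data; the column names and the 0.05 significance level are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in data; x1..x3 are placeholder predictor names.
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
df = pd.DataFrame({
    "x1": x1,
    "x2": x2,
    "x3": x3,
    "selling_price": 100 + 8 * x1 - 3 * x2 + 0.5 * x3 + rng.normal(scale=4, size=n),
})

# Fit one simple linear regression per predictor and collect the results.
rows = []
for name in ["x1", "x2", "x3"]:
    fit = stats.linregress(df[name].to_numpy(), df["selling_price"].to_numpy())
    rows.append({
        "predictor": name,
        "intercept": fit.intercept,
        "slope": fit.slope,
        "r_squared": fit.rvalue ** 2,
        "slope_std_err": fit.stderr,
        "slope_p_value": fit.pvalue,  # two-sided test of slope = 0
    })

summary = pd.DataFrame(rows).set_index("predictor")

# Assumed significance level of 0.05; adjust to the course's convention.
summary["significant"] = summary["slope_p_value"] < 0.05
print(summary.round(4))
```

Comparing the `r_squared` and `slope_p_value` columns across rows gives the evidence needed to decide which predictors are significant and which single-variable model explains the most variation.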
Model Evaluation:
Linear Regression Assumptions:
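Linear regression relies on several assumptions: a linear relationship between the predictor and the response, independent errors, constant error variance (homoscedasticity), and normally distributed errors.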
Assumption Violations:
The steps above supply the evidence for evaluating these assumptions: the scatterplots show whether each relationship is linear and whether outliers are present, and the residual plots show whether the residuals scatter randomly around zero with constant variance.
Use that evidence to identify potential violations and to judge whether a linear regression model is suitable for this data.
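As a rough numeric complement to the plots, the sketch below computes residuals for one simple regression and runs two informal checks. The data are simulated placeholders, and the choice of tests is an assumption rather than anything the assignment prescribes: a Shapiro-Wilk test for normality of the residuals, and a Spearman correlation between the absolute residuals and the fitted values as a crude indicator of changing variance.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in data: one predictor and the selling price.
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = 50 + 6 * x + rng.normal(scale=8, size=200)

# Fit the line and compute residuals (observed minus fitted).
fit = stats.linregress(x, y)
fitted = fit.intercept + fit.slope * x
residuals = y - fitted

# With an intercept, least-squares residuals average to zero by construction,
# so this is a sanity check on the code rather than on the model.
resid_mean = residuals.mean()

# Normality of residuals: a small p-value suggests departure from normality.
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Constant variance (rough check): if the spread of the residuals grows or
# shrinks with the fitted values, |residual| will correlate with them.
spread_corr, spread_p = stats.spearmanr(np.abs(residuals), fitted)

print(f"mean residual={resid_mean:.2e}")
print(f"Shapiro-Wilk p={shapiro_p:.3f}")
print(f"Spearman(|resid|, fitted)={spread_corr:.3f}, p={spread_p:.3f}")
```

These checks do not replace the residual plot: curvature and outliers are usually easier to see by plotting `residuals` against `fitted`, and the independence assumption depends on how the data were collected rather than on any single statistic.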