Lab 11 - Multivariate Regression, Diagnostics, OLS and Exploratory Regression in ArcGIS

As the name suggests, multivariate regression uses more than two variables: one dependent variable and two or more independent variables. It is similar to bivariate regression in that there are correlation coefficients, an adjusted R-squared for the overall relationship, and one intercept coefficient. The key difference is that in multivariate regression every independent variable has its own regression coefficient and its own p-value to indicate significance (Zandbergen).

We practiced a whole lot in this module's lab assignment!

Part A

In Part A, we worked with housing data to predict home selling prices using lot size and number of bedrooms. Using Excel, we began by creating a correlation coefficient matrix, then ran the regression model and evaluated its results, and finally estimated the selling price of a house (using the formula Y = a + b1X1 + b2X2) for both an outlier scenario and a non-outlier scenario. Through this, we learned that regression is sensitive to outliers, especially in small samples. Outliers can produce biased regression coefficients, which in turn produce biased predictions.
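
To make the formula Y = a + b1X1 + b2X2 concrete, here is a minimal Python sketch of the same idea. It is not the Excel workflow from the lab: the numbers are made up for illustration, and the helper names (solve_linear_system, fit_ols, predict) are my own. It fits the coefficients by least squares, then shows how a single outlier shifts the predicted price.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(x1, x2, y):
    """Return (a, b1, b2) minimizing squared error for y = a + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


def predict(coeffs, x1, x2):
    """Apply Y = a + b1*X1 + b2*X2."""
    a, b1, b2 = coeffs
    return a + b1 * x1 + b2 * x2


# Hypothetical data (not the lab dataset): lot size, bedrooms, selling price.
lot_size = [5000, 6000, 7500, 8000, 9000, 10000]
bedrooms = [2, 3, 3, 4, 3, 4]
price = [150000, 185000, 205000, 240000, 225000, 260000]

base = fit_ols(lot_size, bedrooms, price)

# Same data plus one made-up outlier: a very large lot that sold cheaply.
with_outlier = fit_ols(lot_size + [30000], bedrooms + [3], price + [120000])

print("without outlier:", predict(base, 8500, 3))
print("with outlier:   ", predict(with_outlier, 8500, 3))
```

With only six or seven observations, the one extra point changes all three coefficients, which is the small-sample sensitivity described above.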

Part B

In Part B, we compared regression models built from a cirrhosis death rates dataset that included the following independent variables: urban population, liquor consumption, and wine consumption. I created a total of 7 regression models, one for each possible combination of the three variables. Based on my results, the model providing the best overall fit was the multivariate regression model using all 3 independent variables combined. This model explained 78.1% of the variation in cirrhosis death rates; additionally, the regression coefficients and p-values showed that wine consumption had the strongest, most significant influence.
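
The count of 7 models comes from taking every non-empty combination of the three variables. The sketch below lists those combinations and includes the standard adjusted R-squared formula, which is one reasonable way to compare models with different numbers of predictors. The R-squared value and sample size in the example are placeholders, not results from the lab.

```python
from itertools import combinations

variables = ["urban population", "liquor consumption", "wine consumption"]

# Every non-empty subset of the three variables: 3 + 3 + 1 = 7 models.
models = [
    combo
    for size in range(1, len(variables) + 1)
    for combo in combinations(variables, size)
]


def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


for model in models:
    print(len(model), "predictor(s):", ", ".join(model))

# Placeholder numbers, only to show the penalty: the same R-squared
# is worth less when it took more predictors to reach it.
print(adjusted_r_squared(0.80, n_obs=46, n_predictors=1))
print(adjusted_r_squared(0.80, n_obs=46, n_predictors=3))
```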

Part C

In Part C, we learned how to perform a multivariate regression in ArcGIS using the Ordinary Least Squares (OLS) tool under Spatial Statistics > Modeling Spatial Relationships. The goal was to predict the number of 911 calls using the following variables: total population, number of residents with a low level of education, and distance to the urban center. More importantly, we learned about the Six OLS Checks and evaluated our results against each one using the tool's output ("Summary of OLS Results" and "OLS Diagnostics").

Six OLS Checks:
1. Are the independent (explanatory) variables helping your model?
2. Are the relationships what you expected?
3. Are any of the explanatory variables redundant?
4. Is the model biased?
5. Do you have all key explanatory variables?
6. How well are you explaining your dependent variable?
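
For check 4, the OLS diagnostics report a Jarque-Bera statistic, which tests whether the residuals are normally distributed. As a rough illustration of what that number measures, here is a minimal sketch of the textbook formula. The residuals are invented, the function name is my own, and ArcGIS may differ in its exact computation, so treat this as an approximation and not a reproduction of the tool.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, asymptotic p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments (dividing by n, as in the textbook definition).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    # This approximation is only reliable for large samples.
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Invented residuals with one large positive value (right-skewed).
residuals = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 6.0]
jb, p = jarque_bera(residuals)
print("JB =", jb, "p =", p)
```

A small p-value suggests the residuals are not normally distributed, which is the sign of a biased model that check 4 is looking for.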

Part D

Finally, in Part D we learned how to use the Exploratory Regression tool in ArcGIS for cases where we do not know which explanatory variables to use. This tool "will evaluate all possible combinations of the selected explanatory variables and look for an OLS model that best explains the dependent variable" (Morgan). The tool also takes all Six OLS Checks into account.

When discussing the performance of a model, we interpreted and compared the Akaike's Information Criterion (AIC) diagnostic statistic. We also used the Jarque-Bera statistic to decide whether to run the Spatial Autocorrelation tool (Global Moran's I) on the residuals, which helps confirm that no key explanatory variables are missing. Some screenshots of my work are provided below.
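
To show how an AIC comparison works, here is a small sketch using one common least-squares form of AIC and its small-sample correction (AICc). The function names and numbers are placeholders of my own, and the values ArcGIS reports may differ by a constant or in how parameters are counted. Only the difference between models fitted to the same data is meaningful, with lower being better.

```python
import math


def aic_from_rss(rss, n_obs, n_params):
    """AIC for a least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def aicc_from_rss(rss, n_obs, n_params):
    """AIC with the small-sample correction term added."""
    correction = (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic_from_rss(rss, n_obs, n_params) + correction


# Placeholder values: model B adds one parameter and lowers the
# residual sum of squares (RSS) a little. Is the extra variable worth it?
model_a = aicc_from_rss(rss=520.0, n_obs=80, n_params=3)
model_b = aicc_from_rss(rss=505.0, n_obs=80, n_params=4)

print("AICc A:", model_a)
print("AICc B:", model_b)
print("preferred:", "A" if model_a < model_b else "B")
```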


