This worksheet takes linear regression to the next level by actually using it to predict values of the dependent variable from the independent one. After completing this worksheet you should know how to make such predictions with a linear regression model.
In this worksheet we take the vegetation coverage of each plot (which we have for all 161 plots) and use it to predict the animal activity (which we have for only 50 of those plots). The model is built from the 50 plots that have both animal activity and vegetation coverage. That model is then used to predict the animal activity for the rest of the plots.
First things first: the following analysis is built on top of your script from W02-1. Please copy your script “W02-1.R”, rename the copy to “W05-1.R” and use it for the programming tasks of this worksheet.
Let's have a look at the distribution of the coverage. You can use the hist() function for that. Does the plot match what you would expect from normally distributed data? If not, try plotting a histogram of the square root of the values instead. The function sqrt() will help you with that. Please also check the distribution of the animal activity in the same way.
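As a rough sketch of this check, the example below draws the raw and the square-root histograms side by side. The data frame `plots` and its column `coverage` are hypothetical stand-ins filled with simulated numbers; use the names and data from your own script instead.

```r
# Hypothetical stand-in data: 161 plots with right-skewed coverage values (in %).
# Replace this with the data frame from your own script.
set.seed(1)
plots <- data.frame(coverage = rbeta(161, 1, 4) * 100)

# Draw both histograms next to each other and restore the plot layout afterwards
old_par <- par(mfrow = c(1, 2))
h_raw  <- hist(plots$coverage, main = "Coverage", xlab = "coverage")
h_sqrt <- hist(sqrt(plots$coverage), main = "sqrt(Coverage)", xlab = "sqrt(coverage)")
par(old_par)
```

The same two calls to hist() can be repeated for the animal activity column.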
Now it's time to redo our regression. You can peek at W04-1 and set up the regression in the same way, only this time please use the square-root-transformed values instead of the raw ones.
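One possible way to fit such a model is sketched here. The names `plots`, `coverage`, `animals` and `known` are assumptions for illustration, the data are simulated, and the setup from W04-1 may look different in your script.

```r
# Hypothetical stand-in data: coverage for all 161 plots,
# animal activity only for the first 50 (NA elsewhere).
set.seed(1)
coverage <- rbeta(161, 1, 4) * 100
animals <- rep(NA_real_, 161)
animals[1:50] <- (1 + 0.5 * sqrt(coverage[1:50]) + rnorm(50, sd = 0.3))^2
plots <- data.frame(coverage, animals)

# Keep only the plots where animal activity was recorded
known <- plots[!is.na(plots$animals), ]

# Linear model on the square-root scale of both variables
model <- lm(sqrt(animals) ~ sqrt(coverage), data = known)
summary(model)
```

Writing the transformation inside the formula means the model itself remembers it, so coverage can later be supplied on its original scale.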
Let's now use our linear model to predict the animal activity from the square root of the coverage. Please use the predict() function and save the result in a variable. Remember that the model works with square roots, so the predicted values are square roots too; please convert them back to the original scale by squaring them. In R, exponents are written with the ^ operator, so x^2 squares x.
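The prediction and back-transformation could look roughly like the following. All object and column names (`plots`, `animals`, `pred_sqrt`, `animals_pred`) are hypothetical and the data are simulated; the sketch also assumes the model was fitted with the square roots written inside the formula.

```r
# Hypothetical stand-in data: coverage for all 161 plots,
# animal activity only for the first 50 (NA elsewhere).
set.seed(1)
coverage <- rbeta(161, 1, 4) * 100
animals <- rep(NA_real_, 161)
animals[1:50] <- (1 + 0.5 * sqrt(coverage[1:50]) + rnorm(50, sd = 0.3))^2
plots <- data.frame(coverage, animals)

# Fit on the 50 plots with observations (lm drops rows with NA by default)
model <- lm(sqrt(animals) ~ sqrt(coverage), data = plots)

# Predict for all 161 plots; the result is on the square-root scale
pred_sqrt <- predict(model, newdata = plots)

# Square the predictions to get back to the scale of animal activity
plots$animals_pred <- pred_sqrt^2
head(plots)
```

Passing the full data frame as `newdata` gives a prediction for every plot, including those without observed animal activity.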
Great, now our values are back on the scale of animal activity. Let's see what we have achieved: please visualize the relation between coverage and our newly calculated variable.
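A minimal sketch of such a plot is given below, again with simulated data and hypothetical names (`plots`, `animals_pred`). It also overlays the 50 observed values for comparison, which is an addition for illustration and not required by the task.

```r
# Hypothetical stand-in data and model, rebuilt here so the example runs on its own
set.seed(1)
coverage <- rbeta(161, 1, 4) * 100
animals <- rep(NA_real_, 161)
animals[1:50] <- (1 + 0.5 * sqrt(coverage[1:50]) + rnorm(50, sd = 0.3))^2
plots <- data.frame(coverage, animals)
model <- lm(sqrt(animals) ~ sqrt(coverage), data = plots)
plots$animals_pred <- predict(model, newdata = plots)^2

# Predicted animal activity against coverage for all plots
plot(plots$coverage, plots$animals_pred,
     xlab = "coverage", ylab = "animal activity",
     pch = 16, col = "grey40")

# Observed values of the 50 surveyed plots (rows with NA are not drawn)
points(plots$coverage, plots$animals, pch = 1, col = "red")
legend("topleft", legend = c("predicted", "observed"),
       pch = c(16, 1), col = c("grey40", "red"))
```

Because the model is linear on the square-root scale, the back-transformed predictions follow a curve rather than a straight line when plotted against the raw coverage.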