Biodiversity data analysis with R






W06-1 Leave-one-out validation

This worksheet revisits prediction with linear models and focuses on a robust approach for validating the quality of the prediction. After completing this worksheet, you should know how to use for loops to perform a leave-one-out validation.

Things you need for this worksheet

  • R: the interpreter can be installed on any operating system. For Linux, you should use the r-cran packages supplied for your Linux distribution. If you use Ubuntu, this is one of many starting points. If you use Windows, you can install R from the official CRAN web page.

  • R Studio: we recommend using R Studio for (interactive) programming with R. You can download R Studio from the official web page.

  • your script and data from W02-1: Reading CSV files

What's the plan?

For the linear model, we get an R² value in the summary. But this R² only tells us how well the model fits the plots that we have data for. What we actually want is the error of our prediction, and for that we need a different approach.
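As a reminder of where that R² comes from, here is a minimal sketch. The data frame plots and its columns coverage and animals are made-up placeholders, not the data or names from W02-1.

```r
# Hypothetical example data; names and values are assumptions for illustration.
plots <- data.frame(
  coverage = c(5, 12, 20, 33, 41, 56, 64, 78),
  animals  = c(2, 4, 9, 11, 18, 20, 27, 30)
)

# Fit a linear model of animal activity against vegetation coverage.
model <- lm(animals ~ coverage, data = plots)

# R² as reported by summary(): it describes the fit to the plots
# that were used to build the model, not the prediction error.
fit_r2 <- summary(model)$r.squared
print(fit_r2)
```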

To calculate the actual error, the general idea is to predict values that we already know. Of course, we still need data to fit the regression, so we leave just one plot out when calculating the model and then let that model predict the value of the plot we left out. Now we can take the difference between the actual value and the value calculated by the model. That difference is the error. However, the error of one single plot is not enough. That is why we do this for all our plots, leaving one out at a time. In the end, we calculate the mean of all those errors.
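A single step of this idea could look like the following sketch. It leaves out one arbitrarily chosen plot; the data frame plots, its columns, and the index i are hypothetical, and the sign convention (predicted minus observed) is an assumption.

```r
# Hypothetical example data; names and values are assumptions for illustration.
plots <- data.frame(
  coverage = c(5, 12, 20, 33, 41, 56, 64, 78),
  animals  = c(2, 4, 9, 11, 18, 20, 27, 30)
)

i <- 3                    # index of the plot we leave out (arbitrary choice)
train <- plots[-i, ]      # all plots except plot i
left_out <- plots[i, ]    # the single left-out plot

# Fit the model without the left-out plot ...
model <- lm(animals ~ coverage, data = train)

# ... and let it predict the value we already know.
predicted <- predict(model, newdata = left_out)

# Error for this one plot (predicted minus observed).
error <- predicted - left_out$animals
print(error)
```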

Learning log assignments

:!: First things first: the following analysis is built on top of your script from W02-1. Please copy your script “W02-1.R”, rename the copy to “W06-1.R” and use it for the programming tasks of this worksheet.

:-\ In order to have all our variables in one data frame, let's create two new columns with the square roots of the animal activity and the coverage.
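One possible way to add such columns is sketched below. The data frame plots and the column names are assumptions; your own script from W02-1 will use different names.

```r
# Hypothetical example data; names and values are assumptions for illustration.
plots <- data.frame(
  coverage = c(5, 12, 20, 33, 41, 56, 64, 78),
  animals  = c(2, 4, 9, 11, 18, 20, 27, 30)
)

# Add the square roots as two new columns of the same data frame.
plots$coverage_sqrt <- sqrt(plots$coverage)
plots$animals_sqrt  <- sqrt(plots$animals)

str(plots)
```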

Now we are ready to program a leave-one-out validation script. The script should compute the prediction error for each plot where both animal and vegetation coverage data are available. Hence, we need a loop which leaves out each animal plot once and predicts the animal activity on this plot using a linear model of animal activity against vegetation coverage fitted to the remaining plots.

:-\ Please write a for-loop which implements the leave-one-out validation. Be aware that you need two variables: one which holds the predicted value and one which holds the observed value of each left-out validation sample.
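The loop might be structured as in this sketch, which is one possible layout rather than the reference solution. The data frame plots, all column names, and the vectors observed and predicted are hypothetical; complete.cases() is used here on the assumption that missing values mark plots without data.

```r
# Hypothetical example data; names and values are assumptions for illustration.
plots <- data.frame(
  coverage = c(5, 12, 20, 33, 41, 56, 64, 78),
  animals  = c(2, 4, 9, 11, 18, 20, 27, 30)
)

# Keep only plots where both variables are available.
plots <- plots[complete.cases(plots[, c("coverage", "animals")]), ]

plots$coverage_sqrt <- sqrt(plots$coverage)
plots$animals_sqrt  <- sqrt(plots$animals)

n <- nrow(plots)
observed  <- numeric(n)   # observed value of each left-out plot
predicted <- numeric(n)   # value predicted for each left-out plot

for (i in seq_len(n)) {
  # Fit the model on all plots except plot i.
  model <- lm(animals_sqrt ~ coverage_sqrt, data = plots[-i, ])

  # Predict the left-out plot and store the value pair.
  predicted[i] <- predict(model, newdata = plots[i, ])
  observed[i]  <- plots$animals_sqrt[i]
}

validation <- data.frame(observed, predicted)
print(validation)
```

Pre-allocating the two result vectors with numeric(n) avoids growing them inside the loop.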

:-\ Using the results from the loop, please calculate the mean error and the R² of all your validation value pairs.
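A sketch of these two calculations follows. The observed and predicted vectors are made-up numbers standing in for the results of your loop. Reporting the mean absolute error as well is an addition not asked for in the task: positive and negative errors cancel out in the plain mean.

```r
# Made-up value pairs standing in for the loop results (assumption).
observed  <- c(1.4, 2.0, 3.0, 3.3, 4.2, 4.5, 5.2, 5.5)
predicted <- c(1.6, 2.1, 2.7, 3.5, 4.0, 4.8, 5.0, 5.6)

# Error of each validation pair (predicted minus observed).
errors <- predicted - observed

# Mean error: positive and negative errors can cancel out.
mean_error <- mean(errors)

# Mean absolute error: average size of the errors regardless of sign.
mean_abs_error <- mean(abs(errors))

# R² of the validation pairs. For a simple linear regression this equals
# the squared Pearson correlation between observed and predicted values.
validation_r2 <- cor(observed, predicted)^2

print(c(mean_error = mean_error,
        mean_abs_error = mean_abs_error,
        r2 = validation_r2))
```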

en/learning/schools/s01/worksheets/ba-ws-06-1.txt · Last modified: 2015/09/22 16:22 (external edit)