# BIS-Fogo


# L07: Descriptive statistics

“I don't even know what I'm doing here.”

Chrom, Tron

### Things we cover in this session

• Describing and visualizing model results by boxplots and simple statistics

### Things to take home from this session

At the end of this session you should be able to

• Calculate characteristics of the model output
• Create boxplots
• Interpret model results based on boxplots

## Descriptive statistics: min/max/mean/median/sd

To interpret the success or characteristics of your model, there are more measures besides the p-value and R² you learned about in LN04-1 Regressions.

The minimum and maximum values of a prediction indicate how well a model is able to predict extreme values (either low or high).

Comparing the mean and median of a prediction to those of the observed values tells you whether the model generally over- or underestimates. The mean is calculated by summing all values of the dataset of interest and dividing the sum by the number of observations. Although the mean is widely used to characterize datasets, it has the major disadvantage of being strongly affected by outliers. The median, in contrast, is the value located in the middle of an ordered dataset (with an even number of values, it is the mean of the two middle values). It is therefore robust to outliers.
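The effect of a single outlier on the mean and the median can be sketched with a small, made-up vector (the values are purely illustrative and do not come from any model run):

```r
# Hypothetical example values, chosen only for illustration
values <- c(2, 3, 3, 4, 5)
values_outlier <- c(values, 100)          # same data plus one extreme value

mean_clean   <- mean(values)              # 3.4
median_clean <- median(values)            # 3

mean_out     <- mean(values_outlier)      # 19.5: pulled strongly towards the outlier
median_out   <- median(values_outlier)    # 3.5: barely changes
```

One extreme value shifts the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5.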

The standard deviation (sd) describes the spread of the data. Roughly speaking, it is the typical deviation of the values from the mean of the distribution; more precisely, it is the square root of the average squared deviation from the mean.
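As a rough sketch of what this means in practice, the standard deviation can be computed step by step and compared with R's built-in `sd()`. Note that `sd()` divides by n - 1 (the sample standard deviation), not by n. The vector `x` is a made-up example:

```r
# Hypothetical example values
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n          <- length(x)
deviations <- x - mean(x)                 # deviation of each value from the mean

# Sample standard deviation: square root of the summed squared
# deviations divided by (n - 1)
sd_by_hand <- sqrt(sum(deviations^2) / (n - 1))

sd_builtin <- sd(x)
all.equal(sd_by_hand, sd_builtin)         # TRUE
```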

### Descriptive statistics: Do it in R

Luckily, as an R user you don't have to calculate these measures by hand. The functions

```r
mean()
max()
min()
median()
sd()
```

will do it for you!
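A minimal sketch of how these functions might be used to compare a prediction with observations. The vectors `observed` and `predicted` and the helper `describe()` are hypothetical names introduced here for illustration only:

```r
# Made-up observed and predicted values (illustration only)
observed  <- c(1.2, 3.4, 2.8, 5.1, 4.0, 9.7)
predicted <- c(1.9, 3.1, 3.0, 4.6, 4.2, 7.5)

# Hypothetical helper: collect the five measures in one named vector
describe <- function(x) {
  c(min = min(x), max = max(x), mean = mean(x),
    median = median(x), sd = sd(x))
}

# One row per dataset, one column per measure
stats <- rbind(observed = describe(observed),
               predicted = describe(predicted))
print(round(stats, 2))
```

In this made-up example the predicted maximum (7.5) is lower than the observed maximum (9.7), which would hint that the model underestimates high extremes.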

## Boxplots

A boxplot is a useful visualization of the measures shown in the section above. It is therefore often used to depict the differences between distributions, e.g. between predicted and observed values.

(Chen-Pan Liao [CC_BY_SA] via wikimedia.org)

A boxplot shows several components:

1. The box covers the second and third quarters of the ordered values (from the first quartile, Q1, to the third quartile, Q3), i.e. the middle 50% of the values, which lie closest to the median.
2. The median is depicted by the line inside the box. The whiskers and the outlier points represent the spread of the remaining values.
3. The whiskers cover the values that fall outside the box. Their length is not standardized. Often they extend to the most extreme value that lies within 1.5 times the interquartile range (IQR) of the box.
4. The interquartile range is the distance between the lower edge of the box (Q1) and its upper edge (Q3).
5. All values that lie more than 1.5 × IQR below Q1 or above Q3 are considered outliers and are usually marked by points below or above the whiskers, respectively.
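The components listed above can be sketched in R with `boxplot()`. The data here are simulated and purely illustrative; `observed` and `predicted` are hypothetical names. The outlier limits are recomputed by hand with `quantile()` and `IQR()`; `boxplot()` itself derives the box edges from hinges, which can differ slightly from these quantiles.

```r
set.seed(42)                                  # make the simulated data reproducible

# Simulated data (illustration only)
observed  <- rnorm(100, mean = 10, sd = 3)
predicted <- observed + rnorm(100, mean = 0.5, sd = 1)

# Side-by-side boxplots; range = 1.5 (the default) sets the whisker rule
boxplot(list(observed = observed, predicted = predicted),
        ylab = "value", range = 1.5)

# Recompute the 1.5 * IQR outlier limits by hand for the observed values
q   <- quantile(observed, c(0.25, 0.75))      # Q1 and Q3
iqr <- IQR(observed)                          # Q3 - Q1
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values beyond the limits would be drawn as individual points
outliers <- observed[observed < lower_fence | observed > upper_fence]
```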

## Time for practice 