C01-1: Simple predictive models

The following example uses data from a field survey of areas in the Fogo natural park in 2007 by K. Mauer. For more information, please refer to this report.

The example requires the following package.

library(MuMIn)

Simple visualization and models

In many cases, species richness changes along elevational gradients. Hence, a first analysis could check the relationship between richness and elevation.

The following example first creates a scatter plot and then fits a linear and a loess (i.e. local polynomial regression) model. The fitted linear and loess curves are added to the scatter plot in blue and red, respectively.

Since the loess model is fitted locally and cannot be described by a single intercept and slope, individual y-axis values must be computed at a sufficiently fine resolution in order to plot a “continuous” line. Therefore, the model is used to predict the y-axis values (i.e. species richness) for every full meter between the minimum and maximum elevation displayed in the scatter plot.

# Create scatter plot
plot(data$ALT_GPS_M,richness,
      xlab="Elevation [m]",ylab="Plant species richness",pch=21,bg="grey")

# Compute linear model
model_linear <- lm(richness~data$ALT_GPS_M)

# Add linear model to the scatter plot
abline(model_linear,lwd=2,col="blue")


# Compute loess model
model_loess<-loess(richness~data$ALT_GPS_M)

# Predict y-axis values of the loess model (see the explanation above)
predict_x <- seq(min(data$ALT_GPS_M), max(data$ALT_GPS_M))
model_loess_predict <- predict(model_loess,newdata=predict_x)

# Add predicted loess model to the scatter plot
lines(predict_x,model_loess_predict,lwd=2,col="red")
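
The same steps can also be written with the data argument of lm and loess, so that predict matches the prediction grid to the model by column name. The following is only a minimal sketch: the data frame sim_plots and its columns elevation and richness are simulated stand-ins (assumptions for illustration), not the Fogo survey data.

```r
# Simulated stand-in for the survey data (hypothetical values, not the Fogo data)
set.seed(1)
sim_plots <- data.frame(elevation = runif(80, min = 200, max = 2400))
sim_plots$richness <- 12 - 0.004 * sim_plots$elevation + rnorm(80, sd = 1.5)

# Fit both models via the data argument so predict() can look up columns by name
fit_linear <- lm(richness ~ elevation, data = sim_plots)
fit_loess  <- loess(richness ~ elevation, data = sim_plots)

# One prediction point per meter between the observed minimum and maximum
grid <- data.frame(
  elevation = seq(min(sim_plots$elevation), max(sim_plots$elevation), by = 1)
)
grid$linear <- predict(fit_linear, newdata = grid)
grid$loess  <- predict(fit_loess, newdata = grid)

# Scatter plot with the linear fit in blue and the loess fit in red
plot(sim_plots$elevation, sim_plots$richness,
     xlab = "Elevation [m]", ylab = "Plant species richness",
     pch = 21, bg = "grey")
lines(grid$elevation, grid$linear, lwd = 2, col = "blue")
lines(grid$elevation, grid$loess, lwd = 2, col = "red")
```

Keeping the grid within the observed elevation range matters for the loess curve, because loess does not extrapolate beyond the data by default.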

Advanced model selection for explaining species richness

A general problem with models that use multiple explanatory variables is overfitting. For example, the R-squared value never decreases, and usually increases, as more explanatory variables are added to the model equation. If one chooses the model with the largest R-squared (i.e. the one with the most variables), the model might explain the particular data sample used to train it very well, but its explanatory value may drop sharply if it is applied to another sample.

Hence, the best model is not the one which explains the most but the one which explains quite a lot with only a few variables.
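
The behaviour of R-squared can be illustrated with a small simulation. This is a sketch under stated assumptions: the data frame sim, the predictor x1 and the ten noise columns are made up for illustration and have nothing to do with the survey data.

```r
# Simulated data: y depends on x1 only; the noise columns are pure random numbers
set.seed(42)
n <- 60
sim <- data.frame(x1 = rnorm(n))
sim$y <- 2 + 1.5 * sim$x1 + rnorm(n)
for (i in 1:10) {
  sim[[paste0("noise", i)]] <- rnorm(n)
}

# Small model with the one relevant predictor, large model with all eleven
fit_small <- lm(y ~ x1, data = sim)
fit_large <- lm(y ~ ., data = sim)

# R-squared cannot decrease when predictors are added to a nested model;
# adjusted R-squared and AIC include a penalty for the extra terms
comparison <- data.frame(
  model         = c("x1 only", "x1 + 10 noise terms"),
  r_squared     = c(summary(fit_small)$r.squared,
                    summary(fit_large)$r.squared),
  adj_r_squared = c(summary(fit_small)$adj.r.squared,
                    summary(fit_large)$adj.r.squared),
  aic           = c(AIC(fit_small), AIC(fit_large))
)
print(comparison)
```

The r_squared column is at least as large for the bigger model by construction, even though the added predictors carry no information about y. The penalized measures are the ones worth comparing.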

While there are many approaches which can be used for finding the best model (e.g. cross-validation or bootstrapping), the example below illustrates a model averaging approach in which 256 unique linear models are built, each using a different subset of the eight explanatory variables considered here (2^8 = 256, including the intercept-only model).
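
To see where the number 256 comes from, the candidate models can be enumerated by hand in base R. This is only a sketch of the counting argument, not how the model selection itself is implemented; the variable names are the eight predictors used in the full model of this example, and the object names inclusion, rhs and formulas are chosen for illustration.

```r
# The eight explanatory variables of the full model
vars <- c("ALT_GPS_M", "GRAU_EROS", "SOLO", "CAT_USO",
          "DECL_GR", "EXP_GR", "MAT_ORG", "GRAU_UTIL")

# Each variable is either excluded (FALSE) or included (TRUE)
inclusion <- expand.grid(rep(list(c(FALSE, TRUE)), length(vars)))
names(inclusion) <- vars

# Number of candidate models: 2^8
n_models <- nrow(inclusion)
print(n_models)

# Build the right-hand side of each model formula; "1" is the intercept-only model
rhs <- apply(inclusion, 1, function(keep) {
  if (any(keep)) paste(vars[keep], collapse = " + ") else "1"
})
formulas <- paste("richness ~", rhs)
print(head(formulas, 4))
```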

Applying such an automated model selection is quite easy. We will use the dredge function of the MuMIn package and just pass our linear model (which has been built using all available explanatory variables) to the function.

Since the results from dredge are hard to read at first glance, we apply model averaging (function model.avg) afterwards and just look at the summarized output. As you can see, aside from ranking the models by e.g. the AICc value (the best model has the smallest AICc), it also returns significance information for each of the explanatory variables.

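# Build a linear model which includes all available/relevant explanatory
# variables. dredge requires na.action = "na.fail" so that all sub-models
# are fitted to the same observations.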
model<-lm(richness ~ ALT_GPS_M + GRAU_EROS + SOLO + CAT_USO + DECL_GR + 
            EXP_GR + MAT_ORG + GRAU_UTIL, data = data, na.action = "na.fail")

# Perform a model selection iteration for the linear model defined above.
model_selection <- dredge(model)
## Fixed term is "(Intercept)"

# Average/aggregate the output of the model selection and summarize results.
model_averaged <- model.avg(model_selection)
summary(model_averaged)
## 
## Call:
## model.avg.model.selection(object = model_selection)
## 
## Component model call: 
## lm(formula = richness ~ <256 unique rhs>, data = data, na.action = 
##      na.fail)
## 
## Component models: 
##          df  logLik   AICc delta weight
## 123       6 -189.13 391.22  0.00   0.16
## 1236      7 -188.11 391.51  0.29   0.14
## 1234      7 -188.72 392.74  1.52   0.07
## 12346     8 -187.66 393.00  1.79   0.07
## 1237      7 -189.06 393.40  2.19   0.05
## 1238      7 -189.09 393.46  2.25   0.05
## 1235      7 -189.10 393.48  2.26   0.05
## 12356     8 -188.06 393.79  2.57   0.04
## 12367     8 -188.08 393.84  2.62   0.04
## 12368     8 -188.10 393.87  2.65   0.04
## 12348     8 -188.65 394.97  3.76   0.02
## 12347     8 -188.66 394.99  3.78   0.02
## 12345     8 -188.68 395.04  3.83   0.02
## 123456    9 -187.60 395.32  4.11   0.02
## 123468    9 -187.63 395.38  4.17   0.02
## 123467    9 -187.64 395.40  4.18   0.02
## 12357     8 -188.94 395.55  4.34   0.02
## 12358     8 -189.03 395.74  4.52   0.02
## 12378     8 -189.04 395.76  4.54   0.02
## 123567    9 -187.96 396.04  4.83   0.01
## 123568    9 -188.03 396.18  4.96   0.01
## 123678    9 -188.08 396.27  5.06   0.01
## 123457    9 -188.53 397.19  5.97   0.01
## 123458    9 -188.58 397.27  6.06   0.01
## 123478    9 -188.62 397.36  6.14   0.01
## 1234567  10 -187.52 397.65  6.44   0.01
## 1234568  10 -187.54 397.71  6.49   0.01
## 1234678  10 -187.62 397.86  6.65   0.01
## 123578    9 -188.91 397.94  6.73   0.01
## 1235678  10 -187.95 398.53  7.31   0.00
## 1234578  10 -188.48 399.57  8.36   0.00
## 12345678 11 -187.49 400.15  8.94   0.00
## 126       6 -195.94 404.84 13.63   0.00
## 1267      7 -195.66 406.61 15.39   0.00
## 1256      7 -195.83 406.94 15.73   0.00
## 1246      7 -195.84 406.96 15.75   0.00
## 1268      7 -195.94 407.17 15.96   0.00
## 12        5 -198.96 408.60 17.38   0.00
## 12467     8 -195.53 408.74 17.53   0.00
## 12678     8 -195.63 408.93 17.71   0.00
## 12567     8 -195.65 408.97 17.75   0.00
## 12456     8 -195.72 409.11 17.89   0.00
## 1348      6 -198.16 409.27 18.05   0.00
## 12568     8 -195.82 409.32 18.10   0.00
## 12468     8 -195.84 409.35 18.13   0.00
## 13468     7 -197.18 409.66 18.44   0.00
## 1346      6 -198.64 410.24 19.02   0.00
## 127       6 -198.74 410.44 19.23   0.00
## 1347      6 -198.75 410.46 19.25   0.00
## 125       6 -198.88 410.72 19.50   0.00
## 13478     7 -197.72 410.74 19.52   0.00
## 124       6 -198.91 410.78 19.57   0.00
## 128       6 -198.94 410.82 19.61   0.00
## 13467     7 -197.80 410.88 19.67   0.00
## 134       5 -200.17 411.00 19.79   0.00
## 124678    9 -195.48 411.09 19.87   0.00
## 124567    9 -195.52 411.16 19.95   0.00
## 125678    9 -195.61 411.34 20.12   0.00
## 13458     7 -198.03 411.36 20.14   0.00
## 124568    9 -195.70 411.52 20.31   0.00
## 134678    8 -196.95 411.57 20.35   0.00
## 134568    8 -196.96 411.60 20.38   0.00
## 13457     7 -198.38 412.05 20.84   0.00
## 134578    8 -197.23 412.14 20.93   0.00
## 134567    8 -197.35 412.38 21.17   0.00
## 13456     7 -198.62 412.52 21.31   0.00
## 1278      7 -198.64 412.57 21.35   0.00
## 1368      6 -199.82 412.59 21.37   0.00
## 1247      7 -198.69 412.66 21.44   0.00
## 138       5 -201.03 412.73 21.52   0.00
## 1257      7 -198.74 412.76 21.55   0.00
## 1345678   9 -196.39 412.91 21.69   0.00
## 137       5 -201.13 412.94 21.72   0.00
## 1258      7 -198.83 412.95 21.73   0.00
## 1245      7 -198.83 412.95 21.74   0.00
## 1248      7 -198.88 413.05 21.83   0.00
## 1367      6 -200.06 413.07 21.85   0.00
## 1345      6 -200.16 413.28 22.06   0.00
## 136       5 -201.34 413.35 22.13   0.00
## 1378      6 -200.21 413.38 22.16   0.00
## 1245678  10 -195.46 413.55 22.33   0.00
## 13678     7 -199.31 413.92 22.70   0.00
## 1357      6 -200.78 414.51 23.29   0.00
## 13567     7 -199.63 414.54 23.33   0.00
## 13568     7 -199.69 414.66 23.44   0.00
## 13578     7 -199.75 414.78 23.56   0.00
## 12478     8 -198.56 414.79 23.58   0.00
## 13        4 -203.19 414.81 23.60   0.00
## 1358      6 -200.98 414.91 23.70   0.00
## 12578     8 -198.63 414.93 23.72   0.00
## 12457     8 -198.68 415.03 23.82   0.00
## 12458     8 -198.77 415.21 23.99   0.00
## 135678    8 -198.78 415.24 24.03   0.00
## 1356      6 -201.33 415.62 24.41   0.00
## 146       5 -202.87 416.41 25.19   0.00
## 1468      6 -201.91 416.77 25.55   0.00
## 135       5 -203.15 416.97 25.76   0.00
## 124578    9 -198.55 417.21 25.99   0.00
## 16        4 -204.56 417.57 26.35   0.00
## 168       5 -203.51 417.69 26.47   0.00
## 2378      7 -201.33 417.94 26.73   0.00
## 2358      7 -201.39 418.07 26.85   0.00
## 23568     8 -200.30 418.27 27.06   0.00
## 1467      6 -202.74 418.44 27.22   0.00
## 23678     8 -200.39 418.45 27.23   0.00
## 1456      6 -202.85 418.65 27.44   0.00
## 23578     8 -200.51 418.70 27.49   0.00
## 14568     7 -201.75 418.79 27.57   0.00
## 14678     7 -201.90 419.09 27.88   0.00
## 167       5 -204.23 419.13 27.91   0.00
## 235678    9 -199.56 419.23 28.01   0.00
## 1568      6 -203.41 419.77 28.56   0.00
## 156       5 -204.56 419.79 28.58   0.00
## 1678      6 -203.47 419.90 28.69   0.00
## 2357      7 -202.32 419.93 28.72   0.00
## 23567     8 -201.17 420.02 28.80   0.00
## 23478     8 -201.30 420.28 29.07   0.00
## 23458     8 -201.38 420.43 29.22   0.00
## 14567     7 -202.63 420.55 29.33   0.00
## 234568    9 -200.29 420.70 29.49   0.00
## 2367      7 -202.76 420.81 29.60   0.00
## 237       6 -203.94 420.83 29.61   0.00
## 234678    9 -200.37 420.85 29.64   0.00
## 2368      7 -202.78 420.85 29.64   0.00
## 238       6 -203.99 420.93 29.72   0.00
## 234578    9 -200.50 421.12 29.91   0.00
## 145678    8 -201.73 421.14 29.93   0.00
## 1567      6 -204.10 421.16 29.94   0.00
## 2356      7 -202.96 421.22 30.00   0.00
## 378       5 -205.52 421.71 30.50   0.00
## 2345678  10 -199.55 421.72 30.50   0.00
## 235       6 -204.44 421.84 30.63   0.00
## 15678     7 -203.29 421.88 30.66   0.00
## 23457     8 -202.19 422.04 30.83   0.00
## 234567    9 -201.06 422.24 31.03   0.00
## 2347      7 -203.68 422.66 31.44   0.00
## 23467     8 -202.55 422.77 31.56   0.00
## 23468     8 -202.75 423.18 31.97   0.00
## 2348      7 -203.95 423.19 31.97   0.00
## 3678      6 -205.13 423.20 31.99   0.00
## 23456     8 -202.80 423.27 32.06   0.00
## 3478      6 -205.25 423.46 32.25   0.00
## 2345      7 -204.23 423.75 32.54   0.00
## 3578      6 -205.41 423.78 32.56   0.00
## 148       5 -206.92 424.52 33.30   0.00
## 358       5 -206.92 424.52 33.31   0.00
## 38        4 -208.11 424.67 33.45   0.00
## 34678     7 -204.92 425.12 33.91   0.00
## 35678     7 -205.01 425.31 34.10   0.00
## 368       5 -207.35 425.37 34.15   0.00
## 18        4 -208.47 425.38 34.16   0.00
## 34578     7 -205.08 425.46 34.24   0.00
## 3568      6 -206.28 425.52 34.30   0.00
## 3458      6 -206.44 425.84 34.62   0.00
## 37        4 -208.71 425.87 34.65   0.00
## 14        4 -208.75 425.95 34.74   0.00
## 348       5 -207.80 426.27 35.06   0.00
## 1478      6 -206.90 426.76 35.55   0.00
## 1458      6 -206.92 426.79 35.58   0.00
## 367       5 -208.18 427.03 35.81   0.00
## 357       5 -208.21 427.10 35.88   0.00
## 34568     7 -205.91 427.10 35.88   0.00
## 1         3 -210.42 427.11 35.89   0.00
## 2568      7 -205.93 427.14 35.93   0.00
## 345678    8 -204.75 427.17 35.95   0.00
## 3468      6 -207.12 427.20 35.98   0.00
## 147       5 -208.29 427.25 36.04   0.00
## 178       5 -208.34 427.36 36.15   0.00
## 158       5 -208.47 427.61 36.39   0.00
## 17        4 -209.62 427.68 36.46   0.00
## 145       5 -208.65 427.98 36.76   0.00
## 347       5 -208.67 428.01 36.79   0.00
## 3567      6 -207.68 428.31 37.10   0.00
## 268       6 -207.73 428.41 37.19   0.00
## 68        4 -210.07 428.59 37.37   0.00
## 568       5 -209.10 428.88 37.66   0.00
## 678       5 -209.12 428.92 37.70   0.00
## 15        4 -210.26 428.96 37.75   0.00
## 256       6 -208.02 428.99 37.77   0.00
## 14578     7 -206.89 429.07 37.85   0.00
## 3457      6 -208.09 429.14 37.92   0.00
## 3467      6 -208.16 429.27 38.05   0.00
## 2678      7 -207.02 429.32 38.10   0.00
## 24568     8 -205.84 429.35 38.13   0.00
## 25678     8 -205.90 429.47 38.25   0.00
## 1457      6 -208.29 429.53 38.32   0.00
## 1578      6 -208.33 429.61 38.40   0.00
## 157       5 -209.62 429.91 38.69   0.00
## 2468      7 -207.59 430.47 39.25   0.00
## 34567     7 -207.60 430.48 39.26   0.00
## 2567      7 -207.61 430.50 39.29   0.00
## 258       6 -208.77 430.50 39.29   0.00
## 5678      6 -208.78 430.51 39.30   0.00
## 2456      7 -207.68 430.65 39.44   0.00
## 468       5 -210.00 430.67 39.45   0.00
## 4568      6 -208.95 430.85 39.63   0.00
## 4678      6 -209.07 431.10 39.88   0.00
## 24678     8 -206.88 431.43 40.21   0.00
## 245678    9 -205.80 431.72 40.51   0.00
## 28        5 -210.62 431.92 40.70   0.00
## 267       6 -209.64 432.23 41.02   0.00
## 24567     8 -207.30 432.27 41.05   0.00
## 2458      7 -208.63 432.54 41.33   0.00
## 356       5 -210.96 432.59 41.37   0.00
## 35        4 -212.09 432.63 41.41   0.00
## 45678     7 -208.67 432.63 41.42   0.00
## 278       6 -209.89 432.73 41.51   0.00
## 2578      7 -208.74 432.77 41.56   0.00
## 67        4 -212.17 432.78 41.57   0.00
## 58        4 -212.37 433.19 41.98   0.00
## 567       5 -211.27 433.22 42.00   0.00
## 25        5 -211.32 433.31 42.10   0.00
## 8         3 -213.58 433.43 42.22   0.00
## 78        4 -212.53 433.51 42.30   0.00
## 2467      7 -209.13 433.55 42.33   0.00
## 248       6 -210.41 433.78 42.57   0.00
## 56        4 -212.84 434.12 42.91   0.00
## 236       6 -210.68 434.32 43.10   0.00
## 345       5 -211.88 434.43 43.21   0.00
## 3456      6 -210.82 434.60 43.39   0.00
## 245       6 -210.84 434.63 43.41   0.00
## 257       6 -210.84 434.63 43.41   0.00
## 2478      7 -209.68 434.65 43.43   0.00
## 578       5 -212.06 434.79 43.58   0.00
## 24578     8 -208.59 434.86 43.64   0.00
## 458       5 -212.13 434.94 43.73   0.00
## 2346      7 -209.85 434.99 43.77   0.00
## 467       5 -212.16 435.00 43.78   0.00
## 48        4 -213.46 435.37 44.15   0.00
## 4567      6 -211.26 435.47 44.25   0.00
## 478       5 -212.44 435.56 44.34   0.00
## 2457      7 -210.39 436.06 44.84   0.00
## 456       5 -212.80 436.27 45.05   0.00
## 23        5 -212.94 436.55 45.34   0.00
## 27        5 -213.02 436.71 45.49   0.00
## 4578      6 -211.88 436.72 45.50   0.00
## 234       6 -211.93 436.82 45.60   0.00
## 247       6 -212.32 437.60 46.38   0.00
## 57        4 -214.88 438.20 46.98   0.00
## 7         3 -216.05 438.37 47.16   0.00
## 26        5 -213.99 438.65 47.44   0.00
## 246       6 -213.00 438.95 47.73   0.00
## 5         3 -216.53 439.33 48.12   0.00
## 457       5 -214.84 440.35 49.13   0.00
## 47        4 -216.05 440.55 49.34   0.00
## 45        4 -216.44 441.33 50.12   0.00
## 36        4 -216.57 441.59 50.37   0.00
## 6         3 -217.80 441.86 50.65   0.00
## 3         3 -218.61 443.49 52.27   0.00
## 346       5 -216.56 443.78 52.57   0.00
## 24        5 -216.61 443.90 52.68   0.00
## 46        4 -217.75 443.95 52.73   0.00
## 2         4 -217.92 444.29 53.08   0.00
## 34        4 -218.61 445.67 54.45   0.00
## (Null)    2 -222.53 449.19 57.98   0.00
## 4         3 -222.50 451.27 60.05   0.00
## 
## Term codes: 
## ALT_GPS_M   CAT_USO   DECL_GR    EXP_GR GRAU_EROS GRAU_UTIL   MAT_ORG 
##         1         2         3         4         5         6         7 
##      SOLO 
##         8 
## 
## Model-averaged coefficients: 
##               Estimate Std. Error Adjusted SE z value Pr(>|z|)    
## (Intercept)  8.6836253  1.7801913   1.8039414   4.814 1.50e-06 ***
## ALT_GPS_M   -0.0044854  0.0007369   0.0007468   6.006  < 2e-16 ***
## CAT_USOUM    3.0558200  0.6774887   0.6866900   4.450 8.60e-06 ***
## CAT_USOUT    1.7379181  0.8332520   0.8438917   2.059   0.0395 *  
## DECL_GR      0.1096971  0.0261855   0.0265336   4.134 3.56e-05 ***
## GRAU_UTIL   -0.3650070  0.2642385   0.2679174   1.362   0.1731    
## EXP_GR       0.0016703  0.0018690   0.0018951   0.881   0.3781    
## MAT_ORG     -0.2858891  0.8684900   0.8804522   0.325   0.7454    
## SOLO        -0.1324605  0.5379966   0.5454568   0.243   0.8081    
## GRAU_EROS    0.0924248  0.2662798   0.2699664   0.342   0.7321    
## 
## Full model-averaged coefficients (with shrinkage): 
##               Estimate Std. Error Adjusted SE z value Pr(>|z|)    
## (Intercept)  8.6836253  1.7801913   1.8039414   4.814 1.50e-06 ***
## ALT_GPS_M   -0.0044854  0.0007370   0.0007468   6.006  < 2e-16 ***
## CAT_USOUM    3.0553406  0.6785158   0.6877019   4.443 8.90e-06 ***
## CAT_USOUT    1.7376454  0.8334709   0.8441061   2.059   0.0395 *  
## DECL_GR      0.1096234  0.0263307   0.0266766   4.109 3.97e-05 ***
## GRAU_UTIL   -0.1665952  0.2547987   0.2565461   0.649   0.5161    
## EXP_GR       0.0005293  0.0013080   0.0013199   0.401   0.6884    
## MAT_ORG     -0.0694587  0.4452968   0.4509681   0.154   0.8776    
## SOLO        -0.0315265  0.2684603   0.2720197   0.116   0.9077    
## GRAU_EROS    0.0225202  0.1372988   0.1390419   0.162   0.8713    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Relative variable importance: 
##                      ALT_GPS_M CAT_USO DECL_GR GRAU_UTIL EXP_GR GRAU_EROS
## Importance:          1.00      1.00    1.00    0.46      0.32   0.24     
## N containing models:  128       128     128     128       128    128     
##                      MAT_ORG SOLO
## Importance:          0.24    0.24
## N containing models:  128     128

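In the case of the linear model approach above, the best model (smallest AICc, 391.22) is the one which uses just ALT_GPS_M, CAT_USO and DECL_GR as explanatory variables (term codes 1, 2 and 3). Note that the second-ranked model, which additionally includes GRAU_UTIL, has an AICc that is only 0.29 larger, so the ranking between these two is not clear-cut.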
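
How much support the top-ranked models have relative to each other can be read from the delta column. A minimal sketch: the relative likelihood of a model compared to the best one is exp(-delta / 2). The delta values below are copied from the first four rows of the component model table above; the object names are chosen for illustration.

```r
# Delta AICc of the four top-ranked models, copied from the summary output
# (names are the term codes of the component models)
delta <- c("123" = 0.00, "1236" = 0.29, "1234" = 1.52, "12346" = 1.79)

# Relative likelihood of each model compared to the best one
rel_lik <- exp(-delta / 2)

# Evidence ratio: how many times better supported the best model is
evidence_ratio <- 1 / rel_lik

print(round(rbind(rel_lik, evidence_ratio), 2))
```

The Akaike weights in the weight column are these relative likelihoods divided by their sum over all 256 models, which is why the ratio of two weights equals the ratio of the corresponding relative likelihoods.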

en/learning/schools/s02/code-examples/sm-ce-03-01.txt · Last modified: 2015/09/28 09:21 by tnauss