Biodiversity data analysis with R


L12: Subsetting data frames

“Come on, you scuzzy data, be in there. Come on.”

Kevin Flynn, Tron

Things we cover in this session

  • Logical operations
  • Sub-setting data frames
  • Handling fill values or NA

Things you need for this session

Things to take home from this session

At the end of this session you should be able to

  • subset data frames by simple boolean expressions

Logical operations

The group of logical operations in programming languages generally encompasses relational and boolean operators.

Relational operators compare two values with respect to equality or order. Each comparison returns either TRUE or FALSE (or NA if a missing value is involved). The following operators are included in R:

Operator   Operation
>          greater than
<          less than
==         exactly equal
>=         greater than or equal
<=         less than or equal
!=         not equal
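
As a minimal sketch of how these operators behave, the following lines compare a small vector against a threshold. The vector name temp and its values are invented for illustration and are not part of the course data. Note that each comparison is carried out element by element.

```r
# Hypothetical example vector (values invented for illustration)
temp <- c(12.5, 18.0, 21.3, 18.0)

gt  <- temp > 18    # greater than:          FALSE FALSE  TRUE FALSE
ge  <- temp >= 18   # greater than or equal: FALSE  TRUE  TRUE  TRUE
eq  <- temp == 18   # exactly equal:         FALSE  TRUE FALSE  TRUE
neq <- temp != 18   # not equal:              TRUE FALSE  TRUE FALSE

print(gt)
print(ge)
print(eq)
print(neq)
```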

Related to the == operator is the isTRUE function: isTRUE(x) returns TRUE only if x is a single TRUE value and FALSE in every other case. For a single, non-missing logical value it is an alternative to x == TRUE, but unlike ==, it never returns NA and does not work element by element.
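
The following sketch contrasts isTRUE with a plain == comparison. The variable names x and v are chosen for illustration only. It shows the two cases where the results differ: vectors with more than one element and missing values.

```r
x <- TRUE
r1 <- isTRUE(x)     # TRUE: x is a single TRUE value

# A logical vector with more than one element
v <- c(TRUE, FALSE)
r2 <- v == TRUE     # element-wise comparison: TRUE FALSE
r3 <- isTRUE(v)     # FALSE: v is not a single TRUE value

# A missing value
r4 <- NA == TRUE    # NA: the comparison result is unknown
r5 <- isTRUE(NA)    # FALSE: NA is not TRUE

print(list(r1, r2, r3, r4, r5))
```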

Boolean operators are another core component. They allow you to combine the boolean values TRUE and FALSE in a boolean algebra. The basic operators implemented in R are the following:

Operator    Operation
!x          NOT x (where x is, e.g., the result of a boolean expression)
x | y       x OR y
x & y       x AND y
xor(x, y)   exclusive x OR y
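
A small truth table can be sketched by applying these operators to two logical vectors that cover all four combinations of TRUE and FALSE. The vector names x and y follow the table above; the values are chosen for illustration.

```r
# All four combinations of TRUE and FALSE
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

not_x  <- !x          # FALSE FALSE  TRUE  TRUE
x_or_y <- x | y       #  TRUE  TRUE  TRUE FALSE
x_and_y <- x & y      #  TRUE FALSE FALSE FALSE
x_xor_y <- xor(x, y)  # FALSE  TRUE  TRUE FALSE

print(data.frame(x, y, not_x, x_or_y, x_and_y, x_xor_y))
```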

These operators can be combined, but keep in mind their order of precedence: NOT first, then AND, then OR. Parentheses override this order. Here is an example:

> A <- TRUE
> B <- FALSE
> C <- FALSE
> B & C | A
[1] TRUE
> B & (C | A)
[1] FALSE
> !B & C | A
[1] TRUE
> !(B & C | A)
[1] FALSE

Subsetting data frames

Subsetting means reducing a data frame to the rows and/or columns that are needed for your analysis. There are two complementary ways to do this:

  • subsetting by selecting the rows and columns you want in your final data frame
  • subsetting by removing the rows and columns you do not want in your final data frame

Both can easily be done using the indexing methods of the data types already introduced in C00-2 Data frame basics.
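
Both ways can be sketched with plain index vectors. The data frame df, its column names, and its values are invented for illustration and do not come from the course material. Positive indices or column names select, negative indices remove.

```r
# Hypothetical data frame (names and values invented for illustration)
df <- data.frame(
  plot    = c("A", "B", "C", "D"),
  species = c(12, 7, 15, 9),
  height  = c(310, 455, 290, 520)
)

# Selecting: keep rows 1 and 3 and the columns "plot" and "species"
selected <- df[c(1, 3), c("plot", "species")]

# Removing: drop row 2 and the third column using negative indices
reduced <- df[-2, -3]

print(selected)
print(reduced)
```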

The main difference (and advantage) now is that you can derive the row and column indices from relational and boolean expressions instead of writing them out by hand.
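
The following is a minimal sketch of subsetting with a logical expression, including one way to deal with a fill value. The data frame df, its columns, and the fill value -9999 are assumptions made for illustration; real data sets may use a different fill value.

```r
# Hypothetical data frame; -9999 is an assumed fill value for missing counts
df <- data.frame(
  plot    = c("A", "B", "C", "D", "E"),
  species = c(12, 7, -9999, 9, 15),
  height  = c(310, 455, 390, 520, 250)
)

# Replace the assumed fill value by NA so it is treated as missing
df$species[df$species == -9999] <- NA

# Logical index: more than 8 species AND height above 300
keep <- df$species > 8 & df$height > 300

# Indexing with the logical vector; the NA in 'keep' yields an all-NA row
with_na <- df[keep, ]

# which() returns only the positions that are TRUE, so the NA row is dropped
selected <- df[which(keep), ]

# Removing instead of selecting: negate the expression
removed <- df[which(!keep), ]

print(with_na)
print(selected)
print(removed)
```

Plot C ends up in neither selected nor removed, because its species count is missing and the expression is therefore neither TRUE nor FALSE for that row.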

Have a look at E12-1 Subsetting data frames now for more information on this subject.

A more detailed introduction to subsetting or, more generally, cleaning data frames is beyond the scope of this school. For more information on that topic, please refer to the excursus E12-1: Cleaning data frames.

Time for practice

en/learning/schools/s01/lecture-notes/ba-ln-12.txt · Last modified: 2015/09/22 16:22 (external edit)