====== L12: Subsetting data frames ======

"Come on, you scuzzy data, be in there. Come on."

Kevin Flynn, Tron

==== Things we cover in this session ====
  * Logical operations
  * Subsetting data frames
  * Handling fill values or NA

==== Things you need for this session ====
  * [[en:learning:schools:s01:worksheets:ba-ws-12-1|W12-1 Subsetting data frames]]

==== Things to take home from this session ====
At the end of this session you should be able to:
  * subset data frames by simple boolean expressions

===== Logical operations =====
In programming languages, the group of logical operations generally comprises relational and boolean operators.

Relational operators compare two values, either for equality or for their order, and return TRUE or FALSE (or NA if one of the values is missing). Applied to vectors, they compare element by element. R provides the following relational operators:
^Operator ^Operation ^
| > | greater than |
| < | less than |
| == | exactly equal |
| >= | greater than or equal |
| %%<%%= | less than or equal |
| != | not equal |
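
A short sketch of these operators applied to two made-up vectors ''x'' and ''y'' (the values are hypothetical and chosen only for illustration). Since R compares vectors element by element, each comparison returns one logical value per element:

<code rsplus>
# Hypothetical example vectors
x <- c(1, 5, 10)
y <- c(2, 5, 8)

x > y    # FALSE FALSE TRUE
x == y   # FALSE TRUE FALSE
x != y   # TRUE FALSE TRUE
x <= 5   # TRUE TRUE FALSE (the single value 5 is recycled for every element)

# A comparison involving a missing value yields NA, not TRUE or FALSE
NA > 1   # NA
</code>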

A special case of the == operator is implemented in the ''isTRUE'' function: isTRUE(x) returns TRUE if x is a single TRUE value and FALSE otherwise. It is an alternative to x == TRUE, with the difference that it never returns NA or more than one value.
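
The following sketch contrasts ''isTRUE'' with ''== TRUE''; the vector ''x'' and the variable ''value'' are hypothetical. The last two lines show a common use case, assuming you want to compare floating-point numbers: ''all.equal'' tolerates small rounding errors but returns a text description instead of FALSE when the values differ, so ''isTRUE'' is used to turn its result into a plain TRUE or FALSE:

<code rsplus>
# Hypothetical logical vector with two elements
x <- c(TRUE, TRUE)
x == TRUE      # TRUE TRUE (one result per element)
isTRUE(x)      # FALSE (x is not a single TRUE)

NA == TRUE     # NA
isTRUE(NA)     # FALSE

# if (NA) would stop with an error; isTRUE() is safe inside if()
value <- NA
result <- if (isTRUE(value)) "yes" else "no"   # "no"

# Exact equality can fail for floating-point numbers
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
</code>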

Boolean operators are another core component. They combine the logical values TRUE and FALSE according to the rules of boolean algebra. The basic operators implemented in R are the following:
^Operator ^Operation ^
| !x | NOT x (x being, e.g., the result of a relational expression) |
| x %%|%% y | x OR y |
| x & y | x AND y |
| xor(x, y) | exclusive x OR y |
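
A minimal sketch applying the boolean operators element by element to two hypothetical logical vectors ''a'' and ''b'', followed by a typical combination of relational and boolean operators on a made-up temperature vector ''temp'':

<code rsplus>
# Hypothetical logical vectors covering all four TRUE/FALSE combinations
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

!a          # FALSE FALSE TRUE TRUE
a | b       # TRUE TRUE TRUE FALSE
a & b       # TRUE FALSE FALSE FALSE
xor(a, b)   # FALSE TRUE TRUE FALSE

# Hypothetical temperatures: which values lie between 0 and 10?
temp <- c(-3, 4, 12, 7)
temp >= 0 & temp <= 10   # FALSE TRUE FALSE TRUE
</code>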

Of course, these operators can be combined, but keep in mind that their precedence, from highest to lowest, is: NOT, AND, OR. Parentheses can be used to override this order. Here is an example:
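
<code rsplus>
> A <- TRUE
> B <- FALSE
> C <- FALSE
> B & C | A      # evaluated as (B & C) | A
[1] TRUE
> B & (C | A)    # parentheses force the OR first
[1] FALSE
> !B & C | A     # evaluated as ((!B) & C) | A
[1] TRUE
> !(B & C | A)   # negates the result of the whole expression
[1] FALSE
</code>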

===== Subsetting data frames =====
Subsetting means removing certain rows and/or columns from a data frame so that the data set is reduced to what is needed for your analysis. This can be done in two ways:

  * by selecting the rows and columns you want to keep in your final data frame
  * by removing the rows and columns you do not want in your final data frame

Both can easily be done using the indexing methods for data frames already introduced in [[en:learning:schools:s01:code-examples:ba-ce-00-02|C00-2 Data frame basics]].
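
As a minimal sketch of the two approaches, assuming a small made-up data frame ''df'' (station names and values are invented for illustration), selecting with positive indices and removing with negative indices can lead to the same subset:

<code rsplus>
# Hypothetical example data frame
df <- data.frame(station = c("A", "B", "C", "D"),
                 temp    = c(12.1, 15.3, 9.8, 20.4),
                 rain    = c(0.0, 2.5, 1.2, 0.0))

# Selecting: keep rows 1 and 3 and the columns "station" and "temp"
kept <- df[c(1, 3), c("station", "temp")]

# Removing: negative indices drop rows 2 and 4 and the third column ("rain")
dropped <- df[-c(2, 4), -3]

identical(kept, dropped)   # TRUE: both approaches yield the same subset
</code>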

What is new here, and the real advantage, is that the indices no longer have to be typed in by hand: they can be derived from logical expressions, i.e. from relational and boolean operations.
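
A sketch of subsetting by logical expressions, using a hypothetical data frame in which -999 is assumed to be the fill value for missing measurements. It also shows why NA needs attention: an NA in a logical row index returns a row full of NA. Base R's ''subset'' function is shown as an alternative that drops such rows by itself:

<code rsplus>
# Hypothetical station data; -999 is assumed to be the fill value for a missing reading
df <- data.frame(station = c("A", "B", "C", "D"),
                 temp    = c(12.1, -999, 9.8, 20.4),
                 rain    = c(0.0, 2.5, 1.2, 0.0))

# Replace the fill value by NA so that R treats it as missing
df$temp[df$temp == -999] <- NA

# Rows with temp above 10. df$temp > 10 alone is NA for station B, which would
# return a row full of NA; since FALSE & NA is FALSE, !is.na() excludes that row
warm <- df[!is.na(df$temp) & df$temp > 10, ]

# Rows without rain, keeping only the columns station and temp
dry <- df[df$rain == 0, c("station", "temp")]

# Alternative: base R's subset() drops rows where the condition is NA
warm2 <- subset(df, temp > 10)
</code>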

8-O Have a look at [[en:learning:schools:s01:code-examples:ba-ce-12-01|C12-1 Subsetting data frames]] now for more information on this subject.

m( A more detailed introduction to subsetting, or more generally to cleaning data frames, is beyond the scope of this school. For more information on that topic, please refer to the excursus [[en:learning:schools:s01:excursus:ba-ex-12-01|E12-1: Cleaning data frames]].

===== Time for practice =====
[[en:learning:schools:s01:worksheets:ba-ws-12-1|W12-1 Subsetting data frames]]
en/learning/schools/s01/lecture-notes/ba-ln-12.txt · Last modified: 2015/09/22 16:22 (external edit)