Intro to R, Lecture 9: The tidyverse

Welcome to R.


Packages

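install.packages( 'packageName' ): downloads and installs a package from CRAN. You only need to do this once on each computer.
In RStudio, the Packages pane is next to the Help pane.
library( 'packageName' ): loads an installed package into the current R session. You need to run it in every new session. A package may contain functions with the same names as functions that are already available (for example, dplyr has its own filter(), which masks stats::filter()). Once such a package is loaded, calling that name runs the version from the most recently loaded package. To pick a specific version, write packageName::functionName().
Search online for the documentation of each package you use.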
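Here is a short sketch of the install/load cycle and of name masking. It uses dplyr (the package this lecture covers) and base R's stats package as the example pair. The variable names setosa_rows and moving_avg are just illustrative.

```r
# install.packages("dplyr")   # run once per computer, then leave commented out

# Load the package in every new R session; the startup message lists the
# functions it masks, e.g. filter() and lag() from the stats package
library(dplyr)

# After library(dplyr), a bare filter() call means dplyr::filter()
setosa_rows <- filter(iris, Species == "setosa")

# The masked function is still reachable with package::function
moving_avg <- stats::filter(1:5, rep(1/3, 3))   # centered 3-point moving average
moving_avg

# help(package = "dplyr")   # opens the package's documentation index
```

The same `::` syntax works for any package, and you can use it without calling library() at all.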

Review of Data Management

  • Filter rows
  • Sort the rows
  • Select columns
  • Add new columns
  • Summarize data (use tapply()): pay attention to the 'na.rm' parameter. With na.rm = TRUE, missing values (NA) are dropped before the summary is computed.
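Before switching to dplyr, here is a sketch of the five operations in base R on the built-in iris data. The variable names are arbitrary. The last lines show why na.rm matters: tapply() passes extra arguments such as na.rm = TRUE on to the summary function, here mean().

```r
# Base R versions of the five operations, on the built-in iris data
virginica <- iris[iris$Species == "virginica", ]                  # filter rows
by_length <- iris[order(iris$Sepal.Length, decreasing = TRUE), ]  # sort rows
sepals    <- iris[, c("Sepal.Length", "Sepal.Width", "Species")]  # select columns
iris2     <- iris
iris2$Sepal.Product <- iris2$Sepal.Width * iris2$Sepal.Length     # add a column

# Summarize with tapply(): mean Sepal.Length for each species
tapply(iris$Sepal.Length, iris$Species, mean)

# na.rm matters once the data contain missing values
x     <- c(1, NA, 3, 4)
group <- c("a", "a", "b", "b")
tapply(x, group, mean)                 # group "a" comes out as NA
tapply(x, group, mean, na.rm = TRUE)   # NA dropped: a = 1, b = 3.5
```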

The ‘dplyr’ Package

  • Filter rows
    # Load dplyr (install it once first with install.packages("dplyr"))
    library( dplyr )
    # Keep only the rows where Species equals "virginica"
    filter( iris, Species == "virginica" )
  • Sort the rows
    # Sort iris by Sepal.Length in decreasing order
    # By default, arrange() sorts in increasing order
    arrange( iris, desc( Sepal.Length ) )
  • Select columns
    # Select one or more columns by name
    select( iris,
            Sepal.Length, Sepal.Width, Species )
  • Add new columns
    # Add a new column; mutate() appends it after the existing columns
    mutate( iris,
            Sepal.Product = Sepal.Width * Sepal.Length )
    # By the way, in everyday English "mutate" means to change, usually in a bad way.
  • Summarize data
    # This returns a data frame with one row and
    # one column, named mean(Sepal.Length)
    summarize( iris, mean(Sepal.Length) )

    # Grouping first gives one row per species and two columns:
    # Species and mean(Sepal.Length)
    iris.grouped <- group_by( iris, Species )
    summarize( iris.grouped, mean(Sepal.Length) )
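Here is a sketch that puts grouping and summarizing together and ties back to the na.rm note in the review list. Each summary column is given a name, so the result is easier to use. The names mean_sepal_length and n are our own choices. n() is dplyr's count of rows in the current group.

```r
library(dplyr)

# One row per species, with named summary columns
by_species    <- group_by(iris, Species)
species_means <- summarize(by_species,
                           mean_sepal_length = mean(Sepal.Length, na.rm = TRUE),
                           n = n())
species_means

# The same per-species means computed with base R's tapply()
tapply(iris$Sepal.Length, iris$Species, mean, na.rm = TRUE)
```

iris has no missing values, so na.rm = TRUE changes nothing here. It is included because real data often do contain NAs.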

Data Visualization

Box Plot

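The boxplot is another way to visualize a one-dimensional distribution of data, but it is more abstract than the strip chart or the histogram. Instead of showing every observation, it displays only a few summaries of the data.
First, the center of the boxplot is a box with a line in the middle. This line represents the median of the data, the value such that 50% of the data lie below it and 50% lie above it. The two ends of the box mark the lower and upper hinges (essentially the first and third quartiles), so the box contains the middle half of the data. The whiskers extend to the most extreme values that are no more than 1.5 times the box length away from the box, and any value beyond that is drawn as an individual point.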

[Figure: box plot]

boxplot( rivers,
main = "Box plot of rivers data",
xlab = "Length (miles)",
horizontal = TRUE,
col = "cyan3")
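To connect the picture to the numbers, you can ask R for the values a boxplot draws. The sketch below uses boxplot.stats() on the same rivers data. The name s is arbitrary.

```r
# boxplot.stats() returns the numbers behind a single boxplot
s <- boxplot.stats(rivers)

# Five numbers: lower whisker, lower hinge, median, upper hinge, upper whisker
s$stats

# The middle value is the ordinary median (the line inside the box)
median(rivers)

# Rivers beyond the whiskers are the ones drawn as separate points
sort(s$out)
```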

There’s a much more interesting way to use boxplots, however. Rather than looking at just one group, we can show the boxplots for all subgroups side by side in one graph (think about the stratified stripcharts we drew before).

boxplot( iris$Sepal.Length ~ iris$Species,
main = "Stratified Box Plot of Sepal Length",
xlab = "Sepal length (cm)", # with horizontal = TRUE, lengths run along the x-axis
horizontal = TRUE,
col = c( "cyan1", "dodgerblue1", "skyblue" ) )
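To see which numbers the grouped boxplot draws, you can ask boxplot() for its statistics without plotting. The sketch below does this by passing plot = FALSE. The name b is arbitrary.

```r
# Compute the grouped boxplot statistics without drawing anything
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)

# b$stats has one column per species; the rows are the lower whisker,
# lower hinge, median, upper hinge and upper whisker
b$stats
b$names

# Row 3 holds the medians, matching tapply() with median()
b$stats[3, ]
tapply(iris$Sepal.Length, iris$Species, median)
```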

Multi-faceted display

# We set R to make graphs of 3 rows and 1 column.
par( mfrow = c(3,1) )

hist( iris$Sepal.Length[ iris$Species == "setosa" ],
main = "Histogram of Setosa Sepal Length",
xlab = "Sepal Length (cm)", # sepal length is on the x-axis; the y-axis shows counts
breaks = 10,
col = "cyan1" )

# Notice that the graphs don't have to be the same type.
stripchart( iris$Sepal.Length[ iris$Species == "setosa" ],
main = "Stripchart of Setosa Sepal Length",
ylim = c(0, 2),
xlim = c(4, 8),
method = "jitter",
jitter = 0.5,
pch = 24, # an outlined triangle whose inside can be filled
cex = 1.4,
bg = "salmon2"
# bg is the fill color for the outlined symbols (pch 21 to 25)
)

hist( iris$Sepal.Length[ iris$Species == "virginica" ],
main = "Histogram of Virginica Sepal Length",
xlab = "Sepal Length (cm)", # sepal length is on the x-axis; the y-axis shows counts
breaks = 10,
col = "cyan3" )

# Finally, reset R to draw one graph at a time (1 row, 1 column)
par( mfrow = c(1,1) )
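Resetting with par( mfrow = c(1,1) ) assumes the layout was 1 row by 1 column before you started. A common alternative is to save the value that par() returns when you change a setting, then pass it back afterwards. This restores whatever layout was in effect before. The sketch below uses a two-panel layout and the name old_par purely for illustration.

```r
# par() invisibly returns the old values of any settings it changes
old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns

hist(iris$Sepal.Length[iris$Species == "setosa"],
     main = "Setosa", xlab = "Sepal Length (cm)", col = "cyan1")
hist(iris$Sepal.Length[iris$Species == "virginica"],
     main = "Virginica", xlab = "Sepal Length (cm)", col = "cyan3")

# Put back exactly the settings that were in effect before
par(old_par)
```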

Today's Tips