1 Introduction
First class: an introduction to R and how to use it.
1.1 General R usage
Set up your directory
Install and load tidyverse
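The installation step can be sketched as follows. This is a minimal example, assuming an internet connection and a default CRAN mirror; the requireNamespace() guard is an addition that avoids reinstalling the package every time the script runs.

```r
# Install the tidyverse only if it is not already available (done once per machine)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

# Load it at the start of every R session
library(tidyverse)
```

Loading the tidyverse attaches several packages at once, including ggplot2 and dplyr.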
1.2 Basics:
- R comments start with #
- Use a plain text editor and beware of "smart quotes"
- To get help about anything use ?: for example, ?seq will show you the manual for the function seq (a function which generates sequences of numbers)
- To assign a value to a variable use the assignment operator <-
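The points above can be tried together in a few lines. This is only an illustrative sketch; the variable names x and msg are arbitrary.

```r
# Everything after '#' is a comment and is ignored by R
?seq                        # open the manual page for seq()

x <- seq(1, 10, by = 2)     # store the sequence 1, 3, 5, 7, 9 in x
x                           # typing a name prints its value

# Straight quotes work; curly "smart quotes" pasted from a word processor do not
msg <- "plain straight quotes"
msg
```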
1.3 Explore dataset:
We will use the mpg dataset, which is available as soon as the tidyverse is loaded (it ships with ggplot2).
To get more information about this dataset:
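The command itself is not shown in the text. A likely candidate, assuming ggplot2 (or the whole tidyverse) is loaded, is the help operator introduced in section 1.2:

```r
library(ggplot2)  # mpg ships with ggplot2, which the tidyverse loads
?mpg              # open the help page describing the dataset and its columns
```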
To see the first lines of a dataset:
head(mpg)
## # A tibble: 6 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
To see the structure of the data:
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: chr "audi" "audi" "audi" "audi" ...
## $ model : chr "a4" "a4" "a4" "a4" ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr "f" "f" "f" "f" ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr "p" "p" "p" "p" ...
## $ class : chr "compact" "compact" "compact" "compact" ...
You can see from the output that this is a data.frame: a rectangular structure with columns and rows (see ?data.frame if you want to learn more about data frames). As you can see, each column may store data of a different type: here int and num, i.e. numbers, and chr, i.e. strings of characters.
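If you only want the type of each column, without the preview that str() prints, one option is to apply class() to every column. This is a small sketch, assuming ggplot2 is installed; the name types is arbitrary.

```r
library(ggplot2)            # provides mpg

types <- sapply(mpg, class) # one class name per column
types
dim(mpg)                    # number of rows and columns
```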
To get the list of column names:
names(mpg)
## [1] "manufacturer" "model" "displ" "year" "cyl"
## [6] "trans" "drv" "cty" "hwy" "fl"
## [11] "class"
To get some simple summary statistics of the dataset:
summary(mpg)
## manufacturer model displ year
## Length:234 Length:234 Min. :1.600 Min. :1999
## Class :character Class :character 1st Qu.:2.400 1st Qu.:1999
## Mode :character Mode :character Median :3.300 Median :2004
## Mean :3.472 Mean :2004
## 3rd Qu.:4.600 3rd Qu.:2008
## Max. :7.000 Max. :2008
## cyl trans drv cty
## Min. :4.000 Length:234 Length:234 Min. : 9.00
## 1st Qu.:4.000 Class :character Class :character 1st Qu.:14.00
## Median :6.000 Mode :character Mode :character Median :17.00
## Mean :5.889 Mean :16.86
## 3rd Qu.:8.000 3rd Qu.:19.00
## Max. :8.000 Max. :35.00
## hwy fl class
## Min. :12.00 Length:234 Length:234
## 1st Qu.:18.00 Class :character Class :character
## Median :24.00 Mode :character Mode :character
## Mean :23.44
## 3rd Qu.:27.00
## Max. :44.00
1.4 Vectors
To create a vector, use c()
### Example:
* Numeric Vector
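The original example is not shown; a minimal sketch of a numeric vector, with arbitrary values, could look like this:

```r
# c() combines its arguments into a single vector
a <- c(1.5, 3, 4.2, 10, 7)
a            # print the vector
length(a)    # number of elements
class(a)     # "numeric"
```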
To select the i-th element of a vector a, use the notation a[i]. You can also select multiple values from the same vector by using the notation a[c(i,j)].
As you may have noticed, this means you use a second vector to store the indices of the elements you want, so the notation a[c(2,4)] could be rewritten as:
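The rewritten form is missing from the text. One way to write it, using an illustrative vector a and a hypothetical index vector named idx:

```r
a <- c(10, 20, 30, 40, 50)

a[2]          # a single element: 20
a[c(2, 4)]    # two elements at once: 20 40

# Same selection, with the indices stored in their own vector first
idx <- c(2, 4)
a[idx]        # 20 40
```

Note that R indices start at 1, not 0.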
1.5 Matrices and arrays
In a matrix, all columns must have the same type (numeric, character, etc.) and the same length. Arrays are similar to matrices but can have more than two dimensions.
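A short sketch of both structures, using arbitrary values, may help. The last lines show what happens when types are mixed: R converts everything to a single common type.

```r
# A 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
m
dim(m)       # 2 3
m[2, 3]      # row 2, column 3: 6

# An array with three dimensions (2 x 3 x 4)
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)     # 2 3 4

# Mixing numbers and strings: everything becomes character
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(mixed)  # "character"
```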
1.6 Working directory
To find the current working directory:
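The command is not shown in the text; base R provides getwd() for this, and setwd() to change it. In the sketch below, tempdir() is only a stand-in for your own project folder.

```r
getwd()                  # print the current working directory

old <- setwd(tempdir())  # move to another folder; setwd() returns the previous one
getwd()                  # now points to the temporary folder
setwd(old)               # go back to where we started
```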
1.7 Data Frame
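This section has no example in the text. As a minimal sketch, a data frame can be built by hand with data.frame(); the column names and values below are made up for illustration only.

```r
# Each argument becomes a column; all columns must have the same length
df <- data.frame(
  name = c("car_a", "car_b", "car_c"),
  cyl  = c(4L, 6L, 8L),
  hwy  = c(30.5, 26, 21.2)
)

str(df)            # structure: 3 observations of 3 variables
df$hwy             # access one column with $
df[df$cyl > 4, ]   # keep only the rows where cyl is greater than 4
```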
1.8 Plots
We can use this simple plot function to visualise the mpg dataset:
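The plotting command is missing from the text, so the choice of columns here is an assumption. One possibility with the base plot() function is a scatter plot of engine displacement against highway mileage:

```r
library(ggplot2)  # only needed here because it provides the mpg dataset

# Base R scatter plot: one point per car
plot(mpg$displ, mpg$hwy,
     xlab = "Engine displacement (litres)",
     ylab = "Highway miles per gallon")
```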
## ggplots
How can we do
1.8.1 The mpg data frame
We will reuse the mpg data set:
mpg
## # A tibble: 234 x 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manual… f 21 29 p comp…
## 3 audi a4 2 2008 4 manual… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto(a… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto(l… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manual… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto(a… f 18 27 p comp…
## 8 audi a4 quat… 1.8 1999 4 manual… 4 18 26 p comp…
## 9 audi a4 quat… 1.8 1999 4 auto(l… 4 16 25 p comp…
## 10 audi a4 quat… 2 2008 4 manual… 4 20 28 p comp…
## # … with 224 more rows
1.8.2 Creating a ggplot
To plot mpg
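The code for this step is not shown. A common first ggplot for this dataset, given here as a sketch and not necessarily the plot used in class, maps displ to the x axis and hwy to the y axis:

```r
library(ggplot2)

# ggplot() sets up the data; geom_point() adds a layer of points
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

p  # printing the object draws the plot
```

Storing the plot in a variable (here p, an arbitrary name) makes it easy to add more layers later.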
1.8.3 Adding colors:
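No example survives in the text for this step. One way to add colors, assuming we want one color per type of car, is to map the class column to the color aesthetic inside aes():

```r
library(ggplot2)

# Each value of class gets its own color, and a legend is added automatically
p_col <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

p_col
```

To give all points the same fixed color instead, put the argument outside aes(), for example geom_point(mapping = aes(x = displ, y = hwy), color = "blue").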
1.9 For those who finished everything
Can you reproduce the following plots? You will need some of the libraries listed below.
PLOT 1:
PLOT 2:
PLOT 3:
PLOT 4:
(Hint: you will need to use theme_economist_white() from ggthemes.)