---
title: "10.4 Lab 1: Principal Components Analysis"
output:
  github_document:
    md_extensions: -fancy_lists+startnum
  html_notebook:
    md_extensions: -fancy_lists+startnum
---
```{r setup, message=FALSE, warning=FALSE}
library(tidyverse)
```
```{r}
(
  us_arrests <-
    USArrests %>%
    as_tibble(rownames = "states")
)
```
```{r}
# Mean of each numeric variable
us_arrests %>%
  summarise_if(is.numeric, mean)
```
```{r}
# Variance of each numeric variable
us_arrests %>%
  summarise_if(is.numeric, var)
```
The variables differ widely in mean and variance. Without scaling, the principal components would be dominated by `Assault`, which has by far the largest variance. We therefore standardize each variable to mean 0 and standard deviation 1 when running PCA.
```{r}
# Center (the default) and scale each variable to unit variance before PCA
pr_out <- prcomp(select_if(us_arrests, is.numeric),
                 scale. = TRUE)
```
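To illustrate why scaling matters here, the following is a minimal sketch using only base R and the built-in `USArrests` data. The object names `pr_raw`, `pr_std`, and `pr_manual` are introduced for this example only. It compares PCA on the raw data with PCA on standardized data, and checks that scaling inside `prcomp()` should agree with standardizing by hand via `scale()`.

```{r}
# Sketch: PCA on raw vs. standardized USArrests (illustrative object names)
pr_raw    <- prcomp(USArrests)                 # centered, not scaled
pr_std    <- prcomp(USArrests, scale. = TRUE)  # centered and scaled
pr_manual <- prcomp(scale(USArrests))          # standardized by hand first

# Without scaling, the first loading vector is dominated by Assault
round(abs(pr_raw$rotation[, 1]), 3)

# Scaling inside prcomp() matches standardizing the data beforehand
all.equal(pr_std$sdev, pr_manual$sdev)
```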
```{r}
names(pr_out)
```
```{r}
pr_out$center
```
```{r}
pr_out$scale
```
```{r}
pr_out$rotation
```
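The columns of `rotation` are the principal component loading vectors, and the scores in `x` are the standardized data projected onto them. As a rough check, assuming only base R and the built-in `USArrests` data (the names `pr_chk` and `scores_manual` are hypothetical, used just for this sketch), the scores can be recomputed by hand:

```{r}
# Sketch: recompute the scores as (standardized data) %*% (loadings)
pr_chk <- prcomp(USArrests, scale. = TRUE)
scores_manual <- scale(USArrests) %*% pr_chk$rotation

# The hand-computed scores should match the scores stored by prcomp()
all.equal(unname(scores_manual), unname(pr_chk$x))

# The loading vectors are orthonormal: t(rotation) %*% rotation is the identity
all.equal(crossprod(pr_chk$rotation), diag(4), check.attributes = FALSE)
```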
```{r}
biplot(pr_out, scale = 0)
```
The biplot shows the loadings and scores mirrored relative to Figure 10.1. This is because principal components are unique only up to a sign change: flipping the sign of both the loadings and the scores describes the same components.
We can replicate Figure 10.1 by inverting the sign of the loadings and scores.
```{r}
pr_out$rotation <- -pr_out$rotation
pr_out$x <- -pr_out$x
biplot(pr_out, scale = 0)
```
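One way to convince yourself that the sign flip is harmless is sketched below, again assuming only base R and `USArrests`. The names `pr_a`, `rot_flipped`, `x_flipped`, `recon_orig`, and `recon_flip` are illustrative and separate from `pr_out`. Flipping both the loadings and the scores should leave the reconstructed data and the score variances unchanged.

```{r}
# Sketch: flipping the sign of loadings and scores together changes nothing
pr_a <- prcomp(USArrests, scale. = TRUE)
rot_flipped <- -pr_a$rotation
x_flipped   <- -pr_a$x

# Reconstruct the standardized data as scores %*% t(loadings)
recon_orig <- pr_a$x %*% t(pr_a$rotation)
recon_flip <- x_flipped %*% t(rot_flipped)
all.equal(recon_orig, recon_flip)

# The variance of each score vector is unaffected by the sign
all.equal(apply(pr_a$x, 2, var), apply(x_flipped, 2, var))
```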
```{r}
pr_out$sdev
```
Variance explained by each component:
```{r}
pr_var <- pr_out$sdev^2
pr_var
```
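The squared standard deviations can be cross-checked in two ways. The sketch below assumes base R and `USArrests` only, and the names `pr_e` and `eig` are illustrative. Because the data are standardized, the component variances should equal the eigenvalues of the correlation matrix, and they should also equal the sample variances of the score columns.

```{r}
# Sketch: component variances vs. eigenvalues of the correlation matrix
pr_e <- prcomp(USArrests, scale. = TRUE)
eig  <- eigen(cor(USArrests))$values

all.equal(pr_e$sdev^2, eig)

# The same values are the sample variances of the score columns
all.equal(unname(apply(pr_e$x, 2, var)), pr_e$sdev^2)

# Total variance equals the number of standardized variables (4),
# up to floating-point error
sum(pr_e$sdev^2)
```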
Proportion of variance explained:
```{r}
pve <- pr_var / sum(pr_var)
pve
```
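The same proportions are also reported by `summary()` on a `prcomp` object, rounded to five decimals. A short sketch for comparison, assuming base R and `USArrests` (the names `pr_s`, `pve_manual`, and `imp` are introduced only for this example):

```{r}
# Sketch: compare the hand-computed PVE with summary.prcomp()
pr_s <- prcomp(USArrests, scale. = TRUE)
pve_manual <- pr_s$sdev^2 / sum(pr_s$sdev^2)

imp <- summary(pr_s)$importance
imp["Proportion of Variance", ]
imp["Cumulative Proportion", ]

# summary() rounds its values, so compare with a loose tolerance
all.equal(unname(imp["Proportion of Variance", ]), pve_manual,
          tolerance = 1e-4)
```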
```{r}
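# Cumulative proportion of variance explained, by component
tibble(pc = seq_along(pve), cum_pve = cumsum(pve)) %>%
  ggplot(aes(x = pc, y = cum_pve)) +
  geom_line() +
  labs(x = "Principal Component",
       y = "Cumulative Proportion of Variance Explained")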
```
```{r}
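# Scree plot: proportion of variance explained by each component
tibble(pc = seq_along(pve), pve = pve) %>%
  ggplot(aes(x = pc, y = pve)) +
  geom_line() +
  labs(x = "Principal Component",
       y = "Proportion of Variance Explained")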
```