forked from rdpeng/RepData_PeerAssessment1
---
title: "Reproducible Research: Peer Assessment 1"
output:
  html_document:
    keep_md: true
---
## Loading and preprocessing the data
Extract the zip file and load the data, specifying the class of each column.
```{r}
unzip("./activity.zip")
activity <- read.csv("./activity.csv", na.strings = "NA",
                     colClasses = c("numeric", "Date", "integer"))
head(activity, 3)
tail(activity, 3)
```
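As a small illustration of what `colClasses` does, the sketch below reads a few made-up rows from an inline string instead of `activity.csv`. The sample values and the names `csv_text` and `sample_rows` are assumptions for this example only; the column order (steps, date, interval) is inferred from the `colClasses` argument above.

```r
# Three made-up rows in the same column layout (illustrative values only).
csv_text <- "steps,date,interval
NA,2012-10-01,0
12,2012-10-01,5
0,2012-10-02,2355"

sample_rows <- read.csv(text = csv_text, na.strings = "NA",
                        colClasses = c("numeric", "Date", "integer"))

# Each column now has the class requested in colClasses.
sapply(sample_rows, class)
```

Reading `date` as `Date` up front means later calls such as `weekdays()` work without a separate conversion step.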
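The `interval` column encodes the time of day as `HHMM` (for example, 835 means 08:35), so convert it into the *actual* number of minutes since midnight.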
```{r}
# Hours part (interval %/% 100) times 60, plus the minutes part (interval %% 100).
activity$minute <- with(activity, interval %/% 100 * 60 + interval %% 100)
head(activity, 3)
tail(activity, 3)
```
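The arithmetic can be checked on a few hand-picked values. This is a minimal sketch; `interval_to_minute` is a hypothetical helper name introduced here, not part of the analysis above.

```r
# Hypothetical helper: convert an HHMM-coded interval to minutes since midnight.
interval_to_minute <- function(interval) {
  interval %/% 100 * 60 + interval %% 100
}

interval_to_minute(c(0, 55, 100, 835, 2355))
# 0 55 60 515 1435
```

The raw codes jump from 55 to 100 at each hour boundary, whereas the converted values step evenly from 55 to 60. That even spacing is why the minute value makes a better x-axis for a time series plot.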
## What is mean total number of steps taken per day?
A histogram of the total number of steps taken each day.
```{r}
eachday <- aggregate(steps ~ date, data = activity, FUN = sum)
hist(eachday$steps)
```
**Mean** total number of steps taken per day (ignoring missing values).
```{r}
mean(eachday$steps)
```
**Median** total number of steps taken per day (ignoring missing values).
```{r}
median(eachday$steps)
```
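"Ignoring missing values" has a specific meaning with the formula interface of `aggregate()`: rows with `NA` are dropped before summing, so a day with no recorded steps disappears from the result instead of counting as zero. The sketch below shows this on made-up data; `toy_days` and its values are assumptions for illustration.

```r
# Made-up data: day 1 is entirely missing, day 2 is partly missing.
toy_days <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  steps = c(NA, NA, 10, NA)
)

# Formula interface: rows with NA are dropped first (na.action = na.omit),
# so 2012-10-01 does not appear in the result at all.
aggregate(steps ~ date, data = toy_days, FUN = sum)

# For comparison, summing with na.rm = TRUE keeps the all-NA day with a total of 0.
tapply(toy_days$steps, toy_days$date, sum, na.rm = TRUE)
```

Dropping such days keeps them from pulling the mean and median towards zero.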
## What is the average daily activity pattern?
A time series plot of the 5-minute interval (x-axis)
and the average number of steps taken, averaged across all days (y-axis).
```{r}
eachtime <- aggregate(steps ~ minute + interval, data = activity, FUN = mean)
plot(eachtime$minute, eachtime$steps, type = "l")
```
The 5-minute interval, on average across all the days in the dataset,
that contains the maximum number of steps.
```{r}
eachtime[eachtime$steps >= max(eachtime$steps), ]
```
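Selecting rows by comparing against `max()` returns every row that attains the maximum, which matters if two intervals tie. The sketch below uses made-up averages (`toy_avg` is an illustrative name) to contrast this with `which.max()`.

```r
# Made-up interval averages with a tie for the maximum.
toy_avg <- data.frame(minute = c(0, 5, 10, 15), steps = c(1.5, 7.0, 3.2, 7.0))

# Comparison against max() keeps every row that attains the maximum.
toy_avg[toy_avg$steps >= max(toy_avg$steps), ]

# which.max() returns only the first such row.
toy_avg[which.max(toy_avg$steps), ]
```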
## Imputing missing values
Total number of missing values in the dataset.
```{r}
sum(!complete.cases(activity))
```
Fill in each missing value with the median for that 5-minute interval.
I chose this strategy because it is fun to do something that is neither suggested nor forbidden.
Create a new dataset that is equal to the original dataset but with the missing data filled in.
```{r}
newdata <- activity
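# ave() returns the imputed values in the original row order; concatenating
# the per-group output of tapply() would reorder them by minute.
newdata$steps <- with(newdata, ave(steps, minute, FUN = function(y) {
    ym <- median(y, na.rm = TRUE)
    y[is.na(y)] <- ym
    y
}))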
```
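When imputing by group, the filled values must come back in the same row order as the data frame they are assigned to. The sketch below works through a tiny made-up example; `toy`, `fill_with_median` and `steps_filled` are hypothetical names used only here.

```r
# Made-up data: three days, two intervals, rows ordered by date then minute.
toy <- data.frame(
  date   = as.Date(rep(c("2012-10-01", "2012-10-02", "2012-10-03"), each = 2)),
  minute = rep(c(0, 5), times = 3),
  steps  = c(NA, 10, 4, NA, 6, 30)
)

# Hypothetical helper: replace NAs with the median of the non-missing values.
fill_with_median <- function(y) {
  y[is.na(y)] <- median(y, na.rm = TRUE)
  y
}

# ave() applies the helper within each minute group and returns the values
# in the original row order, so they can be assigned straight back.
toy$steps_filled <- ave(toy$steps, toy$minute, FUN = fill_with_median)
toy$steps_filled
# 5 10 4 20 6 30

# Concatenating per-group results instead yields group-major order
# (all of minute 0, then all of minute 5), which no longer lines up with the rows.
unname(unlist(tapply(toy$steps, toy$minute, fill_with_median)))
# 5 4 6 10 20 30
```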
A histogram of the total number of steps taken each day in the new dataset.
```{r}
eachdaynew <- aggregate(steps ~ date, data = newdata, FUN = sum)
hist(eachdaynew$steps)
```
**Mean** total number of steps taken per day in the new dataset.
```{r}
mean(eachdaynew$steps)
```
**Median** total number of steps taken per day in the new dataset.
```{r}
median(eachdaynew$steps)
```
## Are there differences in activity patterns between weekdays and weekends?
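Create a new factor variable in the dataset with two levels: "weekday" and "weekend".
The `^S` pattern matches Saturday and Sunday, so this step assumes that `weekdays()` returns English day names.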
```{r}
newdata$daytype <- as.factor(sapply(weekdays(newdata$date), function(x) {
    if (grepl("^S", x)) "weekend"
    else "weekday"
}))
```
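If the session locale is not English, one possible alternative is to classify days by their numeric weekday instead of by name. The following is a sketch under that assumption, not the method used above; `toy_dates`, `is_weekend` and `toy_daytype` are illustrative names.

```r
# 2012-10-05 to 2012-10-08 run Friday through Monday.
toy_dates <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))

# as.POSIXlt()$wday is numeric (0 = Sunday, ..., 6 = Saturday) and does not
# depend on the language of the day names.
is_weekend <- as.POSIXlt(toy_dates)$wday %in% c(0, 6)
toy_daytype <- factor(ifelse(is_weekend, "weekend", "weekday"),
                      levels = c("weekday", "weekend"))
toy_daytype
# weekday weekend weekend weekday
```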
A panel plot containing a time series plot of the 5-minute interval (x-axis)
and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).
```{r}
library(lattice)
bydaytype <- aggregate(steps ~ minute + daytype, data = newdata, FUN = mean)
xyplot(steps ~ minute | daytype, data = bydaytype, type = "l", layout = c(1, 2))
```