---
title: |-
  R Crash Course \
  Part 1 -- Fundamentals
author: "Rob Colautti"
---
<script src="_hidecode.js"></script>
# Reference
Webpages (from R markdown) available at:
https://colauttilab.github.io/RCrashCourse/1_fundamentals.html
All source code and linked files available on GitHub:
https://github.com/ColauttiLab/RCrashCourse
# 1. Introduction
## Group introductions
1. Name & department/program of everyone in your group
2. Total number of languages spoken (fluently)
3. Number of people in your group who grew up in Canada
4. After completing this course, I would like to be able to:
## Class discussion
1. How did you become fluent in a second language?
<div class="fold s o">
Study, read, listen
Try something new, fail, error correct, repeat
</div>
2. How do you become fluent in a programming language?
(Colautti Lab guide to successful programming):
<div class="fold s o">
1. Get organized
2. Set aside large blocks of time (2+ hours), __ideally__ without interruption
3. Good headphones with white noise or music (no lyrics), depending on mood:
    a. Baroque/Classical
    b. Smooth Jazz
    c. Electronic (ambient, house, lofi)
    d. <!--html_preserve--><a href="https://coffitivity.com/">Coffitivity</a><!--/html_preserve-->
4. Google: "How do I ______ in R"
5. Read other people's code carefully
6. Study, read, listen
7. Try something new, fail, error correct, repeat
8. __EMBRACE FAILURE__ -- even after 10+ years of programming experience, most of my algorithms do not work on the first try (typos, mis-specified objects, etc.)!
</div>
<br>
***
<br>
# 2. R Basics
Make comments inside your code. Very important (unless you are using R markdown or R notebooks)!
```{r}
# Use hashtags to make comments - not read by the R console
# Use other characters and blank lines to improve readability:
# -------------------------
# My first R script
# Today's Date
# -------------------------
# Add a summary description of what the script does
# This script will...
# And annotate individual parts of the script
```
## Basic Math
```{r}
10+2 # add
10-2 # subtract
10*2 # multiply
10/2 # divide
10^2 # exponent
abs(-10) # absolute value
sqrt(10-1) # square root (with subtraction)
log(10) # natural log
log10(10) # log base 10
exp(1) # e raised to a power (here e^1)
sin(pi/2) # sine function
asin(1) # inverse sine
cos(pi) # cosine
acos(-1) # inverse cosine
tan(0) # tangent
atan(0) # inverse tangent
```
## Round/Truncate
```{r}
round(pi,digits=3) # standard rounding to 3 digits
floor(pi) # round down to closest whole number
ceiling(pi) # round up to closest whole number
signif(pi,digits=2) # round to keep 2 significant digits
```
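The heading mentions truncating, which is not quite the same as `floor()` once numbers go negative. A small sketch using base R functions (the input values are arbitrary examples, not from the course data):

```{r}
floor(-2.7)                 # rounds down, toward negative infinity: -3
trunc(-2.7)                 # drops the decimal part, toward zero: -2
round(1234.567, digits=-2)  # negative digits round left of the decimal point: 1200
round(2.5)                  # exact halves go to the nearest even number: 2
```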
## Logic Operators
Note: **!** is a negation/inverse operator
```{r}
1 > 2 # greater than
1 < 2 # less than
1 <= 2 # less than or equal to
1 == 1 # equal to
1 != 1 # not equal to
1 == 2 | 1 == 1 # | means 'OR'
1 == 2 & 1 == 1 # & means 'AND'
1 == 1 & 1 == 1
```
### Protip:
Instead of `|`, you can use `%in%` with `c()` to check a large number of values:
```{r}
1 %in% c(1,2,3,4,5,6,7,8,9,10)
```
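To see why `%in%` saves typing, here is a sketch comparing it with a chain of `|` tests. The object name `query` and the values are made up for illustration:

```{r}
query <- 7                              # hypothetical value to look up
query == 1 | query == 3 | query == 7    # chained OR: TRUE
query %in% c(1, 3, 7)                   # same test with %in%: TRUE
c(2, 7, 11) %in% c(1, 3, 7)             # works on whole vectors: FALSE TRUE FALSE
```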
## Random Numbers
Generate some random numbers. Useful for modelling, testing scripts, etc.
```{r}
runif(10,min=0,max=1) # random numbers from a uniform distribution (each number equally likely to be chosen)
rnorm(10,mean=0,sd=1) # random numbers from a normal distribution
rpois(10,lambda=10) # random numbers from a Poisson distribution
rbinom(10,size=1,prob=0.5) # binomial sampling (e.g. 10 coin tosses where heads=1 tails=0)
rbinom(10,size=10,prob=0.5) # binomial repeated sampling (e.g. number of heads in 10 coin tosses, repeated 10 times)
```
Fun fact: random numbers generated by a computer come from a calculation that starts with a 'seed' number, so they are never truly random (they are 'pseudorandom'). They act random because the seed is typically taken from something unpredictable, such as the time on your computer's internal clock measured to a fraction of a second.
This is not just philosophical. It is also useful for testing and debugging, since you can set the seed yourself to generate the same 'random' numbers every time.
Compare these outputs:
```{r}
runif(5)
runif(5)
set.seed(3)
runif(5)
set.seed(3)
runif(5)
set.seed(172834782)
runif(5)
set.seed(172834782)
runif(5)
runif(5)
```
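Rather than comparing the printed numbers by eye, you can let R check that two draws match. A minimal sketch (the seed value 42 and the object names are arbitrary choices):

```{r}
set.seed(42)
draw_a <- runif(3)          # first draw after setting the seed
set.seed(42)
draw_b <- runif(3)          # reset to the same seed, draw again
draw_c <- runif(3)          # no reset, so the sequence just continues
identical(draw_a, draw_b)   # TRUE: same seed, same numbers
identical(draw_b, draw_c)   # FALSE: different position in the sequence
```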
## Combining objects
`c()` to concatenate
```{r}
c(1,2,5,"string")
```
**:** for a range of numbers
```{r}
1:10
100:90
-1:1
```
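The `:` operator always counts in steps of 1. For other step sizes, base R's `seq()` is one option; a quick sketch with arbitrary values:

```{r}
seq(0, 1, by=0.25)            # steps of 0.25: 0.00 0.25 0.50 0.75 1.00
seq(10, 100, length.out=4)    # 4 evenly spaced numbers: 10 40 70 100
rev(1:5)                      # reverse a range: 5 4 3 2 1
```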
<br>
***
<br>
# 3. Defining variables / objects
## Cells
The most basic object is a single number or string
```{r}
X<-"string"
```
#### Why no output?
When we wrote: `X<-"string"`
R created the object called **X**, so no output is produced.
A few options to see the contents of **X**:
```{r}
print(X)
paste(X)
X
```
#### What's the difference?
`print()` is the most generic; it is useful for providing feedback from running scripts
(e.g. during loops, Bash scripts, etc.)
`paste()` converts objects to a string; we'll come back to this.
Generally `print()` or `paste()` are preferred over calling the object directly.
## A Vector
A one-dimensional array of cells
Ordered from 1 to n (the length of the vector)
Items must all be of the same type. If you mix numbers and text, then the whole vector will be formatted as text.
```{r}
Xvec<-c(X,1:10,"E", "Computational Biology", 100:90)
Xvec
```
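You can check what type a vector ended up as with `class()`. The sketch below uses a made-up vector (`mixed_vec`) in which one text value forces everything to text, then converts it back:

```{r}
mixed_vec <- c(1.5, 2, "missing", 4)       # hypothetical measurements with one text entry
class(mixed_vec)                           # "character": every value is now text
cleaned_vec <- suppressWarnings(as.numeric(mixed_vec))  # text that is not a number becomes NA
cleaned_vec                                # 1.5 2.0 NA 4.0
mean(cleaned_vec, na.rm=TRUE)              # ignore the NA when averaging: 2.5
```

Without `suppressWarnings()`, R warns that NAs were introduced, which is a handy reminder to go looking for the offending values.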
> Protip: A common problem when importing data to R occurs when a column of numeric data includes at least one text value (e.g. "missing" or "< 1"). R will treat the entire column as text rather than numeric values. Watch for this when working with real data!
### Subset a vector with square brackets [loc]
Requires a number or range of numbers
```{r}
Xvec[1]
Xvec[13]
```
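A few more ways to fill in the square brackets, sketched on a small made-up vector (`demo_vec`) so the results are easy to follow:

```{r}
demo_vec <- c("a", "b", "c", "d", "e")   # hypothetical example vector
demo_vec[2:4]        # a range of positions: "b" "c" "d"
demo_vec[c(1, 5)]    # specific positions: "a" "e"
demo_vec[-1]         # negative index drops that position: "b" "c" "d" "e"
length(demo_vec)     # number of elements: 5
```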
## Matrices
A 2-D array of cells
With 1 to n rows by 1 to m columns
```{r}
Xmat<-matrix(Xvec,nrow=6)
Xmat
```
**Notice** the square brackets along the top and left side?
These show the 'address' of each element in the matrix
### Subset with square brackets [row,col]
Just like the vector, but now two numbers for a 2D array
```{r}
Xmat[1,3]
```
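Leaving one side of the comma blank selects everything along that dimension. A sketch on a small numeric matrix (`demo_mat` is a made-up example; note that `matrix()` fills column by column):

```{r}
demo_mat <- matrix(1:12, nrow=3)   # 3 rows x 4 columns, filled by column
demo_mat[2, ]                      # entire second row: 2 5 8 11
demo_mat[, 4]                      # entire fourth column: 10 11 12
demo_mat[1:2, 3:4]                 # a 2x2 block from rows 1-2, columns 3-4
dim(demo_mat)                      # 3 4
```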
## Higher-order arrays
Add as many dimensions as you need using `array()`
```{r}
Xarray<-array(0, dim=c(3,3,2)) # 3 dimensions
Xarray
```
Note how 3rd dimension is sliced to print out in 2D
<br>
Higher-order arrays are possible, but ugly to print out
```{r}
Xarray<-array(rnorm(64), dim=c(2,2,2,2,2,2)) # 6 dimensions
```
But easy to subset:
```{r}
Xarray[1:2,1:2,1,1,1,1]
Xarray[1:2,1,1,1:2,1,1]
```
Why are these numbers not the same?
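One way to explore this is with a smaller array whose values you can predict. In the sketch below (`demo_arr` is a made-up 3-D array filled with 1 to 8), the position of each index decides which dimension gets sliced:

```{r}
demo_arr <- array(1:8, dim=c(2,2,2))   # values fill dimension 1 first, then 2, then 3
demo_arr[1:2, 1:2, 1]   # vary dimensions 1 and 2, hold dimension 3 at 1: values 1 2 3 4
demo_arr[1:2, 1, 1:2]   # vary dimensions 1 and 3, hold dimension 2 at 1: values 1 2 5 6
```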
### Matrix Algebra Basics
```{r}
## Create some vectors to play with
X<-c(1:10)
X
Y<-c(1:10*0.5)
Y
## Use pretty much any standard operator for element-by-element calculations
X*Y # Multiply corresponding element (e.g. X[1]*Y[1], then X[2]*Y[2], etc)
X+Y
X/Y
X^Y
log(X)
exp(Y)
## More advanced matrix algebra
X%*%Y # Matrix multiplication (e.g. X[1]*Y[1]+X[2]*Y[2]...)
sum(X*Y) == X%*%Y
Z<-X[1:4]%o%Y[1:3] # Outer product
Z
t(Z) # Transpose
crossprod(X[1:4],Z) # Cross product, same as t(X[1:4]) %*% Z
crossprod(Z) # Cross product of t(Z) and Z a.k.a. Z'Z
diag(4) # Identity matrix, 4x4 in this case
diag(Z) # Diagonal elements of Z
```
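The difference between `*` and `%*%` is easy to mix up, so here is a small sketch on a 2x2 example (the names `mat_a` and `mat_id` are made up):

```{r}
mat_a <- matrix(1:4, nrow=2)   # columns are (1,2) and (3,4)
mat_id <- diag(2)              # 2x2 identity matrix
mat_a * mat_id                 # element-by-element: keeps only the diagonal (1 and 4)
mat_a %*% mat_id               # matrix multiplication by the identity returns mat_a
all(mat_a %*% mat_id == mat_a) # TRUE
```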
### Principal components analysis
Widely used in biology; from community ecology and metagenomics to gene expression
`prcomp()`
```{r}
prcomp(Z)
```
## Lists
A group of objects
Can include cells, vectors, and higher-order arrays
Each element can have a name
```{r}
MyList<-list(name="SWC",potpourri=Xvec,numbers=1:10)
MyList
```
### A few ways to subset a list
```{r}
MyList$numbers # Use $ to subset by name
MyList[3] # A 'slice' of MyList
MyList[[3]] # An 'extract' of MyList
```
What's the difference between [] and [[]]?
Look carefully at the output above; notice how the [] includes $numbers but the [[]] includes only the values? This is important if you want to use the slice:
```{r, error=TRUE}
2*MyList[[3]]
2*MyList[3]
```
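Checking the class of each result makes the difference concrete. A sketch on a small made-up list (`demo_list`), so it does not depend on the objects above:

```{r}
demo_list <- list(label="abc", values=1:3)   # hypothetical list
class(demo_list["values"])      # "list": single brackets return a smaller list
class(demo_list[["values"]])    # "integer": double brackets return the contents
2 * demo_list[["values"]]       # math works on the contents: 2 4 6
```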
### Protip:
Many analysis functions in R output as lists (e.g. statistical packages)
<br>
For example, the output of prcomp:
```{r}
prcomp(Z)
names(prcomp(Z))
prcomp(Z)$center
prcomp(Z)$scale
```
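In practice it is usually tidier to save the output to an object once and then pull out the pieces. A sketch on simulated data (the seed, `demo_data`, and `demo_pca` are made up for illustration):

```{r}
set.seed(1)
demo_data <- matrix(rnorm(40), nrow=10)   # 10 observations of 4 simulated variables
demo_pca <- prcomp(demo_data)             # run the analysis once, keep the result
names(demo_pca)                           # "sdev" "rotation" "center" "scale" "x"
demo_pca$sdev                             # standard deviation of each principal component
dim(demo_pca$x)                           # PC scores, one row per observation: 10 4
```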
## paste()
Versatile function for manipulating output
```{r}
paste("Hello World!") # Basic string
paste("Hello","World!") # Concatenate two strings
paste(1:10) # Paste numbers as strings
paste(1:10)[4] # Note that each number is a separate cell in a vector of strings
as.numeric(paste(1:10)) # Convert back to numbers
paste(1:10,collapse=".") # Collapse separate cells to produce a single string
```
Note what happens if we combine objects of different length:
```{r}
paste("Hello",1:10,sep="-") # Note that the shorter object ("Hello") is repeated to match the longer one
```
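The `sep` and `collapse` arguments do different jobs, which this short sketch tries to show (the "Sample" labels are made up):

```{r}
paste("Sample", 1:3, sep="_")                  # sep joins the pieces of each element: "Sample_1" "Sample_2" "Sample_3"
paste0("Sample", 1:3)                          # paste0() is paste() with no separator: "Sample1" "Sample2" "Sample3"
paste("Sample", 1:3, sep="_", collapse=", ")   # collapse joins the elements into one string
```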
## ? for HELP
Provides detailed information on functions
and their important parameters
```{r eval=F}
?paste
```
<br>
***
<br>
# 4. Working with data
## Working Directory
Set your working directory
```{r, eval=F}
setwd("C:/Users/rob_c/Documents")
```
Check current working directory
```{r, eval=F}
getwd()
```
## Import data
Download [this dataset](./FallopiaData.csv) and save in your current working directory -- for example your project folder or whatever directory is shown when you run the command `getwd()`
Import data from .csv file into an object called 'MyData'
```{r}
MyData<-read.csv("FallopiaData.csv",header=T) # header=T tells read.csv to interpret first row as column labels
```
**Important**: objects created by `read.csv` and other `read.*` functions (e.g. `read.table`, `read.delim`) are special objects called **`data.frame`** objects.
## **`data.frame`**
A **`data.frame`** is a special type of 2D table, similar to a matrix except that each column can hold a different type of data, with additional indexing information for rows/columns of data
This format is partly why R is so useful for data analysis
<br>
Inspecting your **`data.frame`** object
```{r}
names(MyData) # See column names
head(MyData) # Show first six rows of data
tail(MyData) # Show last six rows of data
dim(MyData) # Number of rows x columns (or 'dimension') of the data object
nrow(MyData) # Number of rows only
ncol(MyData) # Number of columns only
str(MyData) # Data 'structure' - types of variables
```
### Protip:
`str()` is very important for functions that use data.frames including statistical analysis and plotting
Pay careful attention to 'int' vs 'num' vs 'factor'
#### Example:
In ANOVA, you want 'factor' as a predictor
In regression, you want 'int' or 'num' as a predictor
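If a column is not the type you need, you can convert it. A sketch on a small made-up data.frame (`demo_trt` and its columns are hypothetical, not from FallopiaData.csv):

```{r}
demo_trt <- data.frame(Treatment=c("low","high","low","high"),
                       Biomass=c(1.2, 3.4, 1.5, 3.1))
str(demo_trt)                                       # check the type of each column
demo_trt$Treatment <- as.factor(demo_trt$Treatment) # convert text to a factor
levels(demo_trt$Treatment)                          # levels are sorted alphabetically: "high" "low"
is.numeric(demo_trt$Biomass)                        # TRUE
```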
## Subset data
```{r}
MyData[1,] # Returns first row of data.frame
MyData[,1] # Returns first column of data.frame
MyData[,"PotNum"] # Returns values in "PotNum" column
MyData$PotNum # Another way to get the same output
subset(MyData,Scenario=="extreme") # Subset data where the Scenario column == 'extreme'
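levels(factor(MyData$Scenario)) # List the categories; factor() is needed when the column is read as text (the default since R 4.0)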
```
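Square brackets also accept logical tests, which gives the same kind of result as `subset()`. A sketch on a made-up data.frame (`demo_pots` stands in for something like MyData; its values are invented):

```{r}
demo_pots <- data.frame(Pot=1:6,
                        Scenario=c("low","extreme","low","extreme","high","low"),
                        Mass=c(1.1, 2.5, 0.9, 1.8, 3.0, 1.2))
demo_pots[demo_pots$Scenario=="extreme", ]                  # logical test in the row position
subset(demo_pots, Scenario=="extreme" & Mass > 2)           # combine conditions with &
demo_pots[demo_pots$Scenario %in% c("low","high"), "Mass"]  # rows by condition, one column by name
```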
## Adding calculations
e.g. add a new column that is the sum of others
```{r}
MyData$Total<-MyData$Symphytum+MyData$Silene+MyData$Urtica+MyData$Geranium+MyData$Geum+MyData$Fallopia
names(MyData)
print(MyData$Total)
```
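Typing out every column gets tedious when there are many of them. One alternative is base R's `rowSums()`, sketched here on a made-up data.frame (`demo_sp` and its column names are hypothetical):

```{r}
demo_sp <- data.frame(SpeciesA=c(1,2,3), SpeciesB=c(10,20,30), SpeciesC=c(100,200,300))
sum_cols <- c("SpeciesA","SpeciesB","SpeciesC")   # names of the columns to add up
demo_sp$Total <- rowSums(demo_sp[, sum_cols])     # one sum per row
demo_sp$Total                                     # 111 222 333
```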
<br>
***
<br>
# 5. Summary statistics
Make a vector containing unique values
```{r}
unique(MyData$Nutrients)
```
Find duplicated values
```{r}
duplicated(MyData$Nutrients)
```
The `duplicated` function returns a logical vector of TRUE (duplicated) and FALSE (not duplicated). A logical vector can be used to subset data
```{r}
MyData$Nutrients[duplicated(MyData$Nutrients)]
```
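The same idea on a tiny made-up vector (`demo_dup`), where it is easier to see which values get flagged:

```{r}
demo_dup <- c("a", "b", "a", "c", "b")
duplicated(demo_dup)             # FALSE FALSE TRUE FALSE TRUE (only repeats are flagged)
demo_dup[duplicated(demo_dup)]   # keep the repeats: "a" "b"
demo_dup[!duplicated(demo_dup)]  # ! flips the test, giving the same result as unique(): "a" "b" "c"
```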
Calculate means of Total column for each level of Nutrients column
```{r}
aggregate(MyData$Total,list(MyData$Nutrients),mean)
```
OR to preserve column names:
```{r}
aggregate(Total~Nutrients,data=MyData,mean)
```
You can also use this to calculate means across different grouping variables
```{r}
aggregate(Total~Nutrients*Taxon*Scenario,data=MyData,mean)
```
Calculate standard deviations
```{r}
aggregate(Total~Nutrients,data=MyData,sd)
```
`tapply` is like `aggregate`, but note the difference in output
```{r}
tapply(MyData$Total,list(MyData$Nutrients),mean) # calculate means
tapply(MyData$Total,list(MyData$Nutrients),sd) # calculate standard deviation
```
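To make the difference in output explicit, this sketch runs both functions on a small made-up data.frame (`demo_grp`) and checks what kind of object comes back:

```{r}
demo_grp <- data.frame(Group=c("a","a","b","b"), Value=c(1, 3, 10, 20))
agg_out <- aggregate(Value ~ Group, data=demo_grp, mean)   # returns a data.frame
tap_out <- tapply(demo_grp$Value, demo_grp$Group, mean)    # returns a named array
is.data.frame(agg_out)   # TRUE
agg_out$Value            # means as a column: 2 15
tap_out[["b"]]           # look up one group by name: 15
```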
<br>
***
<br>
# 6. Save output
## Saving Data:
```{r}
## Calculate means
NutrientMeans<-tapply(MyData$Total,list(MyData$Nutrients),mean)
## Save means as .csv file
write.csv(NutrientMeans,"MyData_Nutrient_Means.csv")
```
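A quick way to check a saved file is to read it back in. This sketch writes a made-up data.frame (`demo_means`) to a temporary file so it does not clutter your working directory; `row.names=FALSE` leaves out the extra column of row numbers:

```{r}
demo_means <- data.frame(Nutrients=c("low","high"), MeanTotal=c(1.5, 2.5))  # hypothetical values
demo_file <- tempfile(fileext=".csv")             # temporary file path
write.csv(demo_means, demo_file, row.names=FALSE) # save without row names
demo_reload <- read.csv(demo_file)                # read it back in
names(demo_reload)                                # "Nutrients" "MeanTotal"
demo_reload$MeanTotal                             # 1.5 2.5
```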
<br>
***
<br>
# 7. Flow control
## Brief examples
Make up a couple of objects to play with
```{r}
X<-21
Xvec<-c(1:10,"string")
```
### **`if(){}`**
```{r}
if(X > 100){
  print("X > 100")
} else {
  print("X <= 100")
}
```
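You can chain more than two outcomes with `else if`. The sketch below uses made-up names (`demo_score`, `demo_size`) and also shows `ifelse()`, which tests every element of a vector rather than a single value:

```{r}
demo_score <- 21                 # hypothetical value
if (demo_score > 100) {
  demo_size <- "large"
} else if (demo_score > 10) {    # only checked when the first test is FALSE
  demo_size <- "medium"
} else {
  demo_size <- "small"
}
demo_size                        # "medium"
ifelse(c(5, 50, 500) > 100, "large", "not large")   # one answer per element
```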
### **`for(){}`**
```{r}
# Loop through numbers from 1 to X
for (i in 1:X){
  print(paste(X,i,sep=":"))
}
# Loop through elements of a vector directly
for (i in Xvec){
  print(i)
}
# Use an index to loop through the elements
for (i in 1:length(Xvec)){
  print(Xvec[i])
}
```
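Loops are often used to store results rather than print them. A sketch with made-up names (`demo_vals`, `demo_sq`), which also shows why `seq_along()` can be a safer index than `1:length()` when a vector might be empty:

```{r}
demo_vals <- c(2, 4, 6)
demo_sq <- numeric(length(demo_vals))   # make an empty results vector first
for (k in seq_along(demo_vals)){
  demo_sq[k] <- demo_vals[k]^2          # store each result by position
}
demo_sq                                 # 4 16 36
demo_empty <- c()                       # a vector with no elements
1:length(demo_empty)                    # 1 0 (a loop over this would run twice!)
seq_along(demo_empty)                   # integer(0) (a loop over this would not run)
```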
Counter variables can be very useful to help keep track of loops:
```{r}
count1<-1
count10<-1
for(i in 1:10){
  print(count1)
  print(count10)
  count1<-count1+1
  count10<-count10*10
}
```
Just think carefully about whether you want to update at the beginning of the loop, at the end, or somewhere in between.
```{r}
countbefore<-1
countafter<-1
for(i in 1:10){
  countbefore<-countbefore+1
  print(countbefore)
  print(countafter)
  countafter<-countafter+1
}
```
### **`while(){}`**
```{r}
count<-0
while (count < X){
  print(count)
  count<-count+1
}
```
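A `while` loop is handy when you do not know in advance how many repeats you need. In this sketch (the names and the threshold of 1000 are made up), a value is doubled until it passes the threshold, and a counter keeps track of the number of steps:

```{r}
demo_value <- 1
demo_steps <- 0
while (demo_value < 1000){       # keep going until the value reaches 1000 or more
  demo_value <- demo_value*2     # double the value
  demo_steps <- demo_steps+1     # count the repeats
}
demo_steps                       # 10
demo_value                       # 1024
```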
# 8. Test yourself
Are you ready to test your knowledge?
If so, click [HERE](./1_fundamentals_test.html)