Create an index, style guide, handle remaining bugs #238

trevorcampbell · 2021-09-22T19:06:58Z

This PR is not done yet.

I've finished everything listed below except the index (waiting for wrangling PR).

To address:

Index

I made one! This required some fixes to get the book to compile to PDF:

Fixes to make the book compile to PDF

need to add figure units to out.width
- done: added pt
the gentoo penguin image in clustering chapter breaks things with percent signs in its name
- done: i stored a local copy of it instead
clustering two-column "center update / label update" spaces use unicode (non-latex) characters
- done: no-op. xelatex handles unicode characters.
img/population_vs_sample.svg -- svgs break pdf output
- done: i rendered this to png instead
unicode characters in the jupyter section
- done: no-op. xelatex handles unicode characters.
duplicate entry in refs.bib (wilson2014best)
- done: removed old entry
Lots of fig.retina = ... in inference.Rmd without actually setting out.width (causes LaTeX to generate width=468px, which breaks xelatex because the unit should be pt)
- done: either set width explicitly or removed setting retina
Fixed @online entries in references.bib to be @misc
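
For anyone doing the later formatting pass, here is a minimal sketch of how unit-less out.width values could be located automatically. It is not part of this PR: the function name `find_unitless_out_width`, the regular expression, and the simplified chunk headers in the sample are all assumptions made for illustration, and real chunk headers may be written differently.

```python
import re

# Assumption: out.width is written as a bare number or a quoted number,
# e.g. out.width = 400 or out.width = "400". Values with a unit such as
# "400pt" or "100%" are deliberately not matched.
UNITLESS_OUT_WIDTH = re.compile(
    r"""out\.width\s*=\s*["']?(\d+(?:\.\d+)?)["']?\s*(?=[,}\s]|$)"""
)


def find_unitless_out_width(rmd_text):
    """Return (line_number, value) pairs where out.width has no unit."""
    hits = []
    for line_number, line in enumerate(rmd_text.splitlines(), start=1):
        match = UNITLESS_OUT_WIDTH.search(line)
        if match:
            hits.append((line_number, match.group(1)))
    return hits


# Hypothetical, simplified chunk headers (not copied from the book).
sample = "\n".join([
    '{r penguins, out.width = "400"}',
    '{r sample, out.width = "400pt", fig.retina = 2}',
    '{r wide, out.width = "100%"}',
])
print(find_unitless_out_width(sample))  # [(1, '400')]
```

Only the first header is reported, since the other two already carry a unit.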

This work closes #120
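
The duplicate refs.bib entry in the list above was found and removed by hand. As a rough sketch only, a check like the following could flag repeated citation keys before a PDF build. The helper name `duplicate_bib_keys`, the regular expression, and the sample entries (apart from the key wilson2014best, which comes from this PR) are assumptions for illustration.

```python
import re
from collections import Counter

# Assumption: each entry starts with @type{key, and the key is the text
# between the opening brace and the first comma.
ENTRY_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def duplicate_bib_keys(bib_text):
    """Return citation keys that appear more than once, sorted."""
    counts = Counter(ENTRY_KEY.findall(bib_text))
    return sorted(key for key, n in counts.items() if n > 1)


# Placeholder entries; the field contents are made up for the example.
sample = """
@article{wilson2014best, title = {placeholder}}
@misc{examplekey2021, title = {placeholder}}
@article{wilson2014best, title = {placeholder, older copy}}
"""
print(duplicate_bib_keys(sample))  # ['wilson2014best']
```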

Style Guide

I created one. It's in README.md. I just made executive decisions on everything in the interest of moving this along, but feel free to raise anything I did there as a discussion item. Also feel free to move the style guide into its own .md file and link from README.md if desired.

The work on the style guide closes #240 -- but note that we still need to do the actual formatting pass to fix it to conform to the style.

Clustering

Fixed the two examples of "failures" in clustering (bad init, bump in elbow plot).
This closes #236

Videos in chapter 2

No-op. These videos were removed (I assume it was the jupyter one). The only videos left are in the additional resources at the end, and those aren't about previewing data in jupyter. The jupyter chapter no longer has any videos.

I've added to the style guide a mention of making sure video links are usable in both pdf and html.
This work closes #189

Database in Chapter 2

Made it clearer that the DB host/username/password info that we use in ch2 is fake. This closes #187

Feedback on multicollinearity

The first paragraph uses terms we have not used in the book before (outliers and collinear predictors) without defining them. Could we roughly hint at what they are here, or say that we will explain them, and their impact below?

agree, done

In the outlier section, should we discuss the simple solution of dropping outliers? Why this can work sometimes, and why sometimes this should be avoided and other advanced methods (like median or quantile regression) should be used instead?

Nope - I don't think we should go into it in this text. I don't think it's possible to convey it in such a way that it won't cause students to go "oh well I don't like this datapoint, it looks wrong to me" and then just arbitrarily throw out data.

What do unstable coefficients mean, we should explain this term.

I agree, unstable is a bit too technical -- replaced with "unreliable" + explanation instead!

We say "Therefore, when performing multivariate linear regression, it is important to avoid including very linearly related predictors." but this raises the question: how do we avoid this? Could we suggest or show some simple ways of looking at this? And then what to do if they see this? For example, in the case of the two measurements, you could drop one, or you could create a new feature from these two where you average them?

Definitely out of scope. I just point the reader to the end of the chapter.

At the end of the multicollinearity section, can we point to resources where they can read more about this in case they are interested or have to deal with this in a more complex case than we present?

See above.

At the end of the outlier section, can we point to resources where they can read more about this in case they are interested or have to deal with this?

agreed, but not worth the effort before CRC subm

This work closes #195

Feedback on predictor design

We write "For example, a dataframe df with two variables—x and y—with a nonlinear relationship between the two variables will not be correctly captured by simple linear regression, as shown in Figure 9.11. " but is "correct" the right word? Maybe "fully" might be better?

agree, done

Can we make it clear that we drop X, when we add Z to the model?

I don't think we need to -- the plot axes are clear that we're plotting z now

Should we use capitals to refer to X, Y and Z here?

I don't see any reason to - lowercase seems fine

what is df, we never define it...

it was defined in the text, but I added a quick printout of df at the beginning

At the end of this section, can we point to resources where they can read more about this in case they are interested or have to deal with this in a more complex case than we present?

I agree, but not worth the effort prior to CRC subm.

This work closes #196

`set.seed`

seed/randomness was previously first introduced at the end of classification 1 for a single off-the-cuff usage in picking 3 observations to create imbalanced data; I thought it was a bit hidden and not clearly described
I removed that usage of randomness
now the first introduction of randomness is in classification2, where we can justify it clearly saying "OK now we need randomness since we're splitting data"
I have now devoted a whole section to randomness and seeds
it's now clear that you only set.seed once at the beginning of your analysis
I've hidden everywhere in the textbook where we "seed hack" to get results we want (for pedagogical reasons) and just put a seed at the start of the chapter when we load tidyverse, reminding the reader why we do this
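
To illustrate the "set the seed once" point for reviewers, here is a small sketch. It uses Python's random module as a stand-in for R's set.seed, so it is an analogy rather than the code in the book; the seed value 2021 and the helper `draw_sample` are arbitrary choices for the example.

```python
import random


def draw_sample(n):
    """Draw n pseudo-random integers between 1 and 100."""
    return [random.randint(1, 100) for _ in range(n)]


# Seed once at the start of the analysis, then draw as often as needed.
random.seed(2021)
first_run = [draw_sample(3), draw_sample(3)]

# Re-running the whole analysis from the top reproduces every draw in order.
random.seed(2021)
second_run = [draw_sample(3), draw_sample(3)]
print(first_run == second_run)  # True

# Pitfall: re-seeding before each draw makes the "random" draws identical.
random.seed(2021)
a = draw_sample(3)
random.seed(2021)
b = draw_sample(3)
print(a == b)  # True
```

The second comparison is the reason for seeding only once: resetting the seed before every random step repeats the same numbers instead of producing fresh ones.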

This work closes #168

Miscellaneous

fixed y axis labels in inference chapter
fixed typos throughout
cleaned the appendix a bit
fixed weird learning objective spacing in chapter 2
fixed super long lines in ch2; 80ch line limit
moved faithful_plot.* images to img/ folder
fixed bad sample in the first example (small_sacramento) in regression1 (there was a point lying basically right on top of the dashed line)
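
The 80-character limit from the list above was applied by hand in ch2. A minimal sketch of a check that could support the remaining formatting pass follows; the function name `overlong_lines` is hypothetical, and the sketch makes no exception for long URLs or code output, which the style guide may treat differently.

```python
def overlong_lines(text, limit=80):
    """Return (line_number, length) for each line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# Three lines: a short one, one of 81 characters, and one of exactly 80.
sample = "short line\n" + "x" * 81 + "\n" + "y" * 80
print(overlong_lines(sample))  # [(2, 81)]
```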

leem44

@trevorcampbell Thank you for putting all of this together!! Looks great!

The page numbers in the index seem to be off by 2. E.g. "data science": the index says it's on page 5 but really it's on page 7
I guess we'll need to do another pass and look at all the plots and code that fall off page, and also the plots that are in weird places (e.g. in the middle of text etc)
nice addition of the set.seed explanation!
a bunch of the figure references display "Figure ??" -- I started pointing them out and then realized that they are in a lot of places so likely this will need to be a part of the formatting pass

intro.Rmd

inference.Rmd

version-control.Rmd

trevorcampbell added 6 commits September 22, 2021 11:09

added basic makeindex commands

eed7eaa

added pdf build scripts

fe2082f

make pdfbuild executable

f8e8e2f

made pdfbuildR script executable

7984533

added gentoo image locally; fixed always allow html in preamble

440a8e4

added local link to gentoo img

9e669f8

trevorcampbell changed the title ~~Makeindex~~ Create an index Sep 22, 2021

trevorcampbell added 6 commits September 22, 2021 12:28

added pdfbuild instructions to readme

6d57299

removed svg figure from inference

501749e

added pt measurement to out.widths

c47bb36

fixed refs bib to work with pdf output; fixed out.widths and fig.reti…

186ee5e

…nas for pdf output in inferencermd

fixed bib entry for tidy data

4d416e7

style guide template added to readme

fbc147d

trevorcampbell changed the title ~~Create an index~~ Create an index, formatting/style guide Sep 22, 2021

trevorcampbell added 14 commits September 22, 2021 14:58

created index entries for intro, indexrmd

b4abf3a

added 80ch line limit to readme style

898b780

added index entries to intro

277d829

Added index entries to reading

3820eac

added index to viz; fixed error in viz

a4bbebb

Added index to vsn-ctl, minor typo fix

7cfcdee

added set seed style to readme

e9b5eb9

added index entries to setup

46bbd43

typo index fix

7732220

Added index entries to jupyter

fc5fba6

added index entries to clustering,inference

03349af

minor appendix edit; added index entries

89bc487

started adding index entries to regression; waiting for classificatio…

a050172

…n and wrangling first

filling out the style guide

37e2e8a

trevorcampbell mentioned this pull request Sep 23, 2021

Fix/improve index #120

Closed

added more points/issues in style guide

3f2a9ce

trevorcampbell added 9 commits September 24, 2021 00:38

removed seed from class1; added seed section to class 2

fdd2f09

hide seeds in classification 2

609d4b9

hidden seeds in regress 1

4ab7f45

hide seeds in reg2; added fake load packages

f9d7a69

added fake data load reg2

e36ee1a

hidden seeds in clustering

927d82d

hidden seeds in inference

d54bb7a

moved faithful images to img folder

5871db3

moved faithful figs to img/

9a36055

trevorcampbell removed a link to an issue Sep 24, 2021

videos in chapter 2 #189

Closed

trevorcampbell added 2 commits September 24, 2021 10:54

fixed small sacramento example seed

b61e7ff

Fixed index seed on hidden seeds

fc82cc1

trevorcampbell linked an issue Sep 24, 2021 that may be closed by this pull request

videos in chapter 2 #189

Closed

trevorcampbell added 5 commits September 24, 2021 11:07

added style guide about URLs

d58a3f7

added more counterexamples to style guide

4d56b32

indices in classification and regression

42037b3

added indices to regression

a6fab1f

advice about passwords in README

8eff9e9

leem44 reviewed Sep 27, 2021

Merge branch 'dev' into makeindex

2cbd536

trevorcampbell mentioned this pull request Sep 27, 2021

Index page numbers wrong #244

Closed

trevorcampbell added 7 commits September 27, 2021 10:20

added more style guide items

4625cf0

removed duplicated new term bolding

49ef107

more style guide items

fe13389

fixed figure units in wrangling

4c8eade

added working dir index term to version ctl

9b8ce27

predictive to predictive question (and similar change elsewhere)

de2e0b8

predictive to predictive ques

b18985d

trevorcampbell merged commit 4eff8b4 into dev Sep 27, 2021

trevorcampbell deleted the makeindex branch December 1, 2021 17:35
