Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

finished assignment 2 #8

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
150 changes: 125 additions & 25 deletions 1_Download_Visualize.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -12,13 +12,9 @@ library(GGally)
knitr::opts_chunk$set(echo = TRUE)
```


# Data Acquisition

For this assignment we are going to be playing with annually
aggregated metrics of USGS flow data from the [CAMELS](https://ral.ucar.edu/solutions/products/camels) dataset. This dataset
has sparked a revolution in machine learning in hydrology.

For this assignment we are going to be playing with annually aggregated metrics of USGS flow data from the [CAMELS](https://ral.ucar.edu/solutions/products/camels) dataset. This dataset has sparked a revolution in machine learning in hydrology.

```{r}
if(!file.exists('data')){
Expand Down Expand Up @@ -53,17 +49,19 @@ download.file('https://gdex.ucar.edu/dataset/camels/file/camels_vege.txt',
download.file('https://gdex.ucar.edu/dataset/camels/file/camels_hydro.txt',
'data/hydro.txt')

#geol (response variable)
download.file('https://gdex.ucar.edu/dataset/camels/file/camels_geol.txt',
'data/geol.txt')


# Variable definitions
download.file('https://gdex.ucar.edu/dataset/camels/file/camels_attributes_v2.0.pdf',
'data/meta.pdf')

```


## Data org


```{r}
dat_files <- list.files('data',
full.names = T)
Expand All @@ -73,55 +71,48 @@ dat_files <- list.files('data',
climate <- read_delim(dat_files[1], delim = ';')

hydro <- read_delim('data/hydro.txt', delim = ';')

geol <- read_delim('data/geol.txt', delim = ';')
```

## Initial data viz


### Baseflow

### Baseflow

```{r}

ggplot(hydro, aes(x = baseflow_index,
y = q95)) +
geom_point()


```


Baseflow doesn't strongly control Q95 in a predictable way.


Baseflow doesn't strongly control Q95 in a predictable way.

### Climate controls


```{r}

cq <- inner_join(climate, hydro %>%
select(gauge_id, q95))



ggplot(cq, aes(x = p_mean, y = q95)) +
geom_point() +
geom_smooth(method = 'lm', se = F)

p_mean_mod <- lm(q95 ~ p_mean, data = cq)

```


#### All at once

```{r}

png(filename = 'bigclimeplot.png', width = 10, height = 8, units = 'in', res = 300)

cq %>%
select_if(is.numeric) %>%
ggpairs()

dev.off()


Expand All @@ -138,17 +129,126 @@ ggplot(long_cq, aes(value,
scales = 'free')
```

The average precip (p_mean) controls 71% of the variation in 195, where every 1 mm/day increase in long-term average precip increases the q95 by 2.95 mm/day.

The average precip (p_mean) controls 71% of the variation in 195, where every 1 mm/day increase in long-term average precip increases the q95 by 2.95 mm/day.

# Assignment

## What are three controls on average runoff ratio?
## What are three controls on average runoff ratio?

```{r}
cq_rr <- inner_join(climate, hydro %>%
select(gauge_id, runoff_ratio))
```

```{r}
png(filename = 'bigrrplot.png', width = 10, height = 8, units = 'in', res = 300)

cq_rr %>%
select_if(is.numeric) %>%
ggpairs()

dev.off()
```

```{r}
#plot relationships
ggplot(cq_rr, aes(x=p_mean, y = runoff_ratio)) + geom_point() + geom_smooth(method = "lm", se = F)

ggplot(cq_rr, aes(x=high_prec_freq, y = runoff_ratio)) + geom_point() + geom_smooth(method = "lm", se = F)

## What are three controls on baseflow_index?
ggplot(cq_rr, aes(x=low_prec_freq, y = runoff_ratio)) + geom_point() + geom_smooth(method = "lm", se = F)
```

```{r}
#model relationships to get summarys

## What are three controls on mean flow?
p_mean_rr_mod <- lm(runoff_ratio~p_mean, data = cq_rr)
summary(p_mean_rr_mod)

high_p_rr_mod <- lm(runoff_ratio~high_prec_freq, data = cq_rr)
summary(high_p_rr_mod)

low_p_rr_mod <- lm(runoff_ratio~low_prec_freq, data = cq_rr)
summary(low_p_rr_mod)
##repeat this for the other two! Then write a description of what we found
```

Three controls on average runoff ratio include p_mean, high precipitation frequency, and low precipitation frequency. All three variables have a highly significant p-value in relationship to runoff ratio.

## What are three controls on baseflow_index?

```{r}
cq_bf <- inner_join(climate,hydro %>%
select(gauge_id, baseflow_index))
```

```{r}
png(filename = 'bigbfplot.png', width = 10, height = 8, units = 'in', res = 300)

cq_bf %>%
select_if(is.numeric) %>%
ggpairs()

dev.off()
```

```{r}
#plot relationships
ggplot(cq_bf, aes(x=frac_snow, y = baseflow_index)) + geom_point() + geom_smooth(method = "lm", se = F)

ggplot(cq_bf, aes(x=high_prec_freq, y = baseflow_index)) + geom_point() + geom_smooth(method = "lm", se = F)

ggplot(cq_bf, aes(x=low_prec_freq, y = baseflow_index)) + geom_point() + geom_smooth(method = "lm", se = F)
```

```{r}
frac_snow_bf_mod <- lm(baseflow_index~frac_snow, data = cq_bf)
summary(frac_snow_bf_mod)

high_prec_freq_bf_mod <- lm(baseflow_index~high_prec_freq, data = cq_bf)
summary(high_prec_freq_bf_mod)

low_prec_freq_bf_mod <- lm(baseflow_index~low_prec_freq, data = cq_bf)
summary(low_prec_freq_bf_mod)
```

Three controls on average runoff ratio include fraction of precip falling as snow, high precipitation frequency, and low precipitation frequency. All three variables have a highly significant p-value in relationship to runoff ratio.

## What are three controls on mean flow?

```{r}
cq_flow <- inner_join(climate,hydro %>%
select(gauge_id, q_mean))
```

```{r}
png(filename = 'bigmeanflowplot.png', width = 10, height = 8, units = 'in', res = 300)

cq_flow %>%
select_if(is.numeric) %>%
ggpairs()

dev.off()
```

```{r}
#plot relationships
ggplot(cq_flow, aes(x=p_mean, y = q_mean)) + geom_point() + geom_smooth(method = "lm", se = F)

ggplot(cq_flow, aes(x=aridity, y = q_mean)) + geom_point() + geom_smooth(method = "lm", se = F)

ggplot(cq_flow, aes(x=low_prec_dur, y = q_mean)) + geom_point() + geom_smooth(method = "lm", se = F)
```

```{r}
p_mean_q_mod <- lm(q_mean~p_mean, data = cq_flow)
summary(p_mean_q_mod)

aridity_q_mod <- lm(q_mean~aridity, data = cq_flow)
summary(aridity_q_mod)

low_prec_dur_q_mod <- lm(q_mean~low_prec_dur, data = cq_flow)
summary(low_prec_dur_q_mod)
```

Three controls on average runoff ratio include fraction of mean precipitation, aridity, and low precipitation duration. All three variables have a highly significant p-value in relationship to runoff ratio.
Loading