Merge pull request #262 from 4DModeller/Iss253/data_loading
Add a data loading tutorial
mnky9800n authored Nov 20, 2023
2 parents 1fb71c8 + e368c7b commit 77e0dee
Showing 1 changed file with 192 additions and 0 deletions.
192 changes: 192 additions & 0 deletions vignettes/data_loading.Rmd
@@ -0,0 +1,192 @@
---
title: "Data loading"
output:
  bookdown::html_document2:
    base_format: rmarkdown::html_vignette
    fig_caption: yes
link-citations: yes
vignette: >
  %\VignetteIndexEntry{Data loading}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

In this tutorial, we'll show how to import netCDF and raster data into R and prepare them for modelling with the `fdmr` package.

# Import netCDF data and prepare for fdmr
To begin, we'll demonstrate how to import netCDF data into R. We first create a netCDF file that stores temperature values on a regular longitude-latitude grid at 5 time points.

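```{r createNetCDF}
library(ncdf4)

filename <- "temp.nc"

# Grid-cell centres of a regular 10-degree global grid
xvals <- seq(-177.5, 177.5, 10)
yvals <- seq(-87.5, 87.5, 10)
nx <- length(xvals)
ny <- length(yvals)

# Define the three dimensions; time is the unlimited (record) dimension
lon <- ncdim_def("longitude", "degrees_east", xvals)
lat <- ncdim_def("latitude", "degrees_north", yvals)
time <- ncdim_def("Time", "months", 1:5, unlim = TRUE)

# Define the temperature variable on the (longitude, latitude, time) grid
var_temp <- ncvar_def("temperature", "celsius",
  list(lon, lat, time),
  longname = "value"
)

# Create the file and fill the variable with random values between 0 and 1
ncnew <- nc_create(filename, list(var_temp))
data <- runif(nx * ny * 5, 0, 1)
ncvar_put(
  nc = ncnew,
  varid = var_temp,
  data,
  start = c(1, 1, 1), count = c(nx, ny, 5)
)
```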

`ncnew` is a handle to the netCDF file and has class `ncdf4`. The file contains one variable (temperature) and three dimensions (longitude, latitude and time).

```{r}
class(ncnew)
print(paste("The file has", ncnew$nvars,"variables"))
print(paste("The file has", ncnew$ndim,"dimensions"))
```

Printing `ncnew` gives a summary of the file:
```{r}
print(ncnew)
```

```{r closenetCDF}
nc_close(ncnew)
```

Now that the netCDF file `temp.nc` has been created and closed, we can open it again and read back the values we stored.

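```{r opennetCDF}
ncnew <- nc_open("temp.nc")
# Time coordinate and number of time points
time <- ncvar_get(ncnew, "Time")
nt <- length(time)
# Flatten the (longitude, latitude, time) array into a matrix with
# one row per grid cell and one column per time point
tmp_vec_long <- as.vector(ncvar_get(ncnew, "temperature"))
tmp_mat <- matrix(tmp_vec_long, ncol = nt)
# Coordinates of every grid cell, with longitude varying fastest
lon <- ncvar_get(ncnew, "longitude")
lat <- ncvar_get(ncnew, "latitude")
lonlat <- as.matrix(expand.grid(lon, lat))
# Release the file handle once everything has been read
nc_close(ncnew)
```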
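The pairing of coordinates and values relies on `expand.grid()` varying its first argument fastest, which matches the order in which `as.vector()` flattens an array whose first dimension is longitude. The following is a minimal sketch of that check on a made-up toy grid; the `toy_` names and the encoded values are illustrative assumptions, not part of the tutorial data.

```r
# Toy grid: 3 longitudes, 2 latitudes, 2 time points (all values made up).
toy_lon <- c(-10, 0, 10)
toy_lat <- c(40, 50)
toy_nt <- 2

# Encode each cell as 100 * lon index + 10 * lat index + time index,
# so the origin of every value can be read off directly.
toy_arr <- array(NA_real_, dim = c(length(toy_lon), length(toy_lat), toy_nt))
for (i in seq_along(toy_lon)) {
  for (j in seq_along(toy_lat)) {
    for (k in seq_len(toy_nt)) {
      toy_arr[i, j, k] <- 100 * i + 10 * j + k
    }
  }
}

# Same flattening as in the tutorial: one column per time point.
toy_mat <- matrix(as.vector(toy_arr), ncol = toy_nt)
toy_lonlat <- as.matrix(expand.grid(lon = toy_lon, lat = toy_lat))

# Row r of toy_lonlat should describe the same cell as row r of toy_mat.
expected_t1 <- 100 * match(toy_lonlat[, "lon"], toy_lon) +
  10 * match(toy_lonlat[, "lat"], toy_lat) + 1
stopifnot(all(toy_mat[, 1] == expected_t1))

print(cbind(toy_lonlat, toy_mat))
```

If the variable in a file were stored with latitude as its first dimension instead, the arguments to `expand.grid()` would need to be swapped accordingly.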

We then store the data in a long-format data frame, with one row per location and time point, which is the structure expected by `fdmr`.

```{r storedat}
tmp_df <- data.frame(cbind(lonlat, tmp_mat))
names(tmp_df) <- c(
  "lon", "lat", "tmp_time1", "tmp_time2", "tmp_time3",
  "tmp_time4", "tmp_time5"
)
tmp_df <- reshape(tmp_df,
  varying = c(
    "tmp_time1", "tmp_time2", "tmp_time3",
    "tmp_time4", "tmp_time5"
  ),
  v.names = "temperature value",
  timevar = "time",
  times = c("1", "2", "3", "4", "5"),
  idvar = "location ID",
  new.row.names = 1:(nx * ny * nt),
  direction = "long"
)
```
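To make the effect of `reshape()` easier to see, here is a small sketch on a made-up wide table with two locations and two time points. The `toy_` data frames and their values are assumptions for illustration only; the column names mirror those used above.

```r
# Toy wide table: one row per location, one column per time point.
toy_wide <- data.frame(
  lon = c(0, 10),
  lat = c(50, 50),
  tmp_time1 = c(0.1, 0.2),
  tmp_time2 = c(0.3, 0.4)
)

# Stack the time columns into a single value column.
toy_long <- reshape(toy_wide,
  varying = c("tmp_time1", "tmp_time2"),
  v.names = "temperature value",
  timevar = "time",
  times = c("1", "2"),
  idvar = "location ID",
  new.row.names = 1:4,
  direction = "long"
)

print(toy_long)
```

The long table has one row for every combination of location and time point, so two locations and two time points give four rows. Because `location ID` is not a column of the input, `reshape()` creates it and numbers the locations in row order.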

Then the first 6 rows of the data frame can be viewed using the following code.

```{r headat}
utils::head(tmp_df)
```

As a second example, we import the netCDF file `oisst-sst.nc`, which is available at [https://github.com/rstudio/leaflet/tree/main/docs/nc](https://github.com/rstudio/leaflet/tree/main/docs/nc). First download the file from the link above and save it locally. As before, we use the `nc_open()` function from the `ncdf4` package to open it. Make sure that the R working directory is set to the folder containing `oisst-sst.nc`, then pass the filename (including the extension) as the first argument to `nc_open()`.


```{r imporoisst, eval=FALSE}
oisst<-nc_open('oisst-sst.nc')
```
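If the working directory is not set as expected, opening the file fails. One way to get a clearer message is to check for the file first. The sketch below uses only base R; `find_data_file()` is a hypothetical helper introduced here for illustration, not part of `ncdf4` or `fdmr`, and a temporary empty file stands in for the downloaded data.

```r
# Hypothetical helper: return the absolute path to a data file, or stop with
# a message that names the directory that was searched.
find_data_file <- function(path) {
  if (!file.exists(path)) {
    stop(
      "File '", path, "' not found; the current working directory is '",
      getwd(), "'."
    )
  }
  normalizePath(path)
}

# A temporary empty file stands in for a downloaded netCDF file.
demo_path <- tempfile(fileext = ".nc")
file.create(demo_path)
print(find_data_file(demo_path))

# A missing file yields an informative error message.
missing_msg <- tryCatch(
  find_data_file("no-such-file.nc"),
  error = function(e) conditionMessage(e)
)
print(missing_msg)
```

The returned path could then be passed to `nc_open()` in place of the bare filename.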

Printing `oisst` gives a summary of the file:
```{r, eval=FALSE}
print(oisst)
```

`oisst` is an `ncdf4` object that contains one variable named `Daily.sea.surface.temperature` and two dimensions, longitude and latitude. We can now read the values and store them in a data frame with the structure expected by `fdmr`.

```{r, eval=FALSE}
Daily_sea_surface_temperature <- as.vector(ncvar_get(oisst, "Daily.sea.surface.temperature"))
lon <- ncvar_get(oisst, "longitude")
lat <- ncvar_get(oisst, "latitude")
lonlat <- as.matrix(expand.grid(lon, lat))
tmp_df <- data.frame(cbind(lonlat, Daily_sea_surface_temperature))
colnames(tmp_df) <- c("lon", "lat", "Daily.sea.surface.temperature")
```

The first 6 rows of the data frame can be viewed using the following code.

```{r headoisst, eval=FALSE}
utils::head(tmp_df)
```

# Import raster data and prepare for fdmr

In this section we'll demonstrate how to import raster data into R. We first create a raster object that stores temperature values on a regular longitude-latitude grid at 3 time points, with one layer per time point.


```{r createRaster}
library(raster)

# Three 30 x 30 global layers, one per time point, filled with random values
r1 <- raster(ncol = 30, nrow = 30, xmn = -180, xmx = 180, ymn = -90, ymx = 90)
projection(r1) <- "+proj=longlat +datum=WGS84"
values(r1) <- runif(ncell(r1), 0, 1)

r2 <- raster(ncol = 30, nrow = 30, xmn = -180, xmx = 180, ymn = -90, ymx = 90)
projection(r2) <- "+proj=longlat +datum=WGS84"
values(r2) <- runif(ncell(r2), 0, 1)

r3 <- raster(ncol = 30, nrow = 30, xmn = -180, xmx = 180, ymn = -90, ymx = 90)
projection(r3) <- "+proj=longlat +datum=WGS84"
values(r3) <- runif(ncell(r3), 0, 1)

# Combine the layers into a single multi-layer object
r_stack <- stack(list(r1 = r1, r2 = r2, r3 = r3))
```


`r_stack` is a `RasterStack` object, the multi-layer class from the `raster` package.

```{r classraster}
class(r_stack)
```

Note that here we create the raster object directly in the R environment. Raster data stored in a file can be read with functions from the `raster` package by passing the filename (including the extension) as the first argument: `raster()` reads a single layer, while `stack()` and `brick()` read all layers of a multi-layer file. For example, if the raster file is a multi-layer netCDF file, it can be loaded into R by

```{r importraster, eval=FALSE}
r_stack <- raster::stack('filename.nc')
```
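As a hedged illustration of the difference between reading one layer and reading all layers from a file, the sketch below writes a made-up two-layer raster to a temporary file in the `raster` package's native `.grd` format and reads it back both ways. The `toy_` objects and the temporary file are assumptions for illustration; they are not part of the tutorial data.

```r
library(raster)

# Two small single-layer rasters with made-up values.
toy_r1 <- raster(ncol = 3, nrow = 2)
values(toy_r1) <- 1:6
toy_r2 <- raster(ncol = 3, nrow = 2)
values(toy_r2) <- 7:12

# Write them as one two-layer file in the raster package's native format.
toy_file <- tempfile(fileext = ".grd")
writeRaster(stack(toy_r1, toy_r2), toy_file, format = "raster", overwrite = TRUE)

# raster() returns a single layer (the first band by default),
# whereas stack() keeps every layer stored in the file.
single_layer <- raster(toy_file)
all_layers <- stack(toy_file)

print(nlayers(single_layer))
print(nlayers(all_layers))
```

For data with several time points, reading with `stack()` or `brick()` keeps one layer per time point, which is what the reshaping step later in this section relies on.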

We can plot the raster data at each time point.

```{r plotraster, fig.cap="Plots of the raster data at each time point.", fig.width=8, fig.height=4, fig.align='center'}
plot(r_stack)
```

Then we extract the data values in `r_stack`, and save them in a data frame with a structure that is expected by `fdmr`.

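```{r storedata}
r_df <- data.frame(raster::rasterToPoints(r_stack))
# One row per cell and time point; computed from the raster data itself
# rather than from the netCDF grid sizes used earlier
n_long <- nrow(r_df) * 3
r_df <- reshape(r_df,
  varying = c("r1", "r2", "r3"),
  v.names = "temperature value",
  timevar = "time",
  times = c("1", "2", "3"),
  idvar = "location ID",
  new.row.names = seq_len(n_long),
  direction = "long"
)
```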

Then the first 6 rows of the data frame can be viewed using the following code.

```{r headata}
utils::head(r_df)
```
