-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathuvabits_to_movebank.Rmd
173 lines (128 loc) · 5.86 KB
/
uvabits_to_movebank.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
---
title: "Prepare Uva-BiTS (meta)data for Movebank upload"
author: "Peter Desmet"
date: "`r Sys.Date()`"
output:
html_document
---
This script prepares [UvA-BiTS](http://www.uva-bits.nl/) bird tracking data for upload to [Movebank](https://www.movebank.org/). For a project, it downloads **reference, GPS and acceleration data** from UvA-BiTS and transforms these to the [Movebank data format](https://www.movebank.org/node/2381) using [SQL](sql). See the steps at the end of this document to check for outliers and to concatenate files before upload.
```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
Load libraries:
```{r}
library(tidyverse)
library(DBI)
library(RPostgres)
library(config)
library(here)
library(glue)
# For RPostgres, see https://github.com/r-dbi/RPostgres/issues/181#issuecomment-438700465
```
## Connect to UvA-BiTS database
Get connection settings from `config.yml` (not included in this repo) and connect to database:
```{r connect_to_uvabits}
uvabits <- config::get("uvabits")
con = DBI::dbConnect(
drv = RPostgres::Postgres(),
dbname = uvabits$database,
host = uvabits$server,
port = uvabits$port,
user = uvabits$username,
password = uvabits$password
)
# The alternative is getting the connection defined in odbc.ini:
# con <- DBI::dbConnect(odbc::odbc(), "uvabits")
```
## Set user-defined values
Some values cannot be automatically assumed for the Movebank format and have to be defined by the user. See [here](https://www.movebank.org/node/2381#metadata) for information on all reference data terms.
Set values:
```{r}
project_id <- "O_WESTERSCHELDE"
shared <- FALSE
overwrite_files <- FALSE
animal_life_stage <- "adult"
manipulation_type <- "none"
bird_remarks_is_nickname <- TRUE
start_date <- "2019-01-01 00:00:00"
end_date <- "2019-12-31 24:00:00"
```
## Get reference data
For a project, get data on animals, tags and deployments in the [Movebank reference data format](https://www.movebank.org/node/27).
Query data:
```{r get_ref_data}
movebank_ref_sql <- glue::glue_sql(readr::read_file(here::here("sql", "uvabits_to_movebank_ref.sql")), .con = con)
movebank_ref_data <- DBI::dbGetQuery(con, movebank_ref_sql)
```
Save to CSV:
```{r}
readr::write_csv(movebank_ref_data, file = here::here("data", "processed", project_id, "movebank_ref_data.csv"), na = "")
```
### Note on track session remarks
Text in `track_session.remarks` is parsed to different Movebank fields using regular expressions and should follow a certain format:
```
Waterland-Oudeman | wing_length: 385 | tarsus_length: 71 | Tracker reused from L143467, assumed dead.
# Release location only
Waterland-Oudeman
# Measurements only
wing_length: 385 | tarsus_length: 71
# Remarks only
| Tracker reused from L143467, assumed dead.
```
- `study-site` = `Waterland-Oudeman` (the release location). Write at start of `remarks`. Start with capital letter, don't use spaces.
- `deploy-on-measurements` = `wing_length: 385 | tarsus_length: 71`. Write as `key_name: value` pairs, where `key_name` is lower snake_case and pairs are separated by ` | `.
- `deployment-comments` = `Tracker reused from L143467, assumed dead.`. Write at end of `remarks` and end with `.`. Don't use `:`.
- `deployment-end-type` = `dead`. The words `dead`, `defect`, `malfunction` in `remarks` are used to map to specific types.
## Get ring numbers
Get individuals with track sessions starting before the `end_date` (avoids unnecessary downloads for individuals with later tracking sessions) by ring number (stored as `animal-id`). Note that ring number is a better identifier than `tag-id` as it is unique to an individual (and cannot be reused).
```{r}
ring_numbers <- movebank_ref_data %>%
dplyr::filter(`alt-project-id` == project_id) %>%
dplyr::filter(`deploy-on-timestamp` <= end_date) %>%
dplyr::distinct(`animal-id`, .keep_all = TRUE) %>%
dplyr::arrange(`animal-id`) %>%
dplyr::pull(`animal-id`)
```
Load custom function to download data (will iterate over ring numbers):
```{r}
source(here::here("src", "functions", "download_data.R"))
```
## Get GPS data
Query and save data (one csv file per ringnumber):
```{r}
download_data(
sql_file = here::here("sql", "uvabits_to_movebank_gps.sql"),
download_directory = here::here("data", "processed", project_id, "gps"),
download_filename = "movebank_gps",
ring_numbers = ring_numbers,
shared = shared,
con = con,
overwrite = overwrite_files
)
```
## Get acceleration data
Query and save data (one csv file per ringnumber):
```{r}
download_data(
sql_file = here::here("sql", "uvabits_to_movebank_acc.sql"),
download_directory = here::here("data", "processed", project_id, "acc"),
download_filename = "movebank_acc",
ring_numbers = ring_numbers,
shared = shared,
con = con,
overwrite = overwrite_files
)
```
## Outlier detection
After download, run `outliers.Rmd` to mark outliers based on speed.
## Upload to Movebank
For smaller projects (e.g. MH_WATERLAND), files can be uploaded to Movebank as they are queried here: per individual (encompassing all years). To update the dataset on Movebank, just query all years again and replace the file on Movebank.
For larger projects (e.g. HG_OOSTENDE), it is better to upload per year, as data won't change for previous years. This can be done by using the `start_date`/`end_date` filters above and concatenating the resulting individual files. To update the dataset on Movebank, start from the last uploaded year (as it might contain new records), replace that one on Movebank, and add the new years as new files. The maximum upload limit per file is around 1.2 GB, so large files might need to be broken up (especially acceleration data).
To concatenate files, use:
```
awk 'FNR > 1' gps_outliers/movebank_gps_*.csv > movebank_gps_yyyy_nh.csv
```
After which the header needs to be added with:
```
cat movebank_gps_outliers_headers.csv movebank_gps_yyyy_nh.csv > movebank_gps_yyyy.csv
```