---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
[![Travis-CI Build Status](https://travis-ci.org/boettiger-lab/mdplearning.svg?branch=master)](https://travis-ci.org/boettiger-lab/mdplearning)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/boettiger-lab/mdplearning?branch=master&svg=true)](https://ci.appveyor.com/project/boettiger-lab/mdplearning)
[![Coverage Status](https://img.shields.io/codecov/c/github/boettiger-lab/mdplearning/master.svg)](https://codecov.io/github/boettiger-lab/mdplearning?branch=master)
[![Project Status: WIP - Initial development is in progress, but there has not yet been a stable, usable release suitable for the public.](http://www.repostatus.org/badges/latest/wip.svg)](http://www.repostatus.org/#wip)
[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/mdplearning)](https://cran.r-project.org/package=mdplearning)
[![DOI](https://zenodo.org/badge/68045474.svg)](https://zenodo.org/badge/latestdoi/68045474)
# mdplearning
## Install
```{r eval=FALSE}
devtools::install_github("boettiger-lab/mdplearning")
```
## Basic Use
Use transition matrices for two different models of an example fisheries system:
```{r message=FALSE}
library("mdplearning")
library("ggplot2")
library("dplyr")
library("tidyr")
```
```{r}
source(system.file("examples/K_models.R", package="mdplearning"))
transition <- lapply(models, `[[`, "transition")
```
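The `transition` object is a list with one element per candidate model. A quick sanity check, just inspecting the objects created above:
```{r}
length(transition)    # one transition array per candidate model
dim(transition[[1]])  # dimensions of the first model's transition array
```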
Use the reward matrix from the first model (the reward function is assumed known):
```{r}
reward <- models[[1]][["reward"]]
```
### Planning
Compute the optimal policy when planning over model uncertainty, without any adaptive learning. The default algorithm is policy iteration, and the default prior is a uniform belief over the models.
```{r}
# `discount` is defined by the K_models.R script sourced above
unif <- mdp_compute_policy(transition, reward, discount)
```
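The result is a data frame; a quick look at the first rows shows the `state` and `policy` columns used in the plots below:
```{r}
head(unif)
```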
We can compare this policy to those obtained by believing fully in either model A or model B:
```{r}
lowK <- mdp_compute_policy(transition, reward, discount, c(1,0))
highK <- mdp_compute_policy(transition, reward, discount, c(0,1))
```
We can plot the resulting policies. Note that the uniform-prior policy is an intermediate compromise between the low-K and high-K policies.
```{r fig1}
dplyr::bind_rows(unif = unif, lowK = lowK, highK = highK, .id = "model") %>%
ggplot(aes(state, state - policy, col = model)) + geom_line()
```
We can use `mdp_planning` to simulate (without learning) by specifying a fixed policy in advance. `mdp_planning` also permits us to include observation error in the simulation (though it is not accounted for by the MDP optimization).
```{r fig2}
df <- mdp_planning(transition[[1]], reward, discount, x0 = 10, Tmax = 20,
policy = unif$policy, observation = models[[1]]$observation)
df %>%
select(-value) %>%
gather(series, stock, -time) %>%
ggplot(aes(time, stock, color = series)) + geom_line()
```
### Learning
Given a transition matrix from which the true transitions will be drawn, we can use Bayesian learning to update our belief about which model is the true one. Note that we must now specify a list of transition matrices representing the models under consideration, and separately specify the true transition matrix. The function now returns a list containing two data frames: one for the time series, as before, and another showing the evolution of the posterior belief over the models.
```{r}
out <- mdp_learning(transition, reward, discount, x0 = 10,
Tmax = 20, true_transition = transition[[1]])
```
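A quick look at the structure of the returned list (just an inspection; `posterior` is the component used below):
```{r}
str(out, max.level = 1)
```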
The final belief shows strong convergence to model 1, which was the true model.
```{r}
out$posterior[20,]
```
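Since `out$posterior` has one row per time step and one column per candidate model, we can also plot how the belief evolves over the simulation. A minimal sketch: the `model1`/`model2` labels are names we assign here, assuming the columns follow the order of the `transition` list.
```{r fig3}
belief <- as.data.frame(out$posterior)
names(belief) <- c("model1", "model2")  # assumed to match the order of the transition list
belief %>%
  mutate(time = seq_len(n())) %>%
  gather(model, probability, -time) %>%
  ggplot(aes(time, probability, color = model)) + geom_line()
```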