# Machine Learning {#sec-ml}
In this section, we introduce a machine learning pipeline for estimating soil properties from soil NIR data. We will use the **mlr3** framework to predict Soil Organic Carbon (SOC): we train a predictive model on NIR spectra from US soil samples and test its generalization on African soil samples.
::: {.callout-note}
## Note
The `train.csv` and `test.csv` are prepared in the previous section: [**2.6 Preparing machine learning dataset**](#subsec-preparing).
:::
The **mlr3** framework is well presented in @Bischl2024.
You can also find a lot of interesting video tutorials on machine learning in [OpenGeoHub training videos](https://av.tib.eu/publisher/OpenGeoHub_Foundation).
A list of all required packages for this section is provided in the following code chunk:
```{r, libraries, message=FALSE, warning=FALSE}
library("tidyverse")
library("mlr3verse")
library("Cubist") # Cubist ML algorithm
library("ranger") # Random Forest ML algorithm
library("glmnet") # Elastic net ML Algorithm
library("future") # Parallel processing
library("yardstick") # Additional metrics
library("mlr3extralearners") # Additional models and features
library("iml") # Feature importance
```
All of the above packages except **mlr3extralearners** can be installed with `install.packages()`. For **mlr3extralearners**, follow the instructions in the [**mlr3extralearners** GitHub repository](https://github.com/mlr-org/mlr3extralearners):
```{r, eval=FALSE}
# latest GitHub release
remotes::install_github("mlr-org/mlr3extralearners@*release")
```
Please set the working directory (or load an RStudio project) on your local machine:
```{r working_directory}
my.wd <- "~/projects/local-files/soilspec_training"
# Or within an RStudio project
# my.wd <- getwd()
```
## Introduction to mlr3 framework
We introduce the main principles of the **mlr3** framework, drawing on the first chapter of the official **mlr3** book [@mlr3Intro].
::: {.callout-tip}
(...) The **mlr3** package and the wider mlr3 ecosystem provide a generic, object-oriented, and extensible framework for regression, classification, and other machine learning tasks for the R language. (...) We build on R6 for object orientation and data.table to store and operate on tabular data.
:::
[![Figure 1: The mlr3 ecosystem](./img/mlr3_ecosystem.svg)](https://mlr3book.mlr-org.com/chapters/chapter1/Figures/mlr3_ecosystem.svg)
The most convenient way to load the whole **mlr3** ecosystem is to load the `mlr3verse` package.
```{r setup, message=FALSE, warning=FALSE}
#install.packages("mlr3verse")
library("mlr3verse")
library("tidyverse")
# Loading train and test data
train_matrix <- read_csv(file.path(my.wd, "train.csv"))
test_matrix <- read_csv(file.path(my.wd, "test.csv"))
```
**What is R6?**
::: {.callout-note}
[**R6**](https://cran.r-project.org/web/packages/R6/index.html) is one of R’s more recent paradigms for object-oriented programming.
:::
**Objects** are created by constructing an instance of an [**R6::R6Class**](https://www.rdocumentation.org/packages/R6/versions/2.5.1/topics/R6Class) variable using the `$new()` initialization method. For example, for the **mlr3** class `TaskRegr`, calling `TaskRegr$new(id = "NIR_spectral", backend = train_matrix, target = "oc_usda.c729_w.pct")` creates a new object of class `TaskRegr` and sets the `id`, `backend`, and `target` **fields**, which encapsulate the mutable state of the object.
**Methods** allow users to inspect the object's state, retrieve information, or perform an action that changes the internal state of the object. For example, the `$set_col_roles()` method assigns special column roles, such as row names (IDs) or grouping/blocking during resampling.
```{r R6_object_example, message=FALSE, warning=FALSE}
# Creating an example object
example_object <- TaskRegr$new(id = "NIR_spectral",
backend = train_matrix,
target = "oc_usda.c729_w.pct")
# Access to the $id field
example_object$id
# Calling methods of the object
example_object$set_col_roles("id.sample_local_c", role = "name")
example_object$set_col_roles("location.country_iso.3166_txt", role = "group")
example_object$col_roles
```
**mlr3 Utilities**
**mlr3** uses convenience functions called **helper or sugar functions** to create most of the objects. For example, `lrn("regr.rpart")` returns the decision tree learner without having to explicitly create a new R6 object.
**mlr3** uses **Dictionaries** to store R6 classes. For example, `lrn("regr.rpart")` is a wrapper around `mlr_learners$get("regr.rpart")`. You can see an overview of available learners by calling the sugar function without any arguments, i.e. `lrn()`.
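As a minimal illustration of this sugar/dictionary relationship (not part of the modeling workflow below):
```{r sugar_dictionary_sketch, eval=FALSE}
# The sugar function...
lrn_tree <- lrn("regr.rpart")
# ...is a shorthand for retrieving the class from the dictionary
lrn_tree <- mlr_learners$get("regr.rpart")
# Overview of all learners registered in the dictionary
as.data.table(mlr_learners)
```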
## Inspecting data and a baseline model
Soil organic carbon (SOC) is a measurable component of soil organic matter. Organic matter makes up just a small part of the soil's mass, yet it plays an important role in the physical, chemical, and biological functioning of agricultural soils. SOC is expressed in percent units (0-100%) and has a highly right-skewed probability distribution, because SOC makes up just 2-10% of most soils' mass; higher contents are typical mainly of organic soils, e.g., peat bogs. Therefore, SOC is often modeled on a log scale to reduce the influence of the few samples with high SOC content. Let's check whether this is the case for our data too.
```{r histograms, message=FALSE, warning=FALSE}
hist(train_matrix$oc_usda.c729_w.pct, breaks = 100)
```
Yes, it is. So let's apply the `log1p` function to the SOC values of our train and test sets. `log1p` computes the natural logarithm with an offset of 1, i.e. `log(1 + x)`, which avoids negative log values and is defined at zero.
```{r log1p, message=FALSE, warning=FALSE}
train_matrix$oc_usda.c729_w.pct <- log1p(train_matrix$oc_usda.c729_w.pct)
test_matrix$oc_usda.c729_w.pct <- log1p(test_matrix$oc_usda.c729_w.pct)
hist(train_matrix$oc_usda.c729_w.pct, breaks = 100)
```
It is good practice to train a dummy model as a baseline against which the predictive power gains of our ML models can be tracked. But let's first create our ML task using **mlr3** syntax to follow the logic of the framework.
We will use the convenience function `as_task_regr()` to convert the data object to a `TaskRegr` class. We need to set the `target` argument specifying our target variable and the `id` of the task. We will also assign the special role `"name"` (row names/observation labels) to the `"id.sample_local_c"` column.
```{r task, message=FALSE, warning=FALSE}
# Setting a seed for reproducibility
set.seed(349)
# Create a task
train_matrix <- train_matrix %>%
select(-location.country_iso.3166_txt)
task_neospectra <- as_task_regr(train_matrix,
target = "oc_usda.c729_w.pct",
id = "NIR_spectral")
# Set row names role to the id.sample_local_c column
task_neospectra$set_col_roles("id.sample_local_c", roles = "name")
task_neospectra
```
We will train a dummy model that always predicts new values as the mean of the target variable. The dummy model is called `"regr.featureless"` in the **mlr3** dictionary. We can call `lrn("regr.featureless")` to create a `Learner` R6 object without explicitly constructing it.
```{r featureless_baseline, message=FALSE, warning=FALSE}
# Load featureless learner
# featureless = always predicts new values as the mean of the dataset
lrn_baseline <- lrn("regr.featureless")
# Only one hyperparameter can be set: robust = calculate mean if false / calculate median if true
# lrn_baseline$param_set
# Train learners using $train() method
lrn_baseline$train(task_neospectra)
```
Now we can calculate the Root Mean Square Error (RMSE), Mean Error (Bias), and Coefficient of Determination (R^2^) to assess the goodness-of-fit of a model.
::: {.callout-tip}
# Root Mean Square Error (RMSE)
RMSE is calculated as a square root of the mean of the squared differences between predicted and expected values. The units of the RMSE are the same as the original units of the target value that is being predicted. RMSE of the perfect match is 0 ([Wikipedia](https://en.wikipedia.org/wiki/Root-mean-square_deviation)).
:::
::: {.callout-tip}
# Mean Error (Bias)
Bias is calculated as the average of error values. Good predictions score close to 0 ([Wikipedia](https://en.wikipedia.org/wiki/Mean_signed_deviation)).
:::
::: {.callout-tip}
# Coefficient of Determination (R^2^)
R^2^ is the proportion of the variation in the dependent variable that is predictable from the independent variable(s) ([Wikipedia](https://en.wikipedia.org/wiki/Coefficient_of_determination)). In the best case, the modeled values exactly match the observed values, resulting in a residual sum of squares of 0 and R^2^ = 1. A baseline model that always predicts the mean will have R^2^ <= 0 when a non-linear function is used to fit the data [@ColinCameron1997].
:::
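For reference, these three metrics correspond to the following standard formulas (with $y_i$ the observed values, $\hat{y}_i$ the predictions, $\bar{y}$ the observed mean, and $n$ the number of samples; implementations may differ in the sign convention used for the bias):

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \qquad
\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right), \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$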
We will use the [`msrs()` sugar function](https://mlr3.mlr-org.com/reference/mlr_measures.html) featured in the **mlr3** dictionary to create a list of `Measure` objects: `c("regr.rmse", "regr.bias", "regr.rsq")`.
```{r featureless_baseline_evaluation, message=FALSE, warning=FALSE}
# Setting a seed for reproducibility
set.seed(349)
# Define goodness-of-fit metrics
measures <- msrs(c("regr.rmse", "regr.bias", "regr.rsq"))
# Predict on a new data using $predict_newdata() method
pred_baseline <- lrn_baseline$predict_newdata(test_matrix)
# Evaluation on the new data applying $score() method
pred_baseline$score(measures)
```
## Simple benchmarking
We now have the baseline, so we can start selecting the best performing ML model. How does one select the best performing model for a specific task? One option is to compare the estimated generalization error (accuracy) of the models using a Cross-Validation (CV) scheme.
Benchmarking is easy in the **mlr3** framework thanks to predefined object classes, sugar functions, and dictionaries. The first step is to define the set of learners for benchmarking using `lrns()` and other dictionaries. The second step is to define a `Resampling` object that encapsulates the CV. We will use 10-fold CV for now.
```{r learners_cv_benchmarking, message=FALSE, warning=FALSE}
lrn_cv <- lrns(c("regr.ranger", "regr.cv_glmnet", "regr.cubist"))
cv10 <- rsmp("cv", folds = 10)
```
We can pass both objects to `benchmark_grid()` together with the task to construct an exhaustive design that describes all combinations of learners, tasks, and resamplings to be used in a benchmark experiment. The benchmark experiment is conducted with `benchmark()` and the results are stored in an object of the `BenchmarkResult` class. One can access the benchmarking results by applying the `$score()` or `$aggregate()` methods on this object, passing the `Measure` objects with the selected metrics [@mlr3Eval].
```{r simple_benchmarking, message=FALSE, warning=FALSE, results=FALSE}
bench_cv_grid <- benchmark_grid(task_neospectra, lrn_cv, cv10)
bench_cv <- benchmark(bench_cv_grid)
```
```{r simple_benchmarking_results, message=FALSE, warning=FALSE}
# Check the results of each fold using $score()
bench_cv$score(measures)
# Check the aggregated results of the models using $aggregate()
bench_cv$aggregate(measures)
```
## Simple tuning
::: {.callout-tip}
## Parameters vs. Hyperparameters
Machine learning algorithms usually include parameters and hyperparameters. Parameters are the model coefficients, weights, or other information that are determined by the learning algorithm based on the training data. In contrast, hyperparameters are configured by the user and determine how the model will fit its parameters, i.e., how the model is built. The goal of hyperparameter optimization (HPO) or model tuning is to find the optimal configuration of hyperparameters of a machine learning algorithm for a given task. (...) [@mlr3opt].
:::
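To make the distinction concrete in **mlr3** terms, here is a minimal sketch (not part of the modeling workflow): hyperparameters live in the learner's `$param_set` and are set before training, while the fitted parameters end up in `$model` after training.
```{r params_vs_hyperparams_sketch, eval=FALSE}
lrn_tree <- lrn("regr.rpart")
# Hyperparameters: configured by the user before training
lrn_tree$param_set
lrn_tree$param_set$values$maxdepth <- 5
# Parameters: estimated from the training data, stored in the fitted model
lrn_tree$train(task_neospectra)
lrn_tree$model
```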
We will practice HPO with a simple tuning of the [Cubist model](https://cran.r-project.org/web/packages/Cubist/vignettes/cubist.html) learner. For this, we will randomly search for the best pair of the `committees` (a boosting-like scheme in which iterative model trees are created in sequence) and `neighbors` (how many similar training points are used to determine the average of terminal points) hyperparameters.
```{r tune_cubist_model, message=FALSE, warning=FALSE, results=FALSE}
# Define search space for committees and neighbors inside the Learner object using to_tune function
lrn_cubist <- lrn("regr.cubist",
committees = to_tune(1, 20),
neighbors = to_tune(0, 5))
# Setting a seed for reproducibility
set.seed(349)
# Define tuning instance using tune() helper function
# tune() automatically calls the $optimize() method of the instance
simple_tune <- tune(
method = tnr("random_search"), # Picking random combinations of hyperparameters from the search space
task = task_neospectra, # The task with training data
learner = lrn_cubist, # Learner algorithm
resampling = cv10, # Apply 10-fold CV to calculate the evaluation metrics
measures = msr("regr.rmse"), # Use RMSE to select the best set of hyperparameters
term_evals = 10 # Terminate tuning after testing 10 combinations
)
```
```{r tune_cubist_model_results, message=FALSE, warning=FALSE}
# Exploring tuning results
simple_tune$result
```
## Combining benchmarking and tuning
It is a standard practice to combine tuning and benchmarking while searching for the best model.
::: {.callout-tip}
When performing both operations simultaneously, it is good practice to implement **nested resampling** to get a robust estimate of the generalization error. This means that we implement an inner CV to tune the models and an outer CV to benchmark the tuned models. We can implement this easily in the **mlr3** framework using the previous code and handy functions, as indicated in the [Nested Resampling section](https://mlr.mlr-org.com/articles/tutorial/nested_resampling.html).
:::
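As a minimal sketch, nested resampling can also be set up with the `auto_tuner()` convenience function, which wraps a learner and its tuning configuration so that the inner CV happens automatically inside each outer fold (the chunks that follow implement the inner and outer loops step by step instead):
```{r nested_resampling_sketch, eval=FALSE}
# Inner CV: tuning is wrapped inside an AutoTuner
at_cubist <- auto_tuner(
  tuner = tnr("random_search"),
  learner = lrn("regr.cubist",
                committees = to_tune(1, 20),
                neighbors = to_tune(0, 5)),
  resampling = rsmp("cv", folds = 5),   # inner resampling
  measure = msr("regr.rmse"),
  term_evals = 10
)
# Outer CV: estimates the generalization error of the tuned learner
nested <- resample(task_neospectra, at_cubist, rsmp("cv", folds = 10))
nested$aggregate(msr("regr.rmse"))
```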
```{r showing_tuning_spaces, message=FALSE, warning=FALSE}
# Showing the dictionary of available predefined tuning spaces
as.data.table(mlr_tuning_spaces)[,.(label)]
```
Running internal (inner) cross-validation for independent model optimization.
```{r inner_cv_tuning, message=FALSE, warning=FALSE, results=FALSE}
# Selecting the spaces using sugar function ltss()
# Not available for cubist
spaces <- ltss(c("regr.ranger.default", "regr.glmnet.default"))
# Creating a list of learners and an empty list to write tuned learners
lrn_cv <- lrns(c("regr.ranger", "regr.glmnet"))
lrn_cv_tuned <- list()
# To avoid most of the logs
lgr::get_logger("mlr3")$set_threshold("warn")
# For loop to tune hyperparameters of every learner using random search
for(i in 1:length(lrn_cv)){
# Using multiple cores for tuning each model
future::plan("multisession")
set.seed(349)
instance <- tune(task = task_neospectra,
method = tnr("random_search"),
learner = lrn_cv[[i]],
resampling = cv10,
measures = msr("regr.rmse"),
term_evals = 10,
search_space = spaces[[i]])
# Writing tuned hyperparameters
lrn_cv_tuned[[i]] <- lrn_cv[[i]]
lrn_cv_tuned[[i]]$param_set$values <- instance$result_learner_param_vals
}
# Close the parallel workers
future::plan("sequential")
# Adding previously tuned cubist model to the list for benchmarking
# Create a new instance of the cubist model
lrn_cubist_tuned <- lrn("regr.cubist")
# Set the best performing hyperparameters to the new instance
lrn_cubist_tuned$param_set$values <- simple_tune$result_learner_param_vals
# Bind it to the list of tuned models
lrn_cv_tuned <- c(lrn_cv_tuned, lrn_cubist_tuned)
```
Now we can perform a complete benchmark using outer cross-validation.
```{r outer_cv_benchmarking, message=FALSE, warning=FALSE, results=FALSE}
# Creating a benchmark grid
bench_cv_grid <- benchmark_grid(task_neospectra, lrn_cv_tuned, cv10)
# Implementing benchmarking using multiple cores
future::plan("multisession")
bench_cv <- benchmark(bench_cv_grid)
future::plan("sequential")
```
```{r benchmarking_results, message=FALSE, warning=FALSE}
# Evaluating the results
bench_cv$aggregate(measures)
```
We can see that the Cubist model performs best, so let's test the tuned Cubist model on our test set.
We can estimate the true generalization error of the best performing model by training a new model instance on the whole training set and calculating the error metrics on the test set.
```{r final_cubist_model, message=FALSE, warning=FALSE}
# New Cubist learner
lrn_cubist_tuned <- lrn("regr.cubist")
# Setting the best HPs
lrn_cubist_tuned$param_set$values <- lrn_cv_tuned[[3]]$param_set$values # Cubist is the third learner in the list
# Fitting with whole train set
lrn_cubist_tuned$train(task_neospectra)
# Predicting on test set
pred_cubist_response <- lrn_cubist_tuned$predict_newdata(test_matrix)
# Final goodness-of-fit metrics
pred_cubist_response$score(measures)
```
## Model interpretation
Model interpretation helps us understand which features contribute most to the predictive performance of the model. There are many ways of doing this, and we recommend visiting [Chapter 12 of the mlr3 book](https://mlr3book.mlr-org.com/chapters/chapter12/model_interpretation.html) to learn more about the possibilities.
In this section, we will explore **permutation feature importance (PFI)**. This method randomly shuffles the observed values of each feature and records the change in model performance before and after the permutation. Important features cause a significant drop in performance when shuffled. It is a very popular method because it is model agnostic and allows a quick inspection of models with a simple procedure.
Let's explore the feature importance of our tuned Cubist model. We use 10 repetitions to capture the range of importance values across several random shuffles; you can increase this number to make the estimate more robust. As we are working with a regression task, the metric that captures the performance change is the mean squared error (MSE).
```{r importance, message=FALSE, warning=FALSE}
# Grabbing the features in train data
model_predictors <- task_neospectra$data(cols = task_neospectra$feature_names)
# Grabbing the outcome in train data
model_outcome <- task_neospectra$data(cols = task_neospectra$target_names)
# Building a predictor object for importance evaluation
predictor = Predictor$new(lrn_cubist_tuned,
data = model_predictors,
y = model_outcome)
importance = FeatureImp$new(predictor, loss = "mse", n.repetitions = 10)
importance$plot()
```
In this case, `pc4` is by far the most important feature for SOC prediction. One can further dive into the composition of `pc4` to see which spectral features are represented by the loadings of this principal component.
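As an illustrative sketch only: if the principal components from the previous section were derived with `prcomp()` and the fitted PCA object were still available (here under the hypothetical names `pca.model` for the fit and `wavelengths` for the original wavelength axis), the loadings of the fourth component could be inspected like this:
```{r pc4_loadings_sketch, eval=FALSE}
# Hypothetical objects: adapt the names to your own data preparation workflow
pc4.loadings <- pca.model$rotation[, 4]   # loadings of the 4th principal component
plot(wavelengths, pc4.loadings, type = "l",
     xlab = "Wavelength (nm)", ylab = "PC4 loading")
```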
## Additional accuracy assessment
We can extract the observed and predicted values to create an accuracy plot.
```{r accuracy_plot, message=FALSE, warning=FALSE}
accuracy.data <- tibble(id = test_matrix$id.sample_local_c,
observed = pred_cubist_response$data$truth,
predicted = pred_cubist_response$data$response)
ggplot(accuracy.data) +
geom_point(aes(x = observed, y = predicted)) +
geom_abline(intercept = 0, slope = 1) +
theme_light()
```
We can also use the **yardstick** package to calculate additional goodness-of-fit metrics, like Lin's Concordance Correlation Coefficient (CCC) and the Ratio of Performance to the Interquartile Range (RPIQ).
::: {.callout-tip}
# Lin's Concordance Correlation Coefficient (CCC)
CCC measures the agreement between observed and predicted values, taking into consideration the bias (deviation from the 1:1 perfect agreement line). Scores close to 1 represent the best case ([Wikipedia](https://en.wikipedia.org/wiki/Concordance_correlation_coefficient)).
:::
::: {.callout-tip}
# Ratio of Performance to the Interquartile Range (RPIQ)
RPIQ measures the consistency of the model by dividing the original variability (interquartile range) by the prediction error (RMSE), which makes it more easily comparable across distinct studies ([Reference](https://search.r-project.org/CRAN/refmans/chillR/html/RPIQ.html)).
:::
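In formula form (with $\mu_o$, $\mu_p$ the means and $\sigma_o^2$, $\sigma_p^2$ the variances of the observed and predicted values, $\rho$ their Pearson correlation, and $Q_1$, $Q_3$ the first and third quartiles of the observed values):

$$
\mathrm{CCC} = \frac{2\rho\,\sigma_o\sigma_p}{\sigma_o^2 + \sigma_p^2 + \left(\mu_o - \mu_p\right)^2}, \qquad
\mathrm{RPIQ} = \frac{Q_3 - Q_1}{\mathrm{RMSE}}
$$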
```{r accuracy_metrics, message=FALSE, warning=FALSE}
library("yardstick")
# Calculating metrics
accuracy.metrics <- accuracy.data %>%
summarise(n = n(),
rmse = rmse_vec(truth = observed, estimate = predicted),
bias = msd_vec(truth = observed, estimate = predicted),
rsq = rsq_vec(truth = observed, estimate = predicted),
ccc = ccc_vec(truth = observed, estimate = predicted, bias = TRUE),
rpiq = rpiq_vec(truth = observed, estimate = predicted))
accuracy.metrics
```
## Uncertainty estimation
The 10-fold cross-validated predictions can be leveraged to estimate the error (absolute residual) of the calibration set, which is then used to fit an uncertainty estimation model via the [**Split Conformal Prediction**](https://en.wikipedia.org/wiki/Conformal_prediction) method.
![Adapted from <http://cclp.marseille.inserm.fr/>](img/conformal_diagram.png)
The error model is calibrated using the same fine-tuned structure as the response/target variable model.
Conformity scores are estimated for a defined confidence level, such as 95% or 68% (roughly equivalent to one standard deviation).
In this part, we are going to produce 95% prediction intervals, and we expect the coverage statistic (also known as the Prediction Interval Coverage Percentage, PICP) to be at least equal to or higher than our predefined confidence level.
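In compact form, the split conformal procedure implemented below computes a conformity score for each calibration sample, takes a sample-size-corrected quantile of these scores, and scales the predicted error of new samples by it:

$$
\alpha_i = \frac{\left|y_i - \hat{y}_i\right|}{\hat{e}(x_i)}, \qquad
\hat{q} = \mathrm{Quantile}_{\,(n+1)\tau/n}\left(\alpha_1, \ldots, \alpha_n\right), \qquad
\hat{y}(x_\mathrm{new}) \pm \hat{q}\,\hat{e}(x_\mathrm{new}),
$$

where $\hat{e}(\cdot)$ is the error model trained on the cross-validated absolute residuals, $\tau$ is the target coverage (0.95 here), and $n$ is the number of calibration samples.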
```{r uncertainty, message=FALSE, warning=FALSE}
# Running 10-fold CV with the tuned Cubist model
cv_results <- resample(task = task_neospectra,
learner = lrn_cubist_tuned,
resampling = rsmp("cv", folds = 10))
# Extracting the observed and predicted values within the folds
cv_results <- lapply(1:length(cv_results$predictions("test")), function(i){
as.data.table(cv_results$predictions("test")[[i]]) %>%
mutate(fold = i)})
cv_results <- Reduce(rbind, cv_results)
head(cv_results)
# Quick visualization
ggplot(cv_results) +
geom_point(aes(x = truth, y = response)) +
geom_abline(intercept = 0, slope = 1) +
theme_light()
# Estimating the absolute error
cv_results <- cv_results %>%
mutate(error = abs(truth-response))
# Create a matrix with the same predictors as train_matrix and
# variable error as the new target
# you can View to see if the join worked well
error_matrix <- train_matrix %>%
mutate(row_ids = row_number()) %>%
left_join(cv_results, by = "row_ids")
# Simplifying with only necessary data
error_matrix <- error_matrix %>%
select(id.sample_local_c, error, starts_with("pc_"))
task_error <- as_task_regr(error_matrix,
target = "error",
id = "error_model")
# Set row names role to the id.sample_local_c column
task_error$set_col_roles("id.sample_local_c", roles = "name")
task_error
# Retrain the same model structure with the task_error
lrn_cubist_error = lrn("regr.cubist")
lrn_cubist_error$param_set$values = lrn_cubist_tuned$param_set$values
# Retrain on the error task
lrn_cubist_error$train(task_error)
# Get the plain error predictions
error_predictions <- predict(lrn_cubist_error, newdata = as.data.table(task_error))
# Calculating the conformity scores (alpha) of the calibration set
error_matrix <- error_matrix %>%
select(-starts_with("pc_")) %>%
mutate(pred_error = error_predictions) %>%
mutate(alpha_scores = error/pred_error)
error_matrix
# Sample correction and final conformity score
target_coverage = 0.95
n_calibration <- nrow(error_matrix)
corrected_quantile <- ((n_calibration + 1)*(target_coverage))/n_calibration
alpha_corrected <- quantile(error_matrix$alpha_scores, corrected_quantile)
# Predicting error on test set and calculating 95% prediction interval
# Predicting on test set
pred_cubist_error = lrn_cubist_error$predict_newdata(test_matrix)
pred_cubist_error
# Final output
final.outputs <- accuracy.data %>%
mutate(error = pred_cubist_error$response) %>%
mutate(PI = error*alpha_corrected) %>%
mutate(lower_PI = predicted-PI,
upper_PI = predicted+PI) %>%
select(-error, -PI)
# Back-transforming all results because we have used log1p earlier
final.outputs <- final.outputs %>%
mutate(observed = expm1(observed),
predicted = expm1(predicted),
lower_PI = expm1(lower_PI),
upper_PI = expm1(upper_PI))
# Calculating coverage statistics
final.outputs <- final.outputs %>%
mutate(covered = ifelse(observed >= lower_PI & observed <= upper_PI, TRUE, FALSE),
wis = ifelse(observed < lower_PI,
(upper_PI-lower_PI)+(2/(1-target_coverage))*(lower_PI-observed),
ifelse(observed > upper_PI,
(upper_PI-lower_PI)+(2/(1-target_coverage))*(observed-upper_PI),
upper_PI-lower_PI)))
```
::: {.callout-note}
We don't match the 95% prediction interval (PI95%) coverage, probably because of a violation of the [**exchangeability** assumption](https://en.wikipedia.org/wiki/Conformal_prediction#Theory), i.e., the train and test data might not be identically distributed. This might have happened because the test set does not come from the same geographical region and original spectral library, but the split conformal prediction method still does a decent job of deriving the uncertainty band. We need to use additional criteria to decide on the reliability of the final prediction intervals.
:::
In addition to the coverage statistic, another useful statistic to evaluate prediction intervals is the mean [Winkler Interval Score (WIS)](https://www.kaggle.com/datasets/carlmcbrideellis/winkler-interval-score-metric) [@Winkler1972].
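For a central prediction interval $[l, u]$ with nominal miscoverage $\alpha$ (here $\alpha = 1 - 0.95 = 0.05$) and an observation $y$, the interval score computed in the chunk above is:

$$
W_\alpha(l, u, y) =
\begin{cases}
(u - l) + \dfrac{2}{\alpha}\,(l - y), & y < l \\[4pt]
(u - l), & l \le y \le u \\[4pt]
(u - l) + \dfrac{2}{\alpha}\,(y - u), & y > u
\end{cases}
$$

The reported WIS is the mean of these scores over the test set; smaller values indicate narrower, better-calibrated intervals.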
```{r final_outputs, message=FALSE, warning=FALSE}
# Coverage statistics
final.outputs %>%
count(covered) %>%
mutate(perc = n/sum(n)*100)
# Final accuracy metrics in the original range
final.accuracy <- final.outputs %>%
summarise(n = n(),
rmse = rmse_vec(truth = observed, estimate = predicted),
bias = msd_vec(truth = observed, estimate = predicted),
rsq = rsq_trad_vec(truth = observed, estimate = predicted),
ccc = ccc_vec(truth = observed, estimate = predicted, bias = TRUE),
rpiq = rpiq_vec(truth = observed, estimate = predicted),
coverage = sum(covered)/n*100,
wis = mean(wis))
final.accuracy
# Final accuracy plot
performance.annotation <- paste0("Rsq = ", round(final.accuracy[[1,"rsq"]], 2),
"\nRMSE = ", round(final.accuracy[[1,"rmse"]], 2), " wt%",
"\nCoverage = ", round(final.accuracy[[1,"coverage"]], 2), "%",
"\nWIS = ", round(final.accuracy[[1,"wis"]], 2), " wt%")
p.final <- ggplot(final.outputs) +
geom_pointrange(aes(x = observed, y = predicted,
ymin = lower_PI, ymax = upper_PI,
color = covered)) +
geom_abline(intercept = 0, slope = 1) +
geom_text(aes(x = -Inf, y = Inf, hjust = -0.1, vjust = 1.2),
label = performance.annotation, size = 3) +
labs(title = "Soil Organic Carbon (wt%) test prediction",
x = "Observed", y = "Predicted", color = "PI95% covered:") +
theme_light() + theme(legend.position = "bottom")
# Making sure we have a square plot
r.max <- max(layer_scales(p.final)$x$range$range)
r.min <- min(layer_scales(p.final)$x$range$range)
s.max <- max(layer_scales(p.final)$y$range$range)
s.min <- min(layer_scales(p.final)$y$range$range)
t.max <- round(max(r.max,s.max),1)
t.min <- round(min(r.min,s.min),1)
p.final <- p.final + coord_equal(xlim = c(t.min,t.max), ylim = c(t.min,t.max))
p.final
```