Column mean_exposure doesn't exist. #1191

Open · mahaja1 opened this issue Dec 19, 2024 · 10 comments

mahaja1 commented Dec 19, 2024

Hey, I came across this issue in Robyn and it's not going away:

```r
## Run all trials and iterations. Use ?robyn_run to check parameter definition
OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = NULL, # NULL defaults to (max available - 1)
  iterations = 10000, # 2000 recommended for the dummy dataset with no calibration
  trials = 8, # 5 recommended for the dummy dataset
  ts_validation = TRUE, # 3-way-split time series for NRMSE validation.
  add_penalty_factor = FALSE # Experimental feature. Use with caution.
)
```

```
Input data has 42 weeks in total: 2023-08-14 to 2024-05-27
Initial model is built on rolling window of 42 week: 2023-08-14 to 2024-05-27
Time-series validation with train_size range of 50%-80% of the data...
Using geometric adstocking with 14 hyperparameters (14 to iterate + 0 fixed) on 15 cores

Starting 8 trials with 10000 iterations each using TwoPointsDE nevergrad algorithm...
Running trial 1 of 8
| | 0%Timing stopped at: 0.73 0.11 0.86
Error in { : task 1 failed - "Can't select columns that don't exist.
✖ Column mean_exposure doesn't exist."
```

I'd really appreciate if anyone can help me out with this one.

mahaja1 commented Dec 19, 2024

```
dt_input
# A tibble: 65 × 6
   DATE                 Call RESPONSIVE_SEARCH VIDEO_TRUEVIEW_IN_STREAM   FB Conversion
 1 2023-09-18 00:00:00     0              818.                     75.0 501.        220
 2 2023-09-25 00:00:00     0              805.                     74.2 450.        198
 3 2023-10-02 00:00:00     0              781.                     70.2 500.        211
 4 2023-10-09 00:00:00     0              802.                     55.9 509.        194
 5 2023-10-16 00:00:00 0.241              801.                     69.6 483.        240
 6 2023-10-23 00:00:00  2.87              797.                     102. 462.        206
 7 2023-10-30 00:00:00  18.9              713.                     64.4 460.        224
 8 2023-11-06 00:00:00  5.99              740.                     82.3 510.        238
 9 2023-11-13 00:00:00  2.86              818.                     72.0 519.        193
10 2023-11-20 00:00:00     0              718.                     81.3 481.        187
# ℹ 55 more rows
# ℹ Use `print(n = ...)` to see more rows
```

```r
## All sign control are now automatically provided: "positive" for media & organic
## variables and "default" for all others. User can still customise signs if necessary.
## Documentation is available, access it anytime by running: ?robyn_inputs
InputCollect <- robyn_inputs(
  dt_input = dt_input,
  dt_holidays = dt_prophet_holidays,
  date_var = "DATE", # date format must be "2020-01-01"
  dep_var = "Conversion", # there should be only one dependent variable
  dep_var_type = "conversion", # "revenue" (ROI) or "conversion" (CPA)
  prophet_vars = c("trend", "season", "holiday"), # "trend","season", "weekday" & "holiday"
  prophet_country = "US", # input country code. Check: dt_prophet_holidays
  paid_media_spends = c("Call", "RESPONSIVE_SEARCH", "VIDEO_TRUEVIEW_IN_STREAM", "FB"), # mandatory input
  paid_media_vars = c("Call", "RESPONSIVE_SEARCH", "VIDEO_TRUEVIEW_IN_STREAM", "FB"), # mandatory.
  # paid_media_vars must have same order as paid_media_spends. Use media exposure metrics like
  # impressions, GRP etc. If not applicable, use spend instead.
  # context_vars = c("events"),
  # factor_vars = c("events"), # force variables in context_vars or organic_vars to be categorical
  # window_start = "2021-05-03",
  # window_end = "2023-12-25",
  adstock = "geometric" # geometric, weibull_cdf or weibull_pdf.
)
```

```
Warning message:
In check_datadim(dt_input, all_ind_vars, rel = 10) :
  There are 7 independent variables & 65 data points. We recommend row:column ratio of 10 to 1
```

```
print(InputCollect)
Total Observations: 65 (weeks)
Input Table Columns (6):
Date: DATE
Dependent: Conversion [conversion]
Paid Media: Call, RESPONSIVE_SEARCH, VIDEO_TRUEVIEW_IN_STREAM, FB
Paid Media Spend: Call, RESPONSIVE_SEARCH, VIDEO_TRUEVIEW_IN_STREAM, FB
Context:
Organic:
Prophet (Auto-generated): trend, season, holiday on US
Unused variables: None

Date Range: 2023-09-18:2024-12-09
Model Window: 2023-09-18:2024-12-09 (65 weeks)
With Calibration: FALSE
Custom parameters: None

Adstock: geometric
Hyper-parameters: Not set yet
```

```
## Default media variable for modelling has changed from paid_media_vars to paid_media_spends.
## Also, calibration_input are required to be spend names.
## hyperparameter names are based on paid_media_spends names too. See right hyperparameter names:
hyper_names(adstock = InputCollect$adstock, all_media = InputCollect$all_media)
 [1] "Call_alphas"                     "Call_gammas"                     "Call_thetas"
 [4] "FB_alphas"                       "FB_gammas"                       "FB_thetas"
 [7] "RESPONSIVE_SEARCH_alphas"        "RESPONSIVE_SEARCH_gammas"        "RESPONSIVE_SEARCH_thetas"
[10] "VIDEO_TRUEVIEW_IN_STREAM_alphas" "VIDEO_TRUEVIEW_IN_STREAM_gammas" "VIDEO_TRUEVIEW_IN_STREAM_thetas"

## 1. IMPORTANT: set plot = TRUE to create example plots for adstock & saturation
## hyperparameters and their influence in curve transformation.
plot_adstock(plot = FALSE)
plot_saturation(plot = FALSE)

## 4. Set individual hyperparameter bounds. They either contain two values e.g. c(0, 0.5),
## or only one value, in which case you'd "fix" that hyperparameter.

## Run hyper_limits() to check maximum upper and lower bounds by range
hyper_limits()
  thetas alphas gammas shapes scales
1    >=0     >0     >0    >=0    >=0
2     <1    <10    <=1    <20    <=1
```
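The bounds printed by `hyper_limits()` can be checked mechanically before starting a long run. Below is a minimal base-R sketch, not part of Robyn: `within_limits()` is a hypothetical helper that hard-codes the geometric-adstock limits from the table above (thetas in [0, 1), alphas in (0, 10), gammas in (0, 1]) and returns `NA` for any other name, such as `train_size`.

```r
# Hypothetical helper, not a Robyn function. Bounds are copied from the
# hyper_limits() table above for geometric adstock:
#   thetas: >= 0 and < 1;  alphas: > 0 and < 10;  gammas: > 0 and <= 1
within_limits <- function(name, range) {
  # "FB_alphas" -> "alphas"; the greedy ".*_" strips everything up to the last "_"
  type <- sub(".*_", "", name)
  in_bounds <- switch(type,
    thetas = all(range >= 0 & range < 1),
    alphas = all(range > 0 & range < 10),
    gammas = all(range > 0 & range <= 1),
    NA # not a geometric adstock/saturation hyperparameter, so not checked here
  )
  if (is.na(in_bounds)) return(NA)
  # a range is c(lower, upper); a single value "fixes" the hyperparameter
  ordered <- length(range) == 1 || range[1] <= range[2]
  in_bounds && ordered
}

hp <- list(
  FB_alphas = c(0.5, 3),
  FB_gammas = c(0.3, 1),
  FB_thetas = c(0, 0.3),
  train_size = c(0.5, 0.8)
)
# TRUE for the three FB ranges, NA for train_size
vapply(names(hp), function(n) within_limits(n, hp[[n]]), logical(1))
```

Returning `NA` rather than `FALSE` for unrecognised names keeps "out of bounds" distinct from "not checked".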

```r
## Example hyperparameters ranges for Geometric adstock
hyperparameters <- list(
  FB_alphas = c(0.5, 3),
  FB_gammas = c(0.3, 1),
  FB_thetas = c(0, 0.3),
  Call_alphas = c(0.5, 3),
  Call_gammas = c(0.3, 1),
  Call_thetas = c(0, 0.3),
  RESPONSIVE_SEARCH_alphas = c(0.5, 3),
  RESPONSIVE_SEARCH_gammas = c(0.3, 1),
  RESPONSIVE_SEARCH_thetas = c(0, 0.3),
  VIDEO_TRUEVIEW_IN_STREAM_alphas = c(0.5, 3),
  VIDEO_TRUEVIEW_IN_STREAM_gammas = c(0.3, 1),
  VIDEO_TRUEVIEW_IN_STREAM_thetas = c(0, 0.3)
)

InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
```

```
Running feature engineering...
Warning message:
In check_hyperparameters(InputCollect$hyperparameters, InputCollect$adstock, :
  Automatically added missing hyperparameter range: 'train_size' = c(0.5, 0.8)
```
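A quick way to rule out a misspelt or missing hyperparameter name is to compare `names(hyperparameters)` with the vector `hyper_names()` printed above. The sketch below copies that printed vector as a literal so it runs without Robyn; with Robyn loaded, the `hyper_names()` call itself could be used instead. `train_size` is excluded from the "unexpected" check because `robyn_inputs()` fills it in automatically, as the warning shows.

```r
# Vector printed by hyper_names() above, copied as a literal so this runs without Robyn
expected <- c(
  "Call_alphas", "Call_gammas", "Call_thetas",
  "FB_alphas", "FB_gammas", "FB_thetas",
  "RESPONSIVE_SEARCH_alphas", "RESPONSIVE_SEARCH_gammas", "RESPONSIVE_SEARCH_thetas",
  "VIDEO_TRUEVIEW_IN_STREAM_alphas", "VIDEO_TRUEVIEW_IN_STREAM_gammas",
  "VIDEO_TRUEVIEW_IN_STREAM_thetas"
)

# The ranges passed to robyn_inputs() above
hyperparameters <- list(
  FB_alphas = c(0.5, 3), FB_gammas = c(0.3, 1), FB_thetas = c(0, 0.3),
  Call_alphas = c(0.5, 3), Call_gammas = c(0.3, 1), Call_thetas = c(0, 0.3),
  RESPONSIVE_SEARCH_alphas = c(0.5, 3), RESPONSIVE_SEARCH_gammas = c(0.3, 1),
  RESPONSIVE_SEARCH_thetas = c(0, 0.3),
  VIDEO_TRUEVIEW_IN_STREAM_alphas = c(0.5, 3),
  VIDEO_TRUEVIEW_IN_STREAM_gammas = c(0.3, 1),
  VIDEO_TRUEVIEW_IN_STREAM_thetas = c(0, 0.3)
)

missing_names <- setdiff(expected, names(hyperparameters))
unexpected_names <- setdiff(names(hyperparameters), c(expected, "train_size"))
missing_names    # character(0): every expected name has a range
unexpected_names # character(0): no misspelt names
```

Both come back empty for this configuration, which suggests the hyperparameter names are not what triggers the mean_exposure error.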
```
print(InputCollect)
Total Observations: 65 (weeks)
Input Table Columns (6):
Date: DATE
Dependent: Conversion [conversion]
Paid Media: Call, RESPONSIVE_SEARCH, VIDEO_TRUEVIEW_IN_STREAM, FB
Paid Media Spend: Call, RESPONSIVE_SEARCH, VIDEO_TRUEVIEW_IN_STREAM, FB
Context:
Organic:
Prophet (Auto-generated): trend, season, holiday on US
Unused variables: None

Date Range: 2023-09-18:2024-12-09
Model Window: 2023-09-18:2024-12-09 (65 weeks)
With Calibration: FALSE
Custom parameters: None

Adstock: geometric
Hyper-parameters ranges:
FB_alphas: [0.5, 3]
FB_gammas: [0.3, 1]
FB_thetas: [0, 0.3]
Call_alphas: [0.5, 3]
Call_gammas: [0.3, 1]
Call_thetas: [0, 0.3]
RESPONSIVE_SEARCH_alphas: [0.5, 3]
RESPONSIVE_SEARCH_gammas: [0.3, 1]
RESPONSIVE_SEARCH_thetas: [0, 0.3]
VIDEO_TRUEVIEW_IN_STREAM_alphas: [0.5, 3]
VIDEO_TRUEVIEW_IN_STREAM_gammas: [0.3, 1]
VIDEO_TRUEVIEW_IN_STREAM_thetas: [0, 0.3]
train_size: [0.5, 0.8]
```

```r
## Check spend exposure fit if available
if (length(InputCollect$exposure_vars) > 0) {
  lapply(InputCollect$modNLS$plots, plot)
}
```
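In this configuration `paid_media_vars` and `paid_media_spends` list the same columns, so no channel has a separate exposure metric (impressions, GRP, etc.). Assuming Robyn populates `exposure_vars` only for channels whose `paid_media_vars` entry differs from its `paid_media_spends` entry, the check above would be skipped entirely. This matches the setup ONalivajko later used to reproduce the error. A base-R sketch of that comparison, using the two vectors from the `robyn_inputs()` call:

```r
# Vectors from the robyn_inputs() call above (same order, as Robyn requires)
paid_media_spends <- c("Call", "RESPONSIVE_SEARCH", "VIDEO_TRUEVIEW_IN_STREAM", "FB")
paid_media_vars   <- c("Call", "RESPONSIVE_SEARCH", "VIDEO_TRUEVIEW_IN_STREAM", "FB")

# Channels whose exposure metric differs from their spend column
with_exposure <- paid_media_vars[paid_media_vars != paid_media_spends]
with_exposure         # character(0)
length(with_exposure) # 0: no channel would need a spend-exposure fit
```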

```r
## Run all trials and iterations. Use ?robyn_run to check parameter definition
OutputModels <- robyn_run(
  InputCollect = InputCollect, # feed in all model specification
  cores = NULL, # NULL defaults to (max available - 1)
  iterations = 5000, # 2000 recommended for the dummy dataset with no calibration
  trials = 5, # 5 recommended for the dummy dataset
  ts_validation = TRUE, # 3-way-split time series for NRMSE validation.
  add_penalty_factor = FALSE # Experimental feature. Use with caution.
)
```

```
Input data has 65 weeks in total: 2023-09-18 to 2024-12-09
Initial model is built on rolling window of 65 week: 2023-09-18 to 2024-12-09
Time-series validation with train_size range of 50%-80% of the data...
Using geometric adstocking with 14 hyperparameters (14 to iterate + 0 fixed) on 15 cores

Starting 5 trials with 5000 iterations each using TwoPointsDE nevergrad algorithm...
Running trial 1 of 5
| | 0%Timing stopped at: 0.88 0.05 0.94
Error in { : task 1 failed - "Can't select columns that don't exist.
✖ Column mean_exposure doesn't exist."
```

The code was running fine for me before but started giving me this problem today. Please help me with this issue, @gufengzhou.

mahaja1 commented Dec 19, 2024

I found the cause! You now need to have an organic variable column in the inputs as well for the model to run. This was not the case before; kindly look into it.

gufengzhou commented

Thanks! Will look into it. @laresbernardo would you have time for this one?

gufengzhou commented

@mahaja1 I can't reproduce your error. I can run the demo without organic variables using the latest version.

gufengzhou self-assigned this Dec 20, 2024

mahaja1 commented Dec 20, 2024

I am still getting it. What options do I have? When I ran the model with the traffic column, it assigned a large contribution to traffic. I want to run it without any organic variable.

mahaja1 commented Dec 20, 2024

I really need help with this. Please look into it.

gufengzhou commented

Which version are you using? You can check by running packageVersion("Robyn"). Your copy-pasted message above is quite messy; I might need your dataset as well as your robyn_inputs() configs to debug.

laresbernardo commented Dec 23, 2024

I wasn't able to reproduce the error either. If that's the case, @gufengzhou, changing a hard select(...) to select(any_of(...)) could potentially fix it. There are only four occurrences of mean_exposure in the code.
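The difference between a hard `select()` and `select(any_of(...))` can be seen on a toy table. This is only an illustration of dplyr behaviour, not Robyn's internal code: the tibble and its column names are made up, with `mean_exposure` deliberately absent.

```r
library(dplyr)

# Made-up table standing in for an internal summary that lacks a mean_exposure column
df <- tibble(channel = c("FB", "Call"), mean_spend = c(501, 0))

# A hard select() fails as soon as one named column is missing
err <- tryCatch(
  select(df, channel, mean_spend, mean_exposure),
  error = function(e) conditionMessage(e)
)
cat(err, "\n") # the message names mean_exposure, as in the reports in this thread

# any_of() keeps the columns that exist and silently skips the rest
safe <- select(df, any_of(c("channel", "mean_spend", "mean_exposure")))
names(safe) # "channel" "mean_spend"
```

The trade-off is that `any_of()` hides the missing column, so any later step that reads `mean_exposure` would still need to handle its absence.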

dhpz-11 commented Jan 12, 2025

Hi team,

I'm getting the same error. I tried some possible solutions suggested by Gemini, but they didn't work either.

Here's the code I'm using:

```r
# Parameters:
InputCollect <- robyn_inputs(
  dt_input = mmm,
  date_var = "Date",
  dep_var = "Sales",
  dep_var_type = "revenue",
  paid_media_spends = c("TikTok", "Facebook", "Google"),
  paid_media_vars = c("TikTok", "Facebook", "Google"),
  media_vars = NULL, # This was a recommendation from Gemini, in order to solve the error from the Robyn output
  window_start = "2018-01-07", # mandatory
  window_end = "2021-10-31", # mandatory
  adstock = "geometric" # mandatory
)
print(InputCollect)

# Hyperparameters:
hyper_names(adstock = InputCollect$adstock, all_media = InputCollect$all_media)
hyperparameters <- list(
  Facebook_alphas = c(0.5, 3),
  Facebook_gammas = c(0.3, 1),
  Facebook_thetas = c(0, 0.4),
  Google_alphas = c(0.5, 3),
  Google_gammas = c(0.3, 1),
  Google_thetas = c(0, 0.4),
  TikTok_alphas = c(0.5, 3),
  TikTok_gammas = c(0.3, 1),
  TikTok_thetas = c(0, 0.4),
  train_size = c(0.5, 0.8)
)
InputCollect <- robyn_inputs(InputCollect = InputCollect, hyperparameters = hyperparameters)
print(InputCollect)

# Robyn outputs:
Outputmodels <- robyn_run(
  InputCollect = InputCollect,
  cores = NULL,
  interations = 200, # misspelt: the robyn_run() argument is `iterations`
  trials = 5,
  ts_validation = TRUE,
  add_penalty_factor = FALSE
)
print(Outputmodels)
```

This is the error I'm getting:

```
Input data has 200 weeks in total: 2018-01-07 to 2021-10-31
Initial model is built on rolling window of 200 week: 2018-01-07 to 2021-10-31
Time-series validation with train_size range of 50%-80% of the data...
Using geometric adstocking with 11 hyperparameters (11 to iterate + 0 fixed) on 7 cores

Starting 5 trials with 2000 iterations each using TwoPointsDE nevergrad algorithm...
Running trial 1 of 5
| | 0%Timing stopped at: 0.425 0.533 0.25
Error in { : task 1 failed - "Can't select columns that don't exist.
✖ Column mean_exposure doesn't exist."
```

mmm.xlsx
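In the `robyn_run()` call above the argument is spelled `interations`, and the log reports 2000 iterations per trial rather than 200. A likely explanation (an inference, not confirmed in the thread) is that `robyn_run()` accepts `...`, so an unrecognised argument name is absorbed there without error and `iterations` keeps its default. The base-R sketch below reproduces that mechanism with a hypothetical stand-in: `run_sketch()` and its default of 2000 are assumptions chosen to mirror the log, and the message it prints is this sketch's own addition, not Robyn behaviour.

```r
# Hypothetical stand-in for a function that, like robyn_run(), has a `...` argument
run_sketch <- function(trials = 5, iterations = 2000, ...) {
  extra <- names(list(...))
  if (length(extra) > 0) {
    # R itself does not complain about unknown names caught by `...`
    message("Ignored arguments: ", paste(extra, collapse = ", "))
  }
  iterations
}

run_sketch(trials = 5, interations = 200) # returns 2000 and reports "interations" as ignored
run_sketch(trials = 5, iterations = 200)  # returns 200
```

This does not explain the mean_exposure error itself, but it does account for the 2000 in the log.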

ONalivajko commented

Hi!
@gufengzhou @laresbernardo
I have been able to reproduce this error with the demo dataset using Robyn 3.12.
To trigger it, keep only the paid media variables that have no separate exposure metric (i.e. every paid_media_vars entry is the same as its paid_media_spends entry):

[screenshot]

And, of course, leave out the organic variable.

[screenshot]

Also notice how Robyn counts hyperparameters in this case: we only have 9 parameters + train_size, but Robyn reports:

[screenshot]

Maybe it creates a placeholder hyperparameter for organic in the background?
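The logs posted earlier in this thread offer a way to test that guess. Neither mahaja1's nor dhpz-11's run had organic variables, yet each log reports exactly one more hyperparameter than the three geometric ranges per paid channel plus train_size. The arithmetic as a small R sketch:

```r
# Counts taken from the robyn_run() logs earlier in this thread (no organic variables in either run)
runs <- data.frame(
  report = c("mahaja1", "dhpz-11"),
  paid_channels = c(4, 3),
  logged = c(14, 11)
)
# 3 geometric ranges (alphas, gammas, thetas) per paid channel, plus train_size
runs$specified <- runs$paid_channels * 3 + 1
runs$logged - runs$specified # 1 1
```

One reading, not confirmed by the maintainers here, is that the extra slot is Robyn's regularisation parameter lambda, which Robyn's demo script describes as a hyperparameter that does not need to be specified manually, rather than a placeholder for organic variables.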
