You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This request is to allow the user to specify not only unit-level weights in a brms formula, but group-level weights as well. This would extend the current syntax from this:
# Specify a formula with unit-level weightsy| weights(w) ~x+ (1|g)
to maybe something like this
# Specify a formula with group-level weights as welly| weights(w) ~x+ ( 1|g| group_weights(w_g) )
where $w$ is a unit-level weight variable with potentially $n$ distinct values, and $w_g$ is a group-level weight variable with one value per group.
I don't know what the best syntax for this functionality would be, but I hope this illustrates the idea well enough.
The motivation for this is to make brms able to fit survey-weighted multilevel models: Savitsky and Williams (2022) showed that only weighting the unit-level likelihood doesn't produce reasonable estimates for multilevel models, but adding group-level weights to the random effects' priors works well.
I think would be a valuable addition to brms for many folks who want to analyze data from health and social surveys. I would be happy to work on implementing this if you are open to it and have a preferred way for how this would look.
Example Data
Below is an example dataset of size $n=6$ with $M=3$ groups. The variable w is a unit level weight, and the variable w_g is a group-level weight.
// generated with brms 2.22.0functions {
}
data {
int<lower=1> N; // total number of observationsvector[N] Y; // response variablevector<lower=0>[N] weights; // model weightsint<lower=1> K; // number of population-level effectsmatrix[N, K] X; // population-level design matrixint<lower=1> Kc; // number of population-level effects after centering// data for group-level effects of ID 1int<lower=1> N_1; // number of grouping levelsint<lower=1> M_1; // number of coefficients per levelarray[N] int<lower=1> J_1; // grouping indicator per observation// group-level predictor valuesvector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an interceptvector[Kc] means_X; // column means of X before centeringfor (i in2:K) {
means_X[i -1] =mean(X[, i]);
Xc[, i -1] = X[, i] - means_X[i -1];
}
}
parameters {
vector[Kc] b; // regression coefficientsreal Intercept; // temporary intercept for centered predictorsreal<lower=0> sigma; // dispersion parametervector<lower=0>[M_1] sd_1; // group-level standard deviationsarray[M_1] vector[N_1] z_1; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effectsreal lprior =0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
lprior +=student_t_lpdf(Intercept | 3, 0.2, 2.5);
lprior +=student_t_lpdf(sigma | 3, 0, 2.5)
-1*student_t_lccdf(0 | 3, 0, 2.5);
lprior +=student_t_lpdf(sd_1 | 3, 0, 2.5)
-1*student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constantsif (!prior_only) {
// initialize linear predictor termvector[N] mu =rep_vector(0.0, N);
mu += Intercept + Xc * b;
for (n in1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
for (n in1:N) {
target+= weights[n] * (normal_lpdf(Y[n] | mu[n], sigma));
}
}
// priors including constantstarget+= lprior;
target+=std_normal_lpdf(z_1[1]);
}
generated quantities {
// actual population-level interceptreal b_Intercept = Intercept -dot_product(means_X, b);
}
Details
The context that motivates this request is the situation where the analyst has data from a complex survey sample with sample weights and they want to fit a model with group-level parameters (i.e., random intercepts or other group-varying parameters).
The current formula syntax allows users to specify unit-level weights that go into the model likelihood at the unit level, using the syntax
y| weights(w) ~x+ (1|g)
The key part of the generated Stan code that uses the weights is highlighted below:
model {
...
for (n in1:N) {
target+= weights[n] * (normal_lpdf(Y[n] | mu[n], sigma));
}
}
...
}
When there are group-varying parameters in the model, such as random intercepts, Savitsky and Williams (2022) showed that the above approach (which they call "single weighting") is insufficient: it tends to produce substantial bias in estimation of the variance parameters and the group-varying parameters.
They showed though that reasonable estimates can be obtained with a strategy they called "double weighting", which applies weights at the individual level and also at the group level. In other words, this approach weights the likelihood as well as the prior on the random effects, where the weights for the prior are group-level weights.
Below is what this could look like in the Stan code:
model {
...
for (n in1:N) {
target+= weights[n] * (normal_lpdf(Y[n] | mu[n], sigma));
}
}
// priors including constantstarget+= lprior;
for (n_1 in1:N_1) {
target+= group_weights[1][n_1] *std_normal_lpdf(z_1[1][n_1]);
}
}
The text was updated successfully, but these errors were encountered:
I like this idea. My suggestion for syntax would be
y| weights(w) ~x+ ( 1| gr(g, weights=w_g) )
The gr() function already exists and allows us to customize random effects, so it is a good place for this feature. Before anyone starts working on this feature, could you check if the weights option in the multi-membership terms (?mm) may already cover your use-case?
Thanks, Paul. I think that syntax using gr(g, weights = w_g) is quite clear and would be a good place for this.
I inspected the mm() implementation and unfortunately I don't think it can be used to implement this kind of weighting, for two reasons. The first reason is that the mm() function throws an error message when only one grouping variable is supplied, as it expects to have multiple grouping variables to define the multiple memberships. The second and more important reason is that the weights applied by mm() ultimately don't go into the necessary part of the model: mm() weights go into the linear predictor mu rather than affecting the prior for a random effect. So unfortunately mm() can't be hacked to implement this.
Summary
This request is to allow the user to specify not only unit-level weights in a brms formula, but group-level weights as well. This would extend the current syntax from this:
to maybe something like this
where$w$ is a unit-level weight variable with potentially $n$ distinct values, and $w_g$ is a group-level weight variable with one value per group.
I don't know what the best syntax for this functionality would be, but I hope this illustrates the idea well enough.
The motivation for this is to make
brms
able to fit survey-weighted multilevel models: Savitsky and Williams (2022) showed that only weighting the unit-level likelihood doesn't produce reasonable estimates for multilevel models, but adding group-level weights to the random effects' priors works well.I think would be a valuable addition to
brms
for many folks who want to analyze data from health and social surveys. I would be happy to work on implementing this if you are open to it and have a preferred way for how this would look.Example Data
Below is an example dataset of size$n=6$ with $M=3$ groups. The variable
w
is a unit level weight, and the variablew_g
is a group-level weight.We want to estimate a simple varying-intercepts model of the form
y ~ x + (1 | g)
.Below is Stan code that would be generated for this model:
Click to show generated Stan code
Details
The context that motivates this request is the situation where the analyst has data from a complex survey sample with sample weights and they want to fit a model with group-level parameters (i.e., random intercepts or other group-varying parameters).
The current formula syntax allows users to specify unit-level weights that go into the model likelihood at the unit level, using the syntax
The key part of the generated Stan code that uses the weights is highlighted below:
When there are group-varying parameters in the model, such as random intercepts, Savitsky and Williams (2022) showed that the above approach (which they call "single weighting") is insufficient: it tends to produce substantial bias in estimation of the variance parameters and the group-varying parameters.
They showed though that reasonable estimates can be obtained with a strategy they called "double weighting", which applies weights at the individual level and also at the group level. In other words, this approach weights the likelihood as well as the prior on the random effects, where the weights for the prior are group-level weights.
Below is what this could look like in the Stan code:
The text was updated successfully, but these errors were encountered: