Management of binary outcome and missing data using interaction terms in glm_weightit #74

Open
kgkirgkiris opened this issue Oct 20, 2024 · 3 comments

Comments

@kgkirgkiris

Hi Noah,

I want to start by expressing my sincere thanks, not only for this incredible package but also for everything you have done to make propensity score weighting and matching both accessible and easy to interpret. I have come across countless answers from you on StackExchange and GitHub, and I have learned so much from them. Your contribution has been invaluable. I am sure many others feel the same way. Thank you.

My questions concern estimating effects after weighting. I have a continuous treatment variable, several covariates, and a binary outcome. Can the proposed code for fitting the outcome model:

fit <- lm_weightit(Y_C ~ splines::ns(Ac, df = 4) *
                     (X1 + X2 + X3 + X4 + X5 +
                        X6 + X7 + X8 + X9),
                   data = d, weightit = W)

be modified as follows for the binary outcome (Y_B), so that the model yields "predicted probabilities of outcome" on a continuous scale?

fit <- glm_weightit(Y_B ~ splines::ns(Ac, df = 4) *
                      (X1 + X2 + X3 + X4 + X5 +
                         X6 + X7 + X8 + X9),
                    data = d, weightit = W, family = binomial)

Is the above modification enough to continue with the rest of the analysis, or is my approach wrong?

I also run into a problem when using the default "ind" way of dealing with missing data: when I include the interaction term in my outcome model, I get this error:

Warning: (from glm()) glm.fit: fitted probabilities numerically 0 or 1 occurred
Error in cbind(psi_out(Bout, w, Y, Xout, SW, offset), psi_treat(Btreat, :
number of rows of matrices must match (see arg 2)

This also happens when I use a binary treatment variable. The error does not seem to occur when I remove the interaction term along with the covariates.

Thank you in advance for your time and support.
I am genuinely looking forward to your response.
I would also like to apologize if any of my questions come across as overly basic or elementary.

Kind regards,
Kostas

@kgkirgkiris changed the title from "Management of binary outcome and missing data using interaction terms in the glm_weightit" to "Management of binary outcome and missing data using interaction terms in glm_weightit" on Oct 20, 2024
@ngreifer
Owner

Hi Kostas,

Thank you so much for the kind words about my packages and writing! I'm glad they have been helpful.

Your modification for the binary outcome is correct. Note that your confidence intervals might be outside of [0, 1]; there are ways to prevent this but they are a bit involved, so let me know if that's an issue for you.

Unfortunately, I have not thoroughly tested the performance of glm_weightit() with missing data. Because it calls glm(), it just deletes any missing data, which causes the problems you observed. You should not include any covariate with missingness in the outcome model. Even if that covariate is not part of the interaction, it will still cause your observations to be dropped, which may not be apparent in the output.
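As a minimal sketch of how one might check for this before fitting, the base R code below counts missing values in each model variable and compares the number of rows in the data with the number glm() would keep under its default na.action. The data frame and its values are hypothetical stand-ins for `d`; only the variable names echo the example above.

```r
# Toy data standing in for the real dataset `d` (hypothetical values).
set.seed(1)
d <- data.frame(
  Y_B = rbinom(20, 1, 0.3),
  Ac  = rnorm(20),
  X1  = rnorm(20),
  X2  = c(NA, rnorm(19))  # one missing value, for illustration
)

# Variables that would enter the outcome model.
model_vars <- c("Y_B", "Ac", "X1", "X2")

# Count missing values in each model variable.
n_missing <- colSums(is.na(d[model_vars]))
print(n_missing)

# Rows that glm() would keep when incomplete rows are omitted
# (the default na.action, na.omit).
n_kept <- sum(complete.cases(d[model_vars]))
cat("Rows in data:", nrow(d), " Rows kept by glm():", n_kept, "\n")
```

If the two row counts differ, at least one model variable has missing values, and the outcome model would be fit to fewer rows than the weights were estimated on.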

Noah

@kgkirgkiris
Author

Thank you very much for your kind and prompt response, and for your helpful insights.

Regarding the confidence intervals, your guidance on how to prevent them from falling outside the [0,1] range would be really helpful, especially since my dataset contains small percentages. I would appreciate any advice or methods you could share for addressing this issue.

Kostas

@ngreifer
Owner

The code will look a bit esoteric, but here is how you would do it:

p <- avg_predictions(fit,
                     variables = list(Ac = values),
                     byfun = function(...) qnorm(mean(...)),
                     transform = pnorm)

This first puts the average predicted probabilities on an unbounded scale, on which the standard errors and confidence intervals are estimated, and then transforms the estimates and confidence intervals back to the probability scale. You can replace qnorm() and pnorm() with qlogis() and plogis(), respectively. This may be a bit foreign to some audiences, but it has the nice feature of ensuring the confidence intervals stay within [0, 1]. The intervals are symmetric around the estimates on the unbounded scale rather than on the probability scale. Otherwise, the estimates should be identical and the confidence intervals have the usual interpretation.
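To see why the back-transformed interval stays inside [0, 1], here is a rough worked illustration in base R for a single small probability. The estimate and standard error are made-up numbers, and the delta-method step is an assumption about the general mechanics, not a description of what avg_predictions() does internally.

```r
# Hypothetical numbers for illustration only (not from any real analysis).
p_hat <- 0.02   # estimated average predicted probability
se_p  <- 0.015  # its standard error on the probability scale
z     <- qnorm(0.975)

# Usual Wald interval on the probability scale: the lower limit can fall below 0.
ci_naive <- p_hat + c(-1, 1) * z * se_p

# Delta method: the SE of qnorm(p_hat) is se_p / dnorm(qnorm(p_hat)),
# because the derivative of qnorm(p) with respect to p is 1 / dnorm(qnorm(p)).
est_z <- qnorm(p_hat)
se_z  <- se_p / dnorm(est_z)

# Symmetric interval on the unbounded scale, then back-transform with pnorm().
ci_bounded <- pnorm(est_z + c(-1, 1) * z * se_z)

print(round(ci_naive, 4))
print(round(ci_bounded, 4))
```

With these numbers, the naive interval has a negative lower limit, while the back-transformed interval lies strictly between 0 and 1 and is asymmetric around the estimate on the probability scale.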
