
Discrepancy between weightit() and glm() #73

Open
jeffbone opened this issue Oct 1, 2024 · 3 comments

Comments

jeffbone commented Oct 1, 2024

Hi Noah,

Love your R packages, and thankful for all the work you've put into making them usable and well documented.

I have what I hope is a simple question. Apologies that I cannot reproduce the example explicitly here, as the data is restricted. I am trying to run weightit() as follows:

weightit(treatment ~ covariates, method = 'ps', link = 'logit', estimand = 'ATE', data = my_data)

I get the following warnings: glm.fit: algorithm did not converge, and glm.fit: fitted probabilities numerically 0 or 1 occurred.

I expected this was due to sparsity in some covariates within one or both levels of the treatment, but on inspection of the raw data this does not seem to be the case, and when I run:

lr_mod <- glm(treatment ~ covariates, family = 'binomial', data = my_data)

and generate the propensity scores by hand with predict(lr_mod, type = 'response'), I get no convergence warnings and a reasonable-looking distribution of propensity scores. I can easily take these values and do the PS weighting myself by hand, but I am curious whether you have any idea what is causing the discrepancy between weightit() and glm().
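For concreteness, the by-hand weighting described above might look like the following sketch. Here `treatment ~ covariates` and `my_data` are the placeholders from this post, not real objects:

```r
# Sketch of the manual PS weighting described above (base R only).
# `treatment`, `covariates`, and `my_data` are placeholders from the post.
lr_mod <- glm(treatment ~ covariates, family = binomial, data = my_data)
ps <- predict(lr_mod, type = "response")

# ATE inverse-probability weights: 1/ps for treated units, 1/(1 - ps) for controls
w <- ifelse(my_data$treatment == 1, 1 / ps, 1 / (1 - ps))
my_data$ipw <- w
```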

Any thoughts much appreciated!

ngreifer (Owner) commented Oct 1, 2024

Thank you for the kind words! Is there any way you could provide the dataset for me to examine this, or can you replicate it using lalonde? Also, is there missingness in the data? (weightit() treats this differently from glm().)
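A replication attempt along those lines might look something like this (a sketch; it assumes the lalonde dataset and column names shipped with the cobalt package):

```r
# Sketch of a lalonde-based reproduction attempt; the formula uses the
# covariates present in cobalt's lalonde dataset.
library(WeightIt)
data("lalonde", package = "cobalt")

W <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
              data = lalonde, method = "ps", estimand = "ATE")
summary(W)
```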

jeffbone (Author) commented Oct 1, 2024

Thanks for the lightning quick response!

I had also thought missingness might be an issue, but I get the same warnings with a complete-case analysis.

Sorry, I realize the lack of data makes this more annoying on your end. I will take a stab at reproducing it with lalonde later today, though I haven't had much success so far (I played around with it a bit before posting).

Just to confirm: my understanding of my call to weightit() above is that it's essentially a wrapper for glm(..., family = 'binomial'), followed by predict() to get the propensity scores, and then binding the resulting weights back to the original data. Are there any differences I'm unaware of that could lead to a convergence issue in one case and not the other, and that I could try exploring?

ngreifer (Owner) commented Oct 1, 2024

weightit() does a small amount of pre-processing, which includes removing collinear variables and rescaling variables to put them on the same scale. That should only improve things, not make them worse. When you ran the complete-case analysis, did you also try it with glm()? By default, weightit() handles missing data by creating missingness indicators; if those perfectly predict treatment, they could cause the issue you are seeing. glm() simply drops cases with missing values.
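To illustrate the missing-data difference described above, here is a toy sketch (not WeightIt's actual internals; the fill value for the NAs is a placeholder assumption):

```r
# Toy illustration: missingness in x confined to the treated group.
set.seed(1)
d <- data.frame(treat = rep(0:1, each = 50), x = rnorm(100))
d$x[d$treat == 1][1:10] <- NA   # NAs occur only among treated units

# glm()'s default (listwise deletion) drops the 10 incomplete rows and fits fine:
fit_cc <- glm(treat ~ x, data = d, family = binomial)

# A missingness-indicator approach keeps all rows. Since x_miss == 1 occurs
# only when treat == 1, the indicator (quasi-)separates the outcome, and
# glm.fit will typically warn about fitted probabilities of 0 or 1:
d$x_miss <- as.numeric(is.na(d$x))
d$x[is.na(d$x)] <- 0            # placeholder fill; WeightIt's actual fill may differ
fit_mi <- glm(treat ~ x + x_miss, data = d, family = binomial)
```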

Are you able to reproduce the problem with a small subset of your data and send me an anonymized version (e.g., with variable labels removed, factor levels recoded, and numeric variables rescaled)? That way I could take a look at what's going on. Otherwise I can only speculate.
