
Discrepancy between weightit() and glm() #73

Open
jeffbone opened this issue Oct 1, 2024 · 3 comments

Comments

jeffbone commented Oct 1, 2024

Hi Noah,

Love your R packages, and thankful for all the work you've put into making them usable and well documented.

I have what I hope is a simple question. Apologies that I cannot reproduce the example explicitly here, as the data is restricted. I am trying to run weightit() as follows:

weightit(treatment ~ covariates, method = 'ps', link = 'logit', estimand = 'ATE', data = my_data)

I get the following warnings: glm.fit: algorithm did not converge, and glm.fit: fitted probabilities numerically 0 or 1 occurred.

I expected this was due to sparsity in some covariates within one or both levels of the treatment, but on inspection of the raw data this does not seem to be the case, and when I run:

lr_mod <- glm(treatment ~ covariates, family = 'binomial', data = my_data)

and generate the propensity scores by hand with predict(lr_mod, type = 'response'), I get no convergence warnings and a reasonable-looking distribution of propensity scores. I can easily take these values and do the PS weighting myself by hand, but I am curious whether you have any idea what is causing the discrepancy between weightit() and glm().
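For concreteness, the by-hand weighting described above might look like the following sketch. Here `treatment ~ covariates` and `my_data` are the placeholders from this post, not real objects:

```r
# Sketch of the manual PS weighting described above (base R only).
# `treatment`, `covariates`, and `my_data` are placeholders from the post.
lr_mod <- glm(treatment ~ covariates, family = binomial, data = my_data)
ps <- predict(lr_mod, type = "response")

# ATE inverse-probability weights: 1/ps for treated units, 1/(1 - ps) for controls
w <- ifelse(my_data$treatment == 1, 1 / ps, 1 / (1 - ps))
my_data$ipw <- w
```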

Any thoughts much appreciated!

ngreifer (Owner) commented Oct 1, 2024

Thank you for the kind words! Is there any way you could provide the dataset for me to examine this, or can you replicate it using lalonde? Also, is there missingness in the data? (weightit() treats this differently from glm().)
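A replication attempt along those lines might look something like this (a sketch; it assumes the lalonde dataset and column names shipped with the cobalt package):

```r
# Sketch of a lalonde-based reproduction attempt; the formula uses the
# covariates present in cobalt's lalonde dataset.
library(WeightIt)
data("lalonde", package = "cobalt")

W <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
              data = lalonde, method = "ps", estimand = "ATE")
summary(W)
```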

jeffbone (Author) commented Oct 1, 2024

Thanks for the lightning quick response!

I had also thought missingness might be an issue, but I get the same warnings with a complete-case analysis.

Sorry, I realize the lack of data makes this more annoying on your end. I will take a stab at reproducing it with lalonde later today, though I haven't had much success so far (I played around with it a bit before posting).

Just to confirm: my understanding of my call to weightit() above is that it's essentially a wrapper for glm(..., family = 'binomial'), followed by predict() to get the propensity scores, and then binding the resulting weights back to the original data. Are there any differences I'm unaware of that could lead to a convergence issue in one case and not the other, and that I could try exploring?

ngreifer (Owner) commented Oct 1, 2024

weightit() does a small amount of pre-processing, which includes removing collinear variables and rescaling variables to put them on the same scale. That should only improve things, not make them worse. When you ran the complete-case analysis, did you also try it with glm()? By default, weightit() handles missing data by creating missingness indicators; if those perfectly predict treatment, they could cause the issue you are seeing. glm() simply drops cases with missing values.
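To illustrate the missing-data difference described above, here is a toy sketch (not WeightIt's actual internals; the fill value for the NAs is a placeholder assumption):

```r
# Toy illustration: missingness in x confined to the treated group.
set.seed(1)
d <- data.frame(treat = rep(0:1, each = 50), x = rnorm(100))
d$x[d$treat == 1][1:10] <- NA   # NAs occur only among treated units

# glm()'s default (listwise deletion) drops the 10 incomplete rows and fits fine:
fit_cc <- glm(treat ~ x, data = d, family = binomial)

# A missingness-indicator approach keeps all rows. Since x_miss == 1 occurs
# only when treat == 1, the indicator (quasi-)separates the outcome, and
# glm.fit will typically warn about fitted probabilities of 0 or 1:
d$x_miss <- as.numeric(is.na(d$x))
d$x[is.na(d$x)] <- 0            # placeholder fill; WeightIt's actual fill may differ
fit_mi <- glm(treat ~ x + x_miss, data = d, family = binomial)
```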

Are you able to reproduce the problem with a small subset of your data and send me an anonymized version (e.g., with variable labels removed, factor levels recoded, and numeric variables rescaled)? That way I could take a look at what's going on. Otherwise I can only speculate.
