Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

love.plot errors when not all clusters include all treatment levels; is there a way to get plots for just the clusters with complete data? #70

Open
eipi10 opened this issue Aug 11, 2023 · 2 comments

Comments

@eipi10
Copy link

eipi10 commented Aug 11, 2023

I'm doing an evaluation of an education intervention with observational data. After matching on the propensity score (with exact matching by gender and nearest-neighbor matching on other covariates), love.plot without clustering works as expected. However, love.plot clustered by gender (love.plot(m.out, abs=TRUE, cluster="gender")) fails with Error in `love.plot()`: ! Not all treatment levels are present in all clusters.

This appears to happen because no students with gender="Nonbinary" were in the treatment group and therefore no Nonbinary students are among the matched controls. I get the same error even if I use the which.cluster argument to include only "Male" and "Female" genders in the plot. A reproducible example is below.

Is there a way to get love.plot to provide plots for all clusters in which both treatment groups appear, rather than erroring? I realize I could exclude Nonbinary students from the data provided to matchit, but the workflow would be more straightforward if I could carry the full dataset through the entire analysis.

library(tidyverse)
library(MatchIt)
library(cobalt)

# Fake data
set.seed(594)
n = 200
dd = tibble(
  ids = 1:n,
  gender = sample(c("Male","Female","Nonbinary"), n, replace=TRUE, 
                  prob=c(0.55, 0.43, 0.02)),
  gpa = rnorm(n, 3.3, 0.5),
  units = sample(9:15, n, replace=TRUE),
  mother.ed = sample(c("High School", "College", "Graduate school"), n, 
                     replace=TRUE, prob=c(0.6,0.3,0.1)),
  treat = factor(sample(c("Treated", "Not treated"), n, replace=TRUE, prob=c(0.2,0.8)),
                 levels=c("Not treated", "Treated"))
)

# Check membership
dd %>% count(gender, treat)
#> # A tibble: 5 × 3
#>   gender    treat           n
#>   <chr>     <fct>       <int>
#> 1 Female    Not treated    64
#> 2 Female    Treated        21
#> 3 Male      Not treated    79
#> 4 Male      Treated        30
#> 5 Nonbinary Not treated     6

m.out = matchit(treat ~ gender + gpa + units + mother.ed, data=dd,
                method="nearest",
                exact=c("gender"),
                distance="glm", ratio=1)

m.out$model$data %>% 
  full_join(match.data(m.out)) %>% 
  group_by(weights, gender, treat) %>% 
  tally()
#> Joining with `by = join_by(ids, gender, gpa, units, mother.ed, treat)`
#> # A tibble: 7 × 4
#> # Groups:   weights, gender [5]
#>   weights gender    treat           n
#>     <dbl> <chr>     <fct>       <int>
#> 1       1 Female    Not treated    21
#> 2       1 Female    Treated        21
#> 3       1 Male      Not treated    30
#> 4       1 Male      Treated        30
#> 5      NA Female    Not treated    43
#> 6      NA Male      Not treated    49
#> 7      NA Nonbinary Not treated     6

# love.plot works
love.plot(m.out, abs=TRUE) 
#> Warning: Standardized mean differences and raw mean differences are present in the same plot. 
#> Use the `stars` argument to distinguish between them and appropriately label the x-axis.

# love.plot fails (presumably because gender="Nonbinary" appears only in "Not treated" treatment group)
love.plot(m.out, abs=TRUE, cluster="gender")
#> Error in `love.plot()`:
#> ! Not all treatment levels are present in all clusters.
#> Backtrace:
#>     ▆
#>  1. ├─cobalt::bal.tab(...)
#>  2. └─cobalt:::bal.tab.matchit(...)
#>  3.   ├─base::do.call("x2base.matchit", c(list(x), args), quote = TRUE)
#>  4.   └─cobalt:::x2base.matchit(...)
#>  5.     └─cobalt:::.cluster_check(cluster, treat)
#>  6.       └─cobalt:::.err("not all treatment levels are present in all clusters")
#>  7.         └─chk::err(..., call = pkg_caller_call(start = 2))
#>  8.           └─rlang::abort(msg, class = class, !!!args[named], call = call)

# love.plot fails even with "Nonbinary" not included in which.cluster
love.plot(m.out, abs=TRUE, cluster="gender", which.cluster=c("Male","Female"), drop.missing=TRUE)
#> Error in `love.plot()`:
#> ! Not all treatment levels are present in all clusters.
#> Backtrace:
#>     ▆
#>  1. ├─cobalt::bal.tab(...)
#>  2. └─cobalt:::bal.tab.matchit(...)
#>  3.   ├─base::do.call("x2base.matchit", c(list(x), args), quote = TRUE)
#>  4.   └─cobalt:::x2base.matchit(...)
#>  5.     └─cobalt:::.cluster_check(cluster, treat)
#>  6.       └─cobalt:::.err("not all treatment levels are present in all clusters")
#>  7.         └─chk::err(..., call = pkg_caller_call(start = 2))
#>  8.           └─rlang::abort(msg, class = class, !!!args[named], call = call)

Created on 2023-08-11 with reprex v2.0.2

@ngreifer
Copy link
Owner

Thanks for the bug report and sorry for the confusion. I'm working on a fix for this. The ideal fix would be to be able to use the subset argument to subset to only cases where gender != "Nonbinary". Normally this would look like

love.plot(m.out, abs=TRUE, cluster="gender", subset = dd$gender != "Nonbinary")

However, the function to check to make sure all treatment levels are represented in each cluster is run before the subsetting. So I am working on a fix for that.

Statistically, you need to exclude those with gender == "Nonbinary" from your original sample. They all have a propensity score of 0 and cannot be matched. You should remove them as a pre-processing step before running the matching. They don't contribute to estimating the propensity score and are excluded from the matching anyway. So my solution to you is to do that step before running matchit() or love.plot().

@eipi10
Copy link
Author

eipi10 commented Aug 11, 2023

Thanks Noah. I'd actually missed the subset argument when I was playing around with this. Good to know about that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants