love.plot errors when not all clusters include all treatment levels; is there a way to get plots for just the clusters with complete data? #70

eipi10 · 2023-08-11T17:14:50Z

I'm doing an evaluation of an education intervention with observational data. After matching on the propensity score (with exact matching by gender and nearest-neighbor matching on other covariates), love.plot without clustering works as expected. However, love.plot clustered by gender (love.plot(m.out, abs=TRUE, cluster="gender")) fails with Error in `love.plot()`: ! Not all treatment levels are present in all clusters.

This appears to happen because no students with gender="Nonbinary" were in the treatment group and therefore no Nonbinary students are among the matched controls. I get the same error even if I use the which.cluster argument to include only "Male" and "Female" genders in the plot. A reproducible example is below.

Is there a way to get love.plot to provide plots for all clusters in which both treatment groups appear, rather than erroring? I realize I could exclude Nonbinary students from the data provided to matchit, but the workflow would be more straightforward if I could carry the full dataset through the entire analysis.

library(tidyverse)
library(MatchIt)
library(cobalt)

# Fake data
set.seed(594)
n = 200
dd = tibble(
  ids = 1:n,
  gender = sample(c("Male","Female","Nonbinary"), n, replace=TRUE, 
                  prob=c(0.55, 0.43, 0.02)),
  gpa = rnorm(n, 3.3, 0.5),
  units = sample(9:15, n, replace=TRUE),
  mother.ed = sample(c("High School", "College", "Graduate school"), n, 
                     replace=TRUE, prob=c(0.6,0.3,0.1)),
  treat = factor(sample(c("Treated", "Not treated"), n, replace=TRUE, prob=c(0.2,0.8)),
                 levels=c("Not treated", "Treated"))
)

# Check membership
dd %>% count(gender, treat)
#> # A tibble: 5 × 3
#>   gender    treat           n
#>   <chr>     <fct>       <int>
#> 1 Female    Not treated    64
#> 2 Female    Treated        21
#> 3 Male      Not treated    79
#> 4 Male      Treated        30
#> 5 Nonbinary Not treated     6

m.out = matchit(treat ~ gender + gpa + units + mother.ed, data=dd,
                method="nearest",
                exact=c("gender"),
                distance="glm", ratio=1)

m.out$model$data %>% 
  full_join(match.data(m.out)) %>% 
  group_by(weights, gender, treat) %>% 
  tally()
#> Joining with `by = join_by(ids, gender, gpa, units, mother.ed, treat)`
#> # A tibble: 7 × 4
#> # Groups:   weights, gender [5]
#>   weights gender    treat           n
#>     <dbl> <chr>     <fct>       <int>
#> 1       1 Female    Not treated    21
#> 2       1 Female    Treated        21
#> 3       1 Male      Not treated    30
#> 4       1 Male      Treated        30
#> 5      NA Female    Not treated    43
#> 6      NA Male      Not treated    49
#> 7      NA Nonbinary Not treated     6

# love.plot works
love.plot(m.out, abs=TRUE) 
#> Warning: Standardized mean differences and raw mean differences are present in the same plot. 
#> Use the `stars` argument to distinguish between them and appropriately label the x-axis.

# love.plot fails (presumably because gender="Nonbinary" appears only in "Not treated" treatment group)
love.plot(m.out, abs=TRUE, cluster="gender")
#> Error in `love.plot()`:
#> ! Not all treatment levels are present in all clusters.
#> Backtrace:
#>     ▆
#>  1. ├─cobalt::bal.tab(...)
#>  2. └─cobalt:::bal.tab.matchit(...)
#>  3.   ├─base::do.call("x2base.matchit", c(list(x), args), quote = TRUE)
#>  4.   └─cobalt:::x2base.matchit(...)
#>  5.     └─cobalt:::.cluster_check(cluster, treat)
#>  6.       └─cobalt:::.err("not all treatment levels are present in all clusters")
#>  7.         └─chk::err(..., call = pkg_caller_call(start = 2))
#>  8.           └─rlang::abort(msg, class = class, !!!args[named], call = call)

# love.plot fails even with "Nonbinary" not included in which.cluster
love.plot(m.out, abs=TRUE, cluster="gender", which.cluster=c("Male","Female"), drop.missing=TRUE)
#> Error in `love.plot()`:
#> ! Not all treatment levels are present in all clusters.
#> Backtrace:
#>     ▆
#>  1. ├─cobalt::bal.tab(...)
#>  2. └─cobalt:::bal.tab.matchit(...)
#>  3.   ├─base::do.call("x2base.matchit", c(list(x), args), quote = TRUE)
#>  4.   └─cobalt:::x2base.matchit(...)
#>  5.     └─cobalt:::.cluster_check(cluster, treat)
#>  6.       └─cobalt:::.err("not all treatment levels are present in all clusters")
#>  7.         └─chk::err(..., call = pkg_caller_call(start = 2))
#>  8.           └─rlang::abort(msg, class = class, !!!args[named], call = call)

^{Created on 2023-08-11 with reprex v2.0.2}

The text was updated successfully, but these errors were encountered:

ngreifer · 2023-08-11T18:18:54Z

Thanks for the bug report and sorry for the confusion. I'm working on a fix for this. The ideal fix would be to be able to use the subset argument to subset to only cases where gender != "Nonbinary". Normally this would look like

love.plot(m.out, abs=TRUE, cluster="gender", subset = dd$gender != "Nonbinary")

However, the function to check to make sure all treatment levels are represented in each cluster is run before the subsetting. So I am working on a fix for that.

Statistically, you need to exclude those with gender == "Nonbinary" from your original sample. They all have a propensity score of 0 and cannot be matched. You should remove them as a pre-processing step before running the matching. They don't contribute to estimating the propensity score and are excluded from the matching anyway. So my solution to you is to do that step before running matchit() or love.plot().

eipi10 · 2023-08-11T18:45:10Z

Thanks Noah. I'd actually missed the subset argument when I was playing around with this. Good to know about that.

ngreifer added a commit that referenced this issue Nov 20, 2023

Cleaning, cluster processing now occurs after subsetting (#70)

7006d90

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

love.plot errors when not all clusters include all treatment levels; is there a way to get plots for just the clusters with complete data? #70

love.plot errors when not all clusters include all treatment levels; is there a way to get plots for just the clusters with complete data? #70

eipi10 commented Aug 11, 2023 •

edited

Loading

ngreifer commented Aug 11, 2023

eipi10 commented Aug 11, 2023

love.plot errors when not all clusters include all treatment levels; is there a way to get plots for just the clusters with complete data? #70

love.plot errors when not all clusters include all treatment levels; is there a way to get plots for just the clusters with complete data? #70

Comments

eipi10 commented Aug 11, 2023 • edited Loading

ngreifer commented Aug 11, 2023

eipi10 commented Aug 11, 2023

eipi10 commented Aug 11, 2023 •

edited

Loading