`love.plot()` option to remove missing indicators of categorical variables #89

pasahe · 2024-11-07T08:49:57Z

I have a data set where some variables have some missing values. When I use the love.plot() function, it adds a variable VAR: <NA> for each categorical variable that has missing values. Is there a way to exclude these missing indicators from the plot? If there's no way to do this, I think it would be a nice feature to add to the function. It's really hard to exclude them afterwards in ggplot.

Thank you very much!

ngreifer · 2024-11-07T16:03:15Z

To remove them, you should first create a bal.tab object and remove the values from the balance table, then call love.plot() on the modified bal.tab object. See below for how to do this.

b <- bal.tab(W, un = TRUE)
b$Balance <- b$Balance[!endsWith(rownames(b$Balance), "<NA>"),]
love.plot(b)

Fortunately it's not too hard (one or two extra lines of code) so I don't plan on adding this as an option. But at least you don't need to modify the ggplot object and can retain all of the options bal.tab() and love.plot() provide. If you have more complicated data (e.g., clustered or multicategory) you will have to do a little more work to exclude the desired values from each balance table.
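When an object holds several balance tables, the same filter can be applied to each table in turn. The sketch below is only an illustration on mock data frames: the helper name `drop_na_rows`, the list layout, and the values are hypothetical, and where a given `bal.tab` object stores its per-cluster or per-comparison tables should be checked with `names()` or `str()` before adapting it.

```r
# Hypothetical helper: drop rows whose names end in "<NA>" from one table.
drop_na_rows <- function(tab) {
  tab[!endsWith(rownames(tab), "<NA>"), , drop = FALSE]
}

# Mock stand-ins for balance tables (values are illustrative only).
make_tab <- function() {
  data.frame(
    Type = c("Contin.", "Binary", "Binary", "Binary"),
    Diff.Un = c(0.10, -0.05, 0.02, 0.03),
    row.names = c("age", "race_black", "race_white", "race:<NA>"),
    stringsAsFactors = FALSE
  )
}
tabs <- list(cluster1 = make_tab(), cluster2 = make_tab())

# Apply the same filter to every table in the list.
tabs <- lapply(tabs, drop_na_rows)
rownames(tabs$cluster1)
```

Using `drop = FALSE` keeps each result a data frame even if only one column or row remains.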

pasahe · 2024-11-08T09:33:25Z

Thanks, that's a nice workaround! Unfortunately, when I try to add the labels in the love plot with var.names, it gives an error because it expects the <NA> indicators. See this behavior in the following reproducible example:

library(cobalt)
v <- data.frame(old = c("age", "educ", "race_black", "race_hispan", 
                        "race_white", "married", "nodegree", "re74", "re75", "distance"),
                new = c("Age", "Years of Education", "Black", 
                        "Hispanic", "White", "Married", "No Degree Earned", 
                        "Earnings 1974", "Earnings 1975", "Propensity Score"))
covs <- subset(lalonde_mis, select = -c(treat, re78, nodegree, married))
b <- bal.tab(treat ~ covs, data = lalonde_mis)
b$Balance <- b$Balance[!endsWith(rownames(b$Balance), "<NA>"),]
love.plot(b, var.names = v)

The last line fails with an error in the assignment old_levels[idx] <- names(new_levels).

ngreifer · 2024-11-08T17:05:23Z

Ah, that's annoying. I'll work on making a fix for this. In the meantime, you can just change the row names of b$Balance to the new names and then omit the var.names argument to love.plot(). That doesn't allow you to use the feature whereby the stem of a set of dummy variables is replaced by a common new stem, but it does allow you to manually change the name of each variable. So for example, I believe you could add

matches <- match(rownames(b$Balance), v$old)  # position of each row name in v$old, NA if absent
rownames(b$Balance)[!is.na(matches)] <- v$new[matches[!is.na(matches)]]

before supplying it to love.plot(). Obviously this is getting into hacky territory, which is why I think an automatic solution would be necessary.
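To make the indexing in this kind of lookup-based renaming concrete, here is a minimal sketch on a mock table rather than a real balance table. The names `tab`, `idx`, and `found` and all values are made up for illustration. The point it shows is that `match()` should return, for each row name, its position in the `old` column, so that the same positions can be used to pull the replacement from `new`.

```r
# Mock balance table; only the row names matter here.
tab <- data.frame(Diff.Un = c(0.10, -0.05, 0.20),
                  row.names = c("age", "race_black", "re74"))

# Lookup table of old and new names; "educ" has no matching row.
v <- data.frame(old = c("educ", "re74", "age"),
                new = c("Years of Education", "Earnings 1974", "Age"),
                stringsAsFactors = FALSE)

# For each row name, find its position in v$old (NA when absent).
idx <- match(rownames(tab), v$old)
found <- !is.na(idx)

# Rename only the rows that have an entry in the lookup table.
rownames(tab)[found] <- v$new[idx[found]]
rownames(tab)
```

Rows without an entry in the lookup table, such as `race_black` here, keep their original names.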

pasahe · 2024-11-11T14:13:18Z

Yes, that would be a workaround for this reproducible example. It wouldn't work for my particular case when dealing with categorical variables, because in var.names I have the old and new labels for the entire variable, not for all categories. For example, instead of having all the labels for all the categories in race, I only have the label for the variable name:

v <- data.frame(old = c("age", "educ", "race", "married", "nodegree", "re74", "re75", "distance"),
                new = c("age", "years of education", "Race", "married", "no degree", 
                        "Income 1974", "Income 1975", "Propensity Score"))

But don't worry, I'll tweak the code to print the correct labels. It would be great if a solution for this scenario were added to the package in the near future.

Thank you very much!
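For the variable-level labels described above, one possible tweak is to replace only the leading stem of each dummy's row name. This is a rough sketch, not the package's method: the function `relabel` is hypothetical, it works on a plain character vector of mock row names, and it assumes the stem and the level are joined by `_` (as in `race_black`) and that no unrelated variable name starts with the same stem followed by `_`.

```r
# Mock row names as they might appear after dropping the "<NA>" rows.
rn <- c("age", "educ", "race_black", "race_hispan", "race_white", "re74")

# Variable-level lookup: one entry per variable, not per category.
v <- data.frame(old = c("age", "race", "re74"),
                new = c("Age", "Race", "Income 1974"),
                stringsAsFactors = FALSE)

# Hypothetical helper: rename exact matches and swap the stem of dummies.
relabel <- function(rn, old, new) {
  out <- rn
  for (i in seq_along(old)) {
    exact <- rn == old[i]
    stem <- startsWith(rn, paste0(old[i], "_"))
    out[exact] <- new[i]
    if (any(stem)) {
      # Keep the "_level" suffix and replace only the leading stem.
      out[stem] <- paste0(new[i], substring(rn[stem], nchar(old[i]) + 1L))
    }
  }
  out
}

relabel(rn, v$old, v$new)
```

The result could then be assigned back with `rownames(b$Balance) <- relabel(rownames(b$Balance), v$old, v$new)` before calling `love.plot(b)` without `var.names`.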
