Weighted distribution across multiple variables #4
Comments
Hi @soooh! You should actually be able to do this with the current code. It'll depend on what, exactly, you're looking to calculate. But let's say you're looking for the weighted distribution of race, by gender and over time. In that case, this should work:

    grouped = data.groupby(["year", "gender"])
    dist = calc.distribution(grouped, "race").round(3)

Does that work? Are you aiming for something slightly different?
Ah, so what that does is give me the racial demographics of women and men separately. E.g., of all the women, 20% are white, 10% are black, and so on. What I want is something like, 10% of the group is white men, 8% white women, etc. Does that make sense?
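To make the distinction concrete, here is a minimal sketch in plain pandas rather than the library's own API. The toy data, the `weight` column name, and the `weighted_share` helper are all assumptions made for illustration. It computes both quantities: race shares within each year and gender, and race-by-gender shares within each year.

```python
import pandas as pd

# Toy data; the column names and the "weight" column are assumptions.
data = pd.DataFrame({
    "year":   [2020, 2020, 2020, 2020],
    "gender": ["woman", "woman", "man", "man"],
    "race":   ["white", "black", "white", "black"],
    "weight": [1.0, 1.0, 2.0, 4.0],
})

def weighted_share(df, group_cols, value_cols):
    """Hypothetical helper: weighted share of each value_cols
    combination within each group_cols group."""
    totals = df.groupby(group_cols + value_cols)["weight"].sum()
    # Divide each cell by the total weight of the group it belongs to.
    return totals / totals.groupby(level=group_cols).transform("sum")

# Conditional: race shares within each (year, gender) group.
within_gender = weighted_share(data, ["year", "gender"], ["race"])

# Joint: (race, gender) shares within each year.
joint = weighted_share(data, ["year"], ["race", "gender"])
```

In the first result the shares sum to 1 within each year and gender; in the second they sum to 1 within each year, which is the "10% of the group is white men" style of figure.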
Yep.
Ah, sounds like I misunderstood the goal. In that case, the easiest way might be like so:

    data["race_x_gender"] = data[["race", "gender"]].apply(" x ".join, axis=1)
    dist = calc.distribution(data.groupby("year"), "race_x_gender").round(3)

Does that achieve your goal? (It assumes that ...)

I'll also think about ways I could incorporate a generic feature like this into the library itself. Thanks for the suggestion!
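For readers who want to see the combined-column idea end to end, the following is a rough sketch using only pandas, not the library's `calc.distribution`. The toy data and the `weight` column are assumptions. It builds the combined column with string concatenation (casting to `str` first, since joining row values only works when they are strings) and then normalizes the weights within each year.

```python
import pandas as pd

# Toy data; the column names and the "weight" column are assumptions.
data = pd.DataFrame({
    "year":   [2019, 2019, 2020, 2020, 2020],
    "gender": ["woman", "man", "woman", "man", "man"],
    "race":   ["white", "white", "black", "white", "black"],
    "weight": [1.0, 3.0, 1.0, 1.0, 2.0],
})

# Cast to str first so the concatenation also works for non-string columns.
data["race_x_gender"] = (
    data["race"].astype(str) + " x " + data["gender"].astype(str)
)

# Sum of weights per (year, category), laid out as a year-by-category table.
totals = (
    data.groupby(["year", "race_x_gender"])["weight"]
    .sum()
    .unstack(fill_value=0.0)
)

# Divide each row by its total so the shares in every year sum to 1.
dist = totals.div(totals.sum(axis=1), axis=0).round(3)
```

Each row of `dist` is one year and each column is one race-by-gender combination, with combinations absent in a given year shown as 0.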
Ah yes, that is actually what I am doing! 😄
Could be a useful addition to your library. As an example, I'm interested in getting stats on race and gender in a group over time. Something like: