ENH make change percentiles plot to histogram #97

gwarmstrong
Member

This PR should address the plotting concerns in #94, including:

  • change the plot to a histogram
  • add title to the chart
  • label axes
  • remove plotly tools (I do not think panning or zooming are particularly useful on this plot)

Sample below:

Screen Shot 2021-03-05 at 10 52 49 AM

Note that git's diffing algorithm seems to have given some weird results starting around 398/433, beginning with function updateSimilarity(similarityData, state){ . To the best of my knowledge, I did not modify anything below that.
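
For readers without the diff at hand, the four changes listed at the top of this comment can be sketched roughly as follows. This is not the code from this PR: the helper name buildAlphaHistogram, the shape of the input data, the label text, and the styling of the "You" marker are all assumptions made for illustration. Only the Plotly.js options themselves (histogram traces, histnorm, layout titles, shapes, displayModeBar) are standard settings.

```javascript
// Hypothetical helper: builds the arguments for Plotly.newPlot().
// Data shape, names, and labels are assumptions, not taken from the PR.
function buildAlphaHistogram(groups, userValue) {
  // One semi-transparent histogram trace per group (e.g. per country).
  const data = Object.keys(groups).map((name) => ({
    type: 'histogram',
    name: name,
    x: groups[name],
    histnorm: 'probability', // bar heights are fractions of samples
    opacity: 0.6,
  }));

  const layout = {
    title: { text: 'Alpha diversity' },
    barmode: 'overlay',
    xaxis: { title: { text: "Faith's PD" } },
    yaxis: { title: { text: 'Relative frequency' } },
    // Vertical dashed line marking the participant's own value.
    shapes: [{
      type: 'line',
      x0: userValue, x1: userValue,
      yref: 'paper', y0: 0, y1: 1,
      line: { dash: 'dash' },
    }],
    annotations: [{
      x: userValue, yref: 'paper', y: 1,
      text: 'You', showarrow: false, yanchor: 'bottom',
    }],
  };

  // Hides the Plotly toolbar (zoom/pan buttons).
  const config = { displayModeBar: false };
  return { data, layout, config };
}
```

On a page that has loaded Plotly.js, the result would be passed along as Plotly.newPlot(someDiv, fig.data, fig.layout, fig.config), where someDiv is whatever element hosts the chart.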

@wasade
Member

wasade commented Mar 5, 2021

Thanks!!

@wasade
Member

wasade commented Mar 5, 2021

@justinshaffer, what do you think about this plot type for alpha diversity?

@wasade
Member

wasade commented Mar 5, 2021

Feedback from the comms team:

"

  • Can the labels be things like “richness” or “evenness” or “diversity” and have those clearly laid out to what that is in the text?
  • The axis is labeled Relative Frequency but the paragraph talks about Relative Abundance & diversity value. This seems to be introducing a new concept not yet explained
  • The axis label Faith’s PD makes it challenging to understand what it is referring to. Perhaps use Faith’s Phylogenetic Diversity as a subheading with Richness as the main axis title.

I get that the participant is supposed to see what their alpha-diversity is compared to the two countries that were studied. I just don’t get the how or what it means in comparison. They need to be able to walk away knowing how the information provided directly relates to them and how to interpret it with the least information given as possible."

@justinshaffer

@wasade Thanks! I think it's clear and looks good. I'm not sure how widely histograms are understood, so I suggest making things super clear: for example, change the x-axis label to 'proportion of individuals' (or, even better, to percentages), and add a second line of text or similar to the y-axis clarifying that lower values are less diverse and higher values are more diverse (e.g., "<-- lower diversity / higher diversity -->"). It's also tempting to place additional markers other than 'You' for fun, but I understand that using anything implying 'sick' or 'healthy' is dangerous. What about the average for, say, healthy infants, just to highlight that increases in Faith's PD are associated with age, at least early on? Just thoughts. I think it's good as is!

@wasade
Member

wasade commented Mar 11, 2021

@gwarmstrong, any update here?

@gwarmstrong
Member Author

Here are some updates to the wording on the plot!

Screen Shot 2021-03-11 at 2 45 42 PM

@justinshaffer

justinshaffer commented Mar 12, 2021 via email

@gwarmstrong
Member Author

The metric is Faith's PD. I could include it in the x-axis title, like so:

Screen Shot 2021-03-11 at 4 31 55 PM

@justinshaffer

justinshaffer commented Mar 12, 2021 via email

@gwarmstrong
Member Author

All good! I think that explanation makes sense. Here is the plot produced with the latest updates:
Screen Shot 2021-03-11 at 5 01 11 PM

@justinshaffer

justinshaffer commented Mar 12, 2021 via email

@wasade
Member

wasade commented Mar 12, 2021

Thank you both!!! This is great!

@gwarmstrong
Member Author

While we're here, I do want to make one more plug for cumulative density. In the cumulative histogram below, compared to the histograms shown so far, someone can much more easily see the percentage of samples that have a higher/lower alpha diversity than their sample by looking at the y coordinate of their "You" line. E.g., this sample's alpha diversity is greater than about 60% of US samples and ~55% of UK samples, which more directly answers a question I might have when looking at this plot. It is pretty hard to get this from the probability density histogram. Also, the plot is just prettier when the alpha diversity values are not present in all bins (e.g., left tail of the histograms shown above).
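
As a minimal sketch of how a cumulative view can be requested in Plotly.js (the helper name toCumulativeTrace is hypothetical and not part of this PR), an existing histogram trace can be copied with the standard cumulative option enabled:

```javascript
// Hypothetical helper: returns a cumulative copy of a Plotly histogram trace.
// The input trace is left unmodified.
function toCumulativeTrace(trace) {
  return Object.assign({}, trace, {
    // With 'probability', the cumulative bars rise towards 1 (i.e. 100%).
    histnorm: 'probability',
    // Each bar accumulates the bins up to and including its own.
    cumulative: { enabled: true },
  });
}
```

With this setting, the bar height at a given bin is the fraction of samples falling in that bin or any bin to its left.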

@justinshaffer any ideas on how we could make the cumulative density aspect more approachable for general participants?

Screen Shot 2021-03-12 at 10 19 45 AM

@wasade
Member

wasade commented Mar 12, 2021

One concern I have here is that the cumulative plots don't align on the right side, which makes it look like an artifact of the visualization, although I agree it is prettier in that it is smoothed. Is it possible to express the histogram with some type of smoothing function, maybe?
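
One common smoothing option for a histogram is a Gaussian kernel density estimate. The sketch below is only an illustration of that general technique, not something proposed or implemented in this PR; the function names and the bandwidth parameter are assumptions, and the bandwidth would need to be tuned for real data.

```javascript
// Gaussian kernel density estimate evaluated at a single point x.
// The bandwidth is a free parameter chosen by the caller.
function gaussianKde(samples, bandwidth, x) {
  const norm = 1 / (samples.length * bandwidth * Math.sqrt(2 * Math.PI));
  let sum = 0;
  for (const v of samples) {
    const u = (x - v) / bandwidth;
    sum += Math.exp(-0.5 * u * u);
  }
  return norm * sum;
}

// Evaluate the estimate on an evenly spaced grid so that it can be
// drawn as a line (the returned x and y arrays have steps + 1 points).
function kdeCurve(samples, bandwidth, min, max, steps) {
  const xs = [];
  const ys = [];
  for (let i = 0; i <= steps; i++) {
    const x = min + ((max - min) * i) / steps;
    xs.push(x);
    ys.push(gaussianKde(samples, bandwidth, x));
  }
  return { x: xs, y: ys };
}
```

Note that the y values here are densities rather than relative frequencies, so the axis label would need to change accordingly.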

@justinshaffer

justinshaffer commented Mar 15, 2021 via email

@gwarmstrong
Member Author

> Yes why are the colors not aligned on the right-hand side of the plot? Does this matter?

I have not yet figured out a way to extend the colors to the right for a cumulative histogram in Plotly. I can work more on figuring this out if it matters, but if it is not supported by Plotly then this ticket gets a lot more open-ended.

> I agree with everything you said re: this being much easier to pull something meaningful out of. Sorry this may be dumb but I don't look at these often - is it really the case that 60% of the blue data is to the left of the 'You' line - rather than to the right of it? I'm squinting my eyes and I think the left-hand tails are artifactually making the left-hand part of each dataset seem smaller than what is on the right-hand side.

So the "left seeming smaller than the right" impression might come from confusing bar height with area under the curve. It is not necessarily the case that 60% of the area under the CDF is to the left of the 'You' line. Instead, the blue USA bar having a height of 60% at the 'You' line indicates that 60% of samples from USA have an alpha diversity <= the x-value at the 'You' line. The height of the bar is cumulative frequency, not a count of samples.
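
To make the reading of the bar height concrete, here is a small sketch of the quantity it represents: the fraction of samples with a value at or below a given x. The function name and the sample values are made up for illustration and are not data from this project.

```javascript
// Fraction of samples whose value is <= x (the empirical CDF at x).
function fractionAtOrBelow(samples, x) {
  if (samples.length === 0) return NaN;
  let count = 0;
  for (const v of samples) {
    if (v <= x) count += 1;
  }
  return count / samples.length;
}

// Made-up alpha diversity values, for illustration only.
const usa = [4, 7, 9, 10, 12, 15, 18, 20, 22, 25];
console.log(fractionAtOrBelow(usa, 15)); // 0.6
```

Here six of the ten values are at or below 15, so a 'You' line at x = 15 would meet the cumulative bar at a height of 60%, regardless of how the area looks on either side of the line.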

@justinshaffer

justinshaffer commented Mar 16, 2021 via email

@wasade
Member

wasade commented Dec 13, 2021

It would be nice to have this merged, as it's one of the few result types we have for skin/oral samples. Please let me know if this will be completed or if it should be closed.

@wasade wasade closed this Apr 21, 2022
3 participants