Explanation of choice to model Nextstrain clades #266
Maybe worth adding parenthetically at the end of this sentence something like ", even if there is not always a perfect one-to-one alignment between the clade and lineage assignments for a single sequence."
I like this in principle, but I'm having some trouble seeing how to be handwavy enough here not to open up a(t least one) rather large can of worms requiring potentially a fair bit more text to explain.
In a perfect world there would be a many-to-one mapping such that every Pango lineage corresponds to exactly one Nextstrain clade (not a one-to-one mapping). But that quickly leads to the cans of worms that are nestedness of naming (which is why the perfect world is many-to-one) and "lineage assignments aren't data" (which is why we don't live in a perfect world), and if we're not careful the pain that is the ARG.
If we perfectly knew the evolutionary relationships between all the sequences (and, following Pango's approach, started a new tree for every recombination event), whether or not we get a clean many : one mapping of Pango lineage : Nextstrain clade comes down to whether every node that defines a more specific label in one system also does so in the other.
(For example, if the node that carves 24B out of 24A is also the node that carves JN.1.11.1 from JN.1.11, things line up cleanly, even though 24A corresponds to JN.1: all the other JN.1 names will map to 24A, while names in JN.1.11.1 will map to 24B. Otherwise there will be slop. If the node that carves 24B out of 24A is the parent of (or any other node basal to) the node that carves JN.1.11.1 from JN.1.11, then JN.1.11 corresponds to both 24A and 24B. If it's the child (or other tip-ward descendant node), JN.1.11.1 (rather than JN.1.11) will correspond to both 24B and 24A.)
But in reality, there is circularity and lots of conditional, inference-dependent inference. We don't actually observe the relationships; we estimate them. And we don't observe labels; we assign (infer, really) them, conditioned on both some estimate of the relationships (a phylogeny) and some inference of the labels (the Pango lineages and Nextstrain clades already assigned to the sequences in the tree). That's a lot of additional room for mismatch.
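To make the idealized 24A/24B case concrete, here is a minimal sketch on a made-up tree, assuming the relationships are known exactly. The tree, the node names (`root`, `n1` to `n4`, `t0` to `t4`), and the helper functions are all hypothetical and exist only for illustration; nothing here reflects how Pango or Nextstrain actually assign labels. Each tip takes the label of the nearest defining node at or above it, and a lineage is flagged when its tips land in more than one clade.

```python
from collections import defaultdict

# Hypothetical toy tree (child -> parent); "root" has no parent.
PARENT = {
    "n1": "root", "n2": "n1", "n3": "n2", "n4": "n3",
    "t0": "root", "t1": "n1", "t2": "n2", "t3": "n3", "t4": "n4",
}
TIPS = ["t0", "t1", "t2", "t3", "t4"]

# Nodes at which each Pango name is carved out (assumed for illustration).
PANGO_NODES = {"root": "JN.1", "n1": "JN.1.11", "n3": "JN.1.11.1"}


def label_of(tip, defining_nodes):
    """Return the label of the nearest defining node at or above `tip`."""
    node = tip
    while node not in defining_nodes:
        node = PARENT[node]  # assumes "root" is always a defining node
    return defining_nodes[node]


def lineage_to_clades(clade_nodes):
    """Map each Pango lineage to the set of clades its tips fall in."""
    mapping = defaultdict(set)
    for tip in TIPS:
        mapping[label_of(tip, PANGO_NODES)].add(label_of(tip, clade_nodes))
    return dict(mapping)


def ambiguous_lineages(clade_nodes):
    """Lineages that break the many-to-one mapping (more than one clade)."""
    return sorted(
        lineage
        for lineage, clades in lineage_to_clades(clade_nodes).items()
        if len(clades) > 1
    )


# 24B carved at the same node as JN.1.11.1: clean many-to-one mapping.
aligned = ambiguous_lineages({"root": "24A", "n3": "24B"})
# 24B carved at a node basal to the JN.1.11.1 node: JN.1.11 straddles both clades.
basal = ambiguous_lineages({"root": "24A", "n2": "24B"})
# 24B carved at a tip-ward descendant node: JN.1.11.1 straddles both clades.
tipward = ambiguous_lineages({"root": "24A", "n4": "24B"})
```

In this sketch only the position of the 24B-defining node changes between the three cases, which is the point of the example: the slop comes from where the two systems place their defining nodes, before any estimation error is added on top.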
Perhaps something like one of these? (Which add load-bearing phrases to accompany the heavy lifting done by "sufficient correspondence.")
I suggested just adding "typically possible", which I think captures it. The point about single-sequence-level classification is helpful context, but I'd suggest relegating it to a footnote.
I like "typically possible" and I like that a footnote affords a bit more space for clarity in this aside. I have done so.