-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use RowDiff with DBGHashFast in test_aligner_labeled #505
base: rowdiff
Are you sure you want to change the base?
Conversation
--------- Co-authored-by: Oleksandr Kulkov <[email protected]>
if(dynamic_cast<graph::DBGSuccinct const*>(&graph)) { | ||
rd_succ_bv_type().serialize(f); | ||
} else { | ||
rd_succ_bv_type(*rd_succ).serialize(f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what's the best way to avoid copy-construction here, and if it's generally possible for typical results of get_last. Should I try to do dynamic_cast to rd_succ_bv_type, and only if it fails use the constructor?
} else { // Only happens in DBGSuccinct | ||
auto *dbg_succ = dynamic_cast<graph::DBGSuccinct const*>(graph_); | ||
assert(dbg_succ); | ||
node = row_diff_successor(*graph_, node, dbg_succ->get_boss().get_last()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be problematic to, instead of doing this, assign get_last()
to fork_succ
during its load, if it was detected that it is empty? I assume in this case we would also need to change fork_succ_
itself into a pointer to bit_vector
?
} else { | ||
logger->warn("No count vectors could be found in {}. The last outgoing" | ||
" edges will be selected for assigning RowDiff successors", | ||
count_vectors_dir); | ||
rd_succ = last; | ||
if(dynamic_cast<graph::DBGSuccinct const*>(&graph)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really don't like doing this. If we try to serialize last
, what exactly is the issue with conversion? How much of a slowdown would it be? Alternatively, would it be an option to make some general bit_vector load and add class tags to serialized versions?
@@ -57,33 +57,33 @@ class DBGHashFastImpl : public DBGHashFast::DBGHashFastInterface { | |||
} | |||
|
|||
void add_sequence(std::string_view sequence, | |||
const std::function<void(node_index)> &on_insertion); | |||
const std::function<void(node_index)> &on_insertion) override final; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did it compile without them? I couldn't find the answer online...
if (valid_edges_) { | ||
serialize_valid_edges(valid_edges_); | ||
} else { | ||
serialize_valid_edges(generate_dummy_kmers(1, false)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the new indexing, I really don't want to allow these graphs to walk around with invalid in_graph
function. Would lead (and leads if this is removed) to all kinds of hard to debug behaviors. This is not ideal, but at least guarantees it will be valid when the graph is loaded and used somewhere else. But I'm also not sure what's the best place to propagate this information, and how much overhead it may add?
@@ -366,7 +363,7 @@ def test_simple_align_edit_distance_all_graphs(self, representation): | |||
|
|||
params = self._get_stats(self.tempdir.name + '/genome.MT' + graph_file_extension[representation]) | |||
self.assertEqual('11', params['k']) | |||
self.assertEqual('16461', params['nodes (k)']) | |||
self.assertEqual('16438', params['nodes (k)']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This number decreases if valid_edges
is generated and serialized for DBGSuccinct. Probably because dummy nodes are excluded this way. Same with the change above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't looked through everything yet, but here's my first batch of comments
5b5ea5c
to
2d1379e
Compare
Also uses
in_graph(node)
checks more extensively throughout the code base and serializesget_last(graph)
instead ofrd_succ
for non-BOSS graphs.