Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BitSet #235

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Add BitSet #235

wants to merge 2 commits into from

Conversation

clarfonthey
Copy link

Implements #171.

I was also interested in this, so, I decided to take the time to actually create a reasonable wrapper that fits well into the library. It borrows its API from the most similar existing structure in libstd, which is BTreeSet.

This is an "MVP" implementation, meaning that it offers an API that is still useful but missing a few operations that don't have implementations elsewhere in the crate. Specifically, it omits:

  • is_disjoint, is_subset, is_superset checks: I feel like these kinds of checks should probably have a mirrored API for BitSlice before they're implemented here. Specifically, some way of representing binary folds like (x & y).not_all() without allocating.
  • Difference, SymmetricDifference, Intersection, and Union: these also should have a similar API on BitSlice, except closer to (x & y).iter_ones() without allocating
  • IntoIterator: while iterators over references just use IterOnes, this would require an IterOnes-like iterator for BitBox to be efficient.

I've added commented-out boilerplate for those omitted implementations so that someone who has the desire for them knows where to start. If this isn't reasonable to include in the released library, I'd be more than happy to remove them, but I figured I might as well include my work here.

The most relevant design decision here is that this is explicitly a wrapper around BitVec, which means it doesn't add affordances you'd get from something like BTreeSet which will precompute the number of elements in the set. This also means that you can freely access the underlying BitVec without disrupting any invariants.

That said, there is one "invariant" that is offered internally by utilities: in some cases, it's most useful to ensure that the set is "trimmed," i.e. the last bit is a one. Two sets can be compared by comparing their trimmed versions, which I call "shrunken" in the code since it lines up closer to the existing shrink_to_fit method. (Even though it's not quite the same.) I try to uphold this invariant in some of the methods (not leaving too much "slack" at the end of the set), but it's not explicitly required for the methods to work.

@@ -13,7 +13,7 @@
# attr_fn_like_width = 70 # Leave implicit
# chain_width = 60 # Leave implicit
edition = "2018"
fn_args_layout = "Tall"
fn_params_layout = "Tall"
Copy link
Author

@clarfonthey clarfonthey Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes were made because the original name was deprecated in the latest nightly rustfmt, and I needed to make them for just format to work properly. I updated the stable version too since I figured it'd be a good idea.


/// Creates a new bit-set for a range of indices.
#[inline]
pub fn from_range(range: ops::Range<usize>) -> Self {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This felt like the best analogue for BitVec::repeat to me. Since only allowing RangeFrom seemed weird, I decided to accept a proper Range instead.

Comment on lines +464 to +522
self.inner.resize(value + 1, false);
self.inner.set(value, true);
Copy link
Author

@clarfonthey clarfonthey Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I was thinking of avoiding the "redundant" write to the final bit, but then realised that due to the way fill works, it would probably just make the result slower anyway, since it would always need a special write to the last element anyway. If there's a better way of doing this, would love to know about it.

Comment on lines +500 to +565
// NOTE: it's unclear how this affects performance and if we should
// do this automatically, or require it manually only
self.shrink_inner();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't put much thought into this at all, so, would love input on whether this is a good idea.

Comment on lines +536 to +608
self.inner
.iter_mut()
.enumerate()
Copy link
Author

@clarfonthey clarfonthey Jul 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully aware that this is super inefficient, but iter_ones doesn't have a version that would play nice with mutation. Would be open to not landing this method if it seems dishonest to include here (which, it probably is).

/// assert_eq!(set_iter.next(), None);
/// ```
#[inline]
pub fn iter(&self) -> IterOnes<'_, T, O> {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change this to a wrapper if desired, but I figured reusing the original type was the best idea since any users here should be clear what API is being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant