-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ART: Adaptive Radix Tree implementation #307
base: main
Are you sure you want to change the base?
Conversation
This PR implements Adaptive Radix Tree (ART)[1], a compact key-value search tree. It maps byte-sequences to arbitrary values while maintaining data in sorted order. ART can compress multiple nodes into single node and adaptively expand or shrink its internal nodes to avoid worse-case space consumption. Within the broader swift-collection project, it can be considered a compact variant of SortedSet/SortedDictionary but at the cost of keys being convertible to byte-sequences. [1] https://ieeexplore.ieee.org/abstract/document/6544812
b77d93c
to
654b5e4
Compare
654b5e4
to
91e2c7c
Compare
2e56de5
to
4174087
Compare
Instead of using UnsafeMutablePointer<Node?> to locate where parent stores the child pointer everywhere, localize the unsafe behaviour using the `updateChild` method.
58b664a
to
977d7dd
Compare
977d7dd
to
5fa86d5
Compare
70fcdb1
to
c9578a2
Compare
0054aa1
to
27bfed1
Compare
085d821
to
1cf3b88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be honest that this is a bit outside of the area of my expertise and I wasn't able to provide a lot of valuable input -- I know Karoy has some specific tricks to pull for such collections. However, all those hints are about to become invalidated by the arrival of moveonly types. At the same time, the allocation semantics this tree has I understand are something we specifically want here -- so it is somewhat novel and not much precedent for it.
One thought is to maybe consider using the benchmark package and measure the ARC traffic before we start doing these Unmanaged dances -- it really seems a bit iffy -- also because the manual retain/releases done on the class references are on a strong class reference (buf
) to a ManagedBuffer class so that would incur ARC trafi anyway -- probably best to kick off the benchmark package mentioned inline and measure ARC traffic before optimizing.
We discussed with @lorentey and the collection should probably incubate for now outside of swift-collections but we'd like to try to help out polish it up -- even if for a specific use case; which may be easier than bringing it up to the long-term compatibility standards swift-collections would require...
I think that perhaps once we put the collection through some usage that'd be a good driver for brushing it up?
Thanks for the awesome work here so far @vishesh 🙏
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
internal struct Const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filename: a bit weird to have filename with spaces... it should "work fine" but just in case I'd suggest avoiding that.
static var testPrintAddr = false | ||
} | ||
|
||
public protocol ARTreeSpec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? 🤔
@@ -0,0 +1,25 @@ | |||
//===----------------------------------------------------------------------===// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filename: a bit weird to have filename with spaces... it should "work fine" but just in case I'd suggest avoiding that.
"ARTree/Node48.swift" | ||
"ARTree/NodeHeader.swift" | ||
"ARTree/NodeLeaf.swift" | ||
"ARTree/NodeStorage.swif" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"ARTree/NodeStorage.swif" | |
"ARTree/NodeStorage.swift" |
public typealias KeyPart = UInt8 | ||
public typealias Key = [KeyPart] | ||
|
||
protocol ArtNode<Spec> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming consistency: ARTNode
} | ||
|
||
var endIndex: Index { | ||
return Index(forTree: self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right?
} | ||
|
||
internal var endIndex: Index { | ||
return capacity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... is it? In other places we bound by the count, so the active element number -- is this ok here?
static var capacity: Int { | ||
@inline(__always) get { 256 } | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's the way without going unsafe I think.
/// | ||
/// `RadixTree` has the same functionality as a standard `Dictionary`, and it largely implements the | ||
/// same APIs. However, `RadixTree` is optimized specifically for use cases where underlying keys | ||
/// share common prefixes. The underlying data-structure is a _persistent variant of _Adaptive Radix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// share common prefixes. The underlying data-structure is a _persistent variant of _Adaptive Radix | |
/// share common prefixes. The underlying data-structure is a _persistent_ variant of _Adaptive Radix |
static func retainChildren(_ children: Children, count: Int) { | ||
for idx in 0..<count { | ||
if let c = children[idx] { | ||
_ = Unmanaged.passRetained(c.buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really comfortable with the somewhat manual retain dances we're doing here.
On the other hand, I understand the desire -- perhaps the right next step would be to write a benchmark using https://github.com/ordo-one/package-benchmark and observe the refcounts; this way you can actually measure and optimize them.
// It'll report slightly unbalanced retain/release counts, but this is okey: swiftlang/swift#64636 as at least you'll be able to view the numers changing as you tweak
I discussed the refernce counting a bit with @lorentey and here's some notes:
So it really feels like we should try to use a class to do the refcounting and use the package-benchmark to measure and optimize 🤔 |
This PR implements Adaptive Radix Tree (ART)[1], a compact key-value search tree. It maps byte-sequences to arbitrary values while maintaining data in sorted order. ART can compress multiple nodes into single node and adaptively expand or shrink its internal nodes to avoid worse-case space consumption.
Within the broader swift-collection project, it can be considered a compact variant of SortedSet/SortedDictionary but at the cost of keys being convertible to byte-sequences.
[1] https://ieeexplore.ieee.org/abstract/document/6544812
Checklist