Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for custom allocators in standard collections #42774

Closed
alexcrichton opened this issue Jun 20, 2017 · 68 comments
Closed

Tracking issue for custom allocators in standard collections #42774

alexcrichton opened this issue Jun 20, 2017 · 68 comments
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

The new Alloc trait defined in RFC 1398 was added in #42313 but the libs teamdecided to hold off on Vec integration just yet to be conservative.

This issue is intended to track the integration of the Alloc trait into standard collections like Vec, BTreeMap, etc. I'm not personally aware of any current blockers, but what we'll probably want to do as part of this issue includes:

  • Develop a set of conventions for collections which support custom allocators
  • Ensure no regressions with integration of a custom allocator:
    • error messages should look as they do today
    • no performance regressions at runtime
    • ideally minimal compile-time impact
@alexcrichton alexcrichton added B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jun 20, 2017
@Ericson2314
Copy link
Contributor

There are some very interesting issues here:

  • "Flat collections" (Vec, VecDec) can easily store a &mut to the allocator they were crated with, indirection-heavy ones would be significantly bloated if every Box also had an allocator reference.

  • Ideally all collections would be resilient in the face of allocation failure, but that will take some major algorithms design work. It's better to convert quickly blowing up on OOM as they do today.

  • I had the idea of doing trait InfallableAllocator = Allocator<Error=!> to signify an allocator that wouldn't fail (and thus must panick on OOM). If, when we en-mass convert collections, we have them take an InfallableAllocator parameter, it will be clearer to the end-user that the collections are not oom-resilient, and there would be less panics strune about.

    • I'm not up-to-date with all the discussion of invalid Layouts, so I'll need to double-check that ! wouldn't instead need to be some strict Unsupported;. Then again, it isn't like the collections will be able to handle an unsupported layout either...
  • Not sure what the ETA on default parameters is, so we're probably best off reexporting collections in std with the allocator parameter specialized to the global allcoator. This will work today, right?

@joshlf
Copy link
Contributor

joshlf commented Jun 20, 2017

@Ericson2314

I had the idea of doing trait InfallableAllocator = Allocator<Error=!> to signify an allocator that wouldn't fail (and thus must panick on OOM). If, when we en-mass convert collections, we have them take an InfallableAllocator parameter, it will be clearer to the end-user that the collections are not oom-resilient, and there would be less panics strune about.

I believe that the current design doesn't allow you to parametrize the error, so it must return the concrete error type with two arms - Unsupported and Exhausted - and if you want to panic on OOM, it's your (the client's) responsibility to do that by panicking yourself or by calling the oom method.

@Ericson2314
Copy link
Contributor

@joshlf err yes, this is predicated on us adding an associated error type.

@ghost
Copy link

ghost commented Jul 11, 2017

To support the allocator trait in the collection types, the constructors will need to be duplicated.
@pnkfelix has used the _in suffix and appended the allocator to the list of arguments:

impl RawVec<T, A>
    fn with_capacity(n: usize) -> RawVec<T, HeapAlloc>;
    fn with_capacity_in(n: usize, a: A) -> RawVec<T, A>;
}

I personally quite like this approach.

@ghost
Copy link

ghost commented Jul 14, 2017

@Ericson2314 mentioned this:

"Flat collections" (Vec, VecDec) can easily store a &mut to the allocator they were crated with, indirection-heavy ones would be significantly bloated if every Box also had an allocator reference.

I am currently implementing a non-trivial allocator myself (for a relocatable heap) and noticed that it don't need to keep anything around for deallocation (the type information is still needed).

This still works when multible allocators are used.

I wonder if this could be integrated somehow? Associated types to indicate what needs to be stored?

@joshlf
Copy link
Contributor

joshlf commented Jul 14, 2017

@s3bk

I am currently implementing a non-trivial allocator myself (for a relocatable heap) and noticed that it don't need to keep anything around for deallocation (the type information is still needed).

This still works when multible allocators are used.

I wonder if this could be integrated somehow? Associated types to indicate what needs to be stored?

So the idea is that, for allocators where all instances of the type refer to the same global singleton, you wouldn't actually need to store a reference? Presumably that wouldn't work for allocators that could actually have multiple distinct instances, right?

This motivates a follow-up: if you were implementing a global singleton, couldn't you just make the actual Alloc type that you expose to users be a ZST that delegates to some private instance under the hood?

@ghost
Copy link

ghost commented Jul 15, 2017

@joshlf

So the idea is that, for allocators where all instances of the type refer to the same global singleton, you wouldn't actually need to store a reference? Presumably that wouldn't work for allocators that could actually have multiple distinct instances, right?

Yes, for allocators with multiple instances the allocation function would benefit from non-zero knowledge.
(Mine needs to know the base address for instance. And working with more than one allocator is perfectly reasonable.)
What I was trying to point at, is that the pointer passed to dealloc is enough information to recover the actual instance used to allocate the memory.

@hanna-kruppe
Copy link
Contributor

@s3bk How common is the case where you will only ever need to deallocate? All the standard collections and smart pointers have methods that allocate new memory from the same allocator (usually clone; Rc and Arc have make_mut).

Hm, now that I think about it, those methods already need to clone the allocator, so that would potentially offer a way out. Although it's unclear to me if we can switch between storing A and A::DeallocInfo in the collection depending on whether A: Clone, and it's also not clear to me if that would be a good design (it seems to work out mostly by chance, assuming it works out at all).

@ghost
Copy link

ghost commented Jul 15, 2017

@rkruppe there could be

trait Alloc {
    type DeallocInfo;
    fn dealloc(&self, ptr: *mut T, layout: Layout);
    fn dealloc_with(info: Self::DeallocInfo, ptr: *mut T, layout: Layout);
    // snip
}
struct Rc<T, A: Alloc> {
  dealloc: A::DeallocInfo;
  // snip
}
impl<T> Rc<T, HeapAlloc> {
    fn make_mut(&mut self) -> &mut T { 
         // use HeapAlloc to potentially clone T
    }
}
impl<T, A: Alloc> Rc<T, A> {
    fn make_mut_in(&mut self, a: A) -> &mut T { 
         // use A to potentially clone T
    }
}

This would keep Rc as thin as possible, not break existing code and allow to use make_mut_in for code that is generic over Alloc

edit: The case for only dealloc: Box, and many non-std types that only provide a minimal api.
edit 2: completely wrong.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jul 15, 2017

@s3bk make_mut_in makes HeapAlloc a special case for no good reason, granting it better ergonomics than other allocators that can support make_mut just as well.

edit: The case for only dealloc: Box, and many non-std types that only provide a minimal api.

Box::clone allocates. Do you have examples of such non-std types?

@ghost
Copy link

ghost commented Jul 15, 2017

@rkruppe It looks like I was completely wrong here. Most cases are covered by Box and the others reallocate.
Not sure how the the ergonomics could be solved, apart from adding another trait.

@joshlf
Copy link
Contributor

joshlf commented Jul 19, 2017

@s3bk

Most cases are covered by Box and the others reallocate.

Is that a problem? The key property of DeallocInfo is the ability to, given a pointer allocated from the allocator, locate the allocator itself. Thus, you should be able to have any method that takes an already-allocated pointer have a _with variant - realloc, shrink, etc.

@joshlf
Copy link
Contributor

joshlf commented Jul 21, 2017

@s3bk

There's another angle to consider here: how do we obtain a DeallocInfo? Presumably we need to guarantee that an object of that type has the same lifetime as the Alloc it was obtained from, so how do we ensure that? I'm imagining something like fn dealloc_info(&mut self) -> &mut Self::DeallocInfo, but then the lifetimes get pretty hairy. In particular, once somebody has a &mut DeallocInfo, the allocator can't be used again until the reference goes away because it's a mutable reference. This means you can allocate one Box at a time from an allocator...

@ghost
Copy link

ghost commented Jul 22, 2017

@joshlf The lifetime problem is indeed tricky…

What if a lifetime would be added to the Alloc trait?

trait Alloc<'a> {
    type DeallocInfo;
    fn dealloc_info(&self) -> Self::DeallocInfo;
    ...
}
impl<'a> Alloc<'a> for &'a MyAllocator {
    type DeallocInfo = MyDeallocInfo<'a>;}

@Mark-Simulacrum Mark-Simulacrum added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jul 22, 2017
@joshlf
Copy link
Contributor

joshlf commented Jul 22, 2017

@s3bk that's an interesting approach, but doesn't doing impl<'a> Alloc<'a> for &'a MyAllocator require all allocators to be threadsafe? I'm writing an allocator right now that very much benefits from not needing to be threadsafe.

@ghost
Copy link

ghost commented Jul 22, 2017

@joshlf this crude example should not be Send or Sync. So it could safely be non-theadsafe .
Or am I missing something?

@joshlf
Copy link
Contributor

joshlf commented Jul 23, 2017

@s3bk Ah I think you're right about that. Good call.

@Ericson2314
Copy link
Contributor

Ericson2314 commented Jul 23, 2017

Not quite sure where the conversation is now, but we should probably discuss

type SomeCollection<T> = collections::SomeCollection<T, GlobalAlloc>;

As a temporary way to get the ball rolling here without impacting breaking std, needing newtypes or language design work for default params with aliases.

(Picking up the discussion from #43112 (comment).)

@Ericson2314
Copy link
Contributor

Ericson2314 commented Dec 28, 2017

Cross post #32838 (comment)

I've started generalizing the collections myself; the easiest way to demonstrate how the associated return type helps is to just do the code change, I suppose.

@alexreg
Copy link
Contributor

alexreg commented Apr 30, 2018

So, as far as I can tell, the real issue here is the one with knowing which allocator to use for further allocations or deallocation using a type. In the case of a single instance of each Alloc type, this can be solved all statically (no memory or runtime overhead), as mentioned. For multiple instances of an allocator type, there needs to be a (mutable) reference in each value to the allocator used. I would thus propose an associated type on Alloc with an AllocRef trait bound. The AllocRef trait would have a constructor method that takes the Alloc instance – for a singleton Alloc type (including the global allocator), the type implementing AllocRef would be zero-sized, and the constructor method wouldn't care about the instance it was passed, whereas it would simple store the reference for multiple-instance allocators.

This would seem to achieve the best of both worlds – no additional cost whatsoever over the current situation when you use the global allocator, and the necessary (but minimum) cost when you use custom allocators.

@SimonSapin
Copy link
Contributor

@alexreg I don’t think there is a need for an AllocRef trait.

What happens at the moment for RawVec is that it has a type parameter A: Alloc and contains a value of type A directly. When the allocator type is zero-size like struct Global; in std::heap, this field has no overhead. For allocators that need to keep per-instance state, the idea is that the trait would be implemented not for the allocator struct, but for references to it: impl<'a> Alloc for &'a MyAllocator, and what you pass to generic collections is such a reference. (In this case with &_ rather than &mut _ so that you can use the same allocator in multiple collections at the same time, the allocator needs to use somthing like Cell or Mutex internally.)

@Ericson2314
Copy link
Contributor

Yes that is what I did for Default. We can add additional methods like new_in and from_iterator_in; new classes can be made later.

@glandium
Copy link
Contributor

@fitzgen See the allocator_api crate, which has the Alloc trait, Box and RawVec.

@fitzgen
Copy link
Member

fitzgen commented Nov 29, 2018

@SimonSapin

Regarding Box::leak, this is an interesting case. I think the today’s type system might not be able to express this in its full generality as that would require something like "associated lifetimes" in traits (if that even makes sense). The simplest for now would be to only implement it for Box<T, std::alloc::Global>.

I think this signature would work with Rust today (although I have not tried it yet):

impl<T, A: Alloc> Box<T, A> {
    pub fn leak<'a>(self) -> &'a T
    where
        A: 'a,
    { ... }
}

@fitzgen
Copy link
Member

fitzgen commented Nov 29, 2018

@fitzgen See the allocator_api crate, which has the Alloc trait, Box and RawVec.

Filed fitzgen/bumpalo#2

@lachlansneff
Copy link

Has there been any progress with stabilizing this?

@Ericson2314
Copy link
Contributor

@glandium just got their PR for Box less blocked. Once that lands I'll rebase my PR for the other collections.

@joshlf
Copy link
Contributor

joshlf commented Feb 16, 2019

@Ericson2314 @glandium Link to that PR?

@passchaos
Copy link

@joshlf maybe #58457

@joshlf
Copy link
Contributor

joshlf commented Feb 16, 2019

@passchaos Looks like it; thanks!

@lachlansneff
Copy link

It'd be useful if we could use the traditional collection apis (new, with_capacity, etc) if our allocator type parameter implemented Default.

@SimonSapin
Copy link
Contributor

As long as #27336 is not implemented, that would make some currently-valid programs ambiguous because type inference doesn’t know what allocator to pick.

@lachlansneff
Copy link

lachlansneff commented Feb 17, 2019

That's been feature-gated for 3 years. Can it not just be stabilized?

@SimonSapin
Copy link
Contributor

Assuming yea mean #27336, it’s better discussed in the relevant tracking issue. I haven’t been following this one, but from a quick glance at the thread its implementation is not complete yet and it’s not "just" a matter of flipping the stabilization switch.

@Manishearth
Copy link
Member

Is there an RFC or design document talking about what the current plans for allocators in stdlib collections are? The allocators RFC leaves this stuff undecided, and this issue seems to be an implementation tracking issue.

@SimonSapin
Copy link
Contributor

At this point I think we need a champion who would summarize what’s still unresolved in the (numerous) discussions so far and see, try to build consensus, and make set of proposals (possibly as a new RFC).

Note that I am not volunteering at this time :]

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 27, 2019

Such a champion might probably want to start a working group, similar to the unsafe-code-guidelines, with its own github repo, where multiple parts of the "Allocators issue" can be explored in parallel, with active meetings to make progress, and where the parts of the issue that achieve consensus get merged slowly into a document with rationale, which is then sliced into suitable RFCs.

Summarizing the discussions effectively and putting them into a single comment would be a lot of work, and it won't IMO help much, because 10 people are going to answer to the summary, asking questions and discussing 10 different aspects of it.

The champion of such a working group would probably need to be a core team or lib team member with the time and will to see this through, and that will probably require somehow fitting allocators in the 2019 roadmap.

@SimonSapin
Copy link
Contributor

Regarding the 2019 roadmap, allocators are already the one item specifically mention for the library team in rust-lang/rfcs#2657. Though to some extent this might be wishful thinking as long as no-one steps up to lead this work.

I feel that this is a much smaller topic than unsafe code guidelines, but on further thought you’re probably right that this is a better model to keep sub-discussions organized.

@Ericson2314
Copy link
Contributor

Once Box<T, A> has landed, I'll finish off https://github.com/QuiltOS/rust/commits/allocator-error / #52420 . I'd be happy to make an RFC at that point describing what I did.

@alexcrichton
Copy link
Member Author

I've made a post on internals where the libs team is considering spinning up a working group to drive this issue to completion, and if you're interested please feel free to comment there!

@TimDiekmann
Copy link
Member

As we now have a dedicated repository for the allocators WG this issue may be closed/locked to keep discussions in one place?

@mark-i-m
Copy link
Member

mark-i-m commented May 4, 2019

I think this issue still serves to funnel people in the direction of that repo. Nobody will know to go there otherwise.

@alexcrichton
Copy link
Member Author

I agree with @TimDiekmann that centralizing discussion is probably best now, so I'm going to close this in favor of the WG repository.

I'll open an issue on the repository about discoverability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests