Switching away from AxisArrays to DimArrays #336
I guess some things to consider are:
|
|
I would be happy to support this and work on any required changes to DimensionalData.jl. Something not mentioned here is that with a Edit: I think point 2 is resolvable. we should be able to make |
Mentioning @cpfiffer who I think has mentioned not liking AxisArrays. I could get a PR up in a couple days if you'd like. |
I think nobody likes AxisArrays, including me 😄 Nevertheless, we need some concrete motivation and clear advantages for why we should make these breaking changes and why it is good to break users' workflows and downstream packages. It seems that many issues with the current design would not be solved by DimArray (e.g., we would still have to group and ungroup vector- or matrix-valued variables manually and would not be able to work properly with variables of different types), but of course it's nice if we can fix (or rather avoid) JuliaArrays/AxisArrays.jl#182.
It would be interesting to see concrete examples of how this would differ from the current behaviour when indexing or slicing chains. Do the standard stats implementations produce the same results that we currently get with our manual implementations? Many statistical summaries have to be implemented manually anyway I assume, since they are quite specific to our use case here, and IIRC also the implementations of e.g. Lines 280 to 298 in e7b3db5
It would be nice to make this discussion a bit more concrete by directly comparing current behaviour and how it would change with |
I don't really have a horse in this race or know Turing.jl internals, I was just offering to help out if any changes are needed to DimensionalData.jl. But there shouldn't need to be any breaking workflow or syntax changes here when indexing or slicing. There are some small things that are nicer for this codebase, like:

    # niter = size(chn, 1)
    niter = size(chn, :iter)

So you don't need to use magic numbers to specify axes. And I do get the feeling there is code that could leverage DimensionalData.jl here given some thought, like the Tables.jl interface is very similar.
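To make the named-axis idea concrete without depending on any package, here is a minimal sketch in plain Julia. It is a toy wrapper, not how DimensionalData.jl or this package implements anything: `NamedChain` and `dimindex` are hypothetical names, and the dimension names `:iter`, `:var`, and `:chain` are assumed for illustration.

```julia
# Toy illustration only: `NamedChain` and `dimindex` are hypothetical names,
# not part of this package, AxisArrays.jl, or DimensionalData.jl.
struct NamedChain{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    names::NTuple{N,Symbol}
end

# Minimal AbstractArray interface: forward size and indexing to the data.
Base.size(c::NamedChain) = size(c.data)
Base.getindex(c::NamedChain{T,N}, I::Vararg{Int,N}) where {T,N} = c.data[I...]

# Map a dimension name to its position, erroring on unknown names.
function dimindex(c::NamedChain, name::Symbol)
    i = findfirst(==(name), c.names)
    i === nothing && throw(ArgumentError("no dimension named $name"))
    return i
end

# Allow `size(c, :iter)` in addition to the positional `size(c, 1)`.
Base.size(c::NamedChain, name::Symbol) = size(c.data, dimindex(c, name))

# Assumed layout: iterations x variables x chains.
chn = NamedChain(rand(100, 3, 4), (:iter, :var, :chain))
niter = size(chn, :iter)     # no magic number needed
nchains = size(chn, :chain)
```

The positional form `size(chn, 1)` keeps working, so code written against the named form stays correct even if the storage order of the axes were to change later.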
I'm not sure I understand the use case, but for variables of different types and sizes I would use a It's also usable as a table where the columns are the dimensions (which are looped to length) and the layers are the other columns. |
Maybe before I reply I should repeat that I'm not against this change (as I said nobody is really happy with
While
One of the main pain points with |
I was imagining using DimensionalData.jl could simplify some code in this package, more than for the user! That code is from this package. It should be essentially the same for the user, but hopefully with fewer bugs.
I guess I don't understand why you are stuck in that situation. Why not just use a Essentially |
It doesn't have to and we want to support this case (or maybe rather |
David's got good points here. There are a lot of far better ways to do what we currently do, but switching over to them is the hard bit. We could do DimensionalData.jl, @cscherrer's TupleVectors.jl which I quite like, raw vectors of named tuples, vectors-of-vectors, etc. All of these kind of get us around the two big issues I see with the backend which are (a) forced equal chain sizes and (b) bad handling of multi-type chains. Ultimately I just kind of want anything that supports the convenience function |
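As a rough sketch of why the "raw vectors of named tuples" option sidesteps issues (a) and (b), consider the following plain-Julia example. The parameter names `mu`, `k`, and `z` and all values are made up for illustration; this is not a proposal for the actual storage format.

```julia
# Hypothetical samples: parameter names and values are invented for illustration.
chain1 = [(mu = 0.1, k = 2, z = [1.0, 2.0]),
          (mu = 0.3, k = 3, z = [0.5, 1.5]),
          (mu = 0.2, k = 2, z = [0.9, 2.1])]
chain2 = [(mu = 0.4, k = 1, z = [1.1, 1.9]),
          (mu = 0.0, k = 2, z = [1.0, 2.0])]

# (a) Chains of different lengths can sit side by side.
chains = [chain1, chain2]
lengths = length.(chains)

# (b) Each variable keeps its own type instead of being promoted to Float64.
ks = [s.k for s in chain1]   # integer-valued variable stays a vector of Int
zs = [s.z for s in chain1]   # vector-valued variable stays one unit per sample
```

The trade-off is that summaries across samples have to be written as comprehensions or reductions over the named tuples rather than as array slices.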
Thanks @cpfiffer . Of course I like TupleVectors too, but the whole SampleChains thing hasn't worked out as easily as I had hoped. I think it's yet another case where we really need an abstract interface. To get there, I think it's important to consider different kinds of inference. A result might be an abstract vector of named tuples, but it could also be a Measure or Distribution (using VI or Laplace approximation) or even a continuous trajectory (using @mschauer's ZigZagBoomerang). For standard MCMC samples, TupleVectors works great. But to be more general, I think there just needs to be
As an aside, I hope more packages can work toward an iterator interface rather than a fixed number of samples. But that's more of a community concern, very hard to enforce. |
I decided to separate this out from the question of whether we want to drop the use of regular arrays or switch to TupleVectors/StructArrays. Would we want to switch to DimArrays (or potentially AxisKeys, although that has some problems)?