Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce copies in VariableMap implementation #1097

Merged

Conversation

stevedlawrence
Copy link
Member

The VariableMap currently uses a Scala Map to look up variable instances, where the key is the QName of the variable and the value is the variable instance stack.

One consideration with the data structure to be used is that it must be frequently copied each time a suspension is created. So although fast lookups are important, so are fast copies. And unfortunately, Map have relatively expensive copies. A large backing array must be created, buckets allocated, keys must be rehashed, etc. This overhead is enough that it can be a significant contribution when profiling, especially in formats with lots of suspensions.

To fix this, this replaces the Map with an Array, and each VariableRuntimeData is assigned an index into this array. Now, instead of looking up a variable by its QName, we just get its index from the VRD and access that index in the array. This has the same constant time lookup, but array copies are faster and require fewer and smaller allocations.

Similarly, during suspensions we must also create an immutable copy of each variable instance inside the array so that suspensions do not see new variables instances. Variable instances are currently represented as an ArrayBuffer, which has noticeable overhead when copying. Since this ArrayBuffer is just used as a stack, this replaces it with a Seq. This gives us similar stack-like behavior, but is immutable so copies are free. This does mean however that new Seq's must be allocated if we create a lot of newVariableInstances, but that is rare enough, and still fairly fast, that the benefit of preallocating an ArrayBuffer stack does not have significant benefits.

Note that the Seq of VariableRuntimeData is added as a member to the VariableMap since in some cases (e.g. debugging, setting external variables), we do need a way to lookup the VRD by QName. This will be relatively slow, so any performance critical sections should find the VRD during compilation and use that during runtime.

This also removes the VariableMapFactory, it doesn't add much value except a level of indirection.

In a large schema with small files and lots of suspensions, this saw around a 50% improvement in unparse performance.

DAFFODIL-2852

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Changes look good.

There is a codecov warning saying that daffodil-runtime1/src/main/scala/org/apache/daffodil/runtime1/processors/VariableMap1.scala#L398-L399 were not covered by tests. This warning may or may not be spurious because these 2 lines are preceded by 2 lines which received no warnings and these 4 lines should have been executed together or not at all.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Some questions, maybe small optimizations, and this race condition thing ... which feels like a separate bug. But nothing that has to get fixed here.

def qnames(): Seq[GlobalQName] = {
vTable.toSeq.map(_._1)
vrds.map { _.globalQName }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a constant, as in could be lazy val?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, will change. Though on second though, maybe one thing to consider is that making things a lazy val will keep the memory around. Memory seems to be our bottle neck, so maybe for things expected to be accessed rarely or only once, it's better to just recalculate it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just in the variable map though? Isn't that a constant data structure?
Either way no biggie. It's documented as linear so all good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll make it a lazy val.

I'm just thinking in general there might be cases where something is a lazy val, but it's only ever used once, so it's not worth saving and taking up memory--keeping it a def might let it be garbage collected sooner. I don't think this is one of those cases and a lazy val makes sense.

The VariableMap currently uses a Scala Map to look up variable
instances, where the key is the QName of the variable and the value is
the variable instance stack.

One consideration with the data structure to be used is that it must be
frequently copied each time a suspension is created. So although fast
lookups are important, so are fast copies. And unfortunately, Map have
relatively expensive copies. A large backing array must be created,
buckets allocated, keys must be rehashed, etc. This overhead is enough
that it can be a significant contribution when profiling, especially in
formats with lots of suspensions.

To fix this, this replaces the Map with an Array, and each
VariableRuntimeData is assigned an index into this array. Now, instead
of looking up a variable by its QName, we just get its index from the
VRD and access that index in the array. This has the same constant time
lookup, but array copies are faster and require fewer and smaller
allocations.

Similarly, during suspensions we must also create an immutable copy of
each variable instance inside the array so that suspensions do not see
new variables instances. Variable instances are currently represented as
an ArrayBuffer, which has noticeable overhead when copying. Since this
ArrayBuffer is just used as a stack, this replaces it with a Seq. This
gives us similar stack-like behavior, but is immutable so copies are
free. This does mean however that new Seq's must be allocated if we
create a lot of newVariableInstances, but that is rare enough, and
still fairly fast, that the benefit of preallocating an ArrayBuffer
stack does not have significant benefits.

Note that the Seq of VariableRuntimeData is added as a member to the
VariableMap since in some cases (e.g. debugging, setting external
variables), we do need a way to lookup the VRD by QName. This will be
relatively slow, so any performance critical sections should find the
VRD during compilation and use that during runtime.

This also removes the VariableMapFactory, it doesn't add much value
except a level of indirection.

Also removes the allExternalVariables val. Nothing uses it and
VariableMap handles checking external variables itself by inspecting the
VariableRuntimeData. We don't need this and it is trivial enough to
implement if something else needs it.

In a large schema with small files and lots of suspensions, this saw
around a 50% improvement in unparse performance.

DAFFODIL-2852
@stevedlawrence stevedlawrence force-pushed the daffodil-2852-variable-map-copies branch from d5ebd71 to 3111a1d Compare October 20, 2023 19:26
@stevedlawrence stevedlawrence merged commit f3fb555 into apache:main Oct 20, 2023
@stevedlawrence stevedlawrence deleted the daffodil-2852-variable-map-copies branch October 20, 2023 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants