Skip to content

Commit

Permalink
Reduce copies in VariableMap implementation
Browse files Browse the repository at this point in the history
The VariableMap currently uses a Scala Map to look up variable
instances, where the key is the QName of the variable and the value is
the variable instance stack.

One consideration with the data structure to be used is that it must be
frequently copied each time a suspension is created. So although fast
lookups are important, so are fast copies. And unfortunately, Map have
relatively expensive copies. A large backing array must be created,
buckets allocated, keys must be rehashed, etc. This overhead is enough
that it can be a significant contribution when profiling, especially in
formats with lots of suspensions.

To fix this, this replaces the Map with an Array, and each
VariableRuntimeData is assigned an index into this array. Now, instead
of looking up a variable by its QName, we just get its index from the
VRD and access that index in the array. This has the same constant time
lookup, but array copies are faster and require fewer and smaller
allocations.

Similarly, during suspensions we must also create an immutable copy of
each variable instance inside the array so that suspensions do not see
new variables instances. Variable instances are currently represented as
an ArrayBuffer, which has noticeable overhead when copying. Since this
ArrayBuffer is just used as a stack, this replaces it with a Seq. This
gives us similar stack-like behavior, but is immutable so copies are
free. This does mean however that new Seq's must be allocated if we
create a lot of newVariableInstances, but that is rare enough, and
still fairly fast, that the benefit of preallocating an ArrayBuffer
stack does not have significant benefits.

Note that the Seq of VariableRuntimeData is added as a member to the
VariableMap since in some cases (e.g. debugging, setting external
variables), we do need a way to lookup the VRD by QName. This will be
relatively slow, so any performance critical sections should find the
VRD during compilation and use that during runtime.

This also removes the VariableMapFactory, it doesn't add much value
except a level of indirection.

Also removes the allExternalVariables val. Nothing uses it and
VariableMap handles checking external variables itself by inspecting the
VariableRuntimeData. We don't need this and it is trivial enough to
implement if something else needs it.

In a large schema with small files and lots of suspensions, this saw
around a 50% improvement in unparse performance.

DAFFODIL-2852
  • Loading branch information
stevedlawrence committed Oct 20, 2023
1 parent 716e533 commit 3111a1d
Show file tree
Hide file tree
Showing 9 changed files with 183 additions and 210 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -571,12 +571,9 @@ final class SchemaSet private (
Seq(encDFV, boDFV, binDFV, outDFV)
}

lazy val allDefinedVariables = schemas.flatMap {
_.defineVariables
}
lazy val allExternalVariables = allDefinedVariables.filter {
_.external
}
lazy val allDefinedVariables = schemas
.flatMap(_.defineVariables)
.union(predefinedVars)

private lazy val checkUnusedProperties = LV('hasUnusedProperties) {
root.checkUnusedProperties
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@
package org.apache.daffodil.core.runtime1

import org.apache.daffodil.core.dsom.SchemaSet
import org.apache.daffodil.core.grammar.VariableMapFactory
import org.apache.daffodil.lib.util.Logger
import org.apache.daffodil.runtime1.api.DFDL
import org.apache.daffodil.runtime1.processors.DataProcessor
Expand All @@ -36,11 +35,8 @@ trait SchemaSetRuntime1Mixin {
requiredEvaluationsAlways(root.elementRuntimeData.initialize)

override lazy val variableMap: VariableMap = LV('variableMap) {
val dvs = allSchemaDocuments.flatMap {
_.defineVariables
}
val alldvs = dvs.union(predefinedVars)
val vmap = VariableMapFactory.create(alldvs)
val vrds = allDefinedVariables.map { _.variableRuntimeData }
val vmap = VariableMap(vrds)
vmap
}.value

Expand Down

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ trait DFDLDefineVariableRuntime1Mixin { self: DFDLDefineVariable =>
this.namedQName.asInstanceOf[GlobalQName],
this.primType,
this.tunable.unqualifiedPathStepPolicy,
this.schemaSet.allDefinedVariables.indexOf(this),
)
vrd
}
Expand Down Expand Up @@ -97,6 +98,7 @@ trait DFDLNewVariableInstanceRuntime1Mixin { self: DFDLNewVariableInstance =>
defv.namedQName.asInstanceOf[GlobalQName],
defv.primType,
this.tunable.unqualifiedPathStepPolicy,
this.schemaSet.allDefinedVariables.indexOf(defv),
)
vrd
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,6 @@ import scala.xml._

import org.apache.daffodil.core.compiler.Compiler
import org.apache.daffodil.core.dsom._
import org.apache.daffodil.core.grammar.VariableMapFactory
import org.apache.daffodil.io.InputSourceDataInputStream
import org.apache.daffodil.lib.Implicits._
import org.apache.daffodil.lib.api._
Expand Down Expand Up @@ -389,7 +388,7 @@ class Fakes private () {
override def isError: Boolean = false

override def tunables: DaffodilTunables = DaffodilTunables()
override def variableMap: VariableMap = VariableMapFactory.create(Nil)
override def variableMap: VariableMap = VariableMap(Nil)
override def validationMode: ValidationMode.Type = ValidationMode.Full

override def withExternalVariables(extVars: Seq[Binding]): DFDL.DataProcessor = this
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -980,6 +980,7 @@ final class VariableRuntimeData(
val globalQName: GlobalQName,
val primType: NodeInfo.PrimType,
unqualifiedPathStepPolicyArg: UnqualifiedPathStepPolicy,
val vmapIndex: Int,
) extends NonTermRuntimeData(
null, // no variable map
schemaFileLocationArg,
Expand Down
Loading

0 comments on commit 3111a1d

Please sign in to comment.