Skip to content

Commit

Permalink
Optimization: std.sort should only evaluate keyF once per array eleme…
Browse files Browse the repository at this point in the history
…nt (#245)

This PR improves the performance of `std.sort` and related functions
when `keyF` is used.

The existing implementation evaluates `keyF` multiple times per input
element because:

- Prior to sorting, it evaluates `keyF` on all elements to check whether
all keys are of the same type.
- During sorting, it re-evaluates `keyF` on every pair of compared
elements.

In the best case (an already-sorted array), this performs ~3x more
evaluations than needed because each element participates in up to two
extra unnecessary comparisons. In the worst case, we have to do
additional comparisons during sorting and the unnecessary work will be
even higher.

### The fix

The fix:

- Precompute all keys up front (which we already do for type-checking
purposes).
- Sort an array of indices using a comparator which fetches their
corresponding precomputed keys
- Use the sorted indices to project out the array values in the correct
order

I also made a few other small improvements:

- Explicitly error out when trying to sort arrays of booleans: neither
jsonnet nor go-jsonnet supports this. The existing sjsonnet code didn't
either, but failed with a confusing `"Cannot sort with key values that
are not all the same type"` error because `Val.True` and `Val.False` are
different classes. The existing code which did class equality checks on
`Val.Bool` would never match because `Val.Bool` is an abstract class.
- Avoid allocating a `keyTypes` set: we can simply check that all other
elements match the first type's element. This saves some garbage
allocations in the common case.
- Move `.force` calls earlier so that we don't have to call them in the
sort comparator.

### Benchmarking results

Consider the following toy benchmark case:

```jsonnet
local largeArr = [
  {
    complexKey: { a: i, b: i+1, c: i+2 },
    value: "val" + i
  }
  for i in std.range(0, 9999)
];

local sortedArr = std.sort(
  largeArr,
  keyF=function(x) std.toString(x.complexKey)
);

{ sortedArrSample: sortedArr[0:5] }
```

With the `RunProfiler` we can see an enormous difference in the number
of `std.toString` invocations via the key function: with a standard 5
benchmark runs, we expect to see only 50,000 hits but the old code ran
it `241,210` times!

I also measured performance on one of our real-world jsonnet bundles,
where this PR's optimization cut one expensive target's runtime by 25%.
  • Loading branch information
JoshRosen authored Dec 29, 2024
1 parent 89c04a0 commit 6f4e54b
Show file tree
Hide file tree
Showing 2 changed files with 28 additions and 18 deletions.
41 changes: 23 additions & 18 deletions sjsonnet/src/sjsonnet/Std.scala
Original file line number Diff line number Diff line change
Expand Up @@ -1700,34 +1700,39 @@ class Std(private val additionalNativeFunctions: Map[String, Val.Builtin] = Map.
} else {
val keyFFunc = if (keyF == null || keyF.isInstanceOf[Val.False]) null else keyF.asInstanceOf[Val.Func]
new Val.Arr(pos, if (keyFFunc != null) {
val keys = new Val.Arr(pos.noOffset, vs.map(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev)))
val keyTypes = keys.iterator.map(_.force.getClass).toSet
if (keyTypes.size != 1) {
val keys: Array[Val] = vs.map(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).force)
val keyType = keys(0).getClass
if (classOf[Val.Bool].isAssignableFrom(keyType)) {
Error.fail("Cannot sort with key values that are booleans")
}
if (!keys.forall(_.getClass == keyType)) {
Error.fail("Cannot sort with key values that are not all the same type")
}

if (keyTypes.contains(classOf[Val.Str])) {
vs.sortBy(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).cast[Val.Str].asString)
} else if (keyTypes.contains(classOf[Val.Num])) {
vs.sortBy(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).cast[Val.Num].asDouble)
} else if (keyTypes.contains(classOf[Val.Bool])) {
vs.sortBy(v => keyFFunc(Array(v.force), null, pos.noOffset)(ev).cast[Val.Bool].asBoolean)
val indices = Array.range(0, vs.length)

val sortedIndices = if (keyType == classOf[Val.Str]) {
indices.sortBy(i => keys(i).cast[Val.Str].asString)
} else if (keyType == classOf[Val.Num]) {
indices.sortBy(i => keys(i).cast[Val.Num].asDouble)
} else {
Error.fail("Cannot sort with key values that are " + keys.force(0).prettyName + "s")
Error.fail("Cannot sort with key values that are " + keys(0).prettyName + "s")
}

sortedIndices.map(i => vs(i))
} else {
val keyTypes = vs.map(_.force.getClass).toSet
if (keyTypes.size != 1) {
Error.fail("Cannot sort with values that are not all the same type")
val keyType = vs(0).force.getClass
if (classOf[Val.Bool].isAssignableFrom(keyType)) {
Error.fail("Cannot sort with values that are booleans")
}
if (!vs.forall(_.force.getClass == keyType))
Error.fail("Cannot sort with values that are not all the same type")

if (keyTypes.contains(classOf[Val.Str])) {
if (keyType == classOf[Val.Str]) {
vs.map(_.force.cast[Val.Str]).sortBy(_.asString)
} else if (keyTypes.contains(classOf[Val.Num])) {
} else if (keyType == classOf[Val.Num]) {
vs.map(_.force.cast[Val.Num]).sortBy(_.asDouble)
} else if (keyTypes.contains(classOf[Val.Bool])) {
vs.map(_.force.cast[Val.Bool]).sortBy(_.asBoolean)
} else if (keyTypes.contains(classOf[Val.Obj])) {
} else if (keyType == classOf[Val.Obj]) {
Error.fail("Unable to sort array of objects without key function")
} else {
Error.fail("Cannot sort array of " + vs(0).force.prettyName)
Expand Down
5 changes: 5 additions & 0 deletions sjsonnet/test/src/sjsonnet/StdWithKeyFTests.scala
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,11 @@ object StdWithKeyFTests extends TestSuite {
evalErr("""std.sort([1,2, error "foo"])""").startsWith("sjsonnet.Error: foo"))
assert(
evalErr("""std.sort([1, [error "foo"]])""").startsWith("sjsonnet.Error: Cannot sort with values that are not all the same type"))
// google/go-jsonnet and google/jsonnet also error on sorting of booleans:
assert(
evalErr("""std.sort([false, true])""").startsWith("sjsonnet.Error: Cannot sort with values that are booleans"))
assert(
evalErr("""std.sort([1, 2], keyF=function(x) x == 1)""").startsWith("sjsonnet.Error: Cannot sort with key values that are booleans"))

eval(
"""local arr = [
Expand Down

0 comments on commit 6f4e54b

Please sign in to comment.