choice randomization: better approximation of JR behaviour, fixes #49 #241

brontolosone · 2024-10-14T16:00:35Z

Closes #49

I have verified this PR works in these browsers (latest versions):

Chromium
Firefox

Some related problems remain to be solved:

String to number conversion: Differences between JavaRosa and Webforms #240
Seed for randomization derived from non-numeric-looking string is always 0 javarosa#800

This brings what Web Forms does more in line with what JavaRosa does (barring #240), and as such fixes the immediate problem of #49.

I felt it was worth it to be verbose with the comments here, so check those out.

This story is not over yet. Depending on whether we deem it OK to change the seed derivation algorithm, I'd like to make it agnostic of value type and length: just hash the input in its textual form and derive a seed from that hash (see getodk/javarosa#800). In that case, this code will need to be altered again.

changeset-bot · 2024-10-14T16:00:38Z

🦋 Changeset detected

Latest commit: 6628199

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
| --- | --- |
| @getodk/xpath | Patch |


eyelidlessness

This is great. I really appreciate how the commentary here tells the why story!

As discussed a bit in Slack, I think a couple of adjustments would make the change clearer, and would either make some of the commentary unnecessary or make the remaining commentary more useful.

1. There's a clear "JavaRosa compatibility" responsibility here. While it is inherently coupled to the `seededRandomize` implementation, it is also very specifically a mapping to a more general concept: `longValue`. I think it would help immensely for future understanding of what's going on here if we make that an explicit function, with the same name.
2. In general, I've found liberal use of JSDoc comments (i.e. `/** ... */`) really helpful. The comment style provides support for all sorts of editor functionality.

   Here, we'd get a lot of benefit from inline linking support (i.e. `{@link $URL_OR_REFERENCE}` and/or `{@link $URL_OR_REFERENCE | more specific title}`). In particular, I think a permalink to the JavaRosa `resolveRandomSeed` method would be useful, and probably also links to the pertinent issues. This ties directly to point 1: giving the JavaRosa-compatible thing a name corresponding to the Java thing it emulates also gives a clear place for that JSDoc to reference it and clarify the nuances it's addressing.
3. I think a few tweaks to this comment would be pretty much perfect.

   We can eliminate the divergences from JavaRosa by using BigInt values for several of these cases. This, combined with their usage in a clearly articulated `longValue` JR-equivalent, would also eliminate the need for commentary on those cases. In my local exploration of this, what I found made the most sense with the least fuss was to change `type Int = number` to `type Int = bigint | number`, then have the `longValue` equivalent also produce that `Int` type. Pretty much everything else that would need to change falls out of that (i.e. any mixed-type operators producing fractional values do the appropriate explicit `Number()` casts to preserve those mathematical semantics).

   It'd also be useful for the `Infinity`/`-Infinity` cases to be bound to constants with clear names. Insofar as there's still benefit to commentary on those, JSDoc on those constants is a good place.

Aside from making some of the intent clearer here, I suspect we may find there are other edge cases where we want to cordon off JavaRosa-compat/Java-isms in a general and reusable way. Even if that seems like a premature abstraction, doing it in this case is a direct, 1:1 linkable reference to the existing abstraction we'll be emulating.
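The suggested explicit `longValue` equivalent, with named constants for the `Infinity`/`-Infinity` bounds, could look roughly like the sketch below. It assumes the target semantics are those of Java's `Double.longValue()` (a narrowing double-to-long conversion: NaN becomes 0, fractions truncate toward zero, out-of-range values saturate); the constant names are hypothetical, and for simplicity it returns a plain `bigint` rather than the `Int = bigint | number` union.

```typescript
/** Java's `Long.MAX_VALUE` (2^63 - 1). */
const LONG_MAX_VALUE = 2n ** 63n - 1n;

/** Java's `Long.MIN_VALUE` (-2^63). */
const LONG_MIN_VALUE = -(2n ** 63n);

/** 2^63 as a double (exactly representable): the edge of the `long` range. */
const TWO_POW_63 = 2 ** 63;

/**
 * Hypothetical equivalent of Java's `Double.longValue()`: NaN maps to 0,
 * finite values are truncated toward zero, and anything outside the `long`
 * range (including Infinity and -Infinity) saturates to the nearest bound.
 */
const longValue = (value: number): bigint => {
	if (Number.isNaN(value)) return 0n;
	if (value >= TWO_POW_63) return LONG_MAX_VALUE; // also covers Infinity
	if (value <= -TWO_POW_63) return LONG_MIN_VALUE; // also covers -Infinity
	return BigInt(Math.trunc(value));
};

longValue(-3.9); // -3n
```

Returning `bigint` everywhere keeps the full 64-bit range exact; callers that need a fractional result would cast back with `Number()`, as the comment describes.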

Edit: oh, and this definitely feels like it deserves a changeset.

brontolosone · 2024-10-24T10:23:11Z

This needs updates for:

* incorporating the fallback for non-numeric-looking seed nodes to hash-based seed derivation of [hash un-numeric input when used as PRNG seed, fixes #800 javarosa#801](https://github.com/getodk/javarosa/pull/801)
* incorporating the followup thereof; the behaviour for zero-length strings of [Extend randomize seed tests javarosa#805](https://github.com/getodk/javarosa/pull/805)

Drafting!


brontolosone · 2024-12-16T21:12:45Z

This needs updates for:

* incorporating the fallback for non-numeric-looking seed nodes to hash-based seed derivation of [hash un-numeric input when used as PRNG seed, fixes #800 javarosa#801](https://github.com/getodk/javarosa/pull/801)

* incorporating the followup thereof; the behaviour for zero-length strings of [Extend randomize seed tests javarosa#805](https://github.com/getodk/javarosa/pull/805)

Done!

eyelidlessness

Thanks! I think this is really close. Most of my remaining feedback is around code clarity (naming, separation of responsibilities, accessibility and applicability of comments).

packages/xpath/src/functions/xforms/node-set.ts

eyelidlessness · 2024-12-17T19:24:55Z

packages/xpath/src/functions/xforms/node-set.ts

+function toBigIntHash(text: string): bigint {
+	// hash text with sha256, and interpret the first 64 bits of output
+	// (the first and second int32s ("words") of CryptoJS digest output)
+	// as a BigInt. Thus the entropy of the hash is reduced to 64 bits, which
+	// for some applications is sufficient.
+	// The underlying representations are big-endian regardless of the endianness
+	// of the machine this runs on, as is the equivalent JavaRosa implementation
+	// at https://github.com/getodk/javarosa/blob/ab0e8f4da6ad8180ac7ede5bc939f3f261c16edf/src/main/java/org/javarosa/xpath/expr/XPathFuncExpr.java#L718-L726


Suggested change

-function toBigIntHash(text: string): bigint {
-	// hash text with sha256, and interpret the first 64 bits of output
-	// (the first and second int32s ("words") of CryptoJS digest output)
-	// as a BigInt. Thus the entropy of the hash is reduced to 64 bits, which
-	// for some applications is sufficient.
-	// The underlying representations are big-endian regardless of the endianness
-	// of the machine this runs on, as is the equivalent JavaRosa implementation
-	// at https://github.com/getodk/javarosa/blob/ab0e8f4da6ad8180ac7ede5bc939f3f261c16edf/src/main/java/org/javarosa/xpath/expr/XPathFuncExpr.java#L718-L726
+/**
+ * Hash text with sha256, and interpret the first 64 bits of output (the first
+ * and second int32s ("words") of CryptoJS digest output) as a BigInt. Thus the
+ * entropy of the hash is reduced to 64 bits, which for some applications is
+ * sufficient. The underlying representations are big-endian regardless of the
+ * endianness of the machine this runs on, as is the
+ * {@link https://github.com/getodk/javarosa/blob/ab0e8f4da6ad8180ac7ede5bc939f3f261c16edf/src/main/java/org/javarosa/xpath/expr/XPathFuncExpr.java#L718-L726 | equivalent JavaRosa implementation}.
+ */
+const toBigIntHash = (text: string): bigint => {

As a JSDoc comment, this allows the same documentation to be accessed at the call site.

Switching to an arrow function is somewhat of a nit, but it's generally preferable to avoid unnecessary `function`-keyword functions, as they have confusing behavior. (Maybe that's also something we could lint?)
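For reference, the hashing described in that suggestion can be sketched with Node's built-in `crypto` module in place of CryptoJS (a swapped-in library, not what the PR uses), written in the JSDoc-plus-arrow-function style suggested above. It assumes the text is hashed as UTF-8 and that the first 8 digest bytes are read as a big-endian, signed 64-bit value (the range of a Java `long`); the PR's exact handling of the sign when combining the two CryptoJS words is not shown here, so treat the signedness as an assumption.

```typescript
import { createHash } from 'node:crypto';

/**
 * Hash `text` (as UTF-8) with SHA-256 and interpret the first 64 bits of the
 * digest as a big-endian, signed 64-bit integer. The remaining 192 bits of the
 * digest are discarded.
 */
const toBigIntHash = (text: string): bigint => {
	const digest = createHash('sha256').update(text, 'utf8').digest();
	// Buffer#readBigInt64BE reads 8 bytes as big-endian two's complement.
	return digest.readBigInt64BE(0);
};

// SHA-256("abc") starts with ba7816bf8f01cfea, which is negative as a signed long.
toBigIntHash('abc') === BigInt.asIntN(64, 0xba7816bf8f01cfean); // true
```

Reading a fixed big-endian layout makes the result independent of the host machine's endianness, which is the property the comment calls out.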

ea5c499 removes the function keyword.

As for multiline comments: I don't like them. My editor is not supremely ergonomic with them, especially with the decorative `*` in front of each line. Those, anyway, diminish the advantages of multiline comments: now one has to prefix each line with `*` instead of `//`, PLUS still manage the actual comment start and end markers. How is that a win over plain and simple `//` line comments, I wonder?
GitHub is also not super smart with them: look at the "keyword" syntax highlighting it applied to the diff just above! So I don't like to use that comment style myself, but if someone else does, they're welcome to ;-)

As for JSDoc links, I don't like them. They move the description of the link to after the link (cf. Markdown). So to read what the link is doing there, what it's for, I first need to scan to the end of a long URL. The hypothetical usability gain is that if you have an IDE that is smart with specifically JSDoc comments, you can click the link? Copy-pasting isn't so bad, and anyway most things (my editor, my terminal) already make http(s) URLs clickable (or ctrl-clickable). Not worth the disruption of the natural text reading flow to me, but I won't complain if someone else makes {@link https://asdsadsoewofihewofbcewnco.ewewrewrewrewconlnzc.cwefewr.few.cpoqwjeansls | these kind of links}, I just won't emit them myself ;-)

Less is more!

I'll try to make the case for JSDoc. If you're open to reconsidering here, that would be excellent. If not, I think we should come back to this discussion as a team.

JSDoc comments are a standard designed to encode structured documentation about any symbol they're attached to.

The editor support alone goes well beyond linking to URLs. For example, the ability to reference documentation across modules is invaluable. Linking to other symbols (both within and across modules) is also invaluable, both as a navigation tool and because such links can be kept up to date as those symbols change.

Beyond editor support: as a standard for structured documentation associated with symbols, JSDoc can also be used to generate documentation output. We are already using this to generate documentation for @getodk/xforms-engine, and I'd quite like to expand that to other use cases.

I share your distaste for some of the syntax minutiae of JSDoc, and @link is particularly weird (I suspect this is because inline tags are relatively rare). But that distaste doesn't outweigh the overwhelming benefits of an extensible documentation standard which is widely adopted in tooling we already use. It is also widely adopted throughout this project, and across the ecosystem; which is to say, it is both locally and globally idiomatic.

I also noticed GitHub's odd presentation in a couple of diff suggestions in this PR. It's worth noting that:

* That's not representative of how GitHub presents JSDoc in complete source
* It's not representative of how GitHub presents JSDoc in diffs broadly
* GitHub's syntax highlighting is notoriously inconsistent across various views
* The highlighting is applied to an incomplete (i.e. syntactically invalid) chunk of code, which likely exacerbates potential issues

Lastly, I am sensitive to poor authoring ergonomics. I'm a bit surprised to hear that your editor doesn't make adding/editing JSDoc comments easier than single line comments, as that's my experience in the editors I'm familiar with. If this is a major hangup, I'd be happy to help look into ways to make the authoring experience nicer for you.

I'll make the change hoping that the benefits will become apparent at some point 😆

packages/xpath/src/lib/collections/sort.ts

packages/xpath/test/xforms/randomize.test.ts

eyelidlessness · 2024-12-17T19:38:14Z

packages/xpath/src/functions/xforms/node-set.ts

-		return seededRandomize(nodes, seed);
+		if (seedExpression === undefined) return seededRandomize(nodes);
+		const seed = seedExpression.evaluate(context);
+		const asNumber = seed.toNumber(); // TODO: There are some peculiarities to address: https://github.com/getodk/web-forms/issues/240


I'm not sure this comment belongs here. It isn't specific to this cast, it's specific to casting to XPath number throughout. Fine to leave since we have an issue tracking it, but we'll probably just find it went stale some time after we address the issue.

This is intended for someone reading the randomization code when trying to figure out why WF and JR still produce different sort orders. If it goes stale (when the issue is resolved) then following the link to the issue will make that apparent. I don't see a big problem.

eyelidlessness · 2024-12-17T19:41:13Z

packages/xpath/src/functions/xforms/node-set.ts

+		let finalSeed: number | bigint | undefined;
+		if (Number.isNaN(asNumber)) {
+			// Specific behaviors for when a seed value is not interpretable as numeric.
+			// We still want to derive a seed in those cases, see https://github.com/getodk/javarosa/issues/800
+			const seedString = seed.toString();
+			if (seedString === '') {
+				finalSeed = 0; // special case: JR behaviour
+			} else {
+				// any other string, we'll convert to a number via a digest function
+				finalSeed = toBigIntHash(seedString);
+			}
+		} else {
+			finalSeed = asNumber;
+		}


Doesn't "special case: JR behavior" apply to all of this?

Not really. Some of the behaviour is in the ODK spec. The "zero-length string becomes 0" behaviour was surprising, though.
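Putting the branches from that diff together, seed resolution can be sketched as a single function over the seed's string value. This is a simplified sketch under several assumptions: `xpathStringToNumber` is a hypothetical stand-in for XPath 1.0 string-to-number conversion (the real `seed.toNumber()` has the peculiarities tracked in #240); hashing uses Node's `crypto` rather than CryptoJS; and the numeric branch goes through a `longValue`-style conversion, whereas the diff passes the number on to `seededRandomize` unchanged.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical simplification of XPath 1.0 string-to-number conversion:
// optional XML whitespace, an optional '-', then digits with an optional
// fraction (or a fraction alone), then optional whitespace. Anything else
// (including the empty string) is NaN.
const XPATH_NUMBER = /^[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*$/;

const xpathStringToNumber = (text: string): number =>
	XPATH_NUMBER.test(text) ? Number(text) : NaN;

// SHA-256 digest, first 64 bits read as a big-endian signed integer.
const toBigIntHash = (text: string): bigint =>
	createHash('sha256').update(text, 'utf8').digest().readBigInt64BE(0);

// Java-style double-to-long narrowing: NaN -> 0, truncate, saturate.
const longValue = (value: number): bigint => {
	if (Number.isNaN(value)) return 0n;
	if (value >= 2 ** 63) return 2n ** 63n - 1n;
	if (value <= -(2 ** 63)) return -(2n ** 63n);
	return BigInt(Math.trunc(value));
};

const resolveSeed = (seedText: string): bigint => {
	const asNumber = xpathStringToNumber(seedText);
	if (Number.isNaN(asNumber)) {
		// Special case mirroring JavaRosa: a zero-length string yields seed 0.
		if (seedText === '') return 0n;
		// Any other non-numeric-looking string: derive the seed from a digest.
		return toBigIntHash(seedText);
	}
	return longValue(asNumber);
};

resolveSeed(''); // 0n
resolveSeed(' 42.9 '); // 42n
```

Only the empty-string branch is a JavaRosa-specific quirk; the digest fallback for other non-numeric strings is the behaviour introduced in javarosa#801.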


brontolosone requested a review from eyelidlessness October 14, 2024 16:00

brontolosone marked this pull request as draft October 14, 2024 17:40

brontolosone force-pushed the 49_randomization_seed_inputs branch 2 times, most recently from 69be0c6 to 6432f97 on October 15, 2024 09:26

brontolosone marked this pull request as ready for review October 15, 2024 09:53

eyelidlessness requested changes Oct 16, 2024


brontolosone marked this pull request as draft October 24, 2024 10:23

brontolosone added 2 commits December 10, 2024 14:49

choice randomization: better approximation of JR behaviour, fixes getodk#49

265d0d5

thanks for the diff noise, prettier

159f06c

brontolosone force-pushed the 49_randomization_seed_inputs branch from 706f380 to 159f06c on December 10, 2024 14:50

link to org.javarosa.core.model.ItemsetBinding.resolveRandomSeed() sourcecode

bf0008c

brontolosone force-pushed the 49_randomization_seed_inputs branch 2 times, most recently from c435390 to b7cce27 on December 16, 2024 20:10

simplify: make ParkMiller PRNG accept a BigInt seed, fix -Infinity

2bda8af
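The commit above makes the ParkMiller PRNG accept a BigInt seed, which lets a 64-bit, hash-derived seed be used directly. A minimal sketch of such a generator follows, assuming the classic "minimal standard" Park-Miller parameters (multiplier 16807, modulus 2^31 - 1). The class shape and the way negative, zero, or oversized seeds are folded into range are assumptions of this sketch, not the Web Forms code.

```typescript
/** Modulus of the "minimal standard" Park-Miller generator: 2^31 - 1. */
const MODULUS = 2147483647n;

/** Multiplier of the "minimal standard" generator. */
const MULTIPLIER = 16807n;

class ParkMiller {
	private state: bigint;

	constructor(seed: bigint) {
		// Fold any (possibly 64-bit or negative) seed into 0..MODULUS-1;
		// `%` on bigints keeps the sign of the dividend, hence the + MODULUS.
		const folded = ((seed % MODULUS) + MODULUS) % MODULUS;
		// 0 is a fixed point of the generator, so remap it (arbitrary choice).
		this.state = folded === 0n ? 1n : folded;
	}

	/** Next integer in 1..MODULUS-1. */
	nextInt(): bigint {
		this.state = (this.state * MULTIPLIER) % MODULUS;
		return this.state;
	}

	/** Next float in [0, 1); the conversion is exact since values fit in 31 bits. */
	nextFloat(): number {
		return Number(this.nextInt() - 1n) / Number(MODULUS - 1n);
	}
}

new ParkMiller(1n).nextInt(); // 16807n
```

Keeping the state as a `bigint` avoids precision loss in `state * MULTIPLIER`; only the fractional output is cast back to `number`.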

brontolosone force-pushed the 49_randomization_seed_inputs branch 2 times, most recently from 5fa9d75 to 9bb403c on December 16, 2024 21:02

derive a numeric seed from non-numeric-looking seed strings via digest

c83fef2

brontolosone force-pushed the 49_randomization_seed_inputs branch from 9bb403c to c83fef2 on December 16, 2024 21:08

brontolosone requested a review from eyelidlessness December 16, 2024 21:12

brontolosone marked this pull request as ready for review December 17, 2024 10:20

Create strange-brooms-rush.md

954f9e9

eyelidlessness requested changes Dec 17, 2024


brontolosone and others added 3 commits December 18, 2024 08:59

the whitespace is telling

761d7da

cleanup

9b7ffc9

Co-authored-by: eyelidlessness <[email protected]>

const-ize toBigIntHash

ea5c499

brontolosone requested a review from eyelidlessness December 18, 2024 15:18

address various PR comments

cb6a021

brontolosone force-pushed the 49_randomization_seed_inputs branch from b36d0a1 to cb6a021 on December 18, 2024 15:19

jsdocify

6628199


brontolosone commented Oct 14, 2024

changeset-bot bot commented Oct 14, 2024 (edited)

eyelidlessness left a comment (edited)

brontolosone commented Oct 24, 2024

brontolosone commented Dec 16, 2024

eyelidlessness left a comment

eyelidlessness Dec 17, 2024

brontolosone Dec 18, 2024

eyelidlessness Dec 18, 2024

brontolosone Dec 20, 2024

brontolosone Jan 8, 2025

eyelidlessness Dec 17, 2024

brontolosone Dec 18, 2024

eyelidlessness Dec 17, 2024

brontolosone Dec 18, 2024
