-
-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
doc: resolve inconsistency in banned surrogate chars in escaped form #450
Conversation
Also add it explicitly to the spec grammar to make it clear that escaped forms are also affected
I really didn’t want to change the spec at this stage. I think we need to address this with a more explicit errata system that leaves the spec itself intact. The nice thing about this particular change is that most parsers out there will almost certainly already fail if someone tries to do this (except maybe JS parsers that don’t parse docs as actual utf-8). It’s also a significant security issue so we should definitely address it. @eugenesvk what are your thoughts on how to organize errata? Maybe we can add a section to the SPEC.md file for it? Or a separate file? @tabatkins do you have any thoughts/experience with how other standards might handle errata without publishing new versions? My only real experience with it is Common Lisp: https://www.cliki.net/ANSI%20Clarifications%20and%20Errata |
It looks like this is a change in wording only, not in the actual behavior, since the
I think the grammar change also doesn't produce any differences in behavior; the grammar only tells how escapes are analyzed, not what they are resolved into, and the process of resolving escapes is not necessarily total (On an unrelated note, could #449 be merged? it's currently the only test that's incorrect) edit: If an implementation allows surrogates in |
For me errata only makes sense if it has the most recent grammar in its full form as otherwise it'd be confusing trying to parse the rules. And grammar's design should get continuous improvements so that whoever decides later to use it to build some new parser won't have to rediscover the issues (or worse - not discover them). (for example, while updating the syntax highlighter I've WIP converted the grammar to KDL and made the element hierarchy explicit to make it easier to use as a reference) And the current Changelog could continue adding updates similar to how it tracked draft revisions (And yes, the JS parser had this bug, not sure about others as most don't have easily runnable binaries/scripts published) |
Ahhh, so we do specify that already. I’ll let someone else chime in first so it’s not just me but I’m ok with this change. |
Sounds good to me, though I'd like to see a test added as well so other implementations can easily detect if they've made the same mistake as I have. |
added a test separately |
I’ll merge this for now, then |
Me: “I’ll merge this now” |
Also add it explicitly to the spec grammar to make it clear that escaped forms are also affected
Since currently
Disallowed Literal Code Points
text allows surrogates in\u
form, also the spec grammar doesn't specify thatdisallowed-literal-code-points
should be excluded even in their hex code form