User Story:
Currently, pipelines are mainly set to emit using method='xml' or method='text' as appropriate, and in some cases to indent outputs. The output encoding is typically left at its default, which in practice means encoding=utf-8, so outputs are written as UTF-8.
Can and should any of this be exposed better or parameterized? Either of the following scenarios is plausible:
A schema or XSLT is wanted in ASCII, with numeric character references standing in for characters outside ASCII, such as &#x2014; for the em dash (Unicode U+2014), so that the file is easier to consume with tools that handle ASCII but not UTF-8.
A schema is wanted in UTF-16, simply for compactness when its text content is largely in a non-European script; CJK characters, for example, take two bytes each in UTF-16 but three in UTF-8.
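To make the two scenarios concrete, here is a minimal sketch using Python's standard library rather than the XSLT/XProc pipelines themselves. The element name and sample strings are made up for illustration. It shows what an ASCII serialization with character references looks like, and how UTF-8 and UTF-16 byte counts compare for Latin and CJK text.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of schema documentation containing an em dash (U+2014).
note = ET.Element("documentation")
note.text = "required \u2014 see below"

# Scenario 1: serializing to US-ASCII replaces every character outside ASCII
# with a numeric character reference; UTF-8 writes the character directly.
ascii_bytes = ET.tostring(note, encoding="us-ascii")
utf8_bytes = ET.tostring(note, encoding="utf-8")

# Both serializations parse back to the same text, so nothing is lost.
roundtrip_ascii = ET.fromstring(ascii_bytes).text
roundtrip_utf8 = ET.fromstring(utf8_bytes).text

# Scenario 2: compare encoded sizes for illustrative Latin and CJK strings.
# "utf-16-le" is used so that no byte order mark is counted.
samples = {"latin": "schema", "cjk": "文字コード"}
sizes = {
    label: (len(text.encode("utf-8")), len(text.encode("utf-16-le")))
    for label, text in samples.items()
}

print(ascii_bytes)
print(utf8_bytes)
for label, (n_utf8, n_utf16) in sizes.items():
    print(f"{label}: UTF-8 {n_utf8} bytes, UTF-16 {n_utf16} bytes")
```

Note that XML markup itself is ASCII, so any size advantage of UTF-16 depends on how much of a given artifact is text content rather than element and attribute names.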
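If not with switches or controls at the interface, these settings can be made in the XSLTs or XProcs themselves.
E.g. when calling Saxon from the command line, serialization parameters are passed in the form !name=value, so the switch would be !encoding=ASCII (note that the ! may need quoting or escaping in some shells).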
Additionally, and separate from the question of how to (re)set the encoding, should output encoding settings be 'ASCII' by default? ASCII-amenability could be a big plus with little downside in expressiveness, since any character outside ASCII can still be written as a numeric character reference.
Goals:
Minimally: test and document the presently available options for controlling serialization settings, including character encoding and markup indenting.
Consider also updating to 'ASCII' outputs for some artifacts such as XML schemas and XSLTs produced by XSLT or XProc.
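As one possible shape for the "test" part of the first goal, the sketch below checks that a produced artifact declares the expected encoding, that its bytes are valid in that encoding, and that it is still well-formed. The function names and the sample document are hypothetical. It only handles ASCII-compatible encodings such as US-ASCII and UTF-8, since it reads the XML declaration as single-byte text.

```python
import re
import xml.etree.ElementTree as ET


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = re.match(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return match.group(1).decode("ascii") if match else None


def check_artifact(data: bytes, expected: str) -> bool:
    """True if data declares `expected`, decodes under it, and is well-formed."""
    declared = declared_encoding(data)
    if declared is None or declared.lower() != expected.lower():
        return False
    try:
        data.decode(expected)  # every byte must be valid in the encoding
        ET.fromstring(data)    # the document must still parse
    except (UnicodeDecodeError, ET.ParseError):
        return False
    return True


# Hypothetical artifact: ASCII output with the em dash as a character reference.
good = b'<?xml version="1.0" encoding="US-ASCII"?><doc>required &#8212; see below</doc>'

# Hypothetical faulty artifact: declares US-ASCII but contains raw UTF-8 bytes.
bad = '<?xml version="1.0" encoding="US-ASCII"?><doc>\u2014</doc>'.encode("utf-8")

print(check_artifact(good, "US-ASCII"))
print(check_artifact(bad, "US-ASCII"))
```

A check along these lines could run against the schemas and XSLTs the pipelines emit, whichever encoding ends up being the default.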
Dependencies:
None known. Also not known is whether, or where, any improvement would have much impact.
Acceptance Criteria
All website and readme documentation affected by the changes in this issue has been updated. Changes to the website can be made in the docs/content directory of your branch.
A Pull Request (PR) is submitted that fully addresses the goals of this User Story. This issue is referenced in the PR.
The CI-CD build process runs without any reported errors on the PR. This can be confirmed by reviewing that all checks have passed in the PR.