-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add schema validation code to C generator #1098
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 👍 just a bunch of minor comments/suggestions
|
||
Reformat source and sbt files if necessary: | ||
|
||
sbt scalafmtAll scalafmtSbt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding commands for what you use to format h C code?
Maybe for a future PR, it might be worth adding an sbt task that executes those commands with the right options (e.g. sbt cfmtAll
) so we don't have to remember them. Could also be useful to add a step in our github actions lint job to run that and error if git diff reports and changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can write a sbt task which runs clang-format on the C files and errors if it finds and reports any diffs. However, some people may get an error running the task (especially if the task's purpose is to run all the language formatters from a single command) since clang-format is not commonly installed on many (most?) machines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. As long as the error is something useful that should be fine. I recommend doing it later though, doesn't have to be part of this PR.
|
||
sbt scalafmtAll scalafmtSbt | ||
|
||
### Build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we need another build section for building and locally publishing the jars? Anyone developing something integrating Daffodil may want that. Maybe something like this?
Build the Daffodil jars and publish to the a Maven or Ivy repository
Maven:
sbt publishM2
Ivy:
sbt publishLocal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll document the publish commands too and explain Ivy is also used by sbt to build schema-only projects.
break; | ||
case 's': | ||
// Ignore "-s schema" option/optarg | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this, why of options to just ignore them? Seems that could be very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A workflow I sometimes use with code generators, DFDL schemas, and TDML tests is to run daffodil generate
on the DFDL schemas, run a pair of daffodil parse
and c/daffodil parse
commands on the same data files, and check that they both produce the same exit status and similar infosets. This manual workflow is finer-grained than and sometimes easier to debug than running daffodil test
on TDML tests using the same data and infosets. I thought it could be nice if I could run the same c/daffodil parse
command as the daffodil parse
command by appending only c/
to daffodil
without having to change any of the options.
I thought I could hide this behavior from users (the usage message doesn't mention the -r and -s options) while making life easier for developers and testers who know about this hidden behavior for greater compatibility between daffodil
and c/daffodil
. I can revert this source file if you think users would be surprised and confused that c/daffodil
allows -r and -s options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, makes sense. As long as the optinos aren't in the usage I think it's fine. But it would be worth adding that reasoning as a comment.
@@ -42,7 +42,7 @@ extern struct daffodil_parse_cli | |||
const char *infoset_converter; | |||
const char *infile; | |||
const char *outfile; | |||
bool validate; | |||
bool validate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change 👍 . I've never been a huge fan of aligning variable names in C.
#include "errors.h" // for continue_or_exit, print_diagnostics, Error, Diagnostics, Error::(anonymous) | ||
#include "infoset.h" // for get_infoset, walk_infoset, PState, UState, parse_data, unparse_infoset, InfosetBase, VisitEventHandler | ||
#include "errors.h" // for continue_or_exit, Diagnostics, print_diagnostics, Error | ||
#include "infoset.h" // for ParserOrUnparserState, PState, UState, get_infoset, walk_infoset, parse_data, unparse_infoset, InfosetBase, VisitEventHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, do you find it useful to include the functions each header includes? Seems like it's just noise to me. I guess this is auto-generated so not a big deal, but personally I think it would be nice if this could be disabled. Especially since some of these lines get pretty long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just add a comment above the auto-maintained part indicating that these comments and the include list are auto-maintained by iwyu. I know the whole concept of this was news to me when I first read this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My viewpoint (coming back from developing Java to developing C) is that it would be nice if C header lines tell you which symbols they import just like Java import lines tell you which symbols they import. However, I can tell iwyu not to create these comments if you all don't like them. Then the remaining use case for iwyu is to sort and canonicalize C header lines just like Java/Scala tools sort and canonicalize Java/Scala import lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, I've added a // auto-maintained by iwyu
comment to each iwyu-maintained part.
if (e.typeDef.optRestriction.exists(_.hasEnumeration)) { | ||
// Split the enumeration values into an array of strings ready to be inserted into the C code | ||
val enumerationValues = e.typeDef.optRestriction.flatMap(_.enumerationValues).get | ||
val enums = enumerationValues.split('|').map(_.replaceAll("\\\\(\\W)", "$1")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe Daffodil generates a regex to check enumerations, so you're having undo that regex? Instead, I think you can access the raw enum values like this:
val enums = e.typeDef.optRestriction.get.enumerations.map(_.enumValueRaw)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try that expression, thanks.
s"{$arraysName[$index], ${s.length / 2}, false}" | ||
} | ||
val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", ", "}")) | ||
val enumsLenMax = enums.map(_.length / 2).max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest hexEnumsLenMax
since this is specific to hex? Or move all the hex specific stuff to the HexBinary cases so we don't have to calculate this stuff where it doesn't apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First change is easier, so will accept that change.
val (enumsInit, extraInit, primType, valType, fieldArg) = e.optPrimType.get match { | ||
case PrimType.Double | PrimType.Float => (enums, "", "double", "floatpt", field) | ||
case PrimType.HexBinary => (hexEnums, arraysInit, "HexBinary", "hexbinary", s"&$field") | ||
case _ => (enums, "", "int64_t", "integer", field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there value in listing the integer types? I imagine this failing on things like Date/Time/Integer primitives. THough I guess those primitives would fail somewhere much earlier since they aren't supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I ever checked dates, times, or arbitrary length integers longer than 64 bits. I am sure they would fail somewhere else too since the code generator doesn't support them at all (it won't be able to find the right primitive C type, for example).
"{", | ||
", ", | ||
"}", | ||
)};\n""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any concerns that there might be a bunch of hex binary enums with one being much larger than the other, and so this array would have a lot of empty space? I guess the other option would be to create a bunch of separate variables (e.g. hexEnum1, hexEnum2, hexEnum3, ... 0), maybe that's not worth it. Or alternatively, concat the hex arrays into a single long array, and then then hexEnums have to correctly index into that array? Maybe that's too much complexity for a case that isn't that likely? I imagine most hex enums are all the same length, or at least close in length.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never seen enumeration facets on hexBinary, so I wouldn't sweat it. I had to check the DFDL spec to see if they were even allowed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not seen any schemas with hex binary enums yet, so let's put off any changes until we see hex enums actually used. I only found out through experimentation that daffodil supports hex enums and thought daffodilC should try to support them too. I think your imagination that most hex enums are close in length is probably right anyway.
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc: XMLSchemaDocument) | |||
final override lazy val optLexicalParent = Some(xmlSDoc) | |||
final override lazy val optXMLSchemaDocument = Some(xmlSDoc) | |||
|
|||
final lazy val version = (xml \ "@version").text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know XSD had a version attribute. Coudld be useful!
Though, I wonder if this should be an Option so we can tell if it's provided or not? In the code gen case, it can just to optVersion.getOrElse("")
so it wouldn't change much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere on WWW is a diatribe on why using namespace URIs with versions in them is so wrong wrong wrong as a versioning means for XSD schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you change the line to return an Option[String]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere on WWW is a diatribe on why using namespace URIs with versions in them is so wrong wrong wrong as a versioning means for XSD schemas.
The built-in XML Schema version attribute is orthogonal to and separate from any namespace URI. We plan to take advantage of it to assign semantic versions (strings like "3.6.0") to our filters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you change the line to return an Option[String]?
(xml \ "@version")
returns a NodeSeq
, which has an implicit conversion to Seq[Node]
, so you can use any Seq
functions on it. So something like this should work:
final lazy val optVersion = (xml \ "@version").headOption.map(_.text)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alas, I merged the PR before I checked my emails and saw your reply, It's too late to put the change into the PR now, but let's keep it in mind for another PR:
final lazy val optVersion = (xml \ "@version").headOption.map(_.text)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This is an amazing change set. Rather big, and long to review, but really great.
I have a bunch of minor suggestions and a few questions that would be good to understand better, but generally speaking really good.
|
||
sudo dnf install clang gcc git java-11-openjdk-devel llvm make mxml-devel pkgconf | ||
sudo dnf install clang gcc git iwyu java-11-openjdk-devel llvm make mxml-devel pkgconf | ||
|
||
If you want to use clang instead of gcc, you'll have to set your |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: why give people the choice?
We don't have to provide this flexibility unless we or users depend on it for some reason.
This works, so "it aint broke" so no need to change anything. I just wanted to comment that we don't really have to provide options like this unless they're really really required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're talking about the paragraph If you want to use clang instead of gcc, ...
, I originally added that paragraph because you told me you didn't want to require GNU-licensed tools to develop or use any part of Daffodil. I even made sure that the GitHub Action workflow (main.yml) would set these environment variables to build Daffodil with clang and lvvm-ar instead of gcc and ar. I can reword the paragraph to say you can set these environment variables to use clang and llvm-ar instead of gcc and ar if you have a policy forbidding GNU-licensed tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, gcc is GPLv3 which is category X, so I had concerns about implying it was required. I did find this open Jira LEGAL issue that discusses this:
Sounds like the resolution it's it's fine to use. I'd suggest we still do mention clang support so that people can use it if they want and don't have to figure it all out. But maybe that's an addendum or something, and everything else implies the use of gcc if the goal is to simplify this?
#include "errors.h" // for continue_or_exit, print_diagnostics, Error, Diagnostics, Error::(anonymous) | ||
#include "infoset.h" // for get_infoset, walk_infoset, PState, UState, parse_data, unparse_infoset, InfosetBase, VisitEventHandler | ||
#include "errors.h" // for continue_or_exit, Diagnostics, print_diagnostics, Error | ||
#include "infoset.h" // for ParserOrUnparserState, PState, UState, get_infoset, walk_infoset, parse_data, unparse_infoset, InfosetBase, VisitEventHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just add a comment above the auto-maintained part indicating that these comments and the include list are auto-maintained by iwyu. I know the whole concept of this was news to me when I first read this code.
@@ -21,7 +21,7 @@ | |||
#include <stdbool.h> // for bool, false, true | |||
#include <stdio.h> // for fread, fgetc, EOF | |||
#include <stdlib.h> // for free, malloc | |||
#include "errors.h" // for Error, eof_or_error, Error::(anonymous), ERR_LEFTOVER_DATA, add_diagnostic, get_diagnostics, ERR_ARRAY_BOUNDS, ERR_FIXED_VALUE, ERR_HEXBINARY_ALLOC, ERR_PARSE_BOOL, Diagnostics | |||
#include "errors.h" // for Error, eof_or_error, ERR_LEFTOVER_DATA, Error::(anonymous), ERR_HEXBINARY_ALLOC, ERR_PARSE_BOOL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. iwyu specified what "errors.h" was needed for here, but above where SL commented on ERR_ZZZ there's no such comment on the include of "errors.h". What are the rules here about what iwyu maintains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you're noticing is that iwyu never puts a comment on a C source file's own header file. That is, parsers.c "uses" a few symbols from errors.h, but errors.c "implements" nearly all of errors.h's symbols. Even iwyu put a comment anyway, the comment would be too long trying to list all the header symbols implemented by the source file.
const Error *error; // any error which stops parse | ||
uint8_t unreadBits; // any buffered bits not read yet | ||
uint8_t numUnreadBits; // number of buffered bits not read yet | ||
ParserOrUnparserState pu; // common mutable state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yessssss. embedded by value, not by reference. 👍
void | ||
parse_check_bounds(const char *name, size_t count, size_t minOccurs, size_t maxOccurs, PState *pstate) | ||
parse_fill_bits(size_t end_bitPos0b, PState *pstate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name confused me. It has to be "fill" as in short for the alignmentFill region.
Do you mind changing the name to parse_alignment_bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've changed the name.
|
||
<!-- Representation property bindings --> | ||
<!-- Network order binary format (net:format) --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know folks at Sun (back in the day) and IETF wanted to define "Network Byte Order", but they failed because Intel little-endian CPUs took over the world for a while and everybody serialized data little endian because it was easiest. Nobody sees the TCP/IP headers of packets. (or cares) They see their payload data, which is little-endian a huge fraction of the time.
So please say big or little endian here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, saying Network order big endian format (net:format)
.
limitations under the License. | ||
--> | ||
|
||
<!-- Network order binary format: <https://en.wikipedia.org/wiki/Endianness#Networking> --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really. Just tell people this is big endian.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, Network order big endian format.
<annotation> | ||
<appinfo source="http://www.ogf.org/dfdl/"> | ||
<dfdl:defineFormat name="format"> | ||
<dfdl:format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really should define byteOrder here as bigEndian and bitOrder as mostSignificantBitFirst, because it matters in order to understand this format.
DFDLGeneralFormat.dfdl.xsd is not supposed to change, but it's also supposed to be mostly so you don't have to remember obscure properties Daffodil requires that your format otherwise wouldn't bother to mention.
For properties that really define the essence of the format, you should have them explicitly regardless of whether they match what is in DFDLGeneralFormat or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, adding byteOrder as bigEndian and bitOrder as mostSignificantBitFirst to the format.
</simpleType> | ||
<simpleType name="simple-hexBinaryPrefixed" | ||
dfdl:prefixLengthType="si:prefixedCount" | ||
dfdl:lengthKind="prefixed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: there was a DFDL clarification recently that a prefixedLengthType can have facets, and those are checked so as to be able to have say, a 32-bit integer as a lengthKind, but specify only sane values like < 1Million and even minimum lengths are to be checked.
This is super necessary if reading from a TCP socket. Last thing you want to do is a blocking read for 4Gigs of data.
Is generated C code going to implement this prefix value checking with your range validators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know about this requirement until now. It would be a good idea to implement prefix value checking, but not in this PR (going on vacation soon).
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc: XMLSchemaDocument) | |||
final override lazy val optLexicalParent = Some(xmlSDoc) | |||
final override lazy val optXMLSchemaDocument = Some(xmlSDoc) | |||
|
|||
final lazy val version = (xml \ "@version").text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere on WWW is a diatribe on why using namespace URIs with versions in them is so wrong wrong wrong as a versioning means for XSD schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, see responses below.
|
||
Reformat source and sbt files if necessary: | ||
|
||
sbt scalafmtAll scalafmtSbt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can write a sbt task which runs clang-format on the C files and errors if it finds and reports any diffs. However, some people may get an error running the task (especially if the task's purpose is to run all the language formatters from a single command) since clang-format is not commonly installed on many (most?) machines.
|
||
sbt scalafmtAll scalafmtSbt | ||
|
||
### Build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'll document the publish commands too and explain Ivy is also used by sbt to build schema-only projects.
break; | ||
case 's': | ||
// Ignore "-s schema" option/optarg | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A workflow I sometimes use with code generators, DFDL schemas, and TDML tests is to run daffodil generate
on the DFDL schemas, run a pair of daffodil parse
and c/daffodil parse
commands on the same data files, and check that they both produce the same exit status and similar infosets. This manual workflow is finer-grained than and sometimes easier to debug than running daffodil test
on TDML tests using the same data and infosets. I thought it could be nice if I could run the same c/daffodil parse
command as the daffodil parse
command by appending only c/
to daffodil
without having to change any of the options.
I thought I could hide this behavior from users (the usage message doesn't mention the -r and -s options) while making life easier for developers and testers who know about this hidden behavior for greater compatibility between daffodil
and c/daffodil
. I can revert this source file if you think users would be surprised and confused that c/daffodil
allows -r and -s options.
#include "errors.h" // for continue_or_exit, print_diagnostics, Error, Diagnostics, Error::(anonymous) | ||
#include "infoset.h" // for get_infoset, walk_infoset, PState, UState, parse_data, unparse_infoset, InfosetBase, VisitEventHandler | ||
#include "errors.h" // for continue_or_exit, Diagnostics, print_diagnostics, Error | ||
#include "infoset.h" // for ParserOrUnparserState, PState, UState, get_infoset, walk_infoset, parse_data, unparse_infoset, InfosetBase, VisitEventHandler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My viewpoint (coming back from developing Java to developing C) is that it would be nice if C header lines tell you which symbols they import just like Java import lines tell you which symbols they import. However, I can tell iwyu not to create these comments if you all don't like them. Then the remaining use case for iwyu is to sort and canonicalize C header lines just like Java/Scala tools sort and canonicalize Java/Scala import lines.
continue_or_exit(ustate.pu.error); | ||
|
||
// Any diagnostics will fail the unparse if validate mode is on | ||
if (daffodil_parse.validate && ustate.pu.diagnostics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right on both points. It would be even easier to merge daffodil_parse_cli
and daffodil_unparse_cli
into daffodil_pu_cli
than it would be to add validate
to daffodil_unparse_cli
.
I'll wait for Mike and you to respond to the rest of my PR comments before I do another pass on the PR, so please weigh in on how much you want the other suggested changes.
if (e.typeDef.optRestriction.exists(_.hasEnumeration)) { | ||
// Split the enumeration values into an array of strings ready to be inserted into the C code | ||
val enumerationValues = e.typeDef.optRestriction.flatMap(_.enumerationValues).get | ||
val enums = enumerationValues.split('|').map(_.replaceAll("\\\\(\\W)", "$1")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try that expression, thanks.
s"{$arraysName[$index], ${s.length / 2}, false}" | ||
} | ||
val hexEnumsInit = enums.map(_.grouped(2).map("0x" + _).mkString("{", ", ", "}")) | ||
val enumsLenMax = enums.map(_.length / 2).max |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First change is easier, so will accept that change.
"{", | ||
", ", | ||
"}", | ||
)};\n""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not seen any schemas with hex binary enums yet, so let's put off any changes until we see hex enums actually used. I only found out through experimentation that daffodil supports hex enums and thought daffodilC should try to support them too. I think your imagination that most hex enums are close in length is probably right anyway.
val (enumsInit, extraInit, primType, valType, fieldArg) = e.optPrimType.get match { | ||
case PrimType.Double | PrimType.Float => (enums, "", "double", "floatpt", field) | ||
case PrimType.HexBinary => (hexEnums, arraysInit, "HexBinary", "hexbinary", s"&$field") | ||
case _ => (enums, "", "int64_t", "integer", field) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I ever checked dates, times, or arbitrary length integers longer than 64 bits. I am sure they would fail somewhere else too since the code generator doesn't support them at all (it won't be able to find the right primitive C type, for example).
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc: XMLSchemaDocument) | |||
final override lazy val optLexicalParent = Some(xmlSDoc) | |||
final override lazy val optXMLSchemaDocument = Some(xmlSDoc) | |||
|
|||
final lazy val version = (xml \ "@version").text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you change the line to return an Option[String]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for the comments, responses below.
<annotation> | ||
<appinfo source="http://www.ogf.org/dfdl/"> | ||
<dfdl:defineFormat name="format"> | ||
<dfdl:format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, adding byteOrder as bigEndian and bitOrder as mostSignificantBitFirst to the format.
limitations under the License. | ||
--> | ||
|
||
<!-- Network order binary format: <https://en.wikipedia.org/wiki/Endianness#Networking> --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, Network order big endian format.
|
||
<!-- Representation property bindings --> | ||
<!-- Network order binary format (net:format) --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, saying Network order big endian format (net:format)
.
</simpleType> | ||
<simpleType name="simple-hexBinaryPrefixed" | ||
dfdl:prefixLengthType="si:prefixedCount" | ||
dfdl:lengthKind="prefixed" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know about this requirement until now. It would be a good idea to implement prefix value checking, but not in this PR (going on vacation soon).
@@ -212,6 +212,8 @@ final class SchemaDocument private (xmlSDoc: XMLSchemaDocument) | |||
final override lazy val optLexicalParent = Some(xmlSDoc) | |||
final override lazy val optXMLSchemaDocument = Some(xmlSDoc) | |||
|
|||
final lazy val version = (xml \ "@version").text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere on WWW is a diatribe on why using namespace URIs with versions in them is so wrong wrong wrong as a versioning means for XSD schemas.
The built-in XML Schema version attribute is orthogonal to and separate from any namespace URI. We plan to take advantage of it to assign semantic versions (strings like "3.6.0") to our filters.
Modify C generator to compare floating point and integer numbers with enumerations and ranges specified in DFDL schemas. Test validation code with additional simple type root elements in simple.dfdl.xsd schema and additional TDML tests in simple_errors.tdml. Also allow C generator to compare hexBinary elements to values in enumerations since Daffodil does the same thing and C generator already compares hexBinary elements to values in fixed attributes. Allow C generator to ignore dfdl:assert expressions in DFDL schemas and generate code anyway instead of throwing an exception. Allow C generator to get schema version from DFDL schemas and put it into the generated C code to version the generated code as well. Found out Daffodil's default alignment properties (alignment="1" and alignmentUnits="bytes") makes Daffodil append extra fill bits to elements with odd sizes not a multiple of 8 bits. Because compatibility between Daffodil and DaffodilC depends on defining both dfdl:alignment="1" (default) and dfdl:alignmentUnits="bits" (non-default), define a known good binary data format in one place (network/format.dfdl.xsd) and include it in rest of daffodil-codegen-c's test schemas. Finally, make changes requested by PR review. DAFFODIL-2853 BUILD.md: Document how to install iwyu and libcriterion-dev (used only for Daffodil C code generator development and maintenance). README.md: Document how to check formatting of or reformat Daffodil. Also document how to put the Daffodil jars in the maven and ivy caches. c/files/.clang-format: Update to clang-format 14. Format declarations more concisely by removing AlignConsecutiveDeclarations (this is why some reformatted C files lose some extra whitespace). c/files/Makefile: Shorten iwyu's maximum line length from 999 to 111. c/files/**/**.[ch]: Add "auto-maintained by iwyu" comment to headers. c/files/libcli/cli_errors.[ch]: Rename CLI_ZZZ, ERR_ZZZ, and FIELD_ZZZ to CLI__NUM_CODES, ERR__NUM_CODES, and FIELD__NO_ARGS. c/files/libcli/daffodil_getopt.[ch]: Merge daffodil_parse_cli and daffodil_unparse_cli structs and objects into daffodil_pu_cli and daffodil_pu. Skip and ignore -r and -s options so one can run "c/daffodil parse" with the same options as "daffodil parse -r root -s schema -o outfile infile" without having to remove these options. c/files/libcli/daffodil_main.c: Merge daffodil_parse_cli and daffodil_unparse_cli structs and objects into daffodil_pu_cli and daffodil_pu. Initialize ParserOrUnparserState fields as well as PState/UState fields. Make sure any diagnostics will fail unparse as well as parse if validate mode is on. c/files/libruntime/errors.[ch]: Add ERR_ENUM_MATCH and ERR_OUTSIDE_RANGE error messages to diagnose elements not matching any of their enumerations or having values outside their allowed ranges. Rename ERR_ZZZ and FIELD_ZZZ to ERR__NUM_CODES and FIELD__NO_ARGS. Rename ERR_ENUM_MATCH, ERR_FIXED_VALUE, and ERR_OUTSIDE_RANGE to ERR_RESTR_ENUM, ERR_RESTR_FIXED, and ERR_RESTR_RANGE. Remove `daffodil_program_version` since we use `daffodil_version` in daffodil_version.h instead. c/files/libruntime/infoset.h: Move common PState/UState fields to ParseOrUnparseState to allow parse and unparse functions to call common validate functions. c/files/libruntime/parsers.[ch]: Use ParserOrUnparserState fields as well as PState/UState fields. Replace parse_check_bounds and parse_validate_fixed functions with validate_array_bounds and validate_fixed_attribute functions in validators.[ch]. Rename parse_align and parse_fill_bits to parse_align_to and parse_alignment_bits. c/files/libruntime/unparsers.[ch]: Use ParserOrUnparserState fields as well as UState fields. Replace unparse_check_bounds and unparse_validate_fixed functions with validate_array_bounds and validate_fixed_attribute functions in validators.[ch]. Rename unparse_align and unparse_fill_bits to unparse_align_to and unparse_alignment_bits. c/files/libruntime/validators.[ch]: Move common validation functions here from parsers.[ch] and unparsers.[ch]. Add new float, hexBinary, int, and universal validation functions to check that elements match their allowed enumerations (requires passing array of floating point numbers, hexBinary structs, or integer numbers) and fit within their allowed ranges. Validation functions create diagnostics instead of errors; the CLI or a caller is responsible for printing the diagnostics and exiting with the appropriate status code. c/files/tests/bits.c: Use ParserOrUnparserState fields as well as PState/UState fields. c/files/tests/extras.c: Regenerate iwyu comments. c/DaffodilCCodeGenerator.scala: Accept but ignore dfdl:assert statements (for now). c/DaffodilCExamplesGenerator.scala: Generate code from simple's all-in-one root element in order to show generated code for all simple types, enums, and ranges. c/generators/AlignmentFillCodeGenerator.scala: Rename parse_align and unparse_align to parse_align_to and unparse_align_to. Use ParserOrUnparserState fields as well as PState/UState fields. c/generators/BinaryBooleanCodeGenerator.scala: Use ParserOrUnparserState fields as well as PState/UState fields. c/generators/BinaryValueCodeGenerator.scala: Generate C code to validate enumerations and ranges of primitive elements. Get raw enumeration values into a Seq[String]. Avoid unsigned >= 0 comparisons to prevent gcc warnings. Use ParserOrUnparserState fields as well as PState/UState fields. Call correct function to validate enums depending on element's primType. Generate extra C initialization code for hexBinary enumerations to define arrays of hexBinary structs and pass them to validate_hexbinary_enumeration. c/generators/CodeGeneratorState.scala: Remove unnecessary immutable (built-in Set is immutable). Use ParserOrUnparserState fields as well as PState/UState fields. Rename parse_fill_bits and unparse_fill_bits to parse_alignment_bits and unparse_alignment_bits. Get actual schema version from Daffodil and assign its value to `schema_version` in generated_code.c. Declare `schema_version` in generated_code.h. Call validate_array_bounds instead of parse/unparse_check_bounds. Make cStructFieldAccess work correctly for DFDL expressions like "/rr:ReqstReply/..." used by NFS schemas. c/generators/HexBinaryCodeGenerator.scala: Use ParserOrUnparserState fields as well as PState/UState fields. Call validate_fixed_attribute instead of parse/unparse_validate_fixed. examples/**/generated_code.[ch]: Regenerate to show changes in C generator such as adding new "auto-maintained by iwyu" comment, calls to renamed functions such as parse_align_to and unparse_align_to, defining schema_version, using pu fields, and calling validation functions. c/data/simple*.dat: Remove (contents now inside simple.tdml). c/ex_nums.dfdl.xsd: Define schema version to be "1.0.2". Include network format instead of defining own binary format. Adjust elements' own format properties as needed to match original data file (explicitness is better than using defaults). c/infosets/simple*.xml: Remove (contents now inside simple.tdml). c/nested.dfdl.xsd: Include network format instead of defining own binary format. c/network/format.dfdl.xsd: Define network format in only one place, making sure to use vitally needed alignment properties but keep the format as minimal as possible to make it easier to include in a wide variety of schemas. Change "Network order binary format" comment to "Network order big endian format" as requested. Define bitOrder and byteOrder explicitly in format schema instead of inheriting them from DFDLGeneralFormat schema. c/padtest.dfdl.xsd: Include network format instead of defining own binary format. c/simple.dfdl.xsd: Define schema version to be "1.0.0". Include network format instead of defining own binary format. Add new elements to test enumerations and ranges of primitive types. Use custom simple types in order to define simple type root elements, enumerations of simple type root elements, ranges of simple type elements, and all-in-one root element as compactly as possible. c/simple.tdml: Make comments clearer how to run tests and include all data and infosets in test cases instead of using files. c/simple_errors.tdml: Make comments clearer how to run tests and add new test cases to validate enumerations and ranges of primitive types. c/variablelen.dfdl.xsd: Include network format instead of defining own binary format. core/dsom/SchemaDocument.scala: Modify SchemaDocument to capture schema versions from XML schema definitions and provide to codegen-c. tdml/TestDaffodilC.scala: Add future-proofing test to check test schema compiles without any warnings (would catch "relative location deprecated" warning if DFDLGeneralFormat url was still relative). Fix inconsistent use of "dp"/"tdp" and "isError"/"isProcessingError". c/TestSimpleErrors.scala: Call new test cases in simple_errors.tdml.
20b9d96
to
604d509
Compare
Modify C generator to compare floating point and integer numbers with enumerations and ranges specified in DFDL schemas. Test validation code with additional simple type root elements in simple.dfdl.xsd schema and additional TDML tests in simple_errors.tdml.
Also allow C generator to compare hexBinary elements to values in enumerations since Daffodil does the same thing and C generator already compares hexBinary elements to values in fixed attributes.
Allow C generator to ignore dfdl:assert expressions in DFDL schemas and generate code anyway instead of throwing an exception.
Allow C generator to get schema version from DFDL schemas and put it into the generated C code to version the generated code as well.
Found out Daffodil's default alignment properties (alignment="1" and alignmentUnits="bytes") makes Daffodil append extra fill bits to elements with odd sizes not a multiple of 8 bits. Because compatibility between Daffodil and DaffodilC depends on defining both dfdl:alignment="1" (default) and dfdl:alignmentUnits="bits" (non-default), define a known good binary data format in one place (network/format.dfdl.xsd) and include it in rest of daffodil-codegen-c's test schemas.
DAFFODIL-2853
BUILD.md: Document how to install iwyu and libcriterion-dev (used only for Daffodil C code generator development and maintenance).
README.md: Document how to check formatting of or reformat Daffodil.
c/files/.clang-format: Update to clang-format 14. Format declarations more concisely by removing AlignConsecutiveDeclarations (this is why some reformatted C files lose some extra whitespace).
c/files/libcli/daffodil_getopt.c: Skip and ignore -r and -s options so one can run "c/daffodil parse" with the same options as "daffodil parse -r root -s schema -o outfile infile" without having to remove these options.
c/files/libcli/daffodil_main.c: Initialize ParserOrUnparserState fields as well as PState/UState fields. Make sure any diagnostics will fail unparse as well as parse if validate mode is on.
c/files/libruntime/errors.[ch]: Add ERR_ENUM_MATCH and ERR_OUTSIDE_RANGE error messages to diagnose elements not matching any of their enumerations or having values outside their allowed ranges. Remove
daffodil_program_version
since we usedaffodil_version
in daffodil_version.h instead.c/files/libruntime/infoset.h: Move common PState/UState fields to ParseOrUnparseState to allow parse and unparse functions to call common validate functions.
c/files/libruntime/parsers.[ch]: Use ParserOrUnparserState fields as well as PState/UState fields. Replace parse_check_bounds and parse_validate_fixed functions with validate_array_bounds and validate_fixed_attribute functions in validators.[ch].
c/files/libruntime/unparsers.[ch]: Use ParserOrUnparserState fields as well as UState fields. Replace unparse_check_bounds and unparse_validate_fixed functions with validate_array_bounds and validate_fixed_attribute functions in validators.[ch].
c/files/libruntime/validators.[ch]: Move common validation functions here from parsers.[ch] and unparsers.[ch]. Add new float, hexBinary, int, and universal validation functions to check that elements match their allowed enumerations (requires passing array of floating point numbers, hexBinary structs, or integer numbers) and fit within their allowed ranges. Validation functions create diagnostics instead of errors; the CLI or a caller is responsible for printing the diagnostics and exiting with the appropriate status code.
c/files/tests/bits.c: Use ParserOrUnparserState fields as well as PState/UState fields.
c/files/tests/extras.c: Regenerate iwyu comments.
c/DaffodilCCodeGenerator.scala: Accept but ignore dfdl:assert statements (for now).
c/DaffodilCExamplesGenerator.scala: Generate code from simple's all-in-one root element in order to show generated code for all simple types, enums, and ranges.
c/generators/AlignmentFillCodeGenerator.scala: Use ParserOrUnparserState fields as well as PState/UState fields.
c/generators/BinaryBooleanCodeGenerator.scala: Use ParserOrUnparserState fields as well as PState/UState fields.
c/generators/BinaryValueCodeGenerator.scala: Generate C code to validate enumerations and ranges of primitive elements. Remove escape characters in enumeration values added by Facets.escapeForRegex (turn "1.0" back into "1.0"). Avoid unsigned >= 0 comparisons to prevent gcc warnings. Use ParserOrUnparserState fields as well as PState/UState fields. Call correct function to validate enums depending on element's primType. Generate extra C initialization code for hexBinary enumerations to define arrays of hexBinary structs and pass them to validate_hexbinary_enumeration.
c/generators/CodeGeneratorState.scala: Remove unnecessary immutable (built-in Set is immutable). Use ParserOrUnparserState fields as well as PState/UState fields. Get actual schema version from Daffodil and assign its value to
schema_version
in generated_code.c. Declareschema_version
in generated_code.h. Call validate_array_bounds instead of parse/unparse_check_bounds. Make cStructFieldAccess work correctly for DFDL expressions like "/rr:ReqstReply/..." used by NFS schemas.c/generators/HexBinaryCodeGenerator.scala: Use ParserOrUnparserState fields as well as PState/UState fields. Call validate_fixed_attribute instead of parse/unparse_validate_fixed.
examples/**/generated_code.[ch]: Regenerate to show changes in C generator such as defining schema_version, using pu fields, and calling validation functions.
c/data/simple*.dat: Remove (contents now inside simple.tdml).
c/ex_nums.dfdl.xsd: Define schema version to be "1.0.2". Include network format instead of defining own binary format. Adjust elements' own format properties as needed to match original data file (explicitness is better than using defaults).
c/infosets/simple*.xml: Remove (contents now inside simple.tdml).
c/nested.dfdl.xsd: Include network format instead of defining own binary format.
c/network/format.dfdl.xsd: Define network format in only one place, making sure to use vitally needed alignment properties but keep the format as minimal as possible to make it easier to include in a wide variety of schemas.
c/padtest.dfdl.xsd: Include network format instead of defining own binary format.
c/simple.dfdl.xsd: Define schema version to be "1.0.0". Include network format instead of defining own binary format. Add new elements to test enumerations and ranges of primitive types. Use custom simple types in order to define simple type root elements, enumerations of simple type root elements, ranges of simple type elements, and all-in-one root element as compactly as possible.
c/simple.tdml: Make comments clearer how to run tests and include all data and infosets in test cases instead of using files.
c/simple_errors.tdml: Make comments clearer how to run tests and add new test cases to validate enumerations and ranges of primitive types.
c/variablelen.dfdl.xsd: Include network format instead of defining own binary format.
core/dsom/SchemaDocument.scala: Modify SchemaDocument to capture schema versions from XML schema definitions and provide to codegen-c.
tdml/TestDaffodilC.scala: Add future-proofing test to check test schema compiles without any warnings (would catch "relative location deprecated" warning if DFDLGeneralFormat url was still relative). Fix inconsistent use of "dp"/"tdp" and "isError"/"isProcessingError".
c/TestSimpleErrors.scala: Call new test cases in simple_errors.tdml.