EBNF for customasm syntax #139

parasyte · 2022-07-10T20:58:30Z

The customasm metalanguage has a syntax that can be described in EBNF notation, but I have been unable to find any existing language description at that level of detail. An EBNF grammar is essentially a shorthand description of the parser.

It would help to rule out any syntactic ambiguities, especially as the language evolves. More importantly, I think it would be useful for understanding the existing syntax rules. For instance, it could explain why asm block parameters typed by a subrule need enclosing braces (but parameters that are numerically typed must not have enclosing braces): https://github.com/hlorenzi/customasm/wiki/Advanced-rules#asm-blocks

Examples should also be tested against the EBNF so that it stays in sync with the parser.


hlorenzi · 2022-07-14T00:09:35Z

How would we go about describing instruction invocations? They're not context-free -- they're completely mysterious until it's matching time, and are parsed in context of each possible instruction. Meaning if two possible instructions have expression slots in different spots, what is considered an expression is going to change for the same invocation.

For example:

#ruledef
{
  ld   {a}, x + 1 => 0x11
  ld x + 1,   {a} => 0x22
}

x = 0
ld x + 1, x + 1 ; invocation

It's undefined what the invocation syntax is until the parsing algorithm runs, which will try to parse it twice: one pass for each rule you declared beforehand. When x + 1 is specified verbatim in an instruction's pattern, it's not parsed as an expression -- it's simply parsed as a sequence of characters (currently, not even as proper tokens!).
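To make the "one pass per rule" idea concrete, here is a minimal Python sketch. It is not how customasm is implemented; `compile_pattern`, `match_all`, and the whitespace handling are assumptions made for illustration. Literal pattern text is matched character by character, and each `{name}` slot captures whatever text is left over.

```python
import re

# The two rules from the example above, as (pattern, output) pairs.
RULES = [
    ("ld   {a}, x + 1", 0x11),
    ("ld x + 1,   {a}", 0x22),
]

def compile_pattern(pattern):
    """Build a regex from a rule pattern (illustrative only).

    `{name}` becomes a named capture slot; every other non-space
    character must appear verbatim, with optional whitespace between
    characters (an assumption of this sketch).
    """
    pieces = re.split(r"\{(\w+)\}", pattern)  # odd indices are slot names
    regex = r"\s*"
    for i, piece in enumerate(pieces):
        if i % 2 == 1:
            regex += f"(?P<{piece}>.+?)"
        else:
            for ch in piece:
                if not ch.isspace():
                    regex += re.escape(ch) + r"\s*"
    return re.compile(regex)

def match_all(invocation):
    """Try the invocation against every rule; report where each slot landed."""
    results = []
    for pattern, output in RULES:
        m = compile_pattern(pattern).fullmatch(invocation)
        if m:
            spans = {name: m.span(name) for name in m.groupdict()}
            results.append((output, spans))
    return results

print(match_all("ld x + 1, x + 1"))
```

In this sketch the same invocation matches both rules, but the slot `a` covers a different stretch of characters in each match, which is the context dependence described above.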

With that in mind, do you still think it would make sense to keep an EBNF grammar around? Maybe for the other parts of the language?

The reason asm block parameters need enclosing braces is to enable the assembler to perform substitution token-for-token, without syntactic context -- since braces are some of the only tokens not allowed to be part of an instruction's pattern, it's easy to spot them in a context-free manner.
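A rough Python sketch of token-for-token substitution follows. The tokenizer and the `substitute` helper are hypothetical and much simpler than anything in customasm; the point is only that a `{`, name, `}` triple can be spotted without knowing anything about the surrounding syntax.

```python
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")

def substitute(asm_line, args):
    """Replace each `{name}` triple with the tokens of the argument text.

    Bare names without braces are left untouched, so no syntactic
    context is needed to decide what gets replaced.
    """
    tokens = TOKEN.findall(asm_line)
    out = []
    i = 0
    while i < len(tokens):
        is_slot = (
            tokens[i] == "{"
            and i + 2 < len(tokens)
            and tokens[i + 2] == "}"
            and tokens[i + 1] in args
        )
        if is_slot:
            out.extend(TOKEN.findall(args[tokens[i + 1]]))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out

print(substitute("ld r, {r}", {"r": "x + 1"}))
```

Only the braced `{r}` is replaced here; the bare `r` stays as written, because nothing in the token stream marks it as a parameter.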

Now, the reason you can also specify numerical asm block parameters without the braces is kind of an oversight of mine -- behavior from before I realized you need token-for-token substitution to cover all cases. Behavior which maybe should be deprecated? All types of parameters should work fine with enclosing braces anyway, albeit changing the semantics a little.

parasyte · 2022-07-15T03:40:26Z

My argument is that there is a grammar for the metalanguage. It might look something like this, just kind of making it up:

letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
       | "H" | "I" | "J" | "K" | "L" | "M" | "N"
       | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
       | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
       | "c" | "d" | "e" | "f" | "g" | "h" | "i"
       | "j" | "k" | "l" | "m" | "n" | "o" | "p"
       | "q" | "r" | "s" | "t" | "u" | "v" | "w"
       | "x" | "y" | "z" ;
nonzero digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit = nonzero digit | "0" ;
binary digit = "0" | "1" ;
octal digit = binary digit | "2" | "3" | "4" | "5" | "6" | "7" ;
hex digit = digit | "A" | "B" | "C" | "D" | "E" | "F"
       | "a" | "b" | "c" | "d" | "e" | "f" ;
character = letter | digit | "_" ;

binary = "0b", binary digit, { binary digit } ;
octal = "0o", octal digit, { octal digit } ;
decimal = "0" | nonzero digit, { digit } ;
hex = "0x", hex digit, { hex digit } ;

identifier = ( letter | "_" ), { character } ;
number = [ "-" ], ( binary | octal | decimal | hex ) ;
string = '"', { all characters - '"' }, '"' ;

ruledef directive = "#ruledef", white space, [ identifier ], white space, ruledef arguments ;
ruledef arguments = "{", match expression, { match expression }, "}" ;
match expression = match rule, match body ;
match rule = white space, { all characters }, "=>", white space ;
match body = expression | expressions ;

expressions = "{", white space, expression, { white space, expression }, white space, "}" ;

white space = ? white space characters ? ;
all characters = ? all visible characters ? ;

With this, define what an expression is and you have a good starting point for a grammar to write #ruledef directives, identifiers, numbers, and strings.
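As a sketch of how examples could be checked against such a grammar, the lexical productions can be hand-translated into Python regular expressions. The names `IDENTIFIER`, `NUMBER`, `STRING`, and `classify` are made up for this illustration. Two assumptions go beyond the grammar text: a lone `0` is accepted as a decimal (the earlier example writes `x = 0`), and a string may contain any character other than `"`.

```python
import re

# identifier = ( letter | "_" ), { character } ;
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# number = [ "-" ], ( binary | octal | decimal | hex ) ;
NUMBER = re.compile(r"-?(?:0b[01]+|0o[0-7]+|0x[0-9A-Fa-f]+|0|[1-9][0-9]*)")

# string = '"', { all characters - '"' }, '"' ;
STRING = re.compile(r'"[^"]*"')

def classify(text):
    """Return the name of the production that matches `text` in full, or None."""
    for name, pattern in (
        ("number", NUMBER),
        ("identifier", IDENTIFIER),
        ("string", STRING),
    ):
        if pattern.fullmatch(text):
            return name
    return None

for sample in ["0x1F", "-0b101", "_start", "9lives", '"hi"']:
    print(sample, "->", classify(sample))
```

A table of wiki examples run through something like `classify` would be one cheap way to notice when the written grammar and the parser drift apart.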

I don't think it's worth trying to define the grammar of the instructions defined inside #ruledef, which seems to be where you are getting stuck. It's enough to understand the grammar at a higher level.

> When x + 1 is specified verbatim in an instruction's pattern, it's not parsed as an expression -- it's simply parsed as a sequence of characters (currently, not even as proper tokens!).

That's perfectly fine! The grammar for the metalanguage should specify this and that solves it.

> Now, the reason you can also specify numerical asm block parameters without the braces is kind of an oversight of mine -- behavior from before I realized you need token-for-token substitution to cover all cases. Behavior which maybe should be deprecated? All types of parameters should work fine with enclosing braces anyway, albeit changing the semantics a little.

This would probably be nice to address. AFAIK, wrapping integer-typed parameters in braces in the asm context does not work.

hlorenzi · 2023-05-03T02:04:41Z

I think the confusion with the asm blocks will mostly be resolved with the next release I'm working on, where all arguments can be specified with braces within the asm block. Feel free to open this again if you still think the EBNF is worth it!

parasyte · 2023-05-03T02:53:30Z

I think some specification of the meta language syntax is still important even if it is not EBNF. For instance when I wrote a syntax definition for Sublime Text, I didn't have a great resource for defining the parser. It is mostly just an approximation based on the wiki and empirical observation.

Cf. #105 (comment)

parasyte mentioned this issue Jul 12, 2022

Language Server for VSCode or other popular IDE's #105

Open

hlorenzi added the enhancement label Jul 14, 2022

hlorenzi closed this as not planned May 3, 2023
