EBNF for customasm syntax #139

Closed
parasyte opened this issue Jul 10, 2022 · 4 comments

@parasyte

parasyte commented Jul 10, 2022

The customasm meta language has a syntax that can be described in EBNF notation, which is essentially a shorthand description of the parser. I have been unable to find any existing language description that provides what an EBNF grammar would.

It would help to rule out any syntactic ambiguities, especially as the language evolves. More importantly, I think it would be useful for understanding the existing syntax rules, for instance by describing why asm block parameters typed by a subrule need enclosing braces (while numerically typed parameters must not have them): https://github.com/hlorenzi/customasm/wiki/Advanced-rules#asm-blocks

Examples should also be tested against the EBNF so that it stays in sync with the parser.

@hlorenzi
Owner

How would we go about describing instruction invocations? They're not context-free -- they're completely mysterious until it's matching time, and are parsed in context of each possible instruction. Meaning if two possible instructions have expression slots in different spots, what is considered an expression is going to change for the same invocation.

For example:

#ruledef
{
  ld   {a}, x + 1 => 0x11
  ld x + 1,   {a} => 0x22
}

x = 0
ld x + 1, x + 1 ; invocation

It's undefined what the invocation syntax is until the parsing algorithm runs, which will try to parse it twice: one pass for each rule you declared beforehand. When x + 1 is specified verbatim in an instruction's pattern, it's not parsed as an expression -- it's simply parsed as a sequence of characters (currently, not even as proper tokens!).
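
The "one pass per rule" behavior described above can be illustrated with a toy matcher. This is only a sketch, not how customasm is implemented: `RULES`, `pattern_to_regex`, and `match_all` are hypothetical names, and it assumes that literal pattern text is compared character by character with whitespace ignored, while a `{name}` slot captures whatever text is left over.

```python
import re

# Hypothetical rule table mirroring the #ruledef above: (pattern, output).
RULES = [
    ("ld {a}, x + 1", 0x11),
    ("ld x + 1, {a}", 0x22),
]

def pattern_to_regex(pattern):
    """Build a regex for one rule: {name} becomes a capture group,
    all other text is matched literally, ignoring whitespace."""
    parts = re.split(r"\{(\w+)\}", pattern)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Odd indices are parameter names captured by re.split.
            out.append(f"(?P<{part}>.+)")
        else:
            # Literal text: a plain character sequence, not an expression.
            literal = re.sub(r"\s+", "", part)
            out.append(r"\s*".join(re.escape(ch) for ch in literal))
    return re.compile(r"\s*" + r"\s*".join(out) + r"\s*$")

def match_all(invocation):
    """Try the invocation against every rule; return all that match."""
    results = []
    for pattern, output in RULES:
        m = pattern_to_regex(pattern).match(invocation)
        if m:
            args = {k: v.strip() for k, v in m.groupdict().items()}
            results.append((output, args))
    return results

# Both rules accept the same text, but the slot {a} sits in a different place.
for output, args in match_all("ld x + 1, x + 1"):
    print(hex(output), args)
```

In this sketch the same invocation matches both rules, and which `x + 1` counts as the expression depends entirely on the rule being tried.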

With that in mind, do you still think it would make sense to keep an EBNF grammar around? Maybe for the other parts of the language?

The reason asm block parameters need enclosing braces is to enable the assembler to perform substitution token-for-token, without syntactic context -- since braces are some of the only tokens not allowed to be part of an instruction's pattern, it's easy to spot them in a context-free manner.
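
As a rough illustration of why braces make the substitution context-free, here is a minimal sketch. The tokenizer and the `substitute` helper are hypothetical and much simpler than a real assembler's; the only assumption carried over from the explanation above is that `{` and `}` never appear in an instruction's pattern, so a `{`, name, `}` triple can be spotted without knowing anything about the surrounding syntax.

```python
import re

# Toy tokenizer: braces, words/numbers, and single punctuation characters.
TOKEN = re.compile(r"\{|\}|\w+|[^\s\w{}]")

def tokenize(text):
    return TOKEN.findall(text)

def substitute(body, args):
    """Replace each `{name}` triple with the tokens of its argument.
    Bare occurrences of `name` (without braces) are left untouched."""
    tokens = tokenize(body)
    out = []
    i = 0
    while i < len(tokens):
        is_param = (
            tokens[i] == "{"
            and i + 2 < len(tokens)
            and tokens[i + 2] == "}"
            and tokens[i + 1] in args
        )
        if is_param:
            out.extend(tokenize(args[tokens[i + 1]]))
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out

print(substitute("ld {r}, x + 1", {"r": "a"}))
print(substitute("r {r}", {"r": "b"}))
```

The second call shows the point of the braces: a bare `r` that happens to share the parameter's name is not replaced, only the braced one is.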

Now, the reason you can also specify numerical asm block parameters without the braces is kind of an oversight of mine -- behavior from before I realized you need token-for-token substitution to cover all cases. Behavior which maybe should be deprecated? All types of parameters should work fine with enclosing braces anyway, albeit changing the semantics a little.

@parasyte
Author

My argument is that there is a grammar for the metalanguage. It might look something like this, just kind of making it up:

letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
       | "H" | "I" | "J" | "K" | "L" | "M" | "N"
       | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
       | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
       | "c" | "d" | "e" | "f" | "g" | "h" | "i"
       | "j" | "k" | "l" | "m" | "n" | "o" | "p"
       | "q" | "r" | "s" | "t" | "u" | "v" | "w"
       | "x" | "y" | "z" ;
nonzero digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit = nonzero digit | "0" ;
binary digit = "0" | "1" ;
octal digit = binary digit | "2" | "3" | "4" | "5" | "6" | "7" ;
hex digit = digit | "A" | "B" | "C" | "D" | "E" | "F"
       | "a" | "b" | "c" | "d" | "e" | "f" ;
character = letter | digit | "_" ;

binary = "0b", binary digit, { binary digit } ;
octal = "0o", octal digit, { octal digit } ;
decimal = "0" | nonzero digit, { digit } ;
hex = "0x", hex digit, { hex digit } ;

identifier = ( letter | "_" ), { character } ;
number = [ "-" ], ( binary | octal | decimal | hex ) ;
string = '"', { all characters - '"' }, '"' ;

ruledef directive = "#ruledef", white space, [ identifier ], white space, ruledef arguments ;
ruledef arguments = "{", match expression, { match expression }, "}" ;
match expression = match rule, match body ;
match rule = white space, { all characters }, "=>", white space ;
match body = expression | expressions ;

expressions = "{", white space, expression, { white space, expression }, white space, "}" ;

white space = ? white space characters ? ;
all characters = ? all visible characters ? ;

With this, define what an expression is and you have a good starting point for a grammar to write #ruledef directives, identifiers, numbers, and strings.
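
To show how examples could be checked against such a grammar, here is a small sketch that hand-translates the `identifier` and `number` rules into regular expressions. It is an approximation of the draft grammar in this comment, not of the actual customasm parser; the helper names are made up, and the decimal pattern accepts a bare `0` on the assumption that the grammar is meant to allow it.

```python
import re

# identifier = ( letter | "_" ), { character } ;
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# number = [ "-" ], ( binary | octal | decimal | hex ) ;
NUMBER = re.compile(
    r"-?(?:"
    r"0b[01]+"            # binary
    r"|0o[0-7]+"          # octal
    r"|0x[0-9A-Fa-f]+"    # hex
    r"|0|[1-9][0-9]*"     # decimal (bare 0 is an assumption)
    r")"
)

def is_identifier(text):
    return IDENTIFIER.fullmatch(text) is not None

def is_number(text):
    return NUMBER.fullmatch(text) is not None

# Examples taken from the thread, plus a few that should be rejected.
for sample in ["0x11", "-0b101", "0o8", "_x1", "1x"]:
    print(sample, is_number(sample), is_identifier(sample))
```

A table of samples like this could live next to the wiki examples and run in CI, so the written grammar and the parser are less likely to drift apart.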

I don't think it's worth trying to define the grammar of the instructions defined inside #ruledef, which seems to be where you are getting stuck. It's enough to understand the grammar at a higher level.

> When x + 1 is specified verbatim in an instruction's pattern, it's not parsed as an expression -- it's simply parsed as a sequence of characters (currently, not even as proper tokens!).

That's perfectly fine! The grammar for the metalanguage should specify this and that solves it.
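
A sketch of what that could mean in practice: the `match rule` production from the draft grammar treats everything before `=>` as raw characters and only the body as something to parse further. The `split_rule` helper below is hypothetical, and it assumes the first `=>` on the line ends the pattern.

```python
def split_rule(line):
    """Split one #ruledef entry into (raw pattern, body text).
    The pattern is kept as an opaque character sequence."""
    pattern, arrow, body = line.partition("=>")
    if not arrow:
        raise ValueError("missing '=>' in rule: " + repr(line))
    return pattern.strip(), body.strip()

print(split_rule("  ld   {a}, x + 1 => 0x11"))
```

Only the second element would then be handed to whatever parses `expression`; the first is never interpreted by the grammar.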

> Now, the reason you can also specify numerical asm block parameters without the braces is kind of an oversight of mine -- behavior from before I realized you need token-for-token substitution to cover all cases. Behavior which maybe should be deprecated? All types of parameters should work fine with enclosing braces anyway, albeit changing the semantics a little.

This would probably be nice to address. AFAIK wrapping integer-typed parameters in braces in the asm context does not work.

[Screenshot attached]

@hlorenzi
Owner

hlorenzi commented May 3, 2023

I think the confusion with the asm blocks will mostly be resolved with the next release I'm working on, where all arguments can be specified with braces within the asm block. Feel free to open this again if you still think the EBNF is worth it!

hlorenzi closed this as not planned on May 3, 2023
@parasyte
Author

parasyte commented May 3, 2023

I think some specification of the meta language syntax is still important even if it is not EBNF. For instance, when I wrote a syntax definition for Sublime Text, I didn't have a great resource for defining the parser. It is mostly just an approximation based on the wiki and empirical observation.

Cf. #105 (comment)
