Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shift right mistakenly parsed as type argument list #176

Open
Sjord opened this issue Apr 6, 2021 · 10 comments
Open

Shift right mistakenly parsed as type argument list #176

Sjord opened this issue Apr 6, 2021 · 10 comments
Labels

Comments

@Sjord
Copy link
Collaborator

Sjord commented Apr 6, 2021

var R = A < X >> B;

Actual:

Interpreted as A<X> > B

      (equals_value_clause [0, 6] - [0, 18]
        (binary_expression [0, 8] - [0, 18]
          (generic_name [0, 8] - [0, 15]
            (identifier [0, 8] - [0, 9])
            (type_argument_list [0, 10] - [0, 15]
              (identifier [0, 12] - [0, 13])))
          (identifier [0, 17] - [0, 18]))))))))

Expected:

Interpreted as A < (X >> B)

      (equals_value_clause [0, 6] - [0, 18]
        (binary_expression [0, 8] - [0, 18]
          (identifier [0, 8] - [0, 9])
          (binary_expression [0, 12] - [0, 18]
            (identifier [0, 12] - [0, 13])
            (identifier [0, 17] - [0, 18])))))))))
@damieng damieng added the bug label Apr 7, 2021
@initram
Copy link
Contributor

initram commented Jun 3, 2021

expression currently contain _simple_name, but I don't think generic_name is allowed as an expression. e.g. A<X> is never a valid expression by it self.

I have tried to replace _simple_name with _identifier_or_global, but there seems to be a lot of work to get the precedence correct for this to work in all cases. I got the test suite to run and also the example above to parse correctly, but the number of parse errors from the example repos role dramatically

An example of code that parses now, but that I could not get to parse with my changes were:

class SomeClass {
  public static Task<T?> Null<T>() where T : class => Default<T>();
}

Here it tries to parse Default<T>() as a binary expression along the lines of Default < T>().

@tamasvajk
Copy link
Collaborator

I'm struggling to understand what exactly is happening here. I understand that with the current grammar, expression A<X>>B can be parsed both as (A<X>) > B and A < (X >> B). The latter should be the correct one based on Roslyn.

Doing a tree-sitter parse -D shows the below:

Screenshot 2022-12-30 at 15 51 35

Both versions have error_cost = 0. Why are these two cases condensed into a single one? Why is ErrorComparisonNone handled the same way as ErrorComparisonPreferLeft in parser.c? This is the comment a couple of lines above

// Examine each pair of stack versions, removing any versions that
// are clearly worse than another version. Ensure that the versions
// are ordered from most promising to least promising.

As far as I see the two stack versions are equally good. (I tried parsing the same input with a locally built tree-sitter, where the case ErrorComparisonNone was removed, and then I got select_earlier symbol:compilation_unit, over_symbol:compilation_unit, which (I think) is matching what one would get for equal dynamic precedence). I think this would be the correct tree-sitter behaviour, but my experience is quite limited.

Putting aside the above that I don't understand why certain things are happening in tree-sitter. How would one solve this conflicting parsing issue?

I see the following options:

  1. The solution outlined above: Replace _simple_name with _identifier_or_global in expression, which seems to require some more changes to the grammar, because Default<T> is not going to be an expression any longer, so Dynamic<T>() doesn't parse as invocation_expression either.
  2. Use prec.dynamic to prefer one case over the other. So binary_expression should have higher dynamic precedence than generic_name. This seems rather arbitrary, and it interferes with Generic type in declaration parsed as binary expression #243, which contains 2 binary expressions, so variable_declaration would need >2 dynamic precedence. This doesn't seem to scale too well for future precedence issues.

What are other options? None of the above seem viable/straightforward.

@damieng
Copy link
Collaborator

damieng commented Jan 3, 2023

I wish I had more information or an answer for you but my experience in dealing with these awkward conflicts are also limited and there doesn't seem to be much else in the way of documentation or support available (Max has been quiet for quite some time).

I'd be okay going with either solution for a while to see how it pans out. We could always switch later if we needed to - maybe leave some comments as to why things are the way they are in the meantime.

@maxbrunsfeld
Copy link
Contributor

I'm struggling to understand what exactly is happening here.

I believe right now, there are two valid parses, and they both have the same dynamic precedence (zero, the default), so Tree-sitter is having to just choose one semi-arbitrarily.

I agree with @initram - it seems like the grammar should be restructured so that generic_name is only allowed in certain positions (like method calls), not as an expression on its own.

@initram
Copy link
Contributor

initram commented Jan 6, 2023

Looking at this again, it is not so straight forward. It seems that in some rare cases generic_name is a valid expression. Take for example this valid C# code:

var R = nameof(List<int>);

That is parsed as an invocation expression with a generic name as the argument expression by Roslyn.

It is really hard to find a good rule or precedence that solves all these issues with when it should be parsed as a generic name or not. One idea that I have is that generic names maybe should only be allowed as the top level expression, and not as part of a composite expression, but that may easier said than done.

We should add a bunch more test cases to cover some of all these use-cases of generic name and less-than and greather-than symbols. That way it is easier to experiment with new approaches and not have to rely on whether the "example repos" parse as well.

@maxbrunsfeld
Copy link
Contributor

It seems that in some rare cases generic_name is a valid expression.

I see. One way to get this more correct might be to put a negative dynamic precedence on generic_name, so that when possible, we'll prefer not to use it, but then to compensate for this with a larger positive dynamic precedence on constructs where generic_name is normally used (like method calls, etc). This way, the order of preference in the event of a true ambiguity would be:

  • generic method calls etc
  • shift expressions
  • lastly, generic names on their own

@tamasvajk
Copy link
Collaborator

tamasvajk commented Jan 12, 2023

@initram I experimented with your approach a bit further. So changed _simple_name to _identifier_or_global in _lvalue_expression (previously _expression). As you found that was not enough, and generic method calls were not working any longer. So I started adjusting invocation_expression:

invocation_expression: $ => prec(PREC.INVOCATION, seq(
  field('function', $._invocation_function),
  field('arguments', $.argument_list)
)),

_invocation_function: $ => choice(
  $.this_expression,
  $.base_expression,
  $._name,
  $.postfix_unary_expression, // name!(), this could be more specific than postfix_unary_expression
  $.member_access_expression,
  $.conditional_access_expression,
  $.element_access_expression,
  $.invocation_expression,
  $.parenthesized_expression
),

This got me quite far. (There are 177 failing files in Roslyn, which is in line with my earlier checks, so there might not be any new failures.) The above list is somewhat arbitrary, I think all expression types that are in the Primary row of the precedence table would be needed. Let me know if you think this is a wrong assumption.

At the same time var a = b < c >> d; parses as below, which seems correct.

(compilation_unit [0, 0] - [0, 19]
  (global_statement [0, 0] - [0, 19]
    (local_declaration_statement [0, 0] - [0, 19]
      (variable_declaration [0, 0] - [0, 18]
        type: (implicit_type [0, 0] - [0, 3])
        (variable_declarator [0, 4] - [0, 18]
          (identifier [0, 4] - [0, 5])
          (equals_value_clause [0, 6] - [0, 18]
            (binary_expression [0, 8] - [0, 18]
              left: (identifier [0, 8] - [0, 9])
              right: (binary_expression [0, 12] - [0, 18]
                left: (identifier [0, 12] - [0, 13])
                right: (identifier [0, 17] - [0, 18])))))))))

Regarding

It seems that in some rare cases generic_name is a valid expression.

Do you know if this is specific to nameof()? If it is, we could special case it.

See the draft PR #285.

@initram
Copy link
Contributor

initram commented Jan 13, 2023

@tamasvajk It seems promising. But why did you need to limit the expressions available for _invocation_function, why could it not just be (generic_name or _expression)? If it is needed to limit the expressions that are available to use as _invocation_function, then we need to make sure that there is no legal expression from _expression that is missing in _invocation_function. At least you are missing default expression, as this is valid C#: default(Func<int>)()

The only place that I can think of where generic_name is a legal expression is in nameof(). I could be wrong, and there are other cases where Roslyn accepts generic name as expressions for parsing purposes, but the code cannot compile for semantic reasons.

If you could add a few more tests to cover some of the invocation expressions that are not covered currently, and at the very least the shift right expression from this issue. That would help making sure that we do not regress in parsing in the future.

@tamasvajk
Copy link
Collaborator

But why did you need to limit the expressions available for _invocation_function

My initial goal was to limit it as much as possible in the hope that this way there wouldn't be too many new conflicts. The list grew a bit from what I initially added. So for grammar maintainability, your suggestion might be a lot better. I gave it a quick try

_invocation_function: $ => choice(
  $._expression,
  $.generic_name
),

which resulted in a couple more conflicts:

[$._invocation_function, $._simple_name],
[$._invocation_function, $.ref_expression],
[$._invocation_function, $.throw_expression],
[$._invocation_function, $.select_clause],
[$._invocation_function, $.assignment_expression],
[$._invocation_function, $.relational_pattern],
[$._invocation_function, $.group_clause],

and even the tests don't pass this way. For example var x = !Call(); is parsed as

(compilation_unit [0, 0] - [0, 16]
  (global_statement [0, 0] - [0, 16]
    (local_declaration_statement [0, 0] - [0, 16]
      (variable_declaration [0, 0] - [0, 15]
        type: (implicit_type [0, 0] - [0, 3])
        (variable_declarator [0, 4] - [0, 15]
          (identifier [0, 4] - [0, 5])
          (equals_value_clause [0, 6] - [0, 15]
            (invocation_expression [0, 8] - [0, 15]
              function: (prefix_unary_expression [0, 8] - [0, 13]
                (identifier [0, 9] - [0, 13]))
              arguments: (argument_list [0, 13] - [0, 15]))))))))

which I find rather odd. I don't understand why precedences are not working as I would expect. But this also suggests me that _invocation_function should be based on precedence as mentioned above. (The default() case that you mention above would also be covered as default is in the "Primary" precedence row.)

At the same time, default would also need to be special cased like nameof, because it can have a type as an argument. Similarly sizeof(Struct<int>) works too. I find it difficult to come up with a definitive list of such constructs.

@tamasvajk
Copy link
Collaborator

It seems that in some rare cases generic_name is a valid expression.

I realized that this is not specific to nameof, default and sizeof. generic_name can show up in some other places too as an expression. For example:

void M<T>(){}
var x = M<int>;

I'm closing the draft PR #285.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants