Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std/parsesql: Fix JOIN parsing #22890

Merged
merged 2 commits into from
Nov 2, 2024
Merged

Conversation

tuffnatty
Copy link
Contributor

This commit fixes/adds tests for and fixes several issues with JOIN operator parsing:

  • For OUTER joins, LEFT | RIGHT | FULL specifier is not optional
doAssertRaises(SqlParseError): discard parseSql("""
SELECT id FROM a
OUTER JOIN b
ON a.id = b.id
""")
  • For NATURAL JOIN and CROSS JOIN, ON and USING clauses are forbidden
doAssertRaises(SqlParseError): discard parseSql("""
SELECT id FROM a
CROSS JOIN b
ON a.id = b.id
""")
  • JOIN should parse as part of FROM, not after WHERE
doAssertRaises(SqlParseError): discard parseSql("""
SELECT id FROM a
WHERE a.id IS NOT NULL
INNER JOIN b
ON a.id = b.id
""")
  • LEFT JOIN should parse
doAssert $parseSql("""
SELECT id FROM a
LEFT JOIN b
ON a.id = b.id
""") == "select id from a left join b on a.id = b.id;"
  • NATURAL JOIN should parse
doAssert $parseSql("""
SELECT id FROM a
NATURAL JOIN b
""") == "select id from a natural join b;"
  • USING should parse
doAssert $parseSql("""
SELECT id FROM a
JOIN b
USING (id)
""") == "select id from a join b using (id );"
  • Multiple JOINs should parse
doAssert $parseSql("""
SELECT id FROM a
JOIN b
ON a.id = b.id
LEFT JOIN c
USING (id)
""") == "select id from a join b on a.id = b.id left join c using (id );"

@Araq
Copy link
Member

Araq commented Oct 30, 2023

It should still be optional and "JOIN" is a shortcut for "INNER JOIN" for backwards compatibility.

@tuffnatty
Copy link
Contributor Author

@Araq wrote:

It should still be optional and "JOIN" is a shortcut for "INNER JOIN" for backwards compatibility.

Sorry, but I have not understood your message without logical parentheses in place.
Would you like to keep backward compatibility, that is, parse buggy OUTER JOIN clause successfully?
Would you also like to keep backward compatibility in the case of buggy SELECT ... WHERE ... JOIN code?
I have not changed any [ INNER ] JOIN processing, so I don't understand this part as well.

@Araq
Copy link
Member

Araq commented Oct 31, 2023

I mean that

SELECT id FROM a
JOIN b
ON a.id = b.id

should continue to work and the JOIN should continue to mean INNER JOIN. If that's what your PR does, it's fine.

@tuffnatty
Copy link
Contributor Author

@Araq Yes, this example is still parsed (and the corresponding test is kept as it is), what has changed, though, is its tree representation: earlier it was

nkSelect
    nkSelectColumns
    nkFrom
       nkFromItem(a)
    nkJoin
       nkFromItem(b)

now it is:

nkSelect
    nkSelectColumns
    nkFrom
       nkJoin
          nkFromItem(a)
          nkFromItem(b)

@Araq
Copy link
Member

Araq commented Nov 1, 2023

Huh, sorry, that is not good, it's a breaking change of the worst kind, hidden and secret, only detected at runtime. Also, the old AST version makes more sense to me.

@tuffnatty
Copy link
Contributor Author

tuffnatty commented Nov 1, 2023

The module documentation claims:

## The `parsesql` module implements a high performance SQL file
## parser. It parses PostgreSQL syntax and the SQL ANSI standard.
##          
## Unstable API.

PostgreSQL SELECT documentation: https://www.postgresql.org/docs/current/sql-select.html
SQLite SELECT documentation: https://www.sqlite.org/lang_select.html
MySQL SELECT documentation: https://dev.mysql.com/doc/refman/8.0/en/select.html

JOIN is a part of the FROM clause, both syntactically and semantically.

Aiming for backwards compatibility as the highest priority did not make sense to me. The API is declared unstable.

Ok, if we try to keep the compatibility, how do you parse the following:

SELECT a.id
FROM a, b JOIN c ON b.id = c.id, d

In the solution proposed, it would produce the tree:

nkSelect
  nkSelectColumns
  nkFrom
    nkFromItem(a)
    nkJoin
      nkFromItem(b)
      nkFromItem(c)
    nkFromItem(d)

Although semantically FROM x, y is equivalent to FROM x CROSS JOIN y, I decided to keep the existing AST representation as it is simpler and still matches the syntax.

Also, I doubt that any significant amount of code did depend on the current JOIN tree representation, as the current code can't parse the simplest statements:

SELECT * FROM a LEFT JOIN b ON a.id = b.id;
SELECT * FROM a JOIN b ON a.id = b.id WHERE true;
SELECT * FROM a JOIN b ON a.id = b.id ORDER BY 1;
SELECT * FROM a JOIN b ON a.id = b.id, c;
SELECT * FROM a JOIN b ON a.id = b.id JOIN c ON b.id = c.id;

The breaking change can be documented in the release notes.

@tuffnatty
Copy link
Contributor Author

I have just added support for parenthesized JOIN subexpressions, such as:

SELECT id
FROM
    a JOIN (b JOIN c ON b.id = c.id) ON a.id = b.id

I did not plan to do it within this PR, but it should be there for the completeness.

Copy link
Contributor

github-actions bot commented Nov 1, 2024

This pull request is stale because it has been open for 1 year with no activity. Contribute more commits on the pull request and rebase it on the latest devel, or it will be closed in 30 days. Thank you for your contributions.

@github-actions github-actions bot added the stale Staled PR/issues; remove the label after fixing them label Nov 1, 2024
@tuffnatty
Copy link
Contributor Author

Rebased on the latest devel, although it seems no one else wants this to be working.

@Araq Araq merged commit 46bb47a into nim-lang:devel Nov 2, 2024
16 of 18 checks passed
Copy link
Contributor

github-actions bot commented Nov 2, 2024

Thanks for your hard work on this PR!
The lines below are statistics of the Nim compiler built from 46bb47a

Hint: mm: orc; opt: speed; options: -d:release
176527 lines; 8.426s; 655.09MiB peakmem

narimiran pushed a commit that referenced this pull request Jan 14, 2025
This commit fixes/adds tests for and fixes several issues with `JOIN`
operator parsing:

- For OUTER joins, LEFT | RIGHT | FULL specifier is not optional
```nim
doAssertRaises(SqlParseError): discard parseSql("""
SELECT id FROM a
OUTER JOIN b
ON a.id = b.id
""")
```

- For NATURAL JOIN and CROSS JOIN, ON and USING clauses are forbidden
```nim
doAssertRaises(SqlParseError): discard parseSql("""
SELECT id FROM a
CROSS JOIN b
ON a.id = b.id
""")
```

- JOIN should parse as part of FROM, not after WHERE
```nim
doAssertRaises(SqlParseError): discard parseSql("""
SELECT id FROM a
WHERE a.id IS NOT NULL
INNER JOIN b
ON a.id = b.id
""")
```

- LEFT JOIN should parse
```nim
doAssert $parseSql("""
SELECT id FROM a
LEFT JOIN b
ON a.id = b.id
""") == "select id from a left join b on a.id = b.id;"
```

- NATURAL JOIN should parse
```nim
doAssert $parseSql("""
SELECT id FROM a
NATURAL JOIN b
""") == "select id from a natural join b;"
```

- USING should parse
```nim
doAssert $parseSql("""
SELECT id FROM a
JOIN b
USING (id)
""") == "select id from a join b using (id );"
```

- Multiple JOINs should parse
```nim
doAssert $parseSql("""
SELECT id FROM a
JOIN b
ON a.id = b.id
LEFT JOIN c
USING (id)
""") == "select id from a join b on a.id = b.id left join c using (id );"
```

(cherry picked from commit 46bb47a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants