From 23865365e89536feb816b321c9bcba40d44155e1 Mon Sep 17 00:00:00 2001 From: Alan Cai Date: Mon, 27 Nov 2023 18:45:50 -0500 Subject: [PATCH] Apply John's feedback - struct -> tuple - list -> array - consistent variable definitions - empty subsumption rule - other feedback --- RFCs/0051-exclude-operator.adoc | 203 ++++++++++++++++---------------- 1 file changed, 103 insertions(+), 100 deletions(-) diff --git a/RFCs/0051-exclude-operator.adoc b/RFCs/0051-exclude-operator.adoc index 13ad1c5..9ab2076 100644 --- a/RFCs/0051-exclude-operator.adoc +++ b/RFCs/0051-exclude-operator.adoc @@ -4,7 +4,7 @@ * Start Date: 2023-11-07 -* PartiQL Issue: https://github.com/partiql/partiql-spec/issues/39 +* PartiQL Issue: https://github.com/partiql/partiql-lang/issues/27 * RFC PR: https://github.com/partiql/partiql-docs/pull/51 == Summary @@ -13,13 +13,13 @@ This doc defines the `EXCLUDE` binding tuple operator used to omit nested values == Motivation -SQL users often use `SELECT *` to project all of the columns of a table. There is frequently a use case in which a user would like to project all the columns from a table other than a subset of the columns (see https://stackoverflow.com/q/729197[slack overflow question]). There are workarounds in some database systems that are somewhat inefficient (e.g. creating a new table and dropping a specific column), but it can be helpful to have a dedicated syntax to filter out certain columns. <> lists out a few databases that provide some version of this column filtering. +SQL users often use `SELECT *` to project all of the columns of a table. There is frequently a use case in which a user would like to project all the columns from a table other than a subset of the columns (see https://stackoverflow.com/q/729197[Stack Overflow question]). There are workarounds in some database systems that are somewhat inefficient (e.g. 
creating a new table and dropping a specific column), but it can be helpful to have a dedicated syntax to filter out certain columns. <> lists out a few databases that provide some version of this column filtering. There is a similar need among PartiQL users to exclude certain nested fields from semi-structured data. PartiQL supports `SELECT *` to project all of the fields of a binding tuple. If a user wanted to omit one field from this projection, they would need to list out all of the projection fields or perform some intricate combination of `PIVOT` and ``UNPIVOT``s. [source,partiql,subs="+{markup-in-source}"] ---- --- Suppose `tbl` is a collection of structs that have `n` fields, `field~1~,...,field~n~`. +-- Suppose `tbl` is a collection of tuples that have `n` fields, `field~1~,...,field~n~`. -- To filter out `field~i~`, we would have to list out all fields other than `field~i~`. SELECT field~1~, ..., field~i-1~, field~i+1~, ..., field~n~ -- omit `field~i~` from tbl @@ -95,22 +95,24 @@ PartiQL should support s-expression types and values since PartiQL's type system ==== Step 1: subsumption of `EXCLUDE` paths We perform the following step to ensure that there are no redundant `EXCLUDE` paths. That is, there is no path such that all of its excluded binding tuple values are excluded by another exclude path. footnote:[This subsumption step is included to make the subsequent rewrite steps easier to reason about. In a query without redundant exclude paths, this step is not necessary.] -For each `` `p=root~p~s~1~...s~m~`, we compare it with all other ````s. `` `p` is said to be subsumed by another path `q=root~q~t~1~...t~n~` and not included in the rewritten `EXCLUDE` clause if any of the following rules apply: +For each `` `p=root~p~s~1~...s~x~`, we compare it with all other ````s. 
`` `p` is said to be subsumed by another path `q=root~q~t~1~...t~y~` and not included in the rewritten `EXCLUDE` clause if any of the following rules apply: NOTE: The following rules assume `root~p~=root~q~`. .Subsumption rules [[anchor-1a]] Rule 1.a:: - If `m ≥ n` and `s~1~...s~m~=t~1~...t~m~`, `q` subsumes `p`. Put another way if `p` has at least as many steps as `q` and the steps up to ``q``'s length are equivalent, `q` subsumes `p`. + If `y = 0` (i.e. `q` has no steps), `q` subsumes `p`. +[[anchor-1b]] Rule 1.b:: + If `x ≥ y` and `s~1~...s~y~=t~1~...t~y~`, `q` subsumes `p`. Put another way, if `p` has at least as many steps as `q` and the steps up to ``q``'s length are equivalent, `q` subsumes `p`. Otherwise, there must be some step at which `p` and `q` diverge. Let's call this step's index `i`. -[[anchor-1b]] Rule 1.b:: - If `s~i~` is a tuple attribute and `t~i~` is a tuple wildcard and `t~i+1~...t~n~` subsumes `s~i+1~...s~m~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`. [[anchor-1c]] Rule 1.c:: - If `s~i~` is a collection index and `t~i~` is a collection wildcard and `t~i+1~...t~n~` subsumes `s~i+1~...s~m~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`. + If `s~i~` is a tuple attribute and `t~i~` is a tuple wildcard and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`. [[anchor-1d]] Rule 1.d:: - If `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~n~` subsumes `s~i+1~...s~m~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`. + If `s~i~` is a collection index and `t~i~` is a collection wildcard and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`. 
+[[anchor-1e]] Rule 1.e:: + If `s~i~` is a case-sensitive tuple attribute and `t~i~` is a case-insensitive tuple attribute and `t~i+1~...t~y~` subsumes `s~i+1~...s~x~` (i.e. the steps following `t~i~` subsumes the steps following `s~i~`), then `q` subsumes `p`. .Subsumption Examples [options="header,footer"] @@ -119,17 +121,18 @@ Otherwise, there must be some step at which `p` and `q` diverge. Let's call this |`s.a` |`t.a` |No subsumption rules apply (roots differ) |`t.a` |`t.b` |No subsumption rules apply |`t.a.b.c` |`t.a.*.d` |No subsumption rules apply -|`t.a.b.c` |`t.a.b.c` |`q` subsumes `p` (by <>) -|`t.a.b.c` |`t.a.b` |`q` subsumes `p` (by <>) -|`t.a.b.c` |`t.a.b.*` |`q` subsumes `p` (by <> then <>) -|`t.a.b.c` |`t.a.*.c` |`q` subsumes `p` (by <> then <>) -|`t.a.b[1]` |`t.a.b` |`q` subsumes `p` (by <>) -|`t.a.b[1]` |`t.a.b[*]` |`q` subsumes `p` (by <> then <>) -|`t.a.b[1].c` |`t.a.b[1]` |`q` subsumes `p` (by <>) -|`t.a.b[1].c` |`t.a.b[*].c`|`q` subsumes `p` (by <> then <>) -|`t.a.b[1].c` |`t.a.b[*]` |`q` subsumes `p` (by <> then <>) -|`t.a."b"` |`t.a.b` |`q` subsumes `p` (by <> then <>) -|`t.a."b".c` |`t.a.b.c` |`q` subsumes `p` (by <> then <>) +|`t.a` |`t` |`q` subsumes `p` (by <>) +|`t.a.b.c` |`t.a.b.c` |`q` subsumes `p` (by <>) +|`t.a.b.c` |`t.a.b` |`q` subsumes `p` (by <>) +|`t.a.b.c` |`t.a.b.*` |`q` subsumes `p` (by <> then <>) +|`t.a.b.c` |`t.a.*.c` |`q` subsumes `p` (by <> then <>) +|`t.a.b[1]` |`t.a.b` |`q` subsumes `p` (by <>) +|`t.a.b[1]` |`t.a.b[*]` |`q` subsumes `p` (by <> then <>) +|`t.a.b[1].c` |`t.a.b[1]` |`q` subsumes `p` (by <>) +|`t.a.b[1].c` |`t.a.b[*]` |`q` subsumes `p` (by <> then <>) +|`t.a.b[1].c` |`t.a.b[*].c`|`q` subsumes `p` (by <> then <>) +|`t.a."b"` |`t.a.b` |`q` subsumes `p` (by <> then <>) +|`t.a."b".c` |`t.a.b.c` |`q` subsumes `p` (by <> then <>) |======================= --- @@ -137,9 +140,9 @@ We first illustrate the rewrite rule for a single `EXCLUDE` path and then explai ==== Step 2 (single): rewrite a single `EXCLUDE` path 
-To rewrite a single `EXCLUDE` path with `n` steps, `p=r.s~1~...s~n~`, we move the clauses other than the `SELECT`/`PIVOT` into a subquery, which will `EXCLUDE` the binding tuple values at the path `p`. This subquery essentially reconstructs the binding tuple of the other clauses using a `SELECT VALUE` struct to project back the binding tuple variables. All of the variables created from the other clauses not matching the `EXCLUDE` root `r` will use the identity function (e.g. binding tuple variable `foo` will have attribute `'foo'` and value `foo` in the `SELECT VALUE` struct). For the variable matching the `EXCLUDE` path root `r`, we apply the following rewrite rules to define ``r``'s value within the `SELECT VALUE` struct. If there is no such variable matching `EXCLUDE` path root `r`, the `EXCLUDE` path will not alter any of the binding tuple values. Hence, no rewrite rule is applied. +To rewrite a single `EXCLUDE` path with `n` steps, `p=r.s~1~...s~n~`, we move the clauses other than the `SELECT`/`PIVOT` into a subquery, which will `EXCLUDE` the binding tuple values at the path `p`. This subquery essentially reconstructs the binding tuple of the other clauses using a `SELECT VALUE` tuple to project back the binding tuple variables. All of the variables created from the other clauses not matching the `EXCLUDE` root `r` will use the identity function (e.g. binding tuple variable `foo` will have attribute `'foo'` and value `foo` in the `SELECT VALUE` tuple). For the variable matching the `EXCLUDE` path root `r`, we apply the following rewrite rules to define ``r``'s value within the `SELECT VALUE` tuple. If there is no such variable matching `EXCLUDE` path root `r`, the `EXCLUDE` path will not alter any of the binding tuple values. Hence, no rewrite rule is applied. -If the other clauses include an `ORDER BY`, we convert the top-level query back into a list by adding a position variable (i.e. `AT` clause) along with an `ORDER BY` over the position variable. 
+If the other clauses include an `ORDER BY`, we convert the top-level query back into an array by adding a position variable (i.e. `AT` clause) along with an `ORDER BY` over the position variable. [source,partiql,subs="+{markup-in-source}"] ---- @@ -159,7 +162,7 @@ FROM ( ) -[ -- Include conversion back to list if `ORDER BY` present in `` +[ -- Include conversion back to array if `ORDER BY` present in `` -- Assume `` and `` are fresh variables AS AT ORDER BY @@ -167,11 +170,11 @@ FROM ( ---- -The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired struct field(s) or collection index(es). Every exclude step has an expected type to process during evaluation. Tuple attribute and wildcard exclude steps expect a struct. Whereas a collection index expects a list and a collection wildcard expects a list or bag. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression will include an `ELSE` branch which outputs the previous level's identifier. This set of branches ensures that at evaluation time, if there is a type mismatch (e.g. evaluation value is a list while the exclude step is a tuple attribute), there is no evaluation error and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes. +The main idea for rewriting the `EXCLUDE` steps `s~1~,...,s~n~` is to create a nested `CASE` expression for each step, whereby the nested `CASE` expressions for `s~1~,...,s~n-1~` unnest the input binding tuple and the final `CASE` expression for `s~n~` (i.e. the final step) filters out the desired tuple field(s) or collection index(es). 
Every exclude step has an expected type to process during evaluation. Tuple attribute and wildcard exclude steps expect a tuple, whereas a collection index expects an array and a collection wildcard expects an array or bag. The `CASE` expression at each level `i` recreates this expected type by including a `WHEN` branch based on the expected type. Each `CASE` expression will include an `ELSE` branch which outputs the previous level's identifier. This set of branches ensures that at evaluation time, if there is a type mismatch (e.g. evaluation value is an array while the exclude step is a tuple attribute), there is no evaluation error and the previous level's value is returned through the `ELSE` branch. This behavior applies to both the permissive and strict typing modes. [source,partiql,subs="+{markup-in-source}"] ---- --- For the value `r` in our `SELECT VALUE` struct: +-- For the value `r` in our `SELECT VALUE` tuple: -- Assuming `` is the identifier created from the previous exclude step, `s~n-1~` SELECT VALUE { 'r': @@ -195,7 +198,7 @@ For this rewrite rule definition, let `` be the identifier created from If `s~i~` is a case-sensitive tuple attribute exclude step (e.g. `."foo"` or `['foo']`), where `` and `` are fresh variables, add the following `WHEN` branch to the `i`^th^ nested `CASE`. [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS STRUCT THEN ( +WHEN IS TUPLE THEN ( PIVOT ( CASE WHEN = THEN @@ -211,7 +214,7 @@ WHEN IS STRUCT THEN ( If `s~i~` is a case-insensitive tuple attribute exclude step (e.g. `.foo`), where `` and `` are fresh variables, add the following `WHEN` branch to the `i`^th^ nested `CASE`. 
[source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS STRUCT THEN ( +WHEN IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER() = LOWER() THEN @@ -229,7 +232,7 @@ NOTE: This is essentially the same as <> but wraps the inner `CASE W If `s~i~` is a tuple wildcard exclude step, where `` and `` are fresh variables, add the following `WHEN` branch to the `i`^th^ nested `CASE`. [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS STRUCT THEN ( +WHEN IS TUPLE THEN ( PIVOT -- Apply rewrite rules on remaining exclude steps `s~i+1~,...,s~n~` AT @@ -240,7 +243,7 @@ WHEN IS STRUCT THEN ( If `s~i~` is a collection index exclude step, where `` and `` are fresh variables, add the following `WHEN` branch to the `i`^th^ nested `CASE`. [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS LIST THEN ( +WHEN IS ARRAY THEN ( SELECT VALUE CASE WHEN = THEN @@ -255,7 +258,7 @@ WHEN IS LIST THEN ( If `s~i~` is a collection wildcard exclude step, where `` and `` are fresh variables, add the following `WHEN` branches to the `i`^th^ nested `CASE`. 
[source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS LIST THEN ( +WHEN IS ARRAY THEN ( SELECT VALUE -- Apply rewrite rules on remaining exclude steps `s~i+1~,...,s~n~` FROM AS AT @@ -285,7 +288,7 @@ Similar to <>, we case on the type of exclude step to determine which If the last step, `s~n~`, is a case-sensitive tuple attribute exclude step, where `` and `` are fresh variables, we add the following `WHEN` branch: [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS STRUCT THEN ( +WHEN IS TUPLE THEN ( PIVOT AT FROM UNPIVOT AS AT WHERE NOT IN [ ] @@ -295,7 +298,7 @@ WHEN IS STRUCT THEN ( If the last step, `s~n~`, is a case-insensitive tuple attribute exclude step, where `` and `` are fresh variables, we add the following `WHEN` branch: [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS STRUCT THEN ( +WHEN IS TUPLE THEN ( PIVOT AT FROM UNPIVOT AS AT WHERE LOWER( ) NOT IN [ LOWER() ] -- difference w/ 3.a.i is `LOWER` call on `` and `` @@ -305,14 +308,14 @@ WHEN IS STRUCT THEN ( If the last step, `s~n~`, is a tuple wildcard exclude step, we add the following `WHEN` branch: [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS STRUCT THEN - { } -- empty struct +WHEN IS TUPLE THEN + { } -- empty tuple ---- [[anchor-3c]] Rule 3.c:: If the last step is a collection index exclude step, where `` and `` are fresh variables, we add the following `WHEN` branch: [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS LIST THEN +WHEN IS ARRAY THEN SELECT VALUE FROM AS AT WHERE NOT IN [] @@ -322,8 +325,8 @@ WHEN IS LIST THEN If the last step, `s~n~`, is a collection wildcard exclude step, we add the following two `WHEN` branches: [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS LIST THEN - [] -- empty list +WHEN IS ARRAY THEN + [] -- empty array WHEN IS BAG THEN <<>> -- empty bag ---- @@ -332,45 +335,45 @@ Based on the defined rules for single `EXCLUDE` path rewrites, we will now cover ==== Step 2 (multiple): rewriting multiple `EXCLUDE` paths -For 
multiple `EXCLUDE` paths, we employ a similar idea as the rewrite for a single path. The clauses other than the `SELECT`/`PIVOT` are moved to a subquery that will be ranged over. This subquery contains a `SELECT VALUE` struct which will reconstruct the binding tuple of the other clauses with the exclude paths' rewrite. Variables created from the other clauses without a matching exclude path root will be included in the struct with the identity function. Every binding tuple variable matching one or more exclude path roots will have a struct value defined using the below rewrites. +For multiple `EXCLUDE` paths, we employ a similar idea as the rewrite for a single path. The clauses other than the `SELECT`/`PIVOT` are moved to a subquery that will be ranged over. This subquery contains a `SELECT VALUE` tuple which will reconstruct the binding tuple of the other clauses with the exclude paths' rewrite. Variables created from the other clauses without a matching exclude path root will be included in the tuple with the identity function. Every binding tuple variable matching one or more exclude path roots will have a tuple value defined using the below rewrites. [source,partiql,subs="+{markup-in-source}"] ---- --- Let `n` represent the number of `EXCLUDE` paths +-- Let `M` represent the number of `EXCLUDE` paths -- Original query: FROM ( SELECT VALUE { 'r~1~': -- apply rewrite rules on exclude paths that have root `r~1~` ⋮ - 'r~m~': -- apply rewrite rules on exclude paths that have root `r~m~` + 'r~R~': -- apply rewrite rules on exclude paths that have root `r~R~` ... -- other variables created from the other clauses } ) -[ -- Include conversion back to list if `ORDER BY` present in `` +[ -- Include conversion back to array if `ORDER BY` present in `` -- Assume `` and `` are fresh variables AS AT ORDER BY ] ---- -Like single path rewriting, we create a nested `CASE` expression for each step. 
However, for multiple paths, we look at all the paths in parallel and process the steps at the same level. For the following, let `i=1,...,z` where `z` is the length of the longest exclude path. The nested `CASE` expressions for all `i` are created as before. For the following, let `` be the identifier from the previous level (or the root identifier if `i = 1`). +Like single path rewriting, we create a nested `CASE` expression for each step. However, for multiple paths, we look at all the applicable paths in parallel and process the steps at the same level. Applicable paths refers to the subset of paths that have the same root and same tuple attributes/collection indexes at previous levels. For the following, let `z` be the length of the longest exclude path. The nested `CASE` expressions for all level `i=1,...,z` are created as before. For the following, let `` be the identifier from the previous level (or the root identifier if `i = 1`). [source,partiql,subs="+{markup-in-source}"] ---- CASE - WHEN IS STRUCT THEN + WHEN IS TUPLE THEN ... -- apply tuple attr and wildcard path rewrite (rule 4.a) - WHEN IS LIST THEN + WHEN IS ARRAY THEN ... -- apply collection index and wildcard path rewrite (rule 4.b) WHEN IS BAG THEN ... -- apply collection wildcard path rewrite (rule 4.b) @@ -391,21 +394,21 @@ If there are any `EXCLUDE` paths of length `i`, then similar to <> and <>, we add a `CASE` expression within the `PIVOT`. This `CASE` expression within the `PIVOT` will define a `WHEN` branch for each of the unique tuple attribute steps. Each of these `WHEN` branches will apply the rewrite rules for the exclude paths that have additional steps and equivalent tuple attribute or tuple wildcard. An `ELSE` branch will be added to this `CASE` expression which will apply the rewrite rules for the exclude paths with a tuple wildcard at level `i` and additional steps. 
[source,partiql,subs="+{markup-in-source}"] ---- --- Let `k` represent the number of unique exclude tuple attrs for paths of length +-- Let `T` represent the number of unique exclude tuple attrs for paths of length -- greater than `i`. -- `` and `` are fresh variables -WHEN IS STRUCT THEN ( +WHEN IS TUPLE THEN ( PIVOT ( CASE - WHEN = THEN + WHEN = THEN -- Apply rewrite rules for exclude paths with -- length > i AND - -- tuple attr~1~ or tuple wildcard at ith step + -- tuple attr~unique1~ or tuple wildcard at ith step ⋮ - WHEN = THEN + WHEN = THEN -- Apply rewrite rules for exclude paths with -- length > i AND - -- tuple attr~k~ or tuple wildcard at ith step + -- tuple attr~uniqueT~ or tuple wildcard at ith step ELSE -- Apply rewrite rules for exclude paths with -- length > i AND @@ -414,23 +417,23 @@ WHEN IS STRUCT THEN ( ) AT FROM UNPIVOT AS AT WHERE - NOT IN [] + NOT IN [] AND - LOWER() NOT IN [] -- call `LOWER` on each of the case-insensitive tuple attrs + LOWER() NOT IN [] -- call `LOWER` on each of the case-insensitive tuple attrs ) ---- ===== -NOTE: If the only applicable path at level `i` is a tuple wildcard and this path is of length `i`, we know there are no other applicable tuple paths by the subsumption rules. In this case, we can just return an empty struct for the `ith` nested `CASE` like <>: +NOTE: If the only applicable path at level `i` is a tuple wildcard and this path is of length `i`, we know there are no other applicable tuple paths by the subsumption rules. In this case, we can just return an empty tuple for the `ith` nested `CASE` like <>: [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS STRUCT THEN +WHEN IS TUPLE THEN { } ---- ===== --- -If any of the applicable `EXCLUDE` paths at level `i` have a collection index or wildcard exclude step, then we add the following `WHEN` branches to the `i`^th^ nested `CASE` expression. 
If the exclude paths at level `i` are all collection index steps, only a `WHEN` branch casing on if the previous level's value `` was a list will be added. Otherwise, a `WHEN` branch casing on if `` is a bag will also be added. Alike the collection exclude rules defined for single `EXCLUDE` paths, we add a `SELECT VALUE ... FROM` over ``. +If any of the applicable `EXCLUDE` paths at level `i` have a collection index or wildcard exclude step, then we add the following `WHEN` branches to the `i`^th^ nested `CASE` expression. If the exclude paths at level `i` are all collection index steps, only a `WHEN` branch casing on if the previous level's value `` was an array will be added. Otherwise, a `WHEN` branch casing on if `` is a bag will also be added. Like the collection exclude rules defined for single `EXCLUDE` paths, we add a `SELECT VALUE ... FROM` over ``. Rule 4.b:: We divide the set of applicable `EXCLUDE` paths into two subsets: @@ -438,35 +441,35 @@ We divide the set of applicable `EXCLUDE` paths into two subsets: 1. paths of length `i` (i.e. final step is `i`) 2. paths of length greater than `i` (i.e. have additional steps) -If there are any `EXCLUDE` paths of length `i`, then similar to <>, we add a `WHERE` clause to filter out those fields. The fields to exclude will be grouped together within a list. +If there are any `EXCLUDE` paths of length `i`, then similar to <>, we add a `WHERE` clause to filter out those fields. The fields to exclude will be grouped together within an array. -(Within the `WHEN IS LIST` branch) If there are any `EXCLUDE` paths of length greater than `i`, then similar to <>, we add a `CASE` expression within the `SELECT VALUE ... AT ... ORDER BY`. This `CASE` expression within the `SELECT VALUE` will define a `WHEN` branch for each of the unique collection index steps. Each of these `WHEN` branches will apply the rewrite rules for the exclude paths that have additional steps and equivalent collection indexes or collection wildcard. 
An `ELSE` branch will be added to this `CASE` expression which will apply the rewrite rules for the exclude paths with additional steps and collection wildcard. +(Within the `WHEN IS ARRAY` branch) If there are any `EXCLUDE` paths of length greater than `i`, then similar to <>, we add a `CASE` expression within the `SELECT VALUE ... AT ... ORDER BY`. This `CASE` expression within the `SELECT VALUE` will define a `WHEN` branch for each of the unique collection index steps. Each of these `WHEN` branches will apply the rewrite rules for the exclude paths that have additional steps and equivalent collection indexes or collection wildcard. An `ELSE` branch will be added to this `CASE` expression which will apply the rewrite rules for the exclude paths with additional steps and collection wildcard. (Within the `WHEN IS BAG` branch, if applicable) We simply have a `FROM` over `` with a `SELECT VALUE` that applies the rewrite rules for exclude paths that have additional steps and collection wildcard at level `i`. [source,partiql,subs="+{markup-in-source}"] ---- --- Let `k` represent the number of unique exclude collection indexes for exclude paths of length +-- Let `C` represent the number of unique exclude collection indexes for exclude paths of length -- greater than `i`. 
-- `` and `` are fresh variables -WHEN IS LIST THEN ( +WHEN IS ARRAY THEN ( SELECT VALUE CASE - WHEN = THEN + WHEN = THEN -- Apply rewrite rules for exclude paths with -- length > i AND - -- collection index idx~1~ or wildcard at ith step + -- collection index idx~unique1~ or wildcard at ith step ⋮ - WHEN = THEN + WHEN = THEN -- Apply rewrite rules for exclude paths with -- length > i AND - -- collection index idx~k~ or wildcard at ith step + -- collection index idx~uniqueC~ or wildcard at ith step ELSE -- Apply rewrite rules for exclude paths with -- length > i AND -- collection wildcard at ith step END FROM AS AT - WHERE NOT IN [] + WHERE NOT IN [] ORDER BY ) WHEN IS BAG THEN ( @@ -477,11 +480,11 @@ WHEN IS BAG THEN ( ---- ===== -NOTE: If the only applicable path at level `i` is a collection wildcard and this path is of length `i`, we know there are no other applicable collection paths by the subsumption rules. In this case, we can just return an empty list or bag for the `ith` nested `CASE` like <>: +NOTE: If the only applicable path at level `i` is a collection wildcard and this path is of length `i`, we know there are no other applicable collection paths by the subsumption rules. 
In this case, we can just return an empty array or bag for the `ith` nested `CASE` like <>: [source,partiql,subs="+{markup-in-source}"] ---- -WHEN IS LIST THEN - [] -- empty list +WHEN IS ARRAY THEN + [] -- empty array WHEN IS BAG THEN <<>> -- empty bag ---- @@ -510,12 +513,12 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN CASE - WHEN v_1 IS STRUCT THEN ( + WHEN v_1 IS TUPLE THEN ( PIVOT v_2 AT attr_2 FROM UNPIVOT v_1 AS v_2 AT attr_2 WHERE LOWER(attr_2) NOT IN [LOWER('field_x')] @@ -581,12 +584,12 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN CASE - WHEN v_1 IS STRUCT THEN + WHEN v_1 IS TUPLE THEN {} ELSE v_1 END @@ -648,10 +651,10 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT ( CASE - WHEN v_1 IS STRUCT THEN ( + WHEN v_1 IS TUPLE THEN ( PIVOT v_2 AT attr_2 FROM UNPIVOT v_1 AS v_2 AT attr_2 WHERE LOWER(attr_2) NOT IN [LOWER('field_x')] @@ -716,12 +719,12 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN CASE - WHEN v_1 IS LIST THEN ( + WHEN v_1 IS ARRAY THEN ( SELECT VALUE v_2 FROM v_1 AS v_2 AT idx_2 WHERE idx_2 NOT IN [1] @@ -797,12 +800,12 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN CASE - WHEN v_1 IS LIST THEN + WHEN v_1 IS ARRAY THEN [] WHEN v_1 IS BAG THEN <<>> @@ -865,13 +868,13 @@ Rewritten query: SELECT t.* FROM ( SELECT VALUE { - 't': CASE WHEN t IS STRUCT THEN ( + 't': CASE WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN - CASE WHEN v_1 IS LIST THEN ( + CASE WHEN v_1 IS ARRAY THEN ( SELECT VALUE CASE WHEN idx_2 = 1 THEN - CASE WHEN v_2 IS STRUCT THEN ( + CASE WHEN v_2 IS TUPLE THEN ( PIVOT v_3 AT attr_3 FROM UNPIVOT v_2 AS v_3 AT 
attr_3 WHERE LOWER(attr_3) NOT IN [LOWER('field_x')] @@ -953,12 +956,12 @@ Rewritten query: SELECT t.* FROM ( SELECT VALUE { - 't': CASE WHEN t IS STRUCT THEN ( + 't': CASE WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN - CASE WHEN v_1 IS LIST THEN ( + CASE WHEN v_1 IS ARRAY THEN ( SELECT VALUE - CASE WHEN v_2 IS STRUCT THEN ( + CASE WHEN v_2 IS TUPLE THEN ( PIVOT v_3 AT attr_3 FROM UNPIVOT v_2 AS v_3 AT attr_3 WHERE LOWER(attr_3) NOT IN [LOWER('field_x')] @@ -970,7 +973,7 @@ FROM ( ) WHEN v_1 IS BAG THEN ( SELECT VALUE - CASE WHEN v_2 IS STRUCT THEN ( + CASE WHEN v_2 IS TUPLE THEN ( PIVOT v_3 AT attr_3 FROM UNPIVOT v_2 AS v_3 AT attr_3 WHERE LOWER(attr_3) NOT IN [LOWER('field_x')] @@ -1046,7 +1049,7 @@ FROM ( SELECT VALUE { 'foo': foo, 'bar': - CASE WHEN bar is STRUCT THEN ( + CASE WHEN bar IS TUPLE THEN ( PIVOT v AT attr FROM UNPIVOT bar AS v AT attr WHERE LOWER(attr) NOT IN [LOWER('d')] @@ -1113,7 +1116,7 @@ SELECT v, attr FROM ( SELECT VALUE { 'v': - CASE WHEN v IS STRUCT THEN ( + CASE WHEN v IS TUPLE THEN ( PIVOT v_v AT attr_v FROM UNPIVOT v AS v_v AT attr_v WHERE LOWER(attr_v) NOT IN [LOWER('foo')] @@ -1182,7 +1185,7 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT v AT attr FROM UNPIVOT t AS v AT attr WHERE LOWER(attr) NOT IN [LOWER('a')] @@ -1242,7 +1245,7 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT v_1 AT attr_1 FROM UNPIVOT t AS v_1 AT attr_1 WHERE @@ -1300,12 +1303,12 @@ FROM ( SELECT VALUE { 't': CASE - WHEN t IS STRUCT THEN ( + WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN CASE - WHEN v_1 IS STRUCT THEN ( + WHEN v_1 IS TUPLE THEN ( PIVOT v_2 AT attr_2 FROM UNPIVOT v_1 AS v_2 AT attr_2 WHERE LOWER(attr_2) NOT IN [LOWER('a1')] @@ -1371,12 +1374,12 @@ SELECT t.* FROM ( SELECT VALUE { 't': - CASE WHEN t IS STRUCT THEN ( + CASE WHEN t IS TUPLE THEN ( PIVOT ( CASE WHEN LOWER(attr_1) = LOWER('a') THEN - CASE WHEN v_1 
IS STRUCT THEN ( + CASE WHEN v_1 IS TUPLE THEN ( PIVOT ( - CASE WHEN v_2 IS STRUCT THEN ( + CASE WHEN v_2 IS TUPLE THEN ( PIVOT v_3 AT attr_3 FROM UNPIVOT v_2 AS v_3 AT attr_3 WHERE LOWER(attr_3) NOT IN [LOWER('bar')] @@ -1387,9 +1390,9 @@ FROM ( FROM UNPIVOT v_1 AS v_2 AT attr_2 WHERE LOWER(attr_2) NOT IN [LOWER('bar')] ) - WHEN v_1 IS LIST THEN ( + WHEN v_1 IS ARRAY THEN ( SELECT VALUE - CASE WHEN v_2 IS STRUCT THEN ( + CASE WHEN v_2 IS TUPLE THEN ( PIVOT v_3 AT attr_3 FROM UNPIVOT v_2 AS v_3 AT attr_3 WHERE LOWER(attr_3) NOT IN [LOWER('bar')] @@ -1400,7 +1403,7 @@ FROM ( ORDER BY idx_2 ) -- WHEN v_1 IS BAG THEN ... - -- same as for LIST but remove `AT` and `ORDER BY` + -- same as for ARRAY but remove `AT` and `ORDER BY` ELSE v_1 END ELSE v_1 @@ -1494,7 +1497,7 @@ We choose to model `EXCLUDE` as a syntactic rewrite over existing clauses (e.g. Why does `EXCLUDE` not give an evaluation error when an exclude path does not remove anything? Or on data type mismatch (e.g. tuple attribute exclude step on collection)?:: -We have opted to not error at evaluation time when `EXCLUDE` does not omit any values or in data type mismatch cases. It is very possible in the schemaless, semi-structured data domain that our data is missing some fields or has different structures. The idea here is that `EXCLUDE` will guarantee that all values at the exclude path will be omitted from the output binding tuple. This can enable use cases such as <> in which the data we wish to exclude is nested within a heterogeneous set of structs and containers. +We have opted to not error at evaluation time when `EXCLUDE` does not omit any values or in data type mismatch cases. It is very possible in the schemaless, semi-structured data domain that our data is missing some fields or has different structures. The idea here is that `EXCLUDE` will guarantee that all values at the exclude path will be omitted from the output binding tuple. 
This can enable use cases such as <> in which the data we wish to exclude is nested within a heterogeneous set of tuples and collections. + A future RFC could opt to give a warning/error in these cases when schema is present and we know at static time that an `EXCLUDE` path will not omit values. See <> for more discussion on schema. @@ -1511,7 +1514,7 @@ PartiQL users have frequently asked us for this capability to omit certain neste * Some helpful discussion on the issue of `EXCLUDE` being added to AsterixDB: https://issues.apache.org/jira/browse/ASTERIXDB-3059 * More info on AsterixDB: https://dbdb.io/db/asterixdb -AsterixDB, an implementation of SQL++, has defined an `EXCLUDE` clause to operate on semi-structured data to omit certain nested struct fields; however, AsterixDB's definition is limited and does not cover other common use cases involving collections and multi-struct field exclusions. +AsterixDB, an implementation of SQL++, has defined an `EXCLUDE` clause to operate on semi-structured data to omit certain nested tuple fields; however, AsterixDB's definition is limited and does not cover other common use cases involving collections and multi-tuple field exclusions. Another key difference is that the `EXCLUDE` clause is evaluated on the output of the `SELECT` projection.