From cee5c1ef16246f7d4ec15b09bcbbc75766dd9090 Mon Sep 17 00:00:00 2001 From: Stefan Plantikow Date: Fri, 21 Apr 2017 00:36:07 +0200 Subject: [PATCH 1/9] CIP2017-04-20: Query Combinators --- .../CIP2017-04-20-query-combinators.adoc | 122 ++++++++++++++++++ 1 file changed, 122 insertions(+) create mode 100644 cip/1.accepted/CIP2017-04-20-query-combinators.adoc diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc new file mode 100644 index 0000000000..a4d4094ff3 --- /dev/null +++ b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc @@ -0,0 +1,122 @@ += CIP2017-04-20 - Query Combinators +:numbered: +:toc: +:toc-placement: macro +:source-highlighter: codemirror + +*Author:* Stefan Plantikow + +[abstract] +.Abstract +-- +This CIP codifies the pre-existing `UNION` and `UNION ALL` clauses as well as proposes the addition of new query combinators for set operations. +-- + +toc::[] + +== Motivation + +Query combinators for set operators are a common feature in other query languages. +Adding more query combinators to Cypher will increase language expressivity and provide functionality that has been requested (and expected to exist) in the language by users. + +== Background + +The vast majority of Cypher clauses allows for sequential composition: The records produces by the first clause become an input to the following clause. +However, some operations require multiple streams of records as inputs. +These are called query combinators. +The most notable example of query combinators are set operations. + +== Proposal + +This CIP proposes the introduction of several new multi-arm query combinators. + +* `UNION` +* `UNION ALL` +* `INTERSECT` +* `INTERSECT ALL` +* `EXCEPT` +* `EXCEPT ALL` +* `EXCLUSIVE UNION` +* `EXCLUSIVE UNION ALL` +* `OTHERWISE` +* `OTHERWISE ALL` + +Multi-arm query combinators can only appear as a primary clause (at the top-level of a query) using the syntax `+ RETURN ... [ + RETURN ...]`. 
+ +The `` can be any of the combinators given above. +Multi-arm query combinators are interpreted left-associative. + +The `RETURN` clause of each arm is either a `RETURN *` or specifies record fields explicitly. +All arms that specify record fields explicitly must specify the exact same set of record fields in the exact same order. +If an arm ends in `RETURN *` it must implicitly return the exact same set of record fields as any other arm that specifies record fields explicitly. +If all arms end in `RETURN *` they must return the exact same set of record fields. + +Multi-arm query combinators determine the result signature of a top-level query. +If any arm specifies recod fields explicitly, the exact same set of record fields in the exact same order is returned by the whole query. +If all arms end in `RETURN *`, the order of record fields is unspecified and left to the implementation. + +Additionally, query combinators may be used in a secondary clause position via nested subqueries (covered in separate CIP). + +=== UNION + +`UNION` computes the logical set union between two sets of input records (i.e discards any duplicates). + +`UNION ALL` computes the logical multiset union between two bags of input records (i.e. preserves duplicates). + +=== INTERSECT + +`INTERSECT` computes the logical set intersection between two sets of input records (i.e discards any duplicates). + +`INTERSECT ALL` computes the logical multiset intersection between two bags of input records (i.e. preserves shared duplicates). + +=== EXCEPT + +`EXCEPT` computes the logical set difference between two sets of input records (i.e discards any duplicates). + +`EXCEPT ALL` computes the logical multiset difference between two bags of input records (i.e. preserves excess duplicates on the left-hand side). + +=== EXCLUSIVE UNION + +`EXCLUSIVE UNION` computes the exclusive logical set union between two sets of input records (i.e discards any duplicates in the final outcome). 
+ +`EXCLUSIVE UNION ALL` computes the exclusive logical multiset union between two bags of input records (i.e. returns the largest remaining excess multiplicity of each record in any argument bag). + +=== OTHERWISE + +`OTHERWISE` computes the logical choice between two sets of input records. +It evaluates to all distinct records from the left argument unless that set is empty in which case it evaluates to all distinct records from the right argument. + +`OTHERWISE ALL` computes the logical choice between two bags of input records. +It evaluates to all records from the left argument unless that set is empty in which case it evaluates to all records from the right argument. + +=== Handling of NULL values + +All query combinators perform record-level comparisons under equivalence (i.e. `NULL` is equivalent to `NULL`). + +=== Interaction with existing features + +This CIP codifies the pre-existing `UNION` and `UNION ALL` constructs. + +The suggested changes are expected to integrate well with the parallel CIP for nested subqueries. + +This CIP adds `INTERSECT`, `EXCLUSIVE`, and `OTHERWISE` as new keywords. + +=== Alternatives + +`EXCLUSIVE UNION` is not provided by SQL and could be omitted. + +`OTHERWISE` is not provided by SQL and could be omitted. + +SQL allows `MINUS` as an alias for `EXCEPT`. + +== What others do + +This proposal mainly follows SQL. + +== Benefits to this proposal + +Set operations are added to the language. + +== Caveats to this proposal + +Increase in language complexity; adopting controversial `NULL` handling issues from SQL. 
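Note on the set and multiset semantics described in this patch: the difference between the plain and the `ALL` forms of `UNION`, `INTERSECT`, and `EXCEPT` can be illustrated with a minimal Python sketch. It is not part of the CIP. Records are modelled as tuples, bags as `collections.Counter`, and all function names are hypothetical. Python's `None` stands in for `NULL`, and since `None == None` holds, records compare under equivalence as the patch requires.

```python
from collections import Counter

# Records are modelled as tuples; a bag of records is a Counter (record -> multiplicity).
# All names below are illustrative and not taken from the CIP.

def union(left, right):
    """UNION: set union, duplicates are discarded."""
    return set(left) | set(right)

def union_all(left, right):
    """UNION ALL: multiplicities from both arms are added."""
    return Counter(left) + Counter(right)

def intersect(left, right):
    """INTERSECT: set intersection, duplicates are discarded."""
    return set(left) & set(right)

def intersect_all(left, right):
    """INTERSECT ALL: keeps the shared (minimum) multiplicity of each record."""
    return Counter(left) & Counter(right)

def except_(left, right):
    """EXCEPT: set difference, duplicates are discarded."""
    return set(left) - set(right)

def except_all(left, right):
    """EXCEPT ALL: keeps the excess multiplicity found on the left-hand side."""
    return Counter(left) - Counter(right)

# None plays the role of NULL; equal tuples containing None are treated as the same record.
left = [("a", None), ("a", None), ("b", 1)]
right = [("a", None), ("c", 2)]

print(union(left, right))
print(union_all(left, right))
print(intersect_all(left, right))
print(except_all(left, right))
```

In this sketch the record `("a", None)` occurs twice on the left and once on the right, so `UNION ALL` keeps three copies, `INTERSECT ALL` keeps one, and `EXCEPT ALL` keeps one.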
From 7358781c1c0fccfc311ff78a10930f94344bae4e Mon Sep 17 00:00:00 2001 From: Stefan Plantikow Date: Fri, 21 Apr 2017 01:15:51 +0200 Subject: [PATCH 2/9] Add CROSS/CROSS ALL --- .../CIP2017-04-20-query-combinators.adoc | 30 +++++++++++++++---- 1 file changed, 25 insertions(+), 5 deletions(-) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc index a4d4094ff3..9535b4e3f4 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc @@ -31,6 +31,7 @@ The most notable example of query combinators are set operations. This CIP proposes the introduction of several new multi-arm query combinators. * `UNION` +* `UNION MAX` * `UNION ALL` * `INTERSECT` * `INTERSECT ALL` @@ -38,6 +39,8 @@ This CIP proposes the introduction of several new multi-arm query combinators. * `EXCEPT ALL` * `EXCLUSIVE UNION` * `EXCLUSIVE UNION ALL` +* `CROSS` +* `CROSS ALL` * `OTHERWISE` * `OTHERWISE ALL` @@ -45,6 +48,7 @@ Multi-arm query combinators can only appear as a primary clause (at the top-leve The `` can be any of the combinators given above. Multi-arm query combinators are interpreted left-associative. +Therefore in the following, we only consider combinator semantics regarding two arms (left and right). The `RETURN` clause of each arm is either a `RETURN *` or specifies record fields explicitly. All arms that specify record fields explicitly must specify the exact same set of record fields in the exact same order. @@ -59,28 +63,40 @@ Additionally, query combinators may be used in a secondary clause position via n === UNION -`UNION` computes the logical set union between two sets of input records (i.e discards any duplicates). +`UNION` computes the logical set union between two sets of input records (i.e.discards any duplicates). -`UNION ALL` computes the logical multiset union between two bags of input records (i.e. preserves duplicates). 
+`UNION MAX` computes the logical multiset union between two bags of input records (i.e. preserves the larget numer of duplicates from either arm). + +`UNION ALL` computes the logical multiset union between two bags of input records (i.e. preserves all duplicates from both arms). === INTERSECT -`INTERSECT` computes the logical set intersection between two sets of input records (i.e discards any duplicates). +`INTERSECT` computes the logical set intersection between two sets of input records (i.e.discards any duplicates). `INTERSECT ALL` computes the logical multiset intersection between two bags of input records (i.e. preserves shared duplicates). === EXCEPT -`EXCEPT` computes the logical set difference between two sets of input records (i.e discards any duplicates). +`EXCEPT` computes the logical set difference between two sets of input records (i.e.discards any duplicates). `EXCEPT ALL` computes the logical multiset difference between two bags of input records (i.e. preserves excess duplicates on the left-hand side). === EXCLUSIVE UNION -`EXCLUSIVE UNION` computes the exclusive logical set union between two sets of input records (i.e discards any duplicates in the final outcome). +`EXCLUSIVE UNION` computes the exclusive logical set union between two sets of input records (i.e.discards any duplicates in the final outcome). `EXCLUSIVE UNION ALL` computes the exclusive logical multiset union between two bags of input records (i.e. returns the largest remaining excess multiplicity of each record in any argument bag). +=== CROSS + +`CROSS` computes the cartesian product between two sets of input records (i.e. discards any duplicates). + +`CROSS ALL` computes the cartesian product between two bags of input records (i.e. preserves duplicates). + +Contrary to the other query combinators, the rules regarding result fields do not apply to `CROSS` and `CROSS ALL`. +Instead, the set of return fields of both arms of a `CROSS` or a `CROSS ALL` must be non-overlapping. 
+The final result fields of a `CROSS` or a `CROSS ALL` are the exact same fields from the left arm in the exact same order given by the left arm followed by the exact same fields from the right arm in the exact same order given by the right arm. + === OTHERWISE `OTHERWISE` computes the logical choice between two sets of input records. @@ -107,8 +123,12 @@ This CIP adds `INTERSECT`, `EXCLUSIVE`, and `OTHERWISE` as new keywords. `OTHERWISE` is not provided by SQL and could be omitted. +SQL does not have `UNION MAX` (it has been suggested in the literature though). + SQL allows `MINUS` as an alias for `EXCEPT`. +SQL uses `CROSS JOIN` for `CROSS ALL` and does not provide `CROSS` directly. + == What others do This proposal mainly follows SQL. From 2a916d029a9d42b9cfa530394b7e5d5349d3f4bf Mon Sep 17 00:00:00 2001 From: Stefan Plantikow Date: Fri, 21 Apr 2017 12:37:34 +0200 Subject: [PATCH 3/9] Add THEN, Remove OTHERWISE ALL, CROSS ALL --- .../CIP2017-04-20-query-combinators.adoc | 54 ++++++++++--------- 1 file changed, 30 insertions(+), 24 deletions(-) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc index 9535b4e3f4..1582c00c1f 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc @@ -9,7 +9,7 @@ [abstract] .Abstract -- -This CIP codifies the pre-existing `UNION` and `UNION ALL` clauses as well as proposes the addition of new query combinators for set operations. +This CIP codifies the pre-existing `UNION` and `UNION ALL` clauses as well as proposes the addition of new query combinators for set operations and pipelining. -- toc::[] @@ -39,27 +39,26 @@ This CIP proposes the introduction of several new multi-arm query combinators. 
* `EXCEPT ALL` * `EXCLUSIVE UNION` * `EXCLUSIVE UNION ALL` -* `CROSS` -* `CROSS ALL` * `OTHERWISE` -* `OTHERWISE ALL` +* `CROSS` +* `THEN` -Multi-arm query combinators can only appear as a primary clause (at the top-level of a query) using the syntax `+ RETURN ... [ + RETURN ...]`. +Multi-arm query combinators can only appear as a query using the syntax ` [ ]+`. The `` can be any of the combinators given above. Multi-arm query combinators are interpreted left-associative. Therefore in the following, we only consider combinator semantics regarding two arms (left and right). +For all proposed query combinators but not for `CROSS` and `THEN`, the following standard rules regarding returned We fields apply: + The `RETURN` clause of each arm is either a `RETURN *` or specifies record fields explicitly. -All arms that specify record fields explicitly must specify the exact same set of record fields in the exact same order. -If an arm ends in `RETURN *` it must implicitly return the exact same set of record fields as any other arm that specifies record fields explicitly. -If all arms end in `RETURN *` they must return the exact same set of record fields. +If both arms specify record fields explicitly, then they must specify the exact same set of record fields in the exact same order. +If an arm ends in `RETURN *` and the other arm specifies record fields explicitly, it must implicitly return the exact same set of record fields as the other arm. +If both arms end in `RETURN *` they must return the exact same set of record fields. +If both arms end in `RETURN *`, the order of record fields is unspecified and left to the implementation. -Multi-arm query combinators determine the result signature of a top-level query. +Multi-arm query combinators may determine the result signature of a top-level query. If any arm specifies recod fields explicitly, the exact same set of record fields in the exact same order is returned by the whole query. 
-If all arms end in `RETURN *`, the order of record fields is unspecified and left to the implementation. - -Additionally, query combinators may be used in a secondary clause position via nested subqueries (covered in separate CIP). === UNION @@ -87,23 +86,28 @@ Additionally, query combinators may be used in a secondary clause position via n `EXCLUSIVE UNION ALL` computes the exclusive logical multiset union between two bags of input records (i.e. returns the largest remaining excess multiplicity of each record in any argument bag). +=== OTHERWISE + +`OTHERWISE` computes the logical choice between two bags of input records. +It evaluates to all records from the left argument unless that bag is empty in which case it evaluates to all records from the right argument. + === CROSS -`CROSS` computes the cartesian product between two sets of input records (i.e. discards any duplicates). +`CROSS` computes the cartesian product between two bags of input records (i.e. preserves duplicates). -`CROSS ALL` computes the cartesian product between two bags of input records (i.e. preserves duplicates). +Contrary to the other query combinators, the standard rules regarding returned record fields do not apply to `CROSS`. +Instead, the set of returned record fields of both arms of a `CROSS` must be non-overlapping. +The finally returned record fields of a `CROSS` are the exact same fields from the left arm in the exact same order given by the left arm followed by the exact same fields from the right arm in the exact same order given by the right arm. -Contrary to the other query combinators, the rules regarding result fields do not apply to `CROSS` and `CROSS ALL`. -Instead, the set of return fields of both arms of a `CROSS` or a `CROSS ALL` must be non-overlapping. 
-The final result fields of a `CROSS` or a `CROSS ALL` are the exact same fields from the left arm in the exact same order given by the left arm followed by the exact same fields from the right arm in the exact same order given by the right arm. +=== THEN -=== OTHERWISE +`THEN` computes query-level pipelining, i.e. it executes the right-hand query for each input record from the left-hand side and returns the flattened concatenation of all such records produced. -`OTHERWISE` computes the logical choice between two sets of input records. -It evaluates to all distinct records from the left argument unless that set is empty in which case it evaluates to all distinct records from the right argument. +The main feature of `THEN` is that it allows pipelining between nested subqueries due to it's syntatic status as a query combinator. -`OTHERWISE ALL` computes the logical choice between two bags of input records. -It evaluates to all records from the left argument unless that set is empty in which case it evaluates to all records from the right argument. +Contrary to the other query combinators, the standard rules regarding returned record fields do not apply to `THEN`. +Instead, the set of returned record fields of both arms of `THEN` must be non-overlapping. +The finally returned record fields of a `THEN` are the exact same fields from the right arm in the exact same order given by the right arm. === Handling of NULL values @@ -127,8 +131,6 @@ SQL does not have `UNION MAX` (it has been suggested in the literature though). SQL allows `MINUS` as an alias for `EXCEPT`. -SQL uses `CROSS JOIN` for `CROSS ALL` and does not provide `CROSS` directly. - == What others do This proposal mainly follows SQL. @@ -140,3 +142,7 @@ Set operations are added to the language. == Caveats to this proposal Increase in language complexity; adopting controversial `NULL` handling issues from SQL. 
+ +This does not provide for aliasing of subqueries; henceforth set operations over the same argument queries need to repeat the argument subqueries. + +This could be addressed in a future CIP. From bd08272b152170f83586479e39f40348540621e3 Mon Sep 17 00:00:00 2001 From: Petra Selmer Date: Fri, 21 Apr 2017 17:00:00 +0100 Subject: [PATCH 4/9] Textual improvements --- .../CIP2017-04-20-query-combinators.adoc | 73 ++++++++++--------- 1 file changed, 37 insertions(+), 36 deletions(-) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc index 1582c00c1f..db908ed57e 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc @@ -9,26 +9,26 @@ [abstract] .Abstract -- -This CIP codifies the pre-existing `UNION` and `UNION ALL` clauses as well as proposes the addition of new query combinators for set operations and pipelining. +This CIP codifies the pre-existing `UNION` and `UNION ALL` clauses, and proposes additional query combinators for set operations and pipelining. -- toc::[] == Motivation -Query combinators for set operators are a common feature in other query languages. -Adding more query combinators to Cypher will increase language expressivity and provide functionality that has been requested (and expected to exist) in the language by users. +Query combinators for set operations are a common feature in other query languages. +Adding more query combinators to Cypher will increase language expressivity and provide functionality that has been requested -- and expected to exist -- in the language by users. == Background -The vast majority of Cypher clauses allows for sequential composition: The records produces by the first clause become an input to the following clause. +The vast majority of Cypher clauses are underpinned by sequential composition; i.e. the records produced by the first clause act as an input to the next clause and so on. 
However, some operations require multiple streams of records as inputs. -These are called query combinators. -The most notable example of query combinators are set operations. +These are called _query combinators_. +The most notable example of query combinators are _set operations_. == Proposal -This CIP proposes the introduction of several new multi-arm query combinators. +This CIP proposes the introduction of several new multi-arm query combinators: * `UNION` * `UNION MAX` @@ -43,75 +43,76 @@ This CIP proposes the introduction of several new multi-arm query combinators. * `CROSS` * `THEN` -Multi-arm query combinators can only appear as a query using the syntax ` [ ]+`. +Multi-arm query combinators can only appear within a query using the syntax ` [ ]+`. The `` can be any of the combinators given above. -Multi-arm query combinators are interpreted left-associative. -Therefore in the following, we only consider combinator semantics regarding two arms (left and right). +Multi-arm query combinators are interpreted left-associative; that is, the operations are grouped from the left. +Thus, for the remainder of this proposal, we only consider combinator semantics regarding two arms (left and right) -- the semantics follow on straightforwardly by induction for the multi-arm cases. -For all proposed query combinators but not for `CROSS` and `THEN`, the following standard rules regarding returned We fields apply: +For all proposed query combinators -- excluding `CROSS` and `THEN` -- the fields returned are subject to the following standard rules: -The `RETURN` clause of each arm is either a `RETURN *` or specifies record fields explicitly. -If both arms specify record fields explicitly, then they must specify the exact same set of record fields in the exact same order. -If an arm ends in `RETURN *` and the other arm specifies record fields explicitly, it must implicitly return the exact same set of record fields as the other arm. 
-If both arms end in `RETURN *` they must return the exact same set of record fields. -If both arms end in `RETURN *`, the order of record fields is unspecified and left to the implementation. +* The `RETURN` clause of each arm is either a `RETURN *` or specifies record fields explicitly (e.g. `RETURN n.prop1, n.prop2, ...`). +* If both arms specify record fields explicitly, then they must specify precisely the same set of record fields (by name) in exactly the same order. +* If one of the arms, _arm1_, ends with `RETURN *`, and the other arm, _arm2_, specifies record fields explicitly, then _arm1_ must implicitly return exactly the same set of record fields as _arm2_; i.e. the arm with the explicitly-defined record fields will determine which record fields are returned as well as the order thereof. +* If both arms end with `RETURN *`, they must return exactly the same set of record fields. +* If both arms end with `RETURN *`, the order of record fields is unspecified and left to the implementation. Multi-arm query combinators may determine the result signature of a top-level query. -If any arm specifies recod fields explicitly, the exact same set of record fields in the exact same order is returned by the whole query. +If any arm specifies record fields explicitly, the same set of record fields in exactly the same order is returned by the entire query. === UNION -`UNION` computes the logical set union between two sets of input records (i.e.discards any duplicates). +`UNION` computes the logical set union between two sets of input records (i.e. any duplicates are discarded). -`UNION MAX` computes the logical multiset union between two bags of input records (i.e. preserves the larget numer of duplicates from either arm). +`UNION MAX` computes the logical multiset union between two bags of input records (i.e. preserves the largest number of duplicates from either arm). -`UNION ALL` computes the logical multiset union between two bags of input records (i.e. 
preserves all duplicates from both arms). +`UNION ALL` computes the logical multiset union between two bags of input records (i.e. all duplicates from both arms are retained). === INTERSECT -`INTERSECT` computes the logical set intersection between two sets of input records (i.e.discards any duplicates). +`INTERSECT` computes the logical set intersection between two sets of input records (i.e. any duplicates are discarded). -`INTERSECT ALL` computes the logical multiset intersection between two bags of input records (i.e. preserves shared duplicates). +`INTERSECT ALL` computes the logical multiset intersection between two bags of input records (i.e. shared duplicates are retained). === EXCEPT -`EXCEPT` computes the logical set difference between two sets of input records (i.e.discards any duplicates). +`EXCEPT` computes the logical set difference between two sets of input records (i.e. any duplicates are discarded). -`EXCEPT ALL` computes the logical multiset difference between two bags of input records (i.e. preserves excess duplicates on the left-hand side). +`EXCEPT ALL` computes the logical multiset difference between two bags of input records (i.e. excess duplicates on the left-hand side are retained). === EXCLUSIVE UNION -`EXCLUSIVE UNION` computes the exclusive logical set union between two sets of input records (i.e.discards any duplicates in the final outcome). +`EXCLUSIVE UNION` computes the exclusive logical set union between two sets of input records (i.e. any duplicates in the final outcome are discarded). -`EXCLUSIVE UNION ALL` computes the exclusive logical multiset union between two bags of input records (i.e. returns the largest remaining excess multiplicity of each record in any argument bag). +`EXCLUSIVE UNION ALL` computes the exclusive logical multiset union between two bags of input records (i.e. the largest remaining excess multiplicity of each record in any argument bag is returned). 
=== OTHERWISE `OTHERWISE` computes the logical choice between two bags of input records. -It evaluates to all records from the left argument unless that bag is empty in which case it evaluates to all records from the right argument. +It evaluates to all records from the left-hand side argument provided the bag of input records is non-empty; otherwise it evaluates to all records from the right-hand side argument. === CROSS `CROSS` computes the cartesian product between two bags of input records (i.e. preserves duplicates). -Contrary to the other query combinators, the standard rules regarding returned record fields do not apply to `CROSS`. +In contrast to the other query combinators, the standard rules regarding returned record fields do not apply to `CROSS`. Instead, the set of returned record fields of both arms of a `CROSS` must be non-overlapping. -The finally returned record fields of a `CROSS` are the exact same fields from the left arm in the exact same order given by the left arm followed by the exact same fields from the right arm in the exact same order given by the right arm. +The returned record fields of a `CROSS` operation consist of all the fields specified in the left arm (appearing in the order specified), followed by all the fields specified in the right arm (appearing in the order specified). === THEN -`THEN` computes query-level pipelining, i.e. it executes the right-hand query for each input record from the left-hand side and returns the flattened concatenation of all such records produced. +`THEN` computes query-level pipelining; i.e. it executes the right-hand side query for each input record from the left-hand side, and returns the flattened concatenation of all such records produced. -The main feature of `THEN` is that it allows pipelining between nested subqueries due to it's syntatic status as a query combinator. +The main feature of `THEN` is that it allows pipelining between nested subqueries. 
+This is due to its syntactic status as a query combinator. -Contrary to the other query combinators, the standard rules regarding returned record fields do not apply to `THEN`. +In contrast to the other query combinators, the standard rules regarding returned record fields do not apply to `THEN`. Instead, the set of returned record fields of both arms of `THEN` must be non-overlapping. -The finally returned record fields of a `THEN` are the exact same fields from the right arm in the exact same order given by the right arm. +`THEN` returns the record fields that are specified in the right arm, in the order specified in the right arm. === Handling of NULL values -All query combinators perform record-level comparisons under equivalence (i.e. `NULL` is equivalent to `NULL`). +All query combinators perform record-level comparisons under equivalence (i.e. `null` is equivalent to `null`). === Interaction with existing features @@ -141,8 +142,8 @@ Set operations are added to the language. == Caveats to this proposal -Increase in language complexity; adopting controversial `NULL` handling issues from SQL. +Increase in language complexity; adopting controversial `null` handling issues from SQL. -This does not provide for aliasing of subqueries; henceforth set operations over the same argument queries need to repeat the argument subqueries. +This does not consider aliasing of subqueries; henceforth set operations over the same argument queries need to repeat the argument subqueries. This could be addressed in a future CIP. 
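Note on `OTHERWISE` and `CROSS` as worded in this patch: the following Python sketch shows one possible reading of the two combinators. It is not part of the CIP. Records are modelled as dicts whose key order stands for the field order, and the function names `otherwise` and `cross` are hypothetical.

```python
from itertools import product

def otherwise(left, right):
    """OTHERWISE: all records of the left arm if it is non-empty, else all records of the right arm."""
    left = list(left)
    return left if left else list(right)

def cross(left, right):
    """CROSS: cartesian product of two bags; duplicates are preserved.

    The field sets of both arms must not overlap. Each result record lists the
    left fields first (in left order), followed by the right fields (in right order).
    In this sketch the overlap check only runs on records that are actually paired.
    """
    result = []
    for l, r in product(list(left), list(right)):
        overlap = l.keys() & r.keys()
        if overlap:
            raise ValueError(f"overlapping record fields: {sorted(overlap)}")
        result.append({**l, **r})  # dict merge keeps left keys first, then right keys
    return result

rows = cross([{"a": 1}, {"a": 1}], [{"b": 2}])
print(rows)
print(otherwise([], [{"x": 1}]))
```

Because `CROSS` works on bags, the duplicate left record `{"a": 1}` yields two identical result records.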
From 17ec5f8915c93074f2e990f33f1f301b81555317 Mon Sep 17 00:00:00 2001 From: Stefan Plantikow Date: Tue, 2 May 2017 01:25:43 +0200 Subject: [PATCH 5/9] Address feedback --- .../CIP2017-04-20-query-combinators.adoc | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc index db908ed57e..e63a164b32 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc @@ -32,6 +32,7 @@ This CIP proposes the introduction of several new multi-arm query combinators: * `UNION` * `UNION MAX` +* `UNION MIN` * `UNION ALL` * `INTERSECT` * `INTERSECT ALL` @@ -43,7 +44,7 @@ This CIP proposes the introduction of several new multi-arm query combinators: * `CROSS` * `THEN` -Multi-arm query combinators can only appear within a query using the syntax ` [ ]+`. +Multi-arm query combinators can only be used to constuct a compound multi-arm query using the syntax ` [ ]+`. The `` can be any of the combinators given above. Multi-arm query combinators are interpreted left-associative; that is, the operations are grouped from the left. @@ -64,10 +65,13 @@ If any arm specifies record fields explicitly, the same set of record fields in `UNION` computes the logical set union between two sets of input records (i.e. any duplicates are discarded). -`UNION MAX` computes the logical multiset union between two bags of input records (i.e. preserves the largest number of duplicates from either arm). - `UNION ALL` computes the logical multiset union between two bags of input records (i.e. all duplicates from both arms are retained). +`UNION MAX` computes the logical max-bounded multiset union between two bags of input records (i.e. retains the largest number of duplicates from either arm). + +`UNION MIN` computes the logical min-bounded multiset union between two bags of input records (i.e. 
retains the smallest, non-zero number of duplicates from either arm). + + === INTERSECT `INTERSECT` computes the logical set intersection between two sets of input records (i.e. any duplicates are discarded). @@ -107,7 +111,8 @@ The main feature of `THEN` is that it allows pipelining between nested subquerie This is due to its syntactic status as a query combinator. In contrast to the other query combinators, the standard rules regarding returned record fields do not apply to `THEN`. -Instead, the set of returned record fields of both arms of `THEN` must be non-overlapping. +Instead, the set of returned record fields of both arms of `THEN` may overlap arbitrarily. +All record fields that are returned in the left arm are made visible at the start of the right-arm query. `THEN` returns the record fields that are specified in the right arm, in the order specified in the right arm. === Handling of NULL values @@ -128,7 +133,7 @@ This CIP adds `INTERSECT`, `EXCLUSIVE`, and `OTHERWISE` as new keywords. `OTHERWISE` is not provided by SQL and could be omitted. -SQL does not have `UNION MAX` (it has been suggested in the literature though). +SQL does not have `UNION MIN` or `UNION MAX` (it has been suggested in the literature though). SQL allows `MINUS` as an alias for `EXCEPT`. @@ -145,5 +150,4 @@ Set operations are added to the language. Increase in language complexity; adopting controversial `null` handling issues from SQL. This does not consider aliasing of subqueries; henceforth set operations over the same argument queries need to repeat the argument subqueries. - This could be addressed in a future CIP. 
From b3b77ec4d48287e879daae8ffb4fee7ee98392eb Mon Sep 17 00:00:00 2001 From: Stefan Plantikow Date: Sat, 6 May 2017 13:38:13 +0200 Subject: [PATCH 6/9] Add EXCLUSIVE UNION MAX, drop UNION MIN and EXCLUSIVE UNION ALL --- .../CIP2017-04-20-query-combinators.adoc | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc index e63a164b32..3414513f09 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators.adoc @@ -31,15 +31,14 @@ The most notable example of query combinators are _set operations_. This CIP proposes the introduction of several new multi-arm query combinators: * `UNION` -* `UNION MAX` -* `UNION MIN` * `UNION ALL` +* `UNION MAX` * `INTERSECT` * `INTERSECT ALL` * `EXCEPT` * `EXCEPT ALL` * `EXCLUSIVE UNION` -* `EXCLUSIVE UNION ALL` +* `EXCLUSIVE UNION MAX` * `OTHERWISE` * `CROSS` * `THEN` @@ -65,12 +64,10 @@ If any arm specifies record fields explicitly, the same set of record fields in `UNION` computes the logical set union between two sets of input records (i.e. any duplicates are discarded). -`UNION ALL` computes the logical multiset union between two bags of input records (i.e. all duplicates from both arms are retained). +`UNION ALL` computes the logical multiset sum between two bags of input records (i.e. all duplicates from both arms are retained). `UNION MAX` computes the logical max-bounded multiset union between two bags of input records (i.e. retains the largest number of duplicates from either arm). -`UNION MIN` computes the logical min-bounded multiset union between two bags of input records (i.e. retains the smallest, non-zero number of duplicates from either arm). 
- === INTERSECT @@ -88,7 +85,7 @@ If any arm specifies record fields explicitly, the same set of record fields in `EXCLUSIVE UNION` computes the exclusive logical set union between two sets of input records (i.e. any duplicates in the final outcome are discarded). -`EXCLUSIVE UNION ALL` computes the exclusive logical multiset union between two bags of input records (i.e. the largest remaining excess multiplicity of each record in any argument bag is returned). +`EXCLUSIVE UNION MAX` computes the exclusive logical multiset union between two bags of input records (i.e. the largest remaining excess multiplicity of each record in any argument bag is returned). === OTHERWISE @@ -129,11 +126,11 @@ This CIP adds `INTERSECT`, `EXCLUSIVE`, and `OTHERWISE` as new keywords. === Alternatives -`EXCLUSIVE UNION` is not provided by SQL and could be omitted. +SQL does not provide `UNION MAX` (it has been suggested in the literature though). -`OTHERWISE` is not provided by SQL and could be omitted. +`EXCLUSIVE UNION` and `EXCLUSIVE UNION MAX` are not provided by SQL and could be omitted. -SQL does not have `UNION MIN` or `UNION MAX` (it has been suggested in the literature though). +`OTHERWISE` is not provided by SQL and could be omitted. SQL allows `MINUS` as an alias for `EXCEPT`. 
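Review note on PATCH 6/9: the duplicate-handling rules that this revision settles on can be sketched with Python's `collections.Counter`, which models a bag as a record-to-multiplicity mapping. This is only an illustration under stated assumptions, not part of the CIP. Records are modeled as plain hashable values, and all helper names (`union_all`, `union_max`, `exclusive_union_max`, and so on) are hypothetical. The reading of `EXCLUSIVE UNION MAX` as "the excess multiplicity of a record in whichever arm has more copies" is an interpretation of the wording in the patch.

```python
from collections import Counter


def union(left, right):
    """UNION: set union; every record appears at most once."""
    return Counter(set(left) | set(right))


def union_all(left, right):
    """UNION ALL: multiset sum; multiplicities from both arms add up."""
    return Counter(left) + Counter(right)


def union_max(left, right):
    """UNION MAX: per record, keep the larger multiplicity of the two arms."""
    return Counter(left) | Counter(right)


def exclusive_union(left, right):
    """EXCLUSIVE UNION: set symmetric difference; duplicates discarded."""
    return Counter(set(left) ^ set(right))


def exclusive_union_max(left, right):
    """EXCLUSIVE UNION MAX (assumed reading): per record, the excess
    multiplicity left over in whichever arm holds more copies."""
    l, r = Counter(left), Counter(right)
    # Counter subtraction drops non-positive counts, so for any record at
    # most one of the two differences is non-zero; their sum is that excess.
    return (l - r) + (r - l)


# Sample input bags (hypothetical data).
left = ["a", "a", "a", "b"]
right = ["a", "b", "b", "c"]
```

With these sample bags, `union_all` yields four copies of `"a"`, whereas `union_max` yields three, which is the distinction the commit draws between "multiset sum" and "max-bounded multiset union".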
From bdf4be1e86ee0febf3b85dd03c929559c9232d6f Mon Sep 17 00:00:00 2001 From: Stefan Plantikow Date: Mon, 16 Oct 2017 22:34:23 +0200 Subject: [PATCH 7/9] Remove THEN; reflect companion CIP changes --- ...query-combinators-for-set-operations.adoc} | 49 +++++++------------ 1 file changed, 18 insertions(+), 31 deletions(-) rename cip/1.accepted/{CIP2017-04-20-query-combinators.adoc => CIP2017-04-20-query-combinators-for-set-operations.adoc} (58%) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc similarity index 58% rename from cip/1.accepted/CIP2017-04-20-query-combinators.adoc rename to cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc index 3414513f09..d45fd9393a 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc @@ -1,4 +1,4 @@ -= CIP2017-04-20 - Query Combinators += CIP2017-04-20 - Query combinators for set operations :numbered: :toc: :toc-placement: macro @@ -9,7 +9,7 @@ [abstract] .Abstract -- -This CIP codifies the pre-existing `UNION` and `UNION ALL` clauses, and proposes additional query combinators for set operations and pipelining. +This CIP codifies the pre-existing `UNION` and `UNION ALL` clauses, and proposes additional query combinators for set operations. -- toc::[] @@ -23,12 +23,12 @@ Adding more query combinators to Cypher will increase language expressivity and The vast majority of Cypher clauses are underpinned by sequential composition; i.e. the records produced by the first clause act as an input to the next clause and so on. However, some operations require multiple streams of records as inputs. -These are called _query combinators_. +These are called _query combinators_ (CIP2016-06-22 Nested, updating, and chained subqueries). The most notable example of query combinators are _set operations_. 
== Proposal -This CIP proposes the introduction of several new multi-arm query combinators: +This CIP proposes the specification of pre-existing and the introduction of several new query combinators for set operations: * `UNION` * `UNION ALL` @@ -41,24 +41,19 @@ This CIP proposes the introduction of several new multi-arm query combinators: * `EXCLUSIVE UNION MAX` * `OTHERWISE` * `CROSS` -* `THEN` -Multi-arm query combinators can only be used to constuct a compound multi-arm query using the syntax ` [ ]+`. +Query combinators are used to construct a (compound) top-level query from two input queries: a left-hand side top-level query and a right-hand side argument query, i.e. always have the form +` `(where `` may be any of the combinators given above). +Query combinators are left-associative; that is, their operations are grouped from the left. -The `` can be any of the combinators given above. -Multi-arm query combinators are interpreted left-associative; that is, the operations are grouped from the left. -Thus, for the remainder of this proposal, we only consider combinator semantics regarding two arms (left and right) -- the semantics follow on straightforwardly by induction for the multi-arm cases. +For all proposed query combinators -- except for `CROSS` -- the fields returned are subject to the following standard rules: -For all proposed query combinators -- excluding `CROSS` and `THEN` -- the fields returned are subject to the following standard rules: +* Both input queries must return precisely the same set of variables +* If both input queries specify the order of returned variables explicitly, they must both return those variables in exactly the same order. +* If one of the input queries does not specify the order of returned variables explicitly (e.g. by using `RETURN *`), then the other input query must specify the order of returned variables explicitly. +This order will then be the order in which variables are returned by the query combinator. 
+* If both input queries do not specify the order of returned variables explicitly (e.g. by using `RETURN *`), variables are returned in the same order as map keys (i.e. sorted according to their UNICODE name). -* The `RETURN` clause of each arm is either a `RETURN *` or specifies record fields explicitly (e.g. `RETURN n.prop1, n.prop2, ...`). -* If both arms specify record fields explicitly, then they must specify precisely the same set of record fields (by name) in exactly the same order. -* If one of the arms, _arm1_, ends with `RETURN *`, and the other arm, _arm2_, specifies record fields explicitly, then _arm1_ must implicitly return exactly the same set of record fields as _arm2_; i.e. the arm with the explicitly-defined record fields will determine which record fields are returned as well as the order thereof. -* If both arms end with `RETURN *`, they must return exactly the same set of record fields. -* If both arms end with `RETURN *`, the order of record fields is unspecified and left to the implementation. - -Multi-arm query combinators may determine the result signature of a top-level query. -If any arm specifies record fields explicitly, the same set of record fields in exactly the same order is returned by the entire query. === UNION @@ -75,6 +70,7 @@ If any arm specifies record fields explicitly, the same set of record fields in `INTERSECT ALL` computes the logical multiset intersection between two bags of input records (i.e. shared duplicates are retained). + === EXCEPT `EXCEPT` computes the logical set difference between two sets of input records (i.e. any duplicates are discarded). @@ -87,30 +83,21 @@ If any arm specifies record fields explicitly, the same set of record fields in `EXCLUSIVE UNION MAX` computes the exclusive logical multiset union between two bags of input records (i.e. the largest remaining excess multiplicity of each record in any argument bag is returned). 
+ === OTHERWISE `OTHERWISE` computes the logical choice between two bags of input records. It evaluates to all records from the left-hand side argument provided the bag of input records is non-empty; otherwise it evaluates to all records from the right-hand side argument. + === CROSS `CROSS` computes the cartesian product between two bags of input records (i.e. preserves duplicates). In contrast to the other query combinators, the standard rules regarding returned record fields do not apply to `CROSS`. -Instead, the set of returned record fields of both arms of a `CROSS` must be non-overlapping. -The returned record fields of a `CROSS` operation consist of all the fields specified in the left arm (appearing in the order specified), followed by all the fields specified in the right arm (appearing in the order specified). - -=== THEN - -`THEN` computes query-level pipelining; i.e. it executes the right-hand side query for each input record from the left-hand side, and returns the flattened concatenation of all such records produced. - -The main feature of `THEN` is that it allows pipelining between nested subqueries. -This is due to its syntactic status as a query combinator. +Instead, the set of variables returned from both input queries of a `CROSS` must be non-overlapping. +The returned variables of a `CROSS` operation consist of all the variables returned by the left-hand side input query (appearing in the order specified), followed by all the variables returned by the right-hand side input query (appearing in the order specified). -In contrast to the other query combinators, the standard rules regarding returned record fields do not apply to `THEN`. -Instead, the set of returned record fields of both arms of `THEN` may overlap arbitrarily. -All record fields that are returned in the left arm are made visible at the start of the right-arm query. -`THEN` returns the record fields that are specified in the right arm, in the order specified in the right arm. 
=== Handling of NULL values From 7574871205d50754347ee806d7bae9d7ac74d9af Mon Sep 17 00:00:00 2001 From: Stefan Plantikow Date: Tue, 17 Oct 2017 15:35:03 +0200 Subject: [PATCH 8/9] Align again w companion CIP --- ...-query-combinators-for-set-operations.adoc | 28 +++++++++++-------- 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc index d45fd9393a..1c98a09da7 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc @@ -42,17 +42,20 @@ This CIP proposes the specification of pre-existing and the introduction of seve * `OTHERWISE` * `CROSS` -Query combinators are used to construct a (compound) top-level query from two input queries: a left-hand side top-level query and a right-hand side argument query, i.e. always have the form -` `(where `` may be any of the combinators given above). +Query combinators are used to construct a (compound) top-level query from two input query components. + +A query component is a sequence of clauses that either describes an updating query or a read-only query that ends in a `RETURN` clause but does not contain any top-level query combinator clauses. +Query components may however contain nested subqueries whose inner queries contain query combinator clauses. + Query combinators are left-associative; that is, their operations are grouped from the left.
-For all proposed query combinators -- except for `CROSS` -- the fields returned are subject to the following standard rules: +For all proposed query combinators -- except for `CROSS` -- the variables returned are subject to the following standard rules: -* Both input queries must return precisely the same set of variables -* If both input queries specify the order of returned variables explicitly, they must both return those variables in exactly the same order. -* If one of the input queries does not specify the order of returned variables explicitly (e.g. by using `RETURN *`), then the other input query must specify the order of returned variables explicitly. -This order will then be the order in which variables are returned by the query combinator. -* If both input queries do not specify the order of returned variables explicitly (e.g. by using `RETURN *`), variables are returned in the same order as map keys (i.e. sorted according to their UNICODE name). +* Both input query components must return precisely the same set of variables +* If both input query components specify the order of returned variables explicitly, they must both return those variables in exactly the same order +* If one of the input query components does not specify the order of returned variables explicitly (e.g. by using `RETURN *`), then the other input query must specify the order of returned variables explicitly. +This order will then be the order in which variables are returned by the query combinator +* If both input query components do not specify the order of returned variables explicitly (e.g. by using `RETURN *`), variables are returned in the same order as map keys (i.e. sorted according to their UNICODE name) === UNION @@ -92,11 +95,12 @@ It evaluates to all records from the left-hand side argument provided the bag of === CROSS -`CROSS` computes the cartesian product between two bags of input records (i.e. preserves duplicates). 
+`CROSS` computes the cartesian product between the records produced by both input query components. +Duplicates are preserved (i.e. `CROSS` does not imply `DISTINCT`). -In contrast to the other query combinators, the standard rules regarding returned record fields do not apply to `CROSS`. -Instead, the set of variables returned from both input queries of a `CROSS` must be non-overlapping. -The returned variables of a `CROSS` operation consist of all the variables returned by the left-hand side input query (appearing in the order specified), followed by all the variables returned by the right-hand side input query (appearing in the order specified). +In contrast to the other query combinators, the standard rules regarding returned variables do not apply to `CROSS`. +Instead, the set of variables returned from both input query components of a `CROSS` must be non-overlapping. +The returned variables of a `CROSS` operation consist of all the variables returned by the left-hand side input query component (appearing in the order specified), followed by all the variables returned by the right-hand side input query component (appearing in the order specified). === Handling of NULL values From bf8ca55f881eb2b87bf23307f36e9c96341fb41b Mon Sep 17 00:00:00 2001 From: Petra Selmer Date: Wed, 17 Jan 2018 16:30:11 +0000 Subject: [PATCH 9/9] Reformatted title --- .../CIP2017-04-20-query-combinators-for-set-operations.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc b/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc index 1c98a09da7..8f83b92b5a 100644 --- a/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc +++ b/cip/1.accepted/CIP2017-04-20-query-combinators-for-set-operations.adoc @@ -1,4 +1,4 @@ -= CIP2017-04-20 - Query combinators for set operations += CIP2017-04-20 Query combinators for set operations :numbered: :toc: :toc-placement: macro
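Review note on the final wording of `OTHERWISE` and `CROSS`: the two combinators can be sketched in Python as follows. This is a minimal illustration, not part of the CIP. Records are modeled as dictionaries from variable name to value, and the helper names `otherwise` and `cross` are hypothetical. The overlap check is done per record pair here for brevity; an actual implementation would presumably reject overlapping variables when the query is analysed, before any records are produced.

```python
from itertools import product


def otherwise(left, right):
    """OTHERWISE: all left-hand records if there are any, else all right-hand records."""
    left = list(left)
    return left if left else list(right)


def cross(left, right):
    """CROSS: cartesian product of two bags of records; duplicates are preserved.

    Each record is a dict of variable name -> value. The variables of the two
    sides must not overlap. Output variables are the left-hand ones followed
    by the right-hand ones, each in their original order.
    """
    result = []
    for l, r in product(list(left), list(right)):
        overlap = l.keys() & r.keys()
        if overlap:
            raise ValueError(f"CROSS arms share variables: {sorted(overlap)}")
        # Dicts keep insertion order, so left variables come first.
        result.append({**l, **r})
    return result


# Sample inputs (hypothetical data); the left bag contains a duplicate record.
people = [{"name": "Ann"}, {"name": "Ann"}]
cities = [{"city": "Oslo"}, {"city": "Rome"}]
pairs = cross(people, cities)
```

Because `CROSS` does not imply `DISTINCT`, the duplicate `{"name": "Ann"}` record above contributes two rows per city, so `pairs` holds four records.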