[BUG] Issues found by Spark UT Framework on RapidsDataFrameAggregateSuite #10772

Closed
binmahone opened this issue May 7, 2024 · 5 comments · Fixed by #10943
Labels
bug Something isn't working

Comments
binmahone (Collaborator) commented May 7, 2024

Describe the bug

The Spark UT Framework enabled RapidsDataFrameAggregateSuite (#10743), with the following test cases explicitly excluded:

  • collect functions — collect_set gave two different outputs on vanilla Spark and RAPIDS Spark.
  • collect functions structs — failure related to the collect_set aggregate function when used with struct types.
  • collect functions should be able to cast to array type with no null values — the expected result is a struct with an empty field, but the actual result is not.
  • collect functions should not collect null values — the ordering of elements in the collect_set array differs from the expected order.
  • SPARK-19471: AggregationIterator does not initialize the generated result projection before using it — the case asserts that a WholeStageCodegenExec can be found in the plan tree, which is not a suitable check for the plugin.

These excluded test cases need further investigation!
Note: other test cases in this suite may pass only because they fall back to the CPU.

Steps/Code to reproduce bug

  1. Compile everything with mvn -Dbuildver=330 install -DskipTests
  2. Pick a test case name from the list above
  3. Go to RapidsTestSettings, find the line starting with ".exclude" that contains the test case name, and comment it out
  4. Run the suite and you'll see one failed test case, e.g. mvn -nsu -Dbuildver=330 -pl tests -Dsuites="org.apache.spark.sql.rapids.suites.RapidsXXXSuite" test (replace RapidsXXXSuite with the suite name in the issue header). Always double-check that the suite name matches the source code exactly, as it may contain typos!

Expected behavior
The suite can pass without excluding any test case.

@binmahone binmahone added bug Something isn't working ? - Needs Triage Need team to review and classify labels May 7, 2024
mattahrens (Collaborator) commented:

Initial scope is triaging unit test failures to determine priorities of individual issues.

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label May 7, 2024
abellina (Collaborator) commented:

I actually see 5 test failures, not 6. I even tried what @NVnavkumar recommended, so one of the tests marked as excluded is passing fine:

- SPARK-24788: RelationalGroupedDataset.toString with unresolved exprs should not fail

The other 5 are failing with what looks like array sort differences:

collect functions *** FAILED ***
!struct<>                     struct<collect_set(a):array<int>,collect_set(b):array<int>>
![List(1, 2, 3),List(2, 4)]   [ArrayBuffer(2, 1, 3),ArrayBuffer(4, 2)] (RapidsSQLTestsTrait.scala:99)
collect functions structs *** FAILED ***
!== Correct Answer - 1 ==             == RAPIDS Answer - 1 ==
!struct<>                             struct<collect_set(a):array<int>,sort_array(collect_set(b), true):array<struct<x:int,y:int>>>
![List(1, 2, 3),List([2,2], [4,1])]   [ArrayBuffer(2, 1, 3),ArrayBuffer([2,2], [4,1])] (RapidsSQLTestsTrait.scala:99)
SPARK-17641: collect functions should not collect null values *** FAILED ***
!== Correct Answer - 1 ==   == RAPIDS Answer - 1 ==
!struct<>                   struct<collect_set(a):array<string>,collect_set(b):array<int>>
![List(1),List(2, 4)]       [ArrayBuffer(1),ArrayBuffer(4, 2)] (RapidsSQLTestsTrait.scala:99)
collect functions should be able to cast to array type with no null values *** FAILED ***
!== Correct Answer - 1 ==   == RAPIDS Answer - 1 ==
!struct<>                   struct<collect_set(a):array<float>>
![List(1.0, 2.0)]           [ArrayBuffer(2.0, 1.0)] (RapidsSQLTestsTrait.scala:99)

And this last one seems to be something we probably don't do the same way on the GPU; I don't know yet whether it matters.

SPARK-19471: AggregationIterator does not initialize the generated result projection before using it *** FAILED ***
DataFrameAggregateSuite.this.find(hashAggPlan)(((x0$1: org.apache.spark.sql.execution.SparkPlan) => x0$1 match {
  case (child: org.apache.spark.sql.execution.SparkPlan)(codegenStageId: Int) org.apache.spark.sql.execution.WholeStageCodegenExec((_: org.apache.spark.sql.execution.aggregate.HashAggregateExec)) => true
case _ => false
})).isDefined was false (DataFrameAggregateSuite.scala:691)

abellina (Collaborator) commented:

OK, the last test is invalid. It looks for HashAggregateExec specifically, which we replace, so we need to skip this test.
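To illustrate why that assertion can never hold on the GPU: the test searches the plan tree for a WholeStageCodegenExec wrapping a HashAggregateExec, and the plugin replaces those operators. A minimal plain-Python sketch (hypothetical Node/find helpers, not the actual Spark classes):

```python
# Hypothetical sketch: a plan tree search like DataFrameAggregateSuite's
# find(...) check, and why replacing the CPU operator makes it return None.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def find(root, pred):
    """Return the first node satisfying pred, searching depth-first."""
    if pred(root):
        return root
    for child in root.children:
        hit = find(child, pred)
        if hit is not None:
            return hit
    return None

cpu_plan = Node("WholeStageCodegenExec", [Node("HashAggregateExec")])
gpu_plan = Node("GpuHashAggregateExec")  # plugin swapped in a GPU operator

print(find(cpu_plan, lambda n: n.name == "WholeStageCodegenExec") is not None)  # True
print(find(gpu_plan, lambda n: n.name == "WholeStageCodegenExec") is not None)  # False
```

The same search that succeeds on the CPU plan finds nothing once the operator is replaced, which is exactly why the assertion fails and the test has to be skipped rather than fixed.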

abellina (Collaborator) commented May 24, 2024

From the documentation of collect_set:

The function is non-deterministic because the order of collected results depends on the order of the rows which may be non-deterministic after a shuffle.

collect functions structs uses sort_array for the second column b but not for the first column a. The test fails because column a does not match: it is not sorted.

I think the problem is that we need a way to sort the arrays. That said, I don't see consistent use of sort_array on Spark's side, so I don't think we can enable these tests unless we just ported them over with adjustments. @binmahone.
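Since collect_set makes no ordering guarantee, one way to adapt such tests is to compare the collected arrays order-insensitively, which is effectively what wrapping every collect_set column in sort_array would achieve. A plain-Python sketch of the idea (the helper name is hypothetical; this is not the Scala test code):

```python
def sets_match(expected, actual):
    """Order-insensitive comparison of two collect_set results."""
    return sorted(expected) == sorted(actual)

# Values taken from the failures above: same contents, different order.
print(sets_match([1, 2, 3], [2, 1, 3]))  # True
print(sets_match([2, 4], [4, 2]))        # True
# A genuine mismatch would still be caught:
print(sets_match([1, 2, 3], [2, 4]))     # False
```

A comparison like this keeps the test meaningful (it still catches wrong contents) without depending on row order after a shuffle.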

binmahone (Collaborator, author) commented:

Well received. This happens a lot: for some test cases we need to adjust them to make them work, and that is one of the acceptable ways out.
