Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Almost match Cast Floats to Decimal #10909

Closed
wants to merge 3 commits into from

Conversation

thirtiseven
Copy link
Collaborator

@thirtiseven thirtiseven commented May 27, 2024

Post this pr for sharing tests.

Fixes #9682
Fixes #10809

This PR rewrite Cast floats to decimal with a new floats => string => decimal path, to make the results closer to CPU.

There are still some differences from the known limits of ryu float to string and two edge cases in string to decimal #10890 and #10908.

Performance test to cast 5000000 floats to 10 kinds of decimal types (in ms)

Type CPU 24.08 This PR Speedup vs CPU Speedup vs 24.08
Double 146,524 468.33 660.33 221.90x -29.08%
Float 82,691 412.33 480.67 172.03x -14.22%

We can make it a bit faster by combining the two kernels in jni.

@thirtiseven
Copy link
Collaborator Author

thirtiseven commented May 27, 2024

cc @ttnghia Here are some integration tests. scala test and Spark UT are also useful.
I modified the Scala test to make my code pass, original tests are stronger.

# Integration test:
./integration_tests/run_pyspark_from_build.sh -s -k test_cast_float_to_decimal
# Spark UT:
mvn test -Dbuildver=330 -DwildcardSuites=org.apache.spark.sql.rapids.suites.RapidsCastSuite
# Scala Test:
mvn test -Dbuildver=330 -DwildcardSuites=com.nvidia.spark.rapids.CastOpSuite

@@ -877,8 +877,8 @@ class CastOpSuite extends GpuExpressionTestSuite {

overflowCase(DataTypes.FloatType, precision = 10, scale = 6,
generator = floatGenerator(Seq(12345.678f)))
overflowCase(DataTypes.DoubleType, precision = 15, scale = -5,
generator = doubleGenerator(Seq(1.23e21)))
// overflowCase(DataTypes.DoubleType, precision = 15, scale = -5,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two line failed because #10908

@@ -261,6 +261,23 @@ def test_cast_long_to_decimal_overflow():
lambda spark : unary_op_df(spark, long_gen).select(
f.col('a').cast(DecimalType(18, -1))))

@datagen_overrides(seed=0, reason='edge cases')
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed seed because #10890

@thirtiseven
Copy link
Collaborator Author

Close because we have a better approach in #10917

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant