
Releases: NVIDIA/spark-rapids-tools

v24.12.1

14 Jan 03:32

Changes

User Tools

  • Add compute_precision_recall utility function (#1500)
  • Fix additional FutureWarning issues (#1499)
  • Qualx model updates from weekly KPI run 2025-01-10 (#1495)
  • Fix future warnings for pandas>=2.2 (#1494)
  • Pin scikit-learn dependency for shap (#1491)
  • Make spill heuristic 1 TB by default (#1488)
  • Support Python 3.9-3.12 (#1486)
  • Update models for latest code/datasets (#1485)

Core

  • Improve scalastyle rules to detect spaces (#1493)
  • Improve shuffle manager recommendation in AutoTuner with version validation (#1483)
  • Support group-limit optimization for ROW_NUMBER in Qualification (#1487)
  • Bump minimum Spark version to 3.2.0 and improve AutoTuner unit tests for multiple Spark versions (#1482)
  • Fix inconsistent shuffle write time sum results in Profiler output (#1450)
  • Refine Qualification AutoTuner recommendations for shuffle partitions for CPU event logs (#1479)
  • Split AutoTuner for Profiling and Qualification and Override BATCH_SIZE_BYTES (#1471)

v24.12.0

20 Dec 20:23

Changes

User Tools

  • Make '--platform' argument mandatory in qualification and profiling CLI to prevent incorrect behavior (#1463)

Core

  • Skip processing apps with invalid platform and spark runtime configurations (#1421)
  • Improve implementation of finding median in StatisticsMetrics (#1474)
  • Optimize implementation of getAggregateRawMetrics in core-tools (#1468)
  • Adding Spark 3.5.2 support in auto tuner for EMR (#1466)
  • Mark RunningWindowFunction as supported in Qual tool (#1465)
  • Deduplicate calls to aggregateSparkMetricsBySql (#1464)

Miscellaneous

  • Follow Up: Make '--platform' argument mandatory in CLI (#1473)

v24.10.3

13 Dec 01:57

Changes

User Tools

  • Fix dataframe handling of column-types (#1458)

v24.10.2

06 Dec 16:35

Changes

User Tools

  • Update models for latest tools code (#1448)
  • More flexible regexes; fix default split function (#1443)
  • Update models for latest code and dataset JSON (#1442)
  • Add model for databricks-azure_photon and update combined model (#1427)
  • Remove custom-speedup module from user-tools (#1425)

Core

  • Count expressions per Exec in SQLPlanParser (#1449)
  • Report all operators in the output file (#1444)
  • Fix missing exec-to-stageId mapping in Qual tool (#1437)
  • [BUG] Fix Profiler tool index out of bound exception when generating diagnostic metrics (#1439)
  • Sort Qual execs report by sqlId and nodeId (#1436)
  • Include expression parsers for HashAggregate and ObjectHashAggregate (#1432)
  • [FEA] Add stage/task level diagnostic output for GPU slowness in Profiler tool (#1375)
  • Reduce the log noise caused by core report summary (#1426)
  • Trigger GC at the beginning of each benchmark iteration (#1424)

Miscellaneous

  • [BUG] Fix sync plugin files script to handle empty or non-existing csv files (#1446)
  • Enable license header check (#1440)

v24.10.1

15 Nov 02:36

Changes

User Tools

  • Add qualification support for Photon jobs in the Python Tool (#1409)
  • Add qualx support for platform runtime variants (DB AWS) (#1417)
  • Update models for latest emr, onprem eventlogs (#1410)

Core

  • Adding EMR-specific tunings for shuffle manager and ignoring jar (#1419)
  • Changing autotuner memory error to warning in comments (#1418)
  • Add sparkRuntime property to capture runtime type in application_information (#1414)
  • Refactor Exec Parsers - remove individual parser classes (#1396)
  • Remove estimated GPU duration from qualification output (#1412)

v24.10.0

04 Nov 23:23

Changes

User Tools

  • [FEA] Allow users to specify custom Dependency jars (#1395)
  • Reduce default memory allocation to the java process (#1407)
  • Update error handling in python for parsing cluster information (#1394)
  • user-tools should add xms argument to java cmd (#1391)
  • Use environment variables to set thresholds in static yaml configurations (#1389)
  • Use StorageLib to download dependencies (#1383)
  • Remove total core second heuristic and filter apps only in top candidate view (#1376)
  • Generate log files for Python Profiling cli (#1366)
  • Update models for updated datasets and latest code (#1365)
  • Isolate dataset for qualx plugin invocations (#1361)
  • [FEA] Add total core seconds into top candidate view (#1342)
  • Fix python tool picking up wrong JAR version in Fat wheel mode (#1357)
  • [FOLLOWUP-1326] Set Spark version to 3.4.2 by default for onprem environment (#1358)
  • Disable too-many-positional-arguments in pylintrc (#1353)
  • Reduce console output tree level, exclude JAR tool output files and remove incorrect logging (#1340)

Core

  • Add support for Photon-specific SQL Metrics (#1390)
  • Add support for processing Photon event logs in Scala (#1338)
  • Add Reflection to support custom Spark Implementation at Runtime (#1362)
  • Improve AQE support by capturing SQLPlan versions (#1354)
  • Add PartitionFilters and DataFilters to the dataSourceInfo table (#1346)
  • Add support to ArrayJoin in Qualification tool (#1345)

Miscellaneous

  • Cluster information should handle dynamic allocation and nodes being removed and added (#1369)
  • Rename tag core to core_tools (#1350)

v24.08.2

10 Sep 21:25

Changes

User Tools

  • Add end-to-end behavioural tests for the python CLI (#1313)
  • Add documentation for qualx plugins (#1337)
  • Allow spark dependency to be configured dynamically (#1326)
  • Follow-up 1318: Fix QualX fallback with default speedup and duration columns (#1330)
  • Updated models for EMR NDS-H dataset (#1331)

Core

  • [FEA] Add total core seconds in Qualification core tool output (#1320)
  • Add support to MaxBy and MinBy in Qualification tool (#1335)
  • Add safeguards to prevent older attempts from generating metrics output in Scala Tool (#1324)
  • Sync up DAYTIME and YEARMONTH fields with CSV plugin files (#1328)

Miscellaneous

  • Update signoff usage [skip ci] (#1332)

v24.08.1

04 Sep 01:06

Changes

User Tools

  • [DOC] spark_rapids CLI help cmd still shows cost savings (#1317)
  • Fix Qualification and Profiling tools CLI argument shorthands (#1312)
  • Raise error for enum creation from invalid string values (#1300)
  • Append HADOOP_CONF_DIR to the tools CLASSPATH execution cmd (#1308)
  • Fix key error and cross-join error during qualx evaluate (#1298)
  • Qual tool: Print more useful log messages when failures happen downloading dependencies (#1292)
  • Fix --help text for custom_model_file option (#1285)

Core

  • Remove legacy SpeedupFactor from core output files (#1318)
  • Mark decimalsum as supported in Qualification tool (#1323)
  • Mark SMJ as unsupported operator for corner cases in left join (#1309)
  • Remove arguments and code related to the html-report (#1311)
  • Handle SparkRapidsBuildInfoEvent in GPU event logs (#1203)
  • Enable recursive search for event logs by default and optional --no-recursion flag (#1297)
  • Qualification tool support filtering by a filesystem time range (#1299)
  • Skip generating timeline for stages that do not have completion time (#1290)
  • Save core tools logs to output log file (#1269)
  • Qualification tool - Add option to filter by minimum event log size (#1291)
  • Include exception message for unknown app status in core tool (#1281)

Miscellaneous

  • Remove restricted google sheets link and outdated TCO section (#1289)

v24.08.0

13 Aug 02:52

Changes

User Tools

  • Remove calculation of gpu cluster recommendation from python tool when cluster argument is passed (#1278)
  • Remove unused argument --target_platform in Python Tool (#1279)
  • Qualification tool: Add output stats file for Execs(operators) (#1225)
  • Include GPU information in the cluster recommendation for Dataproc and OnPrem (#1265)
  • Remove speedup based recommendation column from qual_summary csv (#1268)
  • Fix prediction CSV files for multiple qual directories (#1267)
  • Clean up tools after removing CLI dependency (#1256)
  • Rename cluster shape columns to use 'worker' prefix in the output files and rename metadata file (#1258)
  • Remove CLI dependency in Dataproc _pull_gpu_hw_info implementation (#1245)
  • Replace split_nds with split_train_val (#1252)
  • Update xgboost models and metrics (#1244)
  • Add footnotes for config recommendations and speedup category in top candidate view (#1243)
  • [BUG] Update Dataproc instance catalog for n1 series GPU info (#1242)
  • Improvements in Cluster Config Recommender (#1241)
  • Improve console output from python tool for failed/gpu/photon event logs (#1235)
  • [FEA] Generate and use instance description file for Databricks-Azure platform (#1232)
  • Remove arguments related to cost-savings (#1230)
  • Updated models for latest databricks-aws datasets (#1231)
  • Refactor QualX for Linter and Test Compatibility (#1228)
  • Generate summary metadata file and fix node recommendation in python (#1216)
  • [FEA] Remove gcloud CLI dependency for Dataproc platform (#1223)
  • Updated models for latest dataproc eventlogs (#1226)
  • Remove estimation-model column from qualification summary (#1220)
  • Add option to add features.csv files to training set (#1212)
  • Disable cost saving functionality (#1218)
  • [FEA] Remove CLI dependency for EMR and Databricks-AWS platforms in user tool (#1196)
  • Fix some basic pylint errors in qualx code (#1210)
  • Qual tool tuning rec based on CPU event log coherently recommend tunings and node setup and infer cluster from eventlog (#1188)
  • Add shap command to internal CLI for debugging (#1197)
  • Add internal CLI to generate instance descriptions for CSPs (#1137)
  • [FEA] Support custom XGBoost model file via user tools CLI (#1184)
  • Updated models for new training data (#1186)
  • Add evaluate_summary command to internal CLI (#1185)
  • [DOC] Fix broken link to qualX docs and update python prerequisites (#1180)
  • Bump to certifi-2024.7.4 and urllib3-1.26.19 (#1173)
  • Disable UI-HTML report by default in Qualification tool (#1168)
  • Fix parsing App IDs inside metrics directory in QualX (#1167)
  • Refactor Databricks-AWS Qual tool to cache and process pricing info from DB website (#1141)
  • Add plugin mechanism for dataset-specific preprocessing in qualx (#1148)
  • Unsupported op logic should read action column from qual's output (#1150)
  • Update qualx readme for training (#1140)
  • Disable pylint-unreachable code in tox.ini (#1145)

Core

  • Include GPU information in the cluster recommendation for Dataproc and OnPrem (#1265)
  • [TASK] Optimize the storage of accumulables in core tools (#1263)
  • Sync GetJsonObject support with Rapids-Plugin (#1266)
  • Do not create new StageInfo object (#1261)
  • [FEA] Add support for map_from_arrays in qualification tools (#1248)
  • Rename cluster shape columns to use 'worker' prefix in the output files and rename metadata file (#1258)
  • Fix stage level metrics output csv file (#1251)
  • Handle event logs with wildcards in status report generation (#1237)
  • Fix duplicate records in DataSourceInfo report (#1227)
  • Reduce memory footprint of stageInfo (#1222)
  • Ensure UTF-8 encoding for reading non-english characters (#1211)
  • Sync plugin support for hash-hive and shift operators (#1198)
  • Sync-up the support of parse_url in qualification tool (#1195)
  • Include status information for failed event logs in core tool (#1187)
  • [FEA] Adding Benchmarking classes to evaluate core tools performance (#1169)
  • [BUG] Fix handling of non-english characters in tools output files (#1189)
  • [Bug] Fix java Qual tool handling of --platform argument (#1161)
  • Add all stage metrics to tools output (#1151)
  • Follow-up 1142: remove TODO line (#1146)
  • Mark wholestageCodeGen as shouldRemove when child nodes are removed (#1142)
  • [FEA] Display full failure messages in failed CSV files (#1135)

Miscellaneous

  • Qualification tool: Add option to filter event logs for a maximum file system size (#1275)
  • Qualification tool should print Kryo related recommendations (#1204)
  • Fix header check script to exclude files (#1224)
  • Update header check script for pre-commit hooks (#1219)
  • Follow-up 1189: handle non-english characters in data-output.js (#1208)
  • Update pre-commit hooks to check for headers and white-spaces (#1205)
  • user-tools: Update --help for cluster argument (#1178)
  • Support fine-tuning models (#1174)

v24.06.1

18 Jun 22:44

Changes

User Tools

  • Fix Python runtime error caused by numpy 2.0.0 release (#1130)
  • Disable the spark_rapids bootstrap command (#1114)

Core

  • Handle different exception thrown by incomplete eventlogs (#1124)
  • Include number of executors per node in cluster information (#1119)