Commit

fix #575 use a flag to enable large-kernel algo
FindDefinition committed Mar 23, 2023
1 parent f101f97 commit cd99e7a
Showing 3 changed files with 260 additions and 182 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -2,7 +2,7 @@

## [2.3.5] - 2023-03-24
### Fixed
- pypi project reach size limit, so we need to assign a new version number.
- use a flag to enable large kernel algo (need time to compile at runtime)

## [2.3.4] - 2023-03-23
### Added
4 changes: 3 additions & 1 deletion docs/PERFORMANCE_GUIDE.md
@@ -26,4 +26,6 @@
* spconv 2.x in Windows 10 is 1.5x~2x slower than Linux. use Linux if possible.
* If you train with float32 and ampere or later GPUs, you can set ```spconv.constants.SPCONV_ALLOW_TF32``` to enable faster fp32 training.
See [benchmark](BENCHMARK.md) for more performance details of different algorithms.
* Different CUDA version of spconv may have different performance. Use newest cuda version if possible. For example, spconv-cu117 is faster than spconv-cu114, spconv-cu114 is faster than spconv-cu111.
* if your kernel size volume larger than 32, spconv will use a slower (and more inaccurate in fp16) algorithm. to use a faster algo for large kernel size (need time to compile at runtime), use ```large_kernel_fast_algo=True```
* use ```SparseGlobalMaxPool``` instead of use large kernel size when you need global pool.
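
The kernel-volume rule in the guide above can be sketched as a small check. This is only an illustration, assuming "kernel size volume" means the product of the kernel extents along each spatial dimension; the helper names `kernel_volume` and `wants_large_kernel_fast_algo` are hypothetical and are not part of spconv.

```python
from math import prod
from typing import Sequence, Union

# Threshold quoted in the performance guide: a kernel whose volume is
# larger than this uses the slower algorithm unless the flag is enabled.
LARGE_KERNEL_VOLUME_THRESHOLD = 32


def kernel_volume(kernel_size: Union[int, Sequence[int]], ndim: int = 3) -> int:
    """Number of kernel taps (hypothetical helper, not an spconv API).

    An int is treated as the same extent in every one of `ndim` dimensions.
    """
    if isinstance(kernel_size, int):
        return kernel_size ** ndim
    return prod(kernel_size)


def wants_large_kernel_fast_algo(kernel_size: Union[int, Sequence[int]],
                                 ndim: int = 3) -> bool:
    """True when the guide's rule says the large-kernel flag is worth setting."""
    return kernel_volume(kernel_size, ndim) > LARGE_KERNEL_VOLUME_THRESHOLD


if __name__ == "__main__":
    for k in (3, 5, (3, 3, 4), (4, 4, 2)):
        print(k, kernel_volume(k), wants_large_kernel_fast_algo(k))
```

The boolean could then be passed as `large_kernel_fast_algo=...` when building a layer. The guide names that keyword, but this excerpt does not show which constructors accept it, so check the layer signature in your installed spconv version first. Keep in mind the trade-off stated above: the faster algorithm needs time to compile at runtime.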