Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement tile size scaling for the compute renderer #2065

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

FireNX70
Copy link

Based on @RSDuck's comments in #2047.

Right now it scales starting at 4.5x (effectively 5x). Based on some more recent testing, it looks like it should start scaling at 3.25x (effectively 4x). See this gist for more information. It also contains notes on the rationale behind the values for ClearCoarseBinMaskLocalSize.

The value used for CoarseTileCountY when using 32x32 tiles was chosen to match the base resolution's vertical resolution of 192 (6 32x32 tiles vertically per coarse tile means coarse tiles end up being 256x192).

@FireNX70
Copy link
Author

As far as I can tell, the absurd line count is github going insane over the changes to the shaders.

@CasualPokePlayer
Copy link
Contributor

CasualPokePlayer commented Jun 13, 2024

It seems more that the files' line endings got converted from LF to CRLF for some reason (text editor/IDE goof?)

@FireNX70
Copy link
Author

You're right, looks like VSCodium got me. Should be fairly easy to fix.

@FireNX70 FireNX70 force-pushed the try-fixing-the-compute-renderer branch from 00dd600 to 6f8ce9f Compare June 13, 2024 07:41
@FireNX70
Copy link
Author

Ok, looks like it's fixed but I'm not sure the line endings for the template I added to Utils.h aren't messed up. Thanks for the tip @CasualPokePlayer.

@OpenSauce04
Copy link

OpenSauce04 commented Jul 12, 2024

Just tested this on my end, and it indeed seems like this PR makes the issue disappear completely.
Is there any particular holdup for this being merged?

@user18081972
Copy link

user18081972 commented Jul 14, 2024

Just tested this on my end, and it indeed seems like this PR makes the issue disappear completely. Is there any particular holdup for this being merged?

it doesnt fix it completely, but very close. see: #2047 (comment)
Theres a few locations that still show corruption on some specific internal resolutions

@FireNX70
Copy link
Author

Tile scaling alone will never fix the problem completely. At scales 13x and higher it would need tiles larger than 32x32. This isn't feasible because there's a limit of 1024 on the product of the local sizes in compute shaders. Implementations are free to raise this limit, but nVidia (which is the one implementation that has the problem with UB to begin with) doesn't, so we're limited to 32x32 tiles (the tile size is used as the x and y local sizes in at least one shader).

As for 4x and 8x, it's what I wrote in the OP. My branch scales on a factor of 4.5, it looks like more aggressive scaling is needed. Specifically, using a factor of 3.25 (scaling at 4x and 7x, instead of 5x and 9x) seems like it should be fine. 13x and higher will still be a problem, so I was unsure if it was even worth changing and I decided to wait for @RSDuck's input.

FYI, the worst-case scenario I used for testing was MKDS's Desert Hills fire balls (near the pyramid, during laps 2 and 3). You have to try and get them as close to the camera as possible.

@extherian
Copy link

I found this implementation flawless in Phantom Hourglass and Spirit Tracks, at any resolution from 1x to 16x. A far as I can tell, it completely fixes this issue. What's the holdup on getting it merged? Is this planned to be part of the November v1.0 release?

@user18081972
Copy link

I tested this PR on an Intel Arc A380 GPU 6GB. Multiple AMD GPUs, I tried on an Apple M3 but realized compute shader renderer is not supported there, and I tested it a lot on my main Nvidia 3000 series gpu.
I couldn't find any regressions.

I even tested it on RC 1.0, didn't seem to cause any problems.

@Tagger102
Copy link

Hopefully this will be officially merged into main! It fixes all issues I noticed testing on my RTX 3090, currently using 1.0RC with this fix merged into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants