Unnecessary barrier + possibly incorrect barriers? #1097
This is the intra-compute barrier function I wrote for my port:
void computeToComputeBarrier(const vk::CommandBuffer& cmdBuf, bool reverse) {
std::array<vk::BufferMemoryBarrier2, 2> barriers;
barriers[0] = vk::BufferMemoryBarrier2{ // write-to-read on the buffer the previous pass wrote
// src stage and access
vk::PipelineStageFlagBits2::eComputeShader, vk::AccessFlagBits2::eShaderWrite,
// dst stage and access
vk::PipelineStageFlagBits2::eComputeShader, vk::AccessFlagBits2::eShaderRead,
// src and dst queue family index (no ownership transfer)
VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
// buffer, offset and size
reverse ? storageBuffers.input.buffer : storageBuffers.output.buffer, 0, VK_WHOLE_SIZE
};
barriers[1] = vk::BufferMemoryBarrier2{ // read-to-write on the buffer the previous pass read
// src stage and access
vk::PipelineStageFlagBits2::eComputeShader, vk::AccessFlagBits2::eShaderRead,
// dst stage and access
vk::PipelineStageFlagBits2::eComputeShader, vk::AccessFlagBits2::eShaderWrite,
// src and dst queue family index (no ownership transfer)
VK_QUEUE_FAMILY_IGNORED, VK_QUEUE_FAMILY_IGNORED,
// buffer, offset and size
reverse ? storageBuffers.output.buffer : storageBuffers.input.buffer, 0, VK_WHOLE_SIZE
};
cmdBuf.pipelineBarrier2(vk::DependencyInfo{ {}, nullptr, barriers, nullptr });
}
I'm able to run this without any validation errors, though I'm not getting validation errors on your version either, so I guess the layers can't detect something subtle like this.
Sorry, I didn't realize you already had a global "rework sync" issue (#871) that would cover this.
I may have been the last one to touch this sync code, so I should probably try to answer your comments above:
I think you may be right about this for the compute-to-graphics transition.
The only compute queue memory barriers that matter are the write-to-read barriers. These are needed to ensure that the previous stage's write operations are complete and visible before reads happen in the next stage of the pipeline (when the buffers flip from write mode to read mode). That's why you only see write-to-read barriers in the code. Read-to-write (write-after-read) hazards do not require memory barriers; an execution dependency is sufficient. See https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples-(Legacy-synchronization-APIs) for more info. I will have another look at the barriers, including the cross-queue ones, to see if any need correction or simplification (i.e. make sure the masks are correct and get rid of unneeded barriers for buffers not involved in transitions).
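The hazard rule above can be written down as a small lookup. This is a minimal C++ sketch with no Vulkan calls; `Access`, `Dependency`, and `requiredDependency` are hypothetical names, not part of the Vulkan API or the sample. In Vulkan terms, `Memory` means a barrier with `srcAccessMask`/`dstAccessMask` set (e.g. `eShaderWrite` to `eShaderRead`), while `Execution` means only the stage masks are needed.

```cpp
#include <cassert>

// Hypothetical model of how a pass touches one buffer.
enum class Access { None, Read, Write };

// Kind of dependency a pipeline barrier must provide between an earlier
// access and a later access to the same buffer.
enum class Dependency {
    None,       // no hazard (e.g. read-after-read)
    Execution,  // stage masks only; access masks can stay empty
    Memory      // stage masks plus src/dst access masks
};

Dependency requiredDependency(Access earlier, Access later) {
    if (earlier == Access::None || later == Access::None) {
        return Dependency::None;
    }
    if (earlier == Access::Write) {
        // Read-after-write and write-after-write: the earlier writes must be
        // made available and visible, which needs a memory dependency.
        return Dependency::Memory;
    }
    if (later == Access::Write) {
        // Write-after-read: the write only has to wait until the read has
        // executed; no cache flush or invalidate is involved.
        return Dependency::Execution;
    }
    return Dependency::None;  // read-after-read
}
```

For the ping-pong loop, the buffer written by one pass and read by the next hits the `Memory` case, while the buffer read by one pass and overwritten by the next only hits the `Execution` case.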
After studying the existing code, I found that the compute-to-compute buffer memory barriers are functional as is, but could use some optimization based on your analysis above. The write-to-read barrier is correct, but only needs to be applied to one of the buffers during each call, i.e. the one being written to. There is no need to create a memory barrier for the other buffer which is being read from, since the same
The split queue transfer operations use access masks and pipeline stages for the graphics and compute queues. For the graphics -> compute queue transfer, I think it's better to use VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT instead of VK_PIPELINE_STAGE_VERTEX_INPUT_BIT, and to use VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT for the access mask.
Regarding the "extra" queue ownership transitions for
Thanks @jherico for pointing out these issues. I will submit a PR that addresses them and also adds double buffering for the compute queue, so that it can run in parallel with the graphics queue. I always wondered why there were 2 compute command buffers defined but only 1 was used; I suspect the double buffering code was never completed. I have completed it with a small number of changes, primarily to the semaphores. The results are quite interesting: the fps for this example goes up by ~50% on Windows and by >100% on Linux. There is no change on macOS, since there is no dedicated compute queue on that platform; it's shared with the graphics queue.
Here are some RenderDoc screen caps so you can see what is happening with the compute buffers in the compute-to-compute transitions (using my pending PR). The purple triangles are the memory barriers, and they follow the yellow write operations. The read operations are also marked in yellow, but they are not followed by memory barriers. You can see how the memory barriers are interleaved for
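The single-barrier scheme described above can be sketched as a plan of which buffer gets the write-to-read barrier between consecutive passes. This is a model without Vulkan calls; it assumes (hypothetically) that pass 0 reads the input buffer and writes the output buffer, and that the buffer written by the final pass is handled by the compute-to-graphics transition rather than an intra-compute barrier. `Buffer`, `writtenInPass`, and `planWriteToReadBarriers` are illustrative names only.

```cpp
#include <cassert>
#include <vector>

enum class Buffer { Input, Output };

// Assumption: pass 0 reads Input and writes Output, and the descriptor
// sets swap the roles on every following pass.
Buffer writtenInPass(int pass) {
    return (pass % 2 == 0) ? Buffer::Output : Buffer::Input;
}

// One write-to-read barrier between each pair of consecutive passes, on
// the buffer the earlier pass wrote. No barrier is planned after the final
// pass (assumed to be covered by the compute-to-graphics transition).
std::vector<Buffer> planWriteToReadBarriers(int passes) {
    assert(passes > 0);
    std::vector<Buffer> barriers;
    for (int pass = 0; pass + 1 < passes; ++pass) {
        barriers.push_back(writtenInPass(pass));
    }
    return barriers;
}
```

Each entry would become one `BufferMemoryBarrier2` (compute shader write to compute shader read) in the `pipelineBarrier2` call between passes. The stage masks of that call still order all earlier compute work before all later compute work, whichever buffer the barrier names, so the write-after-read hazard on the other buffer is covered by the same call.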
My experience with this is that when I corrected the barriers, the behavior changed drastically; specifically, the amount of computed movement per frame increased. I suspect that the current barrier setup may be sufficiently correct (or just too complicated) to satisfy the validation layers, but is causing the computations in the 64-pass compute loop to be overwritten frequently. I'll try to grab a local RenderDoc capture and see if I can verify the issue.
There are now two pull requests for this. Which one should I look into?
You're moving both storage buffers between the compute and graphics queues in both directions, but only storageBuffers.output.buffer is actually used on the graphics queue (see Vulkan/examples/computecloth/computecloth.cpp, line 188 in 44ff7a1).
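Restricting the queue ownership transfer to the buffers the graphics queue actually consumes can be modeled as below. This sketch assumes exclusive sharing mode; `BufferUse`, `OwnershipBarrier`, and `planComputeToGraphicsTransfer` are hypothetical names. In a real renderer each record would become a `BufferMemoryBarrier2` with differing queue family indices, recorded as a release on the compute queue and a matching acquire on the graphics queue.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a storage buffer and whether the graphics
// queue consumes it (e.g. as a vertex buffer).
struct BufferUse {
    std::string name;
    bool usedOnGraphics;
};

// One half of a queue family ownership transfer. The release and the
// acquire must carry the same src/dst queue family index pair.
struct OwnershipBarrier {
    std::string buffer;
    bool isRelease;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
};

std::vector<OwnershipBarrier> planComputeToGraphicsTransfer(
    const std::vector<BufferUse>& buffers,
    uint32_t computeFamily, uint32_t graphicsFamily) {
    std::vector<OwnershipBarrier> barriers;
    if (computeFamily == graphicsFamily) {
        return barriers;  // same family: no ownership transfer is needed
    }
    for (const BufferUse& b : buffers) {
        if (!b.usedOnGraphics) {
            continue;  // compute-only buffer: leave it on the compute queue
        }
        barriers.push_back({b.name, true, computeFamily, graphicsFamily});   // release
        barriers.push_back({b.name, false, computeFamily, graphicsFamily});  // acquire
    }
    return barriers;
}
```

With the input buffer marked as compute-only, only the output buffer produces a release/acquire pair. If compute and graphics share one queue family, no transfer is planned at all.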
Looking more closely at this, I think the logic in addComputeToComputeBarriers is wrong. The loop in buildComputeCommandBuffer executes 64 times, using the descriptor set to swap which buffer is read and which one is written, but addComputeToComputeBarriers only ever does one thing: it ensures that both the input and output buffers go from a write state to a read state. One barrier should go from read to write and the other from write to read, and on the next iteration they should swap back, over and over until the loop is exited, at which point the cross-queue barriers come into play.
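The swapping scheme proposed above can be modeled as a per-iteration pair of barriers, again without Vulkan calls. Under the hypothetical assumption that pass 0 reads the input buffer and writes the output buffer, the `reverse` flag of the `computeToComputeBarrier` function earlier in this thread corresponds to odd-numbered passes here. `PassBarriers` and `planSwappingBarriers` are illustrative names only.

```cpp
#include <cassert>
#include <vector>

enum class Buffer { Input, Output };

// The two barriers recorded between pass i and pass i + 1.
struct PassBarriers {
    Buffer writeToRead;  // written by pass i, read by pass i + 1
    Buffer readToWrite;  // read by pass i, written by pass i + 1
};

// Assumption: pass 0 reads Input and writes Output; roles swap every pass.
std::vector<PassBarriers> planSwappingBarriers(int passes) {
    assert(passes > 0);
    std::vector<PassBarriers> plan;
    for (int pass = 0; pass + 1 < passes; ++pass) {
        bool even = (pass % 2 == 0);
        Buffer written = even ? Buffer::Output : Buffer::Input;
        Buffer read = even ? Buffer::Input : Buffer::Output;
        plan.push_back({written, read});
    }
    return plan;
}
```

Each iteration's `writeToRead` buffer becomes the next iteration's `readToWrite` buffer. As noted earlier in the thread, the read-to-write barrier only needs an execution dependency, so its access masks can be left empty.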