-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
soundwire: intel_bus_common: enable interrupts before exiting reset #5106
soundwire: intel_bus_common: enable interrupts before exiting reset #5106
Conversation
The existing code enables the Cadence IP interrupts after the bus reset sequence. The problem with this sequence is that it might be pre-empted, giving SoundWire devices time to sync and report as ATTACHED before the interrupts are enabled. In that case, the Cadence IP will not detect a state change and will not throw an interrupt to proceed with the enumeration of a Device0. In our overnight stress tests, we observed that a slight sub-millisecond delay in enabling interrupts after the reset was correlated with detection failures. This problem is more prevalent on the LunarLake silicon, likely due to SOC integration changes, but it was observed on earlier generations as well. This patch reverts the sequence, with the interrupts enabled before performing the bus reset. This removes the race condition and makes sure the Cadence IP is able to detect the presence of a Device0 in all cases. Signed-off-by: Pierre-Louis Bossart <[email protected]>
…Attached This is for CI only, this will help detect cases where Device0 was not enumerated with the interrupt in the delayed work. Signed-off-by: Pierre-Louis Bossart <[email protected]>
multiple-pause-resume failed on all LNL models in https://sof-ci.01.org/linuxpr/PR5106/build4026/devicetest/index.html At least one of the failures looks like #5109 which I just filed. Everything else is either passing or missing. |
SOFCI TEST |
@marc-hb we can ignore the pause/release issues, this isn't related to this PR at all. What is changed here is the SoundWire bus startup, it will only be influenced by pm_runtime transitions. |
@bardliao can you take this PR and let it run overnight on your device to see if this confirms my results that we don't see the enumeration issue? |
I can only run the test on a CI device as I work remotely this week. Currently all LNLM_SDW_AIOC devices are reserved. I will check if I can reserve one later. |
don't worry about it @bardliao I'll run a test myself then. We can re-run more tests next week. |
I ran a test last night, it passed 853 iterations before my network disconnected. I am running another test now. |
I didn't get "Peripheral 0" error, but got
|
NACK means the command isn't accepted by the Peripheral, that's usually a sign of low-level electrical issues. But it's also possible that by enabling interrupts earlier, we might get some weird signals on the bus and then the interrupt and command channels might be disrupted. So we should retry and see if this happens again... I was hoping to merge this quickly but that's a red flag. |
I retested and it passed more than 1500 iterations. |
ok let's merge then and see what happens. We can always revert. |
Humm let's redo the tests first, we seem to have a regression on CML? https://sof-ci.01.org/linuxpr/PR5106/build4064/devicetest/index.html?model=CML_RVP_SDW-ipc3&testcase=multiple-pipeline-all |
SOFCI TEST |
the CML problem didn't re-occur, let's merge then and see what the daily tests show |
ed7746e
into
thesofproject:topic/sof-dev
Only if you count a bug filed in 2021 as a "regression" :-D |
BTW, I am still running the test on a LNLM_SDW_AIOC device. It passed more than 1950 iterations so far. I will stop it before off work today. |
It passed more than 2500 iterations. I will stop here and release the device. |
The existing code enables the Cadence IP interrupts after the bus reset sequence. The problem with this sequence is that it might be pre-empted, giving SoundWire devices time to sync and report as ATTACHED before the interrupts are enabled. In that case, the Cadence IP will not detect a state change and will not throw an interrupt to proceed with the enumeration of a Device0.
In our overnight stress tests, we observed that a slight sub-millisecond delay in enabling interrupts after the reset was correlated with detection failures. This problem is more prevalent on the LunarLake silicon, likely due to SOC integration changes, but it was observed on earlier generations as well.
This patch reverts the sequence, with the interrupts enabled before performing the bus reset. This removes the race condition and makes sure the Cadence IP is able to detect the presence of a Device0 in all cases.