-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: sof-pcm/pm: Stop paused streams before the system suspend #5058
base: topic/sof-dev
Are you sure you want to change the base?
ASoC: SOF: sof-pcm/pm: Stop paused streams before the system suspend #5058
Conversation
Paused streams will not receive a suspend trigger, they will be marked by ALSA core as suspended and it's state is saved. Since the pause stream is not in a fully stopped state, for example DMA might be still enabled (just the trigger source is removed/disabled) we need to make sure that the hardware is ready to handle the suspend. This involves a bit more than just stopping a DMA since we also need to communicate with the firmware in a delicate sequence to follow IP programming flows. To make things a bit more challenging, these flows are different between IPC versions due to the fact that they use different messages to implement the same functionality. To avoid adding yet another path, callbacks and sequencing for handling the corner case of suspending while a stream is paused, and do this for each IPC versions and platforms, we can move the stream back to running just to put it to stopped state. Explanation of the change: Streams moved to SUSPENDED state from PAUSED without trigger. If a stream does not support RESUME then on system resume the RESUME trigger is not sent, the stream's state and suspended_state remains untouched. When the user space releases the pause then the core will reject this because the state of the stream is _not_ PAUSED, it is still SUSPENDED. From this point user space will do the normal (hw_params) prepare and START, PAUSE_RELEASE trigger will not be sent by the core after the system has resumed. Link: thesofproject#5035 Signed-off-by: Peter Ujfalusi <[email protected]>
Right, so this is a resurrection of #5040 with extended comments and explanation. Replaces: #5054 |
To note: this patch can handles so far all cases that I have thrown at it as long as we don't support RESUME, With RESUME advertised the system locks up and parts of #5054 is needed to fix the lock, but we really cannot handle this corner case in any other sane way. |
continue; | ||
|
||
ret = substream->ops->trigger(substream, cmd); | ||
if (ret) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the core still thinks that the stream is in the paused state when in fact we went ahead and stopped it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The core thinks that the stream is suspended, the state it was before it got suspended was paused as far as the core knows, but if the SNDRV_PCM_INFO_RESUME is not set then on system resume the stream will stay in this state and on an attempt to un-pause it, it will fail as the stream is not in paused state, it is still in suspended.
Might be a bug in core, but this is how it has been, so it is a feature now.
We don't support RESUME, so after resume the stream must be restarted, this applies to running or paused streams, so to cut down on things, we just stop the paused ones to avoid dealing with different flows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi will this handle both the BE and FE pipelines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi the other concern I have with this is that when you stop during suspend and then after system resumne, the user releases the pause, the started_count will not be incremented right? what happens during stop after pause/release then? Wont the started_count go below zero?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi will this handle both the BE and FE pipelines?
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ujfalusi the other concern I have with this is that when you stop during suspend and then after system resumne, the user releases the pause, the started_count will not be incremented right? what happens during stop after pause/release then? Wont the started_count go below zero?
The user releasing the pause will not going to be propagated down here, it is going to error out in the core because the stream is in SUSPENDED state.
Since the release of pause failed, user space will start the stream again.
This patch handles this correctly as well, counters are all well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?
I don't think we are able to do that easily. To move to PAUSED we need to start first right after the resume and since we don't support RESUME we would need someone to call prepare() then trigger:START.
This is what happens if we suspend with active audio as well. prepare() followed by trigger() and this is what happens if we were in PAUSED state: core will reject a PAUSE_RELEASE after the system has resumed, so user space will do a complete restart (or close the PCM and gives up).
return ret; | ||
|
||
return sof_pcm_trigger_suspended_paused_streams(sdev, | ||
SNDRV_PCM_TRIGGER_STOP); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it actually important to first release and then to stop all streams or would it also work if you send release + stop to each of them in turn? That would be easier technically (one loop instead of 2) and the window to get some audio in or out for individual streams would get smaller.
Also, doesn't the commit description have to be updated - it's still saying that we just release paused streams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it actually important to first release and then to stop all streams or would it also work if you send release + stop to each of them in turn? That would be easier technically (one loop instead of 2) and the window to get some audio in or out for individual streams would get smaller.
One loop does not work for some reason, if I do that then some refcounting will got crazy and we will end up with all sorts of errors.
I can re-test it again, but it was failing either with IPC3 or IPC4.
Also, doesn't the commit description have to be updated - it's still saying that we just release paused streams
Not sure what you mean, I think it is saying that we stop them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it actually important to first release and then to stop all streams or would it also work if you send release + stop to each of them in turn? That would be easier technically (one loop instead of 2) and the window to get some audio in or out for individual streams would get smaller.
One loop does not work for some reason, if I do that then some refcounting will got crazy and we will end up with all sorts of errors. I can re-test it again, but it was failing either with IPC3 or IPC4.
@lyakh, the old #5040 implemented a single loop (it was with SUSPEND instead of STOP, but used STOP later) and there were a reason I added this comment: https://github.com/thesofproject/linux/pull/5040/files/358f4ee05c54dfa8fb74b6235ca6d3e08dc947ae#r1634986648
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, doesn't the commit description have to be updated - it's still saying that we just release paused streams
Not sure what you mean, I think it is saying that we stop them.
Right, sorry, I misunderstood
I think the important note is that this is fixing an extremely rare corner case. It is as rare as I think it is broken for sever years and we did not received a single report of it, but if it happens your machine will lock up hard. |
Coming back after a short break, I am not sure why this is needed? we have code in hda_dsp_dais_suspend() that supposedly takes care of the suspend-while-paused configuration. I am 100% sure we tested this with @ranj063 So the questions are: a) what is missing in the existing code? |
I'm also 100% sure that this was tested, but a long-long time ago and it got broken at some point (it was not teted in CI or by developers).
A completely new state machine addition to handle this corner case for IPC4 and for IPC3.
This PR cuts the Gordian-knot by placing the firmware, hardware and kernel states to what they should be using existing and well tested flows. |
Still not following if we need to keep the existing code in hda_dsp_dais_suspend() ?? I am now starting to wonder if the issue only happens with IPC4. |
In theory the hda_dsp_dais_suspend() will be a NOP as we stop all paused streams properly and not leaving anything to be handled later, but I would wait a bit for the removal.
The issue happens with both IPC3 and IPC4. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still many opens...
* supporting RESUME. | ||
* | ||
* First we need to move (trigger) the paused streams to RUNNING state, | ||
* then we need to stop them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary? it's perfectly legal to go from PAUSED to STOPPED. it's not required to go back to RUNNING
This is the case for soc-pcm.c:
case SNDRV_PCM_TRIGGER_STOP:
if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) &&
(be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED))
goto next;
and likewise IPC4 transitions are fine with a PAUSE->STOP.
IMHO the transition from PAUSED to RUNNING creates a racy behavior where the state can change and some samples might be generated before stopping again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary? it's perfectly legal to go from PAUSED to STOPPED. it's not required to go back to RUNNING
This is the case for soc-pcm.c:
case SNDRV_PCM_TRIGGER_STOP: if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) && (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) goto next;
@plbossart, it might be so in ASoC DPCM, but if a stream is PAUSED then the core would not send a STOP trigger:
static int snd_pcm_do_stop(struct snd_pcm_substream *substream,
snd_pcm_state_t state)
{
if (substream->runtime->trigger_master == substream &&
snd_pcm_running(substream)) {
substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
substream->runtime->stop_operating = true;
}
return 0; /* unconditionally stop all substreams */
}
and likewise IPC4 transitions are fine with a PAUSE->STOP.
Furthermore, we in sof do not handle well the PAUSE->STOP if we handle it all.
Yes, I have tested this as the first thing.
IMHO the transition from PAUSED to RUNNING creates a racy behavior where the state can change and some samples might be generated before stopping again.
Yes, that might happen. We are talking about a rare corner case. The options is to end up with a broken stack or a possible glitch.
continue; | ||
|
||
ret = substream->ops->trigger(substream, cmd); | ||
if (ret) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?
@ujfalusi can I suggest a different approach? ALSA suspends stream only if the dpcm state is START. How about we add PAUSED to this also? I mean invoke the dai suspend trigger but leave the state as PAUSED. That way we wont have to do any special handling in the SOF driver at all? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart, the state transitions are just too complicated and they differ between IPC3 and IPC4. If we want to handle the suspend while paused we would need unknown (but not small!) amount of tweaking, extending, rewriting of the state machines and ops in core and in vendor code.
I'm not saying that it is not possible, but every time we do that, the state handling goes more and more complex to comprehend what is going on.
Cutting the knot with a good amount of documentation and explanation is far better solution, imho.
* supporting RESUME. | ||
* | ||
* First we need to move (trigger) the paused streams to RUNNING state, | ||
* then we need to stop them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this necessary? it's perfectly legal to go from PAUSED to STOPPED. it's not required to go back to RUNNING
This is the case for soc-pcm.c:
case SNDRV_PCM_TRIGGER_STOP: if ((be->dpcm[stream].state != SND_SOC_DPCM_STATE_START) && (be->dpcm[stream].state != SND_SOC_DPCM_STATE_PAUSED)) goto next;
@plbossart, it might be so in ASoC DPCM, but if a stream is PAUSED then the core would not send a STOP trigger:
static int snd_pcm_do_stop(struct snd_pcm_substream *substream,
snd_pcm_state_t state)
{
if (substream->runtime->trigger_master == substream &&
snd_pcm_running(substream)) {
substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_STOP);
substream->runtime->stop_operating = true;
}
return 0; /* unconditionally stop all substreams */
}
and likewise IPC4 transitions are fine with a PAUSE->STOP.
Furthermore, we in sof do not handle well the PAUSE->STOP if we handle it all.
Yes, I have tested this as the first thing.
IMHO the transition from PAUSED to RUNNING creates a racy behavior where the state can change and some samples might be generated before stopping again.
Yes, that might happen. We are talking about a rare corner case. The options is to end up with a broken stack or a possible glitch.
continue; | ||
|
||
ret = substream->ops->trigger(substream, cmd); | ||
if (ret) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not move the state back to PAUSED on resume? It seems like a better idea that building on a design that relies on an error concealment by userspace?
I don't think we are able to do that easily. To move to PAUSED we need to start first right after the resume and since we don't support RESUME we would need someone to call prepare() then trigger:START.
This is what happens if we suspend with active audio as well. prepare() followed by trigger() and this is what happens if we were in PAUSED state: core will reject a PAUSE_RELEASE after the system has resumed, so user space will do a complete restart (or close the PCM and gives up).
@ranj063, I did thought about that, but that will have effect on all drivers. I cannot predict how they will handle this fundamental change. |
@ujfalusi why not make it opt-in? One suggestion I have is to make this a property of struct snd_soc_pcm_runtime that can be set by the driver opting-in in the be_hw_params_fixup callback |
it is ALSA core which stops the triggers. |
@ujfalusi this is what I had in mind
|
@ranj063, that is all fine, but the SUSPEND will not reach that point since in sound/core/pcm_native.c we have this: static int snd_pcm_do_suspend(struct snd_pcm_substream *substream,
snd_pcm_state_t state)
{
struct snd_pcm_runtime *runtime = substream->runtime;
if (runtime->trigger_master != substream)
return 0;
if (! snd_pcm_running(substream))
return 0;
substream->ops->trigger(substream, SNDRV_PCM_TRIGGER_SUSPEND);
runtime->stop_operating = true;
return 0; /* suspend unconditionally */
} The trigger is 'eaten' up in core level. |
Can we extend this to snd_pcm_runtime also to do the same or do you think its too invasive? |
Since this trigger sequence has never been tested (as it was not even possible to get SUSPEND while PAUSED) it is highly likely that it is broken. I have reservations on how simple is to handle this also. |
But you're trying to do the same within our driver anyway no by calling the suspend trigger for paused streams? Wouldnt it make sense to simplify it? |
@ujfalusi how about consulting with Mark or Takashi about this instead of assuming that we have to deal with this in the SOF driver? |
Not really, I'm moving the paused stream to RUNNING and then to STOPPED. If we move it to SUSPENDED then there were other issues popping up. After system resume a SUSPENDED stream is handled differently than a PAUSED stream, given the SNDRV_PCM_INFO_RESUME is not set. For example: if the stream was PAUSED, it will not receive the RESUME trigger... |
@ranj063, yes, it is just I never had time to think about he proper not too long intro.. |
ok, pls confirm this only impacts paused streams at suspend and normal running streams are stopped properly ? If so - we dont have any impact atm as none of our pipeline advertise pause today. We can revisit this though if ever needed. |
Paused streams will not receive a suspend trigger, they will be marked by ALSA core as suspended and it's state is saved.
Since the pause stream is not in a fully stopped state, for example DMA might be still enabled (just the trigger source is removed/disabled) we need to make sure that the hardware is ready to handle the suspend.
This involves a bit more than just stopping a DMA since we also need to communicate with the firmware in a delicate sequence to follow IP programming flows.
To make things a bit more challenging, these flows are different between IPC versions due to the fact that they use different messages to implement the same functionality.
To avoid adding yet another path, callbacks and sequencing for handling the corner case of suspending while a stream is paused, and do this for each IPC versions and platforms, we can move the stream back to running just to put it to stopped state.
Explanation of the change:
Streams moved to SUSPENDED state from PAUSED without trigger. If a stream does not support RESUME then on system resume the RESUME trigger is not sent, the stream's state and suspended_state remains untouched. When the user space releases the pause then the core will reject this because the state of the stream is not PAUSED, it is still SUSPENDED.
From this point user space will do the normal (hw_params) prepare and START, PAUSE_RELEASE trigger will not be sent by the core after the system has resumed.
Link: #5035