Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: SOF: core/Intel: Follow the pause_supported flag from topology to disable or keep the pause support enabled #5041

Conversation

ujfalusi
Copy link
Collaborator

@ujfalusi ujfalusi commented Jun 5, 2024

Hi,

new token set is added by thesofproject/sof#9202 to indicate that the pause operation is supported or not on the given PCM device.

Pause is an optional feature that depends on pipeline, topology and modules running within that pipeline. Lets have a marker to help userspace and kernel know which pipelines should advertise pause.

Use this information in hda-pcm.c to disable the pause support or keep it enabled.

@ujfalusi ujfalusi changed the title ASoC: SOF: core/Intel: FOllow the hint from topology to disable or keep the pause support enabeld ASoC: SOF: core/Intel: Follow the hint from topology to disable or keep the pause support enabeld Jun 5, 2024
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Jun 5, 2024

Yes, with this PR the PAUSE is going to be mostly disabled for Intel platforms regardless of IPC version.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typoe in title "enabeld"

Patch looks good but I wonder if this is ok to change the behaviour for existing systems -- this will go all the way back to CML/CFL systems (and not just Intel).. It's possible somebody is happily using SOF on a system with apps that use pause, and this kernel PR could break their use-case when they upgrade the kernel.

Staged introduction (or disable_pause token in tplg) would be more incremental.

@plbossart
Copy link
Member

I am not aligned with the approach, sorry.

I agree that PAUSE is not a release blocker. but in practice it's the only way to trigger quick transitions and find bugs we wouldn't see otherwise.

It's also incorrect to say it's not been tested. What has not been tested is suspend while paused which is a second-order corner case.

The net effect of this topology token would be to COMPLETELY disable PAUSE, including in our CI tests. That's different to disabling it for releases, and making an educated decision to avoid spending cycles on problematic issues with no ROI.

The topology approach has zero flexibiliity, it's all or nothing.

I wouldn't mind if this topology-based approach was contingent on a 'default n' kernel Kconfig, which would only be used by developers and CI tests. I am not ok with completely disabling this PAUSE capability for tests.

@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Jun 6, 2024

Typoe in title "enabeld"

Patch looks good but I wonder if this is ok to change the behaviour for existing systems -- this will go all the way back to CML/CFL systems (and not just Intel).

Right now the suspend while paused is causing the system to hard freeze. I have no idea when this regression happened, but it shows the extent to testing and use of pause/resume. Sure, this is a corner case, but I have a faint recollection that it worked.
The only solution so far for the freeze is my #5040, for both IPC3 and IPC4.

It's possible somebody is happily using SOF on a system with apps that use pause, and this kernel PR could break their use-case when they upgrade the kernel.

Yes, everything is a possibility, true.

Staged introduction (or disable_pause token in tplg) would be more incremental.

Initially I wanted to have opt-out for pause/resume but #5035 shows that this feature has not been tested by CI and users to gave any confidence that it is working.

@ujfalusi ujfalusi force-pushed the peter/sof/pr/pause-resume-token-01 branch from 91a1557 to c6e6b47 Compare June 7, 2024 07:49
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Jun 7, 2024

Changes since v1:

  • add force_pause_support module parameter (to hda-pcm.c) to ignore the pasue_supported flag from topology and to keep the pause enabled. Iow, if the force_pause_support is set to true then there is no change in behavior

@plbossart
Copy link
Member

@fredoh9 @marc-hb @ssavati can we add this to the CI configuration before merging this:

options snd_sof_intel_hda_common force_pause_support=true

plbossart
plbossart previously approved these changes Jun 7, 2024
Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, let's make sure CI is ready before merging this, otherwise we're going to get a slew of errors across the board.

@ujfalusi ujfalusi force-pushed the peter/sof/pr/pause-resume-token-01 branch from c6e6b47 to bb8730d Compare June 7, 2024 12:52
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Jun 7, 2024

Changes since v2:

  • drop thehda_ prefix for the module parameter name (no idea how it keeps coming back - removed like four times while playing with it...)

plbossart
plbossart previously approved these changes Jun 7, 2024
ranj063
ranj063 previously approved these changes Jun 13, 2024
@marc-hb
Copy link
Collaborator

marc-hb commented Jun 13, 2024

but #5035 shows that this feature has not been tested by CI and users to gave any confidence that it is working.

#5035 is a pretty specific corner case. Simpler pause/resume cases have been regularly tested for a long time with very low failure rates.

@@ -240,6 +245,14 @@ int hda_dsp_pcm_open(struct snd_sof_dev *sdev,
if (hda_always_enable_dmi_l1 && direction == SNDRV_PCM_STREAM_CAPTURE)
runtime->hw.info &= ~SNDRV_PCM_INFO_PAUSE;

/*
* Follow the hint (unless a module parameter is set to ignore it) and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this called a "hint" when it's always followed? Because older kernel versions ignore it?

This could/should be clearer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The flag is documented in previous patch as 'hint' as the flag is defaulting to false for all platforms, it is only handled by Intel code and older topologies will have PAUSE disabled also.
This is intentional decision, but I rather call it as a hint in SOF context than a hard rule, I don't expect NXP/AMD/Mediatek to disable PAUSE support, but they can if they also set to follow this hint.

Copy link
Collaborator

@marc-hb marc-hb Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I should have spent more time reading the commit message. A commit message is not a substitute for documentation though, is there or will there be any? Somewhere in ALSA user space maybe?

A "hint" is indeed something that may or may not have an effect depending on other factors. However this particular "hint" will be 100% deterministic on every device! It will be 100% effective on some devices and 100% ignored on other devices. It is a "hint" only from the perspective of the developer (=you) looking at all devices at once. But it is not a hint from the perspective of any final user. I don't think this should be called a "hint", it's not necessary and quite misleading. It's just a parameter that is not supported in every configuration. Ignoring some (e.g. incompatible) parameter in some configurations is not so unusual and not called a "hint".

I don't expect NXP/AMD/Mediatek to disable PAUSE support, but they can if they also set to follow this hint.

What would be great is to print a warning in all configurations where the parameter is set and ignored, is that possible? Silently ignoring user input is extremely time consuming.

@ujfalusi ujfalusi dismissed stale reviews from ranj063 and plbossart via e5e16f5 July 12, 2024 16:24
@ujfalusi ujfalusi force-pushed the peter/sof/pr/pause-resume-token-01 branch from bb8730d to e5e16f5 Compare July 12, 2024 16:24
@ujfalusi
Copy link
Collaborator Author

Changes since v3:

  • Reword commit messages and comments to omit the 'hint' word.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of corrections needed @ujfalusi but no objections on the directions.

@@ -332,6 +332,14 @@ struct snd_sof_pcm_stream {
struct work_struct period_elapsed_work;
struct snd_soc_dapm_widget_list *list; /* list of connected DAPM widgets */
bool d0i3_compatible; /* DSP can be in D0I3 when this pcm is opened */
/*
* Flag for the core and platform code about PCM pause/resume support.
* Note: ENd-users are using audio via media servers (Pulseaudio, Pipewire,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray capital letter: End-users

* Flag for the core and platform code about PCM pause/resume support.
* Note: ENd-users are using audio via media servers (Pulseaudio, Pipewire,
* CRAS, etc) and media servers do not use pause/resume.
* Media players using the ALSA directly have means to handle PCM devices
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the ALSA API directly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or 'using ALSA directly' ?

include/uapi/sound/sof/tokens.h Show resolved Hide resolved
sound/soc/sof/intel/hda-pcm.c Show resolved Hide resolved
@ujfalusi ujfalusi force-pushed the peter/sof/pr/pause-resume-token-01 branch from e5e16f5 to 783785a Compare July 16, 2024 12:09
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Jul 19, 2024

@marc-hb, something is not quite right in the expect script in case-lib/apause.exp, not sure what to be honest.

It is pretty interesting what aplay/arecord does.
Pause is supported

  • if you run w/ -v or with one -v and pause is supported
    After the first SPACE it will print === PAUSED === with the cursor moved to column 71:
    image
    After releasing the pause the === PAUSED === remain, but the cursor appears to be at the start of the line:
    image
  • if you have -vv then the === PAUSED === replaces the volume bar
  • In every cases it is using the same line

Pause is not supported

  • if you run w/ -v or with one -v and pause is supported
    Each SPACE will result a print and a new line (PAUSE command ignored (no hw support))
    image
  • if you have -vv then it will still print lines and the volume meter is moving as the last line:
    image

But it certainly not printing === PAUSED ===, so the script is to blame for the false report.

@ujfalusi
Copy link
Collaborator Author

@marc-hb, I think I have found a plausible reason...
aplay prints out the === PAUSE === unconditionally before trying to pause. If pause is not supported then it will write PAUSE command ignored (no hw support)\n which will replace the previously printed === PAUSE ===
expect might be catching this.
https://github.com/alsa-project/alsa-utils/blob/master/aplay/aplay.c#L1645
https://github.com/alsa-project/alsa-utils/blob/master/aplay/aplay.c#L1610

So the expect script is OK.

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 20, 2024

In every cases it is using the same line

Yes, aplay uses \r instead of \n to constantly overwrite its previous output and not scroll. The script should be ready for that. If you think it's not please file an new sof-test bug and I'll try to fix ASAP.

Generally speaking, Expect (or anything reading from a pipe) has no concept of "overwriting" or "scrolling"; only terminals do. Other programs reading from a pipe only see a stream of characters, some of them happening to be \r and \n.

aplay prints out the === PAUSE === unconditionally before trying to pause. If pause is not supported then it will write PAUSE command ignored (no hw support) which will replace the previously printed === PAUSE ===
expect might be catching this.

Thanks for the investigation!

So the expect script is OK.

The script is unfortunately not OK because it trusts the first, unconditional (and TBH: wrong) === PAUSE === message and changes its state machine accordingly. And then it gets very confused.

Not sure how to fix that. Is there some sort of more convenient API to detect the lack of pause support? And then abort/skip the test immediately instead of wasting a lot of time running pointlessly and returning misleading results. I mean more convenient than this very misleading aplay output.

@ujfalusi
Copy link
Collaborator Author

The script is unfortunately not OK because it trusts the first, unconditional (and TBH: wrong) === PAUSE === message and changes its state machine accordingly. And then it gets very confused.

Not sure how to fix that. Is there some sort of more convenient API to detect the lack of pause support? And then abort/skip the test immediately instead of wasting a lot of time running pointlessly and returning misleading results. I mean more convenient than this very misleading aplay output.

I'm not sure about other ways to check runtime info flag of the PCM, but that would be ideal

This might avoid confusing the script (might need to handle the not supported case and abort?): alsa-project/alsa-utils#271

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 22, 2024

I'm not sure about other ways to check runtime info flag of the PCM, but that would be ideal

Anyone?

This might avoid confusing the script (might need to handle the not supported case and abort?): alsa-project/alsa-utils#271

Very useful but it won't help systems and people who don't update ALSA user space regularly.

The script is unfortunately not OK because it trusts the first, unconditional (and TBH: wrong) === PAUSE === message and changes its state machine accordingly. And then it gets very confused. Not sure how to fix that.

Wait: it shouldn't be too hard to catch PAUSE command ignored and fail or skip even after entering a "confused" state. It only needs to happen before the confusion leads to another, earlier and misleading failure...

@plbossart
Copy link
Member

@ujfalusi @marc-hb I am not able to follow what the thread is about. Is our CI able to deal with this or not?

The agreement was that the pause support would be forced on all CI devices, so what is the issue? We don't really care about the case where the pause is NOT supported, because it's not really our problem is it?

What I don't get is whether all CI devices are ready for this PR to be merged, adding a kernel parameter requires some changes in the setup.

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 24, 2024

We don't really care about the case where the pause is NOT supported, because it's not really our problem is it?

It's not our problem in theory. In practice, we still don't have consistent device configuration.

marc-hb added a commit to marc-hb/sof-test that referenced this pull request Aug 1, 2024
Fail fast and avoid timeouts.

This is especially important considering pause will soon be disabled by
default:
- thesofproject/linux#5041

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb
Copy link
Collaborator

marc-hb commented Aug 1, 2024

Is our CI able to deal with this or not?

It will be after small sof-test fix thesofproject/sof-test#1226 is merged (DONE). That fix will let us tell the difference between misconfigured devices versus well configured devices actually failing the test. @ujfalusi please review.

marc-hb added a commit to thesofproject/sof-test that referenced this pull request Aug 1, 2024
Fail fast and avoid timeouts.

This is especially important considering pause will soon be disabled by
default:
- thesofproject/linux#5041

Signed-off-by: Marc Herbert <[email protected]>
@marc-hb
Copy link
Collaborator

marc-hb commented Aug 6, 2024

SOFCI TEST

@marc-hb marc-hb changed the title ASoC: SOF: core/Intel: Follow the pause_supported flag from topology to disable or keep the pause support enabeld ASoC: SOF: core/Intel: Follow the pause_supported flag from topology to disable or keep the pause support enabled Aug 9, 2024
bardliao
bardliao previously approved these changes Oct 1, 2024
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ujfalusi can you rewrite the commit message. Thanks.

@ujfalusi ujfalusi dismissed stale reviews from bardliao, ranj063, and plbossart via 7c2318b October 7, 2024 12:26
@ujfalusi ujfalusi force-pushed the peter/sof/pr/pause-resume-token-01 branch from 783785a to 7c2318b Compare October 7, 2024 12:26
New tokens are added to topology:
1202: SOF_TKN_STREAM_PLAYBACK_PAUSE_SUPPORTED
1203: SOF_TKN_STREAM_CAPTURE_PAUSE_SUPPORTED

The new tokens are used to advertise support for PAUSE/RESUME operation on
a PCM device depending on firmware product, use case, pipeline topology.

The snd_sof_pcm_stream.pause_supported is updated to reflect the advertised
value for the PCM device.

If the token does not exist then the pause_supported is set to false.

Note: it is up to the platform code to use this flag to decide to advertise
the PAUSE support for user space or not.

Signed-off-by: Peter Ujfalusi <[email protected]>
…USE support

If the stream's pause_supported flag is false then mask out the PAUSE
support, so user space will be prevented to use it.

Introduce a module parameter to ignore the pause_supported flag, named as
force_pause_support to allow testing of the PAUSE feature.

Signed-off-by: Peter Ujfalusi <[email protected]>
@ujfalusi
Copy link
Collaborator Author

ujfalusi commented Oct 7, 2024

Changes since v3:

  • Commit message updated
  • The module parameter updated from bool to int with three value: -1: Use default, 0L Disable pause, 1: Enable pause

@ssavati
Copy link

ssavati commented Oct 9, 2024

@ujfalusi I have added below flag in all CI devices
options snd_sof_intel_hda_common force_pause_support=true

@ssavati, I have changed the parameter from bool to int, so it should be
options snd_sof_intel_hda_common force_pause_support=1

Sorry, it was a result of a popular vote:
force_pause_support=-1 (default) don't force pause support, use what the system woudl use
force_pause_support=0 - disable the pause support
force_pause_support=1 - Enable the pause support when it can be (forced DMI_L1 and Capture streams cannot have pause enabled, but this also needs module parameter to hit)

@ujfalusi
Copy link
Collaborator Author

SOFCI TEST

@lgirdwood
Copy link
Member

SOFCI TEST

@lgirdwood lgirdwood merged commit 0691969 into thesofproject:topic/sof-dev Oct 24, 2024
12 of 14 checks passed
@ujfalusi ujfalusi deleted the peter/sof/pr/pause-resume-token-01 branch December 13, 2024 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants