TTL and DCSM reloading #27

Open
bbert opened this issue Sep 29, 2023 · 12 comments
Comments

@bbert

bbert commented Sep 29, 2023

I have some considerations on content steering specification about TTL.

  1. First, the specification is unclear about whether the player SHOULD or SHALL reload the steering manifest after the specified TTL interval.
  • In the semantics (table 6.3.1) the spec says:
    “Specifies how many seconds the client shall wait before reloading the DCSM.”
    => SHALL

  • In section 7, bullet 7:
    “the client should parse it and retrieve the VERSION, TTL….”
    => SHOULD (it should be SHALL)

  • In section 7, bullet 8:
    “The client sets a timer to re-request the STEERING-SERVER-URL after TTL seconds.”
    => SHALL or SHOULD is missing

  • In section 7, bullet 13:
    “It should still reload the RELOAD-URI after the specified TTL interval in case new service locations are added.”
    => SHOULD

  2. Before clarifying whether it must be SHALL or SHOULD, I’d like to consider the use case in which the DCSM could be refreshed before the TTL interval has elapsed.
    The recommended value for TTL is 300 seconds, and in some cases it would be valuable to force the players to refresh the manifest without waiting for the TTL interval.
    As an example, a new CDN server could be allocated that we would like to prioritize as soon as possible. Instead of globally reducing the TTL to a very low value and overloading the steering server, one could decide to force the players to update the DCSM when necessary.
    This would obviously require an external control means to steer the players, and I don’t know how, or whether, this should be part of the content steering specification.
    As a control mechanism, one potential solution is for example to standardize a CMSD response header key to force the players to reload the DCSM.
    Whatever the solution for controlling DCSM reloading, if this makes sense we should consider adding some text to the specification to cover the use case where the client can reload the DCSM before the TTL delay has elapsed.
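
To make the two behaviours under discussion concrete, here is a minimal Python sketch of a reload schedule that honours the TTL but can also accept an early-reload trigger. The class name, the `force_reload` hook and the timestamp handling are illustrative assumptions and are not taken from the specification; only the TTL field and the recommended value of 300 seconds come from the text above.

```python
from dataclasses import dataclass

DEFAULT_TTL_SECONDS = 300  # recommended TTL value mentioned above


@dataclass
class SteeringReloadSchedule:
    """Hypothetical helper tracking when the DCSM is next due for reload."""

    loaded_at: float                      # time (seconds) the current DCSM was received
    ttl: float = DEFAULT_TTL_SECONDS      # TTL taken from the last DCSM
    forced: bool = False                  # set by an external trigger (assumption)

    def next_reload_at(self) -> float:
        # Regular case: wait TTL seconds after the last successful load.
        return self.loaded_at + self.ttl

    def force_reload(self) -> None:
        # Early-reload use case: some external signal asks for a refresh now.
        self.forced = True

    def is_due(self, now: float) -> bool:
        return self.forced or now >= self.next_reload_at()

    def on_loaded(self, now: float, ttl: float) -> None:
        # A fresh DCSM resets the timer and clears any pending forced reload.
        self.loaded_at = now
        self.ttl = ttl
        self.forced = False
```

Whether `is_due` returning true obliges the client to reload (SHALL) or merely recommends it (SHOULD) is exactly the open question; the sketch only shows where an early trigger would plug in.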
@haudiobe

haudiobe commented Oct 6, 2023

2023/10/06 TF meeting

  • everyone please review the issue.
  • how much do we want to leave this to the player implementation versus mandating behaviour? Please discuss.

@haudiobe added the good first issue and encourage-discussion labels Oct 6, 2023
@gwendalsimon

  2. Before clarifying whether it must be SHALL or SHOULD, I’d like to consider the use case in which the DCSM could be refreshed before the TTL interval has elapsed.
    The recommended value for TTL is 300 seconds, and in some cases it would be valuable to force the players to refresh the manifest without waiting for the TTL interval.
    As an example, a new CDN server could be allocated that we would like to prioritize as soon as possible. Instead of globally reducing the TTL to a very low value and overloading the steering server, one could decide to force the players to update the DCSM when necessary.

I am not really convinced we need such a method to break the TTL (besides its complexity, see below). With a 300-second TTL, all players are evenly distributed within this 5-minute window, so their requests reach the steering server one by one rather than all at once. In this particular use case, that guarantees a graceful redirection of players to the new CDN, whereas forcing an update could generate a request storm on the new CDN.
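
The load argument can be put into numbers with a short Python calculation. The audience size and the 5-second reaction window for a forced reload are made-up figures chosen only for illustration; the 300-second TTL is the value discussed above.

```python
def request_rate(players: int, window_seconds: float) -> float:
    """Average requests per second if all players reload once within the window."""
    return players / window_seconds


players = 600_000  # hypothetical audience size

# Reloads spread evenly over the TTL window.
print(request_rate(players, 300))  # 2000.0

# Every player reacts to a forced reload within about 5 seconds (assumption).
print(request_rate(players, 5))    # 120000.0
```

Under these assumptions the forced reload produces a rate 60 times higher than the steady state, and the same burst would then follow the players to the newly prioritized CDN.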

This would obviously require an external control means to steer the players, and I don’t know how, or whether, this should be part of the content steering specification.

I would expect the steering server to be a stateless service, which does not store any information about the players. Furthermore, the steering server is not expected to know which players are still watching the session.

As a control mechanism, one potential solution is for example to standardize a CMSD response header key to force the players to reload the DCSM.

A CMSD message is issued by the CDN edge server. It cannot be the trigger to reload the DCSM since the decision to force the reload would come from the steering server... unless the steering server could ask the CDNs to send a CMSD message on its behalf.

@burak-kara

As an example, a new CDN server could be allocated that we would like to prioritize as soon as possible. Instead of globally reducing the TTL to a very low value and overloading the steering server, one could decide to force the players to update the DCSM when necessary.

I remember this video from Apple WWDC22. They explain Pathway Cloning (starting at 8:16 with the background story), which is used to introduce a new CDN to the system. The idea still relies on the DCSM update at each TTL. They add a PATHWAY-CLONES field to the DCSM.

I am trying to picture the edge cases in which we would want the new CDN to join the system before the TTL expires (perhaps ideally without any delay). But for such cases, the player has the second (and subsequent) pathways on the PATHWAY-PRIORITY list as a backup.
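
The pathway cloning idea mentioned above can be sketched in Python as follows. VERSION, TTL, RELOAD-URI, PATHWAY-PRIORITY and PATHWAY-CLONES are named in this thread; the clone sub-fields (`BASE-ID`, `ID`, `URI-REPLACEMENT`, `HOST`) are written here as I recall them from the HLS steering manifest and should be checked against the specification. All host names and pathway ids are invented.

```python
import json
from urllib.parse import urlsplit, urlunsplit

# Example steering manifest; every value is made up for illustration.
dcsm_text = """
{
  "VERSION": 1,
  "TTL": 300,
  "RELOAD-URI": "https://steering.example.com/dcsm",
  "PATHWAY-PRIORITY": ["cdn-c", "cdn-a", "cdn-b"],
  "PATHWAY-CLONES": [
    {"BASE-ID": "cdn-a", "ID": "cdn-c",
     "URI-REPLACEMENT": {"HOST": "c.example.net"}}
  ]
}
"""


def apply_clones(base_urls: dict, dcsm: dict) -> dict:
    """Return pathway id -> BaseURL, including pathways created by cloning."""
    result = dict(base_urls)
    for clone in dcsm.get("PATHWAY-CLONES", []):
        base = result.get(clone["BASE-ID"])
        if base is None:
            continue  # unknown base pathway: ignore this clone
        parts = urlsplit(base)
        host = clone.get("URI-REPLACEMENT", {}).get("HOST", parts.netloc)
        # The clone reuses the base pathway's URL with the host swapped.
        result[clone["ID"]] = urlunsplit(parts._replace(netloc=host))
    return result


dcsm = json.loads(dcsm_text)
known = {
    "cdn-a": "https://a.example.com/live/",
    "cdn-b": "https://b.example.com/live/",
}
pathways = apply_clones(known, dcsm)
print(pathways["cdn-c"])  # https://c.example.net/live/
```

The client still learns about `cdn-c` only when it next reloads the steering manifest, which is why cloning alone does not address the before-TTL case.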

@bbert
Author

bbert commented Oct 13, 2023

Thanks @gwendalsimon and @burak-kara for your comments.

@burak-kara yes, I know about pathway cloning, but the use case was precisely to update the DCSM in order to get new pathways before the TTL delay has elapsed.

@gwendalsimon I agree with you that the steering server should preferably be stateless, and about the difficulty of asking CDNs to send CMSD messages.

Let's tackle this issue in another way.
In fact, the use case would be to enable a player to learn about new pathways when it encounters issues with the currently available pathways.

A potential solution is to complete the client behaviour specification by adding the possibility for the client to refresh the DCSM when it encounters playback problems and has already switched through all of the available pathways.
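
As a minimal sketch of that condition in Python (the function and parameter names are hypothetical, and what counts as a "failed" pathway is left to the open question about playback problems):

```python
def should_refresh_dcsm(available_pathways: list, failed_pathways: set) -> bool:
    """True once every pathway the client knows about has been tried and failed."""
    return bool(available_pathways) and all(
        pathway in failed_pathways for pathway in available_pathways
    )


print(should_refresh_dcsm(["cdn-a", "cdn-b"], {"cdn-a"}))           # False
print(should_refresh_dcsm(["cdn-a", "cdn-b"], {"cdn-a", "cdn-b"}))  # True
```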

By the way, I think we should explain more precisely in the client steering behaviour what is meant by "If the client encounters playback problems". When should a client make a BaseURL or Location switch?:

  • when it gets an error from the server?
  • when playback stalls after a failed request?
  • when playback gets stuck at the lowest bitrate?
  • when time to first byte increases significantly?
  • others? (for example CMSD duress field from CDN)

Or is it left entirely open to the player implementation?
@dsilhavy do you have any opinion on that?

@haudiobe

haudiobe commented Jan 9, 2024

Everyone is encouraged to review the latest specification here: https://members.dashif.org/wg/Interoperability/document/4810

@haudiobe

We should check what the IOP says. Do we reload the MPD in case of repeated segment 404s? IOP and MPEG-DASH recommend reloading the MPD. That may resolve the issue for Bertrand.

@dsilhavy please let us know how you have implemented this. Then we fix the spec, check whether Bertrand's issue still exists, and if so fix the spec further.

@bbert
Author

bbert commented Feb 23, 2024

We should check what the IOP says. Do we reload the MPD in case of repeated segment 404s? IOP and MPEG-DASH recommend reloading the MPD. That may resolve the issue for Bertrand.

And in case the MPD uses the same baseUrl as the segments, the player would not be able to refresh the MPD.
Please consider the use case where the player streams the content (MPD+segments) from a CDN and needs to be redirected to a newly created CDN/pathway to avoid playback failure.

@dsilhavy

dsilhavy commented Mar 1, 2024

This is what dash.js does today:

  • If we see a failed request, we trigger a clock sync to check if we have the right offset between client and server reference clock
  • The number of retryAttempts and retryInterval can be configured via the player settings. Per default we try three times with a waiting time between 500-1000ms (depends on the type of the object that is requested)
  • If the number of remaining retryAttempts reaches 0 we blacklist the BaseURL and move to the next available BaseURL.
  • If all BaseURL elements are blacklisted playback is terminated.

As of today, we are not refreshing the manifest in case of repeated segment 404s. We are also not refreshing the DCSM.
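
The behaviour described in the list above can be approximated by the following Python sketch. It is not dash.js code: the function, its parameters and the `request` callback are hypothetical, "three times" is read here as three attempts in total per BaseURL, and the 500-1000 ms wait between attempts is left out for brevity.

```python
from typing import Callable, Optional, Set, Tuple


def fetch_with_failover(
    base_urls: list,
    request: Callable[[str], bool],                 # hypothetical: True on success
    retry_attempts: int = 3,                        # assumed to mean attempts in total
    on_failure: Callable[[], None] = lambda: None,  # e.g. trigger a clock sync
) -> Tuple[Optional[str], Set[str]]:
    """Try each BaseURL in order; blacklist it once its attempts are used up."""
    blacklist: Set[str] = set()
    for base_url in base_urls:
        for _ in range(retry_attempts):
            if request(base_url):
                return base_url, blacklist
            on_failure()
        blacklist.add(base_url)
    # Every BaseURL is blacklisted: playback would be terminated here.
    return None, blacklist
```

A manifest or DCSM refresh would slot in just before the final `return None`, which is the gap this issue is about.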

It would be great if we could also collect the relevant parts of the specifications that dash.js shall implement to improve the current behavior.

@haudiobe

haudiobe commented Mar 1, 2024

Live TF 2024/03/01

Accepted that the spec details need to be collected.

@haudiobe

IOP WG 2024/10/29

We suggest to update client behaviour

  • in case of segment errors (404), an MPD update may provide new pathways, and a content steering update should be done as well.
  • only after 3 attempts will a content steering update be done. It should be done in the background.
  • We still need to define the order in the player.

Please comment, we will update the spec.

@haudiobe added the probable-agreement-please-comment label and removed the good first issue and encourage-discussion labels Oct 29, 2024
@thasso

thasso commented Oct 29, 2024

As @dsilhavy was explaining what dash.js does, let me try to generalise the list.

The player receives a 404 response on a segment download and can perform the following actions:

  • clock-sync and try again (IOP / DASH)
  • try again based on client configuration (custom?)
  • use a different BaseURL and try again (IOP / DASH)
  • update the MPD and try again (IOP / DASH)
  • use a different rendition and try again (custom but I have seen this in implementations)
  • update content steering information and try again (new)

Generally I would be okay with adding a Content Steering update to the list. It's a reasonable thing to do. My question would be whether we want to formalise the client behaviour more than just allowing this option as well. If we just add this to the content steering spec, it might be unclear to an implementer in which order the client is to go through the list above.

In the IOP Guidelines we basically quote the MPEG Spec and say in 4.8.2.1

Similarly, if the DASH access client receives an HTTP client error (i.e. messages with 4xx
error code) for the request of a Media Segment, the requested Media Segment may not be
available anymore or may not be available yet. In both these case the client should check
if the precision of the time synchronization to a globally accurate time standard or to the
time offered in the MPD is sufficiently accurate. If the clock is believed accurate, or the
error re-occurs after any correction, the client should check for an update of the MPD. If
multiple BaseURL elements are available, the client may also check for alternative
instances of the same content that are hosted on a different server.

This is in itself already ambiguous, since it is not clear whether the client should prioritise multiple BaseURL entries over retry behaviour or manifest updates. That said, I would propose the following order:

  1. Clock Sync (unless the client is sure that the clock is correct) and try again
  2. Retry according to client's retry configuration
  3. Use an alternative BaseURL and try again
  4. Content Steering Update and try again
  5. Manifest Update and try again
  6. Blocklist the rendition (for a configurable period of time) and try again with a different rendition
  7. Terminate the streaming session with an error

The implementation may decide to do steps 4. (and 5.) in parallel to ongoing segment download retries and not synchronously.
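
The proposed order can be written down as a simple recovery ladder. This Python sketch is only one synchronous reading of the list: the step names and the idea that each step is a callable returning success are assumptions, and the option of running steps 4 and 5 in parallel with retries is not modelled.

```python
from typing import Callable, List, Tuple

# Each step performs one recovery action plus a retry and reports success.
RecoveryStep = Tuple[str, Callable[[], bool]]


def recover_from_segment_error(steps: List[RecoveryStep]) -> str:
    """Run recovery steps in order; return the name of the first that succeeds."""
    for name, attempt in steps:
        if attempt():
            return name
    return "terminate"  # step 7: end the session with an error


# Hypothetical wiring of steps 1 to 6; the lambdas stand in for real player logic.
steps = [
    ("clock_sync", lambda: False),
    ("retry", lambda: False),
    ("alternative_base_url", lambda: False),
    ("content_steering_update", lambda: True),
    ("manifest_update", lambda: True),
    ("blocklist_rendition", lambda: True),
]
print(recover_from_segment_error(steps))  # content_steering_update
```

Keeping the order in a plain list makes it easy to swap steps 3 and 4, which is the part of the ordering that is still open.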

We should do the Content Steering update before the Manifest Update. @bbert mentioned above already that in case Manifest+Segments are coming from the same CDN and there is an issue, the player will not be able to do the Manifest update.

I added 2. and 6. to the list because this is something that I think is reasonable behaviour and there are popular implementations out there (ExoPlayer is one of them) that implement this as well.

What I am not sure of is if we should first try alternative BaseURLs or first (synchronously) update Content Steering. At the end I think it is a matter of available time for the client. If the client has enough buffer, it can easily first get an update from the content steering server. If the client is very close to running out of buffer, it might be better to use an alternative BaseURL.
I would also assume here that the list of alternative BaseURLs is already sorted based on the last pathway priority response from the steering server. In this case, going to the next in the list is probably a reasonable and fast choice?
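
Both ideas, choosing by available buffer and keeping BaseURLs sorted by the last pathway priority, fit in a few lines of Python. The 10-second threshold, the function names and the action labels are hypothetical and only serve to illustrate the trade-off.

```python
def order_by_pathway_priority(base_urls: dict, priority: list) -> list:
    """Sort pathway ids so that those in the priority list come first, in that order."""
    rank = {pathway: index for index, pathway in enumerate(priority)}
    # Pathways missing from the priority list keep their relative order at the end.
    return sorted(base_urls, key=lambda pathway: rank.get(pathway, len(priority)))


def next_action(buffer_seconds: float, min_buffer_for_steering: float = 10.0) -> str:
    """Pick the recovery step based on how much buffered media is left."""
    if buffer_seconds >= min_buffer_for_steering:
        return "update_content_steering"  # enough time for a round trip to the steering server
    return "switch_base_url"              # close to a stall: take the next known pathway


urls = {"cdn-a": "https://a.example.com/", "cdn-b": "https://b.example.com/",
        "cdn-c": "https://c.example.com/"}
print(order_by_pathway_priority(urls, ["cdn-c", "cdn-a"]))  # ['cdn-c', 'cdn-a', 'cdn-b']
print(next_action(2.5))                                     # switch_base_url
```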

@bbert you also asked here whether we should further clarify when the client should do a BaseURL or Location switch. Personally I think this should only happen in the error case, mostly because it would keep things simple, and many of the other properties might easily depend on the client and the client's network rather than on something upstream.

@haudiobe added the encourage-discussion label Nov 12, 2024
@haudiobe

IOP 2024/12/10

Please discuss further, but suggestion from @thasso may be used as a baseline.

6 participants