Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hierarchy to published dashboard plots #5130

Open
mrocklin opened this issue Jul 27, 2021 · 7 comments
Open

Add hierarchy to published dashboard plots #5130

mrocklin opened this issue Jul 27, 2021 · 7 comments

Comments

@mrocklin
Copy link
Member

In dask/dask-labextension#179 (comment) we propose changing the Dask JupyterLab extension to provide a hierarchy of plots, for example "Worker Metrics", "Computation Progress", or "Profiling". Before the labextension could add something like this, the Dask scheduler probably needs to provide this hierarchy.

Currently the labextension queries for the individual-plots.json route, which is filled with the following code:

class IndividualPlots(RequestHandler):
def get(self):
try:
from bokeh.server.tornado import BokehTornado
bokeh_application = first(
app
for app in self.server.http_application.applications
if isinstance(app, BokehTornado)
)
individual_bokeh = {
uri.strip("/").replace("-", " ").title(): uri
for uri in bokeh_application.app_paths
if uri.lstrip("/").startswith("individual-")
and not uri.endswith(".json")
}
individual_static = {
uri.strip("/")
.replace(".html", "")
.replace("-", " ")
.title(): "/statics/"
+ uri
for uri in os.listdir(
os.path.join(os.path.dirname(__file__), "..", "static")
)
if uri.lstrip("/").startswith("individual-") and uri.endswith(".html")
}
result = {**individual_bokeh, **individual_static}
self.write(result)
except (ImportError, StopIteration):
self.write({})

We should maybe change this code (or provide another route in parallel) that provided a more hierarchical view, maybe like the following (using yaml here just for convenience)

real-time-worker-metrics:
- {"network": "individual-network-workers.html"}
- {"disk": "individual-disk-workers.html"}
- ...
aggregate-worker-metrics
- ...
scheduler-metrics:
- ...
gpu:
- ...

(Or some other, much better grouping). I think that this requires both an understanding of how Dask gets used, as well as enough understanding of the labextension to understand what it needs. To me this sounds like a job for @jacobtomlinson if he has time.

I think that this information should originate from this repository rather than dask-labextension so that we don't need to coordinate across codebases as we add plots.

cc also @ian-r-rose and @bryevdv

@jacobtomlinson
Copy link
Member

Changing the structure of that file will require updating the lab extension to support that new hierarchy, even before displaying that hierarchy in the frontend. Otherwise existing users will have issues if they have a newer distributed with an older labextension.

https://github.com/dask/dask-labextension/blob/f6141455d770ed7de564fc4aa403b9964cd4e617/src/dashboard.tsx#L582

Perhaps individual-plots.json should be deprecated and left as it is and a new layout be added which the extension could then adopt at a later time.

@ian-r-rose
Copy link
Collaborator

This proposal makes sense to me, and would be a nice improvement, I think.

I agree with @jacobtomlinson that we'd want to have a decently long deprecation cycle for individual-plots.json, and for a while make sure that dask-labextension can understand both formats.

@mrocklin
Copy link
Member Author

mrocklin commented Jul 29, 2021 via email

@bryevdv
Copy link
Contributor

bryevdv commented Jul 29, 2021

individual-plots.json is just a dict of all the plots that are currently running? And for a given Dask Distributed version, the set of all potentially available plots is known up front? The it seems a grouping could be defined on the set of all potential available plots up front (i.e. not computed at all), and added as a distinguished key e.g. __groups___, inside individual-plots.json. The extension can match up the plots that are actually present in individual-plots.json into that fixed, overall grouping (if it is provided, or just use the list as-is if not).

New extension versions will work with new or old Dask Distributed versions. Old extension versions will at least work with old Dask Distributed versions, and if it happens to already know to ignore keys that start with __ it would work with the new Dask Distributed versions too (no need to deprecate anything in any case).

@mrocklin
Copy link
Member Author

individual-plots.json is just a dict of all the plots that are currently running?

Yes

And for a given Dask Distributed version, the set of all potentially available plots is known up front?

The scheduler publishes it based on what it has at that moment. This is useful, for example, in the case of GPUs. The scheduler says "Hey! I have some GPU workers. Let me add these other plots into this dict."

The extension can match up the plots that are actually present in individual-plots.json into that fixed, overall grouping (if it is provided, or just use the list as-is if not).

To be clear, I'm suggesting that the scheduler publish the grouping, not that the labextension match things up.

  • Labextension: Hey scheduler, what do you have?
  • Scheduler: I have all of these plots. They're arranged into these groups
  • Labextension: Great! I'll render that on the screen. Thanks for putting everything together for me!

@bryevdv
Copy link
Contributor

bryevdv commented Jul 29, 2021

To be clear, I'm suggesting that the scheduler publish the grouping, not that the labextension match things up.

I'm suggesting the scheduler publish a pre-defined grouping over all possible plots (whether they are currently running or not) [1] to avoid having to make a dynamic grouping (less code to maintain, less work for the scheduler to do), and as a side-benefit nothing that currently exists needs to be deprecated (or kept around as disused cruft forever). But the end result for the user, a hierarchical menu of plots, is the same. Sending the base list of plots separate from an overall grouping is also a good separation of concerns and means that the grouping could be swapped out (e.g. for different classes of users or usage patterns where a different grouping is more appropriate) or ignored entirely (e.g. a simple sorted list for a dev/debugging view).

[1] this would only need updating e.g. when a new version adds or removes available plot types

@mrocklin
Copy link
Member Author

In general I hear your points, but in this case I think that it will be pretty trivial to publish a dict of grouped plots alongside the current dict. I think that this is both the easiest and most powerful solution. I don't think that we should get too creative here.

publish a pre-defined grouping over all possible plots

Given extensions that might exist, this is hard to predict ahead of time.

to avoid having to make a dynamic grouping (less code to maintain, less work for the scheduler to do)

I don't yet think that this will be hard to do, but I could definitely be wrong. I don't think that there is really much code here. I think that it's mostly just some dict of dicts that we write in this file, maybe modify with some conditionals (like if has_gpu: add_some_plots()) and then we ship that out to whoever asks. This is, I think, the simplest approach.

and as a side-benefit nothing that currently exists needs to be deprecated (or kept around as disused cruft forever)

Keeping it around forever sounds fairly cheap to me. I don't think that there is a major cost here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants