Add hierarchy to published dashboard plots #5130

mrocklin · 2021-07-27T23:54:24Z

In dask/dask-labextension#179 (comment) we propose changing the Dask JupyterLab extension to provide a hierarchy of plots, for example "Worker Metrics", "Computation Progress", or "Profiling". Before the labextension could add something like this, the Dask scheduler probably needs to provide this hierarchy.

Currently the labextension queries for the individual-plots.json route, which is filled with the following code:

distributed/distributed/http/scheduler/info.py

Lines 149 to 179 in 50fd3ff

    
           class IndividualPlots(RequestHandler): 
        
               def get(self): 
        
                   try: 
        
                       from bokeh.server.tornado import BokehTornado 
        
                       bokeh_application = first( 
        
                           app 
        
                           for app in self.server.http_application.applications 
        
                           if isinstance(app, BokehTornado) 
        
                       ) 
        
                       individual_bokeh = { 
        
                           uri.strip("/").replace("-", " ").title(): uri 
        
                           for uri in bokeh_application.app_paths 
        
                           if uri.lstrip("/").startswith("individual-") 
        
                           and not uri.endswith(".json") 
        
                       } 
        
                       individual_static = { 
        
                           uri.strip("/") 
        
                           .replace(".html", "") 
        
                           .replace("-", " ") 
        
                           .title(): "/statics/" 
        
                           + uri 
        
                           for uri in os.listdir( 
        
                               os.path.join(os.path.dirname(__file__), "..", "static") 
        
                           ) 
        
                           if uri.lstrip("/").startswith("individual-") and uri.endswith(".html") 
        
                       } 
        
                       result = {**individual_bokeh, **individual_static} 
        
                       self.write(result) 
        
                   except (ImportError, StopIteration): 
        
                       self.write({})

We should maybe change this code (or provide another route in parallel) that provided a more hierarchical view, maybe like the following (using yaml here just for convenience)

real-time-worker-metrics:
- {"network": "individual-network-workers.html"}
- {"disk": "individual-disk-workers.html"}
- ...
aggregate-worker-metrics
- ...
scheduler-metrics:
- ...
gpu:
- ...

(Or some other, much better grouping). I think that this requires both an understanding of how Dask gets used, as well as enough understanding of the labextension to understand what it needs. To me this sounds like a job for @jacobtomlinson if he has time.

I think that this information should originate from this repository rather than dask-labextension so that we don't need to coordinate across codebases as we add plots.

cc also @ian-r-rose and @bryevdv

jacobtomlinson · 2021-07-29T14:37:00Z

Changing the structure of that file will require updating the lab extension to support that new hierarchy, even before displaying that hierarchy in the frontend. Otherwise existing users will have issues if they have a newer distributed with an older labextension.

https://github.com/dask/dask-labextension/blob/f6141455d770ed7de564fc4aa403b9964cd4e617/src/dashboard.tsx#L582

Perhaps individual-plots.json should be deprecated and left as it is and a new layout be added which the extension could then adopt at a later time.

ian-r-rose · 2021-07-29T14:46:53Z

This proposal makes sense to me, and would be a nice improvement, I think.

I agree with @jacobtomlinson that we'd want to have a decently long deprecation cycle for individual-plots.json, and for a while make sure that dask-labextension can understand both formats.

mrocklin · 2021-07-29T14:55:39Z

Yup. We can keep individual-plots.json around for forever as far as I'm concerned.

…

On Thu, Jul 29, 2021 at 7:47 AM Ian Rose ***@***.***> wrote: This proposal makes sense to me, and would be a nice improvement, I think. I agree with @jacobtomlinson <https://github.com/jacobtomlinson> that we'd want to have a decently long deprecation cycle for individual-plots.json, and for a while make sure that dask-labextension can understand both formats. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#5130 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACKZTGWGDKRH7T7FDMYFV3T2FSXRANCNFSM5BDHX57Q> .

bryevdv · 2021-07-29T22:29:40Z

individual-plots.json is just a dict of all the plots that are currently running? And for a given Dask Distributed version, the set of all potentially available plots is known up front? The it seems a grouping could be defined on the set of all potential available plots up front (i.e. not computed at all), and added as a distinguished key e.g. __groups___, inside individual-plots.json. The extension can match up the plots that are actually present in individual-plots.json into that fixed, overall grouping (if it is provided, or just use the list as-is if not).

New extension versions will work with new or old Dask Distributed versions. Old extension versions will at least work with old Dask Distributed versions, and if it happens to already know to ignore keys that start with __ it would work with the new Dask Distributed versions too (no need to deprecate anything in any case).

mrocklin · 2021-07-29T22:50:58Z

individual-plots.json is just a dict of all the plots that are currently running?

Yes

And for a given Dask Distributed version, the set of all potentially available plots is known up front?

The scheduler publishes it based on what it has at that moment. This is useful, for example, in the case of GPUs. The scheduler says "Hey! I have some GPU workers. Let me add these other plots into this dict."

The extension can match up the plots that are actually present in individual-plots.json into that fixed, overall grouping (if it is provided, or just use the list as-is if not).

To be clear, I'm suggesting that the scheduler publish the grouping, not that the labextension match things up.

Labextension: Hey scheduler, what do you have?
Scheduler: I have all of these plots. They're arranged into these groups
Labextension: Great! I'll render that on the screen. Thanks for putting everything together for me!

bryevdv · 2021-07-29T23:08:56Z

To be clear, I'm suggesting that the scheduler publish the grouping, not that the labextension match things up.

I'm suggesting the scheduler publish a pre-defined grouping over all possible plots (whether they are currently running or not) [1] to avoid having to make a dynamic grouping (less code to maintain, less work for the scheduler to do), and as a side-benefit nothing that currently exists needs to be deprecated (or kept around as disused cruft forever). But the end result for the user, a hierarchical menu of plots, is the same. Sending the base list of plots separate from an overall grouping is also a good separation of concerns and means that the grouping could be swapped out (e.g. for different classes of users or usage patterns where a different grouping is more appropriate) or ignored entirely (e.g. a simple sorted list for a dev/debugging view).

[1] this would only need updating e.g. when a new version adds or removes available plot types

mrocklin · 2021-07-30T00:04:02Z

In general I hear your points, but in this case I think that it will be pretty trivial to publish a dict of grouped plots alongside the current dict. I think that this is both the easiest and most powerful solution. I don't think that we should get too creative here.

publish a pre-defined grouping over all possible plots

Given extensions that might exist, this is hard to predict ahead of time.

to avoid having to make a dynamic grouping (less code to maintain, less work for the scheduler to do)

I don't yet think that this will be hard to do, but I could definitely be wrong. I don't think that there is really much code here. I think that it's mostly just some dict of dicts that we write in this file, maybe modify with some conditionals (like if has_gpu: add_some_plots()) and then we ship that out to whoever asks. This is, I think, the simplest approach.

and as a side-benefit nothing that currently exists needs to be deprecated (or kept around as disused cruft forever)

Keeping it around forever sounds fairly cheap to me. I don't think that there is a major cost here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add hierarchy to published dashboard plots #5130

Add hierarchy to published dashboard plots #5130

mrocklin commented Jul 27, 2021

jacobtomlinson commented Jul 29, 2021

ian-r-rose commented Jul 29, 2021

mrocklin commented Jul 29, 2021 via email

bryevdv commented Jul 29, 2021 •

edited

Loading

mrocklin commented Jul 29, 2021

bryevdv commented Jul 29, 2021 •

edited

Loading

mrocklin commented Jul 30, 2021

Add hierarchy to published dashboard plots #5130

Add hierarchy to published dashboard plots #5130

Comments

mrocklin commented Jul 27, 2021

jacobtomlinson commented Jul 29, 2021

ian-r-rose commented Jul 29, 2021

mrocklin commented Jul 29, 2021 via email

bryevdv commented Jul 29, 2021 • edited Loading

mrocklin commented Jul 29, 2021

bryevdv commented Jul 29, 2021 • edited Loading

mrocklin commented Jul 30, 2021

bryevdv commented Jul 29, 2021 •

edited

Loading

bryevdv commented Jul 29, 2021 •

edited

Loading