Disk read/write dashboard plots #5122

Closed
mrocklin opened this issue Jul 26, 2021 · 6 comments

@mrocklin
Member

Following on #5090, it would be useful to have similar plots for disk usage. I suspect it would be the exact same chart (read/write per worker and timeseries), but we'll need to capture disk I/O statistics, and I'm not sure that we do this today. It might be worth checking the system_monitor.py file to see; if not, we could add measurements coming from psutil (or somewhere else if there are better ways of measuring this).

@jrbourbeau
Member

After a quick look, it appears we don't measure disk I/O statistics in the system monitor. We do capture network I/O using psutil.net_io_counters:

ioc = psutil.net_io_counters()

From the psutil docs, I see there's a similar-looking psutil.disk_io_counters function for gathering disk I/O statistics. That would be one way to start gathering this information.
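For illustration, here is a minimal sketch of how the cumulative counters returned by psutil.disk_io_counters could be turned into per-second rates. The names `DiskIO`, `disk_rates`, and `sample_disk_rates` are hypothetical and not part of distributed; the `0.5` fallback for a zero duration is borrowed from the snippet later in this thread, not from any documented requirement.

```python
import time
from collections import namedtuple

try:
    import psutil
except ImportError:  # the rate helper below still works without psutil
    psutil = None

# Hypothetical stand-in exposing the two fields of the psutil result used here.
DiskIO = namedtuple("DiskIO", ["read_bytes", "write_bytes"])


def disk_rates(last, current, duration):
    """Return (read, write) in bytes/second between two cumulative samples."""
    duration = duration or 0.5  # assumed fallback to avoid dividing by zero
    read = (current.read_bytes - last.read_bytes) / duration
    write = (current.write_bytes - last.write_bytes) / duration
    return read, write


def sample_disk_rates(interval=1.0):
    """Sample psutil twice, `interval` seconds apart, and return the rates."""
    if psutil is None:
        raise RuntimeError("psutil is required for live sampling")
    first = psutil.disk_io_counters()
    start = time.monotonic()
    time.sleep(interval)
    second = psutil.disk_io_counters()
    if first is None or second is None:  # psutil found no disks to report on
        return None
    return disk_rates(first, second, time.monotonic() - start)
```

The counters are totals since boot, so the rate only makes sense as a difference between two samples divided by the elapsed time.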

@mrocklin
Member Author

@ncclementi if it's easy for you to roll this into the current network bandwidth plots that would be welcome.

@ncclementi
Member

I believe the current network plot PR is ready to merge (as soon as CI finishes), but I can include this in a separate PR. I have a question:

  • It seems we are not tracking disk I/O statistics. As @jrbourbeau mentioned, we can do this with psutil.disk_io_counters. Do we want to add this to distributed/distributed/system_monitor.py, following a similar approach to what is done with the network read_bytes/write_bytes, or is there a more suitable place?

Something like this (code not tested):

        ...
        result = {"cpu": cpu, "memory": memory, "time": now, "count": self.count}

        if self._collect_disk_io_counters:
            try:
                io_disk = psutil.disk_io_counters()
            except Exception:
                pass
            else:
                last = self._last_disk_io_counters
                duration = now - self.last_time
                # psutil counters are cumulative, so subtract the previous
                # sample to get bytes per second over this interval
                read_bytes = (io_disk.read_bytes - last.read_bytes) / (duration or 0.5)
                write_bytes = (io_disk.write_bytes - last.write_bytes) / (duration or 0.5)
                self.last_time = now
                self._last_disk_io_counters = io_disk
                self.read_bytes_disk.append(read_bytes)
                self.write_bytes_disk.append(write_bytes)
                result["read_bytes_disk"] = read_bytes
                result["write_bytes_disk"] = write_bytes
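A self-contained version of the idea above might look like the following sketch. It is not the distributed SystemMonitor: the class `DiskMonitor`, its constructor arguments, and the `Counters` tuple are hypothetical, and the counter source and clock are injected only so the logic can be exercised without touching real disks. In practice one would presumably pass psutil.disk_io_counters as `read_counters`.

```python
import time
from collections import deque, namedtuple

# Hypothetical stand-in for the fields of the psutil result used here.
Counters = namedtuple("Counters", ["read_bytes", "write_bytes"])


class DiskMonitor:
    """Hypothetical sketch: keeps a bounded history of disk read/write rates."""

    def __init__(self, read_counters, clock=time.monotonic, n=10000):
        self._read_counters = read_counters  # e.g. psutil.disk_io_counters
        self._clock = clock
        self._last = read_counters()
        self._last_time = clock()
        # Bounded time series, one entry per successful update
        self.read_bytes_disk = deque(maxlen=n)
        self.write_bytes_disk = deque(maxlen=n)

    def update(self):
        now = self._clock()
        result = {"time": now}
        try:
            current = self._read_counters()
        except Exception:
            return result  # skip disk metrics for this tick
        if current is None or self._last is None:
            # No baseline to diff against yet; remember this sample and move on
            self._last, self._last_time = current, now
            return result
        duration = (now - self._last_time) or 0.5  # assumed zero-duration fallback
        read = (current.read_bytes - self._last.read_bytes) / duration
        write = (current.write_bytes - self._last.write_bytes) / duration
        self._last, self._last_time = current, now
        self.read_bytes_disk.append(read)
        self.write_bytes_disk.append(write)
        result["read_bytes_disk"] = read
        result["write_bytes_disk"] = write
        return result
```

Keeping a separate timestamp for the disk counters avoids depending on when any other metric last updated its own timestamp, which is a design choice of this sketch rather than something stated in the thread.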

Side note: I noticed we have this in metrics, but it's not used. Is this something that should be removed?

disk_io_counters = _psutil_caller("disk_io_counters")

@mrocklin
Member Author

Yes, we want to add something very much like what you have above to the system monitor. I don't recall the reason for the disk_io_counters entry in metrics. One could use git blame here, or just ignore it for now.

@quasiben
Member

Some time ago @pentschev added logging for spilling in rapidsai/dask-cuda#442. The PR logs each spilling event, and the log can be queried for the total time spent spilling. This might be of interest to folks on this issue.

@jrbourbeau
Member

I believe this was included in #5129 (thanks @ncclementi!)
