
What information is missing? #341

Open
dkfellows opened this issue Jun 21, 2021 · 14 comments
Labels
question (Further information is requested) · spalloc server (Relating to the new spalloc server)

Comments

@dkfellows
Member

What information should be produced by the spalloc reimplementation, but isn't?

@dkfellows added the question and spalloc server labels Jun 21, 2021
@Christian-B
Member

At this point we should consider what info is required for either a normal user or an admin, and in a second pass decide which parts only admins should see.

@rowleya
Member

rowleya commented Jun 21, 2021

Some statistics could be useful, which then might avoid logfile analysis. The things we currently look for are:

  • Total core hours used by all jobs (where core hours = cores used × duration of the job in hours)
  • Total number of jobs

Note that core hours could become board hours here if useful (which can then be multiplied up by an estimated average cores-per-board if desired).

It would be even better if these statistics can then be broken down (depending on the user model) e.g.:

  • Core hours used by HBP/EBRAINS users
  • Core hours used by local users
  • Core hours used by testing (could be split into hardware / software testing)
  • Core hours used by service e.g. HBP batch jobs vs. Jupyter users

All of this is desirable, but if it is hard to achieve, we can always do it by post-analysis instead, of course.
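As a rough illustration of that arithmetic, a post-analysis script might total things up along the lines of the sketch below. This is only a sketch: the record fields and the cores-per-board figure are assumptions, not the real spalloc data model.

```python
CORES_PER_BOARD = 48 * 17   # assumed average usable application cores per board

def usage_totals(jobs):
    """Total jobs, board hours, and estimated core hours over an iterable of job records."""
    total_jobs = 0
    board_hours = 0.0
    for job in jobs:
        total_jobs += 1
        # assumed fields: number of boards allocated and job duration in seconds
        board_hours += job["num_boards"] * job["duration_seconds"] / 3600.0
    return total_jobs, board_hours, board_hours * CORES_PER_BOARD
```

The same totals could then be grouped by user class or by service (e.g. HBP batch vs. Jupyter users) before summing, to give the breakdowns listed above.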

@Christian-B
Member

Answering only the "should", not the "isn't".

Info on each job:
  • Boards used
  • Time running
  • Size of data
  • Owner
  • Current status, for example DSE, loading executables, running, extracting data, waiting to close etc.
  • If available, time in current status
  • Job type: spalloc vs Jupyter vs portal vs tests

Summaries for all jobs:
  • n jobs running
  • Total size, boards, data etc.

History of jobs (ideally by user and job type):
  • n jobs
  • Total time
  • Total size

Machine info:
  • Boards in use
  • Boards available

From those we should be able to work out the largest machine currently available.

Also, if applicable, the number of jobs in the queue because the machine is full, and then stats on wait times etc.
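A rough sketch of the per-job record this wish list implies (purely illustrative; the field names and the status/type values are assumptions, not the actual spalloc API):

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class JobInfo:
    """Illustrative per-job record; not the real spalloc data model."""
    owner: str
    boards_used: int
    time_running: timedelta
    data_size_bytes: int
    current_status: str                   # e.g. "DSE", "loading executables", "running"
    time_in_status: Optional[timedelta]   # only if available
    job_type: str                         # e.g. "spalloc", "Jupyter", "portal", "test"
```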

@rowleya
Member

rowleya commented Jun 21, 2021

Steve also requested the ability to report on down boards / chips / cores over time. Not sure how easy that would be to keep completely in spalloc though.

@dkfellows
Member Author

Collecting statistics with a database present should be a lot easier.

@dkfellows
Member Author

Re job internal status, that would have to be something told to us and which we would just report onwards. Not much we can do otherwise; spalloc really doesn't see what is going on inside.

@dkfellows
Member Author

I'd be tempted to make the long-term aggregate reporting something that is just done by running scripts against the DB, instead of being part of the application itself.
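For instance, a report could be a small script run over a copy of the database, something like the sketch below. The table and column names are invented for illustration; the real schema will differ.

```python
import sqlite3

def report(db_path="spalloc-copy.sqlite3"):
    """Print total jobs and board hours from a DB copy; placeholder schema."""
    conn = sqlite3.connect(db_path)
    try:
        # Times are assumed to be stored as epoch seconds.
        total_jobs, board_hours = conn.execute(
            """
            SELECT COUNT(*),
                   COALESCE(SUM(num_boards * (finish_time - start_time)) / 3600.0, 0)
            FROM jobs
            WHERE finish_time IS NOT NULL
            """
        ).fetchone()
    finally:
        conn.close()
    print(f"jobs: {total_jobs}, board hours: {board_hours:.1f}")
```

Run against a snapshot rather than the live file, a script like this cannot interfere with the running service.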

@Christian-B
Member

Re job internal status, that would have to be something told to us and which we would just report onwards. Not much we can do otherwise; spalloc really doesn't see what is going on inside.

If we don't have the data then let's not complicate the system at this point.

@rowleya
Member

rowleya commented Jun 21, 2021

I'd be tempted to make the long-term aggregate reporting something that is just done by running scripts against the DB, instead of being part of the application itself.

Happy enough for that to be done, at least initially, especially as this is likely to be faster than scanning files. Longer term, having a web page with nice graphs is an option that can be implemented later.

@Christian-B
Member

If there is a way to allow all query scripts but block data-changing ones, that is fine.

Allowing non-precanned scripts/queries that change the data is dangerous, as one accident could destroy the whole database.
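One way to get that property (assuming the database is SQLite; the file name shown is a placeholder) is to have the query scripts open the database read-only, so any accidental write fails instead of changing anything:

```python
import sqlite3

# Open the database read-only via a URI; any INSERT/UPDATE/DELETE will raise
# sqlite3.OperationalError instead of modifying the data.
conn = sqlite3.connect("file:spalloc.sqlite3?mode=ro", uri=True)
```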

@Christian-B
Member

When we have web pages with graphs etc., these can use prepared scripts which run on the same API.

@dkfellows
Member Author

dkfellows commented Jun 21, 2021

If there is a way to allow all query scripts but block data-changing ones, that is fine.

Direct access to the DB is always an admin-only thing, as I won't put a general query interface in the service. (For one thing, the connection management API is not set up for producing read-only connections, and for another there will be fields that should remain shrouded from general users.) If you want to run a general query, the way to do it will be to log onto the spalloc machine and either run against the live DB or take a copy of it.

Making a copy of the DB could be an (admin-only) operation.
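A minimal sketch of how such a copy could be taken while the service is live, using SQLite's online backup API (the file names are placeholders, and this assumes the DB is SQLite):

```python
import sqlite3

def snapshot(live="spalloc.sqlite3", copy="spalloc-snapshot.sqlite3"):
    """Copy the live database to a snapshot file using SQLite's online backup API."""
    src = sqlite3.connect(live)
    dst = sqlite3.connect(copy)
    try:
        src.backup(dst)   # produces a consistent copy even while writers are active
    finally:
        dst.close()
        src.close()
```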

@dkfellows
Member Author

I've added the ability to look up board information from a machine by the IP address of the board. That was the missing whereis operation from the existing spalloc. 😁

@rowleya
Member

rowleya commented Jun 21, 2021

Ah yes, that inspires further things:

  • Ability to reserve multiple boards starting at IP address
  • Ability to find job from board IP address

@dkfellows pinned this issue Nov 18, 2021
@dkfellows unpinned this issue Apr 10, 2022
@dkfellows moved this to To do in Spalloc Server Aug 4, 2022
@dkfellows added this to the Bluesky milestone Mar 3, 2023