
What information is missing? #341

Open
dkfellows opened this issue Jun 21, 2021 · 14 comments
Labels
question (Further information is requested) · spalloc server (Relating to the new spalloc server)

Comments

@dkfellows
Member

What information should be produced by the spalloc reimplementation, but isn't?

@dkfellows added the question and spalloc server labels Jun 21, 2021
@Christian-B
Member

At this point we should consider what info is required for either a normal user or an admin, and in a second pass decide which parts only admins should see.

@rowleya
Member

rowleya commented Jun 21, 2021

Some statistics could be useful, which then might avoid logfile analysis. The things we currently look for are:

  • Total core hours used by all jobs (where core hours = cores used × duration of the job in hours)
  • Total number of jobs

Note that core hours could become board hours here if useful (which can then be multiplied up by an estimated average cores-per-board if desired).

It would be even better if these statistics can then be broken down (depending on the user model) e.g.:

  • Core hours used by HBP/EBRAINS users
  • Core hours used by local users
  • Core hours used by testing (could be split into hardware / software testing)
  • Core hours used by service e.g. HBP batch jobs vs. Jupyter users

All of this is desirable, but if it is hard to achieve, we can always do it by post-analysis instead, of course.
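As a rough illustration of that arithmetic, a post-analysis script might total things up along the lines of the sketch below. This is only a sketch: the record fields and the cores-per-board figure are assumptions, not the real spalloc data model.

```python
CORES_PER_BOARD = 48 * 17   # assumed average usable application cores per board

def usage_totals(jobs):
    """Total jobs, board hours, and estimated core hours over an iterable of job records."""
    total_jobs = 0
    board_hours = 0.0
    for job in jobs:
        total_jobs += 1
        # assumed fields: number of boards allocated and job duration in seconds
        board_hours += job["num_boards"] * job["duration_seconds"] / 3600.0
    return total_jobs, board_hours, board_hours * CORES_PER_BOARD
```

The same totals could then be grouped by user class or by service (e.g. HBP batch vs. Jupyter users) before summing, to give the breakdowns listed above.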

@Christian-B
Member

Answering only the "should", not the "isn't".

Info on each job:
  • Boards used
  • Time running
  • Size of data
  • Owner
  • Current status, for example DSE, loading executables, running, extracting data, waiting to close etc.
  • If available, time in current status
  • Job type: spalloc vs Jupyter vs portal vs tests

Summaries for all jobs:
  • n jobs running
  • Total size, boards, data etc.

History of jobs (ideally by user and job type):
  • n jobs
  • Total time
  • Total size

Machine info:
  • Boards in use
  • Boards available

From those we should be able to work out the largest machine currently available.

Also, if applicable, the number of jobs in the queue because the machine is full, and then stats on wait times etc.
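A rough sketch of the per-job record this wish list implies (purely illustrative; the field names and the status/type values are assumptions, not the actual spalloc API):

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass
class JobInfo:
    """Illustrative per-job record; not the real spalloc data model."""
    owner: str
    boards_used: int
    time_running: timedelta
    data_size_bytes: int
    current_status: str                   # e.g. "DSE", "loading executables", "running"
    time_in_status: Optional[timedelta]   # only if available
    job_type: str                         # e.g. "spalloc", "Jupyter", "portal", "test"
```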

@rowleya
Member

rowleya commented Jun 21, 2021

Steve also requested the ability to report on down boards / chips / cores over time. Not sure how easy that would be to keep completely in spalloc though.

@dkfellows
Member Author

Collecting statistics with a database present should be a lot easier.

@dkfellows
Member Author

Re job internal status, that would have to be something told to us and which we would just report onwards. Not much we can do otherwise; spalloc really doesn't see what is going on inside.

@dkfellows
Member Author

I'd be tempted to make the long-term aggregate reporting something that is just done by running scripts against the DB, instead of being part of the application itself.
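For instance, a report could be a small script run over a copy of the database, something like the sketch below. The table and column names are invented for illustration; the real schema will differ.

```python
import sqlite3

def report(db_path="spalloc-copy.sqlite3"):
    """Print total jobs and board hours from a DB copy; placeholder schema."""
    conn = sqlite3.connect(db_path)
    try:
        # Times are assumed to be stored as epoch seconds.
        total_jobs, board_hours = conn.execute(
            """
            SELECT COUNT(*),
                   COALESCE(SUM(num_boards * (finish_time - start_time)) / 3600.0, 0)
            FROM jobs
            WHERE finish_time IS NOT NULL
            """
        ).fetchone()
    finally:
        conn.close()
    print(f"jobs: {total_jobs}, board hours: {board_hours:.1f}")
```

Run against a snapshot rather than the live file, a script like this cannot interfere with the running service.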

@Christian-B
Member

Re job internal status, that would have to be something told to us and which we would just report onwards. Not much we can do otherwise; spalloc really doesn't see what is going on inside.

If we don't have the data then let's not complicate the system at this point.

@rowleya
Member

rowleya commented Jun 21, 2021

I'd be tempted to make the long-term aggregate reporting something that is just done by running scripts against the DB, instead of being part of the application itself.

Happy enough for that to be done, at least initially, especially as this is likely to be faster than scanning files. Longer term, having a web page with nice graphs is an option that can be implemented later.

@Christian-B
Member

If there is a way to allow all query scripts but block data-changing ones, that is fine.

Allowing non-precanned scripts/queries that change the data is dangerous, as one accident could destroy the whole database.
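One way to get that property (assuming the database is SQLite; the file name shown is a placeholder) is to have the query scripts open the database read-only, so any accidental write fails instead of changing anything:

```python
import sqlite3

# Open the database read-only via a URI; any INSERT/UPDATE/DELETE will raise
# sqlite3.OperationalError instead of modifying the data.
conn = sqlite3.connect("file:spalloc.sqlite3?mode=ro", uri=True)
```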

@Christian-B
Member

When we have web pages with graphs etc., these can use prepared scripts which run on the same API.

@dkfellows
Member Author

dkfellows commented Jun 21, 2021

If there is a way to allow all query scripts but block data-changing ones, that is fine.

Direct access to the DB is always an admin-only thing, as I won't put a general query interface in the service. (For one thing, the connection management API is not set up for producing read-only connections, and for another there will be fields that should remain shrouded from general users.) If you want to run a general query, the way to do it will be to log onto the spalloc machine and either run against the live DB or take a copy of it.

Making a copy of the DB could be an (admin-only) operation.
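A minimal sketch of how such a copy could be taken while the service is live, using SQLite's online backup API (the file names are placeholders, and this assumes the DB is SQLite):

```python
import sqlite3

def snapshot(live="spalloc.sqlite3", copy="spalloc-snapshot.sqlite3"):
    """Copy the live database to a snapshot file using SQLite's online backup API."""
    src = sqlite3.connect(live)
    dst = sqlite3.connect(copy)
    try:
        src.backup(dst)   # produces a consistent copy even while writers are active
    finally:
        dst.close()
        src.close()
```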

@dkfellows
Member Author

I've added the ability to look up board information from a machine by the IP address of the board. That was the missing whereis operation from the existing spalloc. 😁

@rowleya
Member

rowleya commented Jun 21, 2021

Ah yes, that inspires further things:

  • Ability to reserve multiple boards starting at IP address
  • Ability to find job from board IP address

@dkfellows pinned this issue Nov 18, 2021
@dkfellows unpinned this issue Apr 10, 2022
@dkfellows moved this to To do in Spalloc Server Aug 4, 2022
@dkfellows added this to the Bluesky milestone Mar 3, 2023