Update Coiled docs to include notebooks, CLI jobs and more details on software environments #498
Left some small suggestions, do what you want with them. Otherwise seems good to me (though I did not test this)
# RAPIDS packages
- rapids={{ rapids_version }}
- python=3.12
- cuda-version>=12.0,<=12.5
- cuda-version>=12.0,<=12.5
`rapids` already has a `run:` dependency on compatible versions of `cuda-version`: https://github.com/rapidsai/integration/blob/172ef624ea50670969e1fd79930a46eabdd9c3c9/conda/recipes/rapids/meta.yaml#L31
Which packages you get should then be resolved automatically, based on the CUDA version detected on the system.
I think we could probably safely remove this. That'd also mean one less thing that's likely to become out of date over time, as the RAPIDS support matrix of CUDA versions shifts.
This environment file gets built on a build server which, AFAIK, doesn't have GPUs. They just set the CUDA virtual package, but it's not clear what they set it to. So if we omit this constraint, I'm not sure what will happen. What do you think?
$ coiled env create --help
...
--gpu-enabled Set CUDA virtual package for Conda
...
Oh, interesting. Then yes, I guess something like this does need to be specified explicitly.
But then could we at least relax this to `cuda-version>=12.0,<13`? Just to reduce the risk of this doc getting out of date over time? We should be fine with any CUDA 12.x for the purpose of this example, I think.
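As a rough sketch, the dependency lines from the hunk above would then look like this, with only the `cuda-version` pin changed. This assumes the CUDA virtual package set on the build server resolves to some 12.x version, which has not been verified here.

```yaml
# RAPIDS packages
- rapids={{ rapids_version }}
- python=3.12
# Relaxed upper bound: accept any CUDA 12.x instead of stopping at 12.5
- cuda-version>=12.0,<13
```

The lower bound is unchanged, so the example still excludes CUDA 11.x. Only the upper bound moves, which means the doc would not need touching for newer 12.x releases.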
I left a couple of comments.
The one thing I'm a bit torn on is diving into all the possible ways of creating a software environment.
Is the goal of the doc to showcase all the possible ways to run RAPIDS in Coiled, or to guide people on how to get RAPIDS up and running in the way we consider most "adequate"?
If we want to show all the options, I'd suggest we do it such that anyone who reads this knows what to pick. For example:
- Do you have a local GPU? Yes/No, and based on that suggest their best option.
- Do you want/need a Jupyter instance? Yes/No ...
The way I read it now, I'm not sure a user would know what to pick out of all the options. Does that make sense?
)
## Software Environments

By default when running remote operations Coiled will [attempt to create a copy of your local software environment](https://docs.coiled.io/user_guide/software/sync.html) which can be loaded onto the remote VMs. While this is an excellent feature it's likely that you do not have all of the GPU software libraries you wish to use installed locally. In this case we need to tell Coiled which software environment to use.
Would launching via the notebooks container also mean that you don't need a GPU locally either? If so, adding a note about that could be helpful.
``` | ||

Or you can create it with a list of conda packages in case you want to customize the environment further.
This is often the most convenient way to try out existing software environments, but is often not the most performant due to the way container images are unpacked.
This is a fair point, but if this is the way we recommend people use it, maybe we don't want to be negative about it.
My intent was to document that RAPIDS should either use the container, or a prebuilt software environment. It should not use package sync.
I kept the container in most of the examples because it feels simpler and is very copy/paste friendly. But the prebuilt software environment is more performant, and is probably my recommendation as a best practice.
What do you think we should do to make this more clear?
I agree that package sync is more performant, but that implies that you have:
- a properly set up environment, which is not hard but depends a lot on the user.
- a GPU; if you don't have one, package sync will not work (or this used to be the case).
My guess is that someone trying to use Coiled probably doesn't have a local GPU in their laptop. I might be wrong, though.
What we could do is mention that to use package sync you need a local GPU and a properly set up environment.
I don't mean package sync. I mean manual software environments.
For GPU stuff we can't use package sync, but manual software environments are much more performant than pulling containers.
Oh I see, I remember this now. Thanks for the clarification.
Co-authored-by: James Lamb <[email protected]>
Thanks for the reviews!
I think my goal is to show that RAPIDS can be used with CLI scripts, notebooks and Dask clusters. A demonstration that it is possible, rather than a demonstration of the best way to do things. This may not be the best goal though.
This is useful feedback. I'll think a little more about restructuring this.
Closes #250