Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Coiled docs to include notebooks, CLI jobs and more details on software environments #498

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

jacobtomlinson
Copy link
Member

Closes #250

@jacobtomlinson jacobtomlinson added doc Improvements or additions to documentation platform/coiled Runs on Coiled labels Jan 23, 2025
@jacobtomlinson jacobtomlinson requested a review from a team as a code owner January 23, 2025 14:56
Copy link
Member

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some small suggestions, do what you want with them. Otherwise seems good to me (though I did not test this)

source/platforms/coiled.md Outdated Show resolved Hide resolved
# RAPIDS packages
- rapids={{ rapids_version }}
- python=3.12
- cuda-version>=12.0,<=12.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- cuda-version>=12.0,<=12.5

rapids already has a run: dependency on compatible versions of cuda-version: https://github.com/rapidsai/integration/blob/172ef624ea50670969e1fd79930a46eabdd9c3c9/conda/recipes/rapids/meta.yaml#L31

And which packages you get should happen nicely and automatically based on the detected CUDA version on the system.

I think we could probably safely remove this. That'd also mean one less thing that's likely to become out of date over time, as RAPIDS support matrix of CUDA versions shifts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This environment file gets built on a build server which AFAIK doesn't have GPUs. They just set the cuda package, but it's not clear what they set it to. So if we omit this constraint I'm not sure what will happen. What do you think?

$ coiled env create --help
...
  --gpu-enabled                   Set CUDA virtual package for Conda
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting. Then yes I guess something like this is necessary to specify explicitly.

But then could we at least relax this to cuda-version>=12.0,<13? Just to reduce the risk of this doc getting out of date over time? We should be fine with any CUDA 12.x for the purpose of this example, I think.

source/platforms/coiled.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ncclementi ncclementi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left couple of comments.

The only one thing I'm a bit torn is diving into all the possible ways of creating a software environment.

Is the goal of the doc, showcasing all the possible ways to run rapids in Coiled, or to guide people on how to get rapids up and running in the way we consider most "adequate".

I f we want to show all the options, I'd suggest we do it such that anyone that reads this, knows what to pick. For example

Do you have a local GPU? Yes, No and based on that suggest what's their best option.
Do you want/need a jupyter instance? Yes/No ...

The way I read it now, I'm not sure what a user would know what to pick out of all the options. Does that makes sense?

)
## Software Environments

By default when running remote operations Coiled will [attempt to create a copy of your local software environment](https://docs.coiled.io/user_guide/software/sync.html) which can be loaded onto the remote VMs. While this is an excellent feature it's likely that you do not have all of the GPU software libraries you wish to use installed locally. In this case we need to tell Coiled which software environment to use.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would launching via the notebooks container also imply that you don't need a GPU locally neither? If so add a note about that can be helpful.

```

Or you can create it with a list of conda packages in case you want to customize the environment further.
This is often the most convenient way to try out existing software environments, but is often not the most performant due to the way container images are unpacked.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair point, but if this is the way we recommend people using it, maybe we don't want to be negative about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was to document that RAPIDS should either use the container, or a prebuilt software environment. It should not use package sync.

I keps the container in most of the examples because it feels simpler and is very copy/paste friendly. But the prebuilt software environment is more performant, and it probably my recommendation as a best practice.

What do you think we should do to make this more clear?

Copy link
Contributor

@ncclementi ncclementi Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that package sync is more performant, but that implies that you have

  1. a properly set environment, which is not hard but it depend a lot on the user.
  2. a GPU, if you don't package sync will not work (or this used to be the case)

My guess is that someone trying to use coiled, probably doesn't have a local GPU in their laptop. I might be wrong, though.

What we could do, is mentioned that to use package sync you need a local GPU and a properly set environment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mean package sync. I mean manual software environments.

For GPU stuff we can't use package sync, but manual software environments are much more performant than pulling containers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, I remember this now. Thanks for the clarification.

jacobtomlinson and others added 2 commits January 23, 2025 16:54
Co-authored-by: James Lamb <[email protected]>
Co-authored-by: James Lamb <[email protected]>
@jacobtomlinson
Copy link
Member Author

jacobtomlinson commented Jan 23, 2025

Thanks for the reviews!

Is the goal of the doc, showcasing all the possible ways to run rapids in Coiled, or to guide people on how to get rapids up and running in the way we consider most "adequate".

I think my goal is to show that RAPIDS can be used with CLI scripts, notebooks and Dask clusters. A demonstration that it is possible, rather than a demonstration of the best way to do things. This may not be the best goal though.

The way I read it now, I'm not sure what a user would know what to pick out of all the options. Does that makes sense?

This is useful feedback. I think a little more about restructuring this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Improvements or additions to documentation platform/coiled Runs on Coiled
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Document Coiled Notebooks and CLI Jobs
3 participants