-
Notifications
You must be signed in to change notification settings - Fork 429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify launcher world size parsing #3398
Simplify launcher world size parsing #3398
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Any way we can do a test for this? |
Unfortunately not easily since pytest runs in a single instance and this would require a separate launch. But ill include a manual test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiting for dais hold & tests pass, lgtm otherwise
can someone put a request changes hold to block during codefreeze? @dakinggg ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Holding temporarily
I'm curious, why set env_var |
was bug in mcloud, it should set all env vars even if on single node
will investigate |
Discussed offline, if you dont use the launcher it will still use the WORLD_SIZE env var which we cannot fix. This break is not related to this PR, you must use |
What does this PR do?
Simplify launcher world size parsing. Now,
composer -n
correctly enables running on fewer GPUs when iterating on a single machine.