Integration of Agent/Environment with runner #332

Open
wants to merge 58 commits into
base: main

Conversation

srivatsankrishnan
Contributor

@srivatsankrishnan srivatsankrishnan commented Jan 11, 2025

Summary

Integrates the agent and environment to gather all actions, and uses the existing runner to generate and launch jobs. This PR also closes the previous native Gym implementation PR and uses a simple abstract definition of the Gym interface to have more control over typing during integration.
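
For illustration only, an abstract Gym-style interface of the kind described above could look roughly like the sketch below. The names `BaseGym`, `define_action_space`, and `ToyEnv` are hypothetical and are not taken from this PR; the point is only that a small ABC gives explicit type signatures without depending on the native Gym package.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class BaseGym(ABC):
    """Hypothetical minimal Gym-style interface (illustrative names only)."""

    @abstractmethod
    def define_action_space(self) -> Dict[str, List[Any]]:
        """Return every tunable parameter with its candidate values."""

    @abstractmethod
    def reset(self) -> List[float]:
        """Reset the environment and return an initial observation."""

    @abstractmethod
    def step(self, action: Dict[str, Any]) -> Tuple[List[float], float, bool, Dict[str, Any]]:
        """Apply one action; return (observation, reward, done, info)."""


class ToyEnv(BaseGym):
    """Toy environment: list-valued parameters form the action space."""

    def __init__(self, params: Dict[str, Any]) -> None:
        self.params = params
        self.last_action: Dict[str, Any] = {}

    def define_action_space(self) -> Dict[str, List[Any]]:
        # Only parameters given as lists are considered tunable.
        return {k: v for k, v in self.params.items() if isinstance(v, list)}

    def reset(self) -> List[float]:
        self.last_action = {}
        return [0.0]

    def step(self, action: Dict[str, Any]) -> Tuple[List[float], float, bool, Dict[str, Any]]:
        # A real environment would launch a job here; this one only records the action.
        self.last_action = action
        return [0.0], 0.0, True, {}
```

In this sketch the agent would query `define_action_space()` once and then pass one concrete choice per parameter to `step()`.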

Input

  • Test TOML with list-valued parameters for DSE support
  • Agent + Environment for configuration

Output

  • Generated srun scripts with static values instead of lists.

Before

...
    --fdl.NUM_GPUS=[1, 8, 16] \
    --fdl.CHECKPOINT_POLICY=['\\"aaaa\\"', '\\"bbb\\"'] \
...

After Agent/Env integration

...
  --fdl.NUM_GPUS=1 \
  --fdl.CHECKPOINT_POLICY=\"aaa\"
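
As a rough sketch of the before/after difference, the snippet below expands list-valued arguments into static combinations and renders one of them as flags. The helpers `iter_static_configs` and `render_flags` are hypothetical, the exhaustive Cartesian product stands in for whatever selection strategy the agent actually uses, and shell quoting of string values is deliberately left out.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def iter_static_configs(args: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one config per combination of list-valued arguments."""
    list_keys = [k for k, v in args.items() if isinstance(v, list)]
    for combo in product(*(args[k] for k in list_keys)):
        static = dict(args)                    # copy scalar values as-is
        static.update(zip(list_keys, combo))   # replace each list with one choice
        yield static


def render_flags(static: Dict[str, Any]) -> List[str]:
    """Render a static config as command-line flags (no shell quoting)."""
    return [f"--{key}={value}" for key, value in static.items()]


# Values mirror the "Before" example; the combination order is an assumption.
args = {"fdl.NUM_GPUS": [1, 8, 16], "fdl.CHECKPOINT_POLICY": ["aaaa", "bbb"]}
configs = list(iter_static_configs(args))
first_flags = render_flags(configs[0])
```

Each yielded config contains only scalar values, which is the property the generated srun scripts are expected to have.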

Test Plan

CI/CD.

  • Compare the generated bash scripts
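
One possible way to automate the script comparison, assuming the expected and generated scripts are available as strings, is a unified diff from the standard library. The function name `diff_scripts` is hypothetical and not part of this PR.

```python
import difflib
from typing import List


def diff_scripts(expected: str, generated: str) -> List[str]:
    """Return unified-diff lines; an empty list means the scripts match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            generated.splitlines(),
            fromfile="expected",
            tofile="generated",
            lineterm="",
        )
    )
```

A test could then assert that `diff_scripts(expected, generated)` is empty and print the diff lines on failure.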


amaslenn and others added 30 commits January 10, 2025 14:21
1. Inherit from ABC if there are @abstractmethods
2. Do not make gen_srun_success_check() abstract; simply return an empty
   string by default. When needed, this method will be overridden.
Some slurm setups do not allow running enroot from the head node. Let's
rely on actual 'enroot import' run via srun and report its real error
message to user.
* Remove conf/common/test/chakra_replay.toml

* Remove conf/common/test_scenario/chakra_replay.toml
@srivatsankrishnan srivatsankrishnan marked this pull request as ready for review January 13, 2025 16:57
3 participants