Do not require enroot on head node (backport #323) #326

Closed
wants to merge 9 commits into from
Revert changes in README
amaslenn committed Jan 8, 2025

commit ca970723563b6ca6eaaf8f1b10317c3abb877c69
33 changes: 0 additions & 33 deletions USER_GUIDE.md
@@ -363,39 +363,6 @@ You can update the fields to adjust the behavior. For example, you can update th
2. Follow the instructions under 'Usage Tips' on how to download the tokenizer.
3. Replace "training.model.tokenizer.model=TOKENIZER_MODEL" with "training.model.tokenizer.model=YOUR_TOKENIZER_PATH" (the tokenizer should be a .model file) in conf/common/test/llama.toml.


## Using Test Hooks in CloudAI

A test hook in CloudAI is a specialized test that runs before or after each main test in a scenario, for example to prepare the environment or clean up resources. A hook is either a pre-test hook or a post-test hook, and the test scenario's TOML file references it through the `pre_test` and `post_test` fields:
```
name = "nccl-test"

pre_test = "nccl_test_pre"
post_test = "nccl_test_post"

[[Tests]]
id = "Tests.1"
test_name = "nccl_test_all_reduce"
num_nodes = "2"
time_limit = "00:20:00"
```

CloudAI organizes hooks in a dedicated directory structure:

- Hook directory: All hook configurations reside in `conf/hook/`.
- Hook test scenarios: Place pre-test hook and post-test hook scenario files in `conf/hook/`.
- Hook tests: Place individual tests referenced in hooks within `conf/hook/test/`.

A pre-test hook runs before the main test, and the main test executes only if the pre-test hook completes successfully. The post-test hook runs after the main test, provided the prior steps succeed. If a pre-test hook fails, the main test and its post-test hook are skipped. A hook scenario such as `nccl_test_pre` is itself a test scenario file:
```
name = "nccl_test_pre"

[[Tests]]
id = "Tests.1"
test_name = "nccl_test_all_reduce"
time_limit = "00:20:00"
```
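
The ordering and skip rules described above can be sketched in Python. This is a minimal illustration under stated assumptions, not CloudAI's implementation: `run_with_hooks` and its arguments are hypothetical names, and each callable stands in for a whole hook scenario or test that returns `True` on success.

```python
from typing import Callable, List, Optional


def run_with_hooks(
    main_test: Callable[[], bool],
    pre_test: Optional[Callable[[], bool]] = None,
    post_test: Optional[Callable[[], bool]] = None,
) -> List[str]:
    """Run one main test with optional hooks (hypothetical helper).

    Returns the names of the stages that were actually started, in order.
    """
    executed: List[str] = []

    # The pre-test hook runs first; a failure skips everything after it.
    if pre_test is not None:
        executed.append("pre_test")
        if not pre_test():
            return executed

    # The main test runs only after a successful (or absent) pre-test hook.
    executed.append("main")
    if not main_test():
        # Assumption: a failed main test also skips the post-test hook,
        # following "provided the prior steps succeed".
        return executed

    # The post-test hook runs last, once the prior steps have succeeded.
    if post_test is not None:
        executed.append("post_test")
        post_test()

    return executed
```

With a failing pre-test hook, for instance, only the `pre_test` stage is started and both the main test and the post-test hook are skipped.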

## Troubleshooting
This section guides you through identifying the root cause of an issue and determining whether it stems from system infrastructure or from a bug in CloudAI. Follow USER_GUIDE.md and README.md closely for installation and for adding test templates, tests, and test scenarios.