
Llama #23

Merged
merged 3 commits into from
Jun 4, 2024
Conversation

jeffnvidia
Contributor

@jeffnvidia jeffnvidia commented May 19, 2024

Summary

Based on PR 20:
Enable running the Llama tests
Create the Llama TOML file and adapt the NeMo generation command

There is a FIXME inside this PR (in the Llama.toml file):

FIXME: ~training.model.position_embedding_type was added to the extra_cmd_args in order to work around a bug in the NeMo repository (https://github.com/NVIDIA/NeMo).
The commit that should fix this issue in NeMo is 5b296e8af832c67d361fdfb80a165db3affaf76a.
Once a new release of NeMoLauncher includes this commit (verify by downloading the corresponding container and looking inside /opt for this commit), ~training.model.position_embedding_type should be removed from the extra args.
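For reference, the workaround might look roughly like the fragment below. This is a hedged sketch based only on the description above: the section name and surrounding keys are assumptions, not the actual contents of the Llama.toml in this PR.

```toml
# Hypothetical sketch of the workaround described above; the real
# llama.toml in this PR may use different section and key names.
[cmd_args]
# ... other NeMo launcher arguments ...

# FIXME: the leading "~" is Hydra's delete-key syntax. Remove this
# override once the NeMoLauncher container ships NeMo commit
# 5b296e8af832c67d361fdfb80a165db3affaf76a.
extra_cmd_args = "~training.model.position_embedding_type"
```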

Test Plan

Test by @amaslenn: CI
Test by @jeffnvidia: Slurm command generation
$ python ./cloudaix.py --mode run --system_config_path conf/v0.6/general/system/... --test_scenario_path conf/v0.6/general/test_scenario/llama/llama.toml

Additional Notes

Llama 16 nodes alone (3, 5, 8): /auto/mtrsysgwork/jmahou/git/asap_cloudai/results/2024-05-01_12-17-26

(screenshot of results omitted)

cmd: $ python ./cloudaix.py --mode run --system_config_path conf/v0.6/general/system/... --test_scenario_path conf/v0.6/general/test_scenario/llama/llama.toml
Llama 16 nodes (3, 5, 8) + bisection noise on 72 nodes: /auto/mtrsysgwork/jmahou/git/asap_cloudai/results/2024-05-01_13-42-47

(screenshot of results omitted)

cmd: $ python ./cloudaix.py --mode run --system_config_path conf/v0.6/general/system/... --test_scenario_path conf/v0.6/general/test_scenario/llama/llama_with_noise.toml

@jeffnvidia jeffnvidia force-pushed the Llama branch 3 times, most recently from a0be834 to 56f7c46 Compare May 29, 2024 14:35
@jeffnvidia jeffnvidia force-pushed the Llama branch 2 times, most recently from ff437f8 to 5d7761a Compare June 3, 2024 13:14
@TaekyungHeo TaekyungHeo merged commit 121bab7 into NVIDIA:main Jun 4, 2024
2 checks passed
@jeffnvidia jeffnvidia deleted the Llama branch July 30, 2024 14:32