Implement NcclTestJobStatusRetrievalStrategy and add corresponding tests #53
In general, everything is OK and compliant with our design.
But IMO we overuse the Strategy pattern. For example, in this case we could put get_job_status(path) into the particular test implementation itself (src/cloudai/schema/test_template/nccl_test/template.py):
- By default, TestTemplate has a no-op implementation that returns OK.
- Each test overrides it if needed, because only the test knows how to determine its own status.
- We won't need registrations or the assertion that checks whether such an implementation exists. The assertion can be especially painful because a missing implementation might be noticed only while running tests. So we might even consider making it an abstract method to highlight its importance.
I'm not saying we should change it right now, but I want to discuss it and better understand why you designed it this way.
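To make the suggestion above concrete, here is a minimal sketch of what it could look like. It is not the project's actual code: the `JobStatusResult` type, the `stdout.txt` file name, and the `# Avg bus bandwidth` marker are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class JobStatusResult:
    """Hypothetical result type; the real project may use something else."""

    is_successful: bool
    error_message: str = ""


class TestTemplate:
    def get_job_status(self, output_path: str) -> JobStatusResult:
        # Default no-op: report OK unless a concrete test overrides this.
        return JobStatusResult(is_successful=True)


class NcclTest(TestTemplate):
    def get_job_status(self, output_path: str) -> JobStatusResult:
        # Assumed convention: the job writes its output to stdout.txt and
        # a successful run contains a bandwidth summary line.
        stdout = Path(output_path) / "stdout.txt"
        if not stdout.is_file():
            return JobStatusResult(False, f"missing {stdout}")
        if "# Avg bus bandwidth" in stdout.read_text():
            return JobStatusResult(True)
        return JobStatusResult(False, "no bandwidth summary in stdout.txt")
```

With this shape there is no registry to consult: calling `get_job_status` on any test template always works, and only tests that care about their output override the default.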
Thanks, @amaslenn. This PR actually depends on #46, so I assume these comments are for that PR. I agree that we may overuse the Strategy pattern. However, I would like to keep the current design for future use cases. Currently, it is used on a Slurm system where the path is given directly. This may change, though, and we may need a different implementation for each scheduler. For example, at Meta we cannot access the file system directly because it uses an internal remote file system called manifolds and buckets, so simply providing an output path may not work. Moreover, the job status may need to be retrieved through a special library or interface, because they have an internal scheduler and internal interfaces. For these reasons, I am still actively using the Strategy pattern.
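A rough sketch of the argument above, assuming hypothetical class names and a caller-supplied `fetch` function standing in for whatever internal storage client a site provides (none of these names come from the PR itself):

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Optional


class JobStatusRetrievalStrategy(ABC):
    """Hypothetical interface: how a job's status is obtained is pluggable."""

    @abstractmethod
    def get_job_status(self, job_ref: str) -> bool:
        ...


class LocalPathStrategy(JobStatusRetrievalStrategy):
    """Slurm-like case: job_ref is a directory on a reachable file system."""

    def get_job_status(self, job_ref: str) -> bool:
        stdout = Path(job_ref) / "stdout.txt"  # assumed file name
        return stdout.is_file() and "# Avg bus bandwidth" in stdout.read_text()


class RemoteStoreStrategy(JobStatusRetrievalStrategy):
    """Case without direct file access: output is fetched via a client."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        # fetch is a stand-in for an internal storage or scheduler API.
        self._fetch = fetch

    def get_job_status(self, job_ref: str) -> bool:
        content = self._fetch(job_ref)
        return content is not None and "# Avg bus bandwidth" in content
```

The test-specific check (here, looking for the bandwidth summary) stays the same, while the way the output is reached varies per system, which is the separation the Strategy pattern is meant to keep.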
Summary
Implement NcclTestJobStatusRetrievalStrategy and add corresponding tests
Test Plan