-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ZBVZeroBubble error #774
Comments
if i replace 'loop' to 'v' style:
|
Can you include your full repro instructions? (at least: whole toml file, command-line, branch/commit of torchtitan and pytorch being used) Just sharing the pipeline-specific snippet from the config isn't enough for me to run and reproduce your issue. |
@wconstab I think this feature is missing a proper interface in torchtitan. Here's what @H-Huang told me: ZBVSchedule requires that the stages on each rank to be ordered differently (v-formation). In titan we do interleaved by default so something like: Rank 0: 0, 4, 8 For allocating the stages in a V shape, it requires something like: Rank 0: 0 7 8 Did you mean to do InterleavedZeroBubSchedule instead? Or were you trying out the ZBVschedule? ZBV is new and I still haven't added a test for ZBV in titan and need to figure out a UX friendly way to do it. |
Compared to its importance, I hope there is a way to address #773, as ZeroBubble TGS is quite slow, which renders it ineffective. |
I found that the |
world-size is 8
The text was updated successfully, but these errors were encountered: