Add option to support garbage collection after torch compilation #2559

Open
qiurc wants to merge 2 commits into main from export-D67062158

@qiurc qiurc commented Dec 13, 2024

Summary:
X-link: pytorch/pytorch#142821

This diff extends ezyang's PR https://fburl.com/6uvvzb4f, which adds a gc pass after torch compilation finishes.
The gc pass is guarded by the JK pytorch/compiler:enable_run_gc_after_compile.
The time cost of the gc pass is logged to the dynamo_compile scuba table.

This diff extends the PR to:

  • Collect generation 1 instead of generation 2 (the default), which greatly reduced the gc latency overhead from 160 sec per rank to 10 sec per rank.
  • Introduce an environment variable that takes priority over the JK in controlling whether gc runs after torch compilation (gc is enabled by default). This environment variable will be used for A/B testing of training jobs to compare pt2 compilation time and memory cost.
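
For readers without access to the internal diff, here is a minimal sketch of the behaviour described above. It is not the code in this PR. The variable name `EXAMPLE_RUN_GC_AFTER_COMPILE`, the helper names, and the accepted "off" values are hypothetical, and it assumes one possible reading of the priority rule: when the environment variable is set it decides, otherwise the JK-style flag decides.

```python
import gc
import os
import time
from typing import Mapping, Optional

# Hypothetical name for illustration; the PR text does not state the real one.
ENV_VAR = "EXAMPLE_RUN_GC_AFTER_COMPILE"


def should_run_gc(flag_enabled: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if gc should run; a set env var overrides the flag."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        # Assumption: with no override present, the flag decides.
        return flag_enabled
    return raw.strip().lower() not in ("", "0", "false", "no", "off")


def gc_after_compile(flag_enabled: bool, env: Optional[Mapping[str, str]] = None) -> Optional[float]:
    """Run a generation-1 collection and return its duration in seconds.

    Returns None when the collection is skipped.
    """
    if not should_run_gc(flag_enabled, env):
        return None
    start = time.perf_counter()
    # Request a generation-1 collection instead of the default full
    # (generation-2) collection that a bare gc.collect() performs.
    gc.collect(1)
    return time.perf_counter() - start
```

A caller would invoke `gc_after_compile(...)` once compilation finishes and log the returned duration; a `None` result means the pass was skipped.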

Reviewed By: ezyang

Differential Revision: D67062158

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D67062158

qiurc added a commit to qiurc/benchmark that referenced this pull request Dec 14, 2024
…orch#2559)

Summary:

X-link: pytorch/pytorch#142821

This diff extends ezyang's PR https://fburl.com/6uvvzb4f, which adds a gc pass after torch compilation finishes.
The gc pass is guarded by the JK pytorch/compiler:enable_run_gc_after_compile.
The time cost of the gc pass is logged to the dynamo_compile scuba table.

This diff extends the PR to: 
  - Collect generation 1 instead of generation 2 (the default), which greatly reduced the gc latency overhead from 160 sec per rank to 10 sec per rank.
  - Introduce an environment variable that takes priority over the JK in controlling whether gc runs after torch compilation (gc is enabled by default). This environment variable will be used for A/B testing of training jobs to compare pt2 compilation time and memory cost.

Reviewed By: ezyang, yuxihu

Differential Revision: D67062158

@qiurc qiurc force-pushed the export-D67062158 branch 2 times, most recently from d082407 to 29f6ce7 Compare December 14, 2024 03:01

masnesral and others added 2 commits December 15, 2024 17:57
Summary:
See test plan in internal diff [D66679369](https://our.internmc.facebook.com/intern/diff/D66679369)

X-link: pytorch/pytorch#141919
Approved by: https://github.com/jamesjwu, https://github.com/ezyang

Differential Revision: D67218561

Pulled By: masnesral
Differential Revision: D67226982

3 participants