Add option to support garbage collection after torch compilation #2559

Open

qiurc wants to merge 2 commits into pytorch:main from qiurc:export-D67062158

Contributor

qiurc commented Dec 13, 2024

Summary:
X-link: pytorch/pytorch#142821

This diff extends ezyang's PR https://fburl.com/6uvvzb4f, which runs garbage collection (gc) after torch compilation finishes.
In that PR, the gc operation is guarded by the JK pytorch/compiler:enable_run_gc_after_compile.
The time cost of the gc operation is logged to the dynamo_compile scuba table.

This diff extends that PR to:

- Run garbage collection on generation 1 instead of generation 2 (the default), which reduced the gc latency overhead from 160 sec per rank to 10 sec per rank.
- Introduce an environment variable that takes priority over the JK in controlling whether gc runs after torch compilation (default: gc enabled). This environment variable will be used for A/B testing of training jobs to compare PT2 compilation time and memory cost.
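
For illustration only, a minimal sketch of the two behaviors described above, not the code in this diff. The environment variable name `EXAMPLE_RUN_GC_AFTER_COMPILE`, the helper names, and the `jk_enabled` argument (standing in for the JK value) are all hypothetical. It also assumes the JK value is the fallback when the variable is unset; the summary does not spell that out.

```python
import gc
import os
import time
from typing import Mapping, Optional

# Hypothetical variable name; the PR summary does not state the real one.
ENV_VAR = "EXAMPLE_RUN_GC_AFTER_COMPILE"


def should_run_gc(jk_enabled: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether to run gc; the environment variable wins over the JK."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    if raw is None:
        # Variable not set: fall back to the JK value (assumed behavior).
        return jk_enabled
    return raw.strip().lower() not in ("", "0", "false", "no", "off")


def gc_after_compile(
    jk_enabled: bool = True, env: Optional[Mapping[str, str]] = None
) -> Optional[float]:
    """Run a generation-1 collection and return the elapsed seconds.

    Returns None when gc is disabled. gc.collect(1) collects generations
    0 and 1 only, whereas gc.collect() defaults to a full generation-2 pass.
    """
    if not should_run_gc(jk_enabled, env):
        return None
    start = time.perf_counter()
    gc.collect(1)
    return time.perf_counter() - start
```

The returned duration is the kind of value that would be logged alongside the other compile metrics. Passing `env` explicitly keeps the decision logic testable without touching the real process environment.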

Reviewed By: ezyang

Differential Revision: D67062158

facebook-github-bot added the cla signed label

Contributor

facebook-github-bot commented Dec 13, 2024

This pull request was exported from Phabricator. Differential Revision: D67062158

facebook-github-bot added the fb-exported label

qiurc added a commit to qiurc/benchmark that referenced this pull request


Add option to support garbage collection after torch compilation (pytorch#2559)

95f500a

Reviewed By: ezyang, yuxihu

Differential Revision: D67062158

qiurc force-pushed the export-D67062158 branch from b5754ce to 95f500a Compare

December 14, 2024 00:13

qiurc force-pushed the export-D67062158 branch from 95f500a to 68829a6 Compare

December 14, 2024 00:36

qiurc force-pushed the export-D67062158 branch 2 times, most recently from d082407 to 29f6ce7 Compare

December 14, 2024 03:01

masnesral and others added 2 commits

December 15, 2024 17:57


          Log runtime autotuning timing to scuba (#141919)

409d280

Summary:
See test plan in internal diff [D66679369](https://our.internmc.facebook.com/intern/diff/D66679369)

X-link: pytorch/pytorch#141919
Approved by: https://github.com/jamesjwu, https://github.com/ezyang

Differential Revision: D67218561

Pulled By: masnesral


          init

Differential Revision: D67226982

qiurc force-pushed the export-D67062158 branch from 29f6ce7 to 0489041 Compare

December 16, 2024 02:06

qiurc had a problem deploying to docker-s3-upload with GitHub Actions (Failure)

December 16, 2024 02:06

qiurc temporarily deployed to docker-s3-upload with GitHub Actions (Inactive)

December 16, 2024 02:06

Labels

cla signed fb-exported