Add thunder benchmarks #3394
base: main
Changes from all commits: 85c5328, 219f8a3, 29dd738, 68454ee, 4fe7ad3, b7e0b3e, 7821174, 615c0e0, dd302c9, 083d89d
@@ -14,7 +14,6 @@
 import thunder
 from thunder.executors.nvfuserex import nvfuserex

 # These variables can be overwritten through CLI commands
 # --benchmark-rounds=rounds --benchmark-warmup-rounds=warmup_rounds
 # --benchmark-num-inputs=num_inputs
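As a hedged sketch (not code from this PR), one common way to expose such overrides is a pytest_addoption hook in conftest.py. The flag names below match the comment in the hunk; the defaults and help strings are assumptions.

# Sketch only: assumes the benchmarks are collected by pytest.
def pytest_addoption(parser):
    parser.addoption("--benchmark-rounds", type=int, default=10,
                     help="Number of measured rounds per benchmark.")
    parser.addoption("--benchmark-warmup-rounds", type=int, default=1,
                     help="Number of warmup rounds before measurement.")
    parser.addoption("--benchmark-num-inputs", type=int, default=None,
                     help="Optional cap on the number of generated input configurations.")

Inside a fixture or test, such a value would then be read with request.config.getoption("--benchmark-rounds").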
@@ -23,6 +22,8 @@
 L2_CACHE_SIZE = DEVICE_PROPERTIES["gpu_l2_bytes"]
 PEAK_BANDWIDTH_GBPS = DEVICE_PROPERTIES["gpu_peak_bandwidth_gbps"]

+DEFAULT_EXECUTORS = ["eager", "torchcompile", "thunder"]
+

 def clear_l2_cache() -> None:
     """
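To make the role of DEFAULT_EXECUTORS concrete, here is a hedged sketch of how such a list is typically consumed in pytest-parametrized benchmarks. The test name, the benchmark fixture, and the use of torch.compile / thunder.jit as entry points are assumptions, not code from this PR.

import pytest
import torch

import thunder

DEFAULT_EXECUTORS = ["eager", "torchcompile", "thunder"]  # as added in the hunk above

# Hypothetical benchmark: each executor becomes its own case,
# e.g. test_example_fwd[eager], test_example_fwd[thunder].
@pytest.mark.parametrize("executor", DEFAULT_EXECUTORS)
def test_example_fwd(benchmark, executor):
    def fwd(x):
        return torch.nn.functional.softmax(x, dim=-1)

    if executor == "torchcompile":
        fn = torch.compile(fwd)
    elif executor == "thunder":
        fn = thunder.jit(fwd)  # assumption: thunder.jit is the compile entry point
    else:  # "eager"
        fn = fwd

    x = torch.randn(1024, 1024, device="cuda")
    benchmark(fn, x)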
@@ -44,7 +45,8 @@ def clear_dynamo_cache() -> None:


-# Backward function for torch baseline benchmarks.
-def unary_bwd_torch(inputs: List):  # [output, grad_out]
+# The first two inputs are expected to be out and grad_out. The remaining are inputs of the forward pass used to clear grad between subsequent runs to avoid grad accumulation. See setup() in run_benchmark().
+def unary_bwd_torch(inputs: List):  # [output, grad_out, fwd_inputs]
     inputs[0].backward(inputs[1], retain_graph=True)
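As a hedged illustration of the new comment, the sketch below shows where the [output, grad_out, fwd_inputs] layout comes from. The driver code around unary_bwd_torch is hypothetical; only the function body matches the diff.

from typing import List

import torch


def unary_bwd_torch(inputs: List):  # [output, grad_out, fwd_inputs]
    # Replay backward on a pre-computed forward output; retain_graph=True lets
    # the same autograd graph be reused across benchmark rounds.
    inputs[0].backward(inputs[1], retain_graph=True)


# Hypothetical driver: run the forward pass once, then benchmark only backward.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 1024, device=device, requires_grad=True)
out = torch.nn.functional.softmax(x, dim=-1)  # forward pass, executed once
grad_out = torch.randn_like(out)              # incoming gradient for backward

# The trailing forward inputs are carried along only so that setup() in
# run_benchmark() can reset their .grad between rounds and avoid accumulation.
bwd_inputs = [out, grad_out, x]
unary_bwd_torch(bwd_inputs)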
@@ -329,6 +331,9 @@ def run_benchmark(
     def setup():
         clear_l2_cache()
         if device == "cuda":
+            for inp in inputs:
+                if isinstance(inp, torch.Tensor):
+                    inp.grad = None
         return [inputs], {}

     # Device = 'host'

Review discussion on the added gradient clearing:

Thank you for this one. But this only covers the case where an input requires gradient. Are we also clearing gradients on parameters?

Do you mean, for instance, the weights in layernorm? Then, yes.

I'm curious how this works in code, if e.g. something like this

Ahh, you're right.

Got'ya, no worries. I'm not totally clear on what the protocol in thunder is for ownership of parameters; I think it's supposed to be a functional compilation.

BTW, this could also contribute to a potential performance diff: if there are parameters requiring grad, thunder will generate a backward graph and save intermediates, regardless of whether backward is ever called.
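On the question above about parameter gradients, here is a hedged sketch of how the setup logic could additionally reset .grad on module parameters, assuming the benchmark holds a reference to an nn.Module. The clear_grads helper and its model argument are hypothetical; the diff above only walks the flat inputs list.

from typing import List, Optional

import torch


def clear_grads(inputs: List, model: Optional[torch.nn.Module] = None) -> None:
    # Reset gradients on tensor inputs, mirroring the loop added in this PR.
    for inp in inputs:
        if isinstance(inp, torch.Tensor):
            inp.grad = None
    # Assumption: if the benchmark owns a module, its parameters can be cleared
    # the same way (equivalent to model.zero_grad(set_to_none=True)).
    if model is not None:
        for p in model.parameters():
            p.grad = None

Regarding the last comment, one hedged way to avoid that overhead in forward-only measurements would be to call p.requires_grad_(False) on the parameters before compiling, so no backward graph or saved intermediates are produced at all.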
Review comment on DEFAULT_EXECUTORS:

Maybe this should be named differently, since these are not run in nightly; for most benchmarks, these are the set of executors we execute weekly. We also have thunder-torchcompile for RoPE. Maybe BASELINE_EXECUTORS is better, although Thunder is not really a baseline.