[Refactor] Use default device instead of CPU in losses #2687

Merged (9 commits) on Jan 16, 2025

vmoens (Contributor) commented Jan 10, 2025

[ghstack-poisoned]

pytorch-bot bot commented Jan 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2687

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 34 Pending

As of commit d2585fa with merge base dc25a55:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the "CLA Signed" label Jan 10, 2025
vmoens added a commit that referenced this pull request Jan 10, 2025
ghstack-source-id: 52a013a04a763bdb8c1c77a43a0984babe32bd77
Pull Request resolved: #2687

github-actions bot commented Jan 10, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}7$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.5449s 0.4462s 2.2411 Ops/s 2.2695 Ops/s $\color{#d91a1a}-1.25\%$
test_transformed 0.7162s 0.6290s 1.5898 Ops/s 1.6078 Ops/s $\color{#d91a1a}-1.12\%$
test_serial 1.4327s 1.3529s 0.7392 Ops/s 0.7229 Ops/s $\color{#35bf28}+2.25\%$
test_parallel 1.2824s 1.1915s 0.8393 Ops/s 0.8071 Ops/s $\color{#35bf28}+3.99\%$
test_step_mdp_speed[True-True-True-True-True] 0.1641ms 29.6184μs 33.7628 KOps/s 32.9342 KOps/s $\color{#35bf28}+2.52\%$
test_step_mdp_speed[True-True-True-True-False] 59.4210μs 17.5461μs 56.9926 KOps/s 55.1711 KOps/s $\color{#35bf28}+3.30\%$
test_step_mdp_speed[True-True-True-False-True] 44.5940μs 16.8872μs 59.2166 KOps/s 58.0549 KOps/s $\color{#35bf28}+2.00\%$
test_step_mdp_speed[True-True-True-False-False] 34.2940μs 9.9000μs 101.0102 KOps/s 98.6145 KOps/s $\color{#35bf28}+2.43\%$
test_step_mdp_speed[True-True-False-True-True] 84.5550μs 31.7122μs 31.5336 KOps/s 30.5849 KOps/s $\color{#35bf28}+3.10\%$
test_step_mdp_speed[True-True-False-True-False] 72.3760μs 19.4377μs 51.4465 KOps/s 49.8852 KOps/s $\color{#35bf28}+3.13\%$
test_step_mdp_speed[True-True-False-False-True] 58.6390μs 18.8993μs 52.9120 KOps/s 52.0349 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[True-True-False-False-False] 61.4980μs 11.7020μs 85.4551 KOps/s 82.7234 KOps/s $\color{#35bf28}+3.30\%$
test_step_mdp_speed[True-False-True-True-True] 75.9220μs 33.5227μs 29.8305 KOps/s 29.1589 KOps/s $\color{#35bf28}+2.30\%$
test_step_mdp_speed[True-False-True-True-False] 57.7580μs 21.4148μs 46.6966 KOps/s 45.8499 KOps/s $\color{#35bf28}+1.85\%$
test_step_mdp_speed[True-False-True-False-True] 51.7360μs 18.9352μs 52.8117 KOps/s 52.1228 KOps/s $\color{#35bf28}+1.32\%$
test_step_mdp_speed[True-False-True-False-False] 58.0380μs 11.7600μs 85.0337 KOps/s 83.0965 KOps/s $\color{#35bf28}+2.33\%$
test_step_mdp_speed[True-False-False-True-True] 94.3890μs 35.5710μs 28.1128 KOps/s 27.3731 KOps/s $\color{#35bf28}+2.70\%$
test_step_mdp_speed[True-False-False-True-False] 78.2740μs 23.2588μs 42.9944 KOps/s 42.2223 KOps/s $\color{#35bf28}+1.83\%$
test_step_mdp_speed[True-False-False-False-True] 82.1240μs 20.5296μs 48.7102 KOps/s 47.7719 KOps/s $\color{#35bf28}+1.96\%$
test_step_mdp_speed[True-False-False-False-False] 42.0980μs 13.6001μs 73.5288 KOps/s 72.3067 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-True-True-True-True] 88.1250μs 33.9483μs 29.4565 KOps/s 28.8886 KOps/s $\color{#35bf28}+1.97\%$
test_step_mdp_speed[False-True-True-True-False] 69.5270μs 21.4347μs 46.6533 KOps/s 45.5806 KOps/s $\color{#35bf28}+2.35\%$
test_step_mdp_speed[False-True-True-False-True] 50.5650μs 21.5511μs 46.4014 KOps/s 45.7537 KOps/s $\color{#35bf28}+1.42\%$
test_step_mdp_speed[False-True-True-False-False] 86.6220μs 13.5508μs 73.7964 KOps/s 73.4715 KOps/s $\color{#35bf28}+0.44\%$
test_step_mdp_speed[False-True-False-True-True] 73.1260μs 35.0468μs 28.5332 KOps/s 27.6822 KOps/s $\color{#35bf28}+3.07\%$
test_step_mdp_speed[False-True-False-True-False] 61.9860μs 22.9660μs 43.5426 KOps/s 42.4040 KOps/s $\color{#35bf28}+2.69\%$
test_step_mdp_speed[False-True-False-False-True] 2.5836ms 23.3449μs 42.8359 KOps/s 42.0084 KOps/s $\color{#35bf28}+1.97\%$
test_step_mdp_speed[False-True-False-False-False] 60.9920μs 14.7085μs 67.9881 KOps/s 65.4141 KOps/s $\color{#35bf28}+3.93\%$
test_step_mdp_speed[False-False-True-True-True] 0.1086ms 37.1754μs 26.8995 KOps/s 26.1703 KOps/s $\color{#35bf28}+2.79\%$
test_step_mdp_speed[False-False-True-True-False] 60.6140μs 25.0727μs 39.8840 KOps/s 38.9669 KOps/s $\color{#35bf28}+2.35\%$
test_step_mdp_speed[False-False-True-False-True] 56.4160μs 22.9903μs 43.4966 KOps/s 42.3457 KOps/s $\color{#35bf28}+2.72\%$
test_step_mdp_speed[False-False-True-False-False] 57.6480μs 14.9199μs 67.0247 KOps/s 65.4182 KOps/s $\color{#35bf28}+2.46\%$
test_step_mdp_speed[False-False-False-True-True] 90.4090μs 39.0149μs 25.6312 KOps/s 25.4010 KOps/s $\color{#35bf28}+0.91\%$
test_step_mdp_speed[False-False-False-True-False] 58.1890μs 26.5278μs 37.6963 KOps/s 36.3460 KOps/s $\color{#35bf28}+3.72\%$
test_step_mdp_speed[False-False-False-False-True] 56.6770μs 24.5440μs 40.7432 KOps/s 39.6198 KOps/s $\color{#35bf28}+2.84\%$
test_step_mdp_speed[False-False-False-False-False] 62.7380μs 16.4677μs 60.7251 KOps/s 59.1985 KOps/s $\color{#35bf28}+2.58\%$
test_values[generalized_advantage_estimate-True-True] 10.0653ms 9.7563ms 102.4974 Ops/s 103.5871 Ops/s $\color{#d91a1a}-1.05\%$
test_values[vec_generalized_advantage_estimate-True-True] 38.9686ms 34.0285ms 29.3871 Ops/s 28.8895 Ops/s $\color{#35bf28}+1.72\%$
test_values[td0_return_estimate-False-False] 0.2348ms 0.1834ms 5.4522 KOps/s 5.2858 KOps/s $\color{#35bf28}+3.15\%$
test_values[td1_return_estimate-False-False] 28.1620ms 24.2369ms 41.2595 Ops/s 40.6802 Ops/s $\color{#35bf28}+1.42\%$
test_values[vec_td1_return_estimate-False-False] 35.9614ms 33.4967ms 29.8537 Ops/s 29.4865 Ops/s $\color{#35bf28}+1.25\%$
test_values[td_lambda_return_estimate-True-False] 38.5881ms 34.1817ms 29.2554 Ops/s 29.1017 Ops/s $\color{#35bf28}+0.53\%$
test_values[vec_td_lambda_return_estimate-True-False] 39.2012ms 33.7342ms 29.6435 Ops/s 29.4049 Ops/s $\color{#35bf28}+0.81\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 9.6990ms 8.4947ms 117.7207 Ops/s 118.8821 Ops/s $\color{#d91a1a}-0.98\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.3723ms 1.8356ms 544.7827 Ops/s 539.3626 Ops/s $\color{#35bf28}+1.00\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4398ms 0.3575ms 2.7971 KOps/s 2.7606 KOps/s $\color{#35bf28}+1.32\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 43.2593ms 42.2500ms 23.6686 Ops/s 23.0428 Ops/s $\color{#35bf28}+2.72\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.9713ms 3.0593ms 326.8719 Ops/s 324.6215 Ops/s $\color{#35bf28}+0.69\%$
test_dqn_speed[False-None] 2.2456ms 1.3959ms 716.3735 Ops/s 700.9315 Ops/s $\color{#35bf28}+2.20\%$
test_dqn_speed[False-backward] 1.9956ms 1.8991ms 526.5579 Ops/s 515.2744 Ops/s $\color{#35bf28}+2.19\%$
test_dqn_speed[True-None] 0.6768ms 0.4832ms 2.0696 KOps/s 1.9905 KOps/s $\color{#35bf28}+3.97\%$
test_dqn_speed[True-backward] 0.9418ms 0.8959ms 1.1161 KOps/s 1.0660 KOps/s $\color{#35bf28}+4.70\%$
test_dqn_speed[reduce-overhead-None] 0.8773ms 0.4958ms 2.0168 KOps/s 2.0202 KOps/s $\color{#d91a1a}-0.17\%$
test_dqn_speed[reduce-overhead-backward] 0.9910ms 0.9221ms 1.0845 KOps/s 1.0845 KOps/s $+0.00\%$
test_ddpg_speed[False-None] 4.3689ms 2.9210ms 342.3455 Ops/s 337.8407 Ops/s $\color{#35bf28}+1.33\%$
test_ddpg_speed[False-backward] 5.2154ms 4.0708ms 245.6496 Ops/s 239.2376 Ops/s $\color{#35bf28}+2.68\%$
test_ddpg_speed[True-None] 1.2722ms 1.0201ms 980.2994 Ops/s 971.8816 Ops/s $\color{#35bf28}+0.87\%$
test_ddpg_speed[True-backward] 2.1691ms 1.9315ms 517.7421 Ops/s 431.0656 Ops/s $\textbf{\color{#35bf28}+20.11\%}$
test_ddpg_speed[reduce-overhead-None] 1.4028ms 1.0151ms 985.1490 Ops/s 972.5548 Ops/s $\color{#35bf28}+1.29\%$
test_ddpg_speed[reduce-overhead-backward] 2.1147ms 1.9321ms 517.5693 Ops/s 520.9489 Ops/s $\color{#d91a1a}-0.65\%$
test_sac_speed[False-None] 9.8495ms 8.0654ms 123.9868 Ops/s 118.5384 Ops/s $\color{#35bf28}+4.60\%$
test_sac_speed[False-backward] 11.9276ms 10.8851ms 91.8685 Ops/s 91.3487 Ops/s $\color{#35bf28}+0.57\%$
test_sac_speed[True-None] 2.0792ms 1.8300ms 546.4464 Ops/s 543.6985 Ops/s $\color{#35bf28}+0.51\%$
test_sac_speed[True-backward] 3.5561ms 3.4986ms 285.8315 Ops/s 282.1465 Ops/s $\color{#35bf28}+1.31\%$
test_sac_speed[reduce-overhead-None] 2.4002ms 1.8531ms 539.6332 Ops/s 540.3104 Ops/s $\color{#d91a1a}-0.13\%$
test_sac_speed[reduce-overhead-backward] 3.6738ms 3.5845ms 278.9805 Ops/s 283.6920 Ops/s $\color{#d91a1a}-1.66\%$
test_redq_speed[False-None] 15.0426ms 13.2588ms 75.4218 Ops/s 76.9406 Ops/s $\color{#d91a1a}-1.97\%$
test_redq_speed[False-backward] 36.9157ms 23.2693ms 42.9751 Ops/s 44.4215 Ops/s $\color{#d91a1a}-3.26\%$
test_redq_speed[True-None] 5.7317ms 4.5915ms 217.7936 Ops/s 214.2645 Ops/s $\color{#35bf28}+1.65\%$
test_redq_speed[True-backward] 12.6088ms 11.9555ms 83.6433 Ops/s 80.3001 Ops/s $\color{#35bf28}+4.16\%$
test_redq_speed[reduce-overhead-None] 5.9455ms 4.7125ms 212.2038 Ops/s 221.8307 Ops/s $\color{#d91a1a}-4.34\%$
test_redq_speed[reduce-overhead-backward] 13.3775ms 12.4273ms 80.4677 Ops/s 80.4478 Ops/s $\color{#35bf28}+0.02\%$
test_redq_deprec_speed[False-None] 15.5299ms 13.0149ms 76.8349 Ops/s 73.8884 Ops/s $\color{#35bf28}+3.99\%$
test_redq_deprec_speed[False-backward] 22.1376ms 19.3023ms 51.8072 Ops/s 52.5585 Ops/s $\color{#d91a1a}-1.43\%$
test_redq_deprec_speed[True-None] 4.5331ms 3.8246ms 261.4657 Ops/s 271.8345 Ops/s $\color{#d91a1a}-3.81\%$
test_redq_deprec_speed[True-backward] 10.1679ms 8.4113ms 118.8880 Ops/s 119.7198 Ops/s $\color{#d91a1a}-0.69\%$
test_redq_deprec_speed[reduce-overhead-None] 4.2290ms 3.6267ms 275.7314 Ops/s 275.3517 Ops/s $\color{#35bf28}+0.14\%$
test_redq_deprec_speed[reduce-overhead-backward] 9.6297ms 8.4065ms 118.9556 Ops/s 114.0677 Ops/s $\color{#35bf28}+4.29\%$
test_td3_speed[False-None] 10.1743ms 8.1276ms 123.0372 Ops/s 119.0418 Ops/s $\color{#35bf28}+3.36\%$
test_td3_speed[False-backward] 11.8111ms 10.5457ms 94.8258 Ops/s 92.1138 Ops/s $\color{#35bf28}+2.94\%$
test_td3_speed[True-None] 2.2938ms 1.7530ms 570.4453 Ops/s 566.7701 Ops/s $\color{#35bf28}+0.65\%$
test_td3_speed[True-backward] 3.5453ms 3.3828ms 295.6121 Ops/s 297.1393 Ops/s $\color{#d91a1a}-0.51\%$
test_td3_speed[reduce-overhead-None] 2.1678ms 1.7467ms 572.5220 Ops/s 557.9931 Ops/s $\color{#35bf28}+2.60\%$
test_td3_speed[reduce-overhead-backward] 3.4614ms 3.3588ms 297.7248 Ops/s 292.5916 Ops/s $\color{#35bf28}+1.75\%$
test_cql_speed[False-None] 40.9920ms 38.3304ms 26.0889 Ops/s 26.8939 Ops/s $\color{#d91a1a}-2.99\%$
test_cql_speed[False-backward] 49.0816ms 46.7830ms 21.3753 Ops/s 20.8553 Ops/s $\color{#35bf28}+2.49\%$
test_cql_speed[True-None] 18.4026ms 15.8359ms 63.1478 Ops/s 61.9942 Ops/s $\color{#35bf28}+1.86\%$
test_cql_speed[True-backward] 23.9803ms 22.6391ms 44.1714 Ops/s 42.6980 Ops/s $\color{#35bf28}+3.45\%$
test_cql_speed[reduce-overhead-None] 18.0185ms 16.1564ms 61.8949 Ops/s 62.0727 Ops/s $\color{#d91a1a}-0.29\%$
test_cql_speed[reduce-overhead-backward] 25.0107ms 23.1992ms 43.1049 Ops/s 44.9345 Ops/s $\color{#d91a1a}-4.07\%$
test_a2c_speed[False-None] 8.7816ms 7.3664ms 135.7516 Ops/s 137.4624 Ops/s $\color{#d91a1a}-1.24\%$
test_a2c_speed[False-backward] 16.0252ms 14.4083ms 69.4043 Ops/s 70.1166 Ops/s $\color{#d91a1a}-1.02\%$
test_a2c_speed[True-None] 4.5496ms 4.2076ms 237.6629 Ops/s 236.1046 Ops/s $\color{#35bf28}+0.66\%$
test_a2c_speed[True-backward] 12.9788ms 10.8473ms 92.1889 Ops/s 93.8060 Ops/s $\color{#d91a1a}-1.72\%$
test_a2c_speed[reduce-overhead-None] 5.6528ms 4.3456ms 230.1182 Ops/s 235.8193 Ops/s $\color{#d91a1a}-2.42\%$
test_a2c_speed[reduce-overhead-backward] 12.1299ms 10.6973ms 93.4819 Ops/s 94.1697 Ops/s $\color{#d91a1a}-0.73\%$
test_ppo_speed[False-None] 9.3512ms 7.6351ms 130.9736 Ops/s 132.7197 Ops/s $\color{#d91a1a}-1.32\%$
test_ppo_speed[False-backward] 17.3498ms 15.2335ms 65.6448 Ops/s 69.0009 Ops/s $\color{#d91a1a}-4.86\%$
test_ppo_speed[True-None] 4.3531ms 3.7270ms 268.3123 Ops/s 267.2383 Ops/s $\color{#35bf28}+0.40\%$
test_ppo_speed[True-backward] 11.1946ms 9.5765ms 104.4222 Ops/s 103.9670 Ops/s $\color{#35bf28}+0.44\%$
test_ppo_speed[reduce-overhead-None] 4.0705ms 3.7238ms 268.5406 Ops/s 268.9654 Ops/s $\color{#d91a1a}-0.16\%$
test_ppo_speed[reduce-overhead-backward] 10.4053ms 9.7433ms 102.6351 Ops/s 104.5547 Ops/s $\color{#d91a1a}-1.84\%$
test_reinforce_speed[False-None] 8.2672ms 6.7725ms 147.6570 Ops/s 152.3898 Ops/s $\color{#d91a1a}-3.11\%$
test_reinforce_speed[False-backward] 10.5284ms 10.0364ms 99.6374 Ops/s 97.6601 Ops/s $\color{#35bf28}+2.02\%$
test_reinforce_speed[True-None] 3.2272ms 2.6583ms 376.1776 Ops/s 353.3546 Ops/s $\textbf{\color{#35bf28}+6.46\%}$
test_reinforce_speed[True-backward] 9.6479ms 8.7692ms 114.0351 Ops/s 111.9929 Ops/s $\color{#35bf28}+1.82\%$
test_reinforce_speed[reduce-overhead-None] 3.4027ms 2.6908ms 371.6405 Ops/s 355.0596 Ops/s $\color{#35bf28}+4.67\%$
test_reinforce_speed[reduce-overhead-backward] 9.6402ms 8.6497ms 115.6104 Ops/s 113.0202 Ops/s $\color{#35bf28}+2.29\%$
test_iql_speed[False-None] 34.7724ms 32.4701ms 30.7976 Ops/s 30.0723 Ops/s $\color{#35bf28}+2.41\%$
test_iql_speed[False-backward] 48.1173ms 45.5908ms 21.9343 Ops/s 15.0166 Ops/s $\textbf{\color{#35bf28}+46.07\%}$
test_iql_speed[True-None] 12.2925ms 10.9118ms 91.6437 Ops/s 92.5662 Ops/s $\color{#d91a1a}-1.00\%$
test_iql_speed[True-backward] 25.9332ms 22.1780ms 45.0898 Ops/s 45.4138 Ops/s $\color{#d91a1a}-0.71\%$
test_iql_speed[reduce-overhead-None] 11.8731ms 10.8415ms 92.2378 Ops/s 91.9006 Ops/s $\color{#35bf28}+0.37\%$
test_iql_speed[reduce-overhead-backward] 23.8651ms 22.8938ms 43.6800 Ops/s 45.4214 Ops/s $\color{#d91a1a}-3.83\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.5112ms 5.1111ms 195.6542 Ops/s 194.1304 Ops/s $\color{#35bf28}+0.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9519ms 0.5320ms 1.8798 KOps/s 1.9183 KOps/s $\color{#d91a1a}-2.01\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8334ms 0.5110ms 1.9571 KOps/s 1.8018 KOps/s $\textbf{\color{#35bf28}+8.62\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.6238ms 4.9312ms 202.7916 Ops/s 212.5205 Ops/s $\color{#d91a1a}-4.58\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8791ms 0.5261ms 1.9008 KOps/s 1.9672 KOps/s $\color{#d91a1a}-3.37\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.8109ms 0.4975ms 2.0099 KOps/s 2.0969 KOps/s $\color{#d91a1a}-4.15\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.3687ms 1.6679ms 599.5729 Ops/s 602.3795 Ops/s $\color{#d91a1a}-0.47\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.4553ms 1.5783ms 633.5783 Ops/s 607.6875 Ops/s $\color{#35bf28}+4.26\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.8477ms 4.8541ms 206.0114 Ops/s 206.0919 Ops/s $\color{#d91a1a}-0.04\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.0675ms 0.6612ms 1.5124 KOps/s 1.5429 KOps/s $\color{#d91a1a}-1.97\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1.3463ms 0.6491ms 1.5407 KOps/s 1.6037 KOps/s $\color{#d91a1a}-3.93\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 5.4575ms 4.8620ms 205.6752 Ops/s 214.5040 Ops/s $\color{#d91a1a}-4.12\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0683ms 0.5308ms 1.8840 KOps/s 1.9422 KOps/s $\color{#d91a1a}-3.00\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7132ms 0.5057ms 1.9774 KOps/s 2.0212 KOps/s $\color{#d91a1a}-2.17\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 8.2310ms 5.1720ms 193.3497 Ops/s 215.0749 Ops/s $\textbf{\color{#d91a1a}-10.10\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.1592ms 0.5106ms 1.9586 KOps/s 539.0730 Ops/s $\textbf{\color{#35bf28}+263.32\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.9175ms 0.5089ms 1.9649 KOps/s 2.0805 KOps/s $\textbf{\color{#d91a1a}-5.55\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.5302ms 5.1423ms 194.4649 Ops/s 206.4973 Ops/s $\textbf{\color{#d91a1a}-5.83\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.9270ms 0.6621ms 1.5104 KOps/s 1.5287 KOps/s $\color{#d91a1a}-1.20\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 8.0926ms 0.6499ms 1.5387 KOps/s 1.6004 KOps/s $\color{#d91a1a}-3.86\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 5.5975ms 4.1577ms 240.5183 Ops/s 231.7547 Ops/s $\color{#35bf28}+3.78\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 5.3231ms 2.3211ms 430.8389 Ops/s 418.7567 Ops/s $\color{#35bf28}+2.89\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 7.4645ms 1.4307ms 698.9596 Ops/s 758.5154 Ops/s $\textbf{\color{#d91a1a}-7.85\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.4469s 13.2514ms 75.4640 Ops/s 227.2804 Ops/s $\textbf{\color{#d91a1a}-66.80\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.5487ms 2.4240ms 412.5489 Ops/s 444.0713 Ops/s $\textbf{\color{#d91a1a}-7.10\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 6.5557ms 1.4208ms 703.8458 Ops/s 706.2494 Ops/s $\color{#d91a1a}-0.34\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 6.1794ms 4.4929ms 222.5752 Ops/s 238.8942 Ops/s $\textbf{\color{#d91a1a}-6.83\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 7.2386ms 2.4733ms 404.3106 Ops/s 418.2796 Ops/s $\color{#d91a1a}-3.34\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 4.7735ms 1.4825ms 674.5342 Ops/s 651.9543 Ops/s $\color{#35bf28}+3.46\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.9781ms 13.5091ms 74.0240 Ops/s 72.1388 Ops/s $\color{#35bf28}+2.61\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 17.3337ms 15.0965ms 66.2406 Ops/s 66.9456 Ops/s $\color{#d91a1a}-1.05\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 23.1813ms 22.4856ms 44.4729 Ops/s 44.5659 Ops/s $\color{#d91a1a}-0.21\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.1214ms 15.2347ms 65.6396 Ops/s 66.4141 Ops/s $\color{#d91a1a}-1.17\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 25.0170ms 22.4991ms 44.4461 Ops/s 45.3019 Ops/s $\color{#d91a1a}-1.89\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 18.9324ms 16.5918ms 60.2708 Ops/s 61.2589 Ops/s $\color{#d91a1a}-1.61\%$
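
For readers checking the table by hand: the Change column appears to be the relative difference between the Ops column (this PR) and the Ops on Repo HEAD column. The sketch below reproduces three rows under that assumption. The function names are hypothetical, and the 5% cutoff for counting a benchmark as improved or worsened is inferred from which rows are bolded in the table, not taken from the benchmark tooling itself.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change in throughput of the PR relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a change; the 5% threshold is an assumption inferred from the bolded rows."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) pairs copied from the CPU table above, in Ops/s.
rows = {
    "test_simple": (2.2411, 2.2695),
    "test_ddpg_speed[True-backward]": (517.7421, 431.0656),
    "test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]": (75.4640, 227.2804),
}

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

When a row mixes units (for example KOps/s against Ops/s), both values need converting to the same unit before the comparison.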

github-actions bot commented Jan 10, 2025

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}21$. Worsened: $\large\color{#d91a1a}7$.

Name | Max | Mean | Ops | Ops on Repo HEAD | Change
test_simple 0.8314s 0.7470s 1.3387 Ops/s 1.3606 Ops/s $\color{#d91a1a}-1.61\%$
test_transformed 0.9733s 0.9711s 1.0297 Ops/s 1.0193 Ops/s $\color{#35bf28}+1.03\%$
test_serial 2.1409s 2.1344s 0.4685 Ops/s 0.4708 Ops/s $\color{#d91a1a}-0.49\%$
test_parallel 1.8551s 1.8235s 0.5484 Ops/s 0.5471 Ops/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-True-True-True-True] 0.1429ms 39.4260μs 25.3639 KOps/s 24.7783 KOps/s $\color{#35bf28}+2.36\%$
test_step_mdp_speed[True-True-True-True-False] 60.9210μs 23.1501μs 43.1963 KOps/s 43.2693 KOps/s $\color{#d91a1a}-0.17\%$
test_step_mdp_speed[True-True-True-False-True] 91.4210μs 21.8193μs 45.8311 KOps/s 44.8306 KOps/s $\color{#35bf28}+2.23\%$
test_step_mdp_speed[True-True-True-False-False] 46.9410μs 12.8454μs 77.8490 KOps/s 78.8526 KOps/s $\color{#d91a1a}-1.27\%$
test_step_mdp_speed[True-True-False-True-True] 82.3420μs 42.7481μs 23.3929 KOps/s 23.4170 KOps/s $\color{#d91a1a}-0.10\%$
test_step_mdp_speed[True-True-False-True-False] 57.3910μs 25.5370μs 39.1589 KOps/s 39.1658 KOps/s $\color{#d91a1a}-0.02\%$
test_step_mdp_speed[True-True-False-False-True] 54.2310μs 24.2588μs 41.2221 KOps/s 40.4738 KOps/s $\color{#35bf28}+1.85\%$
test_step_mdp_speed[True-True-False-False-False] 73.4610μs 15.0830μs 66.2999 KOps/s 66.5945 KOps/s $\color{#d91a1a}-0.44\%$
test_step_mdp_speed[True-False-True-True-True] 95.0020μs 44.2369μs 22.6056 KOps/s 22.6796 KOps/s $\color{#d91a1a}-0.33\%$
test_step_mdp_speed[True-False-True-True-False] 67.8020μs 27.7649μs 36.0167 KOps/s 35.7724 KOps/s $\color{#35bf28}+0.68\%$
test_step_mdp_speed[True-False-True-False-True] 62.9810μs 24.4227μs 40.9455 KOps/s 41.7161 KOps/s $\color{#d91a1a}-1.85\%$
test_step_mdp_speed[True-False-True-False-False] 52.3310μs 15.4935μs 64.5432 KOps/s 65.6357 KOps/s $\color{#d91a1a}-1.66\%$
test_step_mdp_speed[True-False-False-True-True] 91.5020μs 47.5848μs 21.0151 KOps/s 21.3157 KOps/s $\color{#d91a1a}-1.41\%$
test_step_mdp_speed[True-False-False-True-False] 65.8810μs 30.2868μs 33.0176 KOps/s 33.4162 KOps/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[True-False-False-False-True] 70.7220μs 26.6958μs 37.4590 KOps/s 37.6223 KOps/s $\color{#d91a1a}-0.43\%$
test_step_mdp_speed[True-False-False-False-False] 45.8510μs 17.3695μs 57.5721 KOps/s 57.6652 KOps/s $\color{#d91a1a}-0.16\%$
test_step_mdp_speed[False-True-True-True-True] 0.1007ms 44.4946μs 22.4747 KOps/s 22.2814 KOps/s $\color{#35bf28}+0.87\%$
test_step_mdp_speed[False-True-True-True-False] 61.3110μs 28.2052μs 35.4544 KOps/s 35.3463 KOps/s $\color{#35bf28}+0.31\%$
test_step_mdp_speed[False-True-True-False-True] 63.5410μs 27.6692μs 36.1412 KOps/s 35.0693 KOps/s $\color{#35bf28}+3.06\%$
test_step_mdp_speed[False-True-True-False-False] 0.4046ms 16.6161μs 60.1826 KOps/s 58.0792 KOps/s $\color{#35bf28}+3.62\%$
test_step_mdp_speed[False-True-False-True-True] 0.4587ms 47.7269μs 20.9525 KOps/s 21.3046 KOps/s $\color{#d91a1a}-1.65\%$
test_step_mdp_speed[False-True-False-True-False] 66.1610μs 30.2580μs 33.0491 KOps/s 33.4239 KOps/s $\color{#d91a1a}-1.12\%$
test_step_mdp_speed[False-True-False-False-True] 3.1577ms 30.7560μs 32.5139 KOps/s 32.0611 KOps/s $\color{#35bf28}+1.41\%$
test_step_mdp_speed[False-True-False-False-False] 59.4910μs 19.3187μs 51.7634 KOps/s 51.9134 KOps/s $\color{#d91a1a}-0.29\%$
test_step_mdp_speed[False-False-True-True-True] 88.9620μs 49.5534μs 20.1802 KOps/s 19.9613 KOps/s $\color{#35bf28}+1.10\%$
test_step_mdp_speed[False-False-True-True-False] 77.3810μs 32.7583μs 30.5266 KOps/s 30.2956 KOps/s $\color{#35bf28}+0.76\%$
test_step_mdp_speed[False-False-True-False-True] 0.4344ms 30.0938μs 33.2294 KOps/s 32.5804 KOps/s $\color{#35bf28}+1.99\%$
test_step_mdp_speed[False-False-True-False-False] 0.4340ms 19.0084μs 52.6084 KOps/s 51.8481 KOps/s $\color{#35bf28}+1.47\%$
test_step_mdp_speed[False-False-False-True-True] 0.4477ms 51.2985μs 19.4938 KOps/s 19.6182 KOps/s $\color{#d91a1a}-0.63\%$
test_step_mdp_speed[False-False-False-True-False] 0.4587ms 34.5634μs 28.9324 KOps/s 28.7456 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[False-False-False-False-True] 69.4910μs 32.2047μs 31.0514 KOps/s 31.2593 KOps/s $\color{#d91a1a}-0.67\%$
test_step_mdp_speed[False-False-False-False-False] 0.4427ms 21.6187μs 46.2563 KOps/s 47.2986 KOps/s $\color{#d91a1a}-2.20\%$
test_values[generalized_advantage_estimate-True-True] 25.6053ms 25.1179ms 39.8123 Ops/s 40.1050 Ops/s $\color{#d91a1a}-0.73\%$
test_values[vec_generalized_advantage_estimate-True-True] 96.4162ms 2.8420ms 351.8680 Ops/s 313.3580 Ops/s $\textbf{\color{#35bf28}+12.29\%}$
test_values[td0_return_estimate-False-False] 0.1068ms 82.0844μs 12.1826 KOps/s 12.3419 KOps/s $\color{#d91a1a}-1.29\%$
test_values[td1_return_estimate-False-False] 60.1309ms 57.0665ms 17.5234 Ops/s 17.9424 Ops/s $\color{#d91a1a}-2.34\%$
test_values[vec_td1_return_estimate-False-False] 1.4052ms 1.1025ms 907.0476 Ops/s 917.9438 Ops/s $\color{#d91a1a}-1.19\%$
test_values[td_lambda_return_estimate-True-False] 94.2487ms 90.6006ms 11.0374 Ops/s 11.3265 Ops/s $\color{#d91a1a}-2.55\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.4462ms 1.1011ms 908.2186 Ops/s 921.2015 Ops/s $\color{#d91a1a}-1.41\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 26.3255ms 25.3768ms 39.4061 Ops/s 40.2523 Ops/s $\color{#d91a1a}-2.10\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1.0591ms 0.7716ms 1.2960 KOps/s 1.3253 KOps/s $\color{#d91a1a}-2.21\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7684ms 0.6894ms 1.4505 KOps/s 1.4754 KOps/s $\color{#d91a1a}-1.69\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.7057ms 1.4931ms 669.7317 Ops/s 676.7136 Ops/s $\color{#d91a1a}-1.03\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7440ms 0.7059ms 1.4166 KOps/s 1.4553 KOps/s $\color{#d91a1a}-2.67\%$
test_dqn_speed[False-None] 6.8926ms 1.5457ms 646.9408 Ops/s 659.3430 Ops/s $\color{#d91a1a}-1.88\%$
test_dqn_speed[False-backward] 2.1901ms 2.1371ms 467.9309 Ops/s 471.0485 Ops/s $\color{#d91a1a}-0.66\%$
test_dqn_speed[True-None] 0.6513ms 0.5553ms 1.8009 KOps/s 1.7935 KOps/s $\color{#35bf28}+0.41\%$
test_dqn_speed[True-backward] 1.1839ms 1.1261ms 888.0460 Ops/s 804.5682 Ops/s $\textbf{\color{#35bf28}+10.38\%}$
test_dqn_speed[reduce-overhead-None] 0.7063ms 0.5725ms 1.7468 KOps/s 1.7122 KOps/s $\color{#35bf28}+2.02\%$
test_dqn_speed[reduce-overhead-backward] 1.1251ms 1.0808ms 925.2648 Ops/s 915.9322 Ops/s $\color{#35bf28}+1.02\%$
test_ddpg_speed[False-None] 3.2545ms 2.9084ms 343.8262 Ops/s 345.8183 Ops/s $\color{#d91a1a}-0.58\%$
test_ddpg_speed[False-backward] 4.5708ms 4.3000ms 232.5577 Ops/s 233.8914 Ops/s $\color{#d91a1a}-0.57\%$
test_ddpg_speed[True-None] 1.1774ms 1.0950ms 913.2053 Ops/s 899.2029 Ops/s $\color{#35bf28}+1.56\%$
test_ddpg_speed[True-backward] 2.3714ms 2.3163ms 431.7154 Ops/s 432.8576 Ops/s $\color{#d91a1a}-0.26\%$
test_ddpg_speed[reduce-overhead-None] 1.5309ms 1.1189ms 893.7018 Ops/s 905.2832 Ops/s $\color{#d91a1a}-1.28\%$
test_ddpg_speed[reduce-overhead-backward] 1.8456ms 1.7859ms 559.9472 Ops/s 554.1864 Ops/s $\color{#35bf28}+1.04\%$
test_sac_speed[False-None] 8.6533ms 8.1611ms 122.5326 Ops/s 123.6049 Ops/s $\color{#d91a1a}-0.87\%$
test_sac_speed[False-backward] 11.7734ms 11.3767ms 87.8991 Ops/s 88.8203 Ops/s $\color{#d91a1a}-1.04\%$
test_sac_speed[True-None] 1.9900ms 1.5513ms 644.6168 Ops/s 643.4662 Ops/s $\color{#35bf28}+0.18\%$
test_sac_speed[True-backward] 3.3277ms 3.2454ms 308.1327 Ops/s 290.6724 Ops/s $\textbf{\color{#35bf28}+6.01\%}$
test_sac_speed[reduce-overhead-None] 23.0830ms 12.9345ms 77.3127 Ops/s 77.4898 Ops/s $\color{#d91a1a}-0.23\%$
test_sac_speed[reduce-overhead-backward] 1.4166ms 1.3537ms 738.7148 Ops/s 653.2282 Ops/s $\textbf{\color{#35bf28}+13.09\%}$
test_redq_speed[False-None] 8.4095ms 7.6190ms 131.2516 Ops/s 131.1962 Ops/s $\color{#35bf28}+0.04\%$
test_redq_speed[False-backward] 12.3193ms 11.5298ms 86.7317 Ops/s 84.9149 Ops/s $\color{#35bf28}+2.14\%$
test_redq_speed[True-None] 2.1025ms 2.0135ms 496.6590 Ops/s 501.8534 Ops/s $\color{#d91a1a}-1.04\%$
test_redq_speed[True-backward] 3.9374ms 3.7560ms 266.2439 Ops/s 272.4559 Ops/s $\color{#d91a1a}-2.28\%$
test_redq_speed[reduce-overhead-None] 2.1844ms 2.0767ms 481.5257 Ops/s 499.6082 Ops/s $\color{#d91a1a}-3.62\%$
test_redq_speed[reduce-overhead-backward] 3.7928ms 3.7099ms 269.5512 Ops/s 264.3138 Ops/s $\color{#35bf28}+1.98\%$
test_redq_deprec_speed[False-None] 9.6812ms 9.2564ms 108.0335 Ops/s 108.1939 Ops/s $\color{#d91a1a}-0.15\%$
test_redq_deprec_speed[False-backward] 12.9397ms 12.2867ms 81.3891 Ops/s 80.1544 Ops/s $\color{#35bf28}+1.54\%$
test_redq_deprec_speed[True-None] 2.4980ms 2.3922ms 418.0331 Ops/s 415.8729 Ops/s $\color{#35bf28}+0.52\%$
test_redq_deprec_speed[True-backward] 4.2361ms 4.0735ms 245.4865 Ops/s 248.2893 Ops/s $\color{#d91a1a}-1.13\%$
test_redq_deprec_speed[reduce-overhead-None] 2.4499ms 2.3805ms 420.0831 Ops/s 423.3833 Ops/s $\color{#d91a1a}-0.78\%$
test_redq_deprec_speed[reduce-overhead-backward] 4.1420ms 4.0590ms 246.3657 Ops/s 248.4351 Ops/s $\color{#d91a1a}-0.83\%$
test_td3_speed[False-None] 35.1391ms 8.3464ms 119.8124 Ops/s 123.6917 Ops/s $\color{#d91a1a}-3.14\%$
test_td3_speed[False-backward] 10.9937ms 10.4825ms 95.3967 Ops/s 95.0176 Ops/s $\color{#35bf28}+0.40\%$
test_td3_speed[True-None] 1.6869ms 1.6213ms 616.7770 Ops/s 628.6458 Ops/s $\color{#d91a1a}-1.89\%$
test_td3_speed[True-backward] 3.3933ms 3.1860ms 313.8738 Ops/s 313.2330 Ops/s $\color{#35bf28}+0.20\%$
test_td3_speed[reduce-overhead-None] 60.5207ms 27.1669ms 36.8094 Ops/s 37.7420 Ops/s $\color{#d91a1a}-2.47\%$
test_td3_speed[reduce-overhead-backward] 1.5237ms 1.4691ms 680.7018 Ops/s 759.2254 Ops/s $\textbf{\color{#d91a1a}-10.34\%}$
test_cql_speed[False-None] 17.6329ms 17.0848ms 58.5316 Ops/s 58.6090 Ops/s $\color{#d91a1a}-0.13\%$
test_cql_speed[False-backward] 23.0946ms 22.6349ms 44.1796 Ops/s 44.9612 Ops/s $\color{#d91a1a}-1.74\%$
test_cql_speed[True-None] 3.0859ms 2.9922ms 334.2000 Ops/s 339.8785 Ops/s $\color{#d91a1a}-1.67\%$
test_cql_speed[True-backward] 5.2445ms 5.1384ms 194.6141 Ops/s 190.5041 Ops/s $\color{#35bf28}+2.16\%$
test_cql_speed[reduce-overhead-None] 0.3607s 15.2863ms 65.4181 Ops/s 74.2233 Ops/s $\textbf{\color{#d91a1a}-11.86\%}$
test_cql_speed[reduce-overhead-backward] 1.6107ms 1.5442ms 647.5748 Ops/s 583.4436 Ops/s $\textbf{\color{#35bf28}+10.99\%}$
test_a2c_speed[False-None] 3.4495ms 3.3231ms 300.9280 Ops/s 304.7965 Ops/s $\color{#d91a1a}-1.27\%$
test_a2c_speed[False-backward] 6.6891ms 6.2582ms 159.7905 Ops/s 155.0988 Ops/s $\color{#35bf28}+3.02\%$
test_a2c_speed[True-None] 1.1017ms 1.0338ms 967.2767 Ops/s 973.0328 Ops/s $\color{#d91a1a}-0.59\%$
test_a2c_speed[True-backward] 2.7359ms 2.6502ms 377.3300 Ops/s 362.5039 Ops/s $\color{#35bf28}+4.09\%$
test_a2c_speed[reduce-overhead-None] 22.0023ms 11.8775ms 84.1929 Ops/s 85.6821 Ops/s $\color{#d91a1a}-1.74\%$
test_a2c_speed[reduce-overhead-backward] 1.0449ms 0.9820ms 1.0183 KOps/s 859.7955 Ops/s $\textbf{\color{#35bf28}+18.44\%}$
test_ppo_speed[False-None] 3.8670ms 3.7796ms 264.5805 Ops/s 266.3660 Ops/s $\color{#d91a1a}-0.67\%$
test_ppo_speed[False-backward] 7.4194ms 6.9798ms 143.2703 Ops/s 139.7817 Ops/s $\color{#35bf28}+2.50\%$
test_ppo_speed[True-None] 1.0218ms 0.9757ms 1.0249 KOps/s 1.0296 KOps/s $\color{#d91a1a}-0.46\%$
test_ppo_speed[True-backward] 2.6523ms 2.5927ms 385.6939 Ops/s 389.8611 Ops/s $\color{#d91a1a}-1.07\%$
test_ppo_speed[reduce-overhead-None] 0.5979ms 0.5396ms 1.8533 KOps/s 67.6433 Ops/s $\textbf{\color{#35bf28}+2639.80\%}$
test_ppo_speed[reduce-overhead-backward] 1.0607ms 0.9738ms 1.0269 KOps/s 972.0047 Ops/s $\textbf{\color{#35bf28}+5.65\%}$
test_reinforce_speed[False-None] 2.4109ms 2.3183ms 431.3544 Ops/s 430.1851 Ops/s $\color{#35bf28}+0.27\%$
test_reinforce_speed[False-backward] 3.3788ms 3.3125ms 301.8870 Ops/s 300.4434 Ops/s $\color{#35bf28}+0.48\%$
test_reinforce_speed[True-None] 0.8982ms 0.8455ms 1.1827 KOps/s 1.1326 KOps/s $\color{#35bf28}+4.42\%$
test_reinforce_speed[True-backward] 2.5031ms 2.4340ms 410.8545 Ops/s 404.4188 Ops/s $\color{#35bf28}+1.59\%$
test_reinforce_speed[reduce-overhead-None] 0.2939s 12.3397ms 81.0392 Ops/s 86.6578 Ops/s $\textbf{\color{#d91a1a}-6.48\%}$
test_reinforce_speed[reduce-overhead-backward] 1.0901ms 1.0422ms 959.5253 Ops/s 949.7128 Ops/s $\color{#35bf28}+1.03\%$
test_iql_speed[False-None] 10.0239ms 9.4779ms 105.5089 Ops/s 102.6639 Ops/s $\color{#35bf28}+2.77\%$
test_iql_speed[False-backward] 13.6673ms 13.2123ms 75.6870 Ops/s 74.6767 Ops/s $\color{#35bf28}+1.35\%$
test_iql_speed[True-None] 1.9264ms 1.7964ms 556.6841 Ops/s 534.8500 Ops/s $\color{#35bf28}+4.08\%$
test_iql_speed[True-backward] 4.5811ms 4.2911ms 233.0410 Ops/s 227.0022 Ops/s $\color{#35bf28}+2.66\%$
test_iql_speed[reduce-overhead-None] 20.3053ms 11.7408ms 85.1730 Ops/s 85.9332 Ops/s $\color{#d91a1a}-0.88\%$
test_iql_speed[reduce-overhead-backward] 1.4999ms 1.4340ms 697.3458 Ops/s 672.9900 Ops/s $\color{#35bf28}+3.62\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 8.0659ms 6.5038ms 153.7560 Ops/s 153.2188 Ops/s $\color{#35bf28}+0.35\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.6022ms 0.3017ms 3.3151 KOps/s 2.8188 KOps/s $\textbf{\color{#35bf28}+17.61\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5838ms 0.2835ms 3.5278 KOps/s 2.9885 KOps/s $\textbf{\color{#35bf28}+18.05\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4819ms 6.2238ms 160.6743 Ops/s 160.6349 Ops/s $\color{#35bf28}+0.02\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7372ms 0.3960ms 2.5253 KOps/s 3.0548 KOps/s $\textbf{\color{#d91a1a}-17.33\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5655ms 0.3081ms 3.2453 KOps/s 3.0260 KOps/s $\textbf{\color{#35bf28}+7.25\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4773ms 1.2684ms 788.4025 Ops/s 691.6585 Ops/s $\textbf{\color{#35bf28}+13.99\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5632ms 1.3415ms 745.4239 Ops/s 722.9342 Ops/s $\color{#35bf28}+3.11\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.4771ms 6.3651ms 157.1059 Ops/s 154.7823 Ops/s $\color{#35bf28}+1.50\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8762ms 0.4084ms 2.4487 KOps/s 2.0990 KOps/s $\textbf{\color{#35bf28}+16.66\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7935ms 0.3850ms 2.5974 KOps/s 2.3955 KOps/s $\textbf{\color{#35bf28}+8.43\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3628ms 6.2299ms 160.5163 Ops/s 159.0973 Ops/s $\color{#35bf28}+0.89\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9740ms 0.3643ms 2.7453 KOps/s 3.1887 KOps/s $\textbf{\color{#d91a1a}-13.91\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.6742ms 0.3104ms 3.2221 KOps/s 3.5124 KOps/s $\textbf{\color{#d91a1a}-8.26\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3592ms 6.1495ms 162.6158 Ops/s 160.5114 Ops/s $\color{#35bf28}+1.31\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.5160ms 0.2887ms 3.4636 KOps/s 3.1011 KOps/s $\textbf{\color{#35bf28}+11.69\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5801ms 0.2826ms 3.5383 KOps/s 3.0882 KOps/s $\textbf{\color{#35bf28}+14.58\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.5789ms 6.3513ms 157.4474 Ops/s 155.2354 Ops/s $\color{#35bf28}+1.42\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8535ms 0.4154ms 2.4075 KOps/s 2.1313 KOps/s $\textbf{\color{#35bf28}+12.96\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.7451ms 0.3994ms 2.5035 KOps/s 2.2041 KOps/s $\textbf{\color{#35bf28}+13.59\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 7.1495ms 5.5526ms 180.0951 Ops/s 183.0267 Ops/s $\color{#d91a1a}-1.60\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.9051ms 1.8483ms 541.0347 Ops/s 434.2631 Ops/s $\textbf{\color{#35bf28}+24.59\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 9.0433ms 1.2455ms 802.9086 Ops/s 869.7394 Ops/s $\textbf{\color{#d91a1a}-7.68\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.6816ms 5.4711ms 182.7800 Ops/s 182.7142 Ops/s $\color{#35bf28}+0.04\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.4045ms 2.0627ms 484.7997 Ops/s 473.5077 Ops/s $\color{#35bf28}+2.38\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.1512ms 1.2470ms 801.9252 Ops/s 781.5490 Ops/s $\color{#35bf28}+2.61\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.4908s 15.4983ms 64.5232 Ops/s 32.9771 Ops/s $\textbf{\color{#35bf28}+95.66\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.7619ms 2.2656ms 441.3854 Ops/s 443.5956 Ops/s $\color{#d91a1a}-0.50\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 6.8707ms 1.3663ms 731.8860 Ops/s 694.7436 Ops/s $\textbf{\color{#35bf28}+5.35\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 16.1802ms 15.6328ms 63.9680 Ops/s 62.4878 Ops/s $\color{#35bf28}+2.37\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.6865ms 17.6263ms 56.7333 Ops/s 55.4352 Ops/s $\color{#35bf28}+2.34\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 20.1304ms 19.6092ms 50.9964 Ops/s 49.1124 Ops/s $\color{#35bf28}+3.84\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 20.4937ms 17.7923ms 56.2041 Ops/s 55.1546 Ops/s $\color{#35bf28}+1.90\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 19.9824ms 19.6750ms 50.8259 Ops/s 49.4379 Ops/s $\color{#35bf28}+2.81\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 21.1393ms 19.1903ms 52.1096 Ops/s 51.1403 Ops/s $\color{#35bf28}+1.90\%$
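
For readers checking the Change column: it appears to be the relative difference between the "Ops" value for this PR and "Ops on Repo HEAD", as a percentage. The sketch below reproduces that arithmetic for a few rows of the table. The helper names (`parse_ops`, `percent_change`) and the unit table are illustrative assumptions, not part of the benchmark tooling, and because the table shows rounded throughput values, a recomputed figure can differ from the reported one in the last digit.

```python
# Hypothetical helpers for re-deriving the "Change" column from the two
# throughput columns; these are not part of the benchmark bot itself.

# Assumed unit prefixes: the table above only shows "Ops/s" and "KOps/s".
UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def parse_ops(text: str) -> float:
    """Convert a string such as '1.8533 KOps/s' to operations per second."""
    value, unit = text.split()
    return float(value) * UNIT_SCALE[unit]


def percent_change(ops: str, ops_on_head: str) -> float:
    """Relative throughput change of this PR versus repo HEAD, in percent."""
    new = parse_ops(ops)
    base = parse_ops(ops_on_head)
    return (new - base) / base * 100.0


if __name__ == "__main__":
    rows = {
        "test_cql_speed[reduce-overhead-None]": ("65.4181 Ops/s", "74.2233 Ops/s"),
        "test_a2c_speed[reduce-overhead-backward]": ("1.0183 KOps/s", "859.7955 Ops/s"),
        "test_ppo_speed[reduce-overhead-None]": ("1.8533 KOps/s", "67.6433 Ops/s"),
    }
    for name, (ops, head) in rows.items():
        print(f"{name}: {percent_change(ops, head):+.2f}%")
```

Positive values mean the PR branch ran more operations per second than HEAD. Very large swings, such as the `test_ppo_speed[reduce-overhead-None]` row, are worth re-running before being read as real speedups, since a single slow baseline sample can dominate the ratio.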

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jan 10, 2025
ghstack-source-id: dfcb987806f7dfc4d1d9a1ef6a5161a35284fdf0
Pull Request resolved: #2687
@vmoens vmoens added the Refactoring Refactoring of an existing feature label Jan 10, 2025
@vmoens vmoens merged commit d2585fa into gh/vmoens/65/base Jan 16, 2025
60 of 68 checks passed
@vmoens vmoens deleted the gh/vmoens/65/head branch January 16, 2025 11:24