[Tutorial] Beam search with GPT models #2623

Open · wants to merge 8 commits into base: gh/vmoens/47/base

@vmoens (Contributor) commented Dec 2, 2024

[ghstack-poisoned]

pytorch-bot bot commented Dec 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2623

Note: Links to docs will display an error until the docs builds have been completed.

❌ 9 New Failures, 7 Unrelated Failures

As of commit 8949b19 with merge base 133d709:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens added a commit that referenced this pull request Dec 2, 2024
ghstack-source-id: b37305f2d8c42a070c1113435cabf46926a4fa12
Pull Request resolved: #2623
@facebook-github-bot added the "CLA Signed" label Dec 2, 2024 (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 3, 2024
ghstack-source-id: 62f96bf1965a65ca35485de6ee66260abe33f117
Pull Request resolved: #2623

github-actions bot commented Dec 3, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}30$. Worsened: $\large\color{#d91a1a}6$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_simple 0.4404s 0.4385s 2.2807 Ops/s 2.2167 Ops/s $\color{#35bf28}+2.89\%$
test_transformed 0.6529s 0.6229s 1.6054 Ops/s 1.6256 Ops/s $\color{#d91a1a}-1.25\%$
test_serial 1.3953s 1.3885s 0.7202 Ops/s 0.7208 Ops/s $\color{#d91a1a}-0.09\%$
test_parallel 1.3428s 1.2315s 0.8120 Ops/s 0.8049 Ops/s $\color{#35bf28}+0.89\%$
test_step_mdp_speed[True-True-True-True-True] 0.2810ms 31.2296μs 32.0209 KOps/s 32.3086 KOps/s $\color{#d91a1a}-0.89\%$
test_step_mdp_speed[True-True-True-True-False] 54.4710μs 18.2950μs 54.6597 KOps/s 54.5406 KOps/s $\color{#35bf28}+0.22\%$
test_step_mdp_speed[True-True-True-False-True] 75.1400μs 17.5992μs 56.8209 KOps/s 57.8115 KOps/s $\color{#d91a1a}-1.71\%$
test_step_mdp_speed[True-True-True-False-False] 41.4280μs 10.2852μs 97.2274 KOps/s 97.7530 KOps/s $\color{#d91a1a}-0.54\%$
test_step_mdp_speed[True-True-False-True-True] 0.1009ms 33.0961μs 30.2151 KOps/s 30.4697 KOps/s $\color{#d91a1a}-0.84\%$
test_step_mdp_speed[True-True-False-True-False] 78.6470μs 20.2533μs 49.3746 KOps/s 49.6535 KOps/s $\color{#d91a1a}-0.56\%$
test_step_mdp_speed[True-True-False-False-True] 99.2550μs 19.2484μs 51.9523 KOps/s 51.5570 KOps/s $\color{#35bf28}+0.77\%$
test_step_mdp_speed[True-True-False-False-False] 45.0240μs 12.3143μs 81.2062 KOps/s 82.6478 KOps/s $\color{#d91a1a}-1.74\%$
test_step_mdp_speed[True-False-True-True-True] 98.1740μs 35.1724μs 28.4314 KOps/s 28.8646 KOps/s $\color{#d91a1a}-1.50\%$
test_step_mdp_speed[True-False-True-True-False] 79.5180μs 22.0711μs 45.3081 KOps/s 45.3934 KOps/s $\color{#d91a1a}-0.19\%$
test_step_mdp_speed[True-False-True-False-True] 52.8190μs 19.4455μs 51.4258 KOps/s 51.9320 KOps/s $\color{#d91a1a}-0.97\%$
test_step_mdp_speed[True-False-True-False-False] 77.3590μs 12.4393μs 80.3901 KOps/s 82.6367 KOps/s $\color{#d91a1a}-2.72\%$
test_step_mdp_speed[True-False-False-True-True] 0.1063ms 36.3495μs 27.5107 KOps/s 27.4781 KOps/s $\color{#35bf28}+0.12\%$
test_step_mdp_speed[True-False-False-True-False] 90.0280μs 24.2106μs 41.3043 KOps/s 41.8284 KOps/s $\color{#d91a1a}-1.25\%$
test_step_mdp_speed[True-False-False-False-True] 0.4939ms 21.2301μs 47.1030 KOps/s 47.4247 KOps/s $\color{#d91a1a}-0.68\%$
test_step_mdp_speed[True-False-False-False-False] 74.3890μs 14.2180μs 70.3335 KOps/s 71.7681 KOps/s $\color{#d91a1a}-2.00\%$
test_step_mdp_speed[False-True-True-True-True] 67.9270μs 35.3610μs 28.2798 KOps/s 28.2544 KOps/s $\color{#35bf28}+0.09\%$
test_step_mdp_speed[False-True-True-True-False] 81.1910μs 22.4197μs 44.6037 KOps/s 44.8141 KOps/s $\color{#d91a1a}-0.47\%$
test_step_mdp_speed[False-True-True-False-True] 94.8470μs 22.2219μs 45.0007 KOps/s 45.2978 KOps/s $\color{#d91a1a}-0.66\%$
test_step_mdp_speed[False-True-True-False-False] 47.4080μs 13.7613μs 72.6677 KOps/s 74.6100 KOps/s $\color{#d91a1a}-2.60\%$
test_step_mdp_speed[False-True-False-True-True] 0.1066ms 36.7658μs 27.1992 KOps/s 27.1202 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-True-False-True-False] 68.3380μs 23.9413μs 41.7689 KOps/s 41.4437 KOps/s $\color{#35bf28}+0.78\%$
test_step_mdp_speed[False-True-False-False-True] 3.1460ms 24.3509μs 41.0662 KOps/s 41.6896 KOps/s $\color{#d91a1a}-1.50\%$
test_step_mdp_speed[False-True-False-False-False] 0.2670ms 16.2223μs 61.6437 KOps/s 65.0085 KOps/s $\textbf{\color{#d91a1a}-5.18\%}$
test_step_mdp_speed[False-False-True-True-True] 0.1023ms 38.3626μs 26.0670 KOps/s 26.1223 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-False-True-True-False] 85.5400μs 26.0293μs 38.4183 KOps/s 38.4092 KOps/s $\color{#35bf28}+0.02\%$
test_step_mdp_speed[False-False-True-False-True] 60.7340μs 23.8858μs 41.8658 KOps/s 42.0543 KOps/s $\color{#d91a1a}-0.45\%$
test_step_mdp_speed[False-False-True-False-False] 72.7450μs 15.4357μs 64.7850 KOps/s 64.6514 KOps/s $\color{#35bf28}+0.21\%$
test_step_mdp_speed[False-False-False-True-True] 0.1184ms 39.6747μs 25.2050 KOps/s 24.7851 KOps/s $\color{#35bf28}+1.69\%$
test_step_mdp_speed[False-False-False-True-False] 0.1730ms 28.0396μs 35.6638 KOps/s 35.8669 KOps/s $\color{#d91a1a}-0.57\%$
test_step_mdp_speed[False-False-False-False-True] 96.2890μs 25.5607μs 39.1225 KOps/s 39.5790 KOps/s $\color{#d91a1a}-1.15\%$
test_step_mdp_speed[False-False-False-False-False] 0.5955ms 17.1004μs 58.4782 KOps/s 58.7457 KOps/s $\color{#d91a1a}-0.46\%$
test_values[generalized_advantage_estimate-True-True] 10.4558ms 10.1418ms 98.6016 Ops/s 103.2719 Ops/s $\color{#d91a1a}-4.52\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.3525ms 34.5097ms 28.9774 Ops/s 29.8238 Ops/s $\color{#d91a1a}-2.84\%$
test_values[td0_return_estimate-False-False] 0.2329ms 0.1927ms 5.1898 KOps/s 5.3747 KOps/s $\color{#d91a1a}-3.44\%$
test_values[td1_return_estimate-False-False] 27.5656ms 24.5993ms 40.6515 Ops/s 41.8822 Ops/s $\color{#d91a1a}-2.94\%$
test_values[vec_td1_return_estimate-False-False] 35.7659ms 33.6179ms 29.7461 Ops/s 29.9448 Ops/s $\color{#d91a1a}-0.66\%$
test_values[td_lambda_return_estimate-True-False] 38.7931ms 35.3594ms 28.2810 Ops/s 28.9922 Ops/s $\color{#d91a1a}-2.45\%$
test_values[vec_td_lambda_return_estimate-True-False] 36.8479ms 33.7210ms 29.6551 Ops/s 29.8446 Ops/s $\color{#d91a1a}-0.63\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.3320ms 8.4863ms 117.8370 Ops/s 120.2012 Ops/s $\color{#d91a1a}-1.97\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.2098ms 1.7672ms 565.8562 Ops/s 554.4300 Ops/s $\color{#35bf28}+2.06\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4844ms 0.3615ms 2.7659 KOps/s 2.8108 KOps/s $\color{#d91a1a}-1.60\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 45.1850ms 40.8571ms 24.4756 Ops/s 23.5019 Ops/s $\color{#35bf28}+4.14\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 3.7658ms 3.0692ms 325.8193 Ops/s 327.6148 Ops/s $\color{#d91a1a}-0.55\%$
test_dqn_speed[False-None] 5.6575ms 1.4274ms 700.5793 Ops/s 718.9446 Ops/s $\color{#d91a1a}-2.55\%$
test_dqn_speed[False-backward] 2.0623ms 1.9290ms 518.4031 Ops/s 539.4248 Ops/s $\color{#d91a1a}-3.90\%$
test_dqn_speed[True-None] 0.6943ms 0.4826ms 2.0723 KOps/s 2.0882 KOps/s $\color{#d91a1a}-0.76\%$
test_dqn_speed[True-backward] 0.9551ms 0.9123ms 1.0961 KOps/s 818.5333 Ops/s $\textbf{\color{#35bf28}+33.91\%}$
test_dqn_speed[reduce-overhead-None] 0.6248ms 0.4854ms 2.0602 KOps/s 2.1167 KOps/s $\color{#d91a1a}-2.67\%$
test_dqn_speed[reduce-overhead-backward] 1.0130ms 0.9148ms 1.0932 KOps/s 1.1052 KOps/s $\color{#d91a1a}-1.09\%$
test_ddpg_speed[False-None] 3.4000ms 2.9634ms 337.4527 Ops/s 345.5664 Ops/s $\color{#d91a1a}-2.35\%$
test_ddpg_speed[False-backward] 4.2281ms 4.1035ms 243.6918 Ops/s 250.2338 Ops/s $\color{#d91a1a}-2.61\%$
test_ddpg_speed[True-None] 1.6942ms 1.0389ms 962.5937 Ops/s 990.2992 Ops/s $\color{#d91a1a}-2.80\%$
test_ddpg_speed[True-backward] 2.6524ms 1.9948ms 501.2915 Ops/s 525.9010 Ops/s $\color{#d91a1a}-4.68\%$
test_ddpg_speed[reduce-overhead-None] 1.5053ms 1.0346ms 966.5496 Ops/s 993.6274 Ops/s $\color{#d91a1a}-2.73\%$
test_ddpg_speed[reduce-overhead-backward] 2.0858ms 1.9578ms 510.7898 Ops/s 527.9350 Ops/s $\color{#d91a1a}-3.25\%$
test_sac_speed[False-None] 8.7957ms 8.1656ms 122.4652 Ops/s 122.9222 Ops/s $\color{#d91a1a}-0.37\%$
test_sac_speed[False-backward] 11.8356ms 11.2676ms 88.7500 Ops/s 92.6571 Ops/s $\color{#d91a1a}-4.22\%$
test_sac_speed[True-None] 2.1921ms 1.8770ms 532.7552 Ops/s 545.3069 Ops/s $\color{#d91a1a}-2.30\%$
test_sac_speed[True-backward] 4.0419ms 3.7864ms 264.1059 Ops/s 283.5879 Ops/s $\textbf{\color{#d91a1a}-6.87\%}$
test_sac_speed[reduce-overhead-None] 2.3472ms 1.8957ms 527.5083 Ops/s 542.2530 Ops/s $\color{#d91a1a}-2.72\%$
test_sac_speed[reduce-overhead-backward] 3.6325ms 3.5648ms 280.5212 Ops/s 285.4160 Ops/s $\color{#d91a1a}-1.71\%$
test_redq_speed[False-None] 14.6502ms 13.1919ms 75.8038 Ops/s 77.7103 Ops/s $\color{#d91a1a}-2.45\%$
test_redq_speed[False-backward] 26.8527ms 23.1257ms 43.2420 Ops/s 43.9517 Ops/s $\color{#d91a1a}-1.61\%$
test_redq_speed[True-None] 5.9483ms 5.1043ms 195.9141 Ops/s 223.4048 Ops/s $\textbf{\color{#d91a1a}-12.31\%}$
test_redq_speed[True-backward] 12.9772ms 12.4868ms 80.0847 Ops/s 80.0438 Ops/s $\color{#35bf28}+0.05\%$
test_redq_speed[reduce-overhead-None] 5.4450ms 4.9157ms 203.4281 Ops/s 217.9846 Ops/s $\textbf{\color{#d91a1a}-6.68\%}$
test_redq_speed[reduce-overhead-backward] 13.9246ms 12.7122ms 78.6649 Ops/s 84.0688 Ops/s $\textbf{\color{#d91a1a}-6.43\%}$
test_redq_deprec_speed[False-None] 14.6456ms 13.2455ms 75.4972 Ops/s 78.2901 Ops/s $\color{#d91a1a}-3.57\%$
test_redq_deprec_speed[False-backward] 20.4319ms 19.2695ms 51.8954 Ops/s 54.5962 Ops/s $\color{#d91a1a}-4.95\%$
test_redq_deprec_speed[True-None] 4.0302ms 3.6162ms 276.5370 Ops/s 274.7004 Ops/s $\color{#35bf28}+0.67\%$
test_redq_deprec_speed[True-backward] 9.4108ms 8.5239ms 117.3175 Ops/s 120.2187 Ops/s $\color{#d91a1a}-2.41\%$
test_redq_deprec_speed[reduce-overhead-None] 4.8692ms 3.5800ms 279.3306 Ops/s 277.8262 Ops/s $\color{#35bf28}+0.54\%$
test_redq_deprec_speed[reduce-overhead-backward] 8.4235ms 8.1506ms 122.6904 Ops/s 118.2574 Ops/s $\color{#35bf28}+3.75\%$
test_td3_speed[False-None] 8.1892ms 7.9200ms 126.2634 Ops/s 122.5588 Ops/s $\color{#35bf28}+3.02\%$
test_td3_speed[False-backward] 12.2626ms 10.3296ms 96.8096 Ops/s 95.2698 Ops/s $\color{#35bf28}+1.62\%$
test_td3_speed[True-None] 1.8860ms 1.7137ms 583.5471 Ops/s 574.3862 Ops/s $\color{#35bf28}+1.59\%$
test_td3_speed[True-backward] 3.4133ms 3.3170ms 301.4779 Ops/s 284.1521 Ops/s $\textbf{\color{#35bf28}+6.10\%}$
test_td3_speed[reduce-overhead-None] 2.0724ms 1.7458ms 572.8164 Ops/s 566.7116 Ops/s $\color{#35bf28}+1.08\%$
test_td3_speed[reduce-overhead-backward] 3.4748ms 3.3212ms 301.0983 Ops/s 289.9271 Ops/s $\color{#35bf28}+3.85\%$
test_cql_speed[False-None] 47.0946ms 36.7055ms 27.2439 Ops/s 26.4888 Ops/s $\color{#35bf28}+2.85\%$
test_cql_speed[False-backward] 49.0883ms 46.8177ms 21.3595 Ops/s 20.8691 Ops/s $\color{#35bf28}+2.35\%$
test_cql_speed[True-None] 16.8852ms 15.7743ms 63.3941 Ops/s 62.7221 Ops/s $\color{#35bf28}+1.07\%$
test_cql_speed[True-backward] 23.6543ms 22.5739ms 44.2990 Ops/s 44.4770 Ops/s $\color{#d91a1a}-0.40\%$
test_cql_speed[reduce-overhead-None] 16.9229ms 15.9864ms 62.5533 Ops/s 60.6006 Ops/s $\color{#35bf28}+3.22\%$
test_cql_speed[reduce-overhead-backward] 24.5218ms 23.2130ms 43.0793 Ops/s 42.9688 Ops/s $\color{#35bf28}+0.26\%$
test_a2c_speed[False-None] 9.1324ms 7.3994ms 135.1465 Ops/s 132.3160 Ops/s $\color{#35bf28}+2.14\%$
test_a2c_speed[False-backward] 16.1642ms 14.8631ms 67.2806 Ops/s 67.9838 Ops/s $\color{#d91a1a}-1.03\%$
test_a2c_speed[True-None] 4.9050ms 4.2291ms 236.4551 Ops/s 226.3751 Ops/s $\color{#35bf28}+4.45\%$
test_a2c_speed[True-backward] 11.4049ms 10.9758ms 91.1093 Ops/s 87.7999 Ops/s $\color{#35bf28}+3.77\%$
test_a2c_speed[reduce-overhead-None] 4.6265ms 4.2163ms 237.1766 Ops/s 232.1793 Ops/s $\color{#35bf28}+2.15\%$
test_a2c_speed[reduce-overhead-backward] 11.8338ms 11.3383ms 88.1970 Ops/s 87.8386 Ops/s $\color{#35bf28}+0.41\%$
test_ppo_speed[False-None] 9.1730ms 7.8100ms 128.0405 Ops/s 121.1559 Ops/s $\textbf{\color{#35bf28}+5.68\%}$
test_ppo_speed[False-backward] 16.4153ms 15.5891ms 64.1475 Ops/s 64.5257 Ops/s $\color{#d91a1a}-0.59\%$
test_ppo_speed[True-None] 4.4048ms 3.8134ms 262.2350 Ops/s 264.9727 Ops/s $\color{#d91a1a}-1.03\%$
test_ppo_speed[True-backward] 10.5479ms 9.9718ms 100.2831 Ops/s 102.1698 Ops/s $\color{#d91a1a}-1.85\%$
test_ppo_speed[reduce-overhead-None] 4.3566ms 3.8052ms 262.7989 Ops/s 262.5827 Ops/s $\color{#35bf28}+0.08\%$
test_ppo_speed[reduce-overhead-backward] 10.5713ms 9.8209ms 101.8236 Ops/s 97.9643 Ops/s $\color{#35bf28}+3.94\%$
test_reinforce_speed[False-None] 7.9115ms 6.8780ms 145.3914 Ops/s 147.5591 Ops/s $\color{#d91a1a}-1.47\%$
test_reinforce_speed[False-backward] 10.7185ms 10.3307ms 96.7987 Ops/s 97.8597 Ops/s $\color{#d91a1a}-1.08\%$
test_reinforce_speed[True-None] 3.1517ms 2.7326ms 365.9509 Ops/s 345.3458 Ops/s $\textbf{\color{#35bf28}+5.97\%}$
test_reinforce_speed[True-backward] 9.3650ms 8.8149ms 113.4446 Ops/s 109.0049 Ops/s $\color{#35bf28}+4.07\%$
test_reinforce_speed[reduce-overhead-None] 3.0457ms 2.6977ms 370.6896 Ops/s 349.6610 Ops/s $\textbf{\color{#35bf28}+6.01\%}$
test_reinforce_speed[reduce-overhead-backward] 9.3327ms 8.8797ms 112.6164 Ops/s 108.5337 Ops/s $\color{#35bf28}+3.76\%$
test_iql_speed[False-None] 33.3469ms 32.4632ms 30.8041 Ops/s 29.8794 Ops/s $\color{#35bf28}+3.09\%$
test_iql_speed[False-backward] 46.4922ms 45.4341ms 22.0099 Ops/s 21.0961 Ops/s $\color{#35bf28}+4.33\%$
test_iql_speed[True-None] 11.5944ms 10.9658ms 91.1924 Ops/s 80.9574 Ops/s $\textbf{\color{#35bf28}+12.64\%}$
test_iql_speed[True-backward] 23.5174ms 22.5669ms 44.3126 Ops/s 40.8209 Ops/s $\textbf{\color{#35bf28}+8.55\%}$
test_iql_speed[reduce-overhead-None] 12.1372ms 10.9661ms 91.1901 Ops/s 86.4290 Ops/s $\textbf{\color{#35bf28}+5.51\%}$
test_iql_speed[reduce-overhead-backward] 22.7993ms 21.7708ms 45.9332 Ops/s 43.2114 Ops/s $\textbf{\color{#35bf28}+6.30\%}$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.6634ms 4.8648ms 205.5595 Ops/s 184.2830 Ops/s $\textbf{\color{#35bf28}+11.55\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.9305ms 0.5549ms 1.8021 KOps/s 1.6681 KOps/s $\textbf{\color{#35bf28}+8.03\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8334ms 0.4937ms 2.0256 KOps/s 1.9728 KOps/s $\color{#35bf28}+2.67\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.8258ms 4.6715ms 214.0625 Ops/s 198.2983 Ops/s $\textbf{\color{#35bf28}+7.95\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.5600ms 0.5055ms 1.9784 KOps/s 1.9161 KOps/s $\color{#35bf28}+3.25\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7781ms 0.4778ms 2.0928 KOps/s 2.0065 KOps/s $\color{#35bf28}+4.30\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.7631ms 1.6496ms 606.2076 Ops/s 567.5803 Ops/s $\textbf{\color{#35bf28}+6.81\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 2.8490ms 1.6122ms 620.2637 Ops/s 623.6216 Ops/s $\color{#d91a1a}-0.54\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.4551ms 4.9047ms 203.8874 Ops/s 189.1160 Ops/s $\textbf{\color{#35bf28}+7.81\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 3.1848ms 0.6550ms 1.5268 KOps/s 1.4726 KOps/s $\color{#35bf28}+3.68\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9209ms 0.6241ms 1.6022 KOps/s 1.5416 KOps/s $\color{#35bf28}+3.93\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.4822ms 4.7221ms 211.7709 Ops/s 195.8637 Ops/s $\textbf{\color{#35bf28}+8.12\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2.2713ms 0.5132ms 1.9486 KOps/s 1.8785 KOps/s $\color{#35bf28}+3.73\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.8717ms 0.5039ms 1.9847 KOps/s 1.9672 KOps/s $\color{#35bf28}+0.89\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 5.4834ms 4.6493ms 215.0850 Ops/s 195.7386 Ops/s $\textbf{\color{#35bf28}+9.88\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.4819ms 0.5032ms 1.9874 KOps/s 1.8620 KOps/s $\textbf{\color{#35bf28}+6.73\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.9106ms 0.4815ms 2.0769 KOps/s 1.9954 KOps/s $\color{#35bf28}+4.09\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.9395ms 4.7251ms 211.6349 Ops/s 185.7568 Ops/s $\textbf{\color{#35bf28}+13.93\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1739ms 0.6506ms 1.5370 KOps/s 1.4442 KOps/s $\textbf{\color{#35bf28}+6.42\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8372ms 0.6178ms 1.6185 KOps/s 1.4342 KOps/s $\textbf{\color{#35bf28}+12.85\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.4262s 12.6620ms 78.9766 Ops/s 34.8446 Ops/s $\textbf{\color{#35bf28}+126.65\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 7.4478ms 2.3241ms 430.2685 Ops/s 387.8829 Ops/s $\textbf{\color{#35bf28}+10.93\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 4.4526ms 1.2984ms 770.1657 Ops/s 753.0093 Ops/s $\color{#35bf28}+2.28\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 5.4490ms 4.2341ms 236.1794 Ops/s 222.5966 Ops/s $\textbf{\color{#35bf28}+6.10\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.1040ms 2.3122ms 432.4813 Ops/s 411.3339 Ops/s $\textbf{\color{#35bf28}+5.14\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 2.4490ms 1.3090ms 763.9270 Ops/s 697.6822 Ops/s $\textbf{\color{#35bf28}+9.49\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.3684s 11.7617ms 85.0219 Ops/s 216.7870 Ops/s $\textbf{\color{#d91a1a}-60.78\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 3.5237ms 2.2485ms 444.7497 Ops/s 351.3233 Ops/s $\textbf{\color{#35bf28}+26.59\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 1.9006ms 1.3557ms 737.5994 Ops/s 635.4712 Ops/s $\textbf{\color{#35bf28}+16.07\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 13.3442ms 13.1198ms 76.2208 Ops/s 69.4456 Ops/s $\textbf{\color{#35bf28}+9.76\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 16.5036ms 14.8599ms 67.2953 Ops/s 60.7055 Ops/s $\textbf{\color{#35bf28}+10.86\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 22.1208ms 21.7622ms 45.9512 Ops/s 42.6311 Ops/s $\textbf{\color{#35bf28}+7.79\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 16.8387ms 15.1120ms 66.1726 Ops/s 63.6196 Ops/s $\color{#35bf28}+4.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 23.1267ms 21.9826ms 45.4905 Ops/s 44.2041 Ops/s $\color{#35bf28}+2.91\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.5577ms 16.3898ms 61.0135 Ops/s 58.5814 Ops/s $\color{#35bf28}+4.15\%$
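
For readers checking the table by hand, the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and the bolded rows appear to be those that move by more than 5% in either direction. Counting the bolded rows above gives 30 improved and 6 worsened, which matches the summary line. Both the formula and the 5% threshold are inferred from the rows shown, not stated by the benchmark bot, and the helper names below are hypothetical. A minimal sketch under those assumptions:

```python
# Assumption: Change = (Ops - Ops on Repo HEAD) / Ops on Repo HEAD, in percent.
# Assumption: a row counts as improved/worsened when |Change| exceeds 5%.
# Both are inferred from the table rows, not taken from the benchmark tooling.

UNIT_SCALE = {"Ops/s": 1.0, "KOps/s": 1e3}


def to_ops_per_second(value: float, unit: str) -> float:
    """Normalize a table entry such as (1.0961, 'KOps/s') to plain Ops/s."""
    return value * UNIT_SCALE[unit]


def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR throughput relative to repo HEAD."""
    return (ops_pr - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = 5.0) -> str:
    """Label a row using the assumed 5% threshold."""
    if change_pct > threshold:
        return "improved"
    if change_pct < -threshold:
        return "worsened"
    return "unchanged"


# Three rows copied from the CPU table above: (name, Ops, Ops on Repo HEAD).
rows = [
    ("test_simple",
     to_ops_per_second(2.2807, "Ops/s"), to_ops_per_second(2.2167, "Ops/s")),
    ("test_dqn_speed[True-backward]",
     to_ops_per_second(1.0961, "KOps/s"), to_ops_per_second(818.5333, "Ops/s")),
    ("test_redq_speed[True-None]",
     to_ops_per_second(195.9141, "Ops/s"), to_ops_per_second(223.4048, "Ops/s")),
]

for name, ops_pr, ops_head in rows:
    change = relative_change(ops_pr, ops_head)
    print(f"{name}: {change:+.2f}% ({classify(change)})")
```

Note that rows mixing units (KOps/s against Ops/s) need the normalization step before the two columns can be compared.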

github-actions bot commented Dec 3, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 149. Improved: $\large\color{#35bf28}17$. Worsened: $\large\color{#d91a1a}6$.

Detailed results (columns: Name, Max, Mean, Ops, Ops on Repo HEAD, Change):
test_simple 0.6945s 0.6934s 1.4421 Ops/s 1.3972 Ops/s $\color{#35bf28}+3.21\%$
test_transformed 0.9417s 0.9360s 1.0684 Ops/s 1.0675 Ops/s $\color{#35bf28}+0.08\%$
test_serial 2.1379s 2.0552s 0.4866 Ops/s 0.4869 Ops/s $\color{#d91a1a}-0.06\%$
test_parallel 1.8499s 1.7937s 0.5575 Ops/s 0.5592 Ops/s $\color{#d91a1a}-0.30\%$
test_step_mdp_speed[True-True-True-True-True] 0.4663ms 39.9199μs 25.0502 KOps/s 25.9819 KOps/s $\color{#d91a1a}-3.59\%$
test_step_mdp_speed[True-True-True-True-False] 0.4325ms 22.8188μs 43.8235 KOps/s 44.9100 KOps/s $\color{#d91a1a}-2.42\%$
test_step_mdp_speed[True-True-True-False-True] 0.4064ms 21.6057μs 46.2840 KOps/s 47.4744 KOps/s $\color{#d91a1a}-2.51\%$
test_step_mdp_speed[True-True-True-False-False] 0.3556ms 12.6224μs 79.2245 KOps/s 81.9958 KOps/s $\color{#d91a1a}-3.38\%$
test_step_mdp_speed[True-True-False-True-True] 0.4414ms 42.3858μs 23.5928 KOps/s 24.3083 KOps/s $\color{#d91a1a}-2.94\%$
test_step_mdp_speed[True-True-False-True-False] 0.3852ms 25.0601μs 39.9041 KOps/s 41.0681 KOps/s $\color{#d91a1a}-2.83\%$
test_step_mdp_speed[True-True-False-False-True] 0.4298ms 24.3555μs 41.0585 KOps/s 42.4921 KOps/s $\color{#d91a1a}-3.37\%$
test_step_mdp_speed[True-True-False-False-False] 0.4036ms 14.9993μs 66.6697 KOps/s 68.9790 KOps/s $\color{#d91a1a}-3.35\%$
test_step_mdp_speed[True-False-True-True-True] 0.4398ms 44.3260μs 22.5601 KOps/s 23.0921 KOps/s $\color{#d91a1a}-2.30\%$
test_step_mdp_speed[True-False-True-True-False] 0.4132ms 27.4268μs 36.4607 KOps/s 37.5190 KOps/s $\color{#d91a1a}-2.82\%$
test_step_mdp_speed[True-False-True-False-True] 0.3495ms 23.9344μs 41.7809 KOps/s 42.0700 KOps/s $\color{#d91a1a}-0.69\%$
test_step_mdp_speed[True-False-True-False-False] 0.4181ms 14.9889μs 66.7159 KOps/s 68.6073 KOps/s $\color{#d91a1a}-2.76\%$
test_step_mdp_speed[True-False-False-True-True] 0.4505ms 45.8442μs 21.8130 KOps/s 22.3786 KOps/s $\color{#d91a1a}-2.53\%$
test_step_mdp_speed[True-False-False-True-False] 0.4722ms 29.5295μs 33.8644 KOps/s 34.6783 KOps/s $\color{#d91a1a}-2.35\%$
test_step_mdp_speed[True-False-False-False-True] 0.4076ms 26.1743μs 38.2053 KOps/s 38.9781 KOps/s $\color{#d91a1a}-1.98\%$
test_step_mdp_speed[True-False-False-False-False] 0.3849ms 17.0160μs 58.7683 KOps/s 59.9507 KOps/s $\color{#d91a1a}-1.97\%$
test_step_mdp_speed[False-True-True-True-True] 0.4397ms 44.3290μs 22.5586 KOps/s 23.3776 KOps/s $\color{#d91a1a}-3.50\%$
test_step_mdp_speed[False-True-True-True-False] 0.3954ms 27.3679μs 36.5392 KOps/s 37.7352 KOps/s $\color{#d91a1a}-3.17\%$
test_step_mdp_speed[False-True-True-False-True] 0.4250ms 28.1288μs 35.5507 KOps/s 36.0455 KOps/s $\color{#d91a1a}-1.37\%$
test_step_mdp_speed[False-True-True-False-False] 0.4487ms 16.6430μs 60.0853 KOps/s 61.3644 KOps/s $\color{#d91a1a}-2.08\%$
test_step_mdp_speed[False-True-False-True-True] 0.4410ms 46.3629μs 21.5690 KOps/s 21.7290 KOps/s $\color{#d91a1a}-0.74\%$
test_step_mdp_speed[False-True-False-True-False] 60.2640μs 29.5395μs 33.8529 KOps/s 34.0968 KOps/s $\color{#d91a1a}-0.72\%$
test_step_mdp_speed[False-True-False-False-True] 3.1827ms 30.3004μs 33.0029 KOps/s 33.4025 KOps/s $\color{#d91a1a}-1.20\%$
test_step_mdp_speed[False-True-False-False-False] 0.3689ms 18.8546μs 53.0374 KOps/s 54.2075 KOps/s $\color{#d91a1a}-2.16\%$
test_step_mdp_speed[False-False-True-True-True] 0.3964ms 48.6691μs 20.5469 KOps/s 20.7946 KOps/s $\color{#d91a1a}-1.19\%$
test_step_mdp_speed[False-False-True-True-False] 68.7840μs 32.1332μs 31.1204 KOps/s 32.0485 KOps/s $\color{#d91a1a}-2.90\%$
test_step_mdp_speed[False-False-True-False-True] 0.4398ms 30.3576μs 32.9406 KOps/s 33.4512 KOps/s $\color{#d91a1a}-1.53\%$
test_step_mdp_speed[False-False-True-False-False] 0.4139ms 18.8776μs 52.9729 KOps/s 54.2948 KOps/s $\color{#d91a1a}-2.43\%$
test_step_mdp_speed[False-False-False-True-True] 0.4265ms 50.2812μs 19.8881 KOps/s 20.3421 KOps/s $\color{#d91a1a}-2.23\%$
test_step_mdp_speed[False-False-False-True-False] 0.4380ms 34.2288μs 29.2151 KOps/s 30.2007 KOps/s $\color{#d91a1a}-3.26\%$
test_step_mdp_speed[False-False-False-False-True] 66.1440μs 31.7019μs 31.5439 KOps/s 32.4299 KOps/s $\color{#d91a1a}-2.73\%$
test_step_mdp_speed[False-False-False-False-False] 0.4250ms 20.2165μs 49.4645 KOps/s 49.3123 KOps/s $\color{#35bf28}+0.31\%$
test_values[generalized_advantage_estimate-True-True] 24.7822ms 23.6168ms 42.3428 Ops/s 41.9198 Ops/s $\color{#35bf28}+1.01\%$
test_values[vec_generalized_advantage_estimate-True-True] 0.1101s 3.0792ms 324.7629 Ops/s 349.7628 Ops/s $\textbf{\color{#d91a1a}-7.15\%}$
test_values[td0_return_estimate-False-False] 0.1009ms 77.1235μs 12.9662 KOps/s 12.8170 KOps/s $\color{#35bf28}+1.16\%$
test_values[td1_return_estimate-False-False] 54.8366ms 52.4267ms 19.0742 Ops/s 18.6369 Ops/s $\color{#35bf28}+2.35\%$
test_values[vec_td1_return_estimate-False-False] 1.2554ms 1.0542ms 948.5793 Ops/s 936.3847 Ops/s $\color{#35bf28}+1.30\%$
test_values[td_lambda_return_estimate-True-False] 86.2196ms 83.5019ms 11.9758 Ops/s 11.7536 Ops/s $\color{#35bf28}+1.89\%$
test_values[vec_td_lambda_return_estimate-True-False] 1.3007ms 1.0488ms 953.4922 Ops/s 931.9326 Ops/s $\color{#35bf28}+2.31\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.4569ms 23.6256ms 42.3269 Ops/s 41.7949 Ops/s $\color{#35bf28}+1.27\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.9981ms 0.7258ms 1.3778 KOps/s 1.3667 KOps/s $\color{#35bf28}+0.81\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7440ms 0.6444ms 1.5519 KOps/s 1.5347 KOps/s $\color{#35bf28}+1.12\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.4990ms 1.4480ms 690.5945 Ops/s 687.0648 Ops/s $\color{#35bf28}+0.51\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.7156ms 0.6561ms 1.5241 KOps/s 1.4982 KOps/s $\color{#35bf28}+1.73\%$
test_dqn_speed[False-None] 6.7869ms 1.4462ms 691.4574 Ops/s 677.3677 Ops/s $\color{#35bf28}+2.08\%$
test_dqn_speed[False-backward] 2.0703ms 2.0217ms 494.6361 Ops/s 486.0447 Ops/s $\color{#35bf28}+1.77\%$
test_dqn_speed[True-None] 0.6621ms 0.5435ms 1.8398 KOps/s 1.8382 KOps/s $\color{#35bf28}+0.09\%$
test_dqn_speed[True-backward] 1.2308ms 1.1815ms 846.4086 Ops/s 820.7980 Ops/s $\color{#35bf28}+3.12\%$
test_dqn_speed[reduce-overhead-None] 0.6144ms 0.5440ms 1.8381 KOps/s 1.7928 KOps/s $\color{#35bf28}+2.53\%$
test_dqn_speed[reduce-overhead-backward] 1.0931ms 1.0557ms 947.2178 Ops/s 917.6494 Ops/s $\color{#35bf28}+3.22\%$
test_ddpg_speed[False-None] 3.0373ms 2.7206ms 367.5608 Ops/s 359.1988 Ops/s $\color{#35bf28}+2.33\%$
test_ddpg_speed[False-backward] 4.1056ms 4.0018ms 249.8845 Ops/s 244.4286 Ops/s $\color{#35bf28}+2.23\%$
test_ddpg_speed[True-None] 1.0991ms 1.0420ms 959.7306 Ops/s 923.8671 Ops/s $\color{#35bf28}+3.88\%$
test_ddpg_speed[True-backward] 2.3674ms 2.2424ms 445.9506 Ops/s 440.4900 Ops/s $\color{#35bf28}+1.24\%$
test_ddpg_speed[reduce-overhead-None] 1.1201ms 1.0559ms 947.0355 Ops/s 926.3453 Ops/s $\color{#35bf28}+2.23\%$
test_ddpg_speed[reduce-overhead-backward] 1.7720ms 1.7348ms 576.4511 Ops/s 562.4295 Ops/s $\color{#35bf28}+2.49\%$
test_sac_speed[False-None] 8.0079ms 7.5858ms 131.8260 Ops/s 128.3721 Ops/s $\color{#35bf28}+2.69\%$
test_sac_speed[False-backward] 11.0410ms 10.6005ms 94.3348 Ops/s 92.3157 Ops/s $\color{#35bf28}+2.19\%$
test_sac_speed[True-None] 1.6232ms 1.4785ms 676.3647 Ops/s 655.2473 Ops/s $\color{#35bf28}+3.22\%$
test_sac_speed[True-backward] 3.3706ms 3.2945ms 303.5325 Ops/s 318.2442 Ops/s $\color{#d91a1a}-4.62\%$
test_sac_speed[reduce-overhead-None] 23.4232ms 12.5865ms 79.4505 Ops/s 80.1820 Ops/s $\color{#d91a1a}-0.91\%$
test_sac_speed[reduce-overhead-backward] 1.6253ms 1.4958ms 668.5236 Ops/s 745.6832 Ops/s $\textbf{\color{#d91a1a}-10.35\%}$
test_redq_speed[False-None] 7.7900ms 7.0680ms 141.4819 Ops/s 138.7277 Ops/s $\color{#35bf28}+1.99\%$
test_redq_speed[False-backward] 11.9922ms 11.0107ms 90.8204 Ops/s 93.3886 Ops/s $\color{#d91a1a}-2.75\%$
test_redq_speed[True-None] 2.2033ms 1.9046ms 525.0345 Ops/s 503.6524 Ops/s $\color{#35bf28}+4.25\%$
test_redq_speed[True-backward] 3.7756ms 3.6887ms 271.0972 Ops/s 261.7237 Ops/s $\color{#35bf28}+3.58\%$
test_redq_speed[reduce-overhead-None] 1.9824ms 1.9071ms 524.3633 Ops/s 507.4507 Ops/s $\color{#35bf28}+3.33\%$
test_redq_speed[reduce-overhead-backward] 4.0936ms 3.6842ms 271.4310 Ops/s 265.2176 Ops/s $\color{#35bf28}+2.34\%$
test_redq_deprec_speed[False-None] 9.0211ms 8.5232ms 117.3265 Ops/s 110.6883 Ops/s $\textbf{\color{#35bf28}+6.00\%}$
test_redq_deprec_speed[False-backward] 12.1687ms 11.6553ms 85.7981 Ops/s 82.3848 Ops/s $\color{#35bf28}+4.14\%$
test_redq_deprec_speed[True-None] 2.4257ms 2.2720ms 440.1390 Ops/s 438.2179 Ops/s $\color{#35bf28}+0.44\%$
test_redq_deprec_speed[True-backward] 4.2773ms 3.8708ms 258.3476 Ops/s 258.8369 Ops/s $\color{#d91a1a}-0.19\%$
test_redq_deprec_speed[reduce-overhead-None] 2.3655ms 2.2634ms 441.8059 Ops/s 440.1128 Ops/s $\color{#35bf28}+0.38\%$
test_redq_deprec_speed[reduce-overhead-backward] 3.9887ms 3.8661ms 258.6590 Ops/s 258.9618 Ops/s $\color{#d91a1a}-0.12\%$
test_td3_speed[False-None] 7.6278ms 7.5741ms 132.0284 Ops/s 131.9462 Ops/s $\color{#35bf28}+0.06\%$
test_td3_speed[False-backward] 10.3134ms 9.7854ms 102.1934 Ops/s 103.0452 Ops/s $\color{#d91a1a}-0.83\%$
test_td3_speed[True-None] 1.6104ms 1.5636ms 639.5478 Ops/s 643.3406 Ops/s $\color{#d91a1a}-0.59\%$
test_td3_speed[True-backward] 3.2077ms 3.0500ms 327.8708 Ops/s 331.3874 Ops/s $\color{#d91a1a}-1.06\%$
test_td3_speed[reduce-overhead-None] 80.2212ms 25.6408ms 39.0004 Ops/s 38.7064 Ops/s $\color{#35bf28}+0.76\%$
test_td3_speed[reduce-overhead-backward] 1.3565ms 1.2880ms 776.3919 Ops/s 768.7172 Ops/s $\color{#35bf28}+1.00\%$
test_cql_speed[False-None] 16.7025ms 16.1344ms 61.9794 Ops/s 61.7171 Ops/s $\color{#35bf28}+0.42\%$
test_cql_speed[False-backward] 21.7160ms 21.0490ms 47.5082 Ops/s 47.7611 Ops/s $\color{#d91a1a}-0.53\%$
test_cql_speed[True-None] 2.9957ms 2.8843ms 346.7018 Ops/s 349.5752 Ops/s $\color{#d91a1a}-0.82\%$
test_cql_speed[True-backward] 5.3995ms 4.9730ms 201.0870 Ops/s 202.2470 Ops/s $\color{#d91a1a}-0.57\%$
test_cql_speed[reduce-overhead-None] 21.4994ms 13.0636ms 76.5486 Ops/s 78.4297 Ops/s $\color{#d91a1a}-2.40\%$
test_cql_speed[reduce-overhead-backward] 1.7472ms 1.6797ms 595.3436 Ops/s 587.0409 Ops/s $\color{#35bf28}+1.41\%$
test_a2c_speed[False-None] 3.2235ms 3.0995ms 322.6304 Ops/s 315.7425 Ops/s $\color{#35bf28}+2.18\%$
test_a2c_speed[False-backward] 6.7265ms 6.1014ms 163.8975 Ops/s 161.9480 Ops/s $\color{#35bf28}+1.20\%$
test_a2c_speed[True-None] 1.1603ms 1.0010ms 999.0491 Ops/s 1.0071 KOps/s $\color{#d91a1a}-0.80\%$
test_a2c_speed[True-backward] 2.7262ms 2.6800ms 373.1315 Ops/s 363.0167 Ops/s $\color{#35bf28}+2.79\%$
test_a2c_speed[reduce-overhead-None] 21.2173ms 11.4614ms 87.2492 Ops/s 89.8285 Ops/s $\color{#d91a1a}-2.87\%$
test_a2c_speed[reduce-overhead-backward] 1.2104ms 1.1356ms 880.6077 Ops/s 875.1958 Ops/s $\color{#35bf28}+0.62\%$
test_ppo_speed[False-None] 3.6084ms 3.4722ms 288.0028 Ops/s 272.4633 Ops/s $\textbf{\color{#35bf28}+5.70\%}$
test_ppo_speed[False-backward] 7.1122ms 6.7201ms 148.8072 Ops/s 146.8572 Ops/s $\color{#35bf28}+1.33\%$
test_ppo_speed[True-None] 1.1221ms 0.9400ms 1.0638 KOps/s 1.0546 KOps/s $\color{#35bf28}+0.88\%$
test_ppo_speed[True-backward] 2.9873ms 2.6319ms 379.9515 Ops/s 397.8325 Ops/s $\color{#d91a1a}-4.49\%$
test_ppo_speed[reduce-overhead-None] 0.5571ms 0.4994ms 2.0026 KOps/s 1.8724 KOps/s $\textbf{\color{#35bf28}+6.95\%}$
test_ppo_speed[reduce-overhead-backward] 1.1481ms 1.0954ms 912.8955 Ops/s 1.0165 KOps/s $\textbf{\color{#d91a1a}-10.19\%}$
test_reinforce_speed[False-None] 2.2606ms 2.1547ms 464.1046 Ops/s 449.9099 Ops/s $\color{#35bf28}+3.16\%$
test_reinforce_speed[False-backward] 3.2393ms 3.1993ms 312.5719 Ops/s 315.8017 Ops/s $\color{#d91a1a}-1.02\%$
test_reinforce_speed[True-None] 0.9316ms 0.8053ms 1.2417 KOps/s 1.1948 KOps/s $\color{#35bf28}+3.93\%$
test_reinforce_speed[True-backward] 2.5180ms 2.4696ms 404.9170 Ops/s 416.3912 Ops/s $\color{#d91a1a}-2.76\%$
test_reinforce_speed[reduce-overhead-None] 21.3497ms 11.3940ms 87.7658 Ops/s 89.5045 Ops/s $\color{#d91a1a}-1.94\%$
test_reinforce_speed[reduce-overhead-backward] 1.3425ms 1.1780ms 848.9322 Ops/s 951.8785 Ops/s $\textbf{\color{#d91a1a}-10.82\%}$
test_iql_speed[False-None] 9.5644ms 8.9587ms 111.6230 Ops/s 111.2186 Ops/s $\color{#35bf28}+0.36\%$
test_iql_speed[False-backward] 13.2251ms 12.6059ms 79.3278 Ops/s 80.0084 Ops/s $\color{#d91a1a}-0.85\%$
test_iql_speed[True-None] 1.8231ms 1.6992ms 588.5266 Ops/s 588.3990 Ops/s $\color{#35bf28}+0.02\%$
test_iql_speed[True-backward] 4.3870ms 4.0327ms 247.9702 Ops/s 233.0301 Ops/s $\textbf{\color{#35bf28}+6.41\%}$
test_iql_speed[reduce-overhead-None] 19.4545ms 11.2158ms 89.1596 Ops/s 90.6450 Ops/s $\color{#d91a1a}-1.64\%$
test_iql_speed[reduce-overhead-backward] 1.7138ms 1.5992ms 625.3138 Ops/s 634.7034 Ops/s $\color{#d91a1a}-1.48\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 7.9296ms 6.3437ms 157.6359 Ops/s 158.7336 Ops/s $\color{#d91a1a}-0.69\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.5457ms 0.2931ms 3.4121 KOps/s 2.9991 KOps/s $\textbf{\color{#35bf28}+13.77\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.4645ms 0.2892ms 3.4583 KOps/s 3.6333 KOps/s $\color{#d91a1a}-4.82\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.4506ms 5.9555ms 167.9124 Ops/s 166.3861 Ops/s $\color{#35bf28}+0.92\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2.0608ms 0.2702ms 3.7004 KOps/s 3.4618 KOps/s $\textbf{\color{#35bf28}+6.89\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5336ms 0.2741ms 3.6483 KOps/s 3.4520 KOps/s $\textbf{\color{#35bf28}+5.69\%}$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 1.4884ms 1.2875ms 776.6840 Ops/s 834.3013 Ops/s $\textbf{\color{#d91a1a}-6.91\%}$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 1.5962ms 1.2353ms 809.5078 Ops/s 824.3974 Ops/s $\color{#d91a1a}-1.81\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.6292ms 6.2740ms 159.3883 Ops/s 162.9109 Ops/s $\color{#d91a1a}-2.16\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2.3071ms 0.4630ms 2.1600 KOps/s 2.1366 KOps/s $\color{#35bf28}+1.09\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.6014ms 0.3938ms 2.5391 KOps/s 2.4095 KOps/s $\textbf{\color{#35bf28}+5.38\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 6.3160ms 6.0936ms 164.1062 Ops/s 167.6341 Ops/s $\color{#d91a1a}-2.10\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.6051ms 0.3094ms 3.2319 KOps/s 2.9493 KOps/s $\textbf{\color{#35bf28}+9.58\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.5139ms 0.2550ms 3.9217 KOps/s 3.1020 KOps/s $\textbf{\color{#35bf28}+26.42\%}$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 6.3808ms 6.0396ms 165.5749 Ops/s 166.7909 Ops/s $\color{#d91a1a}-0.73\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.8298ms 0.3059ms 3.2691 KOps/s 3.0762 KOps/s $\textbf{\color{#35bf28}+6.27\%}$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.5136ms 0.2693ms 3.7134 KOps/s 3.6102 KOps/s $\color{#35bf28}+2.86\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 6.2255ms 6.0733ms 164.6540 Ops/s 161.2171 Ops/s $\color{#35bf28}+2.13\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 0.8700ms 0.3939ms 2.5387 KOps/s 2.2967 KOps/s $\textbf{\color{#35bf28}+10.54\%}$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.5788ms 0.3724ms 2.6851 KOps/s 2.4326 KOps/s $\textbf{\color{#35bf28}+10.38\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 6.9195ms 5.3069ms 188.4331 Ops/s 188.1191 Ops/s $\color{#35bf28}+0.17\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 3.8414ms 1.8922ms 528.4861 Ops/s 444.0941 Ops/s $\textbf{\color{#35bf28}+19.00\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 8.6838ms 1.2038ms 830.7106 Ops/s 866.8547 Ops/s $\color{#d91a1a}-4.17\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 7.7028ms 5.4100ms 184.8422 Ops/s 188.7780 Ops/s $\color{#d91a1a}-2.08\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 8.2612ms 2.0050ms 498.7615 Ops/s 442.5517 Ops/s $\textbf{\color{#35bf28}+12.70\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.6900ms 1.2296ms 813.2667 Ops/s 883.4691 Ops/s $\textbf{\color{#d91a1a}-7.95\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.5167s 15.9015ms 62.8872 Ops/s 32.9251 Ops/s $\textbf{\color{#35bf28}+91.00\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 10.0707ms 2.1903ms 456.5580 Ops/s 459.8704 Ops/s $\color{#d91a1a}-0.72\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 7.2002ms 1.3012ms 768.5343 Ops/s 719.0874 Ops/s $\textbf{\color{#35bf28}+6.88\%}$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-True] 15.1972ms 14.6959ms 68.0460 Ops/s 68.0542 Ops/s $\color{#d91a1a}-0.01\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 19.4982ms 16.9637ms 58.9493 Ops/s 59.2672 Ops/s $\color{#d91a1a}-0.54\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-True] 20.1233ms 19.2591ms 51.9235 Ops/s 50.0154 Ops/s $\color{#35bf28}+3.82\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 19.9131ms 17.2040ms 58.1260 Ops/s 57.5307 Ops/s $\color{#35bf28}+1.03\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 20.0887ms 19.2647ms 51.9084 Ops/s 50.0612 Ops/s $\color{#35bf28}+3.69\%$
test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-False] 20.5109ms 18.5580ms 53.8851 Ops/s 53.5666 Ops/s $\color{#35bf28}+0.59\%$

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 3, 2024
ghstack-source-id: aff610c34d130f62b2a7a4cc859b36d1c6e6bed9
Pull Request resolved: #2623
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 4, 2024
ghstack-source-id: 45e29fe6418e57ceb5997f9547d9e52a356302c0
Pull Request resolved: #2623
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 4, 2024
ghstack-source-id: f7257a61ce2443b6edbbcc064da8b3efc0483a95
Pull Request resolved: #2623
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 4, 2024
ghstack-source-id: 578b2b0d8e5278dae37a56b8cb04ec6549822ae6
Pull Request resolved: #2623
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Dec 4, 2024
ghstack-source-id: 396baef4490d010cf55171280d6382257a25577f
Pull Request resolved: #2623
[ghstack-poisoned]
@vmoens vmoens mentioned this pull request Dec 19, 2024
Comment on lines +31 to +34
plotly
igraph
transformers
datasets
Collaborator

@kurtamohler kurtamohler Jan 13, 2025

Might be good to mention in the tutorial that these need to be installed, along with anything else that may not be installed in a user's env by default, like `tqdm`. I also found that I needed `nbformat` installed to generate the plot of the tree.
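A minimal sketch of how the tutorial could surface this up front, assuming each package's import name matches the name listed above (true for these as far as I know, but worth double-checking for `igraph`). The helper name `missing_packages` is hypothetical and not part of the PR:

```python
from importlib.util import find_spec

# Import names taken from the requirement lines and the comment above.
# Assumption: the pip name and the import name are the same for each entry.
TUTORIAL_DEPS = ["plotly", "igraph", "transformers", "datasets", "tqdm", "nbformat"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this env."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(TUTORIAL_DEPS)
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
```

Running a check like this in the first cell would let a reader see every missing dependency at once instead of hitting import errors one by one.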


def select_unique_obs(td):
    # Get the obs (the hash)
    hashes = td["hash"]
Collaborator

@kurtamohler kurtamohler Jan 13, 2025

Looks like LLMHashingEnv uses the key "hashing", not "hash", so this currently fails in the generated tutorial: https://docs-preview.pytorch.org/pytorch/rl/2623/tutorials/beam_search_with_gpt.html#run-policy

vmoens
Contributor Author

Good point!
I think we should avoid using "hash" as a name for a hashing value, because it can shadow the built-in `hash` in some cases (e.g. `hash = do_smth()`).
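For illustration, a small sketch of the shadowing problem, using only the standard library. The function name `shadowing_demo` and the value `12345` are made up for the example:

```python
import builtins


def shadowing_demo():
    """Show what happens once a local variable is named `hash`."""
    hash = 12345  # this local name now hides the built-in inside the function
    try:
        hash("abc")  # tries to call the int, not the built-in function
    except TypeError as err:
        return str(err)
    return "no error"


if __name__ == "__main__":
    print(shadowing_demo())
    # The built-in itself is untouched and still reachable explicitly.
    print(isinstance(builtins.hash("abc"), int))
```

A string key such as `"hash"` does not shadow anything by itself; the risk appears when the value is later unpacked into a variable of the same name.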

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. tutorials
3 participants