Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rsz: Try multiple repairs per iteration #6262

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

kbieganski
Copy link
Contributor

@kbieganski kbieganski commented Nov 28, 2024

This makes RSZ try multiple fixes at once per iteration.
If the slack is worsened, fall back to one repair (old behavior).
Speeds up some specific designs significantly without really slowing anything else down.
The biggest gains I got were on a design that I can't share, unfortunately.

It follows a heuristic where the number of repairs is proprortional to the currently repaired path slack, with a cap of 10 repairs.
Probably not the best approach possible, but I tried multiple strategies and this one is the best so far.

Note: The code needs cleaning up, but I'll do that after I get some feedback.

asap7/<secret design 1>, floorplan

Branch Time [s] Relative WS TNS Power
Optimized 641.23 1.00 1.57 0.00 4.46
Baseline 1031.58 1.61 0.77 -3.64 4.46

asap7/<secret design 2>, floorplan

Branch Time [s] Relative WS TNS Power
Optimized 4882.00 1.00 0.71 -12.16 45.86
Baseline 8534.00 1.75 -22.62 -18.86 45.86

asap7/aes, floorplan

Branch Time [s] Relative WS TNS Power
Optimized 38.53 1.00 -36.44 -1088.10 0.11
Baseline 38.43 1.00 -36.44 -1088.10 0.11

asap7/aes, floorplan

Branch Time [s] Relative WS TNS Power
Optimized 13.92 1.00 0.09 -0.11 0.04
Baseline 14.16 1.02 0.09 -0.04 0.04

asap7/aes, floorplan

Branch Time [s] Relative WS TNS Power
Optimized 64.13 1.00 -0.08 -1.94 0.31
Baseline 70.62 1.10 -0.07 -1.89 0.31

asap7/aes, floorplan

Branch Time [s] Relative WS TNS Power
Optimized 55.13 1.01 0.32 0.00 0.13
Baseline 54.34 1.00 0.32 0.00 0.13

asap7/aes, floorplan

Branch Time [s] Relative WS TNS Power
Optimized 40.75 1.00 0.00 0.00 0.06
Baseline 40.97 1.01 0.00 -0.00 0.06

@kbieganski kbieganski force-pushed the rsz-multiple-repairs-per-iter branch from 390c298 to 4504aab Compare November 28, 2024 14:01
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/rsz/src/RepairSetup.cc Outdated Show resolved Hide resolved
@kbieganski kbieganski marked this pull request as draft November 28, 2024 15:50
@kbieganski kbieganski force-pushed the rsz-multiple-repairs-per-iter branch from 4504aab to 430d571 Compare December 4, 2024 10:00
Copy link
Contributor

github-actions bot commented Dec 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@kbieganski kbieganski force-pushed the rsz-multiple-repairs-per-iter branch from 430d571 to 439c31b Compare December 4, 2024 13:48
Copy link
Contributor

github-actions bot commented Dec 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@kbieganski kbieganski force-pushed the rsz-multiple-repairs-per-iter branch from 439c31b to e74e2f7 Compare December 4, 2024 21:40
Copy link
Contributor

github-actions bot commented Dec 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@kbieganski kbieganski marked this pull request as ready for review December 5, 2024 09:53
@povik
Copy link
Contributor

povik commented Dec 11, 2024

@kbieganski If you are interested in repair_timing performance see parallaxsw/OpenSTA#140

@QuantamHD
Copy link
Collaborator

@maliberty ready for review again

@maliberty
Copy link
Member

@precisionmoon please review

@precisionmoon
Copy link
Collaborator

@precisionmoon please review

Ok. Let me review the changes. @kbieganski, were you able to run ORFS tests? I see some small degradations in unit tests, so I'm curious what you see in other public PDK designs beyond asap7.

@@ -309,6 +309,15 @@ bool Resizer::bufferBetweenPorts(Instance* buffer)
bool Resizer::removeBuffer(Instance* buffer,
bool honorDontTouchFixed,
bool recordJournal)
{
if (canRemoveBuffer(buffer, honorDontTouchFixed)) {
removeBuffer2(buffer, recordJournal);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be good to rename removeBuffer2 to something like doRemoveBuffer or removeBufferCore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I plan on doing that, I wanted to get some general feedback first.

@kbieganski
Copy link
Contributor Author

@kbieganski, were you able to run ORFS tests?

I also tested nangate45, I believe there was no significant difference. I'll dig those results up/rerun those tests and add them here.

@maliberty
Copy link
Member

I merged master to a copy of this branch and started a secure CI.

Comment on lines 480 to 420
worst slack -3.94
worst slack -4.44
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-trivial degradation.

Comment on lines 392 to 393
in_db_net->setDoNotTouch(false);
out_db_net->setDoNotTouch(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be restored when done to not lose user intent later in the flow.

@@ -537,7 +539,19 @@ bool RepairSetup::repairPath(PathRef& path,
|| (pair1.second == pair2.second && pair1.first > pair2.first);
});
// Attack gates with largest load delays first.
int repairs_per_iter = std::round(std::max(-path_slack * 10e9f, 1.0f));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an assumption that slack is in ns? This is dependent on the Liberty units and may vary by PDK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a constant that worked pretty well.
The final version was going to either have this factor configurable via a parameter, or have the number of repairs always be the maximum (10 here) for the first endpoint, and then linearly go down to 1 for the final endpoints.

@maliberty
Copy link
Member

In the public CI:

------------------------------
       Failing designs
------------------------------
asap7 mock-alu (base)
  Flow reached last stage.
  Found 1 metrics failures.
      [ERROR] finish__timing__setup__ws fail test: -549.646 >= -537.65

gf180 aes-hybrid (base)
  Flow reached last stage.
  Found 1 metrics failures.
      [ERROR] finish__timing__setup__ws fail test: -1.56671 >= -1.38

gf180 aes (base)
  Flow reached last stage.
  Found 1 metrics failures.
      [ERROR] finish__timing__setup__ws fail test: -1.28034 >= -1.17

gf180 ibex (base)
  Flow reached last stage.
  Found 1 metrics failures.
      [ERROR] finish__timing__drv__setup_violation_count fail test: 920.0 <= 794.0

nangate45 bp_fe (base)
  Flow reached last stage.
  Found 2 metrics failures.
      [ERROR] finish__timing__setup__ws fail test: -0.868694 >= -0.68
      [ERROR] finish__timing__wns_percent_delay fail test: -37.627236 >= -35.7

nangate45 swerv_wrapper (base)
  Last log file 3_3_place_gp.log
  Found 1 errors in the logs.
      [ERROR GPL-0307] RePlAce divergence detected. Re-run with a smaller max_phi_cof value.

sky130hd gcd (base)
  Flow reached last stage.
  Found 1 metrics failures.
      [ERROR] finish__timing__drv__setup_violation_count fail test: 43.0 <= 40.0

sky130hd microwatt (base)
  Last log file 5_1_grt.log
  Found 1 errors in the logs.
      [ERROR GRT-0116] Global routing finished with congestion. Check the congestion regions in the DRC Viewer.

The non-metrics failures should be investigated first.

@precisionmoon
Copy link
Collaborator

This is an interesting approach. Besides the CI test failures from @maliberty that need review, I have some additional questions:

  1. This approach tries multiple transforms at once. Does it lead to an over-fixing for one driver?
    For example, buffering, sizing, pin swap and cloning can all happen on the same driver. But only sizing may have been sufficient.
  2. Also, can this lead to under-fixing for the overall design? This approach may do aggressive fixing for a few drivers and can get stuck in local minima early. TNS degradation in repair_fanout7 from -294.3 to -412.3 may be due to this. Also, delayed buffer removal may make it harder to estimate its impact.
  3. Can we measure QoR at the end of the flow instead of at floorplanning? FP results can be misleading because this doesn't reflect the true QoR at the end.

@kbieganski
Copy link
Contributor Author

This approach tries multiple transforms at once. Does it lead to an over-fixing for one driver?

Do you mean one driver or one path? The idea is that you need multiple repairs for the worst path of the worst endpoint anyway, and then the number of repairs goes down when we get to paths with a better slack.

Can we measure QoR at the end of the flow instead of at floorplanning?

I'll try and get some measurements.

@kbieganski
Copy link
Contributor Author

@maliberty Where can I see those failures? Or do you mean private CI? Can't see it in Jenkins.

@maliberty
Copy link
Member

That particular pipeline is private but the test cases are all in ORFS so you can run them yourself.

@precisionmoon
Copy link
Collaborator

This approach tries multiple transforms at once. Does it lead to an over-fixing for one driver?

Do you mean one driver or one path? The idea is that you need multiple repairs for the worst path of the worst endpoint anyway, and then the number of repairs goes down when we get to paths with a better slack.

One driver. Currently, only one transform is tried per driver. But you will allow multiple ones.

Can we measure QoR at the end of the flow instead of at floorplanning?

I'll try and get some measurements.

Thank you.

@kbieganski
Copy link
Contributor Author

kbieganski commented Dec 19, 2024

One driver. Currently, only one transform is tried per driver. But you will allow multiple ones.

I think I'm confused by the terminology. In the code, we iterate over driver vertices in a given path. Only one transform is allowed per each vertex because if it's successful, it goes to the next iteration immediately.

Full flow results for asap7/ibex:

branch power_internal power_switching power_leakage power_total total_neg_slack worst_slack
Baseline 0.0264367 0.0280711 2.07997e-06 0.0545099 -681.398 -62.5979
This PR 0.0264367 0.0280711 2.07997e-06 0.0545099 -681.398 -62.5979

For nangate45/ibex:

branch power_internal power_switching power_leakage power_total total_neg_slack worst_slack
Baseline 0.0530963 0.0498528 0.000808576 0.103758 -74.3429 -0.278099
This PR 0.052714 0.0488411 0.000799717 0.102355 -44.9603 -0.184015

asap7/aes:

branch power_internal power_switching power_switching power_total total_neg_slack worst_slack time
Baseline 0.060006 0.0897325 0.0897325 0.14974 -3135.11 -44.9717 96.3463
This PR 0.060006 0.0897325 0.0897325 0.14974 -3135.11 -44.9717 95.4654

nangate45/aes:

branch power_internal power_switching power_switching power_total total_neg_slack worst_slack time
Baseline 0.24709 0.25256 0.25256 0.500457 -9.28 -0.131091 186.176
This PR 0.249503 0.254516 0.254516 0.50483 -7.16125 -0.119882 171.022

Also I think this is a mode that should only be enabled in floorplanning, as later on it doesn't seem to speed anything up.

@kbieganski
Copy link
Contributor Author

@maliberty Is there a particular way I should run these? If I run e.g.

make DESIGN_CONFIG=designs/sky130hd/microwatt/config.mk

congestion is ok:

[INFO GRT-0096] Final congestion report:
Layer         Resource        Demand        Usage (%)    Max H / Max V / Total Overflow
---------------------------------------------------------------------------------------
li1                  0             0            0.00%             0 /  0 /  0
met1           1537989        645858           41.99%             0 /  0 /  0
met2           1579174        696075           44.08%             0 /  0 /  0
met3           1044188        197190           18.88%             0 /  0 /  0
met4            475065        149522           31.47%             0 /  0 /  0
met5            137620          5504            4.00%             0 /  0 /  0
---------------------------------------------------------------------------------------
Total          4774036       1694149           35.49%             0 /  0 /  0

(unless I'm misinterpreting it)
And the SweRV one is successful as well.

@maliberty
Copy link
Member

You want the metadata target to check against the limits. Usually I use make finish metadata

@kbieganski kbieganski force-pushed the rsz-multiple-repairs-per-iter branch from e74e2f7 to 53bcae0 Compare January 3, 2025 22:58
@kbieganski kbieganski force-pushed the rsz-multiple-repairs-per-iter branch from 53bcae0 to 5b71e5b Compare January 8, 2025 12:52
Copy link
Contributor

github-actions bot commented Jan 8, 2025

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Krzysztof Bieganski <[email protected]>
Signed-off-by: Krzysztof Bieganski <[email protected]>
@kbieganski kbieganski force-pushed the rsz-multiple-repairs-per-iter branch from 5b71e5b to 3a7b658 Compare January 8, 2025 13:24
Copy link
Contributor

github-actions bot commented Jan 8, 2025

clang-tidy review says "All clean, LGTM! 👍"

Signed-off-by: Krzysztof Bieganski <[email protected]>
Copy link
Contributor

github-actions bot commented Jan 8, 2025

clang-tidy review says "All clean, LGTM! 👍"

@kbieganski
Copy link
Contributor Author

I put this feature behind a flag, and added separate tests for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants