Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

false hold violations and false negatives on setup violations in floorplan due to mixing propagated clock in macros and ideal clock of top level #6181

Closed
oharboe opened this issue Nov 18, 2024 · 49 comments · Fixed by #6458
Assignees
Labels
sta Static Timing Analysis

Comments

@oharboe
Copy link
Collaborator

oharboe commented Nov 18, 2024

Description

In megaboom there are macros that have a non-trivial network clock insertion latency internally. This causes the floorplan with 0 clock network insertion latency at the top level to believe that there are hold violations.

megaboom uses HOLD_SLACK_MARGIN=-300, so no hold cells are inserted in this case.

In the past, I have seen megaboom gnaw on the problem of how to get rid of these hold violations for a very long time.

image

Doesn't this case false negatives on setup violations too?

Suggested Solution

Unsure. Modify timing to set network clock insertion latency of macros to 0 instead of using a mix of ideal and propagated clock.

Additional Context

No response

@oharboe oharboe changed the title Deal with bogus hold violations in floorplan false hold violations and false negatives on setup violations in floorplan due to mixing propagated clock in macros and ideal clock of top level Nov 18, 2024
@maliberty
Copy link
Member

My plan has been to turn off hold fixing in floorplan which should make this moot.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 20, 2024

My plan has been to turn off hold fixing in floorplan which should make this moot.

What about false setup violations?

If the toplevel has 0 clock latency(ideal clock) and the macro has propagated clock and significant clock insertion latency then there can easily be a false setup violation during floorplan timing repair.

@maliberty
Copy link
Member

Whether it helps or hurts depends if it is on the launching or capturing end of the path. Its a good question for @tspyrou as to what is the normal sta handling here.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 21, 2024

Whether it helps or hurts depends if it is on the launching or capturing end of the path. Its a good question for @tspyrou as to what is the normal sta handling here.

I see. I didn't think about the case where a signal is captured inside a macro, but yes: there you get false setup non-violations of course.

@tspyrou
Copy link
Contributor

tspyrou commented Nov 23, 2024

I think turning off propagated clocks until after cts would help with this. Hold fixing could also be delayed until after cts.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 23, 2024

I think turning off propagated clocks until after cts would help with this. Hold fixing could also be delayed until after cts.

But the macros are already past CTS and they have a clock network insertion latency regardless of whether clocks propagation is turned off, no?

Also, you skipped past the question of what about false timing violations in setup due to network insertion latency in macros.

@tspyrou
Copy link
Contributor

tspyrou commented Nov 23, 2024

I dont think the network insertion delay would be propagated if propagated clocks are off. @precisionmoon can you comment?

@precisionmoon
Copy link
Contributor

Correct, macro insertion delays should not count in idea l clock mode. Is this not what @oharboe is seeing?

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 23, 2024

Correct, macro insertion delays should not count in idea l clock mode. Is this not what @oharboe is seeing?

In floorplan, I see that false hold violations are being repaired, which takes "forever", but I dont have an actual example of false setup violations being repaired.

Next step is that someone who understands this better than me creates a PR for floorplan.tcl to disable hold repair and not to propagate the clock and all should be well?

@maliberty
Copy link
Member

The clocks are not propagated in floorplan already. I have a PR to disable hold fixing but if you are right the problem could affect setup as well.

If you "see that false hold violations are being repaired" then you should have an example based on what you see.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 23, 2024

Didnt @rovinski say something like: macros dont have clock insertion latency, only setup and hold requirements"?

If that is the case, why does it matter if you propagate the clock or not?

I will create some examples next time I run across them.

@rovinski
Copy link
Collaborator

Macros can have clock insertion annotated on them, but they don't have to. All of that info is captured within the setup and hold requirements of the block terms.

@maliberty
Copy link
Member

If you simply move the clock delay into setup/hold it doesn't solve the problem at all. It just become part of another component and there is no hope to remove it from those.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 24, 2024

Macros can have clock insertion annotated on them, but they don't have to. All of that info is captured within the setup and hold requirements of the block terms.

If I understand you correctly, the clock insertion delay is, and must per .lib format definition, captured within the setup and hold requirements. The clock insertion delay is just an optional annotation.

@maliberty
Copy link
Member

Yes but if you want to adjust timing for ideal clocks it would have to be there.

@maliberty maliberty added the sta Static Timing Analysis label Nov 24, 2024
@maliberty
Copy link
Member

I have nothing useful to add here so I'll leave it up to @tspyrou

@tspyrou
Copy link
Contributor

tspyrou commented Nov 24, 2024

I would like to see the report_checks with full clock details for the false hold violation.

@rovinski
Copy link
Collaborator

If I understand you correctly, the clock insertion delay is, and must per .lib format definition, captured within the setup and hold requirements. The clock insertion delay is just an optional annotation.

Setup and hold requirements in the .lib are specified as a timing arc from another pin (i.e. the clock). So for example, if you had a register with 0 clock insertion delay inside a block, the constraints might look like:

Setup time: 4 ps
Hold time: 2 ps

But if you have an internal clock tree with 100 ps of insertion delay, the same constraint would now look like:

setup time: (4 - 100) = -96 ps
hold time: (100 + 2) = 102 ps

These are both specified relative to the clock pin so these constraints are saying

  • Setup: the signal must arrive at the signal pin earlier than -96 ps before the clock arrives at the clock pin (earlier than 96 ps after the clock)
  • Hold: the signal must remain stable for 102 ps after the clock

image

So, the effect of the clock insertion delay is completely captured within the setup and hold constraints. They do not need to be annotated separately for proper timing closure, although it can be useful to know. You might be able to roughly approximate the clock insertion latency by taking the midpoint between the negative setup time and hold time, for example,

(-(-96 ps) + 102 ps)/2 = 99 ps

Which is close to the actual insertion delay of 100 ps.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 25, 2024

@rovinski Thanks! This is my understanding too. What I don't understand, exactly, is what it means to do timing repair in floorplan, before a clock tree exists, in the case of a macro that has a large clock network insertion latency encoded into its setup/hold time requirements.

To my mind, if the clock is ideal, then that creates a mismatch between the macro and the floorplan's ideal clock. I don't know what it means to propagate a clock before CTS.

It could be that I don't understand things I don't need to understand, it could be that ORFS/openroad needs to change, I don't know.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 25, 2024

To reproduce, open up floorplan in mock-array:

make DESIGN_CONFIG=designs/asap7/mock-array/config.mk synth do-2_1_floorplan gui_2_1_floorplan

Below is what I believe to be a false setup violation. 156.17 156.17 ^ ces_0_4/io_lsbOuts_3 (Element) includes ca. 70ps of clock insertion delay.

The same problem can occur for a path starting/ending in a macro or hold violations going into a macro.

>>> report_checks -path_group reg2reg
Startpoint: ces_0_4 (rising edge-triggered flip-flop clocked by clock)
Endpoint: REG_0$_DFF_P_ (rising edge-triggered flip-flop clocked by clock)
Path Group: reg2reg
Path Type: max

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock clock (rise edge)
   0.00    0.00   clock network delay (ideal)
   0.00    0.00 ^ ces_0_4/clock (Element)
 156.17  156.17 ^ ces_0_4/io_lsbOuts_3 (Element)
  41.44  197.61 ^ ces_0_5/io_lsbOuts_2 (Element)
  39.23  236.83 ^ ces_0_6/io_lsbOuts_1 (Element)
  29.56  266.39 ^ ces_0_7/io_lsbOuts_0 (Element)
   0.00  266.39 ^ REG_0$_DFF_P_/D (DFFHQNx1_ASAP7_75t_R)
         266.39   data arrival time

 250.00  250.00   clock clock (rise edge)
   0.00  250.00   clock network delay (ideal)
   0.00  250.00   clock reconvergence pessimism
         250.00 ^ REG_0$_DFF_P_/CLK (DFFHQNx1_ASAP7_75t_R)
 -14.96  235.04   library setup time
         235.04   data required time
---------------------------------------------------------
         235.04   data required time
        -266.39   data arrival time
---------------------------------------------------------
         -31.35   slack (VIOLATED)

If I look at the same path after CTS, no violation:

>>> report_checks -to {REG_0$_DFF_P_/D}
Startpoint: ces_0_4 (rising edge-triggered flip-flop clocked by clock)
Endpoint: REG_0$_DFF_P_ (rising edge-triggered flip-flop clocked by clock)
Path Group: reg2reg
Path Type: max

  Delay    Time   Description
---------------------------------------------------------
   0.00    0.00   clock clock (rise edge)
 323.93  323.93   clock network delay (propagated)
   0.00  323.93 ^ ces_0_4/clock (Element)
 156.18  480.11 ^ ces_0_4/io_lsbOuts_3 (Element)
  41.45  521.56 ^ ces_0_5/io_lsbOuts_2 (Element)
  39.24  560.79 ^ ces_0_6/io_lsbOuts_1 (Element)
  31.66  592.45 ^ ces_0_7/io_lsbOuts_0 (Element)
   0.33  592.78 ^ REG_0$_DFF_P_/D (DFFHQNx2_ASAP7_75t_R)
         592.78   data arrival time

 250.00  250.00   clock clock (rise edge)
 374.26  624.26   clock network delay (propagated)
   2.82  627.08   clock reconvergence pessimism
         627.08 ^ REG_0$_DFF_P_/CLK (DFFHQNx2_ASAP7_75t_R)
  -5.90  621.18   library setup time
         621.18   data required time
---------------------------------------------------------
         621.18   data required time
        -592.78   data arrival time
---------------------------------------------------------
          28.39   slack (MET)

Note that the clock tree has a significant amount of skew, between flops and macros, so the effect should be even more announced with less skew. The skew for the path above is ca. 323+70(macro clock insertion latency as from my inspection of the Element macro) - 374 = 19. So little skew for this particular path.

So, near as I can see, floorplan had an error of 28+19+31.35 = 80ps for this particular path w.r.t. its policy w.r.t. repairing or not repairing.

image

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 25, 2024

@tspyrou You wanted an example of false hold violations... These violations do not exist in CTS.

untar and run mock-array-false-hold-violations.tar.gz

based on The-OpenROAD-Project/OpenROAD-flow-scripts#2589

[deleted]
---------------------------------------------------------------------------------------------------
[WARNING RSZ-0062] Unable to repair all setup violations.
[INFO RSZ-0046] Found 2048 endpoints with hold violations.
Iteration | Resized | Buffers | Cloned Gates |   WNS   |   TNS   | Endpoint
---------------------------------------------------------------------------
        0 |       0 |       0 |            0 | -44.432 | -75327.305 | ces_0_0/io_ins_up[56]
    final |       0 |    1678 |            0 | -37.475 | -41988.176 | ces_7_7/io_ins_left[32]
---------------------------------------------------------------------------
[WARNING RSZ-0066] Unable to repair all hold violations.
[INFO RSZ-0032] Inserted 1678 hold buffers.
[ERROR RSZ-0060] Max buffer count reached.
Error: floorplan.tcl, 112 RSZ-0060
openroad> 

@maliberty
Copy link
Member

I think of it as
Ideal Clocks

If there were no macro boundary then the delays of the clock buffers would all be zero in ideal clock mode. However since they are inside the macro they are baked into the setup/hold time of the macro pin relative to the clock pin of the macro. Since we record the delay from clk to the C ff's clock we can remove it in ideal clock mode from the setup/hold and accomplish the same as the flat case. If you don't have that data I see no hope but with it we still need to account for it in sta.

@maliberty
Copy link
Member

@precisionmoon
Copy link
Contributor

@rovinski, my understanding is that macro cell insertion delays don't impact setup/hold times for commercial signoff timers. Can you confirm this? Also, the default for commercial timing model extractor is not to generate insertion delays. For OpenSTA, the default is to generate them. For pre-CTS, it makes sense to use timing models without insertion delays and switch to models with insertion delays after CTS.

@maliberty
Copy link
Member

The setup/hold have to include the clock insertion delay otherwise it wouldn't work with propagated clocks, no?

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 26, 2024

@maliberty So you're thinking about creating an abstract from floorplan, then switch the abstract at CTS time?

So continuing:

(The "make do-" prefixes are to work around the make dependencies which would redo stages I don't want redone when I'm mocking floorplan abstract being used in timing repair at the top level)

ABSTRACT_SOURCE=2_floorplan make DESIGN_CONFIG=designs/asap7/mock-array/Element/config.mk floorplan do-generate_abstract
make DESIGN_CONFIG=designs/asap7/mock-array/config.mk place

This works, no false setup or hold violations in floorplan, nor in repair_timing in place:

repair_timing -verbose -setup_margin 0 -repair_tns 100
[WARNING RSZ-0021] no estimated parasitics. Using wire load models.
[INFO RSZ-0098] No setup violations found
[INFO RSZ-0033] No hold violations found.

Then continue with:

make DESIGN_CONFIG=designs/asap7/mock-array/Element/config.mk final generate_abstract
make DESIGN_CONFIG=designs/asap7/mock-array/config.mk do-cts

This is bizarre... What does it mean to have a negative TNS when WNS is positive??

[INFO RSZ-0046] Found 232 endpoints with hold violations.
Iteration | Resized | Buffers | Cloned Gates |   WNS   |   TNS   | Endpoint
---------------------------------------------------------------------------
        0 |       0 |       0 |            0 |   4.962 | -4522.159 | io_outs_right_4_REG\[51\]$_DFF_P_/D
    final |       0 |     496 |            0 |   0.110 |   0.000 | io_outs_down_7_REG\[63\]$_DFF_P_/D
---------------------------------------------------------------------------
[INFO RSZ-0032] Inserted 496 hold buffers.

The macro placement is a bit funny because after I increased the die area to make place for the flip-flops II added in the PR:

image

The macro placement problem is a non-sequitor to the current discussion, but mock-array needs to have an even amount of space around all the elements in the middle to fit input/output flipflops. Otherwise the flip flops end up being put in strange places like in the middle of the rows. If the PR is to be merged because of its value to study the false setup/hold violations in floorplan, then this macro placement problem has to be fixed.

image

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 26, 2024

Filed an issue report on macro placement.

@rovinski
Copy link
Collaborator

rovinski commented Nov 27, 2024

To my mind, if the clock is ideal, then that creates a mismatch between the macro and the floorplan's ideal clock. I don't know what it means to propagate a clock before CTS.

This makes sense and I think it is true. If the clock delay is not annotated, then the only way to infer an ideal clock would be the (hold-setup)/2 technique or similar. I have no idea if we are doing that now.

Since we record the delay from clk to the C ff's clock we can remove it in ideal clock mode from the setup/hold and accomplish the same as the flat case.

I'm still a little confused, do you remove the delay from the clock source to the block clock pin, or remove the delay from the clock source all the way to the C FF? I don't see how the latter is possible unless the block has the clock latency property, which many .libs don't.

The only way I see to do it is to estimate the clock latency based on the setup times and hold times. One way to do that could be to find a clock delay X such that Σ((hold-setup)/2 - X) = 0 Where the summation is over all input terms. Perhaps there are easier ways, but I don't see it. There might have to be some balancing taken into account for the output terms as well.

I have no idea if OR does something similar to that right now. If there is no adjustment to the timing on the block, then @oharboe is absolutely right that there will be false timing on pre-CTS optimization on signals coming out of large macros.

@maliberty
Copy link
Member

If OR generates the .lib abstract then we should have the clock latency property in the .lib. For vendor IPs it could be more of an issue. I'm not sure how this is handled in other tools in that case.

(@gadfort any experience with this?)

@gadfort
Copy link
Collaborator

gadfort commented Nov 27, 2024

  1. Hold violations should not really be considered until after the clock tree is generated and the clock is propagated.
  2. Looking through the timing reports from various stages it doesn't look like the tools are doing anything different with the IP (though they do maintain a faux clock tree so the clock stays propagated from early on), which might account for the lack of differentiation.

@maliberty
Copy link
Member

@gadfort what is a "faux clock tree"?

@parallaxsw
Copy link

If the timing model is generated with propagated clocks, the timing
model setup/hold times will include the clock network delays. If the
model is then used in a context with ideal clocks, the setup/hold
times will be incorrect because they include clock network delays
inside the macro. This will generate the "false negatives" oyvind is
seeing.

OpenSTA can't use the model/liberty min/max_clock_tree_path delays to
correct the setup/hold times for a context with ideal clocks because
they charaterize the whole macro clock tree, not the clock delay for a
specific input setup/hold time.

The only way around the issue that I can see is to genarate separate
timing models for use in contexts with ideal clocks and propagated
clocks.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 30, 2024

Some thoughts on what can be done now with ORFS as is and what could be done:

  • @gadfort Any thoughts on how to implement a "faux clock tree" in OpenROAD? It sounds like this is a good long term solution where macros can have significant clock tree insertion latency without causing problems when used in combination with ideal clock.
  • Possible workaround: use SETUP/HOLD_SLACK_MARGIN smaller than -max(clock insertion latency of macros) up until CTS. This would limit timing repair to repairing paths that pathologically bad and not overrepair in ancitipation of post CTS timing repair.
  • modify timing repair to ignore paths with non-trivial setup/hold at path ends in ideal clock mode as with ideal clock one can't know if these timing violations are real or not.
  • Following @parallaxsw proposal works, but it can only work for macros that are created by ORFS. I suppose 3rd party macros won't come with two .lib files.

At this point, I'm hope @gadfort can shed some light on whether a "faux clock tree" could be implemented in OpenROAD before pondering the next step. A "faux clock tree" is to me an "ideal clock tree" with zero skew, but a clock insertion latency that is large enough that the zero skew applies to flip flops as well as macros with non-zero clock insertion latency.

@maliberty
Copy link
Member

There is no way to achieve zero skew if you don't know what the clock tree inside the macro looks like. The only information we have is the single value in the .lib which doesn't give the exact information. A faux clock tree can't achieve zero skew if there is skew inside the macro already.

We could simply refuse to repair any path to a macro with *_clock_tree_path in ideal clock mode.

@oharboe
Copy link
Collaborator Author

oharboe commented Nov 30, 2024

Input/output constraints that end up in a flip flop can have problems too.

reg2reg paths, ignoring macros, seems like the only thing that can be safely repaired?

@maliberty
Copy link
Member

You have full control of io constraints and can provide ideal clock values.

@oharboe
Copy link
Collaborator Author

oharboe commented Dec 1, 2024

You have full control of io constraints and can provide ideal clock values.

Investigating the make DESIGN_CONFIG=designs/nangate45/config.mk results, I find that the set_input_delay has identical numbers for set_input_delay for 2_floorplan.sdc and 4_cts.sdc, makes sense the number in set_input_delay is w.r.t. the clock pin, which is identical in ideal and propagated clock case.

Normally I would expect false negative hold violations in floorplan for a path from an input pin to a flip flop, because the clock will get to flip flop in 0 time with ideal clock. Only when the after CTS do I expect these violations to appear.

I'm not sure if this is a problem(no hold violations repaired in floorplan, whereas they exist in CTS) or if it is, what I as a user should do about it. Not inserting hold cells early, when they should be there, will make placement worse: space was not reserved for something that is going to be needed after CTS. Also, isn't it better to have hold cells placed than added in repair? Same for false negatives on setup violations.

image

I don't know how a the "faux clock tree" works, it will be interesting to learn more about @gadfort thoughts.

If there is a "faux clock tree" that resembles the final clock tree, then I can see how constraints can be articulated and repair can take place.

@precisionmoon
Copy link
Contributor

  • Following @parallaxsw proposal works, but it can only work for macros that are created by ORFS. I suppose 3rd party macros won't come with two .lib files.

Well, third party liberty files can be edited to remove insertion delays for pre-CTS steps.

Another way to mitigate false hold violations is to use clock uncertainty for hold. I think negative values can be used to make hold analysis less conservative.

@jjcherry56
Copy link
Contributor

I thought about this some more and I think it actually makes sense for OpenSTA to remove the liberty min/max_clock_tree_delays from the setup/hold times when the clocks are ideal. That sort of mimics the best that CTS can do to balance the clock tree. It isn't the same as having ideal clocks in the macro but it does correspond to an ideal top level clock network. And it would certainly reduce the magnitude of the violations.

@tspyrou
Copy link
Contributor

tspyrou commented Dec 3, 2024

I thought about this some more and I think it actually makes sense for OpenSTA to remove the liberty min/max_clock_tree_delays from the setup/hold times when the clocks are ideal. That sort of mimics the best that CTS can do to balance the clock tree. It isn't the same as having ideal clocks in the macro but it does correspond to an ideal top level clock network. And it would certainly reduce the magnitude of the violations.

@precisionmoon what do you think about this solution? It does represent a model of the best that CTS could do.

@precisionmoon
Copy link
Contributor

Yes, the timer change proposed by Cherry makes sense. If insertion delays are not included for setup/hold violations in ideal clock mode, there is no need to keep multiple liberty models for macros.. If some pre-CTS hold fixing is needed, positive hold clock uncertainty can be added based on some early CTS estimates. For congested designs, late stage hold fixing may not be possible because there is no room to insert buffers around registers. For designs where there is plenty of room around registers, this won't be an issue.

@jjcherry56
Copy link
Contributor

@oharboe is correct. Clocked paths from the macro also need to remove the macro clock delays when the model is used with ideal clocks. Changing output delay constraints does not solve the problem to outputs because the path from the macro may merge with other paths at multiple input gates before it gets to the output.

@maliberty
Copy link
Member

Unless the other paths it merges with are purely combinational from an input they will also have a clock delay to remove. It seems like Liberty should let you distinguish those.

@jjcherry56
Copy link
Contributor

The following OpenSTA commit removes the macro model clock tree delay from the setup/hold
times and the clock to output delays when the context uses ideal clocks. It will not be exactly
the same as using ia separate model built with deal clocks but it should eliminate the bulk of the
differences.
a82361ce timing models rm clk tree delay clk->output w/ideal clk conttext

@oharboe
Copy link
Collaborator Author

oharboe commented Dec 8, 2024

I had a go at fleshing out the SETUP/HOLD_SLACK_MARGIN docs: The-OpenROAD-Project/OpenROAD-flow-scripts#2615

@oharboe
Copy link
Collaborator Author

oharboe commented Dec 8, 2024

@parallaxsw FYI, I retried #6181 (comment) with parallaxsw/OpenSTA@a82361c

As expected, there are now no false timing violations to be repaired:

$ ./run-me-mock-array-asap7-base.sh 
[deleted]
==========================================================================
Floorplan check_setup
--------------------------------------------------------------------------
Warning: There are 2049 input ports missing set_input_delay.
Warning: There are 2112 output ports missing set_output_delay.
Warning: There are 2080 unconstrained endpoints.
number instances in verilog is 8385
[INFO IFP-0001] Added 1424 rows of 6560 site asap7sc7p5t.
repair_timing -verbose -setup_margin 0 -repair_tns 100
[WARNING RSZ-0021] no estimated parasitics. Using wire load models.
[INFO RSZ-0098] No setup violations found
[INFO RSZ-0033] No hold violations found.
[deleted]
openroad> 

@maliberty
Copy link
Member

@eder-matheus please update the sta submodule to get this change.

@eder-matheus
Copy link
Contributor

@eder-matheus please update the sta submodule to get this change.

I've started a secure-ci for this update.

@eder-matheus
Copy link
Contributor

@oharboe Sorry for the delay on it, but we finally merged the latest STA changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sta Static Timing Analysis
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants