false hold violations and false negatives on setup violations in floorplan due to mixing propagated clock in macros and ideal clock of top level #6181
Comments
My plan has been to turn off hold fixing in floorplan which should make this moot. |
What about false setup violations? If the toplevel has 0 clock latency(ideal clock) and the macro has propagated clock and significant clock insertion latency then there can easily be a false setup violation during floorplan timing repair. |
Whether it helps or hurts depends on whether it is on the launching or capturing end of the path. It's a good question for @tspyrou as to what the normal STA handling is here. |
I see. I didn't think about the case where a signal is captured inside a macro, but yes: there you get false setup non-violations of course. |
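The launch/capture asymmetry described in the comments above can be illustrated with a toy setup-slack calculation. This is only a sketch: the period, delays, and the 100 ps macro insertion delay are hypothetical numbers, not values from mock-array or megaboom, and the top-level clock is assumed ideal (clock arrival 0 at both ends).

```python
# Toy setup-slack model; all values are in ps and purely illustrative.
PERIOD = 1000
INSERTION = 100      # hypothetical clock tree delay inside the macro
FF_CLK_TO_Q = 50     # hypothetical clock-to-output delay of a plain flop
FF_SETUP = 10        # hypothetical setup time of a plain flop
LOGIC_DELAY = 900    # hypothetical combinational delay on the path


def setup_slack(launch_clk_to_q, capture_setup, logic_delay=LOGIC_DELAY):
    """Setup slack with an ideal top-level clock (arrival 0 at both ends)."""
    arrival = launch_clk_to_q + logic_delay
    required = PERIOD - capture_setup
    return required - arrival


# Flop to flop: the reference case.
flop_to_flop = setup_slack(FF_CLK_TO_Q, FF_SETUP)

# Macro launches: its clock-to-output arc includes the internal clock tree,
# so data looks late relative to the ideal clock (pessimistic).
macro_launch = setup_slack(FF_CLK_TO_Q + INSERTION, FF_SETUP)

# Macro captures: its setup requirement is reduced by the internal clock
# tree, so the path looks better than it is (optimistic).
macro_capture = setup_slack(FF_CLK_TO_Q, FF_SETUP - INSERTION)

print(flop_to_flop, macro_launch, macro_capture)  # 40 -60 140
```

With these assumed numbers the same 900 ps of logic passes, fails, or passes with a large margin depending only on which end of the path sits inside the macro.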
I think turning off propagated clocks until after cts would help with this. Hold fixing could also be delayed until after cts. |
But the macros are already past CTS and they have a clock network insertion latency regardless of whether clocks propagation is turned off, no? Also, you skipped past the question of what about false timing violations in setup due to network insertion latency in macros. |
I don't think the network insertion delay would be propagated if propagated clocks are off. @precisionmoon can you comment? |
Correct, macro insertion delays should not count in ideal clock mode. Is this not what @oharboe is seeing? |
In floorplan, I see that false hold violations are being repaired, which takes "forever", but I don't have an actual example of false setup violations being repaired. Next step is that someone who understands this better than me creates a PR for floorplan.tcl to disable hold repair and not to propagate the clock and all should be well? |
The clocks are not propagated in floorplan already. I have a PR to disable hold fixing but if you are right the problem could affect setup as well. If you "see that false hold violations are being repaired" then you should have an example based on what you see. |
Didn't @rovinski say something like: "macros don't have clock insertion latency, only setup and hold requirements"? If that is the case, why does it matter if you propagate the clock or not? I will create some examples next time I run across them. |
Macros can have clock insertion annotated on them, but they don't have to. All of that info is captured within the setup and hold requirements of the block terms. |
If you simply move the clock delay into setup/hold it doesn't solve the problem at all. It just becomes part of another component and there is no hope to remove it from those. |
If I understand you correctly, the clock insertion delay is, and must per .lib format definition, captured within the setup and hold requirements. The clock insertion delay is just an optional annotation. |
Yes but if you want to adjust timing for ideal clocks it would have to be there. |
I have nothing useful to add here so I'll leave it up to @tspyrou |
I would like to see the report_checks with full clock details for the false hold violation. |
Setup and hold requirements in the .lib are specified as a timing arc from another pin (i.e. the clock). So for example, if you had a register with 0 clock insertion delay inside a block, the constraints might look like:
Setup time: 4 ps
But if you have an internal clock tree with 100 ps of insertion delay, the same constraint would now look like:
Setup time: (4 - 100) = -96 ps
These are both specified relative to the clock pin so these constraints are saying
So, the effect of the clock insertion delay is completely captured within the setup and hold constraints. They do not need to be annotated separately for proper timing closure, although it can be useful to know. You might be able to roughly approximate the clock insertion latency by taking the midpoint between the negative setup time and hold time, for example:
(-(-96 ps) + 102 ps)/2 = 99 ps
which is close to the actual insertion delay of 100 ps. |
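The arithmetic in the comment above can be written out as a small sketch. The function names are made up for illustration, and the 2 ps flop hold time is an assumption chosen so that the shifted hold time matches the 102 ps used above; only the 4 ps setup and 100 ps insertion delay come from the comment.

```python
def macro_pin_constraints(ff_setup_ps, ff_hold_ps, insertion_ps):
    """Shift a flop's setup/hold so they are relative to the macro clock pin.

    An internal clock tree delays the capturing edge, so the setup
    requirement at the macro boundary shrinks and the hold requirement grows.
    """
    return ff_setup_ps - insertion_ps, ff_hold_ps + insertion_ps


def estimate_insertion(setup_ps, hold_ps):
    """Midpoint between the negated setup time and the hold time."""
    return (-setup_ps + hold_ps) / 2


# 4 ps setup and 100 ps insertion are from the comment; 2 ps hold is assumed.
setup, hold = macro_pin_constraints(4, 2, 100)
print(setup, hold)                      # -96 102
print(estimate_insertion(setup, hold))  # 99.0
```

The estimate is off by 1 ps because the midpoint also absorbs half of the difference between the flop's own hold and setup times.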
@rovinski Thanks! This is my understanding too. What I don't understand, exactly, is what it means to do timing repair in floorplan, before a clock tree exists, in the case of a macro that has a large clock network insertion latency encoded into its setup/hold time requirements. To my mind, if the clock is ideal, then that creates a mismatch between the macro and the floorplan's ideal clock. I don't know what it means to propagate a clock before CTS. It could be that I don't understand things I don't need to understand, it could be that ORFS/openroad needs to change, I don't know. |
To reproduce, open up floorplan in mock-array:
Below is what I believe to be a false setup violation. The same problem can occur for a path starting/ending in a macro or hold violations going into a macro.
If I look at the same path after CTS, no violation:
Note that the clock tree has a significant amount of skew between flops and macros, so the effect should be even more pronounced with less skew. The skew for the path above is ca. 323 + 70 (macro clock insertion latency, from my inspection of the Element macro) - 374 = 19 ps, so there is little skew for this particular path. Near as I can see, floorplan had an error of 28 + 19 + 31.35 ≈ 80 ps for this particular path w.r.t. its policy on repairing or not repairing. |
@tspyrou You wanted an example of false hold violations... These violations do not exist in CTS. untar and run mock-array-false-hold-violations.tar.gz based on The-OpenROAD-Project/OpenROAD-flow-scripts#2589
|
If there were no macro boundary then the delays of the clock buffers would all be zero in ideal clock mode. However, since they are inside the macro they are baked into the setup/hold time of the macro pin relative to the clock pin of the macro. Since we record the delay from clk to the C ff's clock we can remove it in ideal clock mode from the setup/hold and accomplish the same as the flat case. If you don't have that data I see no hope, but with it we still need to account for it in STA. |
If you want to fiddle with the diagram https://docs.google.com/drawings/d/1cKDgcGLazn38BuL3cTI6dhzQ8mAMN9cUsO4eEm6yQOM/edit?usp=sharing |
@rovinski, my understanding is that macro cell insertion delays don't impact setup/hold times for commercial signoff timers. Can you confirm this? Also, the default for commercial timing model extractor is not to generate insertion delays. For OpenSTA, the default is to generate them. For pre-CTS, it makes sense to use timing models without insertion delays and switch to models with insertion delays after CTS. |
The setup/hold have to include the clock insertion delay otherwise it wouldn't work with propagated clocks, no? |
@maliberty So you're thinking about creating an abstract from floorplan, then switch the abstract at CTS time? So continuing: (The "make do-" prefixes are to work around the make dependencies which would redo stages I don't want redone when I'm mocking floorplan abstract being used in timing repair at the top level)
This works, no false setup or hold violations in floorplan, nor in repair_timing in place:
Then continue with:
This is bizarre... What does it mean to have a negative TNS when WNS is positive??
The macro placement is a bit funny because after I increased the die area to make place for the flip-flops I added in the PR: The macro placement problem is a non sequitur to the current discussion, but mock-array needs to have an even amount of space around all the elements in the middle to fit input/output flip-flops. Otherwise the flip-flops end up being put in strange places like in the middle of the rows. If the PR is to be merged because of its value to study the false setup/hold violations in floorplan, then this macro placement problem has to be fixed. |
Filed an issue report on macro placement. |
This makes sense and I think it is true. If the clock delay is not annotated, then the only way to infer an ideal clock would be the (hold-setup)/2 technique or similar. I have no idea if we are doing that now.
I'm still a little confused, do you remove the delay from the clock source to the block clock pin, or remove the delay from the clock source all the way to the C FF? I don't see how the latter is possible unless the block has the clock latency property, which many
The only way I see to do it is to estimate the clock latency based on the setup times and hold times. One way to do that could be to find a clock delay X such that
Σ((hold-setup)/2 - X) = 0
where the summation is over all input terms. Perhaps there are easier ways, but I don't see it. There might have to be some balancing taken into account for the output terms as well. I have no idea if OR does something similar to that right now.
If there is no adjustment to the timing on the block, then @oharboe is absolutely right that there will be false timing on pre-CTS optimization on signals coming out of large macros. |
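Solving the condition above for X gives the mean of the per-pin midpoints, X = (1/N) Σ (hold - setup)/2. A minimal sketch of that estimate follows; the pin names and setup/hold values are hypothetical, and nothing here reflects what OpenROAD or OpenSTA actually does.

```python
from statistics import mean


def estimate_clock_latency(pins):
    """Return X such that sum((hold - setup) / 2 - X) over all pins is zero.

    `pins` maps an input pin name to its (setup_ps, hold_ps) pair, both
    taken relative to the macro clock pin.
    """
    return mean((hold - setup) / 2 for setup, hold in pins.values())


# Hypothetical macro input pins with slightly different internal skew.
pins = {
    "d0": (-96, 102),
    "d1": (-90, 110),
    "d2": (-100, 102),
}

x = estimate_clock_latency(pins)
residual = sum((hold - setup) / 2 - x for setup, hold in pins.values())
print(x, residual)  # 100.0 0.0
```

Because a single X is fitted to all pins, any skew inside the macro remains as a per-pin residual, which is the balancing caveat raised in the comment.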
If OR generates the .lib abstract then we should have the clock latency property in the .lib. For vendor IPs it could be more of an issue. I'm not sure how this is handled in other tools in that case. (@gadfort any experience with this?) |
|
@gadfort what is a "faux clock tree"? |
If the timing model is generated with propagated clocks, the timing OpenSTA can't use the model/liberty min/max_clock_tree_path delays to The only way around the issue that I can see is to generate separate |
Some thoughts on what can be done now with ORFS as is and what could be done:
At this point, I hope @gadfort can shed some light on whether a "faux clock tree" could be implemented in OpenROAD before pondering the next step. A "faux clock tree" is to me an "ideal clock tree" with zero skew, but a clock insertion latency that is large enough that the zero skew applies to flip-flops as well as macros with non-zero clock insertion latency. |
There is no way to achieve zero skew if you don't know what the clock tree inside the macro looks like. The only information we have is the single value in the .lib which doesn't give the exact information. A faux clock tree can't achieve zero skew if there is skew inside the macro already. We could simply refuse to repair any path to a macro with *_clock_tree_path in ideal clock mode. |
Input/output constraints that end up in a flip flop can have problems too. reg2reg paths, ignoring macros, seems like the only thing that can be safely repaired? |
You have full control of io constraints and can provide ideal clock values. |
Investigating the Normally I would expect false negative hold violations in floorplan for a path from an input pin to a flip flop, because the clock will get to the flip flop in 0 time with an ideal clock. Only after CTS do I expect these violations to appear. I'm not sure if this is a problem (no hold violations repaired in floorplan, whereas they exist in CTS) or, if it is, what I as a user should do about it. Not inserting hold cells early, when they should be there, will make placement worse: space was not reserved for something that is going to be needed after CTS. Also, isn't it better to have hold cells placed than added in repair? Same for false negatives on setup violations. I don't know how the "faux clock tree" works; it will be interesting to learn more about @gadfort's thoughts. If there is a "faux clock tree" that resembles the final clock tree, then I can see how constraints can be articulated and repair can take place. |
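The input-to-flop case in the comment above can be made concrete with a toy hold-slack calculation. All numbers are hypothetical, and the model assumes the input delay is not adjusted for clock latency after CTS.

```python
def hold_slack(data_arrival_ps, clock_arrival_ps, hold_ps):
    """Hold slack: data must arrive after the capturing edge plus hold time."""
    return data_arrival_ps - (clock_arrival_ps + hold_ps)


DATA_ARRIVAL = 50   # hypothetical input delay plus wire/gate delay, ps
HOLD = 20           # hypothetical flop hold time, ps
CTS_LATENCY = 300   # hypothetical clock arrival at the flop after CTS, ps

# Floorplan: ideal clock reaches the flop at time 0, so no violation is seen.
ideal = hold_slack(DATA_ARRIVAL, 0, HOLD)

# After CTS: the capturing edge is late, and the violation appears.
propagated = hold_slack(DATA_ARRIVAL, CTS_LATENCY, HOLD)

print(ideal, propagated)  # 30 -270
```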
Well, third party liberty files can be edited to remove insertion delays for pre-CTS steps. Another way to mitigate false hold violations is to use clock uncertainty for hold. I think negative values can be used to make hold analysis less conservative. |
I thought about this some more and I think it actually makes sense for OpenSTA to remove the liberty min/max_clock_tree_delays from the setup/hold times when the clocks are ideal. That sort of mimics the best that CTS can do to balance the clock tree. It isn't the same as having ideal clocks in the macro but it does correspond to an ideal top level clock network. And it would certainly reduce the magnitude of the violations. |
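The proposal above can be sketched as a toy model. This is not OpenSTA code: the class, the function, and the single `clock_tree_ps` value are assumptions made for illustration, whereas a real Liberty model carries separate min and max clock tree path delays and how a timer chooses between them is not shown here.

```python
from dataclasses import dataclass


@dataclass
class MacroPinModel:
    """Hypothetical macro input pin with checks relative to the macro clock pin."""
    setup_ps: float
    hold_ps: float
    clock_tree_ps: float  # assumed single insertion delay value


def effective_checks(model, clocks_propagated):
    """Return (setup, hold) as a timer following the proposal would apply them.

    With propagated clocks the model is used as-is. With ideal clocks the
    macro's internal clock tree delay is backed out of both checks.
    """
    if clocks_propagated:
        return model.setup_ps, model.hold_ps
    return (model.setup_ps + model.clock_tree_ps,
            model.hold_ps - model.clock_tree_ps)


pin = MacroPinModel(setup_ps=-96, hold_ps=102, clock_tree_ps=100)
print(effective_checks(pin, clocks_propagated=True))   # (-96, 102)
print(effective_checks(pin, clocks_propagated=False))  # (4, 2)
```

In ideal clock mode the adjusted checks match what a flat design with the same flop would see, which is the sense in which this models the best that CTS could do at the top level.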
@precisionmoon what do you think about this solution? It does represent a model of the best that CTS could do. |
Yes, the timer change proposed by Cherry makes sense. If insertion delays are not included for setup/hold violations in ideal clock mode, there is no need to keep multiple liberty models for macros. If some pre-CTS hold fixing is needed, positive hold clock uncertainty can be added based on some early CTS estimates. For congested designs, late stage hold fixing may not be possible because there is no room to insert buffers around registers. For designs where there is plenty of room around registers, this won't be an issue. |
@oharboe is correct. Clocked paths from the macro also need to remove the macro clock delays when the model is used with ideal clocks. Changing output delay constraints does not solve the problem to outputs because the path from the macro may merge with other paths at multiple input gates before it gets to the output. |
Unless the other paths it merges with are purely combinational from an input they will also have a clock delay to remove. It seems like Liberty should let you distinguish those. |
The following OpenSTA commit removes the macro model clock tree delay from the setup/hold |
I had a go at fleshing out the SETUP/HOLD_SLACK_MARGIN docs: The-OpenROAD-Project/OpenROAD-flow-scripts#2615 |
@parallaxsw FYI, I retried #6181 (comment) with parallaxsw/OpenSTA@a82361c As expected, there are now no false timing violations to be repaired:
|
@eder-matheus please update the sta submodule to get this change. |
I've started a secure-ci for this update. |
@oharboe Sorry for the delay on it, but we finally merged the latest STA changes. |
Description
In megaboom there are macros that have a non-trivial network clock insertion latency internally. This causes the floorplan with 0 clock network insertion latency at the top level to believe that there are hold violations.
megaboom uses HOLD_SLACK_MARGIN=-300, so no hold cells are inserted in this case.
In the past, I have seen megaboom gnaw on the problem of how to get rid of these hold violations for a very long time.
Doesn't this cause false negatives on setup violations too?
Suggested Solution
Unsure. Modify timing to set network clock insertion latency of macros to 0 instead of using a mix of ideal and propagated clock.
Additional Context
No response