speed up new router implementation #279
base: main
Conversation
This makes sense and was probably foreseeable. The previous version created
the MM matchers ahead of time. I guess with a forking server, this makes
the most sense: longer start-up, but faster operation.
|
Sweet! The 10+ second startup time for 10,000 routes is still worrisome to me, is there any way we could bring that down? Is it possible to postpone creating leaves until we need them, or batch them up somehow? I guess that's what you mean by making it async, but I could be wrong. |
Postponing object creation in leaves is what was causing the problem: it
meant creating a Mustermann matcher each time we had a potential route
match.
Kyle's fix moves that cost to boot up by creating all possible matchers at
boot time instead.
I would note that 2.2 should create many fewer MM matchers compared to 2.1.
2.2 creates one matcher for each complete defined route (that contains a
dynamic element). 2.1 was creating a MM matcher for every dynamic element,
which could mean multiple matchers for a route, depending on the number of
dynamic elements in the route.
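To make the difference concrete, here is a rough Ruby sketch (the route shapes are made up and this is not the actual hanami-router internals; only the Mustermann calls themselves are real API):

```ruby
require "mustermann"

# 2.2-style: one matcher per complete defined route
route = Mustermann.new("/books/:book_id/reviews/:review_id")
route.params("/books/7/reviews/42")
# => {"book_id" => "7", "review_id" => "42"}

# 2.1-style: one matcher per dynamic element, so this route needs two
segments = [Mustermann.new(":book_id"), Mustermann.new(":review_id")]
segments.map { |matcher| matcher.params("7") }
# => [{"book_id" => "7"}, {"review_id" => "7"}]
```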
I haven't seen how the 10,000 routes are structured in this test. It's
possible they don't reflect real-world scenarios.
It's also possible that the 2.2 router code that constructs the routing
trie could be better optimized for performance. I think it's more or less
idiomatic Ruby code. I don't think there's much there you would call
"clever." But idiomatic Ruby is not necessarily the fastest Ruby. Jeremy's
Polished Ruby book goes into that, but I have much to learn in that area. I
am happy to dig deeper.
I really appreciate Kyle zeroing in on the main problem so quickly! 🧐 Kudos
to him!
|
howdy gentlemen, thank you for the thoughts! i think i made some headway here

context

i did some poking around to see if i could figure out where the slowdown on boot was, and surprise surprise, it's our friend Mustermann. Damian is correct - in theory we're creating significantly fewer matchers, since we only create them for defined routes rather than for each individual dynamic segment (for reference, the original design). so in 2.1, we would have a matcher for each dynamic segment. in this section of the mustermann docs we learn two things, and point #2 is the important part: when given dynamic segments in 2.1, it was very possible we were sharing matchers between routes - a single matcher could end up matching segments from multiple routes. i confirmed this by looking at memory usage; in Sean's original test, memory was similar between the two versions.

proposal

route caching 🤔

i think there could be a world where we pre-compile the Trie on the first boot, and then save it to a gitignore'd file on disk. on subsequent loads, we could check whether the routes have changed. if they haven't, we should have constant boot times. if they have changed, compile the changed routes into a new version of that file. i think this could be an acceptable solution, since long start-up times really only start to take effect when you have a significant number of routes, and i imagine very few people are going to add 10k routes before starting the app for the first time. though, the consequence of this would be adding more complexity to the Hanami boot process. if this is a direction people are interested in, i'd love to create a separate issue for tracking that effort (a rough sketch of the idea is below).

lmk what you think! thank you as always |
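A very rough sketch of what the route-caching idea could look like. The file paths, the `build_trie` helper, and the assumption that the compiled trie survives a `Marshal` round-trip are all hypothetical, not anything that exists in hanami-router today:

```ruby
require "digest"

CACHE_PATH       = "tmp/router_trie.cache"        # hypothetical location
FINGERPRINT_PATH = "tmp/router_trie.fingerprint"  # hypothetical location

def load_or_build_trie(route_definitions)
  fingerprint = Digest::SHA256.hexdigest(route_definitions.inspect)

  if File.exist?(CACHE_PATH) && File.exist?(FINGERPRINT_PATH) &&
     File.read(FINGERPRINT_PATH) == fingerprint
    # routes unchanged since the last boot: reuse the serialized trie
    Marshal.load(File.binread(CACHE_PATH))
  else
    trie = build_trie(route_definitions) # assumed to exist elsewhere
    File.binwrite(CACHE_PATH, Marshal.dump(trie))
    File.write(FINGERPRINT_PATH, fingerprint)
    trie
  end
end
```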
follow up here: ended up squeezing some more juice out of the 2.2.1 implementation (now named 2.2.12, although idk if this is the correct usage of semver or not 🤷)

updated graphs (on my machine at least):

as you can see, memory usage is pretty much the same - this is still because of the issue discussed above. i did some extra benchmarking and realized we were leaving a lot on the table with the regex-based path splitting; switching to plain string splitting made up the difference. please let me know if you see any issues with this updated approach! |
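For context on the kind of comparison involved, a minimal, hypothetical micro-benchmark of regex scanning versus `String#split` for path segmentation (not the benchmark used in this PR; requires the benchmark-ips gem) might look like:

```ruby
require "benchmark/ips"

PATH       = "/books/123/reviews/456"  # made-up request path
SEGMENT_RE = %r{[^/]+}

Benchmark.ips do |x|
  x.report("regex scan")   { PATH.scan(SEGMENT_RE) }
  x.report("string split") { PATH.split("/").reject(&:empty?) }
  x.compare!
end
```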
Thank you for continuing to look into this! I've been swamped on my end through the holidays.

Optimizing Hanami Router 2.2

Your switch from RegEx to string splitting is what I meant about finding the most performant implementation for our approach. It's good to see that 2.2.1 is faster than 2.1 when implemented properly. There may be more wins to be had with similar optimizations. Jeremy Evans' book has a lot of tips for this.

Using Mustermann will always require a trade-off

In terms of boot time vs response time, I believe this will always be a trade-off as long as we are using Mustermann. I don't know if it would be possible to pre-compile the trie structure somehow. I've never seen that done. You would essentially have to serialize Ruby objects and then de-serialize them at runtime. Not sure if this would be possible or faster. Interesting thought. At present, I consider a longer boot time for a faster run time in 2.2.1 to be the better trade-off (compared to the opposite in 2.2).

Mustermann discussion

With regard to Mustermann, I would bet that Hanami's use of tries allows it to compare favorably with any other framework that also uses Mustermann. This doesn't apply to Roda, since Roda does not seem to use Mustermann. However, it would be hard for Hanami Router to match Roda's speed, because Roda's routing config is essentially a trie structure implemented in code that does not use Mustermann (see the "Faster" section here). Therefore, it's faster both at boot (there is no routing compile step at all) and during run time. (You can read a comprehensive write-up of Roda here.) Hanami uses Mustermann because of the power and flexibility it provides out of the box, and because it's battle tested. With that said, there are other options.

Options for moving forward

I recommend that we continue to use Mustermann for now, as improved in this PR, but we should keep considering other options that could further improve Hanami Router.

Other thoughts

As a general rule, the more flexible we allow our routes to be, the more overhead we should expect. Part of the reason Roda is fast is that it implements a small set of routing options to start.

Other considerations include developer experience. Hanami Router's configuration of routes is very similar to that of Rails, and (I think) very readable. Roda's presentation is very different, and perhaps not as readable (that could just be my lack of familiarity). The fastest possible implementation would have developers specify a routing trie data structure directly in the routes file, but the DX for this would be poor. Trade-offs of some kind will always exist.

@kyleplump do you have time to discuss your code changes before we recommend merging them? |
Thanks for finding this and writing up your thoughts @kyleplump @dcr8898! Pre-compiling the trie is an interesting idea but adds too much complexity. It'd be a cool experiment for sure, but it's too novel to add at this point.

In terms of versioning proposals, I think it'd be better to refer to them via something like 2.2.0.proposal1 and 2.2.0.proposal2, or similar.

Is there some way we could take a hybrid approach here? That is, building Mustermann leaves ahead of time for the first N routes, then after that just building the matchers lazily. We could use benchmarking to figure out what N should be by default, and allow that N to be configurable. Similarly, could we make pre-building leaves optional? i.e. N=0 is valid and implements (essentially) the previous behavior from 2.1. I don't know if it would be safe, but I could see wanting lazily built matchers in dev and test environments, then building them ahead of time in production for increased performance (at the expense of slower startup time). |
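A sketch of how that hybrid could look. The class, its constructor arguments, and the `eager_limit` option are all hypothetical, not the real `Hanami::Router::Leaf` API:

```ruby
require "mustermann"

# Hypothetical leaf: eager matcher for the first `eager_limit` routes,
# lazy for the rest.
class HybridLeaf
  def initialize(route, index:, eager_limit: 1_000)
    @route   = route
    @matcher = Mustermann.new(route) if index < eager_limit
  end

  def match(path)
    matcher.match(path)
  end

  private

  def matcher
    @matcher ||= Mustermann.new(@route) # built on first use for later routes
  end
end
```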
@cllns These are some interesting ideas for sure. My initial feeling is that this would be a case of premature optimization. I don't know how many apps there are out there with 10K routes. I would think a huge app would have 1-2K routes (based on my limited experience). If the graph above is logarithmic, this would imply startup times of 1-2 seconds for such apps. Do you think that would be too long for dev & test environments? My feeling is that we should wait until developers are actually feeling that pain before we introduce more complexity. For example, I would like to speak with @kyleplump more to see if we can squeeze the current memory usage a little more, and maybe the startup time, with further optimizations. If we do explore your suggestions, what would be the gain of compiling some matchers ahead of time and not others? As far as this benchmark is concerned, I presume it attempts to exercise all of the generated routes more or less equally. This is actually one factor that makes the r10k tool a little unrealistic. In a real-world app, there would definitely be some distribution of popular and less-popular routes, but this distribution would not be captured by an arbitrary division of route handling based on a number. What might work in this case is the original 2.2 strategy and some kind of fast, in-memory cache of created Mustermann matchers. I don't know if this is worthwhile, but it would provide a compromise approach. (EDIT: I have no idea how this would work in a concurrent environment. Maybe it is best to compile the routes first, and then potentially freeze the router. 🤷 ) The second part of your suggestion, handling dev and test environments differently than production, could be implemented pretty easily (I think 🤔 ), but I still wouldn't recommend introducing that additional complexity until actual developer experience starts to suffer. |
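The in-memory cache idea could be as small as a memoizing hash; a hypothetical sketch, with the concurrency caveat from the comment above:

```ruby
require "mustermann"

# Hypothetical process-wide cache: routes with identical pattern sources
# share a single matcher. A bare Hash is not safe for concurrent writes;
# a Mutex, or building everything before boot completes and freezing the
# router, would be needed in practice.
MATCHER_CACHE = Hash.new { |cache, pattern| cache[pattern] = Mustermann.new(pattern) }

MATCHER_CACHE["/books/:id"].params("/books/7")  # builds the matcher once
MATCHER_CACHE["/books/:id"].params("/books/9")  # reuses the cached matcher
```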
i appreciate you both thinking about this so deeply! i love hearing your perspectives - trying to keep up with you two is helping me learn a ton :)

i like this idea of pre-generating them all on production builds. is there a world where we keep it simple, at first, and just lazy load the matchers when running outside of production? regardless - i'd love to keep poking around and see if we can be more 'clever' with our optimizations and make this router blazingly fast (side note: do people still say this?). i'd be happy to bounce ideas off of anyone, but definitely @dcr8898 if you're volunteering.

on versioning: once we iron out a game plan, i can make the modifications, close this PR, and create a formal 'proposal' laying out the changes made, if that would be better? 2.2 fixes the bug; we're now just talking about optimization (which is the name of the game for a project like this). point being: if we bundle it all up nicely it can get merged whenever. thanks again! |
@kyleplump I'm in favor of polishing this PR up and merging it to fix the problem. Then we can experiment with further changes at our leisure. Let me know when you want to meet up; I think we can make a few more optimizations before submitting. I might still say blazing fast, but I'm likely an anomaly. 🤔 |
related to #278

in the new Trie `leaf` implementation, Mustermann matchers were created at the time of evaluation, causing a semi-significant slowdown. we have all of the information to create the matcher when building the trie - so, this fix creates the matcher at the time of `leaf` creation (leaving a fallback in `#match`), while maintaining the benefits of having `leaves` as introduced in 2.2 with this PR (#273)

this brings us back to similar performance as `hanami-router v2.1`, with the exception of startup (this fix is temporarily called `2.2.1`):

- rps.png: https://github.com/user-attachments/assets/e4a135d5-53f5-49ca-8be0-acab1d4674d9
- log_rps.png: https://github.com/user-attachments/assets/5a31d0b9-a928-4a32-8103-71fa2db5d4f0
- runtime_with_startup.png: https://github.com/user-attachments/assets/4514f85f-abb1-42da-8d3c-2d72485ce156

this slowdown on boot is expected - more work (creating leaves / mustermann) is happening. the way to get around this would likely be a new structure other than a trie (maybe someday!), or maybe evaluating / building the trie async (😬)

please let me know if you'd like more test cases (especially looking for thoughts from @dcr8898). thanks!
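For readers skimming the thread, the shape of the change described here is roughly the following sketch. It is illustrative only, not the actual lib/hanami/router/leaf.rb source; the class layout and names are assumptions:

```ruby
require "mustermann"

# Sketch of a trie leaf that builds its Mustermann matcher up front,
# with a lazy fallback in #match.
class Leaf
  attr_reader :to # whatever the leaf resolves to (endpoint, block, etc.)

  def initialize(route, to)
    @route   = route
    @to      = to
    @matcher = Mustermann.new(route) # built while the trie is constructed
  end

  def match(path)
    @matcher ||= Mustermann.new(@route) # fallback, as described above
    @matcher.match(path)
  end
end
```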