speed up new router implementation #279
base: main
Conversation
This makes sense and was probably foreseeable. The previous version created
the MM matchers ahead of time. I guess with a forking server, this makes
the most sense: longer start-up, but faster operation.
|
Sweet! The 10+ second startup time for 10,000 routes is still worrisome to me, is there any way we could bring that down? Is it possible to postpone creating leaves until we need them, or batch them up somehow? I guess that's what you mean by making it async, but I could be wrong. |
Postponing object creation in leaves is what was causing the problem: it
meant creating a Mustermann matcher each time we had a potential route
match.
Kyle's fix moves that cost to boot up by creating all possible matchers at
boot time instead.
I would note that 2.2 should create many fewer MM matchers compared to 2.1.
2.2 creates one matcher for each complete defined route (that contains a
dynamic element). 2.1 was creating a MM matcher for every dynamic element,
which could mean multiple matchers for a route, depending on the number of
dynamic elements in the route.
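To make the difference concrete, here is a rough Ruby sketch (the route shapes are made up and this is not the actual hanami-router internals; only the Mustermann calls themselves are real API):

```ruby
require "mustermann"

# 2.2-style: one matcher per complete defined route
route = Mustermann.new("/books/:book_id/reviews/:review_id")
route.params("/books/7/reviews/42")
# => {"book_id" => "7", "review_id" => "42"}

# 2.1-style: one matcher per dynamic element, so this route needs two
segments = [Mustermann.new(":book_id"), Mustermann.new(":review_id")]
segments.map { |matcher| matcher.params("7") }
# => [{"book_id" => "7"}, {"review_id" => "7"}]
```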
I haven't seen how the 10,000 routes are structured in this test. It's
possible they don't reflect real-world scenarios.
It's also possible that the 2.2 router code that constructs the routing
trie could be better optimized for performance. I think it's more or less
idiomatic Ruby code. I don't think there's much there you would call
"clever." But idiomatic Ruby is not necessarily the fastest Ruby. Jeremy's
Polished Ruby book goes into that, but I have much to learn in that area. I
am happy to dig deeper.
I really appreciate Kyle zeroing in on the main problem so quickly! 🧐 Kudos
to him!
|
howdy gentlemen, thank you for the thoughts! i think i made some headway here

context

i did some poking around to see if i could figure out where the slowdown on boot was, and surprise surprise, it's our friend Mustermann. Damian is correct - in theory we're creating significantly fewer matchers, since we only create them for defined routes rather than for each individual dynamic segment (for reference, the original design). so in 2.1, we would have a matcher for each dynamic segment. in this section of the mustermann docs we learn two things, and point #2 is the important part: when given dynamic segments in 2.1, it was very possible we were sharing matchers between routes - a single matcher could end up matching segments from multiple routes. i confirmed this by looking at memory usage; in Sean's original test, memory was similar between the two versions.

proposal

route caching 🤔

i think there could be a world where we pre-compile the Trie on the first boot, and then save it to a gitignore'd file on disk. on subsequent loads, we could check whether the routes have changed. if they haven't, we should have constant boot times. if they have changed, compile the changed routes into a new version of that file. i think this could be an acceptable solution, since long start-up times really only start to take effect when you have a significant number of routes, and i imagine very few people are going to add 10k routes before starting the app for the first time. though, the consequence of this would be adding more complexity to the Hanami boot process. if this is a direction people are interested in, i'd love to create a separate issue for tracking that effort (a rough sketch of the idea is below).

lmk what you think! thank you as always |
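A very rough sketch of what the route-caching idea could look like. The file paths, the `build_trie` helper, and the assumption that the compiled trie survives a `Marshal` round-trip are all hypothetical, not anything that exists in hanami-router today:

```ruby
require "digest"

CACHE_PATH       = "tmp/router_trie.cache"        # hypothetical location
FINGERPRINT_PATH = "tmp/router_trie.fingerprint"  # hypothetical location

def load_or_build_trie(route_definitions)
  fingerprint = Digest::SHA256.hexdigest(route_definitions.inspect)

  if File.exist?(CACHE_PATH) && File.exist?(FINGERPRINT_PATH) &&
     File.read(FINGERPRINT_PATH) == fingerprint
    # routes unchanged since the last boot: reuse the serialized trie
    Marshal.load(File.binread(CACHE_PATH))
  else
    trie = build_trie(route_definitions) # assumed to exist elsewhere
    File.binwrite(CACHE_PATH, Marshal.dump(trie))
    File.write(FINGERPRINT_PATH, fingerprint)
    trie
  end
end
```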
follow up here: ended up squeezing some more juice out of the 2.2.1 implementation (now named 2.2.12, although idk if this is the correct usage of semver or not 🤷)

updated graphs (on my machine at least):

as you can see, memory usage is pretty much the same - this is still because of the issue discussed above. i did some extra benchmarking and realized we were leaving a lot on the table with the regex-based path splitting; switching to plain string splitting made up the difference. please let me know if you see any issues with this updated approach! |
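For context on the kind of comparison involved, a minimal, hypothetical micro-benchmark of regex scanning versus `String#split` for path segmentation (not the benchmark used in this PR; requires the benchmark-ips gem) might look like:

```ruby
require "benchmark/ips"

PATH       = "/books/123/reviews/456"  # made-up request path
SEGMENT_RE = %r{[^/]+}

Benchmark.ips do |x|
  x.report("regex scan")   { PATH.scan(SEGMENT_RE) }
  x.report("string split") { PATH.split("/").reject(&:empty?) }
  x.compare!
end
```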
Thank you for continuing to look into this! I've been swamped on my end through the holidays.

Optimizing Hanami Router 2.2

Your switch from RegEx to string splitting is what I meant about finding the most performant implementation for our approach. It's good to see that 2.2.1 is faster than 2.1 when implemented properly. There may be more wins to be had with similar optimizations. Jeremy Evans' book has a lot of tips for this.

Using Mustermann will always require a trade-off

In terms of boot time vs response time, I believe this will always be a trade-off as long as we are using Mustermann. I don't know if it would be possible to pre-compile the trie structure somehow. I've never seen that done. You would essentially have to serialize Ruby objects and then de-serialize them at runtime. Not sure if this would be possible or faster. Interesting thought. At present, I consider a longer boot time for a faster run time in 2.2.1 to be the better trade-off (compared to the opposite in 2.2).

Mustermann discussion

With regard to Mustermann, I would bet that Hanami's use of tries allows it to compare favorably with any other framework that also uses Mustermann. This doesn't apply to Roda, since Roda does not seem to use Mustermann. However, it would be hard for Hanami Router to match Roda's speed, because Roda's routing config is essentially a trie structure implemented in code that does not use Mustermann (see the "Faster" section here). Therefore, it's faster both at boot (there is no routing compile step at all) and during run time. (You can read a comprehensive write-up of Roda here.) Hanami uses Mustermann because of the power and flexibility it provides out of the box, and because it's battle tested. With that said, there are other options.

Options for moving forward

I recommend that we continue to use Mustermann for now, as improved in this PR, but we should keep considering other options that could further improve Hanami Router.

Other thoughts

As a general rule, the more flexible we allow our routes to be, the more overhead we should expect. Part of the reason Roda is fast is that it implements a small set of routing options to start.

Other considerations include developer experience. Hanami Router's configuration of routes is very similar to that of Rails, and (I think) very readable. Roda's presentation is very different, and perhaps not as readable (that could just be my lack of familiarity). The fastest possible implementation would have developers specify a routing trie data structure directly in the routes file, but the DX for this would be poor. Trade-offs of some kind will always exist.

@kyleplump do you have time to discuss your code changes before we recommend merging them? |
Thanks for finding this and writing up your thoughts @kyleplump @dcr8898! Pre-compiling the trie is an interesting idea but adds too much complexity. It'd be a cool experiment for sure, but it's too novel to add at this point.

In terms of versioning proposals, I think it'd be better to refer to them via something like 2.2.0.proposal1 and 2.2.0.proposal2, or similar.

Is there some way we could take a hybrid approach here? That is, building Mustermann leaves ahead of time for the first N routes, then after that just building the matchers lazily. We could use benchmarking to figure out what N should be by default, and allow that N to be configurable. Similarly, could we make pre-building leaves optional? i.e. N=0 is valid and implements (essentially) the previous behavior from 2.1. I don't know if it would be safe, but I could see wanting lazily built matchers in dev and test environments, then building them ahead of time in production for increased performance (at the expense of slower startup time). |
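A sketch of how that hybrid could look. The class, its constructor arguments, and the `eager_limit` option are all hypothetical, not the real `Hanami::Router::Leaf` API:

```ruby
require "mustermann"

# Hypothetical leaf: eager matcher for the first `eager_limit` routes,
# lazy for the rest.
class HybridLeaf
  def initialize(route, index:, eager_limit: 1_000)
    @route   = route
    @matcher = Mustermann.new(route) if index < eager_limit
  end

  def match(path)
    matcher.match(path)
  end

  private

  def matcher
    @matcher ||= Mustermann.new(@route) # built on first use for later routes
  end
end
```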
@cllns These are some interesting ideas for sure. My initial feeling is that this would be a case of premature optimization. I don't know how many apps there are out there with 10K routes. I would think a huge app would have 1-2K routes (based on my limited experience). If the graph above is logarithmic, this would imply startup times of 1-2 seconds for such apps. Do you think that would be too long for dev & test environments? My feeling is that we should wait until developers are actually feeling that pain before we introduce more complexity. For example, I would like to speak with @kyleplump more to see if we can squeeze the current memory usage a little more, and maybe the startup time, with further optimizations. If we do explore your suggestions, what would be the gain of compiling some matchers ahead of time and not others? As far as this benchmark is concerned, I presume it attempts to exercise all of the generated routes more or less equally. This is actually one factor that makes the r10k tool a little unrealistic. In a real-world app, there would definitely be some distribution of popular and less-popular routes, but this distribution would not be captured by an arbitrary division of route handling based on a number. What might work in this case is the original 2.2 strategy and some kind of fast, in-memory cache of created Mustermann matchers. I don't know if this is worthwhile, but it would provide a compromise approach. (EDIT: I have no idea how this would work in a concurrent environment. Maybe it is best to compile the routes first, and then potentially freeze the router. 🤷 ) The second part of your suggestion, handling dev and test environments differently than production, could be implemented pretty easily (I think 🤔 ), but I still wouldn't recommend introducing that additional complexity until actual developer experience starts to suffer. |
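The in-memory cache idea could be as small as a memoizing hash; a hypothetical sketch, with the concurrency caveat from the comment above:

```ruby
require "mustermann"

# Hypothetical process-wide cache: routes with identical pattern sources
# share a single matcher. A bare Hash is not safe for concurrent writes;
# a Mutex, or building everything before boot completes and freezing the
# router, would be needed in practice.
MATCHER_CACHE = Hash.new { |cache, pattern| cache[pattern] = Mustermann.new(pattern) }

MATCHER_CACHE["/books/:id"].params("/books/7")  # builds the matcher once
MATCHER_CACHE["/books/:id"].params("/books/9")  # reuses the cached matcher
```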
i appreciate you both thinking about this so deeply! i love hearing your perspectives - trying to keep up with you two is helping me learn a ton :)

i like this idea of pre-generating them all on production builds. is there a world where we keep it simple, at first, and just lazy load the matchers when running outside of production? regardless - i'd love to keep poking around and see if we can be more 'clever' with our optimizations and make this router blazingly fast (side note: do people still say this?). i'd be happy to bounce ideas off of anyone, but definitely @dcr8898 if you're volunteering.

on versioning: once we iron out a game plan, i can make the modifications, close this PR, and create a formal 'proposal' laying out the changes made, if that would be better? 2.2 fixes the bug; we're now just talking about optimization (which is the name of the game for a project like this). point being: if we bundle it all up nicely it can get merged whenever. thanks again! |
@kyleplump I'm in favor of polishing this PR up and merging it to fix the problem. Then we can experiment with further changes at our leisure. Let me know when you want to meet up; I think we can make a few more optimizations before submitting. I might still say blazing fast, but I'm likely an anomaly. 🤔 |
related to #278

in the new Trie `leaf` implementation, Mustermann matchers were created at the time of evaluation, causing a semi-significant slowdown. we have all of the information to create the matcher when building the trie - so, this fix creates the matcher at the time of `leaf` creation (leaving a fallback in `#match`), while maintaining the benefits of having `leaves` as introduced in 2.2 with this PR (#273)

this brings us back to similar performance as `hanami-router v2.1`, with the exception of startup (this fix is temporarily called `2.2.1`):

- rps.png: https://github.com/user-attachments/assets/e4a135d5-53f5-49ca-8be0-acab1d4674d9
- log_rps.png: https://github.com/user-attachments/assets/5a31d0b9-a928-4a32-8103-71fa2db5d4f0
- runtime_with_startup.png: https://github.com/user-attachments/assets/4514f85f-abb1-42da-8d3c-2d72485ce156

this slowdown on boot is expected - more work (creating leaves / mustermann) is happening. the way to get around this would likely be a new structure other than a trie (maybe someday!), or maybe evaluating / building the trie async (😬)

please let me know if you'd like more test cases (especially looking for thoughts from @dcr8898). thanks!
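For readers skimming the thread, the shape of the change described here is roughly the following sketch. It is illustrative only, not the actual lib/hanami/router/leaf.rb source; the class layout and names are assumptions:

```ruby
require "mustermann"

# Sketch of a trie leaf that builds its Mustermann matcher up front,
# with a lazy fallback in #match.
class Leaf
  attr_reader :to # whatever the leaf resolves to (endpoint, block, etc.)

  def initialize(route, to)
    @route   = route
    @to      = to
    @matcher = Mustermann.new(route) # built while the trie is constructed
  end

  def match(path)
    @matcher ||= Mustermann.new(@route) # fallback, as described above
    @matcher.match(path)
  end
end
```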