Create PagefindIndex struct for Rust lib to be more SDK-like #751
Hmm, I'm open to exposing more from the Pagefind library, though I don't think these are the best candidates for making public. They can change between versions, and making them part of the public API would lock them in somewhat: changing them would then require bumping the major version. Ideally we can expose this behavior through the service API methods (e.g. the endpoints available in https://pagefind.app/docs/node-api/), which are defined in service/requests.rs and service/responses.rs. These represent the programmatic interface to Pagefind that I'm comfortable supporting, whereas the fossick module and the DomParserResult concepts are very internal. Does that sound feasible?
This makes sense. The issue with that approach is that it relies on a top-level process to manage all the indexes; ideally we wouldn't have a separate process running. Would you be open to me exposing each of these mappers as a function that could also serve as an API? The main thing I wanted was a way to call these functions directly. https://github.com/CloudCannon/pagefind/blob/main/pagefind/src/service/mod.rs#L138-L282
Ah sorry, I wasn't super clear in my reply. Yes, my suggestion is to add a library interface to Pagefind that is based on the service API and goes through the same final code paths, but not through the service pipe itself. So yes, exposing each of those requests/responses as a library function would be ideal :)
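A minimal sketch of the arrangement described above, in which the service pipe and a direct library call share the same final code path. `Request`, `Response`, `Index`, `add_html_file`, and `handle_request` are hypothetical stand-ins, not Pagefind's actual types or functions; the real messages live in service/requests.rs and service/responses.rs.

```rust
use std::collections::HashMap;

/// Hypothetical request enum standing in for the messages in
/// service/requests.rs (names are illustrative, not Pagefind's).
pub enum Request {
    AddHtmlFile { url: String, content: String },
    GetFileCount,
}

/// Hypothetical response enum standing in for service/responses.rs.
#[derive(Debug, PartialEq)]
pub enum Response {
    Added { url: String },
    FileCount(usize),
}

/// Hypothetical in-memory index; a real index would parse the HTML
/// and build search fragments instead of storing raw strings.
#[derive(Default)]
pub struct Index {
    pages: HashMap<String, String>,
}

impl Index {
    /// Library entry point: callable directly from Rust, no pipe involved.
    pub fn add_html_file(&mut self, url: &str, content: &str) -> String {
        self.pages.insert(url.to_string(), content.to_string());
        url.to_string()
    }

    /// Number of distinct pages added so far.
    pub fn file_count(&self) -> usize {
        self.pages.len()
    }
}

/// Service entry point: decodes a request and delegates to the same
/// library methods, so both interfaces share the final code path.
pub fn handle_request(index: &mut Index, req: Request) -> Response {
    match req {
        Request::AddHtmlFile { url, content } => Response::Added {
            url: index.add_html_file(&url, &content),
        },
        Request::GetFileCount => Response::FileCount(index.file_count()),
    }
}
```

In this shape the service handler only translates messages; all behavior lives in the `Index` methods, so a Rust caller gets the same results without running a separate process.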
OK great, I'll spec something out for this right away. I was so excited that I already set up a PR for this based on the API of my fork: devflowinc/trieve#2934. Thank you for the fast response.
I created a struct. How would I test that I didn't introduce any breaking changes for the non-Rust APIs (https://pagefind.app/docs/node-api/)? I tested using
I also updated my example code showing how to use Pagefind: https://github.com/devflowinc/Pagefind-Example-Usage-Rust/blob/main/src/main.rs
I have enabled the test runners on this PR, and it does look like this change has broken the existing API endpoints. I don't have time to review today but can tomorrow. Curious to hear what issues you have hit setting up Toolproof; that project is in its nascent stage, but I would like to start fixing any issues people are hitting there too.
Thank you for running the tests. I'm going to see if I can reproduce the error cases based on the CI run.
Fixed after installing Chromium.
Ah interesting, what operating system are you on, and do you have Chrome installed? You can follow the logic the underlying crate uses to find a Chrom(e/ium) here: https://github.com/mattsse/chromiumoxide/blob/main/src/detection.rs If you can point a

Edit: Nice one! I'll look at catching this error in Toolproof itself and providing better error messaging.
I believe I found the fix for the tests here (I accidentally removed this line of code in translation). Locally, 2 tests fail against main, and the same tests fail on my branch. Can you rerun the workflow just to verify?
@bglw sorry for the double ping, when would you be free to review? I really want to get a release out so we can ship to production and iterate more on Pagefind search quality, and would rather not publish a new cargo crate.
Will look at this tomorrow. Initial skim looks good though!
Hi @cdxker 👋 I have done some work on top of this branch. I can push it to your remote if you enable access for maintainers; otherwise, if you want to cherry-pick my commit from 5ecb1fd into this branch, we can continue from there. High-level changes in that commit:
Let me know how you see these changes; hopefully they're compatible with what you're after. I'm still mulling over the Rust API. The Node and Python APIs have the benefit that new arguments aren't breaking changes, since they go through serialization and have defaults. However, with the Rust API, adding a new argument to
Will cherry-pick it in just a minute. (The button to allow maintainer edits seems to have disappeared.)
@bglw All these changes look great on my end; I just pushed the cherry-picked commit. I wasn't sure how you wanted to deal with error handling, so I deferred it to you. As for limiting breaking changes, the best way is likely to pass a struct type to each function instead of a list of arguments. That way we can do a few serialization hacks. It's messier in plain Rust, though.
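A sketch of the "pass a struct instead of a list of arguments" idea, assuming a hypothetical `AddFileOptions` type and `add_page` method (neither is Pagefind's API). Deriving `Default` lets callers write `..Default::default()`, so a field added in a later release does not break their code.

```rust
/// Hypothetical options struct; none of these names come from Pagefind.
#[derive(Debug, Clone, Default)]
pub struct AddFileOptions {
    /// URL to record for the page; `None` means "derive it from the path".
    pub url: Option<String>,
    /// Hypothetical flag: whether to keep the page out of filter results.
    pub skip_filters: bool,
}

/// Hypothetical index that just records what was added.
#[derive(Debug, Default)]
pub struct Index {
    pub added: Vec<(String, bool)>,
}

impl Index {
    /// One struct argument instead of a growing positional list:
    /// new options become new fields rather than new parameters.
    pub fn add_page(&mut self, path: &str, opts: AddFileOptions) -> String {
        let url = opts
            .url
            .unwrap_or_else(|| format!("/{}", path.trim_end_matches(".html")));
        self.added.push((url.clone(), opts.skip_filters));
        url
    }
}
```

This only protects callers who use `..Default::default()`; a caller who spells out every field in a struct literal will still break when a public field is added. Marking the struct `#[non_exhaustive]` closes that gap, but it also forbids struct-literal construction outside the defining crate, which pushes the design toward constructors or builders.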
Yeah, I think the only true way would be to move each function into a builder pattern, but that becomes pretty non-ergonomic to use, so I'm happy to ship it as-is. I am very resistant to breaking changes though, so if there's anything you want added to the API, now is the time to do it. Once we ship this I won't want to make a major release just for API changes, so we'll need to add something like a

I'll need to look into those two failing tests. They're failing in this PR run as well, yet both this branch and main pass on my machine. Possibly a non-deterministic test, though if so it seems like a recent regression.
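For comparison with the struct-argument idea, a sketch of the builder pattern mentioned above, again with hypothetical names (`AddFileRequest`, `url`, `skip_filters`, `resolved_url`) that are not Pagefind's API. Because the fields are private and every option is a chained method, adding an option later is an additive, non-breaking change; the cost is the extra ceremony at every call site, which is the ergonomic concern raised in the thread.

```rust
/// Hypothetical builder; names are illustrative, not Pagefind's API.
#[derive(Debug, Default)]
pub struct AddFileRequest {
    path: String,
    url: Option<String>,
    skip_filters: bool,
}

impl AddFileRequest {
    /// Only the required input is positional; everything else is optional.
    pub fn new(path: impl Into<String>) -> Self {
        AddFileRequest { path: path.into(), ..Default::default() }
    }

    /// Override the URL recorded for the page.
    pub fn url(mut self, url: impl Into<String>) -> Self {
        self.url = Some(url.into());
        self
    }

    /// Hypothetical option: keep the page out of filter results.
    pub fn skip_filters(mut self, skip: bool) -> Self {
        self.skip_filters = skip;
        self
    }

    /// Final URL, defaulting to one derived from the path.
    pub fn resolved_url(&self) -> String {
        self.url
            .clone()
            .unwrap_or_else(|| format!("/{}", self.path.trim_end_matches(".html")))
    }

    pub fn skips_filters(&self) -> bool {
        self.skip_filters
    }
}
```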
The current API works for me; I even tested integrating it with our example use case. The only functions I would need are
Sweet as — alright I'll merge and release as soon as I figure out what is happening in these tests. |
Phew, okay, located the issue. It's a legitimate bug that may have been affecting ranking in some cases, where it isn't guaranteed how a compound word will be ranked. I wasn't seeing it locally because the machine I'm on had an older Rust version, so I believe the culprit is the recent(ish) Rust release that changed the implementation of the std sorting algorithms. I think it was still a bug before; these tests just happened to skate by.
Updated commit; if you could cherry-pick that onto this branch :)

Context, since this doesn't have its own PR: In the "Compound words prefixes prioritise the lower weight" test, when you search The section of code with the bug identifies these cases and ensures that we use the lower weighted match. Since we only searched This means whether it looked at
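The thread doesn't show the code that was fixed, so the following is only a general illustration of this class of bug, using a hypothetical `WordMatch` type and `dedupe_lowest_weight` function. `slice::sort_unstable` makes no promise about the relative order of elements with equal keys, so code that sorts on a partial key and then keeps "the first" entry can silently change behavior when the std sort implementation changes. Sorting on the full key (position, then weight) makes the choice of the lowest weight explicit.

```rust
/// Hypothetical match record; Pagefind's real types differ.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct WordMatch {
    pub position: u32,
    pub weight: u8,
}

/// Keep one match per position, always the lowest weight, regardless of
/// input order or of how the sort algorithm breaks ties.
pub fn dedupe_lowest_weight(mut matches: Vec<WordMatch>) -> Vec<WordMatch> {
    // Sort on the full key so that, within a position, the lowest weight
    // is guaranteed to come first. Sorting by position alone would leave
    // the order of same-position entries unspecified with sort_unstable.
    matches.sort_unstable_by_key(|m| (m.position, m.weight));
    // dedup_by_key keeps the first element of each run of equal keys,
    // which after the sort above is the lowest-weight match.
    matches.dedup_by_key(|m| m.position);
    matches
}
```

Any ordering of the same input yields the same result, which is the property a ranking test like the one described above depends on.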
Wow, pretty nasty bug. Just cherry-picked your commit, so it should be ready to rerun the workflows.
Excellent! Merging now. I'd like to roll a few other little fixes into a release so there's something more user-facing, so I'll get a couple of things in there today/tomorrow and then fire a release out (~30 hours from now). Sorry for the delay!
👋 @cdxker This has landed in Pagefind v1.3.0 🎉
Hi @CloudCannonMain, we love Pagefind a lot! I am trying to use the pagefind Rust crate in a similar way to the other wrapper libraries that use the Pagefind service. We created an example of using Pagefind to index a simple JSON file (https://github.com/devflowinc/Pagefind-Example-Usage-Rust); however, we need these 2 structs to be exposed publicly. Let me know if you have any other questions, I would love to talk.