loadFragment() with given hash #371
Comments
That could definitely be exposed, though Pagefind offers no method for finding a hash you're looking for outside of a search result. What's the use-case you're looking to fill here? (How are you planning to get the hash to pass to this function?) There might be a better way to get there 🙂 |
I may have to give you some background on this. I'm using pagefind on geospatial datasets that come with a title, description, quicklook images, etc. (the usual :-). It does a really great job and allows me to do 98% of my search needs in a wonderful easy way (thanks for the lib btw.).
The remaining 2% of search functionality I would like to implement, which Pagefind can't solve, is a geospatial search like bounding-box intersection. There are other very performant libraries (like https://github.com/mourner/flatbush) to do this. My idea is to have a small list of structs, each holding only a bounding box and the Pagefind hash, and feed it to one of those libraries. I would then run my intersection and use the hash to get the metadata (title, image, url, etc.) directly from the Pagefind index.
To build the list of bbox+hash structs I planned to just query all data from all records with a search term 'null'; not at runtime but at build time (build page -> build pagefind index -> build geospatial index).
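A minimal sketch of that idea, assuming the bbox+hash list already exists. The record shape and the `loadFragment` signature are hypothetical (exposing such a function is what this issue asks for), and a plain linear scan stands in for a spatial index such as flatbush:

```typescript
// Hypothetical record shape: a bounding box plus the Pagefind fragment hash.
interface BBoxRecord {
  hash: string; // fragment hash collected at build time (assumed available)
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}

type BBox = Omit<BBoxRecord, "hash">;

// Two axis-aligned boxes intersect when they overlap on both axes.
function intersects(a: BBox, b: BBox): boolean {
  return a.minX <= b.maxX && a.maxX >= b.minX && a.minY <= b.maxY && a.maxY >= b.minY;
}

// Linear scan; a spatial index would replace this loop for large datasets.
function hashesInBBox(records: BBoxRecord[], query: BBox): string[] {
  return records.filter((r) => intersects(r, query)).map((r) => r.hash);
}

// `loadFragment` is injected because its name and signature are assumptions here.
async function loadMatches<T>(
  records: BBoxRecord[],
  query: BBox,
  loadFragment: (hash: string) => Promise<T>,
): Promise<T[]> {
  return Promise.all(hashesInBBox(records, query).map((h) => loadFragment(h)));
}
```

Only the hashes that survive the intersection are passed on, so the number of fragment requests is bounded by the number of matching datasets.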
I hope this makes sense.
|
I'm not sure how relevant it would be for other applications and what it would mean for the pagefind code, but maybe an attribute like |
Ah, cool! Nice use-case. The purpose of the hashes is to eliminate any stale caching issues, so I'd be hesitant to provide a custom hash functionality. The option for Pagefind to write a plain JSON file is totally doable, though, I'll look into that. And no reason the explicit call to load a fragment can't be exposed, so I'll tackle that too. |
Thanks for your effort.
|
@bglw I have this issue with the node library. A quick solution could be to return the hash, alongside other data, when the record is created. |
This is exactly what I need too. In my project I have 500 indexed HTML files all of which I display in my web UI. Displaying and visually filtering this many elements forces me to Work Arounds My workaround is to create a reverse lookup table from result To fix the UI freezing, one needs the Scheduler API or a I also tried moving Solution? If |
Hi all 👋 I'll be working on this one soon, along with #715 Both will come via a CLI flag to output a file containing information about the index — filters, fragments, etc. This will be output at the conclusion of the build. The API will gain a matching function, something like
Unfortunately this one isn't possible without some more changes. At present, the IDs aren't allocated until the conclusion of indexing, so they aren't known at the point of responding to any of the |
Hi @bglw. Thank you for listening for our issues :) In my use case, the best solution would be to have the record hash directly returned by |
Hmm, well that needs some more thought 😅 Just to rattle off some thoughts, for context and for myself: Pagefind uses fairly short page IDs, to reduce the size of the metadata it needs to load up front. The downside of this is that collisions can and do happen, so the IDs are allocated at the end of the indexing, and pages will adjust their hash if it would collide. One goal for this is that both pages should adjust, which means the ID of a page may need to change after it has been allocated. So the big issue is until all files have been indexed, we don't know how short to make the page ID. The primary purpose of these hashes is to solve caching issues when the index changes after a build, so I'm hesitant to change the strategy too significantly. One idea that might work would be to adopt a git-ish concept of short and long IDs, and return the long ID from the With that:
|
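To illustrate why short IDs can't be fixed until every page is known, here is a sketch of one possible allocation scheme: each page gets the shortest prefix of its long ID that no other page shares, so both pages in a collision are lengthened. This is not Pagefind's actual algorithm; the function name and the `minLength` parameter are assumptions for illustration only.

```typescript
// Illustrative only: assign each long ID its shortest unique prefix.
// Assumes the long IDs themselves are unique.
function allocateShortIds(longIds: string[], minLength = 7): Map<string, string> {
  const result = new Map<string, string>();
  for (const id of longIds) {
    let len = Math.min(minLength, id.length);
    // Grow the prefix until no other long ID starts with it.
    while (
      len < id.length &&
      longIds.some((other) => other !== id && other.startsWith(id.slice(0, len)))
    ) {
      len++;
    }
    result.set(id, id.slice(0, len));
  }
  return result;
}
```

Because the loop consults every other ID, adding one more page can lengthen an ID that was already assigned, which is the reason a short ID cannot be returned at the moment a single record is created.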
Hi @bglw Given the ID de-duplication restrictions you mention, it's fine for my use case to leave Cheers. |
Thank you for your detailed answer. I understand the issue you are facing and why the ID is not already returned on record creation. Your solution would work but I see two issues:
I would rather suggest, if possible, to check the ID availability (and regenerate it if duplicate) at creation time. However, that's fine, I can use Edit: I'm thinking of the following solution that would address our use-cases more directly. In my case, I have a map with points. I need to know the location of all the points (with and without filters). But I only need the location. Currently, I am relying on a pre-generated JSON file to retrieve the location from the fragment ID, without having to fetch each fragment individually. This issue could be resolved with the combination of:
|
As a user story, my initial developer ergonomics expectation was that After some digging, I now understand that One possibility, perhaps too piecemeal a change, is including @julbd, @bglw however, once that's solved, arbitrary (build time) fragment splitting could indeed massively reduce my UI's search times. For hundreds of indexed items I only need I think it's fair to say that loading hundreds of fragments could be considered out of scope for Pagefind. I'm sure 95% of applications are paginating results. Also fragment splitting sounds like a major rewrite of core functionality. Anyway, just food for thought. And thanks for helping. |
👋 @julbd
Correct! That's the limitation. For people loading them all into a client-side bundle, the recommendation would be to use the indexCatalogue to look up the corresponding short hash — but at that point you may as well just rely on the indexCatalogue for everything. From my side, I'll continue with the indexCatalogue idea and we'll see how it goes, but we can revisit the idea of returning hashes while indexing if it seems crucial!
The main blocker here is that:
Importantly for the second one, playing through a scenario:
Now if any user has the hash fragment for We are getting into micro-optimizations here! But these are also all scenarios that have been encountered with Pagefind in practice 😅
This is an interesting idea! I like it 🤔 It feels tangential to this issue, would you mind opening a new one for that? :)
Ah, the
Agreed! The URL not being returned is quite intentional, so I'd be resistant to adding it. (Currently all IDs are loaded up front with Pagefind, and loading the URLs at the same time would start getting heavy). I like that the indexCatalogue concept gives an extension to some of these niche use cases where it's needed without impacting the base case for bandwidth.
Can you elaborate? The two ways I can read this are:
Correct! Or my favorite: use an IntersectionObserver to load the fragment when the result enters the viewport :) |
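A rough sketch of the IntersectionObserver approach mentioned above. The helper name `loadWhenVisible` and the small structural types are assumptions made so the sketch doesn't depend on DOM typings; the observer constructor is passed in as an argument.

```typescript
// Minimal structural types, standing in for the DOM's IntersectionObserver types.
interface EntryLike<E> { target: E; isIntersecting: boolean }
interface ObserverLike<E> { observe(el: E): void; unobserve(el: E): void }
type ObserverCtor<E> = new (cb: (entries: EntryLike<E>[]) => void) => ObserverLike<E>;

// Hypothetical helper: run each element's loader the first time it becomes visible.
function loadWhenVisible<E, T>(
  items: Map<E, () => Promise<T>>,
  render: (el: E, data: T) => void,
  Observer: ObserverCtor<E>,
): ObserverLike<E> {
  const observer = new Observer((entries) => {
    for (const entry of entries) {
      const load = items.get(entry.target);
      if (!entry.isIntersecting || !load) continue;
      observer.unobserve(entry.target); // load each fragment only once
      items.delete(entry.target);
      void load().then((data) => render(entry.target, data));
    }
  });
  for (const el of items.keys()) observer.observe(el);
  return observer;
}
```

In a browser, the third argument would be the global `IntersectionObserver`, and each loader would typically be `() => result.data()` for the corresponding Pagefind search result, so fragments are only fetched for results the user actually scrolls to.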
For the impetus for this addition see CloudCannon#371
😆 Cheers. I've sent you PR #719 with two minor additions to the getting started docs re the
Option 2 (+ extras): Multiple fragments for each indexed file:
This opt-in generation of fragment subsets would allow users to make their own trade-offs between the number of HTTP requests required and (even further) reduced bandwidth (reduced search times). I'm not overly familiar with the Pagefind code-base itself, so take my idea with a pinch of salt, but that's my conceptualisation of @julbd's idea.
I think I'll give that a go! |
After deciding on Web Components, Go types had to be duplicated in JSDoc syntax for the JS client code. Although rewriting the static site generation in Node makes SSG slower for a total build, the developer experience can actually be faster because reloading Web Components is fast. Initially native Web Components were working well, but the value of reactive state mounted. Lit Elements add reactive state to Web Component standards, so that, unlike React, it's a small dependency, and requires no build step. The client UI itself is now a two sidebar layout, both attached to the left of the viewport, each scrollable. One for filters, and one for results, with content being loaded in the page body. Pagefind performance improves a lot with this commit following the adoption of an IntersectionObserver for loading data as discussed in CloudCannon/pagefind#371. Further reductions to initialisation times to follow with the proposed fixes in that issue. Client JS now uses modules (as opposed to synchronous plain JS) in the browser. This approach initially broke support for some older browsers, specifically iOS 16.3 and its lack of support for import maps. To solve this Vite is now being used to compile client code in a way that's compatible with older browsers.
Is it possible to directly load a fragment for a hash that was not obtained by the search?
Something like a public version of the loadFragment function: https://github.com/CloudCannon/pagefind/blob/main/pagefind_web_js/lib/coupled_search.ts#L234 ?