LanceDB Destination #1375
Signed-off-by: Marcel Coetzee <[email protected]>
Storage options are only available in the asynchronous Python API. See https://lancedb.github.io/lancedb/guides/storage/
Update: if I change the lancedb_adapter to |
@akelad All configuration fields are optional now. Please see associated tests as well. Mind checking whether it works for you? Provided you aren't computing embeddings, you should be able to run as is after a |
@Pipboyguy I'll try it out, thanks. Can you also take a look at my other comments? Both Anuun and I get the |
Can confirm it works without defining the API/embedding API key.
Not all fields contain search-relevant information, and fields may require distinct embedding functions, especially for multi-modal data. Users must explicitly specify the source fields for embeddings in their schema to ensure good performance and accuracy in vector search operations. This is why the adapter has to be written by a user who understands what they are trying to achieve. If we want the adapter auto-generated for the example, we'd have to add this to the |
Actually, it seems that you can. As you demonstrated above (I also ran it on my end), the adapter seems to work just fine with the pokemon source. I speak under correction here though. I wasn't aware that an adapter also works on sources. The general flow, at least in my experience, is to use the adapter with resources.
Thanks for this catch, @akelad! This should be fixed now. Kindly test again?
The fixed import works now! About the dlt init command - that's fine with me, maybe we should add some sort of note either in the generated code or as a warning that you have to add the lancedb_adapter for embeddings? Because out of the box it will just act as a normal DB. @sh-rp what do you think?
Actually the example I showed there only runs on a resource: |
@akelad thank you so much for the detailed investigation. Apologies, you are right. The following does indeed work:

    lancedb_adapter(pokemon_source.resources['pokemon'], embed=["name"])
    load_info = pipeline.run(pokemon_source)

whereas the following doesn't raise a warning or exception, but nothing is embedded:

    lancedb_adapter(pokemon_source, embed=["name"])
    load_info = pipeline.run(pokemon_source)

I've updated the docs to make it clear that the adapter should only be used on resources, and that no fields will be embedded unless explicitly listed in the adapter.
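Since applying the adapter to a source fails silently, one possible guard is a small wrapper that applies an adapter resource by resource and raises on a misconfiguration. The following is only a minimal sketch and not part of this PR: `embed_resources` and `fake_adapter` are hypothetical names, the `berries` entry is illustrative, and it assumes the real `pokemon_source.resources` behaves like a mapping from resource name to resource.

```python
from typing import Any, Callable, Dict, List, Mapping


def embed_resources(
    resources: Mapping[str, Any],
    adapter: Callable[..., Any],
    embed_by_resource: Mapping[str, List[str]],
) -> Dict[str, Any]:
    """Apply `adapter(resource, embed=[...])` to each named resource.

    Raises KeyError for an unknown resource name and ValueError for an
    empty field list, so a misconfiguration fails loudly instead of
    silently embedding nothing.
    """
    adapted: Dict[str, Any] = {}
    for name, fields in embed_by_resource.items():
        if name not in resources:
            raise KeyError(f"unknown resource: {name!r}")
        if not fields:
            raise ValueError(f"no embed fields listed for {name!r}")
        adapted[name] = adapter(resources[name], embed=list(fields))
    return adapted


# Stand-in adapter for demonstration only (hypothetical). In a real pipeline
# this would be lancedb_adapter, and `resources` would be
# pokemon_source.resources (assumed to be mapping-like).
def fake_adapter(resource: Any, embed: List[str]) -> Any:
    return {"resource": resource, "embed": embed}


demo = embed_resources(
    {"pokemon": "pokemon-resource", "berries": "berries-resource"},
    fake_adapter,
    {"pokemon": ["name"]},
)
```

With the real objects, the call would look something like `embed_resources(pokemon_source.resources, lancedb_adapter, {"pokemon": ["name"]})` followed by `pipeline.run(pokemon_source)`, under the same assumptions.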
Description
This PR adds support for using LanceDB as a destination.
Related Issues
Additional Context
With this change, dlt users can easily load their transformed data into LanceDB for efficient vector similarity search, full-text search, and SQL querying. LanceDB's ability to store raw data alongside vector embeddings makes it a powerful destination for AI applications.