
Cannot Upload docx document to Milvus Database because of DOCXMetadata #8727

Open
saikanov opened this issue Jan 16, 2025 · 0 comments
Labels
P2 Medium priority, add to the next sprint if no P1 available


saikanov commented Jan 16, 2025

Describe the bug
Cannot upload a DOCX file to a Milvus database because of DOCXMetadata.

Error message
TypeError: 'DOCXMetadata' object is not subscriptable

To Reproduce
Run a DOCX pipeline with Milvus as the vector database.

I had already worked around this issue by the time I posted it. The problem is that the DOCXMetadata object cannot be indexed (subscripted). Once I understood that, I tried popping (deleting) the metadata, and it worked fine.
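The error and the pop workaround described above can be reproduced without Haystack or Milvus. This is a minimal sketch under assumptions: `StandInDOCXMetadata` is a hypothetical stand-in for Haystack's `DOCXMetadata` (assumed here to be a plain dataclass), and the `"docx"` key is an illustrative name for where the converter puts it. Any code that indexes a dataclass instance like a dict raises the same kind of `TypeError`.

```python
from dataclasses import dataclass
from typing import Any, Dict


# Hypothetical stand-in for Haystack's DOCXMetadata; the real class and its
# fields may differ. It is assumed here to be a plain dataclass.
@dataclass
class StandInDOCXMetadata:
    author: str = ""
    title: str = ""
    revision: int = 0


def subscript_fails(meta_value: Any) -> bool:
    """Return True if indexing the value like a dict raises TypeError."""
    try:
        meta_value["author"]
    except TypeError:
        return True
    return False


# Metadata shaped like a converter output: the dataclass sits under a
# hypothetical "docx" key next to ordinary values.
meta: Dict[str, Any] = {
    "file_path": "report.docx",
    "docx": StandInDOCXMetadata(author="A. Author", title="Report", revision=3),
}

print(subscript_fails(meta["docx"]))  # True: dataclass instances are not subscriptable

# The workaround from the issue: pop (delete) the offending entry before writing.
meta.pop("docx", None)
print(meta)  # {'file_path': 'report.docx'}
```

Popping the entry avoids the crash, but it also discards the document properties stored in that object.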

After that, I went to [haystack/components/converters/docx.py](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/docx.py)

and edited the `merged_metadata` variable so that it does not include the DOCXMetadata:

```python
merged_metadata = {**bytestream.meta, **metadata}
```

Now it works with the Pipeline.
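If the DOCX properties should be kept rather than dropped, one option is to convert the object to a plain dictionary with `dataclasses.asdict` and flatten it into prefixed scalar keys before merging. This is a sketch under assumptions, not the converter's actual code: `StandInDOCXMetadata` is a hypothetical stand-in (assumed to be a dataclass), `merge_metadata` and the `docx_` prefix are illustrative names, and the result has not been checked against the Milvus document store.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


# Hypothetical stand-in for Haystack's DOCXMetadata (assumed to be a dataclass).
@dataclass
class StandInDOCXMetadata:
    author: str = ""
    title: str = ""
    revision: int = 0


def merge_metadata(
    bytestream_meta: Dict[str, Any],
    user_meta: Dict[str, Any],
    docx_meta: StandInDOCXMetadata,
    prefix: str = "docx_",
) -> Dict[str, Any]:
    """Merge metadata, flattening the DOCX fields into prefixed scalar keys.

    Keeps the {**bytestream.meta, **metadata} order from the issue, then adds
    the DOCX fields as plain values instead of a dataclass object. Prefixed keys
    are added last, so they would override any user key with the same name.
    """
    flat_docx = {f"{prefix}{key}": value for key, value in asdict(docx_meta).items()}
    return {**bytestream_meta, **user_meta, **flat_docx}


merged = merge_metadata(
    {"file_path": "report.docx"},
    {"source": "upload"},
    StandInDOCXMetadata(author="A. Author", title="Report", revision=3),
)
print(merged)
# {'file_path': 'report.docx', 'source': 'upload', 'docx_author': 'A. Author',
#  'docx_title': 'Report', 'docx_revision': 3}
```

Flattening keeps every value a plain string or number, which avoids storing a nested object in the metadata at all.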

What I want to ask is: what does DOCXMetadata do? Does this error only occur with Milvus? And is it fine to leave it out to resolve my issue?

Thanks!

julian-risch added the P2 (Medium priority, add to the next sprint if no P1 available) label on Jan 16, 2025